<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2694-1473</journal-id>
<journal-title-group>
<journal-title>Journal of the Society for Clinical Data Management</journal-title>
</journal-title-group>
<issn pub-type="epub">2694-1473</issn>
<publisher>
<publisher-name>Society for Clinical Data Management</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.47912/jscdm.164</article-id>
<article-categories>
<subj-group>
<subject>Opinion paper</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>CDISC Implementation in an Academic Research Organization</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Jentoft</surname>
<given-names>Katie</given-names>
</name>
<email>KJENTOFT@mgh.harvard.edu</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Tustison</surname>
<given-names>Eric</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Yu</surname>
<given-names>Hong</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>NCRI, Mass General Hospital</aff>
<aff id="aff-2"><label>2</label>Mass General Hospital</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2022-12-23">
<day>23</day>
<month>12</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>2</volume>
<issue>3</issue>
<elocation-id>3</elocation-id>
<history>
<date date-type="received" iso-8601-date="2022-02-18">
<day>18</day>
<month>02</month>
<year>2022</year>
</date>
<date date-type="accepted" iso-8601-date="2022-11-28">
<day>28</day>
<month>11</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2022 The Author(s)</copyright-statement>
<copyright-year>2022</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nc-sa/4.0/">
<license-p>SCDM publishes JSCDM content in an open access manner under an Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license. This license lets others remix, adapt, and build upon the work non-commercially, as long as they credit SCDM and the author and license their new creations under identical terms. See <uri xlink:href="https://creativecommons.org/licenses/by-nc-sa/4.0/">https://creativecommons.org/licenses/by-nc-sa/4.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://www.jscdm.org/articles/10.47912/jscdm.164/"/>
<abstract>
<sec>
<title>Introduction:</title>
<p>The United States Food and Drug Administration (FDA) requirement for standardized data submissions has led our Academic Research Organization (ARO) to use Clinical Data Interchange Standards Consortium (CDISC) data standards in clinical trials since January 2018. Effective implementation of CDISC data standards enables standardized data collection and facilitates data submissions to the FDA.</p>
</sec>
<sec>
<title>Objectives:</title>
<p>The objective of this paper is to describe the benefits and drawbacks of our ARO&#8217;s three-phased implementation of CDISC data standards, comprising partially automated dataset conversion, CDASH case report forms, and Pinnacle 21 data checks. We share our experience to support other organizations in standardizing their data for FDA submissions.</p>
</sec>
<sec>
<title>Methods:</title>
<p>Our ARO implemented CDISC data standardization in three phases: phase one, applying CDISC SDTM conversion to non-standardized datasets; phase two, using CDASH case report forms; and phase three, running ongoing Pinnacle 21 data checks to identify data issues.</p>
</sec>
<sec>
<title>Results:</title>
<p>Phase one required significant time to create a standardized dataset at study conclusion. Phase two required additional resources for start-up activities but reduced the overall effort to produce the final dataset. Phase three required investment at start-up and ongoing targeted data review but is expected to reduce the cost of producing the final standardized dataset.</p>
</sec>
<sec>
<title>Conclusion:</title>
<p>This evolution in CDISC data standards implementation refined our standardization process to meet FDA requirements, streamlining data collection and improving the overall efficiency of our clinical trials. We support collaborations to develop open-source training materials and examples of CDISC data standards implementation to improve the standardization process for other AROs.</p>
</sec>
</abstract>
<kwd-group>
<kwd>Manage Clinical Research Data</kwd>
<kwd>Collect data</kwd>
<kwd>Define Data</kwd>
<kwd>Design Form</kwd>
<kwd>Process Data</kwd>
<kwd>Clean</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>Introduction</title>
<p>The Data Management group at the Sean M. Healey &amp; AMG Center for ALS (amyotrophic lateral sclerosis) and Neurological Clinical Research Institute at Massachusetts General Hospital is an academic research organization (ARO) responsible for data management of multicenter clinical trials. We strive to adhere to United States Food and Drug Administration (FDA) requirements, particularly the use of Clinical Data Interchange Standards Consortium (CDISC) standards for data submissions.<sup><xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B2">2</xref></sup> As we developed standardization for multiple trials, our team&#8217;s best practices evolved from reactive to proactive. We followed an iterative approach: modifying existing processes, developing new tools, and working with CDISC experts. By sharing our experiences, we hope to foster collaboration and assist other AROs facing similar challenges.</p>
</sec>
<sec>
<title>Background</title>
<p>The FDA requirement for standardized data submissions has led our ARO to use CDISC data standards in clinical trials since January 2018.<sup><xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B2">2</xref>,<xref ref-type="bibr" rid="B3">3</xref></sup> The FDA&#8217;s currently supported data standards call for the CDISC Study Data Tabulation Model (SDTM) for clinical data, which &#8220;provides a standard for organizing and formatting data to streamline processes in collection, management, analysis and reporting.&#8221;<sup><xref ref-type="bibr" rid="B4">4</xref></sup> This model is used for the final dataset submitted to the FDA.<sup><xref ref-type="bibr" rid="B3">3</xref></sup> There are multiple ways to create a compliant dataset. The first is to convert non-standardized data to standardized study data.<sup><xref ref-type="bibr" rid="B3">3</xref></sup> A second is to use CDISC Clinical Data Acquisition Standards Harmonization (CDASH) to collect standardized data from the beginning of a study, easing the conversion to SDTM.<sup><xref ref-type="bibr" rid="B5">5</xref></sup> Further, Pinnacle 21 data checks, which reflect FDA validation rules, can be run on SDTM datasets converted from CDASH-compliant data throughout a study to support ongoing data review.<sup><xref ref-type="bibr" rid="B6">6</xref></sup> Our ARO&#8217;s experiences with each of these three approaches illustrate the evolution of our standardization procedures. This article discusses each approach, along with lessons learned and ideas for future improvements.</p>
</sec>
<sec>
<title>Methods</title>
<sec>
<title>Phase one: Early experience with data conversion to SDTM format</title>
<p>One of our early experiences with CDISC data standards was supporting a clinical trial with more than 100 participants that started in 2017. We designed the electronic Case Report Forms (eCRFs) according to internal standards based on Good Clinical Data Management Practices but did not apply specific CDISC recommendations.<sup><xref ref-type="bibr" rid="B7">7</xref></sup> As we neared study completion, we developed a plan to convert the collected data into SDTM format. This became a multi-team project in which Data Managers (DMs), Systems Analysts (SAs), and an external CDISC consultant worked together to achieve a compliant format. The overall goal was to convert the existing dataset to SDTM, then create the analysis datasets using the Analysis Data Model (ADaM)<sup><xref ref-type="bibr" rid="B8">8</xref></sup> and prepare the Clinical Study Report (CSR).</p>
<p>The DMs reviewed the data and issued queries to the sites to resolve discrepancies, ensuring the dataset was as clean as possible before conversion. DMs also advised the SAs on SDTM mappings and specification questions. The SAs produced SDTM tables for the data and developed a proprietary data conversion tool to partially automate the process. The consultant mentored us through conversion questions, review, and troubleshooting. The consultant also ran our dataset tables through the Pinnacle 21 validator,<sup><xref ref-type="bibr" rid="B9">9</xref></sup> provided feedback, and helped prepare the final ADaM dataset and CSR. Because multiple individuals worked on this effort over varying lengths of time, we performed a retrospective review of the study timeline to estimate the hours required to convert the data into SDTM format.</p>
</sec>
<sec>
<title>Phase two: Implementation of CDASH-compliant data collection methods</title>
<p>In 2018, we incorporated CDISC data standards into the design phase of our next trial by developing CDASH-compliant data collection to ease conversion of the final dataset to SDTM.<sup><xref ref-type="bibr" rid="B4">4</xref>,<xref ref-type="bibr" rid="B5">5</xref></sup> Collecting data directly in SDTM format would have been unwieldy, primarily because of SDTM&#8217;s vertical data structure, in which results such as individual tests are stored as separate rows rather than as columns. The CDASH standard provides one-to-one mappings to SDTM for many data fields and prescribes ways to bridge to SDTM when a one-to-one mapping is not available.<sup><xref ref-type="bibr" rid="B10">10</xref></sup> For this trial, we first identified the common CDASH domains that we planned to use for data collection.<sup><xref ref-type="bibr" rid="B10">10</xref></sup> Then, using the CDASH question text recommendations and eCRF design principles,<sup><xref ref-type="bibr" rid="B10">10</xref></sup> we designed our eCRFs to include the questions relevant to our trial. Optional CDASH questions that were not relevant were excluded. Additional data points desired by the trial sponsor but not included in a CDASH domain were added either as additional questions on the same eCRF or as separate eCRFs. For all eCRFs, the collected data had to be coded according to the CDASH standards to enable conversion to SDTM.<sup><xref ref-type="bibr" rid="B10">10</xref></sup> The CDASH standards also prescribe the coding rules, which could be used as-is for fields taken directly from a domain or adapted for original fields as long as the standard structure was maintained.<sup><xref ref-type="bibr" rid="B10">10</xref></sup> This prescribed approach to eCRFs (standard design principles, question text, and coding) significantly transformed how we developed the entire eCRF package and required more up-front work than legacy trials that did not follow CDASH.</p>
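<p>To make the point about SDTM&#8217;s vertical structure concrete, the Python sketch below transposes wide, form-style rows (one column per vital sign) into SDTM-style vertical records with one row per test. It is an illustration under stated assumptions, not our conversion tool: the wide column names, the WIDE_TO_VS mapping, and the function to_vertical are hypothetical, while the output variable names (USUBJID, VISIT, VSTESTCD, VSTEST, VSORRES, VSORRESU) follow the SDTM Vital Signs (VS) domain. A real conversion would also need to handle tests not done, timing variables, and controlled terminology.</p>
<preformat preformat-type="code">
```python
# Minimal sketch: wide (one column per test) to SDTM-style vertical records.
# Column names and the mapping below are hypothetical, for illustration only.

# Maps a hypothetical wide column to (VSTESTCD, VSTEST, VSORRESU).
WIDE_TO_VS = {
    "SYSBP": ("SYSBP", "Systolic Blood Pressure", "mmHg"),
    "DIABP": ("DIABP", "Diastolic Blood Pressure", "mmHg"),
    "PULSE": ("PULSE", "Pulse Rate", "beats/min"),
}


def to_vertical(wide_rows):
    """Return one VS-style record per collected test in each wide row."""
    records = []
    for row in wide_rows:
        for column, (testcd, test, unit) in WIDE_TO_VS.items():
            value = row.get(column)
            # Simplification: skip empty values instead of creating
            # "not done" records, which SDTM handles with its own variables.
            if value is None or value == "":
                continue
            records.append({
                "USUBJID": row["USUBJID"],
                "VISIT": row["VISIT"],
                "VSTESTCD": testcd,
                "VSTEST": test,
                "VSORRES": str(value),
                "VSORRESU": unit,
            })
    return records


if __name__ == "__main__":
    wide = [
        {"USUBJID": "001", "VISIT": "BASELINE", "SYSBP": 120, "DIABP": 80, "PULSE": 72},
        {"USUBJID": "002", "VISIT": "BASELINE", "SYSBP": 135, "DIABP": 88, "PULSE": ""},
    ]
    for rec in to_vertical(wide):
        print(rec["USUBJID"], rec["VSTESTCD"], rec["VSORRES"], rec["VSORRESU"])
```
</preformat>
<p>Run as a script, the two example rows (one with a missing pulse value) become five vertical records, which suggests why entering data directly in this shape would be cumbersome for site staff.</p>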
<p>The second half of this experience involved converting the dataset to SDTM, which was completed by an external CDISC consultant. The data conversion tool we developed for the 2017 trial was not sufficient for the 2018 trial, so, given the time and resources available, our ARO decided to outsource the SDTM conversion.</p>
</sec>
<sec>
<title>Phase three: Ongoing SDTM conversion and Pinnacle 21 checks throughout trial</title>
<p>In 2020, we managed a third trial with CDISC in mind from the outset, applying CDISC data standards more holistically than in the previous two trials. In addition to creating eCRFs with CDASH-compliant fields, we developed a process for ongoing SDTM conversion throughout the trial. Every two weeks, we provided SAS data exports to an external CDISC consultant, who used them to create and update the SDTM datasets. After each update, the consultant ran the SDTM output through the Pinnacle 21 validator and reported any data or structural issues back to us, which we then worked to resolve.<sup><xref ref-type="bibr" rid="B9">9</xref></sup></p>
</sec>
</sec>
<sec>
<title>Results</title>
<p>For the first trial, based on a retrospective review of the study timeline, converting the data into SDTM format took approximately seven months of one full-time equivalent (FTE), or about 1,120 hours (roughly 160 working hours per month).</p>
<p>For the second trial, converting the data into SDTM format took approximately two months of one FTE, or about 320 hours. The entire conversion effort was outsourced to an external CDISC consultant.</p>
<p>For the third trial, which is still in progress, SDTM conversion is ongoing. Producing the final dataset is expected to require less effort than in the previous two trials, because the entire data conversion no longer needs to be completed at the end of the trial. Specialized data review and data cleaning processes distribute the preparation work for data conversion evenly throughout the trial. Although producing the final SDTM dataset is expected to take less time than in the previous trials, this approach requires a significant time investment during study start-up and throughout the trial to prepare for SDTM.</p>
</sec>
<sec>
<title>Discussion</title>
<p>Over the course of these three trials, we developed expertise in CDISC data standards by gradually incorporating CDISC design principles, question text, and coding. We learned that it is better to plan for standardization as early as possible in the trial management process than to treat it as part of trial closeout. Having moved from not using CDISC data standards before 2018 to incorporating CDISC in the earliest trial design phases in 2020, our ARO has come a long way in implementing standardization.</p>
<p>In our 2017 trial, producing a final standardized dataset took significantly more time and effort than producing the final dataset for an average non-standardized trial. As this was our first SDTM conversion, we had a lot to learn during the project. Fundamentally, we discovered that we could not convert all data points successfully because our eCRFs had not been designed to facilitate SDTM conversion. For example, free-text data that would otherwise have had to be parsed were often placed in the Comments (CO) domain in SDTM, which is less accessible for analysis than predefined SDTM variables. Additionally, some of the issues identified in the Pinnacle 21 report could not be resolved and had to be explained in the Study Data Reviewer&#8217;s Guide.</p>
<p>The limitations imposed by the initial eCRF design proved challenging and became a key area for improvement in subsequent studies.</p>
<p>In our 2018 trial, we reduced the overall time and effort needed to produce a similarly sized dataset, although this approach required more work at study start-up. This experience gave us a strong understanding of how to design custom eCRFs as robust and compliant data collection tools. Additionally, the eCRFs we developed for common CDASH domains could be reused in other trials. The SDTM conversion was smoother and more efficient than in the first trial because we outsourced the entire process. Our main challenge in this experience was delayed data cleaning, because Pinnacle 21 data validation was not run while the trial was in progress. We did not fully understand the need for Pinnacle 21 checks and did not allocate resources to them until SDTM conversion, by which point it was too late to use them proactively.</p>
<p>In our 2020 trial, we expected additional effort at start-up, and ongoing targeted data cleaning and conversion required more work than in studies without these processes. The overall data cleaning process for CDISC compliance has become more effective, with internal logic checks run on the raw data in real time and frequent Pinnacle 21 data validation. Because issues can be identified soon after they occur, they can be resolved before they worsen or develop into problematic patterns. We expect the final effort to produce a standardized dataset at study closeout to be minimal, because most standardization will already have been completed.</p>
<p>One benefit of our most recent experience, implementing concurrent SDTM conversion and Pinnacle 21 checks while the trial was ongoing, was that it identified eCRF data points that did not convert cleanly to SDTM variables. Although the eCRF fields were developed according to CDISC data standards, the data actually entered did not always convert well to SDTM. For example, the general CDISC principle of avoiding blank fields did not work well for recording adverse event (AE) outcome dates for ongoing AEs. Our original AE eCRF design required an outcome date for ongoing AEs to document when the AE was assessed as &#8220;recovering/resolving&#8221; or &#8220;not recovered/not resolved.&#8221; However, the Pinnacle 21 output showed us that best practice was to leave the AE outcome date blank when the outcome was ongoing, because outcome dates for ongoing events did not convert to the SDTM outputs.</p>
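<p>The AE outcome lesson can be expressed as a simple internal logic check run on each data export. The Python sketch below flags AE records whose outcome indicates an ongoing event but whose end date is populated. It assumes the records already use the SDTM AE variable names AEOUT (outcome) and AEENDTC (end date) and that the collected outcome date maps to AEENDTC; that mapping, the function name flag_ongoing_with_end_date, and the example records are hypothetical, and the sketch does not reproduce any Pinnacle 21 rule.</p>
<preformat preformat-type="code">
```python
# Hypothetical internal logic check (not a Pinnacle 21 rule): an AE whose
# outcome means it is still ongoing should not carry an end/outcome date.

# AEOUT controlled-terminology values that indicate an ongoing event.
ONGOING_OUTCOMES = {"RECOVERING/RESOLVING", "NOT RECOVERED/NOT RESOLVED"}


def flag_ongoing_with_end_date(ae_records):
    """Return the AE records that are ongoing yet have AEENDTC populated."""
    flagged = []
    for rec in ae_records:
        # Treat missing values (None) the same as empty strings.
        outcome = (rec.get("AEOUT") or "").strip().upper()
        end_date = (rec.get("AEENDTC") or "").strip()
        if outcome in ONGOING_OUTCOMES and end_date:
            flagged.append(rec)
    return flagged


if __name__ == "__main__":
    aes = [
        {"USUBJID": "001", "AESEQ": 1, "AEOUT": "NOT RECOVERED/NOT RESOLVED", "AEENDTC": "2021-03-02"},
        {"USUBJID": "001", "AESEQ": 2, "AEOUT": "RECOVERING/RESOLVING", "AEENDTC": ""},
        {"USUBJID": "002", "AESEQ": 1, "AEOUT": "RECOVERED/RESOLVED", "AEENDTC": "2021-04-10"},
    ]
    for rec in flag_ongoing_with_end_date(aes):
        print("Query:", rec["USUBJID"], rec["AESEQ"], rec["AEOUT"])
```
</preformat>
<p>Only the first example record is flagged. Running a check of this kind on every export lets a data manager raise a query soon after entry rather than discovering the mismatch during SDTM conversion.</p>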
<p>The results of ongoing Pinnacle 21 checks also led us to revise the eCRF completion guidance for clinical research personnel, which improved the consistency and quality of data entered in the electronic data capture system. As the entire study team aligned to follow CDISC data standards from the first moment of data collection, we pivoted away from traditional wide-sweeping data cleaning methods prior to database lock. Instead, we focused precisely on key fields and critical data flow in near real time to support the goal of SDTM conversion. This made SDTM conversion an integrated step in trial management rather than an awkward burden at the end of a trial. Overall, this iterative process improved data quality for the trial through real-time data cleaning, which supported more accurate interim analyses, and deepened our understanding of CDISC-compliant design for future trials.</p>
<p>In our efforts to achieve data standardization, we learned the hard way through missed opportunities: we identified areas that needed improvement too late in the process to benefit our early trials. However, these experiences proved invaluable for understanding how to revise our processes for subsequent trials to achieve CDISC compliance. Based on our experiences implementing CDISC data standards, we see a real need for comprehensive and continuous CDISC training for AROs, ideally broken into bite-sized pieces with practice material and many detailed examples. Online resources similar to the W3Schools SQL tutorial,<sup><xref ref-type="bibr" rid="B11">11</xref></sup> which is highly interactive and easy to reference on the Web, would be hugely beneficial for organizations of all sizes. For example, an online module could display a sample eCRF and prompt the learner for conformant CDASH field annotations; the module could then automatically detect deviations from CDASH annotation principles and display a correct alternative. Such training would also be less overwhelming than a day or week of formal CDISC instruction from an expert, because it takes ongoing practice to fully understand the principles and goals of these standards. Expert-led CDISC training can be a great place to start, but keeping an expert on retainer to answer all the questions that inevitably arise during CDISC implementation would be cost-prohibitive, especially for small organizations just getting started with CDISC. Additionally, while we appreciate the extent of CDISC reference material freely available online, we wish it were easier to understand which CDISC documents are needed for which tasks. A virtual look-up tool or visual schematic would help, such as a quick start guide that provides a high-level view with guidance on where to find more detailed information. Open-source training in both technical and design principles would be key to helping all users, especially those learning CDISC for the first time.</p>
<p>Reflecting on the results of the up-front work to implement CDISC compliance, our organization saved increasing amounts of time in preparing datasets for analysis across our three phases of CDISC data standardization. Finalizing the second dataset was easier and faster than finalizing the first, and the third dataset is poised to continue this trend. Because CDISC preparation allows datasets to be finalized more quickly, analysis can also begin sooner. However, the time spent on data analysis itself is independent of the time spent on data preparation, so the absolute analysis time is neither shortened nor lengthened by using CDISC.</p>
<p>Because data standardization speeds data preparation, our experience as an ARO prompts us to advocate required data standards for National Institutes of Health (NIH) data sharing. CDISC is a strong contender, given its widespread use for clinical trial data submitted to the FDA. The challenge is that most institutions running NIH-funded studies may not have the resources to create CDISC-compliant datasets. While creating separate standards may seem redundant, another standard might be simpler or more cost-effective to implement than CDISC while still enabling NIH studies to achieve standardized data.</p>
<p>We believe the research community would benefit dramatically if all existing CRFs were converted to standardized formats. More broadly, the research landscape has changed significantly with the COVID-19 pandemic, and many researchers now use big data, artificial intelligence, or machine learning in health care research. Data standards are critical for aggregating data and analyzing large datasets effectively. With increased standardization, more knowledge can be derived more quickly than in the past, ultimately leading to new treatments for devastating diseases and improved health care for everyone.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>Overall, based on our experience so far, we recommend the third approach to CDISC implementation: starting with CDISC data standards in mind from the earliest stages of database development, using CDASH, and running SDTM conversion and Pinnacle 21 checks concurrently with active data collection. Given the resources we had for each trial, we made the best decisions we could to produce CDISC-compliant datasets. Each experience refined our understanding and influenced our data management processes for future trials. To help AROs proactively implement CDISC data standards, we advocate for open-source educational resources and ongoing community discussion to enable standardization for all clinical trials.</p>
</sec>
</body>
<back>
<sec>
<title>Competing Interests</title>
<p>The authors have no competing interests to declare.</p>
</sec>
<ref-list>
<ref id="B1"><mixed-citation publication-type="webpage"><label>1.&#160;</label><collab>U.S. Department of Health &amp; Human Services/U.S. Food &amp; Drug Administration</collab>. <article-title>Study Data Standards: What You Need to Know</article-title>. Published <month>September</month> <year>2017</year>. <uri>https://www.fda.gov/media/98907/download</uri>. Accessed January 4, 2022.</mixed-citation></ref>
<ref id="B2"><mixed-citation publication-type="webpage"><label>2.&#160;</label><collab>CDISC</collab>. <article-title>cdisc.org</article-title>. <uri>https://www.cdisc.org</uri>. Accessed May 20, 2022.</mixed-citation></ref>
<ref id="B3"><mixed-citation publication-type="webpage"><label>3.&#160;</label><collab>U.S. Department of Health &amp; Human Services/U.S. Food &amp; Drug Administration</collab>. <article-title>Study Data Technical Conformance Guide: Technical Specifications Document</article-title>. Published <month>July</month> <year>2020</year>. <uri>https://www.fda.gov/media/136460/download</uri>. Accessed April 11, 2022.</mixed-citation></ref>
<ref id="B4"><mixed-citation publication-type="webpage"><label>4.&#160;</label><collab>CDISC</collab>. <article-title>SDTM</article-title>. <uri>https://www.cdisc.org/standards/foundational/sdtm</uri>. Accessed April 14, 2022.</mixed-citation></ref>
<ref id="B5"><mixed-citation publication-type="webpage"><label>5.&#160;</label><collab>CDISC</collab>. <article-title>CDASH</article-title>. <uri>https://www.cdisc.org/standards/foundational/cdash</uri>. Accessed April 14, 2022.</mixed-citation></ref>
<ref id="B6"><mixed-citation publication-type="webpage"><label>6.&#160;</label><collab>Pinnacle 21</collab>. <article-title>Pinnacle21.com</article-title>. <uri>https://www.Pinnacle21.com</uri>. Accessed April 14, 2022.</mixed-citation></ref>
<ref id="B7"><mixed-citation publication-type="webpage"><label>7.&#160;</label><collab>Society for Clinical Data Management</collab>. <article-title>Good Clinical Data Management Practices (GCDMP)</article-title>. <uri>https://scdm.org/gcdmp/</uri>. Accessed April 14, 2022.</mixed-citation></ref>
<ref id="B8"><mixed-citation publication-type="webpage"><label>8.&#160;</label><collab>CDISC</collab>. <article-title>ADaM</article-title>. <uri>https://www.cdisc.org/standards/foundational/adam</uri>. Accessed April 14, 2022.</mixed-citation></ref>
<ref id="B9"><mixed-citation publication-type="webpage"><label>9.&#160;</label><collab>Pinnacle 21</collab>. <article-title>Downloads</article-title>. <uri>https://www.Pinnacle21.com/downloads</uri>. Accessed February 17, 2022.</mixed-citation></ref>
<ref id="B10"><mixed-citation publication-type="webpage"><label>10.&#160;</label><collab>CDISC</collab>. <article-title>CDASH v1.1</article-title>. Published <month>January</month> <day>18</day>, <year>2011</year>. <uri>https://www.cdisc.org/system/files/members/standard/foundational/cdash/cdash_std_1_1_2011_01_18.pdf</uri>. Accessed April 15, 2022.</mixed-citation></ref>
<ref id="B11"><mixed-citation publication-type="webpage"><label>11.&#160;</label><collab>W3Schools</collab>. <article-title>SQL Tutorial</article-title>. <uri>https://www.W3Schools.com/sql/default.asp</uri>. Accessed February 17, 2022.</mixed-citation></ref>
</ref-list>
</back>
</article>