Clinical Data Management in the United States: Where We Have Been and Where We Are Going

Mary A. Banach; Kaye H. Fendt; Johann Proeve; Dale Plummer; Samina Qureshi; Nimita Limaye

doi:10.47912/jscdm.61

Clinical data management (CDM) ensures quality clinical research data (meaning valid, reliable, and statistically sound data) to support public health decisions. These decisions help ensure safe and effective medical products, from original development through marketing. In each section of this review on the state of CDM in the US, the goal is to address how CDM facilitates the acquisition and validation of quality data that can be shared with all constituents.

Brief History of CDM in the US

For many years the US Food and Drug Administration (FDA), the medical products industry, the healthcare industry, and academics have been working to develop methods, standards, and procedures for collecting, storing, transferring, analyzing, and reporting clinical research data. The goal of these efforts is to increase the efficiency of medical product development. This section traces the history and changes in CDM in the US and their impact on data quality.

While some regulations of medical products marketed in the United States date back to the early 1800s, the original Food and Drug Act serves as the foundation of regulations for today’s medical products. The Act was passed by Congress on June 30, 1906, and signed into law by President Theodore Roosevelt. It was completely revised in 1938 as the Federal Food, Drug, and Cosmetic Act (FD&C Act), which required that all new drugs be shown to be safe before being marketed in the US.¹

Prior to the 1960s, few regulatory standards were required for controlled clinical research in medical product development. In the 1960s, statistical inference began to be applied to clinical trials, as can be seen in National Institutes of Health – National Institute of Allergy and Infectious Diseases (NIH-NIAID) activities. In 1962, the Kefauver-Harris Amendments to the FD&C Act “revolutionized drug development.” The amendments included the requirement that drugs should demonstrate efficacy. They also required adequate and “well-controlled studies” that provided “substantial evidence” of efficacy for approval of new drugs.¹ A 1987 new drug application (NDA) rewrite replaced the earlier requirement to submit all case report forms (CRFs) with case report tabulations (CRTs). The CRT requirement mandated that all data from the earliest clinical pharmacology studies and all safety data from other studies be included in the submission. This NDA rewrite also gave sponsors permission to use contract research organizations (CROs) to collect clinical research data.

While the original intent for data was archival, the enterprise needed usable datasets, as well as experienced professional clinical data managers. In the 1970s, the Public Health Service (USPHS) recognized the need for a well-educated CDM workforce with a Request for Proposal (RFP) providing funding for graduate-level education for clinical data managers. A new discipline was born. Unfortunately, the recognized need for data managers grew more rapidly than anticipated. When the USPHS funding ended, the curriculum was not continued at the university level, despite the need for well-educated data managers.

As increasingly complex requirements for data managers were highlighted, the data management group separated from the Pharmaceutical Research and Manufacturers of America® (PhRMA). Subsequently, the group re-formed as the Society for Clinical Data Management (SCDM).

The mission of the SCDM, promoting clinical data management excellence, includes promotion of standards of good practice within clinical data management. In alignment with this part of the mission, the SCDM Board of Trustees established a committee in 1998 to determine standards for Good Clinical Data Management Practices (GCDMP).

The initial version of the GCDMP was published in September 2000 and continues to be updated.

In the 1990s, changes in CDM were guided by the focus of Janet Woodcock (Former Director, FDA 1994–2004) on certification of professionals involved in clinical trials research for medical products. This effort included significant support from the Council on Economic Growth (CEG), the FDA, and the SCDM industry/agency collaborations. These institutions were focused on helping to assure clinical trial data integrity and usability.

In 1997, a volunteer data standards development effort provided an infrastructure for industry, academic, and government organizations to collaborate on data standards. This volunteer group worked as a Drug Information Association Special Interest Area Community (DIA-SIAC) in 1998–1999 and became an independent organization in 2000: the Clinical Data Interchange Standards Consortium (CDISC).

CDISC is an open, multidisciplinary, non-profit organization committed to the development of worldwide industry standards. These standards support the electronic acquisition, exchange, submission and archiving of clinical trials data and metadata for medical and biopharmaceutical product development.

In 1997, the first Joint FDA/DIA Meeting was held, and the FDA specifically asked for industry input into CDISC data standards. The primary goal for these standards was to promote the development and implementation of data content standards throughout the medical product development enterprise. This would enable more accurate and efficient communication of research processes and results. Today, the CDISC data standards have been globally accepted. The work of SCDM (GCDMP) and CDISC provides the standards for clinical research data integrity. SCDM and CDISC serve as the foundation for integrating today’s technological changes in clinical research data. They also offer a path for the NIH’s Clinical and Translational Science Awards (CTSA) to provide innovative solutions in clinical research. Today, the data sharing initiatives of the National Academies of Sciences, Engineering, and Medicine (NASEM) depend on CDM findings to ensure clinical research data quality for all.

As noted in this brief history, the role of data management professionals in the science of clinical trials continues to be central to the integrity of the data relied upon as evidence for important public health decisions.

How CDM Work Varies

The most comprehensive definition of data management comes from the Data Management Association International (DAMA): “Data Management (DM) is the business function of planning for, controlling and delivering data and information assets.”² SCDM provides the structure for good data management practices. Since 1998, the standards have been reviewed and published in the Good Clinical Data Management Practices (GCDMP). The business of data management includes not only the development, execution, and supervision of plans, but also the policies, programs, projects, processes, practices, and procedures that are involved in managing the data. This requires a system for controlling, protecting, delivering, and enhancing the value of data and information assets. In the clinical research community, the responsibility for this system falls to CDM.

CDM varies across the healthcare research community from industry to academia to non-profits. Even the definition of a clinical data manager can change considerably depending on the setting. As the FDA reminds us, the reliability, quality, integrity, and traceability of the data are the responsibility of the investigators and data managers.³

In industry, CDM is a core function managing all steps from the case report form (CRF) design to the delivery of the clean database. Among the tasks are the programming of data listings, programming of edit checks, and coding of adverse events and medical history data. Coding requires both knowledge of and facility with the International Classification of Diseases, Clinical Modification (ICD-CM) and the Medical Dictionary for Regulatory Activities Terminology (MedDRA®, owned and copyrighted by the International Council for Harmonisation (ICH)).⁴
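To make the idea of an edit check concrete, the following is a minimal sketch in Python. The record layout, field names, and numeric limits are hypothetical and chosen only for illustration; they are not taken from any standard or from a specific EDC system, where such checks would normally be configured in the system itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class VitalsRecord:
    """Hypothetical CRF record; all field names are illustrative assumptions."""
    subject_id: str
    visit_date: Optional[date]
    consent_date: Optional[date]
    systolic_bp: Optional[int]   # mmHg
    diastolic_bp: Optional[int]  # mmHg


def run_edit_checks(rec: VitalsRecord) -> List[str]:
    """Return query messages for one record; an empty list means no check fired."""
    queries: List[str] = []

    # Missing-value check on a required field.
    if rec.visit_date is None:
        queries.append("Visit date is missing.")

    # Range check; the 60-250 mmHg limits are assumed for this sketch.
    if rec.systolic_bp is not None and not 60 <= rec.systolic_bp <= 250:
        queries.append("Systolic BP is outside the expected range.")

    # Cross-field consistency check.
    if (rec.systolic_bp is not None and rec.diastolic_bp is not None
            and rec.diastolic_bp >= rec.systolic_bp):
        queries.append("Diastolic BP is not lower than systolic BP.")

    # Date logic check: a visit should not precede informed consent.
    if (rec.visit_date is not None and rec.consent_date is not None
            and rec.visit_date < rec.consent_date):
        queries.append("Visit date precedes consent date.")

    return queries
```

Each message returned would typically become a query sent to the site for resolution; the checks themselves only flag discrepancies and never alter the data.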

Monitoring of clinical trial data is often a joint effort between CDM and Clinical Operations. The FDA and the European Medicines Agency (EMA) both offer guidance on the collection and monitoring of clinical research data.⁵,⁶ This guidance reflects the FDA and EMA joint work for the International Conference on Harmonisation (ICH) on the preparation of quality data in clinical research.

For academic medical centers in the US, CDM is the cornerstone for all research projects. The projects range from small exploratory analyses to large multicenter clinical trials. Such projects require the same expertise as that in industry. Depending on funding constraints, academics may not have available candidates who are well-acquainted with the professional demands of DM.⁴

Biomedical researchers in the academic clinical research community do find that data management is the key to their research success.⁷ The data managers/project managers provide the infrastructure to gather quality data for the project. These managers must overcome problems encountered in the development of data collection plans. Such problems often include assigning and managing tasks and responsibilities for the research project, as well as the development of the data collection instruments. Thus, they ensure quality biomedical research data that can be shared with the entire community.

Unfortunately, in many academic settings, good data management is largely undefined and is left as a decision for the data or repository owner. CDM processes and deliverables in academia need to be identified and implemented to realize the goal of providing high-quality data. In research settings, the FAIR guiding principles of making data “findable, accessible, interoperable, and reusable” are indispensable.⁸

Another source of clinical research and biomedical data, and accordingly, a need for CDM leadership, is in the real-world data (RWD) found in registries.⁹ Of primary concern when working with registries is the lack of knowledge and implementation of standards and guidelines in clinical practice.¹⁰ These activities impact the quality of the registry’s results. For clinical trials, registries have several shortcomings. The four most cited problems are that (1) registry patients are assigned to clinical trials without randomization, (2) registry patients are not followed up with in a standardized fashion, (3) registries are missing data, and (4) registry patient enrollment is less supervised when compared to a clinical trial.¹⁰ It is believed that including registries in CDM practice, providing standards for classification and coding, and having consistent definitions would offer a powerful set of quality data, as well as an opportunity for education for practitioners.

Traditionally, CDM ensures quality data for consumers.¹¹ In today’s world of automation and interoperability, it has become imperative that data consumers develop a greater understanding of the data, how it was collected or generated, and how it can be used. It is essential that the data not be solely defined by the researcher. Consumers need to know how to retrieve the data, how to assess its quality, and how to have confidence in the data—whether it comes from industry, academia, or non-profits.¹² The entire biomedical community must be a participant in CDM activities.

Role of CDM in Biomedical Research

The biomedical research enterprise is dependent on high-quality interoperable data to ensure advances in healthcare and medical product development.² CDM provides the tools to examine the observations that were made by applying reliable evaluation techniques to these observations.¹³ This will lead to a systematic and thorough analysis of the data. The entire lifecycle of data governance, from design of the databases to preparation and instruction of those who are capturing and reporting the data, must be established. Plans for transmittal of the data after cleaning, reviewing, and documenting the materials must be identified. Finally, proposals for the storage, archiving, and presenting of the data must be detailed. The quality of data collection and the elimination of “noise” and bias are essential elements of good CDM practice in biomedical research.¹⁴

In the 21st Century Cures Act, the FDA stressed that surveillance and testing capabilities must be improved in preparation for future pandemics. The FDA’s Sentinel Initiative cites the goal of interoperability of the Sentinel Common Data Model. The 2022 “FDA’s Budget: Advancing the Goal of Ending the Opioid Crisis” contains funding for digital health initiatives and digital health data.¹⁵ In all of these cases, a strong data management strategy is needed.

Biomedical research findings are dependent on data sharing and good data management as detailed in the many NIH and NASEM initiatives in response to COVID-19.¹⁶ Technology has driven fundamental changes in CDM and biomedical research beyond electronic data capture (EDC). Fit-for-purpose data strategies are being developed, which focus on what is needed, rather than capturing everything that is available.¹⁷ Artificial intelligence/machine learning (AI/ML) technologies are providing accelerated database releases, as well as automating significant portions of data validation and serious adverse event (SAE) reconciliation. These technologies enable rapid data transformation to submission-ready datasets and run predictive analytics to provide real-time insights and to manage risk.

Currently, there are many ongoing efforts at mapping and reconciliation of the terminologies that facilitate data interoperability.¹⁸ There are also a large number of inputs from disparate data sources, which must be mapped to one another. The data manager is at the center of the drive for data quality, tracking and overseeing all the inputs that are contributing to the exponential growth in biomedical research.

Recruitment, Training, and Education

CDM practitioners can be found along a spectrum, from very experienced to novice. These practitioners carry out data acquisition, processing, and validation. Many data managers are trained on the job through institutionally developed training, apprenticeships, or self-learning programs. Frequently in academia, each investigator must train his or her own data management staff. Such training is often highly focused on past practices and organizational procedures, rather than on underlying theories, principles, and methods that are based on evidence.⁴

The role of the data manager has been evolving from that of a data acquisition professional to a clinical data manager to a data scientist. The data manager has responsibility for including colleagues from other disciplines in their pursuit of high-quality data, and the data scientist needs new skills, such as programming, statistics, data visualization, and analytics.⁴ They need to understand concepts around AI and ML, deep learning, and big data to communicate with all stakeholders. For risk-based quality management (RBQM) and decentralized clinical trials (DCTs), an understanding of the concepts and tools associated with the use of eSource, data interoperability, and data integration strategies is essential.

One of the first steps in CDM education and training is how to collect and code the data. How do we improve the accuracy of the data abstracted from medical records?¹⁹ Often there is little appreciation for the noise and bias in the data.¹⁴ Training must address what data is required and what is not required. When data is not provided, the training needs to show how to specify the explanations for the missing data.
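One way to make explanations for missing data explicit is to require that every required field carry either a value or a coded reason. The sketch below illustrates that idea in Python; the reason codes and the required field names are assumptions made for this example, not a published code list.

```python
from enum import Enum
from typing import Dict, List, Optional


class MissingReason(Enum):
    """Illustrative reason codes; a real study would define its own list."""
    NOT_DONE = "Not done"
    NOT_APPLICABLE = "Not applicable"
    UNKNOWN = "Unknown"


# Hypothetical required fields for one form.
REQUIRED_FIELDS = ("weight_kg", "heart_rate")


def unexplained_missing(
    values: Dict[str, Optional[float]],
    reasons: Dict[str, MissingReason],
) -> List[str]:
    """List required fields that have neither a value nor a documented reason."""
    return [
        field
        for field in REQUIRED_FIELDS
        if values.get(field) is None and field not in reasons
    ]
```

A form for which this function returns an empty list has every required value either recorded or explained, which is the distinction the training needs to convey.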

Training for coding the data captured involves a sound understanding of the terminologies involved, namely Medical Dictionary for Regulatory Activities (MedDRA®) and World Health Organization – Drug Dictionary (WHO-DD). An understanding of the scope, hierarchy, coding convention, and best practices is required for successful performance of the coding role. The MedDRA® Points to Consider (PTC) documents provide a framework for industry to develop their own customized best practices documents.²⁰ It is important that all individuals utilizing the data understand the nuances of terminology structure, as well as its limitations.

Quality data are essential for all research projects. Yet training in achieving, assessing, and producing quality research data is rarely taught.¹⁸ Basic tools to plan, achieve, and control the quality of research data must be given to all informaticists and clinical research data managers. Descriptions of error rates and evaluation of data quality can vary between training, monitoring, and auditing phases of a study.²¹ Formal guidelines furnish all researchers with the tools for improving the efficiency of monitoring and the quality of the data captured from clinical studies.²²
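As an illustration of how an error rate might be described, the sketch below assumes one common convention: the number of errors found divided by the number of fields inspected, scaled to errors per 10,000 fields. The convention and the function name are assumptions for this example; a given study may define its error rate differently, which is part of why reported rates vary between study phases.

```python
def error_rate_per_10k(errors_found: int, fields_inspected: int) -> float:
    """Errors per 10,000 fields inspected (an assumed reporting convention)."""
    if fields_inspected <= 0:
        raise ValueError("fields_inspected must be positive")
    if errors_found < 0:
        raise ValueError("errors_found must not be negative")
    return 10_000 * errors_found / fields_inspected
```

For example, 12 errors found in an audit of 8,000 fields gives `error_rate_per_10k(12, 8000)`, or 15 errors per 10,000 fields. Stating both the numerator and the denominator alongside the rate lets readers compare figures from training, monitoring, and auditing on the same footing.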

The FDA and EMA offer guidance on RBQM training.²³ This training should include principles of clinical investigations and human subject protection. Study-specific training should cover the trial design, protocol requirements, study monitoring plan, and any applicable standard operating procedures and study-specific electronic systems.

All who are involved in collecting and managing data for drug approval need to understand that only quality data can support the inferences in the FDA submission.³,²¹ Quality data are the key input and directly affect the quality of data analysis, retrieval, and presentation. The data manager coordinates all aspects of data quality and depends on the support and training of all colleagues in the production of high-quality datasets.

Strengths, Challenges, and Regulatory Issues

This section focuses on the unique strengths, challenges, and regulatory issues for CDM in the US in ensuring data quality in clinical research.

A primary strength of CDM in the US is the invaluable leadership that the data managers/data scientists in this community, as well as the FDA, provide to ensure quality data in medical product development and clinical research. A second strength, as well as a challenge, has been in evidence throughout the COVID-19 pandemic. Without the support of CDM, surveillance, testing, and vaccine development could not have occurred in record time. A third strength is that CDM sits at the center of the hub, connecting all stakeholders with the materials and infrastructure to complete their projects, from the design of the clinical trial and research study, case report forms, and database structure to the collection, evaluation, and reporting of the data.

A primary challenge at this time is defining a path for data management to become a respected data science by enumerating the essential components and unique perspective for CDM practitioners. A second challenge is to include CDM during the design stage of studies, including evaluating data standards, data quality, and reproducibility of the research.²⁴,²⁵,²⁶ Data collectors at research institutions need to be included in training on data quality. As terminologies (such as ICD and MedDRA®) grow exponentially in term count and broaden in scope, it becomes challenging for data management personnel to keep up with and communicate the new standards.

The first regulatory issue concerns following FDA guidance with respect to Human Subject Protection-Bioresearch Monitoring (HSP-BIMO).³ The FDA provides an outline of how to plan for data quality beginning with the design process. The document lists suggested steps that would improve the quality of research data. A second regulatory issue is accessing and applying updated FDA standards for study data submission.²⁷ The responsibility for meeting all FDA data requirements falls to CDM. A third regulatory issue focuses on using RWD. The introduction of the 21st Century Cures Act in 2016 looked beyond the traditional data portals to patient-derived data and other RWD for medical products approval.⁹,²⁸ FDA officials have said that RWD can support products if evidentiary standards are not lost. CDM must look at not only the methodologies used to collect the data, but also the reliability of the underlying information.

In conclusion, CDM in the US has come a long way in the last quarter of a century. It is truly the center for all clinical research activities and for sharing of clinical research data. Without valid, reliable clinical data, the biomedical research enterprise would likely collapse. The new technologies and novel discoveries offer great hope for medicine in the 21st century, but quality data is the key to their success. CDM requires a competent, well-trained workforce, where all stakeholders contribute to the enterprise, sharing and communicating their findings.

It is essential that existing resources and organizations for CDM be optimized. Having collaborations and contributions from all stakeholders in formulating a cohesive program for future CDM professionals is the first step to ensuring excellence in this endeavor. Working with the CDM community to identify challenges and best solutions will be beneficial for all.

Acknowledgements

The authors wish to thank Bathsheba Malsheen, PhD, and the entire DIA CDM Community for discussions, suggestions, and reviews of this paper from early drafts to the final document.

Competing Interests

The authors have no competing interests to declare.

References

1. Junod SW. FDA and clinical drug trials: a short history. In Davies M, Kerimani F, eds. A quick guide to clinical trials. Washington: Bioplan, Inc.; 2008: 25–55.

2. Mosley M. The DAMA Guide to The Data Management Body of Knowledge: DAMA-DMBOK Guide. United States: Technics Publications; 2009.

3. FDA. FDA’s HSP/BIMO Initiative Accomplishments: Update June 2014 to Concept Paper: Quality in FDA-Regulated Clinical Research; 2014.

4. Zozus MN, Lazarov A, Smith LR, et al. Analysis of professional competencies for the clinical research data management profession: Implications for training and professional certification. Journal of the American Medical Informatics Association. 2017; 24(4): 737–745. DOI: http://doi.org/10.1093/jamia/ocw179

5. FDA. E6 (R2) Good Clinical Practice: Integrated Addendum to ICH E6 (R1). Guidance for Industry 2018.

6. ICH. Draft ICH E6 Guidance. EMA; 2021.

7. Read KB. Adapting data management education to support clinical research projects in an academic medical center. J Med Libr Assoc. 2019; 107(1): 89–97. DOI: http://doi.org/10.5195/JMLA.2019.580

8. Wilkinson MD, Dumontier M, Aalbersberg IJJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016; 3: 160018. DOI: http://doi.org/10.1038/sdata.2016.18

9. FDA. Framework for FDA’s Real-World Evidence Program. FDA Report. 2018. https://www.fda.gov/media/120060/download. Accessed August 31, 2021.

10. Pop B, Fetica B, Blaga ML, et al. The role of medical registries, potential applications and limitations. Med Pharm Rep. 2019; 92(1): 7–14. DOI: http://doi.org/10.15386/cjmed-1015

11. Wang RY, Strong DM. Beyond Accuracy: What Data Quality Means to Data Consumers. Journal of Management Information Systems. 1996; 12(4): 5–33. DOI: http://doi.org/10.1080/07421222.1996.11518099

12. Garrett L. COVID-19: The medium is the message. The Lancet. 2020; 395(10228): 942–943. DOI: http://doi.org/10.1016/S0140-6736(20)30600-0

13. Zozus M. The Data Book. 2017. DOI: http://doi.org/10.1201/9781315151694

14. Kahneman D, Sibony O, Sunstein CR. Noise: A flaw in human judgment. Little, Brown; 2021.

15. FDA. FDA’s Budget 2022: Advancing the Goal of Ending the Opioid Crisis. 2021. https://www.fda.gov/news-events/fda-voices/fdas-budget-advancing-goal-ending-opioid-crisis.

16. Martone M, Nakamura R. Workshop Highlights from the Co-chairs. Paper presented at: Changing the Culture of Data Management and Sharing: A Workshop; 2021. https://www.nationalacademies.org/event/04-29-2021/changing-the-culture-of-data-management-and-sharing-a-workshop.

17. Harrison T, Luna-Reyes LF, Pardo T, De Paula N, Najafabadi M, Palmer J. The data firehose and AI in government: Why data management is a key to value and ethics. Paper presented at: Proceedings of the 20th Annual International Conference on Digital Government Research; 2019. https://dl.acm.org/doi/10.1145/3325112.3325245. DOI: http://doi.org/10.1145/3325112.3325245

18. Nahm M. Data Quality in Clinical Research. In Richesson RL, Andrews JE, eds. Clinical Research Informatics. London: Springer London; 2012:175–201. DOI: http://doi.org/10.1007/978-1-84882-448-5_10

19. Zozus MN, Pieper C, Johnson CM, et al. Factors Affecting Accuracy of Data Abstracted from Medical Records. PLOS ONE. 2015; 10(10): e0138649. DOI: http://doi.org/10.1371/journal.pone.0138649

20. MSSO. MedDRA® Term Selection: Points to Consider – Rev. 4.1. 2021. https://www.meddra.org/how-to-use/support-documentation/english. Accessed July 15, 2021.

21. Estabrook RW, Woodcock J, Nolan VP, Davis JR. Assuring data quality and validity in clinical trials for regulatory decision making: workshop report (1999). NAS Report. 1999. http://nap.edu/9623.

22. Houston L, Yu P, Martin A, Probst Y. Heterogeneity in clinical research data quality monitoring: A national survey. Journal of Biomedical Informatics. 2020; 108. DOI: http://doi.org/10.1016/j.jbi.2020.103491

23. FDA. Oversight of Clinical Investigations — A Risk-Based Approach to Monitoring; 2013.

24. Griffin PC, Khadake J, LeMay KS, et al. Best practice data life cycle approaches for the life sciences. F1000Research. 2017; 6. DOI: http://doi.org/10.12688/f1000research.12344.1

25. Zozus MN, Sanns W, Eisenstein E, Sanns B. Beyond EDC. Journal of the Society for Clinical Data Management. 2021; 1(1). DOI: http://doi.org/10.47912/jscdm.33

26. Simms S, Strong M, Jones S, Ribeiro M. The future of data management planning: Tools, policies, and players. IJDC. 2016; 11(1). DOI: http://doi.org/10.2218/ijdc.v11i1.413

27. FDA. Study Data for Submission to CDER and CBER. 2021. https://www.fda.gov/industry/study-data-standards-resources/study-data-submission-cder-and-cber. Accessed June 25, 2021.

28. FDA. 21st Century Cures Act. 2016. https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act. Accessed June 25, 2021.