Introduction
Clinical data standards and data anonymization constitute two critical components of clinical research studies that ensure the quality, integrity, and privacy of data collection. Whereas clinical data standards refer to the uniformity and consistency of the data collected in clinical trials,1 data anonymization generally refers to the process used to protect the privacy of participants.2
When implementing clinical data standards and data anonymization in academic studies, researchers should consider several factors, such as the types of data collected, study designs used, data management systems, data sharing policies of sponsoring and host institutions, as well as all regulatory requirements. To this end, all institutions should have a clearly articulated data sharing policy that outlines the conditions under which their data can be shared and the procedures for accessing the data by others.
Background
In Clinical Data Management (CDM), combining strict standards with advanced anonymization is key for the protection of sensitive information, particularly in academic biomedical research. This setting requires balancing data standards, privacy regulations, and innovative research methods. Furthermore, academic research that spans multiple disciplines depends heavily on these standards for effective information exchange, collaboration, and reproducibility of results. Adhering to standards, such as the Research Data Alliance’s (RDA) outputs and FAIR (Findable, Accessible, Interoperable, and Reusable) principles ensures a unified data framework that facilitates interdisciplinary research and knowledge integration.4
Standardized formats and protocols in academic research, such as the Clinical Data Interchange Standards Consortium (CDISC) in clinical research,3 are crucial for harmonizing datasets and for enabling collaboration.4 Existing literature indicates variability in the adoption of data standards across institutions globally, with academic settings often lagging in the effective implementation and training required for consistent use.5 Despite the well-documented benefits, such as allowing researchers to focus more on scientific discovery than on data preparation and processing, challenges such as resource limitations and training gaps have hindered the consistent application of these standards.6
In health and medical research, especially as practiced in the U.S. or when conducting research under Food and Drug Administration (FDA) or National Institutes of Health (NIH) regulations, compliance with the Health Insurance Portability and Accountability Act (HIPAA) is crucial. HIPAA sets national standards for protecting health information, with strict controls on its use and disclosure.7 This applies to any U.S.-involved research or FDA-regulated studies. Within the European Union, General Data Protection Regulation’s (GDPR) comprehensive data protection standards significantly affect how academic institutions handle personal data. Adhering to data standards is vital for a structured approach to data management, ensuring transparency and compliance with GDPR, especially in international collaborations.8
Alongside data standards, advanced anonymization techniques, such as k-anonymity and differential privacy, are essential when conducting research with human subjects, particularly with respect to protecting individual privacy.9,10 Yet balancing data utility with privacy protection is a key challenge in the research process, requiring researchers to be adept in these methods to meet all ethical and legal standards. Although previous studies, such as Chevrier et al (2019),11 explored the use of anonymization techniques, there is still limited evidence that specifically addresses the systematic training and comprehensive application of anonymization standards within academic settings, especially across diverse institutional types. Given the importance and the international, multidisciplinary nature of clinical research today, the purpose of this study was to obtain a snapshot of the practice patterns of Clinical Data Management units at academic institutions, specifically as those patterns relate to clinical data standards and data anonymization.
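To make the two techniques named above more concrete, the following minimal Python sketch shows one way each idea can be expressed. It is an illustration only and not a method used in this study: the records, the field names (age_band, zip3, diagnosis), and the function names are hypothetical. The first function reports the size of the smallest group of records that share the same quasi-identifier values, which is the k for which a table is k-anonymous. The second adds Laplace noise to a count, a common building block of differential privacy, here sampled as the difference of two exponential draws.

```python
import random
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    quasi-identifier values; the table is k-anonymous for that k."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


def laplace_count(true_count, epsilon, rng, sensitivity=1.0):
    """Return a count perturbed with Laplace noise of scale sensitivity/epsilon.

    The difference of two independent exponential draws with the same rate
    follows a Laplace distribution centred on zero.
    """
    rate = epsilon / sensitivity
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


# Hypothetical, already-generalized records (age bands, truncated postal codes).
records = [
    {"age_band": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "30-39", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "40-49", "zip3": "100", "diagnosis": "asthma"},
    {"age_band": "40-49", "zip3": "100", "diagnosis": "hypertension"},
    {"age_band": "40-49", "zip3": "100", "diagnosis": "asthma"},
]

# Smallest group over (age_band, zip3) has two records, so k = 2.
print(k_anonymity(records, ["age_band", "zip3"]))

# A noisy version of "how many records mention asthma" (true count is 3).
rng = random.Random(0)
asthma_count = sum(r["diagnosis"] == "asthma" for r in records)
print(laplace_count(asthma_count, epsilon=1.0, rng=rng))
```

A smaller epsilon gives stronger privacy but noisier counts, which is the utility and privacy trade-off referred to above.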
Methods
The Academic Relations Committee of the Society for Clinical Data Management (SCDM) developed a comprehensive questionnaire to survey practice patterns regarding the instruction and use of clinical data standards and data anonymization procedures in day-to-day work. Established in 1994, SCDM is a non-profit organization that is committed to advancing excellence in Clinical Data Management through thought leadership, education, and advocacy. With more than 3,100 members globally, SCDM’s membership spans a wide range of clinical research institutions, including pharmaceutical companies, medical device companies, clinical research organizations, academic research organizations, universities, and university hospitals, making it a leading industry authority.
The survey questionnaire, written in English, was organized into three distinct sections: institutional information (four questions), data standards (eight questions), and data anonymization (two questions). The survey portal was accessible from June 24 to September 30, 2023, and the survey was designed to be completed in approximately five minutes. Completing the questionnaire was considered consent to participate in the survey. A schematic of the survey questions can be found in Figure 1.
The questionnaire was developed with the SurveyMonkey web platform (https://www.surveymonkey.com/) and distributed through the SCDM Data Connections e-Newsletter and a social media campaign that was amplified through the personal social media accounts of the SCDM Academic Relations Committee members. Although specific job titles or roles of respondents were not collected, the distribution strategy aimed to reach a representative sample of professionals who were actively engaged in clinical data management. Given SCDM’s established outreach channels and member base (which includes a diverse array of clinical research professionals from Contract Research Organizations (CROs), universities, pharmaceutical companies, and other clinical research bodies), the respondents were assumed to be reasonably positioned within their institutions to report accurately on institutional practices. Inconsistent responses were excluded from the analysis (n = 10; eg, a respondent who declared that no data standard was used and then specified CDISC as the data standard). Frequencies were calculated for categorical variables and percentages were rounded to the nearest whole percentage point.
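The consistency check and the frequency calculation described above can be sketched as follows. This is a hedged illustration, not the analysis code used for the survey: the response records, the field names (institution, uses_standard, standard_name), and the helper functions are hypothetical. Percentages are rounded half up to the nearest whole point.

```python
import math
from collections import Counter


def is_consistent(response):
    """A respondent who reports using no data standard should not also
    name a specific standard."""
    return response["uses_standard"] or response["standard_name"] is None


def frequency_table(values):
    """Return {category: (count, percentage rounded to the nearest whole point)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (n, math.floor(100 * n / total + 0.5))
        for category, n in counts.items()
    }


# Hypothetical responses; the last one contradicts itself and is excluded.
responses = [
    {"institution": "University", "uses_standard": True, "standard_name": "CDISC"},
    {"institution": "University", "uses_standard": True, "standard_name": "CDISC"},
    {"institution": "University hospital", "uses_standard": False, "standard_name": None},
    {"institution": "Contract research organization", "uses_standard": True, "standard_name": "CDISC"},
    {"institution": "University hospital", "uses_standard": False, "standard_name": "CDISC"},
]

kept = [r for r in responses if is_consistent(r)]
table = frequency_table(r["institution"] for r in kept)
for category, (n, pct) in table.items():
    print(f"{category}: n = {n} ({pct}%)")
```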
Limitations: while the survey targeted data management professionals, the lack of specific data on job titles may limit detailed insights into the respondents’ precise roles or institutional knowledge. This limitation should be considered when interpreting the study findings.
Results
Results are presented by question and theme, beginning with an overview of the countries represented, followed by the types of institutions surveyed, and ending with the use of, and education in, data standards and data anonymization, in the order of the survey questions.
A total of 51 completed questionnaires were collected. The questions that covered data standards were answered by 98% (n = 50) of all participants while the questions that solicited information on data anonymization were answered by 67% (n = 34) of all participants. The percentage of questions answered across respondents ranged from 67 to 100% (see Table 1).
Table 1. Countries represented by survey respondents.
Country | Number of Responses (% of Sample) | 1 | 2 | 3 | 4 | 5 | 6 |
United States of America | 20 (40) | 3 | 1 | 2 | 0 | 11 | 3 |
Italy | 7 (14) | 0 | 0 | 0 | 3 | 1 | 3 |
Japan | 5 (10) | 2 | 0 | 0 | 0 | 1 | 2 |
India | 4 (8) | 1 | 2 | 0 | 0 | 1 | 0 |
Afghanistan | 2 (4) | 1 | 0 | 0 | 0 | 1 | 0 |
France | 2 (4) | 1 | 0 | 0 | 1 | 0 | 0 |
Mexico | 2 (4) | 0 | 1 | 0 | 0 | 1 | 0 |
Portugal | 2 (4) | 0 | 0 | 0 | 2 | 0 | 0 |
Australia | 1 (2) | 1 | 0 | 0 | 0 | 0 | 0 |
Canada | 1 (2) | 0 | 0 | 0 | 1 | 0 | 0 |
China | 1 (2) | 0 | 1 | 0 | 0 | 0 | 0 |
Egypt | 1 (2) | 0 | 1 | 0 | 0 | 0 | 0 |
Philippines | 1 (2) | 0 | 1 | 0 | 0 | 0 | 0 |
Sweden | 1 (2) | 0 | 0 | 0 | 0 | 1 | 0 |
Total | 50 (100) | 9 | 7 | 2 | 7 | 17 | 8 |
Columns 1 to 6 denote institution type: 1. Academic Research Organization (ARO); 2. Contract Research Organization (CRO); 3. Non-university hospital; 4. Other clinical research organization; 5. University; 6. University hospital.
One questionnaire was excluded from the analysis for incomplete responses. For this reason, 50 out of 51 questionnaires were used for metrics calculations.
Countries
Completed surveys were collected from respondents representing 14 different countries (see Table 1, Figure 2).
Institutional Type
Responses were received from many different types of institutions, most notably universities (n = 17, 34%), academic research organizations (n = 9, 18%), and university hospitals (n = 8, 16%). Other institutions were also represented, including contract research organizations (n = 7, 14%), other clinical research organizations (n = 7, 14%), and non-university hospitals (n = 2, 4%) (see last row of Table 1).
Use of Data Standards
The adoption of data standards varied greatly across institutional types and countries. Of the institutions surveyed, 66% employed data standards (n = 33), while the remaining 34% did not (n = 17). Breaking down the 33 institutions, we found that 100% of contract research organizations (CROs, 7/7) had adopted data standards, followed by 82.4% of universities (14/17) and 66.7% of the academic research organizations (AROs, 6/9), whereas only 12.5% of the university hospitals (1/8) had done so. Due to the low number of replies collected for all the other institutional types (n < 5), it would be difficult to draw inferences (see Figure 3).
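The adoption shares above follow directly from the reported counts. As a quick arithmetic check, a minimal Python sketch (the dictionary and the function name are illustrative, and the counts are those given in the paragraph above) recomputes each share to one decimal place:

```python
def share(adopters, total):
    """Percentage of institutions using data standards, to one decimal place."""
    return round(100 * adopters / total, 1)


# (institutions using data standards, institutions responding), as reported above.
adoption_counts = {
    "All institutions": (33, 50),
    "Contract research organizations": (7, 7),
    "Universities": (14, 17),
    "Academic research organizations": (6, 9),
    "University hospitals": (1, 8),
}

for institution_type, (adopters, total) in adoption_counts.items():
    print(f"{institution_type}: {adopters}/{total} = {share(adopters, total)}%")
```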
Geographically, the USA is at the forefront, with 80% (16/20) of the institutions employing data standards. Notably, the respondents from institutions in Australia, Canada, and Italy reported no usage of data standards. Due to low response counts from the remaining countries (n < 5), drawing formal interpretations would be tenuous at best. These data highlight substantial variability in the adoption of data standards across institutional types and countries, with usage concentrated in American institutions (see Table 2).
Table 2. Use of data standards by country.
Country | Yes (%) | No (%) |
United States of America | 16 (80%) | 4 (20%) |
Japan | 4 (80%) | 1 (20%) |
India | 3 (75%) | 1 (25%) |
France | 2 (100%) | 0 (0%) |
Portugal | 2 (100%) | 0 (0%) |
Afghanistan | 1 (50%) | 1 (50%) |
China | 1 (100%) | 0 (0%) |
Egypt | 1 (100%) | 0 (0%) |
Mexico | 1 (50%) | 1 (50%) |
Philippines | 1 (100%) | 0 (0%) |
Sweden | 1 (100%) | 0 (0%) |
Australia | 0 (0%) | 1 (100%) |
Canada | 0 (0%) | 1 (100%) |
Italy | 0 (0%) | 7 (100%) |
CDISC and HL7 FHIR
Among respondents using data standards, the Clinical Data Interchange Standards Consortium (CDISC) standard was the most frequently reported, adopted by 84.8% (28/33). Health Level 7 Fast Healthcare Interoperability Resources (HL7 FHIR) was the only other data standard cited, used by one university. CDISC was adopted by 100% of the CROs (7/7) and 85.7% of the universities (12/14). Due to the low number of responses for all other categories (n < 5), further interpretation would be difficult (see Figure 4).
Internal or External data standard service
Respondents that used data standards were asked whether they relied predominantly on internal or external coding services. The majority, 69.7% (23/33), reported using internal coding services, while 18.2% (6/33) reported using external services. Notably, 85.7% of the universities (12/14) relied on internal services. As with the previous questions, low response counts in the other categories (n < 5) preclude additional interpretation (see Table 3).
Table 3. Use of internal or external coding service by kind of institution.
Kind of institution | Internal n (%) | External n (%) | Not available n (%) |
University | 12 (85.7%) | 1 (7.1%) | 1 (7.1%) |
Contract research organization | 4 (57.1%) | 3 (42.9%) | 0 (0%) |
Academic research organization | 3 (50%) | 1 (16.7%) | 2 (33.3%) |
Other clinical research organization | 2 (66.7%) | 0 (0%) | 1 (33.3%) |
University hospital | 1 (100%) | 0 (0%) | 0 (0%) |
Non-university hospital | 1 (50%) | 1 (50%) | 0 (0%) |
Free or on-demand internal data standard coding service
We also explored whether the institutions that relied on internal coding services used free or on-demand services. The majority, 60.9% (14/23), reported using on-demand services, whereas 34.8% (8/23) used free services. Half of the universities (6/12) relied on on-demand services. Because responses in all other categories were too few (n < 5), no meaningful insights can be derived (see Table 4).
Table 4. Use of free or on-demand coding service by kind of institution.
Kind of institution | Free n (%) | On-demand n (%) | Not available n (%) |
University | 5 (41.7%) | 6 (50%) | 1 (8.3%) |
Contract research organization | 1 (25%) | 3 (75%) | 0 (0%) |
Academic research organization | 1 (33.3%) | 2 (66.7%) | 0 (0%) |
Other clinical research organization | 0 (0%) | 2 (100%) | 0 (0%) |
University hospital | 0 (0%) | 1 (100%) | 0 (0%) |
Non-university hospital | 1 (100%) | 0 (0%) | 0 (0%) |
Use of naming convention
Even if an institution does not use data standards, naming conventions are common in both administrative and clinical practice. To describe and quantify this practice, we asked participants that did not use data standards whether they followed a naming convention. Most, 58.8% (10/17), reported not using a naming convention, whereas 23.5% (4/17) reported using one. At university hospitals, 71.4% of respondents (5/7) indicated that they do not use a naming convention, the highest number of negative responses to this question. As with the previous questions, the number of responses in all other categories was below 5, so no insights can be drawn from them (see Table 5).
Table 5. Data Standards and Data Anonymization Education by Institutional Type.
Institution Type | Yes (%) | No (%) | Not Applicable (%) |
Use of Naming Convention | |||
Academic Research Organization | 1 (33.3) | 1 (33.3) | 1 (33.3) |
Contract Research Organization | |||
Non-University Hospital | |||
Other Clinical Research Organization | | 3 (75.0) | 1 (25.0) |
University | 1 (33.3) | 1 (33.3) | 1 (33.3) |
University Hospital | 2 (28.6) | 5 (71.4) | |
Perceived Value of Data Standards | |||
Academic Research Organization | 2 (66.7) | 1 (33.3) | |
Contract Research Organization | |||
Non-University Hospital | |||
Other Clinical Research Organization | 3 (75.0) | 1 (25.0) | |
University | 2 (66.7) | 1 (33.3) | |
University Hospital | 6 (85.7) | 1 (14.3) | |
Data Standards Education | |||
Academic Research Organization | 1 (11.1) | 3 (33.3) | 5 (55.6) |
Contract Research Organization | 2 (28.6) | 3 (42.9) | 2 (28.6) |
Non-University Hospital | 1 (50.0) | 1 (50.0) | |
Other Clinical Research Organization | 2 (28.6) | 2 (28.6) | 3 (42.9) |
University | 3 (17.6) | 8 (47.1) | 6 (35.3) |
University Hospital | 2 (25.0) | 5 (62.5) | 1 (12.5) |
Data Anonymization Education | |||
Academic Research Organization | | 4 (44.4) | 5 (55.6) |
Contract Research Organization | 2 (28.6) | 3 (42.9) | 2 (28.6) |
Non-University Hospital | 1 (50.0) | 1 (50.0) | |
Other Clinical Research Organization | | 4 (57.1) | 3 (42.9) |
University | 4 (23.5) | 8 (47.1) | 5 (29.4) |
University Hospital | 3 (37.5) | 4 (50.0) | 1 (12.5) |
Note. There was a total of 51 survey responses across all types of institutions; however, responses to specific items were computed by item type among those responding to each item.
Perceived value of data standards
We also explored whether institutions that do not use a data standard perceive data standards as an added value. Most, 76.5% (13/17), reported that using a data standard would be an added value, and only 5.9% (1/17) held a different opinion. Among university hospitals, 85.7% (6/7) considered data standards an added value. Due to the insufficient number of responses (n < 5) for all other categories, it is impractical to draw any conclusions (see Table 5).
Data standards education
Analyzing the level of data standards education across institutions, we found that 44% (22/50) did not have any training program, including 47.1% of universities (8/17) and 62.5% of university hospitals (5/8). Conversely, 22% (11/50) of the institutions offered a training program. This highlights a varied approach to data standards education across different types of institutions (see Table 5).
Data anonymization education
Finally, we explored the level of education around data anonymization and found that 48% (24/50) of institutions do not implement any form of data anonymization training program, including 47.1% of universities (8/17). On the other hand, 20% (10/50) do have some form of training program around data anonymization. This highlights a varied approach to data anonymization education across different types of institutions. Due to the low number of replies collected for most of the institutions (n < 5), no conclusions can be drawn. However, our limited data suggest that academic institutions involved in clinical research either are not prioritizing the education of data anonymization or lack the necessary resources to deliver this education (see Table 5).
Discussion
This study unveils a critical issue in the realm of clinical research within academic institutions: the substantial variability in the adoption and application of clinical data standards and anonymization techniques. Evidence of data standard adoption variability has been reported in the literature,5,6 but with no clear description of the extent of the phenomenon. Our survey, which encompassed a diverse range of 50 institutions, points to a gap between adoption and training. While a majority (66%) have embraced data standards, a strikingly lower percentage offer training in these crucial areas (only 22% in data standards and 20% in anonymization). This gap is not just a matter of policy but reflects a deeper challenge in aligning educational initiatives with the evolving needs of data-driven research.
The findings of this survey reveal a significant disconnection between the nominal adoption of clinical data standards and their practical implementation within academic institutions, as highlighted by the discrepancies in training provision for data standards and anonymization. The survey’s lack of job role data limits insight into respondents’ familiarity with institutional practices; however, its distribution through SCDM’s professional channels likely ensured that respondents held relevant knowledge in data management practices and data standards within their organization.
Although many institutions have nominally adopted data standards, their actual implementation, especially in terms of training, remains substantially lower. Embi et al. (2015) highlighted that resource allocation is a critical factor, with institutions often facing financial and human resource constraints that impede the establishment of effective training programs.10 Moreover, the organizational structure and culture within an institution can dramatically affect how consistently and pervasively standardization methods are adopted and implemented. In universities, in contrast to industry, units within the same institution may operate completely independently of one another, resulting in units that have vastly different approaches to adoption and adherence. At many academic clinical research sites, the departments are so distinct that their philosophies and guiding principles lead to significantly different approaches to implementation, such as those seen between Biostatistics and Pharmacy or Radiology and Nursing. These differences can significantly affect the willingness to engage in the comprehensive training of research staff.
The term ‘standards’ itself appears to be interpreted variably across institutions, impacting the consistency of its application. Kho et al. (2015) discussed the importance of clear and universally accepted definitions for data standards, which are essential for the consistent application of these practices across diverse educational and professional backgrounds.12 Without such clarity, institutions may adopt superficial or incomplete implementations of data standards.
Impact of external factors
External factors also play a crucial role in the adoption and robustness of data standards. Harris et al. (2016) highlighted how regulatory pressures can compel institutions to align their data management practices more closely with international standards.13 Additionally, the need for compliance in international collaborations can push institutions towards adopting more stringent data practices to align with global partners.
Strategic recommendations
To bridge these gaps, institutions should consider the following strategies:
Effective communication about the benefits of data standards can help to shift institutional culture towards a more positive valuation of these practices. Institutions need to develop clear, jargon-free communication strategies that outline these benefits.
Training programs should be tailored to the needs and backgrounds of institutional members to ensure effective learning and application, as supported by the work of Harris et al. (2016), who emphasized the importance of collaborative networks in improving the adoption of clinical data standards.13
Introducing incentives for departments or individuals that adhere to data standards can link these practices to tangible benefits, such as grant eligibility or enhanced publication opportunities.
Despite the many strengths of this study, there are also several limitations that must be recognized. First, the survey’s limited sample size is unlikely to fully represent the diversity of academic institutions globally. A sizeable percentage of the 50 respondents were from one country, the United States, which means that findings may be skewed toward those practices found in the United States. Participating organizations outside the United States may have different practice patterns (response bias). Second, the survey’s design may influence the interpretation of findings, most notably the way questions were answered. Responses to questions may vary based on national or cultural norms in each country. Third, given the rapid evolution in data management practices globally, the findings of this study may quickly become outdated. Finally, the study relies on quantitative data only and therefore misses out on qualitative insights and interpretations that could be helpful in creating a fuller interpretation of these data. These limitations suggest that while the study provides valuable initial insights, a more comprehensive approach to understanding the practices in clinical data standards and data anonymization in academic settings is needed.
Conclusions
This study uncovered significant disparities in the adoption and training of clinical data standards and anonymization across academic institutions globally. With a substantial majority of respondents recognizing the value of data standards yet lacking effective implementation, the need for standardized training and globalized data management protocols is evident. These findings call for an integrative approach, one that incorporates structured training into academic curricula and fosters collaborative standard-setting efforts. Addressing this divide is crucial, not only for data privacy and integrity, but also for enhancing the overall quality and reliability of clinical research as well as future usability of the data collected, thereby contributing positively to the field of health care.
Acknowledgments
We thank the Society for Clinical Data Management for the provision of the web platform that hosted the survey.
Competing Interests
The authors have no competing interests to declare.
References
1. Aspden P, Corrigan JM, Wolcott J, et al., editors. Patient safety: Achieving a new standard for care. National Academies Press (US). 2004; 4: 127–168. https://www.ncbi.nlm.nih.gov/books/NBK216088/ Accessed July 10 2024
2. Chevrier R, Foufi V, Gaudet-Blavignac C, Robert A, Lovis C. Use and understanding of anonymization and de-identification in the biomedical literature: scoping review. J Med Internet Res. 2019; 21(5): e13484. DOI: http://doi.org/10.2196/13484 Accessed July 10 2024
3. Clinical Data Interchange Standards Consortium (CDISC). Clinical Data Acquisition Standards Harmonization implementation guide (Version 2.1). CDISC; 2018. https://www.cdisc.org/standards/foundational/cdash Accessed July 10 2024
4. U.S. Department of Health and Human Services. Summary of the HIPAA Privacy Rule; 2022. https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html Accessed July 10 2024
5. Hufstedler H, Roell Y, Peña A, et al. Navigating data standards in public health: A brief report from a data-standards meeting. J Glob Health. 2024; 14: 03024. DOI: http://doi.org/10.7189/jogh.14.03024
6. Hudson LD, Kush RD, Almario EN, et al. Global standards to expedite learning from medical research data. Clin Transl Sci. 2018; 11(4): 342–344. DOI: http://doi.org/10.1111/cts.12556
7. European Parliament. Regulation (EU) 2016/679 of the European parliament and of the council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation); 2016. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32016R0679 Accessed July 10 2024
8. Sweeney L. k-Anonymity: A model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst. 2002; 10(5): 557–570. DOI: http://doi.org/10.1142/S0218488502001648 Accessed July 10 2024
9. International Organization for Standardization (ISO). Privacy enhancing data de-identification terminology and classification of techniques. ISO/IEC 20889:2018; 2018. https://www.iso.org/obp/ui/#iso:std:iso-iec:20889:ed-1:v1:en Accessed July 10 2024
10. Embi PJ, Yackel TR, Logan JR, Bowen JL, Cooney TG, Gorman PN. Impacts of computerized physician documentation in a teaching hospital: Perceptions of faculty and resident physicians. J Am Med Inf Association. 2015; 21(e1): 300–309. DOI: http://doi.org/10.1197/jamia.M1525 Accessed July 10 2024
11. Chevrier R, Foufi V, Gaudet-Blavignac C, Robert A, Lovis C. Use and understanding of anonymization and de-identification in the biomedical literature: Scoping review. J Med Internet Res. 2019; 21(5): e13484. DOI: http://doi.org/10.2196/13484 Accessed July 10 2024
12. Kho ME, Duffett M, Willison DJ, Cook DJ, Brouwers MC. Written informed consent and selection bias in observational studies using medical records: Systematic review. BMJ. 2015; 350: h1485. DOI: http://doi.org/10.1136/bmj.b866 Accessed July 10 2024
13. Harris PA, Taylor R, Minor BL, et al. The REDCap consortium: Building an international community of software platform partners. J Bio Inf. 2016; 95: 103208. DOI: http://doi.org/10.1016/j.jbi.2019.103208 Accessed July 10 2024