Original Research

Implementation of Clinical Data Interchange Standard Consortium (CDISC) standards to Real-World Data: Challenges and Strategies in SDTM Development in the Setting of Observational Studies

Authors: Sara Rizzoli, Alessandra Ori, Alessandra Mignani, Fabio Ferri, Lucia Simoni


Abstract

Introduction. In clinical trials, the Clinical Data Interchange Standards Consortium (CDISC) standards are increasingly mandated by national and regional regulatory agencies. For several years, the modeling of observational studies (OSs) was not explicitly addressed in the Study Data Tabulation Model (SDTM) Implementation Guide (IG). However, in 2024, the document “Considerations for SDTM Implementation in Observational Studies and Real-World Data v1.0 (Final)” was released.

Aim. This paper aims to describe the challenges encountered when applying the CDISC SDTM standard to map data from real-world studies before the release of the CDISC document dedicated to OSs. 

Methods. The SDTM mapping process began with an exploration of the rationale for using SDTM datasets and continued through programming and validation.

Results. Data from three OSs conducted by IQVIA Solutions Italy between 2020 and 2024, which enrolled 1543 patients affected by asthma, diabetes or a rare disease, were analyzed. A total of 86 SDTM domains were created, 95% of which were conventional SDTM domains according to the CDISC guidelines. The main validation issues arose because the Exposure dataset was missing, because the variable EPOCH was not found, or because conformance rules were violated (e.g. “Value not found in codelist”); these rules, designed for clinical trials, had not yet been adapted to OSs. These issues were managed on a case-by-case basis through changes to the SDTM domains or by providing documented justification.

Conclusions. The challenges experienced until 2024 can now be addressed thanks to the document released for OSs. However, implementing SDTM for OSs still requires ad-hoc solutions.

Keywords: SDTM, Observational Studies, Real-world data, conformance rules, validation, Pinnacle 21

How to Cite:

RIZZOLI, S., Ori, A., Mignani, A., Ferri, F. & Simoni, L., (2026) “Implementation of Clinical Data Interchange Standard Consortium (CDISC) standards to Real-World Data: Challenges and Strategies in SDTM Development in the Setting of Observational Studies”, Journal of the Society for Clinical Data Management 6(1). doi: https://doi.org/10.47912/jscdm.446


Published on
25 Feb 2026
Peer Reviewed

Introduction

In clinical trials, which are undertaken with the intent of submitting a new medical product or intervention to regulatory authorities for marketing authorization, a set of global data standards has been adopted and is required by an increasing number of national and regional regulatory agencies. These standards were developed by the Clinical Data Interchange Standards Consortium (CDISC), a global nonprofit organization that was launched in 1997 to generate open-access, platform-agnostic data standards for clinical research and its link to health care.

CDISC standards include standards for the planning and design of research protocols (Protocol Representation Model [PRM]); for the collection of case report form data (Clinical Data Acquisition Standards Harmonization [CDASH]); for the aggregation and tabulation of clinical data (Study Data Tabulation Model [SDTM]); and for statistical analysis (Analysis Data Model [ADaM]). Currently, CDISC standards are required for electronic submissions of study data to the United States (US) Food and Drug Administration (FDA) and the Japanese Pharmaceuticals and Medical Devices Agency (PMDA)1,2 and they are the preferred standards for electronic data submission to the China National Medical Products Administration (NMPA).3 However, local authorities in countries other than the US, China and Japan can require electronic study data even if the submission of an e-data package is not mentioned in the local regulatory guidelines. For instance, the European Medicines Agency (EMA) does not currently require an e-data package in the submission, but it is conducting a pilot to establish the value of receiving individual patient data.4 CDISC standards were initially created for the regulatory submission of clinical trial data to support the approval of medical products; however, both the SDTM and ADaM structures are also valuable tools for data review and pooled analyses. The increased visibility of CDISC standards has highlighted their value in other areas of medical research, with recent publications of therapeutic area user guides (TAUGs) and the Observational Study Guide for observational studies (OSs).5

OSs differ significantly from randomized controlled trials in terms of study goals, design, subject populations, clinical settings, regulatory and study oversight requirements, and data collection and management practices. In particular, in OSs the exposure to a determinant is passive: the researcher only observes it. For instance, the exposure could be a treatment administered as per routine clinical practice, or a risk factor not related to a treatment or a disease (e.g. in studies on the natural history of a disease). In clinical trials the exposure is active, meaning that the study protocol specifies how the patient should be exposed to a treatment or a risk factor, and randomization may be applied to intervention arms. In addition, OSs do not always have a comparative aim; they might therefore not have a comparison cohort. OSs are often not focused on a specific treatment; rather, they may be aimed at studying a disease or a specific population without any attention given to treatment. As a result, the treatments in OSs could be absent (e.g. when the determinant is the disease), single or multiple (e.g. when the exposure is one specific active principle or a class of drugs, respectively). These differences pose challenges that hinder the adoption of CDISC standards in observational research.5

Our team plans and conducts OSs that include primary data collection, and applies CDISC standards throughout the study lifecycle. Challenges encountered in CDISC implementation include gaps in biomedical concepts, which make it difficult to map items collected in case report forms to standard SDTM domains, and violations of validation conformance rules. After several years in which the modelling of OSs was not explicitly discussed in the SDTM Implementation Guide (IG), on February 28, 2024, an Observational Study Guide named “Considerations for SDTM Implementation in Observational Studies and Real-World Data v1.0 (Final)” was published on the CDISC website.6 This guide offers guidance and implementation strategies to address the most commonly encountered issues when performing SDTM mapping for OSs and real-world data (RWD).

Objective

This paper aims to answer the question: Does our experience in mapping OS data to CDISC SDTM align with the CDISC Observational Study Guide?

Methods

The workflow of the SDTM mapping process is depicted in Figure 1. As a starting point, we investigated why the pharmaceutical company requested the application of the CDISC standards to data from OSs. The reasons can include the need to generate ready-to-use databases from OSs for regulatory purposes, to perform data review, or to conduct pooled analyses. The sources to be taken as reference during SDTM mapping, such as the SDTM model7 and IG,8 TAUGs9,10,11 and the Sponsor’s specific technical requirements, if any, were then clearly defined, and the need for adaptations of CDISC standards due to the observational nature of the study was discussed. Starting from the agreed sources, the technical specifications (the SDTM annotated electronic case report form (eCRF) and the SDTM source-to-target specification file) were defined and agreed with the Sponsor. SDTM datasets were developed using SAS 9.4 and Enterprise Guide 8.2; Pinnacle 21 was used as the validation tool to check conformance rules, even though this was not mandatory.

Figure 1

SDTM delivery process workflow.

Data were gathered from three multi-national/national, multi-center, retrospective and prospective cohort studies that were conducted by IQVIA Solutions Italy between 2020 and 2024 and enrolled a total of 1543 patients.

The first study was a multi-center observational study with a six-month follow-up that described treatment patterns in patients with asthma. No treatment protocol was imposed, and patients received care according to local prescribing information and routine clinical practice. In this study, both the therapies prescribed by the physician and those actually taken by the patients were collected (the latter being recorded through patient diaries). SDTM datasets were delivered in 2022.

The second study was a prospective, non-interventional investigation that assessed the effectiveness of a pharmacological treatment in patients with type 2 diabetes; the treatment was administered as part of routine clinical care and without a comparator. SDTM datasets were provided in 2023.

The third study was a non-interventional study that evaluated the long-term safety and effectiveness of a pharmacological treatment in patients with a rare disease; this treatment was also administered in routine clinical practice and without a comparator. SDTM datasets were delivered in 2024.

The application of the CDISC standards in OSs was evaluated both when a medication to treat the disease under study was defined in the study protocol and when it was not.

Results

Eighty-six SDTM datasets were created, 95% of which were conventional SDTM domains according to the SDTM IG (including 27 supplemental qualifiers for domains). All the fields in the eCRF were mapped to SDTM domains with the exception of “check questions”, i.e. questions whose sole purpose is to guide completion and that refer to other fields/sections of the eCRF where the relevant information to be mapped into SDTM is entered (e.g. the field “Indicate laboratory assessments performed close to baseline”, where, based on the response provided, the completion is redirected to a specific form). However, it was not possible to create some typical SDTM domains, such as Exposure (EX) or Trial Elements (TE), because the studies did not foresee the administration of a protocol-defined intervention treatment. It was necessary to develop four custom domains to map all collected data: (i) a domain to collect the detail of all the inclusion and exclusion criteria answers as collected in the eCRF (in addition to the traditional SDTM Inclusion/Exclusion Criteria Not Met domain); (ii) a domain used for visits that did not occur (missed visits) and for visits performed remotely; (iii) a domain to map patient diary data, including data from an electronic diary specifically designed for the study; and (iv) a domain to map information regarding work/school days lost due to the disease.
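As an illustration of how “check questions” can be kept out of the mapping while every other eCRF field is accounted for, the following is a minimal Python sketch of a source-to-target specification. It is not the specification file used in the studies (which were programmed in SAS); the field names, the custom domain code “XW” and its variable name are hypothetical and only serve the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SpecRow:
    """One row of a hypothetical source-to-target specification."""
    ecrf_field: str                 # eCRF item name (invented for this sketch)
    domain: Optional[str] = None    # target SDTM domain, if any
    variable: Optional[str] = None  # target SDTM variable, if any
    check_question: bool = False    # True when the item only guides eCRF completion


# Illustrative rows only: "XW" is a made-up custom domain for work/school days lost.
SPEC = [
    SpecRow("SEX", "DM", "SEX"),
    SpecRow("LAB_NEAR_BASELINE", check_question=True),
    SpecRow("WORK_DAYS_LOST", "XW", "XWORRES"),
]


def fields_by_domain(spec: List[SpecRow]) -> Dict[str, List[str]]:
    """Group mapped eCRF fields by target domain, skipping check questions."""
    grouped: Dict[str, List[str]] = {}
    for row in spec:
        if row.check_question or row.domain is None:
            continue
        grouped.setdefault(row.domain, []).append(row.ecrf_field)
    return grouped


def unmapped_fields(spec: List[SpecRow]) -> List[str]:
    """List fields with no SDTM target that are not flagged as check questions."""
    return [r.ecrf_field for r in spec if r.domain is None and not r.check_question]
```

In this sketch, an empty result from `unmapped_fields(SPEC)` indicates that every eCRF field is either mapped or explicitly documented as a check question.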

Issues were generated by the validation program at the dataset, variable and conformance rule levels during SDTM mapping, because these rules were designed for clinical trials and had not been adapted to OSs. The issues were discussed with the Sponsor and were managed on a case-by-case basis. Tables 1, 2 and 3 show the most relevant challenges/issues during SDTM development and validation at the dataset, variable and conformance rule levels, respectively, how they were managed, and the strategy recommended by CDISC in its newly released document.6

Table 1

Challenges during SDTM development/validation (dataset level).

Columns: Validation issue/challenge | How issue/challenge was managed | Recommended strategy by CDISC*

Validation issue/challenge: Missing Exposure (EX) dataset.
How issue/challenge was managed:
  • If a treatment for the disease under study was cited among study objectives, data about this treatment were mapped in the Exposure as Collected (EC) domain and other (disease specific or not) treatments in the Concomitant Medications (CM) domain.
  • If a treatment for the disease under study was NOT cited among study objectives, data about all collected treatments were mapped in the CM domain.
Recommended strategy by CDISC*: Any medication used to treat the disease under study should be represented using the EC/EX domains, if the Investigator deems it appropriate. All other treatments should be represented in the CM domain.

Validation issue/challenge: Adverse events (AE) might not be relevant for all observational studies.
How issue/challenge was managed: If the collection of adverse events or clinical events that occurred during the study was foreseen in the study protocol, AE and CE domains were developed, respectively.
Recommended strategy by CDISC*: If an unintended consequence is considered to be an adverse event by the Investigator, then the AE domain should be used. The difference between adverse and clinical events is defined in the protocol, and ultimately it is up to the Investigator or Sponsor to decide whether to use the AE or CE domain.

Validation issue/challenge: Missing Subject Elements (SE) dataset. Trial arms and elements may not be relevant to the study. Subject-level data on these concepts will therefore not apply.
How issue/challenge was managed: The issue was justified because “Trial arms and elements are not relevant to observational research. Therefore, neither are subjects’ progression through these”.
Recommended strategy by CDISC*: The issue should be explained to regulatory authorities in the study data reviewer’s guide (SDRG).

Validation issue/challenge: Missing Trial Arms (TA) dataset. Observational studies may not have planned arms.
How issue/challenge was managed: The issue was justified because “Planned arms are not foreseen for this observational study”.
Recommended strategy by CDISC*: The issue should be explained to regulatory authorities in the SDRG.

Validation issue/challenge: Missing Trial Elements (TE) dataset. The concept of trial elements may not be relevant to the study.
How issue/challenge was managed: The issue was justified because “Without trials arms there are no elements to describe”.
Recommended strategy by CDISC*: The issue should be explained to regulatory authorities in the SDRG.

  • *In this column, only the strategy most relevant to our use case has been cited from “Considerations for SDTM Implementation for Observational Studies v1.0”.
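The treatment-routing rule reported in Table 1 (EC for a treatment cited among the study objectives, CM for everything else) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors’ SAS programs: the record layout, the `treatment` key and the function name are hypothetical.

```python
from typing import Dict, Iterable, List


def route_treatments(
    records: Iterable[Dict[str, str]],
    objective_treatments: Iterable[str] = (),
) -> Dict[str, List[Dict[str, str]]]:
    """Split collected treatment records between the EC and CM domains.

    A record goes to EC only when its treatment is cited among the study
    objectives; when no treatment is cited, every record goes to CM.
    """
    # Normalise names so that matching ignores case and surrounding blanks.
    cited = {name.strip().upper() for name in objective_treatments}
    routed: Dict[str, List[Dict[str, str]]] = {"EC": [], "CM": []}
    for record in records:
        name = record["treatment"].strip().upper()
        routed["EC" if name in cited else "CM"].append(record)
    return routed


# Hypothetical records; "DRUG A" stands in for a treatment named in the objectives.
collected = [{"treatment": "Drug A"}, {"treatment": "Paracetamol"}]
with_objective = route_treatments(collected, objective_treatments=["DRUG A"])
without_objective = route_treatments(collected)
```

With these inputs, `with_objective` places the “Drug A” record in EC and the other record in CM, while `without_objective` places both records in CM.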

Table 2

Challenges during SDTM development/validation (variable level).

Columns: Validation issue/challenge | How issue/challenge was managed | Recommended strategy by CDISC*

Validation issue/challenge:
  • No arm to describe in observational studies.
  • ARM, ARMCD, ACTARM, and ACTARMCD are expected variables.
  • ACTARMUD and ARMNRS are expected variables, too.
How issue/challenge was managed: In the Demographics (DM) domain the following variables were mapped:
  • If a treatment for the disease under study was cited among study objectives:
    ARM = “Drug name”/“SCREENING FAILURE”
    ACTARM = “Drug name”/“SCREENING FAILURE”
    ARMCD = “Drug code”/“SCREENFAIL”
    ACTARMCD = “Drug code”/“SCREENFAIL”
  • If NOT:
    ARM = “Observational”
    ACTARM = “Unplanned Treatment”
    ARMCD = “OBS”
    ACTARMCD = “UNPLAN”
    ACTARMUD and ARMNRS were not mapped.
Recommended strategy by CDISC*:
  • Populate ARM/ARMCD and ACTARM/ACTARMCD with the same values.
  • If there is no use case for ARM/ARMCD and ACTARM/ACTARMCD, leave values null and populate ARMNRS with the appropriate reason.

Validation issue/challenge: Reference Start Date (RFSTDTC) and Reference End Date (RFENDTC) are expected variables but study reference periods may not be relevant.
How issue/challenge was managed:
  • If a treatment for the disease under study was cited among study objectives, RFSTDTC/RFENDTC were set as the date/time when the subject was first/last exposed to study treatment.
  • If NOT:
    • RFSTDTC and RFENDTC were left null.
    • The issue “Missing value RFSTDTC when subject is treated” was justified because “there was no study drug exposure due to observational study design”.
Recommended strategy by CDISC*:
  • Use registration date as RFSTDTC.
  • Set the date of occurrence of the evaluated event as RFENDTC.
  • Document how the RFSTDTC and RFENDTC were defined/populated in the Define.XML or SDRG.

Validation issue/challenge: In the Trial Summary (TS) domain, Study Start Date (TSPARMCD = SSTDTC) is defined as the earliest date of informed consent among any subject (Date/Time of Informed Consent, RFICDTC) enrolled in the study. Informed consent may not be available in observational studies.
How issue/challenge was managed: Study Start Date was mapped as the earliest date of informed consent among any enrolled subject (RFICDTC in DM domain).
Recommended strategy by CDISC*:
  • Sponsors should set the study start date to the earliest reference start date for any subject.
  • Document how the study start date was defined/populated in the Define.XML or the SDRG if Define.XML is not used.

Validation issue/challenge: Missing value for Baseline record (in multiple domains).
How issue/challenge was managed: The issue was justified because “The collection of parameters (such as laboratory tests, items of questionnaires, …) was not mandatory”.
Recommended strategy by CDISC*: Not cited in the CDISC document.

Validation issue/challenge: NULL value in SEX variable marked as Required (DM domain).
How issue/challenge was managed: The issue was justified because “Demographic characteristics were not collected for violators”.
Recommended strategy by CDISC*: Not cited in the CDISC document.

Validation issue/challenge: Regulatory Expected variable EPOCH not found (in multiple domains). The EPOCH variable specifies what phase of the study the data corresponds with. Including it allows the reviewer to easily determine at which phase of the study the observation or the event occurred, as well as the intervention the subject experienced during that phase.
How issue/challenge was managed: The issue was justified because observational studies are not interventional studies and so EPOCH is not foreseen.
Recommended strategy by CDISC*: The issue should be explained to regulatory authorities in the SDRG.

Validation issue/challenge: Missing value for RFXSTDTC (RFXENDTC, RFSTDTC, RFENDTC) in DM domain, when a treatment for the disease under study was cited among study objectives. Observational studies do not include regimented exposure to a protocol-defined drug.
How issue/challenge was managed: The issue was justified because “The variable was missing since there was no study drug exposure due to observational study design”.
Recommended strategy by CDISC*: The issue should be explained to regulatory authorities in the SDRG.

  • *In this column, only the strategy most relevant to our use case has been cited from “Considerations for SDTM Implementation for Observational Studies v1.0”.
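Two of the derivations reported in Table 2 (the arm variables in DM and the Study Start Date in TS) can be sketched in Python as shown below. The sketch mirrors the values reported in the “How issue/challenge was managed” column rather than the CDISC recommendation; the function names and arguments are hypothetical, and the date helper assumes that all RFICDTC values are complete ISO 8601 dates (YYYY-MM-DD), so that string order equals chronological order.

```python
from typing import Dict, Iterable, Optional


def dm_arm_variables(
    drug_name: Optional[str] = None,
    drug_code: Optional[str] = None,
    screen_failure: bool = False,
) -> Dict[str, Optional[str]]:
    """Derive ARM/ARMCD/ACTARM/ACTARMCD as reported in Table 2.

    drug_name is None when no treatment is cited among the study objectives.
    """
    if drug_name is None:
        # No treatment cited among study objectives.
        return {"ARM": "Observational", "ARMCD": "OBS",
                "ACTARM": "Unplanned Treatment", "ACTARMCD": "UNPLAN"}
    if screen_failure:
        return {"ARM": "SCREENING FAILURE", "ARMCD": "SCREENFAIL",
                "ACTARM": "SCREENING FAILURE", "ACTARMCD": "SCREENFAIL"}
    # Planned and actual arm carry the same drug name and code.
    return {"ARM": drug_name, "ARMCD": drug_code,
            "ACTARM": drug_name, "ACTARMCD": drug_code}


def study_start_date(rficdtc_values: Iterable[Optional[str]]) -> Optional[str]:
    """Return the earliest non-missing informed consent date, or None."""
    # Assumption: complete YYYY-MM-DD strings, which sort chronologically.
    dates = [value for value in rficdtc_values if value]
    return min(dates) if dates else None


# Hypothetical consent dates, including missing values.
sstdtc = study_start_date(["2021-03-02", "", "2020-11-15", None])
```

Partial dates or date-times of mixed precision would need explicit parsing before comparison, which this sketch deliberately leaves out.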

Table 3

Challenges during SDTM development/validation (conformance rule level).

Columns: Validation issue/challenge | How issue/challenge was managed | Recommended strategy by CDISC*

Validation issue/challenge: Value not found in extensible codelist (multiple domains).
How issue/challenge was managed: When the value could not be converted into controlled terminology, the issue was justified because “The value can be taken as it was because the codelist is extensible”.
Recommended strategy by CDISC*: Not cited in the CDISC document.

Validation issue/challenge: Variable length is too long for actual data (multiple domains). The validation rule “Variable length is too long for actual data” requires that the variable’s length is assigned based on actual stored data to minimize file size.
How issue/challenge was managed: Since submission to a regulatory authority is not foreseen (and so data will not be transferred), the issue was disregarded.
Recommended strategy by CDISC*: Not cited in the CDISC document.

  • *Referring to “Considerations for SDTM Implementation for Observational Studies v1.0”.
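The two conformance checks in Table 3 can be approximated with the short Python sketch below. It is not the Pinnacle 21 implementation: the function names are hypothetical, the “correct” action for non-extensible codelists is an assumption not taken from the paper, and lengths are counted in characters rather than stored bytes.

```python
from typing import Iterable, List, Tuple


def codelist_findings(
    values: Iterable[str],
    codelist: Iterable[str],
    extensible: bool,
) -> List[Tuple[str, str]]:
    """Flag values absent from a codelist and suggest how to handle them.

    For an extensible codelist the value may be kept with a documented
    justification; for a non-extensible one this sketch assumes it must
    be corrected.
    """
    allowed = set(codelist)
    action = "justify" if extensible else "correct"
    return [(value, action) for value in values if value not in allowed]


def length_exceeds_data(defined_length: int, values: Iterable[str]) -> bool:
    """Return True when the defined variable length is longer than any stored value."""
    longest = max((len(value) for value in values if value), default=0)
    return defined_length > longest


# Hypothetical unit values checked against a hypothetical extensible codelist.
findings = codelist_findings(["mg", "puff"], ["mg", "mL"], extensible=True)
oversized = length_exceeds_data(200, ["ABC", "", "ABCDE"])
```

Here `findings` contains only the value “puff”, marked for justification, and `oversized` is true because the defined length of 200 exceeds the longest stored value.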

Discussion

In conventional clinical trials, which are conducted with the goal of submitting a new medical product or intervention to regulatory bodies for marketing authorization, a set of international data standards has been adopted and is increasingly being mandated by national and regional regulatory agencies. CDISC standards were created for regulatory submissions of clinical trial data to support the approval of medical products; the standardization they provide has also made them valuable tools for data review and pooled analyses. Moreover, the recent expansion of CDISC standards, the development of TAUGs, and the rise in CDISC’s visibility have highlighted the importance of data standards in other fields, such as observational research. In 2024, CDISC published the document “Considerations for SDTM Implementation in Observational Studies and Real-World Data v1.0 (Final)”6 to guide the implementation of CDISC standards in OSs. This paper reports our experience in applying the CDISC SDTM standard to map data from real-world studies before the release of the CDISC document dedicated to OSs, in order to verify whether that experience was aligned with the CDISC Guide. Before mapping data from the OSs, the reasons for applying CDISC standards within the pharmaceutical company were investigated, the sources to take as reference during mapping were clearly defined, and it was agreed with the Sponsor that some adaptations of CDISC standards would be necessary due to the observational nature of the study. A crucial factor in the success of the mapping process was to involve the Sponsor in the revision/approval of technical specifications and deliverables, and to set up discussions between the Sponsor and SDTM programmers to fine-tune the process and to share specific cases in which the CDISC standards needed to be adapted.
As a result, the data from three multi-national/national, multi-center, retrospective and prospective cohort studies conducted by IQVIA Solutions Italy between 2020 and 2024 were mapped into 86 SDTM datasets, four of which were custom domains specifically created to map study-specific data (such as patient diary data). Some conformance rules (at dataset, variable, or controlled terminology level) were not met, and the resulting issues were discussed with the Sponsor to define the appropriate non-conformance rationale.

Real-world data (RWD) and real-world evidence (RWE) are becoming increasingly significant in clinical research and health care decision-making. To effectively utilize RWD and produce reliable RWE, data must be clearly defined and organized in a manner that ensures semantic interoperability and consistency among all stakeholders. The implementation of data standards is fundamental to the provision of high-quality evidence for the advancement of clinical medicine and therapeutics.12 In 2018, the CDISC President and CEO appointed a Blue-Ribbon Commission that was charged with preparing CDISC for the next decade of growth and change by considering what factors would most influence utilization of CDISC standards.13 The Commissioners believe that CDISC standards are relevant for every aspect of the research enterprise; however, they also concluded that CDISC must be prepared for significant changes as the research world is undergoing substantial disruption. The core CDISC foundational standards model must be fine-tuned both to support implementation through better internal alignment and to better reflect the core biomedical concepts common to research protocols. As the use of RWD in clinical research grows, CDISC standardization remains necessary to maximize the value of RWE in research datasets. CDISC must fundamentally change its historically hands-off approach to implementation. One key effort to support implementation is to build a new content layer that standardizes the transformation of data across the CDISC foundational standards. The CDISC standards will become de facto one standard, evolving to one well-refined model on the back end where foundational standards, and therapeutic area specific extensions, become views of data, while developing a more accessible profile on the front end so that non-experts can leverage the benefits of standardization.

The CDISC RWD Connect Initiative conducted a qualitative Delphi survey that involved an expert advisory board with multiple key stakeholders tasked with understanding the barriers to implementing CDISC standards for RWD and identifying the tools and guidance that may be needed to implement CDISC standards more easily.1 It was widely agreed that the standardization of RWD is necessary, and that the primary focus should be on CDISC’s ability to improve data sharing and the quality of RWE. There are many ongoing data standardization efforts around activities related to human health data, each with different definitions, levels of granularity, and purposes. Among these, CDISC has been successful in standardizing clinical trial-based data for regulation worldwide. However, the complexity of the CDISC standards and the fact that they were developed for different purposes, combined with the lack of awareness and incentives to use a new standard, and insufficient training and implementation support, are significant barriers to setting up the use of CDISC standards for RWD.1

The current CDISC RWD Strategy proposes to collaborate, partner, and harmonize with other industry standards initiatives and standards organizations to enable an efficient pathway for RWD to be transformed for ultimate use cases, such as data sharing, regulatory submissions, exploratory analysis and incorporation into clinical research trials. Other CDISC priorities are to provide the industry with training and education on the use and the importance of standards in the RWD ecosystem, and to support the use of RWD by Regulatory Agencies. In this context, in 2024 CDISC published “Considerations for SDTM Implementation in Observational Studies and Real-World Data”,6 a tool necessary for proper, efficient real-world data transformations and metadata-rich data exchange. This article presents the challenges we faced when mapping OS data to SDTM in the years preceding the release of the CDISC Observational Study Guide, together with the strategies or workarounds we adopted. The collection and dissemination of use cases, development of tools and support systems for the RWD community, and collaboration with other standards development organizations are potential steps forward. Using CDISC standards will help to link clinical trial data and RWD and will promote innovation in health data science.1

Limitations

This paper reports the experience with sponsored OSs of a single Italian contract research organization devoted to conducting international and local OSs. However, we believe that our experience is a good approximation of the challenges and solutions in SDTM implementation in observational research.

Conclusions

The adoption of data standards is one of the cornerstones that supports high-quality evidence for the development of clinical medicine and therapeutics.

The CDISC document “Considerations for SDTM Implementation in Observational Studies and Real-World Data” confirms the issues and challenges that have already been encountered in our CDISC implementation activities over the years preceding its release in 2024.

The document is a valid support because it provides possible solutions to challenges that are typical of the observational study design and speeds up the evaluation of automatic alerts from validation software.

However, implementing SDTM for OSs still requires ad-hoc solutions and the management of issues on a case-by-case basis, either by making changes to the SDTM domains or by providing documented justification.

Competing Interests

The authors have no competing interests to declare.

Author contributions

  • Sara Rizzoli developed SDTM described in the paper; analyzed the results presented in the paper; contributed to manuscript drafts.

  • Alessandra Ori contributed to manuscript drafts.

  • Alessandra Mignani developed SDTM described in the paper; contributed to manuscript drafts.

  • Fabio Ferri contributed to manuscript drafts.

  • Lucia Simoni contributed to manuscript drafts.

References

1. Facile R, Muhlbradt EE, Gong M, Li Q, Popat V, Pétavy F, Cornet R, Ruan Y, Koide D, Saito T, Hume S, Rockhold F, Bao W, Dubman S, Jauregui Wurst B. Use of Clinical Data Interchange Standards Consortium (CDISC) Standards for Real-world Data: Expert Perspectives From a Qualitative Delphi Survey. JMIR Med Inform. 2022 Jan 27; 10(1):e30363. DOI: http://doi.org/10.2196/30363

2. CDISC. Roadmap. Accessed May 29, 2025. https://www.cdisc.org/cdisc-roadmap.

3. CDISC. Global Regulatory Requirements. Accessed May 29, 2025. https://www.cdisc.org/global-regulatory-requirements.

4. Brazzo V, Vaghi P. Submitting data worldwide: to FDA and beyond (Oral communication) at CDISC Italian User Network, 2024.

5. Neville J, LeRoy B. Considerations for Using CDISC Standards in Observational Studies – PHUSE US Connect 2019 Paper SI08.

6. CDISC. Considerations for SDTM Implementation in Observational Studies and Real-World Data. Version 1.0 Final. Published February 28, 2024. Accessed May 29, 2025. https://www.cdisc.org/standards/real-world-data.

7. CDISC. Study Data Tabulation Model Version 2.0 (Final) Accessed May 29, 2025. https://www.cdisc.org/standards/foundational/sdtm/sdtm-v2-0.

8. CDISC. Study Data Tabulation Model Implementation Guide: Human Clinical Trials Version 3.4 (Final). Published July 21, 2022. Accessed May 29, https://www.cdisc.org/standards/foundational/sdtmig/sdtm-and-sdtmig-conformance-rules-v2-0.

9. CDISC. Therapeutic Area Data Standards User Guide for Asthma Version 1.0 2013-11-26.

10. CDISC. Therapeutic Area Data Standards User Guide for Diabetes Version 1.0 2014-08-01.

11. CDISC. Therapeutic Area User Guide for Rare Diseases Version 1.0 2023-12-14.

12. Santillan C, Minkue Mi Edou J. ADaM conversions: The good, the bad and the ugly PhUSE 2014 Paper DH01. https://www.lexjansen.com/phuse/2014/dh/DH01_ppt.pdf.

13. CDISC. 2018–2019 Blue Ribbon Commission Insights Accessed May 29, 2025 https://www.cdisc.org/system/files/about/brc/2018-2019_Blue_Ribbon_Commission_Insights.pdf.