The evolving Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model (ADaM)¹ continually poses new and interesting (and sometimes frustrating) challenges for SAS programmers. This is especially true when a clinical trial does not fit the straightforward structure of screening, treatment, and follow-up time periods. If the time periods are further subdivided and the number of variables in the ADaM ADSL domain quickly increases, then the task of linking the Basic Data Structure (BDS) and Occurrence Data Structure (OCCDS) data sets to ADSL can become daunting. The ADaM Implementation Guide offers quite an assortment of timing variables that programmers may rarely use.
This case study describes the approaches and obstacles to programming ADaM data for a clinical trial in which subjects were treated with the same study drug via different routes of administration during different time points to determine the safest and most effective way of delivering the drug. Examples of routes of administration (not necessarily used in this trial) include oral, intravenous, and subcutaneous injections. The trial design required the use of ADaM variables for phases, periods, and subperiods. Included in this case study is a discussion of code from SAS macros that helped to derive the numerous variables in an efficient manner. This case study assumes that the reader has a basic to intermediate level of experience with CDISC concepts, including the Study Data Tabulation Model (SDTM), ADaM, and Controlled Terminology.
The purpose of this case study is to provide a better understanding of the ADaM phase, period, and subperiod timing concepts through the use of example data and scenarios from an actual clinical trial. The types of studies that may require these extra timing variables, how best to develop a programming plan, and how to explain the plan to other project team members will be discussed. Example SAS code is provided, which can be applied to other trials. In addition, the case study will address how to overcome some specific challenges with trial data, such as how to handle screen failures, subjects who discontinue the study early, and partial dates or dates that do not have a time associated with them.
The clinical trial presented in this case study is part of an ongoing program; therefore, trial design specifics are not divulged here. Generalized terms such as Route 1 and Route 2 are used to represent the routes of administration, the study day numbers have been rounded, and the drug name and indication are not revealed. These details are not critical to the explanation of the particular challenges and lessons learned from this trial as they relate to ADaM.
While creating the ADaM data sets, we realized how unusual it was to apply the ADaM standard to such a non-traditional study. In this paper, we explain in retrospect the thought process behind the creation of the ADaM data sets, in the hope of helping others in similar situations.
In the trial, each subject received the same study drug throughout the trial. During study days 1–10, the subject received the drug via Route 1. During study days 11–20, the subject received the drug via three different but related routes, which will be referred to as Route 2.1, Route 2.2, and Route 2.3. Route 3 was used for days 21–30, and days 31–40 were the follow-up time period. The study drug was commercially available; subjects received the approved route of administration during screening and returned to the approved route of administration during the follow-up period. To simplify the examples in this case study, information about treatment during the screening and follow-up periods was not included.
The schema shown in Figure 1 provides a visual of the trial design. The biostatistician inserted a similar schema into the Statistical Analysis Plan (SAP) to guide the programming of the ADaM domains. The schema also illustrates how the ADaM phases, analysis periods, and analysis subperiods were assigned for this trial. Below are the definitions of the key permissible BDS timing variables, as described in the ADaM Implementation Guide v1.2:²
Analysis phase is represented by the character variable APHASE and is a categorization of timing within a study. It can be a higher-level categorization of the analysis periods or an analysis epoch. Analysis phase is independent of the treatment variables within ADSL and may be populated for spans of time when a subject is not on treatment. APHASEN is the numeric variable that can accompany APHASE to provide a numeric representation for sorting. (Associated ADSL variables: APHASEw)
Analysis period is represented by the numeric variable APERIOD. It is a record-level timing variable that represents the analysis period within the study associated with the record for analysis purposes. APERIODC is the character variable that can accompany APERIOD to provide a text description. (Associated ADSL variables: APxxSDT, APxxEDT)
Analysis subperiod within period is represented by the numeric variable ASPER. This is a numeric value characterizing a sublevel within APERIOD. The value of ASPER resets to 1 when the APERIOD value changes. ASPERC is the character variable that can accompany ASPER to provide a text description. (Associated ADSL variables: PxxSw)
There is an important relationship between the period and treatment variables in the ADaM ADSL data. Planned treatment for period xx is represented as TRTxxP, and actual treatment for period xx is represented as TRTxxA. The “xx” must align with the period number to give the treatment applied during the associated analysis period. Treatment start and end dates, period start and end dates, and subperiod start and end dates also must include the “xx” in the variable name to tie all of them together to ensure that all are associated with the same analysis period of the trial. Figure 2 shows additional information that was included in the SAP to guide programming for relating the study design to the treatment and timing variables. It is important to note that there may be a gap between the end of treatment in a period and the end of the actual period. In addition, each period or phase ends one day before the start of the next phase or period to avoid overlap.
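The rule that each period ends one day before the next period starts can be checked programmatically. The following is a minimal sketch, not code from the trial; the data set names ADSL_DEMO and PERIOD_ISSUES, the variable GAP_DAYS, the subject identifiers, and all dates are invented for illustration.

```sas
*--Hypothetical ADSL extract. Subject identifiers and dates are invented.;
data adsl_demo;
  length usubjid $12;
  format ap01sdt ap01edt ap02sdt ap02edt date9.;
  usubjid = 'SUBJ-001';
  ap01sdt = '13MAY2019'd; ap01edt = '21MAY2019'd;
  ap02sdt = '22MAY2019'd; ap02edt = '31MAY2019'd;
  output;
  *--Second subject: period 1 ends on the same day that period 2 starts;
  usubjid = 'SUBJ-002';
  ap01sdt = '09JUL2019'd; ap01edt = '19JUL2019'd;
  ap02sdt = '19JUL2019'd; ap02edt = '28JUL2019'd;
  output;
run;

*--Keep only subjects whose period 1 end is not the day before period 2 start;
data period_issues;
  set adsl_demo;
  if not missing(ap01edt) and not missing(ap02sdt);
  gap_days = ap02sdt - ap01edt;
  if gap_days ne 1;
run;

proc print data=period_issues noobs;
  var usubjid ap01edt ap02sdt gap_days;
run;
```

In this sketch only SUBJ-002 is written to PERIOD_ISSUES, with GAP_DAYS equal to 0. The same comparison could be repeated for each adjacent pair of periods, phases, or subperiods.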
SAS macros for deriving phase, period, and subperiod variables
SAS macro code is provided below for one approach to derive period start and end dates in ADSL, calling the macro once for each of the three periods. Similar code can be used to loop over the phases and subperiods:
*--The PERIODLOOP macro assigns start and stop of each treatment and period,;
*--based on the route of administration in SDTM.EX.;
*--Assumption in this listing: EXSTDTC holds a complete ISO 8601 datetime.;
%macro periodloop(per=, _route=);
  proc sort data=ex out=ex&per;
    where exroute = "&_route";
    by usubjid exstdtc;
  run;

  *--Select the first dose within the time periods for the selected route of;
  *--administration for each subject. This is the treatment start date and;
  *--period start date;
  data per&per._start(keep=usubjid tr0&per.sdtm ap0&per.sdt);
    set ex&per;
    by usubjid exstdtc;
    if first.usubjid;
    tr0&per.sdtm = input(exstdtc, e8601dt19.);
    ap0&per.sdt = datepart(tr0&per.sdtm);
    format tr0&per.sdtm datetime19. ap0&per.sdt date9.;
  run;

  *--Select the last dose within the time periods for the selected route of;
  *--administration for each subject. This is the treatment end date.;
  *--Period end date is derived later outside of the macro;
  data per&per._endtrt(keep=usubjid tr0&per.edtm);
    set ex&per;
    by usubjid exstdtc;
    if last.usubjid;
    tr0&per.edtm = input(exstdtc, e8601dt19.);
    format tr0&per.edtm datetime19.;
  run;

  *--Merge the period/treatment start and end date information;
  data per&per.(keep=usubjid tr0&per.sdtm tr0&per.edtm ap0&per.sdt);
    merge per&per._start per&per._endtrt;
    by usubjid;
  run;
%mend periodloop;
%periodloop(per=1, _route=%str(ROUTE 1))
%periodloop(per=2, _route=%str(ROUTE 2))
%periodloop(per=3, _route=%str(ROUTE 3))
A separate section of code at the end of the program derives each period end date as one day prior to the subsequent period start date and calculates treatment duration for each treatment period. The end date of the last period is the subject’s end of study date. If a subject discontinues in the middle of period 1 or period 2, then the end date of that period is the subject’s end of study date, and the start and end dates for period 3 are missing.
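The period end date rules described above might be sketched as follows. This is an illustration under stated assumptions rather than the trial's actual code: the data set names ADSL_START and ADSL_PERIOD are hypothetical, EOSDT is assumed to hold the end of study date, and the subjects and dates are invented.

```sas
*--Hypothetical input with one row per subject. All values are invented.;
data adsl_start;
  length usubjid $12;
  format ap01sdt ap02sdt ap03sdt eosdt date9.;
  usubjid = 'SUBJ-001';
  ap01sdt = '13MAY2019'd; ap02sdt = '22MAY2019'd;
  ap03sdt = '01JUN2019'd; eosdt = '21JUN2019'd;
  output;
  *--Second subject discontinued during period 2 and never started period 3;
  usubjid = 'SUBJ-002';
  ap01sdt = '09JUL2019'd; ap02sdt = '19JUL2019'd;
  ap03sdt = .; eosdt = '24JUL2019'd;
  output;
run;

data adsl_period;
  set adsl_start;
  array sdt{3} ap01sdt ap02sdt ap03sdt;
  array edt{3} ap01edt ap02edt ap03edt;
  format ap01edt ap02edt ap03edt date9.;
  do i = 1 to 3;
    *--A period that never started has no end date;
    if missing(sdt{i}) then edt{i} = .;
    else if i < 3 then do;
      *--End one day before the next period starts, if it started;
      if not missing(sdt{i+1}) then edt{i} = sdt{i+1} - 1;
      *--Otherwise the subject left the study during this period;
      else edt{i} = eosdt;
    end;
    *--The last period ends on the end of study date;
    else edt{i} = eosdt;
  end;
  drop i;
run;

proc print data=adsl_period noobs;
run;
```

For the invented SUBJ-002, period 1 ends on 18JUL2019, period 2 ends on the end of study date of 24JUL2019, and the period 3 end date stays missing, which mirrors the early-discontinuation rule in the text.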
The following is sample SAS code for assigning visit dates within a BDS data program to the appropriate phase, period, and subperiod. This macro code is called within each applicable ADaM data program:
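*--Assign the analysis phase. As written, a visit on the start date of the next;
*--phase, period, or subperiod satisfies the earlier condition first.;
if (.z < ph1sdt <= &_visitdt) and (&_visitdt <= ph2sdt or ph1edt = .) then
  aphase = aphase1;
else if (.z < ph2sdt < &_visitdt) and (&_visitdt <= ph3sdt or ph2edt = .) then
  aphase = aphase2;
else if (.z < ph3sdt < &_visitdt) and (&_visitdt <= ph3edt or ph3edt = .) then
  aphase = aphase3;

*--Assign the analysis period. Records on or before the period 1 start date;
*--are left without a period.;
if (.z < &_visitdt <= ap01sdt) then do;
  aperiod = .;
  aperiodc = ' ';
end;
else if (&_visitdt > ap01sdt) and (&_visitdt <= ap02sdt or ap01edt = .) then do;
  aperiod = 1;
  aperiodc = trt01p;
end;
else if (&_visitdt > ap02sdt) and (&_visitdt <= ap03sdt or ap02edt = .) then do;
  aperiod = 2;
  aperiodc = trt02p;
end;
*--Period 3 is the last period, so its own end date bounds the comparison;
*--(the original listing referenced AP04SDT, which is not derived here).;
else if (&_visitdt > ap03sdt) and (&_visitdt <= ap03edt or ap03edt = .) then do;
  aperiod = 3;
  aperiodc = trt03p;
end;

*--Assign the analysis subperiod within period 2.;
if (&_visitdt >= p02s1sdt) and (&_visitdt <= p02s2sdt or p02s1edt = .) then do;
  asper = 1;
  asperc = p02s1;
end;
else if (&_visitdt >= p02s2sdt) and (&_visitdt <= p02s3sdt or p02s2edt = .) then do;
  asper = 2;
  asperc = p02s2;
end;
else if ((&_visitdt >= p02s3sdt) and (&_visitdt <= p02s3edt or p02s3edt = .))
  or (&_visitdt = ap03sdt) then do;
  asper = 3;
  asperc = p02s3;
end;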
Tables 1a through 1e contain some key ADSL variables from the trial. As shown, the number of variables to keep track of in ADSL grows when phases, periods, and subperiods are required. Many more variables, such as dose unit and treatment duration for each period, may also need to be included in ADSL.
|1||xxx-yy-201||ROUTE 1||1||ROUTE 2||2||ROUTE 3||3||XXX||XXX||XXX|
|2||xxx-yy-204||ROUTE 1||1||ROUTE 2||2||ROUTE 3||3||XXX||XXX||XXX|
|3||xxx-yy-301||SCREEN FAIL||99||SCREEN FAIL||99||SCREEN FAIL||99|
Table 2 presents an example of some of the key variables in the associated ADVS data set.
|xxx-yy-201||Systolic Blood Pressure (mmHg)||SCREENING||29APR19:00:00:00||PRE-TREAT||SCREENING|
|xxx-yy-201||Systolic Blood Pressure (mmHg)||DAY 1||13MAY19:11:41:00||ROUTE 1||TREATMENT||1||ROUTE 1|
|xxx-yy-201||Systolic Blood Pressure (mmHg)||DAY 10||22MAY19:11:31:00||ROUTE 2.1||TREATMENT||2||ROUTE 2||1||ROUTE 2.1|
|xxx-yy-201||Systolic Blood Pressure (mmHg)||DAY 12||24MAY19:11:51:00||ROUTE 2.2||TREATMENT||2||ROUTE 2||2||ROUTE 2.2|
|xxx-yy-201||Systolic Blood Pressure (mmHg)||DAY 14||26MAY19:10:23:00||ROUTE 2.3||TREATMENT||2||ROUTE 2||3||ROUTE 2.3|
|xxx-yy-201||Systolic Blood Pressure (mmHg)||DAY 20||01JUN19:10:52:00||ROUTE 3||TREATMENT||3||ROUTE 3|
|xxx-yy-204||Systolic Blood Pressure (mmHg)||SCREENING||25JUN19:00:00:00||PRE-TREAT||SCREENING|
|xxx-yy-204||Systolic Blood Pressure (mmHg)||DAY 1||09JUL19:11:01:00||ROUTE 1||TREATMENT||1||ROUTE 1|
|xxx-yy-204||Systolic Blood Pressure (mmHg)||DAY 10||19JUL19:10:26:00||ROUTE 2.1||TREATMENT||2||ROUTE 2||1||ROUTE 2.1|
|xxx-yy-204||Systolic Blood Pressure (mmHg)||DAY 12||21JUL19:10:58:00||ROUTE 2.2||TREATMENT||2||ROUTE 2||2||ROUTE 2.2|
|xxx-yy-204||Systolic Blood Pressure (mmHg)||DAY 14||23JUL19:11:02:00||ROUTE 2.3||TREATMENT||2||ROUTE 2||3||ROUTE 2.3|
|xxx-yy-204||Systolic Blood Pressure (mmHg)||DAY 20||29JUL19:11:03:00||ROUTE 3||TREATMENT||3||ROUTE 3|
|xxx-yy-301||Systolic Blood Pressure (mmHg)||SCREENING||05MAR20:11:03:00||PRE-TREAT||SCREENING||1||SCREEN FAIL|
After the steps described above are completed, data-specific challenges must be addressed to ensure further CDISC compliance as follows:
Several values may be blank, which is correct. The analysis periods do not include the screening phase of the study; therefore, the variables relating to period and subperiod are blank for the screening records.
ASPER and ASPERC are blank for periods 1 and 3, as they are not applicable.
Data for the Screen Failure subject were only recorded during the screening phase of the study; therefore, values for variables related to subsequent parts of the study are blank. It is appropriate for these values to be blank for screen failures per the CDISC rules described in the ADaM Implementation Guide, as this is how the model indicates that the subject did not progress through later parts of the trial. To avoid any confusion by a data reviewer not familiar with the ADaM concept details, the user can include a detailed summary describing information gathered and derived for screen failures in the Analysis Data Reviewers Guide³ that accompanies these data for submission.
Start and end dates
The start and end dates for phases, periods, and subperiods cannot be based on visit names/numbers (in case these vary from subject to subject). They have to be based on visit dates relative to planned events. For example, Period 2 is not assigned to always start on visit day 10 but rather on the first dose date using the Route 2 administration method. While this was scheduled to occur on day 10, rarely do all subjects follow the visit schedule precisely without deviation.
TRTP is derived to coincide with the study periods and subperiods in the manner that is most meaningful to reviewers. In this case study, the value of TRTP in ADVS comes from the value of TRTxxP for the period in which the vital sign was recorded. The exceptions are the records where TRTP is ROUTE 2.1, ROUTE 2.2, or ROUTE 2.3, because the ADaMIG v1.2 does not include TRTxx variables for subperiods. TRTP could also be left blank for the screening records, but it was decided to derive it as PRE-TREAT for clarity in the data listings.
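One way to express the TRTP rules above is sketched below. The source of the subperiod-level TRTP values is not stated in this paper, so taking them from ASPERC is an assumption made for illustration; the data set names ADVS_DEMO and ADVS_TRTP and all record values are likewise invented.

```sas
*--Hypothetical BDS records after ADSL variables have been merged on.;
data advs_demo;
  length usubjid $12 aphase $12 trt01p trt02p trt03p asperc $12;
  usubjid = 'SUBJ-001';
  trt01p = 'ROUTE 1'; trt02p = 'ROUTE 2'; trt03p = 'ROUTE 3';
  aphase = 'SCREENING'; aperiod = .; asperc = ' ';         output;
  aphase = 'TREATMENT'; aperiod = 1; asperc = ' ';         output;
  aphase = 'TREATMENT'; aperiod = 2; asperc = 'ROUTE 2.1'; output;
  aphase = 'TREATMENT'; aperiod = 3; asperc = ' ';         output;
run;

data advs_trtp;
  set advs_demo;
  length trtp $12;
  *--Assumption: subperiod records take TRTP from the subperiod description;
  if aperiod = 2 and not missing(asperc) then trtp = asperc;
  *--Other treatment records take TRTP from TRTxxP for the matching period;
  else if aperiod = 1 then trtp = trt01p;
  else if aperiod = 2 then trtp = trt02p;
  else if aperiod = 3 then trtp = trt03p;
  *--Screening records are labeled rather than left blank;
  else if aphase = 'SCREENING' then trtp = 'PRE-TREAT';
run;

proc print data=advs_trtp noobs;
  var usubjid aphase aperiod asperc trtp;
run;
```

With the invented records, TRTP resolves to PRE-TREAT, ROUTE 1, ROUTE 2.1, and ROUTE 3 in that order.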
Trial design and data collection methods
Trial design and data collection methods can impart additional challenges. In this trial, visit time was not collected for all assessments. This led to challenges in using time as part of assigning phase, period, and subperiod start and end dates. Therefore, only visit dates were used, and the end of each period was defined as one day prior to the start of the next period. Additionally, the trial design included three groups of subjects with different visit schedules, and the groups did not all receive the same set of routes of administration. The SAS programs were written to derive the CDISC variables separately for each group, and then the three sets of data were integrated to create the final ADaM data sets. The assignment of treatments and labeling of phases, periods, and subperiods were revisited numerous times as the data were reviewed, and differences between the various SDTM domains (such as the visit schedule for a particular parameter) that were incorporated into the ADaM data also required additional consideration.
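Because only visit dates were used, a date-only analysis variable has to be derived from SDTM character dates that may or may not include a time. The sketch below shows one possible approach, assuming ISO 8601 values in VSDTC; the data set names and values are invented, and leaving partial dates missing (no imputation) is an assumption for illustration rather than the rule used in the trial.

```sas
*--Hypothetical SDTM-style records with varying levels of date precision.;
data vs_demo;
  length usubjid $12 vsdtc $19;
  usubjid = 'SUBJ-001';
  vsdtc = '2019-05-13T11:41:00'; output;
  vsdtc = '2019-05-22';          output;
  vsdtc = '2019-05';             output;
  vsdtc = ' ';                   output;
run;

data vs_dates;
  set vs_demo;
  format adt date9.;
  *--Use only the date portion. Any time component is ignored.;
  *--Values shorter than 10 characters are partial dates and stay missing.;
  if length(strip(vsdtc)) >= 10 then
    adt = input(substr(vsdtc, 1, 10), ?? yymmdd10.);
run;

proc print data=vs_dates noobs;
run;
```

The `??` modifier suppresses log notes for values that cannot be read as dates, so unexpected values also result in a missing ADT instead of an error.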
Not every study will require additional effort to designate phases, periods, and subperiods as was necessary for the case study trial. Sponsors and biostatisticians may request that programmers utilize the ADaM variable options when the timing of a particular event or finding as it relates to another event or finding during a different time point in the study is of interest. The crossover study design lends itself to the use of these methods in particular. Subjects are their own comparators when administered Drug A during the first period of the study and Drug B during the second period of the study. The biostatistician may often desire the analysis to be presented by period. When CDISC data are developed utilizing the concepts described in this case study, results can be easily traced to source data and replicated as needed.
By following the steps described below and referencing the provided SAS code and example data, as well as addressing data-specific challenges, programmers and analysts can more efficiently manage the increased complexity in ADaM data when required.
When the study is broken into phases, periods, and subperiods, there are two distinct steps:
Derive the timing variables in ADSL that define when each phase, period, and subperiod starts and ends.
Apply the timing variables from ADSL to the visit or event dates in the BDS and OCCDS data sets to assign each record to the appropriate phase, period, and subperiod.
When these two steps are completed correctly, data can be analyzed by phase, period, and subperiod as required by biostatisticians and sponsors.
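The second step can be illustrated with a small merge of ADSL timing variables onto visit-level records. This is a simplified sketch: it uses inclusive start and end dates for each period, which is not the same boundary rule as the macro code shown earlier, and all data set names, subjects, and dates are invented.

```sas
*--Hypothetical ADSL with period start and end dates already derived.;
data adsl_demo;
  length usubjid $12 trt01p trt02p $12;
  format ap01sdt ap01edt ap02sdt ap02edt date9.;
  usubjid = 'SUBJ-001'; trt01p = 'ROUTE 1'; trt02p = 'ROUTE 2';
  ap01sdt = '13MAY2019'd; ap01edt = '21MAY2019'd;
  ap02sdt = '22MAY2019'd; ap02edt = '31MAY2019'd;
  output;
run;

*--Hypothetical visit-level records with an analysis date.;
data vs_demo;
  length usubjid $12;
  format adt date9.;
  usubjid = 'SUBJ-001';
  adt = '29APR2019'd; output;
  adt = '15MAY2019'd; output;
  adt = '24MAY2019'd; output;
run;

proc sort data=adsl_demo; by usubjid; run;
proc sort data=vs_demo; by usubjid adt; run;

data advs_demo;
  merge vs_demo(in=invs) adsl_demo;
  by usubjid;
  if invs;
  length aperiodc $12;
  array sdt{2} ap01sdt ap02sdt;
  array edt{2} ap01edt ap02edt;
  array trt{2} $ trt01p trt02p;
  *--Assign the period whose start and end dates contain the analysis date;
  do i = 1 to 2;
    if not missing(sdt{i}) and sdt{i} <= adt <= edt{i} then do;
      aperiod = i;
      aperiodc = trt{i};
    end;
  end;
  drop i;
run;

proc print data=advs_demo noobs;
  var usubjid adt aperiod aperiodc;
run;
```

With these invented values, the 29APR2019 record has no period, the 15MAY2019 record falls in period 1, and the 24MAY2019 record falls in period 2.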
The idea of designating phases, periods, and subperiods for a clinical trial may initially appear conceptually simple. However, as demonstrated by this case study, data collection and CDISC compliance can quickly become complex. While this trial design, combined with the rules dictated in the ADaM Implementation Guide, resulted in many challenging obstacles to overcome, it also yielded a number of lessons learned that will benefit future studies. For example, a sponsor may consider the value of consistently capturing not just dates but also times when possible for each type of data. The clinical data manager may consult with the CDISC programmer during database design and perhaps provide additional variables and/or labels to make the ADaM data programming simpler and more transparent. As such, the project manager will have a better understanding of the time and resources required to achieve CDISC compliance and to format data sets to be both analysis and submission ready.
Articles and examples of each of the CDISC standards can be found on the CDISC.org website in the Knowledge Base sections. More examples of studies that utilize the phase, period, and subperiod timing variables will hopefully be added over time with sample data and descriptions of study-specific challenges and how to overcome them.
This case study emphasizes the importance of considering different variables and relevant time periods during trial design, as well as the types of analyses and conclusions that can be highlighted to meet the goals of a trial. Biostatisticians, data managers, and CDISC programmers should be consulted early in the trial design process, and a trial schema (such as the one shown in Figure 1) should be included in every SAP when trial time periods are not straightforward. The sponsor will benefit from an understanding of how the trial design will ultimately be represented in the final data submission, and of the impact of design decisions that are not readily apparent until the CDISC rules are applied.
The goal of the CDISC organization is to achieve standardization in data submissions that will ultimately improve all health research. Such standardization is more difficult for analysis data because each study has different endpoints, populations, indications, and sample size and power challenges. However, this is a worthwhile effort as standardization allows researchers to combine data and analyses across studies and vastly improve the capabilities of researchers to solve current challenges in health research.
Thank you to my supervisor and colleagues, editorial staff, and Westat senior leadership for your assistance and support in the writing of this paper.
DISCLAIMER: The contents of this paper are the work of the author and do not necessarily represent the opinions, recommendations, or practices of Westat.
The author has no competing interests to declare.
1. Clinical Data Interchange Standards Consortium, Inc. CDISC Analysis Data Model Team. Analysis data model (ADaM) version 2.1. 2009. Accessed April 19, 2022. https://www.cdisc.org/standards/foundational/adam
2. Clinical Data Interchange Standards Consortium, Inc. CDISC Analysis Data Model Team. Analysis data model implementation guide version 1.2. 2019. Accessed April 19, 2022. https://www.cdisc.org/standards/foundational/adam
3. US Food and Drug Administration. Study Data Technical Conformance Guide: Technical Specifications Document. October 2021. Accessed April 19, 2022. https://www.fda.gov/media/154109/download