Hierarchical composite endpoints (HCEs), including the recently introduced kidney HCE, are complex endpoints that are usually analyzed using win statistics and visualized using maraca plots. Because HCEs are novel and their analysis using win statistics is complex, constructing analysis datasets that conform to the fundamental principles put forward by the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model (ADaM) is not straightforward.
We show that in the case of a fixed follow-up it is possible to construct an analysis dataset that conforms to Basic Data Structure principles and is analysis-ready for conducting multiple analyses, including win statistics generation and visualization of HCEs using maraca plots.
We use theoretical justification for fixed follow-up designs to show that the pairwise comparisons of participants for the win statistics analyses can be reduced to a participant-level ranking, and use the fundamental principles put forward by CDISC and the Tidy data principles of the data science community to derive an ADaM-compliant dataset.
In the setting of fixed follow-up designs, we construct an ADaM-compliant dataset for conducting win statistics analyses and visualization using maraca plots, with the required metadata traceability.
Based on the growing importance of HCEs in clinical trials, and the difficulty in creating ADaM-compliant datasets for these analyses, we provide principles to create such datasets, to prompt the clinical community and CDISC to work towards standardization of analysis datasets for hierarchical composite endpoints.
Hierarchical composite endpoints (HCEs) are complex endpoints^{1,2,3,4} that are analyzed using win statistics and visualized using maraca plots.^{6} An HCE has a hierarchical structure and uses the most clinically severe event of a participant in studies with a fixed follow-up design. This results in an ordinal endpoint, similar to severity scale endpoints. As a result of its hierarchical nature, an HCE can combine outcomes of different types into a composite, for example, clinical events of death and hospitalization with numerical laboratory variables or symptom summary scores.^{7,8} In addition, the clinical events may contribute to the composite with the time of the corresponding event, as an additional layer of severity. This means that participants having an event of the same severity are compared using the timing of the event, with a later event signifying a better outcome. Overall, the ordering is done so that a higher order means a better outcome. A characteristic of ordinal endpoints is that the concepts of better or worse are defined, but not the quantitative magnitude of how much better or worse (unlike a continuous endpoint). HCEs are implemented in different therapeutic areas: COVID-19,^{9,10} heart failure,^{8,11} and chronic kidney disease (CKD),^{7} to name a few.
Due to the novelty of HCEs and the complexity of the analyses involving them, the construction of analysis datasets conforming to the fundamental principles put forward by the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model (ADaM)^{12} is not straightforward, nor is it apparent that it is possible. These fundamental principles were proposed to standardize datasets across the various stakeholders involved in the conduct, analysis, and reporting of clinical trials, in order to achieve transparency in analyses as well as in communication and review.^{13} ADaM is one implementation of these fundamental principles; other implementations of similar principles are known to the data science community as Tidy data principles.^{14}
Win statistics^{5} (win ratio,^{15} win odds^{16,17} or win ratio with ties,^{18,19} net benefit^{20}) are statistical methods for analyzing HCEs and are based on the principle of comparing each participant in the active group with each participant in the control group using multiple outcomes and differing follow-ups for these outcomes. Construction of an ADaM-compliant analysis dataset is therefore a challenge facing every clinical trialist involved in analysis and reporting in a regulatory setting, where such data structures are a requirement.
Using theoretical justification in the case of a fixed follow-up, we show that it is possible to construct an analysis dataset, ADHCE, that conforms to ADaM principles using the Basic Data Structure (BDS) and that is analysis-ready for conducting win statistics analyses. In other words, this dataset can be used for performing the analyses without having to manipulate data first. The created BDS for the HCE analysis will therefore allow the separation of the analysis data creation from the analysis result generation (as is the intention of ADaM datasets), even for such complex analyses as win statistics calculations.
Traceability between analysis data values and their specific predecessor records is provided in the form of data point traceability. Traceability facilitates transparency of analysis conduct and allows for its replication. Detailed traceability is particularly important for the HCE derivation as it involves multiple outcomes derived through complex data manipulations from different datasets. Construction of a single ADHCE dataset that follows the BDS and is analysis-ready is important for clear communication of results and software development for analysis and reporting.
An ADaM dataset is a particular type of analysis dataset that follows the ADaM fundamental principles defined in the ADaM document^{12} and is compliant with ADaM-defined structures or adheres as closely as possible to the ADaMIG variable naming and other conventions.^{13} Currently, ADaM has three structures: Subject Level Analysis Dataset (ADSL), Basic Data Structure (BDS), and Occurrence Data Structure (OCCDS). An ADaM dataset contains both source and derived data; it is therefore important to clearly document the variable derivations and how to use them for obtaining the analysis results. ADSL is a required, participant-level dataset that contains participants’ baseline and demographic characteristics, population flags that indicate the participant’s inclusion in different analysis populations, planned and actual treatment variables for each period, and important dates. The BDS datasets contain endpoints and data that vary over time during the course of a study and are organized as one or more records per subject per analysis parameter per analysis timepoint. It is often optimal to have more than one BDS analysis dataset, but not necessarily one dataset per analysis. The BDS datasets are the main data structures used for complex statistical analyses but are not designed to support analysis of incidence of adverse events or other occurrence data. Analysis of such data is supported in the OCCDS. For commonly used analysis methods (eg, analysis of variance or covariance, logistic regression and so on) the BDS implementation is straightforward. The more complex time-to-event analyses have their own standardized BDS, ADTTE, which is well developed^{21} and widely used. Although the BDS supports most statistical analyses, it does not support all of them. For example, it does not support simultaneous analysis of multiple dependent (response/outcome) variables or a correlation analysis across a range of response variables.
In the ADaM design, at a minimum, the analysis datasets should contain the data needed for the recreation of specific statistical analyses. There is no requirement that every analysis has its own dataset; rather, a single dataset can support multiple analyses to achieve the optimal number of analysis datasets. Each analysis dataset should contain all the analysis-enabling variables required for performing the statistical analysis it is designed to support (it can even contain supportive variables not needed for the analysis but that are of interest for traceability purposes). This can lead to redundancy, that is, the same data appearing in multiple datasets, but this is necessary for having analysis-ready datasets. Analysis-ready does not mean that the results can be generated in a single statistical procedure, but rather that each of the summary statistics included in the results can be derived with minimal programming effort using standard statistical procedures with the dataset as input.
We briefly describe the fundamental principles governing the structure of BDS in connection to Tidy data principles and discuss the structure of ADLB (analysis dataset for laboratory values), which is used for ANCOVA-type analyses, and ADTTE for time-to-event analyses, as these two datasets, alongside the participant-level ADSL, are the source datasets for ADHCE. Then, following the BDS principles, we construct the ADHCE dataset, which is analysis-ready for multiple analyses (with its metadata traceability describing the source datasets and variables), and provide the minimal steps required to perform these analyses using ADHCE.
The methodology provided here is applicable only for fixed follow-up settings. For settings without fixed follow-up, we explore the challenges associated with the derivation of an analysis dataset that conforms to the BDS principles.
Consider the case of two treatment groups, with active and control treatments, and assume that all participants have the same follow-up and there are no dropouts, meaning all participants were followed for all events of interest until the end of the fixed follow-up. The kidney HCE^{4,7} has the following construction: during a fixed follow-up, participants are followed for one of the six dichotomous events in the provided hierarchy described in
The outcomes in the kidney HCE.
| Category | Outcome | Ranking within category | Severity | Source dataset |
| --- | --- | --- | --- | --- |
| 1 | Death | Timing (later is better) | Worst | ADTTE |
| 2 | Dialysis | Timing (later is better) |  | ADTTE |
| 3 | Sustained eGFR <15 | Timing (later is better) |  | ADTTE |
| 4 | Sustained >=57% decline in eGFR | Timing (later is better) |  | ADTTE |
| 5 | Sustained >=50% decline in eGFR | Timing (later is better) |  | ADTTE |
| 6 | Sustained >=40% decline in eGFR | Timing (later is better) |  | ADTTE |
| 7 | Individual rate of change of GFR | Actual values (higher is better) | Best | ADLB |
eGFR = estimated glomerular filtration rate.
If a participant experiences death, they are placed in category one and the timing of the death is used to determine the ranking within that category, with an earlier death being a worse outcome (a lower rank is assigned). Otherwise, if the participant is alive at the end of the follow-up, then the next event in the hierarchy is considered for ranking this participant, and so on. If the participant did not experience any of the six events, then they fall into category seven, in which the individual rate of change of glomerular filtration rate (GFR) is used to further rank the participants, with a lower rate of kidney decline being a better outcome (ranked higher).
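The ranking rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event labels, the function name `most_severe_outcome`, and the toy participants are hypothetical and are not taken from the source datasets. Each participant is reduced to a pair (category, value), and because a higher category and a higher value within a category both mean a better outcome, ordinary tuple ordering reproduces the hierarchy.

```python
# Hypothetical event labels, listed in hierarchy order (category 1 = worst).
HIERARCHY = [
    "Death",
    "Dialysis",
    "eGFR < 15",
    "eGFR >= 57%",
    "eGFR >= 50%",
    "eGFR >= 40%",
]


def most_severe_outcome(events, gfr_slope):
    """Return (category, value) for one participant.

    events: dict mapping an event label to the day on which it occurred
            (only events that actually happened during follow-up).
    gfr_slope: individual rate of change of GFR, used only when the
               participant had none of the six dichotomous events.
    """
    for category, label in enumerate(HIERARCHY, start=1):
        if label in events:
            # Most severe event found: its timing ranks within the category.
            return category, events[label]
    # No dichotomous event: category seven, ranked by the GFR slope.
    return 7, gfr_slope


# Toy participants (illustrative values only).
participants = {
    "001": ({"Death": 21, "Dialysis": 10}, -3.0),
    "002": ({"Dialysis": 400}, -5.2),
    "003": ({}, -1.5),
    "004": ({"Death": 300}, -2.0),
}

# Sort from worst to best outcome using tuple ordering.
ranked = sorted(participants, key=lambda pid: most_severe_outcome(*participants[pid]))
print(ranked)  # ['001', '004', '002', '003']
```

Participant 001 had both dialysis and death, but only death, the most severe event, determines the ranking.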
The time-to-event (TTE) analysis dataset, ADTTE, is an ADaM BDS dataset that includes additional TTE variables designed for survival analyses. The distinguishing feature of survival data is that at the end of the observation period the event of interest may not have occurred for all subjects. A single ADTTE dataset can support multiple survival analyses, for example, Cox proportional hazards regression, the log-rank test and so on. For a given analysis parameter value (PARAM, or its short name PARAMCD), ADTTE has one record per subject and the two variables used in all models of survival analyses: the analysis value, AVAL, which shows the timepoint until which the participant was observed for the event of interest, and the censoring variable, CNSR, which indicates whether or not the event of interest occurred. The variable ADTTE.AVAL therefore shows either the timing of the occurrence of the event (if CNSR=0) or the length of the fixed follow-up duration for participants without an event (CNSR=1). ADTTE should also include the subject identifier (SUBJID) and the treatment variable showing planned treatment allocation (TRTP) in a randomized, controlled trial. The fixed follow-up duration is stored in Primary Analysis Day (PADY), which is inherited from ADSL, since this variable is a common analysis date for all participants and is needed across multiple datasets. The ADTTE dataset contains the six dichotomous events of interest (
The BDS for laboratory data, ADLB, has one row per subject per visit per analysis parameter value and contains GFR measurements under a specific analysis parameter, PARAM, and the variables AVISIT, which indicates the timepoint of measurements (categorical variable with visit names); analysis day ADY for the number of days relative to an anchor date (in this case, the date of randomization); the analysis values AVAL, which contain the GFR measurements at each visit; and the BASE variable for the baseline GFR values for each subject. In addition, the individual rate of change of GFR over time can be derived (see the supplementary material)^{7} in ADLB.AVAL corresponding to a new analysis parameter value (PARAM = “Rate of change of GFR”).
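The source defers the derivation of the individual rate of change of GFR to its supplementary material, so the method is not reproduced here. As a stand-in, the sketch below uses an ordinary least-squares slope of AVAL on ADY, converted to a per-year rate. Both the choice of least squares and the 365.25-day year are assumptions made for illustration, and the function name is hypothetical.

```python
def gfr_slope_per_year(ady, aval):
    """Least-squares slope of GFR on time, expressed per year.

    ady:  analysis days relative to randomization (one per visit).
    aval: GFR measurements at those visits.
    Assumption: a plain OLS slope stands in for the derivation that the
    source describes only in its supplementary material.
    """
    if len(ady) != len(aval) or len(ady) < 2:
        raise ValueError("need at least two paired measurements")
    years = [day / 365.25 for day in ady]
    mean_t = sum(years) / len(years)
    mean_y = sum(aval) / len(aval)
    sxx = sum((t - mean_t) ** 2 for t in years)
    if sxx == 0:
        raise ValueError("all measurements share the same analysis day")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, aval))
    return sxy / sxx


# Toy participant losing 3 units of GFR per year (illustrative values only).
slope = gfr_slope_per_year([0, 365.25, 730.5], [60.0, 57.0, 54.0])
print(slope)  # -3.0
```

The resulting value would be stored in ADLB.AVAL under the new parameter, one record per participant.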
An HCE can be analyzed using the methods for ordinal endpoints, for example, rank ANCOVA,^{22} ordinal logistic regression,^{23} or win statistics.^{5} We consider the win odds,^{17} but the same principles can be applied to other win statistics. The analysis is based on the hierarchy defined above: each participant in the active group is compared with each participant in the control group using each participant’s clinically most severe outcome. Hence, we first select the clinically most severe outcome of each participant within the given fixed follow-up duration, then compare participants based on those outcomes. If the participant in the active group has a less severe outcome than the participant in the control group, then this is a “win” for the participant in the active group. Forming all possible comparisons of participants in the active group with participants in the control group, we derive the total number of wins, losses, and ties of the active group. The win odds of the active group against control is the total number of wins plus half of the ties, divided by the total number of losses plus the other half of the ties. A win odds greater (less) than 1.0 indicates a treatment effect in favor of the active (control) group, while a win odds of 1.0 indicates no difference between groups.
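The win odds calculation described above can be written out directly as a pairwise comparison. The sketch below assumes each participant has already been reduced to a (category, value) pair in which a larger pair means a better outcome; the function name and the toy data are hypothetical.

```python
def win_odds_pairwise(active, control):
    """Compare every active participant with every control participant.

    active, control: lists of (category, value) outcomes where a larger
    tuple means a better outcome.
    Returns (wins, losses, ties, win_odds) from the active group's view.
    """
    wins = losses = ties = 0
    for a in active:
        for c in control:
            if a > c:
                wins += 1
            elif a < c:
                losses += 1
            else:
                ties += 1
    # Half of the ties go to the numerator, the other half to the denominator.
    # Note: this is undefined when there are no losses and no ties.
    odds = (wins + 0.5 * ties) / (losses + 0.5 * ties)
    return wins, losses, ties, odds


# Toy data: three participants per group (illustrative values only).
active = [(7, -1.0), (2, 400), (1, 300)]
control = [(7, -2.5), (1, 300), (1, 21)]

wins, losses, ties, odds = win_odds_pairwise(active, control)
print(wins, losses, ties)  # 6 2 1
```

With 6 wins, 2 losses, and 1 tie, the win odds is (6 + 0.5) / (2 + 0.5) = 2.6.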
To visualize HCEs, maraca plots (so named after their visual similarity to the musical instrument) were introduced.^{6} On the maraca plot for a kidney HCE, the x-axis is divided into the seven HCE component categories in severity order from left to right. The six TTE components are visualized with adjoined cumulative Kaplan-Meier plots. For the continuous component, the x-axis corresponds to the annualized rate of change of GFR, and a beneficial effect on the continuous component is characterized by a shift to the right. The associated vertical dashed lines show the median values for the annualized rates of change of GFR among participants without dichotomous outcomes in the two treatment groups. Each participant contributes to the HCE with one event, and the width of each category (dichotomous or continuous outcomes) corresponds to the percentage of that category in the composite. An illustration of analysis results with win odds is provided in
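Win statistics analysis example.
| Endpoint | Follow-up | Treatment group | n (%) | Win odds | Confidence interval | p-value |
| --- | --- | --- | --- | --- | --- | --- |
| Kidney hierarchical composite endpoint | 3 years | Active | 118 (15.7) | 1.33 | (1.18, 1.50) | <0.001 |
|  |  | Control | 172 (22.9) |  |  |  |
n (%) shows the number and percentage of participants with a dichotomous event. The percentage is calculated using the number of participants in each treatment group as a denominator.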
A maraca plot for HCEs.
The win odds compares every participant in the active group with every participant in the control group (a Cartesian product) and hence requires these pairwise comparisons in a dataset so that the summary of wins/losses/ties can be calculated. But a dataset with that structure will not be an ADaM-compliant analysis dataset and, in fact, will have a very
Another possible structure for the analysis dataset would be to keep only the number of wins/losses/ties for each participant as a counting response variable. But this would mean having multiple response variables, which is also non-compliant with ADaM principles. Keeping only the wins for each participant plus half the number of ties allows a compliant dataset to be created, but limits the analysis to win odds only. For a win ratio analysis, a different definition of the analysis value would be needed to keep only the number of wins without ties. Importantly, other types of analyses, eg, maraca visualization or ordinal regression, cannot be performed using these analysis values.
We derive an ADaM compliant dataset (see
Schematic representation of relationship of ADHCE source data.
To derive AVAL in ADHCE (
if ADTTE.PARAM=“All-cause death” and ADTTE.CNSR=0 then ADHCE.AVAL = 1*ADTTE.PADY + ADTTE.AVAL,
else if ADTTE.PARAM=“Dialysis” and ADTTE.CNSR=0 then ADHCE.AVAL = 2*ADTTE.PADY + ADTTE.AVAL and so on.
For participants without any dichotomous outcomes, we use the individual rate of change of GFR from ADLB, which can be negative. Regardless of their rate of change, a participant without any outcomes should have a higher AVAL than any other participant in all other categories, as shown in
ADHCE.AVAL = 7*ADTTE.PADY + ADLB.AVAL(PARAM = “Rate of change of GFR”) – m + 1,
where m is the minimum of all values ADLB.AVAL(PARAM = “Rate of change of GFR”) for participants who did not have any of the dichotomous events.
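The derivation rules above, taken as written, can be sketched as follows. The sketch assumes each participant has already been assigned a category (1 to 7) and a value (event day for categories 1 to 6, rate of change of GFR for category 7); the dictionary keys `category` and `value`, the function name, and the toy data are hypothetical.

```python
def derive_aval(participants, pady):
    """Derive ADHCE.AVAL per participant from the formulas in the text.

    participants: list of dicts with keys 'SUBJID', 'category' (1..7) and
                  'value' (event day, or GFR slope when category is 7).
    pady: fixed follow-up duration in days (Primary Analysis Day).
    """
    # m is the minimum GFR slope among participants without any event.
    slopes = [p["value"] for p in participants if p["category"] == 7]
    m = min(slopes) if slopes else 0.0

    aval = {}
    for p in participants:
        if p["category"] == 7:
            # Shift by -m + 1 so that every event-free participant ranks
            # above all participants in categories 1 to 6.
            aval[p["SUBJID"]] = 7 * pady + p["value"] - m + 1
        else:
            aval[p["SUBJID"]] = p["category"] * pady + p["value"]
    return aval


# Toy data with a three-year follow-up (illustrative values only).
participants = [
    {"SUBJID": "001", "category": 1, "value": 300},   # death on day 300
    {"SUBJID": "002", "category": 2, "value": 400},   # dialysis on day 400
    {"SUBJID": "003", "category": 7, "value": -1.5},  # no event
    {"SUBJID": "005", "category": 7, "value": -4.0},  # no event, steepest decline
]
aval = derive_aval(participants, pady=1080)
print(aval)  # {'001': 1380, '002': 2560, '003': 7563.5, '005': 7561.0}
```

The event-free participant with the steepest decline receives 7*PADY + 1, which is strictly above the largest possible value in category six (6*PADY + PADY), so sorting on AVAL alone reproduces the hierarchy.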
The categorization of AVAL, AVALCAT1, contains the type of the event (presented in
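Illustration of analysis dataset ADHCE.
| SUBJID | TRTP | AVAL | AVALCAT1 | AVALCA1N | PADY | PARAM | PARAMCD |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 001 | A | 21 | Death | 0 | 1080 | Kidney Hierarchical composite endpoint | KHCE |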
Illustration of ADHCE Analysis Variable Metadata, Including Analysis Parameter Value.
| Dataset Name | Parameter Identifier | Variable Name | Variable Label | Variable Type | Display Format | Codelist / Controlled Terms | Source / Derivation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ADHCE | *ALL* | SUBJID | Subject Identifier for the Study | Char | $11 |  | ADSL.SUBJID |
| ADHCE | *ALL* | TRTP | Planned Treatment | Char | $2 | A, P | ADSL.TRT01P |
| ADHCE | *ALL* | AVAL | Analysis Value | Num | 3.2 |  | First, identify participants with any of the 1–6 dichotomous events by selecting the PARAM value in ADTTE corresponding to these events. Then select the most severe event of a participant and the corresponding timing of the event. |
| ADHCE | *ALL* | AVALCAT1 | Analysis Value Category 1 | Char | $11 | “Death”, “Dialysis”, “eGFR < 15”, “eGFR >= 57%”, “eGFR >= 50%”, “eGFR >= 40%”, “eGFR” | If the result comes from ADTTE, then set to ADTTE.PARAM |
| ADHCE | *ALL* | AVALCA1N | Analysis Value Category 1 (N) | Num | 3.0 |  | if AVALCAT1 = “Death” then AVALCA1N = PADY |
| ADHCE | *ALL* | PADY | Primary Analysis Day | Num | 3.0 |  | ADSL.PADY |
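Analysis Results Metadata.
| Metadata field | Value |
| --- | --- |
| Display Identifier | Table 14.1.1 |
| Display Name | Primary Endpoint Analysis: Kidney hierarchical composite endpoint by Day 1080 – win statistics |
| Result Identifier | Comparison of treatment group |
| PARAM | Kidney Hierarchical composite endpoint |
| PARAMCD | KHCE |
| Analysis Variable | AVAL |
| Reason | Primary efficacy analysis as prespecified in protocol |
| Dataset | ADHCE |
| Selection Criteria | FASFL=’Y’ and PARAMCD= “KHCE” |
| Documentation | The kidney hierarchical composite endpoint by Day 1080 is analyzed using win odds |
| Programming Statements | PROC FREQ DATA = ADHCE; |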
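Because ADHCE holds a single ordered analysis value per participant, the wins, losses, and ties can be counted from AVAL and TRTP alone, without materializing the pairwise comparison dataset. The sketch below is a Python illustration of that idea, not a translation of the programming statements above; the function name and the toy AVAL values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def win_odds_from_aval(active_aval, control_aval):
    """Win odds computed from one AVAL per participant.

    For each active participant, the number of control participants with a
    lower, equal, or higher AVAL is read off the sorted control values, so
    no explicit pair-by-pair dataset is needed.
    """
    control_sorted = sorted(control_aval)
    n_control = len(control_sorted)
    wins = losses = ties = 0
    for a in active_aval:
        lower = bisect_left(control_sorted, a)   # controls strictly below a
        upper = bisect_right(control_sorted, a)  # controls at or below a
        wins += lower
        ties += upper - lower
        losses += n_control - upper
    odds = (wins + 0.5 * ties) / (losses + 0.5 * ties)
    return wins, losses, ties, odds


# Toy AVAL values for a 1080-day follow-up (illustrative values only).
active_aval = [7563.5, 2560, 1380]
control_aval = [7561.0, 1380, 1101]

result = win_odds_from_aval(active_aval, control_aval)
print(result[:3])  # (6, 2, 1)
```

Counting against sorted values gives the same totals as comparing every pair, which is what makes a participant-level dataset sufficient in the fixed follow-up setting.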
The dataset ADHCE (
Similarly in the R software, the package
The maraca plots are
The most important question in creating BDS datasets is whether to keep a required analysis value as a new variable (column) in the dataset or as a new record (row). A similar rule exists in creating Tidy datasets, which states that the column headers should not be values, but variable names.^{14} In the ADaM implementation, the analysis values are stored in a column called AVAL, and the rules for adding new variables that contain analysis values are stricter. The main rule is to keep all analysis values in AVAL and to group them by the analysis parameter (PARAM) values. There are some permitted deviations, though. For example, the BASE variable contains the values of AVAL corresponding to the baseline (initial timepoint), while AVALCATy (eg, AVALCAT1, AVALCAT2, and so on) and AVALCAyN are parameter-variant categorizations of analysis values into categorical and numerical categories, respectively. Additional variables for analysis can be created only if they follow the fundamental rule of adding new columns to a BDS, according to which a parameter-invariant function of AVAL and BASE (that is, one calculated the same way for all parameters for which the variable is populated in a dataset) can be derived into a new variable if it does not involve a transformation of BASE. For example, the variable CHG (change from baseline), which is derived as CHG = AVAL – BASE, is parameter-invariant and does not include a transformation of BASE, so CHG can be a new column in the analysis dataset. But a transformation of analysis values that does not meet this condition should be added as a new parameter, and AVAL should contain the transformed values.
Therefore, the fundamental principle of BDS is that only one analysis variable per participant can be derived as a column in the dataset (in any other case not covered by the permitted deviations and by the fundamental rule of adding new columns described above), while multiple analysis values need to be retained in the same variable under different analysis parameter values.
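The column-versus-row rule can be illustrated with a small sketch. It assumes a BDS-like dataset held as a list of records; the parameter codes `GFR` and `LOGGFR`, the log transformation, and the function names are hypothetical choices made only to show the two cases.

```python
import math


def add_chg(rows):
    """CHG = AVAL - BASE is parameter-invariant, so it becomes a new column."""
    return [{**row, "CHG": row["AVAL"] - row["BASE"]} for row in rows]


def add_log_parameter(rows, source_paramcd, new_paramcd):
    """A transformation of AVAL and BASE goes in as new rows under a new
    parameter, with AVAL holding the transformed values."""
    new_rows = [
        {
            **row,
            "PARAMCD": new_paramcd,
            "AVAL": math.log(row["AVAL"]),
            "BASE": math.log(row["BASE"]),
        }
        for row in rows
        if row["PARAMCD"] == source_paramcd
    ]
    return rows + new_rows


# Toy records (illustrative values only).
rows = [
    {"SUBJID": "001", "PARAMCD": "GFR", "AVAL": 54.0, "BASE": 60.0},
    {"SUBJID": "002", "PARAMCD": "GFR", "AVAL": 48.0, "BASE": 50.0},
]

long_rows = add_chg(add_log_parameter(rows, "GFR", "LOGGFR"))
print(len(long_rows))  # 4
```

The dataset keeps a single AVAL column throughout: the change from baseline widens it by one column, whereas the transformed values lengthen it by one row per original record.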
An ADaM dataset is a particular type of analysis dataset that follows the ADaM fundamental principles defined in the ADaM document and is compliant with ADaM-defined structures or adheres as closely as possible to the ADaMIG variable naming and other conventions.^{13} ADTTE (time-to-event analysis dataset)^{21} is a special-case exception. It does not strictly follow the fundamental principle of the basic data structure, as it essentially has two analysis values: the length of the follow-up (AVAL variable) and a censoring variable showing whether an event happened during that follow-up (CNSR variable). This flexibility allows two dependent variables to be used in statistical modelling. CDISC standardization of this dataset makes it a widely used and ADaM-compliant dataset. Time-to-event analyses are common in clinical trials (including as a primary analysis), hence standardization of this dataset was important and is helpful for implementation.
Following the BDS fundamental principles for hierarchical composite endpoints in the absence of a fixed follow-up is difficult, since participants are compared over their shared follow-up.^{15} This leads to transitivity issues^{16,24} and, consequently, participants cannot be compared on a common clinical scale; hence it is impossible to derive one analysis value per participant. All relevant events of the participant, along with the maximum length of follow-up for each participant, therefore need to be retained as analysis values. Which of these values contributes to the analysis depends on which participants are being compared. This may lead to multiple analysis values per participant and hence to a non-compliant analysis dataset. An analysis dataset for win statistics analyses with variable follow-up will therefore either follow the BDS principles but not be analysis-ready (multiple data transformations would be needed before win statistics can be calculated), or be analysis-ready but not ADaM-compliant.
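A small constructed example shows how comparing over the shared follow-up can break transitivity. The sketch assumes a simplified two-level hierarchy (death, then hospitalization, later is better) and three hypothetical participants; it illustrates the issue and is not the comparison algorithm of any cited method.

```python
def compare(p, q, hierarchy=("death", "hosp")):
    """Compare p with q over their shared follow-up.

    Returns 1 if p wins, -1 if q wins, and 0 for a tie. Events that occur
    after the shared follow-up (the shorter of the two) are ignored.
    """
    shared = min(p["followup"], q["followup"])
    for event in hierarchy:
        tp, tq = p.get(event), q.get(event)
        p_had = tp is not None and tp <= shared
        q_had = tq is not None and tq <= shared
        if p_had and q_had:
            if tp != tq:
                return 1 if tp > tq else -1  # later event is better
        elif p_had:
            return -1
        elif q_had:
            return 1
        # Otherwise undecided at this level: move down the hierarchy.
    return 0


# Three hypothetical participants with differing follow-up lengths (days).
A = {"followup": 100, "hosp": 30}
B = {"followup": 100, "death": 80}
C = {"followup": 50, "hosp": 40}

cycle = (compare(A, B), compare(B, C), compare(C, A))
print(cycle)  # (1, 1, 1)
```

A beats B (B died within 100 days), B beats C (within the shared 50 days B had no event, while C was hospitalized), and C beats A (both were hospitalized within 50 days, C later). No single value per participant can reproduce this cycle, which is why one AVAL per participant cannot be derived without a fixed follow-up.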
The presence of a fixed follow-up is of course a restriction, but it solves several statistical issues (for example, the analysis results can be interpreted on a participant level, which may be more clinically meaningful) and, as described in this paper, solves the issue of having multiple analysis variables as columns, hence creating the possibility to derive a dataset that conforms to the fundamental principles of ADaM and is analysis-ready for multiple analyses.
We have provided the principles of constructing an analysis dataset for hierarchical composite endpoints in a fixed follow-up setting. As an example, we have used the novel kidney HCE, but the same principles can be applied to HCEs in different therapeutic areas as well. We demonstrated that the constructed analysis dataset conforms to the fundamental principles of BDS, and so it is an ADaM-compliant dataset. It is analysis-ready for multiple analyses, including generating win statistics and visualization using maraca plots. The purpose of this paper is to highlight the principles and to provide an example for content illustration with only key variables included. The constructed ADHCE dataset should not be considered a standardization of the structure and appearance of the dataset. In line with the general note in CDISC guidance documents, an eventual implementation of the dataset may follow the same principles but have a different display and contents.
Here we want to highlight the growing importance of hierarchical composite endpoints in clinical trials, including their use as a primary endpoint, and we urge the clinical community and CDISC to work together to derive a standardized analysis dataset for hierarchical composite endpoints and for win statistics analyses in general, similar to the ADTTE dataset. We hope that this paper serves as the first modest step in this direction.
We would like to thank
The contents of this paper are the work of the authors and do not necessarily represent the opinions, recommendations, or practices of AstraZeneca. Any brand and product names are trademarks of their respective companies.
The authors have no competing interests to declare.