<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2694-1473</journal-id>
<journal-title-group>
<journal-title>Journal of the Society for Clinical Data Management</journal-title>
</journal-title-group>
<issn pub-type="epub">2694-1473</issn>
<publisher>
<publisher-name>Society for Clinical Data Management</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.47912/jscdm.265</article-id>
<article-categories>
<subj-group>
<subject>Original research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Basic Data Structure for Hierarchical Composite Endpoints: An Application to Kidney Disease Trials</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Gasparyan</surname>
<given-names>Samvel B.</given-names>
</name>
<email>samvel.gasparyan@astrazeneca.com</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Major</surname>
<given-names>Nicole</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>B&#228;ckberg</surname>
<given-names>Christoffer</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Ravikiran</surname>
<given-names>Srivathsa</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wani</surname>
<given-names>Parag</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Karpefors</surname>
<given-names>Martin</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Late-Stage Development, Cardiovascular, Renal, and Metabolism, BioPharmaceuticals R&amp;D, AstraZeneca, Gothenburg, SE</aff>
<aff id="aff-2"><label>2</label>Late-Stage Development, Cardiovascular, Renal, and Metabolism, BioPharmaceuticals R&amp;D, AstraZeneca, Gaithersburg, US</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2024-02-06">
<day>06</day>
<month>02</month>
<year>2024</year>
</pub-date>
<pub-date pub-type="collection">
<year>2024</year>
</pub-date>
<volume>4</volume>
<issue>1</issue>
<elocation-id>2</elocation-id>
<history>
<date date-type="received" iso-8601-date="2023-08-09">
<day>09</day>
<month>08</month>
<year>2023</year>
</date>
<date date-type="accepted" iso-8601-date="2024-01-10">
<day>10</day>
<month>01</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2024 The Author(s)</copyright-statement>
<copyright-year>2024</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>SCDM publishes JSCDM content in an open access manner under an Attribution-Non-Commercial-ShareAlike (CC BY-NC-SA) license. This license lets others remix, adapt, and build upon the work non-commercially, as long as they credit SCDM and the author and license their new creations under the identical terms. See <uri xlink:href="https://creativecommons.org/licenses/by-nc-sa/4.0/">https://creativecommons.org/licenses/by-nc-sa/4.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://www.jscdm.org/articles/10.47912/jscdm.265/"/>
<abstract>
<sec>
<title>Introduction:</title>
<p>Hierarchical composite endpoints (HCE), including the recently introduced kidney HCE, are complex endpoints that are usually analyzed by win statistics and are visualized using novel maraca plots. Because of the novelty of HCE and the complexity of their analysis using win statistics, the construction of analysis datasets that conform to the fundamental principles put forward by the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model (ADaM) is not straightforward.</p>
</sec>
<sec>
<title>Objectives:</title>
<p>We show that in the case of a fixed follow-up it is possible to construct an analysis dataset that conforms to Basic Data Structure principles and is analysis-ready for conducting multiple analyses, including win statistics generation and visualization of HCE using maraca plots.</p>
</sec>
<sec>
<title>Methods:</title>
<p>We use theoretical justification for the fixed follow-up designs to show that the pair-wise comparisons of participants for the win statistics analyses can be reduced to a participant-level ranking, and use the fundamental principles put forward by CDISC and Tidy principles of the data science community to derive an ADaM-compliant dataset.</p>
</sec>
<sec>
<title>Results:</title>
<p>In the setting of fixed follow-up designs, we construct an ADaM-compliant dataset for conducting win statistics analyses and visualization using maraca plots, with the required metadata traceability.</p>
</sec>
<sec>
<title>Conclusions:</title>
<p>Based on the growing importance of HCEs in clinical trials, and the difficulty in creating ADaM-compliant datasets for these analyses, we provide principles to create such datasets, to prompt the clinical community and CDISC to work towards standardization of analysis datasets for hierarchical composite endpoints.</p>
</sec>
</abstract>
<kwd-group>
<kwd>hierarchical composite endpoints</kwd>
<kwd>win statistics</kwd>
<kwd>maraca plots</kwd>
<kwd>CDISC ADaM</kwd>
<kwd>Tidy data</kwd>
<kwd>basic data structures</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>Introduction</title>
<p>Hierarchical composite endpoints (HCEs) are complex endpoints<sup><xref ref-type="bibr" rid="B1">1</xref>,<xref ref-type="bibr" rid="B2">2</xref>,<xref ref-type="bibr" rid="B3">3</xref>,<xref ref-type="bibr" rid="B4">4</xref></sup> that are analyzed using win statistics and visualized using maraca plots.<sup><xref ref-type="bibr" rid="B6">6</xref></sup> An HCE has a hierarchical structure and uses the most clinically severe event of a participant in studies with a fixed follow-up design. This results in an ordinal endpoint, similar to the severity scale endpoints. As a result of its hierarchical nature, an HCE can combine outcomes of different types into a composite, for example, clinical events of death and hospitalization with numerical laboratory variables or symptom summary scores.<sup><xref ref-type="bibr" rid="B7">7</xref>,<xref ref-type="bibr" rid="B8">8</xref></sup> In addition, the clinical events may contribute to the composite with the time of the corresponding event, as an additional layer of severity. This means that participants having an event of the same severity are compared using the timing of the event, with a later event signifying a better outcome. Overall, the ordering is done so that a higher order means a better outcome. A characteristic of ordinal endpoints is that the concepts of better or worse are defined but not the quantitative magnitude of how much better or worse (unlike a continuous endpoint). HCEs are implemented in different therapeutic areas: COVID-19,<sup><xref ref-type="bibr" rid="B9">9</xref>,<xref ref-type="bibr" rid="B10">10</xref></sup> heart failure,<sup><xref ref-type="bibr" rid="B8">8</xref>,<xref ref-type="bibr" rid="B11">11</xref></sup> and chronic kidney disease (CKD),<sup><xref ref-type="bibr" rid="B7">7</xref></sup> to name a few.</p>
<p>Because of the novelty of HCEs and the complexity of the analyses involving them, the construction of analysis datasets conforming to the fundamental principles put forward by the Clinical Data Interchange Standards Consortium (CDISC) Analysis Data Model (ADaM)<sup><xref ref-type="bibr" rid="B12">12</xref></sup> is not straightforward, nor is it apparent whether it is possible. These fundamental principles were suggested to standardize datasets across the various stakeholders involved in the conduct, analysis, and reporting of clinical trials in order to achieve transparency in analyses, as well as in communication and review.<sup><xref ref-type="bibr" rid="B13">13</xref></sup> ADaM is one of the implementations of these fundamental principles; other implementations of similar principles are known to the data science community as Tidy data principles.<sup><xref ref-type="bibr" rid="B14">14</xref></sup></p>
<p>Win statistics<sup><xref ref-type="bibr" rid="B5">5</xref></sup> (win ratio,<sup><xref ref-type="bibr" rid="B15">15</xref></sup> win odds<sup><xref ref-type="bibr" rid="B16">16</xref>,<xref ref-type="bibr" rid="B17">17</xref></sup> or win ratio with ties,<sup><xref ref-type="bibr" rid="B18">18</xref>,<xref ref-type="bibr" rid="B19">19</xref></sup> net benefit<sup><xref ref-type="bibr" rid="B20">20</xref></sup>) are statistical methods for analyzing HCE and are based on the principle of comparing each participant in the active group with each participant in the control group using multiple outcomes and differing follow-ups for these outcomes. Construction of an ADaM-compliant analysis dataset is therefore a challenge facing every clinical trialist involved in analysis and reporting in a regulatory setting where such data structures are a requirement.</p>
<p>Using theoretical justification in the case of a fixed follow-up, we show that it is possible to construct an analysis dataset, ADHCE, that conforms to ADaM principles, follows the Basic Data Structure (BDS), and is analysis-ready for conducting win statistics analyses. In other words, this dataset can be used for performing the analyses without having to manipulate data first. The created BDS for the HCE analysis therefore allows the creation of the analysis data to be separated from the generation of the analysis results (as is the intention of ADaM datasets), even for analyses as complex as win statistics calculations.</p>
<p>Traceability between analysis data values and their specific predecessor records is provided in the form of data point traceability. Traceability facilitates transparency of analysis conduct and allows for its replication. Detailed traceability is particularly important for the HCE derivation as it involves multiple outcomes derived through complex data manipulations from different datasets. Construction of a single ADHCE dataset that follows the BDS and is analysis-ready is important for clear communication of results and software development for analysis and reporting.</p>
</sec>
<sec>
<title>Background</title>
<sec>
<title>Basic data structures for common analysis methods</title>
<p>An ADaM dataset is a particular type of analysis dataset that follows the ADaM fundamental principles defined in the ADaM<sup><xref ref-type="bibr" rid="B12">12</xref></sup> and either complies with the ADaM-defined structures or follows the ADaMIG variable naming and other conventions as closely as possible.<sup><xref ref-type="bibr" rid="B13">13</xref></sup> Currently, ADaM has three structures: Subject Level Analysis Dataset (ADSL), Basic Data Structure (BDS), and Occurrence Data Structure (OCCDS). An ADaM dataset contains both source and derived data; it is therefore important to clearly document the variable derivations and how to use them for obtaining the analysis results. ADSL is a required, participant-level dataset that contains participants&#8217; baseline and demographic characteristics, population flags that indicate the participant&#8217;s inclusion in different analysis populations, planned and actual treatment variables for each period, and important dates. The BDS datasets contain endpoints and data that vary over time during the course of a study and are organized as one or more records per subject per analysis parameter per analysis timepoint. It is often optimal to have more than one BDS analysis dataset, but not necessarily one dataset per analysis. The BDS datasets are the main data structures used for complex statistical analyses but are not designed to support analysis of incidence of adverse events or other occurrence data. Analysis of such data is supported in the OCCDS. For commonly used analysis methods (eg, analysis of variance or covariance, logistic regression and so on) the BDS implementation is straightforward. A more complex analysis method, time-to-event analysis, has its own standardized BDS, ADTTE, that is well developed<sup><xref ref-type="bibr" rid="B21">21</xref></sup> and widely used. Although the BDS supports most statistical analyses, it does not support all of them. For example, it does not support simultaneous analysis of multiple dependent (response/outcome) variables or a correlation analysis across a range of response variables.</p>
<p>In the ADaM design, at a minimum, the analysis datasets should contain the data needed for the recreation of specific statistical analyses. There is no requirement that every analysis has its own dataset; rather, a single dataset can support multiple analyses to achieve the optimal number of analysis datasets. Each analysis dataset should contain all the analysis-enabling variables required for performing the statistical analysis it is designed to support (it can even contain supportive variables not needed for the analysis but that are of interest for traceability purposes). This can lead to redundancy, that is, the same data appearing in multiple datasets, but this is necessary for having analysis-ready datasets. Analysis-ready does not mean that the results can be generated in a single statistical procedure, but rather that each of the summary statistics included in the results can be derived with minimal programming effort using standard statistical procedures with the dataset as input.</p>
<p>We briefly describe the fundamental principles governing the structure of BDS in connection to Tidy data principles and discuss the structure of ADLB (analysis datasets for laboratory values) that is used for the ANCOVA-type analyses and ADTTE for time-to-event analyses, as these two datasets, alongside the participant-level ADSL, are the source datasets for ADHCE. Then, following the BDS principles, we construct the ADHCE dataset, which is analysis-ready for multiple analyses (with its metadata traceability describing the source datasets and variables) and provide the minimal steps required to perform these analyses using ADHCE.</p>
<p>The methodology provided here is applicable only for fixed follow-up settings. For settings without fixed follow-up, we explore the challenges associated with the derivation of an analysis dataset that conforms to the BDS principles.</p>
</sec>
</sec>
<sec>
<title>Methods</title>
<sec>
<title>The kidney hierarchical composite endpoint: the definition and the algorithm for construction</title>
<p>Consider the case of two treatment groups, with active and control treatments, and assume that all participants have the same follow-up and there are no dropouts, meaning all participants were followed for all events of interest until the end of the fixed follow-up. The kidney HCE<sup><xref ref-type="bibr" rid="B4">4</xref>,<xref ref-type="bibr" rid="B7">7</xref></sup> has the following construction: during a fixed follow-up, participants are followed for the six dichotomous events in the hierarchy described in <xref ref-type="table" rid="T1">Table 1</xref>.</p>
<table-wrap id="T1">
<label>Table 1</label>
<caption>
<p>The outcomes in the kidney HCE.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Rank</bold></td>
<td align="left" valign="top"><bold>Outcome</bold></td>
<td align="left" valign="top"><bold>Subcategorization</bold></td>
<td align="left" valign="top"><bold>Favorability</bold></td>
<td align="left" valign="top"><bold>Source dataset</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">1.</td>
<td align="left" valign="top">Death</td>
<td align="left" valign="top">Timing (later is better)</td>
<td align="left" valign="top">Worst</td>
<td align="left" valign="top">ADTTE</td>
</tr>
<tr>
<td align="left" valign="top">2.</td>
<td align="left" valign="top">Dialysis</td>
<td align="left" valign="top">Timing (later is better)</td>
<td align="left" valign="top"></td>
<td align="left" valign="top">ADTTE</td>
</tr>
<tr>
<td align="left" valign="top">3.</td>
<td align="left" valign="top">Sustained eGFR &lt;15</td>
<td align="left" valign="top">Timing (later is better)</td>
<td align="left" valign="top"></td>
<td align="left" valign="top">ADTTE</td>
</tr>
<tr>
<td align="left" valign="top">4.</td>
<td align="left" valign="top">Sustained &gt;=57% decline in eGFR</td>
<td align="left" valign="top">Timing (later is better)</td>
<td align="left" valign="top"></td>
<td align="left" valign="top">ADTTE</td>
</tr>
<tr>
<td align="left" valign="top">5.</td>
<td align="left" valign="top">Sustained &gt;=50% decline in eGFR</td>
<td align="left" valign="top">Timing (later is better)</td>
<td align="left" valign="top"></td>
<td align="left" valign="top">ADTTE</td>
</tr>
<tr>
<td align="left" valign="top">6.</td>
<td align="left" valign="top">Sustained &gt;=40% decline in eGFR</td>
<td align="left" valign="top">Timing (later is better)</td>
<td align="left" valign="top"></td>
<td align="left" valign="top">ADTTE</td>
</tr>
<tr>
<td align="left" valign="top">7.</td>
<td align="left" valign="top">Individual rate of change of GFR</td>
<td align="left" valign="top">Actual values (higher is better)</td>
<td align="left" valign="top">Best</td>
<td align="left" valign="top">ADLB</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>eGFR = estimated glomerular filtration rate.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>If a participant experiences death, they are ranked in category one and the timing of the death is used to determine the ranking within that category, with an earlier death being a worse outcome (a lower rank is assigned). Otherwise, if the participant is alive at the end of the follow-up, then the next event in the hierarchy is considered for ranking this participant, and so on. If the participant did not experience any of the six events, then they fall into category seven, in which the individual rate of change of glomerular filtration rate (GFR) is used to further rank the participants, with a lower rate of kidney decline being a better outcome (ranked higher).</p>
<p>The time-to-event (TTE) analysis dataset, ADTTE, is an ADaM BDS dataset that includes additional TTE variables designed for survival analyses. The distinguishing feature of survival data is that at the end of the observation period the event of interest may not have occurred for all subjects. The single ADTTE dataset can support multiple survival analyses, for example, Cox proportional hazards regression, the log-rank test and so on. For a given analysis parameter value (PARAM or the short name of the analysis parameter value PARAMCD), ADTTE has one record per subject and the two variables used in all models of survival analyses: the analysis value, AVAL, which shows the timepoint until which the participant was observed for the event of interest, and the censoring variable, CNSR, which indicates whether or not the event of interest occurred. The variable ADTTE.AVAL therefore shows either the timing of the occurrence of the event (if CNSR=0) or the length of the fixed follow-up duration for participants without an event (CNSR=1). ADTTE should also include the subject identifier (SUBJID) and the treatment variable showing planned treatment allocation (TRTP) in a randomized, controlled trial. The fixed follow-up duration is stored in Primary Analysis Day (PADY), which is inherited from ADSL, since this variable is a common analysis date for all participants and is needed across multiple datasets. The ADTTE dataset contains the six dichotomous events of interest (<xref ref-type="table" rid="T1">Table 1</xref>), each having a unique PARAM value.</p>
<p>The BDS for laboratory data, ADLB, has one row per subject per visit per analysis parameter value and contains GFR measurements under a specific analysis parameter, PARAM, and the variables AVISIT, which indicates the timepoint of measurements (categorical variable with visit names); analysis day ADY for the number of days relative to an anchor date (in this case, the date of randomization); the analysis values AVAL, which contain the GFR measurements at each visit; and the BASE variable for the baseline GFR values for each subject. In addition, the individual rate of change of GFR over time can be derived (see the supplementary material)<sup><xref ref-type="bibr" rid="B7">7</xref></sup> in ADLB.AVAL corresponding to a new analysis parameter value (PARAM = &#8220;Rate of change of GFR&#8221;).</p>
</sec>
<sec>
<title>An HCE analysis results metadata &#8211; win statistics and maraca plot</title>
<p>An HCE can be analyzed using the methods for ordinal endpoints, for example, rank ANCOVA,<sup><xref ref-type="bibr" rid="B22">22</xref></sup> ordinal logistic regression<sup><xref ref-type="bibr" rid="B23">23</xref></sup> or win statistics.<sup><xref ref-type="bibr" rid="B5">5</xref></sup> We consider the win odds<sup><xref ref-type="bibr" rid="B17">17</xref></sup> but the same principles can be applied to other win statistics. Based on the hierarchy defined above, each participant in the active group is compared with each participant in the control group using each participant&#8217;s clinically most severe outcome. Hence, first we select the clinically most severe outcomes of the participants from the given fixed follow-up duration, then compare participants based on those outcomes. If the participant in the active group has a less severe outcome than the participant in the control group, then this is a &#8220;win&#8221; for the participant in the active group. Forming all possible comparisons of participants in the active group with participants in the control group, we derive the total number of wins, losses, and ties of the active group. The win odds of the active group against control is formed as the total number of wins (plus half of all ties) divided by the total number of losses (plus the other half of the ties). Win odds greater (less) than 1.0 indicates a treatment effect in favor of the active (control) group, while win odds of 1.0 indicates no difference between groups.</p>
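<p>The pair-wise definition of the win odds given above can be illustrated with a minimal sketch. It assumes that each participant&#8217;s outcome has already been reduced to a single ordinal score for which higher means better; the function name and the example scores are hypothetical and are not taken from the article&#8217;s data.</p>
<preformat><![CDATA[
```python
from itertools import product


def pairwise_win_odds(active, control):
    """Compare every active score with every control score (higher is better).

    Returns the win odds and the (wins, losses, ties) counts for the active group.
    Raises ZeroDivisionError if the active group wins every comparison.
    """
    wins = losses = ties = 0
    for a, c in product(active, control):
        if a > c:
            wins += 1
        elif a < c:
            losses += 1
        else:
            ties += 1
    # Ties are split equally between the numerator and the denominator.
    odds = (wins + 0.5 * ties) / (losses + 0.5 * ties)
    return odds, (wins, losses, ties)


# Hypothetical ordinal scores for three participants per group.
odds, counts = pairwise_win_odds([5, 3, 3], [4, 3, 1])
print(odds, counts)
```
]]></preformat>
<p>In this toy example there are nine comparisons: five wins, two losses, and two ties for the active group, so the win odds is (5 + 1) / (2 + 1) = 2.0.</p>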
<p>To visualize HCEs, maraca plots (so named after their visual similarity to the musical instrument) were introduced.<sup><xref ref-type="bibr" rid="B6">6</xref></sup> On the maraca plot for a kidney HCE, the x-axis is divided into the seven HCE component categories in severity order from left to right. The six TTE components are visualized with adjoined cumulative Kaplan-Meier plots. For the continuous component, the x-axis corresponds to the annualized rate of change of GFR and a beneficial effect on the continuous component is characterized by a shift to the right. The associated vertical dashed lines show the median values for the annualized rates of change of GFR among participants without dichotomous outcomes in the two treatment groups. Each participant contributes to the HCE with one event, and the width of each category (dichotomous or continuous outcomes) corresponds to the percentage of that category in the composite. An illustration of analysis results with win odds is provided in <xref ref-type="table" rid="T2">Table 2</xref>, with the corresponding maraca plot in <xref ref-type="fig" rid="F1">Figure 1</xref>.</p>
<table-wrap id="T2">
<label>Table 2</label>
<caption>
<p>Win statistics analysis example.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top" rowspan="2"><bold>Endpoint</bold></td>
<td align="left" valign="top" rowspan="2"><bold>Timepoint</bold></td>
<td align="left" valign="top" rowspan="2"><bold>Group</bold></td>
<td align="left" valign="top" rowspan="2"><bold>Participants with event n (%)</bold></td>
<td align="left" valign="top" colspan="3"><bold>Comparison of treatment groups</bold></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Estimate</bold></td>
<td align="left" valign="top"><bold>95% CI</bold></td>
<td align="left" valign="top"><bold>p-value</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Kidney hierarchical composite endpoint</td>
<td align="left" valign="top">3 years</td>
<td align="left" valign="top">Active<break/>N = 750</td>
<td align="left" valign="top">118 (15.7)</td>
<td align="left" valign="top">1.33</td>
<td align="left" valign="top">(1.18, 1.50)</td>
<td align="left" valign="top">&lt;0.001</td>
</tr>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"></td>
<td align="left" valign="top">Control<break/>N = 750</td>
<td align="left" valign="top">172 (22.9)</td>
<td align="left" valign="top"></td>
<td align="left" valign="top"></td>
<td align="left" valign="top"></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>n (%) shows the number and percentage of participants with a dichotomous event. The percentage is calculated using the number of participants in each treatment group as a denominator.</p></fn>
</table-wrap-foot>
</table-wrap>
<fig id="F1">
<label>Figure 1</label>
<caption>
<p>A maraca plot for HCEs.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/jscdm-4-1-265-g1.png"/>
</fig>
</sec>
</sec>
<sec>
<title>Results</title>
<sec>
<title>ADHCE as an analysis-ready BDS</title>
<p>The win odds compares every participant in the active group with every participant in the control group (a Cartesian product) and hence requires these pair-wise comparisons in a dataset so that the summary of wins/losses/ties can be calculated. But a dataset with that structure would not be an ADaM-compliant analysis dataset and, in fact, would have a very <italic>messy</italic> structure according to Tidy principles, since each row would not be an observation but a combination of observations from two treatment groups. The data science community uses Tidy principles,<sup><xref ref-type="bibr" rid="B14">14</xref></sup> according to which each variable should form a column, each observation should form a row, and each type of observational unit should form a dataset. Any violation of these principles results in <italic>messy</italic> datasets, for example, if column headers are values rather than variable names, or if variables are stored in both rows and columns. The Tidy principles are similar to the BDS principles, but they also describe in detail how these principles can be violated. A dataset of pair-wise comparisons would also have two columns representing the treatment groups, so that the column names would be analysis values (because the treatment group is used as an analysis value), violating another Tidy principle.</p>
<p>Another possible structure for the analysis dataset would be to keep only the number of wins/losses/ties for each participant as a counting response variable. But this would mean having multiple response variables, which is also non-compliant with ADaM principles. Keeping only the wins for each participant plus half the number of ties allows a compliant dataset to be created, but limits the analysis to the win odds. For a win ratio analysis, a different definition of the analysis value would be needed to keep only the number of wins without ties. Importantly, other types of analyses, eg, maraca visualization or ordinal regression, cannot be performed using these analysis values.</p>
<p>We derive an ADaM-compliant dataset (see <xref ref-type="fig" rid="F2">Figure 2</xref>), ADHCE, with a single analysis variable that is analysis-ready for multiple analyses. The theoretical justification for this is that the number of wins of a participant can be derived using the rank of the participant in the overall dataset (both treatment groups combined) and the rank of that participant in their own treatment group.<sup><xref ref-type="bibr" rid="B17">17</xref>,<xref ref-type="bibr" rid="B18">18</xref></sup> Therefore, the participant-level ranking from the worst outcome to the most favorable can help to create an analysis value for the win statistics calculation. This methodology is applicable only in the case of fixed follow-up durations: when follow-up differs between participants, comparison issues known as transitivity issues<sup><xref ref-type="bibr" rid="B4">4</xref>,<xref ref-type="bibr" rid="B24">24</xref></sup> may arise, so that comparisons are no longer on the participant level (it becomes impossible to rank participants using their outcomes).</p>
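<p>The rank-based justification above can be checked numerically. The sketch below assumes that ties are handled with midranks (average ranks), in which case the difference between a participant&#8217;s overall rank and their within-group rank equals their number of wins plus half their number of ties against the other group. The function names and example scores are hypothetical.</p>
<preformat><![CDATA[
```python
def midranks(values):
    """Return 1-based ranks in input order, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def win_odds_from_ranks(active, control):
    """Win odds of active vs control using only participant-level ranks.

    Raises ZeroDivisionError if the active group wins every comparison.
    """
    n_active, n_control = len(active), len(control)
    overall = midranks(list(active) + list(control))[:n_active]
    within = midranks(list(active))
    # Overall rank minus within-group rank = wins + half of ties vs control.
    scores = [r_all - r_own for r_all, r_own in zip(overall, within)]
    win_probability = sum(scores) / (n_active * n_control)
    return win_probability / (1 - win_probability), scores


odds, scores = win_odds_from_ranks([5, 3, 3], [4, 3, 1])
print(odds, scores)
```
]]></preformat>
<p>For these hypothetical scores, the per-participant values are 3, 1.5, and 1.5 (six out of nine comparisons in total), giving a win odds of 2.0 up to floating-point rounding, without forming a dataset of pairs.</p>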
<fig id="F2">
<label>Figure 2</label>
<caption>
<p>Schematic representation of relationship of ADHCE source data.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="figures/jscdm-4-1-265-g2.png"/>
</fig>
<p>To derive AVAL in ADHCE (<xref ref-type="table" rid="T4">Table 4</xref>), first identify participants with any of the dichotomous outcomes by selecting the PARAM value in ADTTE corresponding to each event (for example, selecting ADTTE.PARAM=&#8220;All-cause death&#8221; and ADTTE.CNSR=0). Then select the most severe event of a participant and the corresponding timing of the event from ADTTE.AVAL. If ADTTE.PADY shows the length of the fixed follow-up, then the algorithm for AVAL for each participant is shown in <bold>Box 1</bold>.</p>
<boxed-text>
<caption><p><bold>Box 1:</bold> Derivation of ADHCE.AVAL for dichotomous outcomes</p></caption>
<list list-type="bullet">
<list-item><p>if ADTTE.PARAM=&#8220;All-cause death&#8221; and ADTTE.CNSR=0 then ADHCE.AVAL = 1*ADTTE.PADY + ADTTE.AVAL,</p></list-item>
<list-item><p>else if ADTTE.PARAM=&#8220;Dialysis&#8221; and ADTTE.CNSR=0 then ADHCE.AVAL = 2*ADTTE.PADY + ADTTE.AVAL and so on.</p></list-item>
</list>
</boxed-text>
<p>For participants without any dichotomous outcomes, we use the individual rate of change of GFR from ADLB, which can be negative. Regardless of their rate of change, a participant without any dichotomous outcome should have a higher AVAL than every participant in the other categories, as shown in <bold>Box 2</bold>.</p>
<boxed-text>
<caption><p><bold>Box 2:</bold> Derivation of ADHCE.AVAL for the continuous outcome</p></caption>
<list list-type="bullet">
<list-item><p>ADHCE.AVAL = 7*ADTTE.PADY + ADLB.AVAL(PARAM = &#8220;Rate of change of GFR&#8221;) &#8211; m + 1,</p></list-item>
<list-item><p>where m is the minimum of all values ADLB.AVAL(PARAM = &#8220;Rate of change of GFR&#8221;) for participants who did not have any of the dichotomous events.</p></list-item>
</list>
</boxed-text>
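<p>The derivations in Box 1 and Box 2 can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the record layout (a dictionary of event times keyed by the rank from Table 1, and a <monospace>gfr_slope</monospace> field) and the function name are hypothetical, and the multipliers for categories three to six are inferred from the &#8220;and so on&#8221; in Box 1.</p>
<preformat><![CDATA[
```python
def derive_adhce_aval(participants, pady):
    """Derive ADHCE.AVAL per participant following Box 1 and Box 2 as written.

    participants: list of dicts with hypothetical keys
        'subjid'    - subject identifier
        'events'    - {rank: event time} for observed dichotomous events
                      (rank 1 = death ... rank 6 = sustained >=40% decline)
        'gfr_slope' - individual rate of change of GFR (may be negative)
    pady: length of the fixed follow-up (ADTTE.PADY)
    """
    event_free = [p for p in participants if not p["events"]]
    # m in Box 2: minimum slope among participants without dichotomous events.
    m = min(p["gfr_slope"] for p in event_free) if event_free else 0.0
    aval = {}
    for p in participants:
        if p["events"]:
            rank = min(p["events"])  # most severe event has the lowest rank
            aval[p["subjid"]] = rank * pady + p["events"][rank]
        else:
            # Shifting by m and adding 1 keeps every value above 7 * pady.
            aval[p["subjid"]] = 7 * pady + p["gfr_slope"] - m + 1
    return aval


# Hypothetical participants with a fixed follow-up of 1080 days.
example = [
    {"subjid": "001", "events": {1: 21}, "gfr_slope": None},
    {"subjid": "002", "events": {2: 500, 4: 300}, "gfr_slope": None},
    {"subjid": "003", "events": {}, "gfr_slope": -4.0},
    {"subjid": "004", "events": {}, "gfr_slope": -1.5},
]
print(derive_adhce_aval(example, pady=1080))
```
]]></preformat>
<p>Because an event time cannot exceed the fixed follow-up, the values in category k lie in the interval from k*PADY to (k+1)*PADY, so sorting by the resulting value reproduces the hierarchy: worst outcomes receive the lowest values and event-free participants the highest.</p>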
<p>The categorization of AVAL, AVALCAT1, contains the type of the event (presented in <xref ref-type="table" rid="T1">Table 1</xref>), while AVALCA1N is the numeric order of this categorization. As part of the traceability, we provide an illustration of the ADHCE dataset (<xref ref-type="table" rid="T3">Table 3</xref>) and the metadata of analysis variables (including analysis parameter values) included in ADHCE (<xref ref-type="table" rid="T4">Table 4</xref>). For full traceability between the results, the analysis datasets, and the source datasets, the analysis results metadata is presented in <xref ref-type="table" rid="T5">Table 5</xref> (for results in <xref ref-type="table" rid="T2">Table 2</xref> and <xref ref-type="fig" rid="F1">Figure 1</xref>).</p>
<table-wrap id="T3">
<label>Table 3</label>
<caption>
<p>Illustration of analysis dataset ADHCE.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>SUBJID</bold></td>
<td align="left" valign="top"><bold>TRTP</bold></td>
<td align="left" valign="top"><bold>AVAL</bold></td>
<td align="left" valign="top"><bold>AVALCAT1</bold></td>
<td align="left" valign="top"><bold>AVALCA1N</bold></td>
<td align="left" valign="top"><bold>PADY</bold></td>
<td align="left" valign="top"><bold>PARAM</bold></td>
<td align="left" valign="top"><bold>PARAMCD</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">001</td>
<td align="left" valign="top">A</td>
<td align="left" valign="top">21</td>
<td align="left" valign="top">Death</td>
<td align="left" valign="top">0</td>
<td align="left" valign="top">1080</td>
<td align="left" valign="top">Kidney Hierarchical composite endpoint</td>
<td align="left" valign="top">KHCE</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T4">
<label>Table 4</label>
<caption>
<p>Illustration of ADHCE Analysis Variable Metadata, Including Analysis Parameter Value.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Dataset Name</bold></td>
<td align="left" valign="top"><bold>Parameter Identifier</bold></td>
<td align="left" valign="top"><bold>Variable Name</bold></td>
<td align="left" valign="top"><bold>Variable Label</bold></td>
<td align="left" valign="top"><bold>Variable Type</bold></td>
<td align="left" valign="top"><bold>Display Format</bold></td>
<td align="left" valign="top"><bold>Codelist/Controlled Terms</bold></td>
<td align="left" valign="top"><bold>Source/Derivation</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><italic>file name of the analysis dataset</italic></td>
<td align="left" valign="top"><italic>PARAMCD or *ALL* or *DEFAULT*</italic></td>
<td align="left" valign="top"><italic>name</italic></td>
<td align="left" valign="top"><italic>description</italic></td>
<td align="left" valign="top"><italic>type</italic></td>
<td align="left" valign="top"><italic>display information</italic></td>
<td align="left" valign="top"><italic>valid values or codes and decodes</italic></td>
<td align="left" valign="top"><italic>where the variable came from in the source data or how the variable was derived</italic></td>
</tr>
<tr>
<td align="left" valign="top">ADHCE</td>
<td align="left" valign="top">*ALL*</td>
<td align="left" valign="top">SUBJID</td>
<td align="left" valign="top">Subject Identifier for the Study</td>
<td align="left" valign="top">Char</td>
<td align="left" valign="top">$11</td>
<td align="left" valign="top"></td>
<td align="left" valign="top">ADSL.SUBJID</td>
</tr>
<tr>
<td align="left" valign="top">ADHCE</td>
<td align="left" valign="top">*ALL*</td>
<td align="left" valign="top">TRTP</td>
<td align="left" valign="top">Planned Treatment</td>
<td align="left" valign="top">Char</td>
<td align="left" valign="top">$2</td>
<td align="left" valign="top">A, P</td>
<td align="left" valign="top">ADSL.TRT01P</td>
</tr>
<tr>
<td align="left" valign="top">ADHCE</td>
<td align="left" valign="top">*ALL*</td>
<td align="left" valign="top">AVAL</td>
<td align="left" valign="top">Analysis Value</td>
<td align="left" valign="top">Num</td>
<td align="left" valign="top">3.2</td>
<td align="left" valign="top"></td>
<td align="left" valign="top">First, identify participants with any of the 1&#8211;6 dichotomous events by selecting the PARAM values in ADTTE corresponding to these events. Then select the most severe event of each participant and the corresponding timing of the event.<break/>If ADTTE.PARAM=&#8220;All-cause death&#8221; and ADTTE.CNSR=0 then ADHCE.AVAL = 1*ADTTE.PADY + ADTTE.AVAL<break/>Else if ADTTE.PARAM=&#8220;Dialysis&#8221; and ADTTE.CNSR=0 then ADHCE.AVAL = 2*ADTTE.PADY + ADTTE.AVAL, and so on. Here we use the numeric rank of each type of event, 1 for death, 2 for dialysis, and so on, following the order of the outcomes in Table 1. If the participant did not experience any of the outcomes 1&#8211;6, then the participant falls into category 7. For this participant, select the record from ADLB with PARAM = &#8220;Rate of change of GFR&#8221; and derive AVAL as ADHCE.AVAL = 7*ADTTE.PADY + ADLB.AVAL &#8211; m + 1, where m is the minimum of all values ADLB.AVAL(PARAM = &#8220;Rate of change of GFR&#8221;) for participants who did not have any of the dichotomous events.</td>
</tr>
<tr>
<td align="left" valign="top">ADHCE</td>
<td align="left" valign="top">*ALL*</td>
<td align="left" valign="top">AVALCAT1</td>
<td align="left" valign="top">Analysis Value Category 1</td>
<td align="left" valign="top">Char</td>
<td align="left" valign="top">$11</td>
<td align="left" valign="top">&#8220;Death&#8221;, &#8220;Dialysis&#8221;, &#8220;eGFR &lt; 15&#8221;, &#8220;eGFR &gt;= 57%&#8221;, &#8220;eGFR &gt;= 50%&#8221;, &#8220;eGFR &gt;= 40%&#8221;, &#8220;eGFR&#8221;</td>
<td align="left" valign="top">If the result comes from ADTTE, then set to ADTTE.PARAM<break/>Else if ADLB.PARAM = &#8220;Rate of change of GFR&#8221; then AVALCAT1 = &#8220;eGFR&#8221;</td>
</tr>
<tr>
<td align="left" valign="top">ADHCE</td>
<td align="left" valign="top">*ALL*</td>
<td align="left" valign="top">AVALCA1N</td>
<td align="left" valign="top">Analysis Value Category 1 (N)</td>
<td align="left" valign="top">Num</td>
<td align="left" valign="top">3.0</td>
<td align="left" valign="top"></td>
<td align="left" valign="top">If AVALCAT1 = &#8220;Death&#8221; then AVALCA1N = PADY<break/>Else if AVALCAT1 = &#8220;Dialysis&#8221; then AVALCA1N = 2*PADY<break/>Else if AVALCAT1 = &#8220;eGFR &lt; 15&#8221; then AVALCA1N = 3*PADY<break/>Else if AVALCAT1 = &#8220;eGFR &gt;= 57%&#8221; then AVALCA1N = 4*PADY<break/>Else if AVALCAT1 = &#8220;eGFR &gt;= 50%&#8221; then AVALCA1N = 5*PADY<break/>Else if AVALCAT1 = &#8220;eGFR &gt;= 40%&#8221; then AVALCA1N = 6*PADY<break/>Else if AVALCAT1 = &#8220;eGFR&#8221; then AVALCA1N = 7*PADY</td>
</tr>
<tr>
<td align="left" valign="top">ADHCE</td>
<td align="left" valign="top">*ALL*</td>
<td align="left" valign="top">PADY</td>
<td align="left" valign="top">Primary Analysis Day</td>
<td align="left" valign="top">Num</td>
<td align="left" valign="top">3.0</td>
<td align="left" valign="top"></td>
<td align="left" valign="top">ADSL.PADY</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T5">
<label>Table 5</label>
<caption>
<p>Analysis Results Metadata.</p>
</caption>
<table>
<thead>
<tr>
<td align="left" valign="top"><bold>Metadata Field</bold></td>
<td align="left" valign="top"><bold><italic>Definition of field</italic></bold></td>
<td align="left" valign="top"><bold>Metadata</bold></td>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top"><bold>DISPLAY IDENTIFIER</bold></td>
<td align="left" valign="top"><italic>Unique identifier for the specific analysis display</italic></td>
<td align="left" valign="top">Table 14.1.1</td>
</tr>
<tr>
<td align="left" valign="top"><bold>DISPLAY NAME</bold></td>
<td align="left" valign="top"><italic>Title of display</italic></td>
<td align="left" valign="top">Primary Endpoint Analysis: Kidney hierarchical composite endpoint by Day 1080 &#8211; win statistics</td>
</tr>
<tr>
<td align="left" valign="top"><bold>RESULT IDENTIFIER</bold></td>
<td align="left" valign="top"><italic>Identifies the specific analysis result within a display</italic></td>
<td align="left" valign="top">Comparison of treatment group</td>
</tr>
<tr>
<td align="left" valign="top"><bold>PARAM</bold></td>
<td align="left" valign="top"><italic>Analysis parameter</italic></td>
<td align="left" valign="top">Kidney Hierarchical composite endpoint</td>
</tr>
<tr>
<td align="left" valign="top"><bold>PARAMCD</bold></td>
<td align="left" valign="top"><italic>Analysis parameter code</italic></td>
<td align="left" valign="top">KHCE</td>
</tr>
<tr>
<td align="left" valign="top"><bold>ANALYSIS VARIABLE</bold></td>
<td align="left" valign="top"><italic>Analysis variable being analyzed</italic></td>
<td align="left" valign="top">AVAL</td>
</tr>
<tr>
<td align="left" valign="top"><bold>REASON</bold></td>
<td align="left" valign="top"><italic>Rationale for performing this analysis</italic></td>
<td align="left" valign="top">Primary efficacy analysis as pre-specified in protocol</td>
</tr>
<tr>
<td align="left" valign="top"><bold>DATASET</bold></td>
<td align="left" valign="top"><italic>Dataset(s) used in the analysis</italic></td>
<td align="left" valign="top">ADHCE</td>
</tr>
<tr>
<td align="left" valign="top"><bold>SELECTION CRITERIA</bold></td>
<td align="left" valign="top"><italic>Specific and sufficient selection criteria for analysis subset and/or numerator</italic></td>
<td align="left" valign="top">FASFL=&#8217;Y&#8217; and PARAMCD= &#8220;KHCE&#8221;</td>
</tr>
<tr>
<td align="left" valign="top"><bold>DOCUMENTATION</bold></td>
<td align="left" valign="top"><italic>Textual description of the analysis performed</italic></td>
<td align="left" valign="top">The kidney hierarchical composite endpoint by Day 1080 is analyzed using win odds</td>
</tr>
<tr>
<td align="left" valign="top"><bold>PROGRAMMING STATEMENTS</bold></td>
<td align="left" valign="top"><italic>The analysis syntax used to perform the analysis</italic></td>
<td align="left" valign="top">PROC FREQ DATA = ADHCE;<break/>TABLES TRTP * AVAL / MEASURES;<break/>ODS OUTPUT MEASURES = MEASURES0;<break/>RUN;<break/>DATA MEASURES;<break/>SET MEASURES0;<break/>WP = (VALUE + 1) / 2 ;<break/>ASE = ASE / 2 ;<break/>ALPHA = 0.05 ;<break/>C = PROBIT (1 - ALPHA / 2);<break/>WO = WP/(1-WP);<break/>LCL0 = WP - C * ASE;<break/>UCL0 = WP + C * ASE;<break/>LCL = LCL0/(1- LCL0);<break/>UCL = UCL0/(1- UCL0);<break/>Z = ABS (WP - 0.5) / ASE;<break/>P = 2 * (1 - PROBNORM (Z));<break/>KEEP WO LCL UCL P;<break/>RUN;</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Analysis and visualization using ADHCE</title>
<p>The dataset ADHCE (<xref ref-type="table" rid="T3">Table 3</xref>) is analysis-ready for win odds analysis and for visualization using maraca plots. An implementation of the win odds calculation in the SAS&#174; software<sup><xref ref-type="bibr" rid="B25">25</xref></sup> (using the procedures <italic>freq</italic> or <italic>npar1way</italic>) is provided in the Appendix of Gasparyan et al.<sup><xref ref-type="bibr" rid="B17">17</xref></sup> For example, using <italic>proc freq</italic>, the win odds can be calculated as shown in <bold>Box 3</bold> (care should be taken to select the control group as the reference).</p>
<boxed-text>
<caption><p><bold>Box 3:</bold> SAS implementation of win odds</p></caption>
<p><monospace>proc freq data = ADHCE;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;tables TRTP * AVAL / measures;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;ods output Measures = Measures0;</monospace></p>
<p><monospace>run;</monospace></p>
<p><monospace>data measures;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;set measures0;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;WP = (value + 1) / 2 ;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;ASE = ASE / 2 ;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;alpha = 0.05 ;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;C = PROBIT (1 - alpha / 2);</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;WO = WP/(1-WP);</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;LCL0 = WP - C * ASE;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;UCL0 = WP + C * ASE;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;LCL = LCL0/(1- LCL0);</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;UCL = UCL0/(1- UCL0);</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;Z = abs (WP - 0.5) / ASE;</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;P = 2 * (1 - PROBNORM (Z));</monospace></p>
<p><monospace>&#160;&#160;&#160;&#160;&#160;keep WO LCL UCL P;</monospace></p>
<p><monospace>run;</monospace></p>
</boxed-text>
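<p>The quantities computed in Box 3 can also be illustrated with a short Python sketch. It is a hedged illustration rather than a port of the SAS program: the win probability is obtained here by direct pairwise comparison of AVAL between the two groups instead of from the measures output of <italic>proc freq</italic>, the standard error is not estimated but must be supplied by the caller, and the function names win_odds and win_odds_ci as well as the toy values are hypothetical.</p>
<preformat>
```python
from statistics import NormalDist


def win_odds(active, control):
    """Win probability and win odds of `active` versus `control`.

    Every active value is compared with every control value; a higher
    AVAL counts as a win and a tie counts as half a win.
    """
    wins = sum(1.0 for a in active for c in control if a > c)
    ties = sum(1.0 for a in active for c in control if a == c)
    wp = (wins + 0.5 * ties) / (len(active) * len(control))
    return wp, wp / (1 - wp)


def win_odds_ci(wp, se, alpha=0.05):
    """Confidence limits and p-value computed as in Box 3.

    Wald limits for the win probability are mapped to the odds scale.
    `se` is the standard error of the win probability (an input here).
    """
    c = NormalDist().inv_cdf(1 - alpha / 2)  # same role as PROBIT in SAS
    lcl0, ucl0 = wp - c * se, wp + c * se
    z = abs(wp - 0.5) / se
    p = 2 * (1 - NormalDist().cdf(z))  # same role as PROBNORM in SAS
    return lcl0 / (1 - lcl0), ucl0 / (1 - ucl0), p


# Hypothetical AVAL values for illustration only.
active = [7564.0, 7561.0, 2460]
control = [7561.0, 1101, 2460]
wp, wo = win_odds(active, control)
lcl, ucl, p = win_odds_ci(0.5, 0.1)
```
</preformat>
<p>In this toy comparison there are 6 wins and 2 ties among the 9 pairs, so the win probability is 7/9 and the win odds is 3.5. The single AVAL per participant is what allows the comparison to be written as a plain ordering of two vectors.</p>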
<p>Similarly, in the R software, the package <italic>hce</italic><sup><xref ref-type="bibr" rid="B26">26</xref></sup> can be used to derive the win odds. This confirms that the analysis dataset ADHCE is analysis-ready for win odds analysis, since the calculations can be performed without first manipulating the data (<xref ref-type="table" rid="T5">Table 5</xref>). The package <italic>maraca</italic><sup><xref ref-type="bibr" rid="B27">27</xref></sup> in R can be used to produce <xref ref-type="fig" rid="F1">Figure 1</xref> from the dataset ADHCE with minimal programming. The maraca package recognizes the ADHCE data structure as an object of class &#8220;adhce&#8221;, meaning that it expects all the variables mentioned in the dataset&#8217;s derivation above, and hence can produce the plot directly, as shown in <bold>Box 4</bold>.</p>
<boxed-text>
<caption><p><bold>Box 4:</bold> R implementation of maraca plots</p></caption>
<p><monospace>library(ggplot2)</monospace></p>
<p><monospace>library(maraca)</monospace></p>
<p><monospace>class(ADHCE) #adhce</monospace></p>
<p><monospace>plot(ADHCE)</monospace></p>
</boxed-text>
<p>The maraca plots are <italic>ggplot2</italic><sup><xref ref-type="bibr" rid="B28">28</xref></sup> objects and hence allow for customization. The maraca package can also produce an associated analysis dataset that can be used for validating this output.<sup><xref ref-type="bibr" rid="B29">29</xref></sup></p>
</sec>
</sec>
<sec>
<title>Discussion</title>
<p>The most important question in creating BDS datasets is whether to keep a required analysis value as a new variable (column) in the dataset or as a new record (row). A similar rule exists for Tidy datasets, which states that column headers should be variable names, not values.<sup><xref ref-type="bibr" rid="B14">14</xref></sup> In the ADaM implementation, the analysis values are stored in a column called AVAL, and the rules for adding new variables that contain analysis values are stricter. The main rule is to keep all analysis values in AVAL and to group them by the analysis parameter (PARAM) values. There are some permitted deviations, though. For example, the BASE variable contains the value of AVAL corresponding to the baseline (initial timepoint), while AVALCATy (eg, AVALCAT1, AVALCAT2, and so on) and AVALCAyN are parameter-variant categorizations of the analysis values into character and numeric categories, respectively. Additional variables for analysis can be created only if they follow the fundamental rule of adding new columns to a BDS: a parameter-invariant function of AVAL and BASE (that is, one calculated the same way for all parameters for which the variable is populated in a dataset) can be derived into a new variable if it does not involve a transformation of BASE. For example, the variable CHG (change from baseline), which is derived as CHG = AVAL &#8211; BASE, is parameter-invariant and does not include a transformation of BASE, so CHG can be a new column in the analysis dataset. A transformation of analysis values that does not meet this condition, however, should be added as a new parameter, and AVAL should contain the transformed values. Therefore, the fundamental principle of BDS is that, apart from the permitted deviations and the fundamental rule of adding new columns described above, only one analysis variable per participant can be derived as a column in the dataset, while multiple analysis values need to be retained in the same variable under different analysis parameter values.</p>
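<p>The column-versus-row rule described above can be made concrete with a small Python sketch. It is a schematic illustration under stated assumptions and not an ADaM implementation: the records, the PARAMCD values EGFR and LOGEGFR, the choice of a logarithm as the example transformation, and the helper names add_chg and add_log_parameter are all hypothetical.</p>
<preformat>
```python
import math

# Hypothetical BDS-style records: one row per participant and parameter.
rows = [
    {"SUBJID": "001", "PARAMCD": "EGFR", "AVAL": 40.0, "BASE": 50.0},
    {"SUBJID": "002", "PARAMCD": "EGFR", "AVAL": 55.0, "BASE": 50.0},
]


def add_chg(records):
    """Parameter-invariant function of AVAL and BASE: stored as a column."""
    return [dict(r, CHG=r["AVAL"] - r["BASE"]) for r in records]


def add_log_parameter(records, source, target):
    """A transformation of the analysis values: stored as new rows under a
    new PARAMCD, with AVAL (and BASE) holding the transformed values."""
    new_rows = [
        dict(
            r,
            PARAMCD=target,
            AVAL=math.log(r["AVAL"]),
            BASE=math.log(r["BASE"]),
        )
        for r in records
        if r["PARAMCD"] == source
    ]
    return records + new_rows


with_chg = add_chg(rows)  # same number of rows, one extra column
with_log = add_log_parameter(rows, source="EGFR", target="LOGEGFR")  # extra rows
```
</preformat>
<p>In the first case the dataset keeps two rows and gains the column CHG; in the second it keeps the same columns and grows to four rows, with the transformed values stored in AVAL under the new parameter code.</p>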
<p>An ADaM dataset is a particular type of analysis dataset that follows the fundamental principles defined in the ADaM document and either complies with the ADaM-defined structures or follows the ADaMIG variable naming and other conventions as closely as possible.<sup><xref ref-type="bibr" rid="B13">13</xref></sup> ADTTE (time-to-event analysis dataset)<sup><xref ref-type="bibr" rid="B21">21</xref></sup> is a special-case exception. It does not strictly follow the fundamental principle of the basic data structure, as it essentially has two analysis values: the length of follow-up (AVAL variable) and a censoring variable showing whether an event happened during that follow-up (CNSR variable). This flexibility allows two dependent variables to be used in statistical modelling. CDISC standardization has made ADTTE a widely used and ADaM-compliant dataset. Time-to-event analyses are common in clinical trials (including as a primary analysis), hence standardization of this dataset was important and is helpful for implementation.</p>
<p>Following the BDS fundamental principles for hierarchical composite endpoints in the absence of a fixed follow-up is difficult, since participants are compared using the shared follow-up approach.<sup><xref ref-type="bibr" rid="B15">15</xref></sup> This leads to transitivity issues<sup><xref ref-type="bibr" rid="B16">16</xref>,<xref ref-type="bibr" rid="B24">24</xref></sup> and, consequently, participants cannot be compared on a common clinical scale, which makes it impossible to derive one analysis value per participant. All relevant events of a participant, along with the participant&#8217;s maximum length of follow-up, therefore need to be retained as analysis values. Which of these values contribute to the analysis depends on which participants are being compared. This may lead to multiple analysis values per participant and hence to a non-compliant analysis dataset. An analysis dataset for win statistics analyses with variable follow-up will therefore either follow the BDS principles but not be analysis-ready (multiple data transformations would be needed before win statistics can be calculated), or be analysis-ready but not ADaM-compliant.</p>
<p>The presence of a fixed follow-up is of course a restriction, but it resolves several statistical issues (for example, the analysis results can be interpreted on a participant level, which may be more clinically meaningful) and, as described in this paper, resolves the issue of having multiple analysis variables as columns. It therefore makes it possible to derive a dataset that conforms to the fundamental principles of ADaM and is analysis-ready for multiple analyses.</p>
</sec>
<sec>
<title>Conclusion</title>
<p>We have provided the principles of constructing an analysis dataset for hierarchical composite endpoints in a fixed follow-up setting. As an example, we have used the novel kidney HCE, but the same principles can be applied to HCEs in other therapeutic areas as well. We demonstrated that the constructed analysis dataset conforms to the fundamental principles of BDS, and so it is an ADaM-compliant dataset. It is analysis-ready for multiple analyses, including generating win statistics and visualization using maraca plots. The purpose of this paper is to highlight the principles and to provide an example for content illustration with only key variables included. The constructed ADHCE dataset should not be considered a standardization of the structure and appearance of the dataset. In line with the general note in CDISC guidance documents, an eventual implementation of the dataset may follow the same principles but have a different display and contents.</p>
<p>Here we want to highlight the growing importance of hierarchical composite endpoints in clinical trials, including their use as a primary endpoint, and we urge the clinical community and CDISC to work together to derive a standardized analysis dataset for hierarchical composite endpoints and for win statistics analyses in general, similar to the ADTTE dataset. We hope that this paper serves as the first modest step in this direction.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>We would like to thank <italic>Damian Kruszewski</italic> for valuable discussions on this topic. We thank <italic>Finn Landell</italic> for their guidance and the overall support of this project.</p>
</ack>
<sec>
<title>Disclaimer</title>
<p>The contents of this paper are the work of the authors and do not necessarily represent the opinions, recommendations, or practices of AstraZeneca. Any brand and product names are trademarks of their respective companies.</p>
</sec>
<sec>
<title>Competing Interests</title>
<p>The authors have no competing interests to declare.</p>
</sec>
<ref-list>
<ref id="B1"><mixed-citation publication-type="journal"><label>1.&#160;</label><string-name><surname>Packer</surname> <given-names>M</given-names></string-name>. <article-title>Proposal for a new clinical end point to evaluate the efficacy of drugs and devices in the treatment of chronic heart failure</article-title>. <source>Journal of cardiac failure</source>. <year>2001</year>; <volume>7</volume>(<issue>2</issue>): <fpage>176</fpage>&#8211;<lpage>182</lpage>. DOI: <pub-id pub-id-type="doi">10.1054/jcaf.2001.25652</pub-id></mixed-citation></ref>
<ref id="B2"><mixed-citation publication-type="journal"><label>2.&#160;</label><string-name><surname>Packer</surname> <given-names>M</given-names></string-name>. <article-title>Development and evolution of a hierarchical clinical composite end point for the evaluation of drugs and devices for acute and chronic heart failure: a 20-year perspective</article-title>. <source>Circulation</source>. <year>2016</year>; <volume>134</volume>(<issue>21</issue>): <fpage>1664</fpage>&#8211;<lpage>1678</lpage>. DOI: <pub-id pub-id-type="doi">10.1161/CIRCULATIONAHA.116.023538</pub-id></mixed-citation></ref>
<ref id="B3"><mixed-citation publication-type="journal"><label>3.&#160;</label><string-name><surname>Gasparyan</surname> <given-names>SB</given-names></string-name>, et al. <article-title>Hierarchical Composite Endpoints in COVID-19: The DARE-19 Trial, in Case Studies in Innovative Clinical Trials</article-title>. <source>Chapman and Hall/CRC</source>. <year>2023</year>; <fpage>95</fpage>&#8211;<lpage>148</lpage>. DOI: <pub-id pub-id-type="doi">10.1201/9781003288640-7</pub-id></mixed-citation></ref>
<ref id="B4"><mixed-citation publication-type="journal"><label>4.&#160;</label><string-name><surname>Little</surname> <given-names>DJ</given-names></string-name>, et al. <article-title>Validity and utility of a hierarchical composite endpoint for clinical trials of kidney disease progression: A review</article-title>. <source>Journal of the American Society of Nephrology</source>. <year>2023</year>; <volume>34</volume>(<issue>12</issue>): <fpage>1928</fpage>&#8211;<lpage>1935</lpage>. DOI: <pub-id pub-id-type="doi">10.1681/ASN.0000000000000244</pub-id></mixed-citation></ref>
<ref id="B5"><mixed-citation publication-type="journal"><label>5.&#160;</label><string-name><surname>Dong</surname> <given-names>G</given-names></string-name>, et al. <article-title>Win statistics (win ratio, win odds, and net benefit) can complement one another to show the strength of the treatment effect on time-to-event outcomes</article-title>. <source>Pharmaceutical Statistics</source>; <year>2022</year>. DOI: <pub-id pub-id-type="doi">10.1002/pst.2251</pub-id></mixed-citation></ref>
<ref id="B6"><mixed-citation publication-type="journal"><label>6.&#160;</label><string-name><surname>Karpefors</surname> <given-names>M</given-names></string-name>, <string-name><surname>Lindholm</surname> <given-names>D</given-names></string-name>, <string-name><surname>Gasparyan</surname> <given-names>SB</given-names></string-name>. <article-title>The maraca plot: A novel visualization of hierarchical composite endpoints</article-title>. <source>Clinical Trials</source>. <year>2022</year>; <volume>20</volume>(<issue>1</issue>): <fpage>84</fpage>&#8211;<lpage>88</lpage>. DOI: <pub-id pub-id-type="doi">10.1177/17407745221134949</pub-id></mixed-citation></ref>
<ref id="B7"><mixed-citation publication-type="journal"><label>7.&#160;</label><string-name><surname>Heerspink</surname> <given-names>HJ</given-names></string-name>, et al. <article-title>Development and Validation of a New Hierarchical Composite End Point for Clinical Trials of Kidney Disease Progression</article-title>. <source>Journal of the American Society of Nephrology</source>. <year>2023</year>; <volume>34</volume>(<issue>12</issue>): <fpage>2025</fpage>&#8211;<lpage>2038</lpage>. DOI: <pub-id pub-id-type="doi">10.1681/ASN.0000000000000243</pub-id></mixed-citation></ref>
<ref id="B8"><mixed-citation publication-type="journal"><label>8.&#160;</label><string-name><surname>Kondo</surname> <given-names>T</given-names></string-name>, et al. <article-title>Use of Win Statistics to Analyze Outcomes in the DAPA-HF and DELIVER Trials</article-title>. <source>NEJM Evidence</source>. <year>2023</year>; <volume>2</volume>(<issue>11</issue>): <elocation-id>EVIDoa2300042</elocation-id>. DOI: <pub-id pub-id-type="doi">10.1056/EVIDoa2300042</pub-id></mixed-citation></ref>
<ref id="B9"><mixed-citation publication-type="journal"><label>9.&#160;</label><string-name><surname>Kosiborod</surname> <given-names>M</given-names></string-name>, et al. <article-title>Effects of dapagliflozin on prevention of major clinical events and recovery in patients with respiratory failure because of COVID-19: Design and rationale for the DARE-19 study</article-title>. <source>Diabetes, Obesity and Metabolism</source>. <year>2021</year>; <volume>23</volume>(<issue>4</issue>): <fpage>886</fpage>&#8211;<lpage>896</lpage>. DOI: <pub-id pub-id-type="doi">10.1111/dom.14296</pub-id></mixed-citation></ref>
<ref id="B10"><mixed-citation publication-type="journal"><label>10.&#160;</label><string-name><surname>Kosiborod</surname> <given-names>MN</given-names></string-name>, et al. <article-title>Dapagliflozin in patients with cardiometabolic risk factors hospitalised with COVID-19 (DARE-19): a randomised, double-blind, placebo-controlled, phase 3 trial</article-title>. <source>The Lancet Diabetes Endocrinology</source>. <year>2021</year>; <volume>9</volume>(<issue>9</issue>): <fpage>586</fpage>&#8211;<lpage>594</lpage>. DOI: <pub-id pub-id-type="doi">10.1016/S2213-8587(21)00180-7</pub-id></mixed-citation></ref>
<ref id="B11"><mixed-citation publication-type="journal"><label>11.&#160;</label><string-name><surname>Pocock</surname> <given-names>SJ</given-names></string-name>, et al. <article-title>The win ratio method in heart failure trials: lessons learnt from EMPULSE</article-title>. <source>European journal of heart failure</source>; <year>2023</year>. DOI: <pub-id pub-id-type="doi">10.1002/ejhf.2853</pub-id></mixed-citation></ref>
<ref id="B12"><mixed-citation publication-type="journal"><label>12.&#160;</label><collab>CDISC</collab>, <source>Analysis Data Model (ADaM)</source>; <year>2009</year>.</mixed-citation></ref>
<ref id="B13"><mixed-citation publication-type="journal"><label>13.&#160;</label><collab>CDISC</collab>, <source>Analysis Data Model Implementation Guide</source>; <year>2021</year>.</mixed-citation></ref>
<ref id="B14"><mixed-citation publication-type="journal"><label>14.&#160;</label><string-name><surname>Wickham</surname> <given-names>H</given-names></string-name>. <article-title>Tidy data</article-title>. <source>Journal of Statistical Software</source>. <year>2014</year>; <volume>59</volume>(<issue>10</issue>): <fpage>1</fpage>&#8211;<lpage>23</lpage>. DOI: <pub-id pub-id-type="doi">10.18637/jss.v059.i10</pub-id></mixed-citation></ref>
<ref id="B15"><mixed-citation publication-type="journal"><label>15.&#160;</label><string-name><surname>Pocock</surname> <given-names>SJ</given-names></string-name>, et al. <article-title>The win ratio: a new approach to the analysis of composite endpoints in clinical trials based on clinical priorities</article-title>. <source>European Heart Journal</source>. <year>2012</year>; <volume>33</volume>(<issue>2</issue>): <fpage>176</fpage>&#8211;<lpage>182</lpage>. DOI: <pub-id pub-id-type="doi">10.1093/eurheartj/ehr352</pub-id></mixed-citation></ref>
<ref id="B16"><mixed-citation publication-type="journal"><label>16.&#160;</label><string-name><surname>Brunner</surname> <given-names>E</given-names></string-name>, <string-name><surname>Vandemeulebroecke</surname> <given-names>M</given-names></string-name>, <string-name><surname>M&#252;tze</surname> <given-names>T</given-names></string-name>. <article-title>Win odds: An adaptation of the win ratio to include ties</article-title>. <source>Statistics in Medicine</source>. <year>2021</year>; <volume>40</volume>(<issue>14</issue>): <fpage>3367</fpage>&#8211;<lpage>3384</lpage>. DOI: <pub-id pub-id-type="doi">10.1002/sim.8967</pub-id></mixed-citation></ref>
<ref id="B17"><mixed-citation publication-type="journal"><label>17.&#160;</label><string-name><surname>Gasparyan</surname> <given-names>SB</given-names></string-name>, et al. <article-title>Power and sample size calculation for the win odds test: application to an ordinal endpoint in COVID-19 trials</article-title>. <source>Journal of Biopharmaceutical Statistics</source>. <year>2021</year>; <volume>31</volume>(<issue>6</issue>): <fpage>765</fpage>&#8211;<lpage>787</lpage>. DOI: <pub-id pub-id-type="doi">10.1080/10543406.2021.1968893</pub-id></mixed-citation></ref>
<ref id="B18"><mixed-citation publication-type="journal"><label>18.&#160;</label><string-name><surname>Gasparyan</surname> <given-names>SB</given-names></string-name>, et al. <article-title>Adjusted win ratio with stratification: calculation methods and interpretation</article-title>. <source>Statistical Methods in Medical Research</source>. <year>2021</year>; <volume>30</volume>(<issue>2</issue>): <fpage>580</fpage>&#8211;<lpage>611</lpage>. DOI: <pub-id pub-id-type="doi">10.1177/0962280220942558</pub-id></mixed-citation></ref>
<ref id="B19"><mixed-citation publication-type="journal"><label>19.&#160;</label><string-name><surname>Dong</surname> <given-names>G</given-names></string-name>, et al. <article-title>The win ratio: on interpretation and handling of ties</article-title>. <source>Statistics in Biopharmaceutical Research</source>; <year>2019</year>. DOI: <pub-id pub-id-type="doi">10.1080/19466315.2019.1575279</pub-id></mixed-citation></ref>
<ref id="B20"><mixed-citation publication-type="journal"><label>20.&#160;</label><string-name><surname>Buyse</surname> <given-names>M</given-names></string-name>. <article-title>Generalized pairwise comparisons of prioritized outcomes in the two-sample problem</article-title>. <source>Statistics in Medicine</source>. <year>2010</year>; <volume>29</volume>(<issue>30</issue>): <fpage>3245</fpage>&#8211;<lpage>3257</lpage>. DOI: <pub-id pub-id-type="doi">10.1002/sim.3923</pub-id></mixed-citation></ref>
<ref id="B21"><mixed-citation publication-type="journal"><label>21.&#160;</label><collab>CDISC</collab>. <source>The ADaM Basic Data Structure for Time-to-Event Analyses</source>; <year>2012</year>.</mixed-citation></ref>
<ref id="B22"><mixed-citation publication-type="book"><label>22.&#160;</label><string-name><surname>Stokes</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Davis</surname> <given-names>CS</given-names></string-name>, <string-name><surname>Koch</surname> <given-names>GG</given-names></string-name>. <source>Categorical data analysis using SAS</source>. <edition>Third</edition> ed. <year>2012</year>: <publisher-name>SAS institute</publisher-name>.</mixed-citation></ref>
<ref id="B23"><mixed-citation publication-type="book"><label>23.&#160;</label><string-name><surname>Harrell</surname> <given-names>FE</given-names></string-name>. <source>Regression modeling strategies: with applications to linear models, logistic regression, and survival analysis</source>. Vol. <volume>608</volume>. <year>2001</year>: <publisher-name>Springer</publisher-name>. DOI: <pub-id pub-id-type="doi">10.1007/978-1-4757-3462-1</pub-id></mixed-citation></ref>
<ref id="B24"><mixed-citation publication-type="journal"><label>24.&#160;</label><string-name><surname>Gasparyan</surname> <given-names>SB</given-names></string-name>, et al. <article-title>Design and Analysis of Studies Based on Hierarchical Composite Endpoints: Insights from the DARE-19 Trial</article-title>. <source>Ther Innov Regul Sci</source>. <year>2022</year>; <volume>56</volume>(<issue>5</issue>): <fpage>785</fpage>&#8211;<lpage>794</lpage>. DOI: <pub-id pub-id-type="doi">10.1007/s43441-022-00420-1</pub-id></mixed-citation></ref>
<ref id="B25"><mixed-citation publication-type="webpage"><label>25.&#160;</label><collab>SAS Institute Inc</collab>. <source>The SAS System. Version 9.4</source>. <year>2013</year>: <publisher-name>SAS Institute Inc.</publisher-name>, <publisher-loc>Cary, NC</publisher-loc>. <uri>http://www.sas.com/</uri>.</mixed-citation></ref>
<ref id="B26"><mixed-citation publication-type="webpage"><label>26.&#160;</label><string-name><surname>Gasparyan</surname> <given-names>SB</given-names></string-name>. <source>hce: Design and Analysis of Hierarchical Composite Endpoints</source>. R package version &gt;=0.5.0. <year>2022</year>. <uri>https://CRAN.R-project.org/package=hce</uri>.</mixed-citation></ref>
<ref id="B27"><mixed-citation publication-type="webpage"><label>27.&#160;</label><string-name><surname>Karpefors</surname> <given-names>M</given-names></string-name>, <string-name><surname>Gasparyan</surname> <given-names>SB</given-names></string-name>, <string-name><surname>Huhn</surname> <given-names>M</given-names></string-name>. <article-title>maraca: The Maraca Plot: Visualization of Hierarchical Composite Endpoints in Clinical Trials</article-title>. R package version &gt;=0.5.0. <year>2023</year>. <uri>https://CRAN.R-project.org/package=maraca</uri>.</mixed-citation></ref>
<ref id="B28"><mixed-citation publication-type="book"><label>28.&#160;</label><string-name><surname>Wickham</surname> <given-names>H</given-names></string-name>. <source>ggplot2: Elegant Graphics for Data Analysis</source>. Use R! <year>2016</year>: <publisher-name>Springer</publisher-name>, <publisher-loc>New York, NY</publisher-loc>. DOI: <pub-id pub-id-type="doi">10.1007/978-0-387-98141-3</pub-id></mixed-citation></ref>
<ref id="B29"><mixed-citation publication-type="webpage"><label>29.&#160;</label><string-name><surname>Major</surname> <given-names>N</given-names></string-name>, et al. <source>Validating novel maraca plots&#8211;R and SAS love story</source>. In: <italic>PharmaSUG 2023</italic>. <year>2023</year>: <publisher-loc>San Francisco</publisher-loc>. <uri>https://www.pharmasug.org/proceedings/2023/SA/PharmaSUG-2023-SA-068.pdf</uri>.</mixed-citation></ref>
</ref-list>
</back>
</article>