Education and professional development

Big Data Education for Clinical Documentation

Authors: Melinda Jenkins (Rutgers University), Judith Katz (Emory University), Courtney Omary (Emory University), Suzan Ahmad (Rutgers University), Ramya Govindarajan (Emory University), Roy Simpson (Emory University)


Nursing educators and students need new competencies and tools for creating and analyzing clinical documentation and big data to maximize value-based reimbursements and data sharing. Carefully structured and shareable documentation of nursing diagnoses, goals, interventions and outcomes, available in the EHR (electronic health record), may be aggregated and analyzed to provide a foundation for population health, quality improvement and clinical research. One way to access anonymized clinical data for use in academic and clinical data sharing is via Project NeLL™ (Nurses electronic Learning Laboratory™), an innovative suite of online applications for teaching and practicing nursing data science from the Center for Data Science at Emory University's Nell Hodgson Woodruff School of Nursing. Emory is collaborating with Rutgers University School of Nursing to teach graduate students using Project NeLL™.

Keywords: Collect data, Query Database

How to Cite:

Jenkins, M., Katz, J., Omary, C., Ahmad, S., Govindarajan, R. & Simpson, R., (2023) “Big Data Education for Clinical Documentation”, Journal of the Society for Clinical Data Management 3(S1). doi:



Published on
08 Nov 2023
Peer Reviewed

Nursing educators and students need new competencies and tools for creating and analyzing clinical documentation and big data to maximize value-based reimbursements and data sharing. Carefully structured and shareable documentation of nursing diagnoses, goals, interventions and outcomes, available in the EHR (electronic health record), may be aggregated and analyzed to provide a foundation for population health, quality improvement and clinical research.

With copious clinical data in EHRs, and as academic institutions endeavor to provide graduate- and DNP-level students with hands-on big data experience, the need for anonymized databases of clinical information has never been stronger. Anonymized data are essential for HIPAA compliance. Patient confidentiality must be a key priority for any aggregated data set, so that actual patients cannot be identified. Leveraging this data in the service of clinical documentation, clinical inquiry, quality improvement, and research offers nursing, for the first time, an opportunity to take an anticipatory approach to patient care.

However, few academic institutions have the information technology staff and expertise to secure clinical data from actual patient care episodes across the continuum of care and the resources to remove patient identifiers from that data. It is an additional challenge to provide a user-friendly interface to facilitate searches through large and complex EHR datasets.

One way to access anonymized clinical data for use in academic and clinical data sharing is via Project NeLL™ (Nurses electronic Learning Laboratory™), a web-based application for teaching and practicing nursing data science.2 Project NeLL™ was conceptualized by faculty and staff at the Center for Data Science (CDS) of Emory University’s Nell Hodgson Woodruff School of Nursing (NHWSON).

The Project NeLL™ suite includes:

  • Searchable electronic health record (EHR) database: this contains patient records from 2012 through 2019 (soon expanding to December 2021), randomly selected across all healthcare entities of Emory Healthcare.

  • User-friendly interface: users point and click to select the desired data, with no prior knowledge of SQL or other programming languages required.

  • A comprehensive data dictionary: this allows users to understand each data element and its typical values.

  • Data analysis and visualization tools: the selected data can be displayed in several ways, including a depiction of how codes are layered.

  • Interactive learning resources: four modules on big data cover the principles, approaches, and tools needed to use big data effectively for research and quality improvement.

  • Video library: “how to” videos cover various aspects of the application, forming research questions, and using Excel.

Emory’s NHWSON CDS built the Project NeLL™ database with patient data extracted from a multi-year data warehouse. Emory Healthcare used the Cerner EHR system until October 2022 before migrating to Epic. Currently, the Project NeLL™ database has data from the Cerner system but will gradually expand to include data from the Epic system. Guided by HIPAA regulations, the de-identification procedures removed patients’ names, addresses, and phone numbers; regenerated IDs; shifted admission and treatment dates; and truncated zip codes to three digits. The aggregated data, which include seven years of patient care provided by a major healthcare organization, represent hospital and clinical interactions across the continuum of care. The database includes more than 1.2 million unique patients, 2.7 billion healthcare system visits, and 37 trillion data points. In contrast to the MIMIC dataset, which is limited to intensive care records, the Project NeLL™ database contains patient records across hospital and ambulatory care settings. In addition, Project NeLL™ will be continuously updated with recent patient encounters.
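The de-identification steps listed above (dropping direct identifiers, regenerating IDs, shifting dates, and truncating zip codes to three digits) can be illustrated with a minimal Python sketch. This is not the procedure Emory used, and it is not a complete HIPAA de-identification method. The field names, the keyed-hash ID scheme, and the per-patient date offset are all assumptions made for illustration.

```python
import hashlib
import random
from datetime import date, timedelta


def deidentify(record, secret, max_shift_days=365):
    """Return a de-identified copy of one patient record.

    Hypothetical field names; the hashing and shifting scheme is an
    illustrative assumption, not the Project NeLL procedure.
    """
    # Regenerate the ID with a keyed hash so the same patient always
    # receives the same replacement ID across records.
    digest = hashlib.sha256((secret + str(record["patient_id"])).encode()).hexdigest()

    # Derive one date offset per patient so that intervals between a
    # patient's own events (for example, length of stay) are preserved.
    shift = timedelta(days=random.Random(digest).randint(1, max_shift_days))

    # Only the fields listed here are kept; name, address, and phone
    # are dropped simply by not copying them.
    return {
        "patient_id": digest[:12],
        "admit_date": record["admit_date"] - shift,
        "discharge_date": record["discharge_date"] - shift,
        "zip3": record["zip"][:3],  # truncate zip code to three digits
        "diagnosis_code": record["diagnosis_code"],
    }


example = {
    "patient_id": 101,
    "name": "Jane Doe",
    "address": "1 Main St",
    "phone": "555-0100",
    "zip": "30322",
    "admit_date": date(2018, 3, 1),
    "discharge_date": date(2018, 3, 5),
    "diagnosis_code": "I25.10",
}
clean = deidentify(example, secret="demo-only-secret")
```

Because the offset is derived per patient rather than per record, a four-day stay remains a four-day stay after shifting, which keeps the data useful for analysis.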

Project NeLL™ Facilitates Nursing Education, Clinical Discovery

Project NeLL™ was incorporated into the NHWSON’s Doctor of Nursing Practice (DNP) informatics coursework, which was conducted in the Spring and Summer 2022 semesters. Multiple modules of the coursework focused on studying big data concepts including data mining, legal and ethical issues, information literacy, data management plans and basic data analysis, and visualization.

In the user-friendly front end, multiple pathways are available to explore, including queries driven by diagnosis, procedure, medications, demographics, and lab values. An embedded data dictionary supports curious inquiry, and ICD code visualization helps students see how codes are structured. Project NeLL™ also builds students’ data evaluation skills by providing descriptive statistics and graphical data visualization, a type of “data preview,” of a limited dataset before the complete dataset is downloaded.

Faculty use Project NeLL™ in course work to build nurse scientists’ skills of collecting, analyzing, and interpreting big data in the context of improving patient and population care, influencing policy, and creating efficiencies in healthcare systems. Practical content can be built around the use of Project NeLL™ depending on the academic course’s focus, whether it is research process, quality improvement, informatics, leadership, or data analysis.

Feedback from the first cohort of students led faculty to use video tutorials to incorporate spreadsheet fundamentals into the learning. By the end of the course, students independently designed and implemented a data query based on a clinical question, analyzed Project NeLL™ data, and presented evidence-based recommendations based on their analysis.

Case: Analyzing Statin Use by Demographic Groups

To illustrate how Project NeLL™’s rich database can be mined to yield insights into questions that were previously difficult to answer, consider this case examining statin use that was completed by NHWSON students.

Research has indicated that for patients with cardiovascular disease, certain populations are far less likely to receive guideline-recommended statin therapy than others. A retrospective study of the National Health and Nutrition Examination Survey found that being Black or Hispanic, having a low income, lacking health insurance coverage, having limited access to health care services, and being a female at a young age render patients less likely to be prescribed a statin than individuals who do not fit this profile.3 In addition, there is extensive evidence to show that women are far less likely than men to be prescribed guideline-recommended statin therapy.4,5 This finding is surprising because research has shown that female patients with cardiovascular disease derive the same or greater benefit from statin therapy as male patients with cardiovascular disease.6

To better understand why statin therapy is not pursued within these specific demographic profiles, students executed a demographic analysis to explore trends in statin therapy adherence in a sample from the Project NeLL™ population matching the profile. While working on the case study, students used the NeLL video tutorials and the data dictionary, and retrieved data from the EHR database using the user-friendly front end.

Additionally, DNP students used Project NeLL™ to investigate topics such as:

  • the cost/value of employing nurse anesthetists compared to other providers during one cardiac procedure

  • racial differences in opioid administration among breast cancer patients

  • the effectiveness of ED triage algorithms

  • feasibility of quality improvement interventions for blood transfusions, mental health care coordination, and targeted telehealth programs.

Overall, DNP students reported being challenged in learning new data search and discovery skills. Additionally, the students gained a deep understanding of data quality and of nursing’s professional responsibility to use big data to positively influence the delivery of care, patient outcomes, healthcare systems, and the profession itself.

Rutgers School of Nursing Serving as Beta Site

Now, for the first time, CDS is offering this vitally important database to leading academic institutions across the country with an interest in exploring big data in the service of clinical inquiry.

Today, Rutgers School of Nursing (RSON) is serving as the Beta site to demonstrate the potential of rigorous nursing documentation in Nursing Informatics curriculum and to test how Project NeLL™’s extensive anonymized patient care database can be used for Nursing-centered big data exploration and analysis. RSON is the first academic institution outside of Emory to use Project NeLL™ in the service of graduate- and DNP-level coursework. Prior to using Project NeLL™ in the Information Technology course that is required for all DNPs, students practiced calculating quality measures from a small anonymized clinical dataset.

One goal of integrating Project NeLL™ at RSON is to use real life clinical data to teach students some of the current big data challenges. An additional goal is to make students aware of the high value of concise nursing documentation feeding EHR big data so that it may be transformed into knowledge and wisdom to improve care. Nursing students learn the basics of health data science with Project NeLL™ through technologies that gather, validate, structure, and analyze the multidisciplinary EHR data.

Beta testing is an essential phase of software application development. The application is tested by users working under standard operating conditions in two segments: Alpha testing, the first phase of user evaluation; and Beta testing.7 Both phases involve testers who resemble the market segment for whom the application was developed. Once Alpha evaluations are complete, a distinct second group of similar users is recruited to conduct Beta testing. Testers from both phases enter into their respective evaluation periods knowing that the software is close to completion but not perfect. The purpose of Alpha and Beta testing is to identify operational units of the product that are not performing to specifications and/or offer opportunities to improve.

How Rutgers Students Experienced Project NeLL

When RSON first began to work with Project NeLL™, approximately 80 graduate students were enrolled in Fall 2022. Students were patient with a few technical difficulties that surfaced early on. We addressed issues with email addresses, payments, passwords, and completion of legal agreements that were needed before gaining access to Project NeLL™’s data. With the competent and ready assistance of Project NeLL™ staff, RSON eased the students into their own search for EHR data relevant to their respective DNP specialties.

The Rutgers students signed a Sublicense Agreement and then were given a log-on to the Project NeLL™ application. They accessed the Project NeLL™ website with their laptops. The application is web-based and therefore no software is downloaded. Students purchase one year of access to the application and may use it as often as they need to. They may select multiple data sets and review them before downloading a data set as a CSV file. Most students used Excel to analyze their data.

Before diving into Project NeLL™, students experienced the public quality reporting that relies on EHR data. They searched the CMS Care Compare website for a hypothetical or family need, such as Hospitalization for Heart Failure.8 Then, they checked the data elements and quality measures that made up the Five-Star Quality Ratings System. For example, the quality of care rating for a hospital included timely and effective care, complications and deaths, unplanned hospital visits, psychiatric unit services, payment, and value of care. The infection rate and death rate for in-hospital heart failure were compared to a national benchmark. Many students commented that they had not heard of the Care Compare website before this assignment.

Electronic Clinical Quality Measures

Students began their independent big data learning by selecting an Electronic Clinical Quality Measure (eCQM) relevant to their specialty practice and defining how they would query the Project NeLL™ data to calculate the quality measure. eCQMs, which stem from evidence-based guidelines, are the foundation of clinical quality analyses and underlie much of the public quality data published online by the Centers for Medicare and Medicaid Services.9

Rutgers students consulted the Project NeLL™ data dictionary, searched its web-based EHR data for eCQMs related to their clinical specialty and drilled down into the data needed for the numerator and denominator to calculate each measure. A large part of the lesson centered on evaluating and critically thinking about the specific database tables needed to collect eCQM data. As they isolated the data elements and specified the required tables, many students began to recognize, often for the first time, the tremendous magnitude and variety of EHR data, even though some data might be missing or buried in text notes. This recognition builds competencies that will carry through into their clinical courses and their own careful documentation.

For example, CMS2v12, Preventive care and screening: Screening for depression and follow-up plan, is defined as “Percentage of patients aged 12 years and older screened for depression on the date of the encounter or up to 14 days prior to the date of the encounter using an age-appropriate standardized depression screening tool AND if positive, a follow-up plan is documented on the date of or up to two days after the date of the qualifying encounter.”10 Students considered where the documentation was found, and who did it, to satisfy the numerator “Patients screened for depression on the date of the encounter or up to 14 days prior to the date of the encounter using an age-appropriate standardized tool AND if positive, a follow-up plan is documented on the date of or up to two days after the date of the qualifying encounter.” This search brought awareness to the need to document with a standardized and validated depression screening tool, such as the Patient Health Questionnaire (PHQ-9), as well as the potential for a reminder within the EHR as decision support to do the screening.11
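The numerator and denominator logic quoted above can be sketched in a few lines of Python. This is a simplified illustration, not the official eCQM specification: it assumes one qualifying encounter per patient, ignores the measure's exclusions and exceptions, and uses hypothetical field names rather than actual Project NeLL™ table columns.

```python
from datetime import date


def meets_numerator(enc):
    """Apply the screening and follow-up timing rules to one encounter."""
    screen = enc.get("screen_date")
    if screen is None:
        return False
    # Screening must fall on the encounter date or up to 14 days before it.
    days_before = (enc["encounter_date"] - screen).days
    if not 0 <= days_before <= 14:
        return False
    # A negative screen needs no follow-up plan.
    if not enc["screen_positive"]:
        return True
    # A positive screen needs a follow-up plan documented on the
    # encounter date or up to two days after it.
    followup = enc.get("followup_date")
    if followup is None:
        return False
    return 0 <= (followup - enc["encounter_date"]).days <= 2


def depression_screening_rate(encounters):
    """Return numerator / denominator, or None if the denominator is empty."""
    denominator = [e for e in encounters if e["age"] >= 12]
    if not denominator:
        return None
    numerator = [e for e in denominator if meets_numerator(e)]
    return len(numerator) / len(denominator)


# Made-up records for illustration only.
encounters = [
    {"age": 30, "encounter_date": date(2019, 5, 10), "screen_date": date(2019, 5, 10),
     "screen_positive": False, "followup_date": None},
    {"age": 45, "encounter_date": date(2019, 5, 10), "screen_date": date(2019, 5, 1),
     "screen_positive": True, "followup_date": date(2019, 5, 12)},
    {"age": 16, "encounter_date": date(2019, 5, 10), "screen_date": date(2019, 4, 1),
     "screen_positive": False, "followup_date": None},
    {"age": 60, "encounter_date": date(2019, 5, 10), "screen_date": date(2019, 5, 10),
     "screen_positive": True, "followup_date": None},
    {"age": 10, "encounter_date": date(2019, 5, 10), "screen_date": None,
     "screen_positive": False, "followup_date": None},
]
rate = depression_screening_rate(encounters)
```

In this made-up sample, four encounters fall in the denominator and two satisfy the numerator. Each field in the sketch corresponds to a piece of documentation that must exist as structured data in the EHR before the measure can be calculated.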

Another Look at Statin Use

RSON students also used Project NeLL™ data to calculate an eCQM on the use of statin therapy for the prevention and treatment of cardiovascular disease (CMS347v5).12 This measure is based on the American College of Cardiology (ACC)/American Heart Association (AHA)/Multi-society (MS) 2019 Guideline recommendation and is intended to provide a strong evidence-based foundation for the treatment of blood cholesterol for the primary and secondary prevention and treatment of atherosclerotic cardiovascular disease (ASCVD) in patients of all ages.13

In the statin quality measure case study, the Project NeLL™ data dictionary supported the students’ online search through the diagnosis and medication tables to calculate the proportion of “All patients who have an active diagnosis of clinical ASCVD or ever had an ASCVD procedure” who were prescribed statin therapy during a two-year period, 2017–2019. A sample of 1000 ASCVD patient encounters from Project NeLL™ was reviewed to determine statin therapy prevalence.
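A calculation of this kind can be sketched in Python against a downloaded data file. The column names, the sample rows, and the short statin list below are assumptions for illustration; they do not reflect the actual Project NeLL™ export format or the full value sets in the measure specification.

```python
import csv
import io

# Hypothetical export: one row per patient-medication pair.
SAMPLE_CSV = """patient_id,ascvd_diagnosis,medication
p1,yes,atorvastatin
p1,yes,metoprolol
p2,yes,lisinopril
p3,no,simvastatin
p4,yes,rosuvastatin
"""

# Partial list of statin generic names, for illustration only.
STATINS = {"atorvastatin", "simvastatin", "rosuvastatin", "pravastatin",
           "lovastatin", "fluvastatin", "pitavastatin"}


def statin_proportion(csv_text):
    """Proportion of distinct ASCVD patients with at least one statin row."""
    ascvd_patients = set()
    statin_patients = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["ascvd_diagnosis"] != "yes":
            continue  # not in the denominator
        ascvd_patients.add(row["patient_id"])
        if row["medication"].strip().lower() in STATINS:
            statin_patients.add(row["patient_id"])
    if not ascvd_patients:
        return None
    return len(statin_patients) / len(ascvd_patients)


proportion = statin_proportion(SAMPLE_CSV)
```

Counting distinct patients rather than rows matters here: a patient with several medication rows should contribute once to the denominator and at most once to the numerator.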

Quality and population health data online

In another exercise, students searched online for population health indicators from two or more counties in their state. They compared this data to a query for the same population health indicators specific to a zip code for patients in the Project NeLL™ database.14 They discussed evidence-based nursing and public health interventions available to address modifiable risk factors related to the population health indicators. For example, looking into data from a county with higher tobacco use than a neighboring county prompted a discussion of assessments and interventions relative to tobacco use. This analysis fits with eCQM CMS138v11, Preventive care and screening: Tobacco use, screening and cessation intervention.15 Students viewed the required numerator and denominator data in Project NeLL™ to calculate the eCQM.

Finally, students were given a choice of case studies relevant to different specialties. They were free to search Project NeLL™ for data relevant to their preliminary Doctor of Nursing Practice project questions. They used the skills learned from previous exercises to identify a quality of care clinical question, design a query, and evaluate and analyze a Project NeLL™ data set. They submitted their queries and spreadsheets showing the results of their analyses.

Focusing on Clinical Decision Support

After learning the specifics of data needed for electronic Clinical Quality Measures (eCQMs), students discussed a nursing-focused clinical decision support that they have used or read about. They proposed a method in the EHR to collect the data elements that would likely be required to activate the decision support. Students then identified gaps in the quality of care and considered DNP projects that could be built around decision support designed to improve the quality of care.

Patient Reported Outcome Measures

Students selected two measures from the online catalog of the Patient Reported Outcomes Measurement Information System (PROMIS) and designed a query to search Project NeLL™ for any matching data and to identify the database table containing the measures.16 For example, the PHQ-9 mentioned above is comparable to a PROMIS questionnaire that patients complete as part of the history taken for nursing assessment.11 PROMIS questionnaires provide structured data that may become part of an EHR or patient portal and thereby become available for quality measurement.

For a related assignment to demonstrate the potential of careful documentation, students were assigned to match standardized nursing terminology—Clinical Care Classification (CCC)—to ICD-10 and CPT codes in specialty-related superbills.17,18,19 The students then used reference sheets to match the CCC nursing diagnoses to SNOMED CT and LOINC codes.20,21 Each CCC diagnosis is modified by a goal or expected outcome.

Through these lessons, the limitations of EHRs as a data source become obvious, emphasizing the potential for improvement with rigorous clinical documentation. Several of the students came to the same conclusion the faculty had: adding more structured nurse-sensitive data into EHRs can change the way nursing care is measured and valued.

Value of Rigorous Documentation

Overall, students were excited about being able to apply what they had learned about quality of care research to the Project NeLL™ data source. The exercises position the EHR as a data source for public-facing quality and safety information.

Next steps at RSON include expanding the use of Project NeLL™ to focus on projects that are part of other courses. The Informatics Database course could design assignments, focusing perhaps on the table structure and available queries, around Project NeLL™. In addition, Project NeLL™ may offer value in the DNP Quality and Safety course where students learn how to evaluate and improve the quality of care. Datasets downloaded from Project NeLL™ could be used as a rich source of data for analyses in the school’s graduate Statistics course as well. Faculty leading these required DNP courses are watching the current beta experience to evaluate how Project NeLL™ could contribute to and support upcoming course and curriculum revisions.

Highlighting the potential for DNP and clinical research projects to use local de-identified data raises the hopeful expectation that local clinical data access will become part and parcel of applied science, as evidenced by Rutgers’ NIH Clinical and Translational Science Award: New Jersey Alliance for Clinical and Translational Science.22

Interoperability: The Next Frontier for Data Science

Computers have difficulty reading and consistently analyzing text-based entries, such as Nursing Notes. This unstructured data, which is the kind of clinical documentation usually taught to nursing students, contains valuable information and insights. The Federal Health IT Strategic Plan intends to enhance clinical documentation in EHRs with standardized terms using the United States Core Data for Interoperability.23

In accordance with the Federal plan, most clinical documentation will be mapped to SNOMED-CT or LOINC standardized terminology to facilitate data sharing, aggregation, and “big data” analyses.11,12,23 Patient history, vital signs and exam templates will adhere to SNOMED-CT and LOINC terms. Laboratory and imaging tests will be coded with LOINC. In addition, many valid and reliable patient questionnaires, such as Patient-Reported Outcome Measurement surveys, will be documented and matched to coded LOINC terms. Familiar nursing terms for assessment, goals, and interventions, such as those in Clinical Care Classification, are already aligned with SNOMED-CT.8 These terms can be presented on the EHR interface screen and mapped to the coded terminologies in the computer backend database.


Working with Project NeLL™ helps nurses be aware of how the old research adage of “garbage in, garbage out” affects the quality of data in EHRs, potentially compromising the effectiveness of quality assessment, data science, and nursing research. Rather than view nursing documentation as a narrative record of completed patient care tasks, the Emory and Rutgers nursing students reported that they were, for the first time, thinking of this activity as entering structured data into the EHR for scientific and research purposes. In addition, EHR data is critical to calculate value-based payment rewards for quality processes and outcomes.

As nurses progress from implementing the orders of others to making diagnoses and formulating orders of their own as advanced practice registered nurses (APRNs), Project NeLL™’s data can be instrumental in changing nurses’ mindsets from task execution to informed patient care-focused decision making. After working with Project NeLL™, APRNs and DNPs may view EHRs as an ongoing opportunity to document nursing quality. Taking ownership of EHR documentation empowers nurses to participate in a Learning Health System “in which internal data and experience are systematically integrated with external evidence, and that knowledge is put into practice. As more organizations look at value-based care and pursue their learning health systems journeys, those that do not rethink how they operate risk being left behind”.24

Project NeLL™ is currently being updated with additional years of data and the user application is being enhanced based on the feedback from Rutgers students. Emory CDS is actively seeking other nursing schools to become clients. Project NeLL™ can be easily integrated in any course focusing on big data, research methods, and clinical improvement projects. The application provides easy access to data and allows students to focus on analyzing their data and drawing conclusions.

Competing Interests

Melinda Jenkins: CCDS Custom Clinical Decision Support, Inc. Raleigh NC.


1. American Association of Colleges of Nursing. (2021). The Essentials: Core competencies for professional nursing education. Accessed Apr. 26, 2023.

2. Project NeLL. Nell Hodgson Woodruff School of Nursing of Emory University Center for Data Science. Updated 2022. Accessed Oct. 15, 2022.

3. Gu A, Kamat S, Argulian E. Trends and disparities in statin use and low-density lipoprotein cholesterol levels among US patients with diabetes, 1999–2014. Diabetes Res Clin Pract. 2018; 138: 1–10. DOI:

4. Zhang H, Plutzky J, Shubina M, Turchin A. Drivers of the sex disparity in statin therapy in patients with coronary artery disease: A cohort study. PLOS ONE; 2016. DOI:

5. Nanna MG, Wang TY, Xiang Q, et al. Sex Differences in the Use of Statins in Community Practice. Circ Cardiovasc Qual Outcomes. 2019; 12(8): e005562. DOI:

6. Puri R, Nissen SE, Shao M, et al. Sex-related differences of coronary atherosclerosis regression following maximally intensive statin therapy: insights from SATURN. JACC Cardiovasc Imaging. 2014; 7(10): 1013–1022. DOI:

7. Beta site. PC Magazine Encyclopedia. Updated 2022. Accessed Oct. 15, 2022.

8. Care-compare. Accessed Oct. 15, 2022.

9. Eligible Clinician Electronic Clinical Quality Measures (eCQM). eCQI Resource Center. Updated Dec. 29, 2021. Accessed Oct. 15, 2022.

10. Preventive Care and Screening: Screening for Depression and Follow up Plan. eCQI Resource Center. Updated Aug. 29, 2022. Accessed Oct. 15, 2022.

11. Kroenke K, Spitzer RL, Williams JBW. The PHQ-9. Validity of a Brief Depression Severity Measure. J Gen Intern Med. 2001 Sep; 16(9): 606–613. DOI:

12. Statin Therapy for the Prevention and Treatment of Cardiovascular Disease. eCQI Resource Center 2022. Updated Aug. 29, 2022. Accessed Oct. 15, 2022.

13. Rubenfire M. 2019 ACC/AHA Guideline on the Primary Prevention of Cardiovascular Disease. J Am Coll Cardiol 2019.

14. County Health Rankings and Roadmaps. (2022). University of Wisconsin Population Health Institute. Updated 2022. Accessed Oct. 15, 2022.

15. Preventive Care and Screening: Tobacco Use: Screening and Cessation Intervention. eCQI Resource Center. Updated Aug. 29, 2022. Accessed Oct. 15, 2022.

16. Health Measures. Patient Reported Outcome Measurement Information System (PROMIS). Updated 2022. Accessed Oct. 15, 2022.

17. Clinical Care Classification. HCA Healthcare. Updated 2022. Accessed Oct. 15, 2022.

18. International Classification of Diseases (ICD). World Health Organization. Updated 2022. Accessed Oct. 15, 2022.

19. Current Procedural Terminology (CPT). American Medical Association. Updated 2022. Accessed Oct. 15, 2022.

20. SNOMED International. (2022). 5-Step Briefing. Available online from

21. Logical Observation Identifiers, Names and Codes (LOINC). Regenstrief Institute. Updated 2022. Accessed Oct. 15, 2022.

22. New Jersey Alliance for Clinical and Translational Science. Rutgers University. Updated 2019. Accessed Oct. 15, 2022.

23. US Core Data for Interoperability. Office of the National Coordinator of Health Information Technology. Updated 2022. Accessed Oct. 15, 2022.

24. About Learning Health Systems. Agency for Healthcare Research and Quality. Updated May 2019. Accessed Oct. 15, 2022.