Predictive Modeling for Clinical Features Associated With Neurofibromatosis Type 1

Abstract
Objective To perform a longitudinal analysis of clinical features associated with neurofibromatosis type 1 (NF1) based on demographic and clinical characteristics and to apply a machine learning strategy to determine feasibility of developing exploratory predictive models of optic pathway glioma (OPG) and attention-deficit/hyperactivity disorder (ADHD) in a pediatric NF1 cohort.
Methods Using NF1 as a model system, we perform retrospective data analyses using a manually curated NF1 clinical registry and electronic health record (EHR) information and develop machine learning models. Data for 798 individuals were available, with 578 comprising the pediatric cohort used for analysis.
Results Males and females were evenly represented in the cohort. White children were more likely to develop OPG (odds ratio [OR]: 2.11, 95% confidence interval [CI]: 1.11–4.00, p = 0.02) relative to their non-White peers. Median age at diagnosis of OPG was 6.5 years (1.7–17.0), irrespective of sex. Males were more likely than females to have a diagnosis of ADHD (OR: 1.90, 95% CI: 1.33–2.70, p < 0.001), and earlier diagnosis in males relative to females was observed. The gradient boosting classification model predicted diagnosis of ADHD with an area under the receiver operator characteristic (AUROC) of 0.74 and predicted diagnosis of OPG with an AUROC of 0.82.
Conclusions Using readily available clinical and EHR data, we successfully recapitulated several important and clinically relevant patterns in NF1 semiology specifically based on demographic and clinical characteristics. Naive machine learning techniques can be potentially used to develop and validate predictive phenotype complexes applicable to risk stratification and disease management in NF1.
Neurofibromatosis type 1 (NF1) is one of the most common monogenic disorders, occurring in 1 of every 3,000 births. Caused by germline mutations in the NF1 gene (OMIM: 613113), NF1 is a fully penetrant disorder; however, it is marked by extreme clinical variability, with highly discordant clinical phenotypes. At present, it is not possible at the time of diagnosis to predict which patients with NF1 will develop specific clinical manifestations such as optic pathway glioma (OPG) or neurobehavioral problems (e.g., attention-deficit/hyperactivity disorder [ADHD]) in the future. This high degree of clinical heterogeneity hampers accurate predictive assessment relevant to precision medicine and limits clinicians' ability to focus medical resources on individuals with NF1 at the highest risk for specific complications. As a result, disease monitoring and surveillance guidelines are inconsistently implemented across the NF1 population.1,2
Our ability to implement proactive approaches to the care of individuals with NF1 requires a delineation of potential risk factors for specific disease phenotypes. In this regard, recent studies have used clinical data to link age,3 sex,3,4 comorbid diagnoses,5 and NF1 coding variants6–8 to important NF1-related outcomes. As an initial step toward developing clinically actionable predictive algorithms in NF1, we used informatics-based approaches to perform a longitudinal analysis of NF1 clinical features stratified across demographic characteristics. In addition, we determined the feasibility of developing an informatics-based exploratory predictive model of OPG and ADHD in a pediatric cohort by applying machine learning strategies to a manually curated NF1 clinical registry and existing electronic health record (EHR) data.
Methods
Patients and Data Description
This study was performed using retrospective clinical data extracted from 2 sources within the Washington University Neurofibromatosis (NF) Center. First, data were extracted from an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children's Hospital. All individuals included in this database had a clinical diagnosis of NF1 based on current National Institutes of Health Consensus Development Conference diagnostic criteria9 and had been assessed over multiple visits from 2002 to 2016 for the presence of clinical features associated with NF1. Data points in this registry included demographic information, such as age, race, and sex, in addition to NF1-related clinical features and associated conditions, such as café-au-lait macules, skinfold freckling, cutaneous neurofibromas, Lisch nodules, OPG, hypertension, ADHD, and cognitive impairment. These data were maintained in a semistructured format containing textual and binary fields, capturing each individual's data over multiple clinical visits. From these data, clinical features and phenotypes were extracted using data manipulation, imputation, and text mining techniques. Data obtained from this NF1 clinical registry were converted to data tables, which captured each patient visit and the presence/absence of specific clinical features at each visit. Clinical features that were once marked as present were assumed to be present for all future visits, and missing data were assumed absent for that specific visit. Categorical variables are reported as frequencies and proportions and compared using odds ratios (ORs). Continuously distributed traits, adhering to both conventional normality assumptions and homogeneity of variances, are reported as mean and standard deviations and compared using analysis of variance methods. 
Nonparametric equivalents were used for data with non-normal distributions.
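The visit-level data tables described above can be illustrated with a minimal sketch. It assumes each visit is recorded as a patient identifier, an age, and the set of features noted at that visit; the feature names, the `build_visit_table` helper, and the example records are hypothetical and are not taken from the registry. A feature once marked present is carried forward to later visits, and anything not recorded is treated as absent.

```python
# Hypothetical subset of registry features, used only for illustration.
FEATURES = ["cafe_au_lait", "OPG", "ADHD"]

def build_visit_table(visits, features=FEATURES):
    """Convert (patient_id, age_years, recorded_features) tuples into rows
    of 0/1 indicators, carrying a feature forward once it has been seen."""
    rows = []
    seen = {}  # patient_id -> set of features already marked present
    for pid, age, recorded in sorted(visits, key=lambda v: (v[0], v[1])):
        present = seen.setdefault(pid, set())
        present |= set(recorded) & set(features)
        row = {"patient": pid, "age": age}
        # Missing data for a visit are treated as absent (0).
        row.update({f: int(f in present) for f in features})
        rows.append(row)
    return rows

visits = [
    ("P1", 4.0, {"cafe_au_lait"}),
    ("P1", 5.2, {"OPG"}),   # cafe-au-lait macules not re-recorded at this visit
    ("P1", 6.5, set()),     # nothing recorded; earlier features persist
    ("P2", 7.0, {"cafe_au_lait", "ADHD"}),
]
table = build_visit_table(visits)
for row in table:
    print(row)
```

Under these assumptions, patient P1 remains positive for café-au-lait macules and OPG at the 6.5-year visit even though nothing was recorded at that visit.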
Clinical Feature Extraction From Clinical Registry and EHR
The NF1 Clinical Registry comprised string-based clinical feature values, such as ADHD, OPG, and asthma. From these data, we extracted 27 unique clinical features in addition to longitudinal data on the development of NF1-related clinical features and associated diagnoses. For each clinical feature, age at initial presentation and/or diagnosis was computed, and median age of occurrence was calculated for each sex. The exact age of presentation and/or diagnosis could not be definitively ascertained for any feature that was present at a child's initial clinic visit. As such, we computed the age of diagnosis only for clinical features for which we have at least one visit documenting feature absence before the manifestation of that feature.
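The restriction described above, in which age at diagnosis is computed only when at least one earlier visit documents absence of the feature, can be sketched as follows. The record layout, the `age_at_onset` and `median_onset_by_sex` helpers, and the example patients are assumptions made for illustration only.

```python
from statistics import median

def age_at_onset(visits):
    """visits: list of (age_years, feature_present) for one patient and feature.
    Returns the age at the first visit with the feature, or None when the
    feature was already present at the first visit or never appeared."""
    ordered = sorted(visits)
    if not ordered or ordered[0][1]:
        return None  # no documented absence before presentation
    for age, present in ordered:
        if present:
            return age
    return None

def median_onset_by_sex(patients):
    """Median age at onset per sex, using only patients with a computable age."""
    ages = {}
    for p in patients:
        onset = age_at_onset(p["visits"])
        if onset is not None:
            ages.setdefault(p["sex"], []).append(onset)
    return {sex: median(values) for sex, values in ages.items()}

# Hypothetical example patients.
patients = [
    {"sex": "F", "visits": [(3.0, False), (4.1, True)]},
    {"sex": "F", "visits": [(2.0, True), (3.0, True)]},   # excluded: present at first visit
    {"sex": "M", "visits": [(5.0, False), (6.0, False), (7.5, True)]},
    {"sex": "M", "visits": [(1.0, False), (2.5, True)]},
    {"sex": "M", "visits": [(4.0, False)]},               # excluded: never present
]
result = median_onset_by_sex(patients)
print(result)
```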
Diagnosis codes from the EHR-derived data set were also extracted. Diagnosis codes were recorded as 15,890 unique International Classification of Diseases, Ninth Revision/Tenth Revision (ICD-9/10) codes. Given the large number of ICD-9/10 codes, a consistent, concept-level roll up of relevant codes to a single phenotype description was created by mapping the extracted ICD-9/10 values to phenome-wide association codes called Phecodes,10,11 which have been demonstrated to better align with clinical disease compared with individual ICD codes.12
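A concept-level roll-up of this kind can be sketched as a lookup followed by a frequency filter. The mapping entries below are placeholders rather than real ICD or Phecode values, and keeping the codes carried by the most patients is an assumed reduction strategy; the source does not state how the final set of Phecodes was chosen.

```python
from collections import Counter

# Placeholder mapping for illustration; real ICD-to-Phecode maps are published tables.
ICD_TO_PHECODE = {
    "ICD9:AAA.1": "PHE:100",
    "ICD10:B11.2": "PHE:100",   # two ICD codes roll up to one Phecode
    "ICD10:C33": "PHE:200",
}

def roll_up(patient_codes, mapping):
    """Map each patient's ICD codes to a set of Phecodes; unmapped codes are dropped."""
    return {
        pid: {mapping[code] for code in codes if code in mapping}
        for pid, codes in patient_codes.items()
    }

def top_k_phecodes(patient_phecodes, k):
    """Return the k Phecodes carried by the most patients (assumed reduction step)."""
    counts = Counter(code for codes in patient_phecodes.values() for code in codes)
    return [code for code, _ in counts.most_common(k)]

patient_codes = {
    "P1": ["ICD9:AAA.1", "ICD10:B11.2"],
    "P2": ["ICD10:B11.2", "ICD10:C33"],
    "P3": ["ICD9:ZZZ"],          # no mapping available
}
rolled = roll_up(patient_codes, ICD_TO_PHECODE)
top = top_k_phecodes(rolled, k=1)
print(rolled, top)
```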
Machine Learning Analyses
Using a combination of clinical features obtained from the NF1 Clinical Registry and EHR-derived data sets, we developed prediction models using a gradient boosting platform to identify patients with specific NF1-related diagnoses and thereby establish the usefulness of clinical history and documented clinical findings in predicting the phenotypic variability of NF1. Initial analyses used a gradient boosting model, a state-of-the-art tree-based classification algorithm that produces a predictive model from an ensemble of weak predictive models. A gradient boosting model was selected because it supports identifying the importance of the features used in the final prediction model. Subsequent analyses trained each model on 3 different feature sets: (1) demographic features for all patients, including race, sex, and family history of NF1 (5 features); (2) clinical features associated with NF1 (27 features) extracted from the NF1 Clinical Registry; and (3) diagnosis codes extracted from the EHR data, which were reduced to 50 Phecodes. Four-fold cross-validation was then applied to the 3 models, and their prediction accuracies were compared. Positive predictive value, F1 score, and the area under the receiver operating characteristic curve (AUROC) were used as evaluation metrics. Scikit-learn, a machine learning library in Python, was used to implement all analyses.13
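The general shape of such an analysis can be sketched with scikit-learn as follows. This is not the study code: the data are synthetic, the hyperparameters are library defaults, and the use of stratified, shuffled folds is an assumption, since the fold construction is not described here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in for a binary feature table (rows: patients, columns: features).
rng = np.random.default_rng(0)
n_patients, n_features = 400, 10
X = rng.integers(0, 2, size=(n_patients, n_features)).astype(float)

# Synthetic outcome that depends on the first two features plus noise.
logit = 1.5 * X[:, 0] + 1.0 * X[:, 1] - 1.5
y = (rng.random(n_patients) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = GradientBoostingClassifier(random_state=0)
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)  # four-fold CV

# Positive predictive value (precision), F1 score, and AUROC for each fold.
scores = cross_validate(
    model, X, y, cv=cv,
    scoring={"ppv": "precision", "f1": "f1", "auroc": "roc_auc"},
)
for name in ("ppv", "f1", "auroc"):
    print(name, round(float(scores[f"test_{name}"].mean()), 3))

# Feature importances from a model refit on all of the synthetic data.
model.fit(X, y)
importances = model.feature_importances_
print(np.argsort(importances)[::-1][:3])  # indices of the 3 most important features
```

Repeating the same loop over each feature set (demographic, clinical, and EHR-derived) would give the per-model comparison described above.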
Standard Protocol Approvals, Registrations, and Patient Consents
The NF1 Clinical Registry is an existing longitudinal clinical registry that was manually curated using clinical data obtained from patients followed in the Washington University NF Clinical Program at St. Louis Children's Hospital. All individuals included in this database have a clinical diagnosis of NF1 based on current National Institutes of Health criteria and have provided informed consent for participation in the clinical registry. All data collection, usage, and analysis for this study were approved by the Institutional Review Board at the Washington University School of Medicine.
Data Availability
Anonymized data not published within this article will be made available by request from any qualified investigator.
Results
Prevalence Analysis
Data for 798 individuals were available in the NF1 Clinical Registry, in which the majority of individuals were under the age of 18 years, likely reflecting the clinical referral bias of pediatric patients to the Washington University NF Clinical Program. Consistent with an absence of any notable sex predilections for the diagnosis of NF1, males and females were evenly represented in the database, and the majority of individuals in the database were White, similar to the demographics of the catchment area of St. Louis Children's Hospital (81.8% vs 82.8%, χ2 = 0.068, p = 0.79) (Table 1).17 Table 1 includes the racial distribution of the non-White individuals (18.2%), comprising American Indian or Alaska Native, Asian, Black, Native Hawaiian or other Pacific Islander, and Other.
Table 1. Demographics From Neurofibromatosis Type 1 Clinical Registry
Among the pediatric patients included in the NF1 Clinical Registry (n = 578), White children were more likely to develop OPG relative to non-Whites (OR: 2.11, 95% confidence interval [CI]: 1.11–4.00, p = 0.02), as previously reported (Table 2).14,15 White children were more likely to have Lisch nodules than their non-White peers (OR: 1.75, 95% CI: 1.15–2.67, p = 0.009), consistent with previous studies demonstrating a greater likelihood of developing Lisch nodules in individuals with light irides compared with those with dark irides.16 Of interest, White children were less likely to exhibit skinfold freckling than their non-White peers (OR: 0.28, 95% CI: 0.09–0.03, p = 0.04), a finding not previously reported. Finally, White children were more likely than their non-White peers to harbor T2 hyperintensities on neuroimaging in the basal ganglia (OR: 1.95, 95% CI: 1.10–3.45, p = 0.02) and cerebellum (OR: 2.15, 95% CI: 1.20–3.85, p = 0.01).
Table 2. Prevalence of Clinical Features Associated With NF1 in the Pediatric Cohort, Stratified by Sex and Race
To complement these findings, a similar analysis was performed using data from the NF1 Clinical Registry, revealing an elevated male-to-female sex ratio for the diagnosis of ADHD (OR: 1.90, 95% CI: 1.33–2.70, p < 0.001). This likely reflects important sex differences related to the clinical presence of impulsivity and hyperactive behaviors among males relative to females in the context of NF1.17 Furthermore, females were more likely to have a diagnosis of scoliosis compared with males in the NF1 Clinical Registry (OR: 1.77, 95% CI: 1.17–2.66, p = 0.01), consistent with the female predominance observed in idiopathic (non-NF1) juvenile scoliosis.18
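For readers less familiar with how an OR and its 95% CI arise from a 2 × 2 table, the following sketch uses the Wald (Woolf) interval on the log-odds scale. The interval method is an assumption, because the method used for the reported intervals is not stated, and the counts are hypothetical rather than registry data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2 x 2 table.
    a: group 1 with outcome,  b: group 1 without outcome,
    c: group 2 with outcome,  d: group 2 without outcome."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 40 of 100 in group 1 and 20 of 100 in group 2 have the outcome.
or_value, ci_low, ci_high = odds_ratio_ci(40, 60, 20, 80)
print(f"OR: {or_value:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1 corresponds to a difference that is statistically significant at the 0.05 level under this approximation.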
Cutaneous neurofibromas were the most common tumor manifestation in this cohort, reported in 59% of both females and males with NF1. Slightly more than 200 of 578 children with NF1 (35%) presented with a plexiform neurofibroma, which is in accordance with previously reported frequencies (16%–40%).19 Studies have shown that individuals with NF1 have an 8%–13% lifetime risk of developing malignant peripheral nerve sheath tumors (MPNSTs); however, the mean age of diagnosis of MPNSTs is typically older than 25 years.20,21 Because adult data were excluded from analysis, the prevalence of MPNST in this cohort was low (1.2%). Despite MPNST diagnosis being more prevalent in females (6 females compared with 1 male; p = 0.08), no significant sexual dimorphism was observed in this cohort, similar to previous reports.22,23
Finally, more children in the NF1 Clinical Registry were found to have a maternal family history of NF1 compared with a paternal family history of NF1 (28.3% vs 17.3%, χ2 = 15.5, p < 0.001), despite the expected equal distribution of maternal and paternal inheritance in familial NF1.24 Although a prominent maternal parent-of-origin bias has been observed for familial NF1 microdeletion syndrome,25 other studies have failed to demonstrate a parent-of-origin effect for NF1 as a whole.24,26
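The comparison of maternal and paternal family history can be illustrated with a chi-square goodness-of-fit test against an equal split. Both the choice of test and the counts are assumptions: the counts are back-calculated by applying the reported percentages to the 578 children in the pediatric cohort and may not match the registry exactly.

```python
import math

def chi_square_equal_split(n_a, n_b):
    """Goodness-of-fit chi-square (1 degree of freedom) for two counts
    against the expectation that both are equally likely."""
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Assumed counts, back-calculated from the reported percentages.
maternal = round(0.283 * 578)
paternal = round(0.173 * 578)
chi2, p_value = chi_square_equal_split(maternal, paternal)
print(maternal, paternal, round(chi2, 1), p_value < 0.001)
```

Under these assumptions the statistic rounds to 15.5 with p < 0.001, in line with the values reported above.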
Age-Based Analysis
Of 578 patients, 438 (76%) were included in the age-based analysis because they had multiple clinical visits. The mean interval between 2 consecutive visits in our data set was 470 days (SD: 310, median: 378 days). All 438 patients presented to their initial clinic visit with café-au-lait macules, thus precluding estimates of a median age at onset (Figure, A; Table 3). As previously noted, skinfold freckling is apparent in most children by age 8–9 years, whereas Lisch nodules are detected in 50% of affected individuals by the early teens. Scoliosis was most likely to present during early adolescence, with a median age at onset of 12.5 years, without a significant difference in age at onset between sexes. The median age at ADHD diagnosis was 9.1 years (3.1–17.9); diagnosis was numerically earlier in males, although the difference was not statistically significant (8.6 years vs 9.4 years; p = 0.42).
Figure. The Y-axis shows the percentage of children who presented with a particular clinical feature as a function of age (years). ADHD = attention-deficit/hyperactivity disorder; MPNST = malignant peripheral nerve sheath tumor; OPG = optic pathway glioma.
Table 3. Median Age of Neurofibromatosis Type 1–Associated Features in Individuals With >1 Clinical Visit
With respect to tumor development, the median age at diagnosis of OPG was 6.5 years (1.7–17.0), irrespective of sex. Of interest, endocrinologic issues were significantly more likely to present earlier in female children (3.7 years vs 11.3 years, p = 0.013), perhaps reflecting a greater proportion of symptomatic OPG in females, which results in precocious puberty or other hormonal derangements.27 Cutaneous neurofibromas increase as a function of age, whereas plexiform neurofibromas are usually detected during the first decade of life in both males and females with NF1. Seven children, 1 male and 6 females, in our cohort were diagnosed with MPNST. The male child was diagnosed at age 5.9 years, and the median age of diagnosis for the female children was 15.7 years. This is consistent with previous pediatric case reports that demonstrate early age of MPNST diagnosis in males with NF1.28,29
Prediction Analysis Using Clinical Features
Exploratory prediction analyses were performed for the diagnosis of ADHD and OPG, 2 common NF1 clinical phenotypes. Both diagnoses were present in greater than 18% of children in the cohort and exhibited variable ages at onset and trends that indicated a propensity for sexual and racial dimorphism. Models for predicting plexiform neurofibroma were included for comparison purposes. The generated prediction models performed well, and the performance increased with the addition of clinical features (Table 4). The gradient boosting classification model predicted the clinical diagnosis of ADHD with an AUROC of 0.74 and predicted the diagnosis of OPG with an AUROC of 0.82. For the OPG gradient boosting classification model, the most important demographic features were White race, female sex, and a maternal history of NF1. The presence of precocious puberty; T2 hyperintensities within the cerebellum, basal ganglia, and other locations; Lisch nodules; and plexiform and dermal neurofibromas were the most predictive clinical features, whereas the most important EHR-derived codes included kyphoscoliosis and scoliosis, amblyopia, and other dyschromia. For the ADHD model, the most important demographic features were male sex and a family history of NF1 irrespective of the parent. The most important clinical features included the presence of a learning disability, scoliosis, Lisch nodules, and plexiform and dermal neurofibromas. The most predictive EHR-derived codes included other benign neoplasm of connective and other soft tissues. For the exploratory model of plexiform neurofibromas, the AUROC was 0.69, and the most important demographic features were White race, female sex, and maternal history of NF1. The most important clinical features were dermal neurofibromas, Lisch nodules, and learning disability. The most predictive EHR-derived codes included other dyschromia, astigmatism, and disorders of optic nerve and visual pathways.
Table 4. Cross-Validation Performance Results for Predicting OPG, ADHD, and Plexiform Neurofibromas Among Children With Neurofibromatosis Type 1
Discussion
Previous studies aimed at determining prognostic markers for NF1 have identified only a small number of demographic and clinical characteristics relevant to risk stratification for NF1-related medical complications.4 The primary challenge encountered in these studies is that the associations between the identified prognostic determinants and patient outcomes are generally weak from a quantitative perspective, which significantly limits their applicability for clinical decision making. Similarly, although there is extant literature aimed at dissecting the genetic basis of phenotypic heterogeneity in NF1,6–8,30 the translation of such sequencing-based disease staging/monitoring into prognostic models has been limited. Taken together, NF1 can be accurately and reproducibly diagnosed in children, but subsequent disease management of affected patients is not informed by empiric or widely understood prognostic features. This challenge is emblematic of the broader challenge of informing and delivering precision medicine, wherein sufficiently granular and tailored evidence either does not exist or has not been studied in systematic ways. As such, identifying computational approaches whereby evidence can be generated based on existing data sets, wherein NF1 can be systematically and reproducibly diagnosed and where subsequent disease surveillance and management can be made less variable and more precise, is an ideal test case for the methods that will inform and enable precision medicine writ large.
We hypothesize that one of the primary reasons for the failure of current approaches to identify clinically useful prognostic factors for NF1 is the reliance on conventional and reductionist pair-wise association testing in which dyads of clinical features and outcomes are iteratively tested for quantitatively significant associations in a population of patients. Fortunately, there are an increasing number of machine learning and multiscale modeling techniques that can provide investigators and clinicians with the tools needed to quickly generate hypotheses concerning the relationship between entities found in heterogeneous collections of scientific data (for example, exploring potential linkages between a gene, phenotype, and disease management protocols), thus enabling the forward engineering of prognostic and therapeutic strategies based on knowledge generated via basic science studies.31–34
First, the demographics of the current cohort accurately reflect those of the greater referral population, substantiating the absence of a sex or racial predilection in children with NF1. Second, we could effectively reproduce the racial discordance previously reported for OPG14; however, we also explored previously unknown racial differences in the development of other NF1-related clinical features, including pigmentary abnormalities and T2 hyperintensities. Although further work will be required to define the basis for racial disparities in T2 hyperintensities, we and others have reported a reduced incidence of gliomas in non-White patients compared with Whites,35–38 even among those with NF1.14 Because gliomas and T2 hyperintensities can be difficult to distinguish without applying strict radiographic criteria,39 it is possible that some of these brain lesions were actually low-grade gliomas. These findings suggest that race may serve as an important predictive factor for a variety of different NF1-related features. Further investigation into the racial differences observed in NF1 is warranted. Third, our data analysis revealed a clear female predominance for the development of scoliosis in NF1, which is a well-established association in juvenile idiopathic scoliosis18 but is poorly recognized in the context of NF1. Fourth, although there was sexual dimorphism for OPG, a finding that has been reproduced many times in the NF1 literature,3,40,41 we found an earlier age at onset of endocrinologic abnormalities in females, supporting previous studies demonstrating a greater risk for precocious puberty and vision loss in young females with NF1.3,27 Fifth, the earlier development of MPNSTs in male children with NF1 is difficult to interpret because of the limited sample size but warrants further evaluation.
Studying the influence of age and demographic characteristics on the development of NF1 clinical features has the potential to inform more personalized approaches to the identification of symptom complexes and ultimately the clinical management of children with NF1. As such, the application of modern computational approaches42,43 to NF1 facilitated the development of exploratory predictive models with variable performance to identify patients with OPG, ADHD, and plexiform neurofibromas using demographic, clinical features, and EHR data recorded before the clinical manifestation of the feature. The variability in model performance demonstrated herein for diagnosis of OPG and ADHD is most reasonably explained by differences in disease presentation, diagnostic methodology, and differences in clinical expertise of the NF1 clinician. We anticipate that these models would enable evidence-based, precision medicine approaches to the management and treatment of individuals diagnosed with NF1 (where such approaches currently do not exist) and further be applicable to other cancers in which the intersection of complex clinical and pleiotropic disease phenotypes must be understood to predict and understand oncogenesis.
As with all studies using EHR data, 1 inherent limitation of this study relates to the quality and completeness of the EHR data, as well as the racial composition of our clinic population. Nonetheless, this is the first study to use high dimensional clinical phenotypes extracted from electronically collected and heterogeneous clinical records to develop prediction models for features associated with NF1. Together, future application of these methodologies to the study of NF1 is expected to advance the diagnosis and care of patients and develop predictive models for subphenotyping and proactive management of NF1, thus representing an opportunity to use precision medicine paradigms in disease states in which the current evidence base precludes such an approach.
Study Funding
Study funded in part by a grant from the NCI (1-R35-NS097211-01 to D.H.
TAKE-HOME POINTS
→ NF1 can be accurately and reproducibly diagnosed in children, but subsequent disease management of affected patients is not informed by empiric or widely understood prognostic features.
→ Longitudinal analysis and exploratory predictive models developed and validated for clinical features associated with NF1 enable evidence-based, precision medicine approaches to the management and treatment of individuals diagnosed with NF1.
→ Using retrospective data analysis, we successfully recapitulated several clinically relevant patterns in NF1 semiology.
→ The machine learning model developed using EHRs and curated clinical data predicted the diagnosis of ADHD with an AUROC of 0.74 and predicted diagnosis of OPG with an AUROC of 0.82.
Disclosure
The authors report no disclosures relevant to the manuscript. Full disclosure form information provided by the authors is available with the full text of this article at Neurology.org/cp.

Footnotes
Funding information and disclosures are provided at the end of the article.
* These authors contributed equally to this work.
The Article Processing Charge was funded by the authors.
- Received August 6, 2020.
- Accepted February 25, 2021.
- Copyright © 2021 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Academy of Neurology.
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND), which permits downloading and sharing the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
References
- 1. Gutmann DH, Ferner RE, Listernick RH, Korf BR, Wolters PL, Johnson KJ
- 2.
- 3.
- 4. Morris SM, Acosta MT, Garg S, et al
- 5. Eby NS, Griffith JL, Gutmann DH, Morris SM
- 6.
- 7.
- 8.
- 9. Neurofibromatosis: conference statement. JAMA Neurol. 1988;45(5):575-578.
- 10.
- 11.
- 12.
- 13.
- 14. Abadin SS, Zoellner NL, Schaeffer M, Porcelli B, Gutmann DH, Johnson KJ
- 15. King A, Listernick R, Charrow J, Piersall L, Gutmann DH
- 16. Boley S, Sloan JL, Pemov A, Stewart DR
- 17. Cohen R, Halevy A, Aharon S, Shuper A
- 18.
- 19.
- 20. Evans DGR, Baser ME, McGaughran J, Sharif S, Howard E, Moran A
- 21. Uusitalo E, Rantanen M, Kallionpää RA, et al
- 22. Bates JE, Peterson CR, Dhakal S, Giampoli EJ, Constine LS
- 23. van Noesel MM, Orbach D, Brennan B, et al
- 24.
- 25. Neuhäusler L, Summerer A, Cooper DN, Mautner VF, Kehrer-Sawatzki H
- 26. Riccardi VM, Wald JS
- 27. Virdis R, Sigorini M, Laiolo A, et al
- 28. Kudesia S, Bhardwaj A, Thakur B, Kishore S, Bahal N
- 29.
- 30.
- 31. Ahn AC, Tewari M, Poon CS, Phillips RS
- 32.
- 33.
- 34. Payne PR, Johnson SB, Starren JB, Tilson HH, Dowdy D
- 35. Jiang W, Rixiati Y, Kuerban Z, Simayi A, Huang C, Jiao B
- 36. Ostrom QT, Cote DJ, Ascha M, Kruchko C, Barnholtz-Sloan JS
- 37. Peckham-Gregory EC, Montenegro RE, Stevenson DA, et al
- 38. Stenzel AE, Fenstermaker RA, Wiltsie LM, Moysich KB
- 39. Griffith JL, Morris SM, Mahdi J, Goyal MS, Hershey T, Gutmann DH
- 40. Melloni G, Eoli M, Cesaretti C, et al
- 41. Trevisson E, Cassina M, Opocher E, et al
- 42.
- 43. Way GP, Allaway RJ, Bouley SJ, Fadul CE, Sanchez Y, Greene CS