Longitudinal Magnetic Resonance Imaging as a Potential Correlate in the Diagnosis of Alzheimer Disease: Exploratory Data Analysis

Background: Alzheimer disease (AD) is a progressive degenerative brain disorder in which symptoms of dementia and cognitive impairment intensify over time. Numerous factors, which may or may not be related to a patient's lifestyle, result in a higher risk for AD. Diagnosing the disorder in its early stage is important, and several techniques are used to diagnose AD. A number of studies have been conducted on the detection and diagnosis of AD. This paper reports an empirical study performed on the longitudinal magnetic resonance imaging (MRI) Open Access Series of Imaging Studies (OASIS) dataset. Furthermore, the study highlights several factors that influence the prediction of AD.
Objective: This study aimed to correlate the effect of various factors such as age, gender, education, and socioeconomic background of patients with the development of AD. The effect of patient-related factors on the severity of AD was assessed on the basis of MRI features, Mini-Mental State Examination (MMSE), Clinical Dementia Rating (CDR), estimated total intracranial volume (eTIV), normalized whole brain volume (nWBV), and Atlas Scaling Factor (ASF).
Methods: In this study, we attempted to establish the role of longitudinal MRI in an exploratory data analysis (EDA) of AD patients. EDA was performed on the dataset of 150 patients for 373 MRI sessions (mean age 77.01 [SD 7.64] years). The T1-weighted MRI of each subject on a 1.5-Tesla Vision (Siemens) scanner was used for image acquisition. Scores of three features, MMSE, CDR, and ASF, were used to characterize the AD patients included in this study. We assessed the role of various features (ie, age, gender, education, socioeconomic status, MMSE, CDR, eTIV, nWBV, and ASF) on the prognosis of AD.
Results: The analysis further establishes the role of gender in the prevalence and development of AD in older people. Moreover, a considerable relationship has been observed between education, socioeconomic position, and the progression of AD. Also, outliers and linearity of each feature were determined to rule out the extreme values in measuring the skewness. The differences in nWBV between CDR=0 (nondemented), CDR=0.5 (very mild dementia), and CDR=1 (mild dementia) are significant (P<.01).
Conclusions: A substantial correlation has been observed between the pattern and other related features of longitudinal MRI data that can significantly assist in the diagnosis and determination of AD in older patients. (JMIR Biomed Eng 2020;5(1):e14389) doi: 10.2196/14389


Introduction
Alzheimer disease (AD) is a degenerative brain ailment characterized by the development of dementia and other related cognitive impairments [1][2][3]. It is a heterogeneous, irreversible neurodegenerative disorder that may be associated with genetic complexity in the individual. The Alzheimer's Association describes dementia as a syndrome comprising a cluster of symptoms that encompass several features including age, gender, education, and the Mini-Mental State Examination (MMSE) of the affected patients [4].
There has been a significant increase in the number of AD cases in recent years. It has been reported to be the sixth most diagnosed disease in the United States. As of 2018, 5.7 million Americans of all ages have been diagnosed with AD [4]. Approximately 44 million people worldwide are living with AD or an associated kind of dementia [5].
With the advancement of technology pertaining to treatment methodologies and the development of novel diagnostic tools, many modern-age diseases are being diagnosed earlier and treated successfully. In contrast, AD remains a poorly diagnosed ailment with little success in treatment.
In the information technology era, machine and deep learning tools have found a wide scope in medical diagnosis [6]. Although medical expert opinion, disease symptoms, and other related data from the patient remain the prime parameters that help in the diagnosis of a particular disease, machine learning predictions, data analytics visualizations, and other artificial intelligence techniques have emerged as alternate ways to predict diseases and support current medical practice [7,8].
The occurrence of cognitive disorders is a common feature observed in elderly people, and this can be considered a primary indication of a growing dementing syndrome like AD [9]. Individuals with cognitive disorders experience mild cognitive impairment (MCI) [10][11][12]. Various biomarkers or related parameters may emerge that can help in the diagnosis of AD in patients. Similarly, techniques like magnetic resonance imaging (MRI) studies, positron emission tomography scans, and neurochemical testing of the cerebrospinal fluid can also help in the diagnosis of AD [13,14].
In this study, we systematically examined the distinct and interactive impact of age, gender, education, socioeconomic status (SES), MMSE, Clinical Dementia Rating (CDR), estimated total intracranial volume (eTIV), normalized whole brain volume (nWBV), and Atlas Scaling Factor (ASF) on the basis of several longitudinal MRI sessions of various patients. The information was retrieved from the Open Access Series of Imaging Studies (OASIS-2) dataset. We performed exploratory data analysis (EDA) to understand the correlation between various feature sets. Consistent with the literature, we predicted that men were more likely to be diagnosed with AD compared with women. This gender bias may depend on the dataset. The ε4 allele of the apolipoprotein E gene (APOE-ε4) has also been reported to play a major role in the occurrence of AD. We did not include APOE-ε4 data in the study in order to avoid added complexity. A significant relationship has been observed between the educational background and SES of the patients and the emergence of dementia. Outliers and the linearity of each feature were examined so that extreme values could be accounted for when determining skewness.

Subjects
The dataset used in this study consists of a longitudinal collection of MRI data in demented and nondemented older adults. A total of 150 subjects aged 60 to 96 years participated in 373 MRI sessions. The data included in this study were drawn from subjects enrolled in a longitudinal collection of MRI scans at the Washington University Alzheimer Disease Research Center [15].

Methodology
In this analysis, we cataloged previous EDA. The general objective of the study was to report the relative association between the target group (demented or nondemented) and other features that play a major role in the diagnosis of AD. Furthermore, we examined the risk of AD onset in affected patients. We analyzed longitudinal MRI data of both healthy subjects and patients with AD [15].

Scoring Rules
In this study, we used the following instruments to determine the state of the healthy versus affected brain.
• SES: according to the Hollingshead Index of Social Position, the SES is classified into groups ranging from highest status (1) to lowest status (5) [16]
• MMSE: values range from 0 to 30; 0 to 9 indicates extreme impairment, 10 to 18 moderate dementia, 19 to 23 mild dementia, and 24 to 30 is considered normal [17]
• CDR: scored after a semistructured discussion with the patient, with scores ranging from 0 to 3 (ie, 0=none, 0.5=very mild, 1=mild, 2=moderate, 3=extreme dementia) [18]

Experiment Environment
Empirical analysis of the dataset described in this paper was performed using Python libraries on the Jupyter platform of Anaconda Navigator. The Jupyter platform presents a well-defined framework for developers to process, develop, and assess their models. Python is an interpreted, high-level programming language with dynamic semantics. Its ecosystem includes Seaborn, a visualization library through which statistical graphs can be plotted with the aim of performing univariate and multivariate analyses.
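As an illustration of how the MMSE and CDR bands listed under Scoring Rules could be encoded in such a Python environment, the following is a minimal sketch. The function and variable names (mmse_category, cdr_label, CDR_LABELS) are hypothetical and are not taken from the study's code.

```python
def mmse_category(score: int) -> str:
    """Map an MMSE score (0 to 30) to the band listed under Scoring Rules."""
    if not 0 <= score <= 30:
        raise ValueError("MMSE score must be between 0 and 30")
    if score <= 9:
        return "extreme impairment"
    if score <= 18:
        return "moderate dementia"
    if score <= 23:
        return "mild dementia"
    return "normal"


# CDR scores and their labels as given in the text.
CDR_LABELS = {0.0: "none", 0.5: "very mild", 1.0: "mild", 2.0: "moderate", 3.0: "extreme"}


def cdr_label(cdr: float) -> str:
    """Return the dementia label for a CDR score; raises KeyError for other values."""
    return CDR_LABELS[float(cdr)]


print(mmse_category(27), "|", cdr_label(0.5))
```

Keeping the cut points in one place makes it easier to check that a derived grouping matches the published bands before any plotting is done.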

Exploratory Data Analysis
EDA is a data analysis methodology using techniques that are usually graphical. It maximizes understanding of the dataset, reveals underlying structure, detects anomalies and outliers, extracts important features, and ascertains ideal factor settings [19]. EDA is not identical to statistical graphics, although the two terms are often used interchangeably. It is a more direct approach that allows the data to reveal the underlying model and its structure [19].
In this study, we focused on establishing a correlation between attributes of MRI tests and patient classification groups. The primary objective of performing this exploratory analysis was to determine the association of data among the features before performing the data analysis or data extraction process. It was intended to assist in understanding the data subclassification and to facilitate choosing the proper analysis technique for the model later.

Dataset Description
The dataset comprised 373 observations and 15 attributes, out of which group was the target variable while the rest were the independent variables in this empirical study. Table 1 provides a description of the dataset attributes. The P values used for comparison in the study are shown in Table 2.

Summary Statistics
Statistical information includes count, mean, standard deviation, first quartile, second quartile (median), third quartile, and minimum and maximum values of each attribute as shown in Table 3.
From the data depicted in Table 3, we can infer that the mean value is less than the median for some features and greater than the median for others. The median value is represented by 50% (50th percentile) in the index column. The median value of each feature aids the data preprocessing when dealing with the imputation step. There is a large difference between the 75th percentile and the maximum values of the predictors MR delay, CDR, and eTIV. This observation suggests the occurrence of extreme values (ie, outliers) in the dataset.
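A minimal sketch of how such a summary could be produced is given below, assuming pandas is available. The values and the two column names are made up for illustration and are not rows from the dataset; the sketch shows the two checks described above, namely the sign of mean minus median and the gap between the maximum and the 75th percentile.

```python
import pandas as pd

# Hypothetical toy values for illustration only (not OASIS-2 data).
df = pd.DataFrame({
    "MR Delay": [0, 0, 457, 538, 1010, 2639],
    "eTIV": [1215, 1344, 1470, 1552, 1600, 2004],
})

# count, mean, std, min, 25%, 50% (median), 75%, max for each column
summary = df.describe()

# A positive value means the mean lies above the median (a hint of right skew).
mean_minus_median = summary.loc["mean"] - summary.loc["50%"]

# Distance from the 75th percentile to the maximum, expressed in IQR units;
# large values point to extreme observations in the upper tail.
iqr = summary.loc["75%"] - summary.loc["25%"]
upper_gap = (summary.loc["max"] - summary.loc["75%"]) / iqr

print(summary.round(2))
print(mean_minus_median.round(2))
print(upper_gap.round(2))
```

Expressing the gap in IQR units makes columns with very different scales, such as MR delay and CDR, directly comparable.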

Data Exploration
Initially, the dataset consisted of 373 MRI sessions from 150 patients, who were grouped as nondemented (n=72), demented (n=64), and converted (n=14). The 14 converted patients are those who were found to be nondemented at the first visit but were diagnosed with dementia at a later (second or third) visit. Therefore, only the first visit of each subject is considered throughout the study, and a total of 150 subjects have been explored in this analysis.
The dataset contains many missing values (ie, some rows have no value for certain attributes, which is determined during the EDA step). To locate exactly which columns contain missing values, a heat map was first plotted for all 373 MRI sessions, covering all the patient visits (Figure 2A). The SES and MMSE columns contain missing values (represented by yellow lines on a purple background). Figure 2B

Results
In this section, the results of the EDA are reported. After applying the preprocessing and data preparation strategies, we examined the data visually to understand the distribution of the features. Breaking the data down in this way made it simpler and more meaningful, which increased the efficiency of the analysis.

Patient Demographic Profiles
The study comprised 62 males and 88 females within the age range of 60 to 96 years. Table 4 illustrates the demographic summary of patients who were examined for AD.

Gender and Demented Proportion
The bar chart in Figure 4 indicates that men are more prone to dementia than women. The blue color, coded as 0, represents female, while the orange color, coded as 1, represents male. Of the 150 patients, 78 are in the demented category. Of the 78 demented patients, 40 are male.
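The counts quoted above can be turned into within-gender proportions with simple arithmetic. The short sketch below uses only the numbers stated in the text (62 males, 88 females, 78 demented patients, 40 of them male); the variable names are illustrative.

```python
# Counts quoted in the text.
n_male, n_female = 62, 88
demented_total, demented_male = 78, 40

# The remaining demented patients are female.
demented_female = demented_total - demented_male

# Proportion of each gender that falls in the demented category.
rate_male = demented_male / n_male
rate_female = demented_female / n_female

print(f"male: {rate_male:.1%}, female: {rate_female:.1%}")
# prints: male: 64.5%, female: 43.2%
```

Comparing proportions within each gender, rather than raw counts, accounts for the fact that the sample contains more women than men.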

Correlation Matrix With a Heat Map
In order to build the model, an essential condition is to eliminate the correlated variables. Correlations were obtained by applying the Python Pandas corr() function, and the resulting correlation grid was visualized as a heat map.
The correlation matrix with heat map is illustrated in Figure 5. The dark shades represent positive correlation while lighter shades represent negative correlation. We excluded the target variable (ie, group) and then checked for correlated independent variables. We can infer that eTIV has a strong positive correlation with male/female (M/F) and, among all features, a strong negative correlation with ASF.
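The following sketch shows one way the corr() step could be used to list strongly correlated pairs of independent variables. The toy values are hypothetical, with eTIV and ASF deliberately chosen to move in opposite directions, and the 0.8 threshold is an assumption made for illustration rather than a value reported in the study.

```python
import pandas as pd

# Hypothetical toy values for illustration only (not OASIS-2 data).
df = pd.DataFrame({
    "Age": [75, 88, 80, 71, 93, 68],
    "eTIV": [1987, 1215, 1689, 1357, 1272, 1457],
    "ASF": [0.88, 1.44, 1.04, 1.29, 1.38, 1.20],
})

corr = df.corr()  # pairwise Pearson correlation (the pandas default)

# Visit each unordered pair once and keep those above the assumed threshold.
threshold = 0.8
cols = list(corr.columns)
strong = {
    (a, b): corr.loc[a, b]
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) > threshold
}

for pair, value in strong.items():
    print(pair, round(value, 3))
```

The same corr matrix could be passed to a heat map function for display; listing the pairs explicitly is convenient when deciding which of two correlated variables to drop.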

Outliers Check With Box-Whisker Plot
A box-whisker plot displays the spread of quantitative data in a manner that facilitates comparisons between attributes. In Figure 6, the box illustrates the dataset's quartiles whereas the whiskers extend to show the rest of the distribution. The box-whisker schema is a standardized method for displaying the data distribution based on 5 values: minimum value, first quartile, median value (second quartile), third quartile, and maximum value. The middle rectangle spans the first quartile to the third quartile, a distance known as the interquartile range (IQR). A segment inside the rectangle marks the median value. Whiskers above and below the rectangle mark the locations of the maximum and minimum values. Outliers lie either 3×IQR or more above the third quartile or 3×IQR or more below the first quartile. Thus, we can infer from Figure 6 that the age, patient education level (EDUC), SES, MMSE, eTIV, and nWBV feature columns show outliers.
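The outlier rule stated above can be sketched with the Python standard library as follows. The function name iqr_outliers, the sample scores, and the choice of the "inclusive" quartile method are assumptions made for illustration; plotting libraries may compute quartiles and whisker lengths differently.

```python
from statistics import quantiles


def iqr_outliers(values, k=3.0):
    """Return values lying k*IQR or more beyond the first or third quartile."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v <= lower or v >= upper]


# Hypothetical MMSE-like scores with one extreme low value.
scores = [30, 29, 28, 27, 29, 30, 26, 28, 4]
print(iqr_outliers(scores))
```

The multiplier k is exposed as a parameter so that the 3×IQR rule described in the text can be compared with other conventions on the same data.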

Skewness and Distribution Plot
The linearity of the attributes was determined by plotting a distribution graph. The graph was used to study the skewness of both the target variable and the independent variables. From Figure 7, it can be concluded that the group, visit, MR delay, M/F, hand, and age feature columns appear to be normally distributed while all the remaining independent variables appear skewed.
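Skewness can also be checked numerically alongside the distribution plot. The sketch below computes the moment-based (Fisher-Pearson) coefficient of skewness with the standard library; the helper name and the sample values are hypothetical, and the study does not state which estimator, if any, was used.

```python
from statistics import fmean


def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]             # balanced around the mean
left_tailed = [30, 30, 29, 29, 28, 20]  # one low value pulls the tail left

print(skewness(symmetric))    # zero for a symmetric sample
print(skewness(left_tailed))  # negative for a left-skewed sample
```

A value near zero is consistent with a symmetric distribution, while a clearly positive or negative value indicates a longer right or left tail, respectively.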

Effect of Independent Variable on Dependent Variable
A graph was plotted between the target variable (ie, group: demented/nondemented) and each independent variable. We plotted 8 such graphs, for age, EDUC, MMSE, ASF, eTIV, nWBV, SES, and CDR, shown in Figure 8.
The following features were inferred: (1) age: between 60 and 90 years; (2) EDUC: demented patients were less educated; (3) SES: considerable decrease in the prevalence of dementia as we move from highest status to lowest status; (4) MMSE: the nondemented group had much higher MMSE scores than the demented group; (5) CDR: more individuals with a score of 0.5 (ie, very mild dementia), fewer individuals with a score of 1 (ie, mild dementia), and very few with a score of 0 (ie, no dementia); (6) eTIV: higher for demented patients; (7) nWBV: the nondemented group has a higher brain volume ratio than the demented group; and (8) ASF: demented patients have a higher score than nondemented ones. The differences in nWBV between CDR=0 (nondemented), CDR=0.5 (very mild dementia), and CDR=1 (mild dementia) are significant (P<.01).

Impact of Socioeconomic Status and Education Level in the Demented Group
The relationship of SES and EDUC with dementia can be inferred from Figure 9, which shows that individuals with the highest status (1) exhibit higher education levels while individuals with the lowest status (5) exhibit lower education levels. Thus, years of education have an immense effect on dementia. The scatter plot with linear regression lines for SES

Correlation Between Converted Patients and Clinical Dementia Rating
The data shown in Table 3 suggest that 14 patients converted. These patients were earlier classified as nondemented and at a later visit were found to have dementia. We examined the relationship between these 14 converted patients and their respective CDR values on subsequent (second and third) visits. To relate dementia to other factors, we focused on changes in CDR values. At earlier visits, the CDR was 0.0, signifying that the patient was nondemented, while at a later visit it changed to 0.5, indicating that the patient had very mild dementia. Figure 10 shows the relationship between converted patients and their CDR values.
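A minimal sketch of how converted patients could be identified from per-visit CDR values is given below, assuming pandas is available. The rows, the subject identifiers, and the column names are hypothetical and are used only to illustrate the definition given above (CDR of 0 at the first visit, above 0 at a later visit).

```python
import pandas as pd

# Hypothetical toy visits for illustration only (not OASIS-2 data).
visits = pd.DataFrame({
    "Subject ID": ["S1", "S1", "S2", "S2", "S2", "S3", "S3"],
    "Visit": [1, 2, 1, 2, 3, 1, 2],
    "CDR": [0.0, 0.0, 0.0, 0.5, 0.5, 0.5, 1.0],
})

# Order visits chronologically within each subject.
visits = visits.sort_values(["Subject ID", "Visit"])
cdr_by_subject = visits.groupby("Subject ID")["CDR"]

first_cdr = cdr_by_subject.first()  # CDR at the first visit
max_cdr = cdr_by_subject.max()      # highest CDR over all visits

# Converted: nondemented (CDR 0) at the first visit, CDR above 0 later on.
converted = first_cdr[(first_cdr == 0) & (max_cdr > 0)].index.tolist()
print(converted)
```

In this toy table only S2 meets the definition: S1 stays at 0 throughout, and S3 already has a CDR of 0.5 at the first visit.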

Principal Findings
This study provides an understanding of attributes related to AD in older adults. We observed that men are more likely to have AD compared with women. There are several major differences that frequently appear between men and women in the occurrence, presentation, and development of psychiatric disorders [20]. Earlier studies suggested that women are more prone to develop AD since they are at greater risk of depression compared with men [21]. The genetic factor APOE-ε4 has also been reported to affect men and women differently [21]. Riedel et al [22] stated that age, APOE-ε4, and sex are the most serious risk factors in the development of AD. Further, the rate of AD is practically identical in women and men until late age, when the frequency becomes more prominent in women [22].
We performed an empirical analysis on the dataset comprising longitudinally obtained T1-weighted MRI data of 150 patients aged between 60 and 96 years. Among the 15 studied features, we found that only gender, age, educational years, SES, MMSE, CDR, eTIV, and nWBV were significantly associated with the occurrence of AD in both demented and nondemented subjects. Our analysis demonstrated that patients aged between 70 and 90 years exhibit a higher clustering of dementia than nondemented patients. Because AD is associated with a lower survival rate, data available for the oldest patients are scarce. All patients examined were right-handed; thus, handedness does not have an effect in this analysis. Of the 150 patients, demented patients were found to be less educated compared with nondemented patients (Figure 8B).
We found an independent link between various features in both demented and nondemented groups and found that there were numerous correlated indicators of AD. Unfortunately, this study lacks an adequate feature set that could have helped in uncovering related associations efficiently.
We observed that over the change from higher (score 1) to lower (score 5) SES, there was a considerable decrease in the prevalence of dementia. In general, education has been found to be directly associated with SES. In fact, there seems to be a high to moderate level of association between education and occupation-based SES [23]. Social epidemiology relates education with SES by defining education as "the transition from a socioeconomic position largely received from parents to an achieved socioeconomic position as an adult" [16]. Various components of SES, namely education, income, and occupational status, can influence AD development in aged patients [24].
The MMSE, a complete measure of cognitive impairment, has been widely used in the detection of AD. Arevalo-Rodriguez et al [25] performed an analysis to determine the MMSE accuracy for the detection of AD in people with MCI. In fact, the MMSE score cannot aid in categorizing people as demented or nondemented [25]. In contrast to this, we identified that the nondemented study group had a much higher MMSE score than the demented group.
The scoring of the CDR has been widely used in clinical trials and longitudinal studies to determine the state of dementia [18]. We found that the CDR peaks at 0.5 (very mild dementia), followed by 1 (mild dementia) and 0 (no dementia). Our results are in agreement with those illustrated by Marcus et al [15], who state that patients who were categorized as nondemented at the first visit were found to be demented at later visits with a CDR of greater than 0.
The plot for eTIV summarizing various data shows that demented patients have more eTIV compared with nondemented patients (Figure 8F). The intracranial volume, describing brain size, is found to be less in AD patients. Earlier, Tate et al [26] reported that there were certain patients for which the total intracranial volume emerged to have an impact on dementia prediction when the data were examined in a nonparametric manner.
In line with an earlier study using a subset of the data [27], we found that the nondemented group had a higher nWBV than the demented group. This could be attributed to the fact that AD may lead to shrinkage of neuronal tissues of the brain. Marcus et al [15] exploited nWBV as an approach to evaluate the anatomical features of the brain to determine the level of dementia. Several other studies suggested that nWBV declines with advancement of AD stage and growing age of the patients [27][28][29][30][31].
Our findings suggest that demented patients have a higher ASF when compared with nondemented ones. The ASF is the scaling factor that transforms the native-space brain and skull to the atlas target; it is determined by calculating the determinant of the transform matrix [32].
On the basis of data analysis, we infer that there was no correlation between the repeated measures. In longitudinal data analysis, this seems an easy and straightforward approach but an unrealistic alternative. Nevertheless, we consider it a fair approach to assess the relationship among covariates irrespective of the visits. This structure was chosen at the commencement of the analysis, and we suggest that it resembles the experimental correlations closely enough for an improved estimate of standard errors.
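To illustrate why a determinant acts as a volume scaling factor, the sketch below builds a purely hypothetical linear transform with one scale factor per axis and computes its determinant with NumPy. The numbers are invented for illustration and do not reproduce the registration pipeline used to generate the dataset.

```python
import numpy as np

# Hypothetical per-axis scale factors of a native-to-atlas linear transform.
axis_scales = [1.10, 1.05, 1.12]
transform = np.diag(axis_scales)

# The determinant of a linear transform is the factor by which volumes change.
scaling_factor = np.linalg.det(transform)

# A hypothetical native-space volume mapped into atlas space.
native_volume = 1500.0
atlas_volume = scaling_factor * native_volume

print(round(scaling_factor, 4), round(atlas_volume, 1))
```

Under this reading, a head smaller than the atlas needs a factor above 1 to reach the atlas size, which is consistent with the strong negative correlation between eTIV and ASF noted in the correlation analysis.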

Limitation
A larger feature set from brain mapping is required to strengthen the robustness of the results and to uncover the causal mechanisms underlying the relationship between distinct features of both longitudinal and cross-sectional MRI data and their consequences for late-life health.

Conclusion and Future Work
This study highlights the relationship between the target and the independent features in MRI sessions of AD patients. It can be argued that whatever effect the independent features have on the prediction of the target variable (demented/nondemented), it is unlikely to be dependent on the sample size relationship. We infer that men are more likely to suffer from AD than women. The study also finds that attributes such as eTIV, nWBV, and ASF have a greater correlation in the prevalence of AD in women compared with men. Finally, we conclude that imaging biomarkers play a major role in the diagnosis of AD.

Figure 1 outlines the dataset attributes in terms of the total count of each attribute for 15 columns on the basis of null/nonnull and data type of respective attributes. It can be seen from the figure that SES and MMSE have fewer values than the total of 373 MRI sessions, marked by the red right bracket in the figure. These shortfalls correspond to missing values. The rest of the features, marked by the blue right brackets, do not contain any missing values (ie, for the total 373 sessions, all recorded MRI features emerged as nonnull).
delineates the count of missing values in numeric form for all attributes. Figures 3A and 3B highlight the heat map and count of missing values for the 150 subjects for visit 1. SES is the only feature with missing values (8 in total), while the rest of the features have all values filled.
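The missing-value check described for Figures 2 and 3 can be sketched as follows, assuming pandas and NumPy are available. The toy rows are hypothetical, and the median fill is shown only as one possible imputation step in line with the remark under Summary Statistics; the study does not give the exact procedure.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows for illustration only (not OASIS-2 data).
df = pd.DataFrame({
    "SES": [2.0, np.nan, 3.0, 1.0, np.nan],
    "MMSE": [27.0, 30.0, np.nan, 23.0, 28.0],
    "nWBV": [0.70, 0.74, 0.71, 0.68, 0.75],
})

# Boolean grid of missing cells; this is what a missing-value heat map displays.
missing_mask = df.isnull()

# Numeric count of missing values per column.
missing_counts = missing_mask.sum()

# One possible imputation: replace each gap with its column median.
imputed = df.fillna(df.median())

print(missing_counts)
print(imputed)
```

Counting gaps per column before imputing makes it clear which attributes are affected and how many values are being filled in.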

Figure 2. Illustration of missing values for 373 magnetic resonance imaging sessions for all patient visits.

Figure 3. Outline of missing values for 150 patients for the first visit.

Figure 5. Heat map illustrating the correlations among the dataset features.

Figure 7. Distribution plot of the dataset features.
a positive correlation among EDUC and SES.

Figure 9. Scatter plot for socioeconomic status and level of education.

Figure 10. Distribution plot for converted patients and their Clinical Dementia Rating value.

Table 1. Detail of dataset attributes.

Table 2. P value for the corresponding attribute.

Table 3. Summary statistics of each attribute.
b MR: magnetic resonance.
c EDUC: educational level of the patient.
d SES: socioeconomic status.
e MMSE: Mini-Mental State Examination.
f CDR: Clinical Dementia Rating.
g eTIV: estimated total intracranial volume.
h nWBV: normalized whole brain volume.
i ASF: Atlas Scaling Factor.

Table 4. Demographic profile of the study population (n=150).