Diagnosis of Type 2 Diabetes Using Electrogastrograms: Extraction and Genetic Algorithm–Based Selection of Informative Features

Background: Electrogastrography is a noninvasive electrophysiological procedure used to measure gastric myoelectrical activity. Electrogastrogram (EGG) methods have been used to investigate the mechanisms of the human digestive system and as a clinical tool. Abnormalities in gastric myoelectrical activity have been observed in subjects with diabetes. Objective: The objective of this study was to use the EGGs from healthy individuals and subjects with diabetes to identify potentially informative features for the diagnosis of diabetes using EGG signals. Methods: A total of 30 features were extracted from the EGGs of 30 healthy individuals and 30 subjects with diabetes. Of these, 20 potentially informative features were selected using a genetic algorithm–based feature selection process. The selected features were analyzed for further classification of EGG signals from healthy individuals and subjects with diabetes. Results: This study demonstrates that there are distinct variations between the EGG signals recorded from healthy individuals and those from subjects with diabetes. Furthermore, the study reveals that the features Maragos fractal dimension and Hausdorff box-counting fractal dimension have a high degree of correlation with the mobility of EGGs from healthy individuals and subjects with diabetes. Conclusions: Based on the analysis of the extracted features, the selected features are suitable for the design of automated classification systems to identify healthy individuals and subjects with diabetes. (JMIR Biomed Eng 2020;5(1):e20932) doi: 10.2196/20932


Introduction
Digestion is the breakdown of food into small water-soluble molecules that can be absorbed by the intestinal epithelium [1]. During digestion, food enters the mouth and undergoes mechanical and chemical processes that result in the breakdown of food and absorption of nutrients [1].
Electrogastrography is a noninvasive technique used to measure and record the gastric myoelectrical activity associated with the process of digestion [2]. Electrogastrograms (EGGs) are the recordings of the electrical signals originating from the stomach muscles. Several cutaneous electrodes are placed on the upper abdomen, over the stomach, for the acquisition of EGG signals [3]. The dominant frequency of the EGG signal is identical to the frequency of the electrical activity of the stomach. A healthy EGG signal has a frequency of 2.6 to 3.7 cycles per minute (cpm); this rhythm is produced by the interstitial cells of Cajal located in the muscular wall of the gastric corpus and antrum [4].
Diabetic gastropathy is defined as a spectrum of neuromuscular abnormalities of the stomach. In diabetic gastropathy, the normal average EGG rhythm (3 cpm) is disrupted by bradygastrias, tachygastrias, and other mixed dysrhythmias [5]. Several studies have examined neuromuscular abnormalities in subjects with diabetes who have upper gastrointestinal symptoms, with the aim of diagnosing gastric dysrhythmias [6][7][8][9][10]. Koch et al (2001) [6] discussed the clinical applications of electrogastrography in diabetic gastropathy. Altintop et al (2016) [7] proposed the use of parametric methods such as Cramer-Rao lower bound and power spectral density for the analysis of EGG signals obtained from subjects with gastroparesis and healthy volunteers using cutaneous electrodes. Additionally, the authors extracted several features from the power spectral density functions and used them to distinguish subjects with gastroparesis from healthy subjects [7].
The frequency spectra of healthy and diabetic EGG signals often show an exponential increase of power toward the very low frequency range (<1 cpm) [5]. These frequencies are not likely to originate from the stomach or other parts of the human body. These ultralow frequency components of EGG signals may be caused by factors such as low-frequency electrode noise due to variations in electrode potential, and movement artifacts [5]. Therefore, it is necessary to filter frequencies <1 cpm to avoid false interpretation. In recent years, the empirical mode decomposition (EMD) technique has been used to preprocess or filter several biosignals with high accuracy [11][12][13][14][15]. Furthermore, a study has proposed the use of the noise-assisted multivariate empirical mode decomposition for multichannel electromyography signal processing [11].
Feature extraction is a technique used to extract useful information that is hidden in biosignals. The selection of appropriate features is important, as it leads to precise analysis and high classification accuracy [16]. During the feature selection process, potentially informative features can be selected for later classification and analysis. Furthermore, the performance of a classifier is highly sensitive to how informative the selected features are [17]. Several studies have proposed various feature extraction methods such as time domain features, frequency domain features, and time-frequency domain features for the analysis of biosignals [16][17][18]. A study reviewed feature extraction methods on EEG signals using linear analysis in frequency and time-frequency domains and showed that the frequency domain methods provided more detailed information on EEG signal analysis than the time-frequency methods [15]. Another study on texture-based textile image classification used the feature extraction methods gray level co-occurrence matrix (GLCM), local binary pattern, and moment invariants [18]. The study found that the best result was achieved using a combination of GLCM and local binary pattern features [18].
The objective of this work was to extract features from EGG signals from healthy individuals and subjects with diabetes to select useful and highly informative features for the diagnosis of diabetes. Additionally, we aimed to evaluate the correlation between the selected features and the process of digestion in both groups of individuals.

Participants
A total of 30 healthy individuals and 30 subjects with diabetes participated in this study. Participants ranged in age from 20 to 50 years.
The ethical clearance (HR/2017/MS/002) to conduct this research study was obtained from Global Hospitals & Health City, Chennai.

EGG Signal Acquisition
An EGG measurement system with 3 surface electrodes was developed and used to record the EGGs from healthy individuals and subjects with diabetes. Of the 3 electrodes, 2 were positioned on the outer curvature (fundus) and on the inner curvature (mid corpus) of the stomach with a separation distance of 5 cm between the electrodes, in accordance with the standard electrode placement protocol [19][20][21]. For isolation purposes, the third electrode was placed as ground, away from the stomach area. The acquired EGGs were amplified with an amplification system developed using IC AD624 [2] and were logged using LabVIEW hardware and software.
EGGs from all participants were acquired for a period of 10 minutes (Figures 1 and 2). All EGGs were preprocessed and analyzed using custom-made functions in MATLAB R2011b [2].

Preprocessing of EGG Signals
The EMD analysis was used to decompose the input EGG signal into different frequency components called intrinsic mode functions (IMFs) [11]. Each IMF must satisfy two fundamental requirements: the number of extrema and the number of zero-crossings must be the same or differ by at most 1, and the mean of the upper and lower envelopes must be 0. By applying the EMD algorithm, the EGG signal (x[n]) can be represented as follows [11,12]: $$x[n] = \sum_{i=1}^{k} \mathrm{IMF}_i[n] + r_k[n]$$ where $\mathrm{IMF}_i[n]$ is the ith IMF, $r_k[n]$ is the residue, and k is the total number of IMFs. The length, nonlinearity, and nonstationarity of the EGG signal determine the number of IMFs to be generated [11]. The EMD filter is well established and has been described in detail in the available literature [13][14][15]. In this study, the IMFs holding the ultralow frequency components (<1 cpm) were removed and the remaining IMFs were summed to obtain the filtered EGG signal. The preprocessed EGG signals from healthy individuals and subjects with diabetes were then subjected to feature extraction.
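The recombination step described above can be illustrated with a minimal Python sketch. It does not perform EMD itself: the IMFs are assumed to come from some EMD implementation, and two synthetic sinusoids stand in for them here. The text does not state how the frequency content of each IMF was determined, so estimating it from the zero-crossing count is an assumption, as are the function names `zero_crossing_cpm` and `recombine_imfs` and the 2 Hz sampling rate.

```python
import math


def zero_crossing_cpm(component, fs_hz):
    """Estimate the mean frequency (cycles per minute) from zero crossings.

    Assumption: two zero crossings correspond to one cycle.
    """
    crossings = sum(
        1 for a, b in zip(component, component[1:]) if (a < 0) != (b < 0)
    )
    duration_min = (len(component) - 1) / fs_hz / 60.0
    return (crossings / 2.0) / duration_min


def recombine_imfs(imfs, fs_hz, cutoff_cpm=1.0):
    """Sum the IMFs whose estimated frequency is at or above cutoff_cpm."""
    kept = [imf for imf in imfs if zero_crossing_cpm(imf, fs_hz) >= cutoff_cpm]
    n = len(imfs[0])
    return [sum(imf[i] for imf in kept) for i in range(n)]


# Synthetic stand-ins for IMFs (hypothetical data, not from the study):
# a 3 cpm gastric rhythm and a 0.3 cpm ultralow frequency drift.
fs_hz = 2.0                      # assumed sampling rate
n_samples = int(10 * 60 * fs_hz)  # 10-minute recording
t_min = [i / fs_hz / 60.0 for i in range(n_samples)]
gastric = [math.sin(2 * math.pi * 3.0 * t + 0.1) for t in t_min]
drift = [2.0 * math.sin(2 * math.pi * 0.3 * t + 0.1) for t in t_min]

filtered = recombine_imfs([gastric, drift], fs_hz)
```

In this toy case the drift component falls below 1 cpm and is discarded, so `filtered` reduces to the 3 cpm component.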

Feature Extraction
Feature extraction plays a vital role in achieving high classification accuracy in biosignal analysis. The process of feature extraction involves the transformation of raw EGG signals into a feature vector [22]. A total of 30 features were extracted from the preprocessed healthy and diabetic EGG signals: descriptive statistics (mean, median, mode, minimum value, maximum value, standard deviation, skewness, and kurtosis), Hjorth parameters (activity, mobility, and complexity), entropy measures (Rényi entropy, Tsallis entropy, spectral entropy, and image entropy), fractal dimensions (Maragos fractal dimension, MFD; and Hausdorff box-counting fractal dimension, HFD), the fast Fourier transform (FFT) peak, and the GLCM features (contrast, correlation, energy, and homogeneity).

FFT Peak
The peak frequency of healthy and diabetic EGG signals was extracted using the FFT. The FFT of each recorded EGG signal was computed, and the amplitudes of its frequency components were plotted as a single-sided amplitude spectrum. The frequency component with the maximum amplitude was taken as the peak frequency of the EGG signal.
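As an illustration of the peak-frequency extraction, the following sketch computes a single-sided amplitude spectrum with a direct discrete Fourier transform and returns the frequency of the largest component in cpm. A direct DFT is used only to keep the example free of dependencies; an FFT routine gives the same spectrum. Removing the mean before the transform (so that the 0 cpm bin cannot dominate), the 1 Hz sampling rate, and the function names are assumptions.

```python
import cmath
import math


def single_sided_amplitude(signal):
    """Single-sided amplitude spectrum via a direct DFT (O(n^2))."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]  # assumption: remove the DC offset
    amplitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(centered)
        )
        # Double every bin except DC and (for even n) the Nyquist bin.
        is_edge = k == 0 or (n % 2 == 0 and k == n // 2)
        amplitudes.append((1.0 if is_edge else 2.0) * abs(coeff) / n)
    return amplitudes


def peak_frequency_cpm(signal, fs_hz):
    """Frequency (cycles per minute) of the largest spectral component."""
    amplitudes = single_sided_amplitude(signal)
    k_peak = max(range(len(amplitudes)), key=amplitudes.__getitem__)
    return k_peak * fs_hz / len(signal) * 60.0


# Hypothetical 10-minute test signal sampled at 1 Hz:
# a dominant 3 cpm (0.05 Hz) rhythm plus a weaker 9.6 cpm (0.16 Hz) one.
fs_hz = 1.0
test_signal = [
    math.sin(2 * math.pi * 0.05 * i) + 0.3 * math.sin(2 * math.pi * 0.16 * i)
    for i in range(600)
]
peak_cpm = peak_frequency_cpm(test_signal, fs_hz)
```

With a 10-minute record the frequency resolution is 0.1 cpm, which is why both test frequencies fall exactly on spectral bins.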

Hjorth Parameters
Hjorth parameters are used to characterize the information on the temporal dynamics of the measured biosignals. In this work, the Hjorth features activity, mobility, and complexity were extracted from healthy and diabetic EGG signals.
Activity represents the measurement of variance or the average power of an EGG signal. Activity is given as follows [23]: $$\mathrm{Activity} = \mathrm{var}(y(t))$$ where y(t) is the input EGG signal.
Mobility represents the average frequency of an EGG signal. The mobility parameter is defined as the square root of the ratio of the variance of the first derivative of the signal to the variance of the signal: $$\mathrm{Mobility} = \sqrt{\frac{\mathrm{var}\left(\frac{dy(t)}{dt}\right)}{\mathrm{var}(y(t))}}$$ The mobility parameter is proportional to the standard deviation of the power spectrum.
Complexity represents a measure of variability of an EGG signal. Complexity of an EGG signal is defined as the ratio of the mobility of the first derivative of the signal to the mobility of the signal: $$\mathrm{Complexity} = \frac{\mathrm{Mobility}\left(\frac{dy(t)}{dt}\right)}{\mathrm{Mobility}(y(t))}$$ The complexity parameter indicates how similar the input EGG signal is to a pure sine wave. The value of complexity converges to 1 as the shape of the signal becomes more similar to a pure sine wave.
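The three Hjorth parameters can be computed in a few lines. The sketch below is a minimal illustration that approximates the derivative by the first difference between consecutive samples (an assumption, so mobility is expressed per sample rather than per second) and uses the population variance. It does not guard against a constant signal, for which the variances are 0.

```python
import math


def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def first_difference(values):
    """Discrete approximation of the derivative (per-sample units)."""
    return [b - a for a, b in zip(values, values[1:])]


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) of a sampled signal."""
    d1 = first_difference(signal)
    d2 = first_difference(d1)
    activity = variance(signal)
    mobility = math.sqrt(variance(d1) / activity)
    mobility_of_derivative = math.sqrt(variance(d2) / variance(d1))
    complexity = mobility_of_derivative / mobility
    return activity, mobility, complexity


# Hypothetical test signals: a pure sine wave and the same wave
# with an added higher-frequency component.
sine = [math.sin(2 * math.pi * i / 40) for i in range(400)]
mixed = [s + 0.5 * math.sin(2 * math.pi * i / 8) for i, s in enumerate(sine)]

sine_activity, sine_mobility, sine_complexity = hjorth_parameters(sine)
mixed_activity, mixed_mobility, mixed_complexity = hjorth_parameters(mixed)
```

For the pure sine wave the complexity is close to 1, and it rises once the second frequency component is added.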

Entropy Measures
Entropy is defined as a measure of disorder associated with a system, and hence, it is a measure of information content, uncertainty, and complexity of the system.
The Rényi entropy of the sample H(α) is given by the following equation [24]: $$H(\alpha) = \frac{1}{1-\alpha} \log \left( \sum_{i=1}^{n} p_i^{\alpha} \right)$$ where $p_i$ is the probability that a random variable takes the ith of n possible values and alpha is the order of the entropy measure. For a fixed probability distribution, the Rényi entropy does not increase as alpha increases. The Rényi entropy is an effective measure of the complexity of the signal [24][25][26]. The complexity of the EGG signals recorded from healthy individuals and subjects with diabetes was quantified using the Rényi entropy with 5 different orders of the entropy measure (alpha=0.2, 0.4, 0.6, 0.8, and 0.9).
The Tsallis entropy is one of the most promising information theoretic methods for biosignal analysis. The Tsallis entropy ($H_R$) is defined as follows [25]: $$H_R = \frac{1}{\alpha - 1} \left( 1 - \sum_{i} p_i^{\alpha} \right)$$ where $p_i$ is a given set of probabilities and alpha is a real number. As alpha increases, the Tsallis entropy decreases. The information content of the EGG signals recorded from healthy individuals and subjects with diabetes was quantified using the Tsallis entropy with 5 different orders of the entropy measure (alpha=0.2, 0.4, 0.6, 0.8, and 0.9).
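A minimal sketch of both entropy measures follows. The text does not say how the probabilities $p_i$ were estimated from an EGG signal, so the amplitude histogram used here (`histogram_probabilities`, with 16 bins) is an assumption, as is the base-2 logarithm in the Rényi entropy. The example signal is synthetic.

```python
import math


def histogram_probabilities(signal, bins=16):
    """Estimate p_i from an amplitude histogram (assumed estimator)."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant input
    counts = [0] * bins
    for value in signal:
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return [c / len(signal) for c in counts if c > 0]


def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (alpha != 1), in bits."""
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def tsallis_entropy(probs, alpha):
    """Tsallis entropy of order alpha (alpha != 1)."""
    return (1.0 - sum(p ** alpha for p in probs)) / (alpha - 1.0)


# The five orders used in the study, applied to a hypothetical signal.
alphas = [0.2, 0.4, 0.6, 0.8, 0.9]
example_signal = [math.sin(2 * math.pi * i / 40) for i in range(400)]
probs = histogram_probabilities(example_signal)
renyi_features = [renyi_entropy(probs, a) for a in alphas]
tsallis_features = [tsallis_entropy(probs, a) for a in alphas]
```

Each signal therefore contributes five Rényi values and five Tsallis values, one per order.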
The entropy of a biosignal can be computed in either the time domain or the frequency domain. The spectral entropy of EGG signals is computed in the frequency domain [26]. The spectral components can be evaluated using the FFT. The concept of spectral entropy originates from a measure of information called Shannon entropy. When applied to the power spectrum of a signal, the spectral entropy S is given as follows [27]: $$S = -\sum_{k=1}^{N} p_k \log p_k$$ where N is the number of frequency regions and $p_k$ is the spectral amplitude of the kth frequency region, normalized so that the $p_k$ sum to 1.
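The spectral entropy computation can be sketched as follows. The power spectrum is obtained with a direct DFT to keep the example dependency-free; an FFT routine gives the same values. Excluding the 0 cpm bin, using a base-2 logarithm, and optionally dividing by log2(N) so that the result lies between 0 and 1 are assumptions, as are the function names.

```python
import cmath
import math
import random


def power_spectrum(signal):
    """Power at the positive-frequency DFT bins (DC excluded)."""
    n = len(signal)
    mean = sum(signal) / n
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            (v - mean) * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(signal, normalize=True):
    """Shannon entropy of the normalized power spectrum."""
    power = power_spectrum(signal)
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    entropy = sum(-p * math.log2(p) for p in probs)
    if normalize:
        entropy /= math.log2(len(power))
    return entropy


# Hypothetical test signals: a single tone and white noise.
tone = [math.sin(2 * math.pi * 8 * i / 128) for i in range(128)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(128)]

tone_entropy = spectral_entropy(tone)
noise_entropy = spectral_entropy(noise)
```

A signal whose power is concentrated at one frequency gives an entropy near 0, whereas a broadband signal gives a value near 1 after normalization.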

Fractal Dimension
Fractals are mathematical sets with a high degree of geometrical complexity, which can model many classes of time series data as well as images [28]. Maragos and Sun [29] developed an approach for estimating the fractal dimension of time-dependent signals using morphological erosion and dilation operations to create covers around a signal's graph at multiple scales. This "morphological covering method" uses multiscale morphological operations with varying structuring elements and improves on other covering methods. Experimental investigations of the morphological covering method demonstrate good performance with low estimation errors.
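The idea behind the morphological covering method can be sketched as follows. At each scale s, the signal is dilated (moving maximum) and eroded (moving minimum) with a flat window of half-width s, the area A(s) between the two is accumulated, and the fractal dimension is estimated as the least-squares slope of log(A(s)/s^2) against log(1/s). The flat structuring element, the scale range of 1 to 8 samples, and the function names are assumptions made for illustration and are not taken from the study.

```python
import math
import random


def cover_area(signal, scale):
    """Area between the flat dilation and erosion at the given scale."""
    area = 0.0
    for i in range(len(signal)):
        window = signal[max(0, i - scale): i + scale + 1]
        area += max(window) - min(window)
    return area


def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numerator / sum((x - mean_x) ** 2 for x in xs)


def morphological_fractal_dimension(signal, max_scale=8):
    """Estimate the fractal dimension of a sampled signal's graph."""
    scales = range(1, max_scale + 1)
    xs = [math.log(1.0 / s) for s in scales]
    ys = [math.log(cover_area(signal, s) / s ** 2) for s in scales]
    return least_squares_slope(xs, ys)


# Hypothetical test signals: a straight line and white noise.
line = [float(i) for i in range(1000)]
rng = random.Random(0)
noise = [rng.random() for _ in range(1000)]

line_dimension = morphological_fractal_dimension(line)
noise_dimension = morphological_fractal_dimension(noise)
```

A smooth curve such as the straight line yields a dimension close to 1, while an irregular signal such as white noise yields a clearly larger value, bounded above by 2.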

Spectrogram
The preprocessed EGG signals recorded from healthy individuals and subjects with diabetes were converted into time-corrected instantaneous frequency spectrograms using a spectrogram method. The spectrogram was plotted as an image with the intensities encoding the levels. The spectrogram had time on the x-axis and frequency on the y-axis [30]. Image entropy, HFD, and the four GLCM features (contrast, correlation, energy, and homogeneity) were then extracted from the resulting healthy and diabetic spectrograms.
Image entropy is defined as a scalar value that represents the entropy of a grayscale image. Entropy is a measure of disorder or randomness that can be used to characterize the texture of the input image. Low-entropy images, such as those containing a lot of black sky, have little contrast and large runs of pixels with the same or similar values. Image entropy is expressed by the equation [31]: $$H = -\sum_{i} P_i \log_2 P_i$$ where $P_i$ is the probability that the difference between 2 adjacent pixels is equal to i, and $\log_2$ is the base 2 logarithm.
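The image entropy definition above can be illustrated with a short sketch. It follows the stated definition based on differences between adjacent pixels. Taking "adjacent" to mean horizontally adjacent, and representing the image as a list of rows of integer gray levels, are assumptions.

```python
import math
from collections import Counter


def image_entropy(image):
    """Entropy (bits) of the differences between horizontally adjacent pixels.

    `image` is a list of rows, each a list of integer gray levels.
    """
    differences = [
        right - left for row in image for left, right in zip(row, row[1:])
    ]
    counts = Counter(differences)
    total = len(differences)
    probs = [count / total for count in counts.values()]
    return sum(-p * math.log2(p) for p in probs)


# Hypothetical images: a constant image and one with alternating columns.
flat_image = [[7, 7, 7, 7, 7] for _ in range(4)]
striped_image = [[0, 1, 0, 1, 0] for _ in range(4)]

flat_entropy = image_entropy(flat_image)
striped_entropy = image_entropy(striped_image)
```

The constant image has a single difference value (0) and therefore zero entropy, while the striped image has two equally likely differences (+1 and -1) and an entropy of 1 bit.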
The HFD is a descriptor of the complexity of the geometry of a given set. The set can be the trajectory of any dynamical system and can be reconstructed from the measured data. Suppose that A is the set whose dimension is to be calculated. Let $C(r,A) = \{B_1, B_2, \ldots, B_K\}$ be a finite cover of the set A by sets whose diameters are less than r. Then, the following function defines a measure of the set A [30]: $$\mu_D(A) = \lim_{r \to 0} \inf_{C(r,A)} \sum_{i=1}^{K} (\mathrm{diam}\, B_i)^D$$ For most values of D, the limit leads to a degenerate measure, either $\mu_D(A) = 0$ or $\mu_D(A) = \infty$. The box-counting dimension estimate can be written as follows: $$D \approx \frac{\log N(r)}{\log (1/r)}$$ with sufficiently small r, where N(r) is the number of boxes of side r that contain at least one point of the set. The practical problem is to determine, for every box in each grid, whether it contains a point (or points) of the trajectory.
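A minimal box-counting sketch is given below. It counts the grid boxes of side r occupied by a set of 2D points for several values of r and estimates the dimension as the least-squares slope of log N(r) against log(1/r). The choice of box sizes and the function names are assumptions; in the study the set would be derived from the spectrogram image, which is not reproduced here.

```python
import math


def occupied_boxes(points, box_size):
    """Number of grid boxes of side box_size containing at least one point."""
    return len(
        {(math.floor(x / box_size), math.floor(y / box_size)) for x, y in points}
    )


def box_counting_dimension(points, box_sizes):
    """Least-squares slope of log N(r) versus log(1/r)."""
    xs = [math.log(1.0 / r) for r in box_sizes]
    ys = [math.log(occupied_boxes(points, r)) for r in box_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return numerator / sum((x - mean_x) ** 2 for x in xs)


# Hypothetical point sets in the unit square: a diagonal line and a filled grid.
diagonal = [(i / 10000, i / 10000) for i in range(10000)]
filled = [(i / 64, j / 64) for i in range(64) for j in range(64)]

line_dimension = box_counting_dimension(diagonal, [1 / 4, 1 / 8, 1 / 16, 1 / 32])
square_dimension = box_counting_dimension(filled, [1 / 2, 1 / 4, 1 / 8])
```

The densely sampled line gives a dimension of about 1 and the filled square about 2, which are the values expected for these sets.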
The GLCM tabulates the number of times that a pixel with the gray level value i occurs in the specified spatial relationship to a pixel with the value j. The spatial relationship is defined as the pixel of interest and the pixel to its immediate right (horizontally adjacent). The size of the GLCM is determined by the number of gray levels in the image [32,33]. In addition, the GLCM exposes certain properties about the spatial distribution of the gray levels in the texture image. The features contrast, correlation, energy, and homogeneity were extracted from the GLCMs of the healthy and diabetic EGG spectrograms.
Contrast is a measure of the intensity contrast between a pixel and its neighbor pixel over the whole image. Contrast is 0 for a constant image. The property contrast is also known as variance and inertia [32]. Correlation is a measure of the correlation between a pixel and its neighbor pixel over the whole image. The correlation value is 1 or -1 for a perfectly positively or negatively correlated image, respectively. Energy is the sum of squared elements in the GLCM. Energy is 1 for a constant image. The property energy is also known as uniformity or uniformity of energy. Homogeneity is a measure of the closeness of the distribution of elements in the GLCM to the GLCM diagonal. The homogeneity value is 1 for a diagonal GLCM.
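The GLCM and its four properties can be sketched in plain Python as follows. The co-occurrence counts use the horizontally adjacent pixel pair described above and are normalized to probabilities. The formulas for the four properties follow the commonly used definitions (for example, homogeneity weights each element by 1/(1 + |i - j|)); the text does not list the exact formulas, so treating these as the ones used in the study is an assumption.

```python
import math


def glcm_horizontal(image, levels):
    """Normalized GLCM for the pixel pair (pixel, right-hand neighbor)."""
    counts = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Return contrast, correlation, energy, and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mean_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mean_j) ** 2 * v for _, j, v in cells))
    if sd_i > 0 and sd_j > 0:
        covariance = sum((i - mean_i) * (j - mean_j) * v for i, j, v in cells)
        correlation = covariance / (sd_i * sd_j)
    else:
        correlation = float("nan")  # undefined for a constant image
    return {
        "contrast": contrast,
        "correlation": correlation,
        "energy": energy,
        "homogeneity": homogeneity,
    }


# Hypothetical 4-level images: a constant image and a horizontal ramp.
constant_image = [[2, 2, 2, 2] for _ in range(3)]
ramp_image = [[0, 1, 2, 3] for _ in range(3)]

constant_features = glcm_features(glcm_horizontal(constant_image, levels=4))
ramp_features = glcm_features(glcm_horizontal(ramp_image, levels=4))
```

For the constant image the sketch gives a contrast of 0 and an energy of 1, in line with the properties stated above; the correlation is undefined in that case and is returned as NaN.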

Feature Selection Using a Genetic Algorithm
Using different feature extraction methods, a number of features can be extracted and, from them, effective informative features can be selected [34]. Further, the performance of a classifier is highly sensitive to the efficiency of the feature selection methods. Genetic algorithms are search-optimization techniques based on Darwin's principle of natural selection [34,35].
In this work, a genetic algorithm-based feature selection method was adapted to search, identify, and select potentially informative features from the extracted healthy and diabetic EGG signal features for feature analysis. The flowchart of the genetic algorithm is shown in Figure 3. If F is the total number of features, then $2^F$ possible feature subsets can be created. An initial population of candidate solutions with a fixed population size is randomly constructed, and the fitness of each individual is evaluated using the fitness function. In this work, classification accuracy was adopted as the fitness measure. The genetic algorithm then performs the optimization to select the optimal subset of features [34,35].
Of the 30 features extracted from preprocessed healthy and diabetic EGG signals, the 20 best features were chosen using a genetic algorithm-based feature selection method.
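The selection procedure can be illustrated with a compact genetic algorithm sketch. Each individual is a bit mask over the features, and fitness is classification accuracy, as stated above. Everything else is an assumption made for illustration: the leave-one-out nearest-centroid classifier, the population size, the single-point crossover, the mutation rate, the repair step that keeps exactly `n_select` features switched on, and the synthetic data. The study's own classifier and genetic algorithm settings are not given in the text.

```python
import random


def nearest_centroid_accuracy(samples, labels, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed fitness)."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0
    correct = 0
    for held in range(len(samples)):
        centroids = {}
        for label in set(labels):
            rows = [samples[r] for r in range(len(samples))
                    if r != held and labels[r] == label]
            centroids[label] = [sum(row[i] for row in rows) / len(rows) for i in idx]

        def squared_distance(label):
            return sum((samples[held][i] - c) ** 2
                       for i, c in zip(idx, centroids[label]))

        correct += min(centroids, key=squared_distance) == labels[held]
    return correct / len(samples)


def select_features(samples, labels, n_select, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Search for a mask with n_select features that maximizes accuracy."""
    rng = random.Random(seed)
    n_features = len(samples[0])

    def random_mask():
        chosen = set(rng.sample(range(n_features), n_select))
        return [int(i in chosen) for i in range(n_features)]

    def repair(mask):
        # Switch random bits on or off until exactly n_select are on.
        on = [i for i, bit in enumerate(mask) if bit]
        off = [i for i, bit in enumerate(mask) if not bit]
        rng.shuffle(on)
        rng.shuffle(off)
        while len(on) > n_select:
            mask[on.pop()] = 0
        while len(on) < n_select:
            i = off.pop()
            mask[i] = 1
            on.append(i)
        return mask

    def fitness(mask):
        return nearest_centroid_accuracy(samples, labels, mask)

    population = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)           # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(repair(child))
        population = parents + children
    return max(population, key=fitness)


# Synthetic data (hypothetical): features 0 and 1 separate the two classes,
# features 2 to 5 are pure noise.
data_rng = random.Random(1)
samples, labels = [], []
for label in (0, 1):
    for _ in range(10):
        informative = [data_rng.gauss(5.0 * label, 1.0) for _ in range(2)]
        noise = [data_rng.gauss(0.0, 1.0) for _ in range(4)]
        samples.append(informative + noise)
        labels.append(label)

best_mask = select_features(samples, labels, n_select=2,
                            pop_size=12, generations=5)
best_accuracy = nearest_centroid_accuracy(samples, labels, best_mask)
```

In the setting of this study the same scheme would be called with 30 candidate features and `n_select=20`. Keeping the better half of each generation unchanged ensures that the best mask found so far is never lost.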

Results
Different patterns of EGG signals were observed in healthy individuals and subjects with diabetes. Figures 4A and 4B show a typical EGG signal recorded from a healthy individual and the single-sided amplitude spectrum of a healthy EGG signal, respectively. Figures 5A and 5B show a typical EGG signal recorded from a subject with diabetes and the single-sided amplitude spectrum of a diabetic EGG signal, respectively. The variation of spectral entropy values was evaluated as a function of mobility of EGG signals recorded from healthy individuals (Figure 6A) and subjects with diabetes (Figure 6B). We found that the spectral entropies extracted from healthy EGG signals (R=0.96741) and diabetic EGG signals (R=0.90993) had a high correlation with mobility.
The variation of HFD values was investigated as a function of mobility of EGGs recorded from healthy subjects (Figure 7A) and subjects with diabetes (Figure 7B). HFD values extracted from healthy EGGs had a high degree of correlation (R=0.91737) with mobility. HFD values extracted from diabetic EGGs had a good correlation (R=0.77178) with mobility. The average Hjorth parameters (activity, mobility, and complexity) of EGG signals recorded from healthy individuals and subjects with diabetes are shown in Figure 9. The average mobility and complexity of the EGG signals recorded from healthy individuals are higher than the average mobility and complexity of the EGG signals recorded from subjects with diabetes. Further, the average activity of the EGG signals recorded from subjects with diabetes is higher than the average activity of the EGG signals recorded from healthy individuals.
The MFD and HFD values of the EGG signals recorded from healthy individuals and subjects with diabetes are shown in Figures 10A and 10B, respectively. The average MFD of EGG signals recorded from healthy individuals is higher than the average MFD of EGG signals recorded from subjects with diabetes. Furthermore, the average HFD of EGG signals recorded from healthy individuals is lower than the average HFD of EGG signals recorded from subjects with diabetes.
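The correlation values (R) reported in this section relate one feature to mobility across recordings. Assuming that R denotes the Pearson correlation coefficient, which the text does not state explicitly, it can be computed as in the following sketch. The feature values shown are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical per-recording values (not data from the study).
mobility = [0.10, 0.12, 0.15, 0.18, 0.21, 0.25]
spectral_entropy = [0.31, 0.35, 0.44, 0.50, 0.61, 0.70]

r_value = pearson_r(mobility, spectral_entropy)
```

Values of R close to 1 indicate that the feature increases almost linearly with mobility across recordings.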

Principal Findings
Type 2 diabetes is a chronic disease that prevents the physiological system from using insulin efficiently. It is expected that the global number of type 2 diabetes cases will reach around 450 million by 2030. Undiagnosed diabetes is often associated with complications such as cardiovascular and kidney diseases. However, these complications are preventable through the early detection and diagnosis of diabetes [36]. In this regard, a method for the early detection of type 2 diabetes is of high value. The method needs to be simple, self-applicable, noninvasive, and safe. This study aimed to develop a device for mass screening of diabetes. The results confirmed that the frequency of 3 cpm is dominant in the EGG signals acquired from healthy individuals. However, a frequency of 9.6 cpm was dominant in the EGG signals acquired from subjects with diabetes. It was demonstrated that the changes in EGG signals associated with diabetes cannot be identified by visual inspection alone. Mobility is the average frequency of an EGG; therefore, it was highly correlated with the dynamic process of digestion. Additionally, the extracted features of healthy and diabetic EGGs were found to be well correlated with the physiological process of digestion. Further, it was demonstrated that the features spectral entropy, energy, HFD, and MFD provide information about abnormalities in the EGGs.

Conclusion
Human gastric myoelectrical activity can be measured using a noninvasive technique known as electrogastrography. However, although frequency characteristics are among the most significant parameters, the visual analysis of EGG signals is very difficult. Subjects with diabetes who have poorly controlled diet habits are often suspected of having diabetic gastroparesis. In this work, time domain features, frequency domain features, and time-frequency domain features were extracted from EGG signals recorded from healthy individuals and subjects with diabetes. Further, potentially informative features were selected using a genetic algorithm-based feature selection method. Additionally, the correlation of the extracted features with the mobility of the EGG signals was analyzed. Results demonstrate that the extracted features capture individual informative characteristics that can be used for analysis. Further, the features MFD and HFD have a high degree of correlation with the mobility of healthy and diabetic EGG signals. Additionally, the spectral entropy of EGG signals is highly correlated with the mobility of EGG signals in both healthy individuals and subjects with diabetes. This work appears to be of high clinical significance, as these potentially informative features can be used for the analysis and classification of digestive system disorders. In the future, deep learning techniques can be utilized for the automated classification of healthy and diabetic EGG signals.