#### Published on 30.01.19 in Vol 4, No 1 (2019): Jan-Dec

#### Preprints (earlier versions) of this paper are available at http://preprints.jmir.org/preprint/10740, first published Apr 09, 2018.

#### This paper is in the following e-collection/theme issue:

#### Original Paper

## Automatic Near Real-Time Outlier Detection and Correction in Cardiac Interbeat Interval Series for Heart Rate Variability Analysis: Singular Spectrum Analysis-Based Approach

Graduate School of Excellence Computational Engineering, Technische Universität Darmstadt, Darmstadt, Germany

### Corresponding Author:

Michael Lang, MSc

Graduate School of Excellence Computational Engineering

Technische Universität Darmstadt

Dolivostraße 15

Darmstadt, 64293

Germany

Phone: 49 61511624401

Fax:49 61511624404

Email:

### ABSTRACT

Background: Heart rate variability (HRV) is derived from the series of R-R intervals extracted from an electrocardiographic (ECG) measurement. Ideally all components of the R-R series are the result of sinoatrial node depolarization. However, the actual R-R series are contaminated by outliers due to heart rhythm disturbances such as ectopic beats, which ought to be detected and corrected appropriately before HRV analysis.

Objective: We have introduced a novel, lightweight, and near real-time method to detect and correct anomalies in the R-R series based on the singular spectrum analysis (SSA). This study aimed to assess the performance of the proposed method in terms of (1) detection performance (sensitivity, specificity, and accuracy); (2) root mean square error (RMSE) between the actual N-N series and the approximated outlier-cleaned R-R series; and (3) how it benchmarks against a competitor in terms of the relative RMSE.

Methods: A lightweight SSA-based change-point detection procedure, improved through the use of a cumulative sum control chart with adaptive thresholds to reduce detection delays, monitored the series of R-R intervals in real time. Upon detection of an anomaly, the corrupted segment was substituted with the respective outlier-cleaned approximation obtained using recurrent SSA forecasting. Next, N-N intervals from a 5-minute ECG segment were extracted from each of the 18 records in the MIT-BIH Normal Sinus Rhythm Database. Then, for each such series, a number (randomly drawn integer between 1 and 6) of simulated ectopic beats were inserted at random positions within the series and results were averaged over 1000 Monte Carlo runs. Accordingly, 18,000 R-R records corresponding to 5-minute ECG segments were used to assess the detection performance whereas another 180,000 (10,000 for each record) were used to assess the error introduced in the correction step. Overall 198,000 R-R series were used in this study.

Results: The proposed SSA-based algorithm reliably detected outliers in the R-R series and achieved an overall sensitivity of 96.6%, specificity of 98.4% and accuracy of 98.4%. Furthermore, it compared favorably in terms of discrepancies of the cleaned R-R series compared with the actual N-N series, outperforming an established correction method on average by almost 30%.

Conclusions: The proposed algorithm, which leverages the power and versatility of the SSA to both automatically detect and correct artifacts in the R-R series, provides an effective and efficient complementary method and a potential alternative to the current manual-editing gold standard. Other important characteristics of the proposed method include the ability to operate in near real-time, the almost entirely model-free nature of the framework which does not require historical training data, and its overall low computational complexity.

#### JMIR Biomed Eng 2019;4(1):e10740

doi:10.2196/10740

### Citation

Please cite as:

Loading authors...

### KEYWORDS

- change-point detection;
- cumulative sum;
- forecasting;
- heart rate variability;
- R-R series;
- singular spectrum analysis;
- ventricular premature complexes

### Introduction

#### Background on Heart Rate Variability

Oscillations in the time interval between successive heart beats, referred to as heart rate variability (HRV), have long been known to allow for insight into the intricate control mechanisms of the autonomic nervous system (ANS) [1-4]. Accordingly, research into HRV has attracted considerable attention over the past 4 decades, as evidenced by an exponential growth of published work [5,6].

In a nutshell, heart rhythm and rate can be said to be governed by the 2 competing divisions of the ANS, that is, the sympathetic nervous system and the parasympathetic nervous system. An increased sympathetic input to the sinoatrial (SA) node yields an increase in heart rate (HR) mediated by the release of adrenaline and noradrenaline and the subsequent activation of β-1-adrenoceptors, resulting in an accelerated diastolic depolarization. On the other hand, an increase in the parasympathetic outflow decreases the HR through the release of acetylcholine by the vagus nerve, which binds to and activates M2 muscarinic acetylcholine receptors in the SA node and eventually slows down the diastolic depolarization rate [4,7-9].

Various HRV measures have been established and are usually categorized as either time domain, frequency domain, or nonlinear analysis techniques [2,4]. To a large extent, the popularity of HRV is because of its easy acquisition and seemingly straightforward interpretation.

#### The Necessity of Preprocessing the R-R Series

It is crucial to realize that by virtue of its very definition, the HRV encompasses only oscillatory phenomena between heart beats resulting from the SA node depolarization [4,10]. While, ideally, all components of an R-R series originate from the SA node, the actual R-R series in both healthy and diseased subjects are contaminated by outliers due to artifacts and heart rhythm disturbances such as ectopic beats (ie, heart beats whose origin is outside of the SA node). Thus arises the necessity to ensure that the HRV analysis is performed on a series representing only the actual normal sinus rhythm (NSR) interbeat intervals, commonly referred to as N-N series (as in normal-to-normal).

The detrimental impact of ectopics on HRV measures is pronounced and well-documented [6,11-17]. In a recent study Stapelberg et al [18] examined the sensitivity of 38 time domain, frequency domain and nonlinear HRV measures to the addition of artifacts in real and artificial 24-hour recordings. In accordance with previous findings, they concluded short-term time domain HRV measures to be more sensitive to the presence of artifacts than their long-term counterparts. Furthermore, frequency domain measures were found to generally be more sensitive than time domain measures, whereas the less commonly used nonlinear measures were shown to exhibit some inherent robustness properties.

Ectopic heart beats are ubiquitous phenomena and not limited to patients with cardiac disorders and diseases. Hingorani et al [19] retrospectively examined the prevalence of cardiac arrhythmias by scrutinizing 24-hour Holter recordings of 1273 healthy volunteers from 22 phase I clinical trials; note that this specific population is not a representative sample of the overall population. The sample was heavily biased in that, consistent with the requirements of phase I clinical trials, subjects were extensively screened by history, physical examination, and laboratory tests and were therefore significantly healthier than the overall population. Crucially, all subjects underwent 12-lead pretrial electrocardiography (ECG) and were excluded if they had any cardiac conduction disorder or disease, including a personal or familiar history thereof. Those exhibiting ≥2 consecutive ectopics as well as bigeminy, trigeminy and quadrigeminy in their 12-lead ECG examination were also excluded. Despite these rigorous exclusion criteria, Hingorani et al [19], among other findings, found premature atrial complexes in 60.8% of the examined Holter records, followed by premature ventricular complexes (PVCs), which were observed in 43.4% of healthy volunteers. While multifocal PVCs occurred in 5.3%, 3.3% exhibited >200 PVCs per 24-hour ECG recordings. A relatively high prevalence of occasional ectopic beats has been reported by numerous other authors as well both in healthy and diseased subjects [20-23].

In summary, the detection and correction of ectopic beats in tachograms are not merely imperative for formal consistency with the HRV definition but arises from the impact of ectopics on HRV measures and the general prevalence of ectopics.

#### Prior Work and State-of-the-Art

Compared with the overall research interest directed toward HRV applications, the crucial issue of the discrepancy between most real-world R-R series and their respective N-N series has arguably not received as much attention. The actual gold standard of manual R-R series editing, advocated for in the field’s relevant guideline by the European Society of Cardiology and the North American Society of Pacing and Electrophysiology Task Force [4], remains unaltered to this day [11,24].

It should be noted that, aiding and abetting the aforementioned exponential growth of the HRV-related (applied) research, a number of well-designed and widely used software packages for the HRV analysis have been developed and made available to the general public; these include, but are not limited to, Kubios [25,26], Nevrokard’s aHRV [27], and others [28-30]. More often than not, these commercial software packages tend to be closed source and therefore provide only a limited benefit to algorithm developers. On the other hand, it ought to be acknowledged that they are well suited for the needs of practitioners, as they combine advanced preprocessing and analysis features, intuitive graphical interfaces, and support for various file formats.

In automated approaches the task of the detection and correction of outliers is usually tackled as a two-step process.

First, outliers in the R-R tachogram are detected, usually by straightforward thresholding based on a fixed percentage difference to previous intervals or their SD [31], although some more sophisticated methods have also been proposed [12].

Second, the detected (regions of) artifacts from step 1 are either discarded (thereby reducing the effective length of the tachogram), substituted by preceding uncorrupted intervals or by intervals obtained through interpolation, with linear and cubic spline interpolation being the most popular techniques [11].

#### Objective of This Work

The objective of this study was to introduce a general framework of low computational complexity that allows for an automatic near real-time detection and correction of outliers in the R-R series based on the singular spectrum analysis (SSA). The main novel contributions of this work include (1) the application of a recently proposed model-free lightweight SSA change-point detection (l-SSA-CPD) algorithm [32]; (2) a modification of l-SSA-CPD through the use of an adaptive control limit sequential ranks (AC-SRC) control chart [33] to drastically reduce detection delays; and (3) upon detection of an anomaly, the substitution of the corrupted tachogram segment by an approximation obtained through recurrent SSA forecasting based on a small outlier-free tachogram segment.

An extensive simulation study comprising 198,000 5-minute long R-R series, obtained by randomly inserting varying amounts of simulated ectopic beats in records taken from the MIT-BIH Normal Sinus Rhythm Database (NSRDB), is conducted to assess the artifact detection and correction performance of the proposed algorithm.

#### Fundamentals of Basic Univariate Singular Spectrum Analysis and Singular Spectrum Analysis Based Change-Point Detection

Recently, this author proposed a low-complexity model-free approach based on SSA and nonparametric cumulative sum (CUSUM) control charts for real-time cardiac anomaly detection, referred to as l-SSA-CPD [32]. It was shown that l-SSA-CPD reliably detects anomalies even when directly applied to unprocessed (ie, no preprocessing was performed) raw ECG and photoplethysmographic records from common databases publicly available through Physiobank [34].

In this study, modifications to the original l-SSA-CPD algorithm will be introduced, and its capabilities will be expanded by adding a recurrent forecasting feature to generate appropriate substitutes of the corrupted tachogram entries. First, however, a brief restatement of SSA and l-SSA-CPD [32] is required.

##### Basic Singular Spectrum Analysis Algorithm

SSA is a powerful technique of time series analysis owing much of its appeal to an inherent model-free concept combining elements of conventional time series analysis, multivariate geometry and statistics, and signal processing [35]. The univariate basic SSA can be considered as the application of the principal component analysis to the Hankel matrix (obtained through an embedding of the original univariate time series) with the subsequent attempt to reconstruct the original series.

Consider *N* observations = (*x*_{1},..., *x _{N}*) of a univariate time series and a lag-integer or embedding dimension

*M*(1 <

*M*< <

*N*) also commonly referred to as the window length. The basic SSA algorithm then encompasses the following 4 steps:

1. Embedding = (*x*_{1}, ..., *x _{N}* → Χ ∈ ℝ

^{M × K})

A trajectory matrix Χ is constructed by mapping into a sequence of *K = N − M + 1* lagged column vectors *X _{j}* = (

*x*, ...,

_{j}*x*)

_{j+M-1}^{T},

*j*= 1, ...,

*K*of size

*M*, yielding

Χ is called a Hankel matrix because of its characteristic of having equal elements on the antidiagonals. One can think of Χ as multivariate data with *M* characteristics and *K* observations and accordingly *X _{j}* of Χ as column vectors in the

*M*-dimensional space ℝ

^{M}.

2. Singular Value Decomposition of Χ

Taking the singular value decomposition (SVD) of Χ decomposes the trajectory matrix into its orthogonal bases and yields a set of *M* eigenvalues and eigenvectors. Let the eigenvalues of ΧΧ^{T} be λ_{1} ≥ ··· ≥ λ_{M} ≥ 0 and *U*_{1},..., *U _{M}* be the respective eigenvectors. Then, with

*d*= max (

*i*, such that λ

_{i}>0) denoting the rank of Χ, the SVD of Χ can be expressed as the sum of

*d*elementary matrices

with rank 1 matrices and being the eigenvectors of Χ^{T}Χ. Accordingly, the decomposition in Equation (2) is completely characterized by the set of so-called “eigentriples” .

Note that owing to the symmetry of left and right singular vectors, the SVDs of Χ with lag *M* and *K = N − M + 1* are equivalent, entailing the lack of any additional benefit from the use of larger embedding dimensions *M* > *N* / 2 [32,35-39].

3. Eigentriple Grouping

Consider the task of extracting a particular component or signal of interest from observed data contaminated by various artifacts such as noise. To do so, during the third stage of the basic SSA, disjoint subsets of indices {1, ..., *d*} are determined such that the respective systems of eigenvectors span the subspaces associated with those signal components.

In the example of aiming at the separation of a signal from unwanted noise and other disturbances, this entails determining an appropriate subset of indices *I* = {*i*_{1}, ..., *i _{l}*},

*l*<

*d*≤

*M*that span an

*l*-dimensional subspace in ℝ

^{M}, denoted as ℒ

_{I}⊂ ℝ

^{M}=

*span*{

*U*} =

_{I}*span*{

*U*, ...,

_{i1}*U*}, representing the signal whereas the remaining eigentriples with

_{il}*Ī*= {

*i*

_{1}, ...,

*i*} /

_{d}*I*are said to span the noise subspace ℒ

_{Ī}⊂ ℝ

^{M}=

*span*{

*U*}.

_{Ī}The trajectory matrix component Χ_{I} corresponding to the subset *I* of eigentriples associated with the signal of interest is then

and the component Χ_{Ī} corresponding to the subset *Ī* = {*i*_{1},..., *i _{d}*}

*I*associated with the remainder of the observed signal is

such that the overall SVD of Χ can be rewritten as

In the case of separability (see, eg, [36] at 17), the contribution of Χ_{I} to the entire observed signal Χ is represented by the respective share of eigenvalues .

4. Diagonal Averaging

For perfectly separable signal components, all matrices in the expansion of Equation (5) are Hankel matrices, which require that they all have equal entries on their antidiagonals. In real-world problems, however, perfect separability is rarely achievable. While an approximate separability usually suffices, the final fourth step in the SSA algorithm is required, as the matrices Χ_{i} from step 3 would have unequal entries on their antidiagonals. Thus, the Hankelization of all Χ_{i} is performed, that is, the compliance with the Hankel structure is enforced by taking the average of each antidiagonal and then replacing each element of that antidiagonal with the determined average. This yields

with being the diagonally averaged matrices.

One can then, for example, easily reconstruct the approximation of the signal of interest through the eigentriples with indices *I* through the one-to-one correspondence between and the respective time series which provides an approximation of either the entire time series or some components of it, depending on the particular choice of indices *I*.

##### Singular Spectrum Analysis Based Change-Point Detection Rationale

The sequential application of SSA for change detection is based on Moskvina and Zhigljavsky [40] (see [39] for an exhaustive discussion; see also [35] for an earlier account of the basic idea). The rationale is to slide 2 (possibly intersecting) moving windows over the time series and to apply the first 3 stages of the basic SSA each time. Think of the first window as adaptively generating a low-dimensional base subspace representation (capturing the main structure in the series) and the second window, which contains at least *M* new data samples, as generating a test subspace. As both windows are slid over the series, some sort of distance between the base and the test subspace is monitored. If no change-point occurs, the distance remains small, whereas it spikes in the presence of a sudden significant change in the main structure of the series.

The algorithm by Moskvina and Zhigljavsky has been successfully applied to various change detection problems and some improved variations of it have been presented as well (see [32] for a detailed background). Computational complexity, however, ought to be acknowledged as a potentially significant limitation of the said algorithm. Recently, l-SSA-CPD [32] has been proposed as a lightweight alternative. Contrary to the algorithm by Moskvina and Zhigljavsky, which recalculates the SVD of 2 sliding windows each time new observations become available, l-SSA-CPD relies on a nominal low-dimensional signal subspace representation computed only once at startup. As new observations become available, l-SSA-CPD simply allocates them into a vector and calculates a test statistic based on the squared Euclidean distance and the angle between this new data vector and the previously determined nominal reference subspace.

##### Singular Spectrum Analysis Parameter Selection

The behavior of SSA is largely determined by 2 tuning parameters, namely the window length *M* and the number *l* of eigenvalues to be used in the embedding and grouping steps, respectively. While it is undisputed that the successful application of SSA hinges on an appropriate choice of *M* and *l*, practitioners are faced with the nuisance that no clear-cut rules as to how they ought to be determined exist.

For a series of length *N* and window length (or embedding dimension) 1 < *M* < *N*, as previously discussed, there is no additional benefit from using *M* > *N* / 2 (see, eg, [35] at 69 and[36] at 47). It is widely acknowledged that as the appropriate window length *M* depends on both the underlying data and the particular application, no universal rule for determining it exists [35-39]. General recommendations vary between *M* ≃ *N* / 4 [37] and *M* ≃ *N* / 2 [35,36]. In the case of a periodic signal with period *T*, it is important for *M* to be, at least, equal to *T* for the SSA to capture the main structure of the series. Furthermore, *M* should be proportional to *T* [35,36,38]. Of note, small windows act like a smoothing filter of width 2*M* − 1 [36].

Similarly, it is well known that selecting a proper *l* is also crucial, for issues are known to arise when eigentriple grouping comprises either an insufficient or an excessive number of eigenvalues. More specifically, if *l* is too small, the SSA is unable to capture the entire signal whereas if it is too large noise components are unwittingly captured. Since the contribution of a particular series component (associated with the corresponding eigentriple) to the entire series is proportional to the respective eigenvalue, it is common practice to select *l* by identifying the leading eigenvalues through visual inspection of the eigenvalue spectra or by setting a predefined eigenvalue share (see, eg, [35,36,38] for a more exhaustive discussion).

In addition to the above-referred monographs, interested readers are also referred to some fairly recent contributions by Hassani et al [41] and Khan and Poskitt [42,43], which approach the issue through the concept of separability and information theory, respectively.

In the light of the pragmatic suboptimal approach pursued here, a detailed discussion of the SSA parameter selection is, however, beyond the scope of this study and therefore omitted.

### Methods

#### The Proposed Outlier Detection and Correction Framework

The proposed framework consisted of 2 separate stages, namely the detection and the correction of outlier corrupted segments of the tachogram.

##### Outlier Detection Using Lightweight Singular Spectrum Analysis Change-Point Detection and Adaptive Control Limit Sequential Ranks Control Charts

Let *M*, *N*, *l*, *p*, *q* be fixed integers such that *l* < *M* < *N* / 2 and 0 ≤ *p* < *q*. Then l-SSA-CPD proceeds as follows:

1. Initialization at *n* = 0

Akin to MZ, the first 3 steps of the basic SSA are applied on the interval [*n* + 1, *n* + *N*] to obtain a low-dimensional nominal subspace ℒ_{I} = ℒ_{I}^{(n=0)} which captures the main structure of the series. This involves embedding as in Equation (1) to obtain a trajectory matrix, which, hereafter, we refer to as the base matrix Χ_{B} = Χ_{B}^{(0)} = Χ_{B}^{(n=0)}, taking the SVD of Χ_{B} and obtaining ℒ_{I} through an appropriate choice of *I* = {*i*_{1},...,*i _{l}*},

*l*<

*d*≤

*M*with

*d*= max (

*i*, such that λ

_{i}> 0).

2. Then, for each *n* = 0,1,... l-SSA-CPD proceeds as follows:

- Construction of a test matrix Χ
_{T}^{(n)}on the interval [*n*+*p*+ 1,*n*+*q*+*M*− 1] as

- Computation of the detection statistic
*D*as_{n,I,p,q}

with

and with the angle ∠ (Χ_{T}^{(n)}, ℒ_{I}) taking values in [0, π / 2] and accordingly . *Χ _{j}*

^{(n)}= [

*x*, ...,

_{n+j}*x*

_{n+j+M −1}]

^{T}are the column vectors of Χ

_{T}

^{(n)}whereas U

_{I}= [

*U*

_{i1}, ...,

*U*] denotes the

_{il}*M*×

*l*matrix of eigenvectors spanning ℒ

_{I}and º denotes the Hadamard (or element-wise) product. Note that throughout this study,

*p*=

*N*,

*q*=

*N*+ 1 is used, that is, the width

*Q*=

*q*×

*p*of the test matrices Χ

_{T}

^{(n)}used always equals 1. In other words, each test matrix is actually a single column vector containing

*M*new observations.

*Q*= 1 is primarily chosen to allow for an agile response time and to minimize the computational burden [32].

- Monitoring of
*D*using the AC-SRC Control Chart_{n,I,p,q}

To reduce detection delays, instead of the sequential ranks CUSUM (SRC) by McDonald [44], as used in the conventional l-SSA-CPD [32], a sequential ranks-based CUSUM with adaptive control limits referred to as AC-SRC [33] is used. AC-SRC is inspired by a distribution-free CUSUM proposed by Chatterjee and Qiu [45] where instead of a fixed threshold a sequence of control limits, determined by the bootstrap estimate of the conditional distribution of the CUSUM test statistic given the last time it was zero, is used. AC-SRC carries the approach of Chatterjee and Qiu over to McDonald’s SRC; the main advantage of the SRC (which is that it does not require training data and control limits can thus easily be determined in advance, eg, through Monte Carlo simulations) is retained while detection delays are significantly reduced because of the use of an adaptive sequence of control limits. Furthermore, owing to a trade-off favoring the lower computational burden and simplicity over the optimal performance, AC-SRC lacks some of the refined fine-tuning routines used in the Chatterjee and Qiu method. The author refrains from a detailed discussion of AC-SRC and refers the interested reader to previous studies [33,45].

Let the sequential rank of *D _{n,I,p,q}* be denoted as

The AC-SRC then starts with the same recursive equation as the SRC, ie,

with *C*_{0} = 0 and *k* a reference constant. Let *T _{n}* denote the so-called “sprint length” expressing the time elapsed since

*C*was last 0, ie,

_{n}Furthermore, let *Y _{j}* be a random variable with distribution

Chatterjee and Qiu showed that the conditional distributions of *C _{n}* |

*T*in Equation (11) are easier to handle than the unconditional distribution of

_{n}*C*and under some regularity conditions (see [45] and references therein for details), depend only on

_{n}*j*and the underlying process generating distribution, but not on

*n*. Then, for any positive integer

*j*

_{max}≤

*n*, the distribution of

*C*can be approximated by means of the conditional distributions in Equation (11) as

_{n}with *Y* * ∼ *C _{n}* |

*T*>

_{n}*j*

_{max}and

*I*being the indicator function.

It can be shown that [44], given the observed process is in-control, the quantities *R _{n}* / (

*n*+ 1) in Equation (10) are independent and discrete uniform on . Herein lies the key advantage of AC-SRC, in that for some fixed

*j*

_{max}and

*k*one can determine a sequence of control limits from

with representing a simulation of Equation (10) for up to some *N*_{AC-SRC} wherein the quantities *R _{n}* / (

*n*+ 1) are substituted by the set of random variables as discrete uniform on . The sequence of control limits can then easily and without the need for training data samples be determined by means of Monte Carlo simulations as outlined elsewhere [33].

The AC-SRC signals a change to have occurred if

or if

To limit the computational burden, it is recommended to calculate control limits only up to a reasonably small *j*_{max} after which the control limit is kept fixed at *h*_{AC-SRCjmax}. In addition, following recommendations by Chatterjee and Qiu, the reference constant *k* is set proportionate to the expected average sprint length *E* {*T _{n}*}. Throughout this work,

*k*was empirically calibrated such that , whereas the average run length (ARL) was set to a nominal ARL

_{0}=500. A detailed discussion as to the empirical calibration of control charts would, however, be beyond the scope of this work [33]. Note that as

*k*increases, accordingly

*j*

_{max}decreases

*E*{

*T*}, that is, the likelihood of

_{n}*C*bouncing back to 0 increases [33,45]. For the particular use case examined here, the use of a small

_{n}*j*

_{max}to allow for agile change detection is recommended (see Tables 1-4 and the respective discussion in the next section).

Finally, as with the SRC in the conventional l-SSA-CPD algorithm, the control chart is reinitialized after a change has been detected (by setting *C _{n}*=0 and

*T*=0) to allow for the detection of multiple and potentially nearby change-points [32].

_{n}Monitoring of the l-SSA-CPD test statistic *D _{n,I,p,q}* with

*N*=20,

*M*=10 by means of AC-SRC with

*j*

_{max}=8,

*E*{

*T*}=6 is showcased on a 5-minute tachogram excerpt of record 16,273 MIT-BIH NSRDB artificially contaminated with 2 and 8 ectopics in Figures 1 and 2, respectively. Note the remarkably low detection delay due to the use of AC-SRC.

_{n}##### Outlier Correction Using Recurrent Singular Spectrum Analysis Forecasting

Once an outlier has been detected at say *n=τ*, accounting for the inherent detection delay and accordingly an uncertainty as to the exact outlier location, the interval [τ − Δ_{1}, τ], is designated as corrupted tachogram segment. Note that Δ_{1} depends on the (average) detection delay and should be chosen carefully. However, it need not be exact or otherwise optimal. Then, a second interval of length Δ_{2} immediately preceding the corrupted one is used to obtain an anomaly-free forecast to substitute the corrupted segment with.

The rationale of the proposed framework is depicted in Figure 3, with green and yellow rectangles representing the designated uncorrupted and corrupted segments and the detected change highlighted in red.

As the proposed method operates sequentially as new observations arrive, an appropriate uncontaminated data segment (either uncorrupted to begin with or previously cleaned) to serve as a basis for forecasting is always available. While it may be beneficial to use large values for Δ_{2} (assuming sufficient past observations were available), keeping Δ_{1}, Δ_{2} as small as possible is recommended; as doing so allows for near real-time operation and significantly reduces the computational burden. For the purpose of this study, simplicity was prioritized over (more often than not unnecessary) optimality.

Record | Sensitivity | Specificity | Accuracy |

16,265 | 0.9712 | 0.9827 | 0.9825 |

16,272 | 0.9415 | 0.9802 | 0.9794 |

16,273 | 0.9905 | 0.9910 | 0.9909 |

16,420 | 0.9771 | 0.9778 | 0.9777 |

16,483 | 0.9953 | 0.9846 | 0.9846 |

16,539 | 0.9326 | 0.9878 | 0.9870 |

16,773 | 0.9692 | 0.9832 | 0.9829 |

16,786 | 0.9722 | 0.9758 | 0.9755 |

16,795 | 0.8800 | 0.9922 | 0.9905 |

17,052 | 0.9521 | 0.9830 | 0.9825 |

17,453 | 0.9768 | 0.9904 | 0.9902 |

18,177 | 0.9549 | 0.9757 | 0.9754 |

18,184 | 0.9719 | 0.9805 | 0.9803 |

19,088 | 0.9711 | 0.9917 | 0.9914 |

19,090 | 0.9900 | 0.9745 | 0.9745 |

19,093 | 0.9744 | 0.9837 | 0.9834 |

19,140 | 0.9689 | 0.9920 | 0.9917 |

19,830 | 0.9948 | 0.9909 | 0.9909 |

*j*

_{max}=6 and lightweight singular spectrum analysis change-point detection with

*N*=20,

*M*=10,

*Q*=1, 75% of leading eigentriples.

Record | Sensitivity | Specificity | Accuracy |

16,265 | 0.9335 | 0.9860 | 0.9854 |

16,272 | 0.8595 | 0.9850 | 0.9828 |

16,273 | 0.9199 | 0.9922 | 0.9913 |

16,420 | 0.9141 | 0.9810 | 0.9803 |

16,483 | 0.9569 | 0.9893 | 0.9888 |

16,539 | 0.7494 | 0.9889 | 0.9863 |

16,773 | 0.8781 | 0.9858 | 0.9844 |

16,786 | 0.9038 | 0.9779 | 0.9767 |

16,795 | 0.7210 | 0.9936 | 0.9897 |

17,052 | 0.8259 | 0.9851 | 0.9829 |

17,453 | 0.9013 | 0.9933 | 0.9921 |

18,177 | 0.8101 | 0.9785 | 0.9771 |

18,184 | 0.8508 | 0.9833 | 0.9819 |

19,088 | 0.8667 | 0.9924 | 0.9910 |

19,090 | 0.9403 | 0.9793 | 0.9789 |

19,093 | 0.8920 | 0.9869 | 0.9854 |

19,140 | 0.8796 | 0.9946 | 0.9933 |

19,830 | 0.9510 | 0.9945 | 0.9941 |

*j*

_{max}=8 and lightweight singular spectrum analysis change-point detection with

*N*=20,

*M*=10,

*Q*=1, 75% of leading eigentriples.

Record | Sensitivity | Specificity | Accuracy |

16,265 | 0.8816 | 0.9875 | 0.9864 |

16,272 | 0.8240 | 0.9871 | 0.9844 |

16,273 | 0.8710 | 0.9940 | 0.9925 |

16,420 | 0.8741 | 0.9850 | 0.9839 |

16,483 | 0.8990 | 0.9900 | 0.9889 |

16,539 | 0.7218 | 0.9908 | 0.9879 |

16,773 | 0.8202 | 0.9871 | 0.9850 |

16,786 | 0.8421 | 0.9814 | 0.9794 |

16,795 | 0.6619 | 0.9943 | 0.9898 |

17,052 | 0.7753 | 0.9893 | 0.9865 |

17,453 | 0.8450 | 0.9939 | 0.9921 |

18,177 | 0.7539 | 0.9824 | 0.9805 |

18,184 | 0.7952 | 0.9853 | 0.9832 |

19,088 | 0.8030 | 0.9930 | 0.9909 |

19,090 | 0.8869 | 0.9824 | 0.9814 |

19,093 | 0.8379 | 0.9892 | 0.9869 |

19,140 | 0.8004 | 0.9951 | 0.9931 |

19,830 | 0.9063 | 0.9948 | 0.9939 |

*j*

_{max}=12 and lightweight singular spectrum analysis change-point detection with

*N*=20,

*M*=10,

*Q*=1, 75% of leading eigentriples.

Record | Sensitivity | Specificity | Accuracy |

16,265 | 0.8623 | 0.9878 | 0.9866 |

16,272 | 0.7947 | 0.9885 | 0.9854 |

16,273 | 0.8774 | 0.9942 | 0.9929 |

16,420 | 0.8585 | 0.9851 | 0.9838 |

16,483 | 0.8950 | 0.9904 | 0.9893 |

16,539 | 0.7260 | 0.9920 | 0.9890 |

16,773 | 0.7951 | 0.9880 | 0.9857 |

16,786 | 0.8022 | 0.9814 | 0.9790 |

16,795 | 0.6913 | 0.9941 | 0.9900 |

17,052 | 0.7866 | 0.9905 | 0.9881 |

17,453 | 0.8176 | 0.9951 | 0.9932 |

18,177 | 0.7644 | 0.9841 | 0.9823 |

18,184 | 0.7846 | 0.9866 | 0.9847 |

19,088 | 0.8119 | 0.9936 | 0.9917 |

19,090 | 0.8733 | 0.9837 | 0.9827 |

19,093 | 0.8299 | 0.9898 | 0.9875 |

19,140 | 0.7949 | 0.9953 | 0.9934 |

19,830 | 0.8979 | 0.9952 | 0.9944 |

*j*

_{max}=16 and lightweight singular spectrum analysis change-point detection with

*N*=20,

*M*=10,

*Q*=1, 75% of leading eigentriples.

Time series forecasting plays a crucial role in many scientific fields and arguably also represents one of the most popular (real-world) application areas of the SSA. Note that there exist different approaches for SSA forecasting [46], most notably recurrent and vector forecasting. Refraining from any detailed discussion (see [35-38] for excellent monographs on the subject matter), the recurrent SSA forecasting approach used in this study will briefly be outlined. As with other more advanced capabilities that go beyond the realm of basic SSA, forecasting and missing value imputation (note that the 2 are closely related with the latter being the more general problem incorporating forecasting as a special case) in a way sacrifice some of the model-free beauty and appeal of SSA by invoking certain model assumptions. On the other hand, this appears to be outweighed by the successful application of SSA forecasting in numerous areas [47-52]. Furthermore, Golyandina and Zhigljavsky [36] point out that for short-term forecasts, there is actually only very little use of the imposed model, that is, the series approximately satisfy some linear recurrent formulas.

Consider again a univariate time series = (*x*_{1}, ..., *x _{N}*) of length

*N*, a fixed embedding dimension

*M*(1 <

*M*< <

*N*) and the respective trajectory matrix Χ = [

*Χ*

_{1}, ...,

*Χ*with

_{K}*K*=

*N − M*+ 1 as in Equation (1). Let ℒ

_{r}⊂ ℝ

^{M}be a linear space of dimension

*r*<

*M*and

*U*

_{1}, ...,

*U*(ie, the eigenvectors obtained through the SVD of Χ) be an orthonormal basis in ℒ

_{r}_{r}. Furthermore, let , that is, is the orthogonal projection of the column vector

*Χ*of Χ onto ℒ

_{i}_{r}, and its Hankelized counterpart be the trajectory matrix corresponding to the time series . In addition, given a vector

*Y*= (

*y*

_{1}, ...,

*y*)

_{M}^{T}∈ ℝ

^{M}, let

*Y*

^{∇}∈ ℝ

^{M −1}denote the vector consisting of the first

*M*− 1 components of

*Y*and similarly

*Y*Δ ∈ ℝ

^{M −1}the vector consisting of the last

*M*−1 components of

*Y*. Set

*v*

^{2}= π

_{1}

^{2}+ ··· + π

_{r}^{2}where π

_{i}|

_{i}=1, ...,

*r*denotes the last component of the eigenvector

*U*. Assuming ℯ

_{i}_{M}= (0, 0, ..., 0, 1)

^{T}∉ ℒ

_{r}(ie, ℒ

_{r}not to be a vertical space),

*v*

^{2}represents the squared cosine of the angle between ℯ

_{M}and ℒ

_{r}and

*v*

^{2}< 1 holds. The last component

*y*of any vector

_{M}*Y*= (

*y*

_{1}, ...,

*y*)

_{M}^{T}∈ ℒ

_{r}can then be shown to be a linear combination of its first

*M*− 1 components

*y*

_{1}, ...,

*y*

_{M −1}(ie, of

*Y*

^{∇})

with vector *A* = (*a*_{1}, ..., *α*_{M −1})^{T} being

Note that the representation in Equation (15) does not depend on the choice of the basis *U*_{1}, ..., *U _{r}* in ℒ

_{r}.

With the required notation now having been introduced, the time series can be defined as

with being the reconstructed series as in basic SSA and *y*_{N +1}, ..., *y _{N+NF}* the recurrent SSA forecast of length

*N*.

_{F}Two examples of the automatic detection and correction of outliers using the method proposed in this study are shown in Figures 4 and 5 and compared with the benchmark approach of replacing the corrupted segment with an immediately preceding block of R-R data, referred to as block replacement [28]. Note how in both cases, while all outliers have been detected and corrected, there are ectopic-free segments of tachogram which have also been corrected because of false alarms.

#### Performance Evaluation

The performance and utility of the proposed method are evaluated using records that are publicly available through Physiobank [34]. More specifically, excerpts of all 18 records in the MIT-BIH NSRDB are used. The European Society of Cardiology/North American Society of Pacing and Electrophysiology Task Force guideline [4] recommends to perform the HRV analysis on ECG recordings of at least 5 minutes, and the said length can be considered as the accepted standard for the short-term HRV analysis. Accordingly, for each of the 18 records in MIT-BIH NSRDB excerpts corresponding to the first 5-minute period of NSR are collected. Note that the records in MIT-BIH NSRDB are cardiologist-annotated, that is, they have manually been checked for errors in beat detection and labeling. Thus, one can circumvent the step of QRS complex-detection and use the R-R intervals as provided (and validated by experts) by Physiobank directly. For all 18 records the first 5 minutes were free of ectopic beats and therefore selected. However, it will be pointed out that 4 records contained a couple of extreme outliers (most likely measurement errors), which were removed manually.

Specifically, the 4 affected records were records number 16,272, 18,177, 19,088, and 19,140, for which 2, 1, 2, and 1 data points were removed, respectively. Besides being readily discernible because they deviate by some order of magnitude from surrounding interbeat intervals, these outliers are also marked as artifacts in the respective annotation files.

Next, the effect of a PVC on the R-R interval was simulated by decreasing the respective entry to 2/3 of its actual value and by increasing the following entry to 4/3 of its value (Textbox 1). That is, for a PVC at say *n* = *τ*, one sets

which is consistent with the actual signature of a single PVC (see, eg, [53]) in an R-R series characterized by a premature beat (and accordingly a significantly smaller R-R interval) followed by a compensatory pause (longer R-R interval) followed by a return to baseline (NSR).

Extensive Monte Carlo simulations were then used to determine the performance characteristics of the proposed method, wherein a large number of artificial R-R series based on the 18 5-minute MIT-BIH NSRDB records were created by first randomly drawing an integer between 1 and 6 and then randomly selecting the position(s) within the series where the respective PVCs would be placed (with the constraint that PVCs be at least 5 samples apart).

### Results

With the proposed method consisting of 2 stages (detection and correction), a separate analysis of each stage appears appropriate.

#### Detection Step

In reporting results pertaining to the detection performance of the first stage, established metrics commonly used in the literature are relied upon the sensitivity (*Se*), specificity (*Sp*), and accuracy (*Acc*), which are defined as

and

with TP, FP, TN, and FN representing the number of true positives, false positives, true negatives, and false negatives, respectively. Intuitively, sensitivity quantifies the ability to correctly detect (actual) anomalies, specificity quantifies the proportion of nonabnormal segments that are correctly identified as such, and accuracy combines both of the aforementioned aspects. More formally, *Se* expresses the empirical statistical power, 1 – *Sp* the type I (false positive), and 1 – *Se* the type II (missed detection/false negative) error rate.

As reported previously [32], for an event occurring at a time instance *n* = *τ*, one allows for a certain detection delay Δ_{d} and considers a signal from the AC-SRC control chart as true positive if it falls in the interval [*τ*, *τ* + Δ_{d}]. All results presented in this study were obtained using Δ_{d} = Δ_{1} = *M*.

Note that the design and calibration of a detection algorithm involves an inherent trade-off between false alarms and missed detections and that furthermore reducing the false alarm rate usually entails an increase in the detection delay. For this specific use case, high sensitivity and short detection delays are arguably of the highest priority; for one wants a high likelihood of detecting all actual outliers (to reliably “clean” the R-R series, thereby transforming it into the respective N-N series) with a manageable (ie, small) detection delay to be able to narrow down the corrupted segments in the series as much as possible. The latter is particularly relevant because the proposed method substitutes the (allegedly) corrupted segment by an approximation of the outlier-free segment obtained using recurrent SSA forecasting; one aims to keep the forecasting horizon small to obtain acceptable results without having to tweak the SSA parameters extensively. This comes at the expense of higher false alarm rates, as is evidenced by the results presented in Tables 1-4.

As in the considered design, *j*_{max} is linked to the average sprint length *E* {*T _{n}*}, better agility is generally achieved with rather small

*j*

_{max}in a use case, as the one considered here, were AC-SRC monitors for large deviations [33,45].

While AC-SRC achieved satisfactory performance over a wide range of values *j*_{max} (results not reported here), upon examination of the results reported in Tables 1-4, which are based on 18,000 R-R series, *j*_{max}=6 was chosen to proceed to the second stage of the proposed method (ie, the correction of corrupted segments of the R-R series).

#### Correction Step

As mentioned earlier this design choice entails higher false alarm rates, as discernible in the 2 examples shown in Figures 4 and 5, which both exhibit actually uncorrupted segments that have been “corrected” by the proposed algorithm because of false alarms, thereby introducing some distortions. More importantly though, both examples also clearly show that all artificially inserted outliers because of PVCs were detected and appropriately corrected.

Record | RMSE_{SSA}^{a,b} | RMSE_{block} | RRMSE^{c} | Percent increase or decrease |

16,265 | 0.0225 | 0.0251 | 0.8964 | 10.36 |

16,272 | 0.0239 | 0.0354 | 0.6751 | 32,49 |

16,273 | 0.0136 | 0.0221 | 0.6154 | 38.46 |

16,420 | 0.0246 | 0.0326 | 0.7546 | 24.54 |

16,483 | 0.0162 | 0.0196 | 0.8265 | 17.35 |

16,539 | 0.0282 | 0.0342 | 0.8246 | 17.54 |

16,773 | 0.0299 | 0.0509 | 0.5874 | 41.26 |

16,786 | 0.0249 | 0.0366 | 0.6803 | 31.97 |

16,795 | 0.0323 | 0.0450 | 0.7178 | 28.22 |

17,052 | 0.0213 | 0.0414 | 0.5145 | 48.55 |

17,453 | 0.0256 | 0.0344 | 0.7442 | 25.58 |

18,177 | 0.0296 | 0.0357 | 0.8291 | 17.09 |

18,184 | 0.0298 | 0.0450 | 0.6622 | 33.78 |

19,088 | 0.0151 | 0.0198 | 0.7626 | 23.74 |

19,090 | 0.0187 | 0.0245 | 0.7633 | 23.67 |

19,093 | 0.0350 | 0.0439 | 0.7973 | 20.27 |

19,140 | 0.0123 | 0.0156 | 0.7885 | 21.15 |

19,830 | 0.0110 | 0.0125 | 0.8800 | 12.00 |

Average | 0.0230 | 0.0319 | 0.7210 | 27.9 |

^{a}RMSE: root mean square error.

^{b}SSA: singular spectrum analysis.

^{c}RRMSE: relative root mean square error.

Results pertaining to the second stage of the proposed method are reported in Table 5, wherein the root mean square error (RMSE) is used to quantify the discrepancy between the outlier-cleaned series and the actual (uncontaminated) series. Let be the uncontaminated N-N series (ie, the ground truth) and denote our allegedly cleaned R-R series.

Calculation of the RMSE is straightforward,

Furthermore, to better compare the RMSE of the proposed method to its competitor (the commonly used block replacement method is used here for benchmarking purposes), consider also the relative RMSE (RRMSE) given by

Note that if the RRMSE, as given in Equation (18) is <1, it implies that the novel approach presented in this paper outperforms the competing benchmark approach by 100·(1 − RRMSE). It shall be emphasized, however, that the block replacement method as it its applied here (to correct for outliers in the R-R series) uses the first stage of the proposed method as well, its performance should thus be seen in the context of being applied to pretty tight segments of the series that ought to be substituted, which is all because of the AC-SRC l-SSA-CPD algorithm of the first stage of the framework presented in this study.

That being said, the results reported in Table 5, which are based on 180,000 R-R series, show that the proposed method, which substitutes corrupted tachogram segments by an approximation of the respective outlier-free segment obtained by

means of recurrent SSA forecasting, clearly outperforms the competing block replacement method. In fact, it outperforms its competitor for all 18 records (on which the extensive simulations built upon), on average by almost 30%.

### Discussion

#### Background and Relevance

The HRV has long been known to allow for valuable insights into the intricate control mechanisms of the ANS and has a long track record of exponentially growing research interest and output. A crucial prerequisite for any HRV analysis is the exclusion of all artifacts and abnormalities, that is, a clean N-N series is required. Such a clean N-N series is virtually never available, mostly because of various interferences, beats not detected by the QRS complex-detector, and various highly prevalent heart rhythm disturbances.

Compared with the overall research effort directed toward the HRV, the issue of automatic R-R series (pre)processing has received rather little attention. In fact, manual editing still represents the gold standard. It should be noted that every automatic approach comes with risks and benefits; the former lie in the inherently nonzero type I and type II error rates, whereas the latter include, but are not limited to, considerable savings in both cost and time. Considering that error rates for manual approaches are nonzero as well and that they are most likely an increasing function of time (whereas in automatic approaches they are usually not), the choice of which one or which particular combination to use should always be made on an individual basis, considering all requirements and circumstances of the particular case. Accordingly, while the proposed method could be construed as a potential alternative to manual editing, in the light of the above considerations, this author would rather recommend it as a complementary method and or a preprocessing tool.

The main contribution of this work is the introduction of a general framework of low computational complexity that allows for an automatic near real-time detection and correction of outliers in R-R series based on the SSA. While related work pertaining to the use of the SSA in ECG and R-R data processing exists [54], to the best of the author’s knowledge, this study is the first to propose a general SSA-based framework that handles both detection and correction of (unwanted) anomalies in the R-R series.

#### Principal Findings

An extensive simulation study comprising 198,000 5-minute R-R series, obtained by randomly inserting varying amounts of simulated ectopic beats in records taken from the MIT-BIH NSRDB was conducted to assess the artifact detection and correction performance of the proposed algorithm.

It was shown that the proposed algorithm reliably detects outliers in the R-R series and achieved an overall sensitivity of 96.6%, specificity of 98.4%, and accuracy of 98.4%. Furthermore, it compares favorably in terms of discrepancies of the cleaned R-R series compared with the actual N-N series, outperforming a block replacement approach on average by almost 30%. It should also be emphasized that a suboptimal pragmatic approach was pursued, deliberately refraining from an optimization of the various (SSA and AC-SRC) tuning parameters, which would most likely have resulted in further performance improvements at the expense of simplicity and computational complexity.

Other important characteristics of the proposed method include the ability to operate in near real-time, the almost entirely model-free nature of the framework, and the low computational complexity. Moreover, it should be pointed out that the proposed method is not limited to the R-R series but can be applied broadly, as all of its components are deliberately kept as general as possible. This entails the additional benefit of not being limited to the removal of PVCs (although only this has extensively been investigated and tested so far) but rather being able to deal with all kinds of anomalies that may contaminate the observed series of interest.

#### Limitations and Future Research

This work has several limitations, some of which open (new) avenues for future research.

A major limitation is because of the lack of established and generally recognized and validated procedures other than manual editing, which transforms the performance comparison among competing methods in a nontrivial task. Second, to increase the readability and decrease the length, the paper fails to provide general recommendations and guidelines as to how various tuning parameters should be chosen. Finally, the SSA represents a very active research field, as in the last couple of years, and several promising advances and new developments have been reported, but thus far have not been considered in the research reported in this work. This, however, does not infringe on the relevance of this work, as the objective was to introduce a simple, general framework of low computational complexity and to investigate the use of SSA in both the detection and correction step.

Several of the abovementioned recent developments, however, appear to open promising avenues for future research. In particular, the author intends to expand and improve the work presented here by focusing on the issue of the SSA parameter selection and by considering latest developments, especially pertaining to SSA forecasting. For the latter, the interested reader is referred to recent publications by Hassani and Kalantari [55-58].

#### Acknowledgments

ML was supported by the “Excellence Initiative” of the German Federal and State Governments and the Graduate School of Excellence Computational Engineering at Technische Universität Darmstadt. The views expressed in this article are solely those of the author in his private capacity and do not necessarily reflect the views of Technische Universität Darmstadt or any other organization.

#### Conflicts of Interest

None declared.

#### References

- Billman GE. Heart rate variability - a historical perspective. Front Physiol 2011;2:86 [FREE Full text] [CrossRef] [Medline]
- Billman GE, Huikuri HV, Sacha J, Trimmel K. An introduction to heart rate variability: methodological considerations and clinical applications. Front Physiol 2015;6:55 [FREE Full text] [CrossRef] [Medline]
- Rajendra AU, Paul JK, Kannathal N, Lim CM, Suri JS. Heart rate variability: a review. Med Biol Eng Comput 2006 Dec;44(12):1031-1051. [CrossRef] [Medline]
- ESC/NASPE. Heart rate variability: standards of measurement, physiological interpretation and clinical use. Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology. Circulation 1996 Mar 01;93(5):1043-1065 [FREE Full text] [Medline]
- Huikuri HV, Mäkikallio T, Airaksinen KE, Mitrani R, Castellanos A, Myerburg RJ. Measurement of heart rate variability: a clinical tool or a research toy? J Am Coll Cardiol 1999 Dec;34(7):1878-1883 [FREE Full text] [Medline]
- Sassi R, Cerutti S, Lombardi F, Malik M, Huikuri HV, Peng C, et al. Advances in heart rate variability signal analysis: joint position statement by the e-Cardiology ESC Working Group and the European Heart Rhythm Association co-endorsed by the Asia Pacific Heart Rhythm Society. Europace 2015 Sep;17(9):1341-1353. [CrossRef] [Medline]
- Laske TG, Shrivastav M, Iaizzo PA. The cardiac conduction system. In: Iaizzo PA, editor. Handbook of cardiac anatomy, physiology, and devices, 3rd ed. Cham, Switzerland: Springer International Publishing; 2015:215-234.
- Iaizzo PA, Fitzgerald K. Autonomic nervous system. In: Iaizzo PA, editor. Handbook of cardiac anatomy, physiology, and devices, 3rd ed. Cham, Switzerland: Springer International Publishing; 2015:235-250.
- Karemaker JM. An introduction into autonomic nervous function. Physiol Meas 2017 May;38(5):R89-R118. [CrossRef] [Medline]
- Berntson GG, Bigger JT, Eckberg DL, Grossman P, Kaufmann PG, Malik M, et al. Heart rate variability: origins, methods, and interpretive caveats. Psychophysiology 1997 Nov;34(6):623-648. [Medline]
- Peltola MA. Role of editing of R-R intervals in the analysis of heart rate variability. Front Physiol 2012;3:148 [FREE Full text] [CrossRef] [Medline]
- Nabil D, Bereksi Reguig F. Ectopic beats detection and correction methods: A review. Biomedical Signal Processing and Control 2015 Apr;18:228-244. [CrossRef]
- Salo MA, Huikuri HV, Seppänen T. Ectopic beats in heart rate variability analysis: effects of editing on time and frequency domain measures. Ann Noninvasive Electrocardiol 2001 Jan;6(1):5-17. [Medline]
- Kim KK, Lim YG, Kim JS, Park KS. Effect of missing RR-interval data on heart rate variability analysis in the time domain. Physiol Meas 2007 Dec;28(12):1485-1494. [CrossRef] [Medline]
- Kim KK, Kim JS, Lim YG, Park KS. The effect of missing RR-interval data on heart rate variability analysis in the frequency domain. Physiol Meas 2009 Oct;30(10):1039-1050. [CrossRef] [Medline]
- Kim KK, Baek HJ, Lim YG, Park KS. Effect of missing RR-interval data on nonlinear heart rate variability analysis. Comput Methods Programs Biomed 2012 Jun;106(3):210-218. [CrossRef] [Medline]
- Lippman N, Stein KM, Lerman BB. Comparison of methods for removal of ectopy in measurement of heart rate variability. Am J Physiol 1994 Jul;267(1 Pt 2):H411-H418. [CrossRef] [Medline]
- Stapelberg NJC, Neumann DL, Shum DHK, McConnell H, Hamilton-Craig I. The sensitivity of 38 heart rate variability measures to the addition of artifact in human and artificial 24-hr cardiac recordings. Ann Noninvasive Electrocardiol 2018 Jan;23(1). [CrossRef] [Medline]
- Hingorani P, Karnad DR, Rohekar P, Kerkar V, Lokhandwala YY, Kothari S. Arrhythmias Seen in Baseline 24-Hour Holter ECG Recordings in Healthy Normal Volunteers During Phase 1 Clinical Trials. J Clin Pharmacol 2016 Dec;56(7):885-893. [CrossRef] [Medline]
- Manolio TA, Furberg CD, Rautaharju PM, Siscovick D, Newman AB, Borhani NO, et al. Cardiac arrhythmias on 24-h ambulatory electrocardiography in older women and men: the Cardiovascular Health Study. J Am Coll Cardiol 1994 Mar 15;23(4):916-925. [Medline]
- Simpson RJ, Cascio WE, Schreiner PJ, Crow RS, Rautaharju PM, Heiss G. Prevalence of premature ventricular contractions in a population of African American and white men and women: the Atherosclerosis Risk in Communities (ARIC) study. Am Heart J 2002 Mar;143(3):535-540. [Medline]
- Messineo FC. Ventricular ectopic activity: prevalence and risk. Am J Cardiol 1989 Dec 05;64(20):53J-56J. [Medline]
- Kostis JB, McCrone K, Moreyra AE, Gotzoyannis S, Aglitz NM, Natarajan N, et al. Premature ventricular complexes in the absence of identifiable heart disease. Circulation 1981 Jun;63(6):1351-1356. [Medline]
- Laborde S, Mosley E, Thayer JF. Heart Rate Variability and Cardiac Vagal Tone in Psychophysiological Research - Recommendations for Experiment Planning, Data Analysis, and Data Reporting. Front Psychol 2017;8:213 [FREE Full text] [CrossRef] [Medline]
- Kubios HRV. URL: http://www.kubios.com/ [accessed 2018-05-13] [WebCite Cache]
- Tarvainen MP, Niskanen J, Lipponen JA, Ranta-Aho PO, Karjalainen PA. Kubios HRV--heart rate variability analysis software. Comput Methods Programs Biomed 2014;113(1):210-220. [CrossRef] [Medline]
- nevrokard HRV. URL: http://www.nevrokard.eu/hrv.htm [accessed 2018-05-13] [WebCite Cache]
- Pichot V, Roche F, Celle S, Barthélémy J, Chouchou F. HRVanalysis: A Free Software for Analyzing Cardiac Autonomic Activity. Front Physiol 2016 Nov;7:557 [FREE Full text] [CrossRef] [Medline]
- Perakakis P, Joffily M, Taylor M, Guerra P, Vila J. KARDIA: a Matlab software for the analysis of cardiac interbeat intervals. Comput Methods Programs Biomed 2010 Apr;98(1):83-89. [CrossRef] [Medline]
- Kaufmann T, Sütterlin S, Schulz SM, Vögele C. ARTiiFACT: a tool for heart rate artifact processing and heart rate variability analysis. Behav Res Methods 2011 Dec;43(4):1161-1170. [CrossRef] [Medline]
- Clifford G, McSharry P, Tarassenko L. Characterizing artefact in the normal human 24-hour RR time series to aid identification and artificial replication of circadian variations in human beat to beat heart rate using a simple threshold. Computers in Cardiology 2002;29:129-132. [CrossRef]
- Lang M. A Low-Complexity Model-Free Approach for Real-Time Cardiac Anomaly Detection Based on Singular Spectrum Analysis and Nonparametric Control Charts. Technologies 2018 Feb 15;6(1):26. [CrossRef]
- Lang M, Zoubir AM. A nonparametric cumulative sum scheme based on sequential ranks and adaptive control limits. In: Proceedings of the 23rd European Signal Processing Conference (EUSIPCO). 2015 Presented at: 23rd European Signal Processing Conference (EUSIPCO); 2015 Aug 31; Nice, France p. 1984-1988. [CrossRef]
- Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 2000 Jun 13;101(23):E215-E220 [FREE Full text] [Medline]
- Golyandina N, Nekrutkin V, Zhigljavsky A. Analysis of Time Series Structure: SSA and Related Techniques. Boca Raton, FL, USA: Chapman and Hall/CRC; 2001.
- Golyandina N, Zhigljavsky A. Singular Spectrum Analysis for Time Series (SpringerBriefs in Statistics). Berlin/Heidelberg, Germany: Springer; 2013.
- Elsner JB, Tsonis AA. Singular Spectrum Analysis: A New Tool in Time Series Analysis. New York, NY, USA: Springer; 1996:a.
- Sanei S, Hassani H. Singular Spectrum Analysis of Biomedical Signals. Boca Raton, FL, USA: CRC Press; 2015.
- Moskvina V. Application of the singular spectrum analysis for change-point detection in time series. Ph.D. Thesis, Cardiff University: Cardiff, UK 2001.
- Moskvina V, Zhigljavsky A. An Algorithm Based on Singular Spectrum Analysis for Change-Point Detection. Communications in Statistics - Simulation and Computation 2003 Jan 06;32(2):319-352. [CrossRef]
- Hassani H, Mahmoudvand R, Zokaei M. Separability and window length in singular spectrum analysis. Comptes Rendus Mathematique 2011 Sep;349(17-18):987-990. [CrossRef]
- Atikur Rahman Khan M, Poskitt DS. A Note on Window Length Selection in Singular Spectrum Analysis. Aust. N. Z. J. Stat 2013 Jun 19;55(2):87-108. [CrossRef]
- Khan AR, Poskitt DS. Signal Identification in Singular Spectrum Analysis. Aust. N. Z. J. Stat 2016 Mar 23;58(1):71-98. [CrossRef]
- McDonald D. A cusum procedure based on sequential ranks. Naval Research Logistics 1990:37-646. [CrossRef]
- Chatterjee S, Qiu P. Distribution-free cumulative sum control charts using bootstrap-based control limits. Ann. Appl. Stat 2009 Mar;3(1):349-369. [CrossRef]
- Ghodsi M, Hassani H, Rahmani D, Silva ES. Vector and recurrent singular spectrum analysis: which is better at forecasting? Journal of Applied Statistics 2017 Nov 15;45(10):1872-1899. [CrossRef]
- Hassani H, Ghodsi Z, Gupta R, Segnon M. Forecasting Home Sales in the Four Census Regions and the Aggregate US Economy Using Singular Spectrum Analysis. Comput Econ 2015 Dec 17;49(1):83-97. [CrossRef]
- Silva ES, Hassani H. On the use of singular spectrum analysis for forecasting U.S. trade before, during and after the 2008 recession. International Economics 2015 May;141:34-49. [CrossRef]
- Hassani H, Heravi S, Brown G, Ayoubkhani D. Forecasting before, during, and after recession with singular spectrum analysis. Journal of Applied Statistics 2013 Oct;40(10):2290-2302. [CrossRef]
- Hassani H, Heravi S, Zhigljavsky A. Forecasting UK Industrial Production with Multivariate Singular Spectrum Analysis. J. Forecast 2012 Apr 10;32(5):395-408. [CrossRef]
- Miranian A, Abdollahzade M, Hassani H. Day-ahead electricity price analysis and forecasting by singular spectrum analysis. IET Generation Transmission & Distribution 2013;7(4):346.
- Hassani H, Thomakos D. A review on singular spectrum analysis for economic and financial time series. Statistics and Its Interface 2010;3(3):377-397. [CrossRef]
- Clifford GD, Tarassenko L. Quantifying errors in spectral estimates of HRV due to beat replacement and resampling. IEEE Trans Biomed Eng 2005 Apr;52(4):630-638. [CrossRef] [Medline]
- Thuraisingham RA. Use of SSA and MCSSA in the Analysis of Cardiac RR Time Series. Journal of Computational Medicine 2013;2013:1-7. [CrossRef]
- Kalantari M, Yarmohammadi M, Hassani H. Singular Spectrum Analysis Based on L1-Norm. Fluct. Noise Lett 2016 Mar;15(01):1650009. [CrossRef]
- Kalantari M, Yarmohammadi M, Hassani H, Silva ES. Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis. Fluct. Noise Lett 2018 Jun;17(02):1850017. [CrossRef]
- Hassani H, Kalantari M, Yarmohammadi M. An improved SSA forecasting result based on a filtered recurrent forecasting algorithm. Comptes Rendus Mathematique 2017 Sep;355(9):1026-1036. [CrossRef] [Medline]
- Hassani H, Kalantari M. A novel signal extraction approach for filtering and forecasting noisy exponential series. Comptes Rendus Mathematique 2018 May;356(5):563-570. [CrossRef] [Medline]

#### Abbreviations

AC-SRC: adaptive control limits sequential ranks cumulative sum |

ANS: autonomic nervous system |

ARL: average run length |

CUSUM: cumulative sum |

ECG: electrocardiography |

HR: heart rate |

HRV: heart rate variability |

l-SSA-CPD: lightweight SSA change-point detection |

NSR: normal sinus rhythm |

NSRB: Normal Sinus Rhythm Database |

PVC: premature ventricular complex |

RMSE: root mean square error |

RRMSE: relative root mean square error |

SA: sinoatrial |

SRC: sequential ranks cumulative sum |

SSA: singular spectrum analysis |

SVD: singular value decomposition |

Edited by G Eysenbach; submitted 09.04.18; peer-reviewed by H Hassani, IN Gomez, L Becker; comments to author 02.05.18; revised version received 15.05.18; accepted 09.06.18; published 30.01.19

#### Copyright

©Michael Lang. Originally published in JMIR Biomedical Engineering (http://biomedeng.jmir.org), 30.01.2019.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Biomedical Engineering, is properly cited. The complete bibliographic information, a link to the original publication on http://biomedeng.jmir.org/, as well as this copyright and license information must be included.