This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Biomedical Engineering, is properly cited. The complete bibliographic information, a link to the original publication on http://biomedeng.jmir.org/, as well as this copyright and license information must be included.

Heart rate variability (HRV) is derived from the series of R-R intervals extracted from an electrocardiographic (ECG) measurement. Ideally all components of the R-R series are the result of sinoatrial node depolarization. However, the actual R-R series are contaminated by outliers due to heart rhythm disturbances such as ectopic beats, which ought to be detected and corrected appropriately before HRV analysis.

We have introduced a novel, lightweight, and near real-time method to detect and correct anomalies in the R-R series based on the singular spectrum analysis (SSA). This study aimed to assess the performance of the proposed method in terms of (1) detection performance (sensitivity, specificity, and accuracy); (2) root mean square error (RMSE) between the actual N-N series and the approximated outlier-cleaned R-R series; and (3) how it benchmarks against a competitor in terms of the relative RMSE.

A lightweight SSA-based change-point detection procedure, improved through the use of a cumulative sum control chart with adaptive thresholds to reduce detection delays, monitored the series of R-R intervals in real time. Upon detection of an anomaly, the corrupted segment was substituted with the respective outlier-cleaned approximation obtained using recurrent SSA forecasting. Next, N-N intervals from a 5-minute ECG segment were extracted from each of the 18 records in the MIT-BIH Normal Sinus Rhythm Database. Then, for each such series, a number (randomly drawn integer between 1 and 6) of simulated ectopic beats were inserted at random positions within the series and results were averaged over 1000 Monte Carlo runs. Accordingly, 18,000 R-R records corresponding to 5-minute ECG segments were used to assess the detection performance whereas another 180,000 (10,000 for each record) were used to assess the error introduced in the correction step. Overall 198,000 R-R series were used in this study.

The proposed SSA-based algorithm reliably detected outliers in the R-R series and achieved an overall sensitivity of 96.6%, specificity of 98.4% and accuracy of 98.4%. Furthermore, it compared favorably in terms of discrepancies of the cleaned R-R series compared with the actual N-N series, outperforming an established correction method on average by almost 30%.

The proposed algorithm, which leverages the power and versatility of the SSA to both automatically detect and correct artifacts in the R-R series, provides an effective and efficient complementary method and a potential alternative to the current manual-editing gold standard. Other important characteristics of the proposed method include the ability to operate in near real-time, the almost entirely model-free nature of the framework which does not require historical training data, and its overall low computational complexity.

Oscillations in the time interval between successive heart beats, referred to as heart rate variability (HRV), have long been known to allow for insight into the intricate control mechanisms of the autonomic nervous system (ANS) [

In a nutshell, heart rhythm and rate can be said to be governed by the 2 competing divisions of the ANS, that is, the sympathetic nervous system and the parasympathetic nervous system. An increased sympathetic input to the sinoatrial (SA) node yields an increase in heart rate (HR) mediated by the release of adrenaline and noradrenaline and the subsequent activation of β-1-adrenoceptors, resulting in an accelerated diastolic depolarization. On the other hand, an increase in the parasympathetic outflow decreases the HR through the release of acetylcholine by the vagus nerve, which binds to and activates M2 muscarinic acetylcholine receptors in the SA node and eventually slows down the diastolic depolarization rate [

Various HRV measures have been established and are usually categorized as either time domain, frequency domain, or nonlinear analysis techniques [

It is crucial to realize that by virtue of its very definition, the HRV encompasses only oscillatory phenomena between heart beats resulting from the SA node depolarization [

The detrimental impact of ectopics on HRV measures is pronounced and well-documented [

Ectopic heart beats are ubiquitous phenomena and not limited to patients with cardiac disorders and diseases. Hingorani et al [

In summary, the detection and correction of ectopic beats in tachograms are not merely imperative for formal consistency with the HRV definition but arises from the impact of ectopics on HRV measures and the general prevalence of ectopics.

Compared with the overall research interest directed toward HRV applications, the crucial issue of the discrepancy between most real-world R-R series and their respective N-N series has arguably not received as much attention. The actual gold standard of manual R-R series editing, advocated for in the field’s relevant guideline by the European Society of Cardiology and the North American Society of Pacing and Electrophysiology Task Force [

It should be noted that, aiding and abetting the aforementioned exponential growth of the HRV-related (applied) research, a number of well-designed and widely used software packages for the HRV analysis have been developed and made available to the general public; these include, but are not limited to, Kubios [

In automated approaches the task of the detection and correction of outliers is usually tackled as a two-step process.

First, outliers in the R-R tachogram are detected, usually by straightforward thresholding based on a fixed percentage difference to previous intervals or their SD [

Second, the detected (regions of) artifacts from step 1 are either discarded (thereby reducing the effective length of the tachogram), substituted by preceding uncorrupted intervals or by intervals obtained through interpolation, with linear and cubic spline interpolation being the most popular techniques [

The objective of this study was to introduce a general framework of low computational complexity that allows for an automatic near real-time detection and correction of outliers in the R-R series based on the singular spectrum analysis (SSA). The main novel contributions of this work include (1) the application of a recently proposed model-free lightweight SSA change-point detection (l-SSA-CPD) algorithm [

An extensive simulation study comprising 198,000 5-minute long R-R series, obtained by randomly inserting varying amounts of simulated ectopic beats in records taken from the MIT-BIH Normal Sinus Rhythm Database (NSRDB), is conducted to assess the artifact detection and correction performance of the proposed algorithm.

Recently, this author proposed a low-complexity model-free approach based on SSA and nonparametric cumulative sum (CUSUM) control charts for real-time cardiac anomaly detection, referred to as l-SSA-CPD [

In this study, modifications to the original l-SSA-CPD algorithm will be introduced, and its capabilities will be expanded by adding a recurrent forecasting feature to generate appropriate substitutes of the corrupted tachogram entries. First, however, a brief restatement of SSA and l-SSA-CPD [

SSA is a powerful technique of time series analysis owing much of its appeal to an inherent model-free concept combining elements of conventional time series analysis, multivariate geometry and statistics, and signal processing [

Consider _{1},..., _{N}

1. Embedding _{1}, ..., _{N}^{M × K })

A trajectory matrix Χ is constructed by mapping _{j}_{j}_{j+M-1}^{T },

Χ is called a Hankel matrix because of its characteristic of having equal elements on the antidiagonals. One can think of Χ as multivariate data with _{j}^{M }.

2. Singular Value Decomposition of Χ

Taking the singular value decomposition (SVD) of Χ decomposes the trajectory matrix into its orthogonal bases and yields a set of ^{T } be λ_{1} ≥ ··· ≥ λ_{M } ≥ 0 and _{1},..., _{M}_{i }>0) denoting the rank of Χ, the SVD of Χ can be expressed as the sum of

with rank 1 matrices ^{T }Χ. Accordingly, the decomposition in Equation (2) is completely characterized by the set of so-called “eigentriples”

Note that owing to the symmetry of left and right singular vectors, the SVDs of Χ with lag

3. Eigentriple Grouping

Consider the task of extracting a particular component or signal of interest from observed data contaminated by various artifacts such as noise. To do so, during the third stage of the basic SSA, disjoint subsets of indices {1, ...,

In the example of aiming at the separation of a signal from unwanted noise and other disturbances, this entails determining an appropriate subset of indices _{1}, ..., _{l}^{M }, denoted as ℒ_{I } ⊂ ℝ^{M } = _{I}_{i1}_{il}_{1}, ..., _{d}_{Ī } ⊂ ℝ^{M } = _{Ī}

The trajectory matrix component Χ_{I } corresponding to the subset

and the component Χ_{Ī } corresponding to the subset _{1},..., _{d}

such that the overall SVD of Χ can be rewritten as

In the case of separability (see, eg, [_{I } to the entire observed signal Χ is represented by the respective share of eigenvalues

4. Diagonal Averaging

For perfectly separable signal components, all matrices in the expansion of Equation (5) are Hankel matrices, which require that they all have equal entries on their antidiagonals. In real-world problems, however, perfect separability is rarely achievable. While an approximate separability usually suffices, the final fourth step in the SSA algorithm is required, as the matrices Χ_{i } from step 3 would have unequal entries on their antidiagonals. Thus, the Hankelization of all Χ_{i } is performed, that is, the compliance with the Hankel structure is enforced by taking the average of each antidiagonal and then replacing each element of that antidiagonal with the determined average. This yields

with

One can then, for example, easily reconstruct the approximation of the signal of interest through the eigentriples with indices

The sequential application of SSA for change detection is based on Moskvina and Zhigljavsky [

The algorithm by Moskvina and Zhigljavsky has been successfully applied to various change detection problems and some improved variations of it have been presented as well (see [

The behavior of SSA is largely determined by 2 tuning parameters, namely the window length

For a series of length

Similarly, it is well known that selecting a proper

In addition to the above-referred monographs, interested readers are also referred to some fairly recent contributions by Hassani et al [

In the light of the pragmatic suboptimal approach pursued here, a detailed discussion of the SSA parameter selection is, however, beyond the scope of this study and therefore omitted.

The proposed framework consisted of 2 separate stages, namely the detection and the correction of outlier corrupted segments of the tachogram.

Let

1. Initialization at

Akin to MZ, the first 3 steps of the basic SSA are applied on the interval [_{I } = ℒ_{I }^{(n=0)} which captures the main structure of the series. This involves embedding as in Equation (1) to obtain a trajectory matrix, which, hereafter, we refer to as the base matrix Χ_{B } = Χ_{B }^{(0)} = Χ_{B }^{(n=0)}, taking the SVD of Χ_{B } and obtaining ℒ_{I } through an appropriate choice of _{1},...,_{l}_{i } > 0).

2. Then, for each

Construction of a test matrix Χ_{T }^{(n)} on the interval [

Computation of the detection statistic _{n,I,p,q}

with

and with the angle ∠ (Χ_{T }^{(n)}, ℒ_{I }) taking values in [0, π / 2] and accordingly _{j}^{(n)} = [_{n+j}_{n+j+M −1}]^{T } are the column vectors of Χ_{T }^{(n)} whereas U_{I } = [_{i1}, ..., _{il}_{I } and º denotes the Hadamard (or element-wise) product. Note that throughout this study, _{T }^{(n)} used always equals 1. In other words, each test matrix is actually a single column vector containing

Monitoring of _{n,I,p,q}

To reduce detection delays, instead of the sequential ranks CUSUM (SRC) by McDonald [

Let the sequential rank of _{n,I,p,q}

The AC-SRC then starts with the same recursive equation as the SRC, ie,

with _{0} = 0 and _{n}_{n}

Furthermore, let _{j}

Chatterjee and Qiu showed that the conditional distributions of _{n}_{n}_{n}_{max} ≤ _{n}

with _{n}_{n}_{max} and

It can be shown that [_{n}_{max} and

with _{AC-SRC} wherein the quantities _{n}

The AC-SRC signals a change to have occurred if

or if

To limit the computational burden, it is recommended to calculate control limits only up to a reasonably small _{max} after which the control limit is kept fixed at _{AC-SRCjmax}. In addition, following recommendations by Chatterjee and Qiu, the reference constant _{n}_{0}=500. A detailed discussion as to the empirical calibration of control charts would, however, be beyond the scope of this work [_{max} decreases _{n}_{n}_{max} to allow for agile change detection is recommended (see

Finally, as with the SRC in the conventional l-SSA-CPD algorithm, the control chart is reinitialized after a change has been detected (by setting _{n}_{n}

Monitoring of the l-SSA-CPD test statistic _{n,I,p,q}_{max}=8, _{n}

Once an outlier has been detected at say _{1}, τ], is designated as corrupted tachogram segment. Note that Δ_{1} depends on the (average) detection delay and should be chosen carefully. However, it need not be exact or otherwise optimal. Then, a second interval of length Δ_{2} immediately preceding the corrupted one is used to obtain an anomaly-free forecast to substitute the corrupted segment with.

The rationale of the proposed framework is depicted in

As the proposed method operates sequentially as new observations arrive, an appropriate uncontaminated data segment (either uncorrupted to begin with or previously cleaned) to serve as a basis for forecasting is always available. While it may be beneficial to use large values for Δ_{2} (assuming sufficient past observations were available), keeping Δ_{1}, Δ_{2} as small as possible is recommended; as doing so allows for near real-time operation and significantly reduces the computational burden. For the purpose of this study, simplicity was prioritized over (more often than not unnecessary) optimality.

Detection performance of adaptive control limit sequential ranks with _{max }=6 and lightweight singular spectrum analysis change-point detection with

Record | Sensitivity | Specificity | Accuracy |

16,265 | 0.9712 | 0.9827 | 0.9825 |

16,272 | 0.9415 | 0.9802 | 0.9794 |

16,273 | 0.9905 | 0.9910 | 0.9909 |

16,420 | 0.9771 | 0.9778 | 0.9777 |

16,483 | 0.9953 | 0.9846 | 0.9846 |

16,539 | 0.9326 | 0.9878 | 0.9870 |

16,773 | 0.9692 | 0.9832 | 0.9829 |

16,786 | 0.9722 | 0.9758 | 0.9755 |

16,795 | 0.8800 | 0.9922 | 0.9905 |

17,052 | 0.9521 | 0.9830 | 0.9825 |

17,453 | 0.9768 | 0.9904 | 0.9902 |

18,177 | 0.9549 | 0.9757 | 0.9754 |

18,184 | 0.9719 | 0.9805 | 0.9803 |

19,088 | 0.9711 | 0.9917 | 0.9914 |

19,090 | 0.9900 | 0.9745 | 0.9745 |

19,093 | 0.9744 | 0.9837 | 0.9834 |

19,140 | 0.9689 | 0.9920 | 0.9917 |

19,830 | 0.9948 | 0.9909 | 0.9909 |

Detection performance of adaptive control limit sequential ranks with _{max }=8 and lightweight singular spectrum analysis change-point detection with

Record | Sensitivity | Specificity | Accuracy |

16,265 | 0.9335 | 0.9860 | 0.9854 |

16,272 | 0.8595 | 0.9850 | 0.9828 |

16,273 | 0.9199 | 0.9922 | 0.9913 |

16,420 | 0.9141 | 0.9810 | 0.9803 |

16,483 | 0.9569 | 0.9893 | 0.9888 |

16,539 | 0.7494 | 0.9889 | 0.9863 |

16,773 | 0.8781 | 0.9858 | 0.9844 |

16,786 | 0.9038 | 0.9779 | 0.9767 |

16,795 | 0.7210 | 0.9936 | 0.9897 |

17,052 | 0.8259 | 0.9851 | 0.9829 |

17,453 | 0.9013 | 0.9933 | 0.9921 |

18,177 | 0.8101 | 0.9785 | 0.9771 |

18,184 | 0.8508 | 0.9833 | 0.9819 |

19,088 | 0.8667 | 0.9924 | 0.9910 |

19,090 | 0.9403 | 0.9793 | 0.9789 |

19,093 | 0.8920 | 0.9869 | 0.9854 |

19,140 | 0.8796 | 0.9946 | 0.9933 |

19,830 | 0.9510 | 0.9945 | 0.9941 |

Detection performance of adaptive control limit sequential ranks with _{max }=12 and lightweight singular spectrum analysis change-point detection with

Record | Sensitivity | Specificity | Accuracy |

16,265 | 0.8816 | 0.9875 | 0.9864 |

16,272 | 0.8240 | 0.9871 | 0.9844 |

16,273 | 0.8710 | 0.9940 | 0.9925 |

16,420 | 0.8741 | 0.9850 | 0.9839 |

16,483 | 0.8990 | 0.9900 | 0.9889 |

16,539 | 0.7218 | 0.9908 | 0.9879 |

16,773 | 0.8202 | 0.9871 | 0.9850 |

16,786 | 0.8421 | 0.9814 | 0.9794 |

16,795 | 0.6619 | 0.9943 | 0.9898 |

17,052 | 0.7753 | 0.9893 | 0.9865 |

17,453 | 0.8450 | 0.9939 | 0.9921 |

18,177 | 0.7539 | 0.9824 | 0.9805 |

18,184 | 0.7952 | 0.9853 | 0.9832 |

19,088 | 0.8030 | 0.9930 | 0.9909 |

19,090 | 0.8869 | 0.9824 | 0.9814 |

19,093 | 0.8379 | 0.9892 | 0.9869 |

19,140 | 0.8004 | 0.9951 | 0.9931 |

19,830 | 0.9063 | 0.9948 | 0.9939 |

Detection performance of adaptive control limit sequential ranks with _{max }=16 and lightweight singular spectrum analysis change-point detection with

Record | Sensitivity | Specificity | Accuracy |

16,265 | 0.8623 | 0.9878 | 0.9866 |

16,272 | 0.7947 | 0.9885 | 0.9854 |

16,273 | 0.8774 | 0.9942 | 0.9929 |

16,420 | 0.8585 | 0.9851 | 0.9838 |

16,483 | 0.8950 | 0.9904 | 0.9893 |

16,539 | 0.7260 | 0.9920 | 0.9890 |

16,773 | 0.7951 | 0.9880 | 0.9857 |

16,786 | 0.8022 | 0.9814 | 0.9790 |

16,795 | 0.6913 | 0.9941 | 0.9900 |

17,052 | 0.7866 | 0.9905 | 0.9881 |

17,453 | 0.8176 | 0.9951 | 0.9932 |

18,177 | 0.7644 | 0.9841 | 0.9823 |

18,184 | 0.7846 | 0.9866 | 0.9847 |

19,088 | 0.8119 | 0.9936 | 0.9917 |

19,090 | 0.8733 | 0.9837 | 0.9827 |

19,093 | 0.8299 | 0.9898 | 0.9875 |

19,140 | 0.7949 | 0.9953 | 0.9934 |

19,830 | 0.8979 | 0.9952 | 0.9944 |

Adaptive control limits sequential ranks cumulative sum (AC-SRC) monitoring with _{max}=8, _{n}_{0}=500 of _{n,I,p,q}

Adaptive control limits sequential ranks cumulative sum (AC-SRC) monitoring with _{max}=8, _{n}_{0}=500 of _{n,I,p,q}

Schematic depiction of the proposed R-R series outlier detection and correction framework. Adaptive control limits sequential ranks cumulative sum with low detection delay is used to monitor lightweight singular spectrum analysis (SSA) change-point detection’s test statistic; when it signals (red vertical bar), a segment of length Δ1 immediately preceding the detection is labeled as being corrupted (yellow rectangle) and a larger, yet still reasonably small segment of length Δ2 (green rectangle) is used to construct a trajectory matrix for the recurrent SSA forecasting which eventually yields a short-term SSA forecast of length Δ1 which the corrupted segment is then substituted with.

Time series forecasting plays a crucial role in many scientific fields and arguably also represents one of the most popular (real-world) application areas of the SSA. Note that there exist different approaches for SSA forecasting [

Consider again a univariate time series _{1}, ..., _{N}_{1}, ..., _{K}_{r } ⊂ ℝ^{M } be a linear space of dimension _{1}, ..., _{r}_{r }. Furthermore, let _{i}_{r }, and its Hankelized counterpart _{1}, ..., _{M}^{T } ∈ ℝ^{M }, let ^{∇} ∈ ℝ^{M −1} denote the vector consisting of the first ^{M −1} the vector consisting of the last ^{2} = π_{1}^{2} + ··· + π_{r}^{2} where π_{i }|_{i }=1, ..., _{i}_{M } = (0, 0, ..., 0, 1)^{T } ∉ ℒ_{r } (ie, ℒ_{r } not to be a vertical space), ^{2} represents the squared cosine of the angle between ℯ_{M } and ℒ_{r } and ^{2} < 1 holds. The last component _{M}_{1}, ..., _{M}^{T } ∈ ℒ_{r } can then be shown to be a linear combination of its first _{1}, ..., _{M −1} (ie, of ^{∇})

with vector _{1}, ..., _{M −1})^{T } being

Note that the representation in Equation (15) does not depend on the choice of the basis _{1}, ..., _{r}_{r }.

With the required notation now having been introduced, the time series

with _{N +1}, ..., _{N+NF}_{F}

Two examples of the automatic detection and correction of outliers using the method proposed in this study are shown in

Ectopic beats detected on a 5-minute tachogram excerpt of record 19093 MIT-BIH Normal Sinus Rhythm Database (NSRDB) with 6 artificially inserted ectopics using lightweight SSA change-point detection with _{max}=6, _{n}_{0}=500 are corrected by means of the proposed method with Δ_{1}=_{2}=_{1} with the immediately preceding block of equal length (bottom).

Ectopic beats detected on a 5-minute tachogram excerpt of record 16265 MIT-BIH Normal Sinus Rhythm Database (NSRDB) with 5 artificially inserted ectopics using lightweight SSA change-point detection with N=20, M=10, Q=1, 75% of leading eigentriples and adaptive control limits sequential ranks cumulative sum with jmax=6, E{Tn}=4, ARL0=500 are corrected by means of the proposed method with Δ1=M, Δ2=N (top) or by replacing the corrupted segment of length Δ1 with the immediately preceding block of equal length (bottom).

The performance and utility of the proposed method are evaluated using records that are publicly available through Physiobank [

RR(tau)=RR(tau)-RR(tau)/3;

RR(tau+1)=RR(tau+1)+RR(tau+1)/3;

Specifically, the 4 affected records were records number 16,272, 18,177, 19,088, and 19,140, for which 2, 1, 2, and 1 data points were removed, respectively. Besides being readily discernible because they deviate by some order of magnitude from surrounding interbeat intervals, these outliers are also marked as artifacts in the respective annotation files.

Next, the effect of a PVC on the R-R interval was simulated by decreasing the respective entry to 2/3 of its actual value and by increasing the following entry to 4/3 of its value (

which is consistent with the actual signature of a single PVC (see, eg, [

Extensive Monte Carlo simulations were then used to determine the performance characteristics of the proposed method, wherein a large number of artificial R-R series based on the 18 5-minute MIT-BIH NSRDB records were created by first randomly drawing an integer between 1 and 6 and then randomly selecting the position(s) within the series where the respective PVCs would be placed (with the constraint that PVCs be at least 5 samples apart).

With the proposed method consisting of 2 stages (detection and correction), a separate analysis of each stage appears appropriate.

In reporting results pertaining to the detection performance of the first stage, established metrics commonly used in the literature are relied upon the sensitivity (

and

with TP, FP, TN, and FN representing the number of true positives, false positives, true negatives, and false negatives, respectively. Intuitively, sensitivity quantifies the ability to correctly detect (actual) anomalies, specificity quantifies the proportion of nonabnormal segments that are correctly identified as such, and accuracy combines both of the aforementioned aspects. More formally,

As reported previously [_{d } and considers a signal from the AC-SRC control chart as true positive if it falls in the interval [_{d }]. All results presented in this study were obtained using Δ_{d } = Δ_{1} =

Note that the design and calibration of a detection algorithm involves an inherent trade-off between false alarms and missed detections and that furthermore reducing the false alarm rate usually entails an increase in the detection delay. For this specific use case, high sensitivity and short detection delays are arguably of the highest priority; for one wants a high likelihood of detecting all actual outliers (to reliably “clean” the R-R series, thereby transforming it into the respective N-N series) with a manageable (ie, small) detection delay to be able to narrow down the corrupted segments in the series as much as possible. The latter is particularly relevant because the proposed method substitutes the (allegedly) corrupted segment by an approximation of the outlier-free segment obtained using recurrent SSA forecasting; one aims to keep the forecasting horizon small to obtain acceptable results without having to tweak the SSA parameters extensively. This comes at the expense of higher false alarm rates, as is evidenced by the results presented in

As in the considered design, _{max} is linked to the average sprint length _{n}_{max} in a use case, as the one considered here, were AC-SRC monitors for large deviations [

While AC-SRC achieved satisfactory performance over a wide range of values _{max} (results not reported here), upon examination of the results reported in _{max}=6 was chosen to proceed to the second stage of the proposed method (ie, the correction of corrupted segments of the R-R series).

As mentioned earlier this design choice entails higher false alarm rates, as discernible in the 2 examples shown in

Average root mean square errors of artificially contaminated R-R series.

Record | RMSE_{SSA}^{a,b} |
RMSE_{block} |
RRMSE^{c} |
Percent increase or decrease |

16,265 | 0.0225 | 0.0251 | 0.8964 | 10.36 |

16,272 | 0.0239 | 0.0354 | 0.6751 | 32,49 |

16,273 | 0.0136 | 0.0221 | 0.6154 | 38.46 |

16,420 | 0.0246 | 0.0326 | 0.7546 | 24.54 |

16,483 | 0.0162 | 0.0196 | 0.8265 | 17.35 |

16,539 | 0.0282 | 0.0342 | 0.8246 | 17.54 |

16,773 | 0.0299 | 0.0509 | 0.5874 | 41.26 |

16,786 | 0.0249 | 0.0366 | 0.6803 | 31.97 |

16,795 | 0.0323 | 0.0450 | 0.7178 | 28.22 |

17,052 | 0.0213 | 0.0414 | 0.5145 | 48.55 |

17,453 | 0.0256 | 0.0344 | 0.7442 | 25.58 |

18,177 | 0.0296 | 0.0357 | 0.8291 | 17.09 |

18,184 | 0.0298 | 0.0450 | 0.6622 | 33.78 |

19,088 | 0.0151 | 0.0198 | 0.7626 | 23.74 |

19,090 | 0.0187 | 0.0245 | 0.7633 | 23.67 |

19,093 | 0.0350 | 0.0439 | 0.7973 | 20.27 |

19,140 | 0.0123 | 0.0156 | 0.7885 | 21.15 |

19,830 | 0.0110 | 0.0125 | 0.8800 | 12.00 |

Average | 0.0230 | 0.0319 | 0.7210 | 27.9 |

^{a}RMSE: root mean square error.

^{b}SSA: singular spectrum analysis.

^{c}RRMSE: relative root mean square error.

Results pertaining to the second stage of the proposed method are reported in

Calculation of the RMSE is straightforward,

Furthermore, to better compare the RMSE of the proposed method to its competitor (the commonly used block replacement method is used here for benchmarking purposes), consider also the relative RMSE (RRMSE) given by

Note that if the RRMSE, as given in Equation (18) is <1, it implies that the novel approach presented in this paper outperforms the competing benchmark approach by 100·(1 − RRMSE). It shall be emphasized, however, that the block replacement method as it its applied here (to correct for outliers in the R-R series) uses the first stage of the proposed method as well, its performance should thus be seen in the context of being applied to pretty tight segments of the series that ought to be substituted, which is all because of the AC-SRC l-SSA-CPD algorithm of the first stage of the framework presented in this study.

That being said, the results reported in

means of recurrent SSA forecasting, clearly outperforms the competing block replacement method. In fact, it outperforms its competitor for all 18 records (on which the extensive simulations built upon), on average by almost 30%.

The HRV has long been known to allow for valuable insights into the intricate control mechanisms of the ANS and has a long track record of exponentially growing research interest and output. A crucial prerequisite for any HRV analysis is the exclusion of all artifacts and abnormalities, that is, a clean N-N series is required. Such a clean N-N series is virtually never available, mostly because of various interferences, beats not detected by the QRS complex-detector, and various highly prevalent heart rhythm disturbances.

Compared with the overall research effort directed toward the HRV, the issue of automatic R-R series (pre)processing has received rather little attention. In fact, manual editing still represents the gold standard. It should be noted that every automatic approach comes with risks and benefits; the former lie in the inherently nonzero type I and type II error rates, whereas the latter include, but are not limited to, considerable savings in both cost and time. Considering that error rates for manual approaches are nonzero as well and that they are most likely an increasing function of time (whereas in automatic approaches they are usually not), the choice of which one or which particular combination to use should always be made on an individual basis, considering all requirements and circumstances of the particular case. Accordingly, while the proposed method could be construed as a potential alternative to manual editing, in the light of the above considerations, this author would rather recommend it as a complementary method and or a preprocessing tool.

The main contribution of this work is the introduction of a general framework of low computational complexity that allows for an automatic near real-time detection and correction of outliers in R-R series based on the SSA. While related work pertaining to the use of the SSA in ECG and R-R data processing exists [

An extensive simulation study comprising 198,000 5-minute R-R series, obtained by randomly inserting varying amounts of simulated ectopic beats in records taken from the MIT-BIH NSRDB was conducted to assess the artifact detection and correction performance of the proposed algorithm.

It was shown that the proposed algorithm reliably detects outliers in the R-R series and achieved an overall sensitivity of 96.6%, specificity of 98.4%, and accuracy of 98.4%. Furthermore, it compares favorably in terms of discrepancies of the cleaned R-R series compared with the actual N-N series, outperforming a block replacement approach on average by almost 30%. It should also be emphasized that a suboptimal pragmatic approach was pursued, deliberately refraining from an optimization of the various (SSA and AC-SRC) tuning parameters, which would most likely have resulted in further performance improvements at the expense of simplicity and computational complexity.

Other important characteristics of the proposed method include the ability to operate in near real-time, the almost entirely model-free nature of the framework, and the low computational complexity. Moreover, it should be pointed out that the proposed method is not limited to the R-R series but can be applied broadly, as all of its components are deliberately kept as general as possible. This entails the additional benefit of not being limited to the removal of PVCs (although only this has extensively been investigated and tested so far) but rather being able to deal with all kinds of anomalies that may contaminate the observed series of interest.

This work has several limitations, some of which open (new) avenues for future research.

A major limitation is because of the lack of established and generally recognized and validated procedures other than manual editing, which transforms the performance comparison among competing methods in a nontrivial task. Second, to increase the readability and decrease the length, the paper fails to provide general recommendations and guidelines as to how various tuning parameters should be chosen. Finally, the SSA represents a very active research field, as in the last couple of years, and several promising advances and new developments have been reported, but thus far have not been considered in the research reported in this work. This, however, does not infringe on the relevance of this work, as the objective was to introduce a simple, general framework of low computational complexity and to investigate the use of SSA in both the detection and correction step.

Several of the abovementioned recent developments, however, appear to open promising avenues for future research. In particular, the author intends to expand and improve the work presented here by focusing on the issue of the SSA parameter selection and by considering latest developments, especially pertaining to SSA forecasting. For the latter, the interested reader is referred to recent publications by Hassani and Kalantari [

adaptive control limits sequential ranks cumulative sum

autonomic nervous system

average run length

cumulative sum

electrocardiography

heart rate

heart rate variability

lightweight SSA change-point detection

normal sinus rhythm

Normal Sinus Rhythm Database

premature ventricular complex

root mean square error

relative root mean square error

sinoatrial

sequential ranks cumulative sum

singular spectrum analysis

singular value decomposition

ML was supported by the “Excellence Initiative” of the German Federal and State Governments and the Graduate School of Excellence Computational Engineering at Technische Universität Darmstadt. The views expressed in this article are solely those of the author in his private capacity and do not necessarily reflect the views of Technische Universität Darmstadt or any other organization.

None declared.