This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Biomedical Engineering, is properly cited. The complete bibliographic information, a link to the original publication on http://biomedeng.jmir.org/, as well as this copyright and license information must be included.
Heart rate variability (HRV) is used to assess cardiac health and autonomic nervous system capabilities. With the growing popularity of commercially available wearable technologies, the opportunity to unobtrusively measure HRV via photoplethysmography (PPG) is an attractive alternative to electrocardiogram (ECG), which serves as the gold standard. PPG measures blood flow within the vasculature using color intensity. However, PPG does not directly measure HRV; it measures pulse rate variability (PRV). Previous studies comparing consumer-grade PRV with HRV have demonstrated mixed results in short durations of activity under controlled conditions. Further research is required to determine the efficacy of PRV to estimate HRV under free-living conditions.
This study aims to compare PRV estimates obtained from a consumer-grade PPG sensor with HRV measurements from a portable ECG during unsupervised free-living conditions, including sleep, and examine factors influencing estimation, including measurement conditions and simple editing methods to limit motion artifacts.
A total of 10 healthy adults were recruited. Data from a Microsoft Band 2 and a Shimmer3 ECG unit were recorded simultaneously using a smartphone. Participants wore the devices for >90 min during typical day-to-day activities and while sleeping. After filtering, ECG data were processed using a combination of discrete wavelet transforms and peak-finding methods to identify R-R intervals. P-P intervals were edited by deletion, using methods based on outlier detection and by removing sections affected by motion artifacts. Common HRV metrics were compared, including mean N-N, SD of N-N intervals, percentage of subsequent differences >50 ms (pNN50), root mean square of successive differences, low-frequency power (LF), and high-frequency power. Validity was assessed using root mean square error (RMSE) and Pearson correlation coefficient (R2).
Data sets for 10 days and 9 corresponding nights were acquired. The mean RMSE was 182 ms (SD 48) during the day and 158 ms (SD 67) at night.
Owing to overall poor concurrent validity and inconsistency among participant data, PRV was found to be a poor surrogate for HRV under free-living conditions. These findings suggest that free-living HRV measurements would benefit from examining alternate sensing methods, such as multiwavelength PPG and wearable ECG.
With the growing ubiquity of commercially available wearable technologies, obtaining long-term physiological measurements under free-living conditions is feasible and permits longitudinal examination of ecologically valid patterns. This presents an opportunity for continuous patient monitoring under free-living conditions, including the potential to identify at-risk individuals (eg, patients with cardiac disease). Heart rate variability (HRV) is a well-established, powerful metric used to assess cardiac health, including autonomic nervous system function regulating cardiac activity. Compared with an individual’s heart rate (HR) averaged over a short period, HRV measures variations in HR primarily as an indicator of the efforts of the sympathetic and parasympathetic nervous systems to achieve an optimal cardiac response under constantly changing stimuli [
The criterion (gold) standard for measuring HRV is the electrocardiogram (ECG), which provides a direct recording of cardiac electrical activity. On ECG, the R wave represents the maximum upward deflection of a normal QRS complex. The duration between two successive R waves defines the R-R interval [
PPG sensors measure changes in pulsatile blood flow within an individual’s vasculature using color intensity signals [
The accuracy of PRV as a measure of HRV has been investigated with clinical devices under controlled, and often stationary, conditions [
PPG sensors have been found to be sensitive to motion artifacts, changes in blood flow caused by movement, compression and deformation of the vasculature arising from pressure disturbances at the interface between the sensor and the skin [
Although HR and PR are correlated and closely related, the use of PRV to estimate HRV requires further research, especially under free-living conditions. In this study, the concurrent validity of PRV measurements from a consumer-facing PPG sensor is compared with HRV measurements from a portable ECG under 2 unsupervised conditions of up to 4.5 hours each: (1) while engaging in regular activities of daily living and (2) during sleep. A secondary goal of this study is to examine factors influencing estimation errors of PRV for HRV, including motion artifacts, measurement conditions, and editing approaches.
A convenience sample of healthy individuals aged 18-65 years was recruited for the study. Individuals with a history of cardiac and/or sleep disorders were excluded to minimize the collection of irregular cardiac signals. Under these conditions, approval for this study was granted by the University of Waterloo Research Ethics Committee on September 5, 2017, filed under protocol #31197.
A total of 2 wearable devices were used to acquire cardiovascular signals in this study: (1) a commercially available optical PPG wearable device (Microsoft Band 2 or MB2, Microsoft) and (2) a research-grade wearable ECG device (Shimmer3 ECG, Shimmer). Signals from both wearables were recorded simultaneously and transmitted via Bluetooth to a smartphone (Pixel or Nexus 3, Google). To enable synchronization, triaxial accelerations were also recorded with both devices. Participants were asked to wear the devices twice, for at least 90 min each, once during daily activities and a second time while sleeping.
To record ECG, hydrogel electrodes (Kendall 233 Hydrogel, Covidien) were placed in a 4-lead bipolar limb lead configuration (ie, left arm [LA], right arm [RA], left leg [LL], right leg) on the participant’s chest as shown in
Four-lead bipolar limb electrocardiogram configuration on participants’ chests. LA: left arm; RA: right arm; V: precordial leads.
Given the free-living nature of data collection, participants were instructed on how to set up and monitor device connection and logging status to facilitate troubleshooting. To ensure proper electrode placement, a trained researcher placed the electrodes in the 4-lead bipolar limb lead configuration (
Following data collection, all postprocessing and statistical analyses were conducted using MATLAB 2018a (MathWorks).
Postprocessing of data from the Shimmer and Microsoft Band. ECG: electrocardiogram; HRV: heart rate variability; P-P: time between 2 P peaks in a photoplethysmogram or peak-to-peak intervals; PRV: pulse rate variability; R-R: time between 2 R peaks in an ECG.
Shimmer3 and MB2 were coarsely synchronized by aligning triaxial acceleration peaks from tapping both devices simultaneously on a table. Each device was tapped 3 times in 2 orientations with 10 s of rest between orientations. Fine synchronization was performed using a cross-correlation method described below (cross-correlation synchronization).
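The coarse tap-based alignment can be sketched as follows. The study's processing was done in MATLAB; this Python version is only a minimal illustration under stated assumptions: each device's acceleration is assumed to be already reduced to (timestamp, magnitude) samples in m/s², and the helper names, the 15 m/s² detection threshold, and the 0.5 s refractory window are hypothetical choices rather than the authors' parameters.

```python
def tap_onsets(times, mags, threshold, refractory=0.5):
    """Timestamps where acceleration magnitude first exceeds threshold.

    Samples within `refractory` seconds of the previous onset are treated as
    part of the same tap and ignored (hypothetical rule).
    """
    onsets = []
    for t, m in zip(times, mags):
        if m > threshold and (not onsets or t - onsets[-1] >= refractory):
            onsets.append(t)
    return onsets


def coarse_offset(times_a, mags_a, times_b, mags_b, threshold=15.0):
    """Mean offset (s) to add to device B timestamps to align its taps with device A.

    Taps are paired in order of occurrence; extra taps on either device are ignored.
    The default threshold assumes m/s^2 units with ~9.8 m/s^2 at rest.
    """
    taps_a = tap_onsets(times_a, mags_a, threshold)
    taps_b = tap_onsets(times_b, mags_b, threshold)
    n = min(len(taps_a), len(taps_b))
    if n == 0:
        raise ValueError("no taps detected on one or both devices")
    return sum(a - b for a, b in zip(taps_a[:n], taps_b[:n])) / n


# Device A records taps at 2.0 s and 3.0 s; device B (own clock) at 5.5 s and 6.5 s.
offset = coarse_offset(
    [0.0, 1.0, 2.0, 2.1, 3.0, 4.0], [9.8, 9.8, 20.0, 18.0, 19.0, 9.8],
    [4.5, 5.5, 6.5, 7.0], [9.8, 21.0, 22.0, 9.8],
)
print(offset)  # -3.5
```

Averaging over all paired taps, rather than using a single tap, makes the estimate less sensitive to one poorly detected spike; the residual error is then left to the cross-correlation step.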
Both LA-RA and LL-RA ECG signals were filtered using a first-order Butterworth bandpass filter from 1 to 25 Hz. A maximal overlap discrete wavelet transform with a Daubechies least-asymmetric wavelet with 4 vanishing moments was used to enhance the R peaks in the ECG, followed by a threshold-based peak-finding function used to identify the R peaks [
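The final peak-finding step and the conversion to R-R intervals can be sketched as below. This assumes the ECG has already been band-pass filtered and R-peak-enhanced; the wavelet step itself (in MATLAB, modwt with the 'sym4' wavelet is one option) is not reproduced. The function names, the amplitude threshold, and the 250 ms refractory period are illustrative assumptions, not the study's settings.

```python
def find_r_peaks(signal, fs, threshold, refractory_s=0.25):
    """Sample indices of R peaks: local maxima above `threshold`, at least
    `refractory_s` seconds apart (the taller peak wins inside that window)."""
    min_gap = int(round(refractory_s * fs))
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] <= signal[i] > signal[i + 1]
        if not is_local_max or signal[i] <= threshold:
            continue
        if peaks and i - peaks[-1] < min_gap:
            if signal[i] > signal[peaks[-1]]:
                peaks[-1] = i  # replace the smaller peak within the window
        else:
            peaks.append(i)
    return peaks


def rr_intervals(peaks, fs):
    """(R-R interval in s, timestamp in s of the closing beat) for consecutive peaks."""
    return [((b - a) / fs, b / fs) for a, b in zip(peaks, peaks[1:])]


fs = 100  # Hz, hypothetical sampling rate
ecg = [0.0] * 300
for idx, v in [(50, 1.0), (90, 0.2), (130, 1.0), (150, 0.9), (210, 1.0)]:
    ecg[idx] = v  # synthetic "enhanced" ECG with a T-wave-like bump and a spurious spike
peaks = find_r_peaks(ecg, fs, threshold=0.5)
print(peaks)  # [50, 130, 210]
```

The small bump at index 90 falls below the threshold, and the spike at index 150 is rejected because it lies within the refractory window of a taller peak.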
P-P intervals and corresponding time stamps were recorded directly from MB2 outputs as the time interval between 2 consecutive heartbeats [
None (condition A): This condition contains the raw P-P intervals.
Threshold deletion (condition B): Removing implausible P-P interval values for a healthy individual at rest, walking, or sleeping (P-P<0.3 s or P-P>2.5 s) [
Moving average deletion (condition C): Threshold deletion (as described in B above) followed by removal of P-P intervals that changed faster than is physiologically plausible, as indicated by a moving average filter. This was done following Morelli et al [
Acceleration-based deletion (condition D): A series of threshold filters, moving average filters (described in C above), and an acceleration filter. Considering that low PPG signal quality may be attributable to movement, Morelli et al [
In this study, no significant correlation between Wt and ᴋ was found. As such, a threshold of ᴋ=0.02 m/s² was used to filter the data with τ=40 s (the same parameters as used by Morelli et al) [
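The editing conditions can be sketched as successive filters over (timestamp, P-P interval) pairs. Condition B uses the stated 0.3-2.5 s bounds and condition D uses ᴋ=0.02 m/s² and τ=40 s. Everything else is an assumption: the moving-average window and maximum relative change in condition C are hypothetical (the exact rule from Morelli et al is not reproduced here), and Wt is assumed to be the mean absolute sample-to-sample change in acceleration magnitude over the τ seconds before each beat, which may differ from the original definition.

```python
def threshold_delete(pp, lo=0.3, hi=2.5):
    """Condition B: keep (time_s, interval_s) pairs with lo <= interval <= hi."""
    return [(t, x) for t, x in pp if lo <= x <= hi]


def moving_average_delete(pp, window=5, max_rel_change=0.2):
    """Condition C sketch: condition B, then drop any interval deviating from the
    mean of the previous `window` kept intervals by more than max_rel_change.
    window and max_rel_change are hypothetical, not the published parameters."""
    kept = []
    for t, x in threshold_delete(pp):
        recent = [v for _, v in kept[-window:]]
        if recent:
            avg = sum(recent) / len(recent)
            if abs(x - avg) > max_rel_change * avg:
                continue  # change too fast to be physiological
        kept.append((t, x))
    return kept


def mean_acc_change(acc, t_end, tau):
    """Assumed Wt: mean absolute change in acceleration magnitude over (t_end - tau, t_end]."""
    mags = [m for t, m in acc if t_end - tau < t <= t_end]
    if len(mags) < 2:
        return 0.0  # no motion evidence in the window; treated as still (assumption)
    return sum(abs(b - a) for a, b in zip(mags, mags[1:])) / (len(mags) - 1)


def acceleration_delete(pp, acc, kappa=0.02, tau=40.0):
    """Condition D sketch: condition C, then drop beats whose preceding tau seconds
    have Wt > kappa (kappa in m/s^2; defaults match ᴋ=0.02 and τ=40 s)."""
    return [(t, x) for t, x in moving_average_delete(pp)
            if mean_acc_change(acc, t, tau) <= kappa]


pp = [(1.0, 0.8), (2.0, 0.82), (3.0, 3.0), (4.0, 0.2), (5.0, 0.81), (6.0, 1.5), (7.0, 0.79)]
acc = [(0.5 * k, 9.81) for k in range(11)] + [(5.5, 9.81), (6.0, 10.5), (6.5, 9.0), (7.0, 10.8)]
print([t for t, _ in moving_average_delete(pp)])             # [1.0, 2.0, 5.0, 7.0]
print([t for t, _ in acceleration_delete(pp, acc, tau=2.0)])  # [1.0, 2.0, 5.0]
```

Each condition reuses the previous one, mirroring how the text builds D on C and C on B; the short τ in the usage line only keeps the toy example small.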
Following coarse synchronization of MB2 and Shimmer3, consistent delays between the 2 devices were observed. To find the time shift that maximized the correlation between devices, a cross-correlation between P-P and R-R data was computed. The estimated time shift was applied to the P-P data, similar to the method used by Pietilä et al [
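The fine synchronization can be sketched as a brute-force search over lags. This is a minimal illustration, assuming both interval series have first been resampled onto a common, uniform time grid (not shown); best_lag is a hypothetical helper that returns the lag, in samples, whose overlapping segments have the highest Pearson correlation.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (-inf if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else float("-inf")


def best_lag(x, y, max_lag):
    """Lag (samples) in [-max_lag, max_lag] maximizing corr(x[i], y[i + lag]).

    A positive result means y lags x; shift y earlier by that many samples to align.
    """
    best, best_r = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Pair x[i] with y[i + lag] over the overlapping region only.
        a, b = (x, y[lag:]) if lag >= 0 else (x[-lag:], y)
        n = min(len(a), len(b))
        if n < 3:
            continue
        r = pearson(a[:n], b[:n])
        if r > best_r:
            best, best_r = lag, r
    return best


x = [float((7 * i) % 11) for i in range(60)]  # synthetic reference series
y = [0.0, 0.0, 0.0] + x                       # same series delayed by 3 samples
print(best_lag(x, y, 5))  # 3
```

Multiplying the returned lag by the resampling period gives the time shift to apply to the P-P timestamps.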
After postprocessing, the following time domain HRV and PRV features were extracted for each trial, where N-N refers to either R-R or P-P:
Mean N-N: the mean of all N-N intervals
Mean HR: reciprocal of mean N-N, expressed in beats per minute (bpm), ie, 60/mean N-N with N-N in seconds
SDNN: a measure of overall variability, the SD of all N-N intervals, also known as RRSD
pNN50: percentage of subsequent differences more than 50 ms
RMSSD: root mean square of subsequent differences
LF, HF, LF/HF ratio: low-frequency power (LF), high-frequency power (HF), and the ratio of LF to HF
SD1 and SD2: SDs of the Poincaré plot points along the short (orthogonal to x=y) and long (x=y) diagonal axes, respectively [
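The time-domain and Poincaré features listed above can be computed as in this minimal sketch. It assumes N-N intervals in milliseconds and uses the sample SD (n−1 denominator), which matches MATLAB's default std; the function and dictionary key names are illustrative. SD1 and SD2 are obtained by rotating each (N-N_i, N-N_i+1) pair by 45° and taking the SD along each rotated axis.

```python
import math
import statistics


def time_domain_features(nn_ms):
    """Time-domain HRV/PRV features from N-N intervals in milliseconds."""
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    mean_nn = statistics.fmean(nn_ms)
    # Poincaré descriptors: project (NN_i, NN_{i+1}) onto the axis orthogonal to
    # the identity line (SD1, short axis) and onto the identity line (SD2, long axis).
    d1 = [(y - x) / math.sqrt(2) for x, y in zip(nn_ms, nn_ms[1:])]
    d2 = [(y + x) / math.sqrt(2) for x, y in zip(nn_ms, nn_ms[1:])]
    return {
        "mean_nn": mean_nn,                    # ms
        "mean_hr": 60000.0 / mean_nn,          # bpm
        "sdnn": statistics.stdev(nn_ms),       # ms
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),  # %
        "rmssd": math.sqrt(statistics.fmean([d * d for d in diffs])),   # ms
        "sd1": statistics.stdev(d1),           # ms
        "sd2": statistics.stdev(d2),           # ms
    }


features = time_domain_features([800, 900, 800, 900, 800])
print(features["mean_nn"], features["pnn50"], features["rmssd"])  # 840.0 100.0 100.0
```

For long recordings SD1 approaches RMSSD/√2, which is consistent with the day HRV values in the condition C table (RMSSD 104 ms, SD1 74 ms).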
For spectral measures, R-R and P-P intervals were converted to instantaneous HR (60/N-N, where N-N is interval time in seconds) and then interpolated to 4 Hz using a piecewise cubic Hermite interpolation (MATLAB function “pchip”). This ensured regular time intervals between data points, a prerequisite for estimating the Fourier transform and signal power. The Fourier transform was performed (using the “fft” function in MATLAB) on the entire data set for each participant. This allowed for the calculation of frequency domain HRV features such as LF (0.04-0.15 Hz) and HF (0.15-0.40 Hz). LF and HF were expressed in normalized units by dividing each by the sum of LF and HF. The ratio of LF to HF was also reported.
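The spectral pipeline can be sketched in dependency-free Python as below. Two substitutions are made and should be read as such: linear interpolation replaces the study's pchip interpolation, and a direct DFT evaluated only at bins below 0.40 Hz replaces a full fft (the bin values are the same). Treating the band edges as half-open, LF = [0.04, 0.15) Hz and HF = [0.15, 0.40) Hz, is also an assumption.

```python
import cmath
import math


def resample_hr(nn_s, fs=4.0):
    """Instantaneous HR (60/NN, bpm) resampled onto a uniform fs-Hz grid.

    Each HR value is stamped at the end of its interval. Linear interpolation is
    used here in place of the study's piecewise cubic Hermite ("pchip") method.
    """
    times, t = [], 0.0
    for x in nn_s:
        t += x
        times.append(t)
    hr = [60.0 / x for x in nn_s]
    out, j = [], 0
    for k in range(int((times[-1] - times[0]) * fs) + 1):
        g = times[0] + k / fs
        while j + 2 < len(times) and times[j + 1] < g:
            j += 1
        w = (g - times[j]) / (times[j + 1] - times[j])
        out.append(hr[j] + w * (hr[j + 1] - hr[j]))
    return out


def normalized_band_powers(hr, fs=4.0):
    """Return (LF nu, HF nu, LF/HF) from a periodogram of the mean-removed series."""
    n = len(hr)
    mean = sum(hr) / n
    x = [v - mean for v in hr]
    lf = hf = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if f >= 0.40:
            break
        if f < 0.04:
            continue
        # Direct DFT at bin k; equals the k-th term of a full FFT.
        X = sum(v * cmath.exp(-2j * math.pi * k * m / n) for m, v in enumerate(x))
        if f < 0.15:
            lf += abs(X) ** 2
        else:
            hf += abs(X) ** 2
    return lf / (lf + hf), hf / (lf + hf), lf / hf
```

As a usage check, a synthetic HR oscillating at 0.25 Hz (within the HF band) should yield an HF value near 1 in normalized units; by construction the two normalized values always sum to 1.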
To quantify the concurrent validity between R-R and P-P intervals, the following metrics were used:
Root mean square error (RMSE): RMSE between matched R-R and P-P samples
Pearson correlation coefficient (R2)
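Both validity metrics can be sketched for a pair of sample-matched interval lists as follows. The function names and the example values are hypothetical; pearson_r returns r, and squaring it gives the R2 values reported in the tables.

```python
import math


def rmse(rr, pp):
    """Root mean square error between matched R-R and P-P intervals (same units)."""
    if not rr or len(rr) != len(pp):
        raise ValueError("need non-empty, sample-matched sequences")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rr, pp)) / len(rr))


def pearson_r(x, y):
    """Pearson correlation coefficient r; square it to obtain R2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rr = [800, 820, 790, 810]  # hypothetical matched intervals, ms
pp = [810, 810, 800, 800]
print(rmse(rr, pp))                   # 10.0
print(round(pearson_r(rr, pp) ** 2, 3))  # 0.2
```

The two metrics capture different failure modes: RMSE penalizes beat-by-beat disagreement in absolute terms, whereas R2 only reflects how well the P-P series tracks the shape of the R-R series.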
To compare PPG-derived metrics across collection and processing conditions (ie, day- or nighttime collection, filtering condition), two-tailed paired
This section presents the results of (1) an investigation of the concurrent validity between R-R and P-P intervals across published filtering methods, (2) a comparison between ECG- and PPG-derived metrics of HRV, and (3) a comparison across free-living data collection conditions (ie, day and night). A total of 10 volunteers (3 men and 7 women, aged 20-61 years) were recruited for this study, each contributing 1 day and 1 night trial. One participant’s ECG night data were corrupted and therefore not analyzed or reported, leaving a total of 19 trials.
After processing, a large amount of data was lost. The number of matched and windowed N-N intervals is described in
Group mean (SD) of data sample sizes used for comparison between R-R and P-P intervals across processing and collection conditions.
| Collection condition and processing condition | Number of samples, mean (SD) | Percent R-R intervals compared, mean (SD) | Percent P-P intervals compared, mean (SD) |
|---|---|---|---|
| Day | | | |
| A | 5168.70 (1683.92) | 52.35 (26.89) | 47.29 (18.32) |
| B | 4706.50 (1447.57) | 48.25 (26.30) | 43.91 (18.31) |
| C | 3311.30 (1316.13) | 34.68 (24.73) | 32.29 (16.88) |
| D | 1847.40 (1334.28) | 23.03 (26.28) | 21.39 (15.92) |
| Night | | | |
| A | 8418.78 (5179.41) | 55.05 (27.70) | 53.30 (19.26) |
| B | 8197.11 (5060.21) | 53.79 (27.31) | 52.06 (19.15) |
| C | 7383.00 (4075.76) | 46.89 (24.24) | 46.66 (19.38) |
| D | 7177.33 (4901.63) | 41.15 (27.67) | 42.97 (21.23) |
More data were acquired at night than during the day. Despite formal instructions and training on the operation and charging of the sensor systems, several technical barriers were frequently encountered that limited the number of samples in each trial. These included inadvertent misplacement of ECG electrodes or MB2, insufficient battery charging before night collection, and/or a dropped Bluetooth stream to the mobile device.
Group mean root mean square error, concurrent validity (R2), and number of matched samples across processing and collection conditions.
| Metric and processing condition | Day (n=10), mean (SD) | Night (n=9), mean (SD) |
|---|---|---|
| RMSE (ms) | | |
| A | 182 (48) | 158 (67) |
| B | 165 (42) | 136 (53) |
| C | 144 (39) | 120 (45) |
| D | 122 (47) | 119 (45) |
| R2 | | |
| A | 0.15 (0.12) | 0.28 (0.17) |
| B | 0.14 (0.13) | 0.33 (0.19) |
| C | 0.18 (0.13) | 0.34 (0.21) |
| D | 0.22 (0.17) | 0.34 (0.21) |
The RMSE ranged between 46 and 285 ms across all conditions. Increased editing reduced the average error (RMSE). Under condition C, error was further examined by generating Bland-Altman plots comparing the P-P intervals with R-R intervals, as shown in
Bland-Altman plots for 1 participant under processing condition C for (A) day and (B) night. P-P: time between 2 P peaks in a photoplethysmogram or peak-to-peak intervals; R-R: time between 2 R peaks in an electrocardiogram.
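The quantities plotted in a Bland-Altman analysis can be computed with the short sketch below. It uses the conventional 95% limits of agreement (bias ± 1.96 SD of the differences); taking the difference as P-P minus R-R is an assumed sign convention, and the helper name is hypothetical.

```python
import statistics


def bland_altman(rr, pp):
    """Bland-Altman statistics for matched intervals.

    Returns per-pair means (x-axis), differences P-P minus R-R (y-axis), the bias
    (mean difference), and the 95% limits of agreement (bias +/- 1.96 SD).
    """
    means = [(r + p) / 2 for r, p in zip(rr, pp)]
    diffs = [p - r for r, p in zip(rr, pp)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


means, diffs, bias, (lower, upper) = bland_altman([800, 800, 800, 800], [810, 790, 810, 790])
print(bias, round(lower, 2), round(upper, 2))  # 0.0 -22.63 22.63
```

A bias near zero with wide limits of agreement, as in this toy example, indicates that mean interval length is preserved while individual beats disagree, which is the pattern that affects variability metrics most.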
Across all conditions,
Under condition D, no data sets showed strong correlations. Only 3 (1 day, 2 nights) were moderate, 7 were fair (1 day, 6 nights), and 9 were poor (7 days, 2 nights). Paired
Compared with condition C, condition D improved RMSE and
Correlation between absolute error and mean change in triaxial acceleration (Wt) under the same conditions as Morelli et al (A) and (B) comparison of |Error| and Wt over time for a sample with low correlation (R2=0.16) (C), and (D) comparison of |Error| and Wt over time for a sample with higher correlation (R2=0.50).
Comparison of mean heart rate variability and pulse rate variability metrics under processing condition C.
| Features | Day HRVᵃ | Day PRVᵇ | Day absolute error | Day P valueᶜ | Night HRV | Night PRV | Night absolute error | Night P value |
|---|---|---|---|---|---|---|---|---|
| NN (ms), mean (SD) | 829 (70) | 833 (51) | 19 (15) | .59 | 967 (151) | 960 (142) | 10 (9) | .08 |
| SDNNᵈ (ms), mean (SD) | 90 (36) | 98 (25) | 25 (20) | .48 | 87 (37) | 69 (25) | 20 (10) | .03 |
| pNN50ᵉ (%), mean (SD) | 30.60 (24.51) | 39.74 (16.18) | 15.90 (11.30) | .14 | 38.58 (30.59) | 21.35 (15.06) | 19.48 (14.11) | .02 |
| RMSSDᶠ (ms), mean (SD) | 104 (58) | 116 (38) | 42 (36) | .54 | 101 (57) | 67 (20) | 34 (36) | .02 |
| SD1ᵍ (ms), mean (SD) | 74 (41) | 82 (27) | 30 (26) | .54 | 72 (40) | 48 (21) | 24 (26) | .02 |
| SD2ʰ (ms), mean (SD) | 94 (40) | 110 (25) | 31 (33) | .24 | 97 (35) | 83 (29) | 18 (14) | .05 |
| LFⁱ (nu), mean (SD) | 0.70 (0.03) | 0.69 (0.02) | 0.03 (0.02) | .43 | 0.70 (0.03) | 0.72 (0.01) | 0.02 (0.02) | .02 |
| HFʲ (nu), mean (SD) | 0.30 (0.03) | 0.31 (0.02) | 0.03 (0.02) | .43 | 0.30 (0.03) | 0.28 (0.01) | 0.02 (0.02) | .02 |
| LF/HF ratio, mean (SD) | 2.39 (0.32) | 2.26 (0.22) | 0.28 (0.23) | .29 | 2.43 (0.28) | 2.64 (0.13) | 0.21 (0.18) | .01 |
ᵃHRV: heart rate variability.
ᵇPRV: pulse rate variability.
ᶜResults from paired
ᵈSDNN: SD of all N-N intervals.
ᵉpNN50: percent of subsequent differences more than 50 ms.
ᶠRMSSD: root mean square of subsequent differences.
ᵍSD1: SD of the short (orthogonal to x=y) Poincaré plot axis.
ʰSD2: SD of the long (x=y) Poincaré plot axis.
ⁱLF: low-frequency power.
ʲHF: high-frequency power.
Compared with processing condition C, similar results were observed in condition D (
Time series plots of matched and edited R-R and P-P intervals (
Time series of matched time between 2 R peaks in an electrocardiogram and time between 2 P peaks in a photoplethysmogram or peak-to-peak intervals for a single participant under processing condition C during (A) day and (B) night. P-P: time between 2 P peaks in a photoplethysmogram or peak-to-peak intervals; R-R: time between 2 R peaks in an electrocardiogram.
Poincaré plots for the same participant under condition C are shown in
Poincaré plots for a single participant under processing condition C for (a) P-P intervals during the day, (b) R-R intervals during the day, (c) P-P intervals at night, and (d) R-R intervals at night. P-P: time between 2 P peaks in a photoplethysmogram or peak-to-peak intervals; R-R: time between 2 R peaks in an electrocardiogram.
P-P versus R-R intervals for a participant under processing condition C during (A) day and (B) night. P-P: time between 2 P peaks in a photoplethysmogram or peak-to-peak intervals; R-R: time between 2 R peaks in an electrocardiogram.
Night collections showed slightly lower RMSE than day collections, with a mean decrease of 24 (SD 45) ms, ranging from −89 ms to +40 ms across participants. For the participant highlighted in
Although night data had more matched samples, an unpaired
This paper examined the accuracy and concurrent validity of PRV measurements from a commercially available PPG sensor against HRV measurements obtained from a portable ECG sensor during unsupervised daytime and nighttime conditions. Accuracy and concurrent validity were examined across different editing methods and day and night collection conditions. In general, concurrent validity and HRV metrics were stronger at night than during the day. Although nighttime collection was more accurate on average, with a lower mean error, this finding did not hold for all participants. Editing to remove outliers was effective in reducing noise, as reflected by the reduced RMSE for conditions B, C, and D. However, efforts to remove samples affected by motion artifacts using accelerometry (ie, condition D) were less effective in this study than in previous studies. The implications of these findings for ambulatory measurement of HRV using a commercially available PPG sensor to indicate health are discussed.
Although PPG sensors have strong mean HR measurement capabilities, the results from this study indicate poorer HRV capabilities. As expected, both ECG and PPG methods demonstrated similar mean R-R values with differences of less than 20 ms, reflecting established capabilities to estimate mean HR [
The implications of PPG sensing errors on HRV metrics are highlighted in
When comparing day and night collection conditions, concurrent validity and HRV metrics indicate more accurate HRV estimates at night. Improved concurrent validity at night may be attributed to fewer errors related to ambient light changes at night [
Simple editing methods to improve PPG signals were examined in this study. PPG recordings are known to be affected by motion artifacts, contact force, posture, and ambient temperature [
The relative ineffectiveness of motion artifact compensation in this study suggests that other factors affect PPG signals. For example, changes in respiration and peripheral vascular factors (ie, vascular volume, vasomotor activity, and vasoconstrictor waves) are known to affect the AC and DC components of the PPG waveform [
The primary limitations of this study were the sample population and technical limitations of the devices. This study used a convenience sample of participants aged 18 to 65 years with no known cardiac history. Although those with known cardiac conditions were excluded, the presence of underlying vascular disease in our cohort is unknown. As such, the findings may not be applicable to target disease populations. The impact of vascular conditions, such as atherosclerosis and cholesterol deposits in the arterial walls leading to decreased vessel compliance, which have been shown to alter pulse waveform from the classic triphasic pattern to mono- or biphasic patterns [
The devices used in this study were limited in several ways. Both Shimmer3 and MB2 devices logged using separate device clocks, with potential for drift (approximately 1-2 s) over the course of a single trial. The devices were synchronized using an external mechanical stimulus (ie, 3 taps in 2 orientations) and by applying a data-driven delay estimate (ie, cross-correlation). Although these procedures have been used in previous studies with good results, and the synchronized signals were checked qualitatively and quantitatively, the potential for dropped samples or desynchronization remains. The publicly available documentation for MB2 offers little to no insight into its R-R interval processing or its handling of motion artifacts, and the device was no longer commercially available at the time of writing. Of note, signal drops were observed sporadically, notably during (1) large-amplitude arm movements and (2) long periods (>10 min) when MB2 was out of Bluetooth range of the smartphone. We interpret these signal drops as obvious situations in which motion artifacts and wireless communication are severely challenged, with little to no impact on our findings. Furthermore, MB2 reported RR intervals at a resolution of 10 ms, which limits accuracy in the same way as quantization (ie, round-off) error. Given the large number of samples, this resolution limit is unlikely to affect mean values (eg, mean RR) but may inflate variability estimates (eg, RMSSD). However, the observed underestimation is unlikely to arise from quantization errors and is interpreted as a systematic error associated with the sensing method.
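A back-of-envelope calculation helps gauge how much the 10 ms resolution could bias RMSSD. It rests on simplifying assumptions that were not verified for MB2: each reported interval carries an independent rounding error e_i, uniformly distributed on [−Δ/2, Δ/2] with Δ = 10 ms and uncorrelated with the true successive differences. The example plugs in a true RMSSD of 67 ms, the nighttime PRV level in the condition C table:

```latex
\begin{aligned}
\operatorname{Var}(e_i) &= \frac{\Delta^2}{12}, \qquad
\operatorname{Var}(e_{i+1} - e_i) = \frac{\Delta^2}{6} \approx 16.7\ \text{ms}^2 \quad (\Delta = 10\ \text{ms}),\\
\mathrm{RMSSD}_{\text{obs}} &\approx \sqrt{\mathrm{RMSSD}_{\text{true}}^2 + \frac{\Delta^2}{6}}
= \sqrt{67^2 + 16.7} \approx 67.1\ \text{ms}.
\end{aligned}
```

Under these assumptions, quantization can only inflate RMSSD, and only by about 0.1 ms at this level, so it cannot explain a PRV value well below the corresponding HRV value (67 vs 101 ms at night).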
Wearable technologies are becoming more sophisticated with commercially available products capable of providing consumers access to information previously limited to clinical settings, including HRV and ECG data to identify arrhythmias [
In the future, examining more editing, correction, and interpolation techniques for interbeat intervals may enhance the interpretability and quality of the P-P intervals obtained from commercially available wearables [
The objective of this study was to assess the validity of PRV measurements taken from a PPG sensor by comparing them with HRV measurements taken from a portable ECG while individuals were engaged in activities of daily living and during sleep. Although PPG sensors demonstrated greater validity at night, overall concurrent validity was poor. HRV metrics pNN50 and LF/HF ratio were especially sensitive to errors in point-to-point accuracy. Increased editing via deletion improved the RMSE but had a small impact on
Comparison of mean heart rate variability and pulse rate variability metrics under processing condition D.
ECG: electrocardiogram
HF: high-frequency power (0.15-0.40 Hz)
HR: heart rate
HRV: heart rate variability
LA: left arm
LF: low-frequency power (0.04-0.15 Hz)
LL: left leg
MB2: Microsoft Band 2
N-N: N-N intervals (either R-R or P-P intervals)
pNN50: percent of subsequent differences more than 50 ms
P-P: time between 2 P peaks in a PPG or peak-to-peak intervals
PPG: photoplethysmography
PR: pulse rate
PRV: pulse rate variability
PTT: pulse transit time
R2: Pearson correlation coefficient
RA: right arm
RMSE: root mean square error
RMSSD: root mean square of subsequent differences
R-R: time between 2 R peaks in an ECG
SD1: standard deviation of the short (orthogonal to x=y) Poincaré plot axis
SD2: standard deviation of the long (x=y) Poincaré plot axis
SDNN: standard deviation of all N-N intervals, also known as RRSD
Wt: mean change in triaxial acceleration
This study was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grant (RGPIN-2015-05317).
None declared.