This article is part of the series Advances in Electrocardiogram Signal Processing and Analysis.

Open Access Research Article

Clustering and Symbolic Analysis of Cardiovascular Signals: Discovery and Visualization of Medically Relevant Patterns in Long-Term Data Using Limited Prior Knowledge

Zeeshan Syed1*, John Guttag1 and Collin Stultz1,2

Author Affiliations

1 Massachusetts Institute of Technology, Cambridge, MA 02139-4307, USA

2 Brigham and Women's Hospital, Cambridge, MA 02115, USA

EURASIP Journal on Advances in Signal Processing 2007, 2007:067938 doi:10.1155/2007/67938


The electronic version of this article is the complete one and can be found online at: http://asp.eurasipjournals.com/content/2007/1/067938


Received: 30 April 2006
Revisions received: 18 December 2006
Accepted: 27 December 2006
Published: 5 March 2007

© 2007 Syed et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper describes novel, fully automated techniques for analyzing large amounts of cardiovascular data. In contrast to traditional medical expert systems, our techniques incorporate no a priori knowledge about disease states. This facilitates the discovery of unexpected events. We start by transforming continuous waveform signals into symbolic strings derived directly from the data. Morphological features are used to partition heart beats into clusters by maximizing the dynamic time-warped, sequence-aligned separation of clusters. Each cluster is assigned a symbol, and the original signal is replaced by the corresponding sequence of symbols. The symbolization process allows us to shift from the analysis of raw signals to the analysis of sequences of symbols. This discrete representation reduces the amount of data by several orders of magnitude, making the search space for discovering interesting activity more manageable. We describe techniques that operate in this symbolic domain to discover rhythms, transient patterns, abnormal changes in entropy, and clinically significant relationships among multiple streams of physiological data. We tested our techniques on cardiologist-annotated ECG data from forty-eight patients. Our process for labeling heart beats produced results that were consistent with the cardiologist-supplied labels 98.6% of the time, and often provided relevant finer-grained distinctions. Our higher-level analysis techniques proved effective at identifying clinically relevant activity not only from symbolized ECG streams, but also from multimodal data obtained by symbolizing ECG and other physiological data streams. Using no prior knowledge, our analysis techniques uncovered examples of ventricular bigeminy and trigeminy, ectopic atrial rhythms with aberrant ventricular conduction, paroxysmal atrial tachyarrhythmias, atrial fibrillation, and pulsus paradoxus.
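The pipeline summarized above (compare beats under dynamic time warping, cluster them, replace each beat by its cluster's symbol, then analyze the symbol string) can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy threshold clustering is a simplification swapped in for their separation-maximizing clustering, and the function names, the threshold value, and the toy beat waveforms are all assumptions made for illustration.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D sequences (absolute-difference local cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def symbolize(beats, threshold):
    """Assign one symbol per beat (hypothetical greedy scheme, not the paper's clustering).

    A beat joins the first cluster whose representative lies within
    `threshold` DTW distance; otherwise it founds a new cluster.
    """
    representatives = []
    labels = []
    for beat in beats:
        for label, rep in enumerate(representatives):
            if dtw_distance(beat, rep) <= threshold:
                labels.append(label)
                break
        else:
            representatives.append(beat)
            labels.append(len(representatives) - 1)
    # Clusters are named 'A', 'B', ... in order of first appearance.
    return "".join(chr(ord("A") + label) for label in labels)


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the symbol distribution in a string."""
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy beats (made-up amplitudes): a "normal" shape, a time-stretched copy
# of it, and a morphologically different "ectopic" shape.
normal = [0, 0, 1, 5, 1, 0, 0]
stretched = [0, 0, 1, 1, 5, 1, 0, 0]
ectopic = [0, 2, 4, -3, -1, 0, 0]

symbols = symbolize([normal, ectopic, stretched, ectopic, normal, ectopic], threshold=2.0)
print(symbols)                   # alternating two-symbol string, a bigeminy-like rhythm
print(shannon_entropy(symbols))  # entropy of the symbol distribution
```

Because the warping absorbs the duplicated sample, the stretched beat falls into the same cluster as the normal one, while the ectopic beat is kept apart. The resulting string alternates between two symbols, which is the kind of repeating pattern a rhythm search over the symbolic domain would look for.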
