# Alexander Kain — Curriculum Vitae

## 1 Present Positions

### Oregon Health & Science University

Associate Professor
Computer Science & Electrical Engineering (CSEE)
Center for Spoken Language Understanding (CSLU)
Department of Pediatrics
School of Medicine (SOM)
Oregon Health & Science University (OHSU)
3181 SW Sam Jackson Park Road
Portland, Oregon 97239-3098
Email: kaina@ohsu.edu
Phone / Fax: (503) 349-3750 / (503) 346-3754

## 4 Scholarship

### 4.2 Grants

#### Completed

1. 2010/09/27–2016/09/30: National Science Foundation BCS-1027834, "Computational Models for the Automatic Recognition of Non-Human Primate Social Behaviors", PI: Kain (OHSU). To develop methods that will permit researchers to remotely and automatically monitor behavior of primates and other highly social animals.
2. 2013/12/01–2015/08/31: National Institutes of Health 1R43DA037588-01A1, "Screening for Sleep Disordered Breathing with Minimally Obtrusive Sensors", PI: Snider (BioSpeech). Sleep disordered breathing (SDB) is believed to be a widespread, under-diagnosed condition associated with detrimental health problems, at a high cost to society. The current gold standard for diagnosis of SDB is a time-consuming, expensive, and obtrusive (requiring many attached wires) sleep study, or polysomnography (PSG). The immediate objective of our research is to develop and evaluate a hardware design and a set of algorithms for automatically detecting obstructive, central, or mixed apneas and hypopneas from acoustic, peripheral oxygen saturation (SpO2), and pulse rate data, using an ambient microphone and a wireless pulse oximeter. The long-term goal is to create a low-cost, easy-to-operate, minimally obtrusive, at-home device that can be used for early and frequent screening for SDB in patients' homes, significantly increasing patient comfort while capturing more representative sleep data compared to a clinical sleep study. In collaboration with Chad Hagen, M.D., at the Sleep Disorders Program at OHSU, we aim to (1) develop a screening system by selecting minimally obtrusive sensor hardware and extending state-of-the-art algorithms for automatically detecting SDB from acoustic, SpO2, and pulse rate data; (2) collect patient data in the sleep lab and at home from representative populations using the proposed system; (3) determine the screening accuracy by comparing the performance of the proposed system on the collected data against standard PSG-derived clinical results; and (4) measure the usability of an at-home screening device by the target population, by asking subjects who participated in the at-home data collection to complete a survey on various aspects of the setup and operation of the proposed system. My role: I provide signal-processing and machine-learning expertise, and assist with all scientific aspects of the project. Amount: \$205K.
3. 2012/04/01–2015/03/31: National Institutes of Health 5R44DC009515-03, "SBIR Phase 2: Computer-based auditory skill building program for aural (re)habilitation", PI: Connors (BioSpeech). To extend an adaptive computer-guided software program that focuses on learning phoneme discrimination and identification. See Phase I description. Amount: \$400K.
4. 2011/12/01–2015/08/31: National Institutes of Health R21DC012139, "Computer-Based Pronunciation Analysis for Children with Speech Sound Disorders", PI: Kain (OHSU). In this work we are developing speech-production assessment and pronunciation training tools for children with speech sound disorders. To date, computer-assisted pronunciation training has not been successfully extended to help children with speech sound disorders, primarily because of a lack of accuracy in phoneme-level analysis of the speech signal. My role: I am creating a set of algorithms that will reliably identify and score the intelligibility of a phoneme within an isolated target word, providing immediate, relevant, and understandable feedback about pronunciation errors. The use of human perceptual data during training is an important and new component of the proposed approach. As PI, I am also responsible for overall project supervision and management. Amount: \$416K.
5. 2010/06/09–2015/05/31: National Science Foundation IIS-0964102, "Semi-Supervised Discriminative Training of Language Models", PI: Kain (OHSU). To conduct fundamental research in statistical language modeling to improve human language technologies, including automatic speech recognition (ASR) and machine translation (MT).
6. 2010/05/15–2015/04/30: National Science Foundation IIS-0964468, "HCC: Medium: Synthesis and Perception of Speaker Identity", PI: Kain (OHSU). Millions of Americans with impaired or absent speech communication ability rely on Augmentative and Alternative Communication devices with voice output (Speech Generating Devices, or SGDs) to communicate. A psychologically important and desirable feature is the ability to speak with one's own voice, i.e., the ability for the SGD to produce speech that mimics the individual's pre-morbid speech or speech that the individual may be able to intermittently produce. However, current text-to-speech (TTS) systems can only create speech with one or very few supplied speaker characteristics, and cannot be trained to take on the user's voice. My role: Together with Ph.D. students and co-investigators, I am creating a TTS synthesis system that generates speech that sounds like that of a specific individual (Speaker Identity Synthesis, or SIS). In the process we are building and evaluating analysis and synthesis models of the relevant acoustic features, including pitch, duration, and spectrum. Since the system includes a trainability component, this project also involves use of advanced mapping technology in the form of a joint-density Gaussian mixture model. I first proposed this approach in a 1998 publication which has since been cited over 370 times. As PI, I am also responsible for overall project supervision, management, and mentorship of graduate student Mohammadi. Amount: \$905K.
7. 2011/04/01–2012/03/31: National Institutes of Health 5R42DC008712, "User Adaptation of AAC Device Voices - Phase 2", PI: Klabbers (BioSpeech). Developing and evaluating voice transformation and prosody modification technologies to customize synthetic voices in AAC devices, mimicking the individual user's pre-morbid speech. See Phase 1 description.
8. 2011/03/01–2013/03/31: National Institutes of Health 1R43DC011706-01, "SBIR Phase 1: Computerized System for Phonemic Awareness Intervention", PI: Connors (BioSpeech). Phonemic awareness, defined as “the ability to notice, think about, and work with the individual sounds in spoken words”, is considered a necessary skill for literacy. The financial and quality-of-life costs of phonemic awareness impairments are significant, not only because of the link with reading difficulties and hence with future employability, but also because there may exist further links between reading difficulties and a range of psychiatric disorders. This argues for phonemic awareness intervention beyond what can be taught in a regular pre-school or elementary school curriculum. Such intervention is typically provided in the form of one-on-one sessions with a specialized professional (e.g., a Speech Language Pathologist). However, responding to cost concerns and poor access to these services, and also recognizing the importance of frequent intervention sessions, usage of computerized intervention systems is becoming more common. These computerized intervention systems have been steadily improving. However, one significant drawback continues to be their restricted response modalities, typically consisting of the child using a touch screen or a pointing device to select from a set of pictures. By confining the phonemic awareness skills that the system addresses to those that can be tapped into via picture-point-and-click, these systems have a restricted scope of what they can teach. A second drawback of many current systems is that their user interface (e.g., visual layout, tempo) is typically not tunable to the individual characteristics of the child.
Given the prevalence of phonemic awareness issues in a broad range of neurodevelopmental disorders, including Autism Spectrum Disorder and Developmental Language Disorder, individual tuning may be critical to address individual neurocognitive weaknesses, such as problems in memory, attention, visual scanning, perceptual-motor coordination, and processing speed. We have addressed these drawbacks by (1) taking advantage of drag-and-drop and other touch response modalities that current low-cost touch screen computers are capable of processing and that children are increasingly familiar with, and (2) incorporating multiple dimensions of individual tunability into the system. My role: Since 2005, I have been the primary developer of the BioSpeech text-to-speech system, a medium-size software project comprising approximately 10,000 lines of code. For this project, I assisted with integration with the graphical user interface, and provided solutions to the problem of synthesizing illegal (i.e., not found in normal use of English) phoneme sequences.
9. 2009/09/01–2013/08/31: National Science Foundation IIS-0915754, "RI: Small: Modeling Coarticulation for Automatic Speech Recognition", PI: Kain (OHSU). We have developed a data-driven, triphone formant trajectory model and a methodology for estimating its parameters. In this model, formant targets are speaker dependent, but independent of speaking style. We have validated this model using perceptual listening tests. An analysis of conversationally and clearly spoken speech confirmed that (1) formant trajectories in clear vowels reach their targets more frequently, (2) formants show considerable asynchronicity, and (3) phoneme formant targets approximate their expected values. We also found preliminary evidence that targets derived from clear speech alone perform better at modeling both styles than targets from conversational speech. Having created and validated this model, we are now in the process of applying the approach to disordered speech, paving the way for an objective diagnosis of the degree of coarticulation of dysarthria. Another application is an objective evaluation of the effectiveness of specific speech interventions for certain kinds of dysarthria, e.g., the Lee Silverman Voice Treatment. Finally, this research may also provide an avenue for automatically transforming conversationally spoken speech to sound as if it had been spoken clearly, thus increasing its intelligibility. A real-time, transparent version of this algorithm would be a desirable feature in many general telecommunications devices. My role: As PI, I am responsible for all aspects of the project, including overall project supervision and management, as well as mentoring of graduate student Bush.
10. 2009/07/15–2012/06/30: National Science Foundation IIS-0905095, "HCC: Automatic detection of atypical patterns in cross-modal affect", PI: van Santen (OHSU). The expression of affect in face-to-face situations requires the ability to generate a complex, coordinated, cross-modal affective signal, having gesture, facial expression, vocal prosody, and language content modalities. This ability is compromised in neurological disorders such as Parkinson's disease and autism spectrum disorder (ASD). The long-term goal is to build computer-based interactive systems for remediation of poor affect communication and diagnosis of the underlying neurological disorders based on analysis of affective signals. A requirement for such systems is technology to detect atypical patterns in affective signals. We developed a play situation for eliciting affect and collected audio-visual data from approximately 60 children between the ages of 4 and 7, half of them with ASD and the other half constituting a control group of typically developing children. We labeled the data on relevant affective dimensions, developed algorithms for the analysis of affective incongruity, and then tested the algorithms against the labeled data in order to determine their ability to differentiate between ASD and typical development. My role: I created special delexicalized speech stimuli, using a novel delexicalization algorithm that rendered the lexical content of an utterance unintelligible while preserving important acoustic prosodic cues. Preference tests showed that the proposed method preserved drastically more speaker identity and sounded more natural than conventional methods. These delexicalized speech stimuli were used in perceptual tests to exclude the effect of lexical content on affect.
11. 2009/07/17–2012/06/30: National Institutes of Health 5R21DC010035, "Quantitative Modeling of Segmental Timing in Dysarthria", PI: van Santen (OHSU). The project seeks to apply a quantitative modeling framework to segment durations in sentences produced by speakers with a variety of neurological diagnoses and dysarthrias. My role: I was responsible for software development for custom recording of speech data and for the extension of my previously published hybridization algorithm for the purposes of creating special perceptual speech stimuli.
12. 2008–2009: Nancy Lurie Marks Family Foundation award, "In Your Own Voice: Personal AAC Voices for Minimally Verbal Children with Autism Spectrum Disorder", PI: van Santen (OHSU). My role: I performed research and development to adapt a text-to-speech voice to sound like a particular child's voice, a task made particularly challenging by the difficulty of extracting reliable acoustic features from children's speech.
13. 2007/09/01–2011/08/31: National Science Foundation IIS-0713617, "HCC: High-quality Compression, Enhancement, and Personalization of Text-to-Speech Voices", PI: Kain (OHSU). My role: Together with Ph.D. students and co-investigators, I developed text-to-speech (TTS) technologies that focus on elimination of concatenation errors and improved accuracy in the areas of coarticulation, degree of articulation, prosodic effects, and speaker characteristics, using an asynchronous interpolation model that Jan van Santen and I proposed in 2002. These algorithmic advances added to the general acceptability of Speech Generating Devices (SGDs), used by individuals with impaired or absent speech communication.
14. 2007/01/01–2008/06/30: National Institutes of Health 1R41DC008712, "User Adaptation of AAC Device Voices - Phase 1", PI: van Santen (BioSpeech). Speech communication ability is impaired or absent in millions of Americans due to neurological disorders and diseases and to trauma, including autism, Parkinson's disease, and stroke. Augmentative and Alternative Communication (AAC) devices that are operated via switches, keyboards, and a broad range of other input devices, and that have synthetic speech as output, are often the only manner in which these individuals can communicate. A psychologically important feature that no currently available systems have is the ability to speak with the user's voice, i.e., the ability to produce speech that mimics the individual's pre-morbid speech or speech that the individual may be able to intermittently produce. This project used voice transformation (VT) technology to accomplish this goal. My role: I developed and evaluated voice transformation and prosody modification technologies to customize synthetic voices using concatenative speech synthesis technologies, with the aim of mimicking the individual user's pre-morbid speech.
15. 2006/09/01–2008/03/31: National Institutes of Health 1R41DC007240, "Voice Transformation for Dysarthria - Phase 1", PI: van Santen (BioSpeech). Dysarthria is a motor speech disorder due to weakness or poor coordination of the speech muscles. Affected muscles include the lungs, larynx, oro- and nasopharynx, soft palate, and articulators (lips, tongue, teeth, and jaw). The degree to which these muscle groups are compromised determines the particular pattern of speech impairment. For example, poor lung function affects the overall volume or loudness, while problems with specific articulators may cause mispronunciations of certain phonemes. There is a great variety of diseases that can cause dysarthria, including Parkinson’s disease, multiple sclerosis, and stroke. My role: I continued development of software that transforms speech compromised by dysarthria into easier-to-understand and more natural-sounding speech. In addition, I designed a hardware configuration that allowed the software to reside on a wearable computer, with a headset microphone as input and powered speaker as output, giving the user full mobility while wearing the speaking-aid.
16. 2005/01/10–2010/12/31: National Institutes of Health 5R01DC007129, "Expressive crossmodal affect integration in Autism", PI: van Santen (OHSU). Autistic Spectrum Disorders (ASD) form a group of neuropsychiatric conditions whose core behavioral features include impairments in reciprocal social interaction, in communication, and repetitive, stereotyped, or restricted interests and behaviors. The importance of prosodic deficits in the adaptive communicative competence of speakers with ASD, as well as for a fuller understanding of the social disabilities central to these disorders, is generally recognized; yet current studies are few in number and have significant methodological limitations. The objective of the proposed project is to detail prosodic deficits in young speakers with ASD through a series of experiments that address these disabilities and related areas of function. My role: I developed a delexicalization algorithm that rendered the lexical content of an utterance unintelligible, while preserving important acoustic prosodic cues.
17. 2005/01/01–2006/06/30: National Science Foundation IIP-0441125, "STTR Phase 1: Small Footprint Speech Synthesis", PI: Kain (BioSpeech). Text-to-speech (TTS) systems have recognized societal benefits for universal access, education, and information access by voice. For example, TTS-based augmentative devices are available for individuals who have lost their voice; and reading machines for the blind have been available for several decades. My role: I developed and implemented a novel algorithm that led to dramatic decreases in disk and memory requirements at a given speech quality level and minimization of the amount of voice recordings needed to create a new synthetic voice. The latter point enabled building personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds or for individuals who are about to undergo surgery that will irreversibly alter their speech.
18. 2001/10/01–2005/09/30: National Science Foundation IIS-0117911, "Making Dysarthric Speech Intelligible", PI: van Santen (OHSU). My role: I developed software that transforms speech compromised by dysarthria into easier-to-understand and more natural-sounding speech. The strategy for improving intelligibility is the manipulation of a small set of highly relevant speech features: specifically, the energy, pitch, and formant frequencies of an input speech waveform. Pitch and energy are appropriately smoothed, and formant frequencies are mapped with a joint-density Gaussian mixture model, a technique I first introduced in 1998 that has since become the most often used mapping technique in the field. Results from perceptual tests indicated that the transformation improved intelligibility, and that the accompanying removal of the vocal fry improved perceived naturalness.

### 4.3 Publications/Creative Work

#### Peer-reviewed Journal Articles and 4–5 page Conference Papers

##### 2016
1. S. Mohammadi, A. Kain, “A Voice Conversion Mapping Function based on a Stacked Joint-Autoencoder”, Interspeech, 2016.
2. B. Snider and A. Kain, “Classification of Respiratory Effort and Disordered Breathing during Sleep from Audio and Pulse Oximetry Signals”, ICASSP, 2016.
##### 2015
1. M. Langarani, J. van Santen, S. Mohammadi, A. Kain, “Data-driven Foot-based Intonation Generator for Text-to-Speech Synthesis”, Interspeech, 2015.
2. S. Mohammadi, A. Kain, “Semi-supervised Training of a Voice Conversion Mapping Function using a Joint-Autoencoder”, Interspeech, 2015.
3. S. Dudy, M. Asgari, and A. Kain, “Pronunciation Analysis for Children with Speech Sound Disorders”, IEEE Engineering in Medicine and Biology Society (EMBC), Milan, 2015. (PMC4710861).
##### 2014
1. A. Amano-Kusumoto, J.-P. Hosom, A. Kain, J. Aronoff, “Determining the relevance of different aspects of formant contours to intelligibility”, Speech Communication, vol. 59, April 2014.
2. K. Tjaden, A. Kain, J. Lam, “Hybridizing Conversational and Clear Speech to Investigate the Source of Increased Intelligibility in Parkinson’s Disease”, Journal of Speech, Language, and Hearing Research, Volume 57, August 2014.
3. S. Mohammadi, A. Kain, “Voice conversion using Deep Neural Networks with speaker-independent pre-training”, IEEE Spoken Language Technology Workshop (SLT), 2014.
4. B. Bush, A. Kain, “Modeling Coarticulation in Continuous Speech”, Interspeech 2014.
##### 2013
1. S. Mohammadi, A. Kain, “Transmutative Voice Conversion”, ICASSP, 2013.
2. B. Bush, A. Kain, “Estimating Phoneme Formant Targets and Coarticulation Parameters of Conversational and Clear Speech”, ICASSP, 2013.
3. B. Snider and A. Kain, “Automatic Classification of Breathing Sounds during Sleep”, ICASSP, 2013.
##### 2012
1. S. Mohammadi, A. Kain, J. van Santen, “Making Conversational Vowels More Clear”, Proceedings of Interspeech, 2012.
2. E. Morley, E. Klabbers, J. van Santen, A. Kain, S. Mohammadi, “Synthetic F0 can Effectively Convey Speaker ID in Delexicalized Speech”, Interspeech, 2012.
##### 2011
1. E. Morley, J. van Santen, E. Klabbers, A. Kain, “F0 Range and Peak Alignment across Speakers and Emotions”, ICASSP, 2011.
2. B. Bush, J.-P. Hosom, A. Kain, and A. Amano-Kusumoto, “Using a genetic algorithm to estimate parameters of a coarticulation model”, Interspeech, 2011.
##### 2010
1. A. Kain and T. Leen, “Compression of Line Spectral Frequency Parameters using the Asynchronous Interpolation Model”, Proceedings of 7th ISCA Workshop on Speech Synthesis, September 2010.
2. A. Kain and J. van Santen, “Frequency-domain delexicalization using surrogate vowels”, Interspeech, 2010.
3. A. Amano-Kusumoto, J.-P. Hosom, and A. Kain, “Speaking style dependency of formant targets”, Interspeech, 2010.
4. E. Klabbers, A. Kain, and J. van Santen, “Evaluation of speaker mimic technology for personalizing SGD voices”, Interspeech, 2010.
##### 2009
1. A. Kain, J. van Santen, “Using Speech Transformation to Increase Speech Intelligibility for the Hearing- and Speaking-impaired”, Proceedings of ICASSP, April 2009.
2. Q. Miao, A. Kain, J. van Santen, “Perceptual Cost Function for Cross-fading Based Concatenation”, Proceedings of Interspeech, 2009.
3. R. Moldover, A. Kain, “Compression of Line Spectral Frequency Parameters with Asynchronous Interpolation”, Proceedings of ICASSP, April 2009.
##### 2008
1. A. Kain, A. Amano-Kusumoto, and J.-P. Hosom, “Hybridizing Conversational and Clear Speech to Determine the Degree of Contribution of Acoustic Features to Intelligibility”, Journal of the Acoustical Society of America, vol. 124, issue 4, October 2008, pp. 2308–2319.
##### 2007
1. A. Kain, J. Hosom, X. Niu, J. van Santen, M. Fried-Oken, J. Staehely, “Improving the Intelligibility of Dysarthric Speech”, Speech Communication, vol. 49, issue 9, September 2007, pp. 743–759.
2. E. Klabbers, J. van Santen, A. Kain, “The Contribution of Various Sources of Spectral Mismatch to Audible Discontinuities in a Diphone Database”, IEEE Transactions on Audio, Speech, and Language Processing Journal, Volume 15, Issue 3, pp. 949–956, March 2007.
3. A. Kusumoto, A. Kain, P. Hosom, and J. van Santen, “Hybridizing Conversational and Clear Speech”, Proceedings of Interspeech, August 2007.
4. A. Kain, Q. Miao, J. van Santen, “Spectral Control in Concatenative Speech Synthesis”, Proceedings of 6th ISCA Workshop on Speech Synthesis, August 2007.
5. A. Kain and J. van Santen, “Unit-Selection Text-to-Speech Synthesis Using an Asynchronous Interpolation Model”, Proceedings of 6th ISCA Workshop on Speech Synthesis, August 2007.
##### 2006
1. X. Niu, A. Kain, J. van Santen, “A Noninvasive, Low-cost Device to Study the Velopharyngeal Port During Speech and Some Preliminary Results”, Proceedings of Interspeech, September 2006.
##### 2005
1. J. van Santen, A. Kain, E. Klabbers, and T. Mishra, “Synthesis of Prosody using Multi-level Unit Sequences”, Speech Communication Journal, vol. 46, issues 3–4, pp. 365–375, July 2005.
2. X. Niu, A. Kain, J. van Santen, “Estimation of the Acoustic Properties of the Nasal Tract during the Production of Nasalized Vowels”, Proceedings of EUROSPEECH, September 2005.
##### 2004
1. A. Kain, X. Niu, J. Hosom, Q. Miao, J. van Santen, “Formant Re-synthesis of Dysarthric Speech”, Proceedings of 5th ISCA Workshop on Speech Synthesis, June 2004.
2. J. van Santen, A. Kain, and E. Klabbers, “Synthesis by Recombination of Segmental and Prosodic Information”, Speech Prosody 2004, March 2004.
3. H. Duxans, A. Bonafonte, A. Kain, and J. van Santen, “Including Dynamic and Phonetic Information in Voice Conversion Systems”, Proceedings of ICSLP, October 2004.
##### 2003
1. J. Hosom, A. Kain, T. Mishra, J. van Santen, M. Fried-Oken, J. Staehely, “Intelligibility of modifications to dysarthric speech”, Proceedings of ICASSP, May 2003.
2. A. Kain and J. van Santen, “A speech model of acoustic inventories based on asynchronous interpolation”, Proceedings of EUROSPEECH, pp. 329–332, August 2003.
3. J. van Santen, L. Black, G. Cohen, A. Kain, E. Klabbers, T. Mishra, J. de Villiers, X. Niu, “Applications of computer generated expressive speech for communication disorders”, Proceedings of EUROSPEECH, pp. 1657–1660, August 2003.
##### 2002
1. A. Kain and J. van Santen, “Compression of Acoustic Inventories using Asynchronous Interpolation”, Proceedings of IEEE Workshop on Speech Synthesis, pp. 83–86, September 2002.
2. J. van Santen, J. Wouters, and A. Kain, “Modification of Speech: A Tribute to Mike Macon”, Proceedings of IEEE Workshop on Speech Synthesis, September 2002.
##### 2001
1. A. Kain and M. Macon, “Design and Evaluation of a Voice Conversion Algorithm based on Spectral Envelope Mapping and Residual Prediction”, Proceedings of ICASSP, May 2001.
##### 2000 and earlier
1. A. Kain and Y. Stylianou, “Stochastic Modeling of Spectral Adjustment for High Quality Pitch Modification”, Proceedings of ICASSP, June 2000, vol. 2, pp. 949–952.
2. J. House, A. Kain, and J. Hines, “ESP - Metaphor for learning: an evolutionary algorithm”, Proceedings of GECCO 2000, Las Vegas, NV.
3. A. Kain and M. Macon, “Personalizing a speech synthesizer by voice adaptation”, Third ESCA / COCOSDA International Speech Synthesis Workshop, November 1998, pp. 225–230.
4. A. Kain and M. Macon, “Text-to-speech voice adaptation from sparse training data”, Proceedings of ICSLP, November 1998, vol. 7, pp. 2847–50.
5. A. Kain and M. Macon, “Spectral Voice Conversion for Text-to-Speech Synthesis”, Proceedings of ICASSP, May 1998, vol. 1, pp. 285–288.
6. S. Sutton, R. Cole, J. de Villiers, J. Schalkwyk, P. Vermeulen, M. Macon, Y. Yan, E. Kaiser, B. Rundle, K. Shobaki, P. Hosom, A. Kain, J. Wouters, D. Massaro, M. Cohen, “Universal Speech Tools: The CSLU Toolkit”, Proceedings of ICSLP, November 1998, vol. 7, pp. 3221–24.
7. N. Malayath, H. Hermansky, A. Kain and R. Carlson, “Speaker-independent Feature Extraction by Oriented Principal Component Analysis”, Proceedings of EUROSPEECH 1997.

#### Abstracts

1. J.-P. Hosom, A. Kain, and B. Bush, “Towards the recovery of targets from coarticulated speech for automatic speech recognition”, The Journal of the Acoustical Society of America, 130(4), page 2407, 2011.
2. A. Kain, "Speech transformation: Increasing intelligibility and changing speakers", Journal of the Acoustical Society of America, 126(4), page 2205, 2009.

#### Ph.D. Thesis

“High Resolution Voice Transformation”, OGI School of Science & Engineering, 2001.