# Alexander Kain — Curriculum Vitae

ORCID 0000-0001-5807-9311
http://cslu.ohsu.edu/~kain

## 1 Present Positions

### Oregon Health & Science University

Associate Professor
Computer Science & Electrical Engineering (CSEE)
Center for Spoken Language Understanding (CSLU)
Institute on Development and Disability (IDD)
Department of Otolaryngology and Department of Pediatrics
Institute on Development & Disability (IDD)
School of Medicine (SOM)
Oregon Health & Science University (OHSU)
3181 SW Sam Jackson Park Road
Portland, Oregon 97239-3098
Email: kaina@ohsu.edu
Phone: (503) 349-3750

### BioSpeech, Inc.

Chief Technology Innovation Officer (CTIO)
9946 SW 61st Ave
Portland, Oregon 97239-3098
Email: kain@biospeech.com

## 3 Professional Experience

### Other

• 2018–present, Chief Technology Innovation Officer (CTIO)
2005–2018, Chief Scientist
BioSpeech, Inc., Portland, OR
• 2001–2008, Lead Speech Synthesis Technologist
Sensory, Inc., Santa Clara, CA
• 1999, Visiting Researcher
AT&T Research Labs, Florham Park, NJ

## 4 Scholarship

### 4.1 Areas of Research/Scholarly Interest

I am interested in innovation, application, and education in computational biomedicine, drawing on many years of machine learning and biological signal processing experience. Recent innovations include:
• Representation learning is useful for transferring learned knowledge to tasks for which few or no examples are given but a task representation exists. Specifically, many traditional approaches to high-dimensional regression are deficient in representing the global variance of the targets without additional post-processing. We are exploring how a novel joint autoencoder architecture can address this shortcoming.
• We are experimenting with a novel decomposing autoencoder architecture to decompose parallel data streams into separate aspects such as “content” vs. “style”.
Recent applications include:
• Phonological disorders affect 10% of preschool and school-age children, adversely affecting their communication, academic performance, and interaction level. Effective pronunciation training requires prolonged supervised practice and interaction. Unfortunately, many children have no access or only limited access to a speech-language pathologist. Computer-assisted pronunciation training has the potential to be a highly effective teaching aid; however, to date such systems remain incapable of identifying pronunciation errors with sufficient accuracy. We are experimenting with a phonetic-feature-based speech recognizer that allows for a very detailed classification of speech sounds.
• Speech Intelligibility is the degree to which listeners can understand a speech signal's message. Historically, the specific acoustic sources of intelligibility are poorly understood, and automatic approaches to modify the degree of intelligibility were limited. We invented a hybridization approach that allows for precisely measuring the degree of contribution of one or more acoustic features to speech intelligibility. We applied this approach to find the most relevant acoustic features that cause the intelligibility improvement in clearly-spoken typical and dysarthric speech. This allows a principled study of different remedial strategies. We are also creating algorithms that automatically improve the intelligibility of dysarthric or conversational speech signals, using approaches from speech analysis, machine learning, and speech synthesis. These algorithms may be instrumental for next-generation hearing- and speaking-aids.
• Sleep-disordered breathing is a highly prevalent condition associated with many adverse health problems. As the current means of diagnosis (polysomnography) is obtrusive and ill-suited for mass screening of the population, we explore a minimal-contact, automatic approach that uses acoustics-based methods in conjunction with pulse oximetry.
As an educator in machine learning, I have created the curriculum for, and regularly teach, the course CS 627 Data Science Programming. I also teach EE 682 Digital Signal Processing and EE 658 Speech Signal Processing. My lectures make use of a Python-based Jupyter notebook approach.

### 4.2 Current Collaborators

• Jeanne-Marie Guise (OHSU): Epidemiology of Preventable Safety Events
• Jun Wang (University of Texas, Austin): Silent Speech Interface
• Miranda Lim (OHSU): REM Sleep Behavior Disorder
• Kris Tjaden, Fredrik Van Brenk (University at Buffalo): dysarthria
• Lina Reiss (OHSU): cochlear implants
• Deanna Britton (PSU/OHSU): dystussia
• Michelle Molis (VA): speech intelligibility
• Derek Lam (OHSU): airway obstruction
• Frederick Shic (University of Washington), Tyler Duffield (OHSU), Trevor Hall (OHSU): ASD and Virtual Reality

### 4.3 Grants

#### Planned/Pending

1. National Institutes of Health R01, “Silent Speech Interfaces”, PI: Wang (OHSU), awarded.
2. National Institutes of Health R01, “Binaural Spectral Integration with Hearing Loss and Hearing Devices”, PI: Reiss (OHSU), scored 36 (23rd percentile) on 2018/03/05; resubmission is planned.
3. National Institutes of Health R21, “Utility of cough-related airflow measures in determining laryngeal impairment”, PI: Britton (OHSU), scored on 2017/06/12; resubmission is planned.
4. National Institutes of Health R01, “Epidemiology of Preventable Safety Events in pre-hospital EMS for Children”, PI: Jeanne-Marie Guise (OHSU).
5. National Institutes of Health R01, “Computer-Assisted Pronunciation Analysis and Training for children with speech sound disorders”, PI: Kain (OHSU), in development.
6. National Institutes of Health R01, “Automatic Detection of REM Sleep Behavior Disorder”, PI: Lim (OHSU), in development.

#### Current

1. 2015/09/01–2020/08/31: National Institutes of Health 2R01DC004689-11A1, “Therapeutic Approaches to Dysarthria: Acoustic and Perceptual Correlates”, PIs: Tjaden (State University of New York at Buffalo) and Kain (OHSU). 90% of the one million Americans living with idiopathic Parkinson’s disease (PD) and 50% of the 500,000 Americans living with Multiple Sclerosis (MS) will experience dysarthria at some point during the disease. The perceptual sequelae of dysarthria have devastating consequences for quality of life and participation in society by virtue of their effect on social and psychological variables such as employment, leisure activities and relationships. Knowledge of therapy techniques for maximizing perceived speech adequacy, as indexed by intelligibility, therefore is of paramount importance. As a result of our incomplete knowledge of the comparative merits of dysarthria therapy techniques and their variants, however, the choice of a particular technique is not based on a rigorous research base, but is based on either trial and error or the clinician’s educational and experience biases. The proposed project will address these barriers by comparing the acoustic and perceptual consequences of rate reduction, increased vocal intensity and clear speech variants in MS and PD. Our approach is to employ established acoustic measures and perceptual paradigms as well as a state-of-the-art speech re-synthesis technique that will permit conclusions concerning the underlying speech production characteristics, as inferred from the acoustic signal, causing improved intelligibility. Amount: \$3.3M.
2. 2017/05/01–2020/04/30: National Institutes of Health 4R44DC015145-02, “SBIR Phase 2: Prosody Assessment Toolbox”, PI: Lindaas-Hamilton (BioSpeech). Abnormal receptive or expressive prosody is present in a wide range of disorders, including Autism Spectrum Disorder (ASD), Cognitive Impairment, Down’s syndrome, dysarthria, Parkinson’s disease, depression, schizophrenia, aphasia, Alzheimer’s disease, TBI, Language Impairment, bipolar disorder, ADHD, and PTSD. The characteristics of these prosodic abnormalities and underlying brain dysfunction are still largely unknown, due to the dearth of instruments for assessing prosodic deficits. Building on our broad expertise in computerized prosody assessment, we propose to build a system for researchers that performs automated scoring and acoustic analysis of expressive prosody, allows stimuli to be acoustically modified for detailed perceptual assessment of receptive prosody, and can be extended by researchers to include novel tasks. My role: I provide signal processing and machine-learning expertise, and assist with all scientific aspects of the project. Amount: \$706K.

#### Completed

1. 2015/12/10–2019/04/30: National Institutes of Health 5R01DC013996-02, “Automatic Voice-Based Assessment of Language Abilities”, PI: van Santen (OHSU). Since untreated language disorder can lead to serious behavioral and educational problems, large-scale early language assessment is urgently needed not only for early identification of language disorder but also for planning interventions and tracking progress. However, such large-scale efforts would pose a large burden on professional staff and on other scarce resources. As a result, clinicians, educators, and researchers have argued for the use of computer-based assessment. Recently, progress has been made with computer-based language assessment, but it has been limited to language comprehension. One contributing factor is that a key technology needed for this, Automatic Speech Recognition (ASR), is perceived as inadequate for accurate scoring of language tests since even the best ASR systems have word error rates in excess of 20%. However, this perception is based on a limited perspective of how ASR can be used for assessment, in which a general-purpose ASR system provides an (often inaccurate) transcript of the child's speech, which then would be scored automatically according to conventional rules. We take an alternative perspective, and propose an innovative approach that comprises two core concepts: (1) creating special-purpose, test-specific ASR systems whose search space is carefully matched to the space of responses a test may elicit, and (2) integrating these systems with machine-learning-based scoring algorithms whereby the latter operate not on the final, best transcript generated by the ASR system, but on the rich layers of intermediate representations that the ASR system computes in the process of recognizing the input speech.
My role: (1) Developing automatic voice-based scoring methods for each language test, (2) developing pronunciation screening methods to detect atypical speech, and (3) evaluating the accuracy of automatic voice-based scoring, stopping, and pronunciation screening systems, and comparing the TD group with groups with neuro-developmental disorders. Amount: \$638K.
2. 2014/09/01–2017/08/31: National Institutes of Health 1R43MH101978-01A1, “System for automatic classification of rodent vocalizations”, PI: Lahvis (BioSpeech). Development of treatments for neuropsychiatric disorders presents a formidable challenge. To advance drug discovery, assessments of laboratory rodents are widely employed by academia and industry to model neuropsychiatric disorders. Substantial recent advances in digital recordings of rodent ultrasonic vocalizations (USVs) have engendered interest in assessment of USVs to measure behavior change. A practical obstacle to USV assessment is that they are classified manually. We propose a software system that allows a user to rapidly interrogate recordings of rodent USVs for prosodic content. My role: I provide signal processing and machine-learning expertise, and assist with all scientific aspects of the project. Amount: \$324K.
3. 2013/12/01–2017/08/31: National Institutes of Health 1R43DA037588-01A1, “Screening for Sleep Disordered Breathing with Minimally Obtrusive Sensors”, PI: Snider (BioSpeech). Sleep disordered breathing (SDB) is believed to be a widespread, under-diagnosed condition associated with detrimental health problems, at a high cost to society. The current gold standard for diagnosis of SDB is a time-consuming, expensive, and obtrusive (requiring many attached wires) sleep study, or polysomnography (PSG). The immediate objective of our research is to develop and evaluate a hardware design and a set of algorithms for automatically detecting obstructive, central, or mixed apneas and hypopneas from acoustic, peripheral oxygen saturation (SpO2), and pulse rate data, using an ambient microphone and a wireless pulse oximeter. The long-term goal is to create a low-cost, easy-to-operate, minimally obtrusive, at-home device that can be used for early and frequent screening for SDB in patients' homes, significantly increasing patient comfort while capturing more representative sleep data compared to a clinical sleep study. In collaboration with Chad Hagen, M.D., at the Sleep Disorders Program at OHSU, we aim to (1) develop a screening system by selecting minimally obtrusive sensor hardware and extending state-of-the-art algorithms for automatically detecting SDB from acoustic, SpO2, and pulse rate data; (2) collect patient data in the sleep lab and at home from representative populations using the proposed system; (3) determine the screening accuracy by comparing the performance of the proposed system on the collected data against standard PSG-derived clinical results; and (4) measure the usability of an at-home screening device by the target population, by asking subjects who participated in the at-home data collection to complete a survey on various aspects of the setup and operation of the proposed system.
My role: I provide signal-processing and machine-learning expertise, and assist with all scientific aspects of the project. Amount: \$205K.
4. 2016/01/01–2017/04/01: National Institutes of Health 1R44DC015145-01, “SBIR Phase 1: Prosody Assessment Toolbox”, PI: Connors (BioSpeech). Current instruments for assessing prosodic deficits are decades behind those that are used for clinical assessment of other aspects of language. We propose to build a system that addresses these shortcomings. The system performs automated scoring and acoustic analysis of expressive prosody, allows stimuli to be acoustically modified for detailed perceptual assessment of receptive prosody, and can be extended by researchers to include novel tasks. It is evaluated with individuals who have ASD (adults and children), DS (adults and children), or MCI, and a typically developing control group. My role: I provide signal processing and machine-learning expertise, and assist with all scientific aspects of the project. Amount: \$1.6M.
5. 2010/09/27–2016/09/30: National Science Foundation BCS-1027834, "Computational Models for the Automatic Recognition of Non-Human Primate Social Behaviors", PI: Kain (OHSU). To develop methods that will permit researchers to remotely and automatically monitor behavior of primates and other highly social animals. Amount: \$578K.
6. 2012/04/01–2015/03/31: National Institutes of Health 5R44DC009515-03, "SBIR Phase 2: Computer-based auditory skill building program for aural (re)habilitation", PI: Connors (BioSpeech). To extend an adaptive computer-guided software program that focuses on learning phoneme discrimination and identification. See Phase I description. Amount: \$400K.
7. 2011/12/01–2015/08/31: National Institutes of Health R21DC012139, "Computer-Based Pronunciation Analysis for Children with Speech Sound Disorders", PI: Kain (OHSU). In this work we are developing speech-production assessment and pronunciation training tools for children with speech sound disorders. To date, computer-assisted pronunciation training has not yet been successfully extended to help children with speech sound disorders, primarily because of a lack of accuracy in phoneme-level analysis of the speech signal. My role: I am creating a set of algorithms that will reliably identify and score the intelligibility of a phoneme within an isolated target word, providing immediate, relevant, and understandable feedback about pronunciation errors. The use of human perceptual data during training is an important and new component of the proposed approach. As PI, I am also responsible for overall project supervision and management. Amount: \$416K.
8. 2010/06/09–2015/05/31: National Science Foundation IIS-0964102, "Semi-Supervised Discriminative Training of Language Models", PI: Kain (OHSU). To conduct fundamental research in statistical language modeling to improve human language technologies, including automatic speech recognition (ASR) and machine translation (MT). Amount: \$519K.
9. 2010/05/15–2015/04/30: National Science Foundation IIS-0964468, "HCC: Medium: Synthesis and Perception of Speaker Identity", PI: Kain (OHSU). Millions of Americans with impaired or absent speech communication ability rely on Augmentative and Alternative Communication devices with voice output (Speech Generating Devices, or SGDs) to communicate. A psychologically important and desirable feature is the ability to speak with one's own voice, i.e., the ability for the SGD to produce speech that mimics the individual's pre-morbid speech or speech that the individual may be able to intermittently produce. However, current text-to-speech (TTS) systems can only create speech with one or very few supplied speaker characteristics, and cannot be trained to take on the user's voice. My role: Together with Ph.D. students and co-investigators, I am creating a TTS synthesis system that generates speech that sounds like that of a specific individual (Speaker Identity Synthesis, or SIS). In the process we are building and evaluating analysis and synthesis models of the relevant acoustic features, including pitch, duration, and spectrum. Since the system includes a trainability component, this project also involves use of advanced mapping technology in the form of a joint-density Gaussian mixture model. I first proposed this approach in a 1998 publication which has since been cited over 370 times. As PI, I am also responsible for overall project supervision, management, and mentorship of graduate student Mohammadi. Amount: \$905K.
10. 2011/04/01–2012/03/31: National Institutes of Health 5R42DC008712, "User Adaptation of AAC Device Voices - Phase 2", PI: Klabbers (BioSpeech). Developing and evaluating voice transformation and prosody modification technologies to customize synthetic voices in AAC devices, mimicking the individual user's pre-morbid speech. See Phase 1 description. Amount: \$410K.
12. 2009/09/01–2013/08/31: National Science Foundation IIS-0915754, "RI: Small: Modeling Coarticulation for Automatic Speech Recognition", PI: Kain (OHSU). We have developed a data-driven, triphone formant trajectory model and methodology for estimating its parameters. In this model, formant targets are speaker dependent, but independent of speaking style. We have validated this model using perceptual listening tests. An analysis of conversationally and clearly spoken speech confirmed that (1) formant trajectories in clear vowels reach their targets more frequently, (2) formants show considerable asynchronicity, and (3) phoneme formant targets approximate their expected values. We also found preliminary evidence that targets derived from clear speech alone perform better at modeling both styles than targets from conversational speech. Having created and validated this model, we are now in the process of applying the approach to disordered speech, paving the way for an objective diagnosis of the degree of coarticulation of dysarthria. Another application is an objective evaluation of the effectiveness of specific speech interventions for certain kinds of dysarthria, e.g., the Lee Silverman Voice Treatment. Finally, this research may also provide an avenue for automatically transforming conversationally-spoken speech to sound as if it had been spoken clearly, thus increasing its intelligibility. A real-time, transparent version of this algorithm would be a desirable feature in many general telecommunications devices. My role: As PI, I am responsible for all aspects of the project, including overall project supervision and management, as well as mentoring of graduate student Bush. Amount: \$466.
13. 2009/07/15–2012/06/30: National Science Foundation IIS-0905095, "HCC: Automatic detection of atypical patterns in cross-modal affect", PI: van Santen (OHSU). The expression of affect in face-to-face situations requires the ability to generate a complex, coordinated, cross-modal affective signal, having gesture, facial expression, vocal prosody, and language content modalities. This ability is compromised in neurological disorders such as Parkinson's disease and autism spectrum disorder (ASD). The long term goal is to build computer-based interactive systems for remediation of poor affect communication and diagnosis of the underlying neurological disorders based on analysis of affective signals. A requirement for such systems is technology to detect atypical patterns in affective signals. We developed a play situation for eliciting affect and collected audio-visual data from approximately 60 children between the ages of 4 and 7 years, half of them with ASD and the other half constituting a control group of typically developing children. We labeled the data on relevant affective dimensions, developed algorithms for the analysis of affective incongruity, and then tested the algorithms against the labeled data in order to determine their ability to differentiate between ASD and typical development. My role: I created special delexicalized speech stimuli, using a novel delexicalization algorithm that rendered the lexical content of an utterance unintelligible while preserving important acoustic prosodic cues. Preference tests showed that the proposed method preserved drastically more speaker identity, and sounded more natural than conventional methods. These delexicalized speech stimuli were used in perceptual tests to exclude the effect of lexical content on affect.
14. 2009/07/17–2012/06/30: National Institutes of Health 5R21DC010035, "Quantitative Modeling of Segmental Timing in Dysarthria", PI: van Santen (OHSU). The project seeks to apply a quantitative modeling framework to segment durations in sentences produced by speakers with a variety of neurological diagnoses and dysarthrias. My role: I was responsible for software development for custom recording of speech data and for the extension of my previously published hybridization algorithm for the purposes of creating special perceptual speech stimuli.
15. 2008–2009: Nancy Lurie Marks Family Foundation award, "In Your Own Voice: Personal AAC Voices for Minimally Verbal Children with Autism Spectrum Disorder", PI: van Santen (OHSU). My role: I performed research and development to adapt a text-to-speech voice to sound like a particular child's voice, a task made particularly challenging by the difficulty of extracting reliable acoustic features from children's speech.
16. 2007/09/01–2011/08/31: National Science Foundation IIS-0713617, "HCC: High-quality Compression, Enhancement, and Personalization of Text-to-Speech Voices", PI: Kain (OHSU). My role: Together with Ph.D. students and co-investigators, I developed text-to-speech (TTS) technologies that focus on elimination of concatenation errors and improved accuracy in the areas of coarticulation, degree of articulation, prosodic effects, and speaker characteristics, using an asynchronous interpolation model that Jan van Santen and I proposed in 2002. These algorithmic advances added to the general acceptability of Speech Generating Devices (SGDs), used by individuals with impaired or absent speech communication.
17. 2007/01/01–2008/06/30: National Institutes of Health 1R41DC008712, "User Adaptation of AAC Device Voices - Phase 1", PI: van Santen (BioSpeech). Speech communication ability is impaired or absent in millions of Americans due to neurological disorders and diseases and to trauma, including autism, Parkinson's disease, and stroke. Augmentative and Alternative Communication (AAC) devices that are operated via switches, keyboards, and a broad range of other input devices, and that have synthetic speech as output, are often the only manner in which these individuals can communicate. A psychologically important feature that no currently available system has is the ability to speak with the user's voice, i.e., the ability to produce speech that mimics the individual's pre-morbid speech or speech that the individual may be able to intermittently produce. This project used voice transformation (VT) technology to accomplish this goal. My role: I developed and evaluated voice transformation and prosody modification technologies to customize synthetic voices using concatenative speech synthesis technologies, with the aim of mimicking the individual user's pre-morbid speech.
18. 2006/09/01–2008/03/31: National Institutes of Health 1R41DC007240, "Voice Transformation for Dysarthria - Phase 1", PI: van Santen (BioSpeech). Dysarthria is a motor speech disorder due to weakness or poor coordination of the speech muscles. Affected muscles include the lungs, larynx, oro- and nasopharynx, soft palate, and articulators (lips, tongue, teeth, and jaw). The degree to which these muscle groups are compromised determines the particular pattern of speech impairment. For example, poor lung function affects the overall volume or loudness, while problems with specific articulators may cause mispronunciations of certain phonemes. There is a great variety of diseases that can cause dysarthria, including Parkinson’s, Multiple Sclerosis, and strokes. My role: I continued development of software that transforms speech compromised by dysarthria into easier-to-understand and more natural-sounding speech. In addition, I designed a hardware configuration that allowed the software to reside on a wearable computer, with a headset microphone as input and powered speaker as output, giving the user full mobility while wearing the speaking-aid.
19. 2005/01/10–2010/12/31: National Institutes of Health 5R01DC007129, "Expressive cross-modal affect integration in Autism", PI: van Santen (OHSU). Autistic Spectrum Disorders (ASD) form a group of neuropsychiatric conditions whose core behavioral features include impairments in reciprocal social interaction, in communication, and repetitive, stereotyped, or restricted interests and behaviors. The importance of prosodic deficits in the adaptive communicative competence of speakers with ASD, as well as for a fuller understanding of the social disabilities central to these disorders is generally recognized; yet current studies are few in number and have significant methodological limitations. The objective of the proposed project is to detail prosodic deficits in young speakers with ASD through a series of experiments that address these disabilities and related areas of function. My role: I developed a delexicalization algorithm that rendered the lexical content of an utterance unintelligible, while preserving important acoustic prosodic cues.
20. 2005/01/01–2006/06/30: National Science Foundation IIP-0441125, "STTR Phase 1: Small Footprint Speech Synthesis", PI: Kain (BioSpeech). Text-to-speech (TTS) systems have recognized societal benefits for universal access, education, and information access by voice. For example, TTS-based augmentative devices are available for individuals who have lost their voice; and reading machines for the blind have been available for several decades. My role: I developed and implemented a novel algorithm that led to dramatic decreases in disk and memory requirements at a given speech quality level and minimization of the amount of voice recordings needed to create a new synthetic voice. The latter point enabled building personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds or for individuals who are about to undergo surgery that will irreversibly alter their speech.
21. 2001/10/01–2005/09/30: National Science Foundation IIS-0117911, "Making Dysarthric Speech Intelligible", PI: van Santen (OHSU). My role: I developed software that transforms speech compromised by dysarthria into easier-to-understand and more natural-sounding speech. The strategy for improving intelligibility is the manipulation of a small set of highly relevant speech features; specifically the energy, pitch, and formant frequencies of an input speech waveform. Pitch and energy are appropriately smoothed, and formant frequencies are mapped with a joint-density Gaussian mixture model, a technique I first introduced in 1998 that since has become the most often used mapping technique in the field. Results from perceptual tests indicated that the transformation improved intelligibility, and that the accompanying removal of the vocal fry improved perceived naturalness.

### 4.4 Publications/Creative Work

In the following lists, students or interns under my mentorship are underlined.

#### Peer-reviewed Journal Articles and 4–5 page Conference Papers

##### In preparation / submitted
• D. Britton, A. Kain, Y-W. Chen, J. Wiedrick, J. O. Benditt, A. L. Merati, D. Graville, “Extreme sawtooth sign in motor neuron disease (MND) suggests laryngeal resistance to forced airflow”, submitted to Journal of Speech, Language, and Hearing Research
• N. Sathe, A. Kain, and L. Reiss, “Fusion and Identification of Dichotic Consonants in Normal-Hearing and Hearing-Impaired Listeners”
• B. Bush, A. Kain, “A Continuous Speech Coarticulation Model and Its Application to Conversational and Clear Speech”
• S. Mohammadi, A. Kain, “Joint Deep Autoencoders for High-Dimensional Regression and the Application to Voice Conversion”
• A. Kain, T. Dinh, Y. Chen, J. Melanson, K. Tjaden, “Intra- and Inter-Speaker Sub-Segmental Duration Conversion”
• K. Tjaden, A. Kain, G. Wilding, “Clear Speech Variants: An Investigation of Intelligibility in Parkinson’s Disease”
• B. Snider, A. Kain, “Automatic Sleep Apnea Detection”
##### 2019
1. F. Van Brenk, A. Kain, K. Tjaden, “Identifying Acoustic Correlates of Speaker-Dependent Variation in Slowed Speech Intelligibility: A Hybridization Approach”, International Congress of Phonetic Sciences (ICPhS), 2019.
2. D. Tuan, A. Kain, K. Tjaden, “Using a Manifold Vocoder for Spectral Voice and Style Conversion”, Interspeech, 2019.
##### 2018
1. S. Dudy, S. Bedrick, M. Asgari, A. Kain, “Automatic Analysis of Pronunciations for Children with Speech Sound Disorders”, Computer Speech & Language Journal, 2018.
##### 2017
1. A. Kain, M. Del Giudice, K. Tjaden, “A Comparison of Sentence-level Speech Intelligibility Metrics”, Interspeech, 2017.
2. S. Mohammadi, A. Kain, “Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and Conversion”, Interspeech, 2017.
3. S. Mohammadi, A. Kain, “An Overview of Voice Conversion Systems”, Speech Communication, 2017.
##### 2016
1. S. Mohammadi, A. Kain, “A Voice Conversion Mapping Function based on a Stacked Joint-Autoencoder”, Interspeech, 2016.
2. B. Snider and A. Kain, “Classification of Respiratory Effort and Disordered Breathing during Sleep from Audio and Pulse Oximetry Signals”, ICASSP, 2016.
##### 2015
1. M. Langarani, J. van Santen, S. Mohammadi, A. Kain, “Data-driven Foot-based Intonation Generator for Text-to-Speech Synthesis”, Interspeech, 2015.
2. S. Mohammadi, A. Kain, “Semi-supervised Training of a Voice Conversion Mapping Function using a Joint-Autoencoder”, Interspeech, 2015.
3. S. Dudy, M. Asgari, and A. Kain, “Pronunciation Analysis for Children with Speech Sound Disorders”, IEEE Engineering in Medicine and Biology Society (EMBC), Milan, 2015. (PMC4710861).
##### 2014
1. A. Amano-Kusumoto, J.-P. Hosom, A. Kain, J. Aronoff, “Determining the relevance of different aspects of formant contours to intelligibility”, Speech Communication, vol. 59, April 2014.
2. K. Tjaden, A. Kain, J. Lam, “Hybridizing Conversational and Clear Speech to Investigate the Source of Increased Intelligibility in Parkinson’s Disease”, Journal of Speech, Language, and Hearing Research, Volume 57, August 2014.
3. S. Mohammadi, A. Kain, “Voice conversion using Deep Neural Networks with speaker-independent pre-training”, IEEE Spoken Language Technology Workshop (SLT), 2014.
4. B. Bush, A. Kain, “Modeling Coarticulation in Continuous Speech”, Interspeech, 2014.
##### 2013
1. S. Mohammadi, A. Kain, “Transmutative Voice Conversion”, ICASSP, 2013.
2. B. Bush, A. Kain, “Estimating Phoneme Formant Targets and Coarticulation Parameters of Conversational and Clear Speech”, ICASSP, 2013.
3. B. Snider and A. Kain, “Automatic Classification of Breathing Sounds during Sleep”, ICASSP, 2013.
##### 2012
1. S. Mohammadi, A. Kain, J. van Santen, “Making Conversational Vowels More Clear”, Proceedings of Interspeech, 2012.
2. E. Morley, E. Klabbers, J. van Santen, A. Kain, S. Mohammadi, “Synthetic F0 can Effectively Convey Speaker ID in Delexicalized Speech”, Interspeech, 2012.
##### 2011
1. E. Morley, J. van Santen, E. Klabbers, A. Kain, “F0 Range and Peak Alignment across Speakers and Emotions”, ICASSP, 2011.
2. B. Bush, J.-P. Hosom, A. Kain, and A. Amano-Kusumoto, “Using a genetic algorithm to estimate parameters of a coarticulation model”, Interspeech, 2011.
##### 2010
1. A. Kain and T. Leen, “Compression of Line Spectral Frequency Parameters using the Asynchronous Interpolation Model”, Proceedings of 7th ISCA Workshop on Speech Synthesis, September 2010.
2. A. Kain and J. van Santen, “Frequency-domain delexicalization using surrogate vowels”, Interspeech, 2010.
3. A. Amano-Kusumoto, J.-P. Hosom, and A. Kain, “Speaking style dependency of formant targets”, Interspeech, 2010.
4. E. Klabbers, A. Kain, and J. van Santen, “Evaluation of speaker mimic technology for personalizing SGD voices”, Interspeech, 2010.
##### 2009
1. A. Kain, J. van Santen, “Using Speech Transformation to Increase Speech Intelligibility for the Hearing- and Speaking-impaired”, Proceedings of ICASSP, April 2009.
2. Q. Miao, A. Kain, J. van Santen, “Perceptual Cost Function for Cross-fading Based Concatenation”, Proceedings of Interspeech, 2009.
3. R. Moldover, A. Kain, “Compression of Line Spectral Frequency Parameters with Asynchronous Interpolation”, Proceedings of ICASSP, April 2009.
##### 2008
1. A. Kain, A. Amano-Kusumoto, and J.-P. Hosom, “Hybridizing Conversational and Clear Speech to Determine the Degree of Contribution of Acoustic Features to Intelligibility”, Journal of the Acoustical Society of America, vol. 124, issue 4, October 2008, pp. 2308–2319.
##### 2007
1. A. Kain, J. Hosom, X. Niu, J. van Santen, M. Fried-Oken, J. Staehely, “Improving the Intelligibility of Dysarthric Speech”, Speech Communication, vol. 49, issue 9, September 2007, pp. 743–759.
2. E. Klabbers, J. van Santen, A. Kain, “The Contribution of Various Sources of Spectral Mismatch to Audible Discontinuities in a Diphone Database”, IEEE Transactions on Audio, Speech, and Language Processing Journal, Volume 15, Issue 3, pp. 949–956, March 2007.
3. A. Kusumoto, A. Kain, P. Hosom, and J. van Santen, “Hybridizing Conversational and Clear Speech”, Proceedings of Interspeech, August 2007.
4. A. Kain, Q. Miao, J. van Santen, “Spectral Control in Concatenative Speech Synthesis”, Proceedings of 6th ISCA Workshop on Speech Synthesis, August 2007.
5. A. Kain and J. van Santen, “Unit-Selection Text-to-Speech Synthesis Using an Asynchronous Interpolation Model”, Proceedings of 6th ISCA Workshop on Speech Synthesis, August 2007.
##### 2006
1. X. Niu, A. Kain, J. van Santen, “A Noninvasive, Low-cost Device to Study the Velopharyngeal Port During Speech and Some Preliminary Results”, Proceedings of Interspeech, September 2006.
##### 2005
1. J. van Santen, A. Kain, E. Klabbers, and T. Mishra, “Synthesis of Prosody using Multi-level Unit Sequences”, Speech Communication Journal, vol. 46, issues 3–4, pp. 365–375, July 2005.
2. X. Niu, A. Kain, J. van Santen, “Estimation of the Acoustic Properties of the Nasal Tract during the Production of Nasalized Vowels”, Proceedings of EUROSPEECH, September 2005.
##### 2004
1. A. Kain, X. Niu, J. Hosom, Q. Miao, J. van Santen, “Formant Re-synthesis of Dysarthric Speech”, Proceedings of 5th ISCA Workshop on Speech Synthesis, June 2004.
2. J. van Santen, A. Kain, and E. Klabbers, “Synthesis by Recombination of Segmental and Prosodic Information”, Speech Prosody 2004, March 2004.
3. H. Duxans, A. Bonafonte, A. Kain, and J. van Santen, “Including Dynamic and Phonetic Information in Voice Conversion Systems”, Proceedings of ICSLP, October 2004.
##### 2003
1. J. Hosom, A. Kain, T. Mishra, J. van Santen, M. Fried-Oken, J. Staehely, “Intelligibility of modifications to dysarthric speech”, Proceedings of ICASSP, May 2003.
2. A. Kain and J. van Santen, “A speech model of acoustic inventories based on asynchronous interpolation”, Proceedings of EUROSPEECH, pp. 329-332, August 2003.
3. J. van Santen, L. Black, G. Cohen, A. Kain, E. Klabbers, T. Mishra, J. de Villiers, X. Niu, “Applications of computer generated expressive speech for communication disorders”, Proceedings of EUROSPEECH, pp. 1657-1660, August 2003.
##### 2002
1. A. Kain and J. van Santen, “Compression of Acoustic Inventories using Asynchronous Interpolation”, Proceedings of IEEE Workshop on Speech Synthesis, pp. 83-86, September 2002.
2. J. van Santen, J. Wouters, and A. Kain, “Modification of Speech: A Tribute to Mike Macon”, Proceedings of IEEE Workshop on Speech Synthesis, September 2002.
##### 2001
1. A. Kain and M. Macon, “Design and Evaluation of a Voice Conversion Algorithm based on Spectral Envelope Mapping and Residual Prediction”, Proceedings of ICASSP, May 2001.
##### 2000 and earlier
1. A. Kain and Y. Stylianou, “Stochastic Modeling of Spectral Adjustment for High Quality Pitch Modification”, Proceedings of ICASSP, June 2000, vol. 2, pp. 949–952.
2. J. House, A. Kain, and J. Hines, “ESP - Metaphor for learning: an evolutionary algorithm”, Proceedings of GECCO 2000, Las Vegas, NV.
3. A. Kain and M. Macon, “Personalizing a speech synthesizer by voice adaptation”, Third ESCA / COCOSDA International Speech Synthesis Workshop, November 1998, pp. 225–230.
4. A. Kain and M. Macon, “Text-to-speech voice adaptation from sparse training data”, Proceedings of ICSLP, November 1998, vol. 7, pp. 2847–50.
5. A. Kain and M. Macon, “Spectral Voice Conversion for Text-to-Speech Synthesis”, Proceedings of ICASSP, May 1998, vol. 1, pp. 285–288.
6. S. Sutton, R. Cole, J. de Villiers, J. Schalkwyk, P. Vermeulen, M. Macon, Y. Yan, E. Kaiser, B. Rundle, K. Shobaki, P. Hosom, A. Kain, J. Wouters, D. Massaro, M. Cohen, “Universal Speech Tools: The CSLU Toolkit”, Proceedings of ICSLP, November 1998, vol. 7, pp. 3221–24.
7. N. Malayath, H. Hermansky, A. Kain and R. Carlson, “Speaker-independent Feature Extraction by Oriented Principal Component Analysis”, Proceedings of EUROSPEECH 1997.

#### Abstracts

1. N. Sathe, A. Kain, and L. Reiss, “Fusion and Identification of Dichotic Consonants in Normal-Hearing and Hearing-Impaired Listeners”, ARO Mid-Winter Meeting, 2019.
2. D. Britton, A. Kain, Y.-W. Chen, J. Wiedrick, J. O. Benditt, A. L. Merati, D. Graville, “Extreme sawtooth sign in motor neuron disease (MND) suggests laryngeal resistance to forced airflow”, Fall Voice Conference, 2018.
3. B. Snider and A. Kain, “Estimation of Localized Ideal Oximetry Sensor Lag via Oxygen Desaturation–Disordered Breathing Event Cross-Correlation”, SLEEP: Journal of Sleep and Sleep Disorders Research, 40, page A232, 2017.
4. J.-P. Hosom, A. Kain, and B. Bush, “Towards the recovery of targets from coarticulated speech for automatic speech recognition”, The Journal of the Acoustical Society of America, 130(4), page 2407, 2011.
5. A. Kain, “Speech transformation: Increasing intelligibility and changing speakers”, Journal of the Acoustical Society of America, 126(4), page 2205, 2009.

#### Ph. D. Thesis

“High Resolution Voice Transformation”, OGI School of Science & Engineering, 2001.

#### Technical Reports

1. B. R. Snider and A. Kain, “Adaptive Reduction of Additive Noise from Sleep Breathing Sounds”, CSLU-2012-001.
2. A. Kain, J.-P. Hosom, S. H. Ferguson, B. Bush, “Creating a speech corpus with semi-spontaneous, parallel conversational and clear speech”, CSLU-11-003.

#### Patents

1. J. van Santen and A. Kain, OHSU. System and Method for Compressing Concatenative Acoustic Inventories for Speech Synthesis.
2. A. Kain and Y. Stylianou, AT&T Research Laboratories. Stochastic Modeling Of Spectral Adjustment For High Quality Pitch Modification.

#### Datasets

• A. Kain, The VOICES dataset, Linguistic Data Consortium Catalog, LDC2006S01, ISBN 1-58563-363-1, 2006.

#### OHSU Disclosures

1. #2717 Children's pronunciation database, 04/08/2019, Seeking Commercial Partners.
2. #2489 TimeView software, 08/15/2017, non-exclusively licensed under the MIT open-source license.
3. #2275 PyTTS Text-to-Speech software with 16 voices, 05/12/2016, Exclusively Licensed.
4. #1365 Mexican Spanish female diphone voice, 12/08/2008, Seeking Commercial Partners.
5. #1364 Mexican Spanish male diphone voice, 12/08/2008, Seeking Commercial Partners.
6. #1362 American English female diphone voice (AS), 12/08/2008, Seeking Commercial Partners.
7. #1361 American English male speaker diphone voice, 12/08/2008, Seeking Commercial Partners.
8. #1360 German male speaker diphone voice, 12/08/2008, Seeking Commercial Partners.
9. #1359 German female speaker diphone voice, 12/08/2008, Seeking Commercial Partners.
10. #1358 New Flinger singing synthesis, 12/08/2008, Inactive.
11. #1195 Clear-Speech Corpus, Speaker JPH, 05/07/2007, Seeking Commercial Partners.
12. #1065 Controlling Formant Frequencies in Concatenative Speech Synthesis Systems, 05/16/2006, Inactive.
13. #1061 Noninvasive Nasal Flow Measurement Device and Algorithm, 05/11/2006, Inactive.
14. #0868 CSLU System and Method for Synthesis Based Speech Enhancement, 09/24/2004, Exclusively Licensed.
15. #0844 CSLU Voice transformation for Dysarthria with Formant Re-synthesis, 06/03/2004, Exclusively Licensed.
16. #0665 Voice Transformation (High Resolution), 11/13/2002, Seeking Commercial Partners.
17. #0566 Method to compress concatenative acoustic inventories for speech synthesis, 07/01/2001, Exclusively Licensed.

### 4.5 Invited Lectures, Conference Presentations, or Professorships

#### International and National

1. Conference presentation: “Siamese Autoencoders for Speech Style Extraction and Switching Applied to Voice Identification and Conversion”, Interspeech, Stockholm, Sweden, 2017.
2. Conference presentation: “A Comparison of Sentence-level Speech Intelligibility Metrics”, Interspeech, Stockholm, Sweden, 2017.
3. Conference presentation: “Semi-supervised Training of a Voice Conversion Mapping Function using a Joint-Autoencoder”, Interspeech, Dresden, Germany, 2015.
4. Conference presentation: “Hybridizing Conversational and Clear Speech to Investigate the Source of Intelligibility Variation in Parkinson’s Disease”, Conference on Motor Speech, Sarasota, Florida, 2014.
5. Conference presentation: “Transmutative Voice Conversion”, ICASSP, Vancouver, Canada, 2013.
6. Conference presentation: “Frequency-domain delexicalization using surrogate vowels”, Interspeech, Makuhari, Japan, 2010.
7. Conference presentation: “Compression of Line Spectral Frequency Parameters using the Asynchronous Interpolation Model”, 7th ISCA Workshop on Speech Synthesis, Kyoto, Japan, 2010.
8. Invited conference presentation: “Hybridizing Conversational and Clear Speech to Determine the Degree of Contribution of Acoustic Features to Intelligibility”, Meeting of the Acoustical Society of America, 2009, San Diego, CA.
9. Invited conference presentation for a Special Session on Voice Transformation: “Using Speech Transformation to Increase Speech Intelligibility for the Hearing- and Speaking-impaired”, ICASSP, Taipei, Taiwan, 2009.
10. Conference presentation: “Compression of Line Spectral Frequency Parameters with Asynchronous Interpolation”, ICASSP, Taipei, Taiwan, 2009.
11. Conference presentation: “Hybridizing Conversational and Clear Speech”, Interspeech, Antwerp, Belgium, 2007.
12. Conference presentation: “Spectral Control in Concatenative Speech Synthesis”, 6th ISCA Workshop on Speech Synthesis, Bonn, Germany, 2007.
13. Conference presentation: “Unit-Selection Text-to-Speech Synthesis Using an Asynchronous Interpolation Model”, 6th ISCA Workshop on Speech Synthesis, Bonn, Germany, 2007.
14. Conference presentation: “Formant Re-synthesis of Dysarthric Speech”, 5th ISCA Workshop on Speech Synthesis, Pittsburgh, PA, USA, 2004.
15. Conference presentation: “A speech model of acoustic inventories based on asynchronous interpolation”, EUROSPEECH, Geneva, Switzerland, 2003.
16. Conference presentation: “Compression of Acoustic Inventories using Asynchronous Interpolation”, IEEE Workshop on Speech Synthesis, Santa Monica, CA, 2002.
17. Conference presentation: “Design and Evaluation of a Voice Conversion Algorithm based on Spectral Envelope Mapping and Residual Prediction”, ICASSP, Salt Lake City, UT, 2001.
18. Conference presentation: “Spectral Voice Conversion for Text-to-Speech Synthesis”, ICASSP, Seattle, WA, 1998.

#### Regional and Local

• Presentation at CSLU Seminar Series approximately 1–2 times annually

### 4.6 Awards

• 2017, 2013 OHSU Technology Transfer and Business Development Award
• 2017 NVIDIA Academic Hardware Donation Program
• 2005 OHSU Commercialization Award

## 5 Service

### 5.1 Membership in Professional Societies

• International Speech Communication Association (ISCA)
• Institute of Electrical and Electronics Engineers (IEEE)
• Acoustical Society of America (ASA)

### 5.2 Granting Agency Review Work

• I served as a grant reviewer for the Spinal Cord Injury/Disease Research Program, 2018.
• I served as an additional reviewer on one grant proposal for the National Science Foundation, 2013.
• National Science Foundation, April, 2010. I reviewed several grant proposals, and then participated in two-day panel discussions.

### 5.3 Editorial and Ad Hoc Review Activities

• I typically review 1–3 journal articles annually. I have reviewed for:
• Journal of the Acoustical Society of America
• Journal of Computer, Speech, and Language
• IEEE Transactions on Audio, Speech and Language Processing
• Speech Communication Journal
• Transactions on Accessible Computing
• Transactions on Asian and Low-Resource Language Information Processing
• Journal of Speech, Language, and Hearing Research
• International Journal of Speech-Language Pathology
• I review a combined 4–8 conference papers annually for international conferences Interspeech and ICASSP. These conference papers are five pages long, and are scored along several dimensions.
• Guest Editor for a special Voice Transformation issue of the IEEE Transactions on Audio, Speech and Language Processing Journal, Volume 18, Issue 5, July 2010. My responsibilities included co-organization of the issue publication, co-authoring the introduction, and reviewing articles.

### 5.4 Committees

#### International/National

• Publications Chair for the international conference Interspeech 2012 in Portland, Oregon. Over several months, I coordinated with the Technical Program Committee, the Organizing Committee, and the Professional Conference Organizer to produce the electronic proceedings and the abstract book.

#### Departmental

• I participate in multiple Dissertation Advisory Committees (DAC, for Ph. D. students) and Thesis Advisory Committees (TAC, for Master's students). These are typically semi-annual half-hour meetings wherein a student meets with their research advisor and other faculty to discuss progress, course work, and future plans.
• I participate in the Qualifying Exam Committee, an annual one-day meeting wherein pre-qualifying Ph. D. students present their Qualifying Exam work to faculty and other students. Faculty are assigned to be readers on several papers, and the written work and presentation are scored along several dimensions.
• I participate in Ph. D. Thesis Committees as needed. Prior to the Ph. D. defense, several faculty are assigned as evaluators of the written thesis, a task that typically takes 1–3 days due to the volume of information (usually over 100 pages).
• Member of the Admissions Committee reviewing applications of M. S. and Ph. D. students to the CS/EE program.
• From 2014 to 2017, I was a member of the Faculty Council Committee, which provides the Dean with informed, representative faculty and departmental opinion and counsel on the affairs and problems of the Medical School, especially in areas of administrative and operational policy directly concerned with educational matters.
• I give outreach talks to introduce our center and its educational mission to other academic institutions. The most recent talks were at Reed College, April 2015; Intel, June 2015; George Fox University, December 2015; and OHSU's summer talk series, July 2016.

### 5.5 Activities

• I created and maintain several open-source projects:
• TimeView (https://github.com/lxkain/timeview) is a cross-platform (Windows, macOS, Linux) desktop application for viewing and editing waveforms, time-value data, and segmentation data. These data can easily be analyzed or manipulated using a library of built-in processors; for example, a linear filter can operate on a waveform, or an activity detector can create a segmentation from a waveform. Processors can be easily customized or created from scratch.
• Multiple Isotonic Regression (https://github.com/lxkain/multi-isoreg) is an algorithm that, given a sequence, can find the minimum error and any number of optimal inflection points of segments that are either monotonically rising or falling. This allows finding shapes like up-down (one peak), or down-up-down, or up-down-up-down (2 peaks), etc. Special emphasis was placed on performance.
• Joint-Density Regression (https://github.com/lxkain/jd-reg) can create non-linear mapping functions using K-Means or GMMs.
• I worked on designing a formal mentee feedback mechanism to mentors, as an outcome of the 2017 “Making a Meaningful Difference” Leadership class offered by OHSU's Niki Steckler.
• I created and maintain CSLU's unified educational computation environment. This is implemented via a custom installation of the latest Ubuntu Linux distribution, with the Anaconda Python distribution pre-installed. The virtual disk image is distributed online and can be used with the free VirtualBox virtual machine application on Windows, macOS, and Linux alike.
• I manage and maintain CSLU's Virtual Reality Laboratory, which features an HTC Vive headset, driven by Valve's SteamVR software. Custom worlds can be created using Epic's Unreal Engine.
• I oversee operations and maintenance of CSLU's Audio Laboratory hardware and software systems, which feature two WhisperRoom sound-proof booths, a 10-channel 32-bit 96 kHz Focusrite recording audio interface connected to a Digital Audio Workstation running custom software, a Kay Elemetrics laryngograph for capturing a high-quality voicing signal, high-quality condenser microphones, numerous digital video cameras, a teleprompter, and other support equipment.
• I designed the 2016 CSLU printed educational catalog.
• In 2012–2013, Izhak Shafran and I worked with Portland State University (PSU) on increasing educational collaboration. This required several meetings with key faculty and administrators, both at OHSU and PSU. As a result, CSLU and PSU cross-listed certain Computer Science and Electrical Engineering courses, which significantly increased the breadth of courses CSEE can offer its students and, in turn, exposes CSEE faculty to promising PSU students for recruitment purposes.
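
As an illustration of the kind of problem the Multiple Isotonic Regression project listed above addresses, the following is a minimal sketch of the simplest case, a single up-down (one peak) fit. It is not the code or API of the linked repository: the function names `pava_increasing` and `up_down_fit` are hypothetical, the pool-adjacent-violators algorithm is a standard textbook technique assumed here for the monotone pieces, and the brute-force search over split points favors clarity over the performance emphasized in the project description.

```python
def pava_increasing(y):
    """Least-squares non-decreasing fit (pool-adjacent-violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fit = []
    for s, n in blocks:
        fit.extend([s / n] * n)  # every point in a block gets the block mean
    return fit


def squared_error(y, fit):
    return sum((a - b) ** 2 for a, b in zip(y, fit))


def up_down_fit(y):
    """Best rising-then-falling fit; returns (error, split index, fit)."""
    y = list(y)
    best = None
    for k in range(len(y) + 1):
        rise = pava_increasing(y[:k])
        # A non-increasing fit is a non-decreasing fit of the reversed data.
        fall = pava_increasing(y[k:][::-1])[::-1]
        fit = rise + fall
        err = squared_error(y, fit)
        if best is None or err < best[0]:
            best = (err, k, fit)
    return best


if __name__ == "__main__":
    error, split, fitted = up_down_fit([1, 3, 2, 4, 1])
    print(error, split, fitted)
```

Shapes with more inflection points (down-up-down, two peaks, and so on) would require searching over several split points at once, which is where an efficient algorithm becomes important.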

## 6 Education

### 6.1 Students

• I have mentored Ph. D. students: Dinh, Snider, Mohammadi, Bush, Dudy, Khan, Moldover. I typically meet each of them one-on-one for 1–1.5 hours weekly to discuss their research. I have also co-mentored Ph. D. students: Langarani, Wallis, Resalat, Sathe, Bayestehtashk, Niu, Amano-Kusumoto, Miao. These students have found employment at Apple, Microsoft, Amazon, and other similar high-tech firms.
• I have mentored or co-mentored Master's students: Yaeger, Alder, Velata, Moore.
• When appropriate, I lead weekly 1.5-hour project group meetings wherein 2–7 students and possibly additional faculty members report on and discuss their research with each other.
• When appropriate, I lead weekly 1-hour reading group meetings, wherein one student presents a published paper of his/her choice to a group of students and faculty (invitation is open to all of CSLU), with subsequent discussion.
• I supervised a total of 6 undergraduate students in the summers of 2008, 2009, 2013, and 2016, funded through the National Science Foundation's Research Experiences for Undergraduates (REU) program and through the University Center for Excellence in Developmental Disabilities (UCEDD).
• I work with volunteers who would like to become Research Assistants, students, or co-authors on a publication with me.

### 6.2 Courses

Nearly all of my lectures are taught using Jupyter notebooks, which, in addition to regular text and equations, allow for interactive code examples, graphical widgets, and data visualization during class. Students can download these notebooks and use them as reference or starting points for their own projects.

#### CS 627 Data Science Programming

This course is a best-of compilation of concepts, practices, and Python-based software libraries (all free, open-source, and unrestricted) that allow for rapid, straightforward, and easy-to-maintain implementation of new ideas and scientific questions. Students will gain awareness and initial working knowledge of some of the most fundamental computational tools for performing a wide variety of academic research. As such, it will focus on providing breadth instead of depth, which means that for each concept we will talk about motivation, key concepts, and concrete usage scenarios, but without exhaustive mathematical background or proofs, which can be acquired in more specialized classes. In this class we will: write programs in Python; perform numeric tasks using numpy and scipy; manage data using pandas; discuss audio, image, and text processing using scipy.signal, scikit-image, nltk, and pynini; apply machine learning algorithms such as deep neural networks, convolutional neural networks, and autoencoders using scikit-learn and keras; visualize data using matplotlib and pyqtgraph; use PyQt/Qt to build graphical user interfaces; address performance issues via compilation/profiling/parallelization tools; and much more. (Winner of the 2019 Sakai Torchbearer Award.)
I have created the curriculum for, and teach, this 3-credit course (20 × 1.5-hour lectures). Creating the curriculum required approximately 200 hours. Due to the quickly changing landscape at the edge of technology, updating and teaching the course requires approximately 100 hours each time it is offered. Grading students' answers and evaluating their project outcomes requires a total of approximately 3 hours per student over the course of the class (unless a TA is available). Students' evaluation scores averaged 5.0/5.0 in Fall 2015, 5.15/6.0 in Fall 2016, 5.64/6.0 in Winter 2018, 5.83/6.0 in Spring 2019.
As a point of reference, over the last 56 classes, the mean evaluation score was 4.3, and the median score was 4.4. Please refer to the OHSU Educators Portfolio document for further details.

#### EE 682 Digital Signal Processing

This course teaches students the core principles of digital signal processing. We survey a variety of topics in class lecture/discussion based on assigned readings while exploring specific topics/applications in depth through lab assignments and a final project. Specifically, we cover the core topic areas in digital signal processing including an overview of discrete-time signals and systems, the discrete-time Fourier transform, the $z$-Transform and transform analysis, the discrete Fourier Series, the discrete Fourier transform, circular convolution, network structures for FIR systems, design of IIR and FIR filters, and multi-rate processing.
I co-teach this 3-credit course (10 × 1.5-hour lectures). Creating the lectures required approximately 180 hours. Students' evaluation scores averaged 5.13/6.0 in Winter 2019.

#### EE 658 Speech Signal Processing

Speech systems are becoming commonplace in today's computer systems and Augmentative and Alternative Communication (AAC) devices. Topics include speech production and perception by humans, linear predictive features, pitch estimation, speech coding, speech enhancement, prosodic speech modification, Voice Conversion (VC), Text-to-Speech (TTS), and automatic speech recognition (ASR).
I have created the curriculum for, and teach, this 3-credit course (20 × 1.5-hour lectures). Creating the curriculum required approximately 180 hours. Due to the quickly changing landscape at the edge of technology, updating and teaching the course requires approximately 200 hours each time it is offered. Grading students' answers and evaluating their project outcomes requires a total of approximately 3 hours per student over the course of the class (unless a TA is available). Students' evaluation scores averaged 4.7/5.0 in Winter 2016.

#### CS 653 Text-to-Speech Synthesis

This course will introduce students to the problem of synthesizing speech from text input. Speech synthesis is a challenging area that draws on expertise from a diverse set of scientific fields, including signal processing, linguistics, psychology, statistics, and artificial intelligence. Fundamental advances in each of these areas will be needed to achieve truly human-like synthesis quality and advances in other realms of speech technology (like speech recognition, speech coding, speech enhancement). In this course, we will consider current approaches to sub-problems such as text analysis, pronunciation, linguistic analysis of prosody, and generation of the speech waveform. Lectures, demonstrations, and readings of relevant literature in the area will be supplemented by student lab exercises using hands-on tools.
I have created the curriculum for, and teach, the second half of this 3-credit course (10 × 1.5-hour lectures). Creating the curriculum required approximately 120 hours of my time. Students' evaluation scores averaged 4.4/5.0 in 2015.

#### CS 606 Computational Approaches to Speech and Language Disorders

This course covers a range of speech and language analysis algorithms that have been developed for measurement of speech or language based markers of neurological disorders, for the creation of assistive devices, and for remedial applications. Topics include introduction to speech and language disorders, robust speech signal processing, statistical approaches to pitch and timing modeling, voice transformation algorithms, speech segmentation, and modeling of disfluency. The class uses a wide array of clinical data, and is closely tied to several ongoing research projects.
I have created and taught 2 × 1.5-hour lectures for this course.

### 6.3 Presentations

• Speaker at the 2017 OHSU Symposium on Educational Excellence, on “Interactive lecturing with Jupyter notebooks”.
• Twice-yearly course advertisement talks to preview, for prospective students, my courses scheduled for the following quarter.

### 6.4 Awards

• 2019 Sakai Torchbearer Award for the use of Jupyter notebooks in education
• 2017 nominated for OHSU Excellence in Graduate Teaching Award
• 2014 OHSU Excellence in Graduate Teaching Award