Division of Biomedical Computer Science

CS550 Spoken Dialogue Systems
Winter 2016

 

Instructor: Peter Heeman

When: Tuesday/Thursday 4:00pm-5:30pm

Where: Gaines Hall 5

3 credits
Piazza Discussion Board
Recorded Lectures


Course Information

Spoken dialogue systems are already being deployed to help people look up flight information, trade stocks, access email, and check traffic conditions. With the continuing advancements in speech technology, more information and services will become readily available. A simple cell phone will be enough to hook into the information age.

This course teaches the fundamentals of spoken dialogue systems. Spoken dialogue systems include components for speech recognition, parsing, semantic interpretation, dialogue management, text generation, speech synthesis, and agent architecture. The course will be organized around three frameworks for dialogue management: finite-state machines, form-filling, and speech-act reasoning. We will examine how speech recognition, parsing, and semantic interpretation fit into each framework. We will also contrast hand-crafting a dialogue manager with using machine learning.
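
To give a rough feel for how the first two frameworks differ, here is a minimal sketch in Python (used only for brevity; the course assignments are in Tcl/Tk). The flight-booking prompts, slot names, and function names are all hypothetical and are not taken from the course materials. Speech-act reasoning is not shown.

```python
# Hypothetical flight-booking example; prompts and names are illustrative only.

# 1) Finite-state dialogue manager: the system walks a fixed graph of states.
FSM = {
    "ask_origin": ("Where are you leaving from?", "ask_destination"),
    "ask_destination": ("Where are you going?", "ask_date"),
    "ask_date": ("What day do you want to travel?", "done"),
}


def run_fsm(answers):
    """Visit the states in their fixed order, consuming one answer per state."""
    state, transcript = "ask_origin", []
    answers = iter(answers)
    while state != "done":
        prompt, next_state = FSM[state]
        transcript.append((prompt, next(answers)))
        state = next_state
    return transcript


# 2) Form-filling dialogue manager: ask only for slots that are still empty.
SLOT_PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day do you want to travel?",
}


def next_prompt(form):
    """Return the prompt for the first unfilled slot, or None when complete."""
    for slot, prompt in SLOT_PROMPTS.items():
        if form.get(slot) is None:
            return prompt
    return None


def update_form(form, interpreted):
    """Merge the slot values extracted from one user utterance into the form."""
    merged = dict(form)
    merged.update({k: v for k, v in interpreted.items() if k in SLOT_PROMPTS})
    return merged
```

In the finite-state version the order of questions is fixed by the graph. In the form-filling version a single utterance such as "from Portland on Friday" can fill two slots at once, and the manager then asks only for what is still missing.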

There is no textbook for the class.


Prerequisites

Programming assignments will be in Tcl/Tk and will use the CSLU toolkit. No prior experience with either is required.

During the course, we will cover the basics of different formalisms for expressing knowledge, such as finite-state machines and context-free grammars. Students will be taught the basics of these formalisms and are not expected to have already taken a course on automata and formal languages.
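
As an informal preview of one such formalism, the sketch below encodes a toy context-free grammar and checks sentences against it bottom-up. It is written in Python for brevity, the grammar and lexicon are invented for illustration, and the CYK chart algorithm used here is one standard bottom-up method, not necessarily the one presented in class.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
# Each entry maps a pair of child categories to their parent category.
BINARY_RULES = {
    ("NP", "VP"): "S",
    ("V", "NP"): "VP",
    ("Det", "N"): "NP",
}
# Each word maps to the set of categories it can have.
LEXICON = {
    "i": {"NP"},
    "want": {"V"},
    "a": {"Det"},
    "flight": {"N"},
}


def cyk_recognize(words):
    """Return True if the word list can be derived from the start symbol S."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds every category that spans words[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = set(LEXICON.get(word, ()))
    # Build longer spans from pairs of shorter adjacent spans.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left, right in product(chart[start][mid], chart[mid][end]):
                    parent = BINARY_RULES.get((left, right))
                    if parent is not None:
                        chart[start][end].add(parent)
    return "S" in chart[0][n]
```

For example, `cyk_recognize("i want a flight".split())` succeeds because "a flight" forms an NP, "want a flight" forms a VP, and the NP "i" combines with that VP to form S.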


CSLU Speech Toolkit

Students will be using the CSLU Speech Toolkit for this class. It has been loaded on the CSE Windows machines. Students who have their own Windows-based PC can download and install it for free. Instructions for downloading it are located at http://www.cslu.ogi.edu/toolkit/download. The toolkit has many components. We will be using it solely to build spoken dialogue systems, starting with the Rapid Application Development (RAD) environment. Check out http://www.cslu.ogi.edu/toolkit/docs/2.0/apps/rad/, which has a series of tutorials. Tutorials 1, 2, 6, 11, 15, and 16 are particularly useful. The others use features of RAD that we will not be exploring.

Tcl/Tk

The CSLU Speech Toolkit allows you to incorporate Tcl/Tk code when building your spoken dialogue systems with RAD. If you want to bypass the graphical interface of RAD, the toolkit provides functions that can be easily incorporated into a Tcl/Tk program. Hence, in this course, we will be using Tcl/Tk. Tcl/Tk is automatically installed with the toolkit. Some of the tutorials mentioned in the previous section focus on using Tcl/Tk. Other sources of information are located at http://tcl.ActiveState.com/doc. Tcl is a scripting language and Tk is a graphics toolkit. You will mainly be using the Tcl part.

Here is some information that I compiled about Tcl/Tk to get you started.


Homework

For each section, there will be a homework assignment, which will involve either creating a technology or incorporating it into a spoken dialogue system.

Final Project

Toward the end of the course, students will do a final project, which can be individual or group-based (at most 3 students). Groups will build on the systems that they have built during the homework assignments. Below are some example projects. The writeup would discuss the application and the needed capabilities of the spoken dialogue system. It would discuss and justify the choices in underlying technology.

Timeline
Week 5: Project groups decided.
Week 6: Each group hands in a one-page writeup of what their project will entail.
Week 7: Each group meets with the professor for feedback on their proposal.
Week 10: Presentation and writeup due.

Groups work well when all members contribute to the project. Members do not have to contribute in the same way; rather, the group should take advantage of the differing strengths of its members. To encourage each member to participate fully, after finishing the project, each team member must hand in an evaluation of their team, consisting of a one-paragraph statement of how well they thought the team worked together and a score between 0 and 10 for each of their team members.


Grading

Assignments 40%
Presentation 15%
Final Project & Presentation 30%
Final Exam 15%

Class Schedule

Below is a tentative version of the class schedule. It will be changing over the next two weeks, but you can count on at least the next class being accurate. I am making it available so that you can get an idea of what will be taught in the course. I am also posting tentative versions of the homeworks and the class lecture slides.

Tue Jan 5
Class 1
Finite-State Dialogue Management: Basics of building simple spoken dialogue systems using finite-state models, including how speech recognition, parsing, and semantic interpretation can be easily incorporated.
Thu Jan 7
Class 2
Parsing: Compositional meaning. Bottom-up parsing algorithm.
  Homework 1 Implement a simple spoken language system using the CSLU toolkit.
This will be a system-controlled dialogue:
user responses will be highly constrained, just single words or short phrases.
Due Thursday January 14 by 4:00pm.
Tue Jan 12
Class 3
Semantic Interpretation: Semantic interpretation using parallel semantic rules. Knowledge representation formalisms, including frames, hierarchical frames, FOPC and lambda calculus, and event-based semantics.
Supplementary Reading: Chapters 14 and 15 of D. Jurafsky & P. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, Prentice Hall 2000.
Thu Jan 14
Class 4
Form-based Dialogue Management: The dialogue manager uses data structures to guide its behavior.
  Homework 2 Implement a system that uses the speech recognition grammar and that does limited semantic processing.
Due Tuesday January 19 by 4:00pm.
Tue Jan 19
Class 5
Continuation  
  Homework 3 Search, parsing, semantic interpretation.
Here is some information about writing Tcl scripts. Due Tuesday January 26 by 4:00pm.
Thu Jan 21
Class 6
Hierarchical Forms: The dialogue manager uses hierarchical data structures to guide its behavior.
Tue Jan 26
Class 7
Speech Acts: Philosophical and artificial intelligence views of speech acts.

Required Reading: David R. Traum, Speech Acts for Dialogue Agents, in Michael Wooldridge and Anand Rao, editors, "Foundations And Theories Of Rational Agents", Kluwer Academic Publishers, pages 169-201, 1999.

Thu Jan 28
Class 8
ISU Toolkit for building dialogue managers.

Required Reading: Staffan Larsson and David Traum (2000): Information state and dialogue management in the TRINDI Dialogue Move Engine Toolkit. In Natural Language Engineering Special Issue on Best Practice in Spoken Language Dialogue Systems Engineering, Cambridge University Press, U.K. (pp. 323-340, 18 pages)

  Homework 4 Lambda Expressions. Form-Filling Dialogues.
Here is the file class04form.tcl that you need. Due Thursday February 4 by 4:00pm.
Tue Feb 2
Class 9
Continuation  
Thu Feb 4
Class 10
Information State Example: Banking application cast in the Information State approach.

  Homework 5 Build a form-based spoken dialogue system. The car inventory is here (hw5cars.tcl)

Due Thursday February 11.
Tue Feb 9
Class 11
Learning Dialogue Strategies: Rather than hand-crafting a dialogue strategy, we can use machine learning techniques to learn one.

Required Reading:
A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies, Levin, Pieraccini and Eckert, Transactions on Speech and Audio Processing, 2000.

Thu Feb 11
Class 12
Learning Dialogue Strategies II: MDP, model-based RL, model-free RL, flight domain problem.
  Homework 6 Augment an information-state dialogue system. Start with the code in IS1-Engine.tcl and IS1-Agent.tcl.
Due Thursday February 18.
Tue Feb 16
Class 13
Continuation  
  Homework 7 Information State II. Use the code in IS2-Engine.tcl, which allows the user to keep the turn for multiple utterances.
Due Sunday February 28.
  Homework 8 Simulated Dialogues. Use the code in IS3-Engine.tcl and IS3-Agent.tcl. Also, make sure you get the official answers for Homework 7 from the instructor as you should use those for question 2. Due Tuesday March 1.
Thu Feb 18
Class 14
Learning Dialogue Strategies III: Epsilon-greedy, alpha, Q-learning.
Tue Feb 23
Class 15
Continuation  
  Homework 9 RL-IS. Make sure you get the official answers for Homework 8 from the instructor. Due Thursday March 3.
Thu Feb 25
Class 16
Combining RL and IS. Required Reading:
Combining Reinforcement Learning with Information-State Update Rules. Heeman. In Proceedings of the North American Chapter of the Association for Computational Linguistics Annual Meeting, pages 268-275, Rochester NY, April 2007.
Using Reinforcement Learning for Dialogue Management Policies: Towards Understanding MDP Violations and Convergence. In Proceedings of Interspeech, pages 746-749, Portland OR, September 2012.
 
Tue Mar 1
Class 17
Negotiation: Representing the Reinforcement Learning State in a Negotiation Dialogue. Heeman. In Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, Merano Italy, December 2009.
Thu Mar 3
Class 18
Turn-Taking: Yang and Heeman, 2010. Initiative Conflicts in Task-Oriented Dialogue
Heeman and Selfridge, 2010. Importance-Driven Turn-bidding for Spoken Dialogue Systems
  Homework 10 Learning Policies. Due Saturday March 12.
Tue Mar 8
Class 19
Initiative and Discourse Structure: Walker and Whittaker, 1990. Mixed Initiative in Dialogue: An Investigation into Discourse Segmentation
Discourse Structure: Grosz and Sidner, Attention, Intentions, and the Structure of Discourse
Strayer, Heeman and Yang, 2003. Reconciling Control and Discourse Structure
Thu Mar 10
Class 20
Presentations
Turn-taking cues in task-oriented dialogue. Gravano and Hirschberg. Computer Speech and Language, 25(3):601-634, July 2011.
Pauses, gaps and overlaps in conversations. Heldner and Edlund. Journal of Phonetics, 2010.
Single-Agent vs. Multi-Agent Techniques for Concurrent Reinforcement Learning of Negotiation Dialogue Policies. Georgila, Nelson, and Traum. Annual Meeting of the Association for Computational Linguistics, 2014.
Collaborating on Referring Expressions. Heeman and Hirst. In Computational Linguistics, Vol. 21-3, 1995.
Or propose a paper (at least 4 pages long) from the most recent SigDial conference.
Tue Mar 15
Class 21
Final
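
As a rough illustration of the reinforcement-learning topics listed for Classes 12 to 15 (MDPs, epsilon-greedy exploration, the learning rate alpha, and Q-learning), here is a minimal tabular Q-learning sketch in Python. The two-slot dialogue task, its rewards, and all parameter values are invented for illustration; they are not the flight domain problem or the course's IS engine code.

```python
import random

ACTIONS = ("ask", "close")
NUM_SLOTS = 2  # the toy task is complete once both slots are filled


def step(state, action):
    """Toy environment: the state is the number of slots filled so far."""
    if action == "ask":
        # Each question costs one unit and fills one more slot (up to the cap).
        return min(state + 1, NUM_SLOTS), -1.0, False
    # Closing ends the dialogue; it is rewarded only if every slot is filled.
    return state, (10.0 if state == NUM_SLOTS else -10.0), True


def greedy_action(q, state):
    """Pick the action with the highest current Q-value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(episodes=500, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(NUM_SLOTS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done, turns = 0, False, 0
        while not done and turns < 50:  # cap turns so every episode ends
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = greedy_action(q, state)  # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate a fraction alpha toward the one-step target.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, turns = next_state, turns + 1
    return q
```

After training, the greedy policy asks until both slots are filled and only then closes the dialogue, which is the behavior a hand-crafted strategy for this toy task would also encode.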

Plagiarism

Learning from and with each other is encouraged. However, interacting so as to avoid learning is not tolerated. Any discussion in which no personal notes (or programs) are taken in, and none are taken out, is fine. From such discussions, students should learn the material well enough to construct their notes on their own afterwards. If you are in doubt, the onus is on you to discuss the situation with the professor beforehand.