CS 5/662: Natural Language Processing

Fig. 1: 15th-century miniature of Jean Miélot, a noted author, translator, and illuminator.

Description

This course covers key models and algorithms that are used for automatic processing of natural language text. In natural language processing (NLP) tasks, inputs are word or character sequences, and the outputs consist of linguistic annotations to those sequences or of entirely new linguistic sequences. These annotations are crucial for downstream applications like automatic speech recognition, machine translation, information extraction, and question answering. Students in this course will gain practical knowledge of and experience with standard algorithms and tools, including both traditional and neural approaches.

Logistics

Time
Tuesdays & Thursdays, 16:15 – 17:45

Schedule

Date Week Day Topic HW Out HW Due
Jan 08 1 Tu Logistics, intro, some history, workflows and word counting    
Jan 10 1 Th Character encodings, file formats, etc. HW1  
Jan 15 2 Tu Text classification basics HW2 HW1
Jan 17 2 Th No Class: SDB out of town    
Jan 22 3 Tu Artificial Neural Networks & Backpropagation   HW2
Jan 24 3 Th PyTorch lab    
Jan 29 4 Tu Morphology & Phonology HW3  
Jan 31 4 Th Word Embeddings and Representations   Project Proposal
Feb 05 5 Tu Text Normalization HW4 HW3
Feb 07 5 Th Context-free grammars & tree transforms    
Feb 12 6 Tu Parsing 1: Traditional chart parsing HW5 HW4
Feb 14 6 Th Parsing 2: Shift-Reduce, dependency grammars, and neural approaches    
Feb 19 7 Tu Machine Translation 1: History & Evaluation   HW5
Feb 21 7 Th Machine Translation 2: "Traditional" SMT HW6  
Feb 26 8 Tu Machine Translation 3: Neural MT & attention architectures    
Feb 28 8 Th Sentiment and Topic models    
Mar 05 9 Tu Semantics 1: Intro & logic   HW6
Mar 07 9 Th Semantics 2 HW7  
Mar 12 10 Tu Special Topic: Biomedical applications    
Mar 14 10 Th Guest Lecture: Brooke Cowan   HW7
Mar 19 11 Tu Project Presentations    
Mar 21 11 Th Project Presentations    

Assignments & Grading

Assigned readings for each session may be found in the schedule and also in the section below, grouped by session. Students are expected to have read each week's readings before class (excepting the first session), and to be prepared to discuss them in class. Most of our assigned readings are open-access, while others are available through the OHSU Library; we will provide copies of other texts and articles as needed.

There will be multiple hands-on homework assignments given throughout the class. See below for my policy on late work and extensions. Assignments should be turned in via Sakai.

Final Project

In addition to homework assignments, there will be a final project on the subject of your choice. In the past, students have replicated a paper, implemented an algorithm or approach, etc. The only requirements are that the project be language-related, and that it involve some form of experimental evaluation. The deliverables for the project are:

  1. A written proposal, turned in to me by the date shown on the schedule above. The written proposal must include the following components:
    • What task you propose to implement, paper you wish to replicate, etc.
    • What data set you will use, and where you will get it
    • What your experimental evaluation will consist of, including stated hypotheses
    The proposal must be written as a complete standalone prose document, not as an outline, list of bullet points, collection of sentence fragments, etc.
  2. A short (10-minute) in-class presentation at the end of the term, showing off your final results.
  3. A short (8-10 pages, not counting references) paper following the standard structure for a scientific report describing your project and its results, including a background and literature review section.

Grade Components

Grading will be as follows:

  • 10% Participation
  • 70% Homework
  • 20% Final project

Textbooks

We will be reading from several excellent textbooks this term. Notably, they are all available online, either because the author has posted the content or because the OHSU library has electronic copies. Other readings will be provided electronically as needed.

Resources

Other Useful Books

Assigned Readings

Week 1

Tuesday-Thursday
Also recommended:

Week 2

Take a quick look at Goldberg, Chapter 2. If its contents are totally new to you, take the time to read it before class.
Tuesday
Thursday
No class! SDB out of town.

Week 3

Tuesday
Thursday

Week 4

Tuesday
Thursday

Week 5

Tuesday
Thursday
  • J&M, Chapter 10
  • Bender, Chapter 5
  • If you're rusty on formal language theory, I strongly recommend reading Eisenstein, Chapter 9.

Week 6

Tuesday
  • J&M, Chapters 11 and 12
Thursday

Week 7

Tuesday
Thursday

Week 8

Tuesday
Thursday
  • No reading! Seminar at Reed!

Week 9

Tuesday
Thursday

Useful/Interesting Readings

Software Tools

Instructor

Steven Bedrick

bedricks@ohsu.edu

Steven Bedrick can usually be found in his natural habitat, Gaines Hall room 19. While he has no set office hours, GH is far enough off the beaten path that you should probably schedule something with him before making the schlep.

We strongly encourage you to consult the Student Health Center for guidance about any pre-travel immunizations that may be required before visiting Gaines Hall.

Class Policies

Late work

Each assignment will come with a due date and time, which should be treated as firm. If you need an extension, please make your request by 5:00 PM two days prior to the assignment's due date. In other words, if an assignment is due on a Tuesday, extension requests must be made by Sunday afternoon. This is to help prevent procrastination, to give you time to come to me for help, and to help me plan my grading time. Extensions are not guaranteed, and will depend on the circumstances surrounding the request.

A couple of notes about how I grade: I am very generous with partial credit, but also somewhat picky about whether or not a homework submission paid attention to an assignment's instructions. In other words, when in doubt, it is always better to make an attempt at each part of an assignment, even if you do not finish any one of them... and you would also do well to take note of what the assignment requests, especially when it comes to the written component. To put it simply, if it's in the assignment document, it's in my grading rubric. So, if I ask for a writeup that covers points A, B, and C, you can bet that my grading rubric for the assignment will have three corresponding slots.

Collaboration, Plagiarism, & Attribution

We expect and require that all submissions be the student's own, original work. Any and all code, text, etc. that you include from any other source must be properly cited (see the "Writing Code" section of MIT's student handbook for some examples of source code citation). This includes code as well as figures, prose descriptions, etc. For some of the assignments, you may be able to find example code online that does what you need. Do not use it. You are absolutely, 100% prohibited from using sample code in this class. The line between "reference", "inspiration", and "re-use" can be a bit blurry, so, when in doubt, ask!

Note further that the School of Medicine has a policy regarding ethical and professional conduct for graduate students that specifically addresses plagiarism (sections 4.b and 4.c). We expect all students to be aware of and familiar with this policy. If you have any questions about this policy, please ask.

A note regarding student collaboration: We absolutely encourage students to work together on assignments, but unless specifically directed otherwise, each student is 100% responsible for producing their own, independent version of the solution and its writeup. If you have any questions or concerns about what this should look like in practice, please ask.

Access Statement

Our program is committed to all students achieving their potential. If you have a disability or think you may have a disability (physical, learning, hearing, vision, psychological) that may need a reasonable accommodation, please contact Student Access at (503) 494-0082 or e-mail studentaccess@ohsu.edu to discuss your needs. You can also find more information at http://www.ohsu.edu/student-access. Because accommodations can take time to implement, it is important to have this discussion as soon as possible. All information regarding a student’s disability is kept in accordance with relevant state and federal laws.

Equity and Inclusion

Oregon Health & Science University is committed to creating and fostering a learning and working environment based on open communication and mutual respect. If you encounter sexual harassment, sexual misconduct, sexual assault, or discrimination based on race, color, religion, age, national origin or ancestry, veteran or military status, sex, marital status, pregnancy or parenting status, sexual orientation, gender identity, disability, or any other protected status, please contact the Affirmative Action and Equal Opportunity Department at 503-494-5148 or aaeo@ohsu.edu. Inquiries about Title IX compliance or sex/gender discrimination and harassment may be directed to the OHSU Title IX Coordinator at 503-494-0258 or titleix@ohsu.edu.