CS 5/579, Winter 2020

Problem Solving with Large Clusters, Winter 2020


Many real-world computational problems involve data sets that are too large to process on a single computer, or that have other characteristics (fault tolerance, etc.) that require multiple computers working together. Examples include analysis of high-throughput genomic or proteomic data, data analytics over very large data sets, large-scale social network analysis, and training machine learning models on “web scale” data sets. In Problem Solving with Large Clusters, we will explore a variety of approaches to solving these kinds of problems through a mixture of lectures and student-led discussions of the research literature in the field. We will also hear from several guest lecturers with practical experience applying cluster computing algorithms in both academia and industry. In addition to reading and discussing articles, students will learn through class assignments how to program in the Hadoop map-reduce environment as well as in several other such systems. There will also be a final project on a cluster computing topic of the student’s choice.

Prerequisites: A graduate-level course on machine learning or probability and statistics. Students should be comfortable coding in at least one programming language and be familiar with the UNIX command-line environment.

Textbooks & Resources

Course goals

By the end of the course, students will:



This is a seminar-style class. The reading will consist of computer science research articles, and students will be expected to come to class prepared to discuss that session’s assigned reading. Two students will be assigned to each session: one will present that day’s readings, and the other will lead the discussion of the material. Active participation in the discussions is required (note the participation component of the grading breakdown).

When & Where?

Mondays & Wednesdays, 16:00-17:30 in CDRC 3200

Starts: January 6

Ends: March 20


Instructor: Steven Bedrick

Office Location: Gaines Hall, 21

Office Hours: Wednesday, 9:00-10:30 AM