CS 6120: Natural Language Processing

Northeastern University, Fall 2025

Course Details

Sessions 21600 and 21601 will cover the same material, but the lecture slides will differ. Please attend the session you are enrolled in.

Course description

Welcome! This is a graduate-level course on Natural Language Processing (NLP). This class introduces both foundational and modern language technologies, as well as their applications. We will begin with early statistical models, such as Naive Bayes and logistic regression, and progress to modern large language models. Along the way, we will explore a variety of core NLP tasks, such as summarization, classification, and machine translation, as well as some basic linguistic concepts.

The course will also cover different evaluation methods and commonly used metrics, as well as NLP applications in the social sciences and humanities. Beyond these, we will explore some emerging research areas, such as multimodal NLP (language and vision) and the privacy and ethical challenges of deploying language technologies.

The goal of this class is twofold: first, to provide students with the knowledge and skills to understand and deploy language technologies; and second, to encourage a deeper appreciation of language itself and the societal implications of language technologies.

Location:

  • Session 21600: West Village H, Room 366
  • Session 21601: Richards Hall, Room 236
Time:

  • Session 21600: Tuesday & Friday 1:35-3:15pm
  • Session 21601: Tuesday & Friday 3:25-5:05pm
Grading:

    Complete independently: 5 programming assignments (30%), 6 quizzes (30%)

    Complete in groups of 1-4:
    Course project: initial pitch, research plan, sample data, grade contract, presentation, final report (40%)

    Textbook:

    Daniel Jurafsky and James H. Martin. 2025. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd edition. Online manuscript released August 24, 2025. (Draft chapters available online)

    Gradescope: code to join NGZDZP

    Ed Discussion: link to join

    Course Staff

    Instructors:

    Session 21600: David Smith, email: dasmith[at]ccs.neu.edu

    Session 21601: Si Wu, email: siwu[at]ccs.neu.edu

    TAs:

    Divya Sri Bandaru, email: bandaru.di[at]northeastern[dot]edu

    Tejus Dinesh, email: dinesh.te[at]northeastern[dot]edu

    Zankhana Pratik Mehta, email: mehta.zan[at]northeastern[dot]edu

    Office hours:

    David Smith: Monday 11am–12pm, Thursday 4–5pm (Zoom link)
    Si Wu: Wednesday 10am–12pm (Zoom link)
    Divya Sri Bandaru: Monday 2pm–4pm (Zoom link)
    Tejus Dinesh: Thursday 6pm–8pm (Zoom link)
    Zankhana Pratik Mehta: Wednesday 3pm–5pm (Zoom link)

    Policies

    Quiz

    Quizzes will be administered in class, at either the beginning or the end of a lecture. Quiz dates will not be announced in advance. Quizzes will consist primarily of multiple-choice questions and will assess your understanding of high-level concepts. If you know in advance that you will miss a lecture, or if unforeseen circumstances prevent you from attending, please let us know so that we can arrange a makeup quiz.

    Coding assignment late policy

    Assignments are due at the announced due date and time, usually 11:59 p.m. You will be granted one homework extension of four calendar days, to be used at your discretion, without having to ask. This single extension is meant to smooth over unforeseen crunches in your schedule, and you cannot simply distribute the four late days among four assignments. After the first late assignment, unexcused late assignments will be penalized 10% per calendar day late. We normally will not accept assignments after the date on which the following assignment is due or after the solutions have been handed out, whichever comes first. If you know in advance of circumstances that would cause you to turn in an assignment late, please contact the instructor before the assignment is due to ask if an extension is possible.
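    As a rough illustration of the arithmetic in this policy, the sketch below computes how much credit a late assignment would keep. It is only one possible reading, not an official grading tool: the function name and parameters are hypothetical, and it assumes that the penalty is linear rather than compounding, that days covered by the one-time extension are free, and that lateness is counted in whole calendar days. It does not model the cutoff after which assignments are no longer accepted.

```python
FREE_EXTENSION_DAYS = 4   # length of the single no-questions-asked extension
PENALTY_PER_DAY = 10      # percentage points deducted per unexcused late day


def percent_credit_kept(days_late: int, use_extension: bool) -> int:
    """Return the percentage of credit retained under the assumed reading.

    days_late: whole calendar days past the deadline (0 if on time).
    use_extension: True if the one-time four-day extension is applied here.
    """
    if days_late <= 0:
        return 100
    # Assumption: days covered by the extension are free; any days beyond
    # it are penalized like ordinary late days.
    penalized_days = days_late
    if use_extension:
        penalized_days = max(0, days_late - FREE_EXTENSION_DAYS)
    # Assumption: the penalty is linear (not compounding) and floors at zero.
    return max(0, 100 - PENALTY_PER_DAY * penalized_days)


print(percent_credit_kept(3, use_extension=True))   # 100
print(percent_credit_kept(2, use_extension=False))  # 80
```

    Under these assumptions, an assignment submitted three days late with the extension keeps full credit, while one submitted two days late without it keeps 80%.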

    Academic Integrity policy

    Please refer to the Academic Integrity Policy here.

    Disability Access Services

    We are committed to accommodating students with disabilities. Please contact Disability Access Services and follow the outlined procedure.

    Announcements

    — Added policy on quizzes and suggested readings!

    — Added late policy!

    — Gradescope and Ed Discussion are up!

    — The class website is up!

    Schedule

    Week, date | Topics | Lecture Slides | Suggested Readings | Assignments | Others
    1, Fri, 9/5 | Introduction and Course Logistics: Language Models in Brains and Machines | Session 21601 Slides; Session 21600 Slides | | |
    2, Tues, 9/9 | Words, Regular Expressions, and N-gram (Markov) Models | Session 21601 Slides; Session 21600 Slides | | Assignment 1 released: Empirical Regularities of Language: Evaluating Predictions and Counting Words |
    2, Fri, 9/12 | Text Classification: Naive Bayes, Logistic Regression, and Friends | Session 21601 Slides; Session 21600 Slides | JM 4 | |
    3, Tues, 9/16 | Word Embeddings | Session 21601 Slides; Session 21600 Slides | JM 5 | |
    3, Fri, 9/19 | Introduction to Neural Networks | Session 21600 Slides; Session 21601 Slides | JM 6; Goodfellow Chapter 6; MIT 6.390 Intro to ML notes | Assignment 1 due (11:59pm), Assignment 2 released: Predictive and Interpretive Text Classification |
    4, Tues, 9/23 | Beyond Words: Morphology, Syntax, and Semantics | Session 21600 Slides; Session 21601 Slides | JM 17; JM 19 | | Instructions for your course project pitch
    4, Fri, 9/26 | Sequence Data, Recurrent Networks, and Attention | Session 21601 Slides; Session 21600 Slides | JM 13 | |
    5, Tues, 9/30 | Transformers | Session 21601 Slides; Session 21600 Slides | JM 8 | Assignment 2 due (11:59pm) |
    5, Fri, 10/3 | A Taxonomy of Large Language Models: Data, Weights, Training, and Inference | Session 21601 Slides; Session 21600 Slides | | | Project pitch due (11:59pm)
    6, Tues, 10/7 | Guest Lecture: Alexander Spangher (postdoc @ Stanford) | | | Assignment 3 released: Probing Transformers |
    6, Fri, 10/10 | Pretraining | Session 21601 Slides; Session 21600 Slides | | |
    7, Tues, 10/14 | Generation Algorithms | | | |
    7, Fri, 10/17 | Post-Training: RLHF, DPO, and Friends | | | |
    8, Tues, 10/21 | Prompting and In-context Learning | | | Assignment 3 due (11:59pm) |
    8, Fri, 10/24 | Evaluation, Benchmarks, and Experimental Design | | | |
    9, Tues, 10/28 | Retrieval, Retrieval-Augmented Generation, and Summarization | | | |
    9, Fri, 10/31 | Multilinguality | | | |
    10, Tues, 11/4 | Guest Lecture: Terra Blevins (Northeastern): Multilingual Encoding in LLMs | | | |
    10, Fri, 11/7 | Beyond Individuals: Language in Social Context | | | |
    11, Tues, 11/11 | No Class, Veterans Day | | | |
    11, Fri, 11/14 | Guest Lecture: Niloofar Mireshghallah (META AI and CMU): Analyzing the Security and Privacy of LLMs | | | |
    12, Tues, 11/18 | Language and Visual Context | | | |
    12, Fri, 11/21 | Guest Lecture: Lucy Li (postdoc @ UW, and incoming assistant prof @ University of Wisconsin-Madison) | | | |
    13, Tues, 11/25 | Final Lecture: [Student Suggested Topics] | | | |
    13, Fri, 11/28 | No Class, Happy Thanksgiving! 🦃 | | | |
    14, Tues, 12/2 | Project Presentations | | | |
    14, Fri, 12/5 | Project Presentations | | | |
    15, Thurs, 12/11 | | | | | Final report due (11:59pm)