CS 6120: Natural Language Processing

Northeastern University, Fall 2025

Course Details

Sessions 21600 and 21601 will cover the same material, but the lecture slides will differ. Please attend the lecture you are enrolled in.

Course description

Welcome! This is a graduate-level course on Natural Language Processing (NLP). This class introduces both foundational and modern language technologies, as well as their applications. We will begin with early statistical models, such as Naive Bayes and logistic regression, and progress to modern large language models. Along the way, we will explore a variety of core NLP tasks, such as summarization, classification, and machine translation, as well as some basic linguistic concepts.

The course will also cover different evaluation methods and commonly used metrics, as well as NLP applications in the social sciences and humanities. Beyond these, we will explore some emerging research areas, such as multimodal NLP (language and vision) and the privacy and ethical challenges of deploying language technologies.

The goal of this class is twofold: first, to provide students with the knowledge and skills to understand and deploy language technologies; and second, to encourage a deeper appreciation of language itself and the societal implications of language technologies.

Location:

  • Session 21600: West Village H, Room 366
  • Session 21601: Richards Hall, Room 236
Time:

  • Session 21600: Tuesday & Friday 1:35-3:15pm
  • Session 21601: Tuesday & Friday 3:25-5:05pm
Grading:

    Complete independently: 5 programming assignments (30%), 5 quizzes (30%)

    Complete in groups of 1-4:
    Course project (40%): initial pitch 5%, data and experimental design 5%, grade contract 2%, presentation 3%, final report 25%

    Textbook:

    Daniel Jurafsky and James H. Martin. 2025. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd edition. Online manuscript released August 24, 2025. (Draft chapters available online)

    Gradescope: code to join NGZDZP

    Ed Discussion: link to join

    Course Staff

    Instructors:

    Session 21600: David Smith, email: dasmith[at]ccs.neu.edu

    Session 21601: Si Wu, email: siwu[at]ccs.neu.edu

    TAs:

    Divya Sri Bandaru, email: bandaru.di[at]northeastern[dot]edu

    Tejus Dinesh, email: dinesh.te[at]northeastern[dot]edu

    Zankhana Pratik Mehta, email: mehta.zan[at]northeastern[dot]edu

    Office hours:

    David Smith Monday 11am–12pm, Thursday 4–5pm Zoom link
    Si Wu Wednesday 10am–12pm Zoom link
    Divya Sri Bandaru Monday 2pm–4pm Zoom link
    Tejus Dinesh Thursday 6pm–8pm Zoom link
    Zankhana Pratik Mehta Wednesday 3pm–5pm Zoom link

    Policies

    Quiz

    The quizzes will be administered in class, either at the beginning or end of a lecture. The dates of the first three quizzes will not be announced in advance. You may drop your lowest quiz score at the end of the semester. The format will consist primarily of multiple choice questions and will focus on assessing your understanding of high-level concepts. If you know in advance that you will miss a lecture, please email us before class to be eligible for a make-up quiz. If unforeseen circumstances prevent you from attending, please contact us as soon as possible so we can consider your situation.

    Coding assignment late policy

    Assignments are due at the announced due date and time, usually 11:59 p.m. You will be granted one homework extension of four calendar days, to be used at your discretion, without having to ask. This single extension is meant to smooth over unforeseen crunches in your schedule, and you cannot simply distribute the four late days among four assignments. After the first late assignment, unexcused late assignments will be penalized 10% per calendar day late. We normally will not accept assignments after the date on which the following assignment is due or after the solutions have been handed out, whichever comes first. If you know in advance of circumstances that would cause you to turn in an assignment late, please contact the instructor before the assignment is due to ask if an extension is possible.
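The arithmetic of the late policy above can be sketched as follows. This is an illustration only, not an official grading tool. The function name `late_adjusted_score` and its parameters are hypothetical, and several details are assumptions the policy does not state: that the 10% penalty applies to the earned score, that partial days round up to a full calendar day, that days beyond the four-day extension are penalized individually, and that the score cannot go below zero.

```python
import math


def late_adjusted_score(raw_score, days_late, use_extension=False):
    """Return the score after the late penalty (illustrative sketch only).

    raw_score     -- score earned before any penalty
    days_late     -- time past the deadline, in days (fractions allowed)
    use_extension -- True if the single four-day extension is applied here
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")

    # Assumption: any part of a calendar day counts as a full day late.
    penalized_days = math.ceil(days_late)

    # Assumption: the one-time extension forgives up to four late days,
    # and any days beyond that are penalized as usual.
    if use_extension:
        penalized_days = max(0, penalized_days - 4)

    # 10% per calendar day late, applied to the earned score
    # (assumption), with a floor of zero.
    return max(0.0, raw_score * (10 - penalized_days) / 10)
```

For example, under these assumptions a submission two days late without the extension keeps 80% of its earned score, while one three days late that uses the extension keeps all of it.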

    Academic Integrity policy

    Please refer to the Academic Integrity Policy here.

    Disability Access Services

    We are committed to accommodating students with disabilities. Please contact Disability Access Services and follow the outlined procedure.

    Final Grade

    >93% A, 90-93% A-, >87% B+, >83-87% B, 80-83% B-, >77% C+, >73-77% C, 70-73% C-, >67% D+, >63-67% D, 60-63% D-, less than 60% F.
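One way to read the grading weights and the letter scale above is sketched below. It is an illustration under stated assumptions, not the instructors' actual calculation. The function names `course_percentage` and `letter_grade` are hypothetical. The sketch assumes that assignments are weighted equally, that the lowest quiz is dropped before averaging, that the project is summarized as a single percentage, that no rounding is applied, and that each boundary value falls into the range that lists it explicitly (for example, exactly 93% is an A- and exactly 87% is a B).

```python
def course_percentage(assignments, quizzes, project):
    """Combine component scores (each 0-100) into an overall percentage.

    Weights from the syllabus: assignments 30%, quizzes 30%, project 40%.
    Assumption: items within a category are weighted equally.
    """
    assignment_avg = sum(assignments) / len(assignments)

    # Drop the lowest quiz score, as the quiz policy allows.
    kept = sorted(quizzes)[1:] if len(quizzes) > 1 else list(quizzes)
    quiz_avg = sum(kept) / len(kept)

    return (30 * assignment_avg + 30 * quiz_avg + 40 * project) / 100


def letter_grade(pct):
    """Map a percentage to a letter using the scale listed above.

    Assumption: ">X" is a strict lower bound, and ranges written
    as "X-Y" include both endpoints.
    """
    if pct > 93:
        return "A"
    if pct >= 90:
        return "A-"
    if pct > 87:
        return "B+"
    if pct > 83:
        return "B"
    if pct >= 80:
        return "B-"
    if pct > 77:
        return "C+"
    if pct > 73:
        return "C"
    if pct >= 70:
        return "C-"
    if pct > 67:
        return "D+"
    if pct > 63:
        return "D"
    if pct >= 60:
        return "D-"
    return "F"
```

For instance, `letter_grade(course_percentage([80] * 5, [90] * 5, 85))` evaluates an 85% overall score, which falls in the B range under this reading.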

    Guest Lectures

    Guest lectures will be recorded and posted on Ed afterward. Students are encouraged to attend in person to engage with the guest lecturers.

    Announcement

    — Updated quiz policy! You may drop one quiz at the end, and there will only be 5 quizzes.

    — Added policy on quizzes and suggested readings!

    — Added late policy!

    — Gradescope and Ed Discussion are up!

    — The class website is up!

    Schedule

    Week, date Topics Lecture Slides Suggested Readings Assignments Others
    1, Fri, 9/5 Introduction and Course Logistics: Language Models in Brains and Machines Session 21601 Slides

    Session 21600 Slides
    2, Tues, 9/9 Words, Regular Expressions, and N-gram (Markov) Models Session 21601 Slides

    Session 21600 Slides
    Assignment 1 released: Empirical Regularities of Language: Evaluating Predictions and Counting Words
    2, Fri, 9/12 Text Classification: Naive Bayes, Logistic Regression, and Friends Session 21601 Slides

    Session 21600 Slides
    JM 4
    3, Tues, 9/16 Word Embeddings Session 21601 Slides

    Session 21600 Slides
    JM 5
    3, Fri, 9/19 Introduction to Neural Networks Session 21600 Slides

    Session 21601 Slides
    JM 6

    Goodfellow Chapter 6

    MIT 6.390 Intro to ML notes
    Assignment 1 due (11:59pm), Assignment 2 released: Predictive and Interpretive Text Classification
    4, Tues, 9/23 Beyond Words: Morphology, Syntax, and Semantics Session 21600 Slides

    Session 21601 Slides
    JM 17
    JM 19
    Instructions for your course project pitch
    4, Fri, 9/26 Sequence Data, Recurrent Networks, and Attention Session 21601 Slides

    Session 21600 Slides
    JM 13
    5, Tues, 9/30 Transformers Session 21601 Slides

    Session 21600 Slides
    JM 8
    Assignment 2 due (11:59pm)
    5, Fri, 10/3 A Taxonomy of Large Language Models: Data, Weights, Training, and Inference Session 21601 Slides

    Session 21600 Slides
    Project pitch due (11:59pm)
    6, Tues, 10/7 Guest Lecture: Alexander Spangher (postdoc @ Stanford)

    at Si's session in Richards 236

    Assignment 3 released: Probing Transformers
    6, Fri, 10/10 Pretraining Session 21601 Slides

    Session 21600 Slides
    7, Tues, 10/14 Generation Algorithms Session 21601 Slides

    Session 21600 Slides
    7, Fri, 10/17 Post-Training: RLHF, DPO, and Friends Session 21601 Slides

    Session 21600 Slides
    Instructions for submitting your research plan and sample data
    8, Tues, 10/21 Tokenization, Prompting and In-context Learning Session 21601 Slides

    Session 21600 Slides
    Assignment 3 due (11:59pm)
    8, Fri, 10/24 Evaluation, Benchmarks, and Experimental Design Session 21601 Slides

    Session 21600 Slides
    Assignment 4 released: Language Model Decoding Algorithms
    9, Tues, 10/28 Retrieval, Retrieval-Augmented Generation, and Summarization Session 21601 Slides

    Session 21600 Slides
    9, Fri, 10/31 Multilinguality Session 21601 Slides

    Session 21600 Slides
    Research plan and sample data due (11:59pm)
    10, Tues, 11/4 Guest Lecture: Terra Blevins (Northeastern): Breaking the Curse of Multilinguality in Language Models

    at David's session in WVH 366

    This lecture will be based on three of Prof. Blevins' papers: MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling; Targeted Multilingual Adaptation for Low-resource Language Families; and Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models
    Project Presentation Guidelines
    10, Fri, 11/7 Beyond Individuals: Language in Social Context Session 21601 Slides

    Session 21600 Slides
    Assignment 4 due (11:59pm)
    Quiz 4
    11, Tues, 11/11 No Class, Veterans Day
    Instructions for submitting your grade contract
    11, Fri, 11/14 Guest Lecture: Niloofar Mireshghallah (META AI and CMU): What Does It Mean for Agentic AI to Preserve Privacy? Mapping the New Data Sinks and Leaks

    at David's session in WVH 366

    12, Tues, 11/18 Language and Visual Context
    Quiz 5
    12, Fri, 11/21 Guest Lecture: Lucy Li (postdoc @ UW, and incoming assistant prof @ University of Wisconsin-Madison)

    at Si's session in Richards 236

    13, Tues, 11/25 Final Lecture: [Student Suggested Topics]
    Grade contract due tomorrow, November 26 (11:59pm)
    13, Fri, 11/28 No Class, Happy Thanksgiving! 🦃
    14, Tues, 12/2 Project Presentations
    14, Fri, 12/5 Project Presentations
    15, Thurs, 12/11
    Final report due (11:59pm)