One semester is not nearly enough to cover all the methods, tasks, and data in the field of natural language processing. The course project is a chance to focus on one or a small number of tasks, models, or approaches that you find interesting. You may work in teams of one to four students. Although it would be easier to work with others in the same section, it is permitted to collaborate across sections 21600 and 21601.
The first step in your project is to submit an initial pitch, with two key pieces of information:
- a list of the members of your project team; and
- one paragraph of five to ten sentences, describing the NLP approach you will implement and the task and data you will evaluate on.
Please submit this information as a PDF on Gradescope. We will then give feedback on the scope of your pitch, commenting on whether it seems too narrow or too broad for a team of your size, and adding further suggestions for refinement.
Now that you've formed your teams and gotten feedback on your initial pitch, you can start working. To structure your thinking, we would like you to write a brief research plan and present some sample data. Your plan should include at least one paragraph under each of the following headings:
- Summarize the topic of your project. This may repeat what you said in the pitch, or you may amend it.
- What data will you use? If you are using an existing dataset, please describe it. If you are collecting your own data, briefly describe how you plan to collect it and how much of it you have already collected.
- What experiments will you run?
- What evaluation methods and metrics will you use?
- What roadblocks to completing your project have you encountered? If any, discuss how you might get around them or how you might modify your plan.
- What data will you annotate? Give one example document/passage/sentence, at whatever level of granularity you are annotating. If it is longer than one page, show the first page. What possible labels are there? What label(s) do you assign to this example?
Please submit this information as a PDF on Gradescope. We will give you feedback on it.
Hopefully your projects are going well so far. Since every team's project will be different, it is hard to compare the exact amount of work each team will perform. We are therefore asking each team to write a specification, or grade contract, for what you would need to accomplish to achieve various letter grades on your project. Since this is an upper-level class, we will start at B. Write down four sets of milestones:
- what milestones your team should accomplish to receive a B;
- what additional milestones your team should accomplish to receive a B+;
- what additional milestones your team should accomplish to receive an A-; and
- what additional milestones your team should accomplish to receive an A.
Each set of milestones should be detailed enough that it will be clear whether you've met them. Please be as realistic as you can, and focus on what you can control. For instance, it would be a bad idea to promise to reach a certain accuracy level; instead, describe the approach you will pursue, the experiments you will conduct, or the things you will implement.
Submit this as a PDF on Gradescope, and we will provide feedback on any adjustments we think you should make to these milestones.
The final step is to submit your final report. This should be a PDF document in single-column format. Text should be single-spaced (depending on your formatting, this might be 1-1.15 line spacing). Your report should include the following sections, for some of which we have specified minimum page counts:
- State your project title and the members of your team. Remember also to add all teammates to the Gradescope submission.
- List all the milestones from your grade contract and, for each one, describe how you met it (or explain why you did not).
- For multi-person teams, describe what each member contributed.
- Describe the research question you are trying to answer.
- Describe related work.
- Describe the data you collected, the data you annotated, and what your annotation scheme was. For instance, you might list the classes along with the descriptions of them you gave to annotators for a classification problem. (at least 1 page)
- Provide one example of a document or data instance you annotated. Depending on the task, this might be very brief, but you may truncate it if it's longer than one page.
- Describe any data processing or model training you did.
- Describe experiments you ran. (at least 1 page)
- Describe your analysis of the experimental results, including visualization, statistical tests, or other methods for understanding your results. (at least 2 pages)
- Provide links to websites, such as GitHub or Google Drive, where your code and data live. These may be public or private, as long as the professors can read them.
- Summarize how your findings relate to your initial research questions.
Project Goals and Topics
To give you some context for the initial pitch, here is some information on the goals of the project and possible topics. There are three key components of any project in this course:
- annotating natural language data,
- performing natural language inference, and
- evaluating performance.
Annotating natural language data: Although there are many datasets available, students in this course should have experience looking at natural language data and annotating it to help train and evaluate models. In most cases, you will supplement data you annotate with other datasets, but annotating data yourself helps you think through the problem and evaluation. If you are defining a new task, you might spend more time on creating new evaluation data. If you are evaluating a new approach to an old task, however, you should still create some evaluation data to help develop your intuition about the task.
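To make this concrete, here is a minimal sketch of one way to store your annotations, using JSON Lines; the fields, IDs, and label set below are hypothetical, and any consistent format your team documents is fine.

```python
import json

# Hypothetical label set for a binary annotation task; use whatever
# scheme fits your project, but write it down and keep it fixed.
LABELS = {"flashback", "not_flashback"}

# Each record pairs a passage with an ID and a label; these examples
# are invented for illustration.
examples = [
    {"id": "novel42-scene3", "text": "She remembered the summer of 1989...", "label": "flashback"},
    {"id": "novel42-scene4", "text": "The train pulled into the station.", "label": "not_flashback"},
]

with open("annotations.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        assert ex["label"] in LABELS  # catch typos in hand-entered labels
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

One record per line makes it easy to compare annotation rounds and to load the file with standard tools later.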
Performing natural language inference: Choose a task or define a new one. Choose models appropriate for your task.
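As one illustration (not a requirement), if you choose an existing classification task, the Hugging Face transformers library can produce baseline predictions in a few lines; the checkpoint named below is just one example of an off-the-shelf sentiment model.

```python
from transformers import pipeline

# Load an off-the-shelf sentiment classifier as a baseline; swap in a
# checkpoint appropriate for your own task.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

preds = classifier(["I loved this movie.", "The plot made no sense."])
for p in preds:
    print(p["label"], round(p["score"], 3))  # e.g., POSITIVE 0.999
```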
Evaluating performance: Choose an evaluation protocol and evaluation metric(s) suitable for your task. Analyze the results of an NLP system to help find strengths and weaknesses, or kinds of data where it performs better or worse.
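For a classification task, standard metrics take only a few lines with scikit-learn; the gold and predicted labels below are invented for illustration.

```python
from sklearn.metrics import accuracy_score, classification_report

# Hypothetical gold annotations and system predictions.
gold = ["flashback", "not_flashback", "flashback", "not_flashback"]
pred = ["flashback", "flashback", "flashback", "not_flashback"]

print("accuracy:", accuracy_score(gold, pred))
# Per-class precision, recall, and F1 help you see where the system
# does better or worse, which feeds directly into your analysis.
print(classification_report(gold, pred))
```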
We will discuss these steps in more detail in future project stages.
Here are some suggestions for general directions your project might take. They are only suggestions, and you do not need to choose from this list.
- Define a new document classification task with difficult-to-define criteria. For example, classify whether scenes in novels are flashbacks, or whether posts or articles give advice about healthy eating. Evaluate the accuracy of large pretrained LLMs on this task, use them to annotate example data, and train a lightweight classifier on these synthetic annotations (see the sketch after this list).
- Evaluate how one or more LLMs differ in what they output, or in what they refuse to output, depending on a user's self-description. Generate user profiles with either explicit information (e.g., gender, age, political party) or implicit information (e.g., favorite movies, sports teams), and then observe differences in how LLMs answer sensitive questions when prompted with this information.
- Probe one or more pretrained language models for how their internal representations at different layers encode information such as named entities, grammatical relations, or semantic properties (for example, whether nouns are abstract or concrete).
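To illustrate the first suggestion, here is a minimal sketch of its final step, training a lightweight classifier on synthetic annotations; the texts and labels below are invented, and in a real project the labels would come from prompting an LLM.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented documents with labels standing in for LLM-produced
# ("synthetic") annotations on the healthy-eating-advice task.
texts = [
    "Eat more leafy greens and cut back on added sugar.",
    "The stock market fell sharply on Tuesday.",
    "Try swapping soda for sparkling water at lunch.",
    "The senator announced another campaign stop.",
]
llm_labels = ["advice", "not_advice", "advice", "not_advice"]

# A lightweight classifier: TF-IDF features plus logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, llm_labels)

print(model.predict(["Drink eight glasses of water a day."]))
```

Once trained, a model like this is far cheaper to run than the LLM that labeled its training data, which is the point of this distillation-style setup.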