
Si Wu (吴斯) Pronouns: she/her/hers

Khoury College of Computer Science, Northeastern University
Ph.D. student
Advisor: Prof. David A. Smith
Email: siwu@ccs[dot]neu[dot]edu
Twitter: @siwu_nlp

Research interests: I am interested in applying natural language processing to the social sciences (linguistics, cognitive science, and psychology in particular) as well as the humanities.


9/7/23 I am presenting a poster at TADA this November in Amherst, and hope to chat with folks there! More details to come!

6/6/23 Our paper Composition and Deformance: Measuring Imageability with a Text-to-Image Model was accepted at the Workshop on Narrative Understanding at ACL 2023, and it's now on arXiv! Hope to see you in Toronto🇨🇦 in July!

4/28/23 We gave a talk on the Boston Globe Photo Archive project at the 2023 Greater Boston Digital Research & Pedagogy Symposium at MIT.


At Northeastern: machine translation, multimodality, psycholinguistics, and some OCR-related tasks (handwriting recognition, layout analysis, table/figure extraction).

At UC San Diego: font reconstruction and, briefly, the Print & Probability project (with Prof. Taylor Berg-Kirkpatrick); flash memory error prediction (with Prof. Paul H. Siegel).


Google Scholar

(* indicates equal contribution)


"Composition and Deformance: Measuring Imageability with a Text-to-Image Model"
Si Wu and David A. Smith.
The 5th Workshop on Narrative Understanding at ACL, 2023

"Scalable Font Reconstruction with Dual Latent Manifolds"
Nikita Srivatsan, Si Wu, Jonathan Barron, and Taylor Berg-Kirkpatrick.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

"Digital Editions as Distant Supervision for Layout Analysis of Printed Books" (pre-print)
*Alejandro Toselli, *Si Wu, and *David A. Smith.
In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2021

"Bad Page Detector for NAND Flash Memory"
Yi Liu, Si Wu, and Paul H. Siegel.
Non-Volatile Memories Workshop, 2020

"Quantifying Gaze Behavior During Real World Interactions Using Automated Object, Face, and Fixation Detection"
Leanne Chukoskie, Shengyao Guo, Eric Ho, Yalun Zheng, Qiming Chen, Vivian Meng, John Cao, Nikhita Devgan, Si Wu, and Pamela C. Cosman.
IEEE Transactions on Cognitive and Developmental Systems, 2018


"Data Archeology for Archival Preservation: Training OCR Models for Knowledge Extraction in Cultural Heritage Collections"
Giulia Taurino, Si Wu, and David Smith.
The Association for Computers and the Humanities (ACH), 2023





Outreach teaching:


Northeastern University (2020 - Present)
Ph.D. in Computer Science

University of California, San Diego (2016 - 2020)
B.S. in Computer Engineering


In my spare time, I enjoy photography, singing, ceramics, and traveling, and I am also a bit of a cinephile. I used to work on some hardware projects (Mars rovers, circuits, and embedded systems), so I am always down to build a fun gadget!

Human languages that I speak: English, Mandarin, Cantonese, and French (beginner!).