
Si Wu (吴斯) Pronouns: she/her/hers

Khoury College of Computer Science, Northeastern University
Ph.D. student
Advisor: Prof. David A. Smith
Email: siwu@ccs[dot]neu[dot]edu
Twitter: @siwu_nlp

Research interests: I am interested in applying natural language processing to the social sciences (linguistics, cognitive science, and psychology in particular) as well as the humanities.


9/7/23 I am presenting a poster at TADA this November in Amherst, and hope to chat with folks there! More details to come!

6/6/23 Our paper Composition and Deformance: Measuring Imageability with a Text-to-Image Model was accepted at the Workshop on Narrative Understanding at ACL 2023, and it's now on arXiv! Hope to see you in Toronto🇨🇦 in July!

4/28/23 We gave a talk on the Boston Globe Photo Archive project at the 2023 Greater Boston Digital Research & Pedagogy Symposium at MIT.


At Northeastern: multimodality, psycholinguistics, page layout analysis, and OCR.

At UC San Diego: style transfer in fonts and historical document processing (with Prof. Taylor Berg-Kirkpatrick); flash memory error prediction (with Prof. Paul H. Siegel).


Google Scholar

(* indicates equal contribution)


"Composition and Deformance: Measuring Imageability with a Text-to-Image Model"
Si Wu and David A. Smith.
The 5th Workshop on Narrative Understanding at ACL, 2023

"Scalable Font Reconstruction with Dual Latent Manifolds"
Nikita Srivatsan, Si Wu, Jonathan Barron, and Taylor Berg-Kirkpatrick.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

"Digital Editions as Distant Supervision for Layout Analysis of Printed Books" (pre-print)
*Alejandro Toselli, *Si Wu, and *David A. Smith.
In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2021

"Bad Page Detector for NAND Flash Memory"
Yi Liu, Si Wu, and Paul H. Siegel.
Non-Volatile Memories Workshop, 2020

"Quantifying Gaze Behavior During Real World Interactions Using Automated Object, Face, and Fixation Detection"
Leanne Chukoskie, Shengyao Guo, Eric Ho, Yalun Zheng, Qiming Chen, Vivian Meng, John Cao, Nikhita Devgan, Si Wu, and Pamela C. Cosman.
IEEE Transactions on Cognitive and Developmental Systems, 2018


"Data Archeology for Archival Preservation: Training OCR Models for Knowledge Extraction in Cultural Heritage Collections"
Giulia Taurino, Si Wu, and David Smith.
The Association for Computers and the Humanities (ACH), 2023





Outreach teaching:


Northeastern University (2020 - Present)
Ph.D. in Computer Science

University of California, San Diego (2016 - 2020)
B.S. in Computer Engineering


I love stories and storytelling of all forms. I am particularly interested in film, singing, and photography. Additionally, I love tea, cooking, and traveling!

Human languages that I speak: English, Mandarin, Cantonese, French (beginner!).