Si Wu (吴斯) Pronouns: she/her/hers
Khoury College of Computer Science, Northeastern University
Ph.D. student
Advisor: David Smith
Email: siwu@ccs[dot]neu[dot]edu
Twitter: @siwu_nlp
Research interests: I am interested in applying natural language processing to the social sciences (linguistics, cognitive science, and psychology in particular) as well as the humanities. My current research focuses on multilinguality and multimodality.
🔊News
9/7/23 I am presenting a poster at TADA this November in Amherst, and hope to chat with folks there! More details to come!
6/6/23 Our paper Composition and Deformance: Measuring Imageability with a Text-to-Image Model was accepted at the Workshop on Narrative Understanding at ACL 2023, and it's now on arXiv! Hope to see you in Toronto🇨🇦 in July!
4/28/23 We gave a talk on the Boston Globe Photo Archive project at the 2023 Greater Boston Digital Research & Pedagogy Symposium at MIT.
🔎Research
At Northeastern: machine translation, multimodality, psycholinguistics, and some OCR-related tasks (handwriting recognition, layout analysis, table/figure extraction).
At UC San Diego: font reconstruction, briefly part of the Print & Probability project (with Prof. Taylor Berg-Kirkpatrick); flash memory error prediction (with Prof. Paul H. Siegel).
📝Publications
Google Scholar (* indicates equal contribution)
Papers:
"Composition and Deformance: Measuring Imageability with a Text-to-Image Model"
Si Wu and David A. Smith.
The 5th Workshop on Narrative Understanding at ACL, 2023
"Scalable Font Reconstruction with Dual Latent Manifolds"
Nikita Srivatsan, Si Wu, Jonathan Barron, and Taylor Berg-Kirkpatrick.
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
"Digital Editions as Distant Supervision for Layout Analysis of Printed Books" (pre-print)
*Alejandro Toselli, *Si Wu, and *David A. Smith.
In Proceedings of the International Conference on Document Analysis and Recognition (ICDAR), 2021
"Bad Page Detector for NAND Flash Memory"
Yi Liu, Si Wu, and Paul H. Siegel.
Non-Volatile Memories Workshop, 2020
"Quantifying Gaze Behavior During Real World Interactions Using Automated Object, Face, and Fixation Detection"
Leanne Chukoskie, Shengyao Guo, Eric Ho, Yalun Zheng, Qiming Chen, Vivian Meng, John Cao, Nikhita Devgan, Si Wu, and Pamela C. Cosman.
IEEE Transactions on Cognitive and Developmental Systems, 2018
Others:
"The language of US partisan newspapers from 1869 to 1925" Si Wu, David Smith. New Directions in Analyzing Text as Data (TADA) 2023
"Data Archeology for Archival Preservation: Training OCR Models for Knowledge Extraction in Cultural Heritage Collections" Giulia Taurino, Si Wu, David Smith. The Association for Computers and the Humanities (ACH) 2023
"Archeologies of data in contemporary journalism: The digital afterlives of newspapers' photo morgues" Giulia Taurino, Si Wu, David Smith. Computation + Journalism (C+J2022)
🗣️Talks
- 06/2022, Archeologies of Data in Contemporary Journalism: the digital afterlives of newspapers’ photo morgues, Computation + Journalism Conference (C+J2022) at Columbia School of Journalism.
👩🏻‍🏫Teaching
TA:
- CS 6200 / IS 4200: Information Retrieval (Fall 2023, Northeastern)
Tutor:
- CSE 100: Advanced Data Structures (Winter 2019, Spring 2019, UCSD)
- CSE 30: Computer Organization and Systems Programming (Fall 2019, UCSD)
- CSE 8B: Java Programming II (Spring 2018, UCSD)
Outreach teaching:
- Splash @ UCSD 2019: Build My First Website
🎓Education
Northeastern University (2020 - Present)
Ph.D. in Computer Science
University of California, San Diego (2016 - 2020)
B.S. in Computer Engineering
🍿Miscellaneous
In my spare time, I enjoy ceramics, singing, photography, and traveling. I used to work on some hardware projects (Mars rovers, circuits, and embedded systems), so I am always down to build a fun gadget!
Human languages that I speak: English, Mandarin, Cantonese, and French (beginner!).