Tom Hartvigsen

Assistant Professor

Data Science

University of Virginia

Contact: hartvigsen@virginia.edu

(Office: 1919 Ivy Rd., Rm. 339)

Department Website

★ News ★

March'26:

Congratulations to group member Matt Landers for defending his PhD on Reinforcement Learning in Combinatorial Action Spaces!
3Cavaliers Funding Award ($80k) as PI to work on Automated Post-Surgery Feedback: AI-Driven Time Series Analysis to Improve Anesthesia Care.
Invited talk at UVA's Center for Advanced Medical Analytics

Feb'26:

Invited talk at UMD on Continually-Editing VLMs
New preprints on aligning benchmarks with human preferences and editing VLMs using human reasoning.

Jan'26:

Two papers accepted to ICLR 2026 on accelerating offline RL and uncertainty quantification for LLMs.
Paper accepted to MLSys on accelerating diffusion-based language models
Paper accepted to JAMIA on extracting social determinants of health from clinical notes

Nov'25:

Paper accepted to KDD 2026 on Instruction-Based Time Series Editing. Congratulations, Joy!
Featured interview on the AI Exchange @ UVA Podcast
Paper accepted to TMLR on adaptively mixing language model training data via Bayesian optimization

Oct'25:

Congratulations to group member Bryan Christ for defending his PhD!
Paper accepted to Scientific Reports on subtyping for Type 2 Diabetes

Sept'25:

Awarded $2M NIH grant over 2 years to develop A Model Editing Framework for Participatory Multimodal AI in Dermatology (with Ahmed Alaa and Roxana Daneshjou)
3 papers accepted to NeurIPS'25!
Paper accepted to npj Digital Medicine on risks of medical misinformation from LLMs
New preprints:

Aug'25:

4 papers accepted to EMNLP'25!
- Extracting Hidden Moderation Criteria from Reddit Communities
- Sparse Autoencoder Features for Classifications and Transferability
- ModelCitizens: Representing Community Voices in Online Safety
- Lifelong Knowledge Editing requires Better Regularization
Paper accepted to CIKM'25!
- Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing
New preprints:
- Instruction-Based Time Series Editing

July'25:

Paper accepted to COLM'25!
- PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Organizing NeurIPS'25 Workshop on Learning from Time Series for Healthcare

June'25:

Paper accepted to Rehabilitation Psychology on biased portrayals of disability by VLMs
New preprints:
- ModelCitizens: Representing Community Voices in Online Safety
- KScope: A Framework for Characterizing the Knowledge Status of Language Models
Featured in faculty research spotlight

May'25:

4 papers accepted to ACL'25!
3 papers accepted to ICML'25!
- WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing
- BalancEdit: Dynamically Balancing the Generality-Locality Trade-off in Multi-modal Model Editing
- Medical Large Language Model Benchmarks Should Prioritize Construct Validity (Oral)

March'25:

Our AAAI'25 KnowFM Workshop paper was awarded the Outstanding Paper Award.
New preprints:

Feb'25:

New preprints on empirical investigations of sparse autoencoders and lifelong model editing
Paper accepted to CPAL'25 on Sparse MoE

Jan'25:

I have been awarded a Capital One Faculty Fellowship to work on Time Series Reasoning!
2 papers accepted to ICLR'25!
- Composable Interventions for Language Models - Congrats, Arinbjorn!
- Learning under Temporal Label Noise
Paper on sequential knowledge editing accepted to the Workshop on Knowledgeable Foundation Models at AAAI'25
New preprint on lifelong model editing

Dec'24:

Invited talks at UVA's Genome Sciences Seminar Series and UVA's Darden Business School
New preprint on foundation models for protein phenotypes

Nov'24:

Lab member Xu Ouyang was awarded an iPRIME PhD Fellowship. Congrats, Xu!
New preprint on scaling laws for LLM quantization

Oct'24:

New preprints:
Paper accepted to IEEE BigData on spike train classification

Sep'24:

3 papers accepted to NeurIPS'24!
- Are Language Models Actually Useful for Time Series Forecasting? (Spotlight!) - Congrats, Mingtian!
- Test-Time Debiasing of Vision-Language Embeddings
- UniTS: A Unified Multi-Task Time Series Model
3 papers accepted to EMNLP'24!
- MATHWELL: Generating Educational Math Word Problems with Teacher Annotations - Congrats, Bryan!
- Language Models Still Struggle to Zero-shot Reason about Time Series
- Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks

Aug'24:

Paper accepted to TMLR on using LLMs for robust text classification

July'24:

Paper accepted to COLM'24 on multilingual toxicity in LLMs.
Paper accepted to AIES'24 on detecting implicit social biases in VL models

July'24:

New preprints:
- composable interventions for LLMs
- extracting social determinants of health with LLMs
Paper accepted to MICCAI'24 on federated learning for medical imaging

May'24:

Paper accepted to ACL'24 on categorical knowledge editing for LLMs

Apr'24:

Nature Medicine paper on bias in computational pathology

Spring'24:

Invited talks at Dartmouth, IBM Research, UCSF/UC Berkeley, and the University of Alabama at Birmingham

Hi! I'm a tenure-track Assistant Professor of Data Science and, by courtesy, Computer Science at the University of Virginia. I also have appointments in UVA's Comprehensive Cancer Center and National Security & Data Policy Institute. Before joining UVA in Fall 2023, I was a postdoc at MIT CSAIL working with Marzyeh Ghassemi. I received my PhD in Data Science from WPI, where I was advised by Elke Rundensteiner and Xiangnan Kong.

Research

My research group works on machine learning and natural language processing. We work to enable responsible model deployment in ever-changing environments, especially for healthcare.

Active directions and highlights:

Continually monitoring and editing big AI models
- Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adapters (NeurIPS'23) (blog post)
- TAXI: Evaluating Categorical Knowledge Editing for Language Models (ACL'24)
- Composable Interventions for Language Models (ICLR'25)
- Understanding the Limits of Lifelong Knowledge Editing (ICML'25) (Available on EasyEdit!)
- BalancEdit: Dynamically Balancing the Generality-Locality Trade-off in Multi-modal Model Editing (ICML'25)
- Model Editing with External Graph-Based Memory (Findings of ACL'25)
- Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes (ACL'25)
- Lifelong Knowledge Editing requires Better Regularization (Findings of EMNLP'25)
- ReasonEdit: Editing Vision-Language Models using Human Reasoning
Time series AI and multi-modality
- Are Language Models Actually Useful for Time Series Forecasting? (NeurIPS'24 🌟Spotlight🌟)
- Language Models Still Struggle to Zero-shot Reason about Time Series (EMNLP'24)
- UniTS: A Unified Multi-Task Time Series Model (NeurIPS'24)
- Learning under Temporal Label Noise (ICLR'25)
- Instruction-Based Time Series Editing (KDD'26)
- Inferring Events from Time Series using Language Models (preprint)
- BEDTime: A Unified Benchmark for Automatically Describing Time Series (preprint)
Detecting and mitigating harmful biases in language and language models
- PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages (COLM'25)
- ModelCitizens: Representing Community Voices in Online Safety (EMNLP'25)
- Decoding the Rulebook: Extracting Hidden Moderation Criteria from Reddit Communities (EMNLP'25)
- PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models (COLM'24) // Leaderboard // blog post
- ToxiGen: Using LLMs to detect and mitigate implicit social biases (ACL'22). ToxiGen was used to train Llama2, Code Llama, phi-1.5, phi-2, and other LLMs, and to detect toxicity in Econ Forums and Laws.
Healthcare & Biomedical Data Science
- Medical Large Language Model Benchmarks Should Prioritize Construct Validity (ICML'25 🌟Oral Presentation🌟) (talk)
- Demographic Bias in Misdiagnosis by Computational Pathology Models (Nature Medicine)
- Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks (EMNLP'24)
- Dissecting the Heterogeneity of "In-the-Wild Stress" from Multimodal Sensor Data (npj Digital Medicine)
- MedBrowseComp: Benchmarking Medical Deep Research and Computer Use
