Publications

PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation

Onkar Susladkar, Tushar Prakash, Adheesh Juvekar, Kiet A Nguyen, Dong-Hwan Jang, Inderjit S Dhillon, Ismini Lourentzou

arXiv preprint arXiv:2601.16210 (submitted to CVPR) • 2026

Discrete video VAEs underpin modern text-to-video generation and video understanding systems, yet existing tokenizers typically learn visual codebooks at a single scale with limited vocabularies and s...

Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination

Xinzhuo Li*, Adheesh Juvekar*, Xingyou Liu, Muntasir Wahed, Kiet A Nguyen, Ismini Lourentzou

arXiv preprint arXiv:2506.21546 (submitted to CVPR) • 2026

Segmentation Vision-Language Models (VLMs) have significantly advanced grounded visual understanding, yet they remain prone to pixel-grounding hallucinations, producing masks for incorrect objects or ...

* Equal contribution

CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models

Kiet A Nguyen, Adheesh Juvekar, Tianjiao Yu, Muntasir Wahed, Ismini Lourentzou

Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) • 2025

Recent advances in Large Vision-Language Models (LVLMs) have enabled general-purpose vision tasks through visual instruction tuning. While existing LVLMs can generate segmentation masks from text prom...

PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation

Muntasir Wahed*, Kiet A Nguyen*, Adheesh Juvekar, Xinzhuo Li, Xiaona Zhou, Vedant Shah, Tianjiao Yu, Pinar Yanardag, Ismini Lourentzou

arXiv preprint arXiv:2412.15209 • 2024

Despite significant advancements in Large Vision-Language Models (LVLMs) capabilities, existing pixel-grounding models operate in single-image settings, limiting their ability to perform detailed, fin...

* Equal contribution

MetaCompare 2.0: Differential ranking of ecological and human health resistome risks

Monjura Afrin Rumi, Min Oh, Benjamin C Davis, Connor L Brown, Adheesh Juvekar, Peter J Vikesland, Amy Pruden, Liqing Zhang

FEMS Microbiology Ecology • 2024

While numerous environmental factors contribute to the spread of antibiotic resistance genes (ARGs), quantifying their relative contributions remains a fundamental challenge. Similarly, it is importan...

Uncertainty in Action: Confidence Elicitation in Embodied Agents

Tianjiao Yu, Vedant Shah, Muntasir Wahed, Kiet A Nguyen, Adheesh Juvekar, Tal August, Ismini Lourentzou

arXiv preprint arXiv:2503.10628 • 2025

Expressing confidence is challenging for embodied agents navigating dynamic multimodal environments, where uncertainty arises from both perception and decision-making processes. We present the first w...