Research

Publications

Handling Missing Responses under Cluster Dependence with Applications to Language Model Evaluation

Z Zeng, D Arbour, A Feller, I Dasgupta, AR Sinha, EH Kennedy

NeurIPS 2025, Nov 2025

Abstract: Human annotations play a crucial role in evaluating the performance of GenAI models. Two common challenges in practice, however, are missing annotations (the response variable of interest) and cluster dependence among human-AI interactions (e.g., questions asked by the same user may be highly correlated). Reliable inference must address both...

arXiv → Google Scholar →

SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation

CY Lu, MM Tanjim, I Dasgupta, S Sarkhel, G Wu, S Mitra, S Chaterji

ICCV 2025, Oct 2025

Abstract: We present SKALD, a multi-shot video assembly method that constructs coherent video sequences from candidate shots with minimal reliance on text. Central to our approach is the Learned Clip Assembly (LCA) score, a learning-based metric that measures temporal and semantic relationships between shots to quantify narrative coherence. We tackle...

Google Scholar →

VISIAR: Empower MLLM for visual story ideation

Z Xia, S Sarkhel, M Tanjim, S Petrangeli, I Dasgupta, Y Chen, J Xu, D Liu, ...

ACL Findings 2025, Aug 2025

Abstract: Ideation, the process of forming ideas from concepts, is a big part of the content creation process. However, the noble goal of helping visual content creators by suggesting meaningful sequences of visual assets from a limited collection is challenging. It requires a nuanced understanding of visual assets and the...

Google Scholar →

SmartEdit: Editing-driven Engagement Prediction and Enhancement of Short-Videos

S Gupta, I Dasgupta, S Petrangeli, S Sarkhel

2025 IEEE International Conference on Multimedia and Expo (ICME), Jul 2025

Abstract: Today, short-videos dominate social media, yet short-video creators lack systematic tools to predict engagement and refine content before uploading. Existing approaches focus primarily on post-publication metrics, failing to address engagement prediction via video editing elements. To address this, we curate VidES, a novel dataset linking short-video engagement to specific...

Google Scholar →

HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances

S Narasimhaswamy, U Bhattacharya, X Chen, I Dasgupta, S Mitra, M Hoai

CVPR 2024, Jun 2024

Abstract: Text-to-image generative models can generate high-quality humans, but realism is lost when generating hands. Common artifacts include irregular hand poses, shapes, incorrect numbers of fingers, and physically implausible finger orientations. To generate images with realistic hands, we propose a novel diffusion-based architecture called HanDiffuser that achieves realism by injecting...

Google Scholar →

View All Publications on Google Scholar →

Patents

Enhancing artificial intelligence responses with contextual usage insights

Akash Vivek Maharaj, Vaishnavi Muppala, Shivakumar Vaithyanathan, Manas Garg, Kenneth George Russell, Ishita Dasgupta, Anup Bandigadi Rao, Aleksander Pejcic

US20250315460A1, Filed: Apr 2024, Published: Oct 2025

A system and method for enhancing AI assistant responses with contextual usage insights. The system determines whether to add contextual usage data to responses generated from application documentation, providing relevant context about how applications are being used.

View Patent →