Carnegie Mellon University

Alexander Hauptmann

Research Professor, Language Technologies Institute

  • 5519 Gates & Hillman Centers
  • 412-268-1448

Alexander Hauptmann is a Research Professor at Carnegie Mellon University's Language Technologies Institute (LTI). His research lies at the intersection of multimedia analysis, machine learning, and natural language processing, with a focus on developing systems for large-scale video and multimedia content analysis. Dr. Hauptmann has been instrumental in creating technologies for video search, automatic annotation, and multimedia event detection, bridging the gap between human understanding and computational systems in real-world contexts.

Dr. Hauptmann earned his Ph.D. in Computer Science from Carnegie Mellon University, following earlier degrees in Computer Science from Technische Universität Berlin and Psychology from Johns Hopkins University. His multidisciplinary background informs his innovative approach to AI challenges, particularly in understanding the interplay between language, vision, and human interaction. In addition to his role at CMU, he serves as a Visiting Professor at Japan's National Institute of Informatics, contributing to global advancements in AI and multimedia research.

Throughout his career, Dr. Hauptmann has received wide recognition for his contributions, including his pioneering work in speech recognition and multimedia indexing. At CMU, he collaborates with students and researchers to push the boundaries of AI for multimedia applications, consistently producing impactful research and fostering interdisciplinary innovation. His work continues to shape the future of intelligent systems in a multimedia-driven world.

  • Information Extraction
  • Summarization and Question Answering
  • Information Retrieval
  • Text Mining and Analytics
  • Machine Learning
  • Multimodal Computing and Interaction

My research interests revolve around the integration of text, image, video, and audio analysis. In the Informedia Project we built the News-on-Demand application, an instantiation of the Informedia Digital Video Library idea, based completely on automatic methods for processing television and radio news. By combining the strengths of speech recognition, natural language processing, information retrieval, and interface design, the system is able to overcome some of the shortfalls inherent in each of the component technologies.

My goal is to utilize large corpora of "found data", i.e., data that is already available through the Internet or other readily accessible open sources, to improve speech and natural language processing by exploiting advantages across different modalities. It has become clear in recent years that large volumes of text, image, video, and audio can be easily stored and made available for research and applications. However, most of these sources were not produced with computer processing in mind. My intention is to design and build intelligent, understanding programs that help process data from these sources and make the data useful for other applications. This data can be used to improve speech recognition, image understanding, natural language processing, and machine learning, as well as information retrieval. The challenge is to find the right data, process it into a suitable form for training, learning, or re-use, and build mechanisms that can successfully utilize this data.

Speech and multimedia technology are about to make a major impact on our daily interaction with computers. What is needed at this point are clear demonstrations of the advantages of integrated speech and multimedia interfaces.
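The News-on-Demand idea described above — running speech recognition over broadcast video and then searching the resulting transcripts with text retrieval — can be illustrated with a minimal sketch. The segment IDs, transcripts, and the simple TF-IDF scoring below are illustrative assumptions, not the Informedia system's actual data or ranking function.

```python
import math
from collections import Counter

# Hypothetical ASR transcripts for three news video segments
# (illustrative data, not from the Informedia corpus).
transcripts = {
    "seg_01": "the president spoke about the economy and new trade policy",
    "seg_02": "local weather forecast shows rain and cooler temperatures",
    "seg_03": "the senate debated the trade agreement and economic policy",
}

def tf_idf_index(docs):
    """Build a simple TF-IDF index over whitespace-tokenized transcripts."""
    tokenized = {d: text.split() for d, text in docs.items()}
    df = Counter()                       # document frequency per term
    for toks in tokenized.values():
        df.update(set(toks))
    n = len(docs)
    index = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        index[d] = {t: (c / len(toks)) * math.log(n / df[t])
                    for t, c in tf.items()}
    return index

def search(index, query):
    """Rank video segments by summed TF-IDF weight of matched query terms."""
    terms = query.split()
    scores = {d: sum(weights.get(t, 0.0) for t in terms)
              for d, weights in index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

index = tf_idf_index(transcripts)
results = search(index, "trade policy")
print(results[0][0])  # best-matching segment: "seg_03"
```

In a real system the transcripts would come from a speech recognizer (with errors), and the text channel would be fused with visual features and on-screen text, which is exactly where combining modalities compensates for the weaknesses of each component technology.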