Carnegie Mellon University

BIGDATA: Large-Scale Transductive Learning from Heterogeneous Data Sources

Information Retrieval, Text Mining, and Analytics

By Yiming Yang

Many important problems in the big-data era involve predictions based on heterogeneous sources of information and on dependency structures in the data. The fundamental research questions include how to develop a unified optimization framework for such predictions across various kinds of tasks; how to keep inference computationally tractable when the combined space of model parameters is extremely large; and how to significantly enhance the prediction power of the system by leveraging massively available unlabeled data in addition to human-annotated, often sparse, training data. This project will address these challenges using multiple approaches: a unified representation of heterogeneous information sources using product graphs, transductive learning over graph products, large-scale optimization algorithms, and evaluations in multiple important applications. The proposed new approach will be evaluated on benchmark data collections for context-aware collaborative filtering, semi-structured event detection and tracking, and expert finding via multi-source social network analysis. If successful, the proposed work will offer principled solutions for enhancing the prediction power of systems in a broad range of tasks that involve recommendation, classification, and regression.
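
To make the idea of transductive learning over a product graph more concrete, the following is a minimal illustrative sketch and not the project's actual method. It assumes the Kronecker (tensor) product as the graph product and a simple label-propagation update as the transductive learner; both are choices made here for illustration. The function names, the toy user and item graphs, and the parameter alpha are all hypothetical.

```python
def kronecker(a, b):
    """Adjacency matrix of the Kronecker (tensor) product of two graphs.

    a is n x n, b is m x m; vertex (i, k) of the product maps to index i * m + k.
    """
    n, m = len(a), len(b)
    return [[a[i][j] * b[k][l] for j in range(n) for l in range(m)]
            for i in range(n) for k in range(m)]


def row_normalize(w):
    """Divide each row by its sum (rows that sum to zero are left as zeros)."""
    out = []
    for row in w:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def propagate(w, y, alpha=0.5, iters=50):
    """Label propagation: iterate f <- alpha * S f + (1 - alpha) * y,
    where S is the row-normalized adjacency matrix w."""
    s = row_normalize(w)
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(s_ij * f_j for s_ij, f_j in zip(row, f)) + (1 - alpha) * y_i
             for row, y_i in zip(s, y)]
    return f


# Hypothetical toy data: two similar users, two unrelated items (self-loops included).
users = [[1, 1], [1, 1]]
items = [[1, 0], [0, 1]]
product = kronecker(users, items)

# One labeled (user, item) pair: user 0 likes item 0. All other pairs are unlabeled.
y = [1.0, 0.0, 0.0, 0.0]
scores = propagate(product, y)
```

In this toy setting the single observed label spreads from the pair (user 0, item 0) to the pair (user 1, item 0), because the two users are neighbors in the user graph, while pairs involving item 1 receive no score. A realistic system would need sparse representations and would avoid materializing the product graph, which grows with the product of the sizes of the input graphs.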