Alexandria Digital Research Library

Understanding the Semantics of Networked Text

Author:
Miao, Gengxin
Degree Grantor:
University of California, Santa Barbara. Electrical & Computer Engineering
Degree Supervisor:
Louise E. Moser and Xifeng Yan
Place of Publication:
[Santa Barbara, Calif.]
Publisher:
University of California, Santa Barbara
Creation Date:
2012
Issued Date:
2012
Topics:
Computer Science
Keywords:
Social Routing
Data Mining
Information Retrieval
Heterogeneous Information Network
Genres:
Online resources
Dissertations, Academic
Dissertation:
Ph.D.--University of California, Santa Barbara, 2012
Description:

Social networks are a powerful means for information sharing. A large social network typically has hundreds of millions of users, interconnected through social links to friends, colleagues, family members, and others. The frequent interaction and information exchange between users form a massive heterogeneous information network. Understanding the semantic information in the textual data and the topological information in the social network poses a grand challenge for data mining researchers. This Ph.D. dissertation tackles the problem of understanding the unstructured or semi-structured data in social networks. First, we describe a parallel spectral clustering algorithm that makes clustering analysis possible on large-scale social networks with hundreds of millions of users. Because understanding such data requires comprehensive analysis, extraction, and integration of information from multiple sources, we next describe an information extraction engine that extracts data items from Web pages without knowing the data wrapping template. We also present an information integration approach that aggregates data tables collected from the Web to better serve general Web search. To make information routing in collaborative networks more efficient, we describe generative models that characterize expertise awareness relationships between agents in collaborative networks and provide efficient task routing recommendations. We also describe, in depth, the first quantitative analysis of the information flow efficiency in collaborative networks. Finally, to utilize the accumulated information, we describe a topic modeling approach that allows document retrieval across multiple document sets with possible semantic and vocabulary gaps.

Physical Description:
1 online resource (261 pages)
Format:
Text
Collection(s):
UCSB electronic theses and dissertations
ARK:
ark:/48907/f34x55r5
ISBN:
9781267649188
Catalog System Number:
990038915680203776
Rights:
In Copyright
Copyright Holder:
Gengxin Miao
Access:
This item is restricted to on-campus access only. Please check our FAQs or contact UCSB Library staff if you need additional assistance.