Understanding the Semantics of Networked Text
- Degree Grantor:
- University of California, Santa Barbara. Electrical & Computer Engineering
- Degree Supervisor:
- Louise E. Moser and Xifeng Yan
- Place of Publication:
- [Santa Barbara, Calif.]
- Publisher:
- University of California, Santa Barbara
- Creation Date:
- 2012
- Issued Date:
- 2012
- Topics:
- Computer Science
- Keywords:
- Social Routing,
Data Mining,
Information Retrieval, and
Heterogeneous Information Network
- Genres:
- Online resources and Dissertations, Academic
- Dissertation:
- Ph.D.--University of California, Santa Barbara, 2012
- Description:
Social networks are a powerful means of information sharing. A large social network typically has hundreds of millions of users, who are interconnected through social links to friends, colleagues, family members, etc. The frequent interaction and information exchange between these users form a massive heterogeneous information network. Understanding both the semantic information in the textual data and the topological information in the social network poses a grand challenge for data mining researchers. This Ph.D. dissertation tackles the problem of understanding the unstructured and semi-structured data in social networks. First, we describe a parallel spectral clustering algorithm that makes clustering analysis possible on large-scale social networks with hundreds of millions of users. Because comprehensive analysis requires extracting and integrating information from multiple sources, we next describe an information extraction engine that extracts data items from Web pages without knowing the template that wraps the data. We also present an information integration approach that aggregates data tables collected from the Web, thereby better serving general Web search. To make information routing in collaborative networks more efficient, we describe generative models that characterize expertise awareness relationships between agents in collaborative networks and provide efficient task routing recommendations. We also describe, in depth, the first quantitative analysis of information flow efficiency in collaborative networks. Finally, to make use of the accumulated information, we develop a topic modeling approach that enables document retrieval across multiple document sets despite possible semantic and vocabulary gaps.
- Physical Description:
- 1 online resource (261 pages)
- Format:
- Text
- Collection(s):
- UCSB electronic theses and dissertations
- Other Versions:
- http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqm&rft_dat=xri:pqdiss:3540251
- ARK:
- ark:/48907/f34x55r5
- ISBN:
- 9781267649188
- Catalog System Number:
- 990038915680203776
- Copyright:
- Gengxin Miao, 2012
- Rights:
- In Copyright
- Copyright Holder:
- Gengxin Miao
- Access:
- This item is restricted to on-campus access only.