Towards Querying and Mining of Large-Scale Networks

Author:

Khan, Arijit

Degree Grantor:

University of California, Santa Barbara. Computer Science

Degree Supervisor:

Xifeng Yan

Place of Publication:

[Santa Barbara, Calif.]

Publisher:

University of California, Santa Barbara

Creation Date:

2013

Issued Date:

2013

Topics:

Computer Science

Keywords:

Graph Query,
Big Graphs,
Graph Mining,
Information Networks, and
Social Networks

Genres:

Online resources and Dissertations, Academic

Dissertation:

Ph.D.--University of California, Santa Barbara, 2013

Description:

With the advent of the internet, sources of data have increased dramatically, including the World Wide Web, social networks, knowledge graphs, and medical and government records. Oftentimes, relations exist among the entities in these data. Therefore, we observe structures in the data, but these structures are implicit, and not as rigid or regular as those found in standard database systems. These semi-structured data are usually represented as large networks with labeled nodes and edges. Querying and mining these linked datasets is essential for a wide range of emerging applications, such as viral marketing, web search, malware detection, image retrieval, and social network analysis. However, the complex combinations of structure and content, coupled with the massive volume of these data, raise several challenges that require new efforts for smarter and faster graph analysis.

My research interests span the emerging problems in large-scale, heterogeneous, semi-structured data, with a focus on querying and pattern mining in social and information networks using scalable algorithms and machine learning techniques. My research on large-scale graphs can be categorized into two broad directions: (1) querying of large-scale networks, including heterogeneous networks, uncertain and stream graphs, and (2) pattern mining over large graphs. In the domain of querying heterogeneous networks, due to noise and lack of schema, structured methods such as SPARQL, which require an underlying schema to formulate a query, are often too restrictive. Without knowing the exact structure of the data and the semantics of the entity labels and their relationships, can we still query them and obtain the relevant results? In addition, how do we query uncertain graphs and streams? In the area of graph pattern mining, what graph features should one extract in order to build an accurate and efficient classifier over large networks? From the perspective of advertising and viral marketing, what are the top-k most interesting itemsets and the top-k most influential persons in a social network? In my dissertation, I shall discuss our effective and efficient techniques to solve these emerging problems associated with querying and mining of complex big graphs.

Physical Description: