Uncovering Interesting Attributed Anomalies in Large Graphs

Author:

Li, Nan

Degree Grantor:

University of California, Santa Barbara. Computer Science

Degree Supervisor:

Xifeng Yan

Place of Publication:

[Santa Barbara, Calif.]

Publisher:

University of California, Santa Barbara

Creation Date:

2013

Issued Date:

2013

Topics:

Computer Science

Keywords:

Vertex Classification,
Anomaly Detection,
Graph Indexing,
Graph Querying,
Mixture Models, and
Graph Mining

Genres:

Online resources and Dissertations, Academic

Dissertation:

Ph.D.--University of California, Santa Barbara, 2013

Description:

Graph is a fundamental model for capturing entities and their relations in a wide range of applications. Examples of real-world graphs include the Web, social networks, communication networks, intrusion networks, collaboration networks, and biological networks. In recent years, with the proliferation of rich information available for real-world graphs, vertices and edges are often associated with attributes that describe their characteristics and properties. This gives rise to a new type of graphs, namely attributed graphs. Anomaly detection has been extensively studied in many research areas, and finds important applications in real-world tasks such as financial fraud detection, spam detection and cyber security. Anomaly detection in large graphs, especially graphs annotated with attributes, is still under explored. Most of existing work in this aspect focuses on the structural information of the graphs. In this thesis, we aim to address the following questions: How do we define anomalies in large graphs annotated with attributive information? How to mine such anomalies efficiently and effectively?

A succinct yet fundamental anomaly definition is introduced: given a graph augmented with vertex attributes, an attributed anomaly refers to a constituent component of the graph, be it a vertex, an edge, or a subgraph, exhibiting abnormal features that deviate from the majority of constituent components of the same nature, in a combined structural and attributive space. For example in a social network, assume there exists a group of people, most of whom share similar taste in movies, whereas the majority of social groups in this network tend to have very diverse interests in movies; or in a collaboration network, there exists a group of closely connected experts that possess a set of required expertise, and such a group occurs scarcely in this network; we consider the groups in both scenarios as "anomalous". Applications of this research topic abound, including target marketing, recommendation systems, and social influence analysis. The goal of this work therefore is to create efficient solutions to effectively uncover interesting anomalous patterns in large attributed graphs.

In service of this goal, we have developed several frameworks using two types of approaches: (1) combinatorial methods based on graph indexing and querying; (2) statistical methods based on probabilistic models and network regularization.

Physical Description: