Alexandria Digital Research Library

Opinion detection, sentiment analysis and user attribute detection from online text data

Author:
Bhattacharjee, Kasturi
Degree Grantor:
University of California, Santa Barbara. Computer Science
Degree Supervisor:
Linda Petzold
Place of Publication:
[Santa Barbara, Calif.]
Publisher:
University of California, Santa Barbara
Creation Date:
2016
Issued Date:
2016
Topics:
Computer science
Keywords:
Online social network analysis
Data mining
Sentiment analysis
Opinion mining
User attribute detection
Natural language processing
Genres:
Online resources and Dissertations, Academic
Dissertation:
Ph.D.--University of California, Santa Barbara, 2016
Description:

With the growing increase in the use of the internet in most parts of the world today, users generate significant amounts of online text on different platforms such as online social networks, product review websites, travel blogs, to name just a few. The variety of content on these platforms has made them an important resource for researchers to gauge user activity, determine their opinions and analyze their behavior, without having to perform monetarily and temporally expensive surveys. Gaining insights into user behavior enables us to better understand their likes and dislikes, which in turn is helpful for economic purposes such as marketing, advertising and recommendations. Further, owing to the fact that online social networks have recently been instrumental in socio-political revolutions such as the Arab Spring, and for awareness-generation campaigns by MoveOn.org and Avaaz.org, analysis of online data can uncover user preferences.

The overarching goal of this Ph.D. thesis is to pose some research questions and propose solutions, mostly pertaining to user opinions and attributes, keeping in mind the large quantities of noise present in online textual data. This thesis illustrates that with the extraction of informative textual features and the use of robust NLP and machine learning techniques, it is possible to perform efficient signal extraction from online text data, and use it to better understand user behavior. The first research problem addressed is that of opinion detection and sentiment analysis of users on a given topic, from their self-generated tweets. The key idea is to select relevant hashtags and n-grams using an l1-regularized logistic regression model for opinion detection. The second research problem deals with temporal opinion detection from tweets, i.e., detecting user opinions on a topic in which the conversation evolves over time.

For instance, on the widely-discussed topic of Obamacare (the Affordable Care Act in the U.S.), various issues became the focal points of discussion among users over time, as corresponding socio-political events and occurrences took place in real-time. We propose a machine-learning model based on seminal work from the sociological literature that is based on the premise that most opinion changes occur slowly over time. Our model is able to successfully capture opinions over time using publicly available tweets, as well as to uncover the key points of discussion as time progresses. In the third research problem, we utilize distributed representation of words in a method that determines, from user reviews, aspects of products and services that users like and dislike. We harness the contextual similarity between words and effectively build meta-features that capture user sentiment at a granular level.

Finally in the fourth research problem, we propose a method to detect the age of users from their publicly available tweets. Using a method based on distributed representation of words and clustering, we are able to achieve high accuracies in age detection, as well as to simultaneously discover topics of conversation in which users of different age groups engage.

Physical Description:
1 online resource (142 pages)
Format:
Text
Collection(s):
UCSB electronic theses and dissertations
ARK:
ark:/48907/f3dz08gr
ISBN:
9781369340013
Catalog System Number:
990047189110203776
Rights:
Inc.icon only.dark In Copyright
Copyright Holder:
Kasturi Bhattacharjee
File Description
Access: Public access
Bhattacharjee_ucsb_0035D_13125.pdf pdf (Portable Document Format)