*Machine Learning Approaches to Detecting Sentiment and Influence in Social Media*
The explosion of user-generated content on the Web has created new opportunities, and significant challenges, for companies that are increasingly concerned with monitoring the discussion around their products. Consequently, marketing organizations need to know what people are saying in influential blogs, how the opinions expressed there could affect their business, and how to extract business insight and value from this content. This has given rise to the emerging discipline of Social Media Analytics, which draws on Social Network Analysis, Machine Learning, Data Mining, Information Retrieval, and Natural Language Processing. This talk discusses two fundamental challenges in the analysis of social media: detecting sentiment and identifying influence in networks.
Sentiment Analysis focuses on automatically identifying whether a piece of text expresses a positive or negative opinion about its subject matter. Most previous work in this area relies on prior lexical knowledge in the form of the sentiment polarity of words. In contrast, some recent approaches treat the task as a text classification problem, learning to classify sentiment solely from labeled training data. In this talk, we present a unified framework in which background lexical information, in the form of word-class associations, can be used and then refined for specific domains with whatever training examples are available. This work has led to the formulation of a general Machine Learning framework called Dual Supervision, in which classifiers are built from both example labels and “feature labels.”
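To make the idea of combining “feature labels” with example labels concrete, the sketch below shows one simple way to do it, under stated assumptions: a multinomial naive Bayes classifier whose word counts are seeded with pseudo-counts from a sentiment lexicon and then updated with labeled documents. The lexicon, the documents, and the parameter names `prior_strength` and `alpha` are all hypothetical, and this pseudo-count scheme is an illustration of dual supervision in general, not necessarily the specific model presented in the talk.

```python
import math
from collections import Counter

CLASSES = ("pos", "neg")

# Hypothetical toy lexicon supplying "feature labels": word -> sentiment class.
LEXICON = {"great": "pos", "love": "pos", "excellent": "pos",
           "terrible": "neg", "hate": "neg", "awful": "neg"}


def train(docs, labels, lexicon, prior_strength=5.0, alpha=1.0):
    """Naive Bayes trained on example labels, seeded with lexicon pseudo-counts.

    docs: list of token lists; labels: one class per doc.
    prior_strength (assumed knob): pretend each lexicon word was already seen this
    many times in its labeled class. alpha: Laplace smoothing constant.
    """
    counts = {c: Counter() for c in CLASSES}
    # Feature labels: background lexical knowledge as pseudo-counts.
    for word, cls in lexicon.items():
        counts[cls][word] += prior_strength
    # Example labels: ordinary counts from labeled, in-domain documents.
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = set(lexicon).union(*docs)
    doc_counts = Counter(labels)
    model = {}
    for c in CLASSES:
        total = sum(counts[c].values())
        log_prior = math.log((doc_counts[c] + 1) / (len(docs) + len(CLASSES)))
        log_lik = {w: math.log((counts[c][w] + alpha) / (total + alpha * len(vocab)))
                   for w in vocab}
        model[c] = (log_prior, log_lik)
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; unseen words are ignored."""
    scores = {c: lp + sum(ll[w] for w in tokens if w in ll)
              for c, (lp, ll) in model.items()}
    return max(scores, key=scores.get)


# Hypothetical in-domain training data (e.g., product reviews).
docs = [["great", "camera"], ["awful", "battery"],
        ["crisp", "screen", "love"], ["laggy", "screen"]]
labels = ["pos", "neg", "pos", "neg"]
model = train(docs, labels, LEXICON)

print(predict(model, ["excellent"]))  # known only from the lexicon
print(predict(model, ["laggy"]))      # learned only from labeled examples
```

In this toy run, “excellent” never appears in the training documents but is classified positive thanks to the lexicon, while the domain-specific word “laggy” is picked up as negative from the labeled examples alone, which is the kind of refinement of background knowledge the paragraph describes.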
Much work in Social Network Analysis has focused on identifying the most important actors in a social network. This has resulted in several measures of influence, authority, centrality, or prestige. Most such sociometrics (e.g., PageRank) are driven by intuitions about an actor’s location in the network. It is our position that asking for the “most influential” actors is an ill-posed question unless it is framed in the context of a specific, measurable task. Constructing a predictive task of interest in a given domain provides a mechanism for quantitatively comparing different measures of influence. Furthermore, once we know what type of actionable insight we want to gather, we need not rely on a single network prestige measure: a combination of measures is more likely to capture the various aspects of the social network that are predictive of, and beneficial for, the task. To achieve this, we introduce supervised rank aggregation techniques and show the benefits of locally optimal, order-based rank aggregation. We illustrate these ideas through a case study on a data set of 40 million Twitter users, in which we study measures of influence in the context of predicting when users will be rebroadcast (retweeted).
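The sketch below illustrates the two ideas in this paragraph on a tiny, entirely hypothetical follower graph: influence measures (in-degree and PageRank) are compared by how well they rank users against a task label (here, a made-up set of users who were rebroadcast), and the measures are then combined with a weighted Borda count whose single weight is chosen to maximize that task metric. The weighted Borda count with grid search is a simple stand-in for supervised rank aggregation, not the locally optimal order-based method introduced in the talk; AUC as the evaluation metric and all function names are likewise assumptions.

```python
from itertools import product


def pagerank(nodes, edges, damping=0.85, iters=100):
    """Power-iteration PageRank; edge (u, v) means u follows v, so rank flows u -> v."""
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            if out[u]:
                share = damping * pr[u] / len(out[u])
                for v in out[u]:
                    nxt[v] += share
            else:  # dangling node: spread its rank uniformly
                for v in nodes:
                    nxt[v] += damping * pr[u] / n
        pr = nxt
    return pr


def in_degree(nodes, edges):
    """Follower count for each user."""
    deg = {v: 0 for v in nodes}
    for _, v in edges:
        deg[v] += 1
    return deg


def auc(scores, positives):
    """Probability a random positive user outscores a random negative (ties count 1/2)."""
    pos = [scores[v] for v in scores if v in positives]
    neg = [scores[v] for v in scores if v not in positives]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


def average_ranks(scores):
    """Ranks 1..n (higher score -> higher rank), averaging ranks over tied scores."""
    ordered = sorted(scores, key=scores.get)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def weighted_borda(measures, weights):
    """Aggregate several measures by a weighted sum of their ranks."""
    rank_maps = [average_ranks(m) for m in measures]
    return {v: sum(w * r[v] for w, r in zip(weights, rank_maps)) for v in rank_maps[0]}


def fit_weight(measures, positives, steps=10):
    """Grid-search the weight on the first of two measures to maximize task AUC."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        a = auc(weighted_borda(measures, (w, 1.0 - w)), positives)
        if best is None or a > best[1]:
            best = (w, a)
    return best


# Hypothetical toy data; a real study would fit on one period and evaluate on another.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("b", "a"), ("c", "a"), ("d", "a"), ("e", "b"),
         ("f", "b"), ("a", "c"), ("d", "c"), ("e", "d")]
rebroadcast = {"b", "c"}  # users whose posts were (hypothetically) retweeted

deg = in_degree(nodes, edges)
pr = pagerank(nodes, edges)
for name, m in (("in-degree", deg), ("PageRank", pr)):
    print(name, "AUC:", round(auc(m, rebroadcast), 3))
w, best = fit_weight([deg, pr], rebroadcast)
print("weight on in-degree:", w, "aggregated AUC:", round(best, 3))
```

Because the weight grid includes 0 and 1, the tuned aggregate can never score below the better of the two individual measures on the data it was fit to; with real data, the comparison that matters is on held-out labels.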