Feature-Based Sentiment Analysis: An Introduction

The goal of this post is to provide a high-level introduction to the core concepts of Sentiment Analysis. We'll define the Sentiment Analysis task, discuss the concepts of subjectivity and objectivity, and briefly look at how Sentiment Analysis can be applied to extract specific feature-opinion pairs from text.

The Sentiment Analysis Task

What exactly is Sentiment Analysis? It's the classification and extraction of sentiment - opinions and their associated emotions - from text.

An opinion is a sentiment expressed on a specific entity, such as a product, person, organization, or location. Opinions are themselves expressed by entities, and an entity that expresses an opinion is called an opinion holder.

We use the term object to refer to the target entity of an opinion. An object may consist of a set of components and attributes, which we refer to as features. Opinions may also be expressed on features, and a feature may itself be composed of further features. An opinion about the object itself is a general opinion. An opinion about a feature of an object is a specific opinion.

The sentiment of an opinion (whether it is positive, negative, or neutral) goes by several names: its orientation, polarity, or semantic orientation. Consider the following passage:

"The iPhone is great. However, the battery doesn't last very long."

It consists of two sentences, each of which expresses an opinion. The first is a positive general opinion about the iPhone object. The second is a negative specific opinion about the battery, a component feature of the iPhone.

Both of the features mentioned are explicit - they are explicitly referenced by name. However, features may also be implicit - not directly referenced, but implied. For example:

"The phone is a bit large."

This sentence contains a negative specific opinion about the implicit feature, "size".

Let's use these concepts to formally define the task:

An object o consists of a set of features F = {f1, f2, ..., fn}. This includes a special feature that represents the object itself. Each feature fi may be explicitly represented by any term or phrase from the set Wi = {wi1, wi2, ..., wim}, or implicitly indicated by any term or phrase from the set Ii = {ii1, ii2, ..., iiq}.
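To make the feature model concrete, here is a minimal Python sketch, assuming simple exact matching on lowercase terms. The Feature and ObjectModel classes, the resolve_feature helper, and the term sets for the iPhone example are all hypothetical illustrations: each feature fi carries its explicit terms Wi and its implicit indicators Ii, and one special feature stands for the object itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Feature:
    """One feature f_i of an object (hypothetical representation)."""
    name: str
    explicit_terms: set[str] = field(default_factory=set)  # W_i
    implicit_terms: set[str] = field(default_factory=set)  # I_i


@dataclass
class ObjectModel:
    """An object o with its feature set F, including the object itself."""
    name: str
    features: list[Feature]

    def resolve_feature(self, term: str) -> Optional[tuple[str, bool]]:
        """Map a term to (feature name, is_explicit), or None if unknown.

        Exact lowercase matching only; a real system would also need
        lemmatization and fuzzier phrase matching.
        """
        t = term.lower()
        for f in self.features:
            if t in f.explicit_terms:
                return f.name, True
            if t in f.implicit_terms:
                return f.name, False
        return None


# Hypothetical term sets for the iPhone example.
iphone = ObjectModel(
    name="iPhone",
    features=[
        # The special feature representing the object itself.
        Feature("iPhone", explicit_terms={"iphone", "phone"}),
        Feature("battery", explicit_terms={"battery"}),
        Feature("size", explicit_terms={"size"},
                implicit_terms={"large", "small", "bulky"}),
    ],
)

print(iphone.resolve_feature("battery"))  # ('battery', True)
print(iphone.resolve_feature("large"))    # ('size', False)
```

The second lookup mirrors the "The phone is a bit large." example: "large" never names a feature, but it is listed as an implicit indicator of size.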

An opinionated text document d consists of a set of sentences S = {s1, s2, ..., sm}. These sentences contain opinions expressed by a set of opinion holders {h1, h2, ..., hp} on a set of objects {o1, o2, ..., oq}. For each object oj, opinions are expressed on a subset Fj of the object's features.

For each opinion in d, we seek to obtain a quintuple of information (hi, oj, fjk, oojk, t) where:

  • hi is the opinion holder.
  • oj is the target object of the opinion.
  • fjk is the target feature of the opinion.
  • oojk is the orientation of the opinion.
  • t is the time at which the opinion was expressed.

For each feature fjk, we also seek to identify all of its explicit representations Wjk and implicit indicators Ijk.

Note that this is a simplified definition: it covers only direct opinions and omits comparative and indirect opinions. For the purpose of this introductory post, however, it'll do.
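As an illustration, the quintuple can be written down as a small record type. This is a sketch under our own assumptions: the Opinion and Orientation names are hypothetical, the holder name and timestamp are invented placeholders (the example passage gives neither), and we assume the special feature that represents the object itself simply shares the object's name, so a general opinion is one whose feature equals its object.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Orientation(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Opinion:
    """The quintuple (h_i, o_j, f_jk, oo_jk, t) for one direct opinion."""
    holder: str               # h_i
    obj: str                  # o_j
    feature: str              # f_jk; equals obj for a general opinion (our assumption)
    orientation: Orientation  # oo_jk
    time: datetime            # t

    @property
    def is_general(self) -> bool:
        # The special "object itself" feature is named after the object.
        return self.feature == self.obj


# Hypothetical holder and timestamp for the iPhone passage.
when = datetime(2020, 1, 1)
opinions = [
    Opinion("reviewer_1", "iPhone", "iPhone", Orientation.POSITIVE, when),
    Opinion("reviewer_1", "iPhone", "battery", Orientation.NEGATIVE, when),
]

for op in opinions:
    kind = "general" if op.is_general else "specific"
    print(op.feature, op.orientation.value, kind)
# iPhone positive general
# battery negative specific
```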

Subjectivity and Objectivity

Not all of the text in an opinionated document contains opinions. Generally, we use sentences as the basic units of text, so part of the task is identifying a document's opinionated sentences.

A sentence can be one of two types:

  • Subjective: Expresses feelings or beliefs.
  • Objective: Expresses factual information.

Opinions expressed in subjective sentences are explicit. For example, in this subjective sentence:

"The UI was intuitive and easy to use."

The positive specific opinion ("intuitive", "easy to use") on the explicit feature "UI" is stated directly.

Opinions expressed in objective sentences are implicit. For example, in this objective sentence:

"I returned the phone after 2 days."

A negative general opinion of the phone is implied by the fact that the opinion holder returned it after such a short period of time.

This is an important distinction to keep in mind. Although objective sentences can be opinionated, nearly all unopinionated sentences are objective. Because sentiment is inherently subjective, we typically use subjective as shorthand for opinionated and objective as shorthand for unopinionated.
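A toy subjectivity filter shows what the subjective-as-opinionated shorthand means in practice. This is a deliberately naive sketch, not a real classifier: the SUBJECTIVE_CUES word list and the split_sentences and is_subjective helpers are hypothetical, and real systems rely on large curated lexicons or trained models.

```python
import re

# Tiny hypothetical lexicon of subjectivity cues.
SUBJECTIVE_CUES = {"great", "intuitive", "easy", "terrible",
                   "love", "hate", "good", "bad"}


def split_sentences(text: str) -> list[str]:
    # Naive split after sentence-final punctuation followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def is_subjective(sentence: str) -> bool:
    # A sentence counts as subjective if any token is a known cue word.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(tok in SUBJECTIVE_CUES for tok in tokens)


doc = ("The iPhone is great. I returned the phone after 2 days. "
       "The UI was intuitive and easy to use.")
for s in split_sentences(doc):
    print("subjective" if is_subjective(s) else "objective", "|", s)
# subjective | The iPhone is great.
# objective | I returned the phone after 2 days.
# subjective | The UI was intuitive and easy to use.
```

The middle sentence implies a negative opinion, yet it is labeled objective and would be dropped. That is exactly the trade-off the shorthand accepts.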

Approaches for Opinion Extraction

Identifying and extracting opinions is perhaps the most difficult sub-task of Sentiment Analysis, because it involves modeling semantic meaning, a notoriously challenging problem in NLP. Advanced methods for opinion extraction range from using manually created POS patterns and opinion word lexicons in conjunction with Conditional Random Fields, to using Recurrent Neural Networks to perform unsupervised identification of expressive phrases. These approaches rely on advanced Machine Learning and Deep Learning concepts and are out of scope for this post. However, if you have the time and requisite knowledge, the papers behind these methods are worth a read.

A more basic approach, outlined in this paper, is to disregard the implicit opinions carried by objective sentences and focus only on explicit subjective opinions. It uses a hand-generated set of POS tag patterns to identify explicit subjective opinions (such as noun-adjective pairs), then applies a series of ML techniques to the identified opinions to determine each opinion's orientation and target feature. This is the approach we'll use for the example in the next post.
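To give a feel for the pattern-matching step, here is a rough sketch over sentences that have already been POS-tagged with Penn Treebank tags (written by hand here rather than produced by a tagger). The two patterns, the function names, and the LEXICON are our own hypothetical choices, not the ones from the paper. In particular, the paper's ML step for orientation is swapped for a plain dictionary lookup to keep the example self-contained.

```python
Tagged = list[tuple[str, str]]  # (word, Penn Treebank tag)

NOUN_TAGS = {"NN", "NNS", "NNP"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}
BE_VERBS = {"is", "are", "was", "were"}

# Hypothetical toy orientation lexicon standing in for the ML step.
LEXICON = {"intuitive": "positive", "easy": "positive", "great": "positive",
           "slow": "negative", "short": "negative"}


def extract_pairs(tokens: Tagged) -> list[tuple[str, str]]:
    """Return (noun, adjective) pairs matched by two hand-written patterns."""
    pairs = []
    n = len(tokens)
    for i, (word, tag) in enumerate(tokens):
        # Pattern 1: adjective directly before a noun ("great screen").
        if tag in ADJ_TAGS and i + 1 < n and tokens[i + 1][1] in NOUN_TAGS:
            pairs.append((tokens[i + 1][0], word))
        # Pattern 2: noun + form of "be" + optional adverb + adjective(s),
        # e.g. "UI was intuitive and easy".
        if tag in NOUN_TAGS and i + 2 < n and tokens[i + 1][0].lower() in BE_VERBS:
            j = i + 2
            if tokens[j][1] == "RB" and j + 1 < n:
                j += 1
            # Collect the adjective plus any "and"-coordinated adjectives.
            while j < n and tokens[j][1] in ADJ_TAGS:
                pairs.append((word, tokens[j][0]))
                if j + 2 < n and tokens[j + 1][1] == "CC" and tokens[j + 2][1] in ADJ_TAGS:
                    j += 2
                else:
                    break
    return pairs


def orient(pairs: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    # Look up each adjective's orientation; unknown words stay "unknown".
    return [(noun, adj, LEXICON.get(adj.lower(), "unknown")) for noun, adj in pairs]


ui_sentence = [("The", "DT"), ("UI", "NNP"), ("was", "VBD"), ("intuitive", "JJ"),
               ("and", "CC"), ("easy", "JJ"), ("to", "TO"), ("use", "VB"), (".", ".")]
camera_sentence = [("It", "PRP"), ("has", "VBZ"), ("a", "DT"), ("great", "JJ"),
                   ("screen", "NN"), ("but", "CC"), ("a", "DT"), ("slow", "JJ"),
                   ("camera", "NN"), (".", ".")]

print(orient(extract_pairs(ui_sentence)))
# [('UI', 'intuitive', 'positive'), ('UI', 'easy', 'positive')]
print(orient(extract_pairs(camera_sentence)))
# [('screen', 'great', 'positive'), ('camera', 'slow', 'negative')]
```

Hand-written patterns like these are easy to read and debug, but they only catch the constructions you anticipate, which is why the approach pairs them with learned components for orientation and feature identification.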

Conclusions

If you'd like to learn more, here's an excellent paper by Bing Liu that goes deeper into the details of the problem definition.

In the next post, we'll put the concepts covered so far into practice by performing opinion extraction with the approach described above.
