Sentiment Analysis on long and structured texts

Question

I'm trying to learn how machine-learning-based sentiment analysis works by reading online guides and academic papers, and I'm struggling to understand the following:

Why do people never (or at least hardly ever) run sentiment analysis
on long and structured texts like newspaper articles or speech
transcripts?

I've noticed it's pretty common to analyze reviews and newspaper headlines, as they are short in terms of characters. So I was wondering whether this is just because of the computational power and time required to train ML algorithms (think of neural networks), or for other reasons.
Can someone help me understand?

desertnaut · Accepted Answer

I was wondering if it is just because of the computational power and time required to train ML algorithms

It is not because of that; it is arguably because a long and structured text will likely contain segments of "positive" sentiment alongside "negative" ones. It can be infinitely more subtle and nuanced, and in principle trying to simply label it overall as "positive/negative" (or even adding a couple more sentiment categories) is futile, unproductive, and at the end of the day hardly useful.
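To make the "mixed segments" point concrete, here is a minimal sketch that scores each sentence of a short, made-up article separately. It assumes a tiny hypothetical word lexicon (`LEXICON`) and a naive regex sentence splitter in place of a trained model; the helper names are illustrative only, not part of any library.

```python
import re

# Hypothetical toy lexicon: a stand-in for a trained sentence-level classifier.
LEXICON = {
    "excellent": 1, "good": 1, "great": 1, "support": 1,
    "bad": -1, "poor": -1, "terrible": -1, "oppose": -1,
}


def split_sentences(text):
    """Naive splitter on '.', '!' and '?'; real pipelines use a proper tokenizer."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def sentence_score(sentence):
    """Sum of the lexicon polarities of the words in one sentence."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(LEXICON.get(w, 0) for w in words)


def segment_polarities(text):
    """Return (sentence, label) pairs with label positive, negative or neutral."""
    labels = []
    for s in split_sentences(text):
        score = sentence_score(s)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        labels.append((s, label))
    return labels


article = ("The minister gave an excellent speech on education. "
           "Her plan for the economy, however, is poor and the figures look terrible. "
           "The debate continues next week.")

for sentence, label in segment_polarities(article):
    print(f"{label:>8}: {sentence}")
```

Even this three-sentence "article" yields one positive, one negative and one neutral segment; a single document-level label would have to collapse them into one value and lose exactly that information.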
Andrew Ng has famously said:

If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.

and this is exactly the idea behind sentiment analysis: for short text excerpts, and especially for the kinds of text sentiment analysis is usually deployed for (reviews, tweets etc.), a typical person has no difficulty in classifying them into such a short list of possible sentiments; additionally, this is a task we want to automate, so that we can do it massively and at scale without having a person go through them one by one (not a scalable approach).
These are requirements that normally do not apply to long and structured texts, like (long) newspaper articles, essays etc.; and in these cases, it is not unheard of for people reading them to disagree on whether, overall, they are "positive", "negative", supportive, contrarian etc. (you get the idea), so any thought of delegating such a task to an ML model is beyond consideration, at least for the present, and not for lack of computing power.

Kasra Manshaei · Answer

The main reason would be the density and diversity of sentiments in long texts. Assuming a certain sentiment (positive or negative) is present, it is easier to measure within a short text, since the probability of having more than one subject, or more than one specific sentiment about a subject, is lower.
If you read a long text, there might be several different subjects and several different sentiments about each subject to be estimated.
As an example, take a one-paragraph opinion about a movie. It will most probably summarize an overall opinion about the movie, which is easier to capture.
But a two-page critique of a movie might describe very good acting, so-so music and very bad special effects. Then the whole sentiment analysis problem changes. Are we talking about each sentiment on each subject, or an overall sentiment on the whole movie? Is the overall sentiment on the whole movie the average of all sentiments over subjects, or the sentiment of the paragraph that summarizes the whole article?
Even in fairly short reviews, when the writer covers several subjects rather than sticking to one central subject, sentiment analysis is not the easiest task.
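The questions about the two-page critique can be made concrete with a small sketch. It assumes hypothetical aspect-level scores in [-1, 1] for the review (as an aspect-based sentiment model might produce; the numbers are invented) plus a hypothetical score for the concluding paragraph, and compares three ways of reporting the result. The `to_label` threshold is an arbitrary illustrative choice.

```python
from statistics import mean

# Hypothetical aspect-level scores in [-1, 1] for the two-page review.
# Values are invented for illustration, not produced by a real model.
aspect_scores = {
    "acting": 0.9,            # "very good acting"
    "music": 0.0,             # "so-so music"
    "special effects": -0.8,  # "very bad special effects"
}

# Hypothetical score of the review's concluding (summary) paragraph.
summary_paragraph_score = 0.5


def to_label(score, threshold=0.2):
    """Map a score in [-1, 1] to a coarse label; the threshold is arbitrary."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Strategy 1: report one label per subject (aspect).
per_aspect = {aspect: to_label(s) for aspect, s in aspect_scores.items()}

# Strategy 2: unweighted average of the aspect scores.
average_score = mean(aspect_scores.values())

# Strategy 3: trust the paragraph that summarizes the whole review.
summary_label = to_label(summary_paragraph_score)

print(per_aspect)
print(round(average_score, 3), to_label(average_score))
print(summary_label)
```

With these made-up numbers the three strategies disagree: per aspect the review is mixed, the plain average comes out neutral, and the summary paragraph reads positive. Which of them counts as "the" sentiment depends on the question being asked.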