A simple sentiment analysis of tweets with the help of AWS/Tweepy

Maximilian Pollak
5 min read · Jul 9, 2021

Come with me on a small journey exploring the Twitter API and Boto3 for AWS

Photo by Claudio Schwarz on Unsplash

The motivation for this project is simple. I’m currently attending a Bootcamp by IronHack to become a Data Analyst, and as part of it we have to turn in different projects in order to learn more and get familiar with the concepts we’ve covered.
This time we had free choice of project; the only catch was that it had to be related to Data Analytics, obviously.

So I decided to deepen my knowledge of handling APIs and also peek inside AWS a bit, and came up with the idea that I will present to you in this article.

Analyse tweets containing specific hashtags (stock tickers), and get the overall sentiment from them.

There are quite a lot of libraries used in this project, but the main ones are:
Matplotlib, Pandas, Tweepy, Boto3, yfinance, json and Streamlit.
So let’s start off by importing these into our file.

First we need to figure out whether the hashtag that the user entered is an actual stock that we can find information on.
We do this with a function called ‘check_ticker’ that takes the entered ticker as an argument (from a Streamlit text input box → 2nd picture).

check_ticker function.
get the Stock Ticker from the user.

Since ‘check_ticker’ will return a boolean, we can wrap the entire logic that grabs the data and calculates the sentiment in a simple if statement.
Next we need to make a call to the Twitter API via Tweepy. To accomplish this we first need to authenticate ourselves.

The wait_on_rate_limit=True option lets Tweepy automatically wait the right amount of time if it exceeds the rate limit.

Once that is done, we can continue and grab the tweets we want from Twitter via Tweepy.

Now this might look like a lot at first, but let me try and break it down a bit.
First notice that all arguments passed into the function have default values.
This makes the recursion call that we use later in the function a lot easier, plus makes our script more robust.

Second, the actual query takes place in the “for i in t_api.search(…)” line.
Here we pass a lot of arguments to make sure we get no retweets, only tweets in the English language, and only tweets since the specified date. Notice as well the “max_id=min_id”.
If the user requests more than 100 tweets, this lets us do pagination, meaning we will not grab any tweet that is newer than the oldest one from the first request.
To know how many tweets the user wants, we just look at the “max_count” argument.
“max_count_int” (which increases for every tweet) lets us check whether we have reached that number already. If yes, the function returns a list of all tweets saved so far, as well as the total character count of those tweets; if no, the function calls itself again, passing in the arguments needed to resume where it left off.
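Putting those pieces together, a sketch of the recursive fetcher might look like this. The function name, the injectable `t_api` argument and the `since` default are my assumptions (the real code also keeps a separate `max_count_int` counter, which here is simply `len(tweets)`):

```python
def get_tweets(t_api, ticker, max_count=100, min_id=None,
               tweets=None, char_count=0, since="2021-07-01"):
    """Recursively collect up to max_count tweets for #ticker, 100 per page."""
    if tweets is None:
        tweets = []
    page_size = 0
    for i in t_api.search(q=f"#{ticker} -filter:retweets", lang="en",
                          since=since, count=100, max_id=min_id):
        page_size += 1
        tweets.append(i.text)
        char_count += len(i.text)
        # The next page may only contain tweets older than this one.
        min_id = i.id - 1
        if len(tweets) >= max_count:
            return tweets, char_count
    if page_size < 100:
        # Twitter ran out of matching tweets before we hit max_count.
        return tweets, char_count
    # Recurse, resuming where we left off via max_id=min_id.
    return get_tweets(t_api, ticker, max_count, min_id, tweets, char_count, since)
```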

As I mentioned in the beginning, we are also using Streamlit; this is how we get inputs from the user. You can see the simple user interface here.

These inputs allow us to build our final output, which gives us some information about the company, as well as a graph of the stock price and the sentiment analysis. The final output looks like this.

We have the company logo, name and industry, as well as the current price and the price change in % over the last days (however many days were chosen).
Now we should probably see how we did all that. The first couple of things are quite easy: they are just lookups in the “ticker.info” dictionary that we get from yfinance. In the same function we also make the graph right away, since yfinance gives us all the information we need to plot it.

df_price is the dictionary that has all the information we need to display everything except the % change and the plot. That information is taken from df, which is the ticker history, not the ticker info.
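The % change part can be reduced to a small helper over the history DataFrame. The function name is mine; the “Close” column is what yfinance’s history() actually returns:

```python
import pandas as pd

def price_summary(df: pd.DataFrame):
    """Latest close and % change over the whole history window."""
    first = df["Close"].iloc[0]
    latest = df["Close"].iloc[-1]
    pct_change = (latest - first) / first * 100
    return latest, round(pct_change, 2)
```

In the app this would be fed `yf.Ticker(ticker).history(period=f"{days}d")`, while name, industry and logo come straight out of the `yf.Ticker(ticker).info` dictionary.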

To get the sentiment, we simply iterate over all the tweets we have and pass each one to the Boto3 Comprehend client. The overall sentiment is then just whichever sentiment appears most often across the tweets.

Get_sentiment Function. I’ve changed the colour scheme here to make it easier to read this particular function.

And now the only thing missing is to calculate what the sentiments are. I have most likely done this in a not very efficient way, but it works and is relatively fast.

Repeat this for all sentiments. (“POSITIVE”/”MIXED”/”NEGATIVE”/”NEUTRAL”)

Now what I did here was wrap the function in a try/except, because we do not know beforehand whether each sentiment will be represented in the tweets. If it’s not, I just return a str and the error instead of the values.
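Under those assumptions, the counting step might be sketched like this. The helper name is mine, and the KeyError branch mirrors the try/except described above for labels that never appear:

```python
def summarise_sentiments(sentiments):
    """Per-label share in % plus the most frequent label overall."""
    counts = {}
    for s in sentiments:
        counts[s] = counts.get(s, 0) + 1
    shares = {}
    for label in ("POSITIVE", "MIXED", "NEGATIVE", "NEUTRAL"):
        try:
            shares[label] = round(counts[label] / len(sentiments) * 100, 1)
        except KeyError as err:
            # Label never appeared — return a string and the error instead.
            shares[label] = f"not present: {err}"
    overall = max(counts, key=counts.get) if counts else None
    return shares, overall
```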

Once we have all that, the only thing left to build is basically the website. I’m not gonna go into detail here about how Streamlit works exactly; I will try to write a separate article about that in the future. But you can take a look at what the finished code looks like underneath.

We are basically checking whether the search button has been pressed, and if so we run the code that calculates everything. As you may have noticed, we also have a “Clear” button; this just sets the search button’s state to False, which means everything we displayed disappears.

I hope you learned something, maybe at least how not to do things in a very efficient manner. I quite enjoyed writing this article and I’m sure I will write more in the future.

If you have any comments or questions, please do not hesitate to leave them below.


Maximilian Pollak

A 27-year-old aspiring Data Engineer with an interest in programming, investing, reading and science. I hope to learn lots here.