A simple exploration of a Telegram chat-group

What does a group of guys chat about?

A couple of days ago, one of my friends mentioned that it would be interesting to visualize our group chat: who spoke the most, who most often spoke after whom, and which words were used most. In this article, I show you how I did that using pandas, json, and plotly.

Post Outline

  1. Gathering Data
  2. Cleaning and Preparing Data
  3. Visualizing the Data

1. Gathering Data

The group is hosted on Telegram, so the data was easy to pull: you just use Telegram Desktop to export the chat history. You can export as either .html or .json, but I would 100% recommend .json because it's easier for Python to read.

2. Cleaning and Preparing Data

The cleaning of the data was less fun. The export came as a nested JSON file and hence had to be flattened (or "normalized", in pandas terms). So, we imported the file…
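For reference, here is a minimal sketch of the import step. The file name result.json, the load_export helper, and the tiny sample export are assumptions for illustration, so point it at wherever your own export landed.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical miniature export that mimics the nested layout:
# a few top-level fields plus a "messages" list of records.
sample_export = {
    "name": "Example Group",
    "type": "private_group",
    "messages": [
        {"id": 1, "type": "message", "date": "2021-01-01T10:00:00",
         "from": "A", "text": "hello"},
        {"id": 2, "type": "service", "date": "2021-01-01T10:01:00",
         "actor": "B", "action": "pin_message"},
    ],
}


def load_export(path):
    """Read a JSON chat export into a plain Python dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Write the sample to a temporary file so the sketch runs without a real export.
with tempfile.TemporaryDirectory() as tmp:
    export_path = Path(tmp) / "result.json"  # assumed file name
    export_path.write_text(json.dumps(sample_export), encoding="utf-8")
    d = load_export(export_path)

print(list(d.keys()))
```

With a real export you would skip the temporary file and call load_export on your own path.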

…and then proceeded to normalize it. json_normalize basically unpacks a nested JSON structure into a flat table, starting from whichever node you point it at. In this case, if we had just used the pandas read_json, each message would have stayed packed as a nested record under a single "messages" column.

As you can see, the actual messages are stored under the parent node "messages", and hence we unpacked it by pointing json_normalize at that node: norm_msg = json_normalize(d['messages']).
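Here is a small sketch of that normalization step. The dict d below is made-up sample data (including the nested location_information field), and I am calling pd.json_normalize, which is where the function lives in pandas 1.0 and later.

```python
import pandas as pd

# Made-up stand-in for the loaded export; only the shape matters here.
d = {
    "name": "Example Group",
    "messages": [
        {"id": 1, "type": "message", "from": "A", "text": "hello"},
        {"id": 2, "type": "message", "from": "B", "text": "",
         "location_information": {"latitude": 1.35, "longitude": 103.8}},
    ],
}

# Point json_normalize at the "messages" node: one row per message,
# with nested dicts flattened into dotted column names.
norm_msg = pd.json_normalize(d["messages"])

print(norm_msg.columns.tolist())
```

Nested fields end up as columns such as location_information.latitude, and messages that lack a field simply get NaN there.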

Once that was done, I needed to extract the important columns. I kept only the rows whose type was "message", because the export had also recorded other things like starting a poll, sending a location, sending a sticker, etc., and these did not contain actual text. I then kept only the important columns (date, text, and who the text was from) and renamed them to make them easier to understand.
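A rough sketch of the filtering and relabeling could look like this. The sample rows and the new column names (Date, Text, Sender) are my own placeholders, not necessarily the ones used for the charts.

```python
import pandas as pd

# Made-up normalized messages; "service" stands in for non-text events.
norm_msg = pd.DataFrame({
    "type": ["message", "service", "message"],
    "date": ["2021-01-01T10:00:00", "2021-01-01T10:01:00", "2021-01-01T10:02:00"],
    "from": ["A", None, "B"],
    "text": ["hello", "", "hi there"],
    "id": [1, 2, 3],
})

# Keep only real messages, and only the three columns we care about.
is_message = norm_msg["type"] == "message"
msgs = norm_msg.loc[is_message, ["date", "text", "from"]]

# Relabel the columns (placeholder names) and tidy up the index.
msgs = msgs.rename(columns={"date": "Date", "text": "Text", "from": "Sender"})
msgs = msgs.reset_index(drop=True)

print(msgs)
```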

I also noticed that some of the messages contained stray characters, and I needed to get rid of those so that proper words could be formed.
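One way to do that kind of cleanup is with a couple of regular expressions. This is only a sketch: which characters count as "random" depends on your chat, so the pattern below (keep letters, digits, apostrophes, and whitespace) is an assumption.

```python
import pandas as pd

# Made-up messages with punctuation, emoticons, and stray line breaks.
text = pd.Series(["Hello!!! :)", "what's up???", "ok\n\nsee you @ 5"])

clean = (
    text.astype(str)
    # Replace anything that is not a letter, digit, apostrophe, or whitespace.
    .str.replace(r"[^A-Za-z0-9'\s]", " ", regex=True)
    # Collapse runs of whitespace (including newlines) into single spaces.
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

print(clean.tolist())
```

Note that this pattern also drops accented letters and emoji, so widen the character class if your group chats in more than plain English.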

3. Visualizing the Data

Most common word used

Finally, we were done and ready to get on with the fun stuff! To count the words, I first joined all the text from the text column together, changed it to lowercase, and split it into individual words. I then used value_counts and pulled the top 100 words used in the chat.
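In code, the counting step comes out roughly as below. The three sample messages are made up; with real data, text would be the cleaned text column.

```python
import pandas as pd

# Made-up text column.
text = pd.Series(["The cat sat", "the dog sat", "The end"])

# Join everything into one string, lowercase it, and split on whitespace.
words = " ".join(text).lower().split()

# Count each word (sorted by frequency) and keep the 100 most common.
word_counts = pd.Series(words).value_counts()
top_words = word_counts.head(100)

print(top_words)
```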

I then plotted it using plotly.express in a simple bar chart. Clearly, the word "the" was extremely popular.

Who sent the most messages?

To see who had sent the most messages, I used value_counts on the "from" column and plotted that too.
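The per-person count is a one-liner. Here is a sketch with made-up senders; swap in whatever name your sender column ended up with.

```python
import pandas as pd

# Made-up sender column, one entry per message.
msgs = pd.DataFrame({"from": ["A", "B", "A", "C", "A", "B"]})

# value_counts returns the senders sorted by number of messages.
msg_counts = msgs["from"].value_counts()

print(msg_counts)
```

The resulting Series can go into a bar chart the same way as the word counts.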

Who responds to who?

This was an interesting one. I wanted to gauge interactions and see who most often replied after whom. I hence wrote a simple loop that builds a list recording, for each message, who sent it and who sent the message just before it. From the looks of it, I guess person A had a lot of fun replying after himself.
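A loop along these lines might look as follows. This is a sketch rather than my exact code: the senders list is made up, and I use a Counter keyed by (replier, previous sender) to tally the pairs.

```python
from collections import Counter

# Made-up sequence of senders, in chat order.
senders = ["A", "A", "B", "A", "C", "B", "A"]

# For each message after the first, record who sent it and who sent
# the message immediately before it.
reply_pairs = Counter()
for previous, current in zip(senders, senders[1:]):
    reply_pairs[(current, previous)] += 1

for (replier, previous), n in reply_pairs.most_common():
    print(f"{replier} replied after {previous}: {n}")
```

Keep in mind this only looks at message order, so someone sending three messages in a row counts as replying after themselves twice.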

Custom words

Finally, I wanted a list of custom words so that I could see how many times each one had been used. I created a list of these words and then counted the number of times they appeared in the pd.Series built from our text data.
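As a sketch, the custom-word count can reuse the same word Series as before. The messages and the word list here are purely illustrative.

```python
import pandas as pd

# Made-up text column and an illustrative list of words to look for.
text = pd.Series(["love you guys", "that was fun", "so much fun fun", "ok"])
custom_words = ["love", "fun", "beer"]

# One lowercase word per row, as in the most-common-word step.
words = pd.Series(" ".join(text).lower().split())

# Count exact matches for each custom word (0 if it never appears).
custom_counts = pd.Series({w: int((words == w).sum()) for w in custom_words})

print(custom_counts)
```

This counts whole words only, so "funny" would not count towards "fun".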

I guess we all love each other a lot and have plenty of fun.

Conclusion

So there you have it! A fun little exploration of words and phrases in a group chat. Thanks for reading and hope you found it interesting!

Originally published at zachlim98.github.io/me
