A simple exploration of a Telegram chat-group

What does a group of guys chat about?

A couple of days ago, one of my friends mentioned that it would be interesting to visualize our group chat: to see who spoke the most, who replied after whom the most, and which words were used most often. In this article, I show you how I did that using pandas, json, and plotly.

Post Outline

  1. Gathering Data
  2. Cleaning and Preparing Data
  3. Visualizing the Data

1. Gathering Data

The group we were in is hosted on Telegram, so the data was easy to pull: you just use Telegram Desktop to export the chat. You can export either as .html or .json, but I would 100% recommend .json simply because it's easier for Python to read.

2. Cleaning and Preparing Data

Cleaning the data was less fun. The export came as a nested JSON file, which had to be flattened (normalized, in pandas terms) before it could be analyzed. First, we imported the file…

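…and then proceeded to normalize it. json_normalize unpacks nested JSON into a flat table, starting from the parent node you point it at: each record under that node becomes a row and each of its keys becomes a column. Had we just used pandas' read_json on the whole file, the messages would have stayed bundled inside a single "messages" column, one nested object per row.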

The actual messages live under the parent node "messages", so we unpacked them with norm_msg = json_normalize(d['messages']), pointing it straight at that node.
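Here is a minimal sketch of the loading and normalizing steps. It swaps the real file for a small, hypothetical stand-in for the export (the file name result.json and the sample fields are assumptions; a real export carries many more fields) and calls pd.json_normalize, which is where json_normalize lives in current pandas versions.

```python
import json

import pandas as pd

# Hypothetical, trimmed-down stand-in for the exported JSON file.
# A real export has many more fields; only a few are reproduced here.
raw = """
{
  "name": "Our group",
  "type": "private_group",
  "id": 1,
  "messages": [
    {"id": 1, "type": "message", "date": "2020-06-01T10:00:00",
     "from": "Person A", "text": "Hello there!"},
    {"id": 2, "type": "service", "date": "2020-06-01T10:01:00",
     "actor": "Person B", "action": "pin_message", "text": ""},
    {"id": 3, "type": "message", "date": "2020-06-01T10:02:00",
     "from": "Person B", "text": "Hi! How are you?"}
  ]
}
"""

# With a real export you would read the file instead (name assumed), e.g.:
# with open("result.json", encoding="utf-8") as f:
#     d = json.load(f)
d = json.loads(raw)

# Point json_normalize at the "messages" parent node: each message becomes
# one row and each key becomes a column (missing keys are filled with NaN).
norm_msg = pd.json_normalize(d["messages"])
print(norm_msg[["type", "from", "text"]])
```

Because the service entry has no "from" key, its "from" cell comes out as NaN, which is one reason the next filtering step is useful.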

Once that was done, I needed to extract the important columns. I kept only the rows whose type was "message", because the export had also recorded things like starting a poll, sending a location, or sending a sticker, and these contained no actual text. I then kept only the columns that mattered (the date, the text, and who the text was from) and renamed them to make them easier to understand.
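The filtering and renaming might look like the following sketch. The small DataFrame stands in for the normalized messages; its values and the new column name "timestamp" are hypothetical, while "from" and "text" are left unchanged so they match the column names used later in this post.

```python
import pandas as pd

# Hypothetical normalized messages, shaped like json_normalize output.
norm_msg = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "type": ["message", "message", "service", "message"],
    "date": ["2020-06-01T10:00:00", "2020-06-01T10:01:00",
             "2020-06-01T10:02:00", "2020-06-01T10:03:00"],
    "from": ["Person A", "Person B", None, "Person A"],
    "text": ["Hello there!", "Hi! How are you?", "", "Good, thanks."],
})

# Keep only rows of type "message", and only the columns we care about.
msgs = norm_msg.loc[norm_msg["type"] == "message", ["date", "from", "text"]]

# Rename for readability (illustrative name) and parse ISO timestamps.
msgs = msgs.rename(columns={"date": "timestamp"})
msgs["timestamp"] = pd.to_datetime(msgs["timestamp"])
msgs = msgs.reset_index(drop=True)
print(msgs)
```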

I also noticed that some of the messages contained stray characters, which I had to strip out so that the text would split into proper words.
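The post doesn't say exactly which characters were removed, so the following is one possible approach rather than the author's code. It assumes two things: that some messages were exported with formatted text stored as a list of plain strings and {"text": ...} objects (the hypothetical helper flatten_text joins those back together), and that "proper words" means ASCII letters, digits and apostrophes (the hypothetical helper clean_text replaces everything else with a space).

```python
import re

import pandas as pd


def flatten_text(value):
    """Return message text as one plain string.

    Assumption: formatted messages may be exported as a list that mixes
    plain strings with objects such as {"type": "link", "text": "..."}.
    """
    if isinstance(value, list):
        parts = []
        for piece in value:
            if isinstance(piece, dict):
                parts.append(str(piece.get("text", "")))
            else:
                parts.append(str(piece))
        return "".join(parts)
    if isinstance(value, str):
        return value
    return ""  # None/NaN or anything unexpected becomes empty text


def clean_text(value):
    """Keep ASCII letters, digits, apostrophes and whitespace; replace the
    rest with spaces, then collapse runs of whitespace."""
    text = flatten_text(value)
    text = re.sub(r"[^A-Za-z0-9'\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical raw values from the text column (\U0001F602 is an emoji).
raw_texts = pd.Series([
    "Hello there!!! \U0001F602",
    ["check ", {"type": "link", "text": "this"}, " out"],
    None,
])
cleaned = raw_texts.apply(clean_text)
print(cleaned.tolist())
```

The ASCII-only pattern also drops accented and non-Latin letters; a pattern such as [^\w'\s] would keep them while still removing punctuation and emoji.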

3. Visualizing the Data

Most commonly used words

Finally, we were done and ready to get on with the fun stuff! To count the words, I first joined all the messages in the text column into one string, converted it to lowercase, and split it into individual words. I then used value_counts to pull out the 100 most frequently used words in the chat.
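The word count boils down to a few lines. This sketch uses a made-up three-message Series in place of the cleaned text column:

```python
import pandas as pd

# Hypothetical cleaned messages standing in for the text column.
texts = pd.Series(["The cat sat on the mat", "the end", "A cat"])

# Join every message into one string, lowercase it, and split on whitespace.
words = " ".join(texts).lower().split()

# Count each word and keep the 100 most common (fewer if there aren't 100).
top_words = pd.Series(words).value_counts().head(100)
print(top_words)
```

An equivalent pandas-only chain is texts.str.lower().str.split().explode().value_counts(), which avoids building one large string first.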

I then plotted these counts in a simple bar chart using plotly.express. Clearly, the word "the" was extremely popular.

Who sent the most messages?

To see who had sent the most messages, I used value_counts on the 'from' column and plotted the result as well.
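Counting messages per sender is a single value_counts call; here it is on a hypothetical five-message DataFrame. The resulting Series can be charted with plotly.express in the same way as the word counts.

```python
import pandas as pd

# Hypothetical cleaned messages with a "from" column naming each sender.
msgs = pd.DataFrame({
    "from": ["Person A", "Person B", "Person A", "Person C", "Person A"],
    "text": ["hi", "hello", "what's up", "hey", "lunch?"],
})

# Number of messages per sender, largest first.
msg_counts = msgs["from"].value_counts()
print(msg_counts)
```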

Who responds to whom?

This was an interesting one. I wanted to gauge interactions and see who replied most often after whom, so I wrote a simple loop that built a list recording which person sent a message right after which other person. From the looks of it, person A had a lot of fun replying after himself.
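The post doesn't show the loop, so here is a minimal sketch of one way to do it, using a hypothetical list of senders in chronological order. It records every (previous sender, next sender) pair, tallies the pairs with collections.Counter, and also arranges them into a matrix with pd.crosstab, which is convenient for a heatmap.

```python
from collections import Counter

import pandas as pd

# Hypothetical senders of consecutive messages, in chronological order.
senders = ["Person A", "Person A", "Person B", "Person A",
           "Person A", "Person C", "Person A"]

# Walk through the chat and record (previous sender, next sender) pairs.
pairs = []
for prev, nxt in zip(senders, senders[1:]):
    pairs.append((prev, nxt))

# How often each ordered pair occurs.
pair_counts = Counter(pairs)
print(pair_counts.most_common(3))

# The same pairs as a matrix: rows = previous sender, columns = next sender.
reply_matrix = pd.crosstab(
    pd.Series([p for p, _ in pairs], name="previous"),
    pd.Series([n for _, n in pairs], name="next"),
)
print(reply_matrix)
```

Under this definition, a self-pair simply means two consecutive messages from the same person, so someone who sends several short messages in a row will score highly against themselves.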

Custom words

Finally, I wanted to see how many times certain words of my own choosing had been used. I made a list of these custom words and counted how many times each one appeared in the pd.Series built from our text data.
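One way to do this is to count exact word matches in the lowercased, split text and then reindex on the custom list, so that words that never appear show up with a count of 0 rather than going missing. The messages and the word list below are hypothetical.

```python
import pandas as pd

# Hypothetical cleaned messages and a hypothetical list of custom words.
texts = pd.Series(["I love this group", "love you all",
                   "so much fun", "Fun times love"])
custom_words = ["love", "fun", "pizza"]

# Split the chat into lowercase words, as for the top-100 count.
words = pd.Series(" ".join(texts).lower().split())

# Count every word, then look up only the custom ones; absent words get 0.
custom_counts = words.value_counts().reindex(custom_words, fill_value=0)
print(custom_counts)
```

Because this matches whole words exactly, variants such as "loved" are not counted; a regular expression with Series.str.count would be the alternative if those should be included.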

I guess we all love each other a lot and have plenty of fun.

Conclusion

So there you have it! A fun little exploration of words and phrases in a group chat. Thanks for reading and hope you found it interesting!

Originally published at zachlim98.github.io/me
