Telegram OSINT: Generating a data ‘backbone’ for investigation
With Telegram growing ever more popular, vast amounts of data are being generated that we can use to map trends and fuel investigations. While most of the topics I discuss focus on smaller, granular cases, this time I want to cover something broader: the data backbone.
The data backbone is a much-neglected area in many investigations, and for good reason: it can be laborious to set up and just as technically demanding to maintain. The most obvious example of a data backbone may be an ever-growing KML file in Google Earth that you add to as you find more locations of interest, but for this guide I want to step away from geospatial analysis.
Here we want to build a versatile Telegram trend monitor that can be easily tweaked and processed in a spreadsheet. The goal is to surface more information, particularly hyper-specific information, than you would get just by scrolling through multiple Telegram channels.
A mock example
To avoid diving into a current topic, here’s our example scenario:
The land of Mordor is currently invading the rest of Middle Earth, and major human rights violations are being reported. As an investigator, you want to verify and track dragon attacks on civilian villages. This is manageable when you have footage of such attacks, but it becomes much harder when all you have are fleeting mentions scattered across various social media platforms.
Each village has its own Telegram group or news channel, and there are also channels for the dissident groups and militias that have formed to defend the land. Most importantly, there are a few ‘recon’ channels based near Mordor that post whenever a dragon takes off, giving people extra warning so they can take shelter.
The basic investigation we already know
Currently, as an investigator, you might see reports of an attack and start searching Telegram, especially the relevant local groups, for mentions of dragon attacks. There you may find footage that can be verified and then used to produce a report.
But this is a very reactive approach. Following a dragon attack, it may be days before mobile networks are restored and people can upload footage. By then, not only is your reporting delayed, but any action taken may come too late.
Likewise, what if no footage emerges? You may have very little to go on unless you have a data backbone of dragon movements in the area from which to infer that something may indeed have taken place.
The optimisation
What if we could sort through 100,000 messages across hundreds of channels in minutes and get better situational awareness of dragon sightings: when they happened, and even where? Then we could verify claims by matching them against dates of known dragon activity. We may also have reason to doubt that an attack was carried out on a claimed date if the data shows no activity that day, or activity only elsewhere.
Let’s revisit our scenario. As I mentioned, we have some recon groups on Telegram that regularly post sightings of dragons taking off from the bases in Mordor. These channels also report other attacks and sightings that don’t involve dragons.
We can filter this data to visualise all dragon activity and have a backbone of context to refer to when verifying reports.
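To make the cross-check concrete for readers comfortable with a little Python, here is a minimal sketch that takes a claimed attack date and returns any known sightings close to it. It assumes you have already pulled a set of sighting dates out of the recon channels; the function name and the one-day window (to allow for reporting delays and time zones) are hypothetical choices. An empty result is a reason to dig further, not proof that nothing happened.

```python
from datetime import date


def activity_near(claim: date, activity_dates: set[date], window_days: int = 1) -> list[date]:
    """Return the known activity dates within +/- window_days of a claimed date."""
    return sorted(d for d in activity_dates if abs((d - claim).days) <= window_days)


# HYPOTHETICAL sighting dates extracted from a recon channel export.
sightings = {date(2022, 4, 1), date(2022, 4, 3), date(2022, 4, 9)}

print(activity_near(date(2022, 4, 2), sightings))  # sightings on 1 and 3 April
print(activity_near(date(2022, 4, 6), sightings))  # nothing within a day
```

Widening `window_days` trades precision for tolerance of late or vague reports.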
Telegram Export
Telegram has an extremely useful feature in the desktop version of the app that allows you to export a chat’s entire history in HTML format. This is great for preservation, but it also means we can play with the data.
That said, I don’t want to rely on coding to process this data, so if you are wary of coding, this guide is still for you.
To export data, open the channel or group you want in the Telegram desktop app, click the menu icon (three vertical dots) at the top right, and choose “Export chat history”.
At this point, you need to make a decision. If you need to preserve everything, make sure you tick the boxes for all file types. In this case, we just want a simple text scrape: untick everything, including photos, so that we only get the messages, and leave the format set to HTML. Having only text will make life easier for the tools that follow.
Convert to CSV
This is the part that makes the process accessible to everyone. Once the data is in CSV format, we can have the posts in a spreadsheet, organised by date and time. Spreadsheets are also accessible to colleagues who may be less technically proficient, and they run on pretty much every computer. Data is only valuable if you can use it.
To do this, there’s a very useful tool on GitHub. The Telegram Export and Converter Tool takes the HTML files and converts them into CSV so you can load the data into a spreadsheet and play with it.
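The tool does this for you, so no code is needed. For the curious, the sketch below is a rough, independent illustration of what such a conversion involves, using only Python’s standard library; it is not the tool’s own code. It assumes that each message’s timestamp sits in the title attribute of a div whose classes include date and details, and that each message body sits in a div with the class text. That is an assumption about Telegram’s export markup which may not hold for every version, and the output column names date and text are hypothetical.

```python
import csv
import sys
from html.parser import HTMLParser


class ExportParser(HTMLParser):
    """Collect (timestamp, text) pairs from a Telegram Desktop HTML export.

    ASSUMPTION: timestamps sit in the title attribute of a div whose classes
    include "date" and "details", and message bodies sit in a div with the
    class "text". Check this against your own messages.html.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # list of (timestamp, text) tuples
        self._date = ""     # most recent timestamp seen
        self._depth = 0     # > 0 while inside a message body div
        self._buf = []      # text fragments of the current message

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if self._depth:
            if tag == "div":
                self._depth += 1          # nested div inside the body
            elif tag == "br":
                self._buf.append("\n")    # keep the post's line breaks
            return
        if tag == "div" and {"date", "details"} <= classes:
            self._date = attrs.get("title") or ""
        elif tag == "div" and "text" in classes:
            self._depth, self._buf = 1, []

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:  # body closed: store the finished message
                self.rows.append((self._date, "".join(self._buf).strip()))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def convert(html_paths, csv_path):
    """Parse messages.html, messages2.html, ... into one two-column CSV."""
    parser = ExportParser()
    for path in html_paths:
        with open(path, encoding="utf-8") as f:
            parser.feed(f.read())
    parser.close()
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "text"])  # hypothetical column names
        writer.writerows(parser.rows)


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python html_to_csv.py messages.html [messages2.html ...] out.csv
    convert(sys.argv[1:-1], sys.argv[-1])
```

Messages that carry only media and no text have no text div, so this sketch skips them.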
The first thing you want to do is make sure your Telegram export worked correctly. You should have a folder in your Downloads, by default under Telegram Desktop and named something like ChatExport_2022-04-05, that contains one or more messages.html files (messages2.html, messages3.html and so on for longer histories) alongside css, images and js folders.
You want to convert the HTML files to CSV with the tool mentioned above. To do this, go to the tool’s page on GitHub, click the green Code button, then choose Download ZIP.
Unzip the download and you will see the tool’s files inside. One small note: you need Python installed for this to work.
Copy the Python file into the folder containing your Telegram export; you can simply drag it across.
Click the address bar of that folder and copy its path. We need this because we are going to run the tool from the command line (terminal).
Don’t be afraid of this, because you only need to know one command: “cd”. It stands for change directory, and it tells the computer to open that folder so that any subsequent commands run inside it.
Type “cd”, a space, then paste the path to the folder, as below. Wrapping the path in quotation marks is safest, because it contains a space:
cd "C:\Users\XXXXX\Downloads\Telegram Desktop\ChatExport_2022-04-05"
Then all you need to do is run the program by typing the name of the tool:
telegram-export-converter.py
If that does not start the tool (for example on macOS or Linux, or on Windows when .py files are not set to open with Python), run it through Python explicitly instead (python3 on some systems):
python telegram-export-converter.py
You should now have a CSV file in that folder containing the whole chat history in spreadsheet format.
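If you repeat the export for several channels, you end up with one CSV per export folder. Here is a minimal sketch for combining them into one sheet with a channel column, for readers comfortable with Python. It assumes each folder holds only the converter’s CSV output, that all the files share the same header, and that you have renamed each ChatExport folder after its channel (a hypothetical convention, since the default folder name is only a date). The script name in the usage comment is hypothetical too.

```python
import csv
import glob
import os
import sys


def tag_rows(channel, rows):
    """Yield each row (a dict) with an extra leading 'channel' field."""
    for row in rows:
        yield {"channel": channel, **row}


def merge_exports(export_dirs, out_path):
    """Combine every CSV found in the given export folders into one file.

    ASSUMPTION: all CSVs share one header; the folder name is the channel label.
    Keep out_path outside the export folders so it is not read back in.
    """
    writer = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        for folder in export_dirs:
            channel = os.path.basename(os.path.normpath(folder))
            for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
                with open(path, newline="", encoding="utf-8") as f:
                    for row in tag_rows(channel, csv.DictReader(f)):
                        if writer is None:  # header comes from the first row
                            writer = csv.DictWriter(out, fieldnames=list(row))
                            writer.writeheader()
                        writer.writerow(row)


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python merge_exports.py all_channels.csv recon_north recon_south
    merge_exports(sys.argv[2:], sys.argv[1])
```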
With every message and its date and time in the spreadsheet, you can start filtering for mentions of “dragon” or any location of interest and graph those mentions by date. You can also do more advanced processing of the data to collect sightings by date and location.
This gives you trend graphs of sightings of various kinds of attack equipment, filtered by type or by any other parameter you choose.
Think about other ways this could be applied. Perhaps you want to track the spread of certain terms or disinformation. To use a real-life example, you could map the increasing use of words such as “Nazi” in pro-Russian propaganda channels.
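A spreadsheet filter plus a pivot table is enough for this step. If you prefer a script, here is a minimal Python sketch of the same daily keyword count. It assumes the CSV has columns named date and text and that each date starts in DD.MM.YYYY form; all three are hypothetical and should be matched to your file’s header and date format.

```python
import csv
import sys
from collections import Counter
from datetime import datetime

DATE_COL = "date"          # ASSUMPTION: adjust to your CSV's header
TEXT_COL = "text"          # ASSUMPTION: adjust to your CSV's header
DATE_FORMAT = "%d.%m.%Y"   # ASSUMPTION: dates start like 05.04.2022


def mentions_per_day(rows, keyword):
    """Count messages per day whose text contains keyword (case-insensitive)."""
    counts = Counter()
    needle = keyword.casefold()
    for row in rows:
        if needle in (row.get(TEXT_COL) or "").casefold():
            day = datetime.strptime(row[DATE_COL].split()[0], DATE_FORMAT).date()
            counts[day] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python count_mentions.py export.csv dragon
    with open(sys.argv[1], newline="", encoding="utf-8") as f:
        for day, n in mentions_per_day(csv.DictReader(f), sys.argv[2]).items():
            print(day.isoformat(), n)
```

Because this is a plain substring search, “dragon” also matches “dragons”; the printed day and count pairs can be pasted straight into a spreadsheet to chart.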
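To chart several equipment types side by side, one approach is a table with one row per day and one column per category, which any spreadsheet can turn into a line chart. A minimal sketch follows; the category names and search terms are hypothetical placeholders, as are the column names and date format.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

DATE_COL, TEXT_COL = "date", "text"   # ASSUMPTION: your CSV's column names
DATE_FORMAT = "%d.%m.%Y"              # ASSUMPTION: dates start like 05.04.2022

# HYPOTHETICAL categories, each mapped to the search terms that count for it.
CATEGORIES = {
    "dragon": ["dragon"],
    "troll": ["troll"],
    "catapult": ["catapult", "siege engine"],
}


def tally(rows, categories):
    """Return {day: {category: count}} for every day that has any message."""
    table = defaultdict(lambda: dict.fromkeys(categories, 0))
    for row in rows:
        text = (row.get(TEXT_COL) or "").casefold()
        day = datetime.strptime(row[DATE_COL].split()[0], DATE_FORMAT).date()
        counts = table[day]  # creates a zero row, so quiet days still appear
        for name, terms in categories.items():
            # A message can count towards several categories at once.
            if any(term.casefold() in text for term in terms):
                counts[name] += 1
    return dict(table)


def to_csv(table, categories):
    """Render the tally as CSV text, one row per day, ready for a chart."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["day", *categories])
    for day in sorted(table):
        writer.writerow([day.isoformat(), *(table[day][c] for c in categories)])
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        {"date": "01.04.2022 06:10", "text": "Dragon and trolls moving"},
        {"date": "02.04.2022 08:00", "text": "All quiet"},
        {"date": "03.04.2022 21:15", "text": "Siege engine near the ford"},
    ]
    print(to_csv(tally(sample, CATEGORIES), CATEGORIES))
```

Keeping days with no matches as zero rows stops the chart from silently joining across quiet periods.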
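Raw counts can rise simply because a channel posts more often. One way to check whether a term is genuinely spreading is to divide its monthly mentions by that month’s total message count. A minimal sketch, with the same hypothetical column names and date format as a stated assumption:

```python
from collections import Counter
from datetime import datetime

DATE_COL, TEXT_COL = "date", "text"   # ASSUMPTION: your CSV's column names
DATE_FORMAT = "%d.%m.%Y"              # ASSUMPTION: dates start like 05.04.2022


def monthly_share(rows, term):
    """Return {(year, month): share of that month's messages mentioning term}.

    Dividing by the month's total separates a real rise in the term's use
    from a channel simply posting more often.
    """
    totals, hits = Counter(), Counter()
    needle = term.casefold()
    for row in rows:
        day = datetime.strptime(row[DATE_COL].split()[0], DATE_FORMAT)
        month = (day.year, day.month)
        totals[month] += 1
        if needle in (row.get(TEXT_COL) or "").casefold():
            hits[month] += 1
    return {m: hits[m] / totals[m] for m in sorted(totals)}
```

Plotting the share rather than the count gives a fairer picture when comparing a quiet month with a busy one.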
Other considerations
When filtering the data for keywords, be specific; this helps rule out general discussion. Say you were monitoring the progress of a new tank battalion equipped with the latest (fictional) M20 tanks: you wouldn’t want to use the search term “tanks”.
Instead, filter for references to that specific model to get more relevant results. Filtering on the word “tanks” alone would pull in every discussion of tanks, much of it irrelevant.
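Even a specific term needs care: a plain text search for “M20” would also match “M200”. A regular expression with word boundaries (\b) avoids that, and the pattern in this sketch also accepts a hyphenated spelling. Both spellings are hypothetical examples of how channels might write the fictional model name.

```python
import re

# HYPOTHETICAL spellings channels might use for the fictional M20 tank:
# "M20" or "M-20", as a whole word and in any letter case.
M20_PATTERN = re.compile(r"\bM-?20\b", re.IGNORECASE)


def mentions_model(text: str) -> bool:
    """True if text names the M20 itself, not just 'tanks' in general."""
    return bool(M20_PATTERN.search(text))
```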
This is where your own judgement is essential because every investigation is different and you must design your data processing with context in mind.
If you are monitoring air strikes, the term “air raid siren” may be a good one, but if a channel starts posting hourly advice telling people to listen out for the siren and seek shelter, the term suddenly becomes noise in the dataset and yields false positives.
Conversely, if you are too specific, you become overly reliant on the identification skills of the people running a channel, and you may lose data whenever they fail to positively identify a vehicle.
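One way to reduce that noise is to exclude messages that read like standing advice, and to drop messages posted word for word many times over. A minimal sketch; the advisory phrases and the repeat threshold of three are hypothetical and need tuning for each channel, because an over-broad exclusion list will discard genuine alerts.

```python
import re
from collections import Counter

# HYPOTHETICAL pattern for the term being monitored.
SIREN = re.compile(r"\bair raid siren\b", re.IGNORECASE)
# HYPOTHETICAL phrases that mark routine advice rather than a live alert;
# tune these per channel, since an over-broad list will drop real alerts.
ADVISORY = re.compile(r"\b(?:reminder|listen out for|if you hear)\b", re.IGNORECASE)


def is_siren_report(text: str) -> bool:
    """Keep siren mentions unless the message reads like standing advice."""
    return bool(SIREN.search(text)) and not ADVISORY.search(text)


def _normalise(text: str) -> str:
    """Ignore case and spacing differences when comparing messages."""
    return " ".join(text.split()).casefold()


def drop_repeated(texts, max_repeats=3):
    """Drop messages whose exact wording is posted more than max_repeats times."""
    counts = Counter(_normalise(t) for t in texts)
    return [t for t in texts if counts[_normalise(t)] <= max_repeats]
```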
Conclusion
This barely scratches the surface of large-scale data scraping with Telegram, but it hopefully demonstrates how a few simple tricks can rapidly establish visual context that is not always apparent from scrolling through a feed.