The Texty.org.ua team applies AI to investigations of various scales, from tracking missing children to analyzing manipulative content across thousands of Telegram posts. When off-the-shelf solutions don’t fit, the team trains custom models. Throughout the process, AI work is always supervised by a specialist who verifies the results to avoid mistakes.
Texty.org.ua divides its AI use into two main areas. The first is technical: grouping texts by topic, retrieving the information the team needs, and comparing images, which lets it process enormous datasets. The second is optimizing routine editorial tasks: transcribing interviews, translating texts, and producing audio versions of articles.
The team created a custom model based on Meta’s Llama and trained it on Ukrainian data. Within six months, the project produced a model capable of recognizing manipulative techniques, helping the team fight disinformation.
Audio versions created with the ElevenLabs service gave the texts a voice, opening quality journalism to people with visual or reading impairments.
For each investigation, the team identifies which technologies to use, develops custom solutions, and combines open-source models, libraries, and other tools as needed. If one method doesn’t work, they can quickly switch to another. This flexibility isn’t available in off-the-shelf paid services, and it allows the team to adapt AI tools to the outlet’s specific needs.
One of Texty.org.ua’s latest large-scale projects involved tracking Ukrainian children illegally taken to Russia from occupied territories. Researchers search for matches in Russia’s federal adoption database to prove the crime of abduction and gauge its scale. That meant comparing roughly 1,000 photos of missing Ukrainian children against about 40,000 records in the Russian database, roughly 40 million comparisons in total. The team used the open-source DeepFace library, which includes facial recognition models. The technology is reasonably accurate but not perfect: for each Ukrainian child, the algorithm suggested three potential matches, and the editorial team hired verifiers to manually check each case and determine whether it was indeed the same child.
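In code, such a matching pass can be quite compact. The sketch below is illustrative rather than a reconstruction of Texty.org.ua’s actual pipeline: the folder layout, the choice of the Facenet512 model, and the top-3 cutoff are all assumptions.

```python
from pathlib import Path

from deepface import DeepFace  # pip install deepface

# Hypothetical folder layout: one photo per missing child, one per database record.
MISSING_DIR = Path("missing_children")    # reference photos of missing children (illustrative path)
DATABASE_DIR = Path("adoption_database")  # photos from the adoption database (illustrative path)
TOP_N = 3                                 # number of candidate matches to keep per child

def top_matches(child_photo: Path, top_n: int = TOP_N) -> list[tuple[str, float]]:
    """Compare one child's photo against every database photo and keep the closest faces."""
    scored = []
    for candidate in DATABASE_DIR.glob("*.jpg"):
        result = DeepFace.verify(
            img1_path=str(child_photo),
            img2_path=str(candidate),
            model_name="Facenet512",    # one of DeepFace's bundled recognition models
            enforce_detection=False,    # don't fail outright on low-quality photos
        )
        scored.append((candidate.name, result["distance"]))
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar face
    return scored[:top_n]

for photo in MISSING_DIR.glob("*.jpg"):
    candidates = top_matches(photo)
    print(photo.name, candidates)  # each shortlist then goes to a human verifier
```

At the scale of tens of millions of comparisons, a production pipeline would more likely precompute one embedding per photo (for example with DeepFace.represent) and compare vectors in bulk rather than calling verify pairwise, but the shortlisting logic stays the same.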
For other investigations, the team uses natural language processing methods. For instance, Topic Modeling automatically groups thousands of texts by topic, while Named Entity Recognition extracts names, organizations, and locations.
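Neither technique requires exotic tooling. As a rough illustration (the article does not say which libraries Texty.org.ua relies on), classic topic modeling and off-the-shelf NER might look like the sketch below; the sample posts and the English spaCy pipeline are placeholders, with a Ukrainian pipeline such as uk_core_news_sm swapped in for real work.

```python
import spacy                                                  # pip install spacy
from sklearn.decomposition import LatentDirichletAllocation   # pip install scikit-learn
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative mini-corpus; the real inputs would be thousands of scraped posts.
posts = [
    "The central bank kept its key interest rate unchanged this quarter.",
    "Shelling damaged a school in Kharkiv, local officials reported.",
    "Analysts expect the central bank to cut the rate next year.",
    "Officials in Kharkiv reported new strikes on residential areas.",
]

# Topic modeling: group posts by recurring word patterns (LDA is one classic method;
# the article does not name the exact method the team uses).
counts = CountVectorizer(stop_words="english").fit_transform(posts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts).argmax(axis=1))   # topic id assigned to each post

# Named entity recognition: extract people, organizations, and locations.
nlp = spacy.load("en_core_web_sm")            # python -m spacy download en_core_web_sm
for post in posts:
    print([(ent.text, ent.label_) for ent in nlp(post).ents])
```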
One example involved investigating the recruitment of teenagers for terrorist attacks. The team extracted structured information from 300 reports by the National Police about attacks on military vehicles: who committed them, when, and which methods were used to recruit perpetrators. Doing this manually would have taken weeks, but queries to a large language model automated the process.
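The article does not specify which model or API the team queried, but the pattern is straightforward: one prompt per report asking for the same fields in machine-readable form. A minimal sketch, assuming the OpenAI Python SDK and an illustrative field list:

```python
import json

from openai import OpenAI  # pip install openai; the actual LLM used is not named in the article

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = """Extract the following fields from this police report and answer as JSON
with the keys "perpetrator", "date", "target", "recruitment_method". Use null for anything
not stated in the report.

Report:
{report}
"""

def extract_fields(report_text: str) -> dict:
    """Ask the model to turn one free-text report into a structured record."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                      # illustrative model choice
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(report=report_text)}],
        response_format={"type": "json_object"},  # force valid JSON output
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

# Looping over the ~300 reports yields a table a journalist can then verify row by row.
```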
However, off-the-shelf tools don’t always meet the needs of complex queries. For example, the team needed to detect manipulative techniques in Telegram channels, analyzing how language structures make texts emotionally charged and distract from facts and evidence. Here, ChatGPT struggled with the Ukrainian context. “Any message about shelling or the war, ChatGPT marked as manipulation because of its strong emotional tone. We realized we needed to train the model on our own data, to build something from scratch that would work the way we need,” explains Texty.org.ua AI specialist Nataliia Romanyshyn. They used Meta’s open-access Llama as the base model and trained it to detect manipulative techniques in the Ukrainian information space. The process took about six months. “Now we have a model that identifies manipulation with reasonable accuracy. We use it not only for that project but also occasionally in others whenever we want to measure manipulative content,” she says.
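Texty.org.ua has not published its training recipe here, so the following is only a schematic of one common approach: fine-tuning a Llama checkpoint as a binary “manipulative / not manipulative” classifier with Hugging Face transformers. The model name, the two example posts, and the hyperparameters are all assumptions.

```python
from datasets import Dataset  # pip install datasets transformers
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # illustrative checkpoint; requires access approval from Meta

# Hypothetical labelled data: Telegram posts annotated as manipulative (1) or not (0).
train_data = Dataset.from_dict({
    "text": ["Влада знову бреше вам про все!", "Сьогодні о 10:00 відбудеться засідання ради."],
    "label": [1, 0],
})

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no padding token by default

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

train_data = train_data.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="manipulation-detector",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=train_data,
)
trainer.train()  # in practice: thousands of annotated posts and parameter-efficient tuning (e.g. LoRA)
```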
Texty.org.ua publishes analytical content with detailed graphics, but not all readers can access them. The team wanted to add audio versions but lacked the resources to hire a separate audio team. Using ElevenLabs, they cloned the voice of Valeriia Pavlenko, a journalist who hosts the outlet’s YouTube channel and has established the brand’s voice style. The model learned to read texts in her style. Audiences liked hearing a familiar voice, and users with reading difficulties particularly appreciated the accessibility. “When we launched it, I was worried we might get backlash because it’s AI-generated audio, but it worked well because it used the voice of our own journalist,” the specialist recalls.
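Once a voice has been cloned, the generation step itself is a single API call. The sketch below assumes ElevenLabs’ public text-to-speech REST endpoint and a placeholder voice ID standing in for the cloned voice; it is not the outlet’s actual integration.

```python
import requests  # pip install requests

ELEVENLABS_API_KEY = "..."         # account API key, kept out of the illustration
VOICE_ID = "your-cloned-voice-id"  # the ID ElevenLabs assigns to a cloned voice (hypothetical placeholder)

def article_to_audio(text: str, out_path: str = "article.mp3") -> None:
    """Send article text to ElevenLabs' text-to-speech endpoint and save the returned audio."""
    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": ELEVENLABS_API_KEY},
        json={
            "text": text,
            "model_id": "eleven_multilingual_v2",  # a multilingual model that handles Ukrainian
        },
        timeout=120,
    )
    response.raise_for_status()
    with open(out_path, "wb") as audio_file:
        audio_file.write(response.content)         # the endpoint returns raw audio bytes

article_to_audio("Повний текст статті…")
```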
The editorial team is aware of AI’s main challenges: models can hallucinate, make mistakes, or reproduce biases, so results are always verified by a human responsible for the process. The outlet even conducted a separate study on AI bias against Ukrainians.
“There’s always a human who makes the final decision and can evaluate the result,” emphasizes Nataliia. In all publications, the outlet describes the methodology: how AI was used, which models were applied, and how results were verified.