
AI Decodes Visual Brain Activity—and Writes Captions for It


November 6, 2025



A non-invasive imaging technique can translate scenes in your head into sentences. It could help to reveal how the brain interprets the world

Functional magnetic resonance imaging is a non-invasive way to explore brain activity.

PBH Images/Alamy Stock Photo

Reading a person’s mind using a recording of their brain activity sounds futuristic, but it’s now one step closer to reality. A new technique called ‘mind captioning’ generates descriptive sentences of what a person is seeing or picturing in their mind from a read-out of their brain activity, with impressive accuracy.

The technique, described in a paper published today in Science Advances, also offers clues about how the brain represents the world before thoughts are put into words. It might also help people with language difficulties, such as those caused by strokes, to communicate better.

The model predicts what a person is looking at “with a lot of detail”, says Alex Huth, a computational neuroscientist at the University of California, Berkeley. “This is hard to do. It’s surprising you can get that much detail.”


Scan and predict

Researchers have been able to accurately predict what a person is seeing or hearing using their brain activity for more than a decade. But decoding the brain’s interpretation of complex content, such as short videos or abstract shapes, has proved to be more difficult.

Previous attempts have identified only key words that describe what a person saw rather than the complete context, which might include the subject of a video and actions that occur in it, says Tomoyasu Horikawa, a computational neuroscientist at NTT Communication Science Laboratories in Kanagawa, Japan. Other attempts have used artificial intelligence (AI) models that can create sentence structure themselves, making it difficult to know whether the description was actually represented in the brain, he adds.

Horikawa’s method first used a deep language model to analyse the text captions of more than 2,000 videos, turning each one into a unique numerical ‘meaning signature’. A separate AI tool was then trained on six participants’ brain scans and learnt to find the brain-activity patterns that matched each meaning signature while the participants watched the videos.

Once trained, this brain decoder could read a new brain scan from a person watching a video and predict its meaning signature. A different AI text generator would then search for the sentence that came closest to the meaning signature decoded from the individual’s brain.
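The overall pipeline (caption embeddings, a linear decoder from brain activity to those embeddings, then sentence retrieval) can be illustrated with a rough sketch. Everything below is a simplifying assumption rather than the study’s actual code: embed_sentences is a hypothetical stand-in for the deep language model’s sentence encoder, the fMRI arrays are placeholder voxel patterns, and a ridge regression plays the role of the trained brain decoder.

import numpy as np
from sklearn.linear_model import Ridge

def embed_sentences(captions):
    # Hypothetical stand-in for the deep language model's sentence encoder.
    # A real encoder would return contextual embeddings; hashed bag-of-words
    # vectors keep the sketch dependency-free.
    dim = 256
    vecs = np.zeros((len(captions), dim))
    for i, text in enumerate(captions):
        for word in text.lower().split():
            vecs[i, hash(word) % dim] += 1.0
    return vecs

# Training: learn a linear map from voxel patterns to 'meaning signatures'.
train_captions = ["a person jumps from the top of a waterfall",
                  "a dog runs along a beach"]
fmri_train = np.random.default_rng(0).normal(size=(len(train_captions), 10_000))
decoder = Ridge(alpha=1.0).fit(fmri_train, embed_sentences(train_captions))

# Inference: decode a new scan, then pick the candidate sentence whose
# signature lies closest to the decoded one.
fmri_test = np.random.default_rng(1).normal(size=(1, 10_000))
decoded_signature = decoder.predict(fmri_test)[0]

candidates = ["spring flow",
              "above rapid falling water fall",
              "a person jumps over a deep water fall on a mountain ridge"]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

best = max(candidates,
           key=lambda s: cosine(embed_sentences([s])[0], decoded_signature))
print(best)  # best-matching description for this (placeholder) scan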

For example, a participant watched a short video of a person jumping from the top of a waterfall. Using their brain activity, the AI model guessed strings of words, starting with “spring flow”, progressing to “above rapid falling water fall” on the tenth guess and arriving at “a person jumps over a deep water fall on a mountain ridge” on the 100th guess.
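The stepwise guessing in that example can be mimicked with a toy hill-climbing search: start from a seed phrase and keep any word-level edit that moves the candidate’s meaning signature closer to the decoded one. The sketch below reuses the embed_sentences stand-in, cosine and decoded_signature from the sketch above, and the mini-vocabulary is invented; it is only a loose illustration of the iterative optimisation described in the paper, not its actual text generator.

vocabulary = ["a", "person", "jumps", "over", "deep", "water", "fall",
              "on", "mountain", "ridge", "rapid", "above"]

def refine(seed, target_vec, max_guesses=100):
    # Greedy search: each round, try appending every vocabulary word and keep
    # the extension whose signature is most similar to the decoded target.
    best = seed
    best_score = cosine(embed_sentences([best])[0], target_vec)
    for _ in range(max_guesses):
        trials = [best + " " + w for w in vocabulary]
        scored = [(cosine(embed_sentences([t])[0], target_vec), t) for t in trials]
        top_score, top = max(scored)
        if top_score <= best_score:   # stop when no extension improves the match
            break
        best, best_score = top, top_score
    return best

print(refine("spring flow", decoded_signature))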

The researchers also asked participants to recall video clips that they had seen. The AI models successfully generated descriptions of these recollections, suggesting that the brain uses a similar representation for both viewing and remembering.

Reading the future

This technique, which uses non-invasive functional magnetic resonance imaging, could help to improve the process by which implanted brain–computer interfaces might translate people’s non-verbal mental representations directly into text. “If we can do that using these artificial systems, maybe we can help out these people with communication difficulties,” says Huth, who, with his colleagues, developed a similar model in 2023 that decodes language from non-invasive brain recordings.

These findings raise concerns about mental privacy, Huth says, as researchers grow closer to revealing intimate thoughts, emotions and health conditions that could, in theory, be used for surveillance, manipulation or to discriminate against people. Neither Huth’s model nor Horikawa’s crosses a line, they both say, because these techniques require participants’ consent and the models cannot discern private thoughts. “Nobody has shown you can do that, yet,” says Huth.

This article is reproduced with permission and was first published on November 5, 2025.

