Bartosz Cywiński

Hey there!

I’m a PhD student at the Warsaw University of Technology. I am also a contractor at Google DeepMind in Neel Nanda’s team.

I work on interpretability of LLMs. Currently, I am mostly excited about using interpretability to understand and fix weird model behaviors or phenomena.

news

Apr 09, 2026 It’s London, baby 💂
Oct 02, 2025 New preprint: Eliciting Secret Knowledge from Language Models.
Jun 16, 2025 Starting MATS 8.0 mentored by Arthur Conmy and Sam Marks!
May 21, 2025 New preprint about extracting hidden knowledge from LLMs with mechanistic interpretability is out!
May 01, 2025 SAeUron got accepted to ICML 2025! 🎉