01 · Personal

AMSTERDAM · 2026

Tony Yang

Independent AI Researcher · Amsterdam

I work on the hard, useful parts of AI. I want to build AI that serves people broadly, with fairness and real access for everyone.

The limits of language mean the limits of the world? Perhaps, yet we dwell in more than we can name.


§00 · A quick hello

I've spent a while exploring across AI, from emotion recognition and medical imaging to LLM routing and agent benchmarks. This spring I left my PhD fellowship. It was a hard call and a sobering one, but I now have a clearer idea of what I want to do next.

I believe AI is shifting from a scaling race to a systems race. Bigger models still matter, but they are not enough. What counts is how we train, route, compress, and ship them. I want AI that is capable, efficient, and genuinely useful.

I'm an independent AI researcher in Amsterdam, working with industry and academia. Right now I'm reaching out and looking for the right next role. Before this, I was a Marie Skłodowska-Curie Fellow and an AI research engineer at TU Delft Imaging Physics, and I hold an MSc from TU Delft.

Email tonyyunyang@outlook.com

scholar · github · cv (English) · cv (Chinese)

When data is scarce, design matters more

When data is sensitive or scarce, you can't just scale your way to a good model. The gains come from better architecture and smarter methods.

Powerful. But how useful, really?

By now we all know AI is powerful. The harder part is making it useful for your actual work. Will it hold up when you depend on it? Can it slot into what you already do? The chatbot was just the first shape. I want to find the rest.

§01 · Research

The research I've worked on so far.

ACM CAIS · RLEVAL WORKSHOP · 2026

TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing

Pei Yang*, Wanyi Chen*, Tongyun Yang, Pengbin Feng, Jiarong Xing, Wentao Guo, Yuhang Yao, Yuhang Han, Hanchen Li, Xu Wang, Zeyu Wang, Jie Xiao, Anjie Yang, Liang Tian, Lynn Ai, Eric Yang, Tianyu Shi

A benchmark for routing LLMs inside agents that work over many steps. At each step, can you pick the cheapest model that still gets the job done? It comes in two tracks, a fast offline test on 970 verified cases and a live run on SWE-bench. A trained router matched unrouted Opus 4.6 while cutting API cost by 53%.

paper · code · site
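
As a rough illustration of the per-step question above (pick the cheapest model that still gets the job done), here is a minimal sketch of a threshold rule. It is not the benchmark's router: the `Candidate` fields, the model labels, the costs, the success estimates, and the 0.9 threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str         # hypothetical model label
    cost: float       # assumed cost per call, arbitrary units
    p_success: float  # assumed predicted chance of completing this step


def route(candidates, threshold=0.9):
    """Return the cheapest candidate predicted to clear the threshold.

    If no candidate clears it, fall back to the one with the highest
    predicted success, whatever it costs.
    """
    capable = [c for c in candidates if c.p_success >= threshold]
    if capable:
        return min(capable, key=lambda c: c.cost)
    return max(candidates, key=lambda c: c.p_success)


# Hypothetical candidates for a single agent step.
step_candidates = [
    Candidate("small", cost=1.0, p_success=0.60),
    Candidate("medium", cost=5.0, p_success=0.92),
    Candidate("large", cost=20.0, p_success=0.99),
]

chosen = route(step_candidates)
print(chosen.name)
```

In this toy setup the hard part is hidden inside `p_success`: a real router has to estimate it from the task and the trajectory so far, which is the part a benchmark like this is meant to stress.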

ACM CAIS · RLEVAL WORKSHOP · 2026

MERA: Model Evolution and Routing with Skill Adaptation for Agentic Systems at Scale

Yuhang Yao, Zeyu Wang, Tongyun Yang, Wanyi Chen, Yuhang Han, Jie Xiao, Tianyu Shi

A routing system for AI agents that learns from real usage traces. It pairs each call with the cheapest model that can handle it, reuses a library of recurring prompt patterns, and trains a small specialist to take over repeat work. The result is 87.3% routing accuracy at about half the cost of always calling the large model.

paper · code

★ IMWUT / UBICOMP · 2025

Through the Eyes of Emotion: A Multi-faceted Eye Tracking Dataset for Emotion Recognition in Virtual Reality

Tongyun Yang*, Bishwas Regmi*, Lingyu Du, Andreas Bulling, Xucong Zhang, Guohao Lan

First large-scale public eye-tracking dataset for VR emotion recognition, with high-frame-rate periocular video plus 240 Hz gaze, across seven discrete emotions.

paper · code

MICCAI + IEEE TRANSACTIONS ON MEDICAL IMAGING · 2025

Reverse Imaging: Any-Sequence Generalization for Cardiac MRI Segmentation

Yidong Zhao, Yi Zhang, Tongyun Yang, Maša Božić-Iven, Ayda Arami, Yuchi Han, Orlando Simonetti, Hui Xue, Petter Kellman, Sebastian Weingärtner, Qian Tao

A way to segment cardiac MRI from any scanner sequence, even ones it never trained on. A diffusion model estimates the underlying tissue properties first, then segments from those instead of the raw image.

paper · code

MIDL · 2025

Pruning nnU-Net with Minimal Performance Loss

Tongyun Yang, Yidong Zhao, Qian Tao

Trained nnU-Net models contain substantial weight redundancy. Over 80% of weights can be removed by simple magnitude pruning while preserving segmentation quality.

paper · code
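
For readers new to the term, magnitude pruning can be sketched in a few lines. This is a minimal illustration on a flat list of numbers, assuming a single global threshold; it is not the paper's procedure, and the example weights and the `magnitude_prune` helper are made up for the demonstration.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices in ascending order of absolute value; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Made-up weights: a couple of large values and many near-zero ones.
weights = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.15, -0.001, 0.04, -0.6]
pruned = magnitude_prune(weights, sparsity=0.8)
kept = [w for w in pruned if w != 0.0]
print(pruned)
print(len(kept), "of", len(weights), "weights kept")
```

On a real network the same idea is applied to weight tensors, and whether the threshold is set per layer or across the whole model is a design choice this sketch does not address.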

§02 · Projects

Where the research turns into real systems. Some are live collaborations (Gradient Networks, Tencent, MeetaVista). The rest I build on my own time.

MAY 2026 → PRESENT · IN PROGRESS · SELF-DRIVEN

Polymarket Decoder

Mining Polymarket's order books to study how prediction markets price news in real time. Looking at when crowds get it right, when they get blindsided, and what the order-book dynamics reveal about collective belief.

TENCENT · APR 2026 → PRESENT · IN PROGRESS · TARGET TOP-TIER AI VENUE

LLM for Optimization

A framework that uses an optimization harness (structured search, feedback, memory retrieval) to guide LLMs toward better solutions on optimization problems like TSP, rather than relying on direct generation alone.

MAR 2026 → PRESENT · IN PROGRESS · TARGET EMNLP '26

Cost-Adaptive LLM Routing with Specialist Models

Building on the LLM Router benchmark, using stronger-model trajectories to fine-tune small specialists for repeated workflows. As usage accumulates, small models improve and the system's cost falls.

MEETAVISTA · MAR 2026 → PRESENT · IN PROGRESS · TARGET TOP-TIER AI VENUE

Human Intent World Model

A vision-language model that reads what a customer wants from visual cues and reasons about how to help, trained on a synthetic dataset built from classic books on selling.

GRADIENT NETWORKS · FEB 2026 → PRESENT · IN PROGRESS · UNDER REVIEW · NEURIPS '26

LLM Router

Built a benchmark for evaluating LLM routing strategies. Showed that routing can preserve output quality while reducing cost by >90% versus calling a single SOTA model for every step.

github

JAN 2026 · IN PROGRESS · UNDER REVIEW · ACL '26

Diffusion Weights

What if you could predict where training is headed instead of running it to the end? A diffusion model forecasts a model's future weights from a few early checkpoints, then blends short, medium, and long-range guesses into the final model. About 3.2x faster, with no drop in quality.

DEC 2025 · SHIPPED

ScholarHighlights

A browser extension that surfaces venue-quality badges, author-role signals, and flexible ranking data directly on Google Scholar. The information you wish was there at a glance.

github · chrome web store

§03 · Writing

Notes on the research, the tools I use, and the boring parts of building AI.

No posts yet. The first one's coming, when it's actually worth reading.

↗ subscribe via RSS

§04 · News

2026/05 Concluded my Marie Curie Fellowship at IMDEA Networks. Now open to roles in research, academia, or industry.
2026/03 Submitted a paper on a diffusion-based training framework for large language models to ACL Rolling Review.
2025/10 Began Marie Curie Fellowship at IMDEA Networks (MSCA 6th Sense project).
2025/07 Through the Eyes of Emotion accepted to Ubicomp / IMWUT 2025.
2025/06 Reverse Imaging accepted to MICCAI 2025 and IEEE Transactions on Medical Imaging.
2025/05 Pruning nnU-Net with Minimal Performance Loss accepted to MIDL 2025.

§05 · Teaching

2024/25 Q1 · ET 4310 Supercomputing for Big Data · TA · TU Delft
2023/24 Q3 · CESE 4030 Embedded Systems Lab · TA · TU Delft
2023/24 Q1 · CESE 4000 Software Fundamentals · TA · TU Delft
2023/24 Q1 · CESE 4010 Advanced Computing Systems · TA · TU Delft
2023/24 · CESE MSc Programme Student Mentor · TU Delft

§06 · Contact

I read every message. Write to me about research, working together, supervision, or just to say hi. I usually reply within a day, and I am easy to talk to.

Open to research, academia, or industry

Tony Yang 杨童耘

Independent AI Researcher

Hard problems where data is scarce · AI that's actually useful

based in

Amsterdam (current)
Shanghai
Shenzhen

languages

Mandarin native
English near-native, professional · raised in English-speaking environments · IELTS 8.0

signed · T.