For years, much of radiology AI has centered on chest X-rays. That made sense: X-ray datasets were easier to release, benchmarks were more mature, and early multimodal workflows fit naturally around 2D imaging.
But CT is where much of the clinical complexity actually lives.
That is why Merlin, a computed tomography vision-language foundation model, deserves attention. Recently published in Nature, Merlin is more than another impressive medical AI paper: the team has also released code, model weights, and a documented dataset access path for researchers.
That makes this release useful to the community in a way many research announcements are not. It gives people something real to inspect, install, test, and potentially build on in the broader world of machine learning in medical imaging.
Why CT-Native Models Matter
CT is fundamentally different from the 2D imaging setups that shaped the first wave of medical vision-language models.
Instead of a single projection image, CT gives clinicians a volumetric view of anatomy and pathology. That means:
- far richer spatial structure,
- more clinically relevant detail across organs and tissue types,
- greater reporting complexity,
- and much heavier demands on both model architecture and data infrastructure.
Trying to extend 2D radiology AI directly into this setting is often limiting. CT requires models that are native to 3D reasoning and able to connect imaging with language in a more structured way.
That is the core appeal of Merlin: it is presented as a CT-native multimodal foundation model, not just a 2D radiology system stretched into a volumetric domain.
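To make the "heavier demands" point concrete, here is a rough back-of-the-envelope sketch comparing the number of values in one 2D radiograph with one CT volume. The shapes below are illustrative assumptions, not figures from the Merlin paper; real scans vary with scanner, protocol, and anatomy.

```python
# Illustrative sizes only: these shapes are assumptions, not Merlin specifics.
XRAY_SHAPE = (2048, 2048)    # assumed 2D radiograph: rows, columns
CT_SHAPE = (300, 512, 512)   # assumed CT volume: slices, rows, columns


def element_count(shape):
    """Return the number of pixels (2D) or voxels (3D) for a given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


xray_pixels = element_count(XRAY_SHAPE)
ct_voxels = element_count(CT_SHAPE)

print(f"2D radiograph pixels: {xray_pixels:,}")
print(f"CT volume voxels:     {ct_voxels:,}")
print(f"Ratio (CT / X-ray):   {ct_voxels / xray_pixels:.1f}x")
```

Even under these modest assumptions, a single CT study carries many times more values than a single radiograph, and a model also has to reason along the slice axis rather than within one plane.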
What Merlin Is
According to the paper and project materials, Merlin is a 3D vision-language foundation model for computed tomography trained using:
- CT scans,
- radiology reports,
- and structured electronic health record data.
The model is designed to support multiple downstream capabilities rather than only one narrow benchmark task. The released materials reference support for:
- image-text embeddings,
- phenotype classification,
- five-year disease prediction,
- and radiology report generation.
This multi-capability framing is important. It suggests Merlin is being positioned as infrastructure for CT-centered medical AI workflows rather than as a one-off demo model.
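Image-text embeddings are what make zero-shot use possible: an image and a set of text prompts are mapped into the same vector space, and the closest prompt wins. The sketch below shows only that general idea with toy vectors and plain Python. It does not call Merlin's actual API; the function names, prompts, and 4-dimensional embeddings are all hypothetical stand-ins.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_prompts(image_embedding, prompt_embeddings):
    """Rank text prompts by similarity to one image embedding, best first."""
    scores = {
        prompt: cosine_similarity(image_embedding, vec)
        for prompt, vec in prompt_embeddings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy vectors standing in for real model outputs (hypothetical values).
image_embedding = [0.9, 0.1, 0.0, 0.4]
prompt_embeddings = {
    "finding present": [0.8, 0.2, 0.1, 0.5],
    "finding absent": [-0.7, 0.3, 0.2, -0.4],
}

for prompt, score in rank_prompts(image_embedding, prompt_embeddings):
    print(f"{prompt}: {score:.3f}")
```

In a real workflow the vectors would come from the released model's image and text encoders, and the same similarity ranking could also support retrieval of similar studies or reports.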
What Has Actually Been Released?
This is where Merlin becomes especially interesting for the broader community.
The project is not just described in a paper. It comes with several tangible assets:
- Public GitHub repository with setup instructions, demos, and documentation
- Installable Python package published as merlin-vlm
- Model weights on Hugging Face
- Merlin Abdominal CT Dataset access page hosted through Stanford AIMI
That combination matters. It means people can do more than cite the paper. They can inspect the codebase, review the inference setup, test the package installation, and evaluate whether the released resources are actually usable.
In other words, this is the kind of release the medical imaging community can engage with in a practical way.
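A first practical check is simply whether the package is present in your environment. The sketch below uses only the Python standard library and the package name given in the release materials; it assumes nothing about Merlin's own import names or API, and the helper function is our own illustration.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# "merlin-vlm" is the package name given in the release materials.
version = installed_version("merlin-vlm")
if version is None:
    print("merlin-vlm is not installed in this environment")
else:
    print(f"merlin-vlm {version} is installed")
```

Recording the reported version alongside any results makes it easier to compare notes with other teams testing the same release.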
The Dataset Makes This Much More Than Hype
One of the strongest reasons Merlin is worth sharing is the accompanying dataset release path.
The Merlin Abdominal CT Dataset is described as containing:
- 25,494 CT scans
- 18,317 unique patients
- paired radiology reports
- metadata and task files for downstream analysis
The project documentation also references supporting files for:
- dataset splits,
- report findings,
- zero-shot disease classification labels,
- five-year disease prediction labels,
- and demographic or acquisition metadata.
That makes Merlin more than a model release. It is a model-and-dataset stack, which is what serious progress in medical imaging AI usually requires.
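The headline numbers also carry a practical implication: with more scans than patients, some patients contribute multiple scans, so any custom split should be made at the patient level to avoid leakage between training and evaluation. The sketch below works through that arithmetic and shows one generic way to do a patient-level split. It is an illustration under our own assumptions, not how the released split files were produced; the identifiers and fractions are hypothetical, and the official splits should be preferred for comparable results.

```python
import hashlib

TOTAL_SCANS = 25_494       # from the dataset description
UNIQUE_PATIENTS = 18_317   # from the dataset description

scans_per_patient = TOTAL_SCANS / UNIQUE_PATIENTS
repeat_scans = TOTAL_SCANS - UNIQUE_PATIENTS  # scans beyond each patient's first


def assign_split(patient_id, val_fraction=0.1, test_fraction=0.1):
    """Deterministically map a patient ID to train, val, or test."""
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


# Hypothetical (scan_id, patient_id) pairs; real IDs come from the dataset files.
scans = [
    ("scan_001", "patient_A"),
    ("scan_002", "patient_A"),
    ("scan_003", "patient_B"),
]
splits = {scan_id: assign_split(patient_id) for scan_id, patient_id in scans}

print(f"Average scans per patient: {scans_per_patient:.2f}")
print(f"Scans beyond a patient's first: {repeat_scans:,}")
print(splits)
```

Because the split is keyed on the patient ID rather than the scan ID, every scan from the same patient lands in the same partition.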
There is an important caveat: the dataset is not an open, instant download. Access is routed through a Stanford AIMI data use agreement. That still represents a meaningful community resource: researchers get a real, documented path to obtain the data, which is far more actionable than a paper that references private training data with no release plan at all.
Why This Matters Right Now
Merlin arrives at a moment when medical AI is moving beyond the first wave of 2D radiology benchmarks and toward richer multimodal infrastructure.
That shift has been building for some time. Related efforts such as CT-RATE and RadGenome-Chest CT show that the field is increasingly investing in:
- 3D imaging benchmarks,
- paired image-language resources,
- and modality-specific dataset ecosystems.
Merlin fits directly into that trend. It reflects a broader transition from narrow radiology AI tasks toward foundation-model-style resources built for CT workflows.
That is important because progress in medical imaging is not only about model design. It also depends on:
- large multimodal datasets,
- reproducible access to evaluation resources,
- released code and checkpoints,
- and enough transparency for others to test what has actually been built.
Merlin contributes meaningfully across those dimensions, especially for teams thinking seriously about radiology AI workflow automation and how CT-native systems fit into real products.
Why the Community Should Pay Attention
A lot of medical AI announcements are difficult to act on. They may be academically interesting but not especially useful for engineers, startups, or applied research teams.
Merlin is different because it gives the community concrete things to validate:
- Can the package be installed?
- Can the inference demos run?
- How accessible are the released weights?
- How usable is the dataset request flow?
- What downstream tasks can teams realistically reproduce?
That makes it much easier to discuss Merlin as infrastructure rather than as marketing.
And that distinction matters. If a release is going to be shared with researchers, builders, and radiology AI practitioners, it should come with something they can actually use. Merlin appears to do that.
A Sensible Note of Caution
It is still worth being precise in how we talk about this release.
Merlin is promising, but “open” in medical AI often comes with real constraints. In this case:
- the dataset is governed by a data use agreement,
- clinical data access is naturally restricted,
- practical usability depends on how smooth the setup and inference workflows really are,
- and benchmark strength does not automatically mean clinical readiness.
So the right conclusion is not that CT foundation modeling is solved.
The better conclusion is that the field now has a more serious, shareable, and testable CT-native foundation model release than it had before.
That is already a meaningful step forward.
What Merlin Signals for Radiology AI
Merlin matters not only because of its reported performance, but because of what it signals.
It points toward a future where radiology foundation models are:
- modality-native instead of loosely adapted from 2D setups,
- multimodal by design,
- backed by paired imaging, reports, and metadata,
- and released with enough infrastructure for the community to do real work with them.
If that direction continues, the next wave of radiology AI will likely be shaped less by isolated single-task models and more by general-purpose medical imaging backbones that support classification, report generation, retrieval, forecasting, and decision support.
In that sense, Merlin is not just interesting on its own. It is a marker that CT-native foundation models are becoming practical community artifacts.
It also connects naturally with the broader shift toward radiology AI agents, production-grade tooling, and safer paths for deploying medical AI.
Final Thoughts
The radiology AI world has spent years focused primarily on 2D imaging. Merlin is a strong sign that the center of gravity may finally be shifting toward richer, more clinically representative modalities.
And importantly, this shift is arriving with code, weights, and dataset access, not just claims.
That is exactly the kind of release worth watching, testing, and sharing.
If your team is building a radiology AI workflow, CT analysis tool, or medical imaging AI platform, PYCAD can help with model integration, DICOM workflows, deployment, and production software.
Sources
- Merlin: a computed tomography vision-language foundation model and dataset — Nature
- StanfordMIMI/Merlin — GitHub
- stanfordmimi/Merlin — Hugging Face
- Merlin Abdominal CT Dataset — Stanford AIMI
- CT-RATE: Generalist Foundation Models from a Multimodal Dataset for 3D Medical Imaging
- RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT




