
LLMs have their uses, but healthcare needs 'small language models' too, expert says

An executive who in his career has had oversight of nearly 70% of America's healthcare data explains why that exact information is needed for artificial intelligence to succeed in the industry.
By Bill Siwicki, Managing Editor

Fawad Butt, CEO and cofounder, Penguin Ai

Photo: Fawad Butt

Fawad Butt has served as chief data officer at Kaiser Permanente, UnitedHealthcare and Optum, some of the largest healthcare organizations in the country. Today, he is CEO of Penguin Ai, a healthcare artificial intelligence company that works extensively with data.

For all their early promise these past few years, Butt believes large language models have reached their limit. GPT-5 disappointed users with "bland" outputs, Meta's LLaMA 4 stumbled on long-context reasoning, and OpenAI was forced to bring back an older model after backlash, he points out.

The problem is not the architecture, he says; it's the data.

LLMs are starving on recycled internet text while the real fuel for intelligence – private enterprise data – sits untapped, he says.

"The future isn't bigger models – it's smaller, smarter ones that learn from the data enterprises already own," he predicts.

We spoke with Butt to learn more about SLMs – small language models.

Q. Please elaborate on what you mean when you say the future isn't bigger models but smaller, smarter ones.

A. If data is the new oil, then the grade of that oil matters. You wouldn't pour crude oil into a diesel tanker and expect it to run. The same is true for AI. If you feed it the wrong data, you get the wrong result.

LLMs are trained on generic internet data – Wikipedia, Reddit and scraped websites. That's the wrong grade of fuel for the healthcare industry. SLMs flip that equation. They're trained on real-world, proprietary healthcare enterprise data, so they are more task-based and specific to the workflows that matter. That makes them faster, cheaper and more accurate in practice – because they're designed for the actual problems healthcare organizations face every day.

As former chief data officer at Kaiser Permanente, UnitedHealthcare and Optum, I had oversight of nearly 70% of America's healthcare data. I've spent hundreds of millions on technology and learned the hard way that most of it wasn't built for healthcare.

The reality is the world's most valuable data is locked inside enterprise systems. In healthcare, that means claims, clinical notes, billing data, prior auth requests and call center transcripts. That's the right fuel on which to train future AI models for the healthcare industry.

Q. You suggest LLMs already are peaking and "eating themselves." Why do you feel they are peaking, and what do you mean by "eating themselves"?

A. When I say LLMs are peaking, I mean we've hit the ceiling of what this approach can deliver. Look at the recent GPT-5 release. On launch, it failed basic math, missed context that earlier versions handled with ease, and customers called it "bland" and "generic."

OpenAI even had to roll it back to an older model because people preferred it. Meta's LLaMA 4 promised to handle millions of tokens but collapsed at around 128,000. Meanwhile, Google's Gemini cleared 90% accuracy at the same scale.

These models are starting to "eat themselves" because they're training on the same recycled data – the internet's collective exhaust – and the signal is getting weaker each time they do so. At this point, the total amount of usable data to train on has already been discovered and consumed. Without new sources of high-quality, proprietary data or new techniques for building models, performance will flatten and decline.

The other issue is inefficiency. LLMs are massive, expensive and overpowered for most real-world tasks. Every simple query burns enormous energy and cost. The models use the same number of tokens to generate a vacation schedule as they do to predict whether someone will go to the ER. The analogy I like is that we're using 18-wheelers to make 7-Eleven runs. Healthcare doesn't need LLMs; it needs SLMs.

We need to walk or bike to grab a gallon of milk down the street instead of driving a semi-truck.

Either we start using enterprise data and better architectures, or the current generation of LLMs will keep looping back on themselves – getting slower, blander and less useful over time.

Q. Why is enterprise data and not internet data the missing key to LLMs?

A. MIT recently reported 95% of genAI pilots fail. That doesn't surprise me. Most of those pilots are built on generic, off-the-shelf LLMs trained on Reddit, Wikipedia or scraped web text. Companies are trying to jam those models into enterprise workflows – and then they wonder why the results don't stick.

You can't fix a prior auth backlog with a chatbot that's never seen a prior auth request. You can't process claims or do HCC risk coding with a model that doesn't understand the language of healthcare.

The problem isn't that LLMs are broken; it's that the internet has already been scraped to exhaustion. To get smarter, these models need new, high-quality data. That data lives inside enterprise warehouses across the Fortune 500. It's locked behind privacy, security and compliance walls – and for good reason.

But that's also why large commercial LLMs have peaked: They're cut off from the most valuable data on earth.

The future isn't about patching public models after the fact. It's about building SLMs that start with enterprise data from day one – de-identified, governed and domain-specific. In healthcare, that means combining clinical, claims and billing data – reinforced with synthetic data where needed – to create models that actually understand the work they're meant to do.

GPT-4 was trained on a fraction of the data available inside a single Fortune 500 company. So the real intelligence isn't on the internet; it's sitting inside enterprises.

Q. You describe small language models as efficient, domain-specific and built for business-critical workflows. Please expand on this for health IT leaders and explain how these leaders can get to work on them if they agree with your stance.

A. The beauty of small language models is they're built for the real world – specific workflows, not general conversation. With fewer parameters, they're faster, cheaper and lighter to run. They use less memory, less power and can operate on more modest GPUs, which matters for both cost and sustainability.

They also respond faster, with lower latency, which makes them practical for enterprise environments where reliability and speed count.

In healthcare, SLMs trained on proprietary, real-world data are more accurate on the tasks that actually move the needle – things like HCC risk coding, prior auth, denials, appeals and claims. You can't use a billion-parameter model trained on the open internet to solve those problems. You need a smaller model that knows your data, your language and your workflows – not superfluous data.

For health IT leaders, start with your own data – that's your oil – and find the right refinery. Pick a workflow where staff are overwhelmed, like denials management, and run a targeted pilot. Set a clear timeline for value: 30, 60 or 90 days.

If a vendor can't prove measurable impact in that window, move on. That's how you de-risk AI adoption and start generating real outcomes instead of endless pilots.

Follow Bill's health IT coverage on LinkedIn: Bill Siwicki
Email him: bsiwicki@himss.org
Healthcare IT News is a HIMSS Media publication.
