Anyone following the media reports can see that artificial intelligence is financial fodder for Big Tech. How should healthcare respond to this commercial arms race? And does generative AI, read: language models such as ChatGPT, have anything to offer healthcare at all? To answer questions like these, Health-RI held a well-attended two-day event at Jaarbeurs Utrecht.
When it comes to the commercial potential of AI, Microsoft's latest quarterly figures speak volumes. Thanks to AI, quarterly revenue rose 18 percent to $62 billion. Profits also rose, to nearly $22 billion. With an estimated value of nearly $3 trillion, Microsoft has even surpassed Apple as the world's most valuable company.
Impact
For AI in healthcare, all these developments still seem to be moving a bit fast. Recent American research shows that while half of healthcare executives are following the developments, less than a quarter are already actually applying AI within their own organizations. The majority does not expect AI to have a noticeable impact on the sector for another five years.
Integration into EHR
That the role of AI in healthcare remains limited for now should not be a reason for the industry to sit back, according to Marc Snackey, product owner data analytics at UMC Utrecht. "Large language models like ChatGPT are everywhere," Snackey said at the opening of the LLM two-day event. "Within healthcare, too, there are calls for integration into the EHR."
Small language area
Healthcare providers, professionals and suppliers would do well to join forces, as far as Snackey is concerned. "When it comes to clinical text, we live in a small country," he said.
This low language coverage is also reflected in the model library of AI platform Hugging Face. Of the nearly 500,000 models, 990 are Dutch-language. Among them are no more than a handful of medical models. "So there is a great need to collaborate," Snackey concludes. "We don't do that enough. Institutions are too inward looking." There is good news, too. Although the Netherlands covers a small language area, it is a "high resource" country. In other words, the knowledge and resources are there.
Generic LLM tools
The event in Utrecht focuses on two substantive questions. First, participants want to know whether it is possible to use LLMs to extract better, more accurate information, such as pain scores or diagnoses, from large source files such as questionnaires and EHRs. There is also the quest for broader applicability. Or as Snackey puts it: "Is it possible to create generic LLM tools that allow the healthcare provider to answer questions on their own, instead of us as a development team coming up with point solutions for every question?"
Smaller models
What the current generation of LLMs can and cannot do, Professor of Natural Language Processing (NLP) Suzan Verberne of Leiden University makes clear in her talk. Distilling side effects from patient conversations on Facebook? Check! Translating them into medical coding? No problem! Working with language models yourself? Of course! "Models are getting smaller and smaller," explains Verberne. "A model with 7 billion parameters instead of 70 billion parameters is more manageable and applicable. You can basically run that on a MacBook."
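Why a 7-billion-parameter model fits on a laptop while a 70-billion one does not comes down to simple arithmetic: weights take a fixed number of bytes per parameter. A rough sketch, using common rule-of-thumb figures (2 bytes per parameter at 16-bit precision, 0.5 bytes at 4-bit quantization; actual footprints also include activations and overhead):

```python
# Back-of-the-envelope memory estimate for holding an LLM's weights in RAM.
# Figures are rules of thumb, not vendor specifications.
def model_memory_gb(n_params_billion: float, bytes_per_param: float) -> float:
    """Approximate gigabytes needed to store the model weights."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

# Compare a 7B and a 70B model at 16-bit precision and 4-bit quantization.
for n in (7, 70):
    fp16 = model_memory_gb(n, 2.0)   # 2 bytes/param at fp16
    q4 = model_memory_gb(n, 0.5)     # 0.5 bytes/param at 4-bit
    print(f"{n}B params: ~{fp16:.0f} GB at fp16, ~{q4:.1f} GB quantized to 4-bit")
```

At 4-bit quantization a 7B model needs only a few gigabytes of memory, which is why it runs comfortably on a modern MacBook, while 70B remains workstation territory.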
Medical use
That said, there are still all sorts of flaws in LLMs that make medical use difficult. ChatGPT, for example, is not so much set up for accuracy as for keeping the user satisfied. When pressed with in-depth questions or objections, ChatGPT quickly begins to apologize. ChatGPT also still sometimes lets itself be fooled by leading questions. When Verberne once asked ChatGPT when Einstein stayed in Leiden, she was neatly presented with years consistent with Einstein's life. Only: Einstein never visited Leiden.
Hallucination as a characteristic
"ChatGPT is trained to engage in dialogue and follow instructions," Verberne said. "That makes it strong, but it also brings with it the danger of hallucination. Hallucination is not a bug but a feature. What ChatGPT does is generate a plausible sequence of likely words. The more specific the topic, the greater the chance of hallucination, because information on such topics is more limited."
Inaccuracies
Especially in an industry where hyperspecialization is becoming more pervasive and treatments more personalized, ChatGPT's tendency to make things up can cause problems. Creating abstracts of scientific articles is also not without its mishaps.
Research suggests that a quarter of the summaries contained inaccuracies. And when answering medical questions, only 20 percent of the answers matched experts' answers. "Actually, you can only apply ChatGPT if you already know a lot about a topic," Verberne concludes.
Human characteristics
Ironically, ChatGPT excels in traits attributed precisely to human healthcare providers. "Patients often find LLMs friendly, because a language bot has all the time and explains at length, while the doctor may be in a hurry and be gruff. But satisfaction and friendliness, of course, are not the same as correctness."
Environmental burden
Verberne also touches on an often overlooked aspect of AI, namely its carbon footprint. "One ChatGPT query consumes a thousand times more energy than an ordinary Google search. And language model BLOOM was found to account for 25 tons of CO2 emissions." By comparison, a passenger car driving 15,000 kilometers a year emits 3,300 kilograms.
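Putting those two figures from the talk side by side gives a feel for the scale; a quick sketch of the comparison (using only the numbers quoted above):

```python
# Compare BLOOM's quoted training emissions with a passenger car's annual
# emissions, both figures as cited in the talk.
bloom_training_kg = 25_000   # 25 tons of CO2
car_kg_per_year = 3_300      # car driving 15,000 km per year

car_years = bloom_training_kg / car_kg_per_year
print(f"BLOOM's training emissions equal roughly {car_years:.1f} car-years of driving")
```

In other words, training BLOOM once emitted about as much CO2 as seven to eight years of driving an average passenger car.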
Unburdening healthcare professional
Pollution of another order is the copious paperwork that healthcare professionals are saddled with on a daily basis. This sometimes costs them up to 40 percent of their working time, argues Arjan Groen of the Dutch start-up HealthSage.ai. HealthSage's LLM solution can unburden professionals by converting unstructured medical texts into the FHIR standard with "unprecedented speed and accuracy." Healthcare providers who dare can get started with the beta version of Note-to-FHIR. The open source approach is a conscious choice. Groen says: "With open source, we are more transparent, so we build trust and can continue to develop much faster."
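To give a sense of what "note to FHIR" means in practice, here is an illustrative sketch of the target format: a minimal FHIR R4 Condition resource such as a pipeline might emit for a diagnosis extracted from a clinical note. The helper function is hypothetical and is not HealthSage.ai's actual API; the SNOMED CT code shown is the standard code for type 2 diabetes.

```python
# Illustrative only: wrap an extracted diagnosis in a minimal FHIR R4
# Condition resource. The function name is an assumption for this sketch,
# not part of HealthSage.ai's Note-to-FHIR interface.
import json

def condition_from_extraction(patient_id: str, snomed_code: str,
                              display: str) -> dict:
    """Build a FHIR Condition resource for an extracted diagnosis."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [{
                "system": "http://snomed.info/sct",
                "code": snomed_code,
                "display": display,
            }]
        },
    }

# e.g. a note mentioning "type 2 diabetes" maps to SNOMED CT 44054006
resource = condition_from_extraction("123", "44054006",
                                     "Diabetes mellitus type 2")
print(json.dumps(resource, indent=2))
```

The value of the FHIR format is that a downstream EHR can consume such a resource directly, without anyone re-reading the free-text note.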
GPT-NL
The LLM two-day event featured more initiatives with a collaborative flavor. With GPT-NL, for example, TNO, SURF and NFI are trying to offer an answer to the market power of Big Tech. The dataset of this language model should be ready in April; the model itself will become available early next year. "Whether it can compete with big models remains to be seen, but the work being done now is also important and valuable," it sounded in the room discussion. There were concerns in the room as well, for example about privacy: "There is a lot of experimentation with commercial providers, but if you throw all the data over the fence you run big risks." Still, the tenor was positive, or as one participant aptly put it: "Experimenting? Yes, but wait to implement."
AI is one of the core themes during Zorg & ict 2024. The largest health tech event in the Netherlands will be held from April 9 to 11 at Jaarbeurs in Utrecht.