Category: IT in Healthcare

Much Medical Information Provided by Popular Chatbots is Inaccurate and Incomplete

Half of answers to evidence based questions “somewhat” or “highly” problematic

A substantial amount of medical information provided by 5 popular chatbots is inaccurate and incomplete, with half of the answers to clear evidence based questions “somewhat” or “highly” problematic, show the results of a study published in the open access journal BMJ Open.

Continued deployment of these chatbots without public education and oversight risks amplifying misinformation, warn the researchers.

Generative AI chatbots have been rapidly adopted across research, education, business, marketing and medicine, with many people using them like search engines, including for everyday health and medical queries, explain the researchers.

To gauge the level of accuracy provided in areas of health and medicine already prone to misinformation, and therefore with consequences for everyday health behaviour, the researchers probed 5 publicly available and popular generative AI chatbots in February 2025: Gemini (Google); DeepSeek (High-Flyer); Meta AI (Meta); ChatGPT (OpenAI); and Grok (xAI).

Each chatbot was prompted with 10 open-ended and closed questions in each of 5 categories: cancer, vaccines, stem cells, nutrition, and athletic performance. The prompts were designed to resemble common ‘information-seeking’ health and medical queries and misinformation tropes found online and in academic discourse.

And they were developed to ‘strain’ models towards misinformation or contraindicated advice—a strategy increasingly used for stress testing AI chatbots and picking up behavioural vulnerabilities, note the researchers.

Closed prompts required chatbots to provide pre-defined responses, often with one correct answer, that aligned with the scientific consensus. Open ended prompts typically required chatbots to generate multiple responses in list form.

Responses were categorised as non-, somewhat, or highly problematic, using objective pre-defined criteria. A problematic response was defined as one that could plausibly lead lay users to pursue potentially ineffective treatment, or to come to harm, if followed without professional guidance.

The information was scored for accuracy and completeness, and particular attention was given to whether a chatbot presented a false balance between science-based and non-science-based claims, regardless of the strength of the evidence.

Each response was also graded on readability, from easy, plain English to difficult, academic language, using the Flesch Reading Ease score.
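As an illustration of how such readability grading works, the sketch below computes a Flesch Reading Ease score for a sample answer. The study does not state which tooling it used; the `textstat` package and the interpretation bands here are assumptions made purely for illustration.

```python
# A minimal sketch of readability grading with the Flesch Reading Ease formula,
# assuming the `textstat` package (the study's actual tooling is not specified).
import textstat

response = (
    "Checkpoint inhibition modulates T-cell exhaustion phenotypes, "
    "and its efficacy is contingent on tumour mutational burden."
)

# Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
score = textstat.flesch_reading_ease(response)

# Conventional bands (approximate): below 30 reads at college-graduate level
# ("difficult"); 60-70 corresponds to plain English.
if score < 30:
    band = "difficult (college graduate)"
elif score < 60:
    band = "fairly difficult"
else:
    band = "plain English or easier"

print(f"Flesch Reading Ease: {score:.1f} -> {band}")
```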

Half (50%) the responses were problematic: 30% were somewhat, and 20% were highly problematic. 

Prompt type was influential: open-ended prompts, for example, produced 40 highly problematic responses (significantly more than expected) and 51 non-problematic responses (significantly fewer than expected). The opposite was true of closed prompts.
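The "significantly more than expected" framing implies a comparison of observed counts against those expected if prompt type and response category were independent. The sketch below shows one standard way to run such a check with a chi-square test; apart from the 40 and 51 cited above, the counts are placeholders rather than the study's figures.

```python
# A minimal sketch of an observed-vs-expected comparison across prompt type and
# response category using a chi-square test of independence. Counts other than
# the 40 and 51 mentioned in the article are illustrative placeholders.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: open-ended, closed prompts; columns: non-, somewhat, highly problematic.
observed = np.array([
    [51, 34, 40],   # open-ended (placeholder counts)
    [74, 41, 10],   # closed (placeholder counts)
])

chi2, p, dof, expected = chi2_contingency(observed)
residuals = (observed - expected) / np.sqrt(expected)  # standardised residuals

print(f"chi2={chi2:.2f}, p={p:.4f}")
print("Cells with |residual| > 2 deviate notably from expectation:")
print(np.round(residuals, 2))
```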

While the quality of responses didn’t differ significantly among the 5 chatbots, Grok generated significantly more highly problematic responses than would be expected (29/50; 58%). Gemini generated the fewest highly problematic responses and the most non-problematic ones.

The chatbots performed best in the areas of vaccines and cancer, and worst in the areas of stem cells, athletic performance, and nutrition.

Answers were consistently expressed with confidence and certainty, with few caveats or disclaimers. Out of the total 250 questions, there were only two refusals to answer, both of which came from Meta AI in response to queries about anabolic steroids and alternative cancer treatments.

Reference quality was poor, with an average completeness score of 40%. Chatbot hallucinations and fabricated citations meant that no chatbot provided a fully accurate reference list. 

All readability scores were graded as ‘difficult’, equivalent in complexity to text suitable for a college graduate.

The researchers acknowledge that they only assessed 5 chatbots and that commercial AI is rapidly evolving, so their findings might not be universally applicable. And because not all real-world queries are deliberately adversarial, the adversarial approach they took may have overstated the prevalence of problematic content.

Nevertheless, “Our findings regarding scientific accuracy, reference quality, and response readability highlight important behavioural limitations and the need to re-evaluate how AI chatbots are deployed in public-facing health and medical communication,” they point out. 

“By default, chatbots do not access real-time data but instead generate outputs by inferring statistical patterns from their training data and predicting likely word sequences. They do not reason or weigh evidence, nor are they able to make ethical or value-based judgments,” they explain.

“This behavioural limitation means that chatbots can reproduce authoritative-sounding but potentially flawed responses.”

The data chatbots draw on also includes Q&A forums and social media, and scientific content is typically limited to open access or publicly available articles, which comprise only 30–50% of published studies. While this enhances conversational fluency, it may come at the cost of scientific accuracy, advise the researchers.

“As the use of AI chatbots continues to expand, our data highlight a need for public education, professional training, and regulatory oversight to ensure that generative AI supports, rather than erodes, public health,” they conclude.

Source: BMJ Group

Deepfake X-Rays Fool Radiologists and AI

Findings raise concerns about cybersecurity and diagnostic trust

Anatomy-matched real and GPT-4o-generated radiographs: (A) real and (B) GPT-4o-generated posteroanterior chest radiographs, (C) real and (D) GPT-4o-generated lateral cervical spine radiographs, (E) real and (F) GPT-4o-generated posteroanterior hand radiographs, and (G) real and (H) GPT-4o-generated lateral lumbar spine radiographs. The pairs demonstrate that GPT-4o can produce radiographically plausible images across different anatomic regions.
https://doi.org/10.1148/radiol.252094 ©RSNA 2026

Neither radiologists nor multimodal large language models (LLMs) are able to easily distinguish AI-generated “deepfake” X-ray images from authentic ones, according to a study published in Radiology. The findings highlight the potential risks associated with AI-generated X-ray images, along with the need for tools and training to protect the integrity of medical images and prepare health care professionals to detect deepfakes.

The term “deepfake” refers to a video, photo, image or audio recording that appears real but has been created or manipulated using AI.

“Our study demonstrates that these deepfake X-rays are realistic enough to deceive radiologists, the most highly trained medical image specialists, even when they were aware that AI-generated images were present,” said lead study author Mickael Tordjman, MD, post-doctoral fellow, Icahn School of Medicine at Mount Sinai, New York. “This creates a high-stakes vulnerability for fraudulent litigation if, for example, a fabricated fracture could be indistinguishable from a real one. There is also a significant cybersecurity risk if hackers were to gain access to a hospital’s network and inject synthetic images to manipulate patient diagnoses or cause widespread clinical chaos by undermining the fundamental reliability of the digital medical record.”

Seventeen radiologists from 12 different centers in six countries (United States, France, Germany, Turkey, United Kingdom and United Arab Emirates) participated in the retrospective study. Their professional experience ranged from 0 to 40 years. Half of the 264 X-ray images in the study were authentic, and the other half were generated by AI. Radiologists were evaluated on two distinct image sets, with no overlap between the datasets. The first dataset included real and ChatGPT-generated images of multiple anatomical regions. The second dataset included chest X-ray images—half authentic and the other half created by RoentGen, an open-source generative AI diffusion model developed by Stanford Medicine researchers.

When radiologist readers were unaware of the study’s true purpose, yet asked after ranking the technical quality of each ChatGPT image if they noticed anything unusual, only 41% spontaneously identified AI-generated images. After being informed that the dataset contained synthetic images, the radiologists’ mean accuracy in differentiating the real and synthetic X-rays was 75%.

Individual radiologist performance in accurately detecting the ChatGPT-generated images ranged from 58% to 92%. Similarly, the accuracy of four multimodal LLMs—GPT-4o (OpenAI), GPT-5 (OpenAI), Gemini 2.5 Pro (Google), and Llama 4 Maverick (Meta)—ranged from 57% to 85%. Even ChatGPT-4o, the model used to create the deepfakes, was unable to accurately detect all of them, though it identified the most by a considerable margin compared to Google and Meta LLMs.

Radiologist accuracy in detecting the RoentGen synthetic chest X-rays ranged from 62% to 78%, and the LLM models’ performance ranged from 52% to 89%.

There was no correlation between a radiologist’s years of experience and their accuracy in detecting synthetic X-ray images. However, musculoskeletal radiologists demonstrated significantly higher accuracy than other radiology subspecialists.

Spotting the Risks in Synthetic Imaging

“Deepfake medical images often look too perfect,” Dr. Tordjman said. “Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent, often limited to one side of the bone.”

Recommended solutions to distinguish real from fake images and help prevent tampering include advanced digital safeguards, such as invisible watermarks that embed ownership or identity data directly into the images, and technologist-linked cryptographic signatures attached automatically when the images are captured.
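A minimal sketch of the signature idea is below, using an Ed25519 key pair from Python's `cryptography` package. It illustrates the concept only; the key provisioning step, file name and verification flow are hypothetical, and a real deployment would integrate with DICOM metadata and key-management infrastructure.

```python
# A minimal sketch of attaching a technologist-linked cryptographic signature to
# image bytes at capture time (Ed25519). Concept illustration only.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

# Key pair issued to an individual technologist (hypothetical provisioning step).
technologist_key = ed25519.Ed25519PrivateKey.generate()
public_key = technologist_key.public_key()

image_bytes = open("chest_pa.dcm", "rb").read()   # hypothetical acquired image
signature = technologist_key.sign(image_bytes)    # created at acquisition time

# Later, anyone holding the public key can verify the image is unaltered.
try:
    public_key.verify(signature, image_bytes)
    print("Image verified: matches the technologist's signature.")
except InvalidSignature:
    print("Verification failed: the image may have been altered or replaced.")
```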

“We are potentially only seeing the tip of the iceberg,” Dr. Tordjman said. “The logical next step in this evolution is AI-generation of synthetic 3D images, such as CT and MRI. Establishing educational datasets and detection tools now is critical.”

The study’s authors have published a curated deepfake dataset with interactive quizzes for educational purposes.

For More Information

Access the Radiology study, “The Rise of Deepfake Medical Imaging: Radiologists’ Diagnostic Accuracy in Detecting ChatGPT-generated Radiographs,” and the related editorial, “The Democratization of Deceit: Seeing Is No Longer Believing.”

Source: Radiological Society of North America

The Next Leap for AI Scribes Provides Eyes in the Clinic

Vision-enabled artificial intelligence (AI) medical scribes could increase the accuracy of patient notes and save valuable time for clinicians

The introduction of vision-enabled artificial intelligence (AI) to medical scribes – the recording devices used by doctors to document meetings with patients in real-time – could increase the accuracy of patient notes and save valuable time for clinicians.

A Flinders University study, published in npj Digital Medicine, has found that AI medical scribes already reduce some of the administrative work that takes time away from patients, but that these devices have the capacity to do more when fitted with visual recording apparatus.

Researchers from Flinders’ College of Medicine and Public Health found that a vision-enabled AI scribe, employing a combination of Google’s Gemini model and Ray-Ban Meta smart glasses, substantially improved the documentation accuracy of pharmacist-patient consultations and reduced omissions and errors in clinical notes.

“AI scribes are already helping clinicians by listening to consultations, but healthcare involves far more than spoken words,” says research author Bradley Menz, an academic pharmacist in Flinders’ College of Medicine and Public Health.

“A lot of clinically important information is visual. Important visual cues during consultations include patients’ medicine containers, prescriptions and devices, as well as their body language. When an AI system can use both what it hears and what it sees in these consultations, it captures more of the details that matter for patient care.”

In the study, 10 clinical pharmacists recorded 110 ‘mock’ medication-history interviews, which contained more than 100 different medicine containers, including tablets, capsules, injections and creams.

Researchers wore Meta AI Ray-Ban glasses to record the interview before passing the video footage through to the AI scribe, which was developed using Google’s Gemini AI model.
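A minimal sketch of this kind of video-to-note pipeline is shown below. The study's actual prompts, model version and integration are not described here; the `google-generativeai` upload flow, the model name and the prompt wording are assumptions made for illustration.

```python
# A minimal sketch of passing consultation video through a Gemini model to draft
# a medication-history note. Model name, prompt and file names are assumptions.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Upload the smart-glasses recording: the video carries both the audio and the
# visual cues, e.g. medicine containers shown to the camera.
video = genai.upload_file(path="consultation_001.mp4")
while video.state.name == "PROCESSING":      # wait for server-side processing
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model name
prompt = (
    "Draft a structured medication history from this consultation. "
    "For each medicine, record name, strength, form and dosing, and note "
    "any packaging shown on camera that supports the entry."
)
response = model.generate_content([video, prompt])
print(response.text)
```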

The AI scribe achieved 98% accuracy when it analysed both video and audio, compared with 81% when the same system processed only audio information.

A significant benefit was capturing medication strength and form, which are crucial details for safe dosing. The AI scribe with video input captured this information 97% of the time, while audio-only recordings fell to 28%.

“This is an augmented tool, not a replacement for clinical judgement,” says Mr Menz. “The clinician still needs to review and sign off the document.

“The AI scribe can contain a verification step, take screenshots of medication packages, and generate a full spoken transcript, giving the health professional a much stronger basis for checking what the AI has produced.”

Senior author, Associate Professor Ashley Hopkins, says the study may point to the next stage of AI scribe usage in health care.

“AI scribes have gained traction because they reduce the burden of documentation and give clinicians more time with their patients. These findings suggest that the next step – when the scribe can see as well as hear – produces a more accurate and complete draft,” says Associate Professor Hopkins. “This means less time editing AI-documentation and even more time focusing on patient care.

“These findings suggest the next step may be that all scribe systems can interpret visual information as well as speech, which could open the door to wider clinical uses.”

The authors say the study has some limitations and underlines the need for human oversight and careful governance before these tools are adopted more broadly. The paper also highlights privacy, consent, data security and workflow integration as important issues that will need to be addressed as vision-enabled AI scribes move closer to practice.

Source: Flinders University

Healthcare Under Attack: Why Cybersecurity is now Critical Care

Photo by Nahel Abdul on Unsplash

By Kerissa Varma, Microsoft Chief Security Advisor, Africa

Africa’s healthcare sector is facing a silent emergency. Many healthcare operators, facilities and doctors across Africa already grapple with the challenges of under-resourced environments, an uneven distribution of resources and massive demand for services. Now, healthcare administrators must turn their attention to a relatively new and extremely urgent concern. While doctors fight to save lives, cybercriminals are infiltrating hospitals, laboratories, and clinics, turning life-saving environments into digital battlegrounds.

A growing epidemic

World Health Organization director-general Tedros Adhanom Ghebreyesus noted that the digital transformation of healthcare, combined with the high value of health data, has made the sector a prime target for cybercriminals, commenting that “At best, these attacks cause disruption and financial loss. At worst, they undermine trust in the health systems on which people depend, and even cause patient harm and death.”

Recent attacks have exposed the fragility of Africa’s medical infrastructure. In May 2025, Mediclinic Southern Africa was hit by a cyber extortion attack, compromising sensitive HR data. Later in 2025, Lancet Laboratories faced a regulatory penalty for failing to notify patients about data breaches under South Africa’s POPIA law, while a ransomware strike on the National Health Laboratory Service disrupted blood test processing nationwide, delaying critical care for millions.

M-Tiba, a Kenyan digital health platform managed by CarePay and backed by Safaricom, suffered a significant cyberattack and data breach in late 2025, while earlier this year Pharmacie.ma, a Moroccan pharmaceutical platform, was reportedly the target of a data leak incident that allegedly involved the unauthorised export of a customer database. And recent research indicates that Nigeria’s private healthcare sector is now one of the most targeted on the African continent, with attacks increasing at an alarming rate.

Many incidents also go unreported, as hospitals and healthcare facilities rarely disclose them publicly, yet these incidents are not isolated, with ransomware dominating the threat landscape. Africa’s healthcare sector is heavily targeted by cybercriminals: healthcare organisations faced an average of 3,575 weekly attacks in 2025, a 38% surge from the previous year. Cited impacts include encryption of patient data, temporary loss of access to hospital systems and the risk of data appearing on the dark web.

Why healthcare is a prime target

The healthcare industry in Africa, particularly in the public sector, is working with legacy systems, fragmented infrastructure, and underfunded IT teams, all of which combine to make the sector an easy target for unscrupulous bad actors.

Many medical institutions are adopting open-source AI tools for diagnostics and patient management. While cost-effective, these platforms often lack enterprise-grade security, leaving sensitive data exposed. Combined with fragmented storage of paper and electronic patient records – often unencrypted and scattered across multiple systems – the risk of breaches multiplies.

Hospitals and healthcare facilities cannot afford downtime. Every minute offline risks lives, making them more likely to pay ransoms in an attempt to regain control of their systems. Cyber insurers indicate that in 2 of 5 cases where a ransom is paid, data and operations still cannot be recovered. Additionally, even in instances where some or all of the seized data is recovered after paying a ransom, attackers often go on to demand further payments.

Medical records are also a premium target for cybercriminals. In the USA, researchers found that patient records, insurance details, and research data fetch premium prices on the dark web – up to 10 times higher than financial data, according to cybersecurity analysts. A single stolen medical record can sell for $260–$310, compared to $30–$50 for a credit card, because unlike credit cards, medical records never expire and medical information cannot be easily changed, making it useful for years. Medical records frequently include personal identifiers, insurance details, and sometimes biometric data, enabling identity theft and fraud, while criminals use medical data for fake insurance claims, prescription fraud, and targeted scams. Microsoft believes cybersecurity needs to be embedded into every technology implementation. This should be a key priority, especially with sensitive medical data and operations.

How healthcare can use modern technology safely

As Africa’s healthcare systems digitise and embrace AI, protecting the digital lifeline must become as critical as protecting the physical one. Several key steps can help secure the systems of healthcare organisations and of facilities such as laboratories and diagnostic services.

Include cybersecurity in your resilience planning

Medical professionals and healthcare facilities often prioritise the resilience of physical capabilities. Power backups, multiple devices should equipment fail, and a standby roster in the event of a practitioner being unavailable are all practices that save lives. Equally, cybersecurity and the safeguarding of online systems need to be built into the overall resilience planning of medical facilities and services.

Investing in cybersecurity technology that can quickly identify and contain attacker activity before it leads to system downtime or data theft can save lives. Having a practiced and maintained response plan in the event of a cyber breach, and ensuring strong data backups, could mean the difference between a total failure of health services and a minor incident. Ensuring incident response plans are aligned with local compliance laws, such as South Africa’s POPIA and Kenya’s and Nigeria’s Data Protection Acts, is critical for healthcare providers to meet both their resilience and compliance objectives.

Prepare for AI-driven attacks that are going to increase attacker speed and success

Threat actors are increasingly exploiting the interconnectedness of modern software ecosystems and operational structures to conduct malicious activity, so regular auditing of third-party integrations, especially those involving AI or cloud services, is critical.

Adversaries are using AI to scale and tailor operations, with AI-driven phishing being 4.5x more effective than traditional phishing. However, in equal measure, AI is transforming cyber defence – it automates response and containment, detects threats faster and more accurately, and identifies detection gaps and adapts to attacker behaviour. Healthcare organisations should invest in AI-driven threat detection for faster response and anomaly detection and must also take steps to secure AI models and data pipelines by implementing robust access controls, vulnerability scanning, and regular patching for open-source tools.
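As one illustration of AI-assisted defence, the sketch below flags anomalous authentication events with an isolation forest. The features, thresholds and data are invented for illustration; a production deployment would draw on far richer telemetry and a managed detection platform.

```python
# A minimal sketch of anomaly detection over authentication events, the kind of
# AI-assisted defence described above. Features and values are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

# Each row: [hour_of_day, failed_logins_last_hour, distinct_ip_count, mb_downloaded]
baseline = np.array([
    [9, 0, 1, 12], [10, 1, 1, 8], [14, 0, 1, 20], [16, 0, 2, 15],
    [11, 0, 1, 5], [13, 1, 1, 9], [15, 0, 1, 18], [9, 0, 1, 7],
])

detector = IsolationForest(contamination=0.05, random_state=0).fit(baseline)

new_events = np.array([
    [10, 0, 1, 10],      # routine clinician login
    [3, 12, 6, 900],     # 3 a.m., many failures, many IPs, bulk download
])
for event, flag in zip(new_events, detector.predict(new_events)):
    label = "ANOMALOUS - investigate" if flag == -1 else "normal"
    print(event, label)
```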

Remote and wider access to patient records requires strong identity practices

As both patients and medical professionals start accessing patient records digitally, strong means of identification, verification and authentication are critical. The Microsoft Digital Defense Report 2025 notes that the abuse of valid accounts is a frequent occurrence, with malicious actors gaining access to user credentials (usernames and passwords) and using them to infiltrate systems without triggering traditional security alerts. Therefore, organisations must deploy phishing-resistant multifactor authentication (MFA) and conditional access to strengthen user defences.
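The sketch below illustrates the idea of conditional access as a policy decision over sign-in signals. The rules, signal names and role labels are hypothetical; in practice such policies are configured in an identity platform's own policy engine rather than written by hand.

```python
# A minimal, hypothetical sketch of a risk-based conditional access decision of
# the kind described above. Policy rules and signal names are invented.
from dataclasses import dataclass

@dataclass
class SignIn:
    user_role: str          # e.g. "clinician", "admin"
    mfa_method: str         # e.g. "passkey", "sms_otp", "none"
    device_compliant: bool
    country: str
    resource: str           # e.g. "ehr", "email"

PHISHING_RESISTANT = {"passkey", "fido2_key", "certificate"}

def decide(signin: SignIn) -> str:
    # Patient records require a phishing-resistant factor and a managed device.
    if signin.resource == "ehr":
        if signin.mfa_method not in PHISHING_RESISTANT or not signin.device_compliant:
            return "block"
    # Unfamiliar geography triggers step-up authentication rather than denial.
    if signin.country not in {"ZA", "KE", "NG"}:
        return "require_step_up"
    return "allow"

print(decide(SignIn("clinician", "sms_otp", True, "ZA", "ehr")))   # block
print(decide(SignIn("clinician", "passkey", True, "KE", "ehr")))   # allow
```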

Invest in people and skills

People are at the heart of robust cybersecurity measures, so it is vital to train staff against common tactics such as phishing, which is the most common entry point for attackers, and apply role-based access controls for both clinical and research data to prevent privilege misuse.

Cybersecurity is no longer an IT issue – it’s a patient safety issue. Healthcare services and providers must treat digital resilience with the same urgency as infection control. By investing in comprehensive cybersecurity strategies and leveraging AI-powered defences, Africa’s healthcare sector can position itself as a crucial front line against emerging threats and help build stronger, more resilient digital ecosystems.

‘Google Earth’ for Human Organs Made Available Online

A new open-access 3D portal that allows users to explore human organs in unprecedented detail, from the whole organ to individual cells, has been launched by an international team led by UCL scientists.

The Human Organ Atlas, described in a new paper in the journal Science Advances, brings together some of the most detailed images of 3D organs ever produced. It enables scientists, doctors, educators, students and the wider public to interactively “fly through” organs such as the brain, heart, lungs, kidney and liver, providing a new way of understanding human anatomy and human diseases.

The resource can be accessed directly through a standard web browser, without specialist software, at this link.

The Atlas is powered by an advanced X-ray imaging method called Hierarchical Phase-Contrast Tomography (HiP-CT), developed at the European Synchrotron (ESRF) in Grenoble, France. HiP-CT uses the ESRF’s Extremely Brilliant Source – a new generation of synchrotron source – which is up to 100 billion times brighter than conventional hospital CT scanners.

This allows researchers to scan entire intact ex vivo human organs (i.e., donated organs) non-destructively and then zoom in to near-cellular resolution (down to less than one micron, about 50 times thinner than a human hair).

The technique bridges a century-old gap in medicine between radiology and histology, and represents a major advance in biomedical imaging.

Professor Peter Lee (UCL Department of Mechanical Engineering), principal investigator of the Human Organ Atlas beamtime, said: “To create the Human Organ Atlas, we brought together scientists and medics from nine institutes worldwide. This grouping is continuing to expand, helping gain new insights into diseases from osteoarthritis to heart disease and changing how we learn about the human body.”

Dr Claire Walsh (UCL Department of Mechanical Engineering), Director of the Human Organ Atlas Hub, said: “The Human Organ Atlas shows what team science can achieve at its best – we went into this project wanting this data to be used by others and to help further the understanding of human physiology. The Human Organ Atlas is an incredible resource that will continue to grow. I am personally hugely excited to see how the AI community use the Human Organ Atlas in AI foundation models.”

From Covid-19 to cardiac and gynaecological disorders

Initially developed during the COVID-19 pandemic, the method has already led to high-impact publications and scientific advances, revealing previously unseen microscopic vascular injury in the lungs of patients who died from COVID-19 and reshaping understanding of cardiac disorders. The technology has also been applied to other organs, providing new insights, for instance, into the way gynaecological disorders develop.

Professor Judith Huirne, based at Amsterdam UMC, said: “The virtual 3D histological data derived from the Human Organ Atlas Hub provides us with valuable insights into the pathogenesis of gynecological disorders. This knowledge is crucial to bridging the current gaps in both understanding and gender disparities.”

This Human Organ Atlas portal is the result of more than five years of collaborative effort between many researchers, engineers, clinicians, and infrastructure specialists, united within the Human Organ Atlas Hub, a consortium involving nine institutes across Europe and the United States.

Since its inception, the team has been committed to open science. Dr Paul Tafforeau, ESRF scientist and pioneer of the imaging technique used to create the Human Organ Atlas, said: “From the beginning, we wanted these data to be accessible to everyone and build an open, shared scientific infrastructure at a global scale. This is a resource for researchers, doctors, educators – but also for anyone curious about how the human body is built.”

A unique tool for AI, medicine and education

To the team’s knowledge, this is the highest-resolution open 3D dataset of intact human organs currently available. The Human Organ Atlas currently provides access to:

  • 62 organs, 319 full 3D datasets from 29 donors
  • 12 organ types, including brain, heart, lung, kidney, liver, colon, eye, spleen, placenta, uterus, prostate and testis
  • Multiscale scans, from whole-organ views down to near-cellular resolution (routinely down to 2 µm, and as fine as 0.65 µm for some organs)

The portal has been designed to extend far beyond specialist research laboratories. Each dataset can reach hundreds of gigabytes or even over a terabyte in size; the largest one (a brain) is 14 TB. To make the data usable worldwide, the portal provides:

  • Interactive browser-based visualisation (no special software required)
  • Downloadable datasets at multiple resolutions
  • Tutorials and software tools for analysis
  • Regular addition of new data

Beyond advancing anatomical and biomedical research, the atlas is expected to become a major resource for artificial intelligence. Large, high-quality 3D datasets are rare – limiting the development of advanced medical AI systems. The Human Organ Atlas provides a curated, hierarchical dataset ideally suited for training machine-learning models for segmentation, disease detection and super-resolution analysis.

At the same time, it offers powerful new opportunities for medical education and public engagement with science, allowing anyone to explore the human body out of curiosity.

Source: University College London

AI Tools for Cancer Rely on Shaky Shortcuts

Small cell lung cancer cells (green and blue) that metastasised to the brain in a laboratory mouse recruit brain cells called astrocytes (red) for their protection. Credit: Fangfei Qu

Artificial intelligence tools are increasingly being developed to predict cancer biology directly from microscope images, promising faster diagnoses and cheaper testing. But new research from the University of Warwick, published in Nature Biomedical Engineering, suggests that many of these systems may be using visual shortcuts rather than true biology – raising concerns that some AI pathology tools are currently too unreliable for real-world patient care.

“It’s a bit like judging a restaurant’s quality by the queue of people waiting to get in: it’s a useful shortcut, but it’s not a direct measure of what’s happening in the kitchen,” says Dr Fayyaz Minhas, Associate Professor and principal investigator of the Predictive Systems in Biomedicine (PRISM) Lab in the Department of Computer Science, University of Warwick, and lead author of the study.

“Many AI pathology models are doing the same thing, relying on correlations between biomarkers or on obvious tissue features, rather than isolating biomarker-specific signals. And when conditions change, these shortcuts often fall apart.”

To reach this conclusion, the researchers analysed more than 8000 patient samples across four major cancer types – breast, colorectal, lung and endometrial – and compared the performance of leading machine learning approaches. While the models often achieved high headline accuracy, the team found this frequently came from statistical “shortcuts.”

For example, instead of detecting mutations in the cancer-associated BRAF gene, a model might learn that BRAF mutations often occur alongside another clinical feature such as microsatellite instability (MSI). The system then learns to use this combination of cues to predict BRAF status rather than learning the causal BRAF signal itself – meaning accurate cancer predictions work only when these biomarkers co-occur and become unreliable when they do not.

Kim Branson, SVP Global Head of Artificial Intelligence and Machine Learning, GSK and co-author says, “We’ve found that predicting a BRAF mutation by looking at correlated features like MSI is often like predicting rain by looking at umbrellas – it works, but it doesn’t mean you understand meteorology.

“Crucially, if a model cannot demonstrate information gain above a simple pathologist-assigned grade, we haven’t advanced the field; we’ve just automated a shortcut. The roadmap for the next generation of pathology AI isn’t necessarily bigger models; it’s stricter evaluation protocols that force algorithms to stop cheating and learn the hard biology.”

When performance of AI models was assessed within stratified patient subgroups, such as only high-grade breast cancers or only MSI-positive tumours, accuracy fell substantially, revealing that the models were dependent on shortcut signals that disappear once confounding factors are controlled.
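The sketch below illustrates this kind of stratified check on synthetic data: a shortcut model that effectively reads BRAF status off MSI status looks accurate overall, but falls to chance once MSI is held fixed. The data and metric choice are illustrative, not the study's.

```python
# A minimal sketch of subgroup evaluation: overall performance versus performance
# within a stratum where the confounder (MSI status) is held fixed. Synthetic data.
import pandas as pd
from sklearn.metrics import balanced_accuracy_score

df = pd.DataFrame({
    #               MSI-positive cases        MSI-negative cases
    "msi_positive": [1, 1, 1, 1, 1, 1, 1, 1,  0, 0, 0, 0, 0, 0, 0, 0],
    "braf_mutant":  [1, 1, 1, 1, 1, 1, 0, 0,  1, 0, 0, 0, 0, 0, 0, 0],
    # A shortcut model that effectively predicts BRAF from MSI status:
    "braf_pred":    [1, 1, 1, 1, 1, 1, 1, 1,  0, 0, 0, 0, 0, 0, 0, 0],
})

overall = balanced_accuracy_score(df["braf_mutant"], df["braf_pred"])
print(f"Overall balanced accuracy: {overall:.2f}")        # ~0.82, looks useful

# Within each MSI stratum the shortcut carries no information, so performance
# collapses to chance (0.50).
for msi, grp in df.groupby("msi_positive"):
    acc = balanced_accuracy_score(grp["braf_mutant"], grp["braf_pred"])
    print(f"MSI={msi}: balanced accuracy within subgroup = {acc:.2f}")
```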

For certain prediction tasks, the performance advantage of deep learning over human-derived clinical information was modest. AI systems achieved accuracy scores of just over 80% when predicting biomarkers, compared with around 75% using tumour grade alone – a measure already assessed by pathologists.

Machine learning methods can still prove valuable for research, drug development candidate screening and for clinical triaging, screening, or supplementary decision support. However, the researchers argue that future AI tools must move beyond correlation-based learning and adopt approaches that explicitly model biological relationships and causal structure.

They also call for stronger evaluation standards, including subgroup testing and comparison against simple clinical baselines, before such tools are considered for deployment in routine care.

Dr Minhas concludes, “This research is not a condemnation of AI in pathology. It is a wake-up call. Current models may perform well in controlled settings but rely on statistical shortcuts rather than genuine biological understanding. Until more robust evaluation standards are in place, these tools should not be seen as replacements for molecular testing, and it is essential that clinicians and researchers understand their limitations and use them with appropriate caution.”

Source: University of Warwick

Robotic Medical Crash Cart Eases Workload for Healthcare Teams

Researcher demonstrating an early prototype of the robotic medical crash cart. Credit: Cornell Tech

Healthcare workers have an intense workload and often experience mental distress during resuscitation and other critical care procedures. Although researchers have studied whether robots can support human teams in other high-stakes, high-risk settings such as disaster response and military operations, the role of robots in emergency medicine has not been explored.

Enter Angelique Taylor, the Andrew H. and Ann R. Tisch Assistant Professor at Cornell Tech and the Cornell Ann S. Bowers College of Computing and Information Science. She is also an assistant professor in emergency medicine at Weill Cornell Medicine and director of the Artificial Intelligence and Robotics Lab (AIRLab) at Cornell Tech.

In a pair of articles published at the Institute of Electrical and Electronics Engineers (IEEE) conference on Robot and Human Interactive Communication (RO-MAN) in August 2025, Taylor and her collaborators at Weill Cornell Medicine, associate professor Kevin Ching and assistant professor Jonathan St. George, described research on their new robotic crash cart (RCC) — a robotic version of the mobile drawer unit that holds supplies and equipment needed for a range of medical procedures.

“Healthcare workers may not know or may forget where all the various supplies are located in the cart drawers, and often they’re kind of shuffling through the cart,” Taylor said. This can cause delays during emergency procedures that require iterative tasks with precise timing, exacerbating medical errors and putting patients at risk, she noted.

To create the RCC, Taylor and her team outfitted a standard cart with LED light strips, a speaker, and a touchscreen tablet integrated with the Robot Operating System. This middleware connects computer programs to robot hardware, enabling them to work together to provide users with verbal and nonverbal cues.

During an emergency procedure, a user can request the location of a supply on the tablet. Then the lights around the drawer with that supply blink, or a spoken instruction plays through the speaker. Users can also receive prompts that remind them about necessary medications and recommend supplies.
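A minimal sketch of how such a tablet request might be routed to LED and speech cues through the Robot Operating System is shown below. The topic names, message types and drawer map are hypothetical illustrations, not the AIRLab team's actual interfaces.

```python
# A minimal sketch of relaying a supply request from the tablet UI to the cart's
# LED strips and speaker via ROS. Topic names and the drawer map are hypothetical.
import rospy
from std_msgs.msg import String

DRAWER_MAP = {"epinephrine": 2, "bag-valve mask": 4}  # hypothetical inventory map

def handle_request(msg: String) -> None:
    item = msg.data.lower()
    drawer = DRAWER_MAP.get(item)
    if drawer is None:
        speech_pub.publish(String(data=f"{item} is not stocked in this cart"))
        return
    # Nonverbal cue: blink the LED strip around the matching drawer.
    led_pub.publish(String(data=f"blink:drawer_{drawer}"))
    # Verbal cue: spoken instruction through the cart's speaker.
    speech_pub.publish(String(data=f"{item} is in drawer {drawer}"))

rospy.init_node("rcc_cue_router")
led_pub = rospy.Publisher("/rcc/led_cue", String, queue_size=1)
speech_pub = rospy.Publisher("/rcc/speech_cue", String, queue_size=1)
rospy.Subscriber("/rcc/tablet_request", String, handle_request)
rospy.spin()
```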

In their article, “Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams,” Taylor’s team conducted pilot studies of the RCC. One pilot involved 84 participants, aged 21 to 79, about half of whom had a clinical background. Working in groups of 3 to 4, they conducted a series of simulated resuscitation procedures with a manikin patient using three different carts: an RCC with blinking lights for object search and spoken task reminders, an RCC with blinking lights for task reminders and spoken language for object search, or a standard cart.

The team found that participants preferred the RCCs, which provided verbal and nonverbal cues, over the standard cart with no cues, rating them lower in terms of workload and higher in usefulness and ease of use.

“These results were exciting and achieved statistical significance, suggesting that the use of a robot is beneficial,” said Taylor. The article, by Taylor, Ph.D. student Tauhid Tanjim, and colleagues at Weill Cornell, was a Kazuo-Tanie Paper Award finalist, an honor given to the top three papers in their category at the conference.

In the second article, “Human-Robot Teaming Field Deployments: A Comparison Between Verbal and Non-verbal Communication,” the research team began testing the RCC under more realistic conditions. Participants were healthcare workers from across the United States, and actors played frantic family members during the simulations.

Similar to the pilot studies, Taylor, along with colleagues at Cornell and Michigan State University, found that the RCC reduced participant workload, depending on whether the robot provided verbal or non-verbal cues. However, they evaluated robots with only one type of cue, not both, and identified room for improvement, particularly in the robot’s visual cues. They are now studying healthcare workers’ impressions of an RCC with multimodal communication.

Taylor hopes that other research teams will start exploring how robots can support healthcare teams in critical care settings. To that end, Taylor and her colleague presented an article at the February 2025 Association for Computing Machinery/IEEE International Conference that offers a toolkit for researchers to build their own RCC.

By Carina Storrs, freelance writer for Cornell Tech.

Source: Cornell Tech

Half of All Men Over 60 Have Prostate Cancer – an AI Tool Could Speed Diagnosis

Photo by National Cancer Institute on Unsplash

Increasing use of blood tests to detect prostate cancer is leading to overworked doctors. NTNU has now created an AI diagnostic tool that can help lighten the burden.

Diagnostic tools based on artificial intelligence are now making their way into Norwegian hospitals. AI can independently read X-ray images and detect bone fractures, or assess cancer tumours in both the breast and prostate.

“AI tools can take over the detection of simple and clear-cut cases, allowing doctors to spend their time on more complex ones,” said Tone Frost Bathen. She is a professor at NTNU and the project manager of an AI-powered analysis tool for prostate cancer called PROVIZ.

Tests on patients at St Olavs Hospital indicate that the tool is very promising.

“AI can enable radiologists to determine more quickly and more accurately whether a patient needs a biopsy, and where in the prostate it should be taken from,” explained Bathen.

“The PROVIZ project started as early as 2018. It takes a long time to develop diagnostic tools in medicine because safety standards must be high. The application alone to be allowed to test the tool on patients was 500 pages. It is important to create a tool that clearly shows how the result was reached, and that fits into a busy hospital workday,” says Tone Frost Bathen, Professor at NTNU. Photo: Anne Sliper Midling / NTNU

A recent study shows that patients trust medical test results only if an experienced doctor confirms what has been detected.

“Trust in doctors and health professionals is key for artificial intelligence to gain a place in the diagnosis of prostate cancer. Technology alone is not enough. Human contact and professional assessment remain indispensable,” said Simon A. Berger, a PhD research fellow at NTNU.

Prostate cancer is a natural part of getting older

Prostate cancer is the most common form of cancer among men in Western countries.

Examinations have detected prostate cancer in 10% of 50-year-olds, 50% of 60-year-olds and approximately 70% of men over the age of 80.

This shows that the disease is naturally linked to ageing.

“Prostate cancer is something most men die with, not from,” added Berger.

A blood test called PSA can help detect prostate cancer. Since it has become more common for men to take this blood test, the number of new prostate cancer cases has risen sharply. There are now approximately 5000 new cases each year.

When more people are tested for something that many individuals naturally have as part of the ageing process, the next medical step after the blood test must also be carried out more often, so that doctors can obtain a broader clinical picture of its severity.

Most trust in doctors

Currently, this next step involves taking an MRI scan, which provides a detailed image of the prostate gland and the surrounding tissue. These images need to be interpreted manually by an experienced radiologist. As the number of images taken has increased sharply, this has created a need for new and more efficient ways of making diagnoses.

Through the PROVIZ project, NTNU researchers have developed an AI-powered tool that can help doctors interpret MRI images of the prostate. PROVIZ is currently available only for use as part of the ongoing research project, but efforts are underway to apply for a patent and make the tool commercially available.

High international competition for commercial AI tools

Several research groups around the world are now working on developing AI-based diagnostic tools for prostate cancer.

PROVIZ has completed its first clinical testing in collaboration with St. Olavs Hospital, and the results were good. The next step is a much larger clinical trial, as well as a regulatory approval process.

“Right now, we are seeking approximately 20 million NOK to finance this phase. Once funding is in place, the tool could be on the market in the US within a year, and in Europe in just over a year,” says Gabriel Addio Nketiah, a researcher at NTNU and responsible for the commercialisation of PROVIZ.

For a tool like this to be efficiency-enhancing in routine hospital practice, patients must also trust the findings detected through the use of AI.

“Patients have high expectations that AI can be used for faster diagnostics and to reduce healthcare waiting lists. Many see AI as a kind of safety valve – an additional resource that doctors can use alongside their professional judgment,” says Simon A. Berger, a PhD research fellow at NTNU.

Berger interviewed 18 men who had been diagnosed with prostate cancer through the use of PROVIZ. The study shows that trust in doctors and health professionals plays a decisive role in whether patients accept AI in the health services.

“Patients trust AI in lower-risk cases such as bone fractures, but not in cases where the perceived risk is higher, such as cancer. When the perceived risk is high, we place the greatest trust in specialized doctors who can confirm what AI has found,” explained Berger.

Doctors as guarantors

In his interviews, Berger identified three different dimensions of trust.

  1. Foundational trust in the healthcare system: many patients had positive experiences from previous encounters with the healthcare system. This laid a positive foundation.
  2. Inter-personal trust in health professionals: patients trusted the doctors and their assessments. This trust was crucial for accepting AI because the doctors explained and vouched for the technology.
  3. Possible trust in AI: even though patients recognized the potential of AI, they always wanted a human assessment as well in prostate cancer diagnostics. They were concerned about accountability, professional judgement and AI’s (in)ability to see the whole clinical picture.

“The relationship between patient and doctor is still key. For AI to be accepted in clinical practice, health professionals must be active communicators and guarantors of safety. In order for doctors to serve as guarantors, they must first understand how AI arrived at its conclusions so they can verify that it has made the correct assessment. Patients accept the use of AI within a framework they already trust,” concluded Berger.

NTNU owns an MRI scanner at St. Olavs Hospital that is currently undergoing a major upgrade. It helps researchers obtain the best possible images to be used in, among other things, PROVIZ. “Unfortunately, there are few investors in medical technology right now, but we hope that someone sees the societal value of our project,” says Professor Tone Frost Bathen at NTNU. Photo: Anne Sliper Midling / NTNU

By Anne Sliper Midling

Source:

Berger SA, Håland E, Solbjør M. Patient Perspectives on Trust in Artificial Intelligence-Powered Tools in Prostate Cancer Diagnostics. Qualitative Health Research. 2025;0(0). doi:10.1177/10497323251387545

Source: Norwegian Tech News

Can Medical AI Lie? How LLMs Handle Health Misinformation

Photo by Sanket Mishra

Medical artificial intelligence (AI) is often described as a way to make patient care safer by helping clinicians manage information. A new study by the Icahn School of Medicine at Mount Sinai and collaborators confronts a critical vulnerability: when a medical lie enters the system, can AI pass it on as if it were true?  

Analysing more than a million prompts across nine leading language models, the researchers found that these systems can repeat false medical claims when they appear in realistic hospital notes or social-media health discussions. 

The findings, published in the February 9 online issue of The Lancet Digital Health, suggest that current safeguards do not reliably distinguish fact from fabrication once a claim is wrapped in familiar clinical or social-media language.

To test this systematically, the team exposed the models to three types of content: real hospital discharge summaries from the Medical Information Mart for Intensive Care (MIMIC) database with a single fabricated recommendation added; common health myths collected from Reddit; and 300 short clinical scenarios written and validated by physicians. Each case was presented in multiple versions, from neutral wording to emotionally charged or leading phrasing similar to what circulates on social platforms. 

In one example, a discharge note falsely advised patients with oesophagitis-related bleeding to “drink cold milk to soothe the symptoms.” Several models accepted the statement rather than flagging it as unsafe. They treated it like ordinary medical guidance. 
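A minimal sketch of this kind of injection-and-rephrasing stress test appears below. The `query_model` function is a placeholder for whichever LLM is under test, and the check for whether the false advice was "passed on" is deliberately simplified; none of it reproduces the study's actual pipeline.

```python
# A minimal sketch of an injection-and-rephrasing stress test: a fabricated
# recommendation is embedded in a discharge-style note, reworded in several
# tones, and the model's reply is checked for a correction. Illustrative only.
FALSE_CLAIM = "Drink cold milk to soothe oesophagitis-related bleeding."

TEMPLATE = (
    "Discharge summary: 62-year-old admitted with oesophagitis-related bleeding, "
    "now stable. Recommendation: {claim} Please summarise the aftercare advice "
    "for the patient."
)
PHRASINGS = {
    "neutral":  FALSE_CLAIM,
    "emphatic": "It is essential that you " + FALSE_CLAIM.lower(),
    "leading":  "As everyone knows, the best remedy is to " + FALSE_CLAIM.lower(),
}

def query_model(prompt: str) -> str:
    # Placeholder: replace with a call to the LLM under test.
    return "Aftercare: rest, a bland diet, and drink cold milk to soothe the bleeding."

def passed_on(reply: str) -> bool:
    # Simplified check: did the reply repeat the advice without cautioning against it?
    mentions = "cold milk" in reply.lower()
    corrects = any(w in reply.lower() for w in ("not recommended", "avoid", "no evidence"))
    return mentions and not corrects

results = {label: passed_on(query_model(TEMPLATE.format(claim=claim)))
           for label, claim in PHRASINGS.items()}
print(results)  # which phrasings let the false claim slip through
```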

“Our findings show that current AI systems can treat confident medical language as true by default, even when it’s clearly wrong,” says co-senior and co-corresponding author Eyal Klang, MD, Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai. “A fabricated recommendation in a discharge note can slip through. It can be repeated as if it were standard care. For these models, what matters is less whether a claim is correct than how it is written.”  

The authors say the next step is to treat “can this system pass on a lie?” as a measurable property, using large-scale stress tests and external evidence checks before AI is built into clinical tools. 

“Hospitals and developers can use our dataset as a stress test for medical AI,” says physician-scientist and first author Mahmud Omar, MD, who consults with the research team. “Instead of assuming a model is safe, you can measure how often it passes on a lie, and whether that number falls in the next generation.”  

“AI has the potential to be a real help for clinicians and patients, offering faster insights and support,” says co-senior and co-corresponding author Girish N. Nadkarni, MD, MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer of the Mount Sinai Health System. “But it needs built-in safeguards that check medical claims before they are presented as fact. Our study shows where these systems can still pass on false information, and points to ways we can strengthen them before they are embedded in care.” 

The paper is titled “Mapping LLM Susceptibility to Medical Misinformation Across Clinical Notes and Social Media.”  

Source: Mount Sinai

AI Treatment Advice Diverges from Physicians’ in Late-Stage HCC

LLMs tended to prioritise tumour-related factors, whereas physicians prioritised liver function, when providing treatment recommendations

Photo by National Cancer Institute on Unsplash

Large language models (LLMs) can generate treatment recommendations for straightforward cases of hepatocellular carcinoma (HCC) that align with clinical guidelines but fall short in more complex cases, according to a new study by Ji Won Han from The Catholic University of Korea and colleagues published January 13th in the open-access journal PLOS Medicine.

Choosing the most appropriate treatment for patients with liver cancer is complicated. While international treatment guidelines provide recommendations, clinicians must tailor their treatment choice based on cancer stage and liver function as well as other factors such as comorbidities.

To assess whether LLMs can provide treatment recommendations for hepatocellular carcinoma (HCC) that reflect real-world clinical practice, researchers compared suggestions generated by three LLMs (ChatGPT, Gemini, and Claude) with actual treatments received by more than 13,000 newly diagnosed patients with HCC in South Korea.

They found that, in patients with early-stage HCC, higher agreement between LLM recommendations and actual treatments was associated with improved survival. The inverse was seen in patients with advanced-stage disease: higher agreement between LLM treatment recommendations and actual practice was associated with worse survival. LLMs placed greater emphasis on tumour factors, such as tumour size and number of tumours, while physicians prioritised liver function.
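The comparison at the heart of the study, agreement between an LLM's recommendation and the treatment actually given, summarised by stage, can be sketched as below. The records are synthetic and the study's survival modelling is not reproduced here.

```python
# A minimal sketch of the agreement analysis described above: LLM recommendations
# are compared with the treatment actually received, summarised by stage, with a
# crude look at survival by agreement group. Records are synthetic.
import pandas as pd

df = pd.DataFrame({
    "stage":           ["early", "early", "early", "advanced", "advanced", "advanced"],
    "llm_rec":         ["resection", "ablation", "resection", "TACE", "systemic", "systemic"],
    "actual_tx":       ["resection", "ablation", "TACE",      "TACE", "TACE",     "systemic"],
    "survival_months": [60, 48, 30, 14, 20, 9],
})

df["agreement"] = df["llm_rec"] == df["actual_tx"]

# Agreement rate by stage, and mean survival by agreement within each stage.
print(df.groupby("stage")["agreement"].mean())
print(df.groupby(["stage", "agreement"])["survival_months"].mean())
```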

Overall, the findings suggest that LLMs may help to support straightforward treatment decisions, particularly in early-stage disease, but are not presently suitable for guiding care decisions for more complex cases that require nuanced clinical judgment. Regardless of stage, LLM advice should be used with caution and considered as a supplement to clinical expertise.

The authors add, “Our study shows that large language models can help support treatment decisions for early-stage liver cancer, but their performance is more limited in advanced disease. This highlights the importance of using LLMs as a complement to, rather than a replacement for, clinical expertise.”

Provided by PLOS