As I’ve written about in this newsletter many times, AI is sweeping the healthcare industry—from drug discovery to AI-enhanced mammograms to transcription of clinical documentation.
Long before the generative AI boom brought hallucinations and other risks to the forefront, there was already widespread evidence of bias in AI algorithms, which are often less accurate for some groups, such as women and people of color. Now, as AI companies and healthcare providers increasingly integrate AI into patient care, ways to evaluate and address such biases are needed more than ever.
Yesterday, an international initiative called “STANDING Together (STANdards for data Diversity, INclusivity and Generalizability)” released recommendations to address bias in medical AI technologies, hoping to “drive further progress towards AI health technologies that are not just safe on average, but safe for all.” Published in The Lancet Digital Health and NEJM AI—along with a commentary by the initiative’s patient representatives published in Nature Medicine—the recommendations follow a research study involving more than 30 institutions and 350 experts from 58 countries.
The recommendations largely deal with transparency, training data, and how AI medical technologies should be tested for bias, targeting both those who curate datasets and those who use the datasets to create AI systems.
The problem
Before getting to recommendations, let’s review the problem.
Overall, algorithms created to detect illness and injury tend to underperform on underrepresented groups like women and people of color. For example, technologies that use algorithms to detect skin cancer have been found to be less accurate for people with darker skin, while a liver disease detection algorithm was found to underperform for women. One bombshell study revealed that a clinical algorithm used widely by hospitals required Black patients to be much sicker before it recommended they receive the same care it recommended for white patients who were not as ill. Similar biases have been uncovered in algorithms used to determine resource allocation, such as how much assistance people with disabilities receive. These are just a handful of many examples.
The cause of these problems is most often found in the data used to train AI algorithms. This data is itself often incomplete or distorted—women and people of color are historically underrepresented in medical studies. In other cases, algorithms fail because they are trained on data that is meant to be a proxy for some other piece of information, but which turns out not to appropriately capture the issue the AI system is supposed to address. The hospital algorithm that denied Black patients the same level of care as white patients failed because it used healthcare costs as a proxy for patient care during training. And it turns out that hospital systems have historically spent less on healthcare for Black patients at every level of care, which meant that the AI failed to accurately predict Black patients’ needs.
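To make the proxy failure concrete, here’s a minimal, purely illustrative sketch in Python—the numbers are invented and not from the study. A model that ranks patients by predicted cost can deprioritize a patient whose clinical need is identical to another’s, simply because their group has historically received less spending.

```python
# Toy illustration of the proxy problem described above (all numbers invented).
# A ranking built on healthcare *cost* diverges from one built on clinical
# *need* when one group historically received less spending at the same level
# of illness.

patients = [
    # (patient_id, group, true_need_score, historical_annual_cost_usd)
    ("A", "white", 7.0, 12_000),
    ("B", "Black", 7.0, 7_500),   # same clinical need as A, lower historical spending
    ("C", "white", 4.0, 9_000),
]

# Cost-proxy "model": prioritize patients with the highest historical cost.
by_cost = sorted(patients, key=lambda p: p[3], reverse=True)
# Need-based ranking: prioritize patients with the highest clinical need.
by_need = sorted(patients, key=lambda p: p[2], reverse=True)

print("Priority by cost proxy:", [p[0] for p in by_cost])  # ['A', 'C', 'B']
print("Priority by true need:", [p[0] for p in by_need])   # ['A', 'B', 'C']
```

In this toy setup, patient B slips below a less-sick patient the moment cost stands in for need—the same pattern the bombshell study documented at scale.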
Suggested solutions
The collective behind the study issued 29 recommendations — 18 aimed at dataset curators and 11 aimed at data users.
For the dataset curators, the paper recommends that dataset documentation include a plain-language summary of the dataset, indicate which groups are present in the data, address any missing data, identify known or expected sources of bias or error, make clear who created and funded the dataset, and detail any purposes for which the dataset should not be used, among other details that would increase transparency and provide context.
For data users, the recommendations state that they should identify and transparently report areas of under-representation, evaluate performance for contextualised groups, acknowledge known biases and limitations (and their implications), and manage uncertainties and risks throughout the lifecycle of AI health technologies, including documentation at every step.
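As a rough sketch of what “evaluate performance for contextualised groups” can look like in practice—using hypothetical data and a threshold I’ve chosen for illustration, not taken from the paper—a developer might report a metric such as sensitivity per demographic group rather than a single aggregate figure, and flag large gaps for investigation:

```python
# Minimal sketch (hypothetical records and threshold) of subgroup evaluation:
# report performance per group instead of one aggregate number, and flag gaps.
from collections import defaultdict

# (group, true_label, predicted_label) -- toy records, not real patients
records = [
    ("men",   1, 1), ("men",   1, 1), ("men",   0, 0), ("men",   1, 1),
    ("women", 1, 0), ("women", 1, 1), ("women", 0, 0), ("women", 1, 0),
]

tp = defaultdict(int)  # true positives per group
fn = defaultdict(int)  # false negatives per group
for group, y_true, y_pred in records:
    if y_true == 1:
        if y_pred == 1:
            tp[group] += 1
        else:
            fn[group] += 1

sensitivity = {g: tp[g] / (tp[g] + fn[g]) for g in tp.keys() | fn.keys()}
for group, sens in sorted(sensitivity.items()):
    print(f"{group}: sensitivity = {sens:.2f}")

gap = max(sensitivity.values()) - min(sensitivity.values())
if gap > 0.1:  # illustrative threshold, not from the recommendations
    print(f"Warning: sensitivity gap of {gap:.2f} across groups -- document and investigate.")
```

The point isn’t the particular metric or cutoff; it’s that subgroup results are computed, documented, and carried through the technology’s lifecycle rather than averaged away.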
Among the overall themes are a call to proactively inquire and be transparent, and the need to be sensitive to context and complexity. “If bias encoding cannot be avoided at the algorithm stage, its identification enables a range of stakeholders relevant to the AI health technology’s use (developers, regulators, health policy makers, and end users) to acknowledge and mitigate the translation of bias into harm,” the paper reads.
Will guidelines translate into action?
As with every emerging use of AI, it’s a delicate balance between the potential benefits, known risks, and responsible implementation. The stakes are high, and this is particularly true when it comes to medical care.
This paper is not the first to try to tackle bias in AI health technologies, but it is among the most comprehensive and arrives at a critical time. The authors write that the recommendations are not intended to be a checklist, but rather to prompt proactive inquiry. But let’s be real: the only way to be certain these lessons will be applied is through regulation.
And with that, here’s more AI news.
Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com
AI IN THE NEWS
Europe’s privacy regulators affirm AI companies’ “legitimate interest” GDPR argument but set a high bar for complying. The European Data Protection Board (EDPB) issued the new guidelines pertaining to AI yesterday, stating that AI companies’ argument that they have a “legitimate interest” to process people’s personal data for the sake of training AI models is a potentially valid legal basis for doing so. The opinion does stress that claiming “legitimate interest” would require companies to pass a three-step test, including having a “clear and precisely articulated” reason for processing someone’s data, and that the processing would have to be “really necessary” for achieving the desired aim. Meta applauded the decision, while stating it’s “frustrating” it took this long. Some privacy advocates, on the other hand, felt the decision is too vague, while others worry that the opinion will make it difficult to offer many AI applications in Europe. In particular, some pointed to challenges the three-step test poses for general AI models like ChatGPT that weren’t built with one clear use in mind and can be used in new and different ways after release. You can read more from Fortune’s David Meyer.
Intensifying U.S.-China tech tensions are impacting immigration of top AI talent. China produces half of the world’s AI talent and has consistently ranked as U.S. tech companies’ biggest source of highly skilled international STEM workers. Chinese AI workers are still looking to immigrate, citing restrictions at home that prevent them from accessing cutting-edge chips and technologies, like those from OpenAI. Yet rising geopolitical tensions and espionage concerns are leading to increased scrutiny, longer screening processes, and visa delays for Chinese nationals applying to study or work in the U.S., as well as in Canada, another popular destination for top AI talent. You can read more from Rest of World.
Character.ai is hosting chatbots emulating real school shooters—and their victims. “Much of this alarming content is presented as twisted fanfiction, with shooters positioned as friends or romantic partners,” reported Futurism, which found chatbots emulating the specific shooters who committed the massacres at Sandy Hook and Columbine, as well as their victims. Other chatbots thrust users into the midst of graphic school shooting scenarios, prompting them to navigate chaotic scenes at a school in a game-like simulation. These scenes discuss specific weapons and injuries to classmates, reported Futurism. The disturbing report comes as the Google-backed company is already facing multiple lawsuits alleging its chatbots promote violence and self-harm to young users.
AI search tool Perplexity raises additional $500 million at $9 billion valuation. That’s according to a Bloomberg story. The funding round was led by Institutional Venture Partners. Founded in 2022, Perplexity has grown rapidly, and boasted 15 million active users as of March.
Google is asking contract evaluators helping to train its Gemini AI system to judge content in which they may have no expertise. TechCrunch, citing documents it obtained, reported that Google updated the guidance given to contractors working for GlobalLogic, an outsourcing firm whose workers provide feedback on the answers Google’s Gemini AI models produce in order to help refine those systems. While the contractors used to be able to skip evaluating answers if they felt unqualified to assess the response, the new guidelines removed this option. Critics argue this could lead to less reliable AI outputs, particularly in critical domains such as healthcare, financial advice, or legal advice. Google declined to comment on the report.
OpenAI rolls out a ‘1-800-CHATGPT’ feature. The AI company announced it will allow users in the U.S. to call ChatGPT for free for up to 15 minutes per month using 1-800-CHATGPT and message it globally via WhatsApp, The Verge reported. The service, powered by OpenAI’s Realtime API, aims to make AI more accessible through familiar channels. OpenAI clarified it will not use these calls to train its models, addressing privacy concerns, Fortune’s Jenn Brice reported. But the new feature is reminiscent of Google’s discontinued GOOG-411, which collected voice samples to improve speech recognition.
FORTUNE ON AI
Databricks CEO Ali Ghodsi on raising $10 billion, fighting for AI talent, and someday going public —by Allie Garfinkle
Hundreds of OpenAI’s current and ex-employees are about to get a huge payday by cashing out up to $10 million each in a private stock sale —by Sharon Goldman
Michael Dell says adoption of AI PCs is ‘definitely delayed,’ but it’s coming: ‘I’ve seen this movie a couple times before’ —by Sharon Goldman
How Lowe’s is trying to spruce up shopping with AI, mixed-reality headsets, and other new technologies —by John Kell
AI CALENDAR
Jan. 7-10: CES, Las Vegas
Jan. 16-18: DLD Conference, Munich
Jan. 20-25: World Economic Forum, Davos, Switzerland
Feb. 10-11: AI Action Summit, Paris, France
March 3-6: MWC, Barcelona
March 7-15: SXSW, Austin
March 10-13: Human [X] conference, Las Vegas
March 17-20: Nvidia GTC, San Jose
April 9-11: Google Cloud Next, Las Vegas
EYE ON AI NUMBERS
2
That’s how many years on average it took the AI startups that became unicorns in 2024 to reach that $1 billion+ valuation level. That’s compared to nine years for non-AI unicorns. Of the 72 companies that became unicorns this year, 32 (44%) are AI startups, according to CB Insights.
Even when it comes to much smaller rounds and valuations, non-AI startups are struggling to fundraise as AI monopolizes investors’ attention. Reporting this week in TechCrunch described how other startups can’t compete with the appetite for AI, and how non-AI companies that raised Series A rounds 18 months ago are struggling to raise Series B rounds, even with decent revenue growth.