Notes on the AI for Health 2024 conference

I attended AI for Health this November, at Station F in Paris.

I am writing this from rough notes, so this is not a proper conference report. More like a short memory dump: what I saw, what annoyed me, and what I might want to dig into later.

Compared to PyData, this was a very different kind of event. Less code, fewer open-source maintainers, more institutions, pharma, startups, cloud providers, and policy talk. Which makes sense: healthcare AI is not just about models. It is about data access, regulation, interoperability, clinical validation, infrastructure, procurement, and trust.

That also made the event quite uneven for me.

Some sessions were interesting. Others felt like the usual conference fog: important people agreeing that important things are important, product teams presenting roadmaps as insights, and ambitious claims floating around without enough methodological ballast.

In healthcare AI, I think the bar should be high. A nice demo or a polished panel is not enough. If the topic is clinical care, public health data, or drug development, I want definitions, evaluation protocols, failure modes, deployment constraints, and footnotes.

Station F used to be a huge rail freight depot (the Halle Freyssinet) before being reborn as the world's biggest startup campus.

Foundation models, but not only LLMs

The broadest session I attended was on foundation models for healthcare, with speakers from Bioptimus, the Health Data Hub, CEA, and PRAIRIE (replay here).

The useful reminder was that "foundation model" does not mean "LLM". The discussion covered medical language models, but also histology, protein models, mutation prediction, and future multimodal systems combining several biomedical data sources.

The hard part is obvious: medical data is sensitive, fragmented, and difficult to share. If we want models adapted to French healthcare data, we need data access, coordination, compute, evaluation, and regulation to all move together.

Easy sentence to write. Much harder system to build.

I liked the concluding point by Emmanuel Bacry, from the Health Data Hub: everybody is eager to pool the effort of building non-English LLMs. That feels very European, in both the good and the difficult sense.

But the session also suffered from the panel format. Everything is compressed until almost nothing sharp remains. Everyone is aligned, reasonable, strategic, and optimistic. Pleasant enough, but low-density.

Claims, copilots, and product pitches

I also attended a short session on AI in clinical trials (replay here). The framing was sensible: AI can help with trial design, recruitment, enrollment, execution, and report writing.

Some examples sounded plausible: extracting safety information from previous trial abstracts, modeling adherence from patient demographics, helping produce reports faster.

One number made me raise an eyebrow: average clinical trial completion time allegedly decreasing from 8.6 years in 2019 to 4.8 years in 2022, a drop of roughly 44% in three years. Maybe there is a careful definition behind it. But this is exactly the kind of statistic where I want the footnotes before the slide, not after the conference.

The Synapse Medicine talk (replay) was more grounded, because it started from an actual workflow: doctors having very little time to make prescription decisions for complex patients. They presented prescription support, audio transcription, and recommendation tools.

As always with healthcare copilots, the question is not whether the demo looks good. The question is whether it is integrated into the clinical workflow, reduces cognitive load, and is seriously evaluated.

There were also more product-oriented sessions, notably from InstaDeep (replay) and Nebius (replay). I understand the economics of conferences: sponsors exist, and infrastructure matters. But from an attendee's seat, the difference between "here is what we learned" and "here is what we sell" is not subtle.

Federated learning: the one I want to revisit

The session I found the most technically grounded was the talk by Marco Lorenzi (INRIA) on federated learning for sensitive healthcare data (replay here).

The promise is well known: train models across hospitals without centralizing raw patient data. The reality is harder: heterogeneous infrastructure, heterogeneous data, subtle privacy guarantees, and deployment constraints everywhere.
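To make the promise concrete for my own notes, here is a minimal sketch of federated averaging (FedAvg) in plain NumPy. This is not what was shown in the talk and it does not use the Fed-BioMed API: the three "hospitals", the linear regression task, and every function name below are assumptions made up for illustration. Each site runs a few gradient steps on its own data, and only the resulting weights travel to the aggregator.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """A few gradient-descent steps of least-squares regression on one site's data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def federated_average(site_weights, site_sizes):
    """Average the site models, weighting each by its local sample count."""
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def run_fedavg(sites, n_features, rounds=50):
    """Alternate local training and aggregation; raw (X, y) never leave a site."""
    w_global = np.zeros(n_features)
    for _ in range(rounds):
        local = [local_update(w_global, X, y) for X, y in sites]
        w_global = federated_average(local, [len(y) for _, y in sites])
    return w_global


# Hypothetical setup: three synthetic "hospitals" with different sizes and
# shifted feature distributions (a crude stand-in for heterogeneous data).
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
sites = []
for n, shift in [(200, 0.0), (80, 1.0), (500, -0.5)]:
    X = rng.normal(loc=shift, scale=1.0, size=(n, 3))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    sites.append((X, y))

w_fed = run_fedavg(sites, n_features=3)
print("federated estimate:", np.round(w_fed, 3))
```

The toy version hides everything that makes the real problem hard: here all sites share the same underlying relationship, the same schema, and the same software stack, and nobody drops out mid-round.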

The important reminder was that federated learning does not magically make privacy problems disappear. Model parameters can leak information, and under some conditions data can be reconstructed from model updates.
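A toy case shows why this is not paranoia. The sketch below is my own illustration, not an example from the talk, and it assumes the worst case on purpose: a single linear layer with a bias, a squared-error loss, and an update computed on exactly one record. Under those assumptions the weight gradient is the residual times the input, and the bias gradient is the residual alone, so dividing one by the other returns the private input exactly.

```python
import numpy as np

rng = np.random.default_rng(42)
n_features, n_outputs = 8, 3

# Hypothetical private record held by one site (never sent anywhere).
x_private = rng.normal(size=n_features)
y_private = rng.normal(size=n_outputs)

# Current shared model: a single linear layer with bias.
W = rng.normal(size=(n_outputs, n_features))
b = rng.normal(size=n_outputs)

# Gradients of the squared error for this single sample.
residual = W @ x_private + b - y_private
grad_W = 2.0 * np.outer(residual, x_private)  # shape (n_outputs, n_features)
grad_b = 2.0 * residual                       # shape (n_outputs,)

# An observer who sees only the update (grad_W, grad_b) can recover the input:
# each row of grad_W equals the matching entry of grad_b times x_private.
row = np.argmax(np.abs(grad_b))  # pick the row with the largest bias gradient
x_reconstructed = grad_W[row] / grad_b[row]

print("max reconstruction error:", np.max(np.abs(x_reconstructed - x_private)))
```

Real deployments average over batches and use deeper models, which makes reconstruction harder but not impossible in general. The point of the toy is only that "we share gradients, not data" is not a privacy guarantee by itself.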

This was the kind of content I wanted more of: a clear method, a real deployment problem, and limitations stated plainly.

I also liked the idea that sensitive ML deployments may require a new kind of role: the ML code reviewer. Not just to check code quality, but to understand what is trained, what is shared, what is logged, and where the privacy/security boundary actually is.

The associated project I wrote down to revisit is Fed-BioMed.

Final impression

I did not leave AI for Health with a list of libraries to test over the weekend.

I left with a reminder that healthcare AI is mostly about everything around the model: data access, validation, institutions, regulation, infrastructure, and clinical adoption.

I was also left slightly frustrated. The event was useful as a map of the ecosystem, but too many sessions stayed at the level of slogans, panels, or product positioning.

That is not specific to AI for Health. It is probably the default failure mode of large AI conferences in 2024. But in healthcare, the gap between ambition and evidence matters more than usual.

So: interesting day, mixed conference, useful reminders, and one or two threads worth pulling later.