<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Analog Intelligence]]></title><description><![CDATA[AI Strategy in Bio and Health]]></description><link>https://analogintelligence.com</link><generator>GatsbyJS</generator><lastBuildDate>Wed, 30 Jul 2025 14:33:54 GMT</lastBuildDate><item><title><![CDATA[AI clinical deployment contexts in radiology]]></title><description><![CDATA[Radiology ML models can be deployed in one of three clinical contexts: within the image acquisition hardware, within the data storage system…]]></description><link>https://analogintelligence.com/ai-clinical-deployment-contexts-in-radiology/</link><guid isPermaLink="false">https://analogintelligence.com/ai-clinical-deployment-contexts-in-radiology/</guid><content:encoded>&lt;p&gt;Radiology ML models can be deployed in one of three clinical contexts: within the image acquisition hardware, within the data storage system aka PACS, or on the radiologist’s workstation. &lt;/p&gt;
&lt;p&gt;First, models can come pre-bundled with hardware where they are guaranteed to work together smoothly. As such, they require no dedicated technical integration, sales, or contracts. They are vendor-specific and can’t be utilized elsewhere.&lt;/p&gt;
&lt;p&gt;These models improve the functionality of the hardware. ML models on a CT scanner can help with image reconstruction from x-ray projection data, while models on an ultrasound station can help sonographers adjust the probe and optimize the image. &lt;/p&gt;
&lt;p&gt;Hardware is often a long-term investment costing millions, while ML is a rapidly advancing field. ML models bundled with hardware are therefore likely to go out of date very quickly. It will be interesting to see how software updates to hardware are able to accommodate this.&lt;/p&gt;
&lt;p&gt;Next, we have models that are automatically triggered as data makes its way into storage. These are likely vendor-agnostic, especially given the DICOM standard to which virtually all radiographic images adhere. &lt;/p&gt;
&lt;p&gt;Background-running ML models often produce minimal friction and little to no interruption to the clinical workflow. Model-generated artifacts such as the presence/absence of a pathology or a segmentation mask can be saved alongside the images.&lt;/p&gt;
&lt;p&gt;Examples include workflow optimization models that automatically triage and prioritize cases for subsequent reads by radiologists. Model predictions here act as a secondary check in parallel to the radiologist’s independent assessment.&lt;/p&gt;
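&lt;p&gt;To make this storage-triggered pattern concrete, here is a minimal sketch of a background triage model. It assumes pydicom is installed, and the watch folder and scoring model are hypothetical stand-ins rather than a real PACS integration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hedged sketch: a storage-side model that triages incoming DICOM images.
# WATCH_DIR and triage_model are illustrative, not a real PACS API.
import time
from pathlib import Path

import pydicom  # assumes pydicom is installed

WATCH_DIR = Path("/data/incoming_studies")  # hypothetical PACS-fed folder

def triage_model(image):
    # Stand-in scorer; a real deployment would load a trained classifier.
    return float(image.mean()) / 255.0

def run_triage(dicom_path):
    ds = pydicom.dcmread(dicom_path)
    score = triage_model(ds.pixel_array)
    # Save the model artifact alongside the image for the radiologist.
    dicom_path.with_suffix(".triage.txt").write_text(f"{score:.3f}")

seen = set()
while True:
    for path in WATCH_DIR.glob("**/*.dcm"):
        if path not in seen:
            run_triage(path)
            seen.add(path)
    time.sleep(30)  # polling keeps the sketch simple; production systems
                    # would listen for DICOM C-STORE events instead
&lt;/code&gt;&lt;/pre&gt;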
&lt;p&gt;Finally, we have models that run on the radiologist’s workstation and are often manually triggered. Model predictions can be edited by users, creating a closed feedback loop where models are periodically retrained and hopefully improve over time.&lt;/p&gt;
&lt;p&gt;The physician-facing nature of these models (whether in a hospital or through teleradiology) gives a good amount of control back to the user. As such, the model front-end and its accompanying UI/UX are of extreme importance. &lt;/p&gt;</content:encoded></item><item><title><![CDATA[Biotech: Navigating the double funnel]]></title><description><![CDATA[It is no surprise that developing a drug is estimated to cost $2.6B spanning an entire decade. With success rates in the single digits for…]]></description><link>https://analogintelligence.com/biotech-the-double-funnel/</link><guid isPermaLink="false">https://analogintelligence.com/biotech-the-double-funnel/</guid><content:encoded>&lt;p&gt;It is no surprise that developing a drug is estimated to cost $2.6B spanning an entire decade. With success rates in the single digits for many diseases, it is a very humbling line of work. The same goes for startups with the majority failing to reach the next funding round.&lt;/p&gt;
&lt;p&gt;Biotech startups are essentially navigating these two funnels simultaneously: working on pushing their programs while ensuring enough runway to the next raise. Many biotech companies will go public with the first promising results and long before their drugs are approved.&lt;/p&gt;
&lt;p&gt;Another way for startups to exit this funnel midway is through acquisitions by larger pharma companies. Startups are innovative - working on bold new ideas - while large pharma is structured - excelling in the clinical, regulatory, marketing, and sales aspects of the business.&lt;/p&gt;
&lt;p&gt;Funnel shapes will also vary according to disease. The Alzheimer’s funnel, for instance, may have a very wide top (many more candidates) and a very narrow bottom (low success rates). Generally, the ultimate goal is to avoid late stage failures which tend to be the most expensive.&lt;/p&gt;
&lt;p&gt;As for drugs that do not make it all the way down the funnel, there is a business of buying and selling failed “assets” and repurposing them for different indications. Because these assets have a specific shelf-life, they become less valuable over time until their patents expire.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Four models for delivering AI value in drug development]]></title><description><![CDATA[Drug discovery and development is a multi stakeholder process. For early stage AI startups, it is imperative that they work with relatively…]]></description><link>https://analogintelligence.com/four-models-for-delivering-ai-value-in-drug-development/</link><guid isPermaLink="false">https://analogintelligence.com/four-models-for-delivering-ai-value-in-drug-development/</guid><content:encoded>&lt;p&gt;Drug discovery and development is a multi stakeholder process. For early stage AI startups, it is imperative that they work with relatively larger more established companies. Here are four models describing these partnerships in order of increasing risk.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Software-as-a-service (SaaS)&lt;/p&gt;
&lt;p&gt;This is the hands-free model: ML tools are made available through the web, and users interact with them much like any other software product. These typically involve an annual license fee based on the number of users.&lt;/p&gt;
&lt;p&gt;This model is best suited for highly specific ML tasks e.g. a microscopy image analysis tool that detects and counts different cell types. To allow for some level of customization, developers may introduce bespoke options e.g. “no-code” interfaces to train your own ML model.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Consulting&lt;/p&gt;
&lt;p&gt;ML startups and larger biotechs may also engage in one-off consulting agreements. This is where the technology is stress-tested and is often considered a precursor to more involved potential partnerships down the road.&lt;/p&gt;
&lt;p&gt;Pharma X might have a specific problem at hand that startup Y promises its ML technology can solve within ~1 year for a lump sum payment of $1M. The additional income here may help get the startup off the ground and serve as proof that its ML technology has utility.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Partnerships&lt;/p&gt;
&lt;p&gt;These typically come after a couple of successful one-off consulting jobs. It is where risk is truly shared among the parties. Partnerships are open-ended explorations that involve working on a general question for a number of years.&lt;/p&gt;
&lt;p&gt;It may involve in-licensing a drug, or unleashing the startup’s ML platform to help discover treatments for an indication area the pharma company has expertise in. These come in the form of an initial payment together with milestone payments based on success.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Partnerships ++&lt;/p&gt;
&lt;p&gt;As startups grow to take on more risk, they may enter partnerships where they have more ownership. This can come in the form of royalties on future drug sales if the program is indeed successful.&lt;/p&gt;
&lt;p&gt;This is a great way for pharma to “de-risk the deal”, share it with the startup, and delay some expenses until actual revenue comes in. While the rewards for startups may be significant, they lose out on immediate revenue.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Choice of model will largely depend on how far the startup is in its journey, the nature of the deal, and what each party is expecting to get out of it. These models also present options for startups to diversify their revenue streams, especially those that are not venture-backed.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Machine learning is just one part of the story]]></title><description><![CDATA[While many biotech companies may identify as “AI-first”, machine learning is often one of three components that come together to create…]]></description><link>https://analogintelligence.com/ml-is-just-one-part-of-the-story/</link><guid isPermaLink="false">https://analogintelligence.com/ml-is-just-one-part-of-the-story/</guid><content:encoded>&lt;p&gt;While many biotech companies may identify as “AI-first”, machine learning is often one of three components that come together to create coherent solutions to problems. &lt;/p&gt;
&lt;p&gt;The first and most crucial component is the company’s main differentiator. It may be a new way of approaching biology or an asset/IP in the more traditional biotech sense. Everything else is built around this component with the ultimate goal of supporting its development.&lt;/p&gt;
&lt;p&gt;This may be probing the immune system through aging biology (Spring), growing brain organoids and measuring drug effects (Herophilus), or laser editing to reprogram cells (Cellino).&lt;/p&gt;
&lt;p&gt;The second component is the data generator, the scale enabler. It allows for running controlled experiments and measuring thousands of interactions. Automation and robotics play a major role here. &lt;/p&gt;
&lt;p&gt;These may be physical experiments including high throughput screening and long-term culture with continuous monitoring, or even purely computational in the form of physics-based molecular simulations. &lt;/p&gt;
&lt;p&gt;Finally, we have machine learning - often referring to data science in the broader sense. This is where data is used to create value and make decisions. It may be data analysis, trend identification, a model that captures some essence of the data, or even plain storytelling.&lt;/p&gt;
&lt;p&gt;You may also view these three components as part of a continuous loop, a discovery machine of sorts. A scientific belief that guides an experiment, followed by data analysis that ultimately informs how this belief should be altered for the next iteration through the loop.  &lt;/p&gt;
&lt;p&gt;This is simply how science has always been done. However, our recent ability to perform large scale experiments and analyze data at unprecedented fidelity is what enables rapid efficient loops with tighter feedback cycles.&lt;/p&gt;
&lt;p&gt;Without a science-driven hypothesis as a company’s core brand, together with a means of generating data at scale in controlled environments, there would be no utility for machine learning.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Flagship Pioneering: Four forces shaping biotech]]></title><description><![CDATA[Four main forces shaping biotech as outlined in Flagship Pioneering’s 2022 annual letter↗︎. For context, Flagship Pioneering is a life…]]></description><link>https://analogintelligence.com/flagship-pioneering-annual-letter-2022-four-forces-shaping-biotech/</link><guid isPermaLink="false">https://analogintelligence.com/flagship-pioneering-annual-letter-2022-four-forces-shaping-biotech/</guid><content:encoded>&lt;p&gt;Four main forces shaping biotech as outlined in Flagship Pioneering’s 2022 annual letter&lt;sup&gt;&lt;a href=&quot;https://www.flagshippioneering.com/stories/2022-annual-letter&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;For context, Flagship Pioneering is a life science venture capital firm, perhaps best known recently for its investment in Moderna.&lt;/p&gt;
&lt;p&gt;Force 1 is the shift away from the current probabilistic drug discovery approach towards a more deterministic one. Instead of trial and error screens of thousands of drugs to identify a “hit”, we are now able to design drugs with predictable outcomes.&lt;/p&gt;
&lt;p&gt;One can now consider a drug more like a piece of code or a set of instructions meant to perform a specific task, making it more targeted and thus more likely to succeed. A great example of which is the Moderna mRNA vaccine.&lt;/p&gt;
&lt;p&gt;Force 2 deals with the convergence of machine learning and biology. On one hand, we have ML applications built on top of data from high throughput drug screens - helping us better understand disease pathways. On the other hand, our understanding of how the human brain works (admittedly, a very limited understanding) is helping us develop better ML tools.&lt;/p&gt;
&lt;p&gt;Force 3 deals with an emerging biotech model where companies have a core competence aka “a platform”. This discovery machine is then able to spin out therapeutic programs across multiple disease areas.&lt;/p&gt;
&lt;p&gt;Many argue that this model allows for compound learning where knowledge from one program feeds into another. While this model needs more validation, the flexibility to pivot to new directions is perhaps its main advantage over the traditional single-program counterpart.&lt;/p&gt;
&lt;p&gt;Force 4 touches on how regulatory bodies have responded to the urgency in developing COVID-19 vaccines, e.g. overlapping/combining clinical trial phases, and how their practices may evolve beyond the pandemic. The hope is for regulatory pathways that better balance risk and benefit.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[ML-readiness across 3 data sources: Preclinical research, clinical research, and clinical practice]]></title><description><![CDATA[The state of being ML-ready can vary widely across different data types, primarily influenced by their source. The distinction here is…]]></description><link>https://analogintelligence.com/ml-readiness-across-3-data-sources-preclinical-research-clinical-research-and-clinical-practice/</link><guid isPermaLink="false">https://analogintelligence.com/ml-readiness-across-3-data-sources-preclinical-research-clinical-research-and-clinical-practice/</guid><content:encoded>&lt;p&gt;The state of being ML-ready can vary widely across different data types, primarily influenced by their source. The distinction here is between experimental data collected within preclinical or clinical research, and real world clinical data.&lt;/p&gt;
&lt;p&gt;Any data collected within an experimental context - a drug discovery assay or a clinical trial - often come with high levels of ML-readiness. This is a function of inherent control over the experimental design, characteristics of subjects/samples, and potential confounders.&lt;/p&gt;
&lt;p&gt;Real world clinical data, on the other hand, is provided as is. Good luck convincing a physician to write a clinical note in a more standard and structured manner, or getting a technician to change the way a routine CT image is acquired. This translates to low levels of control.&lt;/p&gt;
&lt;p&gt;Readily annotated data enables supervised learning. In experiments, labels are identified a priori and are therefore available from day one. In a real world setting, labels are created by different stakeholders in different locations at different times.&lt;/p&gt;
&lt;p&gt;This requires multiple glue layers to construct a single annotated training sample. It may start with a patient’s MRI from the radiology PACS, linking it to an oncologist’s note in the EHR, and coupling it with a pathologist’s cancer grade from LIMS. &lt;/p&gt;
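&lt;p&gt;As a toy illustration of these glue layers - with every system name and field below purely hypothetical - assembling one supervised sample might look like the following join across three sources.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hedged sketch: joining hypothetical per-patient extracts from PACS,
# the EHR, and LIMS into a single annotated training sample.
pacs_mri = {"patient-042": "/pacs/mri/patient-042_t1.nii"}
ehr_notes = {"patient-042": "Oncology follow-up: lesion stable."}
lims_grades = {"patient-042": "Grade II"}  # pathologist's cancer grade

def build_sample(patient_id):
    # Each lookup is one glue layer; a miss in any system means the
    # sample cannot be assembled.
    try:
        return {
            "image": pacs_mri[patient_id],
            "context": ehr_notes[patient_id],
            "label": lims_grades[patient_id],
        }
    except KeyError:
        return None  # incomplete linkage across PACS, EHR, and LIMS

print(build_sample("patient-042"))
&lt;/code&gt;&lt;/pre&gt;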
&lt;p&gt;Working with ML-ready data enables data professionals to spend more time on the actual modeling, storytelling, and visualization. The other end of the spectrum requires considerable data engineering effort before diving into any ML work. &lt;/p&gt;</content:encoded></item><item><title><![CDATA[The promises of AI in small molecule drug discovery]]></title><description><![CDATA[An immediately obvious application is repurposing existing molecules for new indications. This is particularly evident in phenotypic drug…]]></description><link>https://analogintelligence.com/promises-ai-artificial-intelligence-small-molecule-drug-discovery/</link><guid isPermaLink="false">https://analogintelligence.com/promises-ai-artificial-intelligence-small-molecule-drug-discovery/</guid><content:encoded>&lt;p&gt;An immediately obvious application is repurposing existing molecules for new indications. This is particularly evident in phenotypic drug discovery where the data generated from high throughput drug screens makes for a perfect ML use case.&lt;/p&gt;
&lt;p&gt;Next, you have generative ML tools that are able to explore chemical spaces long studied by medicinal chemists. These tools are then able to generate structurally similar molecules to those already known and, in some cases, approved.&lt;/p&gt;
&lt;p&gt;It is likely that chemists would have been able to identify these ML-generated molecules based on traditional approaches. Nevertheless, ML tools may help identify gaps and pockets in the IP space and turn marginally novel molecules into patentable material.&lt;/p&gt;
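&lt;p&gt;Structural similarity of this kind is routinely quantified with molecular fingerprints. Below is a small sketch using RDKit’s Morgan fingerprints and Tanimoto similarity - the molecules are illustrative (aspirin and a hypothetical ester analog), not output from any particular generative model.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hedged sketch: scoring how structurally close a "generated" molecule
# is to a known, approved one. Assumes RDKit is installed.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

known = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")       # aspirin
generated = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)OC")  # methyl ester analog

fp_known = AllChem.GetMorganFingerprintAsBitVect(known, 2, nBits=2048)
fp_generated = AllChem.GetMorganFingerprintAsBitVect(generated, 2, nBits=2048)

# Tanimoto similarity: 1.0 means identical fingerprints, 0.0 no overlap.
print(DataStructs.TanimotoSimilarity(fp_known, fp_generated))
&lt;/code&gt;&lt;/pre&gt;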
&lt;p&gt;Exploring new structural space is where untapped opportunities exist. While we have not seen this yet, one can imagine a radically novel ML-generated molecule working with a well-studied target.&lt;/p&gt;
&lt;p&gt;The holy grail here is an ML-predicted mechanism, target, or disease pathway - a goal we are admittedly still far from. This type of work today mostly relies on “disease knowledge graphs” built by scraping scientific literature, with mixed results that only reflect our biases.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Recursion: A biotech company scaling more like a tech company]]></title><description><![CDATA[Only a handful of ML-centric companies in the health/bio space have gone public. The regulatory documents from these filings provide a good…]]></description><link>https://analogintelligence.com/recursion-biotech-company-scaling-more-like-tech-company/</link><guid isPermaLink="false">https://analogintelligence.com/recursion-biotech-company-scaling-more-like-tech-company/</guid><content:encoded>&lt;p&gt;Only a handful of ML-centric companies in the health/bio space have gone public. The regulatory documents from these filings provide a good amount of transparency into both the vision and operations. Here are the top 5 highlights from the Recursion S-1 filing.&lt;/p&gt;
&lt;p&gt;In a nutshell, Recursion has a massive warehouse in Salt Lake City where robots manipulate cells and knock out different genes. The cells are imaged throughout, and the data is fed into ML models to identify which of the targeted genes make cells happy.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A biotech company scaling more like a tech company.&lt;/p&gt;
&lt;p&gt;Generally speaking, biotech is primarily R&amp;#x26;D-focused while tech is not. The struggle here is to balance the flexibility required for innovation vs the consistency and structure required for scale.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Wet-dry lab integration.&lt;/p&gt;
&lt;p&gt;Tightly integrating the data generation and analysis sides of the house. The wet-lab relies on heavy automation and is constantly uploading data, which in turn are analyzed by the dry-lab. Decisions are made and fed into the next cycle of experiments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Focus on rare genetic diseases.&lt;/p&gt;
&lt;p&gt;A platform-disease fit may be evident if ML models are able to capture significant morphological changes. As the root cause of a genetic disease is known, a stronger preclinical -&gt; clinical leap and higher chances of success are likely.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Monopoly on diverse talent.&lt;/p&gt;
&lt;p&gt;Setting up shop in Utah outside usual hotspots has translated into a monopoly on talent. The roughly 50/50 split between wet and dry scientists is quite uncommon for biotechs where data scientists are often a minority and regarded as service providers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Early specialized platform focus.&lt;/p&gt;
&lt;p&gt;The first 5 years were primarily focused on building the platform, which comprises specialty components for chemistry, phenotypes, etc. This early investment contributed to clean standardized ML-ready data, some of which has been open-sourced.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;$RXRX is down ~65% since going public in April 2021. While this may be a function of the recent poor performance of the entire sector, it may also be that public investors have difficulty valuing platform companies where the core value may not necessarily be in their programs.&lt;/p&gt;
&lt;p&gt;You can browse the Recursion S-1 filing here&lt;sup&gt;&lt;a href=&quot;https://www.sec.gov/Archives/edgar/data/1601830/000119312521089610/d89478ds1.htm#toc&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[The triangular spectrum of research data teams]]></title><description><![CDATA[Data teams in preclinical research operate within a triangular spectrum spanning three core areas: infrastructure, machine learning, and…]]></description><link>https://analogintelligence.com/the-triangular-spectrum-of-data-teams/</link><guid isPermaLink="false">https://analogintelligence.com/the-triangular-spectrum-of-data-teams/</guid><content:encoded>&lt;p&gt;Data teams in preclinical research operate within a triangular spectrum spanning three core areas: infrastructure, machine learning, and data science. ML often takes most of the glory, but it is only one part of the story.&lt;/p&gt;
&lt;p&gt;First, the infrastructure core engineers cloud data storage and compute services. In parallel, this core also builds and maintains “glue layers” that connect with lab equipment through LIMS, external CROs, and data labeling services - ideally through some form of API. &lt;/p&gt;
&lt;p&gt;This ensures scientists on the team have direct access to data acquired in-house or externally. This core also provides utilities for provisioning compute resources i.e. cloud instances allowing other team members to focus solely on running experiments.&lt;/p&gt;
&lt;p&gt;Next, we have the machine learning core, the more fundamental research arm of the team. This arm provides a gateway into current state-of-the-art practices by reviewing the literature, identifying appropriate methods, and implementing them for data scientists to use.&lt;/p&gt;
&lt;p&gt;This core also manages and maintains benchmarking datasets to better understand model performance over time, in addition to other ML technicalities including model training bottlenecks and GPU acceleration.&lt;/p&gt;
&lt;p&gt;Finally, the data science core works on extracting value and insights from data. This core is deeply integrated within different programs and works very closely with biologists. With new data coming in every week, this core is responsible for analysis and interpretation.&lt;/p&gt;
&lt;p&gt;Data science essentially manages the last mile, that final leg of the data journey. By combining storytelling and visualization, this core communicates findings to the program team, based on which a decision is made on how the next iteration of experiments/data should look.&lt;/p&gt;
&lt;p&gt;Members in smaller teams will start by working across these cores, juggling them around as needs evolve. As teams grow, members become more specialized, gravitating toward one of these cores.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Novel proteins across sequence, structure, and function]]></title><description><![CDATA[The new generation of antibodies, enzymes, peptides and other proteins will be designed and engineered - not discovered or screened. ML is…]]></description><link>https://analogintelligence.com/novel-proteins-across-sequence-structure-and-function/</link><guid isPermaLink="false">https://analogintelligence.com/novel-proteins-across-sequence-structure-and-function/</guid><content:encoded>&lt;p&gt;The new generation of antibodies, enzymes, peptides and other proteins will be designed and engineered - not discovered or screened. ML is bound to play a major role in this mindset shift, from identifying sequences to resolving structures and crafting functions.&lt;/p&gt;
&lt;p&gt;The design of de novo proteins never seen before in nature is a highly sought-after goal. Natural proteins have not evolved to serve the highly specialized functions that we now want them to carry out. We need to explore nature’s uncharted territory.&lt;/p&gt;
&lt;p&gt;One method for such exploration is directed evolution: starting with a natural protein and mutating it until a desired function is achieved. For many research areas, no natural proteins can serve as this starting point. We need better means of sampling the sequence space.&lt;/p&gt;
&lt;p&gt;Similar to language, protein sequences can be represented as strings drawn from an alphabet of 20 amino acids. As such, we are able to capitalize on the major advances in NLP to model proteins, including large language models and their underlying transformer architectures.&lt;/p&gt;
&lt;p&gt;Labeled biomedical data is often difficult to come by. One major advantage of using language models in protein modeling is their self-supervised nature. These models learn by masking or perturbing random portions of the protein sequence and then attempting to autocomplete them. &lt;/p&gt;
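&lt;p&gt;As a toy illustration of this masking objective - with the sequence and mask rate entirely made up - the training setup looks roughly like this.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hedged sketch of the self-supervised masking objective: hide random
# residues and keep the ground truth the model must autocomplete.
import random

def mask_sequence(seq, mask_rate=0.15, mask_token="?"):
    n_mask = max(1, int(len(seq) * mask_rate))
    positions = random.sample(range(len(seq)), n_mask)
    masked, targets = list(seq), {}
    for i in positions:
        targets[i] = seq[i]   # ground-truth residue to predict
        masked[i] = mask_token
    return "".join(masked), targets

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
masked, targets = mask_sequence(seq)
print(masked)   # e.g. MKTA?IAKQRQISFVK?HFSRQLEERLGLIEVQ
print(targets)  # positions the model must fill back in
&lt;/code&gt;&lt;/pre&gt;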
&lt;p&gt;A protein’s sequence is inherently linked to its stable folded structure, which in turn dictates its functionality. Protein folding is a long standing problem with 10^143 ways to fold - Levinthal’s paradox. AlphaFold has made some remarkable progress in this area.&lt;/p&gt;
&lt;p&gt;Predicting a protein’s 3D structure from its sequence - with some degree of accuracy - is bound to accelerate our search for “unnatural” proteins. As AlphaFold provides mere predictions, experimental methods such as X-ray crystallography remain the gold standard. &lt;/p&gt;
&lt;p&gt;Beyond sequence and structure, we have function. Whether these functions are therapeutic (affinity, immunogenicity, stability) or biomanufacture-related (titer, rate, and yield), this is a multi-objective optimization problem where ML tools can provide much value.&lt;/p&gt;
&lt;p&gt;The best ML models will learn from experimental validation and tight integration into iterative feedback loops with the wet lab. They will also learn to work with the nuances of proteins including their multi-state conformational space and their highly variable sequence lengths.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Unnatural Selection: AI Meets Synthetic Biology]]></title><description><![CDATA[Making Insulin in the Lab Originally sourced from pig pancreases, the synthesis of insulin in 1979 by Genentech marked a major scientific…]]></description><link>https://analogintelligence.com/artificial-intelligence-ai-startups-synthetic-biology-synbio-venture-meta-review-analysis/</link><guid isPermaLink="false">https://analogintelligence.com/artificial-intelligence-ai-startups-synthetic-biology-synbio-venture-meta-review-analysis/</guid><pubDate>Tue, 29 Jul 2025 22:40:32 GMT</pubDate><content:encoded>&lt;h2&gt;Making Insulin in the Lab&lt;/h2&gt;
&lt;p&gt;Insulin was originally sourced from pig pancreases; its synthesis in the lab by Genentech in 1978 marked a major scientific milestone within the field of synthetic biology. By inserting the human insulin gene into E. coli bacteria, scientists were able to encourage the production of insulin. In addition to addressing ethical concerns around animal welfare, making insulin in the lab allowed for cost-effective scale. Without synthetic insulin, an area larger than the surface of the earth would be needed today to raise pigs and harvest this critical protein to treat millions of diabetics around the world&lt;sup&gt;&lt;a href=&quot;https://www.alliancebernstein.com/library/the-synthetic-biology-revolution.htm&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Synthetic biology is the systematic design and engineering of biology. The field is built around the introduction of engineering principles into biology through standardizing and abstracting biological components. Through a better understanding of the language of biology, akin to programming in computer science, the ultimate goal of synthetic biology is producing biological systems with predictable behaviors&lt;sup&gt;&lt;a href=&quot;https://carlosaldrete.substack.com/p/the-language-of-biology-computation&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The ability to fluently read (sequencing) and write (synthesis) this language promises to help tackle some of the world’s most pressing challenges. On the healthcare front, cell and gene therapies allow for correcting defective genes in patients while engineered proteins are developed into highly precise targeted therapies. Mosquito-borne diseases, such as malaria and Zika, can be eliminated by designer mosquitoes with gene drives that short-circuit the usual patterns of genetic inheritance and prevent disease transmission&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/books/NBK585165/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. On the industrial front, engineering cells to a certain specification and rewiring their metabolism enables the production of almost everything we consume today from flavors and fabrics to food and fuels&lt;sup&gt;&lt;a href=&quot;https://www.bcg.com/en-us/publications/2022/synthetic-biology-is-about-to-disrupt-your-industry&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Whether it is replacing oil and gas with biofuel fermentation or engineering bacteria to deliver nitrogen to plants and eliminate the need for fertilizers, many industrial processes are poised for a biological transformation.&lt;/p&gt;
&lt;h2&gt;The Cell as a Factory&lt;/h2&gt;
&lt;figure&gt;
&lt;img src=&quot;/b6b0a98e030deb41833352acf13e2220/synthetic-biology.svg&quot; alt=&quot;synthetic-biology&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;Synthetic biology uses engineered genetic constructs to program host cells as biological factories, producing valuable proteins through a controlled process of gene insertion, cultivation, and purification.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Synthetic biology is centered around harnessing the cellular machinery and utilizing cells as tiny factories to produce biological material. The process starts with the design of genetic constructs that contain a gene of interest - chosen for its ability to encode a protein with desired properties or functions - along with regulatory elements such as promoters and terminators. The next step involves selecting host cells, with Yeast and E. coli being the most common. These cells are chosen based on a multitude of factors including ease of genetic manipulation, scalability, and protein secretion yield. While other approaches replace host cells with cell-free systems that remove genetic regulation and enable direct access to the inner workings of the cell&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S1096717611000929?via%3Dihub&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, the overall process remains largely unchanged.&lt;/p&gt;
&lt;p&gt;The genetic constructs are then delivered and inserted into the host cells through gene transfer methods (e.g. viral vectors) and gene editing (e.g. CRISPR-Cas9) among others&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/books/NBK535883/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. The genetically modified cells are then cultivated in a suitable growth medium containing all the essential nutrients, water, and feedstock required for protein production. Finally, the harvesting process involves isolating the target protein from the culture medium and disposing of waste including other metabolic byproducts and cellular debris. The purified protein can then be further processed for its ultimate use case, whether it be an antibody drug for treating cancer or an industrial enzyme for making cheese.&lt;/p&gt;
&lt;h2&gt;Design-Build-Test-Learn + AI&lt;/h2&gt;
&lt;p&gt;At the core of synthetic biology lies the design-build-test-learn (DBTL) cycle, mirroring the traditional scientific method of hypothesis generation, testing, and learning, but tailored for engineering purposes. This iterative process starts with designing cellular manipulations to achieve a specific goal. This is followed by building or implementing these designs in the biological system. Next is the generation of experimental data through testing how closely the resulting phenotype achieves the desired goal. Finally, this test data is leveraged to refine future iterations and drive the cycle to the desired goals more efficiently than what may be accomplished by a random search&lt;sup&gt;&lt;a href=&quot;https://cacm.acm.org/magazines/2022/5/260341-artificial-intelligence-for-synthetic-biology/fulltext&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/f45df57200e9a24a048f3a7ef973762e/design-build-test-learn-cycle.svg&quot; alt=&quot;design-build-test-learn-cycle&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;The design-build-test-learn cycle is the core methodology in synthetic biology, where computational design informs experimental construction, followed by testing and data analysis to generate insights that drive the next design iteration.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;DBTL cycles today are often trial-and-error processes primarily driven by our biological intuition and experience, and guided by ad-hoc practices. Given the complexity of biology paired with our incomplete understanding of cellular mechanisms, DBTL cycles face long development times and a high likelihood of failure where the desired phenotype is never achieved. By capturing knowledge directly from high-throughput experimental data and predicting bioengineering outcomes, AI can play a vital role in enhancing the efficiency and accuracy of DBTL cycles. More specifically, the learn step is where AI can really shine by using data from the test step to inform the design step of the next cycle. In fact, it might as well be called the model step. The ultimate goal of tightly integrating AI models in the DBTL cycle would be to enable experimentation at the massive scale of computational simulations while retaining the fidelity of gold-standard real-life experiments&lt;sup&gt;&lt;a href=&quot;https://www.octant.bio/approach&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
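&lt;p&gt;A minimal sketch of such a model-guided loop is below - the “wet lab” is simulated by a hidden fitness function, and the learn step is a deliberately simple heuristic standing in for a real surrogate model; every name is illustrative.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hedged sketch of an AI-guided DBTL loop. The build/test step is
# simulated; in reality it would be a wet-lab measurement.
import random

def run_experiment(design):
    # Stand-in for build + test: noisy fitness of a 1-D "design knob".
    return -(design - 0.7) ** 2 + random.gauss(0, 0.01)

def learn_and_design(history, n_candidates=50):
    # Learn: find the best-performing region seen so far; design: sample
    # new candidates around it. A real system would fit a regression or
    # Bayesian model here instead of this nearest-best heuristic.
    best_design, _ = max(history, key=lambda pair: pair[1])
    return [min(1.0, max(0.0, random.gauss(best_design, 0.1)))
            for _ in range(n_candidates)]

history = [(d, run_experiment(d)) for d in (0.1, 0.5, 0.9)]  # seed round
for cycle in range(5):  # five DBTL iterations
    candidates = learn_and_design(history)
    design = random.choice(candidates)
    history.append((design, run_experiment(design)))

print(max(history, key=lambda pair: pair[1]))  # best design found
&lt;/code&gt;&lt;/pre&gt;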
&lt;p&gt;Guiding experiments in a systematic fashion - without the need for a full or intricate mechanistic understanding of the biological system - is perhaps the greatest value AI can deliver to synthetic biology experiments&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41467-020-18008-4&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This comes along with improvements in speed, both in terms of cycle time and the number of cycles required to reach the desired output&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S109671762030166X#bib110&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Insights from AI models can also help avoid involution: iterative trial-and-error leading to endless DBTL cycles that spiral into a state of increased complexity rather than increased productivity&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0958166922000453&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. AI can also help expand the toolkit available to conduct such experiments. For instance, yeast and E. coli are the go-to hosts used in synthetic biology today often because they are the most studied rather than being the most optimal&lt;sup&gt;&lt;a href=&quot;https://www.ingenza.com/blog/pick-a-host-any-host-synthetic-biology-tools-for-biopharmaceutical-manufacturing/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. AI, through techniques such as transfer learning, can adapt learned knowledge from yeast models to other so-called non-model organisms&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0958166922000453&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; including microalgae and fungi, opening up the space for a wider range of possible products&lt;sup&gt;&lt;a href=&quot;https://www.tsungxu.com/performance-biomaterials/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2&gt;Unnatural Proteins&lt;/h2&gt;
&lt;p&gt;Through the statistical linking of inputs and outputs in flexible models capable of representing diverse relationships, AI can provide valuable insights into complex biological systems&lt;sup&gt;&lt;a href=&quot;https://pubs.acs.org/doi/10.1021/acssynbio.8b00540&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. However, the real promise of AI extends beyond understanding natural proteins as they exist today and towards the design of novel molecules. By exploring the vast space beyond what nature has sampled, we can unlock the potential for protein engineering to create entirely new functionalities never seen before in nature.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“The amount of sequence space that nature has sampled through the history of life would equate to almost just a drop of water in all of Earth’s oceans&lt;sup&gt;&lt;a href=&quot;https://techcrunch.com/2021/11/18/generate-biomedicines-raises-370m-series-b-with-a-focus-on-protein-based-drugs&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;We need to venture into nature’s uncharted territory. For many areas of biomedical research and drug development, there are no natural proteins that can serve as suitable starting points to build new proteins&lt;sup&gt;&lt;a href=&quot;https://theconversation.com/when-researchers-dont-have-the-proteins-they-need-they-can-get-ai-to-hallucinate-new-structures-173209&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Natural pathways may prove insufficient in scenarios where genes for product synthesis are unknown or where no natural pathway exists for desired biosynthesis. In such cases, designing novel non-natural pathways becomes necessary&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S0958166922000453&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Another example lies in gene therapy where efficacy is currently limited by the roster of naturally occurring vectors used to deliver the therapy&lt;sup&gt;&lt;a href=&quot;https://www.dynotx.com/about&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. These vectors have not been optimized for disease treatment and are therefore unable to carry out the highly specialized functions we now ask of them.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The new generation of antibodies, enzymes, peptides, and other proteins will be designed and engineered - not discovered and screened. AI is bound to play a major role in this mindset shift by identifying new unnatural sequences and linking them to stable structures and valuable functions.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Sequence, Structure, &amp;#x26; Function&lt;/h2&gt;
&lt;p&gt;Proteins, often described as biology’s actuators, play a critical role in various physiological processes within our bodies where around 20,000 are responsible for tasks ranging from digestion to oxygen transport&lt;sup&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=6-QEbqbaaNQ&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Their complex 3D structures are built up of a sequence of building blocks called amino acids. Protein sequences can be represented as strings of letters; the protein alphabet consists of 20 common amino acids&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8050421/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Fundamental to understanding their biological functions, the study of proteins involves decoding sequences of amino acids much like deciphering human language. However, the vast combinatorial space of possible protein sequences surpasses astronomical scales. For a relatively short protein just 50 amino acids long, there are over 10^65 possible combinations, while a 100 amino acid protein holds more combinations than the number of atoms in the observable universe. This makes the sequence space practically infinite.&lt;/p&gt;
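&lt;p&gt;The arithmetic behind these counts is easy to check directly - 20 choices per position, exponentiated over the sequence length.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Size of the protein sequence space: 20 choices per position.
n_50 = 20 ** 50     # all possible 50-residue sequences
n_100 = 20 ** 100   # all possible 100-residue sequences
print(f"{n_50:.2e}")   # ~1.13e+65
print(f"{n_100:.2e}")  # ~1.27e+130, versus roughly 1e80 atoms
                       # in the observable universe
&lt;/code&gt;&lt;/pre&gt;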
&lt;p&gt;Traditional protein engineering methods, often mimicking evolutionary processes, struggle with inefficiency due to the immense sequence space and reliance on experimental screening. One such method for exploring the sequence space is directed evolution, where we start with a natural protein and mutate it until a desired function is achieved&lt;sup&gt;&lt;a href=&quot;https://www.maxygen.com/how-it-works-1&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Given the limitation of exploring infinitesimal regions of the sequence space around starting points, experiments may get trapped at a local peak and miss out on the global optimum&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2997618/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Because biology is intrinsically messy and non-modular, such approaches are often only slightly better than educated guesses&lt;sup&gt;&lt;a href=&quot;https://www.prnewswire.com/news-releases/sestina-bio-launches-engineering-biology-at-the-level-of-single-cells-301152745.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. We need better means of sampling the sequence space.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/77a5aa796a3e2fe633e3e76003c1a3a3/sequence-structure-function.svg&quot; alt=&quot;sequence-structure-function&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;The next generation of therapeutic and industrial proteins will be designed using AI through learning from sequence, structure, and functional data.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Attempts at going beyond the natural protein neighborhoods have thus far been highly inefficient and artisanal in nature, relying heavily on humans for both experimental design and execution&lt;sup&gt;&lt;a href=&quot;https://labgeni.us/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. When faced with the vast sequence space, most approaches rely on random high throughput screening and serendipity i.e. trial and error at a massive scale&lt;sup&gt;&lt;a href=&quot;https://www.businesswire.com/news/home/20220829005088/en/Vilya-Launches-to-Design-a-New-Transformational-Class-of-Medicines-That-Precisely-Target-Disease-Biology&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This is where computational approaches are needed to drive experimental efforts, presenting an ideal proving ground for AI. In lieu of exploring the sequence space blindly, AI can help guide us towards meaningful sequence regions likely to produce proteins to our specifications&lt;sup&gt;&lt;a href=&quot;https://a16z.com/2021/02/11/bighat-biosciences/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. AI can also help fill in the blanks by more accurately interpolating between sampled points, and ultimately narrowing down the options before experiments bring molecules to life in the lab. Since it is impossible to brute force our way through the entire sequence space, AI can create a virtual fitness landscape to guide search away from nonfunctional sequence neighborhoods where proteins simply can not exist&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41592-021-01100-y&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Exploring the Vast Sequence Space&lt;/h3&gt;
&lt;p&gt;Proteins are typically composed of reused modular elements including motifs and domains that can be assembled in a hierarchical fashion - akin to words, phrases, and sentences in human language&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S2001037021000945&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Given these similarities in both shape and substance, we are able to capitalize on major advances in natural language processing (NLP) to model proteins&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8050421/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. The prior state of the art in NLP was recurrent neural networks. These operated sequentially on one word at a time and were not great at working with longer sentences; they would forget the context as they reached the end of the sentence. Then came transformer architectures in 2017&lt;sup&gt;&lt;a href=&quot;https://arxiv.org/abs/1706.03762&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, the methodology powering virtually all large language models today (ChatGPT, Claude, etc.). Transformers rely on a self-attention mechanism that assigns attention weights to words (or parts of a protein sequence) based on their surrounding context to help the model focus on pertinent information&lt;sup&gt;&lt;a href=&quot;https://www.liebertpub.com/doi/10.1089/genbio.2022.0017&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This allows models to learn highly versatile, multipurpose, and information-rich numerical representations of protein sequences - also known as embeddings - that can be used for any downstream task including protein structure and function prediction&lt;sup&gt;&lt;a href=&quot;https://elifesciences.org/articles/82819&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
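&lt;p&gt;To make the mechanics concrete, here is a bare-bones numpy sketch of a single self-attention step over a one-hot encoded protein fragment. The random weight matrices are placeholders - real protein language models stack many such layers with learned weights.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hedged sketch: one self-attention step over a toy protein fragment.
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
seq = "MKVLAT"  # arbitrary protein fragment

# One-hot encode: one row per residue, one column per amino acid.
x = np.zeros((len(seq), len(AMINO_ACIDS)))
for i, aa in enumerate(seq):
    x[i, AMINO_ACIDS.index(aa)] = 1.0

rng = np.random.default_rng(0)
d = 8  # tiny embedding dimension, for illustration only
Wq, Wk, Wv = (rng.normal(size=(len(AMINO_ACIDS), d)) for _ in range(3))

q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d)                  # pairwise residue affinities
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)  # softmax attention weights
embeddings = weights @ v   # context-aware representation of each residue

print(embeddings.shape)  # (6, 8): one embedding per residue
&lt;/code&gt;&lt;/pre&gt;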
&lt;p&gt;Another major advantage of using language models in protein modeling is their self-supervised nature. Labeled biomedical data is often difficult to come by. These models learn from the data itself by masking random portions of the protein sequence and then autocompleting them - essentially predicting an explicit ground truth. As such, language models are able to learn to speak protein&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/d41586-022-02947-7&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; from raw sequences without labels, making them usable on any corpus at massive scale&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8050421&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This understanding of protein grammar enables a generative capability, which in turn allows for writing entirely new protein sequences. It also greatly simplifies the protein design process where scientists can focus on creating a protein based on desired functions while leaving the sequence up to the model&lt;sup&gt;&lt;a href=&quot;https://theconversation.com/when-researchers-dont-have-the-proteins-they-need-they-can-get-ai-to-hallucinate-new-structures-173209&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Moreover, AI methods for searching the sequence space can also be refined based on the DBTL iteration cycle. While earlier cycles may focus on exploration - trying new unexplored neighborhoods - later cycles lean more towards exploitation - concentrating on an already identified neighborhood and picking the best performers. Loss functions used in training AI models can be optimized for pure exploration, pure exploitation, or anything in between&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41467-020-18008-4&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This versatility makes AI a valuable tool in protein sequence space exploration.&lt;/p&gt;
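&lt;p&gt;One common way to expose that explore/exploit knob - shown here as a hedged sketch with made-up numbers, not any published method - is an upper-confidence-bound style acquisition score over candidate sequences.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hedged sketch: beta tunes the exploration/exploitation trade-off.
candidates = {
    "seq_A": (0.80, 0.02),  # (predicted fitness, model uncertainty)
    "seq_B": (0.65, 0.30),
    "seq_C": (0.72, 0.10),
}

def acquisition(mean, std, beta):
    # beta near 0: pure exploitation; large beta: pure exploration.
    return mean + beta * std

for beta in (0.0, 2.0):
    pick = max(candidates, key=lambda s: acquisition(*candidates[s], beta))
    print(f"beta={beta}: pick {pick}")
# Exploitation (beta=0) picks seq_A, the best known performer;
# exploration (beta=2) favors seq_B, whose outcome is most uncertain.
&lt;/code&gt;&lt;/pre&gt;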
&lt;h3&gt;Stable Structures&lt;/h3&gt;
&lt;p&gt;Now that we have identified a novel protein sequence, we need to confirm whether it is indeed real. Can it fold into a stable, well-defined structure? A protein’s sequence is inherently linked to its stable folded structure, which in turn dictates its functionality. Protein folding - a process driven by energy minimization - is a long-standing problem with 10^143 ways to fold, also known as Levinthal’s paradox.&lt;/p&gt;
&lt;p&gt;Solving atomic structures remains an underdetermined problem with noisy data&lt;sup&gt;&lt;a href=&quot;https://www.biorxiv.org/content/10.1101/2022.04.07.487522v1.full.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. It is a laborious and expensive process where multi-year research projects focus on describing the structure of a single protein&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/7725097/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. These have traditionally relied on experimental techniques including cryo‐electron microscopy and X‐ray crystallography&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5192981/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. While reliable, these techniques are time-consuming, resource-intensive, and come with a host of limitations due to the immense complexity of protein folding. They are also limited in terms of scale where the largest repository of experimentally verified protein structures, the Protein Data Bank&lt;sup&gt;&lt;a href=&quot;https://www.rcsb.org/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, holds just over 230k proteins - a drop in the ocean with respect to the vast sequence space.&lt;/p&gt;
&lt;p&gt;While experimental methods remain the gold standard, physics-based computational approaches emerged as an alternative. Software like Rosetta&lt;sup&gt;&lt;a href=&quot;https://boinc.bakerlab.org/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; attempted to simulate protein folding by calculating energy functions for each amino acid and their conformations to predict the most stable fold&lt;sup&gt;&lt;a href=&quot;https://carlosaldrete.substack.com/p/the-language-of-biology-computation&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. However, due to the astronomical number of possible conformations, these approaches were computationally prohibitive, often requiring thousands of computers running for weeks to simulate a single protein’s folding pathway&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/d41586-022-02947-7&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The field experienced a paradigm shift from physics to statistics with the advent of deep learning. Rather than simulating folding from first principles, these methods learn patterns from existing protein structures. This shift is rather significant as we can now make sophisticated inferences about the relationship between sequence and structure without an atomic level understanding&lt;sup&gt;&lt;a href=&quot;https://www.flagshippioneering.com/companies/generate-biosciences&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The breakthrough came in 2018 when DeepMind’s AlphaFold outperformed traditional energy-based methods and dominated the CASP13 protein prediction competition&lt;sup&gt;&lt;a href=&quot;https://predictioncenter.org/casp13/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, followed by AlphaFold2&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41586-021-03819-2&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; (2020). While these earlier versions relied primarily on the attention mechanism described earlier, AlphaFold3&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41586-024-07487-w&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; (2024) introduced a diffusion-based architecture where noise is gradually added and then systematically removed from molecular representations - similar to how AI image generators function, e.g. Stable Diffusion and DALL-E. The model starts with a cloud of atoms and iteratively refines it over many steps until converging on the final molecular structure&lt;sup&gt;&lt;a href=&quot;https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This approach has allowed it to handle significantly more complex structural challenges with greater accuracy and confidence, particularly for proteins with limited evolutionary data.&lt;/p&gt;
&lt;p&gt;In addition to AlphaFold, models like Boltz-2&lt;sup&gt;&lt;a href=&quot;https://docs.nvidia.com/nim/bionemo/boltz2/1.0.0/overview.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; now extend beyond proteins to predict DNA and RNA folding patterns. These models can also predict complex multi-molecular structures such as virus spike proteins interacting with antibodies and sugars, as well as binding interactions between proteins and various molecules. This capability opens new frontiers in drug discovery, where understanding how proteins interact with potential therapeutic compounds is crucial.&lt;/p&gt;
&lt;h3&gt;Useful Functions&lt;/h3&gt;
&lt;figure&gt;
&lt;img src=&quot;/44a2ceb0ec354adbc5e9752d8af062e1/protein-functions.svg&quot; alt=&quot;protein-function&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;Once a stable protein structure is achieved, its utility hinges on optimizing four interconnected functional domains: therapeutic efficacy, safety, biophysical robustness, and manufacturability.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;Now that we have a stable protein with a well-defined structure, we must ensure it serves the intended function. Protein functions can be broadly categorized into four interconnected domains. The first category encompasses target affinity and biological function: how effectively proteins bind to their targets, their binding kinetics, specificity, and overall biological activity. These properties directly determine a protein’s therapeutic efficacy.&lt;/p&gt;
&lt;p&gt;The second functional category addresses safety and immunogenicity concerns. Historically, the immune system hasn’t been receptive to novel proteins, creating a significant barrier for therapeutic applications. As such, the goal is to co-optimize for both immunogenicity and protein function simultaneously&lt;sup&gt;&lt;a href=&quot;https://techcrunch.com/2021/11/18/generate-biomedicines-raises-370m-series-b-with-a-focus-on-protein-based-drugs/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and incorporate naturalness metrics that indicate whether a designed protein will possess desirable developability profiles and low immunogenicity&lt;sup&gt;&lt;a href=&quot;https://www.biorxiv.org/content/10.1101/2023.01.08.523187v1&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The third category focuses on biophysical properties, including thermal stability, solubility, conformational dynamics, and resistance to degradation. These characteristics determine whether a protein can withstand the conditions it will encounter during storage, transportation and ultimate use&lt;sup&gt;&lt;a href=&quot;https://www.fiercebiotech.com/biotech/bighat-biosciences-bags-19m-to-supercharge-antibody-development&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The fourth category addresses manufacturability and developability. These encompass production metrics such as titer, rate, and yield that determine whether a protein can be produced at scale - crucial considerations for translating promising designs from the lab to the clinic.&lt;/p&gt;
&lt;p&gt;Optimizing numerous protein functions simultaneously is a multi-objective challenge for which AI is very well suited&lt;sup&gt;&lt;a href=&quot;https://a16z.com/2021/02/11/bighat-biosciences/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, as optimizing one feature at a time will likely improve one property at the expense of others. Experimentally measured functions are crucial, as the most effective AI approaches for protein function optimization learn from experimental validation through tight integration with wet lab feedback loops. These approaches often involve creating custom scoring functions that assign different weights to various functional parameters, allowing researchers to navigate the vast design space efficiently&lt;sup&gt;&lt;a href=&quot;https://medium.com/labgenius/how-labgenius-is-revolutionising-drug-discovery-with-machine-learning-5ab4e38546b7&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
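&lt;p&gt;As a toy illustration of such a weighted scoring function, consider the sketch below. The predictors, property names, weights, and candidate sequences are all hypothetical stand-ins for learned models and real design libraries, not any platform’s actual scoring scheme.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical weighted multi-objective scoring for protein candidates.
# Each predictor below is a trivial stand-in returning a value in [0, 1];
# in practice these would be learned models validated against wet lab data.

def predict_affinity(seq):
    return min(seq.count('W') / 3.0, 1.0)       # stub predictor

def predict_stability(seq):
    return min(seq.count('C') / 2.0, 1.0)       # stub predictor

def predict_immunogenicity_risk(seq):
    return min(seq.count('K') / 4.0, 1.0)       # stub (higher = worse)

def composite_score(seq, weights):
    # Weighted sum of normalized property scores: the knob researchers
    # turn to trade one functional objective against another.
    return (weights['affinity'] * predict_affinity(seq)
            + weights['stability'] * predict_stability(seq)
            + weights['low_immunogenicity'] * (1.0 - predict_immunogenicity_risk(seq)))

weights = {'affinity': 0.5, 'stability': 0.3, 'low_immunogenicity': 0.2}
candidates = ['MKTWWCE', 'MKKKKCW', 'MWWCCTE']
print(max(candidates, key=lambda s: composite_score(s, weights)))
&lt;/code&gt;&lt;/pre&gt;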
&lt;h2&gt;Applications&lt;/h2&gt;
&lt;p&gt;Having explored how AI can help design protein sequence, structure, and function, we now turn to the diverse applications of these technologies. The choice of chemical space to explore largely depends on what one is trying to optimize, whether it’s an antibody drug, an enzyme for industrial production, or a delivery vehicle for gene therapy.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;gatsby-resp-image-figure&quot; style=&quot;&quot;&gt;
    &lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 656px; &quot;
    &gt;
      &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 45.1219512195122%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAJABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAIBBf/EABQBAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhADEAAAAepNDVj/xAAaEAACAgMAAAAAAAAAAAAAAAABAgASEBEx/9oACAEBAAEFArLLrEOxgc//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAEDAQE/AT//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAECAQE/AT//xAAYEAACAwAAAAAAAAAAAAAAAAAQMQABgf/aAAgBAQAGPwJx0NP/xAAcEAACAgIDAAAAAAAAAAAAAAABEQBRITEQQXH/2gAIAQEAAT8hzpW1uECuuGeUnhVwPXs0T//aAAwDAQACAAMAAAAQYA//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAEDAQE/ED//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAECAQE/ED//xAAdEAABBAIDAAAAAAAAAAAAAAABABEhQTFxUYGh/9oACAEBAAE/EHGqU7PG0AKEmA54vuUajAjmd6FCkw2LwL//2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Overview of AI-powered synthetic biology startups categorized by their primary focus areas and technological approaches.&quot;
        title=&quot;Overview of AI-powered synthetic biology startups categorized by their primary focus areas and technological approaches.&quot;
        src=&quot;/static/4c9be2a3bbc565d78b23f152b88be0a2/65d47/all-synbio-startups.jpg&quot;
        srcset=&quot;/static/4c9be2a3bbc565d78b23f152b88be0a2/f1935/all-synbio-startups.jpg 164w,
/static/4c9be2a3bbc565d78b23f152b88be0a2/e2074/all-synbio-startups.jpg 328w,
/static/4c9be2a3bbc565d78b23f152b88be0a2/65d47/all-synbio-startups.jpg 656w,
/static/4c9be2a3bbc565d78b23f152b88be0a2/c85c6/all-synbio-startups.jpg 984w,
/static/4c9be2a3bbc565d78b23f152b88be0a2/06416/all-synbio-startups.jpg 1312w,
/static/4c9be2a3bbc565d78b23f152b88be0a2/23cdb/all-synbio-startups.jpg 2440w&quot;
        sizes=&quot;(max-width: 656px) 100vw, 656px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
    &lt;/span&gt;
    &lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;&lt;p&gt;Overview of AI-powered synthetic biology startups categorized by their primary focus areas and technological approaches.&lt;/p&gt;&lt;/figcaption&gt;
  &lt;/figure&gt;&lt;/p&gt;
&lt;h3&gt;Direct Applications: Protein Design&lt;/h3&gt;
&lt;p&gt;The most obvious direct application is the design of therapeutic proteins. Companies including Nabla Bio and BigHat Biosciences use AI to design and optimize antibody therapeutics&lt;sup&gt;&lt;a href=&quot;https://techcrunch.com/2021/12/07/nabla-bio-raises-11m-for-an-end-to-end-antibody-design-platform/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; through predicting function from sequence alone&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41592-019-0598-1&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Menten AI capitalizes on quantum computing to unlock the immense parallelism needed to explore the peptide space&lt;sup&gt;&lt;a href=&quot;https://aws.amazon.com/blogs/startups/menten-ai-leverages-aws-ai-quantum-computing-to-create-drugs/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Natural protein discovery also benefits from AI approaches. Nuritas discovers bioactive peptides in natural food sources, identifying compounds with therapeutic potential that have evolved naturally over millions of years&lt;sup&gt;&lt;a href=&quot;https://www.nuritas.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Indirect Applications: Enabling Technologies&lt;/h3&gt;
&lt;p&gt;Instead of developing therapeutics directly, other companies invest in the technologies and workflows that accelerate their creation. Aether Bio exemplifies this approach by developing enzymes for downstream biomanufacturing with a focus on process innovation&lt;sup&gt;&lt;a href=&quot;https://www.aetherbio.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. On the software side, Cradle&lt;sup&gt;&lt;a href=&quot;https://www.cradle.bio/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; and Tamarind&lt;sup&gt;&lt;a href=&quot;https://www.tamarind.bio/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; are building computational tools and interfaces that help scientists design and optimize proteins for specific functions.&lt;/p&gt;
&lt;h3&gt;Specialized Approaches: Structure Prediction&lt;/h3&gt;
&lt;p&gt;Given the complexity of protein design, some companies focus on specific areas of the pipeline. In the structure prediction space, Gandeeva combines AI with cryogenic electron microscopy (cryo-EM) to accelerate structure determination at atomic resolution&lt;sup&gt;&lt;a href=&quot;https://gandeeva.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, an experimental technique that has taken over structural biology in the past few years&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/d41586-020-00341-9&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Others including Charm Therapeutics&lt;sup&gt;&lt;a href=&quot;https://charmtx.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; are working on structural prediction of macromolecular configurations, specifically the co-crystal structure of a protein-ligand complex based on the protein’s primary sequence and the ligand’s chemical structure&lt;sup&gt;&lt;a href=&quot;https://charmtx.com/charm-therapeutics-launches-with-a-50-million-series-a-financing-to-transform-structure-based-drug-discovery-using-its-proprietary-dragonfold-ai-technology/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Beyond Protein Engineering: Parts, Small Molecules, and Diagnostics&lt;/h3&gt;
&lt;p&gt;The applications of AI in synthetic biology extend beyond protein engineering to other biological components, including predicting the function of biological “parts” such as promoters, ribosome binding sites, and untranslated regions&lt;sup&gt;&lt;a href=&quot;https://www.liebertpub.com/doi/10.1089/genbio.2022.0017&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. At the DNA/RNA level, companies like Octant and Hexagon Bio are applying AI to small molecule drug development. Octant’s discovery platform engineers human cells to act as reporters, turning on the expression of a genetic barcode if a particular drug target is activated&lt;sup&gt;&lt;a href=&quot;https://www.octant.bio/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, while Hexagon is mining fungal genomes and discovering evolutionarily refined small molecules alongside their protein targets&lt;sup&gt;&lt;a href=&quot;https://www.hexagonbio.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. In diagnostics, Sherlock’s DNA/RNA detection platform uses synthetic gene circuits that can be programmed to distinguish targets based on single nucleotide differences&lt;sup&gt;&lt;a href=&quot;https://sherlock.bio/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Gene Therapy: A New Frontier&lt;/h3&gt;
&lt;p&gt;Gene therapy represents a particularly promising application area for AI and synthetic biology - the key components of which include the capsid (delivery vehicle), promoter (control element), and therapeutic transgene (genetic payload to treat disease).&lt;/p&gt;
&lt;p&gt;Due to the inherent complexity, many companies take a horizontally integrated approach by focusing on one of these components. For many diseases, we often know precisely which genetic payloads we want to deliver, but do not always have the right vehicles to carry them to the relevant cells&lt;sup&gt;&lt;a href=&quot;https://a16z.com/announcement/investing-in-dyno-therapeutics/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Gene therapies have traditionally relied on naturally occurring adeno-associated viruses (AAV) to deliver payloads, but these face limitations in biodistribution, off-target gene expression, pre-existing immunity, and manufacturability&lt;sup&gt;&lt;a href=&quot;https://pmc.ncbi.nlm.nih.gov/articles/PMC6927556/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This has prompted the use of AI to design custom AAV capsids with specific characteristics, as demonstrated by Dyno Therapeutics&lt;sup&gt;&lt;a href=&quot;https://www.dynotx.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, Apertura&lt;sup&gt;&lt;a href=&quot;https://aperturagtx.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and Capsida&lt;sup&gt;&lt;a href=&quot;https://capsida.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;For instances where payloads are too large to be contained within AAVs, companies including Replay are focused on utilizing the large cargo capacity of other viruses - namely the herpes simplex virus (HSV) - to deliver big genes&lt;sup&gt;&lt;a href=&quot;https://replay.bio/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Replay also boasts a hub-and-spoke model wherein the hub innovates and refines platform technologies in a centralized, scalable manner, while the spokes represent focused therapeutic development programs&lt;sup&gt;&lt;a href=&quot;https://endpoints.news/under-hub-and-spoke-biotech-serial-founders-launch-new-startup-hoping-to-make-gene-therapies-for-the-eye/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. AI plays a vital role in supporting such a model through both scalability (enhancing the hub’s ability to rapidly iterate on platform technologies that all spokes benefit from) and cross-pollination (insights from one spoke can help refine AI models used across others).&lt;/p&gt;
&lt;h2&gt;Compounding Value&lt;/h2&gt;
&lt;p&gt;The integration of AI models with synthetic biology wet lab experiments creates a powerful data flywheel effect where each experimental cycle contributes to an ever-expanding knowledge base as more of the landscape is explored&lt;sup&gt;&lt;a href=&quot;https://medium.com/labgenius/how-labgenius-is-revolutionising-drug-discovery-with-machine-learning-5ab4e38546b7&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This data compounds over time, enabling better models and predictions. Even failed experiments provide valuable information: strains that fail to achieve their objectives still contribute data that informs future designs. As such, the next strain that goes through the system will have a higher chance of success. This compounding pattern and increasing probability of success with each iteration is the hallmark of a true platform technology, creating enduring competitive advantages and distinguishing it from the traditional single product or “asset” approach&lt;sup&gt;&lt;a href=&quot;https://a16z.com/what-is-a-bio-platform-for/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/be2aecaea002fe9def0111da8801b7ee/data-flywheel-effect.svg&quot; alt=&quot;data-flywheel-effect&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;The data flywheel effect is a virtuous cycle where the collection and analysis of data lead to better decision-making, which in turn generates more data, creating a self-reinforcing loop.&lt;/figcaption&gt;
&lt;/figure&gt;
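&lt;p&gt;A minimal sketch of one turn of this flywheel is shown below, under toy assumptions: a simulated “wet lab” oracle and a trivial nearest-neighbor surrogate model stand in for real experiments and learned predictors.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Toy design-test-learn loop: a surrogate model ranks candidates, the top
# picks are 'measured', and every result (good or bad) grows the dataset.
# The oracle and surrogate are illustrative stand-ins, not real systems.
import random

random.seed(0)
AMINO = 'ACDEFGHIKLMNPQRSTVWY'

def wet_lab(seq):
    # Stand-in for an experimental measurement, e.g. binding affinity.
    return sum(1 for a in seq if a in 'WYF') + random.gauss(0, 0.1)

def surrogate(seq, data):
    # Predict from the closest previously measured sequence (toy model).
    if not data:
        return 0.0
    nearest = min(data, key=lambda s: sum(a != b for a, b in zip(s, seq)))
    return data[nearest]

data = {}
for cycle in range(5):
    pool = [''.join(random.choices(AMINO, k=8)) for _ in range(200)]
    top = sorted(pool, key=lambda s: surrogate(s, data), reverse=True)[:5]
    for seq in top:
        data[seq] = wet_lab(seq)  # failures enter the dataset too
    print(f'cycle {cycle}: best so far = {max(data.values()):.2f}')
&lt;/code&gt;&lt;/pre&gt;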
&lt;h3&gt;Scaling Experimental Throughput&lt;/h3&gt;
&lt;p&gt;To fully leverage this potential, companies are rethinking how experiments are conducted and scaled as traditional well plates simply cannot generate data at the pace required to train sophisticated AI models. Companies like Sestina Bio (acquired by Inscripta) use microfluidics, microscopic droplets, and microchannels to miniaturize experiments&lt;sup&gt;&lt;a href=&quot;https://www.foresitecapital.com/portfolio/sestina-bio/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, while Aether has invested in proprietary custom-built hardware and robotic systems to fit their scaling needs&lt;sup&gt;&lt;a href=&quot;https://www.aetherbio.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Approaches to Data Generation&lt;/h3&gt;
&lt;p&gt;Beyond scaling experimental throughput, companies are also exploring innovative approaches to maximize the value of biological data. EVQLV uses evolutionary modeling methods to optimize antibody design by mimicking natural selection processes. Starting from a given antibody sequence, their models computationally generate evolutionarily possible sequences with high affinity and lower likelihood of failure&lt;sup&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=qIW1o8jgRsk&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. ProteinQure is minimizing reliance on expensive experimental data through physics-based molecular dynamics simulations. By incorporating fundamental physical principles into their models, they can make more accurate predictions with less experimental data&lt;sup&gt;&lt;a href=&quot;https://techjobs.marsdd.com/companies/proteinqure&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Others including Generate Biomedicines&lt;sup&gt;&lt;a href=&quot;https://generatebiomedicines.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; and Latent Labs&lt;sup&gt;&lt;a href=&quot;https://www.latentlabs.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; have taken a diversified approach, developing platforms capable of designing multiple types of proteins including antibodies, peptides, and enzymes. This strategy allows them to extract more value from each experiment by identifying generalizable principles that apply across protein families.&lt;/p&gt;
&lt;h2&gt;Headwinds&lt;/h2&gt;
&lt;p&gt;Despite these promising advances in AI-driven synthetic biology, several significant challenges stand between these technologies and their full potential. These headwinds span technical limitations, scaleup challenges, and fundamental conceptual questions.&lt;/p&gt;
&lt;h3&gt;Technical Challenges&lt;/h3&gt;
&lt;p&gt;One of the most fundamental challenges for AI models in protein design is handling the extreme variability in sequence lengths, which range from 10 to 10,000 amino acids and complicate both standardized processing and model architecture design&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S2001037021000945&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. The problem compounds for longer sequences, as transformer-based models scale quadratically with sequence length and become computationally expensive for long proteins&lt;sup&gt;&lt;a href=&quot;https://huggingface.co/blog/deep-learning-with-proteins&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Capturing long-range interactions, which are critical to how proteins fold, therefore remains challenging with current architectures.&lt;/p&gt;
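&lt;p&gt;A quick back-of-the-envelope calculation makes the quadratic scaling concrete (assuming a dense attention matrix stored as 4-byte floats; real implementations vary):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Self-attention materializes an L x L score matrix per head, so memory
# and compute grow with the square of the sequence length L.
for L in (10, 100, 1000, 10000):      # sequence length in amino acids
    scores = L * L                    # attention scores per head per layer
    megabytes = scores * 4 / 1e6      # assuming 4-byte float32 entries
    print(f'L = {L}: {scores:,} scores, roughly {megabytes:,.1f} MB per head')
&lt;/code&gt;&lt;/pre&gt;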
&lt;p&gt;Unlike the static text in language models, proteins exist in multiple conformational states that are critical to their biological function&lt;sup&gt;&lt;a href=&quot;https://www.biorxiv.org/content/10.1101/2022.04.07.487522v1.full.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This dynamic nature creates a fundamental modeling challenge. Most structural data in the Protein Data Bank&lt;sup&gt;&lt;a href=&quot;https://www.rcsb.org/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; represents stable conformations derived from experimental methods, creating an inherent bias in training data. Models trained on this data may struggle to capture the full conformational landscape of proteins, limiting their ability to design proteins with specific dynamic properties.&lt;/p&gt;
&lt;p&gt;A significant portion of the proteome - estimated at 44% in eukaryotes and viruses - belongs to the “Dark Proteome”, which comprises proteins that lack a stable fold with a well-defined three-dimensional structure&lt;sup&gt;&lt;a href=&quot;https://www.biorxiv.org/content/10.1101/2022.04.07.487522v1.full.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. These intrinsically disordered proteins, which often contribute to defense and signaling pathways, pose a particular challenge for AI models trained primarily on structured proteins. Current models struggle with these proteins as we have very little structural data about them beyond their sequences. While models can predict where disorder occurs, designing functional disordered proteins remains largely beyond the capabilities of current AI methods.&lt;/p&gt;
&lt;p&gt;While the protein research community excels at organizing competitions like CAFA for function prediction&lt;sup&gt;&lt;a href=&quot;https://biofunctionprediction.org/cafa/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, CASP for structure prediction&lt;sup&gt;&lt;a href=&quot;https://predictioncenter.org/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and CAPRI for protein-protein docking&lt;sup&gt;&lt;a href=&quot;https://www.capri-docking.org/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, it lags behind other fields in establishing standardized benchmarks for model evaluation. Unlike competitions that occur at fixed intervals, benchmarks are accessible at any time and therefore represent an important step toward accelerating progress in the field&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S2001037021000945&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Scaleup Challenges&lt;/h3&gt;
&lt;blockquote&gt;
&lt;p&gt;“Just because you have a bug that produces a gram per liter in a flask doesn’t mean you are ready to go commercial&lt;sup&gt;&lt;a href=&quot;https://www.technologyreview.com/2021/08/24/1032308/is-ginkgos-synthetic-biology-story-worth-15-billion/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;One of the most significant barriers to commercializing synthetic biology products is scaling up production. Organisms engineered to perform well in a laboratory reactor often need further tweaking to grow and thrive under pressure in steel tanks for large-scale manufacturing&lt;sup&gt;&lt;a href=&quot;https://www.technologyreview.com/2021/08/24/1032308/is-ginkgos-synthetic-biology-story-worth-15-billion/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This transition from bench-scale to pilot-scale to full-scale involves understanding how numerous process variables (feed rate, pH, temperature, fermentation time, mixing regime, media composition, aeration rate, etc.) impact host physiology, cell growth, product titers, rates, and yields&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S109671762030166X#bib110&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This process remains largely heuristic, with scale-up development often seen as more of an art than a science&lt;sup&gt;&lt;a href=&quot;https://academic.oup.com/femsle/article/365/13/fny138/5026621?login=false&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Current AI models typically optimize for molecular properties but rarely account for manufacturability parameters critical for real-world deployment. This is primarily due to the lack of appropriate training data, despite modern fermentation systems containing sophisticated process controls and comprehensive data collection systems that could be leveraged for training AI algorithms. Capitalizing on the data generated by these industrial systems is key to building the next generation of AI models that consider manufacturability alongside molecular function.&lt;/p&gt;
&lt;h3&gt;Conceptual Limitations: Is Protein Sequence Really Like Language?&lt;/h3&gt;
&lt;p&gt;A more fundamental question concerns the conceptual framework underlying many AI approaches to protein design. The analogy between protein sequences and human language, while useful, has significant limitations that may constrain progress&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S2001037021000945&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Unlike human language, proteins lack clear punctuation, stop words, and separable structures like words, sentences, and paragraphs. In language, specific words can have critical influence (changing “love” to “loved” significantly alters meaning), while in proteins effects may be more aggregate (e.g. shifting the overall hydrophilicity of a region). Additionally, proteins form complex three-dimensional structures, a phenomenon that has no direct analog in natural language&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S2001037021000945&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;As the field advances, researchers are increasingly recognizing the need for multimodal approaches that unify sequence, structure, interaction data, and experimental measurements. These include architectures that integrate spatial geometry and learn from evolutionary constraints. In essence, we are attempting to treat proteins less like language and more like physical, functional systems embedded in complex biological contexts&lt;sup&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=6-QEbqbaaNQ&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;The Future of AI in Synthetic Biology&lt;/h2&gt;
&lt;h3&gt;Democratization of AI Models Accelerates Innovation&lt;/h3&gt;
&lt;p&gt;Unlike previous computational biology tools that required specialized expertise, today’s most sophisticated AI models are increasingly accessible to non-technologists with virtually no barriers to entry. For instance, setting up and running Rosetta required significant technical expertise. Contrast this with running AlphaFold in a notebook, which bypasses the need to write a single line of code, download software, or set up local environments - and, more importantly, avoids command-line intimidation&lt;sup&gt;&lt;a href=&quot;https://carlosaldrete.substack.com/p/the-language-of-biology-computation&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;This democratization also enables the community to build upon and extend models. A few days after the release of AlphaFold2, users on Twitter reported a trick that allowed the model to predict quaternary structures - something few, if any, expected the model to be capable of&lt;sup&gt;&lt;a href=&quot;https://techcrunch.com/2022/11/04/stability-ai-backs-effort-to-bring-machine-learning-to-biomed/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Platforms like Hugging Face have further accelerated this trend by creating open-source communities where models can be shared, modified, and deployed with minimal friction. This collaborative ecosystem has enabled rapid iteration and innovation across the synthetic biology landscape.&lt;/p&gt;
&lt;h3&gt;The Mini-Biologics Opportunity&lt;/h3&gt;
&lt;p&gt;The growing importance of biologics presents a compelling opportunity for AI-driven synthetic biology. By 2024, biologics had grown to represent a third of all FDA drug approvals&lt;sup&gt;&lt;a href=&quot;https://www.mdpi.com/1420-3049/30/3/482&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, while making up 7 of the top 10 drugs by revenue&lt;sup&gt;&lt;a href=&quot;https://www.bcg.com/publications/2024/new-drug-modalities-report&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Despite this growth, current AI approaches remain more established in small molecule discovery&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/d41573-022-00025-1&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This creates a significant opportunity to redirect AI efforts toward biologics, particularly as the FDA continues to approve more biological products.&lt;/p&gt;
&lt;p&gt;While large biologics like monoclonal antibodies dominate current markets, their complexity suggests that AI approaches may find more immediate success with smaller proteins and peptides. These molecules, typically around 30 amino acids in length, present a more manageable search space for AI models, making sizable coverage of the chemical space a somewhat attainable goal&lt;sup&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=3adkR-20WAM&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. More specifically, naturally derived macrocycles and engineered constrained peptides bridge the gap between small molecules and larger biologics, transcending the traditional boundaries of the cell to access deep, complex targets with high specificity. This has made them one of the fastest-growing categories of new therapeutic products and an ideal target for AI-driven design approaches&lt;sup&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=3adkR-20WAM&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h3&gt;Biology as Inspiration for New AI Techniques&lt;/h3&gt;
&lt;p&gt;Perhaps most intriguingly, the relationship between AI and biology is not unidirectional. Just as AI is transforming biological research, biology itself continues to inspire new approaches to machine learning. This suggests that a deeper understanding of biological systems through synthetic biology will drive the development of novel AI architectures and approaches&lt;sup&gt;&lt;a href=&quot;https://pubs.acs.org/doi/10.1021/acssynbio.8b00540&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. After all, biology has inspired staples of AI including neural networks, genetic algorithms, and reinforcement learning among many others&lt;sup&gt;&lt;a href=&quot;https://cacm.acm.org/magazines/2022/5/260341-artificial-intelligence-for-synthetic-biology/fulltext&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“The complexity of biological systems is such that AI solutions based purely on brute-force correlation finding will fail to efficiently characterize the system’s intrinsic features&lt;sup&gt;&lt;a href=&quot;https://cacm.acm.org/magazines/2022/5/260341-artificial-intelligence-for-synthetic-biology/fulltext&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;As the trends discussed here continue to unfold, we can expect a new generation of AI-designed proteins that address previously intractable challenges in medicine, agriculture, and industrial biotechnology, ultimately transforming not just how we design biological systems but how we understand and interact with the living world.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Guidelines for AI Research in Medicine]]></title><description><![CDATA[Medical Research and Guidelines Medical research in the 70’s and 80’s suffered significantly from poor, or at best mediocre, methodological…]]></description><link>https://analogintelligence.com/artificial-intelligence-ai-guidelines-checklists-medicine-healthcare/</link><guid isPermaLink="false">https://analogintelligence.com/artificial-intelligence-ai-guidelines-checklists-medicine-healthcare/</guid><pubDate>Wed, 07 Apr 2021 22:40:32 GMT</pubDate><content:encoded>&lt;h2&gt;Medical Research and Guidelines&lt;/h2&gt;
&lt;p&gt;Medical research in the 70’s and 80’s suffered significantly from poor, or at best mediocre, methodological quality&lt;sup&gt;&lt;a href=&quot;https://www.hindawi.com/journals/isrn/2016/1346026/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. As a response, the community came together to develop guidelines for several types of study design to ensure accurate and transparent reporting. A prominent example of this movement was in clinical trials. At the time, it was demonstrated that poorly reported trials were associated with bias in estimating treatment effects&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/7823387/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. To address this, stakeholders co-developed the Consolidated Standards of Reporting Trials (CONSORT) statement in 1996. CONSORT would then become one of the earliest reporting guidelines, initiating a cascade of changes and improvements to the reporting of medical research in scientific journals&lt;sup&gt;&lt;a href=&quot;https://www.hindawi.com/journals/isrn/2016/1346026/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;As it turns out, developing these guidelines is no easy feat. These efforts often involve large steering committees of transdisciplinary experts: academic faculty, researchers, practitioners, policy makers, and patient groups. After extensive meetings and voting on candidate components, the guideline is delivered in the form of a checklist together with a statement paper describing the development process&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/bjc2017407&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This checklist represents a consensus-based minimal set of 15 to 30 items that these experts have determined should be reported in a study&lt;sup&gt;&lt;a href=&quot;https://onlinelibrary.wiley.com/doi/full/10.1002/hsr2.165&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“Readers should not have to infer what was probably done, they should be told explicitly.&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2352018/?page=1&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While the ultimate goal is ensuring high-quality, transparent, and complete scientific communication, virtually no study perfectly adheres to a given guideline. As such, these guidelines are not meant to be strictly followed, but rather used as an advisory mechanism supporting authors as they develop their research. They are also designed to assist editors, peer reviewers, and general readership in understanding, interpreting, and critically appraising the findings&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/32908284/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Many journals have since required authors to submit completed checklists indicating where each item has been reported.&lt;/p&gt;
&lt;h2&gt;Guidelines for Medical AI&lt;/h2&gt;
&lt;p&gt;A growing line of AI applications in medicine has been populating scientific journals since ca. 2015. Despite the momentum, an over-inflated hype around the technology together with a reproducibility crisis&lt;sup&gt;&lt;a href=&quot;https://www.technologyreview.com/2020/11/12/1011944/artificial-intelligence-replication-crisis-science-big-tech-google-deepmind-facebook-openai/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; have both led many to question the quality and scientific rigor of the analyses being conducted. Once again, we see the community mobilizing efforts to address this. Just over the past year, we started witnessing existing guidelines being “extended” to include studies with AI components, as well as newly minted AI-specific guidelines being developed. Acknowledging medical AI research in such a manner brings much-needed legitimacy to the field and will play an important role in shaping its future. For context, it took us nearly a couple of decades to identify issues with clinical trials and take corrective measures&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41591-020-1042-x&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This time, it took us only 5 years to do the same for AI interventions. Progress.&lt;/p&gt;
&lt;p&gt;Here, we explore 10 guidelines serving four stages of a typical medical AI application lifecycle: development &amp;#x26; &lt;i&gt;in silico&lt;/i&gt; validation, intermediate clinical evaluation, randomized clinical trials, and finally, product procurement.&lt;/p&gt;
&lt;figure&gt;
&lt;img style=&quot;width:100%&quot; src=&quot;/25578af0828d02baf469d5121613cba0/artificial-intelligence-ai-guidelines-checklists-medicine-healthcare-medical-research-consort-spirit-tripod.inline.svg&quot; alt=&quot;artificial intelligence ai guidelines checklists medicine healthcare medical research consort spirit tripod&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;An overview of 10 guidelines serving four stages of a typical medical AI application lifecycle.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stage 1: Development &amp;#x26; &lt;i&gt;in silico&lt;/i&gt; Validation&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most guidelines serve this crucial proof-of-concept stage and aim to capture the nuances of developing AI applications. The argument here is that differences in terminology between existing guidelines and contemporary ML research may be the reason they are underutilized&lt;sup&gt;&lt;a href=&quot;https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30037-6/fulltext&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. For instance, while TRIPOD focused on simpler regression-based models&lt;sup&gt;&lt;a href=&quot;https://www.tripod-statement.org/wp-content/uploads/2020/01/Tripod-Checlist-Prediction-Model-Development.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, its extension TRIPOD-ML may include specifics pertaining to training and validating neural networks and associated hyperparameter tuning.&lt;/p&gt;
&lt;p&gt;Academia is no stranger to duplicated efforts, and it is unclear why multiple guidelines are needed here as they all tend to address identical ML concepts: model design, data partitioning, validation metrics, etc. This is also true for specialized guidelines such as PRIME for cardiovascular imaging. Instead of focusing on generic concepts, these specialized guidelines should focus more on what makes the speciality unique, i.e. the specific data it uses and the clinical task it addresses.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;“A cynic might be forgiven for thinking that there are now so many publication guidelines that nobody can keep track of, and that they will all sink quietly into oblivion&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/19181482/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stage 2: Intermediate Clinical Evaluation&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;While it is great to see guidelines being developed here, the mere acknowledgement of this stage is worth noting. &lt;i&gt;In silico&lt;/i&gt; AI development is a proof of concept that does not inspire sufficient confidence to run clinical trials. This intermediate clinical evaluation stage will allow for studying ergonomics and human factors by running small clinical experiments. Such a “dry run” deployment will enable rapid prototyping with user feedback in a simulated clinical environment, ultimately informing the go/no-go decision to conduct large and expensive trials. Because this stage may also allow for testing the AI’s safety profile, it has been compared to phase 1/2 trials in the drug development space&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41591-021-01229-5&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. These types of intermediate clinical experiments represent a minority share of the literature today as most tend to be &lt;i&gt;in silico&lt;/i&gt;.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stage 3: Randomized Clinical Trials&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Clinical trials are considered the gold standard for medical evidence. Trials for AI interventions are fairly new, as we have only started to see them in the past few years&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7522504/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. It is only a matter of time before they become an established research line - one that will immensely help discern hype from true clinical utility. From a regulatory perspective, trial-based evidence is urgently needed as the FDA is currently approving AI applications mainly based on preliminary evidence&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41591-018-0300-7&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;With SPIRIT-AI focusing on trial protocols and CONSORT-AI on trial results, these guidelines may help standardize the reporting of how AI interventions are administered and how AI outputs contribute to users’ decision-making. This new breed of AI trials requires additional considerations. For instance, while traditional trials report inclusion/exclusion criteria for human participants, AI trials must also report the same for input data, together with protocols for handling subpar data. While the guidelines’ authors chose not to discuss continuously updated AI models trained on new data&lt;sup&gt;&lt;a href=&quot;https://www.thelancet.com/pdfs/journals/landig/PIIS2589-7500(20)30102-3.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41591-020-1034-x&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, it will be exciting to see how future trial designs will take this into consideration.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Stage 4: Product Procurement&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are also seeing guidelines being developed beyond the academic sphere, specifically for procuring AI products. This is essential as the evaluation of medical device software with AI components differs from its generic counterparts. In addition to algorithm-related considerations, one must note that AI software consumes data to generate value as opposed to solely performing record-keeping functions (create, read, update, and delete records). To ensure maximum value from a given product, the “data profile” of an institution must match what the product is designed to work with.&lt;/p&gt;
&lt;p&gt;In addition to ECLAIR, other guidelines also exist including the NHSx AI buyer’s guide&lt;sup&gt;&lt;a href=&quot;https://www.nhsx.nhs.uk/ai-lab/explore-all-resources/adopt-ai/a-buyers-guide-to-ai-in-health-and-care/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; as well as other AI purchasing guides&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/abs/pii/S154614402031005X&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. While these offer a wishlist of information about a product, much of this information may never be made available. For instance, while it is natural for a clinician to inquire about the data used to train an AI model, vendors today do not reveal this information, nor are they legally required to do so. It will take some time for vendors to adjust to the nature of AI products and hopefully provide more transparency. While there may be only a handful of AI vendor options in each application area today, their increasing number will likely cause “vendor fragmentation” and make for a more challenging procurement process. We are already seeing this today in the EHR market&lt;sup&gt;&lt;a href=&quot;https://hitconsultant.net/2019/06/19/analysis-examining-the-highly-fragmented-international-ehr-market/#.YGdgCHVKg5k&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. In the same way we have consultants to help with procuring, designing, and implementing EHR systems&lt;sup&gt;&lt;a href=&quot;https://healthinformatics.uic.edu/blog/ehr-consultant/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, expect to see AI consultants moving forward.&lt;/p&gt;
&lt;h2&gt;The Nuances of AI Guidelines&lt;/h2&gt;
&lt;p&gt;Guidelines for medical AI research call for additional considerations beyond their more generic ancestors. Best practices of AI research are constantly evolving and so should the guidelines that accompany them. While CONSORT has been revised twice since its inception in 1996, AI guidelines may require more frequent revisions. As data plays a central role in AI applications, we are also seeing guidelines become more data-specific. AI research today is centered around computer vision and therefore most guidelines cater to imaging-based applications&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41591-020-1042-x&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. More guidelines for text and speech data are yet to be formalized. &lt;/p&gt;
&lt;p&gt;There are also limits to what AI guidelines can achieve. Some guidelines promise to encourage reproducible research through checklist items such as “complete sharing of the code” and “allow a third party to evaluate the code”&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41591-020-1041-y&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. While this may help measure transparency in the field, reproducibility challenges are often cultural (the appetite to share), computational (controlling software environments), and ethical (healthcare data privacy). It is unlikely that reporting guidelines will move the needle in that regard.&lt;/p&gt;
&lt;h2&gt;How are Guidelines &lt;i&gt;Really&lt;/i&gt; Used?&lt;/h2&gt;
&lt;p&gt;Despite their delightful acronym-based names, guidelines often fail to serve their intended advisory role. The main reason behind this: guidelines have become more journal-specific as opposed to study-specific. Your likelihood of consulting a specific guideline depends almost entirely on whether the journal you are submitting to requires it as part of their submission process. As such, authors rarely consult guidelines during the development and writing phases of research. Instead, guidelines end up being treated as an after-the-fact checklist filled in at submission time, and appended to published studies.&lt;/p&gt;
&lt;p&gt;While the logical move of incorporating guidelines into the journal submission process has helped extend their reach, it has also pushed them towards becoming part of an otherwise highly mundane process. Journal submission portals are not the most user-friendly and often involve lengthy, frustrating data entry of author names, affiliations, and other information. The guideline checklist has simply become one more item uploaded to these portals.&lt;/p&gt;
&lt;h2&gt;Adherence &amp;#x26; Accuracy&lt;/h2&gt;
&lt;p&gt;Both the adherence to and accuracy of reporting guidelines have come under scrutiny. The responsibility of adhering to guidelines falls entirely on the author(s), given that they are most familiar with the work presented. Both journal editors and peer reviewers have distanced themselves from policing the correct use of guidelines, arguing that it is a burden that falls outside their competence&lt;sup&gt;&lt;a href=&quot;https://www.jclinepi.com/article/S0895-4356(08)00354-5/fulltext&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. As a result, authors’ unfamiliarity with guidelines, the lack of a second opinion, and other external factors such as word count limits may all lead to checklists that do not reflect what is reported in the study&lt;sup&gt;&lt;a href=&quot;https://trialsjournal.biomedcentral.com/articles/10.1186/s13063-018-2475-0&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Moreover, the large variability in how journals incorporate reporting guidelines into their “instructions to authors” may cause additional confusion. These range from “please refer to” and “encourage” to “should conform” and “must be reported”&lt;sup&gt;&lt;a href=&quot;https://academic.oup.com/clinchem/article/53/11/1983/5627213&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Authors will &lt;strong&gt;always&lt;/strong&gt; prioritize fulfilling the editors’ and peer reviewers’ requests over conforming to a checklist that has no direct bearing on whether the study will be accepted for publishing.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Measuring Guideline Impact and Reach&lt;/h2&gt;
&lt;p&gt;The emergence of guidelines to inform AI research in medicine just 5 years into this relatively new field is a positive hint at its prospects. To fully understand the potential impact of these newly proposed guidelines, one must look into the track record of existing guidelines. Research into this area continues to report mixed findings. For one specific guideline, STARD, some report no meaningful differences in the quality of reporting between journals that endorse it and those that do not&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/18710977/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, while others report a small but significant improvement&lt;sup&gt;&lt;a href=&quot;https://n.neurology.org/content/67/5/792&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. The same is also true for CONSORT where one study reports “extensive misunderstandings” around guideline interpretation among journals&lt;sup&gt;&lt;a href=&quot;https://trialsjournal.biomedcentral.com/articles/10.1186/s13063-019-3173-2&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7487816/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;,  while another reports that its adoption is associated with improved reporting of trials&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/16948622/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Guideline reach today is measured by the number of journal endorsements and citations. We are lacking data on how readers interact with, interpret, and use guidelines to appraise studies. If readers fail to utilize guidelines in the intended manner, the guidelines will fail to deliver on their promises. Additionally, more work is needed to improve authors’ familiarity with the guidelines and promote their use earlier in the research journey, before the academic editorial process starts.&lt;/p&gt;
&lt;p&gt;It will be a while before we experience the impact of these guidelines. To make the most of them, they must be used in the right way&lt;sup&gt;&lt;a href=&quot;https://ada.com/editorial/what-do-the-new-consort-and-spirit-guidelines-mean-for-health-ai/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;: less as a quality evaluation form or a strict document to be followed verbatim&lt;sup&gt;&lt;a href=&quot;https://www.jclinepi.com/article/S0895-4356(07)00424-6/pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and more as an overarching guidance for research&lt;sup&gt;&lt;a href=&quot;https://onlinelibrary.wiley.com/doi/full/10.1002/hsr2.165&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. &lt;/p&gt;</content:encoded></item><item><title><![CDATA[AI Beyond the Clinic: Labs, Telemedicine, and Consumer Healthcare]]></title><description><![CDATA[Unstructured Data Silos & Protocol Deviations Healthcare systems are large and complex structures, both in terms of sheer number of…]]></description><link>https://analogintelligence.com/AI-artificial-intelligence-beyond-clinic-labs-telemedicine-consumer-healthcare/</link><guid isPermaLink="false">https://analogintelligence.com/AI-artificial-intelligence-beyond-clinic-labs-telemedicine-consumer-healthcare/</guid><pubDate>Tue, 16 Feb 2021 22:40:32 GMT</pubDate><content:encoded>&lt;h2&gt;Unstructured Data Silos &amp;#x26; Protocol Deviations&lt;/h2&gt;
&lt;p&gt;Healthcare systems are large and complex structures, both in terms of sheer number of components as well as the high levels of influence they exert on one another&lt;sup&gt;&lt;a href=&quot;https://www.sciencedirect.com/science/article/pii/S1532046411001067&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Mismanagement of such complexity may lead to systems that are difficult to predict and healthcare that is challenging to deliver: patient dissatisfaction, medical errors, and hospital-acquired infections, to name a few consequences. While some geographies have national health and single-payer systems that may help reduce complexity, many of these issues still persist.&lt;/p&gt;
&lt;p&gt;&lt;figure class=&quot;gatsby-resp-image-figure&quot; style=&quot;&quot;&gt;
    &lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 656px; &quot;
    &gt;
      &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 53.65853658536586%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAALABQDASIAAhEBAxEB/8QAFwABAQEBAAAAAAAAAAAAAAAAAAECBf/EABQBAQAAAAAAAAAAAAAAAAAAAAD/2gAMAwEAAhADEAAAAeu1Ao//xAAUEAEAAAAAAAAAAAAAAAAAAAAg/9oACAEBAAEFAl//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAEDAQE/AT//xAAUEQEAAAAAAAAAAAAAAAAAAAAQ/9oACAECAQE/AT//xAAUEAEAAAAAAAAAAAAAAAAAAAAg/9oACAEBAAY/Al//xAAZEAACAwEAAAAAAAAAAAAAAAABEQAQIUH/2gAIAQEAAT8hbHYzBSyv/9oADAMBAAIAAwAAABAQD//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQMBAT8QP//EABQRAQAAAAAAAAAAAAAAAAAAABD/2gAIAQIBAT8QP//EABwQAQACAQUAAAAAAAAAAAAAAAEAETEhQYGhwf/aAAgBAQABPxBdhTqUDnF15EpvzEG7iHCABpP/2Q==&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;Healthcare: a complex system made up of multiple disparate components with diverse stakeholders and participants. Source: NEJM Catalyst (catalyst.nejm.org)&quot;
        title=&quot;Healthcare: a complex system made up of multiple disparate components with diverse stakeholders and participants. Source: NEJM Catalyst (catalyst.nejm.org)&quot;
        src=&quot;/static/7c6475490a50ae0b1e9a5ea8504bb072/65d47/healthcare-system-complex.jpg&quot;
        srcset=&quot;/static/7c6475490a50ae0b1e9a5ea8504bb072/f1935/healthcare-system-complex.jpg 164w,
/static/7c6475490a50ae0b1e9a5ea8504bb072/e2074/healthcare-system-complex.jpg 328w,
/static/7c6475490a50ae0b1e9a5ea8504bb072/65d47/healthcare-system-complex.jpg 656w,
/static/7c6475490a50ae0b1e9a5ea8504bb072/c85c6/healthcare-system-complex.jpg 984w,
/static/7c6475490a50ae0b1e9a5ea8504bb072/a18ac/healthcare-system-complex.jpg 1208w&quot;
        sizes=&quot;(max-width: 656px) 100vw, 656px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
    &lt;/span&gt;
    &lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;&lt;p&gt;Healthcare: a complex system made up of multiple disparate components with diverse stakeholders and participants. Source: NEJM Catalyst (catalyst.nejm.org)&lt;/p&gt;&lt;/figcaption&gt;
  &lt;/figure&gt;&lt;/p&gt;
&lt;p&gt;This complexity is naturally reflected in the system’s digital footprint, i.e. its data. Healthcare data of all sorts (administrative, claims, clinical, etc.) is messy, sparse, and incomplete, to say the least. Consider a given cancer patient’s clinical data. Much of it will lie scattered across the pathology lab where her biopsy specimen was analyzed, the radiology department where she got a mammogram, the radiation oncology center where she received radiotherapy, and many others. These are the “healthcare data silos” we read about everywhere. The system is so large that instead of scrapping it and starting over, we are heaping complexity on complexity&lt;sup&gt;&lt;a href=&quot;https://ianmorrison.com/the-incredible-and-wasteful-complexity-of-the-us-healthcare-system/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; by offering data janitoring as a service. A great example is Flatiron - recently acquired by Roche&lt;sup&gt;&lt;a href=&quot;https://www.roche.com/media/releases/med-cor-2018-02-15.htm&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; - which generates value by aggregating and curating cancer data, utilizing an army of mechanical turks who painstakingly go through the records and extract the relevant details&lt;sup&gt;&lt;a href=&quot;https://www.forbes.com/sites/davidshaywitz/2018/02/18/the-deeply-human-core-of-roches-2-1b-tech-acquisition-and-why-they-did-it/?sh=302bc0c629c2&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. &lt;/p&gt;
&lt;p&gt;Clinical workflow is another area where this complexity manifests itself. A wide gap exists between clinical protocols and actual clinical practice&lt;sup&gt;&lt;a href=&quot;https://sjtrem.biomedcentral.com/articles/10.1186/1757-7241-21-9&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This is most evident in emergency care, where protocol deviations are commonplace and the urgency to treat patients takes priority&lt;sup&gt;&lt;a href=&quot;https://bmjopen.bmj.com/content/8/11/e017572&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Primary and speciality care settings are also not immune to these issues. Medical professionals will often get interrupted by phone calls and pager pings as they perform certain tasks - interruptions that have been associated with medication administration errors, among other harms&lt;sup&gt;&lt;a href=&quot;https://jamanetwork.com/journals/jamapediatrics/article-abstract/2757364&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Even patients themselves often have trouble adhering to recommended treatment regimens, negatively impacting the quality of care provided&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1661624/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. &lt;/p&gt;
&lt;h2&gt;AI in the Clinic&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Given this status quo, one can only imagine the amount of friction AI tools will face as they work their way into the clinic. &lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Friction happens first on the data front, and deep learning has only contributed to this. We’ve covered “expert systems” in a previous &lt;a class=&quot;article-link&quot; href=&quot;https://analogintelligence.com/pre-2012-startups-AI-products-across-medical-specialities/&quot; target=&quot;_blank&quot; rel=&quot;noreferrer&quot;&gt;article&lt;/a&gt; - the predecessors to “AI in healthcare” as we know it today. These systems used machine learning methods that worked with relatively little data, but they were tedious, proprietary, and required deep expertise. From a deployment standpoint, they had limited clinical utility, did not generalize very well to new patient populations, and ended up throwing off users when they failed. Today, these methods have been replaced by deep learning, which brought improved performance, an increased appetite for data, and over a dozen well-supported and documented open-source libraries for model development and deployment&lt;sup&gt;&lt;a href=&quot;https://www.kdnuggets.com/2018/11/top-python-deep-learning-libraries.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. &lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/c583f686e47b1e8c44e9e2c35beb3eba/ai-artificial-intelligence-clinic-bottleneck-challenges-implementation.svg&quot; alt=&quot;ai-artificial-intelligence-clinic-bottleneck-challenges-implementation&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;The bottlenecks in bringing AI tools to the clinic, then and now. We no longer need PhDs to develop AI tools - thanks to PyTorch and TensorFlow - but we need far more data, and higher-quality data means better performance. Challenges in clinical integration remain.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In essence, deep learning has shifted the bottleneck from the methods to the data. In doing so, it made healthcare data engineering a core component of any clinical AI product. This engineering must deal with unstructured data: most healthcare data are unstructured, and that is where the untapped value lies&lt;sup&gt;&lt;a href=&quot;https://link.springer.com/chapter/10.1007/978-981-32-9949-8_22&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Clinical notes are a prime example: inconsistent terminologies, shorthands, and abbreviations reflective of individuals’ training make the notes unusable as-is. The engineering must also aggregate data from multiple sources as required by the modelling problem, and address interoperability along the way. No wonder an estimated 60% of data science and machine learning work today is pure data curation&lt;sup&gt;&lt;a href=&quot;https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/?sh=4ce086cb6f63&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
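&lt;p&gt;As a flavor of what this curation involves, below is a minimal sketch of one step: expanding the shorthands that make clinical notes unusable as-is. The abbreviation map is a toy assumption for illustration only; real pipelines draw on curated clinical vocabularies and context-aware disambiguation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import re

# Hypothetical shorthand map - for illustration only.
ABBREVIATIONS = {
    'pt': 'patient',
    'hx': 'history',
    'sob': 'shortness of breath',
    'htn': 'hypertension',
}

def normalize_note(note):
    # Lowercase the note, split into alphanumeric tokens, expand shorthands.
    tokens = re.findall(r'[a-z0-9]+', note.lower())
    return ' '.join(ABBREVIATIONS.get(token, token) for token in tokens)

print(normalize_note('Pt has hx of HTN, presents with SOB'))
# patient has history of hypertension presents with shortness of breath
&lt;/code&gt;&lt;/pre&gt;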
&lt;p&gt;Friction then happens on the clinical implementation front, where weak adherence to clinical protocols presents major challenges. How can AI seamlessly integrate with an unpredictable and constantly changing clinical workflow? Technology in healthcare has been notorious for laborious data entry, irrelevant information overload, unactionable alerts, and unfriendly user experiences: electronic health records (EHRs) are a notable example&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4739443/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. We cannot yet claim AI will be any different. One can argue that integration today is more challenging than it was a decade ago, given the bad reputation clinical expert systems have given AI tools generally, as well as the increased “digital clutter” and endless software that clinicians must interact with. &lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;In the short term, successful clinical AI tools will be those deployed in the background - performing silent tasks with little to no interruptions to clinical workflow.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;AI Beyond the Clinic&lt;/h2&gt;
&lt;p&gt;While discussions around “AI in healthcare” tend to focus on the providers’ front-of-house where clinical care is delivered, deploying AI in the background requires moving beyond this traditional point-of-care and onto other areas within the broader healthcare system. This can happen at three levels as we move further away from the clinic. &lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/de8840ce3369b0299d3d62bdfc783e1c/ai-artificial-intelligence-beyond-clinic-consumer-healthcare-telemedicine-telehealth--.svg&quot; alt=&quot;ai-artificial-intelligence-beyond-clinic-consumer-healthcare-telemedicine-telehealth.&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;Three levels beyond the traditional point-of-care offering increasingly more stable conditions for AI development and deployment.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Level 1: Provider Back-of-house&lt;/strong&gt;&lt;br&gt;
The first level lies within the provider walls but operates in the back-of-house. Examples include radiology reading rooms, labs, and other environments that are free from the intricacies that come with patient interactions, scheduling, management, and so on. Healthcare professionals in these settings are assigned specific tasks to be carried out on specific data types. This well-defined and limited scope of services provides stable conditions for AI model development and deployment. For instance, an AI model for identifying a specific abnormality in a CT image can run automatically right after image acquisition, with its results presented alongside the image to the radiologist (a minimal sketch of this pattern follows this list). Most AI studies and implementation efforts today operate at this level.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Level 2: Service Vendors&lt;/strong&gt;&lt;br&gt;
The second level goes beyond the provider and onto other organizations that do business with providers, collectively known as vendors. More specifically, vendors that provide services are very well positioned to utilize AI tools in their workflows. Examples of these include laboratory testing services for blood, tissue, and other clinical specimens. A highly operational lab with good control over protocols often enjoys high data cleanliness and workflow adherence levels. Another category here includes tele-health vendors that remotely help extend the hospitals’ capacity or scope of services. In a previous &lt;a class=&quot;article-link&quot; href=&quot;https://analogintelligence.com/artificial-intelligence-ai-startups-pathology-venture-meta-review-analysis/&quot; target=&quot;_blank&quot; rel=&quot;noreferrer&quot;&gt;article&lt;/a&gt;, we looked at the growing demand for tele-pathology as a result of the COVID-19 pandemic. By outsourcing these services, we are effectively creating an isolated sandbox within which AI can be deployed and where risks can be more tightly managed.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Level 3: Direct-to-patient Healthcare&lt;/strong&gt;&lt;br&gt;
The third level is where the traditional provider is replaced by direct-to-patient healthcare services that have recently grown in popularity and present a new mode of healthcare delivery. These come in the form of tele-medicine services that offer consultations through the web, especially for routine check-ups that can be administered remotely. We are already seeing AI-powered chatbots being used to triage tele-medicine patients prior to the virtual consultation (e.g. Ada&lt;sup&gt;&lt;a href=&quot;https://ada.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, Buoy&lt;sup&gt;&lt;a href=&quot;https://www.buoyhealth.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and Babylon&lt;sup&gt;&lt;a href=&quot;https://www.babylonhealth.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;). Another category includes testing services on samples that are collected and sent in by patients themselves (e.g. Paloma for thyroid function tests&lt;sup&gt;&lt;a href=&quot;https://www.palomahealth.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; and Steady for diabetes monitoring&lt;sup&gt;&lt;a href=&quot;https://steady.health/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and Nurx for COVID-19&lt;sup&gt;&lt;a href=&quot;https://news.crunchbase.com/news/telehealth-startup-nurx-rolling-out-at-home-test-for-covid-19/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;). Similar to labs that work with providers, the operational nature of these services makes them ideal for AI interventions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
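&lt;p&gt;As a minimal sketch of the Level 1 pattern above, consider a background worker that picks up newly acquired images, runs a model, and stores the prediction alongside the image for the radiologist’s viewer. Everything here is an assumption for illustration: the inbox path and run_abnormality_model are hypothetical stand-ins, and a production system would hook into the PACS rather than poll a folder.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import json
from pathlib import Path

INBOX = Path('/data/ct_inbox')  # hypothetical drop folder fed by the scanner

def run_abnormality_model(image_path):
    # Placeholder: a real model would read the pixels and return its output.
    return {'finding': 'suspected_abnormality', 'score': 0.91}

def process_new_studies():
    # Save each prediction next to its image so the viewer can load both.
    for image_path in INBOX.glob('*.dcm'):
        result_path = image_path.with_suffix('.json')
        if result_path.exists():
            continue  # already processed - stay silent, add no friction
        result_path.write_text(json.dumps(run_abnormality_model(image_path)))

process_new_studies()
&lt;/code&gt;&lt;/pre&gt;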
&lt;h2&gt;Tailwinds &amp;#x26; Headwinds&lt;/h2&gt;
&lt;p&gt;While service providers at each of these levels may appear distinct, they have much in common. They offer a specialized and clearly defined scope of services. They operate at the periphery and do not provide care directly - at least not in the traditional sense. They do not interact with patients - at least not physically. Their day-to-day business operations (receiving information, processing it, and sending out results), combined with relatively clean data and high adherence to protocols, all make for a reliable AI development and deployment environment. Additionally, this assembly line-like process often comes with tight quality control and assurance standards. This may make for safer AI deployments, especially in cases of misinterpretation by users or even outright model failure.&lt;/p&gt;
&lt;p&gt;That said, we must not forget that being highly operational can also translate to stifled innovation: why change a system that works, especially one that is rigid and expensive to upgrade? In such a case, the value proposition of AI tools - whether cost savings, faster turnaround times, or quality improvement - must be significant enough to drive change. Data governance and privacy can also become bottlenecks if the data is to be used to develop AI tools. While traditional providers are often regarded as “legal custodians” of their patients’ data&lt;sup&gt;&lt;a href=&quot;https://www.forbes.com/sites/forbestechcouncil/2018/04/23/who-really-owns-your-health-data/?sh=47639c096d62&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, regulations for the ownership and protection of patient data collected by third-party healthcare providers are unclear&lt;sup&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=saNeGk9GY6A&amp;#x26;t=3s&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Some legacy labs and tele-health companies - those that have not changed their business models for decades - may struggle to allocate the right resources and attract the right talent to implement AI tools in their operations. This drives them to hire consultants or enter into strategic collaborations, e.g. the equity partnership between Indian radiology AI startup DeepTek and Japanese tele-radiology company Doctor-NET&lt;sup&gt;&lt;a href=&quot;http://www.businessworld.in/article/Health-tech-startup-DeepTek-forms-strategic-equity-partnership-with-Doctor-NET-Inc-Japan-s-largest-teleradiology-company/01-09-2020-315327/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2&gt;AI in Consumer Healthcare&lt;/h2&gt;
&lt;p&gt;Direct-to-patient services are part of a much larger trend: consumer healthcare. It is no surprise that healthcare has one of the lowest customer satisfaction scores across many industries&lt;sup&gt;&lt;a href=&quot;https://www.retently.com/blog/good-net-promoter-score/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and a consumer-first experience promises to change that&lt;sup&gt;&lt;a href=&quot;https://nbt.substack.com/p/the-consumerization-of-healthcare?utm_campaign=post&amp;#x26;utm_medium=web&amp;#x26;utm_source=copy&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Such efforts tend to combine physical and digital channels into holistic patient experiences&lt;sup&gt;&lt;a href=&quot;https://news.crunchbase.com/news/forecast-health-care-in-2021-will-focus-on-digitization-of-the-patient-experience/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; through bedside, and now “webside”, healthcare&lt;sup&gt;&lt;a href=&quot;https://www.ama-assn.org/practice-management/digital/succeed-telehealth-know-your-webside-manner&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Examples include membership-based primary care practices such as One Medical&lt;sup&gt;&lt;a href=&quot;https://www.onemedical.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; (which went public in 2020), Forward&lt;sup&gt;&lt;a href=&quot;https://goforward.com/how-it-works&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and Carbon Health&lt;sup&gt;&lt;a href=&quot;https://carbonhealth.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;From an AI application standpoint, this is perhaps the most appropriate context. Consumer healthcare products often claim to be “greenfield projects”, i.e. starting from a blank slate without the constraints of existing systems. This entails rethinking healthcare delivery from the ground up, and thereby creates opportunities for implementing AI solutions from day one, or at least building the right provisions for future work. We often see a mismatch between modern AI products and outdated healthcare IT. Consider, for instance, offering a cloud-based AI service only to realize that much of healthcare data is still stored within hospital firewalls (on-prem). This mismatch is likely to erode with modern healthcare technology stacks that hopefully generate higher-quality data out of the box. Relying on customer (i.e. patient) satisfaction and quality of care as performance metrics may drive greater adherence to clinical workflows, another key enabler of successful AI implementation.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Instead of attempting to tame a somewhat irrational and outdated system, a patient-first approach may help structure and modernize healthcare while making it more amenable to new technologies generally, and AI specifically.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Disaggregation of Healthcare&lt;/h2&gt;
&lt;p&gt;Narrow AI applications in healthcare today - those that perform a very specific task on a specific data type - require an equally narrow and well-contained context for successful implementation. It is more likely that these ideal contexts will exist beyond the walls of traditional providers. Despite increased consolidation in healthcare&lt;sup&gt;&lt;a href=&quot;https://www.healthaffairs.org/doi/10.1377/hlthaff.2020.00017&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; (a few large healthcare systems buying up smaller ones), we are also seeing the disaggregation of the hospital&lt;sup&gt;&lt;a href=&quot;https://www.linkedin.com/pulse/disaggregation-hospital-counterintuitive-opportunity-amit-garg/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; with healthcare outsourcing on the rise&lt;sup&gt;&lt;a href=&quot;https://www.biospace.com/article/healthcare-outsourcing-market-increasing-pressure-to-reduce-rising-healthcare-costs-is-likely-to-propel-the-market/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This may translate to more vendors, more specialized services, and more opportunities for AI applications to be bundled into these services. As for current efforts aimed at integrating AI products at or close to the point-of-care, barriers will remain until a major system overhaul takes place.&lt;/p&gt;
&lt;p&gt;An abnormal finding is detected on a mammogram during routine annual screening. To investigate further, the physician orders a biopsy and a tissue sample is extracted from the suspicious area. The sample is sent out to the pathology lab where technicians start the manual process of sample preparation. This entails fixating the sample, embedding it into paraffin, and sectioning it into thin slices. The slices are stained with chemicals to colorize and highlight certain cells and tissue types. A pathologist will then view it under a microscope. If cancerous, the appearance of the abnormal cells and their spread will be assessed among other features. All this information is captured in a report that is communicated back to the physician. This entire process from biopsy to diagnostic report will take an average of 10 days in the US&lt;sup&gt;&lt;a href=&quot;https://www.cancer.gov/about-cancer/diagnosis-staging/diagnosis/pathology-reports-fact-sheet&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, a nerve-racking period during which our patient anxiously waits to hear back about her breast cancer diagnosis.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/bce179b016e4fb476db8c2b659710e38/pathology-workflow.svg&quot; alt=&quot;pathology workflow&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;The pathology workflow from biopsy to image interpretation. After the slides are prepared, they are either viewed under microscopes, or scanned and digitized for viewing on computer monitors. Most labs today still do the former.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;Pathology, Pathologists, and Slides&lt;/h2&gt;
&lt;p&gt;Pathology is a branch of medical science that determines the presence and extent of a disease. This is done through visually evaluating how healthy tissue is perturbed by pathological processes&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41551-019-0414-3&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. In fact, the way this evaluation is conducted has seen very little change over the past 150 years. This includes everything from sample preparation methods to viewing slides under an optical microscope. For instance, the haematoxylin and eosin (H&amp;#x26;E) stain combination - the most widely used stain and often considered a gold standard&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/3700551&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; - has not changed since it was first introduced in 1876&lt;sup&gt;&lt;a href=&quot;https://www.tandfonline.com/doi/abs/10.1080/10520290310001633725&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. These well-established and trusted pathology protocols continue today to serve as the first diagnosis point for most cancers and other diseases, while complementing multiple areas of study including necrosis (cell death), inflammation, and wound healing. Pathology also extends beyond tissue to examine bodily fluids (e.g. blood, urine) and the whole body through autopsies&lt;sup&gt;&lt;a href=&quot;https://www.mcgill.ca/pathology/about/definition&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Pathologists are often referred to as “the doctor’s doctor” in reference to how diagnoses are first assessed by pathologists and then reported to other physicians&lt;sup&gt;&lt;a href=&quot;https://healthydebate.ca/opinions/cancer-diagnosis-pathologist&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. There are roughly 21,000 pathologists in the US&lt;sup&gt;&lt;a href=&quot;https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2768341&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; (5.7 per 100,000&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/23738764/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;), while China has a similar number serving four times the population&lt;sup&gt;&lt;a href=&quot;https://globalhealthi.com/2017/10/19/china-pathologist-shortage/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. These numbers point to a growing shortage (an 18% decrease from 2007 to 2017 in the US&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6547243/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;), especially as cancer cases rise and many pathologists approach retirement. A smaller workforce handling more cases&lt;sup&gt;&lt;a href=&quot;http://www.global-engage.com/life-science/are-we-short-of-pathologists/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; is reflected in longer lab turnaround times and associated physician burnout - which can have dire consequences for diagnostic accuracy. Pathologists also often serve as “lab administrators”, and an overworked pathologist may fall short in carrying out this role&lt;sup&gt;&lt;a href=&quot;https://www.diagnostichistopathology.co.uk/article/S1756-2317(16)30084-6/abstract&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;In addition to workforce shortages, the general subjectivity of image interpretation is another characteristic of everyday pathology workflows. As is the case with other image-based medical specialities (e.g. radiology), slide interpretation relies heavily on the pathologist’s experience and, to some extent, even their mental state. Diagnostic disagreements among pathologists are not uncommon and can reach a rate of 11%, with difficulties distinguishing disagreements from errors&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/16416737/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Another aspect of pathology images is the sheer number of slides per sample that must be examined, making for a cumbersome and error-prone process. A colectomy (surgical removal of the colon) sample may generate dozens of slides, significantly increasing the risk of missing important findings&lt;sup&gt;&lt;a href=&quot;https://med.nyu.edu/pathology/patient-care/patients/pathway-specimen-diagnosis&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Obtaining a second opinion on a given case is only feasible if reviewers are physically in the same lab to view the slide under the microscope. And as glass slides are archived and go into long-term storage, it becomes extremely difficult for pathologists to compare cases and identify similarities across samples and cohorts.&lt;/p&gt;
&lt;h2&gt;Going Digital &amp;#x26; Whole Slide Imaging&lt;/h2&gt;
&lt;p&gt;Digital pathology has continuously promised to revolutionize the pathologist’s workflow for the past 30 years&lt;sup&gt;&lt;a href=&quot;https://www.pharmavoice.com/wp-content/uploads/wp-ACM-11142017.pdf?tracker_id=1521590400084&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This shift will ultimately address many of the aforementioned issues and enable functionalities that we take for granted today when manipulating digital data, e.g. annotating images with text descriptions or sending images to remote experts for a second opinion. Despite these improvement opportunities, labs have been slow to ditch their microscopes for slides displayed on computer monitors. Many factors have been cited as reasons behind this lag, including high digital migration costs, data storage requirements, as well as the need to retrain personnel&lt;sup&gt;&lt;a href=&quot;https://www.clinicallabmanager.com/trends/clinical-pathology/legal-and-regulatory-hurdles-in-digital-pathology-and-telepathology-21817&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Today, some estimate that 20% and 1% of US labs use digital pathology for secondary and primary diagnosis, respectively&lt;sup&gt;&lt;a href=&quot;https://www.clinicallabmanager.com/trends/clinical-pathology/legal-and-regulatory-hurdles-in-digital-pathology-and-telepathology-21817&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. In addition to streamlining workflows, going digital is also a prerequisite for any subsequent computational image analysis pipeline, including AI-based tools that require large amounts of digital data for model development as well as in production. This underwhelming uptake in digitization makes it rather challenging to envision how AI will generally impact pathology workflows. It is also often cited as a reason behind the limited applications of computational pathology (the “omics” or “big data” approach to pathology)&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4996078/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Whole Slide Imaging (WSI) systems are the main enablers of digital pathology. Earlier versions from the 1970s started by displaying microscope images on old-style cathode-ray tube TVs. They eventually matured as cameras became an integral part of digital microscopes, and images - or virtual slides - were sent directly to computers. Today, WSI systems comprise image acquisition scanners as well as display and management software, together with associated communication and storage systems. Only in 2017 did the FDA approve its first ever WSI system for primary diagnosis, developed by Philips&lt;sup&gt;&lt;a href=&quot;https://www.fda.gov/news-events/press-announcements/fda-allows-marketing-first-whole-slide-imaging-system-digital-pathology&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, with a second approval in 2019 for Leica&lt;sup&gt;&lt;a href=&quot;https://www.accessdata.fda.gov/cdrh_docs/pdf19/K190332.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. While more are expected, it is clear that WSI systems are still in their infancy, especially as the FDA itself is still developing the evaluation criteria for these devices&lt;sup&gt;&lt;a href=&quot;https://www.fda.gov/medical-devices/cdrh-research-programs/assessment-digital-pathology&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. It is also clear that digitization and WSI system adoption rates ultimately need to reach critical mass to support an AI ecosystem.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/5ccf9bd10b2f5fcbc49579b5a6d9f442/pathology-startups-timeline.svg&quot; alt=&quot;AI pathology startups timeline&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;A timeline of major events related to digital pathology, associated deep learning applications, as well as the founding of 16 startups analyzed here.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;AI Solutions for Pathology&lt;/h2&gt;
&lt;p&gt;The first studies to apply deep learning to WSI data appeared in 2015&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/?term=deep+learning+whole+slide+imaging&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Around the same time, many incumbents that previously provided image analysis pipelines for pathology slides rebranded, including Visiopharm&lt;sup&gt;&lt;a href=&quot;https://visiopharm.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; with an “app store”-like offering&lt;sup&gt;&lt;a href=&quot;https://visiopharm.com/app-center/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, ContextVision&lt;sup&gt;&lt;a href=&quot;https://www.contextvision.com&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, Indica Labs&lt;sup&gt;&lt;a href=&quot;https://www.indicalab.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and Aira Matrix&lt;sup&gt;&lt;a href=&quot;https://www.airamatrix.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Concurrently, multiple startups with AI as a key differentiator started to surface. The 16 startups analyzed here (founded 2013 onwards) work within 3 interconnected areas. The first operates at the laboratory level and ultimately enables the other two: clinical decision support and research &amp;#x26; development.
&lt;br&gt;&lt;br&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Laboratory Operations&lt;/strong&gt;&lt;br&gt;
AI applications in this area tend to focus on increasing lab efficiency, quality control, and image management. Being highly operational, these applications are perhaps the least exciting of the three. Nevertheless, they are likely to have the greatest and most immediate impact in the short term. They are also often advertised as “workflow tools”, which may help bypass regulatory roadblocks in some jurisdictions. Examples include automated detection algorithms that prioritize and triage cases, highlight regions of interest as images are examined, or run tedious tasks such as cell counting (a toy version of which is sketched after this list). Image similarity algorithms may also be used to index and search images for certain patterns. Some startups in this area include Proscia, working on “driving efficiency in high-volume labs”&lt;sup&gt;&lt;a href=&quot;https://proscia.com/ai/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, Deciphex, with a focus on “triage not diagnosis”&lt;sup&gt;&lt;a href=&quot;https://www.deciphex.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, as well as Techcyte, serving niche veterinary pathology labs&lt;sup&gt;&lt;a href=&quot;https://techcyte.com/2019/01/22/1249/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.
&lt;br&gt;&lt;br&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clinical Decision Support&lt;/strong&gt;&lt;br&gt;
This area focuses on the pathologist’s core clinical tasks: diagnosis and characterization. These may include classification models to identify malignant cells, predict their histology, and grade them based on how differentiated they are from surrounding healthy tissue. Early applications will start by providing a second opinion to pathologists, and it will likely be a while before they become fully autonomous. As in laboratory operations, these applications are also pathologist-facing, hinting at the importance of how and where they are integrated into the workflow. Some startups in this area include Paige with a focus on prostate cancer diagnostics&lt;sup&gt;&lt;a href=&quot;https://www.medtechdive.com/news/philips-to-offer-paigeais-prostate-cancer-detection-tech/568514/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, as well as Qritive with the integration of pathology imaging with electronic health records (EHR)&lt;sup&gt;&lt;a href=&quot;https://www.sginnovate.com/investments/qritive&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.
&lt;br&gt;&lt;br&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Research &amp;#x26; Development&lt;/strong&gt;&lt;br&gt;
AI applications in this area are geared towards developing imaging biomarkers, i.e. identifying features of an image relevant to a given outcome. Analysis of these features can help in clinical trial recruitment, providing more tailored treatments, and developing companion diagnostics (tests that are co-developed with a drug to aid in selecting patients for treatment with that particular drug). These are perhaps the most exciting applications of the technology and, expectedly, the furthest behind from a translational standpoint. Startups must often partner with pharmaceutical companies or contract research organizations (CROs) to collaborate on these research projects and gain access to clinical trial data for model development. Some startups in this area include Deep Lens with patient-trial matching at time of diagnosis&lt;sup&gt;&lt;a href=&quot;https://www.deeplens.ai/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, Aignostics with companion imaging diagnostics&lt;sup&gt;&lt;a href=&quot;https://www.bihealth.org/en/notices/precise-diagnostics-with-ai-bih-digital-health-accelerator-guides-aignostics-gmbh-through-its-spin-off/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, as well as Nucleai, who are working on imaging biomarkers for immunotherapy response prediction&lt;sup&gt;&lt;a href=&quot;https://www.nucleaimd.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/li&gt;
&lt;/ul&gt;
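&lt;p&gt;As promised above, here is a toy version of cell counting: threshold a grayscale patch, clean up the mask, and count connected components. This is a minimal sketch using scikit-image; production tools add stain deconvolution, watershed splitting of touching nuclei, and per-lab calibration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
from skimage import filters, measure, morphology

def count_nuclei(gray_patch):
    # Nuclei stain darker than background, so keep below-threshold pixels.
    threshold = filters.threshold_otsu(gray_patch)
    mask = np.less(gray_patch, threshold)
    # Drop tiny specks, then count the remaining connected components.
    mask = morphology.remove_small_objects(mask, min_size=30)
    labels = measure.label(mask)
    return int(labels.max())
&lt;/code&gt;&lt;/pre&gt;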
&lt;figure&gt;
&lt;img src=&quot;/27843065c0ade5abf7605fa2aa23dfe6/ai-pathology-startups-review-all-layers-2.svg&quot; alt=&quot;AI pathology startups review&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;A ternary plot with axes for 3 AI focus areas: laboratory operations, clinical decision support, and research &amp;amp; development. The placement of each startup corresponds with its perceived focus area.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;It is no surprise that most AI solutions cater to laboratory operations. Labs are a logical starting point with immediate, tangible needs. Startups can start there and ultimately grow to provide either diagnostic models or R&amp;#x26;D tools. Additionally, some low-hanging AI tasks in this area (e.g. cell detection) can be adequately performed by traditional machine learning methods and do not require deep learning with its large training data requirements. Clinical decision support comes in second place, with higher-level applications often associated with more risk. If pathologists do not use AI to triage cases and run tedious tasks, they are unlikely to use it for diagnosis. Finally, the R&amp;#x26;D area remains the least explored. Imaging data paired with patient outcomes is perhaps the most scarce, and some pharma companies choose to develop these technologies in-house as they often have the resources needed. &lt;/p&gt;
&lt;p&gt;As startups get closer to the center, especially between clinical decision support and R&amp;#x26;D, there may be interesting opportunities to connect providers with pharma. In one direction, these startups could provide pharma with real world data and evidence, both needed to accelerate drug development and inform clinical trial design&lt;sup&gt;&lt;a href=&quot;https://pharma.elsevier.com/pharma-rd/why-is-the-pharma-industry-buzzing-about-real-world-evidence/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. In the other direction, they would also be in a unique position to envision how the research they conduct may one day be implemented in the clinic.&lt;/p&gt;
&lt;h2&gt;The Platform and its Auxiliary Services&lt;/h2&gt;
&lt;p&gt;There is no place for standalone AI algorithms in clinical pathology. Given the infancy of digital pathology, much of the infrastructure needed to contain and serve them does not exist. While this may alleviate clinical integration headaches (healthcare IT is notoriously outdated), it puts startups in a unique position to establish their own turnkey solutions. As a result, most offerings are centered around workflow management software, not the AI as often advertised. Assuming a lab has gone digital, startups will provide a software product that comprises an image viewer for day-to-day case reviews, and may include report generation, telepathology, and collaboration functionalities. AI components are then offered as “add-ons” or productivity tools. Because digitized pathology images capture very high levels of magnification, they are relatively large: roughly 0.1GB for a 3-dimensional CT scan vs 3GB for a single pathology slide&lt;sup&gt;&lt;a href=&quot;https://sectramedical.blob.core.windows.net/uploads/2017/09/introduction-to-the-dicom-standard-digital-pathology.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Startups also offer cloud storage to handle this data. Finally, we have the image acquisition hardware for scanning and digitizing glass slides. Startups will work towards being scanner-agnostic, but they will not attempt to make their own. While these scanners were traditionally developed by more established vendors (e.g. Philips&lt;sup&gt;&lt;a href=&quot;https://www.usa.philips.com/healthcare/product/HCNOCTN442/ultra-fast-scanner-digital-pathology-slide-scanner&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, Huron Digital Pathology&lt;sup&gt;&lt;a href=&quot;https://www.hurondigitalpathology.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;), there are new players (e.g. Morphle&lt;sup&gt;&lt;a href=&quot;https://www.morphle.in/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;) in this area, providing labs with more options while also contributing to vendor fragmentation.&lt;/p&gt;
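&lt;p&gt;A quick back-of-envelope calculation shows where multi-gigabyte slides come from. The dimensions below are assumptions for illustration, not measurements from any particular scanner.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# A 20 mm x 15 mm tissue section scanned at 0.25 micron/pixel (a common 40x setting):
width_px = 20_000 / 0.25    # 80,000 pixels
height_px = 15_000 / 0.25   # 60,000 pixels
raw_bytes = width_px * height_px * 3        # RGB, one byte per channel
print(raw_bytes / 1e9)      # 14.4 GB uncompressed
print(raw_bytes / 1e9 / 5)  # roughly 3 GB at a modest 5:1 compression ratio
&lt;/code&gt;&lt;/pre&gt;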
&lt;figure&gt;
&lt;img src=&quot;/18eabf2efb5093ed79fa4d5c996a84aa/ai-pathology-startup-offerings-framework.svg&quot; alt=&quot;AI pathology startup offerings framework&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;A general diagram depicting the platform built by AI startups in pathology. It comprises a workflow management software and cloud storage that act as infrastructure for running AI models. Startups will integrate with image acquisition scanners, and may provide some auxiliary services.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;In addition to the platform and its components, we are also seeing auxiliary services being offered by some startups.
&lt;br&gt;&lt;br&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Do-it-yourself AI&lt;/strong&gt;&lt;br&gt;
This functionality provides users with simple annotation tools that allow them to develop their own models. This concept is not entirely new and is offered by some open-source software. For instance, CellProfiler&lt;sup&gt;&lt;a href=&quot;https://cellprofiler.org/home&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; allows you to “annotate” a few examples which are then propagated across unseen images to perform tasks such as detecting and counting cells. While these tools may be fitting for research contexts where tweaks to AI models are often needed, it is unlikely they would work in a clinical lab setting. In order to understand how they work and where they fail, these tools require an upfront investment of time and energy. Given the time constraints in clinical labs, this investment is unlikely to be made. Examples of startups offering these software features include Aiforia&lt;sup&gt;&lt;a href=&quot;https://www.aiforia.com/aiforia-create&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, deepathology&lt;sup&gt;&lt;a href=&quot;https://deepathology.ai/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and Deciphex&lt;sup&gt;&lt;a href=&quot;https://patholytix.com/news-and-events/deciphex-are-delighted-to-announce-the-launch-of-patholytix-ai-1-5&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.
&lt;br&gt;&lt;br&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Slide Digitization&lt;/strong&gt;&lt;br&gt;
To capitalize on the sheer number of archived glass slides accumulated over the past decades, as well as new slides prepared daily, a market for slide digitization has emerged. Startups including Medmain&lt;sup&gt;&lt;a href=&quot;https://en.medmain.com/pressrelease/1550/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, Deciphex&lt;sup&gt;&lt;a href=&quot;https://deciphex.com/deciphex-open-us-office-in-chicago.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and Qritive&lt;sup&gt;&lt;a href=&quot;https://www.qritive.com/services&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; offer a service to digitize and store virtual slides. This positions them to curate very valuable repositories of retrospective slides, in addition to controlling the pipelines carrying future incoming data. Other companies (e.g. HistoWiz&lt;sup&gt;&lt;a href=&quot;https://home.histowiz.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;) are purely focused on digitization: you ship in slides and view them digitally a few days later. On one hand, this enables opportunities for automated image analysis services, the results of which can be delivered to labs without any interruptions to clinical workflow. On the other hand, digitization-as-a-service may not be viable in the long term as more and more labs digitize their own slides.
&lt;br&gt;&lt;br&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Academic Offerings&lt;/strong&gt;&lt;br&gt;
The educational pathology sector was one of the earliest adopters of digitization, serving residents as well as practitioners through continuing professional development programs&lt;sup&gt;&lt;a href=&quot;https://www.leicabiosystems.com/knowledge-pathway/digital-pathology/#Digital_Pathology_Has_Already_Helped_Revolutionize_Education&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Some startups offer services that allow educators to standardize course and exam materials, as well as improve accessibility by delivering content over the web. Despite its relatively small size, the educational pathology area may help expose students to the technology early on, before they enter the workforce.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;figure class=&quot;gatsby-resp-image-figure&quot; style=&quot;&quot;&gt;
    &lt;span
      class=&quot;gatsby-resp-image-wrapper&quot;
      style=&quot;position: relative; display: block; margin-left: auto; margin-right: auto; max-width: 528px; &quot;
    &gt;
      &lt;span
    class=&quot;gatsby-resp-image-background-image&quot;
    style=&quot;padding-bottom: 59.756097560975604%; position: relative; bottom: 0; left: 0; background-image: url(&apos;data:image/jpeg;base64,/9j/2wBDABALDA4MChAODQ4SERATGCgaGBYWGDEjJR0oOjM9PDkzODdASFxOQERXRTc4UG1RV19iZ2hnPk1xeXBkeFxlZ2P/2wBDARESEhgVGC8aGi9jQjhCY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2NjY2P/wgARCAAMABQDASIAAhEBAxEB/8QAGAAAAgMAAAAAAAAAAAAAAAAAAAIBBAX/xAAVAQEBAAAAAAAAAAAAAAAAAAABAP/aAAwDAQACEAMQAAABWajixlE//8QAGxABAQABBQAAAAAAAAAAAAAAAgEDAAQSEyL/2gAIAQEAAQUCW5qXY6bmZpnNZPDtt1//xAAVEQEBAAAAAAAAAAAAAAAAAAAAEf/aAAgBAwEBPwFX/8QAFBEBAAAAAAAAAAAAAAAAAAAAEP/aAAgBAgEBPwE//8QAGxAAAwADAQEAAAAAAAAAAAAAAAERISJhAlH/2gAIAQEABj8CknBSvhG/Jn4a41K2f//EABoQAAMAAwEAAAAAAAAAAAAAAAABESExQWH/2gAIAQEAAT8hpFNeGPKk9oWkNdEKnVYwc7wGtDZ//9oADAMBAAIAAwAAABAkD//EABgRAAIDAAAAAAAAAAAAAAAAAAABETFR/9oACAEDAQE/EFRDD//EABYRAQEBAAAAAAAAAAAAAAAAAAEAEf/aAAgBAgEBPxBXbG//xAAcEAEAAgIDAQAAAAAAAAAAAAABABEhMUFRYXH/2gAIAQEAAT8QShxVJk7ee4xO0CAb99ibaKm42CfVISthBWC1ZzLmltWf/9k=&apos;); background-size: cover; display: block;&quot;
  &gt;&lt;/span&gt;
  &lt;img
        class=&quot;gatsby-resp-image-image&quot;
        alt=&quot;The multihead microscope. How pathologists were educated at some point in the not so distant past. Source: focusontoxpath.com&quot;
        title=&quot;The multihead microscope. How pathologists were educated at some point in the not so distant past. Source: focusontoxpath.com&quot;
        src=&quot;/static/747e46d561aea12eb40cb58d87711b7c/ebba6/pathology-education-old-school.jpg&quot;
        srcset=&quot;/static/747e46d561aea12eb40cb58d87711b7c/f1935/pathology-education-old-school.jpg 164w,
/static/747e46d561aea12eb40cb58d87711b7c/e2074/pathology-education-old-school.jpg 328w,
/static/747e46d561aea12eb40cb58d87711b7c/ebba6/pathology-education-old-school.jpg 528w&quot;
        sizes=&quot;(max-width: 528px) 100vw, 528px&quot;
        style=&quot;width:100%;height:100%;margin:0;vertical-align:middle;position:absolute;top:0;left:0;&quot;
        loading=&quot;lazy&quot;
      /&gt;
    &lt;/span&gt;
    &lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;&lt;p&gt;The multihead microscope. How pathologists were educated at some point in the not so distant past. Source: focusontoxpath.com&lt;/p&gt;&lt;/figcaption&gt;
  &lt;/figure&gt;&lt;/p&gt;
&lt;h2&gt;Tailwinds &amp;#x26; Headwinds&lt;/h2&gt;
&lt;p&gt;The COVID-19 pandemic has given a huge boost to tele-medicine models, and tele-pathology in particular. The arguments for going digital took center stage when it became necessary for pathologists to continue their work remotely without microscopes. In response, the FDA issued guidance to expand the availability of digital pathology devices during this public health emergency&lt;sup&gt;&lt;a href=&quot;https://www.fda.gov/regulatory-information/search-fda-guidance-documents/enforcement-policy-remote-digital-pathology-devices-during-coronavirus-disease-2019-covid-19-public&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, allowing specific devices that are not FDA-cleared to be used clinically. It will be interesting to observe how this urgency in digitization continues beyond the pandemic. As for AI tool adoption, much friction still exists. While it is common for startups to act as evangelists, those in pathology must advocate for two technologies simultaneously: digital pathology and AI. This entails investing heavily in content creation through educational courses, workshops, blogs, and webinars. While this adds burden, it may also translate to stronger relationships with a more engaged and loyal user base.&lt;/p&gt;
&lt;p&gt;Digitizing both retrospective and prospective glass slides is no easy feat. Slides must be tagged with identifying barcodes and rescanned multiple times if a quality threshold is not met. Given the large size of the resultant images, it is crucial to correctly set the scanner parameters beforehand (e.g. level of magnification, focus plane). For instance, glass slides with thicker sections often produce out-of-focus images, while sections that extend close to the edge of the slide may not be captured by the scanner. This manual data curation can make for a cumbersome process, especially for inexperienced technicians. Moreover, the file formats in which WSI data is saved often come with interoperability limitations. In contrast to radiology, where virtually all images are saved in the DICOM format&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC61235/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, digital pathology is yet to converge on a single standard file format. Instead, proprietary vendor-specific formats dominate pathology today&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6774793/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and many proposed standards are yet to be adopted by the community at large&lt;sup&gt;&lt;a href=&quot;https://tissuepathology.com/2019/06/11/open-data-for-digital-pathology/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
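&lt;p&gt;In practice, much of this format fragmentation is papered over by open-source readers. The sketch below uses the openslide-python library, which exposes several proprietary WSI formats behind one interface; the file name is hypothetical.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import openslide  # pip install openslide-python (needs the OpenSlide C library)

# OpenSlide reads Aperio .svs, Hamamatsu .ndpi, MIRAX .mrxs, and other
# vendor formats through a single interface.
path = 'case_0123.svs'  # hypothetical file
print(openslide.OpenSlide.detect_format(path))   # vendor string, or None
slide = openslide.OpenSlide(path)
print(slide.dimensions)                          # full-resolution (width, height)
print(slide.properties.get('openslide.vendor'))  # same vendor, via metadata
slide.close()
&lt;/code&gt;&lt;/pre&gt;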
&lt;p&gt;As for AI models, access to high-quality annotated data is a major bottleneck. The type of WSI annotation needed is not often part of pathologists’ daily routine, and hence requires additional effort from time-constrained experts&lt;sup&gt;&lt;a href=&quot;https://digitalpathologyconsulting.com/what-is-the-current-role-of-artificial-intelligence-in-digital-pathology-part-2-state-of-art/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. The analysis of these multi-gigabyte images (~6B pixels per image&lt;sup&gt;&lt;a href=&quot;https://on-demand.gputechconf.com/gtc/2017/presentation/s7603-thomas-fuchs-in-the-midst-of-revolution.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;) also poses new challenges for deep learning. Slides are often broken down into smaller patches, with different annotation approaches at the slide level (faster but less granular) and at the patch level (better but more laborious)&lt;sup&gt;&lt;a href=&quot;https://www.frontiersin.org/articles/10.3389/fmed.2019.00264/full&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. The amount of noise inherent to WSI data may also negatively impact the robustness of AI solutions. Manual sample preparation practices can differ both within and across labs, resulting in inconsistent image features and making diagnoses more open to debate. Even the level of stain intensity is often driven by the pathologist’s personal preference&lt;sup&gt;&lt;a href=&quot;https://www.leicabiosystems.com/knowledge-pathway/he-staining-overview-a-guide-to-best-practices/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Scanners used in digitization often provide varying optical appearances and pixel resolutions, and require color calibration to ensure visual consistency&lt;sup&gt;&lt;a href=&quot;https://thepathologist.com/inside-the-lab/a-call-for-color-calibration&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Artifacts on glass slides not addressed during digitization (e.g. handwritten text, tape, cracks) may also contribute to more noise&lt;sup&gt;&lt;a href=&quot;https://www.leicabiosystems.com/resources/4-tips-for-digital-pathology-slide-scanning/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. All these factors point to the importance of a solid data curation pipeline.&lt;/p&gt;
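&lt;p&gt;To make the patching step concrete, here is a minimal sketch that walks the full-resolution level of a slide in non-overlapping tiles with openslide-python. It is deliberately naive: real pipelines first detect tissue so they can skip the mostly blank background.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import openslide

def iter_patches(slide_path, patch=512):
    # Yield (x, y) coordinates and RGB tiles covering level 0 of the slide.
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.dimensions
    for y in range(0, height - patch + 1, patch):
        for x in range(0, width - patch + 1, patch):
            tile = slide.read_region((x, y), 0, (patch, patch)).convert('RGB')
            yield (x, y), tile
    slide.close()
&lt;/code&gt;&lt;/pre&gt;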
&lt;h2&gt;Opportunities&lt;/h2&gt;
&lt;p&gt;Virtually all AI applications in pathology cater to the final image interpretation step of the workflow. Conversely, little work has been done to address the highly variable upstream processes of manual slide preparation. This will likely be an area of focus as more holistic AI solutions are proposed. We are already seeing AI research in image correction (e.g. stain normalization, color augmentation&lt;sup&gt;&lt;a href=&quot;https://pubmed.ncbi.nlm.nih.gov/27373749/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;&lt;sup&gt;&lt;a href=&quot;https://www.frontiersin.org/articles/10.3389/fbioe.2019.00198/full&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;) as well as quality control for flagging slides that deviate from protocol. AI has also been used to digitally stain unlabeled slides and create so-called “virtual stains”&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41377-019-0129-y&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41551-019-0362-y&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. If truly equivalent to routine staining, this may significantly reduce variability across stains while also disrupting the slide preparation workflow. Perhaps the most interesting AI applications are in computationally de-staining slides&lt;sup&gt;&lt;a href=&quot;https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2766071&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This ability to go backwards may allow for greater experimentation with different stains, not to mention the flexibility in capitalizing on archived retrospective slides coupled with very valuable clinical outcome data&lt;sup&gt;&lt;a href=&quot;https://news.mit.edu/2020/deep-learning-provides-accurate-staining-digital-biopsy-slides-0522&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
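&lt;p&gt;For a flavor of the simplest end of this research, the sketch below applies Reinhard-style color transfer - matching a patch’s per-channel statistics to a reference patch in LAB space. This is a common baseline, not a substitute for dedicated stain-normalization methods (e.g. Macenko’s) that model stain chemistry directly.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import numpy as np
from skimage import color

def reinhard_normalize(patch, reference):
    # Inputs are float RGB arrays in [0, 1]. Match each LAB channel's
    # mean and standard deviation to those of the reference patch.
    src = color.rgb2lab(patch)
    ref = color.rgb2lab(reference)
    for c in range(3):
        s, r = src[..., c], ref[..., c]
        src[..., c] = (s - s.mean()) / (s.std() + 1e-8) * r.std() + r.mean()
    return np.clip(color.lab2rgb(src), 0.0, 1.0)
&lt;/code&gt;&lt;/pre&gt;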
&lt;p&gt;Other opportunities lie in aligning the throughput levels of different parts of the pathology workflow. The slowest by far is sample preparation (days), followed by digitization with some scanners capable of handling up to 200 slides at a time&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4989562/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; (hours), and finally AI-based image interpretation with the highest throughput (minutes). We are seeing some hardware innovation in automating sample preparation (e.g. Inveox&lt;sup&gt;&lt;a href=&quot;https://inveox.com/product/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;), as well as high-throughput methods (e.g. tissue microarrays or TMA&lt;sup&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Tissue_microarray&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;) becoming more mainstream&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2813639/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Given that we are in the early days of large-scale adoption of both digitization and AI in pathology, it will be interesting to observe how these technologies mature alongside one another. This may also bring opportunities for tighter integration across hardware (lab equipment) and software (workflow management, viewers, AI) than in other digitized medical disciplines. For instance, co-developing image viewers and AI tools has already led to the rise of so-called “AI-native” viewers. These viewers - in addition to displaying images - are built to visualize model predictions, facilitate data annotation for model training, and provide feedback for model improvement.&lt;/p&gt;
&lt;p&gt;Human factor considerations will enable smoother workflow transitions. Clinical software and hardware are often quite user-unfriendly, and few legacy digital pathology solutions exist today. As a result, there may be opportunities to start from a clean slate and demonstrate what the user interfaces and experiences (UI/UX) could look like in such products. In fact, lab efficiency can be enhanced through better UI/UX and ergonomics alone, without the use of any AI. User-centric design will also play a crucial role in introducing new technologies to users. For example, we have seen how displaying real-time AI predictions through augmented reality in digital microscopes can help expose labs to AI tools before they transition to fully digital workflows&lt;sup&gt;&lt;a href=&quot;https://arxiv.org/abs/1812.00825&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2&gt;Pathologists and Patients&lt;/h2&gt;
&lt;p&gt;There is no doubt pathology will eventually become a digital discipline, and the speed at which this happens will directly impact the adoption rate of AI tools. Today, 3-dimensional tissue samples are reduced to a select number of 2-dimensional slides to aid pathologists in interpretation&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5871638/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. As more of this task is assigned to AI, there may come a time when this simplification becomes unnecessary and the computational analysis of entire 3-dimensional tissue samples becomes the standard of care&lt;sup&gt;&lt;a href=&quot;https://www.jpathinformatics.org/article.asp?issn=2153-3539;year=2017;volume=8;issue=1;spage=36;epage=36;aulast=Farahani&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. While digitization will replace the microscope with a computer monitor, AI will drive efficiency and ensure greater focus on the tasks that matter. Maybe then pathologists will transition to a more central patient-facing role, a role they are very well positioned to take on&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/3880797&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;</content:encoded></item><item><title><![CDATA[Pre-2012 Startups and AI Products Across Medical Specialities]]></title><description><![CDATA[A Tale of Two Startups First, we have the acquirer IDx with diagnostic products for diabetic retinopathy - a diabetes complication that…]]></description><link>https://analogintelligence.com/pre-2012-startups-AI-products-across-medical-specialities/</link><guid isPermaLink="false">https://analogintelligence.com/pre-2012-startups-AI-products-across-medical-specialities/</guid><pubDate>Wed, 02 Sep 2020 22:40:32 GMT</pubDate><content:encoded>&lt;h2&gt;A Tale of Two Startups&lt;/h2&gt;
&lt;p&gt;First, we have the acquirer IDx with diagnostic products for diabetic retinopathy - a diabetes complication that damages light-sensitive tissue in the retina (the back of the eye). If left untreated, it may cause vision problems ranging from mild impairment to, in extreme cases, blindness&lt;sup&gt;&lt;a href=&quot;https://www.mayoclinic.org/diseases-conditions/diabetic-retinopathy/symptoms-causes/syc-20371611&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Retinal imaging using a fundus camera (a low-powered microscope with a camera) followed by manual interpretation by an ophthalmologist is a widely accepted screening method for this disease&lt;sup&gt;&lt;a href=&quot;http://www.icoph.org/dynamic/attachments/resources/diabeticretinopathymanagement_guidelinespublished-11-10-2012.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. IDx is developing AI systems for the grading and detection of diabetic retinopathy in fundus photographs. In 2017, IDx ran the first clinical trial for an autonomous medical AI system&lt;sup&gt;&lt;a href=&quot;https://clinicaltrials.gov/ct2/show/NCT02963441&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41746-018-0040-6&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and a year later its product received FDA clearance, making it the first device authorized to screen fundus images without an ophthalmologist - essentially allowing healthcare providers who may not be involved in eye care to use it&lt;sup&gt;&lt;a href=&quot;https://www.fda.gov/news-events/press-announcements/fda-permits-marketing-artificial-intelligence-based-device-detect-certain-diabetes-related-eye&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;The acquiree, 3Derm, is developing diagnostic products for skin diseases ranging from common rashes to severe infections and skin cancer. Diagnosis of these diseases often starts with visual inspection, with a potential follow-up biopsy and pathological examination. Given that the average wait time to see a dermatologist in the US is 28 days&lt;sup&gt;&lt;a href=&quot;https://jamanetwork.com/journals/jamadermatology/fullarticle/2653790&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, teledermatology has grown in popularity as a cost-effective and reliable means of dermatology care delivery&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5555283/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This involves photographic imaging of suspicious skin findings for almost instantaneous interpretation by remote experts. 3Derm provides teledermatology services through skin imaging hardware and automated diagnostic products. They’ve participated in multiple pilot programs&lt;sup&gt;&lt;a href=&quot;https://www.telegram.com/article/20160710/NEWS/160719917&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; and clinical efficacy studies&lt;sup&gt;&lt;a href=&quot;https://thevibrantmed.com/2017/05/05/3derm-ceo-elizabeth-asai-discusses-telederm-robots-and-the-future-of-dermatology/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;&lt;sup&gt;&lt;a href=&quot;https://medcitynews.com/2016/05/3derm/?rf=1&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;, and earlier this year, a 3Derm product for autonomously detecting different types of skin cancer became the first AI device in dermatology to receive the FDA “Breakthrough Device designation”. This designation is a fast-track regulatory pathway for devices that promise more effective diagnosis of life-threatening or irreversibly debilitating diseases&lt;sup&gt;&lt;a href=&quot;https://www.prnewswire.com/news-releases/3derm-announces-two-fda-breakthrough-device-designations-for-autonomous-skin-cancer-ai-300982072.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/static/93d85c46b20bb14eb09d10948d0869a8/5ad6b/fundus-photograph-skin-cancer-images.png&quot; alt=&quot;Top: Two fundus photographs. The left image is healthy while the right image shows signs of diabetic retinopathy. Bottom: Two epidermal lesion images. The left image is benign while the right image is malignant. Sources: ocutech.com &amp;amp; nature.com&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;&lt;p&gt;Top: Two fundus photographs. The left image is healthy while the right image shows signs of diabetic retinopathy. Bottom: Two epidermal lesion images. The left image is benign while the right image is malignant. Sources: ocutech.com &amp;#x26; nature.com&lt;/p&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;These two startups share multiple aspects. Both are in the diagnostic space where interpretation by a specialist is required. From a machine learning perspective, both problems may be formulated as classification tasks where a fundus photograph or an image of a suspicious mole is classified as either negative or positive for a given disease. At their core, both AI applications are likely to utilize similar convolutional neural networks (CNNs) - a class of deep learning algorithms&lt;sup&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Convolutional_neural_network&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; - to perform this classification. Additionally, both products are primarily targeted at primary care, where they may help triage patients and identify those who would benefit the most from a referral to specialists. Finally, the data used by both are also similar: two-dimensional images captured by devices that come in handheld mobile variants and require minimal operator skill - making them ideal for non-specialist primary care settings. Perhaps the most interesting commonality between them: both were founded prior to the 2012 resurgence of neural network research and the popularization of deep learning. To understand how this head start has helped them capitalize on the technology, let’s explore what changed back then.&lt;/p&gt;
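&lt;p&gt;As a rough illustration of this shared formulation - and explicitly not either company’s actual model - here is a minimal sketch of a binary image classifier built on a pretrained CNN backbone:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Minimal sketch: a binary &quot;disease present / absent&quot; image classifier
# using a pretrained CNN backbone (illustrative only).
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=&quot;IMAGENET1K_V1&quot;)
model.fc = nn.Linear(model.fc.in_features, 2)  # negative vs. positive

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    # images: (N, 3, 224, 224) float tensor; labels: (N,) tensor of 0/1.
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Whether the input is a fundus photograph or an image of a skin lesion, the skeleton is essentially the same - only the training data and labels differ.&lt;/p&gt;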
&lt;h2&gt;The Shift&lt;/h2&gt;
&lt;p&gt;Pre-2012, much of computer vision research was based on feature engineering, i.e. explicit hand-crafted features designed by experts&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2769953/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; - and struggled to reach performance levels that would make it clinically useful. In 2012, deep learning - where learning happens directly from labelled data - made substantial performance gains in the ImageNet image classification competition&lt;sup&gt;&lt;a href=&quot;https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. It very quickly became the &lt;em&gt;de facto&lt;/em&gt; method for analyzing various data types&lt;sup&gt;&lt;a href=&quot;https://people.csail.mit.edu/khosla/papers/icml2011_ngiam.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; and for doing machine learning in general. It wasn’t until 2014 that the first studies applying deep learning to medical imaging started to appear&lt;sup&gt;&lt;a href=&quot;https://link.springer.com/chapter/10.1007/978-3-319-10404-1_6&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Only a handful of studies helped bring these applications to light, and for the ophthalmology and dermatology specialities, these studies happened to come out of Google Research and its massive PR machine. The 2016 ophthalmology study in JAMA showed that deep learning algorithms had high sensitivity and specificity for detecting diabetic retinopathy in retinal fundus photographs&lt;sup&gt;&lt;a href=&quot;https://jamanetwork.com/journals/jama/fullarticle/2588763&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. The 2017 dermatology study in Nature demonstrated the ability of a CNN to classify skin lesions into over 700 diseases&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/nature21056&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;While deep learning came with performance improvements, it also shifted the ML bottleneck from the methods to the data. The tedious methods that previously required PhDs can now be applied using over a dozen open-source tools&lt;sup&gt;&lt;a href=&quot;https://www.kdnuggets.com/2018/11/top-python-deep-learning-libraries.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. On the flip side, deep learning requires more data than prior methods, with more and higher-quality data often meaning better performance. As a result, data engineering became a core component of any machine learning product. While some startups were working on gaining access to data through licensing agreements (either with providers or pharma) and establishing data pipelines, others already had a data infrastructure in place and were able to capitalize on it to train and develop deep learning solutions. IDx and 3Derm are two examples of the latter.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/d9b86f0dec5cb138e1e2dd43a055ae4f/deep-learning-idx-3derm-timeline.svg&quot; alt=&quot;deep learning idx 3derm timeline digital diagnostics&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;Timeline of some major events related to deep learning and medical imaging, as well as IDx and 3Derm. Both companies were well on their journey when deep learning was first popularized and the bulk of startups that use AI as a key differentiator started to appear. Side note: notice the five-year gap between the CE mark IDx earned in 2013 and the FDA clearance it received in 2018, both for similar AI products. This highlights the strictness of the FDA regulatory pathways - despite the device already being in use internationally - and how the CE mark is less onerous to obtain, as it comes with restrictions even within the EU itself.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;h2&gt;A Head Start&lt;/h2&gt;
&lt;p&gt;For IDx, deep learning was simply a new method. The first IDx patents date back to 2006, when hand-crafted features were used&lt;sup&gt;&lt;a href=&quot;https://patents.google.com/patent/US20060257031?oq=Michael+Abramoff&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. By 2013, IDx already had autonomous diagnostic products on the European market based on these methods&lt;sup&gt;&lt;a href=&quot;https://www.prnewswire.com/news-releases/fully-automated-diagnostic-device-receives-ce-certification-idx-llc-planning-for-rollout-across-europe-206263101.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Introducing deep learning to their existing machine learning stack&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41746-018-0040-6&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; brought improved performance and likely played a major role in the FDA clearance they received five years later. The increased demand for quality and quantity of training data that came with this transition was likely easier to satisfy given IDx’s existing machine learning prediction engine, data infrastructure, and customer base providing continuous feedback for development.&lt;/p&gt;
&lt;p&gt;3Derm started by developing the hardware: a stereoscopic digital 3D dermatoscope for capturing standardized photos of skin diseases&lt;sup&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=eUqpB77xj0E&amp;#x26;t=470s&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This was bundled with a web interface for cataloging and monitoring skin abnormalities. As the product matured, a teledermatology component was added, allowing remote experts to diagnose images captured and processed through the system. Introducing some level of AI into this system seems quite natural, as diagnostic models can really capitalize on the existing teledermatology service. Given appropriate permissions, diagnostic models can be developed using data processed by the system (and its corresponding labels). This data is likely relatively clean: it has been collected and catalogued in a standard way, making it more “ML-ready” than average clinical data. The web component 3Derm developed for viewing images can now serve as the data labelling tool, and 3Derm’s existing network of dermatologists can double as data labellers, helping fuel the models with more data while providing valuable feedback.&lt;/p&gt;
&lt;h2&gt;It’s All About Data Engineering&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Where is the best spot to build an AI product? On top of a data stream.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;For both IDx and 3Derm, this head start allowed them to really understand the nuances of the data early on - everything from edge cases and imaging artifacts to more general data issues such as class imbalance. For those working on developing clinical AI products, there is perhaps a lesson here: a solid data infrastructure is a prerequisite for successful implementation. This is the core of what being “AI-first” is: collecting data from day one. In some instances, being AI-first might mean that you start with no AI at all. In fact, it may make sense to start by implementing an idea in its analog form, validating it, then digitizing parts of it over time. In other words, you can build an infrastructure for images to be captured and analyzed by remote experts, test whether the solution really addresses an unmet need and providers are willing to pay for it, then ultimately introduce AI. By not forcing expensive digital solutions before validating them, you allow yourself to “fail fast”.&lt;/p&gt;
&lt;p&gt;If the analog form is indeed a good place to start, one might consider any telemedicine product that relies purely on image interpretation (teleradiology, telepathology, etc.)&lt;sup&gt;&lt;a href=&quot;https://www.ama-assn.org/practice-management/digital/which-medical-specialties-use-telemedicine-most&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; to be the best proving ground for AI interventions: common cases can be automated while experts are consulted for the more complex ones. In fact, any business model that deals with medical data logistics is well positioned to implement AI solutions, at least in theory. For instance, big tech platforms that provide medical data storage, retrieval, and archival - such as Google Cloud&lt;sup&gt;&lt;a href=&quot;https://cloud.google.com/products/ai&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; and Microsoft Azure&lt;sup&gt;&lt;a href=&quot;https://azure.microsoft.com/en-us/industries/healthcare/usecases/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; - also offer peripheral add-on AI products as well as data analysis and labelling services.&lt;/p&gt;
&lt;h2&gt;AI Across Medical Specialities&lt;/h2&gt;
&lt;p&gt;A few years ago (~2017), we witnessed the consolidation of products from different vendors through the emergence of AI marketplaces - essentially “app stores”. These marketplaces largely operate within a given speciality (e.g. Nuance&lt;sup&gt;&lt;a href=&quot;https://www.nuance.com/healthcare/diagnostics-solutions/ai-marketplace.html&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; and Blackford&lt;sup&gt;&lt;a href=&quot;https://www.blackfordanalysis.com/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; in radiology, Visiopharm&lt;sup&gt;&lt;a href=&quot;https://visiopharm.com/app-center/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt; in pathology, etc.). While it is still too early to gauge their success, they have multiple selling points. For vendors, they provide additional monetization channels as well as reach. Providers, on the other hand, get single-point access to a wide range of AI models, in addition to tracking algorithm usage and performance among other metrics&lt;sup&gt;&lt;a href=&quot;https://hbr.org/2019/06/what-ai-app-stores-will-mean-for-radiology&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. The AI marketplace concept clearly needs a dedicated article. I chose to highlight it here as I see the evolution of stand-alone products into AI marketplaces and now into cross-speciality AI offerings - demonstrated by this acquisition - as an encouraging sign of both technology and market maturity.&lt;/p&gt;
&lt;figure&gt;
&lt;img src=&quot;/217eb441240181295f7d2270617db948/evolution-ai-product-healthcare.svg&quot; alt=&quot;evolution AI product healthcare&quot;&gt;
&lt;figcaption class=&quot;gatsby-resp-image-figcaption&quot;&gt;The evolution of AI products in healthcare: from single offerings, to AI marketplaces that aggregate solutions per medical specialization, to products that span multiple specialities.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;IDx and 3Derm will now operate under Digital Diagnostics, a new “AI vendor” brand&lt;sup&gt;&lt;a href=&quot;https://dxs.ai/newsroom/digital-diagnostics-formerly-idx-expands-global-impact-of-healthcare-autonomous-ai-with-acquisition-of-3derm-systems-inc&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. It will be interesting to see how AI products for ophthalmology and dermatology may be bundled, marketed, and branded together. Offerings across specialities are not a new concept. Many incumbents in medical image analysis have long been active across a wide range of specialities (e.g. Philips for radiology, cardiology, and pathology&lt;sup&gt;&lt;a href=&quot;https://www.usa.philips.com/healthcare/medical-specialties&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;). However, departments within these larger organizations tend to be heavily siloed, with solutions developed in isolation from one another. Instead of disparate efforts to provide add-on AI solutions within each of these departments, Digital Diagnostics now has the opportunity to start with AI as a common denominator and demonstrate how the technology can extend horizontally to bridge different medical specialities. For instance, R&amp;#x26;D efforts can be shared across the board. Pre-2012, explicitly defined algorithms for analyzing fundus photographs differed greatly from those used on skin images. Today, the data-agnostic nature of deep learning allows almost identical models to be trained separately on different data for different tasks. Additionally, data preparation and labelling tools may also benefit multiple data types. From a clinical user perspective, it may even be the case that a single modular product serves multiple specialities, allowing for a “single interface experience”. The ability of this horizontal AI product to expand to other specialities beyond ophthalmology and dermatology will ultimately signal its scalability and success.&lt;/p&gt;
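&lt;p&gt;To illustrate that data-agnostic quality, consider a sketch (hypothetical, not Digital Diagnostics’ codebase) in which one model definition is instantiated twice and trained separately for two specialities:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Illustrative sketch: one architecture, two specialities. The same model
# definition is trained separately on fundus and skin-lesion datasets.
import torch.nn as nn
from torchvision import models

def build_classifier(num_classes):
    model = models.resnet18(weights=&quot;IMAGENET1K_V1&quot;)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

retinopathy_model = build_classifier(num_classes=2)  # trained on fundus photographs
skin_lesion_model = build_classifier(num_classes=2)  # trained on dermatoscopic images
# The architecture, training loop, and labelling tools can be shared;
# only the data pipelines differ across specialities.
&lt;/code&gt;&lt;/pre&gt;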
&lt;h2&gt;Headwinds &amp;#x26; Tailwinds&lt;/h2&gt;
&lt;p&gt;As with all AI-powered diagnostic tools, the issue of efficacy vs. effectiveness often comes into play. Despite having some methodological and clinical limitations, IDx’s prospective observational trial is essential in understanding the efficacy of these tools&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41746-018-0048-y&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. Their effectiveness, however - whether patients directly benefit in routine practice - remains an open question. The evidence from the real world has not been entirely positive so far. The performance of Google’s diabetic retinopathy diagnostic product deployed in Thailand was recently reported: more than one-fifth of images were rejected by the system, which was designed with a relatively strict image-quality threshold, and poor internet connectivity often stood in the way of uploading images to the prediction servers&lt;sup&gt;&lt;a href=&quot;https://www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;&lt;sup&gt;&lt;a href=&quot;https://dl.acm.org/doi/abs/10.1145/3313831.3376718&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. This highlights the influence of socio-environmental factors and clinical workflows on the real-world performance of such systems. Also related to real-world performance are the design limitations of autonomous AI devices. IDx’s device, for instance, requires images to be captured by one fundus camera of a specific make and model, is only approved to detect “more than mild” diabetic retinopathy, and is not cleared for use on patients with pre-existing diabetic retinopathy&lt;sup&gt;&lt;a href=&quot;https://www.nature.com/articles/s41746-018-0048-y&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
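&lt;p&gt;For intuition, the rejection behavior reported in Thailand boils down to a quality gate placed in front of the diagnostic model. The sketch below uses hypothetical stand-ins and an illustrative threshold; stricter thresholds trade fewer low-quality predictions for more rejected patients:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Illustrative quality gate (all names and thresholds hypothetical):
# images are scored for quality before the diagnostic model ever runs.
QUALITY_THRESHOLD = 0.9  # stricter values reject more images up front

def quality_model(image):
    # Stand-in for a learned image-quality scorer returning a value in [0, 1].
    return image.get(&quot;quality&quot;, 0.0)

def diagnostic_model(image):
    # Stand-in for the actual disease classifier.
    return &quot;referable&quot; if image.get(&quot;diseased&quot;) else &quot;non-referable&quot;

def screen(image):
    if quality_model(image) &amp;lt; QUALITY_THRESHOLD:
        return &quot;rejected - please recapture the image&quot;
    return diagnostic_model(image)

print(screen({&quot;quality&quot;: 0.5}))                     # rejected - please recapture the image
print(screen({&quot;quality&quot;: 0.95, &quot;diseased&quot;: True}))  # referable
&lt;/code&gt;&lt;/pre&gt;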
&lt;p&gt;For providers, AI offerings across specialities may be an attractive option where otherwise a different vendor per speciality is needed. We see a similar pattern with electronic health records (EHR), where providers must often choose between single- and multi-speciality systems, each with its own pros and cons&lt;sup&gt;&lt;a href=&quot;https://www.medicalrecords.com/emr-buyers-guide/single-vs-multi-specialty-emr-or-ehr&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. There might come a time when providers have to make the same decision for AI systems.&lt;/p&gt;
&lt;p&gt;From a regulatory perspective, both IDx and 3Derm have crossed some major milestones as mentioned in the introduction - ultimately lowering the regulatory barriers for incoming players. Specifically with IDx, their FDA clearance was obtained through the De Novo pathway, allowing competitors to use IDx’s marketed device as a predicate for their own submissions and go through another regulatory pathway - the 510(k). For context, the De Novo pathway is for novel technologies that present low-to-moderate risk to patients, and it requires an in-depth risk-benefit analysis. In 510(k) submissions, devices are only required to show “substantial equivalence” to a previously cleared device&lt;sup&gt;&lt;a href=&quot;https://www.yakar-eng.co/2018/04/15/differences-between-510k-and-de-novo-submission/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;&lt;sup&gt;&lt;a href=&quot;https://www.greenlight.guru/blog/comparing-fda-submission-types-510k-vs.-de-novo-vs.-513g-vs.-pre-submission&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Who is paying for all this? In August 2020, the Center for Medicare and Medicaid Services (CMS) introduced billing codes for fully autonomous AI systems that detect certain eye diseases - the first reimbursement of its kind without specialist intervention&lt;sup&gt;&lt;a href=&quot;https://s3.amazonaws.com/public-inspection.federalregister.gov/2020-17127.pdf&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;. While this marks a critical first step, it is likely the start of an uphill battle over the coming years to get coverage through other commercial and private payers. Reimbursement for teledermatology services, meanwhile, is still relatively new and varies from state to state&lt;sup&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5555283/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2&gt;A Very Small Step Towards AGI&lt;/h2&gt;
&lt;p&gt;Writing this article made me think about the concept of artificial general intelligence. I admittedly had to consult Wikipedia for a plain definition:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Artificial general intelligence (AGI) is the hypothetical intelligence of a machine that has the capacity to understand or learn any intellectual task that a human being can&lt;sup&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Artificial_general_intelligence&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot;&gt;↗︎&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;While we may still be quite far from achieving this today, the aggregation of knowledge - knowledge that would normally exist across multiple human professionals - is definitely a step in that direction. There aren’t many physicians around the world who have specialized in both ophthalmology and dermatology, or in any two or more medical specialities, as that would require multiple lifetimes of training and practice. While the underlying mechanics of this aggregated knowledge today consist of separate models performing single, limited tasks, serving them from the same source is perhaps a very early embodiment of AGI.&lt;/p&gt;</content:encoded></item></channel></rss>