AlphaFold's Interactome Leap: 1.7 Million AI-Predicted Protein Complexes, Free for Everyone

AlphaFold just stopped predicting lone proteins and started predicting the conversations between them. A release of 1.7 million high-confidence protein complexes, all open, searchable, and downloadable, changes structural biology overnight.

When AlphaFold dropped its 200-million-protein database in 2022, it felt like the end of something — the long slog of experimental structure determination, at least for individual proteins. It wasn’t the end. It was a setup. Because proteins don’t work alone, and everyone in the field knew the real problem was always the interactions. On April 10, 2026, EMBL-EBI, Google DeepMind, NVIDIA, and the Steinegger Lab at Seoul National University answered that problem with a single release: 1.7 million high-confidence AI-predicted protein complexes, integrated directly into the AlphaFold Database, free for anyone on earth to use.

This is the interactome era of computational biology. Not predicting what a protein looks like — predicting what it does with others.

The implications run from basic science straight into drug discovery, host-pathogen research, and the next generation of AI protein design tools. The barrier to proteome-scale interaction studies just dropped through the floor.

Why Monomers Were Never Enough

Protein complexes, the quaternary structures formed when two or more protein chains bind, are the actual machinery of biology. Enzyme active sites, immune signaling cascades, transcription factor assemblies: almost nothing important in a cell happens with a protein acting in isolation. Yet for years, computational structure prediction focused almost entirely on monomers, because predicting interactions means searching a far larger combinatorial space. The number of candidate pairs grows roughly with the square of the number of proteins, and larger assemblies grow faster still.
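
A back-of-the-envelope count shows how quickly that space grows. The sketch below counts possible dimers among n distinct proteins; the figure of 20,000 is an assumed round number for human protein-coding genes, not a number taken from the release.

```python
from math import comb


def candidate_dimers(n_proteins: int) -> dict:
    """Count the possible pairings among n distinct proteins."""
    homodimers = n_proteins              # each protein paired with itself
    heterodimers = comb(n_proteins, 2)   # unordered pairs of two different proteins
    return {
        "homodimers": homodimers,
        "heterodimers": heterodimers,
        "total": homodimers + heterodimers,
    }


# Assumption: ~20,000 proteins as a round figure for one human proteome.
counts = candidate_dimers(20_000)
print(counts)
```

Under that assumption, a single proteome already yields about 200 million candidate heterodimers, before considering cross-species pairs or assemblies of three or more chains.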

The gap was embarrassing and everyone knew it. AlphaFold gave researchers a structure for nearly every protein in the human proteome. What it couldn’t tell you was how those proteins talk to each other — which surfaces contact, which residues are at the interface, what happens structurally when protein A binds protein B. That’s not a minor detail. That’s the question that determines whether a drug can interrupt a disease pathway.

The scientific community said this clearly enough that the four-way collaboration made interactome-scale prediction its explicit mission. The result is a dataset that doesn’t just fill a gap — it reframes what’s computationally accessible.

31 Million Predictions, 1.7 Million You Can Use Right Now

The scale is worth sitting with. The collaboration generated predictions for roughly 31 million homo- and heterodimeric complexes spanning 4,777 proteomes, covering humans, key model organisms, and pathogens on the WHO’s priority list. The prioritization was deliberate: focus first on species with the highest global health relevance.

What’s live in the AlphaFold Database today:

  • 1.7 million high-confidence homodimers — directly browsable, searchable, downloadable
  • ~18 million lower-confidence homodimers — bulk download via EMBL-EBI FTP
  • Heterodimers: still under analysis, with high-confidence additions expected in the coming months

The confidence filtering matters. Validation against 125 post-AlphaFold2 X-ray homodimer structures gave a mean DockQ score of ~0.64–0.65 (DockQ rates how closely a predicted interface matches the experimental one, on a scale from 0 to 1), with 73–75% of predictions classified as usable. That’s not a cherry-picked benchmark; it’s a real-world check on whether these structures are worth building on. They are.
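
For readers who have not worked with DockQ, here is a minimal sketch of how scores are usually binned. The cutoffs (0.23, 0.49, 0.80) are the conventional DockQ quality bands. Treating "usable" as "acceptable or better" (DockQ of at least 0.23) is an assumption on our part, and the example scores are made up for illustration, not taken from the release.

```python
from collections import Counter

# Conventional DockQ quality bands; each lower bound is inclusive.
BANDS = [(0.80, "high"), (0.49, "medium"), (0.23, "acceptable"), (0.0, "incorrect")]
USABLE_CUTOFF = 0.23  # assumption: "usable" means acceptable quality or better


def dockq_band(score: float) -> str:
    """Map a DockQ score in [0, 1] to its quality band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"DockQ must lie in [0, 1], got {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise AssertionError("unreachable: the last band starts at 0.0")


def summarize(scores: list) -> dict:
    """Return the mean score, the usable fraction, and per-band counts."""
    if not scores:
        raise ValueError("need at least one score")
    return {
        "mean": sum(scores) / len(scores),
        "usable_fraction": sum(s >= USABLE_CUTOFF for s in scores) / len(scores),
        "bands": dict(Counter(dockq_band(s) for s in scores)),
    }


# Illustrative scores only (hypothetical, not real validation data).
example_scores = [0.91, 0.72, 0.55, 0.30, 0.12, 0.05, 0.84, 0.61]
print(summarize(example_scores))
```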

The release is estimated to have saved the global research community the equivalent of 17 million GPU-hours of computation. Every lab that would have spent months running ColabFold locally now has a precomputed answer waiting.

NVIDIA’s Engineering Bet: GPU-Native Structural Biology

The bottleneck that previously made proteome-scale complex prediction impractical wasn’t the folding itself — it was multiple sequence alignment (MSA) generation. Building the evolutionary profiles AlphaFold needs to infer structure is computationally brutal at scale. NVIDIA cracked this by switching to MMseqs2-GPU, a GPU-native homology search tool developed in partnership with the Steinegger Lab, replacing the CPU-bound pipeline that had been the standard.

From there, the acceleration stacked:

  • TensorRT for optimized deep-learning inference during structure prediction
  • cuEquivariance for accelerated triangle attention and triangle multiplicative updates, the operations at the mathematical core of AlphaFold-Multimer
  • Decoupled MSA generation from folding steps, enabling asynchronous CPU/GPU overlap
  • Length-based job packing on DGX H100 nodes, reducing idle time by up to 25%
  • Early stopping after four recycles and frozen MSAs to cut redundant computation
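
The idea behind length-based packing can be sketched in a few lines. This is a generic illustration, not NVIDIA's scheduler: it assumes each batch is padded to its longest sequence and is capped by a hypothetical `max_padded_tokens` budget, and the sequence lengths are invented for the example.

```python
def pack_by_length(lengths: list, max_padded_tokens: int) -> list:
    """Greedily batch sequences, longest first, so that
    batch_size * longest_in_batch never exceeds the padded-token budget."""
    batches = []
    current = []
    for length in sorted(lengths, reverse=True):
        if length > max_padded_tokens:
            raise ValueError(f"sequence of length {length} exceeds the budget")
        # The first item of a batch is its longest, so it sets the padded width.
        longest = current[0] if current else length
        if current and (len(current) + 1) * longest > max_padded_tokens:
            batches.append(current)
            current = []
        current.append(length)
    if current:
        batches.append(current)
    return batches


def padding_waste(batches: list) -> float:
    """Fraction of padded positions that hold no real residue."""
    padded = sum(len(batch) * max(batch) for batch in batches)
    real = sum(sum(batch) for batch in batches)
    return 1 - real / padded


# Hypothetical job lengths in arrival order.
lengths = [120, 950, 130, 900, 110, 1000]
arrival_order = [lengths[0:3], lengths[3:6]]  # naive: batch jobs as they arrive
packed = pack_by_length(lengths, max_padded_tokens=3000)

print(packed)
print(padding_waste(arrival_order), padding_waste(packed))
```

Grouping similar lengths means short jobs no longer wait on, or get padded up to, the longest job in their batch, which is where idle accelerator time tends to come from.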

This is the kind of infrastructure work that doesn’t make headlines but makes everything else possible. The full pipeline is reproducible: NVIDIA has published a technical replication guide, and the MSA search step is now available as an NVIDIA Inference Microservice (NIM), meaning any lab with API access can run the same workflow without owning a SuperPOD.

What the Data Is Actually Showing

Raw count aside, the biological content is already yielding discoveries that simply weren’t visible at the monomer level. The preprint highlights novel interfaces, domain-swapping topologies, and structural configurations absent from any monomer prediction. These aren’t artifacts — they’re features of how proteins actually fold when they’re bound to partners.

Clustering analysis of the full dataset surfaces something striking: the top 1% of non-singleton clusters account for ~25% of all high-confidence complexes. A small number of structural archetypes dominate the interaction landscape. And roughly 9% of clusters are conserved across superkingdoms — meaning these interaction geometries have survived billions of years of evolution, which is as strong a signal of functional importance as biology offers.
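
A concentration figure like this can be computed from nothing more than a list of cluster sizes. The sketch below shows one way to do it, assuming the share is measured against all complexes, singletons included; the cluster sizes are synthetic and chosen only to make the arithmetic easy to follow.

```python
from math import ceil


def top_cluster_share(cluster_sizes: list, top_percent: float = 1.0) -> float:
    """Share of all complexes that fall in the largest `top_percent`
    of non-singleton clusters."""
    non_singletons = sorted((s for s in cluster_sizes if s > 1), reverse=True)
    if not non_singletons:
        raise ValueError("no non-singleton clusters to rank")
    # Always keep at least one cluster, even for very small inputs.
    n_top = max(1, ceil(len(non_singletons) * top_percent / 100))
    return sum(non_singletons[:n_top]) / sum(cluster_sizes)


# Synthetic data: one dominant cluster, 99 small clusters, 1,005 singletons.
sizes = [500] + [5] * 99 + [1] * 1005
print(top_cluster_share(sizes))  # one cluster out of 100 holds 500 of 2,000 complexes
```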

The applications researchers are already pointing toward:

  • Full cellular interactome mapping — moving from “what proteins exist” to “what proteins touch”
  • Structure-guided drug discovery targeting protein-protein interfaces directly
  • Variant effect prediction at interaction sites — understanding what a disease mutation does to a binding surface
  • Host-pathogen interaction modeling at proteome scale
  • Benchmarking new AI protein design tools against a large, consistently generated reference set of predicted complex structures

This Is What Open Science Infrastructure Looks Like

Dame Janet Thornton, Director Emeritus of EMBL-EBI, framed the release as “a first step towards a comprehensive description of the human interactome.” That framing is accurate and deliberately modest — this is a foundation, not a finish line. Heterodimers are coming. Larger complexes will follow. The architecture built by this collaboration is designed to keep expanding.

The AlphaFold Database already serves 3.4 million users across 190 countries. It is, at this point, as foundational to biology as PubMed. The difference is that AlphaFold is a living dataset that keeps getting better — and this release is the most significant update since the original 2022 data dump.

At BioComputer, we track AI-biology convergence across the full stack: from single organoids to data center infrastructure. This release is a textbook case of what that convergence produces when it’s executed well — world-class AI research, GPU-accelerated infrastructure, and a commitment to open access combining to hand the entire field a tool it couldn’t have built alone.

The Interactome Is the Map We’ve Been Missing

Biology-as-computation has always had a data problem. You can engineer a cell to behave like a logic gate, but if you don’t know which proteins are interacting to produce that behavior, you’re designing blind. The interactome is the circuit diagram. AlphaFold just gave everyone a draft of it.

The draft is imperfect. Confidence scores vary. Heterodimers are still coming. Dynamic interactions, conformational changes under physiological conditions, membrane-bound complexes — the hard problems remain hard. But 1.7 million high-confidence structures, openly accessible, validated against experimental data, built on infrastructure that can keep scaling?

That’s not a draft. That’s the map changing.


References

  1. EMBL-EBI. (2026). AlphaFold Database protein complex update. EMBL-EBI News. https://www.ebi.ac.uk/about/news/technology-and-innovation/alphafold-protein-complexes/
  2. Costa, A. et al. (2026). GPU-accelerated proteome-scale protein complex prediction. NVIDIA Developer Blog. https://developer.nvidia.com/blog/gpu-accelerated-proteome-scale-protein-complex-prediction/
  3. Steinegger, M. et al. (2026). AlphaFold-Multimer complex predictions at scale. bioRxiv preprint. https://www.biorxiv.org
  4. AlphaFold Database. (2026). Browse protein complexes. EMBL-EBI. https://alphafold.ebi.ac.uk


Feature image: AI-generated using Grok.