The Wisdom of Crowds vs. Cancer

How an Open Science Challenge Revolutionized Breast Cancer Prognosis

Introduction: The Prognostic Puzzle

Breast cancer treatment has long faced a paradox: while molecular biomarkers promise personalized prognosis, many models fail to outperform traditional clinical assessments such as tumor size or lymph node status. By 2013, more than 50 prognostic models existed, yet only two (Adjuvant! Online and PREDICT-Plus) met rigorous clinical validation criteria. This gap inspired a radical experiment: the Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge, a crowdsourced competition that combined open science, genomic analysis, and real-time validation to build a better predictive model. The winning solution did more than edge past existing tools; it pointed to fundamental "bioinformatic hallmarks of cancer" that apply across cancer types [7].

Key Concepts: Attractor Metagenes & Crowdsourced Science

Pan-Cancer Signatures: The "Attractor Metagene" Theory

Columbia University researchers hypothesized that certain gene networks, co-expressed in nearly identical form across multiple cancer types, control universal disease processes. They called these networks attractor metagenes:

  • Mitotic Chromosomal Instability: Drives abnormal cell division
  • Mesenchymal Transition: Enables metastasis
  • Lymphocyte-Based Immune Recruitment: Reflects anti-tumor immune activity [1, 4]

Unlike single-gene biomarkers, these metagenes represent emergent biological systems, which makes them robust prognostic candidates [7].
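The iterative idea behind an attractor metagene can be sketched in a few lines: start from a seed gene, weight every gene by how strongly it tracks the current metagene, recompute the metagene as a weighted average, and repeat until the weights stop changing. This is a minimal, hypothetical sketch rather than the Columbia team's code. The published method reportedly scores genes with a mutual-information measure; here Pearson correlation is swapped in for simplicity, and the function names (`pearson`, `find_attractor`), the exponent of 5, and the toy expression data are all illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def find_attractor(expr, seed, exponent=5.0, max_iter=100, tol=1e-6):
    """Iterate from a seed gene toward a self-consistent weighted 'metagene'.

    Hypothetical helper. expr maps gene name -> expression values across the
    same samples. Returns (metagene value per sample, weight per gene).
    Correlation stands in for the mutual-information score (an assumption).
    """
    n_samples = len(expr[seed])
    metagene = list(expr[seed])
    weights = {}
    for _ in range(max_iter):
        # Weight each gene by its positive association with the current metagene,
        # raised to a power so that the most strongly co-expressed genes dominate.
        new_weights = {g: max(0.0, pearson(v, metagene)) ** exponent
                       for g, v in expr.items()}
        total = sum(new_weights.values())
        if total == 0:
            break  # nothing is associated with the metagene; stop early
        # New metagene = weighted average of all genes, sample by sample.
        metagene = [sum(w * expr[g][s] for g, w in new_weights.items()) / total
                    for s in range(n_samples)]
        # Converged once no gene's weight moved by more than tol.
        converged = all(abs(new_weights[g] - weights.get(g, 0.0)) < tol for g in expr)
        weights = new_weights
        if converged:
            break
    return metagene, weights


# Toy data (hypothetical): genes A, B, C rise together; D and E are unrelated noise.
expression = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": [2, 4, 6, 8, 10, 12, 14, 16],
    "C": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9],
    "D": [5, 1, 4, 2, 8, 3, 7, 6],
    "E": [3, 3, 1, 5, 2, 4, 2, 3],
}

_, gene_weights = find_attractor(expression, seed="A")
for gene, w in sorted(gene_weights.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {w:.3f}")
```

In this toy run, the co-expressed genes A, B, and C end up with weights near 1 while the noise genes D and E fade toward 0. The exponent controls how sharply the weighting favors the strongest associations, which is what lets the iteration settle on a tight, self-reinforcing gene set rather than a diffuse average.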

The Power of Open Challenges

Traditional research often operates in silos. The DREAM Challenge countered this by:

  • Blinded Validation: Models were tested on hidden datasets to prevent overfitting.
  • Real-Time Leaderboards: Participants instantly saw performance rankings.
  • Code Sharing: All submissions were open-source, allowing iterative improvements [2].

"Participants collectively submitted 1,700+ models. The best model wasn't just better—it was reproducible." [5]

In-Depth Look: The Landmark Experiment

Methodology: A Three-Phase "Scientific Tournament"

The Challenge used genomic and clinical data from 1,981 breast cancer patients (the METABRIC cohort), split into training and test sets. A new 184-patient dataset (OsloVal) served as the final validation set [2].

Phase 1: Orientation
  • Participants accessed 1,000 patient profiles (gene expression, copy number, clinical records).
  • Goal: Predict overall survival using any computational approach.
Phase 2: Model Refinement
  • Teams retrained models on the full 1,981-patient dataset.
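  • Performance metric: Concordance Index (CI), the probability that the model correctly orders a randomly chosen pair of patients by survival time (e.g., CI = 0.75 means 75% of comparable patient pairs were ranked correctly; 0.5 is no better than chance) [2].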
Phase 3: Ultimate Validation
  • Top models tested on the OsloVal cohort.
  • Benchmarks: Standard clinical tools (e.g., the 70-gene MammaPrint assay) [2].
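To make the concordance index concrete, here is a minimal sketch of Harrell's C-index for right-censored survival data. It assumes that a higher risk score means shorter expected survival, that a pair counts as comparable only when the patient with the shorter time actually died (was not censored), and that tied risk scores earn half credit. The Challenge's own scoring code may have handled ties and censoring differently, and `concordance_index` is a hypothetical name used for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable patient pairs ranked correctly.

    times: survival or censoring time per patient
    events: 1 (or True) if death was observed, 0 if the patient was censored
    risk_scores: model output; higher means predicted shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable only if i died before j's recorded time;
            # otherwise we cannot know who truly survived longer.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # model correctly ranked i as higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tie: half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical example: four patients, patient 1 censored at time 3.
times = [5, 3, 8, 2]
events = [1, 0, 1, 1]
risk = [0.6, 0.9, 0.2, 0.8]
print(concordance_index(times, events, risk))  # 0.75
```

Here only four pairs are comparable and the model orders three of them correctly, giving 0.75. A model that ranked every comparable pair backwards would score 0.0, and one that gave every patient the same score would score 0.5.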

Patient Cohort Characteristics

| Characteristic | METABRIC (1,981 patients) | OsloVal (184 patients) |
|---|---|---|
| Median age | 61 years | 58 years |
| ER+ tumors | 76.3% | 60.9% |
| Tumor size > 5 cm | 7.5% | 7.1% |
| High grade (grade 3) | 48.1% | 30.4% |

Data adapted from Challenge results [2]

Results: A New Benchmark Emerges

  • The winning model (using 3 attractor metagenes) achieved a CI of 0.75, outperforming 1,400+ submissions and existing tools such as MammaPrint [1, 5].
  • Immune recruitment signatures were the strongest survival predictor, lending support to immunotherapy's emerging role [4].
  • Performance remained consistent across the METABRIC and OsloVal cohorts, supporting the model's generalizability [2].
Challenge Impact Metrics
| Metric | Value |
|---|---|
| Participating countries | 35+ |
| Submitted models | >1,700 |
| Key innovation | Attractor metagenes |
| Validation CI | 0.75 (OsloVal cohort) |
| Benchmark CI (MammaPrint) | 0.65–0.70 |

Data synthesized from Challenge publications [2, 5, 7]

Figure (Performance Comparison): Concordance Index (CI) values across different models.

The Scientist's Toolkit: Key Reagents & Technologies

| Reagent/Tool | Function | Role in Challenge |
|---|---|---|
| Gene expression microarrays | Profile mRNA levels in tumors | Captured attractor metagene activity |
| RNA extraction kits | Isolate tumor RNA from biopsies | Enabled genomic analysis of patient samples |
| R statistical software | Data modeling and survival analysis | Primary platform for model development |
| Cloud computing (Google VMs) | Remote data processing | Allowed global participation |
| Synapse platform | Code sharing and validation infrastructure | Hosted real-time leaderboards |

Toolkit derived from experimental methodology [2, 7]

Conclusion: Toward Universal Cancer Signatures

The Challenge demonstrated that crowdsourced science can solve complex biomedical problems. Its real triumph, however, was biological: attractor metagenes represent fundamental "hallmarks of cancer", mechanisms that recur across malignancies. As lead researcher Dimitris Anastassiou noted: "If these signatures work in breast cancer, why not in other cancers?" [7]. Today, these metagenes inform pan-cancer diagnostic tools, illustrating how open collaboration accelerates translational medicine.

Fun Fact

The research was funded by royalties from Anastassiou's patents on digital TV technology, a reminder that innovation thrives at unexpected intersections [7].

For further details on accessing the prognostic model, see the original publication in Science Translational Medicine [1].

References