From Code to Life: Global Top 15 in RNA 3D Folding

Pablo Guerra
Data Translator

Artificial Intelligence reaches its true value when it solves complex, real-world challenges. At WhiteBox, we see ourselves as data translators capable of solving problems across any industry, and we recently put this to the test in the high-stakes field of molecular biology.

We are proud to share that we placed 15th out of 1,867 teams in the Stanford RNA 3D Folding competition on Kaggle. Finishing in the global top 1% shows that our methodology can compete with the world’s leading laboratories.

The Challenge: Mastering a Domain from Scratch

We didn’t approach this as a cold math problem. Having a Biomedical Engineer on the team was key to bridging the gap between abstract numbers and biological reality. She led the internal training, ensuring the technical squad understood the "why" behind the data before we even touched the code.

For several weeks, we effectively became "data biologists," researching ligands, sequences, and the physics of how RNA folds so that our models would respect the rules of nature. To stay precise, we even built our own 3D visualization dashboard to inspect every prediction, which let us spot structural flaws, such as steric clashes, that traditional metrics might overlook.
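The post doesn't describe the dashboard's internals, but the kind of check it makes visible is easy to sketch. The snippet below is an illustration, not WhiteBox's code: it assumes each predicted structure is a list of 3D coordinates in ångströms with one representative atom per nucleotide (for example C1'), and it flags pairs of non-neighboring residues that sit implausibly close. The function name and both thresholds (`min_dist`, `min_seq_sep`) are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def find_clashes(coords: Sequence[Coord], min_dist: float = 3.0,
                 min_seq_sep: int = 2) -> List[Tuple[int, int, float]]:
    """Return (i, j, distance) for residue pairs closer than min_dist.

    Pairs fewer than min_seq_sep positions apart in sequence are skipped,
    because consecutive residues are bonded neighbors and naturally close.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    clashes = []
    for i, j in combinations(range(len(coords)), 2):  # every pair with i < j
        if j - i < min_seq_sep:
            continue
        d = math.dist(coords[i], coords[j])
        if d < min_dist:
            clashes.append((i, j, round(d, 2)))
    return clashes


# Residue 3 has been placed almost on top of residue 0.
pred = [(0.0, 0.0, 0.0), (5.9, 0.0, 0.0), (11.8, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(find_clashes(pred))
```

The all-pairs loop is quadratic, which is fine for inspecting individual predictions; for very long RNAs a spatial grid or k-d tree would scale better.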

The Engineering Behind the Solution

To climb the leaderboard, we designed an engineering strategy that combines three complementary layers of modeling:

  • Our Own Template-Based Modeling (TBM): Instead of starting from scratch, we built our own TBM system. Template-based modeling uses known biological structures as a scaffold, or mold, to map out new ones. Using Biopython, we aligned unknown sequences with those of experimentally solved structures, creating a robust baseline that served as the backbone for our more complex architectures.
  • State-of-the-Art Architectures (RNAPro & Protenix): We integrated and optimized cutting-edge models from industry leaders such as NVIDIA and ByteDance. We tuned their diffusion parameters and refinement cycles (an iterative process in which the model repeatedly "polishes" a structure toward its most stable form), allowing our system to handle everything from short sequences to giant RNA structures with high precision.
  • In-house Innovation: Our main innovation was fine-tuning diffusion models, architectures that generate high-fidelity structures by progressively turning random "noise" into clear, organized patterns. We went a step further by modifying their training process to include approximations of the TM-score. Because the TM-score is the gold-standard metric in structural biology for measuring how closely a predicted shape matches the real molecule, optimizing it directly during training was a game-changer for our precision. The journey had its hurdles, such as catastrophic forgetting, where a model loses skills it had already learned while being trained on a new task, but every obstacle became a valuable lesson in how to build more resilient AI.
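The template-based idea from the first bullet can be sketched in a few lines: align the target sequence to a template whose 3D structure is known, then copy the template's coordinates onto every aligned residue. Biopython's `Bio.Align.PairwiseAligner` would normally do the alignment step; to keep this example self-contained, it swaps in a small hand-written Needleman-Wunsch global alignment. The scoring values, the function names, and the choice to leave unaligned residues as `None` are assumptions for illustration, not details from the team's writeup.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def global_align(query: str, template: str, match: int = 2, mismatch: int = -1,
                 gap: int = -2) -> List[Tuple[Optional[int], Optional[int]]]:
    """Needleman-Wunsch alignment; returns (query_idx, template_idx) pairs, None marks a gap."""
    n, m = len(query), len(template)
    # score[i][j] = best score for aligning query[:i] with template[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if query[i - 1] == template[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if query[i - 1] == template[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))  # query residue with no template partner
            i -= 1
        else:
            pairs.append((None, j - 1))  # template residue skipped
            j -= 1
    pairs.reverse()
    return pairs


def transfer_coords(query: str, template: str,
                    template_coords: Sequence[Coord]) -> List[Optional[Coord]]:
    """Copy template coordinates onto aligned query residues; unaligned ones stay None."""
    coords: List[Optional[Coord]] = [None] * len(query)
    for qi, ti in global_align(query, template):
        if qi is not None and ti is not None:
            coords[qi] = template_coords[ti]
    return coords


template_coords = [(float(k), 0.0, 0.0) for k in range(5)]
print(transfer_coords("GGACUC", "GGAUC", template_coords))
```

Here the extra `C` at position 3 of the query has no template partner, so it comes back as `None`: exactly the kind of gap a more complex model would then have to fill.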
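To make "turning noise into structure" concrete, here is the textbook DDPM-style forward process applied to 3D coordinates, together with its inversion. This is a generic illustration only: RNAPro and Protenix use their own diffusion formulations and noise schedules, which this sketch does not reproduce. The linear beta schedule, `NUM_STEPS`, and the function names are assumptions.

```python
import math
import random
from typing import List, Tuple

Coords = List[List[float]]

NUM_STEPS = 1000  # hypothetical number of diffusion steps


def alpha_bar(t: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_s) for s = 0..t under a linear beta schedule."""
    ab = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (NUM_STEPS - 1)
        ab *= 1.0 - beta
    return ab


def add_noise(x0: Coords, t: int, rng: random.Random) -> Tuple[Coords, Coords]:
    """Forward process: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps, with eps ~ N(0, I)."""
    ab = alpha_bar(t)
    eps = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in x0]
    xt = [[math.sqrt(ab) * c + math.sqrt(1.0 - ab) * e for c, e in zip(p, n)]
          for p, n in zip(x0, eps)]
    return xt, eps


def estimate_x0(xt: Coords, eps_hat: Coords, t: int) -> Coords:
    """Invert the forward process given a noise estimate (normally a trained network's output)."""
    ab = alpha_bar(t)
    return [[(c - math.sqrt(1.0 - ab) * e) / math.sqrt(ab) for c, e in zip(p, n)]
            for p, n in zip(xt, eps_hat)]


rng = random.Random(0)
x0 = [[0.0, 0.0, 0.0], [5.9, 0.0, 0.0], [11.8, 1.0, 0.0]]
xt, eps = add_noise(x0, t=500, rng=rng)
x0_hat = estimate_x0(xt, eps, t=500)  # recovers x0 because eps is the true noise
```

In a real sampler the noise estimate comes from the trained network and generation runs through many small reverse steps starting from pure noise; passing the true noise here simply shows that the denoising target is well defined.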
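The post doesn't spell out which TM-score approximation the team trained with, so the following is only a sketch of one common idea. The full TM-score is TM = max over superpositions of (1/L_ref) · Σᵢ 1 / (1 + (dᵢ/d₀)²), where dᵢ is the distance between matched residues. If the superposition is held fixed, the max disappears and what remains is a smooth function of the coordinates, so 1 − TM can act as a loss. The length-dependent d₀ below follows the RNA-align convention, which is our assumption about how RNA TM-scores were computed here; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def d0_rna(l_ref: int) -> float:
    """Distance scale d0 in angstroms, following the RNA-align convention (assumed)."""
    if l_ref < 12:
        return 0.3
    if l_ref < 16:
        return 0.4
    if l_ref < 20:
        return 0.5
    if l_ref < 24:
        return 0.6
    if l_ref < 30:
        return 0.7
    return 0.6 * math.sqrt(l_ref - 0.5) - 2.5


def tm_score_superposed(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """TM-score for a fixed superposition, matching residue i of pred to residue i of ref.

    The real metric also maximizes over rigid-body superpositions; that search is omitted.
    Squared distances are used directly, so every term is smooth in the coordinates.
    """
    l_ref = len(ref)
    d0_sq = d0_rna(l_ref) ** 2
    total = 0.0
    for p, r in zip(pred, ref):
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, r))
        total += 1.0 / (1.0 + sq_dist / d0_sq)
    return total / l_ref


def tm_loss(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """A training objective that rewards higher TM-score: 1 - TM under the fixed superposition."""
    return 1.0 - tm_score_superposed(pred, ref)


ref = [(3.8 * k, 0.0, 0.0) for k in range(40)]
shifted = [(x, y + d0_rna(40), z) for x, y, z in ref]  # every residue off by exactly d0
print(tm_score_superposed(ref, ref), tm_score_superposed(shifted, ref))
```

A perfect prediction scores 1.0, and a prediction whose residues are each off by exactly d₀ scores 0.5. In an actual training loop the same expression would be written in an autodiff framework so that its gradient can flow back into the diffusion model.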

Talent That Transforms

Securing 15th place globally is a true reflection of our culture. We want to give a special shout-out to the squad of Data Scientists who led this initiative with remarkable drive. At WhiteBox, we don't believe in watching from the sidelines: our team dives deep into every challenge, innovating and competing at the highest level alongside thousands of experts worldwide.

Want to dive into the technical details? Check out our full Solution Writeup on Kaggle.
