Today we're talking about one of the most frustrating bottlenecks in biosimilar development: how to efficiently screen a long list of quality-modulating compounds when your standard tools — OVAT on one end, a massive DoE on the other — both break down under the weight of the problem.
This is a two-part episode of the Smart Biotech Scientist Podcast. In Part 1, I'll walk you through the problem and the conceptual framework we developed to solve it. In Part 2, we go hands-on: how to actually build this in your lab.
Why Your DoE Is Probably Wrong — And the Smarter Way to Screen 17 Compounds
In the 1990s, the pharmaceutical industry went through a revolution in drug discovery.
Before that shift, chemists were synthesizing and testing candidate molecules one at a time. It was rigorous. It was thorough. And it was slow.
Then high-throughput screening arrived.
The idea was simple: instead of testing one compound at a time, you automate, miniaturize, and parallelize. You screen thousands — sometimes millions — simultaneously. The winning molecules emerged from the data, not from any individual scientist’s intuition. That shift compressed decades of chemistry into years.
Now ask yourself this:
When your cell culture team needs to evaluate 17 potential quality-modulating compounds for a biosimilar development program, how are they doing it?
One compound at a time?
Or one massive design of experiments with all 17 factors thrown in together?
If either of those is the answer — you're leaving months on the table, and you're likely missing the biological interactions that matter most.
This episode is about borrowing that same spirit of intelligent parallelization, applying it to cell culture media optimization, and getting you to your answer in two rounds of experiments.
The 17-Compound Problem
Let me set the scene. You're developing a biosimilar monoclonal antibody. Your glycan profile doesn't match the reference product — high mannose content is off, galactosylation doesn't align, fucosylation is out of range. These are your critical quality attributes, and the gap needs to close.
You do your homework. You review the literature, consult your colleagues, look at what compounds have been shown to modulate glycosylation in CHO cells. At the end of that exercise, you have 17 candidate quality-modulating compounds. Not two. Not five. Seventeen.
Option A is one-variable-at-a-time — OVAT. Test each compound independently, one at a time. This is intuitive and simple to execute. It also takes many experiments, and it will miss every interaction effect. In a system as complex as CHO glycosylation, those interactions are not a secondary concern. They're often the whole story.
Option B sounds more rigorous: one large design of experiments with all 17 factors. Statistically designed, combinatorial. In theory, you capture everything at once.
In practice, it has three fatal flaws.
First: dilution effects. Adding 17 stock solutions to your medium changes the total volume. That dilutes everything else — basal medium, glucose, glutamine. Your model is trying to interpret compound effects against a background that's constantly shifting. Signal degrades.
Second: combinatorial toxicity. Without prior concentration qualification for each of your 17 compounds, some combinations will be toxic. Cells die, wells fail, data disappears — in a clustered, non-random way. With 17 unqualified factors, data loss may jeopardize your ability to draw any conclusions from your experiment.
Third: masking. If one dominant modulator is in the mix — a mannosidase inhibitor pushing high mannose to 90 percent — that signal drowns out the more subtle effects of the other 16 compounds. The candidates that might give you fine-tuned control never surface.
We ran into all three of these problems. And that's what forced us to develop a different approach.
The Parallel Group Method
The core idea: instead of testing all 17 compounds together, split them into five parallel experiments, each with five factors or fewer. Then — critically — run all five experiments at the same time.
This is not a compromise between OVAT and a single large DoE. It is strictly better than both, simultaneously.
Here's how we built the groups. The governing principle: group by biological mechanism, not by convenience.
Groups 1 and 2: contained the high mannose modulators. Group 1 held the Golgi processing sugars — compounds like raffinose and GlcNAc that affect mannosylation through osmotic and metabolic mechanisms. Group 2 held the mannosidase inhibitors — compounds that directly block enzymatic trimming of high-mannose glycan structures.
Group 3: targeted sialylation and charge variant modulators.
Group 4: targeted fucosylation and galactosylation drivers.
Group 5: contained growth promoters — compounds that affect culture performance and modulate glycosylation indirectly through metabolic changes.
In every group, we included two anchor compounds: manganese and asparagine. These are well-characterized modulators with documented effects across many CHO cell lines. They served as internal calibration references, allowing us to compare results across groups even though each group was an independent experiment.
All five experiments ran simultaneously in 96-well deep-well plate fed-batch cultures, using robotic liquid handling.
Why is this strictly better?
On dilution: you're adding at most five stock solutions per well instead of 17. Dilution effects are minimal.
On masking: if swainsonine is pushing high mannose to 90 percent in Group 2, it only masks the other four compounds in Group 2. Groups 3, 4, and 5 are completely unaffected.
On calendar time: all five groups run in parallel. Elapsed time is identical to running a single experiment.
You get the biological focus of a small experiment and the candidate breadth of a large one. At the same time, in the same calendar window. In other words, the math works out in your favor on every axis: less time, better signal quality, and better interpretability.
The Multivariate Selection Engine
After a screen like this, you may be tempted to look at one or two key glycoforms — high mannose, or G0F — pick whatever condition showed the best result, and move on. That approach is using about 10 percent of the information your screen just generated.
We measured 13 glycoforms. In biosimilar development, every glycoform is potentially relevant to your CQA specification. Improving three glycoforms while worsening five others doesn't bring you closer to the reference product — it might push you further away on the attributes you didn't check.
To optimize all 13 glycoforms simultaneously toward the reference product profile, you need multivariate statistics. We used three tools in sequence.
The first is PCA — principal component analysis.
PCA compresses your 13-dimensional glycan dataset into two or three dimensions you can visualize. Every experimental condition becomes a point on a score plot. Conditions with similar full glycan profiles cluster together; conditions that differ are separated.
The key move: you project the reference product as an external point onto that same score plot. Now you have a map, and the reference product is the target marked on it. You can see — visually — which conditions are close to where you need to be and which are not.
In our study, three principal components captured 76 percent of the total glycan variance. Conditions containing 2F-peracetyl-fucose clustered far from the reference product target. Conditions containing raffinose clustered closest to it. For the first time, we could see the entire quality landscape in a single picture.
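To make the mechanics concrete, here is a minimal sketch of that projection step using scikit-learn. The data here is randomly generated for illustration — in practice you would substitute your measured conditions-by-glycoforms matrix and the reference product's glycan profile. The key move is standardizing and transforming the reference with the model fitted on the screening data, so it lands as an external point on the same score plot.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical data: 40 screening conditions x 13 glycoforms (% of total glycans)
X = rng.uniform(0, 20, size=(40, 13))
ref = rng.uniform(0, 20, size=(1, 13))  # reference product glycan profile

# Standardize on the screening data, then compress 13 dimensions down to 3
scaler = StandardScaler().fit(X)
pca = PCA(n_components=3).fit(scaler.transform(X))

scores = pca.transform(scaler.transform(X))       # each condition: a point on the score plot
ref_score = pca.transform(scaler.transform(ref))  # reference projected as an external point

print(f"Variance captured by 3 PCs: {pca.explained_variance_ratio_.sum():.0%}")
# Conditions whose scores lie near ref_score have glycan profiles closest to the target
```

Plotting `scores` with `ref_score` overlaid gives you the quality landscape in one picture, with the reference product marked as the target.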
The second tool is Mahalanobis distance.
PCA gives you a map. Mahalanobis distance gives you a number: the multivariate distance from each experimental condition to the reference product target. The lower the number, the better the glycan match.
Unlike simple Euclidean distance, Mahalanobis distance accounts for the correlation structure between glycoforms — and glycoforms are biologically correlated. If one structural class changes, others tend to move predictably. Mahalanobis distance treats those correlations correctly, making it a more accurate measure of how close you actually are to your target profile.
You rank all conditions from lowest to highest distance. The top 20 to 25 percent — the conditions with the lowest Mahalanobis distance — become your confirmation candidates. The selection is objective, data-driven, and fully defensible.
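A sketch of that ranking step, again with placeholder data: the inverse covariance matrix of the glycoform measurements encodes their correlation structure, and feeding it into SciPy's Mahalanobis distance gives each condition a single number measuring its multivariate distance to the reference.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(1)
X = rng.uniform(0, 20, size=(40, 13))   # hypothetical: 40 conditions x 13 glycoforms
ref = rng.uniform(0, 20, size=13)       # reference product glycan profile

# Inverse covariance of the glycoform data captures how glycoforms co-vary;
# pinv guards against a near-singular covariance matrix
VI = np.linalg.pinv(np.cov(X, rowvar=False))

# Multivariate distance from each condition to the reference target
d = np.array([mahalanobis(x, ref, VI) for x in X])

# Rank conditions; the lowest ~25% become confirmation candidates
order = np.argsort(d)
top = order[: len(d) // 4]
print("Confirmation candidates (condition indices):", top)
```

Using Euclidean distance here would amount to setting `VI` to the identity matrix — treating all 13 glycoforms as independent, which they are not.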
In our paper: conditions with raffinose consistently ranked closest to the reference product. Conditions with 2F-peracetyl-fucose ranked furthest.
The third tool is a decision tree.
You now know which conditions performed best. The decision tree tells you why. It takes your Mahalanobis rankings as input and generates a hierarchical set of if-then rules — which compound, at which concentration level, most reliably drives conditions toward the top of the ranking.
Two rules are non-negotiable. First: always cross-validate. Sevenfold cross-validation is standard — you partition your data, train and test iteratively, and ensure your rules hold up on data the model hasn't seen. Second: prune the tree. An unpruned tree overfits your specific dataset and gives you rules that don't generalize.
The output is a set of interpretable decision rules you can read out loud, explain to your quality team, and defend to regulatory reviewers. No black box. Every branch is understandable and traceable.
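The steps above can be sketched with scikit-learn. The design matrix and response below are fabricated for illustration (the compound names are drawn from the episode, but the concentration levels and distances are synthetic); the point is the workflow — sevenfold cross-validation to check generalization, pruning to keep the tree small, and a readable printout of the resulting if-then rules.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

# Hypothetical design matrix: concentration levels (0/1/2) for five compounds
factors = ["raffinose", "GlcNAc", "manganese", "asparagine", "galactose"]
X = rng.integers(0, 3, size=(40, 5)).astype(float)

# Hypothetical response: Mahalanobis distance to the reference (lower is better),
# constructed so raffinose dominates, as in the study's results
y = 3.0 - 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.2, 40)

# Shallow, cost-complexity-pruned tree to avoid overfitting
tree = DecisionTreeRegressor(max_depth=3, ccp_alpha=0.01, random_state=0)

# Sevenfold cross-validation: do the rules hold up on unseen data?
cv_scores = cross_val_score(tree, X, y, cv=7)
print(f"Mean CV R^2: {cv_scores.mean():.2f}")

# Fit on the full dataset and print the if-then rules you can read out loud
tree.fit(X, y)
print(export_text(tree, feature_names=factors))
```

The `export_text` output is exactly the kind of artifact you can hand to a quality team or a regulatory reviewer: a short list of threshold rules, each traceable back to the data.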
Results Preview
Three group winners emerged: raffinose from Group 1, galactose from Group 4, Enhancer 2 from Group 5 — combined with a temperature shift to 33 degrees Celsius.
In shake tube confirmation: 75 percent of confirmation conditions outperformed the best 25 percent of the initial 96-well screen.
Two rounds of experiments. Estimated time savings of three to six months. Quality testing cost reduction of more than 50 percent.
What's Evolved Since Publication: Hybrid Modeling
One honest caveat before I close Part 1.
The statistical tools we used — PCA, Mahalanobis distance, decision trees — were the right tools in 2017. They still work. But if I were running this study today, I would add one more layer: hybrid modeling.
Hybrid modeling combines mechanistic knowledge of your bioprocess with machine learning trained on experimental data. Applied to this workflow, it can do two things we didn't do in the paper. First, it can help design a smarter initial 96-well screen by predicting which concentration ranges and combinations are most informative, based on historical bioprocess data you already have. Second, it can minimize the confirmation experiments needed to validate your screen results.
I've covered hybrid modeling in depth with several guests. Michael Sokolov — co-author on this paper — walked through the fundamentals in Episodes 5 and 6. Krist Gernaey took it further into digital twin territory in Episodes 173 and 174. Fabian Feidl covered the practical side in Episodes 99 and 100, and Yossi Quint in Episodes 137 and 138. All of those are linked in the show notes.
In Part 2, we go hands-on. How to design your compound groups by biology. How to set concentration ranges without losing data. How to run the 96-well screen with the rigor this method requires. And three things I would do differently if I were running this study today.
Further Reading
The full peer-reviewed paper: D. Brühlmann et al., "Parallel Experimental Design and Multivariate Analysis Provides Efficient Screening of Cell Culture Media Supplements to Improve Biosimilar Product Quality," Journal of Biotechnology.
Further Listening
Episodes 05 - 06: Hybrid Modeling: The Key to Smarter Bioprocessing with Michael Sokolov
Episodes 173 - 174: Mastering Hybrid Model Digital Twins: From Lab Scale to Commercial Bioprocessing with Krist Gernaey
Episodes 99 - 100: From Raw Data to Actionable Insights: Unlocking the Power of Process Models with Fabian Feidl
Episodes 137 - 138: Skip 90% of Bioreactor Runs: The In Silico Revolution in Bioprocess Development with Yossi Quint
Next Step
If you found value in today’s episode, take a moment to like, follow, and leave a review on Apple Podcasts or your favorite platform—it helps us reach and support more scientists like you.
Thanks for tuning in to the Smart Biotech Scientist podcast and being part of this journey toward bioprocess mastery. For more insights and practical tips, visit www.smartbiotechscientist.com.
David Brühlmann is a strategic advisor who helps C-level biotech leaders reduce development and manufacturing costs to make life-saving therapies accessible to more patients worldwide.
He is also a biotech technology innovation coach, technology transfer leader, and host of the Smart Biotech Scientist podcast—the go-to podcast for biotech scientists who want to master biopharma CMC development and biomanufacturing.
Hear It From The Horse’s Mouth
Want to listen to the full interview? Go to Smart Biotech Scientist Podcast.
Want to hear more? Do visit the podcast page and check out other episodes.
Do you wish to simplify your biologics drug development project? Contact Us