From Theory to Trade: Mastering the Corporate Code

Last week, I had the privilege of attending the Biotechno Est Forum in Strasbourg, France. The event is primarily a job fair, and it's organized by and for PhD students and post-docs. The forum featured various roundtables that focused on the transition from academia to industry.

I was honored to participate in the data science and AI roundtable, which was organized by Yung-Chieh Chen. My fellow speakers included Dr. Alexander Neumann, co-founder of Omiqa Bioinformatics, a Berlin-based CRO specializing in sequencing data analysis; Dr. Zeineb Si Chaib, a senior scientist at Schrödinger working on AI-driven drug design; and Dr. Benoît Rosa, a CNRS researcher in medical robotics and a long-time friend of mine.

For my brief presentation, I wanted to share the advice that I wish someone had given me when I first entered the industry after completing my PhD. I reflected on the skills that I developed during my PhD that proved to be valuable later on, as well as the differences between academia and industry.

Below, I've summarized the key points from my presentation. Keep in mind that this is not an exhaustive list, and my insights are based solely on my own experiences. So, take it with a grain of salt!

Official poster of the event (credits Forum Biotechno Est).

Your strengths, out of your PhD

As a PhD holder, you come with a unique set of skills that make you an asset to any organization. Here are some of the strengths that you can leverage in your professional life:

Curiosity and capacity to dig deeper on an issue
Clear writing abilities
Public speaking skills
Ability to formulate or reformulate an issue
Ability to precisely define a concept

To be clear, these are strengths that you're supposed to have acquired during your PhD. Let's go through each one.

Curiosity and capacity to dig deeper on an issue

To be fair, it's not so much a capacity, as a willingness: most people are not used to asking "why" all the time and looking for sources, but it's been drilled into us during our PhD.

You're also used to training yourself on new topics constantly, and to efficiently seeking out high-quality information. This is a valuable skill that you can share with your team, especially since most engineers don't have that habit and are not used to attending conferences. I think creating or encouraging a culture around this in your team is a way to have a lot of impact: you'll have a multiplying effect on the whole team, which is really high-value. In the data science and ML space, there are a lot of conferences that you can attend online and for free.

Clear writing abilities

This is an absolute must, and this is something everyone can keep working on, because we always need to make things clearer and clearer, and to adapt our discourse to our reader/audience.

In the companies you'll work at, people will usually turn to you when they encounter complex things, in the hope that you help them cut through the complexity, and gain a better understanding. So if you're making things even more confusing, you'll be letting them down! And they probably won't ask you next time.

Public speaking skills

This can seem surprising, but during your PhD you've probably had far more opportunities to do presentations, talks, or seminars, than most engineers your age. And you've probably had some tutoring around public speaking.

This is a valuable skill. If you're not yet comfortable with it, you should absolutely take every opportunity to speak publicly. This will open up new possibilities for you, for instance being able to talk with customers who have varying degrees of technical knowledge.

Ability to formulate or reformulate an issue

Research gives you the ability to be more comfortable with complexity and uncertainty than most people. In companies, you'll find a lot of people that are very good at execution, meaning they can code well and fast (you should definitely try to learn as much as you can from them!). But you usually have a better technical culture when it comes to algorithms, machine learning models, statistical methods and pitfalls.

So seek out problems that people have, listen to them very carefully, and ask questions to get past the symptoms of the problems, and closer to the causes; helping them can be about developing a sufficient perspective that allows you to reformulate it in a better and simpler framework, or creating a shared and precise vocabulary. From there the solution can often be an order of magnitude easier to find, or you can realize that there's a completely different way of doing things that may avoid the issue entirely.

Ability to precisely define a concept

You'll realize quickly that in the corporate world people are usually very lax with definitions. The same name can mean different things to different people, and different names might be used for the same thing. This can lead to confusion and miscommunication.

Be willing to ask seemingly dumb questions, and confront people with examples to get to a very precise definition, meaning one that can be implemented with code. A very common example is the definition of "customer": depending on who you ask (sales, customer service, finance...), the definition will differ in subtle ways.
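To make "a definition that can be implemented with code" concrete, here is a minimal sketch. Everything in it is hypothetical: the `Account` fields, the two rules, and the 365-day window are assumptions chosen for illustration, not definitions from any real company. The point is only that writing each team's rule down as a function makes the disagreement visible.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Account:
    # Hypothetical fields, for illustration only.
    account_id: str
    signed_contract: bool
    last_paid_invoice: Optional[date]


def is_customer_sales(account: Account) -> bool:
    """Assumed 'sales' view: anyone who has signed a contract."""
    return account.signed_contract


def is_customer_finance(account: Account, today: date, window_days: int = 365) -> bool:
    """Assumed 'finance' view: anyone who paid an invoice in the last `window_days`."""
    if account.last_paid_invoice is None:
        return False
    return (today - account.last_paid_invoice) <= timedelta(days=window_days)


today = date(2024, 1, 1)
accounts = [
    Account("A", signed_contract=True, last_paid_invoice=date(2023, 11, 15)),
    Account("B", signed_contract=True, last_paid_invoice=None),
    Account("C", signed_contract=False, last_paid_invoice=date(2021, 6, 1)),
]

sales_customers = [a.account_id for a in accounts if is_customer_sales(a)]
finance_customers = [a.account_id for a in accounts if is_customer_finance(a, today)]

print(sales_customers)    # ['A', 'B']
print(finance_customers)  # ['A']
```

Account B is a "customer" for one team and not for the other, which is exactly the kind of example you can bring to a meeting to force a shared definition.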

Depending on the environment, you might also find that there is very little written information accessible to everyone in the company: info might be in Word documents or PowerPoint presentations, emails, or worse, inside people's heads... In that case I think it's your responsibility, since you're probably the best suited person for that task, to start some kind of internal wiki, and to make information accessible, searchable, and up-to-date.

What is different, and what is implicitly expected from you

No premium on novelty or complexity
Different incentives
Capacity to work within a team is crucial
Knowing how to build and evaluate a baseline

There's no premium on novelty or complexity

This is one of the biggest differences with academia, and you might need quite a long time to shake off that reflex... Usually in the industry, loving novelty and complexity for their own sake is viewed as a sign of someone junior that tries to impress their colleagues. So, not especially favorably.

Granted, sometimes complexity can be an argument wielded by sales and marketing to help sell your product. But overall you need to start thinking a lot more like an engineer: more moving parts and more complexity are bad, because they make things more fragile, and more difficult and expensive to maintain.

A question I like asking myself when confronted with a new problem is: "what's the absolute dumbest solution that will work, and that I can get away with?". And then I implement that trivial solution. Even if you might need a more complex solution in the end, a simple one is not a waste of time: it will help you better understand the problem, it can act as a baseline (and sometimes a surprisingly difficult one to beat!), or as a fail-safe: once you embed your solution in a larger system, if conditions make your "complex" solution fail, you can still default to the "dumb" one.
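The fail-safe idea can be sketched in a few lines. This is only an illustration under assumptions: `trend_model` stands in for whatever "complex" solution you end up with, the median is one possible "dumb" solution, and the history is assumed to be non-empty.

```python
import math
from statistics import median
from typing import Callable, Sequence


def median_baseline(history: Sequence[float]) -> float:
    """The 'dumb' solution: predict the median of what has been observed so far."""
    return float(median(history))


def trend_model(history: Sequence[float]) -> float:
    """Hypothetical 'complex' model: extrapolates the trend over the last 5 points."""
    if len(history) < 5:
        raise ValueError("need at least 5 observations")
    return history[-1] + (history[-1] - history[-5]) / 4


def predict_with_fallback(
    model: Callable[[Sequence[float]], float], history: Sequence[float]
) -> float:
    """Use the model when it works, and default to the baseline when it does not."""
    try:
        prediction = model(history)
    except Exception:
        # In a real system you would also log the failure before falling back.
        return median_baseline(history)
    if not math.isfinite(prediction):
        return median_baseline(history)
    return prediction


fallback_prediction = predict_with_fallback(trend_model, [10.0, 12.0, 11.0])
model_prediction = predict_with_fallback(trend_model, [1.0, 2.0, 3.0, 4.0, 5.0])

print(fallback_prediction)  # 11.0 (too little data, so the median is used)
print(model_prediction)     # 6.0 (the trend model's own output)
```

The same `median_baseline` function doubles as the baseline you compare the complex model against.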

The incentives are different, so you get very different behaviors

No one cares much about SotA (State-of-the-Art, meaning the best performance in the scientific literature for a given research problem at a given date), because SotA does not tie in easily with revenue.
On a related note, you have to optimize for the right metric: data scientists often naïvely use accuracy or something similar, sometimes because it's the most easily accessible metric, sometimes because they didn't talk with the business people. But you need to realize that the model you're building will be a tiny part of a larger system that is supposed to be profitable from the point of view of the company. So, at some point, you have to translate the evaluation of your system into €/$, and you have to monitor its performance.
On the same topic, there's a talk I really like, by Dillon Gardner at PyData London 2022, where he shows very clearly that the best model can sometimes make you lose money.
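As a rough illustration of translating an evaluation into money, the sketch below scores two classifiers on the same 1,000 cases, once with accuracy and once with a monetary value per outcome. All the numbers are made up for the example: the confusion counts, and the assumed value of each outcome (for instance, a missed positive costing 100 while a false alarm costs 10). In practice those values have to come from the business side.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


# Assumed value of each outcome, in €; purely illustrative.
VALUE_PER_OUTCOME: Dict[str, float] = {"tp": 90.0, "fp": -10.0, "fn": -100.0, "tn": 0.0}


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / c.total


def monetary_value(c: ConfusionCounts, values: Dict[str, float] = VALUE_PER_OUTCOME) -> float:
    """Sum the value of every decision the model made."""
    return (
        c.tp * values["tp"]
        + c.fp * values["fp"]
        + c.fn * values["fn"]
        + c.tn * values["tn"]
    )


# Two hypothetical models evaluated on the same 1,000 cases (50 positives).
model_a = ConfusionCounts(tp=10, fp=5, fn=40, tn=945)    # cautious, rarely flags
model_b = ConfusionCounts(tp=40, fp=100, fn=10, tn=850)  # flags a lot more

print(accuracy(model_a), monetary_value(model_a))  # 0.955 -3150.0
print(accuracy(model_b), monetary_value(model_b))  # 0.89 1600.0
```

Under these assumptions the more accurate model loses money and the less accurate one is profitable, because accuracy treats all errors as equally costly and the business does not.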

I'll just throw two other observations here, without much explanation, because I think they're pretty obvious: the time horizons are not the same, and in the industry, "done is better than perfect".

Capacity to work within a team is crucial

This point is mainly about coding. There's a stigma around bad code from researchers. So I suggest that you learn right now the usual tools and methodologies around version control, code testing, code reviews, agile work methods and rituals... if you don't know them already.

First of all, you won't be looked at like a complete alien when you arrive on your first day in a team, and ask "what's that git thingy you keep talking about?". And second, it might actually help you during your PhD, to track the changes in your manuscript, in your experimental results, or in your statistical analyses.
If you still need to be convinced by someone else, here's a very detailed blog post about why you might care (OK, I admit that the intro is not very encouraging, but I promise it's worth it).

Knowing how to build and evaluate a baseline

In the industry, when you have a new problem to solve, usually there's no clear baseline. It's up to you to: formalize the data you need to collect (what dimensions do you need? how will you code the information, what's the desirable level of granularity?), then actually gather it, build a first simple system, define the modalities of evaluation, run a first evaluation, and then improve on it.
This is not simple. Hopefully your experience in academia has exposed you to some of these steps, so you already have a good grasp of good practices. I hope it also made you absolutely obsessed with having good-quality and up-to-date data.

Also, just to be clear, when I say "improve on it", I'm not talking about beating SotA here. First, because you won't be in the same conditions: it's very unlikely that your particular use case is exactly the same as a similar and famous research task, for which clean and open datasets exist. And second, because you don't need to beat every system out there; usually you just need to show improvement on past performance.
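Here is a minimal sketch of the "first simple system, first evaluation, then improve" loop, on made-up labels. The majority-class predictor is just one possible first baseline, the `candidate_predictions` are invented, and accuracy is used only to keep the example short; the metric should really be the one that matters to the business.

```python
from collections import Counter
from typing import Hashable, Sequence


def fit_majority_class(train_labels: Sequence[Hashable]) -> Hashable:
    """A first, trivial system: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up data, for illustration only.
train_labels = ["ok"] * 8 + ["defect"] * 2
test_labels = ["ok", "ok", "defect", "ok", "ok"]

# Step 1: build and evaluate the baseline on held-out data.
majority_label = fit_majority_class(train_labels)
baseline_predictions = [majority_label] * len(test_labels)
baseline_accuracy = accuracy(test_labels, baseline_predictions)

# Step 2: evaluate a (hypothetical) improved system on the same held-out data.
candidate_predictions = ["ok", "ok", "defect", "ok", "ok"]
candidate_accuracy = accuracy(test_labels, candidate_predictions)

print(baseline_accuracy)   # 0.8
print(candidate_accuracy)  # 1.0
```

What you report is the gap between the two numbers on the same data, not a comparison with a published benchmark.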

Conclusion

There it is, everything I wish I'd been told before entering the industry after my PhD.

Coming from academia, the industry is a completely new world, and you have a lot of things to learn, so it's exciting! A lot of the problems you'll encounter will not seem very interesting at first glance, but if you dig far enough into the details, you'll suddenly understand their intricacy. It's up to you to uncover this hidden complexity, and to elucidate it, so that you can best address it.

One last crucial piece of advice: when searching for your first job, I think you should be as careful as when you chose your PhD advisor. Seek teams that already have a knack for learning new stuff and continuously improving how they work, but where your skills don't overlap too much. This way, everyone will have something to learn from one another. And having someone more experienced who took a similar journey, with a stint in academia, will be beneficial.