Bonġu 🇲🇹
A significant leap in causal inference over the past decade has been the advent of synthetic control methods. They're just the ticket for evaluating economic or public health policies where a true randomized trial often isn't feasible, allowing for a more precise estimation of the true treatment effect even with only observational data at hand.
In this blog post, Matteo Courthoud, an economics doctoral student at the University of Zürich 🇨🇭, spells out the nitty-gritty of this method using a detailed example.
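For a taste of the core idea before diving into the post, here is a minimal sketch on made-up data. It is not taken from Matteo's article: it assumes the simplest variant of the method, where the "synthetic" unit is a weighted average of untreated donor units, with non-negative weights summing to one, chosen to track the treated unit before the policy kicks in. The function name, the data, and the use of SciPy's SLSQP solver are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize


def synthetic_control_weights(treated_pre, donors_pre):
    """Find donor weights (>= 0, summing to 1) that best reproduce the
    treated unit's pre-treatment outcomes in the least-squares sense.

    treated_pre: array of shape (T_pre,)
    donors_pre:  array of shape (T_pre, n_donors)
    """
    n_donors = donors_pre.shape[1]

    def loss(w):
        return np.sum((treated_pre - donors_pre @ w) ** 2)

    result = minimize(
        loss,
        x0=np.full(n_donors, 1.0 / n_donors),  # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_donors,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
        options={"ftol": 1e-12, "maxiter": 500},
    )
    return result.x


# Made-up data: 3 donor units observed over 20 pre- and 10 post-treatment periods.
rng = np.random.default_rng(0)
n_pre, n_post = 20, 10
donors = rng.normal(size=(n_pre + n_post, 3)).cumsum(axis=0)

# Hypothetical treated unit: a mix of donors 0 and 1, plus an effect of 2.0 after treatment.
true_weights = np.array([0.6, 0.4, 0.0])
true_effect = 2.0
treated = donors @ true_weights
treated[n_pre:] += true_effect

# Fit the weights on the pre-treatment window only, then compare afterwards.
weights = synthetic_control_weights(treated[:n_pre], donors[:n_pre])
synthetic = donors @ weights
estimated_effect = float(np.mean(treated[n_pre:] - synthetic[n_pre:]))
print(weights.round(3), round(estimated_effect, 3))
```

The gap between the treated series and its synthetic twin after the policy date is the estimated effect. Real applications typically also match on covariates and come with inference procedures such as placebo tests, which this sketch leaves out.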
We keep the ball rolling with a quick read highlighting some lesser-known functions of pandas 🐼 - Python's 🐍 crown jewel for data manipulation. One that often comes to my rescue when summarizing data or whipping up graphs of multiple series is crosstab!
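If you have never used it, here is a small sketch of what crosstab does. The toy DataFrame and its column names are invented for illustration; only `pd.crosstab` and its `values` / `aggfunc` parameters come from pandas itself.

```python
import pandas as pd

# Made-up sales records: one row per sale.
df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb", "Mar"],
    "product": ["A", "B", "A", "A", "B", "B"],
    "units": [3, 5, 2, 4, 1, 6],
})

# Default behaviour: count how many rows fall in each (month, product) cell.
counts = pd.crosstab(df["month"], df["product"])

# With values + aggfunc: aggregate another column per cell instead of counting.
# Cells with no matching rows come out as NaN.
totals = pd.crosstab(df["month"], df["product"], values=df["units"], aggfunc="sum")

print(counts)
print(totals)
```

Because the result has one column per product, calling `totals.plot()` (with Matplotlib installed) draws one line per product, which is what makes crosstab handy for graphs of multiple series. Note that the months are sorted alphabetically here, so a real time axis would want proper dates or an ordered categorical.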
Wrapping up this week's brain food is a presentation from PyData London 2022 by Dillon Gardner, a data scientist who cut his teeth in physics (earning his Ph.D. from MIT) and has dabbled in a mix of fields (adtech, fintech, energy markets...). Gardner shines a light on the chasm between traditional machine learning metrics used to gauge model quality and the metrics that genuinely matter to the business domain in question. He drives this point home with an instructive example of a loan approval algorithm; here, the AUC (Area Under the Curve) falls short because it cannot tell you whether the model is well calibrated.
You can get your hands on the presentation slides here.
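To see why AUC and calibration can disagree, here is a small sketch on simulated data. It is not from Gardner's talk: the "loan default" setup, the cubic distortion, and the helper function are assumptions chosen for illustration. The key fact is that AUC depends only on how the scores rank the cases, so any strictly increasing transformation of the scores leaves it unchanged even when the probabilities themselves become badly off.

```python
import numpy as np


def auc(y_true, scores):
    """AUC as the probability that a random positive case gets a higher
    score than a random negative case (ties count for half)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


# Simulated borrowers: each has a true default probability, and the
# observed outcome is drawn from that probability.
rng = np.random.default_rng(0)
p_true = rng.uniform(0.01, 0.99, size=2000)
defaulted = (rng.uniform(size=2000) < p_true).astype(int)

# Model 1 outputs the true probabilities (well calibrated by construction).
calibrated = p_true
# Model 2 applies a strictly increasing distortion: same ranking of
# borrowers, but the predicted probabilities are now far too low.
distorted = p_true ** 3

auc_calibrated = auc(defaulted, calibrated)
auc_distorted = auc(defaulted, distorted)

# A crude calibration check ("calibration in the large"): compare the
# average predicted default probability with the observed default rate.
gap_calibrated = abs(calibrated.mean() - defaulted.mean())
gap_distorted = abs(distorted.mean() - defaulted.mean())

print(auc_calibrated, auc_distorted)
print(gap_calibrated, gap_distorted)
```

Both models get the same AUC, yet the second one badly underestimates the overall default rate. If a business decision depends on the predicted probability itself, for example when pricing a loan, that difference matters even though the ranking metric cannot see it.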
And while we're on the topic of PyData, let me give you a quick lowdown. It's a community that hosts a plethora of educational events (conferences, meetups...) with most videos readily available online. It's backed by NumFOCUS, a nonprofit that advocates for open practices in research and funds the development of a slew of open source scientific computing libraries in Python (NumPy, Jupyter, SciPy, pandas, Matplotlib...) and other languages.
Here's to a week filled with learning and discovery! 🤓