⚡️Trendbreak #24⚡️

Kumustá! 🇵🇭

Privacy protection laws (such as GDPR or the French Data Protection Act) strictly regulate how personal data can be used, particularly for analysis or for building machine learning models. To work within these constraints, a range of techniques has been developed around "synthetic data". The idea is to simulate the behavior of digital doppelgängers 🤖🥸: realistic enough to preserve the statistical properties of the real data, yet free of the constraints attached to real individuals.
This article from MIT Technology Review showcases several companies and use cases built on "fake data", along with a few caveats: unsurprisingly, the trained models still depend on the quality of the input data, a nod to the famous saying "Garbage in, garbage out!".
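
To make the idea concrete, here is a minimal sketch of one very simple way to produce synthetic records: estimate a few statistics from the "real" table, then sample brand-new rows from a distribution with the same statistics. Everything here is an assumption made for illustration: the (age, income) columns, the Gaussian model, and the helper names `fit_gaussian` and `sample_synthetic` are hypothetical, and this is not the method used by the companies in the article. The "real" data is itself simulated so that the example stays self-contained.

```python
import math
import random


def fit_gaussian(rows):
    """Estimate means, standard deviations and correlation of two columns."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    std_x, std_y = math.sqrt(var_x), math.sqrt(var_y)
    return mean_x, mean_y, std_x, std_y, cov / (std_x * std_y)


def sample_synthetic(params, n, rng):
    """Draw n new rows from a bivariate Gaussian with the fitted statistics."""
    mean_x, mean_y, std_x, std_y, rho = params
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + std_x * z1
        # Mixing z1 and z2 this way gives x and y the correlation rho.
        y = mean_y + std_y * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Stand-in for a "real" table of (age, income) pairs; simulated here (assumption).
real = []
for _ in range(2000):
    age = rng.gauss(40, 10)
    income = 1000 + 50 * age + rng.gauss(0, 300)
    real.append((age, income))

real_params = fit_gaussian(real)
synthetic = sample_synthetic(real_params, 2000, rng)
synth_params = fit_gaussian(synthetic)

print("real      (mean age, mean income, correlation):",
      round(real_params[0], 1), round(real_params[1]), round(real_params[4], 3))
print("synthetic (mean age, mean income, correlation):",
      round(synth_params[0], 1), round(synth_params[1]), round(synth_params[4], 3))
```

No synthetic row corresponds to a real individual, yet the means and the age/income correlation stay close to the originals. The caveat from the article applies here too: if the real table is biased or noisy, the synthetic one faithfully inherits those flaws.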

If you have so much as glanced at a machine learning tutorial, you have probably come across the "Iris" dataset. It is a set of measurements (petal and sepal sizes) taken from specimens of three different iris species 🌸. This short blog post traces the origins of this venerable dataset.

Machine learning places a premium on prediction problems: classification, where an algorithm correctly labels the content of an image or diagnoses an illness, and regression, where it accurately estimates, say, a property's price from its characteristics. A significant portion of scientific research, however, is more concerned with understanding a phenomenon, often by modelling it mathematically. This is true of physics, climate science, and economics 👨‍🔬🧑‍🔬🌪🏦.
This fascinating Quanta Magazine article describes the emergence of "science robots", capable of distilling natural laws from quantitative observations as well as, if not better than, a physicist might. This type of approach can yield far better results than deep learning models, for example when producing equations that describe planetary motion 🌞🌗🌎🪐.
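
As a toy illustration of what "distilling a law from observations" can look like, the sketch below tries a handful of candidate formulas relating a planet's orbital period T to its distance from the Sun a, and keeps the one that fits best. The tools described in the article search a far larger space of expressions automatically; here the list of candidates is hand-picked and purely hypothetical, as is the helper `fit_and_score`. The planetary values are rounded, approximate figures.

```python
import math

# Semi-major axis a (in AU) and orbital period T (in years); rounded values.
planets = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}

# Hypothetical, hand-picked search space of candidate laws T = c * f(a).
candidates = {
    "c * a": lambda a: a,
    "c * a**2": lambda a: a ** 2,
    "c * a**1.5": lambda a: a ** 1.5,
    "c * sqrt(a)": lambda a: math.sqrt(a),
    "c * exp(a)": lambda a: math.exp(a),
}


def fit_and_score(f, data):
    """Fit c by least squares, then return (c, mean squared relative error)."""
    fs = [f(a) for a, _ in data]
    ts = [t for _, t in data]
    c = sum(fi * ti for fi, ti in zip(fs, ts)) / sum(fi * fi for fi in fs)
    err = sum(((c * fi - ti) / ti) ** 2 for fi, ti in zip(fs, ts)) / len(ts)
    return c, err


data = list(planets.values())
results = {name: fit_and_score(f, data) for name, f in candidates.items()}

# Keep the candidate with the smallest error across all planets.
best = min(results, key=lambda name: results[name][1])
best_c, best_err = results[best]
print(f"best candidate: T = {best}, with c = {best_c:.3f}")
```

The winning formula, T proportional to a to the power 1.5, is Kepler's third law. Unlike a deep learning model, the result is a short equation that a human can read, check, and reuse.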

Happy reading and have a wonderful week! 📚

By @Clément Chastagnol
Tags: #Trendbreak