Huomenta! 🇫🇮 ("good morning" in Finnish, the language of Finland, whose northern region of Lapland is home to Santa Claus! 🎅)
For this holiday newsletter, we begin with Miller, a handy tool that allows you to easily examine and transform CSV or JSON text files from the command line. The documentation is very well done and the tool is quite powerful!
We continue with a blog post that demonstrates, through a simple example, that when you want to improve a classification algorithm it is often more effective to invest time in acquiring new data than in building more sophisticated models. The argument works by degrading good-quality data and showing that even sophisticated models trained on the degraded version cannot recover the performance obtained with the original data.
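To make the argument concrete, here is a toy analogue of our own (not the blog post's experiment): two balanced classes described by a single Gaussian feature, which we "degrade" by adding extra Gaussian noise. Because the optimal decision rule is known in this setup, we can compute the best accuracy any classifier could reach on the degraded feature; no model, however sophisticated, can exceed that ceiling. The class means, noise levels and sample size are arbitrary assumptions.

```python
import random
from statistics import NormalDist

# Toy setup (assumption, not the blog post's experiment): two balanced classes
# whose single feature has unit variance, centred at -1 (class 0) and +1 (class 1).
MU0, MU1, SIGMA = -1.0, 1.0, 1.0


def bayes_accuracy(noise_sd: float) -> float:
    """Best accuracy ANY classifier can reach once N(0, noise_sd^2) noise is
    added to the feature. With equal priors and equal variances, the optimal
    rule is a threshold at the midpoint of the two class means."""
    total_sd = (SIGMA ** 2 + noise_sd ** 2) ** 0.5
    return NormalDist().cdf((MU1 - MU0) / (2 * total_sd))


def simulate(noise_sd: float, n: int = 20000, seed: int = 0) -> float:
    """Empirical accuracy of the midpoint-threshold rule on simulated data."""
    rng = random.Random(seed)
    threshold = (MU0 + MU1) / 2
    correct = 0
    for _ in range(n):
        y = rng.random() < 0.5  # True means class 1
        clean = rng.gauss(MU1 if y else MU0, SIGMA)
        degraded = clean + rng.gauss(0.0, noise_sd)  # the "degradation" step
        correct += (degraded > threshold) == y
    return correct / n


for noise in (0.0, 1.0, 2.0):
    print(f"noise_sd={noise}: ceiling={bayes_accuracy(noise):.3f}, "
          f"simulated={simulate(noise):.3f}")
```

The ceiling drops as the noise grows and the simulated accuracy tracks it closely: once information has been lost from the data, getting back to the clean-data accuracy requires better data, not a better model.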
Finally, here is a blog post by Rob Hyndman, Professor of Statistics at Monash University in Australia 🇦🇺🦘, that introduces cross-validation, a technique central to ML, to a more statistical audience. It is also an opportunity for machine learning practitioners to discover other model selection criteria, notably information criteria such as AIC and BIC, which act as a form of regularization by penalizing the number of parameters in the model.
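For readers who want to see both ideas side by side, here is a minimal sketch of our own (not taken from Hyndman's post) for a simple linear regression fitted by ordinary least squares. It computes the leave-one-out cross-validation (LOOCV) error twice, once by refitting the model n times and once from a single fit using the leverage identity e_i / (1 - h_i), and reports an AIC that adds a 2k penalty for the k fitted coefficients. The data points are made up for illustration.

```python
import math

# Hypothetical, made-up data: roughly y = 2x plus some noise.
XS = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
YS = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


def loocv_brute(xs, ys):
    """LOOCV mean squared error: refit without point i, then predict point i."""
    errs = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errs.append((ys[i] - (a + b * xs[i])) ** 2)
    return sum(errs) / len(errs)


def loocv_shortcut(xs, ys):
    """Same quantity from one fit: the held-out residual equals e_i / (1 - h_i)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = 1 / n + (x - xbar) ** 2 / sxx  # leverage in simple linear regression
        total += ((y - (a + b * x)) / (1 - h)) ** 2
    return total / n


def aic(xs, ys, k=2):
    """Gaussian AIC up to an additive constant: n*log(SSE/n) + 2k.
    k counts the fitted coefficients (intercept and slope by default)."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return n * math.log(sse / n) + 2 * k


print(f"LOOCV (refit n times): {loocv_brute(XS, YS):.6f}")
print(f"LOOCV (single fit):    {loocv_shortcut(XS, YS):.6f}")
print(f"AIC:                   {aic(XS, YS):.3f}")
```

The two LOOCV values agree to floating-point precision, which is why leave-one-out cross-validation is cheap for linear models. The AIC value on its own means little; it is useful when compared across models fitted to the same data, where each extra parameter costs 2 points.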
Happy holidays everyone, and enjoy your vacation if you are taking one!