Ayubowan! 🇱🇰
Today we'll cover the Parquet file format, the Lux visualization library, and the origins of Jupyter notebooks, the data scientist's tool of choice.
The CSV format we handle every day has two undeniable advantages: it is easy to read, and it is compatible with almost every system and tool. However, more efficient formats have emerged in recent years, especially for processing large files. Among them, Parquet is becoming increasingly established. It is an open-source project of the Apache Software Foundation, and because it stores data column by column, with compression, it saves both storage space and processing time. Here is a blog article that details its advantages over the venerable CSV.
There are many visualization tools in Python 🐍. Lux 💡 is a newcomer, actively developed by the RISE Lab and the School of Information at UC Berkeley, and designed to speed up exploratory data analysis. It appears as an interactive widget in Jupyter notebooks, works directly on pandas dataframes, and automatically recommends suitable visualizations based on the data types of the columns 🤖📊.
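As a rough illustration of the idea of choosing a chart from the data type, here is a toy sketch. The `suggest_chart` helper and its rules are hypothetical and are not Lux's actual recommendation logic; they only use pandas' dtype-checking functions.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def suggest_chart(series: pd.Series) -> str:
    """Hypothetical rule: pick a chart type for one column from its dtype.

    This is NOT how Lux works internally; it only illustrates choosing
    a visualization based on the data type.
    """
    if is_datetime64_any_dtype(series):
        return "line chart"   # values over time
    if is_numeric_dtype(series):
        return "histogram"    # distribution of a quantitative column
    return "bar chart"        # counts per category


df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=4, freq="D"),
    "price": [1.5, 2.0, 2.5, 3.0],
    "city": ["Colombo", "Kandy", "Galle", "Kandy"],
})

for column in df.columns:
    print(column, "->", suggest_chart(df[column]))
```

Lux goes much further, for example by ranking several candidate charts and showing them side by side, but the starting point is the same: the column types constrain which visualizations make sense.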
And to wrap up this edition, here is a (long) article from The Atlantic on the origins of Jupyter notebooks, today's weapon of choice for data scientists. They are the latest incarnation of a long-running debate about the future of the scientific paper, one that has involved notable figures such as Bret Victor (former Apple interface designer) and Stephen Wolfram (creator of Mathematica, a visionary and controversial physicist).
Happy reading and have a great week! 📚