Inuugujoq! 🇬🇱
Just a few days ago, the code-hosting platform GitHub (acquired by Microsoft a few years back) announced the general availability of Copilot, its new product developed in partnership with OpenAI. It's an AI-powered code-writing assistant, kind of like autocomplete on steroids: during tests with over a million developers, it wrote up to 40% of the code 🤖🧑‍💻.
Technically, Copilot builds on recent advances in language generation models based on deep neural networks, such as GPT-3. After all, a programming language can reasonably be considered a language, with its grammar (usually stricter than that of human languages), its idioms, and even its aesthetics.
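To make the idea concrete, here's a minimal sketch of that principle: a language model simply predicts the next tokens given a prompt, and code completion is the same mechanism applied to source code. The example below uses the small, general-purpose GPT-2 model via the Hugging Face transformers library purely as an illustration; Copilot's actual model is far larger and trained specifically on code.

```python
# Illustration only: next-token generation with a small, general-purpose model.
# Copilot relies on a much larger model trained on code, but the mechanism is the same.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "def fibonacci(n):\n    "
completion = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(completion[0]["generated_text"])
```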
The training data for such a model is the other critical component. On this point, as with other models created by Big Tech companies, questions have been raised, since Copilot is trained on a massive amount of open-source code. While GitHub says it filters out projects whose licenses prohibit reuse of the code for commercial purposes, this remains a gray area when it comes to "fair use". Additionally, it appears that around 40% of the generated code contains security flaws, raising questions about liability.
Chances are you're familiar with scikit-learn, the world's most widely used machine learning library. Intel has developed an open-source extension, Intelex, that dramatically improves computational performance, cutting the training time of many models by a factor of 10 or even 100! 🚀
Intel relies on a number of low-level optimizations (processor instruction sets optimized for vector computation, computation libraries like oneMKL) that work on both CPUs and GPUs. You can find the list of supported models here.
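In practice, enabling the extension is typically a two-line patch applied before importing any scikit-learn estimators. Here's a minimal sketch, with a placeholder dataset and model just to show the pattern:

```python
# Patch scikit-learn with Intel's accelerated implementations
# before importing any estimators.
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data, purely to exercise a patched estimator.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)  # dispatched to the accelerated implementation when supported
print(clf.score(X, y))
```

Estimators that aren't on the supported list simply fall back to stock scikit-learn, so the patch is safe to apply across the board.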
We'll wrap up this edition with a scientific literature review, wherein Andrew Gelman (a professor at Columbia University in New York, and one of the main developers of the probabilistic programming language Stan) and Aki Vehtari (a professor at Aalto University 🇫🇮) detail the eight most important statistical concepts of the past fifty years. Among them, you'll find causal inference, regularization and overparameterized models, robust inference, and exploratory data analysis.
With over 180 references selected by the authors, this article is a good starting point to learn more about these fundamental techniques and ideas.
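To make one of those ideas concrete, here's a tiny sketch of regularization on synthetic data: with more features than the data can reliably support, ordinary least squares produces inflated, noisy coefficients, while a ridge (L2) penalty shrinks them toward zero.

```python
# Toy illustration of regularization: ridge regression vs. ordinary least squares
# when there are more features than the data can reliably support.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 40))            # few samples, many features
y = 3.0 * X[:, 0] + rng.normal(size=50)  # only the first feature actually matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)      # alpha controls the penalty strength

print(np.abs(ols.coef_[1:]).mean())    # spurious coefficients stay large and noisy
print(np.abs(ridge.coef_[1:]).mean())  # the L2 penalty shrinks them toward zero
```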
As a bonus, here's a visualization of these articles in the form of a timeline by Anna Menacher, a doctoral student at Oxford.
Happy reading and have a great week! 🤓