Zdravo! 🇭🇷
If you keep up with the news, you probably already know: OpenAI, the research lab co-founded by Elon Musk and Sam Altman (former president of Y Combinator), recently released the second iteration of DALL·E (a nod to Pixar's film WALL-E 🤖 and the famous painter Salvador Dalí 🎨).
This is an AI program that generates images from a text description, and the results are very impressive. It builds on two models previously developed by OpenAI: GPT-3, a generative text model (an LLM, or Large Language Model) based on a Transformer architecture, of which the first DALL·E was a 12-billion-parameter version, and CLIP (Contrastive Language-Image Pre-training), a model trained to match images with natural-language descriptions of their content.
The results are staggering; I encourage you to read Dave Orr's account of his experiments here (with lots of images!). The research paper is also available as a preprint here.
If you've always dreamed of observing an (excellent) data scientist in their natural habitat, I suggest this video in which Vincent Warmerdam solves a data deduplication problem by combining exploratory analysis in a Jupyter notebook with a custom data annotation interface built with Prodigy.
He clearly explains the concept of record linkage, a form of data deduplication in which incoming records are matched against an existing reference dataset.
Vincent works at Explosion, the Berlin-based startup behind spaCy, an open source, general-purpose, multilingual library that has revolutionized NLP in recent years, and Prodigy, their product for easily annotating training data. He also created the Python tutorial site Calm Code (previously mentioned in ⚡️Trendbreak #14⚡️).
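To make the record linkage idea concrete, here is a minimal sketch in Python. It is not the approach shown in the video (which relies on a notebook and Prodigy); it only assumes that we want to attach each incoming name to its closest entry in a reference list, using the standard library's difflib for fuzzy matching. The function names, the sample data, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hurt the score."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity between two names, from 0.0 (nothing in common) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(incoming, reference, threshold=0.8):
    """Map each incoming record to its best reference match, or None if below threshold.

    The threshold is an arbitrary illustrative value; in practice it would be
    tuned on annotated examples.
    """
    links = {}
    for record in incoming:
        best = max(reference, key=lambda ref: similarity(record, ref))
        links[record] = best if similarity(record, best) >= threshold else None
    return links


# Hypothetical sample data
reference = ["Explosion AI GmbH", "OpenAI", "DeepMind Technologies"]
incoming = ["explosion ai gmbh", "Open AI", "Acme Corp"]

links = link_records(incoming, reference)
for record, match in links.items():
    print(f"{record!r} -> {match!r}")
```

Records that fall below the threshold are left unlinked rather than forced onto a poor match, and those uncertain cases are typically the ones worth sending to a human annotator.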
We round off this edition with a video from Nature's YouTube channel, detailing a collaboration between a DeepMind engineer and a specialist in ancient Greece 🏛 affiliated with Harvard and the University of Venice. They designed a deep learning-based language model to help researchers restore ancient texts that have come down to us in fragmentary form 📜. The scientific article is available here, along with an interactive demo here.
Enjoy your weekend and happy reading! 📚