Newsletter for April 2022

May 2, 2022 · 4 min read


Data Science

How is Warner Music using AI to turn sound into strategic assets?

Industries that rely on creative content are now moving toward data-centric strategies with dedicated teams and departments. We wanted to share the testimony of Kobi Abayomi, vice president of Data Science at Warner Music. The resource is available both as a podcast and as a transcript. The discussion covers broad topics, ranging from what is going on in the industry and what makes a good data science team to his views on the future of AI.

Turning Sound Into Information: Warner Music Group’s Kobi Abayomi (mit.edu)

Machine Learning

Showcasing and sharing your ML model in the easiest way with Gradio

Gradio is an open-source framework that lets ML engineers quickly share their models through a web interface. Its appeal is its simplicity, while still leaving some room for changes and customization. Gradio can work with any kind of model and data type (images, text, tabular data…). However, Gradio is made for sharing pre-trained models, meaning that it is not meant to be used as an active learning tool in which data is fed back to the model iteratively during training. When you need to share a quick demo and have neither the time nor the need for a bespoke interface, Gradio is a no-brainer.
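To give an idea of how little code a demo requires, here is a minimal sketch. It assumes Gradio is installed (pip install gradio); the word_count function is a hypothetical stand-in for a real pre-trained model's prediction function, and build_demo is a helper name chosen for this example.

```python
def word_count(text: str) -> str:
    """Hypothetical stand-in for a pre-trained model's predict function."""
    n_words = len(text.split())
    return f"{n_words} word(s)"


def build_demo():
    """Wrap the function in a Gradio web interface with a text box in and out."""
    # Imported lazily so word_count can be tried without Gradio installed.
    import gradio as gr

    return gr.Interface(fn=word_count, inputs="text", outputs="text")
```

Calling build_demo().launch() starts a local web server with the interface. Swapping word_count for a function that calls your own model should be the only change needed for a basic demo.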

Gradio

Learning to prompt for Continual Learning

Even though the Pathways Language Model (PaLM) is one of the hottest releases of April 2022, our attention has been on another Google Research paper discussing how best to perform continual learning. Continual learning means training a single model on a sequence of different tasks. When trained at step t, the model does not have access to the data from previous steps. One of the main challenges of this setting is retaining knowledge from past data in the model, that is, avoiding catastrophic forgetting. In this paper, the researchers propose an approach that leverages prompt engineering. Prompts are very common in NLP, as they tend to help adapt pretrained models to a given task. The main idea of the paper is to tackle continual learning not as a shift of the model's weights but rather as a memory space of prompts representing the type of task to be trained on.
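The idea of a prompt memory can be illustrated with a toy sketch. It assumes a pool of (key, prompt) pairs and picks the prompts whose keys are closest to a query vector by cosine similarity. The vectors, the prompt names and the select_prompts helper are all made up for illustration; this is not the paper's implementation, and in practice the keys and prompts would be learned while the backbone model stays frozen.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def select_prompts(query, pool, top_n=1):
    """Return the prompts whose keys are most similar to the query."""
    ranked = sorted(pool, key=lambda item: cosine(query, item["key"]), reverse=True)
    return [item["prompt"] for item in ranked[:top_n]]


# Hypothetical prompt pool: each entry pairs a key vector with a prompt.
pool = [
    {"key": [1.0, 0.0], "prompt": "prompt_A"},
    {"key": [0.0, 1.0], "prompt": "prompt_B"},
]

# A query vector, standing in for features of the current input.
selected = select_prompts([0.9, 0.1], pool)
print(selected)
```

The selected prompts would then be attached to the model input, so that knowledge about each type of task lives in the pool rather than in the shared weights.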

Learning to Prompt for Continual Learning

Data Engineering & Architecture

Putting Elasticsearch into production

Elasticsearch is a well-known distributed search engine built on Apache Lucene. It has become pretty much the standard for complex use cases where you have to search through large and complex data stores (cf. How Netflix Content Engineering makes a federated graph searchable). However, when it comes to putting it into production, several challenges and pitfalls can arise. This blog post is a user's experience report, with best practices to adopt and bad habits to avoid.

In depth guide to running Elasticsearch in production | by Mattis Haase | Medium

App and Web Development

How to build a JavaScript Bundler from scratch

A JavaScript bundler is a tool that combines multiple code files into a single file, ready to use and deploy. It keeps track of every dependency in your repository. These dependencies are stored in a graph, which guarantees that all your files are updated accordingly. Even though using a bundler has become common, nothing beats building one yourself. The following post shows how to build a bundler while keeping it relatively simple.
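The dependency graph at the heart of a bundler can be sketched in a few lines. The example below is written in Python for brevity and is not taken from the linked post: it assumes modules are given as an in-memory dictionary, that dependencies are declared with require('...') calls, and that there are no circular imports. The names build_graph and bundle_order are hypothetical.

```python
import re

# Matches require('name') or require("name") in a module's source.
REQUIRE_RE = re.compile(r"""require\(\s*['"](.+?)['"]\s*\)""")


def build_graph(modules):
    """Map each module name to the list of modules it requires."""
    return {name: REQUIRE_RE.findall(source) for name, source in modules.items()}


def bundle_order(graph, entry):
    """Order modules so that every dependency comes before its dependents."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dependency in graph[name]:
            visit(dependency)
        order.append(name)

    visit(entry)
    return order


# Made-up project with three files.
modules = {
    "index.js": "const a = require('a.js'); const b = require('b.js');",
    "a.js": "const b = require('b.js'); module.exports = 1;",
    "b.js": "module.exports = 2;",
}

graph = build_graph(modules)
order = bundle_order(graph, "index.js")
print(order)
```

A real bundler would resolve file paths, parse the code properly instead of using a regular expression, and wrap each module before concatenating them in this order.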

Building a JavaScript Bundler | Christoph Nakazawa (cpojer.net)

Special Section: Modeling

Dealing with logs and zeros in regression models

When data is generated by an exponential process, we tend to apply a log transformation so that a linear model fits better. However, problems arise when some of your data points are equal to zero, since the logarithm of zero is undefined. A common fix is to add a constant (often 1) to the data. In this paper, the researchers propose a novel family of estimators called iOLS (iterated Ordinary Least Squares). It offers a computational advantage while still fitting the data well.
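To see why the added constant is a concern, the sketch below fits a line to log(y + c) for two values of c and compares the slopes. The data is made up for illustration, and the code only shows the sensitivity of the common fix; it does not implement iOLS.

```python
import math


def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b * x for a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


def log_plus_constant_slope(xs, ys, constant):
    """Fit a line to log(y + constant), the usual workaround for zeros."""
    return ols_slope(xs, [math.log(y + constant) for y in ys])


# Made-up data: roughly exponential growth, with zeros at the start.
xs = list(range(10))
ys = [0, 0, 1, 3, 5, 10, 18, 32, 55, 90]

slope_c1 = log_plus_constant_slope(xs, ys, 1.0)
slope_c001 = log_plus_constant_slope(xs, ys, 0.01)
print(f"slope with c=1:    {slope_c1:.3f}")
print(f"slope with c=0.01: {slope_c001:.3f}")
```

On this toy data the estimated slope changes noticeably with the choice of constant, which is the kind of arbitrariness that motivates looking for alternative estimators.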

Dealing with Logs and Zeros in Regression Models (arxiv.org)
