
Newsletter for September 2022

· 5 min read
Ali Ghazouani


Data Science

Why You Should Warn Customers When You’re Running Low on Stock

Supply-chain disruptions caused by the pandemic and the war in Ukraine have exposed retailers to unprecedented stockout risks. To overcome this challenge, Instacart suggests that honesty is the best policy. By using a Machine Learning model to predict when an item is likely to be out of stock, and warning customers when that is the case, they reduced the proportion of replacements and refunds in the short term and increased total revenue per customer in the long term.

 

Why You Should Warn Customers When You’re Running Low on Stock - Harvard Business Review

Machine Learning

Hopular: Modern Hopfield Networks for Tabular Data

While Deep Learning excels on structured data as encountered in vision and natural language processing, it has failed to meet expectations on tabular data. For tabular data, Support Vector Machines (SVMs), Random Forests, and Gradient Boosting are the best-performing techniques, with Gradient Boosting in the lead.
"Hopular" is a novel Deep Learning architecture for medium- and small-sized datasets, in which each layer is equipped with continuous modern Hopfield networks.
In experiments on small-sized tabular datasets with fewer than 1,000 samples, Hopular surpasses Gradient Boosting, Random Forests, SVMs, and in particular several Deep Learning methods. In experiments on medium-sized tabular data with about 10,000 samples, Hopular outperforms XGBoost, CatBoost, LightGBM and a state-of-the-art Deep Learning method designed for tabular data. Thus, Hopular is a strong alternative to these methods on tabular data.
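
To make "continuous modern Hopfield network" a little more concrete, here is a minimal sketch of its retrieval step: the query is replaced by a softmax-weighted average of the stored patterns, with weights given by scaled dot products. This is not the Hopular code; the function names (`hopfieldRetrieve`, `softmax`, `dot`), the toy patterns and the value of `beta` are all assumptions made for illustration.

```typescript
// Numerically stable softmax over a list of scores.
function softmax(scores: number[]): number[] {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Dot product of two vectors of equal length.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * (b[i] ?? 0), 0);
}

// One retrieval step (hypothetical helper): attend from the query over the
// stored patterns and return their softmax-weighted average.
function hopfieldRetrieve(
  patterns: number[][],
  query: number[],
  beta: number
): number[] {
  const weights = softmax(patterns.map((p) => beta * dot(p, query)));
  return query.map((_, d) =>
    patterns.reduce((sum, p, i) => sum + (weights[i] ?? 0) * (p[d] ?? 0), 0)
  );
}

// Toy memory of three patterns and a noisy version of the first one.
const stored = [
  [1, 0, 0],
  [0, 1, 0],
  [0, 0, 1],
];
const noisy = [0.9, 0.2, 0.1];
const retrieved = hopfieldRetrieve(stored, noisy, 8);
console.log(retrieved); // close to the first stored pattern
```

A larger `beta` sharpens the softmax, so retrieval snaps to the single closest pattern; a smaller `beta` blends several stored patterns together.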

 

GitHub - Hopular

Data Engineering & Architecture

How to learn data engineering

Like data scientists, data engineers write code. They're highly analytical and interested in data visualization. Unlike data scientists, and inspired by our more mature parent, software engineering, data engineers build tools, infrastructure, frameworks, and services. In fact, it's arguable that data engineering is much closer to software engineering than it is to data science.
This post introduces the role of the data engineer and the challenges they face on a daily basis. It also provides some very useful links for learning the basics or consolidating your knowledge of the field.

 

How to learn data engineering | Blef.fr

 

Do’s and Don’ts of Data Mesh

Many enterprises are investing in their next-generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes, a new paradigm has emerged: Data Mesh.
This article introduces Data Mesh and offers a set of recommendations, Do's and Don'ts, drawn from its implementation at BlaBlaCar.

 

Do’s and Don’ts of Data Mesh | Kineret Kimhi | Medium

App and Web Development

Introducing Signals

Signals are a way of expressing state that ensures apps stay fast regardless of how complex they get. Signals are based on reactive principles and provide excellent developer ergonomics, with a unique implementation optimized for Virtual DOM.
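
The reactive principle behind signals can be illustrated with a toy sketch: a value that remembers which effects read it and re-runs them when it changes. This is not Preact's implementation (the real library exposes `signal`, `computed` and `effect`, and is far more optimized); the `Signal` class and `effect` helper below are simplified assumptions written only for this example.

```typescript
type Subscriber = () => void;

// The effect currently running, so that reads can register it as a dependency.
let activeEffect: Subscriber | null = null;

// Toy signal (hypothetical): tracks readers and notifies them on change.
class Signal<T> {
  private current: T;
  private subscribers = new Set<Subscriber>();

  constructor(initial: T) {
    this.current = initial;
  }

  get value(): T {
    if (activeEffect) this.subscribers.add(activeEffect);
    return this.current;
  }

  set value(next: T) {
    if (next === this.current) return; // skip writes that change nothing
    this.current = next;
    // Copy before iterating, because re-running an effect re-subscribes it.
    Array.from(this.subscribers).forEach((run) => run());
  }
}

// Runs fn once immediately, then again whenever a signal it read changes.
function effect(fn: () => void): void {
  const run: Subscriber = () => {
    const previous = activeEffect;
    activeEffect = run;
    try {
      fn();
    } finally {
      activeEffect = previous;
    }
  };
  run();
}

const count = new Signal(1);
const log: string[] = [];

effect(() => {
  log.push(`count is ${count.value}`);
});

count.value = 2; // the effect re-runs
count.value = 2; // same value, so nothing re-runs
console.log(log);
```

Only the code that actually read `count` is re-run, which is the property that lets updates stay cheap as an app grows.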

Introducing Signals | Preactjs.com

Best practices for creating a modern npm package

npm is an online repository for publishing open-source Node.js projects. It also provides a command-line utility for interacting with that repository: package installation, version management and dependency management.
This article details best practices for creating and managing an npm package, including security checks and automated semantic version management.
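
As a rough illustration of what semantic version management automates, the sketch below bumps a `MAJOR.MINOR.PATCH` version according to the kind of release: breaking changes increment the major number, new features the minor, and fixes the patch. It does not reproduce the tooling the article recommends; `bumpVersion` and `ReleaseType` are hypothetical names, and pre-release or build suffixes are deliberately not handled.

```typescript
type ReleaseType = "major" | "minor" | "patch";

// Hypothetical helper: returns the next version for a given release type.
function bumpVersion(version: string, release: ReleaseType): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  switch (release) {
    case "major":
      return `${major + 1}.0.0`; // breaking change: reset minor and patch
    case "minor":
      return `${major}.${minor + 1}.0`; // new feature: reset patch
    case "patch":
      return `${major}.${minor}.${patch + 1}`; // bug fix
  }
}

console.log(bumpVersion("1.4.2", "minor"));
```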

 

Best practices for creating a modern npm package | Brian Clark | Snyk.io

Special Section: Responsible AI

OpenRAIL: Towards open and responsible AI licensing frameworks

Advances in machine learning and other AI-related areas have flourished in recent years, partly thanks to the open source culture, which has enabled knowledge sharing and created communities that foster innovation. Nevertheless, recent events related to the ethical and socio-economic concerns around the development and use of machine learning models have sent a clear message: openness alone is not enough to make AI responsible. Yet closed systems are not the answer either, as the problem persists under the opacity of firms' private AI development processes.
In this context, the OpenRAIL approach proposes a new type of license that embeds a specific set of use restrictions to ensure the models are used responsibly. Users benefit from open access to the ML model, but are not permitted to use it in the restricted scenarios the license specifies.

 

OpenRAIL: Towards open and responsible AI licensing frameworks | Carlos Munoz Ferrandis | huggingface.co

Credits