Data Science Development Environments

Hacking M1 laptops with Dev-Containers
Giorgia Tandoi
Mar 8, 2022
development practices datascience python m1

It was a lovely day when my manager handed me a brand new M1 MacBook. I was excited about it, as I had heard a lot of good things about the efficiency of the new Apple Silicon chip. However, the high was soon followed by a crashing low.

What went wrong?

Setting up our Python repository locally resulted in an array of failure tracebacks in the console: I was not able to create the virtual environments or run the REST services locally. Our machine learning services were built with Python 3.7, which does not natively support M1 MacBooks. However, updating Python and the versions of some of its packages alone did not solve the problem. The issue was more complex than that.

Many data science packages rely, at their lowest level, on code compiled from C or C++ and therefore depend on packages such as llvmlite in order to be installed. At the time, these packages did not yet officially support the M1 architecture, so their pip install would simply fail. As a result, we could not install all the resolved dependencies or create the corresponding development virtual environments.
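When debugging this kind of failure, it can help to confirm which architecture the Python interpreter is actually running on. The snippet below is a minimal sketch using only the standard-library platform module; the helper name describe_platform is hypothetical. A native interpreter on an M1 MacBook typically reports arm64, an interpreter running under Rosetta reports x86_64, and a Linux container on ARM reports aarch64.

```python
import platform


def describe_platform() -> dict:
    """Return the OS name, CPU architecture and Python version of this interpreter."""
    return {
        "system": platform.system(),          # e.g. "Darwin" on macOS, "Linux" in a container
        "machine": platform.machine(),        # e.g. "arm64", "aarch64" or "x86_64"
        "python": platform.python_version(),  # e.g. "3.9.10"
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(f"{key}: {value}")
```

Running it both on the laptop and inside a container shows whether the two environments see the same architecture.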

How to deal with the problem?

Just when I thought there was no easy way out of this issue, I decided to try out the Visual Studio Code Remote - Containers extension. Surprisingly, it turned out to be a fairly easy workaround.

The extension allows developers to reopen Visual Studio Code in a running Docker container and access its terminal. It also allows you to define and manage the container from within VSCode. All you need to do is to define a devcontainer.json file.

This file contains all the information needed to build and configure the Docker container used for development. The extension provides pre-filled configuration files, and I started with a Python one. A couple of failures and one success later, the first version was up and running. Poetry was finally printing green dots everywhere and I was able to run pytest suites.
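To give an idea of what such a file can look like, here is a minimal sketch that generates a bare-bones devcontainer.json from Python. It is not the configuration we actually use: the helper name minimal_devcontainer is hypothetical, and the image name in the example is an assumption that you would replace with the base image of your choice. Only the name and image fields are set here.

```python
import json


def minimal_devcontainer(name: str, image: str) -> str:
    """Return the JSON text of a minimal devcontainer.json."""
    config = {
        "name": name,    # label shown by VSCode for the container
        "image": image,  # base image the container is started from
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Assumed image name, for illustration only.
    print(minimal_devcontainer("floryn-ml", "mcr.microsoft.com/devcontainers/python:3"))
```

The printed text can be saved as .devcontainer/devcontainer.json in the repository.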

While this issue seemed to be solved, another one popped up along the way, and yet again dev containers had an easy solution for it. Many of the Python scripts in our repository connect to AWS S3 to download and save files. For that to work, you need awscli and your AWS credentials configured on your laptop. As explained in this guide, adding such credentials to the container itself is not the smartest solution, because the same container configuration file is shared among all the developers in the team. Instead, we can add a “mounts” field to devcontainer.json and bind-mount the local .aws directory, located through the HOME or USERPROFILE environment variable, into the development container as follows:

"mounts": [
		"source=${env:HOME}${env:USERPROFILE}/.aws,target=/root/.aws,type=bind"
	]
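The source path concatenates two environment variables so that the same line works on different operating systems: HOME is normally set on macOS and Linux, USERPROFILE on Windows. The sketch below mimics that expansion in Python. It assumes that an unset variable expands to an empty string, and the helper name resolve_mount_source is hypothetical.

```python
import os


def resolve_mount_source(env: dict) -> str:
    """Mimic how ${env:HOME}${env:USERPROFILE}/.aws expands on the host."""
    # Assumption: a variable that is not set contributes an empty string.
    home = env.get("HOME", "")
    user_profile = env.get("USERPROFILE", "")
    return f"{home}{user_profile}/.aws"


if __name__ == "__main__":
    # Show which host directory would be mounted for the current user.
    print(resolve_mount_source(dict(os.environ)))
```

If both variables happen to be set on the same machine, the two paths are glued together and the mount source is no longer valid, which is worth checking if the mount fails.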

This means that, after authenticating from my laptop and reopening the repository in the dev container, I can connect to AWS when running the Python scripts in the virtual environments that live inside the dev container.
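A quick way to check from inside the container that the mount worked is to list the profiles in the mounted credentials file. The sketch below uses only the standard library and assumes the credentials live in an INI-style file at ~/.aws/credentials; setups that rely on other mechanisms, such as SSO, may not have this file. The helper names are hypothetical.

```python
import configparser
from pathlib import Path


def profiles_from_text(text: str) -> list:
    """Return the profile names (section headers) found in INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.sections()


def list_aws_profiles(credentials_file: Path) -> list:
    """Return the profile names in a credentials file, or [] if the file is missing."""
    if not credentials_file.is_file():
        return []
    return profiles_from_text(credentials_file.read_text())


if __name__ == "__main__":
    path = Path.home() / ".aws" / "credentials"
    profiles = list_aws_profiles(path)
    print(f"{path}: {profiles or 'no profiles found'}")
```

An empty result suggests that the directory was not mounted, or that no credentials have been configured on the laptop yet.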

Great! Now I can run everything! But wait a minute… is there a way to tell VSCode up front what needs to run in the Docker container, so that it contains everything my repository needs? It turns out the postCreateCommand field lets us specify all sorts of commands to run inside the container once it has been created. These are the commands I added:

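#!/usr/bin/env bash
# Stop at the first failing command instead of carrying on.
set -euo pipefail

# Install the AWS CLI from the distribution packages.
apt-get update
apt-get -y install --no-install-recommends awscli

# Install tox and Poetry.
pip install tox
curl -sSL https://install.python-poetry.org | python -

# Create the virtual environments inside the repository folder.
cd /workspaces/floryn-ml
poetry config virtualenvs.in-project true
make create-prediction-services-venvs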
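
One way to hook a script like the one above into the configuration is to save it in the repository and point postCreateCommand at it. The sketch below shows the resulting field by adding it to a configuration dictionary in Python. The script path .devcontainer/post-create.sh and the helper name add_post_create are hypothetical, not necessarily how our repository is laid out.

```python
import json


def add_post_create(config: dict, script_path: str) -> dict:
    """Return a copy of config that runs script_path after the container is created."""
    updated = dict(config)
    # postCreateCommand accepts a command line given as a single string.
    updated["postCreateCommand"] = f"bash {script_path}"
    return updated


if __name__ == "__main__":
    base = {"name": "floryn-ml"}
    # Hypothetical script location inside the repository.
    print(json.dumps(add_post_create(base, ".devcontainer/post-create.sh"), indent=2))
```

Keeping the commands in a separate script rather than inline keeps devcontainer.json short and lets the script be run by hand when something goes wrong.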

Wrapping up

With these commands in place, all it takes is checking out the repo and running the VSCode command Remote Containers - Reopen in Container. The first time, this builds the Docker dev container and, while doing so, automatically creates all the necessary environments. Easy setup!

Figure 1 - Starting up a container.

All in all, migrating to an M1 laptop caused a little pain, but setting up dev containers allowed me to finally enjoy its perks. Happy coding to all new developers :)
