
Migrating 800k+ files from Paperclip to ActiveStorage

There is never a good time for big data migrations. But with some care and planning, you can make the entire process relatively painless — and do it safely and with zero downtime, too.
Arjan van der Gaag
Jan 14, 2021
rails

For Floryn, migrating away from the now-deprecated Paperclip was the main issue blocking an upgrade to Rails 6 on our primary Ruby on Rails application. File uploads are a critical feature, so it was important to handle the migration with care and avoid service interruptions. We were looking to migrate some 800,000 user-uploaded documents, spread over a range of user-facing features and application models. We managed to transition these from Paperclip to Rails’ own ActiveStorage with no downtime.

A little planning goes a long way

After considering various strategies, we decided on a multi-step approach to this migration:

The migration was implemented by Floryn developers Hugo and Iris over the course of a couple of months at the end of 2020.

Iris remembers this was a difficult project to plan. “Even with all the up-front research in the world, you’re always going to find unexpected issues. This was one of those projects where you cannot cut scope when you come up to the deadline. In the end, everything had to be migrated. That required some flexibility in the planning process, so that this project could take as much time as needed. I’m glad we could easily adapt our process to it.”

Running into trouble

As expected, we ran into unexpected issues. For one, some parts of ActiveStorage were not yet well documented. Hugo recalls: “It took some source code investigation to understand what exactly was going on sometimes.” It slowed things down a little, but “up to a point, that’s what you end up doing with all the libraries you use. The power of open-source software is that you can dive into the source to get a better understanding.”

There were also some practical difficulties. Iris: “One of our use cases was for graphs and charts from one of our own external services. They were served as SVG content. ActiveStorage, unlike Paperclip, treats SVG content as binary, for perfectly valid security reasons. Valid or not, it broke our use case. Luckily, because we could trust this service, we could implement our own show_unsafe_svg helper method to work around it.”
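To make the idea concrete, here is a minimal sketch of what such a helper could look like. Only the name show_unsafe_svg comes from the text above; the module name, the content type check and the FakeAttachment stand-in are assumptions made for illustration, and this is not Floryn's actual code. It relies on the attachment responding to content_type and download, as ActiveStorage attachments do.

```ruby
# Hypothetical sketch: only the name show_unsafe_svg comes from the article.
module SvgHelper
  SVG_CONTENT_TYPE = "image/svg+xml"

  # Returns the raw markup of an SVG attachment so a view can render it inline.
  # Only use this for files from a source you fully trust, because inline SVG
  # can carry scripts.
  def show_unsafe_svg(attachment)
    unless attachment.content_type == SVG_CONTENT_TYPE
      raise ArgumentError, "not an SVG: #{attachment.content_type}"
    end

    markup = attachment.download
    # Under Rails, mark the string as safe so the view does not escape it.
    markup.respond_to?(:html_safe) ? markup.html_safe : markup
  end
end

# Plain-Ruby stand-in for an attachment (assumption, for illustration only).
FakeAttachment = Struct.new(:content_type, :body) do
  def download
    body
  end
end

include SvgHelper

chart = FakeAttachment.new("image/svg+xml", "<svg><circle r=\"4\"/></svg>")
puts show_unsafe_svg(chart)
```

Keeping the word "unsafe" in the name and rejecting anything that is not an SVG makes it harder to reach for this helper by accident with user-uploaded files.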

Being able to link new ActiveStorage::Attachment records back to the original Paperclip models also proved a life-saver, and not just for after-the-fact verification. During the migration, we noticed that our changes triggered some workflows and notifications we hadn’t anticipated. To prevent this, we needed to retrieve some extra information from the original records. Fortunately, this was easy because of the links we had set up in advance.
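The text does not say how these links were stored, so the following is only one possible shape, written in plain Ruby. The names PaperclipSource and MigrationLedger are hypothetical, and in a real application the mapping would more likely live in a database table or in metadata on the migrated file than in memory.

```ruby
# Hypothetical sketch: the article does not describe how the links were stored.
# Identifies the Paperclip model, record and attachment a file came from.
PaperclipSource = Struct.new(:record_class, :record_id, :attachment_name)

class MigrationLedger
  def initialize
    @sources = {}
  end

  # Remember which Paperclip record a newly created attachment came from.
  def link(attachment_id, source)
    @sources[attachment_id] = source
  end

  # Look up the original record, e.g. to fetch extra information from it.
  # Raises KeyError when the attachment was never linked.
  def source_for(attachment_id)
    @sources.fetch(attachment_id)
  end

  # Attachment ids without a known source, for after-the-fact verification.
  def unlinked(attachment_ids)
    attachment_ids.reject { |id| @sources.key?(id) }
  end
end

ledger = MigrationLedger.new
ledger.link(101, PaperclipSource.new("Invoice", 42, "document"))

source = ledger.source_for(101)
puts "#{source.record_class}##{source.record_id} (#{source.attachment_name})"
puts ledger.unlinked([101, 102]).inspect
```

The same lookup serves both purposes mentioned above: verifying that every migrated file has a known origin, and reaching back to the original record when a side effect needs more context.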

Furthermore, ActiveStorage does not offer quite the same capabilities for validating file uploads as Paperclip does. “Although there are third-party gems that can help, our requirements were relatively simple,” says Hugo. “We decided to avoid the extra dependency, and develop our own custom validations.”
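As a rough illustration of how small such validations can be, here is a plain-Ruby sketch that checks content type and file size. The UploadValidator class, its parameters and the UploadedFile stand-in are all hypothetical, not Floryn's implementation; the checks only assume the file responds to content_type and byte_size, as ActiveStorage blobs do. In a Rails application the same logic would typically sit inside a custom validator on the model.

```ruby
# Hypothetical sketch of simple upload validations without extra gems.
class UploadValidator
  def initialize(allowed_types:, max_bytes:)
    @allowed_types = allowed_types
    @max_bytes = max_bytes
  end

  # Returns a list of human-readable problems; empty when the file is fine.
  def errors_for(file)
    errors = []
    unless @allowed_types.include?(file.content_type)
      errors << "has an unsupported content type (#{file.content_type})"
    end
    errors << "is larger than #{@max_bytes} bytes" if file.byte_size > @max_bytes
    errors
  end

  def valid?(file)
    errors_for(file).empty?
  end
end

# Plain-Ruby stand-in for an uploaded file (assumption, for illustration only).
UploadedFile = Struct.new(:content_type, :byte_size)

validator = UploadValidator.new(
  allowed_types: ["application/pdf", "image/png"],
  max_bytes: 5 * 1024 * 1024
)

puts validator.valid?(UploadedFile.new("application/pdf", 120_000))
puts validator.errors_for(UploadedFile.new("application/zip", 9_000_000)).inspect
```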


Finally, we had to work around an issue with orphaned uploaded files in Rails 5. When a user uploads files, they are stored immediately. If the associated model object is then not saved, the uploaded files can be left dangling. We had to introduce a temporary container to account for that. Iris: “I don’t like the extra complexity that introduces, but we already know that with Rails 6 we can remove this workaround again.”
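The text does not describe the temporary container itself, so the sketch below only shows the general idea under that assumption: hold on to uploads until the model has actually been saved, and store them only then. The PendingUploads class and the storage block are hypothetical stand-ins written in plain Ruby.

```ruby
require "stringio"

# Hypothetical sketch: holds uploads in memory until the model is saved.
class PendingUploads
  # The block receives (filename, io) and does the actual storing.
  def initialize(&store)
    @store = store
    @pending = []
  end

  # Queue an upload without storing it yet.
  def add(filename, io)
    @pending << [filename, io]
  end

  # Call after the model was saved: stores every queued upload.
  def commit
    stored = @pending.map { |filename, io| @store.call(filename, io) }
    @pending.clear
    stored
  end

  # Call when the save failed: nothing was stored, so nothing is orphaned.
  def discard
    @pending.clear
  end
end

stored_files = []
uploads = PendingUploads.new do |filename, io|
  stored_files << [filename, io.read]
  filename
end

uploads.add("contract.pdf", StringIO.new("%PDF-1.4 ..."))
model_saved = true # stand-in for the result of saving the model

model_saved ? uploads.commit : uploads.discard
puts stored_files.map(&:first).inspect
```

The cost is the extra moving part Iris mentions: something has to call commit or discard at the right moment, which is exactly the kind of workaround you want to delete once the framework handles it.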

Looking back

Although the final migration took a considerable amount of wall time to complete, the entire process went relatively smoothly. Along the way, we even found some good refactoring opportunities to clean up old code and make things just a little better than before.

So, how do you switch from Paperclip to ActiveStorage safely and cleanly? The ultimate trick might be to make extra sure you really need all those features in the first place. “After migrating one big batch of documents, we learned that that specific piece of functionality was used by only a single customer. We talked to them and learned we could easily transition them off it,” remembers Iris. “After that, we could remove the whole feature, along with all the migrated files. Knowing that up front would have saved us a bunch of time.” As it turns out, no migration is faster than no migration.


© 2021 Floryn B.V.