Writing a custom analyzer for ActiveStorage
Every now and then, I sit down and try out some new features of Rails, or gems I’ve read about but never really had a use case for in my daily work. This weekend I played around with ActiveStorage and Hotwire, making a simple clone of an ETL project I like to rewrite.
The basic use case for the app is to fetch EDI files from different locations, then organise and parse them into a warehousing solution. The production version of this app uses S3 quite heavily, and has lots of custom code for managing the documents stored in S3.
ActiveStorage is quite easy to set up, and has built-in support for S3 storage (and several other cloud services). After a file has been attached, it’s queued for what is called analyzing. Basically, ActiveStorage iterates over an array of registered analyzers and picks the first one whose accept? returns true. There are a couple of built-in analyzers: ActiveStorage::Analyzer::ImageAnalyzer and ActiveStorage::Analyzer::VideoAnalyzer.
They are registered in that order, hook on to the MIME type of the uploaded file, and give back some metadata when run.
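To make the selection step concrete, here is a plain-Ruby sketch of "the first analyzer that accepts the blob wins". FakeBlob and the three stand-in analyzer classes are hypothetical and exist only for illustration. The real lookup happens inside ActiveStorage, which falls back to a null analyzer when nothing accepts the blob. The sketch models that case by returning nil.

```ruby
# Plain-Ruby illustration; none of these are the real ActiveStorage classes.
FakeBlob = Struct.new(:content_type)

class FakeImageAnalyzer
  def self.accept?(blob)
    blob.content_type.start_with?("image/")
  end
end

class FakeVideoAnalyzer
  def self.accept?(blob)
    blob.content_type.start_with?("video/")
  end
end

class FakeTextAnalyzer
  def self.accept?(blob)
    blob.content_type.start_with?("text/")
  end
end

# Order matters: the first class whose accept? returns true is used.
ANALYZERS = [FakeImageAnalyzer, FakeVideoAnalyzer, FakeTextAnalyzer].freeze

def pick_analyzer(blob, analyzers = ANALYZERS)
  analyzers.detect { |klass| klass.accept?(blob) }
end
```

Because the list is walked in order, an analyzer placed earlier shadows any later one that would accept the same content type.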
For my particular use case, I want to analyze mostly text/csv and text/plain files, at least for this basic testing. I’ll start by making a file called lib/analyzer/edi_analyzer.rb with the very basic requirements of an ActiveStorage analyzer:
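A minimal sketch of what such a file could contain, assuming the Analyzer::EdiAnalyzer name that shows up in the console output further down. The guarded stand-in at the top is only there so the snippet also runs outside Rails; inside an app, ActiveStorage::Analyzer is the real base class. Returning an empty hash from metadata is a placeholder of mine.

```ruby
# lib/analyzer/edi_analyzer.rb

# Stand-in base class so this sketch runs without Rails loaded.
# In a Rails app this branch is skipped and the real class is used.
unless defined?(ActiveStorage::Analyzer)
  module ActiveStorage
    class Analyzer
      attr_reader :blob

      def initialize(blob)
        @blob = blob
      end

      # Like the real base class, reject every blob by default.
      def self.accept?(_blob)
        false
      end
    end
  end
end

module Analyzer
  class EdiAnalyzer < ActiveStorage::Analyzer
    # Called on an instance when this analyzer is chosen for a blob.
    def metadata
      {} # placeholder until there is something to report
    end
  end
end
```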
This should be enough to register my analyzer, which should be done in an initializer. In config/initializers/active_storage_analyzers.rb I add the following code:
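Judging by the .append/.prepend remark below, the initializer is a one-liner along these lines. This is a sketch: the explicit require is my assumption for an app where lib/ is not autoloaded, and can be dropped otherwise.

```ruby
# config/initializers/active_storage_analyzers.rb

# Assumption: lib/ is not on the autoload path, so load the class explicitly.
require Rails.root.join("lib/analyzer/edi_analyzer").to_s

# Add the analyzer to the end of the list ActiveStorage consults.
Rails.application.config.active_storage.analyzers.append Analyzer::EdiAnalyzer
```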
If I open a rails console, I can call ActiveStorage.analyzers to see that my analyzer is correctly registered:
irb(main):026:0> ActiveStorage.analyzers
=> [ActiveStorage::Analyzer::ImageAnalyzer, ActiveStorage::Analyzer::VideoAnalyzer, Analyzer::EdiAnalyzer]
As you can see, the analyzer is registered as the last one to be called. If you’d like it to be the first, change .append to .prepend in your initializer.
While the analyzer is now registered, it will never be called, as it has not yet implemented the class method accept?. Let’s do that.
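A sketch of that class method, matching the description that follows. ACCEPTED_TYPES is a constant name I made up, and the stand-in base class is again only there so the snippet runs outside Rails.

```ruby
# Stand-in base class so this sketch runs without Rails loaded.
unless defined?(ActiveStorage::Analyzer)
  module ActiveStorage
    class Analyzer
      def self.accept?(_blob)
        false
      end
    end
  end
end

module Analyzer
  class EdiAnalyzer < ActiveStorage::Analyzer
    # Hypothetical constant: the content types this analyzer handles.
    ACCEPTED_TYPES = %w[text/csv text/plain].freeze

    # ActiveStorage passes in the blob; content_type is one of its attributes.
    def self.accept?(blob)
      ACCEPTED_TYPES.include?(blob.content_type)
    end
  end
end
```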
This returns true when the content_type of the uploaded file is either text/csv or text/plain. In a later evolution of this analyzer you would probably do a more complex check, but this will do for now.
On to the actual result of the analyzer. When our analyzer is picked as the one to provide metadata, ActiveStorage will call metadata on an instance of our class. I have already implemented a simple EdiFileParser class that can parse the headers of these files, so let’s utilize that in our analyzer:
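The sketch below follows the walkthrough in the next paragraphs. Several parts are assumptions: the EdiFileParser stand-in (the real parser is not shown in this post), its sender and receiver readers, the metadata keys, and the log message are all hypothetical. The guarded ActiveStorage stand-in only mimics download_blob_to_tempfile and logger well enough to run the snippet outside Rails.

```ruby
require "logger"
require "tempfile"

# Stand-in for the parts of ActiveStorage used below; skipped inside Rails.
unless defined?(ActiveStorage::Analyzer)
  module ActiveStorage
    def self.logger
      @logger ||= Logger.new(nil) # nil device: log calls are discarded
    end

    class Analyzer
      attr_reader :blob

      def initialize(blob)
        @blob = blob
      end

      private

      # Mimics the real helper: yields a temp file holding the blob contents.
      def download_blob_to_tempfile
        Tempfile.create("blob") do |file|
          file.write(blob.download)
          file.rewind
          yield file
        end
      end

      def logger
        ActiveStorage.logger
      end
    end
  end
end

# Hypothetical parser: treats the first line as a "sender;receiver" header.
unless defined?(EdiFileParser)
  class EdiFileParser
    attr_reader :sender, :receiver

    def initialize(io)
      @sender, @receiver = io.gets.to_s.chomp.split(";", 2)
    end

    def valid?
      [sender, receiver].none? { |value| value.to_s.empty? }
    end
  end
end

module Analyzer
  class EdiAnalyzer < ActiveStorage::Analyzer
    def self.accept?(blob)
      %w[text/csv text/plain].include?(blob.content_type)
    end

    def metadata
      parse_file do |parser|
        # Hypothetical keys; use whatever the parser exposes.
        { sender: parser.sender, receiver: parser.receiver }
      end
    end

    private

    def parse_file
      download_blob_to_tempfile do |file|
        parser = EdiFileParser.new(file)

        if parser.valid?
          yield parser
        else
          logger.info "Skipping EDI analysis: unrecognised file format"
          {} # empty hash, so the caller always gets something it can merge
        end
      end
    end
  end
end
```

Returning an empty hash on the failure path is a choice of mine: ActiveStorage merges the analyzer's result into the blob's metadata, so a hash is a safer return value than whatever the logger call happens to return.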
Quite a bit of code added there. While it should be pretty straightforward, let’s break it down some. When metadata is called on our instance, we call the private method parse_file. If the call is successful and our block receives a parser object, we use that object to return a hash of useful metadata.
Inside the parse_file private method, I call download_blob_to_tempfile. This is a private method provided by ActiveStorage::Analyzer that yields a Tempfile with the blob contents to the block it is given.
I then create an instance of my EdiFileParser with this Tempfile. If the parser reports that the file is valid?, I yield the parser instance back to my original block; if not, I log the result and fail silently. The logger method is also provided by ActiveStorage::Analyzer, and simply returns ActiveStorage.logger.
Given a known format, my ActiveStorage attachment will now have a metadata hash like this:
Happy hacking :)