Every now and then, I sit down and try out some new features of Rails, or gems I’ve read about but never really had a use case for in my daily work. This weekend I played around with ActiveStorage and Hotwire, making a simple clone of an ETL project I like to rewrite.

The basic use-case for the app is to fetch EDI files from different locations, organise and parse them into a warehousing solution. The production version of this app uses S3 quite heavily, and has lots of custom code for managing the documents stored in S3.

ActiveStorage is quite easy to set up, and has built-in support for S3 storage (and many other cloud solutions). After a file has been attached, it’s queued for what is called analyzing. Basically it iterates over an array of registered analyzers and picks the first one whose accept? method returns true for the blob. There are a couple of built-in analyzers:

ActiveStorage::Analyzer::ImageAnalyzer
ActiveStorage::Analyzer::VideoAnalyzer

They are registered in that order, hook on to the MIME type of the uploaded file, and give back some metadata when run.
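The selection step can be sketched in plain Ruby. This is a simplified model rather than ActiveStorage's actual source: FakeBlob, ImageLike, VideoLike and pick_analyzer are made-up stand-ins, and returning nil when nothing matches is a simplification for the sketch.

```ruby
# Stand-in for an ActiveStorage blob; only content_type matters here.
FakeBlob = Struct.new(:content_type)

# Hypothetical analyzers that mimic the accept? contract.
class ImageLike
  def self.accept?(blob)
    blob.content_type.start_with?("image/")
  end
end

class VideoLike
  def self.accept?(blob)
    blob.content_type.start_with?("video/")
  end
end

# Registration order matters: the first analyzer that accepts the blob wins.
ANALYZERS = [ImageLike, VideoLike].freeze

def pick_analyzer(blob, analyzers = ANALYZERS)
  analyzers.detect { |klass| klass.accept?(blob) }
end
```

With this model, a text/csv blob is accepted by neither stand-in, which is the gap a custom analyzer is meant to fill.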

For my particular use case, I want to analyze mostly text/csv and text/plain files, at least for this basic testing. I’ll start by making a file called lib/analyzer/edi_analyzer.rb with the very basic requirements of an ActiveStorage analyzer:

class Analyzer::EdiAnalyzer < ActiveStorage::Analyzer

  def metadata
    { ruby_is_awesome: true }
  end

end

This should be enough to register my analyzer, which should be done in an initializer. In config/initializers/active_storage_analyzers.rb I add the following code:

  Rails.application.config.active_storage.analyzers.append Analyzer::EdiAnalyzer

If I open a rails console, I can call on ActiveStorage.analyzers to see that my analyzer is correctly registered:

irb(main):026:0> ActiveStorage.analyzers
=> [ActiveStorage::Analyzer::ImageAnalyzer, ActiveStorage::Analyzer::VideoAnalyzer, Analyzer::EdiAnalyzer]

As you can see, the analyzer is registered as the last one to be checked. If you’d like it to be the first, change .append to .prepend in your initializer.
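The effect on ordering can be tried in isolation, assuming the analyzer registry behaves like a plain Ruby array. The symbols below are placeholders for the analyzer classes, not real ActiveStorage names.

```ruby
# Symbols stand in for the registered analyzer classes.
BUILT_IN = [:image_analyzer, :video_analyzer].freeze

# append puts the new analyzer at the end, so it is checked last.
APPENDED = BUILT_IN.dup.append(:edi_analyzer)

# prepend puts it at the front, so it is checked first.
PREPENDED = BUILT_IN.dup.prepend(:edi_analyzer)
```

Order only matters when two analyzers could accept the same blob, since the first match wins.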

While the analyzer is now registered, it will never be called, because it has not yet overridden the class method accept?, which returns false in the ActiveStorage::Analyzer base class. Let’s do that.

class Analyzer::EdiAnalyzer < ActiveStorage::Analyzer

  def self.accept?(blob)
    ['text/csv','text/plain'].include? blob.content_type
  end

  def metadata
    { ruby_is_awesome: true }
  end

end

This returns true when the content_type of the uploaded file is either text/csv or text/plain. In a later evolution of this analyzer you would probably do a more complex check, but this will do for now.
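One possible shape for such a stricter check is to combine the content type with the file extension. The sketch below assumes the blob exposes content_type and filename.extension the way ActiveStorage blobs do. NamedBlob, NamedFilename, StricterAccept and the extension list are illustrative stand-ins so it runs outside Rails.

```ruby
# Stand-ins for a blob and its filename object.
NamedFilename = Struct.new(:extension)
NamedBlob = Struct.new(:content_type, :filename)

module StricterAccept
  EDI_CONTENT_TYPES = %w[text/csv text/plain].freeze
  EDI_EXTENSIONS    = %w[csv txt edi].freeze # assumed list, adjust to your files

  # Accept only when both the content type and the extension look right.
  def self.accept?(blob)
    EDI_CONTENT_TYPES.include?(blob.content_type) &&
      EDI_EXTENSIONS.include?(blob.filename.extension.to_s.downcase)
  end
end
```

In a real analyzer the body of this method would go into self.accept? on the analyzer class itself.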

On to the actual result of the analyzer. When our analyzer is picked as the one to provide metadata, ActiveStorage will call metadata on an instance of our class. I have already implemented a simple EdiParser class that can parse the headers of these files, so let’s utilize that in our analyzer:

class Analyzer::EdiAnalyzer < ActiveStorage::Analyzer

  def self.accept?(blob)
    ['text/csv','text/plain'].include? blob.content_type
  end

  def metadata
    parse_file do |edi_doc|
      {
        sender_gln: edi_doc.sender_gln,
        recipient_gln: edi_doc.recipient_gln,
        edi_standard: edi_doc.standard,
        edi_type: edi_doc.type
      }
    end
  end

  private

    # Yields a parsed document for known formats, otherwise returns an empty hash.
    def parse_file
      download_blob_to_tempfile do |file|
        edi_doc = EdiParser.new(file)
        if edi_doc.valid?
          yield edi_doc
        else
          logger.info "Skipping EDI analysis because it's not a known EDI document format"
          {} # metadata must always be a hash, not the return value of logger.info
        end
      end
    end

end

Quite a bit of code added there. While it should be pretty straightforward, let’s break it down some. When metadata is called on our instance, we call the private method parse_file. If the call is successful and a parsed document is yielded to our block, we use it to return a hash of useful metadata.

Inside the parse_file private method, I call on download_blob_to_tempfile. This is a private method provided by ActiveStorage::Analyzer that downloads the blob and yields a Tempfile with its contents to the given block.

I then instantiate my EdiParser with this Tempfile. If the parser thinks it’s valid?, I yield the parser instance back to my original block, and if it’s not, I log the result and fail silently. The logger method is also provided by ActiveStorage::Analyzer, and simply returns ActiveStorage.logger.
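The control flow of parse_file can be tried outside Rails. In this sketch StubParser is a made-up stand-in for EdiParser (its "sender;recipient" header format is invented for illustration), parse_io and metadata_for are hypothetical names, and the invalid branch returns an empty hash so the caller always gets a hash back.

```ruby
require "stringio"

# Hypothetical parser: treats the first line as "sender;recipient".
class StubParser
  attr_reader :sender_gln, :recipient_gln

  def initialize(io)
    @sender_gln, @recipient_gln = io.gets.to_s.strip.split(";")
  end

  def valid?
    !sender_gln.to_s.empty? && !recipient_gln.to_s.empty?
  end
end

# Yields the parser for known formats; otherwise returns an empty hash.
def parse_io(io)
  doc = StubParser.new(io)
  return {} unless doc.valid?

  yield doc
end

# Mirrors the analyzer's metadata method: the block builds the hash.
def metadata_for(io)
  parse_io(io) do |doc|
    { sender_gln: doc.sender_gln, recipient_gln: doc.recipient_gln }
  end
end
```

Because the block's return value becomes the return value of parse_io, the caller never needs to know whether parsing succeeded. It just receives a hash that may be empty.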

Given a known format, my ActiveStorage attachment will now have a metadata hash like this:

{
  sender_gln: "1234567890",
  recipient_gln: "0987654321",
  edi_standard: "EFO/Nelfo 4.0",
  edi_type: "Invoice"
}

Happy hacking :)