Skip to content
This repository has been archived by the owner on Nov 23, 2021. It is now read-only.

Mapping columns to attributes

Ernesto Garcia edited this page Feb 23, 2014 · 9 revisions

The main way in which an importer processes the contents of a spreadsheet is by mapping columns in it to attributes in the data model. Here we take a closer look at how to declare this mapping in an importer.

Importers are subclasses of ActiveImporter::Base. This base class provides a DSL for declaring several features of the importing process. One of the most important methods in this DSL is the column method. It receives a column header name, and an attribute name, and this establishes a mapping between the values in that spreadsheet column, and the corresponding attribute in the data model.

class EmployeeImporter < ActiveImporter::Base
  imports Employee

  column 'First name', :first_name
  column 'Last name', :last_name
end

As you can see in the example above, this importer expects to receive a spreadsheet with columns named 'First name' and 'Last name'. It will import all rows in such a spreadsheet, into the Employee data model, storing the values in the column 'First name' into the :first_name attribute of the model, and the values in the column 'Last name' into the :last_name attribute, creating one new model for each row processed successfully.

Processing the incoming value

We often need to process incoming data that has to be pre-processed before storing it in the corresponding attribute. Perhaps we need to trim excess whitespace off a string value, parse a string value converting it into some internal representation, etc. This can be achieved by providing a block to the column method, in which we process the incoming value and return the actual value to be stored in the corresponding attribute.

class EmployeeImporter < ActiveImporter::Base
  imports Employee

  column('First name', :first_name) do |first_name|
    first_name.capitalize
  end

  column 'Last name', :last_name, &:capitalize
end

You can see in the example above how this works. The block of code receives as argument the incoming value (the value coming from the spreadsheet in the corresponding column). Inside the block we process this value and return the actual value we want to store in our database model. In this case we're instructing the importer to capitalize each name (in case you didn't notice, both attributes are being processed in the same way, we're just using in the second example a more concise Ruby way to specify the transformation block).

Establishing associations

Most commonly, we might need to use an incoming value to search a corresponding record from another data model, and establish the association with the model being created. Blocks like the ones shown above can help us do that:

class EmployeeImporter < ActiveImporter::Base
  imports Employee

  column 'Department', :department do |department_name|
    Department.find_by(name: department_name)
  end

  # ...
end

You can see how in this example, we're using the incoming value (the department name) to look-up in the Department data model for a department with that name. In this case, we're setting department to nil if no department is found, but we could take different approaches, such as raising an error if no such department is found, or even dynamically creating new departments when the name does not match any existing one. This all depends on what you want for your applications.

Accessing data from other columns

Sometimes, while processing a column's incoming value, you need to check on values from other columns. It turns out that you have full access to the entire spreadsheet row inside these blocks of code. The importer exposes it via the row method, which is a hash with the keys being the column headers, and the values being the actual values for each column in the current row.

Suppose we're processing a spreadsheet of employees, and it comes with columns for first and last name. But our data model has a single column called :full_name, which should contain the concatenation of the two other names.

class EmployeeImporter < ActiveImporter::Base
  imports Employee

  column('First name', :full_name do |first_name|
    last_name = row['Last name']
    [first_name, last_name].compact.join(' ')
  end
end

Note how we're mapping the 'First name' column to the :full_name attribute, but inside the block we're using the row hash to access the last name and include it in the actual value that will be stored in the model.

Omitting the column attribute

Importer column declarations may omit the attribute and the block of code. This is useful when you have a column that will be processed inside a custom processing block, but you still need to inform the importer that this is a required column, so it can be used when scanning the spreadsheet to find the header row.

class EmployeeImporter < ActiveImporter::Base
  imports Employee

  column 'Tags'

  on :row_processing do
    tags = row['Tags'].split(',').map(&:strip).map(&:downcase)
    model.manager = !!tags.delete('manager')
    model.tag_ids = tags.map { |tag| Tag.where(name: tag).first_or_create }
  end
end

The importer above will not automatically assign the value from the 'Tags' column to any attribute, but will still consider this column name when searching for the header row.

Optional columns

Columns can be declared as optional. This lets the importer know that the spreadsheet may or may not include that column, and only processes it if present. The column name will not be used to match the row header.

class EmployeeImporter < ActiveImporter::Base
  imports Employee

  column 'Department', :department, optional: true do |department_name|
    Department.find_by(name: department_name)
  end

  # ...
end

The above importer will process spreadsheets as usual, but it will also be able to consume a spreadsheet not containing the 'Department' column. In that case, the block of code will never be invoked. It is important to note that if the :department attribute is required, then you still need to be sure it is assigned some value. Otherwise all rows will fail to be imported.

Helper methods

If logic inside these blocks ever gets too complex, you may need to do some refactoring. In that case remember you can use helper methods to DRY-up your importer code.