Mastico, the new gem by Lean Panda!

Writing even simple Elasticsearch queries can be tedious and complex. Chewy helps, but it makes it harder to keep our code DRY. We would love to show you our solution to make the work easier.

Estimated reading time: 5 minutes

When it comes to developing a search system for our projects, we often have to deal with complex situations: we might need to search across several fields, in more than one language, and maybe with full-text search.

In these cases, our go-to solution is often Elasticsearch.

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications that have complex search features and requirements.

ES is a powerful and complex system that can handle large amounts of data. As we might expect, it is not a plug-and-play solution, so we rarely use it on its own; luckily, there are plenty of gems that make our life easier.

One of these is called Chewy.

What is Chewy?

Chewy is a high-level framework based on the elasticsearch-ruby client.

Chewy simplifies how we manage an index (a collection of models that share similar attributes, which we can query to get our search results).

Let's look at some examples using the search system we implemented for the Uffizi website:

class UffiziIndex < Chewy::Index
  class << self
    def index_name(_suggest = nil)
      "#{Rails.env}_uffizi_#{I18n.locale}"
    end
  end

  define_type(
    Artwork.includes(:museum),
    delete_if: -> { translation_for(I18n.locale).nil? }
  ) do
    field :title, value: -> { title }
    field :author, value: -> { author }
    field :formatted_text, value: -> { formatted_text }
    field :abstract_text, value: -> { abstract_text }
    field :formatted_renovation, value: -> { formatted_renovation }
    field :location, value: -> { location }
    field :technique, value: -> { technique }
  end
end

If we wanted to find Botticelli's Venus, we could simply search for "venus":

UffiziIndex::Artwork.all.query(term: {title: "venus"}).load.to_a

=> [#<Artwork:0x007fa66ec00a28
  id: 3,
  museum_id: 15,
  title: "Birth of Venus",
  author: "Sandro Botticelli (Florence 1445-1510) ",
  position: 109>]

As we can see, we get exactly one result, the right one. But what if there were more paintings with "Venus" in the title and we only wanted the one by Botticelli?

In this case, we can chain multiple queries:

UffiziIndex::Artwork.all.query(term: {title: "venus"}).query(term: {author: "botticelli"}).load.to_a

=> [#<Artwork:0x007fa66ec00a28
  id: 3,
  museum_id: 15,
  title: "Birth of Venus",
  author: "Sandro Botticelli (Florence 1445-1510) ",
  position: 109>]

The result remains the same, but notice that the more specific we want our queries to be, the more code we have to write.

Isn't there a smarter way to do it? Yes: ES provides multi_match, which lets us search for a text across several fields. Let's see how it behaves:

UffiziIndex::Artwork.all.query(multi_match: {fields: [:title, :author], query: "venus botticelli"}).load.to_a

=> [#<Artwork:0x007fa66eada0b8
  id: 3,
  museum_id: 15,
  title: "Birth of Venus",
  author: "Sandro Botticelli (Florence 1445-1510) ",
  position: 109>,
 #<Artwork:0x007fa66eada360
  id: 30,
  museum_id: 6,
  title: "Adoration of the Magi",
  author: "Sandro Botticelli (Florence 1445-1510)",
  position: 2>,
 #<Artwork:0x007fa66eada1f8
  id: 20,
  museum_id: 11,
  title: "Fortitude",
  author: "Sandro Botticelli (Florence 1445 -1510)",
  position: 2015>]

We wanted to narrow the search down to Botticelli's Venus only, but we now get every painting with "Venus" in the title and every painting whose author contains "Botticelli". The opposite has happened: we have an OR instead of an AND.

There are surely dozens of ways to achieve our goal with Chewy. That is why we decided to create a helper that makes our life easier every time we use this gem by Toptal.
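The difference between the two behaviours can be illustrated without Elasticsearch at all. What follows is a minimal in-memory sketch in plain Ruby, not a description of how ES matches or scores documents: the ARTWORKS data is copied from the results above, while tokens, field_tokens, match_any and match_all are hypothetical helpers introduced only for this illustration.

```ruby
# In-memory illustration of OR vs AND matching across several fields.
# It only mimics the observable behaviour; it is not Elasticsearch code.

ARTWORKS = [
  { id: 3,  title: "Birth of Venus",        author: "Sandro Botticelli (Florence 1445-1510)" },
  { id: 30, title: "Adoration of the Magi", author: "Sandro Botticelli (Florence 1445-1510)" },
  { id: 20, title: "Fortitude",             author: "Sandro Botticelli (Florence 1445 -1510)" }
].freeze

# Split a string into lowercase word tokens.
def tokens(text)
  text.downcase.scan(/\w+/)
end

# Collect the tokens of all the given fields of a record.
def field_tokens(record, fields)
  fields.flat_map { |field| tokens(record[field]) }
end

# OR semantics: a record matches if at least one query word appears in any field.
def match_any(records, fields, query)
  words = tokens(query)
  records.select { |record| (words & field_tokens(record, fields)).any? }
end

# AND semantics: a record matches only if every query word appears in some field.
def match_all(records, fields, query)
  words = tokens(query)
  records.select { |record| (words - field_tokens(record, fields)).empty? }
end

or_ids  = match_any(ARTWORKS, [:title, :author], "venus botticelli").map { |r| r[:id] }
and_ids = match_all(ARTWORKS, [:title, :author], "venus botticelli").map { |r| r[:id] }

puts "OR:  #{or_ids.inspect}"  # all three artworks
puts "AND: #{and_ids.inspect}" # only "Birth of Venus"
```

With OR semantics the word "botticelli" alone is enough to match every record, which is exactly what happened with multi_match; with AND semantics only the record that contains both words survives.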

Welcome, Mastico!

Mastico helps us simplify the interface for building queries and provides us with a basic configuration of Chewy so that, once installed, we can immediately start searching!

chewy_query = UffiziIndex::Artwork.all

Mastico::Query.new(fields: [:title], query: "Venus").apply(chewy_query).load.to_a

=> [#<Artwork:0x007fa6668fc820
  id: 3,
  museum_id: 15,
  title: "Birth of Venus",
  author: "Sandro Botticelli (Florence 1445-1510) ",
  position: 109>]

This first query is indeed longer than the Chewy one, but our purpose is to search across multiple attributes. So let's see what the query that returns only Botticelli's Venus looks like:

Mastico::Query.new(fields: [:title, :author], query: "Venus Botticelli").apply(chewy_query).load.to_a

=> [#<Artwork:0x007fa66a4281b0
  id: 3,
  museum_id: 15,
  title: "Birth of Venus",
  author: "Sandro Botticelli (Florence 1445-1510) ",
  position: 109>]

We got what we wanted, and the query is only slightly longer than the previous one. Even better, we just need to add another field to the fields array to search inside more attributes.

Let's see in detail what runs when we launch this command, that is, what we would have had to write by hand with ES: https://gist.github.com/mttmanzo/a7ffb82c312a3b3f4d027fb681e49ef6.

Once fields and query are passed, Mastico combines the search with other values, such as the type of search (:word, :prefix, :infix and :fuzzy) and the boost (how much we want to emphasize that word).

This is enough to start implementing even complex searches in a simple and fast way. But what if we misspell a word?

Mastico handles this case automatically with the fuzzy type, so if we search for "Botticello" we still find Botticelli's works.
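To give an idea of the kind of structure that combining fields, search types and boosts can produce, here is a hedged sketch in plain Ruby. It is not Mastico's implementation and does not reproduce the gist linked above: build_clauses and TYPE_BUILDERS are hypothetical names, and mapping each type to an Elasticsearch term, prefix, wildcard or fuzzy query is an assumption made purely for illustration.

```ruby
require "json"

# Hypothetical sketch: turn fields, search types and boosts into
# Elasticsearch-style query clauses. This is NOT Mastico's actual code.

# Assumed mapping from a search type to an Elasticsearch query clause.
TYPE_BUILDERS = {
  word:   ->(field, word, boost) { { term:     { field => { value: word, boost: boost } } } },
  prefix: ->(field, word, boost) { { prefix:   { field => { value: word, boost: boost } } } },
  infix:  ->(field, word, boost) { { wildcard: { field => { value: "*#{word}*", boost: boost } } } },
  fuzzy:  ->(field, word, boost) { { fuzzy:    { field => { value: word, boost: boost } } } }
}.freeze

# For each word, build one "should" group (any field/type may match);
# all groups go into "must", so every word has to match somewhere.
def build_clauses(query, fields)
  groups = query.downcase.split.map do |word|
    alternatives = fields.flat_map do |field, options|
      boost = options.fetch(:boost, 1.0)
      options.fetch(:types).map do |type|
        TYPE_BUILDERS.fetch(type).call(field, word, boost)
      end
    end
    { bool: { should: alternatives } }
  end
  { bool: { must: groups } }
end

clauses = build_clauses(
  "Venus Botticelli",
  {
    title:  { types: [:word, :fuzzy] },
    author: { types: [:word], boost: 3.0 }
  }
)

puts JSON.pretty_generate(clauses)
```

Grouping the alternatives for each word under should and the words themselves under must is what gives the AND-across-words behaviour, while still letting each word match in any of the listed fields.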

Mastico::Query.new(fields: [:author], query: "Botticello").apply(chewy_query).load.to_a

=> [#<Artwork:0x007fa66e9a78f8
  id: 30,
  museum_id: 6,
  title: "Adoration of the Magi",
  author: "Sandro Botticelli (Florence 1445-1510)",
  position: 2>,
 #<Artwork:0x007fa66e9a7678
  id: 3,
  museum_id: 15,
  title: "Birth of Venus",
  author: "Sandro Botticelli (Florence 1445-1510) ",
  position: 109>,
 #<Artwork:0x007fa66e9a77b8
  id: 20,
  museum_id: 11,
  title: "Fortitude",
  author: "Sandro Botticelli (Florence 1445 -1510)",
  position: 2015>]

Nice, but what if we want to filter out stop words? There is a solution for this too: just pass the word_weight option to the query:

def word_weight(word)
  case word
  when "batter"
    0.0
  when /\Ab[ao]ttle\z/
    0.0
  else
    1.0
  end
end

Mastico::Query.new(fields: [:author], query: "Botticello", word_weight: method(:word_weight)).apply(chewy_query).load.to_a

The returned values represent the boost: a stop word gets 0.0, while a higher value can be used to emphasize other keywords.

All these options may seem complex to combine, but in reality they can be expressed in a simple hash:
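As a rough illustration of what a weight of 0.0 means in practice, the plain-Ruby sketch below applies a weighting callable to each word of a query and drops the words whose weight is zero. How Mastico uses these values internally is not covered here, so weighted_words, WORD_WEIGHT and the STOP_WORDS list are hypothetical. In this sketch, anything that responds to call can play the role that method(:word_weight) plays above.

```ruby
# Hypothetical sketch of applying per-word weights to a query string.
# It does not show Mastico internals, only the general idea.

STOP_WORDS = %w[the of a by].freeze

# Weight 0.0 for stop words, 2.0 for an emphasised keyword, 1.0 otherwise.
WORD_WEIGHT = lambda do |word|
  if STOP_WORDS.include?(word)
    0.0
  elsif word == "venus"
    2.0
  else
    1.0
  end
end

# Pair each lowercase word with its weight and discard zero-weight words.
def weighted_words(query, weigher)
  query.downcase.split
       .map { |word| [word, weigher.call(word)] }
       .reject { |_word, weight| weight.zero? }
end

p weighted_words("The Birth of Venus by Botticelli", WORD_WEIGHT)
# => [["birth", 1.0], ["venus", 2.0], ["botticelli", 1.0]]
```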

  QUERY_FIELDS = {
    title:                { types: [:fuzzy] },
    formatted_text:       { types: [:word] },
    author:               { boost: 3.0, types: [:word] }, # both the type and the boost are set, for this field only
    abstract_text:        { types: [:fuzzy] },
    location:             { types: [:infix] },
    technique:            { types: [:prefix] },
  }.freeze

  def matching_text_scope(text)
    Mastico::Query.new(query: text, fields: QUERY_FIELDS).apply(UffiziIndex::Artwork.all)
  end

The features we have just seen help us daily in many of our projects, but we can't wait to get feedback from external users and more ideas to improve Mastico! Anyone who would like to contribute can do so here: https://github.com/cantierecreativo/mastico.

Matteo Manzo

Developer

With a background as a C# and JS programmer, Matteo's goal is to be a full-stack developer! He loves to play with 2D/3D animations and wishes to learn as much as he can!