When it comes to developing a search system for our projects, we often have to deal with complex situations: we might need to search across several fields, in more than one language, and perhaps with full-text search.
In these cases, the solution we usually reach for is Elasticsearch.
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications that have complex search features and requirements.
ES is a powerful and complex system that can handle large amounts of data. As we might expect, it is not a plug-and-play solution, so we rarely use it on its own. Luckily, there are plenty of gems that make our life easier.
One of these is called Chewy.
What is Chewy?
Chewy is a high-level framework based on the elasticsearch-ruby client.
Chewy simplifies how we manage an index (a collection of models that share similar attributes, which we can query to get our search results).
Let's look at some examples from the search system we implemented for the Uffizi website:
class UffiziIndex < Chewy::Index
  class << self
    def index_name(_suggest = nil)
      "#{Rails.env}_uffizi_#{I18n.locale}"
    end
  end

  define_type(
    Artwork.includes(:museum),
    delete_if: -> { translation_for(I18n.locale).nil? }
  ) do
    field :title, value: -> { title }
    field :author, value: -> { author }
    field :formatted_text, value: -> { formatted_text }
    field :abstract_text, value: -> { abstract_text }
    field :formatted_renovation, value: -> { formatted_renovation }
    field :location, value: -> { location }
    field :technique, value: -> { technique }
  end
end
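The index name above is built from the Rails environment and the current locale, so each language ends up in its own index. A minimal sketch of that naming scheme in plain Ruby, with no Rails or I18n involved (the method name `uffizi_index_name` is hypothetical and only mirrors the string interpolation shown above):

```ruby
# Hypothetical stand-in for UffiziIndex.index_name: the environment and
# locale are passed in explicitly instead of read from Rails and I18n.
def uffizi_index_name(env, locale)
  "#{env}_uffizi_#{locale}"
end

%i[it en].each do |locale|
  puts uffizi_index_name("production", locale)
end
```

Switching `I18n.locale` before querying therefore selects which index is searched.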
If we wanted to find Botticelli's Venus, we could simply search for "venus":
UffiziIndex::Artwork.all.query(match: {title: "venus"}).load.to_a
=> [#<Artwork:0x007fa66ec00a28
id: 3,
museum_id: 15,
title: "Birth of Venus",
author: "Sandro Botticelli (Florence 1445-1510) ",
position: 109>]
As we can see, we get exactly one result, the right one. But what if there were more paintings with "Venus" in the title and we only wanted the one by Botticelli?
In this case, we can chain multiple queries:
UffiziIndex::Artwork.all.query(match: {title: "venus"}).query(match: {author: "botticelli"}).load.to_a
=> [#<Artwork:0x007fa66ec00a28
id: 3,
museum_id: 15,
title: "Birth of Venus",
author: "Sandro Botticelli (Florence 1445-1510) ",
position: 109>]
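Chaining queries asks for documents that satisfy every clause. As a rough sketch, assuming Chewy's default behaviour of combining chained queries under a bool/must clause, and using `match` clauses purely for illustration, the request body is equivalent to something like the hash built below. The helper `all_of` is hypothetical and not part of Chewy.

```ruby
require "json"

# Hypothetical helper: wrap several query clauses so that all of them
# must match (Elasticsearch bool/must).
def all_of(*clauses)
  { query: { bool: { must: clauses } } }
end

body = all_of(
  { match: { title: "venus" } },
  { match: { author: "botticelli" } }
)

puts JSON.pretty_generate(body)
```

Every extra condition adds one more clause, which is why the chained form grows quickly.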
The result remains the same, but you may have noticed that the more specific our filters become, the more code we have to write.
Isn't there a smarter way to do it? Yes. ES provides multi_match, which lets us search several fields at once. Let's see how it behaves:
UffiziIndex::Artwork.all.query(multi_match: {fields: [:title, :author], query: "venus botticelli"}).load.to_a
=> [#<Artwork:0x007fa66eada0b8
id: 3,
museum_id: 15,
title: "Birth of Venus",
author: "Sandro Botticelli (Florence 1445-1510) ",
position: 109>,
#<Artwork:0x007fa66eada360
id: 30,
museum_id: 6,
title: "Adoration of the Magi",
author: "Sandro Botticelli (Florence 1445-1510)",
position: 2>,
#<Artwork:0x007fa66eada1f8
id: 20,
museum_id: 11,
title: "Fortitude",
author: "Sandro Botticelli (Florence 1445 -1510)",
position: 2015>]
We wanted only Botticelli's Venus, but we now get every painting with "Venus" in the title plus every painting whose author contains "Botticelli". The opposite of what we wanted has happened: we have an OR instead of an AND.
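The OR comes from the multi_match defaults: the `operator` is `or`, so each term is optional, and the default `best_fields` type evaluates each field on its own. One way to get AND semantics directly in Elasticsearch is a `cross_fields` query with `operator: "and"`, which requires every term to appear in at least one of the listed fields. The sketch below only builds that clause as a plain hash; the helper name is hypothetical, and this is not the approach taken in the rest of this article.

```ruby
require "json"

# Hypothetical helper: build a multi_match clause in which every term
# must be found in at least one of the given fields.
def all_terms_across_fields(text, fields)
  {
    multi_match: {
      query: text,
      fields: fields.map(&:to_s),
      type: "cross_fields",
      operator: "and"
    }
  }
end

clause = all_terms_across_fields("venus botticelli", %i[title author])
puts JSON.pretty_generate(clause)
```

A hash like this can be passed to Chewy's `query` in place of the multi_match hash used above.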
There are surely dozens of ways to achieve our goal with Chewy, which is why we decided to create a helper that makes our life easier every time we use this gem by Toptal.
Welcome, Mastico!
Mastico simplifies the interface for building queries and provides a basic Chewy configuration so that, once it is installed, we can start searching right away!
chewy_query = UffiziIndex::Artwork.all
Mastico::Query.new(fields: [:title], query: "Venus").apply(chewy_query).load.to_a
=> [#<Artwork:0x007fa6668fc820
id: 3,
museum_id: 15,
title: "Birth of Venus",
author: "Sandro Botticelli (Florence 1445-1510) ",
position: 109>]
Admittedly, this first query is longer than the Chewy one, but our goal is to search across multiple attributes. Let's see what the query that returns only Botticelli's Venus looks like:
Mastico::Query.new(fields: [:title, :author], query: "Venus Botticelli").apply(chewy_query).load.to_a
=> [#<Artwork:0x007fa66a4281b0
id: 3,
museum_id: 15,
title: "Birth of Venus",
author: "Sandro Botticelli (Florence 1445-1510) ",
position: 109>]
We got what we wanted, and the query is only slightly longer than the previous one. Better still, to search more attributes we just need to add another field to the fields array.
Let's see in detail what runs when we launch this command, that is, what we would have had to write by hand with ES: https://gist.github.com/mttmanzo/a7ffb82c312a3b3f4d027fb681e49ef6.
Once fields and query are passed, Mastico combines the search with other values, such as the type of search (:word, :prefix, :infix and :fuzzy) and the boost (how much weight we want to give a match).
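To give an idea of how fields, search types and boost could be combined, here is a minimal sketch. It is not Mastico's actual implementation and does not reproduce the gist linked above: the mapping of each type to an Elasticsearch clause (`match`, `prefix`, `wildcard`, `fuzzy`) and the names `CLAUSE_BUILDERS` and `sketch_query` are assumptions made for illustration.

```ruby
require "json"

# Assumed mapping from a search type to an Elasticsearch clause.
CLAUSE_BUILDERS = {
  word:   ->(field, term, boost) { { match:    { field => { query: term, boost: boost } } } },
  prefix: ->(field, term, boost) { { prefix:   { field => { value: term, boost: boost } } } },
  infix:  ->(field, term, boost) { { wildcard: { field => { value: "*#{term}*", boost: boost } } } },
  fuzzy:  ->(field, term, boost) { { fuzzy:    { field => { value: term, boost: boost } } } }
}.freeze

# Every word must match (must), but it may match in any field and with
# any of the requested types (should, with at least one hit).
def sketch_query(text, fields, types: CLAUSE_BUILDERS.keys, boost: 1.0)
  per_word = text.downcase.split.map do |term|
    alternatives = fields.product(types).map do |field, type|
      CLAUSE_BUILDERS.fetch(type).call(field, term, boost)
    end
    { bool: { should: alternatives, minimum_should_match: 1 } }
  end
  { bool: { must: per_word } }
end

query = sketch_query("Venus Botticelli", %i[title author], types: %i[word fuzzy])
puts JSON.pretty_generate(query)
```

The outer `must` is what turns the OR we saw with multi_match into an AND across words, while each word stays free to match in any field.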
This is enough to start implementing even complex searches simply and quickly. But what if we misspell a word?
Mastico handles this case automatically with the fuzzy type, so if we search for "Botticello" we still find Botticelli's works.
Mastico::Query.new(fields: [:author], query: "Botticello").apply(chewy_query).load.to_a
=> [#<Artwork:0x007fa66e9a78f8
id: 30,
museum_id: 6,
title: "Adoration of the Magi",
author: "Sandro Botticelli (Florence 1445-1510)",
position: 2>,
#<Artwork:0x007fa66e9a7678
id: 3,
museum_id: 15,
title: "Birth of Venus",
author: "Sandro Botticelli (Florence 1445-1510) ",
position: 109>,
#<Artwork:0x007fa66e9a77b8
id: 20,
museum_id: 11,
title: "Fortitude",
author: "Sandro Botticelli (Florence 1445 -1510)",
position: 2015>]
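Fuzzy matching in Elasticsearch is defined in terms of edit distance: the number of single-character changes needed to turn one term into another. The sketch below computes plain Levenshtein distance in Ruby to show why "botticello" is so close to "botticelli". It is a simplification, since Elasticsearch by default also counts a swap of two adjacent characters as a single edit, and the method name `edit_distance` is hypothetical.

```ruby
# Classic dynamic-programming Levenshtein distance, keeping only the
# previous row of the table in memory.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,        # deletion
        current[j - 1] + 1,     # insertion
        previous[j - 1] + cost  # substitution
      ].min
    end
    previous = current
  end
  previous.last
end

puts edit_distance("botticello", "botticelli")
```

The two spellings differ only in the final letter, so the distance is 1, well within the small number of edits a fuzzy search tolerates.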
Nice, but what if we want to filter out stop words? There is a solution for this too: just pass the word_weight option to the query:
def word_weight(word)
  case word
  when "batter"
    0.0
  when /\Ab[ao]ttle\z/
    0.0
  else
    1.0
  end
end
Mastico::Query.new(fields: [:author], query: "Botticello", word_weight: method(:word_weight)).apply(chewy_query).load.to_a
The returned values represent the boost, which can also be used to emphasize the search for other keywords.
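To make the role of the weights concrete, here is a small sketch in plain Ruby. It assumes that a weight of 0.0 means the word is skipped and that any other value is used as that word's boost; the lambda `WEIGHT` and the helper `weighted_terms` are hypothetical and are not part of Mastico.

```ruby
# Same shape as the word_weight callback above: word in, boost out.
WEIGHT = lambda do |word|
  case word
  when "the", "of" then 0.0 # treated as stop words
  when "venus"     then 2.0 # emphasised keyword
  else 1.0
  end
end

# Hypothetical helper: pair each word with its weight and drop the
# words whose weight is zero.
def weighted_terms(text, word_weight)
  text.downcase.split
      .map { |word| [word, word_weight.call(word)] }
      .reject { |_word, weight| weight.zero? }
end

p weighted_terms("The Birth of Venus", WEIGHT)
```

Here "the" and "of" are dropped, "birth" keeps the neutral weight 1.0 and "venus" counts double.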
All these options may seem complicated to combine, but in reality they can be put together in a simple hash:
QUERY_FIELDS = {
  title: { types: [:fuzzy] },
  formatted_text: { types: [:word] },
  author: { boost: 3.0, types: [:word] }, # both the type and the boost are set, for this field only
  abstract_text: { types: [:fuzzy] },
  location: { types: [:infix] },
  technique: { types: [:prefix] },
}.freeze

def matching_text_scope(text)
  Mastico::Query.new(query: text, fields: QUERY_FIELDS).apply(UffiziIndex::Artwork.all)
end
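The examples so far pass fields either as an array of names or as a hash of per-field options. One way to think about the two forms is that the array is shorthand for a hash in which every field gets default options. The sketch below illustrates that idea only: the defaults chosen here (all four types, boost 1.0) and the names `DEFAULT_FIELD_OPTIONS` and `normalize_fields` are assumptions, not Mastico's documented behaviour.

```ruby
# Assumed defaults, for illustration only.
DEFAULT_FIELD_OPTIONS = { boost: 1.0, types: %i[word prefix infix fuzzy] }.freeze

# Hypothetical helper: turn either form into a hash of complete options.
def normalize_fields(fields)
  case fields
  when Array
    fields.to_h { |name| [name, DEFAULT_FIELD_OPTIONS] }
  when Hash
    fields.transform_values { |options| DEFAULT_FIELD_OPTIONS.merge(options) }
  else
    raise ArgumentError, "fields must be an Array or a Hash"
  end
end

p normalize_fields(%i[title author])
p normalize_fields(author: { boost: 3.0, types: [:word] }, title: { types: [:fuzzy] })
```

With this reading, a per-field hash only needs to list what differs from the defaults.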
These features help us daily in many of our projects, and we can't wait to get feedback and ideas from other users to improve Mastico! Anyone who would like to contribute can do so here: https://github.com/cantierecreativo/mastico.