Part 1: Elasticsearch + Google Vision Labelling API

The goal is to allow a user to search a large, growing collection of images with keywords. This problem appears frequently in fashion e-tailers, where the user will search for terms like “red dress”, and see products that fit that description.

Currently, this is implemented by manually tagging product listings, based on their descriptions, metadata, and product images. In fashion, where the number of items is large and the collections change constantly, manual metadata creation is expensive and does not scale.

Enter the Google Vision label recognition service. You pass this service an image, and it returns automated labels, each with a score that indicates the service’s confidence in that label.

Using the Google Vision service, we can pre-process each image before it goes into the database, so that the labels from the service act as tags for the search engine.

You can enhance the usefulness of auto-tagging by replacing the Google service with a custom AI that learns from your existing product images and their manual tags. Give us a shout, and we would happily chat about how we can make that happen.

Technical Details

The idea is simple: we create an Elasticsearch (ES) index that uses nested object mappings to represent the tags for a particular image. We use the Google Vision API’s label recognition service to create the tags, and we also store the confidence score for each tag. When we search the image corpus, we use a nested query coupled with a function score query to find the most relevant images, using the confidence score from Google as a proxy for the relevance of an image to a keyword.

Creating the index

For Elasticsearch, we are running a fresh instance on Ubuntu 16.04. The ES server listens on localhost:9200. Let’s first create the index:

  • Line 1: we are PUTing a new index called ‘images’ into our ES instance via curl.
  • Line 3: define a mapping for the index for good order.
  • Lines 8-13: the keys of the mapping match the keys of the JSON output from the Google Vision API, to reduce confusion and comments down the road.
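
As a rough illustration of the steps above, here is a minimal Python sketch of the same request, using only the standard library. It is not the original curl script, so the line numbers in the notes do not apply to it. The field names (mid, description, score) follow the Vision label annotation JSON; the field types and the typeless mapping layout (as accepted by ES 7 and later) are assumptions, and older ES releases expect the properties under a mapping type name.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # the ES server from the text


def build_images_mapping():
    """Return the index body: one nested field holding the Vision labels."""
    return {
        "mappings": {
            "properties": {
                "labelAnnotations": {
                    "type": "nested",  # each label is its own nested document
                    "properties": {
                        "mid": {"type": "keyword"},       # assumed type
                        "description": {"type": "text"},  # searched by keyword
                        "score": {"type": "float"},       # Vision confidence
                    },
                }
            }
        }
    }


def create_images_index(es_url=ES_URL):
    """PUT the 'images' index definition and return the parsed ES reply."""
    request = urllib.request.Request(
        url=f"{es_url}/images",
        data=json.dumps(build_images_mapping()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling create_images_index() against a running instance should create the index; build_images_mapping() can be inspected on its own without a server.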

Indexing an image

Let’s test this index out by putting in an image of the beautiful Kiyomizudera temple in Kyoto, Japan. I manually ran the image through the Google Vision service, and copied the JSON into our curl parameters.

ES should return success. Notes for the script:

  • Line 1: we PUT a new image into the ‘images’ index, specifying the ID to be 1 before asking ES to pretty print the output with ?pretty.
  • Everything else is just feeding a JSON dump that conforms to the mapping we set up above.
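
A minimal Python sketch of the same indexing call might look like the following. The label values are placeholders made up for illustration, not the actual Vision output for the Kiyomizudera photo, and the /images/_doc/<id> path assumes ES 7 or later (older releases use a mapping type name in place of _doc).

```python
import json
import urllib.request

# Placeholder labels for illustration only, NOT real Vision API output.
EXAMPLE_LABELS = [("temple", 0.95), ("architecture", 0.80)]


def build_image_document(labels):
    """Turn (description, score) pairs into a document matching the mapping."""
    return {
        "labelAnnotations": [
            {"description": description, "score": float(score)}
            for description, score in labels
        ]
    }


def index_image(document, image_id, es_url="http://localhost:9200"):
    """PUT one image document under the given ID; return the pretty reply."""
    request = urllib.request.Request(
        url=f"{es_url}/images/_doc/{image_id}?pretty",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running, index_image(build_image_document(EXAMPLE_LABELS), 1) would store the document under ID 1.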

Now, you can manually upload all the images in your database to the Google Vision service, or even manually tag each one and feed it into ES. But let’s programmatically call the Google Vision API to batch process all your images.

Google Vision API + Python wrapper

Getting access to the API is actually a little clumsy, as you have to authenticate via the Cloud Platform, get billing access, and so forth. I would recommend not authenticating using an API key, but instead using the Application Default Credentials:
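
One common way to supply Application Default Credentials is to point the GOOGLE_APPLICATION_CREDENTIALS environment variable at a service account key file. The small helper below is a hypothetical convenience (the name credentials_path is ours, not part of any Google library) that fails early with a readable message if the variable is missing or points nowhere.

```python
import os


def credentials_path(environ=os.environ):
    """Return the key file path from the environment, or raise if unusable."""
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No credentials file at {path}")
    return path
```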


Next up, install the appropriate libraries for the language of your choice. I am using Python, with pipenv to manage my virtual environments and packages in one go. You need the elasticsearch package and the Google Client Library.

When all is said and done, you should be able to import the Google Vision SDK into Python, authenticate, and start labelling pictures:

All fairly simple, with a few notes:

  • Line 18: ImageAnnotatorClient is where the authentication happens, and it will try the GOOGLE_APPLICATION_CREDENTIALS environment variable to find your credentials.
  • Line 25: All you need to know about the response is that it's an object with a property tree that corresponds to the JSON output specified by the Google Vision API.
  • Lines 28-35: Following our earlier indexing example, we wrap thinly over the response object to create a dictionary for the ES call.
  • Line 42: We use the elasticsearch package in Python to put the results of our Google Vision call into ES.
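
The labelling step described in these notes can be sketched roughly as follows. This is not the original script, so the line numbers above do not map onto it. It assumes a 2.x or later release of the google-cloud-vision package, where the image type is vision.Image (older releases expose it as vision.types.Image); the function names are ours.

```python
def annotations_to_document(label_annotations):
    """Wrap Vision label annotations into a dict matching the ES mapping."""
    return {
        "labelAnnotations": [
            {
                "mid": annotation.mid,
                "description": annotation.description,
                "score": float(annotation.score),
            }
            for annotation in label_annotations
        ]
    }


def label_image(path):
    """Send one local image file to the Vision API and return an ES document."""
    # Imported here so the helper above stays usable without the Google SDK.
    from google.cloud import vision

    # The client picks up credentials via GOOGLE_APPLICATION_CREDENTIALS.
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as image_file:
        image = vision.Image(content=image_file.read())
    response = client.label_detection(image=image)
    return annotations_to_document(response.label_annotations)
```

The dictionary returned by label_image() has the same shape as the manually indexed document, so it can be sent to the ‘images’ index as is.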

And that’s it!

Querying for images

Let’s try querying:

  • Line 4: declare a nested query.
  • Line 5: specify that the nested documents are under ‘labelAnnotations’. The nested query needs this path even when the mapping has only one nested property, as is the case for our toy ‘images’ index.
  • Line 8: we use a function score query to bring the confidence levels from the API into ES’ scoring mechanism.
  • Line 9: we use the query to match with the descriptions of the labels, which generates a relevance score between query and description.
  • Lines 10-11: We use a field_value_factor to efficiently multiply (boost_mode) the query-description relevance with the confidence score of that label.
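
For reference, a query with the structure described above can be assembled in Python roughly like this. It is a sketch rather than the original script, so the line numbers in the notes do not apply to it; the function names are ours, and the nested score_mode is left at the ES default.

```python
import json
import urllib.request


def build_label_query(keywords):
    """Nested query whose score is text relevance times Vision confidence."""
    return {
        "query": {
            "nested": {
                "path": "labelAnnotations",
                "query": {
                    "function_score": {
                        # Relevance between the keywords and each label text.
                        "query": {
                            "match": {"labelAnnotations.description": keywords}
                        },
                        # Pull the label's confidence score into the scoring.
                        "field_value_factor": {
                            "field": "labelAnnotations.score"
                        },
                        # Multiply relevance by the confidence value.
                        "boost_mode": "multiply",
                    }
                },
            }
        }
    }


def search_images(keywords, es_url="http://localhost:9200"):
    """POST the query to the 'images' index and return the parsed reply."""
    request = urllib.request.Request(
        url=f"{es_url}/images/_search",
        data=json.dumps(build_label_query(keywords)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Because the function score sits inside the nested query, each label is scored on its own, so a label that matches the keywords but has low confidence contributes less than a confident one.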

Hopefully this gave you a good idea of how to incorporate image data into your search engine. All the code is on GitHub. Comments are welcome!

© 2017 High Dimension.