The goal is to allow a user to search a large, growing collection of images with keywords. This problem appears frequently for fashion e-tailers, where a user searches for terms like “red dress” and expects to see products that fit that description.
Currently, this is implemented by manual tagging of product listings, based on their descriptions, metadata, and product images. In fashion, where the number of items is large and the collections are dynamic, manual metadata creation is expensive and does not scale.
Enter the Google Vision label recognition service. You can pass this service an image, and it will return automated labels, along with a number that indicates the service’s confidence in the accuracy of each label.
Using the Google Vision service, we can pre-process each image before putting it into the database, so that the labels from the service act as tags for the search engine.
You can enhance the usefulness of auto-tagging by replacing the Google service with a custom AI that learns from your existing product images and their manual tags. Give us a shout, and we would happily chat about how we can make that happen.
The idea is simple: we create an Elasticsearch (ES) index that uses nested object mappings to represent tags for a particular image. We use Google Vision API’s label recognition service to create the tags, also storing the confidence score for each tag. When we search over the image corpus, we use a nested query coupled with a function score query to find the most relevant images, using the confidence score from Google as a proxy for the relevance of an image to a keyword.
For Elasticsearch, we are running a fresh instance on Ubuntu 16.04. The ES server listens on localhost:9200. Let’s first create the index:
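A minimal sketch of that mapping (assuming ES 7+, which has no mapping types; the labelAnnotations, description, and score fields mirror the shape of the Google Vision response):

```bash
curl -X PUT 'localhost:9200/images?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "properties": {
      "labelAnnotations": {
        "type": "nested",
        "properties": {
          "description": { "type": "text" },
          "score":       { "type": "float" }
        }
      }
    }
  }
}'
```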
This PUTs a new index called ‘images’ into our ES instance via curl.
Let’s test this index out by putting in an image of the beautiful Kiyomizudera temple in Kyoto, Japan. I manually ran the image through the Google Vision service, and copied the JSON into our curl parameters.
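Here is a sketch of that script, put_image.sh (the labels and scores below are illustrative placeholders for the actual Vision output for this photo):

```bash
#!/bin/bash
# put_image.sh: index one hand-labelled image document into ES.
# The labels and scores are illustrative placeholders; yours come
# from the Google Vision JSON for the actual photo.
curl -X PUT 'localhost:9200/images/_doc/1?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "labelAnnotations": [
    { "description": "temple",   "score": 0.97 },
    { "description": "shrine",   "score": 0.92 },
    { "description": "landmark", "score": 0.88 }
  ]
}'
```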
ES should return success. Notes for the script:
- We PUT the image into the ‘images’ index, specifying the ID to be 1.
- Appending ?pretty to the URL asks ES to pretty print the output.
Now, you can manually upload all the images in your database to the Google Vision service, or even manually tag each one and feed it into ES. But let’s programmatically call the Google Vision API to batch process all your images.
Getting access to the API is actually a little clumsy, as you have to authenticate via the Cloud Platform, get billing access, and so forth. I would recommend not authenticating using an API key, but instead using the Application Default Credentials:
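With the gcloud CLI installed, one command sets this up:

```bash
# Stores Application Default Credentials locally, so the client
# libraries can pick them up without an API key.
gcloud auth application-default login
```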
Next up, install the appropriate libraries for the language of your choice. I am using Python with pipenv to manage my virtual environments and packages in one go, and you need elasticsearch and the Google Client Library.
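Assuming the Google Client Library for Vision is the google-cloud-vision package, the install is one line:

```bash
pipenv install elasticsearch google-cloud-vision
```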
When all is said and done, you should be able to import the Google Vision SDK into Python, authenticate, and start labelling pictures:
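Here is a minimal sketch, assuming the google-cloud-vision 2.x and elasticsearch 7.x clients (label_and_index and kiyomizudera.jpg are illustrative names):

```python
from elasticsearch import Elasticsearch
from google.cloud import vision

# ImageAnnotatorClient authenticates via Application Default
# Credentials, e.g. the GOOGLE_APPLICATION_CREDENTIALS variable.
client = vision.ImageAnnotatorClient()
es = Elasticsearch('http://localhost:9200')

def label_and_index(path, doc_id):
    """Label one image with Google Vision and store it in ES."""
    with open(path, 'rb') as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)

    # Wrap thinly over the response to build a dictionary the
    # elasticsearch package can index directly, as in put_image.sh.
    doc = {
        'labelAnnotations': [
            {'description': label.description, 'score': label.score}
            for label in response.label_annotations
        ]
    }
    es.index(index='images', id=doc_id, body=doc)

label_and_index('kiyomizudera.jpg', 1)
```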
All fairly simple, with a few notes:
- ImageAnnotatorClient is where the authentication happens, and it will try the GOOGLE_APPLICATION_CREDENTIALS environment variable to find your credentials.
- As in put_image.sh, we wrap thinly over the response object to create a dictionary for the elasticsearch package in Python to put the results of our Google Vision call into ES.
And that’s it!
Let’s try querying:
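A sketch of such a search, with “temple” standing in for whatever the user typed:

```bash
curl -X POST 'localhost:9200/images/_search?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "query": {
    "nested": {
      "path": "labelAnnotations",
      "query": {
        "function_score": {
          "query": {
            "match": { "labelAnnotations.description": "temple" }
          },
          "field_value_factor": { "field": "labelAnnotations.score" },
          "boost_mode": "multiply"
        }
      }
    }
  }
}'
```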
- We point the nested query at the path ‘labelAnnotations’. This is optional: if you only have one nested property in your mapping, it will automatically point to that one, as is the case for our toy ‘images’ index.
- We use field_value_factor to efficiently multiply (boost_mode) the query-description relevance with the confidence score of that label.
Hopefully this gave you a good idea of how to incorporate image data into your search engine. All the code is on GitHub. Comments are welcome!