How to update the elastic search document with python?

Question

I have the code below to add data to Elasticsearch:
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
 {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
 {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
es.indices.create(index='my-index_1', ignore=400)

for e in enumerate(r):
    #es.indices.update(index="my-index_1", body=e[1])
    es.index(index="my-index_1", body=e[1])

#Retrieve the data
es.search(index = 'my-index_1')['hits']['hits']

Requirement
How do I update the index with the following documents?
r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

Here, Dr. Messi and Dr. Christiano should be added to the index, while Dr. Bernard M. Aaron should not be indexed again because it is already present in the index.

bigbounty · Answer

In Elasticsearch, when you index data without giving a custom id, Elasticsearch creates a new id for every document you index.
Hence, in your case, since you are not giving any id, Elasticsearch generates one for you.
But you also want to check whether a Name is already present and index the data only if it is not. There are 2 possible solutions to this.

1. Index the data without passing an _id for every document. You then have to search by Name to find out whether the document exists.
2. Index the data with your own _id for every document. You then look the document up by _id. This is the faster and easier approach.
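
For the first approach, the existence check is a search on Name. Below is a minimal sketch of the query body, assuming the index uses Elasticsearch's default dynamic mapping, which adds a Name.keyword sub-field for exact matches. build_name_query is a hypothetical helper name, not part of the client library:

```python
def build_name_query(name: str) -> dict:
    """Build a search body that matches documents whose Name is exactly `name`."""
    # Assumption: default dynamic mapping, so "Name.keyword" exists for exact matching.
    return {"size": 1, "query": {"term": {"Name.keyword": name}}}


query = build_name_query("Dr. Bernard M. Aaron")
```

The body could then be passed to es.search(index=..., body=query), and the record indexed only when the returned ['hits']['hits'] list is empty. Note that newly indexed documents become searchable only after an index refresh, which is one reason the id-based lookup is the simpler route.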

I'm going ahead with the 2nd approach of creating my own ids. Since you are searching on Name, I'll derive the _id from the Name field: the hash of the Name value is the _id. I'll use md5, but you can use any other hashing function.
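
The id derivation on its own can be sketched as follows. name_to_id is a hypothetical helper name introduced for illustration; it wraps the same hashlib.md5 call used in the code in this answer:

```python
import hashlib


def name_to_id(name: str) -> str:
    """Return a deterministic 32-character hex id derived from a Name value."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


doc = {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}
doc_id = name_to_id(doc['Name'])
```

Because the same Name always hashes to the same id, indexing a record twice targets the same document instead of creating a duplicate.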
First Indexing Data:
import hashlib
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

index_name="my-index_1"
es.indices.create(index=index_name, ignore=400)

for rec in r:
    # Use the md5 hash of Name as the document id, so the same Name always maps to the same document
    es.index(index=index_name, body=rec, id=hashlib.md5(rec['Name'].encode()).hexdigest())

Output of es.search(index=index_name)['hits']['hits']:
[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}}]

Next Step: Indexing new data
r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
     {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

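from elasticsearch import NotFoundError

for rec in r:
    # Derive the same md5-based id that was used when the data was first indexed
    doc_id = hashlib.md5(rec['Name'].encode()).hexdigest()
    try:
        # get() raises NotFoundError when no document has this id
        es.get(index=index_name, id=doc_id)
    except NotFoundError:
        print("Record Not found")
        es.index(index=index_name, body=rec, id=doc_id)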

Output of es.search(index=index_name)['hits']['hits']:
[{'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '1164c423bc4e2fcb75697c3031af9ef1',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christopher DeSimone',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '672ae14197a135c39eab759be8b0597f',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '85702447f9e9ea010054eaf0555ce79c',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Bernard M. Aaron',
   'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': 'e2e0f463145568471097ff027b18b40d',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'}},
 {'_index': 'my-index_1',
  '_type': '_doc',
  '_id': '23bb4f1a3a41efe7f4cab8a80d766708',
  '_score': 1.0,
  '_source': {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'}}]

As you can see, the Dr. Bernard M. Aaron record is not indexed again because it is already present.
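
As a variation on the same idea, the check can be written with the client's exists() call instead of catching an exception. The sketch below is illustrative rather than definitive: index_if_absent is a hypothetical helper, and InMemoryClient is a hypothetical stand-in that mimics only the exists() and index() methods so the example runs without a cluster. With a real cluster you would pass the Elasticsearch() client instead.

```python
import hashlib


class InMemoryClient:
    """Hypothetical stand-in exposing only the two client methods used below."""

    def __init__(self):
        self.docs = {}

    def exists(self, index, id):
        return (index, id) in self.docs

    def index(self, index, id, body):
        self.docs[(index, id)] = body


def index_if_absent(es, index_name, records):
    """Index each record under an md5-of-Name id unless that id already exists.

    Returns the names that were actually indexed.
    """
    added = []
    for rec in records:
        doc_id = hashlib.md5(rec['Name'].encode()).hexdigest()
        if not es.exists(index=index_name, id=doc_id):
            es.index(index=index_name, id=doc_id, body=rec)
            added.append(rec['Name'])
    return added


client = InMemoryClient()
first = index_if_absent(client, "my-index_1", [
    {'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'},
])
second = index_if_absent(client, "my-index_1", [
    {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
    {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'},
])
```

In the second call only Dr. Messi and Dr. Christiano are indexed, since the id for Dr. Bernard M. Aaron already exists.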
