Stack Overflow Asked by user6882757 on November 15, 2021
I have the code below to add data to Elasticsearch:
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
es.indices.create(index='my-index_1', ignore=400)
for e in enumerate(r):
    #es.indices.update(index="my-index_1", body=e[1])
    es.index(index="my-index_1", body=e[1])
#Retrieve the data
es.search(index = 'my-index_1')['hits']['hits']
Requirement
How to update the document
r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
Here, Dr. Messi and Dr. Christiano should be added to the index, and Dr. Bernard M. Aaron should not be indexed again, as it is already present in the index.
In Elasticsearch, when you index data without giving a custom id, Elasticsearch creates a new id for every document you index.
Hence, in your case, as you are not giving any id, Elasticsearch generates one for you.
But you also want to check whether Name is already present, and index the data only depending on that. There are 2 possible solutions to this.
1. Let Elasticsearch generate the _id for every document. After this, you will have to search with Name to check whether the document exists.
2. Create your own _id for every document. After this, search with _id. It's a faster and easier approach.
I'm going ahead with the 2nd approach of creating my own ids. As you are searching on Name, I'll create an id based on the Name field value. The hash of the Name field value is the _id. I'll use md5, but you can use any other hashing function.
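The id derivation described above can be tried on its own, without a running cluster. This is only a minimal sketch: the helper name doc_id_for is made up for illustration and is not part of the Elasticsearch client.

```python
import hashlib


def doc_id_for(name: str) -> str:
    """Return a deterministic document id: the MD5 hex digest of the name.

    doc_id_for is a hypothetical helper name used only in this sketch.
    """
    # encode() turns the string into UTF-8 bytes, which hashlib requires.
    return hashlib.md5(name.encode()).hexdigest()


names = ["Dr. Messi", "Dr. Christiano", "Dr. Bernard M. Aaron"]
ids = {name: doc_id_for(name) for name in names}

# The same name always yields the same 32-character hex id,
# so a repeated record maps onto the already existing document.
for name, doc_id in ids.items():
    print(name, "->", doc_id)
```

Note that the hash is sensitive to case and whitespace, so "Dr. Messi" and "dr. messi" would get different ids. If that matters for your data, normalising the name before hashing is one option.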
First Indexing Data:
import hashlib
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.cluster.health()
r = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
index_name="my-index_1"
es.indices.create(index=index_name, ignore=400)
for rec in r:
    # Use the MD5 hash of the Name value as the document _id
    es.index(index=index_name, body=rec, id=hashlib.md5(rec['Name'].encode()).hexdigest())
Output:
[{'_index': 'my-index_1',
'_type': '_doc',
'_id': '1164c423bc4e2fcb75697c3031af9ef1',
'_score': 1.0,
'_source': {'Name': 'Dr. Christopher DeSimone',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '672ae14197a135c39eab759be8b0597f',
'_score': 1.0,
'_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '85702447f9e9ea010054eaf0555ce79c',
'_score': 1.0,
'_source': {'Name': 'Dr. Bernard M. Aaron',
'Specialised and Location': 'Health'}}]
Next Step: Indexing new data
r = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
{'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
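from elasticsearch import NotFoundError

for rec in r:
    # The _id is the MD5 hash of the Name value, as in the first step
    doc_id = hashlib.md5(rec['Name'].encode()).hexdigest()
    try:
        # es.get raises NotFoundError when no document has this _id
        es.get(index=index_name, id=doc_id)
    except NotFoundError:
        print("Record Not found")
        es.index(index=index_name, body=rec, id=doc_id)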
Output:
[{'_index': 'my-index_1',
'_type': '_doc',
'_id': '1164c423bc4e2fcb75697c3031af9ef1',
'_score': 1.0,
'_source': {'Name': 'Dr. Christopher DeSimone',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '672ae14197a135c39eab759be8b0597f',
'_score': 1.0,
'_source': {'Name': 'Dr. Tajwar Aamir (Aamir)',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '85702447f9e9ea010054eaf0555ce79c',
'_score': 1.0,
'_source': {'Name': 'Dr. Bernard M. Aaron',
'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': 'e2e0f463145568471097ff027b18b40d',
'_score': 1.0,
'_source': {'Name': 'Dr. Messi', 'Specialised and Location': 'Health'}},
{'_index': 'my-index_1',
'_type': '_doc',
'_id': '23bb4f1a3a41efe7f4cab8a80d766708',
'_score': 1.0,
'_source': {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'}}]
As you can see, the Dr. Bernard M. Aaron record is not indexed again, as it is already present.
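To check the "look up by id, index only if missing" logic without a cluster, it can be simulated in memory. In this rough sketch, FakeIndex and index_if_missing are hypothetical names invented for illustration; FakeIndex is a plain dictionary wrapper and not the real Elasticsearch client.

```python
import hashlib


class FakeIndex:
    """Hypothetical in-memory stand-in for one index (not the real client)."""

    def __init__(self):
        self.docs = {}  # maps _id -> document body

    def exists(self, doc_id):
        return doc_id in self.docs

    def index(self, doc_id, body):
        self.docs[doc_id] = body


def index_if_missing(store, records):
    """Store only records whose Name-derived id is absent; return added names."""
    added = []
    for rec in records:
        doc_id = hashlib.md5(rec["Name"].encode()).hexdigest()
        if not store.exists(doc_id):
            store.index(doc_id, rec)
            added.append(rec["Name"])
    return added


first = [{'Name': 'Dr. Christopher DeSimone', 'Specialised and Location': 'Health'},
         {'Name': 'Dr. Tajwar Aamir (Aamir)', 'Specialised and Location': 'Health'},
         {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]
second = [{'Name': 'Dr. Messi', 'Specialised and Location': 'Health'},
          {'Name': 'Dr. Christiano', 'Specialised and Location': 'Health'},
          {'Name': 'Dr. Bernard M. Aaron', 'Specialised and Location': 'Health'}]

store = FakeIndex()
index_if_missing(store, first)
added = index_if_missing(store, second)
print(added)            # ['Dr. Messi', 'Dr. Christiano']
print(len(store.docs))  # 5
```

Running the second batch again adds nothing, which matches the behaviour you asked for: existing names are skipped and only new names are stored.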
Answered by bigbounty on November 15, 2021