Database Administrators: Asked by Revolucion for Monica on December 19, 2020
I have several csv files on university courses that all seem linked by an ID, which you can find here, and I wondered how to put them into Elasticsearch. Thanks to this video and Logstash, I know how to insert a single csv file into Elasticsearch. But do you know how to insert several, such as those in the provided link?
At the moment I started with a first .config file for a first csv file, ACCREDITATION.csv. But it would be painful to write them all… The .config file is:
input{
  file{
    path => "/Users/mike/Data/ACCREDITATION.csv"    # the file input needs an absolute path
    start_position => "beginning"
    sincedb_path => "/dev/null"                     # don't remember progress; re-read on every run
  }
}
filter{
  csv{
    separator => ","
    columns => ['PUBUKPRN', 'UKPRN', 'KISCOURSEID', 'KISMODE', 'ACCTYPE', 'ACCDEPEND', 'ACCDEPENDURL', 'ACCDEPENDURLW']
  }
  mutate{convert => ["PUBUKPRN","integer"]}
  mutate{convert => ["UKPRN","integer"]}
  mutate{convert => ["KISMODE","integer"]}
  mutate{convert => ["ACCTYPE","integer"]}
  mutate{convert => ["ACCDEPEND","integer"]}
}
output{
  elasticsearch{
    hosts => "localhost"
    index => "accreditation"
    document_type => "accreditation keys"
  }
  stdout{}
}
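For reference, a pipeline file like this is run by pointing Logstash at it with the -f flag; the file name below is only illustrative:

bin/logstash -f accreditation.config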
Without knowing how to use a .config file to import several csv files into Elasticsearch, I fell back on the Elastic blog and tried to write a shell script, importCSVFiles, for a first .csv file before trying to generalize the approach:
#!/bin/bash
while read f1
do
curl -XPOST 'https://XXX.us-east-1.aws.found.io:9243/courses/accreditation' -H "Content-Type: application/json" -u elastic:XXX -d "{ "accreditation": "$f1" }"
done < AccreditationByHep.csv
Yet I received a mapper_parsing_exception in the terminal:
mike@mike-thinks:~/Data/on_2018_04_25_16_43_17$ ./importCSVFiles
{"error":{"root_cause":
[{"type":"mapper_parsing_exception","reason":"failed to parse"}],
"type":"mapper_parsing_exception",
"reason":"failed to parse",
"caused_by":{"type":"i_o_exception","reason":"Illegal unquoted character ((CTRL-CHAR, code 13)):
has to be escaped using backslash to be included in string valuen at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@e18584; line: 1, column: 88]"}
},"status":400
}
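For what it's worth, CTRL-CHAR code 13 is a carriage return: the csv apparently has Windows-style CRLF line endings, so each $f1 ends in a stray \r, and the inner double quotes in the -d argument are not escaped, so the shell strips them and the body is not valid JSON. A minimal corrected sketch of the same loop (the endpoint and credentials are the placeholders from above, and this still assumes each line is a single plain value):

#!/bin/bash
while IFS= read -r f1
do
    f1="${f1%$'\r'}"   # drop the trailing carriage return (CTRL-CHAR 13) left by CRLF line endings
    # escape the inner quotes so the request body is valid JSON
    curl -XPOST 'https://XXX.us-east-1.aws.found.io:9243/courses/accreditation' \
         -H "Content-Type: application/json" -u elastic:XXX \
         -d "{ \"accreditation\": \"$f1\" }"
done < AccreditationByHep.csv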
I just had a look at the data in the Higher Education Statistics Agency (HESA) zipped file, and the files are all different. This means you will either have to create an individual .config file for each import, or create a single .config file using conditionals, as described in the following article:
Reference: How to use multiple csv files in logstash (Elastic Discuss Forum)
Expanding on your first .config file by one level:
input{
  file{
    path => "/Users/mike/Data/ACCREDITATION.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
  file{
    path => "/Users/mike/Data/AccreditationByHep.csv"    # second input block for the second csv file
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter{
  # added condition for first file
  if [path] == "/Users/mike/Data/ACCREDITATION.csv" {
    csv{
      separator => ","
      columns => ['PUBUKPRN', 'UKPRN', 'KISCOURSEID', 'KISMODE', 'ACCTYPE', 'ACCDEPEND', 'ACCDEPENDURL', 'ACCDEPENDURLW']
    }
    mutate{convert => ["PUBUKPRN","integer"]}
    mutate{convert => ["UKPRN","integer"]}
    mutate{convert => ["KISMODE","integer"]}
    mutate{convert => ["ACCTYPE","integer"]}
    mutate{convert => ["ACCDEPEND","integer"]}
  }
  # added condition for second file
  else if [path] == "/Users/mike/Data/AccreditationByHep.csv" {
    csv{
      separator => ","
      columns => ['AccreditingBodyName', 'AccreditionType', 'HEP', 'KisCourseTitle', 'KiscourseID']
    }
    # omitted mutations for second file
  }
}
output{
  # added condition for first file
  if [path] == "/Users/mike/Data/ACCREDITATION.csv" {
    elasticsearch{
      hosts => "localhost"
      index => "accreditation"
      document_type => "accreditation keys"
    }
  }
  # added condition for second file
  else if [path] == "/Users/mike/Data/AccreditationByHep.csv" {
    elasticsearch{
      hosts => "localhost"
      index => "accreditationbyhep"    # index names must be lowercase
      document_type => "accreditationbyhep keys"
    }
  }
  stdout{}
}
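As a side note, the file input also accepts glob patterns, so the two file blocks above could be collapsed into a single block that watches the whole directory, while the filter and output sections still branch on [path]. A minimal sketch, assuming all the csv files sit in the same folder:

input{
  file{
    path => "/Users/mike/Data/*.csv"    # glob: picks up every csv in the folder
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}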
Note: document_type is a deprecated configuration option.
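Since Elasticsearch 6.x allows only a single mapping type per index, and 7.x removes types entirely, the safer form is to drop the option and let the index name alone distinguish the data, for example:

elasticsearch{
  hosts => "localhost"
  index => "accreditation"    # no document_type: the index name carries the distinction
}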
You should be able to expand on this example on your own.
Answered by John aka hot2use on December 19, 2020