Unix & Linux Asked on December 12, 2021
I have a simple CSV file in which every line should contain exactly two non-empty fields, as follows.
This is an example of a correct CSV file:
$ more file.csv
why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200
.
.
.
The goal is to check whether every line of the CSV file contains exactly two non-empty fields.
I started with the following awk command to check whether the file has exactly two fields:
awk 'BEGIN{FS=OFS=","} NF!=2{print "not enough fields" }' file.csv
But it does not print “not enough fields” for the example below, which is not what I want:
why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,
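Why the command above stays silent: awk's NF only counts the pieces between separators, so look_on_the_room, still splits into two fields, the second of which is the empty string. A small sketch to make that visible, assuming a POSIX sh and awk; the helper name show_fields is hypothetical and only for illustration:

```shell
#!/bin/sh
# show_fields: hypothetical helper that prints NF and every field of each
# input line, wrapping each field in [] so empty fields become visible.
show_fields() {
    awk -F',' '{
        printf "NF=%d:", NF
        for (i = 1; i <= NF; i++) printf " [%s]", $i
        printf "\n"
    }'
}

printf '%s\n' 'look_on_the_room,log_low=53687091200' 'look_on_the_room,' \
    | show_fields
# NF=2: [look_on_the_room] [log_low=53687091200]
# NF=2: [look_on_the_room] []
```

So a correct check has to test the contents of $1 and $2 as well as NF.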
Examples of other invalid CSV files:
why_we_need_help,,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200,
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200,
or
why_we_need_help log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,,
Input file:
cat op.txt
why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200
look_on_the_room,
,ajay
Awk command:
awk -F "," 'NF == 2 {print $0}' op.txt | sed "s/,/ /g" | sed -n -E '/^ | $| {2,}/!p' | awk '{gsub(" ",",",$0);print}'
output
why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200
Python
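#!/usr/bin/env python3
import re
# After commas become spaces, an empty field shows up as a leading
# space, a trailing space or two spaces in a row
u = re.compile(r'^ | $| {2,}')
with open('op.txt') as k:
    for i in k:
        i = i.rstrip('\n')
        if i.count(',') != 1:  # keep only lines with exactly two fields
            continue
        q = i.replace(',', ' ')
        if not u.search(q):
            print(i)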
output
why_we_need_help,log_low=53687091200
whats_is_going_on,log_high=1073741824
this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,log_low=53687091200
Answered by Praveen Kumar BS on December 12, 2021
Here's a small function that uses grep. Its exit status is 0 when no line is invalid and 1 if at least one line is invalid; in that case the first invalid line is printed and processing stops, so no further lines are checked.
The regular expression means: at the beginning of the line, one or more characters that aren't a comma, followed by exactly one comma, followed by one or more characters that aren't a comma, and then nothing else until the end of the line.
lines_are_valid() {
grep -E -m1 -v '^[^,]+,[^,]+$' && return 1 || return 0
}
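A sketch to check which of the sample lines from the question this expression accepts; the helper classify is hypothetical and simply runs each line through grep -E with the same pattern:

```shell
#!/bin/sh
# classify: hypothetical helper that labels every input line "ok" or "bad"
# using the same extended regex as lines_are_valid: ^[^,]+,[^,]+$
classify() {
    while IFS= read -r line; do
        if printf '%s\n' "$line" | grep -Eq '^[^,]+,[^,]+$'; then
            printf 'ok:  %s\n' "$line"
        else
            printf 'bad: %s\n' "$line"
        fi
    done
}

printf '%s\n' \
    'this_is_caryze,log_low=53687091200' \
    'look_on_the_room,' \
    ',ajay' \
    'why_we_need_help,,log_low=53687091200' \
    'why_we_need_help log_low=53687091200' \
    | classify
# ok:  this_is_caryze,log_low=53687091200
# bad: look_on_the_room,
# bad: ,ajay
# bad: why_we_need_help,,log_low=53687091200
# bad: why_we_need_help log_low=53687091200
```

Trailing, leading and doubled commas as well as a missing comma all fail the match, which is exactly the set of cases the question lists.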
How to use it:
cat myFile | lines_are_valid
More examples:
echo 'this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,' \
| lines_are_valid \
&& echo "All lines OK" \
|| echo "Invalid line found, see above"
look_on_the_room,
Invalid line found, see above
echo 'this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,aaa' \
| lines_are_valid \
&& echo "All lines OK" \
|| echo "Invalid line found, see above"
All lines OK
echo 'this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,,
also wrong,' \
| lines_are_valid \
&& echo "All lines OK" \
|| echo "Invalid line found, see above"
look_on_the_room,,
Invalid line found, see above
echo 'this_is_caryze,log_low=53687091200
let_me_know_what_to_do,log_high=1073741824
look_on_the_room,,asdfasdf
also wrong,' \
| lines_are_valid \
&& echo "All lines OK" \
|| echo "Invalid line found, see above"
look_on_the_room,,asdfasdf
Invalid line found, see above
In case you want to show all invalid lines:
show_all_invalid_lines() {
grep -E -v '^[^,]+,[^,]+$' && return 1 || return 0
}
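If line numbers would help locate the problems, grep's -n option prefixes each printed line with its line number. A minimal sketch, assuming the same pattern; the function name show_invalid_with_numbers is hypothetical:

```shell
#!/bin/sh
# show_invalid_with_numbers: hypothetical variant of show_all_invalid_lines
# that prefixes each invalid line with its line number (grep -n).
# Returns 1 if at least one invalid line was printed, 0 otherwise.
show_invalid_with_numbers() {
    grep -n -E -v '^[^,]+,[^,]+$' && return 1 || return 0
}

printf '%s\n' \
    'this_is_caryze,log_low=53687091200' \
    'look_on_the_room,,' \
    'let_me_know_what_to_do,log_high=1073741824' \
    'also wrong,' \
    | show_invalid_with_numbers || echo "Invalid lines found, see above"
# 2:look_on_the_room,,
# 4:also wrong,
# Invalid lines found, see above
```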
Answered by Elifarley on December 12, 2021
I'm not sure but I THINK what you're looking for is:
awk -F',' 'NF!=2 || /^,|,$/{print "bad:", NR | "cat>&2"; exit 1}' file
which could be improved to report the specific error(s) on the line:
awk -F',' '
NF<2 { err="too few fields" }
NF>2 { err="too many fields" }
/^,|,$/ { err=(err == "" ? "" : err " and ") "empty fields" }
err != "" { print err, "at line", NR | "cat>&2"; exit 1 }
' file
or, if you want all errors on all lines reported at once:
awk -F',' '
NF<2 { err="too few fields" }
NF>2 { err="too many fields" }
/^,|,$/ { err=(err == "" ? "" : err " and ") "empty fields" }
err != "" { print err, "at line", NR | "cat>&2"; err=""; f=1 }
END { exit f }
' file
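A usage sketch for the all-errors version, wrapped in a hypothetical function all_csv_errors that reads stdin instead of file, fed with the first of the invalid examples from the question. Error messages go to stderr and the exit status is 1 when any error was found:

```shell
#!/bin/sh
# all_csv_errors: hypothetical wrapper around the awk program above that
# reads stdin and reports every bad line on stderr, exiting 1 if any was found.
all_csv_errors() {
    awk -F',' '
        NF<2      { err="too few fields" }
        NF>2      { err="too many fields" }
        /^,|,$/   { err=(err == "" ? "" : err " and ") "empty fields" }
        err != "" { print err, "at line", NR | "cat>&2"; err=""; f=1 }
        END       { exit f }
    '
}

printf '%s\n' \
    'why_we_need_help,,log_low=53687091200' \
    'whats_is_going_on,log_high=1073741824' \
    'this_is_caryze,log_low=53687091200,' \
    'let_me_know_what_to_do,log_high=1073741824' \
    'look_on_the_room,log_low=53687091200,' \
    | all_csv_errors || echo "exit status: $?"
# stderr: too many fields at line 1
# stderr: too many fields and empty fields at line 3
# stderr: too many fields and empty fields at line 5
# stdout: exit status: 1
```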
Answered by Ed Morton on December 12, 2021
Another awk option is
awk 'BEGIN{FS=OFS=","}NF!=2||$1==""||$2==""{print "Not enough fields";exit 5}' file.csv
It explicitly checks whether either of the two fields is empty. If so, it prints the message and immediately exits with status 5 (this number is arbitrary; choose whichever you like).
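A sketch of how a calling script might pick up that exit status, wrapping the one-liner in a hypothetical function check_two_fields that reads stdin:

```shell
#!/bin/sh
# check_two_fields: hypothetical wrapper around the one-liner above;
# it reads stdin and exits with status 5 at the first bad line.
check_two_fields() {
    awk 'BEGIN{FS=OFS=","}NF!=2||$1==""||$2==""{print "Not enough fields";exit 5}'
}

status=0
printf '%s\n' 'this_is_caryze,log_low=53687091200' 'look_on_the_room,' \
    | check_two_fields || status=$?
echo "exit status: $status"
# Not enough fields
# exit status: 5
```

Capturing the status with || status=$? keeps the script running even under set -e, so it can decide what to do with the bad file.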
Answered by Quasímodo on December 12, 2021
Try this:
awk 'BEGIN{FS=OFS=","} f{next} NF!=2||!length($1)||!length($2){f=1} END{if (f) {print "File contains malformed lines"; exit 1}}' file.csv
It sets a flag f whenever a line doesn't contain exactly two ,-separated fields or either of the two required fields is empty. At the end, it prints a message if the flag was set while parsing the file, and exits with error code 1 (as per your request).
The first rule skips the checks for all remaining lines once the flag has been set, to speed up the process: you only want to know whether there is at least one malformed line, so once such a line has been found we know that the file is malformed and don't need to check the rest of it.
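A variant, assuming you'd rather stop reading the file at the first malformed line instead of skipping the remaining checks: awk's exit statement in a main rule ends input processing and then still runs the END rule, so the message and exit status can stay in END. The function name has_malformed_line is hypothetical:

```shell
#!/bin/sh
# has_malformed_line: hypothetical variant of the command above that uses
# exit, so awk stops reading at the first malformed line.
# exit in a main rule still runs the END rule, which sets the final status.
has_malformed_line() {
    awk 'BEGIN{FS=","}
         NF!=2 || !length($1) || !length($2) { f=1; exit }
         END { if (f) { print "File contains malformed lines"; exit 1 } }'
}

status=0
printf '%s\n' 'this_is_caryze,log_low=53687091200' ',ajay' 'look_on_the_room,log_low=53687091200' \
    | has_malformed_line || status=$?
echo "exit status: $status"
# File contains malformed lines
# exit status: 1
```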
In case you want to know how many lines were malformed, this small change would print it:
awk 'BEGIN{FS=OFS=","} NF!=2||!length($1)||!length($2){f++} END{if (f) {printf("File contains %d malformed line(s)\n",f); exit 1}}' file.csv
Answered by AdminBee on December 12, 2021