Unix & Linux Asked on November 23, 2021
Sometimes there are a few really annoying lines in otherwise tabular data like
column name | other column name
-------------------------------
I generally prefer to remove garbage lines that shouldn't be there by grep -ving a reasonably unique string, but the problem with that approach is that if the reasonably unique string also appears in the real data by accident, those lines are silently lost, which is a serious problem.
Is there a way to limit the number of lines that grep -v can remove (say, to 1)? For bonus points, is there a way to count the number of lines from the end without resorting to <some command> | tac | grep -v <some stuff> | tac?
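To make the risk concrete (the file contents and the pattern here are made up for illustration): a plain grep -v deletes every line containing the string, including a legitimate data row that happens to mention it:

$ cat data.txt
column name | other column name
-------------------------------
alpha | beta
gamma | see ------------------------------- for details
$ grep -v -- '-----' data.txt
column name | other column name
alpha | beta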
Another possible solution is to use bash's own facilities:
count=1
found=0
while IFS= read -r line
do
    if [[ $line == *"myPattern"* ]]
    then
        if [ "$found" -eq "$count" ]
        then
            printf '%s\n' "$line"
        else
            found=$((found+1))
        fi
    else
        printf '%s\n' "$line"
    fi
done < execute-commons-fileupload.sh
By setting count, you can choose how many occurrences of your pattern should be removed.
For me personally, this seems easier to extend, since you can easily add other conditions to the if statement (but that may just be down to my marginal knowledge of sed); a sketch of such an extension follows.
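For example, here is a minimal sketch of such an extension (the 10-line window and the lineno counter are hypothetical additions, not part of the original script): only matches within the first 10 lines of input are treated as garbage, so the same pattern deeper in the data is left alone:

count=1
found=0
lineno=0
while IFS= read -r line
do
    lineno=$((lineno+1))
    # extra condition: only matches within the first 10 lines are removed
    if [[ $line == *"myPattern"* ]] && [ "$lineno" -le 10 ] && [ "$found" -lt "$count" ]
    then
        found=$((found+1))
    else
        printf '%s\n' "$line"
    fi
done < execute-commons-fileupload.sh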
Answered by David Georg Reichelt on November 23, 2021
To do this you might have to use awk.
The simplest way I know is this:
awk '{ $1=""; print }' file
You can skip multiple columns too:
awk '{ $1=$2=$3=""; print }' file
If you want to skip the last column and you're not sure how many columns you will have:
awk '{ $NF=""; print }' file
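A quick illustration with made-up input; note that blanking a field leaves its neighbouring separator behind, so a leading (or, with $NF, trailing) space remains:

$ printf 'a b c\nd e f\n' | awk '{ $1=""; print }'
 b c
 e f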
Tested on Ubuntu 16.04 (GNU bash, version 4.3.48)
Best.
Answered by Peycho Dimitrov on November 23, 2021
You could use awk to ignore the first n lines that match (e.g. assuming you want to remove only the 1st and 2nd match from the file):
n=2
awk -v c=$n '/PATTERN/ && i++ < c {next};1' infile
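For instance, with a made-up file where the pattern is foo, the first two matches are dropped and the third survives:

$ printf 'x\nfoo\ny\nfoo\nfoo\n' | awk -v c=2 '/foo/ && i++ < c {next};1'
x
y
foo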
To ignore the last n lines that match:
awk -v c=${lasttoprint} '!(/PATTERN/ && NR > c)' infile
where ${lasttoprint} is the line number of the (n+1)th-to-last match in your file. There are various ways to get that line number (e.g. print only the line number for each match via tools like sed/awk, then tail | head to extract it)... here's one way with GNU awk:
n=2
lasttoprint=$(gawk -v c=$((n+1)) '/PATTERN/{x[NR]};
END{asorti(x, z, "@ind_num_desc"); print z[c]}' infile)
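Putting both steps together on a made-up infile, with PATTERN set to foo and n=2, the last two matches are removed:

$ cat infile
foo 1
bar
foo 2
foo 3
$ n=2
$ lasttoprint=$(gawk -v c=$((n+1)) '/foo/{x[NR]};
END{asorti(x, z, "@ind_num_desc"); print z[c]}' infile)
$ awk -v c="$lasttoprint" '!(/foo/ && NR > c)' infile
foo 1
bar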
Answered by don_crissti on November 23, 2021
Perhaps reduce the chances of filtering out your data by using a more accurate grep command. For example:
grep -v -F -x 'str1'
For lines that are exactly str1. Or maybe:
grep -v '^str1.*str2$'
For lines that start with 'str1' and end with 'str2'.
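Applied to the question's example (assuming the header and separator rows look exactly like the sample), anchored patterns pin the junk lines down so that no data row can match by accident unless it is identical to the junk:

grep -v -F -x 'column name | other column name'   # drop the exact header row
grep -v -E '^-+$'                                 # drop lines consisting only of dashes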
Answered by ifb on November 23, 2021
sed provides a simpler way:
... | sed '/some stuff/ {N; s/^.*\n//; :p; N; $q; bp}' | ...
This way you delete the first occurrence.
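As an aside, if GNU sed is available, its 0,/regexp/ address form gives a shorter way to achieve the same first-occurrence-only deletion (a different technique from the hold-space script below, shown here only as an alternative):

... | sed '0,/some stuff/{/some stuff/d}' | ...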
If you want to remove more occurrences:
sed '1 {h; s/.*/iiii/; x}; /some stuff/ {x; s/^i//; x; td; b; :d; d}'
where the number of i's is the number of occurrences to remove (one or more, not zero). Here is the same script with comments:
sed '1 {
    # Save the first line in the hold space, fill the pattern space with one `i`
    # per occurrence to remove, then swap the two back
    h
    s/^.*$/iiii/
    x
}
# For every line matching the regexp we are looking for
/some stuff/ {
    # Remove one `i` from the counter kept in the hold space
    x
    s/i//
    x
    # If the substitution succeeded, there was an `i` left: jump to `:d` and delete the line
    td
    # If not, end this cycle normally so the line is printed
    b
    :d
    d
}'
This variant will probably run faster, since once the counter is exhausted it slurps all the remaining lines and prints them in one go:
sed '1 {h; s/.*/ii/; x}; /a/ {x; s/i//; x; td; :print_all; N; $q; bprint_all; :d; d}'
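A quick check of the counting behaviour on made-up input (the pattern here is simply a, and the two i's mean two occurrences get removed; the last-line handling relies on GNU sed):

$ printf 'a1\nb\na2\na3\n' | sed '1 {h; s/.*/ii/; x}; /a/ {x; s/i//; x; td; :print_all; N; $q; bprint_all; :d; d}'
b
a3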
You can put this code into your .bashrc (or your shell's config file, if you use a different shell):
dtrash() {
    if [ $# -eq 0 ]
    then
        cat
    elif [ $# -eq 1 ]
    then
        sed "/$1/ {N; s/^.*\n//; :p; N; \$q; bp}"
    else
        count=""
        for i in $(seq "$1")
        do
            count="${count}i"
        done
        sed "1 {h; s/.*/$count/; x}; /$2/ {x; s/i//; x; td; :print_all; N; \$q; bprint_all; :d; d}"
    fi
}
And use it this way:
# Remove first occurrence
cat file | dtrash 'stuff'
# Remove four occurrences
cat file | dtrash 4 'stuff'
# Don't modify
cat file | dtrash
Answered by ValeriyKr on November 23, 2021