Unix & Linux Asked on December 6, 2021
I have a file containing strings of the form Global=x, where x is a number, scattered between lines of text. I want to count how many distinct values of x, extracted from "Global=x", occur more than once. I don't want the number of occurrences of each x printed.
For example, if the input file is like
Global=33333
Global=33333
Global=33334
Global=33335
Global=33336
Global=33337
Global=33337
Global=33337
the output should be 2, as two numbers ‘33333’ and ‘33337’ are repeated (it does not matter how many times they are repeated).
I tried
grep -Po '(Global)=\K\d+' file.dat | sort | uniq -c
but I get the frequency of occurrence of each number, which I don’t need:
2 33333
1 33334
1 33335
1 33336
3 33337
Any help will be appreciated; grep, awk and sed solutions are acceptable.
Using any awk in any shell on every UNIX box:
$ awk -F'=' '++cnt[$2] == 2{ dups++ } END{print dups+0}' file
2
If you do need to check for Global, then:
$ awk -F'=' '($1 == "Global") && (++cnt[$2] == 2){ dups++ } END{print dups+0}' file
2
The +0 in the END is to ensure you get numeric output (0 instead of a null string) even if there are no dups in the input.
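The effect of the +0 can be seen on input that has no repeats. This is only a minimal sketch: the file name nodups.dat and its three values are made up for illustration.

```shell
# Hypothetical sample with no repeated values
printf 'Global=1\nGlobal=2\nGlobal=3\n' > nodups.dat

# Without +0: dups is never set, so print emits an empty line
without=$(awk -F'=' '++cnt[$2] == 2{ dups++ } END{print dups}' nodups.dat)

# With +0: the unset variable is coerced to the number 0
with=$(awk -F'=' '++cnt[$2] == 2{ dups++ } END{print dups+0}' nodups.dat)

printf 'without +0: [%s]\nwith +0: [%s]\n' "$without" "$with"
```

The first pair of brackets comes out empty and the second contains 0, which matters if the result is fed to something that expects a number.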
Answered by Ed Morton on December 6, 2021
You could change uniq -c to uniq -d:
$ grep -Po '(Global)=\K\d+' file.dat | sort | uniq -d
33333
33337
-d prints only one copy of each duplicated line. A further pipe to wc -l could count those lines. Also note that both the -P and -o options to grep are non-standard, so they will not be available in every version of grep.
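Putting the wc -l suggestion together with a portable extraction step might look like the sketch below. It swaps grep -Po for POSIX sed, which is a substitution and not part of the answer above; the sample file repeats the question's data plus one made-up line of other text.

```shell
# Sample data from the question, plus one unrelated line (made up for illustration)
cat > file.dat <<'EOF'
some other text
Global=33333
Global=33333
Global=33334
Global=33335
Global=33336
Global=33337
Global=33337
Global=33337
EOF

# sed -n ... p prints only lines where the substitution matched,
# leaving just the digits after "Global="; uniq -d keeps one copy
# of each repeated value and wc -l counts those copies
count=$(sed -n 's/.*Global=\([0-9][0-9]*\).*/\1/p' file.dat | sort | uniq -d | wc -l)

# Some wc implementations pad the number with spaces;
# arithmetic expansion strips the padding
echo "$((count))"
```

With the sample above this prints 2, the answer the question asks for.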
Answered by guest on December 6, 2021
To get a list of numbers that are repeated and eliminate all extra processes:
$ awk -F= '$1=="Global"{c[$2]++} END{for (num in c) if(c[num]>1)print num}' file.dat
33333
33337
The above code uses = as a field separator. If the first field is Global, then we keep track in associative array c of the number of times that the second field, $2, has appeared in the file.
After the file has been read completely, we look through array c and print all numbers which had a count larger than 1.
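If only the total asked for in the question is wanted, the same END loop could count the repeated numbers instead of printing them. This is a sketch of one possible adaptation; the counter name n is an assumption introduced here, and the sample file repeats the question's data.

```shell
# Sample data from the question
cat > file.dat <<'EOF'
Global=33333
Global=33333
Global=33334
Global=33335
Global=33336
Global=33337
Global=33337
Global=33337
EOF

# Same per-number counting as above; in END, n is incremented once for
# every number seen more than once, and n+0 prints 0 when there are none
n_dups=$(awk -F= '$1=="Global"{c[$2]++} END{for (num in c) if (c[num]>1) n++; print n+0}' file.dat)

echo "$n_dups"
```

For the sample data this prints 2.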
As proposed by glenn jackman in the comments, we could simply print the number on its second appearance:
$ awk -F= '++c[$2] == 2 {print $2}' file.dat
33333
33337
Answered by John1024 on December 6, 2021