Find Total Number of Repetitions of numbers in a file

Question

I have a file in which lines of the form Global=x, where x is a number, appear in between lines of text. I want to count how many of the numbers 'x' extracted from the string "Global=x" are repeated, that is, appear more than once. I don't want the number of occurrences of each 'x' printed.
For example, if the input file is like
Global=33333
Global=33333
Global=33334
Global=33335
Global=33336
Global=33337
Global=33337
Global=33337

the output should be 2, as two numbers '33333' and '33337' are repeated (it does not matter how many times they are repeated).
I tried
grep -Po '(Global)=\K\d+' file.dat | sort | uniq -c

but I get the frequency of occurrence of each number, which I don't need:
2 33333
1 33334
1 33335
1 33336
3 33337

Any help will be appreciated; grep, awk and sed solutions are acceptable.

Ed Morton · Answer

Using any awk in any shell on every UNIX box:
$ awk -F'=' '++cnt[$2] == 2{ dups++ } END{print dups+0}' file
2

If you do need to check for Global then:
$ awk -F'=' '($1 == "Global") && (++cnt[$2] == 2){ dups++ } END{print dups+0}' file
2

The +0 in the END is to ensure you get numeric output (0 instead of a null string) even if there are no dups in the input.
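To see the effect of the +0, here is a small sketch that runs the same awk script with and without it on input that has no repeated numbers. The sample input and the variable names with_zero and without_zero are made up for illustration.

```shell
# Hypothetical sample input with no repeated numbers.
input='Global=1
Global=2'

# With +0: the never-assigned variable dups is coerced to the number 0.
with_zero=$(printf '%s\n' "$input" |
    awk -F'=' '++cnt[$2] == 2{ dups++ } END{print dups+0}')

# Without +0: dups is uninitialized, so awk prints an empty line.
without_zero=$(printf '%s\n' "$input" |
    awk -F'=' '++cnt[$2] == 2{ dups++ } END{print dups}')

echo "with +0:    [$with_zero]"
echo "without +0: [$without_zero]"
```

The first echo shows [0] and the second shows [], which matters if the result is later used in a numeric comparison in the shell.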

guest · Answer

You could change uniq -c to uniq -d:
$ grep -Po '(Global)=\K\d+' file.dat | sort | uniq -d
33333
33337

-d prints only one copy of each duplicated line. A further pipe to wc -l could count those lines. Also note that both the -P and -o options to grep are non-standard, so they will not be available in every version of grep.
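As a rough sketch of the full pipeline with the wc -l step added, the version below swaps the grep -Po extraction for a plain sed substitution, on the assumption that only standard options are available. The sample input and the variable name count are made up for illustration.

```shell
# Hypothetical sample input, including a line that is not a Global= line.
input='Global=33333
some other text
Global=33333
Global=33334
Global=33337
Global=33337
Global=33337'

# sed -n ... p prints only lines where the substitution succeeded, leaving
# just the number; uniq -d keeps one copy of each repeated line; wc -l
# counts those copies. Some wc implementations pad the number with spaces,
# so tr strips them.
count=$(printf '%s\n' "$input" |
    sed -n 's/.*Global=\([0-9][0-9]*\).*/\1/p' |
    sort | uniq -d | wc -l | tr -d ' ')

echo "$count"
```

For this sample the pipeline prints 2, since only 33333 and 33337 appear more than once.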

John1024 · Answer

To get a list of numbers that are repeated and eliminate all extra processes:
$ awk -F= '$1=="Global"{c[$2]++} END{for (num in c) if(c[num]>1)print num}' file.dat
33333
33337

The above code uses = as a field separator.  If the first field is Global, then we keep track in associative array c of the number of times that the second field, $2, has appeared in the file.
After the file has been read completely, we look through array c and print all numbers which had a count larger than 1.
Shorter version
As proposed by glenn jackman in the comments, we could simply print the number on its second appearance:
$ awk -F= '++c[$2] == 2 {print $2}' file.dat
33333
33337
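One caveat with the shorter version: it has no check on the first field, so if the file also contains lines without an = sign (which the question's "in between lines of text" may imply), all of those lines share the empty key and the second one prints a blank line. A sketch of a guarded variant, assuming the Global test from the longer version is wanted; the sample input and variable names are made up for illustration.

```shell
# Hypothetical sample input with ordinary text lines mixed in.
input='some heading text
Global=33333
more text
Global=33333
Global=33334'

# Unguarded: lines without "=" all have $2 == "", so the second such line
# prints an empty line in addition to the real duplicate.
unguarded_lines=$(printf '%s\n' "$input" |
    awk -F= '++c[$2] == 2 {print $2}' | wc -l | tr -d ' ')

# Guarded: && short-circuits, so only lines whose first field is exactly
# Global are counted at all.
guarded=$(printf '%s\n' "$input" |
    awk -F= '$1=="Global" && ++c[$2] == 2 {print $2}')

echo "unguarded printed $unguarded_lines lines"
echo "guarded printed: $guarded"
```

Here the unguarded command prints 2 lines (one of them blank), while the guarded one prints only 33333.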
