How to grep one file against another without exhausting memory?

Question

I have two large text files, about 30 MB each, and I would like to grep one against the other, as in grep -f "file01.txt" "file02.txt" > file03.txt.
Doing so fails with a "memory exhausted" error.
How can these files be compared without regard to alphabetical order?

John1024 · Answer

Unless your file01.txt contains actual regular expressions, try:
grep -Ff "file01.txt" "file02.txt" > file03.txt

-F tells grep to treat each line of file01.txt as a fixed string rather than a regular expression. This both greatly increases speed and greatly reduces memory use.
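To see what -F changes, here is a small sketch on hypothetical sample data (the file contents below are made up for illustration; the real files would be about 30 MB each). It shows that with -F a pattern such as a.c is matched literally, while without -F the dot is a regex wildcard:

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory (remove "$dir" when done).
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' 'a.c' 'foo' > file01.txt                 # patterns, one per line
printf '%s\n' 'abc' 'a.c here' 'food' 'bar' > file02.txt

# -F: "a.c" is a literal string, so it does NOT match "abc"
grep -Ff "file01.txt" "file02.txt" > file03.txt
cat file03.txt        # a.c here, food

# Without -F: "a.c" is a regex, so "abc" matches as well
grep -f "file01.txt" "file02.txt"                       # abc, a.c here, food
```

Both -F and -f are standard POSIX grep options, so this part does not depend on GNU grep.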
Regular Expressions
Alternatively, if your file01.txt really does contain regular expressions, you can split it into parts and apply grep to each part separately:
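split -d -n l/10 "file01.txt" ./tmp-file01.
for f in ./tmp-file01.*; do grep -f "$f" "file02.txt"; done >file03.txt

The above splits file01.txt into 10 parts; the l/ in -n l/10 keeps every line (pattern) whole, whereas a plain -n 10 cuts at byte offsets and can split a pattern in two. Depending on your available memory, you may need more than 10 parts. Note that a line of file02.txt matching patterns in several parts is printed once per part.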
If file01.txt does not contain regular expressions, add -F to the grep in the loop:
for f in ./tmp-file01.*; do grep -Ff "$f" "file02.txt"; done >file03.txt
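The split-and-loop approach can be tried on a small hypothetical data set first. This sketch assumes GNU coreutils split (the -d and -n l/N options are GNU extensions, not available in BSD/macOS split); the patterns and the deduplication step are illustrative, not part of the original answer:

```shell
#!/bin/sh
# Assumes GNU coreutils split. Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' '^ab' 'c$' 'x[0-9]' 'z.z' 'yy*' 'q' > file01.txt   # regex patterns
printf '%s\n' 'abc' 'x1' 'zaz' 'nothing' 'yyy' > file02.txt

# l/3: three parts (tmp-file01.00 .. .02), no pattern line cut across parts
split -d -n l/3 "file01.txt" ./tmp-file01.

# One grep per part, so only about a third of the patterns is in memory at once
for f in ./tmp-file01.*; do grep -f "$f" "file02.txt"; done > file03.txt

# "abc" matches both ^ab and c$, which may sit in different parts, so it can
# appear twice. Collapsing repeats like this is only safe if file02.txt has
# no duplicate lines of its own (an assumption about your data).
awk '!seen[$0]++' file03.txt > file03.dedup.txt
cat file03.dedup.txt
```

The output is grouped by part rather than in file02.txt's order; pipe through sort if a stable order matters.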

Yfa Kolh · Answer

You can't: all the patterns must be loaded into grep, and that is what exhausts memory.
But if you want to compare files, why don't you simply use diff (after sorting the contents)?
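A minimal sketch of the "sort, then diff" suggestion, on hypothetical sample data. It also shows comm -12, a standard tool the answer does not mention, which prints only the lines common to both sorted files; unlike grep it compares whole lines exactly, not substrings. Both sort runs use LC_ALL=C so that comm sees the same collation order:

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf '%s\n' 'pear' 'apple' 'fig' > file01.txt
printf '%s\n' 'fig' 'kiwi' 'apple' > file02.txt

# Sort both files with the same collation (C locale) so order is consistent
LC_ALL=C sort file01.txt > file01.sorted
LC_ALL=C sort file02.txt > file02.sorted

# diff shows "< pear" (only in file01) and "> kiwi" (only in file02);
# it exits with status 1 when the files differ, hence "|| true"
diff file01.sorted file02.sorted || true

# comm -12 suppresses columns 1 and 2, leaving lines present in both files
LC_ALL=C comm -12 file01.sorted file02.sorted > file03.txt
cat file03.txt        # apple, fig
```

GNU sort falls back to temporary files for input that does not fit in its buffer, so sorting two 30 MB files should not hit the memory limit that grep -f does.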
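If there is one fixed pattern per line (such as a list of MD5 hashes):
# read -r keeps backslashes literal; -F matches the hash literally; -- guards against a leading "-"
while IFS= read -r md5; do
    grep -Fw -- "$md5" file02.txt
done < file01.txt > file03.txt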

This is, of course, much slower, especially when file02.txt is large (it is read once per pattern, which hurts when it doesn't fit in the cache), but it works for a pattern file01.txt of any size.
