TransWikia.com

Identify strings shared between multiple files from the Linux command line

Reverse Engineering: asked by recvfrom on March 24, 2021

Given a set of arbitrary files, what’s the best way to identify the text strings shared between them (either in all files or a subset of them) from the Linux command line?

This would be useful for quickly identifying ways to write Yara rules for clusters of similar malicious files (for instance, malicious executables).

One Answer

Here's one approach, for malicious files in a directory named malware:

find malware/ -type f -exec sh -c 'strings "$1" | sort -u' sh {} \; | sort | uniq -c | sort -n

The output will look something like the following, where the first number on each line is the number of files containing the string:

      ...
      1 Sleep
      ...
      2 JFIF
      2 SetBkColor
      ...
      5 !This program cannot be run in DOS mode.
      5 t@PW
      5 @tVH
      ...
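To keep only the strings found in every file, or in at least some minimum number of files, the per-string counts can be filtered with awk. The following is a minimal sketch, assuming a POSIX shell with binutils strings and that no file name contains a newline. The function name shared_strings and its optional second argument are hypothetical.

```shell
# shared_strings DIR [MIN_FILES]   (hypothetical helper)
# Prints "COUNT STRING" for each printable string that occurs in at least
# MIN_FILES distinct regular files under DIR, sorted by COUNT.
# MIN_FILES defaults to the number of files, i.e. strings shared by all.
# Assumes no file name under DIR contains a newline (used for the file count).
shared_strings() (
    # The ( ) body runs in a subshell, so these settings do not leak out.
    LC_ALL=C          # byte-wise, locale-independent sort/uniq
    export LC_ALL
    dir=$1
    total=$(find "$dir" -type f | wc -l)
    min=${2:-$total}
    # Deduplicate each file's strings so a file is counted at most once per
    # string, then count in how many files each string occurs.
    find "$dir" -type f -exec sh -c '
        for f in "$@"; do
            strings "$f" | sort -u
        done
    ' sh {} + |
        sort | uniq -c | sort -n |
        awk -v min="$min" '$1 + 0 >= min + 0'
)
```

For example, shared_strings malware/ lists only the strings present in every sample, while shared_strings malware/ 3 relaxes that to strings found in three or more files.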

A useful variation when the input files are Windows executables is to run strings -el instead of strings. This makes strings print UTF-16 little-endian strings (also known as wide-character strings) instead of the default 7-bit ASCII ones.
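Because -el replaces the default encoding rather than adding to it, counting ASCII and wide strings together means running strings twice per file and merging the results before deduplicating. A sketch under the same assumptions as the answer's one-liner; the function name shared_strings_wide is hypothetical.

```shell
# shared_strings_wide DIR   (hypothetical helper)
# Like the answer's one-liner, but each file contributes both its default
# 7-bit strings and its UTF-16LE ("wide") strings (strings -e l).
shared_strings_wide() (
    # Subshell body: the locale setting below does not leak to the caller.
    LC_ALL=C
    export LC_ALL
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            # Emit both encodings, then dedupe so a file counts once per string.
            { strings "$f"; strings -el "$f"; } | sort -u
        done
    ' sh {} + | sort | uniq -c | sort -n
)
```

Note that a string appearing as ASCII in one file and as UTF-16 in another is counted as shared here, since both are printed as the same text; run the two encodings separately if that distinction matters for the rule being written.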

Answered by recvfrom on March 24, 2021
