Reverse Engineering Asked by abathur on July 31, 2021
I’m comparing a few approaches to testing whether a binary imports a given symbol, and I noticed that a YARA rule failed to find one in sudo that nm + grep could find.
I looked at the binary in xxd to figure out why, but couldn’t find a match. This explains why the YARA rule misses, but leaves me with a new question: how do tools like nm or objdump discover the symbol?
I checked the other GLIBC symbols that nm reports to see how common this is, and found 5 symbols that didn’t match in the output of xxd: execve, exit, getpgrp, sleep, textdomain. (I haven’t yet manually verified whether any of the others only fail to match because they’re split over a line break, but for this search I did run xxd at a width of 256 cols to minimize the likelihood.)
I’m running something like:
nm --undefined $(type -p sudo)
xxd -c 40 $(type -p sudo)
Since this outputs a few thousand lines and there may be platform differences in the binary/commands, I went ahead and made a GH repo for reference.
Recent updates to the Mach-O format (the LC_DYLD_INFO_ONLY load command) allow the export symbol information to be encoded as a trie structure. In that case it's possible that the symbol name does not appear as an exact string in the file.
However, ELF does not use such an encoding, so normally every symbol name must be present as-is in the binary. What seems to happen in your case is that the "missing" symbols are suffixes of other symbols with longer names, e.g.:
getpgrp is a suffix of tcgetpgrp
execve is a suffix of fexecve
exit is a suffix of _exit
textdomain is a suffix of bindtextdomain
There is no requirement that each symbol be present as a separate string in the string table. The symbol record encodes an offset to the start of its name in the string table, and the dynamic linker simply uses the bytes from there up to the next zero byte for matching. By reusing suffixes of other strings, the string table can be made smaller (it is often a huge contributor to the ELF file's size).
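To make the suffix reuse concrete, here is a minimal Python sketch. It is not how GNU ld implements the optimization; build_strtab and name_at are hypothetical helpers written for this illustration, and the table is synthetic rather than taken from sudo.

```python
def build_strtab(names):
    """Build a NUL-separated string table, reusing suffixes where possible.

    Illustrative only: a real linker uses a different algorithm.
    """
    table = b"\0"  # ELF string tables start with a NUL byte
    offsets = {}
    # Place longer names first so shorter ones can land inside them.
    for name in sorted(set(names), key=len, reverse=True):
        needle = name.encode() + b"\0"
        pos = table.find(needle)  # a hit here is always a suffix of an existing entry
        if pos == -1:
            pos = len(table)
            table += needle
        offsets[name] = pos
    return table, offsets


def name_at(table, offset):
    """Read a name the way a symbol lookup does: bytes up to the next NUL."""
    end = table.index(b"\0", offset)
    return table[offset:end].decode()


names = ["bindtextdomain", "textdomain", "tcgetpgrp", "getpgrp",
         "fexecve", "execve", "_exit", "exit"]
table, offsets = build_strtab(names)

for name in names:
    print(f"{name:15} offset {offsets[name]:2} -> {name_at(table, offsets[name])}")

# The shorter names never appear as standalone NUL-delimited strings.
print(b"\0textdomain\0" in table)
```

Every name still resolves correctly from its offset, but a search that expects textdomain with a NUL byte on both sides finds nothing, because the byte before it is the "d" of bindtextdomain.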
For example, here's the symbol entry for textdomain:
LOAD:0000000000000D38 Elf64_Sym <offset aBindtextdomain+4 - offset unk_1DD0, 12h, 0, 0, offset dword_0, 0> ; "textdomain"
or
LOAD:0000000000000D38 dd offset aBindtextdomain+4 - offset unk_1DD0; st_name ; "textdomain"
LOAD:0000000000000D38 db 12h ; st_info
LOAD:0000000000000D38 db 0 ; st_other
LOAD:0000000000000D38 dw 0 ; st_shndx
LOAD:0000000000000D38 dq offset dword_0 ; st_value
LOAD:0000000000000D38 dq 0 ; st_size
As you can see, st_name points 4 bytes into the string for bindtextdomain. This is perfectly legal and is a common optimization in linkers.
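As a rough sketch of how a tool like nm gets from the symbol record to the name, the following Python decodes one little-endian Elf64_Sym entry and resolves st_name against a string table. The record and the string table are synthetic stand-ins built to mirror the listing above, not bytes read from sudo, and parse_sym is a hypothetical helper.

```python
import struct

# Elf64_Sym layout: st_name (u32), st_info (u8), st_other (u8),
# st_shndx (u16), st_value (u64), st_size (u64) = 24 bytes.
# Little-endian is assumed here.
ELF64_SYM = struct.Struct("<IBBHQQ")


def parse_sym(raw, strtab):
    """Decode one Elf64_Sym record and resolve its name from strtab."""
    st_name, st_info, st_other, st_shndx, st_value, st_size = ELF64_SYM.unpack(raw)
    end = strtab.index(b"\0", st_name)  # the name runs up to the next NUL byte
    return {
        "name": strtab[st_name:end].decode(),
        "bind": st_info >> 4,   # 1 = STB_GLOBAL
        "type": st_info & 0xF,  # 2 = STT_FUNC
        "shndx": st_shndx,      # 0 = SHN_UNDEF, i.e. an imported symbol
        "value": st_value,
        "size": st_size,
    }


# Synthetic string table containing only the longer name.
strtab = b"\0bindtextdomain\0"

# Synthetic record: st_name points 4 bytes into "bindtextdomain", st_info = 12h.
raw = ELF64_SYM.pack(strtab.index(b"bindtextdomain") + 4, 0x12, 0, 0, 0, 0)

sym = parse_sym(raw, strtab)
print(sym)
```

The decoded name is textdomain even though the string table holds only bindtextdomain, which is why offset-based tools report the symbol while a search for the standalone string can miss it.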
Discussion with the patch which added the feature to GNU ld.
Correct answer by Igor Skochinsky on July 31, 2021