How do disassemblers work?

Reverse Engineering Asked by user33834 on September 30, 2021

I have two questions about this, but feel free to elaborate further; I’m really interested in this topic:

  1. Do they really just read byte by byte until they get a valid instruction? How do they know whether it’s a valid instruction, and which one it is? I don’t imagine they just have every single instruction stored in a table, as that would be very inefficient.

  2. I found this source online that I think does this in around 700 lines (https://github.com/btbd/disassembler/blob/master/disassembler.c). If something like this is possible, why do other disassemblers have so much more code and logic?

Thanks!

One Answer

The two main approaches to disassembly are:

  1. Linear sweep - start at the beginning of a section that is typically reserved for machine code (e.g. the .text section of an ELF binary) and decode every byte in it as instructions, one after another
  2. Recursive traversal - start at a known entry point and follow the control flow of the program being disassembled (jumps, branches, calls), decoding only the bytes that can actually be reached as instructions

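Each has its advantages and disadvantages. Linear sweep is simple and covers the whole section, but it misdecodes any data embedded in the code and, on variable-length instruction sets, can lose synchronization with the real instruction boundaries afterwards. Recursive traversal avoids decoding such data, but it can miss code that is reachable only through indirect jumps or calls whose targets cannot be determined statically. More information can be found in Disassembly of Executable Code Revisited.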
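To make the difference concrete, here is a minimal Python sketch of both approaches. It does not target any real architecture: the five-opcode instruction set, the OPCODES table, and the function names are all made up for illustration. The decoder is table-driven (each opcode maps to a mnemonic, a length, and a control-flow kind), which is also roughly how small real disassemblers identify instructions, although real tables and operand rules are far larger.

```python
# Hypothetical toy ISA (an assumption for illustration, not a real architecture):
# opcode -> (mnemonic, total length in bytes, control-flow kind)
OPCODES = {
    0x00: ("nop", 1, "fall"),     # falls through to the next instruction
    0x01: ("push", 2, "fall"),    # push imm8
    0x02: ("jmp", 2, "jump"),     # unconditional jump, rel8 from next instruction
    0x03: ("jz", 2, "branch"),    # conditional jump, rel8 from next instruction
    0xFF: ("ret", 1, "stop"),     # ends the current path
}


def decode(code, offset):
    """Decode one instruction at offset; return (text, length, kind, target)."""
    entry = OPCODES.get(code[offset])
    if entry is None or offset + entry[1] > len(code):
        # Unknown opcode or truncated operand: emit the raw byte.
        return (f"db 0x{code[offset]:02x}", 1, "invalid", None)
    mnemonic, length, kind = entry
    text, target = mnemonic, None
    if length == 2:
        operand = code[offset + 1]
        if kind in ("jump", "branch"):
            rel = operand - 256 if operand >= 128 else operand  # signed 8-bit
            target = offset + length + rel
            text = f"{mnemonic} 0x{target:02x}"
        else:
            text = f"{mnemonic} 0x{operand:02x}"
    return (text, length, kind, target)


def linear_sweep(code):
    """Decode every byte from start to end, one instruction after another."""
    listing, offset = {}, 0
    while offset < len(code):
        text, length, _, _ = decode(code, offset)
        listing[offset] = text
        offset += length
    return listing


def recursive_traversal(code, entry=0):
    """Decode only what is reachable from entry by following control flow."""
    listing, worklist = {}, [entry]
    while worklist:
        offset = worklist.pop()
        while 0 <= offset < len(code) and offset not in listing:
            text, length, kind, target = decode(code, offset)
            if kind == "invalid":
                break
            listing[offset] = text
            if kind in ("jump", "branch"):
                worklist.append(target)   # explore the jump target later
            if kind in ("jump", "stop"):
                break                     # no fall-through on this path
            offset += length
    return dict(sorted(listing.items()))


# jmp over one data byte (0x01) to "push 0x2a; ret"
SAMPLE = bytes([0x02, 0x01, 0x01, 0x01, 0x2A, 0xFF])

if __name__ == "__main__":
    print("linear sweep:       ", linear_sweep(SAMPLE))
    print("recursive traversal:", recursive_traversal(SAMPLE))
```

In SAMPLE, the byte at offset 2 is data that the jump skips over. The linear sweep decodes it as the start of a push, swallows the first byte of the real instruction at offset 3, and never shows the actual "push 0x2a". The recursive traversal follows the jump to offset 3 and recovers it, but it would find nothing behind a jump whose target is computed at run time.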

The following research presentation explores what is involved in developing a disassembler:

The (Long) Journey To A Multi-Architecture Disassembler

Answered by julian on September 30, 2021
