TransWikia.com

Why can't you edit pseudo code?

Reverse Engineering Asked by Spyindabox on April 20, 2021

Dabbler in RE here, so potentially a stupid question…
I know IDA, Binary Ninja, and Ghidra are really powerful at generating pseudo code.
From everything I’ve read, pseudo code can’t be edited in real time, although the assembly can. I was just wondering why you couldn’t use a process like the one below?

  1. decompile exe to fake code

  2. make code template for compiling in C or C++

  3. load fake code into template

  4. compile basic template and stop at assembly code generation

  5. copy assembly from halfway compiled c++ exe

  6. auto replace the IDA assembly code with the assembly from the halfway compiled code

Alternatively,

  1. decompile exe to fake code

  2. make code template for compiling in C or C++

  3. load fake code into template

  4. compile basic template and stop at assembly code generation

  5. load the function into IDA or such with PDB

  6. generate fake code for your template exe

  7. go to the function and get the assembly from there to copy and replace

I know this method wouldn’t be fast… but I am surprised no one has tried a method like this as far as I’m aware? Am I just missing something obvious?

3 Answers

From everything I've read pseudo code can't be edited in realtime but can be edited as assembly

This is not entirely correct. Quite the opposite, even: decompilers cannot be perfect (the compilation step loses too much information). Hence they need some help from a human (the reverse engineer). Giving this help is, at least in my opinion, the most important step during reverse engineering: get the data types right. Sure, renaming variables helps a lot with readability, but the decompiler doesn't really need it. Changing the type of a variable or function, though, feeds information back into the decompiler, which can then run another pass and improve the result. This result can then be improved further by the human. If I had to name the most important step during static binary reverse engineering, I would call out this cycle: decompile, re-type, repeat.
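To make the re-typing step concrete, here is a minimal sketch in C of what the same function can look like before and after a type is supplied. The struct item layout and both function names are made up for illustration and are not taken from any real binary or decompiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure recovered by the reverse engineer. */
struct item {
    uint64_t id;     /* offset 0 */
    int32_t  count;  /* offset 8 */
};

/* Before re-typing: the decompiler only sees an integer-sized argument,
   so the field access appears as raw pointer arithmetic. */
int32_t get_count_untyped(uintptr_t param_1)
{
    return *(int32_t *)(param_1 + 8);
}

/* After declaring the argument as a struct item *, the same access
   can be displayed as a named field. */
int32_t get_count_typed(const struct item *it)
{
    assert(offsetof(struct item, count) == 8);
    return it->count;
}
```

Both functions read the same four bytes; only the second tells the reader (and the decompiler's next pass) what those bytes mean.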

Now that we got this out of the way, I'll try to address the suggested steps you propose: I am not entirely sure I understand what you mean but I think a huge problem arises in step 2/4:

make code template for compiling in c or c++

Code generated by a decompiler is not really C/C++ code. Formally, it only qualifies as pseudo-code that has a C-like syntax. The differences between valid C and "decompiler C" depend on your decompiler of course (Hex-Rays, Binary Ninja, Ghidra), but to give a simple example (there are more, many of which are far more serious): if Ghidra's decompiler is not sure what data type a given variable is, it will assign the "type" undefined. This is not a valid data type in C of course, and hence the code cannot be compiled into an executable (i.e. step 4 fails).
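As a rough illustration of this one obstacle, the sketch below adds typedefs for Ghidra's placeholder types so that a decompiler-style function gets past the compiler. The function body and the name FUN_00101149 are invented for the example, and mapping the placeholders to unsigned integers is an assumption that may not match the real types in a given binary.

```c
#include <assert.h>
#include <stdint.h>

/* Shim for Ghidra's placeholder types; the numeric suffix is the size in bytes.
   Treating them as unsigned integers is an assumption, not recovered knowledge. */
typedef uint8_t  undefined;
typedef uint8_t  undefined1;
typedef uint16_t undefined2;
typedef uint32_t undefined4;
typedef uint64_t undefined8;

/* Hypothetical decompiler-style output; without the typedefs above,
   "undefined4" is an unknown type name and compilation fails. */
undefined4 FUN_00101149(undefined4 param_1, undefined4 param_2)
{
    undefined4 uVar1;

    assert(sizeof(undefined4) == 4);
    uVar1 = param_1 + param_2;
    return uVar1;
}
```

A shim like this only papers over the missing type names. Wrong signedness, unrecovered structs, and decompiler-specific constructs would still have to be fixed by hand.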

Answered by born on April 20, 2021

For editing in every owning pseudo code, do think psychology correlations of it, and then you get the human meaning for those who didn't change identity based on it, only for who didn't read.

But if code is not accessible (no pseudo code), you can make a dictionary of character to binary and to its functions on the CPU that have high performance on Java or Python (slow).

Answered by Noam lima on April 20, 2021

@born brings up some great points, but I do think it's definitely worth saying there's not much inherently impossible about the whole idea. Compiling and grabbing assembly is likely not the best bet, though.

Passing the entire thing off as impossible is just not right. IDA clearly has potential in the area: select the pseudocode and click "Copy to Assembly". It generates comments in the assembly showing which pseudocode line each group of instructions came from.

Here's a comparison of the three different relevant phases of a program: source, pseudocode, and ASM.

Source (clang -w -o test) :

#include <stdio.h>

int main(void)
{
  printf("hello world");
}

Note the use of printf("string") instead of printf("%s", "string"), which is discouraged but functional. This is another debate, but it'll screw up decompilation.

Decompilation by IDA (pseudocode):

int __cdecl main(int argc, const char **argv, const char **envp)
{
  printf("hello world", argv, envp);
  return 0;
}

This is just wrong. printf needs no extra arguments here, because "hello world" contains no conversion specifiers such as "%s", yet the decompiler passes argv and envp anyway. A simple mistake has screwed up the pseudocode output.

Disassembly by IDA (note some of these instructions might not be right):

push    rbp
mov     rbp, rsp
sub     rsp, 10h
; 2:   printf("hello world", argv, envp);
lea     rdi, aHelloWorld ; "hello world"
mov     al, 0
call    _printf
; 3:   return 0;
xor     ecx, ecx
mov     [rbp+var_4], eax
mov     eax, ecx
add     rsp, 10h
pop     rbp
retn

Let's say you wanted to edit the string:
Sure, just edit the place it references. Oh, but you want one longer than 11 characters, so you'll need to find somewhere unused and map the string pointer to that address instead. That's complicated.

The entire executable section of the program is 12 instructions long, too. You have almost no space to change anything, and adding stuff is an entirely different ballgame.
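The length constraint on the string can be sketched in a few lines of C. This is only an illustration: patch_string_in_place is a hypothetical helper operating on an in-memory copy of the file, and finding the string's offset is left to the disassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Overwrite the NUL-terminated string of length old_len at image[offset].
   Returns 0 on success, -1 if the replacement does not fit in place. */
int patch_string_in_place(uint8_t *image, size_t image_size, size_t offset,
                          size_t old_len, const char *replacement)
{
    size_t new_len;

    assert(image != NULL && replacement != NULL);
    new_len = strlen(replacement);

    /* The old string plus its terminator must lie inside the image. */
    if (offset >= image_size || old_len >= image_size - offset)
        return -1;
    /* A longer string would overwrite whatever data follows it. */
    if (new_len > old_len)
        return -1;

    memcpy(image + offset, replacement, new_len);
    /* Zero the leftover bytes, which also keeps the terminator. */
    memset(image + offset + new_len, 0, old_len - new_len + 1);
    return 0;
}
```

With "hello world" (11 characters) as the original, a 13-character replacement is rejected, which is the point at which you would have to relocate the string and fix up the lea that references it.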

Likely Reasons it hasn't been done

  • One huge hangup is how unreliable pseudocode can be at times. Compare Hopper's pseudocode to IDA's or Ghidra's sometime for a great example. It's an educated guess, not a reliable one. Some decompilers don't even create variables, and trying to compile Hopper pseudocode is a waste of time.
  • Most people needing to patch a binary want or need to patch the assembly. ASM works differently than C, and when you're patching, you need to be thinking more about how the assembly works than the C code that was used to create it.
  • Most of the decompilers I know of are already fairly bad at patching assembly alone. An extremely basic hex editor does a much better job. IDA will give you a headache if you try to patch more than 4 bytes.

I don't think it's anywhere near impossible, though; just not by compiling. What you could try is something like one of these:

  • Track how the decompiler mapped <x assembly> to <y pseudocode>, and whenever <y pseudocode> is changed, create binary patches for the <x assembly> that produced it.

    • This is arguably the "best" way to do it and will take a long time to write.
  • Replace a function call with a branch to your own code elsewhere (assuming space can be found). "Cheat Engine" (it's been a while since I've used Windows, sorry) had something like this if I remember correctly. Maybe use a compiler to generate that function, then.

    • This is the lazy way to do it, and might end up taking even more work making decompiler output compilable. Only IDA/Ghidra decompilation is close enough to be feasible. I've done this manually before.
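For the first idea, the bookkeeping could start from something as small as the table below: one entry per pseudocode line with the address range of the instructions it came from. This is a sketch only. The struct, the function names, and any addresses fed into it are hypothetical, and a real decompiler's mapping is many-to-many rather than one range per line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pseudocode line and the half-open address range [start, end)
   of the instructions it was derived from. */
struct line_map {
    unsigned line;
    uint64_t start;
    uint64_t end;
};

/* Return the first entry for a pseudocode line, or NULL if there is none. */
const struct line_map *find_range(const struct line_map *map, size_t n,
                                  unsigned line)
{
    size_t i;

    assert(map != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (map[i].line == line)
            return &map[i];
    }
    return NULL;
}

/* Number of bytes available if that line is patched in place. */
uint64_t patch_budget(const struct line_map *entry)
{
    assert(entry != NULL && entry->end >= entry->start);
    return entry->end - entry->start;
}
```

An edit to a pseudocode line would then have to be turned into replacement instructions no larger than patch_budget for that line, or be moved elsewhere.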

Both of these require an understanding of assembly to verify that the patches are correct; a wrong one will grind your program to a halt, and no tool doing this will be reliably correct.

#2 still has flaws. I've spent 2+ hours perfecting decompilation/disassembly on a single function; everything in memory properly named, manually defined every struct, etc. Even with perfect decompilation, it still needs work to be compiled.

Maybe you could do that work yourself in some complex script. This is a problem I'd recommend revisiting when you're experienced; it's a really interesting topic, and IDAPython might make it almost feasible.

Answered by krit on April 20, 2021
