Reverse Engineering Asked by Spyindabox on April 20, 2021
Dabbler in RE here, so this is potentially a stupid question…
I know IDA, Binary Ninja, and Ghidra are really powerful at generating pseudocode.
From everything I’ve read, pseudocode can’t be edited in real time, but the assembly can be. I was just wondering: why couldn’t you use a process like the one below?
1. Decompile the EXE to pseudocode ("fake code").
2. Make a code template for compiling in C or C++.
3. Load the pseudocode into the template.
4. Compile the template, stopping at assembly code generation.
5. Copy the assembly from the half-compiled C++ output.
6. Automatically replace the corresponding assembly in IDA with that assembly.
Alternatively:
1. Decompile the EXE to pseudocode ("fake code").
2. Make a code template for compiling in C or C++.
3. Load the pseudocode into the template.
4. Compile the template, stopping at assembly code generation.
5. Load the function into IDA (or a similar tool) with a PDB.
6. Generate pseudocode for your template EXE.
7. Go to the function, take its assembly, and copy it over to replace the original.
I know this method wouldn’t be fast… but I’m surprised that, as far as I’m aware, no one has tried something like this. Am I just missing something obvious?
From everything I've read pseudo code can't be edited in realtime but can be edited as assembly
This is not entirely correct. Quite the opposite, in fact: decompilers cannot be perfect (compilation loses too much information), so they need some help from a human, the reverse engineer. Giving this help is, at least in my opinion, the most important step in reverse engineering: getting the data types right. Sure, renaming variables helps a lot with readability, but the decompiler doesn't really need it. Changing the type of a variable or function, though, feeds information back into the decompiler, which can then run another pass and improve its output. That result can in turn be improved further by the human. If I had to name the most important step in static binary reverse engineering, I would point to this cycle: decompile, re-type, repeat.
Now that we've got that out of the way, I'll try to address the steps you propose. I am not entirely sure I understand what you mean, but I think a huge problem arises in steps 2 and 4:
make code template for compiling in c or c++
Code generated by a decompiler is not really C/C++ code. Formally, it only qualifies as pseudocode with a C-like syntax. The differences between valid C and "decompiler C" depend on your decompiler, of course (Hex-Rays, Binary Ninja, Ghidra), but to give one simple example (there are more, many of them far more serious): if Ghidra's decompiler is not sure what data type a given variable is, it assigns it the "type" undefined (or a sized variant such as undefined4). This is not a valid data type in C, of course, and hence the code cannot be compiled into an executable (i.e. step 4 fails).
Answered by born on April 20, 2021
For editing in every owning pseudocode, do think about the psychological correlations of it, and then you get the human meaning for those who didn't change identity based on it, only for those who didn't read.
But if the code is not accessible (no pseudocode), you can make a dictionary from characters to binary and to their functions on the CPU, which performs well in Java, or in Python (slow).
Answered by Noam lima on April 20, 2021
@born brings up some great points, but I do think it's worth saying that there's not much inherently impossible about the whole idea. Compiling and grabbing the assembly is likely not the best bet, though.
Passing the entire thing off as impossible is just not right. IDA clearly has potential in this area: select pseudocode and click "Copy to Assembly". It will generate comments in the assembly that map it back to the pseudocode lines it came from.
Here's a comparison of the three relevant phases of a program: source, pseudocode, and ASM.
Source (compiled with clang -w -o test):
#include <stdio.h>

int main(void)
{
    printf("hello world");
}
Note the incorrect, but functional, use of printf("string") instead of printf("%s", "string"). That's a separate debate, but it will throw off the decompilation.
Decompilation by IDA (pseudocode):
int __cdecl main(int argc, const char **argv, const char **envp)
{
printf("hello world", argv, envp);
return 0;
}
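This is just wrong: it doesn't match the source. printf isn't expecting any extra arguments here, since "hello world" contains no format specifiers such as "%s", yet IDA shows argv and envp being passed as well (most likely because rsi and rdx still hold main's argv and envp at the call). A simple mistake has thrown off the pseudocode output.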
Disassembly by IDA (note that some of these instructions might not be exactly right):
push rbp
mov rbp, rsp
sub rsp, 10h
; 2: printf("hello world", argv, envp);
lea rdi, aHelloWorld ; "hello world"
mov al, 0
call _printf
; 3: return 0;
xor ecx, ecx
mov [rbp+var_4], eax
mov eax, ecx
add rsp, 10h
pop rbp
retn
Let's say you wanted to edit the string:
Sure, just edit the bytes it references. Oh, but you want a string longer than 11 characters? Then you'll need to find some unused space, put the new string there, and point the string reference at that address instead. That's complicated.
The whole main function is only 12 instructions long, too. You have almost no room to change anything, and adding code is an entirely different ballgame.
Likely Reasons it hasn't been done
I don't think it's anywhere near impossible, though; just not by compiling. Here is what you could try instead:
1. Track how the decompiler mapped <x assembly> to <y pseudocode>, and whenever <y pseudocode> is changed, create binary patches for the <x assembly> that produced it.
2. Replace a function call with a branch to your own code elsewhere (assuming space can be found). "Cheat Engine" (it's been a while since I've used Windows, sorry) had something like this, if I remember correctly. You could then use a compiler to generate that function.
Both of these require an understanding of assembly to verify that the patches are correct; a wrong one will grind your program to a halt, and no tool doing this will be reliably correct.
#2 still has flaws. I've spent 2+ hours perfecting the decompilation/disassembly of a single function: everything in memory properly named, every struct defined manually, etc. Even with perfect decompilation, it still needs work before it can be compiled.
Maybe you could do that work yourself in some complex script. This is a problem I'd recommend revisiting once you're more experienced; it's a really interesting topic, and IDAPython might make it almost feasible.
Answered by krit on April 20, 2021