Reverse Engineering Asked on September 30, 2021
I’ve recently become pretty fascinated with virtualization and with recovering the original code from the randomly generated bytecode that protectors such as VMProtect produce. But I cannot get a grasp on how it would actually be done.
Although I have read a few write-ups to help understand virtualization better, even articles specifically targeting the protector I’m trying to devirtualize, I cannot relate them to my own sample. There are discrepancies between the two which confuse me.
I made a small program that takes a string as input and outputs it, then added Virtualization to the function. The first thing I notice is that the function now calls another function, and that there appear to be random junk instructions after the call. I also notice that it pushes 4 bytes onto the stack before the call.
.vmp0:00440C0F 68 28 02 8E 51 push 518E0228h
.vmp0:00440C14 E8 68 BB FC FF call sub_40C781
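As a sanity check on the two instructions above, the raw bytes can be decoded by hand. The following is a minimal Python sketch; the helper names decode_push_imm32 and decode_call_rel32 are made up for illustration, and it only handles the two opcodes (68 and E8) that appear in this listing.

```python
import struct

def decode_push_imm32(raw: bytes) -> int:
    """Decode `68 id` (push imm32) and return the little-endian immediate."""
    assert raw[0] == 0x68 and len(raw) == 5
    return struct.unpack_from("<I", raw, 1)[0]

def decode_call_rel32(addr: int, raw: bytes) -> int:
    """Decode `E8 cd` (call rel32) and return the absolute target address."""
    assert raw[0] == 0xE8 and len(raw) == 5
    rel = struct.unpack_from("<i", raw, 1)[0]  # signed 32-bit displacement
    # The displacement is relative to the address of the next instruction.
    return (addr + len(raw) + rel) & 0xFFFFFFFF

pushed = decode_push_imm32(bytes.fromhex("6828028E51"))
target = decode_call_rel32(0x00440C14, bytes.fromhex("E868BBFCFF"))
print(hex(pushed), hex(target))  # 0x518e0228 0x40c781
```

The decoded values match what the disassembler shows: a 4-byte constant is pushed, and the call lands on sub_40C781.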
Inside sub_40C781 I see it pushing all registers onto the stack, along with a pushf instruction, which pushes the flags. In between these pushes and the main instructions, I can’t help but notice pointless instructions, which is a little odd since I only specified Virtualization as the protection.
Anyway, the main valid instructions I managed to scavenge from the called function are:
push ebp
push ebx
push ecx
push esi
push edi
pushf
push edx
push eax
mov eax, 0
push eax
mov esi, [esp+24h+arg_0]
lea esi, [esi+eax]
lea esp, [esp-0C0h]
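To see what mov esi, [esp+24h+arg_0] actually reads, it helps to lay out the stack after the pushes. The sketch below is only an illustration under two assumptions: that the pushes happen exactly in the order listed above with no other stack changes in between, and that arg_0 is 4, which is how IDA usually numbers the first argument in an esp-based frame. The function name stack_layout is hypothetical.

```python
# Registers pushed by sub_40C781, in the order listed above.
PUSHES = ["ebp", "ebx", "ecx", "esi", "edi", "eflags", "edx", "eax", "eax (=0)"]

def stack_layout(pushes, pushed_by_caller="518E0228h"):
    """Return {offset_from_esp: description} right after the last push."""
    slots = list(reversed(pushes))       # the last push sits at [esp+0]
    slots.append("return address")       # pushed by the call instruction
    slots.append(f"caller's push {pushed_by_caller}")
    return {4 * i: name for i, name in enumerate(slots)}

layout = stack_layout(PUSHES)
ARG_0 = 4          # assumption: IDA's arg_0 offset in an esp-based frame
src = 0x24 + ARG_0  # [esp+24h+arg_0]

for off, name in layout.items():
    print(f"[esp+{off:02X}h] {name}")
print(f"mov esi, [esp+{src:X}h] reads: {layout[src]}")
```

Under those assumptions the nine pushes account for the 24h bytes in the operand, and esi is loaded with the constant pushed just before the call. Since eax is 0 at that point, the following lea esi, [esi+eax] leaves that value unchanged.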
But from there, I’m not sure how to proceed. The rest of the instructions past the last lea are:
.vmp0:0040E187 loc_40E187: ; CODE XREF: .vmp0:00416516↓j
.vmp0:0040E187 ; .vmp0:loc_42E5EB↓j ...
.vmp0:0040E187 8B DE mov ebx, esi
.vmp0:0040E189 B8 00 00 00 00 mov eax, 0
.vmp0:0040E18E 0F BA F7 E7 btr edi, 0E7h ; 'ç'
.vmp0:0040E192 66 23 F8 and di, ax
.vmp0:0040E195 2B D8 sub ebx, eax
.vmp0:0040E197
.vmp0:0040E197 loc_40E197: ; DATA XREF: sub_40C781:loc_40E197↓o
.vmp0:0040E197 8D 3D 97 E1 40 00 lea edi, loc_40E197
.vmp0:0040E19D 66 0F A4 D0 09 shld ax, dx, 9
.vmp0:0040E1A2 C1 C8 49 ror eax, 49h
.vmp0:0040E1A5 81 EE 04 00 00 00 sub esi, 4
.vmp0:0040E1AB 0F A3 F8 bt eax, edi
.vmp0:0040E1AE 33 C6 xor eax, esi
.vmp0:0040E1B0 C6 C4 27 mov ah, 27h ; '''
.vmp0:0040E1B3 8B 06 mov eax, [esi]
.vmp0:0040E1B5 F9 stc
.vmp0:0040E1B6 F8 clc
.vmp0:0040E1B7 E9 79 12 05 00 jmp loc_45F435
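One way to separate junk from real work in a straight-line block like this is a backward liveness pass: walk the instructions from the end and drop any whose results are overwritten before being read. The sketch below is a simplified illustration, not a general x86 analysis. The read/write sets are annotated by hand for this listing, sub-registers (ax, ah, di) are folded into their 32-bit register, the flags are treated as one coarse resource, and it is an assumption that every general-purpose register is live at the final jmp while the native flags are not.

```python
# (text, reads, writes, has_side_effect). A partial register write (ax, ah, di)
# also counts as a read, because the rest of the register is preserved.
LISTING = [
    ("mov ebx, esi",        {"esi"},        {"ebx"},          False),
    ("mov eax, 0",          set(),          {"eax"},          False),
    ("btr edi, 0E7h",       {"edi"},        {"edi", "flags"}, False),
    ("and di, ax",          {"edi", "eax"}, {"edi", "flags"}, False),
    ("sub ebx, eax",        {"ebx", "eax"}, {"ebx", "flags"}, False),
    ("lea edi, loc_40E197", set(),          {"edi"},          False),
    ("shld ax, dx, 9",      {"eax", "edx"}, {"eax", "flags"}, False),
    ("ror eax, 49h",        {"eax"},        {"eax", "flags"}, False),
    ("sub esi, 4",          {"esi"},        {"esi", "flags"}, False),
    ("bt eax, edi",         {"eax", "edi"}, {"flags"},        False),
    ("xor eax, esi",        {"eax", "esi"}, {"eax", "flags"}, False),
    ("mov ah, 27h",         {"eax"},        {"eax"},          False),
    ("mov eax, [esi]",      {"esi"},        {"eax"},          False),
    ("stc",                 set(),          {"flags"},        False),
    ("clc",                 set(),          {"flags"},        False),
    ("jmp loc_45F435",      set(),          set(),            True),
]

# Assumption: all GPRs may be used after the jmp, the native flags are not.
LIVE_OUT = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}

def strip_dead(listing, live_out):
    """Keep only instructions whose results are still needed later."""
    live = set(live_out)
    kept = []
    for text, reads, writes, side_effect in reversed(listing):
        if side_effect or (writes & live):
            kept.append(text)
            live = (live - writes) | reads
    return kept[::-1]

survivors = strip_dead(LISTING, LIVE_OUT)
print("\n".join(survivors))
```

With these assumptions, seven instructions survive: mov ebx, esi, mov eax, 0, sub ebx, eax, lea edi, loc_40E197, sub esi, 4, mov eax, [esi] and the jmp. If the flags or fewer registers turn out to matter after the jump, LIVE_OUT would need to change and the result with it.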
But I can’t see how this would interpret the bytecode and translate it to its x86 representation. From my understanding, esi is probably the VM’s instruction pointer, and loc_40E197 seems to be the instruction ‘dispatcher’, for lack of a better word. But I cannot get a grasp of the inner workings: esi seems to be decremented by 4 on each loop, whereas I expected it to change by 1.
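For reference, the pair sub esi, 4 / mov eax, [esi] behaves like a pre-decrement fetch of one dword, so a pointer stepping by 4 is what a loop reading 32-bit values backwards through a buffer would look like. The following toy Python sketch mimics just that pair. The buffer contents are made up and are not bytes from the sample, and whether the fetched dword is an opcode, an operand or something encrypted cannot be told from this listing alone.

```python
import struct

def fetch_backwards(bytecode: bytes, vip: int, count: int):
    """Mimic `sub esi, 4` / `mov eax, [esi]`: pre-decrement, then read a dword."""
    values = []
    for _ in range(count):
        vip -= 4                                              # sub esi, 4
        values.append(struct.unpack_from("<I", bytecode, vip)[0])  # mov eax, [esi]
    return values, vip

# Made-up buffer for illustration only.
buf = struct.pack("<3I", 0x11111111, 0x22222222, 0x33333333)
values, vip = fetch_backwards(buf, len(buf), 3)
print([hex(v) for v in values], vip)
# ['0x33333333', '0x22222222', '0x11111111'] 0
```

A step of 1 would correspond to fetching single bytes instead, so the step size mostly reflects the width of whatever is being read at this point in the loop.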
Any insight on to how to proceed would be greatly appreciated.