Reverse Engineering Asked by Zmirlacz on June 12, 2021
I’m writing a compiler from .NET CIL code to some high level language. Process is similar to decompilation. I have done some control flow analysis – detecting loops, ifs, and so on. In terms of data flow analysis i’ve done simple expression propagation by simulating some instructions involving evaluation stack – I treat evaluation stack as hidden variable, push more complex expressions on it, and whenever there is any assignment instruction to some variable (for example starg
or stloc
) – I pop and assign propagated expression from stack to this variable and translate this expression into statement in high language code. Of course for now it is so simple that it generates failures. Consider a function written in C#:
int GCD(int n1, int n2)
{
while(n2 != 0)
{
int c = n1 % n2;
n1 = n2;
n2 = c;
}
return n1;
}
This function compiles to IL:
.method private hidebysig
instance int32 GCD (
int32 n1,
int32 n2
) cil managed
{
.maxstack 8
IL_0000: br.s IL_000a
IL_0002: ldarg.1 // load n1 on eval stack
IL_0003: ldarg.2 // load n2 on eval stack
IL_0004: rem // pop n1 and n2 from stack, compute n1 % n2 and push it on stack
IL_0005: ldarg.2 // load n2 on stack
IL_0006: starg.s n1 // pop n2 from stack and store it in n1
IL_0008: starg.s n2 // pop n1 % n2 from stack and store it in n2
IL_000a: ldarg.2
IL_000b: brtrue.s IL_0002
IL_000d: ldarg.1
IL_000e: ret
}
With this simple propagation we push n1 % n2
on stack, then load n2
on stack, then we have starg
instruction, so we pop expression from stack and translate assignment to statement. Then we pop again, and do the same. Result code looks like this:
GCD(n1, n2) {
while (n2) {
n1 = n2;
n2 = (n1 % n2);
}
return n1;
}
This indicates that I have to do something inverse to dead code elimination, maybe called like "necessary code introduction". I searched for some sources about methods to introduce new variables in decompilation, but I did not find any. Do you have any ideas?
The problem, as I think you realise, is that you are logically moving the calculation of an expression past the point whereby one of its input values gets changed.
Given your current implementation, I'd suggest the easiest approach to handle this is -
Applying this procedure to your code gives -
IL_0002: // stack = { n1 }
IL_0003: // stack = { n2, n1 }
IL_0004: // stack = { n1 % n2 }
IL_0005: // stack = { n2, n1 % n2 }
IL_0006: // this instruction will write to n1
// inspection shows that we have an expression involving n1 on the stack so
// (a) spill this expression to a new temporary variable and
// (b) replace the expression on the stack with the new temporary variable
temp = n1 % n2; // stack = { n2, temp }
// now the write to n1 can safely take place
n1 = n2; // stack = { temp };
IL_0008: // this instruction will write to n2,
// inspection shows no expressions on the stack involving n2
// the write to n2 can safely take place
n2 = temp; // stack = {}
Removing the comments, the inner loop now looks like the original code.
temp = n1 % n2;
n1 = n2;
n2 = temp;
Correct answer by Ian Cook on June 12, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP