The truth about string protection by obfuscators

Copyright 2003 Remotesoft Inc. All rights reserved.

Some obfuscators offer a string encryption feature, in which literal strings are encrypted during obfuscation and decrypted at runtime. The importance of this feature is often heavily exaggerated. In our opinion, it should be viewed only as "better than nothing" because it does not offer any real protection. This document presents the details of how an obfuscator encrypts a string, how the protected string is decrypted at runtime, and how easy it is for someone to recover the original strings. Following the samples here, you can easily build your own utility to perform string protection.

Download code used in this document

Literal Strings

Take the following simple HelloWorld program in C#:

			
class HelloWorld
{
    public static void Main()
    {
        System.Console.WriteLine("Hello World");
    }
}
		

After compilation, you can use the .NET Framework ildasm utility to view the intermediate language (IL) code of the method, as shown below. Notice the ldstr instruction, which loads a literal string such as "Hello World". In fact, all literal strings are loaded by ldstr. Literal strings are different from symbol names, such as the method name Main. Symbol names are stored in a separate metadata heap (the string heap, #Strings, in CLR terms), and they are usually renamed by obfuscators.

.method public hidebysig static void  Main() cil managed
{
  .entrypoint
  .maxstack  1
  IL_0000:  ldstr "Hello World"
  IL_0005:  call  void [mscorlib]System.Console::WriteLine(string)  
  IL_000a:  ret
}
		

String Encryption

We have seen an example of a literal string, "Hello World", and how it gets loaded. An obfuscator searches for ldstr instructions and replaces the string operand of each one with an encrypted value. The encryption can use any mechanism, as long as there is a corresponding decryption mechanism. The following is a simple encryption routine: it adds a fixed offset of 10 to each character and then reverses the string.

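public static string Encrypt(string inputStr)
{
    int len = inputStr.Length;

    // Fixed offset added to every character; Decrypt subtracts the same value.
    int offset = 10;

    // Step 1: shift each character by the offset.
    char[] encryptedChars = new char[len];
    for (int i = 0; i < len; i++)
    {
        encryptedChars[i] = (char)((int)inputStr[i] + offset);
    }

    // Step 2: write the shifted characters in reverse order into a temporary buffer.
    char[] reversedChars = new char[len];
    for (int i = 0; i < len; i++)
    {
        reversedChars[i] = encryptedChars[len - 1 - i];
    }

    // Step 3: copy the reversed characters back into the result buffer.
    for (int i = 0; i < len; i++)
    {
        encryptedChars[i] = reversedChars[i];
    }

    return String.Intern(new String(encryptedChars));
}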
		

After encryption, strings become very hard to recognize and are usually represented as byte arrays. "Hello World" becomes the following byte array (the two-byte encoding of each encrypted character, low byte first), which can be used to replace the original literal string.

   
    bytearray(6E 00 76 00 7C 00 79 00 61 00 2A 00 79 00 76 00 76 00 6F 00 52 00)
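
To make the mapping from the encrypted string to the byte array concrete, here is a minimal sketch. The class name StringBytes and the helper ToIlByteArray are our own illustrative names, not part of any obfuscator; Encrypt is a compact version of the routine above, and the formatting simply mimics the bytearray notation used in this document.

```csharp
using System;
using System.Text;

static class StringBytes
{
    // Same transformation as the Encrypt routine above:
    // shift every character by 10, then reverse the order.
    public static string Encrypt(string input)
    {
        char[] chars = new char[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            chars[input.Length - 1 - i] = (char)(input[i] + 10);
        }
        return new string(chars);
    }

    // Hypothetical helper: renders a string in the bytearray(...) notation,
    // two bytes per character (UTF-16, low byte first).
    public static string ToIlByteArray(string s)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(s); // UTF-16 little-endian
        StringBuilder sb = new StringBuilder("bytearray(");
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("X2")).Append(' ');
        }
        if (bytes.Length > 0)
        {
            sb.Length--; // drop the trailing space
        }
        return sb.Append(')').ToString();
    }
}
```

Calling StringBytes.ToIlByteArray(StringBytes.Encrypt("Hello World")) reproduces the byte array shown above: 'H' (0x48) plus 10 is 0x52, which ends up last after the reversal, and 'd' (0x64) plus 10 is 0x6E, which ends up first.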

Our HelloWorld program becomes the following after encryption. Notice that the encrypted byte array is now in place of the original "Hello World" string:

 
.method public hidebysig static  void Main() cil managed
{
    .entrypoint
    .maxstack 1
    IL_0000:  ldstr bytearray(6E 00 76 00 7C 00 79 00 61 00 2A 00 79 00 76 00 76 00 6F 00 52 00) // "Hello World"
    IL_0005:  call  void [mscorlib]System.Console::WriteLine(string)  
    IL_000a:  ret
}

String Decryption

Now that the string is encrypted, the runtime needs a way to get back the original string; otherwise the program will produce different results. Obfuscators embed a method for this purpose. This method is called after each ldstr instruction to decrypt the protected string. Since this method is embedded in the assembly, and it is usually a managed method, one can easily identify it and invoke it to recover all protected strings. This is why string protection offers very little protection and should be viewed only as "better than nothing".

Below is the decrypt method that corresponds to the encrypt method shown above. It subtracts the offset of 10 from each character and reverses the result, which restores the original string.

public static string Decrypt(string inputStr)
{
    int len = inputStr.Length;

    // Step 1: undo the shift by subtracting the offset used by Encrypt.
    char[] decryptedChars = new char[len];
    for (int i = 0; i < len; i++)
    {
        decryptedChars[i] = (char)((int)inputStr[i] - 10);
    }

    // Step 2: write the characters in reverse order into a temporary buffer.
    char[] reversedChars = new char[len];
    for (int i = 0; i < len; i++)
    {
        reversedChars[i] = decryptedChars[len - 1 - i];
    }

    // Step 3: copy the reversed characters back into the result buffer.
    for (int i = 0; i < len; i++)
    {
        decryptedChars[i] = reversedChars[i];
    }

    return String.Intern(new String(decryptedChars));
}

Inject the IL code of the decrypt method into the HelloWorld program, add an extra call after each ldstr instruction, and then assemble the resulting IL code into an executable (for example, with ilasm). The new executable is now string protected. This is exactly what some obfuscators do when they perform string encryption. The new version of the IL code of our HelloWorld program is shown below:


.method public hidebysig static string  Decrypt(string inputStr) cil managed
{
  // Code size       119 (0x77)
  .maxstack  5
  .locals init (int32 V_0,
           char[] V_1,
           int32 V_2,
           char[] V_3,
           int32 V_4,
           int32 V_5,
           string V_6)
  IL_0000:  ldarg.0
  IL_0001:  callvirt   instance int32 [mscorlib]System.String::get_Length()
  IL_0006:  stloc.0
  IL_0007:  ldloc.0
  IL_0008:  conv.ovf.u4
  IL_0009:  newarr     [mscorlib]System.Char
  IL_000e:  stloc.1
  IL_000f:  ldc.i4.0
  IL_0010:  stloc.2
  IL_0011:  br.s       IL_0025
  IL_0013:  ldloc.1
  IL_0014:  ldloc.2
  IL_0015:  ldarg.0
  IL_0016:  ldloc.2
  IL_0017:  callvirt   instance char [mscorlib]System.String::get_Chars(int32)
  IL_001c:  ldc.i4.s   10
  IL_001e:  sub
  IL_001f:  conv.u2
  IL_0020:  stelem.i2
  IL_0021:  ldloc.2
  IL_0022:  ldc.i4.1
  IL_0023:  add
  IL_0024:  stloc.2
  IL_0025:  ldloc.2
  IL_0026:  ldloc.0
  IL_0027:  blt.s      IL_0013
  IL_0029:  ldloc.0
  IL_002a:  conv.ovf.u4
  IL_002b:  newarr     [mscorlib]System.Char
  IL_0030:  stloc.3
  IL_0031:  ldc.i4.0
  IL_0032:  stloc.s    V_4
  IL_0034:  br.s       IL_0048
  IL_0036:  ldloc.3
  IL_0037:  ldloc.s    V_4
  IL_0039:  ldloc.1
  IL_003a:  ldloc.0
  IL_003b:  ldc.i4.1
  IL_003c:  sub
  IL_003d:  ldloc.s    V_4
  IL_003f:  sub
  IL_0040:  ldelem.u2
  IL_0041:  stelem.i2
  IL_0042:  ldloc.s    V_4
  IL_0044:  ldc.i4.1
  IL_0045:  add
  IL_0046:  stloc.s    V_4
  IL_0048:  ldloc.s    V_4
  IL_004a:  ldloc.0
  IL_004b:  blt.s      IL_0036
  IL_004d:  ldc.i4.0
  IL_004e:  stloc.s    V_5
  IL_0050:  br.s       IL_0060
  IL_0052:  ldloc.1
  IL_0053:  ldloc.s    V_5
  IL_0055:  ldloc.3
  IL_0056:  ldloc.s    V_5
  IL_0058:  ldelem.u2
  IL_0059:  stelem.i2
  IL_005a:  ldloc.s    V_5
  IL_005c:  ldc.i4.1
  IL_005d:  add
  IL_005e:  stloc.s    V_5
  IL_0060:  ldloc.s    V_5
  IL_0062:  ldloc.0
  IL_0063:  blt.s      IL_0052
  IL_0065:  ldloc.1
  IL_0066:  newobj     instance void [mscorlib]System.String::.ctor(char[])
  IL_006b:  call       string [mscorlib]System.String::Intern(string)
  IL_0070:  stloc.s    V_6
  IL_0072:  br.s       IL_0074
  IL_0074:  ldloc.s    V_6
  IL_0076:  ret
} // end of method 'Global Functions'::Decrypt

.method public hidebysig static  void Main() cil managed
{
    .entrypoint
    .maxstack 1
    IL_0000:  ldstr bytearray(6E 00 76 00 7C 00 79 00 61 00 2A 00 79 00 76 00 76 00 6F 00 52 00) // "Hello World"
    IL_0005:  call string Decrypt(string) 
    IL_000a:  call  void [mscorlib]System.Console::WriteLine(string)     
    IL_000f:  ret
}

The call to Decrypt at IL_0005 in Main is the extra instruction that performs decryption. Since there is an extra call for each ldstr instruction, the program becomes larger and runs more slowly.
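
The injection step (adding one call after every ldstr) can be sketched at the text level, working on the lines of an ildasm listing. This is only an illustration under stated assumptions: IlTextPatcher and its methods are hypothetical names of ours, real obfuscators typically rewrite the binary rather than the text, and the sketch neither encrypts the operands nor injects the body of Decrypt. It also assumes the assembler treats the existing IL_xxxx prefixes as plain labels, so the inserted lines need no label of their own.

```csharp
using System;
using System.Collections.Generic;

static class IlTextPatcher
{
    // Returns true when the instruction on this line is ldstr,
    // ignoring an optional "IL_xxxx:" label in front of it.
    static bool IsLdstr(string line)
    {
        string t = line.TrimStart();
        int colon = t.IndexOf(':');
        if (t.StartsWith("IL_", StringComparison.Ordinal) && colon > 0)
        {
            t = t.Substring(colon + 1).TrimStart();
        }
        return t.StartsWith("ldstr", StringComparison.Ordinal)
            && (t.Length == 5 || char.IsWhiteSpace(t[5]));
    }

    // Copies the listing and appends a call to the global Decrypt method
    // after every ldstr, so the decrypted string replaces the encrypted
    // one on the evaluation stack.
    public static List<string> InjectDecryptCalls(IEnumerable<string> ilLines)
    {
        List<string> result = new List<string>();
        foreach (string line in ilLines)
        {
            result.Add(line);
            if (IsLdstr(line))
            {
                result.Add("    call string Decrypt(string)");
            }
        }
        return result;
    }
}
```

Because Decrypt pops one string and pushes one string, the stack depth after the inserted call is unchanged, which is why .maxstack stays at 1 in the Main listing above.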

Salamander Decompiler

Since a decryption method MUST be embedded, and the call to it always immediately follows the ldstr instruction, it is very straightforward to retrieve the original strings: simply invoke the decrypt method on each encrypted string.

By default, our Salamander decompiler automatically detects string encryption, calls the decrypt method, and puts back the original string. You can download the protected file from here and upload the protected EXE, HelloWorldEncrypted.exe, to the Salamander decompiler to see for yourself. You can also try files protected by commercial obfuscators.
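
As a rough illustration of how little effort this takes, the sketch below uses reflection to call a decrypt method by name. It is not how the Salamander decompiler is implemented; ProtectedStub, StringRecovery and Recover are hypothetical names, and ProtectedStub merely stands in for a protected assembly so that the example is self-contained. Against a real file you would first load the assembly and locate the method, and a global function like the one in the listing above would be looked up on the module rather than on a type.

```csharp
using System;
using System.Reflection;

// Stand-in for the decrypt method embedded in a protected assembly.
static class ProtectedStub
{
    public static string Decrypt(string s)
    {
        char[] chars = new char[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            // Undo the shift by 10 and reverse in a single pass.
            chars[s.Length - 1 - i] = (char)(s[i] - 10);
        }
        return new string(chars);
    }
}

static class StringRecovery
{
    // Finds a static string-to-string method by name and invokes it on
    // the encrypted value, exactly as the protected program itself would.
    public static string Recover(Type holder, string methodName, string encrypted)
    {
        MethodInfo method = holder.GetMethod(
            methodName,
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static,
            null,
            new Type[] { typeof(string) },
            null);
        if (method == null || method.ReturnType != typeof(string))
        {
            throw new ArgumentException("No static string " + methodName + "(string) found.");
        }
        return (string)method.Invoke(null, new object[] { encrypted });
    }
}
```

For example, StringRecovery.Recover(typeof(ProtectedStub), "Decrypt", "nv|ya*yvvoR") returns "Hello World". The caller never needs to understand the algorithm; it only needs to find the method and feed it the operand of each ldstr.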

Conclusions

In summary, string encryption can be easily implemented, and it offers almost no protection at the cost of larger file size and slower performance. If a vendor makes a big deal out of such a feature, be aware that it does NOT protect your sensitive data. It is better to implement some custom protection yourself.