In most recent phishing attacks, PowerShell is usually employed to load and execute position-independent shellcode via a macro-enabled Office document.

Infection process

So, to know what actions are being carried out, the truly interesting part here is the shellcode being executed. However, to slow down analysis or lower detection, shellcode is usually encoded, with shikata ga nai being the most used encoder (at least for the samples I have observed).

Shikata ga nai

Shikata ga nai is a polymorphic encoder built around a decoder stub. The stub XORs the encoded bytes with an incremental key: after each block is decoded, the XOR key is modified (with an add instruction). For more information about shikata ga nai, you may read this blog post on the matter.
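To make the incremental key idea concrete, here is a minimal sketch in Python. It is a simplified model, not the exact stub Metasploit emits: the 4-byte block size, the choice to add the decoded block to the key, and the names `encode`, `decode` and the key value are all assumptions made for illustration.

```python
import struct

MASK32 = 0xFFFFFFFF  # keep the key inside a 32-bit register


def encode(plain: bytes, key: int) -> bytes:
    """XOR each 4-byte block with the key, then add the plain block to the key."""
    plain = plain + b"\x90" * (-len(plain) % 4)  # pad with NOPs to whole blocks
    out = bytearray()
    for (block,) in struct.iter_unpack("<I", plain):
        out += struct.pack("<I", block ^ key)
        key = (key + block) & MASK32  # assumed feedback: key += plain block
    return bytes(out)


def decode(encoded: bytes, key: int) -> bytes:
    """Undo encode(): the key evolves exactly as it did while encoding."""
    out = bytearray()
    for (block,) in struct.iter_unpack("<I", encoded):
        plain = block ^ key
        out += struct.pack("<I", plain)
        key = (key + plain) & MASK32  # same feedback, fed with the decoded block
    return bytes(out)


if __name__ == "__main__":
    payload = bytes.fromhex("fce8820000006089e5")  # arbitrary sample bytes
    encoded = encode(payload, 0x11223344)          # arbitrary initial key
    print(decode(encoded, 0x11223344).hex())
```

Because the key depends on everything decoded so far, a static signature cannot simply XOR the blob with one constant, which is part of why emulating the stub is attractive.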

Generic decoding

A (shikata) encoded shellcode will have the following structure

Here the stub modifies the encoded bytes at runtime. The main idea is therefore to detect modifications to the encoded block at runtime (via emulation or sandboxing) and dump the modified bytes.
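One way to picture "detect the modification and dump it" without any emulator is to compare two snapshots of the same memory, one taken before the stub runs and one after. The sketch below does exactly that. It is a different technique from the write hook used later in this post, and the helper names `modified_region` and `dump_modified` are hypothetical. How the snapshots are obtained (emulator, sandbox, debugger) is left open.

```python
def modified_region(before: bytes, after: bytes):
    """Return (start, end) of the smallest span covering every changed byte.

    `end` is exclusive. Returns None when the two snapshots are identical.
    """
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
    if not changed:
        return None
    return changed[0], changed[-1] + 1


def dump_modified(before: bytes, after: bytes):
    """Return the bytes of the modified span as they look after execution."""
    region = modified_region(before, after)
    if region is None:
        return None
    start, end = region
    return after[start:end]


if __name__ == "__main__":
    stub = b"\xaa\xbb"                       # stand-in for an unmodified stub
    before = stub + b"\x01\x02\x03\x04"      # stand-in for encoded bytes
    after = stub + b"\x11\x02\x13\x14"       # same memory after "decoding"
    print(modified_region(before, after))
    print(dump_modified(before, after).hex())
```

Note that a byte which happens to decode to its own value still ends up inside the span, since only the outer bounds are tracked.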

Using unicorn emulator for decoding

To do this, we can hook memory writes in Unicorn to detect self-modifying code.

Note: Code is partly based on a public decoder on GitHub which I could no longer find to give proper credit

import sys  
import logging

from unicorn import *  
from unicorn.x86_const import *  
from unicorn.arm_const import *  
from unicorn.arm64_const import *  
from unicorn.mips_const import *

log = logging.getLogger()

class DecodeEngine:  
    def __init__(self, opts=None):
        """
        Initializes the engine. If no options dict is supplied
        engine will be initialized for x86 code emulation.

        Available options:
        arch: Unicorn architecture (default UC_ARCH_X86)
        mode: Unicorn mode (default UC_MODE_32)
        instr: Unicorn register used as instruction pointer (default UC_X86_REG_EIP)
        stack: Unicorn register used as stack pointer (default UC_X86_REG_ESP)
        debug: Enable debugging (default False)

        Keyword arguments:
        opts -- A dict with options for the engine (optional)
        """
        # Avoid a mutable default argument shared between instances
        opts = opts or {}
        self.opts = {
            "arch": opts.get("arch", UC_ARCH_X86),
            "mode": opts.get("mode", UC_MODE_32),
            "instr": opts.get("instr", UC_X86_REG_EIP),
            "stack": opts.get("stack", UC_X86_REG_ESP),
            "debug": opts.get("debug", False),
        }

        # [lowest, highest] addresses written by the self-modifying code
        self.write_bounds = [None, None]

    def decode(self, bin_code):
        """
        Decode an encoded shellcode by using Unicorn engine for emulation.

        bin_code: (bytes) The raw shellcode to be decoded
        """
        # Emulation memory
        MEM_SIZE = 2 * 1024 * 1024  # 2MB
        # Start base address
        ADDRESS = 0x1000

        emu = Uc(self.opts["arch"], self.opts["mode"])
        emu.opts = self.opts
        emu.mem_map(ADDRESS, MEM_SIZE)
        emu.mem_write(ADDRESS, bin_code)
        # Write INT 3 (0xcc) bytes near the end of the code to force a stop
        emu.mem_write(ADDRESS + len(bin_code) + 0xff, b"\xcc\xcc\xcc\xcc")

        emu.hook_add(UC_HOOK_MEM_INVALID, self.hook_mem_invalid)
        emu.hook_add(UC_HOOK_MEM_WRITE, self.hook_mem_write)
        emu.hook_add(UC_HOOK_INTR, self.hook_intr)

        # Init stack to half-way the mapped mem (integer division keeps it an int)
        emu.reg_write(self.opts["stack"], ADDRESS + MEM_SIZE // 2)

        try:
            emu.emu_start(ADDRESS, len(bin_code))
        except UcError as e:
            log.error("ERROR: %s" % e)

        if self.write_bounds[0] is not None:
            # Read & return the modified code
            return emu.mem_read(self.write_bounds[0],
                                (self.write_bounds[1] - self.write_bounds[0]))
        else:
            log.warning("No self-modifying code detected, could not do anything")
            return None

    ########### HOOKS ############

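    def hook_intr(self, uc, intno, user_data):
        """Stop on INT 3 instruction"""
        if intno == 0x3:
            # The return value of an interrupt hook is not used by Unicorn,
            # so the emulation has to be stopped explicitly
            uc.emu_stop()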

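    def hook_mem_invalid(self, uc, access, address, size, value, user_data):
        """Print errors for invalid memory accesses"""
        eip = uc.reg_read(uc.opts["instr"])

        # UC_HOOK_MEM_INVALID reports the *_UNMAPPED and *_PROT access types
        if access in (UC_MEM_WRITE_UNMAPPED, UC_MEM_WRITE_PROT):
            print("invalid WRITE of 0x%x at 0x%X, data size = %u, data value = 0x%x" % (address, eip, size, value))
        elif access in (UC_MEM_READ_UNMAPPED, UC_MEM_READ_PROT):
            print("invalid READ of 0x%x at 0x%X, data size = %u" % (address, eip, size))

        # Returning False leaves the access unhandled, which stops the emulation
        return False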

    def hook_mem_write(self, uc, access, address, size, value, user_data):
        """Hook memory write instructions to detect self modifying code"""
        # Maximum distance between the instruction pointer and a tracked write
        MAX_LEN = 0x200  # 512B
        instr_ptr = uc.reg_read(uc.opts["instr"])

        if abs(instr_ptr - address) < MAX_LEN:
            # End of this write (exclusive), so the last written bytes are dumped too
            end = address + size
            if self.write_bounds[0] is None:
                # Initialize bounds to the written range
                self.write_bounds[0] = address
                self.write_bounds[1] = end
            else:
                # Expand lower / higher bound as needed
                self.write_bounds[0] = min(self.write_bounds[0], address)
                self.write_bounds[1] = max(self.write_bounds[1], end)

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print("You need to specify a file as first argument")
        sys.exit(1)

    for filename in sys.argv[1:]:
        with open(filename, "rb") as f:
            bin_code = f.read()
        decoded = DecodeEngine().decode(bin_code)
        if decoded:
            print("Shellcode decoded!")
            outf = filename + ".dcded"
            with open(outf, "wb") as f:
                f.write(decoded)
            print("Decoded shellcode has been written to %s" % outf)
        else:
            print("Could not decode the file")
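
The bookkeeping done by the write hook can be tried on its own, without Unicorn. The following is a small stand-alone model of it, with a hypothetical `WriteBounds` class fed by made-up `(instr_ptr, address, size)` events. One choice of this sketch is to treat the upper bound as `address + size`, so that the last write is fully covered by the dump.

```python
class WriteBounds:
    """Track the address range touched by writes close to the running code."""

    def __init__(self, max_distance=0x200):
        self.max_distance = max_distance  # same 512-byte window as MAX_LEN
        self.low = None                   # lowest written address
        self.high = None                  # end of the highest write (exclusive)

    def record(self, instr_ptr, address, size):
        """Record one write. Returns False when the write is too far away."""
        if abs(instr_ptr - address) >= self.max_distance:
            return False  # e.g. a push to the stack, not self-modification
        end = address + size
        self.low = address if self.low is None else min(self.low, address)
        self.high = end if self.high is None else max(self.high, end)
        return True


if __name__ == "__main__":
    bounds = WriteBounds()
    # Made-up events: one stack write, then two dword writes next to the code
    bounds.record(0x1010, 0x100FF0, 4)
    bounds.record(0x1010, 0x101B, 4)
    bounds.record(0x1010, 0x101F, 4)
    print(hex(bounds.low), hex(bounds.high))
```

The distance filter is what keeps stack traffic out of the dump: the stack sits in the middle of the 2MB mapping, far more than 512 bytes away from the shellcode.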


We can test it by generating two Metasploit shellcodes, one encoded and one without any encoding.

❯ msfvenom -a x86 --platform Windows -p windows/shell/reverse_tcp -f raw -o reverse.raw 
No encoder or badchars specified, outputting raw payload  
Payload size: 333 bytes  
Saved as: reverse.raw

❯ yara ~/malware_analysis/yara/metasploit.yar reverse.raw 
meterpreter_reverse_tcp_shellcode reverse.raw  
meterpreter_reverse_tcp_shellcode_rev1 reverse.raw

❯ msfvenom -a x86 --platform Windows -p windows/shell/reverse_tcp -e x86/shikata_ga_nai -b '\x00' -i 1 -f raw -o reverse_enc.raw
Found 1 compatible encoders  
Attempting to encode payload with 1 iterations of x86/shikata_ga_nai  
x86/shikata_ga_nai succeeded with size 360 (iteration=0)  
x86/shikata_ga_nai chosen with final size 360  
Payload size: 360 bytes  
Saved as: reverse_enc.raw

❯ yara ~/malware_analysis/yara/metasploit.yar reverse_enc.raw

We can see that the unencoded shellcode is detected as meterpreter_reverse_tcp_shellcode by Yara, while the encoded one is not. So it looks like a good target to test the decoder on.

scripts❯ python decoder.py ~/reverse_enc.raw  
Shellcode decoded!  
Decoded shellcode has been written to /home/fernando/reverse_enc.raw.dcded

scripts❯ yara ~/malware_analysis/yara/metasploit.yar ~/reverse_enc.raw.dcded  
meterpreter_reverse_tcp_shellcode /home/fernando/reverse_enc.raw.dcded  

Back to being detected!

We can even have a look at what the decoded contents are:

scripts❯ ndisasm ~/reverse.raw | more
00000000  FC                cld  
00000001  E88200            call word 0x86  
00000004  0000              add [bx+si],al  
00000006  60                pushaw  
00000007  89E5              mov bp,sp  
00000009  31C0              xor ax,ax  
0000000B  648B5030          mov dx,[fs:bx+si+0x30]  
0000000F  8B520C            mov dx,[bp+si+0xc]  
00000012  8B5214            mov dx,[bp+si+0x14]  
00000015  8B7228            mov si,[bp+si+0x28]  
00000018  0F                db 0x0f  
00000019  B74A              mov bh,0x4a  
0000001B  2631FF            es xor di,di  
0000001E  AC                lodsb  
0000001F  3C61              cmp al,0x61  

scripts❯ ndisasm ~/reverse_enc.raw.dcded | more
00000000  0FE2F5            psrad mm6,mm5  
00000003  FC                cld  
00000004  E88200            call word 0x89  
00000007  0000              add [bx+si],al  
00000009  60                pushaw  
0000000A  89E5              mov bp,sp  
0000000C  31C0              xor ax,ax  
0000000E  648B5030          mov dx,[fs:bx+si+0x30]  
00000012  8B520C            mov dx,[bp+si+0xc]  
00000015  8B5214            mov dx,[bp+si+0x14]  
00000018  8B7228            mov si,[bp+si+0x28]  
0000001B  0F                db 0x0f  
0000001C  B74A              mov bh,0x4a  
0000001E  2631FF            es xor di,di  
00000021  AC                lodsb  
00000022  3C61              cmp al,0x61  
00000024  7C02              jl 0x28  
00000026  2C20              sub al,0x20  
00000028  C1CF0D            ror di,byte 0xd  
0000002B  01C7              add di,ax  
0000002D  E2F2              loop 0x21  
0000002F  52                push dx  
00000030  57                push di  

They are the same except for the first instruction. This is because shikata ga nai also encodes the last instruction (or last bytes) of the stub, usually the loop instruction. This instruction needs to be decoded first for the stub to work properly, so it makes it into the dump as well.
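If the start of the original payload is known, those leading stub bytes can be located and cut off. Below is a minimal sketch, assuming we search the dump for the first bytes of reverse.raw as they appear in the listings above (`FC E8 82 00 00 00`). The helper name `strip_stub_residue` is hypothetical, and for an unknown payload there is no such prefix to search for.

```python
def strip_stub_residue(dump: bytes, known_prefix: bytes):
    """Return (offset, payload) where payload starts at known_prefix.

    Returns None when the prefix does not occur in the dump.
    """
    offset = dump.find(known_prefix)
    if offset < 0:
        return None
    return offset, dump[offset:]


if __name__ == "__main__":
    # First bytes of the decoded dump, copied from the ndisasm listing above
    dump = bytes.fromhex("0fe2f5fce8820000006089e531c0")
    # First bytes of the unencoded payload (cld; call ...)
    prefix = bytes.fromhex("fce882000000")
    offset, payload = strip_stub_residue(dump, prefix)
    print(offset, payload.hex())
```

For the sample shown here, the three bytes in front of the payload are the residue of the stub (the decoded `loop` instruction and the byte before it).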