Goal

Analyze code re-use in binaries to attribute unknown samples to families / threat actors. To do this, we can obtain the list of functions in an executable and hash each function's opcodes using a fuzzy hash algorithm (similar to what Diaphora does). For the fuzzy hashing I will be using ssdeep, and for the opcode extraction r2 (driven through r2pipe).
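The approach described above can be sketched roughly as follows. This is a minimal sketch and not the actual genid.py: the helper names (opcode_string, function_hashes) are made up for illustration, and it assumes that the disassembly text of each instruction is what gets hashed (hashing raw bytes would be an equally valid choice). r2pipe and ssdeep are imported lazily so that the pure helper can be used without radare2 installed.

```python
def opcode_string(pdfj):
    """Join the disassembly text of every instruction in an r2 `pdfj` result.

    `pdfj` is the dict returned by r2's `pdfj` command; instructions that
    carry no "opcode" field (e.g. invalid ones) are skipped.
    """
    ops = (pdfj or {}).get("ops", [])
    return "\n".join(op["opcode"] for op in ops if "opcode" in op)


def function_hashes(path):
    """Yield (function name, ssdeep hash) for each function r2 finds in `path`.

    Hypothetical helper: requires radare2 plus the r2pipe and ssdeep
    Python packages.
    """
    import r2pipe
    import ssdeep

    r2 = r2pipe.open(path)
    try:
        r2.cmd("aaa")  # run r2's auto-analysis so functions are discovered
        for fcn in r2.cmdj("aflj") or []:
            # Assumption: the address key is "offset" or "addr" depending
            # on the r2 version in use.
            addr = fcn.get("offset", fcn.get("addr"))
            if addr is None:
                continue
            text = opcode_string(r2.cmdj("pdfj @ %d" % addr))
            if text:
                yield fcn["name"], ssdeep.hash(text)
    finally:
        r2.quit()
```

Calling function_hashes("sample.exe") on an unpacked sample would then produce one fuzzy hash per function, ready to be compared against stored signatures.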

Data preparation

First, we need some data to work on (hashes + samples). I will be using a QuantLoader sample set, as it is a rather simple piece of malware.

With that in mind, we need to identify some landmark functions in a Quant sample.

Quant uses a rather characteristic string encryption method. We may use the string decryption function as one of our landmark functions.

CnC decryption routine

The function that returns the CnC is also quite characteristic:

CnC building

However, as it consists mostly of movs of hard-coded data, its hash changes with the CnC, so we won't be able to match the signature against samples with different CnCs. Instead of hashing that function, we can hash the one right after it, which is in charge of making requests to the CnC.

As a result we have the following:

{
    "Quant": [{
        "hash": "3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r",
        "description": "Quant decode string procedure"
    },
    {
        "hash": "6:ZPluXsUhRlvEaChTcZaOE2Lm9KwqFeopmzdwrkm:OsUfV41Ijmgw8mhwz",
        "description": "Contact CnC procedure"
    }]
}
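A signature file like this one could be consumed along these lines. This is only a sketch under a few assumptions: the function names and the threshold of 50 are illustrative and not taken from genid.py, and the comparison function is passed in as a parameter so that ssdeep.compare (which returns a score from 0 to 100) can be plugged in where the ssdeep package is available.

```python
import json

# The two Quant signatures shown above, in the same JSON layout.
SIGNATURES_JSON = """
{
    "Quant": [{
        "hash": "3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r",
        "description": "Quant decode string procedure"
    },
    {
        "hash": "6:ZPluXsUhRlvEaChTcZaOE2Lm9KwqFeopmzdwrkm:OsUfV41Ijmgw8mhwz",
        "description": "Contact CnC procedure"
    }]
}
"""


def load_signatures(text):
    """Parse the family -> [{hash, description}] layout into a dict."""
    return json.loads(text)


def match_function(fcn_hash, signatures, compare, threshold=50):
    """Return (family, description, score) for every stored hash whose
    similarity to `fcn_hash` reaches `threshold`, best match first.

    `compare(stored, candidate)` must return a 0-100 score;
    ssdeep.compare is the intended choice. The default threshold is an
    arbitrary illustrative value.
    """
    hits = []
    for family, entries in signatures.items():
        for entry in entries:
            score = compare(entry["hash"], fcn_hash)
            if score >= threshold:
                hits.append((family, entry["description"], score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

With ssdeep installed, match_function(h, load_signatures(SIGNATURES_JSON), ssdeep.compare) would be run once per function hash h extracted from a sample.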

Results

We will be comparing the aforementioned signatures against a Quant sample set containing 20 samples.

$ python genid.py ../../samples/quant/quantloader_* 2>/dev/null
[*] ../../samples/quant/quantloader_1.48.exe - Family: Quant
    Detected function: sub.b14da032ad1d0e2e011e477ae4391609_45c
    Similarity: 58
        Malicious (stored) hash: 3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r
        Hash for function: 3:ZgLFyPPRph9619nDlmMXWQTx4DNlhkPTfPlgIQ1xPsgR/AhE7r:eyHRphA1pDlFGQTx4DyPjlgIsXf7r
    Description: Quant decode string procedure
 
 
[*] ../../samples/quant/quantloader_1.53_.unpacked.exe - Family: Quant
    Detected function: sub.d98a73e2ce8e0244de44251c54f0d4c4_45c
    Similarity: 100
        Malicious (stored) hash: 3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r
        Hash for function: 3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r
    Description: Quant decode string procedure
 
 
[*] ../../samples/quant/quantloader_1.53_.unpacked.exe - Family: Quant
    Detected function: fcn.00401c64
    Similarity: 78
        Malicious (stored) hash: 6:ZPluXsUhRlvEaChTcZaOE2Lm9KwqFeopmzdwrkm:OsUfV41Ijmgw8mhwz
        Hash for function: 6:ZPluXsUhE/vtnvEaChTsZaOE27m9KwqFe4pmzdgIkm:OsUe/vV41YzmgwsmhgA
    Description: Contact CnC procedure
 
 
[*] ../../samples/quant/quantloader_1.54.exe - Family: Quant
    Detected function: fcn.00401c64
    Similarity: 78
        Malicious (stored) hash: 6:ZPluXsUhRlvEaChTcZaOE2Lm9KwqFeopmzdwrkm:OsUfV41Ijmgw8mhwz
        Hash for function: 6:ZPluXsUhE/vtnvEaChTsZaOE27m9KwqFe4pmzdgIkm:OsUe/vV41YzmgwsmhgA
    Description: Contact CnC procedure
 
 
[*] ../../samples/quant/quantloader_1.54.exe - Family: Quant
    Detected function: sub.8c6a5055bef90bf2cbcbd735056e419e_45c
    Similarity: 100
        Malicious (stored) hash: 3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r
        Hash for function: 3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r
    Description: Quant decode string procedure
 
 
[*] ../../samples/quant/quantloader_1.61.exe - Family: Quant
    Detected function: sub.2196163e39ca009cbbd36bd3220db6a8_45c
    Similarity: 54
        Malicious (stored) hash: 3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r
        Hash for function: 3:ZgLFyPPRph7IGmVklmsW5x4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7a+llW5x4Q4PjlgIsuf7r
    Description: Quant decode string procedure
 
 
[*] ../../samples/quant/quantloader_1.61.exe - Family: Quant
    Detected function: fcn.00401c64
    Similarity: 100
        Malicious (stored) hash: 6:ZPluXsUhRlvEaChTcZaOE2Lm9KwqFeopmzdwrkm:OsUfV41Ijmgw8mhwz
        Hash for function: 6:ZPluXsUhRlvEaChTcZaOE2Lm9KwqFeopmzdwrkm:OsUfV41Ijmgw8mhwz
    Description: Contact CnC procedure
 
 
[*] ../../samples/quant/quantloader_1.73.exe - Family: Quant
    Detected function: sub.871806ab0a390607f3da446a8407c104_45c
    Similarity: 60
        Malicious (stored) hash: 3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r
        Hash for function: 3:ZgLFyPPRph7Y61NnDlm8XWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7515Dl1GQTx4Q4PjlgIsuf7r
    Description: Quant decode string procedure
 
 
[*] ../../samples/quant/quantloader_1.73.exe - Family: Quant
    Detected function: fcn.00401c64
    Similarity: 57
        Malicious (stored) hash: 6:ZPluXsUhRlvEaChTcZaOE2Lm9KwqFeopmzdwrkm:OsUfV41Ijmgw8mhwz
        Hash for function: 6:ZPluXsUh0/vtMNEaChTCPZaOE2QahGJV/OVsXgqFei1d6wGBRmAn:OsUO/vyN41CDYSyTwi/6FBHn
    Description: Contact CnC procedure
 
 
[*] ../../samples/quant/quantloader_1.731.73.exe - Family: Quant
    Detected function: sub.871806ab0a390607f3da446a8407c104_45c
    Similarity: 60
        Malicious (stored) hash: 3:ZgLFyPPRph7o619nDlmMXWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7p1pDlFGQTx4Q4PjlgIsuf7r
        Hash for function: 3:ZgLFyPPRph7Y61NnDlm8XWQTx4Q1VkPTfPlgIQ1xPsgQAhE7r:eyHRph7515Dl1GQTx4Q4PjlgIsuf7r
    Description: Quant decode string procedure
 
 
[*] ../../samples/quant/quantloader_1.731.73.exe - Family: Quant
    Detected function: fcn.00401c64
    Similarity: 57
        Malicious (stored) hash: 6:ZPluXsUhRlvEaChTcZaOE2Lm9KwqFeopmzdwrkm:OsUfV41Ijmgw8mhwz
        Hash for function: 6:ZPluXsUh0/vtMNEaChTCPZaOE2QahGJV/OVsXgqFei1d6wGBRmAn:OsUO/vyN41CDYSyTwi/6FBHn
    Description: Contact CnC procedure

Okay, we get plenty of matches with varying similarity scores. Let's look into the first one, for example, which has a fairly low similarity score (58).

Comparison of the detected function vs the one used to generate the hash

We can see that both functions are identical except for the hard-coded constants.
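Since only the constants differ, one possible refinement is to mask immediate values before hashing, so that they no longer influence the fuzzy hash. This is not something the script used above is stated to do; mask_constants is a hypothetical helper, and the regular expression assumes r2-style Intel syntax where immediates appear as hex (0x...) or plain decimal numbers.

```python
import re

# Matches standalone hex or decimal literals; digits embedded in
# register names (r8, xmm0) are not preceded by a word boundary and
# are therefore left alone.
_IMMEDIATE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")


def mask_constants(opcode):
    """Replace every immediate / displacement in an opcode string with IMM."""
    return _IMMEDIATE.sub("IMM", opcode)
```

The masked strings would be joined and hashed instead of the original opcodes. The trade-off is that functions which differ only in their constants become indistinguishable, which may or may not be desirable depending on the landmark function.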

Packing

Of course, everything proposed so far works only for unpacked samples, and the vast majority of malware comes packed. As manually unpacking every sample we want to analyze is not practical, we should find an alternative.

Generic unpacking (generating a working, unpacked executable) is not easy and won't work 100% of the time. However, we don't need a working executable; a memory-dumped executable image that r2 / IDA can parse will suffice.

Cuckoo already does this, so we can grab the executables from there, or implement similar functionality in a Cuckoo-independent fashion.

Here are the results of running the same script over Hancitor images extracted from memory:

[*] Processing ../../samples/hancitor/extracted/1784-3b036c80506bcba9.exe_
[*] ../../samples/hancitor/extracted/1784-3b036c80506bcba9.exe_ - Family: Hancitor
	Detected function: fcn.00282070
	Similarity: 69
		Malicious (stored) hash: 6:nTJHO5i3HVyEzT0oieH2tODjrPkjLR6GB2PPLR6TVuFuHQ/B:tHO5iBzWtQXsfRJwXLRCe0UB
		Hash for function: 6:nTJH+TdGV3HVgExT0oieH2tADjxPqkAfjLXiGBoPdLXiTVuFu1Q/B:tHs0VVzWtADEVfB4LaeOUB
	Description: Get CnC


[*] Processing ../../samples/hancitor/extracted/2620-aafa962e1332ef89.exe_
[*] Processing ../../samples/hancitor/extracted/416-78f210b6e71539ba.exe_
[*] ../../samples/hancitor/extracted/416-78f210b6e71539ba.exe_ - Family: Hancitor
	Detected function: sub.http:__tinheranter.com_ls5_gate.php_http:__ningwitjohnno.ru_ls5_gate.php_http:__tycahatit.ru_ls5_gate.php_98c
	Similarity: 100
		Malicious (stored) hash: 3:FdW853QUQEeVJR/V8Az/dQERQUHA/fFP3AhbXRUdIIAJVFVbSfIANtUhgUWTwPV+:nrRQ/35V/SERBsfFP3abid90VFVbSQIL
		Hash for function: 3:FdW853QUQEeVJR/V8Az/dQERQUHA/fFP3AhbXRUdIIAJVFVbSfIANtUhgUWTwPV+:nrRQ/35V/SERBsfFP3abid90VFVbSQIL
	Description: Get CnC (2)


[*] Processing ../../samples/hancitor/extracted/416-811a273be4a4bda1.exe_
[*] ../../samples/hancitor/extracted/416-811a273be4a4bda1.exe_ - Family: Hancitor
	Detected function: sub.http:__tinheranter.com_ls5_gate.php_http:__ningwitjohnno.ru_ls5_gate.php_http:__tycahatit.ru_ls5_gate.php_98c
	Similarity: 100
		Malicious (stored) hash: 3:FdW853QUQEeVJR/V8Az/dQERQUHA/fFP3AhbXRUdIIAJVFVbSfIANtUhgUWTwPV+:nrRQ/35V/SERBsfFP3abid90VFVbSQIL
		Hash for function: 3:FdW853QUQEeVJR/V8Az/dQERQUHA/fFP3AhbXRUdIIAJVFVbSfIANtUhgUWTwPV+:nrRQ/35V/SERBsfFP3abid90VFVbSQIL
	Description: Get CnC (2)