Cobalt Strike 4.10: Through the BeaconGate

 

Cobalt Strike 4.10 is now available. This release introduces BeaconGate, the Postex Kit, and Sleepmask-VS. In addition, we have overhauled the Sleepmask API, refreshed the Jobs UI, added new BOF APIs, added support for hot swapping C2 hosts, and more. This release cycle was longer than previous ones, which allowed us to make underlying architectural changes to support our longer-term ambitions.

Note: Cobalt Strike 4.10 introduces breaking changes to the update application. Licensed users will need to download version 4.10 from scratch. The existing 4.9 update application cannot be used to upgrade to version 4.10.

BeaconGate

Over the past few years there has been a dramatic increase in detection logic for anomalous API calls. For example, open-source projects such as syscall-detect, MalMemDetect, Hunt-Sleeping-Beacons, and pe-sieve all demonstrate the value of hunting for suspicious API calls from unbacked memory. Additionally, Elastic has pushed the defensive industry forward with their anomalous call stack detection logic that is a formidable challenge for modern red team operations.

Furthermore, prior to this release, it was difficult for operators to address the challenges outlined above with Cobalt Strike. For example, it was not possible to build on Beacon’s system call implementation and the only way to obtain granular control over Beacon’s API calls was via IAT hooking in a UDRL, which is complex and has a high barrier to entry.

As Cobalt Strike specialises in evasion through flexibility, this was a critical problem to solve and one of our key priorities for this release. Additionally, we wanted to provide a solution that avoided getting bogged down in complex implementation details and made it easy for users to apply custom TTPs to Beacon’s API calls. Our solution to these problems is BeaconGate.

At a high level, the Sleepmask is conceptually similar to a Remote Procedure Call (RPC), albeit within the same process address space. For example, when Beacon sleeps, it will call into the Sleepmask BOF, mask, and sleep. Beacon here acts as the ‘client’ and the Sleepmask is the ‘server’ that executes the Sleep call on behalf of Beacon. In Cobalt Strike 4.10, we have taken this idea to its logical conclusion and the Sleepmask now supports the execution of arbitrary functions. Therefore, it is now possible to configure Beacon to forward its Windows API calls to be executed via the Sleepmask (aka BeaconGate).

This offers operators unprecedented control and flexibility; we tell you what Beacon wants to call (and the arguments), and you can do what you want with it. Hence, BeaconGate gives users the ability to implement bleeding-edge call stack spoofing TTPs and apply them universally to Beacon’s WinAPI calls. Additionally, as Beacon is now decoupled from its WinAPI calls, you can also mask Beacon while calling a potentially suspicious API. This is all implemented as a BOF, so you can configure different gates and completely change your TTPs by swapping out different Sleepmask BOFs.
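The client/server relationship described above can be modelled in a few lines of C++. The sketch below is purely conceptual and is not the Sleepmask-VS API: CallRequest, GateLog, and runGate are hypothetical names, and the pre/post hooks are placeholders that only count invocations. It simply shows a caller describing a function and its arguments, and a separate gate routine making the call on its behalf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Generic function pointer type used only for storage; it is cast back
// to the real signature before the call is made.
using GenericFn = void (*)();

// Hypothetical stand-in for the call description passed to a gate.
struct CallRequest {
    GenericFn      function;    // function the caller wants executed
    bool           maskCaller;  // did the caller ask to be masked during the call?
    std::size_t    argCount;    // number of valid entries in args (0 to 4 here)
    std::uintptr_t args[4];     // pointer-sized arguments
    std::uintptr_t result;      // return value, written by the gate
};

// Counts hook invocations so the gate's behaviour is observable.
struct GateLog {
    int maskCalls = 0;
    int unmaskCalls = 0;
};

// The "server" side: run optional hooks, then make the call for the caller.
inline void runGate(CallRequest& call, GateLog& log) {
    assert(call.function != nullptr && call.argCount <= 4);
    using U = std::uintptr_t;
    const U* a = call.args;

    if (call.maskCaller) ++log.maskCalls;  // placeholder for a pre-call hook

    switch (call.argCount) {
    case 0: call.result = reinterpret_cast<U (*)()>(call.function)(); break;
    case 1: call.result = reinterpret_cast<U (*)(U)>(call.function)(a[0]); break;
    case 2: call.result = reinterpret_cast<U (*)(U, U)>(call.function)(a[0], a[1]); break;
    case 3: call.result = reinterpret_cast<U (*)(U, U, U)>(call.function)(a[0], a[1], a[2]); break;
    default: call.result = reinterpret_cast<U (*)(U, U, U, U)>(call.function)(a[0], a[1], a[2], a[3]); break;
    }

    if (call.maskCaller) ++log.unmaskCalls;  // placeholder for a post-call hook
}

// Example target with two pointer-sized arguments.
inline std::uintptr_t addTwo(std::uintptr_t a, std::uintptr_t b) { return a + b; }
```

In this model the caller never invokes addTwo directly; it fills in a CallRequest, hands it to runGate, and reads result afterwards. Swapping in a different gate changes how the call is made without touching the caller.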

By default, if you enable an API to be proxied via BeaconGate, Beacon will be masked while the API is executed. This means that out of the box, Beacon now has mask-and-call functionality. This is a useful mitigation against AV vendors who may trigger scans based on Kernel callbacks/ETW TI events.

BeaconGate can be configured by setting the new stage.beacon_gate Malleable C2 option, as demonstrated below:

stage {
    beacon_gate {
        All;
    }
}

Valid values for this option are:  

  • Comms – Currently this is InternetOpenA and InternetConnectA (i.e., HTTP(S) WinInet Beacons only)
  • Core – These are the Windows API equivalents (e.g., VirtualAlloc) of Beacon’s existing system call API. See the BeaconGate documentation for the full list of supported functions.
  • Cleanup – Currently this supports proxying ExitThread via the Sleepmask. If this is enabled, then by default the Sleepmask will scrub/free Beacon from memory before exiting. Additionally, this provides an opportunity for operators to perform custom clean up before Beacon exits.
  • All – Comms + Core + Cleanup. 

It is also possible to forward specific functions from the supported set with the following syntax:

stage {
    beacon_gate {
        VirtualAlloc;
        VirtualAllocEx;
        InternetConnectA;
    }
}

As a note, some more intensive Beacon commands, such as ps, may spike CPU usage if you have the Core set enabled. This is expected behaviour, as ps will call OpenProcess/CloseHandle multiple times and Beacon is masked for each call. If desired, you can disable BeaconGate at runtime via beacon_gate disable or alternatively disable the masking for specific functions in your own Sleepmask BOF.

It is also important to point out that BeaconGate and Cobalt Strike’s existing syscall_method option are mutually exclusive; if you enable BeaconGate for an API, it will take precedence over system calls. However, you can enable BeaconGate for a specific API and use Beacon’s existing system call method for the rest. For example:

stage {
    set syscall_method "Indirect";
    beacon_gate {
        VirtualAlloc;  # Only VirtualAlloc is proxied via BeaconGate
    }
}
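The precedence rule in the profile above can be expressed as a small decision function. This is a minimal sketch of the rule as described in this post, not Beacon's implementation: StageConfig, Route, and resolveRoute are hypothetical names, and the syscallApis set stands in for whichever functions the system call API covers.

```cpp
#include <set>
#include <string>

// Where a given API call ends up (conceptual model only).
enum class Route { BeaconGate, SystemCall, DirectApi };

// Hypothetical view of the relevant stage settings.
struct StageConfig {
    std::set<std::string> gatedApis;    // functions listed in beacon_gate
    std::string           syscallMethod; // "None", "Direct" or "Indirect"
    std::set<std::string> syscallApis;  // functions covered by the system call API (assumed)
};

// BeaconGate wins for any API it is enabled for; otherwise the configured
// system call method applies to the functions it supports.
inline Route resolveRoute(const StageConfig& cfg, const std::string& api) {
    if (cfg.gatedApis.count(api) != 0) {
        return Route::BeaconGate;
    }
    if (cfg.syscallMethod != "None" && cfg.syscallApis.count(api) != 0) {
        return Route::SystemCall;
    }
    return Route::DirectApi;
}
```

With VirtualAlloc gated and syscall_method set to "Indirect", this model sends VirtualAlloc to the gate and leaves the remaining supported functions on the system call path.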

Building On Top Of BeaconGate with Custom Sleepmask BOFs

In the previous section we covered BeaconGate’s default behaviour. However, the real power comes from building on top of BeaconGate. The potential here is unlimited; your own gate can implement novel system call techniques, spoof the call stack, fake the return address, utilize SilentMoonwalk, etc. – all while Beacon is masked (if desired). 

Beacon provides the higher level WinAPI (i.e., VirtualAlloc as opposed to NtAllocateVirtualMemory) to provide as much flexibility as possible. Hence, you can implement your own system call/gate for the NT function (e.g. something like RecycledGate) or unhook and call the original WinAPI function with a spoofed call stack etc. 
 
To demonstrate the possibilities, below is a quick PoC of BeaconGate implementing return address spoofing (while Beacon is masked) for Beacon’s InternetOpenA calls:

Fig 1. A screenshot showing BeaconGate implementing return address spoofing. A breakpoint has been triggered in WinDbg on WININET!InternetOpenA and the calling thread’s call stack is displayed (via the knf command). The call stack shows the calling function as WININET!UrlCacheFindFirstEntry; however, this has been spoofed and the call is actually being proxied via the Sleepmask. Furthermore, the PowerShell terminal displays a YARA scan on the debugged process after this breakpoint has been hit. This reveals no hits, as Beacon is masked while the InternetOpenA call is made (prior to 4.10 Beacon would be exposed in memory at this point).

This example demonstrates that it is now possible to evade detection logic for anomalous InternetOpenA/InternetConnectA calls, such as in MalMemDetect. However, the same technique could be applied to all the supported WinAPI functions. Additionally, the supported APIs cover the majority of detection use cases for anomalous call stacks from both ETW TI events and Kernel callbacks (with some exceptions, e.g. CreateProcess).

Sleepmask-VS

The Sleepmask is an essential part of Cobalt Strike’s evasion strategy. It started out as a tiny Position Independent Code (PIC) blob that could be stomped into Beacon. However, it has since grown into a fully featured BOF. While this change provided a huge amount of flexibility, its rapid growth has also made the Sleepmask quite complex, which means (in our experience) users often shy away from building on it.  

Furthermore, in dogfooding BeaconGate internally, we found it difficult to write custom Sleepmask BOFs with existing tooling. Therefore, one of the aims of this release was to lower the barrier to entry for writing custom Sleepmasks. 

As a result, we have updated our public BOF-VS template to support Sleepmask and BeaconGate functionality. This means our BOF-VS template is now a one-stop shop for writing all the various BOFs used by Cobalt Strike.  

Additionally, to provide a working Sleepmask BOF example we have also published Sleepmask-VS. This is a simple Sleepmask example that demonstrates how to use the BOF-VS template to write Sleepmask/BeaconGate BOFs. This repository will grow over time to contain a variety of different examples. In addition, it will be used as the accompanying Sleepmask BOF to the UDRL-VS loaders so that we can provide examples of how to use Cobalt Strike’s most important evasion tools together. 

Sleepmask-VS contains runMockedSleepMask() and runMockedBeaconGate() to make it easy to create custom Sleepmask/BeaconGate BOFs. These two functions are similar to the original BOF-VS runMocked() function, except they create a mock in-memory Beacon as well as some example heap memory. This allows users to step through their Sleepmask in the debugger and see the effects of their masking. These functions also allow users to provide their desired Malleable C2 settings to mimic the behaviour of Beacon’s default loader. 

An example call to runMockedSleepMask can be seen below:

int main(int argc, char* argv[]) { 
    bof::runMockedSleepMask(sleep_mask, 
        { 
            .allocator = bof::profile::Allocator::VirtualAlloc, 
            .obfuscate = bof::profile::Obfuscate::False, 
            .useRWX = bof::profile::UseRWX::False, 
            .module = "", 
        }, 
        { 
            .sleepTimeMs = 5000, 
            .runForever = true, 
        } 
    ); 
    return 0;
}

Sleepmask-VS also provides an example of how to use the runMockedBeaconGate function. This function replicates Beacon invoking the Sleepmask with a BeaconGate call and also passes in mocked Beacon/heap memory to be masked. This makes it easy for operators to start developing their own custom gates.  

For example, the sample code below demonstrates proxying a VirtualAlloc call through BeaconGate:

// Create a FUNCTION_CALL structure
FUNCTION_CALL functionCall = bof::mock::createFunctionCallStructure( 
    VirtualAlloc,   // Function Pointer 
    WinApi::VIRTUALALLOC, // Human readable WinApi enum 
    TRUE, // Mask Beacon? 
    4, // Number of Arguments (for VirtualAlloc) 
    GateArg(NULL),  // VirtualAlloc Arg1/Rcx 
    GateArg(0x1000), // VirtualAlloc Arg2 /Rdx 
    GateArg(MEM_RESERVE | MEM_COMMIT), // VirtualAlloc Arg3/R8 
    GateArg(PAGE_EXECUTE_READWRITE) // VirtualAlloc Arg4/R9 
); 

// Run BeaconGate 
bof::runMockedBeaconGate(sleep_mask, &functionCall, 
    { 
        .allocator = bof::profile::Allocator::VirtualAlloc, 
        .obfuscate = bof::profile::Obfuscate::False, 
        .useRWX = bof::profile::UseRWX::False, 
        .module = "", 
    }
); 

// Free the memory allocated by BeaconGate 
VirtualFree((LPVOID)functionCall.retValue, 0, MEM_RELEASE);

The FUNCTION_CALL structure contains all the information required to execute an “atomic” function call and is what is passed by Beacon to the Sleepmask as part of BeaconGate. createFunctionCallStructure is a helper function which makes it easy to generate these structures for use in your own code. Lastly, the bof::runMockedBeaconGate function will call the Sleepmask entry point and pass your FUNCTION_CALL for it to be executed by BeaconGate. For more details on the exact API usage and function definitions, see Sleepmask-VS.
 
There will be a further deep dive on BeaconGate post-release that will demonstrate how to get started developing your own custom TTPs and demonstrate a few different open-source gates. As a taster, the return address spoofing PoC demonstrated previously was developed using Sleepmask-VS. 

Additionally, we also identified that inline assembly was an important capability to port low-level techniques such as RecycledGate to BeaconGate. Hence, we will also discuss how to do this in the upcoming blog.  

This is currently possible with Sleepmask-VS by using ld to combine two object files, although it is not ideal (NB: MSVC’s link.exe does not support this). You will need to compile your assembly stub into a separate object file (e.g., via MASM/ml64.exe) and then manually combine it with the sleepmask.x64.o produced by Sleepmask-VS:

> ml64.exe /Fo asm_funcs.o /c asm_funcs.asm 
Microsoft (R) Macro Assembler (x64) Version 11.00.50727.1 

Copyright (C) Microsoft Corporation.  All rights reserved. 

Assembling: asm_funcs.asm 
[ ... ] 

(In WSL/Linux) 
> ld --oformat pe-x86-64 -r sleepmask.x64.o asm_funcs.o -o sleepmask.x64.o

Lastly, to enable operators to get the most out of BeaconGate, we have bumped the max Sleepmask BOF size.

Beacon Object File Updates

We have also made a number of changes to help users get more out of BOFs in this release. 

We have expanded the BOF API to expose Beacon’s system call functionality to BOFs. The new APIs take the form of Beacon<WinAPI>, e.g. BeaconVirtualAlloc. We added new APIs in order to give operators as much flexibility as possible. Hence, users can ‘opt in’ to using Beacon’s system call API if desired, as opposed to transparently linking and not having a choice.

As an example, the code below will route the VirtualAlloc call through Beacon’s system call code:

void go(char* args, int len) {  
    PVOID pMemoryBuffer = NULL;  
    pMemoryBuffer = BeaconVirtualAlloc(NULL, 8, MEM_COMMIT, PAGE_READWRITE);  
} 

The system call method used by Beacon will be the option configured via the Malleable C2 syscall_method option or via the runtime syscall-method command.

Furthermore, these new BOF APIs are also supported by BeaconGate. Hence, if you have your own custom gate configured, you can proxy WinAPI calls from a BOF to be executed by your custom gate. This gives operators complete control over Beacon’s API usage/footprint and reduces BOF code bloat.
 
Further details on the new APIs can be found in the documentation here. Additionally, an example BOF can be found in the bof_template in the public Cobalt Strike GitHub repository, which demonstrates a trivial example of using the new APIs to allocate and free memory. 
 
A new Beacon API, BeaconGetSyscallInformation, has also been added. This means you can now implement any syscall resolving technique you want in your loader, pass the resolved syscall info to Beacon via Beacon User Data (BUD), and then retrieve it from within a BOF. This is intended to reduce the bloat within BOFs caused by having to repeatedly calculate syscall info yourself. For more detailed information on the API see the documentation here.
 
As a note, the bud-loader in UDRL-VS demonstrates how to pass resolved syscall info to Beacon via BUD and our public BOF-VS template contains a mocked BeaconGetSyscallInformation API, making it easy to integrate into your own BOFs.
 
Lastly, the BOF API limit has been expanded to 128 ( 😉 @s4ntiago_p ).

Sleepmask Redux

From talking to customers, we are aware of confusion around the interoperability between the Sleepmask and UDRLs. The confusion stems from the fact that transformations set in the Malleable C2 profile are not applied to Beacons generated via the BEACON_RDLL_GENERATE hook. In contrast, the Sleepmask settings are statically calculated from the Malleable C2 profile and Beacon DLL irrespective of whether you’re using a UDRL.

For example, stage.obfuscate can be used to obfuscate parts of the Beacon DLL and it also instructs the default loader not to copy the PE header into memory as part of the reflective loading process. However, this does not apply to Beacons with UDRLs. This is expected behaviour, as it puts the developer in the driving seat (i.e., the UDRL must know how Beacon has been obfuscated in order to reverse it). However, the Sleepmask will use stage.obfuscate to calculate what sections it needs to mask, and hence will assume there is no PE header present. This is an obvious source of issues if a UDRL does not honor the Malleable C2 profile settings.

The introduction of BeaconGate meant we had to make some breaking changes to the Sleepmask API and this also provided us with an opportunity to address this issue. In CS 4.10, we have expanded Beacon User Data to include a new ALLOCATED_MEMORY structure. This structure can be used to pass information to Beacon about (dynamic) memory allocated by the reflective loader. For example, Beacon’s location in memory and the address of each loaded section. This design means that Beacon can now pass the Sleepmask accurate section information, at run time, which greatly simplifies Sleepmask design. This feature also opens up a lot of possibilities, as any memory allocated by the loader can now be automatically masked by the Sleepmask.  

For specific implementation details, the bud-loader and the obfuscation-loader in UDRL-VS contain comprehensive examples to demonstrate how to use ALLOCATED_MEMORY in a UDRL. The design was heavily influenced by Microsoft’s own abstractions around Virtual Memory and we are planning to release a deep dive on how to get the most out of this feature in the coming months. 

It is important to note that configuring the Sleepmask via the ALLOCATED_MEMORY / BEACON_USER_DATA structures is the intended workflow as of the 4.10 release. Beacon will still try to mask on a best-effort basis if you do not pass this information (i.e., statically), but it may not work as expected. However, in future releases we plan to remove backwards compatibility. This means that UDRLs will then need to pass allocated memory to Beacon in order to use the Sleepmask.

User Defined BOF Memory

We have also added support to the ALLOCATED_MEMORY structure for passing Beacon user defined memory which can be used for BOF and Sleepmask execution. Therefore, if you want full control over how the memory used for BOFs is allocated, you can employ your own custom allocation technique in a UDRL and pass this information to Beacon. This now enables operators to employ techniques such as module stomping when loading/executing BOFs. The bud-loader in UDRL-VS contains an example of how to pass user defined memory to Beacon for use with inline-execute and the Sleepmask.

Postex Kit

Another new addition in 4.10 is the Postex kit. The Postex kit opens up Beacon’s job architecture to allow operators to develop their own post-ex DLLs for interoperability with Beacon. Hence, if you need to quickly PoC a custom keylogger/session monitor/TGT monitor etc., you can use the Postex kit to develop a DLL which can plug seamlessly into Beacon’s existing jobs architecture. Furthermore, DLLs generally are simpler to develop and unit test for complex/long running tasks and suffer from none of the pain points and limitations which can make BOF development difficult. 

It is important to highlight that the Postex kit also supports post-ex UDRLs (introduced in 4.9), via the POSTEX_RDLL_GENERATE Aggressor hook, and Process Injection hooks, via the PROCESS_INJECTION_* Aggressor hooks. This gives operators full control over the whole post-ex attack chain, in terms of custom capabilities and how they are injected/loaded into memory. The process injection kit is still (in our experience) underutilised and it is worth checking out this blog for more information on how to configure it.
 
The Postex kit itself is primarily intended to serve as a template for development. It is a Visual Studio solution that can be found in the Arsenal kit and makes it easy to develop custom long running post-ex DLLs that return data back to Beacon over a named pipe. It includes a library of functions which provide an abstraction over the job architecture allowing operators to focus purely on developing custom tooling. As a note, the Postex kit has been designed in a way that makes it possible to provide support for alternate methods of communication in future releases (i.e. not just named pipes). 
 
To support the Postex kit, a new execute-dll command has been added to the Beacon console. This will take a custom post-ex DLL provided by the operator, prepend a post-ex loader to it, and execute it as a new job. This job can be seen via the normal Cobalt Strike jobs output and killed via the jobkill command.  

Additionally, the execute-dll command also supports passing arguments. These are automatically patched into a separate memory allocation and can be accessed from within the post-ex DLL via postexData->UserArgumentInfo.Buffer (see the Postex kit example DLL for more information).
 
However, one of the most powerful features of Cobalt Strike is its scripting language, Aggressor Script, which provides a huge amount of flexibility to operators. Hence, we have also added a new Aggressor Script function beacon_execute_postex_job.  

This works in a similar way to execute-dll except it supports passing BOF style arguments (i.e. via Aggressor Script’s bof_pack function) to your custom post-ex DLL. This enables operators to use the familiar Beacon Data Parser and Beacon Data Format APIs from within their post-ex DLLs. A trivial example of beacon_execute_postex_job is shown below:

        $argument_string = "example argument string"; 
        $packed_arguments = bof_pack($beacon_id, "iz", 4444, $argument_string); 
 
        # example: run the postex task 
        beacon_execute_postex_job($beacon_id, $pid, $postex_dll, $packed_arguments, $null);

From the post-ex DLL, the packed arguments can then be parsed via the standard BeaconDataParse/Extract APIs. As ever with Aggressor Script, there is huge scope for customisation here. For example, if desired, you could also stomp arguments directly into the post-ex DLL and retrieve them yourself. 
 
Furthermore, we have also added another Aggressor Script function, bjob_send_data. This means that operators can now send arbitrary data to a custom post-ex job via a named pipe. An example demonstrating this can be seen below:

        # String to send to the post-ex dll         
        $pipe_data= "example pipe data string"; 
 
        # Send the string to target post-ex job over named pipe       
        bjob_send_data($bid, $jid, $pipe_data);    

This provides a huge amount of flexibility in your tooling. As a quick example, below is a screenshot of a custom inject-assembly PoC that we developed internally to dogfood the Postex kit:

Fig 2. A screenshot showing a custom inject-assembly PoC demonstrating use of the Postex kit. This example post-ex DLL waits for arguments to be passed via a named pipe and repeatedly executes the DotNetHelloWorld.exe assembly with the passed arguments. 

The inject-assembly command above uses beacon_execute_postex_job under the hood to inject the PoC post-ex DLL into a remote process along with the user provided .NET assembly. The post-ex DLL then sits listening on a named pipe waiting for the user to send some arguments via bjob_send_data. In the example above we’ve run the target assembly twice using a different set of arguments.  

The Postex kit also contains an example post-ex DLL (which must be used in conjunction with the postex-example.cna) to get up and running to start developing your own tooling. This example post-ex DLL is primarily intended to demonstrate different aspects of the Postex kit in one place. Additionally, the Postex kit documentation can be found here. As with other new 4.10 features, we plan to release a deep dive on the Postex kit (and the custom inject-assembly DLL demonstrated above) in the upcoming months. 

Callbacks Update

Callbacks were introduced in the 4.9 release and have been extended in 4.10 to provide a straightforward method for users to interact with the Postex kit. 
 
For example, you can also call beacon_execute_postex_job and provide a custom callback function as the last argument, which will be invoked each time the job checks in. The custom callback is passed a new %infomap hash map, which contains various information, such as the status of the job (i.e. has the job just been registered, completed, is it sending output etc.) and its job id. The key point is that the callback has access to the job id (once the job has been registered) which can then be used to send further data via bjob_send_data().

A quick example of this functionality is shown in the cna script snippet below, which demonstrates sending further data to a custom post-ex DLL via a callback once it has been registered:

        $pipe_data = "example pipe data string";

        # define custom callback function
        $callback = lambda({
            local('$bid $data $result %info $type');
            this('$jid');

            # get all arguments passed to lambda
            ($bid, $result, %info) = @_;

            # check the status/type of the job
            $type = %info["type"];

            # if the job is registered, send data via the pipe
            if ($type eq 'job_registered') {
                $jid = %info['jid'];

                # send the pipe data string to the DLL
                bjob_send_data($bid, $jid, $pipe_data);
            }

            # print output to the console
            else if ($type eq 'output') {
                bjoblog($bid, $jid, "Received output:\n" . $result);
            }

        }, $pipe_data => $pipe_data);

        # run the postex task...
        beacon_execute_postex_job($beacon_id, $pid, $postex_dll, $packed_arguments, $callback);

Job Browser and Console

To enable you to get the most out of the Postex kit we have given Cobalt Strike’s job UI an update with the introduction of the new job browser and job console. This has also been a common pain point for customers, as prior to 4.10 it was difficult to map job output to a specific job id. 

The job browser is a new dialog that enables a user to work with jobs being run by one or more Beacons. It can be opened by selecting one (or multiple) Beacons, right-clicking, and selecting the Jobs option from the popup menu, or by selecting View -> All Jobs from the main menu.  

The job browser shows a complete list of every job tasked to the given Beacon(s) and shows various information such as the Job ID (JID), its status (i.e., if it has completed or still running), its description, and start and stop times. An example of the job browser is shown below:

Fig 3. A screenshot showing the new job browser UI.

The job console is another new dialog that allows a user to work with the output of a specific job. It is invoked by right-clicking a target job in the job browser and selecting open from the popup menu. An example of the job console for a portscan job is shown below: 

Fig 4. A screenshot showing the new job console UI for a portscan.

It is also possible to hide the output from selected jobs from the Beacon console. The output is then redirected to only appear in the job console window. This improves the user experience for long running post-ex jobs, such as SharpHound, as it means the output is all in one place and allows users to continue to operate in the Beacon console window. Additionally, if you need to revisit the output of a specific job you can now open the job console as opposed to having to trawl up through the Beacon console history.

Host Rotation Updates

Cobalt Strike’s host rotation was introduced in 4.3 to provide operators with greater control over how the HTTP/S and DNS Beacons cycle through hosts. While this offered additional flexibility over C2 comms, host rotation suffered from two main problems:  

  • Hosts remained in the rotation strategy regardless of whether they were responsive. Hence, if one out of three hosts was failing, one in three check-ins would repeatedly fail.
  • There was no way to update host information on an active Beacon in order to change or disable failing hosts.

In Cobalt Strike 4.10, we have made a number of improvements to host rotation to address these issues:

  • Beacon will now automatically disable (“hold”) hosts that have failed, resulting in a far more reliable connection.
  • A new Beacon command, beacon_config, and its corresponding Aggressor Script function, bbeacon_config, have been added to make it possible to query and update the host information for an active Beacon. Hence, it is now possible to hot swap C2 hosts (via adding a new host or updating an existing one).
  • Operators can enable notifications for failed connections making it much easier to debug host rotation issues.
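The effect of holding failed hosts can be illustrated with a small, network-free simulation. This is a minimal sketch of the idea only, assuming a plain round-robin strategy; Host, RoundRobin, and their methods are hypothetical names and do not reflect how Beacon stores or rotates its hosts.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One entry in the rotation (hypothetical model).
struct Host {
    std::string name;
    bool        held;  // true once the host has failed and is skipped
};

class RoundRobin {
public:
    explicit RoundRobin(std::vector<Host> hosts) : hosts_(std::move(hosts)), cursor_(0) {}

    // Returns the next host that is not held, or an empty string if
    // every host is currently held (or no hosts are configured).
    std::string next() {
        for (std::size_t tried = 0; tried < hosts_.size(); ++tried) {
            const Host& h = hosts_[cursor_];
            cursor_ = (cursor_ + 1) % hosts_.size();
            if (!h.held) {
                return h.name;
            }
        }
        return std::string();
    }

    // Take a failed host out of rotation.
    void hold(const std::string& name) { setHeld(name, true); }

    // Add a new host at run time; it joins the end of the rotation.
    void add(const std::string& name) { hosts_.push_back(Host{name, false}); }

private:
    void setHeld(const std::string& name, bool held) {
        for (Host& h : hosts_) {
            if (h.name == name) {
                h.held = held;
            }
        }
    }

    std::vector<Host> hosts_;
    std::size_t       cursor_;
};
```

Without the hold step, a failing host would be returned on every pass through the list; with it, the rotation only ever yields hosts that are still considered healthy, and hosts added later are picked up on the next pass.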

As an example, we can query an active Beacon’s current host information via the new beacon_config host info command, as demonstrated below:

Fig 5. A screenshot showing the new beacon_config command in action. The output shows that only one host (example.yyy) has currently been configured for RoundRobin host rotation on the active Beacon.

The screenshot below shows a new host (example.zzz) being added at run time:

Fig 6. A screenshot showing hot swappable C2 in action. A new host (example.zzz) has been dynamically added, which will now be used as part of the existing RoundRobin host rotation strategy. 

Wireshark shows that Beacon immediately starts using the new host to check-in:

Fig 7. A TCP Stream in Wireshark demonstrating Beacon using the dynamically added host (example.zzz) to check-in. 

As a note, the one restriction for adding hot swappable C2 hosts is that the URI must have previously been defined in the Malleable C2 profile.

Lastly, some web/proxy servers (when blocking requests) return a 200 (OK) status without any additional data in response to Beacon check-ins. Beacon assumes this is a valid “nothing to do” response and hence will not trigger a failover rotation to the next host. To address this issue, users are now able to customise the format of the “nothing to do” task so Beacon can determine whether a given response is valid. This can be enabled via the new Malleable C2 profile options, data_required and data_required_length.  
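The failure mode described above comes down to how an empty 200 response is classified. The sketch below models that decision under stated assumptions: ResponsePolicy and isValidCheckin are hypothetical names, and treating data_required_length as a minimum body length is an assumption made for illustration, so consult the documentation for the exact semantics of the real options.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the two profile options.
struct ResponsePolicy {
    bool        dataRequired;    // models data_required
    std::size_t requiredLength;  // models data_required_length (assumed to be a minimum)
};

// Decide whether a check-in response should count as valid.
inline bool isValidCheckin(int status, const std::string& body, const ResponsePolicy& policy) {
    if (status != 200) {
        return false;
    }
    if (!policy.dataRequired) {
        // Previous behaviour: an empty 200 is read as "nothing to do".
        return true;
    }
    // With data required, an empty or too-short body is treated as a failure,
    // which would allow a failover to the next host.
    return !body.empty() && body.size() >= policy.requiredLength;
}
```

Under the permissive policy an intercepting proxy that returns a bare 200 looks like a healthy host; under the stricter policy the same response is rejected.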

For more information on all of these host rotation updates, see the documentation here.

Java Support Updated To Java 11

As referenced in the Cobalt Strike 4.9 release blog post, we have changed the minimum supported version of Java from Java 8 to Java 11. If you attempt to run the client using an older version of Java, you will see the following error:

Fig 8. A screenshot displaying the error message presented when the Cobalt Strike 4.10 client is run with an older version of Java.

To avoid any issues, please make sure that the version of Java in your environment is at least Java 11 before downloading and running Cobalt Strike. For more guidance, see the Cobalt Strike installation guide.

Product Security Updates

Product security controls have been updated as part of the 4.10 release. In particular, the Linux package now splits the client and server out into separate packages, with each requiring a specific authorization file. This has resulted in a breaking change to the way Cobalt Strike updates, which you may need to account for in any bespoke deployment scripts. 

Additionally, Fortra has partnered with Europol, the UK National Crime Agency, and several other private partners to protect the legitimate use of Cobalt Strike. In June, 593 IP addresses were taken down to disable stolen, unauthorized versions of Cobalt Strike. Fortra and law enforcement will continue to monitor and carry out similar actions as needed. You can read more about the action here.

Additional Updates

In addition, this release also includes updates to System Calls, External C2, as well as numerous quality-of-life (QoL) changes. These QoL updates include: 

  • Improvements to tab completion (including support for custom commands, shift + tab functionality, and case insensitivity).
  • UI Improvements (including better word wrapping for dialogs and a preference to allow users to specify if they want Cobalt Strike opened in a maximized window).
  • Ability to specify the time zone and timestamp format used for logging (configurable via TeamServer.prop).

To see a full list of what’s new in Cobalt Strike 4.10, please check out the release notes.  

Licensed users will need to download version 4.10 from scratch. The existing 4.9 update application cannot be used to upgrade to version 4.10. 

To purchase Cobalt Strike or learn more, please contact us. 

Introducing the Mutator Kit: Creating Object File Monstrosities with Sleep Mask and LLVM

 

This is a joint blog written by William Burgess (@joehowwolf) and Henri Nurmi (@HenriNurmi).

In our ‘Cobalt Strike and YARA: Can I Have Your Signature?’ blog post, we highlighted that the sleep mask is a common target for in-memory YARA signatures. In that post we recommended using the evasive sleep mask option to scramble the sleep mask at run time and break any static signatures. However, this solves the problem at the cost of introducing further forensic artefacts onto a host and increasing our footprint. A much simpler solution is to mutate the sleep mask each time we compile it to make static signatures redundant.

This blog introduces the mutator kit, which uses an LLVM obfuscator to break in-memory YARA scanning of the sleep mask. In the following sections, we will give a quick background to the mutator kit and then show you how to apply it so that a uniquely mutated sleep mask can be applied every time a payload is exported. 

The mutator kit is available in the Arsenal Kit now. 

Mutator Kit

Typically, given the same source code, compilers will generate the same machine code (i.e. they can be considered, with some caveats, deterministic). As an example, we can build the sleep mask with MinGW and compare the .text sections between different builds. Closer analysis reveals the .text sections are the same:

// [1] Build the sleepmask.
$ ./build.sh 49 WaitForSingleObject true none /tmp/dist 
[ ... ] 
[Sleepmask kit] [*] Compile sleepmask.x64.o 
[ ... ]

// [2] Use objdump to find the text section size.
$ objdump -h sleepmask.x64.o 

 sleepmask.x64.o:     file format pe-x86-64 

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00000200  0000000000000000  0000000000000000  00000104  2**4
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE

// [3] Extract the .text section.
// NB skip is the offset ('File off') of the .text section 
// from objdump and count is the size of the section 
// e.g. python -c 'print(int("104", 16))' == 260.
$ dd if=sleepmask.x64.o of=sleepmask1.bin skip=260 count=512 bs=1 

// [4] Calculate shasum.
$ shasum sleepmask1.bin 
4f7813a6aae018a4cf6a78040d9c20024b5a83da  sleepmask1.bin 

// [5] Repeat the steps again to build another sleep 
// mask and extract/hash the .text section - the hash is identical!
$ shasum sleepmask2.bin 
4f7813a6aae018a4cf6a78040d9c20024b5a83da  sleepmask2.bin 
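
The extract-and-hash steps above can also be scripted when comparing many builds. The following is a minimal Python sketch, assuming the file offset and size have already been read from the objdump output; the function names hash_region and same_region are illustrative and are not part of the sleep mask kit.

```python
import hashlib


def hash_region(path, offset, size):
    """Return the SHA-1 hex digest of `size` bytes starting at `offset` in `path`."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(size)
    if len(data) != size:
        raise ValueError(f"expected {size} bytes at offset {offset}, got {len(data)}")
    return hashlib.sha1(data).hexdigest()


def same_region(path_a, path_b, offset, size):
    """Return True when both files hold identical bytes in the given region."""
    return hash_region(path_a, offset, size) == hash_region(path_b, offset, size)
```

With the values shown above, the call would be same_region(first_build, second_build, int("104", 16), int("200", 16)), where the two paths point at object files from separate builds (hypothetical names). A result of True corresponds to the identical shasum output in step [5].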

This is clearly a problem when attempting to hide from YARA signatures which look for specific op code patterns. For example, one YARA signature looks for the following op code pattern in the default sleep mask (0x4C 0x8B 0x53 0x08 etc.):

mov r10, [rbx+0x08]  
mov r9d, [r10]  
mov r11d, [r10+0x04]  
lea r10, [r10+0x08]  
test r9d, r9d  
jnz 0x0000000000000007  
test r11d, r11d  
jz 0x0000000000000035  
cmp r9d, r11d  
jnb 0xFFFFFFFFFFFFFFE8  
mov rdi, r9  
mov r8, [rbx]

As the sleep mask is visible in memory when Beacon is sleeping, this can be a trivial detection opportunity, as demonstrated below:

Fig 1. The results of scanning Elastic’s Cobalt Strike YARA rules against a process running Beacon with the default sleep mask enabled. The single hit is for the rule Windows_Trojan_CobaltStrike_b54b94ac, which as explained above, looks for code belonging to the default sleep mask.

Ideally, we would like to compile the sleep mask and get a unique build each time, in order to make it impossible to produce high fidelity YARA signatures at scale. A common technique for mutating code is to use LLVM, and there are numerous well-documented open-source projects that do this (for example, see 0xpat’s blog). Typically, these make use of LLVM Intermediate Representation (IR) code to apply a number of transformation passes to produce obfuscated / mutated machine code. 
 
Our mutator kit adopts a similar approach and contains four obfuscation passes which are based on eShard’s obfuscator-llvm plugin.  This in turn is based on mutations introduced in the research by Pascal J., et al.  These passes include: 

  • Substitution: Replace binary operators with functionally equivalent ones 
  • Bogus: Insert fake control flow blocks 
  • Code Flattening: Aims to break higher level code/control flow structure
  • Basic-block Splitting: Aims to break higher level code/control flow structure 

More information on these can be found in the research paper referenced above. Note that we are not overly concerned with making the sleep mask hard to reverse engineer; we are primarily interested in breaking static signatures. Hence, we will not go into any more detail on creating obfuscation passes for LLVM. However, the mutator kit README.md contains a number of references should you wish to fork our obfuscator-llvm repo and create your own passes. 

Usage

We have provided two methods to install the mutator kit:

  1. Installing the requirements directly (referred to as ‘native’)
  2. Docker

As LLVM plugins require a specific LLVM version and environment, Docker makes it easy to handle the required setup and obfuscation plugin compilation. However, if you do not wish to use Docker, scripts are provided to bootstrap this process for you (i.e. method 1/native). Additionally, both Docker and LLVM can be complicated to use on Windows, so the native method has the advantage of being simple to run on Windows via the Windows Subsystem for Linux (WSL). This blog will assume installation via the native method but see the README.md in the mutator kit repo for more guidance on using the provided Docker container. 

After installing the requirements, the primary workflow is to load the sleepmask_mutator.cna script into Cobalt Strike. This will automatically apply a mutated sleep mask to your exported Beacon payloads (see the Cobalt Strike Client section below). This script abstracts away the low level details of the mutator kit and makes it very easy to get up and running. However, in order to demonstrate some of the functionality of the mutator kit we will use the command line in this section.

The mutator kit can be manually invoked from the command line via the mutator.sh script. The script takes the following arguments:

mutator.sh <target architecture> <clang args>

To demonstrate the obfuscation passes in action we can take the following simple C program (example.c):

void go() { 
    int a = 5; 
    int b = a + 6; 
}

We can compile this with only the substitution pass enabled with the following command:

$ OBFUSCATIONS=substitution mutator.sh x64 -c example.c -o example.o

One helpful way of demonstrating the effect of specific obfuscation passes is by generating LLVM IR code and comparing it to the original (unmutated) code. This is demonstrated in the example below:

// Build example.c with only the substitution pass enabled 
$ OBFUSCATIONS=substitution mutator.sh x64 -emit-llvm -S example.c -o example_with_substitutions.ll 

// Compare the original LLVM IR code with the mutated version 
$ diff --color example.ll example_with_substitutions.ll  

12,13c12,16 

// example.ll 

<   %4 = add nsw i32 %3, 6 
<   store i32 %4, i32* %2, align 4 

// example_with_substitutions.ll 

>   %4 = sub i32 %3, 1041996456 
>   %5 = add i32 %4, 6 
>   %6 = add i32 %5, 1041996456 
>   %7 = add nsw i32 %3, 6 
>   store i32 %6, i32* %2, align 4 

This trivial example demonstrates the impact of only including the substitution pass on the generated code. 
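
To see why the substituted IR is functionally equivalent to the original, the arithmetic can be reproduced outside of LLVM. The following is a minimal Python sketch, assuming i32 values wrap modulo 2**32; the function names are illustrative, and the constant is simply the one that appears in the diff above.

```python
MASK32 = 0xFFFFFFFF  # i32 arithmetic wraps modulo 2**32


def add_original(a):
    """The unmutated IR: %4 = add nsw i32 %3, 6."""
    return (a + 6) & MASK32


def add_substituted(a, r=1041996456):
    """The mutated IR: subtract a constant, add 6, then add the constant back."""
    t = (a - r) & MASK32      # %4 = sub i32 %3, r
    t = (t + 6) & MASK32      # %5 = add i32 %4, 6
    return (t + r) & MASK32   # %6 = add i32 %5, r


print(add_original(5), add_substituted(5))  # prints "11 11"
```

The constant cancels out regardless of its value, so a different constant on each build would yield different instructions with the same result. Note also that the substituted instructions in the diff no longer carry the nsw flag, which is consistent with the intermediate values being allowed to wrap.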
 
More detailed documentation on LLVM IR is available, but even a basic understanding can help you debug problems and sanity check that specific obfuscation passes have been applied correctly. This makes for a quicker feedback loop than opening the generated object file in IDA.

Having demonstrated the basic usage of the mutator kit, we can now apply it to the sleep mask with the following command:

$ mutator.sh x64 -c -DIMPL_CHKSTK_MS=1 -DMASK_TEXT_SECTION=1 -o sleepmask.x64.o src49/sleepmask.c 
Obfuscation flattening enabled 
Obfuscation substitution enabled 
Obfuscation split-basic-blocks enabled 

Note that the -D* arguments add implicit #define directives to the sleep mask, consistent with the options provided in the build.sh script within the sleep mask kit. For more information on clang command line arguments see the Clang documentation. Additionally, the -DIMPL_CHKSTK_MS=1 flag is needed to avoid any issues when loading the sleep mask into Cobalt Strike. 

By default, only three passes are applied (flattening, substitution, and split-basic-blocks) as bogus can increase the code size. However, you can override the default behaviour by passing in the OBFUSCATIONS environment variable (OBFUSCATIONS=flattening,substitution,split-basic-blocks,bogus mutator.sh x64.. etc.). See the README.md included in the mutator kit for more guidance. 

At this stage, we can compare the default sleep mask to an LLVM mutated sleep mask. The screenshot below shows the call graph for the same function in the default sleep mask (left) and a mutated one (right): 

Fig 2. A comparison of the same function for the default sleep mask (left) and a LLVM mutated sleep mask (right).

Cobalt Strike Client

The example above has demonstrated basic use of the mutator kit and the impact it has on compiled code. However, to make experimenting with the mutator kit and sleep mask as simple as possible we have included a cna script (sleepmask_mutator.cna) which adds a menu item allowing you to configure the mutator kit through the Cobalt Strike client. This option is demonstrated in the screenshot below:

Fig 3. Screenshot showing the use of the sleepmask_mutator.cna script. This allows you to configure the sleep mask and desired obfuscation passes from the Cobalt Strike GUI. 

This menu allows you to: 

  • Select what obfuscation passes to apply 
  • Select whether you want to rebuild the sleep mask for every payload export

The script will then automatically apply a mutated sleep mask based on these options to your exported Beacon payloads. Therefore, if desired, it is possible to ensure that every time you export a payload, a different LLVM mutated sleep mask will automatically be applied. The script will also throw an error if any problems are encountered, to guarantee that a default sleep mask is not accidentally applied (which could subsequently endanger OPSEC). 

The output of the sleepmask_mutator.cna script can be seen in the screenshot of the Script Console below when generating a raw HTTP Beacon DLL: 

Fig 4: The Script Console showing the sleepmask_mutator.cna script in action. As part of payload generation, a new LLVM mutated sleep mask is built and automatically applied to a raw Beacon DLL.

With our mutated sleep mask, we can re-run the YARA scan against a process hosting Beacon and see that no hits are found:

Fig 5:  The results of scanning Elastic’s Cobalt Strike YARA rules against a process running Beacon with a LLVM mutated sleep mask. As we have mutated the sleep mask there are now no YARA hits for the sleep mask when Beacon is sleeping.

We can also use the mutator kit to compile other BOFs. You may want to consider doing this for a higher level of OPSEC. As a note, most open source BOFs are intended to be compiled with MinGW. Hence, when compiling BOFs with the mutator kit it is highly likely you will encounter compiler issues which you will need to resolve on your own.

Conclusion

This blog has introduced the mutator kit which is available in the Arsenal Kit now. This kit is designed with the intention of making the creation of high fidelity YARA signatures targeting the sleep mask in-memory impracticable. With this release you can now generate mutated sleep masks on every payload export, which will fundamentally break pre-canned YARA signatures and provide enhanced OPSEC against in-memory signatures. 

Cobalt Strike and Outflank Security Tooling: Friends in Evasive Places

 

This is a joint blog written by the Cobalt Strike and Outflank teams. It is also available on the Outflank site.

Over the past few months there has been increasing collaboration and knowledge sharing internally between the Cobalt Strike and Outflank R&D teams. We are excited about the innovation opportunities made possible by this teamwork and have decided to align Cobalt Strike and Outflank Security Tooling (OST) closely going forward. Although we are actively collaborating, Cobalt Strike will continue to be the industry standard Command & Control (C2) framework, while Outflank Security Tooling (OST) will continue to offer a red team toolbox for all environments containing custom tradecraft that is OPSEC safe, evasive by design, and simple to use. Our vision is that Cobalt Strike and OST together will provide the best red team offering on the planet. 

This blog will provide an update of the technical strategy of each product individually before giving a glimpse into the future of the two combined.

Cobalt Strike

Cobalt Strike is the industry standard Command & Control framework. Following the acquisition of Cobalt Strike by Fortra in 2020, a conscious decision was taken to follow the technical strategy employed by founder Raphael Mudge in taking Cobalt Strike to the next level. The core tenets of this strategy are:

  • Stability: Cobalt Strike must remain reliable and stable; nobody wants to lose their Beacons.
  • Evasion through flexibility: Since its inception, Cobalt Strike has always been an adversary emulation tool. It is designed to enable operators to mimic other malware and the TTPs they desire. Hence, in its default state, Beacon is pretty trivial to detect. This however has never been the point; Cobalt Strike has flexibility built into key aspects of its offensive chain. You can tinker with how Beacon is loaded into memory, how process injection is done, what your C2 traffic looks like etc. We don’t want to bake TTPs into Beacon which become signatured over time (Cobalt Strike’s implementation of module stomping is a good example of this). We want to enable operators to customise Beacon to use their own original TTPs. Our R&D effort will continue to focus on building flexibility into all aspects of the offensive chain and on giving operators as much control as possible over the TTPs they employ.

Outflank & OST

In September last year Fortra acquired Outflank. Outflank is a security consultancy based in Amsterdam with deep expertise in red teaming and a proven track record of world class research. You may know the team from their work on Direct Sys Calls in Beacon Object Files, various public tools, Microsoft Office tradecraft (DerbyCon, Troopers, Blackhat Asia, BruCON, x33fcon), or on the red team SIEM RedELK.

In recent years, Outflank has taken its internal research & development and created Outflank Security Tooling (OST).

OST is not a C2 product but a collection of offensive tools and tradecraft, offering:

  • A broad arsenal of offensive tools for different stages of red teaming.
  • Tools that are designed to be OPSEC safe and evade existing security controls (AV/EDR).
  • Advanced tradecraft via understandable interfaces, instead of an operator needing to write or compile custom low-level code.
  • A knowledge sharing hub where trusted & vetted red teamers discuss tradecraft, evasion, and R&D.
  • An innovative cloud delivery platform which enables fast release cycles, and complex products such as ‘compilation as a service’, while still allowing any customer to run and manage their own offensive infrastructure. Although OST is offered as a cloud model, it is possible to use the offensive tools and features offline and in air gapped environments.

Hence, it is a toolbox for red teamers made by red teamers, enabling operators to work more efficiently and focus on their job at hand. It contains features such as: a payload generator to build sophisticated artifacts and evade anti-virus / EDR products, a custom .NET obfuscator, credential dumpers, kernel level capabilities, and custom BOF implementations of offensive tools (such as KerberosAsk as an alternative to Rubeus).

Going forward, OST will continue to provide a full suite of bleeding-edge tools to solve the main challenges facing security consultants today (i.e., on prem/workstation attacks, recon, cloud etc.). Outflank’s R&D team remain active in red teaming engagements and so all these tools are being continually battle tested on live red team operations. Furthermore, OST will continue to grow as a vetted knowledge hub and an offensive R&D powerhouse that brings novel evasion, tradecraft, and tooling for its customers.

Combining forces: Cobalt Strike and Outflank Security Tooling

Having outlined the technical strategies of Cobalt Strike and OST above, it is clear that both products naturally complement each other. Therefore, we have decided to align the two products closely going forward.

In our joint roadmap, both products will stay true to their visions as outlined above. Cobalt Strike will continue to push the boundaries of building flexibility into every stage of the offensive chain, e.g. via technologies such as BOFs, and OST will continue to leverage this flexibility to deploy novel tradecraft, as well as continuously releasing stand-alone tools.

Furthermore, both teams are already cooperating extensively, which is further advancing innovation and product development. Outflank’s experience in red teaming is providing valuable insight and feedback into new Cobalt Strike features, while joint research projects between the Cobalt Strike and Outflank R&D teams are already generating new TTPs. Together, we are regularly evaluating offensive innovation and adjusting the roadmap of both products accordingly. This ensures that both Cobalt Strike and OST remain cutting edge and that any new features are designed to integrate seamlessly between the two.

This approach is already bearing fruit; OST recently released a feature focusing on Cobalt Strike Integrations, specifically custom User Defined Reflective Loaders, which we will explore in more detail below.


Case Study: User Defined Reflective Loaders

Cobalt Strike has relied on reflective loading for a number of years now and we have endeavoured to give users as much control over the reflective loading process as possible via Malleable C2 options. However, we always want to push the boundaries in terms of building flexibility into Cobalt Strike so that users can customize Beacon to their liking. This was why we introduced User Defined Reflective Loaders (UDRLs). This enables operators to write their own reflective loader and bake their own tradecraft into this stage of the offensive chain. Furthermore, we recognise that UDRLs can be challenging to develop, which is why we started our own blog series on UDRL development (with a second post on implementing custom obfuscation dropping soon).

As long-term Cobalt Strike users, Outflank also recognised the complexities and time constraints that red teams face when developing custom UDRLs. Hence, they decided to put their own experience and R&D into developing novel UDRLs as part of their Cobalt Strike Integrations feature on OST, as shown below:

Figure 1. The Cobalt Strike Integrations page in OST.

With this feature, it is now possible in OST to stomp a custom UDRL developed by Outflank onto a given Beacon payload. There are currently two custom loaders available and more are in the pipeline. Most pertinently, operators do not need to get into the weeds with Visual Studio/compilers, while still being able to use advanced UDRLs that are OPSEC safe and packed with Outflank R&D.

Bypassing YARA signatures

Furthermore, OST will also check the stomped Beacon payload against a number of public YARA signatures and automatically modify Beacon to bypass any triggering rules, as demonstrated below:

Figure 2. The workflow for stomping a custom UDRL in OST. Notice that the left column (‘Pre-processing’) shows the YARA rules which flag on the Beacon payload before any modifications are made. The column on the right (‘Post-processing’) shows that these rules no longer trigger after OST has made its modifications.

We have previously blogged about YARA signatures targeting Beacon and so this is an important ‘evasion in depth’ step built into payload generation within OST.

Once Beacon has been equipped with a custom UDRL, and YARA bypasses have been applied, the payload can be seamlessly integrated with other OST features. For example, we can import the new payload into OST’s payload generator to create advanced artifacts which can be used for phishing, lateral movement, or persistence. This whole workflow is demonstrated below:

Recording of the User Defined Reflective Loader feature as available in OST

This feature is a great example of the joint roadmap in action; both the UDRL stomper and the YARA module originated from collaboration and shared knowledge between the CS and Outflank teams.


The Road Ahead

  • Novel tradecraft: The UDRL and YARA integration is just the first step. OST’s Cobalt Strike integrations will be further extended with new features, such as custom sleep masks and additional YARA and OPSEC checks. This allows customers of both OST and Cobalt Strike to utilise advanced tradecraft and the flexibility of Cobalt Strike without needing to write low level code.
  • Better user workflows: Instead of manually downloading custom BOFs/tools from OST, we are working on implementing a ‘bridge’ between OST and Cobalt Strike. This bridge would also allow users to upload Beacons to OST and generate advanced payloads quickly; allowing for smoother and more efficient workflows.
Figure 3. Current proof of concept of the OST bridge being worked on
  • New approaches to software delivery: OST has taken a unique approach to offensive software compilation & distribution, utilising just-in-time compilation and anti-piracy via its cloud delivery model. In due course Cobalt Strike will start leveraging a similar approach to OST, enabling new possibilities and evasion techniques within Beacon. The first step of this will be to migrate Cobalt Strike to a new download portal.
  • Team collaboration: Lastly, the OST and Cobalt Strike teams are increasingly collaborating on a number of low-level areas. These deep technical discussions on evasion and novel TTPs between hands-on red teamers, offensive R&D members, and the Cobalt Strike developers provide valuable feedback and accelerate product development.

Closing Thoughts

We hope that this blog provides an informative update to the technical strategy of both products going forward. In summary:

The Outflank and Cobalt Strike teams are cooperating to get the most value for our customers. Both Cobalt Strike and OST will stay close to their roots: Cobalt Strike will remain focused on stability and flexibility while OST offers a broad arsenal of offensive tradecraft. Furthermore, the collaboration between the two teams will enable enhanced product innovation and ensure that new features for both products are designed to work seamlessly together.

If you are interested in either Cobalt Strike or OST please refer to Cobalt Strike’s product info and demo video or OST’s product info and demo videos for more info. Cobalt Strike and OST bundles are available now and you can request a quote here.

Cobalt Strike and YARA: Can I Have Your Signature?

 

Over the past few years, there has been a massive proliferation of YARA signatures for Beacon. We know from conversations with our customers that this has become problematic when using Cobalt Strike for red team engagements and that there has been some confusion over how Cobalt Strike’s malleable C2 options can help.  
 
Therefore, this blog post will outline the OPSEC considerations when using Beacon with respect to in-memory YARA scanning and suggest a malleable C2 profile which should give robust evasion against these types of defensive techniques. 
 
As a TL;DR, to be OPSEC safe against in-memory YARA scanning you should: 

  • Ensure you are using the sleep mask and have enabled the evasive sleep found in the Arsenal kit (or alternatively use a modified/custom sleep mask) 
  • Enable stage.cleanup 
  • Strongly consider setting stage.obfuscate to true (or implement similar functionality in a UDRL) 
  • Be wary when using post-ex reflective DLLs (dependent on the security controls in place) as they currently fail to clean up memory and hence are ripe for signatures (we are working on fixing this) 

YARA Signatures

YARA rules are one of the quick wins for defenders when attempting to obtain detection coverage for a given malware family. Once a sample is obtained, signatures can be quickly generated and then deployed at scale. An excellent blog by Elastic on a methodology for doing this can be found here: https://www.elastic.co/blog/detecting-cobalt-strike-with-memory-signatures.  

Typically, these rules/signatures are made up of either strings found in the target binary or static bytes identified from reversing specific functionality (e.g. XOR/encryption routines etc.). Note that these rules can also be as simple as finding some arbitrary byte sequence that appears to be consistent between versions and that, when run against VirusTotal (e.g. via VTGrep:  content:{4D 5a etc..}), reliably identifies the target family. If this never changes between versions (unbeknownst to the operator), then it can provide a high-fidelity detection signature. 
 
This process becomes even more powerful when used in conjunction with in-memory scanning. Unless an implant takes evasive action, it is a sitting duck in memory (i.e. its .text / .data sections are plainly visible) and good YARA signatures will quickly identify it. Beacon in this sense is no different.

Furthermore, for Beacon specifically, there are countless open source (and no doubt many more private) YARA rules which are deployed by EDR vendors/threat hunters to quickly scan for and identify Beacons. As a result, this can make using Cobalt Strike a challenge for red team exercises unless extra precautions are taken. 

However, it is important to stress that low-cost detections are typically low cost to evade. YARA signatures generally can be thought of as having vast breadth but with limited depth (i.e. they are relatively quick and low cost to churn out/automate but have limited robustness for long term detection efficacy). With some basic adjustments to Beacon’s malleable C2 profile, it is possible to bypass YARA scanning with a high degree of confidence. Therefore, the aim of this blog post is to: 

  • Demonstrate Beacon’s susceptibility (in its default state) to YARA signatures. This will be illustrated initially via statically scanning a default Beacon payload on disk before considering in-memory YARA scanning.
  • Show how Cobalt Strike’s malleable C2 options can be configured to make in-memory YARA scanning redundant.  

As a note, this blog will primarily rely on Elastic’s open-source YARA rules for Cobalt Strike. This is because it was by far the most comprehensive collection of open-source YARA rules that we could find (and Elastic should be commended for being open and transparent in this regard). As such, this blog is not intended to be viewed as a guide in terms of how to “bypass” a specific vendor and it should be stressed that in-memory YARA scanning is likely to be only one component of a defence in depth strategy employed by EDRs (i.e. in conjunction with scanning for unbacked memory, anomalous threads, process injection, etc.). 

Lastly, it is important to note that as this blog is concerned with in-memory YARA scanning, all examples given are for raw Beacon DLL payloads. Hence, in this post we are assuming that Beacon has already been injected into memory via some form of Stage0 shellcode runner. Other payloads generated by Cobalt Strike (e.g. the default Cobalt Strike executables which are simple shellcode runners) have been included in the product for a number of years and as a result will be widely signatured too. However, they are out of scope for this blog post as the intention of the Arsenal kit is to facilitate modifying these executables to bypass signatures.

On-Disk YARA Scanning

To demonstrate the power of YARA signatures we can use Elastic’s open-source rules for Cobalt Strike and run them against a default raw HTTP Beacon DLL (on disk). As a note, this is a slightly contrived scenario, as typically when an exe/DLL is written to disk (or executed), an EDR will attempt to extract features and classify the binary via a PE malware model (although YARA could be used as well). This is a very different problem to get around and out of scope for this blog post, but see here for further reading. However, this first example is primarily designed to get an understanding of Beacon’s footprint in relation to YARA signatures before we consider the in-memory YARA scanning use case. 
 
We can run the Elastic Cobalt Strike YARA rules against a default raw Beacon DLL with the following command:

yara64.exe --print-strings Windows_Trojan_CobaltStrike.yar beacon_default_raw.dll

This generates five results:

1. Default strings found within Beacon.dll, as shown below (the corresponding YARA rule is Windows_Trojan_CobaltStrike_ee756db7):

Windows_Trojan_CobaltStrike_ee756db7 beacon_default_raw.dll

0x2cd60:$a39: %s as %s\%s: %d 
0x3c012:$a41: beacon.x64.dll 
0x2df70:$a46: %s (admin) 
0x2cec0:$a48: %s%s: %s 
0x2cd8c:$a50: %02d/%02d/%02d %02d:%02d:%02d 
0x2cdb8:$a50: %02d/%02d/%02d %02d:%02d:%02d 
0x2dfb9:$a51: Content-Length: %d

Strings are typically stored in the .data/.rdata section of a binary, and a number of strings unique to Beacon are clearly shown above.

2. An “unidentified” code fragment found within Beacon’s .text section (the corresponding YARA rule is Windows_Trojan_CobaltStrike_663fc95d):

Windows_Trojan_CobaltStrike_663fc95d beacon_default_raw.dll 

0x195f8:$a: 48 89 5C 24 08 57 48 83 EC 20 48 8B 59 10 48 8B F9 48 8B 49 08 FF 17 33 D2 41 B8 00 80 00 00

3. A code fragment from Beacon’s default sleep mask routine (the corresponding YARA rule is Windows_Trojan_CobaltStrike_b54b94ac):

Windows_Trojan_CobaltStrike_b54b94ac beacon_default_raw.dll 

0x3c37b:$a_x64: 4C 8B 53 08 45 8B 0A 45 8B 5A 04 4D 8D 52 08 45 85 C9 75 05 45 85 DB 74 33 45 3B CB 73 E6 49 8B F9 4C 8B 03

Note that the sleep mask was not actually enabled for this Beacon payload; however, the default sleep mask is always patched into the .data section unless a custom one is specified.

4. A code fragment found within Beacon’s default exported reflective loader function (the corresponding YARA rule is Windows_Trojan_CobaltStrike_f0b627fc):

Windows_Trojan_CobaltStrike_f0b627fc beacon_default_raw.dll 

0x16ed2:$beacon_loader_x64: 25 FF FF FF 00 3D 41 41 41 00 75 1A 8B 44 24 78 25 FF FF FF 00 3D 42 42 42 00 75 
0x18183:$beacon_loader_x64: 25 FF FF FF 00 3D 41 41 41 00 75 1A 8B 44 24 78 25 FF FF FF 00 3D 42 42 42 00 75

5. The reflective loader shellcode stub which is patched over the DOS header for Beacon (the corresponding YARA rule is Windows_Trojan_CobaltStrike_1787eef5):

Windows_Trojan_CobaltStrike_1787eef5 beacon_default_raw.dll 

0x0:$a5: 4D 5A 41 52 55 48 89 E5 48 81 EC 20 00 00 00 48 8D 1D EA FF FF FF 48 89 DF 48 81 C3 3C 6E 01 00

For the last two hits (4 and 5), it is important to understand that Beacon is a reflectively loaded DLL. This means it has an exported reflective loader function within its .text section. Furthermore, so that it can be run directly from memory, a small shellcode stub is written over the start of the DOS header.  This small shellcode stub is responsible for jumping to the exported reflective loader function, which will then proceed to bootstrap Beacon into memory. For more context on reflective loading, see our blog post series on UDRL development and @0xBoku’s excellent blog on ‘Defining the Cobalt Strike Reflective Loader’. 
 
Unsurprisingly, these two components are very attractive signatures for defenders (and have additionally remained static for a number of years). The fourth result above targets arbitrary code found within the default reflective loader function. The offending code within ReflectiveLoader() is shown in IDA below (25FFFFFF00 disassembles to `and eax, 0xFFFFFF` etc.):

Figure 1. A screenshot from IDA showing the code fragment within the exported reflective loader function which the Windows_Trojan_CobaltStrike_f0b627fc rule triggers on.

Google has published a similar YARA rule, and note that Windows Defender also contains signatures for Beacon’s default reflective loader. We can demonstrate this by running ThreatCheck against the same raw payload:

Figure 2. ThreatCheck.exe identifying signatured bytes found within a raw Beacon payload.

If we scan for this byte sequence in IDA, we find that it is code located within the ReflectiveLoader function:

Figure 3. A screenshot from IDA showing that the bytes identified by ThreatCheck.exe belong to the exported reflective loader function.

The fifth result above (Windows_Trojan_CobaltStrike_1787eef5) is a signature for Beacon’s default shellcode stub for bootstrapping reflective loading. Beacon’s default shellcode stub is displayed in PE-Bear below:

Figure 4. A screenshot from PE-Bear showing the default Beacon shellcode stub that is written over the DOS header.

The Windows_Trojan_CobaltStrike_1787eef5 rule has one string condition ($a5) which is a fuzzy match for the first 8 instructions (32 bytes) of this shellcode stub, with the immediate operands and offsets wildcarded (`{ 4D 5A 41 52 55 48 89 E5 48 81 EC ?? ?? ?? ?? 48 8D 1D ?? ?? ?? ?? 48 89 DF 48 81 C3 ?? ?? ?? ?? }`).
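To make the fuzzy match concrete, here is a minimal Python sketch that translates the $a5 hex string into a byte regex and runs it against the stub bytes shown in the hit above and against the start of the modified stub built later in this post. The helper name `yara_hex_to_regex` is our own, and it only understands plain bytes and `??` wildcards, so treat it as an illustration rather than a YARA replacement.

```python
import re

def yara_hex_to_regex(pattern):
    """Translate a simple YARA hex string (bytes and ?? only) into a bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")                       # wildcard: any single byte
        else:
            parts.append(b"\\x%02x" % int(token, 16))  # literal byte
    return re.compile(b"".join(parts), re.DOTALL)

A5 = ("4D 5A 41 52 55 48 89 E5 48 81 EC ?? ?? ?? ?? 48 8D 1D ?? ?? ?? ?? "
      "48 89 DF 48 81 C3 ?? ?? ?? ??")

# Bytes reported by the Windows_Trojan_CobaltStrike_1787eef5 hit above.
default_stub = bytes.fromhex(
    "4D 5A 41 52 55 48 89 E5 48 81 EC 20 00 00 00 48 8D 1D EA FF FF FF "
    "48 89 DF 48 81 C3 3C 6E 01 00")

# Start of the alternate stub from the strrep example later in this post.
alternate_stub = bytes.fromhex(
    "4D 5A 48 8D 1D F8 FF FF FF 41 52 48 83 EC 28 48 89 DF "
    "48 81 C3 52 B7 00 00 48 81 C3 52 B7 00 00")

rule = yara_hex_to_regex(A5)
default_match = rule.search(default_stub)
alternate_match = rule.search(alternate_stub)
print("default stub matched bytes:", len(default_match.group(0)))
print("alternate stub matches:", alternate_match is not None)
```

The wildcards mean that changing only the stack size or the ReflectiveLoader offset is not enough; the instruction bytes themselves have to change.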

In-Memory YARA Scanning

Having looked at Beacon’s exposure to YARA for a default raw DLL payload located on disk, we can now turn to the problem this blog is concerned with: in-memory YARA scanning.  

The critical thing to understand with respect to in-memory YARA scanning is that when Beacon is reflectively loaded into memory it results in two memory allocations: the raw Beacon DLL (which will actually execute the shellcode stub and reflective loader function) and the virtual Beacon DLL (which is correctly loaded in memory and ready to go). Hence, the raw Beacon DLL will actually allocate memory (via the reflective loader function) for the virtual Beacon DLL and ensure it is correctly loaded. This is demonstrated in the image below:

Figure 5. A diagram showing the two memory allocations which occur as a result of the reflective loading process for a default Beacon payload. The first is Beacon in its raw/packed file format (‘Raw Beacon DLL’) and the second is Beacon after it has been correctly loaded into memory (‘Virtual Beacon DLL’).

Some of Cobalt Strike’s malleable C2 options patch/modify the raw Beacon DLL (e.g. stage.magic_mz_x64) whereas some change the behaviour of the reflective loader/how Beacon is loaded into memory (e.g. stage.obfuscate will not copy over the DLL headers to the virtual Beacon DLL, stage.stomppe will stomp values in the virtual Beacon DLL headers, etc.). The important point to stress is that, depending on which malleable C2 options are set, these two memory allocations may trigger different YARA signatures and hence both need to be accounted for. 
 
It is clear from the diagram above that, in its default state, Beacon is a sitting duck once it has been injected into memory. Its reflective loader stub, code (.text section), and strings (.rdata/.data) are all clearly visible. This is demonstrated in the screenshot from Windbg below, which shows strings belonging to Beacon lurking in plain text in a memory region corresponding to the virtual Beacon DLL (RWX):

Figure 6. A screenshot from Windbg showing suspicious strings in memory belonging to the virtual Beacon DLL. These would trivially be found by in-memory YARA scanning.

Furthermore, because of the two allocations corresponding to the raw and virtual DLL, all these sensitive regions are actually stored in memory twice. Hence, if we run the same set of YARA rules against a process that has a default Beacon injected into it, we will get the same YARA hits as before except this time there will be duplicates. These duplicate results will correspond to the same strings/bytes being found within the raw Beacon DLL and the virtual Beacon DLL. 
 
The screenshot below from Windbg shows the two suspicious memory regions (corresponding to the raw Beacon DLL and the virtual Beacon DLL) which are a result of Beacon being injected into memory. These are both identifiable via the same DLL header (MZARUH..) which is the start of the default reflective loader stub.

Figure 7. A screenshot from Windbg showing the two memory regions which result from reflectively loading a default Beacon payload. The RWX memory region corresponds to the virtual Beacon DLL and the RX region corresponds to the raw Beacon DLL.

The image below shows the output of running YARA against this process memory address space (PID 3204). Notice that we get two hits for each rule, which correspond to the two memory regions highlighted above in Windbg:

Figure 8. The results of running the Elastic Cobalt Strike YARA rules against a process with a default Beacon injected into it. As Beacon is, by default, written to memory twice (raw Beacon DLL and virtual Beacon DLL), we get duplicate results for each triggering YARA rule.

Lastly, be aware that once Beacon is up and running, its run time behaviour can increase its in-memory footprint. A good example of this is BeaconEye, which uses the following signatures (https://github.com/CCob/BeaconEye/blob/master/BeaconEye.cs#L35) to identify Beacon by scanning for its configuration in heap memory.

YARA OPSEC Considerations for Beacon

The examples above demonstrate that, in its default state, there is an extremely low barrier to detecting Beacon in memory via YARA scanning. Furthermore, there are a multitude of YARA signatures targeting different sections of the Beacon DLL, e.g. the reflective loader stub/DLL headers, code fragments found within the .text section, strings found in its data sections and so on. This is obviously something that we, as operators, need to address, and at this point we can start to explore how to configure Beacon’s malleable C2 options so that we can bypass this entire class of detections.

Stage.transform.strrep

As a basic ‘evasion in depth’ approach it is always sensible to remove obviously suspicious strings found within Beacon’s reflective DLL (“beacon.x64.dll”, “ReflectiveLoader”, etc..). We can do this with the strrep command, which will replace a string/set of bytes within Beacon’s reflective DLL. However, note that some strings identified by YARA above are Beacon format strings (“%s%s: %s”) that are currently required to return data and so replacing these may cause undefined behaviour. 
 
As a demonstration of how brittle YARA signatures can be though, we can make some minor modifications to our malleable C2 profile to bypass the Windows_Trojan_CobaltStrike_ee756db7 rule that identifies default strings found within Beacon. This rule requires six suspicious strings to be identified, as the excerpt below shows:

        [...]
        $a50 = "%02d/%02d/%02d %02d:%02d:%02d" ascii fullword
        $a51 = "Content-Length: %d" ascii fullword
    condition:
        6 of ($a*)

By making the following modification to our malleable C2 profile we can bypass this rule:

stage {
    transform-x64 {
        strrep "(admin)" "(adm)";
        # If you modify these ensure you keep format string order
        strrep "%s as %s\\%s: %d" "%s - %s\\%s: %d";
    }
}
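The effect of these two replacements on an "N of" condition can be sketched in a few lines of Python. Note the assumptions: the string list below is a toy set assembled from strings quoted in this post, not the rule's real $a list, and the `strrep` helper is our own model of the profile command (it assumes a shorter replacement is padded with NULs so that offsets do not shift).

```python
THRESHOLD = 6  # mirrors the rule condition "6 of ($a*)"

# Toy string set built from strings quoted in this post (NOT the real rule list).
TOY_STRINGS = [
    b"%02d/%02d/%02d %02d:%02d:%02d",
    b"Content-Length: %d",
    b"(admin)",
    b"%s as %s\\%s: %d",
    b"beacon.x64.dll",
    b"ReflectiveLoader",
    b"%s%s: %s",
]

def count_hits(image, strings):
    """Count how many of the rule strings occur anywhere in the image."""
    return sum(1 for s in strings if s in image)

def strrep(image, old, new):
    """Hypothetical model of strrep: same-length replacement, NUL padded."""
    if len(new) > len(old):
        raise ValueError("replacement must not be longer than the original")
    return image.replace(old, new.ljust(len(old), b"\x00"))

image = b"\x00".join(TOY_STRINGS)          # stand-in for Beacon's data section
hits_before = count_hits(image, TOY_STRINGS)

patched = strrep(image, b"(admin)", b"(adm)")
patched = strrep(patched, b"%s as %s\\%s: %d", b"%s - %s\\%s: %d")
hits_after = count_hits(patched, TOY_STRINGS)

print("before:", hits_before, "fires:", hits_before >= THRESHOLD)
print("after: ", hits_after, "fires:", hits_after >= THRESHOLD)
```

In this toy set, seven strings match before patching and five afterwards, which is below the threshold. How many replacements a real payload needs depends on how many of the rule's strings it actually contains.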

Generally though, as we can’t modify all of these default strings without causing issues, the real solution to suspicious strings is to use the sleep mask kit (which we will discuss shortly).

Stage.magic_mz_* / Stage.magic_pe_*

The ‘magic_mz_*’ / ‘magic_pe_*’ options patch the MZ and PE characters respectively in Beacon’s raw DLL. The exported reflective loader function will use these magic bytes to locate itself in memory, so this option will also modify the reflective loader. Note that as these are located in the DLL headers, they will be copied over to the virtual Beacon DLL during the reflective loading process.

As a word of caution, for the magic_mz_* option, the value provided must be valid (no-)op codes as they are the first instructions that will be executed as part of the shellcode stub. Typically, this would be some variant of `pop regA, push regA` as the latter instruction undoes the first, but see here for more guidance on configuring this option. 
 
These options are typically used to frustrate memory scanners trying to identify injected DLLs, however magic_mz could be used to break basic YARA signatures on the reflective loader stub. As an example, modifying the MZ bytes (4D 5A) would break this signature. However, our freedom of movement is limited as we can only modify a few bytes in each case, so clearly more robust YARA signatures would still trigger.

Stage.cleanup

This is a key option and should be set to true wherever possible. As we have previously demonstrated, the initial memory allocation of the raw Beacon DLL contains several highly signaturable components (e.g. the reflective loader stub, the exported reflective loader function, etc.). Furthermore, once it has bootstrapped Beacon it is no longer needed. By setting the ‘cleanup’ option to true, Beacon will free this memory and dramatically lower its in-memory footprint (hence only the virtual Beacon DLL will remain). Additionally, the clean-up operation is smart enough to identify how the memory was allocated initially and free it accordingly. 
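As a rough model of what cleanup buys us, the Python sketch below scans a dictionary of memory regions for the first bytes of the default stub, then drops the raw allocation and scans again. The base addresses and region contents are invented for illustration, and the model assumes the DLL headers were copied to the virtual Beacon DLL (i.e. the default behaviour).

```python
def scan(regions, signature):
    """Return the absolute address of every signature match across all regions."""
    hits = []
    for base, data in sorted(regions.items()):
        offset = data.find(signature)
        while offset != -1:
            hits.append(base + offset)
            offset = data.find(signature, offset + 1)
    return hits

STUB_PREFIX = bytes.fromhex("4D 5A 41 52 55 48 89 E5")  # MZARUH..

# Hypothetical base addresses for the two allocations.
RAW_BASE = 0x1B4EF8A0000
VIRTUAL_BASE = 0x1B4EF8C0000

regions = {
    RAW_BASE: STUB_PREFIX + bytes(64),      # raw Beacon DLL
    VIRTUAL_BASE: STUB_PREFIX + bytes(64),  # virtual Beacon DLL (headers copied)
}

hits_before = scan(regions, STUB_PREFIX)
del regions[RAW_BASE]                       # models cleanup freeing the raw DLL
hits_after = scan(regions, STUB_PREFIX)

print([hex(h) for h in hits_before])
print([hex(h) for h in hits_after])
```

The remaining hit on the virtual Beacon DLL is what the stomppe and obfuscate options discussed next are intended to address.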
 
Note that if for some reason you cannot use cleanup, this BOF by @S4ntiagoP demonstrates how you can manually clean up memory.

Stage.stomppe

Even if we have set cleanup to true, the DLL headers are still copied over to the virtual Beacon DLL and are hence a target for in-memory scanning. The stomppe option will ensure the MZ/PE and e_lfanew values are stomped in the virtual Beacon DLL during the reflective loading process. This again makes it harder for memory scanners to identify an injected DLL in memory, and could also help break any YARA signatures targeting Beacon’s DOS header (although this option is similar to magic_mz/pe in that it has limited freedom of movement for the YARA use case).
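To illustrate why stomping these three values frustrates naive scanners, here is a small Python sketch of a header check (MZ at offset 0, e_lfanew at 0x3C, PE signature at e_lfanew) run against a synthetic header before and after stomping. The `stomp` helper simply zeroes the values; what the real loader writes there is not described in this post, so that part is an assumption.

```python
import struct

E_LFANEW_OFFSET = 0x3C

def looks_like_pe(image):
    """Naive injected-DLL check: MZ magic, then follow e_lfanew to 'PE\\0\\0'."""
    if len(image) < 0x40 or bytes(image[:2]) != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    return bytes(image[e_lfanew:e_lfanew + 4]) == b"PE\x00\x00"

def build_minimal_headers():
    """Synthetic 0x100-byte header with just the fields the check reads."""
    image = bytearray(0x100)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, E_LFANEW_OFFSET, 0x80)
    image[0x80:0x84] = b"PE\x00\x00"
    return image

def stomp(image):
    """Hypothetical stomp: zero the MZ magic, the PE signature, and e_lfanew."""
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    image[0:2] = b"\x00\x00"
    image[e_lfanew:e_lfanew + 4] = bytes(4)
    struct.pack_into("<I", image, E_LFANEW_OFFSET, 0)

headers = build_minimal_headers()
detected_before = looks_like_pe(headers)
stomp(headers)
detected_after = looks_like_pe(headers)
print(detected_before, detected_after)
```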

Stage.obfuscate

We can go one better and ask the reflective loader to copy Beacon over without its DLL headers. This can be achieved with the ‘stage.obfuscate’ flag. Enabling this, along with cleanup, means the reflective loader stub can no longer be found in-memory. 
 
Additionally, stage.obfuscate will also mask Beacon’s: 

  • .text section 
  • Section names 
  • Import table 
  • Dos/Rich Header (this is technically not masked but overwritten with random data)

As of 4.8, stage.obfuscate moved from a fixed single byte XOR key to a randomly generated multi-byte XOR key. This is particularly useful from an evasion perspective because YARA contains a xor modifier which will brute force single byte XOR’d strings. As an example, here is a YARA rule which looks for XOR’d strings in Beacon. 
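The difference between the two key schemes can be shown with a short Python sketch. The search function below is a simplified emulation of the idea behind YARA's xor modifier (try each single-byte key against the target string); the key values are arbitrary examples, not anything Cobalt Strike generates.

```python
from itertools import cycle

def xor(data, key):
    """XOR data with a repeating key of any length."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

def single_byte_xor_search(haystack, needle):
    """Return every single-byte key under which the XOR'd needle is found."""
    return [k for k in range(1, 256) if xor(needle, bytes([k])) in haystack]

NEEDLE = b"ReflectiveLoader"

# Old behaviour: one fixed byte masks the whole string (0x5A is an example key).
single_masked = xor(NEEDLE, b"\x5a")
# Newer behaviour: a multi-byte key (example 4-byte key shown here).
multi_masked = xor(NEEDLE, bytes([0x5A, 0x13, 0xC7, 0x88]))

single_hits = single_byte_xor_search(single_masked, NEEDLE)
multi_hits = single_byte_xor_search(multi_masked, NEEDLE)

print("single-byte key recovered:", [hex(k) for k in single_hits])
print("multi-byte key recovered: ", [hex(k) for k in multi_hits])
```

With a single-byte key the brute force needs at most 255 attempts per string, whereas no single-byte guess can undo a key whose bytes differ from one another.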

Stage.obfuscate is demonstrated in the screenshot from PE-Bear below, which compares a default Beacon to one with obfuscation enabled (notice the different DOS headers and masked/XOR’d section names): 

Figure 9. A screenshot from PE-Bear showing a comparison of an obfuscated Beacon DLL vs a default Beacon DLL.

The required sections will then be unmasked during the reflective loading process. Hence this will break some YARA signatures on the raw/static Beacon DLL, but not once it has been loaded into memory (i.e. rules for code fragments in the .text section will still hit on the virtual Beacon DLL). This is demonstrated (at a high level) below:

Figure 10. A diagram showing the two memory allocations which occur as a result of the reflective loading process for an obfuscated Beacon payload. Note, that as well as the masking on the raw Beacon DLL, the Virtual Beacon DLL does not have any DLL headers.

Also note that, as this diagram suggests, some things are still not masked in the raw Beacon DLL even when obfuscate is enabled: 

  • Reflective loader stub 
  • Exported reflective loader function 
  • Sleep mask 
  • Strings 

Hence, even an obfuscated Beacon payload will still trigger the same YARA rules identified previously for each of these respective components. One way to address this limitation is to implement custom obfuscation in a UDRL. This topic will be covered in much more detail in the next part of our UDRL development blog series.
 
Lastly, also bear in mind that the obfuscation applied to Beacon might make it look (more) suspicious to PE malware models if dropped to disk (i.e. the sketchy section names, higher entropy of some sections, etc. will all make it look more anomalous).

Modifying The Reflective Loader Stub

It should be clear by now that the default Cobalt Strike reflective loader stub is an obvious target for YARA signatures. However, it is possible to create our own DOS stub and apply it to Beacon payloads via the `stage.transform` block. Note that this example will only consider X64 payloads. 
 
The default 64 bit DOS stub used by Beacon is shown (and explained) below:

pop r10                ; MZ Header 
push r10               ; undo action above 
push rbp               ; save the stack base pointer 
mov rbp, rsp           ; create a new stack frame 
sub rsp, 0x20          ; create shadow space (x64 __fastcall) 
lea rbx, [rip - 0x16]  ; obtain shellcode base address 
mov rdi, rbx           ; save shellcode base address 
add rbx, 0x16EA4       ; add file offset of ReflectiveLoader to shellcode base 
call rbx               ; call ReflectiveLoader (returns DllMain address) 
mov r8d, 0x56A2B5F0    ; EXITFUNC value 
push 4                 ; push 4 to stack 
pop rdx                ; pop the value into rdx (second argument of call) 
mov rcx, rdi           ; move shellcode base address to rcx (first argument of call) 
call rax               ; call DllMain 

The following example is an alternate version of the stub described above:

pop r10               ; MZ Header 
lea rbx, [rip -0x08]  ; obtain the shellcode base address 
push r10              ; undo action of MZ header 
sub rsp, 0x28         ; create shadow space and align stack 
mov rdi, rbx          ; save shellcode base address 
add rbx, 0xb752       ; add half file offset of ReflectiveLoader to shellcode base  
add rbx, 0xb752       ; add half file offset of ReflectiveLoader to shellcode base 
call rbx              ; call ReflectiveLoader (returns DllMain address) 
mov rdx, 0x04         ; move 4 into rdx (second argument of call) 
mov rcx, rdi          ; move shellcode base address to rcx (first argument of call) 
call rax              ; call DllMain 

Note that the default ReflectiveLoader will actually call DllMain itself and return the address of DllMain to the shellcode stub. We then call DllMain a second time to start Beacon. This is why you may see two calls to DllMain in open source UDRLs. 
 
This stub performs similar steps but with some instruction substitution and in a slightly different order (with the exception of setting the EXITFUNC value, which is not technically required). Furthermore, there are many other ways of using instruction substitution to achieve a similar effect. For example, every `mov regA, regB` could be substituted with `push regB; pop regA` or `mov tmp, regB; mov regA, tmp` etc. Alternatively, a more automated approach could be to use something like Nettitude’s Shellcode Mutator. This does not actually mutate instructions per se (it randomly adds no-ops to existing shellcode) but it could also be used to break YARA signatures. 
 
In the screenshot below, we’ve used defuse.ca to assemble the alternate shellcode stub above:

Figure 11. A screenshot from defuse.ca showing the assembled code of our modified shellcode stub which will overwrite the DOS header.

As a note, for complicated instruction encoding reasons, `pop r10` actually assembles to 41 5A, so you will need to manually change “\x41\x5A” back to “\x4D\x5A” at the start of the string literal (4D 5A still disassembles as `pop r10`). This is because the default reflective loader hunts backwards for the MZ header (i.e. 4D 5A) and so will fail if it can’t find it.  
 
It is possible to modify the default DOS stub and use our updated version with strrep in the malleable C2 profile, as demonstrated below:

stage { 
     transform-x64 { 
          strrep                  "\x4D\x5A\x41\x52\x55\x48\x89\xE5\x48\x81\xEC\x20\x00\x00\x00\x48\x8D\x1D\xEA\xFF\xFF\xFF\x48\x89\xDF\x48\x81\xC3\xA4\x6E\x01\x00\xFF\xD3\x41\xB8\xF0\xB5\xA2\x56\x68\x04\x00\x00\x00\x5A\x48\x89\xF9\xFF\xD0" "\x4D\x5A\x48\x8D\x1D\xF8\xFF\xFF\xFF\x41\x52\x48\x83\xEC\x28\x48\x89\xDF\x48\x81\xC3\x52\xB7\x00\x00\x48\x81\xC3\x52\xB7\x00\x00\xFF\xD3\x48\xC7\xC2\x04\x00\x00\x00\x48\x89\xF9\xFF\xD0"; 
    } 
}

After restarting the Teamserver with the above strrep included within our profile, we can see that the new DOS stub has been applied to our exported Beacon payload:

Figure 12. A side by side comparison of the default Beacon shellcode stub (on the left) vs our modified shellcode stub (on the right) for an exported raw Beacon payload.

In the screenshot below, we have used udrl.py (from the udrl-vs kit) to inject the raw Beacon payload (i.e. beacon.bin) with the modified DOS header into memory to test that it works as expected:

Figure 13. An example using udrl.py (located in the udrl-vs kit) to execute our raw Beacon DLL with the modified DOS header to confirm it works as expected.

As a note, to be compatible with the default reflective loader, the reflective loader stub needs to do four things: 
 
1. Obtain the shellcode base address (i.e. start of the MZ header). 
2. Calculate the offset to the exported ReflectiveLoader function from the shellcode base. 
3. Call ReflectiveLoader(). 
4. Call DllMain with the shellcode base address as the hinstDLL parameter (first argument/rcx) and 4 as the reason code parameter (second argument/rdx). Note that the third argument isn’t required. 
 
As long as you perform these steps, you can add whatever code you like to get Beacon up and running. However, you should try and restrict the size of the shellcode stub to 60 bytes (offsets 0x00 to 0x3B) or else you may crash Beacon. This is because beyond this point you will overwrite the value of e_lfanew (which is located at offset 0x3C/60 in the DOS header). The value of e_lfanew is required by the reflective loader to hunt backwards in memory for the MZ & PE headers and so if you blitz this it will fail. 
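Before pasting a new stub into a profile, it can be worth sanity checking it against two of the constraints above. The Python sketch below checks that the stub still begins with 4D 5A and that it ends before e_lfanew at 0x3C, and then prints it in the `\x..` form used by strrep. The helper names are our own, and the checks cover only these two constraints (they do not validate that the instructions themselves are correct).

```python
E_LFANEW_OFFSET = 0x3C  # the stub may occupy offsets 0x00..0x3B only

# The alternate stub from the strrep example in this post.
NEW_STUB = bytes.fromhex(
    "4D 5A 48 8D 1D F8 FF FF FF 41 52 48 83 EC 28 48 89 DF "
    "48 81 C3 52 B7 00 00 48 81 C3 52 B7 00 00 FF D3 "
    "48 C7 C2 04 00 00 00 48 89 F9 FF D0")

def check_stub(stub):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if not stub.startswith(b"\x4D\x5A"):
        problems.append("stub must start with 4D 5A so the loader can find MZ")
    if len(stub) > E_LFANEW_OFFSET:
        problems.append("stub is %d bytes and would overwrite e_lfanew" % len(stub))
    return problems

def to_strrep_literal(stub):
    """Format bytes as the \\xNN string literal used in a malleable C2 profile."""
    return "".join("\\x%02X" % b for b in stub)

problems = check_stub(NEW_STUB)
unpatched = check_stub(b"\x41\x5A" + NEW_STUB[2:])  # forgot to restore 4D 5A
print(len(NEW_STUB), problems)
print(to_strrep_literal(NEW_STUB))
```

The `unpatched` case models forgetting to change the assembler's 41 5A back to 4D 5A, which is the mistake described earlier in this section.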
 
Lastly, as the magic_mz option also modifies the start of the DOS stub (and in turn what bytes the reflective loader looks for) it is incompatible with (and will supersede) the strrep approach outlined here. 

Sleep Mask

Even with all the modifications above, Beacon is still vulnerable to YARA rules targeting the .text/.data sections, as these are all still plainly visible in memory. Furthermore, Beacon’s run time data is similarly exposed (i.e. heap memory).  

The solution to this problem in Cobalt Strike is the sleep mask. The concept of this is simple: before Beacon sleeps, it will mask itself and any related memory (i.e. on the heap). When Beacon checks in it will briefly be exposed, however most of the time its memory will be masked and hence any valid signatures will fail to find their target. This is the key malleable C2 option to be configured. It will ensure that Beacon is only visible in memory for an extremely short window and will provide the most robust defence against in-memory signatures.
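The mask, sleep, unmask cycle can be modelled in a few lines of Python. This is only a conceptual sketch: the buffer stands in for Beacon's memory, the 13-byte key is an arbitrary example, and the real sleep mask operates on live process memory rather than a Python bytearray.

```python
def xor_in_place(buf, key):
    """Mask or unmask a buffer by XORing it with a repeating key."""
    for i in range(len(buf)):
        buf[i] ^= key[i % len(key)]

SIGNATURE = b"beacon.x64.dll"        # a default string a YARA rule might target
KEY = bytes(range(1, 14))            # example 13-byte key (hypothetical)

memory = bytearray(b"\x90" * 16 + SIGNATURE + b"\x00" * 16)
original = bytes(memory)

visible_awake = SIGNATURE in memory  # exposed while Beacon is running

xor_in_place(memory, KEY)            # mask before sleeping
visible_asleep = SIGNATURE in memory # a scan during sleep finds nothing

xor_in_place(memory, KEY)            # unmask on wake-up
restored = bytes(memory) == original

print(visible_awake, visible_asleep, restored)
```

Because XOR is its own inverse, the same routine both masks and unmasks, which is why the window of exposure is limited to the time Beacon spends awake.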

Enabling The Sleep Mask

One thing to bear in mind is that there are some extra steps required to make the sleep mask kit correctly mask the .text section when stage.userwx is set to false. While not strictly related to YARA scanning, it is generally always advised to avoid RWX memory (i.e. set userwx : false) as this is an obvious indicator of code injection and low hanging fruit for memory scanners. Hence, we recommend taking these extra steps to enable both settings. 

Prior to 4.7, sleep mask would only mask the .text section if it was RWX. Hence, if stage.userwx was set to false, Beacon’s .text section would reside in RX memory and would not be masked. The .text section is therefore at risk of trivial detection by the many YARA rules previously discussed and so this is not ideal. 

As of 4.7, we can configure the sleep mask kit to mask the .text section when stage.userwx is set to false. When this is enabled the sleep mask will change the .text protection to RW, mask the section, sleep, unmask the section, and then change the .text protection back to RX.  
 
To enable this, we need to set the following options in the stage block for our malleable C2 profile:

set userwx "false";
set sleep_mask "true";

After setting these values and restarting the TeamServer, run the build script found within the sleep mask kit. As of the latest version of the Arsenal Kit (20230315), this is performed via the following (example) command:

./build.sh 47 WaitForSingleObject true indirect /tmp/dst

For a full explanation of each parameter, you can run ./build.sh with no arguments. However, the key parameter for this discussion is the third argument (Mask_text), which must be set to ‘true’. After recompiling, load the resulting .cna script into the Script Manager. For further guidance, see the README found in the sleep mask kit.

As a note, for older versions of the sleep mask kit you will need to set MASK_TEXT_SECTION to 1 in sleepmask.c as demonstrated below:

/* Enable or Disable sleep mask capabilities */
#define MASK_TEXT_SECTION 1

Evasive Sleep Mask

At this point, Beacon will now avoid RWX memory, will not copy over its DLL headers, and will mask itself when sleeping. However, there is one weak link in this approach: the sleep mask itself is still present/exposed in memory. This is shown at a high level below:

Figure 14. A high level diagram showing the memory exposure of Beacon once the sleep mask has been enabled. Beacon itself is now masked, however the default sleep mask is visible in memory and vulnerable to YARA signatures.

The default sleep mask is therefore again a very attractive target for defenders and, unsurprisingly, there are plenty of rules which focus on this remaining in-memory footprint. For example, with both cleanup and sleep mask enabled, the Windows_Trojan_CobaltStrike_b54b94ac rule for the default sleep mask will trigger when Beacon is sleeping as demonstrated below:

Figure 15. The results of running the Elastic Cobalt Strike YARA rules against a process with Beacon injected and using the default sleep mask. Now Beacon is masked, we no longer get any hits for rules targeting code/strings within Beacon, however we do get a YARA hit for a code fragment corresponding to the default sleep mask.

This screenshot shows the call stack for the sleeping Beacon thread (tid: 900) and the results of running the same YARA rules against this process (pid: 5944). Now that Beacon is masked, the YARA rules that we identified before will no longer trigger, however we do see the expected hit for the default sleep mask (Windows_Trojan_CobaltStrike_b54b94ac). Note that we can see that this result was found in the same region of memory as the highlighted unbacked return address in the sleeping thread’s call stack (~0x1b4ef8c00d2). Therefore, even if you’re using Beacon with the default sleep mask, you’re still at risk of being trivially identified via in-memory YARA scanning. 

Clearly then, we also need to obfuscate the sleep mask memory while Beacon sleeps. We can do this by using the evasive sleep mask found in the Arsenal kit. This will use an external mechanism to scramble the sleep mask when sleeping and hence will break any signatures on itself as shown in the diagram below:

Figure 16. A high level diagram showing the memory exposure of Beacon once the evasive sleep mask has been enabled. Both Beacon and the sleep mask are now obfuscated so no YARA rules will fire.

This can be configured by setting the below in sleepmask.c:

#if _WIN64
#define EVASIVE_SLEEP 1
#endif

Also note that to avoid any issues when using the evasive sleep (especially with process injection), ensure that you enable the CFG bypass by modifying evasive_sleep.c to the below:

/*
 *   Enable the CFG bypass technique which is needed to inject into processes
 *   protected Control Flow Guard (CFG) on supported version of Windows.
 */
#define CFG_BYPASS 1

Once these two steps have been completed, you can once again rebuild and reload the .cna script for the changes to take effect. 
 
Alternatively, you could modify the default sleep mask and recompile it to break any static byte patterns or apply your own user-defined sleep mask (UDSM). Generally, customisation is king for both the sleep mask and reflective loading (see the UDRLs section below) and using custom / unknown code will obviously completely break pre-canned YARA signatures.

UDRLs

This post has demonstrated that there are numerous signatures for Beacon’s default reflective loader function. With the sleep mask, evasive sleep, and cleanup all enabled, a default reflective loader is less of an issue as our in-memory exposure is extremely limited. However, to avoid default YARA signatures you could consider using a custom UDRL. Our own blog series provides a guide on how to develop UDRLs and there are many excellent open-source UDRLs such as BokuLdr, TitanLdr, and AceLdr.
 
Note that if you do use a custom UDRL, many of the malleable C2 options outlined above are ignored. This is because how you modify/obfuscate Beacon is coupled with how the reflective loader works. For example, if you obfuscate a section, your reflective loader needs to know how to de-obfuscate it. Hence, it makes sense to leave these details to the UDRL developer to implement.

Post-Ex Reflective DLLs

There is a caveat to the approach suggested so far and that is in respect to post-ex reflective DLLs. As a note, not all post-ex functionality is run as a reflective DLL and many are implemented as BOFs; see the following documentation for a complete guide. 

If we run the same set of YARA rules against a process which has had the Cobalt Strike keylogger injected into it, we get two hits:

1. Default strings found in the keylogger post-ex DLL (the corresponding YARA rule is Windows_Trojan_CobaltStrike_0b58325e):

Windows_Trojan_CobaltStrike_0b58325e 9540 

0x14767a91d42:$a2: keylogger.x64.dll 
0x14767cb2b42:$a2: keylogger.x64.dll 
0x14767a8b568:$a4: %cE=======%c 
0x14767cac368:$a4: %cE=======%c 
0x14767a8b908:$a5: [unknown: %02X] 
0x14767cac708:$a5: [unknown: %02X] 
0x14767a91d54:$b1: ReflectiveLoader 
0x14767cb2b54:$b1: ReflectiveLoader 
0x14767a8b578:$b2: %c2%s%c 
0x14767cac378:$b2: %c2%s%c 
0x14767a8b8c0:$b3: [numlock] 
0x14767cac6c0:$b3: [numlock] 
0x14767a8b562:$b4: %cC%s 
0x14767cac362:$b4: %cC%s 
0x14767a8b658:$b5: [backspace] 
0x14767cac458:$b5: [backspace] 
0x14767a8b8d0:$b6: [scroll lock] 
0x14767cac6d0:$b6: [scroll lock] 
0x14767a8b688:$b7: [control] 
0x14767cac488:$b7: [control] 
0x14767a8b6f4:$b8: [left] 
0x14767cac4f4:$b8: [left] 
0x14767a8b6c8:$b9: [page up] 
0x14767cac4c8:$b9: [page up] 
0x14767a8b6d8:$b10: [page down] 
0x14767cac4d8:$b10: [page down] 
0x14767a8b718:$b11: [prtscr] 
0x14767cac518:$b11: [prtscr] 
0x14767a8b8e0:$b13: [ctrl] 
0x14767cac6e0:$b13: [ctrl] 
0x14767a8b6ec:$b14: [home] 
0x14767cac4ec:$b14: [home] 
0x14767a8b6a0:$b15: [pause] 
0x14767cac4a0:$b15: [pause] 
0x14767a8b670:$b16: [clear] 
0x14767cac470:$b16: [clear]

2. The reflective loader stub (the corresponding YARA rule is Windows_Trojan_CobaltStrike_29374056):

Windows_Trojan_CobaltStrike_29374056 9540 

0x14767a80000:$a1: 4D 5A 41 52 55 48 89 E5 48 81 EC 20 00 00 00 48 8D 1D EA FF FF FF 48 81 C3 10 19 00 00 FF D3 
0x14767ca0000:$a1: 4D 5A 41 52 55 48 89 E5 48 81 EC 20 00 00 00 48 8D 1D EA FF FF FF 48 81 C3 10 19 00 00 FF D3

Once again we get duplicate results which correspond to the same rules flagging on both the raw post-ex DLL and the virtual post-ex DLL.
 
The main malleable C2 option we have to play with for post-ex DLLs is setting ‘post-ex.obfuscate’ to true. This option will: 

  • Statically mask the rdata/data sections 
  • Scrub module/function name strings 
  • Dynamically mask the rdata section at run time for long running tasks (this behaviour will vary between different post-ex DLLs) 
  • Not copy over the DLL headers during reflective loading 
  • Avoid RWX memory (hence if you don’t set this option, post-ex DLLs will use RWX memory) 

Therefore, after enabling post-ex.obfuscate, we are left with a single hit for the reflective loader stub from the raw post-ex DLL (Windows_Trojan_CobaltStrike_29374056). This is because the DLL headers were not copied over to the virtual post-ex DLL and the previously identified strings are now masked in both the raw and virtual DLLs:

Figure 17. The results of running the Elastic Cobalt Strike YARA rules against a process hosting the Cobalt Strike keylogger with post-ex.obfuscate enabled.

Furthermore, once the keylogger job is killed, we still get this result. In fact, neither memory allocation is cleaned up and both will remain in memory until the process terminates. Note that the obfuscate flag means that the strings in rdata will be cleared when the post-ex DLL thread exits; however, the memory is not freed.  

Hence, post-ex reflective DLLs do not properly clean up memory and so run a real risk of triggering trivial YARA signatures. As a caveat, there will be differences in behaviour for different post-ex reflective DLLs which are not covered in detail here, but the key takeaway is that all of them currently have this limitation. 
 
This is not a risk if you fork and run (which can be noisy) but will be if used either intra-process or for a long running job injected into a separate process. Therefore, if you know the security controls in place involve in-memory YARA scanning, be wary when using post-ex reflective DLLs (even with obfuscate enabled) and prefer BOF equivalents where possible. This limitation for post-ex DLLs is something we are planning to overhaul in the 4.9 release.

Conclusion

With the suggested approach outlined in this blog, Beacon is now robust against in-memory YARA scanning. It will be masked in memory for all but an extremely brief check in time and when it is visible we have taken as many precautions as possible to limit its exposure. 
 
Our suggested malleable C2 profile therefore looks something like the below:

stage {
     set userwx "false";
     set cleanup "true";
     set obfuscate "true";

     set magic_mz_x64 "<CHANGEME>"; 
     set magic_pe "<CHANGEME>";
     # Alternatively, modify the DOS header via the
     # transform.strrep approach outlined previously. 

     # For sleep mask ensure you:
     # - Enable masking the text section (set Mask_text to true 
     #   for ./build.sh or for older sleep mask kits set 
     #   MASK_TEXT_SECTION to 1 in sleepmask.c).
     # - Enable evasive sleep (#define EVASIVE_SLEEP 1 in sleepmask.c).
     # - Enable CFG bypass (#define CFG_BYPASS 1 in evasive_sleep.c).
     # - Ensure sleep mask is recompiled after setting the above
     #   and that the .cna script is loaded into the Script Manager.
     set sleep_mask "true";

     # Remove default strings found in Beacon.
     transform-x64 {
          strrep "ReflectiveLoader" "<CHANGEME>";
          strrep "beacon.x64.dll" "<CHANGEME>";
          strrep "(admin)" "(adm)";
          # etc.
     }
}

Note that this is a suggested profile for bypassing YARA signatures (there are likely to be many other security controls in place and some of these options may not be desired depending on the context).

At this point, the next line of defence for defenders is either traditional memory scanning approaches for injected code or hunting for sleeping threads with unbacked memory (see https://github.com/thefLink/Hunt-Sleeping-Beacons for an example of this detection technique), both of which are much more complicated problems to solve at scale. Furthermore, with the 4.8 release you can enable stack spoofing to bypass the latter. This can be enabled once again by modifying sleepmask.c to the below and recompiling/reloading the resulting .cna script:

#if EVASIVE_SLEEP 
// #include "evasive_sleep.c" 
#include "evasive_sleep_stack_spoof.c" 
#endif

This does include a default stack to spoof but as ever customisation is recommended. For more guidance see the README in the Arsenal kit.

As a final note, the analysis in this blog post is feeding into changes that we plan to make in the next Cobalt Strike release. We want to give users more control over the reflective loading process for both Beacon and post-ex DLLs and enable users to easily push back against YARA signatures.

Behind the Mask: Spoofing Call Stacks Dynamically with Timers

 

This blog introduces a PoC technique for spoofing call stacks using timers. Prior to our implant sleeping, we can queue up timers to overwrite its call stack with a fake one and then restore the original before resuming execution. Hence, in the same way we can mask memory belonging to our implant during sleep, we can also mask the call stack of our main thread. Furthermore, this approach avoids having to deal with the complexities of X64 stack unwinding, which is typical of other call stack spoofing approaches. 

The Call Stack Problem

The core memory evasion problem from an attacker’s perspective is that implants typically operate from injected code (ignoring any module hollowing approaches). Therefore, one of the pillars of modern detection is to monitor for the creation of threads which belong to unbacked (or ‘floating’) memory. This blog by Elastic is a good approximation to the state of the art in terms of anomalous thread detection from an EDR perspective. 

However, another implication of this problem for attackers is that all of the implant’s API calls will also originate from unbacked memory. By examining call stacks either at the time of a specific API invocation, or by proactively inspecting running threads (i.e. ones which are sleeping), defenders can identify suspicious call stacks via return addresses that point to unbacked memory.

This is one detection area which historically has not received a huge amount of focus/research in modern EDR stacks (in my experience). However, this is starting to change with the release of open-source tools such as Hunt-Sleeping-Beacons, which will proactively inspect “sleeping” threads to find call stacks with unbacked regions. This demonstrably provides a high confidence signal of suspicious activity; hence it is valuable to EDRs and something attackers need to seriously consider in their evasion TTPs.  

Call Stack Inspection at Rest

The first problem to solve from an attacker’s perspective is how to manipulate the call stack of a sleeping thread so that it can bypass this type of inspection. This could be performed by the actual thread itself or via some external mechanism (APCs etc.).  

Typically, this is referred to as “spoofing at rest” (h/t to Kyle Avery here for this terminology in his excellent blog on avoiding memory scanners). The first public attempt to solve this problem is mgeeky’s ThreadStackSpoofer, which overwrites the last return address on the stack. 

As a note, the opposite way to approach this problem is by having no thread or call stack present at all, à la DeathSleep. The downside of this technique is the potential for the repeated creation of unbacked threads (depending on the exact implementation), which is a much greater evil in modern environments. However, future use of Hardware Stack Protection by EDR vendors may make this type of approach inevitable.

Call Stack Inspection During Execution – User Mode

The second problem is call stack inspection during execution, which could either be implemented in user mode or kernel mode. In terms of user mode implementation, this would typically involve hooking a commonly abused function and walking the stack to see where the call originated. If we find unbacked memory, it is highly likely to be suspicious. An obvious example of this is injected shellcode stagers calling WinInet functions. MalMemDetect is a good example of an open-source project that demonstrates this detection technique. 

For these scenarios, techniques such as RET address spoofing are normally sufficient to remove any evidence of unbacked addresses from the call stack. At a high level, this involves inserting a small assembly harness around the target function which will manually replace the last return address on the stack and redirect the target function to return to a trampoline gadget (e.g. jmp rbx). 

Additionally, there is SilentMoonWalk which uses a clever de-syncing approach (essentially a ROP gadget built on X64 stack unwinding codes). This can dynamically hide the origin of a function call and will similarly bypass these basic detection heuristics. Most importantly to an operator, both these techniques can be performed by the acting thread itself and do not require any external mechanism. 

From an opsec perspective, it is important to note that many of the techniques referenced in this blog may produce anomalous call stacks. Whether this is an issue or not depends on the target environment and the security controls in place. The key consideration is whether the call stack generated by an action is being recorded somewhere (say in the kernel, see next section) and appended to an event/alert. If this is the case, it may look suspicious to trained eyes (i.e. threat hunters/IR). 

To demonstrate this, we can take SilentMoonWalk’s desync stack spoofing technique as an example (this is a slightly easier use case as other techniques can be implementation specific). As stated previously, this technique needs to find functions which implement specific stack unwinding operations (a full overview of X64 stack unwinding is beyond the scope of this blog but see this excellent CodeMachine article for further reading).

For example, the first frame must always perform a UWOP_SET_FPREG operation, the second UWOP_PUSH_NONVOL (rbp) etc. as demonstrated in WinDbg below:

0:000> knf
#   Memory  Child-SP          RetAddr               Call Site 
00           0000001d`240feb98 00007ffe`b622d831     win32u!NtUserWaitMessage+0x14 
[…] 
08        40 0000001d`240ff140 00007ffe`b483b576     KERNELBASE!CreatePrivateObjectSecurity+0x31 
09        40 0000001d`240ff180 00007ffe`b48215a5     KERNELBASE!Internal_EnumSystemLocales+0x406 
0a       3e0 0000001d`240ff560 00007ffe`b4870e22     KERNELBASE!SystemTimeToTzSpecificLocalTimeEx+0x25 
0b       680 0000001d`240ffbe0 00007ffe`b6d87614     KERNELBASE!PathReplaceGreedy+0x82 
0c       100 0000001d`240ffce0 00007ffe`b71826a1     KERNEL32!BaseThreadInitThunk+0x14 
0d        30 0000001d`240ffd10 00000000`00000000     ntdll!RtlUserThreadStart+0x21 

0:000> .fnent KERNELBASE!PathReplaceGreedy+0x82 
Debugger function entry 000001cb`dda19c60 for: 
(00007ffe`b4870da0)   KERNELBASE!PathReplaceGreedy+0x82   |  (00007ffe`b4871050)   KERNELBASE!SortFindString
[…]  
  06: offs 13, unwind op 3, op info 2	UWOP_SET_FPREG.

0:000> .fnent KERNELBASE!SystemTimeToTzSpecificLocalTimeEx+0x25 
Debugger function entry 000001cb`dda19c60 for:  
(00007ffe`b4821580)   KERNELBASE!SystemTimeToTzSpecificLocalTimeEx+0x25   |  (00007ffe`b482182c)   KERNELBASE!AddTimeZoneRules 
[…] 
08: offs b, unwind op 0, op info 5	UWOP_PUSH_NONVOL reg: rbp. 

This output shows the call stack for the spoofed SilentMoonWalk thread (knf) and the unwind operations (.fnent) for two of the functions found on the call stack (PathReplaceGreedy / SystemTimeToTzSpecificLocalTimeEx).
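To make the .fnent fields above easier to read, the following is a minimal sketch of how a single 2-byte unwind code slot breaks down into the offset, unwind op, and op info values that WinDbg prints. It assumes the documented x64 layout (one byte of prologue offset, then a byte holding the operation in its low nibble and the op info in its high nibble). The byte values in the example are reconstructed from the WinDbg output above rather than read from KERNELBASE, and the type and function names are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A subset of the x64 unwind operation codes. */
enum {
    UWOP_PUSH_NONVOL = 0, /* push a non-volatile register */
    UWOP_ALLOC_LARGE = 1, /* large stack allocation */
    UWOP_ALLOC_SMALL = 2, /* small stack allocation */
    UWOP_SET_FPREG   = 3  /* establish the frame pointer register */
};

/* Decoded view of one unwind code slot (illustrative type name). */
typedef struct {
    uint8_t code_offset; /* "offs" in the .fnent output */
    uint8_t unwind_op;   /* "unwind op" in the .fnent output */
    uint8_t op_info;     /* "op info" in the .fnent output */
} unwind_code_fields;

/* Split the two raw bytes of a slot into its three fields. */
unwind_code_fields decode_unwind_code(const uint8_t slot[2])
{
    unwind_code_fields f;
    f.code_offset = slot[0];
    f.unwind_op   = (uint8_t)(slot[1] & 0x0F);
    f.op_info     = (uint8_t)(slot[1] >> 4);
    return f;
}

/* Register numbering used by op info for UWOP_PUSH_NONVOL. */
const char *nonvol_reg_name(uint8_t reg)
{
    static const char *const names[16] = {
        "rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8",  "r9",  "r10", "r11", "r12", "r13", "r14", "r15"
    };
    return names[reg & 0x0F];
}

/* Worked example using bytes reconstructed from the output above. */
void example_unwind_codes(void)
{
    const uint8_t set_fpreg[2] = { 0x13, 0x23 }; /* offs 13, op 3, info 2 */
    const uint8_t push_rbp[2]  = { 0x0b, 0x50 }; /* offs b,  op 0, info 5 */

    unwind_code_fields a = decode_unwind_code(set_fpreg);
    unwind_code_fields b = decode_unwind_code(push_rbp);

    assert(a.code_offset == 0x13 && a.unwind_op == UWOP_SET_FPREG);
    assert(b.code_offset == 0x0b && b.unwind_op == UWOP_PUSH_NONVOL);
    assert(strcmp(nonvol_reg_name(b.op_info), "rbp") == 0);
}
```

A scanner looking for the pattern of unwind codes described here would need this kind of decoding as a first step before comparing consecutive frames.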

The key takeaway is that this results in a call stack which would never occur for a legitimate code path (and is therefore anomalous). For example, KERNELBASE!PathReplaceGreedy does not call KERNELBASE!SystemTimeToTzSpecificLocalTimeEx, and so on. Furthermore, an EDR could itself attempt to search for this pattern of unwind codes during a proactive scan of a sleeping thread. Again, whether this is an issue depends entirely on the controls/telemetry in place but as operators it is always worth understanding the pros and cons of all the techniques at our disposal.

Lastly, a trivial way of calling an API with a ‘clean’ call stack is to get something else to do it for you. The typical example is to use any callback type functionality provided by the OS (same applies for bypassing thread creation start address heuristics). The limitation for most callbacks is that you can normally only supply one argument (although there are some notable exceptions and good research showing ways around this). 

Call Stack Inspection During Execution – Kernel Mode

A user mode call stack can be captured inline during any of the kernel callback functions (i.e. on process creation, thread creation/termination, handle access etc.). As an example, the Sysmon driver uses RtlWalkFrameChain to collect a user mode call stack for all process access events (i.e. calling OpenProcess to obtain a HANDLE). Hence, this capability makes it trivial to spot unbacked memory/injected code (‘UNKNOWN’) attempting to open a handle to LSASS. For example, in this contrived scenario you would get a call stack similar to the following:

0:020> knf 
#       Memory    Child-SP           RetAddr               Call Site 
00                0000004c`453cf428  00007ffd`7f1006fe     ntdll!NtOpenProcess 
01           8    0000004c`453cf430  00007ff6`98fe937f     KERNELBASE!OpenProcess+0x4e 
02          70    0000004c`453cf4a0  000002ad`c3fd1121     000002ad`c3fd1121 (UNKNOWN) 

Additionally, it is now possible to collect call stacks with the ETW threat intelligence provider. The call stack addresses are unresolved (i.e. an EDR would need to keep its own internal process module cache to resolve symbols) but they essentially give EDR vendors the potential to capture near real time call stacks (where the symbols are then resolved asynchronously). Therefore, this can be seen as a direct replacement for user mode hooking which is, critically, captured in the kernel. It is not unrealistic to imagine a scenario in the future in which unbacked/direct API calls to sensitive functions (VirtualAlloc / QueueUserAPC / SetThreadContext / VirtualProtect etc.) are trivial to detect.
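The module cache idea can be illustrated with a small, portable sketch of the defender-side check: given a list of return addresses and a cache of loaded module ranges, report the first frame that does not fall inside any module. This is a simplified model and not how any particular EDR implements it. The struct, function names, and the KERNELBASE base/size values are hypothetical; only the two return addresses are taken from the OpenProcess call stack shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry in a (hypothetical) per-process module cache. */
typedef struct {
    const char *name;
    uint64_t base;
    uint64_t size;
} module_range;

/* Return the module containing addr, or NULL if addr is unbacked. */
const module_range *find_module(const module_range *mods, size_t n_mods,
                                uint64_t addr)
{
    for (size_t i = 0; i < n_mods; i++) {
        if (addr >= mods[i].base && addr - mods[i].base < mods[i].size)
            return &mods[i];
    }
    return NULL;
}

/* Return the index of the first frame whose return address is not
 * inside any cached module, or -1 if every frame is backed. */
int first_unbacked_frame(const module_range *mods, size_t n_mods,
                         const uint64_t *ret_addrs, size_t n_frames)
{
    for (size_t i = 0; i < n_frames; i++) {
        if (find_module(mods, n_mods, ret_addrs[i]) == NULL)
            return (int)i;
    }
    return -1;
}

/* Worked example: module ranges are assumptions for illustration. */
void example_unbacked_check(void)
{
    const module_range mods[] = {
        { "KERNELBASE.dll", 0x00007ffd7f0c0000ULL, 0x2f0000ULL },
    };
    const uint64_t ret_addrs[] = {
        0x00007ffd7f1006feULL, /* KERNELBASE!OpenProcess+0x4e */
        0x000002adc3fd1121ULL, /* shown as UNKNOWN in the debugger */
    };

    assert(first_unbacked_frame(mods, 1, ret_addrs, 2) == 1);
}
```

Because the check is a range lookup per frame, it can be run asynchronously over captured call stacks, which is what makes this kind of telemetry practical at scale.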

These scenarios were the premise for some of my own previous research into call stack spoofing during execution: https://github.com/WithSecureLabs/CallStackSpoofer. The idea was to offload the API call to a new thread, which we could initialise to a fake state, to hide the fact that the call originated from unbacked memory. My original PoC applied this idea to OpenProcess but it could easily be applied to image loads etc.

The key requirement here was that any arbitrary call stack could be spoofed, so that even if a threat hunter was reviewing an alert containing the call stack, it would still look indistinguishable from other threads. The downsides of this approach were the need to create a new thread, how best to handle this spoofed thread, and the reliance on a hard coded / static call stack.

Call Stack Masking

Having given a brief review of the current state of research into call stack spoofing, this blog will demonstrate a new call stack spoofing technique: call stack masking. The PoC introduced in this blog post solves the spoofing at rest problem by masking a sleeping thread’s call stack via an external mechanism (timers).

While researching this topic in the past, I spent a large amount of time trying to get to grips with the complexities of X64 stack unwinding in order to produce TTPs to perform stack spoofing. This complexity is also present in a number of the other techniques discussed above. However, it occurred to me that there is a much simpler way to spoof/mask the call stack without having to deal with these intricacies. 

If we consider a generic thread that is performing any kind of wait, by definition, it cannot modify its own stack until the wait is satisfied. Furthermore, its stack is always read-writable. Therefore, we can use timers to:

  1. Create a backup of the current thread stack
  2. Overwrite it with a fake thread stack
  3. Restore the original thread stack just before resuming execution 
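The three steps above can be modelled with ordinary buffers. The sketch below is a portable simulation only: it treats the "stack" as a plain byte array and performs the backup, overwrite, and restore with memcpy. In the PoC these copies are driven by timers while the target thread is waiting; nothing here touches a real thread stack, and the names and the MASK_MAX limit are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MASK_MAX 256 /* arbitrary upper bound for this sketch */

/* Holds the saved copy of the region that is being masked. */
typedef struct {
    unsigned char saved[MASK_MAX];
    size_t len;
} stack_backup;

/* Steps 1 and 2: back up the region, then overwrite it with the fake
 * contents. Returns 0 on success or -1 if len exceeds MASK_MAX. */
int mask_region(unsigned char *region, size_t len,
                const unsigned char *fake, stack_backup *backup)
{
    if (len > MASK_MAX)
        return -1;
    memcpy(backup->saved, region, len);
    backup->len = len;
    memcpy(region, fake, len);
    return 0;
}

/* Step 3: restore the original contents before execution resumes. */
void unmask_region(unsigned char *region, const stack_backup *backup)
{
    memcpy(region, backup->saved, backup->len);
}

/* Worked example on a simulated 8-byte "stack". */
void example_mask_roundtrip(void)
{
    unsigned char stack[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    const unsigned char original[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    const unsigned char fake[8] = { 9, 9, 9, 9, 9, 9, 9, 9 };
    stack_backup backup;

    assert(mask_region(stack, sizeof stack, fake, &backup) == 0);
    assert(memcmp(stack, fake, sizeof stack) == 0);     /* masked */

    unmask_region(stack, &backup);
    assert(memcmp(stack, original, sizeof stack) == 0); /* restored */
}
```

The ordering is the important part: the backup must complete before the overwrite, and the restore must complete before the wait is satisfied, otherwise the thread would resume on the fake stack.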

Any timer objects could be used, but for convenience I based my PoC on C5pider’s Ekko sleep obfuscation technique.

The only remaining challenge is to work out the value of RSP once our target thread is sleeping. This can be achieved using compiler intrinsics (_AddressOfReturnAddress) to obtain the Child-SP of the current frame. Once we have this, we can subtract the total stack utilisation of the expected next two frames (i.e. KERNELBASE!WaitForSingleObjectEx and ntdll!NtWaitForSingleObject) to find the expected value of RSP at sleep time.
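The subtraction described above is simple enough to check by hand. The sketch below is a minimal illustration with a hypothetical helper name; it uses the OpenProcess call stack from the kernel mode section as a worked example, because that WinDbg output already lists the Child-SP of each frame. The frame sizes for KERNELBASE!WaitForSingleObjectEx and ntdll!NtWaitForSingleObject are not given in this blog and would need to be determined for the target build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Subtract the stack utilisation of each expected callee frame from a
 * starting Child-SP to predict RSP in the innermost frame. */
uint64_t expected_rsp(uint64_t child_sp, const uint64_t *frame_sizes,
                      size_t n_frames)
{
    uint64_t rsp = child_sp;
    for (size_t i = 0; i < n_frames; i++)
        rsp -= frame_sizes[i];
    return rsp;
}

/* Worked example using the values from the OpenProcess call stack:
 * the caller's Child-SP is 0000004c`453cf4a0, OpenProcess uses 0x70
 * bytes and NtOpenProcess uses 0x8 bytes (the "Memory" column). */
void example_expected_rsp(void)
{
    const uint64_t frame_sizes[] = { 0x70, 0x8 };
    uint64_t rsp = expected_rsp(0x0000004c453cf4a0ULL, frame_sizes, 2);

    /* Matches the Child-SP shown for ntdll!NtOpenProcess. */
    assert(rsp == 0x0000004c453cf428ULL);
}
```

The same arithmetic applies to the wait functions, with the caveat that frame sizes can differ between Windows builds and so should not be assumed to be constant.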

Lastly, to make our masked thread look as realistic as possible, we can copy the start address and call stack of an existing (and legitimate) thread.

PoC || GTFO

The PoC can be found here: https://github.com/Cobalt-Strike/CallStackMasker

The PoC operates in two modes: static and dynamic. The static mode contains a hard coded call stack that was found in spoolsv.exe via Process Explorer. This thread is shown below and can be seen to be in a state of ‘Wait:UserRequest’ via KERNELBASE!WaitForSingleObjectEx:

The screenshot below demonstrates static call stack masking. The start address and call stack of our masked thread are identical to the thread identified in spoolsv.exe above:

The obvious downside of the static mode is that we are still relying on a hard coded call stack. To solve this problem the PoC also implements dynamic call stack masking. In this mode, it will enumerate all the accessible threads on the host and find one in the desired target state (i.e. UserRequest via WaitForSingleObjectEx). Once a suitable thread stack is found, it will copy it and use that to mask the sleeping thread. Similarly, the PoC will once again copy the cloned thread’s start address to ensure our masked thread looks legitimate.
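The selection step in dynamic mode amounts to a filter over the enumerated threads. The following is a simplified, portable model of that filter and not the PoC’s actual implementation: the thread_info struct, its fields, and the function name are hypothetical, and the thread list is supplied as plain data rather than queried from the system. The taskhostw.exe process and thread IDs are the ones referenced later in this post; the other entries are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified description of an enumerated thread (hypothetical). */
typedef struct {
    uint32_t pid;
    uint32_t tid;
    const char *wait_reason;  /* e.g. "UserRequest" */
    const char *top_function; /* function the thread is waiting in */
} thread_info;

/* Return the first thread in the desired wait state, or NULL. */
const thread_info *pick_thread_to_mimic(const thread_info *threads,
                                        size_t n_threads,
                                        const char *wait_reason,
                                        const char *top_function)
{
    for (size_t i = 0; i < n_threads; i++) {
        if (strcmp(threads[i].wait_reason, wait_reason) == 0 &&
            strcmp(threads[i].top_function, top_function) == 0)
            return &threads[i];
    }
    return NULL;
}

/* Worked example: only the 4520/5452 entry comes from this post. */
void example_pick_thread(void)
{
    const thread_info threads[] = {
        { 1000, 1004, "WrQueue",     "NtWaitForWorkViaWorkerFactory" },
        { 4520, 5452, "UserRequest", "WaitForSingleObjectEx" },
        { 2000, 2008, "UserRequest", "WaitForMultipleObjectsEx" },
    };
    const thread_info *t = pick_thread_to_mimic(
        threads, 3, "UserRequest", "WaitForSingleObjectEx");

    assert(t != NULL && t->pid == 4520 && t->tid == 5452);
}
```

Returning the first match keeps the sketch short; a fuller version could collect every candidate and choose between them.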

If we run the PoC with the ‘--dynamic’ flag, it will locate another thread’s call stack to mimic as shown below:

The target process (taskhostw.exe / 4520), thread (5452), and call stack identified above are shown below in Process Explorer:

If we now examine the call stack and start address of the main thread belonging to CallStackMasker, we can see it is identical to the mimicked thread:

Below is another example of CallStackMasker dynamically finding a shcore.dll based thread call stack from explorer.exe to spoof: 

The screenshot below shows the real ‘unmasked’ call stack:

Currently the PoC only supports WaitForSingleObject but it would be trivial to add in support for WaitForMultipleObjects.

As a final note, this PoC uses timer-queue timers, which I have previously demonstrated can be enumerated in memory: https://github.com/WithSecureLabs/TickTock. However, this PoC could be modified to use fully fledged kernel timers to avoid this potential detection opportunity.