Cobalt Strike and Outflank Security Tooling: Friends in Evasive Places
This is a joint blog written by the Cobalt Strike and Outflank teams. It is also available on the Outflank site.
Over the past few months there has been increasing collaboration and knowledge sharing internally between the Cobalt Strike and Outflank R&D teams. We are excited about the innovation opportunities made possible by this teamwork and have decided to align Cobalt Strike and Outflank Security Tooling (OST) closely going forward. Although we are actively collaborating, Cobalt Strike will continue to be the industry standard Command & Control (C2) framework, while Outflank Security Tooling (OST) will continue to offer a red team toolbox for all environments containing custom tradecraft that is OPSEC safe, evasive by design, and simple to use. Our vision is that Cobalt Strike and OST together will provide the best red team offering on the planet.
This blog will provide an update of the technical strategy of each product individually before giving a glimpse into the future of the two combined.
Cobalt Strike
Cobalt Strike is the industry standard Command & Control framework. Following the acquisition of Cobalt Strike by Fortra in 2020, a conscious decision was taken to follow the technical strategy employed by founder Raphael Mudge in taking Cobalt Strike to the next level. The core tenets of this strategy are:
Stability: Cobalt Strike must remain reliable and stable; nobody wants to lose their Beacons.
Evasion through flexibility: Since its inception, Cobalt Strike has always been an adversary emulation tool. It is designed to enable operators to mimic other malware and the TTPs they desire. Hence, in its default state, Beacon is pretty trivial to detect. This however has never been the point; Cobalt Strike has flexibility built into key aspects of its offensive chain. You can tinker with how Beacon is loaded into memory, how process injection is done, what your C2 traffic looks like etc. We don’t want to bake TTPs into Beacon which become signatured over time (Cobalt Strike’s implementation of module stomping is a good example of this). We want to enable operators to customise Beacon to use their own original TTPs. Our R&D effort will continue to focus on building in flexibility into all aspects of the offensive chain and to give operators as much control as possible over the TTPs they employ.
Outflank & OST
In September last year Fortra acquired Outflank. Outflank is a security consultancy based in Amsterdam with deep expertise in red teaming and a proven track record of world class research. You may know the team from their work on Direct Sys Calls in Beacon Object Files, various public tools, Microsoft Office tradecraft (derbycon, troopers, Blackhat Asia, brucon, x33fcon), or on the red team SIEM Redelk.
OST is not a C2 product but a collection of offensive tools and tradecraft, offering:
A broad arsenal of offensive tools for different stages of red teaming.
Tools that are designed to be OPSEC safe and evade existing security controls (AV/EDR).
Advanced tradecraft via understandable interfaces, instead of an operator needing to write or compile custom low-level code.
A knowledge sharing hub where trusted & vetted red teamers discuss tradecraft, evasion, and R&D.
An innovative cloud delivery platform which enables fast release cycles, and complex products such as ‘compilation as a service’, while still allowing any customer to run and manage their own offensive infrastructure. Although OST is offered as a cloud model, it is possible to use the offensive tools and features offline and in air gapped environments.
Hence, it is a toolbox for red teamers made by red teamers, enabling operators to work more efficiently and focus on their job at hand. It contains features such as: a payload generator to build sophisticated artifacts and evade anti-virus / EDR products, a custom .NET obfuscator, credential dumpers, kernel level capabilities, and custom BOF implementations of offensive tools (such as KerberosAsk as an alternative to Rubeus).
Going forward, OST will continue to provide a full suite of bleeding-edge tools to solve the main challenges facing security consultants today (i.e., on prem/workstation attacks, recon, cloud etc.). Outflank’s R&D team remain active in red teaming engagements and so all these tools are being continually battle tested on live red team operations. Furthermore, OST will continue to grow as a vetted knowledge hub and an offensive R&D powerhouse that brings novel evasion, tradecraft, and tooling for its customers.
Combining forces: Cobalt Strike and Outflank Security Tooling
Having outlined the technical strategies of Cobalt Strike and OST above, it is clear that both products naturally complement each other. Therefore, we have decided to align the two products closely going forward.
In our joint roadmap, both products will stay true to their visions as outlined above. Cobalt Strike will continue to push the boundaries of building flexibility into every stage of the offensive chain, e.g. via technologies such as BOFs, and OST will continue to leverage this flexibility to deploy novel tradecraft, as well as continuously releasing stand-alone tools.
Furthermore, both teams are already cooperating extensively, which is further advancing innovation and product development. Outflank’s experience in red teaming is providing valuable insight and feedback into new Cobalt Strike features, while joint research projects between the Cobalt Strike and Outflank R&D teams is already generating new TTPs. Together, we are regularly evaluating offensive innovation and adjusting the roadmap of both products accordingly. This ensures that both Cobalt Strike and OST remain cutting edge and that any new features are designed to integrate seamlessly between the two.
This approach is already bearing fruit; OST recently released a feature focusing on Cobalt Strike Integrations, specifically custom User Defined Reflective Loaders, which we will explore in more detail below.
Case Study : User Defined Reflective Loaders
Cobalt Strike has relied on reflective loading for a number of years now and we have endeavoured to give users as much control over the reflective loading process as possible via Malleable C2 options. However, we always want to push the boundaries in terms of building flexibility into Cobalt Strike so that users can customize Beacon to their liking. This was why we introduced User Defined Reflective Loaders (UDRLs). This enables operators to write their own reflective loader and bake their own tradecraft into this stage of the offensive chain. Furthermore, we recognise that UDRLs can be challenging to develop, which is why we started our own blog series on UDRL development (with a second post on implementing custom obfuscation dropping soon).
As long-term Cobalt Strike users, Outflank also recognised the complexities and time constraints that red teams face when developing custom UDRLs. Hence, they decided to put their own experience and R&D into developing novel UDRLs as part of their Cobalt Strike Integrations feature on OST, as shown below:
Figure 1. The Cobalt Strike Integrations page in OST.
With this feature, it is now possible in OST to stomp a custom UDRL developed by Outflank onto a given Beacon payload. There are currently two custom loaders available and more are in the pipeline. Most pertinently, operators do not need to get into the weeds with Visual Studio/compilers, while still being able to use advanced UDRLs that are OPSEC safe and packed with Outflank R&D.
Bypassing YARA signatures
Furthermore, OST will also check the stomped Beacon payload against a number of public YARA signatures and automatically modify Beacon to bypass any triggering rules, as demonstrated below:
Figure 2. The workflow for stomping a custom UDRL in OST. Notice that the left column (’Pre-processing‘) shows the YARA rules which flag on the Beacon payload before any modifications are made. The column on the right (’Post-processing‘) shows that these rules no longer trigger after OST has made its modifications.
We have previously blogged about YARA signatures targeting Beacon and so this is an important ‘evasion in depth’ step built into payload generation within OST.
Once Beacon has been equipped with a custom UDRL, and YARA bypasses have been applied, the payload can be seamlessly integrated with other OST features. For example, we can import the new payload into OST’s payload generator to create advanced artifacts which can be used for phishing, lateral movement, or persistence. This whole workflow is demonstrated below:
Recording of the User Defined Reflective Loader feature as available in OST
This feature is a great example of the joint roadmap in action; both the UDRL stomper and the YARA module originated from collaboration and shared knowledge between the CS and Outflank teams.
The Road Ahead
Novel tradecraft: The UDRL and YARA integration is just the first step. OST’s Cobalt Strike integrations will be further extended with new features, such as custom sleep masks and additional YARA and OPSEC checks. This allows customers of both OST and Cobalt Strike to utilise advanced tradecraft and the flexibility of Cobalt Strike without needing to write low level code.
Better user workflows: Instead of manually downloading custom BOFs/tools from OST, we are working on implementing a ‘bridge’ between OST and Cobalt Strike. This bridge would also allow users to upload Beacons to OST and generate advanced payloads quickly; allowing for smoother and more efficient workflows.
Figure 3. Current proof of concept of the OST bridge being worked on
New approaches of software delivery: OST has taken a unique approach in offensive software compilation & distribution, utilising just-in-time compilation and anti-piracy via its cloud delivery model. In due course Cobalt Strike will start leveraging a similar approach as OST; enabling new possibilities and evasion techniques within Beacon. The first step of this will be to migrate Cobalt Strike to a new download portal.
Team collaboration: Lastly, the OST and Cobalt Strike teams are increasingly collaborating on a number of low-level areas. These deep technical discussions on evasion and novel TTPs between hands-on red teamers, offensive R&D members, and the Cobalt Strike developers provides valuable feedback and accelerates product development.
Closing Thoughts
We hope that this blog provides an informative update to the technical strategy of both products going forward. In summary:
The Outflank and Cobalt Strike teams are cooperating to get the most value for our customers. Both Cobalt Strike and OST will stay close to their roots: Cobalt Strike will remain focused on stability and flexibility while OST offers a broad arsenal of offensive tradecraft. Furthermore, the collaboration between the two teams will enable enhanced product innovation and ensure that new features for both products are designed to work seamlessly together.
If you are interested in either Cobalt Strike or OST please refer to Cobalt Strike’s product info and demo video or OST’s product info and demo videos for more info. Cobalt Strike and OST bundles are available now and you can request a quote here.
Cobalt Strike and YARA: Can I Have Your Signature?
Over the past few years, there has been a massive proliferation of YARA signatures for Beacon. We know from conversations with our customers that this has become problematic when using Cobalt Strike for red team engagements and that there has been some confusion over how Cobalt Strike’s malleable C2 options can help.
Therefore, this blog post will outline the OPSEC considerations when using Beacon with respect to in-memory YARA scanning and suggest a malleable C2 profile which should give robust evasion against these types of defensive techniques.
As a TL;DR, to be OPSEC safe against in-memory YARA scanning you should:
Ensure you are using the sleep mask and have enabled the evasive sleep found in the Arsenal kit (or alternatively use a modified/custom sleep mask)
Enable stage.cleanup
Strongly consider setting stage.obfuscate to true (or implement similar functionality in a UDRL)
Be wary when using post-ex reflective DLLs (dependent on the security controls in place) as they currently fail to clean up memory and hence are ripe for signatures (we are working on fixing this)
YARA Signatures
One of the quick wins for defenders when attempting to obtain detection coverage for a given malware family are YARA rules. Once a sample is obtained, signatures can be quickly generated and then deployed at scale. An excellent blog by Elastic on a methodology for doing this can be found here: https://www.elastic.co/blog/detecting-cobalt-strike-with-memory-signatures.
Typically, these rules/signatures are made up of either strings found in the target binary or static bytes identified from reversing specific functionality (e.g. XOR/encryption routines etc.). Note that these rules can also be as simple as finding some arbitrary byte sequence that appears to be consistent between versions and that, say when run against Virus Total (e.g. via VTGrep: content:{4D 5a etc..}), reliably identifies the target family. If this never changes between versions (unbeknownst to the operator), then it can provide a high-fidelity detection signature.
This process becomes even more powerful when used in conjunction with in-memory scanning. Unless an implant takes evasive action, it is a sitting duck in memory (i.e. its .text / .data sections are plainly visible) and good YARA signatures will quickly identify it. Beacon in this sense is no different:
Furthermore, for Beacon specifically, there are countless open source (and no doubt many more private) YARA rules which are deployed by EDR vendors/threat hunters to quickly scan for and identify Beacons. As a result, this can make using Cobalt Strike a challenge for red team exercises unless extra precautions are taken.
However, it is important to stress that low-cost detections are typically low cost to evade. YARA signatures generally can be thought of as having vast breadth but with limited depth (i.e. they are relatively quick and low cost to churn out/automate but have limited robustness for long term detection efficacy). With some basic adjustments to Beacon’s malleable C2 profile, it is possible to bypass YARA scanning with a high degree of confidence. Therefore, the aim of this blog post is to:
Demonstrate Beacon’s susceptibility (in its default state) to YARA signatures. This will be illustrated initially via statically scanning a default Beacon payload on disk before considering in-memory YARA scanning.
Show how Cobalt Strike’s malleable C2 options can be configured to make in-memory YARA scanning redundant.
As a note, this blog will primarily rely on Elastic’s open-source YARA rules for Cobalt Strike. This is because it was by far the most comprehensive collection of open-source YARA rules that we could find (and Elastic should be commended for being open and transparent in this regard). As such, this blog is not intended to be viewed as a guide in terms of how to “bypass” a specific vendor and it should be stressed that in-memory YARA scanning is likely to be only one component of a defence in depth strategy employed by EDRs (i.e. in conjunction with scanning for unbacked memory, anomalous threads, process injection, etc.).
Lastly, it is important to note that as this blog is concerned with in-memory YARA scanning, all examples given are for raw Beacon DLL payloads. Hence, in this post we are assuming that Beacon has already been injected into memory via some form of Stage0 shellcode runner. Other payloads generated by Cobalt Strike (e.g. the default Cobalt Strike executables which are simple shellcode runners) have been included in the product for a number of years and as a result will be widely signatured too. However, they are out of scope for this blog post as the intention of the Arsenal kit is to facilitate modifying these executables to bypass signatures.
On-Disk YARA Scanning
To demonstrate the power of YARA signatures we can use Elastic’s open-source rules for Cobalt Strike and run them against a default raw HTTP Beacon DLL (on disk). As a note, this is a slightly contrived scenario, as typically when an exe/DLL is written to disk (or executed), an EDR will attempt to extract features and classify the binary via a PE malware model (although YARA could be used as well). This is a very different problem to get around and out of scope for this blog post, but see here for further reading. However, this first example is primarily designed to get an understanding of Beacon’s footprint in relation to YARA signatures before we consider the in-memory YARA scanning use case.
We can run the Elastic Cobalt Strike YARA rules against a default raw Beacon DLL with the following command:
Note that sleep mask was not actually enabled for this Beacon payload, however the default sleep mask is always patched in the .data section unless a custom one is specified.
4. A code fragment found within Beacon’s default exported reflective loader function (the corresponding YARA rule is Windows_Trojan_CobaltStrike_f0b627fc):
5. The reflective loader shellcode stub which is patched over the DOS header for Beacon (the corresponding YARA rule is Windows_Trojan_CobaltStrike_1787eef5):
For the last two hits (4 and 5), it is important to understand that Beacon is a reflectively loaded DLL. This means it has an exported reflective loader function within its .text section. Furthermore, so that it can be run directly from memory, a small shellcode stub is written over the start of the DOS header. This small shellcode stub is responsible for jumping to the exported reflective loader function, which will then proceed to bootstrap Beacon into memory. For more context on reflective loading, see our blog post series on UDRL development and @0xBoku’s excellent blog on ‘Defining the Cobalt Strike Reflective Loader’.
Unsurprisingly, these two components are very attractive signatures for defenders (and have additionally remained static for a number of years). The fourth result above targets arbitrary code found within the default reflective loader function. The offending code within ReflectiveLoader() is shown in IDA below (25FFFFFF00 disassembles to `and eax, 0xFFFFFF` etc.):
Figure 1. A screenshot from IDA showing the code fragment within the exported reflective loader function which the Windows_Trojan_CobaltStrike_f0b627fc rule triggers on.
Here is a similar YARA rule by Google and note that Windows Defender also contains signatures for Beacon’s default reflective loader. We can demonstrate this by running ThreatCheck against the same raw payload:
Figure 2. ThreatCheck.exe identifying signatured bytes found within a raw Beacon payload.
If we scan for this byte sequence in IDA, we find that it is code located within the ReflectiveLoader function:
Figure 3. A screenshot from IDA showing that the bytes identified by ThreatCheck.exe belong to the exported reflective loader function.
The fifth result above (Windows_Trojan_CobaltStrike_1787eef5) is a signature for Beacon’s default shellcode stub for bootstrapping reflective loading. Beacon’s default shellcode stub is displayed in PE-Bear below:
Figure 4. A screenshot from PE-Bear showing the default Beacon shellcode stub that is written over the DOS header.
The Windows_Trojan_CobaltStrike_1787eef5 rule has one string condition ($a5) which is a fuzzy match for the first 8 bytes of this shellcode stub (` { 4D 5A 41 52 55 48 89 E5 48 81 EC ?? ?? ?? ?? 48 8D 1D ?? ?? ?? ?? 48 89 DF 48 81 C3 ?? ?? ?? ?? }`).
In-Memory YARA Scanning
Having looked at Beacon’s exposure to YARA for a default raw DLL payload located on disk, we can now turn to the problem this blog is concerned with: in-memory YARA scanning.
The critical thing to understand with respect to in-memory YARA scanning is that when Beacon is reflectively loaded into memory it results in two memory allocations: the raw Beacon DLL (which will actually execute the shellcode stub and reflective loader function) and the virtual Beacon DLL (which is correctly loaded in memory and ready to go). Hence, the raw Beacon DLL will actually allocate memory (via the reflective loader function) for the virtual Beacon DLL and ensure it is correctly loaded. This is demonstrated in the image below:
Figure 5. A diagram showing the two memory allocations which occur as a result of the reflective loading process for a default Beacon payload. The first is Beacon in its raw/packed file format (‘Raw Beacon DLL’) and the second is Beacon after it has been correctly loaded into memory (‘Virtual Beacon DLL’).
Some of Cobalt Strike’s malleable C2 options patch/modify the raw Beacon DLL (i.e. stage.magic_mz_x64) whereas some change the behaviour of the reflective loader/how Beacon is loaded into memory (i.e. stage.obfuscate will not copy over the DLL headers to the virtual Beacon DLL, stage.stomp_pe will stomp values in the virtual Beacon DLL headers etc..). The important point to stress is that depending on what malleable C2 options are set, these two memory allocations may trigger different YARA signatures and hence both need to be accounted for.
It is clear from the diagram above that, in its default state, Beacon is a sitting duck once it has been injected into memory. It’s reflective loader stub, code (.text section), and strings (.rdata/data) are all clearly visible. This is demonstrated in the screenshot from Windbg below, which shows strings belonging to Beacon lurking in plain text in a memory region corresponding to the virtual Beacon DLL (RWX):
Figure 6. A screenshot from Windbg showing suspicious strings in memory belonging to the virtual Beacon DLL. These would trivially be found by in-memory YARA scanning.
Furthermore, because of the two allocations corresponding to the raw and virtual DLL, all these sensitive regions are actually stored in memory twice. Hence, if we run the same set of YARA rules against a process that has a default Beacon injected into it, we will get the same YARA hits as before except this time there will be duplicates. These duplicate results will correspond to the same strings/bytes being found within the raw Beacon DLL and the virtual Beacon DLL.
The screenshot below from Windbg shows the two suspicious memory regions (corresponding to the raw Beacon DLL and the virtual Beacon DLL) which are a result of Beacon being injected into memory. These are both identifiable via the same DLL header (MZARUH..) which is the start of the default reflective loader stub.
Figure 7. A screenshot from Windbg showing the two memory regions which result from reflectively loading a default Beacon payload. The RWX memory region corresponds to the virtual Beacon DLL and the RX region corresponds to the raw Beacon DLL.
The image below shows the output of running YARA against this process memory address space (PID 3204). Notice that we get two hits for each rule, which correspond to the two memory regions highlighted above in Windbg:
Figure 8. The results of running the Elastic Cobalt Strike YARA rules against a process with a default Beacon injected into it. As by default, Beacon is written in memory twice (raw Beacon DLL and virtual Beacon DLL), we get duplicate results for each triggering YARA rule.
Lastly, once Beacon is up and running, also be aware that its run time behaviour can increase its in-memory footprint. A good example to demonstrate this is BeaconEye. This uses the following signatures (https://github.com/CCob/BeaconEye/blob/master/BeaconEye.cs#L35) to identify Beacon via scanning for its configuration in heap memory.
YARA OPSEC Considerations for Beacon
The examples above demonstrate that, in its default state, there is an extremely low barrier to detecting Beacon in memory via YARA scanning. Furthermore, there are a multitude of YARA signatures targeting different sections of the Beacon DLL, i.e. the reflective loader stub/DLL Headers, code fragments found within the .text section, strings found in its data sections and so on. This is obviously something as operators that we need to address and at this point we can start to explore how to configure Beacon’s malleable C2 options so that we can bypass this entire class of detections.
Stage.transform.strrep
As a basic ‘evasion in depth’ approach it is always sensible to remove obviously suspicious strings found within Beacon’s reflective DLL (“beacon.x64.dll”, “ReflectiveLoader”, etc..). We can do this with the strrep command, which will replace a string/set of bytes within Beacon’s reflective DLL. However, note that some strings identified by YARA above are Beacon format strings (“%s%s: %s”) that are currently required to return data and so replacing these may cause undefined behaviour.
As a demonstration of how brittle YARA signatures can be though, we can make some minor modifications to our malleable C2 profile to bypass the Windows_Trojan_CobaltStrike_ee756db7 rule that identifies default strings found within Beacon. This rule requires six suspicious strings to be identified, as the excerpt below shows:
By making the following modification to our malleable C2 profile we can bypass this rule:
stage {
Transform-x64 {
strrep "(admin)" "(adm)";
# If you modify these ensure you keep format string order
strrep "%s as %s\\%s: %d" "%s - %s\\%s: %d";
}
}
Generally though, as we can’t modify all of these default strings without causing issues, the real solution to suspicious strings is to use the sleep mask kit (which we will discuss shortly).
Stage.magic_mz_* / Stage.magic_pe_*
The ‘magic_mz_*’ / ‘magic_pe_*’ options patch the MZ and PE characters respectively in Beacon’s raw DLL. The exported reflective loader function will use these magic bytes to locate itself in memory, so this option will also modify the reflective loader. Note that as these are located in the DLL headers, they will be copied over to the virtual Beacon DLL during the reflective loading process.
As a word of caution, for the magic_mz_* option, the value provided must be valid (no-)op codes as they are the first instructions that will be executed as part of the shellcode stub. Typically, this would be some variant of `pop regA, push regA` as the latter instruction undoes the first, but see here for more guidance on configuring this option.
These options are typically used to frustrate memory scanners trying to identify injected DLLs, however magic_mz could be used to break basic YARA signatures on the reflective loader stub. As an example, modifying the MZ bytes (4D 5A) would break this signature. However, our freedom of movement is limited as we can only modify a few bytes in each case, so clearly more robust YARA signatures would still trigger.
Stage.cleanup
This is a key option and should be set to true wherever possible. As we have previously demonstrated, the initial memory allocation of the raw Beacon DLL contains several highly signaturable components (i.e. reflective loader stub, exported reflective loader function etc..). Furthermore, once it has bootstrapped Beacon it is no longer needed. By setting the ‘cleanup’ option to true, Beacon will free this memory and dramatically lower its in-memory footprint (hence only the virtual Beacon DLL will remain). Additionally, the clean-up operation is smart enough to identify how the memory was allocated initially and free it accordingly.
Note that if for some reason you cannot use cleanup, this BOF by @S4ntiagoP demonstrates how you can manually clean up memory.
Stage.stomppe
Even if we have set cleanup to true, the DLL headers are still copied over to the virtual Beacon DLL and hence a target for in-memory scanning. The stomppe option will ensure the MZ/PE and e_lfanew values are stomped in the virtual Beacon DLL during the reflective loading process. This means it is again harder for memory scanners to identify if there is an injected DLL in memory but could also help break any YARA signatures targeting Beacon’s DOS header (although this option is similar to magic_mz/pe in that is has limited freedom of movement for the YARA use case).
Stage.obfuscate
We can go one better and ask the reflective loader to copy Beacon over without its DLL headers. This can be achieved with the ‘stage.obfuscate’ flag. Enabling this, along with cleanup, means the reflective loader stub can no longer be found in-memory.
Additionally, stage.obfuscate will also mask Beacon’s:
.text section
Section names
Import table
Dos/Rich Header (this is technically not masked but overwritten with random data)
As of 4.8, stage.obfuscate moved from a fixed single byte XOR key to a randomly generated multi-byte XOR key. This is particularly useful from an evasion perspective because YARA contains a xor modifier which will brute force single byte XOR’d strings. As an example, here is a YARA rule which looks for XOR’d strings in Beacon.
Stage.obfuscate is demonstrated in the screenshot from PE-Bear below, which compares a default Beacon to one with obfuscation enabled (notice the different DOS headers and masked/XOR’d section names):
Figure 9. A screenshot from PE-Bear showing a comparison of an obfuscated Beacon DLL vs a default Beacon DLL.
The required sections will then be unmasked during the reflective loading process. Hence this will break some YARA signatures on the raw/static Beacon DLL, but not once it has been loaded into memory (i.e. rules for code fragments in the .text section will still hit on the virtual Beacon DLL). This is demonstrated (at a high level) below:
Figure 10. A diagram showing the two memory allocations which occur as a result of the reflective loading process for an obfuscated Beacon payload. Note, that as well as the masking on the raw Beacon DLL, the Virtual Beacon DLL does not have any DLL headers.
Also note that, as this diagram suggests, some things are still not masked in the raw Beacon DLL even when obfuscate is enabled:
Reflective loader stub
Exported reflective loader function
Sleep mask
Strings
Hence, even an obfuscated Beacon payload will still trigger the same YARA rules identified previously for each of these respective components. One way to address this limitation is via implementing custom obfuscation in a UDRL. This topic will be covered in much more detail in the next part of our UDRL development blog series.
Lastly, also bear in mind that the obfuscation applied to Beacon might make it look (more) suspicious to PE malware models if dropped to disk (i.e. the sketchy section names, higher entropy of some sections, etc. will all make it look more anomalous).
Modifying The Reflective Loader Stub
It should be clear by now that the default Cobalt Strike reflective loader stub is an obvious target for YARA signatures. However, it is possible to create our own DOS stub and apply it to Beacon payloads via the `stage.transform` block. Note that this example will only consider X64 payloads.
The default 64 bit DOS stub used by Beacon is shown (and explained) below:
pop r10 ; MZ Header
push r10 ; undo action above
push rbp ; save the stack base pointer
mov rbp, rsp ; create a new stack frame
sub rsp, 0x20 ; create shadow space (x64 __fastcall)
lea rbx, [rip - 0x16] ; obtain shellcode base address
mov rdi, rbx ; save shellcode base address
add rbx, 0x16EA4 ; add file offset of ReflectiveLoader to shellcode base
call rbx ; call ReflectiveLoader (returns DllMain address)
mov r8d, 0X56A2B5F0 ; EXITFUNC value
push 4 ; push 4 to stack
pop rdx ; pop the value into rdx (second argument of call)
mov rcx, rdi ; move shellcode base address to rcx (first argument of call)
call rax ; call DllMain
The following example is an alternate version of the stub described above:
pop r10 ; MZ Header
lea rbx, [rip -0x08] ; obtain the shellcode base address
push r10 ; undo action of MZ header
sub rsp, 0x28 ; create shadow space and align stack
mov rdi, rbx ; save shellcode base address
add rbx, 0xb752 ; add half file offset of ReflectiveLoader to shellcode base
add rbx, 0xb752 ; add half file offset of ReflectiveLoader to shellcode base
call rbx ; call ReflectiveLoader (returns DllMain address)
mov rdx, 0x04 ; move 4 into rdx (second argument of call)
mov rcx, rdi ; move shellcode base address to rcx (first argument of call)
call rax ; call DllMain
Note that the default ReflectiveLoader will actually call DllMain itself and return the address of DllMain to the shellcode stub. We then call DllMain a second time to start Beacon. This is why you may see two calls to DllMain in open source UDRLs.
This stub performs similar steps but with some instruction substitution and in a slightly different order (with the exception of setting the EXITFUNC value which is not actually technically required). Furthermore, there are many other ways of using instruction substitution to achieve a similar effect. For example, every `mov regA, regB` could be substituted as `push regB; pop regA` or `mov tmp, regB; mov regA, tmp` etc. Alternatively, a more automated approach could be to use something like Nettitude’s Shellcode Mutator. This does not actually mutate instructions per se (it randomly adds in no-ops to existing shellcode) but it could also be used to break YARA signatures.
In the screenshot below, we’ve used defuse.ca to assemble the alternate shellcode stub above:
Figure 11. A screenshot from defuse.ca showing the assembled code of our modified shellcode stub which will overwrite the DOS header.
As a note, for complicated instruction encoding reasons, `pop r10` actually assembles to 41 5A, so you will need to manually change “\x41\x5A” back to “\x4D\x5A” at the start of the string literal (4D 5A still disassembles as `pop r10`). This is because the default reflective loader hunts backwards for the MZ header (i.e. 4D 5A) and so will fail if it can’t find it.
It is possible to modify the default DOS stub and use our updated version with strrep in the malleable C2 profile, as demonstrated below:
After restarting the Teamserver with the above strrep included within our profile, we can see that the new DOS stub has been applied to our exported Beacon payload:
Figure 12. A side by side comparison of the default Beacon shellcode stub (on the left) vs our modified shellcode stub (on the right) for an exported raw Beacon payload.
In the screenshot below, we have used udrl.py (from the udrl-vs kit) to inject the raw Beacon payload (i.e. beacon.bin) with the modified DOS header into memory to test that it works as expected:
Figure 13. An example using udrl.py (located in the udrl-vs kit) to execute our raw Beacon DLL with the modified DOS header to confirm it works as expected.
As a note, to be compatible with the default reflective loader, the reflective loader stub needs to do four things:
1. Obtain the shellcode base address (i.e. start of the MZ header). 2. Calculate the offset to the exported ReflectiveLoader function from the shellcode base. 3. Call ReflectiveLoader(). 4. Call DllMain with the shellcode base address as the hinstDLL parameter (first argument/rcx) and 4 as the reason code parameter (second argument/rdx). Note that the third argument isn’t required.
As long as you perform these steps, you can add whatever code you like to get Beacon up and running. However, you should try and restrict the size of the shellcode stub to 59 bytes or else you may crash Beacon. This is because after this point you will overwrite the value of e_lfanew (which is located at 0x3C/60 in the DOS header). The value of e_lfanew is required by the reflective loader to hunt backwards in memory for the MZ & PE headers and so if you blitz this it will fail.
Lastly, as the magic_mz option also modifies the start of the DOS stub (and in turn what bytes the reflective loader looks for) it is incompatible with (and will supersede) the strrep approach outlined here.
Sleep Mask
Even with all the modifications above, Beacon is still vulnerable to YARA rules targeting the .text/.data sections, as these are all still plainly visible in memory. Furthermore, Beacon’s run time data is similarly exposed (i.e. heap memory).
The solution to this problem in Cobalt Strike is the sleep mask. The concept of this is simple: before Beacon sleeps, it will mask itself and any related memory (i.e. on the heap). When Beacon checks in it will briefly be exposed, however most of the time its memory will be masked and hence any valid signatures will fail to find their target. This is the key malleable C2 option to be configured. It will ensure that Beacon is only visible in memory for an extremely short window and will provide the most robust defence against in-memory signatures.
Enabling The Sleep Mask
One thing to bear in mind is that there are some extra steps required to make the sleep mask kit correctly mask the .text section when stage.userwx is set to false. While not strictly related to YARA scanning, it is generally always advised to avoid RWX memory (i.e. set userwx : false) as this is an obvious indicator of code injection and low hanging fruit for memory scanners. Hence, we recommend taking these extra steps to enable both settings.
Prior to 4.7, sleep mask would only mask the .text section if it was RWX. Hence, if stage.userwx was set to false, Beacon’s .text section would reside in RX memory and would not be masked. The .text section is therefore at risk of trivial detection by the many YARA rules previously discussed and so this is not ideal.
As of 4.7, we can configure the sleep mask kit to mask the .text section when stage.userwx is set to false. When this is enabled the sleep mask will change the .text protection to RW, mask the section, sleep, unmask the section, and then change the .text protection back to RX.
To enable this, we need to set the following options in the stage block for our malleable C2 profile:
set userwx "false";
set sleepmask "true";
After setting these values and restarting the TeamServer, run the build script found within the sleep mask kit. As of the latest version of the Arsenal Kit (20230315), this is performed via the following (example) command:
For a full explanation of each parameter, you can run ./build.sh with no arguments. However, the key parameter related to this discussion is to set the third argument (Mask_text) to ‘true’. After recompiling, load the subsequent .cna script into the Script Manager. For further guidance, see the README found in the sleep mask kit.
As a note, for older versions of the sleep mask kit you will need to set MASK_TEXT_SECTION to 1 in sleepmask.c as demonstrated below:
At this point, Beacon will now avoid RWX memory, will not copy over its DLL headers, and will mask itself when sleeping. However, there is one weak link in this approach: the sleep mask itself is still present/exposed in memory. This is shown at a high level below:
Figure 14. A high level diagram showing the memory exposure of Beacon once the sleep mask has been enabled. Beacon itself is now masked, however the default sleep mask is visible in memory and vulnerable to YARA signatures.
The default sleep mask is therefore again a very attractive target for defenders and no surprises that there are plenty of rules which focus on this remaining in-memory footprint. For example, with both cleanup and sleep mask enabled, the Windows_Trojan_CobaltStrike_b54b94ac rule for the default sleep mask will trigger when Beacon is sleeping as demonstrated below:
Figure 15. The results of running the Elastic Cobalt Strike YARA rules against a process with Beacon injected and using the default sleep mask. Now Beacon is masked, we no longer get any hits for rules targeting code/strings within Beacon, however we do get a YARA hit for a code fragment corresponding to the default sleep mask.
This screenshot shows the call stack for the sleeping Beacon thread (tid: 900) and the results of running the same YARA rules against this process (pid: 5944). Now that Beacon is masked, the YARA rules that we identified before will no longer trigger, however we do see the expected hit for the default sleep mask (Windows_Trojan_CobaltStrike_b54b94ac). Note that we can see that this result was found in the same region of memory as the highlighted unbacked return address in the sleeping thread’s call stack (~0x1b4ef8c00d2). Therefore, even if you’re using Beacon with the default sleep mask, you’re still at risk of being trivially identified via in-memory YARA scanning.
Clearly then, we also need to obfuscate the sleep mask memory while Beacon sleeps. We can do this by using the evasive sleep mask found in the Arsenal kit. This will use an external mechanism to scramble the sleep mask when sleeping and hence will break any signatures on itself as shown in the diagram below:
Figure 16. A high level diagram showing the memory exposure of Beacon once the evasive sleep mask has been enabled. Both Beacon and the sleep mask are now obfuscated so no YARA rules will fire.
This can be configured by setting the below in sleepmask.c:
#if _WIN64
#define EVASIVE_SLEEP 1
#endif
Also note that to avoid any issues when using the evasive sleep (especially with process injection), ensure that you enable the CFG bypass by modifying evasive_sleep.c to the below:
/*
* Enable the CFG bypass technique which is needed to inject into processes
* protected Control Flow Guard (CFG) on supported version of Windows.
*/
#define CFG_BYPASS 1
Once these two steps have been completed, you can once again rebuild and reload the .cna script for the changes to take effect.
Alternatively, you could modify the default sleep mask and recompile it to break any static byte patterns or apply your own user-defined sleep mask (UDSM). Generally, customisation is king for both the sleep mask and reflective loading (see the UDRLs section below) and using custom / unknown code will obviously completely break pre-canned YARA signatures.
UDRLs
This post has demonstrated that there are numerous signatures for Beacon’s default reflective loader function. With the sleep mask, evasive sleep, and cleanup all enabled, a default reflective loader is less of an issue as our in-memory exposure is extremely limited. However, to avoid default YARA signatures you could consider using a custom UDRL. Our own blog series provides a guide on how to develop UDRLs and there are many excellent open-source UDRLs such as BokuLdr, TitanLdr, and AceLdr.
Note that if you do use a custom UDRL, many of the malleable C2 options outlined above are ignored. This is because how you modify/obfuscate Beacon is coupled with how the reflective loader works. For example, if you obfuscate a section, your reflective loader needs to know how to de-obfuscate it. Hence, it makes sense to leave these details to the UDRL developer to implement.
Post-Ex Reflective DLLs
There is a caveat to the approach suggested so far and that is in respect to post-ex reflective DLLs. As a note, not all post-ex functionality is run as a reflective DLL and many are implemented as BOFs; see the following documentation for a complete guide.
If we run the same set of YARA rules against a process which has had the Cobalt Strike keylogger injected into it, we get two hits:
Once again we get duplicate results which correspond to the same rules flagging on both the raw post-ex DLL and the virtual post-ex DLL.
The main malleable C2 option we have to play with for post-ex DLLs is setting ‘post-ex.obfuscate’ to true. This option will:
Statically mask the rdata/data sections
Scrub module/function name strings
Dynamically mask the rdata section at run time for long running tasks (this behaviour will vary between different post-ex DLLs)
Not copy over the DLL headers during reflective loading
Avoid RWX memory (hence if you don’t set this option, post-ex DLLs will use RWX memory)
Therefore, after enabling post-ex.obfuscate, we are left with a single hit for the reflective loader stub from the raw post-ex DLL (Windows_Trojan_CobaltStrike_29374056). This is because the DLL headers were not copied over to the virtual post-ex DLL and the previously identified strings are now masked in both the raw and virtual DLLs:
Figure 17. The results of running the Elastic Cobalt Strike YARA rules against a process hosting the Cobalt Strike keylogger with post-ex.obfuscate enabled.
Furthermore, once the keylogger job is killed, we still get this result. In fact, neither memory allocations are cleaned up and will remain in memory until the process terminates. Note that the obfuscate flag means that the strings in rdata will be cleared when the post-ex DLL thread exits, however the memory is not freed.
Hence, post-ex reflective DLLs do not properly clean up memory and so are a real risk of triggering trivial YARA signatures. As a caveat, there will be differences in behaviour for different post-ex reflective DLLs which are not covered in detail here, but the key take away is that all of them have this limitation currently.
This is not a risk if you fork and run (which can be noisy) but will be if used either intra-process or for a long running job injected into a separate process. Therefore, be wary when using post-ex reflective DLLs, even with obfuscate enabled, if you know the security controls in place involve in-memory YARA scanning and prefer BOF equivalents where possible. This limitation for post-ex DLLs is something we are planning to overhaul in the 4.9 release.
Conclusion
With the suggested approach outlined in this blog, Beacon is now robust against in-memory YARA scanning. It will be masked in memory for all but an extremely brief check in time and when it is visible we have taken as many precautions as possible to limit its exposure.
Our suggested malleable C2 profile therefore looks something like the below:
stage {
set userwx "false";
set cleanup "true";
set obfuscate "true";
set magic_mz_x64 "<CHANGEME>";
set magic_pe "<CHANGEME>";
# Alternatively, modify the DOS header via the
# transform.strrep approach outlined previously.
# For sleep mask ensure you:
# - Enable masking the text section (set Mask_text to true
# for./build.sh or for older sleep mask kits set
# MASK_TEXT_SECTION to 1 in sleepmask.c).
# - Enable evasive sleep (#define EVASIVE SLEEP 1 in sleepmask.c).
# - Enable CFG bypass (#define CFG_BYPASS 1 in evasive_sleep.c).
# - Ensure sleep mask is recompiled after setting the above
# and that the .cna script is loaded into the Script Manager.
set sleep_mask "true";
# Remove default strings found in Beacon.
transform-x64 {
strrep "ReflectiveLoader" "<CHANGEME>";
strrep "beacon.x64.dll" "<CHANGEME>";
strrep "(admin)" "(adm)";
etc.
}
}
Note that this is a suggested profile for bypassing YARA signatures (there are likely to be many other security controls in place and some of these options may not be desired depending on the context).
At this point, the next line of defence for defenders is either traditional memory scanning approaches for injected code or hunting for sleeping threads with unbacked memory (See https://github.com/thefLink/Hunt-Sleeping-Beacons for an example of this detection technique). Both of which are much more complicated problems to solve at scale. Furthermore, with the 4.8 release you can enable stack spoofing to bypass the latter. This can be enabled once again by modifying sleepmask.c to the below and recompiling/reloading the resulting .cna script:
This does include a default stack to spoof but as ever customisation is recommended. For more guidance see the README in the Arsenal kit.
As a final note, the analysis in this blog post is feeding in to changes that we plan to make in the next Cobalt Strike release. We want to give users more control over the reflective loading process for both Beacon and post-ex DLLs and enable users to easily push back against YARA signatures.
Stopping Cybercriminals From Abusing Security Tools
Microsoft’s Digital Crimes Unit (DCU), cybersecurity software company Fortra™ and Health Information Sharing and Analysis Center (Health-ISAC) are taking technical and legal action to disrupt cracked, legacy copies of Cobalt Strike and abused Microsoft software, which have been used by cybercriminals to distribute malware, including ransomware. This is a change in the way DCU has worked in the past – the scope is greater, and the operation is more complex. Instead of disrupting the command and control of a malware family, this time, we are working with Fortra to remove illegal, legacy copies of Cobalt Strike so they can no longer be used by cybercriminals.
We will need to be persistent as we work to take down the cracked, legacy copies of Cobalt Strike hosted around the world. This is an important action by Fortra to protect the legitimate use of its security tools. Microsoft is similarly committed to the legitimate use of its products and services. We also believe that Fortra choosing to partner with us for this action is recognition of DCU’s work fighting cybercrime over the last decade. Together, we are committed to going after the cybercriminal’s illegal distribution methods.
Cobalt Strike is a legitimate and popular post-exploitation tool used for adversary simulation provided by Fortra. Sometimes, older versions of the software have been abused and altered by criminals. These illegal copies are referred to as “cracked” and have been used to launch destructive attacks, such as those against the Government of Costa Rica and the Irish Health Service Executive. Microsoft software development kits and APIs are abused as part of the coding of the malware as well as the criminal malware distribution infrastructure to target and mislead victims.
The ransomware families associated with or deployed by cracked copies of Cobalt Strike have been linked to more than 68 ransomware attacks impacting healthcare organizations in more than 19 countries around the world. These attacks have cost hospital systems millions of dollars in recovery and repair costs, plus interruptions to critical patient care services including delayed diagnostic, imaging and laboratory results, canceled medical procedures and delays in delivery of chemotherapy treatments, just to name a few.
Disruption Components and Strategy
On March 31, 2023, the U.S. District Court for the Eastern District of New York issued a court order allowing Microsoft, Fortra, and Health-ISAC to disrupt the malicious infrastructure used by criminals to facilitate their attacks. Doing so enables us to notify relevant internet service providers (ISPs) and computer emergency readiness teams (CERTs) who assist in taking the infrastructure offline, effectively severing the connection between criminal operators and infected victim computers.
Fortra and Microsoft’s investigation efforts included detection, analysis, telemetry, and reverse engineering, with additional data and insights to strengthen our legal case from a global network of partners, including Health-ISAC, the Fortra Cyber Intelligence Team, and Microsoft Threat Intelligence team data and insights. Our action focuses solely on disrupting cracked, legacy copies of Cobalt Strike and compromised Microsoft software.
Microsoft is also expanding a legal method used successfully to disrupt malware and nation state operations to target the abuse of security tools used by a broad spectrum of cybercriminals. Disrupting cracked legacy copies of Cobalt Strike will significantly hinder the monetization of these illegal copies and slow their use in cyberattacks, forcing criminals to re-evaluate and change their tactics. Today’s action also includes copyright claims against the malicious use of Microsoft and Fortra’s software code which are altered and abused for harm.
Abuse by Cybercriminals
Fortra has taken considerable steps to prevent the misuse of its software, including stringent customer vetting practices. However, criminals are known to steal older versions of security software, including Cobalt Strike, creating cracked copies to gain backdoor access to machines and deploy malware. We have observed ransomware operators using cracked copies of Cobalt Strike and abused Microsoft software to deploy Conti, LockBit, and other ransomware as part of the ransomware as a service business model.
Threat actors use cracked copies of software to speed up their ransomware deployment on compromised networks. The below diagram showsan attack flow, highlighting contributing factors, including spear phishing and malicious spam emails to gain initial access, as well as the abuse of code stolen from companies like Microsoft and Fortra.
Example of an attack flow by threat actor DEV-0243.
While the exact identities of those conducting the criminal operations are currently unknown, we have detected malicious infrastructure across the globe, including in China, the United States and Russia. In addition to financially motivated cybercriminals, we have observed threat actors acting in the interests of foreign governments, including from Russia, China, Vietnam and Iran, using cracked copies.
Continuing the Fight Against Threat Actors
Microsoft, Fortra and Health-ISAC remain relentless in our efforts to improve the security of the ecosystem, and we are collaborating with the FBI Cyber Division, National Cyber Investigative Joint Task Force (NCIJTF) and Europol’s European Cybercrime Centre (EC3) on this case. While this action will impact the criminals’ immediate operations, we fully anticipate they will attempt to revive their efforts. Our action is therefore not one and done. Through ongoing legal and technical action, Microsoft, Fortra and Health-ISAC, along with our partners, will continue to monitor and take action to disrupt further criminal operations, including the use of cracked copies of Cobalt Strike.
Fortra devotes significant computing and human resources to combat the illegal use of its software and cracked copies of Cobalt Strike, helping customers determine if their software licenses have been compromised. Legitimate security practitioners who purchase Cobalt Strike licenses are vetted by Fortra and are required to comply with usage restrictions and export controls. Fortra actively works with social media and file sharing sites to remove cracked copies of Cobalt Strike when they appear on those web properties. As criminals have adapted their techniques, Fortra has adapted the security controls in the Cobalt Strike software to eliminate the methods used to crack older versions of Cobalt Strike.
As we have since 2008, Microsoft’s DCU will continue its efforts to stop the spread of malware by filing civil litigation to protect customers in the large number of countries around the world where these laws are in place. We will also continue to work with ISPs and CERTs to identify and remediate victims.
Behind the Mask: Spoofing Call Stacks Dynamically with Timers
This blog introduces a PoC technique for spoofing call stacks using timers. Prior to our implant sleeping, we can queue up timers to overwrite its call stack with a fake one and then restore the original before resuming execution. Hence, in the same way we can mask memory belonging to our implant during sleep, we can also mask the call stack of our main thread. Furthermore, this approach avoids having to deal with the complexities of X64 stack unwinding, which is typical of other call stack spoofing approaches.
The Call Stack Problem
The core memory evasion problem from an attacker’s perspective is that implants typically operate from injected code (ignoring any module hollowing approaches). Therefore, one of the pillars of modern detection is to monitor for the creation of threads which belong to unbacked (or ‘floating’) memory. This blog by Elastic is a good approximation to the state of the art in terms of anomalous thread detection from an EDR perspective.
However, another implication of this problem for attackers is that all the implants’ API calls will also originate from unbacked memory. By examining call stacks either at the time of a specific API invocation, or by proactively inspecting running threads (i.e. ones which are sleeping), suspicious call stacks can be identified via return addresses to unbacked memory.
This is one detection area which historically has not received a huge amount of focus/research in modern EDR stacks (in my experience). However, this is starting to change with the release of open-source tools such as Hunt-Sleeping-Beacons, which will proactively inspect “sleeping” threads to find call stacks with unbacked regions. This demonstrably provides a high confidence signal of suspicious activity; hence it is valuable to EDRs and something attackers need to seriously consider in their evasion TTPs.
Call Stack Inspection at Rest
The first problem to solve from an attacker’s perspective is how to manipulate the call stack of a sleeping thread so that it can bypass this type of inspection. This could be performed by the actual thread itself or via some external mechanism (APCs etc.).
Typically, this is referred to as “spoofing at rest” (h/t to Kyle Avery here for this terminology in his excellent blog on avoiding memory scanners). The first public attempt to solve this problem is mgeeky’s ThreadStackSpoofer, which overwrites the last return address on the stack.
As a note, the opposite way to approach this problem is by having no thread or call stack present at all, à la DeathSleep. The downside of this technique is the potential for the repeated creation of unbacked threads, (depends on the exact implementation), which is a much greater evil in modern environments. However, future use of Hardware Stack Protection by EDR vendors may make this type of approach inevitable.
Call Stack Inspection During Execution – User Mode
The second problem is call stack inspection during execution, which could either be implemented in user mode or kernel mode. In terms of user mode implementation, this would typically involve hooking a commonly abused function and walking the stack to see where the call originated. If we find unbacked memory, it is highly likely to be suspicious. An obvious example of this is injected shellcode stagers calling WinInet functions. MalMemDetect is a good example of an open-source project that demonstrates this detection technique.
For these scenarios, techniques such as RET address spoofing are normally sufficient to remove any evidence of unbacked addresses from the call stack. At a high level, this involves inserting a small assembly harness around the target function which will manually replace the last return address on the stack and redirect the target function to return to a trampoline gadget (e.g. jmp rbx).
Additionally, there is SilentMoonWalk which uses a clever de-syncing approach (essentially a ROP gadget built on X64 stack unwinding codes). This can dynamically hide the origin of a function call and will similarly bypass these basic detection heuristics. Most importantly to an operator, both these techniques can be performed by the acting thread itself and do not require any external mechanism.
From an opsec perspective, it is important to note that many of the techniques referenced in this blog may produce anomalous call stacks. Whether this is an issue or not depends on the target environment and the security controls in place. The key consideration is whether the call stack generated by an action is being recorded somewhere (say in the kernel, see next section) and appended to an event/alert. If this is the case, it may look suspicious to trained eyes (i.e. threat hunters/IR).
To demonstrate this, we can take SilentMoonWalk’s desync stack spoofing technique as an example (this is a slightly easier use case as other techniques can be implementation specific). As stated previously, this technique needs to find functions which implement specific stack winding operations (a full overview of X64 stack unwinding is beyond the scope of this blog but see this excellent CodeMachine article for further reading).
This output shows the call stack for the spoofed SilentMoonwalk thread (knf) and the unwind operations (.fnent) for two of the functions found on the call stack (PathReplaceGreedy / SystemTimeToTzSpecificLocalTimeEx).
The key take away is that this results in a call stack which would never occur for a legitimate code path (and is therefore anomalous). Hence, KERNELBASE!PathReplaceGreedy does not call KERNELBASE!SystemTimeToTzSpecificLocalTimeEx … and so on. Furthermore, an EDR could itself attempt to search for this pattern of unwind codes during a proactive scan of a sleeping thread. Again, whether this is an issue depends entirely on the controls/telemetry in place but as operators it is always worth understanding the pros and cons of all the techniques at our disposal.
Lastly, a trivial way of calling an API with a ‘clean’ call stack is to get something else to do it for you. The typical example is to use any callback type functionality provided by the OS (same applies for bypassing thread creation start address heuristics). The limitation for most callbacks is that you can normally only supply one argument (although there are some notable exceptions and good research showing ways around this).
Call Stack Inspection During Execution – Kernel Mode
A user mode call stack can be captured inline during any of the kernel callback functions (ie. on process creation, thread creation/termination, handle access etc…). As an example, the SysMon driver uses RtlWalkFrameChain to collect a user mode call stack for all process access events (i.e. calling OpenProcess to obtain a HANDLE). Hence, this capability makes it trivial to spot unbacked memory/injected code (‘UNKNOWN’) attempting to open a handle to LSASS. For example, in this contrived scenario you would get a call stack similar to the following:
Additionally, it is now possible to collect call stacks with the ETW threat intelligence provider. The call stack addresses are unresolved (i.e. an EDR would need to keep its own internal process module cache to resolve symbols) but they essentially enable EDR vendors the potential to capture near real time call stacks (where the symbols are then resolved asynchronously). Therefore, this can be seen as a direct replacement for user mode hooking which is, critically, captured in the kernel. It is not unrealistic to imagine a scenario in the future in which unbacked/direct API calls to sensitive functions (VirtualAlloc / QueueUserApc / SetThreadContext / VirtualProtect etc.) are trivial to detect.
These scenarios were the premise for some of my own previous research in to call stack spoofing during execution: https://github.com/WithSecureLabs/CallStackSpoofer. The idea was to offload the API call to a new thread, which we could initialise to a fake state, to hide the fact that the call originated from unbacked memory. My original PoC applied this idea to OpenProcess but it could easily be applied to image loads etc.
The key requirement here was that any arbitrary call stack could be spoofed, so that even if a threat hunter was reviewing an alert containing the call stack, it would still look indistinguishable from other threads. The downsides of this approach were the need to create a new thread, how best to handle this spoofed thread, and the reliance on a hard coded / static call stack.
Call Stack Masking
Having given a brief review of the current state of research in to call stack spoofing, this blog will demonstrate a new call stack spoofing technique: call stack masking. The PoC introduced in this blog post solves the spoofing at rest problem by masking a sleeping thread’s call stack via an external mechanism (timers).
While researching this topic in the past, I spent a large amount of time trying to get to grips with the complexities of X64 stack unwinding in order to produce TTPs to perform stack spoofing. This complexity is also present in a number of the other techniques discussed above. However, it occurred to me that there is a much simpler way to spoof/mask the call stack without having to deal with these intricacies.
If we consider a generic thread that is performing any kind of wait, by definition, it cannot modify its own stack until the wait is satisfied. Furthermore, its stack is always read-writable. Therefore, we can use timers to:
Create a backup of the current thread stack
Overwrite it with a fake thread stack
Restore the original thread stack just before resuming execution
The only remaining challenge is to work out the value of RSP once our target thread is sleeping. This can be achieved using compiler intrinsics (_AddressOfReturnAddress) to obtain the Child-SP of the current frame. Once we have this, we can subtract the total stack utilisation of the expected next two frames (i.e. KERNELBASE!WaitForSingleObjectEx and ntdll!NtWaitForSingleObject) to find the expected value of RSP at sleep time.
Lastly, to make our masked thread look as realistic as possible, we can copy the start address and call stack of an existing (and legitimate) thread.
The PoC operates in two modes: static and dynamic. The static mode contains a hard coded call stack that was found in spoolsv.exe via Process Explorer. This thread is shown below and can be seen to be in a state of ‘Wait:UserRequest’ via KERNELBASE!WaitForSingleObjectEx:
The screenshot below demonstrates static call stack masking. The start address and call stack of our masked thread are identical to the thread identified in spoolsv.exe above:
The obvious downside of the static mode is that we are still relying on a hard coded call stack. To solve this problem the PoC also implements dynamic call stack masking. In this mode, it will enumerate all the accessible threads on the host and find one in the desired target state (i.e. UserRequest via WaitForSingleObjectEx). Once a suitable thread stack is found, it will copy it and use that to mask the sleeping thread. Similarly, the PoC will once again copy the cloned thread’s start address to ensure our masked thread looks legitimate.
If we run the PoC with the ‘–dynamic’ flag, it will locate another thread’s call stack to mimic as shown below:
The target process (taskhostw.exe / 4520), thread (5452), and call stack identified above are shown below in Process Explorer:
If we now examine the call stack and start address of the main thread belonging to CallStackMasker, we can see it is identical to the mimicked thread:
Below is another example of CallStackMasker dynamically finding a shcore.dll based thread call stack from explorer.exe to spoof:
The screenshot below shows the real ‘unmasked’ call stack:
Currently the PoC only supports WaitForSingleObject but it would be trivial to add in support for WaitForMultipleObjects.
As a final note, this PoC uses timer-queue timers, which I have previously demonstrated can be enumerated in memory: https://github.com/WithSecureLabs/TickTock. However, this PoC could be modified to use fully fledged kernel timers to avoid this potential detection opportunity.