Cobalt Strike Archives - Cobalt Strike Research and Development
fortra logo

Revisiting the User-Defined Reflective Loader Part 1: Simplifying Development

This blog post accompanies a new addition to the Arsenal Kit – The User-Defined Reflective Loader Visual Studio (UDRL-VS). Over the past few months, we have received a lot of feedback from our users that whilst the flexibility of the UDRL is great, there is not enough information/example code to get the most out of this feature. The intention of this kit is to lower the barrier to entry for developing and debugging custom reflective loaders. This post includes a walkthrough of creating a UDRL in Visual Studio that facilitates debugging, an introduction to UDRL-VS, and an overview of how to apply a UDRL to Beacon.

Note: There are many people out there that prefer to use tools such as MingGW/GCC/LD/GDB etc. and we salute you. However, this post is intended for those of us that like the simplicity of Visual Studio and enjoy a GUI. To develop this template we used Visual Studio Community 2022.

Reflective Loading

Beacon is just a Dynamic Link Library (DLL). As a result, it needs to be “loaded” for us to work with it. There are many different ways to load a DLL in Windows, but Reflective DLL Injection, first published by Stephen Fewer in 2008, provides the means to load a DLL completely in memory. There is a lot of information available regarding PE files, reflective loading, and even improving upon Reflective DLL Injection. Therefore, this post will not delve into this in much detail. Fundamentally though, a reflective loader must:

  • Allocate some memory.
  • Copy the target DLL into that memory allocation.
  • Parse the target DLL’s imports/load the required modules/resolve function addresses.
  • Rebase the DLL (fix the relocations).
  • Locate the DLL’s Entry Point.
  • Execute the Entry Point.

In Stephen Fewer’s original implementation, the code used to load the DLL into memory is compiled into the DLL and “exported” as a function. This is how Beacon’s default reflective loader works; if you inspect Beacon’s exported functions you’ll find one called ReflectiveLoader() which is where the magic happens. The following screenshot shows Beacon’s Export Address Table (EAT) and its ReflectiveLoader() function in CFF Explorer.

Figure 1. Beacon’s Export Address Table in CFF Explorer.

Note: Typically, when a reflective loader is implemented in this fashion, a small shellcode stub is also written to the start of the PE file (over the DOS header) to ensure that execution is correctly directed to the right place (the ReflectiveLoader() function). This is what makes it position independent as it’s possible to simply write the reflective DLL to memory, start a thread and let it run.

In 2017, an analysis of the Double Pulsar User Mode Injector (Double Pulsar) leaked by Shadow Brokers showed an alternate approach to reflective loading (archive link). Double Pulsar differed because it was not compiled into the DLL but prepended in front of it. This approach allowed it to reflectively load any DLL. Later in 2017, the Shellcode Reflective DLL Injection (sRDI) project was released which used a similar approach. sRDI is able to take an arbitrary PE file and make it position independent which means it can also be used to load Beacon.

The following high-level diagram shows the different locations of the reflective loader between Stephen Fewer’s approach and Double Pulsar.

Figure 2. The different locations of ReflectiveLoader().

The User-Defined Reflective Loader (UDRL)

The UDRL is an important aspect of Cobalt Strike’s evasion strategy. Cobalt Strike achieves “evasion through flexibility”, meaning we give you the tools you need to modify default behaviors and customize Beacon to your liking. This was something that Raphael Mudge felt strongly about and will remain a key part of the Cobalt Strike strategy moving forward.

As described above, Beacon’s default ReflectiveLoader() is compiled into Beacon and exported. As a result, the UDRL was originally intended to work in the same fashion. The Teamserver would take a given UDRL and use it to overwrite Beacon’s default ReflectiveLoader() function. A great example of a UDRL that utilizes this workflow is BokuLoader by Bobby Cooke.

In this blog post, we’ll be exploring the same approach used by Double Pulsar and will therefore append Beacon to our loader as shown in Figure 2. TitanLdr by Austin Hudson is an excellent example of a UDRL that uses this approach. AceLdr by Kyle Avery is another very good example that also includes some additional functionality for avoiding memory scanners.

There are likely many other UDRLs available, and without a doubt even more that have not been made public. The above projects have been mentioned as they are impressive public examples. If you’ve developed a UDRL for Cobalt Strike yourself and you’d like to share it, you can submit it to the Cobalt Strike Community Kit.

Enter Visual Studio

The original UDRL example provided in the Arsenal Kit is a slightly modified version of Stephen Fewer’s reflective loader, so here we’ll also start in the same place. To save a lot of unnecessary content, we will not cover the process of creating an empty Visual Studio project and copy/pasting code. The only slight difference at this stage however is that our project files were created with the .cpp extension. This minor change to .cpp allows the project to access some additional functionality (more on this later). For clarity, the folder layout of the project after copy/pasting Stephen Fewer’s code has been illustrated below.

UDRL-VS/
├── Header Files/
│ ├── ReflectiveDLLInjection.h
│ └── ReflectiveLoader.h
├── Source Files/
└── ReflectiveLoader.cpp

The purpose of this Visual Studio project is to create a PE executable file that contains our reflective loader. This executable file can then be compiled in either Debug mode or Release mode. In Debug mode it can be used in combination with Visual Studio’s debugger to step through the code and Debug our loader. In Release mode, we can strip our loader out of the resulting executable and prepend it to Beacon to create a Double Pulsar style payload as illustrated in Figure 2.

To compile the project and ensure that it executes correctly, we need to change some of Visual Studio’s Project Settings. These have been outlined below:

  • Entry Point (ReflectiveLoader) – This setting changes the default starting address to Stephen Fewer’s ReflectiveLoader() function. A custom entry point would normally be problematic for a traditional PE file and require some manual initialization. However, Stephen Fewer’s code is position independent, so this won’t be a problem.
  • Enable Intrinsic Functions (Yes) – Intrinsic functions are built into the compiler and make it possible to “call” certain assembly instructions. These functions are “inlined” automatically which means the compiler inserts them at compile time.
  • Ignore All Default Libraries (Yes) – This setting will alert us when we call external functions (as that would not be position independent).
  • Basic Runtime Checks (Default) – This setting is configured correctly in Release mode by default, but changing it in the Debug configuration disables some runtime error checking that will throw an error due to our custom entry point.
  • Optimization – We’ve enabled several of Visual Studio’s different Optimization settings and opted to favor smaller code where possible. However, at certain points in the template we’ve disabled it to ensure our code works as expected.

Note: Optimization can be great because it makes our code smaller and faster. However, it’s important to know what can be optimized and what can’t, which is made even more complex when writing position independent code. If you run into problems, it can be worth checking whether something is being optimized away by the compiler.

Function Positioning

In this post, we are using the Double Pulsar approach to reflective loading. Therefore, after compiling the Release build, we will extract the loader from the resulting executable and prepend it to Beacon to create our payload. As part of this model, we need to ensure that the loaders’ entry point sits at the very start of the shellcode. We also need to make sure that we can identify the end of the loader in order to find out where Beacon begins. This has been illustrated in the following high-level diagram:

Figure 3. A high-level overview of Function Positioning.

There are different ways to achieve this “positioning”, however, for the purposes of this template we have used the code_seg pragma directive. code_seg can be used to specify which section is used to store specific functions. These sections can then be ordered using alphabetical values e.g .text$a. This is possible because the linker takes the section names and splits them at the first dollar sign, the value after it is then used to sort the sections which facilitates the alphabetical ordering. A similar approach to function ordering can also be seen in both TitanLdr/AceLdr in link.ld.

In the example below, we have placed the ReflectiveLoader() function within .text$a to ensure that it is positioned at the start of the .text section and therefore the start of the payload. The remaining functions in ReflectiveLoader.cpp have been placed inside .text$b to ensure that they are located after ReflectiveLoader(). The compiler can order the functions within a given section however it chooses, so this approach of using $a and $b enforces the required layout.

#pragma code_seg(".text$a")
ULONG_PTR WINAPI ReflectiveLoader(VOID) {
[…SNIP…]
}
#pragma code_seg(".text$b")
[…SNIP…]

Note: In some public examples of reflective loaders, a small shellcode stub is used at the very start of execution to ensure stack alignment. This approach is not explicitly required in our template at this point as the loader is intended for use with memory allocation/thread creation APIs for simplicity. It should therefore be aligned correctly. If you do require this stack alignment, it would still be possible to use a similar shellcode stub in this model but it can be left as an exercise for the reader. Matt Graeber’s Writing Optimized Windows Shellcode in C and the associated PIC_Bindshell code demonstrate this. In addition, it can also be found in TitanLdr/Aceldr in start.asm.

We can use the same approach described above to also locate the end of the loader. In the code snippet below, we have used the code_seg directive once more to position the LdrEnd() function. Previously, we used $a to position ReflectiveLoader() at the start of the .text section and here we are using $z to position LdrEnd() at the end of it.

#pragma code_seg(".text$z")
void LdrEnd() {}

The following high-level diagram illustrates the code sections described above.

Figure 4. A high-level overview of Function Positioning with alphabetical values.

The Release build is designed to work with the Teamserver which will append Beacon to our loader. As part of the Debug build, we need to simulate the Release mode behavior. The code_seg directive can also be used in combination with the declspec allocate specifier to position the contents of data items. In the example below, we use the code_seg directive to specify a section, and then use the declspec specifier to place the contents of Beacon.h (unsigned char beacon_dll[]) within it. This logic was placed in End.h/End.cpp for simplicity.

#ifdef _DEBUG
#pragma code_seg(".text$z")
__declspec(allocate(".text$z"))
#include "Beacon.h"
#endif

The folder layout after adding the above files to the project has been illustrated below.

UDRL-VS/
├── Header Files/
│   ├── Beacon.h
│   ├── End.h
│   ├── ReflectiveDLLInjection.h
│   └── ReflectiveLoader.h
├── Source Files/
    ├── End.cpp
    └── ReflectiveLoader.cpp

This is the crux of our development environment, by positioning LdrEnd()/Beacon.h we’re able to easily find the location of Beacon. This change to Stephen Fewer’s original code has been shown below.

#ifdef _DEBUG
    uiLibraryAddress = (ULONG_PTR)beacon_dll;
#elif _WIN64
    uiLibraryAddress = (ULONG_PTR)&ldr_end + 1;
[…SNIP…]
#endif

Note: The x86 version of the Release build works in a slightly different fashion to the one described above. Positioning LdrEnd() and referencing its address works in x64 because the compiler identifies it using relative addressing. Disassembling the binary shows a “load effective address” at [rip + offset] (LEA RSI,[RIP+0X6B9]). This approach does not work in x86 because the absolute address of LdrEnd() is calculated at compile time. Therefore, it points to a completely incorrect location when the loader is prepended to Beacon (MOV EBX, 0X401600). To provide support for x86, we recycled Stephen Fewer’s caller() function in our template and renamed it to GetLocation(). This function simply returns the calling function’s return address via the _ReturnAddress() intrinsic function. Instead of referencing the address of LdrEnd() in x86, we call it, which in turn calls GetLocation(). We then use simple pointer arithmetic to work out the location of Beacon. We could’ve done this for both x86 and x64 but included both to show the two approaches and highlight the difference.

At this point, we now have an operational Debug build. We can set a breakpoint, click “Local Windows Debugger”, and use all the features of Visual Studio’s debugger.

The UDRL-VS Kit

In the previous section we used Stephen Fewer’s original reflective DLL injection code to show that only minor modifications were required to get up and running. However, we wanted to take this a step further and provide a template to support developing and debugging UDRLs for Cobalt Strike.

As part of creating this template, we have attempted to simplify Stephen Fewer’s original code by splitting it into separate functions, removing unused code, updating types and providing more descriptive variable names. In addition, we have also provided some helper functions to speed up writing position independent code (PIC). The following sections provide an overview of these helper functions. For additional help writing PIC, there is an excellent public framework available called ShellcodeStdio that also demonstrates the techniques described below.

Compile Time Hashing

In Stephen Fewer’s original code, several hashes had been pre-calculated and included in ReflectiveLoader.h. This solution works well, but to simplify it further and make it easier for you to include your own hashes, we have added “compile time hashing”.

As the CPP reference states, the “constexpr” specifier makes it possible to “evaluate the value of a function or variable at compile time”. Therefore, it is possible to use the constexpr specifier as part of a hash function to ensure that the hash is generated at compile time. This means instead of pre-calculating hashes and including them in our header file, we can have the compiler/preprocessor hash our strings for us.

Note: Compile time hashing will help us more in a subsequent post, but at this point, an added benefit is that it makes it easier to rotate Stephen Fewer’s HASH_KEY value used to hash the strings. It is not a silver bullet but changing the HASH_KEY could help to push back on simple static signatures.

In the template, we have replaced Stephen Fewer’s static hash values with calls to CompileTimeHash().

constexpr DWORD KERNEL32DLL_HASH = CompileTimeHash("kernel32.dll");
constexpr DWORD NTDLLDLL_HASH = CompileTimeHash ("ntdll.dll");

constexpr DWORD LOADLIBRARYA_HASH = CompileTimeHash("LoadLibraryA");
constexpr DWORD GETPROCADDRESS_HASH = CompileTimeHash("GetProcAddress");
constexpr DWORD VIRTUALALLOC_HASH = CompileTimeHash("VirtualAlloc");
constexpr DWORD NTFLUSHINSTRUCTIONCACHE_HASH = CompileTimeHash("NtFlushInstructionCache");

Note: We have also modified the original hash() function in the template to normalize strings to uppercase before hashing so that “lOadLiBrarYa” and “LoadLibraryA” result in the same hash.

PRINT()

It can be helpful to print strings as part of debugging, but as we mentioned earlier, a custom entry point can affect startup routines, etc. This means that at the start of execution we do not have direct access to the C/C++ standard library or any Windows APIs.

As part of simplifying Stephen Fewer’s original code, we broke it down into independent functions. As a result, we now have a GetProcAddressByHash() function in Utils.cpp that we can use to resolve function addresses. To save a lot of time and effort we have used this to create a _printf() function for Debug purposes and included it in our template. This _printf() function works in the same way as the original printf() so you can give it format specifiers and use it to print variables, etc. We also wrapped it into a macro called PRINT() which will only generate the _printf() calls when the project is compiled in Debug mode.

PRINT("[+] Beacon Start Address: %p\n", beaconBaseAddress);

Here is a screenshot of the above function in action. We have printed the location of Beacon and then found it using the disassembly view in Visual Studio.

Figure 5. Finding Beacon’s MZ Header with a call to PRINT().

Strings

Strings are saved into the .data/.rdata section of a PE file and will therefore be unavailable once we extract the loader (which will be exclusively found in the .text section). It’s therefore important to understand how strings are created and stored within a PE file. Compiler Explorer is an excellent website for seeing how your code is assembled and even color codes the input/output. The following screenshot shows three different approaches to declaring strings in C++.

Figure 6. A demonstration of how strings are created and stored with Compiler Explorer.

The first declaration uses an array initializer; this has been highlighted in yellow. The output window shows how move instructions are used to construct the string one byte at a time. This means that all the code is found within the .text section.

The next approach uses a string literal to initialize the data. As shown in the purple output, the bytes of the string are copied into the array from the .data section. This has been broken down and explained below.

lea    rax, QWORD PTR string$[rsp]     ; load the address of where the string will be on the stack (destination address)
lea    rcx, OFFSET FLAT : $SG2657      ; load the address of the string in the .data section (source address)
mov    rdi, rax			       ; save destination address into destination pointer (RDI)
mov    rsi, rcx			       ; save source address into source pointer (RSI)
mov    ecx, 12 			       ; save the size of the string into the count register (ECX)
rep    movsb  		               ; move a single byte from RDI to RSI and repeat based on ECX (size of string)

In the final example, a char pointer is initialized with a string literal. As shown in the red output, it references the value in the .data section. This has also been broken down and explained below.

lea    rax, OFFSET FLAT:$SG2658        ; load the address of the string in the .data section
mov    QWORD PTR stringPtr$[rsp], rax  ; save the address of the string on the stack

After reviewing the above, we can see the only real option for us when writing PIC is to either avoid using strings (not always possible) or use the first approach in the example above.

char helloWorld[] = {'H','e','l','l','o',' ','W','o','r','l','d','\0'};

As with everything when writing PIC, this is a little clumsy and cumbersome. However, Evan McBroom has provided a very simple and elegant solution to this problem. Evan discovered that when using the constexpr specifier to initialize a char array with a string literal, the resulting string was constructed in the same fashion as the array initializer described above. The following screenshot demonstrates this with Compiler Explorer.

Figure 7. A demonstration of Evan McBroom’s PIC string with Compiler Explorer.

Evan wrapped this into two macros that can be used to create both ASCII strings and wide strings.

#define PIC_STRING(NAME, STRING) constexpr char NAME[]{ STRING }
#define PIC_WSTRING(NAME, STRING) constexpr wchar_t NAME[]{ STRING }

We have added these two macros to the template, this can be seen in the following example.

PIC_STRING(example, "[!] Hello World\n");
PRINT(example);

Release Mode

The ability to develop and debug inside Visual Studio is great, but what about using this loader in production? The great thing about writing a PIC loader is that everything we need is located inside the resulting PE files’ .text section. This means we can use a simple Python script to extract our compiled executable’s .text section and voila, we have our UDRL!

Note: This is why we used the “Function Positioning” described earlier. We needed to ensure that our ReflectiveLoader() function was positioned correctly at the very start of the .text section, which becomes the very start of the UDRL (aka the loader).

There are many examples of Python scripts that do something similar; both TitanLdr and AceLdr have similar scripts in their respective repositories. We have also included a script in the Arsenal kit template called udrl.py. Visual Studio allows us to incorporate this script as a post-build event and so the Release build will automatically create udrl-vs.bin in the relevant Output Directory.

To simplify testing and development, udrl.py also facilitates shellcode execution. This allows you to quickly test the loader without having to go via the Teamserver. We’d strongly recommend using this frequently to test your work. When writing PIC, things will often work in Debug mode but not in Release mode. For example, you can easily be caught out by forgetting the constepxr specifier, by forgetting to initialize pointers, or by using strings that aren’t PIC.

C:\> py.exe udrl.py prepend-udrl .\beacon.x64.bin .\x64\Release\udrl-vs.exe

            _      _
           | |    | |
  _   _  __| |_ __| |  _ __  _   _
 | | | |/ _` | '__| | | '_ \| | | |
 | |_| | (_| | |  | |_| |_) | |_| |
  \__,_|\__,_|_|  |_(_) .__/ \__, |
                      | |     __/ |
                      |_|    |___/

[+] Success: Extracted loader
[*] Size of loader: 1229
[+] Start Address: 0x1b690d90000
[+] Shellcode Executed

Note: Make sure to use the 32-bit version of Python when testing x86 loaders. It will save you a couple of minutes of confusion…

Previously we used the Double Pulsar approach to loading because it simplified our Development/Debugging and provided an alternate way to write a UDRL. However, there is no reason why we can’t still use the “original” UDRL workflow and simply replace Beacon’s default loader with the one we have created.

The UDRL-VS template contains an additional Build Configuration called “Release (Stephen Fewer)”. This Build Configuration still creates the same PIC loader, however, instead of using the LdrEnd() function to calculate the location of Beacon, it uses Stephen Fewer’s original approach of walking backward through memory to find the start address of the DLL that is being loaded (Beacon).

To make it easy to test this type of loader, we have also included an option in udrl.py to overwrite Beacon’s default loader and execute the resulting payload.

C:\> py.exe udrl.py stomp-udrl .\beacon.x64.bin ".\x64\Release (Stephen Fewer)\udrl-vs.exe"

            _      _
           | |    | |
  _   _  __| |_ __| |  _ __  _   _
 | | | |/ _` | '__| | | '_ \| | | |
 | |_| | (_| | |  | |_| |_) | |_| |
  \__,_|\__,_|_|  |_(_) .__/ \__, |
                      | |     __/ |
                      |_|    |___/

[+] Success: Extracted loader
[*] Size of loader: 1277
[*] Found ReflectiveLoader - RVA: 0x17aa4       File Offset: 0x16ea4
[+] Success: Applied UDRL to DLL
[+] Start Address: 0x27239a20000
[+] Shellcode Executed

Once your loader has been tested and works as expected, it can be used in combination with an Aggressor Script to make it operational. We don’t strictly need to use Aggressor. We could use a script like udrl.py to create the payload, however, Aggressor Script has several functions that will simplify customization in subsequent posts and saves writing extra code.

We can use some very simple Aggressor Scripts to apply our loaders to Beacon. The following example demonstrates how to append Beacon to our loader (almost a carbon copy of the one used by TitanLdr/AceLdr).

set BEACON_RDLL_GENERATE {
        # Declare local variables
	local('$arch $beacon $fileHandle $ldr $path $payload');
	$beacon = $2;
	$arch = $3;
	
	# Check the payload architecture
	if($arch eq "x64") {
            $path = getFileProper(script_resource("x64"), "Release", "udrl-vs.bin");
	} 
        else if ($arch eq "x86") {
            $path = getFileProper(script_resource("Release"), "udrl-vs.bin");
	}
        else {
            warn("Error: Unsupported architecture: $arch");
            return $null;
        }

	# Read the UDRL from the supplied binary file
	$fileHandle = openf( $path );
	$ldr = readb( $fileHandle, -1 );
	closef( $fileHandle );
	if ( strlen( $ldr ) == 0 ) {
		warn("Error: Failed to read udrl-vs.bin");
		return $null;
	}

	# Prepend UDRL to Beacon and output the modified payload.
	return $ldr.$beacon;
}

The following example demonstrates how to overwrite Beacon’s default loader with our own. We still read the loader in the same fashion, but this time we call setup_reflective_loader(). This function does the heavy lifting for us; it finds the current ReflectiveLoader() function in Beacon and replaces it with the one provided.

set BEACON_RDLL_GENERATE {	
        # Declare local variables
	local('$arch $beacon $fileHandle $ldr $path $payload');
	$beacon = $2;
	$arch = $3;
	
	# Check the payload architecture.
	if($arch eq "x64") {
            $path = getFileProper(script_resource("x64"), "Release (Stephen Fewer)", "udrl-vs.bin");
        } 
        else if ($arch eq "x86") {
            $path = getFileProper(script_resource("Release (Stephen Fewer)"), "udrl-vs.bin");
	}
        else {
            warn("Error: Unsupported architecture: $arch");
            return $null;
        }

	# Read the UDRL from the supplied binary file
	$fileHandle = openf( $path );
	$ldr = readb( $fileHandle, -1 );
	closef( $fileHandle );
	if ( strlen( $ldr ) eq 0 ) {
		warn("Error: Failed to read udrl-vs.bin");
		return $null;
	}

	# Overwrite Beacon's ReflectiveLoader() with UDRL
	$payload = setup_reflective_loader($beacon, $ldr);
	

	# Output the modified payload.
	return $payload;
}

If we load either of the scripts above into Cobalt Strike and export a payload, we’ll see a message in the Script Console confirming that the custom loader was used. The resulting shellcode can then be used in combination with a Stage0 of your choosing.

Closing Thoughts

That concludes the first post of this series Revisiting the UDRL. As part of this post we have created a Visual Studio project with several Quality of Life (QoL) improvements. We’re now able to develop, debug and operationalize both Stephen Fewer’s original reflective loader and the Double Pulsar concept for Cobalt Strike using Visual Studio. The template developed as part of this project can be found in the Arsenal Kit under udrl-vs in “kits”. In the next installment we’ll explore some evasive techniques as well as how to modify default behaviors.

Cobalt Strike 4.8: (System) Call Me Maybe

Cobalt Strike 4.8 is now available. This release sees support for system calls, options to specify payload guardrails, a new token store, and more.  

We had originally planned to get this release out late in 2022 but progress was stymied due to the 4.7.1 and 4.7.2 patch releases that we had to put out to fix vulnerabilities that were reported in the 4.7 release. We spent a few development cycles performing a security review of the code and working on some technical debt, and then it was the holiday season. It’s here now though, and better late than never!  

Before getting into the details of this release, I just wanted to mention that you should now start to see much more content from us to supplement main product releases. William Burgess recently released his first blog post since joining the Cobalt Strike team and he will be playing a key role in providing technical guidance on the future direction of the product. We have more blog posts and tooling coming over the next few weeks and months, starting with a series on UDRL development (the first of which should drop next week). Coming later in the year are some huge changes to Cobalt Strike itself. More details on that will come in a follow-up blog post soon. We know that our users are struggling with evasion and have reported other pain points. As I mentioned in my roadmap update last year, we have been aggressively building out our R&D team and while it’s taken a while to do that and get all of our ducks in a row, you’ll now really start to see the benefits of those behind-the-scenes changes. Now, back to the 4.8 release. 

System Calls Support

This release sees the addition of support for direct and indirect system calls. We have added support for a number of system calls, specifically:  

  • CloseHandle
  • CreateFileMapping
  • CreateRemoteThread
  • CreateThread
  • GetThreadContext
  • MapViewOfFile
  • OpenProcess
  • OpenThread
  • ResumeThread
  • SetThreadContext
  • UnmapViewOfFile
  • VirtualAlloc
  • VirtualAllocEx
  • VirtualFree
  • VirtualProtect
  • VirtualProtectEx
  • VirtualQuery

The stageless Beacon payload generation dialog has been updated to allow you to specify the system call method to be used at execution time. The available options are:  

None: Use the standard Windows API function
Direct: Use the Nt* version of the function
Indirect: Jump to the appropriate instruction within the Nt* version of the function

It is important to note that there are some commands and workflows that inject or spawn a new Beacon that do not allow you to set the initial system call method. Those commands/workflows are:

  • elevate 
  • inject 
  • jump 
  • spawn 
  • spawnas 
  • spawnu 
  • teamserver responding to a stageless payload request 
  • teamserver responding to an External C2 payload request 

The stage.syscall_method in the MalleableC2 profile controls the method used at execution time, and you can use the syscall-method [method] command to modify the method that will be used for subsequent commands. Additionally, syscall-method without any arguments will query and return the current method.  

System call support is something that we intend to continue to update and enhance in future releases. Your feedback on this is welcomed.  

Generate Payloads With Built-In Guardrails 

Support has been added for payload guardrails, which can be set at the listener level and then, if required, overridden when generating a payload.  

Guardrails can be set based on the following criteria: 

  • IP address: This can be a single IP address or a range using a wildcard to replace the rightmost octet(s). For example, 123.123.123.123, 123.123.123.*, 123.123.*.* and 123.*.*.* are all valid inputs. 123.*.123.* is not. 
  • Username: This can be a specific user name, or you can prefix/suffix a wildcard (i.e. *user or user*). The username field is case insensitive.  
  • Server name: Again, this can be a specific server name, or you can prefix/suffix a wildcard (i.e. *server or server*). The server name field is case insensitive. 
  • Domain: As with username and server name, the domain field can either be a specific domain or you can prefix/suffix a wildcard (i.e. *domain or domain*). The domain name field is also case insensitive.

The listener dialog has a new “Guardrails” option at the bottom of the screen that allows you to set and update guardrails for that listener.  

When generating a payload, the guardrails value from the associated listener is used as a default value.  

You can use the default values or override them to set specific values for the payload being created. Setting specific values here does not change the default values set at the listener level.

Multi-Byte Support For Obfuscating Beacon’s Reflective DLL’s Import Table

We have made a change to the obfuscation routine used for the stage.obfuscate MalleableC2 option which relates to how Beacon’s Reflective DLL’s import table is obfuscated.

Part of this process involved moving from a fixed, single byte XOR key to a randomly generated multi-byte XOR key. The single byte XOR mask was easily signatured and caught by tools such as YARA. Moving to a randomly generated multi-byte XOR key should help address those issues. 

Sleep Mask Updates

A number of updates have been made to the Sleep Mask. The main change is that the Sleep Mask size limit has been increased from 8192 to 16384 bytes. Other changes include: 

  • Support for the use of system calls with the MASK_TEXT_SECTION capability
  • The addition of define tags for the Windows API functions to remove the need for LIBRARY$Function syntax
  • The implementation of evasive sleep via stack spoofing (x64 only). Related changes include the addition of a bypass for Control Flow Guard (CFG), as well as the addition of a helper utility (getFunctionOffset)

Token Store

One change that we’ve had on the backlog for a while was the addition of a token store, to facilitate the hot swapping of access tokens. Windows tokens are process specific; hence each Beacon has its own token store with its own tokens. Please be aware that those tokens are therefore only available to be used by that specific Beacon.

The token store is based around the new token-store command. The command supports a number of options that perform specific functions (all supported by tab completion). Available functions are as follows:

token-store steal [pid,…] <OpenProcessToken access mask>
This steals the token(s) from the specificed PID(s). Use commas to separate each PID. This command will steal the token and put it into the token store, but will not impersonate straight away. The steal-and-use command should be used for that purpose (although it should be noted that steal-and-use only supports a single PID). This command supports the optional OpenProcessToken access mask like the existing steal_token command. 

token-store use [id]
This tasks Beacon to use the token with the specified ID from the token store.  

token-store steal-and-use [pid] <OpenProcessToken access mask>
This allows you to steal the token from the specified PID, add it to the token store and immediately use it. This command supports the optional OpenProcessToken access mask like the existing steal_token command. 

token-store show 
This displays the tokens in the token store. 

token-store remove [id,…]
This allows you to remove the token(s) corresponding to the specific ID(s) from the token store. Use commas to specify multiple IDs.  

token-store remove-all 
This removes all tokens from the token store. 

One additional point is that tokens can also be stolen via the process list in the GUI. Steal a token in the usual way (i.e. Explore -> Process List, select one or more PIDs and click on Steal Token) and then make sure the “store token in the token store” checkbox is checked before stealing the token. You are also able to select multiple Beacons at once before opening the process list.  

Finally, we have also implemented corresponding aggressor functions to the options described above, i.e.:  

  • btoken_store_remove 
  • btoken_store_show 
  • btoken_store_steal_and_use 
  • btoken_store_steal 
  • btoken_store_remove_all 
  • btoken_store_use 

ETW Blinding 

This release also sees the addition of support for ETW blinding via patching for the execute-assembly and powerpick commands. While there is limited support in this release, this is something that we will look to build on in future releases.  

The execute-assembly and powerpick commands now have an optional PATCHES parameter, specified as follows: 

execute-assembly “[PATCHES: [patch-rule] [patch-rule] [patch-rule] [patch-rule]]” [/path/to/file.exe] [arguments] 

and  

powerpick “[PATCHES: [patch-rule] [patch-rule] [patch-rule] [patch-rule]]” [commandlet] [arguments] 

The optional PATCHES parameter can modify functions in memory for the process. Up to four patch-rule rules can be specified (delimited by spaces), with each patch-rule being made up of a library, function, offset and hex patch value, for example: [library],[function],[offset],[hex-patch-value], where:

  • library can be 1 – 260 characters  
  • function can be 1-256 characters 
  • offset is the offset from the start of the executable function and can be 0 – 65535 
  • hex-patch-value can be 2 – 200 hex characters (0 – 9, A – F) and the length must be and even number (hex pairs)

Further information on the patch-rule syntax can be found in the documentation.  

Sync Data At Teamserver Startup 

There has been a long-standing issue within Cobalt Strike whereby any data retrieved from a target (for example, screenshots, keylog data etc.) is unavailable in the client after the teamserver is restarted. This issue has been addressed, and data that was previously “lost” to the UI is now persisted between restarts.  

We have also added a new script, clearteamserverdata, that can be used to delete all data after an engagement has completed. 

Change How License Expiration Date Is Processed 

A key update related to product security is a change to how the license expiration date is processed. Previously, the expiration date was checked at teamserver startup and if it was valid, the teamserver was able to start. No further checks were performed and the teamserver was able to run until shut down by the operator.

We have tightened this processing up so that the license expiration date is checked on a daily basis. The reasoning behind the previous behaviour was that it wasn’t always convenient or logistically possible to run an update in the middle of an engagement (to grab a new authorization file with an updated expiration date). Nor was it convenient under those circumstances to restart the teamserver following an update. We have added mitigations for those scenarios in the new processing, meaning that there will be no impact on operations while refreshing your authorization file. If you need to update your license while an engagement is running, please consult the documentation for information on how you can do this. We do recommend, however, that you consider your license expiration date before starting a new engagement and avoid this situation occurring, if at all possible.

To make sure that you don’t get caught out by an expiring license, we have added a couple of new banners to the client UI.  

Starting 45 days before your license is due to expire, a warning banner will appear in the client, informing you that your license is expiring and when it is due to expire. This warning message can be dismissed but will reappear each day when the expiration date is checked. This should provide sufficient time to renew your license key if this has not already been taken care of:

If you fail to renew your license before it expires, you will have a 14-day grace period in which to do so. For the duration of that grace period, you will see an error banner that cannot be dismissed:

After the grace period has ended, if your license has not been renewed and the authorization file refreshed, the teamserver will shut down and you will be unable to restart it until you have renewed your license.  

Quality Of Life Changes 

In addition to the features already mentioned, we have also added a number of smaller changes, mainly requested by our users, as follows:

Added Flexibility To Specifying Sleep Time

A small change requested by a user was to make the sleep time easier to set. Rather than only being able to specify the sleep time in seconds, you can now also specify days, hours and minutes by suffixing those values with “d”, “h” and “m” respectively.  

A usage example is sleep 2d 13h 45m 8s 30j which translates to “sleep for 2 days, 13 hours, 45 minutes, 8 seconds with 30% jitter”. 

We have also added a new aggressor function, bsleepu that works in the same way.  

Display Current Token In The UI

Another user request was to display the current token in the client UI. This gelled nicely with the new token store and you are now able to see the current token the table view and status bar in brackets alongside the user name.  

Related to this change, we also addressed the issue whereby make_token would report the wrong user name (i.e. the user name of the current Beacon process) due to a Windows API limitation. This has been addressed and the correct username appears in brackets.   

Copy/Paste From The Beacon Output Pane 

We have added support for copying and pasting commands from the Beacon output pane.  

CTRL+C copies and CTRL+X cuts the selected text from either the output pane or the command line. Text can only be selected in the output pane or command line, not both. CTRL+V pastes text from the clipboard onto the command line. This works for any console window (for example, the Beacon console, SSH console, script console, event log and web log).  

Chain Multiple Commands In A Single Mimikatz Call 

The mimikatz command has been updated to support chaining multiple commands in a single operation. This also applies to the bmimikatz and bmimikazt_small aggressor functions (although bmimikatz_small is limited to lsadump::dcsync, sekurlsa::logonpasswords and sekurlsa::pth).  

Commands can be chained by adding a semicolon as a delimiter between commands, for example:
mimikatz standard::coffee;standard::coffee  

Note that the semicolon can still be escaped if it is required in a command (i.e. “\;”). Also, the command length is limited to 511 characters to ensure that the final character in the string is EOS (\0).  

Specify Exit Function On Stageless Windows Executable Dialog

A change was made in the 4.7 release to allow you to specify the exit option (either “thread” or “process”) on the Stageless Payload Generator dialog.  A similar change has now been made to the Stageless Windows Executable dialog that also allows you to specify the exit option (“thread” or “process”).  

Arsenal Kit Checksum

One final change to mention, again requested by a user, was to add a checksum for the Arsenal Kit. This has been done and it can be found at https://verify.cobaltstrike.com/arsenal-kit.  

To see a full list of what’s new in Cobalt Strike 4.8, please check out the release notes. While licensed users can run the update program to get the latest version, we recommend that you download version 4.8 from scratch from the website to get the new update application. This is because the TLS certificates on download.cobaltstrike.com and verify.cobaltstrike.com will be updated soon, and the existing updater will report errors if it isn’t updated. To purchase Cobalt Strike or learn more, please contact us

Behind the Mask: Spoofing Call Stacks Dynamically with Timers

This blog introduces a PoC technique for spoofing call stacks using timers. Prior to our implant sleeping, we can queue up timers to overwrite its call stack with a fake one and then restore the original before resuming execution. Hence, in the same way we can mask memory belonging to our implant during sleep, we can also mask the call stack of our main thread. Furthermore, this approach avoids having to deal with the complexities of X64 stack unwinding, which is typical of other call stack spoofing approaches. 

The Call Stack Problem

The core memory evasion problem from an attacker’s perspective is that implants typically operate from injected code (ignoring any module hollowing approaches). Therefore, one of the pillars of modern detection is to monitor for the creation of threads which belong to unbacked (or ‘floating’) memory. This blog by Elastic is a good approximation to the state of the art in terms of anomalous thread detection from an EDR perspective. 

However, another implication of this problem for attackers is that all the implants’ API calls will also originate from unbacked memory. By examining call stacks either at the time of a specific API invocation, or by proactively inspecting running threads (i.e. ones which are sleeping), suspicious call stacks can be identified via return addresses to unbacked memory.  

This is one detection area which historically has not received a huge amount of focus/research in modern EDR stacks (in my experience). However, this is starting to change with the release of open-source tools such as Hunt-Sleeping-Beacons, which will proactively inspect “sleeping” threads to find call stacks with unbacked regions. This demonstrably provides a high confidence signal of suspicious activity; hence it is valuable to EDRs and something attackers need to seriously consider in their evasion TTPs.  

Call Stack Inspection at Rest

The first problem to solve from an attacker’s perspective is how to manipulate the call stack of a sleeping thread so that it can bypass this type of inspection. This could be performed by the actual thread itself or via some external mechanism (APCs etc.).  

Typically, this is referred to as “spoofing at rest” (h/t to Kyle Avery here for this terminology in his excellent blog on avoiding memory scanners). The first public attempt to solve this problem is mgeeky’s ThreadStackSpoofer, which overwrites the last return address on the stack. 

As a note, the opposite way to approach this problem is by having no thread or call stack present at all, à la DeathSleep. The downside of this technique is the potential for the repeated creation of unbacked threads, (depends on the exact implementation), which is a much greater evil in modern environments. However, future use of Hardware Stack Protection by EDR vendors may make this type of approach inevitable. 

Call Stack Inspection During Execution – User Mode

The second problem is call stack inspection during execution, which could either be implemented in user mode or kernel mode. In terms of user mode implementation, this would typically involve hooking a commonly abused function and walking the stack to see where the call originated. If we find unbacked memory, it is highly likely to be suspicious. An obvious example of this is injected shellcode stagers calling WinInet functions. MalMemDetect is a good example of an open-source project that demonstrates this detection technique. 

For these scenarios, techniques such as RET address spoofing are normally sufficient to remove any evidence of unbacked addresses from the call stack. At a high level, this involves inserting a small assembly harness around the target function which will manually replace the last return address on the stack and redirect the target function to return to a trampoline gadget (e.g. jmp rbx). 

Additionally, there is SilentMoonWalk which uses a clever de-syncing approach (essentially a ROP gadget built on X64 stack unwinding codes). This can dynamically hide the origin of a function call and will similarly bypass these basic detection heuristics. Most importantly to an operator, both these techniques can be performed by the acting thread itself and do not require any external mechanism. 

From an opsec perspective, it is important to note that many of the techniques referenced in this blog may produce anomalous call stacks. Whether this is an issue or not depends on the target environment and the security controls in place. The key consideration is whether the call stack generated by an action is being recorded somewhere (say in the kernel, see next section) and appended to an event/alert. If this is the case, it may look suspicious to trained eyes (i.e. threat hunters/IR). 

To demonstrate this, we can take SilentMoonWalk’s desync stack spoofing technique as an example (this is a slightly easier use case as other techniques can be implementation specific).  As stated previously, this technique needs to find functions which implement specific stack winding operations (a full overview of X64 stack unwinding is beyond the scope of this blog but see this excellent CodeMachine article for further reading).  

For example, the first frame must always perform a UWOP_SET_FPREG operation, the second UWOP_PUSH_NONVOL (rbp) etc. as demonstrated in windbg below:

0:000> knf
#   Memory  Child-SP          RetAddr               Call Site 
00           0000001d`240feb98 00007ffe`b622d831     win32u!NtUserWaitMessage+0x14 
[…] 
08        40 0000001d`240ff140 00007ffe`b483b576     KERNELBASE!CreatePrivateObjectSecurity+0x31 
09        40 0000001d`240ff180 00007ffe`b48215a5     KERNELBASE!Internal_EnumSystemLocales+0x406 
0a       3e0 0000001d`240ff560 00007ffe`b4870e22     KERNELBASE!SystemTimeToTzSpecificLocalTimeEx+0x25 
0b       680 0000001d`240ffbe0 00007ffe`b6d87614     KERNELBASE!PathReplaceGreedy+0x82 
0c       100 0000001d`240ffce0 00007ffe`b71826a1     KERNEL32!BaseThreadInitThunk+0x14 
0d        30 0000001d`240ffd10 00000000`00000000     ntdll!RtlUserThreadStart+0x21 

0:000> .fnent KERNELBASE!PathReplaceGreedy+0x82 
Debugger function entry 000001cb`dda19c60 for: 
(00007ffe`b4870da0)   KERNELBASE!PathReplaceGreedy+0x82   |  (00007ffe`b4871050)   KERNELBASE!SortFindString
[…]  
  06: offs 13, unwind op 3, op info 2	UWOP_SET_FPREG.

0:000> .fnent KERNELBASE!SystemTimeToTzSpecificLocalTimeEx+0x25 
Debugger function entry 000001cb`dda19c60 for:  
(00007ffe`b4821580)   KERNELBASE!SystemTimeToTzSpecificLocalTimeEx+0x25   |  (00007ffe`b482182c)   KERNELBASE!AddTimeZoneRules 
[…] 
08: offs b, unwind op 0, op info 5	UWOP_PUSH_NONVOL reg: rbp. 

This output shows the call stack for the spoofed SilentMoonwalk thread (knf) and the unwind operations (.fnent) for two of the functions found on the call stack (PathReplaceGreedy / SystemTimeToTzSpecificLocalTimeEx). 

The key take away is that this results in a call stack which would never occur for a legitimate code path (and is therefore anomalous). Hence, KERNELBASE!PathReplaceGreedy does not call KERNELBASE!SystemTimeToTzSpecificLocalTimeEx … and so on. Furthermore, an EDR could itself attempt to search for this pattern of unwind codes during a proactive scan of a sleeping thread. Again, whether this is an issue depends entirely on the controls/telemetry in place but as operators it is always worth understanding the pros and cons of all the techniques at our disposal. 

Lastly, a trivial way of calling an API with a ‘clean’ call stack is to get something else to do it for you. The typical example is to use any callback type functionality provided by the OS (same applies for bypassing thread creation start address heuristics). The limitation for most callbacks is that you can normally only supply one argument (although there are some notable exceptions and good research showing ways around this). 

Call Stack Inspection During Execution – Kernel Mode

A user mode call stack can be captured inline during any of the kernel callback functions (ie. on process creation, thread creation/termination, handle access etc…). As an example, the SysMon driver uses RtlWalkFrameChain to collect a user mode call stack for all process access events (i.e. calling OpenProcess to obtain a HANDLE). Hence, this capability makes it trivial to spot unbacked memory/injected code (‘UNKNOWN’) attempting to open a handle to LSASS. For example, in this contrived scenario you would get a call stack similar to the following: 

0:020> knf 
#       Memory    Child-SP           RetAddr               Call Site 
00                0000004c`453cf428  00007ffd`7f1006fe     ntdll!NtOpenProcess 
01           8    0000004c`453cf430  00007ff6`98fe937f     KERNELBASE!OpenProcess+0x4e 
02          70    0000004c`453cf4a0  000002ad`c3fd1121     000002ad`c3fd1121 (UNKNOWN) 

Additionally, it is now possible to collect call stacks with the ETW threat intelligence provider.  The call stack addresses are unresolved (i.e. an EDR would need to keep its own internal process module cache to resolve symbols) but they essentially enable EDR vendors the potential to capture near real time call stacks (where the symbols are then resolved asynchronously). Therefore, this can be seen as a direct replacement for user mode hooking which is, critically, captured in the kernel. It is not unrealistic to imagine a scenario in the future in which unbacked/direct API calls to sensitive functions (VirtualAlloc / QueueUserApc / SetThreadContext / VirtualProtect etc.) are trivial to detect. 

These scenarios were the premise for some of my own previous research in to call stack spoofing during execution: https://github.com/WithSecureLabs/CallStackSpoofer. The idea was to offload the API call to a new thread, which we could initialise to a fake state, to hide the fact that the call originated from unbacked memory. My original PoC applied this idea to OpenProcess but it could easily be applied to image loads etc.  

The key requirement here was that any arbitrary call stack could be spoofed, so that even if a threat hunter was reviewing an alert containing the call stack, it would still look indistinguishable from other threads. The downsides of this approach were the need to create a new thread, how best to handle this spoofed thread, and the reliance on a hard coded / static call stack.

Call Stack Masking

Having given a brief review of the current state of research in to call stack spoofing, this blog will demonstrate a new call stack spoofing technique: call stack masking. The PoC introduced in this blog post solves the spoofing at rest problem by masking a sleeping thread’s call stack via an external mechanism (timers). 

While researching this topic in the past, I spent a large amount of time trying to get to grips with the complexities of X64 stack unwinding in order to produce TTPs to perform stack spoofing. This complexity is also present in a number of the other techniques discussed above. However, it occurred to me that there is a much simpler way to spoof/mask the call stack without having to deal with these intricacies. 

If we consider a generic thread that is performing any kind of wait, by definition, it cannot modify its own stack until the wait is satisfied. Furthermore, its stack is always read-writable. Therefore, we can use timers to:

  1. Create a backup of the current thread stack
  2. Overwrite it with a fake thread stack
  3. Restore the original thread stack just before resuming execution 

Any timer objects could be used, but for convenience I based my PoC on C5Spider’s Ekko sleep obfuscation technique.  

The only remaining challenge is to work out the value of RSP once our target thread is sleeping. This can be achieved using compiler intrinsics (_AddressOfReturnAddress) to obtain the Child-SP of the current frame. Once we have this, we can subtract the total stack utilisation of the expected next two frames (i.e. KERNELBASE!WaitForSingleObjectEx and ntdll!NtWaitForSingleObject) to find the expected value of RSP at sleep time.

Lastly, to make our masked thread look as realistic as possible, we can copy the start address and call stack of an existing (and legitimate) thread.

PoC || GTFO

The PoC can be found here: https://github.com/Cobalt-Strike/CallStackMasker

The PoC operates in two modes: static and dynamic. The static mode contains a hard coded call stack that was found in spoolsv.exe via Process Explorer. This thread is shown below and can be seen to be in a state of ‘Wait:UserRequest’ via KERNELBASE!WaitForSingleObjectEx:

The screenshot below demonstrates static call stack masking. The start address and call stack of our masked thread are identical to the thread identified in spoolsv.exe above:

The obvious downside of the static mode is that we are still relying on a hard coded call stack. To solve this problem the PoC also implements dynamic call stack masking. In this mode, it will enumerate all the accessible threads on the host and find one in the desired target state (i.e. UserRequest via WaitForSingleObjectEx). Once a suitable thread stack is found, it will copy it and use that to mask the sleeping thread. Similarly, the PoC will once again copy the cloned thread’s start address to ensure our masked thread looks legitimate.

If we run the PoC with the ‘–dynamic’ flag, it will locate another thread’s call stack to mimic as shown below: 

The target process (taskhostw.exe / 4520), thread (5452), and call stack identified above are shown below in Process Explorer:

If we now examine the call stack and start address of the main thread belonging to CallStackMasker, we can see it is identical to the mimicked thread:

Below is another example of CallStackMasker dynamically finding a shcore.dll based thread call stack from explorer.exe to spoof: 

The screenshot below shows the real ‘unmasked’ call stack:

Currently the PoC only supports WaitForSingleObject but it would be trivial to add in support for WaitForMultipleObjects.

As a final note, this PoC uses timer-queue timers, which I have previously demonstrated can be enumerated in memory: https://github.com/WithSecureLabs/TickTock. However, this PoC could be modified to use fully fledged kernel timers to avoid this potential detection opportunity. 

Out Of Band Update: Cobalt Strike 4.7.2

Cobalt Strike 4.7.2 is now available. This is an out of band update to fix a remote code execution vulnerability that is rooted in Java Swing but which can be exploited in Cobalt Strike.

Remote Code Execution Vulnerability

I’d like to start by giving credit to Rio Sherri (0x09AL) and Ruben Boonen (FuzzySec) from the X-Force Red Adversary Simulation Team for their work in not only researching this vulnerability, but also sharing their findings with me and my team and helping us to mitigate it. They plan on publishing detailed information about this on their blog later today (if their blog post isn’t live right now, check back later).

The write-up linked above goes into a tremendous amount of detail and is well worth taking the time to read. The very short version is that the underlying cause of this issue is due to Cobalt Strike’s user interface being built using the Java Swing framework. Certain components within Java Swing will automatically interpret any text as HTML content if it starts with <html>. This can be exploited using an object tag, which in turn can load a malicious payload from a webserver, which is then executed by the Cobalt Strike client. Disabling automatic parsing of html tags across the entire client was enough to mitigate this behaviour.

Why Is There No CVE For This Vulnerability?

While the remote code execution vulnerability could be exploited in Cobalt Strike, I feel that it is important to stress that this isn’t specific to Cobalt Strike and this is the reason why we haven’t submitted a new CVE to cover it. The underlying vulnerability can be found in Java Swing and can be exploited in any Java Swing GUI that renders html, not just Cobalt Strike. We felt that there were parallels between this and the recent log4j vulnerability – thousands of applications that used log4j were vulnerable and yet there aren’t CVEs to cover every single vulnerable application. It is the same case here, although I appreciate that some people may disagree.

It goes without saying that as this is our second out of band update in a matter of weeks, we apologise for any problems that these issues may have caused. If you notice any other issues with Cobalt Strike, please refer to the online support page, or report them to our support email address. Licensed users can run the update program to get this version, or download version 4.7.2 from scratch from the website. We recommend taking a copy of your existing Cobalt Strike folder before upgrading in case you need to revert to the previous version. To purchase Cobalt Strike or learn more, please contact us.

Out Of Band Update: Cobalt Strike 4.7.1

Cobalt Strike 4.7.1 is now available. This is an out of band update to fix an issue discovered in the 4.7 release that was reported to be impacting users, and for which there was no workaround. We also took the opportunity to address a vulnerability that was reported shortly after the 4.7 release, along with mitigations for potential denial-of-service attacks.

Sleep Mask Issue

An issue was reported whereby when stage.sleep_mask is not set (i.e. set to false), Beacon will still allocate space for the sleep mask BOF in memory. This issue has now been fixed.

CVE-2022-39197

An independent researcher identified as “Beichendream” reached out to inform us about an XSS vulnerability that they discovered in the teamserver. This would allow an attacker to set a malformed username in the Beacon configuration, allowing them to remotely execute code. We created a CVE for this issue which has been fixed.

As part of this fix, a new property has been added to the TeamServer.prop file (located in the home folder of the teamserver):

limits.beacons_xssvalidated specifies whether XSS validation is performed on selected Beacon metadata. By default, this is set to true.

Denial-of-Service Mitigations

We were also made aware of the potential to conduct a denial-of-service attack against the teamserver itself. While this can be mitigated by good OPSEC (using a redirector, turning staging off and so on), we have made updates to mitigate this type of attack.

A number of new properties have been added to the TeamServer.prop file as part of the mitigations:

limits.beacons_max sets a limit on the total number of Beacons that the teamserver will support. The default is 500. To turn this off (support an unlimited number of Beacons), use 0.

Three additional settings allow you to set a threshold rate for adding new Beacons (how many new Beacons can be added in a specific time period):

limits.beacon_rate_period specifies the time period (in milliseconds) during which the number of Beacons added is monitored and limited.

limits.beacon_rate_maxperperiod specifies how many new Beacons can be added in the specified time period.

limits.beacon_rate_disableduration specifies how long the teamserver will ignore additional new Beacons for if the number of new Beacons exceeds the limit in the given time period.

Example

limits.beacon_rate_period is set to 3000, limits.beacon_rate_maxperperiod is set to 50 and limits.beacon_rate_disableduration is set to 600000. If more than 50 new Beacons are added in a 3 second (3000ms) time period, any additional new Beacons added in the next 10 minutes (600000ms) will be ignored.

We apologise for any problems that these issues may have caused. If you notice any other issues with Cobalt Strike, please refer to the online support page, or report them to our support email address. Licensed users can run the update program to get this version, or download version 4.7.1 from scratch from the website. We recommend taking a copy of your existing Cobalt Strike folder before upgrading in case you need to revert to the previous version. To purchase Cobalt Strike or learn more, please contact us.

Cobalt Strike 4.7: The 10th Anniversary Edition

Cobalt Strike 4.7 is now available. This release sees support for SOCKS5, new options to provide flexibility around how BOFs live in memory, updates to how Beacon sleeps and a number of other changes that have been requested by our users. We’ve also given the user interface a bit of a refresh (including support for the much-requested dark mode!).

In recognition of Cobalt Strike’s 10th anniversary, I’d like to say a sincere thanks to all of our users for your continued support over the years – from the very first version created by Raphael Mudge, through the acquisition by Fortra (the new face of HelpSystems) and up to today and beyond. When I first met Raphael, he impressed upon me how unique and special Cobalt Strike’s user community is, and I’m reminded of that every day – from interactions on social media, to submissions to the Community Kit and all of the great (and to be honest, sometimes not so great!) feedback that we receive. Cobalt Strike wouldn’t be where it is today without your support and constant feedback, so thank you. Here’s to the next 10 years!

A Word About Evasion

Before getting into the details of the 4.7 release, I’d like to spend a little time talking about what isn’t in the release. We’ve had a lot of feedback over the last few months that Cobalt Strike is being aggressively fingerprinted, and this is making it difficult to bypass AV and EDR tools. This is making things particularly difficult for teams that don’t have the time to develop their own tools, and you may have been expecting changes in the 4.7 release to push back on this. However, as I mentioned in a blog post about our roadmap back in March, we aren’t going to be adding any out of the box evasive measures to the core Cobalt Strike product (and just to avoid repeating myself, please do read the blog post as it goes into depth about why that’s the case). That isn’t to say that we aren’t doing anything at all–of course we take this seriously, and of course we are focussing our efforts on making improvements. Our main product releases will continue to add flexibility, make changes to the product requested by our users, and keep things stable. Meanwhile, our growing research team will focus on adding new and updating existing evasive tools to the Arsenal Kit outside of the main release cycle, keeping things moving without affecting or making you wait for main product releases. This should work out much better for you, our users, in the long term. Rather than waiting for main product releases, you should start to see regular releases of, and updates to, the evasive tooling that you need in the Arsenal Kit.

Fortra continues to invest in both the development team and the research team. We recently released a Thread Stack Spoofing tool into the Cobalt Strike Arsenal Kit and we have a number of other tools currently in development that we are expecting to release over the next few weeks, filling in the gap between now and the 4.8 release at the end of the year. The reason for this aside is to reassure you all that we are acutely aware of the issues that you’re facing and while the 4.7 release itself doesn’t contain a raft of tools to address issues around evasion, we are taking this seriously and already working on this in the background. Thank you all for your patience as our research team finds their feet and research efforts ramp up.

Now, back to the details of the 4.7 release. That’s what you’re here for, after all.

SOCKS5 Proxy Server Support

This release sees the implementation of a popular feature request – support for SOCKS5. Rather than replacing SOCKS4a altogether, you can choose whether to use SOCKS4a or SOCKS5 when starting SOCKS. A number of changes have been made, including an update to the “Start SOCKS” dialog to enable you to choose between SOCKS4a and SOCKS5 (as well as enter required parameters if SOCKS5 is selected), an update to the Proxy Pivots table to display whether SOCKS4a or SOCKS5 is being used, updates to the commands to start and stop SOCKS in the Beacon console, and an update to the bsocks Aggressor Script command. For details on the new command line options, run help socks within a Beacon console. For general details of the changes, please refer to the documentation.

It is important to note that these changes currently only add support for DNS resolution and UDP. We have not added support for IPv6 or GSSAPI authentication in this release, because the feedback that we got from you is that those features aren’t critical. We will, of course, continue to monitor feedback and will add support for those features if and when you indicate that they are important to add. We also intend to make other changes in future releases, including decoupling SOCKS5 from Beacon which should improve both speed and reliability. That is a bigger change though, and our priority for this release was to add this initial support.

Adding Flexibility Around How BOFs Live In Memory

Beacon Object Files are a key feature for Cobalt Strike. We have added more malleability around how Beacon Object Files live in memory, which should make them more difficult to fingerprint. To facilitate this, two new Malleable C2 profile settings have been added:

bof_allocator controls how you allocate memory for your BOF. Supported settings are VirtualAlloc, MapViewOfFile and HeapAlloc.

bof_reuse_memory determines whether or not memory is released. If this setting is “true”, memory is cleared and then reused for the next BOF execution; if this setting is “false”, memory is released and the appropriate memory free function is used, based on the bof_allocator setting.

Memory permissions (RWX/RX or RW/RX) are set according to the values set in the new Malleable C2 profile settings above. The exception is HeapAlloc, which is always RWX.

Review Of BOF Usage Of VirtualAlloc RWX/RX Memory

Adding flexibility around how BOFs live in memory provided us with the means to address another item that we had on our backlog. We have added support for additional relocation types for BOFs, specifically the .xdata, .pdata, and .bss sections. This change firstly means that an issue has been resolved whereby BOFs sometimes wouldn’t run because the address offset was greater than 4GB. Secondly, the number of available dynamic functions has been increased from 32 to 64.

Sleep Updates

Changes have been made to the Sleep Mask Kit and around sleep in general.

The main change is that you are now able to override the method called when Beacon goes to sleep. From 4.4 through 4.6, Beacon would call the sleep mask function that was patched into the .text section. This had some drawbacks as you were limited on how the code could be written in the sleep_mask.c files used for writing BOFs, and there was also an issue related to size constraints. In this release, Beacon has been reworked to add support for all of the things that you can do in a BOF when it comes to sleeping. Not only do you now have the ability to use your own sleep function, but you are also now able to call dynamic functions (LIBRARY$function) and Beacon API functions (when the Beacon code isn’t masked).

There are two other benefits of this change worth highlighting:

Executable code is now no longer located within Beacon’s .text section and has instead been moved to a different memory region.

The sleep mask BOF size limit has been increased from 769 bytes to 8192 bytes.

Related to this, while the Arsenal Kit still supports the older versions of the sleep mask, this release adds support for implementing the sleep mask as a true BOF. You will need to pull the updated Arsenal Kit to be able to use this feature.

Steal Token Update

The steal_token function has been updated to enable it to steal tokens from processes that it previously couldn’t get to. A user reported that as Beacon used OpenProcessToken to request TOKEN_ALL_ACCESS, in some cases this returned an access denied error. Manually tweaking the permissions in Beacon when it called OpenProcessToken was enough for them to get steal_token to succeed.

We took this feedback on board and you are now able to customize the access mask when steal_token is invoked. A number of changes have been made to facilitate this:

A steal_token_access_mask option has been added to the Malleable C2 profile. This is optional and allows you to set a default access mask used for steal_token and bsteal_token.

Support has been added to allow you to set the access mask (and override the default value) when invoking steal_token and bsteal_token from the command line.

The Steal Token dialog has been updated to allow you to set the access mask (and override the default value). This applies when both a single process and multiple processes are selected before opening the dialog.

Note that if no default value for the access mask is provided (either via the new Malleable C2 profile option, dialog option, or command line options), steal_token will default to the current access mask of TOKEN_ALL_ACCESS.

Module Stomping Update

Based on user feedback, a small change has been made to module stomping. In some cases, although the module was loaded, the actual stomping failed because Beacon remained in virtual memory. This was because unless the exported function had an ordinal value between 1 and 15, Beacon would default to using VirtualAlloc. This limitation has now been addressed by adding optional syntax to the setting to specify the starting ordinal when searching for exported functions.

Clipboard Stealer

You are now able to steal the contents of the Windows clipboard on the target system via a command (clipboard) or an Aggressor Script command (bclipboard), with a caveat: this feature is only useful when the clipboard contains text (for example, credential material). This is a quick change and the intended use case is for those occasions where a target is observed using a password manager (or similar) to grab a password; you would then be able to copy that password (or other relevant material) from the clipboard for use. If there is text on the clipboard, this will be returned and displayed; if not, an error will be displayed informing you that the clipboard contents are not text based. The exception to this is that if the clipboard contents exceed 204800 bytes, an error will be returned instead.

While this is a quick change with limited scope, we will likely enhance this feature in a future release. There are a number of interesting directions that we could go with this and we’d be interested to hear your feedback.

User Interface/Default Aggressor Script Updates

This release brings a refresh to the look and feel of the client UI (although not the complete overhaul that we’re still considering for a future release), along with a number of changes to the default aggressor script that introduce some usability improvements. You may recognise some of the default Aggressor Script changes, as some of those changes were inspired by mgeeky’s “cobalt-arsenal” Aggressor Scripts (which incidentally can be found within the Cobalt Strike Community Kit). While we have also added our own changes and implemented some things in our own way, we would just like to say a huge thank you to mgeeky for permission to bring some of that functionality into Cobalt Strike itself.

We have made a number of changes in this area. Here are a few highlights:

Dark Mode

The most eye-catching change (and one of the most requested) is the addition of dark mode. This can be toggled via a new menu option.

Sleep Time Tracking

Sleep time tracking works by recording the sleep time for each Beacon and displaying it in a new column in the Beacon table view. This information is persisted between teamserver restarts so it should always be available.

Beacon Health Tracking

Linked to the sleep time tracking is a new Beacon Health tracking feature. This feature uses the sleep time and cross references this with the last check-in time to determine whether the Beacon is active, disconnected, or dead. This information is displayed in the Beacon table view and reflected in the Beacon’s icon. This feature can be enabled or disabled via a new option on the preferences dialog.

Icon Updates

Speaking of icons, we have updated the icons that are used on the pivot graph and in the Beacon table view to represent Beacon status and OS type.

Toolbar And Menu Updates

We have also updated the icons on the toolbar in the client and have removed toolbar buttons for some of the less popular functions. Related to this change, the main menu has been reorganised, flattening the menus and moving some of the options around into more useful and intuitive locations.

Bulk Payload Generation

Related to the menu reorganisation, we have added a new menu item to allow you to generate x86 and x64 stageless payloads for all available payload variants at once. A new Sleep function, all_payloads, has also been added to allow you to do this from the command line.

Stageless Payload Generator With Exit Option

We have added a stageless payload generator dialog that allows you to set either “thread” or “process” as the exit option.

Windows Error Code Resolution

Windows error codes are now automatically parsed and resolved, so you no longer need to memorise every single Windows error code or go and look it up when Beacon just returns the error code. The relevant error message is now displayed alongside the error code. We have also added a new Beacon console command (windows_error_code) and an Aggressor Script function (windows_error_code) that can be used to convert an error code to a message on demand.

Process List Display Updates

The output of the ps command has been enhanced to resolve parent/child relationships and display the process list as a treeview instead of the old, flat version. The Beacon process is displayed in yellow. We have plans to enhance this with more colour coding in a future release, too.

There are a number of other UI changes that have been implemented, including displaying more information in the Beacon and event status bars, displaying timestamps, making it easier to interact with Beacons, a new “import credentials” option, and much more.

To see a full list of what’s new in Cobalt Strike 4.7, please check out the release notes. Licensed users can run the update program to get the latest version, or download version 4.7 from scratch from the website. To purchase Cobalt Strike or learn more, please contact us.

Celebrating 10 Years of Cobalt Strike

Can you believe it? Cobalt Strike is 10 years old! Think back to the summer of 2012. The Olympics were taking place in London. CERN announced the discovery of a new particle. The Mars Rover, Curiosity, successfully landed on the red planet. And despite the numerous eschatological claims of the world ending by December, Raphael Mudge diligently worked to create and debut a solution unique to the cybersecurity market.

Raphael designed Cobalt Strike as a big brother to Armitage, his original project that served as a graphical cyber-attack management tool for Metasploit. Cobalt Strike quickly took off as an advanced adversary emulation tool ideal for post-exploitation exercises by Red Teams.

Flash forward to 2022 and not only is the world still turning, Cobalt Strike continues to mature, having become a favorite tool of top cybersecurity experts. The Cobalt Strike team has also grown accordingly, with more members than ever working on research activities to further add features, enhance security, and fulfill customer requests. With version 4.7 nearly ready, we’re eager to show you what we’ve been working on.

However, we’d be remiss not to take a moment to pause and thank the Cobalt Strike user community for all you’ve done to contribute over the years to help this solution evolve. But how could we best show our appreciation? A glitter unicorn card talking about “celebrating the journey”? A flash mob dance to Hall & Oates’ “You Make My Dreams Come True”? Hire a plane to write “With users like you, we’ve Cobalt Struck gold!” It turns out that that it is very difficult to express gratitude in a non-cheesy way, but we’ve tried our best with the following video:


Arsenal Kit Update: Thread Stack Spoofing

As I mentioned in the recent Roadmap Update blog post, we are in the process of expanding the Cobalt Strike development team and ramping up our research activities so that we can release more tools outside of the core product release schedule. We’re also acutely aware of Cobalt Strike’s limitations when it comes to EDR and AV evasion, and our research efforts at the moment aim to make improvements in that area. In that vein, a new tool is now available in the Cobalt Strike Arsenal that adds thread stack spoofing capabilities.

AV and EDR detection mechanisms have been improving over the years and one specific technique that is used is thread stack inspection. This technique determines the legitimacy of a process that is calling a function or an API.

Thread stack spoofing is not a new technique and there are several good examples of this technique that are already available. The research team would like to highlight mgeeky’s thread stack spoofer, which works well and was the catalyst for the team to look into their own implementation. To avoid confusion here, it’s worth pointing out that the research team used new concepts and techniques resulting from their own research activities to develop their own unique take on this technique, rather than using mgeeky’s implementation.

Full details on our implementation are included in the readme that accompanies the tool in the Cobalt Strike Arsenal. This information and the tool itself are only available to licensed customers. The Cobalt Strike Arsenal is accessed via a link in Cobalt Strike, or directly here.

There’s Another New Deputy in Town

Things are moving in the Cobalt Strike world…
And they are moving… FAST.

When I started my position with the Cobalt Strike team, I got to meet the team in person in the head office in Eden Prairie, Minnesota.
I can’t say much yet, but the team has been cooking up some cool stuff coming into the next several releases.
I’m pleased to join a team of wonderful individuals that all excel in their own areas of expertise.

So what am I going to be doing here in the mix, you ask?

I’ll be drawing from my own expertise as a Cobalt Strike user, and that of our wonderful community to research, support, and build new features into the product. Some of these might make it into Beacon (or teamserver) itself, others will be released as BOFs or as kits.

There has been a lot of back and forth amongst the team already and I’m very excited to see the features that are already on the roadmap. Unfortunately, I have been sworn into an oath of silence, but fear not, you, the user, will get to see some cool features being added soon! (Something about 5 pair of socks?)

I maintain a relatively active social media presence and am lurking around in a majority of Discord and Slack channels. I have also been known to attend conferences every now and then, be it as an attendee or a speaker. So if you catch me online or IRL, feel free to have a chat!

Hopefully, this post has made you curious about what comes next, and I can’t wait to, together with the team, share new features with the community and our customers.

But for now, sit back, relax, and take part of the wonderful journey as we, as a team, lift Cobalt Strike into a new generation.

Cobalt Strike 4.6: The Line In The Sand

Cobalt Strike 4.6 is now available. As I mentioned in the recent Roadmap Update blog post, this isn’t a regular release, as it mostly focuses on security updates. There are also a couple of useful updates for users. A major release is planned for this summer, so this release lays the groundwork for the changes that are coming at that point.

Execute-assembly 1MB Limit Increase

A number of users have been asking for this for quite some time, and the change that we made affect not only execute-assembly, but other tasks (eg. dllinject) as well. We have added three new settings to the Malleable C2 profile (tasks_max_size, tasks_proxy_max_size and tasks_dns_proxy_max_size) that can be used to control maximum size limits. Note that these settings need to be set prior to team server startup. If the size is increased at a later time, old artifacts will still use the previous size settings and tasks that are too large will be rejected.

Comprehensive information on the new settings can be found in the Cobalt Strike documentation.

Arsenal Kit

We have combined the individual kits in the Cobalt Strike arsenal into a single kit, appropriately known as the Arsenal Kit. Building this kit yields a single aggressor script that can be loaded instead of loading all of the separate kits individually. The kit is controlled by the arsenal_kit_config file which is used to configure the kits that are built with the build_arsenal_kit.sh script.

The Arsenal Kit can be downloaded by licensed users from the Cobalt Strike arsenal.

Security Updates

This is the main focus of the Cobalt Strike 4.6 release. It is a necessary step as it lays the groundwork for our future development efforts.

Product security is nothing new. There has always been anti-proliferation processing in the software and, as discussed in this blog post (published by Raphael Mudge in 2019), we do our due diligence when it comes to screening potential customers and working with law enforcement. I think it is worth pointing out that the processes described by Raphael in that blog post are still processes that are followed at Fortra (the new face of HelpSystems) today–specifically:

From time to time, we receive informal requests for technical assistance or records from private entities. Our policy is not to perform analysis for, provide deconfliction services to, or disclose our records to private entities upon informal request.

If we have information relevant to a law enforcement investigation, we comply with valid legal process.

This stance is to avoid frivolous requests and to protect our customer’s information.

We also investigate tips. We can’t usually share information back, but we look into things brought to our attention.

We are also proactive when it comes to searching for Cobalt Strike teamservers out in the wild. This work is carried out by our own, dedicated threat intelligence team and it helps us to improve our product controls. That team also issues takedown requests if cracked copies are found.

Over the past few releases, we have made enhancements to Cobalt Strike’s product security. We intentionally haven’t described product security changes in much detail, but we do take it very seriously. Product security has been and will continue to be a key feature on our roadmap.

The 4.5 release in December 2021 saw changes to product licensing and improvements on the watermarking in the software. Those changes made it significantly more difficult to tamper with the authorization ID and locate the ever-changing hidden watermarks, therefore making it easier for us to trace stolen copies of Cobalt Strike back to specific customers. We have yet to see any credible reports of cracked copies of the 4.5 release being used because of these changes. We have seen what are claimed to be cracked copies of 4.5 being sold, but those have all turned out to be older versions badged as 4.5. By design, if the watermarks in the 4.5 release are tampered with, it will simply no longer work.

The 4.6 release brings a change to how the teamserver is deployed. Rather than a Java .jar archive, the teamserver has been built as a native binary. The client is still shipped as a .jar archive but we also plan to change that at some point as well. You shouldn’t notice anything different about the update process itself, but it is important to note that “cobaltstrike.jar” is now just a container for the team server (“TeamServerImage”) and client (“cobaltstrike-client.jar”), both of which will automatically be extracted during the update process. One thing to bear in mind though is that due to the changes in how Cobalt Strike 4.6 is installed and how it runs, coupled with changes to the download infrastructure to facilitate those changes, any scripts that you might have to automate the update process will likely no longer work and will need to be changed.

What does this mean? For you, moving forward, there is no real change. You can still download, update and use Cobalt Strike in the same way–however, please be aware that in this instance, you will need to download 4.6 directly from the website as the version 4.5 updater is incompatible with this release and will not recognize that an update is available. For us, building the software in this way is another step forward in terms of product security.

This is a line in the sand for us. We needed to make these necessary security enhancements so that we can forge ahead with our new development strategy and deliver more of what matters to our users. Normal service will be resumed with the 4.7 release this summer. Cobalt Strike will be 10 years old then so we’re hoping to do that release justice to mark the occasion properly.

To see a full list of what’s new in Cobalt Strike 4.6, please check out the release notes. Licensed users can download version 4.6 from the website. To purchase Cobalt Strike or learn more, please contact us.