Cobalt Strike 3.14 finally delivered some of the process injection flexibility I’ve long wanted to see in the product. In this post, I’d like to write about my thoughts on process injection, and share a few details on how Cobalt Strike’s implementation(s) work. Along the way, I will share details about which methods you might want to use in your red team exercises.

Where does Cobalt Strike process inject?

Cobalt Strike does process injection in a few places. Some of its artifacts spawn and migrate to a new process. While these are an important part of the attack chain, they’re under your control via the Artifact Kit, Applet Kit, and Resource Kit. This post focuses on the process injection in Cobalt Strike’s Beacon payload.

The inject and shinject commands inject code into an arbitrary remote process. Some of the tool’s built-in post-exploitation jobs can target specific remote processes too. Cobalt Strike does this because it’s safer to inject a capability into a context that has the data you want vs. migrating a payload and C2 to that context.

Many of Cobalt Strike’s post-exploitation features spawn a temporary process, inject the feature’s DLL into the process, and retrieve the results over a named pipe. This is a special case of process injection. In these cases, we control the temporary process. We know the process has no purpose beyond our offense action. This allows us to do more aggressive things. For example, we can take over the main thread of these temporary processes and not worry about giving it back. This is an important detail to keep in mind when configuring process injection in Cobalt Strike.

The Process Injection Cycle

The process-inject block in a Malleable C2 profile is where you configure process injection in Cobalt Strike:

process-inject {
# set remote memory allocation technique
set allocator "NtMapViewOfSection";

# shape the content and properties of what we will inject
set min_alloc "16384";
set userwx    "false";

transform-x86 {
prepend "\x90";
}

transform-x64 {
prepend "\x90";
}

# specify how we execute code in the remote process
execute {
CreateThread "ntdll!RtlUserThreadStart";
CreateThread;
NtQueueApcThread-s;
CreateRemoteThread;
RtlCreateUserThread;
}
}

This block is organized around the lifecycle of the process injection process. Here are the steps:

1. Open a handle to the remote process
2. Allocate memory in the remote process
3. Copy the injected data to the remote process
4. Ask the remote process to execute our injected code

Allocate and Copy Data to a Remote Process

Step 1 is kind of implicit. If we spawn a temporary process (e.g., for a post-exploitation job); we already have a handle to do things to the remote process. If we want to inject code into an existing remote process (naughty, naughty), Cobalt Strike will use OpenProcess to do this.

Steps 2 and 3:

Cobalt Strike offers two options to allocate memory in a remote process and copy data to it.

The first option is the classic VirtualAllocEx -> WriteProcessMemory pattern. This is a common pattern in offense tools. This option also works across different process architectures. This matters. Process injection is not limited to an x64 process context injecting into an x64 target process. A good implementation needs to account for the different corner cases that come up (e.g., x86 -> x64, x64 -> x86, etc.). This requirement makes VirtualAllocEx a safe choice. It’s also the default Cobalt Strike uses. If you want to explicitly specify this pattern: set the process-inject -> allocator option to VirtualAllocEx.

Cobalt Strike also has the CreateFileMapping -> MapViewOfFile -> NtMapViewOfSection pattern. This option creates a file mapping that is backed by the Windows system paging file. It then maps a view of that mapped file into the current process. Cobalt Strike then copies the injected data to the memory associated with that view. The NtMapViewOfSection call makes the same mapped file (with our local changes) available in the remote target process. This is available if you set process-inject -> allocator to NtMapViewOfSection. The downside to this option is it only works x86 -> x86 and x64 -> x64. For cross-architecture injection, Cobalt Strike will fall back to the VirtualAllocEx pattern. This pattern is useful in situations where a defense solution hones in on VirtualAllocEx -> WriteProcessMemory but does not detect other methods to copy data into a remote process.

Transform your Data

The above description of steps 2 and 3 assumes that you’re copying the injected data over as-is. That’s not necessarily true. Cobalt Strike’s process-inject block has options to transform the injected data. The min_alloc option is the minimum size of the block Beacon will allocate in a remote process. The startrwx and userwx options are a hint to the initial and final permissions of the allocated memory. If you want to avoid RWX pages, set these options to false. The transform-x86 and transform-x64 blocks allow you to pad either side of the injected data. If you prepend data, make sure it’s valid code to execute for that architecture.

The options to transform content in the process-inject block are very basic. They’re basic because these are options that are safe for all injected content. If I assume what I receive is a position-independent blob that is a self-contained program, I know I am OK to prepend and append data to it at will. If I assume that this position-independent blob does not modify itself, I know I can get away without RWX permissions. These things are as far as I’m willing to go with data I know nothing about. For more aggressive changes to injected content itself, use the Malleable C2 stage block to modify Beacon. Use the Malleable C2 post-ex block to modify Cobalt Strike’s actual post-exploitation DLLs.

Don’t dismiss these transforms because they are basic though. A lot of content signatures look for specific bytes at fixed offsets from the beginning of an observable boundary. These checks occur in O(1) time which is favorable to an O(n) [or worse] search. Too many expensive checks and a security technology can run into performance issues.

Binary padding can also affect the thread start address offset of your Cobalt Strike post-exploitation jobs. When Beacon injects a DLL into memory; it starts the thread at the location of that DLL’s exported ReflectiveLoader function. This offset shows up in the thread’s start address characteristic and is a potential indicator to hunt for a specific post-exploitation DLL. Data prepended to an injected DLL affects this offset. (Less visible threads help too; we’ll get to that in a moment…)

Part 3 of In-Memory Evasion has some more discussion on content, memory, and thread characteristics that are used to detect injected DLLs in memory.

Code Execution: So many damned corner cases…

At this point, we assume our injected content is in the remote process. The next step is execute that content. This is where the process-inject -> execute block comes in. Here, you get to specify which options Cobalt Strike will consider when it needs to inject code. Beacon goes through these options, one at a time, and tries the options that are valid to the current context. When one of these options succeeds, Beacon stops this process.

I mentioned it earlier, but I want to emphasize it again: process injection is filled with corner cases. The list of options you specify has to cover these corner cases. If your list of options misses a corner case, you will find that process injection fails for seemingly random reasons. My goal with this blog post is to help clear up some of these seemingly random reasons.

What are those corner cases?

All of the injection techniques implemented in Cobalt Strike work x86 -> x86 and x64 -> x64. Injecting from one architecture into the other is a trivial base case. But, x86 -> x64 and x64 -> x86 are contexts that matter too.

One context factor (favorable, if we treat it different) is whether or not the remote process is a throw-away temporary process. Remember, Beacon’s post-ex jobs spawn a temporary process and because the process is temporary—we can do more aggressive things.

Another favorable context factor is self injection. If we inject into our own process, we can and should treat that differently. We can simply use VirtualAlloc and CreateThread when injecting into ourself. When dealing with a security stack that aggressively swat remote process injection, self-injection is a way to safely use capabilities that can target a remote process.

One last corner case is whether or not the injected data has an argument. I can pass an argument via SetThreadContext with an x64 target (thanks fastcall!). Cobalt Strike’s implementation can’t pass an argument, via SetThreadContext, with an x86 target. Bummer.

We’re not done though. When dealing with remote process injection there are other factors. Some methods are riskier on Windows XP era systems. *gasp*. RtlCreateUserThread falls into this camp. And, other methods don’t work when you have to inject across desktop session boundaries (CreateRemoteThread, I’m looking at you).

Code Execution: The perfect execute block

Some of the execute options are scoped to the special cases described above. When you specify your execute block, put these special cases (self-injection, suspended processes) first. Beacon will ignore these options when they’re not right for the current injection context.

Next, you should follow up with which methods you want Beacon to use in-general. Remember, each method has different context limitations and failure cases. If you care that your process injection succeed, OPSEC be damned, make sure you have backups to the primary methods you specify. This is how Beacon’s process injection cocktail worked before 3.14 gave control to your profiles.

Let’s walk through the different execute options implemented in Beacon and their nuances:

Code Execution: CreateThread

I’ll start with CreateThread. I think CreateThread should come first in an execute block (if it’s there at all). This function will only run when you’re doing self-injection. You can use CreateThread which will spin up a thread pointing to the code you want Beacon to run. Be cautious though. When you self-inject this way, your thread will have a start address that’s not associated with one of the modules (DLLs, the current program itself) loaded into the current process space. This is a tell used to detect injected content. To help with this, you can specify CreateThread “module!somefunction+0x##”. This variant will spawn a suspended thread that points to the specified function. If the specified function is not available via GetProcAddress; this variant will immediately fail. Beacon will use SetThreadContext to update this new thread to run your injected code. This is a way of doing self-injection in a way that gives your thread a more favorable start address.

Code Execution: SetThreadContext

The next place to go is SetThreadContext. This is one of the methods available to take over the primary thread of a temporary process spawned for a post-exploitation job. Beacon’s SetThreadContext option works x86 -> x86, x64 -> x64, and x64 -> x86. If you choose to use SetThreadContext, put it after the CreateThread option(s) in your execute block. When you use SetThreadContext; your thread will have a start address that reflects the original execution entry point of the temporary process.

Code Execution: NtQueueApcThread-s

Another option for suspended processes is NtQueueApcThread-s. This option uses NtQueueApcThread to queue a one-off function that runs when the target thread wakes up next. In this case, the target thread is the primary thread of our temporary process. This methods next step is to call ResumeThread. This function wakes up the primary thread of our suspended process. Because the process is suspended, we don’t have to worry about giving this primary thread back to the process. Supposedly, executing code this way, allows our injected capability to initialize itself in the process before some userland-resident security products initialize themselves. This method of evasion was labeled the early bird injection technique by researchers from Cyberbit. This option is x86 -> x86 and x64 -> x64 only.

The use of SetThreadContext vs. NtQueueApcThread-s are up to you. I don’t think one is clearly better than the other in all contexts.

Code Execution: NtQueueApcThread

The next option to consider is NtQueueApcThread. This is a different implementation from NtQueueApcThread-s. It’s designed to target an existing remote process. This implementation pushes an RWX stub to the remote process. This stub contains both code and context related to the injection. To execute this stub, we add our stub to the APC queue of every thread in the remote process. If one of those threads enters an alertable state, our stub will execute.

What does the stub do?

The stub first checks if it was already run. If it was, it does nothing. This is to prevent our injected code from running multiple times.

The stub then calls CreateThread with our injected code and its argument. We do this to allow the APC to quickly return and let the original thread go on about its business.

There’s a risk that no thread will wake up and execute our stub. Beacon waits about 200ms and checks the stub to determine if the code ran. If it didn’t, we update the stub to mark the injection as having run, and we move on to the next injection technique. That’s the implementation of this technique.

I’ve had several requests for this option, because some security products have less visibility into this event. That said, this implementation has its OPSEC concerns. It does push that RWX stub which itself is a noisy memory indicator. It also calls CreateThread against our code that was pushed into this remote process. The start address of this thread is not backed by a module on disk. It won’t do well with a Get-InjectedThread sweep. If you find this injection method valuable, go ahead and use it. Just be aware that it has its trade-offs. One other note: this method (as I’ve implemented it) is x86 -> x86 and x64 -> x64 only.

Code Execution: CreateRemoteThread

Another option is CreateRemoteThread. This is the standard-issue remote process injection technique. As of Windows Vista, it does fail when injecting code across session boundaries. In Cobalt Strike, vanilla CreateRemoteThread covers x86 -> x86, x64 -> x64, and x64 -> x86 cases. This technique is also very visible. The Sysmon event 8 will fire when this method is used to create a thread in another process. Beacon does implement a CreateRemoteThread variant that accepts a fake start address in the form “module!function+0x##”. Like CreateThread, Beacon will create this thread in a suspended state and use SetThreadContext/ResumeThread to make it run our code. This variant is x86 -> x86 and x64 -> x64 only. This variant will fail if the specified function is not available via GetProcAddress.

Code Execution: RtlCreateUserThread

The last option available to Cobalt Strike’s execute block is RtlCreateUserThread. Be aware! This option is similar to CreateRemoteThread without some of its limitations. It does have its own drawbacks though.

RtlCreateUserThread will inject code across session boundaries. Supposedly it has some trouble in some injection contexts on Windows XP. This may or may not matter to you. This method DOES fire Sysmon event 8 as well. One benefit to RtlCreateUserThread is it covers x86 -> x86, x64 -> x64, x64 -> x86, AND x86 -> x64. This last corner case is important to address.

x86 -> x64 injection happens when you’re in an x86 Beacon context and you spawn an x64 process for a post-exploitation job. The hashdump, mimikatz, execute-assembly, and powerpick modules all default to an x64 context where they can. To pull off the feat of x86 -> x64 injection, this implementation transitions your x86 process to an x64 mode and injects an RWX stub to call RtlCreateUserThread from an x64 context. This implemention comes from Meterpreter and the RWX stub is a loud memory indicator. I’ve long advised: “stay x64 as much as possible”. This type of detail is the reason why. I do recommend RtlCreateUserThread exist in any process-inject -> execute block though. It makes sense to have this as the bottom-most option. Use it when nothing else works.

Life without (Remote) Process Injection

When I think about how to make an offense technique flexible, I also like to give similar consideration to what would I do if this technique were not an option?

Process injection is a way to move a payload/capability to a different process context (e.g., go from desktop session 0 to desktop session 1). It’s possible to move to a different process context without remote process injection. Use the runu command. This Beacon command will execute a program as a child of an arbitrary process you specify. This is a way to get a capability into another desktop session (for example) without remote process injection.

Process injection is also a way to execute capabilities on-target without putting a capability on disk. In Cobalt Strike; many post-exploitation capabilities have the option to target a specific process. To use these without remote process injection; specify your current Beacon process. This is self-injection.

Sometimes, putting something on disk is the best option available. I once had success compiling a keystroke logger as a DLL and dropping it to c:\windows\linkinfo.dll to (eventually) load it into explorer.exe. We used an open share on the same system to periodically grab our keystrokes. This helped my colleagues and I operate in a highly-scrutinized situation where it was difficult to keep a memory-resident payload alive on target.

If you enjoy these types of thought exercises; I recommend watching Agentless Post Exploitation and Fighting the Toolset.

Interested in Trying Cobalt Strike?

REQUEST A QUOTE