Perfect Loader Implementations

Oct 9 2023

By: Evan McBroom

Thank you to SpecterOps for supporting this research and to Lee and Sarah for proofreading and editing! Crossposted on GitHub.

TLDR: You may use fuse-loader or perfect-loader as examples for extending an OS’s native loader to support in-memory libraries.

Some software applications require the ability to load dynamic libraries from the memory of the application’s own process. The majority of desktop OSes do not support this use case, so a number of developers have reimplemented the process of loading a library to overcome this limitation.

The quality of these reimplementations may be judged by comparing the feature set of these custom loaders against what the OS’s native loader supports. As such, the native OS loader may be considered a “perfect loader,” but it should not be considered the only perfect loader.

An OS’s loader can be modified or used with other native OS facilities to support in-memory libraries. Extending a native loader in such a manner will result in a new loader which supports both in-memory libraries and the entirety of the native loader’s original feature set (i.e., a new perfect loader). These approaches are explored in the following sections.

Native Loader Modifications

Matt Miller and Jarkko Turkulainen authored the seminal work on modifying native loaders with their publication of “Remote Library Injection” in April 2004. In the section titled “In-Memory,” they described placing hooks on the system routines that an OS’s loader relies on (e.g., mmap and NtMapViewOfSection). Those hooks allowed them to use a native loader as intended while modifying the behavior of its underlying routines so that a library’s data is supplied from memory instead of the filesystem.
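The idea can be modeled in a few lines. The following is only a conceptual sketch, not Matt and Jarkko’s code and not real API hooking: native_load, SYSTEM_ROUTINES, install_memory_hook, the “LIB0” magic, and the mem:// path are all hypothetical stand-ins. It shows a loader that is left untouched while one routine beneath it is redirected to serve bytes from memory.

```python
import io

# Hypothetical table of "system routines" that the toy loader calls through.
# A real loader reaches routines such as mmap or NtMapViewOfSection instead.
SYSTEM_ROUTINES = {"open": lambda path: open(path, "rb")}


def native_load(path):
    """Toy stand-in for a native loader: validate a magic value, return the payload."""
    with SYSTEM_ROUTINES["open"](path) as handle:
        data = handle.read()
    if data[:4] != b"LIB0":  # made-up file format for this illustration
        raise ValueError("not a library")
    return data[4:]


def install_memory_hook(memory_libraries):
    """Redirect the 'open' routine so selected paths are served from memory.

    Paths that are not in memory_libraries fall through to the original routine,
    so the loader keeps its normal behavior for everything else.
    """
    original_open = SYSTEM_ROUTINES["open"]

    def hooked_open(path):
        if path in memory_libraries:
            return io.BytesIO(memory_libraries[path])
        return original_open(path)

    SYSTEM_ROUTINES["open"] = hooked_open
    return original_open  # returned so a caller could restore it later
```

After calling install_memory_hook({"mem://demo.so": b"LIB0payload"}), the unmodified native_load("mem://demo.so") returns the payload without touching the filesystem. The loader’s own logic never changes, which is what lets this approach inherit the native loader’s full feature set.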

Although this technique was excellent, in 2011, Stephen Fewer’s ReflectiveDLLInjection project (which reimplemented LoadLibrary) overshadowed it. What Stephen developed was useful, but LoadLibrary reimplementations are incomplete by nature and their feature gaps will only grow with time.

Matt and Jarkko’s approach for modifying the native Windows loader required manually parsing a library’s file format to map its sections into appropriately protected memory regions. Although this was required at the time, overwriting an open file in an uncommitted NTFS transaction and using it to create a section object can bypass this step. The native loader can then be redirected to use the section object with the updated file data instead of a section object with the original file data.

The original approach of using a section object created from an updated file in an uncommitted NTFS transaction was documented by Tal Liberman and Eugene Kogan in their work titled “Process Doppelgänging.” While their work only described using the section object to create a new process or thread, you can use it to extend LoadLibrary as described above. To my knowledge, this is a novel approach to using transactions and I personally refer to it as Module Doppelgänging to acknowledge Tal and Eugene’s prior work.

Combining Native Facilities

A native loader may also be extended by combining it with other native facilities. Such an approach is arguably more stable because it does not require hooking the native loader’s internal implementation, which will change over time.

The most straightforward example of this is the use of memfd_create in Linux 3.17 and newer to create a memory-backed file descriptor whose path under /proc (e.g., /proc/self/fd/<fd>) may be provided to dlopen. Another simple approach used by developers supporting older versions of Linux and other POSIX platforms is to place libraries in tmpfs mounts (e.g., /dev/shm). While lesser known, POSIX developers have the additional option of hosting their libraries in a Filesystem in Userspace (FUSE) mount to use with dlopen as shown in fuse-loader.
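A minimal sketch of the memfd_create approach follows, using Python’s os.memfd_create and ctypes (which calls dlopen underneath). It assumes Linux with Python 3.8 or newer. The helper names are hypothetical, and the demo assumes a glibc system where libm.so.6 can be loaded; it copies that library’s bytes into memory only to have something to load.

```python
import ctypes
import os


def load_library_from_memory(image, name="in-memory-lib"):
    """Load a shared library image through memfd_create and dlopen."""
    fd = os.memfd_create(name, os.MFD_CLOEXEC)
    try:
        view = memoryview(image)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # dlopen needs a path, so refer to the anonymous file through /proc.
        return ctypes.CDLL(f"/proc/self/fd/{fd}")
    finally:
        # The loader's mappings remain valid after the descriptor is closed.
        os.close(fd)


def mapped_library_path(prefixes):
    """Demo helper: find the on-disk path of a library already mapped here."""
    with open("/proc/self/maps") as maps:
        for line in maps:
            path = line.split(maxsplit=5)[-1].strip()
            if path.startswith("/") and os.path.basename(path).startswith(prefixes):
                return path
    raise FileNotFoundError(prefixes)


def demo_cos_from_memory():
    """Read libm's bytes into memory, load that copy, and call cos(0.0)."""
    ctypes.CDLL("libm.so.6")  # assumption: glibc; ensures libm is mapped
    with open(mapped_library_path(("libm.", "libm-")), "rb") as source:
        image = source.read()
    lib = load_library_from_memory(image)
    lib.cos.restype = ctypes.c_double
    lib.cos.argtypes = [ctypes.c_double]
    return lib.cos(0.0)
```

The tmpfs variant differs only in where the bytes go: write them to a file under /dev/shm and pass that path to the loader instead of the /proc path. In both cases the native loader does all of the parsing, relocation, and dependency resolution itself.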

Windows provides fewer approaches for combining a native loader with other native facilities to achieve in-memory loading, but there are solutions. The oldest available approach is to have your process host a WebDAV server, use LoadLibrary to load a path that resolves to your server, and have the server respond with the bytes of an in-memory library when that path is requested. Jonas Lyk created this approach and implemented it as a proof of concept (POC) for creating a new process from an in-memory executable, but WebDAV servers may also be used to load a library. Alexander Sotirov showed this use case in 2006 with his work titled “Tiny PE”, although it did not use a WebDAV server hosted by the application’s own process.

Newer versions of Windows with Windows Subsystem for Linux (WSL) come with a Plan9 multiple UNC provider (MUP) which allows users to access Linux files from their host using the wsl$ UNC prefix. This allows developers to use some of the POSIX approaches described above on Windows.

Some readers who learn this may be tempted to try loading an in-memory library by writing it to a named pipe and passing its path to LoadLibrary. Unfortunately, the underlying driver for SMB does not support creating section objects from a pipe and LoadLibrary will encounter the error STATUS_INVALID_FILE_FOR_SECTION when it internally calls NtCreateSection.

This summarizes the Windows approaches that I am aware of. Although few were listed, I am sure others will identify approaches I missed and newer approaches will become possible as Windows adds support for more technologies.

Conclusion

Although developers more commonly reimplement the process of loading a library to overcome the limitations of an OS’s native loader with regard to loading in-memory library data, such approaches are inherently incomplete. Further, reimplementing some native loader features can obligate developers to painful update cycles. One example is providing full exception handling support on Windows without using symbol data. Some developers achieve this by maintaining byte signatures of pertinent unexported NTDLL functions for every version of Windows.

Developers who use a perfect loader approach do not have these issues. Their implementations typically also require less code, incur less maintenance overhead, and support more library loading features by design.

Two companion repositories were made for this blog to assist developers who are new to perfect loader approaches and interested in their use. The first is fuse-loader, which implements the FUSE mount approach for POSIX platforms. The second, perfect-loader, implements various approaches for modifying the native Windows loader. If either sounds interesting to you, I encourage you to check them out and hope you find them useful!

Perfect Loader Implementations was originally published in Posts By SpecterOps Team Members on Medium.
