
BloodHound MCP, One Year Later: What I Learned About MCPs, Models, and Context

Author: Matthew Nickerson

Read Time: 15 mins

Published: Jun 18, 2026

TL;DR: The first version of BloodHound MCP proved that an LLM could converse with BloodHound. The current version drove home the lesson that MCP design is context design. The most useful changes were smaller but more flexible tools, better error recovery, domain-specific resources, and a prompt that guides the model instead of trying to teach everything up front.

Introduction: The First Version Worked, But Needed Improvement

When I released the first version of the BloodHound MCP, I was mostly thinking like a developer looking at an API surface. To me, more coverage meant more capability.

BloodHound has an extensive API, and my goal was to expose as much of it as possible through the MCP. That made sense for the first version. It proved the concept, and it gave users a way to ask natural-language questions against BloodHound data.

The feedback after release was useful. I got bug reports, pull requests, feature requests, privacy questions, and enough outside use to see the project differently. The first version worked, but it had inherited the shape of the API too directly.

That was my mistake.

An MCP tool is more than a function call. It is a chunk of context that is passed to the model with every query in a conversation.

MCP Tools Are Not Free

The most obvious way to build an MCP server (or so I thought) is to map one API operation to one tool.

That is how the first version grew. If I had an endpoint for basic user information, that became a tool. If I had an endpoint for user admin rights, that became another tool. Group memberships? Another tool. Sessions? Another tool. RDP rights, DCOM rights, PSRemote rights, constrained delegation, controllables, and controllers all got their own separate tools.

For users alone, the tool catalog started to look something like this:

get_user_info
get_user_admin_rights
get_user_group_memberships
get_user_sessions
get_user_rdp_rights
get_user_dcom_rights
get_user_ps_remote_rights
get_user_constrained_delegation
get_user_controllables
get_user_controllers

That pattern repeated across computers, groups, domains, OUs, GPOs, graph operations, ADCS, and Cypher. It was straightforward to implement and easy to reason about as a developer.

It was also very expensive in terms of the model’s context.

Every tool definition includes a name, a description, and a parameter schema. The model sees that information before it decides what to call. Depending on the client and provider, the exact packaging differs, but the practical effect is the same: every tool definition is loaded into context with every request to the model. That is true whether a specific tool is used or not. A tool catalog with dozens of similar entries becomes a small API reference manual sitting beside the actual analyst conversation.
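To put rough numbers on that overhead, the sketch below serializes a hypothetical one-tool-per-endpoint user catalog and an equivalent single composite tool, then estimates the size of each. It assumes a `name`/`description`/`inputSchema` shape like an MCP tool listing and uses the common rule of thumb of about four characters per token. Real tokenizers and client packaging will give different absolute numbers, and the descriptions are placeholders rather than the project’s real schemas.

```python
import json


def tool_definition(name: str, description: str, properties: dict) -> dict:
    """Build a tool entry shaped like an MCP tool listing (name, description, inputSchema)."""
    return {
        "name": name,
        "description": description,
        # Every tool in this sketch takes a user_id, so it is always required.
        "inputSchema": {"type": "object", "properties": properties, "required": ["user_id"]},
    }


def estimated_tokens(tools: list) -> int:
    """Rough size estimate at ~4 characters per token. A heuristic, not a tokenizer."""
    return len(json.dumps(tools)) // 4


# Hypothetical per-endpoint user tools, named after the list above.
OPERATIONS = [
    "info", "admin_rights", "group_memberships", "sessions", "rdp_rights",
    "dcom_rights", "ps_remote_rights", "constrained_delegation",
    "controllables", "controllers",
]
PAGING = {"limit": {"type": "integer"}, "skip": {"type": "integer"}}

per_endpoint = [
    tool_definition(
        f"get_user_{op}",
        f"Retrieve {op.replace('_', ' ')} for a user from BloodHound.",
        {"user_id": {"type": "string"}, **PAGING},
    )
    for op in OPERATIONS
]

# The same capabilities behind one composite tool with an info_type switch.
composite = [
    tool_definition(
        "user_info",
        "Query user data from BloodHound. info_type options: " + ", ".join(OPERATIONS),
        {"user_id": {"type": "string"}, "info_type": {"type": "string"}, **PAGING},
    )
]

print("per-endpoint tools:", len(per_endpoint), "~tokens:", estimated_tokens(per_endpoint))
print("composite tools:   ", len(composite), "~tokens:", estimated_tokens(composite))
```

Even with short placeholder descriptions, the per-endpoint catalog repeats the same `user_id`, `limit`, and `skip` schema ten times, and that repetition is what the composite tool removes.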

That cost shows up in two places. The first is raw context overhead. Tokens spent describing the interface are tokens the model cannot spend on graph results, object names, paths, edge explanations and analyst reasoning. The second is tool selection quality. When the model sees a long list of similar tools, it has to choose between them. That increases the odds of over-calling, hedging, or picking a tool adjacent to the correct one. Those are recoverable failures, but they waste time and tokens.

The goal was for the model to behave like an analyst with BloodHound available, not a tool-selection engine.

Composite Tools Fit the Model’s Decision Better

The main architectural change was moving from one tool per endpoint to composite tools.

Instead of exposing ten separate user tools, the MCP now exposes one user_info tool with an info_type parameter. The model still has access to the same underlying capabilities, but the top-level tool catalog is much smaller.

A simplified version looks like this:

@mcp.tool()
def user_info(
    user_id: str,
    info_type: str = "info",
    limit: int = 100,
    skip: int = 0,
) -> str:
    """Query user data from BloodHound.

    info_type options:
        info - General user properties and attributes
        sessions - Machines this user has active sessions on
        memberships - Groups this user belongs to
        admin_rights - Objects this user has admin rights on
        rdp_rights - Machines this user can RDP to
        dcom_rights - Machines this user can execute DCOM on
        ps_remote_rights - Machines this user can PSRemote to
        constrained_delegation - Services this user can delegate to
        controllables - Objects this user can control
        controllers - Principals that control this user
    """

The model’s decision changes from, “Which of these 10 user tools should I call?” to “What kind of user information do I need?”

That maps much more cleanly to the analysis task.

The current version of the MCP exposes thirteen top-level tools:

Tool	Purpose
domain_info	Domain-level enumeration, trusts, DCSyncers, foreign principals, search
user_info	User properties, memberships, sessions, rights, controllers, controllables
group_info	Group members, memberships, admin rights, controllers, controllables
computer_info	Computer properties, sessions, local admins, remote access rights, delegation
ou_info	OU properties and contained users, groups, computers, and GPOs
gpo_info	GPO properties and controllers
graph_analysis	Shortest paths, edge composition, graph search
adcs_info	ADCS templates and ESC path information
cypher_query	Custom Cypher execution and saved-query access
data_quality	Data statistics and platform coverage checks
custom_nodes	OpenGraph custom node type configuration
asset_groups	Asset group and selector information
file_upload	Controlled SharpHound/AzureHound collection upload support

The composite tool pattern introduces a tradeoff. With individual tools, every operation is discoverable by name. get_user_rdp_rights is obvious. With composite tools, the model has to know that user_info accepts rdp_rights as an info_type.

I mitigated this by providing better tool descriptions. Each composite tool lists the valid info_type values directly in the docstring. The model reads those descriptions when deciding what to call. In practice, that has been enough for the model to find the right operation while keeping the top-level catalog manageable. The backup for when the descriptions aren’t enough is error handling.
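One way to keep those docstring listings from drifting away from what the code accepts is to generate them from the same dictionary that defines the options. This is a hypothetical pattern, not necessarily how the project does it; `USER_INFO_TYPES` and `with_options_doc` are illustrative names, and only a few options are shown.

```python
# Hypothetical single source of truth for the options and their descriptions.
USER_INFO_TYPES = {
    "info": "General user properties and attributes",
    "sessions": "Machines this user has active sessions on",
    "memberships": "Groups this user belongs to",
    "rdp_rights": "Machines this user can RDP to",
}


def with_options_doc(summary: str, options: dict[str, str]):
    """Decorator that writes the valid info_type options into a function's docstring."""
    def decorate(func):
        lines = [summary, "", "info_type options:"]
        lines += [f"    {key} - {desc}" for key, desc in options.items()]
        func.__doc__ = "\n".join(lines)
        return func
    return decorate


@with_options_doc("Query user data from BloodHound.", USER_INFO_TYPES)
def user_info(user_id: str, info_type: str = "info", limit: int = 100, skip: int = 0) -> str:
    # Stub body: real dispatch would happen here. The point is that the
    # docstring and the accepted options come from the same dictionary.
    if info_type not in USER_INFO_TYPES:
        return f"Unknown info_type '{info_type}'"
    return f"{info_type} for {user_id}"


print(user_info.__doc__)
```

If the MCP framework reads the docstring when a tool is registered, this decorator has to run first, which means placing it below `@mcp.tool()` so it is applied before registration.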

Wrong Calls Should Help The Model Learn

Composite tools are only useful if wrong calls fail cleanly. To get there, every composite tool dispatches through a shared helper.

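import json
import logging

logger = logging.getLogger(__name__)

# BloodhoundConnectionError and BloodhoundAPIError are assumed to be defined by the project's API client.

def _handle_tool_call(info_type: str, handlers: dict, **context) -> str:
    """Dispatch a composite tool call to the appropriate handler.

    Every outcome is returned as a JSON string so the model always gets
    something it can read and act on.
    """
    handler = handlers.get(info_type)
    if handler is None:
        # Unknown option: list the valid ones so the model can retry.
        valid = ", ".join(sorted(handlers))
        return json.dumps(
            {"error": f"Unknown info_type '{info_type}'. Valid options: {valid}"}
        )
    try:
        result = handler()
        return json.dumps({"info_type": info_type, "data": result, **context})
    except BloodhoundConnectionError as e:
        return json.dumps({"error": f"Connection error: {e}"})
    except BloodhoundAPIError as e:
        return json.dumps({"error": f"API error (HTTP {e.status_code}): {e}"})
    except Exception as e:
        # Log the full traceback for the operator; return a short error to the model.
        logger.exception("Error in %s", info_type)
        return json.dumps({"error": f"Unexpected error in {info_type}: {e}"})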

In the original tool layout, every function carried its own error handling. The same connection errors, API errors, and generic exception handling showed up again and again. That kind of repetition is manageable when a project is small, but as a project grows it becomes a maintenance problem quickly.

With the dispatcher, each composite tool passes a dictionary of handlers:

handlers = {
    "info": lambda: bloodhound_api.users.get_info(user_id),
    "sessions": lambda: bloodhound_api.users.get_sessions(user_id, limit=limit, skip=skip),
    "memberships": lambda: bloodhound_api.users.get_memberships(user_id, limit=limit, skip=skip),
    "admin_rights": lambda: bloodhound_api.users.get_admin_rights(user_id, limit=limit, skip=skip),
}

Adding a capability becomes adding a handler entry. Error formatting stays consistent. Pagination forwarding is easier to test. And when the model guesses an invalid info_type, it gets a useful response back:

{
  "error": "Unknown info_type 'groups'. Valid options: admin_rights, constrained_delegation, controllables, controllers, dcom_rights, info, memberships, ps_remote_rights, rdp_rights, sessions, sql_admin_rights"
}

While not perfect, it helps the model recover. The model sees the valid options and can retry. I would rather have a small recoverable error than a huge tool catalog designed to avoid that error entirely.
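The dispatcher is easy to exercise without a BloodHound instance. The sketch below is a trimmed, self-contained version of the helper above wired to a stub API object; `StubUsersAPI`, its return values, the sample object ID, and the collapsed exception handling are assumptions for illustration. It checks the two behaviors described in this section: pagination arguments are forwarded, and an unknown `info_type` comes back as a readable error that can drive a retry.

```python
import json


class StubUsersAPI:
    """Hypothetical stand-in for the BloodHound users API client."""

    def get_info(self, user_id):
        return {"objectid": user_id, "enabled": True}

    def get_sessions(self, user_id, limit=100, skip=0):
        # Echo the paging arguments so a test can confirm they were forwarded.
        return {"user": user_id, "limit": limit, "skip": skip, "sessions": []}


def handle_tool_call(info_type, handlers, **context):
    """Trimmed dispatcher: unknown options list the valid ones; errors become JSON."""
    handler = handlers.get(info_type)
    if handler is None:
        valid = ", ".join(sorted(handlers))
        return json.dumps({"error": f"Unknown info_type '{info_type}'. Valid options: {valid}"})
    try:
        return json.dumps({"info_type": info_type, "data": handler(), **context})
    except Exception as e:  # the real helper separates connection and API errors
        return json.dumps({"error": f"Unexpected error in {info_type}: {e}"})


def user_info(api, user_id, info_type="info", limit=100, skip=0):
    handlers = {
        "info": lambda: api.get_info(user_id),
        "sessions": lambda: api.get_sessions(user_id, limit=limit, skip=skip),
    }
    return handle_tool_call(info_type, handlers, user_id=user_id)


api = StubUsersAPI()

# A wrong guess returns the valid options instead of failing opaquely.
first = json.loads(user_info(api, "S-1-5-21-1000", info_type="groups"))
print(first["error"])

# A caller (or model) can read the options from the error and retry.
valid = first["error"].split("Valid options: ")[1].split(", ")
retry = json.loads(user_info(api, "S-1-5-21-1000", info_type=valid[1], limit=25, skip=50))
print(retry["data"])
```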

Cypher Needed References, Not A Bigger Prompt

The most frustrating part of the project has consistently been Cypher. That is not because BloodHound’s API makes Cypher hard to execute; the MCP can run queries just fine. The problem is that LLMs write Cypher that is technically valid, and BloodHound doesn’t always support it.

That distinction matters.

BloodHound Cypher has its own requirements beyond generic graph querying. The model has to know the labels, relationship names, property names, casing, directionality, and BloodHound-specific path semantics.

This is a problem because models learn Cypher from the broad internet. BloodHound needs Cypher queries that it will actually accept and interpret the way the analyst expects.

A few examples:

BloodHound property names are lowercase, such as hasspn, enabled, and admincount
Many object names are stored uppercase with the domain suffix
DCSync rights target Domain nodes, not Group nodes
GPO abuse paths require the full chain through the GPO, GPLink, OU or container, and affected targets
List properties need COALESCE patterns to avoid null errors
Aggregation can work through the API while still being a poor fit for queries meant to render in the BloodHound GUI
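Some of these conventions are mechanical enough to check before a query ever reaches BloodHound. The sketch below is hypothetical: `KNOWN_PROPERTIES` is a tiny illustrative subset (just the three properties named above), and the checks cover only two of the pitfalls in the list, uppercase `NAME@DOMAIN` names and lowercase property names.

```python
import re

# Illustrative subset of BloodHound property names, which are lowercase.
KNOWN_PROPERTIES = {"hasspn", "enabled", "admincount"}


def bloodhound_name(account: str, domain: str) -> str:
    """Format a principal name as uppercase NAME@DOMAIN."""
    return f"{account}@{domain}".upper()


def miscased_properties(query: str) -> list[str]:
    """Return references like `u.hasSPN` whose lowercase form is a known property."""
    found = []
    for prop in re.findall(r"\b\w+\.(\w+)", query):
        if prop != prop.lower() and prop.lower() in KNOWN_PROPERTIES:
            found.append(prop)
    return found


query = "MATCH (u:User) WHERE u.hasSPN = true AND u.Enabled = true RETURN u"
print(miscased_properties(query))
print(bloodhound_name("svc_sql", "corp.local"))
```

A check like this will not make a query correct, but it catches the class of mistake where a query looks plausible and quietly matches nothing.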

A model can get this partially correct, which is misleading: the query looks right, but the result can be wrong or the query can fail outright.

The solution was better prompt architecture.

In the first version, I treated the system prompt as the primary place to teach the model how to behave. That worked for a while, but the long prompt quickly turned into context pollution. Every caveat feels important, but eventually the prompt is trying to wear too many hats: workflow guide, schema reference, Cypher tutorial, and behavior police.

The new approach uses MCP resources for detailed reference material. The system prompt tells the model when to load those resources and the resources contain the domain knowledge.

The important resources are:

Resource	Purpose
`bloodhound://cypher/reference`	Cypher syntax, BloodHound schema, properties, name formats, and query patterns
`bloodhound://cypher/offensive-queries`	Known-good templates for attack scenarios like DCSync, Kerberoasting, GPO abuse, delegation, ADCS, and more
`bloodhound://guides/ad`	Active Directory node and relationship quick reference
`bloodhound://guides/ad-methodology`	Deeper AD attack-path workflow guidance
`bloodhound://guides/azure`	Azure / Entra ID quick reference
`bloodhound://guides/azure-methodology`	Azure attack-chain methodology
`bloodhound://guides/adcs`	ADCS ESC quick reference
`bloodhound://guides/adcs-methodology`	Deeper ADCS analysis guidance
`bloodhound://opengraph/guide`	OpenGraph schema design and best practices
`bloodhound://opengraph/examples`	Example OpenGraph patterns

The prompt becomes a map, with the resources serving as the turn-by-turn directions.
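In the official MCP Python SDK, a `FastMCP` server registers resources with an `@mcp.resource(uri)` decorator. To run standalone, the sketch below models the same idea with a plain dictionary: reference text sits behind a URI and is loaded only when requested, while the prompt just names the URIs. The resource bodies and prompt wording are placeholders, not the project’s actual content, and `resource`/`read_resource` are illustrative helpers.

```python
import re
from typing import Callable

# URI -> loader. Loaders run only when a resource is requested, so reference
# material costs context only in the turns that actually need it.
RESOURCES: dict[str, Callable[[], str]] = {}


def resource(uri: str):
    """Register a function as the loader for a resource URI."""
    def register(func: Callable[[], str]) -> Callable[[], str]:
        RESOURCES[uri] = func
        return func
    return register


@resource("bloodhound://cypher/reference")
def cypher_reference() -> str:
    # Placeholder for the real schema, property, and query-pattern reference.
    return "Properties are lowercase (hasspn, enabled, admincount). Names are NAME@DOMAIN."


@resource("bloodhound://cypher/offensive-queries")
def offensive_queries() -> str:
    # Placeholder for the library of known-good attack-scenario templates.
    return "kerberoastable: MATCH (u:User) WHERE u.hasspn = true AND u.enabled = true RETURN u"


# The system prompt acts as the map: it names resources instead of inlining them.
SYSTEM_PROMPT = (
    "Before writing custom Cypher, read bloodhound://cypher/reference. "
    "For attack-path questions, start from bloodhound://cypher/offensive-queries."
)


def read_resource(uri: str) -> str:
    """Return a resource body, or an error string listing the known URIs."""
    loader = RESOURCES.get(uri)
    if loader is None:
        return f"Unknown resource '{uri}'. Known: {', '.join(sorted(RESOURCES))}"
    return loader()


# Every URI the prompt mentions should be one the server can actually serve.
referenced = re.findall(r"bloodhound://[\w/-]+", SYSTEM_PROMPT)
missing = [uri for uri in referenced if uri not in RESOURCES]
print(referenced, missing)
print(read_resource("bloodhound://cypher/reference"))
```

The `missing` check is a cheap guard for the map metaphor: a prompt that points at a resource the server does not serve sends the model somewhere it cannot go.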

For example, the prompt can tell the model to load the offensive query library before writing a custom Cypher query for an attack-path scenario. That is much better than expecting the model to write every query from scratch.

The intended shape is the LLM as an attack-path navigator instead of a Cypher author.
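The navigator-instead-of-author idea can be sketched as a template lookup: the model picks a named scenario and supplies parameters, and the server fills in a query written ahead of time. The scenario names and query text below are illustrative assumptions, not copied from the project’s offensive query resource. This sketch does not assume BloodHound’s Cypher endpoint accepts bound parameters, so it validates values against a conservative allowlist and inlines them.

```python
import re
from string import Template

# Illustrative templates; a real library would hold vetted, tested queries.
TEMPLATES = {
    "kerberoastable": Template(
        "MATCH (u:User) WHERE u.hasspn = true AND u.enabled = true RETURN u"
    ),
    # DCSync rights target Domain nodes, not Group nodes.
    "dcsync": Template(
        'MATCH p=(n)-[:DCSync]->(d:Domain {name: "$domain"}) RETURN p'
    ),
}

# Conservative allowlist for inlined values (names are stored uppercase).
SAFE_VALUE = re.compile(r"^[A-Z0-9 _.$@-]+$")


def render(scenario: str, **params: str) -> str:
    """Fill a known-good template; the model chooses the scenario, not the Cypher."""
    template = TEMPLATES.get(scenario)
    if template is None:
        raise KeyError(f"Unknown scenario '{scenario}'. Options: {', '.join(sorted(TEMPLATES))}")
    cleaned = {}
    for key, value in params.items():
        value = value.upper()
        if not SAFE_VALUE.match(value):
            raise ValueError(f"Refusing unsafe value for {key}: {value!r}")
        cleaned[key] = value
    # substitute() raises KeyError if a required placeholder was not supplied.
    return template.substitute(cleaned)


print(render("dcsync", domain="corp.local"))
```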

Extending BloodHound MCP for Agentic Workflows

The last major addition is file upload support.

In the first version, I intentionally kept the MCP mostly read-only. The model could query BloodHound, inspect objects, run Cypher, and explain paths, but it assumed the data was already there. At the time, that felt like the right boundary. Agents were still rough around the edges, and I did not feel comfortable handing a model the ability to push data into BloodHound. When I thought about it, I could only picture scenarios where the model wiped out existing data or uploaded bad data just to achieve a goal.

I could not see how the benefit of giving a model this capability would outweigh the risk of it destroying or modifying data.

That changed as agent harnesses and coding agents improved. More of my testing started to look less like “ask a model a question about existing data” and more like “give an agent a goal and let it work through the steps.” In a lab, I had Codex successfully run a BloodHound collection. That was the moment the old boundary started to feel artificial. The agent had the collection data, but it could not continue toward its goal because it had no way to upload that data into BloodHound for analysis. I had stumbled onto a workflow that I had not originally intended for the MCP.

The original MCP helped with analysis after ingestion. File upload moves the MCP one step earlier in the process, letting the assistant help with setup and ingestion as well as analyst work.

The file_upload tool supports uploading any BloodHound or OpenGraph collection into BloodHound through the file upload API. The workflow looks roughly like this:

File upload support is a step toward more agentic workflows. The right use case is controlled assistance: the model prepares the upload, explains what it is about to do, uploads the data after approval, and then verifies that the upload succeeded. The MCP should make the analyst faster, not remove the analyst from decisions that matter.
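That controlled-assistance flow can be sketched as a gate in front of the upload call. Everything here is hypothetical: `UploadPlan`, `prepare`, `run_upload`, and the `approve`/`upload`/`verify` callables stand in for the MCP’s real tool and BloodHound’s file upload API, which this sketch does not call. What it shows is the ordering: nothing is sent until a human has seen a summary and said yes, and the result is checked afterward.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Callable


@dataclass
class UploadPlan:
    path: str
    size_bytes: int

    def summary(self) -> str:
        return f"Upload {os.path.basename(self.path)} ({self.size_bytes} bytes) to BloodHound"


def prepare(path: str) -> UploadPlan:
    """Check the file before proposing anything."""
    if not path.endswith((".json", ".zip")):
        raise ValueError("Expected a .json or .zip collection file")
    return UploadPlan(path, os.path.getsize(path))


def run_upload(
    path: str,
    approve: Callable[[str], bool],
    upload: Callable[[str], str],
    verify: Callable[[str], bool],
) -> str:
    """Prepare, explain, wait for approval, upload, then verify."""
    plan = prepare(path)
    if not approve(plan.summary()):
        return "Upload declined; nothing was sent."
    job_id = upload(plan.path)
    if not verify(job_id):
        return f"Upload {job_id} did not complete successfully; check BloodHound."
    return f"Upload {job_id} completed and was verified."


# Demo with stand-ins: a temporary placeholder file and fake upload/verify functions.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"data": [], "meta": {}}, f)

print(run_upload(f.name, approve=lambda s: False, upload=lambda p: "job-1", verify=lambda j: True))
print(run_upload(f.name, approve=lambda s: True, upload=lambda p: "job-1", verify=lambda j: True))
os.remove(f.name)
```

Passing `approve` in as a callable keeps the decision with whoever is driving the agent, whether that is a chat confirmation or a harness prompt, rather than inside the tool itself.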

What This Version is Really About

The last year of BloodHound MCP work added features, but the most impactful changes reshaped the interface around the model and made it more token-efficient.

The first version answered the question:

“Can an LLM talk to BloodHound?”

The newer version is trying to answer a better question:

“What does the model need in order to use BloodHound well?”

For this project, the answer was a smaller tool catalog, composite tools that match analyst intent, recoverable errors, resources for detailed domain knowledge, and a prompt that tells the model how to move through the workflow.

That pattern is not unique to BloodHound. Most MCP servers intended to connect models to a deep technical domain will run into some version of the problems I encountered. Exposing everything feels helpful, but it can degrade the model’s performance by carrying too much interface detail and forcing the model to choose between too many similar tools.

Good MCP design gives the model better handles, not more hands. More hands just means more ways to drop things.

What Comes Next?

The practical lesson from the rearchitecture was simple: the MCP should spend less context explaining itself and more context helping the model reason through a BloodHound graph.

Once the MCP became more efficient, I started thinking about how the model should adapt when the graph itself changes, which led me to use the MCP with OpenGraph data.

OpenGraph lets BloodHound model technologies outside of AD and Azure. GitHub, SaaS platforms, device management systems, and custom internal systems can all become graphable in BloodHound. That makes the MCP more useful, but it also makes the model’s job harder. It has to understand a schema it may not currently know, write queries against unfamiliar relationships, and explain risk without hallucinating over the gaps.

That is where the next post picks up: using BloodHound MCP with OpenGraph data and what happened when I started testing it with GitHound.

Current Project Links

BloodHound MCP: https://github.com/mwnickerson/bloodhound_mcp
Previous post: Chatting with Your Attack Paths: An MCP for BloodHound


Matthew Nickerson

Consultant

Matthew Nickerson is a Consultant at SpecterOps specializing in Adversary Simulation. He is also the Director of Workshops at Red Team Village.

MCP Version	Tool Count	Prompt Text Tokens	Tool Schema Tokens	Total MCP Tokens
Pre-Optimization	100	2,055	16,684	19,016
First Composite-Tool Version	12	389	4,388	4,810
Current Version	13	1,008	4,619	5,721
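The totals in the token table can be turned into relative savings with a few lines of arithmetic. The numbers are copied from the table. The script also shows that each Total MCP Tokens value is somewhat larger than Prompt Text Tokens plus Tool Schema Tokens, so the total evidently counts something beyond those two columns; the table does not say what.

```python
# (prompt text tokens, tool schema tokens, total MCP tokens), copied from the table.
versions = {
    "Pre-Optimization": (2_055, 16_684, 19_016),
    "First Composite-Tool Version": (389, 4_388, 4_810),
    "Current Version": (1_008, 4_619, 5_721),
}

baseline_total = versions["Pre-Optimization"][2]

for name, (prompt, schema, total) in versions.items():
    reduction = 1 - total / baseline_total
    unitemized = total - (prompt + schema)
    print(f"{name}: {reduction:.1%} below baseline, "
          f"{unitemized} tokens outside the two listed columns")
```

By these figures the current version sits about 70% below the original, and it spends roughly 900 more tokens than the first composite-tool version, most of that in the longer prompt (389 to 1,008 tokens).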

