BloodHound MCP, One Year Later: What I Learned About MCPs, Models, and Context

Read Time

15 mins

Published

Jun 18, 2026


TL;DR: The first version of BloodHound MCP proved that an LLM could converse with BloodHound. The current version drove home the lesson that MCP design is context design. The most useful changes were smaller but more flexible tools, better error recovery, domain-specific resources, and a prompt that guides the model instead of trying to teach everything up front.

Introduction: The First Version Worked, But Needed Improvement

When I released the first version of the BloodHound MCP, I was mostly thinking like a developer looking at an API surface. To me, more coverage meant more capability.

BloodHound has an extensive API, and my goal was to expose as much of it as possible through the MCP. That made sense for the first version. It proved the concept, and it gave users a way to ask natural-language questions against BloodHound data.

The feedback after release was useful. I got bug reports, pull requests, feature requests, privacy questions, and enough outside use to see the project differently. The first version worked, but it had inherited the shape of the API too directly.

That was my mistake.

An MCP tool is more than a function call. It is a chunk of context that is passed to the model with every query in a conversation.

MCP Tools Are Not Free

The most obvious way to build an MCP server (or so I thought) is to map one API operation to one tool.

That is how the first version grew. If I had an endpoint for basic user information, that became a tool. If I had an endpoint for user admin rights, that became another tool. Group memberships? Another tool. Sessions? Another tool. RDP rights, DCOM rights, PSRemote rights, constrained delegation, controllables, controllers all got their own separate tools.

For users alone, the tool catalog started to look something like this:

get_user_info
get_user_admin_rights
get_user_group_memberships
get_user_sessions
get_user_rdp_rights
get_user_dcom_rights
get_user_ps_remote_rights
get_user_constrained_delegation
get_user_controllables
get_user_controllers

That pattern repeated across computers, groups, domains, OUs, GPOs, graph operations, ADCS, and Cypher. It was straightforward to implement and easy to reason about as a developer.

It was also very expensive in terms of the model’s context.

Every tool definition includes a name, a description, and a parameter schema. The model sees that information before it decides what to call. Depending on the client and provider, the exact packaging differs, but the practical effect is the same: every tool definition is loaded into context with every request to the model. That is true whether a specific tool is used or not. A tool catalog with dozens of similar entries becomes a small API reference manual sitting beside the actual analyst conversation.

That cost shows up in two places. The first is raw context overhead. Tokens spent describing the interface are tokens the model cannot spend on graph results, object names, paths, edge explanations and analyst reasoning. The second is tool selection quality. When the model sees a long list of similar tools, it has to choose between them. That increases the odds of over-calling, hedging, or picking a tool adjacent to the correct one. Those are recoverable failures, but they waste time and tokens.

The goal was for the model to behave like an analyst with BloodHound available, not a tool-selection engine.

More Tokens More Problems

A coworker reported that a couple of queries with the BloodHound MCP exhausted the Claude free tier and also triggered context compaction. That was the moment I stopped thinking about the tool list as an implementation detail and started treating it as a tax on every query.

So I measured it.

For the last pre-optimization version I measured, the MCP exposed exactly 100 tools, one MCP prompt, and four resources. The resources were loaded on demand, so I kept them out of the fixed per-turn number. The fixed cost was the tool schema list plus the bloodhound_assistant prompt.

Using the FastMCP-generated tool metadata serialized as minified JSON and counted with the cl100k_base tokenizer, the old shape looked like this:

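| MCP Version | Tool Count | Prompt Text Tokens | Tool Schema Tokens | Total MCP Tokens |
| --- | --- | --- | --- | --- |
| Pre-Optimization | 100 | 2,055 | 16,684 | 19,016 |
| First Composite-Tool Version | 12 | 389 | 4,388 | 4,810 |
| Current Version | 13 | 1,008 | 4,619 | 5,721 |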

The exact number will vary a bit by model, provider, and tokenizer, but the shape is the important part. Before the refactor, the MCP was spending about 19 thousand tokens on tools and prompts before the user’s actual question, the graph results, or the assistant’s reasoning entered the picture.
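The measurement approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the script that produced the table: the `tools` list and prompt string are made-up stand-ins for the FastMCP-generated metadata, and `rough_token_count` is a crude placeholder that assumes about four characters per token. To follow the method in this post, a real tokenizer would be passed in instead.

```python
import json
from typing import Callable


def rough_token_count(text: str) -> int:
    """Placeholder tokenizer: assumes roughly 4 characters per token."""
    return max(1, len(text) // 4)


def fixed_overhead(
    tools: list[dict],
    prompt: str,
    count: Callable[[str], int] = rough_token_count,
) -> dict:
    """Estimate the fixed per-turn cost: tool schemas plus the prompt."""
    # Minified JSON: no whitespace after the separators.
    schema_text = json.dumps(tools, separators=(",", ":"))
    schema_tokens = count(schema_text)
    prompt_tokens = count(prompt)
    return {
        "tool_count": len(tools),
        "prompt_tokens": prompt_tokens,
        "schema_tokens": schema_tokens,
        "total": prompt_tokens + schema_tokens,
    }


# Hypothetical tool metadata: a name, a description, and a parameter schema.
tools = [
    {
        "name": f"get_user_{kind}",
        "description": f"Get {kind.replace('_', ' ')} for a user.",
        "inputSchema": {
            "type": "object",
            "properties": {"user_id": {"type": "string"}},
            "required": ["user_id"],
        },
    }
    for kind in ("info", "sessions", "rdp_rights")
]
prompt = "You are a BloodHound assistant."
overhead = fixed_overhead(tools, prompt)
print(overhead)

# With tiktoken installed, a real tokenizer can be swapped in:
#   enc = tiktoken.get_encoding("cl100k_base")
#   fixed_overhead(tools, prompt, count=lambda s: len(enc.encode(s)))
```

The total in this sketch is simply the sum of the two parts, so it only approximates how a given client packages tool definitions.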

The first composite tool pass cut that overhead from 19,016 tokens to 4,810 tokens. That is 14,206 tokens saved per turn, or about a 75% reduction. Over a ten-turn analysis session, that is roughly 142,060 fewer input tokens spent describing the MCP to the model.

The current version has one more tool and a larger behavioral prompt because it includes file upload and more explicit guidance. Even with that added back, it still comes in at 5,721 fixed tokens, 13,295 fewer tokens per turn than the old design, or about a 70% reduction.
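The arithmetic behind those figures is easy to check. The snippet below is only a worked example using the totals from the table; the `savings` helper is a name introduced here for illustration.

```python
def savings(before: int, after: int, turns: int = 1) -> tuple[int, float, int]:
    """Return tokens saved per turn, percent reduction, and tokens saved over a session."""
    saved = before - after
    percent = 100 * saved / before
    return saved, percent, saved * turns


pre_optimization, first_composite, current = 19_016, 4_810, 5_721

saved, pct, session = savings(pre_optimization, first_composite, turns=10)
print(saved, round(pct, 1), session)  # 14206 74.7 142060

saved_now, pct_now, _ = savings(pre_optimization, current)
print(saved_now, round(pct_now, 1))  # 13295 69.9
```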

That matters for cost, but cost is the boring part. The bigger issue is what those tokens displace. In a BloodHound workflow, the model needs room for graph context (query results, object names, path explanations, and the analyst’s follow-up questions). If the MCP burns too much context describing itself, the model has less room to reason about the environment.

It matters even more when local models are part of the workflow. The data within BloodHound should not always have to leave the environment just so an assistant can help reason about it. Local models make that more practical, but local context windows are constrained by the hardware. Cutting fixed overhead from 19k to 5-6k tokens can be the difference between a useful session and one that runs out of room immediately.

Composite Tools Fit the Model’s Decision Better

The main architectural change was moving from one tool per endpoint to composite tools.

Instead of exposing ten separate user tools, the MCP now exposes one user_info tool with an info_type parameter. The model still has access to the same underlying capabilities, but the top-level tool catalog is much smaller.

A simplified version looks like this:

@mcp.tool()
def user_info(
    user_id: str,
    info_type: str = "info",
    limit: int = 100,
    skip: int = 0,
) -> str:
    """Query user data from BloodHound.

    info_type options:
        info - General user properties and attributes
        sessions - Machines this user has active sessions on
        memberships - Groups this user belongs to
        admin_rights - Objects this user has admin rights on
        rdp_rights - Machines this user can RDP to
        dcom_rights - Machines this user can execute DCOM on
        ps_remote_rights - Machines this user can PSRemote to
        constrained_delegation - Services this user can delegate to
        controllables - Objects this user can control
        controllers - Principals that control this user
    """

The model’s decision changes from, “Which of these 10 user tools should I call?” to “What kind of user information do I need?”

That maps much more cleanly to the analysis task.

The current version of the MCP exposes thirteen top-level tools:

| Tool | Purpose |
| --- | --- |
| domain_info | Domain-level enumeration, trusts, DCSyncers, foreign principals, search |
| user_info | User properties, memberships, sessions, rights, controllers, controllables |
| group_info | Group members, memberships, admin rights, controllers, controllables |
| computer_info | Computer properties, sessions, local admins, remote access rights, delegation |
| ou_info | OU properties and contained users, groups, computers, and GPOs |
| gpo_info | GPO properties and controllers |
| graph_analysis | Shortest paths, edge composition, graph search |
| adcs_info | ADCS templates and ESC path information |
| cypher_query | Custom Cypher execution and saved-query access |
| data_quality | Data statistics and platform coverage checks |
| custom_nodes | OpenGraph custom node type configuration |
| asset_groups | Asset group and selector information |
| file_upload | Controlled SharpHound/AzureHound collection upload support |

The composite tool pattern introduces a tradeoff. With individual tools, every operation is discoverable by name. get_user_rdp_rights is obvious. With composite tools, the model has to know that user_info accepts rdp_rights as an info_type.

I mitigated this by providing better tool descriptions. Each composite tool lists the valid info_type values directly in the docstring. The model reads those descriptions when deciding what to call. In practice, that has been enough for the model to find the right operation while keeping the top-level catalog manageable. The backup for when the descriptions aren’t enough is error handling.

Wrong Calls Should Help The Model Learn

Composite tools are only useful if wrong calls fail cleanly. That led me to route every composite tool through a shared dispatch helper.

import json
import logging

logger = logging.getLogger(__name__)

# BloodhoundConnectionError and BloodhoundAPIError come from the project's API client.


def _handle_tool_call(info_type: str, handlers: dict, **context):
    """Dispatch a composite tool call to the appropriate handler."""
    handler = handlers.get(info_type)
    if handler is None:
        # Unknown info_type: list the valid options so the model can retry.
        valid = ", ".join(sorted(handlers))
        return json.dumps(
            {"error": f"Unknown info_type '{info_type}'. Valid options: {valid}"}
        )
    try:
        result = handler()
        return json.dumps({"info_type": info_type, "data": result, **context})
    except BloodhoundConnectionError as e:
        return json.dumps({"error": f"Connection error: {e}"})
    except BloodhoundAPIError as e:
        return json.dumps({"error": f"API error: (HTTP {e.status_code}) {e}"})
    except Exception as e:
        # Catch-all so the model always receives structured JSON back.
        logger.error(f"Error in {info_type}: {e}")
        return json.dumps({"error": f"Unexpected error in {info_type}: {e}"})

In the original tool layout, every function carried its own error handling. The same connection errors, API errors, and generic exception handling showed up again and again. That kind of repetition is manageable when a project is small, but as a project grows it becomes a maintenance problem quickly.

With the dispatcher, each composite tool passes a dictionary of handlers:

handlers = {
    "info": lambda: bloodhound_api.users.get_info(user_id),
    "sessions": lambda: bloodhound_api.users.get_sessions(user_id, limit=limit, skip=skip),
    "memberships": lambda: bloodhound_api.users.get_memberships(user_id, limit=limit, skip=skip),
    "admin_rights": lambda: bloodhound_api.users.get_admin_rights(user_id, limit=limit, skip=skip),
}

Adding a capability becomes adding a handler entry. Error formatting stays consistent. Pagination forwarding is easier to test. And when the model guesses an invalid info_type, it gets a useful response back:

{
  "error": "Unknown info_type 'groups'. Valid options: admin_rights, constrained_delegation, controllables, controllers, dcom_rights, info, memberships, ps_remote_rights, rdp_rights, sessions, sql_admin_rights"
}

While not perfect, it helps the model recover. The model sees the valid options and can retry. I would rather have a small recoverable error than a huge tool catalog designed to avoid that error entirely.
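The recovery loop can be exercised end to end without a BloodHound instance. The sketch below is self-contained and deliberately simplified: `FakeUsersAPI` is a hypothetical stand-in for the real client, `dispatch` is a trimmed version of the shared helper with the exception handling left out, and the object ID is made up.

```python
import json


class FakeUsersAPI:
    """Hypothetical stand-in for the BloodHound users client."""

    def get_info(self, user_id):
        return {"objectid": user_id, "name": "JSMITH@CORP.LOCAL"}

    def get_sessions(self, user_id, limit=100, skip=0):
        sessions = [{"computer": "WS01.CORP.LOCAL"}]
        return sessions[skip:skip + limit]


api = FakeUsersAPI()


def dispatch(info_type, handlers, **context):
    """Trimmed dispatcher: unknown info_type values return the valid options."""
    handler = handlers.get(info_type)
    if handler is None:
        valid = ", ".join(sorted(handlers))
        return json.dumps(
            {"error": f"Unknown info_type '{info_type}'. Valid options: {valid}"}
        )
    return json.dumps({"info_type": info_type, "data": handler(), **context})


def user_info(user_id, info_type="info", limit=100, skip=0):
    """Composite tool: one entry point, many info_type handlers."""
    handlers = {
        "info": lambda: api.get_info(user_id),
        "sessions": lambda: api.get_sessions(user_id, limit=limit, skip=skip),
    }
    return dispatch(info_type, handlers, user_id=user_id)


# First attempt guesses a plausible but invalid info_type.
first = json.loads(user_info("S-1-5-21-1", "groups"))
print(first["error"])

# The error names the valid options, so a retry can pick one of them.
retry = json.loads(user_info("S-1-5-21-1", "sessions"))
print(retry["data"])
```

The first call costs one short round trip, and the error message carries everything needed to correct it.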

Cypher Needed References, Not A Bigger Prompt

The most frustrating part of the project has consistently been Cypher, and not because BloodHound’s API makes Cypher hard to execute. The MCP can run queries just fine. The problem is that LLMs write technically valid Cypher that BloodHound doesn’t always support.

That distinction matters.

BloodHound Cypher has its own requirements beyond generic graph querying. The model has to know the labels, relationship names, property names, casing, directionality, and BloodHound-specific path semantics.

That matters because models are trained on Cypher in the broad internet sense. BloodHound needs Cypher queries that it will actually accept and that it will interpret the way the analyst expects.

A few examples:

  • BloodHound property names are lowercase, such as hasspn, enabled, and admincount
  • Many object names are stored uppercase with the domain suffix
  • DCSync rights target Domain nodes, not Group nodes
  • GPO abuse paths require the full chain through the GPO, GPLink, OU or container, and affected targets
  • List properties need COALESCE patterns to avoid null errors
  • Aggregation can work through the API while still being a poor fit for queries meant to render in the BloodHound GUI
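A couple of the conventions above can be illustrated with small Python helpers that build query strings. Treat this as a hedged sketch: the helper names are invented here, the `USER@DOMAIN` name format and the `gmsa` property are assumptions about a typical BloodHound dataset, and the query has not been validated against any particular BloodHound version.

```python
def bloodhound_name(account: str, domain: str) -> str:
    """Format a user name as upper case with the domain suffix (assumed USER@DOMAIN form)."""
    return f"{account}@{domain}".upper()


def kerberoastable_users_query() -> str:
    """Build an illustrative query that follows the lowercase-property convention."""
    return (
        "MATCH (u:User) "
        # Property names are lowercase: hasspn and enabled.
        "WHERE u.hasspn = true AND u.enabled = true "
        # COALESCE supplies a default when the property is missing on a node.
        "AND NOT COALESCE(u.gmsa, false) = true "
        # Return nodes rather than aggregates so the result can render as a graph.
        "RETURN u"
    )


name = bloodhound_name("jsmith", "corp.local")
query = kerberoastable_users_query()
print(name)
print(query)
```

A model writing generic Cypher might reasonably try `u.hasSPN` or a mixed-case name here, which is the kind of near miss that a reference resource is meant to prevent.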

A model can get this partially correct, which is misleading: the query looks right, but it can fail outright or return the wrong result.

The solution was better prompt architecture.

In the first version, I treated the system prompt as the primary place to teach the model how to behave. That worked for a while, but the long prompt turned into context pollution quickly. Every caveat feels important, but eventually the prompt is trying to wear too many hats as a workflow guide, schema reference, Cypher tutorial, and behavior police.

The new approach uses MCP resources for detailed reference material. The system prompt tells the model when to load those resources and the resources contain the domain knowledge.

The important resources are:

| Resource | Purpose |
| --- | --- |
| bloodhound://cypher/reference | Cypher syntax, BloodHound schema, properties, name formats, and query patterns |
| bloodhound://cypher/offensive-queries | Known-good templates for attack scenarios like DCSync, Kerberoasting, GPO abuse, delegation, ADCS, and more |
| bloodhound://guides/ad | Active Directory node and relationship quick reference |
| bloodhound://guides/ad-methodology | Deeper AD attack-path workflow guidance |
| bloodhound://guides/azure | Azure / Entra ID quick reference |
| bloodhound://guides/azure-methodology | Azure attack-chain methodology |
| bloodhound://guides/adcs | ADCS ESC quick reference |
| bloodhound://guides/adcs-methodology | Deeper ADCS analysis guidance |
| bloodhound://opengraph/guide | OpenGraph schema design and best practices |
| bloodhound://opengraph/examples | Example OpenGraph patterns |

The prompt becomes a map, with the resources serving as the turn-by-turn directions.

For example, the prompt can tell the model to load the offensive query library before writing a custom Cypher query for an attack path scenario. That is much better than expecting the model to create every query from scratch.
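The split between a short prompt and on-demand resources can be sketched with a plain Python registry. This is not the MCP's actual code: in a FastMCP server the registration would normally go through the framework's own resource decorator, and the `resource` decorator, `read_resource` function, and placeholder texts below are stand-ins written for illustration.

```python
from typing import Callable

# URI -> loader. Content is only produced when a resource is requested.
RESOURCES: dict[str, Callable[[], str]] = {}


def resource(uri: str):
    """Register a loader function under a resource URI (illustrative stand-in)."""
    def decorator(fn: Callable[[], str]) -> Callable[[], str]:
        RESOURCES[uri] = fn
        return fn
    return decorator


@resource("bloodhound://cypher/reference")
def cypher_reference() -> str:
    return "Placeholder reference: property names are lowercase."


@resource("bloodhound://cypher/offensive-queries")
def offensive_queries() -> str:
    return "Placeholder library: known-good query templates."


# The prompt stays short and points at resources instead of inlining them.
PROMPT = (
    "Before writing custom Cypher for an attack-path scenario, "
    "load bloodhound://cypher/offensive-queries and adapt a template."
)


def read_resource(uri: str) -> str:
    """Load a resource on demand, listing valid URIs when the request is wrong."""
    loader = RESOURCES.get(uri)
    if loader is None:
        valid = ", ".join(sorted(RESOURCES))
        return f"Unknown resource '{uri}'. Available: {valid}"
    return loader()


loaded = read_resource("bloodhound://cypher/offensive-queries")
missing = read_resource("bloodhound://cypher/missing")
print(loaded)
print(missing)
```

Only `PROMPT` is paid for on every turn; the reference text enters the context when the model asks for it.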

The intended shape is the LLM as an attack-path navigator instead of a Cypher author.

Extending BloodHound MCP for Agentic Workflows

The last major addition is file upload support.

In the first version, I intentionally kept the MCP mostly read-only. The model could query BloodHound, inspect objects, run Cypher, and explain paths, but it assumed the data was already there. At the time, that felt like the right boundary. Agents were still rough around the edges, and I did not feel comfortable handing a model the ability to push data into BloodHound. When I thought about it, I could only picture scenarios where the model blows away or uploads bad data just to achieve a goal.

I could not see the advantage of giving a model this capability outweighing the risk of destroying or modifying data.

That changed as agent harnesses and coding agents improved. More of my testing started to look less like “ask a model a question about existing data” and more like “give an agent a goal and let it work through the steps.” In a lab, I had Codex successfully run a BloodHound collection. That was the moment the old boundary started to feel artificial. The agent had reached the point where it had the collection data, but it could not continue towards its goal by uploading it into BloodHound for analysis. I discovered a workflow that I had not originally intended for the MCP.

The original MCP helped with analysis after ingestion. File upload moves the MCP one step earlier in the process, letting the assistant help with setup and ingestion as well as analyst work.

The file_upload tool supports uploading any BloodHound or OpenGraph collection into BloodHound through the file upload API. The workflow looks roughly like this:

File upload support is a step toward more agentic workflows. The right use case is controlled assistance: the model prepares the upload, explains what it is about to do, uploads the data after approval, and then verifies that the upload succeeded. The MCP should make the analyst faster, not remove the analyst from decisions that matter.
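One way to picture that controlled flow is an approval gate wrapped around the upload step. The sketch below is an assumption-laden illustration, not the file_upload tool itself: `UploadPlan`, `controlled_upload`, and the `approve`, `upload`, and `verify` callables are hypothetical names, and no real BloodHound API call is made.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UploadPlan:
    files: list[str]
    summary: str  # What the analyst is shown before anything is sent.


def controlled_upload(
    plan: UploadPlan,
    approve: Callable[[str], bool],
    upload: Callable[[str], None],
    verify: Callable[[list[str]], bool],
) -> dict:
    """Prepare, explain, upload after approval, then verify."""
    # Explain the plan and stop here unless the analyst approves.
    if not approve(plan.summary):
        return {"status": "cancelled", "uploaded": []}
    # Upload only after approval.
    for path in plan.files:
        upload(path)
    # Confirm the upload instead of assuming it worked.
    ok = verify(plan.files)
    return {"status": "verified" if ok else "unverified", "uploaded": list(plan.files)}


sent: list[str] = []
plan = UploadPlan(files=["collection.zip"], summary="Upload 1 collection archive to the lab instance")

denied = controlled_upload(plan, approve=lambda s: False, upload=sent.append, verify=lambda f: True)
sent_after_denial = list(sent)

approved = controlled_upload(plan, approve=lambda s: True, upload=sent.append, verify=lambda f: f == sent)
print(denied["status"], approved["status"])
```

Keeping approval as an injected callable means the same flow works whether the decision comes from a chat confirmation or an agent harness policy.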

What This Version is Really About

The last year of BloodHound MCP work added features, but the most impactful changes reshaped the interface for the models and made it more token efficient. 

The first version answered the question:

“Can an LLM talk to BloodHound?”

The newer version is trying to answer a better question:

“What does the model need in order to use BloodHound well?”

For this project, the answer was a smaller tool catalog, composite tools that match analyst intent, recoverable errors, resources for detailed domain knowledge, and a prompt that tells the model how to move through the workflow.

That pattern is not unique to BloodHound. Most MCP servers intended to connect models to a deep technical domain will run into some version of the problems I encountered. Exposing everything feels helpful, but it can degrade the model’s performance by carrying too much interface detail and forcing the model to choose between too many similar tools.

Good MCP design gives the model better handles, not more hands. More hands just means more ways to drop things.

What Comes Next?

The practical lesson from the rearchitecture was simple: the MCP should spend less context explaining itself and more context helping the model reason through a BloodHound graph.

Once the MCP became more efficient, I started thinking about how the models should adapt when the graph itself changes, leading me to use the MCP with OpenGraph data.

OpenGraph lets BloodHound model technologies outside of AD and Azure. GitHub, SaaS platforms, device management systems, and custom internal systems can all become graphable in BloodHound. That makes the MCP more useful, but it also makes the model’s job harder. It has to understand a schema it may not currently know, write queries against unfamiliar relationships, and explain risk without hallucinating over the gaps. 

That is where the next post picks up: using BloodHound MCP with OpenGraph data and what happened when I started testing it with GitHound.

Matthew Nickerson

Consultant

Matthew Nickerson is a Consultant at SpecterOps specializing in Adversary Simulation. He is also the Director of Workshops at Red Team Village.
