← Back to Blog

HOW AI AGENTS ARE CHANGING OSINT

Published: 2026-05-24

OSINT investigations have always been bottlenecked not by data availability but by analyst bandwidth. The data is out there — enumerable from public APIs, DNS registries, breach aggregates, and social platforms. The problem is that each data source requires a separate tool, a separate command, and a human decision about what to do with the output before moving to the next step. AI agents are restructuring this at the loop level: the AI makes the sequential decisions, the analyst reviews the compiled result.

THE MANUAL OSINT BOTTLENECK

Before AI agents, a standard email investigation looked like this: run holehe against the target email, wait 30–90 seconds, read 40 lines of output identifying which platforms have an account. Decide manually whether any of those usernames are worth pivoting to. Run sherlock against a discovered username, wait another 60 seconds, read another block of output. Decide whether to check breach databases. Run an HIBP API call, parse the JSON, check which breach categories are represented. Decide whether any breach metadata suggests a platform worth querying.

Each decision point requires the analyst to hold context from prior steps in working memory while reading new output. For a single target, this is manageable — fifteen to twenty minutes of focused work. For ten targets in parallel, it becomes an organizational problem. Analysts use spreadsheets, notes, and ticket systems to track state across targets. The tooling is fragmented; the state management is entirely manual.

The bottleneck is not research skill — it is context switching between tools and keeping state across steps. An experienced analyst running this process manually and an inexperienced one differ mainly in which follow-up pivots they recognize, not in how fast they can run tools. The mechanical execution part was always a solved problem that required human attention.

HOW AN AGENTIC LOOP WORKS

The agentic loop replaces the human decision engine for mechanical steps while keeping the analyst in control of the question framing and result verification.

The loop works as follows: the user submits a target (email address, IP, username, domain) via the REPL or CLI. The model receives the target plus a complete set of tool definitions — each tool described by name, natural-language description, and a JSON Schema specifying its parameters. The model generates a tool_use block identifying which tool to call and with what parameters. The Python runtime intercepts this block, calls the actual tool function, waits for real output, and returns that output to the model as a tool_result message. The model reads the output and makes a judgment call: is this complete, or should another tool run? If another tool run is warranted, it generates another tool_use block. The loop continues until the model generates a final text response summarizing findings.

The analyst receives a structured report rather than a sequence of raw tool outputs. OpenOSINT's agentic loop is implemented in openosint/agent.py: it builds the tool definitions from the 14 available tools, sends them to the Anthropic API, and dispatches each tool_use block to the corresponding run_*_osint() Python function. Real output. No simulation.

TOOL USE VS. FUNCTION CALLING

Anthropic's tool use protocol has a structural property that matters for security research: it is a mandatory stop. When the model generates a tool_use content block, the completion halts. The API does not return a continuation — it returns the message with the tool use block and waits for the client to provide a tool_result. The model cannot generate text that follows the tool call without first receiving real output.

This is not universally true of AI function-calling implementations. Some frameworks allow the model to generate both the function call and a simulated response in the same stream, with the runtime optionally intercepting before the simulated response is shown. The model has, in these implementations, seen what a plausible response looks like before the real tool runs. This creates conditions for hallucination: the model has already committed to an expected result shape.

Anthropic's stop-and-wait design prevents this. The model issues a tool call with no visibility into what the response will contain. It receives whatever the actual tool returns — including errors, empty results, and unexpected data formats. This is the architecturally correct approach for any domain where data accuracy is non-negotiable. In OSINT, acting on fabricated breach data or fabricated account findings causes real harm to investigations and to targets. The tool must actually run.

For a detailed look at how this integrates with the Model Context Protocol, see the MCP protocol explainer.

CHAINING TOOLS BASED ON FINDINGS

The qualitative difference between an AI agent and a bash pipeline is adaptive branching. A pipeline runs a fixed sequence. An agent reads each result and decides what runs next based on what was found.

Concrete examples of how this manifests in practice:

None of these pivot decisions are hardcoded. The model makes them by reading tool output and applying judgment based on what it knows about each data source. The analyst reviewing the final report sees the reasoning chain — which tools ran, in what order, and what each returned.

CURRENT LIMITATIONS

AI agents are good at a specific subset of the OSINT problem: executing known tool sequences, tracking findings across steps, and surfacing connections that would require the analyst to hold a lot of intermediate context simultaneously. They are not a replacement for analyst expertise, and several limitations are structural.

Legal judgment. The agent will call whichever tool the model believes is most relevant. It does not verify that the investigation is authorized. Authorization is an external constraint that must be enforced by the operator before the agent is invoked, not by the agent itself. Running an AI-powered investigation against a target for which you do not have written authorization is not made safer or more legal by the AI doing the tool calls instead of you.

False positive recognition. Sherlock returns a hit on "johndoe99" at reddit.com. The agent reports it. Whether that Reddit account belongs to the target or is a different person who happens to use the same username requires human verification. The agent can note ambiguity — multiple accounts with the same handle across platforms — but it cannot verify identity. The analyst must close that loop.

Ambiguous targets. Common names and widely-used usernames produce high false-positive rates. "john.smith@gmail.com" may be one of many people. The agent will diligently enumerate breach entries and platform accounts, but correlating them to a specific individual requires additional signals (profile photos, mutual followers, overlapping geographic data) that typically require human assessment.

Tool coverage gaps. The 14 tools in OpenOSINT cover most standard OSINT surface area, but not all. Court record searches, property records, vehicle registration, and employer verification require jurisdictional APIs that are not yet integrated. The agent works with what it has. Analysts must supplement with manual lookups where needed.

The practical impact of AI agents in OSINT is a significant reduction in mechanical work and a modest improvement in finding connections that a human might miss due to fatigue. The expertise requirement for the analyst — understanding what findings mean, verifying identity, making authorization decisions — does not decrease. Read more about how hallucination risks are managed in the AI hallucination in OSINT article.

SEE ALSO


Home · Blog · Tools · GitHub