Posts Agentic AI & MCP-Based Offensive Security Recon Pipeline
Post
Cancel

Agentic AI & MCP-Based Offensive Security Recon Pipeline

Disclaimer: This content is intended for authorized cybersecurity engineering, internal infrastructure auditing, and defensive validation in permissible corporate environments. All techniques discussed assume full legal authorization and compliance.


Table of Contents

  1. VAPT Overview — Starting From a Domain/IP/URL
  2. Passive Reconnaissance Tools — Comprehensive Inventory
  3. Deep Analysis Tools
  4. Agentic AI & MCP-Based Approaches
  5. How Agents Process Nmap Data
  6. AI Models for Security Agents
  7. Installing Models on External SSD
  8. Best Ollama Models for This Project

1. VAPT Overview

The Mindset First

Before touching any tool, ask yourself three things:

  1. What’s in scope? (Just this domain? Subdomains? Associated IPs?)
  2. What’s the goal? (Find vulnerabilities? Test defenses? Simulate a real attacker?)
  3. What are the rules of engagement? (Time windows, off-limits systems, reporting format)

Phase 1 — Passive Reconnaissance (Zero Noise)

You learn as much as possible without touching the target directly. This leaves no logs on their side.

  • WHOIS lookup — Who owns the domain? Registration dates, registrar, contact info
  • DNS enumeration — MX, TXT, NS, A, CNAME records reveal infrastructure
  • Certificate Transparency logs — Sites like crt.sh reveal subdomains from SSL cert history
  • Shodan / Censys / FOFA — What ports/services are publicly indexed?
  • Google Dorkingsite:target.com filetype:pdf, inurl:admin, intitle:index of
  • TheHarvester / Hunter.io — Employee emails, email format discovery
  • LinkedIn OSINT — Tech stack clues from job postings
  • Wayback Machine — Old versions of the site, forgotten endpoints, deprecated APIs

Phase 2 — Active Reconnaissance (Light Touch)

  • Port scanningnmap -sV -sC to identify open ports and service versions
  • Subdomain brute-forcingsubfinder, amass, dnsx
  • Web fingerprinting — What CMS, framework, WAF is running?
  • Directory brute-forcingffuf or dirsearch to find hidden paths
  • Tech stack confirmation — HTTP response headers leak server type and framework version

Phase 3 — Vulnerability Identification

  • Automated scanningNikto, Nuclei against web targets; OpenVAS or Nessus for network-level
  • Manual testing:
    • Authentication flows (default creds, brute-force protections, MFA bypass)
    • Input fields for injection (SQLi, XSS, SSTI, command injection)
    • API endpoints for IDOR, broken auth, excessive data exposure
    • SSL/TLS config (testssl.sh) — weak ciphers, expired certs, HSTS missing
    • Security headers — missing Content-Security-Policy, X-Frame-Options, etc.
  • CVE matching — Map identified service versions to known CVEs

Phase 4 — Exploitation (Controlled & Documented)

  • Every action is logged with timestamps
  • Screenshots and HTTP request/response captures for evidence
  • Note blast radius — what could an attacker actually access?
  • Stop at proof of concept — no real data exfiltration, no unauthorized pivoting

Phase 5 — Reporting

A strong VAPT report includes:

  • Executive Summary — Business risk in plain English for leadership
  • Technical Findings — Each vuln with severity (CVSS score), description, evidence, affected asset, and remediation steps
  • Risk Rating — Critical / High / Medium / Low / Informational
  • Remediation Roadmap — What to fix first and why

The Key Mental Model

Think in layers:

1
Internet → Perimeter (firewall, WAF) → Application → Authentication → Data

You are asking at each layer: “Can I go deeper than I should be able to?”


Where to Build These Skills Legally

PlatformWhat You Practice
HackTheBoxReal machines, full pentest workflow
TryHackMeGuided learning paths, great for beginners
PortSwigger Web AcademyBest free resource for web app testing
VulnHubDownloadable VMs for offline practice
OWASP WebGoatDeliberately vulnerable web app

2. Passive Recon Tools

DNS & Domain Intelligence

ToolSourceWhat It Does
AmassGitHub/OWASPSubdomain enumeration via passive DNS, APIs, certs
SubfinderGitHub (ProjectDiscovery)Fast passive subdomain discovery via APIs
DNSreconGitHubDNS enumeration, zone transfer attempts
FierceGitHubDNS brute-force + zone transfer
MassDNSGitHubHigh-performance DNS resolver
KnockpyGitHubSubdomain wordlist-based enumeration
FindomainGitHubCross-platform subdomain finder
Sublist3rGitHubSubdomain enumeration via search engines
DNSxGitHub (ProjectDiscovery)Multi-purpose DNS toolkit
AltdnsGitHubSubdomain permutation + alteration
PurednsGitHubReliable passive DNS bruteforcing
ShuffleDNSGitHubWrapper around MassDNS for permutations

OSINT & Search Engine Harvesting

ToolSourceWhat It Does
TheHarvesterGitHubEmails, IPs, subdomains from search engines
Recon-ngGitHubFull OSINT framework, modular
MaltegoCommercial + CommunityGraph-based OSINT relationships
SpiderFootGitHubAutomated OSINT across 200+ sources
DatasploitGitHubOSINT framework for domains, emails, IPs
OSINT Frameworkosintframework.comCurated list of OSINT tools by category
MetagoofilGitHubExtracts metadata from public documents
FOCAGitHub (ElevenPaths)Metadata extraction from Google-indexed docs
Mr. HolmesGitHubUsername OSINT across platforms
HoleheGitHubCheck if email is registered on sites
SherlockGitHubUsername enumeration across social networks
MaigretGitHubExtended Sherlock with more sites

Certificate Transparency

ToolSourceWhat It Does
crt.shcrt.sh (web)Certificate transparency log search
CertspotterGitHub (SSLMate)Real-time cert issuance monitoring
CTFRGitHubSubdomain discovery via CT logs
TLSxGitHub (ProjectDiscovery)TLS certificate grabbing at scale
CertGraphGitHubGraph of certificate relationships

Internet-Wide Scanning & Exposure

ToolSourceWhat It Does
Shodanshodan.ioIndexed internet-facing services, banners, vulns
Censyscensys.ioCertificate + IP scanning database
FOFAfofa.infoChinese alternative to Shodan, massive index
Zoomeyezoomeye.orgCyberspace search engine
GreyNoisegreynoise.ioNoise vs targeted traffic classification
BinaryEdgebinaryedge.ioInternet exposure & threat intelligence
Onypheonyphe.ioCyber defense search engine
NatlasGitHubSelf-hosted network scanning platform
LeakIXleakix.netExposed services + misconfigured databases
Hunter.howhunter.howSearch exposed assets and services
Netlasnetlas.ioInternet assets search engine

Email & Employee OSINT

ToolSourceWhat It Does
Hunter.iohunter.ioEmail format discovery, employee list
Phonebook.czphonebook.czEmail, domain, URL intelligence
Snov.iosnov.ioEmail finder and verifier
EmailRepemailrep.ioEmail reputation lookup
Clearbit Connectclearbit.comEmail enrichment
LinkedIntGitHubLinkedIn OSINT scraper
CrossLinkedGitHubLinkedIn enumeration without API

Historical & Archived Data

ToolSourceWhat It Does
Wayback Machineweb.archive.orgHistorical snapshots of websites
CachedViewcachedview.nlGoogle/Bing cached page viewer
WaybackurlsGitHub (tomnomnom)Extract all URLs from Wayback Machine
Gau (GetAllURLs)GitHubFetch known URLs from multiple archives
URLScan.iourlscan.ioWebsite scan history + DOM snapshots
VirusTotalvirustotal.comHistorical DNS, URL, file reputation
OTX AlienVaultotx.alienvault.comThreat intel including passive DNS

IP, ASN & Network Intelligence

ToolSourceWhat It Does
BGPViewbgpview.ioASN, IP block, routing info
IPinfoipinfo.ioIP geolocation, ASN, org lookup
HackerTargethackertarget.comMultiple recon tools via API
RIPEstatstat.ripe.netRIPE database, BGP routing
ASNmapGitHub (ProjectDiscovery)Map ASN to IP ranges
WhoisCLI / ARIN / RIPEDomain/IP ownership
Robtexrobtex.comDNS, IP, ASN relationship mapping
MXToolboxmxtoolbox.comDNS, MX, blacklist lookups
ViewDNSviewdns.infoReverse IP, WHOIS, DNS history
DNSDumpsterdnsdumpster.comDNS recon + visual mapping

Credential & Data Leak Intelligence

ToolSourceWhat It Does
HaveIBeenPwnedhaveibeenpwned.comEmail breach lookup
DeHasheddehashed.comLeaked credentials database
LeakCheckleakcheck.ioBreach data lookup
PwndbGitHub (D4Vinci)Onion-based leaked credential search
GHuntGitHubGoogle account OSINT
IntelXintelx.ioSearch engine for leaked data, pastes, darkweb
GitLeaksGitHubScan git repos for leaked secrets
TruffleHogGitHubSecrets detection in git history
GitDorkerGitHubGitHub dork automation

Google & Search Engine Dorking

ToolSourceWhat It Does
Google DorksManualsite:, filetype:, inurl:, intitle:
DorkSearchdorksearch.comPre-built dork categories
Pentest-Tools Dorkpentest-tools.comDork automation
GoogD0rkerGitHubAutomated Google dorking
PagodoGitHubPassive Google dork automation
SearchDiggityBishopFoxBing/Google dork framework (Windows)

Cloud & Technology Fingerprinting

ToolSourceWhat It Does
CloudEnumGitHubAWS, Azure, GCP asset enumeration
S3ScannerGitHubPublic S3 bucket finder
GCPBucketBruteGitHubGCP bucket enumeration
Bucket FinderGitHub (DigiNinja)S3 bucket discovery
BuiltWithbuiltwith.comTechnology stack profiling
Wappalyzerwappalyzer.comBrowser extension tech fingerprinting
WhatCMSwhatcms.orgCMS detection
Netcraftnetcraft.comWeb tech, hosting history, phishing intel

All-in-One Frameworks

ToolSourceWhat It Does
Recon-ngGitHubModular OSINT framework (like Metasploit for recon)
SpiderFoot HXGitHub / CommercialAutomated passive + active recon
OSMEDEUSGitHubFull automated recon workflow
ReconFTWGitHubCombines 35+ tools into one pipeline
BBRaiderGitHubBug bounty-focused recon automation
RaccoonGitHubOffensive recon + info gathering
Sn1perGitHubAutomated pentest recon framework
Intrigue CoreGitHubAttack surface discovery platform
LazyReconGitHubBash-based recon automation
ART (Atomic Red Team)GitHub (Red Canary)Recon TTPs mapped to MITRE ATT&CK

Pro Tips

  • Chain tools — Amass → DNSx → HTTPx → Nuclei is a classic subdomain-to-vuln pipeline
  • API keys matter — Tools like Subfinder and TheHarvester perform 10x better with Shodan, Censys, VirusTotal API keys configured
  • ReconFTW is excellent for automating the entire passive phase in one run
  • Always cross-reference — one tool might miss what another catches

3. Deep Analysis Tools

Deep DNS & Protocol Analysis

ToolSourceWhat It Does
Passive Total (RiskIQ)passivetotal.orgPassive DNS history, WHOIS, SSL pivoting
DNSDB (Farsight)dnsdb.infoLargest passive DNS database commercially
SecurityTrailssecuritytrails.comDNS history, subdomain, IP neighbor lookup
DNSHistorydnshistory.orgHistorical DNS record changes
Mnemonic PassiveDNSpassivedns.mnemonic.noPassive DNS from Mnemonic security
CIRCL PassiveDNScircl.luFree passive DNS from CERT Luxembourg
CIRCL PassiveSSLcircl.luPassive SSL cert/IP correlation
DNS TwistGitHub (elceef)Detect typosquatting / phishing domains
DNSMorphGitHubDomain permutation engine
URLCrazyGitHubDomain typo generation for brand monitoring
DomainFuzzGitHubDomain variation fuzzer

Traffic & Protocol Intelligence (Passive Capture)

ToolSourceWhat It Does
Zeek (Bro)GitHub / zeek.orgNetwork traffic analyzer, protocol dissection
NetworkMinernetresec.comPassive network sniffer + artifact extractor
Arkime (Moloch)GitHubFull packet capture + indexed search
RitaGitHub (BH Automation)Detects C2 beaconing from Zeek logs
PassiveDNS (capture)GitHubLogs all DNS traffic passively
CapTipperGitHubAnalyze HTTP traffic from PCAP
Xplicoxplico.orgNetwork forensic analysis from PCAP
DshellGitHub (USArmy)Network forensic analysis framework
NFCAPD / NFDUMPGitHubNetFlow capture and analysis
SiLKtools.netsa.cert.orgLarge-scale NetFlow analysis
Argusqosient.comNetwork audit record generation

Threat Intelligence Platforms & Feeds

ToolSourceWhat It Does
MISPGitHub (MISP Project)Threat intel sharing platform
OpenCTIGitHubCyber threat intelligence platform
YETIGitHubYour Everyday Threat Intelligence
HarpoonGitHubCLI for threat intel APIs
CortexGitHub (TheHive)Observable analysis automation
TheHiveGitHubSecurity incident response + intel
ThreatFoxabuse.chMalware IOC database
URLhausabuse.chMalicious URL database
MalwareBazaarabuse.chMalware sample sharing
Feodo Trackerabuse.chBotnet C2 tracker
AbuseIPDBabuseipdb.comIP reputation + abuse reports
VirusTotal APIvirustotal.comFile, URL, domain, IP reputation
Pulsedivepulsedive.comThreat intel enrichment
Recorded Future (Community)recordedfuture.comThreat intel with dark web coverage
IBM X-Force Exchangeexchange.xforce.ibmcloud.comThreat intel sharing
Triage (Hatching)tria.geMalware sandbox + behavioral analysis
ANY.RUNany.runInteractive malware sandbox
Joe Sandboxjoesandbox.comDeep malware behavioral analysis
Hybrid Analysishybrid-analysis.comFree malware analysis sandbox

Dark Web & Underground Monitoring

ToolSourceWhat It Does
OnionScanGitHubScan .onion sites for misconfigs
TorBotGitHubDark web OSINT crawler
DarkDumpGitHubSearch dark web from CLI
Ahmiaahmia.fi.onion search engine (clearnet accessible)
IntelX Dark Webintelx.ioIndexes dark web, leaks, paste sites
PhotonGitHub (s0md3v)Fast OSINT web spider + dark web
H8mailGitHubEmail breach correlation tool

Git & Code Repository Intelligence

ToolSourceWhat It Does
GitLeaksGitHubSecrets detection across git history
TruffleHogGitHub (Truffle Security)Deep git history entropy scanning
GitDorkerGitHubAutomates GitHub dork searches
GitGotGitHubSearch GitHub for sensitive data
GitAllSecretsGitHubWraps multiple git secret scanners
GitrobGitHubRecon on GitHub orgs/users
Repo-supervisorGitHubScans repos for leaked secrets
ShhGitGitHubReal-time GitHub secret stream monitor
GitHub DorksManualorg:target filename:.env, password, api_key
Sourcegraphsourcegraph.comCode intelligence + search across repos
Grep.appgrep.appRegex search across GitHub repos

Corporate & Business Intelligence

ToolSourceWhat It Does
Crunchbasecrunchbase.comCompany info, funding, acquisitions
OpenCorporatesopencorporates.comGlobal company registry
EDGAR (SEC)sec.gov/edgarUS public company filings
Companies Housecompanieshouse.gov.ukUK company filings (directors, addresses)
LittleSislittlesis.orgCorporate power mapping

People & Identity OSINT

ToolSourceWhat It Does
Piplpipl.comDeep people search engine
Social AnalyzerGitHubProfile analysis across 1000+ sites
BlackbirdGitHubSocial media username OSINT
WhatsMyNameGitHub (WebBreacher)Username presence on 600+ sites
PimEyespimeyes.comReverse face image search
FaceCheck.idfacecheck.idFace search OSINT tool
Lampyrelampyre.ioData analysis + OSINT relationships

Image & Geolocation Intelligence

ToolSourceWhat It Does
ExifToolGitHub (exiftool)Extract metadata from images/docs
GeoSpygeospy.aiAI-based photo geolocation
SunCalcsuncalc.orgDetermine time/location from sun shadows
Pic2Mappic2map.comGPS metadata extractor from photos
Google Lenslens.google.comReverse image + landmark recognition
TinEyetineye.comReverse image search
Yandex Imagesyandex.comOften better than Google for face recon

Mobile & App Intelligence

ToolSourceWhat It Does
APKLeaksGitHubExtracts URLs, secrets from APKs
MobSFGitHubMobile app static + dynamic analysis
Quark EngineGitHubAndroid malware scoring
ClassySharkGitHub (Google)Android/Java bytecode browser
JadxGitHubAPK decompiler
Fridafrida.reDynamic instrumentation for mobile

Infrastructure & Supply Chain Analysis

ToolSourceWhat It Does
Dep-ScanGitHubDependency vulnerability analysis
SyftGitHub (Anchore)SBOM generator
GrypeGitHub (Anchore)Vulnerability scanner for SBOMs
TrivyGitHub (Aquasec)Container + IaC + repo vuln scanner
Snyksnyk.ioOpen source dependency monitoring
Socket.devsocket.devSupply chain attack detection in npm/PyPI
Deps.devdeps.dev (Google)Open source package insights

Specialized Analysis & Correlation Platforms

ToolSourceWhat It Does
Maltegomaltego.comVisual link analysis for all OSINT data
Gephigephi.orgGraph visualization for OSINT datasets
Hunchlyhunch.lyWeb capture tool for investigations
Spiderfoot HXspiderfoot.netEnterprise OSINT automation
Recorded Futurerecordedfuture.comAI-driven threat intel correlation
TimesketchGitHub (Google)Timeline analysis for forensics

Pivoting: How Professionals Chain These

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Domain Input
    │
    ├──▶ crt.sh + Subfinder + Amass ──▶ Subdomain list
    │         │
    │         ▼
    │    DNSx + SecurityTrails ──▶ Historical DNS + IPs
    │         │
    │         ▼
    │    Shodan + Censys + FOFA ──▶ Exposed services per IP
    │         │
    │         ▼
    │    VirusTotal + OTX ──▶ Reputation + passive DNS pivot
    │         │
    │         ▼
    │    URLScan + Waybackurls ──▶ Historical endpoints + params
    │         │
    │         ▼
    │    GitDorker + TruffleHog ──▶ Leaked secrets in code
    │         │
    │         ▼
    │    TheHarvester + CrossLinked ──▶ Employees + emails
    │         │
    │         ▼
    └──▶ MISP / OpenCTI ──▶ Correlate all IOCs into intel report

4. Agentic AI & MCP

What Is Agentic AI in Cybersecurity?

Traditional tools are reactive — you run them, they output results. Agentic AI is autonomous — it reasons, plans, executes multi-step tasks, adapts to findings, and loops back without human intervention at every step.

1
2
3
4
5
Old way:     Subfinder → manually review → Nuclei → review again

Agentic way: AI receives target → autonomously chains recon →
             enumeration → validation → report generation,
             making decisions at each step

MCP (Model Context Protocol) — The Game Changer

MCP is Anthropic’s open protocol that gives AI models structured access to tools, APIs, and data sources in a standardized way.

1
2
3
4
5
6
7
8
9
┌─────────────────────────────────────────────┐
│              AI Agent (LLM Brain)            │
└──────────────────┬──────────────────────────┘
                   │ MCP Protocol
        ┌──────────┼──────────┐
        ▼          ▼          ▼
   [Tool Server] [Data MCP] [API MCP]
   nmap, nuclei  Shodan DB  VirusTotal
   subfinder     MISP feed  SecurityTrails

Why MCP Matters for Security:

  • Standardized tool calling — Any security tool wrapped as an MCP server becomes AI-callable
  • Context persistence — Agent remembers findings across tool calls in one session
  • Chained reasoning — Agent decides which tool to call next based on previous output
  • Human-in-the-loop — You can gate sensitive actions for approval before execution

Agentic Security Architecture Patterns

Pattern 1 — Linear Pipeline Agent

1
2
3
4
Target Input
    │
    ▼
[Recon Agent] ──▶ [Enum Agent] ──▶ [Analysis Agent] ──▶ [Report Agent]

Simple, predictable, good for scheduled audits.


Pattern 2 — ReAct Loop Agent (Reasoning + Acting)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
        ┌──────────────────────────┐
        │      THINK               │
        │  "I found port 443 open" │
        └────────────┬─────────────┘
                     │
        ┌────────────▼─────────────┐
        │      ACT                 │
        │  Run TLS check on 443    │
        └────────────┬─────────────┘
                     │
        ┌────────────▼─────────────┐
        │      OBSERVE             │
        │  TLS 1.0 detected!       │
        └────────────┬─────────────┘
                     │
        ┌────────────▼─────────────┐
        │      THINK AGAIN         │
        │  Check cipher suites     │
        └──────────────────────────┘

Pattern 3 — Multi-Agent Swarm

1
2
3
4
5
6
7
8
9
10
11
12
                [Orchestrator Agent]
                        │
          ┌─────────────┼─────────────┐
          ▼             ▼             ▼
    [Recon Agent] [WebApp Agent] [Network Agent]
          │             │             │
          └─────────────┼─────────────┘
                        ▼
                [Correlation Agent]
                        │
                        ▼
                [Report Agent]

Each specialist agent works in parallel, orchestrator synthesizes findings.


Building MCP Security Tools — Framework

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
# mcp_security_server.py — Generic MCP wrapper for security tools
from mcp.server import Server
from mcp.server.models import InitializationOptions
import mcp.types as types
import asyncio
import subprocess
import json

server = Server("security-recon-mcp")

@server.list_tools()
async def handle_list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="passive_dns_lookup",
            description="Perform passive DNS enumeration on a target domain",
            inputSchema={
                "type": "object",
                "properties": {
                    "domain": {
                        "type": "string",
                        "description": "Target domain to enumerate"
                    },
                    "record_types": {
                        "type": "array",
                        "items": {"type": "string"},
                        "description": "DNS record types: A, MX, TXT, NS, CNAME"
                    }
                },
                "required": ["domain"]
            }
        ),
        types.Tool(
            name="cert_transparency_search",
            description="Query certificate transparency logs for subdomains",
            inputSchema={
                "type": "object",
                "properties": {
                    "domain": {"type": "string"},
                    "include_expired": {"type": "boolean", "default": False}
                },
                "required": ["domain"]
            }
        ),
        types.Tool(
            name="threat_intel_lookup",
            description="Query threat intelligence feeds for IOC reputation",
            inputSchema={
                "type": "object",
                "properties": {
                    "ioc": {"type": "string", "description": "IP, domain, hash, or URL"},
                    "ioc_type": {
                        "type": "string",
                        "enum": ["ip", "domain", "url", "hash"]
                    }
                },
                "required": ["ioc", "ioc_type"]
            }
        )
    ]

@server.call_tool()
async def handle_call_tool(
    name: str,
    arguments: dict
) -> list[types.TextContent]:

    if name == "passive_dns_lookup":
        domain = arguments["domain"]
        records = arguments.get("record_types", ["A", "MX", "TXT", "NS"])

        results = {}
        for record in records:
            proc = await asyncio.create_subprocess_exec(
                "dig", "+short", record, domain,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE
            )
            stdout, _ = await proc.communicate()
            results[record] = stdout.decode().strip().split("\n")

        results["_meta"] = {
            "query_logged": True,
            "log_location": "/var/log/security-agent/dns_queries.log",
            "remediation_note": "Ensure SPF/DKIM/DMARC present in TXT records"
        }

        return [types.TextContent(type="text", text=json.dumps(results, indent=2))]

    elif name == "cert_transparency_search":
        domain = arguments["domain"]
        import urllib.request
        url = f"https://crt.sh/?q=%.{domain}&output=json"
        with urllib.request.urlopen(url) as response:
            data = json.loads(response.read())

        subdomains = list(set([
            entry["name_value"]
            for entry in data
            if not arguments.get("include_expired", False)
            or entry.get("not_after") > "2024-01-01"
        ]))

        return [types.TextContent(
            type="text",
            text=json.dumps({
                "subdomains_found": len(subdomains),
                "subdomains": subdomains[:50],
                "source": "crt.sh certificate transparency",
                "defensive_note": "Review unexpected subdomains for shadow IT"
            }, indent=2)
        )]

async def main():
    from mcp.server.stdio import stdio_server
    async with stdio_server() as (read_stream, write_stream):
        await server.run(
            read_stream,
            write_stream,
            InitializationOptions(
                server_name="security-recon-mcp",
                server_version="1.0.0"
            )
        )

if __name__ == "__main__":
    asyncio.run(main())

Agentic Security Platforms Emerging Now

PlatformTypeWhat It Does
Nuclei AIOSS + AgentAI-suggested templates based on findings
PentestGPTGitHubLLM-guided penetration testing
HackingBuddyGPTGitHubAutonomous Linux privilege escalation research
VulnhuntrGitHub (protectai)LLM-based zero-day discovery in Python code
Burp AI ExtensionsPortSwiggerAI-assisted web vuln analysis
Semgrep AIsemgrep.devAI-suggested SAST rules
Aikido Securityaikido.devAgentic cloud security posture
Dropzone AIdropzone.aiAutonomous SOC analyst agent
SentinelOne Purple AIsentinelone.comAI threat hunting in natural language
Microsoft Security Copilotmicrosoft.comAgentic SOC + threat intel
Google SecLMGoogleSecurity-specialized LLM
Orca Security AIorca.securityCloud attack path + AI remediation

MCP Security Tool Ecosystem

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
┌─────────────────────────────────────────────────────────┐
│                   Security MCP Servers                   │
├──────────────┬──────────────┬──────────────┬────────────┤
│  Recon MCP   │  Intel MCP   │  Scan MCP    │ Report MCP │
│              │              │              │            │
│ • Subfinder  │ • MISP API   │ • Nuclei     │ • Markdown │
│ • Amass      │ • OTX API    │ • Trivy      │ • PDF gen  │
│ • crt.sh     │ • VT API     │ • Semgrep    │ • Jira     │
│ • Shodan     │ • AbuseIPDB  │ • Grype      │ • Splunk   │
│ • SecurityT  │ • ThreatFox  │ • Gitleaks   │ • TheHive  │
└──────────────┴──────────────┴──────────────┴────────────┘
                          │
                          ▼
              ┌───────────────────────┐
              │   Orchestrator Agent  │
              │   (Claude / GPT-4o)   │
              │                       │
              │  • Plans recon steps  │
              │  • Interprets output  │
              │  • Decides next tool  │
              │  • Writes findings    │
              │  • Flags critical     │
              └───────────────────────┘

Defensive Considerations for Agentic AI

RiskDescriptionMitigation
Prompt InjectionMalicious content in scan results hijacks agentSanitize all tool outputs before feeding to LLM
Tool MisuseAgent calls destructive tool unintentionallyScope tool permissions, require human approval gates
Scope CreepAgent autonomously goes out of authorized scopeHard-coded scope boundaries in every tool call
Data ExfiltrationAgent sends findings to unintended endpointEgress filtering, output logging
Hallucinated VulnsLLM fabricates CVEs or severity ratingsAlways validate findings against real CVE databases
API Key LeakageKeys in agent context get loggedUse secret vaults, never pass keys in prompts

Where This Is All Going

1
2
3
4
5
2023 ──▶ LLM assists human analysts (copilot)
2024 ──▶ LLM executes single tasks autonomously (tool use)
2025 ──▶ Multi-agent swarms handle full recon-to-report (agentic)
2026 ──▶ Continuous autonomous red team vs blue team agents
Future ▶ Self-healing infrastructure via agentic remediation

5. Nmap Data Processing

The Core Problem

Every tool speaks a different “language”:

1
2
3
4
5
nmap outputs XML/text ──▶ Nuclei needs host:port
                     ──▶ Shodan needs IP only
                     ──▶ Metasploit needs IP + port + service
                     ──▶ TheHive needs structured JSON alert
                     ──▶ Nikto needs URL with protocol

The agent must be the universal translator.


Step 1 — Always Use Nmap XML Output

1
2
3
4
5
6
7
8
9
10
11
12
# Always run nmap with XML output for agent consumption
nmap -sV -sC -O -p- --open \
     -oX scan_results.xml \
     -oN scan_results.txt \
     -oG scan_results.gnmap \
     192.168.1.0/24

# Output formats:
# -oX  → XML (agent primary input)
# -oN  → Normal text (human readable)
# -oG  → Grepable (legacy tool compat)
# -oA  → All three at once

Step 2 — The Parser Layer

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
# nmap_parser.py — Core data extraction and normalization layer

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field, asdict
from typing import Optional
import json

# ─────────────────────────────────────────
# DATA MODELS — typed, normalized structures
# ─────────────────────────────────────────

@dataclass
class ServiceInfo:
    port:        int
    protocol:    str          # tcp / udp
    state:       str          # open / filtered / closed
    service:     str          # http, ssh, ftp, etc.
    product:     str          # Apache, OpenSSH, etc.
    version:     str          # 2.4.49, 7.9p1, etc.
    extra_info:  str
    cpe:         list[str]    # CPE identifiers for CVE matching
    scripts:     dict         # NSE script outputs

@dataclass
class HostResult:
    ip:           str
    hostname:     str
    mac:          str
    os_match:     str
    os_accuracy:  int
    status:       str         # up / down
    services:     list[ServiceInfo] = field(default_factory=list)

    @property
    def open_ports(self) -> list[int]:
        return [s.port for s in self.services if s.state == "open"]

    @property
    def web_services(self) -> list[ServiceInfo]:
        web = {"http", "https", "http-alt", "http-proxy", "ssl/http"}
        return [s for s in self.services if s.service in web or s.port in {80,443,8080,8443,8888}]

    @property
    def ssh_services(self) -> list[ServiceInfo]:
        return [s for s in self.services if s.service == "ssh"]

    @property
    def all_cpes(self) -> list[str]:
        cpes = []
        for svc in self.services:
            cpes.extend(svc.cpe)
        return list(set(cpes))


# ─────────────────────────────────────────
# PARSER — XML → Typed Objects
# ─────────────────────────────────────────

class NmapParser:

    def parse(self, xml_path: str) -> list[HostResult]:
        tree = ET.parse(xml_path)
        root = tree.getroot()
        hosts = []

        for host_elem in root.findall("host"):
            host = self._parse_host(host_elem)
            if host and host.status == "up":
                hosts.append(host)

        return hosts

    def _parse_host(self, host_elem) -> Optional[HostResult]:
        status_elem = host_elem.find("status")
        status = status_elem.get("state", "unknown") if status_elem is not None else "unknown"

        ip, hostname, mac = "", "", ""
        for addr in host_elem.findall("address"):
            addr_type = addr.get("addrtype")
            if addr_type == "ipv4":
                ip = addr.get("addr", "")
            elif addr_type == "mac":
                mac = addr.get("addr", "")

        hostnames_elem = host_elem.find("hostnames")
        if hostnames_elem is not None:
            hn = hostnames_elem.find("hostname")
            if hn is not None:
                hostname = hn.get("name", "")

        os_match, os_accuracy = "Unknown", 0
        os_elem = host_elem.find("os")
        if os_elem is not None:
            osmatch = os_elem.find("osmatch")
            if osmatch is not None:
                os_match   = osmatch.get("name", "Unknown")
                os_accuracy = int(osmatch.get("accuracy", 0))

        services = []
        ports_elem = host_elem.find("ports")
        if ports_elem is not None:
            for port_elem in ports_elem.findall("port"):
                svc = self._parse_port(port_elem)
                if svc:
                    services.append(svc)

        return HostResult(
            ip=ip, hostname=hostname, mac=mac,
            os_match=os_match, os_accuracy=os_accuracy,
            status=status, services=services
        )

    def _parse_port(self, port_elem) -> Optional[ServiceInfo]:
        port     = int(port_elem.get("portid", 0))
        protocol = port_elem.get("protocol", "tcp")

        state_elem = port_elem.find("state")
        state = state_elem.get("state", "unknown") if state_elem is not None else "unknown"

        product, version, extra_info, service_name = "", "", "", ""
        cpes = []

        svc_elem = port_elem.find("service")
        if svc_elem is not None:
            service_name = svc_elem.get("name", "")
            product      = svc_elem.get("product", "")
            version      = svc_elem.get("version", "")
            extra_info   = svc_elem.get("extrainfo", "")
            for cpe_elem in svc_elem.findall("cpe"):
                if cpe_elem.text:
                    cpes.append(cpe_elem.text)

        scripts = {}
        for script in port_elem.findall("script"):
            script_id     = script.get("id", "")
            script_output = script.get("output", "")
            scripts[script_id] = script_output

        return ServiceInfo(
            port=port, protocol=protocol, state=state,
            service=service_name, product=product, version=version,
            extra_info=extra_info, cpe=cpes, scripts=scripts
        )

Step 3 — Tool-Specific Formatters

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
# tool_formatters.py — Transform parsed data into tool-specific inputs

class ToolFormatters:

    @staticmethod
    def to_nuclei_targets(hosts: list[HostResult]) -> list[str]:
        targets = []
        for host in hosts:
            for svc in host.web_services:
                proto = "https" if svc.port in {443, 8443} or "ssl" in svc.service else "http"
                port_str = f":{svc.port}" if svc.port not in {80, 443} else ""
                target = f"{proto}://{host.ip}{port_str}"
                targets.append(target)
                if host.hostname:
                    targets.append(f"{proto}://{host.hostname}{port_str}")
        return list(set(targets))

    @staticmethod
    def to_shodan_ips(hosts: list[HostResult]) -> list[str]:
        return [h.ip for h in hosts if h.ip]

    @staticmethod
    def to_nikto_targets(hosts: list[HostResult]) -> list[dict]:
        targets = []
        for host in hosts:
            for svc in host.web_services:
                targets.append({
                    "host": host.ip,
                    "port": svc.port,
                    "ssl":  svc.port in {443, 8443} or "ssl" in svc.service,
                    "hostname": host.hostname or host.ip
                })
        return targets

    @staticmethod
    def to_msf_targets(hosts: list[HostResult]) -> list[dict]:
        targets = []
        for host in hosts:
            for svc in host.services:
                if svc.state == "open":
                    targets.append({
                        "rhost":   host.ip,
                        "rport":   svc.port,
                        "service": svc.service,
                        "product": svc.product,
                        "version": svc.version
                    })
        return targets

    @staticmethod
    def to_thehive_alert(host: HostResult, severity: int = 2) -> dict:
        observables = [
            {"dataType": "ip",       "data": host.ip,       "message": "Scanned host"},
            {"dataType": "hostname", "data": host.hostname,  "message": "Resolved hostname"} if host.hostname else None,
        ]
        observables = [o for o in observables if o]

        for svc in host.services:
            observables.append({
                "dataType": "other",
                "data":     f"{host.ip}:{svc.port}/{svc.protocol}",
                "message":  f"{svc.product} {svc.version}{svc.service}"
            })

        return {
            "title":       f"Nmap Scan Finding — {host.ip}",
            "description": f"OS: {host.os_match} | Open ports: {host.open_ports}",
            "type":        "nmap-scan",
            "source":      "internal-scanner",
            "severity":    severity,
            "tags":        ["nmap", "recon", "automated"],
            "observables": observables,
            "customFields": {
                "os_match":    {"string": host.os_match},
                "open_ports":  {"string": str(host.open_ports)},
                "cpe_ids":     {"string": ", ".join(host.all_cpes)}
            }
        }

    @staticmethod
    def to_cpe_list(hosts: list[HostResult]) -> list[dict]:
        cpe_map = []
        for host in hosts:
            for svc in host.services:
                for cpe in svc.cpe:
                    cpe_map.append({
                        "ip":      host.ip,
                        "port":    svc.port,
                        "service": svc.service,
                        "cpe":     cpe
                    })
        return cpe_map

    @staticmethod
    def to_searchsploit_queries(hosts: list[HostResult]) -> list[str]:
        queries = set()
        for host in hosts:
            for svc in host.services:
                if svc.product and svc.version:
                    queries.add(f"{svc.product} {svc.version}")
                elif svc.product:
                    queries.add(svc.product)
        return list(queries)

Step 4 — Agent Orchestration Layer

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
# agent_orchestrator.py — ReAct agent that routes findings to tools

import json
from nmap_parser import NmapParser, HostResult
from tool_formatters import ToolFormatters

class SecurityScanAgent:

    def __init__(self, llm_client, tool_registry: dict):
        self.llm    = llm_client
        self.tools  = tool_registry
        self.memory = []
        self.parser = NmapParser()

    async def process_nmap_scan(self, xml_path: str) -> dict:
        hosts = self.parser.parse(xml_path)

        print(f"[Agent] Parsed {len(hosts)} live hosts from scan")
        self._log_observation("nmap_parse", {
            "total_hosts": len(hosts),
            "host_summary": [
                {
                    "ip":         h.ip,
                    "os":         h.os_match,
                    "open_ports": h.open_ports,
                    "web_svcs":   len(h.web_services)
                }
                for h in hosts
            ]
        })

        decisions = await self._reason_about_findings(hosts)

        results = {}
        for action in decisions["actions"]:
            result = await self._execute_action(action, hosts)
            results[action["tool"]] = result
            self._log_observation(action["tool"], result)

        return {
            "hosts_analyzed": len(hosts),
            "decisions":      decisions,
            "tool_results":   results,
            "audit_trail":    self.memory
        }

    async def _reason_about_findings(self, hosts: list[HostResult]) -> dict:
        host_summary = json.dumps([
            {
                "ip":       h.ip,
                "os":       h.os_match,
                "services": [
                    {
                        "port":    s.port,
                        "service": s.service,
                        "product": s.product,
                        "version": s.version,
                        "cpe":     s.cpe
                    }
                    for s in h.services if s.state == "open"
                ]
            }
            for h in hosts
        ], indent=2)

        prompt = f"""
You are a security analysis agent. Based on these nmap findings,
decide which tools to invoke next and why.

NMAP FINDINGS:
{host_summary}

AVAILABLE TOOLS:
- nuclei       : web vulnerability scanner     (needs: url list)
- nikto        : web server scanner            (needs: host, port, ssl flag)
- searchsploit : exploit database search       (needs: product version string)
- shodan_enrich: IP enrichment                 (needs: ip list)
- thehive_alert: create security alert         (needs: host findings)
- cpe_vuln_scan: CVE lookup via CPE            (needs: CPE identifiers)

Respond ONLY in this JSON format:
reasoning
  ]
}}
"""
        response = await self.llm.complete(prompt)
        return json.loads(response)

    async def _execute_action(self, action: dict, hosts: list[HostResult]) -> dict:
        tool_name = action["tool"]
        fmt = ToolFormatters()

        if tool_name == "nuclei":
            targets = fmt.to_nuclei_targets(hosts)
            return await self.tools["nuclei"].run(targets=targets)
        elif tool_name == "nikto":
            targets = fmt.to_nikto_targets(hosts)
            return await self.tools["nikto"].run(targets=targets)
        elif tool_name == "searchsploit":
            queries = fmt.to_searchsploit_queries(hosts)
            results = {}
            for q in queries:
                results[q] = await self.tools["searchsploit"].run(query=q)
            return results
        elif tool_name == "shodan_enrich":
            ips = fmt.to_shodan_ips(hosts)
            return await self.tools["shodan"].enrich(ips=ips)
        elif tool_name == "thehive_alert":
            alerts = [fmt.to_thehive_alert(h) for h in hosts]
            return await self.tools["thehive"].create_alerts(alerts=alerts)
        elif tool_name == "cpe_vuln_scan":
            cpes = fmt.to_cpe_list(hosts)
            return await self.tools["grype"].scan(cpe_list=cpes)

    def _log_observation(self, step: str, data: dict):
        self.memory.append({"step": step, "data": data})

Step 5 — Full Data Flow

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
nmap -oX scan.xml
        │
        ▼
  NmapParser.parse()
        │
        ├──▶ HostResult objects (typed, normalized)
        │         │
        │    ┌────┴─────────────────────────────────┐
        │    │         ToolFormatters                │
        │    │                                       │
        │    ├── to_nuclei_targets()  → URL list     │
        │    ├── to_nikto_targets()   → host+port    │
        │    ├── to_shodan_ips()      → IP list      │
        │    ├── to_msf_targets()     → rhost/rport  │
        │    ├── to_thehive_alert()   → JSON alert   │
        │    ├── to_cpe_list()        → CPE strings  │
        │    └── to_searchsploit()    → query string │
        │                                            │
        └────────────────────────────────────────────┘
                          │
                          ▼
              Agent reasons → selects tools
                          │
                          ▼
              Tools execute with correct input
                          │
                          ▼
              Results accumulate in agent memory
                          │
                          ▼
              Final normalized report → TheHive/MISP

Detection & Audit Indicators to Log

1
2
3
4
5
6
7
8
9
10
# Always log these for blue team visibility
AUDIT_FIELDS = {
    "scan_initiated_by":  "agent-orchestrator-v1",
    "scan_authorized_by": "change_ticket_CHG-XXXX",
    "scope_validated":    True,
    "targets_in_scope":   ["192.168.1.0/24"],
    "log_destination":    "/var/log/security-agent/",
    "siem_forwarded":     True,
    "retention_days":     90
}

Key Engineering Principles

PrincipleImplementation
Type everythingDataclasses prevent silent field mismatches
Parse once, format manyOne parser, multiple formatters
Agent decides routingLLM picks tools based on what it finds
Log every transformationFull audit trail per step
Validate before sendingCheck required fields before tool dispatch
Fail loudlyRaise on missing critical fields, never silently skip

6. AI Models

Model Landscape for Security Agents

Tier 1 — Frontier Models (Cloud)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
┌─────────────────────────────────────────────────────────────────┐
│  MODEL              │ STRENGTHS            │ SECURITY CONCERNS  │
├─────────────────────┼──────────────────────┼────────────────────┤
│  Claude 3.5/3.7     │ Long context         │ Data sent to       │
│  Sonnet/Opus        │ Strong reasoning     │ Anthropic servers  │
│                     │ Tool use reliable    │ Logs retained      │
│                     │ Low hallucination    │ API key exposure   │
├─────────────────────┼──────────────────────┼────────────────────┤
│  GPT-4o             │ Fast inference       │ Microsoft/OpenAI   │
│  o1 / o3            │ Strong tool calling  │ data retention     │
│                     │ Vision capable       │ Training opt-out   │
│                     │ Structured outputs   │ needed explicitly  │
├─────────────────────┼──────────────────────┼────────────────────┤
│  Gemini 1.5 Pro     │ 1M token context     │ Google data        │
│  Gemini 2.0         │ Multimodal           │ residency concerns │
│                     │ Code execution       │                    │
└─────────────────────┴──────────────────────┴────────────────────┘

Tier 2 — Self-Hosted / Air-Gapped (Privacy-First)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
┌─────────────────────────────────────────────────────────────────┐
│  MODEL              │ STRENGTHS            │ USE CASE           │
├─────────────────────┼──────────────────────┼────────────────────┤
│  Llama 3.1/3.3 70B  │ Fully local          │ Internal tool use  │
│  (Meta, OSS)        │ No data leaves org   │ Sensitive infra    │
│                     │ Fine-tuneable        │ Air-gapped SOC     │
├─────────────────────┼──────────────────────┼────────────────────┤
│  Mistral Large      │ Strong reasoning     │ EU data residency  │
│  Mixtral 8x22B      │ Self-hostable        │ Compliance-heavy   │
│                     │ Fast MoE arch        │ environments       │
├─────────────────────┼──────────────────────┼────────────────────┤
│  DeepSeek R1        │ Strong reasoning     │ Caution: Chinese   │
│                     │ Cheap to run         │ origin, telemetry  │
│                     │ Open weights         │ concerns flagged   │
├─────────────────────┼──────────────────────┼────────────────────┤
│  Falcon 40B/180B    │ UAE-origin OSS       │ Sovereign AI use   │
│  (TII)              │ Fully open           │ cases              │
├─────────────────────┼──────────────────────┼────────────────────┤
│  Phi-3 / Phi-4      │ Small, fast, local   │ Edge agents        │
│  (Microsoft OSS)    │ Surprisingly capable │ Lightweight tasks  │
└─────────────────────┴──────────────────────┴────────────────────┘

Tier 3 — Security-Specialized Models

1
2
3
4
5
6
7
8
9
10
11
12
┌─────────────────────────────────────────────────────────────────┐
│  MODEL              │ BUILT FOR            │ STATUS             │
├─────────────────────┼──────────────────────┼────────────────────┤
│  Google SecLM       │ Security reasoning   │ Limited access     │
│                     │ Threat intel         │ Enterprise only    │
├─────────────────────┼──────────────────────┼────────────────────┤
│  Vulnhuntr Model    │ Zero-day discovery   │ Open source        │
│  (ProtectAI)        │ Code analysis        │ Claude-backed      │
├─────────────────────┼──────────────────────┼────────────────────┤
│  HackingBuddyGPT    │ Privilege escalation │ Research tool      │
│                     │ research             │ GPT-4 backed       │
└─────────────────────┴──────────────────────┴────────────────────┘

Why Models Flag Security Actions as Malicious

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
┌─────────────────────────────────────────────────────────┐
│              MODEL SAFETY FILTER LOGIC                   │
│                                                          │
│  Input Prompt                                            │
│       │                                                  │
│       ▼                                                  │
│  ┌─────────────────────┐                                │
│  │  Intent Classifier  │ ◀── Trained on harmful         │
│  │                     │     intent patterns            │
│  └────────┬────────────┘                                │
│           │                                              │
│     ┌─────┴──────┐                                      │
│     ▼            ▼                                       │
│  BENIGN       AMBIGUOUS ──▶ Context Analyzer            │
│     │            │               │                       │
│     │            ▼               ▼                       │
│     │        FLAGGED ──▶  Hard Refusal                  │
│     │        UNCLEAR ──▶  Soft Refusal / Watered down   │
│     ▼                                                    │
│  PROCEED                                                 │
└─────────────────────────────────────────────────────────┘

What Specifically Triggers Flags:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
HIGH RISK TRIGGERS (almost always blocked)
├── "exploit", "payload", "shellcode", "reverse shell"
├── "bypass authentication", "privilege escalation"
├── "exfiltrate data", "extract credentials"
├── Specific CVE numbers + "how to exploit"
└── Tool names in offensive context (Metasploit + attack intent)

MEDIUM RISK TRIGGERS (context-dependent)
├── "nmap scan" + target (flagged if target looks external/public)
├── "password cracking" without defensive framing
├── "port scan" on IP ranges
├── "vulnerability assessment" without authorization context
└── Combining recon + exploitation terminology

LOW RISK / USUALLY PASSES
├── "vulnerability assessment" with org context
├── "authorized penetration test"
├── "security audit", "hardening", "compliance check"
├── Defensive tool discussion without live execution
└── Abstract architecture and logic discussion

Privacy Architecture — Data Sanitization Layer

The Core Privacy Problem:

1
2
3
4
5
6
7
8
9
10
┌────────────────────────────────────────────────────┐
│  WHAT GETS SENT TO CLOUD LLM API                   │
│                                                     │
│  ❌ Raw nmap output  → exposes internal IPs        │
│  ❌ Service banners  → exposes software versions   │
│  ❌ Hostnames        → exposes internal naming     │
│  ❌ CVE findings     → exposes vulnerability data  │
│  ❌ Credentials      → catastrophic if logged      │
│  ❌ Network topology → exposes architecture        │
└────────────────────────────────────────────────────┘

Solution — Sanitization Pipeline:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
RAW SCAN DATA
      │
      ▼
┌─────────────────────────────────────────────────────┐
│              SANITIZATION PIPELINE                   │
│                                                      │
│  STEP 1 — TOKENIZATION                              │
│  192.168.1.45  ──▶  HOST_A                         │
│  10.0.0.100    ──▶  HOST_B                         │
│  db-prod-01    ──▶  HOSTNAME_1                      │
│  api-internal  ──▶  HOSTNAME_2                      │
│                                                      │
│  STEP 2 — SENSITIVITY CLASSIFICATION               │
│  [PUBLIC]   service name, port number, protocol     │
│  [INTERNAL] IP, hostname, MAC, OS fingerprint       │
│  [CRITICAL] credentials, keys, tokens              │
│                                                      │
│  STEP 3 — SELECTIVE STRIPPING                       │
│  Send to LLM:   [PUBLIC] fields only                │
│  Keep local:    [INTERNAL] + [CRITICAL] fields      │
│                                                      │
│  STEP 4 — CORRELATION MAP (local only)             │
│  HOST_A = 192.168.1.45  (never leaves environment) │
│  HOST_B = 10.0.0.100    (never leaves environment) │
│                                                      │
│  STEP 5 — DE-TOKENIZATION after LLM response       │
│  LLM says: "HOST_A has critical vuln on port 443"  │
│  Agent maps back: 192.168.1.45:443                 │
└─────────────────────────────────────────────────────┘

Privacy-Preserving Architecture Patterns

Pattern 1 — Hybrid Architecture

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
┌──────────────────────────────────────────────────────────┐
│                   HYBRID ARCHITECTURE                     │
│                                                          │
│  SENSITIVE DATA PATH          PUBLIC DATA PATH           │
│                                                          │
│  Raw Scan Results             Service Names              │
│       │                       Port Numbers               │
│       ▼                            │                     │
│  Local LLM                         ▼                     │
│  (Llama 70B)                  Cloud LLM                  │
│  Air-gapped                   (Claude/GPT)               │
│       │                            │                     │
│       └──────────┬─────────────────┘                    │
│                  ▼                                       │
│           Merged Results                                 │
│           (local only)                                   │
└──────────────────────────────────────────────────────────┘

Pattern 2 — Zero Trust Data Flow

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
PRINCIPLE: LLM only ever sees what it NEEDS to reason about

Task: "Analyze this service for vulnerabilities"
                │
                ▼
What LLM receives:
├── Service: Apache HTTP Server
├── Version: 2.4.49
├── Port: 443
└── OS family: Linux

What LLM NEVER receives:
├── IP address
├── Hostname
├── Organization name
├── Internal network range
└── Any credential material

Pattern 3 — On-Premise LLM Deployment

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
CORPORATE NETWORK BOUNDARY
┌──────────────────────────────────────────────────────┐
│                                                      │
│   Security Agent                                     │
│        │                                             │
│        ▼                                             │
│   Local Ollama / vLLM / TGI Server                  │
│   Running Llama 3.1 70B / Mistral Large             │
│        │                                             │
│        ▼                                             │
│   All data stays inside perimeter                   │
│   No API calls leave the network                    │
│   Full audit log of every prompt/response           │
│                                                      │
└──────────────────────────────────────────────────────┘
         ❌ NO OUTBOUND DATA TO CLOUD APIs

Model Selection Decision Tree

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
START: What data will the agent process?
              │
     ┌────────┴────────┐
     ▼                 ▼
SENSITIVE           PUBLIC/GENERIC
(IPs, hostnames,    (service names,
credentials, PII)   CVE IDs, ports)
     │                 │
     ▼                 ▼
Self-hosted LLM    Cloud LLM OK
(Llama/Mistral)    (Claude/GPT-4o)
     │                 │
     ▼                 ▼
Is air-gap         Is latency
required?          critical?
     │                 │
   Yes/No           Yes/No
     │                 │
     ▼                 ▼
Offline Ollama    GPT-4o-mini /
+ local inference  Claude Haiku

Compliance Alignment by Model Choice

1
2
3
4
5
6
7
8
9
10
11
12
13
┌──────────────────┬────────┬────────┬────────┬────────┐
│ Requirement      │ Claude │ GPT-4o │ Llama  │Mistral │
│                  │ API    │ API    │ Local  │ Local  │
├──────────────────┼────────┼────────┼────────┼────────┤
│ SOC 2 Type II    │  ✅    │  ✅    │  ✅    │  ✅    │
│ ISO 27001        │  ✅    │  ✅    │  ✅    │  ✅    │
│ GDPR             │ ⚠️ DPA │ ⚠️ DPA │  ✅    │  ✅    │
│ HIPAA            │ ⚠️ BAA │ ⚠️ BAA │  ✅    │  ✅    │
│ Air-gap required │  ❌    │  ❌    │  ✅    │  ✅    │
│ Data residency   │ ⚠️     │ ⚠️     │  ✅    │  ✅    │
│ No training use  │ ⚠️ opt │ ⚠️ opt │  ✅    │  ✅    │
│ FedRAMP          │  ❌    │  ✅    │  ✅    │  ❌    │
└──────────────────┴────────┴────────┴────────┴────────┘
⚠️ = Possible with additional agreements/configurationDPA = Data Processing AgreementBAA = Business Associate Agreement

Golden Rules

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
┌─────────────────────────────────────────────────────────┐
│                GOLDEN RULES                              │
│                                                          │
│  1. Sensitive data + Cloud LLM = Architecture failure   │
│                                                          │
│  2. Sanitize → Send → Correlate → De-tokenize           │
│                                                          │
│  3. Model flags = signal, not blocker                   │
│     Engineer your prompts and context to be             │
│     unambiguously defensive in framing                  │
│                                                          │
│  4. Self-hosted models for internal infra always        │
│                                                          │
│  5. Cloud LLMs only for generic reasoning tasks         │
│     that contain zero org-specific context              │
│                                                          │
│  6. Every prompt + response = audit log entry           │
└─────────────────────────────────────────────────────────┘

7. External SSD Installation

The Core Storage Problem

1
2
3
4
5
6
7
8
9
Llama 3.1 70B    ──▶  ~40GB  (Q4 quantized)
Llama 3.1 70B    ──▶  ~140GB (full precision)
Mistral Large    ──▶  ~24GB  (Q4 quantized)
Mixtral 8x22B    ──▶  ~80GB  (Q4 quantized)
─────────────────────────────────────────────
Internal SSD     ──▶  often 256GB-512GB total
OS + Tools       ──▶  already consuming 100GB+
─────────────────────────────────────────────
External SSD     ──▶  SOLUTION ✅

Ollama — External Storage Logic

1
2
3
4
5
6
7
8
9
DEFAULT PATH                    EXTERNAL PATH
~/.ollama/models/               /external-ssd/ollama/models/
      │                                    │
      │    Override via                    │
      └──── OLLAMA_MODELS ───────────────▶│
            environment var               │
                                          ▼
                                   Model loads from
                                   external SSD path

What You Configure:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
Environment Variable Approach:
─────────────────────────────
OLLAMA_MODELS = /path/to/external-ssd/ollama/models

This single variable redirects:
├── Model downloads      ──▶ external SSD
├── Model storage        ──▶ external SSD
├── Cache files          ──▶ external SSD
└── Blob storage         ──▶ external SSD

Internal disk usage:
├── Ollama binary        ──▶ stays internal (~50MB)
├── Config files         ──▶ stays internal (~1MB)
└── Runtime logs         ──▶ stays internal

Platform-Specific Logic:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
LINUX
─────
Set in:  /etc/systemd/system/ollama.service
         ~/.bashrc or ~/.zshrc
         /etc/environment (system-wide)

MACOS
─────
Set in:  ~/Library/LaunchAgents/ollama.plist
         ~/.zshrc
         launchctl setenv (runtime)

WINDOWS
───────
Set in:  System Environment Variables (GUI)
         PowerShell $env:OLLAMA_MODELS
         Registry for persistence

Symlink Alternative:

1
2
3
4
5
6
7
8
9
LOGIC:
─────────────────────────────────────────────
1. Install Ollama normally (internal disk)
2. Move ~/.ollama/models → /external-ssd/models
3. Create symlink: ~/.ollama/models ──▶ /external-ssd/models
4. Ollama sees original path, data lives externally

Pros: No config changes needed
Cons: Symlink breaks if external SSD unmounts

vLLM — External Storage Logic

Storage Architecture:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
vLLM STORAGE LAYERS
─────────────────────────────────────────────────
Layer 1: HuggingFace Model Cache
         Default: ~/.cache/huggingface/hub/
         Override: HF_HOME or HF_HUB_CACHE
         Size:     LARGEST — full model weights here

Layer 2: vLLM Runtime Cache
         Default: ~/.cache/vllm/
         Override: VLLM_CACHE_ROOT
         Size:     Medium — compiled kernels, configs

Layer 3: Python Package
         Default: site-packages/
         Size:     Small — stays internal is fine
─────────────────────────────────────────────────

REDIRECT LOGIC:
HF_HOME          = /external-ssd/huggingface
HF_HUB_CACHE     = /external-ssd/huggingface/hub
VLLM_CACHE_ROOT  = /external-ssd/vllm/cache

KV Cache Consideration:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
KV CACHE = temporary attention cache during inference

Default: GPU VRAM (fastest)
         ──▶ CPU RAM (overflow)
         ──▶ Disk (last resort)

If GPU VRAM insufficient:
┌─────────────────────────────────────────────┐
│  vLLM disk offload ──▶ external SSD         │
│                                             │
│  Performance impact:                        │
│  NVMe internal   ──▶ ~7000 MB/s read        │
│  USB 3.2 SSD     ──▶ ~1000 MB/s read        │
│  Thunderbolt SSD ──▶ ~3000 MB/s read        │
│                                             │
│  Recommendation: Thunderbolt SSD for        │
│  KV cache offload in production             │
└─────────────────────────────────────────────┘

TGI — Docker Volume Mount Logic

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
DOCKER RUN LOGIC:
─────────────────────────────────────────────
Host path (external SSD)   Container path
/external-ssd/tgi/models ──▶ /data
                               │
                               ├── model weights
                               ├── tokenizer files
                               └── config files
─────────────────────────────────────────────

TGI STORAGE BREAKDOWN
──────────────────────────────────────────────────────
Component          Default Location    Size
──────────────────────────────────────────────────────
Model weights      /data/              LARGE (redirect)
HF cache           ~/.cache/hf/        LARGE (redirect)
Compiled kernels   /tmp/tgi-kernels/   Medium
Flash attention    auto-compiled       Small
Router binary      /usr/local/bin/     Small (internal ok)
──────────────────────────────────────────────────────

Environment overrides:
HUGGING_FACE_HUB_TOKEN  = your_token
HF_HOME                 = /external-ssd/hf
MODEL_ID                = meta-llama/Llama-3.1-70B

External SSD Interface Performance Comparison

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
┌─────────────────────────────────────────────────────────┐
│  CONNECTION    │ MAX SPEED  │ LATENCY  │ RECOMMENDED?   │
├────────────────┼────────────┼──────────┼────────────────┤
│ Thunderbolt 4  │ 3500 MB/s  │ Very low │ ✅ Best        │
│ Thunderbolt 3  │ 2800 MB/s  │ Very low │ ✅ Excellent   │
│ USB 3.2 Gen2x2 │ 2000 MB/s  │ Low      │ ✅ Good        │
│ USB 3.2 Gen2   │ 1000 MB/s  │ Low      │ ✅ Acceptable  │
│ USB 3.2 Gen1   │  500 MB/s  │ Medium   │ ⚠️ Slow load   │
│ USB 3.0        │  400 MB/s  │ Medium   │ ⚠️ Painful     │
│ USB 2.0        │   60 MB/s  │ High     │ ❌ Unusable    │
└────────────────┴────────────┴──────────┴────────────────┘

Real world model load times (70B Q4 ~40GB):
Thunderbolt 4   ──▶  ~12 seconds
USB 3.2 Gen2    ──▶  ~40 seconds
USB 3.0         ──▶  ~100 seconds

Critical Considerations for Security Lab

Filesystem Permissions Logic:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
PERMISSION FLOW:
─────────────────────────────────────────────────
External SSD mount
      │
      ├── Must be mounted with execute permissions
      │   (noexec flag will break model loading)
      │
      ├── Ollama daemon user needs read/write
      │
      ├── vLLM process user needs read/write
      │
      └── Docker (TGI) needs volume mount permissions
          ── rootless Docker: user namespace mapping
          ── root Docker:     straightforward

Unmount Risk Logic:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
RISK SCENARIO:
─────────────────────────────────────────────────
External SSD unmounts during inference
      │
      ▼
Model weights inaccessible mid-generation
      │
      ▼
┌─────────────────────────────────────────────┐
│  Ollama  ──▶ crashes, requires restart      │
│  vLLM    ──▶ SIGKILL, loses active request │
│  TGI     ──▶ container error, auto-restart │
└─────────────────────────────────────────────┘

MITIGATIONS:
├── Auto-mount via /etc/fstab (Linux)
├── Disable sleep/hibernate during inference
├── Use powered USB hub (prevents power-related unmounts)
└── Monitor mount status in agent health check

Recommended Air-Gapped Security Lab Layout:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
RECOMMENDED SETUP:
─────────────────────────────────────────────────
External NVMe SSD (Thunderbolt)
      │
      ├── /models/ollama/     ──▶ Ollama weights
      ├── /models/vllm/       ──▶ vLLM + HF cache
      ├── /models/tgi/        ──▶ TGI model data
      ├── /cache/             ──▶ Shared kernel cache
      └── /audit-logs/        ──▶ Agent prompt/response logs

Internal SSD (keep light):
      ├── OS + binaries
      ├── Docker engine
      ├── Ollama/vLLM/TGI executables
      └── Config files only

Tool Comparison Summary

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
                    OLLAMA
                   ────────
Best for:   Quick iteration, testing multiple models
            Single-user security workstation
            Easiest external SSD setup (one env var)
            Best for: your daily driver agent work

                    vLLM
                   ──────
Best for:   Production-grade inference server
            Multi-user SOC team access
            OpenAI-compatible API (drop-in replacement)
            Best for: team-shared internal LLM endpoint

                    TGI
                   ─────
Best for:   HuggingFace-native model ecosystem
            Containerized deployment (Docker/K8s)
            Token streaming heavy workloads
            Best for: CI/CD integrated security pipelines

8. Best Ollama Models

What This Project Demands From a Model

1
2
3
4
5
6
7
8
9
10
11
CAPABILITY REQUIREMENTS
─────────────────────────────────────────────────────
✅ Strong structured output (JSON tool calls)
✅ Long context (scan results can be massive)
✅ Multi-step reasoning (ReAct loop)
✅ Code understanding (parsing, formatting logic)
✅ Security domain knowledge (CVEs, services, protocols)
✅ Tool use reliability (no hallucinated function calls)
✅ Low hallucination on technical facts
✅ Runs comfortably on consumer/prosumer hardware
─────────────────────────────────────────────────────

Tier 1 — Best Overall

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
┌─────────────────────────────────────────────────────────────┐
│  llama3.1:70b                                               │
│  ─────────────────────────────────────────────────────────  │
│  Context:     128K tokens ✅ (massive scan outputs fit)     │
│  Tool use:    Native function calling ✅                    │
│  Reasoning:   Best in class for local models ✅             │
│  Security IQ: Strong CVE, protocol, service knowledge ✅    │
│  JSON output: Very reliable ✅                              │
│  VRAM needed: 48GB (Q4) — needs good GPU or CPU offload    │
│  Disk space:  ~40GB on external SSD                        │
│                                                             │
│  VERDICT: Best capability, highest hardware requirement     │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  llama3.1:8b                                                │
│  ─────────────────────────────────────────────────────────  │
│  Context:     128K tokens ✅                                │
│  Tool use:    Native function calling ✅                    │
│  Reasoning:   Good for structured tasks ✅                  │
│  Security IQ: Moderate — knows common CVEs/services        │
│  JSON output: Reliable ✅                                   │
│  VRAM needed: 8GB ✅ (runs on most GPUs)                   │
│  Disk space:  ~5GB on external SSD                         │
│                                                             │
│  VERDICT: Best balance of performance vs hardware ✅        │
│           RECOMMENDED STARTING POINT                        │
└─────────────────────────────────────────────────────────────┘

Tier 2 — Strong Alternatives

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
┌─────────────────────────────────────────────────────────────┐
│  mistral:7b / mistral-nemo:12b                              │
│  JSON output: Very reliable (trained heavily on it)        │
│  VERDICT: Excellent JSON reliability, good for              │
│           formatter and router agent roles                  │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  deepseek-r1:14b / deepseek-r1:32b                         │
│  Reasoning:   Exceptional — chain of thought native ✅      │
│  VERDICT: Best pure reasoning, use for analysis layer       │
│  ⚠️  Data sovereignty concern — Chinese origin model        │
│      Use only on air-gapped internal lab                    │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  qwen2.5:14b / qwen2.5-coder:14b                           │
│  Security IQ: Excellent code + config understanding ✅      │
│  VERDICT: Best for code analysis tasks (SAST, config audit)│
│  ⚠️  Same sovereignty note as DeepSeek                      │
└─────────────────────────────────────────────────────────────┘

Security-Specialized Models (New Research)

RedSage — Most Relevant Discovery

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
┌─────────────────────────────────────────────────────────────┐
│  RedSage (8B)                                               │
│  ─────────────────────────────────────────────────────────  │
│  Purpose:    BUILT specifically for cybersecurity           │
│  Size:       8B — runs on consumer hardware ✅              │
│  Origin:     arxiv.org/html/2509.13021v1                   │
│  Strength:   Security-domain fine-tuned                     │
│              ── CVE reasoning                               │
│              ── Vulnerability classification                │
│              ── Offensive/defensive terminology             │
│              ── Tool output interpretation                  │
│  Privacy:    Fully local ✅                                 │
│  VRAM:       8GB minimum                                    │
│                                                             │
│  VERDICT: Slot as PRIMARY analysis agent                    │
│           Understands security context natively             │
│           without prompt engineering workarounds            │
└─────────────────────────────────────────────────────────────┘

Qwen3 — Upgraded Recommendation

1
2
3
4
5
6
7
8
9
┌─────────────────────────────────────────────────────────────┐
│  Qwen3:8b / Qwen3:14b / Qwen3:32b                         │
│  ─────────────────────────────────────────────────────────  │
│  Thinking mode: Qwen3 has built-in think/no-think toggle   │
│                 ── Think ON  = deep reasoning (analysis)   │
│                 ── Think OFF = fast response (formatting)  │
│  VERDICT: Most flexible model in the stack                  │
│           One model doing two agent roles                   │
└─────────────────────────────────────────────────────────────┘

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
┌──────────────────────────────────────────────────────────────┐
│                  AGENT PIPELINE MODEL MAP                    │
│                                                              │
│  [Orchestrator Agent]  ──▶  llama3.1:70b                   │
│   Plans, reasons,             or llama3.1:8b                │
│   routes decisions            (best reasoning + tool use)   │
│          │                                                   │
│          ▼                                                   │
│  [Parser / Formatter]  ──▶  mistral-nemo:12b               │
│   Normalizes data,            (reliable JSON output)        │
│   routes to tools                                            │
│          │                                                   │
│          ▼                                                   │
│  [Analysis Agent]      ──▶  deepseek-r1:14b                │
│   CVE correlation,            (deep chain-of-thought)       │
│   risk scoring,                                              │
│   prioritization                                             │
│          │                                                   │
│          ▼                                                   │
│  [Code Audit Agent]    ──▶  qwen2.5-coder:14b              │
│   Config review,              (best code understanding)     │
│   SAST logic,                                                │
│   secret detection                                           │
│          │                                                   │
│          ▼                                                   │
│  [Embedding / Search]  ──▶  nomic-embed-text               │
│   Historical scan             (semantic correlation)        │
│   correlation                                                │
└──────────────────────────────────────────────────────────────┘

Burp Suite Integration Architecture

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
Burp Suite Professional
        │
        │  MontoyaAPI / Extension
        ▼
┌─────────────────────────┐
│   Burp LLM Extension    │
│                         │
│   Sends to local LLM:   │
│   ── HTTP request data  │
│   ── Response analysis  │
│   ── Param suggestions  │
│   ── Payload ideas      │
└────────────┬────────────┘
             │
             │  Ollama API (localhost:11434)
             ▼
┌─────────────────────────┐
│   Local Ollama Server   │
│   Running RedSage:8b    │
│   or Qwen3:14b          │
│                         │
│   Returns:              │
│   ── Vuln classification│
│   ── Risk assessment    │
│   ── Next test steps    │
│   ── Remediation notes  │
└─────────────────────────┘

DATA PRIVACY MAINTAINED:
HTTP traffic never leaves your machine ✅

Complete Local Security Agent Stack

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
┌────────────────────────────────────────────────────────────────┐
│              COMPLETE LOCAL SECURITY AGENT STACK               │
│                                                                │
│  INTERFACE LAYER                                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │  Open WebUI  │  │  Burp Suite  │  │  CLI Agent   │        │
│  │  (chat UI)   │  │  Extension   │  │  (automated) │        │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘        │
│         └─────────────────┼─────────────────┘                 │
│                           │                                    │
│  OLLAMA API LAYER (localhost:11434)                           │
│  ┌────────────────────────▼───────────────────────────┐       │
│  │              Ollama Server                          │       │
│  │  ┌──────────────┐      ┌──────────────────────┐   │       │
│  │  │ RedSage:8b   │      │ Qwen3:14b            │   │       │
│  │  │ Security     │      │ Orchestrator         │   │       │
│  │  │ Specialist   │      │ + Formatter          │   │       │
│  │  └──────────────┘      └──────────────────────┘   │       │
│  │  ┌──────────────┐      ┌──────────────────────┐   │       │
│  │  │DeepSeek-R1   │      │ nomic-embed-text     │   │       │
│  │  │Attack Path   │      │ Semantic Search      │   │       │
│  │  │Reasoning     │      │ Historical Scans     │   │       │
│  │  └──────────────┘      └──────────────────────┘   │       │
│  └────────────────────────────────────────────────────┘       │
│                           │                                    │
│  TOOL LAYER (MCP Servers)                                     │
│  ┌──────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐   │
│  │ Nmap MCP │ │Nuclei  │ │Shodan  │ │TheHive │ │  MISP  │   │
│  │ Parser   │ │MCP     │ │MCP     │ │MCP     │ │  MCP   │   │
│  └──────────┘ └────────┘ └────────┘ └────────┘ └────────┘   │
│                           │                                    │
│  STORAGE LAYER (External SSD)                                 │
│  ┌────────────────────────▼───────────────────────────┐       │
│  │  /models/     /cache/     /scan-data/   /audit/    │       │
│  │  (weights)    (kernels)   (results)     (logs)     │       │
│  └────────────────────────────────────────────────────┘       │
└────────────────────────────────────────────────────────────────┘

External SSD Space Planning

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
MODEL STORAGE BUDGET
──────────────────────────────────────────────
Model                   Q4 Size    Priority
──────────────────────────────────────────────
llama3.1:8b             5GB        ✅ Must have
llama3.1:70b            40GB       ✅ If GPU allows
mistral-nemo:12b        7GB        ✅ Must have
deepseek-r1:14b         9GB        ✅ Recommended
qwen2.5-coder:14b       9GB        ✅ Recommended
phi4:14b                8GB        ⚠️  Optional
nomic-embed-text        0.5GB      ✅ Must have
──────────────────────────────────────────────
Minimum viable stack:   ~22GB
Full stack:             ~79GB
──────────────────────────────────────────────

Hardware Tiers

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
MINIMUM VIABLE (16GB RAM + 8GB VRAM)
──────────────────────────────────────
Runs:
├── RedSage:8b          ✅ Primary security analyst
├── llama3.1:8b         ✅ Orchestrator
├── nomic-embed-text    ✅ Embeddings
└── One model at a time (swap between roles)
External SSD budget: ~15GB

RECOMMENDED (32GB RAM + 16GB VRAM)
──────────────────────────────────────
Runs:
├── RedSage:8b          ✅ Security analysis agent
├── Qwen3:14b           ✅ Orchestrator + formatter
├── DeepSeek-R1:14b     ✅ Deep reasoning agent
├── nomic-embed-text    ✅ Embedding agent
└── Two models simultaneously in VRAM
External SSD budget: ~35GB

OPTIMAL (64GB RAM + 24GB+ VRAM)
──────────────────────────────────────
Runs:
├── llama3.1:70b        ✅ Master orchestrator
├── RedSage:8b          ✅ Security specialist
├── Qwen3:32b           ✅ Complex reasoning
├── DeepSeek-R1:32b     ✅ Attack path analysis
├── qwen2.5-coder:14b   ✅ Code/config auditor
└── nomic-embed-text    ✅ Semantic search
External SSD budget: ~100GB

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# PHASE 1 — Core (get running fast)
ollama pull llama3.1:8b          # Orchestrator
ollama pull nomic-embed-text     # Embeddings
# Total: ~6GB

# PHASE 2 — Security Specialization
ollama pull redsage              # Security analyst
ollama pull qwen3:8b             # Formatter + router
# Total: +~10GB

# PHASE 3 — Deep Reasoning (air-gapped only)
ollama pull deepseek-r1:14b      # Attack path analysis
# Total: +~9GB

# PHASE 4 — Scale Up When Hardware Allows
ollama pull llama3.1:70b         # Master orchestrator
ollama pull qwen3:32b            # Complex reasoning
# Total: +~60GB

Key Architecture Insight

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
┌─────────────────────────────────────────────────────────┐
│                                                         │
│  General LLMs (Llama, Mistral)                         │
│  ── Good orchestrators and formatters                   │
│  ── Need heavy prompt engineering for security          │
│  ── May soft-refuse certain security terminology        │
│                                                         │
│  Security-Specialized (RedSage)                        │
│  ── Understands security context natively               │
│  ── No refusal friction on legitimate offsec tasks      │
│  ── Less prompt engineering overhead                    │
│  ── Slot as your analysis + classification agent        │
│                                                         │
│  BEST ARCHITECTURE:                                     │
│  General LLM orchestrates                              │
│  + Security LLM analyzes                               │
│  + Embedding model correlates                           │
│  = Full capability, minimal friction                    │
│                                                         │
└─────────────────────────────────────────────────────────┘

Document generated from a live technical discussion on agentic AI security pipelines, passive reconnaissance, and local LLM deployment for authorized corporate security engineering.

This post is licensed under CC BY 4.0 by the author.

Recent Update

    Contents