Autonomous Cyber Weapons? The future of AI in Cybersecurity.
What I called in 2017–2019, what landed by 2025, and what still needs fixing.
You don’t see the future. You feel the gradient under your boots.
“An area ripe for innovation in the security and criminal landscape is artificial intelligence and machine learning.” – Rik Ferguson, IP Expo, 2017
Back in 2017, at IP Expo in London, I warned that criminals would weaponise AI/ML to automate intrusion and that our human reaction window would begin to collapse. I proposed an ethical “autonomous hacker” to stress-test defences at machine speed. We discussed metamorphic, AI-written code; the case for enforceable IoT baselines and vendor responsibility for damages; and the real-world risk from connected vehicles becoming a distributed arsenal for terrorists. The core message was simple: see autonomy coming, build toolsets that operate at speed and scale, and treat device safety as policy, not hope. The write-up caught the debate in the raw (mostly because I was responding off the cuff to Wendy Nather’s question ‘What keeps you up at night?’). I apologise, belatedly, if I contributed to anyone else’s insomnia.
Two years later I put the case more formally in an article for Forbes: autonomy would enter criminal tradecraft, impersonation would get frighteningly good, and defenders would have to stop threat-modelling like humans and start reasoning in machine behaviours.
“In the near future, these academic models may well become our greatest weakness, restricting our understanding of the future by describing it in terms of the past.” – Rik Ferguson, Forbes, 2019
Coverage of my session at the 2019 CloudSec event later that week framed the takeaway plainly: intelligent, adaptive, non-human attackers aren’t sci-fi, they’re a planning assumption.
“AI doesn’t think like a human. As defenders, we chain ourselves to our own preconceptions, but AI is free to think in totally different ways.” – Rik Ferguson, CloudSec 2019
So, was I early, wrong or on the money?
Early on some timings, right on the direction. Most real-world abuse so far has been “AI as accelerator” rather than fully autonomous end-to-end campaigns. Humans still run the op; AI speeds the chores: writing lures, translating, summarising leaks, suggesting code, sifting stolen creds. It’s still co-pilot, not autopilot.
That matters less than you think: autonomy is arriving unevenly, but it is arriving. Pockets of real autonomy already exist where the environment is predictable: inside your own enterprise assistants, inside scripted agent chains, inside retrieval pipelines that will happily accept a poisoned page as “ground truth”.
Your finance “helper” agent gets a broadly scoped API key and unrestricted access to Slack, Jira and email. A prompt injection in a pasted ticket persuades it to pull customer data and send it out “for validation”. No threat actor on a keyboard; your design did that.
Your RAG bot trusts whatever the intranet search returns. An attacker plants a doc titled “Emergency vendor payout process” with embedded instructions. The bot reads it and dutifully drafts emails and payment requests that look policy-compliant.
A build agent or notebook includes long-lived tokens “for convenience”. An agent chain logs its prompts and tool outputs to a shared bucket. One poisoned page later, the chain exfiltrates those tokens itself. Treat models, agents and toolchains as first-class assets with owners, define acceptable blast-radius boundaries and implement kill switches, or they’ll be turned against you.
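To make that concrete, here is a minimal sketch, assuming a hypothetical agent credential service: scopes narrow enough to name, expiry short enough that forgotten tokens die on their own, and one kill-switch call that takes the agent off the board. None of these names map to a specific vendor API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical revocation registry: agent IDs that have been kill-switched.
REVOKED_AGENTS: set[str] = set()

@dataclass
class AgentCredential:
    """A short-lived, narrowly scoped credential issued to one agent."""
    agent_id: str
    scopes: frozenset[str]          # e.g. {"jira:read", "slack:post:#finance"}
    expires_at: datetime

    def allows(self, scope: str) -> bool:
        """Valid only if unrevoked, unexpired and in scope."""
        if self.agent_id in REVOKED_AGENTS:
            return False
        if datetime.now(timezone.utc) >= self.expires_at:
            return False
        return scope in self.scopes

def issue_credential(agent_id: str, scopes: set[str], ttl_minutes: int = 15) -> AgentCredential:
    """Issue a credential that dies on its own even if nobody cleans it up."""
    return AgentCredential(
        agent_id=agent_id,
        scopes=frozenset(scopes),
        expires_at=datetime.now(timezone.utc) + timedelta(minutes=ttl_minutes),
    )

def kill_switch(agent_id: str) -> None:
    """Blast-radius control: instantly invalidate everything this agent holds."""
    REVOKED_AGENTS.add(agent_id)

# The finance helper gets only what it needs, for fifteen minutes.
cred = issue_credential("finance-helper", {"jira:read", "slack:post:#finance"})
assert cred.allows("jira:read")
assert not cred.allows("email:send")    # denied: out of scope
kill_switch("finance-helper")
assert not cred.allows("jira:read")     # denied: revoked
```

The point is not the code, it is the shape: an owner who can name the scopes, an expiry nobody has to remember, and a single call that turns the agent off.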
“Consider the colonisation of multiple channels of communication within your organisation by an entity capable of mimicking the individual communication styles of your employees including video and audio.” – Rik Ferguson, Forbes, 2019
Impersonation matured faster than I’d hoped. It’s no longer about grammar and spelling; it’s about pace, language, and credible context. Attackers now match cadence, tone and schedule: they mirror how your exec actually writes, when they send, who they cc, which acronyms they use. Voice and video are already good enough for rushed approvals, with urgency driven up by traditional social engineering. Live deepfake calls happen. Auto-translation kills the telltale poor grammar we relied on for decades; the lure feels native. Context is easy to steal: calendar scrapes, org charts, travel posts, leaked slide decks. Pretexting got a supercharge. The only effective control is verification. A “We take IT security seriously” poster on the wall won’t cut it anymore (if ever it did). Build the friction into the process and rehearse it.
Define “high-value actions”: payments, payroll changes, vendor banking, credential resets, data extracts, broadcast comms.
For each, mandate a second channel with a shared protocol and automate the policy gates (a minimal sketch follows this list). No exceptions.
Pre-register known-good channels and people. Unknown numbers and ad hoc Teams invites cannot approve high-value actions.
Watermark internal broadcasts and publish a validation step for staff. “Town hall invites only come via this list. If in doubt, call this number.”
Strip urgency from process. Add a mandatory cooling-off step for out-of-cycle requests.
Tabletop the scenario with finance, IT and comms; include a live deepfake; specify who can say no.
Lock down the information that powers context. Limit calendar visibility, suppress travel detail, clean public repos, kill legacy email auto-forwarding, and publish guidelines for out-of-office autoresponders.
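To make “automate policy gates” concrete, here is a minimal sketch in Python. The action names, the pre-registered channels and the cooling-off period are illustrative assumptions, not a real approval workflow; the point is that the gate, not the requester’s urgency, decides.

```python
from datetime import datetime, timedelta, timezone

# Illustrative policy data: which actions are high-value and which
# pre-registered channels are allowed to confirm them.
HIGH_VALUE_ACTIONS = {"payment", "payroll_change", "vendor_banking",
                      "credential_reset", "data_extract", "broadcast_comms"}
KNOWN_GOOD_CHANNELS = {"callback:+44200000000", "teams:finance-approvals"}
COOLING_OFF = timedelta(hours=4)   # mandatory pause for out-of-cycle requests

def approve(action: str,
            confirmation_channel: str,
            requested_at: datetime,
            in_cycle: bool) -> tuple[bool, str]:
    """Return (approved, reason). Low-value actions pass; high-value ones need
    a second, pre-registered channel and, if out-of-cycle, a cooling-off wait."""
    if action not in HIGH_VALUE_ACTIONS:
        return True, "not a high-value action"
    if confirmation_channel not in KNOWN_GOOD_CHANNELS:
        return False, "no confirmation on a pre-registered channel"
    if not in_cycle and datetime.now(timezone.utc) - requested_at < COOLING_OFF:
        return False, "out-of-cycle request still in cooling-off"
    return True, "verified out-of-band"

# A rushed, urgent 'CEO' request confirmed on an unknown number fails by design.
print(approve("payment", "callback:+44999999999",
              datetime.now(timezone.utc), in_cycle=False))
# -> (False, 'no confirmation on a pre-registered channel')
```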
“This malicious code would be fully context-aware, able to change both tactics and behaviour in response to, or even to pre-empt, an environment or state to its advantage.” – Rik Ferguson, Forbes, 2019
The “machine-speed attack/pen-test” piece is half here. In the lab this already works: agents can discover assets, enumerate users, pull misconfig maps, try known paths to Tier-0, pivot to cloud APIs, exfil a canary and tidy up. It chains scanners, BloodHound-like graphing, cloud CLIs, exploit repos and data mappers with minimal hand-holding. In the real world, it stumbles. Real networks throw CAPTCHAs, flaky SSO, rate limits, half-broken agents, inconsistent logs, MFA prompts and weird proxies. Models misread tool output, drop context, or chase the wrong objective. Don’t take comfort in brittleness; it’s a moving target. Each iteration adds tool use, memory, planning and guardrails. Your window to take advantage is measured in months, and our change windows are still human. The gap closes in the attacker’s favour.
Work agents into your drills. Run purple-team exercises where autonomous agents attempt your top five attack paths. Capture what failed and why. Fix the environment, not the demo.
Instrument the chain. Log every tool call, parameter, output and decision so you can build detections on sequences, not single blips.
Build agent containment before you need it: per-agent identities, scoped short-lived tokens, sandboxed runtimes, deny-by-default egress, and retrieval allow-lists (a sketch follows this list).
Automate enforcement. When a sequence trips policy, auto-revoke keys, kill sessions or quarantine the segment. Human approval should only be required when the blast radius is unclear.
Harden the brittle points you control. Normalise CLI outputs, standardise error messages, fix noisy proxies, document MFA paths. Reduce the terrain that confuses both attackers and your own agents. Deception should be sharp and intentional, not a by-product of messy infrastructure.
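On the containment point, a minimal deny-by-default sketch, assuming hypothetical per-agent allow-lists; in practice you would enforce this at the network and identity layers as well, not only in application code.

```python
from urllib.parse import urlparse

# Illustrative per-agent policy: anything not explicitly listed is denied.
EGRESS_ALLOW = {
    "finance-helper": {"api.internal.example.com"},
    "build-agent":    {"artifacts.internal.example.com"},
}
RETRIEVAL_ALLOW = {
    "rag-bot": {"intranet.example.com/policies", "intranet.example.com/handbook"},
}

def egress_permitted(agent_id: str, url: str) -> bool:
    """Deny-by-default egress: unknown agents and unknown hosts both fail."""
    host = urlparse(url).netloc
    return host in EGRESS_ALLOW.get(agent_id, set())

def retrieval_permitted(agent_id: str, source_path: str) -> bool:
    """Pin retrieval to allow-listed sources: a planted 'emergency process'
    document outside the list never reaches the model as ground truth."""
    allowed = RETRIEVAL_ALLOW.get(agent_id, set())
    return any(source_path.startswith(prefix) for prefix in allowed)

assert egress_permitted("finance-helper", "https://api.internal.example.com/v1/tickets")
assert not egress_permitted("finance-helper", "https://attacker.example.net/exfil")
assert not retrieval_permitted("rag-bot", "intranet.example.com/uploads/emergency-vendor-payout")
```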
Where I missed
I expected criminals to trust autonomy sooner, and I underrated how fast deepfake fraud would professionalise. What I didn’t say then, and what is table stakes now that the AI paradigm has begun to solidify: inventory the AI estate. Too many organisations still can’t name their models, prompts, connectors and secrets. If you can’t inventory it, you can’t secure it. You’re stacking cybersecurity vendor badges instead of fixing fundamentals. If you can’t list the assets, owners, data paths and controls, the stack is décor. You’ve optimised procurement, not security.
How to move the needle on risk
Start with the estate. Every model, agent, RAG pipeline, tool and connector goes in the system of record with an owner, data lineage, scopes, egress paths and a revocation method. If it can act, it needs least privilege and a kill switch. MDDRA, Make Deny-by-Default Real Again (still working on that acronym; brightly coloured baseball caps available soon): sandbox agents, pin retrieval to allow-listed sources, log model events like they’re syscalls.
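What “goes in the system of record” could look like, as a minimal sketch; the field names mirror the list above, and the registry itself is a stand-in for whatever CMDB or asset inventory you actually run.

```python
from dataclasses import dataclass, field

@dataclass
class AIAsset:
    """One entry in the AI estate inventory: blank fields mean not production-ready."""
    name: str                       # e.g. "invoice-triage-agent"
    kind: str                       # model | agent | rag_pipeline | tool | connector
    owner: str                      # a named, accountable human or team
    data_lineage: list[str] = field(default_factory=list)   # where its data comes from
    scopes: list[str] = field(default_factory=list)         # what it is allowed to touch
    egress_paths: list[str] = field(default_factory=list)   # where its output can go
    revocation_method: str = ""     # how to switch it off, today, without a ticket

def production_ready(asset: AIAsset) -> bool:
    """Least privilege and a kill switch are preconditions, not aspirations."""
    return bool(asset.owner and asset.scopes and asset.revocation_method)

estate = [
    AIAsset(name="invoice-triage-agent", kind="agent", owner="finance-platform-team",
            data_lineage=["erp:invoices"], scopes=["erp:read"],
            egress_paths=["smtp:internal"], revocation_method="revoke OAuth client 4711"),
    AIAsset(name="shadow-rag-bot", kind="rag_pipeline", owner=""),   # unowned: blocked
]
for a in estate:
    print(a.name, "OK" if production_ready(a) else "BLOCKED: incomplete record")
```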
Tune detection to reality. Watch for tool-use chains, odd key usage, sudden language shifts and paths to Tier-0, not just single alerts. Bind detections to enforcement you’ll actually use: revoke keys, kill sessions, quarantine runtimes, rate-limit egress. If a detection can’t change state, it’s noise.
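To illustrate detections on sequences that are bound to enforcement, a minimal sketch; the event shape, the suspicious pair and revoke_key are hypothetical stand-ins for your own telemetry and identity tooling.

```python
# Hypothetical tool-call events, in the order an agent emitted them.
events = [
    {"agent": "build-agent", "tool": "repo.search",  "target": "internal"},
    {"agent": "build-agent", "tool": "secrets.read", "target": "ci-token"},
    {"agent": "build-agent", "tool": "http.post",    "target": "attacker.example.net"},
]

# A detection on the sequence, not on any single blip:
# reading a secret and then posting somewhere off-estate.
SUSPICIOUS = ("secrets.read", "http.post")

def revoke_key(agent: str) -> None:
    """Stand-in for real enforcement: revoke credentials, kill sessions, quarantine."""
    print(f"[ENFORCE] revoking credentials for {agent}")

def detect_and_enforce(events: list[dict]) -> None:
    tools = [e["tool"] for e in events]
    for i in range(len(tools) - 1):
        if (tools[i], tools[i + 1]) == SUSPICIOUS:
            # A detection that changes state; anything less is noise.
            revoke_key(events[i]["agent"])
            return

detect_and_enforce(events)   # -> [ENFORCE] revoking credentials for build-agent
```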
Prebunk the extortion game. Assume flawless English and credible video. Mandate out-of-band controls for high-value actions, watermark internal broadcasts, and drill finance, legal and PR together. Communications are part of control.
Policy is a floor, not a parachute; it won’t save you mid-fall. Use it in procurement. If a supplier ships embedded models, they owe you a safety case, logging and incident reporting. “Trust me, bro” isn’t evidence.
“Rogue AI”, “evil LLM”, “trustworthy AI”? The truth here is simple: autonomy isn’t a villain or a hero, it’s a property, like speed or scale, that anyone can bolt onto their operations. Treat it as a design variable, not intent.
We don’t get to vote on how attackers adopt AI. We get to choose whether our environments are built to withstand it. – Me, now.



