Protecting Operational Technology: OT/ICS Security Fundamentals

The first generation of industrial control system architects did not think much about cybersecurity. They had no reason to. In 1985, a SCADA system controlling a pipeline was typically physically isolated from other networks. The threat model was equipment failure, not adversarial attack. The safety engineering was excellent. The security engineering was essentially nonexistent.

Forty years later, those same systems — or direct successors running the same protocols on similar hardware — are connected to business networks, vendor remote access channels, and in some cases the public internet. The convergence of IT and OT happened gradually, driven by legitimate business needs for remote monitoring, predictive maintenance, and operational visibility. It created an attack surface that those systems were never designed to have.

Stuxnet in 2010 demonstrated that ICS environments could be targeted with precision cyber weapons. Industroyer in 2016 and its successor Industroyer2 in 2022 showed that adversaries were building ICS-specific attack tooling that speaks industrial protocols directly and can disrupt physical processes such as power delivery. Triton/TRISIS in 2017 targeted safety instrumented systems, the last line of defense against physical catastrophe, which showed that adversaries were willing to risk human casualties to achieve their objectives. The threat to OT environments is not theoretical. It is documented, ongoing, and escalating.

What Makes OT Security Different

Security professionals trained in IT environments make predictable mistakes when they first engage with OT security: they apply IT security principles directly and create new problems. Understanding the differences is not academic — it is operationally critical.

Availability is the primary constraint. IT security practice conventionally treats confidentiality as the leading concern of the confidentiality, integrity, and availability triad. In OT security, availability is the non-negotiable priority. A water treatment plant controller that goes offline for a security patch can be more dangerous than one that keeps running with a known vulnerability. A turbine control system that is disrupted by a security scan can cause physical equipment damage costing millions of dollars and weeks of production loss. Security controls in OT environments must be designed around the assumption that disruption is not acceptable, even temporarily.

Legacy systems are permanent fixtures, not upgrade opportunities. An enterprise IT team can patch Windows in days. An OT team managing PLCs running firmware from 2003 often has no patch available. The vendor may no longer exist. The system may be embedded in equipment with a 30-year expected lifespan. Many OT environments operate systems with known critical vulnerabilities that cannot be patched because no patch exists and the system cannot be replaced without a multi-year capital project and extensive regulatory approval. Security in these environments must be achieved through compensating controls, not remediation.

Standard IT tools can cause physical harm. Network scanners, vulnerability assessment tools, and even some endpoint detection agents have been documented causing OT device crashes, control disruption, and loss of communication with field devices when deployed in OT networks without testing. The reason: many legacy OT devices and protocol stacks have little or no error handling for unexpected traffic. A TCP SYN scan that completes in milliseconds and is harmless on an IT network can crash a PLC that was never designed to receive unsolicited TCP connections. Every IT security tool must be tested in a lab environment against the specific OT device types in scope before deployment in production OT networks.

Physical consequences change the risk calculus entirely. An IT breach causes data loss, business disruption, and financial harm. An OT breach can cause those things plus: chlorine overdose in a water treatment system, natural gas pipeline rupture, refinery fire, power grid instability, industrial equipment destruction, and worker injury or death. The consequence severity for OT security failures creates ethical and legal obligations that IT security does not carry.

The Purdue Model and Its Limits

The reference architecture for OT security for three decades has been the Purdue Enterprise Reference Architecture (PERA), which defines a hierarchical model of network zones from field devices at Level 0 through business systems at Level 4, with strict traffic controls between levels and a demilitarized zone (Level 3.5) mediating communication between OT and IT networks.

The Purdue model's architecture is sound in principle. The problem is that modern industrial environments routinely violate it, often for legitimate operational reasons that the model did not anticipate: remote access for vendor support, historian replication to cloud analytics platforms, direct sensor integration with business intelligence systems, and the general erosion of zone boundaries as IT-OT convergence programs progress.

Asset inventory in most OT environments reveals this reality starkly. The assumed architecture and the actual architecture differ significantly in every environment we have assessed. Systems that should not communicate directly do. Jump servers that should gate all remote access have been supplemented with secondary remote access tools installed by individual vendors for their convenience. Wi-Fi networks installed for operational efficiency cross zone boundaries that the wired network architecture maintains carefully.

The practical implication: network segmentation cannot be trusted without verification. The first step in any OT security engagement is comprehensive network discovery to understand what is actually connected to what — not what the network diagram says should be connected.
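One way to picture that verification step is as a set comparison between the flows the network diagram documents and the flows actually observed on the wire. The sketch below is illustrative only: the `Flow` tuple, the `compare_flows` helper, and every address and port in it are hypothetical, and in practice the observed set would come from passive capture rather than a hand-written literal.

```python
from __future__ import annotations

from typing import NamedTuple


class Flow(NamedTuple):
    """A simplified communication relationship (hypothetical model)."""
    src: str
    dst: str
    port: int


def compare_flows(documented: set[Flow], observed: set[Flow]) -> dict[str, set[Flow]]:
    """Split flows into those seen but not documented, and documented but not seen."""
    return {
        "undocumented": observed - documented,  # on the wire, not on the diagram
        "unobserved": documented - observed,    # on the diagram, never seen
    }


# Hypothetical example data: what the diagram claims versus what was captured.
documented = {
    Flow("10.0.3.10", "10.0.2.20", 502),    # HMI to PLC, Modbus/TCP
    Flow("10.0.3.11", "10.0.2.21", 44818),  # engineering station to PLC, EtherNet/IP
}
observed = {
    Flow("10.0.3.10", "10.0.2.20", 502),
    Flow("192.168.50.7", "10.0.2.20", 3389),  # unexpected remote desktop session
}

report = compare_flows(documented, observed)
for flow in sorted(report["undocumented"]):
    print("Undocumented flow:", flow)
```

The "undocumented" set is where secondary vendor remote access tools and zone-crossing Wi-Fi links tend to surface. The "unobserved" set is also worth reviewing, since a documented flow that never appears may mean the sensor cannot see that segment.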

Passive Asset Discovery and Network Monitoring

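Given the constraint that active scanning can disrupt OT devices, passive network monitoring is the foundation of OT security visibility. A passive sensor receives a copy of traffic from a switch SPAN port or network tap and injects no traffic of its own, so it can discover every device that communicates on the monitored segments and establish a behavioral baseline without risk to operational systems. Its coverage is limited to what it can see: devices that stay silent or sit on unmonitored segments will not appear.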

Tools specifically designed for OT environments, such as Claroty, Dragos, and Nozomi Networks, understand the protocol landscape of industrial networks: Modbus, DNP3, IEC 61850, PROFINET, EtherNet/IP, OPC-UA. General-purpose IT network monitoring tools typically parse these protocols only superficially, if at all, and so cannot interpret the communication patterns that distinguish normal operation from attack activity.

What passive OT network monitoring reveals: every communicating device and its communication relationships, protocol anomalies that may indicate attack activity or device malfunction, unexpected changes in device behavior (a PLC that has operated identically for three years suddenly changing its communication pattern), unauthorized devices that appeared on the network, and remote access sessions from unexpected sources.

The behavioral baseline for OT networks is unusually stable compared to IT networks. Industrial processes are repetitive by design. A pump station controller that runs the same sequence of valve operations every 6 hours for years generates a highly predictable communication pattern. Deviations from that pattern are high-fidelity indicators of either attack activity or equipment malfunction, both of which warrant immediate investigation.
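To make the idea of a stable baseline concrete, the following minimal sketch learns the typical interval between recurring messages and flags an interval that falls far outside it. It is an assumption-laden illustration, not how any commercial OT monitoring product works: the function names, the three-sigma rule, and the 10 percent tolerance floor are all hypothetical choices, and the timestamps are invented to mirror a six-hour cycle.

```python
import statistics


def build_baseline(timestamps):
    """Return (mean, population stdev) of the gaps between consecutive timestamps."""
    intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return statistics.mean(intervals), statistics.pstdev(intervals)


def is_anomalous(interval, baseline, tolerance=0.10):
    """Flag an interval that deviates from the baseline mean by more than the
    larger of three standard deviations or a fixed fraction of the mean.

    The fixed-fraction floor keeps a near-zero stdev (very regular traffic)
    from turning ordinary jitter into alerts. Both thresholds are assumptions.
    """
    mean, stdev = baseline
    allowed = max(3 * stdev, tolerance * mean)
    return abs(interval - mean) > allowed


# Hypothetical capture times (seconds) for a command sequence on a 6-hour cycle.
history = [0, 21600, 43201, 64800, 86402]
baseline = build_baseline(history)

print(is_anomalous(21650, baseline))  # ordinary jitter around six hours
print(is_anomalous(900, baseline))    # same sequence repeated after 15 minutes
```

A real baseline would track far more than timing, including which peers talk to each other and which function codes they use, but the principle is the same: the more repetitive the process, the tighter the threshold can be.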

Incident Response in OT Environments

OT incident response has a constraint that IT incident response does not: before any response action, the team must confirm that the action will not cause greater harm than the incident itself. Isolating an infected workstation on an IT network is a straightforward containment action. Isolating the HMI that is the only interface for operators to monitor a running chemical process may be worse than leaving the infection in place while engineering a safer response.

Every OT environment needs a documented OT-specific incident response plan, co-developed with operations, engineering, safety, and security teams, that defines: what constitutes an incident that warrants response, what response actions are permitted without operational authorization, what response actions require engineering approval, and what response actions require plant shutdown. These boundaries must be defined before an incident, not negotiated during one.
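The authorization boundaries described above can be written down as data well before an incident. The sketch below shows one possible shape for that table. The three tiers follow the plan elements listed above, but the action names, their tier assignments, and the helper functions are hypothetical examples, and any real assignment would have to come from the operations, engineering, and safety teams.

```python
from enum import IntEnum


class Authorization(IntEnum):
    """Tiers of sign-off, ordered from least to most restrictive."""
    NO_OPERATIONAL_AUTHORIZATION = 1  # security team may act immediately
    ENGINEERING_APPROVAL = 2          # engineering must approve first
    PLANT_SHUTDOWN = 3                # only safe once the process is shut down


# Hypothetical plan entries; the assignments here are illustrative only.
RESPONSE_PLAN = {
    "disable_vendor_remote_access": Authorization.NO_OPERATIONAL_AUTHORIZATION,
    "block_it_to_ot_traffic_at_dmz": Authorization.NO_OPERATIONAL_AUTHORIZATION,
    "isolate_engineering_workstation": Authorization.ENGINEERING_APPROVAL,
    "isolate_sole_operator_hmi": Authorization.PLANT_SHUTDOWN,
}


def required_authorization(action):
    """Look up the tier for an action; unknown actions get the strictest tier."""
    return RESPONSE_PLAN.get(action, Authorization.PLANT_SHUTDOWN)


def may_proceed(action, granted):
    """True if the authorization already granted covers the requested action."""
    return granted >= required_authorization(action)


print(may_proceed("disable_vendor_remote_access",
                  Authorization.NO_OPERATIONAL_AUTHORIZATION))
print(may_proceed("isolate_sole_operator_hmi",
                  Authorization.ENGINEERING_APPROVAL))
```

Defaulting unknown actions to the strictest tier is a deliberate choice in this sketch: anything nobody planned for is treated as potentially process-affecting until someone decides otherwise.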

The specific scenario that kills OT incident response speed: the security team identifies an active intrusion, escalates to senior leadership, and is told by operations that the affected system cannot be taken offline because doing so would violate a customer supply contract. If that conversation has not been had in advance, the organization will default to availability over security every time, and adversaries who understand this will use it deliberately — timing attacks to coincide with peak operational periods when response actions are most constrained.

Building an OT Security Program From the Ground Up

For organizations beginning their OT security journey, the prioritization is clear from incident data: focus first on the controls that address the attack vectors responsible for the most real-world OT incidents.

Remote access has been among the most commonly reported initial access vectors in OT incidents. Every remote access pathway into the OT environment needs to be identified, authorized, and protected with MFA. Vendor remote access tools that allow persistent connections from vendor networks without active approval are a particularly high-risk category. The 2021 Oldsmar water treatment incident, as reported at the time, involved unauthorized access through remote desktop software that had been left installed on plant systems.

Network segmentation between IT and OT, even if imperfect, raises the cost of IT-to-OT lateral movement substantially. The DMZ between IT and OT networks should allow only specific, documented data flows: historian replication, patch distribution through a dedicated OT patch management system, and monitored jump server access. All other IT-to-OT communication should be blocked and alerted.
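The default-deny posture described above can be sketched as a small policy check: a flow is allowed only if it matches a documented entry, and everything else is blocked and alerted. This is a simplified illustration, not firewall configuration. The `AllowedFlow` class, the `evaluate` function, and all addresses and ports are hypothetical placeholders.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowedFlow:
    """One documented data flow through the DMZ (hypothetical model)."""
    name: str
    src: str   # source network in CIDR notation
    dst: str   # destination network in CIDR notation
    port: int

    def matches(self, src, dst, port):
        return (
            ipaddress.ip_address(src) in ipaddress.ip_network(self.src)
            and ipaddress.ip_address(dst) in ipaddress.ip_network(self.dst)
            and port == self.port
        )


# Hypothetical allowlist mirroring the three documented flow types.
DMZ_POLICY = [
    AllowedFlow("historian replication", "10.35.0.10/32", "10.40.0.10/32", 1433),
    AllowedFlow("OT patch distribution", "10.35.0.20/32", "10.30.0.0/24", 8530),
    AllowedFlow("monitored jump server access", "10.40.8.0/24", "10.35.0.30/32", 3389),
]


def evaluate(src, dst, port):
    """Return ("allow", rule name) for a documented flow, else block and alert."""
    for flow in DMZ_POLICY:
        if flow.matches(src, dst, port):
            return "allow", flow.name
    return "block_and_alert", None


print(evaluate("10.40.8.15", "10.35.0.30", 3389))  # IT admin to jump server
print(evaluate("10.40.8.15", "10.30.0.5", 502))    # IT host straight to a PLC
```

Keeping the allowlist this short is the point: each entry should correspond to a flow that someone can name and justify.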

Backup and recovery procedures for OT systems need to be tested against the scenarios that actually occur in OT incidents: not just hardware failure, but configuration corruption, ransomware affecting OT management systems, and the need to restore specific ICS configurations after a control logic modification by an attacker. The restore procedure that has never been tested against a realistic scenario will fail when it is needed.
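One small piece of such a test is confirming that a restored configuration is byte-for-byte the known-good version, which matters most after suspected control logic modification. The sketch below assumes the configuration can be exported as bytes and that a digest of the known-good export was recorded and stored separately. The function names and the sample logic strings are hypothetical.

```python
import hashlib


def fingerprint(config_bytes: bytes) -> str:
    """SHA-256 digest of an exported configuration or logic file."""
    return hashlib.sha256(config_bytes).hexdigest()


def matches_known_good(known_good_digest: str, candidate_bytes: bytes) -> bool:
    """True if the candidate export is identical to the known-good version."""
    return fingerprint(candidate_bytes) == known_good_digest


# Hypothetical exports: the second differs by a single setpoint.
known_good = b"RUNG 1: IF tank_level > 80 THEN close_inlet_valve"
modified = b"RUNG 1: IF tank_level > 95 THEN close_inlet_valve"

known_good_digest = fingerprint(known_good)  # recorded when the backup was taken

print(matches_known_good(known_good_digest, known_good))
print(matches_known_good(known_good_digest, modified))
```

A matching digest only shows that the file is unchanged. It does not show that the restore procedure works on the actual device, which still has to be exercised.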

The organizations that have most significantly improved their OT security posture in the past three years are not the ones that deployed the most tools — they are the ones that committed to understanding what they actually have in their environments, established monitoring visibility proportional to the stakes, and built response procedures that operations teams actually accept and will follow under pressure.

Torres Blackwell is OT Security Lead at AIFox AI, with fifteen years of industrial cybersecurity experience across energy, manufacturing, and water utility sectors. He has led OT security assessments at over 80 industrial facilities across North America and Europe.