Senior Industrial Automation Engineer Alp Arya on What Functional Safety Standards Teach About Software Designed to Fail

Written by Tech Tired Team, In Technology, Published On February 18, 2026

An engineer who has spent two decades certifying that programmable logic controllers, SCADA networks, and safety instrumented systems fail without killing anyone evaluated hackathon projects where failure is the entire point — and found that the principles governing SIL-rated shutdown sequences apply directly to software that embraces collapse.

On March 23, 2005, a hydrocarbon vapor cloud ignited at the BP Texas City refinery, killing fifteen workers and injuring 180 others. The U.S. Chemical Safety Board traced the disaster to cascading failures in process safety management: instruments that misread liquid levels, alarms operators had learned to ignore, and a blowdown drum that vented directly to the atmosphere. Every individual component had been designed to function. The system-level interaction between those components produced a catastrophe.

Alp Arya works in the discipline that exists because of incidents like Texas City. As a Senior Industrial Automation Engineer at KPI Automation in Turkiye, Arya has spent more than twenty years designing, commissioning, and certifying the programmable logic controllers, distributed control systems, and safety instrumented systems that keep industrial processes from killing people. He holds FSEngineer and CMSE certifications from TUV, the German technical inspection authority whose stamp determines whether a safety system is fit for deployment in environments where failure means explosion, toxic release, or mechanical dismemberment.

His professional vocabulary — Safety Integrity Level, Probability of Failure on Demand, Mean Time to Dangerous Failure — is rooted in IEC 61508, the international standard governing functional safety. When System Collapse 2026, organized by Hackathon Raptors, asked twenty-six teams to build software that thrives on instability, Arya brought that vocabulary to bear on eight submissions where failure was not the thing to prevent but the thing to design.

“In my work, the system must fail safely — that is the non-negotiable requirement,” Arya explains. “A safety instrumented system detects a dangerous condition and brings the process to a safe state. The failure mode is designed, tested, and certified. What interested me about these hackathon projects is that they also design their failure modes. The intent is different — entertainment, art, education — but the engineering discipline required is the same.”

Narrow Band Operation and the Reactor That Pushes Back

IEC 61508 defines four Safety Integrity Levels, each specifying a target failure rate. For continuously operating systems, SIL 1 permits roughly one dangerous failure per hundred thousand hours. SIL 4 — reserved for nuclear and chemical processes — demands fewer than one per hundred million hours. The higher the SIL rating, the narrower the operating band and the more rigorously failure modes must be analyzed.
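
For reference, the high-demand and continuous-mode targets in IEC 61508 are expressed as tolerable frequencies of dangerous failures per hour; writing λ_D for that frequency, the bands below are a paraphrased summary rather than a quotation of the standard:

$$
\begin{aligned}
\text{SIL 1:}\quad & 10^{-6} \le \lambda_{D} < 10^{-5}\ \mathrm{h^{-1}}\\
\text{SIL 2:}\quad & 10^{-7} \le \lambda_{D} < 10^{-6}\ \mathrm{h^{-1}}\\
\text{SIL 3:}\quad & 10^{-8} \le \lambda_{D} < 10^{-7}\ \mathrm{h^{-1}}\\
\text{SIL 4:}\quad & 10^{-9} \le \lambda_{D} < 10^{-8}\ \mathrm{h^{-1}}
\end{aligned}
$$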

The project System Collapse by team keystone earned 4.60 out of 5.00 in Arya’s batch, with perfect marks in System Design and Creativity. The game presents a world where progress is decay. Every action destabilizes the environment. The goal is not to win but to experience the most aesthetically compelling destruction before the system surrenders.

“This is narrow-band operation turned into a game mechanic,” Arya observes. “In a chemical plant, we define safe operating envelopes — temperature ranges, pressure limits, flow rates. Step outside the envelope, and the safety instrumented system intervenes. In System Collapse, the player is constantly pushing the system outside its operating envelope, and the system responds not with a shutdown but with transformation. The engineering question is the same: what happens at the boundary?”

“I evaluate DCS configurations where operators manage process variables within tight tolerances,” Arya says. “A deviation of two percent on a reactor temperature might trigger a pre-alarm. Five percent triggers the safety system. In System Collapse, there is no pre-alarm phase. The deviation is the experience. That is a design choice, and it is the right one for the medium — but it made me think about what happens when you deliberately remove every layer of protection.”
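
As a loose illustration of the layered response Arya describes, a sketch in Python (the two-percent and five-percent thresholds come from his example; the function name and return labels are invented here):

```python
def classify_deviation(measured, setpoint, prealarm_pct=2.0, trip_pct=5.0):
    """Classify a reactor-temperature deviation the way layered protection would."""
    deviation_pct = abs(measured - setpoint) / setpoint * 100.0
    if deviation_pct >= trip_pct:
        return "SIS_TRIP"    # safety instrumented system drives the process to its safe state
    if deviation_pct >= prealarm_pct:
        return "PRE_ALARM"   # operator is warned while the basic control system still acts
    return "NORMAL"          # inside the safe operating envelope
```

In System Collapse, by Arya's reading, the first two branches simply do not exist.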

Atomic Interactions and the Chemistry of Emergent Failure

The Atomic Simulator by TheExperimentalists scored 4.60 in Arya’s evaluation — identical to System Collapse. The project strips complex systems down to particle-level interactions: users define rules for attraction, repulsion, and chaos, then observe how atoms organize, merge, evolve, and collapse into emergent patterns. No goals, no scripts. Just physics parameters and the consequences that follow.
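
The team's code is not reproduced in the article, but the kind of rule-driven update loop the description implies might look something like this minimal sketch (every name and coefficient is illustrative):

```python
import random

def step(particles, attraction=0.05, repulsion=0.2, chaos=0.01):
    """One tick of a toy particle system: long-range pull, short-range push, random jitter."""
    forces = []
    for p in particles:
        fx = fy = 0.0
        for q in particles:
            if q is p:
                continue
            dx, dy = q["x"] - p["x"], q["y"] - p["y"]
            dist = max((dx * dx + dy * dy) ** 0.5, 1e-6)
            force = attraction / dist - repulsion / (dist * dist)  # pull when far, push when close
            fx += force * dx / dist
            fy += force * dy / dist
        forces.append((fx, fy))
    for p, (fx, fy) in zip(particles, forces):
        p["x"] += fx + random.uniform(-chaos, chaos)
        p["y"] += fy + random.uniform(-chaos, chaos)
    return particles
```

Raise the chaos term or flip the sign of attraction and the same loop produces dispersal instead of clustering, which is precisely the behavior the project invites users to explore.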

In industrial process control, this maps directly to what happens inside a reactor or distillation column. A shift in feed composition changes reaction kinetics, which changes heat generation, which changes cooling requirements, which changes pressure. The macro behavior emerges from micro interactions in ways that are deterministic in theory but unpredictable in practice.

“The Atomic Simulator captures something that takes process engineers years to internalize,” Arya explains. “Small parameter changes produce disproportionate system-level effects. In safety engineering, we call this common cause failure — when a single root cause affects multiple components simultaneously. A temperature deviation doesn’t just change one variable. It changes the interaction between every variable in the system.”

Arya connects this to hazard and operability studies — HAZOP — the structured methodology used in process industries to identify deviations from design intent. A HAZOP team applies guide words (more, less, no, reverse, other than) to every process parameter and traces the consequences. The Atomic Simulator allows users to perform an informal HAZOP on their own particle systems: change the attraction coefficient and observe what breaks.
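
A minimal sketch of that informal HAZOP, assuming some run_simulation callable along the lines of the particle step above (a subset of the guide words, with purely illustrative multipliers):

```python
GUIDE_WORDS = {
    "more": 2.0,      # parameter well above design intent
    "less": 0.5,      # parameter well below design intent
    "no": 0.0,        # parameter absent entirely
    "reverse": -1.0,  # parameter acting in the opposite direction
}

def informal_hazop(run_simulation, baseline_attraction=0.05):
    """Apply HAZOP-style guide words to one parameter and record what the system does."""
    outcomes = {}
    for word, factor in GUIDE_WORDS.items():
        outcomes[word] = run_simulation(attraction=baseline_attraction * factor)
    return outcomes
```

Each entry in the returned dictionary is a consequence traced from a single deviation, which is the core move of the methodology.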

“What this project lacks is the ability to define safety constraints and then violate them,” he notes. “In HAZOP, we ask ‘what if the temperature exceeds design limits?’ The Atomic Simulator lets you set parameters, but it does not let you define which outcomes are unacceptable. That is the gap between a simulation and a safety analysis tool.”

The Editor That Degrades: Human-Machine Interface Under Stress

Raptor Editor by team Bisht takes a different approach to instability. It is a code editor where the development environment itself becomes unstable: the interface morphs between different editor paradigms — VS Code to nano to Notepad to Turbo Pascal — while applying visual distortions (blur, shake, zoom), altering syntax highlighting, corrupting indentation, and randomizing stability parameters. The developer must write functional code while the tool they depend on actively undermines their ability to work.
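
The submission's code is not shown in the article, but the separation Arya goes on to highlight, an intact text buffer behind a randomized presentation layer, can be sketched roughly like this (all names and parameters are hypothetical):

```python
import random

EDITOR_SKINS = ["vscode", "nano", "notepad", "turbo_pascal"]
DISTORTIONS = ["blur", "shake", "zoom"]

def degrade_presentation(view_state):
    """Randomize only how the code is displayed; the underlying buffer is never touched."""
    view_state["skin"] = random.choice(EDITOR_SKINS)
    view_state["distortion"] = random.choice(DISTORTIONS)
    view_state["highlight_seed"] = random.random()        # remaps syntax colours
    view_state["indent_offset"] = random.randint(-2, 4)   # spaces added or removed on display only
    view_state["stability"] = random.uniform(0.0, 1.0)    # how long until the next morph
    return view_state
```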

Arya scored Raptor Editor 4.60 — the same score as his other two top-rated projects — with perfect marks in both System Design and Creativity. From an industrial automation perspective, the project mirrors a problem that has caused real incidents: SCADA HMI degradation under adversarial conditions.

“In 2015, attackers compromised the SCADA systems at three Ukrainian power distribution companies, causing outages affecting 230,000 customers,” Arya notes. “Part of the attack involved replacing legitimate HMI firmware with corrupted versions — operators could see their screens, but the controls no longer mapped to actual process states. Raptor Editor creates exactly that experience: the interface looks functional, but the relationship between what you see and what you get becomes unreliable.”

“The most dangerous condition in a control room is not an alarm flood or a system trip. It is when the operator believes the process is in a normal state when it is not,” Arya explains. “Raptor Editor puts the user in that condition deliberately. The syntax highlighting changes, so the visual cues you rely on become misleading. The indentation shifts, so structural information becomes corrupt. You are still writing code, but you cannot trust your perception of what you have written.”

The project reveals something safety engineers have documented extensively: human performance degrades faster than system performance under stress. The editor’s code execution engine remains functional — the instability is in the presentation layer. But the developer’s ability to produce correct output degrades because the feedback mechanisms they depend on have been compromised. This is the human factors problem at the heart of every major industrial accident investigation.

Cellular Automata and Distributed Control Behavior

Conway’s Game of Life has been a staple of computational theory since 1970. Its four rules — underpopulation, survival, overpopulation, reproduction — produce complex emergent behavior from deterministic local interactions. Team Lawless submitted a variant where the rules themselves mutate every N generations, introducing new behaviors and patterns that the original rule set would never produce.
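
The team's implementation is not reproduced here, but a minimal sketch of the mutation mechanic, assuming a standard birth/survival encoding of Life-like rules, could look like the following (the specific mutation scheme is a guess for illustration):

```python
import random

def life_step(live_cells, birth, survive):
    """One generation of a Life-like automaton over a set of live (x, y) cells."""
    neighbour_counts = {}
    for (x, y) in live_cells:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    cell = (x + dx, y + dy)
                    neighbour_counts[cell] = neighbour_counts.get(cell, 0) + 1
    return {
        cell for cell, n in neighbour_counts.items()
        if (cell in live_cells and n in survive) or (cell not in live_cells and n in birth)
    }

def run(live_cells, generations, mutate_every):
    """Run the automaton, perturbing the rule set every `mutate_every` generations."""
    birth, survive = {3}, {2, 3}   # classic Conway rules, B3/S23
    for g in range(1, generations + 1):
        if g % mutate_every == 0:
            target = random.choice([birth, survive])
            target.symmetric_difference_update({random.randint(1, 8)})  # toggle one neighbour count
        live_cells = life_step(live_cells, birth, survive)
    return live_cells
```

Patterns that are stable under one rule set may not survive the next toggle.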

Arya scored the Variant of Conway’s Game of Life 4.40, with the highest Technical Execution mark of 5 — his only perfect score in that criterion across the entire batch. The project resonated with his experience in distributed control systems.

“A DCS works on the same principle as Conway’s automata,” he explains. “Each controller node operates on local information — its inputs, its setpoints, its algorithm. The macro behavior of the plant emerges from the interaction of hundreds of these local control loops. When I look at Conway’s Game, I see a simplified model of a distributed control architecture.”

The mutation mechanic takes this analogy further. In production DCS environments, control strategies are not static. Operators adjust setpoints. Engineers tune PID parameters. Software updates modify control algorithms. Over months and years, the control strategy drifts from its original design intent — not through a single dramatic change but through accumulated incremental modifications.

“The Lawless team’s variant makes this drift visible in minutes instead of years,” Arya observes. “Every N generations, the rules change. Stable patterns that emerged under the old rules suddenly become unstable. New patterns form. Some survive the next rule change. Some do not. This is exactly what happens in a plant that has been running for a decade — the control strategy has mutated through hundreds of small changes, and nobody fully understands the current behavior because it is the product of accumulated modifications rather than deliberate design.”

In safety engineering, this accumulated drift is called management of change failure — modifications that individually pass review but collectively produce system states that no single review anticipated. The project demonstrates in a visual, immediate way what safety auditors spend weeks uncovering through documentation review and process analysis.

Recovery After Collapse: The Safety Instrumented System Analogy

After the Stroke by the Gladiators team is an evolutionary drawing application where strokes persist, decay, and mutate over time. The system degrades through autonomous processes — the user’s input is the initial condition, but the transformation that follows is driven by the system’s own entropy mechanics. The result is emergent glitch art created through controlled collapse.
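
No implementation details are given in the article; a toy version of that entropy loop, with every constant invented for illustration, might look like this:

```python
import random

def entropy_tick(strokes, decay=0.99, mutation_rate=0.02, jitter=1.5):
    """Autonomously age stored strokes: fade each one, occasionally perturb its points."""
    survivors = []
    for stroke in strokes:
        stroke["opacity"] *= decay                      # gradual fade
        if random.random() < mutation_rate:
            stroke["points"] = [
                (x + random.uniform(-jitter, jitter),
                 y + random.uniform(-jitter, jitter))
                for (x, y) in stroke["points"]          # glitch-like drift of the geometry
            ]
        if stroke["opacity"] > 0.01:                    # fully faded strokes are dropped
            survivors.append(stroke)
    return survivors
```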

Arya scored it 3.90, with a Creativity mark of 5 reflecting appreciation for the concept and a Technical Execution mark of 3 indicating implementation limitations. The project connects to a concept central to functional safety: the distinction between fail-safe and fail-operational systems.

“A fail-safe system achieves a safe state by shutting down,” Arya explains. “A gas detection system that trips the process when it detects a hazardous concentration is a fail-safe — the safe state is ‘off.’ A fail-operational system must continue functioning after a failure. An aircraft fly-by-wire system cannot simply shut down. It must continue providing flight control with degraded capability.”

After the Stroke is a fail-operational system by design. When strokes decay and mutate, the application does not crash or reset. It continues operating in a degraded but functional state, producing outputs that are different from — but not less valid than — the original inputs. The system’s identity transforms through its failures while maintaining operational continuity.

“In safety instrumented systems, we define the process safe state and the time required to achieve it. This is the process safety time — the interval between a dangerous condition arising and the consequence occurring. After the Stroke has no process safety time because there is no consequence to prevent. The degradation is the output. This inverts the entire framework: in my work, we design systems to prevent transformation under failure. This project designs systems to produce transformation through failure.”

When the Safety System Itself Fails

Fading Ink by Error 404 received the lowest score in Arya’s batch: 2.70, with a Technical Execution score of 0. The concept — text that evolves through attention, where ignored words fade and engaged words brighten — is sound. But the GitHub repository link was broken, leaving multiple judges unable to review the code.

In functional safety, this scenario has a specific name: a dangerous detected failure. The safety system fails, and the failure is apparent. This is actually the preferred failure mode — far better than a dangerous undetected failure, where the safety system fails silently, and the operators believe protection is still active.

“Fading Ink had a dangerous detected failure,” Arya says. “The judges could see immediately that the code was inaccessible. The failure was transparent. In SIL calculations, we distinguish between safe failure fraction — the proportion of failures that are either safe or detected — and dangerous undetected failures. A high safe failure fraction means the system either fails safely or fails visibly. Fading Ink failed visibly.”
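
In the notation of IEC 61508, writing λ_S, λ_DD, and λ_DU for the rates of safe, dangerous detected, and dangerous undetected failures, the safe failure fraction is, in outline:

$$
\mathrm{SFF} = \frac{\lambda_S + \lambda_{DD}}{\lambda_S + \lambda_{DD} + \lambda_{DU}}
$$

Only λ_DU, the failures nobody sees, drags the fraction down; a broken repository link, in this analogy, lands in λ_DD.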

“I gave Technical Execution a zero not because the concept is poor — the concept is interesting — but because functional safety demands evidence,” Arya says. “In IEC 61508, safety claims must be supported by documentation, test results, and traceable evidence. If I cannot review the code, I cannot verify the technical execution claim. A broken repository link is equivalent to a Factory Acceptance Test failure — the system was not verified as functional before delivery.”

Structured Instability: The Common Thread

Across his eight evaluations, Arya identified a pattern that maps directly to the hierarchy of controls used in functional safety. IEC 61511 — the process industry implementation of IEC 61508 — defines layers of protection: inherently safe design first, then process controls, then safety instrumented systems, then physical relief devices, then plant emergency response. Each layer addresses failure modes that the previous layer did not eliminate.

The highest-scoring projects exhibited analogous layering. System Collapse implements instability at the foundational level — the world itself decays. The Atomic Simulator operates at the interaction level — particle rules produce emergent behavior. Raptor Editor targets the interface level — the tool mediates between the user and the work unreliably. Conway’s Game Variant shifts the rule level itself — the laws governing behavior mutate over time.

“In a HAZOP, we trace deviations through layers,” Arya explains. “A temperature excursion in the reactor is the process layer. The control system responding is the BPCS [basic process control system] layer. The safety instrumented system tripping is the SIS layer. Each project in my batch focused on a different layer of instability, and the ones that understood which layer they were operating in produced the most coherent results.”

The projects that scored lower tended to mix layers without clarity — instability as decoration rather than architecture. These projects treated instability the way some plants treat safety systems: as additions bolted onto existing designs rather than principles integrated from inception.

“IEC 61508 demands that safety is designed in, not added on,” Arya observes. “You cannot take an unsafe design and make it safe by adding a safety system. The base design must be inherently as safe as practicable, and the safety system addresses the residual risk. The same principle applies here: you cannot take a conventional application and make it unstable by adding screen shake. The instability must be inherent to the system’s architecture, not layered on top of it.”

This insight — that designed instability requires the same engineering discipline as designed safety, applied in the opposite direction — is what twenty years of functional safety work brings to evaluating software that deliberately breaks. The best projects in System Collapse 2026 did not merely fail. They failed with the same rigor and structural integrity that Arya demands from systems certified to protect human life.

“A system that breaks randomly is not interesting to a safety engineer,” Arya concludes. “A system that breaks in designed, reproducible, analyzable ways — that is interesting. Because that is what we build. The only difference is that we build them to prevent harm. These teams built them to create art. The engineering underneath is the same.”

System Collapse 2026 was organized by Hackathon Raptors, a Community Interest Company supporting innovation in software development. The event featured 26 teams competing across 72 hours, building systems designed to thrive on instability. Alp Arya served as a judge evaluating projects for technical execution, system design, creativity, and expression.
