A heat exchanger in a refinery fails at 3 AM. The failure cascades: the process unit trips, production stops, and the plant loses $300,000 in revenue before the sun comes up. The failure was predictable. Temperature sensors showed a 2°C drift over the previous week. Pressure readings were creeping upward. The warning signs were there. They just weren't recognized as warning signs.
This is the predictive maintenance problem in industrial settings. Equipment generates terabytes of sensor data, but most of it is noise until something breaks. By the time alarms trigger, the failure is already happening. What you need is a system that knows what the equipment should be doing at every instant, so it can detect deviations before they become failures.
The traditional approach and why it fails
The standard industrial predictive maintenance workflow relies on one of two approaches. The first is rule-based: set threshold alarms on sensor readings and hope you catch problems before they cascade. This works for catastrophic failures (bearing vibration exceeding 10 mm/s means the bearing is disintegrating), but it misses the slow degradation modes that account for most unplanned downtime.
The second approach is machine learning: train a model on historical sensor data to predict failures. This works when you have thousands of failure examples. Most industrial equipment doesn't fail that often, which is the whole point of maintenance, so you're training on a handful of events, trying to predict an even smaller handful of future events. The model overfits to noise, produces false positives, and operators learn to ignore it.
Physics-based digital twins: prediction from first principles
A physics-based digital twin is a computational model of an industrial process that runs in real time, ingesting live sensor data and computing what the process should be doing based on conservation of mass, energy, and momentum. It doesn't need historical failure data because it doesn't learn from failures. It computes expected behaviour from the underlying physics. This approach to combining accurate data with rigorous simulation is central to our mission.
When the physical process deviates from the digital twin's prediction, that deviation is an anomaly. Not a sensor malfunction. Not a noise spike. An actual physical departure from expected behaviour, which means something in the process has changed. The digital twin doesn't just tell you there's a problem. It tells you what changed, where, and by how much, because the deviation shows up in a specific term of the governing equations.
Real-time anomaly detection before alarms
Consider the heat exchanger example. The digital twin ingests temperature sensors at the inlet, outlet, and tube wall, along with flow rates and pressures. It solves the coupled heat transfer and fluid flow equations at every timestep (typically once per second) and predicts what those sensor readings should be.
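The sketch below illustrates the prediction step. It does not reproduce the full coupled solver. Instead it uses a lumped effectiveness-NTU model of a counterflow exchanger, which is a deliberate simplification and not a description of how LiveSim solves the problem. It assumes a single overall conductance `ua`, constant specific heats, and no heat loss, and the name `predict_outlets` is hypothetical. Given inlet temperatures and flow rates, it returns the outlet temperatures the exchanger should show at that instant.

```python
import math


def predict_outlets(t_hot_in, t_cold_in, m_hot, m_cold, cp_hot, cp_cold, ua):
    """Predict counterflow outlet temperatures (°C) with the effectiveness-NTU method.

    Hypothetical lumped stand-in for a full solver: mass flows in kg/s, specific
    heats in J/(kg*K), one overall conductance `ua` in W/K, no heat loss.
    """
    c_hot = m_hot * cp_hot      # hot-stream capacity rate, W/K
    c_cold = m_cold * cp_cold   # cold-stream capacity rate, W/K
    c_min, c_max = min(c_hot, c_cold), max(c_hot, c_cold)
    cr = c_min / c_max          # capacity-rate ratio
    ntu = ua / c_min            # number of transfer units
    if math.isclose(cr, 1.0):
        eff = ntu / (1.0 + ntu)  # balanced counterflow limit
    else:
        e = math.exp(-ntu * (1.0 - cr))
        eff = (1.0 - e) / (1.0 - cr * e)
    q = eff * c_min * (t_hot_in - t_cold_in)  # heat duty, W
    return t_hot_in - q / c_hot, t_cold_in + q / c_cold


# Same inlet conditions, clean versus reduced conductance (as fouling would cause).
clean = predict_outlets(150.0, 20.0, 2.0, 4.0, 2000.0, 4000.0, ua=1500.0)
fouled = predict_outlets(150.0, 20.0, 2.0, 4.0, 2000.0, 4000.0, ua=1350.0)
print(f"cold outlet: clean {clean[1]:.2f} °C, fouled {fouled[1]:.2f} °C")
```

Lowering `ua`, as fouling does, lowers the predicted cold-side outlet temperature. Running a model like this every timestep gives the expected reading to compare each sensor against.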
When fouling begins to accumulate on the tube walls, the heat transfer coefficient drops. The outlet temperature drifts downward relative to the prediction. The drift is small (maybe 0.5°C) and well within the normal operating range, so no alarm triggers. But the digital twin knows this isn't noise. The heat transfer equation says the temperature should be higher. The deviation is systematic, it's growing over time, and it localizes to the tube wall boundary condition.
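A one-sided CUSUM (cumulative sum) test on the residual between measured and predicted outlet temperature is one standard way to tell a small, persistent drift from noise. The article does not say which test LiveSim uses, so treat this as a swapped-in textbook technique. The `slack` and `threshold` values below are illustrative assumptions. Each residual beyond the slack adds evidence, and the alarm fires once the accumulated evidence crosses the threshold, even though no single reading looks abnormal.

```python
def cusum_drift_alarm(residuals, slack, threshold):
    """Return the first index where a downward one-sided CUSUM exceeds `threshold`.

    residuals: measured minus predicted values (e.g. outlet temperature, °C).
    slack: per-sample allowance; smaller downward residuals are treated as noise.
    Returns None if the alarm never fires.
    """
    s = 0.0
    for i, r in enumerate(residuals):
        # Accumulate evidence of a negative shift; the statistic floors at zero.
        s = max(0.0, s - r - slack)
        if s > threshold:
            return i
    return None


# 100 samples on prediction, then a steady 0.375 °C shortfall.
residuals = [0.0] * 100 + [-0.375] * 50
print(cusum_drift_alarm(residuals, slack=0.125, threshold=2.0))  # 108
```

In this example the alarm fires at index 108, nine samples into the drift. A fixed 1 °C threshold alarm on the same data would never trigger.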
The system flags this as an anomaly three days before the alarm would have triggered, and seven days before the exchanger would have failed. The maintenance team inspects the tubes, confirms early-stage fouling, and schedules a cleaning during the next planned shutdown. No unplanned downtime. No lost revenue. No 3 AM emergency.
Why physics beats ML-only for industrial predictive maintenance
Machine learning models are interpolators. They work when the future looks like the past. Industrial equipment fails in novel ways: a new fouling mechanism, an unexpected corrosion mode, a supply chain change that alters feed composition. ML models trained on historical data won't predict these because they've never seen them before.
Physics-based models are extrapolators. They compute behaviour from governing equations, not from past examples. If a new failure mode changes the physics (by altering heat transfer, flow resistance, or reaction kinetics), the digital twin detects it as soon as the deviation rises above the noise, because the sensor readings no longer agree with what the conservation laws predict. You don't need training data for every possible failure mode. You need accurate physics.
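For an exchanger, the most direct conservation check is the energy balance: heat released by the hot stream should equal heat absorbed by the cold stream. Below is a minimal sketch. It assumes steady state, constant specific heats, and negligible loss to the surroundings, and `energy_imbalance` is a hypothetical helper, not a LiveSim API.

```python
def energy_imbalance(m_hot, cp_hot, t_hot_in, t_hot_out,
                     m_cold, cp_cold, t_cold_in, t_cold_out):
    """Relative mismatch between heat released (hot side) and heat absorbed (cold side).

    Flows in kg/s, specific heats in J/(kg*K), temperatures in °C.
    Zero means the readings are consistent with energy conservation.
    """
    q_hot = m_hot * cp_hot * (t_hot_in - t_hot_out)       # W released
    q_cold = m_cold * cp_cold * (t_cold_out - t_cold_in)  # W absorbed
    return (q_hot - q_cold) / q_hot


# Consistent readings, then the same readings with the cold outlet 1 °C lower.
print(energy_imbalance(2.0, 2000.0, 150.0, 110.0, 4.0, 4000.0, 20.0, 30.0))
print(energy_imbalance(2.0, 2000.0, 150.0, 110.0, 4.0, 4000.0, 20.0, 29.0))
```

A persistent nonzero imbalance points to heat loss, a leak, or a faulty sensor. In practice the tolerance would come from the sensors' stated uncertainty.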
That said, physics and ML are not mutually exclusive. The most effective approach is hybrid: use physics to compute expected behaviour and detect anomalies, then use ML to classify anomaly patterns and predict time-to-failure based on the rate of deviation. The physics gives you sensitivity to novel failures. The ML gives you pattern recognition for recurring ones.
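The classification half of the hybrid can be sketched by matching the direction of the residual vector against known signatures. This nearest-signature rule is a deliberately simple stand-in for a trained classifier. The three signatures and their labels are hypothetical: they come from physical reasoning, not from fitting to data. The residual ordering is also an assumption.

```python
import math

# Hypothetical signatures (direction only), ordered as
# [cold-outlet temp residual, hot-outlet temp residual, pressure-drop residual].
SIGNATURES = {
    "fouling": (-1.0, 1.0, 1.0),                   # less heat moved, more flow resistance
    "heat_transfer_loss": (-1.0, 1.0, 0.0),        # less heat moved, hydraulics unchanged
    "cold_outlet_sensor_drift": (-1.0, 0.0, 0.0),  # a single sensor disagrees
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0.0:
        raise ValueError("cannot classify a zero residual")
    return sum(x * y for x, y in zip(a, b)) / norm


def classify_residual(residual):
    """Return (label, similarity) for the signature closest in direction to `residual`."""
    return max(((label, cosine(residual, sig)) for label, sig in SIGNATURES.items()),
               key=lambda pair: pair[1])


print(classify_residual((-0.4, 0.5, 0.3)))
```

Comparing directions rather than magnitudes lets the same signature match an anomaly at any stage of growth. The magnitude is left to the physics-based score.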
Multi-scale physics for complex industrial processes
Industrial processes span an enormous range of physical scales. A refinery catalytic cracker involves molecular-scale reactions on catalyst surfaces, millimeter-scale fluid dynamics in the riser, and meter-scale circulation patterns in the vessel. A useful digital twin needs to capture all of these scales simultaneously, including their coupling.
This is where most digital twin implementations fail. They simulate the process at a single scale, usually continuum-level fluid dynamics, and use empirical correlations for everything happening at smaller scales. When the catalyst deactivates or the feed composition changes, the correlations break down, and the digital twin stops matching reality.
LiveSim handles this by running concurrent simulations at multiple scales. Molecular dynamics simulations compute reaction kinetics and transport properties on the catalyst surface. Those properties feed into a continuum-scale CFD simulation of the riser flow. The CFD simulation predicts local temperature and concentration fields, which feed back into the molecular simulations to update the kinetics. The entire coupled system runs faster than real time, so the digital twin stays synchronized with the physical process.
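The data exchange between scales can be illustrated with an under-relaxed fixed-point (Picard) iteration between two toy models. An Arrhenius rate law stands in for the molecular-scale kinetics, and a well-mixed reactor energy balance stands in for the CFD. Every parameter here is a hypothetical placeholder, and real coupling exchanges whole fields rather than one temperature. The article does not say whether LiveSim uses this particular scheme.

```python
import math

# Hypothetical stand-in parameters (not LiveSim values).
PRE_EXP = 1.0e6          # 1/s, Arrhenius pre-exponential factor
EA_OVER_R = 12000.0      # K, activation energy over gas constant
RESIDENCE_TIME = 2.0     # s
T_INLET = 800.0          # K
ADIABATIC_DROP = 100.0   # K of cooling at full conversion (endothermic reaction)


def micro_rate_constant(temperature):
    """Fine-scale stand-in: rate constant from local temperature (Arrhenius law)."""
    return PRE_EXP * math.exp(-EA_OVER_R / temperature)


def macro_temperature(rate_constant):
    """Coarse-scale stand-in: first-order well-mixed reactor with an energy balance."""
    k_tau = rate_constant * RESIDENCE_TIME
    conversion = k_tau / (1.0 + k_tau)
    return T_INLET - ADIABATIC_DROP * conversion


def couple(t_guess=T_INLET, relax=0.5, tol=1e-8, max_iter=200):
    """Pass temperature down and kinetics up until the two scales agree."""
    t = t_guess
    for _ in range(max_iter):
        t_new = macro_temperature(micro_rate_constant(t))
        if abs(t_new - t) < tol:
            return t_new
        t = (1.0 - relax) * t + relax * t_new  # under-relaxation damps oscillation
    raise RuntimeError("coupling did not converge")


print(f"consistent temperature: {couple():.2f} K")
```

Hotter gas raises the reaction rate, and the endothermic reaction then cools the gas. Plain iteration therefore overshoots back and forth, and the relaxation factor damps that oscillation.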
Technical implementation: from sensors to predictions
A physics-based digital twin for industrial predictive maintenance requires four technical components. First, real-time sensor integration. LiveSim connects to existing SCADA, DCS, and IoT infrastructure via standard protocols (OPC-UA, MQTT, REST API) with sub-second latency. Sensor data flows directly into the solver input buffers without intermediate storage or processing.
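Protocol clients for OPC-UA or MQTT are not shown here. Downstream of them, a fixed-step solver still has to read sensors that report at irregular times. One simple approach is a per-tag buffer read with a zero-order hold: at each solver step, use the latest sample at or before that time. `SensorBuffer` is a hypothetical name. Zero-order hold is one assumption among several; linear interpolation is another.

```python
import bisect


class SensorBuffer:
    """Time-ordered samples for one sensor tag, read with a zero-order hold."""

    def __init__(self, max_samples=10_000):
        self.times = []
        self.values = []
        self.max_samples = max_samples

    def push(self, t, value):
        """Append a sample; samples must arrive in time order."""
        if self.times and t < self.times[-1]:
            raise ValueError("samples must arrive in time order")
        self.times.append(t)
        self.values.append(value)
        excess = len(self.times) - self.max_samples
        if excess > 0:  # drop the oldest samples to bound memory
            del self.times[:excess]
            del self.values[:excess]

    def value_at(self, t):
        """Latest sample at or before solver time t."""
        i = bisect.bisect_right(self.times, t) - 1
        if i < 0:
            raise LookupError(f"no sample at or before t={t}")
        return self.values[i]


# Irregular arrivals, read at one-second solver ticks.
outlet_temp = SensorBuffer()
for t, v in [(0.0, 61.2), (0.4, 61.3), (1.3, 61.1)]:
    outlet_temp.push(t, v)
print([outlet_temp.value_at(tick) for tick in (0.0, 1.0, 2.0)])
```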
Second, GPU-native solvers that run faster than real time. A digital twin that can't keep up with the physical process is just a slow simulation. Our solvers are designed for GPU execution from the ground up, achieving throughput high enough that a single-GPU workstation can maintain a live digital twin of a process unit with hundreds of sensors and millions of mesh points.
Third, automatic state estimation. Sensors don't measure every variable the solver needs. You have temperature and pressure readings, but the solver needs velocity fields, species concentrations, and boundary layer properties. State estimation algorithms infer these unmeasured variables from the measured ones, using the physics equations as constraints. This is inverse problem solving, not curve fitting.
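The simplest instance of this kind of inversion infers one unmeasured quantity from a handful of measurements. Here the overall conductance UA of a counterflow exchanger is recovered from four temperatures and one flow, using the log-mean temperature difference. Full state estimation constrains entire fields with many equations at once, for example through Kalman filtering or variational methods. This sketch only shows the principle, and `infer_ua` is a hypothetical helper.

```python
import math


def infer_ua(m_hot, cp_hot, t_hot_in, t_hot_out, t_cold_in, t_cold_out):
    """Infer overall conductance UA (W/K) of a counterflow exchanger from measurements."""
    q = m_hot * cp_hot * (t_hot_in - t_hot_out)  # duty from the hot-side energy balance, W
    dt_a = t_hot_in - t_cold_out                 # terminal difference at the hot-inlet end
    dt_b = t_hot_out - t_cold_in                 # terminal difference at the hot-outlet end
    if dt_a <= 0 or dt_b <= 0:
        raise ValueError("temperature cross: readings inconsistent with counterflow")
    if math.isclose(dt_a, dt_b):
        lmtd = dt_a                              # limit of the log mean when both ends match
    else:
        lmtd = (dt_a - dt_b) / math.log(dt_a / dt_b)
    return q / lmtd


print(f"UA = {infer_ua(2.0, 2000.0, 150.0, 110.0, 20.0, 30.0):.0f} W/K")
```

Tracking the inferred UA over time turns four noisy temperatures into one physically meaningful health indicator. Nobody measures UA directly.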
Fourth, anomaly scoring with uncertainty quantification. The difference between a sensor reading and the digital twin prediction is not, by itself, an anomaly. Sensors have noise. Simulations have modeling error. The anomaly score accounts for both, reporting deviations in terms of statistical significance. A 5-sigma deviation is almost certainly a real anomaly. A 2-sigma deviation might be noise.
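Below is a minimal sketch of the scoring step. It assumes sensor noise and model error are independent and roughly Gaussian, so their variances add. The function names and the intermediate "watch" band are hypothetical. The 5-sigma and 2-sigma levels follow the text.

```python
import math


def anomaly_score(measured, predicted, sensor_sigma, model_sigma):
    """Deviation in combined standard deviations (independent errors, variances add)."""
    combined_sigma = math.sqrt(sensor_sigma ** 2 + model_sigma ** 2)
    return (measured - predicted) / combined_sigma


def classify(score, anomaly_sigma=5.0, watch_sigma=2.0):
    """Map a score to a band: >= 5 sigma anomaly, >= 2 sigma worth watching, else normal."""
    magnitude = abs(score)
    if magnitude >= anomaly_sigma:
        return "anomaly"
    if magnitude >= watch_sigma:
        return "watch"
    return "normal"


# The same 0.5 °C deviation, scored against a noisy and a precise error budget.
print(classify(anomaly_score(149.5, 150.0, sensor_sigma=0.3, model_sigma=0.4)))
print(classify(anomaly_score(149.5, 150.0, sensor_sigma=0.03, model_sigma=0.04)))
```

A single-sample score like this catches abrupt changes. A slow drift that never reaches 5 sigma in any one reading needs a cumulative test on the residual history.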
Fewer sensors, same root cause analysis
Traditional threshold-based alarm systems are direct measurement systems. To know if a valve is failing, you put a sensor on the valve. To know if a pump bearing is overheating, you put a temperature sensor on the bearing. To monitor fouling across 50 heat exchanger tubes, you need 50 sensors. The system can only alert you to problems where you have direct instrumentation.
Physics-based digital twins break this constraint. Because the solver computes the full state of the system from governing equations, it can infer what's happening at unmeasured locations using measurements from elsewhere. You don't need a sensor on every valve to detect which valve is degrading. You need inlet and outlet conditions, and the physics equations tell you what must be happening in between.
Consider a process line with 30 control valves, each with potential fouling, wear, or actuator drift. A traditional monitoring approach requires instrumenting each valve: 30 pressure sensors and 30 position sensors, 60 measurement points total. With a physics-based digital twin, you instrument the line inlet and outlet, plus 4-5 validation points along the line. The solver computes pressure drops and flow splits at every valve using conservation of mass and momentum. When a valve starts to foul, the deviation shows up in the computed valve characteristics even without a direct sensor.
This is not interpolation or pattern matching. It's inverse problem solving. The physics equations define exact relationships between measured and unmeasured variables. Given a sparse set of boundary measurements, the solver inverts the governing equations to infer the interior state. The result is root cause attribution at every point in the process, not just where you have sensors.
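The inversion idea can be shown with mass conservation alone on a small hypothetical splitter network: five pipe segments, two junctions, and only three measured flows. Stacking the junction balances with the measurements gives a linear system, and its least-squares solution recovers the two unmeasured flows. This sketch ignores momentum and valve characteristics, which would add nonlinear equations. The network and edge numbering are invented for illustration. The rank check reports when the sensors cannot pin down every flow.

```python
import numpy as np

# Hypothetical network. Edges: e0 inlet->J1, e1 J1->J2, e2 J1->out_a,
# e3 J2->out_b, e4 J2->out_c. Each row: inflow minus outflow = 0 at a junction.
BALANCE = np.array([
    [1.0, -1.0, -1.0, 0.0, 0.0],   # J1: e0 - e1 - e2 = 0
    [0.0, 1.0, 0.0, -1.0, -1.0],   # J2: e1 - e3 - e4 = 0
])


def infer_flows(measured: dict[int, float], n_edges: int = 5) -> np.ndarray:
    """Solve mass balances plus measured edge flows for every edge flow."""
    rows = [BALANCE]
    rhs = [np.zeros(BALANCE.shape[0])]
    for edge, value in measured.items():
        row = np.zeros((1, n_edges))
        row[0, edge] = 1.0           # this equation pins one edge to its measured value
        rows.append(row)
        rhs.append(np.array([value]))
    a = np.vstack(rows)
    b = np.concatenate(rhs)
    if np.linalg.matrix_rank(a) < n_edges:
        raise ValueError("measurements plus mass balance do not determine every flow")
    flows, *_ = np.linalg.lstsq(a, b, rcond=None)
    return flows


# Unmeasured e1 and e3 come out as 70 and 45 (to floating-point precision).
flows = infer_flows({0: 100.0, 2: 30.0, 4: 25.0})
print(flows)
```

Redundant measurements may be slightly inconsistent. In that case the same solve returns a compromise that spreads the inconsistency across balances and measurements. Weighting each row by its uncertainty is the usual refinement.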
The economic impact is substantial. A refinery with 200 heat exchangers might traditionally require 2,000+ measurement points for full-coverage monitoring: thermocouples on tube walls, pressure taps at inlet/outlet, flow meters on every stream. With physics-based state estimation, you can achieve equivalent diagnostic capability with 400-500 sensors at critical boundaries and validation points. At $5K-15K per installed sensor, that's $8M-$24M in avoided capital cost, plus ongoing savings in calibration and maintenance.
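The range follows from the paragraph's own figures. Take 2,000 as the traditional count; the text says 2,000+, so this is a floor. Then pair the fewest avoided sensors with the lowest unit cost, and the most avoided sensors with the highest.

```python
def avoided_sensor_cost(traditional, reduced_low, reduced_high, cost_low, cost_high):
    """Range of avoided capital cost when a smaller sensor set replaces full coverage."""
    fewest_avoided = traditional - reduced_high  # keep the larger reduced set
    most_avoided = traditional - reduced_low     # keep the smaller reduced set
    return fewest_avoided * cost_low, most_avoided * cost_high


low, high = avoided_sensor_cost(2_000, 400, 500, 5_000, 15_000)
print(f"${low / 1e6:.1f}M to ${high / 1e6:.1f}M")  # $7.5M to $24.0M
```

The low end works out to $7.5M, which the paragraph rounds to $8M.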
Root cause attribution: beyond anomaly detection to diagnosis
When an ML-based predictive maintenance system detects an anomaly, it reports a score and maybe a feature importance ranking. When a physics-based digital twin detects an anomaly, it tells you which physical process is responsible. This is because the anomaly manifests as a violation of a specific conservation law or constitutive relation.
Return to the heat exchanger fouling example. The anomaly doesn't just appear as a temperature deviation. It appears as a deviation in the heat flux through the tube wall. The digital twin can partition the total resistance to that heat flow into its series contributions: convection on the shell side, conduction through the tube wall (computed from Fourier's law), and convection on the tube side. The fouling shows up as added resistance on the tube side, equivalent to a decrease in the effective tube-side convection coefficient. That's not an empirical pattern. That's a physical diagnosis.
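The partition is a series thermal-resistance network, with 1/UA equal to the sum of the individual resistances. The sketch below assumes a tubular exchanger with a cylindrical wall, known clean-condition film coefficients, and a measured UA. All names and numbers are hypothetical. Assigning the whole extra resistance to the tube side is itself an assumption. In practice the twin needs side-specific evidence, such as tube-side pressure drop or local wall temperatures, to make that call.

```python
import math


def wall_resistance(r_inner, r_outer, k_wall, length):
    """Conduction resistance (K/W) of a cylindrical tube wall, from Fourier's law."""
    return math.log(r_outer / r_inner) / (2.0 * math.pi * k_wall * length)


def clean_resistances(h_tube, area_tube, h_shell, area_shell, r_wall):
    """Series thermal resistances (K/W) of the clean exchanger."""
    return {
        "tube_convection": 1.0 / (h_tube * area_tube),
        "wall_conduction": r_wall,
        "shell_convection": 1.0 / (h_shell * area_shell),
    }


def attribute_fouling(ua_measured, clean, area_tube):
    """Assign extra resistance to the tube side; return it and the effective tube-side h."""
    r_fouling = 1.0 / ua_measured - sum(clean.values())
    h_tube_effective = 1.0 / (area_tube * (clean["tube_convection"] + r_fouling))
    return r_fouling, h_tube_effective


r_wall = wall_resistance(r_inner=0.01, r_outer=0.0125, k_wall=16.0, length=10.0)
clean = clean_resistances(h_tube=1000.0, area_tube=2.0, h_shell=500.0,
                          area_shell=2.2, r_wall=r_wall)
r_f, h_eff = attribute_fouling(ua_measured=550.0, clean=clean, area_tube=2.0)
print(f"fouling resistance {r_f:.2e} K/W, effective tube-side h {h_eff:.0f} W/(m2*K)")
```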
This level of attribution is critical for maintenance planning. Knowing that a heat exchanger is degrading tells you to schedule maintenance. Knowing that the degradation is tube-side fouling tells you to plan a chemical cleaning, not a mechanical inspection. Knowing the fouling rate tells you whether you can wait until the next planned shutdown or need to intervene sooner.
Real-world impact: downtime prevention in practice
A large petrochemical facility operates a steam cracker with 40+ heat exchangers in the convection section. These exchangers foul gradually from coke deposition, reducing efficiency and eventually forcing shutdowns for decoking. The plant runs on a fixed maintenance schedule: shutdown every 60 days for decoking, regardless of actual fouling state.
After deploying a physics-based digital twin, the facility switched to condition-based maintenance. The digital twin monitors heat transfer efficiency in real time and predicts remaining run length before fouling forces a shutdown. Some exchangers now run 90 days between cleanings. Others get cleaned at 45 days because the digital twin detected accelerated fouling from a feed composition change.
The result: 18% reduction in unplanned downtime, 12% increase in average run length, and $4.2M annual savings from optimized maintenance scheduling. Critically, zero surprise failures occurred in the two years since deployment. Every shutdown was predicted, planned, and executed during scheduled maintenance windows.
What this means for industrial operations
Predictive maintenance with physics-based digital twins shifts the maintenance paradigm from reactive to prescriptive. Instead of responding to alarms, which means responding to failures already in progress, maintenance teams respond to predictions of future failures while there's still time to plan the intervention.
The economic case is straightforward. Unplanned downtime costs 5-10x more than planned downtime because you lose production, you pay overtime for emergency repairs, and you often incur secondary damage from cascade failures. Eliminating even a fraction of unplanned downtime pays for the digital twin implementation within months.
The technical case is that physics-based approaches work for sparse failure data, generalize to novel failure modes, and provide root cause attribution that enables precise maintenance interventions. This is not a replacement for human expertise. It's a tool that gives your maintenance team superhuman foresight about what's about to break and why.