Architecture-centric Strategies for Addressing Challenges in Software-reliant Safety-critical Systems

Software Engineering Institute Carnegie Mellon University Pittsburgh, PA 15213

Peter H. Feiler Oct 29, 2013

Software Engineering Institute Carnegie Mellon

© 2012 Carnegie Mellon University

#### Copyright 2013 Carnegie Mellon University

This material is based upon work funded and supported by the Department of Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center.

Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Department of Defense.

NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

This material has been approved for public release and unlimited distribution.

This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at permission@sei.cmu.edu.

DM-0000708



Software Engineering Institute Carnegie Mellon

2

#### Outline

Software Related Safety Hazards SAE AADL and Virtual Integration System and SW Architecture Fault Modeling & Analysis From Requirements to Managed Verification Evidence



**Carnegie Mellon** 

#### High Fault Leakage Drives Major Increase in Rework Cost



## **Mismatched Assumptions in Embedded SW**



Software Engineering Institute

**Carnegie Mellon** 

## **Software-Based Latency Contributors**

Execution time variation: algorithm, use of cache

**Processor speed** 

**Resource contention** 

Preemption

Legacy & shared variable communication

Rate group optimization

Protocol specific communication delay

Partitioned architecture

Migration of functionality

Fault tolerance strategy

Impact of Scheduler Choice on Control System Stability

A. Cervin, Lund U., CCACSD 2006



Flow Use Scenario through Subsystem Architecture

Software Engineering Institute | CarnegieMellon

## **Potential Model-based Engineering Pitfalls**



The system

Inconsistency between independently developed analytical models







System models

Lack of confidence that model reflects implementation



System implementation

#### Aircraft industry experience has led to single truth requirement in the System Architecture Virtual Integration (SAVI) initiative



Software Engineering Institute Carnegie Mellon

#### Outline

Software Related Safety Hazards SAE AADL and Virtual Integration System and SW Architecture Fault Modeling & Analysis From Requirements to Managed Verification Evidence



Challenges & Strategies for SW-Reliant Systems Feiler, Oct 29, 2013 © 2013 Carnegie Mellon University

**Carnegie** Mellon

## SAE Architecture Analysis & Design Language (AADL) for Software-reliant Systems



**Software Engineering Institute** CarnegieMellon

Challenges & Strategies for SW-Reliant Systems Feiler, Oct 29, 2013 © 2013 Carnegie Mellon University

9

## The SAE AADL Standard Suite (AS-5506 series)

#### Core AADL language standard (V2.1-Sep 2013, V1-Nov 2004)

- Strongly typed textual & graphical notation, Meta model & XMI interchange format
- Thread, process, system, processor, memory, bus, device, virtual processor, virtual bus
- Sampling and queuing ports, (non)deterministic sampling, end-to-end flows, modes
- Dispatch protocol, scheduling protocol, input/output timing and rates, queuing behavior
- Packages, refinement/extensions, abstract components and features, parameterization

#### AADL Meta model & XMI/XML standard

Model interchange & tool interoperability

#### AADL Annexes (Extensions) [2006, 2012]

- Error Model Annex for dependability analysis
- ARINC653 Annex for partitioned architectures
- Behavior Annex for formal behavior specification
- Data Modeling Annex for interfacing with data models
- Requirements Definition and Analysis Annex

Software Engineering Institute

- Constraint Annex
- Code Generation Annex

**Carnegie** Mellon

## **System Level Fault Root Causes**

#### Processing of Data Streams in Time-Sensitive Manner

- Stream miss rates, Mismatched data representation, Latency jitter & age
- Sampling, frame-level jitter, and loss of state change data/events
- Use of partitioned architectures (virtual machines) for fault containment
  - Mixed criticality in safety and security concerns
  - Logical vs. physical redundancy of resources
  - Virtualization of time and time sensitive processing
  - Asynchronous systems

#### **Inconsistent System States & Interactions**

- Modal systems with modal components
- Failure and operational modes
- Replicated, mirrored, and coordinated state machines

#### Resource management

- Resource budgets for processor, memory & networks
- Mismatch of resource demand and capacity
- Unmanaged computer system resources

End-to-end latency analysis Port connection consistency

Virtual processors & buses Synchronization domains

Fault modeling Security analysis Architectural redundancy patterns

Resource budget analysis & task roll-up analysis Resource allocation & deployment configurations



Software Engineering Institute Carnegie Mellon



Software Engineering Institute | CarnegieMellon

## Early Discovery and Incremental V&V through Virtual



**Proof of Concept Demonstration and Transition by Aerospace industry initiative** 

- Propagate requirements and constraints
- Higher level model down to suppliers' lower level models
- Verification of lower level models satisfies higher level requirements and constraints
- Multi-tier system & software architecture (in AADL)
- Incremental end-to-end validation of system properties

**Software Engineering Institute** CarnegieMellon



#### Outline

Software Related Safety Hazards SAE AADL and Virtual Integration System and SW Architecture Fault Modeling & Analysis From Requirements to Managed Verification Evidence



**Carnegie Mellon** 

## **AADL Error Model Scope and Purpose**

System safety process uses many individual methods and analyses, e.g.

Subsystem

System

**Component**) Capture FMEA model

- hazard analysis
- failure modes and effects analysis
- fault trees
- Markov processes

Goal: a general facility for modeling fault/error/failure behaviors that can be used for several modeling and analysis activities.

Annotated architecture model permits checking for consistency and completeness between these various declarations.

Related analyses are also useful for other purposes, e.g.

- maintainability
- availability
- Integrity
- Security

SAE ARP 4761 Guidelines and Methods for Conducting the Safety Assessment Process on Civil Airborne Systems and Equipment Demonstrated in SAVI Wheel Braking System Example

Error Model Annex can be adapted to other ADLs

**Capture hazards** 

Capture risk mitigation architecture



Software Engineering Institute Carnegie Mellon

## **Error Model V2 Annex Overview**

#### Three levels of abstraction:

- Focus on fault interaction with other components
- Focus on fault behavior of components
- Focus on fault behavior in terms of subcomponent fault behaviors

Specification of expected fault management strategy and realization

• Voting logic, error detection, recovery, repair





**Carnegie Mellon** 

#### Incoming/Assumed

#### **Outgoing/Guarantee**

Error propagation and flow specification supports fault impact analysis based on a Fault Propagation and Transformation Calculus (FPTC)

Software Engineering Institute

## An Extensible Set of Error Propagation Types

User definable error types, type sets, type hierarchies, and type products Can be combined to characterize failure modes, resulting error states, and types of error propagation



# Original Preliminary System Safety Analysis (PSSA)



System engineering activity with focus on failing components.



#### **Discovery of Unexpected PSSA Hazard**

Software Engineering Institute



© 2013 Carnegie Mellon University

Feiler, Oct 29, 2013

Systems

**Carnegie Mellon** 

**Challenges & Strategies for SW-Reliant** 

## **Recent Automated FMEA Experience**

Failure Modes and Effects Analyses are rigorous and comprehensive reliability and safety design evaluations

- Required by industry standards and Government policies
- When performed manually are usually done once due to cost and schedule
- If automated allows for
  - multiple iterations from conceptual to detailed design
  - Tradeoff studies and evaluation of alternatives
  - Early identification of potential problems

| ID | Item        | Initial State | Initial Failure Mode | 1st Level Effect | Transition                            | 2nd Level Effect | Transition | 3rd Level Effect | Severity                               | M      |
|----|-------------|---------------|----------------------|------------------|---------------------------------------|------------------|------------|------------------|----------------------------------------|--------|
| 1  | Sat_Bus     | Working       | Failure              | Failed           |                                       | Failed           | Recovery   | Working          |                                        | Workir |
| 1  | Sat_Payload | Working       |                      | Working          | Bus failure causes payload transition | Standby          |            | Standby          | Bus Recovery Causes Payload Transition | Workir |
| 2  | Sat_Bus     | Working       |                      | Working          |                                       | Working          | 5          |                  |                                        |        |
| 2  | Sat_Payload | Working       | Failure              | Failed           | Recovery                              | Working          | 5          |                  |                                        |        |

Largest analysis of satellite to date consists of 26,000 failure modes

- Includes detailed model of satellite bus
- 20 states perform failure mode
- Longest failure mode sequences have 25 transitions (i.e., 25 effects)

Myron Hecht, Aerospace Corp. Safety Analysis for JPL, member of DO-178C committee

oftware Engineering Institute CarnegieMellon

## **Automation of Safety Analysis Practice**

#### A public Aircraft Wheel Brake System model

https://wiki.sei.cmu.edu/aadl/index.php/ARP4761\_-\_Wheel\_Brake\_System\_%28WBS%29\_Example

Error Sources

NoService /

NoPressure

oftware and/o

Use of Error-Model and ARINC653 annexes Relevance for the avionics community

Comparative study

Federated vs. Integrated Modular Avionics (IMA) architecture

Support of SAE ARP 4761 System Safety Assessment Practice Hazards (FHA), Fault Trees (FTA), Fault Impact (FMEA) Reliability/Availability (Markov Chain/Dependence Diagram)



## **Scalability and Incremental Safety Analysis**



Software Engineering Institute Carnegie Mellon

## **Archetype-based Fault & Hazard Identification**

Application interaction architecture patterns

- Feedback control system
- Data, event, message, command streams
- State-based interaction protocols
- Multi-tier service layers

#### Fault management architecture patterns

- Redundancy
- Monitoring & recovery
- Partitions

Application as well as fault management architecture patterns have fault & hazard potential

| Flow    | Service  |   |                   | Timi  | ng   | Value    |         |   |
|---------|----------|---|-------------------|-------|------|----------|---------|---|
| Туре    | Omission |   | Commission        | Early | Late | Subtle   | Coarse  |   |
| Boolean | No Data  | ı | Extra Data        | Early | Late | Stuck at | N/A     | J |
| Value   |          | E | «<br>xample: Part |       |      | • • •    | ion Ice |   |
| Complex |          |   |                   |       |      |          | istent  |   |

Pre-analyzed architecture patterns enable analysis of potentially highrisk safety-criticality areas



Fault & hazard types in common architecture patterns as starting point for FHA, FTA, FMEA, root cause analysis, and IV&V



**Software Engineering Institute** | CarnegieMellon

25

#### Outline

Software Related Safety Hazards SAE AADL and Virtual Integration System and SW Architecture Fault Modeling & Analysis From Requirements to Managed Verification Evidence



**Carnegie Mellon** 

## **Towards Analyzable Requirements Specification**

#### Best practice industry study for FAA [2009]

- Primarily textual shalls , tables, and diagrams, MS Word and DOORS
- FAA Requirements Engineering Management (REM) Handbook [2009]
  - Draws on SpecTRM, Rockwell Collins experience with model checking
  - 11 step process with avionics and medical device examples
- Requirements Definition and Analysis Annex
  - Separation of concerns: problem (requirements) / solution (design)
  - Incorporates concepts from SysML, KAOS, URN, FAA REM Handbook
  - Goals, requirements, refinement, decomposition, verification, risks
  - Semantics: validation of requirement specifications, verification of formally specified requirements
  - Extensible with respect to constraints, use cases, and traceability links
  - Demonstrated on FAA REM Handbook process with medical device example

**Carnegie Mellon** 

Applicable to AADL and other ADLs

Software Engineering Institute

### Formal V&V of Safety-critical System Requirements as Early Evidence

Formalized requirements specification as best practice

- SCR (four variable model) [Parnas], SpecTRM [Leveson]
- From system to software requirements: system state behavior
- From hazards to safety requirements: intent & rationale
- Environmental assumptions, human factors

Pilots connecting safety-critical requirements to architecture & design

- FHA, FTA, FMEA based on AADL/Error Model Annex [Vestal, Hecht, Delange]
- SpecTRM & JPL Goal-oriented Mission State Analysis [Leveson/Weiss]
- JPL State Analysis & AADL/MBE [Weiss/Feiler NASA IV&V funded]

Pilots showing value of formal architecture-centric V&V

- Model checking to validate coverage of safe & unsafe system states by safety requirements [Tribble/Miller/Whalen, Nguyen/Noll]
- Verification of redundancy mode logic in nominal & abnormal conditions results in design revision, which introduces two new critical hazards [Miller/Whalen, DeNiz]

Software Engineering Institute Carnegie Mellon

Challenges & Strategies for SW-Reliant Systems Feiler, Oct 29, 2013 © 2013 Carnegie Mellon University

28

Reflected in: Requirements Engineering Management Handbook (FAA 2009)

## **Understanding the Cause of Faults**

Software Engineering Institute

Through model-based analysis identify architecture induced unhandled, testable, and untestable faults and understand root causes, contributing factors, impact, and potential mitigation options.



**Demonstrated in COMPASS project** 

Use of text templates as formalism frontend

Carnegie Mellon

29

## **Stepper Motor Case Study**

Stepper motor (SM) controls a valve

- Commanded to achieve a specified valve position
- Controller instructs the motor to move in up 15 step increments per 25ms frame
- Execution time jitter & health monitor preemption causes missed steps

#### Architecture Fault Model Analysis

- Fault impact analysis identifies multiple sources of missed steps
  - Early arrival of step increment commands
  - Step increment command rate mismatch
  - Transient message corruption or loss
- Understanding of error cause
  - When is early too early
  - Guaranteed delivery assumption for step increment commands

Software Engineering Institute



Software modeled and verified in SCADE

Carnegie Mellon © 2013 Carnegie Mellon University

## Assurance by Confidence Maps

Iterative process between fault analysis and confidence mapping

- Fault analysis focuses on system hazards
- Multi-legged confidence mapping address process related defeaters as well



Use in comparative architecture study

Software Engineering Institute

**Carnegie Mellon** 

#### Increased Confidence through Model-based Analysis and Testing Evidence throughout Life Cycle



© 2013 Carnegie Mellon University

## References

Website www.aadl.info

Public Wiki https://wiki.sei.cmu.edu/aadl User Days, publications

AADL Book in SEI Series of Addison-Wesley http://www.informit.com/store/product.aspx?isbn=0321888944

System Architecture Virtual Integration: An Industrial Case Study, SEI-2009-TR-017

Reliability Improvement and Validation Framework, SEI-2012-SR-013

Confidence-Based Architecture Fault Analysis, SEI-2013-TR-011 (in review)

#### Peter H. Feiler

Telephone: +1 412-268-7790

Email: phf@sei.cmu.edu

Software Engineering Institute Carnegie Mellon

33