Toward Neuro-Symbolic Natural Language Requirements to Verified Implementations for Model-Based Systems Engineering in SysML v2

Amer Tahat, David Hardin, Isaac Amundson, Darren Cofer
Collins Aerospace

ABSTRACT

The development of high-confidence software systems requires rigorous specification and
high-assurance code generation. While generative AI offers promise, adopting Large
Language Models (LLMs) in safety-critical workflows remains hindered by hallucinations,
token costs, and limited domain data. We report progress toward integrating state-of-the-art Codex software engineering agents into the DARPA PROVERS INSPECTA (Industrial Scale
Proof Engineering for Critical Trustworthy Applications) toolchain, supporting Model-Based
Systems Engineering (MBSE) workflows in SysML v2, utilizing a novel Meta-Rule
methodology. We define Meta-Rules as reusable assets that transfer ephemeral In-Context
Learning (ICL) knowledge into persistent, verified formalization logic; importantly, Meta-Rules are expert-curated to ensure transferability.

We evaluate this methodology on a number of MBSE examples. From natural language
requirements, our copilot generates GUMBO contracts and HAMR skeletal code, and
employs a closed-loop, self-healing generate–verify–repair cycle, where HAMR/Logika
feedback triggers largely automated, Meta-Rule guided revisions, with minimal user
intervention until system model integration and code-level verification succeed. On the
Isolette benchmark, our approach achieved 100% code-level formal verification for the
predefined Scala application logic in 25 minutes, utilizing approximately 4 million tokens. In
contrast, our baselines failed to achieve end-to-end, formally verifiable code-level artifacts
within 33 minutes, despite consuming nearly 19 million input tokens (including a > 40-page
system requirements PDF). The Meta-Rule–guided approach demonstrated an
approximately 58× increase in task throughput. These results suggest that neuro-symbolic
guardrails can reconcile the generative capability of LLMs with formal rigor in the
development of realistic high-assurance systems. We conclude by discussing current
limitations and outlining future directions for scaling this neuro-symbolic architecture to
broader cyber-physical system applications.
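The closed-loop, self-healing generate–verify–repair cycle described above can be sketched in pseudocode form. This is a minimal, hypothetical sketch: the function names (`generate_contracts`, `run_logika`, `apply_meta_rules`) are illustrative placeholders and do not correspond to the actual INSPECTA toolchain API; the stub bodies merely stand in for an LLM generation pass, a HAMR/Logika verification run, and a Meta-Rule–guided revision step.

```python
from dataclasses import dataclass, field

@dataclass
class Report:
    """Stand-in for a verifier report (real runs would come from HAMR/Logika)."""
    verified: bool
    errors: list = field(default_factory=list)

def generate_contracts(requirements, meta_rules):
    # Placeholder: an LLM agent would emit GUMBO contracts and HAMR
    # skeletal code from the natural language requirements here.
    return {"contracts": requirements, "revision": 0}

def run_logika(artifacts):
    # Placeholder: real verification would invoke HAMR/Logika on the
    # generated artifacts; here we pretend it succeeds after two repairs.
    return Report(verified=artifacts["revision"] >= 2,
                  errors=["proof obligation failed"])

def apply_meta_rules(artifacts, errors, meta_rules):
    # Meta-Rules turn verifier feedback into a largely automated,
    # guided revision of the artifacts.
    return {**artifacts, "revision": artifacts["revision"] + 1}

def generate_verify_repair(requirements, meta_rules, max_iters=10):
    """Closed loop: generate, verify, and repair until verification succeeds."""
    artifacts = generate_contracts(requirements, meta_rules)
    for _ in range(max_iters):
        report = run_logika(artifacts)
        if report.verified:
            return artifacts
        artifacts = apply_meta_rules(artifacts, report.errors, meta_rules)
    # Beyond the iteration budget, the cycle escalates to the user.
    raise RuntimeError("verification did not converge; user intervention required")
```

The key design point is that verifier feedback, not human review, drives each revision; the user intervenes only when the loop fails to converge or at final system model integration.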

Dr. Amer Tahat is a Senior Engineer at Collins Aerospace/RTX with over 13 years of experience across industry and academia in neuro-symbolic AI, formal methods, and GenAI safety and security for critical systems. He specializes in building trustworthy MBSE and coding copilots that support critical systems development from requirements to implementation. Previously, he served as an Assistant Research Professor at Penn State and Virginia Tech. He earned his Ph.D. from Michigan Technological University, with research co-supervised by NASA Langley Research Center, and has led and contributed to multiple DARPA-, ONR-, and NSF-supported research projects.

 
