MID-088: Formally Verified Parsers

Mitigation Level: Leading

Description

Parsers frequently have vulnerabilities that threat actors can exploit with malicious inputs. These can include parsers for the network protocol messages being ingested and/or the underlying data contained in that protocol, such as an incoming file, document, or other structured payloads. Exploits may take advantage of memory issues such as double-fetch memory calls, buffer overflows, and array out-of-bounds accesses. Another common weakness is a so-called “parser differentials”, where the attacker can exploit mismatches in behavior between two parsers for the same language. To avoid the presence of vulnerabilities, parser creators can use formal verification to ensure the functional correctness of the parsers with regard to a particular formal definition of the format.

In general, formal verification is the process of proving, typically with an aid of automated or semi-automated tooling, that a particular code module or function satisfies the desired requirements. Depending on the size and complexity of the code to be verified, formal verification can be a time-consuming effort requiring significant expertise. One way for developers to leverage formal verification at lower cost is to use pre-existing code and libraries that has already been formally verified, or to leverage tooling that can generate “correct-by-construction” code from specifications.

Fortunately, formally verified parser generators already exist. In addition to ensuring correctness, the formal verification of parsers typically provides significant security guarantees, such as guaranteeing memory and type safety, and preventing deadlock states and crashes. As a result, parser implementors and users can be sure that they will not have bugs or vulnerabilities in their code. Formally verified parsers therefore can be trusted to reject improperly structured inputs as defined by the parser specification. For example, they can be trusted to not have buffer overflows that could cause arbitrary code execution vulnerabilities when verified with respect to memory safety. Moreover, formally verified parsers can be trusted to produce a well-formed output, where each data item would have the expected size and type, thereby protecting downstream code from the effects of malicious inputs.

A best practice to create trustworthy network parsers is by using formally verified parser generators that will either use formal tooling to prove their output correct, or have already been verified to always produce correct results. Generally, use of these generators involves creating a specification to describe a protocol and then feeding that specification through the generator. The generator will output usable code, assembly, and/or binaries. By having developers only create a specification, but not actually write code themselves, assurances can be provided that the final code has undergone the formal verification process and is functionally correct with regard to the specified format. Therefore, it will be free of any potential for undefined behaviors, no matter how malicious the input. Often the same protocol formal specification can be used to generate parsers in different programming languages for multiple environments, helping to ensure that different implementations will act consistently and eliminating the potential for parser differential vulnerabilities.

Considerations: Unlike most other uses of formal methods, the use of formally verified parser generators typically does not require any special expertise. It does however require capturing the target format(s) in a formal (mathematically precise) definition, which can sometimes be fulfilled in a specification for a generator. While this process can be time-consuming, it can typically be a beneficial action in its own right, as it removes any ambiguities that are almost certainly present in any less-formal specifications, such as those written in natural languages.

Limitations: Only properties that were scoped into the specification and/or generator verification of the parser can be guaranteed to be in the resulting parser. Errors and vulnerabilities may still occur on the boundary of the formally verified code. If any changes, even seemingly trivial ones, are made to the generated code (such as to optimize performance), all guarantees provided by the formal methods tooling are invalidated.

IEC 62443 4-2 Mappings

CR – 3.5 Input Validation

References

[1] T. Ramananandro. A. Rastogi, N. Swamy. “EverParse: Hardening critical attack surfaces with formally proven message parsers.” microsoft.com. Accessed: Mar. 18, 2025. Available: https://www.microsoft.com/en-us/research/blog/everparse-hardening-critical-attack-surfaces-with-formally-proven-message-parsers/[2] D. Wallach. “SafeDocs: Safe Documents.” darpa.mil. Accessed: Apr. 01, 2025. Available: https://www.darpa.mil/research/programs/safe-documents

[3] S. Ali and S. W. Smith, “A Survey of Parser Differential Anti-Patterns,” 2023 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 2023, pp. 105-116, doi: 10.1109/SPW59333.2023.00016.

[4] H. Foster. “Formal Verification.” verificationacademy.com. Accessed: Mar. 18, 2025. Available: https://verificationacademy.com/topics/formal-verification/

[5] Y. Cai. “Vest.” github.com. Accessed: Mar. 18, 2025. Available: https://github.com/secure-foundations/vest?tab=readme-ov-file