NYU, Graduate Division, Computer Science Course, CSCI-GA.3140-001

Abstract Interpretation applied to Verification and Static Analysis

Patrick Cousot
pcousotcsnyuedu

Spring 2019

Description, Prerequisites, Class Hours, First class, Office Hours, Textbook, Homework, Personal Project, Midterm, Final, Academic integrity, Requirements, Grading, Course content

Course Description:

Objective of the course

Modern societies are increasingly dependent upon the proper functioning of computer systems. Hardware and software safety and security is critical in many applications of computer science (for example in cyber-physical systems such as avionics, medical devices, automotive/transportation, but also in enterprise and service software such as accounting, banking, trading, social networking, etc.).

Yet, computer programs are riddled with flaws that at best are inconvenient and annoying, and at worst, have extremely damaging consequences. The Heartbleed buffer over-read security bug in the open-source OpenSSL cryptography library is a recent example among thousands of other less advertised ones, not speaking of the yet to be discovered ones. Software of millions of lines are now common meaning thousands undiscovered potentially disastrous bugs.

This situation is specific to computer science since in other engineering disciplines engineers are penally responsible for there errors while programmers are not, and so no computer scientist is compelled to use best practices. A direct consequence is the use of cheap and ineffective practices such as debugging. ``Program testing can be used very effectively to show the presence of bugs but never to show their absence'' (E. W. Dijkstra).

In software and hardware engineering, formal methods aim at using the rigor of mathematics for the specification, development, and verification of reliable and robust software and hardware systems. Formal methods are now part of the academic and industry standards as shown for example by the DO-178 C primary document for certifying all commercial software-based aerospace systems. In particular, the DO-333 supplement to the DO-178 C discusses the use of sound formal methods as part of a software life cycle. So a firm foundation in formal methods are now an integral part of the core knowledge of every high-level professional computer scientist.

The ultimate objective of formal methods is to verify the correctness of computer systems and to have this verification completely automated, using computers. This objective is hard to achieve because of undecidability: no computer, as ``intelligent'' as it is, can correctly answer with finite time and memory resources every question relative to the correctness of an arbitrary computer system. Therefore all automated formal methods have to make compromises to escape this inevitable limitation of the power (including any form of ``intelligence'') of computer systems.

All automated formal methods must provide answers in finite time on undecidable questions and so have limitations:

Automated theorem proving requires human guidance and assistance, which may be an overwhelming task;
Model-checking can only enumerate the behaviors of finite systems, and small enough to avoid time and memory explosion;
Static program analysis may be unsound or correct but not be precise enough to answer the desired questions.

A consequence of undecidability is that all automated formal verification methods will fail on infinitely many programs. Fortunately, automated formal verification methods will also succeed on infinitely many programs. By succeed, we mean that if a question about a program is answered then this answer can definitely be trusted. Not all formal methods can be trusted. For example program simulation, symbolic execution, or bounded model checking may be good at finding bugs but they are not sound formal verification methods.

It is therefore necessary to study formal methods not only as a recipe to get correct programs but also from a mathematical point of view to understand their respective principles. This is the role of abstract interpretation. Abstract interpretation originated from the need to formalize the soundness and generalize dataflow analysis and typing performed by compilers into fully automated program verification (and is often limited to this understanding). It turned out that the principles recognized by abstract interpretation where able to establish the foundations of all formal methods (preferably sound, but also to study the origin of unsoundness in incorrect ones). Abstract interpretation has also been applied to the design of the most successful static analysis tools such as Astrée.

The general idea is that to state anything formal on a computer system, it is necessary to rigorously define its semantics that is a model of all its possible behaviors in any possible execution environment. Then reasoning on the computer system is done on an abstraction of the properties of this semantics, which has to be derived, often by calculational design, from the semantics. Most often an inductive reasoning is needed, and possible thanks to abstract induction. So the study of formal methods amounts to the study of the abstractions that they are built upon. Examples of applications of the theory of abstract interpretation involve SMT solvers ¹, proof methods ², and static analysers ³;

The course will put an emphasis on the use of abstract interpretation to design proof methods and static analyzers, which now become part of the practice of companies developing huge softare (like Amazon, Facebook, Google, Microsoft, Spotify, Uber, etc, see for example CodeContracts or Infer).

Content

A non-exhaustive list of topics possibly covered in this course is the following

Semantics

Operational semantics, inductive definitions
Programming language semantics are abstract interpretations of the operational semantics
Relational semantics
Reachability semantics

Abstraction

Program properties, lattices, fixpoints
Galois connections
Abstract domains
Fixpoint abstraction (widening, narrowing and their dual)

Verification

Proof methods are abstract interpretations of the programming language semantics
Proof of invariance for sequential and parallel programs (absence of data races)
Proof of liveness for sequential (termination) and parallel programs (non-starvation)
Deductive formal methods, soundness and completeness

Static analysis

Sound static analyses are abstract interpretations of the programming language verification methods
Static analysis based on finite domains (including dataflow analysis, model checking, constraint resolution-based, dependency analysis, etc)
Static analysis based on infinite domains (numerical, type, and data structure analysis)

Benefits

This course provides a deep understanding of formal methods in computer science for safety and security. It provides a scientific methodology, paving the way for safety and security engineering, as well as practical examples of formal methods.

Prerequisites:

The course is opened to PhD students and Master students eager to study at the PhD level, which presupposes basic knowledge and includes the ability to read and the effective reading of recent, up to date, research articles.

Therefore students are assumed to have some basic knowledge in computer science (programming languages, compilation, and operating systems), mathematics (set theory and first-order logic), and to have a good practice of computer programming in any high-level programming languages.

Students without this minimal background in computer science will not really benefit from the course. For mathematics, background material will be provided for self-study to make sure the students master the basics.

Class Hours:

Mondays, 5:10 PM — 7:00 PM, Room 406, 60 Fifth Ave (see the course schedule)

No class on Monday, February 18, 2019 (Presidents' Day), Monday, March 18, 2019 (Spring Recess). The last class is on Monday, May 13, 2019 see the academic calendar.

First class:

Monday, January 28, 2019 (see the academic calendar).

Office Hours:

Mondays, 3:00PM—5:00PM, room 400 at 60 5th Ave., by email appointment to avoid waiting lines.

Textbook:

None, all information (including slides and course notes) is provided online by the instructor.

Homework:

The careful reading of the course notes is mandatory. The homework exercices are facultative and given for encouraging the improvement of one's knowledge and self-training in the perspective of the exams. No answers to homeworks are given. Students are encouraged to discuss and compare their solutions with other students and, in case of doubts, to ask for help during office hours.

Optional personal project:

The project will consists in

Analyzing programs using static analysers and explaining the outcome of the analysis in case of false alarms;
Reading a research paper on abstract interpretation, providing a comprehensive summary in 1200 words (about 4 pages). The papers will be distributed randomly. The summary will be evaluated on the understanding of the background material, the clarity of the presentation, the insights and opinions beyond what was in the paper.

Midterm exam:

Monday, March 12, 2019 (tentative date), 50mn, 5:10PM—6:00PM, Room 406, 60 Fifth Ave. Questions on all previous classes.

Final exam:

Monday, May 20, 2016, 1:50 hours, 5:10PM—7:00PM, Room 406, 60 Fifth Ave Questions on all previous classes.

Academic integrity:

See the NYU computer science department commitment to academic integrity and the Academic Integrity of the College of Arts and Science

Course requirements:

Midterm and final exam. The personal project is facultative, Class attendance is mandatory, studying course notes is highly recommended.

Grading:

Personnal project (25%), midterm exam (25%), and final exam (50%) or midterm exam (50%), and final exam (50%).

The course content is online

^{1. Vijay D'Silva, Leopold Haller, Daniel Kroening:
Abstract satisfaction. POPL 2014: 139-150 ↩}
^{2. Jade Alglave, Patrick Cousot: Ogre and Pythia, An invariance proof method for weak consistency models. POPL 2019 ↩}
^{3. The Astrée static analyzer ↩}