NYU, Graduate Division, Computer Science Course, CSCI-GA 3033-017 Abstract Interpretation
Patrick Cousot pcousotcsnyuedu
Spring 2017 |

Description, Prerequisites, Schedule, First class, Office Hours, Textbook, Homework, Personal Project, Midterm, Final, Academic integrity, Requirements, Grading, Course content

Modern societies are increasingly dependent upon the proper functioning of computer systems. Hardware and
software safety and security is critical in many applications of computer science
(for example in cyber-physical systems such as avionics, medical devices,
automotive/transportation, but also in enterprise and service software
such as accounting, banking, trading, social networking, *etc*.).

Yet, computer programs are riddled with flaws that at best are inconvenient and annoying, and at worst, have extremely damaging consequences. The Heartbleed buffer over-read security bug in the open-source OpenSSL cryptography library is a recent example among thousands of other less advertised ones.

This situation is specific to computer science since in other engineering disciplines engineers are penally responsible for there errors while programmers are not, and so no computer scientist is compelled to use best practices. A direct consequence is the use of cheap and ineffective practices such as debugging. ``Program testing can be used very effectively to show the presence of bugs but never to show their absence'' (E. W. Dijkstra).

In software and hardware engineering, formal methods aim at using the rigor of mathematics for the specification, development, and verification of reliable and robust software and hardware systems. Formal methods are now part of the academic and industry standards as shown for example by the DO-178 C primary document for certifying all commercial software-based aerospace systems. In particular, the DO-333 supplement to the DO-178 C discusses the use of sound formal methods as part of a software life cycle. So a firm foundation in formal methods are now an integral part of the core knowledge of every high-level professional computer scientist.

The ultimate objective of formal methods is to verify the correctness of computer systems and to have this verification completely automated, using computers. This objective is hard to achieve because of undecidability: no computer can correctly answer with finite time and memory resources every question relative to the correctness of an arbitrary computer system. Therefore all automated formal methods have to make compromises to escape this inevitable limitation of the power (including any form of ``intelligence'') of computer systems.

All automated formal methods must provide answers in finite time on undecidable questions and so have limitations:

- Automated theorem proving requires human guidance and assistance, which may be an overwhelming task;
- Model-checking can only enumerate the behaviors of finite systems, and small enough to avoid time and memory explosion;
- Static program analysis may be unsound or not be precise enough to answer the desired questions.

It is therefore necessary to study formal methods not only as a recipe to get correct programs but also from a mathematical point of view to understand their respective principles. This is the role of abstract interpretation. Abstract interpretation originated from the need to formalize the soundness and generalize dataflow analysis and typing performed by compilers into fully automated program verification (and is often limited to this understanding). It turned out that the principles recognized by abstract interpretation where able to establish the foundations of all formal methods (preferably sound, but also to study the origin of unsoundness in incorrect ones).

The general idea is that to state anything formal on a computer system, it is necessary to rigorously define its semantics that is a model of
all its possible behaviors in any possible execution environment. Then reasoning on the computer system is done on an abstraction
of the properties of this semantics, which has to be derived, often by calculational design, from the semantics. Most often an
inductive reasoning is needed, and possible thanks to abstract induction.
So the study of formal methods amounts to the study of the abstractions that
they are built upon. Examples of applications of the theory of abstract interpretation involve
SMT solvers ^{1},
proof methods ^{2},
and static analysers ^{3};

The course will put an emphasis on the use of abstract interpretation to design proof methods and static analyzers, which now become part of the practice of companies developing huge softare (like Facebook, Google, Microsoft, Spotify, Uber, etc, see for example CodeContracts or Infer).

- Semantics
- Operational semantics, inductive definitions
- Programming language semantics are abstract interpretations of the operational semantics
- Relational semantics
- Reachability semantics
- Abstraction
- Program properties, lattices, fixpoints
- Galois connections
- Abstract domains
- Fixpoint abstraction (widening, narrowing and their dual)
- Verification
- Proof methods are abstract interpretations of the programming language semantics
- Proof of invariance for sequential and parallel programs (absence of data races)
- Proof of liveness for sequential (termination) and parallel programs (non-starvation)
- Deductive formal methods, soundness and completeness
- Static analysis
- Sound static analyses are abstract interpretations of the programming language verification methods
- Static analysis based on finite domains (including dataflow analysis, model checking, constraint resolution-based, etc)
- Static analysis based on infinite domains (numerical, type, and data structure analysis)

The course is opened to PhD students and Master students eager to study at the PhD level, which presupposes basic knowledge and includes the ability to read and the effective reading of recent, up to date, and possibly difficult research articles.

Therefore students are assumed to have previously successfully studied courses in mathematics (set theory and first-order logic), programming languages, compilation, and operating systems, to have a good practice of computer programming in any high-level programming languages.

Students without this minimal background will not really benefit from the course.

Background mathematical material will be provided for self-study to make sure the students master the basics.

No class on Monday, February 20, 2017 (Presidents' Day), Monday, March 13, 2017 (Spring Recess). The last class is on Monday, May 8, 2017 see the academic calendar.

- Analyzing programs using static analysers and explaining the outcome of the analysis in case of false alarms;
- Reading a research paper on abstract interpretation, providing a comprehensive summary in 1200 words (about 4 pages). The papers will be distributed randomly. The summary will be evaluated on the understanding of the background material, the clarity of the presentation, the insights and opinions beyond what was in the paper.

© P. Cousot 2017