A Tour of Applied Program Analysis and Domain-Specific Automated Testing
Course Description
This course studies the nature of software bugs and security vulnerabilities arising in complex application domains and surveys specialized program analysis and automated testing techniques for identifying such issues proactively. The course will take a tour of various domains such as mobile systems, databases, web browsers, distributed and networked systems, autonomous vehicles, and smart contracts. For each domain, the class will discuss state-of-the-art research techniques that aim to uncover a special class of software bugs automatically. Apart from the literature review, students will engage significantly with software system design and engineering via a semester-long project, which will involve working with real-world applications and analysis tools for one or more domains.
Logistics
Spring 2023, 12 units
Class: Tue/Thu 5:00pm-6:20pm in GHC 4101
Should I take this course?
Note: This is not a traditional lecture-based course. Classes will often consist of group discussions and student-led presentations, based on assigned readings of research papers, online articles, or case studies on open-source software.
- For students doing research in the areas of programming languages, software engineering, computer systems, or security: this course will provide exposure to (a) a number of new application domains and the challenges of reasoning about software in those domains, and (b) techniques for leveraging domain-specific assumptions in order to apply their research to new problems.
- For students targeting careers in security, software quality, or as domain experts: this course will (a) provide an introduction to a wide array of techniques for highly specialized software analysis and bug finding, and (b) help develop a knack for acquiring knowledge about state-of-the-art techniques from academic literature and prototyping with associated tools and artifacts.
- For students with a general interest in program analysis and security, this course will provide an opportunity to learn about and discuss a variety of different approaches to automated bug finding, as well as to engage in hands-on tool building through the course project.
Prerequisites
This course is open to PhD and Masters students interested in software engineering, program analysis, and/or security. The course assumes some background in understanding the sources of common software bugs (e.g., buffer overflows) and in dealing with program representations (e.g., abstract syntax trees) or automated testing tools (e.g., fuzzing). Any one of the following courses serves as a sufficient prerequisite: 18-335/732 (Secure Software Systems), 14-735 (Secure Coding), 17-355/665/819 (Program Analysis), 15-411/611 (Compiler Design), 15-414 (Bug Catching), 15-330/18-330/18-730 (Intro to Computer Security). 14-741/18-631 (Intro to Information Security) may also be sufficient, depending on background or related coursework. If you have taken a course equivalent to any of the listed prerequisites at a different institution, or if you have had other relevant experiences (e.g., participating in CTFs or working in industry), please register and contact the instructor via email. You should expect to have the following background:
- Basic understanding of build systems and program execution: compilers, interpreters, type checkers, bytecode, threads, system calls, virtual machines, inter-process communication, client-server architecture.
- Comfort working with large-ish codebases (10K+ LoC) in C and Java.
- Ability to discover resources from the web to quickly learn unfamiliar programming languages, build systems, virtual machine setups, etc.
- Basic understanding of foundational algorithms and data structures such as hash maps, trees, and graph traversal.
- Basic understanding of discrete mathematics (e.g., set theory) and fluency in first-order logic notation. Non-trivial formulas using the following symbols should make sense: {∀, ∃, ⇒, ⇔, ∅, ⊆}.
Degree Requirements Fulfilled
Masters: Contact the instructor to request.
PhD students: Satisfies the ENG requirement of the Software Engineering PhD program. Contact the instructor to request others.
Learning Objectives
Students completing this course should be able to:
- Identify practical challenges of applying well-known program analysis techniques to a variety of application domains.
- Formulate and leverage domain-specific assumptions for making program analysis tractable and useful in a specialized setting.
- Build practical tools for improving software quality in real-world systems.
Course Topics
- Overview of general techniques for finding software bugs (static analysis, fuzzing, symbolic execution, formal methods)
- Program analysis techniques for various domains, including:
  - Database systems
  - Operating systems
  - Mobile applications
  - Web applications
  - Compilers
  - Web browsers
  - Distributed systems
  - Network protocols
  - Machine learning
  - Cyber-physical systems
  - Smart contracts
- Considerations in industrial adoption of automated bug-finding tools.
Assessments
- 20% pre-class reading responses
- 20% class presentations
- 20% participation
- 10% exploratory assignment
- 30% final project
Schedule
The following schedule of topics is tentative and will be updated in real time during the semester.
Course Logistics and Policies
Technology Requirements
Here is the technology that students may need to use during the semester. If you have any trouble using any of these tools, please talk to the instructor so that we can figure out an accommodation.
- Canvas: For reading material, pre-class quizzes, uploading class presentations.
- Piazza: For class announcements, questions about the course, other discussions.
- Access to a computer or VM with a UNIX-based operating system (e.g., Linux, macOS): Most code artifacts that we will encounter run best on Unix environments. Windows users should be able to use WSL.
- A GitHub account: To interact with source code repositories of analysis tools and target programs, as well as for certain class activities.
- Laptop or tablet for in-class presentations: Our classroom should be equipped with a projector that accepts HDMI input.
Accessing the Reading Material
When we assign readings, we will provide a link to a web resource containing the official publication. For published academic papers, we will usually reference the Digital Object Identifier (DOI) that takes you to the online proceedings. For other resources such as blog posts, news articles, and software repositories, we will provide a URL to the primary source. For articles behind a paywall, we will provide a PDF via Canvas for internal classroom use only. Please do not distribute these PDFs publicly as doing so may infringe on copyright.
Pre-class Readings
For most classes with assigned readings or tutorials, a quiz will be assigned on Canvas. These quizzes are to be completed individually before the start of class. The quizzes will be based on the assigned reading and will be graded leniently; they are intended to be a checkpoint to ensure that everyone is prepared and on the same page before coming to class. Late submissions will not be accepted, since that would defeat the purpose of the quiz. However, see the absence policy below.
Class Presentations
Depending on the class size, you will be expected to be the discussion lead one or two times in the semester. As the discussion lead, you should read the paper carefully (complete all three passes of Keshav’s three-pass approach) and prepare a presentation of the paper along with points to seed the discussion afterwards. In some cases, you may be able to find a video of the authors' own presentation at the conference or their slides, which you are welcome to use. However, the lead is still required to prepare slides presenting their own view of the work, and to seed and lead the discussion around it, including any other context necessary.
When preparing class presentations, be sure to avoid infringing on copyright. Most publishers, including ACM and IEEE, allow using parts of a paper (such as figures) for internal classroom use. If using images sourced from the internet at large, look for works in the public domain or those that allow reuse (e.g., via a Creative Commons license). Provide proper attribution where required. When in doubt, make your own drawings. You are also welcome to use the whiteboard in class in lieu of making complicated custom diagrams on slides.
Participation
A portion of the grade is reserved for class participation, which has both an objective and a subjective component. Partial credit is objectively assigned to class attendance (see also the absence policy below). The remaining credit is subjectively assigned based on active involvement in technical discussions such as asking questions, providing clarifications, and sharing opinions or experiences either in class or on Piazza. The instructor will track student participation through the semester and holistically assign participation grades (low/medium/high) at the end of the semester.
Course Project
The most significant component of the course is a semester-long project. Students may work individually or in groups of two or three (the expectations of project scope scale with team size). Projects should be related to analyzing software in some specific domain and must have a concrete implementation component, but otherwise the topic can be of the students' choosing. PhD students are expected to pick a project topic that explores an open research question, usually aligning with their own thesis work. Masters students are welcome to perform research, but can also pick an engineering-oriented project as long as it engages with large-scale real-world software: either the software-analysis tooling or the target applications should be in regular widespread use. The teaching staff will help students refine their project scope to ensure it meets learning objectives while being appropriately sized for completion within the semester.
In the last week of the semester, all project teams will give a presentation of their project outcomes in class. Additionally, project teams are expected to write up a report of their project as a conference/workshop-style short paper, which will be due during finals week. Projects are graded on contributions and presented insights. More details will be released closer to the end of the semester.
Class Absence Policy
A significant portion of the final grade depends on regular participation in class activities. However, we understand that unexpected life events (e.g., health or family issues) and other professional obligations (e.g., conference travel or university-level athletics events) can cause students to miss a small number of classes. To account for such absences, every student will automatically get full points for up to 4 absences (~15% of classes) in both the pre-class reading quizzes and class attendance. Students need not inform the instructor ahead of time. This policy will account for lapses of all types, including simply forgetting to submit on time, registering for the class late, etc. No other make-up provision will be made for reading responses and class attendance, with the exception of explicit disability-related accommodations.
For unexpected contingencies affecting scheduled class presentations and final project presentations, please contact the instructor ASAP; these will be handled on a case-by-case basis.
Collaboration and Academic Integrity
Since this is a discussion-oriented and advanced topics course, collaboration is expected. However, each student is expected to submit pre-class reading responses individually. Course projects and in-class activities may be performed in teams. Any contribution by members outside the team (e.g., assistance provided by open-source software developers) should be explicitly credited. In general, we will follow the standard CMU Academic Integrity Policy.
Statement of Support for Students’ Health and Well-being
Grad school isn't easy, so please take care of yourself. Your health matters. Do your best to maintain a healthy lifestyle this semester, including eating well, getting enough sleep, and taking time to relax. This will help you achieve your goals and cope with stress.
All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.
If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at https://www.cmu.edu/counseling. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.