Writing a constraint-based static analysis for C programs with LLVM and Z3
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写In this lab, you will implement a constraint-based analysis to detect exploitable divide-by-zero bugs. A bug is exploitable if hackers can control inputs or environments, thereby triggering unintended behaviors (e.g., denial-of-service) through the bug. For example, a recently reported divide-by-zero bug in the Linux kernel can be exploitable and crash the system. You will design a static analysis that detects such bugs by combining reaching definition analysis and taint analysis on top of a constraint solver, Z3.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写The skeleton code for Lab 6 is located under /cis547vm/lab6/ . We will frequently refer to the top level directory for Lab 6 as lab6 when describing file locations for the lab.
The following commands setup the lab:
$ cd lab6
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写 $ mkdir build && cd build
$ cmake ..
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写 $ make
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写The above command will generate an executable file constraint that checks whether the input program has an exploitable divide-by-zero bug:
$ cd lab6/test
$ clang -emit-llvm -S -fno-discard-value-names -c simple0.c
$ ../build/constraint simple0.ll
If you’ve done everything correctly up to this point you should see Potential divide-by-zero points:
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写In this lab, you will design a reaching definition analysis and taint analysis using Z3. The main tasks are to design the analysis in the form of Datalog rules through the Z3 C++ API, and implement a function that extracts logical constraints in the form of Datalog facts for each LLVM instruction.
We will then feed these constraints, along with your datalog rules, into the Z3 solver which should report any exploitable divide-by-zero errors. The main function of src/Constraint.cpp ties this logic together.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写In short, the lab consists of the following tasks:
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写1. Write Datalog rules in the initialize function in Extractor.cpp to define the reaching definition analysis and taint analysis.
2. Write the extractContraints function in Extractor.cpp that extracts Datalog facts from LLVM IR Instruction .
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写Relations for Datalog Analysis. The skeleton code provides the definitions of necessary Datalog relations over LLVM IR in Extractor.h . In the following subsection, we will show how to represent LLVM IR programs using these relations.
The relations for def and use of variables are as follows:
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写● Def(X,Y) : Variable X is defined at instruction Y.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写 ● Use(X,Y) : Variable X is used at instruction Y.
The relations for the reaching definition analysis are as follows:
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写● Kill(X,Y) : Instruction X kills definition at instruction Y.
● Next(X,Y) : Instruction Y is an immediate successor of instruction X.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写 ● In(X,Y) : Definition at instruction Y may reach the program point just before instruction X.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写 ● Out(X,Y) : Definition at instruction Y may reach the program point just after instruction X.
Note that the Kill relation can be derived by using the Def relation by writing a Datalog rule.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写The relations for the taint analysis are as follows:
● Taint(X) : There exists a function call at intruction X that reads a tainted input.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写 ● Edge(X,Y) : There exists an immediate tainted data-flow from instruction X to instruction Y.
● Path(X,Y) : There exists a transitive tainted data-flow from instruction X to instruction Y.
● Sanitizer(X) : There exists a function call at intruction X that sanitizes a tainted input.
● Div(X,Y) : There exists a division operation at instruction Y whose divisor is variable X
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写. ● Alarm(X) : There exists a potential exploitable divide-by-zero error at instruction X.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写Assume that input programs may contain function calls to tainted_input and sanitizer that read and sanitize a tainted input, respectively. The final output relation for potential bug reports is Alarm .
You will use these relations to build rules for the definition analysis and taint analysis in Extractor.cpp .
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写Encoding LLVM Instruction in Datalog. Recall that, in LLVM IR, values and instructions are interchangable. Therefore, all variables X, Y, and Z are an instance of LLVM’s Value class. Assume that input C programs do not have pointer variables. Therefore, we abuse pointer variables in LLVM IR as their dereference expressions. Consider the following simplified LLVM program from a simple C program int x = 0; int y = x;
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写x = alloca i32 ; I0
y = alloca i32 ; I1
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写 store i32 0, i32* x ; I2
a = load i32, i32* x ; I3
store i32 a, i32* y ; I4
We ignore alloca instructions and consider that each store instruction defines the second argument. In the case of the above example, you should have Def(I0,I2) , because x corresponds to x in LLVM IR. Likewise, consider each load instruction uses the argument. In the example, you should have Use(I0,I3) and Def(I3,I3) because load instructions define a non-pointer variable which is represented as the instruction itself in LLVM. Finally, you should have Use(I3,I4) and Def(I1,I4) for instruction I4.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写Defining Datalog Rules from C++ API. You will write your Datalog rules in the function initialize using the relations above. Consider an example Datalog rule:
A(X, Y) :- B(X, Z), C(Z, Y).
This rule corresponds to the following formula:
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写???, ??, ??. ??(??, ??) ∧ ??(??,??) ? ??(??, ??).
In Z3, you can specify the formula in the following sequence of APIs in initialize :
/* Declare quantified variables */
z3::expr X = C.bv_const(“X”, 32); // encode X as a 32-bit bitvector (bv)
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写z3::expr Y = C.bv_const(“Y”, 32);
z3::expr Z = C.bv_const(“Z”, 32);
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写/* Define and register rules */
z3::expr R0 = z3::forall(X, Y, Z, z3::implies(B(Y,Z) && C(Z, Y), A(X,Y)));
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写You might take a look at the important classes including expr and fixed point as well as an example of using Z3 C++ API.
Extracting Datalog Facts. You will need to implement the function extractConstraints in Extractor.cpp to extract Datalog facts for each LLVM instruction. The skeleton code provides a couple of auxiliary functions in lab6/src/Extract.cpp and lab6/src/Utils.cpp help you with this task:
- void addX(const InstMapTy &InstMap, ...)
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写 - X denotes the name of a relation. These functions add a fact of X to the solver. It takes InstMap that encodes each LLVM instruction as an integer. This map is initialized in the main function.
- vector<Instruction*> getPredecessors(Instruction *I)
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写- Returns a set of predecessors of a given LLVM instruction I .
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写 - bool isTaintedInput(CallInst *CI)
- Checks whether a given LLVM call instruction CI reads a tainted input or not.
- bool isSanitzer(CallInst *CI)
- Checks whether a given LLVM call instruction CI reads sanitizes a tainted input or not.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写Miscellaneous. For convenience for easy debugging, you can use the print function in Extract.cpp. If the -d option is passed through the command line (e.g., constraint simple0.ll -d ), it will print out tuples of several relations. You can extend the function for your purpose.
Format of Input Programs
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写Input programs in this lab are assumed to have only sub-features of the C language as follows:
- All values are integers (i.e., no floating points, pointers, structures, enums, arrays, etc).
You can ignore other types of values.
- Assume that there is no function call except for tainted_input and sanitizer
Example Input and Output
Your analyzer should run on LLVM IR. For example:
$ cd lab6/test
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写$ clang -emit-llvm -S -fno-discard-value-names -c loop0.c
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写$ ../build/constraint loop0.ll
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写If the input program has exploitable divide-by-zero errors, it should print out the corresponding LLVM instructions.
一站式論文代寫,英国、美国、澳洲留学生Essay代寫—FreePass代写Potential divide-by-zero points:
%div = sdiv i32 4, %3
Items to Submit
Submit file Extractor.cpp via Coursera.