CryptoMiniSat
New version: 1.2.11 (30/10/09)
CryptoMiniSat is a program developed from MiniSat, primarily to use in cryptography, though its advantages are probably useful for other fields as well. It has the following advantages compared to standard MiniSat:
- Natively handled XOR functions
- Phase saving
- Statistics generation -- With learnt clause stats
- Search randomization
- Detailed solving process visualization
- Clause grouping and group naming
- Variable naming
- Debug mode
- Code cleanup -- no dependency on Boost
- Regression tests -- thanks to Vijay Ganesh for this idea
These advancements are partially described in the poster for Eurocrypt 2009 available here, and a research paper on how these advancements were used in the field of cryptography will be presented at the SAT 2009 conference in Swansea, Wales. The authors of the paper are: Mate Soos, Karsten Nohl, and Claude Castelluccia. Questions specifically about CryptoMiniSat should be addressed to Mate Soos.
The program is available as a source (ver 1.2.11) or as a windows executable (ver 1.2.11). If you download the source, please read the INSTALL and the README file. An example DIMACS file that uses XOR-clauses, clause groups and variable names , and describes the Crypto-1 stream cipher is available here (it is also in the .tar.gz archive under "cryptominisat-1.2.11/build/"). A detailed description of the enhancements' usage is detailed below.
During development, MiniSat's version solely written in C has also been used, and the native XORs have been back-ported to it. No other advancement has been back-ported. It is available here.
Usage
The overview of the usage is printed by:
USAGE: ./cryptominisat [options] <input-file> <result-output-file>
where input may be either in plain or gzipped DIMACS.
OPTIONS:
-polarity-mode = {true,false,rnd}
-decay = <num> [ 0 - 1 ]
-rnd-freq = <num> [ 0 - 1 ]
-verbosity = {0,1,2}
-proof-log = Logs the proof into files 'proofN.dot', where N is the
restart number. The log can then be visualized using
the 'dot' program from the graphviz package
-grouping = Lets you group clauses, and customize the groups' names.
This helps when printing statistics
-stats = Computes and prints statistics during the search
-randomize = <seed> [0 - 2^32-1] Randomly permutates the clauses. The
seed is later also used for picking decision variables
-restrict = <num> [1 - varnum] when picking random variables to branch
on, pick one that in the 'num' most active vars useful
for cryptographic problems, where the question is the key,
which is usually small (e.g. 80 bits)
-restarts = <num> [1 - 2^32-1] No more than the given number of
restarts will be performed during search
XOR functions
One of the most challenging problems when trying to describe cryptographic primitives in SAT solvers is the XOR function that is used very often in cryptography. MiniSat (and most DPLL-based SAT solvers) only understands the Conjunctive Normal Form or CNF for short. However, to describe a XOR function in CNF you need to have 2^{n-1} clauses, where n is the size of your XOR function. To overcome this exponential explosion, XOR functions are cut up into manageable sizes (6-7) and then implemented in 2^6 clauses each. This is not only tedious, uses a lot of memory and slow, but also makes it hard to know what is happening while solving, as the statistics will be very strange.
In CryptoMiniSat, the DIMACS standard format augmented with the following extension: having an "x" in front of a line makes that line a XOR function. For instance, the line
x10 2 7 0
means that
v10 + v2 + v7 = true (where "+" is the binary XOR)
If we wish to invert the xor-clause, it is sufficient to invert one of the variables:
x10 2 -7 0
has now the XOR is inverted (i.e. it is equated to false)
v10 + v2 + v7 = false (where "+" is the binary XOR)
Naturally, if two variables are inverted, their inversions cancel each other out, and the XOR is equated to true. There is no need to cut up XOR clauses to smaller sizes, they can be of any length. Using this representation instead of the cut-and-convert-CNF will give you approximately 2x the speed, lots of memory savings and cleaner statistics.
Clause grouping and naming
Clause groups are useful because most of the time a single clause does not describe the function that we need to describe, e.g. a filter function. So, all clauses that describe that one function can now be put into a group and named, so that they will show up in the statistics and the search graph as one entity. Clause grouping is turned on by the "-grouping" switch to CryptoMiniSat. In the DIMACS file, clauses can be grouped and named by adding a line after the clause:
20 3 5 0 c g 2 interesting function
This will put the clause "20 3 5" into group 2 and name it "interesting function". You can put multiple clauses into the same group:
20 3 5 0 c g 2 interesting function -8 9 12 0 c g 2 interesting function
Now both clauses "20 3 5" and "-8 9 12" are in the same clause group 2, which has the name "interesting function"
Clause groups have no effect on the solving, they have an effect only on the search statistics and the generated search graph.
Variable naming
Variables are good to name so that when they show up in the statistics or in the search graph, they can covey a meaning: it's much more useful to see that "state[148]" caused 800 propagations, than to see that variable 875 caused 800 propagations. Variables can be named by simply putting a line at the very end of you DIMACS file:
c v 875 state[148]
which will name variable 875 as "state[148]". Note that variable 875 should exist in your DIMACS file, though it's not required.
Statistics generation
Giving the "-stats" switch to CryptoMiniSat will make it generate statistics during the solving. It is best to save these statistics to a file like "./cryptominisat -grouping -stats satfile > mystats". The generated statistics will then be in the file "mystats". Statistics are generated for each restart of MiniSat. It is best to look at the statistics generated after at least 4-5 restarts, as MiniSat needs time to learn the best variable ordering, and until then the statistics are quite unuseful.
The statistics generated are the following:
- Distribution of the length of the learnt clauses: Ex.:
[~/cryptominisat/build] $./cryptominisat -stats -grouping satfile [...] +-----------------------------------------------------------+ | Learnt clause length distribution | |Length between no. cl. | +-----------------------------------------------------------+ |0 - 1 1 | |2 - 3 79 | |4 - 5 390 | |6 - 7 419 | |8 - 9 429 | |10 - 11 283 | |12 - 13 185 | |14 - 15 119 | |16 - 17 78 | |18 - 19 70 | |20 - 21 52 | |22 - 23 41 | |24 - 25 36 | |26 - 27 17 | |28 - 29 12 | |30 - 31 0 | +-----------------------------------------------------------+ | Learnt clause distribution in graph form | +-----------------------------------------------------------+ | | | + | | + | | + ++ | | + +++ | | + ++++ | | ++++++ | | N ++++++ | | u +++++++ | | m +++++++ | | b ++++++++ | | e +++++++++ | | r +++++++++ | | ++++++++++ | | ++++++++++++ | | ++++++++++++ | | +++++++++++++ ++ | | ++++++++++++++++++ | | ++++++++++++++++++++++++ | | Learnt clause size | +-----------------------------------------------------------+
The smaller the learnt clause, the better. You should observe a general shrinking of the clause lengths, restart after restart. After a while, unitary clauses should be started to be learnt.
- The number of times variables are branched upon by CryptoMiniSat. Ex.:
[~/cryptominisat/build] $./cryptominisat -stats -grouping satfile [...] +-----------------------------------------------------------+ | No. times variable branched on | |var var name no. times | +-----------------------------------------------------------+ |54 sr[0][65](real unknown) 1608 | |45 sr[0][49](real unknown) 1480 | |82 sr[0][43](real unknown) 1461 | |66 sr[0][54](real unknown) 1420 | |195 sr[0][59](real unknown) 1407 | |218 sr[0][69](real unknown) 1318 | |88 sr[0][57](real unknown) 1307 | |223 sr[0][51](real unknown) 1260 | |426 sr[0][70](real unknown) 1201 | |236 sr[0][47](real unknown) 1149 | |83 sr[0][56](real unknown) 1118 | |43 sr[0][45](real unknown) 1050 | |387 sr[0][74](real unknown) 1014 | |46 sr[0][72](real unknown) 1012 | |378 sr[0][55](real unknown) 975 | |349 sr[0][58](real unknown) 890 | |9 sr[0][66](real unknown) 889 | |279 sr[0][44](real unknown) 841 | |108 sr[0][61](real unknown) 841 | |401 sr[0][42](real unknown) 820 | +-----------------------------------------------------------+
In this example, the stream cipher's shift registers ('sr[0][xxxx]') were branched upon the most. This is expected in this example, as the CNF models a stream cipher, for which the state is the unknown.
- Number of propagations made by clause groups. Ex.:
[~/cryptominisat/build] $./cryptominisat -stats -grouping satfile [...] +-----------------------------------------------------------+ | No. propagations made by clause groups | |group group name no. props | +-----------------------------------------------------------+ |270 sr0[79] 6101 | |307 f1[36] 5950 | |347 f0[42] 5911 | |388 f0[48] 5875 | |172 f4[16] 5843 | |361 f0[44] 5788 | |130 f4[10] 5775 | |335 f2[40] 5768 | |389 f1[48] 5764 | |253 f2[28] 5684 | |362 f1[44] 5682 | |334 f1[40] 5621 | |116 f4[8] 5608 | |416 f0[52] 5564 | |348 f1[42] 5563 | |374 f0[46] 5559 | |295 f3[34] 5545 | |239 f2[26] 5502 | |402 f0[50] 5493 | |280 f2[32] 5443 | +-----------------------------------------------------------+
In this example, most of the propagations were made by fX[yy] clause groups, which model the internal filter functions of the Crypto1 stream cipher (note: if internal filter functions were not present, the most propagating functions would have been the feedback functions)
- Number of conflicts made by clause groups. Ex.:
[~/cryptominisat/build] $./cryptominisat -stats -grouping satfile [...] +-----------------------------------------------------------+ | No. conflicts made by clause groups | |group group name no. confl | +-----------------------------------------------------------+ |284 sr0[81] 388 | |305 sr0[84] 357 | |270 sr0[79] 329 | |256 sr0[29] 307 | |298 sr0[83] 263 | |243 sr0[27] 223 | |229 sr0[25] 200 | |277 sr0[80] 184 | |263 sr0[30] 175 | |236 sr0[26] 169 | |222 sr0[24] 166 | |311 sr0[85] 165 | |291 sr0[82] 162 | |250 sr0[28] 160 | |161 f0[15] 133 | |377 f3[46] 119 | |318 sr0[86] 115 | |215 sr0[23] 113 | |202 sr0[21] 103 | |209 sr0[22] 100 | +-----------------------------------------------------------+
Most conflicts were made by the feedback functions here.
- Propagation depth order of clause groups. Shows which clause is usually the first to propagate (i.e. is at the top of the search tree. Ex.:
[~/cryptominisat/build] $./cryptominisat -stats -grouping satfile [...] +-----------------------------------------------------------+ | Propagation depth order of clause groups | |group group name avg order | +-----------------------------------------------------------+ |62573 learnt clause 4 | |63859 learnt clause 4 | |67558 learnt clause 4 | |72722 learnt clause 4 | |62562 learnt clause 5 | |62660 learnt clause 5 | |67427 learnt clause 5 | |73221 learnt clause 5 | |61349 learnt clause 6 | |62554 learnt clause 6 | |63481 learnt clause 6 | |65004 learnt clause 6 | |65590 learnt clause 6 | |66270 learnt clause 6 | |66347 learnt clause 6 | |66608 learnt clause 6 | |69756 learnt clause 6 | |71278 learnt clause 6 | |71306 learnt clause 6 | |71871 learnt clause 6 | +-----------------------------------------------------------+
In this case, the search tree's topmost propagations were made by learnt clauses. This means that at this point in solving (at the 15th restart), the learnt clauses are effectively directing the search(!)
- Average conflict depth order of clause groups. Gives the average depth that a clause group caused a conflict. Ex.:
[~/cryptominisat/build] $./cryptominisat -stats -grouping satfile [...] +-----------------------------------------------------------+ | Avg. conflict depth order of clause groups | |groupno group name avg. depth| +-----------------------------------------------------------+ |59376 learnt clause 8 | |67919 learnt clause 8 | |69444 learnt clause 8 | |71658 learnt clause 8 | |59330 learnt clause 9 | |62405 learnt clause 9 | |65874 learnt clause 9 | |71801 learnt clause 9 | |71864 learnt clause 9 | |59700 learnt clause 10 | |61640 learnt clause 10 | |61812 learnt clause 10 | |66760 learnt clause 10 | |72067 learnt clause 10 | |60360 learnt clause 10.5 | |59720 learnt clause 11 | |60949 learnt clause 11 | |61647 learnt clause 11 | |63013 learnt clause 11 | |64090 learnt clause 11 | +-----------------------------------------------------------+
The table shows that the topmost conflicts were on average made by learnt clauses. This means that learnt clauses are effectively pruning the search graph, they are aborting the search early.
- Variable assignment order. Assignment of the variables was in this order on average. Ex.:
[~/cryptominisat/build] $./cryptominisat -stats -grouping satfile [...] +-----------------------------------------------------------+ | Variables are assigned in the following order | |var var name avg order | +-----------------------------------------------------------+ |146 sr[0][46](real unknown) 0 | |356 sr[0][48](real unknown) 17.47 | |413 sr[0][41](real unknown) 18.2 | |43 sr[0][45](real unknown) 21.46 | |100 sr[0][50](real unknown) 22.4 | |423 sr[0][60](real unknown) 23.21 | |83 sr[0][56](real unknown) 23.78 | |378 sr[0][55](real unknown) 23.82 | |82 sr[0][43](real unknown) 26.55 | |47 f[3][13] 26.91 | |320 f[0][37] 27.02 | |54 sr[0][65](real unknown) 27.27 | |45 sr[0][49](real unknown) 28.11 | |66 sr[0][54](real unknown) 28.64 | |398 f[2][21] 28.81 | |195 sr[0][59](real unknown) 29.25 | |353 f[4][5] 29.47 | |288 f[1][29] 29.92 | |401 sr[0][42](real unknown) 30.08 | |236 sr[0][47](real unknown) 31.42 | +-----------------------------------------------------------+
At the top of the search graph, we find variables of the shift register as branching points. Interestingly, the solver also branched on some internal variables (internal filter functions) at a relatively high point of the tree.
- Some overview statistics. Ex:
[~/cryptominisat/build] $./cryptominisat -stats -grouping satfile [...] +-----------------------------------------------------------+ | Advanced statistics | +-----------------------------------------------------------+ |No. branches visited 15585 | |Avg. branch depth 84.81 | |No. decisions 32755 | |No. propagations 1151570 | |sum decisions on branches/no. branches | | (in a given branch, what is the avg. | | no. of decisions?) 8.518 | |sum propagations on branches/no. branches | | (in a given branch, what is the | | avg. no. of propagations?) 76.29 | +-----------------------------------------------------------+
The most useful statistics will be generated if the clauses are grouped and named, and variables also have names.
Search randomization
Giving the "-randomize=X" option to CryptoMiniSat, you can randomize the clause order and the internal pseudo-random seed of MiniSat. This is useful if you want to see how much time a specific instance takes on average for CryptoMiniSat to solve. Since other than the order of clauses and the pseudo-random seed, there is no randomization whatsoever of the working of CryptoMiniSat, using this option will randomize the running to its full extent. It is a common misconception that MiniSat generates a seed from the order of the clauses. This is not the case. MiniSat simply inserts the clauses into memory according to their order, and will propagate on the first one found - which will be different for different clause orderings.
Notes: (1) Clause randomization uses the idea from Daniel J. Bernstein, and has a proof by induction. (2) If the "-randomize=X" option is not given, the clause order is not randomized, and the pseudo-random seed is 0. (3) MiniSat's randomization has been replaced with Mersenne twister
Detailed solving process visualization
Generates search graphs into the folder "proofs/" (this folder must exist), if you give the "-proof-log" option to CryptoMiniSat. Always use "-grouping" and name your clause groups and variables to get visual clues in the graph. The generated graph can be analyzed with the boost library, or visualized with graphviz's "dot" program. The proof files generated are of the from "X-proofY.dot", where X is a random number, and Y runs from 0 to the number of restarts made by CryptoMiniSat. Number 0 is special, it shows the running of MiniSat's simplify() method (MiniSat's pre-parser). Since the search tree is larger for each run, I suggest you only visualize the 2nd or the 3rd restart - the rest would take too much time to visualize by the "dot" program in graphviz. To generate the graph, just execute "dot -Tsvg X-proofY.dot > mygraph.svg", and then open the "mygraph.svg" with a good svg editor - I suggest inkscape. You can also generate PDF files by giving the "-Tpdf" option to dot. An example search graph is is presented below:
The graph starts with a circle on the top, marked "BEGIN". Then it usually makes some unit propagations, moving straight down. This is the simplify() method of MiniSat. Then the solver makes some guesses moving to the left. Each of the boxes will contain the propagated/guess variable with the sign it was guessed/propagated to. After moving to the left, it goes down the search tree, usually finding a conflict marked by a regular pentagon with the generated conflict clause inside . The pentagon contains the literals in the generated conflict clause. From the pentagon it goes back up to one of the guesses, reverses the guess and then the search starts again. The search tree must be read from top to bottom, and from left to right. If the last restart (the last "X-proofY.dot" file) was used, at the bottom right of the tree there is a circle with either a "MODEL" or a "UNSAT" written in it. If not the last "X-proofY.dot" file was used, then there was a restart, and no conclusion was reached. If a model was found, by simply reading the decisions leading to this point you can see the satisfying variable bounding.
Squares correspond to guesses/propagations, e.g. "-sr[148]" written in a square means that the variable named sr[148] was bounded to false, and "sr[148]" means variable named sr[148] was bounded to true. All decisions have exactly one arrow pointing to them, indicating why that decision was taken: due to propagation (marked with the name of the clause group causing it), assumption, or a guess (marked with "guess" and drawn as a dotted line). Conflict clauses have numbers starting from 1, and their numbers are always preceded with a "**" (double star). Conflicts (pentagons) also have an arrow pointing to them: on these arrows are written the clause causing them.
Debug mode
CryptoMiniSat contains code to have debug output to the console while running. To activate it, you simply need to uncomment the line "//#define VERBOSE_DEBUG" in the "Solver.C" file and recompile the program. CryptoMiniSat will then run much slower, but will generate large quantities of internal debug output that can be used to either debug the program, or to see how the solving evolves over time.
Code cleanup
MiniSat's code is quite difficult to understand sometimes. To alleviate this problem, CryptoMiniSat's code is more clean, and tries to use more C++ to hide the complexity from the programmer.