author | Helmut Grohne <helmut@subdivi.de> | 2013-03-24 21:29:44 +0100 |
---|---|---|
committer | Helmut Grohne <helmut@subdivi.de> | 2013-03-24 21:29:44 +0100 |
commit | 2e98ddfaecfeb1800e5b19ba8234316282971aa7 (patch) | |
tree | 717bcb69f4994c0d8838bbc68826a9616bd9aaa8 | |
parent | 096a61df7ff2beb652c341ef35c5a6efb6f6b652 (diff) | |
download | ssdeep-2e98ddfaecfeb1800e5b19ba8234316282971aa7.tar.gz |
add a README
-rw-r--r-- | README.md | 35 |
1 files changed, 35 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6f8ea88
--- /dev/null
+++ b/README.md
@@ -0,0 +1,35 @@
+ssdeep fork
+===========
+This is a fork of [ssdeep][http://ssdeep.sf.net] and a different implementation
+of the [Python wrappers][https://github.com/DinoTools/python-ssdeep]. Goals of
+this fork:
+
+1. Compute the hash by reading the input exactly once. No seeks. No buffering.
+2. Thread safety. Do not use global variables.
+3. Fixed memory consumption.
+4. An API in the spirit of Python's hashlib.
+
+fuzzy.c and fuzzy.h contain a different implementation of the hash computation.
+Note that comparison has been left out entirely, because I have no complaint
+about the upstream implementation.
+
+ssdeep.c contains a simply program that does not recognize any of the original
+ssdeep options, but tries to behave a bit similar. The goal here is to make it
+comparable to upstream.
+
+pyfuzzy.pyx and setup.py contain a Cython wrapper to glue it into Python.
+
+performance
+-----------
+The new implementation runs about 8 to 20 "normal" hashes in parallel. This is
+much more expensive. In my profiling the reimplementation is about 1.5 times
+slower than upstream in average. On the other hand upstream is occasionally 10
+times slower. The likely explanation here is that the blocksize was guessed
+wrong and the file was rehashed. When reading from stdin the upstream version
+first reads the entire input into main memory. The fork handles this case in
+fixed memory. The `fuzzy_state` structure takes up about 2.5kb.
+
+about
+-----
+Like the upstream projects this code is licensed under the GPL-2+. If you have
+any questions, please contact me at `Helmut Grohne <helmut@subdivi.de>`.
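
The README's goals (one pass over the input, fixed memory, a hashlib-like API, several candidate block sizes hashed in parallel) can be illustrated with a small toy. This is only a sketch under stated assumptions: the class `ToyFuzzyHash`, its constants, and its output format are hypothetical, it is not the fork's `pyfuzzy` API, and its digests are not compatible with ssdeep. It uses a plain byte sum over a short window as a stand-in rolling hash.

```python
# Toy illustration only: NOT the fork's code and NOT ssdeep-compatible.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


class ToyFuzzyHash:
    """Single-pass, fixed-memory hasher with a hashlib-style interface."""

    WINDOW = 7          # rolling window length (assumption for this sketch)
    MIN_BLOCKSIZE = 3   # smallest candidate block size (assumption)
    NUM_BLOCKSIZES = 8  # candidate block sizes tracked in parallel
    MAX_CHARS = 63      # cap per signature, which keeps memory bounded

    def __init__(self):
        self._window = bytearray(self.WINDOW)
        self._pos = 0
        self._rolling = 0  # sum of the bytes currently in the window
        # One [piece_hash, emitted_chars] state per candidate block size.
        self._states = [[0, []] for _ in range(self.NUM_BLOCKSIZES)]

    def update(self, data):
        """Feed more bytes; no seeking and no buffering of the input."""
        for byte in data:
            # Slide the window by one byte in O(1).
            self._rolling += byte - self._window[self._pos]
            self._window[self._pos] = byte
            self._pos = (self._pos + 1) % self.WINDOW
            for i, state in enumerate(self._states):
                # Extend the hash of the current piece for this block size.
                state[0] = (state[0] * 31 + byte) & 0xFFFFFFFF
                blocksize = self.MIN_BLOCKSIZE << i
                at_boundary = self._rolling % blocksize == blocksize - 1
                if at_boundary and len(state[1]) < self.MAX_CHARS:
                    # Piece boundary: emit one character, start a new piece.
                    state[1].append(ALPHABET[state[0] % 64])
                    state[0] = 0

    def digest(self):
        """Return 'blocksize:signature' without modifying the state."""
        last = self.NUM_BLOCKSIZES - 1
        for i, (piece, chars) in enumerate(self._states):
            # Use the smallest block size whose signature stayed under the
            # cap; fall back to the largest candidate otherwise.
            if len(chars) < self.MAX_CHARS or i == last:
                blocksize = self.MIN_BLOCKSIZE << i
                return "%d:%s%s" % (blocksize, "".join(chars), ALPHABET[piece % 64])
```

Because every candidate block size is updated on each byte, no block size has to be guessed up front and the input never needs to be read again; the price is more work per byte, which matches the trade-off the README describes. Feeding the same bytes in one call or in many small `update()` calls gives the same digest.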