author Helmut Grohne <helmut@subdivi.de> 2013-03-24 21:29:44 +0100
committer Helmut Grohne <helmut@subdivi.de> 2013-03-24 21:29:44 +0100
commit 2e98ddfaecfeb1800e5b19ba8234316282971aa7
tree 717bcb69f4994c0d8838bbc68826a9616bd9aaa8
parent 096a61df7ff2beb652c341ef35c5a6efb6f6b652
add a README
-rw-r--r-- README.md 35
1 file changed, 35 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..6f8ea88
--- /dev/null
+++ b/README.md
@@ -0,0 +1,35 @@
+ssdeep fork
+===========
+This is a fork of [ssdeep](http://ssdeep.sf.net) and a different implementation
+of the [Python wrappers](https://github.com/DinoTools/python-ssdeep). Goals of
+this fork:
+
+1. Compute the hash by reading the input exactly once. No seeks. No buffering.
+2. Thread safety. Do not use global variables.
+3. Fixed memory consumption.
+4. An API in the spirit of Python's hashlib.
+
+fuzzy.c and fuzzy.h contain a different implementation of the hash computation.
+Note that comparison has been left out entirely, because I have no complaint
+about the upstream implementation.
+
+ssdeep.c contains a simple program that does not recognize any of the original
+ssdeep options but tries to behave similarly. Its purpose is to make the fork
+comparable with upstream.
+
+pyfuzzy.pyx and setup.py contain a Cython wrapper to glue it into Python.
+
+performance
+-----------
+The new implementation runs about 8 to 20 "normal" hashes (one per candidate
+block size) in parallel, which is considerably more expensive per input byte. In
+my profiling the reimplementation is on average about 1.5 times slower than
+upstream. On the other hand, upstream is occasionally 10 times slower. The likely
+explanation is that upstream guesses the block size from the input size up front
+and, when the guess turns out to be wrong, has to hash the entire input again
+with a smaller block size. When reading from stdin, upstream first reads the
+entire input into main memory; the fork handles this case in fixed memory. The
+`fuzzy_state` structure takes up about 2.5 KB.
+
+about
+-----
+Like the upstream projects, this code is licensed under the GPL-2+. If you have
+any questions, please contact me at `Helmut Grohne <helmut@subdivi.de>`.
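
The README's goals (single pass, no global state, fixed memory, a hashlib-like API) and its performance section can be illustrated with a small Python sketch. It is loosely modeled on the spamsum/ssdeep scheme: a rolling hash over a 7-byte window decides where a piece ends, and an FNV-style piecewise hash contributes one base64 character per piece. Instead of guessing one block size and rehashing when the guess is wrong, the sketch feeds every byte to the piecewise hashes of several block sizes (3, 6, 12, ...) at once and picks the block size only when `digest()` is called. The class name `FuzzySketch`, the fixed count of 16 block sizes, and the simplified `blocksize:digest:digest` output are assumptions for illustration. This is not the code in fuzzy.c, not the pyfuzzy API, and it does not reproduce ssdeep's output byte for byte.

```python
# Illustrative sketch only: NOT the fork's fuzzy.c and NOT ssdeep-compatible.
# Constants follow spamsum/ssdeep; NUM_BLOCKSIZES is an assumption of this sketch.
B64 = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
ROLLING_WINDOW = 7
HASH_INIT = 0x28021967
HASH_PRIME = 0x01000193
MIN_BLOCKSIZE = 3
SPAMSUM_LENGTH = 64
NUM_BLOCKSIZES = 16          # block sizes 3, 6, 12, ..., 3 * 2**15 (assumed)
MASK32 = 0xFFFFFFFF


class FuzzySketch:
    """hashlib-style object (hypothetical): call update() repeatedly, then digest()."""

    def __init__(self) -> None:
        # All state lives on the instance (no globals) and its size is bounded.
        self._window = bytearray(ROLLING_WINDOW)
        self._n = 0
        self._h1 = self._h2 = self._h3 = 0
        self._piece = [HASH_INIT] * NUM_BLOCKSIZES
        self._digests = [bytearray() for _ in range(NUM_BLOCKSIZES)]
        self._total = 0

    def _roll(self, c: int) -> int:
        # Rolling hash over the last ROLLING_WINDOW bytes, in 32-bit arithmetic.
        self._h2 = (self._h2 - self._h1 + ROLLING_WINDOW * c) & MASK32
        self._h1 = (self._h1 + c - self._window[self._n % ROLLING_WINDOW]) & MASK32
        self._window[self._n % ROLLING_WINDOW] = c
        self._n += 1
        self._h3 = ((self._h3 << 5) ^ c) & MASK32
        return (self._h1 + self._h2 + self._h3) & MASK32

    def update(self, data: bytes) -> None:
        for c in data:  # each byte is seen exactly once; nothing is buffered
            self._total += 1
            rolling = self._roll(c)
            # Every block size's piecewise hash consumes every byte: this is the
            # "many hashes in parallel" cost that replaces rehashing.
            for i in range(NUM_BLOCKSIZES):
                self._piece[i] = ((self._piece[i] * HASH_PRIME) & MASK32) ^ c
            for i in range(NUM_BLOCKSIZES):
                bs = MIN_BLOCKSIZE << i
                # Block sizes double, so if bs misses the trigger, 2 * bs does too.
                if rolling % bs != bs - 1:
                    break
                if len(self._digests[i]) < SPAMSUM_LENGTH:  # cap keeps memory fixed
                    self._digests[i].append(B64[self._piece[i] % 64])
                self._piece[i] = HASH_INIT

    def digest(self) -> str:
        # Initial guess from the total length, then step down while the digest
        # is too short. Every candidate is already computed, so no rereading.
        i = 0
        while (MIN_BLOCKSIZE << i) * SPAMSUM_LENGTH < self._total and i < NUM_BLOCKSIZES - 2:
            i += 1
        while i > 0 and len(self._digests[i]) < SPAMSUM_LENGTH // 2:
            i -= 1
        first = self._digests[i].decode()
        second = self._digests[i + 1][: SPAMSUM_LENGTH // 2].decode()
        return f"{MIN_BLOCKSIZE << i}:{first}:{second}"
```

Because `update()` processes the input byte by byte and keeps all state on the instance, passing the data in one call or in many chunks yields the same digest, and separate instances can be used independently. Stdin, for example, could be fed with `for chunk in iter(lambda: sys.stdin.buffer.read(65536), b""): h.update(chunk)` before calling `h.digest()`; memory stays bounded because at most 64 characters are kept per block size. A fixed range of 16 block sizes is a simplification; a real implementation would need to cover, or slide over, a wider range for large inputs.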