ssdeep fork
===========
This is a fork of [ssdeep](http://ssdeep.sf.net) and a different implementation
of the [Python wrappers](https://github.com/DinoTools/python-ssdeep). Goals of
this fork:

1. Compute the hash by reading the input exactly once. No seeks. No buffering.
2. Thread safety. Do not use global variables.
3. Fixed memory consumption.
4. An API in the spirit of Python's `hashlib`.

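Goals 1 and 4 together describe an incremental, read-once interface: the caller pushes chunks into a hasher object and asks for the digest at the end. The sketch below shows what a caller of such an API looks like. It uses `hashlib.sha256` purely as a stand-in for a fuzzy hasher, and the helper `hash_stream` is hypothetical, not part of this fork.

```python
import hashlib
import io
from typing import BinaryIO


def hash_stream(stream: BinaryIO, hasher, chunk_size: int = 65536):
    """Feed `stream` into a hashlib-style object, reading every byte exactly once.

    Hypothetical helper. There is no seek() and the input is never held in
    memory as a whole, so memory use is bounded by chunk_size plus the
    hasher's own fixed-size state.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF reached
            break
        hasher.update(chunk)
    return hasher


data = b"example input " * 1000
# Feeding the data in chunks must give the same result as hashing it at once.
streamed = hash_stream(io.BytesIO(data), hashlib.sha256(), chunk_size=4096)
print(streamed.hexdigest() == hashlib.sha256(data).hexdigest())  # True
```

Because the loop never seeks, the same code works on pipes and stdin, which is exactly the case where a hasher with fixed-size state pays off.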
`fuzzy.c` and `fuzzy.h` contain a different implementation of the hash
computation. Note that comparison has been left out entirely, because I have no
complaint about the upstream implementation.

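For orientation, the upstream spamsum/ssdeep algorithm cuts the input into pieces wherever a rolling hash over the last 7 bytes hits a trigger value. The following is a minimal Python sketch of that rolling hash, written from the upstream algorithm rather than taken from `fuzzy.c`. The class name `RollingHash` and the demo input are hypothetical.

```python
ROLLING_WINDOW = 7
MASK32 = 0xFFFFFFFF  # emulate uint32 wraparound


class RollingHash:
    """Hash of the last ROLLING_WINDOW bytes, in the style of upstream spamsum."""

    def __init__(self) -> None:
        self.window = [0] * ROLLING_WINDOW  # ring buffer of recent bytes
        self.h1 = 0  # plain sum of the bytes in the window
        self.h2 = 0  # position-weighted sum (newest byte has weight 7)
        self.h3 = 0  # shift/xor mix; bytes older than 7 fall off the top
        self.n = 0   # number of bytes seen, used to index the ring buffer

    def update(self, c: int) -> int:
        """Push one byte and return the current rolling value."""
        slot = self.n % ROLLING_WINDOW
        self.h2 = (self.h2 - self.h1 + ROLLING_WINDOW * c) & MASK32
        self.h1 = (self.h1 + c - self.window[slot]) & MASK32
        self.window[slot] = c
        self.n += 1
        self.h3 = ((self.h3 << 5) & MASK32) ^ c
        return (self.h1 + self.h2 + self.h3) & MASK32


# A piece boundary for blocksize bs occurs where value % bs == bs - 1.
bs = 3
rh = RollingHash()
cuts = []
for i, c in enumerate(b"the quick brown fox jumps over the lazy dog" * 4):
    if rh.update(c) % bs == bs - 1:
        cuts.append(i)
print(cuts[:10])
```

Since the value depends only on the last 7 bytes, boundaries resynchronise after an insertion or deletion, which is what makes the resulting digests comparable.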
`ssdeep.c` contains a simple program that does not recognize any of the original
ssdeep options, but tries to behave somewhat similarly. The goal here is to make
its output comparable to upstream.

`pyfuzzy.pyx` and `setup.py` contain a Cython wrapper that exposes it to Python.

performance
-----------
The new implementation runs about 8 to 20 "normal" hashes in parallel, which is
much more expensive than computing a single one. In my profiling, the
reimplementation is on average about 1.5 times slower than upstream. On the
other hand, upstream is occasionally 10 times slower. The likely explanation is
that upstream guessed the blocksize wrongly and had to rehash the file. When
reading from stdin, the upstream version first reads the entire input into main
memory, whereas the fork handles this case in fixed memory. The `fuzzy_state`
structure takes up about 2.5 KB.

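The rehashing explanation can be made concrete. Assuming the rules of older upstream releases, the initial blocksize is the smallest `3 * 2**k` whose 64 digest characters cover the whole input. If the resulting digest is shorter than 32 characters, upstream halves the blocksize and starts over. Needing the total size up front is also why stdin input has to be buffered. The function names below are hypothetical. A single-pass hasher avoids the restart by keeping state for several blocksizes at once.

```python
MIN_BLOCKSIZE = 3
SPAMSUM_LENGTH = 64


def initial_blocksize(total_size: int) -> int:
    """Upstream-style guess: smallest 3 * 2**k with 3 * 2**k * 64 >= total_size."""
    bs = MIN_BLOCKSIZE
    while bs * SPAMSUM_LENGTH < total_size:
        bs *= 2
    return bs


def needs_rehash(bs: int, digest_len: int) -> bool:
    """Upstream retries with bs // 2 (a second full pass) when the digest is too short."""
    return bs > MIN_BLOCKSIZE and digest_len < SPAMSUM_LENGTH // 2


print(initial_blocksize(192))        # 3
print(initial_blocksize(193))        # 6
print(initial_blocksize(1_000_000))  # 24576
print(needs_rehash(24576, 20))       # True: the input must be read again
```

Each wrong guess costs another full pass over the input, which fits the occasional 10x slowdown described above.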
about
-----
Like the upstream projects, this code is licensed under the GPL-2+. If you have
any questions, please contact me at `Helmut Grohne <helmut@subdivi.de>`.