Age | Commit message | Author |
|
Returning the object makes it hard to say precisely what the return
type is, and it brings no benefit.
|
|
|
|
The local variable data can be either bool or bytes. That's inconvenient
for static type checkers. Avoid mixing the two types in one variable.
|
|
This module is not used anywhere and thus its dependency on
python3-magic is not recorded in the README. It can be used to guess the
file type by looking at the contents using file magic. It is not a
typical hash function, but it can be used for repurposing dedup for
other analysers.
|
|
It wasn't copying the stored member and thus could blacklist "wrong"
content after a copy.
|
|
|
|
In Python 3.x, lzma.LZMADecompressor doesn't have a flush method.
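One way to cope with that difference is to call flush only when the decompressor offers it. The following is a minimal sketch, not the code from this commit; the helper name finish_decompression is hypothetical.

```python
import lzma
import zlib


def finish_decompression(decompressor):
    """Return output still buffered in the decompressor, if it can flush.

    Hypothetical helper: zlib decompression objects have flush(), while
    lzma.LZMADecompressor in Python 3.x does not.
    """
    flush = getattr(decompressor, "flush", None)
    return flush() if flush is not None else b""


# Both decompressor kinds can be finished the same way.
xz = lzma.LZMADecompressor()
xz_out = xz.decompress(lzma.compress(b"abc")) + finish_decompression(xz)

gz = zlib.decompressobj()
gz_out = gz.decompress(zlib.compress(b"abc")) + finish_decompression(gz)
```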
|
|
Move the fetching part into dedup.utils. Instead of hard coding the
gzip compressed copy, try xz, gz and plain in that order. Also take care
to actually close the connection.
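The fallback order described above could look roughly like the sketch below. It is an illustration under assumptions, not the function that was added to dedup.utils: the name fetch_first_available and the opener parameter are hypothetical, and the whole response is read into memory for brevity.

```python
import contextlib
import gzip
import lzma
import urllib.error
import urllib.request


def fetch_first_available(url, opener=urllib.request.urlopen):
    """Try url + ".xz", then url + ".gz", then url itself.

    Returns the decompressed bytes of the first variant that can be opened.
    Hypothetical helper; the connection is always closed after reading.
    """
    decoders = ((".xz", lzma.decompress), (".gz", gzip.decompress), ("", bytes))
    for extension, decode in decoders:
        try:
            handle = opener(url + extension)
        except urllib.error.URLError:
            # This variant is not available, try the next one.
            continue
        with contextlib.closing(handle):
            return decode(handle.read())
    raise LookupError("no variant of %s could be fetched" % url)
```

Because urllib.error.HTTPError is a subclass of URLError, a 404 for one variant simply moves on to the next.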
|
|
After all, it isn't that generic. It knows what information is necessary
for running dedup. Thus it really belongs to the extractor subclass.
By building on handle_control_info, not that much parsing logic is left
in the extractor subclass.
|
|
|
|
Iteration over file-like objects is required by deb822.Packages.iter_paragraphs.
|
|
The former behaviour was ignoring them. The intended use for dedup is to
know whenever a package unconditionally requires another package.
|
|
The handle_ar_member and handle_ar_end methods now have a default
implementation adding further handlers handle_debversion,
handle_control_tar and handle_data_tar.
In that process two additional bugs were fixed:
* decompress_tar was wrongly passing errors="surrogateescape" for
Python 2.x even though that's only supported for Python 3.x.
* The use of decompress actually passes the extension as unicode.
|
|
Building on the previous commit, add a decompress function that turns a
compressed filelike into a decompressed filelike. Use it to decouple the
decompression step.
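To illustrate the idea of turning a compressed file-like into a decompressed one, here is a sketch built on the standard library wrappers. The signature and the extension strings are assumptions; the function in this commit builds on the project's own stream class instead.

```python
import bz2
import gzip
import io
import lzma


def decompress(filelike, extension):
    """Wrap filelike so that reads return decompressed data.

    Sketch only: the extension values handled here are assumed.
    """
    if extension == ".gz":
        return gzip.GzipFile(fileobj=filelike)
    if extension == ".bz2":
        return bz2.BZ2File(filelike)
    if extension in (".xz", ".lzma"):
        return lzma.LZMAFile(filelike)
    if extension == "":
        # Not compressed, hand the file-like back unchanged.
        return filelike
    raise ValueError("unknown compression format with extension %r" % extension)


# Usage: callers only see a readable file-like, whatever the format was.
plain = decompress(io.BytesIO(gzip.compress(b"control data")), ".gz").read()
```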
|
|
It now supports:
* tell()
* seek(absolute_position), forward only
* close()
* closed
This is sufficient for putting it as a fileobj into tarfile.TarFile. By
doing so we can decouple decompression from tar processing, which eases
papering over the Python 2.x vs Python 3.x differences.
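A forward-only stream with this interface can be sketched as follows. This is a simplified stand-in, not the class from this commit: the name ForwardOnlyStream, its attributes and the helper list_tar_gz_members are hypothetical, and only Python 3 is considered.

```python
import io
import tarfile
import zlib


class ForwardOnlyStream:
    """Read-only view of decompressed data with tell() and forward seek()."""

    blocksize = 65536

    def __init__(self, fileobj, decompressor):
        self.fileobj = fileobj
        self.decompressor = decompressor
        self.buff = b""
        self.pos = 0
        self.closed = False
        self._exhausted = False

    def _fill(self, length):
        # Decompress input until length bytes are buffered or input ends.
        while not self._exhausted and (length < 0 or len(self.buff) < length):
            data = self.fileobj.read(self.blocksize)
            if data:
                self.buff += self.decompressor.decompress(data)
            else:
                self._exhausted = True
                # Not every decompressor has a flush method.
                flush = getattr(self.decompressor, "flush", None)
                if flush is not None:
                    self.buff += flush()

    def read(self, length=-1):
        if self.closed:
            raise ValueError("read from closed stream")
        if length is None:
            length = -1
        self._fill(length)
        if length < 0:
            length = len(self.buff)
        ret, self.buff = self.buff[:length], self.buff[length:]
        self.pos += len(ret)
        return ret

    def tell(self):
        return self.pos

    def seek(self, pos):
        if pos < self.pos:
            raise ValueError("cannot seek backwards")
        # Skip forward by reading and discarding data.
        while self.pos < pos:
            if not self.read(min(pos - self.pos, self.blocksize)):
                break
        return self.pos

    def close(self):
        self.closed = True
        self.buff = b""


def list_tar_gz_members(raw):
    """Read a gzip-compressed tar while tarfile only sees an uncompressed one."""
    gunzip = zlib.decompressobj(16 + zlib.MAX_WBITS)
    stream = ForwardOnlyStream(io.BytesIO(raw), gunzip)
    with tarfile.open(fileobj=stream, mode="r:") as tar:
        return [(m.name, tar.extractfile(m).read()) for m in tar if m.isfile()]
```

The sketch relies on members being read in archive order, so tarfile never needs to seek backwards.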
|
|
It is supposed to separate the parsing of Debian packages (understanding
how the format works) from the actual feature extraction. Its goal is to
simplify writing custom extractors for different feature sets.
|
|
|
|
Otherwise the yaml will contain binary strings on py3k which end up as
binary data in the sqlite database. In py2, yaml can handle those
unicode objects just fine.
|
|
|
|
zlib.crc32 returns an int32_t on py2 and a uint32_t on py3.
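A common way to get the same value on both versions is to mask the result to 32 bits. The sketch below shows that technique; the function name crc32_unsigned is hypothetical and this is not necessarily how the commit handles it.

```python
import zlib


def crc32_unsigned(data):
    """Return the CRC-32 of data as an unsigned 32 bit integer.

    Masking is a no-op on py3 and maps negative py2 results into the
    unsigned range, so stored values agree across versions.
    """
    return zlib.crc32(data) & 0xFFFFFFFF
```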
|
|
dpkg supports those since 1.17.6.
Signed-off-by: Guillem Jover <guillem@debian.org>
|
|
The GNU ar format adds a trailing slash to the member names. Normalize
the member names to take this into account.
Signed-off-by: Guillem Jover <guillem@debian.org>
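The normalization can be illustrated with a small sketch. It assumes the name arrives as the raw, space-padded header field, and the function name normalize_ar_name is hypothetical rather than taken from this commit.

```python
def normalize_ar_name(raw_name):
    """Turn a raw ar header name field into a plain member name.

    Strips the space padding and the single trailing slash that GNU ar
    appends. The special GNU members "/" and "//" are left untouched.
    """
    name = raw_name.rstrip(b" ")
    if name.endswith(b"/") and name not in (b"/", b"//"):
        name = name[:-1]
    return name.decode("ascii")
```

With this, a member written by GNU ar and one written in the BSD style compare equal after normalization.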
|
|
|
|
Reported-By: Stefan Kaltenbrunner
|
|
Otherwise the gzip hash cannot tell the empty stream and the
compressed empty stream apart.
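The distinction can be made by checking that the decompressor actually reached the end of a gzip stream before accepting the result. The sketch below shows one way to do that with zlib; the function name gzip_content_sha512 is hypothetical and the project's hash classes work incrementally rather than on a single bytes object.

```python
import hashlib
import zlib


def gzip_content_sha512(compressed):
    """Hash the decompressed content of a complete gzip stream.

    Raises ValueError for input that is not gzip or that ends before the
    stream is complete, so empty or truncated input is never mistaken for
    a compressed empty file.
    """
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        content = decompressor.decompress(compressed)
    except zlib.error as exc:
        raise ValueError("not a gzip stream") from exc
    if not decompressor.eof:
        raise ValueError("incomplete gzip stream")
    return hashlib.sha512(content).hexdigest()
```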
|
|
|
|
Otherwise all files smaller than 10 bytes are successfully hashed to the
hash of the empty input when using the GzipDecompressor.
Reported-By: Olly Betts
|
|
* Rename "image_sha512" to "png_sha512".
* dedup.image.ImageHash is now a base class for image hashes such as
PNGHash and GIFHash.
* Enable both hashes in importpkg.
* Fix README.
* Add new hash combinations to webapp.
* Add "gif file not named *.gif" to issues in update_sharing.
* Add redirect for "image_sha512" to webapp for backwards
compatibility.
|
|
|
They cluttered webapp.py and now vim can give proper highlighting for
the templates.
|
|
Also document it.
|
|
|
|
|