path: root/dedup
Age  Commit message  Author
2016-06-09  DecompressedStream: fix decompression without flush  Helmut Grohne
In Python 3.x, lzma.LZMADecompressor doesn't have a flush method.
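The missing flush method can be handled by asking the decompressor whether it has one. The following is a minimal sketch of that idea, not the code from this commit; the helper names finish and decompress_all are hypothetical.

```python
import lzma
import zlib

def finish(decompressor):
    """Return buffered output, tolerating decompressors without flush()."""
    # zlib decompression objects have flush(); lzma.LZMADecompressor
    # in Python 3.x does not, so treat a missing method as "nothing left".
    flush = getattr(decompressor, "flush", None)
    return flush() if flush is not None else b""

def decompress_all(decompressor, chunks):
    """Feed all chunks through the decompressor and join the output."""
    out = [decompressor.decompress(chunk) for chunk in chunks]
    out.append(finish(decompressor))
    return b"".join(out)
```

The same helper then works for both zlib.decompressobj() and lzma.LZMADecompressor().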
2016-05-25  autoimport: improve fetching package lists  Helmut Grohne
Moving the fetching part into dedup.utils. Instead of hard coding the gzip compressed copy, try xz, gz and plain in that order. Also take care to actually close the connection.
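The fallback order described above could look roughly like the sketch below. It is an illustration under assumptions, not the function added to dedup.utils: the name fetch_package_list, the CANDIDATES table and the choice to treat only HTTP 404 as "variant not available" are all hypothetical.

```python
import contextlib
import gzip
import lzma
import urllib.error
import urllib.request

# Hypothetical table: URL suffix -> function turning the payload into plain bytes.
CANDIDATES = [
    (".xz", lzma.decompress),
    (".gz", gzip.decompress),
    ("", lambda data: data),
]

def fetch_package_list(base_url, urlopen=urllib.request.urlopen):
    """Return the decompressed bytes of the first available variant."""
    for suffix, decode in CANDIDATES:
        try:
            response = urlopen(base_url + suffix)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
            continue  # this variant is not published, try the next one
        # closing() releases the connection even if read() or decode() fails.
        with contextlib.closing(response):
            return decode(response.read())
    raise ValueError("no package list found at %s" % base_url)
```

Passing urlopen as a parameter is only there so the function can be exercised without network access. A real importer would more likely stream the data than read it into memory at once.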
2016-05-23  move dedup.debpkg.process_control back into importpkg  Helmut Grohne
After all, it isn't that generic. It knows what information is necessary for running dedup. Thus it really belongs to the extractor subclass. By building on handle_control_info, not that much parsing logic is left in the extractor subclass.
2016-05-23  DebExtractor: implement parsing of control.tar  Helmut Grohne
2016-05-22  DecompressedStream: implement readline  Helmut Grohne
Iteration over file-like is required by deb822.Packages.iter_paragraphs.
2016-05-05  treat Pre-Depends like regular Depends  Helmut Grohne
The former behaviour was ignoring them. The intended use for dedup is to know whenever a package unconditionally requires another package.
2016-05-01  push more functionality into DebExtractor  Helmut Grohne
The handle_ar_member and handle_ar_end methods now have a default implementation adding further handlers handle_debversion, handle_control_tar and handle_data_tar. In that process two additional bugs were fixed:
* decompress_tar was wrongly passing errors="surrogateescape" for Python 2.x even though that's only supported for Python 3.x.
* The use of decompress actually passes the extension as unicode.
2016-04-28  decouple a function decompress out of decompress_tar  Helmut Grohne
Building on the previous commit, add a decompress function that turns a compressed filelike into a decompressed filelike. Use it to decouple the decompression step.
2016-04-28  extend functionality of DecompressedStream  Helmut Grohne
It now supports:
* tell()
* seek(absolute_position), forward only
* close()
* closed
This is sufficient for putting it as a fileobj into tarfile.TarFile. By doing so we can decouple decompression from tar processing, which eases papering over the Python 2.x vs Python 3.x differences.
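To illustrate the interface listed above, here is a simplified sketch of a file-like wrapper with tell(), forward-only seek(), close() and closed. It is not the DecompressedStream class from dedup: the class name ForwardOnlyStream, the block size and the decision to close the underlying file object are assumptions made for this example.

```python
import io
import zlib

class ForwardOnlyStream:
    """Hypothetical, simplified stand-in for DecompressedStream."""
    blocksize = 65536

    def __init__(self, fileobj, decompressor):
        self.fileobj = fileobj
        self.decompressor = decompressor
        self.buff = b""
        self.pos = 0          # position in the decompressed data
        self.closed = False

    def _fill(self, size):
        # Decompress input blocks until `size` bytes are buffered
        # (size None means: until the input is exhausted).
        while size is None or len(self.buff) < size:
            data = self.fileobj.read(self.blocksize)
            if not data:
                flush = getattr(self.decompressor, "flush", None)
                if flush is not None:
                    self.buff += flush()
                break
            self.buff += self.decompressor.decompress(data)

    def read(self, size=-1):
        if self.closed:
            raise ValueError("read from closed stream")
        read_all = size is None or size < 0
        self._fill(None if read_all else size)
        if read_all:
            size = len(self.buff)
        ret, self.buff = self.buff[:size], self.buff[size:]
        self.pos += len(ret)
        return ret

    def tell(self):
        return self.pos

    def seek(self, pos):
        # Only absolute positions at or after the current one are supported.
        if pos < self.pos:
            raise ValueError("backward seek not supported")
        while self.pos < pos:
            if not self.read(min(pos - self.pos, self.blocksize)):
                break  # end of stream reached before pos

    def close(self):
        if not self.closed:
            self.fileobj.close()  # assumption: the wrapper owns the file object
            self.buff = b""
            self.closed = True
```

Seeking forward is implemented by reading and discarding data, which is why seeking backward cannot be supported without buffering everything already read.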
2016-04-19  add a class DebExtractor for guiding feature extraction  Helmut Grohne
It is supposed to separate the parsing of Debian packages (understanding how the format works) from the actual feature extraction. Its goal is to simplify writing custom extractors for different feature sets.
2016-04-16  add a validate method to HashedStream  Helmut Grohne
2015-04-16  process_control: do not encode to ascii  Helmut Grohne
Otherwise the yaml will contain binary strings on py3k which end up as binary data in the sqlite database. In py2, yaml can handle those unicode objects just fine.
2015-04-16  element access on bytes yields int in py3k  Helmut Grohne
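The difference named in this subject line is that b"abc"[0] is the string "a" on py2 but the integer 97 on py3k. A small sketch of two portable idioms follows; the function names are hypothetical and not taken from the repository.

```python
def first_byte(data):
    """Return the first byte as a length-1 bytes object on py2 and py3."""
    # Slicing keeps the bytes type; data[0] would be an int on py3.
    return data[0:1]

def byte_values(data):
    """Return the integer value of every byte on py2 and py3."""
    # bytearray yields ints when iterated on both versions.
    return list(bytearray(data))
```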
2015-04-16  zlib.crc32 behaves inconsistently on py2 vs py3  Helmut Grohne
zlib.crc32 returns an int32_t on py2 and a uint32_t on py3.
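A common way to get the same value on both versions is to mask the result to 32 bits. This is a sketch of that idiom; whether the commit uses exactly this approach is an assumption, and the name crc32_unsigned is hypothetical.

```python
import zlib

def crc32_unsigned(data, value=0):
    """Return the CRC-32 of data as an unsigned 32 bit integer."""
    # The mask turns a negative py2 result into the unsigned value
    # and leaves the py3 result unchanged.
    return zlib.crc32(data, value) & 0xffffffff
```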
2015-04-16  there is no itertools.imap in py3k  Helmut Grohne
2015-04-16  distinguish bytes from unicode for py3k  Helmut Grohne
2014-05-11  importpkg: add support for control.tar and control.tar.xz  Guillem Jover
dpkg supports those since 1.17.6.
Signed-off-by: Guillem Jover <guillem@debian.org>
2014-05-11  dedup.arreader: remove trailing slash from ar members  Guillem Jover
The GNU ar format adds a trailing slash to the member names, normalize the member names to take this into account.
Signed-off-by: Guillem Jover <guillem@debian.org>
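The normalization can be sketched as below. This is an illustration rather than the code in dedup.arreader; the function name normalize_ar_name is hypothetical, and the GNU special members "/" and "//" are deliberately not handled.

```python
def normalize_ar_name(raw_name):
    """Normalize the name field of an ar member header.

    raw_name is the space padded name field as bytes.
    """
    name = raw_name.rstrip(b" ")  # drop the padding
    if name.endswith(b"/"):
        name = name[:-1]          # GNU ar terminates names with a slash
    return name
```

With this, b"control.tar.gz/  " and b"control.tar.gz   " both yield b"control.tar.gz", so later code can compare member names without caring which ar variant wrote the archive.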
2014-02-23  spell check comments  Helmut Grohne
2014-02-23  fix spelling mistake  Helmut Grohne
Reported-By: Stefan Kaltenbrunner
2014-02-19  blacklist content rather than hashes  Helmut Grohne
Otherwise the gzip hash cannot tell the empty stream and the compressed empty stream apart.
2014-02-19  GzipDecompressor: don't treat checksum as garbage trailer  Helmut Grohne
2014-02-19  DecompressedHash should fail on trailing input  Helmut Grohne
Otherwise all files smaller than 10 bytes are successfully hashed to the hash of the empty input when using the GzipDecompressor.
Reported-By: Olly Betts
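The general idea of rejecting input that was never consumed as a complete stream can be sketched with the standard library. This example uses zlib in gzip mode as a stand-in for the project's GzipDecompressor, so it shows the principle rather than the actual fix; the function name hash_gzip_content is hypothetical.

```python
import hashlib
import zlib

def hash_gzip_content(chunks):
    """Return the SHA-512 hex digest of the content of a gzip stream."""
    # Adding 16 to the window bits selects gzip framing.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    digest = hashlib.sha512()
    for chunk in chunks:
        digest.update(decompressor.decompress(chunk))
    digest.update(decompressor.flush())
    if not decompressor.eof:
        # Input ended before the stream did, e.g. a file too short
        # to even hold a gzip header.
        raise ValueError("truncated gzip stream")
    if decompressor.unused_data:
        raise ValueError("trailing input after gzip stream")
    return digest.hexdigest()
```

Without the two checks, a file consisting of only a few bytes of gzip header would produce no output and no error, and would therefore hash like the empty input. Note that this sketch also rejects concatenated gzip members, which may or may not be desired.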
2013-10-03  work around python-debian's #670679  Helmut Grohne
2013-09-04  webapp: serve static files from /static  Helmut Grohne
2013-09-02  importpkg: move library-like parts to dedup.debpkg  Helmut Grohne
2013-08-16  make debian version_compare available in sql  Helmut Grohne
2013-08-16  webapp templates: add an anchor for file issues  Helmut Grohne
2013-08-01  support hashing gif images  Helmut Grohne
* Rename "image_sha512" to "png_sha512".
* dedup.image.ImageHash is now a base class for image hashes such as PNGHash and GIFHash.
* Enable both hashes in importpkg.
* Fix README.
* Add new hash combinations to webapp.
* Add "gif file not named *.gif" to issues in update_sharing.
* Add redirect for "image_sha512" to webapp for backwards compatibility.
2013-07-30  templates/binary: space between package and compare  Helmut Grohne
2013-07-30  templates: wiki.d.o redirects to https now  Helmut Grohne
2013-07-27  also move the static directory into the dedup package  Helmut Grohne
2013-07-27  move templates to dedup package  Helmut Grohne
They cluttered webapp.py and now vim can give proper highlighting for the templates.
2013-07-26  verify package hashes when importing via http  Helmut Grohne
2013-05-27  dedup.image: img.convert can also raise that crazy stuff  Helmut Grohne
2013-03-18  dedup.image: mask errors from PIL  Helmut Grohne
2013-03-12  dedup.arreader: missing bytes marker  Helmut Grohne
2013-03-12  move ArReader from importpkg to dedup.arreader  Helmut Grohne
Also document it.
2013-03-02  move fetchiter from webapp to dedup.utils  Helmut Grohne
2013-02-24  hash image contents  Helmut Grohne
2013-02-21  move compression functions to module dedup.compression  Helmut Grohne
2013-02-21  do not track byted compiled python files  Helmut Grohne
2013-02-21  move hashing functions to module dedup.hashing  Helmut Grohne