Age | Commit message | Author | |
---|---|---|---|
2016-04-19 | add a class DebExtractor for guiding feature extraction | Helmut Grohne | |
It is supposed to separate the parsing of Debian packages (understanding how the format works) from the actual feature extraction. Its goal is to simplify writing custom extractors for different feature sets. | |||
2016-04-16 | add a validate method to HashedStream | Helmut Grohne | |
2016-04-16 | importpkg: use yaml dumper directly | Helmut Grohne | |
Instead of carefully crafting an iterator to pass to yaml.safe_dump_all, we simply take control on our own and call represent on a yaml dumper object where needed. | |||
2016-04-16 | importpkg: refactor commit handling out of process_package* | Helmut Grohne | |
2015-04-16 | use binary stdin on py3k | Helmut Grohne | |
2015-04-16 | distinguish bytes from unicode for py3k | Helmut Grohne | |
2014-07-23 | importpkg: be more liberal in control file naming | Helmut Grohne | |
While in current sid packages the control file in control.tar is always named "./control", some older packages name it "control". | |||
2014-05-11 | importpkg: reduce copy&paste | Helmut Grohne | |
2014-05-11 | importpkg: add support for data.tar.lzma | Guillem Jover | |
Creating packages with lzma compression has been deprecated since dpkg 1.16.4, but there might be some of those in the wild and supporting them is straightforward when xz is already supported. Signed-off-by: Guillem Jover <guillem@debian.org> | |||
2014-05-11 | importpkg: add support for control.tar and control.tar.xz | Guillem Jover | |
dpkg supports those since 1.17.6. Signed-off-by: Guillem Jover <guillem@debian.org> | |||
2014-02-23 | spell check comments | Helmut Grohne | |
2014-02-19 | blacklist content rather than hashes | Helmut Grohne | |
Otherwise the gzip hash cannot tell the empty stream and the compressed empty stream apart. | |||
2013-09-02 | importpkg: move library-like parts to dedup.debpkg | Helmut Grohne | |
2013-08-19 | importpkg: don't blacklist boring gzip_sha512 hashes | Helmut Grohne | |
* In practice there are very few compressed files with trivial hashes. * Blacklisting these values results in false positives in the gzip issues. | |||
2013-08-01 | support hashing gif images | Helmut Grohne | |
* Rename "image_sha512" to "png_sha512". * dedup.image.ImageHash is now a base class for image hashes such as PNGHash and GIFHash. * Enable both hashes in importpkg. * Fix README. * Add new hash combinations to webapp. * Add "gif file not named *.gif" to issues in update_sharing. * Add redirect for "image_sha512" to webapp for backwards compatibility. | |||
2013-07-29 | importpkg.py: support uncompressed data.tar | Helmut Grohne | |
2013-07-26 | verify package hashes when importing via http | Helmut Grohne | |
2013-07-12 | importpkg: simplify state logic | Helmut Grohne | |
2013-07-12 | importpkg: split process_package to process_control | Helmut Grohne | |
2013-06-10 | split the import phase to a yaml stream | Helmut Grohne | |
importpkg.py now emits a yaml stream instead of updating the database. The actual updating now happens in readyaml.py. In this process autoimport.py was significantly reworked to import packages in parallel. | |||
2013-03-26 | Merge branch schemachange | Helmut Grohne | |
2013-03-12 | move ArReader from importpkg to dedup.arreader | Helmut Grohne | |
Also document it. | |||
2013-03-09 | split content table to a hash table | Helmut Grohne | |
In the old content table, (package, filename, size) would be the same for multiple hash functions. Now the schema represents that each file has precisely one size, but multiple hashes. | |||
2013-03-07 | enable enforcing foreign keys | Helmut Grohne | |
2013-03-07 | integrate the source table into the package table | Helmut Grohne | |
2013-03-05 | importpkg: source header may contain a version | Helmut Grohne | |
2013-03-04 | importpkg: record the source package relationship | Helmut Grohne | |
2013-03-02 | move sql schema to a separate file | Helmut Grohne | |
2013-02-24 | hash image contents | Helmut Grohne | |
2013-02-23 | importpkg: ignore filenames with encoding errors | Helmut Grohne | |
2013-02-21 | move compression functions to module dedup.compression | Helmut Grohne | |
2013-02-21 | move hashing functions to module dedup.hashing | Helmut Grohne | |
2013-02-21 | rename test.py to importpkg.py | Helmut Grohne | |
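
The 2014-07-23 entry notes that the control file inside control.tar is named "./control" in current packages but "control" in some older ones. The following is a minimal sketch of accepting both spellings, not the code from importpkg itself; the helper names `find_control` and `make_tar` are hypothetical and exist only for this illustration.

```python
import io
import tarfile


def find_control(tf):
    """Return the bytes of the control file from an open control tarball.

    Hypothetical helper: accepts both "./control" (current packages)
    and "control" (some older packages).
    """
    for member in tf:
        if member.name in ("./control", "control"):
            return tf.extractfile(member).read()
    raise ValueError("control file not found")


def make_tar(name, payload):
    """Build an in-memory tar archive with one regular file (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Both naming variants should yield the same control data.
    for name in ("./control", "control"):
        with tarfile.open(fileobj=make_tar(name, b"Package: demo\n")) as tf:
            print(name, find_control(tf))
```

Matching on an explicit pair of names keeps the check strict: a nested path such as "./doc/control" is still rejected.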