Age | Commit message | Author
2021-12-29 | multiarchimport.py: reduce default logging | Helmut Grohne
2021-12-29 | multiarchanalyze.py: fix python3 compatibility | Helmut Grohne
.keys() now returns a special object, but show_files really wants something that provides len() and supports repeated iteration.
2021-12-27 | stop hiding M-A:same conflicts in binNMUed packages | Helmut Grohne
The issue has been solved by Mattia Rizzolo in dh-strip-nondeterminism via #999665.
2020-09-06 | fix tuple mismatch | Helmut Grohne
Fixes: e6115dd16b46 ("hide M-A:same conflicts in binNMUed packages")
2020-09-03 | hide M-A:same conflicts in binNMUed packages | Helmut Grohne
binNMUed packages are not currently reproducible, because buildds don't pass --binNMU-timestamp to sbuild. Thus they use varying SOURCE_DATE_EPOCH and produce faulty packages. As much as this is a real bug, it is not actionable by maintainers. Hide such issues for now.
Link: https://salsa.debian.org/perl-team/modules/packages/libtie-hash-indexed-perl/-/merge_requests/1
Link: https://bugs.debian.org/843773
2020-02-17 | fix typo in maforeign_library regex | Helmut Grohne
2018-01-07 | multiarchanalyze: give examples when representing arch sets | Helmut Grohne
Uwe Kleine-König said that knowing example architectures for file conflicts would be incredibly useful. The old presentation of architecture sets would collapse sets that are too big to a single count. This makes it difficult to find any colliding pair. Now, we'll give at least two example architectures in addition to the count.
Reported-By: Uwe Kleine-König <ukleinek@debian.org>
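The presentation described above can be sketched roughly as follows. This is only an illustration: the function name `format_arch_set`, the `limit` threshold, and the output wording are assumptions, not taken from multiarchanalyze.

```python
def format_arch_set(archs, limit=3):
    """Render a set of architecture names for display.

    Small sets are listed in full. Larger sets are collapsed to a count
    plus two example members, so a colliding pair is always visible.
    The threshold and the wording are hypothetical.
    """
    archs = sorted(archs)
    if len(archs) <= limit:
        return ", ".join(archs)
    return "%d architectures including %s and %s" % (
        len(archs), archs[0], archs[1])


small = format_arch_set({"amd64", "i386"})
large = format_arch_set({"amd64", "arm64", "armhf", "i386"})
```

Sorting before picking the examples keeps the output stable between runs.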
2018-01-05 | fix logic inversion in package selection | Helmut Grohne
We want the package with the highest version, not the lowest.
Reported-By: Uwe Kleine-König <ukleinek@debian.org>
2017-12-21 | multiarchanalyze: opportunistically emit a version when unique | Helmut Grohne
2017-03-05 | multiarchimport: python 3 forward compatibility | Helmut Grohne
2017-03-04 | multiarchanalyze: detect some forms of wrong M-A:foreign | Helmut Grohne
When an arch:any package ships a .so file in a public library search path (e.g. a symlink as many lib*-dev packages do) it most likely shouldn't be M-A:foreign. A common exception is plugins loaded into programs, so exclude that case. Many thanks to Johannes Schauer and Guillem Jover for helping discover this pattern of Multi-Arch: foreign abuse.
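A rough sketch of such a check follows. The regular expression, the function names, and the treatment of every architecture other than "all" as arch:any are assumptions for illustration; the actual maforeign_library regex in multiarchanalyze may differ.

```python
import re

# Hypothetical pattern: lib*.so files placed directly in lib/, usr/lib/ or
# a multiarch subdirectory such as usr/lib/x86_64-linux-gnu/. Files in
# deeper directories (typically plugins) do not match.
PUBLIC_LIBRARY_RE = re.compile(
    r"^(?:usr/)?lib/"
    r"(?:[a-z0-9_]+(?:-[a-z0-9_]+)*-gnu[a-z0-9_]*/)?"
    r"lib[^/]+\.so(?:\.[^/]+)?$"
)


def looks_like_public_library(path):
    """Return True if path names a shared library on a public search path."""
    if path.startswith("./"):
        path = path[2:]
    return PUBLIC_LIBRARY_RE.match(path) is not None


def suspicious_foreign(architecture, multiarch, paths):
    """Flag architecture-dependent M-A:foreign packages shipping public .so files."""
    if architecture == "all" or multiarch != "foreign":
        return False
    return any(looks_like_public_library(path) for path in paths)
```

Restricting the match to files directly inside the search path is what excludes the plugin case: a plugin usually lives in a package-specific subdirectory.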
2016-08-07 | multiarchanalyze: make it easily consumable by tracker.d.o | Helmut Grohne
Many thanks to Paul Wise for his detailed feedback on the data format.
2016-06-12 | multiarchanalyze: speed up on sqlite3 3.8.7.1 | Helmut Grohne
Since all users of archdepcandidate run the results through "exists()" or "group by", "union" vs "union all" does not make any difference to the results. On the performance side, however, it avoids a b-tree merge, getting the maforeign_candidate query down from hours to seconds.
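The equivalence claim can be checked on a toy database. The tables `candidates_a` and `candidates_b` are made up for this sketch and are unrelated to the real archdepcandidate view; the point is only that duplicate rows from "union all" disappear once the result goes through "group by" or "exists".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE candidates_a (package TEXT);
    CREATE TABLE candidates_b (package TEXT);
    INSERT INTO candidates_a VALUES ('a'), ('b');
    INSERT INTO candidates_b VALUES ('b'), ('c');
""")


def grouped(operator):
    """Combine both tables with the given operator, then group by package."""
    query = """
        SELECT package FROM (
            SELECT package FROM candidates_a
            %s
            SELECT package FROM candidates_b
        ) GROUP BY package ORDER BY package
    """ % operator
    return [row[0] for row in conn.execute(query)]


def exists(operator, package):
    """Check membership in the combined result via EXISTS."""
    query = """
        SELECT EXISTS (
            SELECT 1 FROM (
                SELECT package FROM candidates_a
                %s
                SELECT package FROM candidates_b
            ) WHERE package = ?
        )
    """ % operator
    return bool(conn.execute(query, (package,)).fetchone()[0])


union_result = grouped("UNION")
union_all_result = grouped("UNION ALL")
```

Package 'b' appears in both tables, so "UNION ALL" produces it twice before grouping, yet both variants return the same grouped list.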
2016-06-10 | add a separate tool for generating hints on Multi-Arch headers | Helmut Grohne
It builds on the core functionality of dedup, but uses a different database schema. Unlike dedup, it aborts downloading Arch:all packages early and consumes any other architecture in its entirety instead.
2016-06-09 | DecompressedStream: fix decompression without flush | Helmut Grohne
In Python 3.x, lzma.LZMADecompressor doesn't have a flush method.
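One way to cope with decompressors that lack a flush method is to probe for it. The helper names below are hypothetical and this is not necessarily how DecompressedStream handles it.

```python
import lzma
import zlib


def finish(decompressor):
    """Return trailing output, tolerating decompressors without flush()."""
    if hasattr(decompressor, "flush"):
        return decompressor.flush()
    return b""


def decompress_all(decompressor, data):
    """Decompress data in one go and collect whatever flush() still holds."""
    return decompressor.decompress(data) + finish(decompressor)


from_xz = decompress_all(lzma.LZMADecompressor(), lzma.compress(b"hello"))
from_zlib = decompress_all(zlib.decompressobj(), zlib.compress(b"hello"))
```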
2016-06-09 | autoimport: fix hash check | Helmut Grohne
Fixes: 2f12a6e2f426 ("autoimport: add option to skip hash checking")
2016-05-25 | autoimport: improve fetching package lists | Helmut Grohne
Move the fetching part into dedup.utils. Instead of hard coding the gzip compressed copy, try xz, gz and plain in that order. Also take care to actually close the connection.
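The fallback order can be sketched as below. The function name `fetch_first_available`, the `opener` parameter (there so the logic can be exercised without network access), and the choice to treat only HTTP 404 as "try the next variant" are assumptions; the code in dedup.utils may be structured differently. The sketch returns the raw, still compressed bytes.

```python
import contextlib
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_first_available(baseurl, extensions=(".xz", ".gz", ""),
                          opener=urlopen):
    """Return (extension, raw bytes) for the first variant that exists."""
    for ext in extensions:
        try:
            handle = opener(baseurl + ext)
        except HTTPError as err:
            if err.code != 404:
                raise
            continue
        # closing() makes sure the connection is closed after reading.
        with contextlib.closing(handle):
            return ext, handle.read()
    raise ValueError("no variant of %s could be fetched" % baseurl)
```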
2016-05-24 | use urlopen from urllib2 on py2 | Helmut Grohne
This causes non-successful fetches to result in HTTPErrors like it does in py3 already.
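A common way to write such a compatibility import is shown here as a sketch; the actual import block in the repository may look different.

```python
try:
    # Python 3: urlopen raises HTTPError for unsuccessful responses.
    from urllib.request import urlopen
    from urllib.error import HTTPError
except ImportError:
    # Python 2: urllib2.urlopen raises HTTPError as well.
    from urllib2 import urlopen, HTTPError
```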
2016-05-23 | move dedup.debpkg.process_control back into importpkg | Helmut Grohne
After all, it isn't that generic. It knows what information is necessary for running dedup. Thus it really belongs to the extractor subclass. By building on handle_control_info, not that much parsing logic is left in the extractor subclass.
2016-05-23 | DebExtractor: implement parsing of control.tar | Helmut Grohne
2016-05-23 | importpkg: fix --hash broken in previous commit | Helmut Grohne
2016-05-23 | remove curl dependency | Helmut Grohne
Teach importpkg how to download urls using urlopen and thus remove the need for invoking curl.
2016-05-23 | autoimport: add option to skip hash checking | Helmut Grohne
For variations of dedup that do not consume the data.tar member, this option can save significant bandwidth.
2016-05-22 | autoimport: stream package list and use generic decompressor | Helmut Grohne
* Streaming means that we do not need to hold the entire package list in memory (but the pkgs dict will become large anyway).
* The decompress utility allows easily switching to e.g. xz, which is the only compression format for the dbgsym suites.
2016-05-22 | DecompressedStream: implement readline | Helmut Grohne
Iteration over file-like is required by deb822.Packages.iter_paragraphs.
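A minimal sketch of how readline and iteration can be layered on top of an object that only offers read() follows. The class `LineStream` is hypothetical and much simpler than DecompressedStream.

```python
import io


class LineStream:
    """Hypothetical wrapper adding readline() and iteration to read()."""

    def __init__(self, fileobj, blocksize=4096):
        self.fileobj = fileobj
        self.blocksize = blocksize
        self.buff = b""

    def readline(self):
        """Return the next line including its newline, or b"" at EOF."""
        while True:
            newline = self.buff.find(b"\n")
            if newline >= 0:
                line = self.buff[:newline + 1]
                self.buff = self.buff[newline + 1:]
                return line
            data = self.fileobj.read(self.blocksize)
            if not data:
                # End of input: hand out whatever is left, possibly b"".
                line, self.buff = self.buff, b""
                return line
            self.buff += data

    def __iter__(self):
        # Call readline() repeatedly until it returns the b"" sentinel.
        return iter(self.readline, b"")


lines = list(LineStream(io.BytesIO(b"Package: a\n\nPackage: b"), blocksize=4))
```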
2016-05-21 | move from deprecated optparse to argparse | Helmut Grohne
2016-05-05 | treat Pre-Depends like regular Depends | Helmut Grohne
The former behaviour was ignoring them. The intended use for dedup is to know whenever a package unconditionally requires another package.
2016-05-01 | push more functionality into DebExtractor | Helmut Grohne
The handle_ar_member and handle_ar_end methods now have a default implementation adding further handlers handle_debversion, handle_control_tar and handle_data_tar. In that process two additional bugs were fixed:
* decompress_tar was wrongly passing errors="surrogateescape" for Python 2.x even though that's only supported for Python 3.x.
* The use of decompress actually passes the extension as unicode.
2016-05-01 | use same Python version for autoimport and importpkg | Helmut Grohne
The autoimport tool runs the Python interpreter explicitly. Instead of invoking just "python" and thus calling whatever the current default is, use sys.executable which is the interpreter used to run autoimport, thus locking both to the same Python version.
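The idea can be illustrated with a small sketch. The helper name and the inline `-c` script are made up for the example; autoimport itself launches importpkg rather than a one-liner.

```python
import subprocess
import sys


def run_with_same_interpreter(args):
    """Run a child Python process using the interpreter running this script."""
    cmd = [sys.executable] + list(args)
    return subprocess.check_output(cmd)


output = run_with_same_interpreter(
    ["-c", "import sys; print(sys.version_info[0])"])
child_major = int(output.decode("ascii").strip())
```

Because the child is started via sys.executable, `child_major` matches the major version of the parent interpreter regardless of what a bare "python" resolves to.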
2016-04-28 | support Python 3.x in importpkg | Helmut Grohne
In Python 2.x, TarInfo.name is a bytes object. In Python 3.x, TarInfo.name is always a unicode object. To avoid importpkg crashing with an exception, we direct the Python 3.x decoding to use the surrogateescape error handler. Thus decoding the name boils down to checking whether it contains surrogates.
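The surrogate check can be sketched as follows. The function name `is_clean_name` and the sample paths are illustrative assumptions; importpkg may perform the check differently.

```python
def is_clean_name(name):
    """Return True if a surrogateescape-decoded name was valid UTF-8."""
    try:
        # A strict encode fails if the name contains lone surrogates,
        # which is how surrogateescape marks undecodable bytes.
        name.encode("utf8")
    except UnicodeEncodeError:
        return False
    return True


good_raw = b"usr/bin/caf\xc3\xa9"   # valid UTF-8
bad_raw = b"usr/bin/caf\xe9"        # Latin-1 byte, invalid as UTF-8
good = good_raw.decode("utf8", "surrogateescape")
bad = bad_raw.decode("utf8", "surrogateescape")
```

Decoding never raises with this error handler, and the original bytes remain recoverable through `bad.encode("utf8", "surrogateescape")`.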
2016-04-28 | decouple a function decompress out of decompress_tar | Helmut Grohne
Building on the previous commit, add a decompress function that turns a compressed filelike into a decompressed filelike. Use it to decouple the decompression step.
2016-04-28 | extend functionality of DecompressedStream | Helmut Grohne
It now supports:
* tell()
* seek(absolute_position), forward only
* close()
* closed
This is sufficient for putting it as a fileobj into tarfile.TarFile. By doing so we can decouple decompression from tar processing, which eases papering over the Python 2.x vs Python 3.x differences.
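To show why these four members are enough for tarfile, here is a simplified stand-in. The class `ForwardOnlyStream` is hypothetical and not the DecompressedStream from the repository; it assumes a decompressor object with a decompress() method and ignores any data that only a flush would produce.

```python
import io
import tarfile
import zlib


class ForwardOnlyStream:
    """Hypothetical decompressing stream with tell() and forward-only seek()."""

    def __init__(self, fileobj, decompressor, blocksize=65536):
        self.fileobj = fileobj
        self.decompressor = decompressor
        self.blocksize = blocksize
        self.buff = b""
        self.pos = 0
        self.closed = False

    def read(self, length):
        # Decompress until enough bytes are buffered or the input runs out.
        while len(self.buff) < length:
            data = self.fileobj.read(self.blocksize)
            if not data:
                break
            self.buff += self.decompressor.decompress(data)
        data, self.buff = self.buff[:length], self.buff[length:]
        self.pos += len(data)
        return data

    def tell(self):
        return self.pos

    def seek(self, pos):
        # Absolute positions only; moving forward means reading and discarding.
        if pos < self.pos:
            raise ValueError("backward seek not supported")
        while self.pos < pos:
            if not self.read(min(self.blocksize, pos - self.pos)):
                break
        return self.pos

    def close(self):
        self.fileobj.close()
        self.closed = True


def build_compressed_tar():
    """Create a small zlib-compressed tar archive in memory."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tf:
        content = b"hello world\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(content)
        tf.addfile(info, io.BytesIO(content))
    return zlib.compress(raw.getvalue())


stream = ForwardOnlyStream(io.BytesIO(build_compressed_tar()),
                           zlib.decompressobj())
contents = {}
# Mode "r:" means an uncompressed tar; decompression happens in the stream.
with tarfile.open(fileobj=stream, mode="r:") as tf:
    for member in tf:
        contents[member.name] = tf.extractfile(member).read()
```

This works as long as members are read in archive order, since tarfile then only ever seeks forward.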
2016-04-21 | importpkg: move the hash function list to the extractor class | Helmut Grohne
They really are an aspect of the particular extractor and can easily be changed by subclassing.
2016-04-19 | add a class DebExtractor for guiding feature extraction | Helmut Grohne
It is supposed to separate the parsing of Debian packages (understanding how the format works) from the actual feature extraction. Its goal is to simplify writing custom extractors for different feature sets.
2016-04-16 | add a validate method to HashedStream | Helmut Grohne
2016-04-16 | importpkg: use yaml dumper directly | Helmut Grohne
Instead of carefully crafting an iterator to pass to yaml.safe_dump_all, we simply take control on our own and call represent on a yaml dumper object where needed.
2016-04-16 | importpkg: refactor commit handling out of process_package* | Helmut Grohne
2016-04-08 | urlopen moved from urllib to urllib.request in py3k | Helmut Grohne
2015-04-16 | process_control: do not encode to ascii | Helmut Grohne
Otherwise the yaml will contain binary strings on py3k which end up as binary data in the sqlite database. In py2, yaml can handle those unicode objects just fine.
2015-04-16 | tempfile.mkdtemp does not like bytes in py3k | Helmut Grohne
2015-04-16 | unquote moved from urllib to urllib.parse in py3k | Helmut Grohne
2015-04-16 | element access on bytes yields int in py3k | Helmut Grohne
2015-04-16 | zlib.crc32 behaves inconsistently on py2 vs py3 | Helmut Grohne
zlib.crc32 returns an int32_t on py2 and a uint32_t on py3.
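The usual way to get the same value on both versions is to mask the result to 32 bits, as the Python documentation for zlib.crc32 suggests. The wrapper name below is an assumption.

```python
import zlib


def crc32_unsigned(data, value=0):
    """Return the CRC-32 of data as an unsigned 32-bit integer."""
    # On Python 2 the result may be negative; masking normalizes it.
    return zlib.crc32(data, value) & 0xFFFFFFFF


checksum = crc32_unsigned(b"123456789")
```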
2015-04-16 | there is no itertools.imap in py3k | Helmut Grohne
2015-04-16 | use binary stdin on py3k | Helmut Grohne
2015-04-16 | distinguish bytes from unicode for py3k | Helmut Grohne
2014-07-23 | importpkg: be more liberal in control file naming | Helmut Grohne
While in current sid packages the control file in control.tar is always named "./control", some older packages name it "control".
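Accepting both spellings can be as simple as stripping an optional "./" prefix before comparing. The function name is hypothetical and importpkg may implement the check differently.

```python
def is_control_member(name):
    """Return True for both "./control" and "control"."""
    if name.startswith("./"):
        name = name[2:]
    return name == "control"
```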
2014-06-14 | improve schema documentation | Helmut Grohne
wording, more NOT NULLs, some more explanations
2014-06-14 | add documentation to schema.sql | Helmut Grohne
Thanks to Peter Palfrader for explaining what information is needed and reviewing the documentation.
2014-05-11 | update copyright information | Helmut Grohne