|
|
|
|
|
Each view now has its own view function following the show_ pattern, and
it accepts its parsed parameters as keyword-only arguments.
|
|
|
|
|
|
|
|
When the decompression ratio is huge, we may be faced with a large
(multiple megabytes) bytes object. Slicing that object incurs a copy, so
repeatedly slicing it becomes O(n^2), while appending to and trimming a
bytearray is much faster.
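The difference can be sketched as follows (function and variable names are illustrative, not the project's code): deleting a consumed prefix from a bytearray is cheap, while re-slicing a bytes object copies the whole tail every time.

```python
def consume_in_chunks(data, chunksize=3):
    # With bytes, "buf = buf[chunksize:]" would copy the entire tail on
    # every iteration, giving O(n^2) behaviour overall. Deleting the
    # consumed prefix of a bytearray in place avoids those copies.
    buf = bytearray(data)
    chunks = []
    while buf:
        chunks.append(bytes(buf[:chunksize]))
        del buf[:chunksize]  # in-place trim
    return chunks
```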
|
|
Fixes: 775bdde52ad5 ("DecompressedStream: avoid mixing types for variable data")
|
|
Again, static type checking is the driver for the change here.
|
|
knownpkgvers is a dict while knownpkgs is a set. Separating them helps
static type checkers.
|
|
We now know that our parameter is a jinja2.environment.TemplateStream.
Enable buffering and accumulate via an io.BytesIO to avoid O(n^2)
append.
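A rough sketch of the pattern (assumed implementation, not necessarily the project's exact code):

```python
import io

def encode_and_buffer(stream):
    # Write each encoded chunk into a BytesIO instead of concatenating
    # bytes objects, which would copy the accumulated result on every
    # append and thus degrade to O(n^2).
    buf = io.BytesIO()
    for chunk in stream:
        buf.write(chunk.encode("utf8"))
    return buf.getvalue()
```

With a jinja2 `TemplateStream`, one would additionally call `stream.enable_buffering()` first, so the chunks arrive in larger batches.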
|
|
html_response expects a str-generator, but when we call the render
method, we receive a plain str. It can be iterated, but only one
character at a time, and that is what encode_and_buffer would do in this
case. So better to stream all the time.
|
|
|
|
|
|
The content must be bytes. Passing str silently skips the suppression.
|
|
|
|
Instead of retroactively attaching a name to an ImageHash, autogenerate
it via a property. Doing so also simplifies static type checking.
|
|
Returning the object gets us into trouble over what precisely the
return type is, at no benefit.
|
|
|
|
The local variable data can be bool or bytes. That's inconvenient for
static type checkers, so avoid mixing the types.
|
|
|
|
Both lzma and concurrent.futures are now part of the standard library
and solely exist as virtual packages.
|
|
|
|
|
|
|
|
|
|
This module is not used anywhere and thus its dependency on
python3-magic is not recorded in the README. It can be used to guess the
file type by looking at the contents using file magic. It is not a
typical hash function, but it can be used for repurposing dedup for
other analysers.
|
|
It wasn't copying the stored member and thus could blacklist "wrong"
content after a copy.
|
|
The list path got inadvertently prepended to all binary package urls.
Fixes: 420804c25797 ("autoimport: improve fetching package lists")
|
|
|
|
In Python 3.x, lzma.LZMADecompressor doesn't have a flush method.
|
|
Fixes: 2f12a6e2f426 ("autoimport: add option to skip hash checking")
|
|
Move the fetching part into dedup.utils. Instead of hard-coding the
gzip-compressed copy, try xz, gz and plain in that order. Also take care
to actually close the connection.
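The fetch order could look roughly like this (function name and error handling are assumptions; the `opener` parameter exists only to make the sketch testable):

```python
import contextlib
import urllib.error
import urllib.request

def fetch_pkglist(baseurl, opener=urllib.request.urlopen):
    # Try the xz-compressed, gzip-compressed and plain variants in that
    # order, and make sure the connection is closed in every case.
    for extension in (".xz", ".gz", ""):
        try:
            with contextlib.closing(opener(baseurl + extension)) as conn:
                return conn.read(), extension
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
    raise OSError("no package list found at %s" % baseurl)
```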
|
|
This causes non-successful fetches to result in HTTPErrors, like they
already do in py3.
|
|
After all, it isn't that generic. It knows what information is necessary
for running dedup. Thus it really belongs to the extractor subclass.
By building on handle_control_info, not that much parsing logic is left
in the extractor subclass.
|
|
|
|
|
|
Teach importpkg how to download urls using urlopen and thus remove the
need for invoking curl.
|
|
For variations of dedup that do not consume the data.tar member, this
option can save significant bandwidth.
|
|
* streaming means that we do not need to hold the entire package list
in memory (but the pkgs dict will become large anyway).
* The decompress utility allows easily switching to e.g. xz which is
the only compression format for the dbgsym suites.
|
|
Iteration over file-like is required by deb822.Packages.iter_paragraphs.
|
|
|
|
The former behaviour was to ignore them. The intended use in dedup is to
know whether a package unconditionally requires another package.
|
|
The handle_ar_member and handle_ar_end methods now have a default
implementation adding further handlers handle_debversion,
handle_control_tar and handle_data_tar.
In that process two additional bugs were fixed:
* decompress_tar was wrongly passing errors="surrogateescape" for
Python 2.x even though that's only supported for Python 3.x.
* The use of decompress actually passes the extension as unicode.
|
|
The autoimport tool runs the Python interpreter explicitly. Instead of
invoking just "python" and thus calling whatever the current default is,
use sys.executable which is the interpreter used to run autoimport, thus
locking both to the same Python version.
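The idiom is simply (the `-c` payload here is a stand-in for the real importpkg invocation):

```python
import subprocess
import sys

# sys.executable is the interpreter running this very script, so the
# child process uses the same Python version rather than whatever
# "python" happens to resolve to in PATH.
rc = subprocess.call([sys.executable, "-c", "import sys"])
```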
|
|
In Python 2.x, TarInfo.name is a bytes object. In Python 3.x,
TarInfo.name always is a unicode object. To avoid importpkg crashing
with an exception, we direct the Python 3.x decoding to use
surrogateescape. Thus decoding the name boils down to checking whether
it contains surrogates.
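The check can be expressed as a strict re-encode (a sketch; the actual helper in importpkg may differ):

```python
def has_surrogates(name):
    # A name decoded with errors="surrogateescape" contains lone
    # surrogates exactly when the original bytes were not valid UTF-8;
    # encoding strictly back to UTF-8 then fails.
    try:
        name.encode("utf8")
    except UnicodeEncodeError:
        return True
    return False
```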
|
|
Building on the previous commit, add a decompress function that turns a
compressed filelike into a decompressed filelike. Use it to decouple the
decompression step.
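Such a decompress function might look like this (signature and supported extensions are assumptions based on the surrounding commits):

```python
import bz2
import gzip
import lzma

def decompress(filelike, extension):
    # Wrap a compressed file-like object into a decompressed file-like
    # object, dispatching on the filename extension.
    if extension == "":
        return filelike
    if extension == ".gz":
        return gzip.GzipFile(fileobj=filelike, mode="rb")
    if extension == ".bz2":
        return bz2.BZ2File(filelike, mode="rb")
    if extension == ".xz":
        return lzma.LZMAFile(filelike, mode="rb")
    raise ValueError("unknown compression extension %r" % extension)
```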
|
|
It now supports:
* tell()
* seek(absolute_position), forward only
* close()
* closed
This is sufficient for putting it as a fileobj into tarfile.TarFile. By
doing so we can decouple decompression from tar processing, which eases
papering over the Python 2.x vs Python 3.x differences.
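A minimal wrapper with exactly that interface could be sketched like so (class name is an assumption; the real stream additionally performs the decompression itself):

```python
import io

class ForwardSeekStream:
    # tell(), forward-only seek(), close() and closed are all that
    # tarfile.TarFile needs from an otherwise non-seekable fileobj.
    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._pos = 0
        self.closed = False

    def read(self, length=-1):
        data = self._fileobj.read(length)
        self._pos += len(data)
        return data

    def tell(self):
        return self._pos

    def seek(self, pos):
        # Forward only: seeking is simulated by reading and discarding.
        if pos < self._pos:
            raise ValueError("seeking backwards is not supported")
        while self._pos < pos:
            if not self.read(min(pos - self._pos, 65536)):
                break

    def close(self):
        self.closed = True
        self._fileobj.close()
```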
|
|
They really are an aspect of the particular extractor and can easily be
changed by subclassing.
|
|
It is supposed to separate the parsing of Debian packages (understanding
how the format works) from the actual feature extraction. Its goal is to
simplify writing custom extractors for different feature sets.
|