~helmut/debian-dedup.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message	Author
2023-05-09	add type annotations to most of the code (HEAD, master)	Helmut Grohne

2022-01-12	webapp.py: fuse two sql queries in get_details	Helmut Grohne

2022-01-12	webapp.py: generalize url router	Helmut Grohne
	Each view now has its own view function following the show_ pattern and it accepts its parsed parameters as keyword-only arguments now.
2021-12-31	importpkg.py + readyaml.py: prefer the C libyaml implementation	Helmut Grohne

2021-12-31	dedup.utils: uninline helper function iterate_packages	Helmut Grohne

2021-12-31	webapp.py: consistently close cursors using context managers	Helmut Grohne

2021-12-30	DecompressedStream: improve performance	Helmut Grohne
	When the decompression ratio is huge, we may be faced with a large (multiple megabytes) bytes object. Slicing that object incurs a copy each time, so repeated slicing becomes O(n^2), while appending to and trimming a bytearray is much faster.
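The buffering idea can be sketched as follows. This is a minimal illustration, not the repository's DecompressedStream: the class name BufferedDecompressor, its constructor arguments and the zlib demo data are assumptions made for the example.

```python
import zlib


class BufferedDecompressor:
    """Hypothetical stand-in for DecompressedStream (not the repository's code)."""

    def __init__(self, chunks, decompressor):
        self.chunks = iter(chunks)
        self.decompressor = decompressor
        self.buff = bytearray()

    def read(self, length):
        # Appending to a bytearray extends it in place.
        while len(self.buff) < length:
            chunk = next(self.chunks, None)
            if chunk is None:
                break
            self.buff += self.decompressor.decompress(chunk)
        result = bytes(self.buff[:length])
        # Deleting the consumed prefix trims the buffer in place; slicing a
        # bytes object instead would build a new object for the remainder.
        del self.buff[:length]
        return result


payload = b"a" * 100000
compressed = zlib.compress(payload)
stream = BufferedDecompressor(
    [compressed[:50], compressed[50:]], zlib.decompressobj()
)
first = stream.read(10)
rest = stream.read(200000)
```

The highly compressible payload mimics the huge decompression ratio mentioned above: a few compressed bytes expand into a large buffer that is then consumed in small reads.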
2021-12-29	DecompressedStream: fix endless loop	Helmut Grohne
	Fixes: 775bdde52ad5 ("DecompressedStream: avoid mixing types for variable data")
2021-12-29	webapp: avoid changing variable type	Helmut Grohne
	Again static type checking is the driver for the change here.
2021-12-29	autoimport: avoid changing variable type	Helmut Grohne
	knownpkgvers is a dict while knownpkgs is a set. Separating them helps static type checkers.
2021-12-29	webapp: speed up encode_and_buffer	Helmut Grohne
	We now know that our parameter is a jinja2.environment.TemplateStream. Enable buffering and accumulate via an io.BytesIO to avoid O(n^2) append.
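A minimal sketch of accumulating via io.BytesIO, assuming the input is any iterable of str chunks. The signature and the threshold parameter are assumptions for illustration and may differ from the actual encode_and_buffer in webapp.py.

```python
import io


def encode_and_buffer(chunks, threshold=16384):
    """Encode str chunks as UTF-8 and yield blocks of at least `threshold` bytes.

    Hypothetical signature; only the final block may be shorter.
    """
    buff = io.BytesIO()
    for chunk in chunks:
        # Writing to a BytesIO appends in place, unlike `bytes += bytes`.
        buff.write(chunk.encode("utf8"))
        if buff.tell() >= threshold:
            yield buff.getvalue()
            buff = io.BytesIO()
    if buff.tell() > 0:
        yield buff.getvalue()


blocks = list(encode_and_buffer(["ab", "cd", "e"], threshold=4))
```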
2021-12-29	webapp: improve performance	Helmut Grohne
	html_response expects a str-generator, but when we call the render method, we receive a plain str. It can be iterated - one character at a time. That's what encode_and_buffer will do in this case. So better stream all the time.
2021-12-29	webapp: forward compatibility with newer werkzeug	Helmut Grohne

2021-12-29	autoimport.py: convert to use pathlib	Helmut Grohne

2021-12-29	importpkg: fix suppression of boring content	Helmut Grohne
	The content must be bytes. Passing str silently skips the suppression.
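Why a str can be skipped silently: in Python 3, bytes and str never compare equal, so a lookup with the wrong type simply finds nothing. The sets below are hypothetical and only illustrate that behaviour. They are not the data structure importpkg uses.

```python
# Hypothetical "boring" contents, once as str and once as bytes.
boring_as_str = {"", "\n"}
boring_as_bytes = {b"", b"\n"}

content = b"\n"  # file contents are read as bytes

# bytes never equals str in Python 3, so this lookup fails without an error.
found_with_str = content in boring_as_str
found_with_bytes = content in boring_as_bytes
```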
2021-12-29	DecompressedHash: also gain a name property for consistency	Helmut Grohne

2021-12-29	ImageHash: gain a name property	Helmut Grohne
	Instead of retroactively attaching a name to an ImageHash, autogenerate it via a property. Doing so also simplifies static type checking.
2021-12-29	don't return the first parameter from hash_file	Helmut Grohne
	Returning the object gets us into trouble as to what precisely the return type is at no benefit.
2021-12-29	drop unused function sql_add_version_compare	Helmut Grohne

2021-12-29	DecompressedStream: avoid mixing types for variable data	Helmut Grohne
	The local variable data can be bool or bytes. That's inconvenient for static type checkers. Avoid doing so.
2021-12-29	DecompressedStream: eliminate redundant closed field	Helmut Grohne

2020-10-25	drop obsolete python modules	Helmut Grohne
	Both lzma and concurrent.futures are now part of the standard library and solely exist as virtual packages.
2020-10-25	externalize ar parsing to arpy	Helmut Grohne

2020-10-25	use python3-pil instead of removed python3-imaging	Helmut Grohne

2020-02-16	drop support for Python 2.x	Helmut Grohne

2018-06-25	adapt to python3-magic/2:0.4.15-1 API	Helmut Grohne

2017-09-23	add module dedup.filemagic	Helmut Grohne
	This module is not used anywhere and thus its dependency on python3-magic is not recorded in the README. It can be used to guess the file type by looking at the contents using file magic. It is not a typical hash function, but it can be used for repurposing dedup for other analysers.
2017-09-13	fix HashBlacklistContent.copy	Helmut Grohne
	It wasn't copying the stored member and thus could blacklist the "wrong" content after a copy.
2016-11-13	autoimport: fix regression in url computation	Helmut Grohne
	The list path got inadvertently prepended to all binary package urls.
	Fixes: 420804c25797 ("autoimport: improve fetching package lists")
2016-07-29	repository moved	Helmut Grohne

2016-06-09	DecompressedStream: fix decompression without flush	Helmut Grohne
	In Python 3.x, lzma.LZMADecompressor doesn't have a flush method.
2016-06-09	autoimport: fix hash check	Helmut Grohne
	Fixes: 2f12a6e2f426 ("autoimport: add option to skip hash checking")
2016-05-25	autoimport: improve fetching package lists	Helmut Grohne
	Moving the fetching part into dedup.utils. Instead of hard coding the gzip compressed copy, try xz, gz and plain in that order. Also take care to actually close the connection.
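The fallback order can be sketched like this. The function name open_first_variant, the injected opener callable and the example URL are assumptions for illustration. The real helper in dedup.utils may look different. In practice the opener could be urllib.request.urlopen, whose HTTPError is a subclass of OSError.

```python
import io


def open_first_variant(base_url, opener, extensions=(".xz", ".gz", "")):
    """Try each compressed variant in order and return (extension, handle).

    Hypothetical helper: `opener` is any callable that takes a URL and
    raises OSError when the fetch fails.
    """
    last_error = None
    for ext in extensions:
        try:
            return ext, opener(base_url + ext)
        except OSError as err:
            last_error = err
    raise OSError("no variant of %s could be fetched" % base_url) from last_error


def fake_opener(url):
    # Stand-in for a network fetch: only the .gz variant "exists".
    if url.endswith(".gz"):
        return io.BytesIO(b"data")
    raise OSError("404: " + url)


ext, handle = open_first_variant("http://example.invalid/Packages", fake_opener)
# The with block makes sure the handle is actually closed after reading.
with handle:
    body = handle.read()
```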
2016-05-24	use urlopen from urllib2 on py2	Helmut Grohne
	This causes non-successful fetches to result in HTTPErrors like it does in py3 already.
2016-05-23	move dedup.debpkg.process_control back into importpkg	Helmut Grohne
	After all, it isn't that generic. It knows what information is necessary for running dedup. Thus it really belongs to the extractor subclass. By building on handle_control_info, not that much parsing logic is left in the extractor subclass.
2016-05-23	DebExtractor: implement parsing of control.tar	Helmut Grohne

2016-05-23	importpkg: fix --hash broken in previous commit	Helmut Grohne

2016-05-23	remove curl dependency	Helmut Grohne
	Teach importpkg how to download urls using urlopen and thus remove the need for invoking curl.
2016-05-23	autoimport: add option to skip hash checking	Helmut Grohne
	For variations of dedup that do not consume the data.tar member, this option can save significant bandwidth.
2016-05-22	autoimport: stream package list and use generic decompressor	Helmut Grohne
	* streaming means that we do not need to hold the entire package list in memory (but the pkgs dict will become large anyway).
	* The decompress utility allows easily switching to e.g. xz which is the only compression format for the dbgsym suites.
2016-05-22	DecompressedStream: implement readline	Helmut Grohne
	Iteration over file-like is required by deb822.Packages.iter_paragraphs.
2016-05-21	move from deprecated optparse to argparse	Helmut Grohne

2016-05-05	treat Pre-Depends like regular Depends	Helmut Grohne
	The former behaviour was ignoring them. The intended use for dedup is to know whenever a package unconditionally requires another package.
2016-05-01	push more functionality into DebExtractor	Helmut Grohne
	The handle_ar_member and handle_ar_end methods now have a default implementation adding further handlers handle_debversion, handle_control_tar and handle_data_tar. In that process two additional bugs were fixed:
	* decompress_tar was wrongly passing errors="surrogateescape" for Python 2.x even though that's only supported for Python 3.x.
	* The use of decompress actually passes the extension as unicode.
2016-05-01	use same Python version for autoimport and importpkg	Helmut Grohne
	The autoimport tool runs the Python interpreter explicitly. Instead of invoking just "python" and thus calling whatever the current default is, use sys.executable which is the interpreter used to run autoimport, thus locking both to the same Python version.
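A minimal sketch of launching a child process with the same interpreter. The helper name run_same_interpreter is hypothetical, and autoimport's actual invocation of importpkg is not shown here.

```python
import subprocess
import sys


def run_same_interpreter(code):
    """Run a code snippet with the interpreter that runs this script."""
    # sys.executable is the path of the current interpreter, so the child
    # cannot pick up whatever "python" happens to be the system default.
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


child_version = run_same_interpreter("import sys; print(sys.version_info[:3])")
parent_version = str(sys.version_info[:3])
```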
2016-04-28	support Python 3.x in importpkg	Helmut Grohne
	In Python 2.x, TarInfo.name is a bytes object. In Python 3.x, TarInfo.name always is a unicode object. To avoid importpkg crashing with an exception, we direct the Python 3.x decoding to use surrogateescapes. Thus decoding the name boils down to checking whether it contains surrogates.
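The surrogate check can be sketched as follows. The function name has_surrogates and the sample file names are assumptions for illustration. They are not taken from importpkg.

```python
def has_surrogates(name):
    """Return True if a str holds bytes that were not valid UTF-8.

    With errors="surrogateescape", undecodable bytes become lone surrogates,
    and those cannot be encoded again by the strict UTF-8 codec.
    """
    try:
        name.encode("utf8")
    except UnicodeEncodeError:
        return True
    return False


raw_good = b"usr/bin/caf\xc3\xa9"  # valid UTF-8
raw_bad = b"usr/bin/caf\xe9"       # Latin-1 byte, invalid as UTF-8

good = raw_good.decode("utf8", errors="surrogateescape")
bad = raw_bad.decode("utf8", errors="surrogateescape")

# The original bytes stay recoverable through the same error handler.
roundtrip = bad.encode("utf8", errors="surrogateescape")
```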
2016-04-28	decouple a function decompress out of decompress_tar	Helmut Grohne
	Building on the previous commit, add a decompress function that turns a compressed filelike into a decompressed filelike. Use it to decouple the decompression step.
2016-04-28	extend functionality of DecompressedStream	Helmut Grohne
	It now supports:
	* tell()
	* seek(absolute_position), forward only
	* close()
	* closed
	This is sufficient for putting it as a fileobj into tarfile.TarFile. By doing so we can decouple decompression from tar processing, which eases papering over the Python 2.x vs Python 3.x differences.
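A forward-only file-like wrapper of this kind might look as follows. The class ForwardOnlyReader is a hypothetical sketch over an already decompressed file object, not the repository's DecompressedStream. The 65536 byte skip size is an arbitrary choice.

```python
import io


class ForwardOnlyReader:
    """Hypothetical wrapper offering read, tell, forward-only seek and close."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.pos = 0
        self.closed = False

    def read(self, length=-1):
        data = self.fileobj.read(length)
        self.pos += len(data)
        return data

    def tell(self):
        return self.pos

    def seek(self, pos):
        # A stream cannot rewind, so only absolute forward seeks are allowed.
        if pos < self.pos:
            raise ValueError("backward seek is not supported")
        # Seeking forward means reading and discarding the skipped data.
        while self.pos < pos:
            if not self.read(min(pos - self.pos, 65536)):
                break

    def close(self):
        if not self.closed:
            self.fileobj.close()
            self.closed = True


reader = ForwardOnlyReader(io.BytesIO(b"0123456789"))
head = reader.read(2)
reader.seek(5)
position = reader.tell()
tail = reader.read(2)
```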
2016-04-21	importpkg: move the hash function list to the extractor class	Helmut Grohne
	They really are an aspect of the particular extractor and can easily be changed by subclassing.
2016-04-19	add a class DebExtractor for guiding feature extraction	Helmut Grohne
	It is supposed to separate the parsing of Debian packages (understanding how the format works) from the actual feature extraction. Its goal is to simplify writing custom extractors for different feature sets.