~helmut/debian-dedup.git

Age	Commit message	Author
2021-12-29	webapp: improve performance	Helmut Grohne
	html_response expects a generator of str, but calling the render method returns a plain str. A str is iterable too, but only one character at a time, and that is what encode_and_buffer ends up doing in this case. So it is better to always stream.
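A minimal sketch of the effect, assuming an encode_and_buffer that UTF-8 encodes each item it receives and coalesces the bytes into larger buffers. The function body, the 2048-byte threshold and the CountingIter helper are hypothetical, not the webapp's code.

```python
from typing import Iterable, Iterator


def encode_and_buffer(chunks: Iterable[str], bufsize: int = 2048) -> Iterator[bytes]:
    """Hypothetical sketch: UTF-8 encode each str item and coalesce the
    bytes into buffers of at least bufsize bytes (the last may be shorter)."""
    buf = bytearray()
    for chunk in chunks:  # one loop iteration per item of the iterable
        buf += chunk.encode("utf-8")
        if len(buf) >= bufsize:
            yield bytes(buf)
            buf.clear()
    if buf:
        yield bytes(buf)


class CountingIter:
    """Wraps an iterable and counts how many items were pulled from it."""

    def __init__(self, iterable: Iterable[str]) -> None:
        self._it = iter(iterable)
        self.count = 0

    def __iter__(self) -> "CountingIter":
        return self

    def __next__(self) -> str:
        item = next(self._it)  # StopIteration ends the iteration
        self.count += 1
        return item


# A rendered page is one big str; a streamed page is a sequence of chunks.
chunks = ["<p>", "x" * 10000, "</p>"]
rendered = "".join(chunks)

as_str = CountingIter(rendered)   # iterating a str yields single characters
as_stream = CountingIter(chunks)  # iterating the stream yields whole chunks
body_str = b"".join(encode_and_buffer(as_str))
body_stream = b"".join(encode_and_buffer(as_stream))
```

Both paths produce the same bytes, but the str path runs the loop once per character (10007 times) instead of once per chunk (3 times). With Jinja2, Template.render() returns the joined str, while Template.generate() yields the output in chunks.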
2021-12-29	webapp: forward compatibility with newer werkzeug	Helmut Grohne

2020-02-16	drop support for Python 2.x	Helmut Grohne

2016-05-21	move from deprecated optparse to argparse	Helmut Grohne
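For illustration, declaring the -d/--database option mentioned in this log with argparse might look as follows. The description text and the default filename test.sqlite3 are assumptions.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="debian-dedup script (sketch)")
    # -d/--database as named in this log; the default value is an assumption.
    parser.add_argument("-d", "--database", default="test.sqlite3",
                        help="path to the sqlite3 database file")
    return parser.parse_args(argv)
```

Passing argv explicitly keeps the parser testable; with argv=None, argparse reads sys.argv[1:].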

2014-05-11	webapp: allow git-like hash truncation	Helmut Grohne
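A minimal sketch of how a truncated hash could be resolved, assuming hashes are stored as lowercase hex text in a hypothetical table hashes(hash TEXT). The table name and the error handling are illustrative, not the webapp's actual lookup.

```python
import re
import sqlite3

HEX = re.compile(r"[0-9a-f]+")


def resolve_hash_prefix(db: sqlite3.Connection, prefix: str) -> str:
    """Return the unique full hash starting with prefix (hypothetical table)."""
    prefix = prefix.lower()
    if not HEX.fullmatch(prefix):
        raise ValueError("not a hexadecimal string: %r" % prefix)
    # Every lowercase hex string starting with prefix sorts inside
    # [prefix, prefix + "g"), because "g" sorts after 0-9 and a-f in ASCII,
    # so an index on hash can serve this range query.
    rows = db.execute(
        "SELECT DISTINCT hash FROM hashes WHERE hash >= ? AND hash < ? LIMIT 2",
        (prefix, prefix + "g")).fetchall()
    if not rows:
        raise LookupError("no hash starts with %s" % prefix)
    if len(rows) > 1:
        raise LookupError("ambiguous hash prefix %s" % prefix)
    return rows[0][0]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hashes (hash TEXT)")
db.executemany("INSERT INTO hashes VALUES (?)",
               [("abc123",), ("abd456",), ("ffff00",)])
```

Like git, a prefix such as "ab" that matches several hashes is rejected as ambiguous rather than resolved arbitrarily.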

2014-02-23	spell check comments	Helmut Grohne

2014-02-23	webapp: fix eqclass usage in package comparison	Helmut Grohne
	When comparing two packages, objects were considered duplicates without checking whether the respective hash functions are comparable, i.e. whether they belong to the same equivalence class. The current set of hash functions does not expose this bug.
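The sketch below shows the kind of join this fix implies: two files count as shared only when their hash values match and their hash functions have the same eqclass. The simplified tables, names and sample data are hypothetical; the sample deliberately contains a png hash that equals an unrelated sha512 value, which a join without the eqclass condition would report.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE function (id INTEGER PRIMARY KEY, name TEXT, eqclass INTEGER);
CREATE TABLE content (id INTEGER PRIMARY KEY, pid INTEGER, filename TEXT);
CREATE TABLE hash (cid INTEGER, fid INTEGER, hash TEXT);
INSERT INTO function VALUES (1, 'sha512', 1), (2, 'png_sha512', 2);
INSERT INTO content VALUES (1, 1, '/a/README'), (2, 2, '/b/README'),
                           (3, 1, '/a/logo.png'), (4, 2, '/b/data');
-- a genuine duplicate, plus a png hash equal to an unrelated sha512 value
INSERT INTO hash VALUES (1, 1, 'aaaa'), (2, 1, 'aaaa'),
                        (3, 2, 'zzzz'), (4, 1, 'zzzz');
""")


def shared_files(db: sqlite3.Connection, pid1: int, pid2: int) -> list:
    """Pairs (file in pid1, file in pid2) with equal hashes of comparable functions."""
    return db.execute("""
        SELECT DISTINCT c1.filename, c2.filename
        FROM content AS c1
        JOIN hash AS h1 ON h1.cid = c1.id
        JOIN function AS f1 ON f1.id = h1.fid
        JOIN hash AS h2 ON h2.hash = h1.hash
        JOIN function AS f2 ON f2.id = h2.fid
        JOIN content AS c2 ON c2.id = h2.cid
        WHERE c1.pid = ? AND c2.pid = ?
          AND f1.eqclass = f2.eqclass  -- only compare comparable functions
        ORDER BY c1.filename, c2.filename""", (pid1, pid2)).fetchall()


print(shared_files(db, 1, 2))  # [('/a/README', '/b/README')]
```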
2013-09-11	webapp: open cursors less often	Helmut Grohne
	On the main instance, opening a cursor amounts to opening a connection. Unfortunately, sqlite3.Connection.close does not close file descriptors. So open fewer cursors to leak file descriptors less often.
2013-09-10	webapp: close database cursors	Helmut Grohne
	Leaking them can result in running out of available file descriptors.
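A minimal sketch of both practices from these commits with Python's sqlite3 module: one cursor per unit of work, reused for several queries and closed deterministically through contextlib.closing. The function name and the package/content tables are hypothetical.

```python
import contextlib
import sqlite3
from typing import Optional


def count_files(db: sqlite3.Connection, package: str) -> Optional[int]:
    """Run two queries on a single cursor and always close it afterwards."""
    # closing() calls cur.close() when the block is left, including via
    # return or an exception, instead of leaving that to garbage collection.
    with contextlib.closing(db.cursor()) as cur:
        cur.execute("SELECT id FROM package WHERE name = ?", (package,))
        row = cur.fetchone()
        if row is None:
            return None
        # Reuse the same cursor rather than opening a second one.
        cur.execute("SELECT COUNT(*) FROM content WHERE pid = ?", (row[0],))
        return cur.fetchone()[0]


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE content (id INTEGER PRIMARY KEY, pid INTEGER, filename TEXT);
INSERT INTO package VALUES (1, 'foo');
INSERT INTO content VALUES (1, 1, '/usr/bin/foo'),
                           (2, 1, '/usr/share/doc/foo/README');
""")
```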
2013-09-04	webapp: serve static files from /static	Helmut Grohne

2013-09-02	add option -d --database for db path to all scripts	Helmut Grohne

2013-08-02	model comparability as an equivalence relation	Helmut Grohne
	webapp has had a relation hash_functions that modeled "comparable functions". Images should not be compared to other files: image hashes are computed over the decoded RGBA stream, and it makes no sense to store a file as such a stream. This comparability property behaves like an equivalence relation, so the function table gains a column eqclass. Each class is represented by a number, and functions are statically assigned to these classes. The filtering now happens in SQL instead of Python.
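A minimal sketch of filtering comparable functions in SQL through the eqclass column. The class numbers, the function name gzip_sha512 and the assignment of functions to classes are assumptions for illustration.

```python
import sqlite3
from typing import List

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE function (id INTEGER PRIMARY KEY, name TEXT UNIQUE, eqclass INTEGER);
-- hypothetical static assignment: plain-content hashes in class 1,
-- decoded-image hashes in class 2
INSERT INTO function (name, eqclass) VALUES
    ('sha512', 1), ('gzip_sha512', 1), ('png_sha512', 2), ('gif_sha512', 2);
""")


def comparable_functions(db: sqlite3.Connection, name: str) -> List[str]:
    """Names of all functions in the same equivalence class as name."""
    rows = db.execute("""
        SELECT f2.name FROM function AS f1
        JOIN function AS f2 ON f2.eqclass = f1.eqclass
        WHERE f1.name = ?
        ORDER BY f2.name""", (name,)).fetchall()
    return [r[0] for r in rows]


print(comparable_functions(db, "png_sha512"))  # ['gif_sha512', 'png_sha512']
```

Equality of class numbers is reflexive, symmetric and transitive by construction, so one number per function is enough to encode the whole relation.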
2013-08-01	support hashing gif images	Helmut Grohne
	* Rename "image_sha512" to "png_sha512".
	* dedup.image.ImageHash is now a base class for image hashes such as PNGHash and GIFHash.
	* Enable both hashes in importpkg.
	* Fix README.
	* Add new hash combinations to webapp.
	* Add "gif file not named .gif" to issues in update_sharing.
	Add redirect for "image_sha512" to webapp for backwards compatibility.
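A hypothetical sketch of such a class hierarchy: a base class that only produces a digest for input starting with its format's magic bytes, and subclasses that set the magic and the name prefix. The attribute and method names are assumptions. Per this log, the real image hashes hash the decoded RGBA stream; that decoding step is replaced here by hashing the raw bytes.

```python
import hashlib
from typing import Optional


class ImageHash:
    """Hypothetical base class for image hashes of one file format."""
    magic = b""        # bytes (or tuple of bytes) the file must start with
    name_prefix = ""

    def __init__(self, hashobj) -> None:
        self.hashobj = hashobj  # a hashlib object, e.g. hashlib.sha512()
        self.buf = b""

    def update(self, data: bytes) -> None:
        self.buf += data

    @property
    def name(self) -> str:
        return self.name_prefix + self.hashobj.name

    def hexdigest(self) -> Optional[str]:
        if not self.buf.startswith(self.magic):
            return None  # not an image of this format
        h = self.hashobj.copy()
        h.update(self.buf)  # placeholder for hashing the decoded RGBA stream
        return h.hexdigest()


class PNGHash(ImageHash):
    magic = b"\x89PNG\r\n\x1a\n"
    name_prefix = "png_"


class GIFHash(ImageHash):
    magic = (b"GIF87a", b"GIF89a")  # bytes.startswith accepts a tuple
    name_prefix = "gif_"


gif = GIFHash(hashlib.sha512())
gif.update(b"GIF89a" + bytes(8))
png = PNGHash(hashlib.sha512())
png.update(b"GIF89a" + bytes(8))  # a gif fed to the png hash
```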
2013-07-27	also move the static directory into the dedup package	Helmut Grohne

2013-07-27	move templates to dedup package	Helmut Grohne
	They cluttered webapp.py and now vim can give proper highlighting for the templates.
2013-07-26	Merge branch functionid	Helmut Grohne
	Actual database size savings on the full data set are around 7%.
	Conflicts: README
2013-07-25	display "issues" with files in package view	Helmut Grohne
	Currently these are invalid .gz files and png files not named .png.
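A minimal sketch of how these two issues could be detected for a single file. The function name, its return format and the issue strings are hypothetical.

```python
import gzip
import zlib
from typing import List

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature


def file_issues(filename: str, data: bytes) -> List[str]:
    issues = []
    if filename.endswith(".gz"):
        try:
            gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            # bad header (OSError), truncated stream (EOFError),
            # or corrupt deflate data (zlib.error)
            issues.append("invalid gzip file")
    if data.startswith(PNG_MAGIC) and not filename.endswith(".png"):
        issues.append("png file not named .png")
    return issues
```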
2013-07-23	webapp: make html for index valid	Helmut Grohne

2013-07-23	webapp: remove unused function	Helmut Grohne

2013-07-23	schema: reference hash functions by integer key	Helmut Grohne
	This already worked well for package.id. On a test data set of 5% of the full size, this transformation reduces the database size by about 4%.
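A minimal sketch of the idea with hypothetical, simplified tables: each hash row stores the small integer fid, and the function name is recovered by a join only when it is needed.

```python
import sqlite3
from typing import List, Tuple

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE function (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE hash (cid INTEGER NOT NULL,
                   fid INTEGER NOT NULL REFERENCES function(id),
                   hash TEXT NOT NULL);
INSERT INTO function (name) VALUES ('sha512'), ('png_sha512');
""")


def add_hash(db: sqlite3.Connection, cid: int, function_name: str, digest: str) -> None:
    # Translate the name to its integer id inside the INSERT; an unknown
    # name matches no function row, so nothing is inserted.
    db.execute("INSERT INTO hash (cid, fid, hash) "
               "SELECT ?, id, ? FROM function WHERE name = ?",
               (cid, digest, function_name))


def hashes_of(db: sqlite3.Connection, cid: int) -> List[Tuple[str, str]]:
    return db.execute("""
        SELECT function.name, hash.hash
        FROM hash JOIN function ON function.id = hash.fid
        WHERE hash.cid = ? ORDER BY function.name""", (cid,)).fetchall()


add_hash(db, 1, "sha512", "aaaa")
add_hash(db, 1, "png_sha512", "bbbb")
```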
2013-07-10	schema: reference package table by integer key	Helmut Grohne
	One approach to improving performance is to reduce the database size. A package name takes up 15 bytes on average, while a package number takes up two bytes. Multiplied by the number of references, this difference should be noticeable. A small test set shows a reduction of 10%.
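Written out with the figures given above, where R is the number of references to the package table; R is not given in the log, so the value R = 10^7 below is purely illustrative.

```latex
\begin{align*}
\Delta S &\approx (15 - 2)\,\mathrm{B} \cdot R = 13R\,\mathrm{B}\\
R = 10^{7} &\;\Rightarrow\; \Delta S \approx 1.3 \times 10^{8}\,\mathrm{B} \approx 130\,\mathrm{MB}
\end{align*}
```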
2013-06-19	webapp: fix hash example link after git upload	Helmut Grohne
	The git binary changed, and so did its hash. Use a more stable example instead: the GPL-3.
2013-05-09	webapp: enrich comparison page with version info	Helmut Grohne

2013-05-05	webapp: markup error in /source template	Helmut Grohne

2013-05-02	webapp: use jinja's filesizeformat	Helmut Grohne
	Except that it does not work, so replace it with our own version. At least we may be able to drop this code in a future update.
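A minimal sketch of a replacement filter using binary units. The unit labels, the rounding and the function body are assumptions, not the webapp's version.

```python
def filesizeformat(size: int) -> str:
    """Format a byte count as e.g. '512 B', '1.5 KiB' or '5.0 MiB'."""
    units = ["KiB", "MiB", "GiB", "TiB"]
    if size < 1024:
        return "%d B" % size
    value = float(size)
    for unit in units:
        value /= 1024
        # stop at the first unit that keeps the value below 1024,
        # or at the largest unit available
        if value < 1024 or unit == units[-1]:
            return "%.1f %s" % (value, unit)
    raise AssertionError("unreachable")
```

With Jinja2, such a function could be registered through the environment's filters mapping, e.g. env.filters["filesizeformat"] = filesizeformat, which takes precedence over the built-in filter of the same name.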
2013-05-02	webapp: reduce size of comparison output	Helmut Grohne
	Only add a rowspan attribute when it carries meaning, i.e. when a cell spans more than one row.
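Since rowspan defaults to 1 in HTML, omitting it for single-row cells shrinks the output without changing the rendering. A hypothetical helper illustrating the rule:

```python
import html


def td(content: str, rowspan: int = 1) -> str:
    """Render a table cell; rowspan="1" is the HTML default, so omit it."""
    attr = ' rowspan="%d"' % rowspan if rowspan > 1 else ""
    return "<td%s>%s</td>" % (attr, html.escape(content))
```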
2013-04-27	webapp: add a css class binary-package	Helmut Grohne

2013-04-25	webapp: total_size is None if num_files is 0	Helmut Grohne
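In SQL, COUNT(*) over zero rows is 0, while SUM() over zero rows is NULL. A minimal sketch with a hypothetical content table that maps the resulting None to 0; SQLite's TOTAL() aggregate, which returns 0.0 for an empty set, is an alternative when a float result is acceptable.

```python
import sqlite3
from typing import Tuple

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE content (pid INTEGER, size INTEGER)")
db.executemany("INSERT INTO content VALUES (?, ?)", [(1, 100), (1, 200)])


def package_stats(db: sqlite3.Connection, pid: int) -> Tuple[int, int]:
    num_files, total_size = db.execute(
        "SELECT COUNT(*), SUM(size) FROM content WHERE pid = ?", (pid,)).fetchone()
    # SUM() over zero rows is NULL, which sqlite3 returns as None.
    if total_size is None:
        total_size = 0
    return num_files, total_size
```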

2013-04-25	webapp: turn the <br> after filename into a style	Helmut Grohne

2013-04-25	move css to /style.css	Helmut Grohne

2013-04-25	webapp: make filenames css styleable	Helmut Grohne

2013-04-25	webapp: top-align fields in /compare pages	Helmut Grohne
	Suggested by Paul Wise.
2013-04-24	implement the /compare/pkg1/pkg2 page differently	Helmut Grohne
	The original version had two major drawbacks:
	1) The SQL query caused a B-tree sort, so the wait for the first output was rather long.
	2) For packages with many equal files, the output grew as O(n^2).
	Thanks to the suggestions of Christine Grohne and Klaus Aehlig. The new approach groups the files in package1 by their main hash value (sha512). It also now does manually some work that SQL was designed to do. To speed up page generation, a new caching table was added that records which files have corresponding shared files.
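An in-memory sketch of the grouping idea: files are grouped by their hash value, so n equal files on each side yield one row listing 2n names instead of n^2 pairs. The (filename, hash) input layout and the function names are hypothetical; the SQL side and the caching table are not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_hash(files: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each hash value to the sorted filenames carrying it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for filename, digest in files:
        groups[digest].append(filename)
    return {digest: sorted(names) for digest, names in groups.items()}


def compare(files1: Iterable[Tuple[str, str]],
            files2: Iterable[Tuple[str, str]]) -> List[Tuple[List[str], List[str]]]:
    """One output row per shared hash instead of one per pair of files."""
    g1 = group_by_hash(files1)
    g2 = group_by_hash(files2)
    return [(g1[d], g2[d]) for d in sorted(g1) if d in g2]


# n identical files on each side: n * n pairs, but a single grouped row
n = 3
pkg1 = [("/a/%d" % i, "h1") for i in range(n)] + [("/a/only", "h2")]
pkg2 = [("/b/%d" % i, "h1") for i in range(n)]
rows = compare(pkg1, pkg2)
```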
2013-04-14	webapp: added some useful notes	Helmut Grohne

2013-03-26	webapp: fix problem from the previous merge	Helmut Grohne

2013-03-26	Merge branch schemachange	Helmut Grohne

2013-03-20	webapp: report correct sizes	Helmut Grohne

2013-03-20	webapp: remove broken assert	Helmut Grohne
	Fails on long inputs.
2013-03-09	split content table to a hash table	Helmut Grohne
	In the old content table, (package, filename, size) was repeated for every hash function. Now the schema expresses that each file has precisely one size but multiple hashes.
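A minimal sketch of the split layout with hypothetical column names. The package is still stored by name here, matching the state before the later change to integer package keys listed in this log: the size lives once in content, and each hash function contributes one row to hash.

```python
import sqlite3
from typing import Dict

db = sqlite3.connect(":memory:")
db.executescript("""
-- one row per file: package, filename and size are stored exactly once
CREATE TABLE content (id INTEGER PRIMARY KEY, package TEXT,
                      filename TEXT, size INTEGER);
-- one row per (file, hash function) pair
CREATE TABLE hash (cid INTEGER REFERENCES content(id),
                   function TEXT, hash TEXT);
""")


def add_file(db: sqlite3.Connection, package: str, filename: str,
             size: int, hashes: Dict[str, str]) -> int:
    cur = db.execute(
        "INSERT INTO content (package, filename, size) VALUES (?, ?, ?)",
        (package, filename, size))
    cid = cur.lastrowid
    db.executemany("INSERT INTO hash (cid, function, hash) VALUES (?, ?, ?)",
                   [(cid, name, digest) for name, digest in hashes.items()])
    return cid


cid = add_file(db, "foo", "/usr/share/doc/foo/README", 1234,
               {"sha512": "aaaa", "gzip_sha512": "bbbb"})
```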
2013-03-09	webapp: drop unused function compute_sharedstats	Helmut Grohne
	The sharing table works well, and I don't want to adapt this function for the next step of the schema change.
2013-03-07	integrate the source table into the package table	Helmut Grohne

2013-03-05	webapp: added /source/<pkg> page	Helmut Grohne

2013-03-05	webapp: helper function function_combination	Helmut Grohne

2013-03-04	webapp: fix index template	Helmut Grohne
	Apparently not all browsers understand <a ... /> in all rendering modes.
2013-03-04	webapp: use caching table "shared" for /binary page	Helmut Grohne

2013-03-04	webapp: generate /comparison pages in constant-space	Helmut Grohne

2013-03-02	move fetchiter from webapp to dedup.utils	Helmut Grohne
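A sketch of what such a helper can look like, built on the DB-API fetchmany() method so that only one batch of rows is in memory at a time, which is what allows pages to be generated in constant space. The actual dedup.utils code may differ; render_rows and the sample table are hypothetical.

```python
import html
import sqlite3
from typing import Iterator


def fetchiter(cursor: sqlite3.Cursor) -> Iterator[tuple]:
    """Yield all remaining rows, fetching cursor.arraysize rows per batch."""
    rows = cursor.fetchmany()
    while rows:
        yield from rows
        rows = cursor.fetchmany()


def render_rows(cursor: sqlite3.Cursor) -> Iterator[str]:
    """Stream one HTML list item per row instead of building the page in memory."""
    for (filename,) in fetchiter(cursor):
        yield "<li>%s</li>\n" % html.escape(filename)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE content (filename TEXT)")
db.executemany("INSERT INTO content VALUES (?)", [("/f%d" % i,) for i in range(250)])
cur = db.execute("SELECT filename FROM content ORDER BY filename")
cur.arraysize = 100  # rows per fetchmany() batch
page = "".join(render_rows(cur))  # joined here only for the demonstration
```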

2013-03-02	added html form to main page	Helmut Grohne
	Thanks to Jan Luehr for doing the work.
2013-02-25	webapp: open database cursor lazily	Helmut Grohne
	This is more correct when using Application in a multiprocessing context.
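One plausible reading of this change, sketched with hypothetical names: Application keeps only the database path and creates its connection and cursor on first use, so a worker process opens its own connection instead of inheriting one created before a fork.

```python
import sqlite3
from typing import Optional


class Application:
    """Hypothetical sketch: open the database cursor lazily, on first use."""

    def __init__(self, dbpath: str) -> None:
        self.dbpath = dbpath
        self._cursor: Optional[sqlite3.Cursor] = None

    @property
    def cursor(self) -> sqlite3.Cursor:
        if self._cursor is None:
            # The first access in each process creates its own connection.
            self._cursor = sqlite3.connect(self.dbpath).cursor()
        return self._cursor


app = Application(":memory:")
```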
2013-02-25	webapp: pass database to Application class	Helmut Grohne