~helmut/debian-dedup.git

Age	Commit message	Author
2014-03-08	autoimport: fix --database option broken in merge	Helmut Grohne

2014-03-08	Merge branch 'master' into sqlalchemy	Helmut Grohne
	In the meantime, the master branch evolved quite a bit and the schema changed again (eqclass added to function table). The main reason for the merge is to resolve the large number of conflicts once, so that development of the sqlalchemy branch can continue and still benefit from changes in the master branch, such as schema compatibility and the adapted indent level in the web app due to the use of contextlib.closing, which resembles sqlalchemy's "with db.begin() as conn:".
	Conflicts: autoimport.py dedup/utils.py readyaml.py update_sharing.py webapp.py
2014-03-08	schema: make syntax compatible with postgres	Helmut Grohne

2014-02-23	Merge branch updatesharing-eqclass	Helmut Grohne

2014-02-23	spell check comments	Helmut Grohne

2014-02-23	fix spelling mistake	Helmut Grohne
	Reported-By: Stefan Kaltenbrunner
2014-02-23	webapp: fix eqclass usage in package comparison	Helmut Grohne
	When comparing two packages, objects would be considered duplicates without considering whether the respective hash functions are comparable by checking their equivalence classes. The current set of hash functions does not expose this bug.
2014-02-21	update_sharing: weaken assumptions about db layout	Helmut Grohne
	Hash functions are partitioned into equivalence classes. We are generally only interested in sharing among hash functions with the same equivalence class, but the algorithm would compute any sharing. While the current layout never produces the same hashes for functions in different equivalence classes (for different output length), that may change in the future. Also allow hash functions that belong to no equivalence class at all (eqclass = NULL) as a means to add additional metadata to content without computing any sharing for it.
2014-02-19	blacklist content rather than hashes	Helmut Grohne
	Otherwise the gzip hash cannot tell the empty stream and the compressed empty stream apart.
2014-02-19	GzipDecompressor: don't treat checksum as garbage trailer	Helmut Grohne

2014-02-19	DecompressedHash should fail on trailing input	Helmut Grohne
	Otherwise all files smaller than 10 bytes are successfully hashed to the hash of the empty input when using the GzipDecompressor.
	Reported-By: Olly Betts
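
The failure mode described above can be illustrated with a minimal sketch. It does not reproduce the repository's DecompressedHash or GzipDecompressor classes; the function name gzip_content_sha512 is hypothetical, and the standard zlib module stands in for the project's own decompressor. The point is only that a stream which never reaches its end, or which is followed by extra bytes, should raise an error instead of yielding the hash of whatever was decompressed so far.

```python
import gzip
import hashlib
import zlib


def gzip_content_sha512(data):
    """Hash the decompressed content of a single-member gzip stream.

    Hypothetical helper: raises ValueError for invalid, truncated, or
    trailing input instead of silently hashing a partial result.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    try:
        content = decomp.decompress(data)
    except zlib.error as exc:
        raise ValueError("not a valid gzip stream") from exc
    if not decomp.eof:
        # Covers the empty input and inputs shorter than the gzip header.
        raise ValueError("truncated gzip stream")
    if decomp.unused_data:
        raise ValueError("trailing input after gzip stream")
    return hashlib.sha512(content).hexdigest()
```

Note that this sketch also rejects multi-member gzip files, since the second member shows up as unused data; whether that is desirable is a design choice the commit does not address.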
2013-10-03	work around python-debian's #670679	Helmut Grohne

2013-09-11	webapp: open cursors less often	Helmut Grohne
	On the main instance, opening a cursor equals initiating a connection. Unfortunately sqlite3.Connection.close does not close file descriptors. So just open fewer cursors to leak file descriptors less often.
2013-09-10	webapp: close database cursors	Helmut Grohne
	Leaking them can result in running out of available file descriptors.
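
A minimal sketch of closing cursors deterministically, using contextlib.closing from the standard library with sqlite3. The function count_packages and the one-column use of the package table are assumptions made for illustration, not code from webapp.py.

```python
import contextlib
import sqlite3


def count_packages(db):
    """Count rows in the package table and close the cursor afterwards.

    Hypothetical helper; the query is illustrative only.
    """
    # closing() calls cur.close() when the block exits, even on error.
    with contextlib.closing(db.cursor()) as cur:
        cur.execute("SELECT count(*) FROM package;")
        return cur.fetchone()[0]
```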
2013-09-04	webapp: serve static files from /static	Helmut Grohne

2013-09-02	add option -d --database for db path to all scripts	Helmut Grohne

2013-09-02	autoimport: avoid hard coded temporary directory	Helmut Grohne

2013-09-02	importpkg: move library-like parts to dedup.debpkg	Helmut Grohne

2013-08-19	importpkg: don't blacklist boring gzip_sha512 hashes	Helmut Grohne
	* In practice there are very few compressed files with trivial hashes.
	* Blacklisting these values results in false positives in the gzip issues.
2013-08-16	make debian version_compare available in sql	Helmut Grohne

2013-08-16	webapp templates: add an anchor for file issues	Helmut Grohne

2013-08-03	convert remaining code to sqlalchemy	Helmut Grohne
	No explicit "import sqlite3" left. It's still a bit rough around the edges, particularly since sqlalchemy's support for executemany is totally broken.
2013-08-02	model comparability as an equivalence relation	Helmut Grohne
	webapp has had a relation hash_functions that modeled "comparable functions". Images should not be compared to other files, since it makes no sense to store them as the RGBA stream that is being hashed. This comparability property resembles an equivalence relation. So the function table gains a column eqclass. Each class is represented by a number and functions are statically assigned to these classes. Now the filtering happens in SQL instead of Python.
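
How such filtering can happen in SQL may be sketched as follows. The column layout of the function table, the class numbers, and the helper names are assumptions for illustration; only the table name, the eqclass column, and the hash function names sha512, gzip_sha512, png_sha512, and gif_sha512 appear in this log. The entry metadata_only is hypothetical and stands for a function without an equivalence class.

```python
import sqlite3


def make_function_table(db):
    """Create an illustrative function table (assumed layout)."""
    db.execute(
        "CREATE TABLE function "
        "(id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, eqclass INTEGER);"
    )
    db.executemany(
        "INSERT INTO function (name, eqclass) VALUES (?, ?);",
        [
            ("sha512", 1),
            ("gzip_sha512", 1),
            ("png_sha512", 2),
            ("gif_sha512", 2),
            ("metadata_only", None),  # hypothetical: belongs to no class
        ],
    )


def comparable_pairs(db):
    """Return all pairs of function names that share an eqclass."""
    # "NULL = NULL" is not true in SQL, so functions without a class
    # never match anything, not even themselves.
    cur = db.execute(
        "SELECT a.name, b.name FROM function AS a "
        "JOIN function AS b ON a.eqclass = b.eqclass "
        "ORDER BY a.name, b.name;"
    )
    return cur.fetchall()
```

Because the join condition is a plain equality, reflexivity, symmetry, and transitivity come for free, which is what makes a single integer column sufficient to represent the relation.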
2013-08-02	Merge branch master into sqlalchemy	Helmut Grohne
	This makes the sqlalchemy branch schema-compatible with master again. The biggest change on master was the introduction of the function table. It caused most of the conflicts. Note that webapp had one conflict not detected by git: The selecting of issues in show_package needed sqlalchemy conversion.
	Conflicts: README update_sharing.py webapp.py
2013-08-01	support hashing gif images	Helmut Grohne
	* Rename "image_sha512" to "png_sha512".
	* dedup.image.ImageHash is now a base class for image hashes such as PNGHash and GIFHash.
	* Enable both hashes in importpkg.
	* Fix README.
	* Add new hash combinations to webapp.
	* Add "gif file not named .gif" to issues in update_sharing.
	Add redirect for "image_sha512" to webapp for backwards compatibility.
2013-07-30	templates/binary: space between package and compare	Helmut Grohne

2013-07-30	templates: wiki.d.o redirects to https now	Helmut Grohne

2013-07-30	fix update_sharing to work after functionid merge	Helmut Grohne

2013-07-29	importpkg.py: support uncompressed data.tar	Helmut Grohne

2013-07-27	also move the static directory into the dedup package	Helmut Grohne

2013-07-27	move templates to dedup package	Helmut Grohne
	They cluttered webapp.py and now vim can give proper highlighting for the templates.
2013-07-26	verify package hashes when importing via http	Helmut Grohne

2013-07-26	Merge branch functionid	Helmut Grohne
	Actual savings on the full data set are around 7%.
	Conflicts: README
2013-07-25	display "issues" with files in package view	Helmut Grohne
	Currently these are invalid .gz files and png files not named .png.
2013-07-25	README: foo.PNG is also a valid png name	Helmut Grohne

2013-07-24	sqlalchemy's fetchmany defaults to being fetchall	Helmut Grohne
	This voids the benefits of processing rows during row generation as has been observed on postgres.
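
One way to keep processing rows in batches regardless of a driver's default is to pass an explicit size to fetchmany. The generator below is a sketch against the generic DB-API cursor interface, demonstrated with sqlite3; the name fetchiter and the default batch size are assumptions, not necessarily what the repository uses.

```python
import sqlite3


def fetchiter(cursor, size=1000):
    """Yield rows from a DB-API cursor in batches of at most `size`.

    Hypothetical helper: the explicit size avoids relying on whatever
    default a particular driver or wrapper picks for fetchmany().
    """
    while True:
        rows = cursor.fetchmany(size)
        if not rows:
            break
        yield from rows
```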
2013-07-24	readyaml: cache the whole function table	Helmut Grohne
	This should reduce the query bandwidth to the rdbms.
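
Caching a small lookup table can be as simple as reading it into a dictionary once. The sketch below assumes a function table with name and id columns and uses sqlite3 for illustration; load_function_ids is a hypothetical name and not taken from readyaml.py.

```python
import contextlib
import sqlite3


def load_function_ids(db):
    """Read the whole function table into a name -> id dictionary.

    Hypothetical helper: later lookups hit the dictionary instead of
    issuing one query per hash function name.
    """
    with contextlib.closing(db.cursor()) as cur:
        cur.execute("SELECT name, id FROM function;")
        return dict(cur.fetchall())
```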
2013-07-23	webapp: make html for index valid	Helmut Grohne

2013-07-23	README: fix typo in query	Helmut Grohne

2013-07-23	webapp: remove unused function	Helmut Grohne

2013-07-23	adapt queries in README to new schema	Helmut Grohne

2013-07-23	schema: reference hash functions by integer key	Helmut Grohne
	This already worked quite well for package.id. On a test data set of 5% size this transformation reduces the database size by about 4%.
2013-07-22	schema: extend content_package_index	Helmut Grohne
	We can avoid a b-tree sort in the package comparison of the web app if the package index also provides a size.
2013-07-20	another missing sqlalchemy.text wrapper	Helmut Grohne

2013-07-20	use sqlalchemy.text	Helmut Grohne
	Without using this wrapper the sql statements are not munged by sqlalchemy. Specifically paramstyle is not translated. For sqlite3 this did not matter, because it allows the changed paramstyle, but for postgres it fails without sqlalchemy.text wrappers.
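
The reason sqlite3 tolerated the changed paramstyle can be shown with the standard library alone: the module accepts both question-mark and named placeholders. The sketch below does not use sqlalchemy; the function name and the two-column package table are assumptions for illustration.

```python
import sqlite3


def lookup_both_styles(db, name):
    """Run the same query with qmark and with named placeholders."""
    # qmark style: positional "?" with a sequence of values.
    qmark = db.execute(
        "SELECT id FROM package WHERE name = ?;", (name,)
    ).fetchone()
    # named style: ":name" with a mapping of values.
    named = db.execute(
        "SELECT id FROM package WHERE name = :name;", {"name": name}
    ).fetchone()
    return qmark, named
```

A driver that only understands one placeholder style would reject the other form, which is why the statements need translation before they reach such a driver.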
2013-07-17	Merge branch master into sqlalchemy	Helmut Grohne
	This basically pulls the packageid branch into sqlalchemy. The merge was complex, because many sql statements diverged. The merge brings us one step closer to supporting postgres, because an "INSERT OR REPLACE" was removed from readyaml.py in the packageid branch.
	Conflicts: update_sharing.py webapp.py
2013-07-15	Merge branch 'packageid'	Helmut Grohne

2013-07-12	importpkg: simplify state logic	Helmut Grohne

2013-07-12	importpkg: split process_package to process_control	Helmut Grohne

2013-07-10	use sqlalchemy paramstyle	Helmut Grohne
	By using the :name syntax inside sql statements, sqlalchemy will replace the contents with whatever paramstyle the underlying dbapi2 module needs. In case of psycopg2 the paramstyle is not qmark for instance.