~helmut/debian-dedup.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message	Author
2013-08-02	Merge branch master into sqlalchemy	Helmut Grohne
	This makes the sqlalchemy branch schema-compatible with master again. The biggest change on master was the introduction of the function table, which caused most of the conflicts. Note that webapp had one conflict not detected by git: the selection of issues in show_package needed sqlalchemy conversion. Conflicts: README update_sharing.py webapp.py
2013-08-01	support hashing gif images	Helmut Grohne
	* Rename "image_sha512" to "png_sha512".
	* dedup.image.ImageHash is now a base class for image hashes such as PNGHash and GIFHash.
	* Enable both hashes in importpkg.
	* Fix README.
	* Add new hash combinations to webapp.
	* Add "gif file not named .gif" to issues in update_sharing.
	Add redirect for "image_sha512" to webapp for backwards compatibility.
2013-07-30	templates/binary: space between package and compare	Helmut Grohne

2013-07-30	templates: wiki.d.o redirects to https now	Helmut Grohne

2013-07-30	fix update_sharing to work after functionid merge	Helmut Grohne

2013-07-29	importpkg.py: support uncompressed data.tar	Helmut Grohne

2013-07-27	also move the static directory into the dedup package	Helmut Grohne

2013-07-27	move templates to dedup package	Helmut Grohne
	They cluttered webapp.py and now vim can give proper highlighting for the templates.
2013-07-26	verify package hashes when importing via http	Helmut Grohne

2013-07-26	Merge branch functionid	Helmut Grohne
	Actual savings on the full data set are around 7%. Conflicts: README
2013-07-25	display "issues" with files in package view	Helmut Grohne
	Currently this is invalid .gz files and png files not named .png.
2013-07-25	README: foo.PNG is also a valid png name	Helmut Grohne

2013-07-24	sqlalchemy's fetchmany defaults to being fetchall	Helmut Grohne
	This voids the benefits of processing rows during row generation as has been observed on postgres.
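A minimal sketch of how rows can be processed in bounded batches by always passing an explicit size to fetchmany. It uses the standard library sqlite3 module rather than sqlalchemy, and the helper name `fetchiter`, the table `t`, and the batch size are assumptions made for illustration, not necessarily what this commit changed.

```python
import sqlite3


def fetchiter(cursor, size=1000):
    """Yield rows from a DB-API cursor, fetching at most `size` rows at a time."""
    while True:
        rows = cursor.fetchmany(size)  # explicit size, never rely on the default
        if not rows:
            break
        for row in rows:
            yield row


# Hypothetical demo table with 2500 integer rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2500)])

cur = conn.execute("SELECT n FROM t ORDER BY n")
count = 0
total = 0
for (n,) in fetchiter(cur, size=1000):
    count += 1
    total += n
```

With an explicit size, memory use stays bounded by the batch size even when the driver's default would return the whole result set at once.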
2013-07-24	readyaml: cache the whole function table	Helmut Grohne
	This should reduce the query bandwidth to the rdbms.
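The idea of caching the function table can be sketched as one query that loads the whole table into a dictionary, after which lookups no longer touch the database. The column names `id` and `name` and the hash names other than "png_sha512" are assumptions for illustration; the real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout of the function table.
conn.execute("CREATE TABLE function (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.executemany(
    "INSERT INTO function (name) VALUES (?)",
    [("sha512",), ("png_sha512",), ("gif_sha512",)],
)

# One round trip: every later name -> id lookup is served from memory.
function_ids = dict(conn.execute("SELECT name, id FROM function"))

png_id = function_ids["png_sha512"]
```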
2013-07-23	webapp: make html for index valid	Helmut Grohne

2013-07-23	README: fix typo in query	Helmut Grohne

2013-07-23	webapp: remove unused function	Helmut Grohne

2013-07-23	adapt queries in README to new schema	Helmut Grohne

2013-07-23	schema: reference hash functions by integer key	Helmut Grohne
	This already worked quite well for package.id. On a test data set of 5% size this transformation reduces the database size by about 4%.
2013-07-22	schema: extend content_package_index	Helmut Grohne
	We can avoid a b-tree sort in the package comparison of the web app if the package index also provides a size.
2013-07-20	another missing sqlalchemy.text wrapper	Helmut Grohne

2013-07-20	use sqlalchemy.text	Helmut Grohne
	Without using this wrapper the sql statements are not munged by sqlalchemy. Specifically paramstyle is not translated. For sqlite3 this did not matter, because it allows the changed paramstyle, but for postgres it fails without sqlalchemy.text wrappers.
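The reason the missing wrapper went unnoticed on sqlite3 can be shown with the standard library module alone: it advertises the qmark paramstyle but also accepts named parameters. The table layout and the package name below are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, only here to have something to query.
conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO package (name) VALUES (?)", ("git",))

# Declared paramstyle of the module.
declared = sqlite3.paramstyle

# Positional "?" placeholders (qmark style).
by_qmark = conn.execute(
    "SELECT id FROM package WHERE name = ?", ("git",)
).fetchone()

# Named ":name" placeholders are accepted as well.
by_name = conn.execute(
    "SELECT id FROM package WHERE name = :name", {"name": "git"}
).fetchone()
```

A driver that accepts only one paramstyle would reject one of the two queries, which is why the statement has to be translated before it reaches postgres.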
2013-07-17	Merge branch master into sqlalchemy	Helmut Grohne
	This basically pulls the packageid branch into sqlalchemy. The merge was complex, because many sql statements diverged. The merge brings us one step closer to supporting postgres, because an "INSERT OR REPLACE" was removed from readyaml.py in the packageid branch. Conflicts: update_sharing.py webapp.py
2013-07-15	Merge branch 'packageid'	Helmut Grohne

2013-07-12	importpkg: simplify state logic	Helmut Grohne

2013-07-12	importpkg: split process_package to process_control	Helmut Grohne

2013-07-10	use sqlalchemy paramstyle	Helmut Grohne
	By using the :name syntax inside sql statements, sqlalchemy will replace the placeholders with whatever paramstyle the underlying dbapi2 module needs. For instance, the paramstyle of psycopg2 is not qmark.
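To illustrate what such a translation amounts to, here is a deliberately naive sketch that rewrites :name placeholders into the pyformat style used by psycopg2. The function `to_pyformat` is hypothetical and is not how sqlalchemy implements it; among other things it ignores string literals and does not escape literal percent signs.

```python
import re

# Match ":name" but not the second colon of a postgres "::type" cast.
_NAMED = re.compile(r"(?<!:):([A-Za-z_]\w*)")


def to_pyformat(sql):
    """Rewrite :name placeholders as %(name)s (naive, illustration only)."""
    return _NAMED.sub(r"%(\1)s", sql)


query = "SELECT id FROM package WHERE name = :name AND version = :version"
translated = to_pyformat(query)
```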
2013-07-10	webapp: fix handling of total_size	Helmut Grohne
	The expression "total_size and 0" masks any positive integer to 0.
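The behaviour of the expression is easy to check in isolation. The replacement with "or" shown here is an assumption about the intended fix (defaulting a missing value to 0); the commit message itself only describes the bug.

```python
def masked(total_size):
    # Buggy form: any truthy value short-circuits to the right operand, 0.
    return total_size and 0


def defaulted(total_size):
    # Assumed intent: keep the value, fall back to 0 only when it is None or 0.
    return total_size or 0


examples = [(masked(v), defaulted(v)) for v in (None, 0, 4096)]
```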
2013-07-10	schema: reference package table by integer key	Helmut Grohne
	One approach to improving performance is to reduce the database size. A package name takes up 15 bytes on average. A package number takes up two bytes. Multiply that difference by the number of references and it should be noticeable. A small test set shows a reduction of 10%.
2013-07-10	schema.sql: drop unused index	Helmut Grohne
	sharing_package_index is a sub-index of sharing_insert_index and therefore unnecessary.
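Why a sub-index is redundant can be checked with sqlite's query planner: a lookup on the leading column of a composite index is already served by that index. The column layout below is an assumption for illustration; only the index name comes from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns for the sharing table.
conn.execute(
    "CREATE TABLE sharing (package1 TEXT, package2 TEXT, files INTEGER, size INTEGER)"
)
# Composite index; its leading column covers lookups by package1 alone.
conn.execute("CREATE INDEX sharing_insert_index ON sharing (package1, package2)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT size FROM sharing WHERE package1 = ?", ("git",)
).fetchall()
detail = plan[0][-1]  # human-readable description of the chosen strategy
```

The plan text names sharing_insert_index even though the query constrains only package1, so a separate index on package1 alone would add write cost without helping reads.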
2013-07-03	README: explain update_sharing.py	Helmut Grohne

2013-06-23	update_sharing: postgres does not support "INSERT OR IGNORE"	Helmut Grohne

2013-06-23	dedup.utils: add enable_sqlite_foreign_keys helper	Helmut Grohne
	Makes usage of sqlalchemy easier, because I can invoke it once and it works for all connections.
2013-06-23	Merge master into sqlalchemy	Helmut Grohne
	This is necessary to avoid severe merge conflicts when converting importpkg.py to sqlalchemy. The actual sql invocation has moved to a different file in master. Conflicts: README (diverged set of dependencies)
2013-06-23	port update_sharing.py to sqlalchemy	Helmut Grohne

2013-06-23	Merge branch yamlimport	Helmut Grohne
	+ Way faster on multiple cores.
	+ More reliable, because http connections do not time out when the db blocks.
	- Way slower on a single core with a contended io path. No clue why.
	Still, update_sharing.py makes up the bulk of processing time.
2013-06-19	webapp: fix hash example link after git upload	Helmut Grohne
	The git binary changed and so did its hash. Choosing a more stable example now: The GPL-3.
2013-06-13	webapp: use sqlalchemy	Helmut Grohne
	* Arguably the interface is nicer.
	* Actually closes connections. => wal files get deleted.
	* Permits switching from sqlite to anything.
2013-06-11	autoimport: don't fork for readyaml	Helmut Grohne
	This appears to be a huge performance boost.
2013-06-11	autoimport: support processing individual files	Helmut Grohne
	This gets back the original functionality of importpkg.py.
2013-06-10	split the import phase to a yaml stream	Helmut Grohne
	importpkg.py now emits a yaml stream instead of updating the database. The actual updating now happens in readyaml.py. In this process autoimport.py was significantly reworked to import packages in parallel.
2013-05-27	dedup.image: img.convert can also raise that crazy stuff	Helmut Grohne

2013-05-09	webapp: declare html5 and utf-8	Helmut Grohne

2013-05-09	webapp: enrich comparison page with version info	Helmut Grohne

2013-05-08	fix attribution of logo	Helmut Grohne
	I remembered the wrong name. The logo was made by Sune Vuorela.
2013-05-05	webapp: markup error in /source template	Helmut Grohne

2013-05-05	webapp: validator complained about <link> with sizes	Helmut Grohne

2013-05-05	webapp: reference favicon from base.html	Helmut Grohne

2013-05-05	added favicon.ico	Helmut Grohne
	Authored: Cyril Brulebois
2013-05-02	webapp: use jinja's filesizeformat	Helmut Grohne
	Except it doesn't work, so replace it with our version. At least we might be able to drop this code in a future update.