~helmut/debian-dedup.git

Age	Commit message	Author
2013-07-20	another missing sqlalchemy.text wrapper	Helmut Grohne

2013-07-20	use sqlalchemy.text	Helmut Grohne
	Without this wrapper, sqlalchemy does not process the SQL statements; specifically, the paramstyle is not translated. This did not matter for sqlite3, because it also accepts the changed paramstyle, but postgres fails without the sqlalchemy.text wrappers.
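	The sqlite3 behaviour described above can be checked with the standard library alone. This is a minimal sketch, not code from the repository: the `package` table is a hypothetical stand-in, and sqlalchemy is deliberately left out so that only the DB-API module's own handling is visible.

```python
import sqlite3

# Every DB-API 2.0 module advertises its default placeholder style.
default_style = sqlite3.paramstyle  # "qmark" for sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (name TEXT)")  # hypothetical table

# sqlite3 accepts its default qmark style ...
conn.execute("INSERT INTO package (name) VALUES (?)", ("hello",))
# ... and also the named style, which is why untranslated :name
# placeholders went unnoticed with this backend.
conn.execute("INSERT INTO package (name) VALUES (:name)", {"name": "world"})

names = [row[0] for row in
         conn.execute("SELECT name FROM package ORDER BY name")]
```

	A driver that only understands its own style (psycopg2 declares `pyformat`) would reject the second statement unless something translates the placeholders first.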
2013-07-17	Merge branch master into sqlalchemy	Helmut Grohne
	This basically pulls the packageid branch into sqlalchemy. The merge was complex, because many SQL statements had diverged. The merge brings us one step closer to supporting postgres, because an "INSERT OR REPLACE" was removed from readyaml.py in the packageid branch. Conflicts: update_sharing.py, webapp.py
2013-07-10	use sqlalchemy paramstyle	Helmut Grohne
	By using the :name syntax inside SQL statements, sqlalchemy will replace the placeholders with whatever paramstyle the underlying dbapi2 module needs. For instance, psycopg2 does not use the qmark paramstyle.
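	To illustrate what translating a paramstyle means, here is a deliberately simplified sketch. It is not sqlalchemy's implementation: the `translate` function is hypothetical, and it ignores string literals, comments and `%` escaping, all of which a real translator has to handle.

```python
import re

# A :name placeholder; the lookbehind skips postgres-style "::" casts.
_NAMED = re.compile(r"(?<!:):([A-Za-z_][A-Za-z0-9_]*)")


def translate(statement, params, paramstyle):
    """Rewrite :name placeholders for the given DB-API paramstyle (toy version)."""
    if paramstyle == "named":
        return statement, dict(params)
    if paramstyle == "pyformat":
        # :name becomes %(name)s; parameters stay a mapping.
        return _NAMED.sub(r"%(\1)s", statement), dict(params)
    if paramstyle == "qmark":
        # :name becomes ?; parameters become a tuple in placeholder order.
        ordered = []

        def replace(match):
            ordered.append(params[match.group(1)])
            return "?"

        return _NAMED.sub(replace, statement), tuple(ordered)
    raise ValueError("unsupported paramstyle: %r" % (paramstyle,))


query = "SELECT filename FROM content WHERE package = :package AND size > :size"
args = {"package": "coreutils", "size": 100}
as_qmark = translate(query, args, "qmark")
as_pyformat = translate(query, args, "pyformat")
```

	The point of writing statements once in the :name form is that the same source text can then be served to either kind of driver.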
2013-07-10	schema: reference package table by integer key	Helmut Grohne
	One approach to improving performance is to reduce the database size. A package name takes up 15 bytes on average, whereas a package number takes up two bytes. Multiplied by the number of references, that difference should be noticeable. A small test set shows a reduction of 10%.
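	A minimal sketch of the idea, using the standard sqlite3 module. The table and column names (`package.id`, `content.pid`) are assumptions for illustration and are not taken from the repository's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE package (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    pid INTEGER NOT NULL REFERENCES package(id),  -- integer key, not the name
    filename TEXT NOT NULL
);
""")

# The name is stored once; every reference only carries the small integer.
pid = conn.execute("INSERT INTO package (name) VALUES (:name)",
                   {"name": "coreutils"}).lastrowid
conn.executemany(
    "INSERT INTO content (pid, filename) VALUES (:pid, :filename)",
    [{"pid": pid, "filename": f} for f in ("bin/ls", "bin/cp")])

# The name is recovered with a join when it is needed for display.
rows = conn.execute("""
    SELECT package.name, content.filename
    FROM content JOIN package ON package.id = content.pid
    ORDER BY content.filename
""").fetchall()

# Rough saving per reference, using the figures from the commit message.
saved_bytes_per_reference = 15 - 2
```

	With the commit's own figures this saves about 13 bytes per referencing row, at the cost of a join wherever the name has to be shown.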
2013-06-23	update_sharing: postgres does not support "INSERT OR IGNORE"	Helmut Grohne

2013-06-23	dedup.utils: add enable_sqlite_foreign_keys helper	Helmut Grohne
	Makes usage of sqlalchemy easier, because I can invoke it once and it works for all connections.
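	SQLite only enforces foreign keys on connections that have asked for it, which is why a helper that applies the setting to every connection is convenient. The sketch below uses the standard sqlite3 module rather than sqlalchemy; `connect_with_foreign_keys` and the two tables are hypothetical and do not reproduce the helper in dedup.utils.

```python
import sqlite3


def connect_with_foreign_keys(path):
    """Open a connection and switch on foreign key enforcement (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # The pragma is per connection, so it has to be issued on each new one.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


conn = connect_with_foreign_keys(":memory:")
conn.executescript("""
CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE content (
    pid INTEGER NOT NULL REFERENCES package(id),
    filename TEXT
);
""")

try:
    # No package with id 42 exists, so this row must be refused.
    conn.execute("INSERT INTO content (pid, filename) VALUES (42, 'bin/ls')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

	Routing every connection through one such function gives the "invoke it once" behaviour described in the commit message.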
2013-06-23	port update_sharing.py to sqlalchemy	Helmut Grohne

2013-04-24	implement the /compare/pkg1/pkg2 page differently	Helmut Grohne
	The original version had two major drawbacks: 1) The SQL query used would cause a btree sort, so the time waiting for the first output was rather long. 2) For packages with many equal files, the output would grow as O(n^2). Thanks to Christine Grohne and Klaus Aehlig for their suggestions. The approach now groups files in package1 by their main hash value (sha512). It now also does manually some of the work that SQL was designed to do. To speed up page generation, a new caching table was added that identifies which files have corresponding shared files.
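	The grouping idea can be sketched in a few lines. This is an assumption about the general technique, not the code in webapp.py: `shared_groups` is a hypothetical function, and the short strings stand in for sha512 digests.

```python
from collections import defaultdict


def shared_groups(files1, files2):
    """Group shared files by hash.

    Each argument is an iterable of (filename, digest) pairs for one package.
    Returns one (names_in_package1, names_in_package2) entry per shared digest.
    """
    by_hash1 = defaultdict(list)
    for filename, digest in files1:
        by_hash1[digest].append(filename)
    by_hash2 = defaultdict(list)
    for filename, digest in files2:
        by_hash2[digest].append(filename)

    groups = []
    for digest, names1 in by_hash1.items():
        names2 = by_hash2.get(digest)
        if names2:
            groups.append((sorted(names1), sorted(names2)))
    return sorted(groups)


# Three identical files on each side, plus one unshared file.
files1 = [("a/1", "h"), ("a/2", "h"), ("a/3", "h"), ("a/unique", "x")]
files2 = [("b/1", "h"), ("b/2", "h"), ("b/3", "h")]

groups = shared_groups(files1, files2)
# A pairwise listing would need one row per combination instead.
pairwise_rows = sum(len(n1) * len(n2) for n1, n2 in groups)
```

	Here the grouped output has a single entry where a pairwise listing would have nine rows, which is the O(n^2) growth the commit message refers to.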
2013-03-09	split content table to a hash table	Helmut Grohne
	In the old content table, (package, filename, size) was repeated for each hash function. Now the schema represents that each file has precisely one size, but multiple hashes.
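	One way such a split could look, as a sketch under assumed names. The columns `cid`, `algorithm` and `digest`, and the placeholder digest strings, are illustrative and not taken from the repository.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    package TEXT NOT NULL,
    filename TEXT NOT NULL,
    size INTEGER NOT NULL          -- stored exactly once per file
);
CREATE TABLE hash (
    cid INTEGER NOT NULL REFERENCES content(id),
    algorithm TEXT NOT NULL,       -- which hash function produced the digest
    digest TEXT NOT NULL
);
""")

cid = conn.execute(
    "INSERT INTO content (package, filename, size) "
    "VALUES (:package, :filename, :size)",
    {"package": "coreutils", "filename": "bin/ls", "size": 1024}).lastrowid

# Several hashes can now point at the same file row.
conn.executemany(
    "INSERT INTO hash (cid, algorithm, digest) "
    "VALUES (:cid, :algorithm, :digest)",
    [{"cid": cid, "algorithm": "sha512", "digest": "placeholder-1"},
     {"cid": cid, "algorithm": "md5", "digest": "placeholder-2"}])

summary = conn.execute("""
    SELECT content.filename, content.size, COUNT(*)
    FROM content JOIN hash ON hash.cid = content.id
    GROUP BY content.id
""").fetchall()
```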
2013-03-07	enable enforcing foreign keys	Helmut Grohne

2013-03-02	update_sharing: wrong database name	Helmut Grohne

2013-03-02	add sharing table	Helmut Grohne
	The sharing table is a cache for the /binary web pages. It essentially contains the numbers presented. This caching table is not automatically populated. It needs to be reconstructed after every package import (or group of imports).
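	The consequence of a cache that is not automatically populated can be shown with a small sketch. The schema, the `rebuild_sharing` function and the aggregation query are all assumptions made for illustration; update_sharing.py in the repository may compute its numbers differently.

```python
import sqlite3


def rebuild_sharing(conn):
    """Recompute the cache from scratch (hypothetical schema and query)."""
    conn.execute("DELETE FROM sharing")
    conn.execute("""
        INSERT INTO sharing (package1, package2, files, size)
        SELECT a.package, b.package, COUNT(*), SUM(a.size)
        FROM content AS a JOIN content AS b ON a.digest = b.digest
        WHERE a.package <> b.package
        GROUP BY a.package, b.package
    """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (package TEXT, filename TEXT, size INTEGER, digest TEXT);
CREATE TABLE sharing (package1 TEXT, package2 TEXT, files INTEGER, size INTEGER);
""")
conn.executemany("INSERT INTO content VALUES (?, ?, ?, ?)", [
    ("a", "usr/share/doc/a/COPYING", 100, "h1"),
    ("b", "usr/share/doc/b/COPYING", 100, "h1"),
])
rebuild_sharing(conn)
before = conn.execute(
    "SELECT * FROM sharing ORDER BY package1, package2").fetchall()

# Importing another package does not touch the cache ...
conn.execute("INSERT INTO content VALUES (?, ?, ?, ?)",
             ("c", "usr/share/doc/c/COPYING", 100, "h1"))
stale_rows = conn.execute("SELECT COUNT(*) FROM sharing").fetchone()[0]

# ... so it has to be rebuilt explicitly afterwards.
rebuild_sharing(conn)
fresh_rows = conn.execute("SELECT COUNT(*) FROM sharing").fetchone()[0]
```

	Rebuilding once after a batch of imports, rather than after each one, keeps the cost of the self-join down.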