path: root/schema.sql
Age  Commit message  Author
2014-07-22  Merge branch master into multiarch  Helmut Grohne
Resolve accumulated conflicts. In particular, webapp.py gained a few non-trivial ones, such as changes in InternalRedirect or usage of contextlib.closing. Conflicts: schema.sql, webapp.py
2014-06-14  improve schema documentation  Helmut Grohne
wording, more NOT NULLs, some more explanations
2014-06-14  add documentation to schema.sql  Helmut Grohne
Thanks to Peter Palfrader for explaining what information is needed and reviewing the documentation.
2014-03-08  schema: make syntax compatible with postgres  Helmut Grohne
2013-09-07  record multi-arch header in package table  Helmut Grohne
2013-09-07  permit multiple architectures per package  Helmut Grohne
While the importer can easily cope with this change, the web presentation still needs fixing. It works somewhat now.
2013-08-02  model comparability as an equivalence relation  Helmut Grohne
The webapp had a relation hash_functions that modeled "comparable functions". Images should not be compared to other files, since it makes no sense to store them as the RGBA stream that is being hashed. This comparability property resembles an equivalence relation, so the function table gains a column eqclass. Each class is represented by a number, and functions are statically assigned to these classes. The filtering now happens in SQL instead of Python.
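A minimal sketch of the eqclass idea, using Python's sqlite3 module. Only the table name `function` and the column `eqclass` come from the commit message; the remaining columns, the hash function names, and the class numbers are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE function (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    eqclass INTEGER NOT NULL  -- functions in the same class are comparable
);
-- Assumed assignment: file hashes in class 1, image hashes in class 2.
INSERT INTO function (id, name, eqclass) VALUES
    (1, 'sha512', 1),
    (2, 'gzip_sha512', 1),
    (3, 'png_sha512', 2),
    (4, 'gif_sha512', 2);
""")

def comparable_pairs(conn):
    """Return all ordered pairs of function names that share an eqclass."""
    rows = conn.execute("""
        SELECT a.name, b.name
        FROM function AS a JOIN function AS b ON a.eqclass = b.eqclass
        ORDER BY a.name, b.name
    """)
    return rows.fetchall()

pairs = comparable_pairs(conn)
```

Because the join condition is plain equality on `eqclass`, the result is reflexive, symmetric, and transitive by construction, which is what makes a single integer column enough to model the relation.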
2013-08-01  support hashing gif images  Helmut Grohne
* Rename "image_sha512" to "png_sha512".
* dedup.image.ImageHash is now a base class for image hashes such as PNGHash and GIFHash.
* Enable both hashes in importpkg.
* Fix README.
* Add new hash combinations to webapp.
* Add "gif file not named *.gif" to issues in update_sharing.
* Add redirect for "image_sha512" to webapp for backwards compatibility.
2013-07-26  Merge branch functionid  Helmut Grohne
Actual savings on the full data set are around 7%. Conflicts: README
2013-07-25  display "issues" with files in package view  Helmut Grohne
Currently the issues reported are invalid .gz files and PNG files not named *.png.
2013-07-23  schema: reference hash functions by integer key  Helmut Grohne
This already worked quite well for package.id. On a test data set of 5% size this transformation reduces the database size by about 4%.
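The change can be pictured roughly as follows. This is a sketch under assumptions, not the project's actual schema: the column names `cid` and `function_id` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE function (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE hash (
    cid INTEGER NOT NULL,  -- hypothetical reference to a content row
    function_id INTEGER NOT NULL REFERENCES function(id),
    hash TEXT NOT NULL
);
INSERT INTO function (id, name) VALUES (1, 'sha512'), (2, 'gzip_sha512');
-- Each hash row stores a small integer instead of repeating the name.
INSERT INTO hash (cid, function_id, hash) VALUES (10, 1, 'abc'), (10, 2, 'def');
""")

def hashes_for(conn, cid):
    """Resolve the integer function keys back to names for one content row."""
    rows = conn.execute("""
        SELECT function.name, hash.hash
        FROM hash JOIN function ON function.id = hash.function_id
        WHERE hash.cid = ?
        ORDER BY function.name
    """, (cid,))
    return dict(rows.fetchall())

result = hashes_for(conn, 10)
```

The cost of the smaller rows is one extra join whenever the function name is needed for display.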
2013-07-22  schema: extend content_package_index  Helmut Grohne
We can avoid a b-tree sort in the package comparison of the web app if the package index also provides the size.
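A sketch of how an index that also carries the size can serve both the lookup and the ordering. The commit only names `content_package_index`; the table layout and the columns `pid`, `filename`, and `size` are assumptions, and the planner's behaviour may differ between database engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    pid INTEGER NOT NULL,       -- assumed package key column
    filename TEXT NOT NULL,
    size INTEGER NOT NULL
);
-- Index on the package key extended by the size column.
CREATE INDEX content_package_index ON content (pid, size);
INSERT INTO content (pid, filename, size) VALUES
    (1, 'a', 300), (1, 'b', 100), (2, 'c', 50), (1, 'd', 200);
""")

query = "SELECT filename, size FROM content WHERE pid = ? ORDER BY size DESC"
files = conn.execute(query, (1,)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1,)))
```

Inspecting `plan` shows which index the query uses and whether a separate sorting step remains.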
2013-07-10  schema: reference package table by integer key  Helmut Grohne
One approach to improving performance is to reduce the database size. A package name takes up 15 bytes on average, while a package number takes up two bytes. Multiply that difference by the number of references and the saving should be noticeable. A small test set shows a reduction of 10%.
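The arithmetic behind this estimate can be written down directly. The byte counts come from the commit message; the reference count is a hypothetical figure, and the sketch ignores per-row and index overhead.

```python
def estimated_savings(references, name_bytes=15, key_bytes=2):
    """Bytes saved by storing an integer key instead of a package name."""
    return references * (name_bytes - key_bytes)

# Hypothetical number of rows that reference a package.
saved = estimated_savings(1_000_000)
```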
2013-07-10  schema.sql: drop unused index  Helmut Grohne
sharing_package_index is a sub-index of sharing_insert_index and therefore unnecessary.
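A sub-index in this sense is an index whose column list is a leading prefix of another index on the same table. The sketch below checks that property with SQLite's `PRAGMA index_info`; the index names are from the commit, but the column lists are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sharing (package1 TEXT, package2 TEXT, func1 TEXT, func2 TEXT,
                      files INTEGER, size INTEGER);
-- Assumed column lists; only the index names appear in the commit message.
CREATE INDEX sharing_insert_index ON sharing (package1, package2, func1, func2);
CREATE INDEX sharing_package_index ON sharing (package1, package2);
""")

def index_columns(conn, name):
    """List an index's column names in index order.

    PRAGMA index_info yields (seqno, cid, column_name) rows. The index name
    is interpolated directly, so only pass trusted names.
    """
    return [row[2] for row in conn.execute("PRAGMA index_info(%s)" % name)]

def is_prefix_index(conn, small, large):
    """True if the columns of `small` are a leading prefix of `large`."""
    small_cols = index_columns(conn, small)
    large_cols = index_columns(conn, large)
    return large_cols[:len(small_cols)] == small_cols

redundant = is_prefix_index(conn, "sharing_package_index", "sharing_insert_index")
```

Any lookup that could use the shorter index can use the longer one instead, so dropping the shorter one saves space and write time.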
2013-04-24  implement the /compare/pkg1/pkg2 page differently  Helmut Grohne
The original version had two major drawbacks:
1) The SQL query used would cause a b-tree sort, so the time waiting for the first output was rather long.
2) For packages with many equal files, the output would grow with O(n^2).
Thanks to Christine Grohne and Klaus Aehlig for their suggestions. The approach now groups files in package1 by their main hash value (sha512). It also now does manually some work that SQL was designed to do. To speed up page generation, a new caching table was added that identifies which files have corresponding shared files.
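The grouping idea can be illustrated with a simplified sketch. It is not the webapp's code: the function names and the in-memory input format are hypothetical, and it groups both packages by hash for brevity.

```python
from collections import defaultdict

def group_by_hash(files):
    """Group (filename, sha512) pairs by their hash value."""
    groups = defaultdict(list)
    for filename, digest in files:
        groups[digest].append(filename)
    return groups

def compare(files1, files2):
    """Yield (hash, names in package1, names in package2) for shared hashes.

    Each shared hash produces one row, instead of one row per pair of
    equal files, which avoids the quadratic growth of the output.
    """
    groups1 = group_by_hash(files1)
    groups2 = group_by_hash(files2)
    for digest in sorted(groups1):
        if digest in groups2:
            yield digest, groups1[digest], groups2[digest]

pkg1 = [("a.txt", "h1"), ("b.txt", "h1"), ("c.txt", "h2")]
pkg2 = [("x.txt", "h1"), ("y.txt", "h1"), ("z.txt", "h3")]
shared = list(compare(pkg1, pkg2))
```

With two equal files on each side, a pairwise listing would have four rows; the grouped form has one.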
2013-03-09  split content table to a hash table  Helmut Grohne
In the old content table (package, filename, size) would be the same for multiple hash functions. Now the schema represents that each file has precisely one size, but multiple hashes.
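A sketch of the resulting two-table layout, assuming a surrogate key `id` on the content table and a `cid` column pointing back to it. Those two names and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    package TEXT NOT NULL,
    filename TEXT NOT NULL,
    size INTEGER NOT NULL     -- stored once per file
);
CREATE TABLE hash (
    cid INTEGER NOT NULL REFERENCES content(id),
    function TEXT NOT NULL,
    hash TEXT NOT NULL        -- one row per (file, hash function)
);
""")

cur = conn.execute(
    "INSERT INTO content (package, filename, size) VALUES (?, ?, ?)",
    ("hello", "usr/bin/hello", 1234))
cid = cur.lastrowid
conn.executemany(
    "INSERT INTO hash (cid, function, hash) VALUES (?, ?, ?)",
    [(cid, "sha512", "aaa"), (cid, "gzip_sha512", "bbb")])

content_rows = conn.execute("SELECT count(*) FROM content").fetchone()[0]
hash_rows = conn.execute(
    "SELECT count(*) FROM hash WHERE cid = ?", (cid,)).fetchone()[0]
```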
2013-03-07  use "ON DELETE CASCADE" clauses  Helmut Grohne
2013-03-07  schema.sql: remove unsatisfiable foreign key  Helmut Grohne
In the dependency table we will insert dependencies on packages which are not tracked. This happens during initial import and for virtual packages. Therefore the "required" column cannot be a foreign key.
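The following sketch shows both behaviours from these commits in SQLite: a cascading foreign key on the depending package and a plain, unconstrained "required" column. The table layouts and package names are assumptions; only the "required" column and the ON DELETE CASCADE clause are taken from the log.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE package (package TEXT PRIMARY KEY);
CREATE TABLE dependency (
    package TEXT NOT NULL REFERENCES package(package) ON DELETE CASCADE,
    required TEXT NOT NULL   -- deliberately not a foreign key
);
INSERT INTO package VALUES ('hello');
-- 'mail-transport-agent' is not a tracked package, yet the row is accepted.
INSERT INTO dependency VALUES ('hello', 'mail-transport-agent');
""")
accepted = conn.execute("SELECT count(*) FROM dependency").fetchone()[0]

# The depending package itself must exist, so this insert is rejected.
try:
    conn.execute("INSERT INTO dependency VALUES ('untracked', 'libc6')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting the package removes its dependency rows via ON DELETE CASCADE.
conn.execute("DELETE FROM package WHERE package = 'hello'")
remaining = conn.execute("SELECT count(*) FROM dependency").fetchone()[0]
```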
2013-03-07  schema.sql: annotate foreign keys of sharing  Helmut Grohne
2013-03-07  integrate the source table into the package table  Helmut Grohne
2013-03-04  importpkg: record the source package relationship  Helmut Grohne
2013-03-02  add sharing table  Helmut Grohne
The sharing table is a cache for the /binary web pages. It essentially contains the numbers presented. This caching table is not automatically populated. It needs to be reconstructed after every (group of) package imports.
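Reconstructing such a cache could look roughly like this. The sketch uses a heavily simplified schema, so every column name and the aggregation query are assumptions rather than the project's actual rebuild step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content (package TEXT, filename TEXT, size INTEGER, hash TEXT);
CREATE TABLE sharing (package1 TEXT, package2 TEXT, files INTEGER, size INTEGER);
INSERT INTO content VALUES
    ('a', 'f1', 10, 'h1'), ('a', 'f2', 20, 'h2'),
    ('b', 'g1', 10, 'h1'), ('b', 'g2', 20, 'h2'), ('c', 'k1', 5, 'h9');
""")

def rebuild_sharing(conn):
    """Throw away the cache and recompute it from the content table."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM sharing")
        conn.execute("""
            INSERT INTO sharing (package1, package2, files, size)
            SELECT a.package, b.package, count(*), sum(a.size)
            FROM content AS a JOIN content AS b ON a.hash = b.hash
            WHERE a.package != b.package
            GROUP BY a.package, b.package
        """)

rebuild_sharing(conn)
rows = conn.execute("SELECT * FROM sharing ORDER BY package1").fetchall()
```

Since the function deletes before inserting, running it again after a batch of imports replaces the cached numbers instead of accumulating duplicates.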
2013-03-02  move sql schema to a separate file  Helmut Grohne