Age  Commit message  Author
2013-07-30  fix update_sharing to work after functionid merge  Helmut Grohne
2013-07-29  importpkg.py: support uncompressed data.tar  Helmut Grohne
2013-07-27  also move the static directory into the dedup package  Helmut Grohne
2013-07-27  move templates to dedup package  Helmut Grohne
They cluttered webapp.py and now vim can give proper highlighting for the templates.
2013-07-26  verify package hashes when importing via http  Helmut Grohne
2013-07-26  Merge branch functionid  Helmut Grohne
Actual savings on the full data set are around 7%.
Conflicts: README
2013-07-25  display "issues" with files in package view  Helmut Grohne
Currently these are invalid .gz files and png files not named .png.
2013-07-25  README: foo.PNG is also a valid png name  Helmut Grohne
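The naming rule from the two commits above can be sketched as follows. This is an illustration only, not the project's code: the function name `is_misnamed_png` is hypothetical, and detecting PNG content through its 8-byte signature is an assumption about how such a check could be done.

```python
# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def is_misnamed_png(filename, header):
    """Return True if the content looks like PNG but the name does not.

    `header` holds the first bytes of the file. The extension check is
    case-insensitive, so foo.PNG counts as a valid png name.
    """
    looks_like_png = header.startswith(PNG_SIGNATURE)
    has_png_name = filename.lower().endswith(".png")
    return looks_like_png and not has_png_name
```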
2013-07-24  readyaml: cache the whole function table  Helmut Grohne
This should reduce the query bandwidth to the rdbms.
2013-07-23  webapp: make html for index valid  Helmut Grohne
2013-07-23  README: fix typo in query  Helmut Grohne
2013-07-23  webapp: remove unused function  Helmut Grohne
2013-07-23  adapt queries in README to new schema  Helmut Grohne
2013-07-23  schema: reference hash functions by integer key  Helmut Grohne
This already worked quite well for package.id. On a test data set of 5% size this transformation reduces the database size by about 4%.
2013-07-22  schema: extend content_package_index  Helmut Grohne
We can avoid a b-tree sort in the package comparison of the web app if the package index also provides a size.
2013-07-15  Merge branch 'packageid'  Helmut Grohne
2013-07-12  importpkg: simplify state logic  Helmut Grohne
2013-07-12  importpkg: split process_package to process_control  Helmut Grohne
2013-07-10  schema: reference package table by integer key  Helmut Grohne
One approach to improve performance is to reduce the database size. A package name takes up 15 bytes on average. A package number takes up two bytes. Multiply that difference by the number of references and the saving should be noticeable. A small test set shows a reduction of 10%.
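The estimate above can be worked through as a back-of-the-envelope calculation. Only the 15-byte and 2-byte sizes come from the commit message; the function name and the reference count are made up for illustration, and the result ignores index and row overhead.

```python
def estimated_savings(num_references, name_bytes=15, id_bytes=2):
    """Bytes saved by storing an integer id instead of a package name."""
    return num_references * (name_bytes - id_bytes)

# Hypothetical figure: one million rows that reference a package.
print(estimated_savings(1_000_000))
```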
2013-07-10  schema.sql: drop unused index  Helmut Grohne
sharing_package_index is a sub-index of sharing_insert_index and therefore unnecessary.
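Why a sub-index is redundant can be shown with a small experiment. This is a sketch assuming SQLite as the database; the column names `package1`, `package2` and `files` are hypothetical and not taken from the real schema. A query that filters only on the leading column can already be served by the wider index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; the real sharing table may look different.
conn.execute(
    "CREATE TABLE sharing (package1 INTEGER, package2 INTEGER, files INTEGER)")
# Only the wider index exists; no separate index on package1 alone.
conn.execute(
    "CREATE INDEX sharing_insert_index ON sharing (package1, package2)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT files FROM sharing WHERE package1 = ?",
    (1,)).fetchall()
# The last column of each plan row is the human-readable detail text.
detail = " ".join(row[-1] for row in plan)
print(detail)
```

The printed plan should name sharing_insert_index, which means an extra index on the leading column alone would only cost space and insert time.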
2013-07-03  README: explain update_sharing.py  Helmut Grohne
2013-06-23  Merge branch yamlimport  Helmut Grohne
+ Way faster on multiple cores.
+ More reliable, because http connections do not time out when the db blocks.
- Way slower on single core with contended io path. No clue why.
Still update_sharing.py makes up the bulk of processing time.
2013-06-19  webapp: fix hash example link after git upload  Helmut Grohne
The git binary changed and so did its hash. Choosing a more stable example now: The GPL-3.
2013-06-11  autoimport: don't fork for readyaml  Helmut Grohne
This appears to be a huge performance boost.
2013-06-11  autoimport: support processing individual files  Helmut Grohne
This gets back the original functionality of importpkg.py.
2013-06-10  split the import phase to a yaml stream  Helmut Grohne
importpkg.py now emits a yaml stream instead of updating the database. The actual updating now happens in readyaml.py. In this process autoimport.py was significantly reworked to import packages in parallel.
2013-05-27  dedup.image: img.convert can also raise that crazy stuff  Helmut Grohne
2013-05-09  webapp: declare html5 and utf-8  Helmut Grohne
2013-05-09  webapp: enrich comparison page with version info  Helmut Grohne
2013-05-08  fix attribution of logo  Helmut Grohne
I remembered the wrong name. The logo was made by Sune Vuorela.
2013-05-05  webapp: markup error in /source template  Helmut Grohne
2013-05-05  webapp: validator complained about <link> with sizes  Helmut Grohne
2013-05-05  webapp: reference favicon from base.html  Helmut Grohne
2013-05-05  added favicon.ico  Helmut Grohne
Authored: Cyril Brulebois
2013-05-02  webapp: use jinja's filesizeformat  Helmut Grohne
Except it doesn't work, so replace it with our version. At least we might be able to drop this code in a future update.
2013-05-02  webapp: reduce size of comparison output  Helmut Grohne
Only add rowspan when it carries a meaning.
2013-04-27  webapp: add a css class binary-package  Helmut Grohne
2013-04-25  webapp: total_size is None if num_files is 0  Helmut Grohne
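The behaviour behind this fix can be reproduced in a few lines. This is a sketch assuming SQLite and a hypothetical `content` table; the real query in webapp.py is not shown in this log. SUM over zero rows yields NULL, which Python sees as None, while COUNT yields 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout, for illustration only.
conn.execute("CREATE TABLE content (package TEXT, size INTEGER)")

# No rows match, so COUNT(*) is 0 and SUM(size) is NULL (None in Python).
num_files, total_size = conn.execute(
    "SELECT COUNT(*), SUM(size) FROM content WHERE package = ?",
    ("empty-package",)).fetchone()

# Guard before formatting or doing arithmetic with the value.
display_size = 0 if total_size is None else total_size
```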
2013-04-25  webapp: color filenames when hovering them  Helmut Grohne
2013-04-25  webapp: turn the <br> after filename into a style  Helmut Grohne
2013-04-25  move css to /style.css  Helmut Grohne
2013-04-25  webapp: make filenames css styleable  Helmut Grohne
2013-04-25  webapp: top-align fields in /compare pages  Helmut Grohne
Suggested by Paul Wise.
2013-04-25  fix markup in base.html  Helmut Grohne
2013-04-24  implement the /compare/pkg1/pkg2 page differently  Helmut Grohne
The original version had two major drawbacks:
1) The SQL query used would cause a btree sort, so the time waiting for the first output was rather long.
2) For packages with many equal files, the output would grow with O(n^2).
Thanks to Christine Grohne and Klaus Aehlig for their suggestions. The approach now groups files in package1 by their main hash value (sha512). It also now does by hand some work that SQL was designed to do. To speed up page generation, a new caching table was added that identifies which files have corresponding shared files.
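The grouping idea can be sketched in plain Python. This is not the code from webapp.py: the function names and the input shape (lists of filename and hash pairs) are assumptions made for illustration, and the caching table is left out.

```python
from collections import defaultdict

def group_by_hash(files):
    """Map each hash value to the list of filenames that carry it."""
    groups = defaultdict(list)
    for filename, hashvalue in files:
        groups[hashvalue].append(filename)
    return groups

def compare(files1, files2):
    """Return one entry per shared hash instead of one per matching pair.

    With n equal files on each side, pairwise output has n * n rows,
    while the grouped output has a single row listing 2 * n names.
    """
    groups1 = group_by_hash(files1)
    groups2 = group_by_hash(files2)
    return {h: (names, groups2[h])
            for h, names in groups1.items() if h in groups2}
```

For two packages that each ship the same file under two names, `compare` yields one entry with two names per side rather than four pairs.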
2013-04-14  webapp: added some useful notes  Helmut Grohne
2013-04-13  base.html: add link to wiki.debian.org  Helmut Grohne
2013-04-08  README: improve query after schemachange  Helmut Grohne
2013-03-26  webapp: fix problem from the previous merge  Helmut Grohne
2013-03-26  Merge branch schemachange  Helmut Grohne