path: root/webapp.py
Age  Commit message  Author
2014-03-08  Merge branch 'master' into sqlalchemy  Helmut Grohne
In the meantime, the master branch evolved quite a bit and the schema changed again (eqclass added to function table). The main reason for the merge is to resolve the large number of conflicts once, so that development of the sqlalchemy branch can continue and still benefit from changes in the master branch, such as schema compatibility and the adapted indent level in the web app. The latter comes from the use of contextlib.closing, which resembles sqlalchemy's "with db.begin() as conn:". Conflicts: autoimport.py, dedup/utils.py, readyaml.py, update_sharing.py, webapp.py
2014-02-23  spell check comments  Helmut Grohne
2014-02-23  webapp: fix eqclass usage in package comparison  Helmut Grohne
When comparing two packages, objects were considered duplicates without checking, via their equivalence classes, whether the respective hash functions are comparable. The current set of hash functions does not expose this bug.
2013-09-11  webapp: open cursors less often  Helmut Grohne
On the main instance, opening a cursor equals initiating a connection. Unfortunately, sqlite3.Connection.close does not close file descriptors. So just open fewer cursors to leak file descriptors less often.
2013-09-10  webapp: close database cursors  Helmut Grohne
Leaking them can result in running out of available file descriptors.
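The cursor handling described in the two entries above can be illustrated with contextlib.closing, which a later merge message mentions the web app uses. This is a minimal sketch, not the code from webapp.py: the function count_packages and the columns of the package table are assumptions made for illustration.

```python
import sqlite3
from contextlib import closing


def count_packages(conn):
    """Count rows in a hypothetical package table, always closing the cursor."""
    # closing() calls cur.close() when the block exits, even if the query raises.
    with closing(conn.cursor()) as cur:
        cur.execute("SELECT COUNT(*) FROM package")
        return cur.fetchone()[0]


# Illustrative in-memory database; table and column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO package (name) VALUES (?)", [("git",), ("vim",)])
print(count_packages(conn))
```

Running several queries on one cursor inside a single with block would correspond to the "open cursors less often" change.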
2013-09-04  webapp: serve static files from /static  Helmut Grohne
2013-09-02  add option -d --database for db path to all scripts  Helmut Grohne
2013-08-02  model comparability as an equivalence relation  Helmut Grohne
webapp has had a relation hash_functions that modeled "comparable functions". Images should not be compared to other files, since it makes no sense to store them as the RGBA stream that is being hashed. This comparability property resembles an equivalence relation, so the function table gains a column eqclass. Each class is represented by a number, and functions are statically assigned to these classes. Now the filtering happens in SQL instead of Python.
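A sketch of how an eqclass column lets SQL decide which hash functions are comparable. Only the table name function, the column eqclass and the name png_sha512 appear in this log; the other function names, the column layout and the class numbers are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE function (id INTEGER PRIMARY KEY, name TEXT, eqclass INTEGER);
-- Assumed assignment: class 1 for plain file hashes, class 2 for image hashes.
INSERT INTO function (id, name, eqclass) VALUES (1, 'sha512', 1);
INSERT INTO function (id, name, eqclass) VALUES (2, 'gzip_sha512', 1);
INSERT INTO function (id, name, eqclass) VALUES (3, 'png_sha512', 2);
INSERT INTO function (id, name, eqclass) VALUES (4, 'gif_sha512', 2);
""")


def comparable_pairs(conn):
    """Return all (name, name) pairs of functions in the same equivalence class."""
    query = """
        SELECT a.name, b.name
        FROM function AS a
        JOIN function AS b ON a.eqclass = b.eqclass
        ORDER BY a.id, b.id
    """
    return conn.execute(query).fetchall()


pairs = comparable_pairs(conn)
```

Because the join condition is equality on eqclass, the result is reflexive, symmetric and transitive by construction, which is what makes it an equivalence relation.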
2013-08-02  Merge branch master into sqlalchemy  Helmut Grohne
This makes the sqlalchemy branch schema-compatible with master again. The biggest change on master was the introduction of the function table. It caused most of the conflicts. Note that webapp had one conflict not detected by git: the selection of issues in show_package needed sqlalchemy conversion. Conflicts: README, update_sharing.py, webapp.py
2013-08-01  support hashing gif images  Helmut Grohne
* Rename "image_sha512" to "png_sha512".
* dedup.image.ImageHash is now a base class for image hashes such as PNGHash and GIFHash.
* Enable both hashes in importpkg.
* Fix README.
* Add new hash combinations to webapp.
* Add "gif file not named *.gif" to issues in update_sharing.
* Add redirect for "image_sha512" to webapp for backwards compatibility.
2013-07-27  also move the static directory into the dedup package  Helmut Grohne
2013-07-27  move templates to dedup package  Helmut Grohne
They cluttered webapp.py and now vim can give proper highlighting for the templates.
2013-07-26  Merge branch functionid  Helmut Grohne
Actual savings on the full data set are around 7%. Conflicts: README
2013-07-25  display "issues" with files in package view  Helmut Grohne
Currently these are invalid .gz files and png files not named .png.
2013-07-23  webapp: make html for index valid  Helmut Grohne
2013-07-23  webapp: remove unused function  Helmut Grohne
2013-07-23  schema: reference hash functions by integer key  Helmut Grohne
This already worked quite well for package.id. On a test data set of 5% size, this transformation reduces the database size by about 4%.
2013-07-20  use sqlalchemy.text  Helmut Grohne
Without this wrapper, the sql statements are not munged by sqlalchemy. Specifically, paramstyle is not translated. For sqlite3 this did not matter, because it allows the changed paramstyle, but for postgres it fails without sqlalchemy.text wrappers.
2013-07-17  Merge branch master into sqlalchemy  Helmut Grohne
This basically pulls the packageid branch into sqlalchemy. The merge was complex, because many sql statements diverged. The merge brings us one step closer to supporting postgres, because an "INSERT OR REPLACE" was removed from readyaml.py in the packageid branch. Conflicts: update_sharing.py, webapp.py
2013-07-10  use sqlalchemy paramstyle  Helmut Grohne
By using the :name syntax inside sql statements, sqlalchemy will replace the contents with whatever paramstyle the underlying dbapi2 module needs. In the case of psycopg2, for instance, the paramstyle is not qmark.
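The following sketch shows why the paramstyle problem stayed hidden with sqlite3: that module accepts both the qmark form and the :name form. It deliberately does not use sqlalchemy, and the table and the two lookup functions are hypothetical. A driver such as psycopg2 expects the pyformat style instead, which is where the translation done for sqlalchemy.text statements becomes necessary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO package (name) VALUES ('git')")


def lookup_qmark(conn, name):
    """Positional placeholder, the qmark paramstyle."""
    cur = conn.execute("SELECT id FROM package WHERE name = ?", (name,))
    return cur.fetchone()


def lookup_named(conn, name):
    """Named placeholder, the same :name syntax the commit message refers to."""
    cur = conn.execute("SELECT id FROM package WHERE name = :name", {"name": name})
    return cur.fetchone()


# Both styles reach the same row when talking to sqlite3 directly.
print(lookup_qmark(conn, "git") == lookup_named(conn, "git"))
```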
2013-07-10  webapp: fix handling of total_size  Helmut Grohne
The expression "total_size and 0" masks any positive integer to 0.
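The effect of the quoted expression can be checked directly. The function names below are made up for illustration, and the "or" variant is shown only as one common way to default a missing value; the log does not say which fix was applied.

```python
def masked(total_size):
    """The expression quoted in the commit message."""
    # "and" returns its second operand whenever the first one is truthy.
    return total_size and 0


def defaulted(total_size):
    """One possible alternative: fall back to 0 only for falsy values such as None."""
    return total_size or 0


for value in (None, 0, 4096):
    print(value, masked(value), defaulted(value))
```

A later entry in this log notes that total_size is None when num_files is 0, which is the case a default of this kind would be meant to cover.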
2013-07-10  schema: reference package table by integer key  Helmut Grohne
One approach to improve performance is to reduce the database size. A package name takes up 15 bytes on average, while a package number takes up two bytes. Multiply that difference by the number of references and the saving should be noticeable. A small test set shows a reduction of 10%.
2013-06-23  Merge master into sqlalchemy  Helmut Grohne
This is necessary to avoid severe merge conflicts when converting importpkg.py to sqlalchemy. The actual sql invocation has moved to a different file in master. Conflicts: README (diverged set of dependencies)
2013-06-19  webapp: fix hash example link after git upload  Helmut Grohne
The git binary changed and so did its hash. Choosing a more stable example now: The GPL-3.
2013-06-13  webapp: use sqlalchemy  Helmut Grohne
* Arguably the interface is nicer.
* Actually closes connections. => wal files get deleted.
* Permits switching from sqlite to anything.
2013-05-09  webapp: enrich comparison page with version info  Helmut Grohne
2013-05-05  webapp: markup error in /source template  Helmut Grohne
2013-05-02  webapp: use jinja's filesizeformat  Helmut Grohne
Except it doesn't work, so replace it with our version. At least we might be able to drop this code in a future update.
2013-05-02  webapp: reduce size of comparison output  Helmut Grohne
Only add rowspan when it carries a meaning.
2013-04-27  webapp: add a css class binary-package  Helmut Grohne
2013-04-25  webapp: total_size is None if num_files is 0  Helmut Grohne
2013-04-25  webapp: turn the <br> after filename into a style  Helmut Grohne
2013-04-25  move css to /style.css  Helmut Grohne
2013-04-25  webapp: make filenames css styleable  Helmut Grohne
2013-04-25  webapp: top-align fields in /compare pages  Helmut Grohne
Suggested by Paul Wise.
2013-04-24  implement the /compare/pkg1/pkg2 page differently  Helmut Grohne
The original version had two major drawbacks:
1) The SQL query used would cause a btree sort, so the time waiting for the first output was rather long.
2) For packages with many equal files, the output would grow with O(n^2).
Thanks to Christine Grohne and Klaus Aehlig for their suggestions. The approach now groups files in package1 by their main hash value (sha512). It now also does manually some work that SQL was designed to solve. To speed up page generation, a new caching table was added that identifies which files have corresponding shared files.
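The grouping idea can be sketched in a few lines. This is a simplification under stated assumptions, not the code from webapp.py: files are given as (filename, sha512) tuples, both packages are grouped (the commit message only mentions package1), and the function names are hypothetical.

```python
from collections import defaultdict


def group_by_hash(files):
    """Map each hash value to the list of filenames that carry it."""
    groups = defaultdict(list)
    for filename, sha512 in files:
        groups[sha512].append(filename)
    return dict(groups)


def shared_groups(files1, files2):
    """Return one row per shared hash instead of one row per matching file pair."""
    groups1 = group_by_hash(files1)
    groups2 = group_by_hash(files2)
    return [
        (sorted(groups1[h]), sorted(groups2[h]))
        for h in sorted(groups1)
        if h in groups2
    ]


pkg1 = [("a", "h1"), ("b", "h1"), ("c", "h2")]
pkg2 = [("x", "h1"), ("y", "h1"), ("z", "h3")]
rows = shared_groups(pkg1, pkg2)
```

With n identical files on each side, a pairwise listing has n * n rows, while the grouped form has a single row naming 2n files.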
2013-04-14  webapp: added some useful notes  Helmut Grohne
2013-03-26  webapp: fix problem from the previous merge  Helmut Grohne
2013-03-26  Merge branch schemachange  Helmut Grohne
2013-03-20  webapp: report correct sizes  Helmut Grohne
2013-03-20  webapp: remove broken assert  Helmut Grohne
Fails on long inputs.
2013-03-09  split content table to a hash table  Helmut Grohne
In the old content table, (package, filename, size) would be the same for multiple hash functions. Now the schema represents that each file has precisely one size, but multiple hashes.
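A sketch of what such a split might look like. The table name content comes from the commit message; the hash table's name and all column names and types are assumptions, as is the helper add_file.

```python
import sqlite3

# Assumed layout: one content row per file, one hash row per (file, function).
SCHEMA = """
CREATE TABLE content (
    id INTEGER PRIMARY KEY,
    package TEXT NOT NULL,
    filename TEXT NOT NULL,
    size INTEGER NOT NULL
);
CREATE TABLE hash (
    cid INTEGER NOT NULL REFERENCES content(id),
    function TEXT NOT NULL,
    hash TEXT NOT NULL
);
"""


def add_file(conn, package, filename, size, hashes):
    """Store a file once and attach any number of hashes to it."""
    cur = conn.execute(
        "INSERT INTO content (package, filename, size) VALUES (?, ?, ?)",
        (package, filename, size),
    )
    cid = cur.lastrowid
    conn.executemany(
        "INSERT INTO hash (cid, function, hash) VALUES (?, ?, ?)",
        [(cid, function, value) for function, value in sorted(hashes.items())],
    )
    return cid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_file(conn, "git", "usr/bin/git", 1234, {"sha512": "aa", "gzip_sha512": "bb"})
```

The size is stored once per file, so it cannot disagree between the rows for different hash functions.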
2013-03-09  webapp: drop unused function compute_sharedstats  Helmut Grohne
The sharing table works great and I don't want to adapt it for the next step in the schema change.
2013-03-07  integrate the source table into the package table  Helmut Grohne
2013-03-05  webapp: added /source/<pkg> page  Helmut Grohne
2013-03-05  webapp: helper function function_combination  Helmut Grohne
2013-03-04  webapp: fix index template  Helmut Grohne
Apparently not all browsers understand <a ... /> in all rendering modes.
2013-03-04  webapp: use caching table "shared" for /binary page  Helmut Grohne
2013-03-04  webapp: generate /comparison pages in constant-space  Helmut Grohne
2013-03-02  move fetchiter from webapp to dedup.utils  Helmut Grohne
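The log names a helper fetchiter and a constant-space page generation but does not show either. The sketch below is a guess at what such a helper could look like, built on the DB-API call cursor.fetchmany; the body and the batch_size parameter are assumptions, not the implementation in dedup.utils.

```python
import sqlite3


def fetchiter(cursor, batch_size=100):
    """Yield rows from a DB-API cursor while holding only one batch in memory."""
    rows = cursor.fetchmany(batch_size)
    while rows:
        for row in rows:
            yield row
        rows = cursor.fetchmany(batch_size)


# Illustrative data: 250 rows, consumed in batches of 100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t (n) VALUES (?)", [(i,) for i in range(250)])
total = sum(row[0] for row in fetchiter(conn.execute("SELECT n FROM t ORDER BY n")))
```

A page generator that iterates over such a helper and emits output row by row never needs the full result set in memory, which is one way to read "constant-space" here.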