Age | Commit message | Author |
|
This makes the sqlalchemy branch schema-compatible with master again.
The biggest change on master was the introduction of the function table.
It caused most of the conflicts. Note that webapp had one conflict not
detected by git: the selection of issues in show_package needed
sqlalchemy conversion.
Conflicts:
README
update_sharing.py
webapp.py
|
|
* Rename "image_sha512" to "png_sha512".
* dedup.image.ImageHash is now a base class for image hashes such as
PNGHash and GIFHash.
* Enable both hashes in importpkg.
* Fix README.
* Add new hash combinations to webapp.
* Add "gif file not named *.gif" to issues in update_sharing.
* Add redirect for "image_sha512" to webapp for backwards
compatibility.
|
|
|
|
|
|
|
|
|
|
|
|
They cluttered webapp.py and now vim can give proper highlighting for
the templates.
|
|
|
|
Actual savings on the full data set are around 7%.
Conflicts:
README
|
|
Currently this covers invalid .gz files and png files not named *.png.
|
|
|
|
This voids the benefits of processing rows during row generation as has
been observed on postgres.
|
|
This should reduce the query bandwidth to the rdbms.
|
|
|
|
|
|
|
|
|
|
This already worked quite well for package.id. On a test data set of 5%
size this transformation reduces the database size by about 4%.
|
|
We can avoid a b-tree sort in the package comparison of the web app if
the package index also provides a size.
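
A minimal sketch of the effect, assuming the sort in question is an
ORDER BY on size and using sqlite3 from the standard library. The table,
column and index names below are hypothetical and not taken from the
actual schema.

```python
import sqlite3

# Hypothetical schema; the real table and column names may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content (package TEXT, filename TEXT, size INTEGER)")

QUERY = "SELECT filename, size FROM content WHERE package = ? ORDER BY size"


def plan(connection):
    """Return the detail strings of sqlite's query plan for QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("demo",))
    return [row[-1] for row in rows]


# Without a suitable index sqlite has to sort the matching rows itself.
plan_without_index = plan(conn)

# With size as the second index column, rows for one package come out of
# the index already ordered by size.
conn.execute("CREATE INDEX content_package_size_index ON content (package, size)")
plan_with_index = plan(conn)

for detail in plan_without_index + plan_with_index:
    print(detail)
```

The first plan should mention a temporary b-tree for the ORDER BY, the
second one should not.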
|
|
|
|
Without using this wrapper the sql statements are not munged by
sqlalchemy. Specifically paramstyle is not translated. For sqlite3 this
did not matter, because it allows the changed paramstyle, but for
postgres it fails without sqlalchemy.text wrappers.
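
A small sketch of why sqlite3 tolerated this, using only the standard
library. The package table here is a hypothetical stand-in. sqlite3
declares the qmark paramstyle but also accepts named parameters, whereas
psycopg2 declares pyformat and accepts neither form as is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, only used to have something to query.
conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT)")

# qmark style: this is the paramstyle the sqlite3 module declares.
conn.execute("INSERT INTO package (name) VALUES (?)", ("demo",))

# named style: sqlite3 accepts this as well, so an untranslated
# statement using :name still runs here.
row = conn.execute(
    "SELECT id FROM package WHERE name = :name", {"name": "demo"}
).fetchone()

declared_style = sqlite3.paramstyle
print(declared_style, row)
```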
|
|
This basically pulls the packageid branch into sqlalchemy. The merge was
complex, because many sql statements diverged. The merge brings us one
step closer to supporting postgres, because an "INSERT OR REPLACE" was
removed from readyaml.py in the packageid branch.
Conflicts:
update_sharing.py
webapp.py
|
|
|
|
|
|
|
|
By using the :name syntax inside sql statements, sqlalchemy will replace
the contents with whatever paramstyle the underlying dbapi2 module
needs. In case of psycopg2 the paramstyle is not qmark for instance.
|
|
The expression "total_size and 0" masks any positive integer to 0.
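
For illustration, assuming the expression is evaluated with Python
semantics (the function name is hypothetical):

```python
def masked(total_size):
    # "and" returns its first operand if that is falsy, otherwise its
    # second operand. For an integer, the falsy case is 0 itself, so the
    # result is 0 either way.
    return total_size and 0


results = [masked(n) for n in (0, 1, 42, 10**9)]
print(results)
```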
|
|
One approach to improve performance is to reduce the database size. A
package name takes up 15 bytes on average. A number of a package takes
up two bytes. Multiply that difference by the number of references and
it should be noticeable. A small test set shows a reduction by 10%.
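
The arithmetic can be sketched as follows. The byte sizes are the ones
quoted above; the number of references is a made-up figure for
illustration, not a measurement.

```python
NAME_BYTES = 15  # average size of a package name, as quoted above
ID_BYTES = 2     # size of a package number, as quoted above


def estimated_savings(references):
    """Bytes saved when each reference stores a number instead of a name."""
    return (NAME_BYTES - ID_BYTES) * references


# Hypothetical reference count, chosen only to make the numbers readable.
example = estimated_savings(1_000_000)
print(example)
```

This ignores per-row and index overhead, so it is only a rough estimate
of the raw column data saved.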
|
|
sharing_package_index is a sub-index of sharing_insert_index and
therefore unnecessary.
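
A sketch of the reasoning, assuming "sub-index" means that the columns
of sharing_package_index are a leading prefix of those of
sharing_insert_index. The column names are hypothetical. With sqlite3, a
lookup on the leading column alone can still be served by the wider
index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns; only the index name is taken from the message.
conn.execute(
    "CREATE TABLE sharing "
    "(package1 INTEGER, package2 INTEGER, files INTEGER, size INTEGER)"
)
conn.execute("CREATE INDEX sharing_insert_index ON sharing (package1, package2)")

# Filter on the leading column only, as a narrower index would allow.
rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT files, size FROM sharing WHERE package1 = ?",
    (1,),
)
plan_details = [row[-1] for row in rows]
print(plan_details)
```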
|
|
|
|
|
|
Makes usage of sqlalchemy easier, cause I can invoke it once and it
works for all connections.
|
|
This is necessary to avoid severe merge conflicts when converting
importpkg.py to sqlalchemy. The actual sql invocation has moved to a
different file in master.
Conflicts:
README (diverged set of dependencies)
|
|
|
|
+ Way faster on multiple cores.
+ More reliable, cause http connections do not time out when the db
blocks.
- Way slower on single core with contended io path. No clue why.
Still update_sharing.py makes up the bulk of processing time.
|
|
The git binary changed and so did its hash. Choosing a more stable
example now: The GPL-3.
|
|
* Arguably the interface is nicer.
* Actually closes connections. => wal files get deleted.
* Permits switching from sqlite to anything.
|
|
This appears to be a huge performance boost.
|
|
This gets back the original functionality of importpkg.py.
|
|
importpkg.py now emits a yaml stream instead of updating the database.
The actual updating now happens in readyaml.py. In this process
autoimport.py was significantly reworked to import packages in parallel.
|
|
|
|
|
|
|
|
I remembered the wrong name. The logo was made by Sune Vuorela.
|
|
|
|
|
|
|
|
Authored: Cyril Brulebois
|
|
Except it doesn't work, so replace it with our version. At least we
might be able to drop this code in a future update.
|