author | Helmut Grohne <helmut@subdivi.de> | 2013-07-23 23:23:41 +0200 |
---|---|---|
committer | Helmut Grohne <helmut@subdivi.de> | 2013-07-23 23:23:41 +0200 |
commit | eaba84e444c77495a5654b600c599646b8aa1aed (patch) | |
tree | ff6bc8bb15de0c3669e2a6a6ad159b39dd638594 /README | |
parent | 6206dea43941560a29c9a1105ae3055740ab80aa (diff) | |
download | debian-dedup-hashid.tar.gz |
schema: identify hash values by an integer hashid
This one is a bit more complex than the other transformations, because
the new hashvalue table has to be cleaned with a trigger. During a test
import the -wal file exploded. The resulting db is similar in size to
the original.
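
The trigger itself is not part of this README-limited diff. The following is a minimal sketch of how such a cleanup could work, not the commit's actual schema: the table layout is reduced to the columns that appear in the README queries, and the trigger name `hash_cleanup` and the helpers `make_db`, `add_hash` and `count_hashvalues` are hypothetical. Each distinct hash string is stored once in `hashvalue`, `hash` refers to it by integer `hid`, and the trigger drops a `hashvalue` row once no `hash` row references it any more.

```python
import sqlite3

# Assumed, simplified schema: only the columns used by the README queries.
SCHEMA = """
CREATE TABLE hashvalue (id INTEGER PRIMARY KEY, hash TEXT UNIQUE NOT NULL);
CREATE TABLE hash (
    cid INTEGER NOT NULL,
    function TEXT NOT NULL,
    hid INTEGER NOT NULL REFERENCES hashvalue(id)
);
-- Hypothetical cleanup trigger: remove a hashvalue row once it is orphaned.
CREATE TRIGGER hash_cleanup AFTER DELETE ON hash
BEGIN
    DELETE FROM hashvalue
    WHERE id = OLD.hid
      AND NOT EXISTS (SELECT 1 FROM hash WHERE hid = OLD.hid);
END;
"""


def make_db():
    """Create an in-memory database with the sketched schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def add_hash(db, cid, function, value):
    """Store the hash string once and reference it by its integer id."""
    db.execute("INSERT OR IGNORE INTO hashvalue (hash) VALUES (?)", (value,))
    (hid,) = db.execute(
        "SELECT id FROM hashvalue WHERE hash = ?", (value,)
    ).fetchone()
    db.execute(
        "INSERT INTO hash (cid, function, hid) VALUES (?, ?, ?)",
        (cid, function, hid),
    )
    return hid


def count_hashvalues(db):
    """Number of distinct hash strings currently stored."""
    return db.execute("SELECT count(*) FROM hashvalue").fetchone()[0]
```

With this layout, two content rows with the same hash share one `hashvalue` row, and that row disappears only when the last referencing `hash` row is deleted.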
Diffstat (limited to 'README')
-rw-r--r-- | README | 4 |
1 files changed, 2 insertions, 2 deletions
@@ -38,12 +38,12 @@ SQL database by hand. Here are some example queries.

 Finding the 100 largest files shared with multiple packages.

- SELECT pa.name, a.filename, pb.name, b.filename, a.size FROM content AS a JOIN hash AS ha ON a.id = ha.cid JOIN hash AS hb ON ha.hash = hb.hash JOIN content AS b ON b.id = hb.cid JOIN package AS pa ON b.pid = pa.id JOIN package AS pb ON b.pid = pb.id WHERE (a.pid != b.pid OR a.filename != b.filename) ORDER BY a.size DESC LIMIT 100;

+ SELECT pa.name, a.filename, pb.name, b.filename, a.size FROM content AS a JOIN hash AS ha ON a.id = ha.cid JOIN hash AS hb ON ha.hid = hb.hid JOIN content AS b ON b.id = hb.cid JOIN package AS pa ON a.pid = pa.id JOIN package AS pb ON b.pid = pb.id WHERE (a.pid != b.pid OR a.filename != b.filename) ORDER BY a.size DESC LIMIT 100;

 Finding those top 100 files that save most space when being reduced to only one copy in the archive.

- SELECT hash, sum(size)-min(size), count(*), count(distinct pid) FROM content JOIN hash ON content.id = hash.cid WHERE hash.function = "sha512" GROUP BY hash ORDER BY sum(size)-min(size) DESC LIMIT 100;

+ SELECT hashvalue.hash, sum(size)-min(size), count(*), count(distinct pid) FROM content JOIN hash ON content.id = hash.cid JOIN hashvalue ON hash.hid = hashvalue.id WHERE hash.function = "sha512" GROUP BY hash.hid ORDER BY sum(size)-min(size) DESC LIMIT 100;

 Finding PNG images that do not carry a .png file extension.
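
To see what the updated "save most space" query returns, here is a small sketch that runs it against a toy in-memory database. The table definitions and sample rows are assumptions made up for illustration (only the column names come from the queries above), and `top_savings` and `demo_db` are hypothetical helper names. The string literal is written as `'sha512'` with single quotes, since SQLite only accepts the double-quoted form from the README as a compatibility fallback.

```python
import sqlite3


def top_savings(db, limit=100):
    """Run the updated README query.

    Returns rows of (hash, bytes saved, number of copies, number of packages).
    """
    return db.execute(
        """
        SELECT hashvalue.hash, sum(size)-min(size), count(*), count(DISTINCT pid)
        FROM content
        JOIN hash ON content.id = hash.cid
        JOIN hashvalue ON hash.hid = hashvalue.id
        WHERE hash.function = 'sha512'
        GROUP BY hash.hid
        ORDER BY sum(size)-min(size) DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


def demo_db():
    """Build a toy database; schema and rows are illustrative assumptions."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE content (
            id INTEGER PRIMARY KEY, pid INTEGER, filename TEXT, size INTEGER
        );
        CREATE TABLE hashvalue (id INTEGER PRIMARY KEY, hash TEXT UNIQUE);
        CREATE TABLE hash (cid INTEGER, function TEXT, hid INTEGER);

        -- Two packages ship an identical 500-byte file, a third a unique one.
        INSERT INTO content VALUES (1, 10, 'usr/share/a', 500);
        INSERT INTO content VALUES (2, 11, 'usr/share/b', 500);
        INSERT INTO content VALUES (3, 12, 'usr/share/c', 40);
        INSERT INTO hashvalue VALUES (1, 'aaaa');
        INSERT INTO hashvalue VALUES (2, 'bbbb');
        INSERT INTO hash VALUES (1, 'sha512', 1);
        INSERT INTO hash VALUES (2, 'sha512', 1);
        INSERT INTO hash VALUES (3, 'sha512', 2);
        """
    )
    return db


if __name__ == "__main__":
    for row in top_savings(demo_db()):
        print(row)
```

Grouping by the integer `hash.hid` and joining `hashvalue` only to display the hash string is the point of the change: the grouping key is a small integer rather than the full hash text.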