
Thursday, May 22, 2014

Using Benchmarks in Research Papers

Many, if not most, research papers include experimental results, and since real customer data is rarely available, researchers tend to use well-known benchmarks. This is fine in principle: these benchmarks are well known and well specified, so using them makes sense.

However, following a benchmark to the letter, as would be required for an audited result, is hard, and thus most research papers deviate to some degree from the official specification. For example, most TPC-C results in research papers ignore the (mandatory) client wait times. This is, strictly speaking, not allowed, but it is usually accepted in research papers, as it "only" affects the space requirements and hopefully has little effect on the results otherwise. Other deviations, like ignoring warehouse-crossing transactions, are more dangerous, as they can suddenly have a large impact on transaction rates:

TPC-C rates, taken from ICDE14
There can be good reasons for deviating from the official benchmark. For example, implementing client wait times correctly in TPC-C means that the allowed number of transactions per second is bounded by the number of warehouses in the database. For 300,000 transactions per second we would need ca. 4,500,000 warehouses, which would occupy ca. 1.3PB of space. This is far beyond the size of main memory, and probably no OLTP system on this planet stores that much data, which means that TPC-C with wait times is not suitable for main-memory databases.
But then a paper has to acknowledge this explicitly. Deviating for good reasons is fine, as long as the deviation is clearly and easily visible in the text and well justified.
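The arithmetic behind these figures can be sketched in a few lines. The per-warehouse rate and size used here are assumptions reverse-engineered from the numbers above, not official TPC-C constants:

```python
# Back-of-envelope check of the figures above. The per-warehouse
# rate and size are assumptions derived from the numbers in the
# text, not official TPC-C constants.
target_tps = 300_000          # desired transactions per second
warehouses_per_tps = 15       # implied limit with client wait times (assumption)
bytes_per_warehouse = 290e6   # ~290MB per warehouse, incl. growth (assumption)

warehouses = target_tps * warehouses_per_tps
space_pb = warehouses * bytes_per_warehouse / 1e15

print(f"{warehouses:,} warehouses, ~{space_pb:.1f}PB of data")
# 4,500,000 warehouses, ~1.3PB of data
```

The point of the exercise: with wait times, throughput scales only with the number of warehouses, so the data size is forced up together with the transaction rate.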

Of course all research papers are somewhat sloppy. Hardly anybody cares about overflow checking in arithmetic, for example. This introduces a certain bias in comparisons, as commercial systems, and even a few research systems, do perform these checks; but such deviations are usually small. More critical is when somebody implements something that bears little resemblance to the original benchmark, but then does not state that in the experiments.
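To illustrate what such checking involves, here is a minimal sketch of a checked 32-bit addition, the kind of test a commercial engine has to perform for every INTEGER operation (Python and the function name are just for illustration; real engines do this on native integer types):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def checked_add32(a: int, b: int) -> int:
    """Add two INTEGER values with overflow checking, as a SQL engine
    must; research prototypes often silently skip this test."""
    result = a + b
    if not INT32_MIN <= result <= INT32_MAX:
        raise OverflowError("integer overflow in addition")
    return result

checked_add32(2_000_000_000, 100_000_000)    # fine: 2,100,000,000
# checked_add32(2_000_000_000, 200_000_000)  # would raise OverflowError
```

The extra branch per operation is cheap but not free, which is exactly the small bias mentioned above when comparing against systems that skip it.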

This paper, for example, studies fast in-memory OLAP processing, which is fine as a paper topic. But then it claims to run TPC-H in its experiments, even though TPC-H is an ad-hoc benchmark that explicitly forbids most performance tricks: most non-key indexes are forbidden, materialization is forbidden, exploiting domain knowledge is forbidden, and so on. And the paper reports excellent performance numbers, easily beating the official TPC-H champion VectorWise:

Excerpt from ICDE13

But even though they claim to show TPC-H results, they are not really running TPC-H. They have precomputed the join partners, they use inverted indices, they use all kinds of tricks to be fast. Unfortunately, most of these tricks are explicitly forbidden in TPC-H. Not to mention the update problem: the real TPC-H benchmark contains updates, too, which would probably make some of these data structures expensive to maintain, but which the paper conveniently ignores.
And then the experiments compare their system on their machine against VectorWise numbers obtained on a completely different machine, scaling them using SPEC rates. Such scaling does not make sense, in particular since VectorWise is freely available.
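To make explicit what such a comparison does, SPEC-rate scaling presumably amounts to multiplying the published throughput by the ratio of the two machines' SPEC rates. A minimal sketch with purely hypothetical numbers:

```python
# Sketch of SPEC-rate scaling as presumably used in the comparison.
# All numbers here are hypothetical. Note the built-in assumption:
# performance scales linearly with the SPEC rate, which rarely holds
# for database workloads bound by memory bandwidth or cache sizes.
published_qps = 100.0   # hypothetical VectorWise result on machine A
spec_rate_a = 200.0     # hypothetical SPEC rate of machine A
spec_rate_b = 300.0     # hypothetical SPEC rate of machine B

scaled_qps = published_qps * (spec_rate_b / spec_rate_a)
print(scaled_qps)  # 150.0, a guess rather than a measurement
```

The scaled number is an extrapolation, not an observation, which is why running the freely available system on the same machine would have been the fair comparison.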

These kinds of problems are common in research papers, and by no means limited to the somewhat arbitrary examples mentioned here, but they greatly devalue the experimental results. Experiments must be reproducible, they must be fair, and they must provide insightful results. Deviating from benchmarks is fine if the deviation is well justified, if the evaluation description makes it very clear and explicit, and if the deviation affects all contenders equally. Comparing a rigged implementation to a faithful one is not very useful.

Thursday, May 15, 2014

Trying out HyPer

At TUM we have built a very fast main-memory database system named HyPer. It offers fairly complete SQL92 support plus some SQL99 features, and is much faster than "traditional" database systems.

The easiest way to play with it is the online demo. It provides an easy-to-use interface for entering queries, running them, and inspecting the execution plan. All queries are evaluated against a scale factor 1 (SF1) TPC-H database, which contains roughly 1GB of data.


HyPer web interface


The web interface is easy to use and requires no setup, but it runs on a fairly weak server machine and queries only 1GB of data. For larger experiments you might be interested in a local installation of HyPer. For that, download the HyPer demo, unpack it with tar xvfJ, and try one of the included demo scripts (or your own queries, of course).

For TPC-C experiments you can use the demo-tpcc script:


 neumann@tester:~/hyperdemo$ ./demo-tpcc   
 Executing scripts...  
 2044ms  
 Loading...  
 5598ms (722MB)  
 OLTP...  
  wallclock    total  primary    tps   time    mem  
 --------------------------------------------------------------  
         0s   100000   52946  119904   834ms   19MB  
         1s   200000   55108  124843   801ms   31MB  
         2s   300000   52573  120192   832ms   142MB  
         3s   400000   54219  122549   816ms   179MB  
         4s   500000   54896  124533   803ms   199MB  
         4s   600000   54576  123609   809ms   217MB  
         5s   700000   54812  124069   806ms   230MB  
         6s   800000   54391  123609   809ms   236MB  
         7s   900000   52794  119617   836ms   329MB  
         8s   1000000  53557  122100   819ms   347MB  
 --------------------------------------------------------------  
         8s   1000000  53914  122339  8174ms   347MB  
TPC-C run

In the example run above we get 122,339 transactions per second (53,914 new-order transactions per second), and the database grows by 347MB.
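The summary line can be verified directly: dividing the transaction count by the wall-clock time of the run reproduces the reported rate (numbers taken from the output above):

```python
# Recompute the rates from the summary line of the TPC-C run above.
total_txns = 1_000_000
elapsed_s = 8.174              # 8174ms total wall-clock time

tps = total_txns / elapsed_s
share = 53_914 / tps           # new-order fraction of the mix

print(int(tps))                # 122339, the reported rate
print(round(share, 2))         # 0.44, close to the ~45% new-order share
```

The new-order fraction coming out near 45% is a quick sanity check that the run follows the standard TPC-C transaction mix.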

For TPC-H experiments you can use the demo-tpch script. Note that the demo does not include the TPC-H data itself; it has to be generated using the official dbgen tool (in ../tpch). A sample run:

 neumann@tester:~/hyperdemo$ ./demo-tpch   
 Executing scripts...
 6792ms
 Writing database state to tpch.dump
 10287ms
 TPC-H is now available in tpch.dump. Access with ./bin/sql tpch.dump

 Running all 22 TPC-H queries in sequence
 query 1: 23ms
 query 2: 3ms
 query 3: 29ms
 query 4: 13ms
 query 5: 7ms
 query 6: 7ms
 query 7: 18ms
 query 8: 10ms
 query 9: 100ms
 query 10: 27ms
 query 11: 7ms
 query 12: 13ms
 query 13: 57ms
 query 14: 8ms
 query 15: 9ms
 query 16: 116ms
 query 17: 14ms
 query 18: 71ms
 query 19: 35ms
 query 20: 9ms
 query 21: 31ms
 query 22: 8ms
TPC-H run

As indicated in the output, the database is now available as a dump, so you can use the command-line interface for experiments:

 neumann@tester:~/hyperdemo$ ./bin/sql tpch.dump
 > select count(*) from lineitem;
 1
 6001215
 1 row (0.00176411 s)
 > \q
command-line interface

This should get you started with trying out HyPer. Feel free to contact us if you encounter any issues, or if you get performance numbers that are significantly different from what is shown here.