And usually, incorrect results are unacceptable. Or at least, should be unacceptable. It is very tempting for a prototype system to ignore correctness issues and just aim for maximum performance. But once money is associated with the data, wrong results are very problematic.
A classical example of this is the representation of numerical values in databases. It is tempting to just use doubles for everything: doubles are well supported by hardware, we can use SIMD instructions with them, and they are often the fastest way to represent non-integer values. On the other hand, doubles run into rounding issues very quickly. This can be seen when performing the computation 0.1+0.2-0.3:
select ceil((0.1+0.2)-0.3)
-> 0.0

select ceil((0.1::double precision+0.2::double precision)
            -0.3::double precision)
-> 1.0
When using a correct NUMERIC implementation (e.g., HyPer or PostgreSQL) we get the correct result of 0.1+0.2-0.3=0. When using DOUBLE PRECISION we get a non-zero result, which ceil here rounds up to 1. Unfortunately, some systems like SQLite (and a lot of research systems) do not bother to implement NUMERIC correctly and always use doubles, which leads to wrong results even for the first query.
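To see the effect outside of SQL, here is a tiny standalone C++ snippet (my own illustration, not taken from any database) that prints the residual error the double computation leaves behind:

#include <cstdio>

int main()
{
   // 0.1, 0.2, and 0.3 have no exact binary representation, so the
   // intermediate sum 0.1+0.2 is already slightly larger than 0.3
   double r = (0.1 + 0.2) - 0.3;
   std::printf("%.17g\n", r); // prints 5.5511151231257827e-17, not 0
   return 0;
}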
Implementing NUMERIC correctly means implementing fixed-point arithmetic. The number is usually represented as an integer value plus the corresponding position of the decimal point, and all operations are then mapped to integer operations. For addition and subtraction that is reasonably easy (as long as both arguments have the decimal point at the same position), but division, for example, is more complex even if we ignore the decimal point:
int64_t div(int64_t a, int64_t b)
{
   if (!b) // avoid hardware trap due to division by zero
      throw DivBy0();
   if (b == -1) // avoid hardware trap due to int64min/-1
      return sub(0, a);
   return a / b;
}
Note that the sub function in there is non-trivial, as we will see in a moment. Add to that the extra code needed for handling the decimal point, and the code needed to handle rounding. A fixed-point division operation easily needs 20-30 instructions, compared to a single instruction for a floating-point division. This costs performance, of course, but has the undeniable advantage of producing the correct result.
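To give an idea of the decimal-point handling, here is a simplified sketch (not the code of any real system; the struct and helper names are made up for illustration) of how addition could align two fixed-point values with different scales before adding the underlying integers. It uses the type-generic gcc/clang overflow intrinsics that are discussed further below:

#include <cstdint>
#include <stdexcept>

struct Overflow : std::runtime_error
{
   Overflow() : std::runtime_error("numeric overflow") {}
};

// a fixed-point value: raw integer digits plus the number of digits
// after the decimal point
struct Numeric
{
   int64_t value;
   unsigned scale;
};

// checked helpers built on the type-generic gcc/clang intrinsics
static int64_t addChecked(int64_t a, int64_t b)
{
   int64_t c;
   if (__builtin_add_overflow(a, b, &c)) throw Overflow();
   return c;
}
static int64_t mulChecked(int64_t a, int64_t b)
{
   int64_t c;
   if (__builtin_mul_overflow(a, b, &c)) throw Overflow();
   return c;
}

// bring both operands to the larger scale, then add the raw integers;
// every step must be overflow-checked (multiplication and division
// additionally need rounding logic, which is omitted here)
Numeric addNumeric(Numeric a, Numeric b)
{
   while (a.scale < b.scale) { a.value = mulChecked(a.value, 10); ++a.scale; }
   while (b.scale < a.scale) { b.value = mulChecked(b.value, 10); ++b.scale; }
   return Numeric{addChecked(a.value, b.value), a.scale};
}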
Now why is the subtraction function non-trivial? Because it has to cope with underflows and overflows. And that is not just nice to have, but fundamental, because there are values that are simply not representable in our usual two's complement representation. Consider for example the following computation:
select (-9223372036854775808)/(-1)
-> div(-9223372036854775808,-1)
-> sub(0,-9223372036854775808)
-> ???
The problem here is that -9223372036854775808 has no corresponding positive number when using 64-bit integers for our fixed-point values; the number is not representable. In fact, if we had executed the division without the check for -1 we would have gotten a very unpleasant hardware trap because of that. We avoid the trap by delegating to subtraction, but if we do not check for underflows/overflows there, we silently produce wrong results.
Checking for overflows manually is quite painful, in particular since signed overflow is undefined behavior in C! We would have to break the computation down into unsigned operations, which is both complex and slow. Fortunately, recent versions of both gcc and clang added intrinsics that use the CPU flags for overflow checking, which is both much easier and much cheaper:
int64_t sub(int64_t a, int64_t b)
{
   int64_t c;
   if (__builtin_ssubll_overflow(a, b, &c))
      throw Overflow();
   return c;
}
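For comparison, this is roughly what the manual route looks like: do the subtraction in unsigned arithmetic, where wraparound is well defined, and derive the overflow condition from the sign bits (again just a sketch, meant to illustrate why the intrinsic is preferable):

#include <cstdint>

// a-b overflows iff a and b have different signs and the result's sign
// differs from the sign of a
bool subOverflows(int64_t a, int64_t b, int64_t* out)
{
   uint64_t wrapped = static_cast<uint64_t>(a) - static_cast<uint64_t>(b);
   // converting the wrapped result back to int64_t is implementation-defined
   // before C++20, but does the expected thing on mainstream compilers
   int64_t r = static_cast<int64_t>(wrapped);
   *out = r;
   return ((a ^ b) < 0) && ((a ^ r) < 0);
}

With either variant, the division example from above is caught: sub(0,-9223372036854775808) reports an overflow instead of silently wrapping back to -9223372036854775808. Note also that gcc and clang provide a type-generic __builtin_sub_overflow, which avoids the int64_t vs. long long mismatch of the ll-suffixed intrinsics.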
Even when using the intrinsics, the (correct) fixed-point operations are much more complex than the (inaccurate) double operations. What does that mean for performance? Below I show the execution time of 100,000 passes over two vectors of 1,000 values each (i.e., 100,000 * 1,000 = 100,000,000 arithmetic operations of the indicated type were executed), both for double arithmetic and for fixed-point arithmetic. (i7-5820K, gcc 5.2.1, -O3 -march=native)
                        | add  | sub  | mul  | div
double (unchecked)      | 11ms | 11ms | 12ms | 212ms
fixed point (unchecked) | 10ms | 10ms | 42ms | 817ms
fixed point (checked)   | 57ms | 57ms | 56ms | 912ms
The correct arithmetic (fixed point, with checks) is quite a bit slower than just using doubles. But that is the price we have to pay for correctness. (Update: in the original posting I forgot -march=native; adding it improved the performance of the unchecked versions by another factor of 2 due to AVX instructions.)
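The benchmark loop itself is not shown here, but it boils down to something like the following sketch (the vector contents, the timing code, and the checksum output are my choices, not the original harness):

#include <cstdint>
#include <cstdio>
#include <chrono>
#include <vector>

// rough sketch of the measurement: 100,000 passes over two vectors of
// 1,000 values each, here for the checked fixed-point addition
int main()
{
   std::vector<int64_t> a(1000, 3), b(1000, 5), out(1000);
   auto start = std::chrono::steady_clock::now();
   for (int pass = 0; pass < 100000; ++pass)
      for (size_t i = 0; i < a.size(); ++i) {
         int64_t c;
         if (__builtin_add_overflow(a[i], b[i], &c))
            return 1; // an overflow would abort the run
         out[i] = c;
      }
   auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start).count();
   // print a result value so the compiler cannot discard the loop entirely;
   // a careful harness has to work harder to prevent such optimizations
   std::printf("%lld ms, out[0]=%lld\n", (long long)ms, (long long)out[0]);
   return 0;
}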
Note that it would have been possible to check for overflows after double operations, too, using fetestexcept. But that is even slower than the checked fixed-point arithmetic (>620ms for all cases), and it does not help with the rounding issues of floating-point numbers.
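For reference, the fetestexcept variant looks roughly like this (sketch only; strictly speaking it requires #pragma STDC FENV_ACCESS, whose support varies between compilers):

#include <cfenv>

// check a double operation for floating-point exceptions after the fact;
// this detects overflow and invalid operations, but not ordinary rounding
double addCheckedDouble(double a, double b)
{
   std::feclearexcept(FE_ALL_EXCEPT);
   double r = a + b;
   if (std::fetestexcept(FE_OVERFLOW | FE_INVALID))
      throw Overflow(); // reusing the exception type from the sketch above
   return r;
}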
So performing numerical operations correctly is difficult and expensive. But still, every system should do it! Users get very unhappy when the result is wrong, in particular if the values correspond to money. If you are building a prototype system, do not use floating-point numbers, even though they are fast and tempting. Using doubles is nice for micro-benchmarks, but inappropriate for real-world usage when money is involved.
I don't know if this is solely an issue with data types such as double. I think this is more of an application issue of doing error estimation for math operations. For example, all numerical packages use double or single precision; error control is an essential property of their quality.
You are right, but it is particularly severe here. If a column is declared as, e.g., numeric(10,2), this has a well-defined meaning and behaves as people expect when performing computations on monetary values.
But only if implemented correctly, i.e., as fixed-point numbers. When using doubles, even simple computations can lead to deviations from what you would get when performing the computations manually.
Oh, I agree that database providers should not use double as a fake fixed-point numeric type implementation under the covers.
However, double and single precision have no issues when provided as-is, alongside decimal types.
We should have both performance and correctness from an application point of view.
Another nice article, thanks Thomas! I actually didn't know GCC added intrinsics with overflow checking.
Indeed, using doubles in research papers is a common simplification (I'm guilty myself), but reality is much harder.
Things get even worse when we exceed the int64 domain (roughly DECIMAL(18,0)) and start to work with 128-bit or larger values.
But the best trick about checking for overflows is... knowing when not to :)
I know; VectorWise for example keeps track of the domains of individual attributes and avoids overflow checks if the result cannot overflow. But there are two problems with that approach: 1) maintaining the domain is problematic in the presence of high update rates (admittedly not a use case for VectorWise), and 2) this can lead you to use larger data types than necessary. And using a 128-bit representation without overflow checks will be more expensive than a 64-bit representation with checks.
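As a side note on the 128-bit case: as far as I know the type-generic overflow builtins also accept gcc/clang's __int128, so checked 128-bit subtraction can follow the same pattern (sketch only; __int128 is a compiler extension, not standard C++):

__int128 sub128(__int128 a, __int128 b)
{
   __int128 c;
   if (__builtin_sub_overflow(a, b, &c))
      throw Overflow(); // same exception type as in the 64-bit code above
   return c;
}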
It would be best if the hardware did the checking for us. An overflow will lead to a transaction abort anyway, so it does not matter if handling the overflow is expensive. But the check itself should be cheap (ideally free). So we want trapping arithmetic instructions. But it is unlikely that we will get them; the INTO instruction, for example, was removed from the 64-bit instruction set.