CREATING BLACK HOLES: DIVISION BY ZERO IN PRACTICE
by: Sven Gregori
Dividing by zero: the fundamental no-can-do of arithmetic. It is surrounded by a fair bit of mystery and is a constant source of internet humor, whether it involves exploding microcontrollers, the collapse of the universe, or crashing your own world by having Siri tell you that you have no friends.
It’s also one of the few things gcc will warn you about by default, which caused a rather vivid discussion with interesting insights when I recently wrote about compiler warnings. And if you’re running a modern operating system, it might even send you a signal that something’s gone wrong and let you handle it in your code. Dividing by zero is more than theoretical, and serves as a great introduction to signals, so let’s have a closer look at it.
Chances are, the first time you heard about division itself back in elementary school, you were taught that dividing by zero is strictly forbidden, and obviously you didn’t want your teacher to call the cops on you, so you obeyed and refrained from it. But as with many other things in life, the older you get, the less restrictive they become, and dividing by zero eventually turned from forbidden into simply being impossible and yielding an undefined result.
And indeed, if a = b/0, it would mean in reverse that a×0 = b. If b itself was zero, the equation would be true for every single number there is, making it impossible to define a concrete value for a. And if b was any other value, no single value multiplied by zero could result in anything non-zero. Once we move into the realms of calculus, we will learn that infinity appears to be the answer, but that’s in the end just replacing one abstract, mind-boggling concept with another one. And it won’t answer one question: how does all this play out in a processor?
FLOATING POINT DIVISION BY ZERO
Let’s start with floating point numbers. Nowadays, they are usually represented and stored in memory using the IEEE 754 format, splitting the value itself into separate fields for the sign, exponent, and fraction. If we look at a float or double in memory, it won’t make too much sense, as its actual value is constructed from those three separate fields. To demonstrate that, we can cast a float into an int and print it. Note that we have to do so by pointer conversion, otherwise we’d just end up with the integer part of the floating point number.
```c
float fval = 12.34f;
int *iptr = (int *) &fval;
printf("%f -> 0x%08x\n", fval, *iptr);
// output: 12.340000 -> 0x414570a4
```
Neither 0x414570a4 nor 1095069860 would give us any hint that this is the number 12.34. However, this form of representation leaves room for some special cases like infinity and not a number, something dividing zero by zero will result in, since that equation has no single answer and therefore cannot be represented by a regular number. That means we can just go ahead and divide by zero all we want, the IEEE 754 format has us covered.
```c
fval = 1 / 0.0f;
printf("%f -> 0x%08x\n", fval, *iptr);
// output: inf -> 0x7f800000

fval = 0 / 0.0f;
printf("%f -> 0x%08x\n", fval, *iptr);
// output: -nan -> 0xffc00000
```
In other words, floating point numbers have built-in mechanisms to deal with division by zero without wreaking havoc. Not that it really helps, we cannot do much with either inf or nan, and arithmetic operations on infinity either remain infinity (maybe with changed signedness), or turn into nan, which is a dead end. No arithmetic operation can turn nan into anything else ever again.
INTEGER DIVISION BY ZERO
If you tried out the previous example for yourself, you may have noticed that after all the talk about compiler warnings that led us here in the first place, you didn’t see a single one of them. Since floating point numbers have their own, well-defined way to handle division by zero, it doesn’t pose a threat that the compiler would have to warn about. This is all great, provided we have an FPU in place, or enough resources to emulate floating point operations in software. However, if we’re working with integers, we have one major problem: the concept of infinity doesn’t exist in the limited range of values we have available. Even if we had a data type with an enormous number of bits, the best we could represent is a really, really large number, but never infinity. This time, the compiler will warn about an obvious attempt to divide by zero.
```c
// zerodiv.c
int main(void) {
    int i = 10/0;
    return 0;
}
```
```
$ gcc -o zerodiv zerodiv.c
zerodiv.c: In function ‘main’:
zerodiv.c:3:15: warning: division by zero [-Wdiv-by-zero]
     int i = 10/0;
               ^
$
```
So what’s going to happen if we still do it? After all, it’s just a warning, we got a functional executable from the compiler, and nothing is going to stop us from running it. Well, let’s do it and see what happens on x86_64:
```
$ ./zerodiv
Floating point exception (core dumped)
$
```
There we go, since we cannot represent the result in any way, the program simply crashed. But what caused that crash?
In more complex processors, the instruction set offers dedicated division opcodes, and the division itself is performed in hardware. This allows the processor to detect a zero divisor before the operation itself is executed, causing a hardware exception. In our case, the operating system caught the exception and raised the floating point exception signal SIGFPE (don’t mind its somewhat misleading name). So just like with floating point numbers, the hardware division instruction has means in place to avoid dealing with actually dividing by zero. But what about processors without such dedicated hardware, like an 8-bit AVR or ARM Cortex-M0? After all, division isn’t a particularly new concept that was only made possible by modern processor technology.
If you think back to your school days, pen-and-paper division was mainly a combination of shifts, comparison, and subtraction in a loop. These are all basic instructions available in even the simplest processors, which allows them to replace division with a series of other instructions. AN0964 describes the concept for AVR if you’re curious. While there are also more sophisticated approaches, in some cases the division instruction isn’t any different behind the scenes, it just has dedicated hardware for it to wrap it in a single opcode.
However, the processor has no way of telling that a particular series of regular operations is actually performing a division that requires keeping an eye on the divisor not being zero. We’d have to manually check that ourselves before the division. If we don’t, it just happily keeps on looping and comparing for all eternity (or 1267+ years) until it finds a number that fulfills the impossible, i.e. it simply gets stuck in an infinite loop. Mechanical calculators are great devices to demonstrate this.
While this will leave the universe intact, it essentially renders your program unresponsive just like any other endless loop, which is obviously bad since it’s most likely supposed to handle other things. If you’re therefore performing division or modulo operations on an architecture without a hardware divider, and the divisor isn’t from a predefined set that guarantees it won’t be zero, make sure you check the value beforehand. Well, in fact, it’s probably a good idea even with the right hardware support. An instant crash might be better than a possibly undetected endless loop, but either way, your code won’t be doing what it’s supposed to do.
BACK TO THAT SIGFPE
One difference with receiving a hardware exception that turns into either an interrupt or a signal like SIGFPE is that we can act on it. Not to go too deep into the details: signals are a way to notify our running program that a certain event has happened, such as a segmentation fault (SIGSEGV), program termination (SIGTERM and the more drastic SIGKILL), or the previously encountered floating point exception SIGFPE, to name a few. We can catch most of these signals in our code and act upon them however needed. For example, if we catch SIGINT, we can shut down gracefully when CTRL+C is pressed, or simply ignore it.
This means we could catch a division by zero occurrence and, for example, save our current program state, or write some log file entries that let us reproduce how we got here in the first place, which might help us avoid the situation in the future. Well, time to write a signal handler then. Note that the code is slightly simplified, the sigaction man page is a good source for more information about masks and error handling.
```c
// zerodiv.c
#include <stdio.h>
#include <signal.h>

// SIGFPE callback function
// (printf() is not async-signal-safe, which is acceptable for a demo only)
void sigfpe_handler(int sig, siginfo_t *si, void *arg) {
    // print some info and see if it was division by zero that got us here
    printf("SIGFPE received at %p due to %s\n", si->si_addr,
            ((si->si_code == FPE_INTDIV) ? "div by zero" : "other reasons"));

    // do whatever should be done
}

int main(void) {
    int i = 123;
    struct sigaction sa;

    // set sigfpe_handler() as our handler function
    sa.sa_sigaction = sigfpe_handler;
    // make sure we get the siginfo_t content and reset the original handler
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    // don't block any additional signals while the handler runs
    sigemptyset(&sa.sa_mask);
    // set up the signal handling for SIGFPE
    sigaction(SIGFPE, &sa, NULL);

    printf("before: %d\n", i);
    i /= 0; // doing the nasty
    printf("after: %d\n", i);

    return 0;
}
```
If we compile it and run it, we can expect the division by zero warning, and then get an output like this:
```
$ ./zerodiv
before: 123
SIGFPE received at 0x55f2c333b208 due to div by zero
Floating point exception (core dumped)
$
```
Since we added the SA_RESETHAND flag, the program still gets terminated with the original exception. If we omit the flag, it won’t, but that doesn’t mean we would have successfully worked around the problem. On x86_64, returning from the handler re-executes the same faulting instruction, so the signal handler simply ends up in an endless loop, printing the message over and over again. We’d have to explicitly terminate the process by calling exit(int) in the signal handler.
On a side note, it appears that the ARM Cortex-A53 processor (the one you find on a Raspberry Pi) automatically resets the exception flag once handled, and therefore the program continues after returning from the signal handler, displaying after: 0. This suggests that the division by zero is defined to result in zero. I did not succeed in resetting the flag on x86_64, hence the endless loop, but that doesn’t necessarily mean it’s not possible, I simply wasn’t able to achieve it myself. However, there’s one other thing we can do on x86_64: skip the division.
SKIPPING THE DIVISION
Note the third void *arg parameter in the signal handler callback? It will contain the context of the moment the signal was received, which gives us access to the saved registers, including the instruction pointer, which will be restored when we leave the signal handler. If we disassemble our code, we will see that our division in this particular example is a 2-byte instruction:
```
$ objdump -d ./zerodiv |grep -2 div
    125b:       b9 00 00 00 00          mov    $0x0,%ecx
    1260:       99                      cltd
    1261: -->   f7 f9                   idiv   %ecx
    1263:       89 85 5c ff ff ff       mov    %eax,-0xa4(%rbp)
    1269:       8b 85 5c ff ff ff       mov    -0xa4(%rbp),%eax
$
```
If we increased the instruction pointer register RIP (or EIP on 32-bit x86) by two bytes, the execution would continue with the mov instruction after the idiv when we return from the signal handler. This kind of register tweaking sounds like a really bad idea, so you probably shouldn’t be doing something like this:
```c
#define _GNU_SOURCE // must come before any #include to expose REG_RIP (glibc)
#include <stdio.h>
#include <ucontext.h>
#include <signal.h>

// adjusted signal handler
void sigfpe_handler(int sig, siginfo_t *si, void *arg) {
    // cast arg to context struct pointer holding the registers
    ucontext_t *ctx = arg;
    // print some info
    printf("SIGFPE received at %p\n", si->si_addr);
    // add 2 bytes to the instruction pointer register
    ctx->uc_mcontext.gregs[REG_RIP] += 2;
}

// main() remained as it was
```
```
$ ./zerodiv
before: 123
SIGFPE received at 0x555c1ef5b243
after: 123
$
```
Tadaa, the division was skipped and the program survived. The value itself remained the same as before the attempted division. Obviously, any subsequent operation relying on the division’s result will most likely be useless and/or have unknown consequences. The same goes for the Raspberry Pi earlier that yielded zero as the result. Just because a random outcome was defined for an otherwise undefined situation doesn’t mean the underlying math was solved and everything suddenly makes sense. Failing fast and deterministically is oftentimes the better option.
So there we have it. You won’t create black holes when dividing by zero, but you won’t get much useful out of it either.