Bad memory handling and server stability
13th July 2009
The two graphs below (clickable) are for CPU and RAM use during a period of a program going wild between 23:17 and 23:41 (24+ minutes of server’s downtime). The program was run non-root, it just consumed all the memory it could. It was killed by kernel, so the server started responding without any interventions – which were hard to perform, because none of the services (including ssh) were responding during downtime.
If you happen to be developing a C/C++ program – do use mtrace and valgrind, those are huge helpers against the problems akin to that shown on the graphs.