Linux Kernel Debugging on Google-Sized Clusters

Martin Bligh (mbligh@mbligh.org)

This paper will discuss the difficulties and methods involved in debugging the Linux kernel. An intermittent error that occurs once every few years on a single machine is hard to debug, but it becomes a real problem when running across thousands of machines simultaneously. The more we scale to very large clusters, the more critical reliability becomes. In such environments, many of the normal debugging luxuries (such as a serial console, or any physical access) are gone, and we are forced to adopt a different strategy to solve thorny intermittent race conditions.

We need (and have created) powerful but lightweight kernel tracing tools. They are critical for cluster debugging, but they are also powerful weapons in a smaller-scale environment, where they can help debug issues more quickly and less intrusively. Real-world usage examples will be included.