Updated: August 13, 2010
Windows Blue Screen of Death (BSOD) is not something you want to see on your computer monitor, unless you're using a certain screensaver or testing software. But now and then, Windows users do experience the ultimate software failure case, that of the kernel itself, which results in a complete system freeze and eventually a crash.
In Linux, this kind of situation is known as kernel panic. In Windows, it is called BSOD. But it amounts to the same thing: a critical, unrecoverable exception in the core of the system, the kernel and accompanying drivers.
After doing a super-long and ultra-geeky series on Linux crash, starting with the kernel crash dump tools, continuing with setups on openSUSE and CentOS and culminating with in-depth analysis, I'd like to offer Windows users a somewhat shorter and less geeky version of BSOD analysis. There are two reasons for this: one, I cannot go as deep as I'd like to, because Windows sources are closed; two, I am not as proficient in dabbling in the Windows kernel as I am with Linux. Regardless, this tutorial will still be fairly nerdy and far beyond the requirements, needs or desires of an average user. However, if you wish to learn a little more about Windows internals and acquire new skills that should help diagnose core system problems, you've come to the right place.
Now let's begin.
Table of Contents
- Enable BSOD collection
- BSOD collection
- BSOD diagnosis
- BSOD example
- BSOD analysis
- Additional stuff
Before we dig into tech lingo, let's answer a few questions regarding BSOD.
There is no simple answer. A long-time Windows user will have learned by now that some people seem to experience BSOD very frequently while others do not see them at all, even though they are all running pretty much the same Windows.
The wide spectrum of experience stems from the fact that BSOD are usually not caused by Microsoft Windows components. While most people like to blame Windows for the crashes, they are rarely if ever caused by the operating system itself.
My personal experience is as follows:
On two Windows machines with a total uptime of 10 years, I've encountered BSOD only twice, once on each host. As it happens, the two crashes occurred less than one hour apart.
You might assume this was caused by a recent Windows update or one of the programs running. Indeed, this seems like a good lead, especially considering the two machines have nearly identical hardware and software setups. But this was no software fault.
What really happened was that the graphic cards overheated. As simple as that. It was a terribly hot day and the graphic cards exceeded their normal temperature range.
All other cases of BSOD I encountered on other machines were related to bad, malfunctioning or buggy drivers installed by either software or hardware, including the Wireless card, security programs and similar. I've never seen one of the core Windows components squeak. Just to clarify, we're talking home use here.
Now that we know what we're talking about, let's get scientific.
To make good use of the built-in system tools, you need to enable your Windows to collect crash dumps, called minidumps. This is similar to enabling LKCD or Kdump in Linux. By default, Windows kernel memory dumps are enabled, so you just need to take a look and make sure the settings are correct.
You can configure the BSOD collection by right-clicking on Computer in the Explorer menu, Properties, System, Advanced, Startup and Recovery. In the bottom half, under System failure, you need to configure the parameters.
You can choose whether to log the crashes to the system log, automatically restart the system or overwrite existing files. I'd recommend not overwriting files. Retaining information for later comparison and analysis is always a good thing.
Another thing to pay attention to is the Write debugging information dropdown box. Here, you can specify what portions of memory you want to save when the machine crashes. This is very similar to Kdump DUMPLEVEL.
You have the following options:
Small memory dump - Only the basic file containing crash information. This is of limited value, since you have no trace of the executables and DLLs loaded into the memory. On Windows XP, this file is 64K in size. On Windows 7, it's 128K.
Kernel memory dump - This will dump the portion of the memory containing the kernel only, which should be sufficient in most cases, as kernel crashes will be caused by either a kernel bug or one of the drivers.
Complete memory dump - This will dump the entire contents of the RAM.
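The same choices end up in the registry, under the CrashControl key, so you can also check them without clicking through the dialogs. Here's a minimal Python sketch, assuming the standard CrashDumpEnabled, AutoReboot and Overwrite values; the function names are mine and purely illustrative, and newer Windows versions may add dump types not listed here.

```python
# Values of CrashDumpEnabled under
# HKLM\SYSTEM\CurrentControlSet\Control\CrashControl
DUMP_TYPES = {
    0: "None",
    1: "Complete memory dump",
    2: "Kernel memory dump",
    3: "Small memory dump (minidump)",
}

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"


def describe_dump_type(value):
    """Translate a CrashDumpEnabled value into the label used in the dialog."""
    return DUMP_TYPES.get(value, "Unknown value %d" % value)


def read_crash_control():
    """Read the live settings from the registry (works on Windows only)."""
    import winreg  # imported here so the rest of the file loads anywhere

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        dump_type, _ = winreg.QueryValueEx(key, "CrashDumpEnabled")
        auto_reboot, _ = winreg.QueryValueEx(key, "AutoReboot")
        overwrite, _ = winreg.QueryValueEx(key, "Overwrite")
    return {
        "dump_type": describe_dump_type(dump_type),
        "auto_reboot": bool(auto_reboot),
        "overwrite": bool(overwrite),
    }
```

On a Windows box, calling read_crash_control() should return a small dictionary mirroring the three settings discussed above.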
Once you're satisfied with the settings, click OK. We can now start analyzing.
Finding the root cause of the crash may not be easy. Hardware problems can and will cause erratic and unpredictable behavior that may manifest in various symptoms, without letting you pinpoint the issue. Software problems should be easier to diagnose.
If and when BSOD strikes, your first task should be trying to isolate the problematic components and get them to trigger the BSOD again. If you can replicate the problem, you will be able to solve it.
If you're trying to troubleshoot a BSOD, you should use Driver Verifier.
Driver Verifier is a powerful tool and can do lots of stuff, like run drivers in an isolated memory pool, without sharing memory with other components, apply extreme memory pressure, validate parameters, check driver unloading, and more.
To start using Verifier, locate the executable and start it. On Windows XP, click Start > Run > Verifier. On Windows 7, type Verifier in the inline search box and hit Enter.
Once Verifier is started, you will need to configure it. You can create standard settings, which should work for most people, or configure custom settings, mainly useful for code developers. You can also display tasks, delete tasks or display information for currently verified drivers.
In the next menu screen, you need to choose which drivers you want to check: unsigned drivers, drivers built for older versions of Windows or all drivers. Since you do not yet know what the problem is really about, select all.
The next step is to reboot. Verifier will consume a lot of CPU and slow down the machine considerably. You may also experience additional crashes. Verifier will disable faulty drivers in between BSOD and reboots until you finally reach the desktop. You will also possibly have collected a handful of minidumps.
You can now disable Verifier. Start the application and delete the existing settings.
Furthermore, if your machine cannot boot into desktop because of Verifier, you can disable the tool by launching the Last Known Good configuration or booting into Safe mode.
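Verifier can also be driven from the command line, which is handy if you want to script the steps above. The following Python sketch only builds the command lines; it assumes the /standard, /all, /driver, /querysettings and /reset switches of verifier.exe, and the helper name verifier_command is made up for this example. Check verifier /? on your system before relying on it.

```python
def verifier_command(action, drivers=None):
    """Return the argument list for a Driver Verifier command line.

    action:  "enable", "status" or "disable"
    drivers: optional list of driver file names; None means all drivers
    """
    if action == "enable":
        # Standard settings, either for everything or for named drivers only
        target = ["/all"] if not drivers else ["/driver"] + list(drivers)
        return ["verifier", "/standard"] + target
    if action == "status":
        return ["verifier", "/querysettings"]
    if action == "disable":
        return ["verifier", "/reset"]
    raise ValueError("unknown action: %r" % action)


if __name__ == "__main__":
    # Print the commands instead of running them; they need an elevated
    # prompt and a reboot to take effect.
    for action in ("enable", "status", "disable"):
        print(" ".join(verifier_command(action)))
```

Printing rather than executing is deliberate: enabling Verifier changes how the machine boots, so you probably want to review the command first.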
OK, minidumps collected, let's analyze.
To diagnose the minidumps, you will need a number of tools. If you recall the Linux tutorials, the pre-requisite to using crash was having the kernel compiled with debug symbols and debuginfo packages installed.
Well, Windows is no different. To make a proper analysis, you will need symbols. You will need to download and install the symbols that match your Windows kernel version exactly. Otherwise, the analysis will not be accurate. Again, no different than Linux, in this regard.
I will show you an example of this later.
After the symbols are installed, you will need a tool to read and interpret the crash data. I will show you three such tools, starting with the easiest and slowly climbing up the geek hill.
WhoCrashed is a simple, effective tool that lets you find out which drivers caused the machine crash. It is very simple to use and does not require expertise, although a proper analysis does. The tool requires the Windows Debugger to be installed.
If you're even semi-serious about Windows, you should have heard about Nirsoft tools, an extremely versatile collection of Windows utilities developed and maintained by Nir Sofer. In particular, we want the diagnostics tool called BlueScreenView, which is used for analyzing Windows kernel memory dumps. Best of all, this useful little tool is included in the super-powerful Swiss Army knife style toolbox called Nirlauncher, which is a collection of more than a hundred applications developed by Nir. Not without a reason does Nir's website feature on my Greatest list.
Furthermore, Nir Sofer also has a tool for initiating BSOD, so you can simulate crashes. This tool is called StartBlueScreen and is included in the Nirlauncher package. For more details about Nirlauncher, you may want to read my review of the software.
We'll talk about both these programs very soon.
Windows Debugger is a multi-purpose tool, which you can use to troubleshoot all kinds of things, including drivers, applications, and services on Windows systems.
Windows Debugger is included in the Windows SDK.
On Windows 7, when installing the Debugger, you may get a .NET Framework 4 error. You can ignore it, as long as you're not trying to work with applications developed in .NET framework. In our case, we can safely proceed.
In the installation menu, you can choose which components you want. We want the Debugging Tools for Windows, under Common Utilities.
Windows Debugger does not look very interesting when launched, but it's a mighty tool that takes quite a lot of time to get used to and work with properly. It's similar to GDB in Linux, since it can be used to examine sources, attach to running processes, examine kernel dumps, and more.
To see these tools working, we need a BSOD. Running Verifier on my Windows 7 machine produced no ill effects. So we will have to try NirSoft StartBlueScreen tool, which I've mentioned earlier.
StartBlueScreen is a command line tool. It needs to be run with a number of parameters, which will in turn trigger a BSOD. To execute BSOD, you need to use the Administrator account on your Windows box.
On Windows 7, enabling the hidden administrator account might be a little tricky, but we will have a separate tutorial for that soon. In fact, doing the same thing on Windows XP is not trivial either. Again, we shall discuss this separately.
Once logged in as administrator, run StartBlueScreen from the command line. Nir Sofer lists a number of examples on his website, so we will use one of those:
StartBlueScreen.exe 0x12 0 0 0 0
This is very similar to running echo c > /proc/sysrq-trigger on Linux, with System Request enabled. Indeed, after a few seconds, you should see the infamous BSOD:
Let the machine complete the dump. After it comes up, we can analyze the crash. You may see a message pop up after the machine reboots. There is already a hint about what happened, more details coming soon.
Let's see what each of the three tools gives us.
You get a very simple drilldown of what happened. For most people, this information is sufficient to get started. Knowing the name of the offending driver can help isolate the issue.
We see that the error is an unknown kernel trap caused by the nirsoftbluescreendriver.sys driver. Well, this is to be expected. I guess Nir's code is similar to my null-pointer kernel driver example.
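The unknown kernel trap is simply the meaning of the first number we passed to StartBlueScreen: bug check 0x12 is TRAP_CAUSE_UNKNOWN, and the four zeros are its parameters. Here's a small Python sketch that decodes such a command line. The table holds only a handful of well-known codes, and the function names are my own, for illustration.

```python
# A handful of well-known bug check codes; the Debugger help has the
# complete reference.
BUG_CHECKS = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x12: "TRAP_CAUSE_UNKNOWN",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x7E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}


def parse_bugcheck(command_line):
    """Split '<program> <code> <p1> <p2> <p3> <p4>' into numbers."""
    parts = command_line.split()
    if len(parts) != 6:
        raise ValueError("expected a program name, a code and four parameters")
    # int(text, 0) accepts both hexadecimal (0x12) and decimal (0) notation
    code, *params = (int(p, 0) for p in parts[1:])
    return code, params


def stop_line(code, params):
    """Format the values in the style of the STOP line on the classic
    blue screen, with the symbolic name appended for convenience."""
    args = ",".join("0x%08X" % p for p in params)
    name = BUG_CHECKS.get(code, "UNKNOWN")
    return "*** STOP: 0x%08X (%s) %s" % (code, args, name)


if __name__ == "__main__":
    code, params = parse_bugcheck("StartBlueScreen.exe 0x12 0 0 0 0")
    print(stop_line(code, params))
```

The same lookup works for the Bug Check Code shown by any of the tools, as long as the code is in the table.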
BlueScreenView offers more detailed information. It will automatically load minidump files found in the root folder. In the top view, you will see some basic information about the crash, including the Bug Check String, which is identical to Panic String in Linux crash analysis file, and Bug Check Code, which is similar to Kernel Page Error.
In the bottom pane, you have the list of all drivers loaded in memory, with those related to crash marked in salmon - I guess that's a color name. If you want to see only the call trace for the crash process, you can change the filter in the Options menu.
And you can also load the original BSOD screen (XP style):
This can be sometimes useful. For example, take a look at the Technical Information section. You have the name of the bad driver and the memory address. To use Linux analogy once more, this is like the exception RIP in the task backtrace. Indeed, let's focus on the stack:
We have the name of the executable and the memory address. In theory, if we had the sources, we could pinpoint the exact line in code that resulted in the kernel crash. Since we do not, the best you can do is collect as much data as you can and send the information to Microsoft for further analysis. We'll talk about that soon.
In general, Microsoft will issue patches for crashes in Microsoft components, so the question is, how do you know if ntkrnlpa.exe is a Microsoft component? Well, if you double-click on any one entry or right-click and choose Properties, you'll get detailed information.
As you can see, working with Windows crashes is not that different than working with Linux. However, you will probably want to know what happened exactly, so you will need the sources, which are not always readily available. Then again, this is not always possible on Linux either, especially if you have proprietary drivers loaded into the kernel, like Nvidia. Please note the file version - this is important when we want to use the symbols, which we will soon see in action.
Windows Debugger is the most complex and most powerful of the three tools mentioned. Before we start, you should be aware that it takes time, patience and knowledge to work with the Debugger. In fact, despite my bravado, I'm fairly inexperienced with the tool, although common sense and universal knowledge when it comes to crash analysis apply well here. If you know your business down the murky trails of code in one system, you'll do fine in all others.
If you're not working as the Administrator, you will not have permissions to access the memory dumps, for obvious security reasons. You may need to copy the file away or set the correct permissions.
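Copying the file away is easy to script. The Python sketch below lists the dumps in a folder, newest first, and copies the latest one to a working directory. It assumes the default small dump location of C:\Windows\Minidump; the function names and the working folder are hypothetical, and you will still need sufficient rights to read the source folder.

```python
import shutil
from pathlib import Path

# Default location for small memory dumps; kernel and complete dumps
# normally go to %SystemRoot%\MEMORY.DMP instead.
DEFAULT_MINIDUMP_DIR = Path(r"C:\Windows\Minidump")


def list_dumps(folder):
    """Return the .dmp files in folder, newest first."""
    dumps = [p for p in Path(folder).iterdir() if p.suffix.lower() == ".dmp"]
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


def copy_latest_dump(folder, workdir):
    """Copy the most recent dump into workdir and return the new path.

    Returns None when the folder holds no dumps at all.
    """
    dumps = list_dumps(folder)
    if not dumps:
        return None
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(dumps[0], workdir / dumps[0].name))
```

For example, copy_latest_dump(DEFAULT_MINIDUMP_DIR, r"C:\dumps") would leave a private copy that you can open in the Debugger without touching the original.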
The first thing you need to do is load symbols. The tool may not be aware of the symbols location on the disk, as the path may not be stored in the environment variables. To emphasize the point, I'll load the crash dump without specifying the symbols.
Notice the error string: ERROR: Module loaded completed but ... This is what happens when no symbols are loaded, making the analysis rather futile. Indeed, if you run analysis, you'll get a handful of question marks, since the debugger cannot guess how the driver was mapped in memory at the time of the crash.
You can check the current symbols path by executing the .sympath command. It will be empty unless you've loaded any symbols. We will now load the symbols.
After loading symbols, you do not need to reopen the minidump file. You merely need to reload it. You can do that by checking the Reload box in the Symbol Search Path window or by running .reload in the debugger command line, marked by kd> at the bottom of the command window.
You will now see a different output:
Running analysis is done by executing the !analyze -v command. The -v flag stands for verbose.
You will now see more information, including detailed strings for the crash arguments. For most people, this is way, way above their basic needs, but if you're really into controlling your system, solving problems and even helping Microsoft fix core bugs, then you will spend a few minutes running the analysis and send a crash report, if possible.
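If you have a pile of dumps, you may prefer to run the same analysis without the GUI. The console debuggers shipped with the Debugging Tools (kd and cdb) accept a dump file with -z, a symbol path with -y and an initial command with -c. Here's a hedged Python sketch that builds such a command line; the helper name and the example paths are assumptions, and it expects the debugger to be on your PATH.

```python
def analyze_command(dump_file, symbol_path, debugger="kd", log_file=None):
    """Build the argument list for a one-shot '!analyze -v' run.

    dump_file:   path to the .dmp file to open (-z)
    symbol_path: symbol search path handed to the debugger (-y)
    log_file:    optional file that receives a copy of the output (-logo)
    """
    command = [debugger, "-z", str(dump_file), "-y", symbol_path]
    if log_file:
        command += ["-logo", str(log_file)]
    # Run the verbose analysis, then quit so the call returns on its own
    command += ["-c", "!analyze -v; q"]
    return command


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the shape of the command
    cmd = analyze_command(r"C:\dumps\crash.dmp", r"c:\symbols")
    print(" ".join(cmd))
    # To actually run it on a machine with the Debugging Tools installed:
    #   import subprocess
    #   subprocess.run(cmd, check=False)
```

Keeping the command as a list means you can pass it to subprocess.run() without worrying about quoting the semicolon or the spaces.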
Symbols do not match the kernel!
Now, this is something that you should pay attention to. If you load the wrong symbols, the information about the crash will be wrong. Indeed, if you have downloaded symbols that are either older or newer than your kernel version, you will have a problem.
This is similar to the Linux example, of not having the debuginfo package available in the repository on openSUSE 11.2 after the kernel update. In fact, I did encounter this problem. Let's go back to symbols installation:
The symbols are for kernel 7600.16385, which, if I'm not mistaken, is RTM. Notice the timestamp and the exact revision - 090713-1255. On the other hand, the Windows 7 machine is running a newer kernel, plus it has undergone a number of updates, which, too, could have affected the kernel version.
The version is 7600.16481. The two do not match! If you encounter a case like this and cannot download a newer, more up to date version of kernel symbols, you should contact Microsoft for support. You will most likely not have symbols for third-party drivers. You could contact third-party vendors, as well.
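The comparison itself is trivial, but it's easy to misread two long version strings by eye. A minimal Python sketch, assuming the versions are written as build.revision with optional extra fields after them; the function names are illustrative only.

```python
def parse_build(version):
    """Turn '7600.16385' (or a longer dotted string starting with it)
    into the tuple (7600, 16385)."""
    build, revision = version.split(".")[:2]
    return int(build), int(revision)


def describe_mismatch(symbols_version, kernel_version):
    """Say how the installed symbols relate to the running kernel."""
    symbols = parse_build(symbols_version)
    kernel = parse_build(kernel_version)
    if symbols == kernel:
        return "symbols match the kernel"
    if symbols < kernel:
        return "symbols are older than the kernel"
    return "symbols are newer than the kernel"


if __name__ == "__main__":
    # The two versions from the example above
    print(describe_mismatch("7600.16385", "7600.16481"))
```

With the two versions from my machine, the verdict is that the symbols are older than the kernel, which is exactly the problem described above.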
Now, Windows-wise, here's what you need to do to obtain the latest symbols for your operating system. After loading the crash dump in the Windows Debugger, open the Symbol Search Path window again. In addition to the local path, we will specify an online symbols repository, which is only accessible from within the Debugger. For more information, please see this Microsoft KB article.
Specifically, you want the following:
Replace c:\symbols with the correct symbols path on your machine. The path does not need to be input using the Symbol Search Path. It can also be specified on the command line using the .sympath command. We'll discuss other Debugger commands and options very soon. And let's run the analysis again. Of course, we won't have symbols for Nirsoft driver.
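For reference, a symbol server entry in the Debugger takes the form SRV, then a local cache folder, then the server address, separated by asterisks. The Python sketch below assembles such a string, assuming Microsoft's public symbol server at msdl.microsoft.com; the helper names are mine, and the local folder is whatever you choose on your machine.

```python
# Microsoft's public symbol server (assumed default for this sketch)
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_path(local_cache, server=MS_SYMBOL_SERVER):
    """Build a symbol server entry: SRV*<local cache>*<server>."""
    return "SRV*%s*%s" % (local_cache, server)


def sympath_command(local_cache, server=MS_SYMBOL_SERVER):
    """Return the line to type at the kd> prompt to set the path."""
    return ".sympath " + symbol_path(local_cache, server)


if __name__ == "__main__":
    print(sympath_command(r"c:\symbols"))
```

The same string can also be stored in the _NT_SYMBOL_PATH environment variable, so the Debugger picks it up on every start.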
To make it more fun, here's the call stack (more about that soon):
And we're good!
Luckily for you, the Windows Debugger has an extremely rich and detailed help, which should get you going in no time, provided you like this kind of stuff. And if you're familiar with Linux crash analysis, most of the stuff will be familiar.
For example, a very useful command is lm (list modules). You can run lml to get a short list of modules or lmv for a complete, verbose listing. You can also list user-land modules with the u flag or the kernel modules with the k flag.
Here's an example of lmv:
Here's an example of lml; notice that some drivers do not have symbols, namely the third-party ones, since the Windows kernel has not been compiled with these symbols, nor are they available. You may get them from the vendor.
Under the View menu, you have a handful of commands built in, so you need not hunt them on the command line. You have Watch, Locals, Registers, Memory, Call Stack, which we've seen a short while ago, and more.
For example, you may want to display the Processes and Threads.
Other commands you may want to use include !memusage and !address. The combination of commands and options we have just seen is quite similar to bt, ps and other commands used by the crash utility. The overall idea is the same.
Even if you do not have sources, you may want to see the binary code disassembled. You may not fully understand what goes on in the code, but it could give you an indication of what went wrong.
The disassembly options, as well as many others are available in the menus.
Here we go:
And you can embed the different windows into the main interface.
This barely scratches the tip of the iceberg of what Windows Debugger can do, but I guess it should be enough for most people.
If you've isolated the source of the problem, you can try several things:
Uninstall or disable bad drivers
See if this makes any difference, that is, if you can, since you may lose critical functionality. If problems persist, you may have a complex problem related to hardware.
Try updating the drivers
This might work. Head to the vendor site or Microsoft Update and obtain the latest drivers for your hardware and software. Remember to back up your data and image the system, so you have a baseline to go back to.
Google out the information
Always a wise move. Looking for the name of the driver or the Bug Check String could yield useful information, including workarounds for the problem. In general, someone ought to have seen or heard or experienced something similar to your issue.
As always, filter out the data carefully and with discretion. If you do decide to try some of the suggested solutions, make sure your data is safe and that you can roll back to a good, known configuration.
If you have useful crash information, you should try sending it to the developers for analysis. This could be Microsoft or a third party developing hardware or software drivers for Microsoft Windows. See below.
I do not have anything solid here. If you have suggestions, please send them. I did try a number of Microsoft links, but they seem to be out of bounds for the casual users.
The most relevant page is oca.microsoft.com, but it seems to suffer from server-side errors. Feel free to correct me and/or send your feedback and links.
Note: Sending crash dumps is a sensitive affair! Memory dumps can contain private information, including passwords and just about anything else loaded into memory at the time of the crash. Please be aware of this before uploading or mailing your crash data.
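If you want to see for yourself how much readable text sits in a dump before you send it anywhere, a quick scan for printable strings is enough. This is a rough Python sketch in the spirit of the Unix strings utility; the function name and the six-character threshold are arbitrary choices of mine. It only looks for plain ASCII, so text that Windows keeps as UTF-16 will be missed, and the real exposure is likely larger than what it shows.

```python
import re

# Runs of six or more printable ASCII characters
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")


def readable_strings(data, limit=20):
    """Return up to `limit` printable ASCII strings found in raw bytes."""
    found = []
    for match in PRINTABLE_RUN.finditer(data):
        found.append(match.group().decode("ascii"))
        if len(found) == limit:
            break
    return found


if __name__ == "__main__":
    # A made-up byte sequence standing in for a slice of a memory dump
    sample = b"\x00\x01password=hunter2\x00ab\x00C:\\Users\\me\xff"
    for text in readable_strings(sample):
        print(text)
```

To try it on a real file, read a chunk with open(path, "rb").read(1024 * 1024) and pass the bytes to readable_strings().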
If you're facing intermittent hardware problems, you may want to run a memory test on your machine. The most popular open-source tool is Memtest86+. The tool can be used as a standalone ISO; it also comes included with the vast majority of Linux distributions, all bootable as live CDs.
You can also use Windows Memory Diagnostic.
Furthermore, general advice listed in Linux crash also applies here!
And that would be all, gentlemen!
Nirsoft Nirlauncher (my review)
Retrieve symbols from online server live from within Windows Debugger:
Microsoft articles on kernel memory dump analysis:
Useful articles, including registry tweaks, keyboard use a-la System Request (SysRq), command line use of the Windows Debugger and batch (scripted) use.
A great article by Mark Russinovich (Sysinternals, now part of Microsoft):
And don't forget the built-in help in Windows Debugger! It's very thorough and detailed. Last but not the least, we go back to Internet search engines, your free best friend in all situations.
Wow, that was long - and far geekier than I had anticipated. Apparently, you can't escape super-geeko when handling kernel stuff. Nevertheless, I do hope you've enjoyed this article.
It has quite a bit of everything: kernel memory dump setup, verification of drivers, and three tools for examining kernel crashes, from the very simple WhoCrashed all the way up to the powerful Windows Debugger. We also used Nirsoft tools to both trigger and then analyze the BSOD. We made sure our machine had symbols installed. We looked more deeply into what Windows Debugger offers us, covering several commands and options. Lastly, some generic tips and a wealth of links.
I don't think you'll find that many articles as friendly as this one, especially not one written by a Linux guy, for the benefit and pleasure of Windows users. And therein lies the secret. The syntax is different, but the basic principles are identical. Once you get the hang of either Linux or Windows kernel crash analysis, you'll be far more comfortable working with the other. And that would be all. I'm spent. See you around!