Debugging Faulting Application w3wp.exe Crashes
Posted on May 31, 2011 by Author
If your websites are going down because the application pool is failing, or is being shut down automatically after a few application errors, then you have a major problem on your hands. In my case the symptom in my system logs was "A process serving the application pool suffered a fatal communication error with the Windows Process Activation Service". If each website runs in its own application pool then you know which website is going down, but the errors displayed give no hint as to the cause. I recently had a website that was going down at random: sometimes every few days, other times only once a month. I finally did some digging and found the cause. If you don't know where to start debugging your crashed application pools, this might help.
Why Does W3WP.exe Crash?
In my experience, crashes happen for only a few reasons. The first is an unhandled exception serious enough to bring down the application pool, which is rare but does happen. The second is a recursive function, or a set of functions that keep passing the ball to one another with nothing to stop them. That was the cause in my case: two functions got caught in a loop and ran out of control until the server bottlenecked on CPU or memory usage.
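To make the second failure mode concrete, here is a minimal Python sketch. The function names are made up for illustration and are not the ones from my site. Python stops runaway recursion with a RecursionError that can be caught, whereas a stack overflow in .NET cannot be caught and takes the whole w3wp.exe process down with it.

```python
def clean_input(value):
    # Hypothetical helper: hands off to report_problem whenever it sees a quote.
    if "'" in value:
        return report_problem(value)
    return value


def report_problem(value):
    # Hypothetical helper: cleans the value before logging it, which sends
    # control straight back to clean_input with the same, unchanged input.
    return clean_input(value)


def crashes(value):
    """Return True if processing `value` recurses until the stack is exhausted."""
    try:
        clean_input(value)
        return False
    except RecursionError:
        return True
```

Ordinary input never reaches the second function, which is why a loop like this can sit unnoticed until someone sends the one input that triggers it.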
How do you know when your Application Pool Crashed?
Easy: check the Application log under Windows Logs in the Event Viewer.
The red exclamation points tell you there was an Application Error.
But the details of the error don't help us very much. See below:
Faulting application name: w3wp.exe, version: 7.5.7601.17514, time stamp: 0x4ce7afa2
Faulting module name: KERNELBASE.dll, version: 6.1.7601.17514, time stamp: 0x4ce7c78c
Exception code: 0xe053534f
Fault offset: 0x000000000000a49d
Faulting process id: 0x%9
Faulting application start time: 0x%10
Faulting application path: %11
Faulting module path: %12
Report Id: %13
These errors are followed by an Application log entry from Windows Error Reporting.
The details of this message tell us where the system dump was stored. Take note of this, as this will help you find the dump files you need.
Fault bucket , type 0
Event Name: APPCRASH
Response: Not available
Cab Id: 0
These files may be available here:
Rechecking for solution: 0
Report Id: 215432e8-8bba-11e0-8d49-0030482e5b77
Report Status: 4
Check your System Logs
Next we go to the System log in the Event Viewer. If you are running websites on your server then your System log is likely to have a great many entries. Every time an application pool starts or shuts down it makes an entry in the System log. This is nothing to be concerned about. Many websites are set to shut down their application pool after the site hasn't had any visitors for more than 20 minutes. This frees up memory for other sites that do have traffic.
But any entry marked with the yellow Warning icon is not good.
In my case it told me which website was throwing the error and gave me the following message:
A process serving application pool 'yourwebsite.com' suffered a fatal communication error with the Windows Process Activation Service. The process id was '10452'. The data field contains the error number.
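If you have many of these warnings to go through, the pool name and process id can be pulled out of the message text. A rough Python sketch, assuming the message wording matches the one above exactly (the function name and pattern are my own, not part of any Windows tool):

```python
import re

# Pattern written against the warning text quoted above; other wordings will not match.
WAS_WARNING = re.compile(
    r"application pool '(?P<pool>[^']+)'.*?process id was '(?P<pid>\d+)'",
    re.DOTALL,
)


def parse_was_warning(message):
    """Return (pool_name, pid) from a WAS warning message, or None if it does not match."""
    match = WAS_WARNING.search(message)
    if match is None:
        return None
    return match.group("pool"), int(match.group("pid"))


sample = (
    "A process serving application pool 'yourwebsite.com' suffered a fatal "
    "communication error with the Windows Process Activation Service. "
    "The process id was '10452'. The data field contains the error number."
)
```

Calling parse_was_warning(sample) gives back the pool name and the process id as a number, which makes it easy to count crashes per website.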
Retrieving the W3WP.exe Dump Files
Of the dump files, the one you want is the biggest one.
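Picking the largest file can also be scripted. A small Python sketch, assuming you pass in whatever folder your Windows Error Reporting entry pointed to (the function name is made up, and it looks at every file in the folder rather than at a particular extension):

```python
from pathlib import Path


def largest_dump(folder):
    """Return the largest file in `folder` as a Path, or None if it holds no files."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    if not files:
        return None
    # Compare by size on disk; the full dump is normally far bigger than the rest.
    return max(files, key=lambda p: p.stat().st_size)
```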
First, if you don't have WinDbg on your system, you need it.
This program will take your dump files and give you the stack trace, which can shed some light on where the problem lies.
Unfortunately, Microsoft has recently decided to put WinDbg into a huge 600MB+ ISO package, forcing you to install the entire package to get just one little program. So after some hard searching I found the raw installers: the Windows WinDbg x86 installer and the Windows WinDbg x64 installer.
Huge thanks to RXWEN for keeping these files available.
In my case, since I am running Windows Server 2008 R2 (which only comes in 64-bit), I installed the 64-bit version of WinDbg. After you install it, you will find it under Debugging Tools for Windows (x64). Also take note that I am running WinDbg on my production server. You could also download your dump files to your local machine and debug them there, but to keep things easy, in case I ever needed to do some debugging over a Remote Desktop Connection from my cell phone, I figured running it on my production server was ideal.
After you open the program, click
File > Open Crash Dump
Then select the dump file from the directory you noted in the Windows Error Reporting entry.
Now you should see a screen like this
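In the command line at the bottom of the screen, type
.loadby sos mscorwks
and press Enter. This loads the SOS debugging extension that sits next to the .NET runtime module mscorwks, which is the runtime used by .NET 2.0 through 3.5. For a .NET 4 application the runtime module is named clr, so the command would be .loadby sos clr. The result should look similar to this
Next, type the following into the command line
!clrstack
and press Enter. You should get something like this, showing you the stack trace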
What you are looking for is either a single function being called over and over, or maybe two or three functions running in a loop. At the top of the stack I don't see anything, so I scroll down a bit
and the cause is right before our eyes. In my case I have two custom functions that seem to be passing control back and forth with no end in sight. The first is fixquotes, and the second is blacklistVisitor. fixquotes is a function that just cleans up strings before they are added to MySQL queries, and blacklistVisitor is a function that I set up to blacklist IPs that were attempting something bad (like SQL injection or denial-of-service attacks).
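On a long stack trace, the pattern described here (one method, or a small group of methods, repeating back to back) can be checked mechanically. A Python sketch, assuming you have already copied the method names out of the !clrstack output into a list, top of the stack first. The function name and the thresholds are my own choices:

```python
def find_repeating_cycle(frames, max_cycle=3, min_repeats=5):
    """Return the shortest group of frames that repeats back to back, or None.

    `frames` is a list of method names, top of the stack first. A group only
    counts if it appears at least `min_repeats` times in a row.
    """
    for size in range(1, max_cycle + 1):
        # Slide over every position that still leaves room for min_repeats copies.
        for start in range(len(frames) - size * min_repeats + 1):
            cycle = frames[start:start + size]
            window = frames[start:start + size * min_repeats]
            if window == cycle * min_repeats:
                return cycle
    return None
```

For a trace shaped like mine, with a few unrelated frames on top followed by fixquotes and blacklistVisitor alternating, this returns those two names as the cycle.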
Unfortunately, until a hacker attempted something bad, I had no idea that these two functions could get stuck in a loop. Of course, a single instance wouldn't cause the site to go down. But if the same hacker tried over and over again, this could create multiple errors, at which point IIS would assume something was wrong and shut the site down.
Conclusion to finding errors in Crashed Application Pools
Most likely you have a recursive function that is calling itself over and over, or several functions that are stuck in a loop. I hope the steps I went through will help others find errors in their code.