Working around Litespeed’s mod_rewrite intermittent 404 Issue

I use mod_rewrite all over the place. Who doesn’t these days? IIS 7 even has a URL rewriting module that will convert your mod_rewrite rules. But I digress.

I noticed a few months back that I’d get an occasional 404 error for a page that I know exists and is handled by a mod_rewrite rule. Within a few seconds, the access logs would show a URL matching the same rewrite rule with a 200 OK response. If I hit the URL myself after being notified of the failure, it would work. I concluded there’s some sort of bug in Litespeed and escalated to the Fluid Hosting support folks. They told me they are “pretty sure that Litespeed developers are aware of the issue” and that it must be a known issue because a Google search for “htaccess rewrite problem litespeed” yields lots of results. I’m not so convinced; most of those results are people confused about how to configure mod_rewrite.

In any case, my problem is still not resolved, and I continue to receive emails when these 404s occur, probably a few each day. Not only would I like to reduce the noise, but I also want to avoid losing business when someone happens across a 404 error. So, this evening, I set out to create a workaround.

The hypothesis I’m testing with this solution is that the 404 error resolves itself automatically within about a second. The gist of the solution is to add a custom 404 error handler that spits out some JavaScript that retries the request and, if it receives a 200 response, replaces the body of the page with that content. Once the maximum number of retries is reached, it stops trying and displays an error message.

I initially tried implementing this retry on the server side to keep the client unaware of what was going on. Although I had a way around it, the risk of infinite recursion (and of consuming all the server’s threads) was greater than I wanted to accept: the 404 error handler calls the target page, which calls my 404 error handler again. Adding a header or query param to the request could certainly help avert this.

So … I’d share the entire code for this solution but it’s embedded in all the rest of my error handling mess that I’d rather keep to myself for now. I dream of polishing off every little pet project I’ve got, but in reality, I can’t see it happening anytime soon. So, I’ll give you an outline of what I did and hopefully you can adapt the solution to work for your needs. I did this with PHP but these concepts should transfer to any language.

Configure a Handler for 404 Errors

This is as simple as adding the following line to your .htaccess file. If you don’t know what .htaccess is, this solution isn’t for you … because you couldn’t really have gotten into this mess without it … unless WordPress or some other tool configured the problematic mod_rewrite rules for you.

ErrorDocument 404 /404.php

Determine your retry criteria

I don’t want to retry every request. I decided to restrict retries to GET requests matching a certain set of URL patterns that I know are handled by mod_rewrite rules.

function ShouldRetryRequest()
{
	$retryPatterns = array(
		"/blog\/.*/",
		"/errorSimulator/",
	);

	// Only retry GET requests. REQUEST_METHOD is not set when PHP runs from the command line.
	if (!isset($_SERVER['REQUEST_METHOD']) || $_SERVER['REQUEST_METHOD'] !== "GET") {
		return false;
	}

	// Only retry if it matches our pattern
	foreach ($retryPatterns as $pattern) {
		if (preg_match( $pattern, $_SERVER['REQUEST_URI'] )) {
			return true;
		}
	}		

	return false;
}

Return 200 OK from your error handler

When you’ve decided you want to retry the request, be sure to return 200 OK from your script. Many modern browsers, when they see a 404 response, show their own “friendly” error message that tries to give the user an idea of what happened but probably only confuses them further. Returning 200 OK ensures your content will be displayed and your script executed. If you’re not retrying this particular request, go ahead and return 404. The other possible advantage of returning 200 OK is that if a search engine is the “user” who happens to hit this scenario, it won’t remove the page from its index … then again, it will have totally different content associated with that page unless it evaluates and executes the JavaScript I provide a little further on, so maybe we don’t really gain anything there.

You’ll notice that my JavaScript retry code expects an error condition in order to retry; once it sees a 200 OK, it simply displays that content. If that content is again our error page with the retry script, you’ll get stuck in an infinite loop. So, be sure to keep returning a 404 for requests coming from the retry script. I return 200 OK only when the attempt query param, which the retry script adds, is not present. Because the web server reached my script through ErrorDocument, it takes care of returning the 404 when the following condition does not match.

$retry = ShouldRetryRequest();

// Original GET/POST params are not passed to the error document, so can't use $_REQUEST
if ($retry && strpos($_SERVER['REQUEST_URI'], "attempt=") === false) {
	header('HTTP/1.0 200 OK');
}

Hide your <body> at page load

I’d recommend putting display:none on your <body> tag. The code below will either replace the content if the 404 issue resolves itself, or remove the display:none to show the error text if the maximum number of retry attempts is reached.
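One thing to watch: if the body is hidden on every 404 page, a request that isn't being retried would never show its error text. Here is a minimal sketch of one way around that, assuming the retry script is only emitted for requests you've chosen to retry. The helper name renderBodyOpenTag is made up for illustration.

```php
<?php
/**
 * Hypothetical helper: build the opening <body> tag for the error document.
 * The body is hidden only when the retry script will run, so ordinary 404
 * pages (and the 404s served to the retry script itself) stay visible.
 */
function renderBodyOpenTag(bool $willRetry): string
{
    return $willRetry ? '<body style="display:none">' : '<body>';
}

// $willRetry would come from ShouldRetryRequest() plus the "attempt=" check.
echo renderBodyOpenTag(true), "\n";  // <body style="display:none">
echo renderBodyOpenTag(false), "\n"; // <body>
```

Visitors with JavaScript disabled would still see a blank page on retried requests, so this is a trade-off rather than a complete fix.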

Return the script that will do the retry

Your error document needs to return the code that will do the retry. This is the bit of code I whipped up this evening. Whereas I typically prefer to put my scripts toward the end of the document, I wanted to get the first retry fired off as quickly as possible, so I put this near the top of my <head> tag.

I decided to use jQuery to take advantage of its friendly AJAX APIs, so you’ll want to be sure to include the jQuery script ahead of this one.

var retryCount = 0;
var maxRetries = 5;
// How long to wait between retries, in milliseconds
var retryFrequencyMS = 200;

var theUrl = window.location.href;
var params = {
	attempt: retryCount,
	log404: false // This tells my error handler not to do what it would normally do with this particular request
};

function retry() {
	retryCount++;
	$.ajax({
		url: theUrl,
		data: params,
		dataType: 'html',
		cache: false,
		success: function(html) {
			// The page loaded this time, so replace the error document with it
			var newDoc = document.open("text/html", "replace");
			newDoc.write(html);
			newDoc.close();
		},
		error: function(jqXHR, status) {
			if (retryCount === maxRetries) {
				// The next request is the last one, so let the error handler treat it normally
				params['log404'] = true;
			}
			if (retryCount <= maxRetries) {
				params['attempt'] = retryCount;
				setTimeout(retry, retryFrequencyMS);
			}
			else {
				// Out of retries: reveal the error text once the body has loaded
				$(document).ready(function() {
					$(document.body).show();
				});
			}
		}
	});
}
retry();

Testing the retry logic

I put together a second simple PHP file that simulates the intermittent 404 error. The first time it is hit, it returns our error handling code. On subsequent requests, it responds with a 404 header until an attempt threshold is reached.
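A simulator along these lines might look like the following sketch. It is not my exact file: ATTEMPT_THRESHOLD and simulateResponse() are names made up for illustration, and it assumes the error document lives next to it as 404.php and that the retry script sends its counter in the attempt query param.

```php
<?php
// errorSimulator.php: a sketch of a script that fakes the intermittent 404.

// Hypothetical setting: retries with attempt >= this value succeed.
const ATTEMPT_THRESHOLD = 2;

/**
 * Decide what the simulator should do for a request.
 *
 * @param string|null $attempt Value of the "attempt" query param, or null if absent
 * @return string 'handler' (serve the error document), 'fail' (404), or 'ok' (200)
 */
function simulateResponse(?string $attempt, int $threshold = ATTEMPT_THRESHOLD): string
{
    if ($attempt === null) {
        return 'handler';
    }
    return ((int) $attempt < $threshold) ? 'fail' : 'ok';
}

// Only handle a request when running under a web server.
if (PHP_SAPI !== 'cli') {
    $raw = $_GET['attempt'] ?? null;
    switch (simulateResponse(is_string($raw) ? $raw : null)) {
        case 'handler':
            // First hit: behave as if the rewrite rule had failed.
            require __DIR__ . '/404.php';
            break;
        case 'fail':
            http_response_code(404);
            echo 'Simulated 404';
            break;
        default:
            echo '<html><body><p>Success!</p></body></html>';
    }
}
```

Keeping the decision in a small function makes the threshold behavior easy to check without a browser.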

Success!

Finally, hit the simulated 404 script from your browser. To simulate the server persisting in returning 404s for every retry attempt, change the attempt threshold from 2 to a value above your retry count.

http://localhost/errorSimulator.php

So … there it is. I’ve still got to put this code into production; it will be interesting to see if the volume of 404 error reports from this issue goes down. I’m also interested in how long it takes for Litespeed to recover from this issue. I’ll provide updates if I find better values for the retry interval and/or maximum number of attempts.

Let me know how this works for you, if I can answer any questions, or if you spot any errors.

An Update…

Saturday, 2:20 pm PST – So, an hour or two after implementing the solution I described above, I got a report about one of these 404 errors. The script retried the page, but the server persisted in ignoring the mod_rewrite rule. Certainly I could introduce a greater delay between retries. But then it struck me … since this issue really only occurs with rules targeting a single script, why not use my error handler to forward requests to that script internally, instead of with an HTTP request? The script exists on the file system, so I fake the PATH_INFO server variable, include the script, and I’m in business.
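Roughly, the forwarding could be sketched like this inside the error handler. The route table, the /blog.php file name, and resolveForward() are placeholders for illustration, not my real configuration, and the way PATH_INFO is derived from the URL is an assumption about how the target script reads it.

```php
<?php
// Hypothetical route table: URL prefix => script that the rewrite rule targets.
const FORWARD_ROUTES = [
    '/blog/' => '/blog.php',
];

/**
 * Work out which script should handle a request the rewrite rule missed.
 *
 * @return array|null [script, PATH_INFO], or null if no route matches
 */
function resolveForward(string $requestUri, array $routes = FORWARD_ROUTES): ?array
{
    // Drop the query string; only the path takes part in routing.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    foreach ($routes as $prefix => $script) {
        if (strpos($path, $prefix) === 0) {
            // Keep a leading slash, as PATH_INFO normally has one.
            return [$script, '/' . substr($path, strlen($prefix))];
        }
    }
    return null;
}

// Only handle a request when running under a web server.
if (PHP_SAPI !== 'cli') {
    $target = resolveForward($_SERVER['REQUEST_URI'] ?? '');
    if ($target !== null) {
        [$script, $pathInfo] = $target;
        $_SERVER['PATH_INFO'] = $pathInfo;
        // Override the 404 the server would otherwise send for an ErrorDocument.
        header('HTTP/1.0 200 OK');
        require $_SERVER['DOCUMENT_ROOT'] . $script;
        exit;
    }
}
```

Since the original GET params are not passed to the error document, a target script that reads $_GET would also need the query string parsed out of REQUEST_URI (for example with parse_str) before the include.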

Will continue to keep an eye on this and see how it works out.