Tuesday, November 08, 2005

Gaining total control of URL rewriting for ASP.NET/IIS (or mono)

Lately I have often wanted to handle "nice" URLs in my web application by forwarding those requests on to their actual URLs. ASP.NET offers HttpContext.RewritePath, which you can call inside your Global class (an HttpApplication descendant) to change one URL into another.  But what about requests that IIS never hands to ASP.NET because the URL contains no .aspx or similar extension?  This post discusses what I have recently learned about taking total control over URL rewriting, without any third-party software at all.

The first step is to insert some code into your Global class to check for the incoming URLs you want to rewrite.  This can be done in Application_BeginRequest or Application_Error.  The latter is good when you only want the special handling to occur after ASP.NET has already tried, and failed, to find the requested URL.  If you expect these rewritten URLs to come in frequently, you will probably want to use Application_BeginRequest.

Suppose you wanted to change all incoming requests for /a.aspx to /b.aspx.  This is how you could do it:

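protected void Application_BeginRequest(Object sender, EventArgs e) {
	// IIS treats paths case-insensitively, so compare without regard to case.
	if (String.Compare(Request.Path, "/a.aspx", true) == 0)
		Context.RewritePath("/b.aspx");
}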

You can imagine how to create variants of this to suit your purposes. There are even HttpHandlers, which you can write yourself or download and register in your Web.config file, that let you declaratively list many substitutions using regular expressions.  One that I know of is available from CodeProject.
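To give a feel for the declarative approach, here is a minimal sketch of a regular-expression rule table.  The class name, the rules, and the target pages are all made up for illustration; this is not the CodeProject component, just one way such a table could work.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical rule table: each row pairs a pattern with a replacement path.
public static class RewriteRules {
	private static readonly string[,] Rules = {
		{ @"^/a\.aspx$", "/b.aspx" },
		{ @"^/products/(\d+)\.aspx$", "/product.aspx?id=$1" }
	};

	// Returns the rewritten path for the first matching rule,
	// or null when no rule matches.
	public static string Apply(string path) {
		for (int i = 0; i < Rules.GetLength(0); i++) {
			Regex rule = new Regex(Rules[i, 0], RegexOptions.IgnoreCase);
			if (rule.IsMatch(path))
				return rule.Replace(path, Rules[i, 1]);
		}
		return null;
	}
}
```

Inside Application_BeginRequest you would call RewriteRules.Apply(Request.Path) and hand any non-null result to Context.RewritePath.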

For my purposes I wanted to handle only those requests that were not found already by ASP.NET, so rather than Application_BeginRequest, I implemented Application_Error, and it looked like this:

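protected void Application_Error(Object sender, EventArgs e) {
	// Regex requires: using System.Text.RegularExpressions;
	Exception ex = Server.GetLastError();
	// Treat "file not found" errors as candidates for URL rewriting.
	// Matching on the message text assumes English error messages.
	if (ex != null && (ex.Message == "File does not exist." ||
		Regex.IsMatch(ex.Message, @"\AThe file .+ does not exist\.\z"))) {
		CheckForCustomizedStartUrl();
		return;
	}
	DatabaseInternal.Rollback();
	Log.Error("Web application error: ", ex);
}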

Notice how I still handle application errors in the traditional way if my URL rewriting fails to find a match.

The shortcoming of using only this method of redirection is that you are relying on IIS calling up ASP.NET to handle the request, which it only does if it recognizes an ASP.NET extension in the URL (like a.aspx).  What if you want to handle requests made to just /a, as in http://www.mycompany.com/a?  Well, you could make a directory in your web site called "a" and put a default.aspx file in it.  I personally have far too many keywords like "a" (and they are controlled by end users) to create folders for them all.  So I had to find another solution.

First, let's prepare your web app to handle a spoofed page called 404.aspx.  This imaginary page will invoke the ASP.NET interpreter when IIS calls for it, and rewrite the URL as needed.  Add some variation of this code to your Global class.

private void CheckForCustomizedStartUrl() {
	string appPath = Request.ApplicationPath;
	if (!appPath.EndsWith("/")) appPath += "/";
	if (Request.Path == appPath + "404.aspx")
		CheckForCustomizedStartUrl(Get404RequestedPage());
	else
		CheckForCustomizedStartUrl(Request.Url.ToString());
}

private string Get404RequestedPage() {
	// Check to see if this URL is a customized start URL.
	// This is the case when the URL doesn't end with .aspx or some 
	// other extension that invokes the ASP.NET interpreter. 
	// We have IIS configured to use this 404.aspx page as a 404 page
	// so we can catch these cases.
	if (Request.QueryString[null] != null) {
		Match m = Regex.Match(Request.QueryString[null], @"^404;([^:]+:[^:]+)(:\d+)(.+)$");
		if (m.Success) {
			if (m.Groups[2].Value == ":80")
				return m.Groups[1].Value + m.Groups[3].Value; // Rip out port number if port 80
			else
				return m.Groups[1].Value + m.Groups[2].Value + m.Groups[3].Value;
		}
	}
	return null;
}
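To see what that regular expression pulls out, here is a standalone sketch of the same parsing that takes the raw query string value as a parameter instead of reading Request.QueryString.  The class and method names are hypothetical, and the sample inputs assume IIS requests the error page as /404.aspx?404;http://host:port/path, which is the shape the pattern above expects.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Iis404Query {
	// Standalone variant of Get404RequestedPage, for experimentation only.
	// Returns the originally requested URL, or null if the value does not
	// look like an IIS 404 redirect.
	public static string ExtractOriginalUrl(string raw) {
		if (raw == null) return null;
		Match m = Regex.Match(raw, @"^404;([^:]+:[^:]+)(:\d+)(.+)$");
		if (!m.Success) return null;
		// Drop the port when it is the default HTTP port.
		string port = m.Groups[2].Value == ":80" ? "" : m.Groups[2].Value;
		return m.Groups[1].Value + port + m.Groups[3].Value;
	}
}
```

With this, "404;http://www.mycompany.com:80/a" yields "http://www.mycompany.com/a", "404;http://www.mycompany.com:8080/a" keeps its port, and an ordinary query string such as "x=1" yields null.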

public static void CheckForCustomizedStartUrl(string url) {
	if (url == null || url.Length == 0) return;
	// Look for customized URLs coming in.
	if (url == "http://www.mycompany.com/a") {
		// Rewrite to a page that ASP.NET can actually serve.
		HttpContext.Current.RewritePath("/b.aspx");
	}
}
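If you have many keywords, the single comparison in CheckForCustomizedStartUrl could become a table lookup.  The following is only a sketch: the class name, the keywords, and the target pages are invented, and it assumes .NET 2.0 or later for the generic Dictionary.

```csharp
using System;
using System.Collections.Generic;

public static class KeywordMap {
	// Hypothetical keyword table; in practice it might be loaded from a database.
	private static readonly Dictionary<string, string> Targets =
		new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase) {
			{ "a", "/b.aspx" },
			{ "specials", "/offers.aspx" }
		};

	// Maps an absolute URL such as http://www.mycompany.com/a to a target
	// path, or returns null when the URL's path is not a known keyword.
	public static string Lookup(string url) {
		if (string.IsNullOrEmpty(url)) return null;
		Uri uri;
		if (!Uri.TryCreate(url, UriKind.Absolute, out uri)) return null;
		// "/a" and "/a/" both become the keyword "a".
		string keyword = uri.AbsolutePath.Trim('/');
		string target;
		return Targets.TryGetValue(keyword, out target) ? target : null;
	}
}
```

CheckForCustomizedStartUrl would then call KeywordMap.Lookup(url) and pass any non-null result to HttpContext.Current.RewritePath.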

Obviously I have some customized code in there already that you may not need, or may need to extend.  The meat of the code is the 404 handler, and how the original URL is extracted from that request.

If you want to process site-relative URLs rather than the absolute URLs handed to you from IIS, you can use these two simple methods I wrote:

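private static string MakeRelativeUrl(string url) {
	return MakeRelativeUrl(new Uri(url));
}

private static string MakeRelativeUrl(Uri uri) {
	string appPath = HttpContext.Current.Request.ApplicationPath;
	// ApplicationPath is "/" for a root site and "/app" (no trailing slash)
	// otherwise; trim the slash so a root site yields "~/a" rather than "~a".
	if (appPath.EndsWith("/")) appPath = appPath.Substring(0, appPath.Length - 1);
	return "~" + uri.AbsolutePath.Substring(appPath.Length);
}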

Now we need to get IIS or Apache to pass all these not-found requests to ASP.NET, even if those URLs don't include .aspx in them.

If you are using IIS, instruct it to send 404 error pages to your custom /404.aspx URL, by following these steps:

  1. Open up your web site Properties box in IIS
  2. Custom Errors tab
  3. Select the row starting with "404" (not the ones that say 404;2 or 404;3), and click Edit...
  4. Copy the existing 404 filename to the clipboard.  Open the file in Visual Studio.  It will look like the 404 page your browser displays when you hit a nonexistent page.  You'll use this later, so keep this file open.
  5. Back in the "Edit Custom Error Properties" box, change "Message type" from "File" to "URL"
  6. In the URL box, type in "/404.aspx".  Note the .aspx ending, which will force IIS to start ASP.NET even on those URLs that IIS cannot find on the hard disk.
  7. Click OK on each box until you return to IIS.

If you are running mod_mono and Apache together, getting total control is as easy as "SetHandler mono" in your configuration file.  Suddenly all requests to that virtual host go through the mod_mono interpreter, even .gif and other static files.  When this is done, you can just put your code into the Application_BeginRequest method of your Global class and have everything right there.  You don't even need to handle 404s.
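For reference, a virtual host set up this way might look roughly like the fragment below.  The server name and application directory are placeholders, and the exact set of directives depends on your mod_mono version, so treat it as a sketch rather than a tested configuration.

```apache
<VirtualHost *:80>
    ServerName www.mycompany.com
    # Placeholder application root; adjust to your deployment.
    MonoApplications "/:/var/www/mycompany"
    <Location />
        # Route every request, static files included, through mod_mono.
        SetHandler mono
    </Location>
</VirtualHost>
```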

I hope you find this useful.  It took several hours of research on my part, and thanks to other web resources I was able to figure it out.  I brought it all together here so you could learn it all at once.
