How the wpd.mx shortlink service works

The domain wpd.mx provides a shortlink service for the WebPlatform Docs. It features a lookup table for defining explicit mappings (like wpd.mx/tasks -> http://docs.webplatform.org/wiki/WPD:Most_Wanted_Tasks). Everything not found in the lookup table is passed to the wiki, which in turn displays the article (if it exists) or shows search results.

The cool part is that the lookup table is hosted as a GitHub repository, so everyone (with the appropriate rights) can modify the links! I will explain here how it does its thing.

Everything is achieved using Apache rewrite rules, that’s all. :) Here are the bits and pieces:

 

RewriteMaps

Apache’s mod_rewrite has the concept of a RewriteMap: a lookup source consisting of key-value pairs. The map can come in different formats, from dbm to txt, and it can even be an external program that gets called for each lookup. Obviously I went with a txt map, for easy editing.

So the first step was to define the map(s):
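Here is a minimal sketch of such a definition. The map names `shortlinks` and `lc` and the file path are placeholders picked for illustration, not necessarily what the live config uses:

```apache
# Server or virtual host config (RewriteMap is not allowed in .htaccess).

# Plain-text lookup table: one "key value" pair per line, e.g.
#   tasks http://docs.webplatform.org/wiki/WPD:Most_Wanted_Tasks
# The path below is a hypothetical location for the file pulled from GitHub.
RewriteMap shortlinks txt:/var/www/wpd.mx/shortlinks.txt

# Built-in function map that lower-cases whatever key it is given.
RewriteMap lc int:tolower
```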

Note that you can’t put the RewriteMap directive in a .htaccess file, only in the server or a virtual host config! tolower is an internal map, used for converting a string to lower-case (surprise surprise). I put it in there so /tasks and /Tasks are both found in the lookup table under “tasks” (the RewriteMap lookup is case-sensitive).

 

RewriteRules

These directives can go into a .htaccess file or vhost config. The obvious one:
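Presumably that is the switch that turns the rewrite engine on, without which none of the following rules are evaluated:

```apache
# Enable mod_rewrite processing for this context.
RewriteEngine On
```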

The first RewriteRule matches if the path is empty (wpd.mx/ was requested) and redirects to the main page:
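A sketch of that rule. The target URL is an assumption for illustration; the optional leading slash in the pattern lets the same line work in a vhost config as well as in a .htaccess file:

```apache
# Empty path (wpd.mx/): redirect to the main page and stop processing.
# The target URL is assumed, not taken from the live config.
RewriteRule ^/?$ http://docs.webplatform.org/ [R,L]
```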

The flags stand for Redirect (send an HTTP redirect instead of rewriting the request internally; the default status is 302, aka temporary) and Last (no other rules are processed if this one matches).

Next up is the map lookup:
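A sketch of that lookup, assuming the txt map is named `shortlinks` and the tolower map is named `lc` (both names are placeholders). The three directives come first, with the comments after them:

```apache
RewriteCond ${lc:$1} (.+)
RewriteCond ${shortlinks:%1} (.+)
RewriteRule ^/?(.+)$ %1 [R,NE,L]
# Line 1 lower-cases the requested path and captures it as %1.
# Line 2 looks that key up in the (hypothetically named) shortlinks map and
# only matches, re-capturing into %1, if the lookup returned something.
# Line 3 redirects to the URL captured by line 2.
```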

Alright, some explanation is needed, I guess. :) The first RewriteCond simply transforms the input to lower-case. $1 is a reference to the regex capture group of the RewriteRule, but how does that work if the rule is defined after the condition? Well, Apache matches a RewriteRule’s pattern first and only then evaluates the conditions in front of it, so the rule’s capture groups already exist when the conditions run. $n refers to capture groups from RewriteRules and %n to those from RewriteConds, by the way. The first condition matches “at least one character” (.+), which is also the regex for the RewriteRule (the empty path is already handled by the first RewriteRule).

The next RewriteCond takes that lowercase input (%1) and performs a map lookup. All map lookups look like ${map:key} or ${map:key|default}, btw. The result, which is the URL to be redirected to, is (again) stored in a capture group. Notice that if a key isn’t found in a map, an empty string is returned (or default, if specified). That’s why the second condition only matches if something is found (.+ aka “at least one character”).

Finally, the RewriteRule redirects to the value found in the map. The flag NE stands for No Escape, so the value from the map is passed as-is to the client. This is needed if your values include #, as it would be escaped to %23 otherwise. That also means you have to escape your map values yourself where needed.

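There’s a pitfall: the %1 in the RewriteRule (line 3) doesn’t contain the capture from the first RewriteCond (line 1), but the one from the second (line 2)! %n always refers to the last RewriteCond that matched.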

If a request has made it past RewriteRule #2 that means it’s not in the map, so redirect it to the wiki search:
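A sketch of this fallback. The `Special:Search` address and the namespace number used for WPD: (4, MediaWiki’s project namespace) are assumptions:

```apache
# Not in the map: hand the term to the wiki search.
# go=Go jumps straight to an article whose title matches; ns0 is the main
# namespace, ns4 is assumed here to be the WPD: namespace.
RewriteRule ^/?(.+)$ http://docs.webplatform.org/wiki/Special:Search?search=$1&go=Go&ns0=1&ns4=1 [R,L]
```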

This last RewriteRule just redirects to the wiki search, which takes care of the rest. It gets passed some namespaces to search in (the main namespace and WPD:) and because we pass the go parameter it will either display an article (if one’s title matches $1) or show search results for $1.

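Combining everything, here are some examples:

wpd.mx/ has an empty path, so the first rule redirects to the main page.

wpd.mx/tasks is found in the map and redirects to http://docs.webplatform.org/wiki/WPD:Most_Wanted_Tasks.

wpd.mx/Tasks is lower-cased to “tasks” first, so it ends up at the same page.

Anything that is not in the map, say wpd.mx/foo, goes to the wiki search, which shows the article titled “foo” if there is one and search results otherwise.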

 

Updating the map

The last piece is updating the local map with the one from the repo. This is just an hourly cronjob that pulls the raw file from GitHub. That one-liner is left as an exercise for the reader. ;)
