TYPO free

fighting for TYPO free code

Rethinking the realurl mod_rewrite rules

22 Jun 2008

TYPO3 ships with a default .htaccess file and also with a simplified version in the /misc directory. Both files contain a mod_rewrite setup that lets your content be reached through nice, readable URIs. The default _.htaccess file that you can get from the TYPO3 dummy package has always worked for me. I have never had any trouble setting up realurl with this file and thus never gave it much attention.

Last week, however, I wondered about the lines at the top of the file and the lines at the bottom of the file. I also wondered about the lines in between . . . ;-). The first question that popped into my mind was why these three lines are there:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l

They test whether the requested resource is not a regular file, not a directory and not a symbolic link. Only if all three conditions are met is the request passed on to index.php.

I first deleted the above lines because I expected that all static resources (file, directory and link) were already filtered out by the first rewrite rule:
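
For reference, here is a minimal sketch of how these conditions combine with the rule they guard. The RewriteRule line is not quoted at this point in the post; it is assumed from the rewrite logs further down, which show the pattern '.*' being rewritten to /index.php.

```apache
# Consecutive RewriteCond lines are ANDed unless flagged with [OR]:
# the rule below fires only when all three tests succeed.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
# Assumed from the rewrite log: everything that is not a real
# file, directory or symlink ends up at the TYPO3 front controller.
RewriteRule .* /index.php [L]
```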

RewriteRule ^(typo3|t3lib|tslib|fileadmin|typo3conf|typo3temp|uploads|showpic\.php|favicon\.ico)/ - [L]

I put them back in when Dmitry reminded me that some websites may host other content from folders in the root like /oldsite or /some_other_app. In that case the checks make sense.

After sleeping on it for a night, it occurred to me that a lot of resources are indeed static:

  • fileadmin
  • typo3
  • typo3conf
  • typo3temp
  • uploads

And after mulling it over some more, I think that (for a realurl configuration) in words it comes down to:

  • Don't rewrite any static resources, just serve them
  • Rewrite everything else to index.php

In code this boils down to:

RewriteEngine On

# Do not rewrite static resources
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d [OR]
RewriteCond %{REQUEST_FILENAME} -l
RewriteRule .* - [L]

# Rewrite the rest to index.php
RewriteRule .* /index.php [L]

This ruleset looks cleaner and works just as well. In fact, when looking at the rewrite logs, we can observe it works better than the default ruleset. I enabled rewrite logging using the following lines in the vhost configuration:

RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 4
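
These two directives belong to the Apache 2.2 era. On Apache 2.4 and later they no longer exist, and rewrite tracing is switched on through the per-module LogLevel setting instead. A rough equivalent, assuming that trace4 gives a similar level of detail to RewriteLogLevel 4, might look like this:

```apache
# Apache 2.4+: RewriteLog and RewriteLogLevel were removed.
# Rewrite trace messages are written to the regular ErrorLog.
# "alert" keeps every other module quiet; trace4 for mod_rewrite is
# an assumed counterpart of RewriteLogLevel 4, adjust it to taste.
LogLevel alert rewrite:trace4
```

The trace lines then show up in the error log of the vhost rather than in a separate rewrite.log.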

Next I hit the server with three requests and looked at the log.

First I requested /broodjes/test.txt:

[rid#839d868/initial] (2) init rewrite engine with requested uri /broodjes/test.txt
[rid#839d868/initial] (3) applying pattern '^(typo3|t3lib|tslib|fileadmin|typo3conf|typo3temp|uploads|showpic\.php|favicon\.ico)/' to uri '/broodjes/test.txt'
[rid#839d868/initial] (3) applying pattern '^typo3$' to uri '/broodjes/test.txt'
[rid#839d868/initial] (3) applying pattern '.*' to uri '/broodjes/test.txt'
[rid#839d868/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/broodjes/test.txt' pattern='!-f' => not-matched
[rid#839d868/initial] (1) pass through /broodjes/test.txt

This shows us that the request goes through a lot of checking before the conclusion is finally drawn that it is a request for a real file. Then the request is passed through to the Apache file handler.

With the simplified .htaccess, the log looks like this:

[rid#839d340/initial] (2) init rewrite engine with requested uri /broodjes/test.txt
[rid#839d340/initial] (3) applying pattern '.*' to uri '/broodjes/test.txt'
[rid#839d340/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/broodjes/test.txt' pattern='-f' => matched
[rid#839d340/initial] (2) forcing '/broodjes/test.txt' to get passed through to next API URI-to-filename handler

Here we can see that the request is immediately identified as a request for a real file and thus passed on to the file handler. This is a big improvement since we already discovered that most of the files we serve (css, js and cached images) are static.

Second, the request for /typo3

Before:

[rid#839d898/initial] (2) init rewrite engine with requested uri /typo3
[rid#839d898/initial] (3) applying pattern '^/(typo3|t3lib|tslib|fileadmin|typo3conf|typo3temp|uploads|showpic\.php|favicon\.ico)/' to uri '/typo3'
[rid#839d898/initial] (3) applying pattern '^/typo3$' to uri '/typo3'
[rid#839d898/initial] (2) rewrite /typo3 -> /typo3/index_re.php
[rid#839d898/initial] (2) forcing '/typo3/index_re.php' to get passed through to next API URI-to-filename handler

After:

[rid#8567c50/initial] (2) init rewrite engine with requested uri /typo3
[rid#8567c50/initial] (3) applying pattern '.*' to uri '/typo3'
[rid#8567c50/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/typo3' pattern='-f' => not-matched
[rid#8567c50/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/typo3' pattern='-d' => matched
[rid#8567c50/initial] (2) forcing '/typo3' to get passed through to next API URI-to-filename handler

The new setup actually does slightly worse than the original one here: it needs two file system checks (-f, then -d) before the request is passed through. The original can be optimized even further, though. If we fix its first rewrite rule by adding a question mark to the end (to make the last slash optional), we have a setup that even recognizes favicon.ico (bug: 5020). That should also make the following rule obsolete:

RewriteRule ^/typo3$ /typo3/index_re.php [PT]
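
The question mark fix described above can be sketched as follows. The pattern is written without a leading slash, as in the .htaccess rule quoted earlier; in a vhost configuration it would need the leading slash seen in the log. The stricter variant at the end is my own assumption and is not part of the default file.

```apache
# Original rule: the trailing slash is mandatory, so requests for
# "favicon.ico" or "typo3" (no slash) fall through to the later rules.
# RewriteRule ^(typo3|t3lib|tslib|fileadmin|typo3conf|typo3temp|uploads|showpic\.php|favicon\.ico)/ - [L]

# Fixed rule: "/?" makes the slash optional, so plain files match too.
RewriteRule ^(typo3|t3lib|tslib|fileadmin|typo3conf|typo3temp|uploads|showpic\.php|favicon\.ico)/? - [L]

# Stricter variant (assumption): because the pattern is only anchored
# at the start, "/?" also lets a path such as "uploads-info/" through
# unrewritten. Requiring a slash or the end of the path avoids that.
# RewriteRule ^(typo3|t3lib|tslib|fileadmin|typo3conf|typo3temp|uploads|showpic\.php|favicon\.ico)(/|$) - [L]
```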

For accessing the backend, the original rewriting setup worked fine. But the reality is that we don't use this setup to get speed from the backend.

We want the rewriting rules to excel at serving readable URIs.

Third, the request for a realurl path

Before:

[rid#8553de8/initial] (2) init rewrite engine with requested uri /scripts/site-checker/
[rid#8553de8/initial] (3) applying pattern '^(typo3|t3lib|tslib|fileadmin|typo3conf|typo3temp|uploads|showpic\.php|favicon\.ico)/' to uri '/scripts/site-checker/'
[rid#8553de8/initial] (3) applying pattern '^typo3$' to uri '/scripts/site-checker/'
[rid#8553de8/initial] (3) applying pattern '.*' to uri '/scripts/site-checker/'
[rid#8553de8/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/scripts/site-checker/' pattern='!-f' => matched
[rid#8553de8/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/scripts/site-checker/' pattern='!-d' => matched
[rid#8553de8/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/scripts/site-checker/' pattern='!-l' => matched
[rid#8553de8/initial] (2) rewrite /scripts/site-checker/ -> /index.php
[rid#8553de8/initial] (2) forcing '/index.php' to get passed through to next API URI-to-filename handler

After:

[rid#8509618/initial] (2) init rewrite engine with requested uri /scripts/site-checker/
[rid#8509618/initial] (3) applying pattern '.*' to uri '/scripts/site-checker/'
[rid#8509618/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/scripts/site-checker/' pattern='-f' => not-matched
[rid#8509618/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/scripts/site-checker/' pattern='-d' => not-matched
[rid#8509618/initial] (4) RewriteCond: input='/var/www/sites/typofree.org/scripts/site-checker/' pattern='-l' => not-matched
[rid#8509618/initial] (3) applying pattern '.*' to uri '/scripts/site-checker/'
[rid#8509618/initial] (2) rewrite /scripts/site-checker/ -> /index.php
[rid#8509618/initial] (2) forcing '/index.php' to get passed through to next API URI-to-filename handler

The new setup does slightly better because it lacks the two rules that check for /typo3 stuff in the original. The downside is that it does not start by checking whether the request is for a directory (it checks for a file first). But in most other cases (like pulling images from typo3temp) we want the file check to come first.

Conclusion

The new setup should be faster in theory but I was unable to see significant speed improvements using the new setup. The fact remains however that the configuration is cleaner than the current default.

Dmitry Dulepov 23 Jun 2008, 13:18
Sounds good. I am not sure why a dedicated check for typo3 and other directories was created. Maybe the old .htaccess was made in stages: first this rule, next more universal rules.

Your findings make sense. I may change the _.htaccess in RealURL next time I work on RealURL. Added this to my task list for RealURL.

Thanks a lot!
Michiel 23 Jun 2008, 13:59
A note from Michael Stucki:

Add a rule for those versions of exploder which search for favicon.ico in the root of the site (disregarding information in the html head).

RewriteRule /favicon.ico - [L]

So that it will not get pushed through to TYPO3 and end up as a 404.
Bob 23 Jun 2008, 16:47
Your RewriteRules are not being processed in a .htaccess file, but in httpd.conf. There's a difference. A pattern like "^(typo3|t3lib|t..." won't match there (missing leading slash).

I wouldn't recommend using -f/-l/-d checks for existing files on high traffic systems, because they may be performance intensive. Compiling a few regular expressions (in server context only once at httpd startup, in directory context (.htaccess) on each request) should be faster than checking for existing files in the file system. But unfortunately you can't measure the difference on low traffic sites.
Michiel 23 Jun 2008, 17:06
@Bob: My vhost version reads:

RewriteEngine on

RewriteRule /favicon.ico - [PT]

RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -d [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -l
RewriteRule .* - [PT]
RewriteRule .* /index.php [PT]

I tried to do some speed tests using https://www.joedog.org/JoeDog/Siege, but I could not see any significant improvements testing against my own server (poor old Athlon 1800 XP). I'll try laying siege to our company's web server some time soon. Then I'll test your hypothesis.
Sebastian 30 Jun 2008, 08:52
Maybe I'm blind, but I can't find the position of the question mark you mentioned in the 'Second, the request for /typo3' section.
Michiel 30 Jun 2008, 14:21
@Sebastian:

That question mark should be put in the default TYPO3 config:
^/(typo3|t3lib|tslib|fileadmin|typo3conf|typo3temp|uploads|showpic\.php|favicon\.ico)/

Should become:
^/(typo3|t3lib|tslib|fileadmin|typo3conf|typo3temp|uploads|showpic\.php|favicon\.ico)/?

To make the last slash optional, so the files also match.