Magento’s Many 404 Pages
The 404 page has a long and illustrious history in the world of web development. What started as a simple, unfriendly error message has turned into a key part of any site’s experience, and any retail outlet’s conversion rate. Like many other PHP frameworks, Magento faces the challenge of providing a unified 404 experience. Also like many other PHP frameworks, Magento has punted that responsibility onto the end user-developer of the system. In this article we’ll explore the various ways that the Magento cart application generates 404 pages, which will allow you to make educated choices when building your 404 experience.
Before we begin though, a quick history lesson and HTTP primer is in order.
Some HTTP Background
If you have curl installed on your system, try running the following command
curl -I http://example.com
Assuming your computer is connected to the internet, you should see output something like this
HTTP/1.0 302 Found
Location: http://www.iana.org/domains/example/
Server: BigIP
Connection: Keep-Alive
Content-Length: 0
Here’s another one
curl -I http://www.iana.org/domains/example/
with results something like
HTTP/1.1 200 OK
Date: Fri, 22 Apr 2011 13:15:13 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Wed, 09 Feb 2011 17:13:15 GMT
Content-Length: 2945
Connection: close
Content-Type: text/html; charset=UTF-8
The command line curl program allows you to download files over HTTP via the shell. The -I option tells curl that we only want the HTTP headers returned to us, not the actual contents of the file. HTTP headers are the information that a web server sends to the client about the request. While it’s good to understand what every line means, it’s the first line of each response that we’re interested in
HTTP/1.0 302 Found
HTTP/1.1 200 OK
These are HTTP status codes. HTTP stands for Hyper Text Transfer Protocol, and is the common language of the web. It defines how a computer or software application should act when it receives or requests information. The response code can be broken into two parts. The first is the HTTP version being used (HTTP/1.0, HTTP/1.1), and the second is the status itself (302 Found, 200 OK).
This status attempts to describe the type of response from the server. For example, a code of 200 means everything went as expected (OK). A code of 302 tells the client/browser that a resource has temporarily moved to a different URL. This may seem like over-engineered nerdy fluff, but it’s actually important.
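To make the two parts of the status line concrete, here is a minimal sketch in PHP that splits a line into its version, numeric code, and reason phrase. The parseStatusLine function is a hypothetical helper written for this article; it is not part of curl, PHP, or Magento.

```php
<?php
// Hypothetical helper: split an HTTP status line such as
// "HTTP/1.1 404 Not Found" into version, code and reason phrase.
function parseStatusLine(string $line): ?array
{
    // HTTP version, a three digit code, then an optional reason phrase.
    $pattern = '#^HTTP/(\d+(?:\.\d+)?)\s+(\d{3})(?:\s+(.*))?$#';
    if (!preg_match($pattern, trim($line), $m)) {
        return null; // not a status line
    }
    return array(
        'version' => $m[1],
        'code'    => (int) $m[2],
        'reason'  => isset($m[3]) ? $m[3] : '',
    );
}

$status = parseStatusLine('HTTP/1.0 302 Found');
echo $status['code'], ' ', $status['reason'], PHP_EOL; // prints: 302 Found
```

A client only needs the numeric code to decide what to do next; the reason phrase is there for humans.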
When a browser receives a status code of 200, it knows to expect a document after the headers, and that it should attempt to render the document, or in the case of supporting files (images, CSS, Javascript), apply the contents of those files to the main HTML document in a way that makes sense (display the image, apply the CSS, run the Javascript). However, when a browser receives a status code of 302, it knows to look for a companion Location header, and then automatically make another request for the URL it finds there.
That’s the first reason status codes are important. They tell the browser what to do with a particular request. Status codes also allow other kinds of web clients, particularly web spiders, to infer information about a page/resource based on its status headers. For example, if a URL returns a status of
301 Moved Permanently
the spider knows it may safely ignore the previous URL in the future, and start treating the new URL in the Location field as canonical. Google infers a significant amount of information about your site based on its headers, which is why their webmaster tools are geared towards cleaning these up.
Status 404
This brings us, finally, to the topic at hand. Give the following request a try
curl -I http://www.iana.org/domains/example/notthere.html
You should get a response something like
HTTP/1.1 404 NOT FOUND
Date: Fri, 22 Apr 2011 14:02:26 GMT
Server: Apache/2.2.3 (CentOS)
Connection: close
Content-Type: text/html; charset=utf-8
A “404 page” gets its name from the HTTP status code for file not found. Back in the day, the original web servers were designed to share documents. The 404 status code was originally intended to tell a browser that the file it was looking for was not available. The HTTP specification is silent on how a browser should handle 404 responses. Early web servers included a brief HTML document along with the not found status
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /testing was not found on this server.</p>
</body></html>
and most browsers chose to display whatever HTML was returned via a 404 document. This seemingly innocuous choice had an interesting effect on web development and internet culture.
Webmasters of that bygone era quickly realized that the standard 404 page provided an awful user experience for their visitors, and they started customizing the HTML output on a per-site basis so that a more useful page was returned. From a user experience point of view this allowed the end-user-visitor to continue navigating on the site despite the fact the page they were looking for wasn’t there. From an engineering point of view this created a weird situation where you needed to return a document even though the document wasn’t found. The 404 code went from being a simple status to becoming an integral part of any website’s design.
The interesting bit is, if a modern browser encounters a default 404 page (such as the one above), instead of displaying the page it will display a custom error message
If the original web browsers/web-culture had chosen to implement things this way, the entire idea of a 404 page may have never existed.
404 in the MVC Era
Modern PHP web development and 404s present a problem that needs to be solved. Out of the box most web servers (Apache, etc.) handle 404 pages themselves. Early PHP web applications relied on the server’s 404 mechanism to handle file not found responses. If the URL was for a file that existed, PHP would process the request. If the user requested a PHP page that didn’t exist, Apache would send back its configured 404 document, and the request would never get to the PHP processing portion.
However, as you’re likely aware, most modern PHP MVC systems route all requests through a single PHP file.
http://example.com/index.php/some/uri/path
http://example.com/some/uri/path
The code in index.php is then responsible for bootstrapping the system, and handing off control to a PHP controller class. The problem this creates is that, with PHP handling the request, the web server (Apache) can no longer handle 404s. As far as the web server is concerned, if the request mapped to a PHP file, that’s a 200 OK. This means if a user enters an invalid route, it’s the responsibility of the PHP framework to
- Send back HTML for a 404 page
- Send back the proper HTTP 404 header
Framework authors need to be careful and provide a centralized 404 mechanism, or else they may end up with multiple sources for 404 page content. Also, and very commonly missed, is sending the proper 404 header. If your PHP page returns a status 200 header, Google ends up indexing every file-not-found page as an actual page, meaning you may have an infinite number of identical pages in your Google results, which will negatively impact your search rankings.
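As a rough illustration of both responsibilities, the sketch below resolves a path against a tiny route table and, for unknown paths, returns a 404 status together with friendly HTML. The $routes array and the resolveRequest function are assumptions made up for this example; no real framework is implied.

```php
<?php
// Hypothetical route table; a real framework would build this from
// its configuration.
$routes = array(
    '/'        => 'Home page',
    '/catalog' => 'Catalog page',
);

// Resolve a request path to a status code and a body. Unknown paths
// get BOTH a 404 status and useful HTML, never a 200.
function resolveRequest(array $routes, string $path): array
{
    if (isset($routes[$path])) {
        return array(200, '<h1>' . $routes[$path] . '</h1>');
    }
    return array(404, '<h1>Page not found</h1><p><a href="/">Back to the home page</a></p>');
}

list($status, $body) = resolveRequest($routes, '/not/a/file');
http_response_code($status); // sets the status line when run under a web server
echo $body, PHP_EOL;
```

Keeping the status and the body in one place is what makes the 404 mechanism centralized: every unknown route passes through the same function.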
Magento gets the status code right. However, it falls prey to the problem most PHP frameworks do, in that there are multiple ways a 404 page is created and rendered. Let’s take a look at those now.
Magento 404 Pages
If we take a look at the rewrite rule (in .htaccess) that captures and redirects requests into Magento’s bootstrap file
############################################
## always send 404 on missing files in these folders
RewriteCond %{REQUEST_URI} !^/(media|skin|js)/
############################################
## never rewrite for existing files, directories and links
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
############################################
## rewrite everything else to index.php
RewriteRule .* index.php [L]
we can see that the line that does the capturing is
RewriteRule .* index.php [L]
However, it’s preceded by four RewriteCond statements. These statements provide rules that will allow certain requests to skip the bootstrapping process. For example, these three
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l
say only apply this rule if a file (-f), directory (-d) or link (-l) does not exist for the request. This allows Apache to serve out existing static files without incurring the performance cost of Magento’s bootstrapping. The primary reason this rule is here is to allow the serving of CSS, Javascript and images from any folder in the system without additional special cases. You can also use the presence of these rules to implement a simple static cache. If you have a URL like this
http://magento.example.com/some/controller/route
and created a static HTML file at the following location
/path/to/wwwroot/some/controller/route/index.html
Apache would serve out the index.html file instead of handing control over to Magento.
Of particular interest to us is the first rule
RewriteCond %{REQUEST_URI} !^/(media|skin|js)/
This one says if the request URL starts with media, skin, or js, then Apache should handle the request. This means requests for files that don’t exist with URLs that look like the following
http://magento.example.com/media/file.jpg
http://magento.example.com/skin/base/badstyle.css
http://magento.example.com/js/another-file-that-is-not-there.js
will use the web server’s configured 404 page. This means if you want to ensure all 404 pages have the same experience, you still need to configure a custom 404 page via your web server.
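The combined effect of the four conditions can be sketched in PHP. This is only an emulation for illustration: the handledByMagento function is hypothetical, and its $existsOnDisk flag stands in for Apache’s -f, -d and -l checks.

```php
<?php
// Rough emulation of the four RewriteCond lines. $existsOnDisk stands
// in for Apache's -f, -d and -l tests on the requested path.
function handledByMagento(string $requestUri, bool $existsOnDisk): bool
{
    // Requests under media, skin and js always stay with Apache,
    // so a missing file there gets the web server's own 404.
    if (preg_match('#^/(media|skin|js)/#', $requestUri)) {
        return false;
    }
    // Existing files, directories and links are served directly.
    if ($existsOnDisk) {
        return false;
    }
    // Everything else is rewritten to index.php.
    return true;
}

var_dump(handledByMagento('/media/file.jpg', false));        // bool(false)
var_dump(handledByMagento('/some/controller/route', false)); // bool(true)
```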
That’s the first 404 page you need to be aware of in a Magento system.
Magento’s Outer Shell
Magento’s index.php bootstrap is relatively simple. A few environmental variables are set and checked, and then the following static method is called
Mage::run($mageRunCode, $mageRunType);
The run method is on the Mage class located in app/Mage.php. On the surface this run method is relatively simple.
public static function run($code = '', $type = 'store', $options = array())
{
    try {
        Varien_Profiler::start('mage');
        self::setRoot();
        self::$_app = new Mage_Core_Model_App();
        self::$_events = new Varien_Event_Collection();
        self::$_config = new Mage_Core_Model_Config();
        self::$_app->run(array(
            'scope_code' => $code,
            'scope_type' => $type,
            'options'    => $options,
        ));
        Varien_Profiler::stop('mage');
        //...
}
Outside of the profiler lines, all that’s involved in starting up a Magento system is five lines of code
- First, the root file path for the application is stored for later retrieval and path creation (self::setRoot();)
- Then, an “application” domain model object is instantiated (self::$_app = new Mage_Core_Model_App();)
- Then, an event collection is instantiated (self::$_events = new Varien_Event_Collection();)
- Then, a configuration object is instantiated (self::$_config = new Mage_Core_Model_Config();)
- Finally, the run method of the application domain model object is called (self::$_app->run(...))
Each of the objects instantiated here gets assigned as a static property of the Mage class, and will be referenced later during the processing of the request. You’ll notice this entire bit of code is enclosed in a try block. Let’s take a look at the exception catching to see what happens if an exception bubbles up to this top layer
    ...
    Varien_Profiler::stop('mage');
} catch (Mage_Core_Model_Session_Exception $e) {
    header('Location: ' . self::getBaseUrl());
    die();
} catch (Mage_Core_Model_Store_Exception $e) {
    require_once(self::getBaseDir() . DS . 'errors' . DS . '404.php');
    die();
} catch (Exception $e) {
    if (self::isInstalled() || self::$_isDownloader) {
        self::printException($e);
        exit();
    }
    try {
        self::dispatchEvent('mage_run_exception', array('exception' => $e));
        if (!headers_sent()) {
            header('Location:' . self::getUrl('install'));
        } else {
            self::printException($e);
        }
    } catch (Exception $ne) {
        self::printException($ne, $e->getMessage());
    }
}
Here we can see there are three catch blocks. First Magento looks for its custom exceptions (Mage_Core_Model_Session_Exception, Mage_Core_Model_Store_Exception), and then the last block is a catch-all for any other exception type. The session and generic exception blocks are worth exploring, but that’s for another article. It’s the store exception we’re interested in.
} catch (Mage_Core_Model_Store_Exception $e) {
    require_once(self::getBaseDir() . DS . 'errors' . DS . '404.php');
    die();
}
If a Mage_Core_Model_Store_Exception is thrown anywhere in the system and is uncaught, Magento will catch it up here. When a store exception is caught, Magento will require in the following file.
errors/404.php
This is the second Magento 404 handler. It handles page not found states for requests that don’t quite make it to the controller dispatch stage. Let’s take a look at what’s going on in 404.php
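The way the catch blocks sort exceptions by type can be seen in isolation with a small sketch. The SessionProblem and StoreProblem classes and the handleRequest function are stand-ins invented for this example; the returned strings only describe what the real blocks do.

```php
<?php
// Stand-in exception classes. In Magento the real ones are
// Mage_Core_Model_Session_Exception and Mage_Core_Model_Store_Exception.
class SessionProblem extends Exception {}
class StoreProblem extends Exception {}

// Mirrors the shape of the catch blocks in Mage::run:
// specific exception types first, the catch-all last.
function handleRequest(callable $app): string
{
    try {
        return $app();
    } catch (SessionProblem $e) {
        return 'redirect to base URL';
    } catch (StoreProblem $e) {
        return 'render errors/404.php';
    } catch (Exception $e) {
        return 'print exception or redirect to installer';
    }
}

echo handleRequest(function () {
    throw new StoreProblem('unknown store');
}), PHP_EOL; // prints: render errors/404.php
```

Order matters here: if the generic Exception block came first, it would swallow the store exception and the 404 handler would never run.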
Error Processor
If you take a look at 404.php you’ll see the following code.
require_once 'processor.php';
$processor = new Error_Processor();
$processor->process404();
This code bootstraps a mini error processing system inside Magento. (If you’ve spent any time with Magento you’ll find that it’s the Mandelbrot set of software systems.) The end result of process404 being called is the rendering of the following phtml template
errors/default/page.phtml
In turn, this phtml template will include the following inner-template
errors/default/404.phtml
If you had called $processor->process503(); then 503.phtml would have been rendered instead, with page.phtml remaining the outer template. If you’re interested in tracing how this happens, then check out the definition of the Error_Processor class in
errors/processor.php
Customizing the Store Exception 404 Page
Chances are you’re going to want to customize this 404 page. You could just edit page.phtml and 404.phtml with your desired style and content. However, like any Magento core hack, you run the risk of your changes being overwritten during an upgrade, and the general scorn of the Magento development community.
Fortunately, Magento provides a mechanism for creating a custom skin folder for your error pages. Take a look at the following file
errors/local.xml.sample
This is a sample error configuration override file. If you rename it to
errors/local.xml
the Error_Processor class will load this file and use its values rather than the defaults hard coded in the class (for legacy reasons Magento will also look for a design.xml file). Take a look at the skin node in this file
<config>
<skin>default</skin>
<!-- ... -->
</config>
This is the value that controls which folder the Error_Processor object looks for its phtml files in. Let’s change that to something like
<config>
<skin>our_custom_skin</skin>
<!-- ... -->
</config>
Error skin names must be composed of letters, numbers, and the underscore character. A folder created with any other characters will be ignored.
To test our custom 404 we’ll need to trigger a Magento store exception. The simplest way to do that is to temporarily add one to the run method in Mage.php
#File: app/Mage.php
public static function run($code = '', $type = 'store', $options = array())
{
    try {
        Varien_Profiler::start('mage');
        self::setRoot();
        self::$_app = new Mage_Core_Model_App();
        self::$_events = new Varien_Event_Collection();
        self::$_config = new Mage_Core_Model_Config();

        #our new exception
        throw new Mage_Core_Model_Store_Exception('');

        self::$_app->run(array(
            'scope_code' => $code,
            'scope_type' => $type,
            'options'    => $options,
        ));
If you reload your development environment with the above in place, you’ll see a 404 page something like
Creating the Custom Skin
When we jiggered our system to throw that exception, Magento ignored the custom value in the <skin/> node because it didn’t find an errors/our_custom_skin folder. Let’s change that now. Copy the existing errors/default to create a new errors/our_custom_skin
cp -r errors/default errors/our_custom_skin
and then let’s edit the text in our_custom_skin/404.phtml. Replace the contents of that file with the following.
#File: errors/our_custom_skin/404.phtml
<div id="main" class="col-main">
<!-- [start] content -->
<div class="page-title">
<h1>404 error: Page not found.</h1>
<p>
<em>we're sorry that you / had to see this four o four / it is what it is</em>
</p>
</div>
<!-- [end] content -->
</div>
Reload the page and you should now see your own custom 404 page.
Important: The entire default folder will need to be copied over to make this work. There isn’t a robust “look in my custom folder, then look in default” fallback system in place as there is in other parts of the Magento system.
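The two skin rules described above (names limited to letters, numbers, and underscores, and a configured skin being ignored when its folder is missing) can be sketched as follows. The chooseSkin function is a hypothetical illustration of that behavior, not the code in errors/processor.php, and falling back to the default skin is an assumption based on what we saw when the folder was absent.

```php
<?php
// Hypothetical helper, not Magento's actual code: decide which error
// skin folder to use for a configured <skin> value.
function chooseSkin(string $errorsDir, string $configured): string
{
    // Only letters, numbers and the underscore are allowed in a skin name.
    $validName = preg_match('/^[A-Za-z0-9_]+$/', $configured) === 1;

    // The skin must also exist as a folder under the errors directory.
    if ($validName && is_dir($errorsDir . DIRECTORY_SEPARATOR . $configured)) {
        return $configured;
    }

    // Assumption: anything else falls back to the default skin.
    return 'default';
}

// A hyphen is not an allowed character, so this name is rejected.
echo chooseSkin(__DIR__ . '/errors', 'our-custom-skin'), PHP_EOL; // prints: default
```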
Before we continue, you’ll want to restore app/Mage.php by removing the custom exception we dropped into place.
#our new exception
## throw new Mage_Core_Model_Store_Exception('');

self::$_app->run(array(
    'scope_code' => $code,
    'scope_type' => $type,
    'options'    => $options,
));
No Route 404
So far we’ve covered two of Magento’s 404 errors. The first was the Apache 404 issued when requesting nonexistent files in the media/skin/js folders. The second was the store exception 404. The third, the most common and also the most complex, is the no route 404. You can see this 404 page by browsing to the following URL
http://magento.example.com/not/a/file
In a default install you should see a page that looks something like this
Magento is an MVC system. If you’re not sure what that means now might be a good time to review the Magento for PHP MVC Developers series. We’ll be here when you get back.
Similar to other web MVC systems, when Magento encounters a URL like /not/a/file, it searches the configuration for a frontName with the name of not. If it finds one, next it will look for a controller in the associated module(s) named something like
class Packagename_Modulename_AController
If it finds the controller, it will look for an action method in that controller named
public function fileAction()
If any of the above steps fail, Magento will search the database for a CMS page with the identifier
not/a/file
If it finds a CMS page, that page will be rendered. If none of the above result in a match, Magento will need to create a 404 page to let the user know their resource wasn’t found. In a default installation, Magento does this by manually setting the controller on the request object to the CMS Index controller, and the action to use for the request to noRouteAction.
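The name lookups described above can be sketched in a few lines of PHP. The routeParts function is hypothetical, Packagename_Modulename is the same placeholder used above, and padding missing segments with index is an assumption of this sketch; Magento’s real router does considerably more work.

```php
<?php
// Illustrative only: derive the names Magento would look for from a
// URL path. Packagename_Modulename is a placeholder prefix.
function routeParts(string $path): array
{
    // Drop empty segments, then pad to frontName/controller/action.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));
    $segments = array_pad($segments, 3, 'index'); // assumption of this sketch
    list($frontName, $controller, $action) = $segments;

    return array(
        'frontName'       => $frontName,
        'controllerClass' => 'Packagename_Modulename_' . ucfirst($controller) . 'Controller',
        'actionMethod'    => $action . 'Action',
    );
}

print_r(routeParts('/not/a/file'));
```

For /not/a/file this yields the frontName not, the class Packagename_Modulename_AController, and the method fileAction. If any of those lookups comes up empty, the request is headed for the no route handling.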
When this is dispatched, the following code runs
#File: app/code/core/Mage/Cms/controllers/IndexController.php
public function noRouteAction($coreRoute = null)
{
    $this->getResponse()->setHeader('HTTP/1.1', '404 Not Found');
    $this->getResponse()->setHeader('Status', '404 File not found');

    $pageId = Mage::getStoreConfig(Mage_Cms_Helper_Page::XML_PATH_NO_ROUTE_PAGE);
    if (!Mage::helper('cms/page')->renderPage($this, $pageId)) {
        $this->_forward('defaultNoRoute');
    }
}
In a default installation, this code looks for a CMS page named no-route, and if it finds one, the CMS page will be rendered. Magento ships with a default CMS page named no-route, which is the “Whoops, our bad…” page you’ve probably seen too much of.
If this CMS page has been deleted or renamed, Magento will forward the request on to the defaultNoRoute controller action,
$this->_forward('defaultNoRoute');
which looks like this
#File: app/code/core/Mage/Cms/controllers/IndexController.php
public function defaultNoRouteAction()
{
    $this->getResponse()->setHeader('HTTP/1.1', '404 Not Found');
    $this->getResponse()->setHeader('Status', '404 File not found');

    $this->loadLayout();
    $this->renderLayout();
}
resulting in a page like this
Here, Magento is simply setting the correct headers for a 404, and then loading and rendering the layout. This results in a layout handle of cms_index_defaultnoroute being issued, which (again, in a default installation), results in the following Layout Update XML being applied
<cms_index_defaultnoroute>
<remove name="right"/>
<remove name="left"/>
<reference name="root">
<action method="setTemplate"><template>page/1column.phtml</template></action>
</reference>
<reference name="content">
<block type="core/template" name="default_no_route" template="cms/default/no-route.phtml"/>
</reference>
</cms_index_defaultnoroute>
In layman’s terms, this removes the left and right content blocks, sets the root template to page/1column.phtml and then adds a content block that renders the following theme template
cms/default/no-route.phtml
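The fallback chain described above, from the configured CMS page to the default template, can be sketched roughly as follows. The $cmsPages array stands in for the CMS page table and renderNoRoute is a hypothetical function; neither is Magento code.

```php
<?php
// Hypothetical stand-in for the CMS page table: identifier => content.
$cmsPages = array('no-route' => 'Whoops, our bad...');

// Either way the response carries a 404 status. The configured CMS
// page is tried first, then the default no-route template.
function renderNoRoute(array $cmsPages, string $pageId): array
{
    $headers = array('Status' => '404 File not found');

    if (isset($cmsPages[$pageId])) {
        return array($headers, $cmsPages[$pageId]);
    }

    // Equivalent of forwarding to defaultNoRoute.
    return array($headers, 'contents of cms/default/no-route.phtml');
}

list($headers, $body) = renderNoRoute($cmsPages, 'no-route');
echo $headers['Status'], ': ', $body, PHP_EOL;
```

The point of the sketch is that deleting or renaming the CMS page changes only the body, never the status.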
If you’re getting tripped up on layout concepts, reviewing this article or (better yet!) purchasing No Frills Magento Layout should set you straight.
Customizing No Route 404
You’ll notice the above paragraphs were peppered with a phrase something like “in a default install”. Out in the wild, there’s a huge number of ways the no route page might be customized. If you’re working for a variety of clients, or on a team with a number of headstrong developers, you’ll probably run into some combination of the following. Neither the community nor Magento Inc. has much guidance on “the right” way to do this, so your best bet is to be aware of each possible customization point and learn to debug them quickly. Let’s take a look.
Default Pages
If you open up the Admin Console’s system configuration at
System -> Configuration -> Web -> Default Pages
you’ll see there are several ways you might configure the behavior of the no route 404. Above we mentioned that Magento will attempt to load a page with the CMS identifier of no-route. The page that Magento attempts to load is actually controlled by the CMS No Route Page setting. If you wanted to hand over management of the CMS Page to some folks from marketing, this is your best bet.
Using a Different Controller Action
The CMS no route page works because there’s code in Magento that will override the controller and action used for a request if no real route to a controller is detected. By default, that’s the CMS controller and the noRouteAction method. However, using the Default No-route URL System Configuration, a system owner can change which controller action is dispatched for a no route state. By default, this value is
cms/index/noRoute
The format of this string is
frontname/controller/action-name
You might do this if you were creating a custom module to run a significant amount of logic before (or after) displaying the 404 page.
If it wasn’t obvious from the above, if you’re using a custom controller action for your 404 page, you lose the ability to set a custom CMS page with CMS No Route Page.
Controller 404 via Layout XML
The no route 404 page is rendered using the Magento layout XML system. That means its appearance may be customized by adding custom Layout XML Updates to the handles cms_index_noroute and cms_index_defaultnoroute. This could happen via local.xml, a custom layout XML file, or by editing/replacing one of the existing layout XML files in the design package.
Finally, even if no custom Layout Update XML has been added, it’s possible that a new no-route.phtml template has been added to the current theme, or that someone has modified the no-route.phtml template in the base folder.
Wrap Up
A good 404 page is an important part of any website’s user experience, and a Magento store is no exception. We’ve shown you the various places where Magento will detect and render a 404, as well as the various ways that the experience may have been customized. With these tools in hand, you’ll be ready to conquer any 404 related challenges that the fates (or your boss!) throw at you.