Sudden empty / blank page for large posts with WordPress


It isn’t fair: you spend a few hours creating your most marvellous blog post ever, the one and only that will surely enlist you in the Hall of Famous Bloggers, but then you click Publish or Preview and all you see is an empty post. Totally panicked, you click “Edit” (never do that!) on that empty post, only to find yourself in an editing screen that’s, well, empty (!!). Before you do anything drastic, I hope you do a bit of googling and find yourself here or on one of the many other helpful / not so helpful posts on the subject.

Because googling didn’t help me (I tried them all), I present here what I believe is the ultimate solution. After I located the real culprit I was able to search the Net in a more targeted way and suddenly found some other posts on the same subject closing in on a good solution. See the references section for others posting on this subject.

The problem: empty page with only header and footer for a large post

Update: this problem can also occur if you do not use any shortcodes, after a clean install of PHP and WordPress!

In my case, the post needed to be reasonably long. The post also needed to have shortcodes (or short-tags), like [mycaption]some caption[/mycaption] or [mygallery source="gallery" /] (remove “my”, see the note below). As soon as you place a picture with a caption on your page, you have shortcodes, possibly without even knowing it. If the text after the shortcodes contains newlines and the shortcode itself is surrounded by newlines, the problem may kick in. The result is a page without any content (see screenshot) and without any error, but with the header, the footer, the title, the edit button etc.

Screenshot of no content for a large post in WordPress

The same behavior already occurs when you click the Preview button, so you may see the problem before actually publishing.

My blog post about how to determine debug or release build was my first encounter with this problem. If you want to test whether your blog or blogosphere is vulnerable to this bug, you can download / view the content of this post. Just paste it in a new blog post (in HTML view) and click Preview Changes. If it shows an empty page, this can happen to you too, in which case you should check the fix provided below.

A note on shortcodes

Shortcodes are nice additions that make editing easier without knowing much XHTML. They have been available since WordPress 2.5, but so far the implementers have forgotten to add a method to escape the shortcodes. One of the larger blogs about the subject doesn’t even mention it, nor does the original API doc. If you look through the code, there’s been some thought about using double [[ and ]], but this has never been applied throughout all the code. Someone came up with a plugin, No Short Code, which adds a new shortcode that disables the shortcodes inside it. Unfortunately, that only works occasionally and it doesn’t work at all from the editor (try using wpNSC with the [_gallery_] shortcode, the effect is terrible). I changed the names of the shortcodes above to rubbish shortcodes, because if I used registered shortcodes, they would wreak havoc on my page.

The solution: a simple fix prevents this blank page bug

EDIT: original solution was the shortcodes fix below, but it didn’t work in all cases. Instead, the actual cause of the problem is the limit of the regular expression recursion depth as explained below. This limit is way too restrictive and causes a hard crash of your page, depending on size, use of shortcodes and use of other plugins that also employ regular expressions.

Do the following to fix this problem:

  1. Open php.ini in a text editor of your choice (normally you can find php.ini in your PHP install dir)
  2. Change the recursion limit to 200x normal, that is, set: pcre.recursion_limit=20000000
  3. Change the backtrack limit to 100x normal, that is, set: pcre.backtrack_limit=10000000
  4. Stop and start the Apache (or IIS) service
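For reference, steps 2 and 3 would look roughly like this inside php.ini. This is only a sketch of the two relevant lines; where they sit in the file and what else surrounds them depends on your installation.

```ini
; php.ini: PCRE limits raised as described in steps 2 and 3
; (both defaults were 100000 at the time of writing)
pcre.recursion_limit=20000000
pcre.backtrack_limit=10000000
```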

This increase will hardly have any effect on performance. Setting the limits much higher is not a good idea: a faulty, never-ending regular expression is always possible and should still be stopped at some point, otherwise it will take too much memory or CPU time, and neither is good.
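If you cannot edit php.ini (on shared hosting, for instance), both PCRE settings can also be changed at runtime with ini_set(). The following is a minimal sketch under that assumption. The helper name raise_pcre_limits is made up for this example and is not part of WordPress or PHP, and where you would call it (early in the request, before any content filters run) is up to your setup.

```php
<?php
// Hypothetical helper, not part of WordPress: raises the PCRE limits for the
// current request only and reports whether PHP accepted the new values.
function raise_pcre_limits(int $recursion = 20000000, int $backtrack = 10000000): bool
{
    // ini_set() returns the old value on success and false on failure.
    $ok_recursion = ini_set('pcre.recursion_limit', (string) $recursion) !== false;
    $ok_backtrack = ini_set('pcre.backtrack_limit', (string) $backtrack) !== false;

    return $ok_recursion && $ok_backtrack;
}

if (raise_pcre_limits()) {
    // Read the values back to confirm what the engine will actually use.
    echo 'recursion: ', ini_get('pcre.recursion_limit'), "\n";
    echo 'backtrack: ', ini_get('pcre.backtrack_limit'), "\n";
}
```

Unlike the php.ini change, this only lasts for the running script, so no web server restart is needed.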

Please drop me a line if this fix does not work for you and your problem is similar.

Alternative solutions

Below are a few solutions that I tried along the way. I include them for completeness, although they didn’t work for me or only worked for a short while. The first one, about shortcodes, solves the issue in many cases, but does not solve the core problem: not enough depth for the regular expressions.

Shortcodes fix

EDIT: this was my original solution. It will work in the majority of cases, but as explained above, there is a better solution that does not require changing the source code of WordPress.

As explained above, the cause of all this is the way default shortcodes are treated in the formatting.php source file. To fix this behavior, simply comment out the line at the end of the wpautop() function in the formatting.php file, which you can find in the wp-includes directory. For WordPress 2.8, this is line 173:

// in function wpautop in file wp-includes/formatting.php
// next line is line 172
$pee = preg_replace( "|\n</p>$|", '</p>', $pee );

// AB: comment the following line to prevent blank page bug:
//$pee = preg_replace('/<p>\s*?(' . get_shortcode_regex() . ')\s*<\/p>/s', '$1', $pee); // don't auto-p wrap shortcodes that stand alone

return $pee;

Removing or commenting out that line has almost no visible effect in most cases. The line is part of the default filters that run when apply_filters is called. Its expression looks for shortcodes that got wrapped in an extra <p> earlier on and removes that <p> again, to prevent unwanted blank lines around shortcodes. The line has been around for a while, and so has this bug. There’s a post on the wp-pro list about this line that does not give a very clear answer, and there’s this other post that contains a bit more detail about the bug altogether. Given that it was reported so long ago, it doesn’t seem that anybody on the WordPress developer team is going to fix this bug anytime soon.

Stop using [ caption ] shortcodes

Reading on about this issue, you may have found that some people consider the [ caption ] shortcodes to be the problem. By not using them (and painfully typing in the HTML code by hand) they get around this bug. But the bug could easily occur in any other situation using shortcodes.

Remove the filter wpautop

Using the line remove_filter('the_content', 'wpautop') you can remove the wpautop filter from post content altogether. Some people prefer this as it gives them more control over the whitespace handling of a post. Removing this filter has the effect that whitespace becomes insignificant and that you’ll have to add paragraph <p>…</p> tags yourself to get blank lines.

To me, this fix feels like using a bulldozer for hoeing your garden. It has so many nasty side effects that all my existing posts become illegible, and that’s not really what we’re after here…

The cause: a bug in the regular expression engine of PHP

Update: this is not really a bug, it is more a matter of very low and unrealistic default settings; see the solution and the recursion limit of PCRE.

The actual cause of this bug seems more complex than it appeared at first sight. The regular expression does its thing in all but a few cases. Then, if you take the text from the download link and you remove one letter from the end of it, the post renders correctly. Also, you may notice that when you remove the /s modifier of the regular expression like in the following code snippet, the problem goes away as well:

// the /s removed works well too
$pee = preg_replace('/<p>\s*?(' . get_shortcode_regex() . ')\s*<\/p>/', '$1', $pee);

The /s modifier makes the match-all dot match everything, including newlines. Without the /s modifier, the dot does not match newlines, i.e., the part matched by the dot does not span multiple lines. Let’s look a bit closer at the match when the get_shortcode_regex function is applied (this function can be found in wp-includes/shortcodes.php):
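The effect of the /s modifier on its own can be seen in a small standalone test. This is an illustrative sketch with a made-up input string, not code from WordPress.

```php
<?php
$text = "<p>first line\nsecond line</p>";

// Without /s the dot stops at the newline, so the pattern cannot reach </p>.
$without_s = preg_match('/<p>(.*)<\/p>/', $text);

// With /s the dot also matches "\n", so the match spans both lines.
$with_s = preg_match('/<p>(.*)<\/p>/s', $text, $m);

var_dump($without_s); // int(0)
var_dump($with_s);    // int(1)
```

With /s the engine has far more text to try for every dot, which is why the modifier makes such a difference to the amount of backtracking on a long post.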

// the shortcode regex (abbreviated, spanned across multiple lines)
$pee = preg_replace('/<p>\s*?((.?)\[
           (TEST|gallery|code|anyshortcode|etc)
           \b(.*?)(?:(\/))?\]
           (?:(.+?)\[\/\2\])?(.?))
           \s*<\/p>/',
           '$1', $pee);

If you’re familiar with regular expressions it shouldn’t take you too long to figure out what the above does. It looks for shortcodes that stand alone inside a paragraph, as in the following forms, and removes the <p> and </p> wrapped around them:

<!-- NOTE: remove underscore, _caption becomes caption etc -->
<!-- before regex -->
<p>[_caption]hello I am a caption[/_caption]</p>
<p>[_gallery value="something" /]</p>

<!-- after regex -->
[_caption]hello I am a caption[/_caption]
[_gallery value="something" /]

When you test the regular expression in a test environment, it works pretty well. Why it doesn’t work in this or some other situations is beyond me. The only reason I can think of is that the amount of backtracking reaches some internal limit on large posts, which is where and why it starts behaving oddly. However, reaching any such limit without throwing any error is highly unusual.
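One likely explanation for the silence is that preg_replace() does not raise an error when PCRE hits a limit: it returns NULL, and the reason has to be asked for with preg_last_error(). Since wpautop() assigns the result straight back to $pee, a NULL here would plausibly end up as empty post content. The sketch below reproduces the mechanism in isolation. The tiny limit, the disabled JIT and the test pattern are assumptions chosen to trigger the failure on a short string; they are not taken from WordPress.

```php
<?php
// Assumption for the demo: a deliberately tiny limit and the JIT switched off,
// so that the failure shows up on a short string.
ini_set('pcre.jit', '0');
ini_set('pcre.backtrack_limit', '1000');

// Pattern with nested quantifiers that backtracks heavily on this input.
$result = preg_replace('/(?:\D+|<\d+>)*[!?]/', '', 'foobar foobar foobar');

// On such an error preg_replace() returns NULL instead of the input string.
if ($result === null) {
    $hit_backtrack = preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR;
    $hit_recursion = preg_last_error() === PREG_RECURSION_LIMIT_ERROR;
    echo $hit_backtrack ? "backtrack limit hit\n" : '';
    echo $hit_recursion ? "recursion limit hit\n" : '';
}
```

A check like this around the suspect line is a cheap way to find out which of the two limits a given post runs into.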

If you read this and you can shed some light on this issue or you know of a better fix, or you’re a core WordPress Developer and you know what’s going on, then don’t hesitate to drop me a line.

Meandering about large blog posts on WordPress

It will hardly come as a big surprise that my first and foremost thought was “WordPress does not allow big posts, help!!”. As you can read in the previous sections, this proved wrong. Yet I’d like to share a few of the things I found out during my searches; they may help you next time you think large posts are the cause of your issues. Here’s a summary:

Apache and HTTP POST limits

In all but a few cases, WordPress runs under a LAMP, WAMP or MAMP environment, which all have in common that Apache is the HTTP daemon that serves the PHP pages of WordPress. Apache has limits, but they are few:

  • Apache limits internal redirects. Redirects are used with friendly and permanent URLs in WordPress. The limit defaults to a maximum of 10 redirects on the same server. Surpassing this limit will show up in the logs as an error. See the LimitInternalRecursion directive.
  • You can limit the length of the request body. The default is unlimited, however. Unlimited in this sense means 2GB, which in practice means that you cannot upload files larger than 2GB. See the LimitRequestBody directive for details.
  • There’s a limit for XML request bodies. This is the body that’s used for XMLHttpRequests and this method is heavily used by the WordPress page editor and others to send intermediate updates to the server (to prevent work loss). The maximum here defaults to 1000000 bytes, which is close to 1 MB. It is possible that posts exceed this limit, though unlikely. In my case, I wasn’t even close. See the LimitXMLRequestBody directive for details.
  • There are other limits, such as those on the size or the number of HTTP headers through LimitRequestFieldSize/LimitRequestLine (both default to 8190 bytes per header value/line) and LimitRequestFields (default 100 HTTP headers), but these have little to do with the issue at hand.
  • By default, Apache is not limited in the amount of memory it can consume, but you can configure this differently on a per-thread basis.

PHP limits for HTTP Post and file uploads

Where Apache has its limits set very wide (and rightly so, it must support many types of server software), PHP is much trickier when it comes to limits. As it turns out, the following limits apply to a default PHP installation:

  • The maximum time a script can take to respond is 30 seconds by default. You can change this with max_execution_time in php.ini.
  • The maximum time a script can spend gathering request data (related to upload time from client to server) is 60 seconds; it can be set with max_input_time.
  • A single script can consume at most 128MB of memory by default (memory_limit), but the limit applies per script, so multiple requests and scripts together can exceed it without bound.
  • Upload file size is limited to 2MB by default. This is not very much, but does not interfere with large HTTP POST requests. Set the upload_max_filesize setting to change this.
  • HTTP POST size is limited to 8MB by default. It is unlikely that I reached that limit (the size of the POST was about 25kB). You can change this setting by editing post_max_size in php.ini.
  • There are other limits that deal, for instance, with database persistence. By default, all of these are set to no limit.

Limits of PCRE, the regular expression engine used most by PHP

The Perl Compatible Regular Expression engine, in other words, the engine that takes care of the regular expressions that are used inside the filters of WordPress, has its own limits and maximums. At first sight the defaults look very high, and I initially assumed you would never need to set them higher than the default. As described in the solution above, that assumption proved wrong for large posts.

  • The limit for backtracking, pcre.backtrack_limit, defaults to 100000 (100k). This is the maximum number of backtracking steps a regular expression may take, for instance when using greedy quantifiers.
  • The limit for recursion, pcre.recursion_limit, defaults to 100000 (100k). It is difficult to predict recursion depth as it depends heavily on the expressions, but in general, recursion increases when you use nested quantifiers. Another term for recursion limit is stack size limit.
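To see which values your own installation actually uses, you can ask PHP directly with ini_get(). The sketch below lists the PHP and PCRE settings named above; memory_limit is the php.ini name for the per-script memory limit. The values printed are those of the running PHP, and they may differ from the defaults quoted here (the command line and the web server often use different php.ini files).

```php
<?php
// Settings discussed above; the output reflects the running installation.
$settings = array(
    'max_execution_time',
    'max_input_time',
    'memory_limit',
    'upload_max_filesize',
    'post_max_size',
    'pcre.backtrack_limit',
    'pcre.recursion_limit',
);

$report = array();
foreach ($settings as $name) {
    // ini_get() returns false when the setting does not exist.
    $value = ini_get($name);
    $report[$name] = ($value === false) ? '(not available)' : $value;
}

foreach ($report as $name => $value) {
    printf("%-22s %s\n", $name, $value);
}
```

Running this through the web server rather than from the command line gives the numbers that matter for WordPress.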

Side note: PHP has three different regular expression engines, but the preg_* functions all use the PCRE library, which is the most versatile and fastest of the three. If you want to know more about these flavors, I suggest you get yourself a copy of Jeffrey Friedl’s Mastering Regular Expressions.

Conclusion

WordPress is a lovely though rather simplistic environment. The simplicity stems from its heritage with PHP and with the impossibility to do structured programming or proper debugging. PHP adepts will tell you that both are very well possible, but the main cause for errors like these is that structured text is approached with unstructured tools. Valid structured XHTML is approached with regular expressions that are applied on top of one another. Even a mastermind cannot follow the course of a content string throughout the WordPress code and explain from looking at it how the contents will change or how it will conflict with other parts of the code (other filters). Luck, trial and error and a whole lot of stamina from the tireless developer community have proven the best medicine against a poor design and a poor choice of programming language.

And let’s not forget, amateur programmers love PHP (and to a lesser extent Ruby, Perl or Python). Without these amateur programmers there wouldn’t be such a lively community and such a (to a certain extent stable) implementation of blog software.

This was just a single bug and it showed me some of the inner workings of WordPress. I haven’t gone too deep into the details this time as I just wanted to provide a little context to the problem. This look at the code hasn’t made me any more enthusiastic, but I’ll continue using it (using is different than building it) as it is definitely one of the best of its kind in spite of its shortcomings.

References

A selection of some of the more or less relevant references. Not all references from the text are cited here. If you feel you’ve been the originator of some of the ideas I explained, drop me a line and I’ll add a reference to your site as well:

History

A little history of this page, only major updates are included. Check here if you want to know whether anything has changed since your last visit here.

2009-09-29 added and corrected some links, improved some parts of the text, added history overview
2009-09-04 updated the text on php.ini, added workarounds
2009-09-03 changed order of the story-flow, added quick-nav menu
2009-08-30 changed text dramatically to move php.ini PCRE solution to the front and older solution further back
2009-07-31 added php.ini solution on PCRE recursion limits, researched new solutions, updated large parts of the text
2009-07-08 initial post on [ shortcode ] problem and fix in core WordPress files


– Abel –
