Fix for “Leading Whitespace Bug” in Syntaxhighlighter Evolved for WordPress

Update 12 June ’09: there’s a new version of SyntaxHighlighter Evolved, version 2.2.0, that includes this fix.

There are many WordPress syntax highlighters. Traditionally, WordPress is a very poor platform when it comes to posting code, but thanks to the plugin and filter extensibility models there are many programmers who have come up with their own plugin for syntax highlighting. There are in essence three types of syntax highlighters: the ones that are based on GeSHi, the ones that are based on Google’s syntax highlighter and the ones that are based on Alex Gorbatchev’s SyntaxHighlighter. I’m not gonna separate chaff from wheat here, others have already done that; to cut to the chase, Gorbatchev’s highlighter is simply superior to the others in terms of speed, ease of use and impact on your site. The reason? It does not augment your code with loads of spans and inline CSS; instead, it uses JavaScript to style your pre-tags when the page is rendered in the browser.

Yes, there are downsides too, but if you want to see a good comparison, visit for instance this excellent top 10 list (it lists SyntaxHighlighter Plus as number 1; this should now be SyntaxHighlighter Evolved, which you can download directly from WordPress).

Prologue: where we’re coming from

The original plugin, SyntaxHighlighter, was written by Viper007Bond and leveraged the JavaScript-based general-purpose syntax highlighter from Alex Gorbatchev. Viper used code from WordPress (by Matt from Automattic) and from mdawaffe and expanded this significantly. Its slim design attracted many people and it became a very popular plugin (I haven’t checked them all, but it seems second only to WP-Syntax, which is the WordPress-suggested GeSHi-based plugin). I have not used this version and don’t know whether it had the indentation bug.

Then came several forks (a fork is a derivative of original code, slightly changed) like Fred Wu’s SyntaxHighlighter Plus, which set out to fix some errors in the original SyntaxHighlighter and added features like valid XHTML (by mhavilla), background expansion (by brucknerite) and others. Or like SyntaxHighlighter2 by Mohanjith, who credits its origins in the source code but fails to do so on his website. Of a different breed is something like Google Syntax Highlighter made by Peter Ryan, which also employs Gorbatchev’s JavaScript, but isn’t based on Viper’s code. It only supports the shortcode [ code ] and seems to include the JavaScript files for all brushes on each and every page of your blog, which is not really efficient.

An odd beast in this field of forks and copycats is Visual Code Editor, which allows SyntaxHighlighter (Plus) to be used from the paragraph dropdown menu to make for easier editing in the Visual mode of TinyMCE.

A clear problem with all these highlighters was that many people simply copied Viper’s code, changed it a bit and called it their own, with new bugs and all. That’s entirely legal of course, but very confusing to the unaware plugin searcher. Viper wanted to put an end to all this confusion and renamed his original SyntaxHighlighter to SyntaxHighlighter Evolved. He did so without changing the URL, the original script (syntaxhighlighter.php) or the plugin directory names, which eases upgrades but, to me at least, added a bit to the confusion of what’s what in this world.

This new Evolved version was completely rewritten and implemented features that were added in some forks or requested by users. It seems that it quickly gained as much attention as the original, which is deserved as this one is a big step forward. Under the hood, the code has been substantially changed to make use of the new plugin and filter architecture that has been available since WordPress 2.5, and it now uses wp_register_script and wp_register_style for attaching the JS and CSS respectively. This is a much cleaner and more secure way.

Other improvements included the absence of such things as having to use <pre> + language classname-values around the [language_tag] shortcodes. All in all, working with this new version has surely shown to be a vast improvement over the previous versions (and if you are from the camp that liked the text-style buttons instead of the icons, stop moaning and replace them yourself, it only takes a few lines of CSS to do so).

Problem: leading whitespace gets removed bug

This new version did not need the Visual Code Editor anymore as it worked right out of the box with TinyMCE. The beauty of not needing <pre> anymore, however, introduced a bug that was thought to be gone: leading whitespace. In most cases, whitespace is preserved, but not when it is at the beginning of a line:

foreach(val in aa) {
if(someText == "foo") {
someText = "bar";
}
}

Code like that would be much more readable if it were indented neatly like in the following code fragment, but whatever you try (using &nbsp;, Alt-0160 for a hard space, a Unicode em space U+2003 or en space U+2002), it doesn’t help and it won’t give you this:

foreach(val in aa) {
    if(someText == "foo") {
        someText = "bar";
    }
}

“Aha!”, you say, “that’s live and it works!”. Indeed, it works and it uses SyntaxHighlighter Evolved behind the scenes. But before you jump up and email me asking for yet another copycat of the SyntaxHighlighter, let me tell you right now: there ain’t gonna be one! But I do have a fix for you, of course. Just read on or go directly to the download section.

Solution: working around TinyMCE’s auto-cleanup process

Now that’s a mouthful! Before I attempted this, I tried to change the way WordPress deals with [tags], because I wanted to be able to add these to my text without too much ado. I never found out how to “escape” these shortcode tags without them being interpreted by shortcodes.php, but it did show me how the wpautop filter works, what wptexturize and clean_pre do, and basically the whole system that makes working with plugins and filters easy. Meanwhile it dawned on me that if I can’t get the shortcodes to stop working, maybe I could get the leading whitespace to start working…

This brought me to the SH-E’s plugin code and soon it brought me to MoxieCode’s TinyMCE and how it registers events. A little bit of extra research later, I thought I nailed it down. My first attempt was to simply replace each leading space with a tilde; albeit ugly, it worked pretty well. By now I was infected by the “it can be done” virus and I delved a bit deeper to come up with the basics of the current solution: filter just before it is saved and just before it is shown.

How it works

For anyone who’s interested in how the fix works or who wants to know a little about how the TinyMCE events work, this section is for you. The rest can skip to the next section for downloading the fix.

TinyMCE events used by SyntaxHighlighter Evolved

TinyMCE employs an events based system. There are many events you can subscribe yourself to, but we’re mainly interested in the events that are raised by the editor of TinyMCE. The reason is simple: we’re working inside the editor and the syntax highlight plugin already subscribed to some events:

  • onBeforeSetContent is raised when MCE translates the HTML into Visual. This is not called when saving the page.
  • onPostProcess is raised after MCE has done its magic on the HTML. What you see in this event is what is going to be saved in the database. Use this event to change this.

The original plugin subscribed itself to these events and, much to my surprise, the leading whitespace was already removed before onPostProcess hit. Since onBeforeSetContent is only used when switching from HTML to Visual, I didn’t need to do much there. The problem was not in the Visual view but in the HTML view: as soon as you switch to it (or save or preview your post, for that matter), the leading space is removed.

Other TinyMCE events we can use

The current events are of no use and changing the code that’s called on them will not help. Reading further on the events subsystem of TinyMCE I came across many other possibly useful events but in the end, the following two were what I needed for fixing this issue:

  • onSaveContent is raised just before the content is saved. This is different from onPostProcess in that this contains the HTML the way we know it, as it is displayed in HTML view and stored in the database. In other words: normalized HTML. This event is called when you click Preview, Save Draft or Publish and when you click the HTML tab.
  • onPreProcess is raised on many occasions: when it gets or sets the content, when it’s loaded and when it’s saved. Depending on the event the data is different. We’re interested in when it’s getting the data (for the HTML view) and all other events, which is for instance when it needs it when you go to Visual.

Order of events and what data is passed. Where it says “cleaned or Visual HTML”, it depends on what the current view is: Visual or HTML.

  • Loading the post:
    • onBeforeSetContent, with the cleaned or Visual HTML
  • Switch to HTML
    • onPreProcess (o.get == true), with the Visual HTML
    • onPostProcess, with the Visual HTML
    • onSaveContent, with the cleaned HTML
  • Switch to Visual
    • onPreProcess (o.get == false), with the cleaned HTML
    • onPostProcess, with the Visual HTML
  • Auto Save Draft, Preview
    • onPreProcess (o.get == false), with the cleaned or Visual HTML
    • onPostProcess, with the cleaned or Visual HTML
    • onSaveContent, with the cleaned HTML
  • Click on Save Draft, Publish
    • onPreProcess (o.get == false), with the cleaned or Visual HTML
    • onPostProcess, with the cleaned or Visual HTML
    • onSaveContent, with the cleaned HTML
    • (after refresh) onBeforeSetContent, with the cleaned or Visual HTML

To summarize: when onPreProcess is raised and it is a get-event, somehow we need to fix the whitespace so that it is not removed when clicking HTML view. And when it is not a get-event or when onSaveContent is raised, we need to undo this action.
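The event order above is easy to verify for yourself by logging each event as it fires. What follows is only a sketch: attachEventLogger and FakeDispatcher are names made up for this example, and the fake dispatcher merely stands in for TinyMCE's own so that the snippet runs outside the editor. It assumes, as described above, that onPreProcess passes a DOM node in o.node while the other three events pass a string in o.content.

```javascript
// Stand-in for TinyMCE's event dispatcher (hypothetical, for running outside the editor).
function FakeDispatcher() {
    this.listeners = [];
}
FakeDispatcher.prototype.add = function (fn) {
    this.listeners.push(fn);
};
FakeDispatcher.prototype.dispatch = function (ed, o) {
    for (var i = 0; i < this.listeners.length; i++) {
        this.listeners[i](ed, o);
    }
};

// Subscribe a logger to the four editor events discussed above.
// attachEventLogger is a made-up helper, not part of TinyMCE or the plugin.
function attachEventLogger(ed, log) {
    var names = ['onBeforeSetContent', 'onPreProcess', 'onPostProcess', 'onSaveContent'];
    names.forEach(function (name) {
        ed[name].add(function (editor, o) {
            // take the string if there is one, otherwise read it from the node
            var data = (typeof o.content === 'string') ? o.content : o.node.innerHTML;
            log.push(name + ' (o.get == ' + (o.get === true) + '): ' + data);
        });
    });
}

// Simulate two of the events with a fake editor object.
var fakeEditor = {
    onBeforeSetContent: new FakeDispatcher(),
    onPreProcess: new FakeDispatcher(),
    onPostProcess: new FakeDispatcher(),
    onSaveContent: new FakeDispatcher()
};
var eventLog = [];
attachEventLogger(fakeEditor, eventLog);
fakeEditor.onPreProcess.dispatch(fakeEditor, { get: true, node: { innerHTML: '<p>x</p>' } });
fakeEditor.onSaveContent.dispatch(fakeEditor, { content: '<p>x</p>' });
```

Inside a real plugin you would call attachEventLogger(ed, log) from init and inspect the log after switching views or saving.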

The big trick

When I looked at the events and experimented a bit with replacing and reverting the replacements, I quickly found out that this was not the right way to go. It worked halfway with replacing the leading spaces with tildes, but then the tildes would show in the HTML view. I needed to prevent the HTML cleaning process from deleting these spaces. Because spaces are left as they are if they are not at the beginning of a line (i.e., after a <br> tag), I decided to add some text in front of each line that starts with one or more spaces.

Basically it looks like this:

<!-- before the HTML gets cleaned -->
<p>    some normal text</p>
<pre>[tag]<br>    line starting with spaces[/tag]</pre>

<!-- when my function kicks in onPreProcess (o.get == true) -->
<p>    some normal text</p>
<pre>[tag]<br>{{NBSP}}    line starting with spaces[/tag]</pre>

<!-- after the regular Highlighter plugin does its work -->
<p>    some normal text</p>
[tag]<br>{{NBSP}}    line starting with spaces[/tag]

<!-- after the cleaning process -->
<p>some normal text</p>
[tag]<br>{{NBSP}}    line starting with spaces[/tag]

<!-- when my function kicks in onSaveContent or onPreProcess (o.get == false) -->
<p>some normal text</p>
[tag]<br>    line starting with spaces[/tag]

which is in a nutshell what my code does: it puts something in front of the spaces to prevent them from being normalized and it removes them just before the data is displayed in the HTML or Visual view, or just before it is saved.
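Reduced to its core, the trick is a pair of string functions that undo each other. The sketch below is not the plugin's code: the function names are invented, {{NBSP}} stands in for the real (uglier) marker, and for brevity it handles every <br> in the string instead of only those inside [tag]...[/tag].

```javascript
// Hypothetical marker; the actual fix uses a different internal string.
var MARKER = '{{NBSP}}';

// Put the marker between a <br> and the spaces that follow it,
// so that the spaces are no longer at the start of a line.
function protectLeadingSpaces(html) {
    return html.replace(/(<br[^>]*>)( +)/gi, '$1' + MARKER + '$2');
}

// Remove every marker again, which restores the original markup.
function unprotectLeadingSpaces(html) {
    return html.split(MARKER).join('');
}

var sample = '[tag]<br>    line starting with spaces[/tag]';
var protectedHtml = protectLeadingSpaces(sample);
var restoredHtml = unprotectLeadingSpaces(protectedHtml);
```

Here protectedHtml contains the marker right after the <br>, and restoredHtml is identical to sample again. A marker should be a string that will never occur in real post content, which is presumably why the real one is uglier.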

Subscribing to events of TinyMCE

Subscribing to events in TinyMCE is rather easy. Create a plugin (tinymce.create) and in the initializer object, add an init function. This will be called by TinyMCE when it adds your plugin. The syntax is as follows:

// the basic concept of adding a plugin to TinyMCE
tinymce.create('tinymce.plugins.YourPluginNameHere', {
    init : function(ed, url) {
        // your initialization code here, ed is the MCE Editor object
    }
});

In general, you’d want to subscribe to the events right from the beginning of the lifetime of your object: you’d want to use init for this. Luckily, that too is a breeze and goes something like this:

// the basic concept of subscribing to events, here with the events
// you need for fixing the leading whitespace problem
tinymce.create('tinymce.plugins.YourPluginNameHere', {
    init : function(ed, url) {
        var t = this;      // handy to keep a reference to current object

        //......
        // code left out for clarity
        //......

        // AB: fix for leading whitespace problem
        ed.onSaveContent.add(function(ed, o) {
            o.content = t._fixWhitespaceFromHtml(o.content);
        });

        ed.onPreProcess.add(function(ed, o) {
            if(o.get)
                 o.node.innerHTML = t._fixWhitespaceToHtml(o.node.innerHTML);
            else
                 o.node.innerHTML = t._fixWhitespaceFromHtml(o.node.innerHTML);
        });
    }
});

Regular expressions to the rescue

This is not a section that I will go into too deeply. Suffice it to say that I needed to loop through all the places where [tag]some code[/tag] was and inside that I needed to replace each occurrence of leading spaces. And the same of course when the reverse was needed just before displaying or saving the data, to prevent my ugly {{NBSP}} (internally I use something even uglier) from showing up.

I’ll present the regular expression here because it can be useful to anybody in need of looking up BBCode (or shortcodes) using JavaScript:

var re = new RegExp(
        '([\\s\\S]*?)' +        // $1: anything before [tag]
        '(\\[(' +               // $2: [
            syntaxHLcodes +     // $3:   tag
        ').*?\\])' +            //     ]
        '([\\s\\S]*?)' +        // $4: anything inside tag
        '(\\[\\/\\3\\])',       // $5: [/tag]
        'ig');                  // /i is necessary for matching the internal
                                //  <BR> in innerHTML on OP/IE, while FF
                                // uses the more XHTML correct <br> internally
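To see what ends up in each capture group, the expression can be run against a small string. This is an illustration only: the value of syntaxHLcodes below is an assumption (a pipe-separated list of a few shortcode names), and the sample text is made up.

```javascript
// Assumed value: a pipe-separated list of supported shortcode names.
var syntaxHLcodes = 'php|js|code';

var shortcodeRe = new RegExp(
        '([\\s\\S]*?)' +        // $1: anything before [tag]
        '(\\[(' +               // $2: [
            syntaxHLcodes +     // $3:   tag
        ').*?\\])' +            //     ]
        '([\\s\\S]*?)' +        // $4: anything inside tag
        '(\\[\\/\\3\\])',       // $5: [/tag]
        'ig');

var groups = shortcodeRe.exec('intro [php]echo 1;[/php] outro');

// groups[1] is 'intro ', groups[2] is '[php]', groups[3] is 'php',
// groups[4] is 'echo 1;' and groups[5] is '[/php]'
```

The backreference \3 is what ties the closing tag to the opening one, so a [php] block can only be closed by [/php] and not by, say, [/js].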

And in the following snippet is the actual loop. I left out redundant fault checking and null-checking. The point of presenting this loop is that I found very little information around that shows the power of using RegExp.exec() instead of string.match(). The first updates some state on the RegExp object (re in the example) on each iteration and the second does not (contrary to what some documentation, even on official sites, may lead you to believe).

var result = '';
var lastMatchIndex = 0;
var re_matches = re.exec(content);

// go through all matches, meaning, go through each [tag]code section[/tag]
// until no more matches are found (a failed exec() resets lastIndex to 0)
while (re.lastIndex > 0) {
    lastMatchIndex = re.lastIndex;
    result += re_matches[1] +
        re_matches[2] +
        re_matches[4].replace(/(<br[^>]*>) /gi, '$1' + this.__MAGIC_WHITESPACE) +
        re_matches[5];

    // next match
    re_matches = re.exec(content);
}

// keep whatever follows the last [/tag]
result += content.substring(lastMatchIndex);

Conclusion

In essence, this is no rocket science. It was a bit of looking through the existing libraries and tools, of which I knew little. The final JavaScript that does the actual magic is nothing more ground-breaking than a simple loop and a basic regular expression. For me, the one thing that this brought me was finally a clear understanding of the differences between the .exec() and the .match() methods, on which I hope to write another blog post one day.
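In the meantime, the difference fits in a few lines. This is a small illustration with a made-up pattern and input, not code from the fix: with a global expression, match() hands back all full matches at once and drops the capture groups, while exec() returns one match per call, groups included, and moves lastIndex forward each time.

```javascript
var tagRe = /\[(\w+)\]/g;
var text = '[php] and [js]';

// match(): every full match in one array, no capture groups
var allMatches = text.match(tagRe);

// exec(): one match per call, with groups; lastIndex advances on the RegExp object
var tagsWithIndex = [];
var m;
while ((m = tagRe.exec(text)) !== null) {
    tagsWithIndex.push(m[1] + '@' + tagRe.lastIndex);
}

// allMatches is ['[php]', '[js]']
// tagsWithIndex is ['php@5', 'js@14']
// tagRe.lastIndex is back at 0 after the final, failed exec()
```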

If you read through all this mumbo jumbo about JavaScript, regular expressions, TinyMCE and coding practices then I’m sure you’re now very eager to know whether all this babble is actually worth something. Go to the next section and download the fix to find out for yourself.

Download the fix for SyntaxHighlighter’s leading whitespace problem

Note: you can also download a pre-packaged updated plugin below.

You can download the fix and replace the same file in your plugin directory of syntaxhighlighter.

If you’re unsure of how to do that or where to find these files, you can follow the instructions below. This version was tested with version 2.1.0 of the plugin. If you don’t feel comfortable with replacing a plugin file, make sure you have proper backups.

Installation instructions from WordPress Admin screens

Just follow these steps if you don’t have the Easy Uploader Plugin:

  • Download the fix for syntaxhighlighter_mce.js, if you have not already done so.
  • Open the file in a text editor of your choice and select all and copy.
  • Go to your WordPress Admin screen and select Plugins > Edit Plugins
  • Select Syntax Highlighter Evolved and then open syntaxhighlighter_mce.js in the editor.
  • (optional, but recommended!) Make a copy of the contents in the editor before you proceed.
  • Paste the contents of the download over the current contents.
  • Done!

Follow these steps if you do have the Easy Uploader Plugin. This method does not make a backup, use at your own risk:

  • Open the Easy Uploader page and paste the download link in the URL field: http://www.undermyhat.org/blog/wp-content/uploads/2009/07/syntaxhighlighter_mce.js
  • Type this in Manual target: wp-content\plugins\syntaxhighlighter\syntaxhighlighter_mce.js
  • Click Upload
  • Done!

Installation instructions for FTP or local disk access

Just follow these steps if you don’t want or cannot use the WordPress Admin screens and need to do it through FTP or local disk access (remote desktop or similar):

  • Download the fix for syntaxhighlighter_mce.js, if you have not already done so.
  • Go to wp-content\plugins\syntaxhighlighter directory.
  • (optional, but recommended!) backup the file syntaxhighlighter_mce.js.
  • Upload or copy the downloaded file on top of the existing syntaxhighlighter_mce.js
  • Done!

Download the fixed plugin as complete plugin package

Note: you can also download just the fix here, which contains only the updated javascript file.

For convenience and because I received a request for it via email, I’ve updated the original package of the SyntaxHighlighter Evolved plugin for WordPress and made it available for download. Simply remove your original plugin (or back it up) and follow normal procedure for installing this plugin. You can find the original installation instructions here.

Download the updated package now.

FAQ, frequently asked questions

How do I test whether it works?

To know whether the upload succeeded or whether the fix works for your situation, edit a page with code snippets on it and change a line to start with some spaces. Switch from Visual to HTML and back and click Preview to find that the spaces remain.

It doesn’t work, what should I check?

First thing to check is whether you uploaded the file correctly. Go to your plugin directory and check whether the file syntaxhighlighter_mce.js contains the following text in the header (note the last two lines, which mark the fix):

/*
 * Syntax Highlighter shortcode plugin
 * Based on v20090208 from WordPress.com
 * Andrew Ozz kicks ass
 *
 * Changes by Abel Braaksma, http://www.undermyhat.org:
 * AB 20090709: FIX for leading whitespace bug
 */

It still doesn’t work, what can I test next?

Make sure you emptied your cache (or use Chris Pederick’s Web Developer’s toolbar plugin for Firefox and click Disable > Cache to disable cache only temporarily without having to remove all your existing cache). You can also try a browser that you don’t use too often to be sure that your cache is renewed. Even then, you can try with Ctrl-F5 (IE, FF) or Ctrl-R (Opera) to force-refresh the WordPress Edit page. It is my experience however that changes are pretty quickly picked up by the browser.

I receive errors, what should I do?

If you receive errors after installing the plugin, check whether the version is 2.1.0. If it isn’t, make sure to update the plugin before you try to fix it again.

If this doesn’t help, load the Edit page in Firefox, open the Javascript console (Ctrl-Shift-J on Firefox), click Clear in the console and then refresh your page. Switch from HTML to Visual and back. Click the Errors button in the console and report to me any errors you find.

Important notes

The currently presented fix has been tested on Firefox 3.0, 3.5, Opera 9 and 10, Chrome 1.0, Internet Explorer 7 and 8, Safari 4 and works in Visual Design and HTML Mode, as well as in Draft, Preview and Saved versions of your post. It does not fix existing posts as the leading space information has been overwritten.

The fix is brand new — as of yet — and I’m in dire need of any experiences you may have with it. Please comment if you have any positive or negative experiences with this fix.

Update: this fix works well with the default installation of WordPress and TinyMCE as well as with the TinyMCE Advanced plugin.

What about new versions?

Earlier on I already said that I do not intend to create yet another syntax highlighter. I hope that the original author of this excellent plugin will include this fix in its next version. I will maintain this page, even after such a fix is included, and am open to any suggestions or reports of new bugs I introduced. Unfortunately, the JavaScript code of the MCE part of the plugin has more than doubled with this fix, but there’s little I can do to make it more terse.

Update 11 June ’09: there’s a new version of SyntaxHighlighter Evolved, version 2.2.0, that includes this fix.

Afterthought

I started this little exercise without knowing it would take so much time — almost half a day — to learn about TinyMCE’s and WordPress’s inner workings. After installing the new Evolved version, I just wanted to use it. When I noticed that my indentation was gone and that old posts that I re-edited also lost their indentation, I had a little problem: I had immediately recognized the superiority over the previous highlighters, but I couldn’t use this plugin if it didn’t do the indentation.

I risked creating something that has meanwhile been implemented, but to the best of my knowledge, it hasn’t yet been done. I’ll follow up when Viper decides to either use my code (under his same license terms of course) or if a new edition of his plugin has a different fix for this issue.

– Abel –

  • Joanna Robinson
  • http://cirux.ru Nick

    Thanks a lot. I used your tutorial to fix ‘chrome-auto-translate-plugin-dialog’ divs that are inserted automatically in tinyMCE.

  • http://www.undermyhat.org Abel Braaksma

    @DOHAB: I noticed that using a plugin like the Excerpts plugin causes this behavior: when you edit an excerpt, even outside your original post, it re-saves your post, but also re-escapes all HTML entities, resulting in incorrect data. It is possible that you use a similar plugin with similar issues.

    Try disabling plugins one by one to find out what’s causing this and then take it on with the author of that plugin.

  • http://weshloan.co.cc/ Dohab

    the code fix it self here, but not in my case.

  • http://www.thelazysysadmin.net/ Jon Smith

    This is awesome. Thanks for your contribution. The whitespace problem is the only thing that has stopped me from using SyntaxHighligher Evolved.

  • http://www.viper007bond.com/ Viper007Bond

    Your “Prologue” section is incorrect.

    I wrote the original SyntaxHighlighter plugin. It was based on some code that Matt gave me that WordPress.com used at the time. I improved it significantly and mdawaffe contributed some great kses code.

    Other people came along and wanted to add their own features to the plugin, so they forked it and poorly named their forks (Syntaxhighlighter Plus is one of those forks). I contributed no code to the “Plus” version, although it was based on my plugin.

    Getting this confused is quite understandable though. The forkers chose very poor names.

    • http://www.undermyhat.org Abel Braaksma

      Thanks for pointing that out, I expanded and abbreviated the section, hopefully it is more factual now.

      Also thanks for bringing out a new version of the plugin which includes this fix. I’ll have a check at the copy-problem.

  • http://imwill.com Hendrik

    I came across the same whitespace problem but your fix doesn’t work for me. I’m using WP 2.8, FF 3.5 and the latest version of the plugin. I already disabled and emptied my cache. Any ideas?

    • http://www.undermyhat.org Abel Braaksma

      Hendrik, can you try to check whether you get any errors by following the instructions here?

      Note that it will not work with pages or blogs you already saved. The spaces are lost for these ones and have to be added again.

      Viper has meanwhile updated the plugin. To effectuate the bugfix, all you have to do now is go to your plugin repository and click Upgrade Automatically for the Syntaxhighlighter Evolved plugin:
      Screenshot SyntaxHighlighter Evolved Upgrade Automatically

      If that still doesn’t solve your problem please let me know.

      – Abel –

      • http://weshloan.co.cc/ Dohab

        I’m still having this problem too, thus I’m using the latest version 2.2.1 (at the time i wrote this response) which says it has fixed, but it’s not, the bug still there.

        I tried every fix I read about it too.

        My code still looks like :

        <?php echo "nice work"; ?<

        even qouts converted.

      • http://weshloan.co.cc/ Dohab

        that’s strage!
        I just created a post and test the same code and it works just fine. wth’s goinon??
