Mediawiki RawFile

Very short introduction

Have a look at the two examples to see how to use the extension,
and at the Installation section to see how to install it on your MediaWiki server.

Introduction

Originally, the idea was to be able to download a portion of code directly as a file.
I have numerous code examples in my wiki and I wanted an easy way to download them, easier than copy/paste!
But from there it was rather easy to get something very close to literate programming just by allowing multiple blocks referring to the same file, which will be concatenated together at download time.

It must work with pre, nowiki, js, css, code, source, so let's make it general: take the tag that comes after the parser function we'll create and select data up to the closing tag.
There are two distinct functionalities provided by the extension:
- the parser that will convert a magic word into a link to the download URL
- an extended ?action=raw that will strip the raw output to keep the desired code

Syntax

The extension introduces 3 elements:

Anchor: Used to flag that the next code block in the wiki text belongs to a specific file. The code block can be any wiki block (such as <pre>, <code>, <tt>, <source>...). <br> tags are ignored. Note that anchors are invisible in the wiki display.
Link: Transformed by the extension into a link that downloads all blocks attached to a given anchor name.
Anchor-link: A shortcut notation mixing both an anchor and download link, handy for regular use, when a single code block is used and when the download link can be at the same position as the anchor.

The syntax is as follows. The syntax using the tag <file> and the tag attribute class is new since v0.4. Note that elements of both syntaxes can be mixed on the same page.

Element Syntax and description

Anchor

{{#fileAnchor: anchorname}}
<pre class='anchorname'>...</pre>
<code class="anchorname">...</code>
<code class="cssclass anchorname">...</code>
...

Indicates that the next wiki block is attached to an anchor anchorname. The content of that block will be downloaded (concatenated with other blocks if several blocks are attached to the same anchorname) when a file link is clicked.
(since v0.4) To attach an anchor anchorname to a wiki block, simply add an attribute class="anchorname" to it. The extension supports multi-class specification, meaning that the same block can be associated with different files, and that the class attribute can still be used to specify custom CSS properties as in standard wiki text.

anchorname
class="anchorname": The name of the anchor to which the wiki block is attached

Link

[{{#fileLink: anchorname}} link text]
[{{#fileLink: anchorname|pagetitle}} link text]
<file anchor="anchorname" [name="filename"] [title="pagetitle"]>link text</file>

Creates a link to download all blocks that are attached to an anchor anchorname.

anchorname
anchor="anchorname": The name of the anchor to look for. All blocks attached to an anchor anchorname will be downloaded.
name="filename": Optional - Specifies the name of the file to download. If absent, anchorname is then used as the name of the downloaded file.
pagetitle
title="pagetitle": Optional - Indicates that the blocks to download are on the wiki page titled pagetitle. If absent, blocks are looked for on the current page.
link text: The text of the link to display.

Anchor-link

[{{#file: filename}} link text]
<file name="filename" [tag="tagname"]>link text</file>

Creates a link to download the next wiki block as a file named filename.
(since v0.4) The attribute tag can be used to specify the tagname of the block to download.

filename
name="filename": The name of the file to download.
tag="tagname": Optional - When set, the extension only looks for blocks whose name matches the given tagname. This attribute is particularly useful when there are some irrelevant blocks between the anchor-link and the block you want to download. If absent, the first encountered block following the anchor is downloaded.
link text: The text of the link to display.

Short example

The extension works with any block such as pre, nowiki, js, css, code, source,...
This example is using the syntax highlighting <source> tag provided by SyntaxHighlight extension (using GeSHi Highlighter)
If you didn't install that extension on your MediaWiki, you can try the example by using <pre> instead of <source>.

Let's save the following code [{{#file: myscript.sh}} as myscript.sh]
<source lang=bash>
#!/bin/bash

echo 'Hello world!'
exit 0
</source>

will give:

Let's save the following code [{{#file: myscript.sh}} as myscript.sh]

#!/bin/bash

echo 'Hello world!'
exit 0

Complete example

And a full example with anchors & link:

Let's start with the Bash usual header:
{{#fileanchor: myotherscript.sh}}
<source lang=bash>
#!/bin/bash
</source>
Then we'll display a welcome message:
{{#fileanchor: myotherscript.sh}}
<source lang=bash>
echo 'Welcome on earth!'
</source>
And we finally exit cleanly:
{{#fileanchor: myotherscript.sh}}
<source lang=bash>
exit 0
</source>
[{{#filelink: myotherscript.sh}} myotherscript.sh is now available for download below the code]

will give:

Let's start with the Bash usual header: {{#fileanchor: myotherscript.sh}}

#!/bin/bash

Then we'll display a welcome message: {{#fileanchor: myotherscript.sh}}

echo 'Welcome on earth!'

And we finally exit cleanly: {{#fileanchor: myotherscript.sh}}

exit 0

[{{#filelink: myotherscript.sh}} myotherscript.sh is now available for download below the code]

The code (the ultimate example)

Which you can of course download just by following [{{#filelink: RawFile.php}} this link :-)]

So let's explain the code a bit, in a Literate Programming way...

Hooks

First some hooks for our functions...

We will create:

a Parser Function (see also here), with help of
- $wgExtensionFunctions or ParserFirstCallInit global hook to define the setup function
- Magic Words
- Tag extensions
- LanguageGetMagic hook to initialize the magic words
a RawPageViewBeforeOutput hook to intercept the raw output

<?php

if (defined('MEDIAWIKI')) {

//Avoid unstubbing $wgParser on setHook() too early on modern (1.12+) MW versions, as per r35980
if ( defined( 'MW_SUPPORTS_PARSERFIRSTCALLINIT' ) ) {
    $wgHooks['ParserFirstCallInit'][] = 'efRawFile_Setup';
} else { // Otherwise do things the old fashioned way
    $wgExtensionFunctions[] = 'efRawFile_Setup';
}
$wgHooks['LanguageGetMagic'][]       = 'efRawFile_Magic';
$wgHooks['RawPageViewBeforeOutput'][] = 'fnRawFile_Strip';

Setup function

For the wiki parsing to create download links, the parser functions file and fileLink are equally treated, while fileAnchor will be simply left out. We also create a new tag file as explained here. {{#fileanchor: RawFile.php}}

function efRawFile_Setup() {
    global $wgParser;
    $wgParser->setFunctionHook( 'file', 'efRawFile_Render' );
    $wgParser->setFunctionHook( 'filelink', 'efRawFile_Render' );
    $wgParser->setFunctionHook( 'fileanchor', 'efRawFile_Empty' );
    $wgParser->setHook( 'file', 'efRawFile_FileTagRender' );
    return true;
}

Hook to initialize the magic words

We add the magic words here: the first array element indicates if it is case sensitive, in this case it is not case sensitive. We could add extra elements to create synonyms for our parser function.
Unless we return true, other parser functions extensions will not get loaded. {{#fileanchor: RawFile.php}}

function efRawFile_Magic( &$magicWords, $langCode ) {
    $magicWords['file'] = array( 0, 'file' );
    $magicWords['filelink'] = array( 0, 'filelink' );
    $magicWords['fileanchor'] = array( 0, 'fileanchor' );
    return true;
}

Parser functions of the magic words

The transformation rule that replaces link shortcuts with actual download links, handling an optional local wiki page title if present.
The input parameters are wikitext with templates expanded; the output should be wikitext too.
TODO: what error to send out if there is no filename given?
EDIT: It seems that commit 27667 (1.11 -> 1.12) changed the default parser, which breaks the recursive parsing. Thanks to Tim Starling for helping me to get around the problem! {{#fileanchor: RawFile.php}}

function efRawFile_Render( &$parser, $filename = '', $titleText = '') {
    if( $titleText == '' )
        $title = $parser->mTitle;
    else
        $title = Title::newFromText( $titleText );
    return $title->getFullURL( 'action=raw&anchor='.urlencode( $filename ) );
}
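To make the result concrete, here is a minimal standalone sketch of the kind of URL the render function produces. It does not use MediaWiki at all: rawfile_download_url and the example base URL are hypothetical stand-ins for Title::getFullURL(), whose real output depends on the wiki configuration.

```php
<?php
// Hypothetical stand-in for $title->getFullURL( 'action=raw&anchor=...' ).
// Assumption: $pageUrl already carries a query string (index.php?title=...).
function rawfile_download_url(string $pageUrl, string $anchor): string {
    // urlencode() protects spaces and other special characters in the name
    return $pageUrl . '&action=raw&anchor=' . urlencode($anchor);
}

echo rawfile_download_url('http://wiki.example.org/index.php?title=Some_page', 'my script.sh'), "\n";
```

The anchor name travels as a GET parameter, which is what the raw-output hook described further down reads back.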

And the other one, just removing the anchors from the rendered wiki page.
Curiously enough if the function doesn't exist at all the effect is exactly the same, MW doesn't throw any error.
But let's keep things clean... {{#fileanchor: RawFile.php}}

function efRawFile_Empty( &$parser, $filename = '') {
    return '';
}

Parser functions of the new tag `<file>`

The transformation rule that replaces a <file> tag with an actual download link. The same parser function is used for both links and anchor-links. Since the link text may contain wiki text, we generate the link as wiki text that we ask the parser to parse again. {{#fileanchor: RawFile.php}}

function efRawFile_FileTagRender( $input, $args, &$parser ) {
    // All attributes are optional, so test with isset() to avoid undefined-index notices
    if( !isset( $args['title'] ) || $args['title'] == '' )
        $title = $parser->mTitle;
    else
        $title = Title::newFromText( $args['title'] );
    $link=$title->getFullURL( 'action=raw' );
    // Forward each attribute that is present as a GET parameter of the same name
    foreach( array( 'name', 'anchor', 'tag' ) as $key )
        if( isset( $args[$key] ) && $args[$key] != '' )
            $link.="&$key=".urlencode( $args[$key] );
    return $parser->recursiveTagParse( "[$link $input]" );
}

Hook to intercept the raw output

This part of the code doesn't look that nice because we've to parse the raw wiki page ourselves to retrieve the code sections we want.

First we define a helper function that we will use to report error messages. This is simply done by replacing the content of the downloaded file with the error message and when necessary a copy of the raw text relevant to the error.
TODO: Cancel the file download header and return a proper error page {{#fileanchor: RawFile.php}}

function fnRawFile_Strip_Error($msg,$out,&$text) {
    $text=$msg;
    if($out != '')
        $text.="\nCandidate match: $out";
    return true;
}

Next let's see if ?action=raw was used in the context of this extension: in that case we receive the filename as GET parameter, otherwise we simply return from our extension with return value=true which means we authorize the raw display (originally the hook was created to add an authentication point) {{#fileanchor: RawFile.php}}

function fnRawFile_Strip(&$rawPage, &$text) {
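    // Missing GET parameters default to '' (avoids undefined-index notices)
    $filename=isset($_GET['name']) ? $_GET['name'] : '';
    $anchor=isset($_GET['anchor']) ? $_GET['anchor'] : '';
    // for backward compatibility, accept also URLs with parameter 'file'
    if( $anchor=='' && isset($_GET['file']) )
        $anchor=$_GET['file'];
    $tag=isset($_GET['tag']) ? $_GET['tag'] : '';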
    // Either anchor or name must be specified
    if( $filename=='' )
        $filename=$anchor;
    if ( $filename=='' )
        return true;
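The fallback rules above can be sketched as a small pure function, which makes them easy to try outside MediaWiki. This is an illustration only, not part of RawFile.php: rawfile_resolve_params is a hypothetical name, and it works on a plain array instead of $_GET.

```php
<?php
// Hypothetical helper mirroring the fallback rules on a plain array.
// Returns null when the request is not meant for this extension.
function rawfile_resolve_params(array $query): ?array {
    $name   = $query['name']   ?? '';
    $anchor = $query['anchor'] ?? '';
    if ($anchor === '') {
        $anchor = $query['file'] ?? '';   // legacy parameter name
    }
    if ($name === '') {
        $name = $anchor;                   // the anchor doubles as file name
    }
    if ($name === '') {
        return null;                       // ordinary ?action=raw request
    }
    return ['name' => $name, 'anchor' => $anchor, 'tag' => $query['tag'] ?? ''];
}

var_dump(rawfile_resolve_params(['file' => 'old.sh']));
var_dump(rawfile_resolve_params(['action' => 'raw']));
```

Returning null for the last case corresponds to the early return true in the hook: the request is left to the normal raw action.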

By default the downloadable file will still be handled by the ob_gzhandler session made by Mediawiki. To avoid output buffering and gzipping, one can uncomment the following line: {{#fileanchor: RawFile.php}}

    // Uncomment the following line to avoid output buffering and gzipping:
    // wfResetOutputBuffers();

The raw action has already set headers with some client cache pragmas, because its output is meant to be displayed in the browser. In our case we want to turn this "page" into a downloadable file, so we overwrite those headers and add a few more: to ensure there is no caching on the client (it is very hard for a client to force a refresh of a file download, unlike a web page) and to provide the proper filename. {{#fileanchor: RawFile.php}}

    header("Content-disposition: attachment;filename={$filename}");
    header("Content-type: application/octet-stream"); 
    header("Content-Transfer-Encoding: binary"); 
    header("Expires: 0");
    header("Pragma: no-cache"); 
    header("Cache-Control: no-store");

Then we strip the output. First we have to locate the anchors, but some anchors may be protected inside literal blocks such as nowiki.
So we mask the literal blocks before searching for the anchors (we mask with a string of the same length because the offsets we retrieve will be applied to the initial string, so they must match). This is done with the scary regex below:

- We use ! instead of / as the pattern delimiter so that the pattern string is self-matching. This is necessary since we will apply the extension on this page as well.
- We use the modifiers s (the dot also matches newlines) and e (evaluate the replacement expression).
- The evaluated expression replaces all characters in the matched string with X's. However, any single quotes (') in the matched string are escaped with \, so we need to search for \'|.. The many backslashes are needed because the expression is evaluated several times.

TODO: should we care also of source, js, css, pre,... blocks? {{#fileanchor: RawFile.php}}

    $maskedtext=preg_replace('!<nowiki>.*?</nowiki>!se',
        'preg_replace("/\\\\\\\\\\\'|./","X","$0")',
        $text);
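The e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0. As a hedged sketch of how the same masking could be written on a newer PHP, here is a standalone version based on preg_replace_callback. The function name rawfile_mask_nowiki is made up for this illustration and is not part of RawFile.php.

```php
<?php
// Replace every byte of each <nowiki>...</nowiki> block with 'X'.
// The string length is unchanged, so offsets found in the masked text
// remain valid in the original text.
function rawfile_mask_nowiki(string $text): string {
    $masked = preg_replace_callback(
        '!<nowiki>.*?</nowiki>!s',
        function (array $m): string {
            return str_repeat('X', strlen($m[0]));
        },
        $text
    );
    // preg_replace_callback() returns null on a PCRE error; fall back to the input
    return $masked ?? $text;
}

$sample = "a<nowiki>{{#fileanchor: x}}</nowiki>b";
$masked = rawfile_mask_nowiki($sample);
echo $masked, "\n";
```

Because the callback receives the matched text directly, no quote escaping is involved and the stack of backslashes disappears.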

Now we can search for the anchors:

- If an anchor name is specified, we look for all magic words {{#fileanchor:...}} or blocks with attribute class="[someclass ]anchorname".
- Otherwise we look for the first magic word {{#file:...}} with the specified file name.
- And finally for the first <file> tag with the specified file name (no multiple-block support).

And we free the memory used for the masked version {{#fileanchor: RawFile.php}}

    // Class names are separated by whitespace, so the anchor name may be preceded or followed by \s
    if (($anchor!='') && preg_match_all('/({{#fileanchor: *'.$anchor.' *}})|(<[^>]+ class *= *"([^"]*\s)?'.$anchor.'(\s[^"]*)?"[^>]*>)/i', $maskedtext, $matches, PREG_OFFSET_CAPTURE))
        $offsets=$matches[0];
    else if (preg_match_all('/{{#file: *'.$anchor.' *}}/i', $maskedtext, $matches, PREG_OFFSET_CAPTURE))
        $offsets=array($matches[0][0]);
    else if (preg_match_all('/<file( [^>]*)? name *= *"'.$filename.'"[^>]*>/i', $maskedtext, $matches, PREG_OFFSET_CAPTURE))
        $offsets=array($matches[0][0]);
    else {
        // We didn't find our anchor
        return fnRawFile_Strip_Error("ERROR - RawFile: anchor not found (anchor=$anchor, name=$filename, tag=$tag)","",$text);
    }
    unset($maskedtext);
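To show what PREG_OFFSET_CAPTURE hands back, here is a minimal standalone sketch that only handles the {{#fileanchor:...}} form. The helper name rawfile_anchor_offsets is hypothetical, and the call to preg_quote() is an addition of this sketch (the code above inserts the anchor name into the pattern as-is).

```php
<?php
// Return the byte offsets of every {{#fileanchor: NAME}} marker in $text.
// preg_quote() keeps characters such as '.' in the anchor name from being
// interpreted as regex syntax (an addition of this sketch).
function rawfile_anchor_offsets(string $text, string $anchor): array {
    $pattern = '/\{\{#fileanchor: *' . preg_quote($anchor, '/') . ' *\}\}/i';
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
    // Each entry of $matches[0] is array(matched string, byte offset)
    return array_map(function (array $m): int { return $m[1]; }, $matches[0]);
}

$wikitext = "{{#fileanchor: a.sh}}\n<pre>\none\n</pre>\n"
          . "{{#fileanchor: a.sh}}\n<pre>\ntwo\n</pre>\n";
print_r(rawfile_anchor_offsets($wikitext, 'a.sh'));
```

Each offset is a position in the masked text, which is why the mask must have exactly the same length as the original.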

$text is both input & output so we copy it and start with an empty output. {{#fileanchor: RawFile.php}}

    $textorig=$text;
    $text='';

For each anchor found we've to isolate the content of the next block. {{#fileanchor: RawFile.php}}

    foreach ($offsets as $offset) {

We start from the position of the current anchor. If the tag name of the block attached to the anchor is not specified, we look for the first block that follows the anchor, excluding <br> and <file> blocks. The search can be easily done with a regular expression, using the negative lookahead assertion (?!br\b|file\b) to exclude the tags to ignore. Note that we need to ignore the anchor-link block <file> since the anchor starts right before that tag, and so the regular expression would match the anchor-link block if that tag were not specifically excluded. {{#fileanchor: RawFile.php}}

        $out = substr($textorig, $offset[1]);
        // If no tag specified, we take the first one
        if ($tag == '')
        {
            // With a regex assertion, we can easily ignore 'br' and 'file' tags
            if (!preg_match('/<((?!br\b|file\b)\w+\b)/', $out, $matches))
                return fnRawFile_Strip_Error ("ERROR - RawFile: Can't find opening tag after anchor '$offset[0]' (anchor=$anchor, name=$filename, tag=$tag)",$out,$text);
            $tag=$matches[1];
        }

Now we know the tag name of the block to download, either because it was already specified as a GET parameter in the URL, or because we found it in the search above. Again using a regular expression, we look for the first block matching the specified tag name that follows the current anchor, and extract the content of the block. Note the use of the regex option /.../s to tell the regex engine that the matched text can span multiple lines (with that option, . matches any character, including a newline). Also, we skip the first newline after the opening tag, if any (with \n?). {{#fileanchor: RawFile.php}}

        // Find the first tag matching $tag, and return enclosed text
        if (!preg_match('/<'.$tag.'( [^>]*)?>\n?(.*?)<\/'.$tag.'>/s', $out, $matches))
            return fnRawFile_Strip_Error ("ERROR - RawFile: no closing '$tag' found after anchor '$offset[0]' (anchor=$anchor, name=$filename, tag=$tag)",$out,$text);
        $text .= $matches[2];
    }
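The two regex steps of the loop can be tried in isolation with the following standalone sketch. It is an illustration under assumptions, not the extension's code: rawfile_next_block is a hypothetical name, errors are reported as null instead of an error message, and, unlike the loop above (which keeps the first detected tag name for the following anchors), the tag is detected separately on each call.

```php
<?php
// Extract the content of the first block that follows byte offset $offset.
// Returns null when no suitable block is found.
function rawfile_next_block(string $text, int $offset, string $tag = ''): ?string {
    $out = substr($text, $offset);
    if ($tag === '') {
        // First tag that is neither <br> nor <file>
        if (!preg_match('/<((?!br\b|file\b)\w+\b)/', $out, $m)) {
            return null;
        }
        $tag = $m[1];
    }
    $t = preg_quote($tag, '/');
    // Skip one newline after the opening tag, then capture up to the closing tag
    if (!preg_match('/<' . $t . '( [^>]*)?>\n?(.*?)<\/' . $t . '>/s', $out, $m)) {
        return null;
    }
    return $m[2];
}

$wikitext = "{{#fileanchor: a.sh}}<br>\n<source lang=bash>\n#!/bin/bash\n</source>\n"
          . "{{#fileanchor: a.sh}}\n<pre>\nexit 0\n</pre>\n";
$second = strpos($wikitext, '{{#fileanchor', 1);
echo rawfile_next_block($wikitext, 0) . rawfile_next_block($wikitext, $second);
```

Concatenating the results for each anchor offset yields the downloaded file, which is what the loop accumulates in $text.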

No need to deal with a Content-Length header because MediaWiki will do it for us, and more reliably than we could when the output is sent gzipped, which is the default.
So that's it, $text contains our file! {{#fileanchor: RawFile.php}}

    return true;
}

Credits

There is an official way to register the extension in a Mediawiki installation, so that it will be visible on the Special:Version page.
Let's say the extension is in the category of parser hooks even if there is also a hook on Raw action. {{#fileanchor: RawFile.php}}

$wgExtensionCredits['parserhook'][] = array('name' => 'RawFile',
                           'version' => '0.4.1',
                           'author' => 'Philippe Teuwen, Michael Peeters',
                           'url' => 'http://www.mediawiki.org/wiki/Extension:RawFile',
//                         'url' => 'http://wiki.yobi.be/wiki/Mediawiki_RawFile',
                           'description' => 'Downloads a RAW copy of <nowiki><tag>data</tag></nowiki> in a file<br>'.
                                            'Useful e.g. to download a script or a patch<br>'.
                                            'It also allows what is called [http://en.wikipedia.org/wiki/Literate_programming Literate Programming]');
}

?>

And finally registration of the extension at the Mediawiki website according to the Extensions Manual.

So this extension has now its own page on the official Mediawiki site.

Installation

Download [{{#filelink: RawFile.php}} RawFile.php] and save it under the MediaWiki directory as extensions/RawFile/RawFile.php

Add at the end of LocalSettings.php:

require_once("$IP/extensions/RawFile/RawFile.php");

Status

If you use the extension properly the code is fully functional but it's rather raw on error handling.

ChangeLog

0.4

Anchors can be specified using html class attribute
New syntax for Links and Anchor-links:

<file [name="..."] [anchor="..."] [tag="..."] [title="..."] >Link text</file>

Support multiple files on the same page with same name (differentiated by their anchor name) or even common blocks in multiple files.
Can specify the tag name of the block to download (to skip some irrelevant blocks when using an anchor-link).
Ignore <br> tag.
Some error reporting.

0.3

Added optional parameter to #fileLink to indicate that the file is on another local wiki page

0.2

Fix problem with Content-Length mismatch when transport is gzipped (default for Mediawiki if client supports it)

0.1

Initial version

Known bugs

Jani Uusitalo reported the following issue:
For some reason, if you use Epiphany's 'Save as' instead of a direct left-click, the downloaded file is a single byte. In Firefox the links work just fine, so this is probably an Epiphany bug. Uncommenting the // wfResetOutputBuffers(); line didn't help.

This bug shows up in Opera 11 as well. Here is how to fix it: edit RawFile.php and change the content type provided to read

header("Content-type: application/octet-stream");

I.e. add a dash between "octet" and "stream"; check the MIME reference.

BTW. Thanks for the great extension, I am using it on this wiki Sicvolo 00:46, 12 February 2011 (UTC)

Questions and feedback

If you've any trouble, questions or suggestions, you can contact me.
