Welcome to our new series of strange bugs. Today’s brand new candidate is called emptiness, a real star on the stage of PHP bugs. Thomas Puls brought this one to my attention when he mailed me that a single 0 highlighted with GeSHi magically disappears.
At first thought, two major issues might come into play here. The first possibility is a misguided internal parsing rule, which I ruled out because there was more to the report, namely the instructions on how to reproduce it.
If you have visited the GeSHi website you might know the small demo page where you can test out GeSHi to your heart’s content. Unfortunately, no one anticipated that a source as simple as
0
was simply too big to fail. Well, somehow it failed anyway:
There was an error. For large sources this might be caused due to a security limit of this server. For stuff larger than 1MB use file upload.
Yeah, you’re right: that’s exactly ONE character there (ASCII 48, to be precise) that’s causing an “Input too large” error message. So, seeing an error message, the natural thought is: something is wrong!
Let’s get things sorted out. If you follow the flow of information on the demo page, you’ll notice that demo.php submits its data to php_highlighter.php, which does all the interfacing work. Since a bug in the connection between the browser and the script seems rather unlikely, the place to start is the part of the script where the parameters are retrieved from the client. This is done in a quite straightforward way, with source like this:
// Are there errors to output?
if ( !empty($_POST['url']) )
{
    if (substr($_POST['url'], 0, 7) != 'http://') {
        exit;
    }
    $code = file_get_contents($_POST['url']);
}
if ( empty($_POST['source']) && ($_FILES['file']['error'] == UPLOAD_ERR_NO_FILE) && ($code == '') && ( !$_POST['source_id'] ) )
{
    $geshi_error = true;
}
if ( $geshi_error )
{
    die('There was an error. For large sources this might be caused due to a security limit of this server. For stuff larger than 1MB use file upload.');
}
if ( !empty( $_POST['source'] ) )
{
    $source = $_POST['source'];
}
elseif ( $_FILES['file']['size'] != 0 )
{
    $source = file_get_contents($_FILES['file']['tmp_name']);
}
elseif ( $_POST['source_id'] )
{
    $id = abs(intval($_POST['source_id']));
    $sql = "SELECT source FROM geshi_highlight_data
            WHERE id = $id";
    if ( !$result = $db->query($sql) )
    {
        $db->DbDie(mysql_error(), $sql, __LINE__, __FILE__);
    }
    $row = $db->fetchrow($result);
    $source = $row['source'];
}
else
{
    // URL upload
    $source = $code;
}
Taking a closer look, you have hopefully spotted the following lines:
if ( empty($_POST['source']) && ($_FILES['file']['error'] == UPLOAD_ERR_NO_FILE) && ($code == '') && ( !$_POST['source_id'] ) )

if ( !empty( $_POST['source'] ) )
How many of you have spotted the issue by now? If you didn’t you should restart your PHP beginner’s manual – or read the fine documentation. 😉
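For those who would rather check than look it up, here is a minimal sketch that prints what `empty()` reports for a few strings a form field might submit. The sample values are my own picks for illustration, not taken from the demo page.

```php
<?php
// Strings that could plausibly arrive in $_POST['source'].
$samples = ['', '0', '00', ' ', 'a'];

foreach ($samples as $value) {
    // var_export() renders both the string and the boolean in PHP syntax.
    echo var_export($value, true), ' => ', var_export(empty($value), true), "\n";
}
// '' and '0' are reported as empty; '00', ' ' and 'a' are not.
```

Only the empty string and the exact string '0' count as empty here, which is why a lone zero is treated as if nothing had been submitted.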
Since empty() considers the string '0' just as empty as '', the best way to correct the problem is to check against the empty string explicitly, which changes the code to look like the following snippet:
// Are there errors to output?
if ( !empty($_POST['url']) )
{
    if (substr($_POST['url'], 0, 7) != 'http://') {
        exit;
    }
    $code = file_get_contents($_POST['url']);
}
if ( ('' == $_POST['source']) && ($_FILES['file']['error'] == UPLOAD_ERR_NO_FILE) && ($code == '') && ( !$_POST['source_id'] ) )
{
    $geshi_error = true;
}
if ( $geshi_error )
{
    die('There was an error. For large sources this might be caused due to a security limit of this server. For stuff larger than 1MB use file upload.');
}
if ( '' != $_POST['source'] )
{
    $source = $_POST['source'];
}
elseif ( $_FILES['file']['size'] != 0 )
{
    $source = file_get_contents($_FILES['file']['tmp_name']);
}
elseif ( $_POST['source_id'] )
{
    $id = abs(intval($_POST['source_id']));
    $sql = "SELECT source FROM geshi_highlight_data
            WHERE id = $id";
    if ( !$result = $db->query($sql) )
    {
        $db->DbDie(mysql_error(), $sql, __LINE__, __FILE__);
    }
    $row = $db->fetchrow($result);
    $source = $row['source'];
}
else
{
    // URL upload
    $source = $code;
}
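One side effect of dropping empty() is that it no longer hides a missing array key: reading $_POST['source'] when the field was not submitted raises a notice (a warning on PHP 8). A possible way around that, sketched here under my own assumptions, is a small helper that combines isset() with a strict comparison. The name has_text() is hypothetical and exists neither in GeSHi nor in the demo page.

```php
<?php
/**
 * Hypothetical helper, not part of GeSHi or the demo page.
 * Returns true when $key is present in $input and holds a non-empty string.
 */
function has_text(array $input, string $key): bool
{
    // isset() guards against a missing key, is_string() against array input
    // such as source[]=..., and !== '' accepts '0' as real content.
    return isset($input[$key]) && is_string($input[$key]) && $input[$key] !== '';
}

var_dump(has_text(['source' => '0'], 'source')); // bool(true)
var_dump(has_text(['source' => ''], 'source'));  // bool(false)
var_dump(has_text([], 'source'));                // bool(false)
```

In the snippet above, such a helper could stand in for both checks on $_POST['source']; whether the extra function is worth it for two call sites is a matter of taste.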
With the interface working right and now accepting even a single zero as input, let’s move on to the next level, since, as was to be expected, this alone didn’t fix the problem 😉
The next piece of code called by php_highlighter.php is the constructor of the GeSHi object, which does all the actual highlighting work and receives the (now valid) value of $source. The code there looks like this:
/**
 * Creates a new GeSHi object, with source and language
 *
 * @param string The source code to highlight
 * @param string The language to highlight the source with
 * @param string The path to the language file directory. <b>This
 *               is deprecated!</b> I've backported the auto path
 *               detection from the 1.1.X dev branch, so now it
 *               should be automatically set correctly. If you have
 *               renamed the language directory however, you will
 *               still need to set the path using this parameter or
 *               {@link GeSHi->set_language_path()}
 * @since 1.0.0
 */
function GeSHi($source = '', $language = '', $path = '') {
    if (!empty($source)) {
        $this->set_source($source);
    }
    if (!empty($language)) {
        $this->set_language($language);
    }
    $this->set_language_path($path);
}
The same empty() check shows up again, so a source of '0' is never handed to set_source(). With a small change in the constructor, even the demo page now works properly:
/**
 * Creates a new GeSHi object, with source and language
 *
 * @param string The source code to highlight
 * @param string The language to highlight the source with
 * @param string The path to the language file directory. <b>This
 *               is deprecated!</b> I've backported the auto path
 *               detection from the 1.1.X dev branch, so now it
 *               should be automatically set correctly. If you have
 *               renamed the language directory however, you will
 *               still need to set the path using this parameter or
 *               {@link GeSHi->set_language_path()}
 * @since 1.0.0
 */
function GeSHi($source = '', $language = '', $path = '') {
    if ('' != $source) {
        $this->set_source($source);
    }
    if ('' != $language) {
        $this->set_language($language);
    }
    $this->set_language_path($path);
}
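One caveat with the loose '' != $source check: it works for the string '0', but for an integer 0 the result depends on the PHP version, because comparisons between numbers and non-numeric strings changed in PHP 8. If callers might pass numbers, a strict comparison on the string form sidesteps the question. The class below is a stand-in I made up for illustration; it is not GeSHi and only mirrors the shape of the constructor.

```php
<?php
// Stand-in class for illustration only; it is not GeSHi.
class SourceHolder
{
    /** @var string|null The stored source, or null when none was given. */
    public $source = null;

    public function __construct($source = '')
    {
        // Cast first, then compare strictly: only a truly empty string
        // counts as "no source given", so '0' and 0 both get stored.
        if ((string) $source !== '') {
            $this->source = (string) $source;
        }
    }
}

$fromString = new SourceHolder('0');
$fromInt    = new SourceHolder(0);
$fromEmpty  = new SourceHolder('');

var_dump($fromString->source); // string(1) "0"
var_dump($fromInt->source);    // string(1) "0"
var_dump($fromEmpty->source);  // NULL
```

The cast assumes scalar input; passing an array would need its own handling.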
Remember: ‘0’ can be quite a large source and leave your page feeling kind of empty. 😉