Byte serving from PHP

Razvan V. Florian
Arxia / Coneural

 

What is byte serving?

Byte serving (or byteserving) is the ability of a web server to provide a range of bytes in a file instead of the entire file. If a file is being “byte served,” that means that the server which is sending the file is able to give specific bytes that the client (e.g., a web browser) requests. This feature of the HTTP protocol is commonly used by the Adobe Acrobat Reader plugin. For example, if a PDF file is being byte served, Acrobat can ask for the bytes for the 2nd page, and the server will send only the bytes for the 2nd page. In order to benefit of this HTTP protocol feature for viewing PDF files, two conditions must be fulfilled:
- the server must be able to respond to byte ranges requests;
- the PDF file must be "optimized", i.e. have a linear structure that would permit the independent download of separate pages or sections (otherwise Acrobat will probably not attempt byte ranges request).

Why byte serving from PHP?

The byte serving of regular PDF files is usually managed by the web server, if it is set up correspondingly. However, one may sometimes need to generate PDF file dynamically from PHP. For example, we would like to restrict access to certain PDF files to users authenticated by PHP, or to serve PDF files generated on the fly from PHP (e.g., using PDFLib or the R&OS PDF class). If we would like our users to benefit of byte serving (view the first page of the file, or other particular pages without downloading the entire file), we must implement byte serving from inside PHP.

How to do it?

To implement byteserving, the script should do the following:
- notify the client that it accepts byte ranges requests;
- detect byte ranges requests;
- extract from the file the requested byte range and serve it.

You may check below for a sample implementation. For more details about how byteserving works, you may check the documentation for the HTTP/1.1 protocol.

The script

<?
/*

This code is released under the Simplified BSD License:

Copyright 2004 Razvan Florian. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:

   1. Redistributions of source code must retain the above copyright notice, this list of
      conditions and the following disclaimer.

   2. Redistributions in binary form must reproduce the above copyright notice, this list
      of conditions and the following disclaimer in the documentation and/or other materials
      provided with the distribution.

THIS SOFTWARE IS PROVIDED BY Razvan Florian ''AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL Razvan Florian OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those of the
authors and should not be interpreted as representing official policies, either expressed
or implied, of Razvan Florian.

http://www.coneural.org/florian/papers/04_byteserving.php

*/

function set_range($range, $filesize, &$first, &$last){
  /*
  Sets the first and last bytes of a range, given a range expressed as a string 
  and the size of the file.

  If the end of the range is not specified, or the end of the range is greater 
  than the length of the file, $last is set as the end of the file.

  If the begining of the range is not specified, the meaning of the value after 
  the dash is "get the last n bytes of the file".

  If $first is greater than $last, the range is not satisfiable, and we should 
  return a response with a status of 416 (Requested range not satisfiable).

  Examples:
  $range='0-499', $filesize=1000 => $first=0, $last=499 .
  $range='500-', $filesize=1000 => $first=500, $last=999 .
  $range='500-1200', $filesize=1000 => $first=500, $last=999 .
  $range='-200', $filesize=1000 => $first=800, $last=999 .

  */
  $dash=strpos($range,'-');
  $first=trim(substr($range,0,$dash));
  $last=trim(substr($range,$dash+1));
  if ($first=='') {
    //suffix byte range: gets last n bytes
    $suffix=$last;
    $last=$filesize-1;
    $first=$filesize-$suffix;
    if($first<0) $first=0;
  } else {
    if ($last=='' || $last>$filesize-1) $last=$filesize-1;
  }
  if($first>$last){
    //unsatisfiable range
    header("Status: 416 Requested range not satisfiable");
    header("Content-Range: */$filesize");
    exit;
  }
}

function buffered_read($file, $bytes, $buffer_size=1024){
  /*
  Outputs up to $bytes from the file $file to standard output, $buffer_size bytes at a time.
  */
  $bytes_left=$bytes;
  while($bytes_left>0 && !feof($file)){
    if($bytes_left>$buffer_size)
      $bytes_to_read=$buffer_size;
    else
      $bytes_to_read=$bytes_left;
    $bytes_left-=$bytes_to_read;
    $contents=fread($file, $bytes_to_read);
    echo $contents;
    flush();
  }
}

function byteserve($filename){
  /*
  Byteserves the file $filename.  

  When there is a request for a single range, the content is transmitted 
  with a Content-Range header, and a Content-Length header showing the number 
  of bytes actually transferred.

  When there is a request for multiple ranges, these are transmitted as a 
  multipart message. The multipart media type used for this purpose is 
  "multipart/byteranges".
  */

  $filesize=filesize($filename);
  $file=fopen($filename,"rb");

  $ranges=NULL;
  if ($_SERVER['REQUEST_METHOD']=='GET' && isset($_SERVER['HTTP_RANGE']) && $range=stristr(trim($_SERVER['HTTP_RANGE']),'bytes=')){
    $range=substr($range,6);
    $boundary='g45d64df96bmdf4sdgh45hf5';//set a random boundary
    $ranges=explode(',',$range);
  }

  if($ranges && count($ranges)){
    header("HTTP/1.1 206 Partial content");
    header("Accept-Ranges: bytes");
    if(count($ranges)>1){
      /*
      More than one range is requested. 
      */

      //compute content length
      $content_length=0;
      foreach ($ranges as $range){
        set_range($range, $filesize, $first, $last);
        $content_length+=strlen("\r\n--$boundary\r\n");
        $content_length+=strlen("Content-type: application/pdf\r\n");
        $content_length+=strlen("Content-range: bytes $first-$last/$filesize\r\n\r\n");
        $content_length+=$last-$first+1;          
      }
      $content_length+=strlen("\r\n--$boundary--\r\n");

      //output headers
      header("Content-Length: $content_length");
      //see http://httpd.apache.org/docs/misc/known_client_problems.html for an discussion of x-byteranges vs. byteranges
      header("Content-Type: multipart/x-byteranges; boundary=$boundary");

      //output the content
      foreach ($ranges as $range){
        set_range($range, $filesize, $first, $last);
        echo "\r\n--$boundary\r\n";
        echo "Content-type: application/pdf\r\n";
        echo "Content-range: bytes $first-$last/$filesize\r\n\r\n";
        fseek($file,$first);
        buffered_read ($file, $last-$first+1);          
      }
      echo "\r\n--$boundary--\r\n";
    } else {
      /*
      A single range is requested.
      */
      $range=$ranges[0];
      set_range($range, $filesize, $first, $last);  
      header("Content-Length: ".($last-$first+1) );
      header("Content-Range: bytes $first-$last/$filesize");
      header("Content-Type: application/pdf");  
      fseek($file,$first);
      buffered_read($file, $last-$first+1);
    }
  } else{
    //no byteserving
    header("Accept-Ranges: bytes");
    header("Content-Length: $filesize");
    header("Content-Type: application/pdf");
    readfile($filename);
  }
  fclose($file);
}

function serve($filename, $download=0){
  //Just serves the file without byteserving
  //if $download=true, then the save file dialog appears
  $filesize=filesize($filename);
  header("Content-Length: $filesize");
  header("Content-Type: application/pdf");
  $filename_parts=pathinfo($filename);
  if($download) header('Content-disposition: attachment; filename='.$filename_parts['basename']);
  readfile($filename);
}

//unset magic quotes; otherwise, file contents will be modified
set_magic_quotes_runtime(0);

//do not send cache limiter header
ini_set('session.cache_limiter','none');


$filename='myfile.pdf'; //this is the PDF file that will be byteserved
byteserve($filename); //byteserve it!

?>

Known problems

It seems that when a byte ranges request by the Acrobat Reader plugin is not delivered by the server, Acrobat Reader may give a "There was a problem reading this document (109)" error when the user gets to the part of the PDF file that was not delivered. A typical scenario to get this error is to navigate quickly back and forth inside a PDF file while it downloads (by clicking the side navigation bar, for example). This results in many requests being submitted to the server in a short time, and the response to some of them may not reach the Acrobat Reader, due to network latency. However, this is not a bug of this script, but a bug in Acrobat: the same phenomenon also appears with files served directly by the Apache server.

If you discover bugs in this script, or repeatable patterns of unexpected behavior, please report them.

Further support

If you would like further support on byte serving from PHP, or on other web development issues from me or my colleagues at Arxia, please write me.

Acknowledgements

I thank to Mathieu Roche and Gaetano Giunta for detecting some bugs in previous versions of the script.

Last updated May 25, 2005; first released July 23, 2004.