Class: PHP_LexerGenerator

Source Location: /LexerGenerator.php

Class Overview

The main class of the lexer generator. A lexer scans text and organizes it into tokens for use by a parser.

Sample Usage:

  require_once 'PHP/LexerGenerator.php';
  $lex = new PHP_LexerGenerator('/path/to/lexerfile.plex');

A file named "/path/to/lexerfile.php" will be created.

The file format is a PHP file containing specially formatted comments, like so:

  /*!lex2php
  */

All lexer definition files must contain at least two lex2php comment blocks:

  • 1 regex declaration block
  • 1 or more rule declaration blocks

The first lex2php comment is the regex declaration block. It must contain several processor instructions and must define a name for every regular expression. Processor instructions start with a "%" symbol, and the following must be present:

  • %counter
  • %input
  • %token
  • %value
  • %line

%input and %counter define the class variables that hold the lexer input and the index into that input. %token and %value define the class variables that store the token number and its textual value. Finally, %line defines the class variable that holds the line number currently being scanned.

For example:

  /*!lex2php
  %counter {$this->N}
  %input {$this->data}
  %token {$this->token}
  %value {$this->value}
  %line {$this->linenumber}
  */
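
The processor instructions only name properties; the class that contains the lex2php comments has to provide them. A minimal sketch of such a host class follows. The class name ExampleLexer and the constructor are assumptions made for illustration; only the property names are taken from the example above.

```php
<?php
// Hypothetical host class: only the property names come from the
// processor instructions above, everything else is an assumption.
class ExampleLexer
{
    public $data;        // %input:   the text being scanned
    public $N;           // %counter: current offset into $data
    public $token;       // %token:   number of the most recent token
    public $value;       // %value:   text of the most recent token
    public $linenumber;  // %line:    current line number

    public function __construct($data)
    {
        $this->data = $data;
        $this->N = 0;           // start scanning at the first byte
        $this->linenumber = 1;  // lines are counted from 1 here
    }
}

$lexer = new ExampleLexer("1-0\n");
```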

Patterns consist of an identifier made up of letters and underscores, and a descriptive match pattern.

Descriptive match patterns may either be regular expressions (regexes) or quoted literal strings. Here are some examples:

 pattern = "quoted literal"
 ANOTHER = /[a-zA-Z_]+/
 COMPLEX = @<([a-zA-Z_]+)( +(([a-zA-Z_]+)=((["\'])([^\6]*)\6))+){0,1}>[^<]*@

Quoted strings must escape the \ and " characters as \\ and \" respectively.
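
As an illustration of the escaping rule, the sketch below turns a raw string into a quoted literal pattern. The helper quoteLiteralPattern() is hypothetical and not part of PHP_LexerGenerator; it relies only on PHP's built-in addcslashes().

```php
<?php
// Hypothetical helper (not part of PHP_LexerGenerator): escapes a raw
// string so it can be written between double quotes in a pattern.
function quoteLiteralPattern($raw)
{
    // addcslashes() prefixes each listed character with a backslash;
    // the list here is the backslash and the double quote.
    return '"' . addcslashes($raw, "\\\"") . '"';
}

echo quoteLiteralPattern('$"'), "\n";   // prints "$\""
echo quoteLiteralPattern('a\\b'), "\n"; // prints "a\\b"
```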

Regex patterns must be in Perl-compatible regular expression (preg) format. Special characters (like \t, \n or \x3H) can only be used in regexes; every \ in a literal string is escaped.

Sub-patterns may be defined, and back-references (like \1) may be used. Any sub-patterns detected will be passed to the token handler in the variable $yysubmatches.

In addition, lookahead expressions and once-only expressions are allowed. Lookbehind expressions are impossible (scanning always proceeds forward from the current position), and recursion (?R) cannot work and is not allowed.

  /*!lex2php
  %counter {$this->N}
  %input {$this->data}
  %token {$this->token}
  %value {$this->value}
  %line {$this->linenumber}
  alpha = /[a-zA-Z]/
  alphaplus = /[a-zA-Z]+/
  number = /[0-9]/
  numerals = /[0-9]+/
  whitespace = /[ \t\n]+/
  blah = "$\""
  blahblah = /a\$/
  GAMEEND = @(?:1\-0|0\-1|1/2\-1/2)@
  PAWNMOVE = /P?[a-h]([2-7]|[18]\=(Q|R|B|N))|P?[a-h]x[a-h]([2-7]|[18]\=(Q|R|B|N))/
  */

All regexes must be delimited. Any legal preg delimiter can be used (such as the @ or / in the example above).
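
To see how sub-patterns and back-references behave in preg itself, the following sketch calls PHP's built-in preg_match() directly. It is a plain PCRE demonstration under the assumption that matching is anchored at the current offset (done here with \G); it is not the code that PHP_LexerGenerator emits, and the sample inputs are made up.

```php
<?php
// Plain PCRE demonstration with preg_match(); sample data is invented.
$data = 'e8=Q 1-0';
$offset = 0;

// \G anchors the match at $offset, so scanning only moves forward.
// The parenthesised parts are sub-patterns.
preg_match('/\G[a-h]([18])\=(Q|R|B|N)/', $data, $m, 0, $offset);
// $m[0] is the whole match, $m[1] and $m[2] are the sub-patterns:
// array('e8=Q', '8', 'Q')

// A back-reference: \1 must repeat the quote character that group 1 matched.
preg_match('/\G(["\'])([^"\']*)\1/', '"draw"', $q);
// $q is array('"draw"', '"', 'draw')

// A mismatched closing quote fails, because \1 is " and not '.
$mismatch = preg_match('/\G(["\'])([^"\']*)\1/', '"draw\'');
```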

Each rule lex2php block defines a lexer state. You can optionally name the state with the %statename processor instruction. State names can be used to transfer to a new lexer state with the yybegin() method:

  /*!lex2php
  %statename INITIAL
  blah {
      $this->yybegin(self::INBLAH);
      // note - $this->yybegin(2) would also work
  }
  */
  /*!lex2php
  %statename INBLAH
  ANYTHING {
      $this->yybegin(self::INITIAL);
      // note - $this->yybegin(1) would also work
  }
  */

You can maintain a state stack simply by using yypushstate() and yypopstate() instead of yybegin():

  /*!lex2php
  %statename INITIAL
  blah {
      $this->yypushstate(self::INBLAH);
  }
  */
  /*!lex2php
  %statename INBLAH
  ANYTHING {
      $this->yypopstate();
      // now INBLAH doesn't care where it was called from
  }
  */
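
The difference between yybegin() and the push/pop pair can be pictured with a small stand-alone model. The method names come from the text above; the class StateModel, its current() accessor and the method bodies are assumptions about the behaviour, not the generated lexer code. The state numbers follow the notes in the earlier example (INITIAL is 1, INBLAH is 2).

```php
<?php
// Simplified, hypothetical model of the state-handling methods.
class StateModel
{
    const INITIAL = 1;
    const INBLAH = 2;

    private $state = self::INITIAL;
    private $stack = array();

    public function yybegin($state)
    {
        // Switch state without remembering the previous one.
        $this->state = $state;
    }

    public function yypushstate($state)
    {
        // Remember the current state, then switch.
        array_push($this->stack, $this->state);
        $this->state = $state;
    }

    public function yypopstate()
    {
        // Return to the most recently remembered state.
        $this->state = array_pop($this->stack);
    }

    public function current()
    {
        return $this->state;
    }
}

$model = new StateModel();
$model->yypushstate(StateModel::INBLAH);
$inside = $model->current();   // INBLAH
$model->yypopstate();
$after = $model->current();    // back to INITIAL
```

With yybegin(), the INBLAH rule has to know which state to return to; with the stack, it only has to pop.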

A code block can skip the current token and move on to the next one by returning false:

  /*!lex2php
  WHITESPACE {
      return false;
  }
  */

If you wish to re-process the current token in a new state, simply return true. If you forget to change the lexer state, this will cause an infinite loop, so be careful!

  /*!lex2php
  "(" {
      $this->yypushstate(self::INPARAMS);
      return true;
  }
  */

Lastly, if you wish to cycle to the next matching rule, return any value other than true, false or null:

  /*!lex2php
  "{@" ALPHA {
      if ($this->value == '{@internal') {
          return 'more';
      }
      ...
  }
  "{@internal" {
      ...
  }
  */

Note that this procedure is exceptionally inefficient, and it would be far better to take advantage of PHP_LexerGenerator's top-down precedence and instead code:

  /*!lex2php
  "{@internal" {
      ...
  }
  "{@" ALPHA {
      ...
  }
  */
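
Why rule order matters can be shown with a toy model of top-down precedence: rules are tried in the order they are listed, and the first one that matches at the current offset wins. The function firstMatchingRule() and the rule names are hypothetical, and the loop is only a sketch of the idea; it makes no claim about how PHP_LexerGenerator combines its patterns internally.

```php
<?php
// Toy model of top-down rule precedence (hypothetical helper).
function firstMatchingRule(array $rules, $input, $offset)
{
    foreach ($rules as $name => $pattern) {
        // \G in each pattern anchors the match at $offset.
        if (preg_match($pattern, $input, $m, 0, $offset)) {
            return $name;
        }
    }
    return null; // no rule matched
}

// Specific rule listed before the general one.
$specificFirst = array(
    'internal' => '/\G\{@internal/',
    'general'  => '/\G\{@[a-zA-Z]+/',
);

// General rule listed first: it shadows the specific rule.
$generalFirst = array(
    'general'  => '/\G\{@[a-zA-Z]+/',
    'internal' => '/\G\{@internal/',
);

echo firstMatchingRule($specificFirst, '{@internal', 0), "\n"; // internal
echo firstMatchingRule($generalFirst, '{@internal', 0), "\n";  // general
```

Listing the more specific literal first lets it win without any extra check inside the general rule's code block.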

Located in /LexerGenerator.php [line 263]

		API Tags:
Example:  Example lexer generated php code
Example:  Example usage of PHP_LexerGenerator
Example:  File_ChessPGN lexer source (complex)
Example:  File_ChessPGN lexer generated php code
Example:  Example lexer source

Information Tags:
Since:  Class available since Release 0.1.0
Copyright:  2006 Gregory Beaver
License:  PHP License 3.01
Version:  @package_version@

Methods

Method Summary
PHP_LexerGenerator   __construct()   Create a lexer file from its skeleton plex file.

Methods
Constructor __construct  [line 273]

  PHP_LexerGenerator __construct( string $lexerfile  )

Create a lexer file from its skeleton plex file.

Parameters:
string   $lexerfile:  path to the plex file


Documentation generated on Sun, 02 Jul 2006 08:51:25 -0400 by phpDocumentor 1.3.0