How MathLex Works

MathLex works in two phases. The first phase compiles a MathLex expression into an Abstract Syntax Tree (AST) that can be represented in memory, and the second phase converts the AST into some type of output.

Input → Syntax Tree

When provided with a valid MathLex string, MathLex.parse() produces an abstract syntax tree (AST) representing the inferred value of the MathLex code. Under the hood, the first phase has two components: a preprocessor called a Tokenizer and then the main Parser.

The Tokenizer is responsible for translating the characters in the MathLex input string into a list of Tokens, a way to group related characters into a single symbol. For example, "<=" is shorthand for "less than or equal to" (in display math, '≤') and is comprised of two separate characters. The Tokenizer groups these characters into a TLessEqual Token for the Parser. A list of all Tokens is given in Appendix A.

The Parser then reads the list of tokens and assembles the corresponding AST. The AST is built from different "node" types represented as a recursive array. Every node has a string name indicating the type of node, and optionally one or more subnodes for its arguments. The grammar rules used by the parser are given in the source file and an overview of all produced tree nodes are given in Appendix B.

For example, the MathLex input for the quadratic formula, \[ x = \frac{-b \pm \sqrt{b^2-4a\,c}}{2a} \] is x = (-b +/- sqrt(b^2-4*a*c))/(2*a). This is an equation, so the root node is an equality (=), and its two subnodes are an identifier (\( x \)) and a quotient (÷), which is further broken down into its subnodes as displayed in the figure below.

AST for the Quadratic Formula
AST for the Quadratic Formula

Syntax Tree → Output

The AST returned by the parser gives a mathematically faithful model of the meaning behind the interpreted input text. It is evaluated correctly by evaluating each nodes' children and then performing the parent node operation on the child values (this is called a recursive postorder traversal). Such tools to recursively evaluate the AST are called Translators or Renderers. These terms are used interchangeably. So far, tranlators have been written for LaTeX, Sage CAS (partially), and a textual output of the AST. The author plans to write additional translators (Maple, Mathematica, and MathML, for example), and volunteers willing and able to help write such translators are welcome.

LaTeX Translator

Using the quadratic formula example above, one could build a LaTeX translator from the following rules:

  • An equality is represented as "LHS = RHS"
  • Variables and numbers are expressed as-is
  • A fraction is represented as "\frac{NUMERATOR}{DENOMINATOR}"
  • Plus-or-Minus is represented as "LHS \pm RHS"
  • Negation is represented as "-SUBEXPR"
  • Square Roots are represented as "\sqrt{SUBEXPR}"
  • Subtraction is represented as "LHS - RHS"
  • Exponents are represented as "BASE^{POWER}". Note the braces around the exponent.
  • Multiplication is represented as a space between operands: "LHS \, RHS"

This latex translator would start at the root node: since it is an equality, the translator will translate the left-hand-side (LHS) and the right-hand-side (RHS) and then put an equals sign (=) between them. The LHS is a variable (\( x \)), so its translated value would be x. The RHS is a quotient, and the numerator and denominator will each have to be translated before they can be entered into the LaTeX fraction command. The translator will continue until all sub-nodes are translated, and then the root node's translation will be returned as

x = \frac{-b \pm \sqrt{b^{2} - 4 \, a \, c}}{2 \, a}

Sage Translator

The sage translator works similarly and returns the following line of code:

x == ( (? PlusMinus ?) )/(2*a)

Note that Sage does not support the Plus/Minus operation and therefore cannot accurately be translated. Future support for this operation may split the returned Sage expression into two forms: one plus, and the other minus. If the +/- operator is replaced by a +, then the Sage renderer returns

x == (-b + sqrt(b^(2) - 4 * a * c))/(2 * a)

Text-Tree Renderer

The text-tree renderer yields the following output:

    Variable: x
                    Variable: b
                        Variable: sqrt
                                Variable: b
                                Literal: 2
                                    Literal: 4
                                    Variable: a
                                Variable: c
                Literal: 2
                Variable: a