Tips

This section contains a lot of accumulated lore about using Happy.

Performance Tips

How to make your parser go faster:

  • If you are using GHC, generate parsers using the -a -g -c options, and compile them using GHC with the -fglasgow-exts option. This is worth a lot in terms of compile time, execution speed and binary size. [4]

  • The lexical analyser is usually the most performance critical part of a parser, so it’s worth spending some time optimising this. Profiling tools are essential here. In really dire circumstances, resort to some of the hacks that are used in the Glasgow Haskell Compiler’s interface-file lexer.

  • Simplify the grammar as much as possible, as this reduces the number of states and reduction rules that need to be applied.

  • Use left recursion rather than right recursion wherever possible. While not strictly a performance issue, this affects the size of the parser stack, which is kept on the heap and thus needs to be garbage collected.
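As a minimal sketch of the last tip, the fragment below shows a sequence parsed left-recursively. The non-terminals items_rev, items and item are hypothetical names chosen for illustration; the right-recursive form is left in comments for comparison.

```haskell
-- Left-recursive: each item is reduced as soon as it has been read, so
-- the parser stack stays shallow. The list is accumulated in reverse.
items_rev : item               { [$1] }
          | items_rev item     { $2 : $1 }

-- Reverse once at the end to restore source order.
items     : items_rev          { reverse $1 }

-- Right-recursive alternative: the action is simpler, but every item
-- stays on the parser stack until the last one has been read.
-- items  : item               { [$1] }
--        | item items         { $1 : $2 }
```

The single reverse at the end is the usual price of the left-recursive form. Whether it matters depends on how long the sequences in your input are.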

Compilation-Time Tips

We have found that compiling parsers generated by Happy can take a large amount of time and memory, so here are some tips on making things more sensible:

  • Include as little code as possible in the module trailer. This code is included verbatim in the generated parser, so if any of it can go in a separate module, do so.

  • Give type signatures for everything (see Type Signatures). This is reported to improve things by about 50%. If there is a type signature for every single non-terminal in the grammar, then Happy automatically generates type signatures for most functions in the parser.

  • Simplify the grammar as much as possible (applies to everything, this one).

  • Use a recent version of GHC. Versions from 4.04 onwards have lower memory requirements for compiling Happy-generated parsers.

  • Using Happy’s -g -a -c options when generating parsers to be compiled with GHC will help considerably.
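To illustrate the type-signature tip, here is a small sketch of a grammar fragment in which every non-terminal carries a signature. It assumes hypothetical Haskell types Exp (with constructors Plus and Term) and Term (with constructor Int), and a hypothetical token int that carries an Int value; none of these come from a real grammar.

```haskell
-- Each non-terminal gets a signature of the form  name :: { Type }
Exp  :: { Exp }
Exp  : Exp '+' Term     { Plus $1 $3 }
     | Term             { Term $1 }

Term :: { Term }
Term : int              { Int $1 }
```

With a signature on every non-terminal, as here, Happy has what it needs to generate signatures for most of the parser's own functions.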

Finding Type Errors

Finding type errors in grammar files is inherently difficult because the code for reductions is moved around before being placed in the parser. We currently have no way of passing the original filename and line numbers to the Haskell compiler, so there is no alternative but to look at the parser and match the code to the grammar file. An info file (generated by the -i option) can be helpful here.

Type signatures sometimes help by pinning down the particular error to the place where the mistake is made, not halfway down the file. For each production in the grammar, there’s a bit of code in the generated file that looks like this:

HappyAbsSyn<n> ( E )

where E is the Haskell expression from the grammar file (with $n replaced by happy_var_n). If there is a type signature for this production, then Happy will have taken it into account when declaring the HappyAbsSyn datatype, and errors in E will be caught right here. Of course, the error may be really caused by incorrect use of one of the happy_var_n variables.

(This section will contain more information as we gain experience with creating grammar files. Please send us any helpful tips you find.)

Conflict Tips

Conflicts arise from ambiguities in the grammar. That is, some input sequences may possess more than one parse. Shift/reduce conflicts are benign in the sense that they are easily resolved (Happy automatically selects the shift action, as this is usually the intended one). Reduce/reduce conflicts are more serious. A reduce/reduce conflict implies that a certain sequence of tokens on the input can represent more than one non-terminal, and the parser is uncertain as to which reduction rule to use. It will select the reduction rule uppermost in the grammar file, so if you really must have a reduce/reduce conflict you can select which rule will be used by putting it first in your grammar file.
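A minimal sketch of a reduce/reduce conflict, using hypothetical non-terminals start, a and b and a hypothetical token 'x': after reading a single 'x', the parser could reduce by either of the last two rules.

```haskell
-- Hypothetical fragment: a lone 'x' token matches both a and b.
start : a       { Left  $1 }
      | b       { Right $1 }

a     : 'x'     { $1 }   -- appears first in the file, so it is chosen
b     : 'x'     { $1 }   -- never used for this input
```

Swapping the order of the a and b rules would make the parser pick b instead, which is the "put it first" technique described above.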

It is usually possible to remove conflicts from the grammar, but sometimes this is at the expense of clarity and simplicity. Here is a cut-down example from the grammar of Haskell (1.2):

exp     : exp op exp0
        | exp0

exp0    : if exp then exp else exp
        ...
        | atom

atom    : var
        | integer
        | '(' exp ')'
        ...

This grammar has a shift/reduce conflict, due to the following ambiguity. In an input such as

if 1 then 2 else 3 + 4

the grammar doesn’t specify whether the parse should be

if 1 then 2 else (3 + 4)

or

(if 1 then 2 else 3) + 4

and the ambiguity shows up as a shift/reduce conflict on reading the ‘op’ symbol. In this case, the first parse is the intended one (the ‘longest parse’ rule), which corresponds to the shift action. Removing this conflict relies on noticing that the expression on the left-hand side of an infix operator can’t be an exp0 (the grammar previously said otherwise, but since the conflict was resolved as shift, this parse was not allowed). We can reformulate the exp rule as:

exp     : atom op exp
        | exp0

and this removes the conflict, but at the expense of some stack space while parsing (we turned a left-recursion into a right-recursion). There are alternatives using left-recursion, but they all involve adding extra states to the parser, so most programmers will prefer to keep the conflict in favour of a clearer and more efficient parser.

LALR(1) parsers

There are three basic ways to build a shift-reduce parser. Full LR(1) (the ‘L’ is the direction in which the input is scanned, the ‘R’ is the way in which the parse is built, and the ‘1’ is the number of tokens of lookahead) generates a parser with many states, and is therefore large and slow. SLR(1) (simple LR(1)) is a cut-down version of LR(1) which generates parsers with roughly one-tenth as many states, but lacks the power to parse many grammars (it finds conflicts in grammars which have none under LR(1)).

LALR(1) (look-ahead LR(1)), the method used by Happy and yacc, is a tradeoff between the two. An LALR(1) parser has the same number of states as an SLR(1) parser, but it uses a more complex method to calculate the lookahead tokens that are valid at each point, and resolves many of the conflicts that SLR(1) finds. However, there may still be conflicts in an LALR(1) parser that wouldn’t be there with full LR(1).

Using Happy with GHCi

GHCi’s compilation manager doesn’t understand Happy grammars, but with some creative use of macros and makefiles we can give the impression that GHCi is invoking Happy automatically:

  • Create a simple makefile, called Makefile_happysrcs:

    HAPPY = happy
    HAPPY_OPTS =
    
    all: MyParser.hs
    
    %.hs: %.y
        $(HAPPY) $(HAPPY_OPTS) $< -o $@
    
  • Create a macro in GHCi to replace the :reload command, like so (type this all on one line):

    :def myreload (\_ -> System.Process.system "make -f Makefile_happysrcs"
       >>= \rr -> case rr of { System.Exit.ExitSuccess -> return ":reload" ;
                               _ -> return "" })
    
  • Use :myreload (:my will do) instead of :reload (:r).

Basic monadic Happy use with Alex

Alex lexers are often used by Happy parsers, for example in GHC. While many of these applications are quite sophisticated, it is still quite useful to combine the basic Happy %monad directive with the Alex monad wrapper. By using monads for both, the resulting parser and lexer can handle errors far more gracefully than by throwing an exception.

The most straightforward way to use a monadic Alex lexer is to simply use the Alex monad as the Happy monad:

{
module Lexer where
}

%wrapper "monad"

tokens :-
  ...

{
data Token = ... | EOF
  deriving (Eq, Show)

alexEOF = return EOF
}

{
module Parser where

import Lexer
}

%name pFoo
%tokentype { Token }
%error { parseError }
%monad { Alex } { >>= } { return }
%lexer { lexer } { EOF }

%token
  ...

%%
  ...

{
parseError :: Token -> Alex a
parseError _ = do
  ((AlexPn _ line column), _, _, _) <- alexGetInput
  alexError ("parse error at line " ++ (show line) ++ ", column " ++ (show column))

lexer :: (Token -> Alex a) -> Alex a
lexer = (alexMonadScan >>=)
}

We can then run the finished parser in the Alex monad using runAlex, which returns an Either value rather than throwing an exception in case of a parse or lexical error:

import qualified Lexer as Lexer
import qualified Parser as Parser

parseFoo :: String -> Either String Foo
parseFoo s = Lexer.runAlex s Parser.pFoo
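
As a rough sketch of how a caller might consume that Either result, the example below uses a stand-in parseFoo with the same type, so that it can be run without a generated lexer and parser. The Foo type, the stand-in body and the report helper are all assumptions made for illustration; in a real project parseFoo would be the definition above.

```haskell
-- Hypothetical result type, standing in for the real syntax tree.
newtype Foo = Foo Int
  deriving (Eq, Show)

-- Stand-in for the real entry point (Lexer.runAlex s Parser.pFoo).
-- It accepts a single integer literal and rejects everything else.
parseFoo :: String -> Either String Foo
parseFoo s = case reads s of
  [(n, "")] -> Right (Foo n)
  _         -> Left ("parse error on input " ++ show s)

-- Turn the Either result into a message instead of catching an exception.
report :: String -> String
report input = case parseFoo input of
  Left err  -> "failed: " ++ err
  Right foo -> "ok: " ++ show foo
```

Because errors arrive as ordinary Left values, the caller can pattern match on them like any other data.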
[4] Omitting the -a option may generate slightly faster parsers, but they will be much bigger.