Tips
This section contains a lot of accumulated lore about using Happy.
Performance Tips
How to make your parser go faster:
If you are using GHC, generate parsers using the
-a -g -c
options, and compile them using GHC with the-fglasgow-exts
option. This is worth a lot, in terms of compile-time, execution speed and binary size. [4]The lexical analyser is usually the most performance critical part of a parser, so it’s worth spending some time optimising this. Profiling tools are essential here. In really dire circumstances, resort to some of the hacks that are used in the Glasgow Haskell Compiler’s interface-file lexer.
Simplify the grammar as much as possible, as this reduces the number of states and reduction rules that need to be applied.
Use left recursion rather than right recursion wherever possible. While not strictly a performance issue, this affects the size of the parser stack, which is kept on the heap and thus needs to be garbage collected.
Compilation-Time Tips
We have found that compiling parsers generated by Happy can take a large amount of time/memory, so here’s some tips on making things more sensible:
Include as little code as possible in the module trailer. This code is included verbatim in the generated parser, so if any of it can go in a separate module, do so.
Give type signatures for everything (see Type Signatures. This is reported to improve things by about 50%. If there is a type signature for every single non-terminal in the grammar, then Happy automatically generates type signatures for most functions in the parser.
Simplify the grammar as much as possible (applies to everything, this one).
Use a recent version of GHC. Versions from 4.04 onwards have lower memory requirements for compiling Happy-generated parsers.
Using Happy’s
-g -a -c
options when generating parsers to be compiled with GHC will help considerably.
Finding Type Errors
Finding type errors in grammar files is inherently difficult because the code for reductions is moved around before being placed in the parser.
We currently have no way of passing the original filename and line numbers to the Haskell compiler, so there is no alternative but to look at the parser and match the code to the grammar file.
An info file (generated by the -i
option) can be helpful here.
Type signature sometimes help by pinning down the particular error to the place where the mistake is made, not half way down the file. For each production in the grammar, there’s a bit of code in the generated file that looks like this:
HappyAbsSyn<n> ( E )
where E
is the Haskell expression from the grammar file (with $n
replaced by happy_var_n
).
If there is a type signature for this production, then Happy will have taken it into account when declaring the HappyAbsSyn datatype, and errors in E
will be caught right here.
Of course, the error may be really caused by incorrect use of one of the happy_var_n
variables.
(this section will contain more info as we gain experience with creating grammar files. Please send us any helpful tips you find.)
Conflict Tips
Conflicts arise from ambiguities in the grammar. That is, some input sequences may possess more than one parse. Shift/reduce conflicts are benign in the sense that they are easily resolved (Happy automatically selects the shift action, as this is usually the intended one). Reduce/reduce conflicts are more serious. A reduce/reduce conflict implies that a certain sequence of tokens on the input can represent more than one non-terminal, and the parser is uncertain as to which reduction rule to use. It will select the reduction rule uppermost in the grammar file, so if you really must have a reduce/reduce conflict you can select which rule will be used by putting it first in your grammar file.
It is usually possible to remove conflicts from the grammar, but sometimes this is at the expense of clarity and simplicity. Here is a cut-down example from the grammar of Haskell (1.2):
exp : exp op exp0
| exp0
exp0 : if exp then exp else exp
...
| atom
atom : var
| integer
| '(' exp ')'
...
This grammar has a shift/reduce conflict, due to the following ambiguity. In an input such as
if 1 then 2 else 3 + 4
the grammar doesn’t specify whether the parse should be
if 1 then 2 else (3 + 4)
or
(if 1 then 2 else 3) + 4
and the ambiguity shows up as a shift/reduce conflict on reading the ‘op’ symbol.
In this case, the first parse is the intended one (the ‘longest parse’ rule), which corresponds to the shift action.
Removing this conflict relies on noticing that the expression on the left-hand side of an infix operator can’t be an exp0
(the grammar previously said otherwise, but since the conflict was resolved as shift, this parse was not allowed).
We can reformulate the exp
rule as:
exp : atom op exp
| exp0
and this removes the conflict, but at the expense of some stack space while parsing (we turned a left-recursion into a right-recursion). There are alternatives using left-recursion, but they all involve adding extra states to the parser, so most programmers will prefer to keep the conflict in favour of a clearer and more efficient parser.
LALR(1) parsers
There are three basic ways to build a shift-reduce parser. Full LR(1) (the `L’ is the direction in which the input is scanned, the `R’ is the way in which the parse is built, and the `1’ is the number of tokens of lookahead) generates a parser with many states, and is therefore large and slow. SLR(1) (simple LR(1)) is a cut-down version of LR(1) which generates parsers with roughly one-tenth as many states, but lacks the power to parse many grammars (it finds conflicts in grammars which have none under LR(1)).
LALR(1) (look-ahead LR(1)), the method used by Happy and yacc, is a tradeoff between the two. An LALR(1) parser has the same number of states as an SLR(1) parser, but it uses a more complex method to calculate the lookahead tokens that are valid at each point, and resolves many of the conflicts that SLR(1) finds. However, there may still be conflicts in an LALR(1) parser that wouldn’t be there with full LR(1).
Using Happy with GHCi
GHCi’s compilation manager doesn’t understand Happy grammars, but with some creative use of macros and makefiles we can give the impression that GHCi is invoking Happy automatically:
Create a simple makefile, called
Makefile_happysrcs
:HAPPY = happy HAPPY_OPTS = all: MyParser.hs %.hs: %.y $(HAPPY) $(HAPPY_OPTS) $< -o $@
Create a macro in GHCi to replace the
:reload
command, like so (type this all on one line)::def myreload (\_ -> System.system "make -f Makefile_happysrcs" >>= \rr -> case rr of { System.ExitSuccess -> return ":reload" ; _ -> return "" })
Use
:myreload
(:my
will do) instead of:reload
(:r
).
Basic monadic Happy use with Alex
Alex lexers are often used by Happy parsers, for example in GHC.
While many of these applications are quite sophisticated, it is still quite useful to combine the basic Happy %monad
directive with the Alex monad
wrapper.
By using monads for both, the resulting parser and lexer can handle errors far more gracefully than by throwing an exception.
The most straightforward way to use a monadic Alex lexer is to simply use the Alex
monad as the Happy monad:
{
module Lexer where
}
%wrapper "monad"
tokens :-
...
{
data Token = ... | EOF
deriving (Eq, Show)
alexEOF = return EOF
}
{
module Parser where
import Lexer
}
%name pFoo
%tokentype { Token }
%error { parseError }
%monad { Alex } { >>= } { return }
%lexer { lexer } { EOF }
%token
...
%%
...
parseError :: Token -> Alex a
parseError _ = do
((AlexPn _ line column), _, _, _) <- alexGetInput
alexError ("parse error at line " ++ (show line) ++ ", column " ++ (show column))
lexer :: (Token -> Alex a) -> Alex a
lexer = (alexMonadScan >>=)
}
We can then run the finished parser in the Alex
monad using runAlex
,
which returns an Either
value rather than throwing an exception in case of a parse or lexical error:
import qualified Lexer as Lexer
import qualified Parser as Parser
parseFoo :: String -> Either String Foo
parseFoo s = Lexer.runAlex s Parser.pFoo