Haskell-Style Parsing and Interpretation
Table of Contents
1 Introduction
Haskell is a language full of embedded domain-specific languages. A "domain-specific language", or DSL, is a language specialized for a particular purpose, or "domain". The benefit of a DSL is that it can offer programming primitives directly relevant to your domain, while guarding you from common domain pitfalls. Some languages come with libraries and frameworks, while others offer full-blown, embedded DSLs.
This document presents a couple embedded Haskell DSLs for the parsing and interpretation of some regular, context-free, and context-sensitive languages. The distinction between these language classes will be made clear along the way. The DSLs presented are "embedded" in the sense that you always have the full power of Haskell at your fingertips.
Not all programming langugages allow the "embedding" of domain-specific languages. This demands that the designer of a language library can define new, and override existing operators and control-flow structures.
For instance, C++, C#, and Python, all allow advanced operator overloading, but
none of them let you to modify the behaviour of sequencing program statements
(in C# and C++, the ;
operator). Haskell lets you do this.
Haskell is also a functional programming language; which in this case means that it comes with quick and (relatively) painless "glue code" for binding seemingly disparate domain-specific code, into one cohesive whole (e.g., binding the stages of a compiler or interpreter).
1.1 Audience
The principal target audience for this tutorial are DIKU students, coming to learn Haskell after having taken an undergraduate course on Compiler Construction.
2 Abstract Syntax Trees
The input to a compiler or interpreter is an unstructured stream of bytes. To do any useful work, we recast this stream of bytes into a more structured data type, an abstract syntax tree. In a conventional, multi-pass compiler this is the job of a lexer and/or parser.
For instance, consider the strings "2+5*8" and "5*8+3". To evaluate such arithmetical expressions properly, we need to mind the rules of operator precedence. One possible abstract syntax tree for such arithmetical expressions can be defined as follows:
data Expr = Add Expr Expr | Mul Expr Expr | Num Integer deriving (Show)
This defines a type constructor Expr
, with three data constructors:
Add :: Expr -> Expr -> Expr Mul :: Expr -> Expr -> Expr Num :: Integer -> Expr
This notation means that Add
and Mul
are functions that given two Expr
,
yield an Expr
, and Num
is a function that given an Integer
yields an
Expr
.
The last line of the Expr
declaration:
deriving (Show)
Means that Haskell should employ some default behaviour when you attempt to
print values of type Expr
. Without this line, the Haskell will complain that
Expr
is not an instance of Show
, and so cannot be "shown".
The above strings would then be represented as the Expr
values (Add (Num 2)
(Mul (Num 5) (Num 8)))
and (Add (Mul (Num 5) (Num 8)) (Num 3))
.
Here, evaluation order no longer is ambiguous, and we can walk the syntax tree to evaluate it:
eval :: Expr -> Integer eval (Add e1 e2) = (eval e1) + (eval e2) eval (Mul e1 e2) = (eval e1) * (eval e2) eval (Num x) = x
2.1 Files and Modules
A Haskell file begins with a module
declaration:
module Ast where data Expr = Add Expr Expr | Mul Expr Expr | Num Integer deriving (Show)
The basename of the Haskell file name must be the name of the module (e.g.,
the above module should be stored in a file called Ast.hs
).
2.2 Testing
ghci
3 Regular Expressions
Regex | Haskell |
---|---|
x | char :: Char -> Parser Char |
xyz | string :: String -> Parser String |
| | (+++) :: Parser a -> Parser a -> Parser a |
* |
many :: Parser a -> Parser [a] |
+ |
many1 :: Parser a -> Parser (a, [a]) |
? |
option :: Parser a -> a -> Parser a |
[] |
choose :: [a] -> Parser a |
Before considering
3.1 Modules and Directories
Modules can be structured in directories. This allows you hide module internals, while permitting submodules with broad access rights. This in turn lets you write internal module tests, while exporting
We can put the implementation of the parser in Parser/Impl.hs
, exporting
everything:
module Parser.Impl where
This allows to write a comprehensive test module testing parser internals in
Parser/Test.hs
:
module Parser.Test where
As to the parser itself, we can declare a new module which imports
Parser.Impl
, but hides all unnecessary functionality:
module Parser where import Parser.Impl
3.2 Testing
Shell scripts.
#!/usr/bin/env bash set -euo pipefail ghci -v0 Parser.hs <<EOF EOF