Haskell-Style Parsing and Interpretation

Table of Contents

1 Introduction

Haskell is a language full of embedded domain-specific languages. A "domain-specific language", or DSL, is a language specialized for a particular purpose, or "domain". The benefit of a DSL is that it can offer programming primitives directly relevant to your domain, while guarding you from common domain pitfalls. Some languages come with libraries and frameworks, while others offer full-blown, embedded DSLs.

This document presents a couple embedded Haskell DSLs for the parsing and interpretation of some regular, context-free, and context-sensitive languages. The distinction between these language classes will be made clear along the way. The DSLs presented are "embedded" in the sense that you always have the full power of Haskell at your fingertips.

Not all programming langugages allow the "embedding" of domain-specific languages. This demands that the designer of a language library can define new, and override existing operators and control-flow structures.

For instance, C++, C#, and Python, all allow advanced operator overloading, but none of them let you to modify the behaviour of sequencing program statements (in C# and C++, the ; operator). Haskell lets you do this.

Haskell is also a functional programming language; which in this case means that it comes with quick and (relatively) painless "glue code" for binding seemingly disparate domain-specific code, into one cohesive whole (e.g., binding the stages of a compiler or interpreter).

1.1 Audience

The principal target audience for this tutorial are DIKU students, coming to learn Haskell after having taken an undergraduate course on Compiler Construction.

2 Abstract Syntax Trees

The input to a compiler or interpreter is an unstructured stream of bytes. To do any useful work, we recast this stream of bytes into a more structured data type, an abstract syntax tree. In a conventional, multi-pass compiler this is the job of a lexer and/or parser.

For instance, consider the strings "2+5*8" and "5*8+3". To evaluate such arithmetical expressions properly, we need to mind the rules of operator precedence. One possible abstract syntax tree for such arithmetical expressions can be defined as follows:

data Expr = Add Expr Expr
	  | Mul Expr Expr
	  | Num Integer
	  deriving (Show)

This defines a type constructor Expr, with three data constructors:

Add :: Expr -> Expr -> Expr
Mul :: Expr -> Expr -> Expr
Num :: Integer -> Expr

This notation means that Add and Mul are functions that given two Expr, yield an Expr, and Num is a function that given an Integer yields an Expr.

The last line of the Expr declaration:

deriving (Show)

Means that Haskell should employ some default behaviour when you attempt to print values of type Expr. Without this line, the Haskell will complain that Expr is not an instance of Show, and so cannot be "shown".

The above strings would then be represented as the Expr values (Add (Num 2) (Mul (Num 5) (Num 8))) and (Add (Mul (Num 5) (Num 8)) (Num 3)).

Here, evaluation order no longer is ambiguous, and we can walk the syntax tree to evaluate it:

eval :: Expr -> Integer
eval (Add e1 e2) = (eval e1) + (eval e2)
eval (Mul e1 e2) = (eval e1) * (eval e2)
eval (Num x) = x

2.1 Files and Modules

A Haskell file begins with a module declaration:

module Ast where

data Expr = Add Expr Expr
	  | Mul Expr Expr
	  | Num Integer
	  deriving (Show)

The basename of the Haskell file name must be the name of the module (e.g., the above module should be stored in a file called Ast.hs).

2.2 Testing

ghci

3 Regular Expressions

Regex Haskell
x char :: Char -> Parser Char
xyz string :: String -> Parser String
| (+++) :: Parser a -> Parser a -> Parser a
* many :: Parser a -> Parser [a]
+ many1 :: Parser a -> Parser (a, [a])
? option :: Parser a -> a -> Parser a
[] choose :: [a] -> Parser a

Before considering

3.1 Modules and Directories

Modules can be structured in directories. This allows you hide module internals, while permitting submodules with broad access rights. This in turn lets you write internal module tests, while exporting

We can put the implementation of the parser in Parser/Impl.hs, exporting everything:

module Parser.Impl where

This allows to write a comprehensive test module testing parser internals in Parser/Test.hs:

module Parser.Test where

As to the parser itself, we can declare a new module which imports Parser.Impl, but hides all unnecessary functionality:

module Parser where

import Parser.Impl

3.2 Testing

Shell scripts.

#!/usr/bin/env bash

set -euo pipefail

ghci -v0 Parser.hs <<EOF

EOF

4 Context-Free Grammars

5 Context-Sensitive Languages

6 Interpretation

Created: 2016-05-12 Thu 18:16

Emacs 24.5.1 (Org mode 8.2.10)

Validate