Compilerbau: Berechnung von FIRST- und FOLLOW-Mengen

Beispiel in Haskell

Zentrale Datentypen für Symbole, Regeln, Grammatiken

type Symbol      = String
 
type Nonterminal = Symbol
 
type Terminal    = Symbol
 
type Word        = [Symbol]
 
type SymSet      = Set Symbol
 
type SymMap      = Map Symbol SymSet
 
type Rule        = (Nonterminal, Word)
 
type Rules       = Set Rule
 
type Grammar     = (SymSet, SymSet, Rules, Nonterminal)
--                   N       T       P      S

Funktionen für eigene Kontrollstrukturen

-- loop over a set of values, e.g. Symbols, Rules, ...
 
forEachElem :: (v -> a -> a) -> Set v -> a -> a
forEachElem op =
  flip (foldl' (flip op))
 
-- loop over a list of values, e.g. Symbols, Words, ...
 
forEach :: (v -> a -> a) -> [v] -> a -> a
forEach op =
  flip (P.foldr op)
 
-- repeated application of a function
-- result: [x, f x, f (f x), f (f (f x)), ...]
 
iterate :: (a -> a) -> a -> [a]
iterate f x = x : iterate f (f x)
 
-- look for the least fixpoint
-- in a list constructed by an iterate
 
fixpoint :: Eq a => [a] -> a
fixpoint (x1 : xs1@(x2 : _))
  | x1 == x2  = x1
  | otherwise = fixpoint xs1

Die Nullable-Menge und die FIRST- und FOLLOW-Mengen werden iterativ berechnet.

Es wird mit einfachen Anfangsmengen begonnen, diese werden schrittweise bei jeder Iteration vergrößert.

Die Iteration wird abgebrochen, wenn sich die Mengen nicht mehr vergrößern (fixpoint).

Während einer Iteration wird eine Schleife über alle Produktionen ausgeführt (forEachElem).

Während der Verarbeitung einer Regel werden Schleifen über alle Symbole einer rechten Seite (forEach) benötigt.

Weiter muss über alle Elemente von Symbolmengen iteriert werden (forEachElem)

Algorithmus für die Menge der auf ε ableitbaren Nichtterminal-Symbole

-- the set of nullable symbols
 
nullables :: Grammar -> SymSet
nullables = fixpoint . nullables'
 
-- the list of intermediate results
-- when computing the nullable symbols
 
nullables' :: Grammar -> [SymSet]
nullables' (n, t, rules, s) =
  iterate nullSyms empty   -- start with empty set
  where
    nullSyms :: SymSet -> SymSet
    nullSyms nsys =
      forEachElem nullSym rules nsys
      where
        nullSym :: Rule -> SymSet -> SymSet
        nullSym (x, ys) acc
          | nullableWord nsys ys = insert x acc  -- extend !!!
          | otherwise            =          acc
 
-- test nullable for all symbols in a word
 
nullableWord :: SymSet -> Word -> Bool
nullableWord nulls w =
  forEach (\ y r -> y `member` nulls && r) w True

Eine Iteration besteht aus einer Schleife über alle Regeln und einer Schleife über alle Zeichen der rechten Seite einer Regel.

Algorithmus für die FIRST-Mengen

-- get the least fixpoint of the list of first sets
 
firstSets :: SymSet -> Grammar -> SymMap
firstSets nulls =
  fixpoint . firstSets' nulls
 
-- for traces we want the list of all intermediate results
 
firstSets' :: SymSet -> Grammar -> [SymMap]
firstSets' nulls (n, t, rules, s) =
  iterate firstSyms initFirstSyms  -- initial table
  where
    firstSyms :: SymMap -> SymMap
    firstSyms fsyms =
      forEachElem firstSym rules fsyms
      where
        -- insert first set of RHS into firstSyms of LHS
        firstSym :: Rule -> SymMap -> SymMap
        firstSym (x, ys) acc =
          insertSyms x (first nulls fsyms ys) acc  -- extend sets
 
    -- init first map
    -- for all terminal syms t: first(t) = {t}
    -- for all nonterminals  n: first(n) = {}
 
    initFirstSyms :: SymMap
    initFirstSyms = initT `unionSyms` initN
      where
        initT, initN :: SymMap
        initT =
          forEachElem (\sym -> insertSyms sym (singleton sym)) t emptySyms
 
        initN =
          forEachElem (\sym -> insertSyms sym empty) n emptySyms
 
-- take a word [y1,y2,..,yn], e.g. a right hand side
-- of a grammar rule, and compute the FIRST set
-- with respect to the table "fsets" and the "nulls" set
 
first :: SymSet -> SymMap -> Word -> SymSet
first nulls fSets w =
  forEach firstSym w empty
  where
    firstSym :: Symbol -> SymSet -> SymSet
    firstSym x r
      | x `member` nulls = fx `union` r
      | otherwise            = fx
      where
        fx = lookupSyms x fSets

Eine Iteration besteht aus einer Schleife über alle Regeln (firstSyms).

Die FIRST-Menge der linken Seite einer Regel wird um first von der rechten Seite vergrößert.

first nimmt eine rechte Seite (ein Wort) und berechnet unter Berücksichtigung der ε-Symbole aus der FIRST-Tabelle für Zeichen die FIRST-Menge der rechten Seite.

Diese Berechnung enthält wieder eine Schleife über die rechte Seite.

Algorithmus für die Berechnung der FOLLOW-Mengen

followSets :: SymSet -> SymMap -> Grammar -> SymMap
followSets nulls firsts g =
  fixpoint $ followSets' nulls firsts g
 
-- for traces we want the list of intermediate results
 
followSets' :: SymSet -> SymMap -> Grammar -> [SymMap]
followSets' nulls firsts (n, t, rules, s) =
  iterate followSyms initFollowSyms
  where
    followSyms :: SymMap -> SymMap
    followSyms fsyms =
      forEachElem followSym rules fsyms
      where
        followSym :: Rule -> SymMap -> SymMap
        followSym (x, ys) = followX (reverse ys) . followYS ys
          where
 
            -- extend follow of last symbol of RHS ys' by follow of LHS x
            -- and in case of nullables at the end of RHS
            -- extend follow of the previous symbols too
            -- observe followX is called with reversed RHS
 
            followX :: Word -> SymMap -> SymMap
            followX ys' sm =
              forEach addFX ys' emptySyms `unionSyms` sm
              where
                addFX :: Symbol -> SymMap -> SymMap
                addFX y r
                  -- optimization: terminals don't need to be processed
                  | y `member` t     = emptySyms
                  | y `member` nulls = r' `unionSyms` r
                  | otherwise        = r'
                  where
                    r' = singletonSyms y (lookupSyms x fsyms)
 
            -- extend follow(y1) by first(y2) and
            -- in case of nullable(y2) the y3 and so on
 
            followYS :: Word -> SymMap -> SymMap
            followYS w sm =
              forEach followRHS (rhs w) sm
              where
 
                -- split a RHS [y1, y2, ..., yn] into
                -- [(y1, [y2, ..., yn]), (y2, [..., yn]), ..., (yn, [])]
 
                rhs :: Word -> [(Symbol, Word)]
                rhs = map (head &&& tail) . init . tails
 
                followRHS :: (Symbol, Word) -> SymMap -> SymMap
                followRHS (y1, ys) r1
                  -- optimization: terminals don't need to be processed
                  | y1 `member`t = r1
                  | otherwise    = insertSyms y1 (first nulls firsts ys) r1
 
    -- optimization: for parser construction
    -- follow sets are only used for nontermnals
    -- so the table of follow sets is restricted to n
    -- this can reduce the # of iterations
    -- and the work per rule
 
    initFollowSyms :: SymMap
    initFollowSyms =
      forEachElem (\sym -> insertSyms sym empty) n emptySyms

Eine Iteration besteht aus einer Schleife über alle Regeln (followSyms).

Aus einer Regel kann auf zwei Wegen Information über die FOLLOW-Mengen gewonnen werden (followX und followYS).

Aus X ::= Y₁ ... Y_n kann geschlossen werden, dass FOLLOW(X) in FOLLOW(Yn) enthalten ist.

Aus X ::= ...Y_iys kann geschlossen werden, dass für ys, dem Wort hinter Y_i, first(ys) in FOLLOW(Y_i) enthalten ist.

In diesem Algorithmus sind vier geschachtelte Schleifen enthalten.

Berechnung von FIRST- und FOLLOW-Mengen

Algorithmen für Nullable, FIRST und FOLLOW

Zentrale Datentypen für Symbole, Regeln, Grammatiken

Funktionen für eigene Kontrollstrukturen

Algorithmus für die Menge der auf ε ableitbaren Nichtterminal-Symbole

Algorithmus für die FIRST-Mengen

Algorithmus für die Berechnung der FOLLOW-Mengen