Objectives

  • Explore CFLs from the perspective of context-free grammars

Homework Problems

Total: 10 points

Submitting

Submit this assignment on Gradescope.

The submission should contain only a zip archive of your code and the associated README. Don’t forget to submit a README file containing the required information, including time spent and resources consulted.

The submission must include the following files (NOTE: everything is case-sensitive):

  • a Makefile with the following targets:

    • setup (optional)

    • run-hw6-dyck

    • run-hw6-generate-n

    • run-hw6-verify

    • run-hw6-derive

  • a README containing the required information,

  • the source code files needed by your Makefile,

1. Represent and Produce a CFG

This problems asks you to demonstrate you can design CFGs.

Recall Definition 2.2 from the book. A CFG is a 4-tuple $(V,\Sigma,R,S)$, where:

  • $V$ is a finite set of variables,

  • $\Sigma$ is a finite alphabet,

  • $R: V \times \textrm{ Stringof } (V \cup \Sigma)$ is a finite set of rules, and

  • $S$ is the start variable

Given an English language description of a context free language, you will write in your own internal representation a grammar generating that language. Your grammar should be /unambiguous/. You will also construct a function CFG->XML to output the result as an XML element. Create a CFG that generates the following language L. You can assume alphabet $\Sigma = \{(,),[,]\}$.

$ L = \{w \mid w = \textrm{ is a string of well-balanced matching brackets}\}$.

Your Tasks

  • Design a data representation for this formal definition of CFGs. You may use objects, structs, or anything else available in your language.

  • Write up an informal description of your chosen representation in a section of your README, labeled CFG Data Representation.

  • Implement your data representation in your chosen language.

  • Create a CFG that generates the given language L.

  • Construct a function CFG->XML to output the result as an XML

You will produce as output an XML structure element containing the productions of your grammar for this language. We will exercise your output by testing it against a sequence of strings in the language.

Your solution will be tested as follows:

  • Input (from stdin): Nothing

  • Expected Output (to stdout): an XML structure element, containing the productions of your grammar. Note we will manually inspect that you are in fact constructing the CFG and not just printing the output.

  • Makefile target name: run-hw6-dyck

  • Example:

    make -sf Makefile run-hw6-dyck

    Output:

<structure>
<type>grammar</type>
<production><left>S</left><right>[S]</right></production>
<production><left>S</left><right>(S)</right></production>
<production><left>S</left><right>SS</right></production>
<production><left>S</left><right/></production>
</structure>

2. Generate Strings in a CFG

This problem asks you to demonstrate you understand the JFLAP CFG format, to internalize a new technical term related to derivations, and know what it means to generate strings a language.

You will below a JFLAP file containing a CFG representing the language of types in a simple ML-like language. I include below a diagram from “Programming in Standard ML” describing this same information. The terminals of this grammar are the tokens, i.e., the “words”, of the language, which includes identifiers (like labels, ids, or type variables), parens and curly braces, , *, and punctuation (colon and comma). This means all names would be leaves in a parse tree: we don’t separate them into individual characters. Whitespace is not included in the terminals and should be ignored. Notice that real-world languages are more complex than the languages we have thus-far worked with; this is just to describe a type in ML.

In fact, the JFLAP format requires the variables be (a subset of the) single capital letters and treats any other characters as terminals. You should adhere to this format. I’ve given you a version of this situation with a fixed quantity of labels, type variables, and IDs to both simplify your lives and to accommodate JFLAP. More specifically, you will receive the name of a JFF file containing the description of a CFG, and a number $n$ representing the maximum quantity of simultaneous substitutions.

The productions of a grammar are also called (Cf. 2.1) “substitution rules” or simply “rules.” Define simultaneous substitution into $w$ of $R$ as, if $w$ is a word $abXdcYdZX$, with $a$, $b$, $c$, $d$, in $\Sigma$ and $X$, $Y$, $Z$ in $V$, as the set of all strings produced by applying at most one rule from $R$ for each instance of each variable in $w$. Note that not applying a rule to a variable is also a valid operation. There are almost always more than one ways to perform a simultaneous substitution into a string. To help you we use the following one-character abbreviations: Type, TypeVariable, Id, Label, TUple (a parethesized grouping of types), and Struct (a non-empty sequence of label-type pairs).

Your Tasks

  • Open and read in a file (types.jff for this example) containing a grammar

  • Construct an instance of this grammar in the internal representation you devised

  • Generate the set of all strings in the language of the grammar produced within a fixed number of simultaneous substitutions.

  • Write out the set of strings one string per line

Your solution will be tested as follows:

  • Input (from stdin): The name of a .jff file containing a CFG, and a number $n$

  • Expected Output (to stdout): the set of all strings derivable through up to $n$ simultaneous derivations, one string per line.

  • Makefile target name: run-hw6-generate-n

  • Example:

    printf "types.jff 2" | make -sf Makefile run-hw6-generate-n

    Output:

    d
    c
    b
    a
    t4
    t3
    t2
    t1
    

    Please notice the set of strings gets dramatically larger as the values of $n$ increases.

3. Verify a Simultaneous Derivation

This problem asks you to demonstrate that you understand how to derive strings from a grammar.

Your task is to verify a putative simultaneous derivation of a string in the language of the grammar, starting with the string containing only the start symbol.

Again, we ask you to verify /simultaneous derivations/.

Your Tasks

  • Given a putative simultaneous derivation, decide if this sequence represents a valid simultaneous derivation in the language of the grammar of types.jff.

Your solution will be tested as follows:

  • Input (from stdin): A putative simultaneous derivation, one intermediate string per line, starting with the string containing only the initial symbol on the first line.

  • Expected Output (to stdout):

    • accept, if this sequence represents a valid simultaneous derivation in the grammar,

    • otherwise reject otherwise

  • Makefile target name: run-hw6-verify

  • Example:

    printf "T\n{S}\n{L:T,S}\n{fst:V,L:T}\n{fst:t1,snd:V}\n{fst:t1,snd:t2}\n" | make -sf Makefile run-hw6-verify

    Output: accept

    printf "T\n{T}\n{L:T,S}\n{fst:V,L:T}\n{fst:t1,snd:V}\n{fst:t1,snd:t2}\n" | make -sf Makefile run-hw6-verify

    Output: reject

    printf "T\n{S}\n{L:T,S}\n{fst:V,L:T}\n" | make -sf Makefile run-hw6-verify

    Output: reject

    printf "T\n{S}\n{L:T,S}\n{fst:V,L:V}\n{fst:t1,snd:V}\n{fst:t1,snd:t2}\n" | make -sf Makefile run-hw6-verify

    Output: reject

    printf "T\n{S}\n{L:T,S}\n{fst:V,L:T}\n{fst:V,L:V}\n{fst:t1,snd:t2}\n" | make -sf Makefile run-hw6-verify

    Output: accept

4. Produce a simultaneous derivation

You will implement a program that, given a JFLAP XML description of a CFG, an intermediate source string in the language of that grammar, and an intermediate target string in the language of that grammar, will return a simultaneous derivation of the target from the source. We guarantee that the strings we provide will be intermediate strings of the grammar. See the both the discussion of Lemma 2.21 and the discussion after Definition 2.2 (“yields”, “derives”) distinguishing the language of the grammar from the language of intermediate strings

You will produce a derivation of the intermediate string in the grammar. You will write out the sequence of strings in the derivation, one per line, starting with the initial source string, and ending with the target string in question. Our checks will ensure that this sequence of strings in your derivation follows from the grammar.

Your Tasks

  • Given the name of a JFF file containing the description of a CFG, an intermediate source string in the language of the grammar, and an intermediate target string in the language of this grammar, construct a valid simultaneous derivation in the language of the grammar from the source to the target.

Your solution will be tested as follows:

  • Input (from stdin): A filename, followed by a source string, followed by a target string. These will be newline-separated, to account for derivations beginning with and ending with the empty string. Likewise you will use a blank link in your output to indicate the empty string.

  • Expected Output (to stdout):

    • a sequence of intermediate strings, one per line, representing a valid simultaneous derivation in the grammar
  • Makefile target name: run-hw6-derive

  • Example:

    printf "types.jff\n{fst:V,L:T}\n{fst:t1,snd:t2}\n" | make -sf Makefile run-hw6-derive

    Output:

    {fst:V,L:T}
    {fst:t1,snd:V}
    {fst:t1,snd:t2}
    

    printf "types.jff\nT\n{fst:t1,snd:t2}\n" | make -sf Makefile run-hw6-derive

    T
    {S}
    {L:T,S}
    {fst:V,L:T}
    {fst:t1,snd:V}
    {fst:t1,snd:t2}
    

Updated: