{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parser grammar\n",
    "\n",
    "### Primitives\n",
    "\n",
    "The boundary between what we consider *primitives* and derived parsers can be somewhat vague; nevertheless, here is a selection of the most important primitive parsers.\n",
    "\n",
    "`value(x)`\n",
    ": Always succeeds without consuming input, and returns `x`.\n",
    "\n",
    "`fail(msg)`\n",
    ": Always fails, raising an exception with `msg` as its message.\n",
    "\n",
    "`item`\n",
    ": Gets a single byte from the stream.\n",
    "\n",
    "`text_literal(str)`\n",
    ": Succeeds if the next characters in the stream exactly match `str`.\n",
    "\n",
    "`char_pred(pred)`\n",
    ": Advances the end of the cursor past the next character if `pred` holds for it.\n",
    "\n",
    "`text_end_by(char)`\n",
    ": Advances the end of the cursor until `char` is found.\n",
    "\n",
    "`push(x)`\n",
    ": Pushes `x` onto the auxiliary stack.\n",
    "\n",
    "`pop()`\n",
    ": Pops a value from the auxiliary stack.\n",
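    "\n",
    "To make these descriptions concrete, here is a minimal sketch of some of the primitives in plain Python. It is *not* the library's implementation: it assumes a hypothetical `Parser` class wrapping a function `run(data, pos, stack)` that returns `(result, new_pos)`, signals failure with a hypothetical `Failure` exception, and tracks a plain integer position instead of a cursor. `char_pred` and `text_end_by` are omitted.\n",
    "\n",
    "```python\n",
    "class Failure(Exception):\n",
    "    \"\"\"Hypothetical exception signalling that a parser did not match.\"\"\"\n",
    "\n",
    "\n",
    "class Parser:\n",
    "    \"\"\"Hypothetical parser: wraps run(data, pos, stack) -> (result, new_pos).\"\"\"\n",
    "\n",
    "    def __init__(self, run):\n",
    "        self.run = run\n",
    "\n",
    "    def __rshift__(self, f):\n",
    "        # Bind: feed this parser's result to f, then run the parser f returns.\n",
    "        def bound(data, pos, stack):\n",
    "            x, pos = self.run(data, pos, stack)\n",
    "            return f(x).run(data, pos, stack)\n",
    "        return Parser(bound)\n",
    "\n",
    "\n",
    "def value(x):\n",
    "    # Always succeeds without consuming input.\n",
    "    return Parser(lambda data, pos, stack: (x, pos))\n",
    "\n",
    "\n",
    "def fail(msg):\n",
    "    # Always fails, with `msg` as the error text.\n",
    "    def run(data, pos, stack):\n",
    "        raise Failure(msg)\n",
    "    return Parser(run)\n",
    "\n",
    "\n",
    "def _item(data, pos, stack):\n",
    "    # Return the next byte (as a length-1 bytes object) and advance by one.\n",
    "    if pos >= len(data):\n",
    "        raise Failure(\"unexpected end of input\")\n",
    "    return data[pos:pos + 1], pos + 1\n",
    "\n",
    "\n",
    "item = Parser(_item)\n",
    "\n",
    "\n",
    "def text_literal(s):\n",
    "    # Succeed only if the input continues with exactly `s`.\n",
    "    expected = s.encode()\n",
    "\n",
    "    def run(data, pos, stack):\n",
    "        if data.startswith(expected, pos):\n",
    "            return s, pos + len(expected)\n",
    "        raise Failure(f\"expected {s!r} at position {pos}\")\n",
    "    return Parser(run)\n",
    "\n",
    "\n",
    "def push(x):\n",
    "    # Put `x` on the auxiliary stack; consumes no input.\n",
    "    def run(data, pos, stack):\n",
    "        stack.append(x)\n",
    "        return None, pos\n",
    "    return Parser(run)\n",
    "\n",
    "\n",
    "def pop():\n",
    "    # Take the top value off the auxiliary stack; consumes no input.\n",
    "    return Parser(lambda data, pos, stack: (stack.pop(), pos))\n",
    "\n",
    "\n",
    "def parse_bytes(p, data):\n",
    "    # Hypothetical driver: run `p` from position 0 with an empty stack.\n",
    "    result, _ = p.run(data, 0, [])\n",
    "    return result\n",
    "\n",
    "\n",
    "assert parse_bytes(item, b\"abc\") == b\"a\"\n",
    "assert parse_bytes(text_literal(\"ab\") >> (lambda s: value(s.upper())), b\"abc\") == \"AB\"\n",
    "assert parse_bytes(push(42) >> (lambda _: pop()), b\"\") == 42\n",
    "```\n",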
    "\n",
    "We also define some derived parsers that are useful in most contexts.\n",
    "\n",
    "`whitespace`\n",
    ": Matches tabs, spaces, and newlines.\n",
    "\n",
    "`eol`\n",
    ": Matches end-of-line characters (_i.e._ either `\\n` or `\\n\\r`).\n",
    "\n",
    "`integer`\n",
    ": Matches an integer value.\n",
    "\n",
    "`scientific_number`\n",
    ": Matches a floating point number, possibly in scientific notation.\n",
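    "\n",
    "The derived parsers above can be approximated as follows. This sketch swaps in Python's `re` module as a shortcut instead of composing the primitives; the helpers `regex` and `mapped`, the `Failure` exception, and the representation of a parser as a function `(data, pos) -> (result, new_pos)` are all assumptions for illustration, not the library's code.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "\n",
    "class Failure(Exception):\n",
    "    \"\"\"Hypothetical exception signalling that a parser did not match.\"\"\"\n",
    "\n",
    "\n",
    "def regex(pattern):\n",
    "    # Hypothetical helper; here a parser is a function (data, pos) -> (result, new_pos).\n",
    "    compiled = re.compile(pattern)\n",
    "\n",
    "    def run(data, pos):\n",
    "        m = compiled.match(data, pos)\n",
    "        if m is None:\n",
    "            raise Failure(f\"expected {pattern!r} at position {pos}\")\n",
    "        return m.group(0), m.end()\n",
    "    return run\n",
    "\n",
    "\n",
    "def mapped(p, f):\n",
    "    # Apply `f` to the result of parser `p`.\n",
    "    def run(data, pos):\n",
    "        x, pos = p(data, pos)\n",
    "        return f(x), pos\n",
    "    return run\n",
    "\n",
    "\n",
    "# Tabs, spaces and newlines (carriage returns included).\n",
    "whitespace = regex(rb\"[ \\t\\n\\r]+\")\n",
    "# End of line: a newline, optionally followed by a carriage return.\n",
    "eol = regex(rb\"\\n\\r?\")\n",
    "# Optional sign followed by digits; int() accepts bytes directly.\n",
    "integer = mapped(regex(rb\"[-+]?[0-9]+\"), int)\n",
    "# Decimal number with an optional exponent; float() also accepts bytes.\n",
    "scientific_number = mapped(\n",
    "    regex(rb\"[-+]?(?:[0-9]+\\.?[0-9]*|\\.[0-9]+)(?:[eE][-+]?[0-9]+)?\"), float)\n",
    "\n",
    "assert integer(b\"42 rest\", 0) == (42, 2)\n",
    "assert scientific_number(b\"-1.5e3,\", 0) == (-1500.0, 6)\n",
    "```\n",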
    "\n",
    "### Combinators\n",
    "\n",
    "<!-- This package is strongly based on Haskell's syntax and philosophy. But Python is obviously not Haskell. That is to say, there is no nice syntax for monadic actions. In order to solve this issue, we developed a similar grammar for Python. Below, we present a description of such a grammar. -->\n",
    "The next question is how to combine our primitive parsers. We already listed the main combinators briefly; here we go into a little more detail.\n",
    "\n",
    "`choice(*p)`\n",
    ": Tries each parser in `p` in order until one succeeds. If all fail, `choice` gathers the exceptions and composes an error message from them.\n",
    "\n",
    "`sequence(*p)`\n",
    ": Runs each parser in `p` in order and returns only the result of the last one.\n",
    "\n",
    "`named_sequence(**p)`\n",
    ": Runs each parser in `p` in order and stores the results in a dictionary under their keyword names. Keys that start with an underscore are not stored.\n",
    "\n",
    "`many(p)`\n",
    ": Runs `p` repeatedly until it fails and returns a list of the parsed items.\n",
    "\n",
    "`some(p)`\n",
    ": Like `many`, but `p` must succeed at least once; otherwise `some` fails.\n",
    "\n",
    "The `many` and `some` combinators come in several flavours. Their variants `many_char` and `some_char` return a string instead of a list. The `many_char_0` and `some_char_0` flavours, in addition, do not flush the cursor.\n",
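    "\n",
    "A minimal sketch of `choice`, `sequence`, `named_sequence`, `many` and `some` follows. It assumes parsers are plain functions `(data, pos) -> (result, new_pos)` that raise a hypothetical `Failure` exception; `literal` is a hypothetical primitive used only for the demonstration, and the `_char` and `_0` flavours are not modelled. Because the position is passed explicitly, `choice` can simply retry each alternative from the same `pos`.\n",
    "\n",
    "```python\n",
    "class Failure(Exception):\n",
    "    \"\"\"Hypothetical exception signalling that a parser did not match.\"\"\"\n",
    "\n",
    "\n",
    "def literal(s):\n",
    "    # Hypothetical primitive: match the bytes `s` exactly.\n",
    "    def run(data, pos):\n",
    "        if data.startswith(s, pos):\n",
    "            return s, pos + len(s)\n",
    "        raise Failure(f\"expected {s!r} at position {pos}\")\n",
    "    return run\n",
    "\n",
    "\n",
    "def choice(*ps):\n",
    "    # Try each alternative from the same position; collect the errors.\n",
    "    def run(data, pos):\n",
    "        errors = []\n",
    "        for p in ps:\n",
    "            try:\n",
    "                return p(data, pos)\n",
    "            except Failure as e:\n",
    "                errors.append(str(e))\n",
    "        raise Failure(\"no alternative matched: \" + \"; \".join(errors))\n",
    "    return run\n",
    "\n",
    "\n",
    "def sequence(*ps):\n",
    "    # Run the parsers in order; keep only the last result.\n",
    "    def run(data, pos):\n",
    "        result = None\n",
    "        for p in ps:\n",
    "            result, pos = p(data, pos)\n",
    "        return result, pos\n",
    "    return run\n",
    "\n",
    "\n",
    "def named_sequence(**ps):\n",
    "    # Run the parsers in order; store results unless the key starts with '_'.\n",
    "    def run(data, pos):\n",
    "        results = {}\n",
    "        for name, p in ps.items():\n",
    "            x, pos = p(data, pos)\n",
    "            if not name.startswith(\"_\"):\n",
    "                results[name] = x\n",
    "        return results, pos\n",
    "    return run\n",
    "\n",
    "\n",
    "def many(p):\n",
    "    # Apply `p` until it fails (or stops consuming input); collect the results.\n",
    "    def run(data, pos):\n",
    "        items = []\n",
    "        while True:\n",
    "            try:\n",
    "                x, new_pos = p(data, pos)\n",
    "            except Failure:\n",
    "                return items, pos\n",
    "            if new_pos == pos:  # guard against looping on an empty match\n",
    "                return items, pos\n",
    "            items.append(x)\n",
    "            pos = new_pos\n",
    "    return run\n",
    "\n",
    "\n",
    "def some(p):\n",
    "    # Like `many`, but the first application of `p` must succeed.\n",
    "    def run(data, pos):\n",
    "        first, pos = p(data, pos)\n",
    "        rest, pos = many(p)(data, pos)\n",
    "        return [first] + rest, pos\n",
    "    return run\n",
    "\n",
    "\n",
    "ab = choice(literal(b\"a\"), literal(b\"b\"))\n",
    "assert many(ab)(b\"abba!\", 0) == ([b\"a\", b\"b\", b\"b\", b\"a\"], 4)\n",
    "assert named_sequence(_open=literal(b\"(\"), x=literal(b\"x\"))(b\"(x)\", 0) == ({\"x\": b\"x\"}, 2)\n",
    "```\n",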
    "\n",
    "Some derived combinators help us shape a little language to describe grammars.\n",
    "\n",
    "`optional(p, default=None)`\n",
    ": Parses `p`; if `p` fails, succeeds with `default` instead.\n",
    "\n",
    "`tokenize(p)`\n",
    ": Parses `p` followed by optional whitespace. This ensures that the next parser always starts at the beginning of the next token.\n",
    "\n",
    "`fmap(f)`\n",
    ": Takes a function `f` and returns a lambda that applies `f` to its argument and wraps the result in a `value` parser. This sounds complicated, but it lets us pass a parsed result through `f` using the `>>` operator, as in `p >> fmap(f)`. For an example, see the PPM parser at the end of this paper.\n",
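    "\n",
    "The following sketch shows how `optional`, `tokenize` and `fmap` fit together with `>>`. It assumes a hypothetical `Parser` class whose `>>` operator is monadic bind (run the left parser, pass its result to the function on the right, then run the parser that function returns); `regex`, `Failure` and `number` are hypothetical helpers, not part of the library.\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "\n",
    "class Failure(Exception):\n",
    "    \"\"\"Hypothetical exception signalling that a parser did not match.\"\"\"\n",
    "\n",
    "\n",
    "class Parser:\n",
    "    \"\"\"Hypothetical parser: wraps run(data, pos) -> (result, new_pos).\"\"\"\n",
    "\n",
    "    def __init__(self, run):\n",
    "        self.run = run\n",
    "\n",
    "    def __rshift__(self, f):\n",
    "        # Bind: feed this parser's result to f, then run the parser f returns.\n",
    "        def bound(data, pos):\n",
    "            x, pos = self.run(data, pos)\n",
    "            return f(x).run(data, pos)\n",
    "        return Parser(bound)\n",
    "\n",
    "\n",
    "def value(x):\n",
    "    # Always succeeds with `x`, consuming nothing.\n",
    "    return Parser(lambda data, pos: (x, pos))\n",
    "\n",
    "\n",
    "def regex(pattern):\n",
    "    # Hypothetical stand-in for the library's primitive parsers.\n",
    "    compiled = re.compile(pattern)\n",
    "\n",
    "    def run(data, pos):\n",
    "        m = compiled.match(data, pos)\n",
    "        if m is None:\n",
    "            raise Failure(f\"expected {pattern!r} at position {pos}\")\n",
    "        return m.group(0), m.end()\n",
    "    return Parser(run)\n",
    "\n",
    "\n",
    "def optional(p, default=None):\n",
    "    # Try `p`; if it fails, succeed with `default` without consuming input.\n",
    "    def run(data, pos):\n",
    "        try:\n",
    "            return p.run(data, pos)\n",
    "        except Failure:\n",
    "            return default, pos\n",
    "    return Parser(run)\n",
    "\n",
    "\n",
    "whitespace = regex(rb\"[ \\t\\n\\r]+\")\n",
    "\n",
    "\n",
    "def tokenize(p):\n",
    "    # Parse `p`, skip any trailing whitespace, and keep the result of `p`.\n",
    "    return p >> (lambda x: optional(whitespace) >> (lambda _: value(x)))\n",
    "\n",
    "\n",
    "def fmap(f):\n",
    "    # Lift a plain function into the form expected on the right of `>>`.\n",
    "    return lambda x: value(f(x))\n",
    "\n",
    "\n",
    "number = tokenize(regex(rb\"[0-9]+\")) >> fmap(int)\n",
    "assert number.run(b\"42   7\", 0) == (42, 5)\n",
    "```\n",
    "\n",
    "Here `tokenize` consumes the digits and the three spaces after them, and `fmap(int)` converts the matched bytes into an integer, so the parser stops right in front of the `7`.\n",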
    "\n",
    "### `named_sequence` and `construct`\n",
    "\n",
    "The `named_sequence` combinator forms a particularly useful pair with the `construct` function. On its own, `named_sequence` creates a dictionary. Often, however, we want our parse results to be instances of some class. `construct(cls)` takes the dictionary produced by the preceding parser and builds a `cls` object by forwarding the dictionary as keyword arguments.\n",
    "\n",
    "```python\n",
    "from dataclasses import dataclass\n",
    "\n",
    "@dataclass\n",
    "class Point:\n",
    "  x: float\n",
    "  y: float\n",
    "```\n",
    "\n",
    "```python\n",
    "point = named_sequence(\n",
    "  _1=tokenize(char(\"(\")),\n",
    "  x=tokenize(scientific_number),\n",
    "  _2=tokenize(char(\",\")),\n",
    "  y=tokenize(scientific_number),\n",
    "  _3=tokenize(char(\")\"))\n",
    "  ) >> construct(Point)\n",
    "```\n",
    "\n",
    "The `point` parser then constructs `Point` objects, such that\n",
    "\n",
    "```python\n",
    "parse_bytes(point, b\"(1, 2)\")\n",
    "```\n",
    "\n",
    "gives `Point(x=1, y=2)` as output.\n",
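    "\n",
    "To see why the pair works, here is a self-contained sketch of the same `Point` example. It assumes that `construct(cls)` behaves like `lambda fields: value(cls(**fields))` and that `>>` is monadic bind; the `Parser` class, the `token` helper (standing in for `tokenize` around a primitive) and the `Failure` exception are hypothetical, not the library's code.\n",
    "\n",
    "```python\n",
    "import re\n",
    "from dataclasses import dataclass\n",
    "\n",
    "\n",
    "class Failure(Exception):\n",
    "    \"\"\"Hypothetical exception signalling that a parser did not match.\"\"\"\n",
    "\n",
    "\n",
    "class Parser:\n",
    "    \"\"\"Hypothetical parser: wraps run(data, pos) -> (result, new_pos).\"\"\"\n",
    "\n",
    "    def __init__(self, run):\n",
    "        self.run = run\n",
    "\n",
    "    def __rshift__(self, f):\n",
    "        # Bind: feed this parser's result to f, then run the parser f returns.\n",
    "        def bound(data, pos):\n",
    "            x, pos = self.run(data, pos)\n",
    "            return f(x).run(data, pos)\n",
    "        return Parser(bound)\n",
    "\n",
    "\n",
    "def value(x):\n",
    "    return Parser(lambda data, pos: (x, pos))\n",
    "\n",
    "\n",
    "def token(pattern):\n",
    "    # Hypothetical helper: match `pattern`, then skip trailing whitespace,\n",
    "    # playing the role of tokenize(...) around a primitive parser.\n",
    "    compiled = re.compile(rb\"(\" + pattern + rb\")\\s*\")\n",
    "\n",
    "    def run(data, pos):\n",
    "        m = compiled.match(data, pos)\n",
    "        if m is None:\n",
    "            raise Failure(f\"expected {pattern!r} at position {pos}\")\n",
    "        return m.group(1), m.end()\n",
    "    return Parser(run)\n",
    "\n",
    "\n",
    "def named_sequence(**parsers):\n",
    "    # Run the parsers in order; keep results whose key has no leading underscore.\n",
    "    def run(data, pos):\n",
    "        fields = {}\n",
    "        for name, p in parsers.items():\n",
    "            x, pos = p.run(data, pos)\n",
    "            if not name.startswith(\"_\"):\n",
    "                fields[name] = x\n",
    "        return fields, pos\n",
    "    return Parser(run)\n",
    "\n",
    "\n",
    "def construct(cls):\n",
    "    # Forward the dictionary from named_sequence to `cls` as keyword arguments.\n",
    "    return lambda fields: value(cls(**fields))\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class Point:\n",
    "    x: float\n",
    "    y: float\n",
    "\n",
    "\n",
    "number = token(rb\"[-+]?(?:[0-9]+\\.?[0-9]*|\\.[0-9]+)(?:[eE][-+]?[0-9]+)?\") >> (\n",
    "    lambda b: value(float(b)))\n",
    "point = named_sequence(\n",
    "    _1=token(rb\"\\(\"), x=number, _2=token(rb\",\"), y=number, _3=token(rb\"\\)\"),\n",
    ") >> construct(Point)\n",
    "\n",
    "result, _ = point.run(b\"(1, 2)\", 0)\n",
    "assert result == Point(x=1.0, y=2.0)\n",
    "```\n",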
    "\n",
    "### `using_config` and `with_config`\n",
    "\n",
    "We may use the auxiliary stack to store a config variable that can be accessed from any parser. To make this more convenient, we define two functions: the `with_config` parser and the `@using_config` decorator. Functions decorated with `@using_config` should take `config` as their last argument. The `with_config` parser places a config dictionary at the bottom of the auxiliary stack.\n",
    "\n",
    "Example: the input consists of a number followed by a string, and the string is returned in upper case if the number is 1:\n",
    "\n",
    "```python\n",
    "@using_config\n",
    "def set_case(x, config):\n",
    "    config[\"uppercase\"] = (x == 1)\n",
    "    return value(None)\n",
    "\n",
    "@using_config\n",
    "def get_text(config):\n",
    "    if config[\"uppercase\"]:\n",
    "        return many_char(item, lambda x: x.decode().upper())\n",
    "    else:\n",
    "        return many_char(item, lambda x: x.decode())\n",
    "\n",
    "assert parse_bytes(\n",
    "    with_config(sequence(integer >> set_case, get_text())),\n",
    "    b'0hello') == \"hello\"\n",
    "assert parse_bytes(\n",
    "    with_config(sequence(integer >> set_case, get_text())),\n",
    "    b'1hello') == \"HELLO\"\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}