Parsing PPM files
To show how we can mix ASCII and binary data, we have an example where we parse Portable PixMap files (PPM). These files have a small ASCII header and the image itself in binary. The header looks something like this:
P6 # this marks the file type in the Netpbm family
640 480
256
<<binary rgb values: 3*w*h bytes>>
[1]:
import numpy as np
from dataclasses import dataclass
from byteparsing import parse_bytes
from byteparsing.parsers import (
text_literal, integer, eol, named_sequence, sequence, construct,
tokenize, item, array, fmap, text_end_by, optional)
The PPM header format allows for comments in the ASCII header.
[2]:
comment = sequence(text_literal("#"), text_end_by("\n"))
We define a class that should contain all the data in the header.
[3]:
@dataclass
class Header:
width: int
height: int
maxint: int
Then we can construct a parser for this header, using named_sequence
and construct
.
[4]:
header = named_sequence(
_1 = tokenize(text_literal("P6")),
_2 = optional(comment),
width = tokenize(integer),
height = tokenize(integer),
maxint = tokenize(integer)) >> construct(Header)
We’ll have to pass on the header information to the parser for the binary blob somehow, so we define a function.
[5]:
def image_bytes(header: Header):
shape = (header.height, header.width, 3)
size = header.height * header.width * 3
return array(np.uint8, size) >> fmap(lambda a: a.reshape(shape))
ppm_image = header >> image_bytes
Let’s test this on a sample image, and ignore the fact that PIL
has a perfectly good parser for PPM files itself.
[6]:
raw_data = open("python-logo.ppm", "rb").read()
image = parse_bytes(ppm_image, raw_data)
from PIL import Image
Image.frombytes(mode="RGB", size=(image.shape[1], image.shape[0]), data=image)
[6]: