writing a message notation parser for a protocol
intro⌗
Recently, I had the opportunity to start over a library that I had been building for months. I have been very unproductive these days due to outside circumstances, but when I do have time, I make up for it by developing this library.
One day, I thought: “Hey, I learned a lot from developing this library. I should start over,” and then deleted every single file in the directory, including the .git directory.
The library in question was a 9p library. Any reader of this blog knows I write too much about 9p while not having a lot of experience with 9p. I thought I'd write a library to further understand the inner workings of the protocol.
I went about writing this library the same way aqwari.net went about developing the Go library styx.
The author of styx made a bunch of Go tests to read little-endian unsigned integers of sizes 8, 16, 32 and 64. Their workflow was simple:
- Write a Go function
- Write the unit test for said Go function
- Validate the unit test
After validating each unit test, the author validated their library with some test data collected from a 9p session. If any errors came up, they examined the parsing functions and unit tests.
I set about with the same workflow, only writing it in Common Lisp. The reason I chose Common Lisp was that it lacked a 9p library. Yes, Common Lisp, a language that has existed for close to 30 years, had no 9p library.
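The little-endian readers from that workflow translate pretty directly to Common Lisp. Here is a minimal sketch of what they might look like; the r1/r2/r4 names match the helpers I use later in this post, but the bodies shown here are illustrative guesses, not the actual proto.cl code.
;; Illustrative little-endian readers over a byte vector.
(defun r1 (bytes &optional (offset 0))
  "Read an unsigned 8-bit integer from BYTES at OFFSET."
  (aref bytes offset))

(defun r2 (bytes &optional (offset 0))
  "Read a little-endian unsigned 16-bit integer from BYTES at OFFSET."
  (logior (aref bytes offset)
          (ash (aref bytes (+ offset 1)) 8)))

(defun r4 (bytes &optional (offset 0))
  "Read a little-endian unsigned 32-bit integer from BYTES at OFFSET."
  (logior (aref bytes offset)
          (ash (aref bytes (+ offset 1)) 8)
          (ash (aref bytes (+ offset 2)) 16)
          (ash (aref bytes (+ offset 3)) 24)))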
Go workflow in Common Lisp: assertions⌗
Common Lisp has no standard unit testing library; that is, nothing analogous to testing in Go. But in Common Lisp you don't need such a library, because the assert macro exists.
assert is simple: it evaluates the form you give it and signals an error if that form returns NIL.
;; These are okay
(assert "Hello")
(assert 3.14)
(assert (list 1 2 3))
(assert 'symbol)
(assert :key)
;; These aren't
(assert nil)
(assert (> 5 10))
(assert (when nil 5))
I set about writing a macro that pushes each assertion expression onto a global list, evaluates the assertion, and removes it from the list if the assertion fails.
(defvar *assertions* nil)

(defmacro assert-f (expr)
  `(progn
     (push ',expr *assertions*)
     (handler-case (assert ,expr)
       (error (c)
         (pop *assertions*)
         (error c)))))
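A quick usage sketch; the expressions here are just placeholders, not from xoripilo:
;; Passing assertions stay on *assertions*; a failing one is popped
;; off the list before the error is re-signalled.
(assert-f (= (+ 2 2) 4))
(assert-f (string= "9P2000" "9P2000"))
*assertions* ; => ((STRING= "9P2000" "9P2000") (= (+ 2 2) 4))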
Then I went about writing the message parsing functions. Luckily, 9p is one of the simplest protocols I have ever learned about, because once I thoroughly understood the message notation described in the intro(5) man page, all that was left was to parse the messages.
Take the Rerror message, which has the following notation:
size[4] Rerror tag[2] ename[s]
size is a uint32 because it is 4 bytes long, Rerror is the message type with a constant value of 107, tag is a uint16, and ename[s] is a string.
I'd then write the function, which looks something like the sketch below, and assert that it works with some test data. Since xoripilo both reads and writes a message, one can chain the written message generated by a writer into a reader and verify both at the same time.
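Here is a hedged sketch of what such a function could look like, using the r2/r4 readers sketched earlier; read-string-at is a hypothetical helper name, and the field layout simply follows the notation above (1 byte for the message type, then a 2-byte length prefix for the string).
(defun read-string-at (bytes offset)
  "Read a 9p string: a 2-byte little-endian length followed by that many bytes."
  (let ((len (r2 bytes offset)))
    ;; Assuming plain ASCII here for brevity; real code would decode UTF-8.
    (map 'string #'code-char
         (subseq bytes (+ offset 2) (+ offset 2 len)))))

(defun r-rerror (bytes)
  "Parse an Rerror message: size[4] Rerror tag[2] ename[s]."
  (list :size (r4 bytes 0)            ; total message length
        :type (r1 bytes 4)            ; should equal the Rerror constant, 107
        :tag (r2 bytes 5)
        :ename (read-string-at bytes 7)))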
The pitfalls of Go in Common Lisp⌗
At first, this approach was okay since it did the job. However, it quickly became annoying to develop with because I kept running into assertions being wrong.
Sometimes I'd leave a Common Lisp development session thinking I was closer to finishing the library, only to be shocked in the next development session when all of my assertions failed. Also, running sbcl --load proto.cl while an assertion fails means that the rest of the assertions do not run.
do not optimize; seriously, don’t.⌗
The API of xoripilo was subject to change because I had never written another protocol besides cl-scgi. I went through bouts of using streams, vectors, and literally writing my own UTF-8 decoder based on the DFA algorithm created by Bjoern Hoehrmann <bjoern@hoehrmann.de>.
I also made function parameters as complex as could be because I wanted to optimize for performance. I thought: instead of allocating string vectors unnecessarily, why not just re-use existing strings? Which makes sense on paper, until you end up with a Twalk that takes a function callback to parse N strings.
I really wanted to create an allocation-free 9p library with pre-allocated strings, bytes, et cetera, but returning indexes into a byte array seemed too awkward to deal with.
streams or byte vectors?⌗
Byte vectors are better because they're easier to deal with: simply write/read the message to/from the bytes and let the user read/write the byte vector from/to streams.
It is also easier to write one generic function that reads the message's size and type and then parses the message according to its type than to make every single message function use a stream.
;; this is good
(defun read-message (bytes stream)
  (stream-read-sequence bytes stream 0 4)
  (let ((size (r4 bytes)) (type nil))
    (stream-read-sequence bytes stream 4 5)
    (setf type (r1 bytes 4))
    (case type
      (:tversion (r-tversion bytes)))))

;; this is bad
(defun r-tversion (bytes stream)
  ;; same body as read-message
  ;; and then an extra body that reads every single element
  ;; very useless duplicate code
  ;; I'm not gonna write an example
  ;; because I care for your sanity
  )
going further: writing a message notation parser⌗
Anybody coming from a language like Go, C, JavaScript or any other such language would think to write a 9p message parser, write some abstraction of a server/client that handles requests, maybe an abstraction for files, and that's it.
Somebody coming from a Common Lisp background might have a different, more subversive idea: write a parser for the notation of the messages. A Common Lisp developer usually wants to rewrite the world in Common Lisp for the sake of spreading chaos and disturbing the peace.
I had not known this when I started the library: that my inner intentions were dubious at best and plain evil at worst.
Sarcasm aside, writing a message notation parser was a sensible solution to my problems. The obsession with writing functions optimized to the teeth led to files that ran to some 1,000 lines. This is not acceptable.
The message notation I chose was the one provided in intro(5). It was simple enough to parse: simply split the string on newlines to get each line.
That was simple enough to do without writing much code: wrap the string with make-string-input-stream and loop over (read-line) until it returns NIL. Each field was separated by a space, so I wrote a string splitter to handle that.
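A minimal sketch of those two steps; split-spaces and notation-lines are hypothetical names, not necessarily what proto.cl uses:
(defun split-spaces (line)
  "Split LINE on spaces into a list of field strings."
  (loop with start = 0
        for pos = (position #\Space line :start start)
        collect (subseq line start pos)
        while pos
        do (setf start (1+ pos))))

(defun notation-lines (notation)
  "Collect each line of a message notation, split into its fields."
  (let ((in (make-string-input-stream notation)))
    (loop for line = (read-line in nil nil)
          while line
          collect (split-spaces line))))

;; (notation-lines "size[4] Rerror tag[2] ename[s]")
;; => (("size[4]" "Rerror" "tag[2]" "ename[s]"))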
What was left was to determine the size of each message field. For example, size[4] had a size of 4, while uname[s] had a size of s and data[count] had count as its size.
We can use the splitter from before and drop the last character, ]. The size is by far the most important part of a field, because if the size is parsed incorrectly, the whole message becomes invalid.
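As a hedged sketch of that step, a bracketed field could be taken apart like this; parse-field is a hypothetical name and the real proto.cl code may differ:
(defun parse-field (field)
  "Split a bracketed field like \"size[4]\" into its name and size spec.
The size spec is an integer for fixed sizes, or a symbol such as S or COUNT
when the size refers to a string or to another field."
  (let* ((open (position #\[ field))
         (name (subseq field 0 open))
         ;; drop the trailing #\] as described above
         (size (subseq field (1+ open) (1- (length field)))))
    (list name
          (or (parse-integer size :junk-allowed t)
              (intern (string-upcase size))))))

;; (parse-field "size[4]")     => ("size" 4)
;; (parse-field "ename[s]")    => ("ename" S)
;; (parse-field "data[count]") => ("data" COUNT)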
What was cool about writing a parser for message notations, rather than for the messages themselves, was how concise it was. For example, some messages like Twalk and Rwalk have a loop structure to them: the loop size is indicated by a previous parameter. For reference, here are their message notations:
size[4] Twalk tag[2] fid[4] newfid[4] nwname[2]
nwname*(wname[s])
size[4] Rwalk tag[2] nwqid[2] nwqid*(wqid[13])
nwname and nwqid are the array lengths, and the following parameters wname[s] and wqid[13] describe the iterable field. That is, if nwname is set to 5, we'd expect 5 strings.
That meant that while you parsed the message notation, you could generate a function that parses the message. Iterable fields such as wqid[13] would be parsed through a form like this one:
(loop repeat nwqid collect (read-wqid bytes))
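To illustrate how the notation parser can spit out such a form at generation time, here is a hedged sketch; gen-iterable-reader and read-wqid are hypothetical names, not proto.cl's actual generator.
(defun gen-iterable-reader (count-field element-reader)
  "Return a form that reads COUNT-FIELD elements using ELEMENT-READER."
  `(loop repeat ,count-field
         collect (,element-reader bytes)))

;; (gen-iterable-reader 'nwqid 'read-wqid)
;; => (LOOP REPEAT NWQID COLLECT (READ-WQID BYTES))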
You might think: well, that's cool, but it is ultimately useless. No, it is not, because it meant that my parser contained itself in 256 lines of Common Lisp. That includes the message notations, the constants and every message parser/serializer.
To give you a sense of scale, proto.go and parse.go in the styxproto sub-package of styx were 1046 lines in total. That's without the packages they use, like bytes, io, strings, fmt, et cetera.
The whole styxproto package is 2676 lines in total. I don't know styxproto too well, but if its purpose is simply to parse the messages, proto.cl would be about 10% of its line count.
Cool, right?
But it gets even cooler. styx supports the normal 9p2000 version of the 9p protocol. Since the 9p protocol really is that simple, other developers have extended it for their own purposes: there is a 9p2000.u and a 9p2000.l for Unix and Linux respectively.
proto.cl can technically support both, as long as the message notation follows the same format as 9p. The functions are generated and exported on demand. Double cool, ain't it?
Triple cool, even: most of the generated functions are documented. They all have a description that also includes their notation, and types can be specified in the generator if wanted.
Common Lisp and DSLs: a match made in heaven⌗
I’ve previously created an HTML DSL in 20 lines. It seems as if Common Lisp is really a language made for DSLs.
Regardless, this solution was never intended. I wouldn't have thought to implement a DSL for message notations in order to parse messages. But here we are, and I am sticking with this solution as it is far better than anything I had imagined.
I plan to release xoripilo into the public domain some day soon, but I need to really think about the server/client aspects of the library, as well as how to package the project and where to host it; I still don't use ASDF in any of my projects because I don't understand it. I think Common Lisp has more potential in this area than any other language by a long shot.
I originally started thinking this way with a project I had done for a client, written entirely in Common Lisp: HTML, CSS, JavaScript (besides the libraries), as well as the backend, all in Common Lisp.
There were some goodies there, like a regular expression router, a locally bound *user* to avoid refetching the user, weird ways to encrypt user tokens, and a whole lotta other things. As for 9p, I had tried styx in order to build a filesystem for a chat protocol that originally used HTTP, and styx was awkward more because it was written in Go than because of any issues with the design of the library.
When I say awkward, I really mean awkward. I spent the majority of my time trying to figure out how to write a router for specific files as I didn’t wanna generate and regenerate the filesystem; instead, I wanted a synthetic file system. I went through a week not knowing what to do and eventually gave up.
Expect more Common Lisp goodies in the coming days.