ashd - a sane http daemon
intro⌗
When it comes to web servers, there are plenty of capable options, for example:
- Apache: the oldest widely used web server
- nginx: everything you’d want in a web server and more
- lighttpd: an alternative to nginx
- Caddy: nginx v2
All of these web servers implement the latest HTTP protocol with SSL support and offer a plethora of configuration directives, so your heart can rest easy knowing you can probably do whatever you want with them. But all of them lack one virtue of good software: good design.
Good design is critical to performance and security; no code at all is infinitely more secure than good code. Moreover, it follows that the fewer lines of code a program has, the simpler it is to debug.
To illustrate good design versus bad design, let’s count the lines of code of each of these web servers and see which programs do more with less. The numbers were gathered with the following shell script:
# caddy
cd caddy
echo "caddy:"
find . -iname "*.go" ! -iname "*_test.go" ! -path "./caddytest/*" -type f -exec wc -l {} \+ | grep total
cd ..
# nginx
cd nginx/src
echo "nginx:"
find . -iname "*.c" ! -path "./mail/*" -type f -exec wc -l {} \+ | grep total
cd ../..
# lighttpd
cd lighttpd2/src
echo "lighttpd:"
find . -iname "*.c" ! -path "./unittests/*" -type f -exec wc -l {} \+ | grep total
cd ../..
# apache
cd httpd/server
echo "apache:"
find . -iname "*.c" -type f -exec wc -l {} \+ | grep total
And, the results are:
caddy:
47783 total
nginx:
181990 total
lighttpd:
45784 total
apache:
64802 total
Keep in mind, however, that Caddy doesn’t parse http requests on its own and instead uses Go’s standard http library, which would add a lot more lines of code (30594, to be exact).
That’s all fine and dandy; I don’t know about you, but for me, 50k to 100k lines for a web server is completely fine. But what if I told you that you could have only 10% of that amount and still get a pretty damn good http server? You’d probably say that I am out of my mind.
I mean, if the Apache Foundation, renowned for creating industry-standard software beyond its HTTP server, couldn’t make its http server in 10k lines, how could anyone ever dream of such a thing?
Meet ashd, a sane http daemon, which itself clocks in at only 6445 lines of code. This is not simply an HTTP request parser, but a program that could qualify as a production-ready http server with features like:
- FastCGI, SCGI and normal CGI support
- Pattern matching against URLs to allow for customized behavior at certain locations
- Normal service of static files
- Infinitely extensible through outside programs
Sounds puzzling enough, doesn’t it? Well, in this article I will explain what sets ashd apart from other web servers and how ashd’s architecture, though strange and mystical, can be extremely useful in production-like environments.
traditional web-servers⌗
Traditional web servers, and in many ways traditional programs, did their best to accommodate their clients by implementing features. Many of these web servers, like nginx, expanded their feature set to keep clients using their software.
As mentioned before, nginx is a great example of this: when nginx started, it had a pretty clear directive, “do HTTP faster than any other http server”, and it succeeded at that for a long while.
Basically, what nginx did differently was use an event-oriented approach, which let it handle many connections at once even on a single-threaded system. Not only that, it made nginx far more efficient with CPU and memory, since its counterpart, Apache, dedicated a whole process to each request, which is extremely inefficient.
nginx was, and still is, the “king” of web servers. Later on it implemented support for languages like PHP through the FastCGI interface (driving much of the web today), allowed systems to scale through its upstream and server configuration directives, and even helped drive many email systems by proxying connections from itself to the actual mail servers (allowing for easier scalability).
In essence, this works, but with one single danger: the application becomes more prone to attacks. Sure, nginx has great features and all, but if one of these “extra” features has a bug, you could take down not only that “extra” feature but the whole web server.
Let’s say there was a vulnerability in nginx’s regex implementation so severe it could crash the program; that one single flaw could render all the other services offered by nginx (like email proxying) useless. Furthermore, there is no layer of separation between the regex implementation and the web server itself, meaning there is no way to tell the user “hey, we are having an issue with our regex implementation”, because it is not separated from the web server.
Clearly, this approach should be improved, but how? Well, long-time users of *nix would suggest separating a program into many small pieces that interact with each other. This works well because each program gets one specific duty that can easily be tested for vulnerabilities and buggy behavior.
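This is the same philosophy behind classic shell pipelines. As a toy illustration (the access.log file here is purely hypothetical), each small tool below does exactly one job and can be tested in isolation:
# pull out the request path from a combined-format access log,
# then count and rank the ten most requested paths
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -n 10
None of these programs knows anything about the others, yet chained together they answer a fairly specific question.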
That is precisely how ashd achieves its low line count: it delegates much of its functionality to smaller components.
unix and ashd⌗
ashd is not a single program but rather a suite of extremely useful programs designed to be used in conjunction with each other. The main program is htparser, which parses HTTP requests and hands them off to a child program.
htparser’s syntax is simple:
## starts at :8080
## serves files from /var/www
htparser plain:port=8080 -- dirplex /var/www/
## OR
## starts at 80, 443 with SSL encryption
## serves files from /var/www
htparser plain ssl:cert=/etc/ssl/private/web.pem -- dirplex /var/www
Pretty simple, right? However, you are not limited to serving files from a directory; you can also serve through FastCGI.
## starts a http server at :8080
## sends requests to the FastCGI "server" at 8888
htparser plain:port=8080 -- callfcgi -u 127.0.0.1:8888
ashd’s architecture is quite simple yet very powerful because it allows requests to be handled concurrently: the parsing is done by one program, the routing by a second, and the response by another. It also follows that this type of architecture is, in many ways, more secure than nginx’s architecture, for example.
This is because the only program that must run with root privileges is htparser, and that’s only because it listens on port 80. It is kind of strange that a program must have root privileges just to listen on a specific port, but that’s a topic for another day.
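(As an aside: on Linux, if you’d rather not run htparser as root at all, one way around the low-port restriction is to grant the binary just the port-binding capability. This is a general Linux mechanism rather than anything ashd documents, and the binary path below is an assumption:)
## Linux-specific: let this binary bind ports below 1024 without root
sudo setcap 'cap_net_bind_service=+ep' /usr/local/bin/htparser
## htparser can now bind port 80 as an unprivileged user
htparser plain -- dirplex /var/www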
Other programs like patplex (which is the router for ashd) or callfcgi can be safely isolated and kept in user space. Further hardening, like limiting syscalls with systrace, can give you greater security than your average nginx instance, and in some cases better performance too, since enforcing only the necessary syscalls pushes the programmer towards more performant designs; syscalls are more expensive than normal operations.
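To give a feel for the routing piece, here is a rough sketch of a patplex setup that sends one URL prefix to FastCGI and everything else to dirplex. The directive names (child, exec, match, point, handler, default) are written from memory of the patplex(1) documentation rather than copied from it, so treat the exact syntax as an assumption and check the man page before using it:
## /etc/ashd/patplex.rc (illustrative sketch)
child static
  exec dirplex /var/www

child app
  exec callfcgi -u 127.0.0.1:8888

## requests whose path starts with "app/" go to the FastCGI handler
match
  point ^app/
  handler app

## everything else falls through to the static handler
match
  default
  handler static
Something like htparser plain -- patplex /etc/ashd/patplex.rc would then tie it together (again, assuming patplex takes its configuration file as an argument).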
a better static web server⌗
I know this article is primarily about ashd, but let’s take a small detour to explore creativity in software engineering and web servers. Recently, I found out about kawipiko, which is a peculiar static-only http server.
kawipiko is faster because it doesn’t read from disk on every request; it reads everything once and keeps all that data in memory. It does this by generating a constant database (CDB), loading it into memory, and listening and serving from that database only.
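From what I remember of kawipiko’s documentation, the workflow looks roughly like this; the binary names and flags (kawipiko-archiver, kawipiko-server, --sources, --archive, --bind) are recalled from memory, so treat them as assumptions and check the project’s README:
## build the CDB archive once, ahead of time
kawipiko-archiver --sources ./site --archive ./site.cdb --compress gzip
## serve every request straight out of the in-memory archive
kawipiko-server --archive ./site.cdb --bind 127.0.0.1:8080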
Because kawipiko only intends to serve static files, it can use a constant database, since static files have fixed paths and are not dynamic. A web server like nginx, which also intends to support load balancing, reverse proxying, and many more features, cannot be better at serving static files than kawipiko.
One more thing I want to mention is that kawipiko could be implemented as part of ashd. In fact, it would probably serve better that way, because many websites nowadays have both static and dynamic portions. Furthermore, it could be arranged so that all requests are first matched against that database: if they exist, serve them; if not, return a custom 404 page or something similar.
general statement⌗
Though this statement is completely anecdotal and lacks scientific data, I find it to be true: “specialized software > generic software”.
We have seen, through ashd and kawipiko, how good a program’s design can be. When programs stick to a specific purpose, they can gain advantages that otherwise wouldn’t have been thought of beforehand.
ashd, to me, serves as the pinnacle of software design at a time when the obvious answer is to just replicate whatever is popular (nginx, Apache, or Caddy).
what’s missing from ashd⌗
I know I’ve talked about the benefits of ashd for far too long; truth be told, it has its “missing” areas:
- No HTTP/2 support
- ~~Lack of support for subdomains~~ apparently that can be solved through patplex or a custom program
conclusion⌗
nginx, Apache, Caddy, and lighttpd are all fine choices if you want a web server. ashd only serves as an example of what good design can achieve: in one-tenth of the lines of code, ashd achieves what these web servers do, with more room for expansion. Furthermore, it allows for easier extension, since you don’t have to write dynamic modules or learn a specific language like Go; you just need to parse the ashd format and write to the socket it provides you with.