A classic dependency issue, or how to resolve a conflict between FreeBSD pf and ipfw

Dependencies, dependencies, dependencies!

Over the years, the software development field has accumulated a strong understanding that dependencies matter: a dependency on time to market, the current state of an established "ocean", budget limits, availability of required resources, presence and stability of technologies and practices, hardware capabilities, and a myriad of other things that may hurt the business from different sides.

But honestly, it’s not so bad: many of these questions are covered by our business partners, investors, open-source communities, and the existing solutions we build a new product on.

With many points covered, we still have at least two extremes to watch and control: pure business strategy and tactics, and the technical side, where our custom code dwells. The latter may seem to have no direct impact on the business: the same goals may be achieved using different technical approaches, programming languages, or frameworks. And we can easily think of businesses where it does not make much sense to worry about it. I have seen such businesses, even ones where every single business iteration was an overhaul of the entire technical side, re-programmed in another language or framework for the sake of modernization and keeping up with the trends.

Whether we like it or not, many projects need to pay thorough attention to architecture questions because they may have a great impact on the business. Dependencies are hard, especially when we consider options for the foundation: key technologies and their providers, main programming paradigms, core frameworks and libraries, etc.

Okay, let’s assume our third-party dependencies are sorted out. Are we fine now? Unfortunately, there is another layer of dependencies, which we are the creators of. Code structure and software architecture are in question here.

The case

I was working on a defect in the FreeBSD pf firewall. It is related to the divert mechanism, which allows one to divert a network packet and pass it to an application for examination and possible modification. A divert application may decide to drop the packet or re-inject it back into the network flow. This very simple concept can be depicted as follows:

The key job of the firewall here is to avoid loops, that is, not to re-divert the same packet again and again. Such looping was the defect: the existing loop detection logic did not work as expected.

The very first patch was trivial: the packet direction check required fixing. In fact, 99% of the patch was about the respective system tests. However, some of those tests failed in the specific test environment of another FreeBSD developer. I found out that the pf divert logic works incorrectly if another firewall kernel module, ipfw, is loaded.

It feels like a classic dependency issue came to say "Hi": the code compiled and ran successfully, and the related tests passed, but a particular combination of kernel modules broke the expected behavior.

The conflict

The key data structure in the FreeBSD network stack is the mbuf (memory buffer). Its layout is the usual one: metadata plus data. For instance, the metadata links multiple buffers into a logical chain, e.g. an IP packet. Usually, a special pkthdr structure is stored as part of the data in addition to the network packet itself. It carries information important for packet processing, such as the network interface the packet was received on, flowid, fibnum, and other supporting flags. Many network-related subsystems, protocols, and mechanisms would like to add some bits of their internals to this header for their own use. But the FreeBSD project tries not to modify the header’s structure whenever a new feature wants to extend it with its specifics. One of the reasons is to avoid bloating a structure that accompanies every single network packet. A dynamic tags mechanism was added instead. It adds a little overhead, being a separate linked list to create and traverse, but it can be good enough for network features like divert, dummynet, IPsec, and so on.
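To make the "fixed header plus dynamic tags" idea concrete, here is a minimal userland sketch in C. It is a simplified model, not the kernel code: struct pkt, struct pkt_tag, the TAG_* constants, and the tag_add/tag_find/tag_delete helpers are all hypothetical names invented for illustration. In the real kernel the corresponding job is done by the mbuf tag routines documented in mbuf_tags(9), such as m_tag_prepend(), m_tag_find(), and m_tag_delete(), whose signatures differ from this toy version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tag types; the real kernel uses its own identifiers. */
enum { TAG_DIVERT = 1, TAG_DUMMYNET = 2 };

/* One optional piece of per-feature metadata, linked into a list. */
struct pkt_tag {
    struct pkt_tag *next;
    uint16_t type;   /* which feature owns this tag */
    uint16_t value;  /* feature-specific payload, e.g. a divert port */
};

/* Toy stand-in for a packet header: fixed fields plus a tag list. */
struct pkt {
    int rcvif_index;       /* interface the packet was received on */
    uint32_t flowid;
    struct pkt_tag *tags;  /* grows only when some feature needs it */
};

/* Attach a new tag at the head of the list; returns 0 on success. */
int tag_add(struct pkt *p, uint16_t type, uint16_t value)
{
    assert(p != NULL);
    struct pkt_tag *t = malloc(sizeof(*t));
    if (t == NULL)
        return -1;
    t->type = type;
    t->value = value;
    t->next = p->tags;
    p->tags = t;
    return 0;
}

/* Walk the list and return the first tag of the given type, or NULL. */
struct pkt_tag *tag_find(const struct pkt *p, uint16_t type)
{
    assert(p != NULL);
    for (struct pkt_tag *t = p->tags; t != NULL; t = t->next)
        if (t->type == type)
            return t;
    return NULL;
}

/* Unlink and free the first tag of the given type, if there is one. */
void tag_delete(struct pkt *p, uint16_t type)
{
    assert(p != NULL);
    for (struct pkt_tag **pp = &p->tags; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->type == type) {
            struct pkt_tag *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}
```

The fixed part of struct pkt never changes size, while a feature that needs extra state pays for one allocation and a list walk. That is the trade-off described above.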

And the conflict itself has historical roots:

ipfw is the native FreeBSD firewall
pf originated in OpenBSD and was ported to FreeBSD later
the divert mechanism was originally designed to be used with ipfw: it expects an ipfw-specific mbuf tag to be attached to a packet, carrying divert instructions such as the divert port, which specifies the target application
the FreeBSD port of pf mimics ipfw to work with the existing divert kernel interface: it crafts and attaches the same tag as if it came from ipfw itself

There is a strong scent of a hacky dependency, huh?

Officially, using both firewalls at the same time is not supported, but it is possible and there are production cases. The obstacle for pf is that ipfw does not care about such cases or about other firewalls: its behavior is to read and remove this tag when it finds it attached to a packet. Why not? This is an ipfw implementation detail, after all. But the presence of this tag is the only way for pf to detect re-injection and avoid the re-divert loop:

After the first diversion cycle, at step #5 the packet hits ipfw first and the tag gets removed. pf then thinks that the packet has not been diverted yet, so step #6 repeats step #2, and the unhappy loop begins.
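The loop can be reproduced in a toy model. The following C sketch is only an illustration under simplifying assumptions: the shared tag is reduced to a single boolean, ipfw is assumed to always see the packet before pf, and all names (struct pkt, ipfw_hook, pf_hook, count_diversions) are hypothetical rather than taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy packet: the shared ipfw-style divert tag is modeled as one flag. */
struct pkt {
    bool shared_divert_tag;
};

enum verdict { PASS, DIVERT };

/* ipfw model: it treats the tag as its own and removes it. */
void ipfw_hook(struct pkt *p)
{
    p->shared_divert_tag = false;
}

/* pf model: the tag is the only evidence that the packet was re-injected. */
enum verdict pf_hook(const struct pkt *p)
{
    return p->shared_divert_tag ? PASS : DIVERT;
}

/*
 * Count how many times one packet is handed to the divert application
 * before pf lets it pass. max_rounds caps the otherwise endless loop.
 */
int count_diversions(bool ipfw_loaded, int max_rounds)
{
    assert(max_rounds > 0);
    struct pkt p = { .shared_divert_tag = false };
    int diversions = 0;

    while (diversions < max_rounds) {
        if (ipfw_loaded)
            ipfw_hook(&p);           /* ipfw sees the packet first */
        if (pf_hook(&p) == PASS)
            break;                   /* recognized as re-injected */
        diversions++;                /* packet goes to the divert app */
        p.shared_divert_tag = true;  /* re-injection marks the packet */
    }
    return diversions;
}
```

In this model count_diversions(false, 10) returns 1, the healthy single diversion, while count_diversions(true, 10) returns 10: with ipfw in the path the packet is re-diverted until the cap stops it.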

The solution

Well, the issue does not look like a complex twist of dependencies. It feels rather trivial. The danger is that there are various ways to fix it, and some of them can easily become new traps that exchange one dependency issue for a brand new one.

One of the cheapest and simplest solutions is to introduce a separate tag for pf itself, so that ipfw is not involved in the diversion process at all. An additional benefit of this approach is that the ipfw logic stays unchanged, so it cannot regress as a result. Yes, the divert mechanism has to be altered, but this minimizes the possible impact on existing production systems all over the world. The OS kernel can be thought of as a very fragile thing compared to high-level userland applications: e.g. a trivial change to a carefully CPU-aligned structure may lead to interesting consequences that we would like to avoid so that we can sleep peacefully.

Hence, the final patch went this way, with additional system tests to cover the cases when ipfw is also on the scene.
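The effect of a dedicated tag can be shown with a small toy model in C. This is a sketch of the idea only, not the actual patch: the two tags are reduced to booleans, ipfw is assumed to see the packet before pf, and every name here (struct pkt, ipfw_tag, pf_tag, ipfw_hook, pf_hook, count_diversions) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy packet with two independent markers instead of one shared tag. */
struct pkt {
    bool ipfw_tag;  /* owned by ipfw, free to be consumed by it */
    bool pf_tag;    /* owned by pf, never touched by ipfw */
};

enum verdict { PASS, DIVERT };

/* ipfw model: unchanged behavior, it only removes its own tag. */
void ipfw_hook(struct pkt *p)
{
    p->ipfw_tag = false;
}

/* pf model: re-injection is detected through pf's own tag. */
enum verdict pf_hook(const struct pkt *p)
{
    return p->pf_tag ? PASS : DIVERT;
}

/* Count diversions of one packet; max_rounds guards against a loop. */
int count_diversions(bool ipfw_loaded, int max_rounds)
{
    assert(max_rounds > 0);
    struct pkt p = { .ipfw_tag = false, .pf_tag = false };
    int diversions = 0;

    while (diversions < max_rounds) {
        if (ipfw_loaded)
            ipfw_hook(&p);        /* ipfw still sees the packet first */
        if (pf_hook(&p) == PASS)
            break;                /* pf recognizes its own marker */
        diversions++;             /* packet goes to the divert app */
        p.pf_tag = true;          /* re-injection sets the pf marker */
    }
    return diversions;
}
```

Here count_diversions returns 1 whether or not ipfw is loaded, because ipfw_hook never touches the marker that pf relies on.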

Conclusions

We may obtain a strong level of confidence from best practices, design patterns, tests, and enormous help from the technology itself (e.g. Rust), but we may still have logical dependencies lurking and waiting for their show time, especially in complex, long-standing, huge projects overloaded with legacy and backward compatibility, like an operating system.

igoro.pro