The case

Initially, I had no intention of working on Netlink related stuff, I was simply testing my pf patch. The patch was meant specifically for ARMv7 architecture, but my build could not accomplish the simplest pfctl incantation:

# echo 'pass all' | pfctl -f-
pfctl: DIOCADDRULENV: Device not configured

After a quick check of 13.2-RELEASE and 14.0-RELEASE, it revealed itself as a regression on my way.

The investigation

Git bisect led to a recent point in the project history where the rule addition logic of pfctl was migrated to Netlink. The high-level overview of the libpfctl code showed that there is an issue with the parsing of a Netlink message. The existing NL_LOG facility of FreeBSD Netlink, which is enabled like sysctl net.netlink.debug.nl_parser_debug_level=9, helped to get the first details:

[nl_parser] nl_parse_attrs_raw: >> parsing 0xdcc32200 attr_type 0 len 0 (rem 120)
[nl_parser] PID 1024 nl_parse_attrs_raw: Invalid attr 0xdcc32200 type 0 len: 0
[nl_parser] nl_parse_attrs_raw: parse failed att offset 428

And our old good friend, ktrace, quickly showed the facts without additional preparations. Every single byte of the message pfctl sent can be examined:

1039 pfctl    CALL  sendto(0x4,0x2082f790,0x42c,0,0,0)
1039 pfctl    GIO   fd 4 wrote 1068 bytes
     0x0000 2c04 0000 1100 0503 0200 0000 0000 0000 0500 0000  |,...................|
     0x0410 0500 0000 0800 0100 0300 0000 0800 0200 0300 0000  |....................|
     0x0424 0600 3e00 0000 0000                                |..>.....|

The same works for the reply it received from Netlink:

1039 pfctl    CALL  recvmsg(0x4,0xbfbfd948,0)
1039 pfctl    GIO   fd 4 read 1140 bytes
     0x0000 7404 0000 0200 0002 0200 0000 0000 0000 1600 0000  |t...................|
     0x0438 0600 3e00 0000 0000 2a00 0100 496e 7661 6c69 6420  |..>.....*...Invalid |
     0x044c 6174 7472 2030 7864 6363 3332 3230 3020 7479 7065  |attr 0xdcc32200 type|
     0x0460 2030 206c 656e 3a20 3000 0000 0800 0200 0002 0000  | 0 len: 0...........|

Tracing the respective code path inside the SNL (Simple Netlink Library) with Netlink message specification beside me as a reference, I found that attributes of a message are constructed correctly, as expected, due to such obvious defects would be spotted and fixed already. But the resulting message still is distorted. By traversing the code path ahead I found out why.

As long as pfctl’s addrule request message is quite big, it does not fit the default size of the SNL writer buffer. Hence, it always triggers buffer reallocation logic to get a bigger space. I discovered a defect in the logic around buffer reallocation — respective length and offset values were not maintained correctly. It creates a buffer overflow effect. However, the actual buffer allocated was always bigger than needed, so the program was not failing due to writing to a wrong address, it was creating a wrong Netlink message instead.

The actual mess in data was done by a memcpy(3) call during buffer reallocation. The incorrect calculations made it copy data within the same buffer with a wrong offset in addition.

Assuming we copy data per byte sequentially, it would require 5 read & write operations for the following example of a usual and successful memory copying:

01 usual memcpy

But in this case, the source and the destination buffers were overlapping. Let’s imagine that the first two bytes of the destination buffer are located at the same addresses as the last two bytes of the source buffer:

02 overlapping forward memcpy

So, when it’s time to read the 4th and the 5th bytes from the source they are wrong already due to overlapping writing of the 1st and the 2nd bytes. We would expect to see D and E, but we get A and B instead. This is how we get a broken Netlink message crafted.

Nothing special, quite a trivial issue, but it’s not the end of the story.

The target architecture specifics

Okay, the logic has a bug, but my tests on other architectures like AMD64 and AArch64 showed no problems — everything works fine. So, knowing the root cause I still had an open question, which I would like to close to make sure the reasoning is correct and the fix is dealing with the root cause, not with its consequences.

The most frequently used routines of libc like memcpy, memcmp, and other invisible giants the userland applications stand on usually have optimized separate implementations per CPU architecture. So, the next step was to consider the specific implementation of the memcpy. And the dead simple answer to my question was quickly found.

From the C standard perspective, it’s implemented and works as expected. For instance, C11’s section states:

If copying takes place between objects that overlap, the behavior is undefined.

In addition, the standard declares the function with restrict qualifiers:

void *memcpy(void * restrict s1, const void * restrict s2, size_t n);

So, an implementation expects no overlap from a caller. But the history behind such functions is long and complex, the "no overlapping" expectation and the restrict keyword were not there from the very beginning. As a result, some implementations have overlap cases covered by doing backward copying, from the tail of the source buffer:

03 overlapping backward memcpy

Therefore, the answer to why ARMv7 differs from others and reveals the bug is that its implementation of memcpy does not support overlap cases. But other architectures hide the bug by getting correctly crafted Netlink messages while still doing redundant actions and extra allocations behind the scenes. There are references for a quick comparison and intuition of the implementations:


It was not a wrong usage of a libc function in this case, it was done so as a consequence of another bug. But the situation itself reminds us to consider, when feasible, hardware and environment specifics and the standards could be the first guides to help with.

Regarding the bug itself, the general recommendation could be to extract length/offset decision making logic and cover corner cases with unit tests, but it’s easier to say than do — every code and project has invisible for a reader story behind involving budgets, deadlines, development process nuances, and a myriad of other possible reasons not to get everything perfect at the first shot. The fix of this bug also involves a trade-off — it had to break the Law of Demeter.

IT happens.


Copyright © Igor Ostapenko
(handmade content)

Submit a like
Post a comment