Subject: Unexpected timestamps in tcpdump with veth + tc qdisc netem delay

(please CC me in any replies, thank you!)

Hello,

While trying to simulate large delay links using veth and netns, I came across what looks like unexpected / incorrect behavior.

I have reproduced it in Debian 4.19 and 5.10 kernels, and a quick look at mainline doesn't show any relevant deviation from Debian kernels to mainline in my limited understanding of this area of the kernel.

I have attached a simple script to reproduce the scenario. If my explanation below is not clear, please just look at the script to see what it does: it should be trivial to understand. It needs tcpdump, and CAP_NET_ADMIN (or root, etc).

Topology

root netns:
veth vec0 (192.168.233.1) paired to ves0 (192.168.233.2)
tc qdisc dev vec0 root netem delay 250ms

lab500ms netns:
veth ves0 (192.168.233.2), paired to vec0 (192.168.233.1)
tc qdisc dev ves0 root netem delay 250ms

So:
[root netns -- veth (tc qdisc netem delay 250ms) ] <> [ veth (tc qdisc netem delay 250ms) -- lab500ms netns ]

Expected RTT from a packet roundtrip (root nets -> lab500ms netns -> root netns) is 500ms.


The problem:

[root netns]: ping 192.168.233.2
PING 192.168.233.2 (192.168.233.2) 56(84) bytes of data.
64 bytes from 192.168.233.2: icmp_seq=1 ttl=64 time=500 ms
64 bytes from 192.168.233.2: icmp_seq=2 ttl=64 time=500 ms

(the RTT reported by ping is 500ms as expected: there is a 250ms transmit delay attached to each member of the veth pair)

However:

[root netns]: tcpdump -i vec0 -s0 -n -p net 192.168.233.0/30
listening on vec0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:09:09.740681 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 1, length 64
17:09:09.990891 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 1, length 64
17:09:10.741903 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 2, length 64
17:09:10.992031 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 2, length 64
17:09:11.742813 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 3, length 64
17:09:11.993009 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 3, length 64

[lab500ms netns]: ip netns exec lab500ms tcpdump -i ves0 -s0 -n -p net 192.168.233.0/30
listening on ves0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:09:09.740724 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 1, length 64
17:09:09.990867 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 1, length 64
17:09:10.741942 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 2, length 64
17:09:10.992012 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 2, length 64
17:09:11.742851 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 3, length 64
17:09:11.992985 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 3, length 64

One can see that the timestamps shown by tcpdump (also reproduced using wireshark) are *not* what one would expect: the 250ms delays are missing in incoming packets (i.e. there's 250ms missing from timestamps in packets "echo reply" in vec0, and "echo request" in ves0).

The 250ms vec0->ves0 delay AND 250ms ves0 -> vec0 delay *are* there, as shown by "ping", but you'd not know it if you look at the tcpdump. The timing shown in tcpdump looks more like packet injection time at the first interface, than the time the packet was "seen" at the other end (capture interface).

Adding more namespaces and VETH pairs + routing "in a row" so that the packet "exits" one veth tunnel and enters another one (after trivial routing) doesn't fix the tcpdump timestamps in the capture at the other end of the veth-veth->routing->veth-veth->routing->... chain.

It looks like some sort of bug to me, but maybe I am missing something, in which case I would greatly appreciate an explanation of where I went wrong...

Thanks in advance,
Henrique de Moraes Holschuh <[email protected]>


Attachments:
netns.sh (1.81 kB)

2021-04-26 17:08:25

by Eric Dumazet

[permalink] [raw]
Subject: Re: Unexpected timestamps in tcpdump with veth + tc qdisc netem delay



On 4/26/21 4:35 PM, Henrique de Moraes Holschuh wrote:
> (please CC me in any replies, thank you!)
>
> Hello,
>
> While trying to simulate large delay links using veth and netns, I came across what looks like unexpected / incorrect behavior.
>
> I have reproduced it in Debian 4.19 and 5.10 kernels, and a quick look at mainline doesn't show any relevant deviation from Debian kernels to mainline in my limited understanding of this area of the kernel.
>
> I have attached a simple script to reproduce the scenario. If my explanation below is not clear, please just look at the script to see what it does: it should be trivial to understand. It needs tcpdump, and CAP_NET_ADMIN (or root, etc).
>
> Topology
>
> root netns:
> veth vec0 (192.168.233.1) paired to ves0 (192.168.233.2)
> tc qdisc dev vec0 root netem delay 250ms
>
> lab500ms netns:
> veth ves0 (192.168.233.2), paired to vec0 (192.168.233.1)
> tc qdisc dev ves0 root netem delay 250ms
>
> So:
> [root netns -- veth (tc qdisc netem delay 250ms) ] <> [ veth (tc qdisc netem delay 250ms) -- lab500ms netns ]
>
> Expected RTT from a packet roundtrip (root nets -> lab500ms netns -> root netns) is 500ms.
>
>
> The problem:
>
> [root netns]: ping 192.168.233.2
> PING 192.168.233.2 (192.168.233.2) 56(84) bytes of data.
> 64 bytes from 192.168.233.2: icmp_seq=1 ttl=64 time=500 ms
> 64 bytes from 192.168.233.2: icmp_seq=2 ttl=64 time=500 ms
>
> (the RTT reported by ping is 500ms as expected: there is a 250ms transmit delay attached to each member of the veth pair)
>
> However:
>
> [root netns]: tcpdump -i vec0 -s0 -n -p net 192.168.233.0/30
> listening on vec0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 17:09:09.740681 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 1, length 64

Here you see the packet _after_ the 250ms delay

> 17:09:09.990891 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 1, length 64
Same here.

I do not see any problem.

> 17:09:10.741903 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 2, length 64
> 17:09:10.992031 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 2, length 64
> 17:09:11.742813 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 3, length 64
> 17:09:11.993009 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 3, length 64
>
> [lab500ms netns]: ip netns exec lab500ms tcpdump -i ves0 -s0 -n -p net 192.168.233.0/30
> listening on ves0, link-type EN10MB (Ethernet), capture size 262144 bytes
> 17:09:09.740724 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 1, length 64
> 17:09:09.990867 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 1, length 64
> 17:09:10.741942 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 2, length 64
> 17:09:10.992012 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 2, length 64
> 17:09:11.742851 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 3, length 64
> 17:09:11.992985 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 3, length 64
>
> One can see that the timestamps shown by tcpdump (also reproduced using wireshark) are *not* what one would expect: the 250ms delays are missing in incoming packets (i.e. there's 250ms missing from timestamps in packets "echo reply" in vec0, and "echo request" in ves0).
>
> The 250ms vec0->ves0 delay AND 250ms ves0 -> vec0 delay *are* there, as shown by "ping", but you'd not know it if you look at the tcpdump. The timing shown in tcpdump looks more like packet injection time at the first interface, than the time the packet was "seen" at the other end (capture interface).
>
> Adding more namespaces and VETH pairs + routing "in a row" so that the packet "exits" one veth tunnel and enters another one (after trivial routing) doesn't fix the tcpdump timestamps in the capture at the other end of the veth-veth->routing->veth-veth->routing->... chain.
>
> It looks like some sort of bug to me, but maybe I am missing something, in which case I would greatly appreciate an explanation of where I went wrong...
>

That is only because you expect to see something, but you forgot that tcpdump captures
TX packet _after_ netem.

Subject: Re: Unexpected timestamps in tcpdump with veth + tc qdisc netem delay

On Mon, Apr 26, 2021, at 14:07, Eric Dumazet wrote:
> On 4/26/21 4:35 PM, Henrique de Moraes Holschuh wrote:
[...]
> > [root netns]: tcpdump -i vec0 -s0 -n -p net 192.168.233.0/30
> > listening on vec0, link-type EN10MB (Ethernet), capture size 262144 bytes
> > 17:09:09.740681 IP 192.168.233.1 > 192.168.233.2: ICMP echo request, id 9327, seq 1, length 64
>
> Here you see the packet _after_ the 250ms delay
> > 17:09:09.990891 IP 192.168.233.2 > 192.168.233.1: ICMP echo reply, id 9327, seq 1, length 64
> Same here.
[...]
> > Adding more namespaces and VETH pairs + routing "in a row" so that the packet "exits" one veth tunnel and enters another one (after trivial routing) doesn't fix the tcpdump timestamps in the capture at the other end of the veth-veth->routing->veth-veth->routing->... chain.
> >
> > It looks like some sort of bug to me, but maybe I am missing something, in which case I would greatly appreciate an explanation of where I went wrong...
>
> That is only because you expect to see something, but you forgot that tcpdump captures TX packet _after_ netem.

That was it! Thank you very much for the quick reply, and direct, precise explanation.

I had completely forgotten about the qdisc running before interface timestamping, and I overlooked the fact that I did not try three netns in-a-chain with the qdiscs *in the middle netns*: I tried them in the two opposite edge netns, only.

Again, thank you!

--
Henrique Holschuh