2014-02-10 08:57:06

by Ortwin Glück

Subject: xfrm: is pmtu broken with ESP tunneling?

Hi,

I am using Openswan to configure an IPSec VPN (using the xfrm/netkey
backend). Large HTTP POST requests from the client seem to get stuck,
because the outgoing packets are 1530 bytes (before being wrapped into
ESP packets). The problem goes away by setting sysctl
net.ipv4.ip_no_pmtu_disc=1.

May have something to do with it:
The tunneled network is 10.6.6.6/32 and I am SNAT'ing some destinations
to that IP, so they get routed through the tunnel. Any other networks
are not to go through the tunnel.

iptables -t nat -A POSTROUTING -d "${R}" -j SNAT --to-source 10.6.6.6

It seems quite clear to me that xfrm is doing something wrong here.

Ortwin


2014-02-11 02:33:01

by Hannes Frederic Sowa

Subject: Re: xfrm: is pmtu broken with ESP tunneling?

Hi!

On Mon, Feb 10, 2014 at 09:41:54AM +0100, Ortwin Glück wrote:
> I am using Openswan to configure an IPSec VPN (using the xfrm/netkey
> backend). Large HTTP POST requests from the client seem to get stuck,
> because the outgoing packets are 1530 bytes (before being wrapped into
> ESP packets). The problem goes away by setting sysctl
> net.ipv4.ip_no_pmtu_disc=1.

This setting will shrink the path mtu to min_pmtu when a frag needed icmp is
received. It sounds like we calculate the path mtu incorrectly in case of
fragmentation.

> May have something to do with it:
> The tunneled network is 10.6.6.6/32 and I am SNAT'ing some destinations
> to that IP, so they get routed through the tunnel. Any other networks
> are not to go through the tunnel.
>
> iptables -t nat -A POSTROUTING -d "${R}" -j SNAT --to-source 10.6.6.6
>
> It seems quite clear to me that xfrm is doing something wrong here.

Can you send the output of ip route get <ip> for the problematic target, so we
can see how far off the calculated value is?

Thanks,

Hannes

2014-02-11 20:20:51

by Ortwin Glück

Subject: Re: xfrm: is pmtu broken with ESP tunneling?

On 02/11/2014 03:32 AM, Hannes Frederic Sowa wrote:
>> net.ipv4.ip_no_pmtu_disc=1.
>
> This setting will shrink the path mtu to min_pmtu when a frag needed icmp is
> received.

The UDP+ESP encapsulation adds 60 bytes to the original packet size.

ifconfig wla0 shows an mtu of 1500.

The size of the first big packet on the interface:
net.ipv4.ip_no_pmtu_disc=1: packet length is 1300
net.ipv4.ip_no_pmtu_disc=0: packet length is 1500

Length is without the ESP wrapper and UDP encapsulation. The packets are so big
that they can't even leave the wireless interface and never show up on the
router. So no ICMP packets are received. PMTU can't work with initial packets of
that size.

Dumb question: which layer discards these packets? The qdisc? Why is there no
notification to the sender?

When I increase the mtu of the interface to 2000 with ifconfig, then I start
seeing ICMP fragmentation needed from the next hop, indicating 1500 as the mtu
as response to a 1560 byte UDP[ESP] packet.

The next UDP[ESP] packet is shorter: 1360 bytes. It gets hard to see what's
going on after that, but the connection is still not working.

So, instead of somehow losing these packets on the way out of the interface,
should the kernel not start with a lower mtu in the first place? Now it seems it
is trying with the maximum of the interface and expecting to scale down with
pmtu - which can never happen.

> Can you send a ip route get <ip> to the problematic target to see how
> far off the calculated value is?

That command doesn't return anything useful. No hint on the mtu here.

BTW, instead of disabling pmtu, setting mtu explicitly also helps:
ip route add 10.6.6.0/24 via ${localip} mtu 1300

Thanks,

Ortwin
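
Note: the packet sizes reported in this message can be cross-checked with a
little arithmetic. The sketch below is only an illustration. It assumes the
60-byte UDP+ESP overhead and the 1500-byte interface mtu quoted above, and the
names outer_size and fits_link are made up for this example. It does not model
what the kernel actually does.

```python
# Figures as reported in the thread; nothing here is measured by this script.
LINK_MTU = 1500        # mtu shown by ifconfig for the wireless interface
ENCAP_OVERHEAD = 60    # bytes added by UDP+ESP encapsulation, as reported


def outer_size(inner_len, overhead=ENCAP_OVERHEAD):
    """Size of the packet on the wire after UDP+ESP wrapping."""
    return inner_len + overhead


def fits_link(inner_len, link_mtu=LINK_MTU, overhead=ENCAP_OVERHEAD):
    """True if the wrapped packet is no larger than the link mtu."""
    return outer_size(inner_len, overhead) <= link_mtu


# The two inner packet lengths observed with ip_no_pmtu_disc=1 and =0.
for inner in (1300, 1500):
    print(inner, outer_size(inner), fits_link(inner))
```

With these figures the 1300-byte packet becomes 1360 bytes and fits, while the
1500-byte packet becomes 1560 bytes and does not, which matches the sizes of
the UDP[ESP] packets described above.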

2014-02-13 00:01:17

by Hannes Frederic Sowa

Subject: Re: xfrm: is pmtu broken with ESP tunneling?

On Tue, Feb 11, 2014 at 09:20:40PM +0100, Ortwin Glück wrote:
> On 02/11/2014 03:32 AM, Hannes Frederic Sowa wrote:
> >>net.ipv4.ip_no_pmtu_disc=1.
> >
> >This setting will shrink the path mtu to min_pmtu when a frag needed icmp
> >is
> >received.
>
> The UDP+ESP encapsulation adds 60 bytes to the original packet size.
>
> ifconfig wla0 shows an mtu of 1500.
>
> The size of the first big packet on the interface:
> net.ipv4.ip_no_pmtu_disc=1: packet length is 1300
> net.ipv4.ip_no_pmtu_disc=0: packet length is 1500
>
> Length is without the ESP wrapper and UDP encapsulation. The packets are so
> big that they can't even leave the wireless interface and never show up on
> the router. So no ICMP packets are received. PMTU can't work with initial
> packets of that size.
>
> Dumb question: which layer discards these packets? The qdisc? Why is there
> no notification to the sender?

Could you try either dropwatch or perf script net_dropmonitor and flood the
network with the problematic packets? From the traces we could see where in the
kernel the packets get dropped without notification.

> When I increase the mtu of the interface to 2000 with ifconfig, then I
> start seeing ICMP fragmentation needed from the next hop, indicating 1500
> as the mtu as response to a 1560 byte UDP[ESP] packet.
>
> The next UDP[ESP] packet is shorter: 1360 bytes. It gets hard to see what's
> going on after that, but the connection is still not working.
>
> So, instead of somehow losing these packets on the way out of the interface,
> should the kernel not start with a lower mtu in the first place? Now it
> seems it is trying with the maximum of the interface and expecting to scale
> down with pmtu - which can never happen.
>
> >Can you send a ip route get <ip> to the problematic target to see how
> >far off the calculated value is?
>
> That command doesn't return anything useful. No hint on the mtu here.
>
> BTW, instead of disabling pmtu, setting mtu explicitly also helps:
> ip route add 10.6.6.0/24 via ${localip} mtu 1300

Strange that the problem disappears if you enable no_pmtu_disc then.

Thanks,

Hannes

2014-02-13 19:53:43

by Ortwin Glück

Subject: Re: xfrm: is pmtu broken with ESP tunneling?

On 02/13/2014 01:01 AM, Hannes Frederic Sowa wrote:
> Could you try either dropwatch or perf script net_dropmonitor and flood the
> network with the problematic packets. From the traces we could see where the
> packets get dropped without notification in the kernel.

Not much to see, unfortunately. The COUNT doesn't reflect the number of packets
that I am missing.

LOCATION               OFFSET  COUNT
ieee80211_iface_work   208     1

> Strange that the problem disappears if you enable no_pmtu_disc then.

It seems that with PMTU discovery enabled, the initial mtu is that of the device
(1500). So the original packet will have that size, but it is subsequently
wrapped into ESP and UDP, which add to that size. The final packet is then
larger than the device MTU... I know nothing about the ip / xfrm kernel code, so
it's hard for me to verify whether that theory is correct.

Thanks,

Ortwin
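
Note: if the theory above holds, the inner packet size would have to be capped
at the smallest mtu on the path minus the encapsulation overhead. The sketch
below works that out under the assumptions stated in the thread (60 bytes of
UDP+ESP overhead, a 1500-byte next hop). The function name max_inner_packet is
hypothetical, and this is not a description of how xfrm computes its mtu.

```python
def max_inner_packet(hop_mtus, encap_overhead):
    """Largest pre-encapsulation packet that still fits every hop once wrapped.

    hop_mtus: mtu of each link on the path, in bytes (assumed known here).
    encap_overhead: bytes added by the tunnel encapsulation.
    """
    path_mtu = min(hop_mtus)  # the smallest hop mtu bounds the whole path
    if encap_overhead >= path_mtu:
        raise ValueError("encapsulation overhead leaves no room for payload")
    return path_mtu - encap_overhead


# Interface at its default mtu of 1500.
print(max_inner_packet([1500], 60))

# Interface raised to 2000, next hop still at 1500 as its ICMP reply indicated.
print(max_inner_packet([2000, 1500], 60))
```

Both cases give 1440 bytes, so under these assumptions raising only the
interface mtu does not enlarge the usable inner packet size, and the explicit
route mtu of 1300 used as a workaround is comfortably below that limit.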