LinuxLists.cc - Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

2016-10-11 07:48:00

Subject: Re: [PATCH net v2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

On 11/10/16 02:54, R Parameswaran wrote:
>
>
> Hi James,
>
> Please see inline:
>
> On Tue, Oct 4, 2016 at 12:53 AM, James Chapman <[email protected]> wrote:
>
> On 04/10/16 04:12, R. Parameswaran wrote:
> >
> > Hi James,
> >
> > Please see inline, thanks for the reply:
> >
> > On Sat, 1 Oct 2016, James Chapman wrote:
> >
> >> On 30/09/16 03:39, R. Parameswaran wrote:
> >>>>> + /* Adjust MTU, factor overhead - underlay L3 hdr, overlay L2 hdr*/
> >>>>> + if (tunnel->sock->sk_family == AF_INET)
> >>>>> + overhead += (ETH_HLEN + sizeof(struct iphdr));
> >>>>> + else if (tunnel->sock->sk_family == AF_INET6)
> >>>>> + overhead += (ETH_HLEN + sizeof(struct ipv6hdr));
> >>>> What about options in the IP header? If certain options are set on the
> >>>> socket, the IP header may be larger.
> >>>>
> >>> Thanks for the reply - It looks like IP options can only be
> >>> enabled through setsockopt on an application's socket (if there's any
> >>> other way to turn on IP options, please let me know - didn't see any
> >>> sysctl setting for transmit). This scenario would come
> >>> into picture when an application opens a raw IP or UDP socket such that it
> >>> routes into the L2TP logical interface.
> >> No. An L2TP daemon (userspace) will open a socket for each tunnel that
> >> it creates. Control and data packets use the same socket, which is the
> >> socket used by this code. It may set any options on its sockets. L2TP
> >> tunnel sockets can be created either by an L2TP daemon (managed tunnels)
> >> or by ip l2tp commands (unmanaged tunnels).
> >>
> > One Q I have is whether it would be sufficient to solve this for the
> > common case (i.e. no IP options) and have an expectation that the
> > administrator will explicitly provision the mtu using the
> > 'ip link ... mtu' command when dealing with infrequent occurrences
> > like IP options?
> >
> > But looking at the code, it looks to be possible to pick up whether
> > options are enabled and how long the options are, from the ip_options
> > struct embedded in the tunnel socket. If you want me to, I can repost
> > the patch with this change (will need a few days) - please let me know
> > if this is what you had in mind.
> >
> >
> Yes, that's what I had in mind. But my preference would be that this
> would be a new function in the ip core, for use by any encap protocol,
> where appropriate.
>
> Discussed this with Nachi (nprachan), we were thinking of a new
> function in ip_sockglue.c which would take the tunnel socket as
> parameter, derive the underlay device MTU and compute the underlay L3
> overhead (IPv4/IPv6 header, UDP header if it is a UDP socket, and IP
> option length if the ip_options struct exists in the socket). The
> function would be agnostic to the tunnel type (although we could
> provision tunnel-type and encap type as parameters). Callers would
> call it to figure out the cumulative underlay L3 overhead and the
> underlay MTU, and then use these numbers in the MTU calculation for
> their specific tunnel type. Let me know if that is different from what
> you had in mind, and/or if you have any suggestions on which file to
> place this in. I'll try and have this re-posted by the end of this
> week or by early next week.
>

I think keep it simple. A function to return the size of the IP header
associated with any IP socket, not necessarily a tunnel socket. Don't
mix in any MTU derivation logic or UDP header size etc.

Post code early as an RFC. You're more likely to get review feedback
from others.
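As a rough user-space illustration of the "keep it simple" helper described above: it only reports the IP header size for the socket's address family plus any option bytes, with no MTU or UDP arithmetic mixed in. The struct `sock_ip_opts` and the function `ip_header_overhead()` are hypothetical stand-ins for kernel socket state, not an existing API; 20 and 40 are the fixed IPv4 and IPv6 base header sizes.

```c
#include <assert.h>
#include <stddef.h>     /* NULL */
#include <stdint.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical snapshot of the IP option state a socket carries. */
struct sock_ip_opts {
	int family;         /* AF_INET or AF_INET6 */
	uint32_t v4_optlen; /* IPv4 option bytes, stored padded to 4 */
	uint32_t v6_flen;   /* IPv6 fragmentable extension header bytes */
	uint32_t v6_nflen;  /* IPv6 non-fragmentable extension header bytes */
};

enum { IPV4_HDR_LEN = 20, IPV6_HDR_LEN = 40 };

/* IP header (plus option) bytes the socket adds; 0 if it is not IP. */
uint32_t ip_header_overhead(const struct sock_ip_opts *s)
{
	if (s == NULL)
		return 0;

	switch (s->family) {
	case AF_INET:
		/* IHL caps IPv4 options at 40 bytes, in 4-byte units. */
		assert(s->v4_optlen <= 40 && s->v4_optlen % 4 == 0);
		return IPV4_HDR_LEN + s->v4_optlen;
	case AF_INET6:
		return IPV6_HDR_LEN + s->v6_flen + s->v6_nflen;
	default:
		return 0; /* not an IP socket: no IP overhead known */
	}
}
```

Callers (L2TP or any other encap) would add their own L2/UDP overhead on top of this number, which keeps the helper reusable.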

2016-10-17 04:08:47

by R. Parameswaran


Subject: [RFC PATCH v3 1/2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

[v3: Picked up review comments from James Chapman: added a function to
compute the IP header + IP option overhead on a socket and factored it
into the L2TP change-set. RFC: I would like early feedback on the name,
placement and logic of the new function while I test this.]

From 30c4b3900d09deb912fc6ce4af3c19e870f84e14 Mon Sep 17 00:00:00 2001
From: "R. Parameswaran" <[email protected]>
Date: Sun, 16 Oct 2016 20:19:38 -0700

In the existing kernel code, not all of the tunnel encapsulation headers
are taken into account when setting up the MTU on the L2TP logical
interface device. As a result, packets created by applications on top
of the L2TP layer are larger than they ought to be relative to the
underlay MTU, which leads to needless fragmentation once the L2TP packet
is encapsulated in an outer IP packet.

Specifically, the MTU calculation does not take into account the (outer)
IP header imposed on the encapsulated L2TP packet, and the Layer 2 header
imposed on the inner L2TP packet prior to encapsulation. The patch posted
here takes care of these.
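To make the shortfall concrete, the sketch below compares the pre-patch calculation (device MTU = 1500 minus the session header length) with one that also subtracts the inner Ethernet header, the UDP header and the outer IPv4 header. The 8-byte session header length used in the usage note is an assumed example value, not one taken from the patch, and `outer_len()` / `new_dev_mtu()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

enum {
	UNDERLAY_MTU = 1500, /* plain (non-jumbo) Ethernet underlay */
	ETH_HDR = 14,        /* inner Ethernet header (ETH_HLEN) */
	UDP_HDR = 8,         /* UDP encapsulation header */
	IPV4_HDR = 20,       /* outer IPv4 header, no options */
};

/* Outer IP packet size produced for one full-sized inner frame. */
uint32_t outer_len(uint32_t dev_mtu, uint32_t hdr_len)
{
	return dev_mtu + ETH_HDR + hdr_len + UDP_HDR + IPV4_HDR;
}

/* Device MTU that makes outer_len() land exactly on the underlay MTU. */
uint32_t new_dev_mtu(uint32_t underlay_mtu, uint32_t hdr_len)
{
	uint32_t overhead = hdr_len + ETH_HDR + UDP_HDR + IPV4_HDR;

	assert(underlay_mtu > overhead); /* avoid unsigned wrap-around */
	return underlay_mtu - overhead;
}
```

With `hdr_len = 8`, the old device MTU of 1492 yields a 1542-byte outer packet, which must be fragmented on a 1500-byte underlay; `new_dev_mtu(1500, 8)` gives 1450, whose outer packet is exactly 1500 bytes.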

Existing code also seems to assume an Ethernet (non-jumbo) underlay. The
patch uses the PMTU mechanism and the dst entry in the L2TP tunnel socket
to directly pull up the underlay MTU (as the baseline number on top of
which the encapsulation headers are factored in). Ethernet MTU is
assumed as a fallback only if this fails.
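A minimal sketch of the baseline selection described above, assuming the route MTU has already been read (in the l2tp_eth.c change this value comes from `sk_dst_get()` and `dst_mtu()`). `baseline_mtu()` and `l2tp_dev_mtu()` are hypothetical names, and a route MTU of 0 stands for "no dst entry available".

```c
#include <assert.h>
#include <stdint.h>

#define ETH_FALLBACK_MTU 1500u /* standard (non-jumbo) Ethernet MTU */

/* Baseline MTU the encapsulation overhead is subtracted from. */
uint32_t baseline_mtu(uint32_t route_mtu)
{
	/* Prefer the route/PMTU value; fall back to Ethernet only if absent. */
	return route_mtu != 0 ? route_mtu : ETH_FALLBACK_MTU;
}

/* Device MTU on top of that baseline, given the total encap overhead. */
uint32_t l2tp_dev_mtu(uint32_t route_mtu, uint32_t overhead)
{
	uint32_t base = baseline_mtu(route_mtu);

	assert(base > overhead); /* avoid unsigned wrap-around */
	return base - overhead;
}
```

With a 9000-byte jumbo-frame underlay and 50 bytes of overhead the device MTU becomes 8950, instead of being derived from 1500 regardless of the underlay.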

Picked up review comments from James Chapman: added a function to
compute the IP header + IP option overhead on a socket, and factored it
into the L2TP change-set.

Signed-off-by: [email protected],
Signed-off-by: [email protected],
Signed-off-by: [email protected],
Signed-off-by: [email protected]
---
include/linux/net.h | 3 +++
net/socket.c | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 40 insertions(+)

diff --git a/include/linux/net.h b/include/linux/net.h
index cd0c8bd..2c8b092 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -298,6 +298,9 @@ int kernel_sendpage(struct socket *sock, struct page *page, int offset,
int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg);
int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how);

+/* Following routine returns the IP overhead imposed by a socket. */
+u32 kernel_sock_ip_overhead(struct sock *sk);
+
#define MODULE_ALIAS_NETPROTO(proto) \
MODULE_ALIAS("net-pf-" __stringify(proto))

diff --git a/net/socket.c b/net/socket.c
index 5a9bf5e..d5e79c2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3293,3 +3293,40 @@ int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how)
return sock->ops->shutdown(sock, how);
}
EXPORT_SYMBOL(kernel_sock_shutdown);
+
+/*
+ * This routine returns the IP overhead imposed by a socket i.e.
+ * the length of the underlying IP header, depending on whether
+ * this is an IPv4 or IPv6 socket and the length from IP options turned
+ * on at the socket.
+ */
+u32 kernel_sock_ip_overhead(struct sock *sk)
+{
+ u32 overhead = 0;
+ if (!sk)
+ goto done;
+ if (sk->sk_family == AF_INET) {
+ struct ip_options_rcu *opt = NULL;
+ struct inet_sock *inet = inet_sk(sk);
+ overhead += sizeof(struct iphdr);
+ if (inet)
+ opt = rcu_dereference_protected(inet->inet_opt,
+ sock_owned_by_user(sk));
+ if (opt)
+ overhead += opt->opt.optlen;
+ }
+ else if (sk->sk_family == AF_INET6) {
+ struct ipv6_pinfo *np = inet6_sk(sk);
+ struct ipv6_txoptions *opt = NULL;
+ overhead += sizeof(struct ipv6hdr);
+ if (np)
+ opt = rcu_dereference_protected(np->opt,
+ sock_owned_by_user(sk));
+ if (opt)
+ overhead += (opt->opt_flen + opt->opt_nflen);
+ }
+
+done:
+ return overhead;
+}
+EXPORT_SYMBOL_GPL(kernel_sock_ip_overhead);
--
2.1.4
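For the IPv4 branch above, the option length added is `opt->opt.optlen`. Because the IPv4 header length field (IHL) counts 32-bit words and is capped at 15, options occupy a multiple of 4 bytes and at most 40 bytes, so the IPv4 overhead ranges from 20 to 60 bytes. The user-space sketch below models that bound; `ipv4_overhead()` is a hypothetical helper, and it assumes the raw option length passed to `setsockopt(IP_OPTIONS)` is padded up to a 4-byte boundary, as the kernel does when it stores the options.

```c
#include <assert.h>
#include <stdint.h>

enum {
	IPV4_BASE_HDR = 20,   /* IHL = 5 words */
	IPV4_MAX_OPTLEN = 40, /* IHL max 15 words => 60 - 20 */
};

/* IPv4 header bytes for a socket whose raw option length is raw_optlen. */
uint32_t ipv4_overhead(uint32_t raw_optlen)
{
	uint32_t padded = (raw_optlen + 3u) & ~3u; /* round up to 4 bytes */

	assert(padded <= IPV4_MAX_OPTLEN);
	return IPV4_BASE_HDR + padded;
}
```

So a single 1-byte option still costs 4 bytes of header, and the IPv4 result of `kernel_sock_ip_overhead()` should always be a multiple of 4 between 20 and 60.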


2016-10-17 05:22:36

by R. Parameswaran


Subject: [RFC PATCH v3 2/2] L2TP:Adjust intf MTU,factor underlay L3,overlay L2

[v3: Picked up review comments from James Chapman: added a function to
compute the IP header + IP option overhead on a socket and factored it
into the L2TP change-set. RFC: I would like early feedback on the name
and placement of the new function while I test this.

Part 2/2: Changes in l2tp_eth.c, using the new API from part 1.]

From f4066da53e781ef167055c1e89ca1a7819215a40 Mon Sep 17 00:00:00 2001
From: "R. Parameswaran" <[email protected]>
Date: Sun, 16 Oct 2016 20:27:20 -0700

In the existing kernel code, not all of the tunnel encapsulation headers
are taken into account when setting up the MTU on the L2TP logical
interface device. As a result, packets created by applications on top
of the L2TP layer are larger than they ought to be relative to the
underlay MTU, which leads to needless fragmentation once the L2TP packet
is encapsulated in an outer IP packet.

Specifically, the MTU calculation does not take into account the (outer)
IP header imposed on the encapsulated L2TP packet, and the Layer 2 header
imposed on the inner L2TP packet prior to encapsulation. The patch posted
here takes care of these.

Existing code also seems to assume an Ethernet (non-jumbo) underlay. The
patch uses the PMTU mechanism and the dst entry in the L2TP tunnel socket
to directly pull up the underlay MTU (as the baseline number on top of
which the encapsulation headers are factored in). Ethernet MTU is
assumed as a fallback only if this fails.

Picked up review comments from James Chapman: added a function to
compute the IP header + IP option overhead on a socket, and factored it
into the L2TP change-set.

Signed-off-by: [email protected],
Signed-off-by: [email protected],
Signed-off-by: [email protected],
Signed-off-by: [email protected]
---
net/l2tp/l2tp_eth.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 47 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 965f7e3..75eb5d3 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -30,6 +30,9 @@
#include <net/xfrm.h>
#include <net/net_namespace.h>
#include <net/netns/generic.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/udp.h>

#include "l2tp_core.h"

@@ -206,6 +209,49 @@ static void l2tp_eth_show(struct seq_file *m, void *arg)
}
#endif

+static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
+ struct l2tp_session *session,
+ struct net_device *dev)
+{
+ unsigned int overhead = 0;
+ struct dst_entry *dst;
+ u32 l3_overhead = 0;
+
+ if (session->mtu != 0) {
+ dev->mtu = session->mtu;
+ dev->needed_headroom += session->hdr_len;
+ if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+ dev->needed_headroom += sizeof(struct udphdr);
+ return;
+ }
+ overhead = session->hdr_len;
+ l3_overhead = kernel_sock_ip_overhead(tunnel->sock);
+ if (!tunnel->sock || (l3_overhead == 0)) {
+ /* L3 Overhead couldn't be identified, dev mtu stays at 1500 */
+ return;
+ }
+ /* Adjust MTU, factor overhead - underlay L3, overlay L2 hdr*/
+ overhead += ETH_HLEN + l3_overhead;
+ /* Additionally, if the encap is UDP, account for UDP header size */
+ if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+ overhead += sizeof(struct udphdr);
+ /* If PMTU discovery was enabled, use discovered MTU on L2TP device */
+ dst = sk_dst_get(tunnel->sock);
+ if (dst) {
+ /* dst_mtu will use PMTU if found, else fallback to intf MTU */
+ u32 pmtu = dst_mtu(dst);
+
+ if (pmtu != 0)
+ dev->mtu = pmtu;
+ dst_release(dst);
+ }
+ session->mtu = dev->mtu - overhead;
+ dev->mtu = session->mtu;
+ dev->needed_headroom += session->hdr_len;
+ if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+ dev->needed_headroom += sizeof(struct udphdr);
+}
+
static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg)
{
struct net_device *dev;
@@ -255,11 +301,8 @@ static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 p
}

dev_net_set(dev, net);
- if (session->mtu == 0)
- session->mtu = dev->mtu - session->hdr_len;
- dev->mtu = session->mtu;
- dev->needed_headroom += session->hdr_len;

+ l2tp_eth_adjust_mtu(tunnel, session, dev);
priv = netdev_priv(dev);
priv->dev = dev;
priv->session = session;
--
2.1.4
