2019-11-19 16:17:29

by Byron Stanoszek

[permalink] [raw]
Subject: Kernel 5.4 regression - memory leak in network layer

Hi Dave,

I'm experiencing a large memory leak in kernel 5.4-rc* when receiving network
packets using (at least) the intel igb driver. The leak is not present in
kernel 5.3. I've bisected this down to a set of ~900 commits, but I'm having
trouble bisecting any further due to several of the commits causing a kernel
panic on my system.

Git bisect shows the problem started somewhere in this range of commits:

a18670f4617d41829ce88ab3e47bbc406e4dc5e8 028db79cf46ae860fbdc5aa4f431b6969ea72c7b

I've traced the unreferenced objects using kmemleak. They all look like the
following:

unreferenced object 0xffff88821a48a180 (size 64):
comm "softirq", pid 0, jiffies 4294709480 (age 192.558s)
hex dump (first 32 bytes):
01 00 00 00 01 06 ff ff 00 00 00 00 00 00 00 00 ................
00 20 72 3d 82 88 ff ff 00 00 00 00 00 00 00 00 . r=............
backtrace:
[<00000000edf73c5e>] skb_ext_add+0xc0/0xf0
[<00000000ca960770>] br_nf_pre_routing+0x171/0x489
[<0000000063a55d83>] br_handle_frame+0x171/0x300
[<0000000023337088>] __netif_receive_skb_core+0x12a/0x840
[<0000000072de0b8c>] __netif_receive_skb_one_core+0x21/0x50
[<0000000006be8eb7>] netif_receive_skb_internal+0x55/0xa0
[<000000006ba3ef5b>] napi_gro_receive+0x45/0x80
[<00000000f960dfcf>] igb_poll+0x434/0xeb0
[<0000000002710716>] net_rx_action+0xcb/0x240
[<00000000657ca2bd>] __do_softirq+0xc8/0x209
[<000000004860e2aa>] irq_exit+0x9a/0xa0
[<0000000024397c0c>] do_IRQ+0x4a/0xc0
[<000000008ecdeb2a>] ret_from_intr+0x0/0x14

I am currently using this patch to igb_main.c to disable MSI-X (just in case it
matters):

--- linux/drivers/net/ethernet/intel/igb/igb_main.c.bak 2019-06-19 16:39:12.000000000 -0400
+++ linux/drivers/net/ethernet/intel/igb/igb_main.c 2019-06-19 16:39:21.000000000 -0400
@@ -1097,7 +1097,7 @@ static void igb_set_interrupt_capability
int err;
int numvecs, i;

- if (!msix)
+// if (!msix)
goto msi_only;
adapter->flags |= IGB_FLAG_HAS_MSIX;


Please let me know if I can assist further to help you track this leak down.

I've attached a copy of my kernel config.

Thanks,
-Byron


Attachments:
config-5.4 (97.86 kB)

2019-11-19 16:24:12

by Florian Westphal

[permalink] [raw]
Subject: Re: Kernel 5.4 regression - memory leak in network layer

Byron Stanoszek <[email protected]> wrote:
> unreferenced object 0xffff88821a48a180 (size 64):
> comm "softirq", pid 0, jiffies 4294709480 (age 192.558s)
> hex dump (first 32 bytes):
> 01 00 00 00 01 06 ff ff 00 00 00 00 00 00 00 00 ................
> 00 20 72 3d 82 88 ff ff 00 00 00 00 00 00 00 00 . r=............
> backtrace:
> [<00000000edf73c5e>] skb_ext_add+0xc0/0xf0
> [<00000000ca960770>] br_nf_pre_routing+0x171/0x489
> [<0000000063a55d83>] br_handle_frame+0x171/0x300

Brnf related, I will have a look.

2019-11-20 20:32:01

by Florian Westphal

[permalink] [raw]
Subject: Re: Kernel 5.4 regression - memory leak in network layer

Florian Westphal <[email protected]> wrote:
> Byron Stanoszek <[email protected]> wrote:
> > unreferenced object 0xffff88821a48a180 (size 64):
> > comm "softirq", pid 0, jiffies 4294709480 (age 192.558s)
> > hex dump (first 32 bytes):
> > 01 00 00 00 01 06 ff ff 00 00 00 00 00 00 00 00 ................
> > 00 20 72 3d 82 88 ff ff 00 00 00 00 00 00 00 00 . r=............
> > backtrace:
> > [<00000000edf73c5e>] skb_ext_add+0xc0/0xf0
> > [<00000000ca960770>] br_nf_pre_routing+0x171/0x489
> > [<0000000063a55d83>] br_handle_frame+0x171/0x300
>
> Brnf related, I will have a look.

Not reproducible.

$ ipables-save -c
[66041:2794275493] -A INPUT -i br0 -m physdev --physdev-in eth0

... so br-netfilter is active (else physdev would not work). No leaks
after multiple netperf runs.

I'm on

c74386d50fbaf4a54fd3fe560f1abc709c0cff4b ("afs: Fix missing timeout reset").

Simple bridge with two ethernet interfaces (no vlans or netns for instance).

Does your setup use any other settings (ethtool, sysctl, qdiscs, tunnels
and the like)?

Thanks.

2019-11-21 00:17:02

by Byron Stanoszek

[permalink] [raw]
Subject: Re: Kernel 5.4 regression - memory leak in network layer

On Wed, 20 Nov 2019, Florian Westphal wrote:

> Florian Westphal <[email protected]> wrote:
>> Byron Stanoszek <[email protected]> wrote:
>>> unreferenced object 0xffff88821a48a180 (size 64):
>>> comm "softirq", pid 0, jiffies 4294709480 (age 192.558s)
>>> hex dump (first 32 bytes):
>>> 01 00 00 00 01 06 ff ff 00 00 00 00 00 00 00 00 ................
>>> 00 20 72 3d 82 88 ff ff 00 00 00 00 00 00 00 00 . r=............
>>> backtrace:
>>> [<00000000edf73c5e>] skb_ext_add+0xc0/0xf0
>>> [<00000000ca960770>] br_nf_pre_routing+0x171/0x489
>>> [<0000000063a55d83>] br_handle_frame+0x171/0x300
>>
>> Brnf related, I will have a look.
>
> Not reproducible.
>
> I'm on
>
> c74386d50fbaf4a54fd3fe560f1abc709c0cff4b ("afs: Fix missing timeout reset").

I confirm I still see the issue on that commit.

> Does your setup use any other settings (ethtool, sysctl, qdiscs, tunnels
> and the like)?

Yeah, I'm using macvlan. Here are my settings:

$ ethtool -i eth0
driver: e1000e
version: 3.2.6-k
firmware-version: 0.13-4
expansion-rom-version:
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

$ ethtool -i eth1
driver: igb
version: 5.6.0-k
firmware-version: 3.25, 0x800005cf
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Commands to set up network:

ethtool -K eth0 tx off rx off
ethtool -K eth1 tx off rx off
ifconfig lo 127.0.0.1
ifconfig eth0 up
brctl addbr br0
brctl addif br0 eth0
brctl setfd br0 0
ifconfig eth1 up
brctl addbr br1
brctl addif br1 eth1
brctl setfd br1 0
ifconfig br0 172.17.2.10 netmask 255.255.0.0
ifconfig br1 192.168.0.1 netmask 255.255.255.0
ip l add link br1 mac1 address BE:77:00:00:00:70 type macvlan mode bridge
ip l set mac1 up
ip a add 192.168.0.70/24 broadcast + dev mac1

$ iptables-save -c
# Generated by iptables-save v1.8.3 on Wed Nov 20 17:26:29 2019
*raw
:PREROUTING ACCEPT [3701999:2657924997]
:OUTPUT ACCEPT [1122825:291796686]
COMMIT
# Completed on Wed Nov 20 17:26:29 2019
# Generated by iptables-save v1.8.3 on Wed Nov 20 17:26:29 2019
*nat
:PREROUTING ACCEPT [612068:41087443]
:INPUT ACCEPT [17:2254]
:OUTPUT ACCEPT [55:3780]
:POSTROUTING ACCEPT [36:2340]
[0:0] -A PREROUTING -d 172.17.2.10/32 -i br0 -p tcp -m tcp --dport 102 -j DNAT --to-destination 192.168.0.2
[0:0] -A PREROUTING -d 172.17.2.10/32 -i br0 -p tcp -m tcp --dport 2222 -j DNAT --to-destination 192.168.0.2
[0:0] -A PREROUTING -d 172.17.2.10/32 -i br0 -p tcp -m tcp --dport 5900 -j DNAT --to-destination 192.168.0.4
[0:0] -A PREROUTING -d 172.17.2.10/32 -i br0 -p tcp -m tcp --dport 44818 -j DNAT --to-destination 192.168.0.2
[0:0] -A PREROUTING -d 172.17.2.10/32 -i br0 -p tcp -m tcp --dport 51234 -j DNAT --to-destination 192.168.0.9
[0:0] -A PREROUTING -d 172.17.2.10/32 -i br0 -p tcp -m tcp --dport 51235 -j DNAT --to-destination 192.168.0.9
[0:0] -A PREROUTING -d 172.17.2.10/32 -i br0 -p tcp -m tcp --dport 51236 -j DNAT --to-destination 192.168.0.9
[0:0] -A PREROUTING -d 172.17.2.10/32 -i br0 -p tcp -m tcp --dport 44444 -j DNAT --to-destination 192.168.0.9
[2:120] -A POSTROUTING -o br0 -j MASQUERADE
[17:1320] -A POSTROUTING -o br1 -j MASQUERADE
[0:0] -A POSTROUTING -o eth2 -j MASQUERADE
COMMIT
# Completed on Wed Nov 20 17:26:29 2019
# Generated by iptables-save v1.8.3 on Wed Nov 20 17:26:29 2019
*filter
:INPUT ACCEPT [3093143:2617432037]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1122803:291795238]
COMMIT
# Completed on Wed Nov 20 17:26:29 2019

Setting up another box as IP 172.17.2.11 and 192.168.0.99 and running this
command from the original box reliably adds about 2MB of memory marked used
according to "free":

netperf -H 172.17.2.11 -t UDP_RR

or

netperf -H 192.168.0.99 -t UDP_RR

Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec

212992 212992 1 1 10.00 4000.98
212992 212992

Nothing else at the moment is attached to the bridges:

$ brctl show
bridge name bridge id STP enabled interfaces
br0 8000.2046a101b1fb no eth0
br1 8000.2046a101b1fc no eth1

As for network-related sysctls, I've got:

# Enable IP Forwarding
net.ipv4.ip_forward = 1

# Increase the number of in-flight AF_UNIX datagrams per socket
net.unix.max_dgram_qlen = 1000

Regards,
-Byron

2019-11-21 05:33:37

by Florian Westphal

[permalink] [raw]
Subject: Re: Kernel 5.4 regression - memory leak in network layer

Byron Stanoszek <[email protected]> wrote:
> On Wed, 20 Nov 2019, Florian Westphal wrote:
> > Not reproducible.
> >
> > I'm on
> >
> > c74386d50fbaf4a54fd3fe560f1abc709c0cff4b ("afs: Fix missing timeout reset").
>
> I confirm I still see the issue on that commit.

[..]

> netperf -H 172.17.2.11 -t UDP_RR

Ah, thats all it takes. Its related to UDP_SKB_IS_STATELESS, commit
895b5c9f206eb7d25dc1360a is the culprit. I'll send a fix shortly.

2019-11-21 05:58:11

by Florian Westphal

[permalink] [raw]
Subject: [PATCH net] udp: drop skb extensions before marking skb stateless

Once udp stack has set the UDP_SKB_IS_STATELESS flag, later skb free
assumes all skb head state has been dropped already.

This will leak the extension memory in case the skb has extensions other
than the ipsec secpath, e.g. bridge nf data.

To fix this, set the UDP_SKB_IS_STATELESS flag only if we don't have
extensions or if the extension space can be free'd.

Fixes: 895b5c9f206eb7d25dc1360a ("netfilter: drop bridge nf reset from nf_reset")
Cc: Paolo Abeni <[email protected]>
Reported-by: Byron Stanoszek <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
---
include/linux/skbuff.h | 6 ++++++
net/ipv4/udp.c | 27 ++++++++++++++++++++++-----
2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 64a395c7f689..8688f7adfda7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4169,12 +4169,18 @@ static inline void skb_ext_reset(struct sk_buff *skb)
skb->active_extensions = 0;
}
}
+
+static inline bool skb_has_extensions(struct sk_buff *skb)
+{
+ return unlikely(skb->active_extensions);
+}
#else
static inline void skb_ext_put(struct sk_buff *skb) {}
static inline void skb_ext_reset(struct sk_buff *skb) {}
static inline void skb_ext_del(struct sk_buff *skb, int unused) {}
static inline void __skb_ext_copy(struct sk_buff *d, const struct sk_buff *s) {}
static inline void skb_ext_copy(struct sk_buff *dst, const struct sk_buff *s) {}
+static inline bool skb_has_extensions(struct sk_buff *skb) { return false; }
#endif /* CONFIG_SKB_EXTENSIONS */

static inline void nf_reset_ct(struct sk_buff *skb)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1d58ce829dca..447defbfccdd 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1297,6 +1297,27 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,

#define UDP_SKB_IS_STATELESS 0x80000000

+/* all head states (dst, sk, nf conntrack) except skb extensions are
+ * cleared by udp_rcv().
+ *
+ * We need to preserve secpath, if present, to eventually process
+ * IP_CMSG_PASSSEC at recvmsg() time.
+ *
+ * Other extensions can be cleared.
+ */
+static bool udp_try_make_stateless(struct sk_buff *skb)
+{
+ if (!skb_has_extensions(skb))
+ return true;
+
+ if (!secpath_exists(skb)) {
+ skb_ext_reset(skb);
+ return true;
+ }
+
+ return false;
+}
+
static void udp_set_dev_scratch(struct sk_buff *skb)
{
struct udp_dev_scratch *scratch = udp_skb_scratch(skb);
@@ -1308,11 +1329,7 @@ static void udp_set_dev_scratch(struct sk_buff *skb)
scratch->csum_unnecessary = !!skb_csum_unnecessary(skb);
scratch->is_linear = !skb_is_nonlinear(skb);
#endif
- /* all head states execept sp (dst, sk, nf) are always cleared by
- * udp_rcv() and we need to preserve secpath, if present, to eventually
- * process IP_CMSG_PASSSEC at recvmsg() time
- */
- if (likely(!skb_sec_path(skb)))
+ if (udp_try_make_stateless(skb))
scratch->_tsize_state |= UDP_SKB_IS_STATELESS;
}

--
2.23.0

2019-11-21 14:41:35

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH net] udp: drop skb extensions before marking skb stateless

On Thu, 2019-11-21 at 06:56 +0100, Florian Westphal wrote:
> Once udp stack has set the UDP_SKB_IS_STATELESS flag, later skb free
> assumes all skb head state has been dropped already.
>
> This will leak the extension memory in case the skb has extensions other
> than the ipsec secpath, e.g. bridge nf data.
>
> To fix this, set the UDP_SKB_IS_STATELESS flag only if we don't have
> extensions or if the extension space can be free'd.
>
> Fixes: 895b5c9f206eb7d25dc1360a ("netfilter: drop bridge nf reset from nf_reset")
> Cc: Paolo Abeni <[email protected]>
> Reported-by: Byron Stanoszek <[email protected]>
> Signed-off-by: Florian Westphal <[email protected]>
> ---
> include/linux/skbuff.h | 6 ++++++
> net/ipv4/udp.c | 27 ++++++++++++++++++++++-----
> 2 files changed, 28 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 64a395c7f689..8688f7adfda7 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -4169,12 +4169,18 @@ static inline void skb_ext_reset(struct sk_buff *skb)
> skb->active_extensions = 0;
> }
> }
> +
> +static inline bool skb_has_extensions(struct sk_buff *skb)
> +{
> + return unlikely(skb->active_extensions);
> +}
> #else
> static inline void skb_ext_put(struct sk_buff *skb) {}
> static inline void skb_ext_reset(struct sk_buff *skb) {}
> static inline void skb_ext_del(struct sk_buff *skb, int unused) {}
> static inline void __skb_ext_copy(struct sk_buff *d, const struct sk_buff *s) {}
> static inline void skb_ext_copy(struct sk_buff *dst, const struct sk_buff *s) {}
> +static inline bool skb_has_extensions(struct sk_buff *skb) { return false; }
> #endif /* CONFIG_SKB_EXTENSIONS */
>
> static inline void nf_reset_ct(struct sk_buff *skb)
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 1d58ce829dca..447defbfccdd 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1297,6 +1297,27 @@ int udp_sendpage(struct sock *sk, struct page *page, int offset,
>
> #define UDP_SKB_IS_STATELESS 0x80000000
>
> +/* all head states (dst, sk, nf conntrack) except skb extensions are
> + * cleared by udp_rcv().
> + *
> + * We need to preserve secpath, if present, to eventually process
> + * IP_CMSG_PASSSEC at recvmsg() time.
> + *
> + * Other extensions can be cleared.
> + */
> +static bool udp_try_make_stateless(struct sk_buff *skb)
> +{
> + if (!skb_has_extensions(skb))
> + return true;
> +
> + if (!secpath_exists(skb)) {
> + skb_ext_reset(skb);
> + return true;
> + }
> +
> + return false;
> +}
> +
> static void udp_set_dev_scratch(struct sk_buff *skb)
> {
> struct udp_dev_scratch *scratch = udp_skb_scratch(skb);
> @@ -1308,11 +1329,7 @@ static void udp_set_dev_scratch(struct sk_buff *skb)
> scratch->csum_unnecessary = !!skb_csum_unnecessary(skb);
> scratch->is_linear = !skb_is_nonlinear(skb);
> #endif
> - /* all head states execept sp (dst, sk, nf) are always cleared by
> - * udp_rcv() and we need to preserve secpath, if present, to eventually
> - * process IP_CMSG_PASSSEC at recvmsg() time
> - */
> - if (likely(!skb_sec_path(skb)))
> + if (udp_try_make_stateless(skb))
> scratch->_tsize_state |= UDP_SKB_IS_STATELESS;
> }

Acked-by: Paolo Abeni <[email protected]>

2019-11-21 20:47:55

by Byron Stanoszek

[permalink] [raw]
Subject: Re: [PATCH net] udp: drop skb extensions before marking skb stateless

On Thu, 21 Nov 2019, Florian Westphal wrote:

> Once udp stack has set the UDP_SKB_IS_STATELESS flag, later skb free
> assumes all skb head state has been dropped already.
>
> This will leak the extension memory in case the skb has extensions other
> than the ipsec secpath, e.g. bridge nf data.
>
> To fix this, set the UDP_SKB_IS_STATELESS flag only if we don't have
> extensions or if the extension space can be free'd.
>
> Fixes: 895b5c9f206eb7d25dc1360a ("netfilter: drop bridge nf reset from nf_reset")
> Cc: Paolo Abeni <[email protected]>
> Reported-by: Byron Stanoszek <[email protected]>
> Signed-off-by: Florian Westphal <[email protected]>

I confirm that this fixes the memory leak on my systems. Thank you for the fast
turnaround.

Regards,
-Byron

2019-11-21 23:05:36

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net] udp: drop skb extensions before marking skb stateless

From: Florian Westphal <[email protected]>
Date: Thu, 21 Nov 2019 06:56:23 +0100

> Once udp stack has set the UDP_SKB_IS_STATELESS flag, later skb free
> assumes all skb head state has been dropped already.
>
> This will leak the extension memory in case the skb has extensions other
> than the ipsec secpath, e.g. bridge nf data.
>
> To fix this, set the UDP_SKB_IS_STATELESS flag only if we don't have
> extensions or if the extension space can be free'd.
>
> Fixes: 895b5c9f206eb7d25dc1360a ("netfilter: drop bridge nf reset from nf_reset")
> Cc: Paolo Abeni <[email protected]>
> Reported-by: Byron Stanoszek <[email protected]>
> Signed-off-by: Florian Westphal <[email protected]>

Applied, thanks Florian.