2016-11-18 12:41:37

by Ding Tianhong

[permalink] [raw]
Subject: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic

Commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
introduced a new problem: when a huge flood of abnormal IP packets
arrives, it may cause an OOM and break the kernel, like this:

[ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
[ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
[ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
[ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
[ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
[ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
[ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
[ 100.067050] Call Trace:
[ 100.067057] [<ffffffff81638cb9>] dump_stack+0x19/0x1b
[ 100.067062] [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
[ 100.067066] [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
[ 100.067070] [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
[ 100.067075] [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
[ 100.067080] [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
[ 100.067083] [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
[ 100.067086] [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
[ 100.067088] [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
[ 100.067092] [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
[ 100.067095] [<ffffffff8131027d>] ? list_del+0xd/0x30
[ 100.067098] [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
[ 100.067101] [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
[ 100.067103] [<ffffffff8152f372>] net_rx_action+0x152/0x240
[ 100.067107] [<ffffffff81084d1f>] __do_softirq+0xef/0x280
[ 100.067109] [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
[ 100.067114] [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
[ 100.067117] [<ffffffff8163e269>] ? schedule+0x29/0x70
[ 100.067120] [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
[ 100.067122] [<ffffffff810a5d4f>] kthread+0xcf/0xe0
[ 100.067124] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
[ 100.067127] [<ffffffff81649198>] ret_from_fork+0x58/0x90
[ 100.067129] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140

================================cut here=====================================

The reason is that the flood of abnormal IP packets is received into the net
stack and finally dropped by dst_release(), which defers freeing them to the
rcuos callback-offload kthread via an RCU callback. But cond_resched_rcu_qs()
ends up calling do_softirq(), which receives more and more abnormal IP packets
that are then queued as RCU callbacks again. The packets arrive much faster
than they are freed, so memory is exhausted and the system OOMs. Therefore,
not processing any pending softirqs in the rcuos callback-offload kthread is
a more effective solution.

Fixes: bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
Signed-off-by: Ding Tianhong <[email protected]>
---
kernel/rcu/tree_plugin.h | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 85c5a88..760c3b5 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
 			if (__rcu_reclaim(rdp->rsp->name, list))
 				cl++;
 			c++;
-			local_bh_enable();
-			cond_resched_rcu_qs();
+			_local_bh_enable();
 			list = next;
 		}
 		trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
--
1.9.0




2016-11-18 13:01:53

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic

On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
> The commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> will introduce a new problem that when huge IP abnormal packet arrived,
> it may cause OOM and break the kernel, just like this:
>
> [ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
> [ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
> [ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
> [ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
> [ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
> [ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
> [ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
> [ 100.067050] Call Trace:
> [ 100.067057] [<ffffffff81638cb9>] dump_stack+0x19/0x1b
> [ 100.067062] [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
> [ 100.067066] [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
> [ 100.067070] [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
> [ 100.067075] [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
> [ 100.067080] [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
> [ 100.067083] [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
> [ 100.067086] [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
> [ 100.067088] [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
> [ 100.067092] [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
> [ 100.067095] [<ffffffff8131027d>] ? list_del+0xd/0x30
> [ 100.067098] [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
> [ 100.067101] [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
> [ 100.067103] [<ffffffff8152f372>] net_rx_action+0x152/0x240
> [ 100.067107] [<ffffffff81084d1f>] __do_softirq+0xef/0x280
> [ 100.067109] [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
> [ 100.067114] [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
> [ 100.067117] [<ffffffff8163e269>] ? schedule+0x29/0x70
> [ 100.067120] [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
> [ 100.067122] [<ffffffff810a5d4f>] kthread+0xcf/0xe0
> [ 100.067124] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> [ 100.067127] [<ffffffff81649198>] ret_from_fork+0x58/0x90
> [ 100.067129] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
>
> ================================cut here=====================================
>
> The reason is that the huge abnormal IP packet will be received to net stack
> and be dropped finally by dst_release, and the dst_release would use the rcuos
> callback-offload kthread to free the packet, but the cond_resched_rcu_qs() will
> calling do_softirq() to receive more and more IP abnormal packets which will be
> throw into the RCU callbacks again later, the number of received packet is much
> greater than the number of packets freed, it will exhaust the memory and then OOM,
> so don't try to process any pending softirqs in the rcuos callback-offload kthread
> is a more effective solution.

OK, but we could still have softirqs processed by the grace-period kthread
as a result of any number of other events. So this change might reduce
the probability of this problem, but it doesn't eliminate it.

How huge are these huge IP packets? Is the underlying problem that they
are too large to use the memory-allocator fastpaths?

Thanx, Paul

> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> Signed-off-by: Ding Tianhong <[email protected]>
>
> Signed-off-by: Ding Tianhong <[email protected]>
> ---
> kernel/rcu/tree_plugin.h | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 85c5a88..760c3b5 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> if (__rcu_reclaim(rdp->rsp->name, list))
> cl++;
> c++;
> - local_bh_enable();
> - cond_resched_rcu_qs();
> + _local_bh_enable();
> list = next;
> }
> trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> --
> 1.9.0
>
>
>

2016-11-19 07:52:10

by Ding Tianhong

[permalink] [raw]
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic



On 2016/11/18 21:01, Paul E. McKenney wrote:
> On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
>> The commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
>> will introduce a new problem that when huge IP abnormal packet arrived,
>> it may cause OOM and break the kernel, just like this:
>>
>> [ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
>> [ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
>> [ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
>> [ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
>> [ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
>> [ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
>> [ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
>> [ 100.067050] Call Trace:
>> [ 100.067057] [<ffffffff81638cb9>] dump_stack+0x19/0x1b
>> [ 100.067062] [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
>> [ 100.067066] [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
>> [ 100.067070] [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
>> [ 100.067075] [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
>> [ 100.067080] [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
>> [ 100.067083] [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
>> [ 100.067086] [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
>> [ 100.067088] [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
>> [ 100.067092] [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
>> [ 100.067095] [<ffffffff8131027d>] ? list_del+0xd/0x30
>> [ 100.067098] [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
>> [ 100.067101] [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
>> [ 100.067103] [<ffffffff8152f372>] net_rx_action+0x152/0x240
>> [ 100.067107] [<ffffffff81084d1f>] __do_softirq+0xef/0x280
>> [ 100.067109] [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
>> [ 100.067114] [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
>> [ 100.067117] [<ffffffff8163e269>] ? schedule+0x29/0x70
>> [ 100.067120] [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
>> [ 100.067122] [<ffffffff810a5d4f>] kthread+0xcf/0xe0
>> [ 100.067124] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
>> [ 100.067127] [<ffffffff81649198>] ret_from_fork+0x58/0x90
>> [ 100.067129] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
>>
>> ================================cut here=====================================
>>
>> The reason is that the huge abnormal IP packet will be received to net stack
>> and be dropped finally by dst_release, and the dst_release would use the rcuos
>> callback-offload kthread to free the packet, but the cond_resched_rcu_qs() will
>> calling do_softirq() to receive more and more IP abnormal packets which will be
>> throw into the RCU callbacks again later, the number of received packet is much
>> greater than the number of packets freed, it will exhaust the memory and then OOM,
>> so don't try to process any pending softirqs in the rcuos callback-offload kthread
>> is a more effective solution.
>
> OK, but we could still have softirqs processed by the grace-period kthread
> as a result of any number of other events. So this change might reduce
> the probability of this problem, but it doesn't eliminate it.
>
> How huge are these huge IP packets? Is the underlying problem that they
> are too large to use the memory-allocator fastpaths?
>
> Thanx, Paul
>

I use a 40G Mellanox NIC to receive packets, and the test engine can send MAC-abnormal packets and
IP-abnormal packets at full speed.

The MAC-abnormal packets are dropped at a low level and never reach the net stack,
but the IP-abnormal packets trigger this problem: every packet first looks up a new dst and
then releases it via dst_release() because the dst is meaningless for such a packet.

dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);

So no packet is freed until the rcuos callback-offload kthread processes it, and under huge traffic this
becomes an endless loop because do_softirq() keeps loading more and more packets onto the rcuos kthread.
I still could not find a better way to fix this. Btw, it is hard to say that the packets are too large to
use the memory-allocator fastpaths; there is no memory leak, and ixgbe may hit the same problem too.
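
For reference, the drop path is roughly the following (a paraphrased sketch of that era's net/core/dst.c, not the exact code), so every dropped packet queues one more RCU callback for the rcuos kthread to run:

/* Paraphrased sketch; details may differ from the real net/core/dst.c. */
void dst_release(struct dst_entry *dst)
{
	if (dst) {
		int newrefcnt;

		newrefcnt = atomic_dec_return(&dst->__refcnt);
		if (unlikely(newrefcnt < 0))
			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
					     __func__, dst, newrefcnt);
		if (!newrefcnt)
			/* On a nocb CPU this callback runs later in the rcuos kthread. */
			call_rcu(&dst->rcu_head, dst_destroy_rcu);
	}
}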

Thanks.
Ding


>> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
>> Signed-off-by: Ding Tianhong <[email protected]>
>>
>> Signed-off-by: Ding Tianhong <[email protected]>
>> ---
>> kernel/rcu/tree_plugin.h | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>> index 85c5a88..760c3b5 100644
>> --- a/kernel/rcu/tree_plugin.h
>> +++ b/kernel/rcu/tree_plugin.h
>> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
>> if (__rcu_reclaim(rdp->rsp->name, list))
>> cl++;
>> c++;
>> - local_bh_enable();
>> - cond_resched_rcu_qs();
>> + _local_bh_enable();
>> list = next;
>> }
>> trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
>> --
>> 1.9.0
>>
>>
>>
>
>
> .
>

2016-11-19 08:22:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic

On Sat, Nov 19, 2016 at 03:50:32PM +0800, Ding Tianhong wrote:
>
>
> On 2016/11/18 21:01, Paul E. McKenney wrote:
> > On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
> >> The commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> >> will introduce a new problem that when huge IP abnormal packet arrived,
> >> it may cause OOM and break the kernel, just like this:
> >>
> >> [ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
> >> [ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
> >> [ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
> >> [ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
> >> [ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
> >> [ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
> >> [ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
> >> [ 100.067050] Call Trace:
> >> [ 100.067057] [<ffffffff81638cb9>] dump_stack+0x19/0x1b
> >> [ 100.067062] [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
> >> [ 100.067066] [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
> >> [ 100.067070] [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
> >> [ 100.067075] [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
> >> [ 100.067080] [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
> >> [ 100.067083] [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
> >> [ 100.067086] [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
> >> [ 100.067088] [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
> >> [ 100.067092] [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
> >> [ 100.067095] [<ffffffff8131027d>] ? list_del+0xd/0x30
> >> [ 100.067098] [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
> >> [ 100.067101] [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
> >> [ 100.067103] [<ffffffff8152f372>] net_rx_action+0x152/0x240
> >> [ 100.067107] [<ffffffff81084d1f>] __do_softirq+0xef/0x280
> >> [ 100.067109] [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
> >> [ 100.067114] [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
> >> [ 100.067117] [<ffffffff8163e269>] ? schedule+0x29/0x70
> >> [ 100.067120] [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
> >> [ 100.067122] [<ffffffff810a5d4f>] kthread+0xcf/0xe0
> >> [ 100.067124] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> >> [ 100.067127] [<ffffffff81649198>] ret_from_fork+0x58/0x90
> >> [ 100.067129] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> >>
> >> ================================cut here=====================================
> >>
> >> The reason is that the huge abnormal IP packet will be received to net stack
> >> and be dropped finally by dst_release, and the dst_release would use the rcuos
> >> callback-offload kthread to free the packet, but the cond_resched_rcu_qs() will
> >> calling do_softirq() to receive more and more IP abnormal packets which will be
> >> throw into the RCU callbacks again later, the number of received packet is much
> >> greater than the number of packets freed, it will exhaust the memory and then OOM,
> >> so don't try to process any pending softirqs in the rcuos callback-offload kthread
> >> is a more effective solution.
> >
> > OK, but we could still have softirqs processed by the grace-period kthread
> > as a result of any number of other events. So this change might reduce
> > the probability of this problem, but it doesn't eliminate it.
> >
> > How huge are these huge IP packets? Is the underlying problem that they
> > are too large to use the memory-allocator fastpaths?
> >
> > Thanx, Paul
> >
>
> I use the 40G mellanox NiC to receive packet, and the testgine could send Mac abnormal packet and
> IP abnormal packet to full speed.
>
> The Mac abnormal packet would be dropped at low level and not be received to net stack,
> but the IP abnormal packet will introduce this problem, every packet will looks as new dst first and
> release later by dst_release because it is meaningless.
>
> dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);
>
> so all packet will be freed until the rcuos callback-offload kthread processing, it will be a infinite loop
> if huge packet is coming because the do_softirq will load more and more packet to the rcuos processing kthread,
> so I still could not find a better way to fix this, btw, it is really hard to say the driver use too large memory-allocater
> fastpaths, there is no memory leak and the Ixgbe may meet the same problem too.

The overall effect of these two patches is to move from enabling bh
(and processing recent softirqs) to enabling bh without processing
recent softirqs. Is this really the correct way to solve this problem?
What is it about this solution that avoids re-introducing the original
soft lockups? Have you talked to the networking guys about this issue?
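
For reference, the difference between the two is roughly this (a paraphrased sketch of that era's kernel/softirq.c, not the exact code):

/* local_bh_enable() ends up in __local_bh_enable_ip(), which may run softirqs. */
void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
{
	preempt_count_sub(cnt - 1);
	if (unlikely(!in_interrupt() && local_softirq_pending()))
		do_softirq();	/* recent/pending softirqs are processed here */
	preempt_count_dec();
	preempt_check_resched();
}

/* _local_bh_enable() only drops the softirq count; pending softirqs are left alone. */
void _local_bh_enable(void)
{
	WARN_ON_ONCE(in_irq());
	__local_bh_enable(SOFTIRQ_DISABLE_OFFSET);
}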

Thanx, Paul

> Thanks.
> Ding
>
>
> >> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> >> Signed-off-by: Ding Tianhong <[email protected]>
> >>
> >> Signed-off-by: Ding Tianhong <[email protected]>
> >> ---
> >> kernel/rcu/tree_plugin.h | 3 +--
> >> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >> index 85c5a88..760c3b5 100644
> >> --- a/kernel/rcu/tree_plugin.h
> >> +++ b/kernel/rcu/tree_plugin.h
> >> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> >> if (__rcu_reclaim(rdp->rsp->name, list))
> >> cl++;
> >> c++;
> >> - local_bh_enable();
> >> - cond_resched_rcu_qs();
> >> + _local_bh_enable();
> >> list = next;
> >> }
> >> trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> >> --
> >> 1.9.0
> >>
> >>
> >>
> >
> >
> > .
> >
>

2016-11-21 00:13:51

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic

On Sat, Nov 19, 2016 at 12:22:09AM -0800, Paul E. McKenney wrote:
> On Sat, Nov 19, 2016 at 03:50:32PM +0800, Ding Tianhong wrote:
> >
> >
> > On 2016/11/18 21:01, Paul E. McKenney wrote:
> > > On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
> > >> The commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> > >> will introduce a new problem that when huge IP abnormal packet arrived,
> > >> it may cause OOM and break the kernel, just like this:
> > >>
> > >> [ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
> > >> [ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
> > >> [ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
> > >> [ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
> > >> [ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
> > >> [ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
> > >> [ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
> > >> [ 100.067050] Call Trace:
> > >> [ 100.067057] [<ffffffff81638cb9>] dump_stack+0x19/0x1b
> > >> [ 100.067062] [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
> > >> [ 100.067066] [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
> > >> [ 100.067070] [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
> > >> [ 100.067075] [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
> > >> [ 100.067080] [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
> > >> [ 100.067083] [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
> > >> [ 100.067086] [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
> > >> [ 100.067088] [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
> > >> [ 100.067092] [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
> > >> [ 100.067095] [<ffffffff8131027d>] ? list_del+0xd/0x30
> > >> [ 100.067098] [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
> > >> [ 100.067101] [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
> > >> [ 100.067103] [<ffffffff8152f372>] net_rx_action+0x152/0x240
> > >> [ 100.067107] [<ffffffff81084d1f>] __do_softirq+0xef/0x280
> > >> [ 100.067109] [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
> > >> [ 100.067114] [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
> > >> [ 100.067117] [<ffffffff8163e269>] ? schedule+0x29/0x70
> > >> [ 100.067120] [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
> > >> [ 100.067122] [<ffffffff810a5d4f>] kthread+0xcf/0xe0
> > >> [ 100.067124] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> > >> [ 100.067127] [<ffffffff81649198>] ret_from_fork+0x58/0x90
> > >> [ 100.067129] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> > >>
> > >> ================================cut here=====================================
> > >>
> > >> The reason is that the huge abnormal IP packet will be received to net stack
> > >> and be dropped finally by dst_release, and the dst_release would use the rcuos
> > >> callback-offload kthread to free the packet, but the cond_resched_rcu_qs() will
> > >> calling do_softirq() to receive more and more IP abnormal packets which will be
> > >> throw into the RCU callbacks again later, the number of received packet is much
> > >> greater than the number of packets freed, it will exhaust the memory and then OOM,
> > >> so don't try to process any pending softirqs in the rcuos callback-offload kthread
> > >> is a more effective solution.
> > >
> > > OK, but we could still have softirqs processed by the grace-period kthread
> > > as a result of any number of other events. So this change might reduce
> > > the probability of this problem, but it doesn't eliminate it.
> > >
> > > How huge are these huge IP packets? Is the underlying problem that they
> > > are too large to use the memory-allocator fastpaths?
> > >
> > > Thanx, Paul
> > >
> >
> > I use the 40G mellanox NiC to receive packet, and the testgine could send Mac abnormal packet and
> > IP abnormal packet to full speed.
> >
> > The Mac abnormal packet would be dropped at low level and not be received to net stack,
> > but the IP abnormal packet will introduce this problem, every packet will looks as new dst first and
> > release later by dst_release because it is meaningless.
> >
> > dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);
> >
> > so all packet will be freed until the rcuos callback-offload kthread processing, it will be a infinite loop
> > if huge packet is coming because the do_softirq will load more and more packet to the rcuos processing kthread,
> > so I still could not find a better way to fix this, btw, it is really hard to say the driver use too large memory-allocater
> > fastpaths, there is no memory leak and the Ixgbe may meet the same problem too.

And following up on my fastpath point -- from what I can see, one
big effect of the large invalid packets is that they push processing
off of a number of fastpaths. If these packets could be rejected with
less per-packet processing, I bet that things would work much better.

Thanx, Paul

> The overall effect of these two patches is to move from enabling bh
> (and processing recent softirqs) to enabling bh without processing
> recent softirqs. Is this really the correct way to solve this problem?
> What about this solution is avoiding re-introducing the original
> softlockups? Have you talked to the networking guys about this issue?
>
> Thanx, Paul
>
> > Thanks.
> > Ding
> >
> >
> > >> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> > >> Signed-off-by: Ding Tianhong <[email protected]>
> > >>
> > >> Signed-off-by: Ding Tianhong <[email protected]>
> > >> ---
> > >> kernel/rcu/tree_plugin.h | 3 +--
> > >> 1 file changed, 1 insertion(+), 2 deletions(-)
> > >>
> > >> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > >> index 85c5a88..760c3b5 100644
> > >> --- a/kernel/rcu/tree_plugin.h
> > >> +++ b/kernel/rcu/tree_plugin.h
> > >> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> > >> if (__rcu_reclaim(rdp->rsp->name, list))
> > >> cl++;
> > >> c++;
> > >> - local_bh_enable();
> > >> - cond_resched_rcu_qs();
> > >> + _local_bh_enable();
> > >> list = next;
> > >> }
> > >> trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> > >> --
> > >> 1.9.0
> > >>
> > >>
> > >>
> > >
> > >
> > > .
> > >
> >

2016-11-21 01:29:26

by Ding Tianhong

[permalink] [raw]
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic



On 2016/11/21 8:13, Paul E. McKenney wrote:
> On Sat, Nov 19, 2016 at 12:22:09AM -0800, Paul E. McKenney wrote:
>> On Sat, Nov 19, 2016 at 03:50:32PM +0800, Ding Tianhong wrote:
>>>
>>>
>>> On 2016/11/18 21:01, Paul E. McKenney wrote:
>>>> On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
>>>>> The commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
>>>>> will introduce a new problem that when huge IP abnormal packet arrived,
>>>>> it may cause OOM and break the kernel, just like this:
>>>>>
>>>>> [ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
>>>>> [ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
>>>>> [ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
>>>>> [ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
>>>>> [ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
>>>>> [ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
>>>>> [ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
>>>>> [ 100.067050] Call Trace:
>>>>> [ 100.067057] [<ffffffff81638cb9>] dump_stack+0x19/0x1b
>>>>> [ 100.067062] [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
>>>>> [ 100.067066] [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
>>>>> [ 100.067070] [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
>>>>> [ 100.067075] [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
>>>>> [ 100.067080] [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
>>>>> [ 100.067083] [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
>>>>> [ 100.067086] [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
>>>>> [ 100.067088] [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
>>>>> [ 100.067092] [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
>>>>> [ 100.067095] [<ffffffff8131027d>] ? list_del+0xd/0x30
>>>>> [ 100.067098] [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
>>>>> [ 100.067101] [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
>>>>> [ 100.067103] [<ffffffff8152f372>] net_rx_action+0x152/0x240
>>>>> [ 100.067107] [<ffffffff81084d1f>] __do_softirq+0xef/0x280
>>>>> [ 100.067109] [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
>>>>> [ 100.067114] [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
>>>>> [ 100.067117] [<ffffffff8163e269>] ? schedule+0x29/0x70
>>>>> [ 100.067120] [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
>>>>> [ 100.067122] [<ffffffff810a5d4f>] kthread+0xcf/0xe0
>>>>> [ 100.067124] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
>>>>> [ 100.067127] [<ffffffff81649198>] ret_from_fork+0x58/0x90
>>>>> [ 100.067129] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
>>>>>
>>>>> ================================cut here=====================================
>>>>>
>>>>> The reason is that the huge abnormal IP packet will be received to net stack
>>>>> and be dropped finally by dst_release, and the dst_release would use the rcuos
>>>>> callback-offload kthread to free the packet, but the cond_resched_rcu_qs() will
>>>>> calling do_softirq() to receive more and more IP abnormal packets which will be
>>>>> throw into the RCU callbacks again later, the number of received packet is much
>>>>> greater than the number of packets freed, it will exhaust the memory and then OOM,
>>>>> so don't try to process any pending softirqs in the rcuos callback-offload kthread
>>>>> is a more effective solution.
>>>>
>>>> OK, but we could still have softirqs processed by the grace-period kthread
>>>> as a result of any number of other events. So this change might reduce
>>>> the probability of this problem, but it doesn't eliminate it.
>>>>
>>>> How huge are these huge IP packets? Is the underlying problem that they
>>>> are too large to use the memory-allocator fastpaths?
>>>>
>>>> Thanx, Paul
>>>>
>>>
>>> I use the 40G mellanox NiC to receive packet, and the testgine could send Mac abnormal packet and
>>> IP abnormal packet to full speed.
>>>
>>> The Mac abnormal packet would be dropped at low level and not be received to net stack,
>>> but the IP abnormal packet will introduce this problem, every packet will looks as new dst first and
>>> release later by dst_release because it is meaningless.
>>>
>>> dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);
>>>
>>> so all packet will be freed until the rcuos callback-offload kthread processing, it will be a infinite loop
>>> if huge packet is coming because the do_softirq will load more and more packet to the rcuos processing kthread,
>>> so I still could not find a better way to fix this, btw, it is really hard to say the driver use too large memory-allocater
>>> fastpaths, there is no memory leak and the Ixgbe may meet the same problem too.
>
> And following up on my fastpath point -- from what I can see, one
> big effect of the large invalid packets is that they push processing
> off of a number of fastpaths. If these packets could be rejected with
> less per-packet processing, I bet that things would work much better.
>
> Thanx, Paul

Yes, and I found that the WARN_ON_ONCE(!irqs_disabled()) will be triggered if _local_bh_enable() is used here,
so I think we could ask Eric and David for help on how to reject this huge number of packets.
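
For reference, that warning comes from __local_bh_enable(), which roughly looks like this (paraphrased from that era's kernel/softirq.c, not the exact code); _local_bh_enable() is only meant for callers that already run with interrupts disabled, such as __do_softirq():

/* Paraphrased sketch; details may differ from the real kernel/softirq.c. */
static void __local_bh_enable(unsigned int cnt)
{
	/* rcu_nocb_kthread() calls this with interrupts enabled, so it fires. */
	WARN_ON_ONCE(!irqs_disabled());

	if (softirq_count() == (cnt & SOFTIRQ_MASK))
		trace_softirqs_on(_RET_IP_);
	preempt_count_sub(cnt);
}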

Thanks
Ding

>
>> The overall effect of these two patches is to move from enabling bh
>> (and processing recent softirqs) to enabling bh without processing
>> recent softirqs. Is this really the correct way to solve this problem?
>> What about this solution is avoiding re-introducing the original
>> softlockups? Have you talked to the networking guys about this issue?
>>
>> Thanx, Paul
>>
>>> Thanks.
>>> Ding
>>>
>>>
>>>>> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
>>>>> Signed-off-by: Ding Tianhong <[email protected]>
>>>>>
>>>>> Signed-off-by: Ding Tianhong <[email protected]>
>>>>> ---
>>>>> kernel/rcu/tree_plugin.h | 3 +--
>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>> index 85c5a88..760c3b5 100644
>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
>>>>> if (__rcu_reclaim(rdp->rsp->name, list))
>>>>> cl++;
>>>>> c++;
>>>>> - local_bh_enable();
>>>>> - cond_resched_rcu_qs();
>>>>> + _local_bh_enable();
>>>>> list = next;
>>>>> }
>>>>> trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
>>>>> --
>>>>> 1.9.0
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> .
>>>>
>>>
>
>
> .
>

2016-11-21 06:53:04

by kernel test robot

[permalink] [raw]
Subject: [lkp] [rcu] 83ee00c6cf: WARNING:at_kernel/softirq.c:#__local_bh_enable


FYI, we noticed the following commit:

https://github.com/0day-ci/linux Ding-Tianhong/rcu-fix-the-OOM-problem-of-huge-IP-abnormal-packet-traffic/20161118-204521
commit 83ee00c6cf5eaa85f74094d6800732edf7114ef9 ("rcu: fix the OOM problem of huge IP abnormal packet traffic")

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -m 320M

caused below changes:


+------------------------------------------------+------------+------------+
|                                                | 68ad1194cf | 83ee00c6cf |
+------------------------------------------------+------------+------------+
| boot_successes                                 | 6          | 0          |
| boot_failures                                  | 0          | 6          |
| WARNING:at_kernel/softirq.c:#__local_bh_enable | 0          | 6          |
| calltrace:_local_bh_enable                     | 0          | 6          |
+------------------------------------------------+------------+------------+



[ 0.846125] PCI: CLS 0 bytes, default 64
[ 0.847479] Unpacking initramfs...
[ 0.849690] ------------[ cut here ]------------
[ 0.850615] WARNING: CPU: 0 PID: 9 at kernel/softirq.c:140 __local_bh_enable+0x35/0x41
[ 0.852518] Modules linked in:
[ 0.853178] CPU: 0 PID: 9 Comm: rcuos/0 Not tainted 4.9.0-rc1-00041-g83ee00c #1
[ 0.854630] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 0.856628] ffff8800002f7c70 ffffffff81267760 0000000000000000 ffffffff81837340
[ 0.858185] ffff8800002f7cb0 ffffffff81060e07 0000008c00000009 0000000000000200
[ 0.859742] ffff8800002f7dc8 ffff88000f808bb0 0000000000000001 ffff88000f808bb0
[ 0.861293] Call Trace:
[ 0.861795] [<ffffffff81267760>] dump_stack+0x61/0x7e
[ 0.862809] [<ffffffff81060e07>] __warn+0xf5/0x110
[ 0.863870] [<ffffffff81060f85>] warn_slowpath_null+0x18/0x1a
[ 0.865020] [<ffffffff81066b09>] __local_bh_enable+0x35/0x41
[ 0.866143] [<ffffffff81066b52>] _local_bh_enable+0x3d/0x3f
[ 0.867252] [<ffffffff810dbf14>] rcu_nocb_kthread+0x69b/0x6f2
[ 0.868393] [<ffffffff811cbe81>] ? __d_free_external+0x3f/0x3f
[ 0.869554] [<ffffffff810db879>] ? note_gp_changes+0xcd/0xcd
[ 0.870679] [<ffffffff815ff413>] ? __schedule+0x5fc/0x73c
[ 0.871755] [<ffffffff810db879>] ? note_gp_changes+0xcd/0xcd
[ 0.872980] [<ffffffff810898c7>] kthread+0x191/0x1a0
[ 0.873971] [<ffffffff81089736>] ? kthread_park+0x5d/0x5d
[ 0.875059] [<ffffffff8108fd54>] ? finish_task_switch+0x1e4/0x2a0
[ 0.876262] [<ffffffff81089736>] ? kthread_park+0x5d/0x5d
[ 0.877331] [<ffffffff81089736>] ? kthread_park+0x5d/0x5d
[ 0.878401] [<ffffffff81606a45>] ret_from_fork+0x25/0x30
[ 0.879484] ---[ end trace 825c5dbf85ebfadd ]---
[ 0.899723] workqueue: round-robin CPU selection forced, expect performance impact
[ 2.115863] Freeing initrd memory: 9088K (ffff880013700000 - ffff880013fe0000)


To reproduce:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Xiaolong


Attachments:
(No filename) (3.11 kB)
config-4.9.0-rc1-00041-g83ee00c (81.43 kB)
job-script (3.95 kB)
dmesg.xz (10.14 kB)
Download all attachments

2016-12-28 05:58:49

by Ding Tianhong

[permalink] [raw]
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic

Hi, Paul:

I tried to debug this problem further and found that this solution could work well for both problem scenarios.


diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 85c5a88..dbc14a7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2172,7 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
 			if (__rcu_reclaim(rdp->rsp->name, list))
 				cl++;
 			c++;
-			local_bh_enable();
+			_local_bh_enable();
 			cond_resched_rcu_qs();
 			list = next;
 		}


The cond_resched_rcu_qs() would process the softirq if one is pending, so there is no need to use
local_bh_enable() to process the softirq a second time here, and it avoids the OOM when huge packet
traffic arrives. What do you think about it? Please give me some suggestions.

Thanks.
Ding

On 2016/11/21 9:28, Ding Tianhong wrote:
>
>
> On 2016/11/21 8:13, Paul E. McKenney wrote:
>> On Sat, Nov 19, 2016 at 12:22:09AM -0800, Paul E. McKenney wrote:
>>> On Sat, Nov 19, 2016 at 03:50:32PM +0800, Ding Tianhong wrote:
>>>>
>>>>
>>>> On 2016/11/18 21:01, Paul E. McKenney wrote:
>>>>> On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
>>>>>> The commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
>>>>>> will introduce a new problem that when huge IP abnormal packet arrived,
>>>>>> it may cause OOM and break the kernel, just like this:
>>>>>>
>>>>>> [ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
>>>>>> [ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
>>>>>> [ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
>>>>>> [ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
>>>>>> [ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
>>>>>> [ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
>>>>>> [ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
>>>>>> [ 100.067050] Call Trace:
>>>>>> [ 100.067057] [<ffffffff81638cb9>] dump_stack+0x19/0x1b
>>>>>> [ 100.067062] [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
>>>>>> [ 100.067066] [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
>>>>>> [ 100.067070] [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
>>>>>> [ 100.067075] [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
>>>>>> [ 100.067080] [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
>>>>>> [ 100.067083] [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
>>>>>> [ 100.067086] [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
>>>>>> [ 100.067088] [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
>>>>>> [ 100.067092] [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
>>>>>> [ 100.067095] [<ffffffff8131027d>] ? list_del+0xd/0x30
>>>>>> [ 100.067098] [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
>>>>>> [ 100.067101] [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
>>>>>> [ 100.067103] [<ffffffff8152f372>] net_rx_action+0x152/0x240
>>>>>> [ 100.067107] [<ffffffff81084d1f>] __do_softirq+0xef/0x280
>>>>>> [ 100.067109] [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
>>>>>> [ 100.067114] [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
>>>>>> [ 100.067117] [<ffffffff8163e269>] ? schedule+0x29/0x70
>>>>>> [ 100.067120] [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
>>>>>> [ 100.067122] [<ffffffff810a5d4f>] kthread+0xcf/0xe0
>>>>>> [ 100.067124] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
>>>>>> [ 100.067127] [<ffffffff81649198>] ret_from_fork+0x58/0x90
>>>>>> [ 100.067129] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
>>>>>>
>>>>>> ================================cut here=====================================
>>>>>>
>>>>>> The reason is that the huge abnormal IP packet will be received to net stack
>>>>>> and be dropped finally by dst_release, and the dst_release would use the rcuos
>>>>>> callback-offload kthread to free the packet, but the cond_resched_rcu_qs() will
>>>>>> calling do_softirq() to receive more and more IP abnormal packets which will be
>>>>>> throw into the RCU callbacks again later, the number of received packet is much
>>>>>> greater than the number of packets freed, it will exhaust the memory and then OOM,
>>>>>> so don't try to process any pending softirqs in the rcuos callback-offload kthread
>>>>>> is a more effective solution.
>>>>>
>>>>> OK, but we could still have softirqs processed by the grace-period kthread
>>>>> as a result of any number of other events. So this change might reduce
>>>>> the probability of this problem, but it doesn't eliminate it.
>>>>>
>>>>> How huge are these huge IP packets? Is the underlying problem that they
>>>>> are too large to use the memory-allocator fastpaths?
>>>>>
>>>>> Thanx, Paul
>>>>>
>>>>
>>>> I use the 40G mellanox NiC to receive packet, and the testgine could send Mac abnormal packet and
>>>> IP abnormal packet to full speed.
>>>>
>>>> The Mac abnormal packet would be dropped at low level and not be received to net stack,
>>>> but the IP abnormal packet will introduce this problem, every packet will looks as new dst first and
>>>> release later by dst_release because it is meaningless.
>>>>
>>>> dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);
>>>>
>>>> so all packet will be freed until the rcuos callback-offload kthread processing, it will be a infinite loop
>>>> if huge packet is coming because the do_softirq will load more and more packet to the rcuos processing kthread,
>>>> so I still could not find a better way to fix this, btw, it is really hard to say the driver use too large memory-allocater
>>>> fastpaths, there is no memory leak and the Ixgbe may meet the same problem too.
>>
>> And following up on my fastpath point -- from what I can see, one
>> big effect of the large invalid packets is that they push processing
>> off of a number of fastpaths. If these packets could be rejected with
>> less per-packet processing, I bet that things would work much better.
>>
>> Thanx, Paul
>
> Yes, and I found the WARN_ON_ONCE(!irqs_disabled()) will be triggered if use _local_bh_enable here,
> so I think we could ask some help from Eric and David how to reject the huge number packets.
>
> Thanks
> Ding
>
>>
>>> The overall effect of these two patches is to move from enabling bh
>>> (and processing recent softirqs) to enabling bh without processing
>>> recent softirqs. Is this really the correct way to solve this problem?
>>> What about this solution is avoiding re-introducing the original
>>> softlockups? Have you talked to the networking guys about this issue?
>>>
>>> Thanx, Paul
>>>
>>>> Thanks.
>>>> Ding
>>>>
>>>>
>>>>>> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
>>>>>> Signed-off-by: Ding Tianhong <[email protected]>
>>>>>>
>>>>>> Signed-off-by: Ding Tianhong <[email protected]>
>>>>>> ---
>>>>>> kernel/rcu/tree_plugin.h | 3 +--
>>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>>>>> index 85c5a88..760c3b5 100644
>>>>>> --- a/kernel/rcu/tree_plugin.h
>>>>>> +++ b/kernel/rcu/tree_plugin.h
>>>>>> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
>>>>>> if (__rcu_reclaim(rdp->rsp->name, list))
>>>>>> cl++;
>>>>>> c++;
>>>>>> - local_bh_enable();
>>>>>> - cond_resched_rcu_qs();
>>>>>> + _local_bh_enable();
>>>>>> list = next;
>>>>>> }
>>>>>> trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
>>>>>> --
>>>>>> 1.9.0
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>
>>
>> .
>>

2016-12-29 00:14:09

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] rcu: fix the OOM problem of huge IP abnormal packet traffic

On Wed, Dec 28, 2016 at 01:58:06PM +0800, Ding Tianhong wrote:
> Hi, Paul:
>
> I try to debug this problem and found this solution could work well for both problem scene.
>
>
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 85c5a88..dbc14a7 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -2172,7 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> if (__rcu_reclaim(rdp->rsp->name, list))
> cl++;
> c++;
> - local_bh_enable();
> + _local_bh_enable();
> cond_resched_rcu_qs();
> list = next;
> }
>
>
> The cond_resched_rcu_qs() would process the softirq if the softirq is pending, so no need to use
> local_bh_enable() to process the softirq twice here, and it will avoid OOM when huge packets arrives,
> what do you think about it? Please give me some suggestion.

From what I can see, there is absolutely no guarantee that
cond_resched_rcu_qs() will do local_bh_enable(), and thus no guarantee
that it will process any pending softirqs -- and that is not part of
its job in any case. So I cannot recommend the above patch.
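
For reference, in kernels of that era cond_resched_rcu_qs() was roughly the following macro (paraphrased, not the exact definition); note that nothing in it enables bh or processes softirqs:

/* Paraphrased sketch of include/linux/rcupdate.h from that era. */
#define cond_resched_rcu_qs() \
do { \
	if (!cond_resched()) \
		rcu_note_voluntary_context_switch(current); \
} while (0)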

On efficient handling of large invalid packets (that is still the issue,
right?), I must defer to Dave and Eric.

Thanx, Paul

> Thanks.
> Ding
>
> On 2016/11/21 9:28, Ding Tianhong wrote:
> >
> >
> > On 2016/11/21 8:13, Paul E. McKenney wrote:
> >> On Sat, Nov 19, 2016 at 12:22:09AM -0800, Paul E. McKenney wrote:
> >>> On Sat, Nov 19, 2016 at 03:50:32PM +0800, Ding Tianhong wrote:
> >>>>
> >>>>
> >>>> On 2016/11/18 21:01, Paul E. McKenney wrote:
> >>>>> On Fri, Nov 18, 2016 at 08:40:09PM +0800, Ding Tianhong wrote:
> >>>>>> The commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> >>>>>> will introduce a new problem that when huge IP abnormal packet arrived,
> >>>>>> it may cause OOM and break the kernel, just like this:
> >>>>>>
> >>>>>> [ 79.441538] mlx4_en: eth5: Leaving promiscuous mode steering mode:2
> >>>>>> [ 100.067032] ksoftirqd/0: page allocation failure: order:0, mode:0x120
> >>>>>> [ 100.067038] CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G OE ----V------- 3.10.0-327.28.3.28.x86_64 #1
> >>>>>> [ 100.067039] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-20161018_184732-HGH1000003483 04/01/2014
> >>>>>> [ 100.067041] 0000000000000120 00000000b080d798 ffff8802afd5b968 ffffffff81638cb9
> >>>>>> [ 100.067045] ffff8802afd5b9f8 ffffffff81171380 0000000000000010 0000000000000000
> >>>>>> [ 100.067048] ffff8802befd8000 00000000ffffffff 0000000000000001 00000000b080d798
> >>>>>> [ 100.067050] Call Trace:
> >>>>>> [ 100.067057] [<ffffffff81638cb9>] dump_stack+0x19/0x1b
> >>>>>> [ 100.067062] [<ffffffff81171380>] warn_alloc_failed+0x110/0x180
> >>>>>> [ 100.067066] [<ffffffff81175b16>] __alloc_pages_nodemask+0x9b6/0xba0
> >>>>>> [ 100.067070] [<ffffffff8151e400>] ? skb_add_rx_frag+0x90/0xb0
> >>>>>> [ 100.067075] [<ffffffff811b6fba>] alloc_pages_current+0xaa/0x170
> >>>>>> [ 100.067080] [<ffffffffa06b9be0>] mlx4_alloc_pages.isra.24+0x40/0x170 [mlx4_en]
> >>>>>> [ 100.067083] [<ffffffffa06b9dec>] mlx4_en_alloc_frags+0xdc/0x220 [mlx4_en]
> >>>>>> [ 100.067086] [<ffffffff8152eeb8>] ? __netif_receive_skb+0x18/0x60
> >>>>>> [ 100.067088] [<ffffffff8152ef40>] ? netif_receive_skb+0x40/0xc0
> >>>>>> [ 100.067092] [<ffffffffa06bb521>] mlx4_en_process_rx_cq+0x5f1/0xec0 [mlx4_en]
> >>>>>> [ 100.067095] [<ffffffff8131027d>] ? list_del+0xd/0x30
> >>>>>> [ 100.067098] [<ffffffff8152c90f>] ? __napi_complete+0x1f/0x30
> >>>>>> [ 100.067101] [<ffffffffa06bbeef>] mlx4_en_poll_rx_cq+0x9f/0x170 [mlx4_en]
> >>>>>> [ 100.067103] [<ffffffff8152f372>] net_rx_action+0x152/0x240
> >>>>>> [ 100.067107] [<ffffffff81084d1f>] __do_softirq+0xef/0x280
> >>>>>> [ 100.067109] [<ffffffff81084ee0>] run_ksoftirqd+0x30/0x50
> >>>>>> [ 100.067114] [<ffffffff810ae93f>] smpboot_thread_fn+0xff/0x1a0
> >>>>>> [ 100.067117] [<ffffffff8163e269>] ? schedule+0x29/0x70
> >>>>>> [ 100.067120] [<ffffffff810ae840>] ? lg_double_unlock+0x90/0x90
> >>>>>> [ 100.067122] [<ffffffff810a5d4f>] kthread+0xcf/0xe0
> >>>>>> [ 100.067124] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> >>>>>> [ 100.067127] [<ffffffff81649198>] ret_from_fork+0x58/0x90
> >>>>>> [ 100.067129] [<ffffffff810a5c80>] ? kthread_create_on_node+0x140/0x140
> >>>>>>
> >>>>>> ================================cut here=====================================
> >>>>>>
> >>>>>> The reason is that the huge abnormal IP packet will be received to net stack
> >>>>>> and be dropped finally by dst_release, and the dst_release would use the rcuos
> >>>>>> callback-offload kthread to free the packet, but the cond_resched_rcu_qs() will
> >>>>>> calling do_softirq() to receive more and more IP abnormal packets which will be
> >>>>>> throw into the RCU callbacks again later, the number of received packet is much
> >>>>>> greater than the number of packets freed, it will exhaust the memory and then OOM,
> >>>>>> so don't try to process any pending softirqs in the rcuos callback-offload kthread
> >>>>>> is a more effective solution.
> >>>>>
> >>>>> OK, but we could still have softirqs processed by the grace-period kthread
> >>>>> as a result of any number of other events. So this change might reduce
> >>>>> the probability of this problem, but it doesn't eliminate it.
> >>>>>
> >>>>> How huge are these huge IP packets? Is the underlying problem that they
> >>>>> are too large to use the memory-allocator fastpaths?
> >>>>>
> >>>>> Thanx, Paul
> >>>>>
> >>>>
> >>>> I use the 40G mellanox NiC to receive packet, and the testgine could send Mac abnormal packet and
> >>>> IP abnormal packet to full speed.
> >>>>
> >>>> The Mac abnormal packet would be dropped at low level and not be received to net stack,
> >>>> but the IP abnormal packet will introduce this problem, every packet will looks as new dst first and
> >>>> release later by dst_release because it is meaningless.
> >>>>
> >>>> dst_release->call_rcu(&dst->rcu_head, dst_destroy_rcu);
> >>>>
> >>>> so all packet will be freed until the rcuos callback-offload kthread processing, it will be a infinite loop
> >>>> if huge packet is coming because the do_softirq will load more and more packet to the rcuos processing kthread,
> >>>> so I still could not find a better way to fix this, btw, it is really hard to say the driver use too large memory-allocater
> >>>> fastpaths, there is no memory leak and the Ixgbe may meet the same problem too.
> >>
> >> And following up on my fastpath point -- from what I can see, one
> >> big effect of the large invalid packets is that they push processing
> >> off of a number of fastpaths. If these packets could be rejected with
> >> less per-packet processing, I bet that things would work much better.
> >>
> >> Thanx, Paul
> >
> > Yes, and I found the WARN_ON_ONCE(!irqs_disabled()) will be triggered if use _local_bh_enable here,
> > so I think we could ask some help from Eric and David how to reject the huge number packets.
> >
> > Thanks
> > Ding
> >
> >>
> >>> The overall effect of these two patches is to move from enabling bh
> >>> (and processing recent softirqs) to enabling bh without processing
> >>> recent softirqs. Is this really the correct way to solve this problem?
> >>> What about this solution is avoiding re-introducing the original
> >>> softlockups? Have you talked to the networking guys about this issue?
> >>>
> >>> Thanx, Paul
> >>>
> >>>> Thanks.
> >>>> Ding
> >>>>
> >>>>
> >>>>>> Fix commit bedc196915 ("rcu: Fix soft lockup for rcu_nocb_kthread")
> >>>>>> Signed-off-by: Ding Tianhong <[email protected]>
> >>>>>>
> >>>>>> Signed-off-by: Ding Tianhong <[email protected]>
> >>>>>> ---
> >>>>>> kernel/rcu/tree_plugin.h | 3 +--
> >>>>>> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >>>>>> index 85c5a88..760c3b5 100644
> >>>>>> --- a/kernel/rcu/tree_plugin.h
> >>>>>> +++ b/kernel/rcu/tree_plugin.h
> >>>>>> @@ -2172,8 +2172,7 @@ static int rcu_nocb_kthread(void *arg)
> >>>>>> if (__rcu_reclaim(rdp->rsp->name, list))
> >>>>>> cl++;
> >>>>>> c++;
> >>>>>> - local_bh_enable();
> >>>>>> - cond_resched_rcu_qs();
> >>>>>> + _local_bh_enable();
> >>>>>> list = next;
> >>>>>> }
> >>>>>> trace_rcu_batch_end(rdp->rsp->name, c, !!list, 0, 0, 1);
> >>>>>> --
> >>>>>> 1.9.0
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> .
> >>>>>
> >>>>
> >>
> >>
> >> .
> >>
>