2019-10-23 00:38:30

by Tom Rix

[permalink] [raw]
Subject: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

On PREEMPT_RT_FULL while running netperf, a corruption
of the skb queue causes an oops.

This appears to be caused by a race condition here
__skb_queue_tail(&trans->queue, skb);
tasklet_schedule(&trans->tasklet);
Where the queue is changed before the tasklet is locked by
tasklet_schedule.

The fix is to use the skb queue lock.

This is the original work of Joerg Vehlow <[email protected]>
https://lkml.org/lkml/2019/9/9/111
xfrm_input: Protect queue with lock

During the skb_queue_splice_init the tasklet could have been preempted
and __skb_queue_tail called, which led to an inconsistent queue.

ifdefs for CONFIG_PREEMPT_RT_FULL added to reduce runtime effects
on the normal kernel.

Signed-off-by: Tom Rix <[email protected]>
---
net/xfrm/xfrm_input.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 9b599ed66d97..decd515f84cf 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -755,13 +755,21 @@ EXPORT_SYMBOL(xfrm_input_resume);

static void xfrm_trans_reinject(unsigned long data)
{
+#ifdef CONFIG_PREEMPT_RT_FULL
+ unsigned long flags;
+#endif
struct xfrm_trans_tasklet *trans = (void *)data;
struct sk_buff_head queue;
struct sk_buff *skb;

__skb_queue_head_init(&queue);
+#ifdef CONFIG_PREEMPT_RT_FULL
+ spin_lock_irqsave(&trans->queue.lock, flags);
+#endif
skb_queue_splice_init(&trans->queue, &queue);
-
+#ifdef CONFIG_PREEMPT_RT_FULL
+ spin_unlock_irqrestore(&trans->queue.lock, flags);
+#endif
while ((skb = __skb_dequeue(&queue)))
XFRM_TRANS_SKB_CB(skb)->finish(dev_net(skb->dev), NULL, skb);
}
@@ -778,7 +786,11 @@ int xfrm_trans_queue(struct sk_buff *skb,
return -ENOBUFS;

XFRM_TRANS_SKB_CB(skb)->finish = finish;
+#ifdef CONFIG_PREEMPT_RT_FULL
+ skb_queue_tail(&trans->queue, skb);
+#else
__skb_queue_tail(&trans->queue, skb);
+#endif
tasklet_schedule(&trans->tasklet);
return 0;
}
@@ -798,7 +810,11 @@ void __init xfrm_input_init(void)
struct xfrm_trans_tasklet *trans;

trans = &per_cpu(xfrm_trans_tasklet, i);
+#ifdef CONFIG_PREEMPT_RT_FULL
+ skb_queue_head_init(&trans->queue);
+#else
__skb_queue_head_init(&trans->queue);
+#endif
tasklet_init(&trans->tasklet, xfrm_trans_reinject,
(unsigned long)trans);
}
--
2.23.0


2019-10-25 09:33:21

by Steffen Klassert

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

On Tue, Oct 22, 2019 at 05:22:04PM -0700, Tom Rix wrote:
> On PREEMPT_RT_FULL while running netperf, a corruption
> of the skb queue causes an oops.
>
> This appears to be caused by a race condition here
> __skb_queue_tail(&trans->queue, skb);
> tasklet_schedule(&trans->tasklet);
> Where the queue is changed before the tasklet is locked by
> tasklet_schedule.
>
> The fix is to use the skb queue lock.
>
> This is the original work of Joerg Vehlow <[email protected]>
> https://lkml.org/lkml/2019/9/9/111
> xfrm_input: Protect queue with lock
>
> During the skb_queue_splice_init the tasklet could have been preempted
> and __skb_queue_tail called, which led to an inconsistent queue.
>
> ifdefs for CONFIG_PREEMPT_RT_FULL added to reduce runtime effects
> on the normal kernel.

Has Herbert commented on your initial patch, please
fix PREEMPT_RT_FULL instead. There are certainly many
more codepaths that take such assumptions. You can not
fix this by distributing a spin_lock_irqsave here
and there.

2019-10-25 19:34:49

by Joerg Vehlow

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

Hi,

I always expected this to be applied to the RT patches. That's why
I originally send my patch to to Sebastian, Thomas and Steven (I added
them again now. The website of the rt patches says patches for the
CONFIG_REEMPT_RT patchset should be send to lkml.

I hope one of the rt patch maintainers will reply here.

Jörg

Am 24.10.2019 um 12:31 schrieb Steffen Klassert:
> On Tue, Oct 22, 2019 at 05:22:04PM -0700, Tom Rix wrote:
>> On PREEMPT_RT_FULL while running netperf, a corruption
>> of the skb queue causes an oops.
>>
>> This appears to be caused by a race condition here
>> __skb_queue_tail(&trans->queue, skb);
>> tasklet_schedule(&trans->tasklet);
>> Where the queue is changed before the tasklet is locked by
>> tasklet_schedule.
>>
>> The fix is to use the skb queue lock.
>>
>> This is the original work of Joerg Vehlow <[email protected]>
>> https://lkml.org/lkml/2019/9/9/111
>> xfrm_input: Protect queue with lock
>>
>> During the skb_queue_splice_init the tasklet could have been preempted
>> and __skb_queue_tail called, which led to an inconsistent queue.
>>
>> ifdefs for CONFIG_PREEMPT_RT_FULL added to reduce runtime effects
>> on the normal kernel.
> Has Herbert commented on your initial patch, please
> fix PREEMPT_RT_FULL instead. There are certainly many
> more codepaths that take such assumptions. You can not
> fix this by distributing a spin_lock_irqsave here
> and there.
>

Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

On 2019-10-25 11:37:59 [+0200], Joerg Vehlow wrote:
> Hi,
>
> I always expected this to be applied to the RT patches. That's why
> I originally send my patch to to Sebastian, Thomas and Steven (I added
> them again now. The website of the rt patches says patches for the
> CONFIG_REEMPT_RT patchset should be send to lkml.
>
> I hope one of the rt patch maintainers will reply here.

I've seen the first patch and it was not mentioned that it was RT
related so I did not pay any attention to it.
Please repost your v2, please add RT next to patch, please state the RT
version and the actual problem and I take a look.

> Jörg

Sebastian

2019-10-25 19:41:11

by Joerg Vehlow

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

Now that I look back at my mail, you are right, I did not say anything
about rt, my bad. But maybe you could add too the rt patches website, that
an RT tag has to be added.

@Tom Rix, will you resend the patch? You may also add the information,
that I found the bug running the ipsec_stress ltp tests. Generating any
ipsec traffic (maybe concurrent) should be sufficient.

Here is one of the oops logs I still have:

[  139.717259] BUG: unable to handle kernel NULL pointer dereference at
0000000000000518
[  139.717260] PGD 0 P4D 0
[  139.717262] Oops: 0000 [#1] PREEMPT SMP PTI
[  139.717273] CPU: 2 PID: 11987 Comm: netstress Not tainted
4.19.59-rt24-preemt-rt #1
[  139.717274] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[  139.717306] RIP: 0010:xfrm_trans_reinject+0x97/0xd0
[  139.717307] Code: 42 eb 45 83 6d b0 01 31 f6 48 8b 42 08 48 c7 42 08
00 00 00 00 48 8b 0a 48 c7 02 00 00 00 00 48 89 41 08 48 89 08 48 8b 42
10 <48> 8b b8 18 05 00 00 48 8b 42 40 e8 d9 e1 4b 00 48 8b 55 a0 48 39
[  139.717307] RSP: 0018:ffffc900007b37e8 EFLAGS: 00010246
[  139.717308] RAX: 0000000000000000 RBX: ffffc900007b37e8 RCX:
ffff88807db206a8
[  139.717309] RDX: ffff88807db206a8 RSI: 0000000000000000 RDI:
0000000000000000
[  139.717309] RBP: ffffc900007b3848 R08: 0000000000000001 R09:
ffffc900007b35c8
[  139.717309] R10: ffffea0001dcfc00 R11: 00000000000890c4 R12:
ffff88807db20680
[  139.717310] R13: 00000000000f4240 R14: 0000000000000000 R15:
0000000000000000
[  139.717310] FS:  00007f4643034700(0000) GS:ffff88807db00000(0000)
knlGS:0000000000000000
[  139.717311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  139.717337] CR2: 0000000000000518 CR3: 00000000769c6000 CR4:
00000000000006e0
[  139.717350] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  139.717350] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  139.717350] Call Trace:
[  139.717387]  tasklet_action_common.isra.18+0x6d/0xd0
[  139.717388]  tasklet_action+0x1d/0x20
[  139.717389]  do_current_softirqs+0x196/0x360
[  139.717390]  __local_bh_enable+0x51/0x60
[  139.717397]  ip_finish_output2+0x18b/0x3f0
[  139.717408]  ? task_rq_lock+0x53/0xe0
[  139.717415]  ip_finish_output+0xbe/0x1b0
[  139.717416]  ip_output+0x72/0x100
[  139.717422]  ? ipcomp_output+0x5e/0x280
[  139.717424]  xfrm_output_resume+0x4b5/0x540
[  139.717436]  ? refcount_dec_and_test_checked+0x11/0x20
[  139.717443]  ? kfree_skbmem+0x33/0x80
[  139.717444]  xfrm_output+0xd7/0x110
[  139.717451]  xfrm4_output_finish+0x2b/0x30
[  139.717452]  __xfrm4_output+0x3a/0x50
[  139.717453]  xfrm4_output+0x40/0xe0
[  139.717454]  ? xfrm_dst_check+0x174/0x250
[  139.717455]  ? xfrm4_output+0x40/0xe0
[  139.717456]  ? xfrm_dst_check+0x174/0x250
[  139.717457]  ip_local_out+0x3b/0x50
[  139.717458]  __ip_queue_xmit+0x16b/0x420
[  139.717464]  ip_queue_xmit+0x10/0x20
[  139.717466]  __tcp_transmit_skb+0x566/0xad0
[  139.717467]  tcp_write_xmit+0x3a4/0x1050
[  139.717468]  __tcp_push_pending_frames+0x35/0xe0
[  139.717469]  tcp_push+0xdb/0x100
[  139.717469]  tcp_sendmsg_locked+0x491/0xd70
[  139.717470]  tcp_sendmsg+0x2c/0x50
[  139.717476]  inet_sendmsg+0x3e/0xf0
[  139.717483]  sock_sendmsg+0x3e/0x50
[  139.717484]  __sys_sendto+0x114/0x1a0
[  139.717491]  ? __rt_mutex_unlock+0xe/0x10
[  139.717492]  ? _mutex_unlock+0xe/0x10
[  139.717500]  ? ksys_write+0xc5/0xe0
[  139.717501]  __x64_sys_sendto+0x28/0x30
[  139.717503]  do_syscall_64+0x4d/0x110
[  139.717504]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Am 25.10.2019 um 11:47 schrieb Sebastian Andrzej Siewior:
> On 2019-10-25 11:37:59 [+0200], Joerg Vehlow wrote:
>> Hi,
>>
>> I always expected this to be applied to the RT patches. That's why
>> I originally send my patch to to Sebastian, Thomas and Steven (I added
>> them again now. The website of the rt patches says patches for the
>> CONFIG_REEMPT_RT patchset should be send to lkml.
>>
>> I hope one of the rt patch maintainers will reply here.
> I've seen the first patch and it was not mentioned that it was RT
> related so I did not pay any attention to it.
> Please repost your v2, please add RT next to patch, please state the RT
> version and the actual problem and I take a look.
>
>> Jörg
> Sebastian

2019-10-25 20:36:30

by Tom Rix

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

I will work on refactoring the patch to v5.2-rt.
trix

On Fri, Oct 25, 2019 at 3:22 AM Sebastian Andrzej Siewior
<[email protected]> wrote:
>
> On 2019-10-25 12:14:59 [+0200], Joerg Vehlow wrote:
> > Here is one of the oops logs I still have:
> >
> > [ 139.717273] CPU: 2 PID: 11987 Comm: netstress Not tainted
> > 4.19.59-rt24-preemt-rt #1
>
> could you retry with the latest v5.2-RT, please? qemu should boot fine…
>
> Sebastian

Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

On 2019-10-25 07:26:49 [-0700], Tom Rix wrote:
> I will work on refactoring the patch to v5.2-rt.

please verify first that the issue still exists.

> trix

Sebastian

Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

On 2019-10-25 12:14:59 [+0200], Joerg Vehlow wrote:
> Here is one of the oops logs I still have:
>
> [  139.717273] CPU: 2 PID: 11987 Comm: netstress Not tainted
> 4.19.59-rt24-preemt-rt #1

could you retry with the latest v5.2-RT, please? qemu should boot fine…

Sebastian

Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

On 2019-10-28 11:44:57 [+0100], Joerg Vehlow wrote:
> I was unable to reproduce it with 5.2.21-rt13. Do you know if something
> changed in network scheduling code or could it be just less likely?

the softirq/BH handling has been rewritten in the v5.0-RT cycle,
v5.0.19-rt11 to be exact. So if that the cause for it (which I hope)
then you should be able to trigger the problem before that release and
not later.

Sebastian

2019-10-29 01:28:39

by Joerg Vehlow

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue



Am 25.10.2019 um 12:22 schrieb Sebastian Andrzej Siewior:
> On 2019-10-25 12:14:59 [+0200], Joerg Vehlow wrote:
>> Here is one of the oops logs I still have:
>>
>> [  139.717273] CPU: 2 PID: 11987 Comm: netstress Not tainted
>> 4.19.59-rt24-preemt-rt #1
> could you retry with the latest v5.2-RT, please? qemu should boot fine…
>
> Sebastian
I was unable to reproduce it with 5.2.21-rt13. Do you know if something
changed in network scheduling code or could it be just less likely?

2019-10-29 09:57:18

by Joerg Vehlow

[permalink] [raw]
Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue


> On 2019-10-28 11:44:57 [+0100], Joerg Vehlow wrote:
>> I was unable to reproduce it with 5.2.21-rt13. Do you know if something
>> changed in network scheduling code or could it be just less likely?
> the softirq/BH handling has been rewritten in the v5.0-RT cycle,
> v5.0.19-rt11 to be exact. So if that the cause for it (which I hope)
> then you should be able to trigger the problem before that release and
> not later.
>
I testes again with 5.0.19-rt10 and 5.0.19-rt11  and I am pretty sure
the bug wasfixed by the changes in rt11. I was able to reproduce it with
rt10 within secondsand unable to reproduce it at all in several minutes on
rt11. Will 4.19 rt patches receive anymore updates? Is it possible to
backport
the changes to softirq/BH habdling from 5.0.16-rt11 to 4.19?

Subject: Re: [PATCH v2 1/1] xfrm : lock input tasklet skb queue

On 2019-10-29 08:33:01 [+0100], Joerg Vehlow wrote:
> I testes again with 5.0.19-rt10 and 5.0.19-rt11  and I am pretty sure
> the bug wasfixed by the changes in rt11. I was able to reproduce it with
> rt10 within secondsand unable to reproduce it at all in several minutes on
> rt11. Will 4.19 rt patches receive anymore updates? Is it possible to
> backport
> the changes to softirq/BH habdling from 5.0.16-rt11 to 4.19?

Oh, sorry, I almost forgot about it.
Please repost the patch, state "RT" next to "PATCH" in the subject line.
Please state which kernels are affected (v4.19 and earlier due to BH
rework) and how this can be tested.
The patch/fix should be included in the affected kernels. The softirq
changes are very intrusive and can not be backported for the RT-stable
kernels.

Sebastian