2019-10-20 15:46:53

by Tom Rix

[permalink] [raw]
Subject: [PATCH] xfrm: lock input tasklet skb queue

On PREEMPT_RT_FULL, running netperf triggers an oops caused by
corruption of the skb queue.

This appears to be a race condition here:
	__skb_queue_tail(&trans->queue, skb);
	tasklet_schedule(&trans->tasklet);
where the queue is modified before the tasklet is locked by
tasklet_schedule.

Fix this by protecting the queue with the skb queue lock.

Signed-off-by: Tom Rix <[email protected]>
---
net/xfrm/xfrm_input.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 9b599ed66d97..226dead86828 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -758,12 +758,16 @@ static void xfrm_trans_reinject(unsigned long data)
struct xfrm_trans_tasklet *trans = (void *)data;
struct sk_buff_head queue;
struct sk_buff *skb;
+ unsigned long flags;

__skb_queue_head_init(&queue);
+ spin_lock_irqsave(&trans->queue.lock, flags);
skb_queue_splice_init(&trans->queue, &queue);

while ((skb = __skb_dequeue(&queue)))
XFRM_TRANS_SKB_CB(skb)->finish(dev_net(skb->dev), NULL, skb);
+
+ spin_unlock_irqrestore(&trans->queue.lock, flags);
}

int xfrm_trans_queue(struct sk_buff *skb,
@@ -771,15 +775,20 @@ int xfrm_trans_queue(struct sk_buff *skb,
struct sk_buff *))
{
struct xfrm_trans_tasklet *trans;
+ unsigned long flags;

trans = this_cpu_ptr(&xfrm_trans_tasklet);
+ spin_lock_irqsave(&trans->queue.lock, flags);

- if (skb_queue_len(&trans->queue) >= netdev_max_backlog)
+ if (skb_queue_len(&trans->queue) >= netdev_max_backlog) {
+ spin_unlock_irqrestore(&trans->queue.lock, flags);
return -ENOBUFS;
+ }

XFRM_TRANS_SKB_CB(skb)->finish = finish;
__skb_queue_tail(&trans->queue, skb);
tasklet_schedule(&trans->tasklet);
+ spin_unlock_irqrestore(&trans->queue.lock, flags);
return 0;
}
EXPORT_SYMBOL(xfrm_trans_queue);
--
2.23.0


2019-10-21 08:39:55

by Steffen Klassert

[permalink] [raw]
Subject: Re: [PATCH] xfrm: lock input tasklet skb queue

On Sun, Oct 20, 2019 at 08:46:10AM -0700, Tom Rix wrote:
> On PREEMPT_RT_FULL while running netperf, a corruption
> of the skb queue causes an oops.
>
> This appears to be caused by a race condition here
> __skb_queue_tail(&trans->queue, skb);
> tasklet_schedule(&trans->tasklet);
> Where the queue is changed before the tasklet is locked by
> tasklet_schedule.
>
> The fix is to use the skb queue lock.
>
> Signed-off-by: Tom Rix <[email protected]>
> ---
> net/xfrm/xfrm_input.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 9b599ed66d97..226dead86828 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -758,12 +758,16 @@ static void xfrm_trans_reinject(unsigned long data)
> struct xfrm_trans_tasklet *trans = (void *)data;
> struct sk_buff_head queue;
> struct sk_buff *skb;
> + unsigned long flags;
>
> __skb_queue_head_init(&queue);
> + spin_lock_irqsave(&trans->queue.lock, flags);
> skb_queue_splice_init(&trans->queue, &queue);
>
> while ((skb = __skb_dequeue(&queue)))
> XFRM_TRANS_SKB_CB(skb)->finish(dev_net(skb->dev), NULL, skb);
> +
> + spin_unlock_irqrestore(&trans->queue.lock, flags);
> }
>
> int xfrm_trans_queue(struct sk_buff *skb,
> @@ -771,15 +775,20 @@ int xfrm_trans_queue(struct sk_buff *skb,
> struct sk_buff *))
> {
> struct xfrm_trans_tasklet *trans;
> + unsigned long flags;
>
> trans = this_cpu_ptr(&xfrm_trans_tasklet);
> + spin_lock_irqsave(&trans->queue.lock, flags);

As you can see above, 'trans' is per-cpu, so a spinlock is not
needed here. Also, this does not run in hard interrupt context,
so irqsave is not needed either. I don't see how this can fix
anything.

Can you please explain that race a bit more detailed?

2019-10-21 16:31:59

by Tom Rix

[permalink] [raw]
Subject: Re: [PATCH] xfrm: lock input tasklet skb queue

With PREEMPT_RT_FULL, softirqs and interrupts run in kthreads, so it
is possible for the tasklet to sleep and for its queue to be modified
while it sleeps.

On Mon, Oct 21, 2019 at 1:37 AM Steffen Klassert
<[email protected]> wrote:
> [...]
>
> As you can see above 'trans' is per cpu, so a spinlock
> is not needed here. Also this does not run in hard
> interrupt context, so irqsave is also not needed.
> I don't see how this can fix anything.
>
> Can you please explain that race a bit more detailed?
>

2019-10-22 06:14:14

by Herbert Xu

[permalink] [raw]
Subject: Re: [PATCH] xfrm: lock input tasklet skb queue

On Mon, Oct 21, 2019 at 09:31:13AM -0700, Tom Rix wrote:
> When preempt rt is full, softirq and interrupts run in kthreads. So it
> is possible for the tasklet to sleep and for its queue to get modified
> while it sleeps.

This is ridiculous. The network stack is full of assumptions
like this. So I think we need to fix preempt rt instead because
you can't make a major change like this without auditing the entire
kernel first rather than relying on a whack-a-mole approach.

Cheers,
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2019-10-22 09:02:34

by Joerg Vehlow

[permalink] [raw]
Subject: Re: [PATCH] xfrm: lock input tasklet skb queue

Hi,

I already sent a patch with a similar issue to this mailing list on
2019-09-09 [1]. Sadly there were no replies, although this is a huge
bug in the rt kernel. I fixed it a bit differently, using smaller
locked regions. There is also probably a bug in your patch, because
trans->queue.lock is not initialized by __skb_queue_head_init (in
xfrm_input_init).

Jörg

[1] https://lkml.org/lkml/2019/9/9/111

On 20.10.2019 at 17:46, Tom Rix wrote:
> [...]