2003-05-23 21:02:43

by Paul Fulghum

[permalink] [raw]
Subject: [BUGS] 2.5.69 syncppp

I'm seeing the following with syncppp under 2.5.69:

1. When syncppp tries to send a control protocol packet,
I see the following kernel messages:

Badness in local_bh_enable at kernel/softirq.c:105
Call Trace:
[<c01254b4>] local_bh_enable+0x84/0x90
[<c02dbbc0>] dev_queue_xmit+0x1c0/0x250
[<cc82fd18>] sppp_lcp_open+0x78/0x90 [syncppp]
[<cc82fe3d>] sppp_cp_timeout+0xad/0xd0 [syncppp]
[<cc82fd90>] sppp_cp_timeout+0x0/0xd0 [syncppp]
[<c01295dd>] run_timer_softirq+0xcd/0x190
[<c012542b>] do_softirq+0xdb/0xe0
[<c011790d>] smp_apic_timer_interrupt+0xcd/0x140
[<c0108f90>] default_idle+0x0/0x40
[<c010bc82>] apic_timer_interrupt+0x1a/0x20

This appears the indicate that dev_queue_xmit() can't
be called with local interrupts disabled (which is the
case as a spinlock is held when this call is made).

It looks like a lot of reworking is necessary if this is true.


2. When network packets are received I see the following message:

[<c0121e65>] fix old protocol handler sppp_rcv+0x0/0x1e [syncppp]!

Any docs covering what needs to be done for this one?

Thanks,
Paul

--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com



2003-05-23 21:20:16

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUGS] 2.5.69 syncppp

Paul Fulghum <[email protected]> wrote:
>
> 1. When syncppp tries to send a control protocol packet,
> I see the following kernel messages:
>
> Badness in local_bh_enable at kernel/softirq.c:105

sppp_lcp_open() is called from other places without that lock held, so it
is probably not totally stupid to drop it in the timer handler too.

It's good (and surprising) that someone is actually using that stuff.
Please beat on it for a while.


diff -puN drivers/net/wan/syncppp.c~syncppp-locking-fix drivers/net/wan/syncppp.c
--- 25/drivers/net/wan/syncppp.c~syncppp-locking-fix Fri May 23 14:28:50 2003
+++ 25-akpm/drivers/net/wan/syncppp.c Fri May 23 14:29:24 2003
@@ -1297,6 +1297,7 @@ static void sppp_cp_timeout (unsigned lo
spin_unlock_irqrestore(&spppq_lock, flags);
return;
}
+ spin_unlock_irqrestore(&spppq_lock, flags);
switch (sp->lcp.state) {
case LCP_STATE_CLOSED:
/* No ACK for Configure-Request, retry. */
@@ -1333,7 +1334,6 @@ static void sppp_cp_timeout (unsigned lo
}
break;
}
- spin_unlock_irqrestore(&spppq_lock, flags);
}

static char *sppp_lcp_type_name (u8 type)

_

2003-05-23 22:57:52

by Paul Fulghum

[permalink] [raw]
Subject: RE: [BUGS] 2.5.69 syncppp

> sppp_lcp_open() is called from other places
> without that lock held, so it is probably not
> totally stupid to drop it in the timer handler too.

That section was previously covered by cli/sti,
and I changed it to use the spinlock instead
when cli/sti went away in 2.5.x.

I thought it was in place to serialize state changes.
I'll look at it harder, you may be right in that
it is not necessary.

> It's good (and surprising) that someone is
> actually using that stuff.

It's not pretty, but it works.
Some customers prefer it to pppd.

> Please beat on it for a while.

Yes, that code is in need of a good beating :-)

Paul Fulghum
[email protected]


2003-05-26 18:27:40

by Alan

[permalink] [raw]
Subject: RE: [BUGS] 2.5.69 syncppp

On Sad, 2003-05-24 at 00:11, Paul Fulghum wrote:
> I thought it was in place to serialize state changes.
> I'll look at it harder, you may be right in that
> it is not necessary.

The state serialization doesn't have to be 100% for PPP however,
you already have the same races present due to wire time so I
also think it should be ok.


2003-05-28 21:11:24

by Paul Fulghum

[permalink] [raw]
Subject: RE: [BUGS] 2.5.69 syncppp

On Mon, 2003-05-26 at 12:42, Alan Cox wrote:
> On Sad, 2003-05-24 at 00:11, Paul Fulghum wrote:
> > I thought it was in place to serialize state changes.
> > I'll look at it harder, you may be right in that
> > it is not necessary.
>
> The state serialization doesn't have to be 100% for PPP however,
> you already have the same races present due to wire time so I
> also think it should be ok.

OK, state changes can happen from several different
sources, and not all of these sources of input are
synchronized.

The spinlock in cp_timeout() does not synchronize
with input from sppp_input(), but *does* synchronize
with sppp_keepalive() which is run off another timer.

But I think I understand what Alan is getting at in
that the PPP state tables are designed to be tolerant
of transient oddities and should converge to a final
state regardless of a timing glitch/race.

Not worrying about state change synchronization and
discarding the use of the spinlock in cp_timeout()
will remove the warning for that case.

But sppp_keepalive() uses a spinlock to synchronize
access to the linked list of sppp devices. So this
path can also cause the warning.

So there are multiple places that call dev_queue_xmit()
with spinlocks held, which provokes this warning.

Which makes me wonder:
Was it really the intention of the change to kernel/softirq.c:105
(source of the warning) that callers to dev_queue_xmit()
not be allowed to use spinlocks? If so, then what other
synchronization techniques are appropriate for use in
an interrupt and timer context?



--
Paul Fulghum, [email protected]
Microgate Corporation, http://www.microgate.com

2003-05-28 21:44:01

by Andrew Morton

[permalink] [raw]
Subject: Re: [BUGS] 2.5.69 syncppp

Paul Fulghum <[email protected]> wrote:
>
> Was it really the intention of the change to kernel/softirq.c:105
> (source of the warning) that callers to dev_queue_xmit()
> not be allowed to use spinlocks? If so, then what other
> synchronization techniques are appropriate for use in
> an interrupt and timer context?

That warning is there because local_bh_enable will unconditionally enable
interrupts, to run softirqs.

Hence, if someone is calling local_bh_enable() with interrupts disabled
then local_bh_enable() is about to break their locking scheme in subtle
ways. So the warning is there to tell you about this.

And we don't want to be running softirqs with interrupts disabled, for
latency reasons.

2003-05-28 22:21:55

by Alan

[permalink] [raw]
Subject: RE: [BUGS] 2.5.69 syncppp

On Mer, 2003-05-28 at 22:24, Paul Fulghum wrote:
> The spinlock in cp_timeout() does not synchronize
> with input from sppp_input(), but *does* synchronize
> with sppp_keepalive() which is run off another timer.

In which case you can also use the del_timer return to do
a del_timer_sync/add_timer based protection in some
situations