Hi,
I seem to be getting some odd behavior that I think may be related to
the preempt patch somehow.
Problem:
--------
The machine completely hangs with the exception of constantly repeating
the same 1/4 second sound sample that happens to be playing at the time
of the hang. The kernel does not respond to network traffic, nor to
SYS-REQ commands. A hard reset is all that works at this point.
CPU, Kernels and Patches:
-------------------------
Dual Pentium III 450MHz, SMP Kernels
Stock 2.4.17+preempt+lock-break+mki-adapter+win4lin
- Problem extremely intermittent, maybe once a day.
Stock 2.4.18+preempt+mki-adapter+win4lin
- Very frequent, and also repeatable every time I
try to start win4lin.
Stock 2.4.18+mki-adapter+win4lin
- No problems thus far. Win4lin works fine.
The mki-adapter and win4lin patches are needed for win4lin to work. I
include them here because it seems to be some sort of interaction
between them and preempt, or some behavior that they can evoke
repeatably, which causes the problem to surface.
The problem could easily be somewhere in the win4lin stuff, and preempt
causes it to appear there.
What can I do to try to hunt down this problem, given that the machine
is completely useless once it happens?
-- Ian
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ian Duggan [email protected]
http://www.ianduggan.net
On Thu, 2002-03-14 at 20:51, Ian Duggan wrote:
> Stock 2.4.17+preempt+lock-break+mki-adapter+win4lin
>
> - Problem extremely intermittent, maybe once a day.
>
>
> Stock 2.4.18+preempt+mki-adapter+win4lin
>
> - Very frequent, and also repeatable every time I
> try to start win4lin.
Pretty clear it is win4lin. Is it SMP-safe?
Is there another kernel module you load for win4lin? Binary? It needs
to be made preempt (and SMP) -safe.
Robert Love
> Stock 2.4.18+preempt+mki-adapter+win4lin
> - Very frequent, and also repeatable every time I
> try to start win4lin.
pre-empt is almost certainly going to break things like win4lin
> Pretty clear it is win4lin. Is it SMP-safe?
>
> Is there another kernel module you load for win4lin? Binary? It needs
> to be made preempt (and SMP) -safe.
It is SMP safe. I've used it for ages on SMP kernels. The frustrating
thing is that it worked for weeks without a hitch using
2.4.17+preempt+mki+win4lin. It is only recently that I began to
experience very intermittent problems on that kernel.
What does it mean to make something "preempt" safe? Is it something
beyond "SMP safe"?
I misspoke slightly before. The mki-adapter patch actually provides a
GPL'd module. The README from it says:
"This kernel module attempts to isolate all of the functions and
structures that NeTraverse utilizes in it's binary kernel modules."
The win4lin patch (also GPL) provides hooks for the mki-adapter module
to call.
I'm not asking for help fixing it, because of the binary module issue.
I'm just looking for ways to narrow down where the problem might be,
given that the machine completely locks up.
-- Ian
Alan Cox wrote:
>
> > Stock 2.4.18+preempt+mki-adapter+win4lin
> > - Very frequent, and also repeatable every time I
> > try to start win4lin.
>
> pre-empt is almost certainly going to break things like win4lin
What is required for preempt beyond "SMP safe" code? I thought the whole
idea was to make the preemptions transparent to other code by utilizing
the SMP critical regions?
-- Ian
On March 15, 2002 03:11 am, Alan Cox wrote:
> > Stock 2.4.18+preempt+mki-adapter+win4lin
> > - Very frequent, and also repeatable every time I
> > try to start win4lin.
>
> pre-empt is almost certainly going to break things like win4lin
Or rather, a binary thing like win4lin may break pre-empt, putting
further gentle pressure on the vendor to 'go open'. You could look
at that as a good thing.
I suppose Netraverse will fix it up and life will go on.
--
Daniel
From: Ian Duggan <[email protected]>
Date: Fri, 15 Mar 2002 00:38:37 -0800
> What is required for preempt beyond "SMP safe" code? I thought the whole
> idea was to make the preemptions transparent to other code by utilizing
> the SMP critical regions?
Pre-empt makes things like per-cpu data structures require
preemption disables around cpu-local critical regions.
Code that worked before just because it knew the data structure was
only ever accessed by the current cpu no longer works, because preemption
can cause a context switch at any time.
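A minimal userspace sketch of the hazard described above; it is a model, not kernel code. A task samples its CPU number, may be migrated at a preemption point, and then updates a per-CPU slot. The names current_cpu, preempt_count, maybe_preempt(), per_cpu_hits, bump_unsafe() and bump_safe() are all hypothetical. In a preemptible kernel the increment and decrement of the count would be done by preempt_disable() and preempt_enable().

```c
#include <assert.h>

#define NR_CPUS 2

static int current_cpu;             /* CPU the simulated task is running on */
static int preempt_count;           /* nonzero means preemption is disabled */
static long per_cpu_hits[NR_CPUS];  /* hypothetical per-CPU counter */

/* Model of a preemption point: the task migrates unless preemption is off. */
static void maybe_preempt(void)
{
    if (preempt_count == 0)
        current_cpu = (current_cpu + 1) % NR_CPUS;
}

/* Unsafe: the CPU number is sampled, then the task may move before using it.
 * Returns 1 if the slot updated belongs to the CPU we ended up on. */
static int bump_unsafe(void)
{
    int cpu = current_cpu;

    maybe_preempt();               /* worst case: preempted right here */
    per_cpu_hits[cpu]++;
    return cpu == current_cpu;
}

/* Safe: preemption is disabled around the CPU-local critical region. */
static int bump_safe(void)
{
    int cpu, ok;

    preempt_count++;               /* stands in for preempt_disable() */
    cpu = current_cpu;
    maybe_preempt();               /* no migration while the count is nonzero */
    per_cpu_hits[cpu]++;
    ok = (cpu == current_cpu);
    preempt_count--;               /* stands in for preempt_enable() */
    return ok;
}
```

Starting from current_cpu = 0 and preempt_count = 0, bump_unsafe() returns 0 because the task has moved to CPU 1 after touching CPU 0's slot, while a following bump_safe() returns 1.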
> What is required for preempt beyond "SMP safe" code? I thought the whole
> idea was to make the preemptions transparent to other code by utilizing
> the SMP critical regions?
- SMP safe code
- Actual source code when recompiling modules
- Reviewing things like driver code use of disable_irq by hand
- Reviewing driver code for situations where it requires a small timing delay
  and a large one is unacceptable
- Checking anywhere you use the cpu id that you don't do something where it
  might change under you (e.g. per-cpu variables)
> "This kernel module attempts to isolate all of the functions and
> structures that NeTraverse utilizes in it's binary kernel modules."
>
> The win4lin patch (also GPL) provides hooks for the mki-adapter module
> to call.
Not really. Because if the win4lin patch provides hooks for binary modules
that the binary modules depend upon, then it's hard to see how the two are
resolvable. Either it's GPL, in which case why is non-GPL code dependent
on it and not shipped as GPL, or it isn't.
> I'm not asking for help fixing it, because of the binary module issue.
> I'm just looking for ways to narrow down where the problem might be,
> given that the machine completely locks up.
I've also seen the win4lin patch. I really don't envy anyone trying to
debug it
On Fri, 2002-03-15 at 03:36, Ian Duggan wrote:
> I'm not asking for help fixing it, because of the binary module issue.
> I'm just looking for ways to narrow down where the problem might be,
> given that the machine completely locks up.
Chances are the binary win4lin module just needs to be recompiled
against a preemptive kernel.
Of course, it could need some specific preempt-safe work but more than
likely it just needs to be recompiled. Binary modules must be
specifically preempt-kernel aware, just as they need to be SMP-kernel aware.
Robert Love
Robert Love writes:
> Chances are the binary win4lin module just needs to be recompiled
> against a preemptive kernel.
>
> Of course, it could need some specific preempt-safe work but more than
> likely it just needs to be recompiled. Binary modules must be
> specifically preempt-kernel aware, like they need be SMP-kernel aware.
"more than likely": that's perhaps true for your average NIC/soundcard/
whatever driver, but things that poke the processor itself (like my
performance-monitoring counters driver) really do depend on not being
preempted. In my view, CONFIG_SMP is a minor triviality compared to
CONFIG_PREEMPT ...
/Mikael
On Fri, 2002-03-15 at 11:11, Mikael Pettersson wrote:
> "more than likely": that's perhaps true for your average NIC/soundcard/
> whatever driver, but things that poke the processor itself (like my
> performance-monitoring counters driver) really do depend on not being
> preempted. In my view, CONFIG_SMP is a minor triviality compared to
> CONFIG_PREEMPT ...
If you "poke the processor", to be SMP-safe, you should hold a lock to
prevent multiple concurrent "pokings of the processor" - thus you become
preempt-safe.
It is a rare case where something does not hold a lock, assumes some sort
of non-reentrancy/concurrency, and is actually still SMP-safe. The only
nontrivial case I have seen is drivers that call disable_irq(n) and thus
are assured they won't have another driver request and then go off to
touch hardware.
In general, the sort of "non-preemptibility" you are requiring is also a
requirement for non-reentrancy and non-concurrency and thus your
measures to protect those (SMP locking, et al) assure you your
preempt-kernel protection, too.
Robert Love
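A small userspace sketch of the argument above, under the assumption (true of the preempt patch as described in this thread) that taking a spinlock also disables preemption, while masking an interrupt line does not. Everything here is a hypothetical model: model_spin_lock(), model_spin_unlock(), model_disable_irq(), model_enable_irq() and preemptible() are illustration names, not kernel functions.

```c
#include <assert.h>

static int preempt_count;      /* nonzero means preemption is disabled */
static int irq_line_masked;    /* 1 while the modelled IRQ line is masked */

struct model_lock { int locked; };

/* Model of a spinlock under a preemptible kernel: taking the lock also
 * bumps the preemption count, releasing it drops the count again. */
static void model_spin_lock(struct model_lock *l)
{
    preempt_count++;
    assert(!l->locked);        /* single-threaded model: never contended */
    l->locked = 1;
}

static void model_spin_unlock(struct model_lock *l)
{
    l->locked = 0;
    preempt_count--;
}

/* Model of masking one interrupt line: it keeps that handler from running
 * but leaves the preemption count alone. */
static void model_disable_irq(void) { irq_line_masked = 1; }
static void model_enable_irq(void)  { irq_line_masked = 0; }

/* 1 if the scheduler may preempt the running task right now. */
static int preemptible(void)
{
    return preempt_count == 0;
}
```

In this model, code between model_spin_lock() and model_spin_unlock() is not preemptible, whereas code between model_disable_irq() and model_enable_irq() still is. That is the case a driver author would have to review by hand.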
> > "This kernel module attempts to isolate all of the functions and
> > structures that NeTraverse utilizes in it's binary kernel modules."
> >
> > The win4lin patch (also GPL) provides hooks for the mki-adapter module
> > to call.
>
> Not really. Because if the win4lin patch provides hooks for binary modules
> that the binary modules depend upon, then it's hard to see how the two are
> resolvable. Either it's GPL, in which case why is non-GPL code dependent
> on it and not shipped as GPL, or it isn't.
Perhaps there is a license violation here then. There are two patches
involved, both available from the Netraverse site once you are
registered there.
https://www.netraverse.com/member/downloads/misc.php?refresh=yes&
The first patch, w4l-hooks, modifies these files, which are all part of
the kernel, thus GPL. The mki.c file has a header which says that it is
GPL.
arch/i386/Makefile
arch/i386/boot/compressed/head.S
arch/i386/boot/compressed/misc.c
arch/i386/boot/setup.S
arch/i386/config.in
arch/i386/kernel/entry.S
arch/i386/kernel/head.S
arch/i386/kernel/process.c
arch/i386/kernel/signal.c
arch/i386/kernel/smpboot.c
arch/i386/kernel/trampoline.S
arch/i386/mki/Makefile
arch/i386/mki/mki.c
arch/i386/mm/fault.c
include/asm-i386/desc.h
include/asm-i386/mki.h
include/asm-i386/mkiversion.h
include/asm-i386/segment.h
include/linux/sched.h
kernel/exit.c
kernel/fork.c
kernel/sched.c
mm/vmscan.c
The second patch, mki-adapter, creates a new kernel module with a
LICENSE file that says that it is GPL, version 2. All the files in the
patch refer to the LICENSE file.
The output of lsmod shows:
fokker% lsmod
Module        Size    Used by    Tainted: P
Mvnetd        8676    1  (unused)
Mvnet         51952   0  [Mvnetd]
Mvnetint      216     0  (unused)
Mvw           4172    0  (unused)
Mvmouse       704     0  (unused)
Mvkbd         824     0  (unused)
Mvgic         3160    0  (unused)
Mvdsp         904     0  (unused)
Mserial       5724    0  (unused)
Mmpip         6796    0  (unused)
Mmerge        127556  0  [Mvnetd Mvnet Mvw Mvmouse Mvkbd Mvgic
                          Mvdsp Mserial Mmpip]
mki-adapter   20944   0  [Mvnetd Mvnet Mvnetint Mvw Mvmouse
                          Mvkbd Mvgic Mvdsp Mserial Mmpip Mmerge]
[...]
All of the M* modules are binary modules that come as part of the win4lin
binary package. It looks like they are all dependent on Mmerge (binary)
and mki-adapter (GPL). Additionally, the mki-adapter README file has
this:
"This kernel module attempts to isolate all of the functions and
structures that NeTraverse utilizes in it's binary kernel modules."
So it certainly seems that they are dependent on the existence of
mki-adapter.
The M* modules should be available as GPL then?
-- Ian
On Fri, Mar 15, 2002 at 02:11:49PM -0500, Robert Love wrote:
> If you "poke the processor", to be SMP-safe, you should hold a lock to
> prevent multiple concurrent "pokings of the processor" - thus you become
> preempt-safe.
Without preempt:
x = movefrom processor register;
do_something with x
is safe in SMP
With SMP it requires a lock.
--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com
On Fri, Mar 15, 2002 at 05:40:36PM -0700, [email protected] wrote:
> On Fri, Mar 15, 2002 at 02:11:49PM -0500, Robert Love wrote:
> > If you "poke the processor", to be SMP-safe, you should hold a lock to
> > prevent multiple concurrent "pokings of the processor" - thus you become
> > preempt-safe.
>
> Without preempt:
> x = movefrom processor register;
> do_something with x
>
> is safe in SMP
> With SMP it requires a lock.
>
"With preempt it requires a lock" you mean?
On Fri, Mar 15, 2002 at 05:46:15PM -0800, Mike Fedyk wrote:
> On Fri, Mar 15, 2002 at 05:40:36PM -0700, [email protected] wrote:
> > On Fri, Mar 15, 2002 at 02:11:49PM -0500, Robert Love wrote:
> > > If you "poke the processor", to be SMP-safe, you should hold a lock to
> > > prevent multiple concurrent "pokings of the processor" - thus you become
> > > preempt-safe.
> >
> > Without preempt:
> > x = movefrom processor register;
> > do_something with x
> >
> > is safe in SMP
> > With SMP it requires a lock.
> >
>
> "With preempt it requires a lock" you mean?
Yep. Keyboard Operator error.
--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com
On March 15, 2002 03:28 pm, Alan Cox wrote:
> > What is required for preempt beyond "SMP safe" code? I thought the whole
> > idea was to make the preemptions transparent to other code by utilizing
> > the SMP critical regions?
>
> SMP safe code
> Actual source code when recompiling modules
> Reviewing things like driver code use of disable_irq by hand
> Reviewing driver code for situations where it requires a small timing delay
> and a large one is unacceptable
Has anyone found one of those yet?
> Checking anywhere you use the cpu id that you don't do something where it
> might change under you (eg per cpu variables)
Is per-cpu data the whole list there?
--
Daniel
On March 16, 2002 01:40 am, [email protected] wrote:
>
> Without preempt:
> x = movefrom processor register;
> do_something with x
>
> is safe in SMP
> With [preempt] it requires a lock.
It must be a trick question. Why would it?
--
Daniel
On Sun, Mar 17, 2002 at 01:33:04AM +0100, Daniel Phillips wrote:
> On March 16, 2002 01:40 am, [email protected] wrote:
> >
> > Without preempt:
> > x = movefrom processor register;
// if preemption is on, we can be preempted and restart
// on another processor so x will be wrong
> > do_something with x
> >
> > is safe in SMP
> > With [preempt] it requires a lock.
>
> It must be a trick question. Why would it?
See comment.
>
> --
> Daniel
--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com
On March 17, 2002 02:13 am, [email protected] wrote:
> On Sun, Mar 17, 2002 at 01:33:04AM +0100, Daniel Phillips wrote:
> > On March 16, 2002 01:40 am, [email protected] wrote:
> > >
> > > Without preempt:
> > > x = movefrom processor register;
> // if preemption is on, we can be preempted and restart
> // on another processor so x will be wrong
> > > do_something with x
> > >
> > > is safe in SMP
> > > With [preempt] it requires a lock.
> >
> > It must be a trick question. Why would it?
>
> See comment.
Which processor register were you thinking of? Surely not anything in the
general register set, and otherwise, it's just another example of per-cpu
data. It needs to be protected, and the protection is lightweight.
--
Daniel
On Sun, Mar 17, 2002 at 02:14:14AM +0100, Daniel Phillips wrote:
> On March 17, 2002 02:13 am, [email protected] wrote:
> > On Sun, Mar 17, 2002 at 01:33:04AM +0100, Daniel Phillips wrote:
> > > On March 16, 2002 01:40 am, [email protected] wrote:
> > > >
> > > > Without preempt:
> > > > x = movefrom processor register;
> > // if preemption is on, we can be preempted and restart
> > // on another processor so x will be wrong
> > > > do_something with x
> > > >
> > > > is safe in SMP
> > > > With [preempt] it requires a lock.
> > >
> > > It must be a trick question. Why would it?
> >
> > See comment.
>
> Which processor register were you thinking of? Surely not anything in the
> general register set, and otherwise, it's just another example of per-cpu
> data. It needs to be protected, and the protection is lightweight.
So what didn't you understand? Your (dubious)
assertion that the lock is "lightweight"
has absolutely no bearing on whether a lock is needed or not.
>
> --
> Daniel
--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com
On March 17, 2002 02:54 am, [email protected] wrote:
> On Sun, Mar 17, 2002 at 02:14:14AM +0100, Daniel Phillips wrote:
> > On March 17, 2002 02:13 am, [email protected] wrote:
> > > On Sun, Mar 17, 2002 at 01:33:04AM +0100, Daniel Phillips wrote:
> > > > On March 16, 2002 01:40 am, [email protected] wrote:
> > > > >
> > > > > Without preempt:
> > > > > x = movefrom processor register;
> > > // if preemption is on, we can be preempted and restart
> > > // on another processor so x will be wrong
> > > > > do_something with x
> > > > >
> > > > > is safe in SMP
> > > > > With [preempt] it requires a lock.
> > > >
> > > > It must be a trick question. Why would it?
> > >
> > > See comment.
> >
> > Which processor register were you thinking of? Surely not anything in the
> > general register set, and otherwise, it's just another example of per-cpu
> > data. It needs to be protected, and the protection is lightweight.
>
> So what didn't you understand? Your (dubious)
> assertion that the lock is "lightweight"
> has absolutely no bearing on whether a lock is needed or not.
I didn't understand which kind of register you meant (because you didn't say).
For the bog-standard variety I don't see a problem.
Protection of special registers is lightweight, it's just a preempt
disable/enable (inc/dec).
--
Daniel
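The "inc/dec" protection mentioned above can be sketched in userspace as a nesting counter. This is only a model under stated assumptions: model_preempt_disable(), model_preempt_enable(), model_request_resched() and the reschedules counter are hypothetical names, and the "reschedule" is just a counter bump standing in for a real context switch.

```c
#include <assert.h>

static int preempt_count;   /* nesting depth of preempt-disabled sections */
static int need_resched;    /* set when a reschedule had to be deferred */
static int reschedules;     /* how many times the model "scheduled" */

/* Entering a protected section is a single increment. */
static void model_preempt_disable(void)
{
    preempt_count++;
}

/* Leaving is a decrement; on the outermost enable, run any reschedule
 * that was deferred while the count was nonzero. */
static void model_preempt_enable(void)
{
    assert(preempt_count > 0);
    if (--preempt_count == 0 && need_resched) {
        need_resched = 0;
        reschedules++;
    }
}

/* Models something (e.g. a timer tick) asking for a reschedule. */
static void model_request_resched(void)
{
    if (preempt_count == 0)
        reschedules++;          /* preemptible: switch right away */
    else
        need_resched = 1;       /* defer until the count drops to zero */
}
```

With two nested disables, a reschedule requested inside the inner section is held back until the second model_preempt_enable() brings the count to zero.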
> > Reviewing driver code for situations where it requires a small timing delay
> > and a large one is unacceptable
>
> Has anyone found one of those yet?
There are some frame buffers with that requirement. The stuff I've looked
at where there are such timing rules already disables interrupts.
(The other classic btw is older PIO IDE setups)
> > Checking anywhere you use the cpu id that you don't do something where it
> > might change under you (eg per cpu variables)
>
> Is per-cpu data the whole list there?
Think about profiling registers, mtrrs, msrs, and so forth. For example
if we had thread handling MCE traps we would hit a problem. As it happens
MCE is an interrupt so it's all nice.
Alan
On Sun, Mar 17, 2002 at 03:31:24AM +0000, Alan Cox wrote:
> Think about profiling registers, mtrrs, msrs, and so forth. For example
> if we had thread handling MCE traps we would hit a problem. As it happens
> MCE is an interrupt so it's all nice.
Ah, I'm glad you mentioned this. It's reminded me that my
timer-based 'check for non-fatal machine check and log' code
needs some work for SMP..
Are routines called with smp_call_function() preempt safe, or
must they have extra locking added ?
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
On Sat, 16 Mar 2002 [email protected] wrote:
> So what didn't you understand? Your (dubious)
> assertion that the lock is "lightweight"
> has absolutely no bearing on whether a lock is needed or not.
More than a lock, you had better have a preempt disable.
- Davide