I've been experiencing a particular kind of hang for many versions
(since 2.3.99 days, recently seen with 2.4.1, 2.4.2, and 2.4.2-ac4) on
the alpha architecture. The symptom is that any program that tries to
access the process table will hang. (ps, w, top) The hang will go away
by itself after ~10 minutes to 1 hour or so. When it hangs I run ps and
see that it gets halfway through the process list and hangs. The
process that comes next in the list (after hang goes away) almost always
has nonsensical memory numbers, like multi-gigabyte SIZE.
Linux draal.physics.wisc.edu 2.3.99-pre5 #8 Sun Apr 23 16:21:48 CDT 2000
alpha unknown
Gnu C 2.96
Gnu make 3.78.1
binutils 2.10.0.18
util-linux 2.11a
modutils 2.4.5
e2fsprogs 1.18
PPP 2.3.11
Linux C Library 2.2.1
Dynamic linker (ldd) 2.2.1
Procps 2.0.7
Net-tools 1.54
Kbd 0.94
Sh-utils 2.0
Modules Loaded nfsd lockd sunrpc af_packet msdos fat pas2 sound
soundcore
Has anyone else seen this? Is there a fix?
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
You wouldn't happen to have khttpd loaded as a module, would you? I've seen
this type of problem caused by that before...
- Pete
Bob McElrath wrote:
> I've been experiencing a particular kind of hang for many versions
> (since 2.3.99 days, recently seen with 2.4.1, 2.4.2, and 2.4.2-ac4) on
> the alpha architecture. The symptom is that any program that tries to
> access the process table will hang. (ps, w, top) The hang will go away
> by itself after ~10 minutes to 1 hour or so. When it hangs I run ps and
> see that it gets halfway through the process list and hangs. The
> process that comes next in the list (after hang goes away) almost always
> has nonsensical memory numbers, like multi-gigabyte SIZE.
Peter Rival [[email protected]] wrote:
> You wouldn't happen to have khttpd loaded as a module, would you? I've seen
> this type of problem caused by that before...
Nope...
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
Hmpf. Haven't seen this at all on any of the Alphas that I'm running. What
exact system are you seeing this on, and what are you running when it happens?
- Pete
Bob McElrath wrote:
> Peter Rival [[email protected]] wrote:
> > You wouldn't happen to have khttpd loaded as a module, would you? I've seen
> > this type of problem caused by that before...
>
> Nope...
Peter Rival [[email protected]] wrote:
> Hmpf. Haven't seen this at all on any of the Alphas that I'm running. What
> exact system are you seeing this on, and what are you running when it happens?
This is an LX164 system, 533 MHz.
I have a hunch it's related to the X server because I've seen it many,
many times while sitting at the console (in X), but never when I'm
logged on remotely. I've seen it with XFree86 3.3.6, 4.0.2, and 4.0.3, on a
Matrox Millennium II video card (8MB).
I'm also experiencing regular X crashes, but the process-table-hang
doesn't occur at the same time as an X crash (or vice versa). I sent a patch
to [email protected] a few days ago that seemed to fix (one of) the X
crashes (in the mga driver, ask if you want details).
(But since the X server shouldn't have the ability to corrupt the
kernel's process list, there has to be a problem in the kernel
somewhere)
Note that this system was completely stable with 2.2 kernels.
Cheers,
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
Well, here's the list of modules I have loaded:
nfsd 102496 8 (autoclean)
lockd 72976 1 (autoclean) [nfsd]
sunrpc 87984 1 (autoclean) [nfsd lockd]
nls_iso8859-1 4160 1 (autoclean)
nls_cp437 5664 1 (autoclean)
msdos 7728 1 (autoclean)
fat 42784 0 (autoclean) [msdos]
pas2 17488 1
sound 83184 1 [pas2]
soundcore 5568 5 [sound]
Are there any known problems with these? I have at times also used
matroxfb, and usb-uhci (along with visor, usb-storage), but I've seen
the process-table-hang with matroxfb and usb-uhci *not* installed, so I
don't think that's it. I have the above modules installed consistently
at each bootup.
Der Herr Hofrat [[email protected]] wrote:
> > I've been experiencing a particular kind of hang for many versions
> > (since 2.3.99 days, recently seen with 2.4.1, 2.4.2, and 2.4.2-ac4) on
> > the alpha architecture. The symptom is that any program that tries to
> > access the process table will hang. (ps, w, top) The hang will go away
> > by itself after ~10 minutes to 1 hour or so. When it hangs I run ps and
> > see that it gets halfway through the process list and hangs. The
> > process that comes next in the list (after hang goes away) almost always
> > has nonsensical memory numbers, like multi-gigabyte SIZE.
> >
> >
> I know this effect, independent of the platform, when you have a proc entry that
> is not correctly unregistered.
>
> (the code only compiles for 2.2.X, for 2.4.X you need to change
> the proc struct.)
>
> ---snip---
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/proc_fs.h>
>
> #define BUF_LEN 1024
> struct proc_dir_entry prockill_proc_file={
>         0,                      /* low_ino (0 = assign dynamically) */
>         0,                      /* namelen                          */
>         "prockill",             /* name                             */
>         S_IFREG|S_IRUGO,        /* mode                             */
>         1,                      /* nlink                            */
>         0,                      /* uid                              */
>         0,                      /* gid                              */
>         BUF_LEN,                /* size                             */
>         NULL,                   /* ops                              */
>         NULL,                   /* get_info                         */
>         NULL,                   /* fill_inode                       */
> };
>
> int init_module(void) {
>         printk("prockill.o registering proc entry\n");
>         return proc_register(&proc_root,&prockill_proc_file);
> }
>
> void cleanup_module(void) {
>         /* the missing cleanup would be something like:
>          * proc_unregister(&proc_root, prockill_proc_file.low_ino); */
>         printk("prockill.o forgets to unregister proc entry\n");
> }
> ---snip---
> compile this as kernel module
>
> insmod proc_kill.o
> rmmod proc_kill
>
> and the system will run without error until you do something like
>
> ls /proc/<TAB><TAB> or
> ls -R /proc
>
> after this the system will drop dead for minutes to hours or even for good....
>
>
> any chance you have a faulty module ??
>
>
> hofrat
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
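For reference: under 2.4 the same "registered but never unregistered" leak that
Hofrat demonstrates would look roughly like the sketch below, written against
the create_proc_entry()/remove_proc_entry() interface. This is an illustrative
reconstruction, not Hofrat's actual module.

---snip---
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/errno.h>
#include <linux/proc_fs.h>

static int prockill_read(char *page, char **start, off_t off,
			 int count, int *eof, void *data)
{
	*eof = 1;
	return sprintf(page, "prockill\n");
}

int init_module(void)
{
	struct proc_dir_entry *p;

	printk("prockill.o registering proc entry\n");
	p = create_proc_entry("prockill", S_IFREG | S_IRUGO, NULL);
	if (!p)
		return -ENOMEM;
	p->read_proc = prockill_read;
	return 0;
}

void cleanup_module(void)
{
	/* the bug: remove_proc_entry("prockill", NULL) is never called,
	 * leaving a dangling /proc entry behind after rmmod */
	printk("prockill.o forgets to unregister proc entry\n");
}
---snip---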
Alan Cox [[email protected]] wrote:
> > (But since the X server shouldn't have the ability to corrupt the
> > kernel's process list, there has to be a problem in the kernel
> > somewhere)
>
> The X server has enough privilege to corrupt anything. It's unlikely to, and
> I do agree the two are likely to be unrelated.
Well, nix that idea. I just fell back to 2.2.19, and I see neither the
X crash nor the process-table-hang crash (which rules out hardware
problems, thankfully). The X crash is also kernel related, it seems.
I'm using XFree86 4.0.3 with the mga driver. It hangs in mga_storm.c on
a line that looks like:
while (MGAISBUSY()) {}
where:
#define MGAISBUSY() (INREG8(MGAREG_Status + 2) & 0x01)
Killing and restarting X causes it to immediately hang in the same
place. (I have to reboot to recover the console)
This would seem to be PCI related. Have any significant PCI code
changes been made to the alpha architecture, especially pyxis or
cabriolet code? I see that arch/alpha/kernel has been totally
rearranged, but since this doesn't crash in kernel code, I have no idea
how to debug it.
Thanks,
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
On Fri, Apr 13, 2001 at 08:48:05AM -0500, Bob McElrath wrote:
> Alan Cox [[email protected]] wrote:
> > > (But since the X server shouldn't have the ability to corrupt the
> > > kernel's process list, there has to be a problem in the kernel
> > > somewhere)
> >
> > The X server has enough privilege to corrupt anything. It's unlikely to, and
> > I do agree the two are likely to be unrelated.
>
> Well, nix that idea. I just fell back to 2.2.19, and I see neither the
> X crash nor the process-table-hang crash (which rules out hardware
> problems, thankfully). The X crash is also kernel related, it seems.
>
> I'm using XFree86 4.0.3 with the mga driver. It hangs in mga_storm.c on
> a line that looks like:
> while (MGAISBUSY()) {}
> where:
> #define MGAISBUSY() (INREG8(MGAREG_Status + 2) & 0x01)
>
> Killing and restarting X causes it to immediately hang in the same
> place. (I have to reboot to recover the console)
>
> This would seem to be PCI related. Have any significant PCI code
> changes been made to the alpha architecture, especially pyxis or
> cabriolet code? I see that arch/alpha/kernel has been totally
> rearranged, but since this doesn't crash in kernel code, I have no idea
> how to debug it.
It seems it was an SMP race in the rw alpha semaphores. I rewrote the
rwsemaphores starting from my first implementation of them in C, which is now
adopted by the ppc port (I added some scalability and locking optimizations),
and made them generic, dropping all the rwsem stuff that has been included into
2.4.4pre[23] (the generic rwsemaphores in those kernels are broken, try to use
them on other archs or x86 and you will notice), and I cannot reproduce the hang
any longer.
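Background on why a broken rwsem shows up as a hung ps: the per-process memory
figures under /proc are computed with the task's mmap_sem held for read,
roughly as in this simplified sketch of the 2.4 fs/proc code (not a verbatim
quote):

	#include <linux/mm.h>
	#include <linux/sched.h>

	static unsigned long sketch_vsize(struct mm_struct *mm)
	{
		struct vm_area_struct *vma;
		unsigned long size = 0;

		down_read(&mm->mmap_sem);
		for (vma = mm->mmap; vma; vma = vma->vm_next)
			size += vma->vm_end - vma->vm_start;
		up_read(&mm->mmap_sem);
		return size;
	}

So if a task's mmap_sem gets wedged by the race, any ps or top that reaches that
task's /proc entry blocks in down_read, which matches the "hangs halfway through
the process list" symptom.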
My generic rwsem should also be cleaner and faster than the generic ones in
2.4.4pre3, and they can be turned off completely so an architecture can really
take over with its own asm implementation (while with the 2.4.4pre3 design this
is obviously not possible because lib/rwsem.c compilation isn't conditional and
that file knows the internals of the struct rw_semaphore).
In the below generic implementation of the rw sem the max limit of concurrent
readers in the critical section is 2^sizeof(int) and down_read is recursive.
There's no limit of tasks sleeping in the slow path either by down_read or
down_write. The waitqueue wakeups are done without any additional lock (the
lock in the waitqueue is unused).
So please try to reproduce the hang with 2.4.4pre3 with those two
patches applied:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
All alpha users should run with at least the above two patches applied
to compile their tree and to make sure to have rock solid rwsemaphores.
Both patches are suggested for inclusion; the arch optimizations can be done on
top of the cleaner and arch-friendly rwsem code (just copy the asm files from
2.4.4pre3 and set CONFIG_GENERIC_RWSEM to `n') and the current lib/rwsem.c can be
moved into arch/i386/kernel without any problem. I didn't do that myself because
I wasn't going to audit every line of the x86 asm rwsem right now and I only
wanted obviously right stuff in my tree, but I'd appreciate it if David could do
that. Note that although my patch drops the asm stuff, I don't want to reject the
asm-based implementation in the long run; I only care about providing a solid
and clean generic implementation that can be used as a fallback at any time on any
arch by only changing a configuration option.
The alpha-numa patch also fixes some mm bugs in the common code.
Andrea
Andrea Arcangeli [[email protected]] wrote:
>
> So please try to reproduce the hang with 2.4.4pre3 with those two
> patches applied:
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
>
> All alpha users should run with at least the above two patches applied
> to compile their tree and to make sure to have rock solid rwsemaphores.
Excellent! I'll give it a try.
Note that I recently saw the X hang with the 2.2.19 kernel, but I still
haven't seen the process-table-hang with 2.2.19 (about 4 days running
with 2.2.19). It is *far* easier to get the X hang in 2.4 than 2.2.
(minutes for 2.4, days for 2.2) Also note that this is not an SMP
machine (single processor 21164a, LX164 mobo).
But I'll apply your patch tonight and let you know the results.
Cheers,
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
In article <[email protected]> you wrote:
> My generic rwsem should also be cleaner and faster than the generic ones in
> 2.4.4pre3, and they can be turned off completely so an architecture can really
> take over with its own asm implementation (while with the 2.4.4pre3 design this
> is obviously not possible because lib/rwsem.c compilation isn't conditional and
> that file knows the internals of the struct rw_semaphore).
>
> In the below generic implementation of the rw sem the max limit of concurrent
> readers in the critical section is 2^sizeof(int) and down_read is recursive.
> There's no limit of tasks sleeping in the slow path either by down_read or
> down_write. The waitqueue wakeups are done without any additional lock (the
> lock in the waitqueue is unused).
>
> So please try to reproduce the hang with 2.4.4pre3 with those two
> patches applied:
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
Hey it looks like someone finally fixed the rwsems :P
A little comment on the patch:
In lib/Makefile you should _always_ add rwsem.o to the export-objs, not only if
CONFIG_GENERIC_RWSEM is 'y' - that's the whole idea behind export-objs.
Christoph
--
Of course it doesn't work. We've performed a software upgrade.
Andrea,
How did you generate the 00_rwsem-generic-1 patch? Against what did you diff?
You seem to have removed all the optimised i386 rwsem stuff... Did it not work
for you?
> (the generic rwsemaphores in those kernels are broken, try to use them on
> other archs or x86 and you will notice), and I cannot reproduce the hang any
> longer.
Can you supply a test case that demonstrates it not working?
> > My generic rwsem should also be cleaner and faster than the generic ones in
> > 2.4.4pre3, and they can be turned off completely so an architecture can really
> > take over with its own asm implementation.
A quick look says it shouldn't be faster (inline functions and all that).
However, I think you might be right about it being too dependent on the
algorithm I put in, and that is easy to change.
> (while with the 2.4.4pre3 design this is obviously not possible because
> > lib/rwsem.c compilation isn't conditional and that file knows the internals
> of the struct rw_semaphore).
Could be very easily changed.
David
On Tue, Apr 17, 2001 at 05:59:13PM +0100, David Howells wrote:
> Andrea,
>
> How did you generate the 00_rwsem-generic-1 patch? Against what did you diff?
2.4.4pre3 from kernel.org.
> You seem to have removed all the optimised i386 rwsem stuff... Did it not work
> for you?
As said the design of the framework to plugin per-arch rwsem implementation
isn't flexible enough and the generic spinlocks are as well broken, try to use
them if you can (yes I tried that for the alpha, it was just a mess and it was
more productive to rewrite than to fix).
> > (the generic rwsemaphores in those kernels are broken, try to use them on
> > other archs or x86 and you will notice), and I cannot reproduce the hang any
> > longer.
>
> Can you supply a test case that demonstrates it not working?
#define __RWSEM_INITIALIZER(name,count) \
^^^^^
{ RWSEM_UNLOCKED_VALUE, SPIN_LOCK_UNLOCKED, \
^^^^^^^^^^^^^^^^^^^^
__WAIT_QUEUE_HEAD_INITIALIZER((name).wait) \
__RWSEM_DEBUG_INIT __RWSEM_DEBUG_MINIT(name) }
#define __DECLARE_RWSEM_GENERIC(name,count) \
struct rw_semaphore name = __RWSEM_INITIALIZER(name,count)
^^^^^
#define DECLARE_RWSEM(name) __DECLARE_RWSEM_GENERIC(name,RW_LOCK_BIAS)
^^^^^^^^^^^^
#define DECLARE_RWSEM_READ_LOCKED(name) __DECLARE_RWSEM_GENERIC(name,RW_LOCK_BIAS-1)
^^^^^^^^^^^^^^
#define DECLARE_RWSEM_WRITE_LOCKED(name) __DECLARE_RWSEM_GENERIC(name,0)
(i.e. the count argument is accepted but never used: every declaration ends up
initialised to RWSEM_UNLOCKED_VALUE, so DECLARE_RWSEM_READ_LOCKED and
DECLARE_RWSEM_WRITE_LOCKED don't actually produce locked semaphores; presumably
the initializer was meant to use (count) in place of RWSEM_UNLOCKED_VALUE.)
> > My generic rwsem should also be cleaner and faster than the generic ones in
> > 2.4.4pre3, and they can be turned off completely so an architecture can really
> > take over with its own asm implementation.
>
> A quick look says it shouldn't be faster (inline functions and all that).
The spinlock-based generic semaphores are quite large, so I don't want to waste
icache on them; a call instruction isn't that costly (it's
obviously _very_ costly for a spinlock, because a spinlock is 1 asm instruction
in the fast path, but not for a C-based rwsem). But the real point is the
locking and the waitqueue mechanism, which is superior in my implementation (not
the non-inlining part).
And it's also more readable and it's not bloated code, 65+110 lines compared to
156+148+174 lines.
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3aa/include/linux/rwsem.h
65 2.4.4pre3aa/include/linux/rwsem.h
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3aa/lib/rwsem.c
110 2.4.4pre3aa/lib/rwsem.c
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3/lib/rwsem.c
156 2.4.4pre3/lib/rwsem.c
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3/include/linux/rwsem.h
148 2.4.4pre3/include/linux/rwsem.h
andrea@athlon:~/devel/kernel > wc -l 2.4.4pre3/include/linux/rwsem-spinlock.h
174 2.4.4pre3/include/linux/rwsem-spinlock.h
andrea@athlon:~/devel/kernel >
I suggest you apply my patch, read my implementation, tell me if you think
it's not more efficient and more readable, and then set CONFIG_RWSEM_GENERIC
to n in arch/i386/config.in and plug in your asm code taken from vanilla
2.4.4pre3 into include/asm-i386/rwsem.h and arch/i386/kernel/rwsem.c; then we're
done, and if someone has problems with the asm code, with a one-liner he can
fall back to an obviously right and quite efficient implementation [even if the
fastpath is not 1 inlined asm instruction] (all archs will be allowed to do
that transparently to the arch-dependent code). The same can be done on alpha and
other archs; resurrecting the inlined fast paths based on the atomic_add_return
APIs is easy too. In fact I'd _recommend_ that archs that can implement
atomic_add_return and friends (including ia32 with xadd on >=586)
implement the "fast path" version of the rwsem in C too in the common code,
selectable with a CONFIG_RWSEM_ATOMIC_RETURN (plus we add
linux/include/linux/compiler.h with the builtin_expect macro to be able to
define the fast path in C too). Most archs have atomic_*_return and friends
and they will be able to share the common code completely and have rwsem fast paths
as fast as ia32 without the risk of introducing bugs in the port. The more we share
the less risk there is. After CONFIG_RWSEM_ATOMIC_RETURN is implemented we can
probably drop the file asm-i386/rwsem-xadd.h.
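For concreteness, the proposed compiler.h helper would be something along these
lines (an illustrative sketch, not the actual file):

	/* include/linux/compiler.h, sketch */
	#ifndef __LINUX_COMPILER_H
	#define __LINUX_COMPILER_H

	/* tell gcc which way a branch is expected to go, so the slow path
	 * can be laid out out of line */
	#define likely(x)	__builtin_expect((x), 1)
	#define unlikely(x)	__builtin_expect((x), 0)

	#endif /* __LINUX_COMPILER_H */

The C fast path could then be written as, e.g., "if (unlikely(count == 0))
slow_path();".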
Andrea
Andrea,
> As said the design of the framework to plugin per-arch rwsem implementation
> isn't flexible enough and the generic spinlocks are as well broken, try to
> use them if you can (yes I tried that for the alpha, it was just a mess and
> it was more productive to rewrite than to fix).
Having thought about the matter a bit, I know what the problem is:
As stated in the email with the latest patch, I haven't yet extended this to
cover any architecture but i386. That patch was actually put up for comments,
though it got included anyway.
Therefore, all other archs use the old (and probably broken) implementations!
I'll quickly knock up a patch to fix the other archs. This should also fix
the alpha problem.
As for making the stuff I had done less generic, and more specific, I only
made it more generic because I got asked to by a number of people. It was
suggested that I move the contention functions into lib/rwsem.c and make them
common.
As far as using atomic_add_return() goes, the C compiler cannot make the
fastpath anywhere near as efficient, because amongst other things, I can make
use of the condition flags set in EFLAGS and the compiler can't.
> And it's also more readable and it's not bloated code, 65+110 lines
> compared to 156+148+174 lines.
You do my code an injustice there... I've put comments in mine.
David
On Tue, Apr 17, 2001 at 08:18:57PM +0100, D . W . Howells wrote:
> Andrea,
>
> > As said the design of the framework to plugin per-arch rwsem implementation
> > isn't flexible enough and the generic spinlocks are as well broken, try to
> > use them if you can (yes I tried that for the alpha, it was just a mess and
> > it was more productive to rewrite than to fix).
>
> Having thought about the matter a bit, I know what the problem is:
>
> As stated in the email with the latest patch, I haven't yet extended this to
> cover any architecture but i386. That patch was actually put up for comments,
> though it got included anyway.
>
> Therefore, all other archs use the old (and probably broken) implementations!
I am sure ppc couldn't race (at least unless up_read/up_write were executed
from irq/softnet context, and that never happens in 2.4.4pre3, see below ;).
> I'll quickly knock up a patch to fix the other archs. This should also fix
> the alpha problem.
This is not the point. The point is that we want a generic implementation in C
always available as a fallback, and mine is IMHO better than what is in
2.4.4pre3, and secondly it has a superior API for replacing the generic
implementation with a per-arch implementation. So the only thing left to do is
to plug in your x86-specific implementation in my patch, using the simple API I
provide to override the generic implementation completely, and I'd prefer it if
you could do that, at least for the 386/486 case (you know your code better). If
you're not interested I'll probably end up ignoring the <586 compiles, implementing
only the atomic_*_return with xadd for >=586 and a CONFIG_RWSEM_ATOMIC_RETURN
config option in the common code, so I optimize almost all archs in one go;
I think that's the way to go, so I prefer to invest time only in that
direction.
> As for making the stuff I had done less generic, and more specific, I only
> made it more generic because I got asked to by a number of people. It was
> suggested that I move the contention functions into lib/rwsem.c and make them
> common.
And the generic part was implemented badly, and that's why I rewrote it to boot
my alpha.
> As far as using atomic_add_return() goes, the C compiler cannot make the
> fastpath anywhere near as efficient, because amongst other things, I can make
> use of the condition flags set in EFLAGS and the compiler can't.
All we need to do is avoid the spinlock as long as we stay in the fast path. We
only need the fast path to be a few inlined asm opcodes that jump out of line
if there's contention. I don't think other optimizations are interesting.
That can obviously be done for example with C code like this:
count = atomic_inc_return(&sem->count);
if (__builtin_expect(count == 0, 0))
slow_path()
The above is the perfect C implementation IMHO, but it cannot be
the most generic one because [34]86 doesn't have xadd and they
cannot implement atomic_*_return and friends.
And incidentally the above is what (I guess Richard) did on the alpha and that
should really go into common code instead of having asm-i386/rwsem-xadd.h
asm-alpha/rwsem.h etcc.etc... just implement atomic_inc_return using xadd in
asm-i386/atomic.h, that's much better design IMHO.
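A sketch of that suggestion (illustrative only, for >=486-class ia32, not the
actual asm-i386 code):

	/* in the kernel this type comes from asm/atomic.h */
	typedef struct { volatile int counter; } atomic_t;

	/* atomically add i to *v and return the new value; xaddl leaves the
	 * old memory value in the register operand */
	static __inline__ int atomic_add_return(int i, atomic_t *v)
	{
		int old = i;

		__asm__ __volatile__(
			"lock; xaddl %0, %1"
			: "+r" (old), "+m" (v->counter)
			:
			: "memory");

		return old + i;
	}

	#define atomic_inc_return(v)	atomic_add_return(1, (v))
	#define atomic_dec_return(v)	atomic_add_return(-1, (v))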
> > And it's also more readable and it's not bloated code, 65+110 lines
> > compared to 156+148+174 lines.
>
> You do my code an injustice there... I've put comments in mine.
Put it this way: mine is readable enough that I don't need to add comments ;).
More seriously, I may be biased on the readability point, but if I read
lib/rwsem.c and include/linux/rwsem*.h before applying the patch and after
applying the patch I have no doubt about what I want to run on my computer (I'm
not talking about asm-i386/rwsem*.h of course). You are of course free to keep
yours if you prefer it, but I don't see technical arguments for that
decision.
BTW, Andrew Morton has been so kind as to audit my code, and he noticed my patch
was not allowing up_read/up_write to be called from irqs because I forgot an
_irq in the down_write (thanks Andrew!). I didn't catch it during regression
testing because nobody in the whole 2.4.4pre3 kernel runs either up_read
or up_write from irq/softirq context, so it cannot destabilize the runtime, but
nevertheless it was a leftover also shared by the ppc port code in 2.4.3. So
if you just started the alpha regression testing on the -1 revision, go ahead and
don't stop because you are reading this. Just for Linus I released a new -2
version of the patch, but again the upgrade is not necessary for production (at
least unless you use drivers outside the kernel tree that could release a rwsem
from irq/softirq context).
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre3/rwsem-generic-2
I haven't yet made rwsem.c exported when CONFIG_RWSEM_GENERIC is set to n, as
suggested by Christoph, because the old code couldn't be buggy and it's not obvious to
me that the other way around is correct (Christoph, are you sure we can export an
object file that is not even compiled/generated? If the answer is yes the export
mechanism must be smart enough to discard that file if it's not present, but I'm not
sure that's the case ;)
thanks for your comments.
Andrea
Hi Andrea,
In article <[email protected]> you wrote:
> I haven't yet made rwsem.c exported when CONFIG_RWSEM_GENERIC is set to n, as
> suggested by Christoph, because the old code couldn't be buggy and it's not obvious to
> me that the other way around is correct (Christoph, are you sure we can export an
> object file that is not even compiled/generated? If the answer is yes the export
> mechanism must be smart enough to discard that file if it's not present, but I'm not
> sure that's the case ;)
Yes! All the objects in export-objs only get additional dependencies in
Rules.make - but if they do not get compiled at all those dependencies won't
matter either. All other Makefiles work this way, btw.
In my first mail I forgot that the makefile can be optimized even
further, the hunk should look like this:
(NOTE: the patch is handwritten, no apply guarantee)
diff -urN 2.4.4pre3/lib/Makefile rwsem/lib/Makefile
--- 2.4.4pre3/lib/Makefile Sat Apr 14 15:21:29 2001
+++ rwsem/lib/Makefile Tue Apr 17 21:58:57 2001
@@ -10,10 +10,12 @@
 export-objs := cmdline.o rwsem.o
-obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o rwsem.o
+obj-y := errno.o ctype.o string.o vsprintf.o brlock.o cmdline.o
 ifneq ($(CONFIG_HAVE_DEC_LOCK),y)
 obj-y += dec_and_lock.o
 endif
+obj-$(CONFIG_GENERIC_RWSEM) += rwsem.o
+
 include $(TOPDIR)/Rules.make
Christoph
--
Of course it doesn't work. We've performed a software upgrade.
> I am sure ppc couldn't race (at least unless up_read/up_write were executed
> from irq/softnet context, and that never happens in 2.4.4pre3, see below ;).
This is not actually using the rwsem code I wrote at the moment.
> And incidentally the above is what (I guess Richard) did on the alpha and
> that should really go into common code instead of having
> asm-i386/rwsem-xadd.h asm-alpha/rwsem.h etcc.etc... just implement
> atomic_inc_return using xadd in asm-i386/atomic.h, that's much better
> design IMHO.
I disagree... you want such primitives to be as efficient as possible. The
whole point of having asm/xxxx.h files is that you can stuff them full of
dirty tricks specific to certain architectures.
> That can obviously be done for example with C code like this:
> count = atomic_inc_return(&sem->count);
> if (__builtin_expect(count == 0, 0))
> slow_path()
>
> The above is the perfect C implementation IMHO
But not so efficient since it _has_ to take a jump unless the compiler can
emit code in alternative text sections when it seems appropriate. Plus, you
have to test count's value. XADD sets EFLAGS based on the result in memory,
something that allows all but one fastpath to be two instructions in length
(the one that isn't is three).
But, yes, there should probably be two generic cases: one implemented with a
spinlock in the rwsem struct (as I supply) and one implemented using
atomic_add_return(). Note, however! atomic_add_return() is not necessarily
implemented efficiently: if it involves a cmpxchg loop (as many seem to),
then that is really quite inefficient, and may be better done as a spinlock
anyway.
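The cmpxchg-loop style being referred to looks roughly like this (an
illustrative sketch in 2.4 kernel style, not any particular arch's real code);
note that the loop retries whenever another CPU touches the counter in between,
which is the inefficiency in question:

	#include <asm/atomic.h>
	#include <asm/system.h>		/* cmpxchg() */

	static __inline__ int atomic_add_return(int i, atomic_t *v)
	{
		int old, new;

		do {
			old = atomic_read(v);
			new = old + i;
			/* retry if *v changed between the read and the cmpxchg */
		} while (cmpxchg(&v->counter, old, new) != old);

		return new;
	}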
I've had a look at your implementation... It seems to hold the spinlocks for
an awfully long time... specifically around the local variable initialisation
in the 'failed' functions. Don't forget that the compiler can't reorder these
because they're inside the spinlock. I would, if I were you, fold the
'failed' functions into the main ones to avoid this problem.
Your rw_semaphore structure is also rather large: 46 bytes without debugging
stuff (16 bytes apiece for the waitqueues and 12 bytes for the rest).
Contrast that with mine: generic is 24 bytes and the i386-xadd optimised is
20.
Admittedly, though, yours is extremely simple and easy to follow, but I don't
think it's going to be very fast.
Of course, I still prefer mine... smaller, faster, more efficient:-)
David
On Tue, Apr 17, 2001 at 11:29:23PM +0200, Christoph Hellwig wrote:
> Yes! All the objects in export-objs only get additional dependencies in
> Rules.make - but if they do not get compiled at all those dependencies won't
> matter either. All other Makefiles work this way, btw.
OK, thanks for the confirmation.
> In my first mail I forgot that the makefile can be optimized even
> further, the hunk should look like this:
Yes, I didn't use the -$() form only because I thought I had to make
the export-objs conditional too.
I applied it.
Andrea
On Tue, Apr 17, 2001 at 10:48:02PM +0100, D . W . Howells wrote:
> I disagree... you want such primitives to be as efficient as possible. The
> whole point of having asm/xxxx.h files is that you can stuff them full of
> dirty tricks specific to certain architectures.
Of course you always have the option to override completely, and you should
on x86 (providing an API for total override is the main object of my patch).
> I've had a look at your implementation... It seems to hold the spinlocks for
> an awfully long time... specifically around the local variable initialisation
My point in not unlocking is that unlocking and then locking another spinlock
for the waitqueue, and using the wait_event interface for serializing the slow
path, is expensive and generates more cacheline ping-pong between CPUs. And
quite frankly I don't care about the scalability of the slow path, so if the
slow path is simpler and slower I'm happy with it.
> Your rw_semaphore structure is also rather large: 46 bytes without debugging
It is 36 bytes, and on 64-bit archs the difference is going to be less.
> stuff (16 bytes apiece for the waitqueues and 12 bytes for the rest).
The real waste is the lock of the waitqueue that I don't need, so I should
probably keep two list_head in the waitqueue instead of using the
wait_queue_head_t and wake_up_process by hand.
> Admittedly, though, yours is extremely simple and easy to follow, but I don't
> think it's going to be very fast.
The fast path has to be as fast as yours, if not then the only variable that
can make difference is the fact I'm not inlining the fast path because it's not
that small, in such a case I should simply inline the fast path, I don't care
about the scalability of the slow path and I think the slow path may even be
faster than yours because I don't run additional unlock/lock and memory
barriers and the other cpus will stop dirtifying my stuff after their first
trylock until I unlock.
If you have time to benchmark I'd be interested to see some numbers. But anyway
my implementation was mostly meant to be obviously right and possible to
override with per-arch algorithms.
Andrea
> It is 36 bytes, and on 64-bit archs the difference is going to be less.
You're right - I can't add up (must be too late at night), and I was looking
at wait_queue not wait_queue_head. I suppose that means my implementations
are then 20 and 16 bytes respectively.
On 64-bit archs the difference will be less, depending on what a "long" is.
> The real waste is the lock of the waitqueue that I don't need, so I should
> probably keep two list_head in the waitqueue instead of using the
> wait_queue_head_t and wake_up_process by hand.
Perhaps you should steal my wake_up_ctx() idea. That means you only need one
wait queue, and you use bits in the wait_queue flags to note which type of
waiter is at the front of the queue.
You can then say "wake up the first thing at the front of the queue if it is
a writer"; and you can say "wake up the first consecutive bunch of things at
the front of the queue, provided they're all readers" or "wake up all the
readers in the queue".
> The fast path has to be as fast as yours, if not then the only variable
> that can make difference is the fact I'm not inlining the fast path because
> it's not that small, in such a case I should simply inline the fast path
My point exactly... It can't be as fast because it's _all_ out of line.
Therefore you always have to go through the overhead of a function call,
whatever that entails on the architecture of choice.
> I don't care about the scalability of the slow path and I think the slow
> path may even be faster than yours because I don't run additional
> unlock/lock and memory barriers and the other cpus will stop dirtifying my
> stuff after their first trylock until I unlock.
Except for the optimised case, you may be correct on an SMP configured kernel
(for a UP kernel, spinlocks are nops).
However! mine runs for as little time as possible with spinlocks held in the
generic case, and, perhaps more importantly, as little time as possible with
interrupts disabled.
One other thing: should you be using spin_lock_irqsave() instead of
spin_lock_irq() in your down functions? I'm not sure it's necessary, however,
since you probably shouldn't be sleeping if you've got the interrupts
disabled (though schedule() will cope).
> If you have time to benchmark I'd be interested to see some numbers. But
> anyway my implementation was mostly meant to be obviously right and
> possible to override with per-arch algorithms
I'll have a go tomorrow evening. It's time to go to bed now I think:-)
David
On Wed, Apr 18, 2001 at 12:54:41AM +0100, D . W . Howells wrote:
> > It is 36 bytes, and on 64-bit archs the difference is going to be less.
>
> You're right - I can't add up (must be too late at night), and I was looking
> at wait_queue not wait_queue_head. I suppose that means my implementations
> are then 20 and 16 bytes respectively.
>
> On 64-bit archs the difference will be less, depending on what a "long" is.
Yes. I actually modified my implementation and now I've gone below the size of a
waitqueue because I don't need its lock. The rw_semaphore now is only 16
bytes in size even in SMP (while your generic rw_semaphore is larger than 16 in
SMP). I also changed the wakeup mechanism to do the same fair logic as yours,
but I'm not changing any common code (so it's completely FIFO with a
wake-up-all-contiguous-readers behaviour). It's also now completely fair, as the
fast path will go to sleep if anybody is registered in the waitqueue, so it has
the property that we are still missing in the non-rw semaphores. And it still
seems quite obvious while reading it.
> Perhaps you should steal my wake_up_ctx() idea. That means you only need one
I've now stolen the wakeup logic, but I reimplemented it internally to rwsem.c
without involving the externally visible waitqueue mechanism (short version: no
changes to sched.c or wait.h).
> You can then say "wake up the first thing at the front of the queue if it is
> a writer"; and you can say "wake up the first consecutive bunch of things at
> the front of the queue, provided they're all readers" or "wake up all the
> readers in the queue".
I preferred not to generalize it in the wake_up_ctx way, but yes, I hardcoded
it in a function that knows what to do with a simple list_head, which is
even lighter and faster.
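A sketch of that wakeup policy (wake a lone writer if it is at the head of the
FIFO, otherwise wake every contiguous reader up to the first writer); the struct
and function names here are illustrative, not the ones in rwsem-generic-5:

	#include <linux/list.h>
	#include <linux/sched.h>

	struct rwsem_waiter {
		struct list_head	list;
		struct task_struct	*task;
		unsigned int		write;	/* non-zero for a down_write() sleeper */
	};

	/* called with the semaphore's internal spinlock held */
	static void rwsem_wake_sketch(struct list_head *wait_list)
	{
		struct rwsem_waiter *waiter;
		int woke_reader = 0;

		while (!list_empty(wait_list)) {
			waiter = list_entry(wait_list->next, struct rwsem_waiter, list);

			if (waiter->write) {
				/* strict FIFO: a writer at the head is woken alone;
				 * a writer behind freshly woken readers stays asleep */
				if (!woke_reader) {
					list_del(&waiter->list);
					wake_up_process(waiter->task);
				}
				break;
			}

			/* a reader at the head: wake it and every contiguous
			 * reader queued directly behind it */
			list_del(&waiter->list);
			wake_up_process(waiter->task);
			woke_reader = 1;
		}
	}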
> My point exactly... It can't be as fast because it's _all_ out of line.
Ok, I also inlined the fast path. Of course if you're not running out of icache
(like in a benchmark dedicated to the rwsem) inlining the fast path is
faster... (previously I was talking about real world misc load where we can more
easily run out of icache)
> However! mine runs for as little time as possible with spinlocks held in the
> generic case, and, perhaps more importantly, as little time as possible with
> interrupts disabled.
But it reacquires other locks for the waitqueue and it clears irqs again too,
and it will ping-pong cachelines in the wait_event interface. So it's _slower_,
not faster. Making the locks more granular makes sense when the contention
on the lock goes away after you make it granular, but you are using two
spinlocks instead of one and you still have contention in the same slow path
on the second spinlock and the wait_event runtime, while I only have contention in
the first one, and that's why I'm more efficient (see numbers below).
> One other thing: should you be using spin_lock_irqsave() instead of
> spin_lock_irq() in your down functions? I'm not sure it's necessary, however,
It's not necessary to save flags because those functions can sleep (so the
caller can't have interrupts disabled anyway).
BTW, your rwsem-spinlock.h forgets to clear irqs in down_* and to clear irqs
and save flags in the up_*! So my spinlocks are penalized, as the cli/sti pairs
are not that light, and you are providing weaker wakeup semantics than me.
My new implementation only handles up to 2^31 concurrent readers, with as usual
unlimited sleepers in the slow paths, and down_read is no longer recursive
because of the guaranteed total FIFO behaviour. This scenario will deadlock
(task1's down_write queues behind task0's first read, and task0's second
down_read then queues behind the writer), while it was working fine with the
previous patches on my ftp area:

	task0				task1
	down_read(sem)
					down_write(sem)
	down_read(sem)
new patch is here:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre3/rwsem-generic-5
And now I compared 2.4.4pre3aa3 minus 00_rwsem-generic-1 with 2.4.4pre3aa3 minus 00_rwsem-generic-1 plus rwsem-generic-5,
which is the same as comparing vanilla 2.4.4pre3 with vanilla 2.4.4pre3 + rwsem-generic-5.
I wrote this rwsem stresser (if you have any bug in the rwsem this stresser will trigger it
almost immediately).
/*
 * rw_semaphore benchmark (use with 2.4.3 or more recent kernels
 * that use down_read() in the page fault and down_write() in mmap).
 *
 * Copyright (C) 2001 Andrea Arcangeli <[email protected]> SuSE
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <asm/page.h>
#include <sys/mman.h>
#include <asm/system.h>

#define NR_THREADS 50
#define RWSEM_LOOPS 500
#define READ_DOWN_PER_WRITE_DOWN 4

static int start;

void * rwsemflood(void * foo)
{
	int i;
	pthread_mutex_t * mutex = (pthread_mutex_t *) foo;

	if (pthread_mutex_lock(mutex))
		perror("pthread_mutex_lock"), exit(1);

	for (i = 0; i < RWSEM_LOOPS; i++) {
		volatile char * mem;
		int i;

		mem = mmap(NULL, PAGE_SIZE * READ_DOWN_PER_WRITE_DOWN,
			   PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
		if (mem == MAP_FAILED)
			perror("mmap"), exit(1);
		for (i = 0; i < PAGE_SIZE * READ_DOWN_PER_WRITE_DOWN; i += PAGE_SIZE)
			*(mem+i);
		if (munmap((char *)mem, PAGE_SIZE) < 0)
			perror("munmap"), exit(1);
	}
	pthread_exit(NULL);
}

main()
{
	pthread_t thread[NR_THREADS];
	pthread_mutex_t mutex[NR_THREADS];
	int i;

	for (i = 0; i < NR_THREADS; i++) {
		pthread_mutex_init(&mutex[i], NULL);
		if (pthread_mutex_lock(&mutex[i]))
			perror("pthread_mutex_lock"), exit(1);
	}
	for (i = 0; i < NR_THREADS; i++)
		if (pthread_create(&thread[i], NULL, rwsemflood, &mutex[i]) < 0)
			perror("pthread_create"), exit(1);
	for (i = 0; i < NR_THREADS; i++)
		if (pthread_mutex_unlock(&mutex[i]))
			perror("pthread_mutex_unlock"), exit(1);
	for (i = 0; i < NR_THREADS; i++)
		if (pthread_join(thread[i], NULL))
			perror("pthread_join"), exit(1);
}
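(The stresser builds with something like "gcc -O2 rwsem.c -o rwsem -lpthread".)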
And here are the numbers with the plain 2.4.4pre3 rwsemaphores, which are
implemented in asm, with the kernel configured for PII and with the wakeup that
cannot happen from irq/softirq context. The hardware is a 2-way SMP PII.
andrea@laser:~ > for i in 1 2 3 4; do time ./rwsem ;done
real 0m51.587s
user 0m0.100s
sys 0m52.770s
real 0m50.476s
user 0m0.100s
sys 0m50.730s
real 0m51.502s
user 0m0.110s
sys 0m53.110s
real 0m50.437s
user 0m0.080s
sys 0m51.070s
and now here is the same benchmark run with rwsem-generic-5:
andrea@laser:~ > for i in 1 2 3 4; do time ./rwsem ;done
real 0m50.035s
user 0m0.080s
sys 0m51.430s
real 0m50.636s
user 0m0.090s
sys 0m51.100s
real 0m50.038s
user 0m0.050s
sys 0m50.640s
real 0m50.655s
user 0m0.060s
sys 0m50.800s
As you can see, despite it being an unfair comparison, my implementation was still
a bit faster, or at least running at the same speed. And yes, this only
benchmarks the slow path; the fast path is not easy to measure from userspace,
and since I now inlined the fast path it must not run slower than yours. If you
have a more interesting bench, go ahead of course.
I think you should now agree on my generic rwsemaphore implementation.
About your last patch, where you try to change all archs to use your generic
implementation in pre3: it still breaks during compilation, and I dislike
those long cryptic names:
CONFIG_RWSEM_GENERIC_SPINLOCK
CONFIG_RWSEM_XCHGADD_ALGORITHM
I don't see why you didn't call them CONFIG_RWSEM_GENERIC and
CONFIG_RWSEM_ATOMIC_RETURN respectively (as I suggested originally).
I now recommend anyone with an alpha to use 2.4.4pre3 with those
two patches applied:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/patches/v2.4/2.4.4pre3/rwsem-generic-5
(they can be applied also on ia32 kernels of course)
About your comment on atomic_*_return vs a spinlock in the fast path: the
atomic_*_return way is obviously much faster and shorter than the spinlock
version on the alpha, and it also saves 8 bytes in the size of the rw_semaphore
compared to the generic implementation (and I'm sure the same is valid for the
other RISC chips that provide a load-locked/store-conditional mechanism to
implement atomic updates in memory). That's why I suggested moving it into the
common code (if it were slower, not even the alpha would be trying to use it
:). It's just that by sharing it we increase the user base of the tricky part.
Also, on any 64-bit arch we can provide 2^32 readers and 2^32 writers at the same
time without the need of a spinlock. We could do the same on >=586 using
cmpxchg8b but I'm not sure if that would be a great idea.
Andrea
Bob McElrath [[email protected]] wrote:
> Andrea Arcangeli [[email protected]] wrote:
> >
> > So please try to reproduce the hang with 2.4.4pre3 with those two
> > patches applied:
> >
> > ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_alpha-numa-3
> > ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre3aa3/00_rwsem-generic-1
> >
> > All alpha users should run with at least the above two patches applied
> > to compile their tree and to make sure to have rock solid rwsemaphores.
>
> Excellent! I'll give it a try.
>
> Note that I recently saw the X hang with the 2.2.19 kernel, but I still
> haven't seen the process-table-hang with 2.2.19 (about 4 days running
> with 2.2.19). It is *far* easier to get the X hang in 2.4 than 2.2.
> (minutes for 2.4, days for 2.2) Also note that this is not an SMP
> machine (single processor 21164a, LX164 mobo).
>
> But I'll apply your patch tonight and let you know the results.
Status report:
I'm at 2 days uptime now, and have not seen the process-table-hang.
Looks like this fixed it. Previously I would get a hang in the first
day or so. I'm using your alpha-numa-3 and rwsem-generic-4 against
2.4.4pre3.
Cheers,
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
On Thu, Apr 19, 2001 at 11:21:17AM -0500, Bob McElrath wrote:
> I'm at 2 days uptime now, and have not seen the process-table-hang.
> Looks like this fixed it. Previously I would get a hang in the first
> day or so. I'm using your alpha-numa-3 and rwsem-generic-4 against
> 2.4.4pre3.
good, thanks for the report.
BTW, if you upgrade to 2.4.4pre4 you can apply those two patches:
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_alpha-numa-4
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_rwsem-generic-6
Really the first is not necessary anymore unless you're using a wildfire. The
second also resurrects the optimized rwsemaphores for all archs but alpha and
ia32.
Andrea
Andrea Arcangeli [[email protected]] wrote:
> On Thu, Apr 19, 2001 at 11:21:17AM -0500, Bob McElrath wrote:
> > I'm at 2 days uptime now, and have not seen the process-table-hang.
> > Looks like this fixed it. Previously I would get a hang in the first
> > day or so. I'm using your alpha-numa-3 and rwsem-generic-4 against
> > 2.4.4pre3.
>
> good, thanks for the report.
>
> BTW, if you upgrade to 2.4.4pre4 you can apply those two patches:
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_alpha-numa-4
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.4pre4aa1/00_rwsem-generic-6
>
> Really the first is not necessary anymore unless you're using a wildfire. The
> second also resurrects the optimized rwsemaphores for all archs but alpha and
> ia32.
Well, take that back, I just got it to hang. Again, this is 2.4.4pre3
with alpha-numa-3 and rwsem-generic-4. I saw it upon starting mozilla.
I also saw some scary filesystem errors that may or may not be related:
Apr 23 18:09:40 draal kernel: EXT2-fs error (device sd(8,2)):
ext2_new_block: Free blocks count corrupted for block group 252
There has been a lot of discussion on the topic of rwsems (that,
admittedly, I haven't followed very closely). It looks like
rwsem-generic-6 is the latest from Andrea, I'll build a new 2.4.4pre4
kernel with these patches and let you know the results. Have you made
changes between rwsem-generic-4 and rwsem-generic-6 that would
fix/prevent a deadlock?
Let me know if there are any useful tests I could perform. Would it be
useful for me to run the rwsem benchmarks you've been using? Could
these detect a deadlock situation?
Cheers,
-- Bob
Bob McElrath ([email protected])
Univ. of Wisconsin at Madison, Department of Physics
On Mon, Apr 23, 2001 at 06:27:23PM -0500, Bob McElrath wrote:
> Well, take that back, I just got it to hang. Again, this is 2.4.4pre3
> with alpha-numa-3 and rwsem-generic-4. I saw it upon starting mozilla.
> I also saw some scary filesystem errors that may or may not be related:
> Apr 23 18:09:40 draal kernel: EXT2-fs error (device sd(8,2)):
> ext2_new_block: Free blocks count corrupted for block group 252
That is probably unrelated to the ps hang. I suspect you have been bitten by the
ext2 metadata corruption (2.4.4pre2 was just fixed but previous kernels weren't).
> rwsem-generic-6 is the latest from Andrea, I'll build a new 2.4.4pre4
> kernel with these patches and let you know the results. Have you made
Yes, that's safe.
> changes between rwsem-generic-4 and rwsem-generic-6 that would
> fix/prevent a deadlock?
No, but I think they are two separate issues.
> Let me know if there are any useful tests I could perform. Would it be
> useful for me to run the rwsem benchmarks you've been using? Could
> these detect a deadlock situation?
Yes; to be sure you can run it without my patch and see if it hangs (I never
tried that myself, but I was able to reproduce the ps hang quite easily, and it
was quite obviously due to the rwsemaphores and it went away completely after I
used the generic semaphores).
Andrea