2000-11-07 03:16:12

by kumon

[permalink] [raw]
Subject: locks.c: removal of semaphores

Andrew,
I got 5250 Req/s with your locks-sem.patch on normal Apache.
It is good performance on normal Apache.

Andrew Morton writes:
> Kouichi, could you please test the performance of this on
> your 8-way with Apache+fcntl serialisation? (the normal
> Apache). Please use 2.4.0-test10-pre5, not 2.4.0-test10.
> Something has gone funny with test10 and I'm getting much
> lower rates.

Followings are the recent data with/without serialization.

w/ serialize w/o serialize
240t10pre5 2237 5358
240t10pre5+P2 5253 5355**
240t10pre5+P3 --- NG
240t10pre5+locksem 5250 ---
**: once we found deadlock
NG: cannot complete measurement
--: we've not measured.

Normal apache on various kernel setting as follows:

> test8 5287 <-- best performance
> test10-pre5+P2 5258
> 240t10pre5+locksem 5250
> test9+P2 5243
> test9+mypatch 5192 <-- a little bit worse
> test10-pre5+P1 5187
> test1 3702 <-- no good scalability
> test10-pre5 2255 <-- negative scalability
> test9 2193


We also did durability test of 2.4.0-test10-pre5. Unfortunately
enough, we didn't successfully complete the test of Apache w/o
serialization (-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT), it couldn't
continue to run for a night. The kernel got complete deadlock.

The message is:
"Unable to handle kernel NULL pointer dereference NMI watchdog detected LOCKUP on CPU1."

Yes, obviously it's not Andrew's problem, that is genuine test10-pre5.

Hidden bugs are awakened by removing serialization.

If the bug is same as what I observed, It is NULL pointer dereference
on run-queue list.
--
Computer Systems Laboratory, Fujitsu Labs.
[email protected]


2000-11-07 07:25:12

by Andrew Morton

[permalink] [raw]
Subject: Re: locks.c: removal of semaphores

[email protected] wrote:
>
> Andrew,
> I got 5250 Req/s with your locks-sem.patch on normal Apache.
> It is good performance on normal Apache.

Great. Thanks again. Trond has this patch now for ongoing
NFS locking stuff. Hopefully 2.4 will now work OK with "legacy"
Apache configurations.

As long as the "new style" Apache configurations work OK.
Which leads us to...

> ...
> We also did durability test of 2.4.0-test10-pre5. Unfortunately
> enough, we didn't successfully complete the test of Apache w/o
> serialization (-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT), it couldn't
> continue to run for a night. The kernel got complete deadlock.
>
> The message is:
> "Unable to handle kernel NULL pointer dereference NMI watchdog detected LOCKUP on CPU1."
>
> Yes, obviously it's not Andrew's problem, that is genuine test10-pre5.
>
> Hidden bugs are awakened by removing serialization.
>

(Places fingertips lightly upon spinning disk drive.
Closes eyes. Quietly hums low monotone. Time passes...)



I sense a corrupted struct timer_list!

Your kernel was traversing the tvecs[] array and took a fault.
The fault handler called printk() which called the console code
which called mod_timer() which deadlocked on timerlist_lock.
The NMI handler broke the deadlock and called the console code.
The NMI handler also deadlocked on timerlist_lock.

If this theory is correct the below patch should remove the
deadlock and thus allow you to get the usual diagnostics.

But I suspect the diagnostics will just show a trace up
into cascade_timers() or run_timer_list() which doesn't
tell us much.

If that is the case I'm afraid you will have to identify the
_exact_ statement where the error occured, put some diagnostic
code just prior to it and run the tests a second time.

The interesting piece of information will be the timer's
function pointer. So you'll need to add something like:

struct timer_list *some_timer; /* The timer which causes the fault */
...
if (some_timer->list.next == 0 || some_timer->list.prev == 0)
printk("Corrupted timer! function=0x%p\n", some_timer->function);
/* The next statement is where the oops occurs */
<some statement which uses *some_timer>

Then if you can look the function pointer up in System.map or gdb
we will have our culprit.




--- linux-2.4.0-test10/arch/i386/kernel/traps.c Sat Nov 4 16:22:47 2000
+++ linux-akpm/arch/i386/kernel/traps.c Tue Nov 7 17:56:15 2000
@@ -396,9 +396,22 @@

__setup("nmi_watchdog=", setup_nmi_watchdog);

-extern spinlock_t console_lock;
+extern spinlock_t console_lock, timerlist_lock;
static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED;

+/*
+ * Unlock any spinlocks which will prevent us from getting the
+ * message out
+ */
+
+void bust_spinlocks(void)
+{
+ spin_trylock(&console_lock);
+ spin_unlock(&console_lock);
+ spin_trylock(&timerlist_lock);
+ spin_unlock(&timerlist_lock);
+}
+
inline void nmi_watchdog_tick(struct pt_regs * regs)
{
/*
@@ -439,8 +452,7 @@
* We are in trouble anyway, lets at least try
* to get a message out.
*/
- spin_trylock(&console_lock);
- spin_unlock(&console_lock);
+ bust_spinlocks();
printk("NMI Watchdog detected LOCKUP on CPU%d, registers:\n", cpu);
show_registers(regs);
printk("console shuts up ...\n");
--- linux-2.4.0-test10/arch/i386/mm/fault.c Sat Nov 4 16:22:47 2000
+++ linux-akpm/arch/i386/mm/fault.c Tue Nov 7 17:57:15 2000
@@ -24,6 +24,7 @@
#include <asm/hardirq.h>

extern void die(const char *,struct pt_regs *,long);
+extern void bust_spinlocks(void);

/*
* Ugly, ugly, but the goto's result in better assembly..
@@ -250,6 +251,8 @@
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice.
*/
+
+ bust_spinlocks();

if (address < PAGE_SIZE)
printk(KERN_ALERT "Unable to handle kernel NULL pointer dereference");