2006-12-25 23:01:20

by Florin Iucha

Subject: Linux 2.6.20-rc2

I got an oops or two while copying 60 GB of files over NFS and then
comparing them using diff. The client is an AMD64 box running Debian
testing/unstable with the shiny new 2.6.20-rc2 kernel. The server runs
Debian testing with the 2.6.18-3 distribution kernel. The source
filesystem is ext3.

I left the machine to run the diff and when I came back, the USB keyboard
was unresponsive although the USB mice plugged into the hub built into
the keyboard were working fine. I was able to ssh into the box,
capture the dmesg and reboot. Everything went down quietly but the
box froze at the "... will restart" message. With no working keyboard,
I had no way to tell whether it was really frozen.

I saw a similar loss of the keyboard while copying the files under
2.6.20-rc1. I was able to copy the files using 2.6.19.

The dmesg from the client machine is attached.

florin

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2006-12-27 01:50:20

by Florin Iucha

Subject: Re: Linux 2.6.20-rc2

On Wed, Dec 27, 2006 at 12:42:53AM +0100, Ingo Molnar wrote:
> * Florin Iucha <[email protected]> wrote:
> > I saw your subsequent message and will apply the patch, retest and
> > report.
>
> yeah. Just to make sure i've attached the latest and greatest version of
> the patch - please make sure you have this one applied.

The good news is, with this patch there is no oops.

The bad news is, the USB keyboard is still not functioning, but the
mice plugged into the keyboard hub are working.

One down, one more to go...

florin

> ---------------------->
> Subject: [patch] sched: fix cond_resched_softirq() offset
> From: Ingo Molnar <[email protected]>
>
> remove the __resched_legal() check: it is conceptually broken.
> The biggest problem it had is that it can mask buggy cond_resched()
> calls. A cond_resched() call is only legal if we are not in an
> atomic context, with two narrow exceptions:
>
> - if the system is booting
> - a reacquire_kernel_lock() down() done while PREEMPT_ACTIVE is set
>
> But __resched_legal() hid this and just silently returned whenever
> these primitives were called from invalid contexts. (Same goes for
> cond_resched_locked() and cond_resched_softirq()).
>
> furthermore, the __resched_legal(0) call was buggy in that it caused
> unnecessarily long softirq latencies via cond_resched_softirq(). (which
> is only called from softirq-off sections, hence the code did nothing.)
>
> the fix is to resurrect the efficiency of the might_sleep checks and to
> only allow the narrow exceptions.
>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/sched.c | 18 ++++--------------
> 1 file changed, 4 insertions(+), 14 deletions(-)
>
> Index: linux/kernel/sched.c
> ===================================================================
> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -4617,17 +4617,6 @@ asmlinkage long sys_sched_yield(void)
>  	return 0;
>  }
>
> -static inline int __resched_legal(int expected_preempt_count)
> -{
> -#ifdef CONFIG_PREEMPT
> -	if (unlikely(preempt_count() != expected_preempt_count))
> -		return 0;
> -#endif
> -	if (unlikely(system_state != SYSTEM_RUNNING))
> -		return 0;
> -	return 1;
> -}
> -
>  static void __cond_resched(void)
>  {
>  #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
> @@ -4647,7 +4636,8 @@ static void __cond_resched(void)
>
>  int __sched cond_resched(void)
>  {
> -	if (need_resched() && __resched_legal(0)) {
> +	if (need_resched() && !(preempt_count() & PREEMPT_ACTIVE) &&
> +					system_state == SYSTEM_RUNNING) {
>  		__cond_resched();
>  		return 1;
>  	}
> @@ -4673,7 +4663,7 @@ int cond_resched_lock(spinlock_t *lock)
>  		ret = 1;
>  		spin_lock(lock);
>  	}
> -	if (need_resched() && __resched_legal(1)) {
> +	if (need_resched() && system_state == SYSTEM_RUNNING) {
>  		spin_release(&lock->dep_map, 1, _THIS_IP_);
>  		_raw_spin_unlock(lock);
>  		preempt_enable_no_resched();
> @@ -4689,7 +4679,7 @@ int __sched cond_resched_softirq(void)
>  {
>  	BUG_ON(!in_softirq());
>
> -	if (need_resched() && __resched_legal(0)) {
> +	if (need_resched() && system_state == SYSTEM_RUNNING) {
>  		raw_local_irq_disable();
>  		_local_bh_enable();
>  		raw_local_irq_enable();
>

--
Bruce Schneier expects the Spanish Inquisition.
http://geekz.co.uk/schneierfacts/fact/163



2007-01-03 13:27:48

by Jiri Kosina

Subject: Re: Linux 2.6.20-rc2

On Mon, 25 Dec 2006, Florin Iucha wrote:

> I left the machine to run the diff and when I came back, the USB
> keyboard was unresponsive although the USB mice plugged into the hub built
> into the keyboard were working fine. I was able to ssh into the box,
> capture the dmesg and reboot. Everything went down quietly but the box
> froze at the "... will restart" message. With no working keyboard, I had
> no way to tell whether it was really frozen.

Hi Florin,

I have not seen any similar bug reports, but it seems that you are able to
reproduce the problem fairly reliably.

Do you think you could try to narrow down whether the HID core
patches that went into 2.6.20-rc1 might be causing your problem?

The easiest way is probably to revert the following commits and see
whether you can still reproduce the problem. It would be nice if you could
try, so that we know whether it is caused by the HID core or by some other
post-2.6.19 USB/input change.

10f549fa1538849548787879d96bbb3450f06117
4ef4caad41630c7caa6e2b94c6e7dda7e9689714
1c1e40b5ad6e345feba69bc612db006efccf4cdc
e3a0dd7ced76bb439ddeda244a9667e7b3800fc8
63f3861d2fbf8ccbad1386ac9ac8b822c036ea00
4c2ae844b5ef85fd4b571c9c91ac48afa6ef2dfc
aa8de2f038baec993f07ef66fb3e94481d1ec22b
aa938f7974b82cfd9ee955031987344f332b7c77
4916b3a57fc94664677d439b911b8aaf86c7ec23
229695e51efc4ed5e04ab471c82591d0f432909d
dde5845a529ff753364a6d1aea61180946270bfa
64bb67b1702958759f650adb64ab33496641e526

They should be revertible without conflict in this order.

Thanks,

--
Jiri Kosina