2001-07-11 09:07:37

by kladit

Subject: 2.4.7p6 hang

Kernel: 2.4.7p5 or 2.4.7p6
System: PII-SMP, BX-Chipset

The kernel boots up to the message

..
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039

and then stops.

I am currently using 2.4.7p3 without problems.

I am not subscribed to the kernel mailing list.

--
Best regards
Klaus Dittrich

e-mail: [email protected]


2001-07-11 12:57:14

by Trond Myklebust

Subject: Re: 2.4.7p6 hang

>>>>> " " == Klaus Dittrich <[email protected]> writes:

> Kernel: 2.4.7p5 or 2.4.7p6 System: PII-SMP, BX-Chipset

> The kernel boots up to the message

> .. Linux NET4.0 for Linux 2.4 Based upon Swansea University
> Computer Society NET3.039

> and then stops.

> I am currently using 2.4.7p3 without problems.

I have the same problem on my setup. To me, it looks like the loop in
spawn_ksoftirqd() is suffering from some sort of atomicity problem.

I managed to band-aid over the problem by replacing the loop with a
semaphore which the child clears when it has been initialized (as per
the appended patch).

Linus?

Cheers,
Trond

--- linux-2.4.7-smp/kernel/softirq.c.orig Wed Jul 11 10:31:50 2001
+++ linux-2.4.7-smp/kernel/softirq.c Wed Jul 11 14:43:03 2001
@@ -371,6 +371,8 @@
}
}

+static DECLARE_MUTEX_LOCKED(ksoftirqd_start);
+
static int ksoftirqd(void * __bind_cpu)
{
int bind_cpu = *(int *) __bind_cpu;
@@ -391,6 +393,7 @@
mb();

ksoftirqd_task(cpu) = current;
+ up(&ksoftirqd_start);

for (;;) {
if (!softirq_pending(cpu))
@@ -416,12 +419,8 @@
if (kernel_thread(ksoftirqd, (void *) &cpu,
CLONE_FS | CLONE_FILES | CLONE_SIGNAL) < 0)
printk("spawn_ksoftirqd() failed for cpu %d\n", cpu);
- else {
- while (!ksoftirqd_task(cpu_logical_map(cpu))) {
- current->policy |= SCHED_YIELD;
- schedule();
- }
- }
+ else
+ down(&ksoftirqd_start);
}

return 0;
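Outside the kernel, the two start-up handshakes in this patch (the original poll-and-yield loop and the semaphore replacement) can be sketched in userspace C with POSIX threads. This is a hypothetical analogue, not kernel code: `worker_ready[]` stands in for `ksoftirqd_task(cpu)`, the POSIX semaphore `worker_started` (initialised to 0, i.e. "locked") stands in for `DECLARE_MUTEX_LOCKED(ksoftirqd_start)`, and `sched_yield()` stands in for setting `SCHED_YIELD` and calling `schedule()`. All other names are invented for illustration.

```c
/* Hypothetical userspace analogue of spawn_ksoftirqd()'s start-up
 * handshake; the names here are illustrative, not kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdatomic.h>

#define NR_WORKERS 4

/* Stands in for ksoftirqd_task(cpu): nonzero once the worker has run. */
static atomic_int worker_ready[NR_WORKERS];

/* Stands in for ksoftirqd_start: a count of 0 means "locked", so the
 * first wait blocks until a worker posts. */
static sem_t worker_started;

struct worker_arg {
    int cpu;
    int use_semaphore;
};

static void handshake_init(void)
{
    sem_init(&worker_started, 0, 0);
}

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    int use_semaphore = arg->use_semaphore;   /* read before publishing */

    assert(arg->cpu >= 0 && arg->cpu < NR_WORKERS);
    atomic_store(&worker_ready[arg->cpu], 1);  /* ~ ksoftirqd_task(cpu) = current */
    if (use_semaphore)
        sem_post(&worker_started);             /* ~ up(&ksoftirqd_start) */
    /* A real worker would now loop handling work; this one just exits. */
    return NULL;
}

/* Starts one worker and returns only after it has announced itself.
 * Returns 0 on success, -1 if the thread could not be created. */
static int start_worker(pthread_t *tid, struct worker_arg *arg)
{
    if (pthread_create(tid, NULL, worker, arg) != 0)
        return -1;

    if (arg->use_semaphore) {
        /* Patched style: sleep until the child says it is ready. */
        while (sem_wait(&worker_started) != 0)
            ;   /* retry if interrupted by a signal */
    } else {
        /* Original style: poll the flag, yielding the CPU between checks. */
        while (!atomic_load(&worker_ready[arg->cpu]))
            sched_yield();
    }
    return 0;
}
```

After `handshake_init()`, a caller fills in a `struct worker_arg` that outlives the thread, calls `start_worker()`, and later joins the thread. In userspace both variants complete, which fits Andrea's point later in the thread that the start-up logic itself is correct. The semaphore version sleeps instead of spinning, so it does not depend on `SCHED_YIELD` behaving correctly at all.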

2001-07-11 13:38:10

by Andrew Morton

Subject: Re: 2.4.7p6 hang

Trond Myklebust wrote:
>
> ...
> I have the same problem on my setup. To me, it looks like the loop in
> spawn_ksoftirqd() is suffering from some sort of atomicity problem.

Does a `set_current_state(TASK_RUNNING);' in spawn_ksoftirqd()
fix it? If so we have a rogue initcall...

-

2001-07-11 14:22:27

by Trond Myklebust

Subject: Re: 2.4.7p6 hang

>>>>> " " == Andrew Morton <[email protected]> writes:

> Trond Myklebust wrote:
>>
>> ... I have the same problem on my setup. To me, it looks like
>> the loop in spawn_ksoftirqd() is suffering from some sort of
>> atomicity problem.

> Does a `set_current_state(TASK_RUNNING);' in spawn_ksoftirqd()
> fix it? If so we have a rogue initcall...

Nope. The same thing happens as before.

A couple of debugging statements show that ksoftirqd_CPU0 gets created
fine, and that ksoftirqd_task(0) is indeed getting set correctly
before we loop in spawn_ksoftirqd().
After this the second call to kernel_thread() succeeds, but
ksoftirqd() itself never gets called before the hang occurs.

Cheers,
Trond

2001-07-11 15:49:33

by Andrea Arcangeli

Subject: Re: 2.4.7p6 hang

On Wed, Jul 11, 2001 at 02:56:43PM +0200, Trond Myklebust wrote:
> >>>>> " " == Klaus Dittrich <[email protected]> writes:
>
> > Kernel: 2.4.7p5 or 2.4.7p6 System: PII-SMP, BX-Chipset
>
> > The kernel boots up to the message
>
> > .. Linux NET4.0 for Linux 2.4 Based upon Swansea University
> > Computer Society NET3.039
>
> > and then stops.
>
> > I am currently using 2.4.7p3 without problems.
>
> I have the same problem on my setup. To me, it looks like the loop in
> spawn_ksoftirqd() is suffering from some sort of atomicity problem.

can you reproduce with 2.4.7pre5aa1?

Andrea

2001-07-11 15:58:43

by Andrea Arcangeli

Subject: Re: 2.4.7p6 hang

On Wed, Jul 11, 2001 at 04:22:04PM +0200, Trond Myklebust wrote:
> >>>>> " " == Andrew Morton <[email protected]> writes:
>
> > Trond Myklebust wrote:
> >>
> >> ... I have the same problem on my setup. To me, it looks like
> >> the loop in spawn_ksoftirqd() is suffering from some sort of
> >> atomicity problem.
>
> > Does a `set_current_state(TASK_RUNNING);' in spawn_ksoftirqd()
> > fix it? If so we have a rogue initcall...
>
> Nope. The same thing happens as before.
>
> A couple of debugging statements show that ksoftirqd_CPU0 gets created
> fine, and that ksoftirqd_task(0) is indeed getting set correctly
> before we loop in spawn_ksoftirqd().
> After this the second call to kernel_thread() succeeds, but
> ksoftirqd() itself never gets called before the hang occurs.

ksoftirqd is quite scheduler intensive, and while its startup is
correct (no change is needed there), it tends to trigger scheduler
bugs (one of those bugs was just fixed in pre5). The reason I never saw
the deadlock is that I had also fixed this other scheduler bug in my tree:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.7pre5aa1/00_sched-yield-1

I forgot to submit this one, but here it is now for easy merging:

--- 2.4.4aa3/kernel/sched.c.~1~ Sun Apr 29 17:37:05 2001
+++ 2.4.4aa3/kernel/sched.c Tue May 1 16:39:42 2001
@@ -674,8 +674,10 @@
#endif
spin_unlock_irq(&runqueue_lock);

- if (prev == next)
+ if (prev == next) {
+ current->policy &= ~SCHED_YIELD;
goto same_process;
+ }

#ifdef CONFIG_SMP
/*


Andrea

2001-07-11 16:31:19

by Trond Myklebust

Subject: Re: 2.4.7p6 hang

>>>>> " " == Andrea Arcangeli <[email protected]> writes:

> ksoftirqd is quite scheduler intensive, and while its startup
> is correct (no change is needed there), it tends to trigger
> scheduler bugs (one of those bugs was just fixed in pre5). The
> reason I never saw the deadlock is that I had also fixed this
> other scheduler bug in my tree:

> --- 2.4.4aa3/kernel/sched.c.~1~ Sun Apr 29 17:37:05 2001
> +++ 2.4.4aa3/kernel/sched.c Tue May 1 16:39:42 2001
> @@ -674,8 +674,10 @@
> #endif
> spin_unlock_irq(&runqueue_lock);

> - if (prev == next)
> + if (prev == next) {
> + current->policy &= ~SCHED_YIELD;
> goto same_process;
> + }

> #ifdef CONFIG_SMP
> /*

I no longer see the hang with this patch, but I'm not sure I
understand why it works.
Does the above mean that the hang is occurring because spawn_ksoftirqd
is yielding back to itself? If so, the semaphore trick seems more
robust, as it causes a proper sleep until it's safe to wake up.

Cheers,
Trond

2001-07-11 16:53:19

by Andrea Arcangeli

Subject: Re: 2.4.7p6 hang

On Wed, Jul 11, 2001 at 06:30:43PM +0200, Trond Myklebust wrote:
> >>>>> " " == Andrea Arcangeli <[email protected]> writes:
>
> > ksoftirqd is quite scheduler intensive, and while its startup
> > is correct (no change is needed there), it tends to trigger
> > scheduler bugs (one of those bugs was just fixed in pre5). The
> > reason I never saw the deadlock is that I had also fixed this
> > other scheduler bug in my tree:
>
> > --- 2.4.4aa3/kernel/sched.c.~1~ Sun Apr 29 17:37:05 2001
> > +++ 2.4.4aa3/kernel/sched.c Tue May 1 16:39:42 2001
> > @@ -674,8 +674,10 @@
> > #endif
> > spin_unlock_irq(&runqueue_lock);
>
> > - if (prev == next)
> > + if (prev == next) {
> > + current->policy &= ~SCHED_YIELD;
> > goto same_process;
> > + }
>
> > #ifdef CONFIG_SMP
> > /*
>
> I no longer see the hang with this patch, but I'm not sure I
> understand why it works.

I do. It's very subtle and it goes down to the fork and scheduler
details.

> Does the above mean that the hang is occurring because spawn_ksoftirqd
> is yielding back to itself? If so, the semaphore trick seems more

No, that's a generic bug.

> robust, as it causes a proper sleep until it's safe to wake up.

A semaphore is definitely not more robust than the current code; if
anything, it hides the fact that sched_yield is broken in the scheduler.
There is no need to change it and waste static RAM on a semaphore for no
good reason.

The bug is that SCHED_YIELD must always be cleared by the time of a
fork(), or the child may never get scheduled. Only tasks currently
running on a CPU are allowed to have SCHED_YIELD set.

Another way to cure the deadlock would be to clear SCHED_YIELD in the
child at fork time; then you could even do something as silly as:

        current->policy |= SCHED_YIELD;
        fork();
        schedule();

but the above doesn't make sense, so we can optimize away clearing
SCHED_YIELD in the child at fork. And even if you allow the above, you
still need my attached fix for performance reasons: once schedule()
returns, the last sched_yield attempt is over, and the next time we
call schedule() without asking for a yield we don't want it to be
treated like a sched_yield again. (That was in fact the original reason
for the patch; only now did I notice that the bug has very serious
implications with fork. They don't show up only with ksoftirqd but also
with normal userspace forks; it's just that ksoftirqd hammers the
scheduler hard enough to make them reproducible.)

Andrea

2001-07-11 17:20:42

by Mike Kravetz

Subject: Re: 2.4.7p6 hang

On Wed, Jul 11, 2001 at 05:58:09PM +0200, Andrea Arcangeli wrote:
>
> I forgot to submit this one, but here it is now for easy merging:
>
> --- 2.4.4aa3/kernel/sched.c.~1~ Sun Apr 29 17:37:05 2001
> +++ 2.4.4aa3/kernel/sched.c Tue May 1 16:39:42 2001
> @@ -674,8 +674,10 @@
> #endif
> spin_unlock_irq(&runqueue_lock);
>
> - if (prev == next)
> + if (prev == next) {
> + current->policy &= ~SCHED_YIELD;
> goto same_process;
> + }
>
> #ifdef CONFIG_SMP
> /*

I would like to second the need for this patch in the 'mainline' kernel.
Not too long ago, I came up with the following scenario caused by this
bug. The scenario is based on the unmodified 2.4.4 scheduler.

- Task A calls sched_yield(), and the code in sys_sched_yield()
determines that a yield is in order and sets SCHED_YIELD in
the task's policy field and need_resched is set for this task.

- When Task A attempts to return to user land, schedule() will
be called (since need_resched was set). However, in this case
schedule() does not find a better task than A to run. Since
task A will continue to run, the 'same_process' goto is taken
in schedule(). Note that __schedule_tail() is not called, so
the SCHED_YIELD flag remains set in A when it continues to
execute.

- Task A then performs some operation which causes it to go into
a non-runnable state (such as calling nanosleep()). After setting
the state of Task A to something other than TASK_RUNNING, a call
to schedule() will be made. At this time Task A will be removed
from the runqueue (again note that SCHED_YIELD remains set in A).
Also, assume that there are no other runnable tasks so the idle
task is chosen to run next on this CPU.

- Now, after schedule() releases the runqueue lock the timer for
Task A fires and we call the wake_up code. This code path will
eventually call try_to_wake_up() which will set the state of A
to TASK_RUNNING, add A to the runqueue and call reschedule_idle()
for A.

- Note that we have not yet cleared the has_cpu field in A. Hence,
can_schedule() will never be true for task A. As a result, we
will not send an IPI to any other CPU. In effect, reschedule_idle()
is a noop.

- Now, we finally call __schedule_tail() for task A. After clearing
the SCHED_YIELD and has_cpu flags, we notice that the state of A
is TASK_RUNNING (it was set by try_to_wake_up()) and take the
needs_resched goto.

- The needs_resched block of code usually results in a call to
reschedule_idle() for the task. However, the first line of code
in this block is:

        /*
         * Avoid taking the runqueue lock in cases where
         * no preemption-check is necessary:
         */
        if ((prev == idle_task(smp_processor_id())) ||
            (policy & SCHED_YIELD))
                goto out_unlock;

Since the SCHED_YIELD flag was set in A when we entered this routine,
we will not call reschedule_idle().

In this case, the CPU associated with task A is still idle yet we will
not schedule the task on the CPU. In addition, it is possible that at
this time ALL CPUs in the system could be idle. Hence, we would end up
with all CPUs idle while task A is on the runqueue. Not good!
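The decisive step in this scenario is the early exit on the needs_resched path of `__schedule_tail()`. The sketch below is a hypothetical, single-threaded model of just that step, not kernel code: `struct task`, `MODEL_SCHED_YIELD`, `model_schedule_tail()` and `wakeup_kicks_idle_cpu()` are invented names, and, following the description above, `policy` is a snapshot taken on entry, before SCHED_YIELD is cleared.

```c
/* Hypothetical model of the needs_resched early exit; not kernel code. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_SCHED_YIELD 0x10u   /* illustrative flag bit */

enum model_state { MODEL_RUNNING, MODEL_SLEEPING };

struct task {
    unsigned policy;
    enum model_state state;
    bool is_idle_task;
    bool idle_cpu_kicked;   /* records whether reschedule_idle() ran */
};

static void model_schedule_tail(struct task *prev)
{
    unsigned policy = prev->policy;          /* snapshot on entry */

    prev->policy &= ~MODEL_SCHED_YIELD;      /* flag cleared ... */
    if (prev->state != MODEL_RUNNING)
        return;                              /* prev really went to sleep */

    /* needs_resched: prev was woken before we got here. */
    if (prev->is_idle_task || (policy & MODEL_SCHED_YIELD))
        return;                              /* ... but the snapshot still has it */

    prev->idle_cpu_kicked = true;            /* stands in for reschedule_idle(prev) */
}

/* Replays the scenario for task A. stale_yield says whether SCHED_YIELD
 * survived an earlier same_process shortcut. */
static bool wakeup_kicks_idle_cpu(bool stale_yield)
{
    struct task a = { .policy = 0, .state = MODEL_RUNNING };

    if (stale_yield)
        a.policy |= MODEL_SCHED_YIELD;
    a.state = MODEL_SLEEPING;   /* e.g. nanosleep(); schedule() removes A */
    a.state = MODEL_RUNNING;    /* timer fires: try_to_wake_up() */
    model_schedule_tail(&a);    /* A's __schedule_tail() finally runs */

    assert(a.policy == 0);      /* the flag is always cleared here */
    return a.idle_cpu_kicked;
}
```

With a stale flag, the woken task is back on the runqueue but no idle CPU is told about it, which is the "all CPUs idle while task A is on the runqueue" outcome. Clearing the flag on the same_process path, as in Andrea's patch, means the snapshot no longer carries a leftover SCHED_YIELD into this check.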

--
Mike Kravetz [email protected]
IBM Linux Technology Center

2001-07-11 18:34:08

by Josh Logan

Subject: Re: 2.4.7p6 hang


I'm having a hang right after the floppy is initialised with pre5 and pre6
(2.4.3 works fine). I tried this patch, but it did not make any
improvement. The machine still has SysRq commands available. Please let
me know what other information you would like to debug this problem.

BTW, I also tried to disable the floppy in the BIOS and got:
...
Floppy OK
task queue still active
<HANG>

Later, JOSH


On Wed, 11 Jul 2001, Andrea Arcangeli wrote:

> On Wed, Jul 11, 2001 at 04:22:04PM +0200, Trond Myklebust wrote:
> > >>>>> " " == Andrew Morton <[email protected]> writes:
> >
> > > Trond Myklebust wrote:
> > >>
> > >> ... I have the same problem on my setup. To me, it looks like
> > >> the loop in spawn_ksoftirqd() is suffering from some sort of
> > >> atomicity problem.
> >
> > > Does a `set_current_state(TASK_RUNNING);' in spawn_ksoftirqd()
> > > fix it? If so we have a rogue initcall...
> >
> > Nope. The same thing happens as before.
> >
> > A couple of debugging statements show that ksoftirqd_CPU0 gets created
> > fine, and that ksoftirqd_task(0) is indeed getting set correctly
> > before we loop in spawn_ksoftirqd().
> > After this the second call to kernel_thread() succeeds, but
> > ksoftirqd() itself never gets called before the hang occurs.
>
> ksoftirqd is quite scheduler intensive, and while its startup is
> correct (no change is needed there), it tends to trigger scheduler
> bugs (one of those bugs was just fixed in pre5). The reason I never saw
> the deadlock is that I had also fixed this other scheduler bug in my tree:
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.7pre5aa1/00_sched-yield-1
>
> I forgot to submit this one, but here it is now for easy merging:
>
> --- 2.4.4aa3/kernel/sched.c.~1~ Sun Apr 29 17:37:05 2001
> +++ 2.4.4aa3/kernel/sched.c Tue May 1 16:39:42 2001
> @@ -674,8 +674,10 @@
> #endif
> spin_unlock_irq(&runqueue_lock);
>
> - if (prev == next)
> + if (prev == next) {
> + current->policy &= ~SCHED_YIELD;
> goto same_process;
> + }
>
> #ifdef CONFIG_SMP
> /*
>
>
> Andrea
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2001-07-11 19:05:25

by Andrea Arcangeli

Subject: Re: 2.4.7p6 hang

On Wed, Jul 11, 2001 at 11:33:40AM -0700, Josh Logan wrote:
>
> I'm having a hang right after the floppy is initialised with pre5 and pre6
> (2.4.3 works fine). I tried this patch, but it did not make any

Was the problem introduced in pre5? Can you reproduce it under 2.4.7pre4?

> improvement. The machine still has SysRq commands available. Please let
> me know what other information you would like to debug this problem.

SYSRQ+T

> BTW, I also tried to disable the floppy in the BIOS and got:
> ...
> Floppy OK
> task queue still active
> <HANG>

I'll soon have a look at this message.

Andrea

2001-07-11 19:28:19

by David Ford

Subject: Re: 2.4.7p6 hang

This patch fixes the hang for me.

Thank you,
David

Josh Logan wrote:

>I'm having a hang right after the floppy is initialised with pre5 and pre6
>(2.4.3 works fine). I tried this patch, but it did not make any
>improvement. The machine still has SysRq commands available. Please let
>me know what other information you would like to debug this problem.
>
>BTW, I also tried to disable the floppy in the BIOS and got:
>...
>Floppy OK
>task queue still active
><HANG>
>
> Later, JOSH
>
>
>On Wed, 11 Jul 2001, Andrea Arcangeli wrote:
>
>>On Wed, Jul 11, 2001 at 04:22:04PM +0200, Trond Myklebust wrote:
>>
>>>>>>>>" " == Andrew Morton <[email protected]> writes:
>>>>>>>>
>>> > Trond Myklebust wrote:
>>> >>
>>> >> ... I have the same problem on my setup. To me, it looks like
>>> >> the loop in spawn_ksoftirqd() is suffering from some sort of
>>> >> atomicity problem.
>>>
>>> > Does a `set_current_state(TASK_RUNNING);' in spawn_ksoftirqd()
>>> > fix it? If so we have a rogue initcall...
>>>
>>>Nope. The same thing happens as before.
>>>
>>>A couple of debugging statements show that ksoftirqd_CPU0 gets created
>>>fine, and that ksoftirqd_task(0) is indeed getting set correctly
>>>before we loop in spawn_ksoftirqd().
>>>After this the second call to kernel_thread() succeeds, but
>>>ksoftirqd() itself never gets called before the hang occurs.
>>>
>>ksoftirqd is quite scheduler intensive, and while its startup is
>>correct (no change is needed there), it tends to trigger scheduler
>>bugs (one of those bugs was just fixed in pre5). The reason I never saw
>>the deadlock is that I had also fixed this other scheduler bug in my tree:
>>
>> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.7pre5aa1/00_sched-yield-1
>>
>>I forgot to submit this one, but here it is now for easy merging:
>>
>>--- 2.4.4aa3/kernel/sched.c.~1~ Sun Apr 29 17:37:05 2001
>>+++ 2.4.4aa3/kernel/sched.c Tue May 1 16:39:42 2001
>>@@ -674,8 +674,10 @@
>> #endif
>> spin_unlock_irq(&runqueue_lock);
>>
>>- if (prev == next)
>>+ if (prev == next) {
>>+ current->policy &= ~SCHED_YIELD;
>> goto same_process;
>>+ }
>>
>> #ifdef CONFIG_SMP
>> /*
>>


2001-07-11 19:28:59

by Josh Logan

Subject: Re: 2.4.7p6 hang



On Wed, 11 Jul 2001, Andrea Arcangeli wrote:

> On Wed, Jul 11, 2001 at 11:33:40AM -0700, Josh Logan wrote:
> >
> > I'm having a hang right after the floppy is initialised with pre5 and pre6
> > (2.4.3 works fine). I tried this patch, but it did not make any
>
> Was the problem introduced in pre5? Can you reproduce it under 2.4.7pre4?

I'll have to go try it...

>
> > improvement. The machine still has SysRq commands available. Please let
> > me know what other information you would like to debug this problem.
>
> SYSRQ+T

Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
SysRq: Show State

task                  PC stack   pid father child younger older
swapper       D C03EDEC0  4980     1      0     7               (L-TLB)
keventd       S C1234560  6624     2      1             3       (L-TLB)
ksoftirqd_CPU S C1232000  6468     3      1             4     2 (L-TLB)
kswapd        S C1231FA8  6588     4      1             5     3 (L-TLB)
kreclaimd     S 00000286  6656     5      1             6     4 (L-TLB)
bdflush       S 00000286  6652     6      1             7     5 (L-TLB)
kupdated      S C7F9BFC8  6620     7      1                   6 (L-TLB)

I can add call traces if needed; this was copied by hand.

>
> > BTW, I also tried to disable the floppy in the BIOS and got:
> > ...
> > Floppy OK
> > task queue still active
> > <HANG>
>
> I'll soon have a look at this message.
>
> Andrea
>

Later, JOSH


2001-07-12 00:19:33

by Johan Kullstam

Subject: Re: 2.4.7p6 hang

Andrea Arcangeli <[email protected]> writes:

> On Wed, Jul 11, 2001 at 04:22:04PM +0200, Trond Myklebust wrote:
> > >>>>> " " == Andrew Morton <[email protected]> writes:
> >
> > > Trond Myklebust wrote:
> > >>
> > >> ... I have the same problem on my setup. To me, it looks like
> > >> the loop in spawn_ksoftirqd() is suffering from some sort of
> > >> atomicity problem.
> >
> > > Does a `set_current_state(TASK_RUNNING);' in spawn_ksoftirqd()
> > > fix it? If so we have a rogue initcall...
> >
> > Nope. The same thing happens as before.
> >
> > A couple of debugging statements show that ksoftirqd_CPU0 gets created
> > fine, and that ksoftirqd_task(0) is indeed getting set correctly
> > before we loop in spawn_ksoftirqd().
> > After this the second call to kernel_thread() succeeds, but
> > ksoftirqd() itself never gets called before the hang occurs.
>
> ksoftirqd is quite scheduler intensive, and while its startup is
> correct (no change is needed there), it tends to trigger scheduler
> bugs (one of those bugs was just fixed in pre5). The reason I never saw
> the deadlock is that I had also fixed this other scheduler bug in my tree:
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.7pre5aa1/00_sched-yield-1
>
> I forgot to submit this one, but here it is now for easy merging:
>
> --- 2.4.4aa3/kernel/sched.c.~1~ Sun Apr 29 17:37:05 2001
> +++ 2.4.4aa3/kernel/sched.c Tue May 1 16:39:42 2001
> @@ -674,8 +674,10 @@
> #endif
> spin_unlock_irq(&runqueue_lock);
>
> - if (prev == next)
> + if (prev == next) {
> + current->policy &= ~SCHED_YIELD;
> goto same_process;
> + }
>
> #ifdef CONFIG_SMP
> /*

thank you.

this patch fixes things for me too.

i was freezing at boot, right after the kernel prints the line
Initializing RT netlink.
with both 2.4.7-pre5 and 2.4.7-pre6.

after applying this to 2.4.7-pre6, things are working fine (afaict
after just a few minutes...).

--
J o h a n K u l l s t a m
[[email protected]]
Don't Fear the Penguin!

2001-07-16 19:17:07

by Josh Logan

Subject: Re: 2.4.7p6 hang


I just tried 2.4.6-ac5 and I had the same problem. I'll go try 2.4.7-pre4
next.

Later, JOSH




2001-07-16 19:36:00

by David Ford

Subject: Re: 2.4.7p6 hang

Chances are that you have TEQL as one of your packet schedulers?

Try the patch Dave M posted this morning, let me fetch it...

--- net/core/dev.c.~1~ Mon Jul 9 22:19:33 2001
+++ net/core/dev.c Sat Jul 14 17:25:51 2001
@@ -2654,10 +2654,6 @@
if (!dev_boot_phase)
return 0;

-#ifdef CONFIG_NET_SCHED
- pktsched_init();
-#endif
-
#ifdef CONFIG_NET_DIVERT
dv_init();
#endif /* CONFIG_NET_DIVERT */
@@ -2771,6 +2767,10 @@

dst_init();
dev_mcast_init();
+
+#ifdef CONFIG_NET_SCHED
+ pktsched_init();
+#endif

/*
* Initialise network devices


David



2001-07-16 21:07:47

by Josh Logan

Subject: Re: 2.4.7p6 hang


Thanks. With this patch it now boots. Hope this is part of 2.4.7.

Later, JOSH

