2004-11-16 19:48:08

by Jan Engelhardt

Subject: Work around a lockup?

Hello,


I am currently looking into an issue where a host sporadically locks up. I will
retrieve the SYSRQ+P tomorrow when I am back at the machine.
Until then, here's the real question:

Given that some kernel code (possibly a module) runs in an infinite loop and
thus never gives control back to the user (in a UP environment), is there a
way to force a schedule?
Something like the normal scheduler does to processes ("you got your timeslice,
and not more"), but also when they are in kernel mode.



Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de


2004-11-16 20:22:28

by Jan Engelhardt

Subject: Re: Work around a lockup?

>No driver code should ever wait forever. Some module code may
>be broken where the writer assumed that some bit must eventually
>be set or some FIFO must eventually empty, etc. Hardware breaks.

The box has locked up and I would like to know if there's a way around it.

>If you need to wait a long time for something, you can execute
>schedule_timeout(n) in your counted loop. This will give up
>the CPU to other tasks while you are waiting. More sophisticated
>code sleeps until interrupted, etc. Of course, the interrupt
>may never happen so your driver needs to plan for that too.

Let's *do* assume that some module's algorithm is not perfect, and further
assume that ATM, it's in an endless loop. Moreover, editing the module's source
is not an option.

This is not a homework or something, it's real. And I do not know where it's
hanging. Sure, SYSRQ+P would tell me where, but that could get hard to track if
it's the Nth stack frame (seen from the inner-most) for big N.

So for the moment, to keep downtime small, the best option would be to have
something that circumvents the blocking process, e.g. putting it to sleep and
then (when I finally regain control) poking at the module's/kernel's source.

I've generalized the case into the above-mentioned for(;;); because that's the
worst case for uniprocessors, and I think it's best to start tackling there.


Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-16 20:42:35

by linux-os

Subject: Re: Work around a lockup?

On Tue, 16 Nov 2004, Jan Engelhardt wrote:

>> No driver code should ever wait forever. Some module code may
>> be broken where the writer assumed that some bit must eventually
>> be set or some FIFO must eventually empty, etc. Hardware breaks.
>
> The box has locked up and I would like to know if there's a way around it.
>
>> If you need to wait a long time for something, you can execute
>> schedule_timeout(n) in your counted loop. This will give up
>> the CPU to other tasks while you are waiting. More sophisticated
>> code sleeps until interrupted, etc. Of course, the interrupt
>> may never happen so your driver needs to plan for that too.
>
> Let's *do* assume that some module's algorithm is not perfect, and further
> assume that ATM, it's in an endless loop. Moreover, editing the module's
> source is not an option.
>
> This is not a homework or something, it's real. And I do not know where it's
> hanging. Sure, SYSRQ+P would tell me where, but that could get hard to
> track if it's the Nth stack frame (seen from the inner-most) for big N.
>
> So for the moment, to keep downtime small, the best option would be to have
> something that circumvents the blocking process, e.g. putting it to sleep and
> then (when I finally regain control) poking at the module's/kernel's source.
>
> I've generalized the case into the above-mentioned for(;;); because that's the
> worst case for uniprocessors, and I think it's best to start tackling there.
>
>
> Jan Engelhardt

If there is a continuous loop inside the kernel, something outside
the kernel (you) is never going to get control except from an
interrupt. The keyboard interrupt is going to let you see what
is happening, but you won't get any real control because the
kernel is not a task. If the kernel were a task (like VMS),
you could (maybe) context-switch out of the kernel. But,
the kernel is some common code that executes on behalf of
all the tasks in the context of "current". If the current
task is stuck inside the kernel code, it has nowhere to go.
If the current task is looping while sleeping then other tasks
can be scheduled including yours, but if it's not sleeping
(never calling the scheduler), the reboot-switch is the only
way out.

When some user task executes outside the kernel, it doesn't
have the privileges to loop forever. A context switch will
occur and the CPU will be shared with others. However, when
that user task calls some kernel function, perhaps from
a driver interface, that function has the privileges to
keep the CPU forever. If the driver is improperly written,
it will.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by John Ashcroft.
98.36% of all statistics are fiction.

2004-11-16 20:13:46

by linux-os

Subject: Re: Work around a lockup?

On Tue, 16 Nov 2004, Jan Engelhardt wrote:

> Hello,
>
>
> I am currently looking into an issue where a host sporadically locks up. I will
> retrieve the SYSRQ+P tomorrow when I am back at the machine.
> Until then, here's the real question:
>
> Given that some kernel code (possibly a module) runs in an infinite loop, and
> thus not giving back control to the user (in an UP environment), is there a
> possibility to force a schedule?
> Something like the normal scheduler does to processes ("you got your timeslice,
> and not more"), but also when they are in kernel mode.
>
>
>
> Jan Engelhardt
> --

No driver code should ever wait forever. Some module code may
be broken where the writer assumed that some bit must eventually
be set or some FIFO must eventually empty, etc. Hardware breaks.

Every loop in kernel code, not just in drivers, needs some way
"out" if things don't go according to plan. To do that, you
have a coarse timer called "jiffies" and you have finer granularity
from counted-spin-loops. Never assume anything. DMA may never
complete, UART data-ready bits may never be true, SNICS (Network)
controllers may never be able to receive data, etc. Always have
a way to nicely fail a hardware interface request.
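
As a rough illustration of the "way out" idea above, here is a userspace
sketch, not kernel code. The names fake_dev, fake_dev_ready() and wait_ready()
are made up for this example, and the tick counter only stands in for jiffies;
a real driver would read a hardware register and compare against the real
timer instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a device: "ticks" plays the role of the
 * coarse timer (jiffies), "ready_at" is when the fake hardware reports
 * ready. Neither name comes from the kernel. */
typedef struct {
    uint32_t ticks;
    uint32_t ready_at;
} fake_dev;

/* Poll the fake ready bit; each poll advances simulated time by one tick. */
static bool fake_dev_ready(fake_dev *d)
{
    d->ticks++;
    return d->ticks >= d->ready_at;
}

/* Wait for the device, but never forever.
 * Returns 0 if it became ready, -1 if timeout_ticks elapsed first. */
static int wait_ready(fake_dev *d, uint32_t timeout_ticks)
{
    assert(timeout_ticks > 0);
    uint32_t start = d->ticks;

    while (!fake_dev_ready(d)) {
        /* Unsigned subtraction stays correct even if ticks wraps around. */
        if ((uint32_t)(d->ticks - start) >= timeout_ticks)
            return -1; /* fail the request nicely instead of hanging */
    }
    return 0;
}
```

A caller would treat -1 as a failed hardware request and report an error
upward rather than retrying without limit.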

If you need to wait a long time for something, you can execute
schedule_timeout(n) in your counted loop. This will give up
the CPU to other tasks while you are waiting. More sophisticated
code sleeps until interrupted, etc. Of course, the interrupt
may never happen so your driver needs to plan for that too.
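
The counted loop that gives up the CPU between polls might look roughly like
the following userspace sketch. It is only an analogy: nanosleep() stands in
for schedule_timeout(n), and cond_fn, wait_sleeping() and countdown_done() are
hypothetical names invented here. In the kernel the task state also has to be
set (for example with set_current_state()) before schedule_timeout() will
actually sleep.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback: returns true once the awaited condition holds. */
typedef bool (*cond_fn)(void *ctx);

/* Poll cond at most max_tries times, sleeping sleep_ms between polls so
 * other tasks can run. Returns the number of polls used on success,
 * or -1 if the condition never became true. */
static int wait_sleeping(cond_fn cond, void *ctx, int max_tries, long sleep_ms)
{
    assert(max_tries > 0);
    struct timespec ts = {
        .tv_sec  = sleep_ms / 1000,
        .tv_nsec = (sleep_ms % 1000) * 1000000L,
    };

    for (int i = 1; i <= max_tries; i++) {
        if (cond(ctx))
            return i;
        nanosleep(&ts, NULL); /* userspace stand-in for schedule_timeout(n) */
    }
    return -1; /* the event never happened; the caller must handle that */
}

/* Example condition: decrements a counter and is true once it hits zero. */
static bool countdown_done(void *ctx)
{
    int *left = ctx;
    if (*left > 0)
        (*left)--;
    return *left == 0;
}
```

The loop count bounds the total wait, so even if the event never arrives the
caller gets control back and can fail the request.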

There are numerous examples of kernel driver code where
the CPU schedules while the code waits for some event. But
beware: some procedures are being removed and some
methods are broken by design. Copy the code in newer drivers.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by John Ashcroft.
98.36% of all statistics are fiction.

2004-11-16 20:52:12

by Jan Engelhardt

Subject: Re: Work around a lockup?

>If there is a continuous loop inside the kernel, something outside
>the kernel (you) is never going to get control except from an
>interrupt. The keyboard interrupt is going to let you see what
>is happening, but you won't get any real control because the
>kernel is not a task. If the kernel were a task (like VMS),

(Surprise.) Yes, I can still ping it and initiate a connection (i.e. the queue
accepts it, because someone did listen() on the socket), but that's all. I bet
that's due to the network card generating an interrupt.

>you could (maybe) context-switch out of the kernel. But,
>the kernel is some common code that executes on behalf of
>all the tasks in the context of "current". If the current
>task is stuck inside the kernel code, it has nowhere to go.

Wait, an interrupt can ... well, interrupt a task, /even/ if it is in kernel
mode; otherwise jiffies would not get incremented. So would it not be possible
to call some sort of schedule() when do_timer() (or similar) is run?
Like:
foreach p in runqueue {
        if(p->location==KERNELSPACE && exceeded-kernelspace-timeslice) {
                switch_to(rq->next); // "never returns"
        }
}

>When some user task executes outside the kernel, it doesn't
>have the privileges to loop forever. A context switch will
>occur and the CPU will be shared with others. However, when
>that user task calls some kernel function, perhaps from
>a driver interface, that function has the privileges to
>keep the CPU forever. If the driver is improperly written,
>it will.

So to summarize what I need: a way to strip a process of the privilege to keep
the CPU forever when it is in kernel mode.



Jan Engelhardt
--
Gesellschaft für Wissenschaftliche Datenverarbeitung
Am Fassberg, 37077 Göttingen, http://www.gwdg.de

2004-11-16 21:15:34

by linux-os

Subject: Re: Work around a lockup?

On Tue, 16 Nov 2004, Jan Engelhardt wrote:

>> If there is a continuous loop inside the kernel, something outside
>> the kernel (you) is never going to get control except from an
>> interrupt. The keyboard interrupt is going to let you see what
>> is happening, but you won't get any real control because the
>> kernel is not a task. If the kernel were a task (like VMS),
>
> (Surprise.) Yes, I can still ping it and initiate a connection (i.e. the queue
> accepts it, because someone did listen() on the socket), but that's all. I bet
> that's due to the network card generating an interrupt.
>
>> you could (maybe) context-switch out of the kernel. But,
>> the kernel is some common code that executes on behalf of
>> all the tasks in the context of "current". If the current
>> task is stuck inside the kernel code, it has nowhere to go.
>
> Wait, an interrupt can ... well, interrupt a task, /even/ if it is in kernel
> mode; otherwise jiffies would not get incremented. So would it not be possible
> to call some sort of schedule() when do_timer() (or similar) is run?
> Like:
> foreach p in runqueue {
>         if(p->location==KERNELSPACE && exceeded-kernelspace-timeslice) {
>                 switch_to(rq->next); // "never returns"
>         }
> }
>

You can't schedule from an interrupt because, if (when) the
scheduled task calls the kernel to do something, the return
address, context info, etc., of the previously-interrupted
task will be overwritten and lost forever.

Now, VMS "solved" this non-problem, at great performance penalty,
by having a context-switch for everything. A hardware interrupt
generated a context-switch so the hardware interrupt could
certainly directly context-switch to a user-mode task. The
kernel was, itself, a task (called SWAPPER). When you called
the kernel (trapped to), a context-switch occurred and the
kernel did whatever in whatever order it wanted because it
didn't have to return to the caller right away (if ever).
In fact, it didn't even have to return to whatever got
interrupted. That task was just put into the queue of
runnable tasks.

The performance was nice for a single task that used, for
instance, a DR11 parallel-port board. An interrupt occurred
and the task got control right away after the data was
DMAed. Add more tasks and the system bogged down.
If you had 20 people compiling FORTRAN on an 11/780, it
took MINUTES to log in.

However, with such a system a high-priority task could
take the CPU away from anything. That meant that SYSTEM
could usually get control, assuming it was already logged
in. A dead driver was just marked unusable and everything
continued. Even dead RAM was able to be marked unusable.

Unix was invented to bypass all this stuff. The kernel
is not a task. It is just some privileged shared code.
Therefore, it can execute quickly. The trade-off is
that you need to fix bad drivers.


>> When some user task executes outside the kernel, it doesn't
>> have the privileges to loop forever. A context switch will
>> occur and the CPU will be shared with others. However, when
>> that user task calls some kernel function, perhaps from
>> a driver interface, that function has the privileges to
>> keep the CPU forever. If the driver is improperly written,
>> it will.
>
> So to summarize what I need: a way to strip a process of the privilege to keep
> the CPU forever when it is in kernel mode.
>

You can't. Once the kernel code starts executing, it must behave.

>
> Jan Engelhardt
> --
> Gesellschaft für Wissenschaftliche Datenverarbeitung
> Am Fassberg, 37077 Göttingen, http://www.gwdg.de
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.9 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by John Ashcroft.
98.36% of all statistics are fiction.