2004-11-10 08:33:32

by Thomas Hood

[permalink] [raw]
Subject: [PATCH] Documentation/preempt-locking.txt clarification

On Tue, 2004-01-20 at 02:24, Robert Love wrote:
> On Mon, 2004-01-05 at 06:31, Thomas Hood wrote:
> > [...]
> > If I sent you a patch [for preempt-locking.txt] would you review
> > it and apply it if it improves the wording?
> [...]
> Sure, you can send me the patch. If you want, CC [email protected] and lkml
> and we can just apply it.


Here is a patch for Documentation/preempt-locking.txt.

I have tried to clarify the text while at the same time
preserving the original meaning. Everything is pretty clear
now except for the last paragraph which I still find baffling.
I don't know what "a test to see if preemption is required" is
exactly and I don't understand when such a test is required.
I read a number of source files but this didn't help.

It does need proof reading and if anyone would like to work on
it further then they're completely welcome.

Apart from that I have formatted the file to fit within 70
columns.

I also attach a patch. It was prepared some time ago for Linux
2.6.3 but it still applies to Linux 2.6.8.1.
// Thomas Hood


Proper locking in a Preemptible Kernel
Keeping kernel code preempt-safe
Robert Love <[email protected]>
Last Update: 24 Feb 2004


INTRODUCTION


Allowing preemption within kernel code raises new locking issues.
For the most part the issues are the same as those under SMP --
concurrency and reentrancy -- and fortunately SMP locking mechanisms
can be leveraged. The kernel requires additional explicit locking for
a few situations. The purpose of this document is to help kernel
hackers to understand the issues and to deal correctly with these
situations.


RULE #1: Explicitly protect per-CPU data structures


Two problems can arise. Consider this code snippet:

struct nifty data[NR_CPUS];
some_value = tux[smp_processor_id()] + ...;
/* imagine the task is preempted here... */
tux[smp_processor_id()] = some_value;
/* imagine the task is preempted here... */
something = tux[smp_processor_id()];

The first problem is that the local data may fail to be locked for SMP
purposes because tasks running on other CPUs cannot access it. But if
one task executing this code were to be preempted then another task on
the same CPU might interfere with the data.

Second, when the preempted task is finally rescheduled the value of
smp_processor_id() may be different from the old value.

For both of these reasons you must protect per-CPU data by disabling
preemption for the duration of the critical region.

The functions get_cpu() and put_cpu(), which disable preemption, may
be useful for this purpose.


RULE #2: Protect CPU state


In a preemptable kernel the state of the CPU must be protected.
Vulnerable are the CPU structures and state not preserved over context
switches. Which structures these are depends on the arch. On x86,
for example, entering and exiting FPU mode must happen in critical
sections that are executed with preemption disabled. Think what would
happen if a task was executing a floating-point instruction and was
preempted! (Remember, the kernel does not save FPU state except for
user tasks.) The FPU registers would be sold to the highest bidder.
Preemption must be disabled around such regions.

Note that some FPU functions have already been modified for preempt-
safety. For example, kernel_fpu_begin() disables preemption. On the
other hand, math_state_restore() must be called with preemption
already disabled.


RULE #3: A lock must be released by the task that acquires it


A lock acquired by one task must be released by the same task. This
means you can't do oddball things like acquire a lock and go off to
play, leaving another task to release it. If you want to do something
like that then acquire and release the lock in the first task and
make the other task wait on an event triggered by the first.


DISABLING PREEMPTION


Here are the functions that are used explicitly to disable and enable
preemption.

preempt_disable() increment the preempt counter
preempt_enable() decrement the preempt counter
preempt_enable_no_resched() decrement, but do not immediately preempt
preempt_check_resched() reschedule if needed
preempt_count() return the preempt count

The disable and enable functions are nestable. In other words, you
can call preempt_disable() n times in a code path and preemption will
not be re-enabled until the n'th call to preempt_enable().

These are actually macros which are defined as no-ops if the kernel
is configured not to support preemption.

Note that you do not need explicitly to disable preemption if any
locks are held or if interrupts are disabled, since preemption is
implicitly disabled in those cases.

However, keep in mind that relying on irqs being disabled is a risky
business. Any spin_unlock() that decreases the preemption count to 0
can trigger a reschedule. Even a simple printk() might trigger such
a reschedule. So rely on implicit preemption-disabling only if you
know that this sort of thing cannot happen in your code path. The
best policy is to rely on implicit preemption-disabling only for
short times and only so long as your remain within your own code.


EXAMPLE -- EXPLICITLY PREVENTING PREEMPTION


An example of disabling preemption using the above functions:

cpucache_t *cc; /* pointer to per-CPU data */
/* ... */
preempt_disable();
cc = cc_data(searchp);
if (cc && cc->avail) {
__free_block(searchp, cc_entry(cc), cc->avail);
cc->avail = 0;
}
preempt_enable();
return 0;

Notice how the preemption-disabled range must encompass every
reference to the per-CPU data.

Another example:

int buf[NR_CPUS]; /* per-CPU data */
/* ... */
set_cpu_val(buf);
if (buf[smp_processor_id()] == -1) printf(KERN_INFO "wee!\n");
spin_lock(&buf_lock);
/* ... */

This code is not preempt-safe, but see how easily we can fix it by
simply moving the spin_lock() up two lines.


EXAMPLE -- PREVENTING PREEMPTION USING INTERRUPT DISABLING


It is possible to prevent preemption using local_irq_disable() and
local_irq_save(). If you do this you must be very careful not to
cause an event that would set need_resched and result in a preemption
check. When in doubt, rely on locking or explicit preemption
disabling.

Note that as of 2.5 interrupt disabling is always local (i.e.,
per-CPU).

An additional concern is proper usage of local_irq_disable() and
local_irq_save(). These may be used to prevent preemption. However,
on exit, if preemption may be enabled, a test to see if preemption is
required should be done. If these are called from the spin_lock and
read/write lock macros then the right thing is done. They may also be
called from within spinlock-protected regions and from interrupt
contexts and bottom halves/tasklets (which are protected by preemption
locks) without further ado. However, if they are ever called from
outside of such contexts then a test for preemption should be done.



Attachments:
preempt-locking.txt.patch (10.97 kB)

2004-11-10 08:57:52

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Documentation/preempt-locking.txt clarification

Thomas Hood <[email protected]> wrote:
>
> I have tried to clarify the text while at the same time
> preserving the original meaning. Everything is pretty clear
> now except for the last paragraph which I still find baffling.
> I don't know what "a test to see if preemption is required" is
> exactly and I don't understand when such a test is required.

I guess it's saying that if you do this:

local_irq_disable();
lah_de_dah();
local_irq_enable();

then you should follow that by

preempt_check_resched();

just in case someone told this task to preempt itself while it had
interrupts disabled.

But I don't see why that's needed: if the preempt command came from another
CPU then this CPU will take the cross-CPU interrupt as soon as interrupts
are enabled. And the preempt command couldn't have come from _this_ CPU,
because it had interrupts disabled.

2004-11-10 09:39:44

by Thomas Hood

[permalink] [raw]
Subject: Re: [PATCH] Documentation/preempt-locking.txt clarification

On Wed, 2004-11-10 at 09:57, Andrew Morton wrote:
> I guess it's saying ...

Thanks for the explanation. I include a new patch which incorporates
your example. I am in no position to judge the _truth_ of the
statements in this document; I am only hoping to _understand_ them. :)

// Thomas Hood


Proper locking in a Preemptible Kernel
Keeping kernel code preempt-safe
Robert Love <[email protected]>
Last Update: 10 Nov 2004


INTRODUCTION


Allowing preemption within kernel code raises locking issues that
don't arise for a non-preemptable kernel. For the most part the
issues are the same as those under SMP -- concurrency and reentrancy --
and fortunately SMP locking mechanisms can be leveraged. The kernel
requires additional explicit locking for a few situations. The purpose
of this document is to help kernel hackers to understand the issues
and to deal correctly with these situations.


RULE #1: Explicitly protect per-CPU data structures


Two problems can arise. Consider this code snippet:

struct nifty data[NR_CPUS];
data[smp_processor_id()] = ...;
/* imagine the task is preempted here */
some_value = data[smp_processor_id()];
/* imagine the task is preempted here */
data[smp_processor_id()] = some_value + ...;

The first problem is that local data may lack SMP locking (because
tasks running on other CPUs cannot access it); but when preemption is
possible, if one task executing this code is preempted then another
task on the same CPU can interfere with the data.

Second, when the preempted task is finally rescheduled the value of
smp_processor_id() may be different from the old value.

For both of these reasons you must protect per-CPU data by disabling
preemption for the duration of the critical region.

The functions get_cpu() and put_cpu(), which disable preemption, may
be useful for this purpose.


RULE #2: Protect CPU state


In a preemptable kernel the state of the CPU must be protected.
Vulnerable are the CPU structures and state not preserved over context
switches. Which structures these are depends on the arch. On x86,
for example, entering and exiting FPU mode must happen in critical
sections that are executed with preemption disabled. Think what would
happen if a task was executing a floating-point instruction and was
preempted! (Remember, the kernel does not save FPU state except for
user tasks.) The FPU registers would be sold to the highest bidder.
Preemption must be disabled around such regions.

Note that some FPU functions have already been modified for preempt-
safety. For example, kernel_fpu_begin() disables preemption. On the
other hand, math_state_restore() must be called with preemption
already disabled.


RULE #3: A lock must be released by the task that acquires it


A lock acquired by one task must be released by the same task. This
means you can't do oddball things like acquire a lock and go off to
play, leaving another task to release the lock. If you want to do
something like that then acquire and release the lock in the first
task and make the other task wait on an event triggered by the first.


DISABLING PREEMPTION


Here are the functions that are used explicitly to disable and enable
preemption.

preempt_disable() increment the preempt counter
preempt_enable() decrement the preempt counter
preempt_enable_no_resched() decrement, but do not immediately preempt
preempt_check_resched() reschedule if needed
preempt_count() return the preempt count

The disable and enable functions are nestable. In other words, you
can call preempt_disable() n times in a code path and preemption will
not be re-enabled until the n'th call to preempt_enable().

These are actually macros which are defined as no-ops if the kernel
is configured not to support preemption.

Note that you do not need explicitly to disable preemption if any
locks are held or if interrupts are disabled, since preemption is
implicitly disabled in those cases.

However, keep in mind that relying on irqs being disabled is a risky
business. Any spin_unlock() that decreases the preemption count to 0
can trigger a reschedule. Even a simple printk() might trigger such
a reschedule. So rely on implicit preemption-disabling only if you
know that this sort of thing cannot happen in your code path. The
best policy is to rely on implicit preemption-disabling only for
short times and only so long as your remain within your own code.


EXAMPLE -- EXPLICITLY PREVENTING PREEMPTION


An example of disabling preemption using the above functions:

cpucache_t *cc; /* pointer to per-CPU data */
/* ... */
preempt_disable();
cc = cc_data(searchp);
if (cc && cc->avail) {
__free_block(searchp, cc_entry(cc), cc->avail);
cc->avail = 0;
}
preempt_enable();
return 0;

Notice how the preemption-disabled range must encompass every
reference to the per-CPU data.

Another example:

int buf[NR_CPUS]; /* per-CPU data */
/* ... */
set_cpu_val(buf);
if (buf[smp_processor_id()] == -1) printf(KERN_INFO "wee!\n");
spin_lock(&buf_lock);
/* ... */

This code is not preempt-safe, but see how easily we can fix it by
simply moving the spin_lock() up two lines.


EXAMPLE -- PREVENTING PREEMPTION USING INTERRUPT DISABLING


It is possible to prevent preemption using local_irq_disable() and
local_irq_save(). If you do this you must be very careful not to
cause an event that would set need_resched and result in a preemption
check. When in doubt, rely on locking or explicit preemption
disabling.

Note that as of 2.5 interrupt disabling is always local (i.e.,
per-CPU).

An additional concern is proper usage of local_irq_disable() and
local_irq_save(). These may be used to prevent preemption. If these
are called from the spin_lock and read/write lock macros then the
right thing is done. They may also be called from within spinlock-
protected regions and from interrupt contexts and bottom halves /
tasklets (which are protected by preemption locks) without further
ado. However, if they are ever called from outside of such contexts
then a test for preemption should be done. E.g., if you do this:

local_irq_disable();
lah_de_dah();
local_irq_enable();

then you should follow that by:

preempt_check_resched();

just in case someone told this task to preempt itself while it had
interrupts disabled.



Attachments:
preempt-locking.txt_jdth20041110_2.patch (11.17 kB)

2004-11-10 09:49:51

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Documentation/preempt-locking.txt clarification

Thomas Hood <[email protected]> wrote:
>
> On Wed, 2004-11-10 at 09:57, Andrew Morton wrote:
> > I guess it's saying ...
>
> Thanks for the explanation. I include a new patch which incorporates
> your example. I am in no position to judge the _truth_ of the
> statements in this document; I am only hoping to _understand_ them. :)

I think the statement is in fact false. Ingo, what's your take on this
paragraph, from preempt-locking.txt?


An additional concern is proper usage of local_irq_disable and
local_irq_save. These may be used to protect from preemption, however,
on exit, if preemption may be enabled, a test to see if preemption is
required should be done. If these are called from the spin_lock and
read/write lock macros, the right thing is done. They may also be called
within a spin-lock protected region, however, if they are ever called
outside of this context, a test for preemption should be made. Do note
that calls from interrupt context or bottom half/ tasklets are also
protected by preemption locks and so may use the versions which do not
check preemption.

2004-11-10 09:58:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] Documentation/preempt-locking.txt clarification


* Andrew Morton <[email protected]> wrote:

> I think the statement is in fact false. Ingo, what's your take on
> this paragraph, from preempt-locking.txt?
>
> An additional concern is proper usage of local_irq_disable and
> local_irq_save. These may be used to protect from preemption,
> however, on exit, if preemption may be enabled, a test to see if
> preemption is required should be done. If these are called from the
> spin_lock and read/write lock macros, the right thing is done. They
> may also be called within a spin-lock protected region, however, if
> they are ever called outside of this context, a test for preemption
> should be made. Do note that calls from interrupt context or bottom
> half/ tasklets are also protected by preemption locks and so may use
> the versions which do not check preemption.

seems mostly correct. The issue is that if a wakeup is done from within
an irqs-off critical section (perfectly possible - a simple printk can
trigger a wakeup) then the current task may be marked for rescheduling
but cannot do it just yet. So when interrupts are re-enabled again
(outside of a critical section) a manual preempt_check_resched() is
necessary in the generic case, or else we miss the reschedule.

In special cases, if no wakeup may happen from within the irqs-off
section then the manual preemption can be skipped, because asynchronous
reschedules (e.g. on SMP) always come in the form of interrupts. I'd
wager that in fact these 'special cases' are in the majority, but
there's no guarantee.

Ingo

2004-11-10 10:01:32

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] Documentation/preempt-locking.txt clarification


* Andrew Morton <[email protected]> wrote:

> But I don't see why that's needed: if the preempt command came from
> another CPU then this CPU will take the cross-CPU interrupt as soon as
> interrupts are enabled. And the preempt command couldn't have come
> from _this_ CPU, because it had interrupts disabled.

this CPU can easily trigger a reschedule by _waking_ another task
synchronously (from within the irqs-off section) and hence causing
itself to preempt. I'd not say it's common, but it can happen in quite
unexpected places, so i'd encourage us to keep the more cautious advice
that is in the current text.

Ingo