Linus,
Misc. kernel preemption-related bits. Specifically,
- update to Documentation/preempt-locking.txt (me)
- preempt-safe arch/i386/kernel/ioport.c :: sys_ioperm()
(George Anzinger)
- remove "kernel_lock()" cruft in include/linux/smp.h
(Andrew Morton)
- we have a debug check in preempt_schedule that, even
on detecting a schedule with irqs disabled, still goes
ahead and reschedules. We should return. (me)
- preempt-safe net/core/dev.c :: netif_rx() (George Anzinger)
All fairly trivial and/or simple. Patch is against 2.5.32-bk. Please
apply.
Robert Love
diff -urN linux-2.5.32/Documentation/preempt-locking.txt linux/Documentation/preempt-locking.txt
--- linux-2.5.32/Documentation/preempt-locking.txt Tue Aug 27 15:26:32 2002
+++ linux/Documentation/preempt-locking.txt Wed Aug 28 23:23:30 2002
@@ -1,7 +1,7 @@
Proper Locking Under a Preemptible Kernel:
Keeping Kernel Code Preempt-Safe
- Robert Love <[email protected]>
- Last Updated: 22 Jan 2002
+ Robert Love <[email protected]>
+ Last Updated: 28 Aug 2002
INTRODUCTION
@@ -112,3 +112,24 @@
This code is not preempt-safe, but see how easily we can fix it by simply
moving the spin_lock up two lines.
+
+
+PREVENTING PREEMPTION USING INTERRUPT DISABLING
+
+
+It is possible to prevent a preemption event using local_irq_disable and
+local_irq_save. Note, when doing so, you must be very careful not to cause
+an event that would set need_resched and result in a preemption check. When
+in doubt, rely on locking or explicit preemption disabling.
+
+Note that in 2.5 interrupt disabling is now only per-CPU (i.e. local).
+
+An additional concern is proper usage of local_irq_disable and local_irq_save.
+These may be used to protect against preemption; however, on exit, if
+preemption may be enabled, a test to see whether preemption is required should
+be made. When these are called from the spin_lock and read/write lock macros,
+the right thing is done automatically. They may also be called within a
+spin-lock protected region; if they are ever called outside of such a context,
+however, an explicit test for preemption should be made. Note that calls from
+interrupt context or bottom halves/tasklets are already protected by
+preemption locks and so may use the versions which do not check preemption.
diff -urN linux-2.5.32/arch/i386/kernel/ioport.c linux/arch/i386/kernel/ioport.c
--- linux-2.5.32/arch/i386/kernel/ioport.c Tue Aug 27 15:26:42 2002
+++ linux/arch/i386/kernel/ioport.c Wed Aug 28 23:23:30 2002
@@ -55,12 +55,16 @@
asmlinkage int sys_ioperm(unsigned long from, unsigned long num, int turn_on)
{
struct thread_struct * t = &current->thread;
- struct tss_struct * tss = init_tss + smp_processor_id();
+ struct tss_struct * tss;
+ int ret = 0;
if ((from + num <= from) || (from + num > IO_BITMAP_SIZE*32))
return -EINVAL;
if (turn_on && !capable(CAP_SYS_RAWIO))
return -EPERM;
+
+ tss = init_tss + get_cpu();
+
/*
* If it's the first ioperm() call in this thread's lifetime, set the
* IO bitmap up. ioperm() is much less timing critical than clone(),
@@ -69,8 +73,11 @@
if (!t->ts_io_bitmap) {
unsigned long *bitmap;
bitmap = kmalloc(IO_BITMAP_BYTES, GFP_KERNEL);
- if (!bitmap)
- return -ENOMEM;
+ if (!bitmap) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
/*
* just in case ...
*/
@@ -88,7 +95,9 @@
set_bitmap(t->ts_io_bitmap, from, num, !turn_on);
set_bitmap(tss->io_bitmap, from, num, !turn_on);
- return 0;
+out:
+ put_cpu();
+ return ret;
}
/*
diff -urN linux-2.5.32/include/linux/smp.h linux/include/linux/smp.h
--- linux-2.5.32/include/linux/smp.h Tue Aug 27 15:26:43 2002
+++ linux/include/linux/smp.h Wed Aug 28 23:23:30 2002
@@ -87,9 +87,6 @@
#define smp_processor_id() 0
#define hard_smp_processor_id() 0
#define smp_threads_ready 1
-#ifndef CONFIG_PREEMPT
-#define kernel_lock()
-#endif
#define smp_call_function(func,info,retry,wait) ({ 0; })
static inline void smp_send_reschedule(int cpu) { }
static inline void smp_send_reschedule_all(void) { }
diff -urN linux-2.5.32/kernel/sched.c linux/kernel/sched.c
--- linux-2.5.32/kernel/sched.c Tue Aug 27 15:26:37 2002
+++ linux/kernel/sched.c Wed Aug 28 23:23:24 2002
@@ -1039,6 +1039,7 @@
printk("bad: schedule() with irqs disabled!\n");
show_stack(NULL);
preempt_enable_no_resched();
+ return;
}
need_resched:
diff -urN linux-2.5.32/net/core/dev.c linux/net/core/dev.c
--- linux-2.5.32/net/core/dev.c Tue Aug 27 15:26:43 2002
+++ linux/net/core/dev.c Wed Aug 28 23:23:30 2002
@@ -1229,19 +1229,20 @@
int netif_rx(struct sk_buff *skb)
{
- int this_cpu = smp_processor_id();
+ int this_cpu;
struct softnet_data *queue;
unsigned long flags;
if (!skb->stamp.tv_sec)
do_gettimeofday(&skb->stamp);
- /* The code is rearranged so that the path is the most
- short when CPU is congested, but is still operating.
+ /*
+ * The code is rearranged so that the path is the most
+ * short when CPU is congested, but is still operating.
*/
- queue = &softnet_data[this_cpu];
-
local_irq_save(flags);
+ this_cpu = smp_processor_id();
+ queue = &softnet_data[this_cpu];
netdev_rx_stat[this_cpu].total++;
if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
@@ -1252,10 +1253,10 @@
enqueue:
dev_hold(skb->dev);
__skb_queue_tail(&queue->input_pkt_queue, skb);
- local_irq_restore(flags);
#ifndef OFFLINE_SAMPLE
get_sample_stats(this_cpu);
#endif
+ local_irq_restore(flags);
return queue->cng_level;
}
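Both fixes follow the same rule: smp_processor_id() is only stable while
the task cannot be preempted and migrated, so the CPU number must be
obtained either under get_cpu()/put_cpu() or with local interrupts
disabled. A minimal sketch of the pattern (the array and function names
below are illustrative, not from the patch):

#include <linux/smp.h>
#include <linux/threads.h>

static int per_cpu_counter[NR_CPUS];	/* illustrative per-CPU data */

static void touch_per_cpu_data(void)
{
	int cpu = get_cpu();	/* disables preemption, returns this CPU's id */

	per_cpu_counter[cpu]++;	/* safe: we cannot migrate off this CPU */

	put_cpu();		/* re-enables preemption */
}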
Robert Love wrote:
>
> ...
> - we have a debug check in preempt_schedule that, even
> on detecting a schedule with irqs disabled, still goes
> ahead and reschedules. We should return. (me)
>
OK, but that warning will still come out of the mess in mm/slab.c.
Reminder:
CPU0:				CPU1:

local_irq_disable();		resched_task(task on CPU0)
spin_lock();
...				p->need_resched = 1;
spin_unlock();			// reschedules
local_irq_enable();
There is one code path in slab which fixes this with _raw_spin_unlock()
and a subsequent preempt_disable(), but there are quite a lot of other
code paths which need the same treatment. Fixing them is messy.
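For reference, that existing workaround has roughly this shape (a sketch of
the general idea only, not the verbatim slab code; cachep is illustrative):

unsigned long flags;

local_irq_save(flags);
spin_lock(&cachep->spinlock);		/* also raises the preempt count */
/* ... another CPU may set need_resched here ... */
_raw_spin_unlock(&cachep->spinlock);	/* drop the lock, preemption stays off */
local_irq_restore(flags);		/* irqs back on */
preempt_enable();			/* deferred reschedule point, now safe */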
So unless you guys hit me with a brick, I'll create:
#ifdef CONFIG_SMP
#define preempt_disable_irqsave(flags)		\
	do {					\
		preempt_disable();		\
		local_irq_save(flags);		\
	} while (0)

#define preempt_enable_irqrestore(flags)	\
	do {					\
		local_irq_restore(flags);	\
		preempt_enable();		\
	} while (0)
#else
#define preempt_disable_irqsave(flags)		\
	do {					\
		local_irq_save(flags);		\
	} while (0)

#define preempt_enable_irqrestore(flags)	\
	do {					\
		local_irq_restore(flags);	\
	} while (0)
#endif
and use that in slab.
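In use, that would look something like this (a sketch; the lock name is
illustrative). Note that the outer preempt_disable() is what keeps the
inner spin_unlock() from rescheduling while interrupts are still off:

unsigned long flags;

preempt_disable_irqsave(flags);
spin_lock(&cachep->spinlock);		/* preempt count now 2 */
/* ... manipulate per-CPU slab state ... */
spin_unlock(&cachep->spinlock);		/* count drops to 1: no resched here */
preempt_enable_irqrestore(flags);	/* irqs on first, then the resched check */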
On 29 Aug 2002, Robert Love wrote:
>
> - we have a debug check in preempt_schedule that, even
> on detecting a schedule with irqs disabled, still goes
> ahead and reschedules. We should return. (me)
I think we should return silently, and simply consider the case of
disabled local interrupts to be equivalent to having preemption disabled.
So I would remove even the warning.
Comments?
Linus
On Thu, 2002-08-29 at 14:38, Linus Torvalds wrote:
> I think we should return silently, and simply consider the case of
> disabled local interrupts to be equivalent to having preemption disabled.
>
> So I would remove even the warning.
>
> Comments?
This is a tough question.
I originally did just that but Ingo said we should aim to find the
problem areas, too. The issue is, for 99% of the cases, disabling
interrupts really is equivalent to disabling preemption (i.e.
preempt_schedule() is never called). For the remaining 1% of the cases,
it is possible to fix up the problems by handling the interrupts-off
case safely.
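One hypothetical shape of the 1% case (illustrating the race Andrew
described; the lock and wait queue names are made up):

local_irq_disable();
spin_lock(&some_lock);
wake_up(&some_waitqueue);	/* may set need_resched on this CPU */
spin_unlock(&some_lock);	/* preempt check fires with irqs still off */
local_irq_enable();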
We _must_ return since we are seeing these in the wild. If we want to
leave the debug check in place to try to "fix" the remaining cases, we
can do so too.
How about this: add the return now (i.e. accept the patch as-is) and
keep the debug check so we can continue to find areas that cause
incorrect preemptions. Before 2.6, I will send a patch to remove the
check and just return silently.
Sound good?
Robert Love
On Thu, 2002-08-29 at 14:24, Andrew Morton wrote:
> Robert Love wrote:
> >
> > ...
> > - we have a debug check in preempt_schedule that, even
> > on detecting a schedule with irqs disabled, still goes
> > ahead and reschedules. We should return. (me)
> >
>
> OK, but that warning will still come out of the mess in mm/slab.c.
Yah, I saw your previous post. There is also a report or two of an oops on
poisoned memory with SLAB_DEBUG and preempt enabled - which should be
fixed by the above return, but one report says it does not.
Anyhow, your new methods are fine... if they work, they work.
I am starting to get concerned about the number of routines we are having
to define. It is getting complicated (e.g. the
inc_preempt_non_preempt() stuff).
Maybe keep these local to mm/slab.c ?
Robert Love
Robert Love wrote:
>
> On Thu, 2002-08-29 at 14:24, Andrew Morton wrote:
>
> > Robert Love wrote:
> > >
> > > ...
> > > - we have a debug check in preempt_schedule that, even
> > > on detecting a schedule with irqs disabled, still goes
> > > ahead and reschedules. We should return. (me)
> > >
> >
> > OK, but that warning will still come out of the mess in mm/slab.c.
>
> Yah I saw your previous post. There is also a report or two re oops on
> poisoned memory with SLAB_DEBUG and preempt enabled - which should be
> fixed by the above return but one report says it does not.
>
> Anyhow, your new methods are fine... if they work, they work.
>
> I am starting to get concerned over the number of routines we are having
> to define. It is getting complicated (i.e. the
> inc_preempt_non_preempt() stuff).
>
> Maybe keep these local to mm/slab.c ?
Robert, I think what is needed is the changes I sent you for
spinlock.h and system.h. Could you convert to 2.5 and post?
All, I defined:
local_irq_restore_preempt(x)
local_irq_enable_preempt()
local_irq_restore_ck_preempt(x)
local_irq_enable_ck_preempt()
The preempt versions are to be used in the spin_unlock_*
macros. The ck_preempt versions in open code where we do
need to check preemption. I also redefined:
local_irq_enable()
local_irq_restore(x)
to call _local_irq_* and then test_preempt(x).
test_preempt() then, if the CONFIG_ option enables it, prints a warning
message the first time a test AFTER the irq restore finds a preemptible
condition AND, in the restore case, finds the interrupt system now on.
The net result is that all the code is the same, but we can
now turn on a test to see if there are some preemptions we
are missing. Usually this would come from calling wake_up() with
interrupts turned off.
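A rough sketch of what the enable-side variants might look like, based on
the description above (the names and bodies here are assumptions; the real
patch was posted separately):

/* for the spin_unlock_* macros, which must drop the preempt count: */
#define local_irq_enable_preempt()		\
	do {					\
		_local_irq_enable();		\
		preempt_enable();		\
	} while (0)

/* for open code that needs an explicit preemption check: */
#define local_irq_enable_ck_preempt()		\
	do {					\
		_local_irq_enable();		\
		preempt_check_resched();	\
	} while (0)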
--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml