2013-06-14 14:40:55

by Kirill Tkhai

Subject: [PATCH] sched: schedule_raw_spin_unlock() and schedule_spin_unlock()

Add helpers to replace the repeating patterns:

1) raw_spin_unlock_irq(lock);
   schedule();
2) raw_spin_unlock_irqrestore(lock, flags);
   schedule();

(The same for spinlock_t)

They make it possible to avoid an excess preempt_schedule(), which can happen on a preemptible kernel.
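
For illustration, a caller could be converted like this (a hypothetical wait loop, not part of this patch; wait_for_flag() and its arguments are made up for the example):

static void wait_for_flag(raw_spinlock_t *lock, bool *flag)
{
        raw_spin_lock_irq(lock);
        while (!*flag) {
                set_current_state(TASK_UNINTERRUPTIBLE);
                /* was: raw_spin_unlock_irq(lock); schedule(); */
                schedule_raw_spin_unlock(lock);
                raw_spin_lock_irq(lock);
        }
        __set_current_state(TASK_RUNNING);
        raw_spin_unlock_irq(lock);
}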

Signed-off-by: Kirill Tkhai <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Peter Zijlstra <[email protected]>
---
include/linux/sched.h | 20 ++++++++++++++++++++
kernel/sched/core.c | 24 ++++++++++++++++++++++++
2 files changed, 44 insertions(+), 0 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8b10339..0ae79f3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -310,6 +310,26 @@ extern signed long schedule_timeout_uninterruptible(signed long timeout);
asmlinkage void schedule(void);
extern void schedule_preempt_disabled(void);

+#ifdef CONFIG_PREEMPT
+extern void schedule_raw_spin_unlock(raw_spinlock_t *lock);
+/* See comment for schedule_raw_spin_unlock() */
+static inline void schedule_spin_unlock(spinlock_t *lock)
+{
+        schedule_raw_spin_unlock(&lock->rlock);
+}
+#else
+static inline void schedule_raw_spin_unlock(raw_spinlock_t *lock)
+{
+        raw_spin_unlock_irq(lock);
+        schedule();
+}
+static inline void schedule_spin_unlock(spinlock_t *lock)
+{
+        spin_unlock_irq(lock);
+        schedule();
+}
+}
+#endif
+
struct nsproxy;
struct user_namespace;

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 58453b8..381e493 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3125,6 +3125,30 @@ asmlinkage void __sched preempt_schedule_irq(void)
         exception_exit(prev_state);
 }

+/*
+ * schedule_raw_spin_unlock - unlock raw_spinlock and call schedule()
+ *
+ * Should be used instead of the constructions
+ *   1) raw_spin_unlock_irq(lock);
+ *      schedule();
+ *   or
+ *   2) raw_spin_unlock_irqrestore(lock, flags);
+ *      schedule();
+ * where they have to be.
+ *
+ * It helps to prevent excess preempt_schedule() during the unlocking,
+ * which can be called on preemptible kernel.
+ * Returns with irqs enabled.
+ */
+void __sched schedule_raw_spin_unlock(raw_spinlock_t *lock)
+{
+        preempt_disable();
+        raw_spin_unlock_irq(lock);
+        sched_preempt_enable_no_resched();
+        schedule();
+}
+EXPORT_SYMBOL(schedule_raw_spin_unlock);
+
#endif /* CONFIG_PREEMPT */

int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,


2013-06-17 14:29:06

by Steven Rostedt

Subject: Re: [PATCH] sched: schedule_raw_spin_unlock() and schedule_spin_unlock()

On Fri, 2013-06-14 at 18:40 +0400, Kirill Tkhai wrote:
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 58453b8..381e493 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3125,6 +3125,30 @@ asmlinkage void __sched preempt_schedule_irq(void)
> exception_exit(prev_state);
> }
>
> +/*
> + * schedule_raw_spin_unlock - unlock raw_spinlock and call schedule()
> + *
> + * Should be used instead of the constructions
> + * 1) raw_spin_unlock_irq(lock);
> + * schedule();
> + * or
> + * 2) raw_spin_unlock_irqrestore(lock, flags);
> + * schedule();

Is there a place that does #2? If interrupts were disabled, the flags
would keep them disabled and that would not be good when calling
schedule.

> + * where they have to be.
> + *
> + * It helps to prevent excess preempt_schedule() during the unlocking,
> + * which can be called on preemptible kernel.
> + * Returns with irqs enabled.
> + */
> +void __sched schedule_raw_spin_unlock(raw_spinlock_t *lock)
> +{

I agree with the idea of adding this, but I don't like this
implementation. Also, if this is to enable interrupts, the name must
represent that:

schedule_raw_spin_unlock_irq()

You can't just enable them if they were not disabled. That will break
things like lockdep.


> + preempt_disable();
> + raw_spin_unlock_irq(lock);
> + sched_preempt_enable_no_resched();

This is the easy way of implementing this, but it does add a slight
overhead here. Adding and subtracting preempt count just to prevent the
disable does add a bit more computation. I've done this in the tracing
code, but it's within the tracing and in an unlikely path. The overhead
is not common. But I can see this being in a fast path and not something
that we want to add overhead to.

The ideal solution is not the easy one. It is to introduce a real
schedule_raw_spin_unlock() that is a copy of raw_spin_unlock() that does
not call preempt_enable, but calls preempt_enable_no_resched() instead,
and then does the schedule.
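
Something along these lines, perhaps (an untested sketch; the body just mirrors what raw_spin_unlock_irq() does today, with preempt_enable() swapped for preempt_enable_no_resched()):

void __sched schedule_raw_spin_unlock_irq(raw_spinlock_t *lock)
{
        /*
         * Same steps as raw_spin_unlock_irq(): lockdep release, the real
         * unlock, irqs back on -- but no reschedule check in the preempt
         * enable, since we call schedule() ourselves right after.
         */
        spin_release(&lock->dep_map, 1, _RET_IP_);
        do_raw_spin_unlock(lock);
        local_irq_enable();
        preempt_enable_no_resched();
        schedule();
}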

-- Steve



> + schedule();
> +}
> +EXPORT_SYMBOL(schedule_raw_spin_unlock);
> +
> #endif /* CONFIG_PREEMPT */
>
> int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,

2013-06-17 16:18:48

by Kirill Tkhai

Subject: Re: [PATCH] sched: schedule_raw_spin_unlock() and schedule_spin_unlock()



17.06.2013, 18:29, "Steven Rostedt" <[email protected]>:
> On Fri, 2013-06-14 at 18:40 +0400, Kirill Tkhai wrote:
>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 58453b8..381e493 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3125,6 +3125,30 @@ asmlinkage void __sched preempt_schedule_irq(void)
>>          exception_exit(prev_state);
>>  }
>>
>> +/*
>> + * schedule_raw_spin_unlock - unlock raw_spinlock and call schedule()
>> + *
>> + * Should be used instead of the constructions
>> + *   1) raw_spin_unlock_irq(lock);
>> + *      schedule();
>> + *   or
>> + *   2) raw_spin_unlock_irqrestore(lock, flags);
>> + *      schedule();
>
> Is there a place that does #2? If interrupts were disabled, the flags
> would keep them disabled and that would not be good when calling
> schedule.

Some drivers use _irqsave even when interrupts are certainly enabled; I was
thinking of those. It is a mistake, but people do it.

Now I agree with you; I'll remove advice #2 so as not to multiply these errors.

>> + * where they have to be.
>> + *
>> + * It helps to prevent excess preempt_schedule() during the unlocking,
>> + * which can be called on preemptible kernel.
>> + * Returns with irqs enabled.
>> + */
>> +void __sched schedule_raw_spin_unlock(raw_spinlock_t *lock)
>> +{
>
> I agree with the idea of adding this, but I don't like this
> implementation. Also, if this is to enable interrupts, the name must
> represent that:
>
>   schedule_raw_spin_unlock_irq()
>
> You can't just enable them if they were not disabled. That will break
> things like lockdep.

OK, and in addition a plain schedule_raw_spin_unlock() (without enabling irqs)
may be useful for cases like the ones in fs/*; see the sketch below.
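
Just to sketch what I mean (untested, along the lines you suggested for the
_irq variant, but leaving irqs alone):

void __sched schedule_raw_spin_unlock(raw_spinlock_t *lock)
{
        /* Lockdep release and the real unlock, then reschedule by hand. */
        spin_release(&lock->dep_map, 1, _RET_IP_);
        do_raw_spin_unlock(lock);
        preempt_enable_no_resched();
        schedule();
}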

>> +        preempt_disable();
>> +        raw_spin_unlock_irq(lock);
>> +        sched_preempt_enable_no_resched();
>
> This is the easy way of implementing this, but it does add a slight
> overhead here. Adding and subtracting preempt count just to prevent the
> disable does add a bit more computation. I've done this in the tracing
> code, but it's within the tracing and in an unlikely path. The overhead
> is not common. But I can see this being in a fast path and not something
> that we want to add overhead to.
>
> The ideal solution is not the easy one. It is to introduce a real
> schedule_raw_spin_unlock() that is a copy of raw_spin_unlock() that does
> not call preempt_enable, but calls preempt_enable_no_resched() instead,
> and then does the schedule.

Ok

> -- Steve
>
>> +        schedule();
>> +}
>> +EXPORT_SYMBOL(schedule_raw_spin_unlock);
>> +
>>  #endif /* CONFIG_PREEMPT */
>>
>>  int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,