LinuxLists.cc - [PATCH 13/21] sched: Add p->pi_lock to task_rq

Subject: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

Commit-ID: 0122ec5b02f766c355b3168df53a6c038a24fa0d
Gitweb: http://git.kernel.org/tip/0122ec5b02f766c355b3168df53a6c038a24fa0d
Author: Peter Zijlstra <[email protected]>
AuthorDate: Tue, 5 Apr 2011 17:23:51 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 14 Apr 2011 08:52:38 +0200

sched: Add p->pi_lock to task_rq_lock()

In order to be able to call set_task_cpu() while either holding
p->pi_lock or task_rq(p)->lock we need to hold both locks in order to
stabilize task_rq().

This makes task_rq_lock() acquire both locks, and have
__task_rq_lock() validate that p->pi_lock is held. This increases the
locking overhead for most scheduler syscalls but allows reduction of
rq->lock contention for some scheduler hot paths (ttwu).

Reviewed-by: Frank Rowand <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Andrew Morton <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched.c | 103 +++++++++++++++++++++++++------------------------------
1 files changed, 47 insertions(+), 56 deletions(-)
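
A note on the rule in the changelog, as a minimal userspace sketch rather than kernel code: if a writer may change a task's CPU while holding either p->pi_lock or the runqueue lock, then a reader that wants task_rq() to stay put has to hold both. Everything below (the toy_* names, the pthread mutexes standing in for raw spinlocks, the two-entry lock array) is an assumption made for illustration and appears nowhere in the patch.

```c
#include <assert.h>
#include <pthread.h>

/* Toy stand-ins for the kernel objects; every name here is hypothetical. */
struct toy_task {
	pthread_mutex_t pi_lock;	/* plays the role of p->pi_lock */
	int cpu;			/* index of the toy runqueue the task is on */
};

/* One lock per toy runqueue, playing the role of rq->lock. */
static pthread_mutex_t toy_rq_lock[2] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* Writer that relies on pi_lock only. Returns 1 if it changed t->cpu. */
static int toy_set_cpu_under_pi_lock(struct toy_task *t, int new_cpu)
{
	if (pthread_mutex_trylock(&t->pi_lock) != 0)
		return 0;	/* pi_lock is held elsewhere: writer excluded */
	t->cpu = new_cpu;
	pthread_mutex_unlock(&t->pi_lock);
	return 1;
}

/* Writer that relies on the task's current runqueue lock only. */
static int toy_set_cpu_under_rq_lock(struct toy_task *t, int new_cpu)
{
	pthread_mutex_t *lock = &toy_rq_lock[t->cpu];

	if (pthread_mutex_trylock(lock) != 0)
		return 0;	/* rq lock is held elsewhere: writer excluded */
	t->cpu = new_cpu;
	pthread_mutex_unlock(lock);
	return 1;
}

/*
 * A "reader" holds the requested subset of locks for a task on CPU 0,
 * then both kinds of writer try to move the task. The return value is
 * the number of writers that managed to change t->cpu regardless.
 */
static int toy_writers_that_slip_through(int hold_pi, int hold_rq)
{
	struct toy_task t;
	int rc, slipped = 0;

	rc = pthread_mutex_init(&t.pi_lock, NULL);
	assert(rc == 0);
	(void)rc;
	t.cpu = 0;

	if (hold_pi)
		pthread_mutex_lock(&t.pi_lock);
	if (hold_rq)
		pthread_mutex_lock(&toy_rq_lock[0]);

	slipped += toy_set_cpu_under_pi_lock(&t, 1);
	t.cpu = 0;	/* reset so the next writer targets runqueue 0 again */
	slipped += toy_set_cpu_under_rq_lock(&t, 1);

	if (hold_rq)
		pthread_mutex_unlock(&toy_rq_lock[0]);
	if (hold_pi)
		pthread_mutex_unlock(&t.pi_lock);
	pthread_mutex_destroy(&t.pi_lock);
	return slipped;
}
```

Calling toy_writers_that_slip_through() with only one of the two flags set lets one writer through; only holding both locks keeps both writers out, which is the property task_rq_lock() is changed to provide.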

diff --git a/kernel/sched.c b/kernel/sched.c
index 6b269b7..f155127 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -599,7 +599,7 @@ static inline int cpu_of(struct rq *rq)
* Return the group to which this tasks belongs.
*
* We use task_subsys_state_check() and extend the RCU verification
- * with lockdep_is_held(&task_rq(p)->lock) because cpu_cgroup_attach()
+ * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
* holds that lock for each task it moves into the cgroup. Therefore
* by holding that lock, we pin the task to the current cgroup.
*/
@@ -609,7 +609,7 @@ static inline struct task_group *task_group(struct task_struct *p)
struct cgroup_subsys_state *css;

css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
- lockdep_is_held(&task_rq(p)->lock));
+ lockdep_is_held(&p->pi_lock));
tg = container_of(css, struct task_group, css);

return autogroup_task_group(p, tg);
@@ -924,23 +924,15 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
#endif /* __ARCH_WANT_UNLOCKED_CTXSW */

/*
- * Check whether the task is waking, we use this to synchronize ->cpus_allowed
- * against ttwu().
- */
-static inline int task_is_waking(struct task_struct *p)
-{
- return unlikely(p->state == TASK_WAKING);
-}
-
-/*
- * __task_rq_lock - lock the runqueue a given task resides on.
- * Must be called interrupts disabled.
+ * __task_rq_lock - lock the rq @p resides on.
*/
static inline struct rq *__task_rq_lock(struct task_struct *p)
__acquires(rq->lock)
{
struct rq *rq;

+ lockdep_assert_held(&p->pi_lock);
+
for (;;) {
rq = task_rq(p);
raw_spin_lock(&rq->lock);
@@ -951,22 +943,22 @@ static inline struct rq *__task_rq_lock(struct task_struct *p)
}

/*
- * task_rq_lock - lock the runqueue a given task resides on and disable
- * interrupts. Note the ordering: we can safely lookup the task_rq without
- * explicitly disabling preemption.
+ * task_rq_lock - lock p->pi_lock and lock the rq @p resides on.
*/
static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)
+ __acquires(p->pi_lock)
__acquires(rq->lock)
{
struct rq *rq;

for (;;) {
- local_irq_save(*flags);
+ raw_spin_lock_irqsave(&p->pi_lock, *flags);
rq = task_rq(p);
raw_spin_lock(&rq->lock);
if (likely(rq == task_rq(p)))
return rq;
- raw_spin_unlock_irqrestore(&rq->lock, *flags);
+ raw_spin_unlock(&rq->lock);
+ raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
}
}

@@ -976,10 +968,13 @@ static void __task_rq_unlock(struct rq *rq)
raw_spin_unlock(&rq->lock);
}

-static inline void task_rq_unlock(struct rq *rq, unsigned long *flags)
+static inline void
+task_rq_unlock(struct rq *rq, struct task_struct *p, unsigned long *flags)
__releases(rq->lock)
+ __releases(p->pi_lock)
{
- raw_spin_unlock_irqrestore(&rq->lock, *flags);
+ raw_spin_unlock(&rq->lock);
+ raw_spin_unlock_irqrestore(&p->pi_lock, *flags);
}

/*
@@ -2175,6 +2170,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
*/
WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
+
+#ifdef CONFIG_LOCKDEP
+ WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
+ lockdep_is_held(&task_rq(p)->lock)));
+#endif
#endif

trace_sched_migrate_task(p, new_cpu);
@@ -2270,7 +2270,7 @@ unsigned long wait_task_inactive(struct task_struct *p, long match_state)
ncsw = 0;
if (!match_state || p->state == match_state)
ncsw = p->nvcsw | LONG_MIN; /* sets MSB */
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);

/*
* If it changed from the expected state, bail out now.
@@ -2652,6 +2652,7 @@ static void __sched_fork(struct task_struct *p)
*/
void sched_fork(struct task_struct *p, int clone_flags)
{
+ unsigned long flags;
int cpu = get_cpu();

__sched_fork(p);
@@ -2702,9 +2703,9 @@ void sched_fork(struct task_struct *p, int clone_flags)
*
* Silence PROVE_RCU.
*/
- rcu_read_lock();
+ raw_spin_lock_irqsave(&p->pi_lock, flags);
set_task_cpu(p, cpu);
- rcu_read_unlock();
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);

#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
if (likely(sched_info_on()))
@@ -2753,7 +2754,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
set_task_cpu(p, cpu);

p->state = TASK_RUNNING;
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
#endif

rq = task_rq_lock(p, &flags);
@@ -2765,7 +2766,7 @@ void wake_up_new_task(struct task_struct *p, unsigned long clone_flags)
if (p->sched_class->task_woken)
p->sched_class->task_woken(rq, p);
#endif
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
put_cpu();
}

@@ -3490,12 +3491,12 @@ void sched_exec(void)
likely(cpu_active(dest_cpu)) && need_migrate_task(p)) {
struct migration_arg arg = { p, dest_cpu };

- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
return;
}
unlock:
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
}

#endif
@@ -3532,7 +3533,7 @@ unsigned long long task_delta_exec(struct task_struct *p)

rq = task_rq_lock(p, &flags);
ns = do_task_delta_exec(p, rq);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);

return ns;
}
@@ -3550,7 +3551,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)

rq = task_rq_lock(p, &flags);
ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);

return ns;
}
@@ -3574,7 +3575,7 @@ unsigned long long thread_group_sched_runtime(struct task_struct *p)
rq = task_rq_lock(p, &flags);
thread_group_cputime(p, &totals);
ns = totals.sum_exec_runtime + do_task_delta_exec(p, rq);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);

return ns;
}
@@ -4693,16 +4694,13 @@ EXPORT_SYMBOL(sleep_on_timeout);
*/
void rt_mutex_setprio(struct task_struct *p, int prio)
{
- unsigned long flags;
int oldprio, on_rq, running;
struct rq *rq;
const struct sched_class *prev_class;

BUG_ON(prio < 0 || prio > MAX_PRIO);

- lockdep_assert_held(&p->pi_lock);
-
- rq = task_rq_lock(p, &flags);
+ rq = __task_rq_lock(p);

trace_sched_pi_setprio(p, prio);
oldprio = p->prio;
@@ -4727,7 +4725,7 @@ void rt_mutex_setprio(struct task_struct *p, int prio)
enqueue_task(rq, p, oldprio < prio ? ENQUEUE_HEAD : 0);

check_class_changed(rq, p, prev_class, oldprio);
- task_rq_unlock(rq, &flags);
+ __task_rq_unlock(rq);
}

#endif
@@ -4775,7 +4773,7 @@ void set_user_nice(struct task_struct *p, long nice)
resched_task(rq->curr);
}
out_unlock:
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);
}
EXPORT_SYMBOL(set_user_nice);

@@ -5003,20 +5001,17 @@ recheck:
/*
* make sure no PI-waiters arrive (or leave) while we are
* changing the priority of the task:
- */
- raw_spin_lock_irqsave(&p->pi_lock, flags);
- /*
+ *
* To be able to change p->policy safely, the appropriate
* runqueue lock must be held.
*/
- rq = __task_rq_lock(p);
+ rq = task_rq_lock(p, &flags);

/*
* Changing the policy of the stop threads its a very bad idea
*/
if (p == rq->stop) {
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);
return -EINVAL;
}

@@ -5040,8 +5035,7 @@ recheck:
if (rt_bandwidth_enabled() && rt_policy(policy) &&
task_group(p)->rt_bandwidth.rt_runtime == 0 &&
!task_group_is_autogroup(task_group(p))) {
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);
return -EPERM;
}
}
@@ -5050,8 +5044,7 @@ recheck:
/* recheck policy now with rq lock held */
if (unlikely(oldpolicy != -1 && oldpolicy != p->policy)) {
policy = oldpolicy = -1;
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);
goto recheck;
}
on_rq = p->on_rq;
@@ -5073,8 +5066,7 @@ recheck:
activate_task(rq, p, 0);

check_class_changed(rq, p, prev_class, oldprio);
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);

rt_mutex_adjust_pi(p);

@@ -5666,7 +5658,7 @@ SYSCALL_DEFINE2(sched_rr_get_interval, pid_t, pid,

rq = task_rq_lock(p, &flags);
time_slice = p->sched_class->get_rr_interval(rq, p);
- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, p, &flags);

rcu_read_unlock();
jiffies_to_timespec(time_slice, &t);
@@ -5889,8 +5881,7 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
unsigned int dest_cpu;
int ret = 0;

- raw_spin_lock_irqsave(&p->pi_lock, flags);
- rq = __task_rq_lock(p);
+ rq = task_rq_lock(p, &flags);

if (!cpumask_intersects(new_mask, cpu_active_mask)) {
ret = -EINVAL;
@@ -5918,15 +5909,13 @@ int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
if (need_migrate_task(p)) {
struct migration_arg arg = { p, dest_cpu };
/* Need help from migration thread: drop lock and wait. */
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);
stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
tlb_migrate_finish(p->mm);
return 0;
}
out:
- __task_rq_unlock(rq);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ task_rq_unlock(rq, p, &flags);

return ret;
}
@@ -5954,6 +5943,7 @@ static int __migrate_task(struct task_struct *p, int src_cpu, int dest_cpu)
rq_src = cpu_rq(src_cpu);
rq_dest = cpu_rq(dest_cpu);

+ raw_spin_lock(&p->pi_lock);
double_rq_lock(rq_src, rq_dest);
/* Already moved. */
if (task_cpu(p) != src_cpu)
@@ -5976,6 +5966,7 @@ done:
ret = 1;
fail:
double_rq_unlock(rq_src, rq_dest);
+ raw_spin_unlock(&p->pi_lock);
return ret;
}

@@ -8702,7 +8693,7 @@ void sched_move_task(struct task_struct *tsk)
if (on_rq)
enqueue_task(rq, tsk, 0);

- task_rq_unlock(rq, &flags);
+ task_rq_unlock(rq, tsk, &flags);
}
#endif /* CONFIG_CGROUP_SCHED */
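
The lock/recheck/retry shape of the new task_rq_lock() and the unlock order of task_rq_unlock() can be sketched in userspace roughly as follows. This is an illustration under stated assumptions, not the kernel implementation: pthread mutexes stand in for raw spinlocks, interrupt disabling is not modelled at all, and all toy_* names are hypothetical. The self-check is single-threaded; a concurrent version would also need the rq pointer to be read and written atomically.

```c
#include <assert.h>
#include <pthread.h>

/* Toy stand-ins; none of these names exist in the kernel. */
struct toy_rq {
	pthread_mutex_t lock;		/* plays the role of rq->lock */
};

struct toy_task {
	pthread_mutex_t pi_lock;	/* plays the role of p->pi_lock */
	struct toy_rq *rq;		/* plays the role of task_rq(p) */
};

static struct toy_rq toy_rqs[2] = {
	{ .lock = PTHREAD_MUTEX_INITIALIZER },
	{ .lock = PTHREAD_MUTEX_INITIALIZER },
};

/* Take pi_lock first, then the lock of the runqueue the task is on. */
static struct toy_rq *toy_task_rq_lock(struct toy_task *t)
{
	struct toy_rq *rq;

	for (;;) {
		pthread_mutex_lock(&t->pi_lock);
		rq = t->rq;
		pthread_mutex_lock(&rq->lock);
		if (rq == t->rq)
			return rq;	/* still on this rq: both locks held */
		/* The task moved in between: drop both locks and retry. */
		pthread_mutex_unlock(&rq->lock);
		pthread_mutex_unlock(&t->pi_lock);
	}
}

/* Release in the reverse order of acquisition. */
static void toy_task_rq_unlock(struct toy_rq *rq, struct toy_task *t)
{
	pthread_mutex_unlock(&rq->lock);
	pthread_mutex_unlock(&t->pi_lock);
}

/* Returns 1 if the mutex is currently locked, 0 otherwise. */
static int toy_is_locked(pthread_mutex_t *m)
{
	if (pthread_mutex_trylock(m) != 0)
		return 1;
	pthread_mutex_unlock(m);
	return 0;
}

/* Single-threaded self-check; returns 1 if the lock state is as expected. */
static int toy_lock_demo(void)
{
	struct toy_task t;
	struct toy_rq *rq;
	int rc, ok;

	rc = pthread_mutex_init(&t.pi_lock, NULL);
	assert(rc == 0);
	(void)rc;
	t.rq = &toy_rqs[1];

	rq = toy_task_rq_lock(&t);
	/* Both the task lock and its runqueue lock are held, the other rq is not. */
	ok = rq == &toy_rqs[1] &&
	     toy_is_locked(&t.pi_lock) && toy_is_locked(&rq->lock) &&
	     !toy_is_locked(&toy_rqs[0].lock);
	toy_task_rq_unlock(rq, &t);
	/* After unlocking, neither lock is held. */
	ok = ok && !toy_is_locked(&t.pi_lock) &&
	     !toy_is_locked(&toy_rqs[1].lock);

	pthread_mutex_destroy(&t.pi_lock);
	return ok;
}
```

The point of the recheck is that the runqueue pointer is sampled before its lock is taken, so it can be stale by the time the lock is acquired; comparing again under the lock is what makes the returned runqueue trustworthy.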

2011-06-01 13:59:26

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

Hi,

git bisect blames this commit for a problem I have with v3.0-rc1:
If I printk large amounts of data, the machine locks up.
As the commit does not revert cleanly on top of 3.0, I haven't been
able to double check.
The test I use is simple, just add something like

for (i=0; i < 10000; ++i) printk("test %d\n", i);

and trigger it, in most cases I can see the first 10 printks before
I have to power cycle the machine (sysrq-b does not work anymore).
Attached my .config.

-Arne

On 14.04.2011 10:36, tip-bot for Peter Zijlstra wrote:
> [...]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

Attachments:

config (77.74 kB)

2011-06-01 16:31:56

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
> git bisect blames this commit for a problem I have with v3.0-rc1:
> If I printk large amounts of data, the machine locks up.
> As the commit does not revert cleanly on top of 3.0, I haven't been
> able to double check.
> The test I use is simple, just add something like
>
> for (i=0; i < 10000; ++i) printk("test %d\n", i);
>
> and trigger it, in most cases I can see the first 10 printks before
> I have to power cycle the machine (sysrq-b does not work anymore).
> Attached my .config.

I've made me a module that does the above, I've also changed my .config
to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
but sadly I cannot reproduce, I get all 10k prints on my serial line.

Even without serial line it works (somehow booting without visible
console is scary as hell :)

Which makes me ask, how are you observing your console?

Because those 10k lines aren't even near the amount of crap a regular
boot spews out on this box, although I guess the tight loop might
generate it slightly faster than a regular boot does.

2011-06-01 17:21:28

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On 01.06.2011 18:35, Peter Zijlstra wrote:
> On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
>> git bisect blames this commit for a problem I have with v3.0-rc1:
>> If I printk large amounts of data, the machine locks up.
>> As the commit does not revert cleanly on top of 3.0, I haven't been
>> able to double check.
>> The test I use is simple, just add something like
>>
>> for (i=0; i< 10000; ++i) printk("test %d\n", i);
>>
>> and trigger it, in most cases I can see the first 10 printks before
>> I have to power cycle the machine (sysrq-b does not work anymore).
>> Attached my .config.
>
> I've made me a module that does the above, I've also changed my .config
> to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
> but sadly I cannot reproduce, I get all 10k prints on my serial line.
>
> Even without serial line it works (somehow booting without visible
> console is scary as hell :)
>
> Which makes me ask, how are you observing your console?
>

They don't go out to the serial line, I only observe them with a
tail -f on messages. Default log level doesn't go to the console here.

> Because those 10k lines aren't even near the amount of crap a regular
> boot spews out on this box, although I guess the tight loop might
> generate it slightly faster than a regular boot does.
>

2011-06-01 18:05:45

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On Wed, 2011-06-01 at 19:20 +0200, Arne Jansen wrote:
> On 01.06.2011 18:35, Peter Zijlstra wrote:
> > On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
> >> git bisect blames this commit for a problem I have with v3.0-rc1:
> >> If I printk large amounts of data, the machine locks up.
> >> As the commit does not revert cleanly on top of 3.0, I haven't been
> >> able to double check.
> >> The test I use is simple, just add something like
> >>
> >> for (i=0; i< 10000; ++i) printk("test %d\n", i);
> >>
> >> and trigger it, in most cases I can see the first 10 printks before
> >> I have to power cycle the machine (sysrq-b does not work anymore).
> >> Attached my .config.
> >
> > I've made me a module that does the above, I've also changed my .config
> > to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
> > but sadly I cannot reproduce, I get all 10k prints on my serial line.
> >
> > Even without serial line it works (somehow booting without visible
> > console is scary as hell :)
> >
> > Which makes me ask, how are you observing your console?
> >
>
> They don't go out to the serial line, I only observe them with a
> tail -f on messages. Default log level doesn't go to the console here.

Right ok, so I used your exact .config, added a few drivers needed for
my hardware and indeed, it doesn't even finish booting and gets stuck
someplace.

Sadly it looks like even the NMI watchdog is dead,.. /me goes try and
make sense of this.

2011-06-01 18:41:04

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On Wed, 2011-06-01 at 20:09 +0200, Peter Zijlstra wrote:
> On Wed, 2011-06-01 at 19:20 +0200, Arne Jansen wrote:
> > On 01.06.2011 18:35, Peter Zijlstra wrote:
> > > On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
> > >> git bisect blames this commit for a problem I have with v3.0-rc1:
> > >> If I printk large amounts of data, the machine locks up.
> > >> As the commit does not revert cleanly on top of 3.0, I haven't been
> > >> able to double check.
> > >> The test I use is simple, just add something like
> > >>
> > >> for (i=0; i< 10000; ++i) printk("test %d\n", i);
> > >>
> > >> and trigger it, in most cases I can see the first 10 printks before
> > >> I have to power cycle the machine (sysrq-b does not work anymore).
> > >> Attached my .config.
> > >
> > > I've made me a module that does the above, I've also changed my .config
> > > to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
> > > but sadly I cannot reproduce, I get all 10k prints on my serial line.
> > >
> > > Even without serial line it works (somehow booting without visible
> > > console is scary as hell :)
> > >
> > > Which makes me ask, how are you observing your console?
> > >
> >
> > They don't go out to the serial line, I only observe them with a
> > tail -f on messages. Default log level doesn't go to the console here.
>
> Right ok, so I used your exact .config, added a few drivers needed for
> my hardware and indeed, it doesn't even finish booting and gets stuck
> someplace.
>
> Sadly it looks like even the NMI watchdog is dead,.. /me goes try and
> make sense of this.

Sadly both 0122ec5b02f766c355b3168df53a6c038a24fa0d^1 and
0122ec5b02f766c355b3168df53a6c038a24fa0d itself boot just fine and run
the test module without problems.

I will have to re-bisect this.

2011-06-01 19:31:43

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On 01.06.2011 20:44, Peter Zijlstra wrote:
> On Wed, 2011-06-01 at 20:09 +0200, Peter Zijlstra wrote:
>> On Wed, 2011-06-01 at 19:20 +0200, Arne Jansen wrote:
>>> On 01.06.2011 18:35, Peter Zijlstra wrote:
>>>> On Wed, 2011-06-01 at 15:58 +0200, Arne Jansen wrote:
>>>>> git bisect blames this commit for a problem I have with v3.0-rc1:
>>>>> If I printk large amounts of data, the machine locks up.
>>>>> As the commit does not revert cleanly on top of 3.0, I haven't been
>>>>> able to double check.
>>>>> The test I use is simple, just add something like
>>>>>
>>>>> for (i=0; i< 10000; ++i) printk("test %d\n", i);
>>>>>
>>>>> and trigger it, in most cases I can see the first 10 printks before
>>>>> I have to power cycle the machine (sysrq-b does not work anymore).
>>>>> Attached my .config.
>>>>
>>>> I've made me a module that does the above, I've also changed my .config
>>>> to match yours (smp=y, sched-cgroup=y, autogroup=n, preempt=n, no_hz=y),
>>>> but sadly I cannot reproduce, I get all 10k prints on my serial line.
>>>>
>>>> Even without serial line it works (somehow booting without visible
>>>> console is scary as hell :)
>>>>
>>>> Which makes me ask, how are you observing your console?
>>>>
>>>
>>> They don't go out to the serial line, I only observe them with a
>>> tail -f on messages. Default log level doesn't go to the console here.
>>
>> Right ok, so I used your exact .config, added a few drivers needed for
>> my hardware and indeed, it doesn't even finish booting and gets stuck
>> someplace.
>>
>> Sadly it looks like even the NMI watchdog is dead,.. /me goes try and
>> make sense of this.
>
> Sadly both 0122ec5b02f766c355b3168df53a6c038a24fa0d^1 and
> 0122ec5b02f766c355b3168df53a6c038a24fa0d itself boot just fine and run
> the test module without problems.

I can only partially confirm this:

2acca55ed98ad9b9aa25e7e587ebe306c0313dc7 runs fine
0122ec5b02f766c355b3168df53a6c038a24fa0d freezes after line 189
ab2515c4b98f7bc4fa11cad9fa0f811d63a72a26 freezes after line 39

>
> I will have to re-bisect this.

2011-06-01 21:09:33

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

Boot-time hang - maybe due to the mis-merge that re-introduced the
infinite media change signals for ide-cd?

I just pushed out a fix, it may not have mirrored out yet.

I dunno. Worth checking out before spending a lot of time bisecting.

Linus

2011-06-03 09:16:00

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On Thu, 2011-06-02 at 06:09 +0900, Linus Torvalds wrote:
> Boot-time hang - maybe due to the mis-merge that re-introduced the
> infinite media change signals for ide-cd?
>
> I just pushed out a fix, it may not have mirrored out yet.
>
> I dunno. Worth checking out before spending a lot of time bisecting.

Right, so that wasn't it. I haven't done a full bisect yet because I
noticed it died on a usb suspend line every single time and that machine
only had a single usb device, a memory stick, in it. So I simply pulled
the stick and voila it booted. So something is screwy with usb suspend
or something.

This of course means that I'm now completely unable to reproduce the
issue at hand :/

Maybe if I try another box..

Anyway, Arne, how long did you wait before power cycling the box? The
NMI watchdog should trigger in about a minute or so if it will trigger
at all (it's enabled in your config).

2011-06-03 10:03:05

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On 03.06.2011 11:15, Peter Zijlstra wrote:
> On Thu, 2011-06-02 at 06:09 +0900, Linus Torvalds wrote:
>> Boot-time hang - maybe due to the mis-merge that re-introduced the
>> infinite media change signals for ide-cd?
>>
>> I just pushed out a fix, it may not have mirrored out yet.
>>
>> I dunno. Worth checking out before spending a lot of time bisecting.
>
> Right, so that wasn't it. I haven't done a full bisect yet because I
> noticed it died on a usb suspend line every single time and that machine
> only had a single usb device, a memory stick, in it. So I simply pulled
> the stick and voila it booted. So something is screwy with usb suspend
> or something.
>
> This of course means that I'm now completely unable to reproduce the
> issue at hand :/
>
> Maybe if I try another box..
>
> Anyway, Arne, how long did you wait before power cycling the box? The
> NMI watchdog should trigger in about a minute or so if it will trigger
> at all (it's enabled in your config).

No, it doesn't trigger, but the hang is not as complete as I first
thought. A running iostat via ssh continues to give output for a while,
the serial console still reacts to return and prompts for login. But
after a while more and more locks up. The console locks as soon as I
sysrq-t.
Maybe it has also something to do with the place where I added the
printks (btrfs_scan_one_device). Also the 10k-print gets triggered
several times (though I only see 10 lines of output). Maybe you can
send me your test-module and I'll try that, so we have more equal
conditions.
What also might help: the machine I'm testing with is a quad-core
X3450 with 8GB RAM.

2011-06-03 10:31:15

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On Fri, 2011-06-03 at 12:02 +0200, Arne Jansen wrote:
> On 03.06.2011 11:15, Peter Zijlstra wrote:

> > Anyway, Arne, how long did you wait before power cycling the box? The
> > NMI watchdog should trigger in about a minute or so if it will trigger
> > at all (it's enabled in your config).
>
> No, it doesn't trigger,

Bummer.

> but the hang is not as complete as I first
> thought. A running iostat via ssh continues to give output for a while,
> the serial console still reacts to return and prompts for login. But
> after a while more and more locks up. The console locks as soon as I
> sysrq-t.

OK, that seems to suggest one CPU is stuck, and once you try something
that touches the CPU everything grinds to a halt. Does something like
sysrq-l work? That would send NMIs to the other CPUs.

Anyway, good to know using serial doesn't make it go away, that means
it's not too timing sensitive.

> Maybe it has also something to do with the place where I added the
> printks (btrfs_scan_one_device).

printk() should work pretty much anywhere these days, and filesystem
code in particular shouldn't be run from any weird and wonderful
contexts afaik.

> Also the 10k-print gets triggered
> several times (though I only see 10 lines of output). Maybe you can
> send me your test-module and I'll try that, so we have more equal
> conditions.

Sure, see below.

> What also might help: the machine I'm testing with is a quad-core
> X3450 with 8GB RAM.

/me & wikipedia, that's a nehalem box, ok I'm testing on a westmere
(don't have a nehalem).

---
kernel/Makefile | 1 +
kernel/test.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/kernel/Makefile b/kernel/Makefile
index 2d64cfc..65eff6c 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -80,6 +80,7 @@ obj-$(CONFIG_LOCKUP_DETECTOR) += watchdog.o
obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
obj-$(CONFIG_SECCOMP) += seccomp.o
obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
+obj-m += test.o
obj-$(CONFIG_TREE_RCU) += rcutree.o
obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
diff --git a/kernel/test.c b/kernel/test.c
index e69de29..8005395 100644
--- a/kernel/test.c
+++ b/kernel/test.c
@@ -0,0 +1,23 @@
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+MODULE_LICENSE("GPL");
+
+static void
+test_cleanup(void)
+{
+}
+
+static int __init
+test_init(void)
+{
+	int i;
+
+	for (i = 0; i < 10000; i++)
+		printk("test %d\n", i);
+
+	return 0;
+}
+
+module_init(test_init);
+module_exit(test_cleanup);

2011-06-03 11:52:21

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On 03.06.2011 12:30, Peter Zijlstra wrote:
> On Fri, 2011-06-03 at 12:02 +0200, Arne Jansen wrote:
>> On 03.06.2011 11:15, Peter Zijlstra wrote:
>
> Bummer.
>
>> but the hang is not as complete as I first
>> thought. A running iostat via ssh continues to give output for a while,
>> the serial console still reacts to return and prompts for login. But
>> after a while more and more locks up. The console locks as soon as I
>> sysrq-t.
>
> OK, that seems to suggest one CPU is stuck, and once you try something
> that touches the CPU everything grinds to a halt. Does something like
> sysrq-l work? That would send NMIs to the other CPUs.
>
> Anyway, good to know using serial doesn't make it go away, that means
> it's not too timing sensitive.
>
>
>> Also the 10k-print gets triggered
>> several times (though I only see 10 lines of output). Maybe you can
>> send me your test-module and I'll try that, so we have more equal
>> conditions.
>
> Sure, see below.
>

Your module also triggers it. On first test directly on first try, on
second test only on the 3rd try. When it hangs sysrq-l doesn't give
any output. I double checked without a hang, and then it dumps
something.

>> What also might help: the machine I'm testing with is a quad-core
>> X3450 with 8GB RAM.
>
> /me & wikipedia, that's a nehalem box, ok I'm testing on a westmere
> (don't have a nehalem).

2011-06-03 12:44:26

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On Fri, Jun 3, 2011 at 7:02 PM, Arne Jansen <[email protected]> wrote:
>
> No, it doesn't trigger, but the hang is not as complete as I first
> thought. A running iostat via ssh continues to give output for a while,
> the serial console still reacts to return and prompts for login. But
> after a while more and more locks up. The console locks as soon as I
> sysrq-t.

Is it the tty rescheduling bug?

That would explain the printk's mattering.

Remove the schedule_work() call from flush_to_ldisc() in
drivers/tty/tty_buffer.c and see if the problem goes away. See the
other discussion thread on lkml ("tty breakage in X (Was: tty vs
workqueue oddities)")

Hmm?

Linus

2011-06-03 13:06:03

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On 03.06.2011 14:44, Linus Torvalds wrote:
> On Fri, Jun 3, 2011 at 7:02 PM, Arne Jansen <[email protected]> wrote:
>>
>> No, it doesn't trigger, but the hang is not as complete as I first
>> thought. A running iostat via ssh continues to give output for a while,
>> the serial console still reacts to return and prompts for login. But
>> after a while more and more locks up. The console locks as soon as I
>> sysrq-t.
>
> Is it the tty rescheduling bug?
>
> That would explain the printk's mattering.
>
> Remove the schedule_work() call from flush_to_ldisc() in
> drivers/tty/tty_buffer.c and see if the problem goes away. See the
> other discussion thread on lkml ("tty breakage in X (Was: tty vs
> workqueue oddities)")
>
> Hmm?

No change. Also git bisect quite clearly points to
0122ec5b02f766c and ab2515c4b98f7bc4, both are older than
b1c43f82c5aa2654 mentioned in the other thread.

-Arne

> Linus

2011-06-04 21:30:28

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On Fri, Jun 3, 2011 at 10:05 PM, Arne Jansen <[email protected]> wrote:
>
> No change. Also git bisect quite clearly points to
> 0122ec5b02f766c and ab2515c4b98f7bc4, both are older than
> b1c43f82c5aa2654 mentioned in the other thread.

Ok, I haven't heard anything further on this. Ingo? Peter?

We're getting to the point where we just need to revert the thing,
since I'm not getting the feeling that there are any fixes
forthcoming, and I'd like -rc2 to not have this kind of bisected bug.

Ingo? Those two commits no longer revert cleanly, presumably due to
other changes in the area (but I didn't check). Can you do a patch to
do the reverts, and then you can try to re-do the thing later once you
figure out what's wrong.

Linus

2011-06-04 22:05:17

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On Sun, 2011-06-05 at 06:29 +0900, Linus Torvalds wrote:
> On Fri, Jun 3, 2011 at 10:05 PM, Arne Jansen <[email protected]> wrote:
> >
> > No change. Also git bisect quite clearly points to
> > 0122ec5b02f766c and ab2515c4b98f7bc4, both are older than
> > b1c43f82c5aa2654 mentioned in the other thread.
>
> Ok, I haven't heard anything further on this. Ingo? Peter?

I'm a bit stumped, and not being able to reproduce at all :/

> We're getting to the point where we just need to revert the thing,
> since I'm not getting the feeling that there are any fixes
> forthcoming, and I'd like -rc2 to not have this kind of bisected bug.

Agreed.

> Ingo? Those two commits no longer revert cleanly, presumably due to
> other changes in the area (but I didn't check). Can you do a patch to
> do the reverts, and then you can try to re-do the thing later once you
> figure out what's wrong.

Yeah, that wants a whole lot of reverting, from the offending commit up
to and including 317f394160e9beb97d19a84c39b7e5eb3d7815a8.

2011-06-04 22:51:07

Subject: Re: [tip:sched/locking] sched: Add p->pi_lock to task_rq_lock()

On Sun, Jun 5, 2011 at 7:08 AM, Peter Zijlstra <[email protected]> wrote:
>
> Yeah, that wants a whole lot of reverting, from the offending commit up
> to and including 317f394160e9beb97d19a84c39b7e5eb3d7815a8.

Mind sending one single tested patch? I still get conflicts, even just
trying to revert the last of those (ie 317f394160e9) due to all the
other scheduler changes..

Linus

2011-06-05 06:02:13