2009-07-03 01:31:43

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On 07/02/2009 01:59 PM, Andres Freund wrote:
> On 07/02/2009 01:54 PM, Jarek Poplawski wrote:
>> On Thu, Jul 02, 2009 at 01:43:49PM +0200, Andres Freund wrote: ...
>>> I will start trying to place the issue by testing with existing
>>> kernels between 2.6.30 and now.
>> If you can afford your time of course this would be very helpful.
> Well. Waiting for the issue to resolve itself would cost time as well
> ;-) I won't be able to finish this today, but perhaps some reduction
> of the search space will be enough.
I lied.

> I placed it between 2.6.30 and
> 03347e2592078a90df818670fddf97a33eec70fb (v2.6.30-5415-g03347e2) so
> far.
Ok. I finally see the light. I bisected the issue down to
eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
pinned timers

Disabling timer migration, as provided in the earlier commit, stops the issue
from occurring.

That it is related to timers makes sense in light of my finding that I
could trigger the issue only when using delay in netem - that is the codepath
using qdisc_watchdog...
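
For context, the watchdog in question just arms an absolute hrtimer - roughly
like this (a simplified sketch of net/sched/sch_api.c from that era, not a
verbatim copy):

	/*
	 * Sketch: netem_dequeue() calls this when the head packet's
	 * time_to_send is still in the future.  The qdisc is throttled and
	 * an absolute hrtimer is armed; its callback later clears the
	 * throttle and reschedules the qdisc.  This hrtimer is exactly the
	 * kind of non-pinned timer that the bisected commit may now migrate
	 * to another CPU.
	 */
	void qdisc_watchdog_schedule(struct qdisc_watchdog *wd, psched_time_t expires)
	{
		ktime_t time;

		wd->qdisc->flags |= TCQ_F_THROTTLED;
		time = ktime_add_ns(ktime_set(0, 0), PSCHED_TICKS2NS(expires));
		hrtimer_start(&wd->timer, time, HRTIMER_MODE_ABS);
	}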

Andres

Repasted original problem description for newly CC'ed people:
> While playing around with netem (time-based, not packet-count-based
> loss bursts) I experienced soft lockups several times - to rule out
> my modifications causing this, I recompiled with the original code and it
> is still locking up. I captured several of those traces via the
> thankfully still-working netconsole. The simplest policy I could
> reproduce the error with was: tc qdisc add dev eth0 root handle 1:
> netem delay 10ms loss 0
>
> I could not reproduce the error without delay - but that may only be
> a timing issue, as the host I was mainly transferring data to was on
> a local network. I could not reproduce the issue on lo.
>
> The time to reproduce the error varied from seconds after executing
> tc to several minutes.
>
> Traces 5+6 were made with vanilla
> 52989765629e7d182b4f146050ebba0abf2cb0b7
>
> The earlier traces were made with parts of my patches applied, and are
> only included for completeness, as I don't believe my modifications
> were causing this; all the traces are different, so they may give some
> clues.
>
> Lockdep was enabled but did not diagnose anything relevant (one dvb
> warning during bootup).
>
> Any ideas for debugging?


2009-07-03 06:20:10

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
...
> Ok. I finally see the light. I bisected the issue down to
> eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> pinned timers
>
> Disabling timer migration like provided in the earlier commit stops the
> issue from occuring.
>
> That it is related to timers is sensible in the light of my findings,
> that I could trigger the issue only when using delay in netem - that is
> the codepath using qdisc_watchdog...

Andres, thanks for your work and time. It saved me a lot of searching,
because I wasn't able to trigger this on my old box.

Jarek P.

2009-07-03 11:26:35

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
> On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
> ...
>
> > Ok. I finally see the light. I bisected the issue down to
> > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> > pinned timers
> >
> > Disabling timer migration like provided in the earlier commit stops the
> > issue from occuring.
> >
> > That it is related to timers is sensible in the light of my findings,
> > that I could trigger the issue only when using delay in netem - that is
> > the codepath using qdisc_watchdog...
>
> Andres, thanks for your work and time. It saved me a lot of searching,
> because I wasn't able to trigger this on my old box.
Thanks. It allowed me to go through some of my remaining paperwork ;-)

Do any of you have an idea where the problem actually resides?
qdisc_watchdog_schedule looks innocent enough to my uneducated eyes - and the
patch/infrastructure from Arun goes over my head...
I will happily test any ideas/patches.

Aside from that - is the whole PSCHED_TICKS2NS/PSCHED_NS2TICKS conversion
business purely for backward compatibility?

Andres

2009-07-03 12:03:23

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Fri, Jul 03, 2009 at 01:26:21PM +0200, Andres Freund wrote:
> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
> > On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
> > ...
> >
> > > Ok. I finally see the light. I bisected the issue down to
> > > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> > > pinned timers
> > >
> > > Disabling timer migration like provided in the earlier commit stops the
> > > issue from occuring.
> > >
> > > That it is related to timers is sensible in the light of my findings,
> > > that I could trigger the issue only when using delay in netem - that is
> > > the codepath using qdisc_watchdog...
> >
> > Andres, thanks for your work and time. It saved me a lot of searching,
> > because I wasn't able to trigger this on my old box.
> Thanks. It allowed me to go through some of my remaining paperwork ;-)
>
> Does anybody of you have an idea where the problem actually resides?

Do you mean possibly broken timers are not enough?

> qdisc_watchdog_schedule looks innocent enough for my uneducated eyes - and the
> patch/infrastructure from Arun goes over my head...
> I will happily test some ideas/patches.
>
> Aside from that - is the whole PSCHED_TICKS2NS/PSCHED_NS2TICKS conversion
> business purely backward compatibility?

The whole PSCHED_ conversion was to get finer resolution without
breaking backward compatibility, I hope.;-)
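
Roughly, the conversion is just a fixed binary shift between psched ticks and
nanoseconds - something like the sketch below (the exact PSCHED_SHIFT value is
an assumption here; it was being tuned around this time):

	/* Illustrative sketch of the PSCHED_ conversion macros
	 * (include/net/pkt_sched.h); the shift value shown is an assumption. */
	#define PSCHED_SHIFT		6
	#define PSCHED_TICKS2NS(x)	((s64)(x) << PSCHED_SHIFT)
	#define PSCHED_NS2TICKS(x)	((x) >> PSCHED_SHIFT)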

Jarek P.

2009-07-03 12:30:19

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Friday 03 July 2009 14:03:01 Jarek Poplawski wrote:
> On Fri, Jul 03, 2009 at 01:26:21PM +0200, Andres Freund wrote:
> > On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
> > > On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
> > > ...
> > >
> > > > Ok. I finally see the light. I bisected the issue down to
> > > > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> > > > pinned timers
> > > >
> > > > Disabling timer migration like provided in the earlier commit stops
> > > > the issue from occuring.
> > > >
> > > > That it is related to timers is sensible in the light of my findings,
> > > > that I could trigger the issue only when using delay in netem - that
> > > > is the codepath using qdisc_watchdog...
> > > Andres, thanks for your work and time. It saved me a lot of searching,
> > > because I wasn't able to trigger this on my old box.
> > Thanks. It allowed me to go through some of my remaining paperwork ;-)
> > Does anybody of you have an idea where the problem actually resides?
> Do you mean possibly broken timers are not enough?
I have no idea how/if the timers are actually broken, or whether the problem
resides somewhere else and is just made visible by the timer changes.

I would have expected more problems with completely borked timers ;-)


> > Aside from that - is the whole PSCHED_TICKS2NS/PSCHED_NS2TICKS conversion
> > business purely backward compatibility?
> The whole PSCHED_ conversion was to get finer resolution without
> breaking backward compatibility, I hope.;-)
I haven't seen any problems - just curious ;-)

Andres

2009-07-03 20:22:25

by David Miller

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

From: Jarek Poplawski <[email protected]>
Date: Fri, 3 Jul 2009 12:03:01 +0000

> On Fri, Jul 03, 2009 at 01:26:21PM +0200, Andres Freund wrote:
>> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
>> > On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
>> > ...
>> >
>> > > Ok. I finally see the light. I bisected the issue down to
>> > > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
>> > > pinned timers
>> > >
>> > > Disabling timer migration like provided in the earlier commit stops the
>> > > issue from occuring.
>> > >
>> > > That it is related to timers is sensible in the light of my findings,
>> > > that I could trigger the issue only when using delay in netem - that is
>> > > the codepath using qdisc_watchdog...
>> >
>> > Andres, thanks for your work and time. It saved me a lot of searching,
>> > because I wasn't able to trigger this on my old box.
>> Thanks. It allowed me to go through some of my remaining paperwork ;-)
>>
>> Does anybody of you have an idea where the problem actually resides?
>
> Do you mean possibly broken timers are not enough?

Well, if you look at that commit the bisect pointed to, Jarek, it is a
change which starts causing a situation that never happened before:
namely, timers added on one CPU can be migrated to and fire on another.

So this could be exposing races in the networking that technically
always existed.

2009-07-03 22:57:18

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Fri, Jul 03, 2009 at 01:22:20PM -0700, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Fri, 3 Jul 2009 12:03:01 +0000
>
> > On Fri, Jul 03, 2009 at 01:26:21PM +0200, Andres Freund wrote:
> >> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
> >> > On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
> >> > ...
> >> >
> >> > > Ok. I finally see the light. I bisected the issue down to
> >> > > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> >> > > pinned timers
> >> > >
> >> > > Disabling timer migration like provided in the earlier commit stops the
> >> > > issue from occuring.
> >> > >
> >> > > That it is related to timers is sensible in the light of my findings,
> >> > > that I could trigger the issue only when using delay in netem - that is
> >> > > the codepath using qdisc_watchdog...
> >> >
> >> > Andres, thanks for your work and time. It saved me a lot of searching,
> >> > because I wasn't able to trigger this on my old box.
> >> Thanks. It allowed me to go through some of my remaining paperwork ;-)
> >>
> >> Does anybody of you have an idea where the problem actually resides?
> >
> > Do you mean possibly broken timers are not enough?
>
> Well, if you look at that commit the bisect pointed to Jarek, it is a
> change which starts causing a situation which never happened before.
> Namely, timers added on one cpu can be migrated and fire on another.
>
> So this could be exposing races in the networking that technically
> always existed.

I'm not sure I get your point; could you give some example?
Actually, I've suspected races in timers code.

Jarek P.

2009-07-04 01:55:59

by David Miller

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

From: Jarek Poplawski <[email protected]>
Date: Sat, 4 Jul 2009 00:56:40 +0200

> On Fri, Jul 03, 2009 at 01:22:20PM -0700, David Miller wrote:
>> Well, if you look at that commit the bisect pointed to Jarek, it is a
>> change which starts causing a situation which never happened before.
>> Namely, timers added on one cpu can be migrated and fire on another.
>>
>> So this could be exposing races in the networking that technically
>> always existed.
>
> I'm not sure I get your point; could you give some example?
> Actually, I've suspected races in timers code.

Let's say that a particular networking timer always gets
re-added on the cpu where the timer fires.

In that case, beforehand, no inter-CPU races could possibly
be exercised. But with the new timer code, such races can
now potentially be triggered.
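
For illustration only (a made-up driver timer, not an actual kernel path),
that pattern looks something like this:

	#include <linux/timer.h>
	#include <linux/jiffies.h>

	/*
	 * Hypothetical example: the arming side and the timer callback share
	 * 'deadline' without a lock, relying on both always running on the
	 * same CPU (the arming side with softirqs disabled).  Once non-pinned
	 * timers can be migrated, the callback may fire on a different CPU
	 * and run in parallel with the arming side, so the unlocked sharing
	 * becomes a real race.
	 */
	struct my_state {
		struct timer_list timer;
		unsigned long deadline;			/* shared, no lock */
	};

	static void my_timer_fn(unsigned long data)	/* 2.6.30-era callback */
	{
		struct my_state *s = (struct my_state *)data;

		if (time_after(jiffies, s->deadline))	/* reads shared state */
			s->deadline = 0;		/* and writes it back */
	}

	static void my_arm(struct my_state *s)
	{
		s->deadline = jiffies + 10 * HZ;	/* writes shared state */
		mod_timer(&s->timer, jiffies + HZ);	/* may now fire on another CPU */
	}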

2009-07-04 06:37:17

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Fri, Jul 03, 2009 at 06:55:53PM -0700, David Miller wrote:
> From: Jarek Poplawski <[email protected]>
> Date: Sat, 4 Jul 2009 00:56:40 +0200
>
> > On Fri, Jul 03, 2009 at 01:22:20PM -0700, David Miller wrote:
> >> Well, if you look at that commit the bisect pointed to Jarek, it is a
> >> change which starts causing a situation which never happened before.
> >> Namely, timers added on one cpu can be migrated and fire on another.
> >>
> >> So this could be exposing races in the networking that technically
> >> always existed.
> >
> > I'm not sure I get your point; could you give some example?
> > Actually, I've suspected races in timers code.
>
> Let's say that a particular networking timer always gets
> re-added on the cpu where the timer fires.
>
> In that case, beforehand, no inter-cpu races could possibly
> be tested. But with the new timer code, such races could
> now be potentially triggered.

Maybe I'm still missing something, but even if it were possible, lockdep
should have reported such things long ago.

Jarek P.

2009-07-04 15:19:31

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

Andres Freund wrote, On 07/03/2009 01:26 PM:

> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
>> On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
>> ...
>>
>>> Ok. I finally see the light. I bisected the issue down to
>>> eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
>>> pinned timers
>>>
>>> Disabling timer migration like provided in the earlier commit stops the
>>> issue from occuring.
>>>
>>> That it is related to timers is sensible in the light of my findings,
>>> that I could trigger the issue only when using delay in netem - that is
>>> the codepath using qdisc_watchdog...
>> Andres, thanks for your work and time. It saved me a lot of searching,
>> because I wasn't able to trigger this on my old box.
> Thanks. It allowed me to go through some of my remaining paperwork ;-)
>
> Does anybody of you have an idea where the problem actually resides?
> qdisc_watchdog_schedule looks innocent enough for my uneducated eyes - and the
> patch/infrastructure from Arun goes over my head...
> I will happily test some ideas/patches.

Since there is still no 100% proof nor a suspect, here are some
suggestions for additional checking. The bisected commit could
probably be verified further by applying it to 2.6.30 together with the
preceding one; I attach both of them below.

Another suggestion is to try this without lockdep, e.g. by setting
/sys/module/lockdep/parameters/lock_stat and prove_locking to 0.

Thanks,
Jarek P.

--------------->
commit 597d0275736dad9c3bda6f0a00a1c477dc0f37b1
Author: Arun R Bharadwaj <[email protected]>
Date: Thu Apr 16 12:13:26 2009 +0530

timers: Framework for identifying pinned timers

* Arun R Bharadwaj <[email protected]> [2009-04-16 12:11:36]:

This patch creates a new framework for identifying cpu-pinned timers
and hrtimers.

This framework is needed because pinned timers are expected to fire on
the same CPU on which they are queued. So it is essential to identify
these and not migrate them, in case there are any.

For regular timers, the currently existing add_timer_on() can be used
queue pinned timers and subsequently mod_timer_pinned() can be used
to modify the 'expires' field.

For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
added to queue cpu-pinned hrtimer.

[ tglx: use .._PINNED mode argument instead of creating tons of new
functions ]

Signed-off-by: Arun R Bharadwaj <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 0d2f7c8..7400900 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -30,8 +30,11 @@ struct hrtimer_cpu_base;
* Mode arguments of xxx_hrtimer functions:
*/
enum hrtimer_mode {
- HRTIMER_MODE_ABS, /* Time value is absolute */
- HRTIMER_MODE_REL, /* Time value is relative to now */
+ HRTIMER_MODE_ABS = 0x0, /* Time value is absolute */
+ HRTIMER_MODE_REL = 0x1, /* Time value is relative to now */
+ HRTIMER_MODE_PINNED = 0x02, /* Timer is bound to CPU */
+ HRTIMER_MODE_ABS_PINNED = 0x02,
+ HRTIMER_MODE_REL_PINNED = 0x03,
};

/*
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 6cdb6f3..ccf882e 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -163,7 +163,10 @@ extern void add_timer_on(struct timer_list *timer, int cpu);
extern int del_timer(struct timer_list * timer);
extern int mod_timer(struct timer_list *timer, unsigned long expires);
extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
+extern int mod_timer_pinned(struct timer_list *timer, unsigned long expires);

+#define TIMER_NOT_PINNED 0
+#define TIMER_PINNED 1
/*
* The jiffies value which is added to now, when there is no timer
* in the timer wheel:
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cb8a15c..c71bcd5 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -193,7 +193,8 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
* Switch the timer base to the current CPU when possible.
*/
static inline struct hrtimer_clock_base *
-switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base)
+switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
+ int pinned)
{
struct hrtimer_clock_base *new_base;
struct hrtimer_cpu_base *new_cpu_base;
@@ -907,9 +908,9 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
ret = remove_hrtimer(timer, base);

/* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base);
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);

- if (mode == HRTIMER_MODE_REL) {
+ if (mode & HRTIMER_MODE_REL) {
tim = ktime_add_safe(tim, new_base->get_time());
/*
* CONFIG_TIME_LOW_RES is a temporary way for architectures
diff --git a/kernel/timer.c b/kernel/timer.c
index 5c1e84b..3424dfd 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -604,7 +604,8 @@ static struct tvec_base *lock_timer_base(struct timer_list *timer,
}

static inline int
-__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
+__mod_timer(struct timer_list *timer, unsigned long expires,
+ bool pending_only, int pinned)
{
struct tvec_base *base, *new_base;
unsigned long flags;
@@ -668,7 +669,7 @@ out_unlock:
*/
int mod_timer_pending(struct timer_list *timer, unsigned long expires)
{
- return __mod_timer(timer, expires, true);
+ return __mod_timer(timer, expires, true, TIMER_NOT_PINNED);
}
EXPORT_SYMBOL(mod_timer_pending);

@@ -702,11 +703,33 @@ int mod_timer(struct timer_list *timer, unsigned long expires)
if (timer->expires == expires && timer_pending(timer))
return 1;

- return __mod_timer(timer, expires, false);
+ return __mod_timer(timer, expires, false, TIMER_NOT_PINNED);
}
EXPORT_SYMBOL(mod_timer);

/**
+ * mod_timer_pinned - modify a timer's timeout
+ * @timer: the timer to be modified
+ * @expires: new timeout in jiffies
+ *
+ * mod_timer_pinned() is a way to update the expire field of an
+ * active timer (if the timer is inactive it will be activated)
+ * and not allow the timer to be migrated to a different CPU.
+ *
+ * mod_timer_pinned(timer, expires) is equivalent to:
+ *
+ * del_timer(timer); timer->expires = expires; add_timer(timer);
+ */
+int mod_timer_pinned(struct timer_list *timer, unsigned long expires)
+{
+ if (timer->expires == expires && timer_pending(timer))
+ return 1;
+
+ return __mod_timer(timer, expires, false, TIMER_PINNED);
+}
+EXPORT_SYMBOL(mod_timer_pinned);
+
+/**
* add_timer - start a timer
* @timer: the timer to be added
*
@@ -1356,7 +1379,7 @@ signed long __sched schedule_timeout(signed long timeout)
expire = timeout + jiffies;

setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
- __mod_timer(&timer, expire, false);
+ __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
schedule();
del_singleshot_timer_sync(&timer);
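
For reference, a minimal sketch of how a caller would use the pinned variants
added above (my_timer/my_hrtimer and the intervals are made up for
illustration):

	#include <linux/timer.h>
	#include <linux/hrtimer.h>
	#include <linux/jiffies.h>
	#include <linux/time.h>

	static struct timer_list my_timer;
	static struct hrtimer my_hrtimer;

	static void arm_pinned_examples(void)
	{
		/* regular timer: re-arm without allowing migration */
		mod_timer_pinned(&my_timer, jiffies + HZ);

		/* hrtimer: relative 10 ms expiry, bound to this CPU */
		hrtimer_start(&my_hrtimer, ktime_set(0, 10 * NSEC_PER_MSEC),
			      HRTIMER_MODE_REL_PINNED);
	}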


commit eea08f32adb3f97553d49a4f79a119833036000a
Author: Arun R Bharadwaj <[email protected]>
Date: Thu Apr 16 12:16:41 2009 +0530

timers: Logic to move non pinned timers

* Arun R Bharadwaj <[email protected]> [2009-04-16 12:11:36]:

This patch migrates all non pinned timers and hrtimers to the current
idle load balancer, from all the idle CPUs. Timers firing on busy CPUs
are not migrated.

While migrating hrtimers, care should be taken to check if migrating
a hrtimer would result in a latency or not. So we compare the expiry of the
hrtimer with the next timer interrupt on the target cpu and migrate the
hrtimer only if it expires *after* the next interrupt on the target cpu.
So, added a clockevents_get_next_event() helper function to return the
next_event on the target cpu's clock_event_device.

[ tglx: cleanups and simplifications ]

Signed-off-by: Arun R Bharadwaj <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 3a1dbba..20a100f 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -143,3 +143,12 @@ extern void clockevents_notify(unsigned long reason, void *arg);
#endif

#endif
+
+#ifdef CONFIG_GENERIC_CLOCKEVENTS
+extern ktime_t clockevents_get_next_event(int cpu);
+#else
+static inline ktime_t clockevents_get_next_event(int cpu)
+{
+ return (ktime_t) { .tv64 = KTIME_MAX };
+}
+#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6185040..311dec1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -257,6 +257,7 @@ extern void task_rq_unlock_wait(struct task_struct *p);
extern cpumask_var_t nohz_cpu_mask;
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
extern int select_nohz_load_balancer(int cpu);
+extern int get_nohz_load_balancer(void);
#else
static inline int select_nohz_load_balancer(int cpu)
{
@@ -1772,6 +1773,17 @@ int sched_nr_latency_handler(struct ctl_table *table, int write,
struct file *file, void __user *buffer, size_t *length,
loff_t *ppos);
#endif
+#ifdef CONFIG_SCHED_DEBUG
+static inline unsigned int get_sysctl_timer_migration(void)
+{
+ return sysctl_timer_migration;
+}
+#else
+static inline unsigned int get_sysctl_timer_migration(void)
+{
+ return 1;
+}
+#endif
extern unsigned int sysctl_sched_rt_period;
extern int sysctl_sched_rt_runtime;

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index c71bcd5..b675a67 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -43,6 +43,8 @@
#include <linux/seq_file.h>
#include <linux/err.h>
#include <linux/debugobjects.h>
+#include <linux/sched.h>
+#include <linux/timer.h>

#include <asm/uaccess.h>

@@ -198,8 +200,19 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
{
struct hrtimer_clock_base *new_base;
struct hrtimer_cpu_base *new_cpu_base;
+ int cpu, preferred_cpu = -1;
+
+ cpu = smp_processor_id();
+#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+ if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
+ preferred_cpu = get_nohz_load_balancer();
+ if (preferred_cpu >= 0)
+ cpu = preferred_cpu;
+ }
+#endif

- new_cpu_base = &__get_cpu_var(hrtimer_bases);
+again:
+ new_cpu_base = &per_cpu(hrtimer_bases, cpu);
new_base = &new_cpu_base->clock_base[base->index];

if (base != new_base) {
@@ -219,6 +232,40 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
timer->base = NULL;
spin_unlock(&base->cpu_base->lock);
spin_lock(&new_base->cpu_base->lock);
+
+ /* Optimized away for NOHZ=n SMP=n */
+ if (cpu == preferred_cpu) {
+ /* Calculate clock monotonic expiry time */
+#ifdef CONFIG_HIGH_RES_TIMERS
+ ktime_t expires = ktime_sub(hrtimer_get_expires(timer),
+ new_base->offset);
+#else
+ ktime_t expires = hrtimer_get_expires(timer);
+#endif
+
+ /*
+ * Get the next event on target cpu from the
+ * clock events layer.
+ * This covers the highres=off nohz=on case as well.
+ */
+ ktime_t next = clockevents_get_next_event(cpu);
+
+ ktime_t delta = ktime_sub(expires, next);
+
+ /*
+ * We do not migrate the timer when it is expiring
+ * before the next event on the target cpu because
+ * we cannot reprogram the target cpu hardware and
+ * we would cause it to fire late.
+ */
+ if (delta.tv64 < 0) {
+ cpu = smp_processor_id();
+ spin_unlock(&new_base->cpu_base->lock);
+ spin_lock(&base->cpu_base->lock);
+ timer->base = base;
+ goto again;
+ }
+ }
timer->base = new_base;
}
return new_base;
@@ -236,7 +283,7 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
return base;
}

-# define switch_hrtimer_base(t, b) (b)
+# define switch_hrtimer_base(t, b, p) (b)

#endif /* !CONFIG_SMP */

diff --git a/kernel/sched.c b/kernel/sched.c
index 7f1dd56..9fe3774 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4244,6 +4244,11 @@ static struct {
.load_balancer = ATOMIC_INIT(-1),
};

+int get_nohz_load_balancer(void)
+{
+ return atomic_read(&nohz.load_balancer);
+}
+
/*
* This routine will try to nominate the ilb (idle load balancing)
* owner among the cpus whose ticks are stopped. ilb owner will do the idle
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index d13be21..ab20ded 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -18,6 +18,7 @@
#include <linux/notifier.h>
#include <linux/smp.h>
#include <linux/sysdev.h>
+#include <linux/tick.h>

/* The registered clock event devices */
static LIST_HEAD(clockevent_devices);
@@ -251,4 +252,15 @@ void clockevents_notify(unsigned long reason, void *arg)
spin_unlock(&clockevents_lock);
}
EXPORT_SYMBOL_GPL(clockevents_notify);
+
+ktime_t clockevents_get_next_event(int cpu)
+{
+ struct tick_device *td;
+ struct clock_event_device *dev;
+
+ td = &per_cpu(tick_cpu_device, cpu);
+ dev = td->evtdev;
+
+ return dev->next_event;
+}
#endif
diff --git a/kernel/timer.c b/kernel/timer.c
index 3424dfd..3f841db 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -37,6 +37,7 @@
#include <linux/delay.h>
#include <linux/tick.h>
#include <linux/kallsyms.h>
+#include <linux/sched.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -609,9 +610,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
{
struct tvec_base *base, *new_base;
unsigned long flags;
- int ret;
-
- ret = 0;
+ int ret = 0 , cpu;

timer_stats_timer_set_start_info(timer);
BUG_ON(!timer->function);
@@ -630,6 +629,18 @@ __mod_timer(struct timer_list *timer, unsigned long expires,

new_base = __get_cpu_var(tvec_bases);

+ cpu = smp_processor_id();
+
+#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+ if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
+ int preferred_cpu = get_nohz_load_balancer();
+
+ if (preferred_cpu >= 0)
+ cpu = preferred_cpu;
+ }
+#endif
+ new_base = per_cpu(tvec_bases, cpu);
+
if (base != new_base) {
/*
* We are trying to schedule the timer on the local CPU.

2009-07-06 04:54:22

by Joao Correia

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

Hello

The system freezes immediately after GRUB, with no init processing at all, after
applying those patches on top of vanilla 2.6.30 on my box.

Also, 2nd patch, this piece:

+#ifdef CONFIG_SCHED_DEBUG
+static inline unsigned int get_sysctl_timer_migration(void)
+{
+ return sysctl_timer_migration;
+}
+#else
+static inline unsigned int get_sysctl_timer_migration(void)
+{
+ return 1;
+}
+#endif

doesn't work on top of 2.6.30. It complains, while compiling, that
sysctl_timer_migration is not defined. So I just replaced that call
with return 1, like in the non-debug case. Hope this doesn't defeat
your test case, but it wouldn't compile otherwise. Probably that was
just introduced after 2.6.30?

(I have also tried the latest .31-rc2; the original bug still exists.)

Thank you for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal


On Sat, Jul 4, 2009 at 4:18 PM, Jarek Poplawski<[email protected]> wrote:
> Andres Freund wrote, On 07/03/2009 01:26 PM:
>
>> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
>>> On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
>>> ...
>>>
>>>> Ok. I finally see the light. I bisected the issue down to
> >>>> eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
>>>> pinned timers
>>>>
>>>> Disabling timer migration like provided in the earlier commit stops the
>>>> issue from occuring.
>>>>
>>>> That it is related to timers is sensible in the light of my findings,
>>>> that I could trigger the issue only when using delay in netem - that is
>>>> the codepath using qdisc_watchdog...
>>> Andres, thanks for your work and time. It saved me a lot of searching,
>>> because I wasn't able to trigger this on my old box.
>> Thanks. It allowed me to go through some of my remaining paperwork ;-)
>>
>> Does anybody of you have an idea where the problem actually resides?
>> qdisc_watchdog_schedule looks innocent enough for my uneducated eyes - and the
>> patch/infrastructure from Arun goes over my head...
>> I will happily test some ideas/patches.
>
> Since there are still no 100% proofs nor suspects, here are some
> suggestions of additional checking. This bisected commit could
> probably be additionally verified by applying to 2.6.30 with a
> preceding one; I attach both of them below.
>
> Another suggestion is to try this without lockdep e.g. by setting
> /sys/module/lockdep/parameters/lock_stat and prove_locking to 0.
>
> Thanks,
> Jarek P.
>
> [quoted patches trimmed - identical to the two commits posted earlier in the thread]

2009-07-06 08:15:30

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Mon, Jul 06, 2009 at 05:53:51AM +0100, Joao Correia wrote:
> Hello
Hi,

> System freezes immediatly after grub, no init processing at all, after
> applying those patches on top of vanilla 2.6.30 on my box.
>
> Also, 2nd patch, this piece:
>
> +#ifdef CONFIG_SCHED_DEBUG
> +static inline unsigned int get_sysctl_timer_migration(void)
> +{
> + return sysctl_timer_migration;
> +}
> +#else
> +static inline unsigned int get_sysctl_timer_migration(void)
> +{
> + return 1;
> +}
> +#endif
>
> doesnt work on top of 2.6.30. It complains, while compiling, that
> sysctl_timer_migration is not defined. So i just replaced that call
> with return 1, like on the not debug case. Hope this doesnt defeat
> your test case, but it wouldnt compile otherwise. Probably that was
> just introduced after 2.6.30?

Yes, it would have been too easy if it had worked just like this... (I only
test-compiled it, without such warnings, but also without CONFIG_SCHED_DEBUG.)
I'll look at this again.

>
> (Have also tried latest .31-rc2, original bug still exists.)

Btw., could you send your .config? Do you have lockdep enabled (if so
did you try without it)? Is it 'a real' box or something virtual?

Thanks,
Jarek P.

2009-07-06 11:29:34

by Joao Correia

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

Hello

.config is attached. This is a real box, a Fedora 11 system with Rawhide
updates applied. It's a quad-core system (AMD Phenom 9600) with 4 GB
DDR2 RAM. The board is an Asus M3A-H/HDMI. We have a couple of these
boxes just to test-run Linux before dropping it into production, hence
the availability for tests.

I haven't tried without lockdep; I will try it later and let you know if
there is any difference.

Thank you for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal

On Mon, Jul 6, 2009 at 9:14 AM, Jarek Poplawski<[email protected]> wrote:
> On Mon, Jul 06, 2009 at 05:53:51AM +0100, Joao Correia wrote:
>> Hello
> Hi,
>
>> System freezes immediatly after grub, no init processing at all, after
>> applying those patches on top of vanilla 2.6.30 on my box.
>>
>> Also, 2nd patch, this piece:
>>
>> +#ifdef CONFIG_SCHED_DEBUG
>> +static inline unsigned int get_sysctl_timer_migration(void)
>> +{
>> +       return sysctl_timer_migration;
>> +}
>> +#else
>> +static inline unsigned int get_sysctl_timer_migration(void)
>> +{
>> +       return 1;
>> +}
>> +#endif
>>
>> doesnt work on top of 2.6.30. It complains, while compiling, that
>> sysctl_timer_migration is not defined. So i just replaced that call
>> with return 1, like on the not debug case. Hope this doesnt defeat
>> your test case, but it wouldnt compile otherwise. Probably that was
>> just introduced after 2.6.30?
>
> Yes, it would be too easy if it had worked just like this... (I only
> test-compiled it, without such warnings, but without CONFIG_SCHED_DEBUG.)
> I'll look at this again.
>
>>
>> (Have also tried latest .31-rc2, original bug still exists.)
>
> Btw., could you send your .config? Do you have lockdep enabled (if so
> did you try without it)? Is it 'a real' box or something virtual?
>
> Thanks,
> Jarek P.
>


Attachments:
config (97.93 kB)

2009-07-06 14:19:41

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Mon, Jul 06, 2009 at 05:53:51AM +0100, Joao Correia wrote:
> Hello
>
> System freezes immediatly after grub, no init processing at all, after
> applying those patches on top of vanilla 2.6.30 on my box.
...
> doesnt work on top of 2.6.30. It complains, while compiling, that
> sysctl_timer_migration is not defined. So i just replaced that call
> with return 1, like on the not debug case. Hope this doesnt defeat
> your test case, but it wouldnt compile otherwise. Probably that was
> just introduced after 2.6.30?

Yes, my bad, sorry. I've found 2 more patches from this series; I can't
guarantee that's all, but it seems to work & migrate within my one and
only core without any problems ;-)

Jarek P.


Attachments:
(No filename) (705.00 B)
timers.597d027 (5.45 kB)
timers.5c333864a (3.51 kB)
timers.cd1bb94b4a (1.97 kB)
timers.eea08f32 (6.95 kB)

2009-07-06 16:13:40

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Monday 06 July 2009 16:19:16 Jarek Poplawski wrote:
> On Mon, Jul 06, 2009 at 05:53:51AM +0100, Joao Correia wrote:
> > Hello
> >
> > System freezes immediatly after grub, no init processing at all, after
> > applying those patches on top of vanilla 2.6.30 on my box.
>
> ...
>
> > doesnt work on top of 2.6.30. It complains, while compiling, that
> > sysctl_timer_migration is not defined. So i just replaced that call
> > with return 1, like on the not debug case. Hope this doesnt defeat
> > your test case, but it wouldnt compile otherwise. Probably that was
> > just introduced after 2.6.30?

I stupidly sent two emails in private to Jarek. Reposting here:

Jarek:
> > > > > Yes, my bad, sorry. I've found 2 more patches from this series; can't
> > > > > guarantee that's all, but seems to work & migrate within my one and
> > > > > only core without any problems ;-)
Andres:
> > > > I have some doubt that this will give us new information:
> > > > The commit i bisected the failure to:
> > > > eea08f32adb3f97553d49a4f79a119833036000a
> > > > Is just 2.6.30-rc4 + the four commits you listed...
Jarek:
> > > I guess, you mean 2.6.31-rc1?
Andres:
> > No - I tested the timer development branch to exclude that it's a problem caused
> > by some other change between 2.6.30 and 2.6.31-git
> > And that branch is based on rc4...
Jarek:
> I misunderstood, sorry! That's just what I needed to know!

Andres:
> > > > And I separately tested eea08f32adb3f97553d49a4f79a119833036000a^ to
> > > > be sure. So I am pretty sure it's those commits which trigger the
> > > > problem - what's causing it is another matter.
Jarek:
> > > It might be true, but it isn't 100% proof. This patchset is special:
> > > by moving timers to other cores it generates much more SMP concurrency,
> > > so it could trigger some hidden races, which otherwise need much more
> > > time to show up. So I'm trying to establish if this could be the case.
> > > Btw., I guess there is nothing to hide from the lists, plus somebody
> > > could verify this idea?
Andres:
> > No, absolutely not. Just hit the wrong key. Sorry.
> > Btw, I ran netem with delay for more than 48h on around 80mbit... That
> > does not exclude such a rarely triggered race, but makes it a bit more
> > unlikely. (With migration that's around 3 sec or so)
> This is very important information: it should give the timer guys some
> incentive to start looking into this, and me less incentive to verify
> the network code ;-)
Jarek:
> Btw., there were some strange traces of lockdep and stack overrunning;
> did you check whether, without lockdep, there are maybe some more readable
> warnings?
Lockdep was not enabled at first. Actually, I think most if not all of the
traces I posted at first were without it.

Will verify.
> And once again, consider resending this to the public, please. (At
> least Joao might be interested.)
Sorry once more.


Andres

2009-07-06 16:31:30

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Mon, Jul 06, 2009 at 06:13:29PM +0200, Andres Freund wrote:
...
> > > Btw, I ran netem with delay for more than 48h on around 80mbit... That
> > > does not exclude such a rarely triggered race, but makes it a bit more
> > > unlikely. (With migration thats around 3sec or so)
...
> Sorry once more.

Andres, the bisection + the above - you did 'the whole lotta work' and
you shouldn't be sorry at all! ;-)

Thanks again,
Jarek P.

2009-07-06 17:23:48

by Joao Correia

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

Hello

Since I already had the kernel compiled and ready to boot when I read
this, I gave it a go anyway :-).

I can reproduce the freeze with those 4 patches applied, so I can
confirm that it is, at least, related to, or exposed by, those patches.
There must be something else too, or it's just too much fuzziness, but
the freeze takes a bit more time (approximately five minutes, give or
take) compared to the instant freeze before - but it is there with the
patches, and without them, no freeze.

I assume there isn't a "safe" way to get them out of the current .31-rc's, right?

Thank you very much for your time,
Joao Correia


On Mon, Jul 6, 2009 at 5:31 PM, Jarek Poplawski<[email protected]> wrote:
> On Mon, Jul 06, 2009 at 06:13:29PM +0200, Andres Freund wrote:
> ...
>> > > Btw, I ran netem with delay for more than 48h on around 80mbit... That
>> > > does not exclude such a rarely triggered race, but makes it a bit more
>> > > unlikely. (With migration thats around 3sec or so)
> ...
>> Sorry once more.
>
> Andres, the bisection + the above - you did 'the whole lotta work' and
> you shouldn't be sorry at all! ;-)
>
> Thanks again,
> Jarek P.
>

2009-07-06 17:24:58

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Monday 06 July 2009 18:31:06 Jarek Poplawski wrote:
> On Mon, Jul 06, 2009 at 06:13:29PM +0200, Andres Freund wrote:
> > > > Btw, I ran netem with delay for more than 48h on around 80mbit...
> > > > That does not exclude such a rarely triggered race, but makes it a
> > > > bit more unlikely. (With migration thats around 3sec or so)
> > Sorry once more.
> Andres, the bisection + the above - you did 'the whole lotta work' and
> you shouldn't be sorry at all! ;-)
To be fair, I have to admit I did run netem for 48 hours for other reasons ;-)
I also ran my modified netem, but all it currently does is adjust
delay/jitter by other means, so that should have no influence.
I am trying to produce artificial loss patterns similar to naturally occurring
network loss.

I just switched mail clients, btw ;-)

ATM I have no further ideas, except maybe trying different timers or a lower
timer resolution?

Andres

2009-07-06 17:26:52

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Monday 06 July 2009 19:23:18 Joao Correia wrote:
> Hello
>
> Since i already had the kernel compiled and ready to boot when i read
> this, i gave it a go anyway :-).
>
> I can reproduce the freeze with those 4 patches applied, so i can
> confirm that its, at least, related to, or exposed by, those patches.
> There must be something else too, or its just too much fuzziness, but
> the freeze takes a bit more time (approximately five minutes, give or
> take) compared to the instant freeze before, but its there with the
> patches, and without them, no freeze.
>
> I assume there isnt a "safe" way to get them out of current .31-rc's,
> right?
`echo 0 > /proc/sys/kernel/timer_migration` should mitigate the problem.

Andres

2009-07-07 06:50:38

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 (possibly caused by netem)

On Mon, Jul 06, 2009 at 07:26:43PM +0200, Andres Freund wrote:
> On Monday 06 July 2009 19:23:18 Joao Correia wrote:
> > Hello
> >
> > Since i already had the kernel compiled and ready to boot when i read
> > this, i gave it a go anyway :-).
> >
> > I can reproduce the freeze with those 4 patches applied, so i can
> > confirm that its, at least, related to, or exposed by, those patches.
> > There must be something else too, or its just too much fuzziness, but
> > the freeze takes a bit more time (approximately five minutes, give or
> > take) compared to the instant freeze before, but its there with the
> > patches, and without them, no freeze.
> >
> > I assume there isnt a "safe" way to get them out of current .31-rc's,
> > right?
> `echo 0 > /proc/sys/kernel/timer_migration` should mitigate the problem.

I guess it should fix it entirely. Btw., here is a patch disabling the
timers part, to make it hrtimers-only. Could you try it?

Thanks,
Jarek P.
---

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..011429c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -634,7 +634,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 
 	cpu = smp_processor_id();
 
-#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+#if 0
 	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
 		int preferred_cpu = get_nohz_load_balancer();

2009-07-07 10:40:45

by Joao Correia

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

I am now running 2.6.31-rc2 for a couple of hours, no freeze.

Let me know what/if I can help with tracking down the original source
of the problem.

Thank you very much for your time,
Joao Correia

On Tue, Jul 7, 2009 at 7:50 AM, Jarek Poplawski<[email protected]> wrote:
> On Mon, Jul 06, 2009 at 07:26:43PM +0200, Andres Freund wrote:
>> On Monday 06 July 2009 19:23:18 Joao Correia wrote:
>> > Hello
>> >
>> > Since i already had the kernel compiled and ready to boot when i read
>> > this, i gave it a go anyway :-).
>> >
>> > I can reproduce the freeze with those 4 patches applied, so i can
>> > confirm that its, at least, related to, or exposed by, those patches.
>> > There must be something else too, or its just too much fuzziness, but
>> > the freeze takes a bit more time (approximately five minutes, give or
>> > take) compared to the instant freeze before, but its there with the
>> > patches, and without them, no freeze.
>> >
>> > I assume there isnt a "safe" way to get them out of current .31-rc's,
>> > right?
>> `echo 0 > /proc/sys/kernel/timer_migration` should mitigate the problem.
>
> I guess it should fix it entirely. Btw., here is a patch disabling the
> timers' part, so to make it hrtimers only. Could you try?
>
> Thanks,
> Jarek P.
> ---
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 0b36b9e..011429c 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -634,7 +634,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>
> 	cpu = smp_processor_id();
> 
> -#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
> +#if 0
> 	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
> 		int preferred_cpu = get_nohz_load_balancer();
>
>

2009-07-07 10:47:29

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Tuesday 07 July 2009 12:40:16 Joao Correia wrote:
> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>
> Let me know what/if i can help with tracking down the original source
> of the problem.
You don't see the problem anymore with the `echo 0 >
/proc/sys/kernel/timer_migration` change (or, equivalently, with the patch from
Jarek), or has the problem vanished completely?

Andres

2009-07-07 13:18:38

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>
> Let me know what/if i can help with tracking down the original source
> of the problem.

OK, so we know it's only about the regular timers. Here is another tiny patch
(the previous one should be removed), which could tell us (with an oops) if
something goes wrong while migrating. Anyway, the bug should be back :-(

Thanks,
Jarek P.

2009-07-07 13:20:50

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

Sorry, here is that tiny patch!

On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>
> Let me know what/if i can help with tracking down the original source
> of the problem.

OK, so we know it's only about the regular timers. Here is another tiny patch
(the previous one should be removed), which could tell us (with an oops) if
something goes wrong while migrating. Anyway, the bug should be back :-(

Thanks,
Jarek P.
---

kernel/timer.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..61ba855 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -658,6 +658,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 			spin_unlock(&base->lock);
 			base = new_base;
 			spin_lock(&base->lock);
+			BUG_ON(tbase_get_base(timer->base));
 			timer_set_base(timer, base);
 		}
 	}

2009-07-07 13:22:22

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Tuesday 07 July 2009 15:18:03 Jarek Poplawski wrote:
> On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> > I am now running 2.6.31-rc2 for a couple of hours, no freeze.
> > Let me know what/if i can help with tracking down the original source
> > of the problem.
> OK, so we know it's only about timers. Here is another tiny patch
> (the previous one should be removed), which could tell (with oops) if
> there's something while migrating. Anyway, the bug should be back :-(
How do we know this? It still could be a race uncovered by timer migration,
right?

Andres

PS: You forgot the patch ;-)

2009-07-07 13:30:59

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Tue, Jul 07, 2009 at 03:22:06PM +0200, Andres Freund wrote:
> On Tuesday 07 July 2009 15:18:03 Jarek Poplawski wrote:
> > On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> > > I am now running 2.6.31-rc2 for a couple of hours, no freeze.
> > > Let me know what/if i can help with tracking down the original source
> > > of the problem.
> > OK, so we know it's only about timers. Here is another tiny patch
> > (the previous one should be removed), which could tell (with oops) if
> > there's something while migrating. Anyway, the bug should be back :-(
> How do we know this? It still could be a race uncovered by timer migration,
> right?

Right. But (rather) not by or in hrtimers.

> PS: You forgot the patch ;-)

Yes, I hope you got it already ;-)

Thanks,
Jarek P.

2009-07-07 13:34:22

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Tuesday 07 July 2009 15:29:37 Jarek Poplawski wrote:
> On Tue, Jul 07, 2009 at 03:22:06PM +0200, Andres Freund wrote:
> > On Tuesday 07 July 2009 15:18:03 Jarek Poplawski wrote:
> > > On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> > > > I am now running 2.6.31-rc2 for a couple of hours, no freeze.
> > > > Let me know what/if i can help with tracking down the original source
> > > > of the problem.
> > >
> > > OK, so we know it's only about timers. Here is another tiny patch
> > > (the previous one should be removed), which could tell (with oops) if
> > > there's something while migrating. Anyway, the bug should be back :-(
> > PS: You forgot the patch ;-)
> Yes, I hope you got it already ;-)
Yes. Can't reboot that machine right now, will test later.

Testing whether it's triggerable inside a VM might be interesting...

Andres

2009-07-07 13:58:26

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
...
> Testing wether its triggerable inside a vm might be interesting...

Probably similar to testing without this patch, or even less so. Maybe
I should've warned you, but this type of bug in an -rc kernel, with possible
memory or stack overwrites, might be fatal for your data (at least).

Jarek P.

2009-07-07 16:11:42

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> ...
> > Testing wether its triggerable inside a vm might be interesting...
> Probably similarly to testing without this patch or even less. Maybe
> I should've warned you but this type of bugs in -rc with possible
> memory or stack overwrites might be fatal for your data (at least).
Fortunately, all the data on that machine should be either replaceable or
regularly backed up.

Will test later today whether that patch triggers the BUG_ON.

Andres

2009-07-08 08:09:33

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Tue, Jul 07, 2009 at 06:11:27PM +0200, Andres Freund wrote:
> On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> > On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> > ...
> > > Testing wether its triggerable inside a vm might be interesting...
> > Probably similarly to testing without this patch or even less. Maybe
> > I should've warned you but this type of bugs in -rc with possible
> > memory or stack overwrites might be fatal for your data (at least).
> Fortunately all the data on that machine should either be replaceable or
> regularly backuped.
>
> Will test later today if that patch bugs.

If you haven't started yet, it would be nice to use this, btw:

CONFIG_HOTPLUG_CPU = N
CONFIG_DEBUG_OBJECTS = Y
CONFIG_DEBUG_OBJECTS_TIMERS = Y

Thanks,
Jarek P.

2009-07-08 08:29:51

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Wednesday 08 July 2009 10:08:52 Jarek Poplawski wrote:
> On Tue, Jul 07, 2009 at 06:11:27PM +0200, Andres Freund wrote:
> > On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> > > On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> > > ...
> > >
> > > > Testing wether its triggerable inside a vm might be interesting...
> > >
> > > Probably similarly to testing without this patch or even less. Maybe
> > > I should've warned you but this type of bugs in -rc with possible
> > > memory or stack overwrites might be fatal for your data (at least).
> >
> > Fortunately all the data on that machine should either be replaceable or
> > regularly backuped.
> >
> > Will test later today if that patch bugs.
>
> If you didn't start yet, it would be nice to use this, btw:
>
> CONFIG_HOTPLUG_CPU = N
> CONFIG_DEBUG_OBJECTS = Y
> CONFIG_DEBUG_OBJECTS_TIMERS = Y
So I should test with a single CPU? Or is there a config where HOTPLUG_CPU does
not imply !SMP?

Andres

2009-07-08 09:14:31

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Wed, Jul 08, 2009 at 10:29:34AM +0200, Andres Freund wrote:
> On Wednesday 08 July 2009 10:08:52 Jarek Poplawski wrote:
> > On Tue, Jul 07, 2009 at 06:11:27PM +0200, Andres Freund wrote:
> > > On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> > > > On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> > > > ...
> > > >
> > > > > Testing wether its triggerable inside a vm might be interesting...
> > > >
> > > > Probably similarly to testing without this patch or even less. Maybe
> > > > I should've warned you but this type of bugs in -rc with possible
> > > > memory or stack overwrites might be fatal for your data (at least).
> > >
> > > Fortunately all the data on that machine should either be replaceable or
> > > regularly backuped.
> > >
> > > Will test later today if that patch bugs.
> >
> > If you didn't start yet, it would be nice to use this, btw:
> >
> > CONFIG_HOTPLUG_CPU = N
> > CONFIG_DEBUG_OBJECTS = Y
> > CONFIG_DEBUG_OBJECTS_TIMERS = Y
> So I should test with a single cpu? Or is there a config where HOTPLUG_CPU does
> not imply !SMP?

No, my single CPU should be enough ;-) There is something wrong, I guess.
I can see in my menuconfig:

SMP [=y]
...
HOTPLUG [=n]
...
HOTPLUG_CPU [=y]
...
Depends on SMP && HOTPLUG

So, let it be HOTPLUG_CPU = Y for now...

Jarek P.

2009-07-08 21:45:25

by Joao Correia

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

Hello again

On Tue, Jul 7, 2009 at 11:47 AM, Andres Freund<[email protected]> wrote:
> On Tuesday 07 July 2009 12:40:16 Joao Correia wrote:
>> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>>
>> Let me know what/if i can help with tracking down the original source
>> of the problem.
> You dont see the problem anymore with the `echo 0 >
> /proc/sys/kernel/timer_migration` change (or equivalently with the patch from
> Jarek) or has the problem vanished completely?
>
> Andres
>
> On Tuesday 07 July 2009 13:03:50 Joao Correia wrote:
>> I dont see the problem with the patch from Jarek


I have to correct this information.
I had inserted `echo 0 >> /proc/sys/kernel/timer_migration` into
rc.local, and I left it there when I applied your first patch.

I'm talking about this patch:

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..011429c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -634,7 +634,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,

 	cpu = smp_processor_id();

-#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+#if 0

After removing the line from rc.local, and leaving only the patch, the
freeze still happens. The patch -does not- prevent the freeze. It was
my mistake saying it does; I totally forgot I had added that line to
rc.local.

So again, the only thing that stops the freeze is `echo 0 >>
/proc/sys/kernel/timer_migration`. Apologies for pointing you in the
wrong direction.

I also tried the other patch provided:

kernel/timer.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..61ba855 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -658,6 +658,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
 			spin_unlock(&base->lock);
 			base = new_base;
 			spin_lock(&base->lock);
+			BUG_ON(tbase_get_base(timer->base));
 			timer_set_base(timer, base);
 		}
 	}

but the oops never triggers, either with your first patch or with the
echo 0 > proc[...]

I was under the impression that disabling the entry in /proc or
applying the first patch would provide the same result, but alas, it
does not.

Joao Correia

[PS: I'm providing the patches in this email to give context, so that
people don't get lost]

2009-07-08 22:08:24

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Wed, Jul 08, 2009 at 10:44:47PM +0100, Joao Correia wrote:
> Hello again
Hello!

...
> So again, the only thing that stops that freeze is `echo 0 >>
> /proc/sys/kernel/timer_migration`. Apologies for pointing you in the
> wrong direction.

No problem: the direction is almost right, we only need one U-turn ;-)
In case you're not too bored, here is one little patch to check the
other side (after reverting the previous patch).

Thanks,
Jarek P.
---

kernel/hrtimer.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 9002958..23387e4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -203,7 +203,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
 	int cpu, preferred_cpu = -1;
 
 	cpu = smp_processor_id();
-#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+#if 0
 	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
 		preferred_cpu = get_nohz_load_balancer();
 		if (preferred_cpu >= 0)

2009-07-08 22:23:34

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Wednesday 08 July 2009 10:08:52 Jarek Poplawski wrote:
> On Tue, Jul 07, 2009 at 06:11:27PM +0200, Andres Freund wrote:
> > On Tuesday 07 July 2009 15:57:42 Jarek Poplawski wrote:
> > > On Tue, Jul 07, 2009 at 03:34:07PM +0200, Andres Freund wrote:
> > > ...
> > >
> > > > Testing wether its triggerable inside a vm might be interesting...
> > >
> > > Probably similarly to testing without this patch or even less. Maybe
> > > I should've warned you but this type of bugs in -rc with possible
> > > memory or stack overwrites might be fatal for your data (at least).
> >
> > Fortunately all the data on that machine should either be replaceable or
> > regularly backuped.
> >
> > Will test later today if that patch bugs.
>
> If you didn't start yet, it would be nice to use this, btw:
> CONFIG_HOTPLUG_CPU = N
> CONFIG_DEBUG_OBJECTS = Y
> CONFIG_DEBUG_OBJECTS_TIMERS = Y
Unfortunately this just yields the same backtraces during the softlockup, and
nothing earlier.
I did not test without lockdep yet, but that should not have stopped the BUG_ON
from appearing, right?


Andres


Attachments:
trace.txt (3.57 kB)

2009-07-08 22:28:06

by Joao Correia

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Wed, Jul 8, 2009 at 11:07 PM, Jarek Poplawski<[email protected]> wrote:
> On Wed, Jul 08, 2009 at 10:44:47PM +0100, Joao Correia wrote:
>> Hello again
> Hello!
>
> ...
>> So again, the only thing that stops that freeze is `echo 0 >>
>> /proc/sys/kernel/timer_migration`. Apologies for pointing you in the
>> wrong direction.
>
> No problem: the direction is almost right, we only need one U-turn ;-)
> In case you're not bored or too bored, one little patch to check the
> other side (after reverting the previous patch).
>
> Thanks,
> Jarek P.
> ---
>
>  kernel/hrtimer.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index 9002958..23387e4 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -203,7 +203,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
> 	int cpu, preferred_cpu = -1;
>
> 	cpu = smp_processor_id();
> -#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
> +#if 0
> 	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
> 		preferred_cpu = get_nohz_load_balancer();
> 		if (preferred_cpu >= 0)
>

(this time I triple-checked :-) )

So, with only this last patch applied, no freeze. No need to disable
anything through /proc.

Where should I put the BUG_ON?

Joao Correia

2009-07-08 22:43:44

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Wed, Jul 08, 2009 at 11:27:30PM +0100, Joao Correia wrote:
> On Wed, Jul 8, 2009 at 11:07 PM, Jarek Poplawski<[email protected]> wrote:
> > On Wed, Jul 08, 2009 at 10:44:47PM +0100, Joao Correia wrote:
> >> Hello again
> > Hello!
> >
> > ...
> >> So again, the only thing that stops that freeze is `echo 0 >>
> >> /proc/sys/kernel/timer_migration`. Apologies for pointing you in the
> >> wrong direction.
> >
> > No problem: the direction is almost right, we only need one U-turn ;-)
> > In case you're not bored or too bored, one little patch to check the
> > other side (after reverting the previous patch).
> >
> > Thanks,
> > Jarek P.
> > ---
> >
> >  kernel/hrtimer.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> > index 9002958..23387e4 100644
> > --- a/kernel/hrtimer.c
> > +++ b/kernel/hrtimer.c
> > @@ -203,7 +203,7 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
> > 	int cpu, preferred_cpu = -1;
> >
> > 	cpu = smp_processor_id();
> > -#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
> > +#if 0
> > 	if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
> > 		preferred_cpu = get_nohz_load_balancer();
> > 		if (preferred_cpu >= 0)
> >
>
> (this time i triple-checked :-) )
>
> So, with only this last patch applied, no freeze. No need to disable
> anything through /proc.
>
> Where should i put the BUG_ON?

Hmm... Not so fast! I've looked into the timers till now; "tomorrow" I'll
"change resolution". ;-)

Thanks again,
Jarek P.

2009-07-08 22:48:39

by Joao Correia

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

>> (this time i triple-checked :-) )
>>
>> So, with only this last patch applied, no freeze. No need to disable
>> anything through /proc.
>>
>> Where should i put the BUG_ON?
>
> Hmm... Not so fast! I've looked in timers till now; "tomorrow" I'll
> "change resolution". ;-)
>
> Thanks again,
> Jarek P.
>

Of course :-)

Thanks for looking into this.
Joao Correia

2009-07-08 22:54:28

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, Jul 09, 2009 at 12:23:17AM +0200, Andres Freund wrote:
...
> Unfortunately this just yields the same backtraces during softlockup and not
> earlier.
> I did not test without lockdep yet, but that should not have stopped the BUG
> from appearing, right?

Since it looks like hrtimers now, these changes in timers shouldn't
matter. Let's wait for new ideas.

Thanks for testing anyway,
Jarek P.

2009-07-09 10:32:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> On Thu, Jul 09, 2009 at 12:23:17AM +0200, Andres Freund wrote:
> ...
> > Unfortunately this just yields the same backtraces during softlockup and not
> > earlier.
> > I did not test without lockdep yet, but that should not have stopped the BUG
> > from appearing, right?
>
> Since it looks like hrtimers now, these changes in timers shouldn't
> matter. Let's wait for new ideas.

Some background:

Up to 2.6.30, hrtimer_start() and add_timer() enqueue (hr)timers on the
CPU on which the functions are called. There is one exception: when the
timer callback is currently running on another CPU, the timer is enqueued
on that other CPU.

The migration patches change that behaviour and enqueue the timer on
the nohz.idle_balancer CPU when parts of the system are idle.

With the migration code disabled (via sysctl or the #if 0 patch) the
timer is always enqueued on the same CPU, i.e. you get the 2.6.30
behaviour back.

As you found out, it is probably related to hrtimers. Checking the
network code, the only hrtimer users are in net/sched/sch_api.c and
net/sched/sch_cbq.c. There are some in net/can as well, but that's
probably irrelevant for the problem at hand.

I'm not familiar with that code, so I have no clue which problems
might pop up due to enqueueing the timer on another CPU, but there is
one pretty suspicious code sequence in cbq_ovl_delay():

	expires = ktime_set(0, 0);
	expires = ktime_add_ns(expires, PSCHED_US2NS(sched));
	if (hrtimer_try_to_cancel(&q->delay_timer) &&
	    ktime_to_ns(ktime_sub(
			hrtimer_get_expires(&q->delay_timer),
			expires)) > 0)
		hrtimer_set_expires(&q->delay_timer, expires);
	hrtimer_restart(&q->delay_timer);

So we set the expiry value of the timer only when the timer was active
(hrtimer_try_to_cancel() returned != 0) and the new expiry time is
before the expiry time which was in the active timer. If the timer was
inactive we start the timer with the last expiry time which is
probably already in the past.

I'm quite sure that this is not causing the migration problem, because
we do not enqueue it on a different CPU when the timer is already
expired.

For completeness: hrtimer_try_to_cancel() can return -1 when the timer
callback is running. So in that case we also fiddle with the expiry
value and restart the timer while the callback code itself might do
the same. There is no serialization between that code and the callback,
it seems. The watchdog timer callback in sch_api.c is not serialized
either.
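
(Illustration only, not from the original mail: a minimal sketch of how the
hrtimer_try_to_cancel() return values referred to above are usually
distinguished. The helper name and the -EBUSY convention are invented for
the example.)

#include <linux/errno.h>
#include <linux/hrtimer.h>

/*
 * Hypothetical helper: hrtimer_try_to_cancel() returns 1 if the timer
 * was queued and has been removed, 0 if it was not queued at all, and
 * -1 if the callback is currently running on some CPU.
 */
static int example_try_cancel_and_reprogram(struct hrtimer *timer)
{
	int ret = hrtimer_try_to_cancel(timer);

	if (ret < 0) {
		/*
		 * The callback is running right now; touching the expiry
		 * here would race with it, which is exactly the missing
		 * serialization described above.
		 */
		return -EBUSY;
	}

	/* ret is 0 or 1: the timer is inactive now, safe to re-program */
	return 0;
}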

There is another oddity in cbq_undelay() which is the hrtimer callback
function:

	if (delay) {
		ktime_t time;

		time = ktime_set(0, 0);
		time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay));
		hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS);

The canonical way to restart a hrtimer from the callback function is to
set the expiry value and return HRTIMER_RESTART (see the sketch below).

	}

	sch->flags &= ~TCQ_F_THROTTLED;
	__netif_schedule(qdisc_root(sch));
	return HRTIMER_NORESTART;

Again, this should not cause the timer to be enqueued on another CPU
as we do not enqueue on a different CPU when the callback is running,
but see above ...

I have the feeling that the code relies on some implicit CPU
boundedness, which is no longer guaranteed with the timer migration
changes, but that's a question for the network experts.

Thanks,

tglx

2009-07-09 10:44:59

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, Jul 09, 2009 at 12:31:53PM +0200, Thomas Gleixner wrote:
> On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > On Thu, Jul 09, 2009 at 12:23:17AM +0200, Andres Freund wrote:
> > ...
> > > Unfortunately this just yields the same backtraces during softlockup and not
> > > earlier.
> > > I did not test without lockdep yet, but that should not have stopped the BUG
> > > from appearing, right?
> >
> > Since it looks like hrtimers now, these changes in timers shouldn't
> > matter. Let's wait for new ideas.
>
> Some background:
...
> There is another oddity in cbq_undelay() which is the hrtimer callback
> function:
>
> if (delay) {
> ktime_t time;
>
> time = ktime_set(0, 0);
> time = ktime_add_ns(time, PSCHED_TICKS2NS(now + delay));
> hrtimer_start(&q->delay_timer, time, HRTIMER_MODE_ABS);
>
> The canocial way to restart a hrtimer from the callback function is to
> set the expiry value and return HRTIMER_RESTART.

OK, that's for later, because we aren't using cbq here.

>
> }
>
> sch->flags &= ~TCQ_F_THROTTLED;
> __netif_schedule(qdisc_root(sch));
> return HRTIMER_NORESTART;
>
> Again, this should not cause the timer to be enqueued on another CPU
> as we do not enqueue on a different CPU when the callback is running,
> but see above ...
>
> I have the feeling that the code relies on some implicit cpu
> boundness, which is not longer guaranteed with the timer migration
> changes, but that's a question for the network experts.

As a matter of fact, I've just looked at this __netif_schedule(),
which really is CPU bound, so you might be 100% right.

Thanks for your help,
Jarek P.

2009-07-09 12:04:26

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> >
> > I have the feeling that the code relies on some implicit cpu
> > boundness, which is not longer guaranteed with the timer migration
> > changes, but that's a question for the network experts.
>
> As a matter of fact, I've just looked at this __netif_schedule(),
> which really is cpu bound, so you might be 100% right.

So the watchdog is the one which causes the trouble. The patch below
should fix this.

Thanks,

tglx
---

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 24d17ce..fbe554f 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -485,7 +485,7 @@ void qdisc_watchdog_schedule(struct qdisc_watchdog *wd, psched_time_t expires)
 	wd->qdisc->flags |= TCQ_F_THROTTLED;
 	time = ktime_set(0, 0);
 	time = ktime_add_ns(time, PSCHED_TICKS2NS(expires));
-	hrtimer_start(&wd->timer, time, HRTIMER_MODE_ABS);
+	hrtimer_start(&wd->timer, time, HRTIMER_MODE_ABS_PINNED);
 }
EXPORT_SYMBOL(qdisc_watchdog_schedule);
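
(Illustration only: a minimal sketch, with an invented helper name, of what
the one-line change above does. HRTIMER_MODE_ABS_PINNED bypasses the
migration logic, so the timer stays on the CPU that armed it, and the
watchdog callback, which ends up in the CPU-bound __netif_schedule()
discussed earlier, runs on that same CPU.)

#include <linux/hrtimer.h>

/* Hypothetical wrapper mirroring the fixed qdisc_watchdog_schedule() call */
static void example_arm_pinned(struct hrtimer *timer, u64 expires_ns)
{
	ktime_t time = ktime_set(0, 0);

	time = ktime_add_ns(time, expires_ns);
	/* absolute expiry, pinned to the local CPU */
	hrtimer_start(timer, time, HRTIMER_MODE_ABS_PINNED);
}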

2009-07-09 13:23:26

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, Jul 09, 2009 at 02:03:50PM +0200, Thomas Gleixner wrote:
> On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > >
> > > I have the feeling that the code relies on some implicit cpu
> > > boundness, which is not longer guaranteed with the timer migration
> > > changes, but that's a question for the network experts.
> >
> > As a matter of fact, I've just looked at this __netif_schedule(),
> > which really is cpu bound, so you might be 100% right.
>
> So the watchdog is the one which causes the trouble. The patch below
> should fix this.

I hope so. On the other hand, it seems it should still work even with this
migration, so it probably needs additional debugging.

Thanks,
Jarek P.

> ---
>
> diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
> index 24d17ce..fbe554f 100644
> --- a/net/sched/sch_api.c
> +++ b/net/sched/sch_api.c
> @@ -485,7 +485,7 @@ void qdisc_watchdog_schedule(struct qdisc_watchdog *wd, psched_time_t expires)
> wd->qdisc->flags |= TCQ_F_THROTTLED;
> time = ktime_set(0, 0);
> time = ktime_add_ns(time, PSCHED_TICKS2NS(expires));
> - hrtimer_start(&wd->timer, time, HRTIMER_MODE_ABS);
> + hrtimer_start(&wd->timer, time, HRTIMER_MODE_ABS_PINNED);
> }
> EXPORT_SYMBOL(qdisc_watchdog_schedule);
>

2009-07-09 14:15:57

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> On Thu, Jul 09, 2009 at 02:03:50PM +0200, Thomas Gleixner wrote:
> > On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > > >
> > > > I have the feeling that the code relies on some implicit cpu
> > > > boundness, which is not longer guaranteed with the timer migration
> > > > changes, but that's a question for the network experts.
> > >
> > > As a matter of fact, I've just looked at this __netif_schedule(),
> > > which really is cpu bound, so you might be 100% right.
> >
> > So the watchdog is the one which causes the trouble. The patch below
> > should fix this.
>
> I hope so. On the other hand it seems it should work with this
> migration yet, so it probably needs additional debugging.

Right. I just provided the patch to narrow down the problem, but
please test the fix of the hrtimer migration code which I sent out a
bit earlier: http://lkml.org/lkml/2009/7/9/150

It fixes a possible endless loop in the timer code which is related to
the migration changes. Looking at the backtraces of the spinlock
lockup I think that is what you hit.

spin_lock(root_lock);
qdisc_run(q);
__qdisc_run(q);
dequeue_skb(q);
q->dequeue(q);
qdisc_watchdog_schedule();
hrtimer_start();
switch_hrtimer_base(); <- loops forever

Now the other CPU is stuck in dev_xmit() spin_lock(root_lock)

Thanks,

tglx

2009-07-09 14:24:44

by Jarek Poplawski

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, Jul 09, 2009 at 04:15:28PM +0200, Thomas Gleixner wrote:
> On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > On Thu, Jul 09, 2009 at 02:03:50PM +0200, Thomas Gleixner wrote:
> > > On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > > > >
> > > > > I have the feeling that the code relies on some implicit cpu
> > > > > boundness, which is not longer guaranteed with the timer migration
> > > > > changes, but that's a question for the network experts.
> > > >
> > > > As a matter of fact, I've just looked at this __netif_schedule(),
> > > > which really is cpu bound, so you might be 100% right.
> > >
> > > So the watchdog is the one which causes the trouble. The patch below
> > > should fix this.
> >
> > I hope so. On the other hand it seems it should work with this
> > migration yet, so it probably needs additional debugging.
>
> Right. I just provided the patch to narrow down the problem, but
> please test the fix of the hrtimer migration code which I sent out a
> bit earlier: http://lkml.org/lkml/2009/7/9/150
>
> It fixes a possible endless loop in the timer code which is related to
> the migration changes. Looking at the backtraces of the spinlock
> lockup I think that is what you hit.

Actually, Andres and Joao hit this, and I hope they'll try these two
patches.

Thanks,
Jarek P.

2009-07-09 14:26:29

by Joao Correia

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, Jul 9, 2009 at 3:24 PM, Jarek Poplawski<[email protected]> wrote:
> On Thu, Jul 09, 2009 at 04:15:28PM +0200, Thomas Gleixner wrote:
>> On Thu, 9 Jul 2009, Jarek Poplawski wrote:
>> > On Thu, Jul 09, 2009 at 02:03:50PM +0200, Thomas Gleixner wrote:
>> > > On Thu, 9 Jul 2009, Jarek Poplawski wrote:
>> > > > >
>> > > > > I have the feeling that the code relies on some implicit cpu
>> > > > > boundness, which is not longer guaranteed with the timer migration
>> > > > > changes, but that's a question for the network experts.
>> > > >
>> > > > As a matter of fact, I've just looked at this __netif_schedule(),
>> > > > which really is cpu bound, so you might be 100% right.
>> > >
>> > > So the watchdog is the one which causes the trouble. The patch below
>> > > should fix this.
>> >
>> > I hope so. On the other hand it seems it should work with this
>> > migration yet, so it probably needs additional debugging.
>>
>> Right. I just provided the patch to narrow down the problem, but
>> please test the fix of the hrtimer migration code which I sent out a
>> bit earlier: http://lkml.org/lkml/2009/7/9/150
>>
>> It fixes a possible endless loop in the timer code which is related to
>> the migration changes. Looking at the backtraces of the spinlock
>> lockup I think that is what you hit.
>
> Actually, Andres and Joao hit this, and I hope they'll try these two
> patches.
>
> Thanks,
> Jarek P.
>

I can only try later today. Will post back as soon as I do it.

Joao Correia

2009-07-09 14:28:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> On Thu, Jul 09, 2009 at 04:15:28PM +0200, Thomas Gleixner wrote:
> > On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > > On Thu, Jul 09, 2009 at 02:03:50PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > > > > >
> > > > > > I have the feeling that the code relies on some implicit cpu
> > > > > > boundness, which is not longer guaranteed with the timer migration
> > > > > > changes, but that's a question for the network experts.
> > > > >
> > > > > As a matter of fact, I've just looked at this __netif_schedule(),
> > > > > which really is cpu bound, so you might be 100% right.
> > > >
> > > > So the watchdog is the one which causes the trouble. The patch below
> > > > should fix this.
> > >
> > > I hope so. On the other hand it seems it should work with this
> > > migration yet, so it probably needs additional debugging.
> >
> > Right. I just provided the patch to narrow down the problem, but
> > please test the fix of the hrtimer migration code which I sent out a
> > bit earlier: http://lkml.org/lkml/2009/7/9/150
> >
> > It fixes a possible endless loop in the timer code which is related to
> > the migration changes. Looking at the backtraces of the spinlock
> > lockup I think that is what you hit.
>
> Actually, Andres and Joao hit this, and I hope they'll try these two
> patches.

Please test them separately from each other. The one I sent in this
thread was just for narrowing down the issue, but I'm now quite sure
that they really hit the issue which is addressed by the hrtimer
patch.

Thanks,

tglx

2009-07-09 15:28:38

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

Hi,

On Thursday 09 July 2009 16:28:05 Thomas Gleixner wrote:
> On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > On Thu, Jul 09, 2009 at 04:15:28PM +0200, Thomas Gleixner wrote:
> > > On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > > > On Thu, Jul 09, 2009 at 02:03:50PM +0200, Thomas Gleixner wrote:
> > > > > On Thu, 9 Jul 2009, Jarek Poplawski wrote:
> > > > > > > I have the feeling that the code relies on some implicit cpu
> > > > > > > boundness, which is not longer guaranteed with the timer
> > > > > > > migration changes, but that's a question for the network
> > > > > > > experts.
> > > > > >
> > > > > > As a matter of fact, I've just looked at this __netif_schedule(),
> > > > > > which really is cpu bound, so you might be 100% right.
> > > > >
> > > > > So the watchdog is the one which causes the trouble. The patch
> > > > > below should fix this.
> > > >
> > > > I hope so. On the other hand it seems it should work with this
> > > > migration yet, so it probably needs additional debugging.
> > >
> > > Right. I just provided the patch to narrow down the problem, but
> > > please test the fix of the hrtimer migration code which I sent out a
> > > bit earlier: http://lkml.org/lkml/2009/7/9/150
> > >
> > > It fixes a possible endless loop in the timer code which is related to
> > > the migration changes. Looking at the backtraces of the spinlock
> > > lockup I think that is what you hit.
> >
> > Actually, Andres and Joao hit this, and I hope they'll try these two
> > patches.
>
> Please test them separate from each other. The one I sent in this
> thread was just for narrowing down the issue, but I'm now quite sure
> that they really hit the issue which is addressed by the hrtimer
> patch.
No crash yet. 15 minutes running (it crashed within seconds to a minute before).

Will let it run for some hours to be sure.

Nice!

Andres

2009-07-09 16:02:35

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

Andres,

On Thu, 9 Jul 2009, Andres Freund wrote:
> On Thursday 09 July 2009 16:28:05 Thomas Gleixner wrote:
> > Please test them separate from each other. The one I sent in this
> > thread was just for narrowing down the issue, but I'm now quite sure
> > that they really hit the issue which is addressed by the hrtimer
> > patch.
> No crash yet. 15min running (seconds to a minute before).
>
> Will let it run for some hours to be sure.

Which one of the patches are you testing ?

Thanks,

tglx

2009-07-09 16:46:52

by Andres Freund

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thursday 09 July 2009 18:01:56 Thomas Gleixner wrote:
> Andres,
>
> On Thu, 9 Jul 2009, Andres Freund wrote:
> > On Thursday 09 July 2009 16:28:05 Thomas Gleixner wrote:
> > > Please test them separate from each other. The one I sent in this
> > > thread was just for narrowing down the issue, but I'm now quite sure
> > > that they really hit the issue which is addressed by the hrtimer
> > > patch.
> >
> > No crash yet. 15min running (seconds to a minute before).
> >
> > Will let it run for some hours to be sure.
>
> Which one of the patches are you testing ?
Your "real" one, i.e. de907e8432b08f2d5966c36e0747e97c0e596810

1h30m now...


Andres

2009-07-09 17:45:05

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

Andres,

On Thu, 9 Jul 2009, Andres Freund wrote:

> On Thursday 09 July 2009 18:01:56 Thomas Gleixner wrote:
> > Andres,
> >
> > On Thu, 9 Jul 2009, Andres Freund wrote:
> > > On Thursday 09 July 2009 16:28:05 Thomas Gleixner wrote:
> > > > Please test them separate from each other. The one I sent in this
> > > > thread was just for narrowing down the issue, but I'm now quite sure
> > > > that they really hit the issue which is addressed by the hrtimer
> > > > patch.
> > >
> > > No crash yet. 15min running (seconds to a minute before).
> > >
> > > Will let it run for some hours to be sure.
> >
> > Which one of the patches are you testing ?
> Your "real" one, i.e. de907e8432b08f2d5966c36e0747e97c0e596810
>
> 1h30m now...

Looks like I hit the nail on the head. Queueing it up for Linus.

Thanks,

tglx

2009-07-09 21:20:12

by Joao Correia

[permalink] [raw]
Subject: Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem)

On Thu, Jul 9, 2009 at 6:44 PM, Thomas Gleixner<[email protected]> wrote:
> Andres,
>
> On Thu, 9 Jul 2009, Andres Freund wrote:
>
>> On Thursday 09 July 2009 18:01:56 Thomas Gleixner wrote:
>> > Andres,
>> >
>> > On Thu, 9 Jul 2009, Andres Freund wrote:
>> > > On Thursday 09 July 2009 16:28:05 Thomas Gleixner wrote:
>> > > > Please test them separate from each other. The one I sent in this
>> > > > thread was just for narrowing down the issue, but I'm now quite sure
>> > > > that they really hit the issue which is addressed by the hrtimer
>> > > > patch.
>> > >
>> > > No crash yet. 15min running (seconds to a minute before).
>> > >
>> > > Will let it run for some hours to be sure.
>> >
>> > Which one of the patches are you testing ?
>> Your "real" one, i.e. de907e8432b08f2d5966c36e0747e97c0e596810
>>
>> 1h30m now...
>
> Looks like I hit the nail on the head. Queueing it up for Linus.
>
> Thanks,
>
> 	tglx
>

Confirmed working for me as well.

Thank you for your time,
Joao Correia