2012-08-05 16:55:10

by Sasha Levin

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 07/30/2012 03:17 PM, Avi Kivity wrote:
> Possible causes:
> - the APIC calibration in the guest failed, so it is programming too
> low values into the timer
> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
> interrupt injection is slowing it down)
>
> You can try to find out by changing
> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
> (say) 20 microseconds which will let the guest live long enough for you
> to ftrace it and see what kind of timers it is programming.

I've kept trying to narrow it down, and found out It's triggerable using adjtimex().

At that point I've bisected it, and got the following commit (parties Cc'ed):

commit 5baefd6d84163443215f4a99f6a20f054ef11236
Author: John Stultz <[email protected]>
Date: Tue Jul 10 18:43:25 2012 -0400

hrtimer: Update hrtimer base offsets each hrtimer_interrupt


I've also confirmed that reverting the commit above on top of linux-next indeed fixes the issue.


2012-08-06 17:25:26

by john stultz

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/05/2012 09:55 AM, Sasha Levin wrote:
> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>> Possible causes:
>> - the APIC calibration in the guest failed, so it is programming too
>> low values into the timer
>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>> interrupt injection is slowing it down)
>>
>> You can try to find out by changing
>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>> (say) 20 microseconds which will let the guest live long enough for you
>> to ftrace it and see what kind of timers it is programming.
> I've kept trying to narrow it down, and found out It's triggerable using adjtimex().
>
> At that point I've bisected it, and got the following commit (parties Cc'ed):
>
> commit 5baefd6d84163443215f4a99f6a20f054ef11236
> Author: John Stultz <[email protected]>
> Date: Tue Jul 10 18:43:25 2012 -0400
>
> hrtimer: Update hrtimer base offsets each hrtimer_interrupt
>
>
> I've also confirmed that reverting the commit above on top of linux-next indeed fixes the issue.
Hey Sasha,
Thanks for the heads up. I don't have a clear sense of what could
be wrong here yet, but if you see this with 3.6-rc but not 3.5, could
you try the fix(1d17d17484d40f2d5b35c79518597a2b25296996) Ingo just made
on tip/timers/urgent?

thanks
-john


2012-08-06 18:13:01

by john stultz

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/06/2012 10:21 AM, John Stultz wrote:
> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>> Possible causes:
>>> - the APIC calibration in the guest failed, so it is programming too
>>> low values into the timer
>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>> interrupt injection is slowing it down)
>>>
>>> You can try to find out by changing
>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>> (say) 20 microseconds which will let the guest live long enough for you
>>> to ftrace it and see what kind of timers it is programming.
>> I've kept trying to narrow it down, and found out It's triggerable
>> using adjtimex().
>>
>> At that point I've bisected it, and got the following commit (parties
>> Cc'ed):
>>
>> commit 5baefd6d84163443215f4a99f6a20f054ef11236
>> Author: John Stultz <[email protected]>
>> Date: Tue Jul 10 18:43:25 2012 -0400
>>
>> hrtimer: Update hrtimer base offsets each hrtimer_interrupt
>>
>>
>> I've also confirmed that reverting the commit above on top of
>> linux-next indeed fixes the issue.
> Hey Sasha,
> Thanks for the heads up. I don't have a clear sense of what could
> be wrong here yet, but if you see this with 3.6-rc but not 3.5, could
> you try the fix(1d17d17484d40f2d5b35c79518597a2b25296996) Ingo just
> made on tip/timers/urgent?

Reading over the thread here, the large timeouts also made me think that
it could also be related to this pending fix:
http://lkml.org/lkml/2012/8/1/436

Its not a clear cut solution though, since the edge case that limits
usually results in a hang since we stop expiring timers all together.

Still working to reproduce what you're seeing, and will let you know as
soon as I have any more info.

thanks
-john

2012-08-06 18:22:33

by john stultz

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/06/2012 10:21 AM, John Stultz wrote:
> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>> Possible causes:
>>> - the APIC calibration in the guest failed, so it is programming too
>>> low values into the timer
>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>> interrupt injection is slowing it down)
>>>
>>> You can try to find out by changing
>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>> (say) 20 microseconds which will let the guest live long enough for you
>>> to ftrace it and see what kind of timers it is programming.
>> I've kept trying to narrow it down, and found out It's triggerable
>> using adjtimex().
Sorry, one more question: Could you provide details on how is it
trigger-able using adjtimex?

thanks
-john

2012-08-06 18:28:22

by Sasha Levin

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/06/2012 08:20 PM, John Stultz wrote:
> On 08/06/2012 10:21 AM, John Stultz wrote:
>> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>>> Possible causes:
>>>> - the APIC calibration in the guest failed, so it is programming too
>>>> low values into the timer
>>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>>> interrupt injection is slowing it down)
>>>>
>>>> You can try to find out by changing
>>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>>> (say) 20 microseconds which will let the guest live long enough for you
>>>> to ftrace it and see what kind of timers it is programming.
>>> I've kept trying to narrow it down, and found out It's triggerable using adjtimex().
> Sorry, one more question: Could you provide details on how is it trigger-able using adjtimex?

It triggers after a while of fuzzing using trinity of just adjtimex ('./trinity --quiet -l off -cadjtimex').

Trinity is available here: http://git.codemonkey.org.uk/?p=trinity.git .

Let me know if I can help further with reproducing this, I can probably copy over my testing environment to some other host if you'd like.

2012-08-06 19:23:50

by Sasha Levin

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/06/2012 07:21 PM, John Stultz wrote:
> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>> Possible causes:
>>> - the APIC calibration in the guest failed, so it is programming too
>>> low values into the timer
>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>> interrupt injection is slowing it down)
>>>
>>> You can try to find out by changing
>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>> (say) 20 microseconds which will let the guest live long enough for you
>>> to ftrace it and see what kind of timers it is programming.
>> I've kept trying to narrow it down, and found out It's triggerable using adjtimex().
>>
>> At that point I've bisected it, and got the following commit (parties Cc'ed):
>>
>> commit 5baefd6d84163443215f4a99f6a20f054ef11236
>> Author: John Stultz <[email protected]>
>> Date: Tue Jul 10 18:43:25 2012 -0400
>>
>> hrtimer: Update hrtimer base offsets each hrtimer_interrupt
>>
>>
>> I've also confirmed that reverting the commit above on top of linux-next indeed fixes the issue.
> Hey Sasha,
> Thanks for the heads up. I don't have a clear sense of what could be wrong here yet, but if you see this with 3.6-rc but not 3.5, could you try the fix(1d17d17484d40f2d5b35c79518597a2b25296996) Ingo just made on tip/timers/urgent?

I've just tested it, and it didn't solve the issue.

2012-08-06 20:34:16

by john stultz

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/06/2012 11:28 AM, Sasha Levin wrote:
> On 08/06/2012 08:20 PM, John Stultz wrote:
>> On 08/06/2012 10:21 AM, John Stultz wrote:
>>> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>>>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>>>> Possible causes:
>>>>> - the APIC calibration in the guest failed, so it is programming too
>>>>> low values into the timer
>>>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>>>> interrupt injection is slowing it down)
>>>>>
>>>>> You can try to find out by changing
>>>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>>>> (say) 20 microseconds which will let the guest live long enough for you
>>>>> to ftrace it and see what kind of timers it is programming.
>>>> I've kept trying to narrow it down, and found out It's triggerable using adjtimex().
>> Sorry, one more question: Could you provide details on how is it trigger-able using adjtimex?
> It triggers after a while of fuzzing using trinity of just adjtimex ('./trinity --quiet -l off -cadjtimex').
>
> Trinity is available here: http://git.codemonkey.org.uk/?p=trinity.git .
>
> Let me know if I can help further with reproducing this, I can probably copy over my testing environment to some other host if you'd like.
So far no luck. Dmesg mostly just gets filled up with trinity-child
OOMs. How much memory are you running with?

Are you running trinity as root or as some user that has CAP_SYS_TIME
and can actually change values via adjtimex? Or does it trip just by
reading the values?

thanks
-john

2012-08-06 20:34:46

by Sasha Levin

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/06/2012 10:31 PM, John Stultz wrote:
> On 08/06/2012 11:28 AM, Sasha Levin wrote:
>> On 08/06/2012 08:20 PM, John Stultz wrote:
>>> On 08/06/2012 10:21 AM, John Stultz wrote:
>>>> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>>>>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>>>>> Possible causes:
>>>>>> - the APIC calibration in the guest failed, so it is programming too
>>>>>> low values into the timer
>>>>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>>>>> interrupt injection is slowing it down)
>>>>>>
>>>>>> You can try to find out by changing
>>>>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>>>>> (say) 20 microseconds which will let the guest live long enough for you
>>>>>> to ftrace it and see what kind of timers it is programming.
>>>>> I've kept trying to narrow it down, and found out It's triggerable using adjtimex().
>>> Sorry, one more question: Could you provide details on how is it trigger-able using adjtimex?
>> It triggers after a while of fuzzing using trinity of just adjtimex ('./trinity --quiet -l off -cadjtimex').
>>
>> Trinity is available here: http://git.codemonkey.org.uk/?p=trinity.git .
>>
>> Let me know if I can help further with reproducing this, I can probably copy over my testing environment to some other host if you'd like.
> So far no luck. Dmesg mostly just gets filled up with trinity-child OOMs. How much memory are you running with?
>
> Are you running trinity as root or as some user that has CAP_SYS_TIME and can actually change values via adjtimex? Or does it trip just by reading the values?

As root in a disposable vm. It triggers at a random point, not after a specific call.

2012-08-07 05:05:53

by Michael wang

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/07/2012 04:35 AM, Sasha Levin wrote:
> On 08/06/2012 10:31 PM, John Stultz wrote:
>> On 08/06/2012 11:28 AM, Sasha Levin wrote:
>>> On 08/06/2012 08:20 PM, John Stultz wrote:
>>>> On 08/06/2012 10:21 AM, John Stultz wrote:
>>>>> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>>>>>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>>>>>> Possible causes:
>>>>>>> - the APIC calibration in the guest failed, so it is programming too
>>>>>>> low values into the timer
>>>>>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>>>>>> interrupt injection is slowing it down)
>>>>>>>
>>>>>>> You can try to find out by changing
>>>>>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>>>>>> (say) 20 microseconds which will let the guest live long enough for you
>>>>>>> to ftrace it and see what kind of timers it is programming.
>>>>>> I've kept trying to narrow it down, and found out It's triggerable using adjtimex().
>>>> Sorry, one more question: Could you provide details on how is it trigger-able using adjtimex?
>>> It triggers after a while of fuzzing using trinity of just adjtimex ('./trinity --quiet -l off -cadjtimex').
>>>
>>> Trinity is available here: http://git.codemonkey.org.uk/?p=trinity.git .
>>>
>>> Let me know if I can help further with reproducing this, I can probably copy over my testing environment to some other host if you'd like.
>> So far no luck. Dmesg mostly just gets filled up with trinity-child OOMs. How much memory are you running with?
>>
>> Are you running trinity as root or as some user that has CAP_SYS_TIME and can actually change values via adjtimex? Or does it trip just by reading the values?
>
> As root in a disposable vm. It triggers at a random point, not after a specific call.

I have tested with a 3.6.0-rc1 guest again, running command:

./trinity --quiet -l off -cadjtimex --dangerous

for normal user:
only oom info
for root:
the guest hung without any stall info printed

I'm not sure how this trinity tool implemented, but at least it do help
to produce some rarely kernel bug...

And could you please also provide the way you start the guest? Is there
any special option?

Regards,
Michael Wang

>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2012-08-07 05:42:20

by john stultz

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/06/2012 11:28 AM, Sasha Levin wrote:
> On 08/06/2012 08:20 PM, John Stultz wrote:
>> On 08/06/2012 10:21 AM, John Stultz wrote:
>>> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>>>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>>>> Possible causes:
>>>>> - the APIC calibration in the guest failed, so it is programming too
>>>>> low values into the timer
>>>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>>>> interrupt injection is slowing it down)
>>>>>
>>>>> You can try to find out by changing
>>>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>>>> (say) 20 microseconds which will let the guest live long enough for you
>>>>> to ftrace it and see what kind of timers it is programming.
>>>> I've kept trying to narrow it down, and found out It's triggerable using adjtimex().
>> Sorry, one more question: Could you provide details on how is it trigger-able using adjtimex?
> It triggers after a while of fuzzing using trinity of just adjtimex ('./trinity --quiet -l off -cadjtimex').
>
> Trinity is available here: http://git.codemonkey.org.uk/?p=trinity.git .
>
> Let me know if I can help further with reproducing this, I can probably copy over my testing environment to some other host if you'd like.

Ok. Finally I *think* got it reproduced. (Had some trouble initially,
as I think since the first time I ran it as a normal user, the socket
cache isn't the same as if you run it the first time as root? Anyway,
after doing a make clean and rebuilding it started to trigger).

I'm not seeing the rcu stall message, but I do manage to trigger two
other behaviors: a hard hang and a sort of zombie state where memory
isn't properly being freed & everything starts segfaulting. So this
may not be the exact same issue, but it triggers quickly as you
described (within a few seconds of running trinity as root).

It looks like both of these issues are caused by
adjtimex(ADJ_SETOFFSET), which adds or subtracts a huge offset and that
either goes negative or gets clamped to a ktime_t at KTIME_MAX (if you
get clamped the system hangs, if it goes negative, the system barely
functions, but sort of drags along).

An updated version of my KTIME_MAX sanity checking patch to handle both
of these conditions is below.

Would you mind giving this patch a shot and letting me know if you still
see problems?

thanks
-john

From 7a37a171f8b93ce8f89137d2dfac37fdc45994ba Mon Sep 17 00:00:00 2001
From: John Stultz <[email protected]>
Date: Tue, 31 Jul 2012 02:06:14 -0400
Subject: [PATCH] time: Improve sanity checking of timekeeping inputs

Unexpected behavior could occur if the time is set to
a value large enough to overflow a 64bit ktime_t
(which is something larger then the year 2262).

Also unexpected behavior could occur if large negative
offsets are injected via adjtimex.

So this patch improves the sanity check timekeeping inputs
by improving the timespec_valid() check, and then makes better
use of timespec_valid() to make sure we don't set the time to
an invalid negative value or one that overflows ktime_t.

Note: This does not protect from setting the time close to
overflowing ktime_t and then letting natural accumulation
cause the overflow.

Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Zhouping Liu <[email protected]>
Cc: CAI Qian <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: [email protected]
Reported-by: CAI Qian <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
include/linux/ktime.h | 7 -------
include/linux/time.h | 23 +++++++++++++++++++++--
kernel/time/timekeeping.c | 26 ++++++++++++++++++++++++--
3 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index 603bec2..06177ba10 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -58,13 +58,6 @@ union ktime {

typedef union ktime ktime_t; /* Kill this */

-#define KTIME_MAX ((s64)~((u64)1 << 63))
-#if (BITS_PER_LONG == 64)
-# define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
-#else
-# define KTIME_SEC_MAX LONG_MAX
-#endif
-
/*
* ktime_t definitions when using the 64-bit scalar representation:
*/
diff --git a/include/linux/time.h b/include/linux/time.h
index c81c5e4..68e68c5 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -107,11 +107,30 @@ static inline struct timespec timespec_sub(struct timespec lhs,
return ts_delta;
}

+#define KTIME_MAX ((s64)~((u64)1 << 63))
+#if (BITS_PER_LONG == 64)
+# define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
+#else
+# define KTIME_SEC_MAX LONG_MAX
+#endif
+
/*
* Returns true if the timespec is norm, false if denorm:
*/
-#define timespec_valid(ts) \
- (((ts)->tv_sec >= 0) && (((unsigned long) (ts)->tv_nsec) < NSEC_PER_SEC))
+static inline bool timespec_valid(const struct timespec *ts)
+{
+ /* Dates before 1970 are bogus */
+ if (ts->tv_sec < 0)
+ return false;
+ /* Can't have more nanoseconds then a second */
+ if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
+ return false;
+ /* Disallow values that could overflow ktime_t */
+ if ((unsigned long long)ts->tv_sec >= KTIME_SEC_MAX)
+ return false;
+ return true;
+}
+

extern void read_persistent_clock(struct timespec *ts);
extern void read_boot_clock(struct timespec *ts);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index e16af19..898bef0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -427,7 +427,7 @@ int do_settimeofday(const struct timespec *tv)
struct timespec ts_delta, xt;
unsigned long flags;

- if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
+ if (!timespec_valid(tv))
return -EINVAL;

write_seqlock_irqsave(&tk->lock, flags);
@@ -463,6 +463,8 @@ int timekeeping_inject_offset(struct timespec *ts)
{
struct timekeeper *tk = &timekeeper;
unsigned long flags;
+ struct timespec tmp;
+ int ret = 0;

if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
return -EINVAL;
@@ -471,10 +473,17 @@ int timekeeping_inject_offset(struct timespec *ts)

timekeeping_forward_now(tk);

+ /* Make sure the proposed value is valid */
+ tmp = timespec_add(tk_xtime(tk), *ts);
+ if (!timespec_valid(&tmp)) {
+ ret = -EINVAL;
+ goto error;
+ }

tk_xtime_add(tk, ts);
tk_set_wall_to_mono(tk, timespec_sub(tk->wall_to_monotonic, *ts));

+error: /* even if we error out, we forwarded the time, so call update */
timekeeping_update(tk, true);

write_sequnlock_irqrestore(&tk->lock, flags);
@@ -482,7 +491,7 @@ int timekeeping_inject_offset(struct timespec *ts)
/* signal hrtimers about time change */
clock_was_set();

- return 0;
+ return ret;
}
EXPORT_SYMBOL(timekeeping_inject_offset);

@@ -649,7 +658,20 @@ void __init timekeeping_init(void)
struct timespec now, boot, tmp;

read_persistent_clock(&now);
+ if (!timespec_valid(&now)) {
+ pr_warn("WARNING: Persistent clock returned invalid value!\n"
+ " Check your CMOS/BIOS settings.\n");
+ now.tv_sec = 0;
+ now.tv_nsec = 0;
+ }
+
read_boot_clock(&boot);
+ if (!timespec_valid(&boot)) {
+ pr_warn("WARNING: Boot clock returned invalid value!\n"
+ " Check your CMOS/BIOS settings.\n");
+ boot.tv_sec = 0;
+ boot.tv_nsec = 0;
+ }

seqlock_init(&tk->lock);

--
1.7.9.5

2012-08-07 10:24:26

by Sasha Levin

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/07/2012 07:40 AM, John Stultz wrote:
> On 08/06/2012 11:28 AM, Sasha Levin wrote:
>> On 08/06/2012 08:20 PM, John Stultz wrote:
>>> On 08/06/2012 10:21 AM, John Stultz wrote:
>>>> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>>>>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>>>>> Possible causes:
>>>>>> - the APIC calibration in the guest failed, so it is programming too
>>>>>> low values into the timer
>>>>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>>>>> interrupt injection is slowing it down)
>>>>>>
>>>>>> You can try to find out by changing
>>>>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>>>>> (say) 20 microseconds which will let the guest live long enough for you
>>>>>> to ftrace it and see what kind of timers it is programming.
>>>>> I've kept trying to narrow it down, and found out It's triggerable using adjtimex().
>>> Sorry, one more question: Could you provide details on how is it trigger-able using adjtimex?
>> It triggers after a while of fuzzing using trinity of just adjtimex ('./trinity --quiet -l off -cadjtimex').
>>
>> Trinity is available here: http://git.codemonkey.org.uk/?p=trinity.git .
>>
>> Let me know if I can help further with reproducing this, I can probably copy over my testing environment to some other host if you'd like.
>
> Ok. Finally I *think* got it reproduced. (Had some trouble initially, as I think since the first time I ran it as a normal user, the socket cache isn't the same as if you run it the first time as root? Anyway, after doing a make clean and rebuilding it started to trigger).
>
> I'm not seeing the rcu stall message, but I do manage to trigger two other behaviors: a hard hang and a sort of zombie state where memory isn't properly being freed & everything starts segfaulting. So this may not be the exact same issue, but it triggers quickly as you described (within a few seconds of running trinity as root).
>
> It looks like both of these issues are caused by adjtimex(ADJ_SETOFFSET), which adds or subtracts a huge offset and that either goes negative or gets clamped to a ktime_t at KTIME_MAX (if you get clamped the system hangs, if it goes negative, the system barely functions, but sort of drags along).
>
> An updated version of my KTIME_MAX sanity checking patch to handle both of these conditions is below.
>
> Would you mind giving this patch a shot and letting me know if you still see problems?

This fixes the problem for me. Thanks!

2012-08-08 02:20:21

by Michael wang

[permalink] [raw]
Subject: Re: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks on v3.6

On 08/07/2012 06:24 PM, Sasha Levin wrote:
> On 08/07/2012 07:40 AM, John Stultz wrote:
>> On 08/06/2012 11:28 AM, Sasha Levin wrote:
>>> On 08/06/2012 08:20 PM, John Stultz wrote:
>>>> On 08/06/2012 10:21 AM, John Stultz wrote:
>>>>> On 08/05/2012 09:55 AM, Sasha Levin wrote:
>>>>>> On 07/30/2012 03:17 PM, Avi Kivity wrote:
>>>>>>> Possible causes:
>>>>>>> - the APIC calibration in the guest failed, so it is programming too
>>>>>>> low values into the timer
>>>>>>> - it actually needs 1 us wakeups and then can't keep up (esp. as kvm
>>>>>>> interrupt injection is slowing it down)
>>>>>>>
>>>>>>> You can try to find out by changing
>>>>>>> arch/x86/kvm/lapic.c:start_lapic_timer() to impose a minimum wakeup of
>>>>>>> (say) 20 microseconds which will let the guest live long enough for you
>>>>>>> to ftrace it and see what kind of timers it is programming.
>>>>>> I've kept trying to narrow it down, and found out It's triggerable using adjtimex().
>>>> Sorry, one more question: Could you provide details on how is it trigger-able using adjtimex?
>>> It triggers after a while of fuzzing using trinity of just adjtimex ('./trinity --quiet -l off -cadjtimex').
>>>
>>> Trinity is available here: http://git.codemonkey.org.uk/?p=trinity.git .
>>>
>>> Let me know if I can help further with reproducing this, I can probably copy over my testing environment to some other host if you'd like.
>>
>> Ok. Finally I *think* got it reproduced. (Had some trouble initially, as I think since the first time I ran it as a normal user, the socket cache isn't the same as if you run it the first time as root? Anyway, after doing a make clean and rebuilding it started to trigger).
>>
>> I'm not seeing the rcu stall message, but I do manage to trigger two other behaviors: a hard hang and a sort of zombie state where memory isn't properly being freed & everything starts segfaulting. So this may not be the exact same issue, but it triggers quickly as you described (within a few seconds of running trinity as root).
>>
>> It looks like both of these issues are caused by adjtimex(ADJ_SETOFFSET), which adds or subtracts a huge offset and that either goes negative or gets clamped to a ktime_t at KTIME_MAX (if you get clamped the system hangs, if it goes negative, the system barely functions, but sort of drags along).
>>
>> An updated version of my KTIME_MAX sanity checking patch to handle both of these conditions is below.
>>
>> Would you mind giving this patch a shot and letting me know if you still see problems?
>
> This fixes the problem for me. Thanks!

Could you please also try to reduce the CONFIG_RCU_CPU_STALL_TIMEOUT
from 60 to 3 and see whether the rcu stall info will show again?(sorry I
can't do it by my self since the stall info never appear in my sight...)

That will be very helpful as I suppose it is time to recover
CONFIG_RCU_CPU_STALL_TIMEOUT to a reasonable value now ;-)

Regards,
Michael Wang

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>