2009-04-29 16:45:52

by john stultz

[permalink] [raw]
Subject: Re: TSC marked unstable on suspend and resume

On Fri, Feb 20, 2009 at 7:39 PM, David Fries <[email protected]> wrote:
> I'm getting TSC marked as unsable on hibernate to disk with
> 2.6.29-rc3. ?The last kernel I ran 2.6.24.4 ran 300+ days without a
> problem and does not loose TSC on hibernate.
>
> I'm hibernating with `echo disk > /sys/power/state`.
> These look like the relavant messages,
>
> CPU 0 irqstacks, hard=c0379000 soft=c0378000
> PID hash table entries: 1024 (order: 10, 4096 bytes)
> Fast TSC calibration using PIT
> Detected 300.705 MHz processor.
> Calibrating delay loop (skipped), value calculated using timer frequency.. 601.41 BogoMIPS (lpj=300705)
> hibernate to disk,
> Restarting tasks ... done.
> Clocksource tsc unstable (delta = 499883531 ns)

Hmm. Seems like the clocksource watchdog is having some problem
handling state around hibernate.

Out of curiousity: Does suspend-to-ram also trigger this?

Can you send me a full dmesg log?

thanks
-john


2009-04-29 23:40:31

by David Fries

[permalink] [raw]
Subject: Re: TSC marked unstable on suspend and resume

On Wed, Apr 29, 2009 at 09:45:40AM -0700, john stultz wrote:
> On Fri, Feb 20, 2009 at 7:39 PM, David Fries <[email protected]> wrote:
> > I'm getting TSC marked as unsable on hibernate to disk with
> > 2.6.29-rc3. ?The last kernel I ran 2.6.24.4 ran 300+ days without a
> > problem and does not loose TSC on hibernate.
> >
> > I'm hibernating with `echo disk > /sys/power/state`.
> > These look like the relavant messages,
> >
> > CPU 0 irqstacks, hard=c0379000 soft=c0378000
> > PID hash table entries: 1024 (order: 10, 4096 bytes)
> > Fast TSC calibration using PIT
> > Detected 300.705 MHz processor.
> > Calibrating delay loop (skipped), value calculated using timer frequency.. 601.41 BogoMIPS (lpj=300705)
> > hibernate to disk,
> > Restarting tasks ... done.
> > Clocksource tsc unstable (delta = 499883531 ns)
>
> Hmm. Seems like the clocksource watchdog is having some problem
> handling state around hibernate.
>
> Out of curiousity: Does suspend-to-ram also trigger this?

Suspend to ram isn't supported by the system, disk is the only
available /sys/power/state.

> Can you send me a full dmesg log?

Attached, it was the only log I saw that had bootup through the tsc
marked unstable. Looking at the log I assume I booted to a shell,
hibernated, resumed, then ran init, in that case the system is as
fresh and idle as it gets. This system's been up for two months and I
don't reboot much so I don't remember for sure.

> thanks
> -john

--
David Fries <[email protected]>
http://fries.net/~david/ (PGP encryption key available)


Attachments:
(No filename) (1.51 kB)
dmesg.3.gz (5.82 kB)
Download all attachments

2009-04-30 03:37:50

by john stultz

[permalink] [raw]
Subject: Re: TSC marked unstable on suspend and resume

On Wed, 2009-04-29 at 18:25 -0500, David Fries wrote:
> On Wed, Apr 29, 2009 at 09:45:40AM -0700, john stultz wrote:
> > On Fri, Feb 20, 2009 at 7:39 PM, David Fries <[email protected]> wrote:
> > > I'm getting TSC marked as unsable on hibernate to disk with
> > > 2.6.29-rc3. ?The last kernel I ran 2.6.24.4 ran 300+ days without a
> > > problem and does not loose TSC on hibernate.
> > >
> > > I'm hibernating with `echo disk > /sys/power/state`.
> > > These look like the relavant messages,
> > >
> > > CPU 0 irqstacks, hard=c0379000 soft=c0378000
> > > PID hash table entries: 1024 (order: 10, 4096 bytes)
> > > Fast TSC calibration using PIT
> > > Detected 300.705 MHz processor.
> > > Calibrating delay loop (skipped), value calculated using timer frequency.. 601.41 BogoMIPS (lpj=300705)
> > > hibernate to disk,
> > > Restarting tasks ... done.
> > > Clocksource tsc unstable (delta = 499883531 ns)
> >
> > Hmm. Seems like the clocksource watchdog is having some problem
> > handling state around hibernate.

Yea. This seems to be the case here. There's watchdog code and variables
to handle resume properly, but there isn't a corresponding suspend hook,
so the resume path doesn't actually change anything.

Here's a patch you can try, I only had time to compile test it today, so
its untested. Hopefully I didn't miss anything obvious.

Thomas, I suspect you intended to have something like the following?

Signed-off-by: John Stultz <[email protected]>

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 5a40d14..450431f 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -360,6 +360,7 @@ extern void clocksource_touch_watchdog(void);
extern struct clocksource* clocksource_get_next(void);
extern void clocksource_change_rating(struct clocksource *cs, int rating);
extern void clocksource_resume(void);
+extern void clocksource_suspend(void);

#ifdef CONFIG_GENERIC_TIME_VSYSCALL
extern void update_vsyscall(struct timespec *ts, struct clocksource *c);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index ecfd7b5..3dbe55f 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -236,6 +236,11 @@ static void clocksource_resume_watchdog(void)
set_bit(0, &watchdog_resumed);
}

+static void clocksource_suspend_watchdog(void)
+{
+ set_bit(1, &watchdog_resumed);
+}
+
static void clocksource_check_watchdog(struct clocksource *cs)
{
struct clocksource *cse;
@@ -278,6 +283,7 @@ static void clocksource_check_watchdog(struct clocksource *cs)
}
spin_unlock_irqrestore(&watchdog_lock, flags);
}
+
#else
static void clocksource_check_watchdog(struct clocksource *cs)
{
@@ -286,6 +292,7 @@ static void clocksource_check_watchdog(struct clocksource *cs)
}

static inline void clocksource_resume_watchdog(void) { }
+static inline void clocksource_suspend_watchdog(void) { }
#endif

/**
@@ -321,6 +328,18 @@ void clocksource_touch_watchdog(void)
}

/**
+ * clocksource_suspend - suspend the clocksource(s)
+ */
+void clocksource_suspend(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&clocksource_lock, flags);
+ clocksource_suspend_watchdog();
+ spin_unlock_irqrestore(&clocksource_lock, flags);
+}
+
+/**
* clocksource_get_next - Returns the selected clocksource
*
*/
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 687dff4..e39e3d3 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -368,6 +368,7 @@ static int timekeeping_suspend(struct sys_device *dev, pm_message_t state)

clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);

+ clocksource_suspend();
return 0;
}


2009-05-01 03:14:14

by David Fries

[permalink] [raw]
Subject: Re: TSC marked unstable on suspend and resume

On Wed, Apr 29, 2009 at 08:37:33PM -0700, john stultz wrote:
> On Wed, 2009-04-29 at 18:25 -0500, David Fries wrote:
> > On Wed, Apr 29, 2009 at 09:45:40AM -0700, john stultz wrote:
> > > On Fri, Feb 20, 2009 at 7:39 PM, David Fries <[email protected]> wrote:
> > > > I'm getting TSC marked as unsable on hibernate to disk with
> > > > 2.6.29-rc3. ?The last kernel I ran 2.6.24.4 ran 300+ days without a
> > > > problem and does not loose TSC on hibernate.
> > > >
> > > > I'm hibernating with `echo disk > /sys/power/state`.
> > > > These look like the relavant messages,
> > > >
> > > > CPU 0 irqstacks, hard=c0379000 soft=c0378000
> > > > PID hash table entries: 1024 (order: 10, 4096 bytes)
> > > > Fast TSC calibration using PIT
> > > > Detected 300.705 MHz processor.
> > > > Calibrating delay loop (skipped), value calculated using timer frequency.. 601.41 BogoMIPS (lpj=300705)
> > > > hibernate to disk,
> > > > Restarting tasks ... done.
> > > > Clocksource tsc unstable (delta = 499883531 ns)
> > >
> > > Hmm. Seems like the clocksource watchdog is having some problem
> > > handling state around hibernate.
>
> Yea. This seems to be the case here. There's watchdog code and variables
> to handle resume properly, but there isn't a corresponding suspend hook,
> so the resume path doesn't actually change anything.
>
> Here's a patch you can try, I only had time to compile test it today, so
> its untested. Hopefully I didn't miss anything obvious.
>
> Thomas, I suspect you intended to have something like the following?
>
> Signed-off-by: John Stultz <[email protected]>

I added my own patch to verify your changes were being called, no
dice. I do wonder, is it too late when the clocksource_suspend is
called after Suspending consoles? Any theories why this is the only
system I have that is showing the problem? It is an older system.

In the attached dmesg dump I suspended without a swap file, so it
never hibernated or rebooted.

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 7575f23..387fed4 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -261,6 +261,7 @@ void clocksource_suspend(void)
spin_lock_irqsave(&clocksource_lock, flags);
clocksource_suspend_watchdog();
spin_unlock_irqrestore(&clocksource_lock, flags);
+ printk("Calling %s\n", __func__);
}

/**

--
David Fries <[email protected]>
http://fries.net/~david/ (PGP encryption key available)


Attachments:
(No filename) (2.38 kB)
clocksource_suspend.txt.bz2 (4.40 kB)
Download all attachments