MIME-Version: 1.0
In-Reply-To: <20150818025704.GA1129225@devbig257.prn2.facebook.com>
References: <1439844063-7957-1-git-send-email-john.stultz@linaro.org>
	<1439844063-7957-9-git-send-email-john.stultz@linaro.org>
	<alpine.DEB.2.11.1508172326020.3873@nanos>
	<CALAqxLWOzkbbFeAUKrE9B=O786vuJVmWQdagbfxkpLJDm2mXwQ@mail.gmail.com>
	<20150818025704.GA1129225@devbig257.prn2.facebook.com>
Date: Mon, 17 Aug 2015 20:39:20 -0700
Message-ID: <CALAqxLUj0XMLj03YO7WKYteMYcJbJ0zt=8f3oxmE-X_tsjfKDw@mail.gmail.com>
Subject: Re: [PATCH 8/9] clocksource: Improve unstable clocksource detection
From: John Stultz <john.stultz@linaro.org>
To: Shaohua Li <shli@fb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>, lkml <linux-kernel@vger.kernel.org>,
        Prarit Bhargava <prarit@redhat.com>,
        Richard Cochran <richardcochran@gmail.com>,
        Daniel Lezcano <daniel.lezcano@linaro.org>,
        Ingo Molnar <mingo@kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2487
Lines: 51

On Mon, Aug 17, 2015 at 7:57 PM, Shaohua Li <shli@fb.com> wrote:
> On Mon, Aug 17, 2015 at 03:17:28PM -0700, John Stultz wrote:
>> That said, I agree the "should"s and other vague qualifiers in the
>> commit description you point out should have more specifics to back
>> things up. And I'm fine delaying this (and the follow-on) patch until
>> those details are provided.
>
> It's not something I guess. We do see the issue from time to time. The
> IPMI driver accesses some IO ports in softirq and hogs the cpu for a very
> long time, then the watchdog alerts. The false alert, on the other hand,
> has a much worse effect. It forces the use of HPET as clocksource, which
> has a very big performance penalty. We can't even manually switch back to
> TSC as the current interface doesn't allow us to do it, so we can only
> reboot the system. I agree the driver should be fixed, but the watchdog
> raises false alerts, and we definitely should fix it.

I think Thomas is requesting that some of the vague terms be
quantified. Seeing the issue "from time to time" isn't super
informative. When the IPMI driver hogs the cpu "for a very long time",
how long does that actually take? You've provided the HPET
frequency, and the wrapping interval on your hardware. Do these
intervals all line up properly?

I sympathize that the "show-your-work" math problem aspect of this
request might be a little remedial and irritating, esp when the patch
fixes the problem for you. But it's important, so later on when some
bug crops up in nearby code, folks can easily repeat your calculation
and know the problem isn't from your code.

> The 1s interval is arbitrary. If you think there is a better way to fix
> the issue, please let me know.

I don't think 1s is necessarily arbitrary. Maybe not much conscious
thought was put into it, but clearly .001 sec wasn't chosen, nor
10 minutes, for a reason.

So given the intervals you're seeing the problem with, would maybe a
larger max interval (say 30-seconds) make more or less sense? What
would the tradeoffs be? (ie: Would that exclude clocksources with
faster wraps from being used as watchdogs, with your patches?).
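
To make that tradeoff concrete, here is a rough sketch (Python, purely
illustrative, not kernel code) of the kind of calculation I mean. The
counter widths and frequencies are assumed example values rather than
numbers from your hardware, and usable_as_watchdog() with its 2x margin
is a hypothetical rule, not what the patch implements:

```python
def wrap_seconds(counter_bits: int, freq_hz: float) -> float:
    """Seconds until a free-running counter of this width wraps around."""
    return (2 ** counter_bits) / freq_hz


def usable_as_watchdog(counter_bits: int, freq_hz: float,
                       max_interval_s: float, margin: float = 2.0) -> bool:
    """Hypothetical rule: the counter must not wrap within `margin` times
    the largest interval we are willing to compare across."""
    return wrap_seconds(counter_bits, freq_hz) >= margin * max_interval_s


# Assumed example counters (width in bits, frequency in Hz).
CANDIDATES = {
    "32-bit counter @ 14.318180 MHz": (32, 14_318_180),
    "24-bit counter @ 3.579545 MHz": (24, 3_579_545),
}

for max_interval_s in (1, 30):
    print(f"max interval = {max_interval_s}s")
    for name, (bits, hz) in CANDIDATES.items():
        wrap = wrap_seconds(bits, hz)
        ok = usable_as_watchdog(bits, hz, max_interval_s)
        print(f"  {name}: wraps after {wrap:.1f}s, usable={ok}")
```

Plugging in the real frequency, counter width and observed stall time
for the affected machines would show whether a given max interval
leaves enough headroom before the watchdog wraps.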

I'm sure a good interval could be chosen with some thought, and the
rationale be explained. :)

thanks
-john