Date: Fri, 2 Dec 2016 09:36:42 +0100 (CET)
From: Thomas Gleixner <tglx@linutronix.de>
To: David Gibson <david@gibson.dropbear.id.au>
cc: John Stultz <john.stultz@linaro.org>,
        lkml <linux-kernel@vger.kernel.org>, Liav Rehana <liavr@mellanox.com>,
        Chris Metcalf <cmetcalf@mellanox.com>,
        Richard Cochran <richardcochran@gmail.com>,
        Ingo Molnar <mingo@kernel.org>, Prarit Bhargava <prarit@redhat.com>,
        Laurent Vivier <lvivier@redhat.com>,
        "Christopher S . Hall" <christopher.s.hall@intel.com>,
        "4.6+" <stable@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH] timekeeping: Change type of nsec variable to unsigned
 in its calculation.
In-Reply-To: <20161201233210.GB31412@umbus.fritz.box>
Message-ID: <alpine.DEB.2.20.1612020921500.4295@nanos>
References: <1479531216-25361-1-git-send-email-john.stultz@linaro.org> <alpine.DEB.2.20.1611291520070.4358@nanos> <20161129235727.GA19891@umbus> <alpine.DEB.2.20.1611302355070.3619@nanos> <20161201021233.GI19891@umbus> <alpine.DEB.2.20.1612011110270.3453@nanos>
 <20161201233210.GB31412@umbus.fritz.box>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2523
Lines: 55

On Fri, 2 Dec 2016, David Gibson wrote:
> On Thu, Dec 01, 2016 at 12:59:51PM +0100, Thomas Gleixner wrote:
> > So I assume that you are talking about a VM which was not scheduled by the
> > host due to overcommitment (who ever thought that this is a good idea) or
> > whatever other reason (yes, people were complaining about wreckage caused
> > by stopping kernels with debuggers) for a long enough time to trigger that
> > overflow situation. If that's the case then the unsigned conversion will
> > just make it more unlikely but it still will happen.
> 
> It was essentially the stopped by debugger case.  I forget exactly
> why, but the guest was being explicitly stopped from outside, it
> wasn't just scheduling lag.  I think it was something in the vicinity
> of 10 minutes stopped.

Ok. Debuggers stopping stuff is one issue, but if I understood Liav
correctly, then he is seeing the issue on a heavily loaded machine.

Liav, can you please describe the scenario in detail? Are you observing
this on bare metal or in a VM which gets scheduled out long enough or was
there debugging/hypervisor intervention involved?

> It's long enough ago that I can't be sure, but I thought we'd tried
> various different stoppage periods, which should have also triggered
> the unsigned overflow you're describing, and didn't observe the crash
> once the change was applied.  Note that there have been other changes
> to the timekeeping code since then, which might have made a
> difference.
> 
> I agree that it's not reasonable for the guest to be entirely
> unaffected by such a large stoppage: I'd have no complaints if the
> guest time was messed up, and/or it spewed warnings.  But complete
> guest death seems a rather more fragile response to the situation than
> we'd like.

Guest death? Is it really dead/crashed or just stuck in that endless loop
trying to add that huge negative value piecewise?

That's at least what Liav was describing as he mentioned
__iter_div_u64_rem() explicitly.
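
To put a number on "piecewise": a rough userspace sketch, not the kernel
code. It assumes the loop has the usual subtract-and-count shape
(while (dividend >= divisor) { dividend -= divisor; ret++; }) and that the
negative s64 nanosecond value ends up being interpreted as u64. The helper
names iter_steps() and steps_for_signed_ns() are made up for illustration,
and the step count is computed by division so the sketch itself terminates.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000U

/*
 * Number of subtract-and-count steps that a loop of the form
 *	while (dividend >= divisor) { dividend -= divisor; ret++; }
 * has to execute. Computed with a division here instead of looping.
 */
static uint64_t iter_steps(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Steps needed when a signed nanosecond value is handed over as u64. */
static uint64_t steps_for_signed_ns(int64_t nsec)
{
	/* A negative value converts to a huge unsigned one (modulo 2^64). */
	return iter_steps((uint64_t)nsec, NSEC_PER_SEC);
}
```

A sane value such as 1.5 seconds worth of nanoseconds needs a single step,
while even -1 already turns into more than 18 billion steps, which from the
outside looks very much like a hang.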

While I'm less worried about debuggers, I worry about the real thing.

I agree that we should not starve after resume from a debug stop, but in
that case the least of my worries is time going backwards.

Though if the signed mult overrun is observable in a live system, then we
need to worry about time going backwards even with the unsigned
conversion, simply because once we have fixed the starvation issue, people
with insane enough setups will trigger the unsigned overrun and complain
about time going backwards.
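
A minimal userspace sketch of that concern, assuming the conversion is
roughly (delta * mult) >> shift. The function names and the mult/shift
values used below are hypothetical and only chosen to make the arithmetic
easy to follow; this is not the timekeeping code itself.

```c
#include <stdint.h>

/* Largest cycle delta for which delta * mult still fits into s64. */
static uint64_t max_delta_signed(uint32_t mult)
{
	return (uint64_t)INT64_MAX / mult;
}

/* Largest cycle delta for which delta * mult still fits into u64. */
static uint64_t max_delta_unsigned(uint32_t mult)
{
	return UINT64_MAX / mult;
}

/* Unsigned cycles to nanoseconds; the product wraps modulo 2^64. */
static uint64_t cyc_to_ns_unsigned(uint64_t delta, uint32_t mult,
				   uint32_t shift)
{
	return (delta * mult) >> shift;
}
```

With mult = 1 << 24 and shift = 24 the signed limit is 2^39 - 1 cycles and
the unsigned limit is 2^40 - 1 cycles, so the unsigned conversion merely
doubles the tolerable delta. One cycle past that limit the product wraps
and the computed nanoseconds drop from 2^40 - 1 to 0.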

Thanks,

	tglx