Date: Sat, 3 Dec 2016 11:33:09 +1100
From: David Gibson
To: Thomas Gleixner
Cc: John Stultz, lkml, Liav Rehana, Chris Metcalf, Richard Cochran, Ingo Molnar, Prarit Bhargava, Laurent Vivier, "Christopher S . Hall", "4.6+", Peter Zijlstra
Subject: Re: [PATCH] timekeeping: Change type of nsec variable to unsigned in its calculation.
Message-ID: <20161203003309.GL10089@umbus.fritz.box>
References: <1479531216-25361-1-git-send-email-john.stultz@linaro.org> <20161129235727.GA19891@umbus> <20161201021233.GI19891@umbus> <20161201233210.GB31412@umbus.fritz.box>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Dec 02, 2016 at 09:36:42AM +0100, Thomas Gleixner wrote:
> On Fri, 2 Dec 2016, David Gibson wrote:
> > On Thu, Dec 01, 2016 at 12:59:51PM +0100, Thomas Gleixner wrote:
> > > So I assume that you are talking about a VM which was not scheduled by the
> > > host due to overcommitment (whoever thought that this is a good idea) or
> > > whatever other reason (yes, people were complaining about wreckage caused
> > > by stopping kernels with debuggers) for a long enough time to trigger that
> > > overflow situation.
> > > If that's the case then the unsigned conversion will
> > > just make it more unlikely but it still will happen.
> >
> > It was essentially the stopped by debugger case.  I forget exactly
> > why, but the guest was being explicitly stopped from outside, it
> > wasn't just scheduling lag.  I think it was something in the vicinity
> > of 10 minutes stopped.
>
> Ok. Debuggers stopping stuff is one issue, but if I understood Liav
> correctly, then he is seeing the issue on a heavily loaded machine.

Right.  I can't speak to other situations which might trigger this.

> Liav, can you please describe the scenario in detail? Are you observing
> this on bare metal or in a VM which gets scheduled out long enough or was
> there debugging/hypervisor intervention involved?
>
> > It's long enough ago that I can't be sure, but I thought we'd tried
> > various different stoppage periods, which should have also triggered
> > the unsigned overflow you're describing, and didn't observe the crash
> > once the change was applied.  Note that there have been other changes
> > to the timekeeping code since then, which might have made a
> > difference.
> >
> > I agree that it's not reasonable for the guest to be entirely
> > unaffected by such a large stoppage: I'd have no complaints if the
> > guest time was messed up, and/or it spewed warnings.  But complete
> > guest death seems a rather more fragile response to the situation than
> > we'd like.
>
> Guest death? Is it really dead/crashed or just stuck in that endless loop
> trying to add that huge negative value piecewise?

Well, I don't know.  But the point was it was unusable from the console,
and didn't come back any time soon.

> That's at least what Liav was describing as he mentioned
> __iter_div_u64_rem() explicitly.
>
> While I'm less worried about debuggers, I worry about the real thing.
>
> I agree that we should not starve after resume from a debug stop, but in
> that case the least of my worries is time going backwards.
>
> Though if the signed mult overrun is observable in a live system, then we
> need to worry about time going backwards even with the unsigned
> conversion. Simply because once we fixed the starvation issue people with
> insane enough setups will trigger the unsigned overrun and complain about
> time going backwards.
>
> Thanks,
>
> 	tglx

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
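
For readers following the thread, the difference between the signed and the unsigned overrun discussed above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the kernel's timekeeping code: the function names and the mult/shift values are hypothetical, and the signed variant relies on the wrapping conversion and arithmetic right shift that GCC and Clang provide (both are implementation-defined in ISO C).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch only, NOT the kernel's timekeeping code.
 * Function names and parameters are hypothetical.
 */

/* Signed variant: the 64-bit product is treated as a signed value. */
static uint64_t delta_to_ns_signed(uint64_t delta, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	/* Out-of-range conversion: implementation-defined, wraps on GCC/Clang. */
	int64_t nsec = (int64_t)(delta * mult);
	/* Shifting a negative value right: arithmetic shift on GCC/Clang. */
	nsec >>= shift;
	/* A negative result turns into a huge unsigned nanosecond count. */
	return (uint64_t)nsec;
}

/* Unsigned variant: the product wraps modulo 2^64, the shift is logical. */
static uint64_t delta_to_ns_unsigned(uint64_t delta, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);
	uint64_t nsec = delta * mult;
	return nsec >> shift;
}
```

With the assumed values mult = 1 << 24 and shift = 24 (one cycle per nanosecond), a delta of 2^39 cycles makes the product exactly 2^63. The signed variant then returns 2^64 - 2^39, the kind of huge value that keeps a piecewise division loop busy for a very long time, while the unsigned variant returns the correct 2^39. Doubling the delta to 2^40 wraps the product to 0, so the unsigned variant silently loses the elapsed time, which is the "time going backwards" case tglx describes.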