Message-ID: <3c1737210708230408i7a8049a9m5db49e6c4d89ab62@mail.gmail.com>
Date: Thu, 23 Aug 2007 13:08:27 +0200
From: "Michael Smith"
To: linux-kernel@vger.kernel.org
Cc: "Andy Wingo"
Subject: gettimeofday() jumping into the future

Hi,

We've been seeing some strange behaviour in some of our applications recently. I've tracked this down to gettimeofday() occasionally returning spurious values.

Specifically, gettimeofday() will suddenly, for a single call, return a value about 4398 seconds (~1 hour 13 minutes) in the future. The following call goes back to a normal value.

This seems to occur when the clock source goes slightly backwards for a single call.
In kernel/time/timekeeping.c:__get_nsec_offset(), we have this:

    cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

So a small decrease in the counter here will (this is all unsigned arithmetic) give us a very large cycle_delta. cyc2ns() then multiplies this by clock->mult and right-shifts the result by clock->shift (22 here). Because the 64-bit product wraps as well, the result is close to 2^64 >> 22 = 2^42 ns, i.e. approximately 4398 seconds. This gets added to the xtime value, giving us our jump into the future. The next call to gettimeofday() returns to normal because it no longer has this huge nanosecond offset.

This system is a 2-socket Core 2 Quad machine (8 CPUs) running in 32-bit mode; it's a Dell PowerEdge 1950. The kernel selects the TSC as the clock source, having determined that the TSC runs synchronously on this system. Switching the systems to a different clock source seems to make the problem go away (which is fine for us, but we'd like to get this fixed properly upstream).

We've also seen this behaviour with a synthetic test program (which just runs 4 threads, all calling gettimeofday() in a loop as fast as possible and checking that the time doesn't jump) on an older machine, a Dell PowerEdge SC1425 with two hyperthreaded P4 Xeons.

Can anyone advise on what's going wrong here? I can't find much documentation on whether the TSC is guaranteed to be monotonically increasing on Intel systems. Should the code choose not to use the TSC? Or should the TSC reading code ensure that the returned values are monotonic?

Is there any more information that would be useful? I'll be on a plane for most of tomorrow, so I might be a little slow responding.

Thanks,

Mike