Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932206Ab0G3L4B (ORCPT ); Fri, 30 Jul 2010 07:56:01 -0400 Received: from mail-ey0-f174.google.com ([209.85.215.174]:51458 "EHLO mail-ey0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932175Ab0G3Lz7 convert rfc822-to-8bit (ORCPT ); Fri, 30 Jul 2010 07:55:59 -0400 DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=jyFX6Ccx8oqKHB+M7ANfJfhhNrDEd9lQxHQ2lUe+GFs16KU30vo4VrCBcCZa4sg5tj JrBqKz8AHw3XlpV5KUJIdwRkzoJKoXaoPceh3L2CN9v32qpvcMzVHIizj45IIzWQlx27 HF+LXBqz8dcFRMSwf/C7q6ay/OeH5S4GUPtVE= MIME-Version: 1.0 In-Reply-To: References: Date: Fri, 30 Jul 2010 17:25:57 +0530 Message-ID: Subject: Re: clock drift in set_task_cpu() From: Jack Daniel To: Peter Zijlstra , Peter Zijlstra , Ingo Molnar Cc: LKML Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3081 Lines: 83 Hi Peter, As a follow up on this... On Wed, Jul 21, 2010 at 5:10 PM, Jack Daniel wrote: > Hi Peter/Ingo, > > I have a query with the kernel code that was changed not too long time > back in v2.6.33-rc1 commit id 5afcdab706d6002cb02b567ba46e650215e694e8 > [tip:sched/urgent] sched: Remove rq->clock coupling from > set_task_cpu() > > void set_task_cpu(struct task_struct *p, unsigned int new_cpu) > { > int old_cpu = task_cpu(p); > struct rq *old_rq = cpu_rq(old_cpu), *new_rq = cpu_rq(new_cpu); > struct cfs_rq *old_cfsrq = task_cfs_rq(p), > ? ? ?*new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu); > u64 clock_offset; > > clock_offset = old_rq->clock - new_rq->clock; > --- > > On a Xeon 55xx with 8 CPU's, I found out the new_rq->clock value is > sometimes larger than old_rq->clock and so clock_offset tends to warp > around leading to incorrect values. You have very correctly noted in > the commit header that all functions that access set_task_cpu() must > do so after a call to sched_clock_remote(), in this case the function > is sched_fork(). I validated by adding update_rq_clock(old_rq); into > set_task_cpu() and that seems to fix the issue. But I noticed that > since CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is already set, if > (sched_clock_stable) ?in sched_clock_cpu() will yield to true and the > flow never gets to sched_clock_remote() or sched_clock_local(). > > What do you think is the best way to approach the problem *assuming > the older kernel*, since I believe the problem still exists? That is > to reinstate your axiom ".... which should ensure the observed time > between these two cpus is monotonic" > > 1) CONFIG_HAVE_UNSTABLE_SCHED_CLOCK cannot be disabled since it is set > by default for x86 > 2) Does one create a new function with just this line of code? > fix_clock_drift() > { > if (cpu != smp_processor_id()) > ? ? ? ? ? ? ? ?clock = sched_clock_remote(scd); > ? ? ? ?else > ? ? ? ? ? ? ? ?clock = sched_clock_local(scd); > > ? ? ? ?return clock; > } > I bet you would have had come across this problem and hence chose to surgically remove the impeding code with commit 5afcdab. I now think it was a good choice but the right thing would have been to correct the problem itself. I think this code should have solved the problem. diff --git a/kernel/sched.c b/kernel/sched.c index 1d39b00..5fd63f2 100644 --- a/kernel/sched.c +++ b/kernel/sched.c @@ -2068,6 +2068,13 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu) struct cfs_rq *old_cfsrq = task_cfs_rq(p), *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu); u64 clock_offset; + unsigned long flags; + + rmb(); + local_irq_save(flags); + update_rq_clock(old_rq); + update_rq_clock(new_rq); + local_irq_restore(flags); Thanks and regards, Jack -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/