Subject: Re: [PATCH 1/2] sched/fair: Fix bandwidth timer clock drift condition
From: Xunlei Pang
Reply-To: xlpang@linux.alibaba.com
To: bsegall@google.com
Cc: Peter Zijlstra, Ingo Molnar, linux-kernel@vger.kernel.org
Date: Tue, 19 Jun 2018 10:57:05 +0800
References: <20180618091618.21480-1-xlpang@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/19/18 2:44 AM,
bsegall@google.com wrote:
> Xunlei Pang writes:
>
>> The current condition to judge clock drift in expire_cfs_rq_runtime()
>> is wrong: the two runtime_expires are actually the same when clock
>> drift happens, so this condition can never hit. The original design was
>> correctly done by commit a9cf55b28610 ("sched: Expire invalid runtime"),
>> but was changed to be the current one due to its locking issue.
>>
>> This patch introduces another way: it adds a new field in both structure
>> cfs_rq and cfs_bandwidth to record the expiration update sequence, and
>> uses them to figure out if clock drift happens (true if they are equal).
>
> It might just be simplest to revert the comparison change - if we read a
> torn value, the worst that happens is we extend incorrectly, and that
> is exactly what happens if we just read the old value.
>
> An extra int isn't exactly the worst thing though, so whichever.

I tried that, but in the worst case it might still consume the old
runtime. I chose this approach because the 2nd patch makes further
changes to cfs_b->runtime_expires. The added fields guarantee correct
control, and they do not increase the total size of the two structures.

Thanks,
Xunlei

>
>>
>> This fix is also needed by the following patch.
>>
>> Fixes: 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
>> Cc: Ben Segall
>> Signed-off-by: Xunlei Pang
>> ---
>>  kernel/sched/fair.c  | 14 ++++++++------
>>  kernel/sched/sched.h |  6 ++++--
>>  2 files changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index e497c05aab7f..9f384264e832 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4590,6 +4590,7 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
>>  	now = sched_clock_cpu(smp_processor_id());
>>  	cfs_b->runtime = cfs_b->quota;
>>  	cfs_b->runtime_expires = now + ktime_to_ns(cfs_b->period);
>> +	cfs_b->expires_seq++;
>>  }
>>
>>  static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
>> @@ -4612,6 +4613,7 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>>  	struct task_group *tg = cfs_rq->tg;
>>  	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
>>  	u64 amount = 0, min_amount, expires;
>> +	int expires_seq;
>>
>>  	/* note: this is a positive sum as runtime_remaining <= 0 */
>>  	min_amount = sched_cfs_bandwidth_slice() - cfs_rq->runtime_remaining;
>> @@ -4629,6 +4631,7 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>>  		}
>>  	}
>>  	expires = cfs_b->runtime_expires;
>> +	expires_seq = cfs_b->expires_seq;
>>  	raw_spin_unlock(&cfs_b->lock);
>>
>>  	cfs_rq->runtime_remaining += amount;
>> @@ -4637,8 +4640,10 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>>  	 * spread between our sched_clock and the one on which runtime was
>>  	 * issued.
>>  	 */
>> -	if ((s64)(expires - cfs_rq->runtime_expires) > 0)
>> +	if ((s64)(expires - cfs_rq->runtime_expires) > 0) {
>>  		cfs_rq->runtime_expires = expires;
>> +		cfs_rq->expires_seq = expires_seq;
>> +	}
>>
>>  	return cfs_rq->runtime_remaining > 0;
>>  }
>> @@ -4664,12 +4669,9 @@ static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
>>  	 * has not truly expired.
>>  	 *
>>  	 * Fortunately we can check determine whether this the case by checking
>> -	 * whether the global deadline has advanced. It is valid to compare
>> -	 * cfs_b->runtime_expires without any locks since we only care about
>> -	 * exact equality, so a partial write will still work.
>> +	 * whether the global deadline(cfs_b->expires_seq) has advanced.
>>  	 */
>> -
>> -	if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
>> +	if (cfs_rq->expires_seq == cfs_b->expires_seq) {
>>  		/* extend local deadline, drift is bounded above by 2 ticks */
>>  		cfs_rq->runtime_expires += TICK_NSEC;
>>  	} else {
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 6601baf2361c..e977e04f8daf 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -334,9 +334,10 @@ struct cfs_bandwidth {
>>  	u64			runtime;
>>  	s64			hierarchical_quota;
>>  	u64			runtime_expires;
>> +	int			expires_seq;
>>
>> -	int			idle;
>> -	int			period_active;
>> +	short			idle;
>> +	short			period_active;
>>  	struct hrtimer		period_timer;
>>  	struct hrtimer		slack_timer;
>>  	struct list_head	throttled_cfs_rq;
>> @@ -551,6 +552,7 @@ struct cfs_rq {
>>
>>  #ifdef CONFIG_CFS_BANDWIDTH
>>  	int			runtime_enabled;
>> +	int			expires_seq;
>>  	u64			runtime_expires;
>>  	s64			runtime_remaining;
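
For readers following the thread, the sequence check in the quoted patch
can be illustrated with a small userspace sketch. This is not the kernel
code: struct bw_pool, struct local_rq, refill(), assign(), expire() and
TICK_NS are simplified, hypothetical stand-ins for cfs_bandwidth, cfs_rq,
__refill_cfs_bandwidth_runtime(), assign_cfs_rq_runtime(),
expire_cfs_rq_runtime() and TICK_NSEC, and all locking is omitted. The
size check at the top assumes a 4-byte int and a 2-byte short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Size argument from the reply: narrowing two ints to shorts makes room
 * for the new int (assumes 4-byte int, 2-byte short). */
struct flags_before { int idle; int period_active; };
struct flags_after  { int expires_seq; short idle; short period_active; };
static_assert(sizeof(struct flags_after) == sizeof(struct flags_before),
	      "adding expires_seq should not grow the structure");

#define TICK_NS 4000000ULL	/* hypothetical tick length, 4 ms */

struct bw_pool {		/* stand-in for struct cfs_bandwidth */
	uint64_t runtime_expires;
	int expires_seq;	/* bumped on every refill */
};

struct local_rq {		/* stand-in for struct cfs_rq */
	uint64_t runtime_expires;
	int expires_seq;	/* sequence seen when the deadline was copied */
	int64_t runtime_remaining;
};

/* New period: move the global deadline and bump the sequence. */
static inline void refill(struct bw_pool *b, uint64_t now, uint64_t period)
{
	b->runtime_expires = now + period;
	b->expires_seq++;
}

/* Hand runtime to a CPU; copy deadline and sequence together. */
static inline void assign(struct local_rq *rq, const struct bw_pool *b,
			  int64_t amount)
{
	rq->runtime_remaining += amount;
	if ((int64_t)(b->runtime_expires - rq->runtime_expires) > 0) {
		rq->runtime_expires = b->runtime_expires;
		rq->expires_seq = b->expires_seq;
	}
}

/* Returns true if the local runtime was really expired and dropped. */
static inline bool expire(struct local_rq *rq, const struct bw_pool *b,
			  uint64_t local_now)
{
	if ((int64_t)(local_now - rq->runtime_expires) < 0)
		return false;		/* deadline still ahead of our clock */
	if (rq->runtime_remaining < 0)
		return false;
	if (rq->expires_seq == b->expires_seq) {
		/* same period: only clock drift, extend the local deadline */
		rq->runtime_expires += TICK_NS;
		return false;
	}
	/* a refill happened since we copied the deadline: truly expired */
	rq->runtime_remaining = 0;
	return true;
}
```

With equal sequence numbers the local deadline is only pushed out by one
tick. Once refill() has run, the sequences differ and the stale local
runtime is discarded, which is the case the old runtime_expires
comparison could not detect.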