From: Xunlei Pang
To: Peter Zijlstra, Ingo Molnar, Ben Segall
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] sched/fair: Fix bandwidth timer clock drift condition
Date: Mon, 18 Jun 2018 17:16:17 +0800
Message-Id: <20180618091618.21480-1-xlpang@linux.alibaba.com>
X-Mailer: git-send-email 2.14.1.40.g8e62ba1
X-Mailing-List: linux-kernel@vger.kernel.org

The current condition to judge clock drift in expire_cfs_rq_runtime() is
wrong: the two runtime_expires values are actually the same when clock
drift happens, so this condition can never be hit.
The original design was correctly done by commit a9cf55b28610 ("sched:
Expire invalid runtime"), but was changed to the current one due to its
locking issue.

This patch introduces another way: it adds a new field to both struct
cfs_rq and struct cfs_bandwidth to record the expiration update sequence,
and uses them to figure out whether clock drift has happened (true if
they are equal).

This fix is also needed by the following patch.

Fixes: 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
Cc: Ben Segall
Signed-off-by: Xunlei Pang
---
 kernel/sched/fair.c  | 14 ++++++++------
 kernel/sched/sched.h |  6 ++++--
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e497c05aab7f..9f384264e832 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4590,6 +4590,7 @@ void __refill_cfs_bandwidth_runtime(struct cfs_bandwidth *cfs_b)
 	now = sched_clock_cpu(smp_processor_id());
 	cfs_b->runtime = cfs_b->quota;
 	cfs_b->runtime_expires = now + ktime_to_ns(cfs_b->period);
+	cfs_b->expires_seq++;
 }
 
 static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
@@ -4612,6 +4613,7 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	struct task_group *tg = cfs_rq->tg;
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg);
 	u64 amount = 0, min_amount, expires;
+	int expires_seq;
 
 	/* note: this is a positive sum as runtime_remaining <= 0 */
 	min_amount = sched_cfs_bandwidth_slice() - cfs_rq->runtime_remaining;
@@ -4629,6 +4631,7 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 		}
 	}
 	expires = cfs_b->runtime_expires;
+	expires_seq = cfs_b->expires_seq;
 	raw_spin_unlock(&cfs_b->lock);
 
 	cfs_rq->runtime_remaining += amount;
@@ -4637,8 +4640,10 @@ static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	 * spread between our sched_clock and the one on which runtime was
 	 * issued.
 	 */
-	if ((s64)(expires - cfs_rq->runtime_expires) > 0)
+	if ((s64)(expires - cfs_rq->runtime_expires) > 0) {
 		cfs_rq->runtime_expires = expires;
+		cfs_rq->expires_seq = expires_seq;
+	}
 
 	return cfs_rq->runtime_remaining > 0;
 }
@@ -4664,12 +4669,9 @@ static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	 * has not truly expired.
 	 *
 	 * Fortunately we can check determine whether this the case by checking
-	 * whether the global deadline has advanced. It is valid to compare
-	 * cfs_b->runtime_expires without any locks since we only care about
-	 * exact equality, so a partial write will still work.
+	 * whether the global deadline(cfs_b->expires_seq) has advanced.
 	 */
-
-	if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
+	if (cfs_rq->expires_seq == cfs_b->expires_seq) {
 		/* extend local deadline, drift is bounded above by 2 ticks */
 		cfs_rq->runtime_expires += TICK_NSEC;
 	} else {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6601baf2361c..e977e04f8daf 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -334,9 +334,10 @@ struct cfs_bandwidth {
 	u64			runtime;
 	s64			hierarchical_quota;
 	u64			runtime_expires;
+	int			expires_seq;
 
-	int			idle;
-	int			period_active;
+	short			idle;
+	short			period_active;
 	struct hrtimer		period_timer;
 	struct hrtimer		slack_timer;
 	struct list_head	throttled_cfs_rq;
@@ -551,6 +552,7 @@ struct cfs_rq {
 
 #ifdef CONFIG_CFS_BANDWIDTH
 	int			runtime_enabled;
+	int			expires_seq;
 	u64			runtime_expires;
 	s64			runtime_remaining;
 
-- 
2.14.1.40.g8e62ba1
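
For readers who want to see the sequence check in isolation, here is a
minimal user-space sketch of the idea in the patch above. It is not kernel
code: struct bw_pool, struct bw_local, the three functions, the enum, and
the TICK_NS value are hypothetical stand-ins chosen for illustration, and
locking is left out entirely. It only models the rule that a local
deadline is extended when the local and global sequence numbers match, and
that local runtime is dropped when the global sequence has moved on.

```c
#include <stdint.h>

/* Assumed tick length for the sketch (4 ms); the kernel uses TICK_NSEC. */
#define TICK_NS 4000000ULL

/* Hypothetical stand-in for the global pool (struct cfs_bandwidth). */
struct bw_pool {
	uint64_t runtime_expires;
	int expires_seq;
};

/* Hypothetical stand-in for the per-CPU side (struct cfs_rq). */
struct bw_local {
	uint64_t runtime_expires;
	int expires_seq;
	int64_t runtime_remaining;
};

enum expire_result { NOT_DUE, DRIFT_EXTENDED, EXPIRED };

/* A refill sets a new global deadline and bumps the sequence number. */
static void pool_refill(struct bw_pool *p, uint64_t now, uint64_t period)
{
	p->runtime_expires = now + period;
	p->expires_seq++;
}

/* Taking runtime copies the deadline and the sequence it belongs to. */
static void local_assign(struct bw_local *l, const struct bw_pool *p,
			 int64_t amount)
{
	l->runtime_remaining += amount;
	if ((int64_t)(p->runtime_expires - l->runtime_expires) > 0) {
		l->runtime_expires = p->runtime_expires;
		l->expires_seq = p->expires_seq;
	}
}

/* Decide whether a passed local deadline is clock drift or a real expiry. */
static enum expire_result local_expire(struct bw_local *l,
				       const struct bw_pool *p,
				       uint64_t local_now)
{
	/* Deadline still ahead of the local clock: nothing to do. */
	if ((int64_t)(local_now - l->runtime_expires) < 0)
		return NOT_DUE;
	if (l->runtime_remaining < 0)
		return NOT_DUE;

	if (l->expires_seq == p->expires_seq) {
		/* No refill since we took runtime, so treat it as drift. */
		l->runtime_expires += TICK_NS;
		return DRIFT_EXTENDED;
	}

	/* The pool has been refilled since: our runtime is stale. */
	l->runtime_remaining = 0;
	return EXPIRED;
}
```

Comparing sequence numbers rather than deadlines sidesteps the problem
described in the changelog: the two runtime_expires values are equal
exactly when drift happens, so an inequality test on them cannot detect it.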