Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp6254088imu; Mon, 21 Jan 2019 05:55:27 -0800 (PST) X-Google-Smtp-Source: ALg8bN6EQAjKW1OJzDekw91LOutUep1ksEDePu7o2vKrwRTrh7k9kSVLmV3I767n4BpcKIBzXVSZ X-Received: by 2002:a63:5964:: with SMTP id j36mr28198341pgm.210.1548078927789; Mon, 21 Jan 2019 05:55:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1548078927; cv=none; d=google.com; s=arc-20160816; b=kQtkd2kvWs+lYJ2SPM1lpd/g3fvQUASeVBcFzcPkJ9J1kEX/5ylB27cYpm4FQ9Mdhd 7hCjSxzoeu/F+1ekAipgNE4uaVEVs8XqVggexPKNlWEThEi6lbpTrxI37HDywoCU37N5 xiNr4FXmZepDoTO5H/RZh8unGtpXgbjNL2kS/mX4AWAi4cDk31i85J8aRBnhiz9Ie6Ie afpF/s/dCT8LoBiu540bBfYxGovXLh6/3P+Bc3ov0oIAzpBDDVVAaDw2lI3p8tZkZipo w7K/GvfZqKLvFVfiaZoDelusEYSiWTrlLzQRjerdz+IbTRs0dRQdLnBhBrOBNq1FgJCi JbVg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=krdyKjwcJbC8XgkP9lQs6kWTmiZl3d4By73o6bCHnRM=; b=wCDCDliQEyVykXlJ0173JqiCgixAO+eIRPrJ0AYCMWmvHaZW4UJlbiDfl5DsSRV/K7 pPbALpdpLTlUU5HHL1F0KHrE1gGWW+jzZ8hXJK1AUK7EO1rUnJ3G/jTrgdklAYxm9HYR qbWKIhHOsn8tucSn/hPooO8aFd5KzllVxwi9NoX3w7zgEExP2qafjEf7wJ/9H9k0nHmz YuWcKIU7Xm6EHiQ3d0JXESVErHKhAksKGR0RkIkadOzgZ6ksPQK20AF2P8f2xhClsv02 HhAFFzLNHVYh03R5djRCJiM4VgBOAIZyZisCDiggPRKNEygyNcvd264Vfm6wrBA5HgEE 0Z9Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=inH3Gree; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a19si13493031pgn.102.2019.01.21.05.55.12; Mon, 21 Jan 2019 05:55:27 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=inH3Gree; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731133AbfAUNwr (ORCPT + 99 others); Mon, 21 Jan 2019 08:52:47 -0500 Received: from mail.kernel.org ([198.145.29.99]:36026 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731117AbfAUNwo (ORCPT ); Mon, 21 Jan 2019 08:52:44 -0500 Received: from localhost (5356596B.cm-6-7b.dynamic.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id B4D7F2084C; Mon, 21 Jan 2019 13:52:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1548078763; bh=Ed/kZ/Ja50aRv6IIqtTR9SCFlA7rF7doPLuiu1kubEY=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=inH3GreeazALLZM2Jy0A1vi3EOCA0xl3QhxevYd40dN89NzoTtrONn9m7edUlMasz oWfjIVpn9k6Y9/PbUqpslZbKfMirvj2PNuGvfdC5eowwg+TGX0VrJaMeeTsiMDho6Q VCQ/+Z2BLq342mq1gR6JJbmwDn3wPkbQONNVXkO4= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Xunlei Pang , "Peter Zijlstra (Intel)" , Alakesh Haloi , Ben Segall , Linus Torvalds , Thomas Gleixner , Ingo Molnar Subject: [PATCH 4.14 08/59] sched/fair: Fix bandwidth timer clock drift condition Date: Mon, 21 Jan 2019 14:43:33 +0100 Message-Id: <20190121122457.405562328@linuxfoundation.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190121122456.529172919@linuxfoundation.org> References: <20190121122456.529172919@linuxfoundation.org> User-Agent: quilt/0.65 X-stable: review X-Patchwork-Hint: ignore MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 4.14-stable review patch. If anyone has any objections, please let me know. ------------------ From: Xunlei Pang commit 512ac999d2755d2b7109e996a76b6fb8b888631d upstream. I noticed that cgroup task groups constantly get throttled even if they have low CPU usage, this causes some jitters on the response time to some of our business containers when enabling CPU quotas. It's very simple to reproduce: mkdir /sys/fs/cgroup/cpu/test cd /sys/fs/cgroup/cpu/test echo 100000 > cpu.cfs_quota_us echo $$ > tasks then repeat: cat cpu.stat | grep nr_throttled # nr_throttled will increase steadily After some analysis, we found that cfs_rq::runtime_remaining will be cleared by expire_cfs_rq_runtime() due to two equal but stale "cfs_{b|q}->runtime_expires" after period timer is re-armed. The current condition to judge clock drift in expire_cfs_rq_runtime() is wrong, the two runtime_expires are actually the same when clock drift happens, so this condtion can never hit. The orginal design was correctly done by this commit: a9cf55b28610 ("sched: Expire invalid runtime") ... but was changed to be the current implementation due to its locking bug. This patch introduces another way, it adds a new field in both structures cfs_rq and cfs_bandwidth to record the expiration update sequence, and uses them to figure out if clock drift happens (true if they are equal). Signed-off-by: Xunlei Pang Signed-off-by: Peter Zijlstra (Intel) [alakeshh: backport: Fixed merge conflicts: - sched.h: Fix the indentation and order in which the variables are declared to match with coding style of the existing code in 4.14 Struct members of same type were declared in separate lines in upstream patch which has been changed back to having multiple members of same type in the same line. e.g. int a; int b; -> int a, b; ] Signed-off-by: Alakesh Haloi Reviewed-by: Ben Segall Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: # 4.14.x Fixes: 51f2176d74ac ("sched/fair: Fix unlocked reads of some cfs_b->quota/period") Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- kernel/sched/fair.c | 14 ++++++++------ kernel/sched/sched.h | 4 +++- 2 files changed, 11 insertions(+), 7 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4087,6 +4087,7 @@ void __refill_cfs_bandwidth_runtime(stru now = sched_clock_cpu(smp_processor_id()); cfs_b->runtime = cfs_b->quota; cfs_b->runtime_expires = now + ktime_to_ns(cfs_b->period); + cfs_b->expires_seq++; } static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg) @@ -4109,6 +4110,7 @@ static int assign_cfs_rq_runtime(struct struct task_group *tg = cfs_rq->tg; struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(tg); u64 amount = 0, min_amount, expires; + int expires_seq; /* note: this is a positive sum as runtime_remaining <= 0 */ min_amount = sched_cfs_bandwidth_slice() - cfs_rq->runtime_remaining; @@ -4125,6 +4127,7 @@ static int assign_cfs_rq_runtime(struct cfs_b->idle = 0; } } + expires_seq = cfs_b->expires_seq; expires = cfs_b->runtime_expires; raw_spin_unlock(&cfs_b->lock); @@ -4134,8 +4137,10 @@ static int assign_cfs_rq_runtime(struct * spread between our sched_clock and the one on which runtime was * issued. */ - if ((s64)(expires - cfs_rq->runtime_expires) > 0) + if (cfs_rq->expires_seq != expires_seq) { + cfs_rq->expires_seq = expires_seq; cfs_rq->runtime_expires = expires; + } return cfs_rq->runtime_remaining > 0; } @@ -4161,12 +4166,9 @@ static void expire_cfs_rq_runtime(struct * has not truly expired. * * Fortunately we can check determine whether this the case by checking - * whether the global deadline has advanced. It is valid to compare - * cfs_b->runtime_expires without any locks since we only care about - * exact equality, so a partial write will still work. + * whether the global deadline(cfs_b->expires_seq) has advanced. */ - - if (cfs_rq->runtime_expires != cfs_b->runtime_expires) { + if (cfs_rq->expires_seq == cfs_b->expires_seq) { /* extend local deadline, drift is bounded above by 2 ticks */ cfs_rq->runtime_expires += TICK_NSEC; } else { --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -281,8 +281,9 @@ struct cfs_bandwidth { u64 quota, runtime; s64 hierarchical_quota; u64 runtime_expires; + int expires_seq; - int idle, period_active; + short idle, period_active; struct hrtimer period_timer, slack_timer; struct list_head throttled_cfs_rq; @@ -488,6 +489,7 @@ struct cfs_rq { #ifdef CONFIG_CFS_BANDWIDTH int runtime_enabled; + int expires_seq; u64 runtime_expires; s64 runtime_remaining;