Date: Wed, 10 Feb 2021 13:53:33 -0000
From: "tip-bot2 for Juri Lelli"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] sched/features: Fix hrtick reprogramming
Cc: Juri Lelli, "Luis Claudio R. Goncalves", Daniel Bristot de Oliveira,
    "Peter Zijlstra (Intel)", x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20210208073554.14629-2-juri.lelli@redhat.com>
References: <20210208073554.14629-2-juri.lelli@redhat.com>
Message-ID: <161296521335.23325.17083054579544872785.tip-bot2@tip-bot2>
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     0abadfdf696f648ed32fa1bd16d4e0358de19bab
Gitweb:        https://git.kernel.org/tip/0abadfdf696f648ed32fa1bd16d4e0358de19bab
Author:        Juri Lelli
AuthorDate:    Mon, 08 Feb 2021 08:35:53 +01:00
Committer:     Peter Zijlstra
CommitterDate: Wed, 10 Feb 2021 14:44:49 +01:00

sched/features: Fix hrtick reprogramming

Hung tasks and RCU stall cases were reported on systems which were not
100% busy.
Investigation of such unexpected cases (no sign of potential
starvation caused by tasks hogging the system) pointed out that the
periodic sched tick timer wasn't serviced anymore after a certain point
and that caused all machinery that depends on it (timers, RCU, etc.) to
stop working as well. This issue was, however, only reproducible if
HRTICK was enabled.

Looking at core dumps it was found that the rbtree of the hrtimer base
used also for the hrtick was corrupted (i.e. next as seen from the base
root and the actual leftmost node obtained by traversing the tree are
different). The same base is also used for the periodic tick hrtimer,
which might get "lost" if the rbtree gets corrupted.

Much like what is described in commit 1f71addd34f4c ("tick/sched: Do
not mess with an enqueued hrtimer") there is a race window between
hrtimer_set_expires() in hrtick_start() and hrtimer_start_expires() in
__hrtick_restart() in which the former might be operating on an already
queued hrtick hrtimer, which might lead to corruption of the base.

Use hrtimer_start() (which removes the timer before enqueuing it back)
to ensure hrtick hrtimer reprogramming is entirely guarded by the base
lock, so that no race conditions can occur.

Signed-off-by: Juri Lelli
Signed-off-by: Luis Claudio R. Goncalves
Signed-off-by: Daniel Bristot de Oliveira
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20210208073554.14629-2-juri.lelli@redhat.com
---
 kernel/sched/core.c  | 8 +++-----
 kernel/sched/sched.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cec507b..18d51ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -355,8 +355,9 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
 static void __hrtick_restart(struct rq *rq)
 {
 	struct hrtimer *timer = &rq->hrtick_timer;
+	ktime_t time = rq->hrtick_time;
 
-	hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED_HARD);
+	hrtimer_start(timer, time, HRTIMER_MODE_ABS_PINNED_HARD);
 }
 
 /*
@@ -380,7 +381,6 @@ static void __hrtick_start(void *arg)
 void hrtick_start(struct rq *rq, u64 delay)
 {
 	struct hrtimer *timer = &rq->hrtick_timer;
-	ktime_t time;
 	s64 delta;
 
 	/*
@@ -388,9 +388,7 @@ void hrtick_start(struct rq *rq, u64 delay)
 	 * doesn't make sense and can cause timer DoS.
 	 */
 	delta = max_t(s64, delay, 10000LL);
-	time = ktime_add_ns(timer->base->get_time(), delta);
-
-	hrtimer_set_expires(timer, time);
+	rq->hrtick_time = ktime_add_ns(timer->base->get_time(), delta);
 
 	if (rq == this_rq())
 		__hrtick_restart(rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2185b3b..0dfdd52 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1031,6 +1031,7 @@ struct rq {
 	call_single_data_t	hrtick_csd;
 #endif
 	struct hrtimer		hrtick_timer;
+	ktime_t			hrtick_time;
 #endif
 
 #ifdef CONFIG_SCHEDSTATS
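
The corruption described in the changelog can be sketched in userspace.
What follows is a toy model only, not kernel code: every toy_* name and
both demo_* functions are hypothetical, a sorted singly linked list
stands in for the rbtree, and the model is single-threaded, so it shows
the broken invariant (cached "next" differs from the real earliest
timer) rather than the race itself. The assumption is that changing the
expiry of a queued timer in place resembles the old
hrtimer_set_expires() path, and that dequeue, set, enqueue resembles
what hrtimer_start() does under the base lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct hrtimer and its clock base. */
struct toy_timer {
	int64_t expires;
	bool queued;
	struct toy_timer *next;		/* next entry in the sorted list */
};

struct toy_base {
	struct toy_timer *head;		/* sorted by expires, ascending */
	struct toy_timer *cached_next;	/* what the base believes fires first */
};

/* Insert in sorted position and refresh the cached earliest timer. */
static void toy_enqueue(struct toy_base *b, struct toy_timer *t)
{
	struct toy_timer **pp = &b->head;

	while (*pp && (*pp)->expires <= t->expires)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
	t->queued = true;
	b->cached_next = b->head;
}

/* Unlink a queued timer and refresh the cached earliest timer. */
static void toy_remove(struct toy_base *b, struct toy_timer *t)
{
	struct toy_timer **pp = &b->head;

	assert(t->queued);
	while (*pp && *pp != t)
		pp = &(*pp)->next;
	if (*pp)
		*pp = t->next;
	t->next = NULL;
	t->queued = false;
	b->cached_next = b->head;
}

/* Find the real earliest timer by visiting every queued entry. */
static struct toy_timer *toy_scan_earliest(const struct toy_base *b)
{
	struct toy_timer *best = NULL;

	for (struct toy_timer *t = b->head; t; t = t->next)
		if (!best || t->expires < best->expires)
			best = t;
	return best;
}

/* The base is sane when its cached view matches a full scan. */
static bool toy_base_consistent(const struct toy_base *b)
{
	return b->cached_next == toy_scan_earliest(b);
}

/* Safe pattern: dequeue if needed, set the expiry, enqueue again. */
static void toy_timer_start(struct toy_base *b, struct toy_timer *t,
			    int64_t expires)
{
	if (t->queued)
		toy_remove(b, t);
	t->expires = expires;
	toy_enqueue(b, t);
}

/* Old pattern: rewrite the expiry of a timer that is still queued. */
bool demo_in_place_update(void)
{
	struct toy_base base = { 0 };
	struct toy_timer tick = { .expires = 100 };
	struct toy_timer hrtick = { .expires = 200 };

	toy_enqueue(&base, &tick);
	toy_enqueue(&base, &hrtick);
	hrtick.expires = 50;	/* key changed behind the base's back */
	return toy_base_consistent(&base);
}

/* New pattern: always go through the dequeue/enqueue path. */
bool demo_restart(void)
{
	struct toy_base base = { 0 };
	struct toy_timer tick = { .expires = 100 };
	struct toy_timer hrtick = { .expires = 200 };

	toy_enqueue(&base, &tick);
	toy_enqueue(&base, &hrtick);
	toy_timer_start(&base, &hrtick, 50);
	return toy_base_consistent(&base);
}
```

In this model demo_in_place_update() leaves the base pointing at the
100 entry while the 50 entry is really the earliest, which is the same
shape of mismatch the core dumps showed; demo_restart() keeps the two
views in agreement.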