Received: by 2002:a5b:505:0:0:0:0:0 with SMTP id o5csp2088757ybp; Thu, 10 Oct 2019 02:00:24 -0700 (PDT) X-Google-Smtp-Source: APXvYqybO1K4BjnJYdGWN1iewmyyzBBQ/oh7Dxf15ySu1CaVohhLnHs1z2PLCCG1WJ+xWpgy3yW8 X-Received: by 2002:a17:906:1542:: with SMTP id c2mr6931287ejd.80.1570698024691; Thu, 10 Oct 2019 02:00:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1570698024; cv=none; d=google.com; s=arc-20160816; b=qZhcQOy50FKwC0Syfiy7iPEPcgfnkf2AYeFeLFvGqdGgMSgHprByVTXD4UYEx2oxLn uLIzzOI0rWQ7zqf6wSu9JB9/zwVZLIRMsT5x5qlpv+xooGg2JFVgIRG84piwPb84RSkw uUj96CyvU3etNsbPWiVnzYf+pPfANUFq5nkLACVU62jYJkvCS6JSf8A/6vq2Wl42Cy5+ sh9mytcWd0JFzAdR7flIj60WVn8QdMLGcDU1zL4CsluVcSEnK/TYJhYGjmEAcibSyL97 tdf2eJKdSbSXNu1Sf8QKrvu8hPRUm9xkn95jssNLzvOconkR+VOaFtCkzIt8tPB3VwFh E3FA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=glyCnewQI9r5hBDmCsXA7wpSsGUIdEuelC2FFZi6YlE=; b=RwpYHm0QVtjfcNoDZMkmSDAgjUFhY+yovCLHsDCQHZilXXdkB2JUuPmCy8RnRcdxcu p1BBruFVna5J/WQ0/E2SHievPi2DGIo2fk0KANnzDVJe87cS64Avn6xuaA2wE9cqeNkQ s3XjYUhVZFEpPMRezqjSWSGGx8r592/pgJVvI9pb4/ncQy81RM1mqfGkjaEMXQHsWtbR 3ZWjmu1ivXK+4IZyh6HOjWUbwW9T4IGYBm2ofXiCQg4sZSV5Tj8jJ8o8leLXFE8Usrok aqLn1/bwTtWGLV4r5YiZQicRiUKydur7mxjwwgM28/3+gs6yJKdNbf0xVNzGswYvxHTd Wpiw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=HN7VgKMc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id gh3si2588761ejb.306.2019.10.10.02.00.01; Thu, 10 Oct 2019 02:00:24 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=HN7VgKMc; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388774AbfJJInm (ORCPT + 99 others); Thu, 10 Oct 2019 04:43:42 -0400 Received: from mail.kernel.org ([198.145.29.99]:48518 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2387749AbfJJInj (ORCPT ); Thu, 10 Oct 2019 04:43:39 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id DC4A621D6C; Thu, 10 Oct 2019 08:43:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1570697018; bh=Juyn08iySSOZ2HAzZg/2OdnBcyAXJyMLacNSIqx3PyU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=HN7VgKMcMfb+8QhYBHZ3e06+HZJ7uiXpOsl+Og1fF3dU+rE7n7HECnyCwyS3uJG4W h5MgmFuH/2IKXovbkguVvovwP5F25H2uBDCtvo2knXz6YzrwJOMaTI7AnSRhdaSHz1 /2DnWnO7IIF7WdrzH5lU18Jgnh/L9QvrBUU54/EI= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Balasubramani Vivekanandan , Thomas Gleixner , Sasha Levin Subject: [PATCH 5.3 138/148] tick: broadcast-hrtimer: Fix a race in bc_set_next Date: Thu, 10 Oct 2019 10:36:39 +0200 Message-Id: <20191010083620.712424142@linuxfoundation.org> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20191010083609.660878383@linuxfoundation.org> References: <20191010083609.660878383@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Balasubramani Vivekanandan [ Upstream commit b9023b91dd020ad7e093baa5122b6968c48cc9e0 ] When a cpu requests broadcasting, before starting the tick broadcast hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide the required synchronization when the callback is active on other core. The callback could have already executed tick_handle_oneshot_broadcast() and could have also returned. But still there is a small time window where the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns without doing anything, but the next_event of the tick broadcast clock device is already set to a timeout value. In the race condition diagram below, CPU #1 is running the timer callback and CPU #2 is entering idle state and so calls bc_set_next(). In the worst case, the next_event will contain an expiry time, but the hrtimer will not be started which happens when the racing callback returns HRTIMER_NORESTART. The hrtimer might never recover if all further requests from the CPUs to subscribe to tick broadcast have timeout greater than the next_event of tick broadcast clock device. This leads to cascading of failures and finally noticed as rcu stall warnings Here is a depiction of the race condition CPU #1 (Running timer callback) CPU #2 (Enter idle and subscribe to tick broadcast) --------------------- --------------------- __run_hrtimer() tick_broadcast_enter() bc_handler() __tick_broadcast_oneshot_control() tick_handle_oneshot_broadcast() raw_spin_lock(&tick_broadcast_lock); dev->next_event = KTIME_MAX; //wait for tick_broadcast_lock //next_event for tick broadcast clock set to KTIME_MAX since no other cores subscribed to tick broadcasting raw_spin_unlock(&tick_broadcast_lock); if (dev->next_event == KTIME_MAX) return HRTIMER_NORESTART // callback function exits without restarting the hrtimer //tick_broadcast_lock acquired raw_spin_lock(&tick_broadcast_lock); tick_broadcast_set_event() clockevents_program_event() dev->next_event = expires; bc_set_next() hrtimer_try_to_cancel() //returns -1 since the timer callback is active. Exits without restarting the timer cpu_base->running = NULL; The comment that hrtimer cannot be armed from within the callback is wrong. It is fine to start the hrtimer from within the callback. Also it is safe to start the hrtimer from the enter/exit idle code while the broadcast handler is active. The enter/exit idle code and the broadcast handler are synchronized using tick_broadcast_lock. So there is no need for the existing try to cancel logic. All this can be removed which will eliminate the race condition as well. Fixes: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast") Originally-by: Thomas Gleixner Signed-off-by: Balasubramani Vivekanandan Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20190926135101.12102-2-balasubramani_vivekanandan@mentor.com Signed-off-by: Sasha Levin --- kernel/time/tick-broadcast-hrtimer.c | 57 ++++++++++++++-------------- 1 file changed, 29 insertions(+), 28 deletions(-) diff --git a/kernel/time/tick-broadcast-hrtimer.c b/kernel/time/tick-broadcast-hrtimer.c index 5be6154e2fd2c..99fbfb8d9117c 100644 --- a/kernel/time/tick-broadcast-hrtimer.c +++ b/kernel/time/tick-broadcast-hrtimer.c @@ -42,34 +42,39 @@ static int bc_shutdown(struct clock_event_device *evt) */ static int bc_set_next(ktime_t expires, struct clock_event_device *bc) { - int bc_moved; /* - * We try to cancel the timer first. If the callback is on - * flight on some other cpu then we let it handle it. If we - * were able to cancel the timer nothing can rearm it as we - * own broadcast_lock. + * This is called either from enter/exit idle code or from the + * broadcast handler. In all cases tick_broadcast_lock is held. * - * However we can also be called from the event handler of - * ce_broadcast_hrtimer itself when it expires. We cannot - * restart the timer because we are in the callback, but we - * can set the expiry time and let the callback return - * HRTIMER_RESTART. + * hrtimer_cancel() cannot be called here neither from the + * broadcast handler nor from the enter/exit idle code. The idle + * code can run into the problem described in bc_shutdown() and the + * broadcast handler cannot wait for itself to complete for obvious + * reasons. * - * Since we are in the idle loop at this point and because - * hrtimer_{start/cancel} functions call into tracing, - * calls to these functions must be bound within RCU_NONIDLE. + * Each caller tries to arm the hrtimer on its own CPU, but if the + * hrtimer callbback function is currently running, then + * hrtimer_start() cannot move it and the timer stays on the CPU on + * which it is assigned at the moment. + * + * As this can be called from idle code, the hrtimer_start() + * invocation has to be wrapped with RCU_NONIDLE() as + * hrtimer_start() can call into tracing. */ - RCU_NONIDLE({ - bc_moved = hrtimer_try_to_cancel(&bctimer) >= 0; - if (bc_moved) - hrtimer_start(&bctimer, expires, - HRTIMER_MODE_ABS_PINNED);}); - if (bc_moved) { - /* Bind the "device" to the cpu */ - bc->bound_on = smp_processor_id(); - } else if (bc->bound_on == smp_processor_id()) { - hrtimer_set_expires(&bctimer, expires); - } + RCU_NONIDLE( { + hrtimer_start(&bctimer, expires, HRTIMER_MODE_ABS_PINNED); + /* + * The core tick broadcast mode expects bc->bound_on to be set + * correctly to prevent a CPU which has the broadcast hrtimer + * armed from going deep idle. + * + * As tick_broadcast_lock is held, nothing can change the cpu + * base which was just established in hrtimer_start() above. So + * the below access is safe even without holding the hrtimer + * base lock. + */ + bc->bound_on = bctimer.base->cpu_base->cpu; + } ); return 0; } @@ -95,10 +100,6 @@ static enum hrtimer_restart bc_handler(struct hrtimer *t) { ce_broadcast_hrtimer.event_handler(&ce_broadcast_hrtimer); - if (clockevent_state_oneshot(&ce_broadcast_hrtimer)) - if (ce_broadcast_hrtimer.next_event != KTIME_MAX) - return HRTIMER_RESTART; - return HRTIMER_NORESTART; } -- 2.20.1