Date: Fri, 16 Oct 2020 10:34:21 +0100
From: Sudeep Holla
To: Jerome Brunet
Cc: Ionela Voinescu, Jassi Brar, Sudeep Holla, Kevin Hilman,
    linux-amlogic@lists.infradead.org, Da Xue, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mailbox: cancel timer before starting it
Message-ID: <20201016093421.7hyiqrekiy6mtyso@bogus>
References: <20200923123916.1115962-1-jbrunet@baylibre.com>
    <20201015134628.GA11989@arm.com>
    <1jlfg7k2ux.fsf@starbuckisacylon.baylibre.com>
    <20201015142935.GA12516@arm.com>
    <20201016084428.gthqj25wrvnqjsvz@bogus>
    <1jimbak0hh.fsf@starbuckisacylon.baylibre.com>
In-Reply-To: <1jimbak0hh.fsf@starbuckisacylon.baylibre.com>
User-Agent: NeoMutt/20171215
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 16, 2020 at 11:02:02AM +0200, Jerome Brunet wrote:
>
> On Fri 16 Oct 2020 at 10:44, Sudeep Holla wrote:
>
> > On Thu, Oct 15, 2020 at 03:29:35PM +0100, Ionela Voinescu wrote:
> >> Hi Jerome,
> >>
> >> On Thursday 15 Oct 2020 at 15:58:30 (+0200), Jerome Brunet wrote:
> >> >
> >> > On Thu 15 Oct 2020 at 15:46, Ionela Voinescu wrote:
> >> >
> >> > > Hi guys,
> >> > >
> >> > > On Wednesday 23 Sep 2020 at 14:39:16 (+0200), Jerome Brunet wrote:
> >> > >> If the txdone is done by polling, it is possible for msg_submit() to start
> >> > >> the timer while txdone_hrtimer() callback is running. If the timer needs
> >> > >> rescheduling, it could already be enqueued by the time hrtimer_forward_now()
> >> > >> is called, leading hrtimer to loudly complain.
> >> > >>
> >> > >> WARNING: CPU: 3 PID: 74 at kernel/time/hrtimer.c:932 hrtimer_forward+0xc4/0x110
> >> > >> CPU: 3 PID: 74 Comm: kworker/u8:1 Not tainted 5.9.0-rc2-00236-gd3520067d01c-dirty #5
> >> > >> Hardware name: Libre Computer AML-S805X-AC (DT)
> >> > >> Workqueue: events_freezable_power_ thermal_zone_device_check
> >> > >> pstate: 20000085 (nzCv daIf -PAN -UAO BTYPE=--)
> >> > >> pc : hrtimer_forward+0xc4/0x110
> >> > >> lr : txdone_hrtimer+0xf8/0x118
> >> > >> [...]
> >> > >>
> >> > >> Canceling the timer before starting it ensures that the timer callback is
> >> > >> not running when the timer is started, solving this race condition.
> >> > >>
> >> > >> Fixes: 0cc67945ea59 ("mailbox: switch to hrtimer for tx_complete polling")
> >> > >> Reported-by: Da Xue
> >> > >> Signed-off-by: Jerome Brunet
> >> > >> ---
> >> > >>  drivers/mailbox/mailbox.c | 8 ++++++--
> >> > >>  1 file changed, 6 insertions(+), 2 deletions(-)
> >> > >>
> >> > >> diff --git a/drivers/mailbox/mailbox.c b/drivers/mailbox/mailbox.c
> >> > >> index 0b821a5b2db8..34f9ab01caef 100644
> >> > >> --- a/drivers/mailbox/mailbox.c
> >> > >> +++ b/drivers/mailbox/mailbox.c
> >> > >> @@ -82,9 +82,13 @@ static void msg_submit(struct mbox_chan *chan)
> >> > >>  exit:
> >> > >>  	spin_unlock_irqrestore(&chan->lock, flags);
> >> > >>
> >> > >> -	if (!err && (chan->txdone_method & TXDONE_BY_POLL))
> >> > >> -		/* kick start the timer immediately to avoid delays */
> >> > >> +	if (!err && (chan->txdone_method & TXDONE_BY_POLL)) {
> >> > >> +		/* Disable the timer if already active ... */
> >> > >> +		hrtimer_cancel(&chan->mbox->poll_hrt);
> >> > >> +
> >> > >> +		/* ... and kick start it immediately to avoid delays */
> >> > >>  		hrtimer_start(&chan->mbox->poll_hrt, 0, HRTIMER_MODE_REL);
> >> > >> +	}
> >> > >>  }
> >> > >>
> >> > >>  static void tx_tick(struct mbox_chan *chan, int r)
> >> > >
> >> > > I've tracked a regression back to this commit. Details to reproduce:
> >> >
> >> > Hi Ionela,
> >> >
> >> > I don't have access to your platform and I don't get what is going on
> >> > from the log below.
> >> >
> >> > Could you please give us a bit more details about what is going on?
> >> >
> >>
> >> I'm not familiar with the mailbox subsystem, so the best I can do right
> >> now is to add Sudeep to Cc, in case this conflicts in some way with the
> >> ARM MHU patches [1].
> >>
> >
> > No, it can't be the doorbell driver, as we use SCPI (old firmware) with
> > the upstream MHU driver as is, limiting the number of channels to be used.
> >
> >> In the meantime I'll get some traces and get more familiar with the
> >> code.
> >>
> >
> > I will try that too.
>
> BTW, this issue was originally reported on amlogic platforms which also
> use arm,mhu mailbox driver.
>

OK. Anyway, I just noticed that hrtimer_cancel() uses hrtimer_try_to_cancel()
and hrtimer_cancel_wait_running(). The latter is just cpu_relax() if
PREEMPT_RT=n, so you may still have an issue if the hrtimer is still active
or restarts in the meantime.

--
Regards,
Sudeep
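
To make the retry behaviour mentioned above easier to follow, here is a small userspace sketch of the loop that hrtimer_cancel() runs around hrtimer_try_to_cancel(). It is a model under stated assumptions, not kernel code: struct model_timer and every model_* function are hypothetical names invented for illustration, the "callback" is simulated as a step counter, and the whole thing is single-threaded, so it cannot reproduce a concurrent re-arm from another CPU and neither proves nor disproves the race discussed in this thread. The only behaviour borrowed from the kernel is the return convention of hrtimer_try_to_cancel() (1: was queued and removed, 0: not active, -1: callback currently running).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct hrtimer; NOT the kernel type. */
struct model_timer {
	bool enqueued;           /* timer is queued and would fire */
	int callback_steps_left; /* > 0 while the simulated callback runs */
	bool callback_restarts;  /* callback re-arms the timer on completion */
	int spins;               /* number of busy-wait iterations taken */
};

/*
 * Follows the return convention of hrtimer_try_to_cancel():
 *   1  timer was queued and has been removed
 *   0  timer was not active
 *  -1  callback is running, the timer cannot be stopped right now
 */
static int model_try_to_cancel(struct model_timer *t)
{
	if (t->callback_steps_left > 0)
		return -1;
	if (t->enqueued) {
		t->enqueued = false;
		return 1;
	}
	return 0;
}

/*
 * Stand-in for hrtimer_cancel_wait_running() with PREEMPT_RT=n, which is
 * only a cpu_relax(). In this model each spin also lets the simulated
 * callback advance by one step (an assumption made to keep it runnable).
 */
static void model_wait_running(struct model_timer *t)
{
	t->spins++;
	if (t->callback_steps_left > 0 && --t->callback_steps_left == 0 &&
	    t->callback_restarts)
		t->enqueued = true; /* models the callback re-arming the timer */
}

/* Same loop shape as hrtimer_cancel(): retry until not in the callback. */
static int model_cancel(struct model_timer *t)
{
	int ret;

	do {
		ret = model_try_to_cancel(t);
		if (ret < 0)
			model_wait_running(t);
	} while (ret < 0);

	return ret;
}
```

In this model, a timer whose callback has three steps left and re-arms itself makes model_cancel() spin three times and then remove the re-armed timer. What the model leaves out is exactly the open question: whether anything can re-arm the timer between the cancel returning and the following hrtimer_start().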