Date: Sun, 7 Nov 2021 10:34:07 +0000
From: Cristian Marussi
To: rishabhb@codeaurora.org
Cc: Sudeep Holla, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, avajid@codeaurora.org, adharmap@codeaurora.org
Subject: Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails
Message-ID: <20211107103407.GJ6526@e120937-lin>
References: <1628111999-21595-1-git-send-email-rishabhb@codeaurora.org>
 <20210805105427.GU6592@e120937-lin>
 <51782599a01a6a22409d01e5fc1f8a50@codeaurora.org>
 <20210831054835.GJ13160@e120937-lin>
 <20210901093558.GL13160@e120937-lin>
 <20211102113221.w7ivffssjb6jmggj@bogus>
 <9385b2ca9b688b00735cc0b7f626f008@codeaurora.org>
 <20211105094310.GI6526@e120937-lin>
In-Reply-To:
User-Agent: Mutt/1.9.4 (2018-02-28)
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 05, 2021 at 10:40:59AM -0700, rishabhb@codeaurora.org wrote:
> On 2021-11-05 02:43, Cristian Marussi wrote:
> > On Thu, Nov 04, 2021 at 04:40:03PM -0700, rishabhb@codeaurora.org wrote:
> > > On 2021-11-02 04:32, Sudeep Holla wrote:
> > > > On Mon, Nov 01, 2021 at 09:35:42AM -0700, rishabhb@codeaurora.org wrote:
> > > > > On 2021-09-01 02:35, Cristian Marussi wrote:
> > > > > > On Tue, Aug 31, 2021 at 06:48:35AM +0100, Cristian Marussi wrote:
> > > > > > > On Mon, Aug 30, 2021 at 02:09:37PM -0700, rishabhb@codeaurora.org wrote:
> > > > > > > > Hi Cristian
> > > > > > >
> > > > > > > Hi Rishabh,
> > > > >
> > Hi Rishabh,
> >
> > apologies for the delay in coming back to you.
> > A few comments below.
> >
> > > > > > > thanks for looking into these kinds of bad interactions.
> > > > > > >
> > > > > > > > There seems to be another issue here. The response from the agent
> > > > > > > > can be delayed, causing a timeout during base protocol acquire,
> > > > > > > > which leads to the probe failure. What I have observed is that
> > > > > > > > sometimes the failure of probe and rx_callback (due to a delayed
> > > > > > > > message) happen at the same time on different CPUs. Because of
> > > > > > > > this race, the device memory may be cleared while the
> > > > > > > > interrupt (rx_callback) is executing on another CPU.
> > > > > > >
> > > > > > > You are right that concurrency was not handled properly in this
> > > > > > > kind of context and, moreover, if you think about it, even the case
> > > > > > > of out-of-order reception of responses and delayed_responses
> > > > > > > (type2 SCMI messages) for asynchronous SCMI commands was not
> > > > > > > handled properly.
> > > > > > >
> > > > > > > > How do you propose we solve this? Do you think it is better to
> > > > > > > > take the setting up of base and other protocols out of probe and
> > > > > > > > into some delayed work? That would imply the device memory is not
> > > > > > > > released until remove is called. Or should we add locking to the
> > > > > > > > interrupt handler (scmi_rx_callback) and the cleanup in probe to
> > > > > > > > avoid the race?
> > > > > > >
> > > > > > > These issues were more easily exposed by the SCMI Virtio transport,
> > > > > > > so in the series where I introduced scmi-virtio:
> > > > > > >
> > > > > > > https://lore.kernel.org/linux-arm-kernel/162848483974.232214.9506203742448269364.b4-ty@arm.com/
> > > > > > >
> > > > > > > (which is now queued for v5.15 ... now on -next I think... fingers
> > > > > > > crossed)
> > > > > > >
> > > > > > > I took the chance to rectify a couple of other things in the SCMI
> > > > > > > core in the initial commits.
> > > > > > >
> > > > > > > As an example, in the above series
> > > > > > >
> > > > > > > [PATCH v7 05/15] firmware: arm_scmi: Handle concurrent and
> > > > > > > out-of-order messages
> > > > > > >
> > > > > > > cares to add a refcount to xfers and some locking on xfers between
> > > > > > > the TX and RX paths, to avoid that a timed-out xfer can vanish while
> > > > > > > the RX path is concurrently working on it (as you said); moreover I
> > > > > > > handle the condition (rare if not implausible anyway) in which a
> > > > > > > transport delivers out-of-order responses and delayed responses.
> > > > > > >
> > > > > > > I tested these scenarios on some fake emulated SCMI Virtio transport
> > > > > > > where I could play any sort of mess and tricks to stress these limit
> > > > > > > conditions, but you're more than welcome to verify if the race you
> > > > > > > are seeing on the Base protocol timeout is solved (as I would hope
> > > > > > > :D) by this series of mine.
> > > > > > >
> > > > > > > Let me know, any feedback is welcome.
> > > > > > >
> > > > > > > Btw, in the series above there are also other minor changes, but
> > > > > > > there is also another more radical change needed to ensure
> > > > > > > correctness and protection against stale old messages, which maybe
> > > > > > > could interest you in general if you are looking into SCMI:
> > > > > > >
> > > > > > > [PATCH v7 04/15] firmware: arm_scmi: Introduce monotonically
> > > > > > > increasing tokens
> > > > > > >
> > > > > > > Let me know if you have other concerns.
> > > > > > >
> > > > > >
> > > > > > Hi Rishabhb,
> > > > > >
> > > > > > just a quick remark: thinking again about your fail-at-probe scenario
> > > > > > above, I realized that while the concurrency patch I mentioned above
> > > > > > could help with races against vanishing xfers when late timed-out
> > > > > > responses are delivered, here we are really also shutting down
> > > > > > everything on failure, so there could be further issues between a very
> > > > > > late invocation of scmi_rx_callback and the core devm_ helpers freeing
> > > > > > the underlying xfer/cinfo/etc. structs used by scmi_rx_callback itself
> > > > > > (maybe this was already what you meant and I didn't get it... sorry).
> > > > > >
> > > > > > On the other side, I don't feel that delaying Base init to a deferred
> > > > > > worker is a viable solution, since we need the Base protocol to be
> > > > > > initialized and we need to just give up if we cannot communicate with
> > > > > > the SCMI platform fw in such early stages. (Base protocol is really
> > > > > > the only mandatory proto, if I remember the spec correctly.)
> > > > > >
> > > > > > Currently I'm off and only glancing at mails, but I'll have a thought
> > > > > > about these issues once back in a few weeks' time.
> > > > > >
> > > > > > Thanks,
> > > > > > Cristian
> > > > > >
> > > > > Hi Cristian
> > > > > I hope you enjoyed your vacation. Did you get a chance to look at the
> > > > > issue stated above and have some idea as to how to solve this?
> > > >
> > > > Do you still see the issue with v5.15? Can you please check, if you
> > > > haven't already done that?
> > > >
> > > > Also, the 30ms delay we have is huge IMO, and we typically expect the
> > > > communication with the remote processor, or any entity that implements
> > > > SCMI, to happen in terms of one or a few ms tops.
> > > >
> > > > If there is a race, we need to fix that, but I am interested in knowing
> > > > why the default timeout of 30ms is not sufficient. Did increasing it
> > > > help, and is this timeout happening only for the initial commands
> > > > (guessing the SCMI firmware is not yet ready), or does it happen even
> > > > during run-time?
> > >
> > > Hi Sudeep
> > > I haven't checked on 5.15, but after glancing at the code I believe we
> > > should see the same issue.
> > > I agree 30ms is a big enough value, and this should be something that the
> > > remote firmware should resolve. But if the remote firmware goes into a
> > > bad state and is not functioning properly, at least the kernel should
> > > not panic.
> > >
> > > The issue we see here happens during SCMI probe. The response from the
> > > remote agent can be delayed, causing a timeout during base protocol
> > > acquire, which leads to the probe failure. What I have observed is that
> > > sometimes the failure of probe and rx_callback (due to a delayed message)
> > > happen around the same time on different CPUs. Because of this race, the
> > > device memory may be cleared while the interrupt (rx_callback) is
> > > executing on another CPU.
> >
> > So I was looking at the failure path you mentioned: a late concurrent
> > reply on the Base protocol from the fw, during the probe, leads to an
> > invocation of scmi_rx_callback() on a different CPU while core data
> > structs like cinfo are being freed by the SCMI core on the probe failure
> > path. (The v5.15-added SCMI concurrency handling stuff I mentioned should
> > help for races regarding xfers, but not for the cinfo stuff in this
> > case...)
> >
> > We cannot defer Base proto init, since we just wanna fail early while
> > probing if not even the Base protocol can work fine, and also because
> > Base protocol information is indeed needed for the initial setup, so we
> > cannot just proceed if we did not even get a Base reply on the number of
> > protos. (Already said.)
> >
> > In my opinion, the proper way to address this kind of race at probe
> > failure should be to ensure that the transport you are using is properly
> > and completely shut down before cleanup starts (the same applies for a
> > clean remove), i.e. scmi_rx_callback should not even possibly be
> > registered to be called when the final cleanup by the core is started
> > (the devm_ frees, I mean, after scmi_probe exits failing...).
> >
> > BUT indeed, looking back at transport layers like mailbox and virtio,
> > this should be happening already, because the flow is like:
> >
> > scmi_probe()
> > {
> > 	...
> >
> > clean_tx_rx_setup:
> > 	scmi_cleanup_txrx_channels()
> > 		....
> > 		--->>> ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
> >
> > 	return ret;
> > }
> >
> > .... and only after this scmi_probe returns does the core devm layer
> > start freeing devm_-allocated stuff like cinfo, AND the above
> > per-transport-specific .chan_free seems to take care to
> > 'deactivate/deregister' the scmi_rx_callback at the transport layer:
> >
> > e.g. MBOX transport
> > -------------------------
> > static int mailbox_chan_free(int id, void *p, void *data)
> > {
> > 	struct scmi_chan_info *cinfo = p;
> > 	struct scmi_mailbox *smbox = cinfo->transport_info;
> >
> > 	if (smbox && !IS_ERR(smbox->chan)) {
> > 		mbox_free_channel(smbox->chan);  <<< THIS MBOX CORE CALL DEACTIVATES
> > 		cinfo->transport_info = NULL;
> >
> > e.g. VIRTIO Transport
> > -----------------------------
> > static int virtio_chan_free(int id, void *p, void *data)
> > {
> > 	unsigned long flags;
> > 	struct scmi_chan_info *cinfo = p;
> > 	struct scmi_vio_channel *vioch = cinfo->transport_info;
> >
> > 	spin_lock_irqsave(&vioch->ready_lock, flags);
> > 	vioch->ready = false;  <<<< THIS VIRTIO FLAG DEACTIVATES VIRTIO CBS INVOCATION
> > 	spin_unlock_irqrestore(&vioch->ready_lock, flags);
> >
> > ... AND both of the above calls are indeed also heavily spinlocked, so
> > that the 'deactivation' of the scmi_rx_callback should be properly
> > visible; in other words, I would expect that after the above .chan_free()
> > has completed, scmi_rx_callback() cannot be called anymore, because the
> > transport itself will properly drop any such late fw reply.
> >
> > So I am now wondering: which transport are you using in your tests?
> > Since, at least for the above 2 examples, it seems to me that your
> > race-on-probe failure condition should already be addressed by the
> > transport layer itself... or am I getting the nature of the race wrong?
> >
> > Thanks
> > Cristian
>
> Hi Cristian
> You caught the scenario perfectly. But there is still a possibility of a
> race. To be clear, we use the mbox transport. Let me explain in more
> detail.
> Let's assume that the last command (base protocol acquire) the kernel sent
> to the remote agent timed out. This will lead to the final cleanup before
> exiting probe, like you mentioned. Once cleanup is done (mailbox_chan_free),
> no more responses from the remote agent will be acknowledged, but if the
> response comes in between the last command timing out and the cleanup in
> probe, we will see a race, since the response can come asynchronously. In
> this scenario cleanup and scmi_rx_callback race with each other.
> I believe that to solve this we need to synchronize cleanup with
> scmi_rx_callback. We can serialize these two paths and exit early in
> rx_callback if cleanup has been completed.

Yes indeed, but my concern is also not to introduce too much contention on
the RX path (with irqsave spinlocking & friends), given that this racy
scenario surely has to be handled but is also highly unlikely, so I don't
want to slow down the whole RX path all the time.
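Conceptually, what I'd like is a cheap acquire/release guard around the RX
path: scmi_rx_callback only does its work if it can still take a reference
on the channel, while the cleanup path drops the initial reference and then
waits for any in-flight callback to finish. Just to illustrate the idea,
here is a rough standalone sketch of that pattern (plain userspace C with
made-up names, NOT the actual SCMI code):

/*
 * Rough standalone sketch of the acquire/release idea (made-up names,
 * userspace C): the RX path only runs if it can bump the channel refcount
 * from a non-zero value; the cleanup path drops the initial reference and
 * then waits for in-flight callbacks to finish.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_chan {
	atomic_int users;	/* 1 == channel alive, +1 per in-flight callback */
	bool freed;
};

static bool toy_acquire(struct toy_chan *c)
{
	int old = atomic_load(&c->users);

	/* Refuse to take a reference once the count has already hit zero */
	while (old != 0)
		if (atomic_compare_exchange_weak(&c->users, &old, old + 1))
			return true;
	return false;
}

static void toy_release(struct toy_chan *c)
{
	atomic_fetch_sub(&c->users, 1);
}

static void toy_rx_callback(struct toy_chan *c)
{
	if (!toy_acquire(c))
		return;			/* channel is going away: drop the message */

	/* ... handle the message ... */

	toy_release(c);
}

static void toy_chan_free(struct toy_chan *c)
{
	toy_release(c);			/* drop the initial reference from setup */

	/* Stall until any concurrent toy_rx_callback() has dropped its ref */
	while (atomic_load(&c->users) != 0)
		;

	c->freed = true;		/* now it is safe to tear the channel down */
}

int main(void)
{
	struct toy_chan c = { .users = 1, .freed = false };

	toy_rx_callback(&c);		/* e.g. a late reply just before cleanup */
	toy_chan_free(&c);
	printf("channel freed: %d\n", c.freed);

	return 0;
}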
So I tried something along these lines:

----8<------
diff --git a/drivers/firmware/arm_scmi/common.h b/drivers/firmware/arm_scmi/common.h
index dea1bfbe1052..036f8ccff450 100644
--- a/drivers/firmware/arm_scmi/common.h
+++ b/drivers/firmware/arm_scmi/common.h
@@ -340,11 +340,13 @@ void scmi_protocol_release(const struct scmi_handle *handle, u8 protocol_id);
  * channel
  * @handle: Pointer to SCMI entity handle
  * @transport_info: Transport layer related information
+ * @users: A refcount to track active users of this channel
  */
 struct scmi_chan_info {
 	struct device *dev;
 	struct scmi_handle *handle;
 	void *transport_info;
+	refcount_t users;
 };
 
 /**
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index b406b3f78f46..5814ed3f444e 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -678,6 +678,16 @@ static void scmi_handle_response(struct scmi_chan_info *cinfo,
 	scmi_xfer_command_release(info, xfer);
 }
 
+static inline bool scmi_acquire_channel(struct scmi_chan_info *cinfo)
+{
+	return refcount_inc_not_zero(&cinfo->users);
+}
+
+static inline void scmi_release_channel(struct scmi_chan_info *cinfo)
+{
+	refcount_dec(&cinfo->users);
+}
+
 /**
  * scmi_rx_callback() - callback for receiving messages
  *
@@ -695,6 +705,10 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv)
 {
 	u8 msg_type = MSG_XTRACT_TYPE(msg_hdr);
 
+	/* Bail out if channel freed already */
+	if (!scmi_acquire_channel(cinfo))
+		return;
+
 	switch (msg_type) {
 	case MSG_TYPE_NOTIFICATION:
 		scmi_handle_notification(cinfo, msg_hdr, priv);
@@ -707,6 +721,8 @@ void scmi_rx_callback(struct scmi_chan_info *cinfo, u32 msg_hdr, void *priv)
 		WARN_ONCE(1, "received unknown msg_type:%d\n", msg_type);
 		break;
 	}
+
+	scmi_release_channel(cinfo);
 }
 
 /**
@@ -1506,10 +1522,27 @@ static int scmi_chan_setup(struct scmi_info *info, struct device *dev,
 		return ret;
 	}
 
+	refcount_set(&cinfo->users, 1);
 	cinfo->handle = &info->handle;
 	return 0;
 }
 
+static int scmi_chan_free(int id, void *p, void *data)
+{
+	struct scmi_chan_info *cinfo = p;
+	struct scmi_info *info = handle_to_scmi_info(cinfo->handle);
+
+	if (refcount_dec_and_test(&cinfo->users)) {
+		info->desc->ops->chan_free(id, cinfo, data);
+	} else {
+		/* Stall till the ongoing rx_callback completes */
+		spin_until_cond(refcount_read(&cinfo->users) == 0);
+		info->desc->ops->chan_free(id, cinfo, data);
+	}
+
+	return 0;
+}
+
 static inline int
 scmi_txrx_setup(struct scmi_info *info, struct device *dev, int prot_id)
 {
@@ -1792,11 +1825,11 @@ static int scmi_cleanup_txrx_channels(struct scmi_info *info)
 	int ret;
 	struct idr *idr = &info->tx_idr;
 
-	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
+	ret = idr_for_each(idr, scmi_chan_free, idr);
 	idr_destroy(&info->tx_idr);
 
 	idr = &info->rx_idr;
-	ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
+	ret = idr_for_each(idr, scmi_chan_free, idr);
 	idr_destroy(&info->rx_idr);
 
 	return ret;
------8<-----

Can you give it a go on your setup?

Beware, it is not really tested on the racy error path (:P) and I could
still have missed something regarding the synchronization (I also expect an
undesired refcount warning from scmi_release_channel when the race is
hit... but this is just to experiment a bit for now and see if something
like this could be enough while avoiding further locking).

Thanks,
Cristian