Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp4736315pxj; Tue, 25 May 2021 15:21:31 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwR5dQ7MARn5IDY3JLDCx8Uze8vzRRttjGEUcfMJFtQtuX8s5s9GPNTHbTFNutYo15ZupxP X-Received: by 2002:aa7:d58b:: with SMTP id r11mr34543103edq.231.1621981291026; Tue, 25 May 2021 15:21:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1621981291; cv=none; d=google.com; s=arc-20160816; b=BKrwfJlU2qrVpLFczZPykYVLgocHNMxgB+9bsEvztOKFGtdbZyyU9aOoZwwsFt2aCd fudKoDib+Jx0Ei0vz4d7bP2rhXN1dmzHcgyUMRPUm0JIo98PfFwBaHl8vf5qonMo3h5G Cd4ADptF9quxxH3iX+2GVaB9EXS55fz0E3cUzomqJls1eQPuYcT2aizjJG/NQxm6CkC8 MUiefGLLyeAf/lPJXxJ4sFrzQcn1zkwei4phw8kYPS5zvpmXdbQyF4/9wUukDY0lXdAf WgZ/KCK39BLWe6ShuMK3xsMSpj0SROk+IktOP1HUWQKHgKRAuPSIZd6/5ENxxVwZdNlo kVrA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dmarc-filter:sender:dkim-signature; bh=7HujQ3ytiVUpOoBuHQNgqxNnR8h83bOt0+a+zswp5OI=; b=KLryBcmqgXSdTmlVeDYFr1v4ZyeXq1M/DCDAkjS4F9pA2cWLyTgvDP46GPYQy9Srzg RdhSKUK55fcRaY465KOTCkTQE/fb1RlDNnetdgkqV6GRclFCXkm5vvcONP5fASHafVpZ 2/GpUk2BDLLx9XEMO0HDhYEO6U773KTB/f0oWK/a7pYGrJlt6hQH5r7VeyAXtOUmalaq 5jZ+QBMGtvtjXo96a9t66ayuNP4gkCM6BJX3PODckYGYNPK8UXyBnvRrgC68lhFuiRy2 W6TxhTOr0m7coSbL28Q29Q01ThFtmfULCJo4WC5PNZrztaXYBoO9kjt2gE0tDgTTeTI7 GpfA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=pZE3XdJa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s13si18736312edr.28.2021.05.25.15.21.07; Tue, 25 May 2021 15:21:31 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@mg.codeaurora.org header.s=smtp header.b=pZE3XdJa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232779AbhEYTtk (ORCPT + 99 others); Tue, 25 May 2021 15:49:40 -0400 Received: from so254-9.mailgun.net ([198.61.254.9]:15226 "EHLO so254-9.mailgun.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230240AbhEYTtk (ORCPT ); Tue, 25 May 2021 15:49:40 -0400 DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=mg.codeaurora.org; q=dns/txt; s=smtp; t=1621972090; h=Content-Transfer-Encoding: Content-Type: In-Reply-To: MIME-Version: Date: Message-ID: From: References: Cc: To: Subject: Sender; bh=7HujQ3ytiVUpOoBuHQNgqxNnR8h83bOt0+a+zswp5OI=; b=pZE3XdJa4LRewaIe1ehqhRP779YAPRIJ53W7gVyNPf5nVc0XQFR+l1TI/T3C6a0V3zDStBzI wNmjdT2vAI0D/UWPM9OvIeTUyFhn5YuFOoWXX9YwBMYSmvk1CYu5QIIsmdykqB1/f05ovD8W +cg2hVuNPBdBRPOwXum33BETe7k= X-Mailgun-Sending-Ip: 198.61.254.9 X-Mailgun-Sid: WyI0MWYwYSIsICJsaW51eC1rZXJuZWxAdmdlci5rZXJuZWwub3JnIiwgImJlOWU0YSJd Received: from smtp.codeaurora.org (ec2-35-166-182-171.us-west-2.compute.amazonaws.com [35.166.182.171]) by smtp-out-n01.prod.us-west-2.postgun.com with SMTP id 60ad54742bff04e53bbf97ba (version=TLS1.2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256); Tue, 25 May 2021 19:48:04 GMT Sender: sidgup=codeaurora.org@mg.codeaurora.org Received: by smtp.codeaurora.org (Postfix, from userid 1001) id B981EC43460; Tue, 25 May 2021 19:48:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-caf-mail-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=ALL_TRUSTED,BAYES_00, NICE_REPLY_A,SPF_FAIL,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from [192.168.1.10] (cpe-75-83-25-192.socal.res.rr.com [75.83.25.192]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: sidgup) by smtp.codeaurora.org (Postfix) with ESMTPSA id 99E50C4338A; Tue, 25 May 2021 19:48:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 99E50C4338A Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; dmarc=none (p=none dis=none) header.from=codeaurora.org Authentication-Results: aws-us-west-2-caf-mail-1.web.codeaurora.org; spf=fail smtp.mailfrom=sidgup@codeaurora.org Subject: Re: [PATCH] remoteproc: core: Invoke subdev callbacks in list order To: Bjorn Andersson Cc: ohad@wizery.com, linux-remoteproc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, psodagud@codeaurora.org References: <1621292893-18827-1-git-send-email-sidgup@codeaurora.org> From: Siddharth Gupta Message-ID: <02bc228b-4bd2-de5a-6cf7-5623b8048e7f@codeaurora.org> Date: Tue, 25 May 2021 12:48:01 -0700 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/24/2021 8:03 PM, Bjorn Andersson wrote: > On Mon 17 May 18:08 CDT 2021, Siddharth Gupta wrote: > >> Subdevices at the beginning of the subdev list should have >> higher priority than those at the end of the list. Reverse >> traversal of the list causes priority inversion, which can >> impact the performance of the device. >> > The subdev lists layers of the communication onion, we bring them up > inside out and we take them down outside in. > > This stems from the primary idea that we want to be able to shut things > down cleanly (in the case of a stop) and we pass the "crashed" flag to > indicate to each recipient during "stop" that it may not rely on the > response of a lower layer. > > As such, I don't think it's right to say that we have a priority > inversion. My understanding of the topic was that each subdevice should be independent of the other. In our case unfortunately the sysmon subdevice depends on the glink endpoint. However the priority inversion doesn't happen in these subdevices, it happens due to the SSR notifications that we send to kernel clients. In this case kernel clients also can have QMI sockets that in turn depend on the glink endpoint, which means when they go to release the QMI socket a broadcast will be sent out to all connected clients about the closure of the connection which in this case happens to be the remoteproc which died. So if we peel the onion, we will be unnecessarily be waiting for a dead remoteproc. > >> For example a device adds the glink, sysmon and ssr subdevs >> to its list. During a crash the ssr notification would go >> before the glink and sysmon notifications. This can cause a >> degraded response when a client driver waits for a response >> from the crashed rproc. >> > In general the design is such that components are not expected to > communicate with the crashed remote when "crashed" is set, this avoids > the single-remote crash. Here the glink device on the rpmsg bus won't know about the crashed remoteproc till we send glink notification first, right? Since we send out sysmon and SSR notifications first, the glink device will still be "alive" on the rpmsg bus. > > The case where this isn't holding up is when two remote processors > crashes simultaneously, in which case e.g. sysmon has been seen hitting > its timeout waiting for an ack from a dead remoteproc - but I was under > the impression that this window shrunk dramatically as a side effect of > us fixing the notification ordering. You are right, the window would become smaller in the case of two remoteprocs, but this issue can come up with even a single remoteproc unless prioritize certain subdevices. > >> Signed-off-by: Siddharth Gupta >> --- >> drivers/remoteproc/remoteproc_core.c | 24 ++++++++++++++---------- >> 1 file changed, 14 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c >> index 626a6b90f..ac8fc42 100644 >> --- a/drivers/remoteproc/remoteproc_core.c >> +++ b/drivers/remoteproc/remoteproc_core.c >> @@ -1167,7 +1167,7 @@ static int rproc_handle_resources(struct rproc *rproc, >> >> static int rproc_prepare_subdevices(struct rproc *rproc) >> { >> - struct rproc_subdev *subdev; >> + struct rproc_subdev *subdev, *itr; >> int ret; >> >> list_for_each_entry(subdev, &rproc->subdevs, node) { >> @@ -1181,9 +1181,11 @@ static int rproc_prepare_subdevices(struct rproc *rproc) >> return 0; >> >> unroll_preparation: >> - list_for_each_entry_continue_reverse(subdev, &rproc->subdevs, node) { >> - if (subdev->unprepare) >> - subdev->unprepare(subdev); >> + list_for_each_entry(itr, &rproc->subdevs, node) { >> + if (itr == subdev) >> + break; >> + if (itr->unprepare) >> + itr->unprepare(subdev); >> } >> >> return ret; >> @@ -1191,7 +1193,7 @@ static int rproc_prepare_subdevices(struct rproc *rproc) >> >> static int rproc_start_subdevices(struct rproc *rproc) >> { >> - struct rproc_subdev *subdev; >> + struct rproc_subdev *subdev, *itr; >> int ret; >> >> list_for_each_entry(subdev, &rproc->subdevs, node) { >> @@ -1205,9 +1207,11 @@ static int rproc_start_subdevices(struct rproc *rproc) >> return 0; >> >> unroll_registration: >> - list_for_each_entry_continue_reverse(subdev, &rproc->subdevs, node) { >> - if (subdev->stop) >> - subdev->stop(subdev, true); >> + list_for_each_entry(itr, &rproc->subdevs, node) { >> + if (itr == subdev) >> + break; >> + if (itr->stop) >> + itr->stop(itr, true); >> } >> >> return ret; >> @@ -1217,7 +1221,7 @@ static void rproc_stop_subdevices(struct rproc *rproc, bool crashed) >> { >> struct rproc_subdev *subdev; >> >> - list_for_each_entry_reverse(subdev, &rproc->subdevs, node) { >> + list_for_each_entry(subdev, &rproc->subdevs, node) { > I presume this is the case you actually care about, can you help me > understand if you changed the others for consistence or if there's some > flow of events where that might be necessary. Yes you are right, I only changed the others for consistence. However, I will give this more thought and see if unprepare in the reverse order can make a difference. Thanks, Sid > > Regards, > Bjorn > >> if (subdev->stop) >> subdev->stop(subdev, crashed); >> } >> @@ -1227,7 +1231,7 @@ static void rproc_unprepare_subdevices(struct rproc *rproc) >> { >> struct rproc_subdev *subdev; >> >> - list_for_each_entry_reverse(subdev, &rproc->subdevs, node) { >> + list_for_each_entry(subdev, &rproc->subdevs, node) { >> if (subdev->unprepare) >> subdev->unprepare(subdev); >> } >> -- >> Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, >> a Linux Foundation Collaborative Project >>