Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp2788889pxj; Mon, 31 May 2021 10:48:11 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyp2MHoSvySCH5s5CyFI7pR1eJR7C6agK+9iT8uBbIpSV+9PYYEr0HORItIISEDwNJ0w0My X-Received: by 2002:a92:2907:: with SMTP id l7mr18538616ilg.64.1622483290870; Mon, 31 May 2021 10:48:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1622483290; cv=none; d=google.com; s=arc-20160816; b=yQ2hZWiRqf//f0MrWIwi39XAibBVtjCkniPr0d5U7xECjmWVW3XpZulnnOI0L76VPY af/mC2xEzIql1dcoQSfoBF5/4Uf0CfVrhaHmfLt9uLR5MPiz23pNcLEjG85LGoMWgypT LOZasqWAsL1DMdOajgfNpby6S2co5XYjlbQ7F9dlEPa3TCK3a6YQIJQffAvHescx8y/A NKlt5hd+Wc2wXT/5f4UsEwMy+1E2iq4XLEXSp4ulfsDwZra6PoEGGOxOgHqDg6wvdsny aF1D180/6ScoeogeviXP/UY1bEUJz1/WSms8dGILjFGovjWUEswVYptkHbo4TGxrdOZ1 hzVg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=0UQpSMC1R+2Xa12LBVW7x40T3z3R/9Jq0QsvLZP5FHQ=; b=RGtJ6ksQ4L2DHWggECycFJLOztZPe4RrqnWSeKKZJwv7HqzqaRGqLbx+ZwofyfeXNr 3bJu3TGz7JQOqJfQWMCsN1SkoiRVxebe2DxcrWbVaO1n47zFYix+CC54MiJHMevk0Xwi m+HV4Akc2d2tMXdnup87AcxWQWRiPhpafSDMf/OvsqmecHeH9l0k0eIPUDoZXLn5S/p9 rfsJ1DS56zOl1AVB39mrYFTxwyhrrnNPDLpi7Qt30KoZzzTEV7aW5CQXC2oW0pZ02sfO dBydOI81QgUiMpBaFxVwkQ5IC9r+6SlU+6PUl1MU1Qh1X1xryKiNQijku/tOXlZdRz5o acBg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=PY3S1B9S; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id k21si15079306jad.9.2021.05.31.10.47.55; Mon, 31 May 2021 10:48:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=PY3S1B9S; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232626AbhEaRrr (ORCPT + 99 others); Mon, 31 May 2021 13:47:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58238 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232382AbhEaRrc (ORCPT ); Mon, 31 May 2021 13:47:32 -0400 Received: from mail-pj1-x1031.google.com (mail-pj1-x1031.google.com [IPv6:2607:f8b0:4864:20::1031]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C5A14C06134B for ; Mon, 31 May 2021 10:21:56 -0700 (PDT) Received: by mail-pj1-x1031.google.com with SMTP id pi6-20020a17090b1e46b029015cec51d7cdso311612pjb.5 for ; Mon, 31 May 2021 10:21:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=0UQpSMC1R+2Xa12LBVW7x40T3z3R/9Jq0QsvLZP5FHQ=; b=PY3S1B9SWEFG0pRMqrSIevRJgIxUSSfpOJJdgMmbTjBoNfuIY1YJqsev+Zba8gWNKm 5Po7yoXdfbuFzLxk8kaUv9fmsMs8pMf+NaeANmMiOJyRGb+8Zrfad0a3g5qCdnh3vJrN nkvmY3+mUugnrb7gZgUTicN7K9AjoPcP9ZwPeTAqzT6oJqcKcaHkz2GnIRKOZNQ/9doY yyLiErefayU3NndHJrsb0XrUC8ZRXetV6rbEfqW3bCnuD0dSj6YWsoPg/0UlpMW58pCj csF6LbOIhqmfEwTpTDa/cthrUsEPH+xf+DzMF5qFm3MYqs4Qw/4nb21+eut3gbbRfGcf xofQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=0UQpSMC1R+2Xa12LBVW7x40T3z3R/9Jq0QsvLZP5FHQ=; b=jJGC4faRDKBuHzNt03U4B3Spg1tIci5SJVn8PiVa3AS7QSwNJ+ftRLBdga7HTSNZ2v p0MxvYMskzodA153EQXGPJJfpc9fC1op9cAcpfDaOHCz+qQHWzPuJ1opUHku9qKk9MoW YWkXy0zD9cOlvSbTqv0Rnx6ZMhvCdcsiC75GaJHkXNO0HAQhufY1Xjs0WFl3YHhiMuFq I5JXbGUmIeLkug4tJ8Fb6gLDHkzejWAsy6luZgYhZHnnIKGoaC0A9v4casIH4UumU3oW jpPnwoH3PA6SPVQ5zrX1l93OoCF8Hi/Apmp1iNbbRJ3kr0SJa+6PIvHcLDisUIyPATkF BBJg== X-Gm-Message-State: AOAM531Cjvx0nKiSa4pDDpf7/xqEBYITeDKs///UuS0LrZrsMBhqGs1y t9Q8W3DK3w2BpzOVKtP3BOmhVw== X-Received: by 2002:a17:90b:1b4f:: with SMTP id nv15mr183861pjb.56.1622481716289; Mon, 31 May 2021 10:21:56 -0700 (PDT) Received: from xps15 (S0106889e681aac74.cg.shawcable.net. [68.147.0.187]) by smtp.gmail.com with ESMTPSA id t14sm10642295pfg.168.2021.05.31.10.21.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 31 May 2021 10:21:55 -0700 (PDT) Date: Mon, 31 May 2021 11:21:53 -0600 From: Mathieu Poirier To: Bjorn Andersson Cc: Alex Elder , ohad@wizery.com, linux-remoteproc@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 1/1] remoteproc: use freezable workqueue for crash notifications Message-ID: <20210531172153.GA1718330@xps15> References: <20210519234418.1196387-1-elder@linaro.org> <20210519234418.1196387-2-elder@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, May 27, 2021 at 10:55:05PM -0500, Bjorn Andersson wrote: > On Wed 19 May 18:44 CDT 2021, Alex Elder wrote: > > > When a remoteproc has crashed, rproc_report_crash() is called to > > handle whatever recovery is desired. This can happen at almost any > > time, often triggered by an interrupt, though it can also be > > initiated by a write to debugfs file remoteproc/remoteproc*/crash. > > > > When a crash is reported, the crash handler worker is scheduled to > > run (rproc_crash_handler_work()). One thing that worker does is > > call rproc_trigger_recovery(), which calls rproc_stop(). That calls > > the ->stop method for any remoteproc subdevices before making the > > remote processor go offline. > > > > The Q6V5 modem remoteproc driver implements an SSR subdevice that > > notifies registered drivers when the modem changes operational state > > (prepare, started, stop/crash, unprepared). The IPA driver > > registers to receive these notifications. > > > > With that as context, I'll now describe the problem. > > > > There was a situation in which buggy modem firmware led to a modem > > crash very soon after system (AP) resume had begun. The crash caused > > a remoteproc SSR crash notification to be sent to the IPA driver. > > The problem was that, although system resume had begun, it had not > > yet completed, and the IPA driver was still in a suspended state. This is a very tight race condition - I agree with you that it is next to impossible to test. > > > > This scenario could happen to any driver that registers for these > > SSR notifications, because they are delivered without knowledge of > > the (suspend) state of registered recipient drivers. > > > > This patch offers a simple fix for this, by having the crash > > handling worker function run on the system freezable workqueue. > > This workqueue does not operate if user space is frozen (for > > suspend). As a result, the SSR subdevice only delivers its > > crash notification when the system is fully operational (i.e., > > neither suspended nor in suspend/resume transition). > > I think the real fix for this problem should be in the platform driver where the remoteproc interrupt would be masked while suspending and re-enabled again when resuming. The runtime PM API would work just fine for that... But doing so wouldn't guarantee that other drivers, i.e IPA, would be operational. Unless of one is a child of the other or using a bus like mechanic, and getting to that point will introduce a lot more churn than what this patch does. The proposed solution will work since user space is expected to resume when the kernel and drivers are fully operational. As you pointed out the only concern is with other users, which may not be noticeable given the very small delay that is introduced. As such: Reviewed-by: Mathieu Poirier But be mindful the patch will have to be reverted if someone complains, whether that happens in two weeks or 10 months. Thanks, Mathieu > > This makes sense to me; both that it ensures that we spend our resources > on the actual system resume and that it avoids surprises from this > happening while the system still is in a funky state... > > Reviewed-by: Bjorn Andersson > > But it would be nice to get some input from other users of the > framework. > > Regards, > Bjorn > > > Signed-off-by: Alex Elder > > --- > > drivers/remoteproc/remoteproc_core.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c > > index 39cf44cb08035..6bedf2d2af239 100644 > > --- a/drivers/remoteproc/remoteproc_core.c > > +++ b/drivers/remoteproc/remoteproc_core.c > > @@ -2724,8 +2724,8 @@ void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type) > > dev_err(&rproc->dev, "crash detected in %s: type %s\n", > > rproc->name, rproc_crash_to_string(type)); > > > > - /* create a new task to handle the error */ > > - schedule_work(&rproc->crash_handler); > > + /* Have a worker handle the error; ensure system is not suspended */ > > + queue_work(system_freezable_wq, &rproc->crash_handler); > > } > > EXPORT_SYMBOL(rproc_report_crash); > > > > -- > > 2.27.0 > >