Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp823527pxu; Fri, 11 Dec 2020 15:37:43 -0800 (PST) X-Google-Smtp-Source: ABdhPJzvilunPZsmiSmbw+uRWsNjlBqJMWsE2v+IXUm2edjBWewF7RKuZwDOJaxG36kvCtHqmkwY X-Received: by 2002:a17:906:1498:: with SMTP id x24mr12722059ejc.170.1607729863153; Fri, 11 Dec 2020 15:37:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1607729863; cv=none; d=google.com; s=arc-20160816; b=TAymX/OJAH+E/jeWpPxV59ikT55lHKjiRdDwEMCr+mUceI6xXnK6gSOLrjB3NsRAYP gJcfaPSp9neiE39MUoclpV5JcI8+HBCDShOsLjRMnkJ3ACHZDBmSwCfhJ0U2BpZalGL8 fkawUAdUWWeXgLyaiFkACuK4nhy2iqTUbAbrmZERhOJO5fel9bXdpSQ6hapuCx8b2Wva A69Bnfs9ihhY1nXHLX1HNiDAomLdtmuLy42s0ljUPJM/AWgaOf1IoAkqosgE3ENHrxpI XgszfYFtmXdX7Hl2KaPqxvaV+HysmjLbkmZb+uiKnMNZ10hWeWT0FeGVDU2SUTmNPwC6 7Z5Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:message-id:date:to:cc:from:subject :references:in-reply-to:content-transfer-encoding:mime-version :dkim-signature; bh=dprNn0WuSryYKqCdSm/Bp8UbqgSIIDDVYORtNWfa1WU=; b=B9ljzdZ3SV+FsxSIK7FkPBi9pWK7hyUnTtFLaBifdmdHTDfw9NN6ca1UO/mgh49Lfi pjZzZ6xr2NDxNbHdfR9VyvoWMMGz7z3ccKt55U7X2unaSQnwNt11Mw2heKbMuIjnAYbZ 6KRv3HrTaUEEuLGH/Ww9nZYG4KoS7neEdwlM3l9c1mYcT6uzqE678VHGJ1GlKD0BY6K0 Dq8F8Xev8Re3RkT4wqv49SMgPCcaF6XEJdTRj0sqoURoH05Cui2atIUQeFvIUU6Zmea1 EadkWQ7Fdh9esi48QA0G7gs/NGjq7sOqmdA/OCNGREuiqLuceBxkrNGCIAnahOZtaIRT Pvbg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=TgF3ywge; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id m8si5877738edp.82.2020.12.11.15.37.20; Fri, 11 Dec 2020 15:37:43 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b=TgF3ywge; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2394648AbgLKAwK (ORCPT + 99 others); Thu, 10 Dec 2020 19:52:10 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60088 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2394608AbgLKAvi (ORCPT ); Thu, 10 Dec 2020 19:51:38 -0500 Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 01A03C0613D3 for ; Thu, 10 Dec 2020 16:50:58 -0800 (PST) Received: by mail-pf1-x432.google.com with SMTP id c79so5862264pfc.2 for ; Thu, 10 Dec 2020 16:50:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=mime-version:content-transfer-encoding:in-reply-to:references :subject:from:cc:to:date:message-id:user-agent; bh=dprNn0WuSryYKqCdSm/Bp8UbqgSIIDDVYORtNWfa1WU=; b=TgF3ywgeruCprqOfnZtUK170KoOS56SF0sNSw9F9eGyjKwPruhDwRMwGj9lsZWaKb+ QY3z63z2q3RRyqhgPUDj0qz6QDZwZikHLKHid80XFpV+d8RX0f0kPBOPHhncpXyOFfzf AHG1StZy2oGIpCpqfM9KUk2MtIsBTPmFNa7fo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:content-transfer-encoding :in-reply-to:references:subject:from:cc:to:date:message-id :user-agent; bh=dprNn0WuSryYKqCdSm/Bp8UbqgSIIDDVYORtNWfa1WU=; b=ul8ZORp8n/P03pRiVHeKTBG7Kfw9yOQx6yc/x3IJGhk0bFq702hi8wmrP/ZE4e9CdA fVGJ4cXT1J4vu3JnuPwpDSW0qXCnCWFqICF7oV1qdlq2V71w0dnx3XZOWnZd6VVWTwaw gQ+rovI32brXqyCH5REyPYrAGZNWGLsCZORwgAx65dRsWmpeSjMW/uycn5Yp3/vnTpoB S0gF+YSKDDmQnMGSpd1P/f5N7uIBj61huJd9YPlEtsynIUi40zvYsOYygf+AL4Vd/VOc yxuxGGtwWjC0oWkp8VRoVoFq52disMpGgKxXUed5ey1iv9i+gRmwHw9wwrV8+7ftNr8K SGIg== X-Gm-Message-State: AOAM531Q0phh+oCnw14OGtDmXuY7SIVDXyyeZ3RU0LQlsOuRXhpr1Csf YP0BZMvhagdpaHrDrjZ3UlbyEQ== X-Received: by 2002:a63:4b1f:: with SMTP id y31mr9247800pga.29.1607647857472; Thu, 10 Dec 2020 16:50:57 -0800 (PST) Received: from chromium.org ([2620:15c:202:201:3e52:82ff:fe6c:83ab]) by smtp.gmail.com with ESMTPSA id q22sm4375726pjg.15.2020.12.10.16.50.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Dec 2020 16:50:56 -0800 (PST) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable In-Reply-To: References: <20201203074459.13078-1-rojay@codeaurora.org> <160757022002.1580929.8656750350166301192@swboyd.mtv.corp.google.com> <160764107797.1580929.14768824290834396298@swboyd.mtv.corp.google.com> <160764316821.1580929.18177257779550490986@swboyd.mtv.corp.google.com> Subject: Re: [PATCH] spi: spi-geni-qcom: Fix NULL pointer access in geni_spi_isr From: Stephen Boyd Cc: Roja Rani Yarubandi , Mark Brown , Andy Gross , Bjorn Andersson , linux-arm-msm , linux-spi , LKML , Akash Asthana , msavaliy@qti.qualcomm.com To: Doug Anderson Date: Thu, 10 Dec 2020 16:50:55 -0800 Message-ID: <160764785500.1580929.4255309510717807485@swboyd.mtv.corp.google.com> User-Agent: alot/0.9.1 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Quoting Doug Anderson (2020-12-10 15:50:04) > Hi, >=20 > On Thu, Dec 10, 2020 at 3:32 PM Stephen Boyd wrote: > > > > Quoting Doug Anderson (2020-12-10 15:07:39) > > > Hi, > > > > > > On Thu, Dec 10, 2020 at 2:58 PM Stephen Boyd wr= ote: > > > > right? It will only ensure that other irq handlers have completed, = which > > > > may be a problem, but not the only one. > > > > > > > > TL;DR: Peek at the irq status register in the timeout logic and ski= p it > > > > if the irq is pending? > > > > > > I don't have tons of experience with synchronize_irq(), but the > > > function comment seems to indicate that as long as the interrupt is > > > pending synchronize_irq() will do what we want even if the CPU that > > > should handle the interrupt is in an irqsoff section. Digging a > > > little bit I guess it relies upon the interrupt controller being able > > > to read this state, but (hopefully) the GIC can? > > > > I didn't read synchronize_irq() more than the single line summary. I > > thought it would only make sure other irq handlers have finished, which > > is beside the point of some long section of code that has disabled irqs > > on CPU0 with local_irq_disable(). And further more, presumably the irq > > handler could be threaded, and then we could put a sufficiently large > > msleep() at the start of geni_spi_isr() and see the same problem? >=20 > As I understand it synchronize_irq(): > 1. If the interrupt is not running but is pending at a hardware level, > it'll wait. > 2. If the interrupt is currently running it waits for it to finish. >=20 > That should handle all the cases you're talking about including > waiting for the threaded IRQ handler. There's an explicit comment > about the threaded IRQ being accounted for in synchronize_irq(): >=20 > https://elixir.bootlin.com/linux/v5.9/source/kernel/irq/manage.c#L134 Ok cool sounds like it would work then. Thanks for reading the code for me! :) >=20 >=20 > > > If it doesn't work like I think it does, I'd be OK with peeking in the > > > IRQ status register, but we shouldn't _skip_ the logic IMO. As long > > > as we believe that an interrupt could happen in the future we > > > shouldn't return from handle_fifo_timeout(). It's impossible to > > > reason about how future transfers would work if the pending interrupt > > > from the previous transfer could fire at any point. > > > > Right. I just meant skip the timeout handling logic. We'd have to go > > back to the timeout and keep waiting until the irq handler can run and > > complete the completion variable. > > > > I forgot that this is half handled in the spi core though. Peeking at > > m_irq doesn't look very easy to implement. It certainly seems like this > > means the timeout handler is busted and the diagram earlier could > > indicate that spi core is driving this logic from > > spi_transfer_one_message(). >=20 > My assumption was that it was still OK (even if not perfect) to still > process it as a timeout. I just want to really make sure a future > interrupt isn't going to show up. I'm worried about the buffer disappearing if spi core calls handle_err() but the geni_spi_isr() handler runs both an rx and a cancel/abort routine. That doesn't seem to be the case though so it looks all fine. >=20 > If we want to try to do better, we can do timeout handling ourselves. > The SPI core allows for that. >=20 >=20 > > So why don't we check for cur_xfer being NULL in the rx/tx handling > > paths too and bail out there? Does the FIFO need to be cleared out in > > such a situation that spi core thinks a timeout happened but there's RX > > data according to m_irq? Do we need to read it all and throw it away? Or > > does the abort/cancel clear out the RX fifo? >=20 > I don't know for sure, but IMO it's safest to read anything that's in > the FIFO. It's also important to adjust the watermark in the TX case. > The suggestions I provided in my original reply (#2 and #3) handle > this and are plenty simple. >=20 > As per my original reply, though, anything we do in the ISR doesn't > replace the changes we need to make to handle_fifo_timeout(). It is > very important that when handle_fifo_timeout() finishes that no future > interrupts for old transfers will fire. >=20 Alright. With a proper diagram in the commit text I think doing all of the points, 1 through 3, would be good and required to leave the hardware in a sane state for all the scenarios. Why do we need to call synchronize_irq() at the start and end of handle_fifo_timeout() though? Presumably having it at the start would make sure the long delayed irq runs and handles any rx/tx by throwing it away. Sounds great, but having it at the end leaves me confused. We want to make sure the cancel really went through? Don't we know that because the completion variable for cancel succeeded? I guess I'm not convinced that the hardware is so bad that it cancels and aborts the sequencer, raises an irq for that, and then raises an irq for the earlier rx/tx that the sequencer canceled out. Is that happening? It's called a sequencer because presumably it runs a sequence of operations like tx, rx, cs change, cancel, and abort. Hopefully it doesn't run them out of order. If they run at the same time it's fine, the irq handler will see all of them and throw away reads, etc.