From: Kornel Dulęba
Date: Wed, 1 Nov 2023 12:31:01 +0100
Subject: Re: [PATCH 1/2] mmc: cqhci: Add a quirk to clear stale TC
To: Adrian Hunter
Cc: linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org, Ulf Hansson, Radoslaw Biernacki, Gwendal Grignou
References: <20231027145623.2258723-1-korneld@chromium.org> <20231027145623.2258723-2-korneld@chromium.org>

On Mon, Oct 30, 2023 at 8:31 PM Adrian Hunter wrote:
>
> On 27/10/23 17:56, Kornel Dulęba wrote:
> > This fix addresses a stale task completion event issued right after the
> > CQE recovery. As it's a hardware issue the fix is done in form of a
> > quirk.
> >
> > When an error interrupt is received the driver runs recovery logic.
> > It halts the controller, clears all pending tasks, and then re-enables
> > it. On some platforms a stale task completion event is observed,
> > regardless of the CQHCI_CLEAR_ALL_TASKS bit being set.
> >
> > This results in either:
> > a) Spurious TC completion event for an empty slot.
> > b) Corrupted data being passed up the stack, as a result of premature
> >    completion for a newly added task.
> >
> > To fix that re-enable the controller, clear task completion bits,
> > interrupt status register and halt it again.
> > This is done at the end of the recovery process, right before interrupts
> > are re-enabled.
> >
> > Signed-off-by: Kornel Dulęba
> > ---
> >  drivers/mmc/host/cqhci-core.c | 42 +++++++++++++++++++++++++++++++++++
> >  drivers/mmc/host/cqhci.h      |  1 +
> >  2 files changed, 43 insertions(+)
> >
> > diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
> > index b3d7d6d8d654..e534222df90c 100644
> > --- a/drivers/mmc/host/cqhci-core.c
> > +++ b/drivers/mmc/host/cqhci-core.c
> > @@ -1062,6 +1062,45 @@ static void cqhci_recover_mrqs(struct cqhci_host *cq_host)
> >  /* CQHCI could be expected to clear it's internal state pretty quickly */
> >  #define CQHCI_CLEAR_TIMEOUT		20
> >
> > +/*
> > + * During CQE recovery all pending tasks are cleared from the
> > + * controller and its state is being reset.
> > + * On some platforms the controller sets a task completion bit for
> > + * a stale(previously cleared) task right after being re-enabled.
> > + * This results in a spurious interrupt at best and corrupted data
> > + * being passed up the stack at worst. The latter happens when
> > + * the driver enqueues a new request on the problematic task slot
> > + * before the "spurious" task completion interrupt is handled.
> > + * To fix it:
> > + * 1. Re-enable controller by clearing the halt flag.
> > + * 2. Clear interrupt status and the task completion register.
> > + * 3. Halt the controller again to be consistent with quirkless logic.
> > + *
> > + * This assumes that there are no pending requests on the queue.
> > + */
> > +static void cqhci_quirk_clear_stale_tc(struct cqhci_host *cq_host)
> > +{
> > +	u32 reg;
> > +
> > +	WARN_ON(cq_host->qcnt);
> > +	cqhci_writel(cq_host, 0, CQHCI_CTL);
> > +	if ((cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT)) {
> > +		pr_err("%s: cqhci: CQE failed to exit halt state\n",
> > +			mmc_hostname(cq_host->mmc));
> > +	}
> > +	reg = cqhci_readl(cq_host, CQHCI_TCN);
> > +	cqhci_writel(cq_host, reg, CQHCI_TCN);
> > +	reg = cqhci_readl(cq_host, CQHCI_IS);
> > +	cqhci_writel(cq_host, reg, CQHCI_IS);
> > +
> > +	/*
> > +	 * Halt the controller again.
> > +	 * This is only needed so that we're consistent across quirk
> > +	 * and quirkless logic.
> > +	 */
> > +	cqhci_halt(cq_host->mmc, CQHCI_FINISH_HALT_TIMEOUT);
> > +}
>
> Thanks a lot for tracking this down!
>
> It could be that the "un-halt" starts a task, so it would be
> better to force the "clear" to work if possible, which
> should be the case if CQE is disabled.
>
> Would you mind trying the code below?  Note the increased
> CQHCI_START_HALT_TIMEOUT helps avoid trying to clear tasks
> when CQE has not halted.

Sure, I'll try it out tomorrow, as I don't have access to the DUT today.
BTW do we even need to halt the controller in the recovery_finish logic?
It has already been halted in recovery_start. I guess the halt in
recovery_finish could be there in case the recovery_start halt didn't work.
But in that case shouldn't we do this disable/re-enable dance in
recovery_start?

>
>
> diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c
> index b3d7d6d8d654..534c13069833 100644
> --- a/drivers/mmc/host/cqhci-core.c
> +++ b/drivers/mmc/host/cqhci-core.c
> @@ -987,7 +987,7 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout)
>   * layers will need to send a STOP command), so we set the timeout based on a
>   * generous command timeout.
>   */
> -#define CQHCI_START_HALT_TIMEOUT	5
> +#define CQHCI_START_HALT_TIMEOUT	500
>
>  static void cqhci_recovery_start(struct mmc_host *mmc)
>  {
> @@ -1075,28 +1075,27 @@ static void cqhci_recovery_finish(struct mmc_host *mmc)
>
>  	ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
>
> -	if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
> -		ok = false;
> -
>  	/*
>  	 * The specification contradicts itself, by saying that tasks cannot be
>  	 * cleared if CQHCI does not halt, but if CQHCI does not halt, it should
>  	 * be disabled/re-enabled, but not to disable before clearing tasks.
>  	 * Have a go anyway.
>  	 */
> -	if (!ok) {
> -		pr_debug("%s: cqhci: disable / re-enable\n", mmc_hostname(mmc));
> -		cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
> -		cqcfg &= ~CQHCI_ENABLE;
> -		cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> -		cqcfg |= CQHCI_ENABLE;
> -		cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> -		/* Be sure that there are no tasks */
> -		ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
> -		if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
> -			ok = false;
> -		WARN_ON(!ok);
> -	}
> +	if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT))
> +		ok = false;
> +
> +	cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
> +	cqcfg &= ~CQHCI_ENABLE;
> +	cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> +
> +	cqcfg = cqhci_readl(cq_host, CQHCI_CFG);
> +	cqcfg |= CQHCI_ENABLE;
> +	cqhci_writel(cq_host, cqcfg, CQHCI_CFG);
> +
> +	cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT);
> +
> +	if (!ok)
> +		cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT);
>
>  	cqhci_recover_mrqs(cq_host);
>
>
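
For anyone following the thread without hardware at hand, the un-halt /
clear / re-halt order used by cqhci_quirk_clear_stale_tc() in the quoted
patch can be modelled in userspace. This is only a rough sketch and not
driver code: the fake_* names, the register indices and the bit values are
all made up for illustration, TCN and IS are assumed to be write-1-to-clear
(which the read-then-write-back pattern in the patch suggests), and the
"stale completion latched when leaving halt" behaviour is an assumption
about the misbehaving hardware, not something taken from a datasheet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indices and bits; NOT the real CQHCI layout. */
enum fake_reg { FAKE_CTL, FAKE_IS, FAKE_TCN, FAKE_REG_COUNT };
#define FAKE_CTL_HALT 0x1u
#define FAKE_IS_TCC   0x2u

/* Toy controller: a register file plus the slot that misbehaves. */
struct fake_cqe {
	uint32_t regs[FAKE_REG_COUNT];
	unsigned int stale_slot;	/* must be below 32 */
};

static uint32_t fake_readl(const struct fake_cqe *cqe, enum fake_reg reg)
{
	return cqe->regs[reg];
}

static void fake_writel(struct fake_cqe *cqe, uint32_t val, enum fake_reg reg)
{
	/* Assumption: IS and TCN are write-1-to-clear. */
	if (reg == FAKE_IS || reg == FAKE_TCN) {
		cqe->regs[reg] &= ~val;
		return;
	}
	/* Assumed misbehaviour: leaving halt latches a stale completion. */
	if ((cqe->regs[FAKE_CTL] & FAKE_CTL_HALT) && !(val & FAKE_CTL_HALT)) {
		cqe->regs[FAKE_TCN] |= 1u << cqe->stale_slot;
		cqe->regs[FAKE_IS] |= FAKE_IS_TCC;
	}
	/* In this toy model a halt request takes effect immediately. */
	cqe->regs[FAKE_CTL] = val;
}

/* Same order of operations as the quoted quirk, against the toy model. */
static void fake_clear_stale_tc(struct fake_cqe *cqe)
{
	uint32_t reg;

	fake_writel(cqe, 0, FAKE_CTL);			/* 1. un-halt */

	reg = fake_readl(cqe, FAKE_TCN);		/* 2. clear TCN and IS */
	fake_writel(cqe, reg, FAKE_TCN);
	reg = fake_readl(cqe, FAKE_IS);
	fake_writel(cqe, reg, FAKE_IS);

	fake_writel(cqe, FAKE_CTL_HALT, FAKE_CTL);	/* 3. halt again */
}
```

In this model, starting from a halted controller and only writing 0 to
FAKE_CTL leaves the stale bit set in FAKE_TCN, whereas calling
fake_clear_stale_tc() ends with FAKE_TCN and FAKE_IS both zero and the halt
bit set again. It says nothing about whether the un-halt can start a real
task, which is the concern raised above.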