Received: by 2002:a05:7412:85a1:b0:e2:908c:2ebd with SMTP id n33csp23173rdh; Mon, 30 Oct 2023 12:31:15 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGBn/Ym0YGtdKgKaqFZWGyr0/NW/A0qi8D9sqSrgQ1atXatCCmmg9FjPncVWzWdQZxHMGVm X-Received: by 2002:a17:90a:cb12:b0:280:1cfb:f7ad with SMTP id z18-20020a17090acb1200b002801cfbf7admr5621563pjt.4.1698694274953; Mon, 30 Oct 2023 12:31:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698694274; cv=none; d=google.com; s=arc-20160816; b=fa1294d93VyFz0zrjhAKnOnkWmvzFCDIyLRHqzVPoSDkN8NnJk4z++M021BhVcVYLh CKAiOUBTEGhZftN58j1lbBRY3VHExivT/jXVjhF1u4Ench5zmskXSD7DmNFZhr7Xn4yw 2KyypqRjdJjy9Q9tc46ijTQhpcVjCQNoszV/YlZ6g1RpeqDfgoi4YtgHZrL3ljWph5eg LBNB0Ky4iLFDSrPyeXko5pCIp+gFKwxoBAs+BmPuvqHrORgXOE6b3RUnzJXjM2LyKB11 qnTF3Sot0vi+qtVhRfjmTou+9SFWoS6WY2WWIMO1mpyVnXWgYpDRU1uM/9/vCNx0Ko8/ hKQQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to :organization:from:references:cc:to:content-language:subject :user-agent:mime-version:date:message-id:dkim-signature; bh=Bn68+9C346k+SY2lOOgB0qWfB4xhLmgUcM2lN0Pcx2I=; fh=jiL7qcvG69k1bS9zGqmkPOpqJvwMpOXKWgFUwjGtPlo=; b=GY3yABrTPt0JJlOC4ckE5unlb9m01sIRr5hE0WvgDaNMFKY9HUrI4dQ3ca1uziVjG5 J1d04/c5hrpZ7FmGIrag8GVlDamKNRZskqYjiiBbj8sRsq4HvRv9Rqo918GDIpzWbyO7 WtFhEJsmrwaXMxM80lTHz8jSE2GI9wDX8al5nRobdR60dYgeAuG5v9Iu7H8hw6YoGonK RHXJ4uvcuxg505PSFNpJq2Lu1qo2UCcNmESZmzFy2kKTTwo6T2pg76+dB5TQGHO2TLyW 9pNWNHc/6JxgYtGkBU62rHRYr+iKBP5CdeAiFiAM/9gneyK5kpYIB0eqhiXtOCgzGHgO eE5A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NV3cAfTK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from howler.vger.email (howler.vger.email. [23.128.96.34]) by mx.google.com with ESMTPS id mm22-20020a17090b359600b002791e648ceasi2201391pjb.112.2023.10.30.12.31.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 12:31:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) client-ip=23.128.96.34; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NV3cAfTK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id 8AAA580A6ED0; Mon, 30 Oct 2023 12:31:11 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231222AbjJ3TbB (ORCPT + 99 others); Mon, 30 Oct 2023 15:31:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35058 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229510AbjJ3TbA (ORCPT ); Mon, 30 Oct 2023 15:31:00 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.115]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 282B2A9; Mon, 30 Oct 2023 12:30:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698694256; x=1730230256; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=oE5glMctpbfYlLJs53M3TJqVUGKgwDxVjEE0ZRcfHpM=; b=NV3cAfTKh5BNgEUMPp5/yH4Fu+uaNWL6hMCxcDUDyKIVFToxMyQKw0T4 G19bKDnjCm9fTEQ37RLj9x6x8QQ9RO1wzht3pRIO/V2inW5cIzKpM0a0T 6RPoxQqsyekTEShaUBH9VMskv9fJ/6PJ0TyTkRbxeLmyX7z7kh8n7SkPU 9enC7YRS9Q2keXSsAJHc0yEIZ5V4yIS5G5XQIGei/10qIOIYazgDG4PSB Z9dMar8ubxnNh8r0xpdhkaeX+e3VRP37xdV1F7bBxnpCdwHL+QM5ja4ah YcnXOhbthPVrGsMlKyvs7ikuJTvYmvLe3x7FAf3i1IfIEQrXJeh1j6Tho A==; X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="387965408" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="387965408" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 12:30:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="1007520505" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="1007520505" Received: from ahunter6-mobl1.ger.corp.intel.com (HELO [10.0.2.15]) ([10.249.41.161]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 12:30:54 -0700 Message-ID: Date: Mon, 30 Oct 2023 21:30:48 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH 1/2] mmc: cqhci: Add a quirk to clear stale TC Content-Language: en-US To: =?UTF-8?Q?Kornel_Dul=C4=99ba?= Cc: linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org, Ulf Hansson , Radoslaw Biernacki , Gwendal Grignou References: <20231027145623.2258723-1-korneld@chromium.org> <20231027145623.2258723-2-korneld@chromium.org> From: Adrian Hunter Organization: Intel Finland Oy, Registered Address: PL 281, 00181 Helsinki, Business Identity Code: 0357606 - 4, Domiciled in Helsinki In-Reply-To: <20231027145623.2258723-2-korneld@chromium.org> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.3 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on howler.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Mon, 30 Oct 2023 12:31:11 -0700 (PDT) On 27/10/23 17:56, Kornel Dulęba wrote: > This fix addresses a stale task completion event issued right after the > CQE recovery. As it's a hardware issue the fix is done in form of a > quirk. > > When error interrupt is received the driver runs recovery logic is run. > It halts the controller, clears all pending tasks, and then re-enables > it. On some platforms a stale task completion event is observed, > regardless of the CQHCI_CLEAR_ALL_TASKS bit being set. > > This results in either: > a) Spurious TC completion event for an empty slot. > b) Corrupted data being passed up the stack, as a result of premature > completion for a newly added task. > > To fix that re-enable the controller, clear task completion bits, > interrupt status register and halt it again. > This is done at the end of the recovery process, right before interrupts > are re-enabled. > > Signed-off-by: Kornel Dulęba > --- > drivers/mmc/host/cqhci-core.c | 42 +++++++++++++++++++++++++++++++++++ > drivers/mmc/host/cqhci.h | 1 + > 2 files changed, 43 insertions(+) > > diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c > index b3d7d6d8d654..e534222df90c 100644 > --- a/drivers/mmc/host/cqhci-core.c > +++ b/drivers/mmc/host/cqhci-core.c > @@ -1062,6 +1062,45 @@ static void cqhci_recover_mrqs(struct cqhci_host *cq_host) > /* CQHCI could be expected to clear it's internal state pretty quickly */ > #define CQHCI_CLEAR_TIMEOUT 20 > > +/* > + * During CQE recovery all pending tasks are cleared from the > + * controller and its state is being reset. > + * On some platforms the controller sets a task completion bit for > + * a stale(previously cleared) task right after being re-enabled. > + * This results in a spurious interrupt at best and corrupted data > + * being passed up the stack at worst. The latter happens when > + * the driver enqueues a new request on the problematic task slot > + * before the "spurious" task completion interrupt is handled. > + * To fix it: > + * 1. Re-enable controller by clearing the halt flag. > + * 2. Clear interrupt status and the task completion register. > + * 3. Halt the controller again to be consistent with quirkless logic. > + * > + * This assumes that there are no pending requests on the queue. > + */ > +static void cqhci_quirk_clear_stale_tc(struct cqhci_host *cq_host) > +{ > + u32 reg; > + > + WARN_ON(cq_host->qcnt); > + cqhci_writel(cq_host, 0, CQHCI_CTL); > + if ((cqhci_readl(cq_host, CQHCI_CTL) & CQHCI_HALT)) { > + pr_err("%s: cqhci: CQE failed to exit halt state\n", > + mmc_hostname(cq_host->mmc)); > + } > + reg = cqhci_readl(cq_host, CQHCI_TCN); > + cqhci_writel(cq_host, reg, CQHCI_TCN); > + reg = cqhci_readl(cq_host, CQHCI_IS); > + cqhci_writel(cq_host, reg, CQHCI_IS); > + > + /* > + * Halt the controller again. > + * This is only needed so that we're consistent across quirk > + * and quirkless logic. > + */ > + cqhci_halt(cq_host->mmc, CQHCI_FINISH_HALT_TIMEOUT); > +} Thanks a lot for tracking this down! It could be that the "un-halt" starts a task, so it would be better to force the "clear" to work if possible, which should be the case if CQE is disabled. Would you mind trying the code below? Note the increased CQHCI_START_HALT_TIMEOUT helps avoid trying to clear tasks when CQE has not halted. diff --git a/drivers/mmc/host/cqhci-core.c b/drivers/mmc/host/cqhci-core.c index b3d7d6d8d654..534c13069833 100644 --- a/drivers/mmc/host/cqhci-core.c +++ b/drivers/mmc/host/cqhci-core.c @@ -987,7 +987,7 @@ static bool cqhci_halt(struct mmc_host *mmc, unsigned int timeout) * layers will need to send a STOP command), so we set the timeout based on a * generous command timeout. */ -#define CQHCI_START_HALT_TIMEOUT 5 +#define CQHCI_START_HALT_TIMEOUT 500 static void cqhci_recovery_start(struct mmc_host *mmc) { @@ -1075,28 +1075,27 @@ static void cqhci_recovery_finish(struct mmc_host *mmc) ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT); - if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT)) - ok = false; - /* * The specification contradicts itself, by saying that tasks cannot be * cleared if CQHCI does not halt, but if CQHCI does not halt, it should * be disabled/re-enabled, but not to disable before clearing tasks. * Have a go anyway. */ - if (!ok) { - pr_debug("%s: cqhci: disable / re-enable\n", mmc_hostname(mmc)); - cqcfg = cqhci_readl(cq_host, CQHCI_CFG); - cqcfg &= ~CQHCI_ENABLE; - cqhci_writel(cq_host, cqcfg, CQHCI_CFG); - cqcfg |= CQHCI_ENABLE; - cqhci_writel(cq_host, cqcfg, CQHCI_CFG); - /* Be sure that there are no tasks */ - ok = cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT); - if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT)) - ok = false; - WARN_ON(!ok); - } + if (!cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT)) + ok = false; + + cqcfg = cqhci_readl(cq_host, CQHCI_CFG); + cqcfg &= ~CQHCI_ENABLE; + cqhci_writel(cq_host, cqcfg, CQHCI_CFG); + + cqcfg = cqhci_readl(cq_host, CQHCI_CFG); + cqcfg |= CQHCI_ENABLE; + cqhci_writel(cq_host, cqcfg, CQHCI_CFG); + + cqhci_halt(mmc, CQHCI_FINISH_HALT_TIMEOUT); + + if (!ok) + cqhci_clear_all_tasks(mmc, CQHCI_CLEAR_TIMEOUT); cqhci_recover_mrqs(cq_host);