References: <20231009130126.697995596@linuxfoundation.org> <2023101057-runny-pellet-8952@gregkh> <7d7a5a15-3349-adce-02cd-82b6cb4bebde@roeck-us.net>
From: Geert Uytterhoeven
Date: Thu, 26 Oct 2023 14:08:20 +0200
Subject: Re: renesas_sdhi problems in 5.10-stable was Re: [PATCH 5.10 000/226] 5.10.198-rc1 review
To: Guenter Roeck
Cc: Pavel Machek, Wolfram Sang, Ulf Hansson, Greg Kroah-Hartman, niklas.soderlund+renesas@ragnatech.se, yoshihiro.shimoda.uh@renesas.com, biju.das.jz@bp.renesas.com, Chris.Paterson2@renesas.com, stable@vger.kernel.org, patches@lists.linux.dev, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, akpm@linux-foundation.org, shuah@kernel.org, patches@kernelci.org, lkft-triage@lists.linaro.org, jonathanh@nvidia.com, f.fainelli@gmail.com,
sudipm.mukherjee@gmail.com, srw@sladewatkins.net, rwarsow@gmx.de, conor@kernel.org, Linux MMC List, Linux-Renesas
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 25, 2023 at 11:26 PM Geert Uytterhoeven wrote:
> On Wed, Oct 25, 2023 at 9:53 PM Geert Uytterhoeven wrote:
> > On Wed, Oct 25, 2023 at 8:39 PM Guenter Roeck wrote:
> > > On 10/25/23 10:05, Geert Uytterhoeven wrote:
> > > > On Wed, Oct 25, 2023 at 2:35 PM Geert Uytterhoeven wrote:
> > > >> On Wed, Oct 25, 2023 at 12:53 PM Geert Uytterhoeven wrote:
> > > >>> On Wed, Oct 25, 2023 at 12:47 PM Geert Uytterhoeven wrote:
> > > >>>> On Tue, Oct 24, 2023 at 9:22 PM Pavel Machek wrote:
> > > >>>>> But we still have failures on Renesas with 5.10.199-rc2:
> > > >>>>>
> > > >>>>> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/1047368849
> > > >>>>>
> > > >>>>> And they still happen during MMC init:
> > > >>>>>
> > > >>>>> [ 2.638013] renesas_sdhi_internal_dmac ee100000.mmc: Got CD GPIO
> > > >>>>> [ 2.638846] INFO: trying to register non-static key.
> > > >>>>> [ 2.644192] ledtrig-cpu: registered to indicate activity on CPUs
> > > >>>>> [ 2.649066] The code is fine but needs lockdep annotation, or maybe
> > > >>>>> [ 2.649069] you didn't initialize this object before use?
> > > >>>>> [ 2.649071] turning off the locking correctness validator.
> > > >>>>> [ 2.649080] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.199-rc2-arm64-renesas-ge31b6513c43d #1
> > > >>>>> [ 2.649082] Hardware name: HopeRun HiHope RZ/G2M with sub board (DT)
> > > >>>>> [ 2.649086] Call trace:
> > > >>>>> [ 2.655106] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
> > > >>>>> [ 2.661354] dump_backtrace+0x0/0x194
> > > >>>>> [ 2.661361] show_stack+0x14/0x20
> > > >>>>> [ 2.667430] usbcore: registered new interface driver usbhid
> > > >>>>> [ 2.672230] dump_stack+0xe8/0x130
> > > >>>>> [ 2.672238] register_lock_class+0x480/0x514
> > > >>>>> [ 2.672244] __lock_acquire+0x74/0x20ec
> > > >>>>> [ 2.681113] usbhid: USB HID core driver
> > > >>>>> [ 2.687450] lock_acquire+0x218/0x350
> > > >>>>> [ 2.687456] _raw_spin_lock+0x58/0x80
> > > >>>>> [ 2.687464] tmio_mmc_irq+0x410/0x9ac
> > > >>>>> [ 2.688556] renesas_sdhi_internal_dmac ee160000.mmc: mmc0 base at 0x00000000ee160000, max clock rate 200 MHz
> > > >>>>> [ 2.744936] __handle_irq_event_percpu+0xbc/0x340
> > > >>>>> [ 2.749635] handle_irq_event+0x60/0x100
> > > >>>>> [ 2.753553] handle_fasteoi_irq+0xa0/0x1ec
> > > >>>>> [ 2.757644] __handle_domain_irq+0x7c/0xdc
> > > >>>>> [ 2.761736] efi_header_end+0x4c/0xd0
> > > >>>>> [ 2.765393] el1_irq+0xcc/0x180
> > > >>>>> [ 2.768530] arch_cpu_idle+0x14/0x2c
> > > >>>>> [ 2.772100] default_idle_call+0x58/0xe4
> > > >>>>> [ 2.776019] do_idle+0x244/0x2c0
> > > >>>>> [ 2.779242] cpu_startup_entry+0x20/0x6c
> > > >>>>> [ 2.783160] rest_init+0x164/0x28c
> > > >>>>> [ 2.786561] arch_call_rest_init+0xc/0x14
> > > >>>>> [ 2.790565] start_kernel+0x4c4/0x4f8
> > > >>>>> [ 2.794233] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000014
> > > >>>>> [ 2.803011] Mem abort info:
> > > >>>>>
> > > >>>>> from https://lava.ciplatform.org/scheduler/job/1025535
> > > >>>>> from
> > > >>>>> https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/5360973735 .
> > > >>>>>
> > > >>>>> Is there something else missing?
> > > >>
> > > >> It seems to be an intermittent issue. Investigating...
> > > >
> > > > After spending too much time on bisecting, the bad guy turns out to
> > > > be commit 6d3745bbc3341d3b ("mmc: renesas_sdhi: register irqs before
> > > > registering controller") in v5.10.198.
> > > >
> > > > Adding debug information shows the lock is mmc_host.lock.
> > > >
> > > > It is definitely initialized:
> > > >
> > > > renesas_sdhi_probe()
> > > > {
> > > >     ...
> > > >     tmio_mmc_host_alloc()
> > > >         mmc_alloc_host
> > > >             spin_lock_init(&host->lock);
>
> Initializing mmc_host.lock.
>
> > > >     ...
> > > >     devm_request_irq()
> > > >     -> tmio_mmc_irq
> > > >         tmio_mmc_cmd_irq()
> > > >             spin_lock(&host->lock);
>
> Locking tmio_mmc_host.lock, but ...
>
> > > >     ...
> > > > }
> > > >
> > > > That leaves us with a missing lockdep annotation?
> > >
> > > Is it possible that the lock initialization is overwritten ?
> > > I seem to recall a recent case where this happens.
> > >
> > > Also, there is
> > >     spin_lock_init(&_host->lock);
> > > in tmio_mmc_host_probe(), and tmio_mmc_host_probe() is called after
> > > devm_request_irq().
> >
> > Unless I am missing something, that is initializing tmio_mmc_host.lock,
> > which is a different lock than mmc_host.lock?
>
> ... tmio_mmc_host.lock is initialized only here.
>
> Now the question remains why this is not triggered in mainline.
> More investigation to do tomorrow...
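
To make the ordering from the exchange above concrete, here is a minimal
userspace sketch. It is only a model: every type and function name in it
(model_lock, model_host_alloc(), and so on) is hypothetical and stands in
for the kernel object named in its comment; none of it is driver code. It
assumes an interrupt can fire as soon as the handler is registered.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for illustration only; these are not
 * the kernel's real structures or APIs. */
struct model_lock {
	bool initialized;
};

struct model_mmc_host {
	struct model_lock lock;		/* stands in for mmc_host.lock */
};

struct model_tmio_host {
	struct model_mmc_host mmc;
	struct model_lock lock;		/* stands in for tmio_mmc_host.lock */
};

/* Stands in for spin_lock_init(). */
static void model_lock_init(struct model_lock *lock)
{
	lock->initialized = true;
}

/* Stands in for tmio_mmc_host_alloc() -> mmc_alloc_host(): only the
 * mmc_host-level lock is initialized here. */
static void model_host_alloc(struct model_tmio_host *host)
{
	host->mmc.lock.initialized = false;
	host->lock.initialized = false;
	model_lock_init(&host->mmc.lock);
}

/* Stands in for tmio_mmc_host_probe(): the tmio-level lock is
 * initialized only here. */
static void model_host_probe(struct model_tmio_host *host)
{
	model_lock_init(&host->lock);
}

/* Stands in for the interrupt handler, which takes the tmio-level lock.
 * Returns true if that lock was initialized when the interrupt fired. */
static bool model_irq_fires(const struct model_tmio_host *host)
{
	return host->lock.initialized;
}

/* Probe runs first, the IRQ handler is registered afterwards. */
static bool irq_safe_with_probe_first(void)
{
	struct model_tmio_host host;

	model_host_alloc(&host);
	model_host_probe(&host);
	/* IRQ is registered here; an interrupt may fire from now on. */
	return model_irq_fires(&host);
}

/* The IRQ handler is registered first, and an interrupt is assumed to
 * arrive before probe runs. */
static bool irq_safe_with_irq_first(void)
{
	struct model_tmio_host host;
	bool ok;

	model_host_alloc(&host);
	/* IRQ is registered here; assume an interrupt fires immediately. */
	ok = model_irq_fires(&host);
	model_host_probe(&host);
	return ok;
}
```

In this model irq_safe_with_probe_first() returns true and
irq_safe_with_irq_first() returns false, which is consistent with the
"trying to register non-static key" splat showing up only when an
interrupt arrives between the two steps, and hence only intermittently.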

| --- a/drivers/mmc/host/renesas_sdhi_core.c
| +++ b/drivers/mmc/host/renesas_sdhi_core.c
| @@ -1011,6 +1011,8 @@ int renesas_sdhi_probe(struct platform_device *pdev,
|  			renesas_sdhi_start_signal_voltage_switch;
|  		host->sdcard_irq_setbit_mask = TMIO_STAT_ALWAYS_SET_27;
|  		host->reset = renesas_sdhi_reset;

host->sdcard_irq_mask_all is not initialized in this branch

| +	} else {
| +		host->sdcard_irq_mask_all = TMIO_MASK_ALL;
|  	}
|
|  	/* Orginally registers were 16 bit apart, could be 32 or 64 nowadays */
| @@ -1098,9 +1100,7 @@ int renesas_sdhi_probe(struct platform_device *pdev,
|  		host->ops.hs400_complete = renesas_sdhi_hs400_complete;
|  	}
|
| -	ret = tmio_mmc_host_probe(host);
| -	if (ret < 0)
| -		goto edisclk;
| +	sd_ctrl_write32_as_16_and_16(host, CTL_IRQ_MASK, host->sdcard_irq_mask_all);

Fails to disable interrupts for real as host->sdcard_irq_mask_all is
still zero.

|
|  	num_irqs = platform_irq_count(pdev);
|  	if (num_irqs < 0) {
| @@ -1127,6 +1127,10 @@ int renesas_sdhi_probe(struct platform_device *pdev,
|  		goto eirq;
|  	}
|
| +	ret = tmio_mmc_host_probe(host);

Initializes host->sdcard_irq_mask_all when needed and disables
interrupts:

        if (!_host->sdcard_irq_mask_all)
                _host->sdcard_irq_mask_all = TMIO_MASK_ALL;
        tmio_mmc_disable_mmc_irqs(_host, _host->sdcard_irq_mask_all);

If the interrupt came in before, we have an issue.

| +	if (ret < 0)
| +		goto edisclk;
| +
|  	dev_info(&pdev->dev, "%s base at %pa, max clock rate %u MHz\n",
|  		 mmc_hostname(host->mmc), &res->start, host->mmc->f_max / 1000000);

The solution is to backport commit 9f12cac1bb88e329 ("mmc: renesas_sdhi:
use custom mask for TMIO_MASK_ALL") in v5.13.
As this doesn't backport cleanly, I'll submit a (tested) patch.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds