Message-ID: <4c73f67e-174c-497e-85a5-cb053ce657cb@collabora.com>
Date: Wed, 22 Nov 2023 10:06:19 +0100
Subject: Re: [PATCH] drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()
From: AngeloGioacchino Del Regno
To: Krzysztof Kozlowski, Boris Brezillon
Cc: Steven Price, tzimmermann@suse.de, linux-kernel@vger.kernel.org, mripard@kernel.org, dri-devel@lists.freedesktop.org, wenst@chromium.org, kernel@collabora.com, linux-samsung-soc@vger.kernel.org, Marek Szyprowski
References: <20231102141507.73481-1-angelogioacchino.delregno@collabora.com> <7928524a-b581-483b-b1a1-6ffd719ce650@arm.com> <1c9838fb-7f2d-4752-b86a-95bcf504ac2f@linaro.org> <6b7a4669-7aef-41a7-8201-c2cfe401bc43@collabora.com> <20231121175531.085809f5@collabora.com>

On 21/11/23 18:08, Krzysztof Kozlowski wrote:
> On 21/11/2023 17:55, Boris Brezillon wrote:
>> On Tue, 21 Nov 2023 17:11:42 +0100
>> AngeloGioacchino Del Regno wrote:
>>
>>> On 21/11/23 16:34, Krzysztof Kozlowski wrote:
>>>> On 08/11/2023 14:20, Steven Price wrote:
>>>>> On 02/11/2023 14:15, AngeloGioacchino Del Regno wrote:
>>>>>> The layout of the registers {TILER,SHADER,L2}_PWROFF_LO, used to request
>>>>>> powering off cores, is the same as the {TILER,SHADER,L2}_PWRON_LO ones:
>>>>>> this means that in order to request poweroff of cores, we are supposed
>>>>>> to write a bitmask of cores that should be powered off!
>>>>>> This means that the panfrost_gpu_power_off() function has always been
>>>>>> doing nothing.
>>>>>>
>>>>>> Fix powering off the GPU by writing a bitmask of the cores to poweroff
>>>>>> to the relevant PWROFF_LO registers and then check that the transition
>>>>>> (from ON to OFF) has finished by polling the relevant PWRTRANS_LO
>>>>>> registers.
>>>>>>
>>>>>> While at it, in order to avoid code duplication, move the core mask
>>>>>> logic from panfrost_gpu_power_on() to a new panfrost_get_core_mask()
>>>>>> function, used in both poweron and poweroff.
>>>>>>
>>>>>> Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver")
>>>>>> Signed-off-by: AngeloGioacchino Del Regno
>>>>
>>>>
>>>> Hi,
>>>>
>>>> This commit was added to next recently but it causes "external abort on
>>>> non-linefetch" during boot of my Odroid HC1 board.
>>>>
>>>> At least bisect points to it.
>>>>
>>>> If fixed, please add:
>>>>
>>>> Reported-by: Krzysztof Kozlowski
>>>>
>>>> [ 4.861683] 8<--- cut here ---
>>>> [ 4.863429] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf0c8802c
>>>> [ 4.871018] [f0c8802c] *pgd=433ed811, *pte=11800653, *ppte=11800453
>>>> ...
>>>> [ 5.164010] panfrost_gpu_irq_handler from __handle_irq_event_percpu+0xcc/0x31c
>>>> [ 5.171276] __handle_irq_event_percpu from handle_irq_event+0x38/0x80
>>>> [ 5.177765] handle_irq_event from handle_fasteoi_irq+0x9c/0x250
>>>> [ 5.183743] handle_fasteoi_irq from generic_handle_domain_irq+0x28/0x38
>>>> [ 5.190417] generic_handle_domain_irq from gic_handle_irq+0x88/0xa8
>>>> [ 5.196741] gic_handle_irq from generic_handle_arch_irq+0x34/0x44
>>>> [ 5.202893] generic_handle_arch_irq from __irq_svc+0x8c/0xd0
>>>>
>>>> Full log:
>>>> https://krzk.eu/#/builders/21/builds/4392/steps/11/logs/serial0
>>>>
>>>
>>> Hey Krzysztof,
>>>
>>> This is interesting. It might be about the cores that are missing from the partial
>>> core_mask raising interrupts, but an external abort on non-linefetch is strange to
>>> see here.
>>
>> I've seen such external aborts in the past, and the fault type has
>> often been misleading. It's unlikely to have anything to do with a
>
> Yeah, often accessing device with power or clocks gated.
>

Except my commit does *not* gate SoC power or SoC clocks?

What the "Really power off ..." commit does is ask the GPU to internally power
off the shaders, tilers and L2; that's why I say it is strange to see that
kind of abort.

The GPU_INT_CLEAR, GPU_INT_STAT, GPU_FAULT_STATUS and GPU_FAULT_ADDRESS_{HI/LO}
registers should still be accessible even with shaders, tilers and cache OFF.

Anyway, yes, synchronizing IRQs before calling the poweroff sequence would also
work, but that'd add quite a bit of latency to the runtime_suspend() call, so
in this case I'd rather avoid executing any register r/w in the handler, either
by checking whether the GPU is supposed to be OFF or by clearing interrupts,
though the latter may not work if they are generated after the poweroff
function has run.

Or we could simply disable the irq after power_off, but that'd be hacky (as well).
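
For reference, the "check if the GPU is supposed to be OFF" option could look
roughly like the sketch below. This is a standalone userspace model in C, not
driver code: struct fake_gpu, its powered flag and the fake_* helpers are all
made up for illustration, and a real handler would also need proper
synchronization between the suspend path and the IRQ handler when reading such
a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for device state; NOT the real struct panfrost_device. */
struct fake_gpu {
	bool powered;              /* assumed flag: set on power on, cleared on power off */
	uint32_t int_stat;         /* models the GPU_INT_STAT register */
	unsigned int reg_accesses; /* counts modelled register reads/writes */
};

enum fake_irqreturn { FAKE_IRQ_NONE, FAKE_IRQ_HANDLED };

/* Modelled register read: every access is counted. */
static uint32_t fake_read_int_stat(struct fake_gpu *gpu)
{
	gpu->reg_accesses++;
	return gpu->int_stat;
}

/* Modelled GPU_INT_CLEAR write: clears the given bits in int_stat. */
static void fake_write_int_clear(struct fake_gpu *gpu, uint32_t mask)
{
	gpu->reg_accesses++;
	gpu->int_stat &= ~mask;
}

static enum fake_irqreturn fake_gpu_irq_handler(struct fake_gpu *gpu)
{
	uint32_t state;

	/* Bail out before touching any register if the GPU is meant to be off. */
	if (!gpu->powered)
		return FAKE_IRQ_NONE;

	state = fake_read_int_stat(gpu);
	if (!state)
		return FAKE_IRQ_NONE;

	fake_write_int_clear(gpu, state);
	return FAKE_IRQ_HANDLED;
}
```

With powered cleared, the handler returns before the modelled GPU_INT_STAT
read, so no register access happens at all; that is the kind of access
suspected in the quoted discussion above.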

Let's see if asking to poweroff *everything* works:

---
 drivers/gpu/drm/panfrost/panfrost_gpu.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c b/drivers/gpu/drm/panfrost/panfrost_gpu.c
index 09f5e1563ebd..1c7276aaa182 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gpu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c
@@ -429,21 +429,29 @@ void panfrost_gpu_power_off(struct panfrost_device *pfdev)
 	int ret;
 	u32 val;
 
-	gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present & core_mask);
+	gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present);
+	gpu_write(pfdev, SHADER_PWROFF_HI, U32_MAX);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
 					 val, !val, 1, 1000);
 	if (ret)
 		dev_err(pfdev->dev, "shader power transition timeout");
 
 	gpu_write(pfdev, TILER_PWROFF_LO, pfdev->features.tiler_present);
+	gpu_write(pfdev, TILER_PWROFF_HI, U32_MAX);
 	ret = readl_relaxed_poll_timeout(pfdev->iomem + TILER_PWRTRANS_LO,
 					 val, !val, 1, 1000);
 	if (ret)
 		dev_err(pfdev->dev, "tiler power transition timeout");
 
-	gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present & core_mask);
+	gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present);
 	ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO,
-				 val, !val, 0, 1000);
+				 val, !val, 0, 1000);
+	if (ret)
+		dev_err(pfdev->dev, "l2_low power transition timeout");
+
+	gpu_write(pfdev, L2_PWROFF_HI, U32_MAX);
+	ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_HI,
+				 val, !val, 0, 1000);
 	if (ret)
 		dev_err(pfdev->dev, "l2 power transition timeout");
 }
--
2.42.0

Cheers,
Angelo

>> non-linefetch access, but it might be caused by a register access after
>> the clock or power domain driving the register bank has been disabled.
>> The following diff might help validate this theory. If that works, we
>> probably want to make sure we synchronize IRQs before disabling in the
>> suspend path.
>>
>> --->8---
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h b/drivers/gpu/drm/panfrost/panfrost_regs.h
>> index 55ec807550b3..98df66e5cc9b 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_regs.h
>> +++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
>> @@ -34,8 +34,6 @@
>>  	(GPU_IRQ_FAULT |\
>>  	 GPU_IRQ_MULTIPLE_FAULT |\
>>  	 GPU_IRQ_RESET_COMPLETED |\
>> -	 GPU_IRQ_POWER_CHANGED |\
>> -	 GPU_IRQ_POWER_CHANGED_ALL |\
>
> This helped, at least for this issue (next-20231121). Much later in
> user-space boot I have lockups:
> watchdog: BUG: soft lockup - CPU#4 stuck for 26s! [kworker/4:1:61]
>
> [ 56.329224] smp_call_function_single from __sync_rcu_exp_select_node_cpus+0x29c/0x78c
> [ 56.337111] __sync_rcu_exp_select_node_cpus from sync_rcu_exp_select_cpus+0x334/0x878
> [ 56.344995] sync_rcu_exp_select_cpus from wait_rcu_exp_gp+0xc/0x18
> [ 56.351231] wait_rcu_exp_gp from process_one_work+0x20c/0x620
> [ 56.357038] process_one_work from worker_thread+0x1d0/0x488
> [ 56.362668] worker_thread from kthread+0x104/0x138
> [ 56.367521] kthread from ret_from_fork+0x14/0x28
>
> But anyway the external abort does not appear.
>
> Best regards,
> Krzysztof
>