Received: by 2002:a05:7412:40d:b0:e2:908c:2ebd with SMTP id 13csp989978rdf; Wed, 22 Nov 2023 02:34:38 -0800 (PST) X-Google-Smtp-Source: AGHT+IF2wfB7Hlfunij8MFvHjY93CUGGokHsshwdCDOBjXWUh+kGexpWNdIa4HD1zzAF6uc48Bk/ X-Received: by 2002:a17:902:e752:b0:1cf:5d4e:7783 with SMTP id p18-20020a170902e75200b001cf5d4e7783mr2129925plf.25.1700649278151; Wed, 22 Nov 2023 02:34:38 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700649278; cv=none; d=google.com; s=arc-20160816; b=dfM+glcStDJGFVgEaDG1QAWQyy0K3/Ch0NnMBgqdkR0vGP8Vx3d/warb/uFi7UiOka C/zqIoc8+1n0Zzsg72iWNyWiXa9p0QZv0yf4q1m9wW0JRlDUOy/WTJ7ZHeXVzY7nvRF+ oOKHywV85PpuZnzSh7Dqo2H5pdQLsSJzzFxkJ20JxGuXVXnlSKk8IJld0NW+XBtpRR/6 wMISErmTbydQd57wUjSaRdJpakZA+PPt7gBsPv0WAG6M1g4KVCCsiBLow0ng4d6E/F7u 23McGrIfhNPZSGVtWMvnmTBLvEx9WnEY9C5itkLQsjQtciP4Ptdsjw9ng2Iwkf0pG3GY +vBA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=D28vGQd+5pEfDopD50aPiaA//qTa/oXqm+Zhe4r2qjo=; fh=qas/QV/rrekCVQqwzaA6RGLrYu+eC7xj6vnA2l7DA3g=; b=RTilUh+71mvT7yuus7DomrgqpmV0yJbk1EQtzzciO2w4QpwJfu2ruupbFHhvU3NgpX GTQJ+fpMJQhSV4eI766ksyNyni6w6POHn8LTtgCMhgTQNyNeMIrgBtaokceBWcuswvLo 9mnY0IadYMm2/i2J9fvYnXLQ2J/dfZUb/1izAiCx+L01RVCqqX6lnBzNz7YXXmR1xJcm EkYUlutsmDmg9vdxpFAgdQbWbqvaK+XV+Yll5+Ss3J5msaTQN+Q/9+RrwjgS3P/5JLej C9H7NqPHfXwQLADE5i+Agd6Ubq51jJJThDKpseGMWU+9CmwXljCfuUHaCJX6dlrA1t0Y 6mFA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@collabora.com header.s=mail header.b=cHIXt3yG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=collabora.com Return-Path: Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id t24-20020a1709028c9800b001c9c83947d1si11824217plo.645.2023.11.22.02.34.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 22 Nov 2023 02:34:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@collabora.com header.s=mail header.b=cHIXt3yG; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=collabora.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 886E780CF53E; Wed, 22 Nov 2023 02:33:33 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233111AbjKVKdc (ORCPT + 99 others); Wed, 22 Nov 2023 05:33:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35968 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230225AbjKVKda (ORCPT ); Wed, 22 Nov 2023 05:33:30 -0500 Received: from madras.collabora.co.uk (madras.collabora.co.uk [46.235.227.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 648CF93; Wed, 22 Nov 2023 02:33:26 -0800 (PST) Received: from [100.107.97.3] (cola.collaboradmins.com [195.201.22.229]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: kholk11) by madras.collabora.co.uk (Postfix) with ESMTPSA id 364576607351; Wed, 22 Nov 2023 10:33:24 +0000 (GMT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1700649204; bh=LW60lqPVhl0HcdFitvYc/FI/s8B9ah3/L4d5OITngAc=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=cHIXt3yGjJ3de3cpUqTsK2Y98ZcEXYHOKty7yHSKLnrNdAEasf96wcSMy/rbUrLOk GRrzY57idyR6V90u8+y1iNTGMNlOrss9Ekl9x0IyC04Ke1W2/p/sP5hyZKEsokJspy k/2qL8mLxfegCWjdY1UbMokc+MSDBTOLyDmDSStNouEO66w24aM6BGXnX4hqyVOIaT /mHMvQ1d4F39dpRLCvu47EfKGYCtkPErhrfm53m3CGw81jDUYxTGP0bLksiaoob670 wW4JnW3SUxASzZBZvjlzj/acETgJRA85t53k2DTeW/LNELc390uTqDjNWIVAXxIzx5 IKqyMXtEK0JBQ== Message-ID: <1e196ad9-48b3-484f-ada5-83c56eea60ec@collabora.com> Date: Wed, 22 Nov 2023 11:33:21 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off() Content-Language: en-US To: Steven Price , Krzysztof Kozlowski , Boris Brezillon Cc: tzimmermann@suse.de, linux-kernel@vger.kernel.org, mripard@kernel.org, dri-devel@lists.freedesktop.org, wenst@chromium.org, kernel@collabora.com, "linux-samsung-soc@vger.kernel.org" , Marek Szyprowski References: <20231102141507.73481-1-angelogioacchino.delregno@collabora.com> <7928524a-b581-483b-b1a1-6ffd719ce650@arm.com> <1c9838fb-7f2d-4752-b86a-95bcf504ac2f@linaro.org> <6b7a4669-7aef-41a7-8201-c2cfe401bc43@collabora.com> <20231121175531.085809f5@collabora.com> <4c73f67e-174c-497e-85a5-cb053ce657cb@collabora.com> From: AngeloGioacchino Del Regno In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 22 Nov 2023 02:33:33 -0800 (PST) Il 22/11/23 10:48, Steven Price ha scritto: > On 22/11/2023 09:06, AngeloGioacchino Del Regno wrote: >> Il 21/11/23 18:08, Krzysztof Kozlowski ha scritto: >>> On 21/11/2023 17:55, Boris Brezillon wrote: >>>> On Tue, 21 Nov 2023 17:11:42 +0100 >>>> AngeloGioacchino Del Regno >>>> wrote: >>>> >>>>> Il 21/11/23 16:34, Krzysztof Kozlowski ha scritto: >>>>>> On 08/11/2023 14:20, Steven Price wrote: >>>>>>> On 02/11/2023 14:15, AngeloGioacchino Del Regno wrote: >>>>>>>> The layout of the registers {TILER,SHADER,L2}_PWROFF_LO, used to >>>>>>>> request >>>>>>>> powering off cores, is the same as the {TILER,SHADER,L2}_PWRON_LO >>>>>>>> ones: >>>>>>>> this means that in order to request poweroff of cores, we are >>>>>>>> supposed >>>>>>>> to write a bitmask of cores that should be powered off! >>>>>>>> This means that the panfrost_gpu_power_off() function has always >>>>>>>> been >>>>>>>> doing nothing. >>>>>>>> >>>>>>>> Fix powering off the GPU by writing a bitmask of the cores to >>>>>>>> poweroff >>>>>>>> to the relevant PWROFF_LO registers and then check that the >>>>>>>> transition >>>>>>>> (from ON to OFF) has finished by polling the relevant PWRTRANS_LO >>>>>>>> registers. >>>>>>>> >>>>>>>> While at it, in order to avoid code duplication, move the core mask >>>>>>>> logic from panfrost_gpu_power_on() to a new panfrost_get_core_mask() >>>>>>>> function, used in both poweron and poweroff. >>>>>>>> >>>>>>>> Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver") >>>>>>>> Signed-off-by: AngeloGioacchino Del Regno >>>>>>>> >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> This commit was added to next recently but it causes "external >>>>>> abort on >>>>>> non-linefetch" during boot of my Odroid HC1 board. >>>>>> >>>>>> At least bisect points to it. >>>>>> >>>>>> If fixed, please add: >>>>>> >>>>>> Reported-by: Krzysztof Kozlowski >>>>>> >>>>>> [    4.861683] 8<--- cut here --- >>>>>> [    4.863429] Unhandled fault: external abort on non-linefetch >>>>>> (0x1008) at 0xf0c8802c >>>>>> [    4.871018] [f0c8802c] *pgd=433ed811, *pte=11800653, *ppte=11800453 >>>>>> ... >>>>>> [    5.164010]  panfrost_gpu_irq_handler from >>>>>> __handle_irq_event_percpu+0xcc/0x31c >>>>>> [    5.171276]  __handle_irq_event_percpu from >>>>>> handle_irq_event+0x38/0x80 >>>>>> [    5.177765]  handle_irq_event from handle_fasteoi_irq+0x9c/0x250 >>>>>> [    5.183743]  handle_fasteoi_irq from >>>>>> generic_handle_domain_irq+0x28/0x38 >>>>>> [    5.190417]  generic_handle_domain_irq from >>>>>> gic_handle_irq+0x88/0xa8 >>>>>> [    5.196741]  gic_handle_irq from generic_handle_arch_irq+0x34/0x44 >>>>>> [    5.202893]  generic_handle_arch_irq from __irq_svc+0x8c/0xd0 >>>>>> >>>>>> Full log: >>>>>> https://krzk.eu/#/builders/21/builds/4392/steps/11/logs/serial0 >>>>>> >>>>> >>>>> Hey Krzysztof, >>>>> >>>>> This is interesting. It might be about the cores that are missing >>>>> from the partial >>>>> core_mask raising interrupts, but an external abort on non-linefetch >>>>> is strange to >>>>> see here. >>>> >>>> I've seen such external aborts in the past, and the fault type has >>>> often been misleading. It's unlikely to have anything to do with a >>> >>> Yeah, often accessing device with power or clocks gated. >>> >> >> Except my commit does *not* gate SoC power, nor SoC clocks ???? >> >> What the "Really power off ..." commit does is to ask the GPU to >> internally power >> off the shaders, tilers and L2, that's why I say that it is strange to >> see that >> kind of abort. >> >> The GPU_INT_CLEAR GPU_INT_STAT, GPU_FAULT_STATUS and >> GPU_FAULT_ADDRESS_{HI/LO} >> registers should still be accessible even with shaders, tilers and cache >> OFF. >> >> Anyway, yes, synchronizing IRQs before calling the poweroff sequence >> would also >> work, but that'd add up quite a bit of latency on the runtime_suspend() >> call, so >> in this case I'd be more for avoiding to execute any register r/w in the >> handler >> by either checking if the GPU is supposed to be OFF, or clearing >> interrupts, which >> may not work if those are generated after the execution of the poweroff >> function. >> Or we could simply disable the irq after power_off, but that'd be hacky >> (as well). >> >> >> Let's see if asking to poweroff *everything* works: >> >> >> --- >>  drivers/gpu/drm/panfrost/panfrost_gpu.c | 14 +++++++++++--- >>  1 file changed, 11 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/panfrost/panfrost_gpu.c >> b/drivers/gpu/drm/panfrost/panfrost_gpu.c >> index 09f5e1563ebd..1c7276aaa182 100644 >> --- a/drivers/gpu/drm/panfrost/panfrost_gpu.c >> +++ b/drivers/gpu/drm/panfrost/panfrost_gpu.c >> @@ -429,21 +429,29 @@ void panfrost_gpu_power_off(struct panfrost_device >> *pfdev) >>      int ret; >>      u32 val; >> >> -    gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present & >> core_mask); >> +    gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present); > > Hopefully this one line change, and... > >> +    gpu_write(pfdev, SHADER_PWROFF_HI, U32_MAX); >>      ret = readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO, >>                       val, !val, 1, 1000); >>      if (ret) >>          dev_err(pfdev->dev, "shader power transition timeout"); >> >>      gpu_write(pfdev, TILER_PWROFF_LO, pfdev->features.tiler_present); >> +    gpu_write(pfdev, TILER_PWROFF_HI, U32_MAX); >>      ret = readl_relaxed_poll_timeout(pfdev->iomem + TILER_PWRTRANS_LO, >>                       val, !val, 1, 1000); >>      if (ret) >>          dev_err(pfdev->dev, "tiler power transition timeout"); >> >> -    gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present & >> core_mask); >> +    gpu_write(pfdev, L2_PWROFF_LO, pfdev->features.l2_present); > > ... this one are all that are actually needed - the rest should be > ignored as they affect cores that aren't present. > Honestly - when I wrote that diff, I didn't care at all whether the HI registers were powering off cores that weren't present, because I knew that the GPU would have handled that gracefully anyway. What I wanted to do was to reduce Krzysztof's testing effort to a minimum, actually preventing to send more than one patch to try... but with that, you bought me a bit of precious time that I would've spent with research, so, hats off! Thank you! > The Exynos 5422 SoC has a T628 MP6 - so two core groups which isn't a > particularly well supported configuration. But I'm not sure how we're > ending up with the second core group being powered up in the first > place. Even if it was left powered by something previous (e.g. the > bootloader) then the soft-reset during probe should cause them to power > down. > Hm. I didn't know that soft_reset is supposed to (and will) power down cores. This is clarifying some things I didn't really have an explanation for... so thanks again :-) > But it seems like a good idea to power off everything when powering > down, even if we didn't expect the cores to be on. > > Boris also has a point that before cutting the power/clocks we should > really be synchronising with the IRQs - but that affects the follow on > patches not this one. > ...which gives me some more ideas to try... in the near future. But it's out of context for this fix anyway. Cheers, Angelo > Steve > >>      ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_LO, >> -                 val, !val, 0, 1000); >> +                     val, !val, 0, 1000); >> +    if (ret) >> +        dev_err(pfdev->dev, "l2_low power transition timeout"); >> + >> +    gpu_write(pfdev, L2_PWROFF_HI, U32_MAX); >> +    ret = readl_poll_timeout(pfdev->iomem + L2_PWRTRANS_HI, >> +                     val, !val, 0, 1000); >>      if (ret) >>          dev_err(pfdev->dev, "l2 power transition timeout"); >>  } >