Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754543AbdLFJWQ (ORCPT ); Wed, 6 Dec 2017 04:22:16 -0500 Received: from bhuna.collabora.co.uk ([46.235.227.227]:59722 "EHLO bhuna.collabora.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753656AbdLFJWI (ORCPT ); Wed, 6 Dec 2017 04:22:08 -0500 Subject: Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init() To: Ben Skeggs , Jon Hunter References: <6db622ac-319d-c640-91ab-9248e528b69b@nvidia.com> Cc: David Airlie , dri-devel@lists.freedesktop.org, linux-tegra@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org From: Guillaume Tucker Message-ID: Date: Wed, 6 Dec 2017 09:22:04 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2406 Lines: 62 On 05/12/17 18:32, Ben Skeggs wrote: > On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter wrote: > >> >> On 04/12/17 18:37, Guillaume Tucker wrote: >>> If the firmware fails to load then ->fini() will be called before the >>> device has been initialised, causing the kernel to hang while trying >>> to write to a register. Add a test in ->fini() to avoid this issue. >>> >>> This fixes a kernel hang on tegra124. >>> >>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown") >>> Signed-off-by: Guillaume Tucker >>> CC: Ben Skeggs >>> --- >>> drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++-- >>> 1 file changed, 5 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>> index a3ba7f50198b..95e2aba64aad 100644 >>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base) >>> } >>> >>> void >>> -gf100_bar_bar1_fini(struct nvkm_bar *bar) >>> +gf100_bar_bar1_fini(struct nvkm_bar *base) >>> { >>> - nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000); >>> + struct nvkm_device *device = base->subdev.device; >>> + >>> + if (base->subdev.oneinit) >>> + nvkm_mask(device, 0x001704, 0x80000000, 0x00000000); >>> } >>> >>> void >> >> I have tested this and it works for me. Thanks for fixing this! Would be >> good to get Ben's ACK, but you can have my ... >> > I'd love to get a good explanation as to why it hangs without this change, > as, on the surface, it's not immediately obvious as to why it's hanging. To be fair I'm not entirely sure either why this causes a hang, I haven't read the TRM... The iomem has been mapped at this point, so accessing the register should work. One clue is when you look at _bar1_init(), the 0x1704 register is initialised with some (device instance?) memory address. So it's possible that the hardware does something special when you set this to 0 as in _bar1_fini(), which may fail in particular if it was previously not initialised with a valid address. This is merely guesswork, would be interested to find out the real explanation though. >> Tested-by: Jon Hunter Thanks! Guillaume