Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752452AbdLFRSJ convert rfc822-to-8bit (ORCPT ); Wed, 6 Dec 2017 12:18:09 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:6292 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752162AbdLFRSH (ORCPT ); Wed, 6 Dec 2017 12:18:07 -0500 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 06 Dec 2017 09:18:51 -0800 Subject: Re: [PATCH 1/2] drm/nouveau/bar/gf100: fix hang when calling ->fini() before ->init() To: Guillaume Tucker , Ben Skeggs CC: David Airlie , , , , References: <6db622ac-319d-c640-91ab-9248e528b69b@nvidia.com> From: Jon Hunter Message-ID: Date: Wed, 6 Dec 2017 17:18:02 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0 MIME-Version: 1.0 In-Reply-To: X-Originating-IP: [10.26.11.67] X-ClientProxiedBy: UKMAIL102.nvidia.com (10.26.138.15) To UKMAIL101.nvidia.com (10.26.138.13) Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2696 Lines: 69 On 06/12/17 09:22, Guillaume Tucker wrote: > On 05/12/17 18:32, Ben Skeggs wrote: >> On Wed, Dec 6, 2017 at 12:30 AM, Jon Hunter wrote: >> >>> >>> On 04/12/17 18:37, Guillaume Tucker wrote: >>>> If the firmware fails to load then ->fini() will be called before the >>>> device has been initialised, causing the kernel to hang while trying >>>> to write to a register.  Add a test in ->fini() to avoid this issue. >>>> >>>> This fixes a kernel hang on tegra124. >>>> >>>> Fixes: b17de35a2ebbe ("drm/nouveau/bar: implement bar1 teardown") >>>> Signed-off-by: Guillaume Tucker >>>> CC: Ben Skeggs >>>> --- >>>>  drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c | 7 +++++-- >>>>  1 file changed, 5 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>> b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>>> index a3ba7f50198b..95e2aba64aad 100644 >>>> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>>> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/bar/gf100.c >>>> @@ -43,9 +43,12 @@ gf100_bar_bar1_wait(struct nvkm_bar *base) >>>>  } >>>> >>>>  void >>>> -gf100_bar_bar1_fini(struct nvkm_bar *bar) >>>> +gf100_bar_bar1_fini(struct nvkm_bar *base) >>>>  { >>>> -     nvkm_mask(bar->subdev.device, 0x001704, 0x80000000, 0x00000000); >>>> +     struct nvkm_device *device = base->subdev.device; >>>> + >>>> +     if (base->subdev.oneinit) >>>> +             nvkm_mask(device, 0x001704, 0x80000000, 0x00000000); >>>>  } >>>> >>>>  void >>> >>> I have tested this and it works for me. Thanks for fixing this! Would be >>> good to get Ben's ACK, but you can have my ... >>> >> I'd love to get a good explanation as to why it hangs without this >> change, >> as, on the surface, it's not immediately obvious as to why it's hanging. > > To be fair I'm not entirely sure either why this causes a hang, I > haven't read the TRM...  The iomem has been mapped at this point, > so accessing the register should work.  One clue is when you look > at _bar1_init(), the 0x1704 register is initialised with > some (device instance?) memory address.  So it's possible that > the hardware does something special when you set this to 0 as in > _bar1_fini(), which may fail in particular if it was previously > not initialised with a valid address. > > This is merely guesswork, would be interested to find out the > real explanation though. OK, well that's no good. It's a good pointer, but we need to make sure we understand the root of this hang. I will see if I have sometime to dig into this further, maybe next week. Cheers Jon -- nvpublic