Subject: Re: Issue with DRM and "reimplement IDR and IDA using the radix tree"
From: Alexandre Courbot
To: Thierry Reding, Matthew Wilcox, Stephen Rothwell, Andrew Morton
Date: Sat, 17 Dec 2016 15:47:48 +0900
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/17/2016 01:16 AM, Thierry Reding wrote:
> On Wed, Dec 14, 2016 at 11:08:20PM +0900, Alexandre Courbot wrote:
>> Forgot to add the most relevant list for this issue (linux-next).
>>
>> Stephen, maybe you will want to temporarily revert this patch until this
>> is cleared up? This probably affects users other than DRM.
>>
>> On 12/13/2016 04:14 PM, Alexandre Courbot wrote:
>>> Hi Matthew,
>>>
>>> Trying the latest -next on the Jetson TK1 board (with two different DRM
>>> devices for display and render), I noticed that the GPU device probe
>>> always failed with error -ENOSPC. After investigating I figured out that
>>> this was due to the minor device allocation failing when a second DRM
>>> device is added.
>>>
>>> More precisely, when drm_minor_alloc() is called with DRM_MINOR_PRIMARY
>>> (0) as argument for a second time, the call to idr_alloc() (which has a
>>> requested range of 0..64) fails instead of returning 1 as expected. Note
>>> that the first call is successful.
>>>
>>> Reverting "reimplement IDR and IDA using the radix tree" on 20161213's
>>> next fixes the issue for me, suggesting a bug may have slipped in there.
>>>
>>> Not sure how this could be fixed, so reporting the issue for now in case
>>> it is not known yet.
>
> I can confirm Alex' findings, though the symptoms seem to be slightly
> different, which may be related to me testing on next-20161216 rather
> than next-20161213.
>
> What I'm seeing is that all drivers get probed correctly, but when an
> application tries to open the DRM device files (/dev/dri/card0 in this
> case), then all devices of a given minor type disappear. So in my case
> upon boot I get this:
>
> # ls -l /dev/dri/
> total 0
> crw-rw---- 1 root video 226, 0 Dec 16 15:59 card0
> crw-rw---- 1 root video 226, 1 Dec 16 15:59 card1
> crw-rw---- 1 root video 226, 128 Dec 16 15:59 renderD128
>
> The modetest program from libdrm is then unable to open any devices:
>
> # modetest
> trying to open device 'i915'...failed
> trying to open device 'amdgpu'...failed
> trying to open device 'radeon'...failed
> trying to open device 'nouveau'...failed
> trying to open device 'vmwgfx'...failed
> trying to open device 'omapdrm'...failed
> trying to open device 'exynos'...failed
> trying to open device 'tilcdc'...failed
> trying to open device 'msm'...failed
> trying to open device 'sti'...failed
> trying to open device 'tegra'...failed
> trying to open device 'imx-drm'...failed
> trying to open device 'rockchip'...failed
> trying to open device 'atmel-hlcdc'...failed
> trying to open device 'fsl-dcu-drm'...failed
> trying to open device 'vc4'...failed
> trying to open device 'virtio_gpu'...failed
> trying to open device 'mediatek'...failed
> no device found
>
> And after that all of the primary minors are gone:
>
> # ls -l /dev/dri/
> total 0
> crw-rw---- 1 root video 226, 128 Dec 16 15:59 renderD128

That's exactly what I am also seeing with next-20161216. As it turns
out, the patch has changed slightly (my revert no longer applied after a
rebase), and the symptoms changed when testing against next-20161215, but
the fix is the same: reverting the patch gives me back a working system.

This patch really should be reverted for now. Like Thierry, I am
available to test further iterations.