2021-08-02 17:21:15

by Mikael Pettersson

Subject: [BISECTED] 5.14.0-rc4 broke drm/ttm when !CONFIG_DEBUG_FS

Booting 5.14.0-rc4 on my box with Radeon graphics breaks with

[drm:radeon_ttm_init [radeon]] *ERROR* failed initializing buffer object driver(-19).
radeon 0000:01:00.0: Fatal error during GPU init

after which the screen goes black for the rest of kernel boot and
early user-space init.
Once the console login shows up the screen is in some legacy low-res
mode and Xorg can't be started.

A git bisect between v5.14-rc3 (good) and v5.14-rc4 (bad) identified

# first bad commit: [69de4421bb4c103ef42a32bafc596e23918c106f]
drm/ttm: Initialize debugfs from ttm_global_init()

Reverting that from 5.14.0-rc4 gives me a working kernel again.

Note that I have
# CONFIG_DEBUG_FS is not set

/Mikael
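
For readers decoding the log above: the (-19) printed by radeon_ttm_init is a negated errno value, and errno 19 on Linux is ENODEV. The following is a minimal userspace sketch for checking that mapping, not something from the original report; the helper names kernel_ret_name() and kernel_ret_text() are hypothetical and cover only a few codes.

```c
#include <errno.h>
#include <string.h>

/*
 * Map a kernel-style negative return code to its symbolic errno name.
 * Hypothetical helper: only a few codes are covered for illustration.
 */
const char *kernel_ret_name(int ret)
{
	switch (-ret) {
	case 0:
		return "success";
	case ENODEV:
		return "ENODEV";
	case ENOMEM:
		return "ENOMEM";
	default:
		return "unknown";
	}
}

/* Human-readable description of the same code, taken from the C library. */
const char *kernel_ret_text(int ret)
{
	return strerror(-ret);
}
```

Calling kernel_ret_name(-19) identifies the failure as ENODEV, which is consistent with the CONFIG_DEBUG_FS observation at the end of the report.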


2021-08-02 18:38:18

by Duncan

Subject: Re: [BISECTED] 5.14.0-rc4 broke drm/ttm when !CONFIG_DEBUG_FS

[Not subscribed so please CC me. Manual quoting after using lore's
in-reply-to functionality. First time doing that so hope I got it
right.]

Mikael Pettersson <[email protected]> wrote...
> Booting 5.14.0-rc4 on my box with Radeon graphics breaks with
>
> [drm:radeon_ttm_init [radeon]] *ERROR* failed initializing buffer
> object driver(-19).
> radeon 0000:01:00.0: Fatal error during GPU init

Seeing this here too. amdgpu on polaris-11, on an old amd-fx6100
system.

> after which the screen goes black for the rest of kernel boot
> and early user-space init.

*NOT* seeing that. However, I have boot messages turned on by default
and I see them as usual, only it stays in vga-console mode instead of
switching to framebuffer after early-boot. I'm guessing MP has a
high-res boot-splash which doesn't work in vga mode, thus the
black-screen until the login shows up.

> Once the console login shows up the screen is in some legacy low-res
> mode and Xorg can't be started.
>
> A git bisect between v5.14-rc3 (good) and v5.14-rc4 (bad) identified
>
> # first bad commit: [69de4421bb4c103ef42a32bafc596e23918c106f]
> drm/ttm: Initialize debugfs from ttm_global_init()
>
> Reverting that from 5.14.0-rc4 gives me a working kernel again.
>
> Note that I have
> # CONFIG_DEBUG_FS is not set

That all matches here, including the unset CONFIG_DEBUG_FS and
confirming the revert on 5.14.0-rc4 works.

--
Duncan - HTML messages treated as spam
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety."
Benjamin Franklin

2021-08-03 06:56:04

by Mikael Pettersson

Subject: Re: [BISECTED] 5.14.0-rc4 broke drm/ttm when !CONFIG_DEBUG_FS

On Mon, Aug 2, 2021 at 8:29 PM Duncan <[email protected]> wrote:
>
> [Not subscribed so please CC me. Manual quoting after using lore's
> in-reply-to functionality. First time doing that so hope I got it
> right.]
>
> Mikael Pettersson <[email protected]> wrote...
> > Booting 5.14.0-rc4 on my box with Radeon graphics breaks with
> >
> > [drm:radeon_ttm_init [radeon]] *ERROR* failed initializing buffer
> > object driver(-19).
> > radeon 0000:01:00.0: Fatal error during GPU init
>
> Seeing this here too. amdgpu on polaris-11, on an old amd-fx6100
> system.
>
> > after which the screen goes black for the rest of kernel boot
> > and early user-space init.
>
> *NOT* seeing that. However, I have boot messages turned on by default
> and I see them as usual, only it stays in vga-console mode instead of
> switching to framebuffer after early-boot. I'm guessing MP has a
> high-res boot-splash which doesn't work in vga mode, thus the
> black-screen until the login shows up.

Yes, I have the Fedora boot splash enabled.

> > Once the console login shows up the screen is in some legacy low-res
> > mode and Xorg can't be started.
> >
> > A git bisect between v5.14-rc3 (good) and v5.14-rc4 (bad) identified
> >
> > # first bad commit: [69de4421bb4c103ef42a32bafc596e23918c106f]
> > drm/ttm: Initialize debugfs from ttm_global_init()
> >
> > Reverting that from 5.14.0-rc4 gives me a working kernel again.
> >
> > Note that I have
> > # CONFIG_DEBUG_FS is not set
>
> That all matches here, including the unset CONFIG_DEBUG_FS and
> confirming the revert on 5.14.0-rc4 works.

Thanks for the confirmation.

2021-08-16 14:55:27

by lnx7586

Subject: [PATCH] drm/ttm: allow debugfs_create_file() to fail in ttm_global_init()

From: Greg Depoire--Ferrer <[email protected]>

Commit 69de4421bb4c ("drm/ttm: Initialize debugfs from ttm_global_init()")
unintentionally made ttm_global_init() return early with an error when
debugfs_create_file() fails. When CONFIG_DEBUG_FS is disabled,
debugfs_create_file() returns an ENODEV error so the TTM device would fail
to initialize.

Instead of returning early with the error, print it and continue. ENODEV
can be ignored because it just means that CONFIG_DEBUG_FS is disabled.

Fixes: 69de4421bb4c ("drm/ttm: Initialize debugfs from ttm_global_init()")
Reported-by: Mikael Pettersson <[email protected]>
Reported-by: Duncan <[email protected]>
Signed-off-by: Greg Depoire--Ferrer <[email protected]>
---
Hi, I had this bug as well with the nouveau driver after updating. This
patch fixes it for me.

drivers/gpu/drm/ttm/ttm_device.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 74e3b460132b..12b73979c798 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -69,6 +69,7 @@ static int ttm_global_init(void)
 	unsigned long num_pages, num_dma32;
 	struct sysinfo si;
 	int ret = 0;
+	int tmp;

 	mutex_lock(&ttm_global_mutex);
 	if (++ttm_glob_use_count > 1)
@@ -78,9 +79,9 @@ static int ttm_global_init(void)

 	ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
 	if (IS_ERR(ttm_debugfs_root)) {
-		ret = PTR_ERR(ttm_debugfs_root);
-		ttm_debugfs_root = NULL;
-		goto out;
+		tmp = PTR_ERR(ttm_debugfs_root);
+		if (tmp != -ENODEV)
+			pr_err("failed to create debugfs: %d", tmp);
 	}

 	/* Limit the number of pages in the pool to about 50% of the total
--
2.31.1
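
The behaviour change in the patch above can be modelled in userspace. The sketch below is an illustration under stated assumptions, not kernel code: ERR_PTR(), PTR_ERR() and IS_ERR() are simplified stand-ins modelled on the kernel's include/linux/err.h, while fake_debugfs_create_dir(), init_strict() and init_tolerant() are hypothetical names invented here. The stub returns ERR_PTR(-ENODEV) when debugfs is "disabled", as the commit message describes for a kernel built without CONFIG_DEBUG_FS.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
void *ERR_PTR(long error)
{
	return (void *)error;
}

long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static int dummy_dentry;

/* Hypothetical stub: fails with -ENODEV when debugfs is compiled out. */
void *fake_debugfs_create_dir(int debugfs_enabled)
{
	if (!debugfs_enabled)
		return ERR_PTR(-ENODEV);
	return &dummy_dentry;
}

/* Models the regression: any debugfs error aborts initialization. */
int init_strict(int debugfs_enabled)
{
	void *root = fake_debugfs_create_dir(debugfs_enabled);

	if (IS_ERR(root))
		return (int)PTR_ERR(root);
	return 0;
}

/* Models the patch: report unexpected errors, but always continue. */
int init_tolerant(int debugfs_enabled)
{
	void *root = fake_debugfs_create_dir(debugfs_enabled);

	if (IS_ERR(root)) {
		int tmp = (int)PTR_ERR(root);

		if (tmp != -ENODEV)
			fprintf(stderr, "failed to create debugfs: %d\n", tmp);
	}
	return 0;
}
```

With debugfs disabled, init_strict(0) returns -ENODEV (the -19 seen in the reports), whereas init_tolerant(0) returns 0 and stays silent, because ENODEV only signals that debugfs is unavailable.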

2021-08-19 22:40:24

by Duncan

Subject: Re: [PATCH] drm/ttm: allow debugfs_create_file() to fail in ttm_global_init()

On Mon, 16 Aug 2021 16:30:46 +0200
[email protected] wrote:

> From: Greg Depoire--Ferrer <[email protected]>
>
> Commit 69de4421bb4c ("drm/ttm: Initialize debugfs from
> ttm_global_init()") unintentionally made ttm_global_init() return
> early with an error when debugfs_create_file() fails. When
> CONFIG_DEBUG_FS is disabled, debugfs_create_file() returns an ENODEV
> error so the TTM device would fail to initialize.
>
> Instead of returning early with the error, print it and continue.
> ENODEV can be ignored because it just means that CONFIG_DEBUG_FS is
> disabled.
>
> Fixes: 69de4421bb4c ("drm/ttm: Initialize debugfs from ttm_global_init()")
> Reported-by: Mikael Pettersson <[email protected]>
> Reported-by: Duncan <[email protected]>
> Signed-off-by: Greg Depoire--Ferrer <[email protected]>
> ---
> Hi, I had this bug as well with the nouveau driver after updating.
> This patch fixes it for me.
>
> drivers/gpu/drm/ttm/ttm_device.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)

This fixes the problem here, too. Running it now.

Tested-by: Duncan <[email protected]>

--
Duncan - HTML messages treated as spam
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety."
Benjamin Franklin