In the call stack xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init
Power_domains hasn't been cleaned up if return error,
which has do the clean in i915_driver_late_release call from i915_driver_probe.
unreferenced object 0xffff88811150ee00 (size 512):
comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
hex dump (first 32 bytes):
10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
[<ffffffff812c98b2>] __kmalloc+0x52/0x150
[<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
[<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
[<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
[<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
[<ffffffff817f2187>] local_pci_probe+0x47/0xa0
[<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
[<ffffffff8192f2a2>] really_probe+0x1a2/0x410
[<ffffffff8192f598>] __driver_probe_device+0x78/0x160
[<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
[<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
[<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
[<ffffffff8192e159>] bus_add_driver+0x119/0x220
[<ffffffff81930d00>] driver_register+0x60/0x120
[<ffffffffa05e50a0>] 0xffffffffa05e50a0
Signed-off-by: wangxiaoming321 <[email protected]>
---
drivers/gpu/drm/xe/xe_display.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
index 74391d9b11ae..2725afba4afb 100644
--- a/drivers/gpu/drm/xe/xe_display.c
+++ b/drivers/gpu/drm/xe/xe_display.c
@@ -146,8 +146,10 @@ int xe_display_init_nommio(struct xe_device *xe)
intel_detect_pch(xe);
err = intel_power_domains_init(xe);
- if (err)
+ if (err) {
+ intel_power_domains_cleanup(xe);
return err;
+ }
return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe);
}
--
2.25.1
On Thu, 25 Jan 2024, wangxiaoming321 <[email protected]> wrote:
> In the call stack xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init
> Power_domains hasn't been cleaned up if return error,
> which has do the clean in i915_driver_late_release call from i915_driver_probe.
This has nothing to do with i915_*.
If intel_power_domains_init() returns an error, it should have cleaned
up after itself, not force its caller to do that. If there's an issue,
please fix it in intel_power_domains_init().
BR,
Jani.
>
> unreferenced object 0xffff88811150ee00 (size 512):
> comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
> hex dump (first 32 bytes):
> 10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
> ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
> [<ffffffff812c98b2>] __kmalloc+0x52/0x150
> [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
> [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
> [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
> [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
> [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
> [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
> [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
> [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
> [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
> [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
> [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
> [<ffffffff8192e159>] bus_add_driver+0x119/0x220
> [<ffffffff81930d00>] driver_register+0x60/0x120
> [<ffffffffa05e50a0>] 0xffffffffa05e50a0
>
> Signed-off-by: wangxiaoming321 <[email protected]>
> ---
> drivers/gpu/drm/xe/xe_display.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
> index 74391d9b11ae..2725afba4afb 100644
> --- a/drivers/gpu/drm/xe/xe_display.c
> +++ b/drivers/gpu/drm/xe/xe_display.c
> @@ -146,8 +146,10 @@ int xe_display_init_nommio(struct xe_device *xe)
> intel_detect_pch(xe);
>
> err = intel_power_domains_init(xe);
> - if (err)
> + if (err) {
> + intel_power_domains_cleanup(xe);
> return err;
> + }
>
> return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe);
> }
--
Jani Nikula, Intel
intel_power_domains_init has been called twice in xe_device_probe:
xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
-> intel_power_domains_init(i915)
It needs add a flag to avoid power_domains->power_wells double malloc.
unreferenced object 0xffff88811150ee00 (size 512):
comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
hex dump (first 32 bytes):
10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
[<ffffffff812c98b2>] __kmalloc+0x52/0x150
[<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
[<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
[<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
[<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
[<ffffffff817f2187>] local_pci_probe+0x47/0xa0
[<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
[<ffffffff8192f2a2>] really_probe+0x1a2/0x410
[<ffffffff8192f598>] __driver_probe_device+0x78/0x160
[<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
[<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
[<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
[<ffffffff8192e159>] bus_add_driver+0x119/0x220
[<ffffffff81930d00>] driver_register+0x60/0x120
[<ffffffffa05e50a0>] 0xffffffffa05e50a0
Signed-off-by: wangxiaoming321 <[email protected]>
---
drivers/gpu/drm/i915/display/intel_display_power.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
index bf9685acf75a..3b48a1cb7c54 100644
--- a/drivers/gpu/drm/i915/display/intel_display_power.c
+++ b/drivers/gpu/drm/i915/display/intel_display_power.c
@@ -36,6 +36,8 @@
for_each_power_well_reverse(__dev_priv, __power_well) \
for_each_if(test_bit((__domain), (__power_well)->domains.bits))
+static int intel_power_domains_init_flag = 0;
+
const char *
intel_display_power_domain_str(enum intel_display_power_domain domain)
{
@@ -1016,6 +1018,11 @@ int intel_power_domains_init(struct drm_i915_private *dev_priv)
{
struct i915_power_domains *power_domains = &dev_priv->display.power.domains;
+ if(intel_power_domains_init_flag == 1)
+ return 0;
+
+ intel_power_domains_init_flag++;
+
dev_priv->display.params.disable_power_well =
sanitize_disable_power_well_option(dev_priv,
dev_priv->display.params.disable_power_well);
@@ -1041,6 +1048,7 @@ int intel_power_domains_init(struct drm_i915_private *dev_priv)
*/
void intel_power_domains_cleanup(struct drm_i915_private *dev_priv)
{
+ intel_power_domains_init_flag = 0;
intel_display_power_map_cleanup(&dev_priv->display.power.domains);
}
--
2.25.1
intel_power_domains_init has been called twice in xe_device_probe:
xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
-> intel_power_domains_init(i915)
It needs remove one to avoid power_domains->power_wells double malloc.
unreferenced object 0xffff88811150ee00 (size 512):
comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
hex dump (first 32 bytes):
10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
[<ffffffff812c98b2>] __kmalloc+0x52/0x150
[<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
[<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
[<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
[<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
[<ffffffff817f2187>] local_pci_probe+0x47/0xa0
[<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
[<ffffffff8192f2a2>] really_probe+0x1a2/0x410
[<ffffffff8192f598>] __driver_probe_device+0x78/0x160
[<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
[<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
[<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
[<ffffffff8192e159>] bus_add_driver+0x119/0x220
[<ffffffff81930d00>] driver_register+0x60/0x120
[<ffffffffa05e50a0>] 0xffffffffa05e50a0
Signed-off-by: wangxiaoming321 <[email protected]>
---
drivers/gpu/drm/xe/xe_display.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
index 74391d9b11ae..e4db069f0db3 100644
--- a/drivers/gpu/drm/xe/xe_display.c
+++ b/drivers/gpu/drm/xe/xe_display.c
@@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
int xe_display_init_nommio(struct xe_device *xe)
{
- int err;
-
if (!xe->info.enable_display)
return 0;
@@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe)
/* This must be called before any calls to HAS_PCH_* */
intel_detect_pch(xe);
- err = intel_power_domains_init(xe);
- if (err)
- return err;
-
return drmm_add_action_or_reset(&xe->drm, xe_display_fini_nommio, xe);
}
--
2.25.1
On Fri, Jan 26, 2024 at 10:34:33PM +0800, wangxiaoming321 wrote:
> intel_power_domains_init has been called twice in xe_device_probe:
> xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
> xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
> -> intel_power_domains_init(i915)
Why are you calling it twice?
>
> It needs add a flag to avoid power_domains->power_wells double malloc.
>
> unreferenced object 0xffff88811150ee00 (size 512):
> comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
> hex dump (first 32 bytes):
> 10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
> ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
> [<ffffffff812c98b2>] __kmalloc+0x52/0x150
> [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
> [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
> [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
> [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
> [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
> [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
> [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
> [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
> [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
> [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
> [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
> [<ffffffff8192e159>] bus_add_driver+0x119/0x220
> [<ffffffff81930d00>] driver_register+0x60/0x120
> [<ffffffffa05e50a0>] 0xffffffffa05e50a0
>
> Signed-off-by: wangxiaoming321 <[email protected]>
> ---
> drivers/gpu/drm/i915/display/intel_display_power.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
> index bf9685acf75a..3b48a1cb7c54 100644
> --- a/drivers/gpu/drm/i915/display/intel_display_power.c
> +++ b/drivers/gpu/drm/i915/display/intel_display_power.c
> @@ -36,6 +36,8 @@
> for_each_power_well_reverse(__dev_priv, __power_well) \
> for_each_if(test_bit((__domain), (__power_well)->domains.bits))
>
> +static int intel_power_domains_init_flag = 0;
> +
> const char *
> intel_display_power_domain_str(enum intel_display_power_domain domain)
> {
> @@ -1016,6 +1018,11 @@ int intel_power_domains_init(struct drm_i915_private *dev_priv)
> {
> struct i915_power_domains *power_domains = &dev_priv->display.power.domains;
>
> + if(intel_power_domains_init_flag == 1)
> + return 0;
> +
> + intel_power_domains_init_flag++;
Consider what happens when you have multiple Intel GPUs in the system...
> +
> dev_priv->display.params.disable_power_well =
> sanitize_disable_power_well_option(dev_priv,
> dev_priv->display.params.disable_power_well);
> @@ -1041,6 +1048,7 @@ int intel_power_domains_init(struct drm_i915_private *dev_priv)
> */
> void intel_power_domains_cleanup(struct drm_i915_private *dev_priv)
> {
> + intel_power_domains_init_flag = 0;
> intel_display_power_map_cleanup(&dev_priv->display.power.domains);
> }
>
> --
> 2.25.1
--
Ville Syrj?l?
Intel
+Jani
On Fri, Jan 26, 2024 at 11:34:53PM +0800, wangxiaoming321 wrote:
>intel_power_domains_init has been called twice in xe_device_probe:
>xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
>xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
>-> intel_power_domains_init(i915)
ok, once upon a time intel_power_domains_init() was called by the driver
initialization code and not initialized inside the display. I think.
Now it's part of the display probe and we never updated the xe side.
>
>It needs remove one to avoid power_domains->power_wells double malloc.
>
>unreferenced object 0xffff88811150ee00 (size 512):
> comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
> hex dump (first 32 bytes):
> 10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
> ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
> [<ffffffff812c98b2>] __kmalloc+0x52/0x150
> [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
> [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
> [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
> [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
> [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
> [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
> [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
> [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
> [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
> [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
> [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
> [<ffffffff8192e159>] bus_add_driver+0x119/0x220
> [<ffffffff81930d00>] driver_register+0x60/0x120
> [<ffffffffa05e50a0>] 0xffffffffa05e50a0
>
This will need a Fixes trailer. This seems to be a suitable one:
Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
>Signed-off-by: wangxiaoming321 <[email protected]>
>---
> drivers/gpu/drm/xe/xe_display.c | 6 ------
> 1 file changed, 6 deletions(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
>index 74391d9b11ae..e4db069f0db3 100644
>--- a/drivers/gpu/drm/xe/xe_display.c
>+++ b/drivers/gpu/drm/xe/xe_display.c
>@@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
>
> int xe_display_init_nommio(struct xe_device *xe)
> {
>- int err;
>-
> if (!xe->info.enable_display)
> return 0;
>
>@@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe)
> /* This must be called before any calls to HAS_PCH_* */
> intel_detect_pch(xe);
>
>- err = intel_power_domains_init(xe);
>- if (err)
>- return err;
xe_display_init_nommio() has xe_display_fini_nommio() as its destructor
counter part. Unfortunately display side looks wrong as it does:
init:
intel_display_driver_probe_noirq() -> intel_power_domains_init()
destroy:
i915_driver_late_release() -> intel_power_domains_cleanup()
I think leaving intel_power_domains_cleanup() as is for now so it's
called by xe works, but this needs to go through CI, which apparently
this series didn't go. I re-triggered it.
+Jani if he thinks this can be changed in another way or already have
the complete solution.
Lucas De Marchi
On Wed, 31 Jan 2024, Lucas De Marchi <[email protected]> wrote:
> +Jani
>
> On Fri, Jan 26, 2024 at 11:34:53PM +0800, wangxiaoming321 wrote:
>>intel_power_domains_init has been called twice in xe_device_probe:
>>xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
>>xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
>>-> intel_power_domains_init(i915)
>
> ok, once upon a time intel_power_domains_init() was called by the driver
> initialization code and not initialized inside the display. I think.
> Now it's part of the display probe and we never updated the xe side.
>
>>
>>It needs remove one to avoid power_domains->power_wells double malloc.
>>
>>unreferenced object 0xffff88811150ee00 (size 512):
>> comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
>> hex dump (first 32 bytes):
>> 10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
>> ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
>> backtrace:
>> [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
>> [<ffffffff812c98b2>] __kmalloc+0x52/0x150
>> [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
>> [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
>> [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
>> [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
>> [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
>> [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
>> [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
>> [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
>> [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
>> [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
>> [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
>> [<ffffffff8192e159>] bus_add_driver+0x119/0x220
>> [<ffffffff81930d00>] driver_register+0x60/0x120
>> [<ffffffffa05e50a0>] 0xffffffffa05e50a0
>>
>
> This will need a Fixes trailer. This seems to be a suitable one:
>
> Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
>
>>Signed-off-by: wangxiaoming321 <[email protected]>
>>---
>> drivers/gpu/drm/xe/xe_display.c | 6 ------
>> 1 file changed, 6 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
>>index 74391d9b11ae..e4db069f0db3 100644
>>--- a/drivers/gpu/drm/xe/xe_display.c
>>+++ b/drivers/gpu/drm/xe/xe_display.c
>>@@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
>>
>> int xe_display_init_nommio(struct xe_device *xe)
>> {
>>- int err;
>>-
>> if (!xe->info.enable_display)
>> return 0;
>>
>>@@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe)
>> /* This must be called before any calls to HAS_PCH_* */
>> intel_detect_pch(xe);
>>
>>- err = intel_power_domains_init(xe);
>>- if (err)
>>- return err;
>
> xe_display_init_nommio() has xe_display_fini_nommio() as its destructor
> counter part. Unfortunately display side looks wrong as it does:
>
> init:
> intel_display_driver_probe_noirq() -> intel_power_domains_init()
>
> destroy:
> i915_driver_late_release() -> intel_power_domains_cleanup()
>
> I think leaving intel_power_domains_cleanup() as is for now so it's
> called by xe works, but this needs to go through CI, which apparently
> this series didn't go. I re-triggered it.
>
> +Jani if he thinks this can be changed in another way or already have
> the complete solution.
I don't. But it is and will be a recurring problem. i915 and xe core
drivers should handle display init and cleanup the same way. But
currently i915 goes on to call e.g. intel_power_domains_cleanup()
directly from top level driver code. There are other examples.
And we seem to have recently added *more*. See e.g. bd738d859e71
("drm/i915: Prevent modesets during driver init/shutdown").
BR,
Jani.
--
Jani Nikula, Intel
On 2024-01-31 16:07, Jani Nikula wrote:
> On Wed, 31 Jan 2024, Lucas De Marchi <[email protected]> wrote:
>> +Jani
>>
>> On Fri, Jan 26, 2024 at 11:34:53PM +0800, wangxiaoming321 wrote:
>>> intel_power_domains_init has been called twice in xe_device_probe:
>>> xe_device_probe -> xe_display_init_nommio -> intel_power_domains_init(xe)
>>> xe_device_probe -> xe_display_init_noirq -> intel_display_driver_probe_noirq
>>> -> intel_power_domains_init(i915)
>>
>> ok, once upon a time intel_power_domains_init() was called by the driver
>> initialization code and not initialized inside the display. I think.
>> Now it's part of the display probe and we never updated the xe side.
>>
>>>
>>> It needs remove one to avoid power_domains->power_wells double malloc.
>>>
>>> unreferenced object 0xffff88811150ee00 (size 512):
>>> comm "systemd-udevd", pid 506, jiffies 4294674198 (age 3605.560s)
>>> hex dump (first 32 bytes):
>>> 10 b4 9d a0 ff ff ff ff ff ff ff ff ff ff ff ff ................
>>> ff ff ff ff ff ff ff ff 00 00 00 00 00 00 00 00 ................
>>> backtrace:
>>> [<ffffffff8134b901>] __kmem_cache_alloc_node+0x1c1/0x2b0
>>> [<ffffffff812c98b2>] __kmalloc+0x52/0x150
>>> [<ffffffffa08b0033>] __set_power_wells+0xc3/0x360 [xe]
>>> [<ffffffffa08562fc>] xe_display_init_nommio+0x4c/0x70 [xe]
>>> [<ffffffffa07f0d1c>] xe_device_probe+0x3c/0x5a0 [xe]
>>> [<ffffffffa082e48f>] xe_pci_probe+0x33f/0x5a0 [xe]
>>> [<ffffffff817f2187>] local_pci_probe+0x47/0xa0
>>> [<ffffffff817f3db3>] pci_device_probe+0xc3/0x1f0
>>> [<ffffffff8192f2a2>] really_probe+0x1a2/0x410
>>> [<ffffffff8192f598>] __driver_probe_device+0x78/0x160
>>> [<ffffffff8192f6ae>] driver_probe_device+0x1e/0x90
>>> [<ffffffff8192f92a>] __driver_attach+0xda/0x1d0
>>> [<ffffffff8192c95c>] bus_for_each_dev+0x7c/0xd0
>>> [<ffffffff8192e159>] bus_add_driver+0x119/0x220
>>> [<ffffffff81930d00>] driver_register+0x60/0x120
>>> [<ffffffffa05e50a0>] 0xffffffffa05e50a0
>>>
>>
>> This will need a Fixes trailer. This seems to be a suitable one:
>>
>> Fixes: 44e694958b95 ("drm/xe/display: Implement display support")
>>
>>> Signed-off-by: wangxiaoming321 <[email protected]>
>>> ---
>>> drivers/gpu/drm/xe/xe_display.c | 6 ------
>>> 1 file changed, 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_display.c b/drivers/gpu/drm/xe/xe_display.c
>>> index 74391d9b11ae..e4db069f0db3 100644
>>> --- a/drivers/gpu/drm/xe/xe_display.c
>>> +++ b/drivers/gpu/drm/xe/xe_display.c
>>> @@ -134,8 +134,6 @@ static void xe_display_fini_nommio(struct drm_device *dev, void *dummy)
>>>
>>> int xe_display_init_nommio(struct xe_device *xe)
>>> {
>>> - int err;
>>> -
>>> if (!xe->info.enable_display)
>>> return 0;
>>>
>>> @@ -145,10 +143,6 @@ int xe_display_init_nommio(struct xe_device *xe)
>>> /* This must be called before any calls to HAS_PCH_* */
>>> intel_detect_pch(xe);
>>>
>>> - err = intel_power_domains_init(xe);
>>> - if (err)
>>> - return err;
>>
>> xe_display_init_nommio() has xe_display_fini_nommio() as its destructor
>> counter part. Unfortunately display side looks wrong as it does:
>>
>> init:
>> intel_display_driver_probe_noirq() -> intel_power_domains_init()
>>
>> destroy:
>> i915_driver_late_release() -> intel_power_domains_cleanup()
>>
>> I think leaving intel_power_domains_cleanup() as is for now so it's
>> called by xe works, but this needs to go through CI, which apparently
>> this series didn't go. I re-triggered it.
>>
>> +Jani if he thinks this can be changed in another way or already have
>> the complete solution.
>
> I don't. But it is and will be a recurring problem. i915 and xe core
> drivers should handle display init and cleanup the same way. But
> currently i915 goes on to call e.g. intel_power_domains_cleanup()
> directly from top level driver code. There are other examples.
>
> And we seem to have recently added *more*. See e.g. bd738d859e71
> ("drm/i915: Prevent modesets during driver init/shutdown").
That commit seems terrible Should we instead not only enable any code
that can cause modesets after it's safe to do so?
Cheers,
~Maarten