2020-06-03 13:04:22

by Julian Stecklina

Subject: [PATCH] drm/i915/gvt: print actionable error message when gm runs out

When a user tries to allocate too many or too big vGPUs and runs out
of graphics memory, the resulting error message is not actionable and
looks like an internal error.

Change the error message to clearly point out what actions a user can
take to resolve this situation.

Cc: Thomas Prescher <[email protected]>
Cc: Zhenyu Wang <[email protected]>
Signed-off-by: Julian Stecklina <[email protected]>
---
drivers/gpu/drm/i915/gvt/aperture_gm.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c
index 0d6d598713082..5c5c8e871dae2 100644
--- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
+++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
@@ -69,9 +69,12 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
 				  start, end, flags);
 	mmio_hw_access_post(gt);
 	mutex_unlock(&gt->ggtt->vm.mutex);
-	if (ret)
-		gvt_err("fail to alloc %s gm space from host\n",
-			high_gm ? "high" : "low");
+	if (ret) {
+		gvt_err("vgpu%d: failed to allocate %s gm space from host\n",
+			vgpu->id, high_gm ? "high" : "low");
+		gvt_err("vgpu%d: destroying vGPUs, decreasing vGPU memory size or increasing GPU aperture size may resolve this\n",
+			vgpu->id);
+	}

 	return ret;
 }
--
2.26.2


2020-06-05 05:15:00

by Zhenyu Wang

Subject: Re: [PATCH] drm/i915/gvt: print actionable error message when gm runs out

On 2020.06.03 14:33:21 +0200, Julian Stecklina wrote:
> When a user tries to allocate too many or too big vGPUs and runs out
> of graphics memory, the resulting error message is not actionable and
> looks like an internal error.
>
> Change the error message to clearly point out what actions a user can
> take to resolve this situation.
>
> Cc: Thomas Prescher <[email protected]>
> Cc: Zhenyu Wang <[email protected]>
> Signed-off-by: Julian Stecklina <[email protected]>
> ---
> drivers/gpu/drm/i915/gvt/aperture_gm.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> index 0d6d598713082..5c5c8e871dae2 100644
> --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
> +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
> @@ -69,9 +69,12 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm)
>  				  start, end, flags);
>  	mmio_hw_access_post(gt);
>  	mutex_unlock(&gt->ggtt->vm.mutex);
> -	if (ret)
> -		gvt_err("fail to alloc %s gm space from host\n",
> -			high_gm ? "high" : "low");
> +	if (ret) {
> +		gvt_err("vgpu%d: failed to allocate %s gm space from host\n",
> +			vgpu->id, high_gm ? "high" : "low");
> +		gvt_err("vgpu%d: destroying vGPUs, decreasing vGPU memory size or increasing GPU aperture size may resolve this\n",
> +			vgpu->id);

Currently we can't decrease the vGPU mem size, as it is defined by the
mdev type, so what you can actually do is try a different vGPU type. The
aperture size is also handled for the supported vGPU mdev types, so I
assume the user should already be aware of that too. I just don't want
us to be too chatty. :)

> +	}
>
>  	return ret;
>  }
> --
> 2.26.2

--
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827



2020-06-05 11:12:45

by Julian Stecklina

Subject: Re: [PATCH] drm/i915/gvt: print actionable error message when gm runs out

On Fri, 2020-06-05 at 12:54 +0800, Zhenyu Wang wrote:
> On 2020.06.03 14:33:21 +0200, Julian Stecklina wrote:
> > + gvt_err("vgpu%d: failed to allocate %s gm space from host\n",
> > + vgpu->id, high_gm ? "high" : "low");
> > + gvt_err("vgpu%d: destroying vGPUs, decreasing vGPU memory size
> > or increasing GPU aperture size may resolve this\n",
> > + vgpu->id);
>
> Currently we can't decrease the vGPU mem size, as it is defined by the
> mdev type, so what you can actually do is try a different vGPU type.

Yes, that's what I meant.

> The aperture size is
> also handled for the supported vGPU mdev types, so I assume the user should
> already be aware of that too. I just don't want us to be too chatty. :)

Our users typically hit this particular error message when they haven't
configured the GPU aperture size in the system BIOS correctly. Many laptops we
see have the aperture set to 256MB and this is simply not enough.

I don't cling to the specific wording of the error message, but any hint in the
error message that this is not an obscure, internal error or bug, but something
that the user can actually fix, would be helpful.

Julian