2023-09-15 07:45:03

by Adrián Larumbe

Subject: [PATCH v5 0/6] Add fdinfo support to Panfrost

This patch series adds fdinfo support to the Panfrost DRM driver. It will
display a series of key:value pairs under /proc/pid/fdinfo/fd for render
processes that open the Panfrost DRM file.

The pairs contain basic DRM GPU engine and memory region information that
can either be read with cat by a privileged user or accessed with IGT's
gputop utility.
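
As a rough illustration of how userspace might consume these pairs, the
sketch below splits fdinfo lines of the form "key:<whitespace>value" and keeps
only keys starting with "drm-". The function names parse_drm_fdinfo_line and
dump_drm_fdinfo are hypothetical and not part of this series, and the path
passed in is whatever /proc/<pid>/fdinfo/<fd> file the caller is allowed
to read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Split one fdinfo line of the form "key:\tvalue\n" in place.
 * Returns 1 and sets *key and *val when the key starts with "drm-",
 * returns 0 for any other line. (Hypothetical helper, for illustration.)
 */
int parse_drm_fdinfo_line(char *line, char **key, char **val)
{
	char *colon;

	assert(line && key && val);

	if (strncmp(line, "drm-", 4) != 0)
		return 0;

	colon = strchr(line, ':');
	if (!colon)
		return 0;
	*colon = '\0';

	/* Skip blanks after the colon, then strip the trailing newline. */
	*val = colon + 1 + strspn(colon + 1, " \t");
	(*val)[strcspn(*val, "\n")] = '\0';
	*key = line;
	return 1;
}

/*
 * Print every drm-* pair found in the given fdinfo file.
 * Returns the number of pairs printed, or -1 if the file cannot be opened.
 */
int dump_drm_fdinfo(const char *path)
{
	char line[256];
	char *key, *val;
	int n = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		if (parse_drm_fdinfo_line(line, &key, &val)) {
			printf("%s = %s\n", key, val);
			n++;
		}
	}

	fclose(f);
	return n;
}
```

A caller would build the path from a pid and fd number and pass it to
dump_drm_fdinfo(); the exact set of keys printed depends on the driver.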

Changelog:

v1: https://lore.kernel.org/lkml/[email protected]/T/

v2: https://lore.kernel.org/lkml/[email protected]/T/
- Changed the way gpu cycles and engine time are calculated, using GPU
registers and taking into account potential resets.
- Split render engine values into fragment and vertex/tiler ones.
- Added more fine-grained calculation of RSS size for BOs.
- Implemented selection of drm-memory region size units
- Removed locking of shrinker's mutex in GEM obj status function

v3: https://lore.kernel.org/lkml/[email protected]/
- Changed fdinfo engine names to something more descriptive
- Mentioned GPU cycle counts aren't an exact measure
- Handled the case when job->priv might be NULL
- Handled 32 bit overflow of cycle register
- Kept fdinfo drm memory stats size unit display within 10k times the
previous multiplier for more accurate BO size numbers
- Removed special handling of Prime imported BO RSS
- Use rss_size only for heap objects
- Use bo->base.madv instead of specific purgeable flag
- Fixed kernel test robot warnings

v4: https://lore.kernel.org/lkml/[email protected]/
- Move cycle counter get and put to panfrost_job_hw_submit and
panfrost_job_handle_{err,done} for more accuracy
- Make sure cycle counter refs are released in reset path
- Drop the module param for toggling cycle counting and
leave it down to the debugfs file
- Don't disable cycle counter when toggling debugfs file,
let refcounting logic handle it instead.
- Remove fdinfo data nested structure definition and 'names' field
- When incrementing BO RSS size in GPU MMU page fault IRQ handler, assume
granularity of 2MiB for every successful mapping.
- drm-file picks an fdinfo memory object size unit that doesn't lose precision.

v5:
- Removed explicit initialisation of atomic variable for profiling mode,
as it's allocated with kzalloc.
- Pass engine utilisation structure to jobs rather than the file context, to avoid
future misuse of the latter.
- Remove double reading of cycle counter register and ktime in job dequeue function,
as the scheduler will make sure these values are read again in case of requeuing.
- Moved putting of cycle counting refcnt into panfrost job dequeue
function to avoid repetition.
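
For readers unfamiliar with the 32-bit overflow item in the v3 notes, one
common way to read a 64-bit counter exposed as two 32-bit halves is to re-read
the high half until it is stable. The sketch below models that in userspace
with a fake register file; struct fake_regs, read_hi, read_lo and read_cycles
are illustrative names only, and this is not claimed to be the exact code used
in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two 32-bit halves of a cycle counter. */
struct fake_regs {
	uint32_t lo;
	uint32_t hi;
	int carry_pending;	/* simulate a carry right after the next low read */
};

uint32_t read_hi(struct fake_regs *r)
{
	return r->hi;
}

uint32_t read_lo(struct fake_regs *r)
{
	uint32_t lo = r->lo;

	if (r->carry_pending) {
		/* The low half wraps and carries into the high half. */
		r->lo = 0;
		r->hi++;
		r->carry_pending = 0;
	}
	return lo;
}

/*
 * Read high, then low, then high again. If the high half changed, the low
 * half wrapped in between and the pair is inconsistent, so retry.
 */
uint64_t read_cycles(struct fake_regs *r)
{
	uint32_t hi, lo;

	assert(r);

	do {
		hi = read_hi(r);
		lo = read_lo(r);
	} while (hi != read_hi(r));

	return ((uint64_t)hi << 32) | lo;
}
```

Without the retry, a carry between the two reads could pair an old high half
with a new low half (or the reverse) and report a value that is off by 2^32.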

Adrián Larumbe (6):
drm/panfrost: Add cycle count GPU register definitions
drm/panfrost: Add fdinfo support GPU load metrics
drm/panfrost: Add fdinfo support for memory stats
drm/drm_file: Add DRM obj's RSS reporting function for fdinfo
drm/panfrost: Implement generic DRM object RSS reporting function
drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats

drivers/gpu/drm/drm_file.c | 10 +++-
drivers/gpu/drm/panfrost/Makefile | 2 +
drivers/gpu/drm/panfrost/panfrost_debugfs.c | 20 +++++++
drivers/gpu/drm/panfrost/panfrost_debugfs.h | 13 +++++
drivers/gpu/drm/panfrost/panfrost_devfreq.c | 8 +++
drivers/gpu/drm/panfrost/panfrost_devfreq.h | 3 ++
drivers/gpu/drm/panfrost/panfrost_device.c | 2 +
drivers/gpu/drm/panfrost/panfrost_device.h | 13 +++++
drivers/gpu/drm/panfrost/panfrost_drv.c | 59 ++++++++++++++++++++-
drivers/gpu/drm/panfrost/panfrost_gem.c | 29 ++++++++++
drivers/gpu/drm/panfrost/panfrost_gem.h | 5 ++
drivers/gpu/drm/panfrost/panfrost_gpu.c | 41 ++++++++++++++
drivers/gpu/drm/panfrost/panfrost_gpu.h | 4 ++
drivers/gpu/drm/panfrost/panfrost_job.c | 24 +++++++++
drivers/gpu/drm/panfrost/panfrost_job.h | 5 ++
drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 +
drivers/gpu/drm/panfrost/panfrost_regs.h | 5 ++
include/drm/drm_gem.h | 9 ++++
18 files changed, 250 insertions(+), 3 deletions(-)
create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.c
create mode 100644 drivers/gpu/drm/panfrost/panfrost_debugfs.h


base-commit: f45acf7acf75921c0409d452f0165f51a19a74fd
--
2.42.0


2023-09-15 14:42:51

by Adrián Larumbe

Subject: [PATCH v5 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats

The current implementation will try to pick the highest available size
display unit as soon as the BO size exceeds that of the previous
multiplier. That can lead to loss of precision in contexts of low memory
usage.

The new selection criteria try to preserve precision, whilst also
increasing the display unit selection threshold to render more accurate
values.

Signed-off-by: Adrián Larumbe <[email protected]>
---
drivers/gpu/drm/drm_file.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 762965e3d503..34cfa128ffe5 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
 }
 EXPORT_SYMBOL(drm_send_event);

+#define UPPER_UNIT_THRESHOLD 100
+
 static void print_size(struct drm_printer *p, const char *stat,
 		       const char *region, u64 sz)
 {
@@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
 	unsigned u;

 	for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
-		if (sz < SZ_1K)
+		if ((sz & (SZ_1K - 1)) &&
+		    sz < UPPER_UNIT_THRESHOLD * SZ_1K)
 			break;
 		sz = div_u64(sz, SZ_1K);
 	}
--
2.42.0
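
To make the effect of the new break condition easier to see, here is a small
userspace model of the loop before and after the patch. It assumes three units
("", " KiB", " MiB"), which is not visible in the hunk above, and the names
scale_old, scale_new and format_size are made up for this sketch; plain
division stands in for div_u64().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SZ_1K 1024ULL
#define UPPER_UNIT_THRESHOLD 100

/* Assumed unit table; the hunk only shows that an array named units exists. */
static const char *const units[] = { "", " KiB", " MiB" };
#define NUM_UNITS (sizeof(units) / sizeof(units[0]))

/* Old rule: move to the next unit as soon as the value reaches 1024. */
unsigned scale_old(uint64_t *sz)
{
	unsigned u;

	for (u = 0; u < NUM_UNITS - 1; u++) {
		if (*sz < SZ_1K)
			break;
		*sz /= SZ_1K;
	}
	return u;
}

/*
 * Rule from the patch: stay in the current unit when the value is not a
 * multiple of 1024 and is still below UPPER_UNIT_THRESHOLD * 1024.
 */
unsigned scale_new(uint64_t *sz)
{
	unsigned u;

	for (u = 0; u < NUM_UNITS - 1; u++) {
		if ((*sz & (SZ_1K - 1)) &&
		    *sz < UPPER_UNIT_THRESHOLD * SZ_1K)
			break;
		*sz /= SZ_1K;
	}
	return u;
}

/* Render a size as "<number><unit>" using either rule. */
void format_size(char *buf, size_t len, uint64_t sz, int use_new)
{
	unsigned u;

	assert(buf && len > 0);

	u = use_new ? scale_new(&sz) : scale_old(&sz);
	snprintf(buf, len, "%llu%s", (unsigned long long)sz, units[u]);
}
```

Under these assumptions, a 1.5 MiB object (1572864 bytes) is rendered as
"1 MiB" by the old rule and as "1536 KiB" by the new one, while an unaligned
value at or above 100 KiB, such as 200000 bytes, is still scaled to "195 KiB".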

2023-09-15 14:54:55

by Boris Brezillon

Subject: Re: [PATCH v5 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats

On Thu, 14 Sep 2023 23:38:44 +0100
Adrián Larumbe <[email protected]> wrote:

> The current implementation will try to pick the highest available size
> display unit as soon as the BO size exceeds that of the previous
> multiplier. That can lead to loss of precision in contexts of low memory
> usage.
>
> The new selection criteria try to preserve precision, whilst also
> increasing the display unit selection threshold to render more accurate
> values.
>
> Signed-off-by: Adrián Larumbe <[email protected]>

Reviewed-by: Boris Brezillon <[email protected]>

> ---
> drivers/gpu/drm/drm_file.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 762965e3d503..34cfa128ffe5 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> }
> EXPORT_SYMBOL(drm_send_event);
>
> +#define UPPER_UNIT_THRESHOLD 100
> +
> static void print_size(struct drm_printer *p, const char *stat,
> const char *region, u64 sz)
> {
> @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
> unsigned u;
>
> for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> - if (sz < SZ_1K)
> + if ((sz & (SZ_1K - 1)) &&
> + sz < UPPER_UNIT_THRESHOLD * SZ_1K)
> break;
> sz = div_u64(sz, SZ_1K);
> }

2023-09-18 10:16:11

by Steven Price

Subject: Re: [PATCH v5 6/6] drm/drm-file: Show finer-grained BO sizes in drm_show_memory_stats

On 14/09/2023 23:38, Adrián Larumbe wrote:
> The current implementation will try to pick the highest available size
> display unit as soon as the BO size exceeds that of the previous
> multiplier. That can lead to loss of precision in contexts of low memory
> usage.
>
> The new selection criteria try to preserve precision, whilst also
> increasing the display unit selection threshold to render more accurate
> values.
>
> Signed-off-by: Adrián Larumbe <[email protected]>

I have to admit I find it odd to be "pretty printing" this value in the
first place. But this is clearly an improvement.

Reviewed-by: Steven Price <[email protected]>

> ---
> drivers/gpu/drm/drm_file.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 762965e3d503..34cfa128ffe5 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -872,6 +872,8 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> }
> EXPORT_SYMBOL(drm_send_event);
>
> +#define UPPER_UNIT_THRESHOLD 100
> +
> static void print_size(struct drm_printer *p, const char *stat,
> const char *region, u64 sz)
> {
> @@ -879,7 +881,8 @@ static void print_size(struct drm_printer *p, const char *stat,
> unsigned u;
>
> for (u = 0; u < ARRAY_SIZE(units) - 1; u++) {
> - if (sz < SZ_1K)
> + if ((sz & (SZ_1K - 1)) &&
> + sz < UPPER_UNIT_THRESHOLD * SZ_1K)
> break;
> sz = div_u64(sz, SZ_1K);
> }