2018-01-18 16:49:12

by Andrey Grodzovsky

Subject: [RFC] Per file OOM badness

Hi, this series is a revised version of an RFC sent by Christian König
a few years ago. The original RFC can be found at
https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html

This is the same idea; I've just addressed his concern from the original RFC
and switched to a callback in file_ops instead of a new member in struct file.

Thanks,
Andrey



2018-01-18 16:50:34

by Andrey Grodzovsky

Subject: [PATCH 3/4] drm/gem: adjust per file OOM badness on handling buffers

Large amounts of VRAM are usually not CPU accessible, so they are not mapped
into the process's address space. But since device drivers usually support
swapping buffers from VRAM to system memory, we can still run into an
out-of-memory situation when userspace starts to allocate too much.

This patch gives the OOM killer another hint about which process is
holding how many resources.

Signed-off-by: Andrey Grodzovsky <[email protected]>
---
drivers/gpu/drm/drm_file.c | 12 ++++++++++++
drivers/gpu/drm/drm_gem.c | 8 ++++++++
include/drm/drm_file.h | 4 ++++
3 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index b3c6e99..626cc76 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -747,3 +747,15 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
spin_unlock_irqrestore(&dev->event_lock, irqflags);
}
EXPORT_SYMBOL(drm_send_event);
+
+long drm_oom_badness(struct file *f)
+{
+
+ struct drm_file *file_priv = f->private_data;
+
+ if (file_priv)
+ return atomic_long_read(&file_priv->f_oom_badness);
+
+ return 0;
+}
+EXPORT_SYMBOL(drm_oom_badness);
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 01f8d94..ffbadc8 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -264,6 +264,9 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
drm_gem_remove_prime_handles(obj, file_priv);
drm_vma_node_revoke(&obj->vma_node, file_priv);

+ atomic_long_sub(obj->size >> PAGE_SHIFT,
+ &file_priv->f_oom_badness);
+
drm_gem_object_handle_put_unlocked(obj);

return 0;
@@ -299,6 +302,8 @@ drm_gem_handle_delete(struct drm_file *filp, u32 handle)
idr_remove(&filp->object_idr, handle);
spin_unlock(&filp->table_lock);

+ atomic_long_sub(obj->size >> PAGE_SHIFT, &filp->f_oom_badness);
+
return 0;
}
EXPORT_SYMBOL(drm_gem_handle_delete);
@@ -417,6 +422,9 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
}

*handlep = handle;
+
+ atomic_long_add(obj->size >> PAGE_SHIFT,
+ &file_priv->f_oom_badness);
return 0;

err_revoke:
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 0e0c868..ac3aa75 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -317,6 +317,8 @@ struct drm_file {

/* private: */
unsigned long lock_count; /* DRI1 legacy lock count */
+
+ atomic_long_t f_oom_badness;
};

/**
@@ -378,4 +380,6 @@ void drm_event_cancel_free(struct drm_device *dev,
void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event *e);
void drm_send_event(struct drm_device *dev, struct drm_pending_event *e);

+long drm_oom_badness(struct file *f);
+
#endif /* _DRM_FILE_H_ */
--
2.7.4
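
For a rough sense of scale, an illustrative calculation (not part of the patch): obj->size >> PAGE_SHIFT accounts buffers in pages, the same unit oom_badness() works in.

/*
 * Illustrative only: with 4 KiB pages (PAGE_SHIFT == 12), a process
 * holding three 256 MiB GEM buffers gets
 *
 *	3 * (268435456 >> 12) = 3 * 65536 = 196608 pages (~768 MiB)
 *
 * added to file_priv->f_oom_badness, on top of its normal RSS.
 */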


2018-01-18 16:50:34

by Andrey Grodzovsky

Subject: [PATCH 4/4] drm/amdgpu: Use drm_oom_badness for amdgpu.

Signed-off-by: Andrey Grodzovsky <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 46a0c93..6a733cdc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -828,6 +828,7 @@ static const struct file_operations amdgpu_driver_kms_fops = {
#ifdef CONFIG_COMPAT
.compat_ioctl = amdgpu_kms_compat_ioctl,
#endif
+ .oom_file_badness = drm_oom_badness,
};

static bool
--
2.7.4


2018-01-18 16:50:38

by Andrey Grodzovsky

Subject: [PATCH 2/4] oom: take per file badness into account

Try to make better decisions about which process to kill, based on the
per file OOM badness.

Signed-off-by: Andrey Grodzovsky <[email protected]>
---
mm/oom_kill.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 29f8555..825ed52 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -49,6 +49,8 @@
#define CREATE_TRACE_POINTS
#include <trace/events/oom.h>

+#include <linux/fdtable.h>
+
int sysctl_panic_on_oom;
int sysctl_oom_kill_allocating_task;
int sysctl_oom_dump_tasks = 1;
@@ -182,6 +184,21 @@ static bool is_dump_unreclaim_slabs(void)
}

/**
+ * oom_file_badness - add per file badness
+ * @points: pointer to summed up badness points
+ * @file: task's open file
+ * @n: file descriptor id (unused)
+ */
+static int oom_file_badness(const void *points, struct file *file, unsigned n)
+{
+ if (file->f_op->oom_file_badness)
+ *((long *)points) += file->f_op->oom_file_badness(file);
+
+ return 0;
+}
+
+
+/**
* oom_badness - heuristic function to determine which candidate task to kill
* @p: task struct of which task we should calculate
* @totalpages: total present RAM allowed for page allocation
@@ -222,6 +239,12 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
*/
points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
mm_pgtables_bytes(p->mm) / PAGE_SIZE;
+
+ /*
+ * Add how much memory a task uses in opened files, e.g. device drivers.
+ */
+ iterate_fd(p->files, 0, oom_file_badness, &points);
+
task_unlock(p);

/*
--
2.7.4
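
For reference, a sketch of the resulting heuristic after this hunk; all terms are in pages, and the last term is the new per-file contribution:

/*
 * points = get_mm_rss(p->mm)
 *	  + get_mm_counter(p->mm, MM_SWAPENTS)
 *	  + mm_pgtables_bytes(p->mm) / PAGE_SIZE
 *	  + sum over p's open fds of f_op->oom_file_badness(file)
 *
 * which matches the obj->size >> PAGE_SHIFT granularity used on the
 * DRM side in patch 3.
 */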


2018-01-18 16:52:51

by Andrey Grodzovsky

Subject: [PATCH 1/4] fs: add OOM badness callback in file_operations struct.

This allows device drivers to report an additional badness to the OOM killer
when they allocate memory on behalf of userspace.

Signed-off-by: Andrey Grodzovsky <[email protected]>
---
include/linux/fs.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaa..938394a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1728,6 +1728,7 @@ struct file_operations {
u64);
ssize_t (*dedupe_file_range)(struct file *, u64, u64, struct file *,
u64);
+ long (*oom_file_badness)(struct file *);
} __randomize_layout;

struct inode_operations {
--
2.7.4
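
As a usage illustration (not part of the series), a driver that tracks its per-file allocations in its private data could wire the proposed callback up roughly like this; the foo_* names and the pages counter are hypothetical, only .oom_file_badness comes from this patch:

#include <linux/fs.h>
#include <linux/module.h>
#include <linux/atomic.h>

struct foo_file_priv {
	atomic_long_t pages;	/* pages allocated on behalf of this file */
};

static long foo_oom_badness(struct file *f)
{
	struct foo_file_priv *priv = f->private_data;

	return priv ? atomic_long_read(&priv->pages) : 0;
}

static const struct file_operations foo_fops = {
	.owner = THIS_MODULE,
	/* ... open, release, unlocked_ioctl, mmap ... */
	.oom_file_badness = foo_oom_badness,
};

The driver would then add to priv->pages when it allocates memory on behalf of userspace and subtract again when that memory is freed.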


2018-01-18 17:03:50

by Michal Hocko

Subject: Re: [RFC] Per file OOM badness

On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> Hi, this series is a revised version of an RFC sent by Christian König
> a few years ago. The original RFC can be found at
> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>
> This is the same idea and I've just adressed his concern from the original RFC
> and switched to a callback into file_ops instead of a new member in struct file.

Please add the full description to the cover letter and do not make
people hunt links.

Here is the origin cover letter text
: I'm currently working on the issue that when device drivers allocate memory on
: behalf of an application the OOM killer usually doesn't knew about that unless
: the application also get this memory mapped into their address space.
:
: This is especially annoying for graphics drivers where a lot of the VRAM
: usually isn't CPU accessible and so doesn't make sense to map into the
: address space of the process using it.
:
: The problem now is that when an application starts to use a lot of VRAM those
: buffers objects sooner or later get swapped out to system memory, but when we
: now run into an out of memory situation the OOM killer obviously doesn't knew
: anything about that memory and so usually kills the wrong process.
:
: The following set of patches tries to address this problem by introducing a per
: file OOM badness score, which device drivers can use to give the OOM killer a
: hint how many resources are bound to a file descriptor so that it can make
: better decisions which process to kill.
:
: So question at every one: What do you think about this approach?
:
: My biggest concern right now is the patches are messing with a core kernel
: structure (adding a field to struct file). Any better idea? I'm considering
: to put a callback into file_ops instead.

--
Michal Hocko
SUSE Labs

2018-01-18 17:14:46

by Michal Hocko

Subject: Re: [RFC] Per file OOM badness

On Thu 18-01-18 18:00:06, Michal Hocko wrote:
> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > Hi, this series is a revised version of an RFC sent by Christian König
> > a few years ago. The original RFC can be found at
> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> >
> > This is the same idea and I've just adressed his concern from the original RFC
> > and switched to a callback into file_ops instead of a new member in struct file.
>
> Please add the full description to the cover letter and do not make
> people hunt links.
>
> Here is the origin cover letter text
> : I'm currently working on the issue that when device drivers allocate memory on
> : behalf of an application the OOM killer usually doesn't knew about that unless
> : the application also get this memory mapped into their address space.
> :
> : This is especially annoying for graphics drivers where a lot of the VRAM
> : usually isn't CPU accessible and so doesn't make sense to map into the
> : address space of the process using it.
> :
> : The problem now is that when an application starts to use a lot of VRAM those
> : buffers objects sooner or later get swapped out to system memory, but when we
> : now run into an out of memory situation the OOM killer obviously doesn't knew
> : anything about that memory and so usually kills the wrong process.

OK, but how do you attribute that memory to a particular OOM killable
entity? And how do you actually enforce that those resources get freed
on the oom killer action?

> : The following set of patches tries to address this problem by introducing a per
> : file OOM badness score, which device drivers can use to give the OOM killer a
> : hint how many resources are bound to a file descriptor so that it can make
> : better decisions which process to kill.

But files are not killable, they can be shared... In other words this
doesn't help the oom killer to make an educated guess at all.

> :
> : So question at every one: What do you think about this approach?

I think this is just wrong semantically. Non-reclaimable memory is a
pain, especially when there is way too much of it. If you can free that
memory somehow then you can hook into slab shrinker API and react on the
memory pressure. If you can account such a memory to a particular
process and make sure that the consumption is bound by the process life
time then we can think of an accounting that oom_badness can consider
when selecting a victim.

--
Michal Hocko
SUSE Labs
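
For reference, the shrinker hook-up mentioned above looks roughly like the sketch below; the foo_* names are hypothetical and what scan_objects actually frees is entirely driver specific:

#include <linux/kernel.h>
#include <linux/shrinker.h>
#include <linux/atomic.h>

static atomic_long_t foo_freeable_pages;	/* pages the driver could give back */

static unsigned long foo_count_objects(struct shrinker *s,
				       struct shrink_control *sc)
{
	return atomic_long_read(&foo_freeable_pages);
}

static unsigned long foo_scan_objects(struct shrinker *s,
				      struct shrink_control *sc)
{
	unsigned long freed = min_t(unsigned long, sc->nr_to_scan,
				    atomic_long_read(&foo_freeable_pages));

	/* ... actually release up to 'freed' pages from driver caches here ... */
	atomic_long_sub(freed, &foo_freeable_pages);

	return freed ? freed : SHRINK_STOP;
}

static struct shrinker foo_shrinker = {
	.count_objects = foo_count_objects,
	.scan_objects  = foo_scan_objects,
	.seeks	       = DEFAULT_SEEKS,
};

/* in the driver's init path: register_shrinker(&foo_shrinker); */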

2018-01-18 20:02:28

by Eric Anholt

Subject: Re: [RFC] Per file OOM badness

Michal Hocko <[email protected]> writes:

> On Thu 18-01-18 18:00:06, Michal Hocko wrote:
>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>> > Hi, this series is a revised version of an RFC sent by Christian König
>> > a few years ago. The original RFC can be found at
>> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>> >
>> > This is the same idea and I've just adressed his concern from the original RFC
>> > and switched to a callback into file_ops instead of a new member in struct file.
>>
>> Please add the full description to the cover letter and do not make
>> people hunt links.
>>
>> Here is the origin cover letter text
>> : I'm currently working on the issue that when device drivers allocate memory on
>> : behalf of an application the OOM killer usually doesn't knew about that unless
>> : the application also get this memory mapped into their address space.
>> :
>> : This is especially annoying for graphics drivers where a lot of the VRAM
>> : usually isn't CPU accessible and so doesn't make sense to map into the
>> : address space of the process using it.
>> :
>> : The problem now is that when an application starts to use a lot of VRAM those
>> : buffers objects sooner or later get swapped out to system memory, but when we
>> : now run into an out of memory situation the OOM killer obviously doesn't knew
>> : anything about that memory and so usually kills the wrong process.
>
> OK, but how do you attribute that memory to a particular OOM killable
> entity? And how do you actually enforce that those resources get freed
> on the oom killer action?
>
>> : The following set of patches tries to address this problem by introducing a per
>> : file OOM badness score, which device drivers can use to give the OOM killer a
>> : hint how many resources are bound to a file descriptor so that it can make
>> : better decisions which process to kill.
>
> But files are not killable, they can be shared... In other words this
> doesn't help the oom killer to make an educated guess at all.

Maybe some more context would help the discussion?

The struct file in patch 3 is the DRM fd. That's effectively "my
process's interface to talking to the GPU" not "a single GPU resource".
Once that file is closed, all of the process's private, idle GPU buffers
will be immediately freed (this will be most of their allocations), and
some will be freed once the GPU completes some work (this will be most
of the rest of their allocations).

Some GEM BOs won't be freed just by closing the fd, if they've been
shared between processes. Those are usually about 8-24MB total in a
process, rather than the GBs that modern apps use (or that our testcases
like to allocate and thus trigger oomkilling of the test harness instead
of the offending testcase...)

Even if we just had the private+idle buffers being accounted in OOM
badness, that would be a huge step forward in system reliability.

>> : So question at every one: What do you think about this approach?
>
> I thing is just just wrong semantically. Non-reclaimable memory is a
> pain, especially when there is way too much of it. If you can free that
> memory somehow then you can hook into slab shrinker API and react on the
> memory pressure. If you can account such a memory to a particular
> process and make sure that the consumption is bound by the process life
> time then we can think of an accounting that oom_badness can consider
> when selecting a victim.

For graphics, we can't free most of our memory without also effectively
killing the process. i915 and vc4 have "purgeable" interfaces for
userspace (on i915 this is exposed all the way to GL applications and is
hooked into shrinker, and on vc4 this is so far just used for
userspace-internal buffer caches to be purged when a CMA allocation
fails). However, those purgeable pools are expected to be a tiny
fraction of the GPU allocations by the process.
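
For context, the i915 purgeable interface mentioned above is driven from userspace roughly as sketched below; error handling is omitted and the header path depends on the libdrm installation:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Tell the kernel this BO's contents may be discarded under memory pressure. */
static int mark_bo_purgeable(int drm_fd, uint32_t handle)
{
	struct drm_i915_gem_madvise madv = {
		.handle = handle,
		.madv	= I915_GEM_MADV_DONTNEED,	/* WILLNEED pins it again later */
	};

	/* on success, madv.retained says whether the backing pages survived */
	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_MADVISE, &madv);
}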



2018-01-19 05:41:41

by He, Roger

Subject: RE: [RFC] Per file OOM badness

Basically the idea looks right to me.

1. But we need smaller granularity to control the contribution to OOM badness.
When the TTM buffer resides in VRAM rather than being evicted to system memory, we should not take it into account for badness.
But I think it is not easy to implement.

2. If the TTM buffer (GTT here) is mapped to userspace for CPU access, I am not quite sure whether the buffer size is already taken into account by the kernel.
If yes, the size will be counted a second time by your patches.

So I am wondering if we can count the TTM buffer size into:
struct mm_rss_stat {
	atomic_long_t count[NR_MM_COUNTERS];
};
which the kernel maintains based on the CPU VM (page tables).

Something like this:
When a GTT allocation succeeds:
add_mm_counter(vma->vm_mm, MM_ANONPAGES, buffer_size);

When the GTT buffer is swapped out:
decrement MM_ANONPAGES first, then
add_mm_counter(vma->vm_mm, MM_SWAPENTS, buffer_size); // or MM_SHMEMPAGES, or add a new item.

Always update the corresponding item in mm_rss_stat.
That way we can keep the accounting accurate.
What do you think about that?
And is there any side effect to this approach?


Thanks
Roger(Hongbo.He)

-----Original Message-----
From: dri-devel [mailto:[email protected]] On Behalf Of Andrey Grodzovsky
Sent: Friday, January 19, 2018 12:48 AM
To: [email protected]; [email protected]; [email protected]; [email protected]
Cc: Koenig, Christian <[email protected]>
Subject: [RFC] Per file OOM badness

Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html

This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file.

Thanks,
Andrey

_______________________________________________
dri-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/dri-devel
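
A minimal sketch of the mm_rss_stat accounting suggested above, with hypothetical foo_* helpers; note that the rss_stat counters are in pages, so buffer sizes need converting, and removing many pages is add_mm_counter() with a negative value (dec_mm_counter() only drops the count by one):

#include <linux/mm.h>

/* when a GTT allocation for this VM succeeds */
static void foo_account_gtt_alloc(struct mm_struct *mm, size_t size)
{
	add_mm_counter(mm, MM_ANONPAGES, size >> PAGE_SHIFT);
}

/* when the buffer is swapped out */
static void foo_account_gtt_swapout(struct mm_struct *mm, size_t size)
{
	long pages = size >> PAGE_SHIFT;

	add_mm_counter(mm, MM_ANONPAGES, -pages);
	add_mm_counter(mm, MM_SWAPENTS, pages);	/* or MM_SHMEMPAGES, or a new counter */
}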

2018-01-19 06:02:20

by He, Roger

Subject: RE: [RFC] Per file OOM badness



-----Original Message-----
From: amd-gfx [mailto:[email protected]] On Behalf Of Michal Hocko
Sent: Friday, January 19, 2018 1:14 AM
To: Grodzovsky, Andrey <[email protected]>
Cc: [email protected]; [email protected]; [email protected]; [email protected]; Koenig, Christian <[email protected]>
Subject: Re: [RFC] Per file OOM badness

On Thu 18-01-18 18:00:06, Michal Hocko wrote:
> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > Hi, this series is a revised version of an RFC sent by Christian
> > König a few years ago. The original RFC can be found at
> > https://lists.freedesktop.org/archives/dri-devel/2015-September/0897
> > 78.html
> >
> > This is the same idea and I've just adressed his concern from the
> > original RFC and switched to a callback into file_ops instead of a new member in struct file.
>
> Please add the full description to the cover letter and do not make
> people hunt links.
>
> Here is the origin cover letter text
> : I'm currently working on the issue that when device drivers allocate
> memory on
> : behalf of an application the OOM killer usually doesn't knew about
> that unless
> : the application also get this memory mapped into their address space.
> :
> : This is especially annoying for graphics drivers where a lot of the
> VRAM
> : usually isn't CPU accessible and so doesn't make sense to map into
> the
> : address space of the process using it.
> :
> : The problem now is that when an application starts to use a lot of
> VRAM those
> : buffers objects sooner or later get swapped out to system memory,
> but when we
> : now run into an out of memory situation the OOM killer obviously
> doesn't knew
> : anything about that memory and so usually kills the wrong process.

OK, but how do you attribute that memory to a particular OOM killable entity? And how do you actually enforce that those resources get freed on the oom killer action?

Here I think we need finer granularity to distinguish whether the buffer occupies VRAM or system memory.

> : The following set of patches tries to address this problem by
> introducing a per
> : file OOM badness score, which device drivers can use to give the OOM
> killer a
> : hint how many resources are bound to a file descriptor so that it
> can make
> : better decisions which process to kill.

But files are not killable, they can be shared... In other words this doesn't help the oom killer to make an educated guess at all.

> :
> : So question at every one: What do you think about this approach?

I thing is just just wrong semantically. Non-reclaimable memory is a pain, especially when there is way too much of it. If you can free that memory somehow then you can hook into slab shrinker API and react on the memory pressure. If you can account such a memory to a particular process and make sure that the consumption is bound by the process life time then we can think of an accounting that oom_badness can consider when selecting a victim.

I think you are misunderstanding here.
Actually, the memory in the TTM pools already has a shrinker (mm_shrink), which is set up in ttm_pool_mm_shrink_init.
But the memory we want to have contribute to OOM badness here is not in the TTM pools,
because once a TTM buffer allocation succeeds, the memory has already been removed from the pools.

Thanks
Roger(Hongbo.He)

--
Michal Hocko
SUSE Labs
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

2018-01-19 06:03:47

by Chunming Zhou

Subject: Re: [PATCH 3/4] drm/gem: adjust per file OOM badness on handling buffers



On 2018-01-19 00:47, Andrey Grodzovsky wrote:
> Large amounts of VRAM are usually not CPU accessible, so they are not mapped
> into the processes address space. But since the device drivers usually support
> swapping buffers from VRAM to system memory we can still run into an out of
> memory situation when userspace starts to allocate to much.
>
> This patch gives the OOM another hint which process is
> holding how many resources.
>
> Signed-off-by: Andrey Grodzovsky <[email protected]>
> ---
> drivers/gpu/drm/drm_file.c | 12 ++++++++++++
> drivers/gpu/drm/drm_gem.c | 8 ++++++++
> include/drm/drm_file.h | 4 ++++
> 3 files changed, 24 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index b3c6e99..626cc76 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -747,3 +747,15 @@ void drm_send_event(struct drm_device *dev, struct drm_pending_event *e)
> spin_unlock_irqrestore(&dev->event_lock, irqflags);
> }
> EXPORT_SYMBOL(drm_send_event);
> +
> +long drm_oom_badness(struct file *f)
> +{
> +
> + struct drm_file *file_priv = f->private_data;
> +
> + if (file_priv)
> + return atomic_long_read(&file_priv->f_oom_badness);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL(drm_oom_badness);
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index 01f8d94..ffbadc8 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -264,6 +264,9 @@ drm_gem_object_release_handle(int id, void *ptr, void *data)
> drm_gem_remove_prime_handles(obj, file_priv);
> drm_vma_node_revoke(&obj->vma_node, file_priv);
>
> + atomic_long_sub(obj->size >> PAGE_SHIFT,
> + &file_priv->f_oom_badness);
> +
> drm_gem_object_handle_put_unlocked(obj);
>
> return 0;
> @@ -299,6 +302,8 @@ drm_gem_handle_delete(struct drm_file *filp, u32 handle)
> idr_remove(&filp->object_idr, handle);
> spin_unlock(&filp->table_lock);
>
> + atomic_long_sub(obj->size >> PAGE_SHIFT, &filp->f_oom_badness);
> +
> return 0;
> }
> EXPORT_SYMBOL(drm_gem_handle_delete);
> @@ -417,6 +422,9 @@ drm_gem_handle_create_tail(struct drm_file *file_priv,
> }
>
> *handlep = handle;
> +
> + atomic_long_add(obj->size >> PAGE_SHIFT,
> + &file_priv->f_oom_badness);
For the VRAM case, it should be counted only when the VRAM BO is evicted to
system memory.
For example, VRAM total is 8GB and system memory total is 8GB; one
application allocates 7GB of VRAM and 7GB of system memory, which is allowed.
But following your idea, this application would then be killed by the OOM
killer, right?

Regards,
David Zhou
> return 0;
>
> err_revoke:
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 0e0c868..ac3aa75 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -317,6 +317,8 @@ struct drm_file {
>
> /* private: */
> unsigned long lock_count; /* DRI1 legacy lock count */
> +
> + atomic_long_t f_oom_badness;
> };
>
> /**
> @@ -378,4 +380,6 @@ void drm_event_cancel_free(struct drm_device *dev,
> void drm_send_event_locked(struct drm_device *dev, struct drm_pending_event *e);
> void drm_send_event(struct drm_device *dev, struct drm_pending_event *e);
>
> +long drm_oom_badness(struct file *f);
> +
> #endif /* _DRM_FILE_H_ */


2018-01-19 08:19:46

by Christian König

Subject: Re: [RFC] Per file OOM badness

On 2018-01-19 06:39, He, Roger wrote:
> Basically the idea is right to me.
>
> 1. But we need smaller granularity to control the contribution to OOM badness.
> Because when the TTM buffer resides in VRAM rather than evict to system memory, we should not take this account into badness.
> But I think it is not easy to implement.

I was considering that as well when I wrote the original patch set, but
then decided against it at least for now.

Basically all VRAM buffers can be swapped to system memory, so they
potentially need system memory as well. That is especially important
during suspend/resume.

>
> 2. If the TTM buffer(GTT here) is mapped to user for CPU access, not quite sure the buffer size is already taken into account for kernel.
> If yes, at last the size will be counted again by your patches.

No that isn't accounted for as far as I know.

>
> So, I am thinking if we can counted the TTM buffer size into:
> struct mm_rss_stat {
> atomic_long_t count[NR_MM_COUNTERS];
> };
> Which is done by kernel based on CPU VM (page table).
>
> Something like that:
> When GTT allocate suceess:
> add_mm_counter(vma->vm_mm, MM_ANONPAGES, buffer_size);
>
> When GTT swapped out:
> dec_mm_counter from MM_ANONPAGES frist, then
> add_mm_counter(vma->vm_mm, MM_SWAPENTS, buffer_size); // or MM_SHMEMPAGES or add new item.
>
> Update the corresponding item in mm_rss_stat always.
> If that, we can control the status update accurately.
> What do you think about that?
> And is there any side-effect for this approach?

I already tried this when I originally worked on the issue, and that
approach didn't work because allocated buffers are not necessarily used by
the process that created them.

E.g. most display surfaces are created by the X server, but used by client
processes. So if you account the BO to the process which created it, we
would start to kill X again, and that is exactly what we try to avoid.

Regards,
Christian.

>
>
> Thanks
> Roger(Hongbo.He)
>
> -----Original Message-----
> From: dri-devel [mailto:[email protected]] On Behalf Of Andrey Grodzovsky
> Sent: Friday, January 19, 2018 12:48 AM
> To: [email protected]; [email protected]; [email protected]; [email protected]
> Cc: Koenig, Christian <[email protected]>
> Subject: [RFC] Per file OOM badness
>
> Hi, this series is a revised version of an RFC sent by Christian König a few years ago. The original RFC can be found at https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>
> This is the same idea and I've just adressed his concern from the original RFC and switched to a callback into file_ops instead of a new member in struct file.
>
> Thanks,
> Andrey
>
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> _______________________________________________
> amd-gfx mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


2018-01-19 08:22:56

by Michal Hocko

Subject: Re: [RFC] Per file OOM badness

On Thu 18-01-18 12:01:32, Eric Anholt wrote:
> Michal Hocko <[email protected]> writes:
>
> > On Thu 18-01-18 18:00:06, Michal Hocko wrote:
> >> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> >> > Hi, this series is a revised version of an RFC sent by Christian König
> >> > a few years ago. The original RFC can be found at
> >> > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> >> >
> >> > This is the same idea and I've just adressed his concern from the original RFC
> >> > and switched to a callback into file_ops instead of a new member in struct file.
> >>
> >> Please add the full description to the cover letter and do not make
> >> people hunt links.
> >>
> >> Here is the origin cover letter text
> >> : I'm currently working on the issue that when device drivers allocate memory on
> >> : behalf of an application the OOM killer usually doesn't knew about that unless
> >> : the application also get this memory mapped into their address space.
> >> :
> >> : This is especially annoying for graphics drivers where a lot of the VRAM
> >> : usually isn't CPU accessible and so doesn't make sense to map into the
> >> : address space of the process using it.
> >> :
> >> : The problem now is that when an application starts to use a lot of VRAM those
> >> : buffers objects sooner or later get swapped out to system memory, but when we
> >> : now run into an out of memory situation the OOM killer obviously doesn't knew
> >> : anything about that memory and so usually kills the wrong process.
> >
> > OK, but how do you attribute that memory to a particular OOM killable
> > entity? And how do you actually enforce that those resources get freed
> > on the oom killer action?
> >
> >> : The following set of patches tries to address this problem by introducing a per
> >> : file OOM badness score, which device drivers can use to give the OOM killer a
> >> : hint how many resources are bound to a file descriptor so that it can make
> >> : better decisions which process to kill.
> >
> > But files are not killable, they can be shared... In other words this
> > doesn't help the oom killer to make an educated guess at all.
>
> Maybe some more context would help the discussion?
>
> The struct file in patch 3 is the DRM fd. That's effectively "my
> process's interface to talking to the GPU" not "a single GPU resource".
> Once that file is closed, all of the process's private, idle GPU buffers
> will be immediately freed (this will be most of their allocations), and
> some will be freed once the GPU completes some work (this will be most
> of the rest of their allocations).
>
> Some GEM BOs won't be freed just by closing the fd, if they've been
> shared between processes. Those are usually about 8-24MB total in a
> process, rather than the GBs that modern apps use (or that our testcases
> like to allocate and thus trigger oomkilling of the test harness instead
> of the offending testcase...)
>
> Even if we just had the private+idle buffers being accounted in OOM
> badness, that would be a huge step forward in system reliability.

OK, in that case I would propose a different approach. We already
have rss_stat. So why do not we simply add a new counter there
MM_KERNELPAGES and consider those in oom_badness? The rule would be
that such a memory is bound to the process life time. I guess we will
find more users for this later.
--
Michal Hocko
SUSE Labs
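
To make the proposal concrete, a rough sketch of what this could look like; MM_KERNELPAGES does not exist today, so everything below is hypothetical:

/* include/linux/mm_types_task.h -- add one counter */
enum {
	MM_FILEPAGES,
	MM_ANONPAGES,
	MM_SWAPENTS,
	MM_SHMEMPAGES,
	MM_KERNELPAGES,		/* driver memory bound to the process lifetime */
	NR_MM_COUNTERS
};

/* a driver would account its allocations against the consuming process: */
add_mm_counter(current->mm, MM_KERNELPAGES, bo_size >> PAGE_SHIFT);

/* mm/oom_kill.c: oom_badness() would then pick the counter up: */
points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
	 get_mm_counter(p->mm, MM_KERNELPAGES) +
	 mm_pgtables_bytes(p->mm) / PAGE_SIZE;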

2018-01-19 08:26:29

by Michal Hocko

Subject: Re: [RFC] Per file OOM badness

[removed the broken quoting - please try to use an email client which
doesn't mess up the quoted text]

On Fri 19-01-18 06:01:26, He, Roger wrote:
[...]
> I think you are misunderstanding here.
> Actually for now, the memory in TTM Pools already has mm_shrink which is implemented in ttm_pool_mm_shrink_init.
> And here the memory we want to make it contribute to OOM badness is not in TTM Pools.
> Because when TTM buffer allocation success, the memory already is removed from TTM Pools.

I have no idea what TTM buffers are. But this smells like something
rather specific to the particular subsystem. And my main objection here
is that struct file is not a proper vehicle to carry such information.
So whatever the TTM subsystem does, it should contribute to generic
counters rather than abuse the fd just because it happens to use it to
communicate with userspace.
--
Michal Hocko
SUSE Labs

2018-01-19 08:36:56

by Christian König

Subject: Re: [RFC] Per file OOM badness

On 2018-01-18 21:01, Eric Anholt wrote:
> Michal Hocko <[email protected]> writes:
>
>> [SNIP]
>> But files are not killable, they can be shared... In other words this
>> doesn't help the oom killer to make an educated guess at all.
> Maybe some more context would help the discussion?

Thanks for doing this. Wanted to reply yesterday with that information
as well, but was unfortunately on sick leave.

>
> The struct file in patch 3 is the DRM fd. That's effectively "my
> process's interface to talking to the GPU" not "a single GPU resource".
> Once that file is closed, all of the process's private, idle GPU buffers
> will be immediately freed (this will be most of their allocations), and
> some will be freed once the GPU completes some work (this will be most
> of the rest of their allocations).
>
> Some GEM BOs won't be freed just by closing the fd, if they've been
> shared between processes. Those are usually about 8-24MB total in a
> process, rather than the GBs that modern apps use (or that our testcases
> like to allocate and thus trigger oomkilling of the test harness instead
> of the offending testcase...)
>
> Even if we just had the private+idle buffers being accounted in OOM
> badness, that would be a huge step forward in system reliability.

Yes, and that's exactly the intention here, because currently the OOM
killer usually kills X when a graphics-related application allocates too
much memory, and that is highly undesirable.

>>> : So question at every one: What do you think about this approach?
>> I thing is just just wrong semantically. Non-reclaimable memory is a
>> pain, especially when there is way too much of it. If you can free that
>> memory somehow then you can hook into slab shrinker API and react on the
>> memory pressure. If you can account such a memory to a particular
>> process and make sure that the consumption is bound by the process life
>> time then we can think of an accounting that oom_badness can consider
>> when selecting a victim.
> For graphics, we can't free most of our memory without also effectively
> killing the process. i915 and vc4 have "purgeable" interfaces for
> userspace (on i915 this is exposed all the way to GL applications and is
> hooked into shrinker, and on vc4 this is so far just used for
> userspace-internal buffer caches to be purged when a CMA allocation
> fails). However, those purgeable pools are expected to be a tiny
> fraction of the GPU allocations by the process.

Same thing with TTM and amdgpu/radeon. We already have a shrinker hook
as well and make as much room as we can when needed.

But I think Michal's concerns are valid as well, and I thought about them
when I created the initial patch.

One possible solution which came to my mind is that (IIRC) we not only
store the usual reference count per GEM object, but also how many
handles were created for it.

So what we could do is iterate over all GEM handles of a client and
account only size/num_handles as badness for the client.

The end result would be that X and the client application would both get
1/2 of the GEM object's size accounted for.

Regards,
Christian.
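
A rough sketch of that size/num_handles idea, with hypothetical foo_* helpers; handle_count is read without dev->object_name_lock here for brevity:

#include <drm/drm_file.h>
#include <drm/drm_gem.h>
#include <linux/idr.h>
#include <linux/mm.h>

static int foo_badness_per_handle(int id, void *ptr, void *data)
{
	struct drm_gem_object *obj = ptr;
	long *points = data;

	/* charge this client only with its proportional share of the BO */
	if (obj->handle_count)
		*points += (obj->size >> PAGE_SHIFT) / obj->handle_count;

	return 0;
}

static long foo_file_badness(struct drm_file *file_priv)
{
	long points = 0;

	spin_lock(&file_priv->table_lock);
	idr_for_each(&file_priv->object_idr, foo_badness_per_handle, &points);
	spin_unlock(&file_priv->table_lock);

	return points;
}

With that, a buffer shared between X and one client would contribute half of its size to each of them.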


2018-01-19 08:40:46

by Christian König

Subject: Re: [RFC] Per file OOM badness

On 2018-01-19 09:20, Michal Hocko wrote:
> On Thu 18-01-18 12:01:32, Eric Anholt wrote:
>> Michal Hocko <[email protected]> writes:
>>
>>> On Thu 18-01-18 18:00:06, Michal Hocko wrote:
>>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>>> Hi, this series is a revised version of an RFC sent by Christian König
>>>>> a few years ago. The original RFC can be found at
>>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>>>>
>>>>> This is the same idea and I've just adressed his concern from the original RFC
>>>>> and switched to a callback into file_ops instead of a new member in struct file.
>>>> Please add the full description to the cover letter and do not make
>>>> people hunt links.
>>>>
>>>> Here is the origin cover letter text
>>>> : I'm currently working on the issue that when device drivers allocate memory on
>>>> : behalf of an application the OOM killer usually doesn't knew about that unless
>>>> : the application also get this memory mapped into their address space.
>>>> :
>>>> : This is especially annoying for graphics drivers where a lot of the VRAM
>>>> : usually isn't CPU accessible and so doesn't make sense to map into the
>>>> : address space of the process using it.
>>>> :
>>>> : The problem now is that when an application starts to use a lot of VRAM those
>>>> : buffers objects sooner or later get swapped out to system memory, but when we
>>>> : now run into an out of memory situation the OOM killer obviously doesn't knew
>>>> : anything about that memory and so usually kills the wrong process.
>>> OK, but how do you attribute that memory to a particular OOM killable
>>> entity? And how do you actually enforce that those resources get freed
>>> on the oom killer action?
>>>
>>>> : The following set of patches tries to address this problem by introducing a per
>>>> : file OOM badness score, which device drivers can use to give the OOM killer a
>>>> : hint how many resources are bound to a file descriptor so that it can make
>>>> : better decisions which process to kill.
>>> But files are not killable, they can be shared... In other words this
>>> doesn't help the oom killer to make an educated guess at all.
>> Maybe some more context would help the discussion?
>>
>> The struct file in patch 3 is the DRM fd. That's effectively "my
>> process's interface to talking to the GPU" not "a single GPU resource".
>> Once that file is closed, all of the process's private, idle GPU buffers
>> will be immediately freed (this will be most of their allocations), and
>> some will be freed once the GPU completes some work (this will be most
>> of the rest of their allocations).
>>
>> Some GEM BOs won't be freed just by closing the fd, if they've been
>> shared between processes. Those are usually about 8-24MB total in a
>> process, rather than the GBs that modern apps use (or that our testcases
>> like to allocate and thus trigger oomkilling of the test harness instead
>> of the offending testcase...)
>>
>> Even if we just had the private+idle buffers being accounted in OOM
>> badness, that would be a huge step forward in system reliability.
> OK, in that case I would propose a different approach. We already
> have rss_stat. So why do not we simply add a new counter there
> MM_KERNELPAGES and consider those in oom_badness? The rule would be
> that such a memory is bound to the process life time. I guess we will
> find more users for this later.

I already tried that, and the problem with that approach is that some
buffers are not created by the application which actually uses them.

For example, X/Wayland creates and hands out render buffers to
applications which want to use OpenGL.

So the result is that when you always account the application which created
the buffer, the OOM killer will certainly reap X/Wayland first. And that is
exactly what we want to avoid here.

Regards,
Christian.

2018-01-19 09:32:57

by Michel Dänzer

Subject: Re: [RFC] Per file OOM badness

On 2018-01-19 09:39 AM, Christian König wrote:
> On 2018-01-19 09:20, Michal Hocko wrote:
>> On Thu 18-01-18 12:01:32, Eric Anholt wrote:
>>> Michal Hocko <[email protected]> writes:
>>>
>>>> On Thu 18-01-18 18:00:06, Michal Hocko wrote:
>>>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>>>> Hi, this series is a revised version of an RFC sent by Christian
>>>>>> König
>>>>>> a few years ago. The original RFC can be found at
>>>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>>>>>
>>>>>>
>>>>>> This is the same idea and I've just adressed his concern from the
>>>>>> original RFC
>>>>>> and switched to a callback into file_ops instead of a new member
>>>>>> in struct file.
>>>>> Please add the full description to the cover letter and do not make
>>>>> people hunt links.
>>>>>
>>>>> Here is the origin cover letter text
>>>>> : I'm currently working on the issue that when device drivers
>>>>> allocate memory on
>>>>> : behalf of an application the OOM killer usually doesn't knew
>>>>> about that unless
>>>>> : the application also get this memory mapped into their address
>>>>> space.
>>>>> :
>>>>> : This is especially annoying for graphics drivers where a lot of
>>>>> the VRAM
>>>>> : usually isn't CPU accessible and so doesn't make sense to map
>>>>> into the
>>>>> : address space of the process using it.
>>>>> :
>>>>> : The problem now is that when an application starts to use a lot
>>>>> of VRAM those
>>>>> : buffers objects sooner or later get swapped out to system memory,
>>>>> but when we
>>>>> : now run into an out of memory situation the OOM killer obviously
>>>>> doesn't knew
>>>>> : anything about that memory and so usually kills the wrong process.
>>>> OK, but how do you attribute that memory to a particular OOM killable
>>>> entity? And how do you actually enforce that those resources get freed
>>>> on the oom killer action?
>>>>
>>>>> : The following set of patches tries to address this problem by
>>>>> introducing a per
>>>>> : file OOM badness score, which device drivers can use to give the
>>>>> OOM killer a
>>>>> : hint how many resources are bound to a file descriptor so that it
>>>>> can make
>>>>> : better decisions which process to kill.
>>>> But files are not killable, they can be shared... In other words this
>>>> doesn't help the oom killer to make an educated guess at all.
>>> Maybe some more context would help the discussion?
>>>
>>> The struct file in patch 3 is the DRM fd.  That's effectively "my
>>> process's interface to talking to the GPU" not "a single GPU resource".
>>> Once that file is closed, all of the process's private, idle GPU buffers
>>> will be immediately freed (this will be most of their allocations), and
>>> some will be freed once the GPU completes some work (this will be most
>>> of the rest of their allocations).
>>>
>>> Some GEM BOs won't be freed just by closing the fd, if they've been
>>> shared between processes.  Those are usually about 8-24MB total in a
>>> process, rather than the GBs that modern apps use (or that our testcases
>>> like to allocate and thus trigger oomkilling of the test harness instead
>>> of the offending testcase...)
>>>
>>> Even if we just had the private+idle buffers being accounted in OOM
>>> badness, that would be a huge step forward in system reliability.
>> OK, in that case I would propose a different approach. We already
>> have rss_stat. So why do not we simply add a new counter there
>> MM_KERNELPAGES and consider those in oom_badness? The rule would be
>> that such a memory is bound to the process life time. I guess we will
>> find more users for this later.
>
> I already tried that and the problem with that approach is that some
> buffers are not created by the application which actually uses them.
>
> For example X/Wayland is creating and handing out render buffers to
> application which want to use OpenGL.
>
> So the result is when you always account the application who created the
> buffer the OOM killer will certainly reap X/Wayland first. And that is
> exactly what we want to avoid here.

FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland
anymore. With DRI3 and Wayland, buffers are allocated by the clients and
then shared with the X / Wayland server.

Also, in all cases, the amount of memory allocated for buffers shared
between DRI/Wayland clients and the server should be relatively small
compared to the amount of memory allocated for buffers used only locally
in the client, particularly for clients which create significant memory
pressure.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-19 09:59:56

by Christian König

Subject: Re: [RFC] Per file OOM badness

On 2018-01-19 10:32, Michel Dänzer wrote:
> On 2018-01-19 09:39 AM, Christian König wrote:
>> On 2018-01-19 09:20, Michal Hocko wrote:
>>> On Thu 18-01-18 12:01:32, Eric Anholt wrote:
>>>> Michal Hocko <[email protected]> writes:
>>>>
>>>>> On Thu 18-01-18 18:00:06, Michal Hocko wrote:
>>>>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>>>>> Hi, this series is a revised version of an RFC sent by Christian
>>>>>>> König
>>>>>>> a few years ago. The original RFC can be found at
>>>>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>>>>>>
>>>>>>>
>>>>>>> This is the same idea and I've just adressed his concern from the
>>>>>>> original RFC
>>>>>>> and switched to a callback into file_ops instead of a new member
>>>>>>> in struct file.
>>>>>> Please add the full description to the cover letter and do not make
>>>>>> people hunt links.
>>>>>>
>>>>>> Here is the origin cover letter text
>>>>>> : I'm currently working on the issue that when device drivers
>>>>>> allocate memory on
>>>>>> : behalf of an application the OOM killer usually doesn't knew
>>>>>> about that unless
>>>>>> : the application also get this memory mapped into their address
>>>>>> space.
>>>>>> :
>>>>>> : This is especially annoying for graphics drivers where a lot of
>>>>>> the VRAM
>>>>>> : usually isn't CPU accessible and so doesn't make sense to map
>>>>>> into the
>>>>>> : address space of the process using it.
>>>>>> :
>>>>>> : The problem now is that when an application starts to use a lot
>>>>>> of VRAM those
>>>>>> : buffers objects sooner or later get swapped out to system memory,
>>>>>> but when we
>>>>>> : now run into an out of memory situation the OOM killer obviously
>>>>>> doesn't knew
>>>>>> : anything about that memory and so usually kills the wrong process.
>>>>> OK, but how do you attribute that memory to a particular OOM killable
>>>>> entity? And how do you actually enforce that those resources get freed
>>>>> on the oom killer action?
>>>>>
>>>>>> : The following set of patches tries to address this problem by
>>>>>> introducing a per
>>>>>> : file OOM badness score, which device drivers can use to give the
>>>>>> OOM killer a
>>>>>> : hint how many resources are bound to a file descriptor so that it
>>>>>> can make
>>>>>> : better decisions which process to kill.
>>>>> But files are not killable, they can be shared... In other words this
>>>>> doesn't help the oom killer to make an educated guess at all.
>>>> Maybe some more context would help the discussion?
>>>>
>>>> The struct file in patch 3 is the DRM fd.  That's effectively "my
>>>> process's interface to talking to the GPU" not "a single GPU resource".
>>>> Once that file is closed, all of the process's private, idle GPU buffers
>>>> will be immediately freed (this will be most of their allocations), and
>>>> some will be freed once the GPU completes some work (this will be most
>>>> of the rest of their allocations).
>>>>
>>>> Some GEM BOs won't be freed just by closing the fd, if they've been
>>>> shared between processes.  Those are usually about 8-24MB total in a
>>>> process, rather than the GBs that modern apps use (or that our testcases
>>>> like to allocate and thus trigger oomkilling of the test harness instead
>>>> of the offending testcase...)
>>>>
>>>> Even if we just had the private+idle buffers being accounted in OOM
>>>> badness, that would be a huge step forward in system reliability.
>>> OK, in that case I would propose a different approach. We already
>>> have rss_stat. So why do not we simply add a new counter there
>>> MM_KERNELPAGES and consider those in oom_badness? The rule would be
>>> that such a memory is bound to the process life time. I guess we will
>>> find more users for this later.
>> I already tried that and the problem with that approach is that some
>> buffers are not created by the application which actually uses them.
>>
>> For example X/Wayland is creating and handing out render buffers to
>> application which want to use OpenGL.
>>
>> So the result is when you always account the application who created the
>> buffer the OOM killer will certainly reap X/Wayland first. And that is
>> exactly what we want to avoid here.
> FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland
> anymore. With DRI3 and Wayland, buffers are allocated by the clients and
> then shared with the X / Wayland server.

Good point, when I initially looked at that problem DRI3 wasn't widely
used yet.

> Also, in all cases, the amount of memory allocated for buffers shared
> between DRI/Wayland clients and the server should be relatively small
> compared to the amount of memory allocated for buffers used only locally
> in the client, particularly for clients which create significant memory
> pressure.

That is unfortunately only partially true. When you have a single
runaway application which tries to allocate everything, it would indeed
work as you described.

But when I tested this a few years ago with an X-based desktop, the
applications which actually used most of the memory were Firefox and
Thunderbird. Unfortunately they never got accounted for that.

Now, on my current Wayland-based desktop it actually doesn't look much
better. Taking a look at radeon_gem_info/amdgpu_gem_info, the majority of
all memory was allocated either by gnome-shell or Xwayland.

Regards,
Christian.

2018-01-19 10:03:36

by Michel Dänzer

Subject: Re: [RFC] Per file OOM badness

On 2018-01-19 10:58 AM, Christian König wrote:
> On 2018-01-19 10:32, Michel Dänzer wrote:
>> On 2018-01-19 09:39 AM, Christian König wrote:
>>> On 2018-01-19 09:20, Michal Hocko wrote:
>>>> On Thu 18-01-18 12:01:32, Eric Anholt wrote:
>>>>> Michal Hocko <[email protected]> writes:
>>>>>
>>>>>> On Thu 18-01-18 18:00:06, Michal Hocko wrote:
>>>>>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>>>>>> Hi, this series is a revised version of an RFC sent by Christian
>>>>>>>> König
>>>>>>>> a few years ago. The original RFC can be found at
>>>>>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> This is the same idea and I've just adressed his concern from the
>>>>>>>> original RFC
>>>>>>>> and switched to a callback into file_ops instead of a new member
>>>>>>>> in struct file.
>>>>>>> Please add the full description to the cover letter and do not make
>>>>>>> people hunt links.
>>>>>>>
>>>>>>> Here is the origin cover letter text
>>>>>>> : I'm currently working on the issue that when device drivers
>>>>>>> allocate memory on
>>>>>>> : behalf of an application the OOM killer usually doesn't knew
>>>>>>> about that unless
>>>>>>> : the application also get this memory mapped into their address
>>>>>>> space.
>>>>>>> :
>>>>>>> : This is especially annoying for graphics drivers where a lot of
>>>>>>> the VRAM
>>>>>>> : usually isn't CPU accessible and so doesn't make sense to map
>>>>>>> into the
>>>>>>> : address space of the process using it.
>>>>>>> :
>>>>>>> : The problem now is that when an application starts to use a lot
>>>>>>> of VRAM those
>>>>>>> : buffers objects sooner or later get swapped out to system memory,
>>>>>>> but when we
>>>>>>> : now run into an out of memory situation the OOM killer obviously
>>>>>>> doesn't knew
>>>>>>> : anything about that memory and so usually kills the wrong process.
>>>>>> OK, but how do you attribute that memory to a particular OOM killable
>>>>>> entity? And how do you actually enforce that those resources get
>>>>>> freed
>>>>>> on the oom killer action?
>>>>>>
>>>>>>> : The following set of patches tries to address this problem by
>>>>>>> introducing a per
>>>>>>> : file OOM badness score, which device drivers can use to give the
>>>>>>> OOM killer a
>>>>>>> : hint how many resources are bound to a file descriptor so that it
>>>>>>> can make
>>>>>>> : better decisions which process to kill.
>>>>>> But files are not killable, they can be shared... In other words this
>>>>>> doesn't help the oom killer to make an educated guess at all.
>>>>> Maybe some more context would help the discussion?
>>>>>
>>>>> The struct file in patch 3 is the DRM fd.  That's effectively "my
>>>>> process's interface to talking to the GPU" not "a single GPU
>>>>> resource".
>>>>> Once that file is closed, all of the process's private, idle GPU
>>>>> buffers
>>>>> will be immediately freed (this will be most of their allocations),
>>>>> and
>>>>> some will be freed once the GPU completes some work (this will be most
>>>>> of the rest of their allocations).
>>>>>
>>>>> Some GEM BOs won't be freed just by closing the fd, if they've been
>>>>> shared between processes.  Those are usually about 8-24MB total in a
>>>>> process, rather than the GBs that modern apps use (or that our
>>>>> testcases
>>>>> like to allocate and thus trigger oomkilling of the test harness
>>>>> instead
>>>>> of the offending testcase...)
>>>>>
>>>>> Even if we just had the private+idle buffers being accounted in OOM
>>>>> badness, that would be a huge step forward in system reliability.
>>>> OK, in that case I would propose a different approach. We already
>>>> have rss_stat. So why do not we simply add a new counter there
>>>> MM_KERNELPAGES and consider those in oom_badness? The rule would be
>>>> that such a memory is bound to the process life time. I guess we will
>>>> find more users for this later.
>>> I already tried that and the problem with that approach is that some
>>> buffers are not created by the application which actually uses them.
>>>
>>> For example X/Wayland is creating and handing out render buffers to
>>> application which want to use OpenGL.
>>>
>>> So the result is when you always account the application who created the
>>> buffer the OOM killer will certainly reap X/Wayland first. And that is
>>> exactly what we want to avoid here.
>> FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland
>> anymore. With DRI3 and Wayland, buffers are allocated by the clients and
>> then shared with the X / Wayland server.
>
> Good point, when I initially looked at that problem DRI3 wasn't widely
> used yet.
>
>> Also, in all cases, the amount of memory allocated for buffers shared
>> between DRI/Wayland clients and the server should be relatively small
>> compared to the amount of memory allocated for buffers used only locally
>> in the client, particularly for clients which create significant memory
>> pressure.
>
> That is unfortunately only partially true. When you have a single
> runaway application which tries to allocate everything it would indeed
> work as you described.
>
> But when I tested this a few years ago with X based desktop the
> applications which actually used most of the memory where Firefox and
> Thunderbird. Unfortunately they never got accounted for that.
>
> Now, on my current Wayland based desktop it actually doesn't look much
> better. Taking a look at radeon_gem_info/amdgpu_gem_info the majority of
> all memory was allocated either by gnome-shell or Xwayland.

My guess would be this is due to pixmaps, which allow X clients to cause
the X server to allocate essentially unlimited amounts of memory. It's a
separate issue, which would require a different solution than what we're
discussing in this thread. Maybe something that would allow the X server
to tell the kernel that some of the memory it allocates is for the
client process.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-19 10:42:38

by Michal Hocko

Subject: Re: [RFC] Per file OOM badness

On Fri 19-01-18 09:39:03, Christian König wrote:
> Am 19.01.2018 um 09:20 schrieb Michal Hocko:
[...]
> > OK, in that case I would propose a different approach. We already
> > have rss_stat. So why do not we simply add a new counter there
> > MM_KERNELPAGES and consider those in oom_badness? The rule would be
> > that such a memory is bound to the process life time. I guess we will
> > find more users for this later.
>
> I already tried that and the problem with that approach is that some buffers
> are not created by the application which actually uses them.
>
> For example X/Wayland is creating and handing out render buffers to
> application which want to use OpenGL.
>
> So the result is when you always account the application who created the
> buffer the OOM killer will certainly reap X/Wayland first. And that is
> exactly what we want to avoid here.

Then you have to find the target allocation context at the time of the
allocation and account it. As follow-up emails show, implementations
might differ, and any robust OOM solution has to rely on the common
counters.
--
Michal Hocko
SUSE Labs

2018-01-19 11:42:25

by Christian König

Subject: Re: [RFC] Per file OOM badness

On 2018-01-19 11:40, Michal Hocko wrote:
> On Fri 19-01-18 09:39:03, Christian König wrote:
>> On 2018-01-19 09:20, Michal Hocko wrote:
> [...]
>>> OK, in that case I would propose a different approach. We already
>>> have rss_stat. So why do not we simply add a new counter there
>>> MM_KERNELPAGES and consider those in oom_badness? The rule would be
>>> that such a memory is bound to the process life time. I guess we will
>>> find more users for this later.
>> I already tried that and the problem with that approach is that some buffers
>> are not created by the application which actually uses them.
>>
>> For example X/Wayland is creating and handing out render buffers to
>> application which want to use OpenGL.
>>
>> So the result is when you always account the application who created the
>> buffer the OOM killer will certainly reap X/Wayland first. And that is
>> exactly what we want to avoid here.
> Then you have to find the target allocation context at the time of the
> allocation and account it.

And exactly that's the root of the problem: The target allocation
context isn't known at the time of the allocation.

We could add callbacks so that accounting is transferred when the memory
is passed from the allocator to the actual user of the memory. In other
words, when the memory is passed from the X server to the client the
driver would need to decrement the X server's accounting and increment
the client's accounting.

But I think that would go deep into the file descriptor handling (we
would at least need to handle dup/dup2 and passing the fd using unix
domain sockets) and most likely would be rather error prone.

The per file descriptor badness is/was just the much easier approach to
solve the issue, because the drivers already knew which client is
currently using which buffer objects.

I of course agree that file descriptors can be shared between processes
and are by themselves not killable. But at least for our graphics driven
use case I don't see much of a problem killing all processes when a file
descriptor is used by more than one at the same time.

Regards,
Christian.

> As follow up emails show, implementations
> might differ and any robust oom solution have to rely on the common
> counters.


2018-01-19 12:15:37

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Fri 19-01-18 12:37:51, Christian König wrote:
[...]
> The per file descriptor badness is/was just the much easier approach to
> solve the issue, because the drivers already knew which client is currently
> using which buffer objects.
>
> I of course agree that file descriptors can be shared between processes and
> are by themselves not killable. But at least for our graphics driven use
> case I don't see much of a problem killing all processes when a file
> descriptor is used by more than one at the same time.

Ohh, I absolutely see why you have chosen this way for your particular
usecase. I am just arguing that this would rather be more generic to be
merged. If there is absolutely no other way around we can consider it
but right now I do not see that all other options have been considered
properly. Especially when the fd based approach is basically wrong for
almost anybody else.
--
Michal Hocko
SUSE Labs

2018-01-19 12:20:58

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Fri 19-01-18 13:13:51, Michal Hocko wrote:
> On Fri 19-01-18 12:37:51, Christian König wrote:
> [...]
> > The per file descriptor badness is/was just the much easier approach to
> > solve the issue, because the drivers already knew which client is currently
> > using which buffer objects.
> >
> > I of course agree that file descriptors can be shared between processes and
> > are by themselves not killable. But at least for our graphics driven use
> > case I don't see much of a problem killing all processes when a file
> > descriptor is used by more than one at the same time.
>
> Ohh, I absolutely see why you have chosen this way for your particular
> usecase. I am just arguing that this would rather be more generic to be
> merged. If there is absolutely no other way around we can consider it
> but right now I do not see that all other options have been considered
> properly. Especially when the fd based approach is basically wrong for
> almost anybody else.

And more importantly: iterating over _all_ fds, which is what your
approach is based on AFAIU, is not acceptable for the OOM path. Even
though oom_badness is not a hot path we do not really want it to take a
lot of time either. Even the current iteration over all processes is
quite time consuming. Now you want to add the number of opened files and
that might be quite many per process.
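
To make the concern concrete, the fd based approach needs something along
these lines for every candidate task (sketch only; oom_file_badness is the
file_operations hook proposed by this RFC, not an existing member):

#include <linux/fdtable.h>
#include <linux/fs.h>
#include <linux/sched.h>
#include <linux/sched/task.h>

static int file_badness_cb(const void *arg, struct file *file, unsigned fd)
{
        long *points = (long *)arg;

        if (file->f_op->oom_file_badness)       /* proposed hook */
                *points += file->f_op->oom_file_badness(file);

        return 0;       /* keep iterating over every open fd */
}

static long task_file_badness(struct task_struct *p)
{
        long points = 0;

        task_lock(p);
        if (p->files)
                iterate_fd(p->files, 0, file_badness_cb, &points);
        task_unlock(p);

        return points;  /* O(number of open files), per task, in the OOM path */
}
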
--
Michal Hocko
SUSE Labs

2018-01-19 15:10:19

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-19 11:02 AM, Michel Dänzer wrote:
> On 2018-01-19 10:58 AM, Christian König wrote:
>> Am 19.01.2018 um 10:32 schrieb Michel Dänzer:
>>> On 2018-01-19 09:39 AM, Christian König wrote:
>>>> Am 19.01.2018 um 09:20 schrieb Michal Hocko:
>>>>> OK, in that case I would propose a different approach. We already
>>>>> have rss_stat. So why do not we simply add a new counter there
>>>>> MM_KERNELPAGES and consider those in oom_badness? The rule would be
>>>>> that such a memory is bound to the process life time. I guess we will
>>>>> find more users for this later.
>>>> I already tried that and the problem with that approach is that some
>>>> buffers are not created by the application which actually uses them.
>>>>
>>>> For example X/Wayland is creating and handing out render buffers to
>>>> application which want to use OpenGL.
>>>>
>>>> So the result is when you always account the application who created the
>>>> buffer the OOM killer will certainly reap X/Wayland first. And that is
>>>> exactly what we want to avoid here.
>>> FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland
>>> anymore. With DRI3 and Wayland, buffers are allocated by the clients and
>>> then shared with the X / Wayland server.
>>
>> Good point, when I initially looked at that problem DRI3 wasn't widely
>> used yet.
>>
>>> Also, in all cases, the amount of memory allocated for buffers shared
>>> between DRI/Wayland clients and the server should be relatively small
>>> compared to the amount of memory allocated for buffers used only locally
>>> in the client, particularly for clients which create significant memory
>>> pressure.
>>
>> That is unfortunately only partially true. When you have a single
>> runaway application which tries to allocate everything it would indeed
>> work as you described.
>>
>> But when I tested this a few years ago with X based desktop the
>> applications which actually used most of the memory where Firefox and
>> Thunderbird. Unfortunately they never got accounted for that.
>>
>> Now, on my current Wayland based desktop it actually doesn't look much
>> better. Taking a look at radeon_gem_info/amdgpu_gem_info the majority of
>> all memory was allocated either by gnome-shell or Xwayland.
>
> My guess would be this is due to pixmaps, which allow X clients to cause
> the X server to allocate essentially unlimited amounts of memory. It's a
> separate issue, which would require a different solution than what we're
> discussing in this thread. Maybe something that would allow the X server
> to tell the kernel that some of the memory it allocates is for the
> client process.

Of course, such a mechanism could probably be abused to incorrectly
blame other processes for one's own memory consumption...


I'm not sure if the pixmap issue can be solved for the OOM killer. It's
an X design issue which is fixed with Wayland. So it's probably better
to ignore it for this discussion.

Also, I really think the issue with DRM buffers being shared between
processes isn't significant for the OOM killer compared to DRM buffers
only used in the same process that allocates them. So I suggest focusing
on the latter.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-19 16:50:20

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-19 12:37 PM, Christian König wrote:
> Am 19.01.2018 um 11:40 schrieb Michal Hocko:
>> On Fri 19-01-18 09:39:03, Christian König wrote:
>>> Am 19.01.2018 um 09:20 schrieb Michal Hocko:
>> [...]
>>>> OK, in that case I would propose a different approach. We already
>>>> have rss_stat. So why do not we simply add a new counter there
>>>> MM_KERNELPAGES and consider those in oom_badness? The rule would be
>>>> that such a memory is bound to the process life time. I guess we will
>>>> find more users for this later.
>>> I already tried that and the problem with that approach is that some
>>> buffers
>>> are not created by the application which actually uses them.
>>>
>>> For example X/Wayland is creating and handing out render buffers to
>>> application which want to use OpenGL.
>>>
>>> So the result is when you always account the application who created the
>>> buffer the OOM killer will certainly reap X/Wayland first. And that is
>>> exactly what we want to avoid here.
>> Then you have to find the target allocation context at the time of the
>> allocation and account it.
>
> And exactly that's the root of the problem: The target allocation
> context isn't known at the time of the allocation.
>
> We could add callbacks so that when the memory is passed from the
> allocator to the actual user of the memory. In other words when the
> memory is passed from the X server to the client the driver would need
> to decrement the X servers accounting and increment the clients accounting.
>
> But I think that would go deep into the file descriptor handling (we
> would at least need to handle dup/dup2 and passing the fd using unix
> domain sockets) and most likely would be rather error prone.
>
> The per file descriptor badness is/was just the much easier approach to
> solve the issue, because the drivers already knew which client is
> currently using which buffer objects.
>
> I of course agree that file descriptors can be shared between processes
> and are by themselves not killable. But at least for our graphics driven
> use case I don't see much of a problem killing all processes when a file
> descriptor is used by more than one at the same time.

In that case, accounting a BO as suggested by Michal above, in every
process that shares it, should work fine, shouldn't it?

The OOM killer will first select the process which has more memory
accounted for other things than the BOs shared with another process.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-19 16:56:56

by Christian König

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

Am 19.01.2018 um 13:20 schrieb Michal Hocko:
> On Fri 19-01-18 13:13:51, Michal Hocko wrote:
>> On Fri 19-01-18 12:37:51, Christian König wrote:
>> [...]
>>> The per file descriptor badness is/was just the much easier approach to
>>> solve the issue, because the drivers already knew which client is currently
>>> using which buffer objects.
>>>
>>> I of course agree that file descriptors can be shared between processes and
>>> are by themselves not killable. But at least for our graphics driven use
>>> case I don't see much of a problem killing all processes when a file
>>> descriptor is used by more than one at the same time.
>> Ohh, I absolutely see why you have chosen this way for your particular
>> usecase. I am just arguing that this would rather be more generic to be
>> merged. If there is absolutely no other way around we can consider it
>> but right now I do not see that all other options have been considered
>> properly. Especially when the fd based approach is basically wrong for
>> almost anybody else.
> And more importantly. Iterating over _all_ fd which is what is your
> approach is based on AFAIU is not acceptable for the OOM path. Even
> though oom_badness is not a hot path we do not really want it to take a
> lot of time either. Even the current iteration over all processes is
> quite time consuming. Now you want to add the number of opened files and
> that might be quite many per process.

Mhm, crap that is a really good argument.

How about adding a linked list of callbacks for the OOM killer to check
for each process?

That way we could avoid having to find the process to account things to
at allocation time, and still allow the OOM killer to only check the
specific callbacks it needs to determine the score of a process.

Would still require some changes in the fs layer, but I think that
should be doable.
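
Very roughly, something like the following (all of the names and the
task_struct member are made up, nothing like this exists today, and
locking is omitted):

/* Hypothetical: badness callbacks hanging off each task, so the OOM
 * killer only walks what was explicitly registered instead of every
 * open file descriptor of every candidate process. */
struct oom_badness_cb {
        struct list_head node;
        long (*badness)(struct oom_badness_cb *cb);  /* pages bound to the task */
};

/* e.g. a DRM driver registers one of these when a client starts using
 * the device, and unregisters it when the drm_file goes away */
void oom_register_badness_cb(struct task_struct *p, struct oom_badness_cb *cb);
void oom_unregister_badness_cb(struct task_struct *p, struct oom_badness_cb *cb);

/* in oom_badness(): sum up the extra, driver-known pages */
static long task_driver_badness(struct task_struct *p)
{
        struct oom_badness_cb *cb;
        long points = 0;

        list_for_each_entry(cb, &p->oom_badness_cbs, node) /* hypothetical list */
                points += cb->badness(cb);

        return points;
}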

Regards,
Christian.

2018-01-21 06:52:35

by Eric Anholt

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

Michel Dänzer <[email protected]> writes:

> On 2018-01-19 11:02 AM, Michel Dänzer wrote:
>> On 2018-01-19 10:58 AM, Christian König wrote:
>>> Am 19.01.2018 um 10:32 schrieb Michel Dänzer:
>>>> On 2018-01-19 09:39 AM, Christian König wrote:
>>>>> Am 19.01.2018 um 09:20 schrieb Michal Hocko:
>>>>>> OK, in that case I would propose a different approach. We already
>>>>>> have rss_stat. So why do not we simply add a new counter there
>>>>>> MM_KERNELPAGES and consider those in oom_badness? The rule would be
>>>>>> that such a memory is bound to the process life time. I guess we will
>>>>>> find more users for this later.
>>>>> I already tried that and the problem with that approach is that some
>>>>> buffers are not created by the application which actually uses them.
>>>>>
>>>>> For example X/Wayland is creating and handing out render buffers to
>>>>> application which want to use OpenGL.
>>>>>
>>>>> So the result is when you always account the application who created the
>>>>> buffer the OOM killer will certainly reap X/Wayland first. And that is
>>>>> exactly what we want to avoid here.
>>>> FWIW, what you describe is true with DRI2, but not with DRI3 or Wayland
>>>> anymore. With DRI3 and Wayland, buffers are allocated by the clients and
>>>> then shared with the X / Wayland server.
>>>
>>> Good point, when I initially looked at that problem DRI3 wasn't widely
>>> used yet.
>>>
>>>> Also, in all cases, the amount of memory allocated for buffers shared
>>>> between DRI/Wayland clients and the server should be relatively small
>>>> compared to the amount of memory allocated for buffers used only locally
>>>> in the client, particularly for clients which create significant memory
>>>> pressure.
>>>
>>> That is unfortunately only partially true. When you have a single
>>> runaway application which tries to allocate everything it would indeed
>>> work as you described.
>>>
>>> But when I tested this a few years ago with X based desktop the
>>> applications which actually used most of the memory where Firefox and
>>> Thunderbird. Unfortunately they never got accounted for that.
>>>
>>> Now, on my current Wayland based desktop it actually doesn't look much
>>> better. Taking a look at radeon_gem_info/amdgpu_gem_info the majority of
>>> all memory was allocated either by gnome-shell or Xwayland.
>>
>> My guess would be this is due to pixmaps, which allow X clients to cause
>> the X server to allocate essentially unlimited amounts of memory. It's a
>> separate issue, which would require a different solution than what we're
>> discussing in this thread. Maybe something that would allow the X server
>> to tell the kernel that some of the memory it allocates is for the
>> client process.
>
> Of course, such a mechanism could probably be abused to incorrectly
> blame other processes for one's own memory consumption...
>
>
> I'm not sure if the pixmap issue can be solved for the OOM killer. It's
> an X design issue which is fixed with Wayland. So it's probably better
> to ignore it for this discussion.
>
> Also, I really think the issue with DRM buffers being shared between
> processes isn't significant for the OOM killer compared to DRM buffers
> only used in the same process that allocates them. So I suggest focusing
> on the latter.

Agreed. The 95% case is non-shared buffers, so just don't account for
the shared ones and we'll have a solution good enough that we probably never need
to handle the shared case. On the DRM side, removing buffers from the
accounting once they get shared would be easy.



2018-01-22 23:24:00

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Thu, 18 Jan 2018 11:47:48 -0500 Andrey Grodzovsky <[email protected]> wrote:

> Hi, this series is a revised version of an RFC sent by Christian König
> a few years ago. The original RFC can be found at
> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>
> This is the same idea and I've just adressed his concern from the original RFC
> and switched to a callback into file_ops instead of a new member in struct file.

Should be in address_space_operations, I suspect. If an application
opens a file twice, we only want to count it once?

But we're putting the cart ahead of the horse here. Please provide us
with a detailed description of the problem which you are addressing so
that the MM developers can better consider how to address your
requirements.

2018-01-23 11:40:03

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Fri 19-01-18 17:54:36, Christian König wrote:
> Am 19.01.2018 um 13:20 schrieb Michal Hocko:
> > On Fri 19-01-18 13:13:51, Michal Hocko wrote:
> > > On Fri 19-01-18 12:37:51, Christian König wrote:
> > > [...]
> > > > The per file descriptor badness is/was just the much easier approach to
> > > > solve the issue, because the drivers already knew which client is currently
> > > > using which buffer objects.
> > > >
> > > > I of course agree that file descriptors can be shared between processes and
> > > > are by themselves not killable. But at least for our graphics driven use
> > > > case I don't see much of a problem killing all processes when a file
> > > > descriptor is used by more than one at the same time.
> > > Ohh, I absolutely see why you have chosen this way for your particular
> > > usecase. I am just arguing that this would rather be more generic to be
> > > merged. If there is absolutely no other way around we can consider it
> > > but right now I do not see that all other options have been considered
> > > properly. Especially when the fd based approach is basically wrong for
> > > almost anybody else.
> > And more importantly. Iterating over _all_ fd which is what is your
> > approach is based on AFAIU is not acceptable for the OOM path. Even
> > though oom_badness is not a hot path we do not really want it to take a
> > lot of time either. Even the current iteration over all processes is
> > quite time consuming. Now you want to add the number of opened files and
> > that might be quite many per process.
>
> Mhm, crap that is a really good argument.
>
> How about adding a linked list of callbacks to check for the OOM killer to
> check for each process?
>
> This way we can avoid finding the process where we need to account things on
> when memory is allocated and still allow the OOM killer to only check the
> specific callbacks it needs to determine the score of a process?

I might be oversimplifying but there really has to be a boundary when
you have the target user context, no? Then do the accounting when you
get data to the user.
--
Michal Hocko
SUSE Labs

2018-01-23 15:28:20

by Roman Gushchin

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > Hi, this series is a revised version of an RFC sent by Christian König
> > a few years ago. The original RFC can be found at
> > > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> Here is the origin cover letter text
> : I'm currently working on the issue that when device drivers allocate memory on
> : behalf of an application the OOM killer usually doesn't knew about that unless
> : the application also get this memory mapped into their address space.
> :
> : This is especially annoying for graphics drivers where a lot of the VRAM
> : usually isn't CPU accessible and so doesn't make sense to map into the
> : address space of the process using it.
> :
> : The problem now is that when an application starts to use a lot of VRAM those
> : buffers objects sooner or later get swapped out to system memory, but when we
> : now run into an out of memory situation the OOM killer obviously doesn't knew
> : anything about that memory and so usually kills the wrong process.
> :
> : The following set of patches tries to address this problem by introducing a per
> : file OOM badness score, which device drivers can use to give the OOM killer a
> : hint how many resources are bound to a file descriptor so that it can make
> : better decisions which process to kill.
> :
> : So question at every one: What do you think about this approach?
> :
> : My biggest concern right now is the patches are messing with a core kernel
> : structure (adding a field to struct file). Any better idea? I'm considering
> : to put a callback into file_ops instead.

Hello!

I wonder if groupoom (aka cgroup-aware OOM killer) can work for you?
We do have kmem accounting on the memory cgroup level, and the cgroup-aware
OOM selection logic takes cgroup's kmem size into account. So, you don't
need to introduce another accounting mechanism for OOM.

You can find the current implementation in the mm tree.

Thanks!

Roman

2018-01-23 15:37:44

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
> > On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> > > Hi, this series is a revised version of an RFC sent by Christian König
> > > a few years ago. The original RFC can be found at
> > > https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> > Here is the origin cover letter text
> > : I'm currently working on the issue that when device drivers allocate memory on
> > : behalf of an application the OOM killer usually doesn't knew about that unless
> > : the application also get this memory mapped into their address space.
> > :
> > : This is especially annoying for graphics drivers where a lot of the VRAM
> > : usually isn't CPU accessible and so doesn't make sense to map into the
> > : address space of the process using it.
> > :
> > : The problem now is that when an application starts to use a lot of VRAM those
> > : buffers objects sooner or later get swapped out to system memory, but when we
> > : now run into an out of memory situation the OOM killer obviously doesn't knew
> > : anything about that memory and so usually kills the wrong process.
> > :
> > : The following set of patches tries to address this problem by introducing a per
> > : file OOM badness score, which device drivers can use to give the OOM killer a
> > : hint how many resources are bound to a file descriptor so that it can make
> > : better decisions which process to kill.
> > :
> > : So question at every one: What do you think about this approach?
> > :
> > : My biggest concern right now is the patches are messing with a core kernel
> > : structure (adding a field to struct file). Any better idea? I'm considering
> > : to put a callback into file_ops instead.
>
> Hello!
>
> I wonder if groupoom (aka cgroup-aware OOM killer) can work for you?

I do not think so. The problem is that the allocating context is not
identical with the end consumer.
--
Michal Hocko
SUSE Labs

2018-01-23 16:40:53

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-23 04:36 PM, Michal Hocko wrote:
> On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
>> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>> Hi, this series is a revised version of an RFC sent by Christian König
>>>> a few years ago. The original RFC can be found at
>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>> Here is the origin cover letter text
>>> : I'm currently working on the issue that when device drivers allocate memory on
>>> : behalf of an application the OOM killer usually doesn't knew about that unless
>>> : the application also get this memory mapped into their address space.
>>> :
>>> : This is especially annoying for graphics drivers where a lot of the VRAM
>>> : usually isn't CPU accessible and so doesn't make sense to map into the
>>> : address space of the process using it.
>>> :
>>> : The problem now is that when an application starts to use a lot of VRAM those
>>> : buffers objects sooner or later get swapped out to system memory, but when we
>>> : now run into an out of memory situation the OOM killer obviously doesn't knew
>>> : anything about that memory and so usually kills the wrong process.
>>> :
>>> : The following set of patches tries to address this problem by introducing a per
>>> : file OOM badness score, which device drivers can use to give the OOM killer a
>>> : hint how many resources are bound to a file descriptor so that it can make
>>> : better decisions which process to kill.
>>> :
>>> : So question at every one: What do you think about this approach?
>>> :
>>> : My biggest concern right now is the patches are messing with a core kernel
>>> : structure (adding a field to struct file). Any better idea? I'm considering
>>> : to put a callback into file_ops instead.
>>
>> Hello!
>>
>> I wonder if groupoom (aka cgroup-aware OOM killer) can work for you?
>
> I do not think so. The problem is that the allocating context is not
> identical with the end consumer.

That's actually not really true. Even in cases where a BO is shared with
a different process, it is still used at least occasionally in the
process which allocated it as well. Otherwise there would be no point in
sharing it between processes.


There should be no problem if the memory of a shared BO is accounted for
in each process sharing it. It might be nice to scale each process'
"debt" by 1 / (number of processes sharing it) if possible, but in the
worst case accounting it fully in each process should be fine.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-24 09:30:05

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Tue 23-01-18 17:39:19, Michel Dänzer wrote:
> On 2018-01-23 04:36 PM, Michal Hocko wrote:
> > On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
> >> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
> >>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
> >>>> Hi, this series is a revised version of an RFC sent by Christian König
> >>>> a few years ago. The original RFC can be found at
> >>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> >>> Here is the origin cover letter text
> >>> : I'm currently working on the issue that when device drivers allocate memory on
> >>> : behalf of an application the OOM killer usually doesn't knew about that unless
> >>> : the application also get this memory mapped into their address space.
> >>> :
> >>> : This is especially annoying for graphics drivers where a lot of the VRAM
> >>> : usually isn't CPU accessible and so doesn't make sense to map into the
> >>> : address space of the process using it.
> >>> :
> >>> : The problem now is that when an application starts to use a lot of VRAM those
> >>> : buffers objects sooner or later get swapped out to system memory, but when we
> >>> : now run into an out of memory situation the OOM killer obviously doesn't knew
> >>> : anything about that memory and so usually kills the wrong process.
> >>> :
> >>> : The following set of patches tries to address this problem by introducing a per
> >>> : file OOM badness score, which device drivers can use to give the OOM killer a
> >>> : hint how many resources are bound to a file descriptor so that it can make
> >>> : better decisions which process to kill.
> >>> :
> >>> : So question at every one: What do you think about this approach?
> >>> :
> >>> : My biggest concern right now is the patches are messing with a core kernel
> >>> : structure (adding a field to struct file). Any better idea? I'm considering
> >>> : to put a callback into file_ops instead.
> >>
> >> Hello!
> >>
> >> I wonder if groupoom (aka cgroup-aware OOM killer) can work for you?
> >
> > I do not think so. The problem is that the allocating context is not
> > identical with the end consumer.
>
> That's actually not really true. Even in cases where a BO is shared with
> a different process, it is still used at least occasionally in the
> process which allocated it as well. Otherwise there would be no point in
> sharing it between processes.

OK, but somebody has to be made responsible. Otherwise you are just
killing a process which doesn't really release any memory.

> There should be no problem if the memory of a shared BO is accounted for
> in each process sharing it. It might be nice to scale each process'
> "debt" by 1 / (number of processes sharing it) if possible, but in the
> worst case accounting it fully in each process should be fine.

So how exactly does it help to kill one of those processes then? The memory
stays pinned behind, or do I still misunderstand?
--
Michal Hocko
SUSE Labs

2018-01-24 10:28:00

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-24 10:28 AM, Michal Hocko wrote:
> On Tue 23-01-18 17:39:19, Michel Dänzer wrote:
>> On 2018-01-23 04:36 PM, Michal Hocko wrote:
>>> On Tue 23-01-18 15:27:00, Roman Gushchin wrote:
>>>> On Thu, Jan 18, 2018 at 06:00:06PM +0100, Michal Hocko wrote:
>>>>> On Thu 18-01-18 11:47:48, Andrey Grodzovsky wrote:
>>>>>> Hi, this series is a revised version of an RFC sent by Christian König
>>>>>> a few years ago. The original RFC can be found at
>>>>>> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
>>>>> Here is the origin cover letter text
>>>>> : I'm currently working on the issue that when device drivers allocate memory on
>>>>> : behalf of an application the OOM killer usually doesn't knew about that unless
>>>>> : the application also get this memory mapped into their address space.
>>>>> :
>>>>> : This is especially annoying for graphics drivers where a lot of the VRAM
>>>>> : usually isn't CPU accessible and so doesn't make sense to map into the
>>>>> : address space of the process using it.
>>>>> :
>>>>> : The problem now is that when an application starts to use a lot of VRAM those
>>>>> : buffers objects sooner or later get swapped out to system memory, but when we
>>>>> : now run into an out of memory situation the OOM killer obviously doesn't knew
>>>>> : anything about that memory and so usually kills the wrong process.
>>>>> :
>>>>> : The following set of patches tries to address this problem by introducing a per
>>>>> : file OOM badness score, which device drivers can use to give the OOM killer a
>>>>> : hint how many resources are bound to a file descriptor so that it can make
>>>>> : better decisions which process to kill.
>>>>> :
>>>>> : So question at every one: What do you think about this approach?
>>>>> :
>>>>> : My biggest concern right now is the patches are messing with a core kernel
>>>>> : structure (adding a field to struct file). Any better idea? I'm considering
>>>>> : to put a callback into file_ops instead.
>>>>
>>>> Hello!
>>>>
>>>> I wonder if groupoom (aka cgroup-aware OOM killer) can work for you?
>>>
>>> I do not think so. The problem is that the allocating context is not
>>> identical with the end consumer.
>>
>> That's actually not really true. Even in cases where a BO is shared with
>> a different process, it is still used at least occasionally in the
>> process which allocated it as well. Otherwise there would be no point in
>> sharing it between processes.
>
> OK, but somebody has to be made responsible. Otherwise you are just
> killing a process which doesn't really release any memory.
>
>> There should be no problem if the memory of a shared BO is accounted for
>> in each process sharing it. It might be nice to scale each process'
>> "debt" by 1 / (number of processes sharing it) if possible, but in the
>> worst case accounting it fully in each process should be fine.
>
> So how exactly then helps to kill one of those processes? The memory
> stays pinned behind or do I still misunderstand?

Fundamentally, the memory is only released once all references to the
BOs are dropped. That's true no matter how the memory is accounted for
between the processes referencing the BO.


In practice, this should be fine:

1. The amount of memory used for shared BOs is normally small compared
to the amount of memory used for non-shared BOs (and other things). So
regardless of how shared BOs are accounted for, the OOM killer should
first target the process which is responsible for more memory overall.

2. If the OOM killer kills a process which is sharing BOs with another
process, this should result in the other process dropping its references
to the BOs as well, at which point the memory is released.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-24 11:02:36

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> On 2018-01-24 10:28 AM, Michal Hocko wrote:
[...]
> > So how exactly then helps to kill one of those processes? The memory
> > stays pinned behind or do I still misunderstand?
>
> Fundamentally, the memory is only released once all references to the
> BOs are dropped. That's true no matter how the memory is accounted for
> between the processes referencing the BO.
>
>
> In practice, this should be fine:
>
> 1. The amount of memory used for shared BOs is normally small compared
> to the amount of memory used for non-shared BOs (and other things). So
> regardless of how shared BOs are accounted for, the OOM killer should
> first target the process which is responsible for more memory overall.

OK. So this is essentially the same as with the normal shared memory
which is a part of the RSS in general.

> 2. If the OOM killer kills a process which is sharing BOs with another
> process, this should result in the other process dropping its references
> to the BOs as well, at which point the memory is released.

OK. How exactly are those BOs mapped to the userspace?
--
Michal Hocko
SUSE Labs

2018-01-24 11:24:21

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-24 12:01 PM, Michal Hocko wrote:
> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>> On 2018-01-24 10:28 AM, Michal Hocko wrote:
> [...]
>>> So how exactly then helps to kill one of those processes? The memory
>>> stays pinned behind or do I still misunderstand?
>>
>> Fundamentally, the memory is only released once all references to the
>> BOs are dropped. That's true no matter how the memory is accounted for
>> between the processes referencing the BO.
>>
>>
>> In practice, this should be fine:
>>
>> 1. The amount of memory used for shared BOs is normally small compared
>> to the amount of memory used for non-shared BOs (and other things). So
>> regardless of how shared BOs are accounted for, the OOM killer should
>> first target the process which is responsible for more memory overall.
>
> OK. So this is essentially the same as with the normal shared memory
> which is a part of the RSS in general.

Right.


>> 2. If the OOM killer kills a process which is sharing BOs with another
>> process, this should result in the other process dropping its references
>> to the BOs as well, at which point the memory is released.
>
> OK. How exactly are those BOs mapped to the userspace?

I'm not sure what you're asking. Userspace mostly uses a GEM handle to
refer to a BO. There can also be userspace CPU mappings of the BO's
memory, but userspace doesn't need CPU mappings for all BOs and only
creates them as needed.
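
For reference, this is roughly what that looks like from userspace, using
the generic dumb-buffer ioctls as a stand-in for the driver-specific
allocation ioctls (error handling omitted; builds against libdrm):

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <xf86drm.h>

/* allocating a BO only returns a GEM handle, no CPU mapping is created */
static uint32_t create_bo(int drm_fd, uint32_t width, uint32_t height)
{
        struct drm_mode_create_dumb create = {
                .width = width, .height = height, .bpp = 32,
        };

        drmIoctl(drm_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create);
        return create.handle;
}

/* only if the CPU actually needs to touch the contents is a mapping made;
 * size is the value returned in create.size above */
static void *map_bo(int drm_fd, uint32_t handle, size_t size)
{
        struct drm_mode_map_dumb map = { .handle = handle };

        drmIoctl(drm_fd, DRM_IOCTL_MODE_MAP_DUMB, &map);
        return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                    drm_fd, map.offset);
}
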


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-24 11:52:14

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
[...]
> >> 2. If the OOM killer kills a process which is sharing BOs with another
> >> process, this should result in the other process dropping its references
> >> to the BOs as well, at which point the memory is released.
> >
> > OK. How exactly are those BOs mapped to the userspace?
>
> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> refer to a BO. There can also be userspace CPU mappings of the BO's
> memory, but userspace doesn't need CPU mappings for all BOs and only
> creates them as needed.

OK, I guess you have to bear with me some more. This whole stack is a
complete unknown. I am mostly after finding a boundary where you can
charge the allocated memory to the process so that the oom killer can
consider it. Is there anything like that? Except for the proposed file
handle hack?
--
Michal Hocko
SUSE Labs

2018-01-24 12:11:59

by Christian König

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>> process, this should result in the other process dropping its references
>>>> to the BOs as well, at which point the memory is released.
>>> OK. How exactly are those BOs mapped to the userspace?
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
> OK, I guess you have to bear with me some more. This whole stack is a
> complete uknonwn. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that? Except for the proposed file
> handle hack?

Not that I knew of.

As I said before, we need some kind of callback notifying us that a process
has started to use a file descriptor, even without anything from that file
descriptor being mapped into its address space.

Regards,
Christian.

2018-01-24 14:32:13

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-24 12:50 PM, Michal Hocko wrote:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>> process, this should result in the other process dropping its references
>>>> to the BOs as well, at which point the memory is released.
>>>
>>> OK. How exactly are those BOs mapped to the userspace?
>>
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
>
> OK, I guess you have to bear with me some more. This whole stack is a
> complete uknonwn. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that?

I think something like charging the memory of a BO to the process when a
userspace handle is created for it, and "uncharging" when a handle is
destroyed, could be a good start.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-30 09:25:08

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH 4/4] drm/amdgpu: Use drm_oom_badness for amdgpu.

On Thu, Jan 18, 2018 at 11:47:52AM -0500, Andrey Grodzovsky wrote:
> Signed-off-by: Andrey Grodzovsky <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 46a0c93..6a733cdc8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -828,6 +828,7 @@ static const struct file_operations amdgpu_driver_kms_fops = {
> #ifdef CONFIG_COMPAT
> .compat_ioctl = amdgpu_kms_compat_ioctl,
> #endif
> + .oom_file_badness = drm_oom_badness,

Would be neat if we could roll this out for all gem drivers (once it's no
longer an RFC ofc).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2018-01-30 09:30:40

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-24 12:50 PM, Michal Hocko wrote:
> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> [...]
>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>> process, this should result in the other process dropping its references
>>>> to the BOs as well, at which point the memory is released.
>>>
>>> OK. How exactly are those BOs mapped to the userspace?
>>
>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>> refer to a BO. There can also be userspace CPU mappings of the BO's
>> memory, but userspace doesn't need CPU mappings for all BOs and only
>> creates them as needed.
>
> OK, I guess you have to bear with me some more. This whole stack is a
> complete uknonwn. I am mostly after finding a boundary where you can
> charge the allocated memory to the process so that the oom killer can
> consider it. Is there anything like that? Except for the proposed file
> handle hack?

How about the other way around: what APIs can we use to charge /
"uncharge" memory to a process? If we have those, we can experiment with
different places to call them.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-30 09:33:02

by Daniel Vetter

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > [...]
> > > > > 2. If the OOM killer kills a process which is sharing BOs with another
> > > > > process, this should result in the other process dropping its references
> > > > > to the BOs as well, at which point the memory is released.
> > > > OK. How exactly are those BOs mapped to the userspace?
> > > I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> > > refer to a BO. There can also be userspace CPU mappings of the BO's
> > > memory, but userspace doesn't need CPU mappings for all BOs and only
> > > creates them as needed.
> > OK, I guess you have to bear with me some more. This whole stack is a
> > complete uknonwn. I am mostly after finding a boundary where you can
> > charge the allocated memory to the process so that the oom killer can
> > consider it. Is there anything like that? Except for the proposed file
> > handle hack?
>
> Not that I knew of.
>
> As I said before we need some kind of callback that a process now starts to
> use a file descriptor, but without anything from that file descriptor mapped
> into the address space.

For more context: With DRI3 and wayland the compositor opens the DRM fd
and then passes it to the client, which then starts allocating stuff. That
makes book-keeping rather annoying.

I guess a good first order approximation would be if we simply charge any
newly allocated buffers to the process that created them, but that means
hanging onto lots of mm_struct pointers since we want to make sure we then
release those pages to the right mm again (since the process that drops
the last ref might be a totally different one, depending upon how the
buffers or DRM fd have been shared).

Would it be ok to hang onto potentially arbitrary mmget references
essentially forever? If that's ok I think we can do your process based
account (minus a few minor inaccuracies for shared stuff perhaps, but no
one cares about that).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2018-01-30 09:44:10

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-30 10:31 AM, Daniel Vetter wrote:
> On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
>> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
>>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>>>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>>> [...]
>>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
>>>>>> process, this should result in the other process dropping its references
>>>>>> to the BOs as well, at which point the memory is released.
>>>>> OK. How exactly are those BOs mapped to the userspace?
>>>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
>>>> refer to a BO. There can also be userspace CPU mappings of the BO's
>>>> memory, but userspace doesn't need CPU mappings for all BOs and only
>>>> creates them as needed.
>>> OK, I guess you have to bear with me some more. This whole stack is a
>>> complete uknonwn. I am mostly after finding a boundary where you can
>>> charge the allocated memory to the process so that the oom killer can
>>> consider it. Is there anything like that? Except for the proposed file
>>> handle hack?
>>
>> Not that I knew of.
>>
>> As I said before we need some kind of callback that a process now starts to
>> use a file descriptor, but without anything from that file descriptor mapped
>> into the address space.
>
> For more context: With DRI3 and wayland the compositor opens the DRM fd
> and then passes it to the client, which then starts allocating stuff. That
> makes book-keeping rather annoying.

Actually, what you're describing is only true for the buffers shared by
an X server with an X11 compositor. For the actual applications, the
buffers are created on the client side and then shared with the X server
/ Wayland compositor.

Anyway, it doesn't really matter. In all cases, the buffers are actually
used by all parties that are sharing them, so charging the memory to all
of them is perfectly appropriate.


> I guess a good first order approximation would be if we simply charge any
> newly allocated buffers to the process that created them, but that means
> hanging onto lots of mm_struct pointers since we want to make sure we then
> release those pages to the right mm again (since the process that drops
> the last ref might be a totally different one, depending upon how the
> buffers or DRM fd have been shared).
>
> Would it be ok to hang onto potentially arbitrary mmget references
> essentially forever? If that's ok I think we can do your process based
> account (minus a few minor inaccuracies for shared stuff perhaps, but no
> one cares about that).

Honestly, I think you and Christian are overthinking this. Let's try
charging the memory to every process which shares a buffer, and go from
there.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-30 10:31:09

by Michal Hocko

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> >> On 2018-01-24 12:01 PM, Michal Hocko wrote:
> >>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > [...]
> >>>> 2. If the OOM killer kills a process which is sharing BOs with another
> >>>> process, this should result in the other process dropping its references
> >>>> to the BOs as well, at which point the memory is released.
> >>>
> >>> OK. How exactly are those BOs mapped to the userspace?
> >>
> >> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> >> refer to a BO. There can also be userspace CPU mappings of the BO's
> >> memory, but userspace doesn't need CPU mappings for all BOs and only
> >> creates them as needed.
> >
> > OK, I guess you have to bear with me some more. This whole stack is a
> > complete uknonwn. I am mostly after finding a boundary where you can
> > charge the allocated memory to the process so that the oom killer can
> > consider it. Is there anything like that? Except for the proposed file
> > handle hack?
>
> How about the other way around: what APIs can we use to charge /
> "uncharge" memory to a process? If we have those, we can experiment with
> different places to call them.

add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.
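
So, assuming such an MM_KERNEL_PAGES counter existed, the driver-side
charge/uncharge pair would be as small as (sketch only; the counter and the
helpers below do not exist):

#include <linux/mm.h>

static void drm_charge_pages(struct mm_struct *mm, long npages)
{
        add_mm_counter(mm, MM_KERNEL_PAGES, npages);    /* proposed counter */
}

static void drm_uncharge_pages(struct mm_struct *mm, long npages)
{
        add_mm_counter(mm, MM_KERNEL_PAGES, -npages);
}
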

--
Michal Hocko
SUSE Labs

2018-01-30 10:42:34

by Christian König

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
> [SNIP]
>> Would it be ok to hang onto potentially arbitrary mmget references
>> essentially forever? If that's ok I think we can do your process based
>> account (minus a few minor inaccuracies for shared stuff perhaps, but no
>> one cares about that).
> Honestly, I think you and Christian are overthinking this. Let's try
> charging the memory to every process which shares a buffer, and go from
> there.

My problem is that this needs to be bulletproof.

For example imagine an application which allocates a lot of BOs, then
calls fork() and lets the parent process die. The file descriptor lives
on in the child process, but the memory is not accounted against the child.

Otherwise we would allow easy construction of denial-of-service problems.

To avoid that I think we need to add something like new file_operations
callbacks which inform a file descriptor that it is going to be used in
a new process or has stopped being used in a process.
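
Purely as an illustration of the idea (none of these hooks exist, and all
names below are made up): the VFS would call new file_operations hooks
whenever a descriptor for the file is installed into or removed from a
process' file table (open, fork, dup, SCM_RIGHTS passing, close, exit),
and the driver would move its accounting accordingly:

/* hypothetical new members of struct file_operations:
 *      void (*fd_attached)(struct file *file, struct task_struct *task);
 *      void (*fd_detached)(struct file *file, struct task_struct *task);
 */

static void drm_fd_attached(struct file *filp, struct task_struct *task)
{
        struct drm_file *file_priv = filp->private_data;

        /* charge everything currently allocated through this drm_file
         * against the new user (assumed helper, not existing code) */
        drm_file_charge_mm(file_priv, task->mm);
}

static void drm_fd_detached(struct file *filp, struct task_struct *task)
{
        struct drm_file *file_priv = filp->private_data;

        drm_file_uncharge_mm(file_priv, task->mm);      /* assumed helper */
}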

Regards,
Christian.

2018-01-30 10:43:01

by Daniel Vetter

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
> > On Wed, Jan 24, 2018 at 01:11:09PM +0100, Christian König wrote:
> >> Am 24.01.2018 um 12:50 schrieb Michal Hocko:
> >>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> >>>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
> >>>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> >>> [...]
> >>>>>> 2. If the OOM killer kills a process which is sharing BOs with another
> >>>>>> process, this should result in the other process dropping its references
> >>>>>> to the BOs as well, at which point the memory is released.
> >>>>> OK. How exactly are those BOs mapped to the userspace?
> >>>> I'm not sure what you're asking. Userspace mostly uses a GEM handle to
> >>>> refer to a BO. There can also be userspace CPU mappings of the BO's
> >>>> memory, but userspace doesn't need CPU mappings for all BOs and only
> >>>> creates them as needed.
> >>> OK, I guess you have to bear with me some more. This whole stack is a
> >>> complete uknonwn. I am mostly after finding a boundary where you can
> >>> charge the allocated memory to the process so that the oom killer can
> >>> consider it. Is there anything like that? Except for the proposed file
> >>> handle hack?
> >>
> >> Not that I knew of.
> >>
> >> As I said before we need some kind of callback that a process now starts to
> >> use a file descriptor, but without anything from that file descriptor mapped
> >> into the address space.
> >
> > For more context: With DRI3 and wayland the compositor opens the DRM fd
> > and then passes it to the client, which then starts allocating stuff. That
> > makes book-keeping rather annoying.
>
> Actually, what you're describing is only true for the buffers shared by
> an X server with an X11 compositor. For the actual applications, the
> buffers are created on the client side and then shared with the X server
> / Wayland compositor.
>
> Anyway, it doesn't really matter. In all cases, the buffers are actually
> used by all parties that are sharing them, so charging the memory to all
> of them is perfectly appropriate.
>
>
> > I guess a good first order approximation would be if we simply charge any
> > newly allocated buffers to the process that created them, but that means
> > hanging onto lots of mm_struct pointers since we want to make sure we then
> > release those pages to the right mm again (since the process that drops
> > the last ref might be a totally different one, depending upon how the
> > buffers or DRM fd have been shared).
> >
> > Would it be ok to hang onto potentially arbitrary mmget references
> > essentially forever? If that's ok I think we can do your process based
> > account (minus a few minor inaccuracies for shared stuff perhaps, but no
> > one cares about that).
>
> Honestly, I think you and Christian are overthinking this. Let's try
> charging the memory to every process which shares a buffer, and go from
> there.

I'm not concerned about wrongly accounting shared buffers (they don't
matter), but imbalanced accounting. I.e. allocate a buffer in the client,
share it, but then the compositor drops the last reference.

If we store the mm_struct pointer in drm_gem_object, we don't need any
callback from the vfs when fds are shared or anything like that. We can
simply account any newly allocated buffers to the current->mm, and then
store that later for dropping the account for when the gem obj is
released. This would entirely ignore any complications with shared
buffers, which I think we can do because even when we pass the DRM fd to a
different process, the actual buffer allocations are not passed around
like that for private buffers. And private buffers are the only ones that
really matter.
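
A sketch of that bookkeeping, assuming an MM_KERNEL_PAGES rss counter
existed, a charged_mm pointer were added to struct drm_gem_object, and
pinning the mm_struct itself (mmgrab/mmdrop rather than a full mmget) is
good enough:

#include <linux/mm.h>
#include <linux/sched/mm.h>
#include <drm/drm_gem.h>

/* charge the BO to the process creating it */
static void drm_gem_charge_current(struct drm_gem_object *obj)
{
        long npages = obj->size >> PAGE_SHIFT;

        obj->charged_mm = current->mm;          /* hypothetical member */
        mmgrab(obj->charged_mm);                /* pin mm_struct, not the address space */
        add_mm_counter(obj->charged_mm, MM_KERNEL_PAGES, npages);
}

/* undo it when the BO is finally released, possibly from another process */
static void drm_gem_uncharge(struct drm_gem_object *obj)
{
        long npages = obj->size >> PAGE_SHIFT;

        add_mm_counter(obj->charged_mm, MM_KERNEL_PAGES, -npages);
        mmdrop(obj->charged_mm);
        obj->charged_mm = NULL;
}

Whether holding an mm_count reference for the whole lifetime of a
long-lived BO is acceptable is exactly the open question above.
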
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2018-01-30 10:48:56

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-30 11:42 AM, Daniel Vetter wrote:
> On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
>> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
>>
>>> I guess a good first order approximation would be if we simply charge any
>>> newly allocated buffers to the process that created them, but that means
>>> hanging onto lots of mm_struct pointers since we want to make sure we then
>>> release those pages to the right mm again (since the process that drops
>>> the last ref might be a totally different one, depending upon how the
>>> buffers or DRM fd have been shared).
>>>
>>> Would it be ok to hang onto potentially arbitrary mmget references
>>> essentially forever? If that's ok I think we can do your process based
>>> account (minus a few minor inaccuracies for shared stuff perhaps, but no
>>> one cares about that).
>>
>> Honestly, I think you and Christian are overthinking this. Let's try
>> charging the memory to every process which shares a buffer, and go from
>> there.
>
> I'm not concerned about wrongly accounting shared buffers (they don't
> matter), but imbalanced accounting. I.e. allocate a buffer in the client,
> share it, but then the compositor drops the last reference.

I don't think the order matters. The memory is "uncharged" in each
process when it drops its reference.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-30 11:03:07

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-30 11:40 AM, Christian König wrote:
> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>> [SNIP]
>>> Would it be ok to hang onto potentially arbitrary mmget references
>>> essentially forever? If that's ok I think we can do your process based
>>> account (minus a few minor inaccuracies for shared stuff perhaps, but no
>>> one cares about that).
>> Honestly, I think you and Christian are overthinking this. Let's try
>> charging the memory to every process which shares a buffer, and go from
>> there.
>
> My problem is that this needs to be bullet prove.
>
> For example imagine an application which allocates a lot of BOs, then
> calls fork() and let the parent process die. The file descriptor lives
> on in the child process, but the memory is not accounted against the child.

What exactly are you referring to by "the file descriptor" here?


What happens to BO handles in general in this case? If both parent and
child process keep the same handle for the same BO, one of them
destroying the handle will result in the other one not being able to use
it anymore either, won't it?


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-30 11:29:21

by Christian König

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
> On 2018-01-30 11:40 AM, Christian König wrote:
>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>> [SNIP]
>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>> essentially forever? If that's ok I think we can do your process based
>>>> account (minus a few minor inaccuracies for shared stuff perhaps, but no
>>>> one cares about that).
>>> Honestly, I think you and Christian are overthinking this. Let's try
>>> charging the memory to every process which shares a buffer, and go from
>>> there.
>> My problem is that this needs to be bullet prove.
>>
>> For example imagine an application which allocates a lot of BOs, then
>> calls fork() and let the parent process die. The file descriptor lives
>> on in the child process, but the memory is not accounted against the child.
> What exactly are you referring to by "the file descriptor" here?

The file descriptor used to identify the connection to the driver. In
other words our drm_file structure in the kernel.

> What happens to BO handles in general in this case? If both parent and
> child process keep the same handle for the same BO, one of them
> destroying the handle will result in the other one not being able to use
> it anymore either, won't it?
Correct.

That usage is actually not useful at all, but we already had
applications which did exactly that by accident.

Not to mention that somebody could do it on purpose.

Regards,
Christian.

2018-01-30 11:35:11

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-30 12:28 PM, Christian König wrote:
> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>> On 2018-01-30 11:40 AM, Christian König wrote:
>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>>> [SNIP]
>>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>>> essentially forever? If that's ok I think we can do your process based
>>>>> account (minus a few minor inaccuracies for shared stuff perhaps,
>>>>> but no
>>>>> one cares about that).
>>>> Honestly, I think you and Christian are overthinking this. Let's try
>>>> charging the memory to every process which shares a buffer, and go from
>>>> there.
>>> My problem is that this needs to be bullet prove.
>>>
>>> For example imagine an application which allocates a lot of BOs, then
>>> calls fork() and let the parent process die. The file descriptor lives
>>> on in the child process, but the memory is not accounted against the
>>> child.
>> What exactly are you referring to by "the file descriptor" here?
>
> The file descriptor used to identify the connection to the driver. In
> other words our drm_file structure in the kernel.
>
>> What happens to BO handles in general in this case? If both parent and
>> child process keep the same handle for the same BO, one of them
>> destroying the handle will result in the other one not being able to use
>> it anymore either, won't it?
> Correct.
>
> That usage is actually not useful at all, but we already had
> applications which did exactly that by accident.
>
> Not to mention that somebody could do it on purpose.

Can we just prevent child processes from using their parent's DRM file
descriptors altogether? Allowing it seems like a bad idea all around.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-30 11:36:15

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 30.01.2018 11:48, Michel Dänzer wrote:
> On 2018-01-30 11:42 AM, Daniel Vetter wrote:
>> On Tue, Jan 30, 2018 at 10:43:10AM +0100, Michel Dänzer wrote:
>>> On 2018-01-30 10:31 AM, Daniel Vetter wrote:
>>>
>>>> I guess a good first order approximation would be if we simply charge any
>>>> newly allocated buffers to the process that created them, but that means
>>>> hanging onto lots of mm_struct pointers since we want to make sure we then
>>>> release those pages to the right mm again (since the process that drops
>>>> the last ref might be a totally different one, depending upon how the
>>>> buffers or DRM fd have been shared).
>>>>
>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>> essentially forever? If that's ok I think we can do your process based
>>>> account (minus a few minor inaccuracies for shared stuff perhaps, but no
>>>> one cares about that).
>>>
>>> Honestly, I think you and Christian are overthinking this. Let's try
>>> charging the memory to every process which shares a buffer, and go from
>>> there.
>>
>> I'm not concerned about wrongly accounting shared buffers (they don't
>> matter), but imbalanced accounting. I.e. allocate a buffer in the client,
>> share it, but then the compositor drops the last reference.
>
> I don't think the order matters. The memory is "uncharged" in each
> process when it drops its reference.

Daniel made a fair point about passing DRM fds between processes, though.

It's not a problem with how the fds are currently used, but somebody
could do the following:

1. Create a DRM fd in process A, allocate lots of buffers.
2. Pass the fd to process B via some IPC mechanism.
3. Exit process A.

There needs to be some assurance that the BOs are accounted as belonging
to process B in the end.

Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.

2018-01-30 11:37:47

by Nicolai Hähnle

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 30.01.2018 12:34, Michel Dänzer wrote:
> On 2018-01-30 12:28 PM, Christian König wrote:
>> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>>> On 2018-01-30 11:40 AM, Christian König wrote:
>>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>>>> [SNIP]
>>>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>>>> essentially forever? If that's ok I think we can do your process based
>>>>>> account (minus a few minor inaccuracies for shared stuff perhaps,
>>>>>> but no
>>>>>> one cares about that).
>>>>> Honestly, I think you and Christian are overthinking this. Let's try
>>>>> charging the memory to every process which shares a buffer, and go from
>>>>> there.
>>>> My problem is that this needs to be bullet prove.
>>>>
>>>> For example imagine an application which allocates a lot of BOs, then
>>>> calls fork() and let the parent process die. The file descriptor lives
>>>> on in the child process, but the memory is not accounted against the
>>>> child.
>>> What exactly are you referring to by "the file descriptor" here?
>>
>> The file descriptor used to identify the connection to the driver. In
>> other words our drm_file structure in the kernel.
>>
>>> What happens to BO handles in general in this case? If both parent and
>>> child process keep the same handle for the same BO, one of them
>>> destroying the handle will result in the other one not being able to use
>>> it anymore either, won't it?
>> Correct.
>>
>> That usage is actually not useful at all, but we already had
>> applications which did exactly that by accident.
>>
>> Not to mention that somebody could do it on purpose.
>
> Can we just prevent child processes from using their parent's DRM file
> descriptors altogether? Allowing it seems like a bad idea all around.

Existing protocols pass DRM fds between processes though, don't they?

Not child processes perhaps, but special-casing that seems like awful
design.

Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.

2018-01-30 11:44:43

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-30 12:36 PM, Nicolai Hähnle wrote:
> On 30.01.2018 12:34, Michel Dänzer wrote:
>> On 2018-01-30 12:28 PM, Christian König wrote:
>>> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>>>> On 2018-01-30 11:40 AM, Christian König wrote:
>>>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>>>>> [SNIP]
>>>>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>>>>> essentially forever? If that's ok I think we can do your process
>>>>>>> based
>>>>>>> account (minus a few minor inaccuracies for shared stuff perhaps,
>>>>>>> but no
>>>>>>> one cares about that).
>>>>>> Honestly, I think you and Christian are overthinking this. Let's try
>>>>>> charging the memory to every process which shares a buffer, and go
>>>>>> from
>>>>>> there.
>>>>> My problem is that this needs to be bullet prove.
>>>>>
>>>>> For example imagine an application which allocates a lot of BOs, then
>>>>> calls fork() and let the parent process die. The file descriptor lives
>>>>> on in the child process, but the memory is not accounted against the
>>>>> child.
>>>> What exactly are you referring to by "the file descriptor" here?
>>>
>>> The file descriptor used to identify the connection to the driver. In
>>> other words our drm_file structure in the kernel.
>>>
>>>> What happens to BO handles in general in this case? If both parent and
>>>> child process keep the same handle for the same BO, one of them
>>>> destroying the handle will result in the other one not being able to
>>>> use
>>>> it anymore either, won't it?
>>> Correct.
>>>
>>> That usage is actually not useful at all, but we already had
>>> applications which did exactly that by accident.
>>>
>>> Not to mention that somebody could do it on purpose.
>>
>> Can we just prevent child processes from using their parent's DRM file
>> descriptors altogether? Allowing it seems like a bad idea all around.
>
> Existing protocols pass DRM fds between processes though, don't they?
>
> Not child processes perhaps, but special-casing that seems like awful
> design.

Fair enough.

Can we disallow passing DRM file descriptors which have any buffers
allocated? :)


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-01-30 11:57:24

by Christian König

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

Am 30.01.2018 um 12:42 schrieb Michel Dänzer:
> On 2018-01-30 12:36 PM, Nicolai Hähnle wrote:
>> On 30.01.2018 12:34, Michel Dänzer wrote:
>>> On 2018-01-30 12:28 PM, Christian König wrote:
>>>> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>>>>> On 2018-01-30 11:40 AM, Christian König wrote:
>>>>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>>>>>> [SNIP]
>>>>>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>>>>>> essentially forever? If that's ok I think we can do your process
>>>>>>>> based
>>>>>>>> account (minus a few minor inaccuracies for shared stuff perhaps,
>>>>>>>> but no
>>>>>>>> one cares about that).
>>>>>>> Honestly, I think you and Christian are overthinking this. Let's try
>>>>>>> charging the memory to every process which shares a buffer, and go
>>>>>>> from
>>>>>>> there.
>>>>>> My problem is that this needs to be bullet prove.
>>>>>>
>>>>>> For example imagine an application which allocates a lot of BOs, then
>>>>>> calls fork() and let the parent process die. The file descriptor lives
>>>>>> on in the child process, but the memory is not accounted against the
>>>>>> child.
>>>>> What exactly are you referring to by "the file descriptor" here?
>>>> The file descriptor used to identify the connection to the driver. In
>>>> other words our drm_file structure in the kernel.
>>>>
>>>>> What happens to BO handles in general in this case? If both parent and
>>>>> child process keep the same handle for the same BO, one of them
>>>>> destroying the handle will result in the other one not being able to
>>>>> use
>>>>> it anymore either, won't it?
>>>> Correct.
>>>>
>>>> That usage is actually not useful at all, but we already had
>>>> applications which did exactly that by accident.
>>>>
>>>> Not to mention that somebody could do it on purpose.
>>> Can we just prevent child processes from using their parent's DRM file
>>> descriptors altogether? Allowing it seems like a bad idea all around.
>> Existing protocols pass DRM fds between processes though, don't they?
>>
>> Not child processes perhaps, but special-casing that seems like awful
>> design.
> Fair enough.
>
> Can we disallow passing DRM file descriptors which have any buffers
> allocated? :)

Hehe good point, but I'm sorry I have to ruin that.

The root VM page table is allocated when the DRM file descriptor is
created and we want to account those to whoever uses the file descriptor
as well.

We could now make an exception so the root VM page table isn't accounted
(it shouldn't be much compared to the rest of the VM tree), but Nicolai
is right, all those exceptions are just awful design :)

Looking into the fs layer, there actually only seem to be two functions
which are involved when a file descriptor is installed/removed from a
process. So we just need to add some callbacks there.
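
Just to sketch what that could look like, assuming the VFS grew a pair of
hooks along the lines of (*fd_installed) / (*fd_removed)(struct file *,
struct task_struct *) in struct file_operations (purely hypothetical,
neither hook nor the MM_KERNEL_PAGES counter exists today), the DRM side
could move its accounting in those callbacks:

/* Hypothetical sketch only: charge/uncharge everything already tracked
 * in f_oom_badness to the process that gains/loses the descriptor. */
static void drm_fd_installed(struct file *filp, struct task_struct *task)
{
        struct drm_file *file_priv = filp->private_data;

        if (task->mm)
                add_mm_counter(task->mm, MM_KERNEL_PAGES,
                               atomic_long_read(&file_priv->f_oom_badness));
}

static void drm_fd_removed(struct file *filp, struct task_struct *task)
{
        struct drm_file *file_priv = filp->private_data;

        if (task->mm)
                add_mm_counter(task->mm, MM_KERNEL_PAGES,
                               -atomic_long_read(&file_priv->f_oom_badness));
}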

Regards,
Christian.

2018-01-30 12:43:44

by Andrey Grodzovsky

[permalink] [raw]
Subject: Re: [PATCH 4/4] drm/amdgpu: Use drm_oom_badness for amdgpu.

That's definitely what I planned, I just didn't want to clutter the RFC
with multiple repeated changes.

Thanks,

Andrey



On 01/30/2018 04:24 AM, Daniel Vetter wrote:
> On Thu, Jan 18, 2018 at 11:47:52AM -0500, Andrey Grodzovsky wrote:
>> Signed-off-by: Andrey Grodzovsky <[email protected]>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 46a0c93..6a733cdc8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -828,6 +828,7 @@ static const struct file_operations amdgpu_driver_kms_fops = {
>> #ifdef CONFIG_COMPAT
>> .compat_ioctl = amdgpu_kms_compat_ioctl,
>> #endif
>> + .oom_file_badness = drm_oom_badness,
> Would be neat if we could roll this out for all gem drivers (once it's no
> longer an RFC ofc).
> -Daniel


2018-01-30 17:48:52

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-01-30 12:56 PM, Christian König wrote:
> Am 30.01.2018 um 12:42 schrieb Michel Dänzer:
>> On 2018-01-30 12:36 PM, Nicolai Hähnle wrote:
>>> On 30.01.2018 12:34, Michel Dänzer wrote:
>>>> On 2018-01-30 12:28 PM, Christian König wrote:
>>>>> Am 30.01.2018 um 12:02 schrieb Michel Dänzer:
>>>>>> On 2018-01-30 11:40 AM, Christian König wrote:
>>>>>>> Am 30.01.2018 um 10:43 schrieb Michel Dänzer:
>>>>>>>> [SNIP]
>>>>>>>>> Would it be ok to hang onto potentially arbitrary mmget references
>>>>>>>>> essentially forever? If that's ok I think we can do your process
>>>>>>>>> based
>>>>>>>>> account (minus a few minor inaccuracies for shared stuff perhaps,
>>>>>>>>> but no
>>>>>>>>> one cares about that).
>>>>>>>> Honestly, I think you and Christian are overthinking this. Let's
>>>>>>>> try
>>>>>>>> charging the memory to every process which shares a buffer, and go
>>>>>>>> from
>>>>>>>> there.
>>>>>>> My problem is that this needs to be bullet prove.
>>>>>>>
>>>>>>> For example imagine an application which allocates a lot of BOs,
>>>>>>> then
>>>>>>> calls fork() and let the parent process die. The file descriptor
>>>>>>> lives
>>>>>>> on in the child process, but the memory is not accounted against the
>>>>>>> child.
>>>>>> What exactly are you referring to by "the file descriptor" here?
>>>>> The file descriptor used to identify the connection to the driver. In
>>>>> other words our drm_file structure in the kernel.
>>>>>
>>>>>> What happens to BO handles in general in this case? If both parent
>>>>>> and
>>>>>> child process keep the same handle for the same BO, one of them
>>>>>> destroying the handle will result in the other one not being able to
>>>>>> use
>>>>>> it anymore either, won't it?
>>>>> Correct.
>>>>>
>>>>> That usage is actually not useful at all, but we already had
>>>>> applications which did exactly that by accident.
>>>>>
>>>>> Not to mention that somebody could do it on purpose.
>>>> Can we just prevent child processes from using their parent's DRM file
>>>> descriptors altogether? Allowing it seems like a bad idea all around.
>>> Existing protocols pass DRM fds between processes though, don't they?
>>>
>>> Not child processes perhaps, but special-casing that seems like awful
>>> design.
>> Fair enough.
>>
>> Can we disallow passing DRM file descriptors which have any buffers
>> allocated? :)
>
> Hehe good point, but I'm sorry I have to ruin that.
>
> The root VM page table is allocated when the DRM file descriptor is
> created and we want to account those to whoever uses the file descriptor
> as well.

Alternatively, since the file descriptor is closed in the sending
process in this case, maybe we can "uncharge" the buffer memory from the
sending process and charge it to the receiving one during the transfer?


> Looking into the fs layer, there actually only seem to be two functions
> which are involved when a file descriptor is installed/removed from a
> process. So we just need to add some callbacks there.

That could work for file descriptor passing, but I'm not sure it really
helps for the fork case. Let's say we charge the buffer memory to the
child process as well. If either process later destroys a buffer handle,
the buffer becomes inaccessible to the other process as well; however,
its memory remains charged to that process (even though it may already
be freed).

I think using a DRM file descriptor in both parent and child processes
is a pathological case that we really want to prevent rather than worry
about how to make it work well. It doesn't seem to work well in general
today anyway.


Maybe we could keep track of which process "owns" a DRM file descriptor,
and return an error from any relevant system calls for it from another
process. When passing an fd, its ownership would transfer to the
receiving process. When forking, the ownership would remain with the
parent process.
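
A minimal sketch of that ownership idea (the owner field and the check
are assumptions, not existing code), with drm_check_owner() called at
the start of the relevant ioctls:

/* Hypothetical sketch: remember which process "owns" a drm_file and
 * reject use from anybody else, e.g. a forked child. */
struct drm_file {
        /* ... existing members ... */
        struct pid *owner;      /* updated when the fd is passed on */
};

static int drm_check_owner(struct drm_file *file_priv)
{
        if (file_priv->owner != task_tgid(current))
                return -EACCES;
        return 0;
}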


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-03-26 14:37:48

by Lucas Stach

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

Hi all,

Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko:
> On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> > On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > >
> > > [...]
> > > > > > 2. If the OOM killer kills a process which is sharing BOs
> > > > > > with another
> > > > > > process, this should result in the other process dropping
> > > > > > its references
> > > > > > to the BOs as well, at which point the memory is released.
> > > > >
> > > > > OK. How exactly are those BOs mapped to the userspace?
> > > >
> > > > I'm not sure what you're asking. Userspace mostly uses a GEM
> > > > handle to
> > > > refer to a BO. There can also be userspace CPU mappings of the
> > > > BO's
> > > > memory, but userspace doesn't need CPU mappings for all BOs and
> > > > only
> > > > creates them as needed.
> > >
> > > OK, I guess you have to bear with me some more. This whole stack
> > > is a
> > > complete uknonwn. I am mostly after finding a boundary where you
> > > can
> > > charge the allocated memory to the process so that the oom killer
> > > can
> > > consider it. Is there anything like that? Except for the proposed
> > > file
> > > handle hack?
> >
> > How about the other way around: what APIs can we use to charge /
> > "uncharge" memory to a process? If we have those, we can experiment
> > with
> > different places to call them.
>
> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.

So is anyone still working on this? This is hurting us bad enough that
I don't want to keep this topic rotting for another year.

If no one is currently working on this I would volunteer to give the
simple "just account private, non-shared buffers in process RSS" a
spin.
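
As a first stab, the simple variant could literally be an
add_mm_counter() call at the same places where this RFC adjusts
f_oom_badness (untested sketch; MM_KERNEL_PAGES is the new counter
Michal suggests above and does not exist yet):

/* Untested sketch: account the buffer's pages in the RSS of the
 * process doing the GEM handle create/delete ioctl. */
static void drm_gem_rss_account(struct drm_gem_object *obj, long dir)
{
        if (!current->mm)
                return;

        add_mm_counter(current->mm, MM_KERNEL_PAGES,
                       dir * (long)(obj->size >> PAGE_SHIFT));
}

/* drm_gem_handle_create_tail():  drm_gem_rss_account(obj, 1);  */
/* drm_gem_handle_delete():       drm_gem_rss_account(obj, -1); */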

Regards,
Lucas

2018-04-04 09:18:14

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-03-26 04:36 PM, Lucas Stach wrote:
> Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko:
>> On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
>>> On 2018-01-24 12:50 PM, Michal Hocko wrote:
>>>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>>>>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>>>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>>>>
>>>> [...]
>>>>>>> 2. If the OOM killer kills a process which is sharing BOs
>>>>>>> with another
>>>>>>> process, this should result in the other process dropping
>>>>>>> its references
>>>>>>> to the BOs as well, at which point the memory is released.
>>>>>>
>>>>>> OK. How exactly are those BOs mapped to the userspace?
>>>>>
>>>>> I'm not sure what you're asking. Userspace mostly uses a GEM
>>>>> handle to
>>>>> refer to a BO. There can also be userspace CPU mappings of the
>>>>> BO's
>>>>> memory, but userspace doesn't need CPU mappings for all BOs and
>>>>> only
>>>>> creates them as needed.
>>>>
>>>> OK, I guess you have to bear with me some more. This whole stack
>>>> is a
>>>> complete uknonwn. I am mostly after finding a boundary where you
>>>> can
>>>> charge the allocated memory to the process so that the oom killer
>>>> can
>>>> consider it. Is there anything like that? Except for the proposed
>>>> file
>>>> handle hack?
>>>
>>> How about the other way around: what APIs can we use to charge /
>>> "uncharge" memory to a process? If we have those, we can experiment
>>> with
>>> different places to call them.
>>
>> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.
>
> So is anyone still working on this? This is hurting us bad enough that
> I don't want to keep this topic rotting for another year.
>
> If no one is currently working on this I would volunteer to give the
> simple "just account private, non-shared buffers in process RSS" a
> spin.

Sounds good. FWIW, I think shared buffers can also be easily handled by
accounting them in each process which has a reference. But that's more
of a detail, shouldn't make a big difference overall either way.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer

2018-04-04 09:38:17

by Lucas Stach

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

Am Mittwoch, den 04.04.2018, 11:09 +0200 schrieb Michel Dänzer:
> On 2018-03-26 04:36 PM, Lucas Stach wrote:
> > Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko:
> > > On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
> > > > On 2018-01-24 12:50 PM, Michal Hocko wrote:
> > > > > On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
> > > > > > On 2018-01-24 12:01 PM, Michal Hocko wrote:
> > > > > > > On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
> > > > >
> > > > > [...]
> > > > > > > > 2. If the OOM killer kills a process which is sharing BOs
> > > > > > > > with another
> > > > > > > > process, this should result in the other process dropping
> > > > > > > > its references
> > > > > > > > to the BOs as well, at which point the memory is released.
> > > > > > >
> > > > > > > OK. How exactly are those BOs mapped to the userspace?
> > > > > >
> > > > > > I'm not sure what you're asking. Userspace mostly uses a GEM
> > > > > > handle to
> > > > > > refer to a BO. There can also be userspace CPU mappings of the
> > > > > > BO's
> > > > > > memory, but userspace doesn't need CPU mappings for all BOs and
> > > > > > only
> > > > > > creates them as needed.
> > > > >
> > > > > OK, I guess you have to bear with me some more. This whole stack
> > > > > is a
> > > > > complete uknonwn. I am mostly after finding a boundary where you
> > > > > can
> > > > > charge the allocated memory to the process so that the oom killer
> > > > > can
> > > > > consider it. Is there anything like that? Except for the proposed
> > > > > file
> > > > > handle hack?
> > > >
> > > > How about the other way around: what APIs can we use to charge /
> > > > "uncharge" memory to a process? If we have those, we can experiment
> > > > with
> > > > different places to call them.
> > >
> > > add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.
> >
> > So is anyone still working on this? This is hurting us bad enough that
> > I don't want to keep this topic rotting for another year.
> >
> > If no one is currently working on this I would volunteer to give the
> > simple "just account private, non-shared buffers in process RSS" a
> > spin.
>
> Sounds good. FWIW, I think shared buffers can also be easily handled by
> accounting them in each process which has a reference. But that's more
> of a detail, shouldn't make a big difference overall either way.

Yes, both options to either never account shared buffers or to always
account them to every process having a reference should be pretty
easy. Where it gets hard is when trying to account the buffer only in
the last process holding a reference or something like this.

For the OOM case I think it makes more sense to never account shared
buffers: accounting them may lead to a process like the compositor
having its RSS inflated by shared buffers, rendering it the likely
victim for the OOM killer/reaper, while killing it will not free any
shared graphics memory, at least as long as the clients sharing the
buffers survive the killing of the compositor.

This opens up the possibility to "hide" buffers from the accounting by
sharing them, but I guess it's still much better than the nothing we do
today to account for graphics buffers.
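
If it helps, a filter for "private, non-shared" could be as simple as
the following untested sketch (whether handle_count and the dma-buf
fields are the right signal is an assumption):

/* Untested sketch: treat a BO as private if it was never exported or
 * imported via dma-buf and has at most one GEM handle. */
static bool drm_gem_object_is_private(const struct drm_gem_object *obj)
{
        return !obj->dma_buf && !obj->import_attach &&
               obj->handle_count <= 1;
}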

Regards,
Lucas

2018-04-04 09:48:05

by Michel Dänzer

[permalink] [raw]
Subject: Re: [RFC] Per file OOM badness

On 2018-04-04 11:36 AM, Lucas Stach wrote:
> Am Mittwoch, den 04.04.2018, 11:09 +0200 schrieb Michel Dänzer:
>> On 2018-03-26 04:36 PM, Lucas Stach wrote:
>>> Am Dienstag, den 30.01.2018, 11:28 +0100 schrieb Michal Hocko:
>>>> On Tue 30-01-18 10:29:10, Michel Dänzer wrote:
>>>>> On 2018-01-24 12:50 PM, Michal Hocko wrote:
>>>>>> On Wed 24-01-18 12:23:10, Michel Dänzer wrote:
>>>>>>> On 2018-01-24 12:01 PM, Michal Hocko wrote:
>>>>>>>> On Wed 24-01-18 11:27:15, Michel Dänzer wrote:
>>>>>>
>>>>>> [...]
>>>>>>>>> 2. If the OOM killer kills a process which is sharing BOs
>>>>>>>>> with another
>>>>>>>>> process, this should result in the other process dropping
>>>>>>>>> its references
>>>>>>>>> to the BOs as well, at which point the memory is released.
>>>>>>>>
>>>>>>>> OK. How exactly are those BOs mapped to the userspace?
>>>>>>>
>>>>>>> I'm not sure what you're asking. Userspace mostly uses a GEM
>>>>>>> handle to
>>>>>>> refer to a BO. There can also be userspace CPU mappings of the
>>>>>>> BO's
>>>>>>> memory, but userspace doesn't need CPU mappings for all BOs and
>>>>>>> only
>>>>>>> creates them as needed.
>>>>>>
>>>>>> OK, I guess you have to bear with me some more. This whole stack
>>>>>> is a
>>>>>> complete uknonwn. I am mostly after finding a boundary where you
>>>>>> can
>>>>>> charge the allocated memory to the process so that the oom killer
>>>>>> can
>>>>>> consider it. Is there anything like that? Except for the proposed
>>>>>> file
>>>>>> handle hack?
>>>>>
>>>>> How about the other way around: what APIs can we use to charge /
>>>>> "uncharge" memory to a process? If we have those, we can experiment
>>>>> with
>>>>> different places to call them.
>>>>
>>>> add_mm_counter() and I would add a new counter e.g. MM_KERNEL_PAGES.
>>>
>>> So is anyone still working on this? This is hurting us bad enough that
>>> I don't want to keep this topic rotting for another year.
>>>
>>> If no one is currently working on this I would volunteer to give the
>>> simple "just account private, non-shared buffers in process RSS" a
>>> spin.
>>
>> Sounds good. FWIW, I think shared buffers can also be easily handled by
>> accounting them in each process which has a reference. But that's more
>> of a detail, shouldn't make a big difference overall either way.
>
> Yes, both options to either never account shared buffers or to always
> account them to every process having a reference should be pretty
> easy. Where it gets hard is when trying to account the buffer only in
> the last process holding a reference or something like this.

FWIW, I don't think that would make sense anyway. A shared buffer is
actually used by all processes which have a reference to it, so it
should be accounted the same in all of them.


--
Earthling Michel Dänzer | http://www.amd.com
Libre software enthusiast | Mesa and X developer