The drm/i915 gpu driver loves to hang onto as much memory as it can -
we cache pinned pages, dma mappings and obviously also gpu address
space bindings of buffer objects. On top of that userspace has its own
opportunistic cache which is managed by an madvise-like ioctl to tell
the kernel which objects are purgeable and which are actually used.
This caches userspace mmappings and a bit of other metadata about
buffer objects, needed to be able to hit fastpaths even on fresh
objects.
We have routine encounters with the OOM killer due to all this craving
for memory. The latest one seems to be an artifact of the mm core
trying really hard to balance page lru evictions with shrinking
caches: The shrinker in drm/i915 doesn't actually free memory, but
only drops all the dma mappings and page refcounts so that the backing
storage (which is just shmemfs nodes) can actually be evicted.
Which means that if the core mm hasn't found anything to evict from
the page lru (most likely because drm/i915 has pinned down everything
available) it will also not shrink any of the caches. Which leads to a
premature OOM while still tons of pages used by gpu buffer objects
could be swapped out.
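The imbalance described above can be modelled in a few lines of userspace C. This is purely illustrative (all names are invented, it is not the kernel code): the reclaim core scales the shrinker target by how many LRU pages it actually scanned, so when everything is pinned the target collapses to zero.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model, not kernel code: shrink_slab() scales the shrinker
 * target by how many LRU pages page reclaim scanned.  If drm/i915 has
 * pinned everything, nr_scanned is 0, the target is 0, and the driver
 * caches are never shrunk -- premature OOM.
 */
static long shrink_target(long nr_scanned, long lru_pages,
			  long cache_objects, long batch_size,
			  bool evicts_to_page_lru)
{
	long total_scan = 0;

	if (lru_pages)
		total_scan = cache_objects * nr_scanned / lru_pages;

	/* The proposed flag guarantees a minimum of forward progress. */
	if (evicts_to_page_lru && total_scan < batch_size)
		total_scan = batch_size;

	return total_scan;
}
```

With the flag unset and nothing scanned from the LRU, the shrinker is asked for zero work; with it set, at least one batch is always attempted.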
For a quick hack I've added a shrink-me-harder flag to make sure
there's at least a bit of forward progress. It seems to work. I've
called the flag evicts_to_page_lru, but that might just be uninformed
me talking ...
We should probably also have something with a bit more smarts, to be
more aggressive when in a tight spot and to avoid the minimal shrinking
when it's not really required; maybe take scan_control->priority into
account somehow. But since I utterly lack clue I've figured sending
out a quick RFC first is better.
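One possible shape for that priority-aware variant, as a pure sketch (the scaling rule and DEF_PRIORITY usage here are invented and would need checking against the real scan_control->priority semantics, where a lower value means more reclaim pressure):

```c
#include <assert.h>

/* Invented model of a priority-scaled forced minimum scan. */
#define DEF_PRIORITY 12

static long forced_min_scan(int priority, long batch_size)
{
	/* Relaxed reclaim (high priority value): don't force anything. */
	if (priority > DEF_PRIORITY / 2)
		return 0;
	/* Tight spot: force more batches the more desperate reclaim is. */
	return batch_size * (DEF_PRIORITY / 2 - priority + 1);
}
```

This would avoid the unconditional minimal shrinking on every pass while still guaranteeing progress under real pressure.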
Also, this needs to be rebased to the new shrinker api in 3.12, I
simply haven't rolled my trees forward yet. In any case I just want to
get the discussion started on this.
Cc: Glauber Costa <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69247
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/gpu/drm/i915/i915_gem.c | 1 +
include/linux/shrinker.h | 14 ++++++++++++++
mm/vmscan.c | 4 ++++
3 files changed, 19 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d80f33d..7481d0a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4582,6 +4582,7 @@ i915_gem_load(struct drm_device *dev)
 	dev_priv->mm.inactive_shrinker.shrink = i915_gem_inactive_shrink;
 	dev_priv->mm.inactive_shrinker.seeks = DEFAULT_SEEKS;
+	dev_priv->mm.inactive_shrinker.evicts_to_page_lru = true;
 	register_shrinker(&dev_priv->mm.inactive_shrinker);
 }
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index ac6b8ee..361cc2d 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -32,6 +32,20 @@ struct shrinker {
 	int seeks;	/* seeks to recreate an obj */
 	long batch;	/* reclaim batch size, 0 = default */
 
+	/*
+	 * Some shrinkers (especially gpu drivers using gem as backing storage)
+	 * hold onto gobloads of pinned pagecache memory (from shmem nodes).
+	 * When those caches get shrunk the memory only gets unpinned and so is
+	 * available to be evicted with the page launderer.
+	 *
+	 * The problem is that the core mm tries to balance eviction from the
+	 * page lru with shrinking caches. So if there's nothing on the page lru
+	 * to evict we'll never shrink the gpu driver caches and so will OOM
+	 * despite tons of memory used by gpu buffer objects that could be
+	 * swapped out. Setting this flag ensures forward progress.
+	 */
+	bool evicts_to_page_lru;
+
 	/* These are for internal use */
 	struct list_head list;
 	atomic_long_t nr_in_batch; /* objs pending delete */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2cff0d4..d81f6e0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -254,6 +254,10 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 		total_scan = max_pass;
 	}
 
+	/* Always try to shrink a bit to make forward progress. */
+	if (shrinker->evicts_to_page_lru)
+		total_scan = max_t(long, total_scan, batch_size);
+
 	/*
 	 * We need to avoid excessive windup on filesystem shrinkers
 	 * due to large numbers of GFP_NOFS allocations causing the
--
1.8.4.rc3
On 18.09.2013 11:10, Daniel Vetter wrote:
Just now I prepared a patch changing the same function in vmscan.c
> Also, this needs to be rebased to the new shrinker api in 3.12, I
> simply haven't rolled my trees forward yet.
Well, you should. Since commit 81e49f shrinker->count_objects might be
set to SHRINK_STOP, causing shrink_slab_node() to complain loud and often:
[ 1908.234595] shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx
The kernel emitted a few thousand log lines like the one quoted above during the
last few days on my system.
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 2cff0d4..d81f6e0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -254,6 +254,10 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> total_scan = max_pass;
> }
>
> + /* Always try to shrink a bit to make forward progress. */
> + if (shrinker->evicts_to_page_lru)
> + total_scan = max_t(long, total_scan, batch_size);
> +
At that place the error message is already emitted.
> /*
> * We need to avoid excessive windup on filesystem shrinkers
> * due to large numbers of GFP_NOFS allocations causing the
Have a look at the attached patch. It fixes my problem with the erroneous/misleading
error messages, and I think it's right to just bail out early if SHRINK_STOP is found.
Do you agree ?
cu,
Knut
On Wed, Sep 18, 2013 at 12:38:23PM +0200, Knut Petersen wrote:
> On 18.09.2013 11:10, Daniel Vetter wrote:
>
> Just now I prepared a patch changing the same function in vmscan.c
> >Also, this needs to be rebased to the new shrinker api in 3.12, I
> >simply haven't rolled my trees forward yet.
>
> Well, you should. Since commit 81e49f shrinker->count_objects might be
> set to SHRINK_STOP, causing shrink_slab_node() to complain loud and often:
>
> [ 1908.234595] shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx
>
> The kernel emitted a few thousand log lines like the one quoted above during the
> last few days on my system.
>
> >diff --git a/mm/vmscan.c b/mm/vmscan.c
> >index 2cff0d4..d81f6e0 100644
> >--- a/mm/vmscan.c
> >+++ b/mm/vmscan.c
> >@@ -254,6 +254,10 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> > total_scan = max_pass;
> > }
> >+ /* Always try to shrink a bit to make forward progress. */
> >+ if (shrinker->evicts_to_page_lru)
> >+ total_scan = max_t(long, total_scan, batch_size);
> >+
> At that place the error message is already emitted.
> > /*
> > * We need to avoid excessive windup on filesystem shrinkers
> > * due to large numbers of GFP_NOFS allocations causing the
>
> Have a look at the attached patch. It fixes my problem with the erroneous/misleading
> error messages, and I think it's right to just bail out early if SHRINK_STOP is found.
>
> Do you agree ?
Looking at the patch which introduced these error messages for you (it
changed the ->count_objects return value from 0 to SHRINK_STOP): your
patch below, by treating 0 and SHRINK_STOP equally, simply reverts that
functional change.
I don't think that's the intention behind SHRINK_STOP. But if it's the
right thing to do we'd better revert the offending commit directly. And
since I lack clue I think that's a call for the core mm guys to make.
-Daniel
>
> cu,
> Knut
>
> From 75ae570ce7b0bb6b40c76beb18fc075e9af3127a Mon Sep 17 00:00:00 2001
> From: Knut Petersen <[email protected]>
> Date: Wed, 18 Sep 2013 12:06:33 +0200
> Subject: [PATCH] mm: respect SHRINK_STOP in shrink_slab_node()
>
> Since commit 81e49f811404f428a9d9a63295a0c267e802fa12
> i915_gem_inactive_count() might return SHRINK_STOP.
>
> Unfortunately SHRINK_STOP is not handled properly in
> shrink_slab_node(), causing a system log cluttered with
> kernel error messages complaining about "negative objects
> to delete".
>
> I think the proper way of handling SHRINK_STOP is obvious,
> we should obey ;-)
>
> Signed-off-by: Knut Petersen <[email protected]>
> ---
> mm/vmscan.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8ed1b77..b1e6f0d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -244,6 +244,8 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker,
>  	max_pass = shrinker->count_objects(shrinker, shrinkctl);
>  	if (max_pass == 0)
>  		return 0;
> +	if (max_pass == SHRINK_STOP)
> +		return 0;
> 
>  	/*
>  	 * copy the current shrinker scan count into a local variable
> --
> 1.8.1.4
>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
> Looking at the patch which introduced these error message for you, which
> changed the ->count_objects return value from 0 to SHRINK_STOP your patch
> below to treat 0 and SHRINK_STOP equally simply reverts the functional
> change.
Yes, for i915* it de facto restores the old behaviour.
> I don't think that's the intention behind SHRINK_STOP. But if it's the
> right think to do we better revert the offending commit directly.
But there is other code that also returns SHRINK_STOP. So I believe it's better to
adapt shrink_slab_node() to handle SHRINK_STOP properly than to revert 81e49f.
> And since I lack clue I think that's a call for core mm guys to make.
I agree. They'll probably have to apply some additional changes to
shrink_slab_node(). It really doesn't look right to me, but they certainly
know better what the code is supposed to do ;-)
cu,
Knut
On Wed, Sep 18, 2013 at 12:38:23PM +0200, Knut Petersen wrote:
> On 18.09.2013 11:10, Daniel Vetter wrote:
>
> Just now I prepared a patch changing the same function in vmscan.c
> >Also, this needs to be rebased to the new shrinker api in 3.12, I
> >simply haven't rolled my trees forward yet.
>
> Well, you should. Since commit 81e49f shrinker->count_objects might be
> set to SHRINK_STOP, causing shrink_slab_node() to complain loud and often:
>
> [ 1908.234595] shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx
>
> The kernel emitted a few thousand log lines like the one quoted above during the
> last few days on my system.
>
> >diff --git a/mm/vmscan.c b/mm/vmscan.c
> >index 2cff0d4..d81f6e0 100644
> >--- a/mm/vmscan.c
> >+++ b/mm/vmscan.c
> >@@ -254,6 +254,10 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> > total_scan = max_pass;
> > }
> >+ /* Always try to shrink a bit to make forward progress. */
> >+ if (shrinker->evicts_to_page_lru)
> >+ total_scan = max_t(long, total_scan, batch_size);
> >+
> At that place the error message is already emitted.
> > /*
> > * We need to avoid excessive windup on filesystem shrinkers
> > * due to large numbers of GFP_NOFS allocations causing the
>
> Have a look at the attached patch. It fixes my problem with the erroneous/misleading
> error messages, and I think it's right to just bail out early if SHRINK_STOP is found.
>
> Do you agree ?
No, that's wrong. ->count_objects should never ass SHRINK_STOP.
Indeed, it should always return a count of objects in the cache,
regardless of the context.
SHRINK_STOP is for ->scan_objects to tell the shrinker it can make
any progress due to the context it is called in. This allows the
shirnker to defer the work to another call in a different context.
However, if ->count-objects doesn't return a count, the work that
was supposed to be done cannot be deferred, and that is what
->count_objects should always return the number of objects in the
cache.
Cheers,
Dave.
--
Dave Chinner
[email protected]
[my keyboard my be on the fritz - it's not typing what I'm thinking...]
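The contract described in the mail above can be modelled in userspace C (invented names and numbers, not the real i915 or vmscan code): the count side always reports the cache size, and only the scan side may return SHRINK_STOP when it cannot make progress in the current context.

```c
#include <assert.h>

/* Userspace model of the ->count_objects / ->scan_objects contract. */
#define SHRINK_STOP (~0UL)

static unsigned long cache_objects = 1000;
static int can_take_locks;

/* Always a plain count, never SHRINK_STOP. */
static unsigned long model_count_objects(void)
{
	return cache_objects;
}

/* May refuse to make progress in this context, e.g. a failed trylock. */
static unsigned long model_scan_objects(unsigned long nr_to_scan)
{
	unsigned long freed;

	if (!can_take_locks)
		return SHRINK_STOP;
	freed = nr_to_scan < cache_objects ? nr_to_scan : cache_objects;
	cache_objects -= freed;
	return freed;
}
```

Because the count is always accurate, work that a blocked scan could not do stays visible and can be deferred to a later call from a friendlier context.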
On Thu, Sep 19, 2013 at 06:38:22AM +1000, Dave Chinner wrote:
> On Wed, Sep 18, 2013 at 12:38:23PM +0200, Knut Petersen wrote:
> > On 18.09.2013 11:10, Daniel Vetter wrote:
> >
> > Just now I prepared a patch changing the same function in vmscan.c
> > >Also, this needs to be rebased to the new shrinker api in 3.12, I
> > >simply haven't rolled my trees forward yet.
> >
> > Well, you should. Since commit 81e49f shrinker->count_objects might be
> > set to SHRINK_STOP, causing shrink_slab_node() to complain loud and often:
> >
> > [ 1908.234595] shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx
> >
> > The kernel emitted a few thousand log lines like the one quoted above during the
> > last few days on my system.
> >
> > >diff --git a/mm/vmscan.c b/mm/vmscan.c
> > >index 2cff0d4..d81f6e0 100644
> > >--- a/mm/vmscan.c
> > >+++ b/mm/vmscan.c
> > >@@ -254,6 +254,10 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> > > total_scan = max_pass;
> > > }
> > >+ /* Always try to shrink a bit to make forward progress. */
> > >+ if (shrinker->evicts_to_page_lru)
> > >+ total_scan = max_t(long, total_scan, batch_size);
> > >+
> > At that place the error message is already emitted.
> > > /*
> > > * We need to avoid excessive windup on filesystem shrinkers
> > > * due to large numbers of GFP_NOFS allocations causing the
> >
> > Have a look at the attached patch. It fixes my problem with the erroneous/misleading
> > error messages, and I think it's right to just bail out early if SHRINK_STOP is found.
> >
> > Do you agree ?
>
> No, that's wrong. ->count_objects should never ass SHRINK_STOP.
*pass
> Indeed, it should always return a count of objects in the cache,
> regardless of the context.
>
> SHRINK_STOP is for ->scan_objects to tell the shrinker it can make
*can't
> any progress due to the context it is called in. This allows the
> shirnker to defer the work to another call in a different context.
> However, if ->count-objects doesn't return a count, the work that
> was supposed to be done cannot be deferred, and that is what
*why
> ->count_objects should always return the number of objects in the
> cache.
Cheers,
Dave.
--
Dave Chinner
[email protected]
On Wed, Sep 18, 2013 at 10:38 PM, Dave Chinner <[email protected]> wrote:
> No, that's wrong. ->count_objects should never ass SHRINK_STOP.
> Indeed, it should always return a count of objects in the cache,
> regardless of the context.
>
> SHRINK_STOP is for ->scan_objects to tell the shrinker it can make
> any progress due to the context it is called in. This allows the
> shirnker to defer the work to another call in a different context.
> However, if ->count-objects doesn't return a count, the work that
> was supposed to be done cannot be deferred, and that is what
> ->count_objects should always return the number of objects in the
> cache.
So we should rework the locking in the drm/i915 shrinker to be able to
always count objects? Thus far no one has screamed that we're not
really able to do that in all call contexts ...
So should I revert 81e49f or will the early return 0; completely upset
the core shrinker logic?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Thu, Sep 19, 2013 at 08:57:04AM +0200, Daniel Vetter wrote:
> On Wed, Sep 18, 2013 at 10:38 PM, Dave Chinner <[email protected]> wrote:
> > No, that's wrong. ->count_objects should never ass SHRINK_STOP.
> > Indeed, it should always return a count of objects in the cache,
> > regardless of the context.
> >
> > SHRINK_STOP is for ->scan_objects to tell the shrinker it can make
> > any progress due to the context it is called in. This allows the
> > shirnker to defer the work to another call in a different context.
> > However, if ->count-objects doesn't return a count, the work that
> > was supposed to be done cannot be deferred, and that is what
> > ->count_objects should always return the number of objects in the
> > cache.
>
> So we should rework the locking in the drm/i915 shrinker to be able to
> always count objects? Thus far no one screamed yet that we're not
> really able to do that in all call contexts ...
It's not the end of the world if you count no objects. In an ideal
world, you keep a count of the object sizes on the LRU when you
add/remove the objects on the list; that way ->count_objects doesn't
need to walk or lock anything, which is what things like the inode
and dentry caches do...
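That bookkeeping scheme can be sketched as follows (a userspace model with invented names; a real kernel implementation would use atomic_long_t or rely on the lock already held on the add/remove paths):

```c
#include <assert.h>

/* Model LRU that maintains its object count at add/remove time. */
struct model_lru {
	long nr_objects;	/* maintained, never computed by walking */
};

static void model_lru_add(struct model_lru *lru)
{
	lru->nr_objects++;	/* under whatever lock the add path holds */
}

static void model_lru_del(struct model_lru *lru)
{
	lru->nr_objects--;
}

/* ->count_objects can now just read the counter, no locks, no walk. */
static long model_lru_count(const struct model_lru *lru)
{
	return lru->nr_objects;
}

/* Tiny demo: two adds and one delete leave one object counted. */
static long model_demo(void)
{
	struct model_lru lru = { 0 };

	model_lru_add(&lru);
	model_lru_add(&lru);
	model_lru_del(&lru);
	return model_lru_count(&lru);
}
```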
>
> So should I revert 81e49f or will the early return 0; completely upset
> the core shrinker logic?
It looks to me like 81e49f changed the wrong function to return
SHRINK_STOP. It should have changed i915_gem_inactive_scan() to
return SHRINK_STOP when the locks could not be taken, not
i915_gem_inactive_count().
What should happen is this:

	max_pass = count_objects()
	if (max_pass == 0)
		/* skip to next shrinker */

	/* calculate total_scan from max_pass and previous leftovers */

	while (total_scan) {
		freed = scan_objects(batch_size)
		if (freed == SHRINK_STOP)
			break; /* can't make progress */
		total_scan -= batch_size;
	}

	/* save remaining total_scan for next pass */
i.e. SHRINK_STOP will abort the scan loop when nothing can be done.
Right now, if nothing can be done because the locks can't be taken,
the scan loop will continue running until total_scan reaches zero.
i.e. it does a whole lotta nothing.
So right now, I'd revert 81e49f and then convert
i915_gem_inactive_scan() to return SHRINK_STOP if it can't get the
locks, and everything should work just fine...
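The flow sketched above can be made runnable as a userspace model (invented names, not the real shrink_slab_node(); the loop condition is tightened to >= batch_size so it terminates cleanly):

```c
#include <assert.h>

/* Userspace model of the batched scan loop honouring SHRINK_STOP. */
#define SHRINK_STOP (~0UL)

/* Returns the amount of total_scan left over for the next pass. */
static long model_scan_loop(long total_scan, long batch_size,
			    unsigned long (*scan)(long nr))
{
	while (total_scan >= batch_size) {
		unsigned long freed = scan(batch_size);

		if (freed == SHRINK_STOP)
			break;	/* can't make progress in this context */
		total_scan -= batch_size;
	}
	return total_scan;	/* saved and retried on the next pass */
}

static unsigned long scan_ok(long nr)
{
	return (unsigned long)nr;	/* pretend we freed everything */
}

static unsigned long scan_blocked(long nr)
{
	(void)nr;
	return SHRINK_STOP;	/* e.g. a trylock failed */
}
```

A blocked scan leaves the whole target as leftover work instead of spinning the loop doing nothing, which is exactly the deferral SHRINK_STOP is meant to enable.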
Cheers,
Dave.
--
Dave Chinner
[email protected]
On 19.09.2013 08:57, Daniel Vetter wrote:
> On Wed, Sep 18, 2013 at 10:38 PM, Dave Chinner <[email protected]> wrote:
>> No, that's wrong. ->count_objects should never ass SHRINK_STOP.
>> Indeed, it should always return a count of objects in the cache,
>> regardless of the context.
>>
>> SHRINK_STOP is for ->scan_objects to tell the shrinker it can make
>> any progress due to the context it is called in. This allows the
>> shirnker to defer the work to another call in a different context.
>> However, if ->count-objects doesn't return a count, the work that
>> was supposed to be done cannot be deferred, and that is what
>> ->count_objects should always return the number of objects in the
>> cache.
> So we should rework the locking in the drm/i915 shrinker to be able to
> always count objects? Thus far no one screamed yet that we're not
> really able to do that in all call contexts ...
If this had been a problem in the past, it probably would
have ended up as one of those unresolved random glitches ...
> So should I revert 81e49f or will the early return 0; completely upset
> the core shrinker logic?
After Dave's answer and a look at all the other uses of SHRINK_STOP in the
current kernel sources it is clear that 81e49f must be reverted.
Everywhere else SHRINK_STOP is returned, it is returned from ->scan_objects.
So i915_gem_inactive_scan() and not i915_gem_inactive_count()
should return that value in case of a failed trylock:
i915_gem_inactive_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
	struct drm_i915_private *dev_priv =
		container_of(shrinker,
			     struct drm_i915_private,
			     mm.inactive_shrinker);
	struct drm_device *dev = dev_priv->dev;
	int nr_to_scan = sc->nr_to_scan;
	unsigned long freed;
	bool unlock = true;

	if (!mutex_trylock(&dev->struct_mutex)) {
		if (!mutex_is_locked_by(&dev->struct_mutex, current))
-			return 0;
+			return SHRINK_STOP;
		if (dev_priv->mm.shrinker_no_lock_stealing)
-			return 0;
+			return SHRINK_STOP;
		unlock = false;
	}
Atm a kernel with 81e49f reverted,
i915_gem_inactive_scan() changed as described above,
and i915_gem_inactive_count() always counting _without_ any locking
seems to work fine here. Is locking really needed at that place?
cu,
Knut