From: Daniel Vetter
To: Linux MM
Cc: LKML, DRI Development, Intel Graphics Development, Daniel Vetter,
	Glauber Costa, Andrew Morton, Rik van Riel, Mel Gorman,
	Johannes Weiner, Michal Hocko
Subject: [PATCH] [RFC] mm/shrinker: Add a shrinker flag to always shrink a bit
Date: Wed, 18 Sep 2013 11:10:01 +0200
Message-Id: <1379495401-18279-1-git-send-email-daniel.vetter@ffwll.ch>

The drm/i915 gpu driver loves to hang onto as much memory as it can - we
cache pinned pages, dma mappings and obviously also gpu address space
bindings of buffer objects. On top of that userspace has its own
opportunistic cache which is managed by an madvise-like ioctl to tell
the kernel which objects are purgeable and which are actually used.
This is to cache userspace mmappings and a bit of other metadata about
buffer objects needed to be able to hit fastpaths even on fresh objects.

We have routine encounters with the OOM killer due to all this craving
for memory. The latest one seems to be an artifact of the mm core trying
really hard to balance page lru evictions with shrinking caches: The
shrinker in drm/i915 doesn't actually free memory, but only drops all
the dma mappings and page refcounts so that the backing storage (which
is just shmemfs nodes) can actually be evicted.

This means that if the core mm hasn't found anything to evict from the
page lru (most likely because drm/i915 has pinned down everything
available) it will also not shrink any of the caches.
This leads to a premature OOM while tons of pages used by gpu buffer
objects could still be swapped out.

For a quick hack I've added a shrink-me-harder flag to make sure there's
at least a bit of forward progress. It seems to work. I've called the
flag evicts_to_page_lru, but that might just be uninformed me talking ...

We should also probably have something with a bit more smarts to be more
aggressive when in a tight spot and avoid the minimal shrinking when
it's not really required, so maybe take scan_control->priority into
account somehow. But since I utterly lack clue I've figured sending out
a quick rfc first is better.

Also, this needs to be rebased to the new shrinker api in 3.12, I simply
haven't rolled my trees forward yet.

In any case I just want to get the discussion started on this.

Cc: Glauber Costa
Cc: Andrew Morton
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69247
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/i915/i915_gem.c |  1 +
 include/linux/shrinker.h        | 14 ++++++++++++++
 mm/vmscan.c                     |  4 ++++
 3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d80f33d..7481d0a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4582,6 +4582,7 @@ i915_gem_load(struct drm_device *dev)

 	dev_priv->mm.inactive_shrinker.shrink = i915_gem_inactive_shrink;
 	dev_priv->mm.inactive_shrinker.seeks = DEFAULT_SEEKS;
+	dev_priv->mm.inactive_shrinker.evicts_to_page_lru = true;
 	register_shrinker(&dev_priv->mm.inactive_shrinker);
 }

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index ac6b8ee..361cc2d 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -32,6 +32,20 @@ struct shrinker {
 	int seeks;	/* seeks to recreate an obj */
 	long batch;	/* reclaim batch size, 0 = default */

+	/*
+	 * Some shrinkers (especially gpu drivers using gem as backing storage)
+	 * hold onto gobloads of pinned pagecache memory (from shmem nodes).
+	 * When those caches get shrunk the memory only gets unpinned and so is
+	 * available to be evicted with the page launderer.
+	 *
+	 * The problem is that the core mm tries to balance eviction from the
+	 * page lru with shrinking caches. So if there's nothing on the page lru
+	 * to evict we'll never shrink the gpu driver caches and so will OOM
+	 * despite tons of memory used by gpu buffer objects that could be
+	 * swapped out. Setting this flag ensures forward progress.
+	 */
+	bool evicts_to_page_lru;
+
 	/* These are for internal use */
 	struct list_head list;
 	atomic_long_t nr_in_batch; /* objs pending delete */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2cff0d4..d81f6e0 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -254,6 +254,10 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 			total_scan = max_pass;
 		}

+		/* Always try to shrink a bit to make forward progress. */
+		if (shrinker->evicts_to_page_lru)
+			total_scan = max_t(long, total_scan, batch_size);
+
 		/*
 		 * We need to avoid excessive windup on filesystem shrinkers
 		 * due to large numbers of GFP_NOFS allocations causing the
-- 
1.8.4.rc3