Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759986AbcDEUkJ (ORCPT ); Tue, 5 Apr 2016 16:40:09 -0400 Received: from mail-pa0-f43.google.com ([209.85.220.43]:32861 "EHLO mail-pa0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750890AbcDEUkH (ORCPT ); Tue, 5 Apr 2016 16:40:07 -0400 Date: Tue, 5 Apr 2016 13:40:04 -0700 (PDT) From: Hugh Dickins X-X-Sender: hugh@eggly.anvils To: Andrew Morton cc: "Kirill A. Shutemov" , Andrea Arcangeli , Andres Lagar-Cavilla , Yang Shi , Ning Qu , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 01/10] mm: update_lru_size warn and reset bad lru_size In-Reply-To: Message-ID: References: User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2789 Lines: 80 Though debug kernels have a VM_BUG_ON to help protect from misaccounting lru_size, non-debug kernels are liable to wrap it around: and then the vast unsigned long size draws page reclaim into a loop of repeatedly doing nothing on an empty list, without even a cond_resched(). That soft lockup looks confusingly like an over-busy reclaim scenario, with lots of contention on the lru_lock in shrink_inactive_list(): yet has a totally different origin. Help differentiate with a custom warning in mem_cgroup_update_lru_size(), even in non-debug kernels; and reset the size to avoid the lockup. But the particular bug which suggested this change was mine alone, and since fixed. Make it a WARN_ONCE: the first occurrence is the most informative, a flurry may follow, yet even when rate-limited little more is learnt. Signed-off-by: Hugh Dickins --- include/linux/mm_inline.h | 2 +- mm/memcontrol.c | 24 ++++++++++++++++++++---- 2 files changed, 21 insertions(+), 5 deletions(-) --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -35,8 +35,8 @@ static __always_inline void del_page_fro struct lruvec *lruvec, enum lru_list lru) { int nr_pages = hpage_nr_pages(page); - mem_cgroup_update_lru_size(lruvec, lru, -nr_pages); list_del(&page->lru); + mem_cgroup_update_lru_size(lruvec, lru, -nr_pages); __mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -nr_pages); } --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -1022,22 +1022,38 @@ out: * @lru: index of lru list the page is sitting on * @nr_pages: positive when adding or negative when removing * - * This function must be called when a page is added to or removed from an - * lru list. + * This function must be called under lru_lock, just before a page is added + * to or just after a page is removed from an lru list (that ordering being + * so as to allow it to check that lru_size 0 is consistent with list_empty). */ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, int nr_pages) { struct mem_cgroup_per_zone *mz; unsigned long *lru_size; + long size; + bool empty; if (mem_cgroup_disabled()) return; mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec); lru_size = mz->lru_size + lru; - *lru_size += nr_pages; - VM_BUG_ON((long)(*lru_size) < 0); + empty = list_empty(lruvec->lists + lru); + + if (nr_pages < 0) + *lru_size += nr_pages; + + size = *lru_size; + if (WARN_ONCE(size < 0 || empty != !size, + "%s(%p, %d, %d): lru_size %ld but %sempty\n", + __func__, lruvec, lru, nr_pages, size, empty ? "" : "not ")) { + VM_BUG_ON(1); + *lru_size = 0; + } + + if (nr_pages > 0) + *lru_size += nr_pages; } bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg)