Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753530AbcDNL4x (ORCPT ); Thu, 14 Apr 2016 07:56:53 -0400 Received: from mx2.suse.de ([195.135.220.15]:54431 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752430AbcDNL4w (ORCPT ); Thu, 14 Apr 2016 07:56:52 -0400 Subject: Re: [PATCH 01/10] mm: update_lru_size warn and reset bad lru_size To: Hugh Dickins , Andrew Morton References: Cc: "Kirill A. Shutemov" , Andrea Arcangeli , Andres Lagar-Cavilla , Yang Shi , Ning Qu , linux-kernel@vger.kernel.org, linux-mm@kvack.org, Johannes Weiner , Michal Hocko , Vladimir Davydov From: Vlastimil Babka Message-ID: <570F8579.2070201@suse.cz> Date: Thu, 14 Apr 2016 13:56:41 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4034 Lines: 105 On 04/05/2016 10:40 PM, Hugh Dickins wrote: > Though debug kernels have a VM_BUG_ON to help protect from misaccounting > lru_size, non-debug kernels are liable to wrap it around: and then the > vast unsigned long size draws page reclaim into a loop of repeatedly > doing nothing on an empty list, without even a cond_resched(). > > That soft lockup looks confusingly like an over-busy reclaim scenario, > with lots of contention on the lru_lock in shrink_inactive_list(): > yet has a totally different origin. > > Help differentiate with a custom warning in mem_cgroup_update_lru_size(), > even in non-debug kernels; and reset the size to avoid the lockup. But > the particular bug which suggested this change was mine alone, and since > fixed. In my opinion, the code now looks quite complicated, not sure it's a good tradeoff for a rare (?) development bug. But I guess it's up to memcg maintainers which I note are not explicitly CC'd, so adding them now. Maybe more generally, we can discuss in LSF/MM's mm debugging session, what it means that DEBUG_VM check has to become unconditional. Does it mean insufficient testing with DEBUG_VM during development/integration phase? Or are some bugs so rare we can't depend on that phase to catch them? IIRC Fedora kernels are built with DEBUG_VM, unless that changed... > Make it a WARN_ONCE: the first occurrence is the most informative, a > flurry may follow, yet even when rate-limited little more is learnt. > > Signed-off-by: Hugh Dickins > --- > include/linux/mm_inline.h | 2 +- > mm/memcontrol.c | 24 ++++++++++++++++++++---- > 2 files changed, 21 insertions(+), 5 deletions(-) > > --- a/include/linux/mm_inline.h > +++ b/include/linux/mm_inline.h > @@ -35,8 +35,8 @@ static __always_inline void del_page_fro > struct lruvec *lruvec, enum lru_list lru) > { > int nr_pages = hpage_nr_pages(page); > - mem_cgroup_update_lru_size(lruvec, lru, -nr_pages); > list_del(&page->lru); > + mem_cgroup_update_lru_size(lruvec, lru, -nr_pages); > __mod_zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru, -nr_pages); > } > > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1022,22 +1022,38 @@ out: > * @lru: index of lru list the page is sitting on > * @nr_pages: positive when adding or negative when removing > * > - * This function must be called when a page is added to or removed from an > - * lru list. > + * This function must be called under lru_lock, just before a page is added > + * to or just after a page is removed from an lru list (that ordering being > + * so as to allow it to check that lru_size 0 is consistent with list_empty). > */ > void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru, > int nr_pages) > { > struct mem_cgroup_per_zone *mz; > unsigned long *lru_size; > + long size; > + bool empty; Could there be more descriptive names? lru_size vs size looks confusing. > > if (mem_cgroup_disabled()) > return; > > mz = container_of(lruvec, struct mem_cgroup_per_zone, lruvec); > lru_size = mz->lru_size + lru; > - *lru_size += nr_pages; > - VM_BUG_ON((long)(*lru_size) < 0); > + empty = list_empty(lruvec->lists + lru); > + > + if (nr_pages < 0) > + *lru_size += nr_pages; > + > + size = *lru_size; > + if (WARN_ONCE(size < 0 || empty != !size, Maybe I'm just not used enough to constructs like "empty != !size", but it really takes me longer than I'd like to get the meaning :( > + "%s(%p, %d, %d): lru_size %ld but %sempty\n", > + __func__, lruvec, lru, nr_pages, size, empty ? "" : "not ")) { > + VM_BUG_ON(1); > + *lru_size = 0; > + } > + > + if (nr_pages > 0) > + *lru_size += nr_pages; > } > > bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg) > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: email@kvack.org >