Date: Tue, 12 Apr 2016 14:10:20 +0200
From: Michal Hocko
To: Hugh Dickins
Cc: Andrew Morton, Vlastimil Babka, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: mmotm woes, mainly compaction
Message-ID: <20160412121020.GC10771@dhcp22.suse.cz>

On Tue 12-04-16 00:18:00, Hugh Dickins wrote:
> Michal, I'm sorry to say that I now find that I misinformed you.
>
> You'll remember when we were chasing the order=2 OOMs on two of my
> machines at the end of March (in private mail). And you sent me a
> mail containing two patches, the second "Another thing to try ...
> so this on top" doing a *migrate_mode++.
>
> I answered you definitively that the first patch worked,
> so "I haven't tried adding the one below at all".
>
> Not true, I'm afraid. Although I had split the *migrate_mode++ one
> off into a separate patch that I did not apply, I found looking back
> today (when trying to work out why order=2 OOMs were still a problem
> on mmotm 2016-04-06) that I never deleted that part from the end of
> the first patch; so in fact what I'd been testing had included the
> second; and now I find that _it_ was the effective solution.
>
> Which is particularly sad because I think we were both a bit
> uneasy about the *migrate_mode++ one: partly the style of it
> incrementing the enum; but more seriously that it advances all the
> way to MIGRATE_SYNC, when the first went only to MIGRATE_SYNC_LIGHT.

Yeah, I was thinking about this some more and I have dropped the
MIGRATE_SYNC patch because it is just too dangerous. It gets all the
way to writeout() and that is a great stack overflow hazard. But I
guess we do not need this writeout, and wait_on_page_writeback (done
from __unmap_and_move) would be sufficient. I was already thinking
about splitting MIGRATE_SYNC into two states: one allowing the wait
on events and the other allowing the writeout.

> But without it, I am still stuck with the order=2 OOMs.
>
> And worse: after establishing that that fixes the order=2 OOMs for
> me on 4.6-rc2-mm1, I thought I'd better check that the three you
> posted today (the 1/2 classzone_idx one, the 2/2 prevent looping
> forever, and the "ction-abstract-compaction-feedback-to-helpers-fix";
> but I'm too far behind to consider or try the RFC thp backoff one)
> (a) did not surprisingly fix it on their own, and (b) worked well
> with the *migrate_mode++ one added in.

I am not really sure what you have been testing here. The hugetlb load
or the same make on tmpfs? I would be really surprised if any of the
above patches made any difference for the make workload.

> (a) as you'd expect, they did not help on their own; and (b) they
> worked fine together on the G5 (until it hit the powerpc swapping
> sigsegv, which I think the powerpc guys are hoping is a figment of
> my imagination); but (b) they did not work fine together on the
> laptop, that combination now gives it order=1 OOMs. Despair.

Something is definitely wrong here. I have already seen compaction
give surprising results in some cases. Vlastimil has posted some
fixes, so maybe this is a side effect of one of those.
I also hope to come up with some reasonable set of trace points to tell
us more, but let's see whether the order-2 issue can be solved first.

This is still with the ugly enum++, but let's close our eyes and think
about something nicer...

Thanks!
---
diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index ebf3d89a3919..e1947d7af63f 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -6,11 +6,14 @@
  * on most operations but not ->writepage as the potential stall time
  * is too significant
  * MIGRATE_SYNC will block when migrating pages
+ * MIGRATE_SYNC_WRITEOUT will trigger the IO when migrating pages. Make sure
+ *	to not use this flag from deep stacks.
  */
 enum migrate_mode {
 	MIGRATE_ASYNC,
 	MIGRATE_SYNC_LIGHT,
 	MIGRATE_SYNC,
+	MIGRATE_SYNC_WRITEOUT,
 };
 
 #endif		/* MIGRATE_MODE_H_INCLUDED */
diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 539b25a76111..0f14c65865ad 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -9,7 +9,8 @@
 #define MIGRATE_MODE						\
 	EM( MIGRATE_ASYNC,	"MIGRATE_ASYNC")		\
 	EM( MIGRATE_SYNC_LIGHT,	"MIGRATE_SYNC_LIGHT")		\
-	EMe(MIGRATE_SYNC,	"MIGRATE_SYNC")
+	EM( MIGRATE_SYNC,	"MIGRATE_SYNC")			\
+	EMe(MIGRATE_SYNC_WRITEOUT, "MIGRATE_SYNC_WRITEOUT")
 
 #define MIGRATE_REASON					\
diff --git a/mm/compaction.c b/mm/compaction.c
index 68dfbc07692d..7f631a6e234f 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1839,7 +1839,7 @@ static void compact_node(int nid)
 {
 	struct compact_control cc = {
 		.order = -1,
-		.mode = MIGRATE_SYNC,
+		.mode = MIGRATE_SYNC_WRITEOUT,
 		.ignore_skip_hint = true,
 	};
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5a544c6c0717..a591b29a25ba 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1571,7 +1571,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	}
 
 	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
-				MIGRATE_SYNC, MR_MEMORY_FAILURE);
+				MIGRATE_SYNC_WRITEOUT, MR_MEMORY_FAILURE);
 	if (ret) {
 		pr_info("soft offline: %#lx: migration failed %d, type %lx\n",
 			pfn, ret, page->flags);
@@ -1651,7 +1651,7 @@ static int __soft_offline_page(struct page *page, int flags)
 					page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
-					MIGRATE_SYNC, MR_MEMORY_FAILURE);
+					MIGRATE_SYNC_WRITEOUT, MR_MEMORY_FAILURE);
 		if (ret) {
 			if (!list_empty(&pagelist)) {
 				list_del(&page->lru);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 7ad0d2eb9a2c..6cd8664c9e6e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1554,7 +1554,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 		 * migrate_pages returns # of failed pages.
 		 */
 		ret = migrate_pages(&source, alloc_migrate_target, NULL, 0,
-					MIGRATE_SYNC, MR_MEMORY_HOTPLUG);
+					MIGRATE_SYNC_WRITEOUT, MR_MEMORY_HOTPLUG);
 		if (ret)
 			putback_movable_pages(&source);
 	}
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7f80ebcd6552..a6a947980773 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1001,7 +1001,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
 
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_node_page, NULL, dest,
-					MIGRATE_SYNC, MR_SYSCALL);
+					MIGRATE_SYNC_WRITEOUT, MR_SYSCALL);
 		if (err)
 			putback_movable_pages(&pagelist);
 	}
@@ -1242,7 +1242,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 		if (!list_empty(&pagelist)) {
 			WARN_ON_ONCE(flags & MPOL_MF_LAZY);
 			nr_failed = migrate_pages(&pagelist, new_page, NULL,
-				start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
+				start, MIGRATE_SYNC_WRITEOUT, MR_MEMPOLICY_MBIND);
 			if (nr_failed)
 				putback_movable_pages(&pagelist);
 		}
diff --git a/mm/migrate.c b/mm/migrate.c
index 028814625eea..3e907354cfec 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -809,7 +809,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 {
 	if (PageDirty(page)) {
 		/* Only writeback pages in full synchronous migration */
-		if (mode != MIGRATE_SYNC)
+		if (mode != MIGRATE_SYNC_WRITEOUT)
 			return -EBUSY;
 		return writeout(mapping, page);
 	}
@@ -938,7 +938,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
 		 * the retry loop is too short and in the sync-light case,
 		 * the overhead of stalling is too much
 		 */
-		if (mode != MIGRATE_SYNC) {
+		if (mode < MIGRATE_SYNC) {
 			rc = -EBUSY;
 			goto out_unlock;
 		}
@@ -1187,7 +1187,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 		return -ENOMEM;
 
 	if (!trylock_page(hpage)) {
-		if (!force || mode != MIGRATE_SYNC)
+		if (!force || mode < MIGRATE_SYNC)
 			goto out;
 		lock_page(hpage);
 	}
@@ -1447,7 +1447,7 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 	err = 0;
 	if (!list_empty(&pagelist)) {
 		err = migrate_pages(&pagelist, new_page_node, NULL,
-				(unsigned long)pm, MIGRATE_SYNC, MR_SYSCALL);
+				(unsigned long)pm, MIGRATE_SYNC_WRITEOUT, MR_SYSCALL);
 		if (err)
 			putback_movable_pages(&pagelist);
 	}
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6d1da0ceaf1e..d80c9755ffc7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3030,8 +3030,8 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	 * failure could be caused by weak migration mode.
 	 */
 	if (compaction_failed(compact_result)) {
-		if (*migrate_mode == MIGRATE_ASYNC) {
-			*migrate_mode = MIGRATE_SYNC_LIGHT;
+		if (*migrate_mode < MIGRATE_SYNC) {
+			(*migrate_mode)++;
 			return true;
 		}
 		return false;
-- 
Michal Hocko
SUSE Labs