Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754181AbdLHQQN (ORCPT ); Fri, 8 Dec 2017 11:16:13 -0500 Received: from mail-wm0-f68.google.com ([74.125.82.68]:37319 "EHLO mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753830AbdLHQQM (ORCPT ); Fri, 8 Dec 2017 11:16:12 -0500 X-Google-Smtp-Source: AGs4zMbcTTyUIOeTAEjiSndo7JNlf2RZMhUk3bmilo/JoZoA4DiyuZtWR90VloNczGq7F/ecHlUAMw== From: Michal Hocko To: Cc: Zi Yan , Naoya Horiguchi , "Kirill A. Shutemov" , Vlastimil Babka , Andrew Morton , Andrea Reale , LKML Subject: [RFC PATCH 0/3] mm: unclutter THP migration Date: Fri, 8 Dec 2017 17:15:56 +0100 Message-Id: <20171208161559.27313-1-mhocko@kernel.org> X-Mailer: git-send-email 2.15.0 In-Reply-To: <20171207143401.GK20234@dhcp22.suse.cz> References: <20171207143401.GK20234@dhcp22.suse.cz> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2638 Lines: 52 On Thu 07-12-17 15:34:01, Michal Hocko wrote: > On Thu 07-12-17 22:10:47, Zi Yan wrote: [...] > > I agree with you that we should try to migrate all tail pages if the THP > > needs to be split. But this might not be compatible with "getting > > migration results" in unmap_and_move(), since a caller of > > migrate_pages() may want to know the status of each page in the > > migration list via int **result in get_new_page() (e.g. > > new_page_node()). The caller has no idea whether a THP in its migration > > list will be split or not, thus, storing migration results might be > > quite tricky if tail pages are added into the migration list. > > Ouch. I wasn't aware of this "beauty". I will try to wrap my head around > this code and think about what to do about it. Thanks for point me to > it. OK, so was staring at this yesterday and concluded that the current implementation of do_move_page_to_node_array is unfixable to work with split_thp_page_list in migrate_pages. So I've reimplemented it to not use the quite ugly fixed sized batching. Instead I am using dynamic batching based on the same node request. See the patch 1 for more details about implementation. This will allow us to remove the quite ugly 'int **reason' from the allocation callback as well. This is patch 2 and patch 3 is finally the thp migration code. Diffstat is quite supportive for this cleanup. include/linux/migrate.h | 7 +- include/linux/page-isolation.h | 3 +- mm/compaction.c | 3 +- mm/huge_memory.c | 6 + mm/internal.h | 1 + mm/memory_hotplug.c | 5 +- mm/mempolicy.c | 40 +---- mm/migrate.c | 350 ++++++++++++++++++----------------------- mm/page_isolation.c | 3 +- 9 files changed, 177 insertions(+), 241 deletions(-) Does anybody see any issues with this approach? On a side note: There are some semantic issues I have encountered on the way but I am not addressing them here. E.g. trying to move THP tail pages will result in either success or EBUSY (the later one more likely once we isolate head from the LRU list). Hugetlb reports EACCESS on tail pages. Some errors are reported via status parameter but migration failures are not even though the original `reason' argument suggests there was an intention to do so. From a quick look into git history this never worked. I have tried to keep the semantic unchanged. Then there is a relatively minor thing that the page isolation might fail because of pages not being on the LRU - e.g. because they are sitting on the per-cpu LRU caches. Easily fixable.