From: "Huang\, Ying"
To: Johannes Weiner
Cc: "Huang\, Ying", Andrew Morton
Subject: Re: [PATCH -mm -v7 9/9] mm, THP, swap: Delay splitting THP during swap out
References: <20170328053209.25876-1-ying.huang@intel.com> <20170328053209.25876-10-ying.huang@intel.com> <20170329171654.GD31821@cmpxchg.org>
Date: Thu, 30 Mar 2017 12:15:13 +0800
In-Reply-To: <20170329171654.GD31821@cmpxchg.org> (Johannes Weiner's message of "Wed, 29 Mar 2017 13:16:54 -0400")
Message-ID: <871stftn72.fsf@yhuang-dev.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Johannes Weiner writes:

> On Tue, Mar 28, 2017 at 01:32:09PM +0800, Huang, Ying wrote:
>> @@ -183,12 +184,53 @@ void __delete_from_swap_cache(struct page *page)
>>  	ADD_CACHE_INFO(del_total, nr);
>>  }
>>
>> +#ifdef CONFIG_THP_SWAP_CLUSTER
>> +int add_to_swap_trans_huge(struct page *page, struct list_head *list)
>> +{
>> +	swp_entry_t entry;
>> +	int ret = 0;
>> +
>> +	/* cannot split, which may be needed during swap in, skip it */
>> +	if (!can_split_huge_page(page, NULL))
>> +		return -EBUSY;
>> +	/* fallback to split huge page firstly if no PMD map */
>> +	if (!compound_mapcount(page))
>> +		return 0;
>> +	entry = get_huge_swap_page();
>> +	if (!entry.val)
>> +		return 0;
>> +	if (mem_cgroup_try_charge_swap(page, entry, HPAGE_PMD_NR)) {
>> +		__swapcache_free(entry, true);
>> +		return -EOVERFLOW;
>> +	}
>> +	ret = add_to_swap_cache(page, entry,
>> +				__GFP_HIGH | __GFP_NOMEMALLOC|__GFP_NOWARN);
>> +	/* -ENOMEM radix-tree allocation failure */
>> +	if (ret) {
>> +		__swapcache_free(entry, true);
>> +		return 0;
>> +	}
>> +	ret = split_huge_page_to_list(page, list);
>> +	if (ret) {
>> +		delete_from_swap_cache(page);
>> +		return -EBUSY;
>> +	}
>> +	return 1;
>> +}
>> +#else
>> +static inline int add_to_swap_trans_huge(struct page *page,
>> +					 struct list_head *list)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>>  /**
>>   * add_to_swap - allocate swap space for a page
>>   * @page: page we want to move to swap
>>   *
>>   * Allocate swap space for the page and add the page to the
>> - * swap cache. Caller needs to hold the page lock.
>> + * swap cache. Caller needs to hold the page lock.
>>   */
>>  int add_to_swap(struct page *page, struct list_head *list)
>>  {
>> @@ -198,6 +240,18 @@ int add_to_swap(struct page *page, struct list_head *list)
>>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>>  	VM_BUG_ON_PAGE(!PageUptodate(page), page);
>>
>> +	if (unlikely(PageTransHuge(page))) {
>> +		err = add_to_swap_trans_huge(page, list);
>> +		switch (err) {
>> +		case 1:
>> +			return 1;
>> +		case 0:
>> +			/* fallback to split firstly if return 0 */
>> +			break;
>> +		default:
>> +			return 0;
>> +		}
>> +	}
>>  	entry = get_swap_page();
>>  	if (!entry.val)
>>  		return 0;
>
> add_to_swap_trans_huge() is too close a copy of add_to_swap(), which
> makes the code error prone for future modifications to the swap slot
> allocation protocol.
>
> This should read:
>
> retry:
> 	entry = get_swap_page(page);
> 	if (!entry.val) {
> 		if (PageTransHuge(page)) {
> 			split_huge_page_to_list(page, list);
> 			goto retry;
> 		}
> 		return 0;
> 	}

If the swap space is used up, that is, get_swap_page() cannot allocate
even 1 swap entry for a normal page, we will split the THP unnecessarily
with the change, whereas the original code just skips the THP. There may
be a performance regression here.
A similar problem exists for mem_cgroup_try_charge_swap() too. If the
mem cgroup exceeds the swap limit, the THP will be split unnecessarily
with the change too.

> And get_swap_page(), mem_cgroup_try_charge_swap() etc. should all
> check PageTransHuge() instead of having extra parameters or separate
> code paths for the huge page case.
>
> In general, don't try to tack this feature onto the side of the
> VM. Because right now, this looks a bit like the hugetlb code, with
> one big branch in the beginning that opens up an alternate
> reality. Instead, these functions should handle THP all the way down
> the stack, and without passing down redundant information.

Yes. We should share the code as much as possible. I just have some
questions as above. Could you help me on that?

Best Regards,
Huang, Ying
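
To make the concern about unnecessary splits concrete, here is a minimal
user-space sketch of the two control flows discussed in this thread. It is
a toy model and not kernel code: struct swap_model, struct toy_page,
alloc_slot(), swap_out_separate() and swap_out_retry() are all hypothetical
names, and the point at which the separate-path variant splits after
securing a normal slot is an assumption, since the excerpt does not show it.

```c
#include <stdbool.h>

/* Toy model only; every name in this block is hypothetical. */
struct swap_model {
	int free_slots;		/* free single-page swap slots */
	int free_clusters;	/* free huge (cluster-sized) swap slots */
	int splits;		/* number of THP splits performed */
};

struct toy_page {
	bool huge;		/* true while the page is still a THP */
};

/* Take one slot from the pool that matches the page size. */
bool alloc_slot(struct swap_model *m, bool huge)
{
	int *pool = huge ? &m->free_clusters : &m->free_slots;

	if (*pool == 0)
		return false;
	(*pool)--;
	return true;
}

/*
 * Separate-path flow: try a huge slot first, then fall back to a
 * normal slot. If no slot of either kind exists, the page is skipped
 * and the THP stays intact.
 */
bool swap_out_separate(struct swap_model *m, struct toy_page *p)
{
	if (p->huge && alloc_slot(m, true))
		return true;
	if (!alloc_slot(m, false))
		return false;	/* swap exhausted: skip, no split */
	if (p->huge) {		/* assumption: split once a slot is held */
		p->huge = false;
		m->splits++;
	}
	return true;
}

/*
 * Retry flow: on allocation failure, split the THP and try again
 * with a normal-sized request.
 */
bool swap_out_retry(struct swap_model *m, struct toy_page *p)
{
	for (;;) {
		if (alloc_slot(m, p->huge))
			return true;
		if (!p->huge)
			return false;
		p->huge = false;	/* split before knowing a slot exists */
		m->splits++;
	}
}
```

With both pools empty, swap_out_separate() fails and leaves the page huge,
while swap_out_retry() also fails but has already split the page once. In
this model the same shape would apply if the failing step were a swap
charge against a cgroup limit instead of slot allocation.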