From: Dan Williams
Date: Sat, 7 Jul 2018 16:22:54 -0700
Subject: Re: [PATCH -mm -v4 03/21] mm, THP, swap: Support PMD swap mapping in swap_duplicate()
To: ying.huang@intel.com
Cc: Andrew Morton, linux-mm, Linux Kernel Mailing List,
    "Kirill A. Shutemov", Andrea Arcangeli, Michal Hocko,
    Johannes Weiner, Shaohua Li, hughd@google.com, Minchan Kim,
    Rik van Riel, Dave Hansen, n-horiguchi@ah.jp.nec.com,
    zi.yan@cs.rutgers.edu, daniel.m.jordan@oracle.com
References: <20180622035151.6676-1-ying.huang@intel.com> <20180622035151.6676-4-ying.huang@intel.com>
In-Reply-To: <20180622035151.6676-4-ying.huang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 21, 2018 at 8:55 PM Huang, Ying wrote:
>
> From: Huang Ying
>
> To support swapping in a THP as a whole, we need to create a PMD swap
> mapping during swapout and maintain the PMD swap mapping count. This
> patch implements support for increasing the PMD swap mapping
> count (for swapout, fork, etc.) and setting the SWAP_HAS_CACHE flag (for
> swapin, etc.) for a huge swap cluster in the swap_duplicate() function
> family. Although it implements only part of the design of the swap
> reference count with PMD swap mapping, the whole design is described
> as follows to make the patch and the whole picture easier to
> understand.
>
> A huge swap cluster is used to hold the contents of a swapped-out THP.
> After swapout, a PMD page mapping to the THP becomes a PMD
> swap mapping to the huge swap cluster via a swap entry in the PMD, while
> a PTE page mapping to a subpage of the THP becomes a PTE swap
> mapping to a swap slot in the huge swap cluster via a swap entry in
> the PTE.
>
> If there is no PMD swap mapping and the corresponding THP is removed
> from the page cache (reclaimed), the huge swap cluster will be split
> and become a normal swap cluster.
>
> The count (cluster_count()) of the huge swap cluster is
> SWAPFILE_CLUSTER (= HPAGE_PMD_NR) + PMD swap mapping count. Because
> all swap slots in the huge swap cluster are mapped by a PTE or PMD, or
> have the SWAP_HAS_CACHE bit set, the usage count of the swap cluster is
> HPAGE_PMD_NR. The PMD swap mapping count is recorded as well, to make
> it easy to determine whether any PMD swap mappings remain.
>
> The count in swap_map[offset] is the sum of the PTE and PMD swap mapping
> counts. This means that when we increase the PMD swap mapping count, we
> need to increase swap_map[offset] for all swap slots inside the swap
> cluster. An alternative is to make swap_map[offset] record the
> PTE swap mapping count only, given that the PMD swap mapping count is
> already recorded in the count of the huge swap cluster. But that would
> require increasing swap_map[offset] when splitting the PMD swap mapping,
> which may fail because of memory allocation for swap count continuation.
> That is hard to deal with, so we chose the current solution.
>
> The PMD swap mapping to a huge swap cluster may be split when unmapping
> part of a PMD mapping, etc. That is easy because only the count of the
> huge swap cluster needs to be changed. When the last PMD swap mapping
> is gone and SWAP_HAS_CACHE is unset, we will split the huge swap
> cluster (clear the huge flag). This makes it easy to reason about the
> cluster state.
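
As a reading aid, the counting rules quoted above can be modeled in a
few lines of C. This is only a toy sketch under stated assumptions, not
code from the patch: struct model_cluster, the model_* helpers, the 512
slot count and the 0x40 flag value are all made up for illustration and
are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. Names and sizes are illustrative, not kernel definitions. */
#define MODEL_CLUSTER_SLOTS 512  /* stands in for SWAPFILE_CLUSTER == HPAGE_PMD_NR */
#define MODEL_HAS_CACHE     0x40 /* stands in for the SWAP_HAS_CACHE flag bit */

struct model_cluster {
	unsigned int count;  /* usage count (slots in use) + PMD swap mapping count */
	bool huge;           /* models the huge flag on the cluster */
	unsigned char swap_map[MODEL_CLUSTER_SLOTS]; /* per-slot map count | cache flag */
};

/* A THP was just added to the swap cache: every slot is in use and cached. */
static void model_init_huge(struct model_cluster *c)
{
	c->count = MODEL_CLUSTER_SLOTS;
	c->huge = true;
	for (int i = 0; i < MODEL_CLUSTER_SLOTS; i++)
		c->swap_map[i] = MODEL_HAS_CACHE;
}

/* The PMD swap mapping count is what the cluster count holds above the slot count. */
static unsigned int model_pmd_count(const struct model_cluster *c)
{
	return c->count - MODEL_CLUSTER_SLOTS;
}

/* Map count of one slot, with the cache flag masked off. */
static unsigned int model_slot_count(const struct model_cluster *c, int offset)
{
	assert(offset >= 0 && offset < MODEL_CLUSTER_SLOTS);
	return c->swap_map[offset] & ~MODEL_HAS_CACHE;
}

/* Duplicating a PMD swap mapping bumps the cluster count and every slot. */
static void model_dup_pmd(struct model_cluster *c)
{
	assert(c->huge);
	c->count++;
	for (int i = 0; i < MODEL_CLUSTER_SLOTS; i++) {
		assert(model_slot_count(c, i) < 0x3f); /* no count continuation here */
		c->swap_map[i]++;
	}
}

/* Duplicating a PTE swap mapping touches a single slot only. */
static void model_dup_pte(struct model_cluster *c, int offset)
{
	assert(model_slot_count(c, offset) < 0x3f);
	c->swap_map[offset]++;
}

/* Splitting a PMD swap mapping into PTE mappings changes only the cluster count. */
static void model_split_pmd(struct model_cluster *c)
{
	assert(model_pmd_count(c) > 0);
	c->count--;
}
```

The point of the model is the last helper: because swap_map[] already
includes the PMD contribution, model_split_pmd() never has to touch the
per-slot counts, which is the property the changelog relies on to avoid
a failing allocation during a split.
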
>
> A huge swap cluster will be split when splitting the THP in the swap
> cache, or when failing to allocate a THP during swapin, etc. But when
> splitting the huge swap cluster, we will not try to split all PMD swap
> mappings, because sometimes we don't have enough information available
> for that. Later, when the PMD swap mapping is duplicated or swapped in,
> etc., the PMD swap mapping will be split and fall back to the PTE
> operation.
>
> When a THP is added to the swap cache, the SWAP_HAS_CACHE flag will be
> set in the swap_map[offset] of all swap slots inside the huge swap
> cluster backing the THP. This huge swap cluster will not be split
> unless the THP is split, even if its PMD swap mapping count drops to
> 0. Later, when the THP is removed from the swap cache, the SWAP_HAS_CACHE
> flag will be cleared in the swap_map[offset] of all swap slots inside
> the huge swap cluster, and this huge swap cluster will be split if
> its PMD swap mapping count is 0.
>
> Signed-off-by: "Huang, Ying"
> Cc: "Kirill A. Shutemov"
> Cc: Andrea Arcangeli
> Cc: Michal Hocko
> Cc: Johannes Weiner
> Cc: Shaohua Li
> Cc: Hugh Dickins
> Cc: Minchan Kim
> Cc: Rik van Riel
> Cc: Dave Hansen
> Cc: Naoya Horiguchi
> Cc: Zi Yan
> Cc: Daniel Jordan
> ---
>  include/linux/huge_mm.h |   5 +
>  include/linux/swap.h    |   9 +-
>  mm/memory.c             |   2 +-
>  mm/rmap.c               |   2 +-
>  mm/swap_state.c         |   2 +-
>  mm/swapfile.c           | 287 +++++++++++++++++++++++++++++++++---------------
>  6 files changed, 214 insertions(+), 93 deletions(-)

I'm probably missing some background, but I find the patch hard to
read. Can you move some of this patch changelog into kernel-doc
comments, so it's easier to follow which helpers do what relative to
THP swap?
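
To illustrate the kind of kernel-doc commentary meant here, a comment
on a small helper might look roughly like the sketch below. The helper
model_cluster_should_split() and its parameters are hypothetical and do
not exist in the patch; the rule it documents is taken from the
changelog text (split once the last PMD swap mapping is gone and
SWAP_HAS_CACHE is unset).

```c
#include <stdbool.h>

/**
 * model_cluster_should_split() - check the count-driven split rule
 * @pmd_map_count: number of PMD swap mappings still referencing the
 *                 huge swap cluster
 * @has_cache: true while the THP is in the swap cache, i.e. while
 *             SWAP_HAS_CACHE is set for the slots of the cluster
 *
 * Hypothetical helper, for illustration only. It covers just the
 * reference-count rule from the changelog. The other split triggers
 * mentioned there (splitting the THP in the swap cache, failing to
 * allocate a THP during swapin) are not modeled.
 *
 * Return: true if the huge flag of the cluster should be cleared.
 */
bool model_cluster_should_split(unsigned int pmd_map_count, bool has_cache)
{
	return pmd_map_count == 0 && !has_cache;
}
```

Comments in this form, attached to the real helpers in mm/swapfile.c,
would let a reader check each function against the design without
rereading the whole changelog.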

>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index d3bbf6bea9e9..213d32e57c39 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -80,6 +80,11 @@ extern struct kobj_attribute shmem_enabled_attr;
>  #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
>  #define HPAGE_PMD_NR (1<<HPAGE_PMD_ORDER)
>
> +static inline bool thp_swap_supported(void)
> +{
> +       return IS_ENABLED(CONFIG_THP_SWAP);
> +}
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  #define HPAGE_PMD_SHIFT PMD_SHIFT
>  #define HPAGE_PMD_SIZE ((1UL) << HPAGE_PMD_SHIFT)
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index f73eafcaf4e9..57aa655ab27d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -451,8 +451,8 @@ extern swp_entry_t get_swap_page_of_type(int);
>  extern int get_swap_pages(int n, bool cluster, swp_entry_t swp_entries[]);
>  extern int add_swap_count_continuation(swp_entry_t, gfp_t);
>  extern void swap_shmem_alloc(swp_entry_t);
> -extern int swap_duplicate(swp_entry_t);
> -extern int swapcache_prepare(swp_entry_t);
> +extern int swap_duplicate(swp_entry_t *entry, bool cluster);

This patch introduces a new flag to swap_duplicate(), but all
usages still pass 'false', so why does this patch change the argument?
Seems this change belongs to another patch?

> +extern int swapcache_prepare(swp_entry_t entry, bool cluster);

Rather than add a cluster flag to these helpers, can the swp_entry_t
carry the cluster flag directly?
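
A rough sketch of what carrying the flag in the entry could look like
follows. It is userspace C with a made-up type: model_swp_entry_t is
not the kernel's swp_entry_t, and the assumption that the top bit of
the value is free is purely illustrative. Whether the real swap entry
encoding has a spare bit on every architecture is exactly the question
being asked.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for swp_entry_t; this is not the kernel's layout. */
typedef struct { unsigned long val; } model_swp_entry_t;

/* Assumption: the top bit of val is unused and can mark a huge-cluster entry. */
#define MODEL_CLUSTER_BIT (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))
#define MODEL_CLUSTER_SLOTS 512 /* illustrative stand-in for SWAPFILE_CLUSTER */

/* Build an entry from a slot offset, tagging it when it covers a whole cluster. */
static inline model_swp_entry_t model_entry(unsigned long offset, bool cluster)
{
	model_swp_entry_t e;

	assert((offset & MODEL_CLUSTER_BIT) == 0); /* offset must not use the flag bit */
	e.val = offset | (cluster ? MODEL_CLUSTER_BIT : 0);
	return e;
}

static inline bool model_entry_is_cluster(model_swp_entry_t e)
{
	return (e.val & MODEL_CLUSTER_BIT) != 0;
}

static inline unsigned long model_entry_offset(model_swp_entry_t e)
{
	return e.val & ~MODEL_CLUSTER_BIT;
}

/*
 * With the flag inside the entry, a duplicate-style helper needs no extra
 * bool parameter. This toy version only reports how many slots it would touch.
 */
static inline int model_swap_duplicate(model_swp_entry_t e)
{
	return model_entry_is_cluster(e) ? MODEL_CLUSTER_SLOTS : 1;
}
```

The trade-off is that every consumer of the entry then has to mask the
flag before using the offset, which is why the sketch routes all access
through model_entry_offset().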