From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Yang Shi, Gang Deng,
    Hugh Dickins, "Kirill A. Shutemov", Andrea Arcangeli,
    Matthew Wilcox, Andrew Morton, Linus Torvalds
Subject: [PATCH 5.3 031/193] mm: thp: handle page cache THP correctly in PageTransCompoundMap
Date: Mon, 11 Nov 2019 19:26:53 +0100
Message-Id: <20191111181502.629754148@linuxfoundation.org>
In-Reply-To: <20191111181459.850623879@linuxfoundation.org>
References: <20191111181459.850623879@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yang Shi

commit 169226f7e0d275c1879551f37484ef6683579a5c upstream.

We have a use case where tmpfs is the QEMU memory backend and we would
like to take advantage of THP as well.  But our tests show the EPT is
not PMD mapped even though the underlying THP is PMD mapped on the
host.  The number shown by /sys/kernel/debug/kvm/largepages is much
smaller than the number of PMD-mapped shmem pages, as below:

7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
Size:            4194304 kB
[snip]
AnonHugePages:         0 kB
ShmemPmdMapped:   579584 kB
[snip]
Locked:                0 kB

cat /sys/kernel/debug/kvm/largepages
12

And some benchmarks do worse than with anonymous THPs.

By digging into the code we figured out that commit 127393fbe597 ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks
whether there is a single PTE mapping on the page for anonymous THP
when setting up the EPT map.  But the _mapcount < 0 check doesn't work
for page cache THP, since every subpage of a page cache THP gets its
_mapcount inc'ed once it is PMD mapped, so PageTransCompoundMap()
always returns false for page cache THP.  This prevents KVM from
setting up a PMD-mapped EPT entry, so we need to handle page cache THP
correctly.
However, when a page cache THP's PMD gets split, the kernel just
removes the map instead of setting up a PTE map the way anonymous THP
does.  Before KVM calls get_user_pages() the subpages may get PTE
mapped even though the page is still a THP, since the page cache THP
may be mapped by other processes in the meantime.  So check both its
_mapcount and whether the THP has been PTE mapped.  Although this may
report some false negatives (PTE mapped by other processes), it looks
non-trivial to make the check fully accurate.

With this fix /sys/kernel/debug/kvm/largepages shows that a reasonable
number of pages are PMD mapped by EPT, as below:

7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
Size:            4194304 kB
[snip]
AnonHugePages:         0 kB
ShmemPmdMapped:   557056 kB
[snip]
Locked:                0 kB

cat /sys/kernel/debug/kvm/largepages
271

And the benchmarks perform the same as with anonymous THPs.

[yang.shi@linux.alibaba.com: v4]
  Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: dd78fedde4b9 ("rmap: support file thp")
Signed-off-by: Yang Shi
Reported-by: Gang Deng
Tested-by: Gang Deng
Suggested-by: Hugh Dickins
Acked-by: Kirill A. Shutemov
Cc: Andrea Arcangeli
Cc: Matthew Wilcox
Cc: [4.8+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

---
 include/linux/mm.h         |    5 -----
 include/linux/mm_types.h   |    5 +++++
 include/linux/page-flags.h |   20 ++++++++++++++++++--
 3 files changed, 23 insertions(+), 7 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -695,11 +695,6 @@ static inline void *kvcalloc(size_t n, s
 
 extern void kvfree(const void *addr);
 
-static inline atomic_t *compound_mapcount_ptr(struct page *page)
-{
-	return &page[1].compound_mapcount;
-}
-
 static inline int compound_mapcount(struct page *page)
 {
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -221,6 +221,11 @@ struct page {
 #endif
 } _struct_page_alignment;
 
+static inline atomic_t *compound_mapcount_ptr(struct page *page)
+{
+	return &page[1].compound_mapcount;
+}
+
 /*
  * Used for sizing the vmemmap region on some architectures
  */
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -622,12 +622,28 @@ static inline int PageTransCompound(stru
  *
  * Unlike PageTransCompound, this is safe to be called only while
  * split_huge_pmd() cannot run from under us, like if protected by the
- * MMU notifier, otherwise it may result in page->_mapcount < 0 false
+ * MMU notifier, otherwise it may result in page->_mapcount check false
  * positives.
+ *
+ * We have to treat page cache THP differently since every subpage of it
+ * would get _mapcount inc'ed once it is PMD mapped.  But, it may be PTE
+ * mapped in the current process so comparing subpage's _mapcount to
+ * compound_mapcount to filter out PTE mapped case.
  */
 static inline int PageTransCompoundMap(struct page *page)
 {
-	return PageTransCompound(page) && atomic_read(&page->_mapcount) < 0;
+	struct page *head;
+
+	if (!PageTransCompound(page))
+		return 0;
+
+	if (PageAnon(page))
+		return atomic_read(&page->_mapcount) < 0;
+
+	head = compound_head(page);
+	/* File THP is PMD mapped and not PTE mapped */
+	return atomic_read(&page->_mapcount) ==
+	       atomic_read(compound_mapcount_ptr(head));
 }
 
 /*