From: Naoya Horiguchi
To: Mike Kravetz
CC: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Michal Hocko, Andrea Arcangeli, Kirill A. Shutemov, Mel Gorman, Davidlohr Bueso, Andrew Morton, stable@vger.kernel.org
Subject: Re: [PATCH] huegtlbfs: fix page leak during migration of file pages
Date: Fri, 8 Feb 2019 07:31:49 +0000
Message-ID: <20190208073149.GA14423@hori1.linux.bs1.fc.nec.co.jp>
In-Reply-To: <07ce373a-d9ea-f3d3-35cc-5bc181901caf@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Feb 07, 2019 at 09:50:30PM -0800, Mike Kravetz wrote:
> On 2/7/19 6:31 PM, Naoya Horiguchi wrote:
> > On Thu, Feb 07, 2019 at 10:50:55AM -0800, Mike Kravetz wrote:
> >> On 1/30/19 1:14 PM, Mike Kravetz wrote:
> >>> +++ b/fs/hugetlbfs/inode.c
> >>> @@ -859,6 +859,16 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
> >>>  	rc = migrate_huge_page_move_mapping(mapping, newpage, page);
> >>>  	if (rc != MIGRATEPAGE_SUCCESS)
> >>>  		return rc;
> >>> +
> >>> +	/*
> >>> +	 * page_private is subpool pointer in hugetlb pages, transfer
> >>> +	 * if needed.
> >>> +	 */
> >>> +	if (page_private(page) && !page_private(newpage)) {
> >>> +		set_page_private(newpage, page_private(page));
> >>> +		set_page_private(page, 0);
> >
> > You don't have to copy PagePrivate flag?
> >
>
> Well my original thought was no. For hugetlb pages, PagePrivate is not
> associated with page_private. It indicates a reservation was consumed.
> It is set when a hugetlb page is newly allocated and the allocation is
> associated with a reservation and the global reservation count is
> decremented. When the page is added to the page cache or rmap,
> PagePrivate is cleared. If the page is free'ed before being added to page
> cache or rmap, PagePrivate tells free_huge_page to restore (increment) the
> reserve count as we did not 'instantiate' the page.
>
> So, PagePrivate is only set from the time a huge page is allocated until
> it is added to page cache or rmap. My original thought was that the page
> could not be migrated during this time. However, I am not sure if that
> reasoning is correct. The page is not locked, so it would appear that it
> could be migrated? But, if it can be migrated at this time then perhaps
> there are bigger issues for the (hugetlb) page fault code?

In my understanding, free hugetlb pages are not expected to be passed to
migrate_pages(), and currently each migration caller ensures this by
checking for and skipping free hugetlb pages on its own.
migrate_pages() and its internal code are probably not written to handle
free hugetlb pages, so if such pages were accidentally passed to the
migration code, that would be a big problem, as you suspect.
So the above reasoning should hold as long as this assumption is correct.

Most migration callers are not interested in moving free hugepages.
The one I'm not sure of is the code path from alloc_contig_range().
If someone thinks it's worthwhile to migrate free hugepages to get bigger
contiguous memory, they may try to enable that code path, and then the
assumption will be broken.
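A minimal userspace sketch of the reservation lifecycle Mike describes may make the window concrete. All names here (resv_pool, resv_page, resv_alloc() and so on) are hypothetical; this is not the kernel's alloc_huge_page()/free_huge_page() code. It only tracks a PagePrivate-like "reservation consumed" flag and the global reservation count, following the description quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; nothing here is a kernel API. */
struct resv_pool {
	long resv_huge_pages;		/* global reservation count */
};

struct resv_page {
	bool reserve_consumed;		/* what PagePrivate means on a hugetlb page */
	bool instantiated;		/* added to page cache or rmap */
};

/* Allocation backed by a reservation consumes it and sets the flag. */
static void resv_alloc(struct resv_pool *pool, struct resv_page *page,
		       bool has_reservation)
{
	page->instantiated = false;
	page->reserve_consumed = false;
	if (has_reservation) {
		assert(pool->resv_huge_pages > 0);
		pool->resv_huge_pages--;
		page->reserve_consumed = true;
	}
}

/* Adding the page to the page cache or rmap clears the flag. */
static void resv_instantiate(struct resv_page *page)
{
	page->instantiated = true;
	page->reserve_consumed = false;
}

/* Freeing a page that still carries the flag restores the reservation. */
static void resv_free(struct resv_pool *pool, struct resv_page *page)
{
	if (page->reserve_consumed) {
		pool->resv_huge_pages++;
		page->reserve_consumed = false;
	}
}

/* True only between a reserved allocation and instantiation. */
static bool resv_in_window(const struct resv_page *page)
{
	return page->reserve_consumed && !page->instantiated;
}
```

In this model, resv_in_window() returning true is the state during which Mike's original reasoning assumes the page cannot be migrated; a page freed while in that state gives its reservation back, while an instantiated page does not.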

Thanks,
Naoya Horiguchi

>
> >>> +
> >>> +	}
> >>> +
> >>>  	if (mode != MIGRATE_SYNC_NO_COPY)
> >>>  		migrate_page_copy(newpage, page);
> >>>  	else
> >>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>> index f7e4bfdc13b7..0d9708803553 100644
> >>> --- a/mm/migrate.c
> >>> +++ b/mm/migrate.c
> >>> @@ -703,8 +703,14 @@ void migrate_page_states(struct page *newpage, struct page *page)
> >>>  	 */
> >>>  	if (PageSwapCache(page))
> >>>  		ClearPageSwapCache(page);
> >>> -	ClearPagePrivate(page);
> >>> -	set_page_private(page, 0);
> >>> +	/*
> >>> +	 * Unlikely, but PagePrivate and page_private could potentially
> >>> +	 * contain information needed at hugetlb free page time.
> >>> +	 */
> >>> +	if (!PageHuge(page)) {
> >>> +		ClearPagePrivate(page);
> >>> +		set_page_private(page, 0);
> >>> +	}
> >
> > # This argument is mainly for existing code...
> >
> > According to the comment on migrate_page():
> >
> >   /*
> >    * Common logic to directly migrate a single LRU page suitable for
> >    * pages that do not use PagePrivate/PagePrivate2.
> >    *
> >    * Pages are locked upon entry and exit.
> >    */
> >   int migrate_page(struct address_space *mapping, ...
> >
> > So this common logic assumes that page_private is not used, so why do
> > we explicitly clear page_private in migrate_page_states()?
> Perhaps someone else knows. If not, I can do some git research and
> try to find out why.
>
> > buffer_migrate_page(), which is commonly used for the case when
> > page_private is used, does that clearing outside migrate_page_states().
> > So I thought that hugetlbfs_migrate_page() could do in the similar manner.
> > IOW, migrate_page_states() should not do anything on PagePrivate.
> > But there're a few other .migratepage callbacks, and I'm not sure all of
> > them are safe for the change, so this approach might not fit for a small fix.
> I will look at those as well unless someone knows without researching.
>
> >
> > # BTW, there seems a typo in $SUBJECT.
>
> Thanks!
>
> --
> Mike Kravetz
>