From: Zi Yan
To: Ralph Campbell
Cc: Jerome Glisse, John Hubbard, Christoph Hellwig, Jason Gunthorpe,
 Ben Skeggs, Andrew Morton, Shuah Khan, "Huang, Ying"
Subject: Re: [PATCH 13/16] mm: support THP migration to device private memory
Date: Mon, 22 Jun 2020 17:53:16 -0400
Message-ID: <4C364E23-0716-4D59-85A1-0C293B86BC2C@nvidia.com>
References: <20200619215649.32297-1-rcampbell@nvidia.com>
 <20200619215649.32297-14-rcampbell@nvidia.com>

On 22 Jun 2020, at 17:31, Ralph Campbell wrote:

> On 6/22/20 1:10 PM, Zi Yan wrote:
>> On 22 Jun 2020, at 15:36, Ralph Campbell wrote:
>>
>>> On 6/21/20 4:20 PM, Zi Yan wrote:
>>>> On 19 Jun 2020, at 17:56, Ralph Campbell wrote:
>>>>
>>>>> Support transparent huge page migration to ZONE_DEVICE private memory.
>>>>> A new flag (MIGRATE_PFN_COMPOUND) is added to the input PFN array to
>>>>> indicate the huge page was fully mapped by the CPU.
>>>>> Export prep_compound_page() so that device drivers can create huge
>>>>> device private pages after calling memremap_pages().
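(Side note, to check my understanding of the export: I picture a driver
using the newly exported prep_compound_page() roughly like below, once
memremap_pages() has initialized the struct pages for the device private
region. This is an untested sketch with a hypothetical helper name, not
code from this patch:

	/* Hypothetical driver helper, not part of this patch. */
	static struct page *devmem_get_huge_page(unsigned long pfn)
	{
		struct page *page = pfn_to_page(pfn);

		/* Assemble HPAGE_PMD_NR device private pages into one THP. */
		prep_compound_page(page, HPAGE_PMD_ORDER);
		return page;
	}

Let me know if that matches the intent.)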
>>>>>
>>>>> Signed-off-by: Ralph Campbell
>>>>> ---
>>>>>  include/linux/migrate.h |   1 +
>>>>>  include/linux/mm.h      |   1 +
>>>>>  mm/huge_memory.c        |  30 ++++--
>>>>>  mm/internal.h           |   1 -
>>>>>  mm/memory.c             |  10 +-
>>>>>  mm/memremap.c           |   9 +-
>>>>>  mm/migrate.c            | 226 ++++++++++++++++++++++++++++++++--------
>>>>>  mm/page_alloc.c         |   1 +
>>>>>  8 files changed, 226 insertions(+), 53 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
>>>>> index 3e546cbf03dd..f6a64965c8bd 100644
>>>>> --- a/include/linux/migrate.h
>>>>> +++ b/include/linux/migrate.h
>>>>> @@ -166,6 +166,7 @@ static inline int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>>>>>  #define MIGRATE_PFN_MIGRATE	(1UL << 1)
>>>>>  #define MIGRATE_PFN_LOCKED	(1UL << 2)
>>>>>  #define MIGRATE_PFN_WRITE	(1UL << 3)
>>>>> +#define MIGRATE_PFN_COMPOUND	(1UL << 4)
>>>>>  #define MIGRATE_PFN_SHIFT	6
>>>>>
>>>>>  static inline struct page *migrate_pfn_to_page(unsigned long mpfn)
>>>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>>>> index dc7b87310c10..020b9dd3cddb 100644
>>>>> --- a/include/linux/mm.h
>>>>> +++ b/include/linux/mm.h
>>>>> @@ -932,6 +932,7 @@ static inline unsigned int page_shift(struct page *page)
>>>>>  }
>>>>>
>>>>>  void free_compound_page(struct page *page);
>>>>> +void prep_compound_page(struct page *page, unsigned int order);
>>>>>
>>>>>  #ifdef CONFIG_MMU
>>>>>  /*
>>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>>> index 78c84bee7e29..25d95f7b1e98 100644
>>>>> --- a/mm/huge_memory.c
>>>>> +++ b/mm/huge_memory.c
>>>>> @@ -1663,23 +1663,35 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>>>>>  	} else {
>>>>>  		struct page *page = NULL;
>>>>>  		int flush_needed = 1;
>>>>> +		bool is_anon = false;
>>>>>
>>>>>  		if (pmd_present(orig_pmd)) {
>>>>>  			page = pmd_page(orig_pmd);
>>>>> +			is_anon = PageAnon(page);
>>>>>  			page_remove_rmap(page, true);
>>>>>  			VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
>>>>>  			VM_BUG_ON_PAGE(!PageHead(page), page);
>>>>>  		} else if (thp_migration_supported()) {
>>>>>  			swp_entry_t entry;
>>>>>
>>>>> -			VM_BUG_ON(!is_pmd_migration_entry(orig_pmd));
>>>>>  			entry = pmd_to_swp_entry(orig_pmd);
>>>>> -			page = pfn_to_page(swp_offset(entry));
>>>>> +			if (is_device_private_entry(entry)) {
>>>>> +				page = device_private_entry_to_page(entry);
>>>>> +				is_anon = PageAnon(page);
>>>>> +				page_remove_rmap(page, true);
>>>>> +				VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
>>>>> +				VM_BUG_ON_PAGE(!PageHead(page), page);
>>>>> +				put_page(page);
>>>>
>>>> Why do you hide this code behind thp_migration_supported()? It seems that you just need
>>>> a pmd swap entry, not a pmd migration entry. Also, the condition is not consistent with the code
>>>> in __handle_mm_fault(), in which you handle is_device_private_entry() directly without
>>>> checking thp_migration_supported().
>>>
>>> Good point. I think "else if (thp_migration_supported())" should be
>>> "else if (is_pmd_migration_entry(orig_pmd))" since if the PMD *is*
>>> a device private or migration entry, then it should be handled and the
>>> VM_BUG_ON() should be that thp_migration_supported() is true
>>> (or maybe remove the VM_BUG_ON?).
>>
>> I disagree. A device private entry is independent of a PMD migration entry, since a device private
>> entry is just a swap entry, which is available whenever CONFIG_TRANSPARENT_HUGEPAGE is enabled.
>> So for architectures that support THP but not THP migration (like ARM64), your code should still work.
>
> I'll fix this up for v2 and you can double check me.

Sure.
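To make my suggestion concrete, here is roughly the shape I would expect
that branch to take. An untested sketch only, not the actual v2 code;
page, is_anon, and flush_needed are the locals from the surrounding hunk:

	} else if (is_swap_pmd(orig_pmd)) {
		swp_entry_t entry = pmd_to_swp_entry(orig_pmd);

		if (is_device_private_entry(entry)) {
			/*
			 * A device private entry is a plain swap entry, so it
			 * only needs CONFIG_TRANSPARENT_HUGEPAGE, not THP
			 * migration support.
			 */
			page = device_private_entry_to_page(entry);
			is_anon = PageAnon(page);
			page_remove_rmap(page, true);
			VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
			VM_BUG_ON_PAGE(!PageHead(page), page);
			put_page(page);
		} else {
			/* Anything else here must be a PMD migration entry. */
			VM_BUG_ON(!thp_migration_supported() ||
				  !is_pmd_migration_entry(orig_pmd));
			page = pfn_to_page(swp_offset(entry));
			flush_needed = 0;
		}
	}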
>
>> I would suggest you check all the uses of is_swap_pmd() and make sure the code
>> can handle is_device_private_entry().
>
> OK.
>
>> For new device private code, you might need to guard it either statically or dynamically in case
>> CONFIG_DEVICE_PRIVATE is disabled. Ideally, you would make sure a system without
>> CONFIG_DEVICE_PRIVATE never sees is_device_private_entry() == true, and errors out if it does.
>
> I have compiled and run with CONFIG_DEVICE_PRIVATE off, but I can test more combinations of
> config settings.

Thanks.

>
>>>
>>>> Do we need to support split_huge_pmd() if a page is migrated to device? Any new code
>>>> needed in split_huge_pmd()?
>>>
>>> I was thinking that any CPU usage of the device private page would cause it to be
>>> migrated back to system memory as a whole PMD/PUD page, but I'll double check.
>>> At least there should be a check that the page isn't a device private page.
>>
>> Well, that depends. If we can allocate a THP in CPU memory, we can migrate the whole page back.
>> But if no THP can be allocated due to low free memory or memory fragmentation, I think you
>> might need a fallback plan: either splitting the device private page and migrating smaller
>> pages instead, or reclaiming CPU memory until you get a THP. IMHO, the former might be
>> preferred, since the latter might cost a lot of CPU cycles and still give no THP after all.
>
> Sounds reasonable. I'll work on adding the fallback path for v2.

Ying (cc'd) developed the code to swap out and swap in a THP in one piece:
https://lore.kernel.org/linux-mm/20181207054122.27822-1-ying.huang@intel.com/.
I am not sure whether that patchset made it into the mainline or not, but it could be a good
technical reference for swapping in device private pages, although swapping in pages from disk
and from device private memory are two different scenarios.

Since device private memory swapin impacts core mm performance, we might want to discuss
your patches with more people, like the ones from Ying's patchset, in the next version.

--
Best Regards,
Yan Zi
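P.S. For the fallback path above, this is the rough shape I have in mind.
An untested sketch: alloc_pages_vma(), split_huge_page(), and
GFP_TRANSHUGE_LIGHT are the existing interfaces, while the two
migrate_back_* helpers are hypothetical placeholders for the real
migration steps:

	/* Untested sketch of the migrate-back fallback, not real code. */
	static int migrate_device_thp_back(struct vm_area_struct *vma,
					   struct page *device_page,
					   unsigned long addr)
	{
		struct page *new_page;

		/* Preferred: allocate a THP and migrate the whole 2MB page back. */
		new_page = alloc_pages_vma(GFP_TRANSHUGE_LIGHT, HPAGE_PMD_ORDER,
					   vma, addr, numa_node_id(), true);
		if (new_page)
			return migrate_back_as_thp(device_page, new_page); /* hypothetical */

		/*
		 * Fallback: no THP available (low memory or fragmentation).
		 * Split the device private page and migrate base pages, rather
		 * than spending CPU cycles on reclaim that may still not yield
		 * a THP.
		 */
		if (split_huge_page(device_page))
			return -EBUSY;
		return migrate_back_as_base_pages(device_page, vma, addr); /* hypothetical */
	}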