Received: by 2002:a6b:fb09:0:0:0:0:0 with SMTP id h9csp212940iog; Tue, 28 Jun 2022 20:58:50 -0700 (PDT) X-Google-Smtp-Source: AGRyM1uBi9+wNmCfQ0nMTX4zld+cVqqDqLnCW3zv4yrDv7X/prTT752YsiVTE0fbXZQ8eNQcZDir X-Received: by 2002:a63:85c6:0:b0:40d:80e9:49e7 with SMTP id u189-20020a6385c6000000b0040d80e949e7mr1257018pgd.478.1656475130030; Tue, 28 Jun 2022 20:58:50 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1656475130; cv=pass; d=google.com; s=arc-20160816; b=JB02S4rfJhJ56Akn/Y6Rs3YeIlIZqkpT8RUGOv2qctBoUIhMekE2ve/U+rBD7IuqKy V3gKKyzfG6wB9TkkPxv0LuDMs0aSou5s2RdrhwytCXaAnf3nYuTq3H9xRqkaMZhGXm4k br01OsTrca7hOxDtTsf4w53zvfxmORki/V52T3QLw/zGCOMa1F9/T/DJURrNlqTZOiyz IA4dL9CUwXbHTVSAa2OPVzsSEXNXxvcBsx9vysPXIxnH/+dw8lXFUWMO3CsuYolGn2FA 6bsEQxGsq7R9oUIopuioRLa+msZefLSOS0U0a7ZQZ/dBpKUYe8Pa4VNVbaUjnKmTqYwj p/lA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=VaeO4CpO6lgNhNxmThByUtktpghZ0UDD5SZJhWAEnj8=; b=ASMOUHxMTrTx5jkmL42kbjdYnHOX0ZP1z2WUC/6QIZrqjNMWpLC8vSfnUrWk1Lfvwi Pc0hgqOw6DNzS7yLZRmWabkunEuPVI0/Yy8/wdiSA1DiyPyCum+i1y5recB48ORPshaA 5dFSdnQfU2eg8i3Jw9Epf5W2/GJd+ARjyY2FyF+G0u21M5FakjKs8eG3NBw8FDyXXLUM +jXFfyOL52BYkyUGIXqBdBOJu2ydw28UXFloMnHZ1MGoIG3beKPrjCMyP7ll4ZEPPuEz XpBSpArdpx4QLsk5R08Nw6eqeWBRBHNISZlfF35+Jx0q3KEi1eIpz6jtmI2Jh2XdRt9X Y2GA== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@amd.com header.s=selector1 header.b=XRsRGv43; arc=pass (i=1 spf=pass spfdomain=amd.com dmarc=pass fromdomain=amd.com); spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amd.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a1-20020a17090a740100b001e2bf2a53ccsi2002078pjg.72.2022.06.28.20.58.30; Tue, 28 Jun 2022 20:58:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@amd.com header.s=selector1 header.b=XRsRGv43; arc=pass (i=1 spf=pass spfdomain=amd.com dmarc=pass fromdomain=amd.com); spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amd.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231334AbiF2DzU (ORCPT + 99 others); Tue, 28 Jun 2022 23:55:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231339AbiF2DzQ (ORCPT ); Tue, 28 Jun 2022 23:55:16 -0400 Received: from NAM12-BN8-obe.outbound.protection.outlook.com (mail-bn8nam12on2052.outbound.protection.outlook.com [40.107.237.52]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4CE0B2ED56; Tue, 28 Jun 2022 20:55:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mAjC96V+enF+HkfVyhngLOdCeVdfQszaDYaLVl82PEQsNSex16EvgFjENwtdC3goshvZ1Z/SqiKxwiWfxF0qiOTTgL6OY/UuD05//SwrKiJ163LSoZITKuyR3Tg/mpIPsssIzoExRY4Ow4Ueyz6KR8ThLIL9b2RUwUgXslamkNaoutXWJ6+UYEZQyePLJxF56dYjtQ2CUfWAxPYIOEiz9SEtw1PdWCE6nPyxL2jq/UnrrLJMwGcQMSCo/0yWWHrTvnjBT1naMZ5fSfD0A5v44LmeNPhBqKzpE/PJu9ZVGHPSz6mPra4Jvpe2d/iR+50yqPySlCSEBVAtJks0I7Lggw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=VaeO4CpO6lgNhNxmThByUtktpghZ0UDD5SZJhWAEnj8=; b=FQ4wA/BzLLkauIyM+/1Xix4AFM+C1374j9BhKOPZBOFmFhVewPvex2e6jcZjkDkPkw5Ll1owaqzZvRx4VZllX69AmMOMOFWmC6KJFu3gAxdcbcvwGpDmNYFkkh1rycrsH9E1n2DsvHqjKy17O9pRTms1t809MwCpGOk3w8ADDfYCme/NEwTCXwIi9orxYq6y45djz8X9lyqoaEgNcUbLrVdEXbaEzam1UP/y1/0DL3lh+vhCtafmRvr6KlsmFAVwz8OBr3h5wXrpb653OZFw6bT0CZM3zV50Jlw1lrHJR2MgqJvtoM0YFVTMPchh72i9BpcpFz9iUArAK3K4IjKDiQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=nvidia.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=VaeO4CpO6lgNhNxmThByUtktpghZ0UDD5SZJhWAEnj8=; b=XRsRGv43Iwc6aFrAjLHRKMD9TdJnXdvJhBhmI9dk4q1bmMz0HjD7cY942IL66nf+/0HtpPso2xAa1V0kWpC2M9aI6d65Gt+e5zGhQm+mwsRcxsEQMb54yavwA0IxM9+ITO7+ZptUzPWp6tjP2Zz+EP97bi9oE9XNsV57Fgl/VOE= Received: from DM6PR07CA0109.namprd07.prod.outlook.com (2603:10b6:5:330::17) by DM6PR12MB4388.namprd12.prod.outlook.com (2603:10b6:5:2a9::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5373.17; Wed, 29 Jun 2022 03:55:06 +0000 Received: from DM6NAM11FT045.eop-nam11.prod.protection.outlook.com (2603:10b6:5:330:cafe::16) by DM6PR07CA0109.outlook.office365.com (2603:10b6:5:330::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5373.17 via Frontend Transport; Wed, 29 Jun 2022 03:55:06 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by DM6NAM11FT045.mail.protection.outlook.com (10.13.173.123) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.5373.15 via Frontend Transport; Wed, 29 Jun 2022 03:55:06 +0000 Received: from alex-MS-7B09.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.28; Tue, 28 Jun 2022 22:55:05 -0500 From: Alex Sierra To: CC: , , , , , , , , , , , , Subject: [PATCH v7 02/14] mm: add zone device coherent type memory support Date: Tue, 28 Jun 2022 22:54:14 -0500 Message-ID: <20220629035426.20013-3-alex.sierra@amd.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220629035426.20013-1-alex.sierra@amd.com> References: <20220629035426.20013-1-alex.sierra@amd.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: f87d16d4-8353-4d9c-3f62-08da59832781 X-MS-TrafficTypeDiagnostic: DM6PR12MB4388:EE_ X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: ePahJOOuwiAbT1nhEyww9Hk4ukCNrVcZwkTsXsckpLN5aazSxzcvZpzs8fN9wWnesO4fPfd9UlAyocVYTz3hdX9ncCG4yIkAWVQz8Q08Ge8v8gY8Zym89pIAeqYU88LQ6xtM2dUlM9KBelDgwexrGnOSG5Y4biL+cJSyxnJR0hpdrqiVH9qGkkTakSMY9NSX/JgETVcREQfyzVWluTwh29xtlg4zYSrNt+JVsknwVWFLDopjfdtuM8cZq/LDYU0h8qxsFE6FOwTi/uhuTfcAnIEnHqn2afeQNy8x2SpRByewCRBIF7CG5OkSqQExh4V86uyurKAGBYagVKVIG2PflHxDvfGrnVIPMVYziRZ6yP13zSMnZEf7eCmljUGNFA+HqRLSvYB3NrDreIvxNYs3qOQLU7T9Z5nFcJpVPsop5d3oDjbakWLpQoKxr8OFIzNMJ/MpN4c7BcdKlJQlGsH00zWAcWDwBrEoHzaObC5Brke8yKhf8RDMErRe4Rhd6msT4GCEpKIbK1TyL3r3ymctLPc0p8jo6NOW9vDAfGEx5hO68sJKKpU+7u6z8bZQRSV2y18IuTIxCtHroWN28VYS8e2EXWhzborJ9E6LnqHvpIc78gToNZGiatwIPRLWF+WvsiE3ENMkffXGdlK3nCpkY/aDeb2mFhr+aWxFOlUQhRsevRxtuAcvg3o5gA32Ei2gRQRjMq4DFFhBuoKWtpF/tyi7DGhW5p5iha9PjhNsejaRVIcUAfqYIpXmu4x1NAoNS6I/2JwVkGKrnd2LdQ0BNQ4OrNNJUUpsNF0nM/viZrL1iIEThrePIZ1uC5ZZTJVbZ6JZnWjmo/IrlHVMSlX7vw== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230016)(4636009)(346002)(376002)(136003)(39860400002)(396003)(46966006)(36840700001)(40470700004)(70586007)(186003)(26005)(41300700001)(4326008)(70206006)(2616005)(16526019)(83380400001)(82310400005)(7416002)(8936002)(7696005)(47076005)(2906002)(478600001)(8676002)(86362001)(5660300002)(6666004)(6916009)(1076003)(54906003)(36860700001)(316002)(81166007)(82740400003)(426003)(36756003)(40480700001)(44832011)(356005)(40460700003)(336012)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 29 Jun 2022 03:55:06.3778 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f87d16d4-8353-4d9c-3f62-08da59832781 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT045.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB4388 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. Signed-off-by: Alex Sierra Acked-by: Felix Kuehling Reviewed-by: Alistair Popple [hch: rebased ontop of the refcount changes, removed is_dev_private_or_coherent_page] Signed-off-by: Christoph Hellwig --- include/linux/memremap.h | 22 +++++++++++++++++++++- mm/memcontrol.c | 7 ++++--- mm/memory-failure.c | 8 ++++++-- mm/memremap.c | 10 ++++++++++ mm/migrate_device.c | 16 +++++++--------- mm/rmap.c | 5 +++-- 6 files changed, 51 insertions(+), 17 deletions(-) diff --git a/include/linux/memremap.h b/include/linux/memremap.h index c272bd0af3c1..6fc0ced64b2d 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/vm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -143,6 +151,17 @@ static inline bool folio_is_device_private(const struct folio *folio) return is_device_private_page(&folio->page); } +static inline bool is_device_coherent_page(const struct page *page) +{ + return is_zone_device_page(page) && + page->pgmap->type == MEMORY_DEVICE_COHERENT; +} + +static inline bool folio_is_device_coherent(const struct folio *folio) +{ + return is_device_coherent_page(&folio->page); +} + static inline bool is_pci_p2pdma_page(const struct page *page) { return IS_ENABLED(CONFIG_PCI_P2PDMA) && @@ -160,7 +179,8 @@ static inline bool is_longterm_pinnable_page(struct page *page) if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) return false; #endif - return !(is_zone_movable_page(page) || + return !(is_device_coherent_page(page) || + is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page))); } #else diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 618c366a2f07..5d37a85c67da 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5665,8 +5665,8 @@ static int mem_cgroup_move_account(struct page *page, * 2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a * target for charge migration. if @target is not NULL, the entry is stored * in target->ent. - * 3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but page is MEMORY_DEVICE_PRIVATE - * (so ZONE_DEVICE page and thus not on the lru). + * 3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but page is device memory and + * thus not on the lru. * For now we such page is charge like a regular page would be as for all * intent and purposes it is just special memory taking the place of a * regular page. @@ -5704,7 +5704,8 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, */ if (page_memcg(page) == mc.from) { ret = MC_TARGET_PAGE; - if (is_device_private_page(page)) + if (is_device_private_page(page) || + is_device_coherent_page(page)) ret = MC_TARGET_DEVICE; if (target) target->page = page; diff --git a/mm/memory-failure.c b/mm/memory-failure.c index da39ec8afca8..79f175eeb190 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1685,12 +1685,16 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags, goto unlock; } - if (pgmap->type == MEMORY_DEVICE_PRIVATE) { + switch (pgmap->type) { + case MEMORY_DEVICE_PRIVATE: + case MEMORY_DEVICE_COHERENT: /* - * TODO: Handle HMM pages which may need coordination + * TODO: Handle device pages which may need coordination * with device-side memory. */ goto unlock; + default: + break; } /* diff --git a/mm/memremap.c b/mm/memremap.c index b870a659eee6..0f8f08f8a991 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -315,6 +315,16 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) return ERR_PTR(-EINVAL); } break; + case MEMORY_DEVICE_COHERENT: + if (!pgmap->ops->page_free) { + WARN(1, "Missing page_free method\n"); + return ERR_PTR(-EINVAL); + } + if (!pgmap->owner) { + WARN(1, "Missing owner\n"); + return ERR_PTR(-EINVAL); + } + break; case MEMORY_DEVICE_FS_DAX: if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) { WARN(1, "File system DAX not supported\n"); diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 5052093d0262..a4847ad65da3 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -518,7 +518,7 @@ EXPORT_SYMBOL(migrate_vma_setup); * handle_pte_fault() * do_anonymous_page() * to map in an anonymous zero page but the struct page will be a ZONE_DEVICE - * private page. + * private or coherent page. */ static void migrate_vma_insert_page(struct migrate_vma *migrate, unsigned long addr, @@ -594,11 +594,8 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate, page_to_pfn(page)); entry = swp_entry_to_pte(swp_entry); } else { - /* - * For now we only support migrating to un-addressable device - * memory. - */ - if (is_zone_device_page(page)) { + if (is_zone_device_page(page) && + !is_device_coherent_page(page)) { pr_warn_once("Unsupported ZONE_DEVICE page type.\n"); goto abort; } @@ -701,10 +698,11 @@ void migrate_vma_pages(struct migrate_vma *migrate) mapping = page_mapping(page); - if (is_device_private_page(newpage)) { + if (is_device_private_page(newpage) || + is_device_coherent_page(newpage)) { /* - * For now only support private anonymous when migrating - * to un-addressable device memory. + * For now only support anonymous memory migrating to + * device private or coherent memory. */ if (mapping) { migrate->src[i] &= ~MIGRATE_PFN_MIGRATE; diff --git a/mm/rmap.c b/mm/rmap.c index 5bcb334cd6f2..04fac1af870b 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1957,7 +1957,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, /* Update high watermark before we lower rss */ update_hiwater_rss(mm); - if (folio_is_zone_device(folio)) { + if (folio_is_device_private(folio)) { unsigned long pfn = folio_pfn(folio); swp_entry_t entry; pte_t swp_pte; @@ -2131,7 +2131,8 @@ void try_to_migrate(struct folio *folio, enum ttu_flags flags) TTU_SYNC))) return; - if (folio_is_zone_device(folio) && !folio_is_device_private(folio)) + if (folio_is_zone_device(folio) && + (!folio_is_device_private(folio) && !folio_is_device_coherent(folio))) return; /* -- 2.32.0