Date: Fri, 19 Feb 2021 09:47:41 +0000
From: Christoph Hellwig <hch@infradead.org>
To: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org, nouveau@lists.freedesktop.org, bskeggs@redhat.com,
	akpm@linux-foundation.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, kvm-ppc@vger.kernel.org,
	dri-devel@lists.freedesktop.org, jhubbard@nvidia.com,
	rcampbell@nvidia.com, jglisse@redhat.com, jgg@nvidia.com,
	hch@infradead.org, daniel@ffwll.ch
Subject: Re: [PATCH v2 1/4] hmm: Device exclusive memory access
Message-ID: <20210219094741.GA641389@infradead.org>
References: <20210219020750.16444-1-apopple@nvidia.com>
 <20210219020750.16444-2-apopple@nvidia.com>
In-Reply-To: <20210219020750.16444-2-apopple@nvidia.com>

> 			page = migration_entry_to_page(swpent);
> 		else if (is_device_private_entry(swpent))
> 			page = device_private_entry_to_page(swpent);
> +		else if (is_device_exclusive_entry(swpent))
> +			page = device_exclusive_entry_to_page(swpent);

> 			page = migration_entry_to_page(swpent);
> 		else if (is_device_private_entry(swpent))
> 			page = device_private_entry_to_page(swpent);
> +		else if (is_device_exclusive_entry(swpent))
> +			page = device_exclusive_entry_to_page(swpent);

> 	if (is_device_private_entry(entry))
> 		page = device_private_entry_to_page(entry);
> +
> +	if (is_device_exclusive_entry(entry))
> +		page = device_exclusive_entry_to_page(entry);

Any chance we can come up with a clever scheme to avoid all this
boilerplate code (and maybe also what it gets compiled to)?

> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 866a0fa104c4..5d28ff6d4d80 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -109,6 +109,10 @@ struct hmm_range {
>   */
>  int hmm_range_fault(struct hmm_range *range);
>  
> +int hmm_exclusive_range(struct mm_struct *mm, unsigned long start,
> +			unsigned long end, struct page **pages);
> +vm_fault_t hmm_remove_exclusive_entry(struct vm_fault *vmf);

Can we avoid the hmm naming for new code (we should probably also kill
it off for the existing code)?

> +#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e) \
> +	|| is_device_exclusive_entry(e)); })
> +#define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e) \
> +	|| is_device_exclusive_entry(e)); })

Can you turn these into properly formatted inline functions?  As-is this
becomes pretty unreadable.

> +static inline void make_device_exclusive_entry_read(swp_entry_t *entry)
> +{
> +	*entry = swp_entry(SWP_DEVICE_EXCLUSIVE_READ, swp_offset(*entry));
> +}

s/make_device_exclusive_entry_read/mark_device_exclusive_entry_readable/ ??

> +
> +static inline swp_entry_t make_device_exclusive_entry(struct page *page, bool write)
> +{
> +	return swp_entry(write ? SWP_DEVICE_EXCLUSIVE_WRITE : SWP_DEVICE_EXCLUSIVE_READ,
> +			 page_to_pfn(page));
> +}

I'd split this into two helpers, which is easier to follow and avoids
the pointlessly overlong lines.
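Something like this, maybe (completely untested, and the helper names
here are just a sketch, not taken from the patch):

	static inline swp_entry_t make_readable_device_exclusive_entry(struct page *page)
	{
		return swp_entry(SWP_DEVICE_EXCLUSIVE_READ, page_to_pfn(page));
	}

	static inline swp_entry_t make_writable_device_exclusive_entry(struct page *page)
	{
		return swp_entry(SWP_DEVICE_EXCLUSIVE_WRITE, page_to_pfn(page));
	}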
> +static inline bool is_device_exclusive_entry(swp_entry_t entry)
> +{
> +	int type = swp_type(entry);
> +	return type == SWP_DEVICE_EXCLUSIVE_READ || type == SWP_DEVICE_EXCLUSIVE_WRITE;
> +}

Another overly long line.  I also wouldn't bother with the local
variable:

	return swp_type(entry) == SWP_DEVICE_EXCLUSIVE_READ ||
	       swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE;

> +static inline bool is_write_device_exclusive_entry(swp_entry_t entry)
> +{
> +	return swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE;
> +}

Or reuse these kinds of helpers.

> +
> +static inline unsigned long device_exclusive_entry_to_pfn(swp_entry_t entry)
> +{
> +	return swp_offset(entry);
> +}
> +
> +static inline struct page *device_exclusive_entry_to_page(swp_entry_t entry)
> +{
> +	return pfn_to_page(swp_offset(entry));
> +}

I'd rather open code these two, and as a prep patch also kill off the
equivalents for the migration and device private entries, which would
actually clean up a lot of the mess mentioned in my first comment above.

> +static int hmm_exclusive_skip(unsigned long start,
> +			      unsigned long end,
> +			      __always_unused int depth,
> +			      struct mm_walk *walk)
> +{
> +	struct hmm_exclusive_walk *hmm_exclusive_walk = walk->private;
> +	unsigned long addr;
> +
> +	for (addr = start; addr < end; addr += PAGE_SIZE)
> +		hmm_exclusive_walk->pages[hmm_exclusive_walk->npages++] = NULL;
> +
> +	return 0;
> +}

Wouldn't pre-zeroing the array be simpler and more efficient?

> +int hmm_exclusive_range(struct mm_struct *mm, unsigned long start,
> +			unsigned long end, struct page **pages)
> +{
> +	struct hmm_exclusive_walk hmm_exclusive_walk = { .pages = pages, .npages = 0 };
> +	int i;
> +
> +	/* Collect and lock candidate pages */
> +	walk_page_range(mm, start, end, &hmm_exclusive_walk_ops, &hmm_exclusive_walk);

Please avoid the overly long lines.  But more importantly:  unless I'm
missing something obvious, this walk_page_range call just open codes
get_user_pages_fast; why can't you use that?

> +#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_HUGETLB)
> +	if (PageTransHuge(page)) {
> +		VM_BUG_ON_PAGE(1, page);
> +		continue;
> +	}
> +#endif

Doesn't PageTransHuge always return false for that case?  If not,
shouldn't we make sure it does?

> +
> +	pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
> +	if (pte_swp_soft_dirty(*pvmw.pte))
> +		pte = pte_mksoft_dirty(pte);
> +
> +	entry = pte_to_swp_entry(*pvmw.pte);
> +	if (pte_swp_uffd_wp(*pvmw.pte))
> +		pte = pte_mkuffd_wp(pte);
> +	else if (is_write_device_exclusive_entry(entry))
> +		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> +
> +	set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
> +
> +	/*
> +	 * No need to take a page reference as one was already
> +	 * created when the swap entry was made.
> +	 */
> +	if (PageAnon(page))
> +		page_add_anon_rmap(page, vma, pvmw.address, false);
> +	else
> +		page_add_file_rmap(page, false);
> +
> +	if (vma->vm_flags & VM_LOCKED)
> +		mlock_vma_page(page);
> +
> +	/*
> +	 * No need to invalidate - it was non-present before. However
> +	 * secondary CPUs may have mappings that need invalidating.
> +	 */
> +	update_mmu_cache(vma, pvmw.address, pvmw.pte);

It would be nice to split out the body of this loop into a helper.
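Roughly like this (untested; the name and signature are made up, the
body is just the code quoted above pulled out of the loop):

	static void restore_exclusive_pte(struct vm_area_struct *vma,
			struct page *page, struct page_vma_mapped_walk *pvmw)
	{
		swp_entry_t entry = pte_to_swp_entry(*pvmw->pte);
		pte_t pte;

		pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
		if (pte_swp_soft_dirty(*pvmw->pte))
			pte = pte_mksoft_dirty(pte);
		if (pte_swp_uffd_wp(*pvmw->pte))
			pte = pte_mkuffd_wp(pte);
		else if (is_write_device_exclusive_entry(entry))
			pte = maybe_mkwrite(pte_mkdirty(pte), vma);

		set_pte_at(vma->vm_mm, pvmw->address, pvmw->pte, pte);

		/* the swap entry already holds a page reference */
		if (PageAnon(page))
			page_add_anon_rmap(page, vma, pvmw->address, false);
		else
			page_add_file_rmap(page, false);

		if (vma->vm_flags & VM_LOCKED)
			mlock_vma_page(page);

		/* no invalidation needed, the entry was non-present before */
		update_mmu_cache(vma, pvmw->address, pvmw->pte);
	}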
> +	if (!is_device_private_entry(entry) &&
> 			!is_device_exclusive_entry(entry))

The normal style for this would be:

	if (!is_device_private_entry(entry) &&
	    !is_device_exclusive_entry(entry))

> -	if (!is_device_private_entry(entry))
> +	if (!is_device_private_entry(entry) && !is_device_exclusive_entry(entry))

Please split this into two lines.

> @@ -216,6 +219,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>  		}
>  		if (!map_pte(pvmw))
>  			goto next_pte;
> +
>  		while (1) {
>  			if (check_pte(pvmw))
>  				return true;

Spurious whitespace change.

> -	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> +	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & (TTU_MIGRATION | TTU_EXCLUSIVE)) &&

Please split this into two lines.

>  	    is_zone_device_page(page) && !is_device_private_page(page))
>  		return true;
>  
> @@ -1591,6 +1591,33 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  			/* We have to invalidate as we cleared the pte */
>  			mmu_notifier_invalidate_range(mm, address,
>  						      address + PAGE_SIZE);
> +		} else if (flags & TTU_EXCLUSIVE) {

try_to_unmap_one has turned into a monster.  A little refactoring to
split it into manageable pieces would be nice.
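For example (just a sketch, completely untested and all names made up),
the exclusive entry installation could get its own rmap walk instead of
growing try_to_unmap_one even further:

	static bool page_make_device_exclusive_one(struct page *page,
			struct vm_area_struct *vma, unsigned long address, void *arg)
	{
		/* install the device exclusive swap entries for this mapping here */
		return true;
	}

	static void page_make_device_exclusive(struct page *page)
	{
		struct rmap_walk_control rwc = {
			.rmap_one	= page_make_device_exclusive_one,
			.anon_lock	= page_lock_anon_vma_read,
		};

		rmap_walk(page, &rwc);
	}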