Attachments:

(No filename) (10.16 kB)
.config.gz (32.92 kB)
Download all attachments

2021-02-19 09:51:56

by Christoph Hellwig

[permalink] [raw]

Subject: Re: [PATCH v2 1/4] hmm: Device exclusive memory access

> page = migration_entry_to_page(swpent);
> else if (is_device_private_entry(swpent))
> page = device_private_entry_to_page(swpent);
> + else if (is_device_exclusive_entry(swpent))
> + page = device_exclusive_entry_to_page(swpent);

> page = migration_entry_to_page(swpent);
> else if (is_device_private_entry(swpent))
> page = device_private_entry_to_page(swpent);
> + else if (is_device_exclusive_entry(swpent))
> + page = device_exclusive_entry_to_page(swpent);

> if (is_device_private_entry(entry))
> page = device_private_entry_to_page(entry);
> +
> + if (is_device_exclusive_entry(entry))
> + page = device_exclusive_entry_to_page(entry);

Any chance we can come up with a clever scheme to avoid all this
boilerplate code (and maybe also what it gets compiled to)?

> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 866a0fa104c4..5d28ff6d4d80 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -109,6 +109,10 @@ struct hmm_range {
> */
> int hmm_range_fault(struct hmm_range *range);
>
> +int hmm_exclusive_range(struct mm_struct *mm, unsigned long start,
> + unsigned long end, struct page **pages);
> +vm_fault_t hmm_remove_exclusive_entry(struct vm_fault *vmf);

Can we avoid the hmm naming for new code (we should probably also kill
it off for the existing code)?

> +#define free_swap_and_cache(e) ({(is_migration_entry(e) || is_device_private_entry(e) \
> + || is_device_exclusive_entry(e)); })
> +#define swapcache_prepare(e) ({(is_migration_entry(e) || is_device_private_entry(e) \
> + || is_device_exclusive_entry(e)); })

Can you turn these into properly formatted inline functions? As-is this
becomes pretty unreadable.

> +static inline void make_device_exclusive_entry_read(swp_entry_t *entry)
> +{
> + *entry = swp_entry(SWP_DEVICE_EXCLUSIVE_READ, swp_offset(*entry));
> +}

s/make_device_exclusive_entry_read/mark_device_exclusive_entry_readable/
??

> +
> +static inline swp_entry_t make_device_exclusive_entry(struct page *page, bool write)
> +{
> + return swp_entry(write ? SWP_DEVICE_EXCLUSIVE_WRITE : SWP_DEVICE_EXCLUSIVE_READ,
> + page_to_pfn(page));
> +}

I'd split this into two helpers, which is easier to follow and avoids
the pointlessly overlong lines.

> +static inline bool is_device_exclusive_entry(swp_entry_t entry)
> +{
> + int type = swp_type(entry);
> + return type == SWP_DEVICE_EXCLUSIVE_READ || type == SWP_DEVICE_EXCLUSIVE_WRITE;
> +}

Another overly long line. I also wouldn't bother with the local
variable:

return swp_type(entry) == SWP_DEVICE_EXCLUSIVE_READ ||
swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE;

> +static inline bool is_write_device_exclusive_entry(swp_entry_t entry)
> +{
> + return swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE;
> +}

Or reuse these kind of helpers..

> +
> +static inline unsigned long device_exclusive_entry_to_pfn(swp_entry_t entry)
> +{
> + return swp_offset(entry);
> +}
> +
> +static inline struct page *device_exclusive_entry_to_page(swp_entry_t entry)
> +{
> + return pfn_to_page(swp_offset(entry));
> +}

I'd rather open code these two, and as a prep patch also kill off the
equivalents for the migration and device private entries, which would
actually clean up a lot of the mess mentioned in my first comment above.

> +static int hmm_exclusive_skip(unsigned long start,
> + unsigned long end,
> + __always_unused int depth,
> + struct mm_walk *walk)
> +{
> + struct hmm_exclusive_walk *hmm_exclusive_walk = walk->private;
> + unsigned long addr;
> +
> + for (addr = start; addr < end; addr += PAGE_SIZE)
> + hmm_exclusive_walk->pages[hmm_exclusive_walk->npages++] = NULL;
> +
> + return 0;
> +}

Wouldn't pre-zeroing the array be simpler and more efficient?

> +int hmm_exclusive_range(struct mm_struct *mm, unsigned long start,
> + unsigned long end, struct page **pages)
> +{
> + struct hmm_exclusive_walk hmm_exclusive_walk = { .pages = pages, .npages = 0 };
> + int i;
> +
> + /* Collect and lock candidate pages */
> + walk_page_range(mm, start, end, &hmm_exclusive_walk_ops, &hmm_exclusive_walk);

Please avoid the overly long lines.

But more importantly: Unless I'm missing something obvious this
walk_page_range call just open codes get_user_pages_fast, why can't you
use that?

> +#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_HUGETLB)
> + if (PageTransHuge(page)) {
> + VM_BUG_ON_PAGE(1, page);
> + continue;
> + }
> +#endif

Doesn't PageTransHuge always return false for that case? If not
shouldn't we make sure it does?

> +
> + pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
> + if (pte_swp_soft_dirty(*pvmw.pte))
> + pte = pte_mksoft_dirty(pte);
> +
> + entry = pte_to_swp_entry(*pvmw.pte);
> + if (pte_swp_uffd_wp(*pvmw.pte))
> + pte = pte_mkuffd_wp(pte);
> + else if (is_write_device_exclusive_entry(entry))
> + pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> +
> + set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
> +
> + /*
> + * No need to take a page reference as one was already
> + * created when the swap entry was made.
> + */
> + if (PageAnon(page))
> + page_add_anon_rmap(page, vma, pvmw.address, false);
> + else
> + page_add_file_rmap(page, false);
> +
> + if (vma->vm_flags & VM_LOCKED)
> + mlock_vma_page(page);
> +
> + /*
> + * No need to invalidate - it was non-present before. However
> + * secondary CPUs may have mappings that need invalidating.
> + */
> + update_mmu_cache(vma, pvmw.address, pvmw.pte);

It would be nice to split out the body of this loop into a helper.

> + if (!is_device_private_entry(entry) &&
> + !is_device_exclusive_entry(entry))

The normal style for this would be:

if (!is_device_private_entry(entry) &&
!is_device_exclusive_entry(entry))

> - if (!is_device_private_entry(entry))
> + if (!is_device_private_entry(entry) && !is_device_exclusive_entry(entry))

Plase split this into two lines.

> @@ -216,6 +219,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> }
> if (!map_pte(pvmw))
> goto next_pte;
> +
> while (1) {
> if (check_pte(pvmw))
> return true;

Spurious whitespace change.

> - if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> + if (IS_ENABLED(CONFIG_MIGRATION) && (flags & (TTU_MIGRATION | TTU_EXCLUSIVE)) &&

Please split this into two lines.

> is_zone_device_page(page) && !is_device_private_page(page))
> return true;
>
> @@ -1591,6 +1591,33 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> /* We have to invalidate as we cleared the pte */
> mmu_notifier_invalidate_range(mm, address,
> address + PAGE_SIZE);
> + } else if (flags & TTU_EXCLUSIVE) {

try_to_unmap_one has turned into a monster. A little refactoring to
split it into managable pieces would be nice.

2021-02-19 14:02:54

by Jason Gunthorpe

[permalink] [raw]

Subject: Re: [PATCH v2 1/4] hmm: Device exclusive memory access

On Fri, Feb 19, 2021 at 09:47:41AM +0000, Christoph Hellwig wrote:

> > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > index 866a0fa104c4..5d28ff6d4d80 100644
> > +++ b/include/linux/hmm.h
> > @@ -109,6 +109,10 @@ struct hmm_range {
> > */
> > int hmm_range_fault(struct hmm_range *range);
> >
> > +int hmm_exclusive_range(struct mm_struct *mm, unsigned long start,
> > + unsigned long end, struct page **pages);
> > +vm_fault_t hmm_remove_exclusive_entry(struct vm_fault *vmf);
>
> Can we avoid the hmm naming for new code (we should probably also kill
> it off for the existing code)?

Yes please, I'd prefer it if hmm.c was just the special page walker
and not a grab bag of unrelated things

Is there is a more natural place to put this in the mm for this idea?

Jason

2021-02-22 10:51:42

by Alistair Popple

[permalink] [raw]

Subject: Re: [PATCH v2 1/4] hmm: Device exclusive memory access

On Friday, 19 February 2021 8:47:41 PM AEDT Christoph Hellwig wrote:
> > page = migration_entry_to_page(swpent);
> > else if (is_device_private_entry(swpent))
> > page = device_private_entry_to_page(swpent);
> > + else if (is_device_exclusive_entry(swpent))
> > + page = device_exclusive_entry_to_page(swpent);
>
> > page = migration_entry_to_page(swpent);
> > else if (is_device_private_entry(swpent))
> > page = device_private_entry_to_page(swpent);
> > + else if (is_device_exclusive_entry(swpent))
> > + page = device_exclusive_entry_to_page(swpent);
>
> > if (is_device_private_entry(entry))
> > page = device_private_entry_to_page(entry);
> > +
> > + if (is_device_exclusive_entry(entry))
> > + page = device_exclusive_entry_to_page(entry);
>
> Any chance we can come up with a clever scheme to avoid all this
> boilerplate code (and maybe also what it gets compiled to)?

If I open code the entry_to_page() functions as suggested below then these
simplify down to single if statements like so:

if (is_migration_entry(entry) ||
is_device_private_entry(entry) ||
is_device_exclusive_entry(entry))
page = pfn_to_page(swp_offset(entry));

I could simplify further by hiding that in a single static inline like so:

static inline bool is_special_entry(swp_entry_t entry)
{
return is_migration_entry(entry) ||
is_device_private_entry(entry) ||
is_device_exclusive_entry(entry);
}

My only concern with doing that is these entries can't *always* be treated the
same so it might make it too easy to overlook the subtle differences.

> > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > index 866a0fa104c4..5d28ff6d4d80 100644
> > --- a/include/linux/hmm.h
> > +++ b/include/linux/hmm.h
> > @@ -109,6 +109,10 @@ struct hmm_range {
> > */
> > int hmm_range_fault(struct hmm_range *range);
> >
> > +int hmm_exclusive_range(struct mm_struct *mm, unsigned long start,
> > + unsigned long end, struct page **pages);
> > +vm_fault_t hmm_remove_exclusive_entry(struct vm_fault *vmf);
>
> Can we avoid the hmm naming for new code (we should probably also kill
> it off for the existing code)?

Sure. I ended up stuffing it in there for the moment because I didn't know of
any other "obvious" spot to put it. How about these for names:

int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
unsigned long end, struct page **pages);
vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf);

I am open to any alternative naming suggestions though.

> > +#define free_swap_and_cache(e) ({(is_migration_entry(e) ||
is_device_private_entry(e) \
> > + || is_device_exclusive_entry(e)); })
> > +#define swapcache_prepare(e) ({(is_migration_entry(e) ||
is_device_private_entry(e) \
> > + || is_device_exclusive_entry(e)); })
> Can you turn these into properly formatted inline functions? As-is this
> becomes pretty unreadable.

Ok, if I add a is_special_entry() function as suggested above these could just
use that.

> > +static inline void make_device_exclusive_entry_read(swp_entry_t *entry)
> > +{
> > + *entry = swp_entry(SWP_DEVICE_EXCLUSIVE_READ, swp_offset(*entry));
> > +}
>
> s/make_device_exclusive_entry_read/mark_device_exclusive_entry_readable/
> ??

See my next comment.

> > +
> > +static inline swp_entry_t make_device_exclusive_entry(struct page *page,
bool write)
> > +{
> > + return swp_entry(write ? SWP_DEVICE_EXCLUSIVE_WRITE :
SWP_DEVICE_EXCLUSIVE_READ,
> > + page_to_pfn(page));
> > +}
>
> I'd split this into two helpers, which is easier to follow and avoids
> the pointlessly overlong lines.

I assume you mean separate read and write functions instead of using the write
flag?

These are based on the existing device private functions which themselves
looked to be based on the migration entry functions. It would be good to keep
these consistent so when making the changes above I would also refactor the
existing make_device_private_entry() and make_migration_entry() functions to
match as well.

As a test I tried refactoring the migration entry functions into the below and
it seemed to make things a bit easier to follow if not a little verbose:

static inline int is_writable_migration_entry(swp_entry_t entry);
static inline swp_entry_t make_readable_migration_entry(pgoff_t offset);
static inline swp_entry_t make_writable_migration_entry(pgoff_t offset);
static inline int is_migration_entry(swp_entry_t entry);

So I can do that along with the same refactoring of device private accessor
functions as a prep patch if that seems reasonable?

> > +static inline bool is_device_exclusive_entry(swp_entry_t entry)
> > +{
> > + int type = swp_type(entry);
> > + return type == SWP_DEVICE_EXCLUSIVE_READ || type ==
SWP_DEVICE_EXCLUSIVE_WRITE;
> > +}
>
> Another overly long line. I also wouldn't bother with the local
> variable:
>
> return swp_type(entry) == SWP_DEVICE_EXCLUSIVE_READ ||
> swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE;
>
>
> > +static inline bool is_write_device_exclusive_entry(swp_entry_t entry)
> > +{
> > + return swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE;
> > +}
>
> Or reuse these kind of helpers..
>
> > +
> > +static inline unsigned long device_exclusive_entry_to_pfn(swp_entry_t
entry)
> > +{
> > + return swp_offset(entry);
> > +}
> > +
> > +static inline struct page *device_exclusive_entry_to_page(swp_entry_t
entry)
> > +{
> > + return pfn_to_page(swp_offset(entry));
> > +}
>
> I'd rather open code these two, and as a prep patch also kill off the
> equivalents for the migration and device private entries, which would
> actually clean up a lot of the mess mentioned in my first comment above.

Ok, I have just tried doing that and it ends up being more concise and
accurate so I'll add it to the series.

> > +static int hmm_exclusive_skip(unsigned long start,
> > + unsigned long end,
> > + __always_unused int depth,
> > + struct mm_walk *walk)
> > +{
> > + struct hmm_exclusive_walk *hmm_exclusive_walk = walk->private;
> > + unsigned long addr;
> > +
> > + for (addr = start; addr < end; addr += PAGE_SIZE)
> > + hmm_exclusive_walk->pages[hmm_exclusive_walk->npages++] = NULL;
> > +
> > + return 0;
> > +}
>
> Wouldn't pre-zeroing the array be simpler and more efficient?

Good point. I had other code here but I didn't need it and should have just
removed what was left.

> > +int hmm_exclusive_range(struct mm_struct *mm, unsigned long start,
> > + unsigned long end, struct page **pages)
> > +{
> > + struct hmm_exclusive_walk hmm_exclusive_walk = { .pages = pages,
.npages = 0 };
> > + int i;
> > +
> > + /* Collect and lock candidate pages */
> > + walk_page_range(mm, start, end, &hmm_exclusive_walk_ops,
&hmm_exclusive_walk);
>
> Please avoid the overly long lines.
>
> But more importantly: Unless I'm missing something obvious this
> walk_page_range call just open codes get_user_pages_fast, why can't you
> use that?

Good idea, you aren't missing anything although I don't think
get_user_pages_fast() will work as the driver needs to be able to operate on a
remote mm_struct.

But something like get_user_pages_remote() would work better than open-coding
it.

> > +#if defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) || defined(CONFIG_HUGETLB)
> > + if (PageTransHuge(page)) {
> > + VM_BUG_ON_PAGE(1, page);
> > + continue;
> > + }
> > +#endif
>
> Doesn't PageTransHuge always return false for that case? If not
> shouldn't we make sure it does?

Right, this is probably an overly defensive check so I'll remove it.

[snip - will address code format/structure suggestions]

>
> try_to_unmap_one has turned into a monster. A little refactoring to
> split it into managable pieces would be nice.
>

Agreed, adding this it did seem a little unwieldy. I will see if I can do a
bit of refactoring for the next version.

- Alistair