2024-05-30 02:04:48

by Baolin Wang

Subject: [PATCH v3 0/6] add mTHP support for anonymous shmem

Multi-size THP (mTHP) allocation is already supported for anonymous pages via
commit 19eaf44954df, which allows THP to be configured through the sysfs
interface located at '/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/enabled'.

However, anonymous shmem ignores the mTHP rules configured through that sysfs
interface and can only use PMD-mapped THP, which is not reasonable. Many
applications implement anonymous page sharing through mmap(MAP_SHARED |
MAP_ANONYMOUS), especially in database usage scenarios, so users expect a
unified mTHP strategy for anonymous pages, including anonymous shared pages,
in order to enjoy the benefits of mTHP: for example, lower allocation latency
than PMD-mapped THP, smaller memory bloat than PMD-mapped THP, and contiguous
PTEs on ARM architectures to reduce TLB misses.

The primary strategy is similar to the existing anonymous mTHP support.
Introduce a new interface
'/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled', which
accepts all the same values as the top-level
'/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus an additional
"inherit" option. By default all sizes are set to "never" except the PMD size,
which is set to "inherit". This preserves backward compatibility with the
top-level anonymous shmem setting, while also allowing independent control of
anonymous shmem for each mTHP size.
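
For example, to let anonymous shmem use 64K mTHP (the configuration used for
the 64K numbers below), something like the following should work; the
'hugepages-64kB' directory name is an assumption based on a 4K base page size:

  # enable 64K mTHP for anonymous shmem regardless of the top-level setting
  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled
  # or make this size follow the top-level 'shmem_enabled' setting instead
  echo inherit > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled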

I used a page fault latency tool to measure the performance of 1G of anonymous
shmem with 32 threads on my machine (ARM64 architecture, 32 cores, 125G memory):
base: mm-unstable
user-time sys_time faults_per_sec_per_cpu faults_per_sec
0.04s 3.10s 83516.416 2669684.890

mm-unstable + patchset, anon shmem mTHP disabled
user-time sys_time faults_per_sec_per_cpu faults_per_sec
0.02s 3.14s 82936.359 2630746.027

mm-unstable + patchset, anon shmem 64K mTHP enabled
user-time sys_time faults_per_sec_per_cpu faults_per_sec
0.08s 0.31s 678630.231 17082522.495

From the data above, the patchset has minimal impact when mTHP is not enabled
(some fluctuation was observed during testing). When 64K mTHP is enabled, there
is a significant improvement in page fault latency.

Changes from v2:
- Rebased to mm/mm-unstable.
- Remove 'huge' parameter for shmem_alloc_and_add_folio(), per Lance.

Changes from v1:
- Drop the patch that re-arranges the position of highest_order() and
next_order(), per Ryan.
- Modify the finish_fault() to fix VA alignment issue, per Ryan and
David.
- Fix some build issues, reported by Lance and the kernel test robot.
- Update some commit messages.

Changes from RFC:
- Rebase the patch set against the new mm-unstable branch, per Lance.
- Add a new patch to export highest_order() and next_order().
- Add a new patch to align mTHP size in shmem_get_unmapped_area().
- Handle the uffd case and the VMA limits case when building mapping for
large folio in the finish_fault() function, per Ryan.
- Remove unnecessary 'order' variable in patch 3, per Kefeng.
- Keep the anon shmem counters' name consistency.
- Modify the strategy to support mTHP for anonymous shmem, discussed with
Ryan and David.
- Add reviewed tag from Barry.
- Update the commit message.

Baolin Wang (6):
mm: memory: extend finish_fault() to support large folio
mm: shmem: add THP validation for PMD-mapped THP related statistics
mm: shmem: add multi-size THP sysfs interface for anonymous shmem
mm: shmem: add mTHP support for anonymous shmem
mm: shmem: add mTHP size alignment in shmem_get_unmapped_area
mm: shmem: add mTHP counters for anonymous shmem

Documentation/admin-guide/mm/transhuge.rst | 29 ++
include/linux/huge_mm.h | 23 ++
mm/huge_memory.c | 17 +-
mm/memory.c | 58 +++-
mm/shmem.c | 340 ++++++++++++++++++---
5 files changed, 406 insertions(+), 61 deletions(-)

--
2.39.3



2024-05-30 02:05:06

by Baolin Wang

Subject: [PATCH v3 1/6] mm: memory: extend finish_fault() to support large folio

Add support for establishing large folio mappings in finish_fault(), as
preparation for supporting multi-size THP allocation of anonymous shmem pages
in the following patches.

Signed-off-by: Baolin Wang <[email protected]>
---
mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 48 insertions(+), 10 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index eef4e482c0c2..435187ff7ea4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct page *page;
+ struct folio *folio;
vm_fault_t ret;
bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
!(vma->vm_flags & VM_SHARED);
+ int type, nr_pages, i;
+ unsigned long addr = vmf->address;

/* Did we COW the page? */
if (is_cow)
@@ -4864,24 +4867,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
return VM_FAULT_OOM;
}

+ folio = page_folio(page);
+ nr_pages = folio_nr_pages(folio);
+
+ /*
+ * Using per-page fault to maintain the uffd semantics, and same
+ * approach also applies to non-anonymous-shmem faults to avoid
+ * inflating the RSS of the process.
+ */
+ if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
+ nr_pages = 1;
+ } else if (nr_pages > 1) {
+ pgoff_t idx = folio_page_idx(folio, page);
+ /* The page offset of vmf->address within the VMA. */
+ pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
+
+ /*
+ * Fallback to per-page fault in case the folio size in page
+ * cache beyond the VMA limits.
+ */
+ if (unlikely(vma_off < idx ||
+ vma_off + (nr_pages - idx) > vma_pages(vma))) {
+ nr_pages = 1;
+ } else {
+ /* Now we can set mappings for the whole large folio. */
+ addr = vmf->address - idx * PAGE_SIZE;
+ page = &folio->page;
+ }
+ }
+
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
- vmf->address, &vmf->ptl);
+ addr, &vmf->ptl);
if (!vmf->pte)
return VM_FAULT_NOPAGE;

/* Re-check under ptl */
- if (likely(!vmf_pte_changed(vmf))) {
- struct folio *folio = page_folio(page);
- int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
-
- set_pte_range(vmf, folio, page, 1, vmf->address);
- add_mm_counter(vma->vm_mm, type, 1);
- ret = 0;
- } else {
- update_mmu_tlb(vma, vmf->address, vmf->pte);
+ if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
+ update_mmu_tlb(vma, addr, vmf->pte);
ret = VM_FAULT_NOPAGE;
+ goto unlock;
+ } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
+ for (i = 0; i < nr_pages; i++)
+ update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
+ ret = VM_FAULT_NOPAGE;
+ goto unlock;
}

+ folio_ref_add(folio, nr_pages - 1);
+ set_pte_range(vmf, folio, page, nr_pages, addr);
+ type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
+ add_mm_counter(vma->vm_mm, type, nr_pages);
+ ret = 0;
+
+unlock:
pte_unmap_unlock(vmf->pte, vmf->ptl);
return ret;
}
--
2.39.3


2024-05-30 02:05:10

by Baolin Wang

Subject: [PATCH v3 2/6] mm: shmem: add THP validation for PMD-mapped THP related statistics

In preparation for extending mTHP support, only update the PMD-mapped THP
related statistics when the folio is actually PMD-sized, to avoid statistical
confusion.

Signed-off-by: Baolin Wang <[email protected]>
Reviewed-by: Barry Song <[email protected]>
---
mm/shmem.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 6868c0af3a69..ae358efc397a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1647,7 +1647,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
return ERR_PTR(-E2BIG);

folio = shmem_alloc_folio(gfp, HPAGE_PMD_ORDER, info, index);
- if (!folio)
+ if (!folio && pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
} else {
pages = 1;
@@ -1665,7 +1665,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
if (xa_find(&mapping->i_pages, &index,
index + pages - 1, XA_PRESENT)) {
error = -EEXIST;
- } else if (huge) {
+ } else if (pages == HPAGE_PMD_NR) {
count_vm_event(THP_FILE_FALLBACK);
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
@@ -2031,7 +2031,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
folio = shmem_alloc_and_add_folio(huge_gfp,
inode, index, fault_mm, true);
if (!IS_ERR(folio)) {
- count_vm_event(THP_FILE_ALLOC);
+ if (folio_test_pmd_mappable(folio))
+ count_vm_event(THP_FILE_ALLOC);
goto alloced;
}
if (PTR_ERR(folio) == -EEXIST)
--
2.39.3


2024-05-30 02:05:20

by Baolin Wang

Subject: [PATCH v3 4/6] mm: shmem: add mTHP support for anonymous shmem

Commit 19eaf44954df adds multi-size THP (mTHP) for anonymous pages, allowing
THP to be configured through the sysfs interface located at
'/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/enabled'.

However, anonymous shared pages ignore the mTHP rules configured through the
sysfs interface and can only use PMD-mapped THP, which is not reasonable.
Users expect the mTHP rules to apply to all anonymous pages, including
anonymous shared pages, in order to enjoy the benefits of mTHP: for example,
lower allocation latency than PMD-mapped THP, smaller memory bloat than
PMD-mapped THP, and contiguous PTEs on ARM architectures to reduce TLB misses.

The primary strategy is similar to the existing anonymous mTHP support.
Introduce a new interface
'/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled', which
accepts all the same values as the top-level
'/sys/kernel/mm/transparent_hugepage/shmem_enabled', plus an additional
"inherit" option. By default all sizes are set to "never" except the PMD size,
which is set to "inherit". This preserves backward compatibility with the
top-level anonymous shmem setting, while also allowing independent control of
anonymous shmem for each mTHP size.
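
As a rough sketch of the resulting behavior (the size names below are
assumptions for a 4K base page size with a 2M PMD):

  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/shmem_enabled
  echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled
  # Anonymous shmem faults now try the highest enabled order first (2M here),
  # then fall back to the next enabled order (64K), and finally to small pages.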

Signed-off-by: Baolin Wang <[email protected]>
---
include/linux/huge_mm.h | 10 +++
mm/shmem.c | 187 +++++++++++++++++++++++++++++++++-------
2 files changed, 167 insertions(+), 30 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fac21548c5de..909cfc67521d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -575,6 +575,16 @@ static inline bool thp_migration_supported(void)
{
return false;
}
+
+static inline int highest_order(unsigned long orders)
+{
+ return 0;
+}
+
+static inline int next_order(unsigned long *orders, int prev)
+{
+ return 0;
+}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

static inline int split_folio_to_list_to_order(struct folio *folio,
diff --git a/mm/shmem.c b/mm/shmem.c
index d5ab5e211100..493873d7246c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1611,6 +1611,107 @@ static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
return result;
}

+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
+ struct vm_area_struct *vma, pgoff_t index,
+ bool global_huge)
+{
+ unsigned long mask = READ_ONCE(huge_anon_shmem_orders_always);
+ unsigned long within_size_orders = READ_ONCE(huge_anon_shmem_orders_within_size);
+ unsigned long vm_flags = vma->vm_flags;
+ /*
+ * Check all the (large) orders below HPAGE_PMD_ORDER + 1 that
+ * are enabled for this vma.
+ */
+ unsigned long orders = BIT(PMD_ORDER + 1) - 1;
+ loff_t i_size;
+ int order;
+
+ if ((vm_flags & VM_NOHUGEPAGE) ||
+ test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+ return 0;
+
+ /* If the hardware/firmware marked hugepage support disabled. */
+ if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
+ return 0;
+
+ /*
+ * Following the 'deny' semantics of the top level, force the huge
+ * option off from all mounts.
+ */
+ if (shmem_huge == SHMEM_HUGE_DENY)
+ return 0;
+
+ /*
+ * Only allow inherit orders if the top-level value is 'force', which
+ * means non-PMD sized THP can not override 'huge' mount option now.
+ */
+ if (shmem_huge == SHMEM_HUGE_FORCE)
+ return READ_ONCE(huge_anon_shmem_orders_inherit);
+
+ /* Allow mTHP that will be fully within i_size. */
+ order = highest_order(within_size_orders);
+ while (within_size_orders) {
+ index = round_up(index + 1, order);
+ i_size = round_up(i_size_read(inode), PAGE_SIZE);
+ if (i_size >> PAGE_SHIFT >= index) {
+ mask |= within_size_orders;
+ break;
+ }
+
+ order = next_order(&within_size_orders, order);
+ }
+
+ if (vm_flags & VM_HUGEPAGE)
+ mask |= READ_ONCE(huge_anon_shmem_orders_madvise);
+
+ if (global_huge)
+ mask |= READ_ONCE(huge_anon_shmem_orders_inherit);
+
+ return orders & mask;
+}
+
+static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
+ struct address_space *mapping, pgoff_t index,
+ unsigned long orders)
+{
+ struct vm_area_struct *vma = vmf->vma;
+ unsigned long pages;
+ int order;
+
+ orders = thp_vma_suitable_orders(vma, vmf->address, orders);
+ if (!orders)
+ return 0;
+
+ /* Find the highest order that can add into the page cache */
+ order = highest_order(orders);
+ while (orders) {
+ pages = 1UL << order;
+ index = round_down(index, pages);
+ if (!xa_find(&mapping->i_pages, &index,
+ index + pages - 1, XA_PRESENT))
+ break;
+ order = next_order(&orders, order);
+ }
+
+ return orders;
+}
+#else
+static unsigned long anon_shmem_allowable_huge_orders(struct inode *inode,
+ struct vm_area_struct *vma, pgoff_t index,
+ bool global_huge)
+{
+ return 0;
+}
+
+static unsigned long anon_shmem_suitable_orders(struct inode *inode, struct vm_fault *vmf,
+ struct address_space *mapping, pgoff_t index,
+ unsigned long orders)
+{
+ return 0;
+}
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
static struct folio *shmem_alloc_folio(gfp_t gfp, int order,
struct shmem_inode_info *info, pgoff_t index)
{
@@ -1625,38 +1726,55 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, int order,
return folio;
}

-static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
- struct inode *inode, pgoff_t index,
- struct mm_struct *fault_mm, bool huge)
+static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
+ gfp_t gfp, struct inode *inode, pgoff_t index,
+ struct mm_struct *fault_mm, unsigned long orders)
{
struct address_space *mapping = inode->i_mapping;
struct shmem_inode_info *info = SHMEM_I(inode);
- struct folio *folio;
+ struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
+ unsigned long suitable_orders;
+ struct folio *folio = NULL;
long pages;
- int error;
+ int error, order;

if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
- huge = false;
+ orders = 0;

- if (huge) {
- pages = HPAGE_PMD_NR;
- index = round_down(index, HPAGE_PMD_NR);
+ if (orders > 0) {
+ if (vma && vma_is_anon_shmem(vma)) {
+ suitable_orders = anon_shmem_suitable_orders(inode, vmf,
+ mapping, index, orders);
+ } else if (orders & BIT(HPAGE_PMD_ORDER)) {
+ pages = HPAGE_PMD_NR;
+ suitable_orders = BIT(HPAGE_PMD_ORDER);
+ index = round_down(index, HPAGE_PMD_NR);

- /*
- * Check for conflict before waiting on a huge allocation.
- * Conflict might be that a huge page has just been allocated
- * and added to page cache by a racing thread, or that there
- * is already at least one small page in the huge extent.
- * Be careful to retry when appropriate, but not forever!
- * Elsewhere -EEXIST would be the right code, but not here.
- */
- if (xa_find(&mapping->i_pages, &index,
- index + HPAGE_PMD_NR - 1, XA_PRESENT))
- return ERR_PTR(-E2BIG);
+ /*
+ * Check for conflict before waiting on a huge allocation.
+ * Conflict might be that a huge page has just been allocated
+ * and added to page cache by a racing thread, or that there
+ * is already at least one small page in the huge extent.
+ * Be careful to retry when appropriate, but not forever!
+ * Elsewhere -EEXIST would be the right code, but not here.
+ */
+ if (xa_find(&mapping->i_pages, &index,
+ index + HPAGE_PMD_NR - 1, XA_PRESENT))
+ return ERR_PTR(-E2BIG);
+ }

- folio = shmem_alloc_folio(gfp, HPAGE_PMD_ORDER, info, index);
- if (!folio && pages == HPAGE_PMD_NR)
- count_vm_event(THP_FILE_FALLBACK);
+ order = highest_order(suitable_orders);
+ while (suitable_orders) {
+ pages = 1UL << order;
+ index = round_down(index, pages);
+ folio = shmem_alloc_folio(gfp, order, info, index);
+ if (folio)
+ goto allocated;
+
+ if (pages == HPAGE_PMD_NR)
+ count_vm_event(THP_FILE_FALLBACK);
+ order = next_order(&suitable_orders, order);
+ }
} else {
pages = 1;
folio = shmem_alloc_folio(gfp, 0, info, index);
@@ -1664,6 +1782,7 @@ static struct folio *shmem_alloc_and_add_folio(gfp_t gfp,
if (!folio)
return ERR_PTR(-ENOMEM);

+allocated:
__folio_set_locked(folio);
__folio_set_swapbacked(folio);

@@ -1958,7 +2077,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
struct mm_struct *fault_mm;
struct folio *folio;
int error;
- bool alloced;
+ bool alloced, huge;
+ unsigned long orders = 0;

if (WARN_ON_ONCE(!shmem_mapping(inode->i_mapping)))
return -EINVAL;
@@ -2030,14 +2150,21 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
return 0;
}

- if (shmem_is_huge(inode, index, false, fault_mm,
- vma ? vma->vm_flags : 0)) {
+ huge = shmem_is_huge(inode, index, false, fault_mm,
+ vma ? vma->vm_flags : 0);
+ /* Find hugepage orders that are allowed for anonymous shmem. */
+ if (vma && vma_is_anon_shmem(vma))
+ orders = anon_shmem_allowable_huge_orders(inode, vma, index, huge);
+ else if (huge)
+ orders = BIT(HPAGE_PMD_ORDER);
+
+ if (orders > 0) {
gfp_t huge_gfp;

huge_gfp = vma_thp_gfp_mask(vma);
huge_gfp = limit_gfp_mask(huge_gfp, gfp);
- folio = shmem_alloc_and_add_folio(huge_gfp,
- inode, index, fault_mm, true);
+ folio = shmem_alloc_and_add_folio(vmf, huge_gfp,
+ inode, index, fault_mm, orders);
if (!IS_ERR(folio)) {
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
@@ -2047,7 +2174,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
goto repeat;
}

- folio = shmem_alloc_and_add_folio(gfp, inode, index, fault_mm, false);
+ folio = shmem_alloc_and_add_folio(vmf, gfp, inode, index, fault_mm, 0);
if (IS_ERR(folio)) {
error = PTR_ERR(folio);
if (error == -EEXIST)
@@ -2058,7 +2185,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,

alloced:
alloced = true;
- if (folio_test_pmd_mappable(folio) &&
+ if (folio_test_large(folio) &&
DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE) <
folio_next_index(folio) - 1) {
struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
--
2.39.3


2024-05-30 02:05:32

by Baolin Wang

Subject: [PATCH v3 3/6] mm: shmem: add multi-size THP sysfs interface for anonymous shmem

To support the use of mTHP with anonymous shmem, add a new sysfs interface
'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/'
directory for each mTHP to control whether shmem is enabled for that mTHP,
with values similar to the top-level 'shmem_enabled', which can be set to:
"always", "inherit" (to inherit the top-level setting), "within_size", "advise",
"never", "deny", "force". These values follow the same semantics as the top
level, except that 'deny' is equivalent to 'never' and 'force' is equivalent
to 'always', to keep compatibility.

By default, the PMD-sized hugepage has shmem_enabled="inherit" and all other
hugepage sizes have shmem_enabled="never" in
'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled'.

In addition, if the top-level value is 'force', only the PMD-sized hugepage may
be set to "inherit"; conversely, while a non-PMD size is set to "inherit", the
top level cannot be set to 'force'. In both cases the configuration fails. This
avoids non-PMD sized THP overriding the global huge allocation policy.
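
A rough usage sketch (the 'hugepages-64kB' name and the exact error output are
assumptions, based on a 4K base page size):

  $ cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled
  always inherit within_size advise [never] [deny] force
  $ echo within_size > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled
  $ echo inherit > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/shmem_enabled
  # While a non-PMD size is set to 'inherit', forcing the top level is rejected:
  $ echo force > /sys/kernel/mm/transparent_hugepage/shmem_enabled
  bash: echo: write error: Invalid argument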

Signed-off-by: Baolin Wang <[email protected]>
---
Documentation/admin-guide/mm/transhuge.rst | 29 +++++++
include/linux/huge_mm.h | 10 +++
mm/huge_memory.c | 11 +--
mm/shmem.c | 96 ++++++++++++++++++++++
4 files changed, 138 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index d414d3f5592a..659459374032 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -332,6 +332,35 @@ deny
force
Force the huge option on for all - very useful for testing;

+Anonymous shmem can also use "multi-size THP" (mTHP) by adding a new sysfs knob
+to control mTHP allocation: /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled.
+Its value for each mTHP is essentially consistent with the global setting, except
+for the addition of 'inherit' to ensure compatibility with the global settings.
+always
+ Attempt to allocate <size> huge pages every time we need a new page;
+
+inherit
+ Inherit the top-level "shmem_enabled" value. By default, PMD-sized hugepages
+ have enabled="inherit" and all other hugepage sizes have enabled="never";
+
+never
+ Do not allocate <size> huge pages;
+
+within_size
+ Only allocate <size> huge page if it will be fully within i_size.
+ Also respect fadvise()/madvise() hints;
+
+advise
+ Only allocate <size> huge pages if requested with fadvise()/madvise();
+
+deny
+ Has the same semantics as 'never'; the mTHP allocation policy is only
+ used for anonymous shmem and does not override tmpfs.
+
+force
+ Has the same semantics as 'always'; the mTHP allocation policy is only
+ used for anonymous shmem and does not override tmpfs.
+
Need of application restart
===========================

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 020e2344eb86..fac21548c5de 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -6,6 +6,7 @@
#include <linux/mm_types.h>

#include <linux/fs.h> /* only for vma_is_dax() */
+#include <linux/kobject.h>

vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -63,6 +64,7 @@ ssize_t single_hugepage_flag_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf,
enum transparent_hugepage_flag flag);
extern struct kobj_attribute shmem_enabled_attr;
+extern struct kobj_attribute thpsize_shmem_enabled_attr;

/*
* Mask of all large folio orders supported for anonymous THP; all orders up to
@@ -265,6 +267,14 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
}

+struct thpsize {
+ struct kobject kobj;
+ struct list_head node;
+ int order;
+};
+
+#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
+
enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_ALLOC,
MTHP_STAT_ANON_FAULT_FALLBACK,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8e49f402d7c7..1360a1903b66 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -449,14 +449,6 @@ static void thpsize_release(struct kobject *kobj);
static DEFINE_SPINLOCK(huge_anon_orders_lock);
static LIST_HEAD(thpsize_list);

-struct thpsize {
- struct kobject kobj;
- struct list_head node;
- int order;
-};
-
-#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
-
static ssize_t thpsize_enabled_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
{
@@ -517,6 +509,9 @@ static struct kobj_attribute thpsize_enabled_attr =

static struct attribute *thpsize_attrs[] = {
&thpsize_enabled_attr.attr,
+#ifdef CONFIG_SHMEM
+ &thpsize_shmem_enabled_attr.attr,
+#endif
NULL,
};

diff --git a/mm/shmem.c b/mm/shmem.c
index ae358efc397a..d5ab5e211100 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -131,6 +131,14 @@ struct shmem_options {
#define SHMEM_SEEN_QUOTA 32
};

+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static unsigned long huge_anon_shmem_orders_always __read_mostly;
+static unsigned long huge_anon_shmem_orders_madvise __read_mostly;
+static unsigned long huge_anon_shmem_orders_inherit __read_mostly;
+static unsigned long huge_anon_shmem_orders_within_size __read_mostly;
+static DEFINE_SPINLOCK(huge_anon_shmem_orders_lock);
+#endif
+
#ifdef CONFIG_TMPFS
static unsigned long shmem_default_max_blocks(void)
{
@@ -4672,6 +4680,12 @@ void __init shmem_init(void)
SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
else
shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
+
+ /*
+ * Default to setting PMD-sized THP to inherit the global setting and
+ * disable all other multi-size THPs, when anonymous shmem uses mTHP.
+ */
+ huge_anon_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
#endif
return;

@@ -4731,6 +4745,11 @@ static ssize_t shmem_enabled_store(struct kobject *kobj,
huge != SHMEM_HUGE_NEVER && huge != SHMEM_HUGE_DENY)
return -EINVAL;

+ /* Do not override huge allocation policy with non-PMD sized mTHP */
+ if (huge == SHMEM_HUGE_FORCE &&
+ huge_anon_shmem_orders_inherit != BIT(HPAGE_PMD_ORDER))
+ return -EINVAL;
+
shmem_huge = huge;
if (shmem_huge > SHMEM_HUGE_DENY)
SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
@@ -4738,6 +4757,83 @@ static ssize_t shmem_enabled_store(struct kobject *kobj,
}

struct kobj_attribute shmem_enabled_attr = __ATTR_RW(shmem_enabled);
+
+static ssize_t thpsize_shmem_enabled_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ int order = to_thpsize(kobj)->order;
+ const char *output;
+
+ if (test_bit(order, &huge_anon_shmem_orders_always))
+ output = "[always] inherit within_size advise never deny [force]";
+ else if (test_bit(order, &huge_anon_shmem_orders_inherit))
+ output = "always [inherit] within_size advise never deny force";
+ else if (test_bit(order, &huge_anon_shmem_orders_within_size))
+ output = "always inherit [within_size] advise never deny force";
+ else if (test_bit(order, &huge_anon_shmem_orders_madvise))
+ output = "always inherit within_size [advise] never deny force";
+ else
+ output = "always inherit within_size advise [never] [deny] force";
+
+ return sysfs_emit(buf, "%s\n", output);
+}
+
+static ssize_t thpsize_shmem_enabled_store(struct kobject *kobj,
+ struct kobj_attribute *attr,
+ const char *buf, size_t count)
+{
+ int order = to_thpsize(kobj)->order;
+ ssize_t ret = count;
+
+ if (sysfs_streq(buf, "always") || sysfs_streq(buf, "force")) {
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_inherit);
+ clear_bit(order, &huge_anon_shmem_orders_madvise);
+ clear_bit(order, &huge_anon_shmem_orders_within_size);
+ set_bit(order, &huge_anon_shmem_orders_always);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else if (sysfs_streq(buf, "inherit")) {
+ /* Do not override huge allocation policy with non-PMD sized mTHP */
+ if (shmem_huge == SHMEM_HUGE_FORCE &&
+ order != HPAGE_PMD_ORDER)
+ return -EINVAL;
+
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_always);
+ clear_bit(order, &huge_anon_shmem_orders_madvise);
+ clear_bit(order, &huge_anon_shmem_orders_within_size);
+ set_bit(order, &huge_anon_shmem_orders_inherit);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else if (sysfs_streq(buf, "within_size")) {
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_always);
+ clear_bit(order, &huge_anon_shmem_orders_inherit);
+ clear_bit(order, &huge_anon_shmem_orders_madvise);
+ set_bit(order, &huge_anon_shmem_orders_within_size);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else if (sysfs_streq(buf, "madvise")) {
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_always);
+ clear_bit(order, &huge_anon_shmem_orders_inherit);
+ clear_bit(order, &huge_anon_shmem_orders_within_size);
+ set_bit(order, &huge_anon_shmem_orders_madvise);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else if (sysfs_streq(buf, "never") || sysfs_streq(buf, "deny")) {
+ spin_lock(&huge_anon_shmem_orders_lock);
+ clear_bit(order, &huge_anon_shmem_orders_always);
+ clear_bit(order, &huge_anon_shmem_orders_inherit);
+ clear_bit(order, &huge_anon_shmem_orders_within_size);
+ clear_bit(order, &huge_anon_shmem_orders_madvise);
+ spin_unlock(&huge_anon_shmem_orders_lock);
+ } else {
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+struct kobj_attribute thpsize_shmem_enabled_attr =
+ __ATTR(shmem_enabled, 0644, thpsize_shmem_enabled_show, thpsize_shmem_enabled_store);
#endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */

#else /* !CONFIG_SHMEM */
--
2.39.3


2024-05-30 02:05:38

by Baolin Wang

Subject: [PATCH v3 5/6] mm: shmem: add mTHP size alignment in shmem_get_unmapped_area

Although the top-level hugepage allocation can be turned off, anonymous shmem
can still use mTHP by configuring the sysfs interface located at
'/sys/kernel/mm/transparent_hugepage/hugepages-XXkB/shmem_enabled'. Therefore,
take the enabled mTHP sizes into account in shmem_get_unmapped_area() so that
it can return a suitably aligned address.

Signed-off-by: Baolin Wang <[email protected]>
Tested-by: Lance Yang <[email protected]>
---
mm/shmem.c | 36 +++++++++++++++++++++++++++---------
1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 493873d7246c..efc093ee7d4d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2394,6 +2394,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
unsigned long inflated_len;
unsigned long inflated_addr;
unsigned long inflated_offset;
+ unsigned long hpage_size;

if (len > TASK_SIZE)
return -ENOMEM;
@@ -2412,8 +2413,6 @@ unsigned long shmem_get_unmapped_area(struct file *file,

if (shmem_huge == SHMEM_HUGE_DENY)
return addr;
- if (len < HPAGE_PMD_SIZE)
- return addr;
if (flags & MAP_FIXED)
return addr;
/*
@@ -2425,8 +2424,11 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (uaddr == addr)
return addr;

+ hpage_size = HPAGE_PMD_SIZE;
if (shmem_huge != SHMEM_HUGE_FORCE) {
struct super_block *sb;
+ unsigned long __maybe_unused hpage_orders;
+ int order = 0;

if (file) {
VM_BUG_ON(file->f_op != &shmem_file_operations);
@@ -2439,18 +2441,34 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (IS_ERR(shm_mnt))
return addr;
sb = shm_mnt->mnt_sb;
+
+ /*
+ * Find the highest mTHP order used for anonymous shmem to
+ * provide a suitable alignment address.
+ */
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ hpage_orders = READ_ONCE(huge_anon_shmem_orders_always);
+ hpage_orders |= READ_ONCE(huge_anon_shmem_orders_within_size);
+ hpage_orders |= READ_ONCE(huge_anon_shmem_orders_madvise);
+ hpage_orders |= READ_ONCE(huge_anon_shmem_orders_inherit);
+ order = highest_order(hpage_orders);
+ hpage_size = PAGE_SIZE << order;
+#endif
}
- if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER)
+ if (SHMEM_SB(sb)->huge == SHMEM_HUGE_NEVER && !order)
return addr;
}

- offset = (pgoff << PAGE_SHIFT) & (HPAGE_PMD_SIZE-1);
- if (offset && offset + len < 2 * HPAGE_PMD_SIZE)
+ if (len < hpage_size)
+ return addr;
+
+ offset = (pgoff << PAGE_SHIFT) & (hpage_size - 1);
+ if (offset && offset + len < 2 * hpage_size)
return addr;
- if ((addr & (HPAGE_PMD_SIZE-1)) == offset)
+ if ((addr & (hpage_size - 1)) == offset)
return addr;

- inflated_len = len + HPAGE_PMD_SIZE - PAGE_SIZE;
+ inflated_len = len + hpage_size - PAGE_SIZE;
if (inflated_len > TASK_SIZE)
return addr;
if (inflated_len < len)
@@ -2463,10 +2481,10 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (inflated_addr & ~PAGE_MASK)
return addr;

- inflated_offset = inflated_addr & (HPAGE_PMD_SIZE-1);
+ inflated_offset = inflated_addr & (hpage_size - 1);
inflated_addr += offset - inflated_offset;
if (inflated_offset > offset)
- inflated_addr += HPAGE_PMD_SIZE;
+ inflated_addr += hpage_size;

if (inflated_addr > TASK_SIZE - len)
return addr;
--
2.39.3


2024-05-30 02:05:50

by Baolin Wang

Subject: [PATCH v3 6/6] mm: shmem: add mTHP counters for anonymous shmem

Add mTHP counters for anonymous shmem.
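
Assuming the new counters appear alongside the existing per-size mTHP stats
(a sketch; the 'hugepages-64kB' name assumes a 4K base page size):

  cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/file_alloc
  cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/file_fallback
  cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/file_fallback_charge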

Signed-off-by: Baolin Wang <[email protected]>
---
include/linux/huge_mm.h | 3 +++
mm/huge_memory.c | 6 ++++++
mm/shmem.c | 18 +++++++++++++++---
3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 909cfc67521d..212cca384d7e 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -281,6 +281,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
+ MTHP_STAT_FILE_ALLOC,
+ MTHP_STAT_FILE_FALLBACK,
+ MTHP_STAT_FILE_FALLBACK_CHARGE,
__MTHP_STAT_COUNT
};

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1360a1903b66..3fbcd77f5957 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -555,6 +555,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
+DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);

static struct attribute *stats_attrs[] = {
&anon_fault_alloc_attr.attr,
@@ -562,6 +565,9 @@ static struct attribute *stats_attrs[] = {
&anon_fault_fallback_charge_attr.attr,
&swpout_attr.attr,
&swpout_fallback_attr.attr,
+ &file_alloc_attr.attr,
+ &file_fallback_attr.attr,
+ &file_fallback_charge_attr.attr,
NULL,
};

diff --git a/mm/shmem.c b/mm/shmem.c
index efc093ee7d4d..cdcc3b919bca 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1773,6 +1773,9 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,

if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+#endif
order = next_order(&suitable_orders, order);
}
} else {
@@ -1792,9 +1795,15 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (xa_find(&mapping->i_pages, &index,
index + pages - 1, XA_PRESENT)) {
error = -EEXIST;
- } else if (pages == HPAGE_PMD_NR) {
- count_vm_event(THP_FILE_FALLBACK);
- count_vm_event(THP_FILE_FALLBACK_CHARGE);
+ } else if (pages > 1) {
+ if (pages == HPAGE_PMD_NR) {
+ count_vm_event(THP_FILE_FALLBACK);
+ count_vm_event(THP_FILE_FALLBACK_CHARGE);
+ }
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+#endif
}
goto unlock;
}
@@ -2168,6 +2177,9 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (!IS_ERR(folio)) {
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+#endif
goto alloced;
}
if (PTR_ERR(folio) == -EEXIST)
--
2.39.3


2024-05-30 06:37:26

by kernel test robot

Subject: Re: [PATCH v3 4/6] mm: shmem: add mTHP support for anonymous shmem

Hi Baolin,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on next-20240529]
[cannot apply to linus/master v6.10-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/mm-memory-extend-finish_fault-to-support-large-folio/20240530-100805
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/ec35a23026dd016705d211e85163cabe07681516.1717033868.git.baolin.wang%40linux.alibaba.com
patch subject: [PATCH v3 4/6] mm: shmem: add mTHP support for anonymous shmem
config: riscv-defconfig (https://download.01.org/0day-ci/archive/20240530/[email protected]/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project bafda89a0944d947fc4b3b5663185e07a397ac30)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240530/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

In file included from mm/shmem.c:28:
In file included from include/linux/ramfs.h:5:
In file included from include/linux/fs_parser.h:11:
In file included from include/linux/fs_context.h:14:
In file included from include/linux/security.h:33:
In file included from include/linux/mm.h:2245:
include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
514 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
>> mm/shmem.c:1748:14: warning: variable 'suitable_orders' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
1748 | } else if (orders & BIT(HPAGE_PMD_ORDER)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/shmem.c:1766:25: note: uninitialized use occurs here
1766 | order = highest_order(suitable_orders);
| ^~~~~~~~~~~~~~~
mm/shmem.c:1748:10: note: remove the 'if' if its condition is always true
1748 | } else if (orders & BIT(HPAGE_PMD_ORDER)) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mm/shmem.c:1736:31: note: initialize the variable 'suitable_orders' to silence this warning
1736 | unsigned long suitable_orders;
| ^
| = 0
2 warnings generated.


vim +1748 mm/shmem.c

1728
1729 static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
1730 gfp_t gfp, struct inode *inode, pgoff_t index,
1731 struct mm_struct *fault_mm, unsigned long orders)
1732 {
1733 struct address_space *mapping = inode->i_mapping;
1734 struct shmem_inode_info *info = SHMEM_I(inode);
1735 struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
1736 unsigned long suitable_orders;
1737 struct folio *folio = NULL;
1738 long pages;
1739 int error, order;
1740
1741 if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
1742 orders = 0;
1743
1744 if (orders > 0) {
1745 if (vma && vma_is_anon_shmem(vma)) {
1746 suitable_orders = anon_shmem_suitable_orders(inode, vmf,
1747 mapping, index, orders);
> 1748 } else if (orders & BIT(HPAGE_PMD_ORDER)) {
1749 pages = HPAGE_PMD_NR;
1750 suitable_orders = BIT(HPAGE_PMD_ORDER);
1751 index = round_down(index, HPAGE_PMD_NR);
1752
1753 /*
1754 * Check for conflict before waiting on a huge allocation.
1755 * Conflict might be that a huge page has just been allocated
1756 * and added to page cache by a racing thread, or that there
1757 * is already at least one small page in the huge extent.
1758 * Be careful to retry when appropriate, but not forever!
1759 * Elsewhere -EEXIST would be the right code, but not here.
1760 */
1761 if (xa_find(&mapping->i_pages, &index,
1762 index + HPAGE_PMD_NR - 1, XA_PRESENT))
1763 return ERR_PTR(-E2BIG);
1764 }
1765
1766 order = highest_order(suitable_orders);
1767 while (suitable_orders) {
1768 pages = 1UL << order;
1769 index = round_down(index, pages);
1770 folio = shmem_alloc_folio(gfp, order, info, index);
1771 if (folio)
1772 goto allocated;
1773
1774 if (pages == HPAGE_PMD_NR)
1775 count_vm_event(THP_FILE_FALLBACK);
1776 order = next_order(&suitable_orders, order);
1777 }
1778 } else {
1779 pages = 1;
1780 folio = shmem_alloc_folio(gfp, 0, info, index);
1781 }
1782 if (!folio)
1783 return ERR_PTR(-ENOMEM);
1784
1785 allocated:
1786 __folio_set_locked(folio);
1787 __folio_set_swapbacked(folio);
1788
1789 gfp &= GFP_RECLAIM_MASK;
1790 error = mem_cgroup_charge(folio, fault_mm, gfp);
1791 if (error) {
1792 if (xa_find(&mapping->i_pages, &index,
1793 index + pages - 1, XA_PRESENT)) {
1794 error = -EEXIST;
1795 } else if (pages == HPAGE_PMD_NR) {
1796 count_vm_event(THP_FILE_FALLBACK);
1797 count_vm_event(THP_FILE_FALLBACK_CHARGE);
1798 }
1799 goto unlock;
1800 }
1801
1802 error = shmem_add_to_page_cache(folio, mapping, index, NULL, gfp);
1803 if (error)
1804 goto unlock;
1805
1806 error = shmem_inode_acct_blocks(inode, pages);
1807 if (error) {
1808 struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
1809 long freed;
1810 /*
1811 * Try to reclaim some space by splitting a few
1812 * large folios beyond i_size on the filesystem.
1813 */
1814 shmem_unused_huge_shrink(sbinfo, NULL, 2);
1815 /*
1816 * And do a shmem_recalc_inode() to account for freed pages:
1817 * except our folio is there in cache, so not quite balanced.
1818 */
1819 spin_lock(&info->lock);
1820 freed = pages + info->alloced - info->swapped -
1821 READ_ONCE(mapping->nrpages);
1822 if (freed > 0)
1823 info->alloced -= freed;
1824 spin_unlock(&info->lock);
1825 if (freed > 0)
1826 shmem_inode_unacct_blocks(inode, freed);
1827 error = shmem_inode_acct_blocks(inode, pages);
1828 if (error) {
1829 filemap_remove_folio(folio);
1830 goto unlock;
1831 }
1832 }
1833
1834 shmem_recalc_inode(inode, pages, 0);
1835 folio_add_lru(folio);
1836 return folio;
1837
1838 unlock:
1839 folio_unlock(folio);
1840 folio_put(folio);
1841 return ERR_PTR(error);
1842 }
1843

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2024-05-31 09:36:21

by David Hildenbrand

Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On 30.05.24 04:04, Baolin Wang wrote:
> Anonymous pages have already been supported for multi-size (mTHP) allocation
> through commit 19eaf44954df, that can allow THP to be configured through the
> sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>
> However, the anonymous shmem will ignore the anonymous mTHP rule configured
> through the sysfs interface, and can only use the PMD-mapped THP, that is not
> reasonable. Many implement anonymous page sharing through mmap(MAP_SHARED |
> MAP_ANONYMOUS), especially in database usage scenarios, therefore, users expect
> to apply an unified mTHP strategy for anonymous pages, also including the
> anonymous shared pages, in order to enjoy the benefits of mTHP. For example,
> lower latency than PMD-mapped THP, smaller memory bloat than PMD-mapped THP,
> contiguous PTEs on ARM architecture to reduce TLB miss etc.
>
> The primary strategy is similar to supporting anonymous mTHP. Introduce
> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> which can have all the same values as the top-level
> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> additional "inherit" option. By default all sizes will be set to "never"
> except PMD size, which is set to "inherit". This ensures backward compatibility
> with the anonymous shmem enabled of the top level, meanwhile also allows
> independent control of anonymous shmem enabled for each mTHP.
>
> Use the page fault latency tool to measure the performance of 1G anonymous shmem
> with 32 threads on my machine environment with: ARM64 Architecture, 32 cores,
> 125G memory:
> base: mm-unstable
> user-time sys_time faults_per_sec_per_cpu faults_per_sec
> 0.04s 3.10s 83516.416 2669684.890
>
> mm-unstable + patchset, anon shmem mTHP disabled
> user-time sys_time faults_per_sec_per_cpu faults_per_sec
> 0.02s 3.14s 82936.359 2630746.027
>
> mm-unstable + patchset, anon shmem 64K mTHP enabled
> user-time sys_time faults_per_sec_per_cpu faults_per_sec
> 0.08s 0.31s 678630.231 17082522.495
>
> From the data above, it is observed that the patchset has a minimal impact when
> mTHP is not enabled (some fluctuations observed during testing). When enabling 64K
> mTHP, there is a significant improvement of the page fault latency.

Let me summarize the takeaway from the bi-weekly MM meeting as I
understood it, which includes Hugh's feedback on per-block tracking vs. mTHP:

(1) Per-block tracking

Per-block tracking is currently considered unwarranted complexity in
shmem.c. We should try to get it done without that. For any test cases
that fail, we should consider if they are actually valid for shmem.

To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
is not possible at fallocate() time, detecting zeropages later and
retrying to split+free might be an option, without per-block tracking.

(2) mTHP controls

As a default, we should not be using large folios / mTHP for any shmem,
just like we did with THP via shmem_enabled. This is what this series
currently does, and is part of the whole mTHP user-space interface design.

Further, the mTHP controls should control all of shmem, not only
"anonymous shmem".

Also, we should properly fall back within the configured sizes, and not
jump "over" configured sizes. Unless there is a good reason.

(3) khugepaged

khugepaged needs to handle larger folios properly as well. Until fixed,
using smaller THP sizes as fallback might prohibit collapsing a
PMD-sized THP later. But really, khugepaged needs to be fixed to handle
that.

(4) force/disable

These settings are rather testing artifacts from the old ages. We should
not add them to the per-size toggles. We might "inherit" it from the
global one, though.

"within_size" might have value, and especially for consistency, we
should have them per size.



So, this series only tackles anonymous shmem, which is a good starting
point. Ideally, we'd get support for other shmem (especially during
fault time) soon afterwards, because we won't be adding separate toggles
for that from the interface POV, and having inconsistent behavior
between kernel versions would be a bit unfortunate.


@Baolin, this series likely does not consider (4) yet. And I suggest we
have to take a lot of the "anonymous thp" terminology out of this
series, especially when it comes to documentation.

@Daniel, Pankaj, what are your plans regarding that? It would be great
if we could get an understanding on the next steps on !anon shmem.

--
Cheers,

David / dhildenb


2024-05-31 10:13:28

by Baolin Wang

Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem



On 2024/5/31 17:35, David Hildenbrand wrote:
> On 30.05.24 04:04, Baolin Wang wrote:
>> Anonymous pages have already been supported for multi-size (mTHP)
>> allocation
>> through commit 19eaf44954df, that can allow THP to be configured
>> through the
>> sysfs interface located at
>> '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
>>
>> However, the anonymous shmem will ignore the anonymous mTHP rule
>> configured
>> through the sysfs interface, and can only use the PMD-mapped THP, that
>> is not
>> reasonable. Many implement anonymous page sharing through
>> mmap(MAP_SHARED |
>> MAP_ANONYMOUS), especially in database usage scenarios, therefore,
>> users expect
>> to apply an unified mTHP strategy for anonymous pages, also including the
>> anonymous shared pages, in order to enjoy the benefits of mTHP. For
>> example,
>> lower latency than PMD-mapped THP, smaller memory bloat than
>> PMD-mapped THP,
>> contiguous PTEs on ARM architecture to reduce TLB miss etc.
>>
>> The primary strategy is similar to supporting anonymous mTHP. Introduce
>> a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
>> which can have all the same values as the top-level
>> '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
>> additional "inherit" option. By default all sizes will be set to "never"
>> except PMD size, which is set to "inherit". This ensures backward
>> compatibility
>> with the anonymous shmem enabled of the top level, meanwhile also allows
>> independent control of anonymous shmem enabled for each mTHP.
>>
>> Use the page fault latency tool to measure the performance of 1G
>> anonymous shmem
>> with 32 threads on my machine environment with: ARM64 Architecture, 32
>> cores,
>> 125G memory:
>> base: mm-unstable
>> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
>> 0.04s        3.10s       83516.416                  2669684.890
>>
>> mm-unstable + patchset, anon shmem mTHP disabled
>> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
>> 0.02s        3.14s       82936.359                  2630746.027
>>
>> mm-unstable + patchset, anon shmem 64K mTHP enabled
>> user-time    sys_time    faults_per_sec_per_cpu     faults_per_sec
>> 0.08s        0.31s       678630.231                 17082522.495
>>
>>  From the data above, it is observed that the patchset has a minimal
>> impact when
>> mTHP is not enabled (some fluctuations observed during testing). When
>> enabling 64K
>> mTHP, there is a significant improvement of the page fault latency.
>
> Let me summarize the takeaway from the bi-weekly MM meeting as I
> understood it, that includes Hugh's feedback on per-block tracking vs.

Thanks David for the summarization.

> mTHP:
>
> (1) Per-block tracking
>
> Per-block tracking is currently considered unwarranted complexity in
> shmem.c. We should try to get it done without that. For any test cases
> that fail, we should consider if they are actually valid for shmem.
>
> To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
> is not possible at fallcoate() time, detecting zeropages later and
> retrying to split+free might be an option, without per-block tracking.
>
> (2) mTHP controls
>
> As a default, we should not be using large folios / mTHP for any shmem,
> just like we did with THP via shmem_enabled. This is what this series
> currently does, and is aprt of the whole mTHP user-space interface design.
>
> Further, the mTHP controls should control all of shmem, not only
> "anonymous shmem".

Yes, that's what I thought and in my TODO list.

>
> Also, we should properly fallback within the configured sizes, and not
> jump "over" configured sizes. Unless there is a good reason.
>
> (3) khugepaged
>
> khugepaged needs to handle larger folios properly as well. Until fixed,
> using smaller THP sizes as fallback might prohibit collapsing a
> PMD-sized THP later. But really, khugepaged needs to be fixed to handle
> that.
>
> (4) force/disable
>
> These settings are rather testing artifacts from the old ages. We should
> not add them to the per-size toggles. We might "inherit" it from the
> global one, though.

Sorry, I missed this. So I should remove the 'force' and 'deny' options
for each mTHP, right?

>
> "within_size" might have value, and especially for consistency, we
> should have them per size.
>
>
>
> So, this series only tackles anonymous shmem, which is a good starting
> point. Ideally, we'd get support for other shmem (especially during
> fault time) soon afterwards, because we won't be adding separate toggles
> for that from the interface POV, and having inconsistent behavior
> between kernel versions would be a bit unfortunate.
>
>
> @Baolin, this series likely does not consider (4) yet. And I suggest we
> have to take a lot of the "anonymous thp" terminology out of this
> series, especially when it comes to documentation.

Sure. I will remove the "anonymous thp" terminology from the
documentation, but I still want to keep it in the commit messages, because
I want to start from anonymous shmem.

>
> @Daniel, Pankaj, what are your plans regarding that? It would be great
> if we could get an understanding on the next steps on !anon shmem.
>

2024-05-31 11:14:04

by David Hildenbrand

Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

>>
>> As a default, we should not be using large folios / mTHP for any shmem,
>> just like we did with THP via shmem_enabled. This is what this series
>> currently does, and is aprt of the whole mTHP user-space interface design.
>>
>> Further, the mTHP controls should control all of shmem, not only
>> "anonymous shmem".
>
> Yes, that's what I thought and in my TODO list.

Good, it would be helpful to coordinate with Daniel and Pankaj.

>
>>
>> Also, we should properly fallback within the configured sizes, and not
>> jump "over" configured sizes. Unless there is a good reason.
>>
>> (3) khugepaged
>>
>> khugepaged needs to handle larger folios properly as well. Until fixed,
>> using smaller THP sizes as fallback might prohibit collapsing a
>> PMD-sized THP later. But really, khugepaged needs to be fixed to handle
>> that.
>>
>> (4) force/disable
>>
>> These settings are rather testing artifacts from the old ages. We should
>> not add them to the per-size toggles. We might "inherit" it from the
>> global one, though.
>
> Sorry, I missed this. So I thould remove the 'force' and 'deny' option
> for each mTHP, right?

Yes, that's my understanding. But we have to keep them on the top level
for any possible user out there.

>
>>
>> "within_size" might have value, and especially for consistency, we
>> should have them per size.
>>
>>
>>
>> So, this series only tackles anonymous shmem, which is a good starting
>> point. Ideally, we'd get support for other shmem (especially during
>> fault time) soon afterwards, because we won't be adding separate toggles
>> for that from the interface POV, and having inconsistent behavior
>> between kernel versions would be a bit unfortunate.
>>
>>
>> @Baolin, this series likely does not consider (4) yet. And I suggest we
>> have to take a lot of the "anonymous thp" terminology out of this
>> series, especially when it comes to documentation.
>
> Sure. I will remove the "anonymous thp" terminology from the
> documentation, but want to still keep it in the commit message, cause I
> want to start from the anonymous shmem.

For the commit message and friends that makes sense. The story should be
"controls all of shmem/tmpfs, but support will be added iteratively. The
first step is anonymous shmem."

--
Cheers,

David / dhildenb


2024-05-31 13:20:19

by Daniel Gomez

Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On Fri, May 31, 2024 at 11:35:30AM +0200, David Hildenbrand wrote:
> On 30.05.24 04:04, Baolin Wang wrote:
> > Anonymous pages have already been supported for multi-size (mTHP) allocation
> > through commit 19eaf44954df, that can allow THP to be configured through the
> > sysfs interface located at '/sys/kernel/mm/transparent_hugepage/hugepage-XXkb/enabled'.
> >
> > However, the anonymous shmem will ignore the anonymous mTHP rule configured
> > through the sysfs interface, and can only use the PMD-mapped THP, that is not
> > reasonable. Many implement anonymous page sharing through mmap(MAP_SHARED |
> > MAP_ANONYMOUS), especially in database usage scenarios, therefore, users expect
> > to apply an unified mTHP strategy for anonymous pages, also including the
> > anonymous shared pages, in order to enjoy the benefits of mTHP. For example,
> > lower latency than PMD-mapped THP, smaller memory bloat than PMD-mapped THP,
> > contiguous PTEs on ARM architecture to reduce TLB miss etc.
> >
> > The primary strategy is similar to supporting anonymous mTHP. Introduce
> > a new interface '/mm/transparent_hugepage/hugepage-XXkb/shmem_enabled',
> > which can have all the same values as the top-level
> > '/sys/kernel/mm/transparent_hugepage/shmem_enabled', with adding a new
> > additional "inherit" option. By default all sizes will be set to "never"
> > except PMD size, which is set to "inherit". This ensures backward compatibility
> > with the anonymous shmem enabled of the top level, meanwhile also allows
> > independent control of anonymous shmem enabled for each mTHP.
> >
> > Use the page fault latency tool to measure the performance of 1G anonymous shmem
> > with 32 threads on my machine environment with: ARM64 Architecture, 32 cores,
> > 125G memory:
> > base: mm-unstable
> > user-time sys_time faults_per_sec_per_cpu faults_per_sec
> > 0.04s 3.10s 83516.416 2669684.890
> >
> > mm-unstable + patchset, anon shmem mTHP disabled
> > user-time sys_time faults_per_sec_per_cpu faults_per_sec
> > 0.02s 3.14s 82936.359 2630746.027
> >
> > mm-unstable + patchset, anon shmem 64K mTHP enabled
> > user-time sys_time faults_per_sec_per_cpu faults_per_sec
> > 0.08s 0.31s 678630.231 17082522.495
> >
> > From the data above, it is observed that the patchset has a minimal impact when
> > mTHP is not enabled (some fluctuations observed during testing). When enabling 64K
> > mTHP, there is a significant improvement of the page fault latency.
>
> Let me summarize the takeaway from the bi-weekly MM meeting as I understood
> it, that includes Hugh's feedback on per-block tracking vs. mTHP:

Thanks David for the summary. Please find below some follow-up questions.

I want to understand whether the zeropage scanning overhead is preferred over
per-block tracking complexity, or if we still need to verify this.

>
> (1) Per-block tracking
>
> Per-block tracking is currently considered unwarranted complexity in
> shmem.c. We should try to get it done without that. For any test cases that
> fail, we should consider if they are actually valid for shmem.

I agree it is unwarranted complexity, but only if this is just to fix lseek():
we can simply make the test pass by checking whether holes are reported as data.
That would be the minimum required for lseek() to be compliant with the syscall.

How can we use per-block tracking for reclaiming memory and what changes would
be needed? Or is per-block really a non-viable option?

Clearly, if per-block is a viable option, the shmem_fault() bug would need to be
fixed first. Any ideas on how to make it reproducible?

The alternatives discussed where sub-page refcounting and zeropage scanning.
The first one is not possible (IIUC) because of a refactor years back that
simplified the code and also requires extra complexity. The second option would
require additional overhead as we would involve scanning.
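
For reference, the scanning overhead would boil down to a per-page zero check
along these lines -- kmap_local_folio() and memchr_inv() are existing helpers,
but the function itself is only an illustration, not code from any posted
series:

/* Illustrative only: is this page of a large folio zero-filled? */
static bool shmem_folio_page_is_zero(struct folio *folio, size_t offset)
{
	void *kaddr = kmap_local_folio(folio, offset);
	bool zero = !memchr_inv(kaddr, 0, PAGE_SIZE);

	kunmap_local(kaddr);
	return zero;
}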

>
> To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
> is not possible at fallocate() time, detecting zeropages later and
> retrying to split+free might be an option, without per-block tracking.

>
> (2) mTHP controls
>
> As a default, we should not be using large folios / mTHP for any shmem, just
> like we did with THP via shmem_enabled. This is what this series currently
> does, and is part of the whole mTHP user-space interface design.

That was clear for me too. But what is the reason we want to boot in 'safe
mode'? What are the implications of not respecting that?

>
> Further, the mTHP controls should control all of shmem, not only "anonymous
> shmem".

As I understood from the call, mTHP with sysctl knobs is preferred over
optimistic falloc/write allocation? But it is still unclear to me why the former
is preferred.

Are large folios a non-viable option?

>
> Also, we should properly fallback within the configured sizes, and not jump
> "over" configured sizes. Unless there is a good reason.
>
> (3) khugepaged
>
> khugepaged needs to handle larger folios properly as well. Until fixed,
> using smaller THP sizes as fallback might prohibit collapsing a PMD-sized
> THP later. But really, khugepaged needs to be fixed to handle that.
>
> (4) force/disable
>
> These settings are rather testing artifacts from the old ages. We should not
> add them to the per-size toggles. We might "inherit" it from the global one,
> though.
>
> "within_size" might have value, and especially for consistency, we should
> have them per size.
>
>
>
> So, this series only tackles anonymous shmem, which is a good starting
> point. Ideally, we'd get support for other shmem (especially during fault
> time) soon afterwards, because we won't be adding separate toggles for that
> from the interface POV, and having inconsistent behavior between kernel
> versions would be a bit unfortunate.
>
>
> @Baolin, this series likely does not consider (4) yet. And I suggest we have
> to take a lot of the "anonymous thp" terminology out of this series,
> especially when it comes to documentation.
>
> @Daniel, Pankaj, what are your plans regarding that? It would be great if we
> could get an understanding on the next steps on !anon shmem.

I realize I've raised so many questions, but it's essential for us to grasp the
mm concerns and usage scenarios. This understanding will provide clarity on the
direction regarding folios for !anon shmem.

>
> --
> Cheers,
>
> David / dhildenb
>

Daniel

2024-05-31 14:43:55

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

Hi Daniel,

>> Let me summarize the takeaway from the bi-weekly MM meeting as I understood
>> it, that includes Hugh's feedback on per-block tracking vs. mTHP:
>
> Thanks David for the summary. Please, find below some follow up questions.
>
> I want to understand if zeropage scanning overhead is preferred over per-block
> tracking complexity or if we still need to verify this.
>
>>
>> (1) Per-block tracking
>>
>> Per-block tracking is currently considered unwarranted complexity in
>> shmem.c. We should try to get it done without that. For any test cases that
>> fail, we should consider if they are actually valid for shmem.
>
> I agree it was unwarranted complexity but only if this is just to fix lseek() as
> we can simply make the test pass by checking if holes are reported as data. That
> would be the minimum required for lseek() to be compliant with the syscall.
>
> How can we use per-block tracking for reclaiming memory and what changes would
> be needed? Or is per-block really a non-viable option?

The interesting thing is: with mTHP toggles it is opt-in -- like for
PMD-sized THP with shmem_enabled -- and we don't have to be that
concerned about this problem right now.

>
> Clearly, if per-block is a viable option, the shmem_fault() bug would need to be
> fixed first. Any ideas on how to make it reproducible?
>
> The alternatives discussed were sub-page refcounting and zeropage scanning.

Yeah, I don't think sub-page refcounting is a feasible (and certainly
not desired ;) ) option in the folio world.

> The first one is not possible (IIUC) because of a refactor years back that
> simplified the code and also requires extra complexity. The second option would
> require additional overhead as we would involve scanning.

We'll likely need something similar (scanning, tracking?) for anonymous
memory as well. There was a proposal for a THP shrinker some time ago,
that would solve part of the problem.

For example, for shmem you could simply flag folios that failed
splitting during fallocate() as reclaim candidates and try to reclaim
memory later. So you don't have to scan arbitrary folios (which might
also be desired, though).

>
>>
>> To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
>> is not possible at fallocate() time, detecting zeropages later and
>> retrying to split+free might be an option, without per-block tracking.
>
>>
>> (2) mTHP controls
>>
>> As a default, we should not be using large folios / mTHP for any shmem, just
>> like we did with THP via shmem_enabled. This is what this series currently
>> does, and is part of the whole mTHP user-space interface design.
>
> That was clear for me too. But what is the reason we want to boot in 'safe
> mode'? What are the implications of not respecting that?

[...]

>
> As I understood from the call, mTHP with sysctl knobs is preferred over
> optimistic falloc/write allocation? But it is still unclear to me why the former
> is preferred.

I think Hugh's point was that this should be an opt-in feature, just
like PMD-sized THP started out, and still is, an opt-in feature.

Problematic interaction with khugepaged (that will be fixed) is one
thing, interaction with memory reclaim (without any kind of memory
reclaim mechanisms in place) might be another one. Controlling and
tuning for specific folio sizes might be another one Hugh raised.
[summarizing what I recall from the discussion, there might be more].

>
> Are large folios a non-viable option?

I think you mean "allocating large folios without user space control".

Because mTHP gives user space full control, to the point where you can
enable all sizes and obtain the same result.
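
Concretely, "enable all sizes" just means writing the per-size knobs this
series adds. A user-space sketch, assuming that sysfs layout (error handling
trimmed):

#include <glob.h>
#include <stdio.h>

int main(void)
{
	glob_t g;
	size_t i;

	if (glob("/sys/kernel/mm/transparent_hugepage/hugepages-*kB/shmem_enabled",
		 0, NULL, &g))
		return 1;

	/* Write "always" into every per-size shmem_enabled knob. */
	for (i = 0; i < g.gl_pathc; i++) {
		FILE *f = fopen(g.gl_pathv[i], "w");

		if (f) {
			fputs("always", f);
			fclose(f);
		}
	}
	globfree(&g);
	return 0;
}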

>
>>
>> Also, we should properly fallback within the configured sizes, and not jump
>> "over" configured sizes. Unless there is a good reason.
>>
>> (3) khugepaged
>>
>> khugepaged needs to handle larger folios properly as well. Until fixed,
>> using smaller THP sizes as fallback might prohibit collapsing a PMD-sized
>> THP later. But really, khugepaged needs to be fixed to handle that.
>>
>> (4) force/disable
>>
>> These settings are rather testing artifacts from the old ages. We should not
>> add them to the per-size toggles. We might "inherit" it from the global one,
>> though.
>>
>> "within_size" might have value, and especially for consistency, we should
>> have them per size.
>>
>>
>>
>> So, this series only tackles anonymous shmem, which is a good starting
>> point. Ideally, we'd get support for other shmem (especially during fault
>> time) soon afterwards, because we won't be adding separate toggles for that
>> from the interface POV, and having inconsistent behavior between kernel
>> versions would be a bit unfortunate.
>>
>>
>> @Baolin, this series likely does not consider (4) yet. And I suggest we have
>> to take a lot of the "anonymous thp" terminology out of this series,
>> especially when it comes to documentation.
>>
>> @Daniel, Pankaj, what are your plans regarding that? It would be great if we
>> could get an understanding on the next steps on !anon shmem.
>
> I realize I've raised so many questions, but it's essential for us to grasp the
> mm concerns and usage scenarios. This understanding will provide clarity on the
> direction regarding folios for !anon shmem.

If I understood correctly, Hugh had strong feelings against not
respecting mTHP toggles for shmem. Without per-block tracking, I agree
with that.

--
Cheers,

David / dhildenb


2024-06-02 04:16:01

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem



On 2024/5/31 19:13, David Hildenbrand wrote:
>>>
>>> As a default, we should not be using large folios / mTHP for any shmem,
>>> just like we did with THP via shmem_enabled. This is what this series
>>> currently does, and is part of the whole mTHP user-space interface
>>> design.
>>>
>>> Further, the mTHP controls should control all of shmem, not only
>>> "anonymous shmem".
>>
>> Yes, that's what I thought and in my TODO list.
>
> Good, it would be helpful to coordinate with Daniel and Pankaj.
>
>>
>>>
>>> Also, we should properly fallback within the configured sizes, and not
>>> jump "over" configured sizes. Unless there is a good reason.
>>>
>>> (3) khugepaged
>>>
>>> khugepaged needs to handle larger folios properly as well. Until fixed,
>>> using smaller THP sizes as fallback might prohibit collapsing a
>>> PMD-sized THP later. But really, khugepaged needs to be fixed to handle
>>> that.
>>> (4) force/disable
>>>
>>> These settings are rather testing artifacts from the old ages. We should
>>> not add them to the per-size toggles. We might "inherit" it from the
>>> global one, though.
>>
>> Sorry, I missed this. So I should remove the 'force' and 'deny' option
>> for each mTHP, right?
>
> Yes, that's my understanding. But we have to keep them on the top level
> for any possible user out there.

OK.

>>>
>>> "within_size" might have value, and especially for consistency, we
>>> should have them per size.
>>>
>>>
>>>
>>> So, this series only tackles anonymous shmem, which is a good starting
>>> point. Ideally, we'd get support for other shmem (especially during
>>> fault time) soon afterwards, because we won't be adding separate toggles
>>> for that from the interface POV, and having inconsistent behavior
>>> between kernel versions would be a bit unfortunate.
>>>
>>>
>>> @Baolin, this series likely does not consider (4) yet. And I suggest we
>>> have to take a lot of the "anonymous thp" terminology out of this
>>> series, especially when it comes to documentation.
>>
>> Sure. I will remove the "anonymous thp" terminology from the
>> documentation, but want to still keep it in the commit message, cause I
>> want to start from the anonymous shmem.
>
> For commit message and friends makes sense. The story should be
> "controls all of shmem/tmpfs, but support will be added iteratively. The
> first step is anonymous shmem."

Sure. Thanks.

2024-06-02 04:17:07

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] mm: shmem: add mTHP support for anonymous shmem



On 2024/5/30 14:36, kernel test robot wrote:
> Hi Baolin,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on akpm-mm/mm-everything]
> [also build test WARNING on next-20240529]
> [cannot apply to linus/master v6.10-rc1]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/mm-memory-extend-finish_fault-to-support-large-folio/20240530-100805
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/ec35a23026dd016705d211e85163cabe07681516.1717033868.git.baolin.wang%40linux.alibaba.com
> patch subject: [PATCH v3 4/6] mm: shmem: add mTHP support for anonymous shmem
> config: riscv-defconfig (https://download.01.org/0day-ci/archive/20240530/[email protected]/config)
> compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project bafda89a0944d947fc4b3b5663185e07a397ac30)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240530/[email protected]/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
>
> All warnings (new ones prefixed by >>):
>
> In file included from mm/shmem.c:28:
> In file included from include/linux/ramfs.h:5:
> In file included from include/linux/fs_parser.h:11:
> In file included from include/linux/fs_context.h:14:
> In file included from include/linux/security.h:33:
> In file included from include/linux/mm.h:2245:
> include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
> 514 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
> | ~~~~~~~~~~~ ^ ~~~
>>> mm/shmem.c:1748:14: warning: variable 'suitable_orders' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
> 1748 | } else if (orders & BIT(HPAGE_PMD_ORDER)) {
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> mm/shmem.c:1766:25: note: uninitialized use occurs here
> 1766 | order = highest_order(suitable_orders);
> | ^~~~~~~~~~~~~~~
> mm/shmem.c:1748:10: note: remove the 'if' if its condition is always true
> 1748 | } else if (orders & BIT(HPAGE_PMD_ORDER)) {
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> mm/shmem.c:1736:31: note: initialize the variable 'suitable_orders' to silence this warning
> 1736 | unsigned long suitable_orders;
> | ^
> | = 0
> 2 warnings generated.

Thanks for reporting. Will fix the warning in next version.

2024-06-02 04:36:46

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 3/6] mm: shmem: add multi-size THP sysfs interface for anonymous shmem



On 2024/6/1 11:29, wang wei wrote:
> At 2024-05-30 10:04:14, "Baolin Wang" <[email protected]> wrote:
>
>>To support the use of mTHP with anonymous shmem, add a new sysfs interface
>>'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>>directory for each mTHP to control whether shmem is enabled for that mTHP,
>>with a value similar to the top level 'shmem_enabled', which can be set to:
>>"always", "inherit (to inherit the top level setting)", "within_size", "advise",
>>"never", "deny", "force". These values follow the same semantics as the top
>>level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>>to 'always' to keep compatibility.
>>
>>By default, PMD-sized hugepages have enabled="inherit" and all other hugepage
>>sizes have enabled="never" for '/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/shmem_enabled'.
>>
>>In addition, if top level value is 'force', then only PMD-sized hugepages
>>have enabled="inherit", otherwise configuration will be failed and vice versa.
>>That means now we will avoid using non-PMD sized THP to override the global
>>huge allocation.
>>
>>Signed-off-by: Baolin Wang <[email protected]>
>>---
>> Documentation/admin-guide/mm/transhuge.rst | 29 +++++++
>> include/linux/huge_mm.h | 10 +++
>> mm/huge_memory.c | 11 +--
>> mm/shmem.c | 96 ++++++++++++++++++++++
>> 4 files changed, 138 insertions(+), 8 deletions(-)
>>
>>diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>>index d414d3f5592a..659459374032 100644
>>--- a/Documentation/admin-guide/mm/transhuge.rst
>>+++ b/Documentation/admin-guide/mm/transhuge.rst
>>@@ -332,6 +332,35 @@ deny
>> force
>> Force the huge option on for all - very useful for testing;
>>
>>+Anonymous shmem can also use "multi-size THP" (mTHP) by adding a new sysfs knob
>>+to control mTHP allocation: /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled.
>>+Its value for each mTHP is essentially consistent with the global setting, except
>>+for the addition of 'inherit' to ensure compatibility with the global settings.
>>+always
>>+ Attempt to allocate <size> huge pages every time we need a new page;
>>+
>>+inherit
>>+ Inherit the top-level "shmem_enabled" value. By default, PMD-sized hugepages
>>+ have enabled="inherit" and all other hugepage sizes have enabled="never";
>>+
>>+never
>>+ Do not allocate <size> huge pages;
>>+
>>+within_size
>>+ Only allocate <size> huge page if it will be fully within i_size.
>>+ Also respect fadvise()/madvise() hints;
>>+
>>+advise
>>+ Only allocate <size> huge pages if requested with fadvise()/madvise();
>>+
>>+deny
>>+ Has the same semantics as 'never', now mTHP allocation policy is only
>>+ used for anonymous shmem and no not override tmpfs.
>>+
>>+force
>>+ Has the same semantics as 'always', now mTHP allocation policy is only
>>+ used for anonymous shmem and no not override tmpfs.
> >+
>
> I just briefly reviewed the discussion about the value of
> hugepages-<size>kB/shmem_enabled
> in V1 [PATCH 5/8]. Is there a conclusion now? Maybe I left out some
> important information.

You can refer to this patch's commit message and documentation,
which are based on the conclusions of previous discussions.

In addition, you can also read more discussions from the last bi-weekly
MM meeting[1], summarized by David.

[1]
https://lore.kernel.org/all/[email protected]/#t

2024-06-03 04:45:01

by Lance Yang

[permalink] [raw]
Subject: Re: [PATCH v3 1/6] mm: memory: extend finish_fault() to support large folio

On Thu, May 30, 2024 at 10:04 AM Baolin Wang
<[email protected]> wrote:
>
> Add large folio mapping establishment support for finish_fault() as a preparation,
> to support multi-size THP allocation of anonymous shmem pages in the following
> patches.
>
> Signed-off-by: Baolin Wang <[email protected]>
> ---
> mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 48 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index eef4e482c0c2..435187ff7ea4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
> struct page *page;
> + struct folio *folio;
> vm_fault_t ret;
> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> !(vma->vm_flags & VM_SHARED);
> + int type, nr_pages, i;
> + unsigned long addr = vmf->address;
>
> /* Did we COW the page? */
> if (is_cow)
> @@ -4864,24 +4867,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> return VM_FAULT_OOM;
> }
>
> + folio = page_folio(page);
> + nr_pages = folio_nr_pages(folio);
> +
> + /*
> + * Using per-page fault to maintain the uffd semantics, and same
> + * approach also applies to non-anonymous-shmem faults to avoid
> + * inflating the RSS of the process.
> + */
> + if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> + nr_pages = 1;
> + } else if (nr_pages > 1) {
> + pgoff_t idx = folio_page_idx(folio, page);
> + /* The page offset of vmf->address within the VMA. */
> + pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> +
> + /*
> + * Fallback to per-page fault in case the folio size in page
> + * cache beyond the VMA limits.
> + */
> + if (unlikely(vma_off < idx ||
> + vma_off + (nr_pages - idx) > vma_pages(vma))) {
> + nr_pages = 1;
> + } else {
> + /* Now we can set mappings for the whole large folio. */
> + addr = vmf->address - idx * PAGE_SIZE;
> + page = &folio->page;
> + }
> + }
> +
> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> - vmf->address, &vmf->ptl);
> + addr, &vmf->ptl);
> if (!vmf->pte)
> return VM_FAULT_NOPAGE;
>
> /* Re-check under ptl */
> - if (likely(!vmf_pte_changed(vmf))) {
> - struct folio *folio = page_folio(page);
> - int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> -
> - set_pte_range(vmf, folio, page, 1, vmf->address);
> - add_mm_counter(vma->vm_mm, type, 1);
> - ret = 0;
> - } else {
> - update_mmu_tlb(vma, vmf->address, vmf->pte);
> + if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
> + update_mmu_tlb(vma, addr, vmf->pte);
> ret = VM_FAULT_NOPAGE;
> + goto unlock;
> + } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> + for (i = 0; i < nr_pages; i++)
> + update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);

Just a friendly reminder: Bang has added update_mmu_tlb_range()[1], which
updates the TLB in batches, so we can use it instead of the update_mmu_tlb()
loop.

[1] https://lore.kernel.org/linux-mm/[email protected]/
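
For illustration, assuming that helper lands with the signature it has in [1]
(vma, address, ptep, nr), the loop above would collapse to something like:

	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
		/* One batched TLB update instead of nr_pages single calls. */
		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
		ret = VM_FAULT_NOPAGE;
		goto unlock;
	}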

Thanks,
Lance

> + ret = VM_FAULT_NOPAGE;
> + goto unlock;
> }
>
> + folio_ref_add(folio, nr_pages - 1);
> + set_pte_range(vmf, folio, page, nr_pages, addr);
> + type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> + add_mm_counter(vma->vm_mm, type, nr_pages);
> + ret = 0;
> +
> +unlock:
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> return ret;
> }
> --
> 2.39.3
>

2024-06-03 05:29:18

by Barry Song

[permalink] [raw]
Subject: Re: [PATCH v3 1/6] mm: memory: extend finish_fault() to support large folio

On Thu, May 30, 2024 at 2:04 PM Baolin Wang
<[email protected]> wrote:
>
> Add large folio mapping establishment support for finish_fault() as a preparation,
> to support multi-size THP allocation of anonymous shmem pages in the following
> patches.
>
> Signed-off-by: Baolin Wang <[email protected]>
> ---
> mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 48 insertions(+), 10 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index eef4e482c0c2..435187ff7ea4 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
> struct page *page;
> + struct folio *folio;
> vm_fault_t ret;
> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> !(vma->vm_flags & VM_SHARED);
> + int type, nr_pages, i;
> + unsigned long addr = vmf->address;
>
> /* Did we COW the page? */
> if (is_cow)
> @@ -4864,24 +4867,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> return VM_FAULT_OOM;
> }
>
> + folio = page_folio(page);
> + nr_pages = folio_nr_pages(folio);
> +
> + /*
> + * Using per-page fault to maintain the uffd semantics, and same
> + * approach also applies to non-anonymous-shmem faults to avoid
> + * inflating the RSS of the process.

I don't feel the comment explains the root cause.
For non-shmem, anyway we have allocated the memory? Avoiding inflating
RSS seems not so useful as we have occupied the memory. the memory footprint
is what we really care about. so we want to rely on read-ahead hints of subpage
to determine read-ahead size? that is why we don't map nr_pages for non-shmem
files though we can potentially reduce nr_pages - 1 page faults?

> + */
> + if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> + nr_pages = 1;
> + } else if (nr_pages > 1) {
> + pgoff_t idx = folio_page_idx(folio, page);
> + /* The page offset of vmf->address within the VMA. */
> + pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> +
> + /*
> + * Fallback to per-page fault in case the folio size in page
> + * cache beyond the VMA limits.
> + */
> + if (unlikely(vma_off < idx ||
> + vma_off + (nr_pages - idx) > vma_pages(vma))) {
> + nr_pages = 1;
> + } else {
> + /* Now we can set mappings for the whole large folio. */
> + addr = vmf->address - idx * PAGE_SIZE;
> + page = &folio->page;
> + }
> + }
> +
> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> - vmf->address, &vmf->ptl);
> + addr, &vmf->ptl);
> if (!vmf->pte)
> return VM_FAULT_NOPAGE;
>
> /* Re-check under ptl */
> - if (likely(!vmf_pte_changed(vmf))) {
> - struct folio *folio = page_folio(page);
> - int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> -
> - set_pte_range(vmf, folio, page, 1, vmf->address);
> - add_mm_counter(vma->vm_mm, type, 1);
> - ret = 0;
> - } else {
> - update_mmu_tlb(vma, vmf->address, vmf->pte);
> + if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
> + update_mmu_tlb(vma, addr, vmf->pte);
> ret = VM_FAULT_NOPAGE;
> + goto unlock;
> + } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {

In what case can't we use !pte_range_none(vmf->pte, 1) for nr_pages == 1,
and then unify the code for nr_pages == 1 and nr_pages > 1?

It seems this has been discussed before, but I forget the reason.

> + for (i = 0; i < nr_pages; i++)
> + update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
> + ret = VM_FAULT_NOPAGE;
> + goto unlock;
> }
>
> + folio_ref_add(folio, nr_pages - 1);
> + set_pte_range(vmf, folio, page, nr_pages, addr);
> + type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> + add_mm_counter(vma->vm_mm, type, nr_pages);
> + ret = 0;
> +
> +unlock:
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> return ret;
> }
> --
> 2.39.3
>

Thanks
Barry

2024-06-03 08:04:54

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 1/6] mm: memory: extend finish_fault() to support large folio



On 2024/6/3 12:44, Lance Yang wrote:
> On Thu, May 30, 2024 at 10:04 AM Baolin Wang
> <[email protected]> wrote:
>>
>> Add large folio mapping establishment support for finish_fault() as a preparation,
>> to support multi-size THP allocation of anonymous shmem pages in the following
>> patches.
>>
>> Signed-off-by: Baolin Wang <[email protected]>
>> ---
>> mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
>> 1 file changed, 48 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index eef4e482c0c2..435187ff7ea4 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>> {
>> struct vm_area_struct *vma = vmf->vma;
>> struct page *page;
>> + struct folio *folio;
>> vm_fault_t ret;
>> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>> !(vma->vm_flags & VM_SHARED);
>> + int type, nr_pages, i;
>> + unsigned long addr = vmf->address;
>>
>> /* Did we COW the page? */
>> if (is_cow)
>> @@ -4864,24 +4867,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>> return VM_FAULT_OOM;
>> }
>>
>> + folio = page_folio(page);
>> + nr_pages = folio_nr_pages(folio);
>> +
>> + /*
>> + * Using per-page fault to maintain the uffd semantics, and same
>> + * approach also applies to non-anonymous-shmem faults to avoid
>> + * inflating the RSS of the process.
>> + */
>> + if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
>> + nr_pages = 1;
>> + } else if (nr_pages > 1) {
>> + pgoff_t idx = folio_page_idx(folio, page);
>> + /* The page offset of vmf->address within the VMA. */
>> + pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>> +
>> + /*
>> + * Fallback to per-page fault in case the folio size in page
>> + * cache beyond the VMA limits.
>> + */
>> + if (unlikely(vma_off < idx ||
>> + vma_off + (nr_pages - idx) > vma_pages(vma))) {
>> + nr_pages = 1;
>> + } else {
>> + /* Now we can set mappings for the whole large folio. */
>> + addr = vmf->address - idx * PAGE_SIZE;
>> + page = &folio->page;
>> + }
>> + }
>> +
>> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>> - vmf->address, &vmf->ptl);
>> + addr, &vmf->ptl);
>> if (!vmf->pte)
>> return VM_FAULT_NOPAGE;
>>
>> /* Re-check under ptl */
>> - if (likely(!vmf_pte_changed(vmf))) {
>> - struct folio *folio = page_folio(page);
>> - int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>> -
>> - set_pte_range(vmf, folio, page, 1, vmf->address);
>> - add_mm_counter(vma->vm_mm, type, 1);
>> - ret = 0;
>> - } else {
>> - update_mmu_tlb(vma, vmf->address, vmf->pte);
>> + if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>> + update_mmu_tlb(vma, addr, vmf->pte);
>> ret = VM_FAULT_NOPAGE;
>> + goto unlock;
>> + } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>> + for (i = 0; i < nr_pages; i++)
>> + update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
>
> Just a friendly reminder: Bang has added the update_mmu_tlb_range()[1] batch
> function to update TLB in batches, so we can use it instead of the
> update_mmu_tlb()
> loop.
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/

Good point, I will use the new helper instead.

2024-06-03 08:30:36

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 1/6] mm: memory: extend finish_fault() to support large folio



On 2024/6/3 13:28, Barry Song wrote:
> On Thu, May 30, 2024 at 2:04 PM Baolin Wang
> <[email protected]> wrote:
>>
>> Add large folio mapping establishment support for finish_fault() as a preparation,
>> to support multi-size THP allocation of anonymous shmem pages in the following
>> patches.
>>
>> Signed-off-by: Baolin Wang <[email protected]>
>> ---
>> mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
>> 1 file changed, 48 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index eef4e482c0c2..435187ff7ea4 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>> {
>> struct vm_area_struct *vma = vmf->vma;
>> struct page *page;
>> + struct folio *folio;
>> vm_fault_t ret;
>> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>> !(vma->vm_flags & VM_SHARED);
>> + int type, nr_pages, i;
>> + unsigned long addr = vmf->address;
>>
>> /* Did we COW the page? */
>> if (is_cow)
>> @@ -4864,24 +4867,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>> return VM_FAULT_OOM;
>> }
>>
>> + folio = page_folio(page);
>> + nr_pages = folio_nr_pages(folio);
>> +
>> + /*
>> + * Using per-page fault to maintain the uffd semantics, and same
>> + * approach also applies to non-anonymous-shmem faults to avoid
>> + * inflating the RSS of the process.
>
> I don't feel the comment explains the root cause.
> For non-shmem, anyway we have allocated the memory? Avoiding inflating
> RSS seems not so useful as we have occupied the memory. the memory footprint

This is also to keep the same behavior as before for non-anon-shmem, and
will be discussed in the future.

> is what we really care about. so we want to rely on read-ahead hints of subpage
> to determine read-ahead size? that is why we don't map nr_pages for non-shmem
> files though we can potentially reduce nr_pages - 1 page faults?

IMHO, there are 2 cases for non-anon-shmem:
(1) read mmap() faults: we can rely on the 'fault_around_bytes'
interface to determine what size of mapping to build.
(2) writable mmap() faults: I want to keep the same behavior as before
(per-page fault), but we can talk about this when I send new patches to
use mTHP to control large folio allocation for writable mmap().
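
For reference, 'fault_around_bytes' is the existing debugfs knob; a small
user-space sketch (not part of this series) that reads it and then sets it to
4096 to get the per-page behavior:

#include <stdio.h>

int main(void)
{
	const char *knob = "/sys/kernel/debug/fault_around_bytes";
	unsigned long bytes = 0;
	FILE *f = fopen(knob, "r");

	if (f) {
		if (fscanf(f, "%lu", &bytes) == 1)
			printf("fault_around_bytes = %lu\n", bytes);
		fclose(f);
	}

	/* 4096 maps only the faulting page, i.e. the per-page behavior above. */
	f = fopen(knob, "w");
	if (f) {
		fputs("4096", f);
		fclose(f);
	}
	return 0;
}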

>> + */
>> + if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
>> + nr_pages = 1;
>> + } else if (nr_pages > 1) {
>> + pgoff_t idx = folio_page_idx(folio, page);
>> + /* The page offset of vmf->address within the VMA. */
>> + pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>> +
>> + /*
>> + * Fallback to per-page fault in case the folio size in page
>> + * cache beyond the VMA limits.
>> + */
>> + if (unlikely(vma_off < idx ||
>> + vma_off + (nr_pages - idx) > vma_pages(vma))) {
>> + nr_pages = 1;
>> + } else {
>> + /* Now we can set mappings for the whole large folio. */
>> + addr = vmf->address - idx * PAGE_SIZE;
>> + page = &folio->page;
>> + }
>> + }
>> +
>> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>> - vmf->address, &vmf->ptl);
>> + addr, &vmf->ptl);
>> if (!vmf->pte)
>> return VM_FAULT_NOPAGE;
>>
>> /* Re-check under ptl */
>> - if (likely(!vmf_pte_changed(vmf))) {
>> - struct folio *folio = page_folio(page);
>> - int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>> -
>> - set_pte_range(vmf, folio, page, 1, vmf->address);
>> - add_mm_counter(vma->vm_mm, type, 1);
>> - ret = 0;
>> - } else {
>> - update_mmu_tlb(vma, vmf->address, vmf->pte);
>> + if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>> + update_mmu_tlb(vma, addr, vmf->pte);
>> ret = VM_FAULT_NOPAGE;
>> + goto unlock;
>> + } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>
> In what case we can't use !pte_range_none(vmf->pte, 1) for nr_pages == 1
> then unify the code for nr_pages==1 and nr_pages > 1?
>
> It seems this has been discussed before, but I forget the reason.

IIUC, this is for the uffd case, which is not a none pte entry.

2024-06-03 08:58:51

by Barry Song

[permalink] [raw]
Subject: Re: [PATCH v3 1/6] mm: memory: extend finish_fault() to support large folio

On Mon, Jun 3, 2024 at 8:29 PM Baolin Wang
<[email protected]> wrote:
>
>
>
> On 2024/6/3 13:28, Barry Song wrote:
> > On Thu, May 30, 2024 at 2:04 PM Baolin Wang
> > <[email protected]> wrote:
> >>
> >> Add large folio mapping establishment support for finish_fault() as a preparation,
> >> to support multi-size THP allocation of anonymous shmem pages in the following
> >> patches.
> >>
> >> Signed-off-by: Baolin Wang <[email protected]>
> >> ---
> >> mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
> >> 1 file changed, 48 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index eef4e482c0c2..435187ff7ea4 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >> {
> >> struct vm_area_struct *vma = vmf->vma;
> >> struct page *page;
> >> + struct folio *folio;
> >> vm_fault_t ret;
> >> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> >> !(vma->vm_flags & VM_SHARED);
> >> + int type, nr_pages, i;
> >> + unsigned long addr = vmf->address;
> >>
> >> /* Did we COW the page? */
> >> if (is_cow)
> >> @@ -4864,24 +4867,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >> return VM_FAULT_OOM;
> >> }
> >>
> >> + folio = page_folio(page);
> >> + nr_pages = folio_nr_pages(folio);
> >> +
> >> + /*
> >> + * Using per-page fault to maintain the uffd semantics, and same
> >> + * approach also applies to non-anonymous-shmem faults to avoid
> >> + * inflating the RSS of the process.
> >
> > I don't feel the comment explains the root cause.
> > For non-shmem, anyway we have allocated the memory? Avoiding inflating
> > RSS seems not so useful as we have occupied the memory. the memory footprint
>
> This is also to keep the same behavior as before for non-anon-shmem, and
> will be discussed in the future.

OK.

>
> > is what we really care about. so we want to rely on read-ahead hints of subpage
> > to determine read-ahead size? that is why we don't map nr_pages for non-shmem
> > files though we can potentially reduce nr_pages - 1 page faults?
>
> IMHO, there is 2 cases for non-anon-shmem:
> (1) read mmap() faults: we can rely on the 'fault_around_bytes'
> > interface to determine what size of mapping to build.
> (2) writable mmap() faults: I want to keep the same behavior as before
> (per-page fault), but we can talk about this when I send new patches to
> use mTHP to control large folio allocation for writable mmap().

OK.

>
> >> + */
> >> + if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> >> + nr_pages = 1;
> >> + } else if (nr_pages > 1) {
> >> + pgoff_t idx = folio_page_idx(folio, page);
> >> + /* The page offset of vmf->address within the VMA. */
> >> + pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> >> +
> >> + /*
> >> + * Fallback to per-page fault in case the folio size in page
> >> + * cache beyond the VMA limits.
> >> + */
> >> + if (unlikely(vma_off < idx ||
> >> + vma_off + (nr_pages - idx) > vma_pages(vma))) {
> >> + nr_pages = 1;
> >> + } else {
> >> + /* Now we can set mappings for the whole large folio. */
> >> + addr = vmf->address - idx * PAGE_SIZE;
> >> + page = &folio->page;
> >> + }
> >> + }
> >> +
> >> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> >> - vmf->address, &vmf->ptl);
> >> + addr, &vmf->ptl);
> >> if (!vmf->pte)
> >> return VM_FAULT_NOPAGE;
> >>
> >> /* Re-check under ptl */
> >> - if (likely(!vmf_pte_changed(vmf))) {
> >> - struct folio *folio = page_folio(page);
> >> - int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> >> -
> >> - set_pte_range(vmf, folio, page, 1, vmf->address);
> >> - add_mm_counter(vma->vm_mm, type, 1);
> >> - ret = 0;
> >> - } else {
> >> - update_mmu_tlb(vma, vmf->address, vmf->pte);
> >> + if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
> >> + update_mmu_tlb(vma, addr, vmf->pte);
> >> ret = VM_FAULT_NOPAGE;
> >> + goto unlock;
> >> + } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> >
> > In what case we can't use !pte_range_none(vmf->pte, 1) for nr_pages == 1
> > then unify the code for nr_pages==1 and nr_pages > 1?
> >
> > It seems this has been discussed before, but I forget the reason.
>
> IIUC, this is for uffd case, which is not a none pte entry.

Is it possible to have a COW case for shmem? For example, if someone
maps a shmem
file as read-only and then writes to it, would that prevent the use of
pte_range_none?

Furthermore, if we encounter a large folio in shmem while reading,
does it necessarily
mean we can map the entire folio? Is it possible for some processes to
only map part
of large folios? For instance, if process A allocates large folios and
process B maps
only part of this shmem file or partially unmaps a large folio, how
would that be handled?

Apologies for not debugging this thoroughly, but these two corner
cases seem worth
considering. If these scenarios have already been addressed, please disregard my
comments.

Thanks
Barry

2024-06-03 09:01:47

by Barry Song

[permalink] [raw]
Subject: Re: [PATCH v3 1/6] mm: memory: extend finish_fault() to support large folio

On Mon, Jun 3, 2024 at 8:58 PM Barry Song <[email protected]> wrote:
>
> On Mon, Jun 3, 2024 at 8:29 PM Baolin Wang
> <[email protected]> wrote:
> >
> >
> >
> > On 2024/6/3 13:28, Barry Song wrote:
> > > On Thu, May 30, 2024 at 2:04 PM Baolin Wang
> > > <[email protected]> wrote:
> > >>
> > >> Add large folio mapping establishment support for finish_fault() as a preparation,
> > >> to support multi-size THP allocation of anonymous shmem pages in the following
> > >> patches.
> > >>
> > >> Signed-off-by: Baolin Wang <[email protected]>
> > >> ---
> > >> mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
> > >> 1 file changed, 48 insertions(+), 10 deletions(-)
> > >>
> > >> diff --git a/mm/memory.c b/mm/memory.c
> > >> index eef4e482c0c2..435187ff7ea4 100644
> > >> --- a/mm/memory.c
> > >> +++ b/mm/memory.c
> > >> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> > >> {
> > >> struct vm_area_struct *vma = vmf->vma;
> > >> struct page *page;
> > >> + struct folio *folio;
> > >> vm_fault_t ret;
> > >> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> > >> !(vma->vm_flags & VM_SHARED);
> > >> + int type, nr_pages, i;
> > >> + unsigned long addr = vmf->address;
> > >>
> > >> /* Did we COW the page? */
> > >> if (is_cow)
> > >> @@ -4864,24 +4867,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> > >> return VM_FAULT_OOM;
> > >> }
> > >>
> > >> + folio = page_folio(page);
> > >> + nr_pages = folio_nr_pages(folio);
> > >> +
> > >> + /*
> > >> + * Using per-page fault to maintain the uffd semantics, and same
> > >> + * approach also applies to non-anonymous-shmem faults to avoid
> > >> + * inflating the RSS of the process.
> > >
> > > I don't feel the comment explains the root cause.
> > > For non-shmem, anyway we have allocated the memory? Avoiding inflating
> > > RSS seems not so useful as we have occupied the memory. the memory footprint
> >
> > This is also to keep the same behavior as before for non-anon-shmem, and
> > will be discussed in the future.
>
> OK.
>
> >
> > > is what we really care about. so we want to rely on read-ahead hints of subpage
> > > to determine read-ahead size? that is why we don't map nr_pages for non-shmem
> > > files though we can potentially reduce nr_pages - 1 page faults?
> >
> > IMHO, there is 2 cases for non-anon-shmem:
> > (1) read mmap() faults: we can rely on the 'fault_around_bytes'
> > > interface to determine what size of mapping to build.
> > (2) writable mmap() faults: I want to keep the same behavior as before
> > (per-page fault), but we can talk about this when I send new patches to
> > use mTHP to control large folio allocation for writable mmap().
>
> OK.
>
> >
> > >> + */
> > >> + if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
> > >> + nr_pages = 1;
> > >> + } else if (nr_pages > 1) {
> > >> + pgoff_t idx = folio_page_idx(folio, page);
> > >> + /* The page offset of vmf->address within the VMA. */
> > >> + pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> > >> +
> > >> + /*
> > >> + * Fallback to per-page fault in case the folio size in page
> > >> + * cache beyond the VMA limits.
> > >> + */
> > >> + if (unlikely(vma_off < idx ||
> > >> + vma_off + (nr_pages - idx) > vma_pages(vma))) {
> > >> + nr_pages = 1;
> > >> + } else {
> > >> + /* Now we can set mappings for the whole large folio. */
> > >> + addr = vmf->address - idx * PAGE_SIZE;
> > >> + page = &folio->page;
> > >> + }
> > >> + }
> > >> +
> > >> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
> > >> - vmf->address, &vmf->ptl);
> > >> + addr, &vmf->ptl);
> > >> if (!vmf->pte)
> > >> return VM_FAULT_NOPAGE;
> > >>
> > >> /* Re-check under ptl */
> > >> - if (likely(!vmf_pte_changed(vmf))) {
> > >> - struct folio *folio = page_folio(page);
> > >> - int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
> > >> -
> > >> - set_pte_range(vmf, folio, page, 1, vmf->address);
> > >> - add_mm_counter(vma->vm_mm, type, 1);
> > >> - ret = 0;
> > >> - } else {
> > >> - update_mmu_tlb(vma, vmf->address, vmf->pte);
> > >> + if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
> > >> + update_mmu_tlb(vma, addr, vmf->pte);
> > >> ret = VM_FAULT_NOPAGE;
> > >> + goto unlock;
> > >> + } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
> > >
> > > In what case we can't use !pte_range_none(vmf->pte, 1) for nr_pages == 1
> > > then unify the code for nr_pages==1 and nr_pages > 1?
> > >
> > > It seems this has been discussed before, but I forget the reason.
> >
> > IIUC, this is for uffd case, which is not a none pte entry.
>
> Is it possible to have a COW case for shmem? For example, if someone
> maps a shmem
> file as read-only and then writes to it, would that prevent the use of
> pte_range_none?

Sorry, I meant PRIVATE, not READ-ONLY.

>
> Furthermore, if we encounter a large folio in shmem while reading,
> does it necessarily
> mean we can map the entire folio? Is it possible for some processes to
> only map part
> of large folios? For instance, if process A allocates large folios and
> process B maps
> only part of this shmem file or partially unmaps a large folio, how
> would that be handled?
>
> Apologies for not debugging this thoroughly, but these two corner
> cases seem worth
> considering. If these scenarios have already been addressed, please disregard my
> comments.
>
> Thanks
> Barry

2024-06-03 09:43:05

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 1/6] mm: memory: extend finish_fault() to support large folio



On 2024/6/3 17:01, Barry Song wrote:
> On Mon, Jun 3, 2024 at 8:58 PM Barry Song <[email protected]> wrote:
>>
>> On Mon, Jun 3, 2024 at 8:29 PM Baolin Wang
>> <[email protected]> wrote:
>>>
>>>
>>>
>>> On 2024/6/3 13:28, Barry Song wrote:
>>>> On Thu, May 30, 2024 at 2:04 PM Baolin Wang
>>>> <[email protected]> wrote:
>>>>>
>>>>> Add large folio mapping establishment support for finish_fault() as a preparation,
>>>>> to support multi-size THP allocation of anonymous shmem pages in the following
>>>>> patches.
>>>>>
>>>>> Signed-off-by: Baolin Wang <[email protected]>
>>>>> ---
>>>>> mm/memory.c | 58 ++++++++++++++++++++++++++++++++++++++++++++---------
>>>>> 1 file changed, 48 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>> index eef4e482c0c2..435187ff7ea4 100644
>>>>> --- a/mm/memory.c
>>>>> +++ b/mm/memory.c
>>>>> @@ -4831,9 +4831,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>> {
>>>>> struct vm_area_struct *vma = vmf->vma;
>>>>> struct page *page;
>>>>> + struct folio *folio;
>>>>> vm_fault_t ret;
>>>>> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>>>> !(vma->vm_flags & VM_SHARED);
>>>>> + int type, nr_pages, i;
>>>>> + unsigned long addr = vmf->address;
>>>>>
>>>>> /* Did we COW the page? */
>>>>> if (is_cow)
>>>>> @@ -4864,24 +4867,59 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>>> return VM_FAULT_OOM;
>>>>> }
>>>>>
>>>>> + folio = page_folio(page);
>>>>> + nr_pages = folio_nr_pages(folio);
>>>>> +
>>>>> + /*
>>>>> + * Using per-page fault to maintain the uffd semantics, and same
>>>>> + * approach also applies to non-anonymous-shmem faults to avoid
>>>>> + * inflating the RSS of the process.
>>>>
>>>> I don't feel the comment explains the root cause.
>>>> For non-shmem, anyway we have allocated the memory? Avoiding inflating
>>>> RSS seems not so useful as we have occupied the memory. the memory footprint
>>>
>>> This is also to keep the same behavior as before for non-anon-shmem, and
>>> will be discussed in the future.
>>
>> OK.
>>
>>>
>>>> is what we really care about. so we want to rely on read-ahead hints of subpage
>>>> to determine read-ahead size? that is why we don't map nr_pages for non-shmem
>>>> files though we can potentially reduce nr_pages - 1 page faults?
>>>
>>> IMHO, there is 2 cases for non-anon-shmem:
>>> (1) read mmap() faults: we can rely on the 'fault_around_bytes'
>>> interface to determine what size of mapping to build.
>>> (2) writable mmap() faults: I want to keep the same behavior as before
>>> (per-page fault), but we can talk about this when I send new patches to
>>> use mTHP to control large folio allocation for writable mmap().
>>
>> OK.
>>
>>>
>>>>> + */
>>>>> + if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
>>>>> + nr_pages = 1;
>>>>> + } else if (nr_pages > 1) {
>>>>> + pgoff_t idx = folio_page_idx(folio, page);
>>>>> + /* The page offset of vmf->address within the VMA. */
>>>>> + pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>>>>> +
>>>>> + /*
>>>>> + * Fallback to per-page fault in case the folio size in page
>>>>> + * cache beyond the VMA limits.
>>>>> + */
>>>>> + if (unlikely(vma_off < idx ||
>>>>> + vma_off + (nr_pages - idx) > vma_pages(vma))) {
>>>>> + nr_pages = 1;
>>>>> + } else {
>>>>> + /* Now we can set mappings for the whole large folio. */
>>>>> + addr = vmf->address - idx * PAGE_SIZE;
>>>>> + page = &folio->page;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
>>>>> - vmf->address, &vmf->ptl);
>>>>> + addr, &vmf->ptl);
>>>>> if (!vmf->pte)
>>>>> return VM_FAULT_NOPAGE;
>>>>>
>>>>> /* Re-check under ptl */
>>>>> - if (likely(!vmf_pte_changed(vmf))) {
>>>>> - struct folio *folio = page_folio(page);
>>>>> - int type = is_cow ? MM_ANONPAGES : mm_counter_file(folio);
>>>>> -
>>>>> - set_pte_range(vmf, folio, page, 1, vmf->address);
>>>>> - add_mm_counter(vma->vm_mm, type, 1);
>>>>> - ret = 0;
>>>>> - } else {
>>>>> - update_mmu_tlb(vma, vmf->address, vmf->pte);
>>>>> + if (nr_pages == 1 && unlikely(vmf_pte_changed(vmf))) {
>>>>> + update_mmu_tlb(vma, addr, vmf->pte);
>>>>> ret = VM_FAULT_NOPAGE;
>>>>> + goto unlock;
>>>>> + } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>>>>
>>>> In what case we can't use !pte_range_none(vmf->pte, 1) for nr_pages == 1
>>>> then unify the code for nr_pages==1 and nr_pages > 1?
>>>>
>>>> It seems this has been discussed before, but I forget the reason.
>>>
>>> IIUC, this is for uffd case, which is not a none pte entry.
>>
>> Is it possible to have a COW case for shmem? For example, if someone
>> maps a shmem
>> file as read-only and then writes to it, would that prevent the use of
>> pte_range_none?
>
> sorry, i mean PRIVATE but not READ-ONLY.

Yes, I think so. Currently the CoW case still uses per-page faults in do_cow_fault().

>> Furthermore, if we encounter a large folio in shmem while reading,
>> does it necessarily
>> mean we can map the entire folio? Is it possible for some processes to

Now this will depend on the 'fault_around_bytes' interface.

>> only map part
>> of large folios? For instance, if process A allocates large folios and
>> process B maps
>> only part of this shmem file or partially unmaps a large folio, how
>> would that be handled?

This is certainly possible.

For tmpfs:
(1) If 'fault_around_bytes' is enabled, filemap_map_pages() will handle
partial mapping of the large folio for process B.

(2) If 'fault_around_bytes' is set to 0, finish_fault() will fall back to
per-page faults.

For anonymous shmem, process B should be the child of process A in your
case, then:
(1) If 'fault_around_bytes' is enabled, the behavior is the same as for tmpfs.

(2) If 'fault_around_bytes' is set to 0, finish_fault() will build the
whole large folio mapping for process B. Since process B copies the
shared VMA from parent process A, the mTHP mapping is shared.

>> Apologies for not debugging this thoroughly, but these two corner
>> cases seem worth
>> considering. If these scenarios have already been addressed, please disregard my
>> comments.

No worries:) Thanks for your valuable input.

2024-06-04 08:19:05

by Daniel Gomez

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On Fri, May 31, 2024 at 01:13:48PM +0200, David Hildenbrand wrote:
> > >
> > > As a default, we should not be using large folios / mTHP for any shmem,
> > > just like we did with THP via shmem_enabled. This is what this series
> > > currently does, and is part of the whole mTHP user-space interface design.
> > >
> > > Further, the mTHP controls should control all of shmem, not only
> > > "anonymous shmem".
> >
> > Yes, that's what I thought and in my TODO list.
>
> Good, it would be helpful to coordinate with Daniel and Pankaj.

I've integrated patches 11 and 12 from the lsf RFC thread [1] on top of Baolin's
v3 patches. You may find a version in my integration branch here [2]. I can
attach them here if it's preferred.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://gitlab.com/dkruces/linux-next/-/commits/next-20240604-shmem-mthp

The point here is to combine the large folios strategy I proposed with mTHP
user controls. Would it make sense to limit the orders to the mapping order
calculated based on the size and index?

@@ -1765,6 +1798,10 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,

order = highest_order(suitable_orders);
while (suitable_orders) {
+ if (order > mapping_order) {
+ order = next_order(&suitable_orders, order);
+ continue;
+ }
pages = 1UL << order;
index = round_down(index, pages);
folio = shmem_alloc_folio(gfp, order, info, index);

Note: The branch still needs to be adapted to include !anon mm.
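
For clarity, the mapping_order above could come from a helper along these
lines (hypothetical name and shape, only to illustrate capping the order by
i_size at the faulting index):

/*
 * Illustrative only: the largest naturally aligned order that keeps a
 * folio at 'index' within i_size.  Not code from the branch.
 */
static unsigned int shmem_mapping_order(struct inode *inode, pgoff_t index)
{
	pgoff_t end = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
	unsigned int order = 0;

	if (end <= index)
		return 0;

	while (IS_ALIGNED(index, 1UL << (order + 1)) &&
	       (1UL << (order + 1)) <= end - index)
		order++;

	return order;
}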

>
> >
> > >
> > > Also, we should properly fallback within the configured sizes, and not
> > > jump "over" configured sizes. Unless there is a good reason.
> > >
> > > (3) khugepaged
> > >
> > > khugepaged needs to handle larger folios properly as well. Until fixed,
> > > using smaller THP sizes as fallback might prohibit collapsing a
> > > PMD-sized THP later. But really, khugepaged needs to be fixed to handle
> > > that.
> > > (4) force/disable
> > >
> > > These settings are rather testing artifacts from the old ages. We should
> > > not add them to the per-size toggles. We might "inherit" it from the
> > > global one, though.
> >
> > Sorry, I missed this. So I should remove the 'force' and 'deny' option
> > for each mTHP, right?
>
> Yes, that's my understanding. But we have to keep them on the top level for
> any possible user out there.
>
> >
> > >
> > > "within_size" might have value, and especially for consistency, we
> > > should have them per size.
> > >
> > >
> > >
> > > So, this series only tackles anonymous shmem, which is a good starting
> > > point. Ideally, we'd get support for other shmem (especially during
> > > fault time) soon afterwards, because we won't be adding separate toggles
> > > for that from the interface POV, and having inconsistent behavior
> > > between kernel versions would be a bit unfortunate.
> > >
> > >
> > > @Baolin, this series likely does not consider (4) yet. And I suggest we
> > > have to take a lot of the "anonymous thp" terminology out of this
> > > series, especially when it comes to documentation.
> >
> > Sure. I will remove the "anonymous thp" terminology from the
> > documentation, but want to still keep it in the commit message, cause I
> > want to start from the anonymous shmem.
>
> For commit message and friends makes sense. The story should be "controls
> all of shmem/tmpfs, but support will be added iteratively. The first step is
> anonymous shmem."
>
> --
> Cheers,
>
> David / dhildenb
>

2024-06-04 09:24:07

by Dan Carpenter

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] mm: shmem: add mTHP support for anonymous shmem

Hi Baolin,

kernel test robot noticed the following build warnings:

url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/mm-memory-extend-finish_fault-to-support-large-folio/20240530-100805
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/ec35a23026dd016705d211e85163cabe07681516.1717033868.git.baolin.wang%40linux.alibaba.com
patch subject: [PATCH v3 4/6] mm: shmem: add mTHP support for anonymous shmem
config: powerpc64-randconfig-r071-20240531 (https://download.01.org/0day-ci/archive/20240602/[email protected]/config)
compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project bafda89a0944d947fc4b3b5663185e07a397ac30)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Reported-by: Dan Carpenter <[email protected]>
| Closes: https://lore.kernel.org/r/[email protected]/

smatch warnings:
mm/shmem.c:1766 shmem_alloc_and_add_folio() error: uninitialized symbol 'suitable_orders'.

vim +/suitable_orders +1766 mm/shmem.c

ededbc2c2f28a1 Baolin Wang 2024-05-30 1729 static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
ededbc2c2f28a1 Baolin Wang 2024-05-30 1730 gfp_t gfp, struct inode *inode, pgoff_t index,
ededbc2c2f28a1 Baolin Wang 2024-05-30 1731 struct mm_struct *fault_mm, unsigned long orders)
800d8c63b2e989 Kirill A. Shutemov 2016-07-26 1732 {
3022fd7af9604d Hugh Dickins 2023-09-29 1733 struct address_space *mapping = inode->i_mapping;
0f0796945614b7 Mike Rapoport 2017-09-06 1734 struct shmem_inode_info *info = SHMEM_I(inode);
ededbc2c2f28a1 Baolin Wang 2024-05-30 1735 struct vm_area_struct *vma = vmf ? vmf->vma : NULL;
ededbc2c2f28a1 Baolin Wang 2024-05-30 1736 unsigned long suitable_orders;
ededbc2c2f28a1 Baolin Wang 2024-05-30 1737 struct folio *folio = NULL;
3022fd7af9604d Hugh Dickins 2023-09-29 1738 long pages;
ededbc2c2f28a1 Baolin Wang 2024-05-30 1739 int error, order;
800d8c63b2e989 Kirill A. Shutemov 2016-07-26 1740
396bcc5299c281 Matthew Wilcox (Oracle 2020-04-06 1741) if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
ededbc2c2f28a1 Baolin Wang 2024-05-30 1742 orders = 0;
800d8c63b2e989 Kirill A. Shutemov 2016-07-26 1743
ededbc2c2f28a1 Baolin Wang 2024-05-30 1744 if (orders > 0) {
ededbc2c2f28a1 Baolin Wang 2024-05-30 1745 if (vma && vma_is_anon_shmem(vma)) {
ededbc2c2f28a1 Baolin Wang 2024-05-30 1746 suitable_orders = anon_shmem_suitable_orders(inode, vmf,
ededbc2c2f28a1 Baolin Wang 2024-05-30 1747 mapping, index, orders);
ededbc2c2f28a1 Baolin Wang 2024-05-30 1748 } else if (orders & BIT(HPAGE_PMD_ORDER)) {
3022fd7af9604d Hugh Dickins 2023-09-29 1749 pages = HPAGE_PMD_NR;
ededbc2c2f28a1 Baolin Wang 2024-05-30 1750 suitable_orders = BIT(HPAGE_PMD_ORDER);
3022fd7af9604d Hugh Dickins 2023-09-29 1751 index = round_down(index, HPAGE_PMD_NR);
3022fd7af9604d Hugh Dickins 2023-09-29 1752
3022fd7af9604d Hugh Dickins 2023-09-29 1753 /*
3022fd7af9604d Hugh Dickins 2023-09-29 1754 * Check for conflict before waiting on a huge allocation.
3022fd7af9604d Hugh Dickins 2023-09-29 1755 * Conflict might be that a huge page has just been allocated
3022fd7af9604d Hugh Dickins 2023-09-29 1756 * and added to page cache by a racing thread, or that there
3022fd7af9604d Hugh Dickins 2023-09-29 1757 * is already at least one small page in the huge extent.
3022fd7af9604d Hugh Dickins 2023-09-29 1758 * Be careful to retry when appropriate, but not forever!
3022fd7af9604d Hugh Dickins 2023-09-29 1759 * Elsewhere -EEXIST would be the right code, but not here.
3022fd7af9604d Hugh Dickins 2023-09-29 1760 */
3022fd7af9604d Hugh Dickins 2023-09-29 1761 if (xa_find(&mapping->i_pages, &index,
3022fd7af9604d Hugh Dickins 2023-09-29 1762 index + HPAGE_PMD_NR - 1, XA_PRESENT))
3022fd7af9604d Hugh Dickins 2023-09-29 1763 return ERR_PTR(-E2BIG);
ededbc2c2f28a1 Baolin Wang 2024-05-30 1764 }

suitable_orders uninitialized on else path.

52cd3b074050dd Lee Schermerhorn 2008-04-28 1765
ededbc2c2f28a1 Baolin Wang 2024-05-30 @1766 order = highest_order(suitable_orders);
ededbc2c2f28a1 Baolin Wang 2024-05-30 1767 while (suitable_orders) {
ededbc2c2f28a1 Baolin Wang 2024-05-30 1768 pages = 1UL << order;
ededbc2c2f28a1 Baolin Wang 2024-05-30 1769 index = round_down(index, pages);
ededbc2c2f28a1 Baolin Wang 2024-05-30 1770 folio = shmem_alloc_folio(gfp, order, info, index);
ededbc2c2f28a1 Baolin Wang 2024-05-30 1771 if (folio)
ededbc2c2f28a1 Baolin Wang 2024-05-30 1772 goto allocated;
ededbc2c2f28a1 Baolin Wang 2024-05-30 1773
ededbc2c2f28a1 Baolin Wang 2024-05-30 1774 if (pages == HPAGE_PMD_NR)
3022fd7af9604d Hugh Dickins 2023-09-29 1775 count_vm_event(THP_FILE_FALLBACK);
ededbc2c2f28a1 Baolin Wang 2024-05-30 1776 order = next_order(&suitable_orders, order);
ededbc2c2f28a1 Baolin Wang 2024-05-30 1777 }
3022fd7af9604d Hugh Dickins 2023-09-29 1778 } else {
3022fd7af9604d Hugh Dickins 2023-09-29 1779 pages = 1;

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2024-06-04 09:29:45

by Daniel Gomez

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On Fri, May 31, 2024 at 04:43:32PM +0200, David Hildenbrand wrote:
> Hi Daniel,
>
> > > Let me summarize the takeaway from the bi-weekly MM meeting as I understood
> > > it, that includes Hugh's feedback on per-block tracking vs. mTHP:
> >
> > Thanks David for the summary. Please, find below some follow up questions.
> >
> > I want to understand if zeropage scanning overhead is preferred over per-block
> > tracking complexity or if we still need to verify this.
> >
> > >
> > > (1) Per-block tracking
> > >
> > > Per-block tracking is currently considered unwarranted complexity in
> > > shmem.c. We should try to get it done without that. For any test cases that
> > > fail, we should consider if they are actually valid for shmem.
> >
> > I agree it was unwarranted complexity but only if this is just to fix lseek() as
> > we can simply make the test pass by checking if holes are reported as data. That
> > would be the minimum required for lseek() to be compliant with the syscall.
> >
> > How can we use per-block tracking for reclaiming memory and what changes would
> > be needed? Or is per-block really a non-viable option?
>
> The interesting thing is: with mTHP toggles it is opt-in -- like for
> PMD-sized THP with shmem_enabled -- and we don't have to be that concerned
> about this problem right now.

Without respecting the size when allocating large folios, mTHP toggles would
over-allocate. My proposal added earlier in this thread is to combine the two
to avoid that case. Otherwise, shouldn't we try to find a solution for the
reclaiming path?

>
> >
> > Clearly, if per-block is viable option, shmem_fault() bug would required to be
> > fixed first. Any ideas on how to make it reproducible?
> >
> > The alternatives discussed where sub-page refcounting and zeropage scanning.
>
> Yeah, I don't think sub-page refcounting is a feasible (and certainly not
> desired ;) ) option in the folio world.
>
> > The first one is not possible (IIUC) because of a refactor years back that
> > simplified the code and also requires extra complexity. The second option would
> > require additional overhead as we would involve scanning.
>
> We'll likely need something similar (scanning, tracking?) for anonymous
> memory as well. There was a proposal for a THP shrinker some time ago, that
> would solve part of the problem.

It's good to know we have the same problem in different places. I'm more
inclined to keep the information rather than adding extra overhead, unless
the complexity is really overwhelming. Considering the concerns here, I'm not
sure how much we should try merging with iomap as per Ritesh's suggestion.

Regarding the THP shrinker, could you please confirm whether it is the following thread?

https://lore.kernel.org/all/[email protected]/

>
> For example, for shmem you could simply flag folios that failed splitting
> during fallocate() as reclaim candidates and try to reclaim memory later. So
> you don't have to scan arbitrary folios (which might also be desired,
> though).

Thanks for the suggestion. I'll look into this.

>
> >
> > >
> > > To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
> > > is not possible at fallcoate() time, detecting zeropages later and
> > > retrying to split+free might be an option, without per-block tracking.
> >
> > >
> > > (2) mTHP controls
> > >
> > > As a default, we should not be using large folios / mTHP for any shmem, just
> > > like we did with THP via shmem_enabled. This is what this series currently
> > > does, and is aprt of the whole mTHP user-space interface design.
> >
> > That was clear for me too. But what is the reason we want to boot in 'safe
> > mode'? What are the implications of not respecing that?
>
> [...]
>
> >
> > As I understood from the call, mTHP with sysctl knobs is preferred over
> > optimistic falloc/write allocation? But is still unclear to me why the former
> > is preferred.
>
> I think Hugh's point was that this should be an opt-in feature, just like
> PMD-sized THP started out, and still is, an opt-in feature.

I'd be keen to understand the use case for this, even for the current THP controls
we have in tmpfs. I guess these are just scenarios with no swap involved?
Are these use cases the same for both tmpfs and anonymous shmem?

>
> Problematic interaction with khugepaged (that will be fixed) is one thing,
> interaction with memory reclaim (without any kind of memory reclaim
> mechanisms in place) might be another one. Controlling and tuning for
> specific folio sizes might be another one Hugh raised. [summarizing what I
> recall from the discussion, there might be more].
>
> >
> > Is large folios a non-viable option?
>
> I think you mean "allocating large folios without user space control".

Yes.

>
> Because mTHP gives user space full control, to the point where you can
> enable all sizes and obtain the same result.

Agree.

>
> >
> > >
> > > Also, we should properly fallback within the configured sizes, and not jump
> > > "over" configured sizes. Unless there is a good reason.
> > >
> > > (3) khugepaged
> > >
> > > khugepaged needs to handle larger folios properly as well. Until fixed,
> > > using smaller THP sizes as fallback might prohibit collapsing a PMD-sized
> > > THP later. But really, khugepaged needs to be fixed to handle that.
> > >
> > > (4) force/disable
> > >
> > > These settings are rather testing artifacts from the old ages. We should not
> > > add them to the per-size toggles. We might "inherit" it from the global one,
> > > though.
> > >
> > > "within_size" might have value, and especially for consistency, we should
> > > have them per size.
> > >
> > >
> > >
> > > So, this series only tackles anonymous shmem, which is a good starting
> > > point. Ideally, we'd get support for other shmem (especially during fault
> > > time) soon afterwards, because we won't be adding separate toggles for that
> > > from the interface POV, and having inconsistent behavior between kernel
> > > versions would be a bit unfortunate.
> > >
> > >
> > > @Baolin, this series likely does not consider (4) yet. And I suggest we have
> > > to take a lot of the "anonymous thp" terminology out of this series,
> > > especially when it comes to documentation.
> > >
> > > @Daniel, Pankaj, what are your plans regarding that? It would be great if we
> > > could get an understanding on the next steps on !anon shmem.
> >
> > I realize I've raised so many questions, but it's essential for us to grasp the
> > mm concerns and usage scenarios. This understanding will provide clarity on the
> > direction regarding folios for !anon shmem.
>
> If I understood correctly, Hugh had strong feelings against not respecting
> mTHP toggles for shmem. Without per-block tracking, I agree with that.

My understanding was the same. But I have a follow-up question: should
we respect mTHP toggles without considering mapping constraints (size and
index)? Or is within_size where this intermediate approach fits, as long as
mTHP granularity is respected?

Daniel

>
> --
> Cheers,
>
> David / dhildenb
>

2024-06-04 09:45:35

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem



On 2024/6/4 16:18, Daniel Gomez wrote:
> On Fri, May 31, 2024 at 01:13:48PM +0200, David Hildenbrand wrote:
>>>>
>>>> As a default, we should not be using large folios / mTHP for any shmem,
>>>> just like we did with THP via shmem_enabled. This is what this series
>>>> currently does, and is aprt of the whole mTHP user-space interface design.
>>>>
>>>> Further, the mTHP controls should control all of shmem, not only
>>>> "anonymous shmem".
>>>
>>> Yes, that's what I thought and in my TODO list.
>>
>> Good, it would be helpful to coordinate with Daniel and Pankaj.
>
> I've integrated patches 11 and 12 from the lsf RFC thread [1] on top of Baolin's
> v3 patches. You may find a version in my integration branch here [2]. I can
> attach them here if it's preferred.
>
> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://gitlab.com/dkruces/linux-next/-/commits/next-20240604-shmem-mthp
>
> The point here is to combine the large folios strategy I proposed with mTHP
> user controls. Would it make sense to limit the orders to the mapping order
> calculated based on the size and index?

IMO, for !anon shmem, this change makes sense to me. We should respect
the size, and mTHP should act as an order filter.

For anon shmem, we should ignore the length, which you always set to
PAGE_SIZE in patch [1].

[1]
https://gitlab.com/dkruces/linux-next/-/commit/edf02311fd6d86b355d3aeb74e67c8da6de3c569

> @@ -1765,6 +1798,10 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
>
> order = highest_order(suitable_orders);
> while (suitable_orders) {
> + if (order > mapping_order) {
> + order = next_order(&suitable_orders, order);
> + continue;
> + }
> pages = 1UL << order;
> index = round_down(index, pages);
> folio = shmem_alloc_folio(gfp, order, info, index);
>
> Note: The branch still need to be adapted to include !anon mm.

2024-06-04 09:48:35

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] mm: shmem: add mTHP support for anonymous shmem



On 2024/6/4 17:23, Dan Carpenter wrote:
> Hi Baolin,
>
> kernel test robot noticed the following build warnings:
>
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/mm-memory-extend-finish_fault-to-support-large-folio/20240530-100805
> base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/r/ec35a23026dd016705d211e85163cabe07681516.1717033868.git.baolin.wang%40linux.alibaba.com
> patch subject: [PATCH v3 4/6] mm: shmem: add mTHP support for anonymous shmem
> config: powerpc64-randconfig-r071-20240531 (https://download.01.org/0day-ci/archive/20240602/[email protected]/config)
> compiler: clang version 19.0.0git (https://github.com/llvm/llvm-project bafda89a0944d947fc4b3b5663185e07a397ac30)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Reported-by: Dan Carpenter <[email protected]>
> | Closes: https://lore.kernel.org/r/[email protected]/
>
> smatch warnings:
> mm/shmem.c:1766 shmem_alloc_and_add_folio() error: uninitialized symbol 'suitable_orders'.

Thanks Dan. LKP also reported this warning [1]. Will fix it.

[1] https://lore.kernel.org/all/[email protected]/
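
For readers following the report, a minimal sketch of the kind of fix the
warning points at (assuming simply initializing the variable is acceptable;
the actual fix is of course up to the author):

	/* Sketch only: leave the order mask empty when neither branch
	 * below fills it, instead of reading an uninitialized value. */
	unsigned long suitable_orders = 0;

With an empty mask, the 'while (suitable_orders)' loop at line 1767 never
executes, so the value highest_order() computes at the flagged line 1766 is
never used.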

2024-06-04 10:00:14

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

[...]

>>> How can we use per-block tracking for reclaiming memory and what changes would
>>> be needed? Or is per-block really a non-viable option?
>>
>> The interesting thing is: with mTHP toggles it is opt-in -- like for
>> PMD-sized THP with shmem_enabled -- and we don't have to be that concerned
>> about this problem right now.
>
> Without respecting the size when allocating large folios, mTHP toggles would
> over allocate. My proposal added earlier to this thread is to combine the 2
> to avoid that case. Otherwise, shouldn't we try to find a solution for the
> reclaiming path?

I think at some point we'll really have to do a better job at reclaiming
(memory over-allocation, PUNCH_HOLE ranges that couldn't be split, but maybe
also pages that are now all-zero again and could be reclaimed).

>
>>
>>>
>>> Clearly, if per-block is viable option, shmem_fault() bug would required to be
>>> fixed first. Any ideas on how to make it reproducible?
>>>
>>> The alternatives discussed where sub-page refcounting and zeropage scanning.
>>
>> Yeah, I don't think sub-page refcounting is a feasible (and certainly not
>> desired ;) ) option in the folio world.
>>
>>> The first one is not possible (IIUC) because of a refactor years back that
>>> simplified the code and also requires extra complexity. The second option would
>>> require additional overhead as we would involve scanning.
>>
>> We'll likely need something similar (scanning, tracking?) for anonymous
>> memory as well. There was a proposal for a THP shrinker some time ago, that
>> would solve part of the problem.
>
> It's good to know we have the same problem in different places. I'm more
> inclined to keep the information rather than adding an extra overhead. Unless
> the complexity is really overwhelming. Considering the concerns here, not sure
> how much should we try merging with iomap as per Ritesh's suggestion.

As raised in the meeting, I do see value in maintaining the information;
but I also see why Hugh and Kirill think this might be unwarranted
complexity in shmem.c. Meaning: we might be able to achieve something
similar without it, and we don't have to solve the problem right now to
benefit from large folios.

>
> The THP shrinker, could you please confirm if it is this following thread?
>
> https://lore.kernel.org/all/[email protected]/

Yes, although there has been no follow-up so far. Possibly because, in the
current khugepaged approach, there will be a constant back-and-forth
between khugepaged collapsing memory (and wasting memory in the default
setting) and the THP shrinker reclaiming memory; that doesn't sound quite
effective :) It needs more thought IMHO.

[...]

>>>> To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
>>>> is not possible at fallcoate() time, detecting zeropages later and
>>>> retrying to split+free might be an option, without per-block tracking.
>>>
>>>>
>>>> (2) mTHP controls
>>>>
>>>> As a default, we should not be using large folios / mTHP for any shmem, just
>>>> like we did with THP via shmem_enabled. This is what this series currently
>>>> does, and is aprt of the whole mTHP user-space interface design.
>>>
>>> That was clear for me too. But what is the reason we want to boot in 'safe
>>> mode'? What are the implications of not respecing that?
>>
>> [...]
>>
>>>
>>> As I understood from the call, mTHP with sysctl knobs is preferred over
>>> optimistic falloc/write allocation? But is still unclear to me why the former
>>> is preferred.
>>
>> I think Hugh's point was that this should be an opt-in feature, just like
>> PMD-sized THP started out, and still is, an opt-in feature.
>
> I'd be keen to understand the use case for this. Even the current THP controls
> we have in tmpfs. I guess these are just scenarios with no swap involved?
> Are these use cases the same for both tmpfs and shmem anon mm?

Systems without swap are one case, I think. The other reason for a toggle
in the past was to be able to disable it to troubleshoot issues (Hugh
mentioned in the meeting unlocking new code paths in shmem.c only behind
a toggle).

Staring at my Fedora system:

$ cat /sys/kernel/mm/transparent_hugepage/shmem_enabled
always within_size advise [never] deny force

Maybe because it uses tmpfs to mount /tmp (there is an interesting article on
lwn.net about that) and people are concerned about the side effects
(which can currently waste memory, or result in more reclaim work being
required when exceeding file sizes).

For VMs, I know that shmem+THP with memory ballooning is problematic and
not really recommended.

[...]

>>>
>>>>
>>>> Also, we should properly fallback within the configured sizes, and not jump
>>>> "over" configured sizes. Unless there is a good reason.
>>>>
>>>> (3) khugepaged
>>>>
>>>> khugepaged needs to handle larger folios properly as well. Until fixed,
>>>> using smaller THP sizes as fallback might prohibit collapsing a PMD-sized
>>>> THP later. But really, khugepaged needs to be fixed to handle that.
>>>>
>>>> (4) force/disable
>>>>
>>>> These settings are rather testing artifacts from the old ages. We should not
>>>> add them to the per-size toggles. We might "inherit" it from the global one,
>>>> though.
>>>>
>>>> "within_size" might have value, and especially for consistency, we should
>>>> have them per size.
>>>>
>>>>
>>>>
>>>> So, this series only tackles anonymous shmem, which is a good starting
>>>> point. Ideally, we'd get support for other shmem (especially during fault
>>>> time) soon afterwards, because we won't be adding separate toggles for that
>>>> from the interface POV, and having inconsistent behavior between kernel
>>>> versions would be a bit unfortunate.
>>>>
>>>>
>>>> @Baolin, this series likely does not consider (4) yet. And I suggest we have
>>>> to take a lot of the "anonymous thp" terminology out of this series,
>>>> especially when it comes to documentation.
>>>>
>>>> @Daniel, Pankaj, what are your plans regarding that? It would be great if we
>>>> could get an understanding on the next steps on !anon shmem.
>>>
>>> I realize I've raised so many questions, but it's essential for us to grasp the
>>> mm concerns and usage scenarios. This understanding will provide clarity on the
>>> direction regarding folios for !anon shmem.
>>
>> If I understood correctly, Hugh had strong feelings against not respecting
>> mTHP toggles for shmem. Without per-block tracking, I agree with that.
>
> My understanding was the same. But I have this follow-up question: should
> we respect mTHP toggles without considering mapping constraints (size and
> index)? Or perhaps we should use within_size where we can fit this intermediate
> approach, as long as mTHP granularity is respected?

Likely we should do exactly the same as we would do with the existing
PMD-sized THPs.

I recall in the meeting that we discussed that always vs. within_size
seems to have some value, and that we should respect that setting like
we did for PMD-sized THP.

Which other mapping constraints could we have?

--
Cheers,

David / dhildenb


2024-06-04 12:05:21

by Daniel Gomez

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On Tue, Jun 04, 2024 at 05:45:20PM +0800, Baolin Wang wrote:
>
>
> On 2024/6/4 16:18, Daniel Gomez wrote:
> > On Fri, May 31, 2024 at 01:13:48PM +0200, David Hildenbrand wrote:
> > > > >
> > > > > As a default, we should not be using large folios / mTHP for any shmem,
> > > > > just like we did with THP via shmem_enabled. This is what this series
> > > > > currently does, and is aprt of the whole mTHP user-space interface design.
> > > > >
> > > > > Further, the mTHP controls should control all of shmem, not only
> > > > > "anonymous shmem".
> > > >
> > > > Yes, that's what I thought and in my TODO list.
> > >
> > > Good, it would be helpful to coordinate with Daniel and Pankaj.
> >
> > I've integrated patches 11 and 12 from the lsf RFC thread [1] on top of Baolin's
> > v3 patches. You may find a version in my integration branch here [2]. I can
> > attach them here if it's preferred.
> >
> > [1] https://lore.kernel.org/all/[email protected]/
> > [2] https://gitlab.com/dkruces/linux-next/-/commits/next-20240604-shmem-mthp
> >
> > The point here is to combine the large folios strategy I proposed with mTHP
> > user controls. Would it make sense to limit the orders to the mapping order
> > calculated based on the size and index?
>
> IMO, for !anon shmem, this change makes sense to me. We should respect the
> size and mTHP should act as a order filter.

What about respecting the size when the within_size flag is enabled? Then 'always'
would allocate mTHP-enabled folios regardless of the size, and 'never'
would ignore both mTHP and size. So 'never' can be used for the 'safe' boot case
mentioned in the discussion.

>
> For anon shmem, we should ignore the length, which you always set it to
> PAGE_SIZE in patch [1].
>
> [1] https://gitlab.com/dkruces/linux-next/-/commit/edf02311fd6d86b355d3aeb74e67c8da6de3c569

Since we are ignoring the length, we should ignore any value being passed.

>
> > @@ -1765,6 +1798,10 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
> >
> > order = highest_order(suitable_orders);
> > while (suitable_orders) {
> > + if (order > mapping_order) {
> > + order = next_order(&suitable_orders, order);
> > + continue;
> > + }
> > pages = 1UL << order;
> > index = round_down(index, pages);
> > folio = shmem_alloc_folio(gfp, order, info, index);
> >
> > Note: The branch still need to be adapted to include !anon mm.

2024-06-04 12:31:30

by Daniel Gomez

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On Tue, Jun 04, 2024 at 11:59:51AM +0200, David Hildenbrand wrote:
> [...]
>
> > > > How can we use per-block tracking for reclaiming memory and what changes would
> > > > be needed? Or is per-block really a non-viable option?
> > >
> > > The interesting thing is: with mTHP toggles it is opt-in -- like for
> > > PMD-sized THP with shmem_enabled -- and we don't have to be that concerned
> > > about this problem right now.
> >
> > Without respecting the size when allocating large folios, mTHP toggles would
> > over allocate. My proposal added earlier to this thread is to combine the 2
> > to avoid that case. Otherwise, shouldn't we try to find a solution for the
> > reclaiming path?
>
> I think at some point we'll really have to do a better job at reclaiming
> (either memory-overallocation, PUNCHHOLE that couldn't split, but maybe also
> pages that are now all-zero again and could be reclaimed again).
>
> >
> > >
> > > >
> > > > Clearly, if per-block is viable option, shmem_fault() bug would required to be
> > > > fixed first. Any ideas on how to make it reproducible?
> > > >
> > > > The alternatives discussed where sub-page refcounting and zeropage scanning.
> > >
> > > Yeah, I don't think sub-page refcounting is a feasible (and certainly not
> > > desired ;) ) option in the folio world.
> > >
> > > > The first one is not possible (IIUC) because of a refactor years back that
> > > > simplified the code and also requires extra complexity. The second option would
> > > > require additional overhead as we would involve scanning.
> > >
> > > We'll likely need something similar (scanning, tracking?) for anonymous
> > > memory as well. There was a proposal for a THP shrinker some time ago, that
> > > would solve part of the problem.
> >
> > It's good to know we have the same problem in different places. I'm more
> > inclined to keep the information rather than adding an extra overhead. Unless
> > the complexity is really overwhelming. Considering the concerns here, not sure
> > how much should we try merging with iomap as per Ritesh's suggestion.
>
> As raised in the meeting, I do see value in maintaining the information; but
> I also see why Hugh and Kirill think this might be unwarranted complexity in
> shmem.c. Meaning: we might be able to achieve something similar without it,
> and we don't have to solve the problem right now to benefit from large
> folios.
>
> >
> > The THP shrinker, could you please confirm if it is this following thread?
> >
> > https://lore.kernel.org/all/[email protected]/
>
> Yes, although there was no follow up so far. Possibly, because in the
> current khugepaged approach, there will be a constant back-and-forth between
> khugepaged collapsing memory (and wasting memory in the default setting),
> and the THP shrinker reclaiming memory; doesn't sound quite effective :)
> That needs more thought IMHO.
>
> [...]
>
> > > > > To optimize FALLOC_FL_PUNCH_HOLE for the cases where splitting+freeing
> > > > > is not possible at fallcoate() time, detecting zeropages later and
> > > > > retrying to split+free might be an option, without per-block tracking.
> > > >
> > > > >
> > > > > (2) mTHP controls
> > > > >
> > > > > As a default, we should not be using large folios / mTHP for any shmem, just
> > > > > like we did with THP via shmem_enabled. This is what this series currently
> > > > > does, and is aprt of the whole mTHP user-space interface design.
> > > >
> > > > That was clear for me too. But what is the reason we want to boot in 'safe
> > > > mode'? What are the implications of not respecing that?
> > >
> > > [...]
> > >
> > > >
> > > > As I understood from the call, mTHP with sysctl knobs is preferred over
> > > > optimistic falloc/write allocation? But is still unclear to me why the former
> > > > is preferred.
> > >
> > > I think Hugh's point was that this should be an opt-in feature, just like
> > > PMD-sized THP started out, and still is, an opt-in feature.
> >
> > I'd be keen to understand the use case for this. Even the current THP controls
> > we have in tmpfs. I guess these are just scenarios with no swap involved?
> > Are these use cases the same for both tmpfs and shmem anon mm?
>
> Systems without swap are one case I think. The other reason for a toggle in
> the past was to be able to disable it to troubleshoot issues (Hugh mentioned
> in the meeting about unlocking new code paths in shmem.c only with a
> toggle).
>
> Staring at my Fedora system:
>
> $ cat /sys/kernel/mm/transparent_hugepage/shmem_enabled
> always within_size advise [never] deny force
>
> Maybe because it uses tmpfs to mount /tmp (interesting article on lwn.net
> about that) and people are concerned about the side-effects (that can
> currently waste memory, or result in more reclaim work being required when
> exceeding file sizes).
>
> For VMs, I know that shmem+THP with memory ballooning is problematic and not
> really recommended.

Agree. We cannot punch holes when using THP unless the punch is PMD-sized.
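
To make the granularity point concrete, here is an illustrative userspace
sketch (not from this thread; the file name and sizes are arbitrary). A
sub-PMD punch inside a PMD-sized THP cannot release memory unless the kernel
manages to split the huge folio, while a PMD-aligned, PMD-sized punch can
free the whole folio directly:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = memfd_create("thp-demo", MFD_CLOEXEC);

	if (fd < 0)
		return 1;
	ftruncate(fd, 4 << 20);		/* 4 MiB shmem file */

	/* 4 KiB punch inside a (potentially) 2 MiB huge folio: the
	 * backing memory may stay allocated until the folio is split. */
	fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, 4096);

	/* PMD-sized, PMD-aligned punch: the whole huge folio can go. */
	fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		  2 << 20, 2 << 20);

	close(fd);
	return 0;
}

This is exactly why ballooning, which typically works in small page units,
interacts badly with shmem+THP.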

>
> [...]
>
> > > >
> > > > >
> > > > > Also, we should properly fallback within the configured sizes, and not jump
> > > > > "over" configured sizes. Unless there is a good reason.
> > > > >
> > > > > (3) khugepaged
> > > > >
> > > > > khugepaged needs to handle larger folios properly as well. Until fixed,
> > > > > using smaller THP sizes as fallback might prohibit collapsing a PMD-sized
> > > > > THP later. But really, khugepaged needs to be fixed to handle that.
> > > > >
> > > > > (4) force/disable
> > > > >
> > > > > These settings are rather testing artifacts from the old ages. We should not
> > > > > add them to the per-size toggles. We might "inherit" it from the global one,
> > > > > though.
> > > > >
> > > > > "within_size" might have value, and especially for consistency, we should
> > > > > have them per size.
> > > > >
> > > > >
> > > > >
> > > > > So, this series only tackles anonymous shmem, which is a good starting
> > > > > point. Ideally, we'd get support for other shmem (especially during fault
> > > > > time) soon afterwards, because we won't be adding separate toggles for that
> > > > > from the interface POV, and having inconsistent behavior between kernel
> > > > > versions would be a bit unfortunate.
> > > > >
> > > > >
> > > > > @Baolin, this series likely does not consider (4) yet. And I suggest we have
> > > > > to take a lot of the "anonymous thp" terminology out of this series,
> > > > > especially when it comes to documentation.
> > > > >
> > > > > @Daniel, Pankaj, what are your plans regarding that? It would be great if we
> > > > > could get an understanding on the next steps on !anon shmem.
> > > >
> > > > I realize I've raised so many questions, but it's essential for us to grasp the
> > > > mm concerns and usage scenarios. This understanding will provide clarity on the
> > > > direction regarding folios for !anon shmem.
> > >
> > > If I understood correctly, Hugh had strong feelings against not respecting
> > > mTHP toggles for shmem. Without per-block tracking, I agree with that.
> >
> > My understanding was the same. But I have this follow-up question: should
> > we respect mTHP toggles without considering mapping constraints (size and
> > index)? Or perhaps we should use within_size where we can fit this intermediate
> > approach, as long as mTHP granularity is respected?
>
> Likely we should do exactly the same as we would do with the existing
> PMD-sized THPs.
>
> I recall in the meeting that we discussed that always vs. within_size seems
> to have some value, and that we should respect that setting like we did for
> PMD-sized THP.
>
> Which other mapping constraints could we have?

Patch 12 in my RFC for LSF [1] adds shmem_mapping_size_order() (updated
to support get_order()) to get the maximum order 'based on the file size which
the mapping currently allows at the given index'. The same check is done in
filemap.c [2].

[1] https://lore.kernel.org/all/[email protected]/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/filemap.c?h=v6.10-rc2#n1940

@@ -1726,6 +1726,37 @@ static struct folio *shmem_alloc_folio(gfp_t gfp, int order,
 	return folio;
 }

+/**
+ * shmem_mapping_size_order - Get maximum folio order for the given file size.
+ * @mapping: Target address_space.
+ * @index: The page index.
+ * @size: The suggested size of the folio to create.
+ *
+ * This returns a high order for folios (when supported) based on the file size
+ * which the mapping currently allows at the given index. The index is relevant
+ * due to alignment considerations the mapping might have. The returned order
+ * may be less than the size passed.
+ *
+ * Like __filemap_get_folio order calculation.
+ *
+ * Return: The order.
+ */
+static inline unsigned int
+shmem_mapping_size_order(struct address_space *mapping, pgoff_t index,
+			 size_t size)
+{
+	unsigned int order = get_order(max_t(size_t, size, PAGE_SIZE));
+
+	if (!mapping_large_folio_support(mapping))
+		return 0;
+
+	/* If we're not aligned, allocate a smaller folio */
+	if (index & ((1UL << order) - 1))
+		order = __ffs(index);
+
+	return min_t(size_t, order, MAX_PAGECACHE_ORDER);
+}
+
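
For illustration, here is a hypothetical helper (the name and placement are
mine, not part of the posted patches) showing how the order computed above
could be combined with the bitmask of mTHP-enabled orders, so that mTHP acts
as the order filter discussed earlier in the thread:

/* Sketch: drop every enabled order larger than what the mapping allows
 * for this index/size before walking the candidates. */
static unsigned long shmem_limit_orders(struct address_space *mapping,
					pgoff_t index, size_t len,
					unsigned long orders)
{
	unsigned int max_order = shmem_mapping_size_order(mapping, index, len);

	return orders & (BIT(max_order + 1) - 1);
}

The allocation loop in shmem_alloc_and_add_folio() would then simply walk the
reduced mask, which is equivalent to the 'order > mapping_order' skip in the
hunk quoted earlier.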

>
> --
> Cheers,
>
> David / dhildenb
>

2024-06-06 03:31:16

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem



On 2024/6/4 20:05, Daniel Gomez wrote:
> On Tue, Jun 04, 2024 at 05:45:20PM +0800, Baolin Wang wrote:
>>
>>
>> On 2024/6/4 16:18, Daniel Gomez wrote:
>>> On Fri, May 31, 2024 at 01:13:48PM +0200, David Hildenbrand wrote:
>>>>>>
>>>>>> As a default, we should not be using large folios / mTHP for any shmem,
>>>>>> just like we did with THP via shmem_enabled. This is what this series
>>>>>> currently does, and is aprt of the whole mTHP user-space interface design.
>>>>>>
>>>>>> Further, the mTHP controls should control all of shmem, not only
>>>>>> "anonymous shmem".
>>>>>
>>>>> Yes, that's what I thought and in my TODO list.
>>>>
>>>> Good, it would be helpful to coordinate with Daniel and Pankaj.
>>>
>>> I've integrated patches 11 and 12 from the lsf RFC thread [1] on top of Baolin's
>>> v3 patches. You may find a version in my integration branch here [2]. I can
>>> attach them here if it's preferred.
>>>
>>> [1] https://lore.kernel.org/all/[email protected]/
>>> [2] https://gitlab.com/dkruces/linux-next/-/commits/next-20240604-shmem-mthp
>>>
>>> The point here is to combine the large folios strategy I proposed with mTHP
>>> user controls. Would it make sense to limit the orders to the mapping order
>>> calculated based on the size and index?
>>
>> IMO, for !anon shmem, this change makes sense to me. We should respect the
>> size and mTHP should act as a order filter.
>
> What about respecing the size when within_size flag is enabled? Then, 'always'
> would allocate mTHP enabled folios, regardless of the size. And 'never'
> would ignore mTHP and size. So, 'never' can be used for this 'safe' boot case
> mentioned in the discussion.

Looks reasonable to me. What do you think, David?

And what about the 'advise' option? Similar to 'within_size'?

>> For anon shmem, we should ignore the length, which you always set it to
>> PAGE_SIZE in patch [1].
>>
>> [1] https://gitlab.com/dkruces/linux-next/-/commit/edf02311fd6d86b355d3aeb74e67c8da6de3c569
>
> Since we are ignoring the length, we should ignore any value being passed.
>
>>
>>> @@ -1765,6 +1798,10 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
>>>
>>> order = highest_order(suitable_orders);
>>> while (suitable_orders) {
>>> + if (order > mapping_order) {
>>> + order = next_order(&suitable_orders, order);
>>> + continue;
>>> + }
>>> pages = 1UL << order;
>>> index = round_down(index, pages);
>>> folio = shmem_alloc_folio(gfp, order, info, index);
>>>
>>> Note: The branch still need to be adapted to include !anon mm.

2024-06-06 08:49:43

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On 06.06.24 05:31, Baolin Wang wrote:
>
>
> On 2024/6/4 20:05, Daniel Gomez wrote:
>> On Tue, Jun 04, 2024 at 05:45:20PM +0800, Baolin Wang wrote:
>>>
>>>
>>> On 2024/6/4 16:18, Daniel Gomez wrote:
>>>> On Fri, May 31, 2024 at 01:13:48PM +0200, David Hildenbrand wrote:
>>>>>>>
>>>>>>> As a default, we should not be using large folios / mTHP for any shmem,
>>>>>>> just like we did with THP via shmem_enabled. This is what this series
>>>>>>> currently does, and is aprt of the whole mTHP user-space interface design.
>>>>>>>
>>>>>>> Further, the mTHP controls should control all of shmem, not only
>>>>>>> "anonymous shmem".
>>>>>>
>>>>>> Yes, that's what I thought and in my TODO list.
>>>>>
>>>>> Good, it would be helpful to coordinate with Daniel and Pankaj.
>>>>
>>>> I've integrated patches 11 and 12 from the lsf RFC thread [1] on top of Baolin's
>>>> v3 patches. You may find a version in my integration branch here [2]. I can
>>>> attach them here if it's preferred.
>>>>
>>>> [1] https://lore.kernel.org/all/[email protected]/
>>>> [2] https://gitlab.com/dkruces/linux-next/-/commits/next-20240604-shmem-mthp
>>>>
>>>> The point here is to combine the large folios strategy I proposed with mTHP
>>>> user controls. Would it make sense to limit the orders to the mapping order
>>>> calculated based on the size and index?
>>>
>>> IMO, for !anon shmem, this change makes sense to me. We should respect the
>>> size and mTHP should act as a order filter.
>>
>> What about respecing the size when within_size flag is enabled? Then, 'always'
>> would allocate mTHP enabled folios, regardless of the size. And 'never'
>> would ignore mTHP and size. So, 'never' can be used for this 'safe' boot case
>> mentioned in the discussion.
>
> Looks reasonable to me. What do you think, David?
>

That mimics existing PMD-THP behavior, right?

With "within_size", we must not exceed the size, with "always", we may
exceed the size.
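
As a concrete reading of that rule, a hypothetical per-order check could look
like the following (the function name is illustrative and not from the series;
the constants are the existing shmem_enabled values). 'always' admits any
enabled order, while 'within_size' only admits an order whose aligned range
stays inside i_size:

static bool shmem_order_within_policy(struct inode *inode, pgoff_t index,
				      unsigned int order, int policy)
{
	pgoff_t aligned_end;
	loff_t i_size;

	if (policy == SHMEM_HUGE_ALWAYS)
		return true;

	if (policy == SHMEM_HUGE_WITHIN_SIZE) {
		/* Same shape as the PMD-sized check in shmem_is_huge(),
		 * with the candidate order instead of HPAGE_PMD_NR. */
		aligned_end = round_up(index + 1, 1UL << order);
		i_size = round_up(i_size_read(inode), PAGE_SIZE);
		return i_size >> PAGE_SHIFT >= aligned_end;
	}

	/* 'never'; other values ('advise' etc.) are out of scope here. */
	return false;
}

(How 'advise' should behave is exactly the question below.)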

> And what about 'advise' option? Silimar to 'within_size'?

Good question. What's the behavior of PMD-THP? I would assume it behaves
like "within_size", because in the common case we mmap (+advise) only
within the size of the file, not exceeding it.

(the always option, as I learned during the meeting, is primarily
helpful when writing to tmpfs files in an append-fashion. With
mmap()+madvise() that doesn't quite happen)

--
Cheers,

David / dhildenb


2024-06-06 09:33:38

by Baolin Wang

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem



On 2024/6/6 16:38, David Hildenbrand wrote:
> On 06.06.24 05:31, Baolin Wang wrote:
>>
>>
>> On 2024/6/4 20:05, Daniel Gomez wrote:
>>> On Tue, Jun 04, 2024 at 05:45:20PM +0800, Baolin Wang wrote:
>>>>
>>>>
>>>> On 2024/6/4 16:18, Daniel Gomez wrote:
>>>>> On Fri, May 31, 2024 at 01:13:48PM +0200, David Hildenbrand wrote:
>>>>>>>>
>>>>>>>> As a default, we should not be using large folios / mTHP for any
>>>>>>>> shmem,
>>>>>>>> just like we did with THP via shmem_enabled. This is what this
>>>>>>>> series
>>>>>>>> currently does, and is aprt of the whole mTHP user-space
>>>>>>>> interface design.
>>>>>>>>
>>>>>>>> Further, the mTHP controls should control all of shmem, not only
>>>>>>>> "anonymous shmem".
>>>>>>>
>>>>>>> Yes, that's what I thought and in my TODO list.
>>>>>>
>>>>>> Good, it would be helpful to coordinate with Daniel and Pankaj.
>>>>>
>>>>> I've integrated patches 11 and 12 from the lsf RFC thread [1] on
>>>>> top of Baolin's
>>>>> v3 patches. You may find a version in my integration branch here
>>>>> [2]. I can
>>>>> attach them here if it's preferred.
>>>>>
>>>>> [1]
>>>>> https://lore.kernel.org/all/[email protected]/
>>>>> [2]
>>>>> https://gitlab.com/dkruces/linux-next/-/commits/next-20240604-shmem-mthp
>>>>>
>>>>> The point here is to combine the large folios strategy I proposed
>>>>> with mTHP
>>>>> user controls. Would it make sense to limit the orders to the
>>>>> mapping order
>>>>> calculated based on the size and index?
>>>>
>>>> IMO, for !anon shmem, this change makes sense to me. We should
>>>> respect the
>>>> size and mTHP should act as a order filter.
>>>
>>> What about respecing the size when within_size flag is enabled? Then,
>>> 'always'
>>> would allocate mTHP enabled folios, regardless of the size. And 'never'
>>> would ignore mTHP and size. So, 'never' can be used for this 'safe'
>>> boot case
>>> mentioned in the discussion.
>>
>> Looks reasonable to me. What do you think, David?
>>
>
> That mimics existing PMD-THP behavior, right?
>
> With "within_size", we must not exceed the size, with "always", we may
> exceed the size.

Yes, I think so.

>> And what about 'advise' option? Silimar to 'within_size'?
>
> Good question. What's the behavior of PMD-THP? I would assume it behaves
> like "within_size", because in the common case we mmap (+advise) only
> within the size of the file, not exceeding it.

Yes, that is also my understanding.

> (the always option, as I learned during the meeting, is primarily
> helpful when writing to tmpfs files in an append-fashion. With
> mmap()+madvise() that doesn't quite happen)
>

2024-06-07 09:06:23

by Daniel Gomez

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On Thu, Jun 06, 2024 at 10:38:22AM +0200, David Hildenbrand wrote:
> On 06.06.24 05:31, Baolin Wang wrote:
> >
> >
> > On 2024/6/4 20:05, Daniel Gomez wrote:
> > > On Tue, Jun 04, 2024 at 05:45:20PM +0800, Baolin Wang wrote:
> > > >
> > > >
> > > > On 2024/6/4 16:18, Daniel Gomez wrote:
> > > > > On Fri, May 31, 2024 at 01:13:48PM +0200, David Hildenbrand wrote:
> > > > > > > >
> > > > > > > > As a default, we should not be using large folios / mTHP for any shmem,
> > > > > > > > just like we did with THP via shmem_enabled. This is what this series
> > > > > > > > currently does, and is aprt of the whole mTHP user-space interface design.
> > > > > > > >
> > > > > > > > Further, the mTHP controls should control all of shmem, not only
> > > > > > > > "anonymous shmem".
> > > > > > >
> > > > > > > Yes, that's what I thought and in my TODO list.
> > > > > >
> > > > > > Good, it would be helpful to coordinate with Daniel and Pankaj.
> > > > >
> > > > > I've integrated patches 11 and 12 from the lsf RFC thread [1] on top of Baolin's
> > > > > v3 patches. You may find a version in my integration branch here [2]. I can
> > > > > attach them here if it's preferred.
> > > > >
> > > > > [1] https://lore.kernel.org/all/[email protected]/
> > > > > [2] https://gitlab.com/dkruces/linux-next/-/commits/next-20240604-shmem-mthp
> > > > >
> > > > > The point here is to combine the large folios strategy I proposed with mTHP
> > > > > user controls. Would it make sense to limit the orders to the mapping order
> > > > > calculated based on the size and index?
> > > >
> > > > IMO, for !anon shmem, this change makes sense to me. We should respect the
> > > > size and mTHP should act as a order filter.
> > >
> > > What about respecing the size when within_size flag is enabled? Then, 'always'
> > > would allocate mTHP enabled folios, regardless of the size. And 'never'
> > > would ignore mTHP and size. So, 'never' can be used for this 'safe' boot case
> > > mentioned in the discussion.
> >
> > Looks reasonable to me. What do you think, David?
> >
>
> That mimics existing PMD-THP behavior, right?
>
> With "within_size", we must not exceed the size, with "always", we may
> exceed the size.

But right now we only check the inode size. With large folio support in
write_iter() we can have access to the length as well. I think this would solve
(partially?) the cases where we don't have access to the file size and where we
perform writes in bigger chunks.

E.g. xfs_io -t -f -c "pwrite -b 2M -S 0x58 0 2M" /mnt/test/file

For 'within_size', the example above would allocate 512 order-0 pages
(2 MiB / 4 KiB) instead of one huge page. After patches [1] [2] we can get the
size of the write and allocate whichever mTHP/THP size makes more sense for
the length being passed.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

Here is a quick hack for THP:

@@ -561,7 +561,8 @@ bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
 	case SHMEM_HUGE_WITHIN_SIZE:
 		index = round_up(index + 1, HPAGE_PMD_NR);
 		i_size = round_up(i_size_read(inode), PAGE_SIZE);
-		if (i_size >> PAGE_SHIFT >= index)
+		if ((i_size >> PAGE_SHIFT >= index) ||
+		    (len >> PAGE_SHIFT >= index))
 			return true;
 		fallthrough;


>
> > And what about 'advise' option? Silimar to 'within_size'?
>
> Good question. What's the behavior of PMD-THP? I would assume it behaves
> like "within_size", because in the common case we mmap (+advise) only within
> the size of the file, not exceeding it.

It allocates a huge page on request when MADV_HUGEPAGE is set (regardless of the size).

>
> (the always option, as I learned during the meeting, is primarily helpful
> when writing to tmpfs files in an append-fashion. With mmap()+madvise() that
> doesn't quite happen)

Let's benefit from larger writes as well if the chunk matches any of the
mTHP/THP sizes.

>
> --
> Cheers,
>
> David / dhildenb
>

2024-06-07 10:40:14

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] add mTHP support for anonymous shmem

On 07.06.24 11:05, Daniel Gomez wrote:
> On Thu, Jun 06, 2024 at 10:38:22AM +0200, David Hildenbrand wrote:
>> On 06.06.24 05:31, Baolin Wang wrote:
>>>
>>>
>>> On 2024/6/4 20:05, Daniel Gomez wrote:
>>>> On Tue, Jun 04, 2024 at 05:45:20PM +0800, Baolin Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/6/4 16:18, Daniel Gomez wrote:
>>>>>> On Fri, May 31, 2024 at 01:13:48PM +0200, David Hildenbrand wrote:
>>>>>>>>>
>>>>>>>>> As a default, we should not be using large folios / mTHP for any shmem,
>>>>>>>>> just like we did with THP via shmem_enabled. This is what this series
>>>>>>>>> currently does, and is aprt of the whole mTHP user-space interface design.
>>>>>>>>>
>>>>>>>>> Further, the mTHP controls should control all of shmem, not only
>>>>>>>>> "anonymous shmem".
>>>>>>>>
>>>>>>>> Yes, that's what I thought and in my TODO list.
>>>>>>>
>>>>>>> Good, it would be helpful to coordinate with Daniel and Pankaj.
>>>>>>
>>>>>> I've integrated patches 11 and 12 from the lsf RFC thread [1] on top of Baolin's
>>>>>> v3 patches. You may find a version in my integration branch here [2]. I can
>>>>>> attach them here if it's preferred.
>>>>>>
>>>>>> [1] https://lore.kernel.org/all/[email protected]/
>>>>>> [2] https://gitlab.com/dkruces/linux-next/-/commits/next-20240604-shmem-mthp
>>>>>>
>>>>>> The point here is to combine the large folios strategy I proposed with mTHP
>>>>>> user controls. Would it make sense to limit the orders to the mapping order
>>>>>> calculated based on the size and index?
>>>>>
>>>>> IMO, for !anon shmem, this change makes sense to me. We should respect the
>>>>> size and mTHP should act as a order filter.
>>>>
>>>> What about respecing the size when within_size flag is enabled? Then, 'always'
>>>> would allocate mTHP enabled folios, regardless of the size. And 'never'
>>>> would ignore mTHP and size. So, 'never' can be used for this 'safe' boot case
>>>> mentioned in the discussion.
>>>
>>> Looks reasonable to me. What do you think, David?
>>>
>>
>> That mimics existing PMD-THP behavior, right?
>>
>> With "within_size", we must not exceed the size, with "always", we may
>> exceed the size.
>
> But right now we only check the inode size. With large folio support in
> write_iter() we can have access to the length as well. I think this would solve
> (paratially?) the cases where we don't have access to the file size and if we
> perform writes in bigger chunks.
>
> E.g. xfs_io -t -f -c "pwrite -b 2M -S 0x58 0 2M" /mnt/test/file
>
> For 'within_size', the example above would allocate 512 pages instead of one
> huge page. After patches [1] [2] we can get the size of the write to allocate
> whatever mTHP/THP makes more sense for the length being passed.
>

Yes, although this sounds like an optimization/change we should be doing
separately I guess.

> [1] https://lore.kernel.org/all/[email protected]/
> [2] https://lore.kernel.org/all/[email protected]/
>
> Here a quick hack for THP:
>
> @@ -561,7 +561,8 @@ bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
> case SHMEM_HUGE_WITHIN_SIZE:
> index = round_up(index + 1, HPAGE_PMD_NR);
> i_size = round_up(i_size_read(inode), PAGE_SIZE);
> - if (i_size >> PAGE_SHIFT >= index)
> + if ((i_size >> PAGE_SHIFT >= index) ||
> + (len >> PAGE_SHIFT >= index))
> return true;
> fallthrough;
>
>
>>
>>> And what about 'advise' option? Silimar to 'within_size'?
>>
>> Good question. What's the behavior of PMD-THP? I would assume it behaves
>> like "within_size", because in the common case we mmap (+advise) only within
>> the size of the file, not exceeding it.
>
> It allocates a huge page on request when MADV_HUGEPAGE (regardless of the size).
>

Interesting, so we should do the same. Thanks!

--
Cheers,

David / dhildenb