2023-07-31 17:06:23

by Matthew Wilcox

Subject: Re: [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare)

On Mon, Jul 31, 2023 at 06:30:22PM +0200, David Hildenbrand wrote:
> Assume we do do the page table sharing at mmap time, if the flags are right.
> Let's focus on the most common:
>
> mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED)
>
> And doing the same in each and every process.

That may be the most common in your usage, but for a database, you're
looking at two usage scenarios. Postgres calls mmap() on the database
file itself so that all processes share the kernel page cache.
Some Commercial Databases call mmap() on a hugetlbfs file so that all
processes share the same userspace buffer cache. Other Commercial
Databases call shmget() / shmat() with SHM_HUGETLB for the exact
same reason.

This is why I proposed mshare(). Anyone can use it for anything.
We have such a diverse set of users who want to do stuff with shared
page tables that we should not be tying it to memfd or any other
filesystem. Not to mention that it's more flexible; you can map
individual 4kB files into it and still get page table sharing.
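
[Editor's sketch, not part of the original message.] The memfd pattern
quoted above can be illustrated roughly as follows: two processes each
call mmap() with MAP_SHARED on the same memfd, so they see the same pages.
Without page table sharing, each process still builds its own page tables
for that mapping, which is the overhead this thread is about. The helper
name memfd_shared_roundtrip() and the memfd name "shared-buf" are made up
for illustration, and the code assumes Linux with glibc's memfd_create().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a memfd of `len` bytes, fork, let the child
 * map it MAP_SHARED and store `value` in the first byte, then map it again
 * in the parent and return the byte read there (or -1 on any failure).
 */
int memfd_shared_roundtrip(size_t len, unsigned char value)
{
    assert(len > 0);

    int fd = memfd_create("shared-buf", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    if (pid == 0) {
        /* Child: its own mmap() call on the inherited descriptor. */
        unsigned char *c = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
        if (c == MAP_FAILED)
            _exit(1);
        c[0] = value;
        _exit(0);
    }

    /* Parent: wait for the child, then create a separate mapping. */
    int status;
    int result = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
        if (p != MAP_FAILED) {
            result = p[0];  /* the child's store is visible here */
            munmap(p, len);
        }
    }
    close(fd);
    return result;
}
```

On a kernel with memfd support, memfd_shared_roundtrip(4096, 42) should
return 42. The hugetlbfs and shmget()/shmat() variants mentioned above
follow the same shape but need huge pages to be configured.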



2023-07-31 18:12:00

by David Hildenbrand

Subject: Re: [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare)

On 31.07.23 18:38, Matthew Wilcox wrote:
> On Mon, Jul 31, 2023 at 06:30:22PM +0200, David Hildenbrand wrote:
>> Assume we do do the page table sharing at mmap time, if the flags are right.
>> Let's focus on the most common:
>>
>> mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED)
>>
>> And doing the same in each and every process.
>
> That may be the most common in your usage, but for a database, you're
> looking at two usage scenarios. Postgres calls mmap() on the database
> file itself so that all processes share the kernel page cache.
> Some Commercial Databases call mmap() on a hugetlbfs file so that all
> processes share the same userspace buffer cache. Other Commercial
> Databases call shmget() / shmat() with SHM_HUGETLB for the exact
> same reason.

I remember you said that Postgres might be looking into using shmem as
well, but maybe I am wrong.

memfd/hugetlb/shmem could all be handled alike; only "arbitrary
filesystems" would require more work.

>
> This is why I proposed mshare(). Anyone can use it for anything.
> We have such a diverse set of users who want to do stuff with shared
> page tables that we should not be tying it to memfd or any other
> filesystem. Not to mention that it's more flexible; you can map
> individual 4kB files into it and still get page table sharing.

That's not what the current proposal does, or am I wrong?

Also, I'm curious, is that a real requirement in the database world?

--
Cheers,

David / dhildenb


2023-08-01 07:16:49

by Rongwei Wang

Subject: Re: [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare)


On 2023/8/1 00:38, Matthew Wilcox wrote:
> On Mon, Jul 31, 2023 at 06:30:22PM +0200, David Hildenbrand wrote:
>> Assume we do do the page table sharing at mmap time, if the flags are right.
>> Let's focus on the most common:
>>
>> mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED)
>>
>> And doing the same in each and every process.
> That may be the most common in your usage, but for a database, you're
> looking at two usage scenarios. Postgres calls mmap() on the database
> file itself so that all processes share the kernel page cache.
> Some Commercial Databases call mmap() on a hugetlbfs file so that all
> processes share the same userspace buffer cache. Other Commercial
> Databases call shmget() / shmat() with SHM_HUGETLB for the exact
> same reason.
>
> This is why I proposed mshare(). Anyone can use it for anything.

Hi Matthew

I'm a little confused about this mshare(). Which mshare() are you
referring to here: the previous filesystem-based mshare(), or this RFC v2
posted by Khalid?

IMHO, there is a big difference between the previous mshare() and the
current MAP_SHARED_PT.

> We have such a diverse set of users who want to do stuff with shared
> page tables that we should not be tying it to memfd or any other
> filesystem. Not to mention that it's more flexible; you can map
> individual 4kB files into it and still get page table sharing.

2023-08-01 19:51:00

by Matthew Wilcox

Subject: Re: [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare)

On Tue, Aug 01, 2023 at 02:53:02PM +0800, Rongwei Wang wrote:
>
> On 2023/8/1 00:38, Matthew Wilcox wrote:
> > On Mon, Jul 31, 2023 at 06:30:22PM +0200, David Hildenbrand wrote:
> > > Assume we do do the page table sharing at mmap time, if the flags are right.
> > > Let's focus on the most common:
> > >
> > > mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED)
> > >
> > > And doing the same in each and every process.
> > That may be the most common in your usage, but for a database, you're
> > looking at two usage scenarios. Postgres calls mmap() on the database
> > file itself so that all processes share the kernel page cache.
> > Some Commercial Databases call mmap() on a hugetlbfs file so that all
> > processes share the same userspace buffer cache. Other Commercial
> > Databases call shmget() / shmat() with SHM_HUGETLB for the exact
> > same reason.
> >
> > This is why I proposed mshare(). Anyone can use it for anything.
>
> Hi Matthew
>
> I'm a little confused about this mshare(). Which mshare() are you
> referring to here: the previous filesystem-based mshare(), or this RFC v2
> posted by Khalid?
>
> IMHO, there is a big difference between the previous mshare() and the
> current MAP_SHARED_PT.

I haven't read this version of the patchset. I'm describing the original
idea, not what it may have turned into. As far as I'm concerned, we're
still trying to decide what functionality we actually want, not arguing
about whether this exact patchset has the correct number of tab indents
to be merged.