On Tue, Jun 27, 2017 at 09:08:40AM -0700, Prakash Sangappa wrote:
> Applications like the database use hugetlbfs for performance reasons.
> Files on the hugetlbfs filesystem are created and huge pages allocated
> using the fallocate() API. Pages are deallocated/freed using fallocate() hole
> punching support. These files are mmap'ed and accessed by many
> single-threaded processes as shared memory. The database keeps
> track of which offsets in the hugetlbfs file have pages allocated.
>
> Any access to a mapped address over a hole in the file, which can occur due
> to bugs in the application, is considered invalid, and the process is
> expected to simply receive a SIGBUS. However, currently when a hole in the
> file is accessed via the mmap'ed address, kernel/mm attempts to
> automatically allocate a page at page fault time, implicitly filling the
> hole in the file. This may not be the desired behavior for applications
> like the database that want to explicitly manage page allocations of
> hugetlbfs files. The requirement here is for a way to prevent the kernel
> from implicitly allocating a page to fill holes in a hugetlbfs file.
>
> This can be achieved using the userfaultfd mechanism to intercept page-fault
> events when mmap'ed addresses over holes in the file are accessed, and
> prevent the kernel from implicitly filling the hole. However, using
> userfaultfd currently requires each of the database processes to run a
> monitor thread, and the associated setup cost is considered an overhead.
>
> It would be better if the userfaultfd mechanism had a way to request that
> a signal simply be sent, for the robustness use case described above.
> This would not require the use of a monitor thread.
>
> This patch adds a feature to the userfaultfd mechanism to request
> delivery of a SIGBUS signal to the faulting process instead of the
> page-fault event.
>
> See the following for the previous discussion about a different solution
> to the above database requirement, leading to this proposal to enhance
> userfaultfd, as suggested by Andrea.
>
> http://www.spinics.net/lists/linux-mm/msg129224.html
>
> Signed-off-by: Prakash <[email protected]>
> ---
> fs/userfaultfd.c | 5 +++++
> include/uapi/linux/userfaultfd.h | 10 +++++++++-
> 2 files changed, 14 insertions(+), 1 deletion(-)
Apparently your mail client clobbered the white space. Can you please
resend with proper formatting?
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 1d622f2..5686d6d2 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -371,6 +371,11 @@ int handle_userfault(struct vm_fault *vmf, unsigned
> long reason)
> VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
> VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
>
> + if (ctx->features & UFFD_FEATURE_SIGBUS) {
> + goto out;
> + }
Please remove the curly braces.
> +
> /*
> * If it's already released don't get it. This avoids to loop
> * in __get_user_pages if userfaultfd_release waits on the
> diff --git a/include/uapi/linux/userfaultfd.h
> b/include/uapi/linux/userfaultfd.h
> index 3b05953..d39d5db 100644
> --- a/include/uapi/linux/userfaultfd.h
> +++ b/include/uapi/linux/userfaultfd.h
> @@ -23,7 +23,8 @@
> UFFD_FEATURE_EVENT_REMOVE | \
> UFFD_FEATURE_EVENT_UNMAP | \
> UFFD_FEATURE_MISSING_HUGETLBFS | \
> - UFFD_FEATURE_MISSING_SHMEM)
> + UFFD_FEATURE_MISSING_SHMEM | \
> + UFFD_FEATURE_SIGBUS)
> #define UFFD_API_IOCTLS \
> ((__u64)1 << _UFFDIO_REGISTER | \
> (__u64)1 << _UFFDIO_UNREGISTER | \
> @@ -153,6 +154,12 @@ struct uffdio_api {
> * UFFD_FEATURE_MISSING_SHMEM works the same as
> * UFFD_FEATURE_MISSING_HUGETLBFS, but it applies to shmem
> * (i.e. tmpfs and other shmem based APIs).
> + *
> + * UFFD_FEATURE_SIGBUS feature means no page-fault
> + * (UFFD_EVENT_PAGEFAULT) event will be delivered, instead
> + * a SIGBUS signal will be sent to the faulting process.
> + * The application process can enable this behavior by adding
> + * it to uffdio_api.features.
I think that it may be worth making UFFD_FEATURE_SIGBUS mutually exclusive
with the non-cooperative events. There is no point in having a monitor if the
page fault handler will just kill the faulting process anyway.
> */
> #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0)
> #define UFFD_FEATURE_EVENT_FORK (1<<1)
> @@ -161,6 +168,7 @@ struct uffdio_api {
> #define UFFD_FEATURE_MISSING_HUGETLBFS (1<<4)
> #define UFFD_FEATURE_MISSING_SHMEM (1<<5)
> #define UFFD_FEATURE_EVENT_UNMAP (1<<6)
> +#define UFFD_FEATURE_SIGBUS (1<<7)
> __u64 features;
>
> __u64 ioctls;
> --
> 2.7.4
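The implicit hole filling that the patch description complains about can be
observed from user space. The following is only a minimal sketch: it uses
memfd_create() (shmem) as an assumed stand-in for a hugetlbfs file, since
huge pages may not be configured on a given machine, and the helper name
hole_refilled_by_fault() is made up for illustration. It allocates two pages
with fallocate(), punches a hole over the first one, stores into the hole
through a shared mapping, and compares st_blocks before and after the store.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* memfd_create(), fallocate(), FALLOC_FL_* */
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if touching a punched hole through the mapping made the
 * kernel allocate a page again, 0 if it did not, -1 on setup failure.
 * The memfd (shmem) file is an assumption standing in for hugetlbfs.
 */
int hole_refilled_by_fault(void)
{
	long page = sysconf(_SC_PAGESIZE);
	struct stat before, after;
	int result = -1;
	char *map;
	int fd;

	fd = memfd_create("hole-demo", 0);
	if (fd < 0)
		return -1;

	/* Explicitly allocate two pages, then free the first one again. */
	if (fallocate(fd, 0, 0, 2 * page) < 0)
		goto out_close;
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, page) < 0)
		goto out_close;
	if (fstat(fd, &before) < 0)
		goto out_close;

	map = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* The "buggy" access: a store into the hole. */
	((volatile char *)map)[0] = 1;

	if (fstat(fd, &after) == 0)
		result = after.st_blocks > before.st_blocks;

	munmap(map, 2 * page);
out_close:
	close(fd);
	return result;
}
```

A return value of 1 means the kernel allocated a page for the hole at fault
time, which is the behavior the database wants to turn into a SIGBUS.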
On 07/04/2017 11:28 AM, Mike Rapoport wrote:
> On Tue, Jun 27, 2017 at 09:08:40AM -0700, Prakash Sangappa wrote:
>> Applications like the database use hugetlbfs for performance reasons.
>> Files on the hugetlbfs filesystem are created and huge pages allocated
>> using the fallocate() API. Pages are deallocated/freed using fallocate() hole
>> punching support. These files are mmap'ed and accessed by many
>> single-threaded processes as shared memory. The database keeps
>> track of which offsets in the hugetlbfs file have pages allocated.
>>
>> Any access to a mapped address over a hole in the file, which can occur due
>> to bugs in the application, is considered invalid, and the process is
>> expected to simply receive a SIGBUS. However, currently when a hole in the
>> file is accessed via the mmap'ed address, kernel/mm attempts to
>> automatically allocate a page at page fault time, implicitly filling the
>> hole in the file. This may not be the desired behavior for applications
>> like the database that want to explicitly manage page allocations of
>> hugetlbfs files. The requirement here is for a way to prevent the kernel
>> from implicitly allocating a page to fill holes in a hugetlbfs file.
>>
>> This can be achieved using the userfaultfd mechanism to intercept page-fault
>> events when mmap'ed addresses over holes in the file are accessed, and
>> prevent the kernel from implicitly filling the hole. However, using
>> userfaultfd currently requires each of the database processes to run a
>> monitor thread, and the associated setup cost is considered an overhead.
>>
>> It would be better if the userfaultfd mechanism had a way to request that
>> a signal simply be sent, for the robustness use case described above.
>> This would not require the use of a monitor thread.
>>
>> This patch adds a feature to the userfaultfd mechanism to request
>> delivery of a SIGBUS signal to the faulting process instead of the
>> page-fault event.
>>
>> See the following for the previous discussion about a different solution
>> to the above database requirement, leading to this proposal to enhance
>> userfaultfd, as suggested by Andrea.
>>
>> http://www.spinics.net/lists/linux-mm/msg129224.html
>>
>> Signed-off-by: Prakash <[email protected]>
>> ---
>> fs/userfaultfd.c | 5 +++++
>> include/uapi/linux/userfaultfd.h | 10 +++++++++-
>> 2 files changed, 14 insertions(+), 1 deletion(-)
> Apparently your mail client clobbered the white space. Can you please
> resend with proper formatting?
>
OK, will resend the patch along with the suggested changes.
>> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
>> index 1d622f2..5686d6d2 100644
>> --- a/fs/userfaultfd.c
>> +++ b/fs/userfaultfd.c
>> @@ -371,6 +371,11 @@ int handle_userfault(struct vm_fault *vmf, unsigned
>> long reason)
>> VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP));
>> VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP));
>>
>> + if (ctx->features & UFFD_FEATURE_SIGBUS) {
>> + goto out;
>> + }
> Please remove the curly braces.
OK.
>
>> +
>> /*
>> * If it's already released don't get it. This avoids to loop
>> * in __get_user_pages if userfaultfd_release waits on the
>> diff --git a/include/uapi/linux/userfaultfd.h
>> b/include/uapi/linux/userfaultfd.h
>> index 3b05953..d39d5db 100644
>> --- a/include/uapi/linux/userfaultfd.h
>> +++ b/include/uapi/linux/userfaultfd.h
>> @@ -23,7 +23,8 @@
>> UFFD_FEATURE_EVENT_REMOVE | \
>> UFFD_FEATURE_EVENT_UNMAP | \
>> UFFD_FEATURE_MISSING_HUGETLBFS | \
>> - UFFD_FEATURE_MISSING_SHMEM)
>> + UFFD_FEATURE_MISSING_SHMEM | \
>> + UFFD_FEATURE_SIGBUS)
>> #define UFFD_API_IOCTLS \
>> ((__u64)1 << _UFFDIO_REGISTER | \
>> (__u64)1 << _UFFDIO_UNREGISTER | \
>> @@ -153,6 +154,12 @@ struct uffdio_api {
>> * UFFD_FEATURE_MISSING_SHMEM works the same as
>> * UFFD_FEATURE_MISSING_HUGETLBFS, but it applies to shmem
>> * (i.e. tmpfs and other shmem based APIs).
>> + *
>> + * UFFD_FEATURE_SIGBUS feature means no page-fault
>> + * (UFFD_EVENT_PAGEFAULT) event will be delivered, instead
>> + * a SIGBUS signal will be sent to the faulting process.
>> + * The application process can enable this behavior by adding
>> + * it to uffdio_api.features.
> I think that it may be worth making UFFD_FEATURE_SIGBUS mutually exclusive
> with the non-cooperative events. There is no point in having a monitor if the
> page fault handler will just kill the faulting process anyway.
Would this not be too restrictive? The non-cooperative events could
still be useful if an application wants to track changes
to VA ranges that are registered, even though it expects
a signal on page fault.
On Wed, Jul 05, 2017 at 05:41:14PM -0700, prakash.sangappa wrote:
>
>
> On 07/04/2017 11:28 AM, Mike Rapoport wrote:
> >On Tue, Jun 27, 2017 at 09:08:40AM -0700, Prakash Sangappa wrote:
> >>Applications like the database use hugetlbfs for performance reasons.
> >>Files on the hugetlbfs filesystem are created and huge pages allocated
> >>using the fallocate() API. Pages are deallocated/freed using fallocate() hole
> >>punching support. These files are mmap'ed and accessed by many
> >>single-threaded processes as shared memory. The database keeps
> >>track of which offsets in the hugetlbfs file have pages allocated.
> >>
[ ... ]
> >>+ *
> >>+ * UFFD_FEATURE_SIGBUS feature means no page-fault
> >>+ * (UFFD_EVENT_PAGEFAULT) event will be delivered, instead
> >>+ * a SIGBUS signal will be sent to the faulting process.
> >>+ * The application process can enable this behavior by adding
> >>+ * it to uffdio_api.features.
> >I think that it may be worth making UFFD_FEATURE_SIGBUS mutually exclusive
> >with the non-cooperative events. There is no point in having a monitor if the
> >page fault handler will just kill the faulting process anyway.
>
>
> Would this not be too restrictive? The non-cooperative events could
> still be useful if an application wants to track changes
> to VA ranges that are registered, even though it expects
> a signal on page fault.
I wouldn't say that we must make UFFD_FEATURE_SIGBUS mutually exclusive
with other events, but, IMHO, it's something we should at least think
about.
In my view, if you have a uffd monitor anyway, you can process page faults
there as well, and then there is no actual need for UFFD_FEATURE_SIGBUS.
--
Sincerely yours,
Mike.
>On Wed, Jul 05, 2017 at 05:41:14PM -0700, prakash.sangappa wrote:
>> On 07/04/2017 11:28 AM, Mike Rapoport wrote:
>> >On Tue, Jun 27, 2017 at 09:08:40AM -0700, Prakash Sangappa wrote:
>> >>Applications like the database use hugetlbfs for performance reasons.
>> >>Files on the hugetlbfs filesystem are created and huge pages allocated
>> >>using the fallocate() API. Pages are deallocated/freed using fallocate() hole
>> >>punching support. These files are mmap'ed and accessed by many
>> >>single-threaded processes as shared memory. The database keeps
>> >>track of which offsets in the hugetlbfs file have pages allocated.
>> >>
[ ... ]
>> >I think that it may be worth making UFFD_FEATURE_SIGBUS mutually exclusive
>> >with the non-cooperative events. There is no point in having a monitor if the
>> >page fault handler will just kill the faulting process anyway.
>>
>>
>> Would this not be too restrictive? The non-cooperative events could
>> still be useful if an application wants to track changes
>> to VA ranges that are registered, even though it expects
>> a signal on page fault.
>I wouldn't say that we must make UFFD_FEATURE_SIGBUS mutually exclusive
>with other events, but, IMHO, it's something we should at least think
>about.
>In my view, if you have a uffd monitor anyway, you can process page faults
>there as well, and then there is no actual need for UFFD_FEATURE_SIGBUS.
A use case I am considering for this is lightweight threads/continuations
that context switch on page faults in file-backed VMAs. Some sort
of asynchronous read() would then be initiated. For this case, the primary
function of SIGBUS is to allow the thread to jump into context switch code.
It's not immediately clear what role the uffd monitor thread would play
in this, though one can imagine the UFFD thread also managing
asynchronous I/O.
Just wanted to voice my approval of this patch in general. It could enable
some really cool userland technologies, IMHO.
~Jon Pry