Date: Fri, 28 Apr 2017 12:45:16 +0300
Subject: Re: [PATCH man-pages 1/2] userfaultfd.2: start documenting non-cooperative events
To: "Michael Kerrisk (man-pages)"
CC: mtk.manpages@gmail.com, Andrea Arcangeli, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-man@vger.kernel.org
From: Mike Rapoport

On April 27, 2017 8:26:16 PM GMT+03:00, "Michael Kerrisk (man-pages)" wrote:
>Hi Mike,
>
>I've applied this, but have some questions/points I think need
>further clarification.
>
>On 04/27/2017 04:14 PM, Mike Rapoport wrote:
>> Signed-off-by: Mike Rapoport
>> ---
>>  man2/userfaultfd.2 | 135 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 128 insertions(+), 7 deletions(-)
>>
>> diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
>> index cfea5cb..44af3e4 100644
>> --- a/man2/userfaultfd.2
>> +++ b/man2/userfaultfd.2
>> @@ -75,7 +75,7 @@ flag in
>>  .PP
>>  When the last file descriptor referring to a userfaultfd object is closed,
>>  all memory ranges that were registered with the object are unregistered
>> -and unread page-fault events are flushed.
>> +and unread events are flushed.
>>  .\"
>>  .SS Usage
>>  The userfaultfd mechanism is designed to allow a thread in a multithreaded
>> @@ -99,6 +99,20 @@ In such non-cooperative mode,
>>  the process that monitors userfaultfd and handles page faults
>>  needs to be aware of the changes in the virtual memory layout
>>  of the faulting process to avoid memory corruption.
>> +
>> +Starting from Linux 4.11,
>> +userfaultfd may notify the fault-handling threads about changes
>> +in the virtual memory layout of the faulting process.
>> +In addition, if the faulting process invokes the
>> +.BR fork (2)
>> +system call,
>> +the userfaultfd objects associated with the parent may be duplicated
>> +into the child process and the userfaultfd monitor will be notified
>> +about the file descriptor associated with the userfault objects
>
>What does "notified about the file descriptor" mean?

Well, it seems that I've made this one really awkward :)

When the monitored process forks, all the userfault objects associated
with it are duplicated into the child process. For each duplicated
object, userfault generates an event of type UFFD_EVENT_FORK, and the
uffd_msg for this event contains the file descriptor that should be used
to manipulate the duplicated userfault object.

Hope this clarifies.
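For illustration only (this is not part of the patch or of the reply above): a minimal C sketch of monitor-side code that reads uffd_msg records and dispatches on the event field, picking the arg union member that the patch associates with each event type, including the child's descriptor in arg.fork.ufd. It assumes the <linux/userfaultfd.h> UAPI header from Linux 4.11 or later; read_uffd_msgs() and describe_uffd_msg() are hypothetical helper names. Events other than page faults are delivered only if the matching feature bits (for example UFFD_FEATURE_EVENT_FORK) were requested with the UFFDIO_API ioctl, which this sketch does not do.

```c
#include <linux/userfaultfd.h> /* struct uffd_msg, UFFD_EVENT_* (assumes Linux >= 4.11 headers) */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read(2) on a userfaultfd returns one or more whole
 * uffd_msg records, so read up to 'max' of them and return how many
 * arrived, or -1 on error (errno is left as set by read(2)). */
ssize_t read_uffd_msgs(int fd, struct uffd_msg *msgs, size_t max)
{
    ssize_t n = read(fd, msgs, max * sizeof(*msgs));
    if (n < 0)
        return -1;
    return n / (ssize_t)sizeof(*msgs);
}

/* Hypothetical helper: write a one-line summary of a message into buf,
 * using the arg union member that matches msg->event.
 * Returns snprintf(3)'s result. */
int describe_uffd_msg(const struct uffd_msg *msg, char *buf, size_t len)
{
    switch (msg->event) {
    case UFFD_EVENT_PAGEFAULT:
        return snprintf(buf, len, "pagefault %s at 0x%llx",
                        (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
                            ? "write" : "read",
                        (unsigned long long)msg->arg.pagefault.address);
    case UFFD_EVENT_FORK:
        /* New descriptor for the child's duplicated userfault object;
         * here it is only reported, not used. */
        return snprintf(buf, len, "fork: child uffd %u",
                        (unsigned)msg->arg.fork.ufd);
    case UFFD_EVENT_REMAP:
        return snprintf(buf, len, "remap 0x%llx -> 0x%llx len 0x%llx",
                        (unsigned long long)msg->arg.remap.from,
                        (unsigned long long)msg->arg.remap.to,
                        (unsigned long long)msg->arg.remap.len);
    case UFFD_EVENT_REMOVE:
    case UFFD_EVENT_UNMAP:
        /* Both events report the affected range in arg.remove. */
        return snprintf(buf, len, "%s 0x%llx-0x%llx",
                        msg->event == UFFD_EVENT_REMOVE ? "remove" : "unmap",
                        (unsigned long long)msg->arg.remove.start,
                        (unsigned long long)msg->arg.remove.end);
    default:
        return snprintf(buf, len, "unknown event 0x%x", (unsigned)msg->event);
    }
}
```

In a real monitor, the descriptor taken from arg.fork.ufd would presumably be kept open and serviced like the parent's userfaultfd; the sketch only reports it, and the REMOVE/UNMAP case shows the two events sharing the remove member as the patch describes.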
>> +created for the child process,
>> +which allows the userfaultfd monitor to perform user-space paging
>> +for the child process.
>> +
>>  .\" FIXME elaborate about non-cooperating mode, describe its limitations
>>  .\" for kernels before 4.11, features added in 4.11
>>  .\" and limitations remaining in 4.11
>> @@ -144,6 +158,10 @@ Details of the various
>>  operations can be found in
>>  .BR ioctl_userfaultfd (2).
>>
>> +Since Linux 4.11, events other than page faults may be enabled during the
>> +.B UFFDIO_API
>> +operation.
>> +
>>  Up to Linux 4.11,
>>  userfaultfd can be used only with anonymous private memory mappings.
>>
>> @@ -156,7 +174,8 @@ Each
>>  .BR read (2)
>>  from the userfaultfd file descriptor returns one or more
>>  .I uffd_msg
>> -structures, each of which describes a page-fault event:
>> +structures, each of which describes a page-fault event
>> +or an event required for the non-cooperative userfaultfd usage:
>>
>>  .nf
>>  .in +4n
>> @@ -168,6 +187,23 @@ struct uffd_msg {
>>              __u64 flags;    /* Flags describing fault */
>>              __u64 address;  /* Faulting address */
>>          } pagefault;
>> +        struct {
>> +            __u32 ufd;      /* userfault file descriptor
>> +                               of the child process */
>> +        } fork;             /* since Linux 4.11 */
>> +        struct {
>> +            __u64 from;     /* old address of the
>> +                               remapped area */
>> +            __u64 to;       /* new address of the
>> +                               remapped area */
>> +            __u64 len;      /* original mapping length */
>> +        } remap;            /* since Linux 4.11 */
>> +        struct {
>> +            __u64 start;    /* start address of the
>> +                               removed area */
>> +            __u64 end;      /* end address of the
>> +                               removed area */
>> +        } remove;           /* since Linux 4.11 */
>>          ...
>>      } arg;
>>
>> @@ -194,14 +230,73 @@ structure are as follows:
>>  .TP
>>  .I event
>>  The type of event.
>> -Currently, only one value can appear in this field:
>> -.BR UFFD_EVENT_PAGEFAULT ,
>> -which indicates a page-fault event.
>> +Depending on the event type,
>> +different fields of the
>> +.I arg
>> +union represent details required for the event processing.
>> +The non-page-fault events are generated only when the appropriate feature
>> +is enabled during the API handshake with the
>> +.B UFFDIO_API
>> +.BR ioctl (2).
>> +
>> +The following values can appear in the
>> +.I event
>> +field:
>> +.RS
>> +.TP
>> +.B UFFD_EVENT_PAGEFAULT
>> +A page-fault event.
>> +The page-fault details are available in the
>> +.I pagefault
>> +field.
>>  .TP
>> -.I address
>> +.B UFFD_EVENT_FORK
>> +Generated when the faulting process invokes the
>> +.BR fork (2)
>> +system call.
>> +The event details are available in the
>> +.I fork
>> +field.
>> +.\" FIXME describe duplication of userfault file descriptor during fork
>> +.TP
>> +.B UFFD_EVENT_REMAP
>> +Generated when the faulting process invokes the
>> +.BR mremap (2)
>> +system call.
>> +The event details are available in the
>> +.I remap
>> +field.
>> +.TP
>> +.B UFFD_EVENT_REMOVE
>> +Generated when the faulting process invokes the
>> +.BR madvise (2)
>> +system call with
>> +.BR MADV_DONTNEED
>> +or
>> +.BR MADV_REMOVE
>> +advice.
>> +The event details are available in the
>> +.I remove
>> +field.
>> +.TP
>> +.B UFFD_EVENT_UNMAP
>> +Generated when the faulting process unmaps a memory range,
>> +either explicitly using the
>> +.BR munmap (2)
>> +system call or implicitly during
>> +.BR mmap (2)
>> +or
>> +.BR mremap (2)
>> +system calls.
>> +The event details are available in the
>> +.I remove
>> +field.
>> +.RE
>> +.TP
>> +.I pagefault.address
>>  The address that triggered the page fault.
>>  .TP
>> -.I flags
>> +.I pagefault.flags
>>  A bit mask of flags that describe the event.
>>  For
>>  .BR UFFD_EVENT_PAGEFAULT ,
>> @@ -218,6 +313,32 @@ otherwise it is a read fault.
>>  .\"
>>  .\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
>>  .RE
>> +.TP
>> +.I fork.ufd
>> +The file descriptor associated with the userfault object
>> +created for the child process.
>> +.TP
>> +.I remap.from
>> +The original address of the memory range that was remapped using
>> +.BR mremap (2).
>> +.TP
>> +.I remap.to
>> +The new address of the memory range that was remapped using
>> +.BR mremap (2).
>> +.TP
>> +.I remap.len
>> +The original length of the memory range that was remapped using
>> +.BR mremap (2).
>> +.TP
>> +.I remove.start
>> +The start address of the memory range that was freed using
>> +.BR madvise (2)
>> +or unmapped.
>> +.TP
>> +.I remove.end
>> +The end address of the memory range that was freed using
>> +.BR madvise (2)
>> +or unmapped.
>>  .PP
>>  A
>>  .BR read (2)
>
>Cheers,
>
>Michael