From: Mike Rapoport
To: "Michael Kerrisk (man-pages)"
Cc: Andrea Arcangeli, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linux-man@vger.kernel.org, Mike Rapoport
Subject: [PATCH man-pages 1/2] userfaultfd.2: start documenting non-cooperative events
Date: Thu, 27 Apr 2017 17:14:33 +0300
Message-Id: <1493302474-4701-2-git-send-email-rppt@linux.vnet.ibm.com>
In-Reply-To: <1493302474-4701-1-git-send-email-rppt@linux.vnet.ibm.com>
References: <1493302474-4701-1-git-send-email-rppt@linux.vnet.ibm.com>

Signed-off-by: Mike Rapoport
---
 man2/userfaultfd.2 | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 128 insertions(+), 7 deletions(-)

diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index cfea5cb..44af3e4 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -75,7 +75,7 @@ flag in
 .PP
 When the last file descriptor referring to a userfaultfd object is closed,
 all memory ranges that were registered with the object are unregistered
-and unread page-fault events are flushed.
+and unread events are flushed.
 .\"
 .SS Usage
 The userfaultfd mechanism is designed to allow a thread in a multithreaded
@@ -99,6 +99,20 @@ In such non-cooperative mode,
 the process that monitors userfaultfd and handles page faults
 needs to be aware of the changes in the virtual memory layout
 of the faulting process to avoid memory corruption.
+
+Starting from Linux 4.11,
+userfaultfd may notify the fault-handling threads about changes
+in the virtual memory layout of the faulting process.
+In addition, if the faulting process invokes the
+.BR fork (2)
+system call,
+the userfaultfd objects associated with the parent may be duplicated
+into the child process and the userfaultfd monitor will be notified
+about the file descriptor associated with the userfault objects
+created for the child process,
+which allows the userfaultfd monitor to perform user-space paging
+for the child process.
+
 .\" FIXME elaborate about non-cooperating mode, describe its limitations
 .\" for kernels before 4.11, features added in 4.11
 .\" and limitations remaining in 4.11
@@ -144,6 +158,10 @@ Details of the various operations can be found in
 .BR ioctl_userfaultfd (2).
 
 
+Since Linux 4.11, events other than page faults may be enabled during the
+.B UFFDIO_API
+operation.
+
 Up to Linux 4.11,
 userfaultfd can be used only with anonymous private memory mappings.
 
@@ -156,7 +174,8 @@ Each
 .BR read (2)
 from the userfaultfd file descriptor returns one or more
 .I uffd_msg
-structures, each of which describes a page-fault event:
+structures, each of which describes a page-fault event
+or an event required for non-cooperative userfaultfd usage:
 
 .nf
 .in +4n
@@ -168,6 +187,23 @@ struct uffd_msg {
             __u64 flags;    /* Flags describing fault */
             __u64 address;  /* Faulting address */
         } pagefault;
+        struct {
+            __u32 ufd;      /* userfault file descriptor
+                               of the child process */
+        } fork;             /* since Linux 4.11 */
+        struct {
+            __u64 from;     /* old address of the
+                               remapped area */
+            __u64 to;       /* new address of the
+                               remapped area */
+            __u64 len;      /* original mapping length */
+        } remap;            /* since Linux 4.11 */
+        struct {
+            __u64 start;    /* start address of the
+                               removed area */
+            __u64 end;      /* end address of the
+                               removed area */
+        } remove;           /* since Linux 4.11 */
         ...
     } arg;
 
@@ -194,14 +230,73 @@ structure are as follows:
 .TP
 .I event
 The type of event.
-Currently, only one value can appear in this field:
-.BR UFFD_EVENT_PAGEFAULT ,
-which indicates a page-fault event.
+Depending on the event type,
+different fields of the
+.I arg
+union represent the details required to process the event.
+Events other than page faults are generated only when the appropriate feature
+is enabled during the API handshake with the
+.B UFFDIO_API
+.BR ioctl (2).
+
+The following values can appear in the
+.I event
+field:
+.RS
+.TP
+.B UFFD_EVENT_PAGEFAULT
+A page-fault event.
+The page-fault details are available in the
+.I pagefault
+field.
 .TP
-.I address
+.B UFFD_EVENT_FORK
+Generated when the faulting process invokes the
+.BR fork (2)
+system call.
+The event details are available in the
+.I fork
+field.
+.\" FIXME describe duplication of userfault file descriptor during fork
+.TP
+.B UFFD_EVENT_REMAP
+Generated when the faulting process invokes the
+.BR mremap (2)
+system call.
+The event details are available in the
+.I remap
+field.
+.TP
+.B UFFD_EVENT_REMOVE
+Generated when the faulting process invokes the
+.BR madvise (2)
+system call with
+.B MADV_DONTNEED
+or
+.B MADV_REMOVE
+advice.
+The event details are available in the
+.I remove
+field.
+.TP
+.B UFFD_EVENT_UNMAP
+Generated when the faulting process unmaps a memory range,
+either explicitly using the
+.BR munmap (2)
+system call or implicitly during
+.BR mmap (2)
+or
+.BR mremap (2)
+system calls.
+The event details are available in the
+.I remove
+field.
+.RE
+.TP
+.I pagefault.address
 The address that triggered the page fault.
 .TP
-.I flags
+.I pagefault.flags
 A bit mask of flags that describe the event.
 For
 .BR UFFD_EVENT_PAGEFAULT ,
@@ -218,6 +313,32 @@ otherwise it is a read fault.
 .\"
 .\" UFFD_PAGEFAULT_FLAG_WP is not yet supported.
 .RE
+.TP
+.I fork.ufd
+The file descriptor associated with the userfault object
+created for the child process.
+.TP
+.I remap.from
+The original address of the memory range that was remapped using
+.BR mremap (2).
+.TP
+.I remap.to
+The new address of the memory range that was remapped using
+.BR mremap (2).
+.TP
+.I remap.len
+The original length of the memory range that was remapped using
+.BR mremap (2).
+.TP
+.I remove.start
+The start address of the memory range that was freed using
+.BR madvise (2)
+or unmapped.
+.TP
+.I remove.end
+The end address of the memory range that was freed using
+.BR madvise (2)
+or unmapped.
 .PP
 A
 .BR read (2)
-- 
1.9.1
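The event handling that the patch documents can be sketched in C. This is a minimal sketch, assuming the <linux/userfaultfd.h> UAPI header from Linux 4.11 or later (which provides struct uffd_msg, struct uffdio_api and the UFFD_FEATURE_EVENT_* bits); the helper names uffd_enable_events(), uffd_is_write_fault(), uffd_affected_range() and print_uffd_msg() are hypothetical, not part of the kernel or libc. It requests the non-cooperative events during the UFFDIO_API handshake and decodes the arg union according to the event field, including the detail that UFFD_EVENT_UNMAP reports its range in the remove member.

```c
#include <linux/userfaultfd.h> /* struct uffd_msg, struct uffdio_api, UFFD_* */
#include <stdbool.h>
#include <stdio.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper (not a kernel or libc API): perform the UFFDIO_API
 * handshake and ask for the non-cooperative events added in Linux 4.11.
 * Returns 0 on success, or -1 with errno set (for example, EBADF for an
 * invalid descriptor, or an error if a requested feature is unsupported).
 */
int uffd_enable_events(int uffd)
{
    struct uffdio_api api = {
        .api = UFFD_API,
        .features = UFFD_FEATURE_EVENT_FORK | UFFD_FEATURE_EVENT_REMAP |
                    UFFD_FEATURE_EVENT_REMOVE | UFFD_FEATURE_EVENT_UNMAP,
    };

    return ioctl(uffd, UFFDIO_API, &api);
}

/* Hypothetical helper: true only for a page-fault event caused by a write. */
bool uffd_is_write_fault(const struct uffd_msg *msg)
{
    return msg->event == UFFD_EVENT_PAGEFAULT &&
           (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE) != 0;
}

/*
 * Hypothetical helper: store in [*start, *end) the part of the old address
 * layout affected by a layout-change event; return false for events
 * (page faults, fork) that carry no such range.
 */
bool uffd_affected_range(const struct uffd_msg *msg, __u64 *start, __u64 *end)
{
    switch (msg->event) {
    case UFFD_EVENT_REMAP:
        /* [from, from + len) was moved to the address in remap.to. */
        *start = msg->arg.remap.from;
        *end = msg->arg.remap.from + msg->arg.remap.len;
        return true;
    case UFFD_EVENT_REMOVE:
    case UFFD_EVENT_UNMAP:
        /* Both events report their range in the 'remove' member. */
        *start = msg->arg.remove.start;
        *end = msg->arg.remove.end;
        return true;
    default:
        return false;
    }
}

/* Print one message, decoding the 'arg' union according to 'event'. */
void print_uffd_msg(const struct uffd_msg *msg)
{
    switch (msg->event) {
    case UFFD_EVENT_PAGEFAULT:
        printf("%s fault at 0x%llx\n",
               uffd_is_write_fault(msg) ? "write" : "read",
               (unsigned long long) msg->arg.pagefault.address);
        break;
    case UFFD_EVENT_FORK:
        printf("fork: child userfaultfd descriptor %u\n",
               (unsigned) msg->arg.fork.ufd);
        break;
    case UFFD_EVENT_REMAP:
        printf("remap: 0x%llx -> 0x%llx, length 0x%llx\n",
               (unsigned long long) msg->arg.remap.from,
               (unsigned long long) msg->arg.remap.to,
               (unsigned long long) msg->arg.remap.len);
        break;
    case UFFD_EVENT_REMOVE:
    case UFFD_EVENT_UNMAP:
        printf("%s: [0x%llx, 0x%llx)\n",
               msg->event == UFFD_EVENT_REMOVE ? "remove" : "unmap",
               (unsigned long long) msg->arg.remove.start,
               (unsigned long long) msg->arg.remove.end);
        break;
    default:
        printf("unknown event 0x%x\n", (unsigned) msg->event);
        break;
    }
}
```

In a monitor loop, each struct uffd_msg returned by read(2) on the descriptor would be passed to print_uffd_msg() or to handlers built on uffd_affected_range(); a UFFD_EVENT_REMAP message, for instance, tells the monitor that pages it tracked at remap.from now live at remap.to.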