Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756579AbdCTUGv (ORCPT );
        Mon, 20 Mar 2017 16:06:51 -0400
Received: from mail2.tiolive.com ([94.23.229.207]:53204 "EHLO
        mail2.tiolive.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1755674AbdCTUGu (ORCPT );
        Mon, 20 Mar 2017 16:06:50 -0400
Date: Mon, 20 Mar 2017 23:06:44 +0300
From: Kirill Smelkov
To: "Michael Kerrisk (man-pages)"
Cc: Nick Piggin, Andrew Morton, Randy Dunlap, Mark Fasheh,
        Linus Torvalds, Michel Lespinasse, linux-man, lkml
Subject: Re: Why mmap(MAP_POPULATE | MAP_NONBLOCK) is needed (Re: [patch]
        mmap.2: Add link to commit which broke MAP_POPULATE | MAP_NONBLOCK
        to be noop)
Message-ID: <20170320200644.jiers5dbg45nfh3y@deco.navytux.spb.ru>
References: <20170318194010.11639-1-kirr@nexedi.com>
        <20170320155948.pgpp2uhgoppicdl4@deco.navytux.spb.ru>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 7717
Lines: 238

Michael, first of all thanks for feedback.

On Mon, Mar 20, 2017 at 08:38:50PM +0100, Michael Kerrisk (man-pages) wrote:
> [CC += Michel Lespinasse]
>
> Kirill,
>
> I need some help here.
>
> On 20 March 2017 at 16:59, Kirill Smelkov wrote:
> > On Sat, Mar 18, 2017 at 10:40:10PM +0300, Kirill Smelkov wrote:
> >> Signed-off-by: Kirill Smelkov
> >> ---
> >>  man2/mmap.2 | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/man2/mmap.2 b/man2/mmap.2
> >> index 96875e486..f6fd56523 100644
> >> --- a/man2/mmap.2
> >> +++ b/man2/mmap.2
> >> @@ -300,6 +300,7 @@ Don't perform read-ahead:
> >>  create page tables entries only for pages
> >>  that are already present in RAM.
> >>  Since Linux 2.6.23, this flag causes
> >> +.\" commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7
> >>  .BR MAP_POPULATE
> >>  to do nothing.
> >>  One day, the combination of
> >
> > Please also find below benchmark which explains why
> >
> >     mmap(MAP_POPULATE | MAP_NONBLOCK)
> >
> > is actually needed.
>
> Okay -- clearly things have changed (but I received no man-pages
> patch).

Strange, it was sent. Let me show it once again here (git am -s):

---- 8< ----
From: Kirill Smelkov
Subject: [patch] mmap.2: Add link to commit which broke MAP_POPULATE | MAP_NONBLOCK to be noop

Signed-off-by: Kirill Smelkov
---
 man2/mmap.2 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/man2/mmap.2 b/man2/mmap.2
index 96875e486..f6fd56523 100644
--- a/man2/mmap.2
+++ b/man2/mmap.2
@@ -300,6 +300,7 @@ Don't perform read-ahead:
 create page tables entries only for pages
 that are already present in RAM.
 Since Linux 2.6.23, this flag causes
+.\" commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7
 .BR MAP_POPULATE
 to do nothing.
 One day, the combination of
--
2.11.0
---- 8< ----

> What do you believe the man page should now say.

What the man page says today correctly describes the current behaviour:

---- 8< ----
       MAP_NONBLOCK (since Linux 2.5.46)
              This flag is meaningful only in conjunction with
              MAP_POPULATE. Don't perform read-ahead: create page
              tables entries only for pages that are already present
              in RAM. Since Linux 2.6.23, this flag causes
              MAP_POPULATE to do nothing. One day, the combination of
              MAP_POPULATE and MAP_NONBLOCK may be reimplemented.
---- 8< ----

For now I've just added a reference to the commit corresponding to "Since
Linux 2.6.23, this flag causes MAP_POPULATE to do nothing."

> Or, perhaps we can ask Michel:
>
>     commit bebeb3d68b24bb4132d452c5707fe321208bcbcd
>     Author: Michel Lespinasse
>     Date:   Fri Feb 22 16:32:37 2013 -0800
>
> The above commit (which went into Linux 3.9) seems to be the source of
> the change.
>
> Michael, can you suggest to us what the mmap() man page should now say
> about MAP_POPULATE?

It is good to have feedback from the relevant people, but as my patch to
man-pages says, if I understand it correctly, the original patch which
changed the behaviour is this:

---- 8< ----
commit 54cb8821de07f2ffcd28c380ce9b93d5784b40d7
Author: Nick Piggin
Date:   Thu Jul 19 01:46:59 2007 -0700

    mm: merge populate and nopage into fault (fixes nonlinear)

    ...

    After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in    <-- NOTE here
    pagecache.  Seems like a fringe functionality anyway.

    ...

    [akpm@linux-foundation.org: cleanup]
    [randy.dunlap@oracle.com: doc. fixes for readahead]
    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Randy Dunlap
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
---- 8< ----

Adding all the people involved to Cc - please have a look at the quoted
benchmark below, which justifies the use of mmap(MAP_POPULATE | MAP_NONBLOCK).

Thanks,
Kirill

> > ---- 8< ---- (https://lab.nexedi.com/kirr/misc/blob/5a25f4ae/t_sysmmap_c.c)
> > /* This program benchmarks pagefault time.
> >  *
> >  * Unfortunately as of 2017-Mar-20 for data in pagecache the situation is as
> >  * follows (i7-6600U, Linux 4.9.13):
> >  *
> >  * 1. minor pagefault: ~ 1200ns
> >  *    (this program)
> >  *
> >  * 2. read syscall + whole page copy: ~ 215ns
> >  *    (https://github.com/golang/go/issues/19563#issuecomment-287423654)
> >  *
> >  * 3. it is not possible to mmap(MAP_POPULATE | MAP_NONBLOCK) (i.e. prefault
> >  *    those PTE that are already in pagecache).
> >  *    ( http://www.spinics.net/lists/linux-man/msg11420.html,
> >  *      https://git.kernel.org/linus/54cb8821de07f2ffcd28c380ce9b93d5784b40d7 )
> >  *
> >  * 4. (Q) I'm not sure a mechanism exists in the kernel to automatically
> >  *    subscribe a VMA so that when a page becomes pagecached, associated PTE is
> >  *    adjusted so that programs won't need to pay minor pagefault time on
> >  *    access.
> >  *
> >  * unless 3 and 4 are solved mmap unfortunately seems to be slower choice
> >  * compared to just pread.
> >  */
> > #define _GNU_SOURCE
> > // NOTE header names were lost in the archive; the list below is
> > // reconstructed from the calls used in this program (assumption).
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> > #include <sys/mman.h>
> > #include <sys/stat.h>
> > #include <sys/time.h>
> > #include <sys/types.h>
> > #include <sys/user.h>
> >
> > //            12345678
> > #define NITER   500000
> >
> > // microtime returns current time as double
> > double microtime() {
> >     int err;
> >     struct timeval tv;
> >
> >     err = gettimeofday(&tv, NULL);
> >     if (err == -1) {
> >         perror("gettimeofday");
> >         abort();
> >     }
> >
> >     return tv.tv_sec + 1E-6 * tv.tv_usec;
> > }
> >
> >
> > int main() {
> >     unsigned char *addr, sum = 0;
> >     int fd, err, i;
> >     size_t size;
> >     double Tstart, Tend;
> >
> >     fd = open("/dev/shm/y.dat", O_RDWR | O_CREAT | O_TRUNC, 0666);
> >     if (fd == -1) {
> >         perror("open");
> >         abort();
> >     }
> >
> >     size = NITER * PAGE_SIZE;
> >
> >     err = ftruncate(fd, size);
> >     if (err == -1) {
> >         perror("ftruncate");
> >         abort();
> >     }
> >
> > #if 1
> >     // make sure RAM is actually allocated
> >     Tstart = microtime();
> >     err = fallocate(fd, /*mode*/0, 0, size);
> >     Tend = microtime();
> >     if (err == -1) {
> >         perror("fallocate");
> >         abort();
> >     }
> >     printf("T(fallocate):\t%.1f\t%6.1f ns / page\n", Tend - Tstart, (Tend - Tstart) * 1E9 / NITER);
> > #endif
> >
> >     Tstart = microtime();
> >     addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
> >     //addr = mmap(NULL, size, PROT_READ, MAP_SHARED | MAP_POPULATE, fd, 0);
> >     //addr = mmap(NULL, size, PROT_READ, MAP_SHARED | MAP_POPULATE | MAP_NONBLOCK, fd, 0);
> >     if (addr == MAP_FAILED) {
> >         perror("mmap");
> >         abort();
> >     }
> >     Tend = microtime();
> >     printf("T(mmap):\t%.1f\t%6.1f ns / page\n", Tend - Tstart, (Tend - Tstart) * 1E9 / NITER);
> >
> >     Tstart = microtime();
> >     //for (int j=0; j < 100; j++)
> >     for (i=0; i<NITER; i++) {
> >         sum += addr[i*PAGE_SIZE];
> >     }
> >     Tend = microtime();
> >
> >     printf("T(pagefault):\t%.1f\t%6.1f ns / page\t(%i)\n", Tend - Tstart, (Tend - Tstart) * 1E9 / NITER, sum);
> >
> >     return 0;
> > }
> > ---- 8< ----
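
For comparison with item 2 in the program comment above, here is a minimal
sketch of what the pread side of the measurement could look like. It is not
the code behind the number in the linked Go issue. The names
pread_ns_per_page and now_sec are made up for illustration, and
posix_fallocate is used in place of fallocate only so that the sketch also
works on filesystems without native fallocate support. For the numbers to be
comparable with the mmap case, the path should presumably live on tmpfs, as
/dev/shm/y.dat does above. On a disk-backed filesystem the first pass may
also include the cost of filling the pagecache.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* now_sec returns the current wall-clock time in seconds (hypothetical helper). */
static double now_sec(void) {
    struct timeval tv;

    if (gettimeofday(&tv, NULL) == -1) {
        perror("gettimeofday");
        abort();
    }
    return tv.tv_sec + 1E-6 * tv.tv_usec;
}

/* pread_ns_per_page (hypothetical name) creates a file of npages pages at
 * path, reads it back one whole page at a time with pread, prints the timing
 * and returns the average time per page in nanoseconds, or -1.0 on error.
 * The file is removed again before returning. */
double pread_ns_per_page(const char *path, size_t npages) {
    long psz = sysconf(_SC_PAGESIZE);
    unsigned char *buf, sum = 0;
    double t0, t1, ns = -1.0;
    size_t i;
    int fd;

    assert(npages > 0);
    if (psz <= 0)
        return -1.0;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return -1.0;

    buf = malloc((size_t)psz);
    if (buf == NULL)
        goto out;

    /* make sure backing storage is actually allocated (no holes) */
    if (posix_fallocate(fd, 0, (off_t)(npages * (size_t)psz)) != 0)
        goto out;

    t0 = now_sec();
    for (i = 0; i < npages; i++) {
        /* one syscall plus one whole-page copy per iteration */
        if (pread(fd, buf, (size_t)psz, (off_t)(i * (size_t)psz)) != (ssize_t)psz)
            goto out;
        sum += buf[0];
    }
    t1 = now_sec();

    ns = (t1 - t0) * 1E9 / (double)npages;
    printf("T(pread):\t%.1f\t%6.1f ns / page\t(%i)\n", t1 - t0, ns, sum);

out:
    free(buf);
    close(fd);
    unlink(path);
    return ns;
}
```

Calling pread_ns_per_page("/dev/shm/y.dat", 500000) from main() would mirror
the file and page count used by the mmap benchmark, so the two "ns / page"
columns can be read side by side.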