Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1755168AbbHXRAb (ORCPT ); Mon, 24 Aug 2015 13:00:31 -0400
Received: from a23-79-238-175.deploy.static.akamaitechnologies.com
    ([23.79.238.175]:55049 "EHLO prod-mail-xrelay07.akamai.com"
    rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP
    id S1753180AbbHXRA3 (ORCPT ); Mon, 24 Aug 2015 13:00:29 -0400
Date: Mon, 24 Aug 2015 13:00:28 -0400
From: Eric B Munson
To: Konstantin Khlebnikov
Cc: Vlastimil Babka, Michal Hocko, Andrew Morton, Jonathan Corbet,
    "Kirill A. Shutemov", Linux Kernel Mailing List, dri-devel,
    "linux-mm@kvack.org", Linux API
Subject: Re: [PATCH v7 3/6] mm: Introduce VM_LOCKONFAULT
Message-ID: <20150824170028.GC17005@akamai.com>
References: <20150821072552.GF23723@dhcp22.suse.cz>
    <20150821183132.GA12835@akamai.com>
    <55DB1C77.8070705@suse.cz>
    <55DB29EB.1000308@suse.cz>
    <20150824150912.GA17005@akamai.com>
    <20150824155503.GB17005@akamai.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
    protocol="application/pgp-signature"; boundary="zCKi3GIZzVBPywwA"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 5596
Lines: 148

--zCKi3GIZzVBPywwA
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:

> On Mon, Aug 24, 2015 at 6:55 PM, Eric B Munson wrote:
> > On Mon, 24 Aug 2015, Konstantin Khlebnikov wrote:
> >
> >> On Mon, Aug 24, 2015 at 6:09 PM, Eric B Munson wrote:
> >> > On Mon, 24 Aug 2015, Vlastimil Babka wrote:
> >> >
> >> >> On 08/24/2015 03:50 PM, Konstantin Khlebnikov wrote:
> >> >> >On Mon, Aug 24, 2015 at 4:30 PM, Vlastimil Babka wrote:
> >> >> >>On 08/24/2015 12:17 PM, Konstantin Khlebnikov wrote:
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>I am in the middle of implementing lock on fault
> >> >> >>>>this way, but I cannot
> >> >> >>>>see how we will handle mremap of a lock on fault region. Say we have
> >> >> >>>>the following:
> >> >> >>>>
> >> >> >>>>    addr = mmap(len, MAP_ANONYMOUS, ...);
> >> >> >>>>    mlock(addr, len, MLOCK_ONFAULT);
> >> >> >>>>    ...
> >> >> >>>>    mremap(addr, len, 2 * len, ...)
> >> >> >>>>
> >> >> >>>>There is no way for mremap to know that the area being remapped was lock
> >> >> >>>>on fault so it will be locked and prefaulted by remap. How can we avoid
> >> >> >>>>this without tracking per vma if it was locked with lock or lock on
> >> >> >>>>fault?
> >> >> >>>
> >> >> >>>
> >> >> >>>remap can count filled ptes and prefault only completely populated areas.
> >> >> >>
> >> >> >>
> >> >> >>Does (and should) mremap really prefault non-present pages? Shouldn't it
> >> >> >>just prepare the page tables and that's it?
> >> >> >
> >> >> >As I see mremap prefaults pages when it extends mlocked area.
> >> >> >
> >> >> >Also quote from manpage
> >> >> >: If the memory segment specified by old_address and old_size is locked
> >> >> >: (using mlock(2) or similar), then this lock is maintained when the segment is
> >> >> >: resized and/or relocated. As a consequence, the amount of memory locked
> >> >> >: by the process may change.
> >> >>
> >> >> Oh, right... Well that looks like a convincing argument for having a
> >> >> sticky VM_LOCKONFAULT after all. Having mremap guess by scanning
> >> >> existing pte's would slow it down, and be unreliable (was the area
> >> >> completely populated because MLOCK_ONFAULT was not used or because
> >> >> the process faulted it already? Was it not populated because
> >> >> MLOCK_ONFAULT was used, or because mmap(MAP_LOCKED) failed to
> >> >> populate it all?).
> >> >
> >> > Given this, I am going to stop working on v8 and leave the vma flag in
> >> > place.
> >> >
> >> >>
> >> >> The only sane alternative is to populate always for mremap() of
> >> >> VM_LOCKED areas, and document this loss of MLOCK_ONFAULT information
> >> >> as a limitation of mlock2(MLOCK_ONFAULT). Which might or might not
> >> >> be enough for Eric's usecase, but it's somewhat ugly.
> >> >>
> >> >
> >> > I don't think that this is the right solution, I would be really
> >> > surprised as a user if an area I locked with MLOCK_ONFAULT was then
> >> > fully locked and prepopulated after mremap().
> >>
> >> If mremap is the only problem then we can add opposite flag for it:
> >>
> >> "MREMAP_NOPOPULATE"
> >> - do not populate new segment of locked areas
> >> - do not copy normal areas if possible (anonymous/special must be copied)
> >>
> >>     addr = mmap(len, MAP_ANONYMOUS, ...);
> >>     mlock(addr, len, MLOCK_ONFAULT);
> >>     ...
> >>     addr2 = mremap(addr, len, 2 * len, MREMAP_NOPOPULATE);
> >>     ...
> >>
> >
> > But with this, the user must remember what areas are locked with
> > MLOCK_ONFAULT and which are locked with prepopulate so the
> > correct mremap flags can be used.
> >
>
> Yep. Shouldn't be hard. You anyway have to do some changes in user-space.
>

Sorry if I wasn't clear enough in my last reply, I think forcing
userspace to track this is the wrong choice. The VM system is
responsible for tracking these attributes and should continue to be.

>
> Much simpler for user-space solution is a mm-wide flag which turns all further
> mlocks and MAP_LOCKED into lock-on-fault. Something like
> mlockall(MCL_NOPOPULATE_LOCKED).

This set certainly adds the foundation for such a change if you think it
would be useful. That particular behavior was not part of my initial use
case though.
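
The mremap() case discussed above can be tried from userspace. What
follows is only a minimal sketch under stated assumptions, not code from
this patch set: it assumes the mlock2() system call and the MLOCK_ONFAULT
flag in the form later merged in Linux 4.4, invoked through syscall(2) so
that no libc wrapper is required. The helper names lock_on_fault(),
resident_pages() and grow_lock_on_fault_region() are hypothetical and
exist only for this illustration. The sketch maps a region, requests lock
on fault, touches the first page, doubles the mapping with mremap(), and
reports how many pages mincore(2) sees as resident before and after.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01 /* assumption: value of the flag merged in Linux 4.4 */
#endif

/* Hypothetical helper: request lock-on-fault through the raw syscall. */
int lock_on_fault(void *addr, size_t len)
{
#ifdef SYS_mlock2
    return (int)syscall(SYS_mlock2, addr, len, MLOCK_ONFAULT);
#else
    (void)addr;
    (void)len;
    errno = ENOSYS; /* headers too old to know about mlock2 */
    return -1;
#endif
}

/* Hypothetical helper: count resident pages via mincore(2); -1 on error. */
long resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t n = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(n);
    long count = 0;

    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        count += vec[i] & 1; /* bit 0 set means the page is resident */
    free(vec);
    return count;
}

/*
 * Hypothetical walk-through of the scenario in the thread. Maps npages,
 * asks for lock-on-fault, touches the first page, then doubles the
 * mapping with mremap(). Stores the resident page counts seen before and
 * after the resize. A failing mlock2 (old kernel, RLIMIT_MEMLOCK) is not
 * fatal; it is reported through *locked = 0. Returns -1 only if mmap or
 * mremap fails.
 */
int grow_lock_on_fault_region(size_t npages, int *locked,
                              long *before, long *after)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = npages * (size_t)page;

    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return -1;

    *locked = (lock_on_fault(addr, len) == 0);
    addr[0] = 1; /* touch the first page so it is faulted in */
    *before = resident_pages(addr, len);

    char *grown = mremap(addr, len, 2 * len, MREMAP_MAYMOVE);
    if (grown == MAP_FAILED) {
        munmap(addr, len);
        return -1;
    }
    *after = resident_pages(grown, 2 * len);
    munmap(grown, 2 * len);
    return 0;
}
```

If the lock-on-fault state is tracked per vma, one would expect the count
after the resize to stay close to the count before it, since the new half
has not been touched. A kernel that prefaults the whole locked area on
mremap() would instead report every page as resident. If mlock2() is
unavailable or RLIMIT_MEMLOCK is too small, *locked is left at 0 and the
counts only reflect ordinary demand paging.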
--zCKi3GIZzVBPywwA
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: Digital signature

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBAgAGBQJV202sAAoJELbVsDOpoOa9Oi8P/jYlOtUrXtvvqK5Oiozpis6Z
zb7mLmaiWxr61uIe/ljCdtWpWK8XHLSsM7xx5IKGQAkhgjmiQSdeDVIh5iWU/9mY
6Vz/XYBmLDuC7cq8N+ZAJbv7LDhQOFma1Kp86YA1nVHyZhayEFOxZqUbwAkhMJKE
Qx1+qKNs/7W1cA21iYDYV2Zn/Uopjxx2bhdR6uOAnEFC/FdnXy33J9M4ArJHVpLO
YLsg9ufYtM3vpJObGTHRASyQ0NLMADzmLB6w5U+F8g2dWHzJjIP+kHPTDLda1HC0
x5edQgqjAV/TQ6DBsVcms+GYXLkYsEM8wCunvHqOSCrNjyk8yiF4rnZm55CG/WcR
d9aP0KH5iwgSTqWvl9WLclf2MWX84AetDHWfnA0KF6Q7eYRPbQXccTqUNLFdwQBg
6eYKEKaqbuK0bBts4kJlLRZGN5paAjgFLCB3njxPYzMqBhHaU3skQsYY/v6Xa/9D
9tsrpTNQqhaY2j2eZQeek5oJYTpGPdGagGd5AoLZTtIfzFhFTyFl62mwsXMVOKZF
n20DxV41TFrRMUe+RkFhzyvApjyZpgeQBNlCJArYLUrNZvUN67H72GXcHNtYJUAx
DfJteZBYCyq6tOV4DaEYBiWOn3P2KrIHZpBDLCIQMwWTwiKPDZWA1Rtv+vPfP51q
AGupr2+rybcpunEOKHx/
=Y0HY
-----END PGP SIGNATURE-----

--zCKi3GIZzVBPywwA--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/