Date: Mon, 11 Jun 2018 09:20:05 +0200
From: Michal Hocko
To: Jason Baron
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Vlastimil Babka, Joonsoo Kim, Mel Gorman, "Kirill A. Shutemov",
    linux-api@vger.kernel.org
Subject: Re: [PATCH] mm/madvise: allow MADV_DONTNEED to free memory that is MLOCK_ONFAULT
Message-ID: <20180611072005.GC13364@dhcp22.suse.cz>
References: <1528484212-7199-1-git-send-email-jbaron@akamai.com>
In-Reply-To: <1528484212-7199-1-git-send-email-jbaron@akamai.com>

[CCing linux-api - please make sure to CC this mailing list any time you
are touching user-visible APIs]

On Fri 08-06-18 14:56:52, Jason Baron wrote:
> In order to free memory that is marked MLOCK_ONFAULT, the memory region
> needs to be first unlocked, before calling MADV_DONTNEED. And if the region
> is to be reused as MLOCK_ONFAULT, we require another call to mlock2() with
> the MLOCK_ONFAULT flag.
>
> Let's simplify freeing memory that is set MLOCK_ONFAULT, by allowing
> MADV_DONTNEED to work directly for memory that is set MLOCK_ONFAULT.

I do not understand the point here. How is MLOCK_ONFAULT any different
from the regular mlock here? If you want to free mlocked memory then
fine, but the behavior should be consistent. MLOCK_ONFAULT is just a way
to say that we do not want to pre-populate the mlocked area and would
rather do that lazily at page fault time. madvise should not make any
difference here.

That being said, we have never allowed MADV_DONTNEED on VM_LOCKED
mappings. I do not really see why, but lifting that would be a
user-visible change. Can we do that? What was the original motivation
for the exclusion?

[keeping the rest of email for linux-api]

> The
> locked memory limits, tracked by mm->locked_vm do not need to be adjusted
> in this case, since they were charged to the entire region when
> MLOCK_ONFAULT was initially set.
>
> Further, I don't think allowing MADV_FREE for MLOCK_ONFAULT regions makes
> sense, since the point of MLOCK_ONFAULT is for userspace to know when pages
> are locked in memory and thus to know when page faults will occur.
>
> Signed-off-by: Jason Baron
> Cc: Andrew Morton
> Cc: Michal Hocko
> Cc: Vlastimil Babka
> Cc: Joonsoo Kim
> Cc: Mel Gorman
> Cc: Kirill A. Shutemov
> ---
>  mm/internal.h | 18 ++++++++++++++++++
>  mm/madvise.c  |  4 ++--
>  mm/oom_kill.c |  2 +-
>  3 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 9e3654d..16c0041 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -15,6 +15,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  /*
>   * The set of flags that only affect watermark checking and reclaim
> @@ -45,9 +46,26 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
>
>  static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma)
>  {
> +	return !(((vma->vm_flags & (VM_LOCKED|VM_LOCKONFAULT)) == VM_LOCKED) ||
> +		(vma->vm_flags & (VM_HUGETLB|VM_PFNMAP)));
> +}
> +
> +static inline bool can_madv_free_vma(struct vm_area_struct *vma)
> +{
>  	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
>  }
>
> +static inline bool can_madv_dontneed_or_free_vma(struct vm_area_struct *vma,
> +						 int behavior)
> +{
> +	if (behavior == MADV_DONTNEED)
> +		return can_madv_dontneed_vma(vma);
> +	else if (behavior == MADV_FREE)
> +		return can_madv_free_vma(vma);
> +	else
> +		return 0;
> +}
> +
>  void unmap_page_range(struct mmu_gather *tlb,
>  			     struct vm_area_struct *vma,
>  			     unsigned long addr, unsigned long end,
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d3c922..61ff306 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -517,7 +517,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>  				  int behavior)
>  {
>  	*prev = vma;
> -	if (!can_madv_dontneed_vma(vma))
> +	if (!can_madv_dontneed_or_free_vma(vma, behavior))
>  		return -EINVAL;
>
>  	if (!userfaultfd_remove(vma, start, end)) {
> @@ -539,7 +539,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>  			 */
>  			return -ENOMEM;
>  		}
> -		if (!can_madv_dontneed_vma(vma))
> +		if (!can_madv_dontneed_or_free_vma(vma, behavior))
>  			return -EINVAL;
>  		if (end > vma->vm_end) {
>  			/*
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 8ba6cb8..9817d15 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -492,7 +492,7 @@ void __oom_reap_task_mm(struct mm_struct *mm)
>  	set_bit(MMF_UNSTABLE, &mm->flags);
>
>  	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
> -		if (!can_madv_dontneed_vma(vma))
> +		if (!can_madv_free_vma(vma))
>  			continue;
>
>  		/*
> --
> 2.7.4
>

-- 
Michal Hocko
SUSE Labs
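
To make the consistency question concrete, here is a minimal userspace
model of the two flag checks from the quoted patch. It is only a sketch:
the VM_* bit values below are placeholders chosen for this example, not
the values from the kernel headers, and old_can_madv_dontneed(),
new_can_madv_dontneed() and self_check() are hypothetical names that
mirror the pre-patch and patched can_madv_dontneed_vma().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Placeholder bit values for this model only. They are NOT copied from
 * the kernel headers; only their distinctness matters here.
 */
#define VM_PFNMAP      0x1UL
#define VM_LOCKED      0x2UL
#define VM_LOCKONFAULT 0x4UL
#define VM_HUGETLB     0x8UL

/* The check as it reads before the quoted patch. */
static bool old_can_madv_dontneed(unsigned long vm_flags)
{
	return !(vm_flags & (VM_LOCKED | VM_HUGETLB | VM_PFNMAP));
}

/* The check the quoted patch proposes for MADV_DONTNEED. */
static bool new_can_madv_dontneed(unsigned long vm_flags)
{
	return !(((vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) == VM_LOCKED) ||
		 (vm_flags & (VM_HUGETLB | VM_PFNMAP)));
}

static void self_check(void)
{
	/* Regular mlock(): refused both before and after the patch. */
	assert(!old_can_madv_dontneed(VM_LOCKED));
	assert(!new_can_madv_dontneed(VM_LOCKED));

	/* mlock2(MLOCK_ONFAULT): refused before, allowed after. */
	assert(!old_can_madv_dontneed(VM_LOCKED | VM_LOCKONFAULT));
	assert(new_can_madv_dontneed(VM_LOCKED | VM_LOCKONFAULT));

	/* hugetlb and PFN mappings stay refused regardless of locking. */
	assert(!new_can_madv_dontneed(VM_LOCKED | VM_LOCKONFAULT | VM_HUGETLB));
	assert(!new_can_madv_dontneed(VM_PFNMAP));
}
```

In this model the only input whose result changes is
VM_LOCKED|VM_LOCKONFAULT, while plain VM_LOCKED is still refused. That
is the asymmetry between the two kinds of mlocked mapping being asked
about above.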