From: Johannes Weiner
To: Andrew Morton
Cc: Michal Hocko, Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
Date: Thu, 3 Mar 2022 16:29:56 -0500
Message-Id: <20220303212956.229409-1-hannes@cmpxchg.org>
X-Mailer: git-send-email 2.35.1
X-Mailing-List: linux-kernel@vger.kernel.org

MADV_DONTNEED historically rejects mlocked ranges, but with
MLOCK_ONFAULT and MCL_ONFAULT allowing to mlock without populating,
there are valid use cases for depopulating locked ranges as well.

Users mlock memory to protect secrets. There are allocators for secure
buffers that want in-use memory generally mlocked, but cleared and
invalidated memory to give up the physical pages. This could be done
with explicit munlock -> mlock calls on free -> alloc of course, but
that adds two unnecessary syscalls, heavy mmap_sem write locks, vma
splits and re-merges - only to get rid of the backing pages.

Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
okay with on-demand initial population. It seems valid to selectively
free some memory during the lifetime of such a process, without having
to mess with its overall policy.

Why add a separate flag? Isn't this a pretty niche usecase?

- MADV_DONTNEED has been bailing on locked vmas forever. It's at least
  conceivable that someone, somewhere is relying on mlock to protect
  data from perhaps broader invalidation calls. Changing this behavior
  now could lead to quiet data corruption.

- It also clarifies expectations around MADV_FREE and maybe
  MADV_REMOVE. It avoids the situation where one quietly behaves
  different than the others. MADV_FREE_LOCKED can be added later.

- The combination of mlock() and madvise() in the first place is
  probably niche. But where it happens, I'd say that dropping pages
  from a locked region once they don't contain secrets or won't page
  anymore is much saner than relying on mlock to protect memory from
  speculative or errant invalidation calls. It's just that we can't
  change the default behavior because of the two previous points.

Given that, an explicit new flag seems to make the most sense.

Signed-off-by: Johannes Weiner
---
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/madvise.c                           | 16 +++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 1567a3294c3d..6c1aa92a92e4 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -75,6 +75,8 @@
 #define MADV_POPULATE_READ	22	/* populate (prefault) page tables readable */
 #define MADV_POPULATE_WRITE	23	/* populate (prefault) page tables writable */
 
+#define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 5604064df464..12dfa14bc985 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -800,6 +800,13 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
 	return 0;
 }
 
+static bool can_madv_dontneed_free(struct vm_area_struct *vma, int behavior)
+{
+	if (behavior == MADV_DONTNEED_LOCKED)
+		return !(vma->vm_flags & (VM_HUGETLB|VM_PFNMAP));
+	return can_madv_lru_vma(vma);
+}
+
 static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  struct vm_area_struct **prev,
 				  unsigned long start, unsigned long end,
@@ -808,7 +815,8 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 
 	*prev = vma;
-	if (!can_madv_lru_vma(vma))
+
+	if (!can_madv_dontneed_free(vma, behavior))
 		return -EINVAL;
 
 	if (!userfaultfd_remove(vma, start, end)) {
@@ -830,7 +838,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 			 */
 			return -ENOMEM;
 		}
-		if (!can_madv_lru_vma(vma))
+		if (!can_madv_dontneed_free(vma, behavior))
 			return -EINVAL;
 		if (end > vma->vm_end) {
 			/*
@@ -850,7 +858,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 		VM_WARN_ON(start >= end);
 	}
 
-	if (behavior == MADV_DONTNEED)
+	if (behavior == MADV_DONTNEED || behavior == MADV_DONTNEED_LOCKED)
 		return madvise_dontneed_single_vma(vma, start, end);
 	else if (behavior == MADV_FREE)
 		return madvise_free_single_vma(vma, start, end);
@@ -988,6 +996,7 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 		return madvise_pageout(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
+	case MADV_DONTNEED_LOCKED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
@@ -1113,6 +1122,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_DONTNEED_LOCKED:
 	case MADV_FREE:
 	case MADV_COLD:
 	case MADV_PAGEOUT:
-- 
2.35.1
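
For illustration only, and not part of the patch: a rough userspace
sketch of the secure-buffer case from the changelog. The helper names
secure_buf_alloc() and secure_buf_release() are hypothetical, and the
fallback assumes that a kernel without this patch rejects the unknown
advice value with EINVAL. The fallback path is the munlock -> mlock
round trip that the new flag is meant to avoid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_DONTNEED_LOCKED
#define MADV_DONTNEED_LOCKED 24	/* value taken from the uapi hunk in this patch */
#endif

/*
 * Hypothetical helper: map anonymous memory and lock it without
 * populating it up front (MLOCK_ONFAULT). Returns NULL on failure.
 */
static void *secure_buf_alloc(size_t len)
{
	void *buf;

	assert(len > 0);
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;
	if (mlock2(buf, len, MLOCK_ONFAULT) != 0) {
		munmap(buf, len);
		return NULL;
	}
	return buf;
}

/*
 * Hypothetical helper: scrub a locked buffer and give its physical
 * pages back while keeping the range locked. Returns 0 on success,
 * -1 with errno set on failure.
 */
static int secure_buf_release(void *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	/* Clear the secret before the pages are handed back. */
	explicit_bzero(buf, len);

	/* One syscall on kernels that know the new advice value. */
	if (madvise(buf, len, MADV_DONTNEED_LOCKED) == 0)
		return 0;
	if (errno != EINVAL)
		return -1;

	/* Assumed fallback for older kernels: unlock, drop, re-lock. */
	if (munlock(buf, len) != 0)
		return -1;
	if (madvise(buf, len, MADV_DONTNEED) != 0)
		return -1;
	return mlock2(buf, len, MLOCK_ONFAULT);
}
```

An allocator would call secure_buf_release() from its free path and
hand the same range out again later; the first touch after release
faults in a fresh zero page that is locked on fault.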