Date: Fri, 4 Mar 2022 10:12:33 +0100
From: Michal Hocko
To: Johannes Weiner
Cc: Andrew Morton, Vlastimil Babka, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: madvise: MADV_DONTNEED_LOCKED
References: <20220303212956.229409-1-hannes@cmpxchg.org>

[please CC linux-api if you are going to repost with the fix suggested by Nadav]

On Thu 03-03-22 16:47:34, Johannes Weiner wrote:
> On Thu, Mar 03, 2022 at 04:29:56PM -0500, Johannes Weiner wrote:
> > MADV_DONTNEED historically rejects mlocked ranges, but with
> > MLOCK_ONFAULT and MCL_ONFAULT allowing memory to be locked without
> > populating it, there are valid use cases for depopulating locked
> > ranges as well.
> >
> > Users mlock memory to protect secrets. There are allocators for secure
> > buffers that want in-use memory generally mlocked, but want cleared and
> > invalidated memory to give up its physical pages. This could be done
> > with explicit munlock -> mlock calls on free -> alloc, of course, but
> > that adds two unnecessary syscalls, heavy mmap_sem write locks, and vma
> > splits and re-merges, only to get rid of the backing pages.
> >
> > Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
> > okay with on-demand initial population. It seems valid to selectively
> > free some memory during the lifetime of such a process, without having
> > to mess with its overall policy.
> >
> > Why add a separate flag?
> > Isn't this a pretty niche use case?
> >
> > - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
> >   conceivable that someone, somewhere is relying on mlock to protect
> >   data from perhaps broader invalidation calls. Changing this behavior
> >   now could lead to quiet data corruption.
> >
> > - It also clarifies expectations around MADV_FREE and maybe
> >   MADV_REMOVE. It avoids the situation where one quietly behaves
> >   differently from the others. MADV_FREE_LOCKED can be added later.
> >
> > - The combination of mlock() and madvise() in the first place is
> >   probably niche. But where it happens, I'd say that dropping pages
> >   from a locked region once they don't contain secrets or won't page
> >   anymore is much saner than relying on mlock to protect memory from
> >   speculative or errant invalidation calls. It's just that we can't
> >   change the default behavior because of the two previous points.
> >
> > Given that, an explicit new flag seems to make the most sense.
> >
> > Signed-off-by: Johannes Weiner
>
> Just for context, I found this discussion back from 2018:
>
> https://lkml.iu.edu/hypermail/linux/kernel/1806.1/00483.html
>
> It seems to me that the use case wasn't really in question, but people
> weren't sure about the API, and then Jason found a workaround before
> the discussion really concluded. I was asked internally about this
> feature, so I'm submitting another patch in this direction, but with
> more thoughts on why I chose to go with a new flag. Hopefully we can
> work it out this time around :-)

Thanks for the link. The topic sounded familiar, but I couldn't remember
any of the details anymore. Now I do remember that I wasn't happy about
special-casing MLOCK_ONFAULT. A dedicated madvise operation is definitely
safer and I am OK with that. The presented use cases make sense to me as
well.

Btw. I have a recollection that Mike is working on MADV_DONTNEED support
for hugetlb pages. I do not know the current state of that work.
Not that it would have any impact on your new flag, but some minor
changes might be needed.

Anyway, once the madvise_need_mmap_write issue is addressed, feel free to add
Acked-by: Michal Hocko

Thanks!
--
Michal Hocko
SUSE Labs
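The secure-buffer use case from the quoted changelog can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the patch's code.

It assumes:
- Linux with glibc's mlock2() and explicit_bzero() wrappers.
- MADV_DONTNEED_LOCKED is the flag proposed in this thread. It is guarded with #ifdef because older headers may not define it.
- The secbuf_* helper names are hypothetical.

The fallback branch is the munlock -> madvise -> mlock sequence that the changelog describes as the costly workaround.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round n up to whole pages, since madvise() and
 * mlock2() operate on page-granular ranges. */
size_t secbuf_round(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

/* Hypothetical helper: map a private anonymous buffer and lock it with
 * MLOCK_ONFAULT, so pages are locked as they are first touched instead of
 * being populated up front. Returns NULL if mmap() or mlock2() fails
 * (for example because of RLIMIT_MEMLOCK). */
void *secbuf_map(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mlock2(p, len, MLOCK_ONFAULT) != 0) {
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Hypothetical helper: wipe a buffer that no longer holds secrets and give
 * its physical pages back while the range stays locked-on-fault.
 * Returns 0 on success, -1 on failure. */
int secbuf_discard(void *p, size_t len)
{
    explicit_bzero(p, len);
#ifdef MADV_DONTNEED_LOCKED
    /* Assumed semantics of the proposed flag: drop pages even though the
     * vma is mlocked. */
    if (madvise(p, len, MADV_DONTNEED_LOCKED) == 0)
        return 0;
    if (errno != EINVAL) /* EINVAL: the running kernel lacks the flag */
        return -1;
#endif
    /* Fallback: plain MADV_DONTNEED fails with EINVAL on mlocked ranges,
     * so unlock, drop and relock. That costs extra syscalls and mmap_sem
     * write locks. If madvise() fails here, the range is left unlocked. */
    if (munlock(p, len) != 0)
        return -1;
    if (madvise(p, len, MADV_DONTNEED) != 0)
        return -1;
    return mlock2(p, len, MLOCK_ONFAULT);
}
```

The intended call order is:
1. Size the buffer with secbuf_round().
2. Get it from secbuf_map() on alloc.
3. Call secbuf_discard() on free.

Because the vma keeps its lock-on-fault policy, the next write faults in a fresh zero page that is locked again. No further mlock() call is needed on the MADV_DONTNEED_LOCKED path.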