Date: Mon, 14 Feb 2022 11:07:01 +0100
Subject: Re: [PATCH v2 01/13] mm/munlock: delete page_mlock() and all its works
To: Hugh Dickins, Andrew Morton
Cc: Michal Hocko, "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand, Alistair Popple, Johannes Weiner, Rik van Riel, Suren Baghdasaryan, Yu Zhao, Greg Thelen, Shakeel Butt, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com> <5ed1f01-3e7e-7e26-cc1-2b7a574e2147@google.com> <4a5bc989-e59a-d421-faf4-8156f700ec99@suse.cz> <957e2ea6-d01e-256f-51a0-d927a93b50a5@suse.cz>
From: Vlastimil Babka

On 2/14/22 07:59, Hugh Dickins wrote:
> We have recommended some applications to mlock their userspace, but that
> turns out to be counter-productive: when many processes mlock the same
> file, contention on rmap's i_mmap_rwsem can become intolerable at exit: it
> is needed for write, to remove any vma mapping that file from rmap's tree;
> but hogged for read by those with mlocks calling page_mlock() (formerly
> known as try_to_munlock()) on *each* page mapped from the file (the
> purpose being to find out whether another process has the page mlocked,
> so therefore it should not be unmlocked yet).
>
> Several optimizations have been made in the past: one is to skip
> page_mlock() when mapcount tells that nothing else has this page
> mapped; but that doesn't help at all when others do have it mapped.
> This time around, I initially intended to add a preliminary search
> of the rmap tree for overlapping VM_LOCKED ranges; but that gets
> messy with locking order, when in doubt whether a page is actually
> present; and risks adding even more contention on the i_mmap_rwsem.
>
> A solution would be much easier, if only there were space in struct page
> for an mlock_count... but actually, most of the time, there is space for
> it - an mlocked page spends most of its life on an unevictable LRU, but
> since 3.18 removed the scan_unevictable_pages sysctl, that "LRU" has
> been redundant. Let's try to reuse its page->lru.
>
> But leave that until a later patch: in this patch, clear the ground by
> removing page_mlock(), and all the infrastructure that has gathered
> around it - which mostly hinders understanding, and will make reviewing
> new additions harder. Don't mind those old comments about THPs, they
> date from before 4.5's refcounting rework: splitting is not a risk here.
>
> Just keep a minimal version of munlock_vma_page(), as reminder of what it
> should attend to (in particular, the odd way PGSTRANDED is counted out of
> PGMUNLOCKED), and likewise a stub for munlock_vma_pages_range(). Move
> unchanged __mlock_posix_error_return() out of the way, down to above its
> caller: this series then makes no further change after mlock_fixup().
>
> After this and each following commit, the kernel builds, boots and runs;
> but with deficiencies which may show up in testing of mlock and munlock.
> The system calls succeed or fail as before, and mlock remains effective
> in preventing page reclaim; but meminfo's Unevictable and Mlocked amounts
> may be shown too low after mlock, grow, then stay too high after munlock:
> with previously mlocked pages remaining unevictable for too long, until
> finally unmapped and freed and counts corrected. Normal service will be
> resumed in "mm/munlock: mlock_pte_range() when mlocking or munlocking".

Great!

> Signed-off-by: Hugh Dickins

Acked-by: Vlastimil Babka