Message-ID: <4a5bc989-e59a-d421-faf4-8156f700ec99@suse.cz>
Date: Wed, 9 Feb 2022 19:31:27 +0100
To: Hugh Dickins, Andrew Morton
Cc: Michal Hocko, "Kirill A. Shutemov", Matthew Wilcox, David Hildenbrand,
 Alistair Popple, Johannes Weiner, Rik van Riel, Suren Baghdasaryan,
 Yu Zhao, Greg Thelen, Shakeel Butt, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
References: <8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com>
 <5ed1f01-3e7e-7e26-cc1-2b7a574e2147@google.com>
From: Vlastimil Babka
Subject: Re: [PATCH 01/13] mm/munlock: delete page_mlock() and all its works
In-Reply-To: <5ed1f01-3e7e-7e26-cc1-2b7a574e2147@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2/6/22 22:30, Hugh Dickins wrote:
> We have recommended some applications to mlock their userspace, but that
> turns out to be counter-productive: when many processes mlock the same
> file, contention on rmap's i_mmap_rwsem can become intolerable at exit: it
> is needed for write, to remove any vma mapping that file from rmap's tree;
> but hogged for read by those with mlocks calling page_mlock() (formerly
> known as try_to_munlock()) on *each* page mapped from the file (the
> purpose being to find out whether another process has the page mlocked,
> so therefore it should not be unmlocked yet).
>
> Several optimizations have been made in the past: one is to skip
> page_mlock() when mapcount tells that nothing else has this page
> mapped; but that doesn't help at all when others do have it mapped.
> This time around, I initially intended to add a preliminary search
> of the rmap tree for overlapping VM_LOCKED ranges; but that gets
> messy with locking order, when in doubt whether a page is actually
> present; and risks adding even more contention on the i_mmap_rwsem.
>
> A solution would be much easier, if only there were space in struct page
> for an mlock_count... but actually, most of the time, there is space for
> it - an mlocked page spends most of its life on an unevictable LRU, but
> since 3.18 removed the scan_unevictable_pages sysctl, that "LRU" has
> been redundant.  Let's try to reuse its page->lru.
>
> But leave that until a later patch: in this patch, clear the ground by
> removing page_mlock(), and all the infrastructure that has gathered
> around it - which mostly hinders understanding, and will make reviewing
> new additions harder.  Don't mind those old comments about THPs, they
> date from before 4.5's refcounting rework: splitting is not a risk here.
>
> Just keep a minimal version of munlock_vma_page(), as reminder of what it
> should attend to (in particular, the odd way PGSTRANDED is counted out of
> PGMUNLOCKED), and likewise a stub for munlock_vma_pages_range().  Move
> unchanged __mlock_posix_error_return() out of the way, down to above its
> caller: this series then makes no further change after mlock_fixup().
>
> Signed-off-by: Hugh Dickins

While I understand the reasons to clear the ground first, I wonder what the
implications are for bisectability - is there a risk of surprising failures?
Maybe we should at least spell out the implications explicitly here?

IIUC, pages that once become mlocked will stay mlocked, which affects the
Mlocked meminfo counter and makes them impossible to reclaim. But if e.g. a
process that did mlockall() exits, its exclusive pages will be freed anyway,
so it's not a catastrophic kind of leak, right?
Yet it differs from the existing "failure modes", where pages are left
"stranded" because they could not be isolated: those at least go through
TestClearPageMlocked and the counter updates.

>
>  /*
> @@ -413,75 +136,11 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>   *
>   * Returns with VM_LOCKED cleared.  Callers must be prepared to
>   * deal with this.
> - *
> - * We don't save and restore VM_LOCKED here because pages are
> - * still on lru.  In unmap path, pages might be scanned by reclaim
> - * and re-mlocked by page_mlock/try_to_unmap before we unmap and
> - * free them.  This will result in freeing mlocked pages.
>   */
>  void munlock_vma_pages_range(struct vm_area_struct *vma,
>  			     unsigned long start, unsigned long end)
>  {
> -	vma->vm_flags &= VM_LOCKED_CLEAR_MASK;

Should we at least keep doing the flags clearing? I haven't checked whether
there are VM_BUG_ONs that would trip on the flags not being cleared, but I
wouldn't be entirely surprised.
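
To make the question concrete, here is a minimal userspace sketch of what
the removed statement does to vm_flags. It is an illustration only, not
kernel code: struct toy_vma and the toy_* helpers are hypothetical
stand-ins, and the flag values are assumed to match include/linux/mm.h from
around that time.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror include/linux/mm.h of that era; not authoritative. */
#define VM_LOCKED            0x00002000UL
#define VM_LOCKONFAULT       0x00080000UL
#define VM_LOCKED_CLEAR_MASK (~(VM_LOCKED | VM_LOCKONFAULT))

/* Hypothetical userspace stand-in for struct vm_area_struct. */
struct toy_vma {
	unsigned long vm_flags;
};

/* Models only the flag clearing dropped from munlock_vma_pages_range(). */
static inline void toy_munlock_clear_flags(struct toy_vma *vma)
{
	vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
}

/* Hypothetical predicate of the kind a later sanity check might test. */
static inline bool toy_vma_is_locked(const struct toy_vma *vma)
{
	return (vma->vm_flags & (VM_LOCKED | VM_LOCKONFAULT)) != 0;
}
```

If the stub no longer clears the bits, anything that later evaluates a
predicate like toy_vma_is_locked() on that vma would still see it as
locked, which is the situation where a leftover assertion could trip.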