Received: by 2002:a05:6358:a55:b0:ec:fcf4:3ecf with SMTP id 21csp6227453rwb; Wed, 18 Jan 2023 02:46:29 -0800 (PST) X-Google-Smtp-Source: AMrXdXuCNB/HUW1xJE70CZEHl69ZCi7pNVprcRwjgSI7N8cHCu6tXvDRxG3KJ1SjAQ8YiDk0XZqB X-Received: by 2002:a17:906:abd8:b0:7b1:316c:38f5 with SMTP id kq24-20020a170906abd800b007b1316c38f5mr7318304ejb.30.1674038789420; Wed, 18 Jan 2023 02:46:29 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674038789; cv=none; d=google.com; s=arc-20160816; b=aKwFlrQRFT8fmEKCcZUXKpz+HVVe3OXC4F94LstaLG2cPiy6gjv4hNk3OaA+qoQVYs +9jZHGAiZV/5R9n9cDIQJbQajti5RbTQEHr0fHVsNMucKbjQ8KuTEgySAq99Vnozs4gi CWrVaAkYtR6IxXplLOaZacW9sOlV6Puq7S8035pHDtsx8M677+uX/JIHeGjcga92nN5N fzCg8JyGjJRW4mZBD1LuXOHypt7iGccZVckiasBuKBdwY/lCr61WnYkZrewKxSfxGfYu dWNLYO42+HwYdesC84CqxLG3XVJjkBpDGW7VZkM3GugTN5J6jHX6xo3JIFR4BEk7a25b 7R9Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=438nt7jo257JN0NDxWdsUaukIUI34KHbTWX932uaDtE=; b=p7vRsqmeNvmbqQPegR30y4k/4gC79RqSCwkSNO0bN3i2VCa4oJcw0BtewIdIa9oVCi Wk4lsS/JGfNm6/VV+lV0KXiIsKrIgQ4evETawcXhAoD9VjjCDxfEDPJYJAe4lfsD5aYo Wp9FwVLeavmoMsptl/NDhgkrGy1KXMIBSIggY9Yi3pcIPLuhpHH5TbZLOAsIJfsobW+9 4m3eEInOb5kZ+ox70wF0CoDSLtdXzbXrDPgfplXuzMQGFmNYZYtHOF253lKridxGGjav PQnMlDlUT2VfA5gS3q+SW+YDZGy5dJt/GuosZrbm9VlW21unTAeUu0Dsr1xnH09w8WMM ZdLQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=g0sbYN+b; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id dp17-20020a170906c15100b00871442f107esi8946656ejc.873.2023.01.18.02.46.18; Wed, 18 Jan 2023 02:46:29 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=g0sbYN+b; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230152AbjARKLg (ORCPT + 45 others); Wed, 18 Jan 2023 05:11:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44950 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229925AbjARKKx (ORCPT ); Wed, 18 Jan 2023 05:10:53 -0500 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F5449576F for ; Wed, 18 Jan 2023 01:18:18 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id E7FF321018; Wed, 18 Jan 2023 09:18:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1674033496; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=438nt7jo257JN0NDxWdsUaukIUI34KHbTWX932uaDtE=; b=g0sbYN+brR7TQ3n0vUl1nEJKdgAcwgOWuLvnlLuh4Ljm8yKKlyqyRzYGHHlJ5G4hSinscU ZwP8lIlnwOEaHQfwZADHptX9kCosF1BY1oVLj3MPqBe8P2Z04nTaF1g+yfLgx9dN97jLsY EwMwLYb1PZKQWvPDR+TlDdUSo2KdsOI= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 1C1E3139D2; Wed, 18 Jan 2023 09:18:15 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id ThJ9Ble5x2OpMgAAMHmgww (envelope-from ); Wed, 18 Jan 2023 09:18:15 +0000 Date: Wed, 18 Jan 2023 10:18:08 +0100 From: Michal Hocko To: Matthew Wilcox Cc: Suren Baghdasaryan , akpm@linux-foundation.org, michel@lespinasse.org, jglisse@google.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, laurent.dufour@fr.ibm.com, paulmck@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, hughlynch@google.com, leewalsh@google.com, posk@google.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com Subject: Re: [PATCH 12/41] mm: add per-VMA lock and helper functions to control it Message-ID: References: <20230109205336.3665937-1-surenb@google.com> <20230109205336.3665937-13-surenb@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 17-01-23 21:54:58, Matthew Wilcox wrote: > On Tue, Jan 17, 2023 at 01:21:47PM -0800, Suren Baghdasaryan wrote: > > On Tue, Jan 17, 2023 at 7:12 AM Michal Hocko wrote: > > > > > > On Tue 17-01-23 16:04:26, Michal Hocko wrote: > > > > On Mon 09-01-23 12:53:07, Suren Baghdasaryan wrote: > > > > > Introduce a per-VMA rw_semaphore to be used during page fault handling > > > > > instead of mmap_lock. Because there are cases when multiple VMAs need > > > > > to be exclusively locked during VMA tree modifications, instead of the > > > > > usual lock/unlock patter we mark a VMA as locked by taking per-VMA lock > > > > > exclusively and setting vma->lock_seq to the current mm->lock_seq. When > > > > > mmap_write_lock holder is done with all modifications and drops mmap_lock, > > > > > it will increment mm->lock_seq, effectively unlocking all VMAs marked as > > > > > locked. > > > > > > > > I have to say I was struggling a bit with the above and only understood > > > > what you mean by reading the patch several times. I would phrase it like > > > > this (feel free to use if you consider this to be an improvement). > > > > > > > > Introduce a per-VMA rw_semaphore. The lock implementation relies on a > > > > per-vma and per-mm sequence counters to note exclusive locking: > > > > - read lock - (implemented by vma_read_trylock) requires the the > > > > vma (vm_lock_seq) and mm (mm_lock_seq) sequence counters to > > > > differ. If they match then there must be a vma exclusive lock > > > > held somewhere. > > > > - read unlock - (implemented by vma_read_unlock) is a trivial > > > > vma->lock unlock. > > > > - write lock - (vma_write_lock) requires the mmap_lock to be > > > > held exclusively and the current mm counter is noted to the vma > > > > side. This will allow multiple vmas to be locked under a single > > > > mmap_lock write lock (e.g. during vma merging). The vma counter > > > > is modified under exclusive vma lock. > > > > > > Didn't realize one more thing. > > > Unlike standard write lock this implementation allows to be > > > called multiple times under a single mmap_lock. In a sense > > > it is more of mark_vma_potentially_modified than a lock. > > > > In the RFC it was called vma_mark_locked() originally and renames were > > discussed in the email thread ending here: > > https://lore.kernel.org/all/621612d7-c537-3971-9520-a3dec7b43cb4@suse.cz/. > > If other names are preferable I'm open to changing them. > > I don't want to bikeshed this, but rather than locking it seems to be > more: > > vma_start_read() > vma_end_read() > vma_start_write() > vma_end_write() > vma_downgrade_write() > > ... and that these are _implemented_ with locks (in part) is an > implementation detail? Agreed! > Would that reduce people's confusion? Yes I believe that naming it less like a locking primitive will clarify it. vma_{start,end}_[try]read is better indeed. I am wondering about the write side of things because that is where things get confusing. There is no explicit write lock nor unlock. vma_start_write sounds better than the vma_write_lock but it still lacks that pairing with vma_end_write which is never the right thing to call. Wouldn't vma_mark_modified and vma_publish_changes describe the scheme better? Downgrade case is probably the least interesting one because that is just one off thing that can be completely hidden from any code besides mmap_write_downgrade so I wouldn't be too concern about that one. But as you say, no need to bikeshed this too much. Great naming is hard and if the scheme is documented properly we can live with a suboptimal naming as well. -- Michal Hocko SUSE Labs