Received: by 2002:ab2:6309:0:b0:1fb:d597:ff75 with SMTP id s9csp532904lqt; Thu, 6 Jun 2024 10:21:23 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVXhq4y0fNyUlxlFwv3/zw3oIbHe+6nAb9F34RUwy2l6ZSvbT9tVTwmAvTbrUPy31ORgOpCsLk/W7dqpfNkt2NRjdBzuzpLfh0RV6Jrlw== X-Google-Smtp-Source: AGHT+IGHOCZ2i77mzvZo+d7sqhZy3txXnIraFZfJqWScSh2dSvUGpLVbcn/uHf4+/a2cYK25eFNo X-Received: by 2002:a17:90b:228a:b0:2bf:e6f1:59e9 with SMTP id 98e67ed59e1d1-2c2bcad65c3mr180185a91.20.1717694482761; Thu, 06 Jun 2024 10:21:22 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1717694482; cv=pass; d=google.com; s=arc-20160816; b=j2mT7VweVjUgC15oWJqUJ2pf3NjhQVNcxXVl3WUGBTJ4LBi82FidismhQhMrzAsz1M fhSzlFkTgbUWt0VKceWkPUmK5Rhr3EQo8n+0qx4Qlxq03mKbm+wF7N9B1qFm1DcRDeiu 3PQeGoZvOxqIq43rYorcMZPCGxHKmywtU8xdcUtEJOxZiSnzfF5dV4eM7bHUq1kvG1FJ W8xhfpcJm9Rs4IuZNW007FvRJhxs5H6vC5nye4b6mg8+RpzOoyxpZUMYWZcD09SQuQ3D lOojq64oJhTOGUS322dWpnojNo4xWbGwVYev2xfFpOVRuEy+TR3I76k9Bx6y0gbVCl3z CzJg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=hRK/mnhg9puUTTzwI/wwQsIKXT3wkOFAvZtn/HKANeU=; fh=mNHBDpg3/H7MBXpFSxYck3qatA8vLR8Vyje8RxiW83o=; b=PqWiog58cbcTTxb8mlxt4gr7/guendyb+F+vlUE4yK5IBZXxHUCV90/VO3uYS8jHc1 QcN0NXdFCR3ZzRVVMvXp8pJTva85l39cEY5HZX8xgEjDWnzkPVI566roNiZsQZ2qNqBa gq+14/s+marmdbGWigYfQJX9ATF0wh48U1RcgQS8PZhx9VvDWDn1t3mow/AT83UnnCBT 1LrtiKse2mTYwxarK6LPmHnahRW05t6YR6ud2Ap5GdPNWD1JrsUOB1HoJxMT6sKyipgi /HCRa5cvr5BnUpCcS/INNNWCgl9mLLC4+QtGdR3Xb/BU0KaChdBnPsuUqjJmvQtCzJ1F C/gQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=ZfHKPHd8; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-204778-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-204778-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id 98e67ed59e1d1-2c2806335cfsi1460130a91.12.2024.06.06.10.21.22 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 06 Jun 2024 10:21:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-204778-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=ZfHKPHd8; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-204778-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-204778-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 6C7A628883F for ; Thu, 6 Jun 2024 17:14:27 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 2EE7B1991A8; Thu, 6 Jun 2024 17:14:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="ZfHKPHd8" Received: from mail-yw1-f172.google.com (mail-yw1-f172.google.com [209.85.128.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8BE9D198E86 for ; Thu, 6 Jun 2024 17:14:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717694048; cv=none; b=FHyiAXuXzFVBrqIwYb7iAkncUwHFPE/byLQuV/oqOuVLtIJSoKcuhU+Xi6bMJKxZSzQel7ZbWfVFwXiZDvhFWTLkHmsebm9uiYa5DOtNgjrBMvwLzddsKahjvJMSERQ81Cqho5h9ifEERfI12/4sbT3o/7xSbNTKRw7xCfsWqps= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717694048; c=relaxed/simple; bh=3PcZbTSvMIet1KvRH/3UIizXiqdHfZwbn62ekU7BeTk=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=MJ/6OdgOuLDl3lZgqfRIj8McYQXpGkT0SVJauOeTsbVLASQczru1yMeQeaz3DXoMztE2cpn7PMwRvR+fQhthrtqZyHS5O8Q1gIx+yvXk509O4zXAFLSqWlZqZuvalVAzwOL21gp/UVVbJ4+1DhV//+9yGXbwHHQi6Lxc3AOTFxU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=ZfHKPHd8; arc=none smtp.client-ip=209.85.128.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Received: by mail-yw1-f172.google.com with SMTP id 00721157ae682-62a0809c805so12054077b3.0 for ; Thu, 06 Jun 2024 10:14:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1717694045; x=1718298845; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=hRK/mnhg9puUTTzwI/wwQsIKXT3wkOFAvZtn/HKANeU=; b=ZfHKPHd88/pQdXDhSx7rj+OQOYhlJT8Ziy+oVtQGDenFjzFnp1lY1uDONjpIbY6C6L T0SbIYjboXx1e02tAi3c3ZD3Gq+G2goZHggWjMm33l5djEuEPgfApdzrTlQsVSUKx1NT STiUjeurb7q9QslwoKoU9kU6oYDOqZkkXO+LgaFSf6sv3L/fTb5ekW0oqxuv34FoqNxA T5PcV+YIq8Lh/fwjRYzWA/oIS3a3kfdo+pTp/iOWhEUJXrhxNxbjCaYVw3m+oPnMl0pm qbpCxk7rC7UMkgI24mawWB4sAvdATgJ0ztfcCGzylZuwZLWS+CzpSmGIURkswJap463h 4b2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717694045; x=1718298845; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hRK/mnhg9puUTTzwI/wwQsIKXT3wkOFAvZtn/HKANeU=; b=cyvmyl2nu8BmZ3w9s7AlkcPaIG/SKJPVOQdG1FamIpVPL++u4Cu5iAUpFhjdlZC+qi XbaC3R5KxFticGxXS7X1T/ORVrlGuz0hrMW3kGbTmA/cCNAoRUjQvxGlUu1Drn2BiwNf S9nHLdRgi+jQLBrAgRPt24SPxLIkOF/GyrjHTbWe6TmnYuyvfGshpsNs2aWmB7prZQF5 7bQ0hhrZOIZrKh2iItOtK7SuE+QdctzYlPxy37xRRl06qh3dUrgJjmsUcrBOXkZR10Sm DHoRt2+CWfzacUZ6CANDUXY9qXx1i0W4cgPxL2PuKnmhCg56zBSsLGycvlmq92Xewn89 Ozmw== X-Forwarded-Encrypted: i=1; AJvYcCW31x5uYZpSw/Fwz90XtXxjGKnZ2UdhlsrzS77UpIhuA0eQhcgYg6+ADp7jwSzwDnE5UKoQhLn58LtG6rIlgqsV21HMgsGCWoRbwnKJ X-Gm-Message-State: AOJu0YxkP2ZvALq0qI0/u/EdZRWz+g5Sk3sBd1zQonqXHlYTUEWo2Hgd L44CSM0XKaqTPwLFDo63AgUS6fSKiKx3jEIu+D2vig+amjmigIB8PDGFAL0CipuNviAfqn5Gr23 OztCUmjh8oz2/AAU61YNEHk6jzHbPRFJgmUC2 X-Received: by 2002:a0d:d709:0:b0:615:1a0:78ea with SMTP id 00721157ae682-62cbff3b0eamr45661767b3.34.1717694044790; Thu, 06 Jun 2024 10:14:04 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-2-andrii@kernel.org> <5fmylram4hhrrdl7vf6odyvuxcrvhipsx2ij5z4dsfciuzf4on@qwk7qzze6gbt> In-Reply-To: From: Suren Baghdasaryan Date: Thu, 6 Jun 2024 10:13:52 -0700 Message-ID: Subject: Re: [PATCH v3 1/9] mm: add find_vma()-like API but RCU protected and taking VMA lock To: Andrii Nakryiko Cc: "Liam R. Howlett" , Matthew Wilcox , Andrii Nakryiko , linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, gregkh@linuxfoundation.org, linux-mm@kvack.org, rppt@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Jun 6, 2024 at 9:52=E2=80=AFAM Andrii Nakryiko wrote: > > On Wed, Jun 5, 2024 at 4:22=E2=80=AFPM Suren Baghdasaryan wrote: > > > > On Wed, Jun 5, 2024 at 10:03=E2=80=AFAM Liam R. Howlett wrote: > > > > > > * Andrii Nakryiko [240605 12:27]: > > > > On Wed, Jun 5, 2024 at 9:24=E2=80=AFAM Andrii Nakryiko > > > > wrote: > > > > > > > > > > On Wed, Jun 5, 2024 at 9:13=E2=80=AFAM Andrii Nakryiko > > > > > wrote: > > > > > > > > > > > > On Wed, Jun 5, 2024 at 6:33=E2=80=AFAM Liam R. Howlett wrote: > > > > > > > > > > > > > > * Matthew Wilcox [240604 20:57]: > > > > > > > > On Tue, Jun 04, 2024 at 05:24:46PM -0700, Andrii Nakryiko w= rote: > > > > > > > > > +/* > > > > > > > > > + * find_and_lock_vma_rcu() - Find and lock the VMA for a= given address, or the > > > > > > > > > + * next VMA. Search is done under RCU protection, withou= t taking or assuming > > > > > > > > > + * mmap_lock. Returned VMA is guaranteed to be stable an= d not isolated. > > > > > > > > > > > > > > > > You know this is supposed to be the _short_ description, ri= ght? > > > > > > > > Three lines is way too long. The full description goes bet= ween the > > > > > > > > arguments and the Return: line. > > > > > > > > > > > > Sure, I'll adjust. > > > > > > > > > > > > > > > > > > > > > > > + * @mm: The mm_struct to check > > > > > > > > > + * @addr: The address > > > > > > > > > + * > > > > > > > > > + * Returns: The VMA associated with addr, or the next VM= A. > > > > > > > > > + * May return %NULL in the case of no VMA at addr or abo= ve. > > > > > > > > > + * If the VMA is being modified and can't be locked, -EB= USY is returned. > > > > > > > > > + */ > > > > > > > > > +struct vm_area_struct *find_and_lock_vma_rcu(struct mm_s= truct *mm, > > > > > > > > > + unsigned long ad= dress) > > > > > > > > > +{ > > > > > > > > > + MA_STATE(mas, &mm->mm_mt, address, address); > > > > > > > > > + struct vm_area_struct *vma; > > > > > > > > > + int err; > > > > > > > > > + > > > > > > > > > + rcu_read_lock(); > > > > > > > > > +retry: > > > > > > > > > + vma =3D mas_find(&mas, ULONG_MAX); > > > > > > > > > + if (!vma) { > > > > > > > > > + err =3D 0; /* no VMA, return NULL */ > > > > > > > > > + goto inval; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + if (!vma_start_read(vma)) { > > > > > > > > > + err =3D -EBUSY; > > > > > > > > > + goto inval; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + /* > > > > > > > > > + * Check since vm_start/vm_end might change before we= lock the VMA. > > > > > > > > > + * Note, unlike lock_vma_under_rcu() we are searching= for VMA covering > > > > > > > > > + * address or the next one, so we only make sure VMA = wasn't updated to > > > > > > > > > + * end before the address. > > > > > > > > > + */ > > > > > > > > > + if (unlikely(vma->vm_end <=3D address)) { > > > > > > > > > + err =3D -EBUSY; > > > > > > > > > + goto inval_end_read; > > > > > > > > > + } > > > > > > > > > + > > > > > > > > > + /* Check if the VMA got isolated after we found it */ > > > > > > > > > + if (vma->detached) { > > > > > > > > > + vma_end_read(vma); > > > > > > > > > + count_vm_vma_lock_event(VMA_LOCK_MISS); > > > > > > > > > + /* The area was replaced with another one */ > > > > > > > > > > > > > > > > Surely you need to mas_reset() before you goto retry? > > > > > > > > > > > > > > Probably more than that. We've found and may have adjusted t= he > > > > > > > index/last; we should reconfigure the maple state. You shoul= d probably > > > > > > > use mas_set(), which will reset the maple state and set the i= ndex and > > > > > > > long to address. > > > > > > > > > > > > Yep, makes sense, thanks. As for the `unlikely(vma->vm_end <=3D > > > > > > address)` case, I presume we want to do the same, right? Basica= lly, on > > > > > > each retry start from the `address` unconditionally, no matter = what's > > > > > > the reason for retry. > > > > > > > > > > ah, never mind, we don't retry in that situation, I'll just put > > > > > `mas_set(&mas, address);` right before `goto retry;`. Unless we s= hould > > > > > actually retry in the case when VMA got moved before the requeste= d > > > > > address, not sure, let me know what you think. Presumably retryin= g > > > > > will allow us to get the correct VMA without the need to fall bac= k to > > > > > mmap_lock? > > > > > > > > sorry, one more question as I look some more around this (unfamilia= r > > > > to me) piece of code. I see that lock_vma_under_rcu counts > > > > VMA_LOCK_MISS on retry, but I see that there is actually a > > > > VMA_LOCK_RETRY stat as well. Any reason it's a MISS instead of RETR= Y? > > > > Should I use MISS as well, or actually count a RETRY? > > > > > > > > > > VMA_LOCK_MISS is used here because we missed the VMA due to a write > > > happening to move the vma (rather rare). The VMA_LOCK missed the vma= . > > > > > > VMA_LOCK_RETRY is used to indicate we need to retry under the mmap lo= ck. > > > A retry is needed after the VMA_LOCK did not work under rcu locking. > > > > Originally lock_vma_under_rcu() was used only inside page fault path, > > so these counters helped us quantify how effective VMA locking is when > > handling page faults. With more users of that function these counters > > will be affected by other paths as well. I'm not sure but I think it > > makes sense to use them only inside page fault path, IOW we should > > probably move count_vm_vma_lock_event() calls outside of > > lock_vma_under_rcu() and add them only when handling page faults. > > Alright, seems like I should then just drop count_vm_vma_lock_event() > from the API I'm adding. That would be my preference but as I said, I'm not 100% sure about this direction. > > > > > > > > > Thanks, > > > Liam