Received: by 2002:ab2:6309:0:b0:1fb:d597:ff75 with SMTP id s9csp34475lqt; Wed, 5 Jun 2024 16:16:25 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCU8KkhX9XFkHMh3jRmUrX+gKEHPwRCUKMFOKs4qsO95V/WLC9Jwut72CMFVzOFB+mQKJbuKr42J5VKzmYm1yXaEiBdZ/vj6L33PIPcYdQ== X-Google-Smtp-Source: AGHT+IFmRI1JqfecmS88zAtNYH+bCJUAqCzkq53IQ2TewsXeRBSAe/31JSBOoPrl6QVg3OGTodjH X-Received: by 2002:a50:ccd4:0:b0:579:c117:2e75 with SMTP id 4fb4d7f45d1cf-57a8b6a4cb5mr2655190a12.18.1717629384878; Wed, 05 Jun 2024 16:16:24 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1717629384; cv=pass; d=google.com; s=arc-20160816; b=rQ2kOe2k9UwEHC7tvUYvzKpWBj8r79JJ2navRrQLffnkQBizLx6AAYSjhNuxvhAuS1 ncpJy7I66NT808R21w9RmoxBvp6wa+ihXVRbPXNAqXcMl1pGSIVZTnlDpWCGuiv94A7D VB2UkSk19tIyKjHf1cdfhYLoICGZN7oF5lS+xM1JjoVV6wakfdyfWf/pAnVTAUqvR1Ja BzvgzyHJ8z0xwqNPr8XB0k7uayMPVkLU4B+6a5KWeKPlRko0ShjW55lNDOhd6arfASg+ 7vVmB6qJ05F+U3lyoDoUPBgUvX+26TriOatdEeD17gXCjcQTd4DiqrQMAavTgblcaPYK rB+Q== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:list-unsubscribe:list-subscribe :list-id:precedence:dkim-signature; bh=It1jGSpY/d5JoW3BH0YDpFaZbUvrsvHdjibkMTnj6oI=; fh=rn9uRG8s0h7k/AC4a4aymXtMyMhRCrCSGmo9WcHnRTQ=; b=lyA7871aw1J10dBmZGyskIJGNHc9tfVjQgwpgqX4EpxmUO0l4l0llGVs5B5AkAshE9 xaqSr32eFBsG/HZwRd+7wqnmymNUC89r/QOUEXdmwaIoFoMjtVz0qnpvFay3zhk8kSCL dW5hJn8yssNntrfm7LLIYSK8dUfZJWbGF3KHRNPIrUObUcUuncFHkNIho53QaeyGHube 1gqmgyMpR7rrIdcrh2HQL6pk0YoIVtm5rKz8bfCjyBnf4bGD1sMFPxUqG1JOMFV4hIW8 kj+BhOXNI6oIQka7aXTu2THeYeKFYi2txwWr9KkjLOFopSMBmE5diS9zDV5OQfTqvWmj gb6w==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=0axwYly0; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-203391-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-203391-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [2604:1380:4601:e00::3]) by mx.google.com with ESMTPS id 4fb4d7f45d1cf-57aae232ab2si42986a12.445.2024.06.05.16.16.24 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 05 Jun 2024 16:16:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-203391-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) client-ip=2604:1380:4601:e00::3; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=0axwYly0; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-203391-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:4601:e00::3 as permitted sender) smtp.mailfrom="linux-kernel+bounces-203391-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 6D4731F25471 for ; Wed, 5 Jun 2024 23:16:24 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id E20C215FA95; Wed, 5 Jun 2024 23:16:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="0axwYly0" Received: from mail-yb1-f170.google.com (mail-yb1-f170.google.com [209.85.219.170]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5F2AB15FA73 for ; Wed, 5 Jun 2024 23:16:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.170 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717629375; cv=none; b=rb1x7+qYidoe0zw8v59onJBVa4JSauolpmSfkCsgkh/Bl+F+gnye148Im5l1qUtay+DI/ThRtx/Hu1EZ/p4OCbCPvu9C87l5cwJsWP653CINBmheitoMyMa6BWmCnCYhBm06VhkU1ouK99KRopi3cga1d0Mw+RCjeHcCjYHlTuA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717629375; c=relaxed/simple; bh=sCnF1WWhVGW8xVYwTfmbAr1BNfwqxr2DfWhEZMEn4bY=; h=MIME-Version:References:In-Reply-To:From:Date:Message-ID:Subject: To:Cc:Content-Type; b=M4niHyDtfX/Gonlml0d8Mqudp7FRWaebu6Exl0VdP2zLi/oF7k8l0AofmNJuFoHSTMYh+yq20x0aZFELpcqTB2BDy2155HvBOkZYnprhaRPNQD2dfFdMp+8VmWfStDOM0vDNddrbikAh/mVxv4kB7Ipp12RVDqG0RaeZJiczKmg= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=0axwYly0; arc=none smtp.client-ip=209.85.219.170 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Received: by mail-yb1-f170.google.com with SMTP id 3f1490d57ef6-dfa6e0add60so419507276.3 for ; Wed, 05 Jun 2024 16:16:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1717629372; x=1718234172; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=It1jGSpY/d5JoW3BH0YDpFaZbUvrsvHdjibkMTnj6oI=; b=0axwYly08R1IzMpbeUenVfvBdvSeJTVQkAqE3dBLDuiuXw2fhLdGKRnP2nmVosxg0H 32W3X+n2fWL2hfXb7JH0T4L03t6OMJsk/Rl7dOqCaPwef+STobuFDkhNFrcj3BjL1JVX 9SIwpnDv1VgBV2EHFImMur9dBwfNZpQQD0ZWs+73im87Uk0uij2ss+w1ZBYLy/UaXttf JAb/h25UE20Lc+mSoiGPBYPEvnspIV9fyPKZkNOlh1p+saWy+T2Z9Mw5KV/2xfYBrsJG CeImEszUUD9lEUkJG5zZOv6VLSLaaXeLW1uXyO+MzSUDQLihn0Bmc1DgwtTygKubYH39 s0qw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717629372; x=1718234172; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=It1jGSpY/d5JoW3BH0YDpFaZbUvrsvHdjibkMTnj6oI=; b=JZuRc4UMkvy/UB1ssdUJESYlpGmLdFLfLdcRsPJSBBlLNUevB/FJ+vv4xc6s2seGcr ZqMQoyEbLVt28cPx5YojqvtKAPEiTXfxp24870OjaNECBWhG194o+c4URxjMUZAJAXee H4N4sb2ZAdK6tfdkgsqRupC7DoQKnm96OqPq3tVN4zc+ZI0XA6Jb7OQSCkTaQJfYAeCZ XLaIQkba9V1Uj0IKONgZzJm2ADtOZVpwePCumFORAdOAh2DWl6Lj1eQQg7Z/g4OrtEV6 EBX4sODJC+Nexg/SRZpeJjnykwDNRyXei/1gxKPWTZAcvUnHd7+1k7ujV+4X8I49XEXF 3otw== X-Forwarded-Encrypted: i=1; AJvYcCUv2MGDRRVyRh7uphX0VZn8mi2bkVoQPoYE/hkRhI6jq7pUsPx+zEMg3gqz9DHJW/Kbc/4VyeJ64ZwAOGsnO/588oJFaJlfHnU7vSsT X-Gm-Message-State: AOJu0YzUrkZswO8jGOw4U+PiOy8jNvEeB4F8zupw1B6/vkK6S063P6Ao cK60weVOxWHQcxP4DaZoLHSLntWkXNII4QlR/zjePd+OKWN+gOwSqe04DyINgzf4+Ji+D/lBNXv L/MfKsDuJ+5qixnsUQA0u+WqOBjKEirI6+vUE X-Received: by 2002:a25:b299:0:b0:dfa:5d84:716b with SMTP id 3f1490d57ef6-dfacac6b7f4mr3630672276.57.1717629371946; Wed, 05 Jun 2024 16:16:11 -0700 (PDT) Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 References: <20240605002459.4091285-1-andrii@kernel.org> <20240605002459.4091285-5-andrii@kernel.org> In-Reply-To: <20240605002459.4091285-5-andrii@kernel.org> From: Suren Baghdasaryan Date: Wed, 5 Jun 2024 16:15:58 -0700 Message-ID: Subject: Re: [PATCH v3 4/9] fs/procfs: use per-VMA RCU-protected locking in PROCMAP_QUERY API To: Andrii Nakryiko Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, gregkh@linuxfoundation.org, linux-mm@kvack.org, liam.howlett@oracle.com, rppt@kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Jun 4, 2024 at 5:25=E2=80=AFPM Andrii Nakryiko = wrote: > > Attempt to use RCU-protected per-VMA lock when looking up requested VMA > as much as possible, only falling back to mmap_lock if per-VMA lock > failed. This is done so that querying of VMAs doesn't interfere with > other critical tasks, like page fault handling. > > This has been suggested by mm folks, and we make use of a newly added > internal API that works like find_vma(), but tries to use per-VMA lock. > > We have two sets of setup/query/teardown helper functions with different > implementations depending on availability of per-VMA lock (conditioned > on CONFIG_PER_VMA_LOCK) to abstract per-VMA lock subtleties. > > When per-VMA lock is available, lookup is done under RCU, attempting to > take a per-VMA lock. If that fails, we fallback to mmap_lock, but then > proceed to unconditionally grab per-VMA lock again, dropping mmap_lock > immediately. In this configuration mmap_lock is never helf for long, > minimizing disruptions while querying. > > When per-VMA lock is compiled out, we take mmap_lock once, query VMAs > using find_vma() API, and then unlock mmap_lock at the very end once as > well. In this setup we avoid locking/unlocking mmap_lock on every looked > up VMA (depending on query parameters we might need to iterate a few of > them). > > Signed-off-by: Andrii Nakryiko > --- > fs/proc/task_mmu.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 46 insertions(+) > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index 614fbe5d0667..140032ffc551 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -388,6 +388,49 @@ static int pid_maps_open(struct inode *inode, struct= file *file) > PROCMAP_QUERY_VMA_FLAGS \ > ) > > +#ifdef CONFIG_PER_VMA_LOCK > +static int query_vma_setup(struct mm_struct *mm) > +{ > + /* in the presence of per-VMA lock we don't need any setup/teardo= wn */ > + return 0; > +} > + > +static void query_vma_teardown(struct mm_struct *mm, struct vm_area_stru= ct *vma) > +{ > + /* in the presence of per-VMA lock we need to unlock vma, if pres= ent */ > + if (vma) > + vma_end_read(vma); > +} > + > +static struct vm_area_struct *query_vma_find_by_addr(struct mm_struct *m= m, unsigned long addr) > +{ > + struct vm_area_struct *vma; > + > + /* try to use less disruptive per-VMA lock */ > + vma =3D find_and_lock_vma_rcu(mm, addr); > + if (IS_ERR(vma)) { > + /* failed to take per-VMA lock, fallback to mmap_lock */ > + if (mmap_read_lock_killable(mm)) > + return ERR_PTR(-EINTR); > + > + vma =3D find_vma(mm, addr); > + if (vma) { > + /* > + * We cannot use vma_start_read() as it may fail = due to > + * false locked (see comment in vma_start_read())= . We > + * can avoid that by directly locking vm_lock und= er > + * mmap_lock, which guarantees that nobody can lo= ck the > + * vma for write (vma_start_write()) under us. > + */ > + down_read(&vma->vm_lock->lock); Hi Andrii, The above pattern of locking VMA under mmap_lock and then dropping mmap_lock is becoming more common. Matthew had an RFC proposal for an API to do this here: https://lore.kernel.org/all/ZivhG0yrbpFqORDw@casper.infradead.org/. It might be worth reviving that discussion. > + } > + > + mmap_read_unlock(mm); Later on in your code you are calling get_vma_name() which might call anon_vma_name() to retrieve user-defined VMA name. After this patch this operation will be done without holding mmap_lock, however per https://elixir.bootlin.com/linux/latest/source/include/linux/mm_types.h#L58= 2 this function has to be called with mmap_lock held for read. Indeed with debug flags enabled you should hit this assertion: https://elixir.bootlin.com/linux/latest/source/mm/madvise.c#L96. > + } > + > + return vma; > +} > +#else > static int query_vma_setup(struct mm_struct *mm) > { > return mmap_read_lock_killable(mm); > @@ -402,6 +445,7 @@ static struct vm_area_struct *query_vma_find_by_addr(= struct mm_struct *mm, unsig > { > return find_vma(mm, addr); > } > +#endif > > static struct vm_area_struct *query_matching_vma(struct mm_struct *mm, > unsigned long addr, u32 = flags) > @@ -441,8 +485,10 @@ static struct vm_area_struct *query_matching_vma(str= uct mm_struct *mm, > skip_vma: > /* > * If the user needs closest matching VMA, keep iterating. > + * But before we proceed we might need to unlock current VMA. > */ > addr =3D vma->vm_end; > + vma_end_read(vma); /* no-op under !CONFIG_PER_VMA_LOCK */ > if (flags & PROCMAP_QUERY_COVERING_OR_NEXT_VMA) > goto next_vma; > no_vma: > -- > 2.43.0 >