From: Andrii Nakryiko
Date: Tue, 26 Mar 2024 09:01:47 -0700
Subject: Re: [PATCH] uprobes: reduce contention on uprobes_tree access
To: Masami Hiramatsu
Cc: Jonathan Haslam, linux-trace-kernel@vger.kernel.org, andrii@kernel.org,
    bpf@vger.kernel.org, rostedt@goodmis.org, Peter Zijlstra, Ingo Molnar,
    Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin,
    Jiri Olsa, Ian Rogers, Adrian Hunter, linux-perf-users@vger.kernel.org,
    linux-kernel@vger.kernel.org
In-Reply-To: <20240325120323.ec3248d330b2755e73a6571e@kernel.org>
References: <20240321145736.2373846-1-jonathan.haslam@gmail.com>
    <20240325120323.ec3248d330b2755e73a6571e@kernel.org>

On Sun, Mar 24, 2024 at 8:03 PM Masami Hiramatsu wrote:
>
> On Thu, 21 Mar 2024 07:57:35 -0700
> Jonathan Haslam wrote:
>
> > Active uprobes are stored in an RB tree and accesses to this tree are
> > dominated by read operations. Currently these accesses are serialized by
> > a spinlock but this leads to enormous contention when large numbers of
> > threads are executing active probes.
> >
> > This patch converts the spinlock used to serialize access to the
> > uprobes_tree RB tree into a reader-writer spinlock. This lock type
> > aligns naturally with the overwhelmingly read-only nature of the tree
> > usage here. Although the addition of reader-writer spinlocks is
> > discouraged [0], this fix is proposed as an interim solution while an
> > RCU-based approach is implemented (that work is in a nascent form). This
> > fix also has the benefit of being trivial, self-contained and therefore
> > simple to backport.
> >
> > This change has been tested against production workloads that exhibit
> > significant contention on the spinlock and an almost order of magnitude
> > reduction in mean uprobe execution time is observed (28 -> 3.5 microsecs).
>
> Looks good to me.
>
> Acked-by: Masami Hiramatsu (Google)

Masami,

Given the discussion around per-cpu rw semaphore and need for
(internal) batched attachment API for uprobes, do you think you can
apply this patch as is for now?
We can then gain initial improvements in scalability that are also easy
to backport, and Jonathan will work on a more complete solution based
on per-cpu RW semaphore, as suggested by Ingo.

>
> BTW, how did you measure the overhead? I think spinlock overhead
> will depend on how much lock contention happens.
>
> Thank you,
>
> >
> > [0] https://docs.kernel.org/locking/spinlocks.html
> >
> > Signed-off-by: Jonathan Haslam
> > ---
> >  kernel/events/uprobes.c | 22 +++++++++++-----------
> >  1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > index 929e98c62965..42bf9b6e8bc0 100644
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -39,7 +39,7 @@ static struct rb_root uprobes_tree = RB_ROOT;
> >   */
> >  #define no_uprobe_events()	RB_EMPTY_ROOT(&uprobes_tree)
> >
> > -static DEFINE_SPINLOCK(uprobes_treelock);	/* serialize rbtree access */
> > +static DEFINE_RWLOCK(uprobes_treelock);	/* serialize rbtree access */
> >
> >  #define UPROBES_HASH_SZ	13
> >  /* serialize uprobe->pending_list */
> > @@ -669,9 +669,9 @@ static struct uprobe *find_uprobe(struct inode *inode, loff_t offset)
> >  {
> >  	struct uprobe *uprobe;
> >
> > -	spin_lock(&uprobes_treelock);
> > +	read_lock(&uprobes_treelock);
> >  	uprobe = __find_uprobe(inode, offset);
> > -	spin_unlock(&uprobes_treelock);
> > +	read_unlock(&uprobes_treelock);
> >
> >  	return uprobe;
> >  }
> > @@ -701,9 +701,9 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe)
> >  {
> >  	struct uprobe *u;
> >
> > -	spin_lock(&uprobes_treelock);
> > +	write_lock(&uprobes_treelock);
> >  	u = __insert_uprobe(uprobe);
> > -	spin_unlock(&uprobes_treelock);
> > +	write_unlock(&uprobes_treelock);
> >
> >  	return u;
> >  }
> > @@ -935,9 +935,9 @@ static void delete_uprobe(struct uprobe *uprobe)
> >  	if (WARN_ON(!uprobe_is_active(uprobe)))
> >  		return;
> >
> > -	spin_lock(&uprobes_treelock);
> > +	write_lock(&uprobes_treelock);
> >  	rb_erase(&uprobe->rb_node, &uprobes_tree);
> > -	spin_unlock(&uprobes_treelock);
> > +	write_unlock(&uprobes_treelock);
> >  	RB_CLEAR_NODE(&uprobe->rb_node); /* for uprobe_is_active() */
> >  	put_uprobe(uprobe);
> >  }
> > @@ -1298,7 +1298,7 @@ static void build_probe_list(struct inode *inode,
> >  	min = vaddr_to_offset(vma, start);
> >  	max = min + (end - start) - 1;
> >
> > -	spin_lock(&uprobes_treelock);
> > +	read_lock(&uprobes_treelock);
> >  	n = find_node_in_range(inode, min, max);
> >  	if (n) {
> >  		for (t = n; t; t = rb_prev(t)) {
> > @@ -1316,7 +1316,7 @@ static void build_probe_list(struct inode *inode,
> >  			get_uprobe(u);
> >  		}
> >  	}
> > -	spin_unlock(&uprobes_treelock);
> > +	read_unlock(&uprobes_treelock);
> >  }
> >
> >  /* @vma contains reference counter, not the probed instruction. */
> > @@ -1407,9 +1407,9 @@ vma_has_uprobes(struct vm_area_struct *vma, unsigned long start, unsigned long e
> >  	min = vaddr_to_offset(vma, start);
> >  	max = min + (end - start) - 1;
> >
> > -	spin_lock(&uprobes_treelock);
> > +	read_lock(&uprobes_treelock);
> >  	n = find_node_in_range(inode, min, max);
> > -	spin_unlock(&uprobes_treelock);
> > +	read_unlock(&uprobes_treelock);
> >
> >  	return !!n;
> >  }
> > --
> > 2.43.0
> >
>
> --
> Masami Hiramatsu (Google)