From: Andrii Nakryiko
Date: Mon, 6 May 2024 11:41:43 -0700
Subject: Re: [PATCH 2/5] fs/procfs: implement efficient VMA querying API for /proc/<pid>/maps
To: Arnaldo Carvalho de Melo
Cc: Jiri Olsa, Ian Rogers, Greg KH, Andrii Nakryiko, linux-fsdevel@vger.kernel.org, brauner@kernel.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-mm@kvack.org, Daniel Müller, "linux-perf-use."
References: <20240504003006.3303334-1-andrii@kernel.org> <20240504003006.3303334-3-andrii@kernel.org> <2024050439-janitor-scoff-be04@gregkh>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, May 6, 2024 at 6:58 AM Arnaldo Carvalho de Melo wrote:
>
> On Sat, May 04, 2024 at 02:50:31PM -0700, Andrii Nakryiko wrote:
> > On Sat, May 4, 2024 at 8:28 AM Greg KH wrote:
> > > On Fri, May 03, 2024 at 05:30:03PM -0700, Andrii Nakryiko wrote:
> > > > Note also, that fetching VMA name (e.g., backing file path, or special
> > > > hard-coded or user-provided names) is optional just like build ID. If
> > > > user sets vma_name_size to zero, kernel code won't attempt to retrieve
> > > > it, saving resources.
>
> > > > Signed-off-by: Andrii Nakryiko
>
> > > Where is the userspace code that uses this new api you have created?
>
> > So I added a faithful comparison of existing /proc/<pid>/maps vs new
> > ioctl() API to solve a common problem (as described above) in patch
> > #5. The plan is to put it in mentioned blazesym library at the very
> > least.
> >
> > I'm sure perf would benefit from this as well (cc'ed Arnaldo and
> > linux-perf-user), as they need to do stack symbolization as well.
>
> At some point, when BPF iterators became a thing we thought about, IIRC
> Jiri did some experimentation, but I lost track, of using BPF to
> synthesize PERF_RECORD_MMAP2 records for pre-existing maps, the layout
> as in uapi/linux/perf_event.h:
>
>         /*
>          * The MMAP2 records are an augmented version of MMAP, they add
>          * maj, min, ino numbers to be used to uniquely identify each mapping
>          *
>          * struct {
>          *      struct perf_event_header        header;
>          *
>          *      u32                             pid, tid;
>          *      u64                             addr;
>          *      u64                             len;
>          *      u64                             pgoff;
>          *      union {
>          *              struct {
>          *                      u32             maj;
>          *                      u32             min;
>          *                      u64             ino;
>          *                      u64             ino_generation;
>          *              };
>          *              struct {
>          *                      u8              build_id_size;
>          *                      u8              __reserved_1;
>          *                      u16             __reserved_2;
>          *                      u8              build_id[20];
>          *              };
>          *      };
>          *      u32                             prot, flags;
>          *      char                            filename[];
>          *      struct sample_id                sample_id;
>          * };
>          */
>         PERF_RECORD_MMAP2                       = 10,
>
>  *   PERF_RECORD_MISC_MMAP_BUILD_ID      - PERF_RECORD_MMAP2 event
>
> As perf.data files can be used for many purposes we want them all, so we

ok, so because you want them all and you don't know which VMAs will be
useful or not, it's a different problem. BPF iterators will be faster
purely due to avoiding binary -> text -> binary conversion path, but
other than that you'll still retrieve all VMAs.

You can still do the same full VMA iteration with this new API, of
course, but advantages are probably smaller as you'll be retrieving a
full set of VMAs regardless (though it would be interesting to compare
anyways).
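
For readers following along, here is a minimal Python sketch of how the PERF_RECORD_MMAP2 layout quoted above could be decoded from a raw buffer. It is an illustration only, not code from perf or from this patch series. The function name parse_mmap2 is made up for this example, the record is assumed to be little-endian, the bit value used for PERF_RECORD_MISC_MMAP_BUILD_ID is an assumption taken from uapi/linux/perf_event.h, and the trailing sample_id is ignored.

```python
import struct

PERF_RECORD_MMAP2 = 10                    # value quoted in the thread
PERF_RECORD_MISC_MMAP_BUILD_ID = 1 << 14  # assumption: value from uapi/linux/perf_event.h

HEADER = struct.Struct("<IHH")   # perf_event_header: u32 type, u16 misc, u16 size
FIXED = struct.Struct("<IIQQQ")  # pid, tid, addr, len, pgoff
IDS = struct.Struct("<IIQQ")     # maj, min, ino, ino_generation (24 bytes)
TAIL = struct.Struct("<II")      # prot, flags


def parse_mmap2(buf):
    """Decode one little-endian PERF_RECORD_MMAP2 record (hypothetical helper).

    Expects a bytes object; the trailing sample_id is not decoded.
    """
    rec_type, misc, size = HEADER.unpack_from(buf, 0)
    if rec_type != PERF_RECORD_MMAP2:
        raise ValueError("not a PERF_RECORD_MMAP2 record")
    off = HEADER.size
    pid, tid, addr, length, pgoff = FIXED.unpack_from(buf, off)
    off += FIXED.size
    union = bytes(buf[off:off + IDS.size])
    off += IDS.size
    prot, flags = TAIL.unpack_from(buf, off)
    off += TAIL.size

    rec = {"pid": pid, "tid": tid, "addr": addr, "len": length,
           "pgoff": pgoff, "prot": prot, "flags": flags}
    if misc & PERF_RECORD_MISC_MMAP_BUILD_ID:
        # Second union arm: build_id_size, two reserved fields, build_id[20].
        rec["build_id"] = union[4:4 + union[0]]
    else:
        # First union arm: device numbers and inode identify the mapping.
        rec["maj"], rec["min"], rec["ino"], rec["ino_generation"] = IDS.unpack(union)
    # filename[] is NUL-terminated and starts right after prot/flags.
    end = buf.find(b"\0", off, size)
    rec["filename"] = bytes(buf[off:end if end != -1 else size]).decode()
    return rec
```

Both union arms are 24 bytes wide, so the fixed part of the record has the same size whether the misc bit selects maj/min/ino or the build ID.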
> setup a meta data perf file descriptor to go on receiving the new mmaps
> while we read /proc/<pid>/maps, to reduce the chance of missing maps, do
> it in parallel, etc:
>
> ⬢[acme@toolbox perf-tools-next]$ perf record -h 'event synthesis'
>
>  Usage: perf record [] []
>     or: perf record [] -- []
>
>         --num-thread-synthesize
>                           number of threads to run for event synthesis
>         --synth
>                           Fine-tune event synthesis: default=all
>
> ⬢[acme@toolbox perf-tools-next]$
>
> For this specific initial synthesis of everything the plan, as mentioned
> about Jiri's experiments, was to use a BPF iterator to just feed the
> perf ring buffer with those events, that way userspace would just
> receive the usual records it gets when a new mmap is put in place, the
> BPF iterator would just feed the preexisting mmaps, as instructed via
> the perf_event_attr for the perf_event_open syscall.
>
> For people not wanting BPF, i.e. disabling it altogether in perf or
> disabling just BPF skels, then we would fallback to the current method,
> or to the one being discussed here when it becomes available.
>
> One thing to have in mind is for this iterator not to generate duplicate
> records for non-pre-existing mmaps, i.e. we would need some generation
> number that would be bumped when asking for such pre-existing maps
> PERF_RECORD_MMAP2 dumps.

Looking briefly at struct vm_area_struct, it doesn't seem like the
kernel maintains any sort of generation (at least not at
vm_area_struct level), so this would be nice to have, I'm sure, but
isn't really related to adding this API. Once the kernel does have
this "VMA generation" counter, it can be trivially added to this
binary interface (which can't be said about /proc/<pid>/maps,
unfortunately).

>
> > It will be up to other similar projects to adopt this, but we'll
> > definitely get this into blazesym as it is actually a problem for the
>
> At some point looking at plugging blazesym somehow with perf may be
> something to consider, indeed.

In the above I meant direct use of this new API in perf code itself,
but yes, blazesym is a generic library for symbolization that handles
ELF/DWARF/GSYM (and I believe more formats), so it indeed might make
sense to use it.

>
> - Arnaldo
>
> > abovementioned Oculus use case. We already had to make a tradeoff (see
> > [2], this wasn't done just because we could, but it was requested by
> > Oculus customers) to cache the contents of /proc/<pid>/maps and run
> > the risk of missing some shared libraries that can be loaded later. It
> > would be great to not have to do this tradeoff, which this new API
> > would enable.
> >
> >   [2] https://github.com/libbpf/blazesym/commit/6b521314126b3ae6f2add43e93234b59fed48ccf
> > [...]
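
To make the caching tradeoff mentioned in the quoted text concrete, here is a rough Python sketch of a cached /proc/<pid>/maps lookup. It is not blazesym's implementation. The names parse_maps, MapsCache and refresh_on_miss are hypothetical, and the maps text is supplied by a caller-provided function so the example does not depend on a live process.

```python
import bisect


def parse_maps(text):
    """Parse /proc/<pid>/maps text into sorted (start, end, path) tuples."""
    vmas = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional path.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        vmas.append((start, end, path))
    vmas.sort()
    return vmas


class MapsCache:
    """Hypothetical cache: parse the text once, then answer lookups from memory."""

    def __init__(self, read_maps):
        self._read = read_maps  # callable returning the current maps text
        self._reload()

    def _reload(self):
        self._vmas = parse_maps(self._read())
        self._starts = [v[0] for v in self._vmas]

    def _find(self, addr):
        # Binary search for the last VMA starting at or below addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0 and self._vmas[i][0] <= addr < self._vmas[i][1]:
            return self._vmas[i]
        return None

    def lookup(self, addr, refresh_on_miss=False):
        hit = self._find(addr)
        if hit is None and refresh_on_miss:
            # Pay for a full re-read and re-parse to pick up late mappings.
            self._reload()
            hit = self._find(addr)
        return hit
```

With refresh_on_miss left off, an address inside a library loaded after the snapshot was taken is simply not found, which is the risk described above. Turning it on avoids that at the cost of re-reading and re-parsing the whole file on every miss.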