From: Sasha Levin
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Arnaldo Carvalho de Melo, Karl Rister, Alexander Shishkin, Alexei Starovoitov, Brendan Gregg, Daniel Borkmann, Krister Johansen, Namhyung Kim, Peter Zijlstra, Song Liu, Stanislav Fomichev, Thomas-Mich Richter, Sasha Levin
Subject: [PATCH AUTOSEL 5.3 139/203]
 perf evlist: Use unshare(CLONE_FS) in sb threads to let setns(CLONE_NEWNS) work
Date: Sun, 22 Sep 2019 14:42:45 -0400
Message-Id: <20190922184350.30563-139-sashal@kernel.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20190922184350.30563-1-sashal@kernel.org>
References: <20190922184350.30563-1-sashal@kernel.org>
MIME-Version: 1.0
X-stable: review
X-Patchwork-Hint: Ignore
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Arnaldo Carvalho de Melo

[ Upstream commit b397f8468fa27f08b83b348ffa56a226f72453af ]

When we started using a thread to catch the PERF_RECORD_BPF_EVENT meta
data events to then ask the kernel for further info (BTF, etc) for BPF
programs shortly after they get loaded, we forgot to use
unshare(CLONE_FS) as was done in:

  868a832918f6 ("perf top: Support lookup of symbols in other mount namespaces.")

Do it so that we can enter the namespaces to read the build-ids at the
end of a 'perf record' session for the DSOs that had hits.

Before:

Starting a 'stress-ng --cpus 8' inside a container and then, outside the
container running:

  # perf record -a --namespaces sleep 5
  # perf buildid-list | grep stress-ng
  #

We would end up with a 'perf.data' file that had no entry in its
build-id table for the /usr/bin/stress-ng binary inside the container
that got tons of PERF_RECORD_SAMPLEs.

After:

  # perf buildid-list | grep stress-ng
  f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/bin/stress-ng
  #

Then it's just a matter of making sure that the debuginfo package for
that binary is available in a place where 'perf report' looks for
build-id keyed ELF files, which, in my case, on a f30 notebook, was a
matter of installing the debuginfo file for the distro used in the
container, fedora 31:

  # rpm -ivh http://fedora.c3sl.ufpr.br/linux/development/31/Everything/x86_64/debug/tree/Packages/s/stress-ng-debuginfo-0.07.29-10.fc31.x86_64.rpm

Then, because perf currently looks for those debuginfo files (richer ELF
symtab) inside that namespace (look at the setns calls):

  openat(AT_FDCWD, "/proc/self/ns/mnt", O_RDONLY) = 137
  openat(AT_FDCWD, "/proc/13169/ns/mnt", O_RDONLY) = 139
  setns(139, CLONE_NEWNS) = 0
  stat("/usr/bin/stress-ng", {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0
  openat(AT_FDCWD, "/usr/bin/stress-ng", O_RDONLY) = 140
  fcntl(140, F_GETFD) = 0
  fstat(140, {st_mode=S_IFREG|0755, st_size=3065416, ...}) = 0
  mmap(NULL, 3065416, PROT_READ, MAP_PRIVATE, 140, 0) = 0x7ff2fdc5b000
  munmap(0x7ff2fdc5b000, 3065416) = 0
  close(140) = 0
  stat("stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
  stat("/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
  stat("/usr/bin/.debug/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
  stat("/usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug", 0x7fff45d71260) = -1 ENOENT (No such file or directory)
  stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29", 0x7fff45d711e0) = -1 ENOENT (No such file or directory)

To only then go back to the "host" namespace to look just in the user's
~/.debug cache:

  setns(137, CLONE_NEWNS) = 0
  chdir("/root") = 0
  close(137) = 0
  close(139) = 0
  stat("/root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf", 0x7fff45d732e0) = -1 ENOENT (No such file or directory)

It continues to fail to resolve symbols:

  # perf report | grep stress-ng | head -5
       9.50%  stress-ng-cpu  stress-ng  [.] 0x0000000000021ac1
       8.58%  stress-ng-cpu  stress-ng  [.] 0x0000000000021ab4
       8.51%  stress-ng-cpu  stress-ng  [.] 0x0000000000021489
       7.17%  stress-ng-cpu  stress-ng  [.] 0x00000000000219b6
       3.93%  stress-ng-cpu  stress-ng  [.] 0x0000000000021478
  #

To overcome that we use:

  # perf buildid-cache -v --add /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug
  Adding f2ed02c68341183a124b9b0f6e2e6c493c465b29 /usr/lib/debug/usr/bin/stress-ng-0.07.29-10.fc31.x86_64.debug: Ok
  #
  # ls -la /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
  -rw-r--r--. 3 root root 2401184 Jul 27 07:03 /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
  # file /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf
  /root/.debug/.build-id/f2/ed02c68341183a124b9b0f6e2e6c493c465b29/elf: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter \004, BuildID[sha1]=f2ed02c68341183a124b9b0f6e2e6c493c465b29, for GNU/Linux 3.2.0, with debug_info, not stripped, too many notes (256)
  #

Now it finally works:

  # perf report | grep stress-ng | head -5
      23.59%  stress-ng-cpu  stress-ng  [.] ackermann
      23.33%  stress-ng-cpu  stress-ng  [.] is_prime
      17.36%  stress-ng-cpu  stress-ng  [.] stress_cpu_sieve
       6.08%  stress-ng-cpu  stress-ng  [.] stress_cpu_correlate
       3.55%  stress-ng-cpu  stress-ng  [.] queens_try
  #

I'll make sure that it looks for the build-id keyed files in both the
"host" namespace (the namespace the user running 'perf record' was in at
the time of the recording) and in the container namespace, as it
shouldn't matter where a content based key lookup finds the ELF file to
use in resolving symbols, etc.
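
The cache lookups in the strace output above follow a fixed layout: the
first two hex digits of the build-id name a directory, the remaining
digits name a subdirectory, and the ELF file sits inside it as 'elf'.
Below is a minimal C sketch of that path construction, inferred only from
the paths shown in this message. buildid_cache_path() and DEMO_MAIN are
hypothetical names for illustration, not perf's own code:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format "<cache_dir>/.build-id/<xx>/<rest>/elf",
 * where <xx> is the first two hex digits of the build-id.
 * Returns 0 on success, -1 if the id is too short or 'out' is too small.
 */
static int buildid_cache_path(const char *cache_dir, const char *hex_id,
			      char *out, size_t out_len)
{
	int n;

	if (strlen(hex_id) < 3)
		return -1;

	n = snprintf(out, out_len, "%s/.build-id/%.2s/%s/elf",
		     cache_dir, hex_id, hex_id + 2);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

#ifdef DEMO_MAIN	/* build with -DDEMO_MAIN for a standalone run */
int main(void)
{
	char path[256];

	/* Build-id and cache directory taken from the session above. */
	if (buildid_cache_path("/root/.debug",
			       "f2ed02c68341183a124b9b0f6e2e6c493c465b29",
			       path, sizeof(path)))
		return 1;
	puts(path);
	return 0;
}
#endif
```

Because the key is derived from the file's content, the same path is
valid no matter which mount namespace the binary was originally found in.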

Reported-by: Karl Rister
Cc: Alexander Shishkin
Cc: Alexei Starovoitov
Cc: Brendan Gregg
Cc: Daniel Borkmann
Cc: Krister Johansen
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Song Liu
Cc: Stanislav Fomichev
Cc: Thomas-Mich Richter
Fixes: 657ee5531903 ("perf evlist: Introduce side band thread")
Link: https://lkml.kernel.org/n/tip-g79k0jz41adiaeuqud742t2l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Sasha Levin
---
 tools/perf/util/evlist.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index b0364d923f764..070c3bd578827 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -20,6 +20,7 @@
 #include "bpf-event.h"
 #include
 #include
+#include
 
 #include "parse-events.h"
 #include
@@ -1870,6 +1871,14 @@ static void *perf_evlist__poll_thread(void *arg)
 	struct perf_evlist *evlist = arg;
 	bool draining = false;
 	int i, done = 0;
+	/*
+	 * In order to read symbols from other namespaces perf needs to call
+	 * setns(2). This isn't permitted if the fs_struct has multiple users.
+	 * unshare(2) the fs so that we may continue to setns into namespaces
+	 * that we're observing when, for instance, reading the build-ids at
+	 * the end of a 'perf record' session.
+	 */
+	unshare(CLONE_FS);
 
 	while (!done) {
 		bool got_data = false;
-- 
2.20.1
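
The effect of the unshare(CLONE_FS) call in the hunk above can be
observed without any privileges by looking at the current working
directory, which lives in the same shared filesystem state that
setns(CLONE_NEWNS) refuses to work with. What follows is only a sketch
under stated assumptions, not perf code: it uses clone(2) with CLONE_FS
as a stand-in for a pthread (threads of one process share this state in
the same way), and child_chdir_is_private(), probe_child() and DEMO_MAIN
are hypothetical names. The setns() call itself needs privileges and is
not exercised here.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[64 * 1024] __attribute__((aligned(16)));

struct probe {
	int do_unshare;		/* call unshare(CLONE_FS) before chdir()? */
	const char *target;	/* directory the child changes into */
};

/* Runs in the child task, which starts out sharing the parent's fs state. */
static int probe_child(void *arg)
{
	struct probe *p = arg;

	if (p->do_unshare && unshare(CLONE_FS))
		return 2;	/* refused, e.g. by a seccomp filter */
	return chdir(p->target) ? 3 : 0;
}

/*
 * Returns 1 if the child's chdir() left the caller's cwd alone,
 * 0 if it changed the caller's cwd, -1 if the probe could not run.
 */
static int child_chdir_is_private(int do_unshare)
{
	char before[PATH_MAX], after[PATH_MAX];
	struct probe p = { .do_unshare = do_unshare };
	int status, isolated;
	pid_t pid;

	if (!getcwd(before, sizeof(before)))
		return -1;
	/* Pick a directory that differs from where we are now. */
	p.target = strcmp(before, "/") ? "/" : "/tmp";

	/* CLONE_FS: share cwd, root and umask, as threads of a process do. */
	pid = clone(probe_child, child_stack + sizeof(child_stack),
		    CLONE_FS | SIGCHLD, &p);
	if (pid < 0 || waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 ||
	    !getcwd(after, sizeof(after)))
		return -1;

	isolated = strcmp(before, after) == 0;
	if (chdir(before))	/* undo the change if it leaked through */
		return -1;
	return isolated;
}

#ifdef DEMO_MAIN	/* build with -DDEMO_MAIN for a standalone run */
int main(void)
{
	printf("without unshare:        private=%d\n",
	       child_chdir_is_private(0));
	printf("with unshare(CLONE_FS): private=%d\n",
	       child_chdir_is_private(1));
	return 0;
}
#endif
```

Unlike the patch, the sketch checks the return value of unshare(), since
some sandboxes filter that system call; in that case the helper reports
-1 rather than a misleading result.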