From: Hao Luo
Date: Fri, 4 Feb 2022 11:28:46 -0800
Subject: Re: [Question] How to reliably get BuildIDs from bpf prog
To: Song Liu
Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, Song Liu,
    Yonghong Song, Martin KaFai Lau, KP Singh, bpf, open list, Jiri Olsa,
    Blake Jones, Alexey Alexandrov, Namhyung Kim, Ian Rogers,
    "pasha.tatashin@soleen.com"
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jan 25, 2022 at 4:16 PM Song Liu wrote:
>
> On Tue, Jan 25, 2022 at 3:54 PM Hao Luo wrote:
> >
> > Thanks Song for your suggestion.
> >
> > On Mon, Jan 24, 2022 at 11:08 PM Song Liu wrote:
> > >
> > > On Mon, Jan 24, 2022 at 2:43 PM Hao Luo wrote:
> > > >
> > > > Dear BPF experts,
> > > >
> > > > I'm working on collecting some kernel performance data using BPF
> > > > tracing prog. Our performance profiling team wants to associate the
> > > > data with user stack information. One of the requirements is to
> > > > reliably get BuildIDs from bpf_get_stackid() and other similar helpers
> > > > [1].
> > > >
> > > > As part of an early investigation, we found that there are a couple
> > > > issues that make bpf_get_stackid() much less reliable than we'd like
> > > > for our use:
> > > >
> > > > 1. The first page of many binaries (which contains the ELF headers and
> > > > thus the BuildID that we need) is often not in memory. The failure of
> > > > find_get_page() (called from build_id_parse()) is higher than we would
> > > > want.
> > >
> > > Our top use case of bpf_get_stack() is called from NMI, so there isn't
> > > much we can do. Maybe it is possible to improve it by changing the
> > > layout of the binary and the libraries? Specifically, if the text is
> > > also in the first page, it is likely to stay in memory?
> > >
> > We are seeing 30-40% of stack frames not able to get build ids due to
> > this. This is a place where we could improve the reliability of build
> > id.
> >
> > There were a few proposals coming up when we found this issue. One of
> > them is to have userspace mlock the first page. This would be the
> > easiest fix, if it works. Another proposal from Ian Rogers (cc'ed) is
> > to embed build id in vma. This is an idea similar to [1], but it's
> > unclear (at least to me) where to store the string. I'm wondering if
> > we can introduce a sleepable version of bpf_get_stack() if it helps.
> > When a page is not present, sleepable bpf_get_stack() can bring in the
> > page.
>
> I guess it is possible to have different flavors of bpf_get_stack().
> However, I am not sure whether the actual use case could use sleepable
> BPF programs. Our user of bpf_get_stack() is a profiler. The BPF program
> which triggers a perf_event from NMI, where we really cannot sleep.
>
> If we have target use case that could sleep, sleepable bpf_get_stack() sounds
> reasonable to me.
>
>
> >
> > [1] https://lwn.net/Articles/867818/
> >
> > > > 2. When anonymous huge pages are used to hold some regions of process
> > > > text, build_id_parse() also fails to get a BuildID because
> > > > vma->vm_file is NULL.
> > >
> > > How did the text get in anonymous memory? I guess it is NOT from JIT?
> > > We had a hack to use transparent huge page for application text. The
> > > hack looks like:
> > >
> > > "At run time, the application creates an 8MB temporary buffer and the
> > > hot section of the executable memory is copied to it. The 8MB region in
> > > the executable memory is then converted to a huge page (by way of an
> > > mmap() to anonymous pages and an madvise() to create a huge page), the
> > > data is copied back to it, and it is made executable again using
> > > mprotect()."
> > >
> > > If your case is the same (or similar), it can probably be fixed with
> > > CONFIG_READ_ONLY_THP_FOR_FS, and modified user space.
> > >
> >
> > In our use cases, we have text mapped to huge pages that are not
> > backed by files. vma->vm_file could be null or points some fake file.
> > This causes challenges for us on getting build id for these code text.
>
> So, what is the ideal output in these cases? If there isn't a back file, we
> don't really have good build-id for it, right?
>

Right, I don't have a solution for this case unfortunately. Probably
will just discard the failed frames. :(

But in the case where the problem is the page not in mem, Song, do you
also see a similar high rate of build id parsing failure in your use
case (30 ~ 40% of frames)? If no, we may have done something wrong on
our side. If yes, is that a problem for your use case?

> Thanks,
> Song
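
For readers following the thread, here is a rough userspace sketch of why
the first page matters: the build ID lives in an ELF note that is located
through the ELF and program headers, so a lookup restricted to the first
page fails whenever that page is not resident. This is not the kernel's
build_id_parse(); the function names, the 4096-byte page size, and the
ELF64 little-endian restriction are assumptions made for illustration.

```python
import struct

PAGE_SIZE = 4096          # assumed page size for this sketch
PT_NOTE = 4               # program header type for note segments
NT_GNU_BUILD_ID = 3       # note type that carries the GNU build ID


def _align4(n):
    # Note name and descriptor fields are padded to 4-byte boundaries.
    return (n + 3) & ~3


def parse_notes(data):
    """Walk an ELF note blob; return the GNU build ID as hex, or None."""
    off = 0
    while off + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from("<III", data, off)
        name_off = off + 12
        desc_off = name_off + _align4(namesz)
        if desc_off + descsz > len(data):
            return None  # truncated note
        name = data[name_off:name_off + namesz]
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return data[desc_off:desc_off + descsz].hex()
        off = desc_off + _align4(descsz)
    return None


def build_id_from_first_page(page):
    """Return the build ID only if headers and note sit in the first page."""
    page = page[:PAGE_SIZE]
    if len(page) < 64 or page[:4] != b"\x7fELF":
        return None
    if page[4] != 2 or page[5] != 1:  # ELFCLASS64, little-endian only
        return None
    e_phoff = struct.unpack_from("<Q", page, 0x20)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", page, 0x36)
    for i in range(e_phnum):
        ph = e_phoff + i * e_phentsize
        if ph + 56 > len(page):
            return None  # program headers spill past the page
        p_type, _flags, p_offset = struct.unpack_from("<IIQ", page, ph)
        p_filesz = struct.unpack_from("<Q", page, ph + 32)[0]
        if p_type != PT_NOTE:
            continue
        if p_offset + p_filesz > len(page):
            continue  # this note lies outside the first page
        found = parse_notes(page[p_offset:p_offset + p_filesz])
        if found:
            return found
    return None
```

Passing the first 4096 bytes of a binary (for example, read from a file
opened in "rb" mode) exercises the same dependency discussed above: if
those bytes are unavailable, or the note sits beyond them, no build ID
comes back.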