From: Ivan Babrou
Date: Thu, 11 Apr 2024 11:09:26 -0700
Subject: Incorrect BPF stats accounting for fentry on arm64
To: bpf
Cc: kernel-team, Xu Kuohai, linux-kernel, linux-arm-kernel@lists.infradead.org

Hello,

We're seeing incorrect data for BPF runtime stats on arm64.
Here's an example:

$ sudo bpftool prog show id 693110
693110: tracing name __tcp_retransmit_skb tag e37be2fbe8be4726 gpl run_time_ns 2493581964213176 run_cnt 1133532 recursion_misses 1
        loaded_at 2024-04-10T22:33:09+0000 uid 62727
        xlated 312B jited 344B memlock 4096B map_ids 8550445,8550441
        btf_id 8726522
        pids prometheus-ebpf(2224907)

According to bpftool, this program reported 66555800ns of runtime at one point, and then it jumped to 2493581675247416ns just 53s later when we looked at it again.

This is happening only on arm64 nodes in our fleet, on both v6.1.82 and v6.6.25.

We have two services involved:

* ebpf_exporter attaches BPF programs to the kernel and exports Prometheus metrics and OpenTelemetry traces driven by its probes
* bpf_stats_exporter runs bpftool every 53s to capture BPF runtime metrics

The problematic fentry is attached to __tcp_retransmit_skb, but an identical one is also attached to tcp_send_loss_probe, which does not exhibit the same issue:

    SEC("fentry/__tcp_retransmit_skb")
    int BPF_PROG(__tcp_retransmit_skb, struct sock *sk)
    {
        return handle_sk((struct pt_regs *) ctx, sk, sk_kind_tcp_retransmit_skb);
    }

    SEC("fentry/tcp_send_loss_probe")
    int BPF_PROG(tcp_send_loss_probe, struct sock *sk)
    {
        return handle_sk((struct pt_regs *) ctx, sk, sk_kind_tcp_send_loss_probe);
    }

In handle_sk we do a map lookup and an optional ringbuf push. There is no sleeping (I don't think it's even allowed on v6.1). It's interesting that it only happens for the retransmit and not for the loss probe.

The issue manifests some time after we restart ebpf_exporter and reattach the probes. It doesn't happen immediately, as we need to capture metrics 53s apart to produce a visible spike in metrics.

There is no corresponding spike in execution count, only in execution time.

It doesn't happen deterministically. Some ebpf_exporter restarts show it, some don't.

When it does happen after an ebpf_exporter restart, it doesn't keep happening. It happens once and that's it.
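To put the jump in perspective, here is a back-of-the-envelope check of the numbers quoted above, written as small C helpers. The helper names are hypothetical; only the input values come from this report. The CPU bound is a rough ceiling that assumes runtime is accumulated per CPU, so a single scrape interval can add at most about the interval times the number of CPUs.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)
#define NSEC_PER_DAY (UINT64_C(86400) * NSEC_PER_SEC)

/* Runtime added between two scrapes of run_time_ns (hypothetical helper). */
static inline uint64_t runtime_delta_ns(uint64_t before_ns, uint64_t after_ns)
{
    /* The counter only grows while the program stays loaded. */
    assert(after_ns >= before_ns);
    return after_ns - before_ns;
}

/* Whole days of CPU time represented by a delta. */
static inline uint64_t whole_days(uint64_t delta_ns)
{
    return delta_ns / NSEC_PER_DAY;
}

/* How many CPUs would have had to run the program nonstop for the whole
 * interval to account for the delta (rounded up). */
static inline uint64_t cpus_needed(uint64_t delta_ns, uint64_t interval_ns)
{
    assert(interval_ns > 0);
    return (delta_ns + interval_ns - 1) / interval_ns;
}

/* Average cost per invocation implied by a run_time_ns / run_cnt pair. */
static inline uint64_t avg_ns_per_run(uint64_t run_time_ns, uint64_t run_cnt)
{
    return run_cnt ? run_time_ns / run_cnt : 0;
}
```

With the quoted values, the delta is 2493581608691616ns (28 whole days, about 28.9 days of CPU time), which would need about 47,049 CPUs running the program nonstop for the entire 53s window. The bpftool snapshot on its own implies an average of roughly 2.2s per invocation, far more than a map lookup plus an optional ringbuf push should take.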
Maybe recursion_misses plays a role here? We see none for tcp_send_loss_probe. We do see some for the inet_sk_error_report tracepoint, but it doesn't spike like __tcp_retransmit_skb does.

The biggest smoking gun is that it only happens on arm64.

I'm happy to try out patches to figure this one out.
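One way to reason about the "time spikes, count doesn't" symptom and the recursion_misses question is a simplified user-space model of per-program stats accounting, loosely based on our understanding of the trampoline enter/exit helpers: enter returns a start timestamp, or 0 when the program is already active on that CPU (counted as a recursion miss, with the body skipped), and exit charges now - start and bumps the count together, but only when start looks like a real timestamp. All names here (model_prog, model_enter, model_exit, MODEL_NO_START_TIME) are hypothetical, and this is not kernel code. It only shows what the counters would look like if an exit were ever paired with a start value that did not come from its matching enter; it does not claim that this is what happens on arm64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model; not kernel code.
 * Start values at or below this are treated as "no timestamp". */
#define MODEL_NO_START_TIME UINT64_C(1)

struct model_prog {
    uint64_t run_time_ns;      /* analogous to bpftool's run_time_ns */
    uint64_t run_cnt;          /* analogous to bpftool's run_cnt */
    uint64_t recursion_misses; /* analogous to bpftool's recursion_misses */
    unsigned int active;       /* stands in for a per-CPU "running" counter */
};

/* Enter: returns the start timestamp, or 0 if this CPU is already running
 * the program. On 0 the caller skips the body but still calls exit. */
static inline uint64_t model_enter(struct model_prog *p, uint64_t now_ns)
{
    if (++p->active != 1) {
        p->recursion_misses++;
        return 0;
    }
    return now_ns;
}

/* Exit: time and count are charged together, and only for a start value
 * above MODEL_NO_START_TIME. */
static inline void model_exit(struct model_prog *p, uint64_t start_ns, uint64_t now_ns)
{
    assert(p->active > 0); /* every exit pairs with an enter */
    if (start_ns > MODEL_NO_START_TIME) {
        p->run_time_ns += now_ns - start_ns;
        p->run_cnt++;
    }
    p->active--;
}
```

With a correct pairing, a nested (recursive) hit adds a recursion miss and nothing else. If the exit for that skipped run were instead handed some small, unrelated start value, run_cnt would move by one while run_time_ns would grow by roughly the current clock value. Since the kernel timestamps with sched_clock(), a cheap check is to compare the size of the real jump with the node's uptime; a close match would be consistent with this shape, though it would not prove it.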