Date: Wed, 25 Sep 2019 16:43:12 -0700
In-Reply-To: <20190925234312.94063-1-allanzhang@google.com>
Message-Id: <20190925234312.94063-2-allanzhang@google.com>
References: <20190925234312.94063-1-allanzhang@google.com>
Subject: [PATCH 1/1] bpf: Fix bpf_event_output re-entry issue
From: Allan Zhang
To: daniel@iogearbox.net, songliubraving@fb.com, netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, Allan Zhang, Stanislav Fomichev, Eric Dumazet

A BPF_PROG_TYPE_SOCK_OPS program can re-enter bpf_event_output() because it can be called from both atomic and non-atomic contexts, and there is no bpf_prog_active guard to prevent this from happening. This patch enables 3 levels of nesting to support normal, irq and nmi contexts.

The issue can easily be reproduced by running neper in crr mode with 100 flows and 10 threads from the neper client side.
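To make the re-entry problem concrete, here is a minimal single-threaded user-space sketch of the pre-patch layout. It is a model, not kernel code: a plain global stands in for the single per-CPU bpf_misc_sd buffer, and a recursive call stands in for an interrupt that re-enters bpf_event_output() on the same CPU. The names single_event_output and struct sample are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the pre-patch code: one scratch buffer shared
 * by every caller on a CPU (a global stands in for the per-CPU
 * bpf_misc_sd), and a recursive call stands in for an interrupt that
 * re-enters bpf_event_output() before the outer call has finished. */
struct sample {
	char payload[32];
};

static struct sample shared_sd;

/* Fills the shared buffer with 'tag', re-enters 'reenter' more times,
 * then returns 0 if the buffer still holds 'tag', or -1 if a nested
 * call overwrote it while this call was still using it. */
static int single_event_output(const char *tag, int reenter)
{
	snprintf(shared_sd.payload, sizeof(shared_sd.payload), "%s", tag);

	if (reenter > 0) {
		char inner[32];

		/* Simulated interrupt writing its own sample. */
		snprintf(inner, sizeof(inner), "irq%d", reenter);
		single_event_output(inner, reenter - 1);
	}

	return strcmp(shared_sd.payload, tag) == 0 ? 0 : -1;
}
```

Without re-entry, single_event_output("task", 0) returns 0; with one simulated interrupt, single_event_output("task", 1) returns -1 because the nested call overwrote the shared buffer first.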
Here is the whole stack dump:

[ 515.228898] WARNING: CPU: 20 PID: 14686 at kernel/trace/bpf_trace.c:549 bpf_event_output+0x1f9/0x220
[ 515.228903] CPU: 20 PID: 14686 Comm: tcp_crr Tainted: G W 4.15.0-smp-fixpanic #44
[ 515.228904] Hardware name: Intel TBG,ICH10/Ikaria_QC_1b, BIOS 1.22.0 06/04/2018
[ 515.228905] RIP: 0010:bpf_event_output+0x1f9/0x220
[ 515.228906] RSP: 0018:ffff9a57ffc03938 EFLAGS: 00010246
[ 515.228907] RAX: 0000000000000012 RBX: 0000000000000001 RCX: 0000000000000000
[ 515.228907] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff836b0f80
[ 515.228908] RBP: ffff9a57ffc039c8 R08: 0000000000000004 R09: 0000000000000012
[ 515.228908] R10: ffff9a57ffc1de40 R11: 0000000000000000 R12: 0000000000000002
[ 515.228909] R13: ffff9a57e13bae00 R14: 00000000ffffffff R15: ffff9a57ffc1e2c0
[ 515.228910] FS: 00007f5a3e6ec700(0000) GS:ffff9a57ffc00000(0000) knlGS:0000000000000000
[ 515.228910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 515.228911] CR2: 0000537082664fff CR3: 000000061fed6002 CR4: 00000000000226f0
[ 515.228911] Call Trace:
[ 515.228913]
[ 515.228919] [] bpf_sockopt_event_output+0x3b/0x50
[ 515.228923] [] ? bpf_ktime_get_ns+0xe/0x10
[ 515.228927] [] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[ 515.228930] [] ? tcp_init_transfer+0x125/0x150
[ 515.228933] [] ? tcp_finish_connect+0x89/0x110
[ 515.228936] [] ? tcp_rcv_state_process+0x704/0x1010
[ 515.228939] [] ? sk_filter_trim_cap+0x53/0x2a0
[ 515.228942] [] ? tcp_v6_inbound_md5_hash+0x6f/0x1d0
[ 515.228945] [] ? tcp_v6_do_rcv+0x1c0/0x460
[ 515.228947] [] ? tcp_v6_rcv+0x9f8/0xb30
[ 515.228951] [] ? ip6_route_input+0x190/0x220
[ 515.228955] [] ? ip6_protocol_deliver_rcu+0x6d/0x450
[ 515.228958] [] ? ip6_rcv_finish+0xb6/0x170
[ 515.228961] [] ? ip6_protocol_deliver_rcu+0x450/0x450
[ 515.228963] [] ? ipv6_rcv+0x61/0xe0
[ 515.228966] [] ? ipv6_list_rcv+0x330/0x330
[ 515.228969] [] ? __netif_receive_skb_one_core+0x5b/0xa0
[ 515.228972] [] ? __netif_receive_skb+0x21/0x70
[ 515.228975] [] ? process_backlog+0xb2/0x150
[ 515.228978] [] ? net_rx_action+0x16f/0x410
[ 515.228982] [] ? __do_softirq+0xdd/0x305
[ 515.228986] [] ? irq_exit+0x9c/0xb0
[ 515.228989] [] ? smp_call_function_single_interrupt+0x65/0x120
[ 515.228991] [] ? call_function_single_interrupt+0x81/0x90
[ 515.228992]
[ 515.228996] [] ? io_serial_in+0x20/0x20
[ 515.229000] [] ? console_unlock+0x230/0x490
[ 515.229003] [] ? vprintk_emit+0x26a/0x2a0
[ 515.229006] [] ? vprintk_default+0x1f/0x30
[ 515.229008] [] ? vprintk_func+0x35/0x70
[ 515.229011] [] ? printk+0x50/0x66
[ 515.229013] [] ? bpf_event_output+0xb7/0x220
[ 515.229016] [] ? bpf_sockopt_event_output+0x3b/0x50
[ 515.229019] [] ? bpf_ktime_get_ns+0xe/0x10
[ 515.229023] [] ? release_sock+0x97/0xb0
[ 515.229026] [] ? tcp_recvmsg+0x31a/0xda0
[ 515.229029] [] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[ 515.229032] [] ? tcp_set_state+0x191/0x1b0
[ 515.229035] [] ? tcp_disconnect+0x2e/0x600
[ 515.229038] [] ? tcp_close+0x3eb/0x460
[ 515.229040] [] ? inet_release+0x42/0x70
[ 515.229043] [] ? inet6_release+0x39/0x50
[ 515.229046] [] ? __sock_release+0x4d/0xd0
[ 515.229049] [] ? sock_close+0x15/0x20
[ 515.229052] [] ? __fput+0xe7/0x1f0
[ 515.229055] [] ? ____fput+0xe/0x10
[ 515.229058] [] ? task_work_run+0x82/0xb0
[ 515.229061] [] ? exit_to_usermode_loop+0x7e/0x11f
[ 515.229064] [] ? do_syscall_64+0x111/0x130
[ 515.229067] [] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: a5a3a828cd00 ("bpf: add perf event notificaton support for sock_ops")

Effort: BPF
Signed-off-by: Allan Zhang
Reviewed-by: Stanislav Fomichev
Reviewed-by: Eric Dumazet
---
 kernel/trace/bpf_trace.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1255d14576..3e38a010003c 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -500,14 +500,17 @@ static const struct bpf_func_proto bpf_perf_event_output_proto = {
 	.arg5_type	= ARG_CONST_SIZE_OR_ZERO,
 };
 
-static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs);
-static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd);
+static DEFINE_PER_CPU(int, bpf_event_output_nest_level);
+struct bpf_nested_pt_regs {
+	struct pt_regs regs[3];
+};
+static DEFINE_PER_CPU(struct bpf_nested_pt_regs, bpf_pt_regs);
+static DEFINE_PER_CPU(struct bpf_trace_sample_data, bpf_misc_sds);
 
 u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 		     void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy)
 {
-	struct perf_sample_data *sd = this_cpu_ptr(&bpf_misc_sd);
-	struct pt_regs *regs = this_cpu_ptr(&bpf_pt_regs);
+	int nest_level = this_cpu_inc_return(bpf_event_output_nest_level);
 	struct perf_raw_frag frag = {
 		.copy		= ctx_copy,
 		.size		= ctx_size,
@@ -522,12 +525,25 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
 			.data	= meta,
 		},
 	};
+	struct perf_sample_data *sd;
+	struct pt_regs *regs;
+	u64 ret;
+
+	if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(bpf_misc_sds.sds))) {
+		ret = -EBUSY;
+		goto out;
+	}
+	sd = this_cpu_ptr(&bpf_misc_sds.sds[nest_level - 1]);
+	regs = this_cpu_ptr(&bpf_pt_regs.regs[nest_level - 1]);
 
 	perf_fetch_caller_regs(regs);
 	perf_sample_data_init(sd, 0, 0);
 	sd->raw = &raw;
 
-	return __bpf_perf_event_output(regs, map, flags, sd);
+	ret = __bpf_perf_event_output(regs, map, flags, sd);
+out:
+	this_cpu_dec(bpf_event_output_nest_level);
+	return ret;
 }
 
 BPF_CALL_0(bpf_get_current_task)
-- 
2.23.0.351.gc4317032e6-goog
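The per-level scheme in the diff above can be modelled the same way. This is a hypothetical single-threaded sketch, not kernel code: plain globals stand in for the per-CPU variables, MAX_NEST mirrors the three slots (normal, irq, nmi), and nested_event_output, rejected and struct sample are illustrative names only. Each call claims the slot for its own nesting level, so nested calls never share a buffer, and a fourth level is refused with -EBUSY, which corresponds to the WARN_ON_ONCE() path.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the patched scheme. Globals stand in for
 * per-CPU data; MAX_NEST mirrors the three slots of the nested arrays. */
#define MAX_NEST 3

struct sample {
	char payload[32];
};

static int nest_level;              /* models bpf_event_output_nest_level */
static struct sample sds[MAX_NEST]; /* models bpf_misc_sds.sds[] */
static int rejected;                /* counts calls refused with -EBUSY */

/* Claims the buffer for the current nesting level, re-enters 'reenter'
 * more times (simulated interrupts), then returns 0 if its buffer is
 * intact, -1 if it was clobbered, or -EBUSY if no slot was free. */
static int nested_event_output(const char *tag, int reenter)
{
	int level = ++nest_level;   /* like this_cpu_inc_return() */
	struct sample *sd;
	int ret;

	if (level > MAX_NEST) {
		rejected++;
		ret = -EBUSY;
		goto out;
	}
	sd = &sds[level - 1];
	snprintf(sd->payload, sizeof(sd->payload), "%s", tag);

	if (reenter > 0) {
		char inner[32];

		/* Simulated interrupt: it gets the next slot, not ours. */
		snprintf(inner, sizeof(inner), "irq%d", reenter);
		nested_event_output(inner, reenter - 1);
	}

	ret = strcmp(sd->payload, tag) == 0 ? 0 : -1;
out:
	nest_level--;               /* like this_cpu_dec() */
	assert(nest_level >= 0);
	return ret;
}
```

With two simulated interrupts (three levels), nested_event_output("task", 2) returns 0 and nothing is rejected. With three, the innermost call finds no free slot and rejected becomes 1, while every outer call still returns 0 because its own slot was never touched.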