Received: by 2002:a05:7412:b10a:b0:f3:1519:9f41 with SMTP id az10csp3094321rdb; Mon, 4 Dec 2023 17:18:51 -0800 (PST) X-Google-Smtp-Source: AGHT+IH2i7CvIZXVjVVPVM4PoYCAgEIuF6d2pK6MM/yMluZgY51VpYIkwtCcfUlz8XzVrGs3hrVe X-Received: by 2002:a92:4a0f:0:b0:35d:59a2:6905 with SMTP id m15-20020a924a0f000000b0035d59a26905mr5252199ilf.50.1701739131537; Mon, 04 Dec 2023 17:18:51 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701739131; cv=none; d=google.com; s=arc-20160816; b=NHH0bsvfYBeaT199c/JrQjP5gi5HbNnz/jHPGpffUU/hAYcLAnXFOhij2eLbGQ5GRM eTnNdRM0yUGqpMy9uAtLb1acx2f2URjOrFm0EMewD5QGK4pg2RDMxh40y63ZnV/m+2ZF JkSafwhSLPNDNsDPCJCaAROZCssckqTiV7ZpyTRSI1oUdR+twkjNyGPPy7HnXMvEvI8Y p4uEIzttt6Dn6C5Vf04bn/XQbvuUlk2djVy/frC89w19NPSOuPjM9qhZykJ2pT4CQDny liV4GH3KoTjIA+9QyRwQeAf3BJXaTTRSiccEv6YtyB0Rnf/ZgOrULL7ojyz5cDP6hU5Y 4FZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=7jf8j7x8KL5+FXfau2GI8WqRGtfi/ioqnUgDBbWrCwc=; fh=fQgCWRvTCP/t++nK1ImfffmCDgAeoMpK4D1rM2RjSwU=; b=V9t6eGBR0j1MHT5TSzaVrgYmiI/Y0usCY4DN/GGD759Pe6l/9VppiI4m4kmXFtn6k3 Ee8+N1lMk65p1k38GxKP5QgcUiacbXbwUlFbmTSnBIf+dn+CdFWOLAHFOK1S2kyM4WY/ PuIIZV+yzeRc0OSrj2In9H3ACkOrwqNWqCcVTEN8qo1dcFto6Cl9XhilJltoPJoMZrYR jNIl8LdMsOzxgbybKQZGiKUGyc/xqxk/wtDwlgZn2K0d1gZ2qUPCokzY09mKCwAbDiDM 4Ydv2wNmCBydRTmwrEjOBdoqvf8kgpfhRkke/KIbYsuWcJ4pjZmq+0r2w5XnRdwrYVdK JV+g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=dRXiJkmM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id s22-20020a634516000000b005b7dd20f8c1si8607003pga.20.2023.12.04.17.18.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Dec 2023 17:18:51 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=dRXiJkmM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 2C54C80ADF10; Mon, 4 Dec 2023 17:18:50 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343831AbjLEBSj (ORCPT + 99 others); Mon, 4 Dec 2023 20:18:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47682 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234803AbjLEBSi (ORCPT ); Mon, 4 Dec 2023 20:18:38 -0500 Received: from mail-wr1-x42a.google.com (mail-wr1-x42a.google.com [IPv6:2a00:1450:4864:20::42a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BB27CAF; Mon, 4 Dec 2023 17:18:44 -0800 (PST) Received: by mail-wr1-x42a.google.com with SMTP id ffacd0b85a97d-3316bb1303bso3881382f8f.0; Mon, 04 Dec 2023 17:18:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701739123; x=1702343923; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=7jf8j7x8KL5+FXfau2GI8WqRGtfi/ioqnUgDBbWrCwc=; b=dRXiJkmMgX1yxKSrYMUOs+SPIgbQUY6dIghjFId5ulat6UOA4RkhMxc8WWqlPjT+Tu PRtFxUUYvoKqwRsFygr2Xg2YyAuTyBrDvzDrtrdZLJTIk2w6QaIDzEZxbOVGu5pGNKl+ AaixBZtiyYkbRD1V5DzH+nR6VKQ6qtCnl5pCuwg8a6EdddLRnVbZ20CxeRmh1oMbwdhu s0y4v4/U7vTUMjV/w7D67ZVAzH/Rue5SAHW92TNSDDk/vko1XpbMmv7JbllaVy6vkeR9 MMn7VoGYzNjjGLERnu08MGssvtMxqAFde4c/lRSEdkhj+43i/vzDiI6Vl+Gr+q/7r1q1 tCQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701739123; x=1702343923; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=7jf8j7x8KL5+FXfau2GI8WqRGtfi/ioqnUgDBbWrCwc=; b=oK5F5Kkfiq9l7d6DVdTSyHV4+lpdKKr4YKsmgFE9C7xYruzzcdno7K2rKQDI/gSgFL B7JwXjJDPsL5mUUvWrBFUBghsegjNMUKYYeWBTDkh5MHPQCZsk5KR9pGwcOzJ2VxUpt8 Ao9DmvLIlkIF0ETCdM14bDT+ymtMWVfWbzhZaVxghn6QAH6eEt8e+AJ7Bs1a4muOIvXs Cs3Kn0EH58KaZ6rpGaBI9gPHMLucVNAQ9MnG6vBusp7cs/Z2/sKlTjvYpw6Lz/9NIZxS Ov7XrX8vMymFr3PCVQn6/irSwsh5yttOSVmvsL5SaVxMLpzRKCJKhQetUkB/4AhFW/cA aGxw== X-Gm-Message-State: AOJu0Yz0LvDNEWrDgNTG9v9IO3FRv4R/zjSyKvz7P/rRJFc2hc4q5ehT HgdL1depcnAvbaurrJI7Ub3rDthw90DjSLtJRxI= X-Received: by 2002:adf:e78a:0:b0:333:2fd2:812c with SMTP id n10-20020adfe78a000000b003332fd2812cmr3734853wrm.73.1701739122768; Mon, 04 Dec 2023 17:18:42 -0800 (PST) MIME-Version: 1.0 References: <20231130133630.192490507@infradead.org> <20231130134204.136058029@infradead.org> <20231204091334.GM3818@noisy.programming.kicks-ass.net> <20231204111128.GV8262@noisy.programming.kicks-ass.net> <20231204125239.GA1319@noisy.programming.kicks-ass.net> <20231204181614.GA7299@noisy.programming.kicks-ass.net> <20231204183354.GC7299@noisy.programming.kicks-ass.net> In-Reply-To: <20231204183354.GC7299@noisy.programming.kicks-ass.net> From: Alexei Starovoitov Date: Mon, 4 Dec 2023 17:18:31 -0800 Message-ID: Subject: Re: [PATCH v2 2/2] x86/cfi,bpf: Fix BPF JIT call To: Peter Zijlstra Cc: Jiri Olsa , Song Liu , Song Liu , Paul Walmsley , Palmer Dabbelt , Albert Ou , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , X86 ML , "H. Peter Anvin" , "David S. Miller" , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Arnd Bergmann , Sami Tolvanen , Kees Cook , Nathan Chancellor , Nick Desaulniers , linux-riscv , LKML , Network Development , bpf , linux-arch , clang-built-linux , Josh Poimboeuf , Joao Moreira , Mark Rutland Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Mon, 04 Dec 2023 17:18:50 -0800 (PST) On Mon, Dec 4, 2023 at 10:34=E2=80=AFAM Peter Zijlstra wrote: > > > TL;DR, I think this is a pre-existing problem with kCFI + eBPF and not > caused by my patches. It's an old issue indeed. To workaround I just did: +__nocfi static long bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_fn, void *callback_ctx, u64 flags) to proceed further. test_progs passed few more tests, but then it hit: [ 13.965472] CFI failure at tcp_set_ca_state+0x51/0xd0 (target: 0xffffffffa02050d6; expected type: 0x3a47ac32) [ 13.966351] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 13.966752] CPU: 3 PID: 2142 Comm: test_progs Tainted: G O 6.7.0-rc3-00705-g421defd1bea0-dirty #5246 [ 13.967552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 13.968421] RIP: 0010:tcp_set_ca_state+0x51/0xd0 [ 13.968751] Code: 70 40 ff 84 c0 74 49 48 8b 83 60 07 00 00 4c 8b 58 10 4d 85 db 74 1b 40 0f b6 f5 48 89 df 41 ba ce 53 b8 c5 45 03 53 f1 74 02 <0f> 0b 2e e8 c7 ee 31 00 0f b6 83 90 07 00 00 40 80 e5 1f 24 e0 40 [ 13.975460] Call Trace: [ 13.975640] [ 13.975791] ? __die_body+0x68/0xb0 [ 13.976062] ? die+0xa4/0xd0 [ 13.976273] ? do_trap+0xa5/0x180 [ 13.976513] ? tcp_set_ca_state+0x51/0xd0 [ 13.976800] ? do_error_trap+0xb6/0x100 [ 13.977076] ? tcp_set_ca_state+0x51/0xd0 [ 13.977360] ? tcp_set_ca_state+0x51/0xd0 [ 13.977644] ? handle_invalid_op+0x2c/0x40 [ 13.977934] ? tcp_set_ca_state+0x51/0xd0 [ 13.978222] ? exc_invalid_op+0x38/0x60 [ 13.978497] ? asm_exc_invalid_op+0x1a/0x20 [ 13.978798] ? tcp_set_ca_state+0x51/0xd0 [ 13.979087] tcp_v6_syn_recv_sock+0x45c/0x6c0 [ 13.979401] tcp_check_req+0x497/0x590 [ 13.979671] tcp_v6_rcv+0x728/0xce0 [ 13.979923] ? raw6_local_deliver+0x63/0x350 [ 13.980257] ip6_protocol_deliver_rcu+0x2f6/0x560 [ 13.980596] ? ip6_input_finish+0x59/0x140 [ 13.980887] ? NF_HOOK+0x29/0x1d0 [ 13.981136] ip6_input_finish+0xcb/0x140 [ 13.981415] ? __cfi_ip6_input_finish+0x10/0x10 [ 13.981738] NF_HOOK+0x177/0x1d0 [ 13.981970] ? rcu_is_watching+0x10/0x40 [ 13.982279] ? lock_release+0x35/0x2e0 [ 13.982547] ? lock_release+0x35/0x2e0 [ 13.982822] ? NF_HOOK+0x29/0x1d0 [ 13.983064] ? __cfi_ip6_rcv_finish+0x10/0x10 [ 13.983409] NF_HOOK+0x177/0x1d0 [ 13.983664] ? ip6_rcv_core+0x50/0x6c0 [ 13.983956] ? process_backlog+0x132/0x290 [ 13.984264] ? process_backlog+0x132/0x290 [ 13.984557] __netif_receive_skb+0x5c/0x160 [ 13.984856] process_backlog+0x19e/0x290 [ 13.985140] __napi_poll+0x3f/0x1f0 [ 13.985402] net_rx_action+0x193/0x330 [ 13.985672] __do_softirq+0x14d/0x3ea [ 13.985963] ? do_softirq+0x7f/0xb0 [ 13.986243] ? __dev_queue_xmit+0x5b/0xd50 [ 13.986563] ? ip6_finish_output2+0x222/0x7a0 [ 13.986906] do_softirq+0x7f/0xb0 The stack trace doesn't have any bpf, but it's a bpf issue too. Here tcp_set_ca_state() calls icsk->icsk_ca_ops->set_state(sk, ca_state); which calls bpf prog via bpf trampoline. re: bpf_jit_binary_pack_hdr(). since cfi_mode is __ro_after_init we don't need to waste cfi_offset variable in prog->aux and in jit_context. How about +int get_cfi_offset(void) +{ + switch (cfi_mode) { + case CFI_FINEIBT: + return 16; + case CFI_KCFI: +#ifdef CONFIG_CALL_PADDING + return 16; +#else + return 5; +#endif + default: + return 0; + } +} + struct bpf_binary_header * bpf_jit_binary_pack_hdr(const struct bpf_prog *fp) { - unsigned long real_start =3D (unsigned long)fp->bpf_func; + unsigned long real_start =3D (unsigned long)fp->bpf_func - get_cfi_offset(); and have __weak version of get_cfi_offset() in bpf/core.c that returns 0 and non-weak in arch/x86 like above? Similarly remove prog_offset from jit_context and undo: ctx->prog_offset =3D emit_prologue(...) to keep it as 'static void emit_prologue' since cfi offset is fixed at early boot and the same for all progs. Separately we need to deal with bpf_for_each_array_elem() which doesn't look easy. And fix tcp_set_ca_state() as well (which is even harder). Just to see where places like these are I did: +__nocfi BPF_CALL_4(bpf_loop, u32, nr_loops, void *, callback_fn, void *, callback_= ctx, +__nocfi static long bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_fn, +__nocfi static enum hrtimer_restart bpf_timer_cb(struct hrtimer *hrtimer) +__nocfi static int __bpf_rbtree_add(struct bpf_rb_root *root, +__nocfi BPF_CALL_4(bpf_user_ringbuf_drain, struct bpf_map *, map, +__nocfi void tcp_set_ca_state(struct sock *sk, const u8 ca_state) +__nocfi void tcp_init_congestion_control(struct sock *sk) +__nocfi void tcp_enter_loss(struct sock *sk) +__nocfi static void tcp_cong_avoid(struct sock *sk, u32 ack, u32 acked) +__nocfi static inline void tcp_in_ack_event(struct sock *sk, u32 flags) and more... Which is clearly not a direction to go. Instead of annotating callers is there a way to say that all bpf_callback_t calls are nocfi? I feel the patches scratched the iceberg.