Received: by 2002:a05:7412:8d10:b0:f3:1519:9f41 with SMTP id bj16csp1270777rdb; Wed, 6 Dec 2023 13:39:57 -0800 (PST) X-Google-Smtp-Source: AGHT+IEWamBdEJtSZpYfjx8CBr8W5bN5CnTuHrlDvgnsCOqgDyMwzCVnt6zCSU1gqmaT6+skMMDg X-Received: by 2002:a05:6358:7214:b0:170:4205:a968 with SMTP id h20-20020a056358721400b001704205a968mr1791713rwa.11.1701898797465; Wed, 06 Dec 2023 13:39:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701898797; cv=none; d=google.com; s=arc-20160816; b=woTK+f9PYEGw1049ogM9Oipl+bLmS2sTg28cyAVmIwV/5ThI0L6T5ioM0i0hT4/WQ7 0aQUdgv15Wuvy8yJ4rJjAUI38qyaN67iCyDWeGmEFdX99A82GNssTkvYFFtH8Nwvd63a rrPlL9BVPIlK7xDojT2HSOMs9SDORw/jbp6cvSLAvTDTtmWOGr4L4itBTBZ/BLWgejgk TFCDXDe2tzjPQOMsr0I3ObpGAUGR6HQT4H2lmK7s4UGYVKLdKF8IGktoIb7khzP7snoQ PsYLVBmHu3dI0x2YCYEaDCDytZRyA/Iels78YM5yXq/6qZlF1F3XdAwC2mM/edT+KLi6 jiMg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=QjTiKTTNoFwqsK54uc+2vEdGntF3HRgfWaG2KiC6D/8=; fh=fQgCWRvTCP/t++nK1ImfffmCDgAeoMpK4D1rM2RjSwU=; b=mSOJBsXL8sl71xY+BBfpfzUTyLFMbPzqnZ11LhG0ukxjg8g8GlVNVunzXVKKah17y0 06xrYpitm+HmM+VLEhMJA0e90rEGNivRLOgeB25m0kTWLZskApGm0xgpR7Jzzm6g1Hgu wM0Et5jPmvcTfdI4wD7iluOBawUxD8j2KJjAonZ6ixxDFb+BTUU/rYOE91VPIcH/RbwT dTcORguoeywsKqli93dVfh3CDPT+W0JMTQbhhzG1uRtYMOGyb8MjD/NxlihmZTk/pAWW SW7dp3uXeAeGquVMDBuKbNaSrwpANonZoNAfuq6Kv45I4Y5KP7q5M4NrSrtivTK/Z6FR UKPA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=V4pvCceC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from snail.vger.email (snail.vger.email. [23.128.96.37]) by mx.google.com with ESMTPS id h9-20020a636c09000000b005bdbf2ac2d2si503521pgc.85.2023.12.06.13.39.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Dec 2023 13:39:57 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20230601 header.b=V4pvCceC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 16D458025B71; Wed, 6 Dec 2023 13:39:56 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229679AbjLFVjp (ORCPT + 99 others); Wed, 6 Dec 2023 16:39:45 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57810 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229455AbjLFVjn (ORCPT ); Wed, 6 Dec 2023 16:39:43 -0500 Received: from mail-pl1-x62e.google.com (mail-pl1-x62e.google.com [IPv6:2607:f8b0:4864:20::62e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BB6AD1; Wed, 6 Dec 2023 13:39:49 -0800 (PST) Received: by mail-pl1-x62e.google.com with SMTP id d9443c01a7336-1d03bcf27e9so2126685ad.0; Wed, 06 Dec 2023 13:39:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1701898788; x=1702503588; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=QjTiKTTNoFwqsK54uc+2vEdGntF3HRgfWaG2KiC6D/8=; b=V4pvCceCY5HpU9DplcgINfG3ZeBbsdOSu87k9nmgDc5VaVhNEjYWDbDadgBw0XVYCz XzcT15uKLm+E+vTMXXrrwQDGxZPX8Qn+lbqR/TfvI6/uU7uxDwU3dLjXFtxo2ivafJbG iXSWzVpoYLYX8OOenSts/njqC/lW8HMHNTTdHswlVVD+duciLhGMZwvLuM2510Kpp7G6 tZBdS6HqRNCOhIXPD3eS4USKqJVOFB9LU8IJBhuaCE7SJN1DeVNTdLcJwg+ycKtP8Gb3 VUOdPQ2VnCgcygj4t6dXxDm9qqkLeepK6Rn/Sv67JazwNvLTFvet1ifA6F4sDr0ACgaS 9WBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701898788; x=1702503588; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=QjTiKTTNoFwqsK54uc+2vEdGntF3HRgfWaG2KiC6D/8=; b=AJohKknTlhG6FKTJpLe4P2JrSF516euN6FDOgtB02ffVzf6BCTL/g9PlGJfNgLF4Jm P2eJhZjR9BQPCxGzM+tKWo81R85lxBRlSdZmych4eTB+8WSpL4KWFszrSwWjZN2utQBJ 6zgO8fK+SdTcDw/r3aW2sriV/hyTuQO+somDcb5nmyJeplpyYQKpDq3+qQPdBDmpMdQG 5oA6OmQvsTZqC7HGcLb8mgKXTYzTg161XMT89OFva0moMpNNFMwg9/G89su2bDD6l+7k MiM2HcY1/f6rqLEpRCvF21NVwLwbF7MEVf6/iB2Vw2VwMByDUg0bkwmvE16qK/vazVlI MijQ== X-Gm-Message-State: AOJu0Yy+Gr1x57j4YIHzAFXde/85km5MUK7vJMzeLP+tHIv35ZAK7ma5 2jIKwXyCk2H1P6VKvz03QF8= X-Received: by 2002:a17:902:b90c:b0:1d0:b926:bbcd with SMTP id bf12-20020a170902b90c00b001d0b926bbcdmr1379659plb.54.1701898788518; Wed, 06 Dec 2023 13:39:48 -0800 (PST) Received: from MacBook-Pro-49.local ([2620:10d:c090:500::4:8a5f]) by smtp.gmail.com with ESMTPSA id t12-20020a1709028c8c00b001d077c9a9acsm268249plo.185.2023.12.06.13.39.44 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Dec 2023 13:39:48 -0800 (PST) Date: Wed, 6 Dec 2023 13:39:43 -0800 From: Alexei Starovoitov To: Peter Zijlstra Cc: Jiri Olsa , Song Liu , Song Liu , Paul Walmsley , Palmer Dabbelt , Albert Ou , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , X86 ML , "H. Peter Anvin" , "David S. Miller" , David Ahern , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Arnd Bergmann , Sami Tolvanen , Kees Cook , Nathan Chancellor , Nick Desaulniers , linux-riscv , LKML , Network Development , bpf , linux-arch , clang-built-linux , Josh Poimboeuf , Joao Moreira , Mark Rutland Subject: Re: [PATCH v2 2/2] x86/cfi,bpf: Fix BPF JIT call Message-ID: References: <20231204091334.GM3818@noisy.programming.kicks-ass.net> <20231204111128.GV8262@noisy.programming.kicks-ass.net> <20231204125239.GA1319@noisy.programming.kicks-ass.net> <20231204181614.GA7299@noisy.programming.kicks-ass.net> <20231204183354.GC7299@noisy.programming.kicks-ass.net> <20231206163814.GB36423@noisy.programming.kicks-ass.net> <20231206183713.GA35897@noisy.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231206183713.GA35897@noisy.programming.kicks-ass.net> X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Wed, 06 Dec 2023 13:39:56 -0800 (PST) On Wed, Dec 06, 2023 at 07:37:13PM +0100, Peter Zijlstra wrote: > On Wed, Dec 06, 2023 at 05:38:14PM +0100, Peter Zijlstra wrote: > > On Mon, Dec 04, 2023 at 05:18:31PM -0800, Alexei Starovoitov wrote: > > > > > [ 13.978497] ? asm_exc_invalid_op+0x1a/0x20 > > > [ 13.978798] ? tcp_set_ca_state+0x51/0xd0 > > > [ 13.979087] tcp_v6_syn_recv_sock+0x45c/0x6c0 > > > [ 13.979401] tcp_check_req+0x497/0x590 > > > > > The stack trace doesn't have any bpf, but it's a bpf issue too. > > > Here tcp_set_ca_state() calls > > > icsk->icsk_ca_ops->set_state(sk, ca_state); > > > which calls bpf prog via bpf trampoline. > > > > > > > > Specifically, I think this is > > tools/testing/selftests/bpf/progs/bpf_cubic.c, which has: > > > > .set_state = (void *)bpf_cubic_state, > > > > which comes from: > > > > BPF_STRUCT_OPS(bpf_cubic_state, struct sock *sk, __u8 *new_state) > > > > which then wraps: > > > > BPF_PROG() > > > > which ends up generating: > > > > static __always_inline ___bpf_cubic_state(unsigned long long *ctx, struct sock *sk, __u8 *new_state) > > { > > ... > > } > > > > void bpf_cubic_state(unsigned long long *ctx) > > { > > return ____bpf_cubic_state(ctx, ctx[0], ctx[1]); > > } Yep. That's correct. > > I think this then uses arch_prepare_bpf_trampoline(), but I'm entirely > > lost how this all comes together, because the way I understand it the > > whole bpf_trampoline is used to hook into an ftrace __fentry hook. > > > > And a __fentry hook is very much not a function pointer. Help!?!? > > kernel/bpf/bpf_struct_ops.c:bpf_struct_ops_prepare_trampoline() > > And yeah, it seems to use the ftrace trampoline for indirect calls here, > *sigh*. Not quite. bpf_struct_ops_prepare_trampoline() prepares a trampoline that does conversion from native calling convention to bpf calling convention. We could have optimized it for the case of x86-64 and num_args <= 5 and it would be a nop trampoline, but so far it's generic and works on x86-64, arm64, etc. There were patches posted to make it work on 32-bit archs too (not landed yet). All native args are stored one by one into u64 ctx[0], u64 ctx[1], on stack and then bpf prog is called with a single ctx pointer. For example for the case of struct tcp_congestion_ops { void (*set_state)(struct sock *sk, u8 new_state); } The translation of 'sk' into ctx[0] is based on 'struct btf_func_model' which is discovered from 'set_state' func prototype as stored in BTF. So the trampoline for set_state, copies %rdi into ctx[0] and %rsi into ctx[1] and _directly_ calls bpf_cubic_state(). Note arch_prepare_bpf_trampoline() emits ENDBR as the first insn, because the pointer to this trampoline is directly stored in 'struct tcp_congestion_ops'. Later from TCP stack point of view 'icsk_ca_ops' are exactly the same for built-in cong controls (CCs), kernel module's CCs and bpf-based CCs. All calls to struct tcp_congestion_ops callbacks are normal indirect calls. Different CCs have different struct tcp_congestion_ops with their own pointers to functions, of course. There is no ftrace here at all. No .text live patching either. All is ok until kCFI comes into picture. Here we probably need to teach arch_prepare_bpf_trampoline() to emit different __kcfi_typeid depending on kernel function proto, so that caller hash checking logic won't be tripped. I suspect that requires to reverse engineer an algorithm of computing kcfi from clang. other ideas? > > The other case: > > > > For tools/testing/selftests/bpf/progs/bloom_filter_bench.c we have: > > > > bpf_for_each_map_elem(&array_map, bloom_callback, &data, 0); > > > > and here bloom callback appears like a normal function: > > > > static __u64 > > bloom_callback(struct bpf_map *map, __u32 *key, void *val, > > struct callback_ctx *data) > > > > > > But what do functions looks like in the JIT? What's the actual address > > that's then passed into the helper function. Given this seems to work > > without kCFI, it should at least have an ENDBR, but there's only 3 of > > those afaict: Right. That is very different from struct_ops/trampoline. There is no trampoline here at all. A bpf prog is JITed as normal, but its prototype is technically bpf_callback_t. We do the same JITing for all progs. Both main entry prog and subprograms. They all are treated as accepting 5 u64 args and returning single u64. For the main prog the prototype: bpf_prog_runX(u64 *regs, const struct bpf_insn *insn) is partially true, since 2nd arg is never used and the 1st arg is 'ctx'. So from x86-64 JIT pov there is no difference whether it's single 8-byte arg or five 8-byte args. In the case of bpf_for_each_map_elem() the 'bloom_callback' is a subprog of bpf_callback_t type. So the kernel is doing: ret = callback_fn((u64)(long)map, (u64)(long)&key, (u64)(long)val, (u64)(long)callback_ctx, 0); and that works on all archs including 32-bit. The kernel is doing conversion from native calling convention to bpf calling convention and for lucky archs like x86-64 the conversion is a true nop. It's a plain indirect call to JITed bpf prog. Note there is no interpreter support here. This works on archs with JITs only. No ftrace and no trampoline. This case is easier to make work with kCFI. The JIT will use: cfi_bpf_hash: .long __kcfi_typeid___bpf_prog_runX like your patch already does. And will use extern u64 __bpf_callback_fn(u64, u64, u64, u64, u64); cfi_bpf_subprog_hash: .long __kcfi_typeid___bpf_callback_fn to JIT all subprogs. See bpf_is_subprog(). Which shouldn't be too difficult to add. We'd need to tweak the verifier to make sure bpf_for_each_map_elem and friends never callback the main subprog which is technically the case today. Just need to add more guardrails. I can work on this. btw there are two patchsets in progress that will touch core bits of JITs. This one: https://patchwork.kernel.org/project/netdevbpf/cover/20231201190654.1233153-1-song@kernel.org/ and this one: https://patchwork.kernel.org/project/netdevbpf/cover/20231011152725.95895-1-hffilwlqm@gmail.com/ so do you mind resending your current set with get_cfi_offset() change and I can land it into bpf-next, so we can fix one bug at a time, build on top, and avoid conflicts? The more we dig the more it looks like that the follow up you planned to do on top of this set isn't going to happen soon. So should be ok going through bpf-next and then you can follow up with x86 things after merge window?