Date: Tue, 28 Nov 2023 08:56:55 -0800
Subject: Re: [PATCH ipsec-next v1 6/7] bpf: selftests: test_tunnel: Disable CO-RE relocations
To: Daniel Xu
Cc: Eduard Zingerman, Alexei Starovoitov, Shuah Khan, Daniel Borkmann, Andrii Nakryiko, Alexei Starovoitov, Steffen Klassert, antony.antony@secunet.com, Mykola Lysenko, Martin KaFai Lau, Song Liu, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, bpf, "open list:KERNEL SELFTEST FRAMEWORK", LKML, devel@linux-ipsec.org, Network Development
From: Yonghong Song
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/28/23 11:17 AM, Daniel Xu wrote:
> On Tue, Nov 28, 2023 at 10:13:50AM -0600, Daniel Xu wrote:
>> On Mon, Nov 27, 2023 at 08:06:01PM -0800, Yonghong Song wrote:
>>> On 11/27/23 7:01 PM, Daniel Xu wrote:
>>>> On Mon, Nov 27, 2023 at 02:45:11PM -0600, Daniel Xu wrote:
>>>>> On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote:
>>>>>> On 11/27/23 12:44 AM, Yonghong Song wrote:
>>>>>>> On 11/26/23 8:52 PM, Eduard Zingerman wrote:
>>>>>>>> On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote:
>>>>>>>> [...]
>>>>>>>>>> Tbh I'm not sure. This test passes with preserve_static_offset
>>>>>>>>>> because it suppresses preserve_access_index. In general clang
>>>>>>>>>> translates bitfield access to a set of IR statements like:
>>>>>>>>>>
>>>>>>>>>>    C:
>>>>>>>>>>      struct foo {
>>>>>>>>>>        unsigned _;
>>>>>>>>>>        unsigned a:1;
>>>>>>>>>>        ...
>>>>>>>>>>      };
>>>>>>>>>>      ... foo->a ...
>>>>>>>>>>
>>>>>>>>>>    IR:
>>>>>>>>>>      %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1
>>>>>>>>>>      %bf.load = load i8, ptr %a, align 4
>>>>>>>>>>      %bf.clear = and i8 %bf.load, 1
>>>>>>>>>>      %bf.cast = zext i8 %bf.clear to i32
>>>>>>>>>>
>>>>>>>>>> With preserve_static_offset the getelementptr+load are replaced by a
>>>>>>>>>> single statement which is preserved as-is till code generation,
>>>>>>>>>> thus load with align 4 is preserved.
>>>>>>>>>>
>>>>>>>>>> On the other hand, I'm not sure that clang guarantees that load or
>>>>>>>>>> stores used for bitfield access would be always aligned according to
>>>>>>>>>> verifier expectations.
>>>>>>>>>>
>>>>>>>>>> I think we should check if there are some clang knobs that prevent
>>>>>>>>>> generation of unaligned memory access. I'll take a look.
>>>>>>>>> Is there a reason to prefer fixing in compiler? I'm not opposed to it,
>>>>>>>>> but the downside to compiler fix is it takes years to propagate and
>>>>>>>>> sprinkles ifdefs into the code.
>>>>>>>>>
>>>>>>>>> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()?
>>>>>>>> Well, the contraption below passes verification, tunnel selftest
>>>>>>>> appears to work. I might have messed up some shifts in the macro,
>>>>>>>> though.
>>>>>>> I didn't test it. But from high level it should work.
>>>>>>>
>>>>>>>> Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular
>>>>>>>> field access might be unaligned.
>>>>>>> clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet
>>>>>>> alignment requirement. This is also required for BPF_CORE_READ_BITFIELD.
>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>>>>>>> b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>>>>>>> index 3065a716544d..41cd913ac7ff 100644
>>>>>>>> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>>>>>>> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>>>>>>> @@ -9,6 +9,7 @@
>>>>>>>>   #include "vmlinux.h"
>>>>>>>>   #include
>>>>>>>>   #include
>>>>>>>> +#include
>>>>>>>>   #include "bpf_kfuncs.h"
>>>>>>>>   #include "bpf_tracing_net.h"
>>>>>>>>   @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb)
>>>>>>>>       return TC_ACT_OK;
>>>>>>>>   }
>>>>>>>>   +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({         \
>>>>>>>> +    void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET);   \
>>>>>>>> +    unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE);      \
>>>>>>>> +    unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64);        \
>>>>>>>> +    unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64);        \
>>>>>>>> +    unsigned bit_size = (rshift - lshift);                      \
>>>>>>>> +    unsigned long long nval, val, hi, lo;                       \
>>>>>>>> +                                                                \
>>>>>>>> +    asm volatile("" : "=r"(p) : "0"(p));                        \
>>>>>>> Use asm volatile("" : "+r"(p)) ?
>>>>>>>
>>>>>>>> +                                                                \
>>>>>>>> +    switch (byte_size) {                                        \
>>>>>>>> +    case 1: val = *(unsigned char *)p; break;                   \
>>>>>>>> +    case 2: val = *(unsigned short *)p; break;                  \
>>>>>>>> +    case 4: val = *(unsigned int *)p; break;                    \
>>>>>>>> +    case 8: val = *(unsigned long long *)p; break;              \
>>>>>>>> +    }                                                           \
>>>>>>>> +    hi = val >> (bit_size + rshift);                            \
>>>>>>>> +    hi <<= bit_size + rshift;                                   \
>>>>>>>> +    lo = val << (bit_size + lshift);                            \
>>>>>>>> +    lo >>= bit_size + lshift;                                   \
>>>>>>>> +    nval = new_val;                                             \
>>>>>>>> +    nval <<= lshift;                                            \
>>>>>>>> +    nval >>= rshift;                                            \
>>>>>>>> +    val = hi | nval | lo;                                       \
>>>>>>>> +    switch (byte_size) {                                        \
>>>>>>>> +    case 1: *(unsigned char *)p      = val; break;              \
>>>>>>>> +    case 2: *(unsigned short *)p     = val; break;              \
>>>>>>>> +    case 4: *(unsigned int *)p       = val; break;              \
>>>>>>>> +    case 8: *(unsigned long long *)p = val; break;              \
>>>>>>>> +    }                                                           \
>>>>>>>> +})
>>>>>>> I think this should be put in libbpf public header files but not sure
>>>>>>> where to put it. bpf_core_read.h although it is core write?
>>>>>>>
>>>>>>> But on the other hand, this is a uapi struct bitfield write,
>>>>>>> strictly speaking, CORE write is really unnecessary here. It
>>>>>>> would be great if we can relieve users from dealing with
>>>>>>> such unnecessary CORE writes. In that sense, for this particular
>>>>>>> case, I would prefer rewriting the code by using byte-level
>>>>>>> stores...
>>>>>> or preserve_static_offset to clearly mean to undo bitfield CORE ...
>>>>> Ok, I will do byte-level rewrite for next revision.
>>>> [...]
>>>>
>>>> This patch seems to work: https://pastes.dxuuu.xyz/0glrf9 .
>>>>
>>>> But I don't think it's very pretty. Also I'm seeing on the internet that
>>>> people are saying the exact layout of bitfields is compiler dependent.
>>> Any reference for this (exact layout of bitfields is compiler dependent)?
>>>
>>>> So I am wondering if these byte sized writes are correct. For that
>>>> matter, I am wondering how the GCC generated bitfield accesses line up
>>>> with clang generated BPF bytecode. Or why uapi contains a bitfield.
>>> One thing for sure is memory layout of bitfields should be the same
>>> for both clang and gcc as it is determined by C standard. Register
>>> representation and how to manipulate could be different for different
>>> compilers.
>> I was reading this thread:
>> https://github.com/Lora-net/LoRaMac-node/issues/697. It's obviously not
>> authoritative, but they sure sound confident!
>>
>> I think I've also heard it before a long time ago when I was working on
>> adding bitfield support to bpftrace.
> Wikipedia [0] also claims this:
>
> The layout of bit fields in a C struct is
> implementation-defined. For behavior that remains predictable
> across compilers, it may be preferable to emulate bit fields
> with a primitive and bit operators:
>
> [0]: https://en.wikipedia.org/wiki/Bit_field#C_programming_language

Thanks for the information. I was truly not aware that bit field layout
could be different for different compilers. Does this mean source-level
bitfield manipulation may not work?

Having a bitfield in uapi is okay. The compiler should do the right thing
for loads/stores in bitfields. Also, the networking bitfields correspond
to a memory layout that is transferred on the wire, so their memory layout
is determined (although the little/big endian representation is different).

BPF_CORE_WRITE_BITFIELD 'should' also be okay since the offset/size etc.
are obtained from the compiler internals (from DWARF, to be more precise).
So it looks like BPF_CORE_WRITE_BITFIELD is the way to go. Please use it then.
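
For illustration only, here is a minimal sketch of the "primitive and bit operators" approach from the Wikipedia excerpt quoted above: a read-modify-write on a plain 32-bit word using an explicit mask and shift. The helper names bits_write() and bits_read() are hypothetical and not part of libbpf or the kernel. Unlike BPF_CORE_WRITE_BITFIELD, the bit offset and width are passed in by the caller instead of coming from CO-RE relocations, and bit 0 is assumed to be the least significant bit of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a libbpf API): build a mask of `bit_size`
 * low-order ones without ever shifting by the full word width. */
static inline uint32_t bits_mask(unsigned bit_size)
{
	return (bit_size == 32) ? UINT32_MAX : ((UINT32_C(1) << bit_size) - 1);
}

/* Hypothetical helper: replace `bit_size` bits of `word`, starting at
 * bit `bit_off` (bit 0 is the least significant bit), with the low bits
 * of `new_val`. All bits outside the field are preserved. */
static inline uint32_t bits_write(uint32_t word, unsigned bit_off,
				  unsigned bit_size, uint32_t new_val)
{
	uint32_t mask;

	/* The field must be non-empty and fit inside the 32-bit word. */
	assert(bit_size >= 1 && bit_size <= 32 && bit_off <= 32 - bit_size);

	mask = bits_mask(bit_size);
	return (word & ~(mask << bit_off)) | ((new_val & mask) << bit_off);
}

/* Hypothetical helper: extract the same field as an unsigned value. */
static inline uint32_t bits_read(uint32_t word, unsigned bit_off,
				 unsigned bit_size)
{
	assert(bit_size >= 1 && bit_size <= 32 && bit_off <= 32 - bit_size);

	return (word >> bit_off) & bits_mask(bit_size);
}
```

Because the offset and width are spelled out in the source, the result does not depend on how a particular compiler lays out bit fields. The caller still has to pick the offset that matches the on-wire or uapi layout for the target endianness.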