Date: Mon, 27 Nov 2023 20:06:01 -0800
Subject: Re: [PATCH ipsec-next v1 6/7] bpf: selftests: test_tunnel: Disable CO-RE relocations
To: Daniel Xu
Cc: Eduard Zingerman, Alexei Starovoitov, Shuah Khan, Daniel Borkmann, Andrii Nakryiko, Alexei Starovoitov, Steffen Klassert, antony.antony@secunet.com, Mykola Lysenko, Martin KaFai Lau, Song Liu, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, bpf, "open list:KERNEL SELFTEST FRAMEWORK", LKML, devel@linux-ipsec.org, Network Development
From: Yonghong Song
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/27/23 7:01 PM, Daniel Xu wrote:
> On Mon, Nov 27, 2023 at 02:45:11PM -0600, Daniel Xu wrote:
>> On Sun, Nov 26, 2023 at 09:53:04PM -0800, Yonghong Song wrote:
>>> On 11/27/23 12:44 AM, Yonghong Song wrote:
>>>> On 11/26/23 8:52 PM, Eduard Zingerman wrote:
>>>>> On Sun, 2023-11-26 at 18:04 -0600, Daniel Xu wrote:
>>>>> [...]
>>>>>>> Tbh I'm not sure. This test passes with preserve_static_offset
>>>>>>> because it suppresses preserve_access_index. In general clang
>>>>>>> translates bitfield access to a set of IR statements like:
>>>>>>>
>>>>>>>    C:
>>>>>>>      struct foo {
>>>>>>>        unsigned _;
>>>>>>>        unsigned a:1;
>>>>>>>        ...
>>>>>>>      };
>>>>>>>      ... foo->a ...
>>>>>>>
>>>>>>>    IR:
>>>>>>>      %a = getelementptr inbounds %struct.foo, ptr %0, i32 0, i32 1
>>>>>>>      %bf.load = load i8, ptr %a, align 4
>>>>>>>      %bf.clear = and i8 %bf.load, 1
>>>>>>>      %bf.cast = zext i8 %bf.clear to i32
>>>>>>>
>>>>>>> With preserve_static_offset the getelementptr+load are replaced by a
>>>>>>> single statement which is preserved as-is till code generation,
>>>>>>> thus load with align 4 is preserved.
>>>>>>>
>>>>>>> On the other hand, I'm not sure that clang guarantees that load or
>>>>>>> stores used for bitfield access would be always aligned according to
>>>>>>> verifier expectations.
>>>>>>>
>>>>>>> I think we should check if there are some clang knobs that prevent
>>>>>>> generation of unaligned memory access. I'll take a look.
>>>>>> Is there a reason to prefer fixing in compiler? I'm not opposed to it,
>>>>>> but the downside to compiler fix is it takes years to propagate and
>>>>>> sprinkles ifdefs into the code.
>>>>>>
>>>>>> Would it be possible to have an analogue of BPF_CORE_READ_BITFIELD()?
>>>>> Well, the contraption below passes verification, tunnel selftest
>>>>> appears to work. I might have messed up some shifts in the macro,
>>>>> though.
>>>> I didn't test it. But from high level it should work.
>>>>
>>>>> Still, if clang would peek unlucky BYTE_{OFFSET,SIZE} for a particular
>>>>> field access might be unaligned.
>>>> clang should pick a sensible BYTE_SIZE/BYTE_OFFSET to meet
>>>> alignment requirement. This is also required for BPF_CORE_READ_BITFIELD.
>>>>
>>>>> ---
>>>>>
>>>>> diff --git a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>>>> b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>>>> index 3065a716544d..41cd913ac7ff 100644
>>>>> --- a/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>>>> +++ b/tools/testing/selftests/bpf/progs/test_tunnel_kern.c
>>>>> @@ -9,6 +9,7 @@
>>>>>   #include "vmlinux.h"
>>>>>   #include
>>>>>   #include
>>>>> +#include
>>>>>   #include "bpf_kfuncs.h"
>>>>>   #include "bpf_tracing_net.h"
>>>>>
>>>>> @@ -144,6 +145,38 @@ int ip6gretap_get_tunnel(struct __sk_buff *skb)
>>>>>       return TC_ACT_OK;
>>>>>   }
>>>>>
>>>>> +#define BPF_CORE_WRITE_BITFIELD(s, field, new_val) ({            \
>>>>> +    void *p = (void *)s + __CORE_RELO(s, field, BYTE_OFFSET);    \
>>>>> +    unsigned byte_size = __CORE_RELO(s, field, BYTE_SIZE);       \
>>>>> +    unsigned lshift = __CORE_RELO(s, field, LSHIFT_U64);         \
>>>>> +    unsigned rshift = __CORE_RELO(s, field, RSHIFT_U64);         \
>>>>> +    unsigned bit_size = (rshift - lshift);                       \
>>>>> +    unsigned long long nval, val, hi, lo;                        \
>>>>> +                                                                 \
>>>>> +    asm volatile("" : "=r"(p) : "0"(p));                         \
>>>> Use asm volatile("" : "+r"(p)) ?
>>>>
>>>>> +                                                                 \
>>>>> +    switch (byte_size) {                                         \
>>>>> +    case 1: val = *(unsigned char *)p; break;                    \
>>>>> +    case 2: val = *(unsigned short *)p; break;                   \
>>>>> +    case 4: val = *(unsigned int *)p; break;                     \
>>>>> +    case 8: val = *(unsigned long long *)p; break;               \
>>>>> +    }                                                            \
>>>>> +    hi = val >> (bit_size + rshift);                             \
>>>>> +    hi <<= bit_size + rshift;                                    \
>>>>> +    lo = val << (bit_size + lshift);                             \
>>>>> +    lo >>= bit_size + lshift;                                    \
>>>>> +    nval = new_val;                                              \
>>>>> +    nval <<= lshift;                                             \
>>>>> +    nval >>= rshift;                                             \
>>>>> +    val = hi | nval | lo;                                        \
>>>>> +    switch (byte_size) {                                         \
>>>>> +    case 1: *(unsigned char *)p      = val; break;               \
>>>>> +    case 2: *(unsigned short *)p     = val; break;               \
>>>>> +    case 4: *(unsigned int *)p       = val; break;               \
>>>>> +    case 8: *(unsigned long long *)p = val; break;               \
>>>>> +    }                                                            \
>>>>> +})
>>>> I think this should be put in libbpf public header files but not sure
>>>> where to put it. bpf_core_read.h although it is core write?
>>>>
>>>> But on the other hand, this is a uapi struct bitfield write,
>>>> strictly speaking, CORE write is really unnecessary here. It
>>>> would be great if we can relieve users from dealing with
>>>> such unnecessary CORE writes. In that sense, for this particular
>>>> case, I would prefer rewriting the code by using byte-level
>>>> stores...
>>> or preserve_static_offset to clearly mean to undo bitfield CORE ...
>> Ok, I will do byte-level rewrite for next revision.
> [...]
>
> This patch seems to work: https://pastes.dxuuu.xyz/0glrf9 .
>
> But I don't think it's very pretty. Also I'm seeing on the internet that
> people are saying the exact layout of bitfields is compiler dependent.

Do you have a reference for this (that the exact layout of bitfields is
compiler dependent)?

> So I am wondering if these byte sized writes are correct. For that
> matter, I am wondering how the GCC generated bitfield accesses line up
> with clang generated BPF bytecode. Or why uapi contains a bitfield.

One thing is for sure: the memory layout of bitfields should be the same
for both clang and gcc, as it is determined by the C standard. The register
representation, and how it is manipulated, could differ between
compilers.

>
> WDYT, should I send up v2 with this or should I do one of the other
> approaches in this thread?

Daniel, looking at your patch, since we need to do CORE_READ for those
bitfields anyway, I think Eduard's patch with BPF_CORE_WRITE_BITFIELD
does make sense, and it also makes the code easier to understand. Could
you take Eduard's patch for now? Whether and where to put the
BPF_CORE_WRITE_BITFIELD macro can be decided later.

>
> I am ok with any of the approaches.
>
> Thanks,
> Daniel
>
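
For readers following the thread, the read-modify-write that a bitfield store boils down to can be sketched in plain C. This is only an illustration, not the BPF_CORE_WRITE_BITFIELD macro quoted above: the helper name `bitfield_merge`, its parameters, and the convention of counting bit offsets from the least significant bit are all assumptions made for the example. No CO-RE relocations are involved.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libbpf API): return `unit` with the `width`
 * bits starting at bit `offset` replaced by the low bits of `new_val`.
 * `offset` is counted from the least significant bit of the storage unit.
 */
static uint64_t bitfield_merge(uint64_t unit, unsigned offset,
                               unsigned width, uint64_t new_val)
{
    /* The field must be non-empty and fit in the 64-bit storage unit. */
    assert(width >= 1 && offset + width <= 64);

    /* Build a mask of `width` ones, avoiding an undefined 64-bit shift. */
    uint64_t field_mask = (width == 64) ? ~UINT64_C(0)
                                        : ((UINT64_C(1) << width) - 1);
    uint64_t mask = field_mask << offset;

    /* Clear the old field, then OR in the new value clipped to the field. */
    return (unit & ~mask) | ((new_val << offset) & mask);
}
```

For instance, `bitfield_merge(0xA5, 2, 3, 0x6)` yields `0xB9`: bits 2..4 are overwritten while the neighbouring bits keep their old values. A store through a pointer would load the storage unit, apply this merge, and write the result back with the same access size.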