Received: by 2002:a89:48b:0:b0:1f5:f2ab:c469 with SMTP id a11csp1325031lqd; Thu, 25 Apr 2024 11:56:39 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVHQh8rZc75/65CjLhyfpoQLQQQwAOKCBqRbDaF9owuo9ryqCPrlSHKieN8eV5D3IHzDE4ng8HZUrN8evWshpKar4vbw9QQSDUJ1GI4aA== X-Google-Smtp-Source: AGHT+IETLAI32qi7Z/GFW2QGsfrd2NAQT7u3nfct2xBKFD+yoTrnSwMuEmj7pLwL6b+qUUMATWmt X-Received: by 2002:a17:902:7586:b0:1e6:6bba:7c70 with SMTP id j6-20020a170902758600b001e66bba7c70mr367186pll.36.1714071399322; Thu, 25 Apr 2024 11:56:39 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1714071399; cv=pass; d=google.com; s=arc-20160816; b=zX/kaqZA7OCXRbNzdV+SfyDbVN0fbssuPtTN7IEO5Sd4hx6lfcY5nKIl9I2kpy+a2q eiC7/L+MSaH5CphFffRIgiYOu61XtfcuZZrL8ilkjP1Lqc0pkvdNmz0Fa6w1K0kUaUIn pYq/xv7O+XrIS8CEsTaIKvRQbvt+v8IShnLC7D34ymNnB2OUY4Y11av9fpm4Qhs7S/qz wRxhhk3Nus2rcKdP5pyB79YLva/OnvbLAaj4ERac3gXoSHHtpYUH+pEmZ3gs5GkKXwyP vFTuBBHEWREsl36u5WoDyc8UdTR9lcvWTnLV/skTUVOdW+ui347VQMxoJrGzhyYqTWP9 p5JA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:references :in-reply-to:subject:cc:to:from:dkim-signature; bh=+wBA7CD3yKNg58XyBaAurQFwROexVjedXV1xpBRnUAg=; fh=6gRi3YCjG1e+PQBiSQ6kvZ1evqcR5arGVAojuYC63dk=; b=gPVWz3VMG4zimC0vGpw383/mE5+Weho0YWyzZcq3kNBRYRiXuQmEWyuto9tK4uAX9p LN2hI23iBKcsapljwHpqqdWzMk6Ng0EWnFcdXZUiBWFdBuUaXKGb8k5umOkq9REssz5Z 4GuPukSvN2p67HZJt0PXRo/0Kgd9kZp8stimukWTirikW+a3PaC0dtUxBNFOxEIYVS9J i2u+0imu/COmRMN5bTBoJt0kmMrhv+T6EvYLOUx5RvaXwXZK/8KBda5B8kwmxN35AJG2 GyYz5kWmS43VNGSSlO+nHuGELX1gy+3DbnpLk0hh7YQFTf6tjGkQIk6zCdrNjbniLp0T iZeA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Xe1J7qyi; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-159043-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-159043-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id u13-20020a170902e80d00b001e8505652aasi13832036plg.46.2024.04.25.11.56.39 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 25 Apr 2024 11:56:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-159043-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=Xe1J7qyi; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-159043-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-159043-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id DD868286205 for ; Thu, 25 Apr 2024 18:56:36 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7DE611509A2; Thu, 25 Apr 2024 18:56:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Xe1J7qyi" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6769414E2CC; Thu, 25 Apr 2024 18:56:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714071361; cv=none; b=OUiSpoquanK9MtcAPAZrZnzMx2DTEhrZ8Zv/8PlgXYo4NvCjkTYL28ATengC5Ht3e+6gIcYYZJK4NdS+EAQ6Qm+X/aXYAjdIZR0hUla75P0iQKk+jKpgoVdsThqBpNWPqowQ3TvX8SGd08LMZC4llp1ULzFNvy6G+BMX+ywb+rs= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1714071361; c=relaxed/simple; bh=gvIDnW7fKA4/z8SLEzwxKjbQ9fFnFV+k80e9QAEvWGA=; h=From:To:Cc:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=lABSdxKyMDZQ8VeK2cNf6oSHanc98OVAdRwcvEm/VQgikYQVb3stTf0XpDdV5PL0xtSkauAh6Gaaie5bEyzGDZ+ihN1VqikAncEdUWpA3GzpoloTM1f23EnKPMqo2B3rD0qqs5edwoPkMGh/dMGf45eI3ncpsx7M3MXEtjY7raw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Xe1J7qyi; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id A4B7FC113CC; Thu, 25 Apr 2024 18:55:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1714071361; bh=gvIDnW7fKA4/z8SLEzwxKjbQ9fFnFV+k80e9QAEvWGA=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=Xe1J7qyi6penA0+QjEJfPnHXZtcAtYz5H/gGP9JX3jQzpnm9EeogB2YXxHB7ad6kv sGq0ya2XMskzTwrT4gF+wcKE8ZLHr42r2a71TfTKWXyT77rV2233mgMBT9Q3L0DL05 6CrKXi9k612/HZHQ2uksL2a+FrzzxYtW9W/Mkx8w+TmpIXwzgMOS/0SpgGnQeyjW+H i3bNTxjIFYrUYVvYyShDeRSw1PNqywS0u9TtXn6kLgUPBkETi9Ec8ZfDrr8G/GNWRh Vg6eSpBZNf+jYmxDXxSYDbDqwzXEY0YVspYYRMZcfJpwMN7YUiG8YmZu9LgafGws/o auDGF8endPQCA== From: Puranjay Mohan To: Andrii Nakryiko Cc: Catalin Marinas , Will Deacon , Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Martin KaFai Lau , Eduard Zingerman , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo , Jiri Olsa , Zi Shen Lim , Xu Kuohai , Florent Revest , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org Subject: Re: [PATCH bpf-next v2 2/2] bpf, arm64: inline bpf_get_smp_processor_id() helper In-Reply-To: References: <20240424173550.16359-1-puranjay@kernel.org> <20240424173550.16359-3-puranjay@kernel.org> Date: Thu, 25 Apr 2024 18:55:56 +0000 Message-ID: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Andrii Nakryiko writes: > On Thu, Apr 25, 2024 at 3:14=E2=80=AFAM Puranjay Mohan wrote: >> >> Andrii Nakryiko writes: >> >> > On Wed, Apr 24, 2024 at 10:36=E2=80=AFAM Puranjay Mohan wrote: >> >> >> >> As ARM64 JIT now implements BPF_MOV64_PERCPU_REG instruction, inline >> >> bpf_get_smp_processor_id(). >> >> >> >> ARM64 uses the per-cpu variable cpu_number to store the cpu id. >> >> >> >> Here is how the BPF and ARM64 JITed assembly changes after this commi= t: >> >> >> >> BPF >> >> =3D=3D=3D=3D=3D >> >> BEFORE AFTER >> >> -------- ------- >> >> >> >> int cpu =3D bpf_get_smp_processor_id(); int cpu =3D bpf_get= _smp_processor_id(); >> >> (85) call bpf_get_smp_processor_id#229032 (18) r0 =3D 0xffff800= 082072008 >> >> (bf) r0 =3D r0 >> > >> > nit: hmm, you are probably using a bit outdated bpftool, it should be >> > emitted as: >> > >> > (bf) r0 =3D &(void __percpu *)(r0) >> >> Yes, I was using the bpftool shipped with the distro. I tried it again >> with the latest bpftool and it emitted this as expected. > > Cool, would be nice to update the commit message with the right syntax > for next revision, thanks! > Sure, will do. >> >> > >> >> (61) r0 =3D *(u32 *)(= r0 +0) >> >> >> >> ARM64 JIT >> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >> >> >> >> BEFORE AFTER >> >> -------- ------- >> >> >> >> int cpu =3D bpf_get_smp_processor_id(); int cpu =3D bpf_get_smp_= processor_id(); >> >> mov x10, #0xfffffffffffff4d0 mov x7, #0xffff8000fff= fffff >> >> movk x10, #0x802b, lsl #16 movk x7, #0x8207, lsl #= 16 >> >> movk x10, #0x8000, lsl #32 movk x7, #0x2008 >> >> blr x10 mrs x10, tpidr_el1 >> >> add x7, x0, #0x0 add x7, x7, x10 >> >> ldr w7, [x7] >> >> >> >> Performance improvement using benchmark[1] >> >> >> >> BEFORE AFTER >> >> -------- ------- >> >> >> >> glob-arr-inc : 23.817 =C2=B1 0.019M/s glob-arr-inc : 24.= 631 =C2=B1 0.027M/s >> >> arr-inc : 23.253 =C2=B1 0.019M/s arr-inc : 23.= 742 =C2=B1 0.023M/s >> >> hash-inc : 12.258 =C2=B1 0.010M/s hash-inc : 12.= 625 =C2=B1 0.004M/s >> >> >> >> [1] https://github.com/anakryiko/linux/commit/8dec900975ef >> >> >> >> Signed-off-by: Puranjay Mohan >> >> --- >> >> kernel/bpf/verifier.c | 11 ++++++++++- >> >> 1 file changed, 10 insertions(+), 1 deletion(-) >> >> >> > >> > Besides the nits, lgtm. >> > >> > Acked-by: Andrii Nakryiko >> > >> >> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c >> >> index 9715c88cc025..3373be261889 100644 >> >> --- a/kernel/bpf/verifier.c >> >> +++ b/kernel/bpf/verifier.c >> >> @@ -20205,7 +20205,7 @@ static int do_misc_fixups(struct bpf_verifier= _env *env) >> >> goto next_insn; >> >> } >> >> >> >> -#ifdef CONFIG_X86_64 >> >> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64) >> > >> > I think you can drop this, we are protected by >> > bpf_jit_supports_percpu_insn() check and newly added inner #if/#elif >> > checks? >> >> If I remove this and later add support of percpu_insn on RISCV without >> inlining bpf_get_smp_processor_id() then it will cause problems here >> right? because then the last 5-6 lines inside this if(){} will be >> executed for RISCV. > > Just add > > #else > return -EFAULT; I don't think we can return. > #endif > > ? > > I'm trying to avoid this duplication of the defined(CONFIG_xxx) checks > for supported architectures. Does the following look correct? I will do it like this: /* Implement bpf_get_smp_processor_id() inline. */ if (insn->imm =3D=3D BPF_FUNC_get_smp_processor_id && prog->jit_requested && bpf_jit_supports_percpu_insn()) { /* BPF_FUNC_get_smp_processor_id inlining is an * optimization, so if pcpu_hot.cpu_number is ever * changed in some incompatible and hard to support * way, it's fine to back out this inlining logic */ #if defined(CONFIG_X86_64) insn_buf[0] =3D BPF_MOV32_IMM(BPF_REG_0, (u32)(unsi= gned long)&pcpu_hot.cpu_number); insn_buf[1] =3D BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF= _REG_0); insn_buf[2] =3D BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_R= EG_0, 0); cnt =3D 3; #elif defined(CONFIG_ARM64) struct bpf_insn cpu_number_addr[2] =3D { BPF_LD_IMM= 64(BPF_REG_0, (u64)&cpu_number) }; insn_buf[0] =3D cpu_number_addr[0]; insn_buf[1] =3D cpu_number_addr[1]; insn_buf[2] =3D BPF_MOV64_PERCPU_REG(BPF_REG_0, BPF= _REG_0); insn_buf[3] =3D BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_R= EG_0, 0); cnt =3D 4; #else goto next_insn; #endif new_prog =3D bpf_patch_insn_data(env, i + delta, in= sn_buf, cnt); if (!new_prog) return -ENOMEM; delta +=3D cnt - 1; env->prog =3D prog =3D new_prog; insn =3D new_prog->insnsi + i + delta; goto next_insn; } >> >> > >> >> /* Implement bpf_get_smp_processor_id() inline. */ >> >> if (insn->imm =3D=3D BPF_FUNC_get_smp_processor_id && >> >> prog->jit_requested && bpf_jit_supports_percpu_in= sn()) { >> >> @@ -20214,11 +20214,20 @@ static int do_misc_fixups(struct bpf_verifi= er_env *env) >> >> * changed in some incompatible and hard to s= upport >> >> * way, it's fine to back out this inlining l= ogic >> >> */ >> >> +#if defined(CONFIG_X86_64) >> >> insn_buf[0] =3D BPF_MOV32_IMM(BPF_REG_0, (u32= )(unsigned long)&pcpu_hot.cpu_number); >> >> insn_buf[1] =3D BPF_MOV64_PERCPU_REG(BPF_REG_= 0, BPF_REG_0); >> >> insn_buf[2] =3D BPF_LDX_MEM(BPF_W, BPF_REG_0,= BPF_REG_0, 0); >> >> cnt =3D 3; >> >> +#elif defined(CONFIG_ARM64) >> >> + struct bpf_insn cpu_number_addr[2] =3D { BPF_= LD_IMM64(BPF_REG_0, (u64)&cpu_number) }; >> >> >> > >> > this &cpu_number offset is not guaranteed to be within 4GB on arm64? >> >> Unfortunately, the per-cpu section is not placed in the first 4GB and >> therefore the per-cpu pointers are not 32-bit on ARM64. > > I see. It might make sense to turn x86-64 code into using MOV64_IMM as > well to keep more of the logic common. Then it will be just the > difference of an offset that's loaded. Give it a try? I think MOV64_IMM would have more overhead than MOV32_IMM and if we can use it in x86-64 we should keep doing it that way. Wdyt?=20 >> >> > >> >> + insn_buf[0] =3D cpu_number_addr[0]; >> >> + insn_buf[1] =3D cpu_number_addr[1]; >> >> + insn_buf[2] =3D BPF_MOV64_PERCPU_REG(BPF_REG_= 0, BPF_REG_0); >> >> + insn_buf[3] =3D BPF_LDX_MEM(BPF_W, BPF_REG_0,= BPF_REG_0, 0); >> >> + cnt =3D 4; >> >> +#endif >> >> new_prog =3D bpf_patch_insn_data(env, i + del= ta, insn_buf, cnt); >> >> if (!new_prog) >> >> return -ENOMEM; >> >> -- >> >> 2.40.1 >> >>