In-Reply-To: <20211006175106.GA295227@fuller.cnet>
From: Song Liu
Date: Wed, 6 Oct 2021 14:37:09 -0700
Subject: Re: [PATCH bpf-next] bpf: introduce helper bpf_raw_read_cpu_clock
To: Marcelo Tosatti
Cc: bpf, open list, Nitesh Narayan Lal, Nicolas Saenz Julienne, Thomas Gleixner, Peter Zijlstra, Peter Xu, Andrii Nakryiko
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Oct 6, 2021 at 10:52 AM Marcelo Tosatti wrote:
>
> Add bpf_raw_read_cpu_clock helper, to read architecture specific
> CPU clock. In x86's case, this is the TSC.
>
> This is necessary to synchronize bpf traces from host and guest bpf-programs
> (after subtracting guest tsc-offset from guest timestamps).

Trying to understand the use case. So in a host-guest scenario,
bpf_ktime_get_ns() will return different values in host and guest, but
rdtsc() will give the same value. Is this correct?

>
> Signed-off-by: Marcelo Tosatti
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index ab83c22d274e..832bb1f65f28 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -95,6 +95,7 @@ config X86
>         select ARCH_HAS_UBSAN_SANITIZE_ALL
>         select ARCH_HAS_DEBUG_WX
>         select ARCH_HAS_ZONE_DMA_SET if EXPERT
> +       select ARCH_HAS_BPF_RAW_CPU_CLOCK
>         select ARCH_HAVE_NMI_SAFE_CMPXCHG
>         select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI
>         select ARCH_MIGHT_HAVE_PC_PARPORT
> diff --git a/arch/x86/include/asm/bpf_raw_cpu_clock.h b/arch/x86/include/asm/bpf_raw_cpu_clock.h
> new file mode 100644
> index 000000000000..6951c399819e
> --- /dev/null
> +++ b/arch/x86/include/asm/bpf_raw_cpu_clock.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _ASM_X86_BPF_RAW_CPU_CLOCK_H_
> +#define _ASM_X86_BPF_RAW_CPU_CLOCK_H_
> +
> +static inline unsigned long long read_raw_cpu_clock(void)
> +{
> +       return rdtsc_ordered();
> +}
> +
> +#endif /* _ASM_X86_BPF_RAW_CPU_CLOCK_H_ */
> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
> index 3eff08d7b8e5..844a44ff508d 100644
> --- a/drivers/media/rc/bpf-lirc.c
> +++ b/drivers/media/rc/bpf-lirc.c
> @@ -105,6 +105,8 @@ lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_ktime_get_ns_proto;
>         case BPF_FUNC_ktime_get_boot_ns:
>                 return &bpf_ktime_get_boot_ns_proto;
> +       case BPF_FUNC_read_raw_cpu_clock:
> +               return &bpf_read_raw_cpu_clock_proto;
>         case BPF_FUNC_tail_call:
>                 return &bpf_tail_call_proto;
>         case BPF_FUNC_get_prandom_u32:
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index d604c8251d88..b6cb426085fb 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -2058,6 +2058,7 @@ extern const struct bpf_func_proto bpf_get_numa_node_id_proto;
>  extern const struct bpf_func_proto bpf_tail_call_proto;
>  extern const struct bpf_func_proto bpf_ktime_get_ns_proto;
>  extern const struct bpf_func_proto bpf_ktime_get_boot_ns_proto;
> +extern const struct bpf_func_proto bpf_read_raw_cpu_clock_proto;
>  extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto;
>  extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
>  extern const struct bpf_func_proto bpf_get_current_comm_proto;
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 6fc59d61937a..52191791b089 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -4037,6 +4037,13 @@ union bpf_attr {
>   *     Return
>   *             Current *ktime*.
>   *
> + * u64 bpf_read_raw_cpu_clock(void)
> + *     Description
> + *             Return the architecture specific CPU clock value.
> + *             For x86, this is the TSC clock.
> + *     Return
> + *             *CPU clock value*
> + *
>   * long bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size, const void *data, u32 data_len)
>   *     Description
>   *             **bpf_seq_printf**\ () uses seq_file **seq_printf**\ () to print
> @@ -5089,6 +5096,7 @@ union bpf_attr {
>         FN(task_pt_regs),               \
>         FN(get_branch_snapshot),        \
>         FN(trace_vprintk),              \
> +       FN(read_raw_cpu_clock),         \
>         /* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
> index a82d6de86522..5815db157220 100644
> --- a/kernel/bpf/Kconfig
> +++ b/kernel/bpf/Kconfig
> @@ -21,6 +21,10 @@ config HAVE_EBPF_JIT
>  config ARCH_WANT_DEFAULT_BPF_JIT
>         bool
>
> +# Used by archs to tell they support reading raw CPU clock
> +config ARCH_HAS_BPF_RAW_CPU_CLOCK
> +       bool
> +
>  menu "BPF subsystem"
>
>  config BPF_SYSCALL
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index b6c72af64d5d..8e2359dfd582 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -2345,6 +2345,8 @@ const struct bpf_func_proto bpf_get_numa_node_id_proto __weak;
>  const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
>  const struct bpf_func_proto bpf_ktime_get_boot_ns_proto __weak;
>  const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto __weak;
> +const struct bpf_func_proto bpf_read_raw_cpu_clock_proto __weak;
> +
>
>  const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
>  const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 1ffd469c217f..90b9e5efaf65 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -18,6 +18,10 @@
>
>  #include "../../lib/kstrtox.h"
>
> +#ifdef CONFIG_ARCH_HAS_BPF_RAW_CPU_CLOCK
> +#include
> +#endif
> +
>  /* If kernel subsystem is allowing eBPF programs to call this function,
>   * inside its own verifier_ops->get_func_proto() callback it should return
>   * bpf_map_lookup_elem_proto, so that verifier can properly check the arguments
> @@ -168,6 +172,21 @@ const struct bpf_func_proto bpf_ktime_get_boot_ns_proto = {
>         .ret_type       = RET_INTEGER,
>  };
>
> +BPF_CALL_0(bpf_read_raw_cpu_clock)
> +{
> +#ifdef CONFIG_ARCH_HAS_BPF_RAW_CPU_CLOCK
> +       return read_raw_cpu_clock();
> +#else
> +       return sched_clock();
> +#endif
> +}
> +
> +const struct bpf_func_proto bpf_read_raw_cpu_clock_proto = {
> +       .func           = bpf_read_raw_cpu_clock,
> +       .gpl_only       = false,
> +       .ret_type       = RET_INTEGER,
> +};
> +
>  BPF_CALL_0(bpf_ktime_get_coarse_ns)
>  {
>         return ktime_get_coarse_ns();
> @@ -1366,6 +1385,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
>                 return &bpf_ktime_get_boot_ns_proto;
>         case BPF_FUNC_ktime_get_coarse_ns:
>                 return &bpf_ktime_get_coarse_ns_proto;
> +       case BPF_FUNC_read_raw_cpu_clock:
> +               return &bpf_read_raw_cpu_clock_proto;
>         case BPF_FUNC_ringbuf_output:
>                 return &bpf_ringbuf_output_proto;
>         case BPF_FUNC_ringbuf_reserve:
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 6b3153841a33..047ca7c1d57a 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1113,6 +1113,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>                 return &bpf_ktime_get_boot_ns_proto;
>         case BPF_FUNC_ktime_get_coarse_ns:
>                 return &bpf_ktime_get_coarse_ns_proto;
> +       case BPF_FUNC_read_raw_cpu_clock:
> +               return &bpf_read_raw_cpu_clock_proto;

With the change in bpf_base_func_proto, this part is not needed.

>         case BPF_FUNC_tail_call:
>                 return &bpf_tail_call_proto;
>         case BPF_FUNC_get_current_pid_tgid:
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 6fc59d61937a..52191791b089 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -4037,6 +4037,13 @@ union bpf_attr {
>   *     Return
>   *             Current *ktime*.
>   *
> + * u64 bpf_read_raw_cpu_clock(void)
> + *     Description
> + *             Return the architecture specific CPU clock value.
> + *             For x86, this is the TSC clock.
> + *     Return
> + *             *CPU clock value*
> + *
>   * long bpf_seq_printf(struct seq_file *m, const char *fmt, u32 fmt_size, const void *data, u32 data_len)
>   *     Description
>   *             **bpf_seq_printf**\ () uses seq_file **seq_printf**\ () to print
> @@ -5089,6 +5096,7 @@ union bpf_attr {
>         FN(task_pt_regs),               \
>         FN(get_branch_snapshot),        \
>         FN(trace_vprintk),              \
> +       FN(read_raw_cpu_clock),         \
>         /* */
>
>  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
>
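
For reference, the alignment step the commit message describes (subtracting the guest tsc-offset from guest timestamps) could be sketched in userspace as below. This is only a minimal sketch, assuming no TSC scaling is in effect, so that a guest reading equals the host reading plus a fixed per-VM offset. The function and parameter names are hypothetical and are not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): map a TSC value sampled
 * inside a guest onto the host's TSC timeline.
 *
 * Assumption: no TSC scaling, so guest_tsc == host_tsc + tsc_offset
 * modulo 2^64. A negative offset is passed as its two's-complement
 * 64-bit representation.
 */
static inline uint64_t guest_tsc_to_host_tsc(uint64_t guest_tsc,
                                             uint64_t tsc_offset)
{
        /* Unsigned subtraction wraps modulo 2^64, like the TSC itself. */
        return guest_tsc - tsc_offset;
}

/*
 * Hypothetical helper: signed distance, in TSC cycles, from a host-side
 * trace event to a guest-side trace event once both are expressed on
 * the host timeline. Negative means the guest event came first.
 */
static inline int64_t host_to_guest_delta_cycles(uint64_t host_tsc,
                                                 uint64_t guest_tsc,
                                                 uint64_t tsc_offset)
{
        return (int64_t)(guest_tsc_to_host_tsc(guest_tsc, tsc_offset) - host_tsc);
}
```

With timestamps from both sides on one timeline, host and guest trace records can be merged and sorted by the converted value. If the hypervisor applies TSC scaling, the conversion would also need the scaling ratio, which this sketch does not cover.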