2024-05-04 03:14:42

by Abhishek Chauhan

[permalink] [raw]
Subject: [RFC PATCH bpf-next v6 0/3] Replace mono_delivery_time with tstamp_type

Patch 1 :- This patch takes care of only renaming the mono delivery
timestamp to tstamp_type with no change in functionality of
existing available code in kernel also
Starts assigning tstamp_type with either mono or real and
introduces a new enum in the skbuff.h, again no change in functionality
of the existing available code in kernel , just making the code scalable.

Patch 2 :- Additional bit was added to support tai timestamp type to
avoid tstamp drops in the forwarding path when testing TC-ETF.
Patch is also updating bpf filter.c
Some updates to bpf header files with introduction to BPF_SKB_CLOCK_TAI
and documentation updates stating deprecation of BPF_SKB_TSTAMP_UNSPEC
and BPF_SKB_TSTAMP_DELIVERY_MONO

Patch 3:- Handles forwarding of UDP packets with TAI clock id tstamp_type
type with supported changes for tc_redirect/tc_redirect_dtime
to handle forwarding of UDP packets with TAI tstamp_type

Abhishek Chauhan (3):
net: Rename mono_delivery_time to tstamp_type for scalabilty
net: Add additional bit to support clockid_t timestamp type
selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets

include/linux/skbuff.h | 68 ++++++++++++++-----
include/net/inet_frag.h | 4 +-
include/uapi/linux/bpf.h | 15 ++--
net/bridge/netfilter/nf_conntrack_bridge.c | 6 +-
net/core/dev.c | 2 +-
net/core/filter.c | 54 ++++++++-------
net/ieee802154/6lowpan/reassembly.c | 2 +-
net/ipv4/inet_fragment.c | 2 +-
net/ipv4/ip_fragment.c | 2 +-
net/ipv4/ip_output.c | 14 ++--
net/ipv4/raw.c | 2 +-
net/ipv4/tcp_output.c | 16 ++---
net/ipv6/ip6_output.c | 11 +--
net/ipv6/netfilter.c | 6 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
net/ipv6/raw.c | 2 +-
net/ipv6/reassembly.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
net/packet/af_packet.c | 7 +-
net/sched/act_bpf.c | 4 +-
net/sched/cls_bpf.c | 4 +-
tools/include/uapi/linux/bpf.h | 15 ++--
.../selftests/bpf/prog_tests/ctx_rewrite.c | 10 +--
.../selftests/bpf/prog_tests/tc_redirect.c | 3 -
.../selftests/bpf/progs/test_tc_dtime.c | 39 +++++------
25 files changed, 172 insertions(+), 122 deletions(-)

--
2.25.1



2024-05-04 03:22:09

by Abhishek Chauhan

[permalink] [raw]
Subject: [RFC PATCH bpf-next v6 2/3] net: Add additional bit to support clockid_t timestamp type

tstamp_type is now set based on actual clockid_t compressed
into 2 bits.

To make the design scalable for future needs this commit bring in
the change to extend the tstamp_type:1 to tstamp_type:2 to support
other clockid_t timestamp.

We now support CLOCK_TAI as part of tstamp_type as part of this
commit with exisiting support CLOCK_MONOTONIC and CLOCK_REALTIME.

Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Abhishek Chauhan <[email protected]>
---
Changes since v5
- Took care of documentation comments of tstamp_type
in skbuff.h as mentioned by Willem.
- Use of complete words instead of abbrevation in
macro definitions as mentioned by Willem.
- Fixed indentation problems
- Removed BPF_SKB_TSTAMP_UNSPEC and marked it
Deprecated as documentation, and introduced
BPF_SKB_CLOCK_REALTIME instead.
- BUILD_BUG_ON for additional enums introduced.
- __ip_make_skb and ip6_make_skb now has
tcp checks to mark tcp packet as mono tstamp base.
- separated the selftests/bpf changes into another patch.
- Made changes as per Martin in selftest bpf code and
tool/include/uapi/linux/bpf.h

Changes since v4
- Made changes to BPF code in filter.c as per
Martin's comments
- Minor fixes on comments given on documentation
from Willem in skbuff.h (removed obvious ones)
- Made changes to ctx_rewrite.c and test_tc_dtime.c
- test_tc_dtime.c i am not really sure if i took care
of all the changes as i am not too familiar with
the framework.
- Introduce common mask SKB_TSTAMP_TYPE_MASK instead
of multiple SKB mask.
- Optimisation on BPF code as suggested by Martin.
- Set default case to SKB_CLOCK_REALTME.

Changes since v3
- Carefully reviewed BPF APIs and made changes in
BPF code as well.
- Re-used actual clockid_t values since skbuff.h
indirectly includes uapi/linux/time.h
- Added CLOCK_TAI as part of the skb_set_delivery_time
handling instead of CLOCK_USER
- Added default in switch for unsupported and invalid
timestamp with an WARN_ONCE
- All of the above comments were given by Willem
- Made changes in filter.c as per Martin's comments
to handle invalid cases in bpf code with addition of
SKB_TAI_DELIVERY_TIME_MASK

Changes since v2
- Minor changes to commit subject

Changes since v1
- identified additional changes in BPF framework.
- Bit shift in SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK.
- Made changes in skb_set_delivery_time to keep changes similar to
previous code for mono_delivery_time and just setting tstamp_type
bit 1 for userspace timestamp.


include/linux/skbuff.h | 21 +++++++++++--------
include/uapi/linux/bpf.h | 15 +++++++++-----
net/core/filter.c | 44 +++++++++++++++++++++++-----------------
net/ipv4/ip_output.c | 5 ++++-
net/ipv4/raw.c | 2 +-
net/ipv6/ip6_output.c | 5 ++++-
net/ipv6/raw.c | 2 +-
net/packet/af_packet.c | 7 +++----
8 files changed, 61 insertions(+), 40 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index de3915e2bfdb..fe7d8dbef77e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -709,6 +709,8 @@ typedef unsigned char *sk_buff_data_t;
enum skb_tstamp_type {
SKB_CLOCK_REALTIME,
SKB_CLOCK_MONOTONIC,
+ SKB_CLOCK_TAI,
+ __SKB_CLOCK_MAX = SKB_CLOCK_TAI,
};

/**
@@ -829,8 +831,7 @@ enum skb_tstamp_type {
* @decrypted: Decrypted SKB
* @slow_gro: state present at GRO time, slower prepare step required
* @tstamp_type: When set, skb->tstamp has the
- * delivery_time in mono clock base Otherwise, the
- * timestamp is considered real clock base.
+ * delivery_time clock base of skb->tstamp.
* @napi_id: id of the NAPI struct this skb came from
* @sender_cpu: (aka @napi_id) source CPU in XPS
* @alloc_cpu: CPU which did the skb allocation.
@@ -958,7 +959,7 @@ struct sk_buff {
/* private: */
__u8 __mono_tc_offset[0];
/* public: */
- __u8 tstamp_type:1; /* See skb_tstamp_type */
+ __u8 tstamp_type:2; /* See skb_tstamp_type */
#ifdef CONFIG_NET_XGRESS
__u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
__u8 tc_skip_classify:1;
@@ -1088,15 +1089,16 @@ struct sk_buff {
#endif
#define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)

-/* if you move tc_at_ingress or mono_delivery_time
+/* if you move tc_at_ingress or tstamp_type
* around, you also must adapt these constants.
*/
#ifdef __BIG_ENDIAN_BITFIELD
-#define SKB_MONO_DELIVERY_TIME_MASK (1 << 7)
-#define TC_AT_INGRESS_MASK (1 << 6)
+#define SKB_TSTAMP_TYPE_MASK (3 << 6)
+#define SKB_TSTAMP_TYPE_RSHIFT (6)
+#define TC_AT_INGRESS_MASK (1 << 5)
#else
-#define SKB_MONO_DELIVERY_TIME_MASK (1 << 0)
-#define TC_AT_INGRESS_MASK (1 << 1)
+#define SKB_TSTAMP_TYPE_MASK (3)
+#define TC_AT_INGRESS_MASK (1 << 2)
#endif
#define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset)

@@ -4213,6 +4215,9 @@ static inline void skb_set_delivery_type_by_clockid(struct sk_buff *skb,
case CLOCK_MONOTONIC:
tstamp_type = SKB_CLOCK_MONOTONIC;
break;
+ case CLOCK_TAI:
+ tstamp_type = SKB_CLOCK_TAI;
+ break;
default:
WARN_ON_ONCE(1);
kt = 0;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 90706a47f6ff..25ea393cf084 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6207,12 +6207,17 @@ union { \
__u64 :64; \
} __attribute__((aligned(8)))

+/* The enum used in skb->tstamp_type. It specifies the clock type
+ * of the time stored in the skb->tstamp.
+ */
enum {
- BPF_SKB_TSTAMP_UNSPEC,
- BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
- /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
- * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
- * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
+ BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
+ BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
+ BPF_SKB_CLOCK_REALTIME = 0,
+ BPF_SKB_CLOCK_MONOTONIC = 1,
+ BPF_SKB_CLOCK_TAI = 2,
+ /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
+ * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
*/
};

diff --git a/net/core/filter.c b/net/core/filter.c
index a3781a796da4..9f3df4a0d1ee 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7726,16 +7726,20 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
return -EOPNOTSUPP;

switch (tstamp_type) {
- case BPF_SKB_TSTAMP_DELIVERY_MONO:
+ case BPF_SKB_CLOCK_MONOTONIC:
if (!tstamp)
return -EINVAL;
skb->tstamp = tstamp;
skb->tstamp_type = SKB_CLOCK_MONOTONIC;
break;
- case BPF_SKB_TSTAMP_UNSPEC:
- if (tstamp)
+ case BPF_SKB_CLOCK_TAI:
+ if (!tstamp)
return -EINVAL;
- skb->tstamp = 0;
+ skb->tstamp = tstamp;
+ skb->tstamp_type = SKB_CLOCK_TAI;
+ break;
+ case BPF_SKB_CLOCK_REALTIME:
+ skb->tstamp = tstamp;
skb->tstamp_type = SKB_CLOCK_REALTIME;
break;
default:
@@ -9387,16 +9391,17 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
{
__u8 value_reg = si->dst_reg;
__u8 skb_reg = si->src_reg;
- /* AX is needed because src_reg and dst_reg could be the same */
- __u8 tmp_reg = BPF_REG_AX;
-
- *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
- SKB_BF_MONO_TC_OFFSET);
- *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
- SKB_MONO_DELIVERY_TIME_MASK, 2);
- *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
- *insn++ = BPF_JMP_A(1);
- *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
+ BUILD_BUG_ON(__SKB_CLOCK_MAX != (int)BPF_SKB_CLOCK_TAI);
+ BUILD_BUG_ON(SKB_CLOCK_REALTIME != (int)BPF_SKB_CLOCK_REALTIME);
+ BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != (int)BPF_SKB_CLOCK_MONOTONIC);
+ BUILD_BUG_ON(SKB_CLOCK_TAI != (int)BPF_SKB_CLOCK_TAI);
+ *insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
+ *insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK);
+#ifdef __BIG_ENDIAN_BITFIELD
+ *insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSHIFT);
+#else
+ BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1));
+#endif

return insn;
}
@@ -9439,10 +9444,11 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
__u8 tmp_reg = BPF_REG_AX;

*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
- *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
- TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
- *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
- TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
+ /* check if ingress mask bits is set */
+ *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
+ *insn++ = BPF_JMP_A(4);
+ *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, SKB_TSTAMP_TYPE_MASK, 1);
+ *insn++ = BPF_JMP_A(2);
/* skb->tc_at_ingress && skb->tstamp_type,
* read 0 as the (rcv) timestamp.
*/
@@ -9479,7 +9485,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
/* goto <store> */
*insn++ = BPF_JMP_A(2);
/* <clear>: skb->tstamp_type */
- *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK);
+ *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_TSTAMP_TYPE_MASK);
*insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET);
}
#endif
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fe86cadfa85b..c3d852eecb01 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1457,7 +1457,10 @@ struct sk_buff *__ip_make_skb(struct sock *sk,

skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
skb->mark = cork->mark;
- skb->tstamp = cork->transmit_time;
+ if (sk_is_tcp(sk))
+ skb_set_delivery_type_by_clockid(skb, cork->transmit_time, CLOCK_MONOTONIC);
+ else
+ skb_set_delivery_type_by_clockid(skb, cork->transmit_time, sk->sk_clockid);
/*
* Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
* on dst refcount
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 4cb43401e0e0..1a0953650356 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -360,7 +360,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
skb->protocol = htons(ETH_P_IP);
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc->mark;
- skb->tstamp = sockc->transmit_time;
+ skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);
skb_dst_set(skb, &rt->dst);
*rtp = NULL;

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 05067bd44775..797a9764e8fe 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1924,7 +1924,10 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,

skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = cork->base.mark;
- skb->tstamp = cork->base.transmit_time;
+ if (sk_is_tcp(sk))
+ skb_set_delivery_type_by_clockid(skb, cork->base.transmit_time, CLOCK_MONOTONIC);
+ else
+ skb_set_delivery_type_by_clockid(skb, cork->base.transmit_time, sk->sk_clockid);

ip6_cork_steal_dst(skb, cork);
IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 2eedf255600b..f838366e8256 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -621,7 +621,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
skb->protocol = htons(ETH_P_IPV6);
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc->mark;
- skb->tstamp = sockc->transmit_time;
+ skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, sk->sk_clockid);

skb_put(skb, length);
skb_reset_network_header(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8c6d3fbb4ed8..89b54021d196 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2056,8 +2056,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
skb->dev = dev;
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = READ_ONCE(sk->sk_mark);
- skb->tstamp = sockc.transmit_time;
-
+ skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid);
skb_setup_tx_timestamp(skb, sockc.tsflags);

if (unlikely(extra_len == 4))
@@ -2585,7 +2584,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
skb->dev = dev;
skb->priority = READ_ONCE(po->sk.sk_priority);
skb->mark = READ_ONCE(po->sk.sk_mark);
- skb->tstamp = sockc->transmit_time;
+ skb_set_delivery_type_by_clockid(skb, sockc->transmit_time, po->sk.sk_clockid);
skb_setup_tx_timestamp(skb, sockc->tsflags);
skb_zcopy_set_nouarg(skb, ph.raw);

@@ -3063,7 +3062,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
skb->dev = dev;
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc.mark;
- skb->tstamp = sockc.transmit_time;
+ skb_set_delivery_type_by_clockid(skb, sockc.transmit_time, sk->sk_clockid);

if (unlikely(extra_len == 4))
skb->no_fcs = 1;
--
2.25.1


2024-05-04 03:22:15

by Abhishek Chauhan

[permalink] [raw]
Subject: [RFC PATCH bpf-next v6 1/3] net: Rename mono_delivery_time to tstamp_type for scalabilty

mono_delivery_time was added to check if skb->tstamp has delivery
time in mono clock base (i.e. EDT) otherwise skb->tstamp has
timestamp in ingress and delivery_time at egress.

Renaming the bitfield from mono_delivery_time to tstamp_type is for
extensibilty for other timestamps such as userspace timestamp
(i.e. SO_TXTIME) set via sock opts.

As we are renaming the mono_delivery_time to tstamp_type, it makes
sense to start assigning tstamp_type based on enum defined
in this commit.

Earlier we used bool arg flag to check if the tstamp is mono in
function skb_set_delivery_time, Now the signature of the functions
accepts tstamp_type to distinguish between mono and real time.

Also skb_set_delivery_type_by_clockid is a new function which accepts
clockid to determine the tstamp_type.

In future tstamp_type:1 can be extended to support userspace timestamp
by increasing the bitfield.

Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Abhishek Chauhan <[email protected]>
---
Changes since v5
- Avoided using garble function names as mentioned by
Willem.
- Implemented a conversion function stead of duplicating
the same logic as mentioned by Willem.
- Fixed indentation problems and minor documentation issues
which mentions tstamp_type as a whole instead of bitfield
notations. (Mentioned both by Willem and Martin)

Changes since v4
- Introduce new function to directly delivery_time and
another to set tstamp_type based on clockid.
- Removed un-necessary comments in skbuff.h as
enums were obvious and understood.

Changes since v3
- Fixed inconsistent capitalization in skbuff.h
- remove reference to MONO_DELIVERY_TIME_MASK in skbuff.h
and point it to skb_tstamp_type now.
- Explicitely setting SKB_CLOCK_MONO if valid transmit_time
ip_send_unicast_reply
- Keeping skb_tstamp inline with skb_clear_tstamp.
- skb_set_delivery_time checks if timstamp is 0 and
sets the tstamp_type to SKB_CLOCK_REAL.
- Above comments are given by Willem
- Found out that skbuff.h has access to uapi/linux/time.h
So now instead of using CLOCK_REAL/CLOCK_MONO
i am checking actual clockid_t directly to set tstamp_type
example:- CLOCK_REALTIME/CLOCK_MONOTONIC
- Compilation error fixed in
net/ieee802154/6lowpan/reassembly.c

Changes since v2
- Minor changes to commit subject

Changes since v1
- Squashed the two commits into one as mentioned by Willem.
- Introduced switch in skb_set_delivery_time.
- Renamed and removed directionality aspects w.r.t tstamp_type
as mentioned by Willem.

include/linux/skbuff.h | 53 ++++++++++++++++------
include/net/inet_frag.h | 4 +-
net/bridge/netfilter/nf_conntrack_bridge.c | 6 +--
net/core/dev.c | 2 +-
net/core/filter.c | 10 ++--
net/ieee802154/6lowpan/reassembly.c | 2 +-
net/ipv4/inet_fragment.c | 2 +-
net/ipv4/ip_fragment.c | 2 +-
net/ipv4/ip_output.c | 9 ++--
net/ipv4/tcp_output.c | 16 +++----
net/ipv6/ip6_output.c | 6 +--
net/ipv6/netfilter.c | 6 +--
net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
net/ipv6/reassembly.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
net/sched/act_bpf.c | 4 +-
net/sched/cls_bpf.c | 4 +-
17 files changed, 80 insertions(+), 52 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 1c2902eaebd3..de3915e2bfdb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -706,6 +706,11 @@ typedef unsigned int sk_buff_data_t;
typedef unsigned char *sk_buff_data_t;
#endif

+enum skb_tstamp_type {
+ SKB_CLOCK_REALTIME,
+ SKB_CLOCK_MONOTONIC,
+};
+
/**
* DOC: Basic sk_buff geometry
*
@@ -823,10 +828,9 @@ typedef unsigned char *sk_buff_data_t;
* @dst_pending_confirm: need to confirm neighbour
* @decrypted: Decrypted SKB
* @slow_gro: state present at GRO time, slower prepare step required
- * @mono_delivery_time: When set, skb->tstamp has the
- * delivery_time in mono clock base (i.e. EDT). Otherwise, the
- * skb->tstamp has the (rcv) timestamp at ingress and
- * delivery_time at egress.
+ * @tstamp_type: When set, skb->tstamp has the
+ * delivery_time in mono clock base Otherwise, the
+ * timestamp is considered real clock base.
* @napi_id: id of the NAPI struct this skb came from
* @sender_cpu: (aka @napi_id) source CPU in XPS
* @alloc_cpu: CPU which did the skb allocation.
@@ -954,7 +958,7 @@ struct sk_buff {
/* private: */
__u8 __mono_tc_offset[0];
/* public: */
- __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */
+ __u8 tstamp_type:1; /* See skb_tstamp_type */
#ifdef CONFIG_NET_XGRESS
__u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
__u8 tc_skip_classify:1;
@@ -4179,7 +4183,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
static inline void __net_timestamp(struct sk_buff *skb)
{
skb->tstamp = ktime_get_real();
- skb->mono_delivery_time = 0;
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
}

static inline ktime_t net_timedelta(ktime_t t)
@@ -4188,10 +4192,33 @@ static inline ktime_t net_timedelta(ktime_t t)
}

static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
- bool mono)
+ u8 tstamp_type)
{
skb->tstamp = kt;
- skb->mono_delivery_time = kt && mono;
+
+ if (kt)
+ skb->tstamp_type = tstamp_type;
+ else
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
+}
+
+static inline void skb_set_delivery_type_by_clockid(struct sk_buff *skb,
+ ktime_t kt, clockid_t clockid)
+{
+ u8 tstamp_type = SKB_CLOCK_REALTIME;
+
+ switch (clockid) {
+ case CLOCK_REALTIME:
+ break;
+ case CLOCK_MONOTONIC:
+ tstamp_type = SKB_CLOCK_MONOTONIC;
+ break;
+ default:
+ WARN_ON_ONCE(1);
+ kt = 0;
+ }
+
+ skb_set_delivery_time(skb, kt, tstamp_type);
}

DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
@@ -4201,8 +4228,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
*/
static inline void skb_clear_delivery_time(struct sk_buff *skb)
{
- if (skb->mono_delivery_time) {
- skb->mono_delivery_time = 0;
+ if (skb->tstamp_type) {
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
if (static_branch_unlikely(&netstamp_needed_key))
skb->tstamp = ktime_get_real();
else
@@ -4212,7 +4239,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb)

static inline void skb_clear_tstamp(struct sk_buff *skb)
{
- if (skb->mono_delivery_time)
+ if (skb->tstamp_type)
return;

skb->tstamp = 0;
@@ -4220,7 +4247,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)

static inline ktime_t skb_tstamp(const struct sk_buff *skb)
{
- if (skb->mono_delivery_time)
+ if (skb->tstamp_type)
return 0;

return skb->tstamp;
@@ -4228,7 +4255,7 @@ static inline ktime_t skb_tstamp(const struct sk_buff *skb)

static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond)
{
- if (!skb->mono_delivery_time && skb->tstamp)
+ if (skb->tstamp_type != SKB_CLOCK_MONOTONIC && skb->tstamp)
return skb->tstamp;

if (static_branch_unlikely(&netstamp_needed_key) || cond)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 153960663ce4..5af6eb14c5db 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -76,7 +76,7 @@ struct frag_v6_compare_key {
* @stamp: timestamp of the last received fragment
* @len: total length of the original datagram
* @meat: length of received fragments so far
- * @mono_delivery_time: stamp has a mono delivery time (EDT)
+ * @tstamp_type: stamp has a mono delivery time (EDT)
* @flags: fragment queue flags
* @max_size: maximum received fragment size
* @fqdir: pointer to struct fqdir
@@ -97,7 +97,7 @@ struct inet_frag_queue {
ktime_t stamp;
int len;
int meat;
- u8 mono_delivery_time;
+ u8 tstamp_type;
__u8 flags;
u16 max_size;
struct fqdir *fqdir;
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index c3c51b9a6826..816bb0fde718 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -32,7 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
struct sk_buff *))
{
int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
- bool mono_delivery_time = skb->mono_delivery_time;
+ u8 tstamp_type = skb->tstamp_type;
unsigned int hlen, ll_rs, mtu;
ktime_t tstamp = skb->tstamp;
struct ip_frag_state state;
@@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
if (iter.frag)
ip_fraglist_prepare(skb, &iter);

- skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb, tstamp, tstamp_type);
err = output(net, sk, data, skb);
if (err || !iter.frag)
break;
@@ -113,7 +113,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
goto blackhole;
}

- skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb2, tstamp, tstamp_type);
err = output(net, sk, data, skb2);
if (err)
goto blackhole;
diff --git a/net/core/dev.c b/net/core/dev.c
index e02d2363347e..c1e4101f5c7b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2161,7 +2161,7 @@ EXPORT_SYMBOL(net_disable_timestamp);
static inline void net_timestamp_set(struct sk_buff *skb)
{
skb->tstamp = 0;
- skb->mono_delivery_time = 0;
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
if (static_branch_unlikely(&netstamp_needed_key))
skb->tstamp = ktime_get_real();
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 2510464692af..a3781a796da4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7730,13 +7730,13 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
if (!tstamp)
return -EINVAL;
skb->tstamp = tstamp;
- skb->mono_delivery_time = 1;
+ skb->tstamp_type = SKB_CLOCK_MONOTONIC;
break;
case BPF_SKB_TSTAMP_UNSPEC:
if (tstamp)
return -EINVAL;
skb->tstamp = 0;
- skb->mono_delivery_time = 0;
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
break;
default:
return -EINVAL;
@@ -9443,7 +9443,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
*insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
- /* skb->tc_at_ingress && skb->mono_delivery_time,
+ /* skb->tc_at_ingress && skb->tstamp_type,
* read 0 as the (rcv) timestamp.
*/
*insn++ = BPF_MOV64_IMM(value_reg, 0);
@@ -9468,7 +9468,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
* the bpf prog is aware the tstamp could have delivery time.
* Thus, write skb->tstamp as is if tstamp_type_access is true.
* Otherwise, writing at ingress will have to clear the
- * mono_delivery_time bit also.
+ * skb->tstamp_type bit also.
*/
if (!prog->tstamp_type_access) {
__u8 tmp_reg = BPF_REG_AX;
@@ -9478,7 +9478,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
/* goto <store> */
*insn++ = BPF_JMP_A(2);
- /* <clear>: mono_delivery_time */
+ /* <clear>: skb->tstamp_type */
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK);
*insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET);
}
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 2a983cf450da..26506dd4b357 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -130,7 +130,7 @@ static int lowpan_frag_queue(struct lowpan_frag_queue *fq,
goto err;

fq->q.stamp = skb->tstamp;
- fq->q.mono_delivery_time = skb->mono_delivery_time;
+ fq->q.tstamp_type = skb->tstamp_type;
if (frag_type == LOWPAN_DISPATCH_FRAG1)
fq->q.flags |= INET_FRAG_FIRST_IN;

diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index faaec92a46ac..d179a2c84222 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -619,7 +619,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
skb_mark_not_on_list(head);
head->prev = NULL;
head->tstamp = q->stamp;
- head->mono_delivery_time = q->mono_delivery_time;
+ head->tstamp_type = q->tstamp_type;

if (sk)
refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 534b98a0744a..c101bb1b9d3c 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -355,7 +355,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
qp->iif = dev->ifindex;

qp->q.stamp = skb->tstamp;
- qp->q.mono_delivery_time = skb->mono_delivery_time;
+ qp->q.tstamp_type = skb->tstamp_type;
qp->q.meat += skb->len;
qp->ecn |= ecn;
add_frag_mem_limit(qp->q.fqdir, skb->truesize);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 9500031a1f55..fe86cadfa85b 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -764,7 +764,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
{
struct iphdr *iph;
struct sk_buff *skb2;
- bool mono_delivery_time = skb->mono_delivery_time;
+ u8 tstamp_type = skb->tstamp_type;
struct rtable *rt = skb_rtable(skb);
unsigned int mtu, hlen, ll_rs;
struct ip_fraglist_iter iter;
@@ -856,7 +856,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
}
}

- skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb, tstamp, tstamp_type);
err = output(net, sk, skb);

if (!err)
@@ -912,7 +912,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
/*
* Put this fragment into the sending queue.
*/
- skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb2, tstamp, tstamp_type);
err = output(net, sk, skb2);
if (err)
goto fail;
@@ -1649,7 +1649,8 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
arg->csumoffset) = csum_fold(csum_add(nskb->csum,
arg->csum));
nskb->ip_summed = CHECKSUM_NONE;
- nskb->mono_delivery_time = !!transmit_time;
+ if (transmit_time)
+ nskb->tstamp_type = SKB_CLOCK_MONOTONIC;
if (txhash)
skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4);
ip_push_pending_frames(sk, &fl4);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 57edf66ff91b..26fda2b2518d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1301,7 +1301,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
tp = tcp_sk(sk);
prior_wstamp = tp->tcp_wstamp_ns;
tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
- skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
+ skb_set_delivery_type_by_clockid(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC);
if (clone_it) {
oskb = skb;

@@ -1655,7 +1655,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,

skb_split(skb, buff, len);

- skb_set_delivery_time(buff, skb->tstamp, true);
+ skb_set_delivery_type_by_clockid(buff, skb->tstamp, CLOCK_MONOTONIC);
tcp_fragment_tstamp(skb, buff);

old_factor = tcp_skb_pcount(skb);
@@ -2764,7 +2764,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
/* "skb_mstamp_ns" is used as a start point for the retransmit timer */
tp->tcp_wstamp_ns = tp->tcp_clock_cache;
- skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
+ skb_set_delivery_type_by_clockid(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC);
list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
tcp_init_tso_segs(skb, mss_now);
goto repair; /* Skip network transmission */
@@ -3749,12 +3749,12 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
now = tcp_clock_ns();
#ifdef CONFIG_SYN_COOKIES
if (unlikely(synack_type == TCP_SYNACK_COOKIE && ireq->tstamp_ok))
- skb_set_delivery_time(skb, cookie_init_timestamp(req, now),
- true);
+ skb_set_delivery_type_by_clockid(skb, cookie_init_timestamp(req, now),
+ CLOCK_MONOTONIC);
else
#endif
{
- skb_set_delivery_time(skb, now, true);
+ skb_set_delivery_type_by_clockid(skb, now, CLOCK_MONOTONIC);
if (!tcp_rsk(req)->snt_synack) /* Timestamp first SYNACK */
tcp_rsk(req)->snt_synack = tcp_skb_timestamp_us(skb);
}
@@ -3841,7 +3841,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb,
synack_type, &opts);

- skb_set_delivery_time(skb, now, true);
+ skb_set_delivery_type_by_clockid(skb, now, CLOCK_MONOTONIC);
tcp_add_tx_delay(skb, tp);

return skb;
@@ -4025,7 +4025,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)

err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);

- skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, true);
+ skb_set_delivery_type_by_clockid(syn, syn_data->skb_mstamp_ns, CLOCK_MONOTONIC);

/* Now full SYN+DATA was cloned and sent (or not),
* remove the SYN from the original skb (syn_data)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 8f906e9fbc38..05067bd44775 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -859,7 +859,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
struct rt6_info *rt = dst_rt6_info(skb_dst(skb));
struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
inet6_sk(skb->sk) : NULL;
- bool mono_delivery_time = skb->mono_delivery_time;
+ u8 tstamp_type = skb->tstamp_type;
struct ip6_frag_state state;
unsigned int mtu, hlen, nexthdr_offset;
ktime_t tstamp = skb->tstamp;
@@ -955,7 +955,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (iter.frag)
ip6_fraglist_prepare(skb, &iter);

- skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb, tstamp, tstamp_type);
err = output(net, sk, skb);
if (!err)
IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
@@ -1016,7 +1016,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
/*
* Put this fragment into the sending queue.
*/
- skb_set_delivery_time(frag, tstamp, mono_delivery_time);
+ skb_set_delivery_time(frag, tstamp, tstamp_type);
err = output(net, sk, frag);
if (err)
goto fail;
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 53d255838e6a..e0c2347b4dc6 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -126,7 +126,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
struct sk_buff *))
{
int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
- bool mono_delivery_time = skb->mono_delivery_time;
+ u8 tstamp_type = skb->tstamp_type;
ktime_t tstamp = skb->tstamp;
struct ip6_frag_state state;
u8 *prevhdr, nexthdr = 0;
@@ -192,7 +192,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (iter.frag)
ip6_fraglist_prepare(skb, &iter);

- skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb, tstamp, tstamp_type);
err = output(net, sk, data, skb);
if (err || !iter.frag)
break;
@@ -225,7 +225,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
goto blackhole;
}

- skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb2, tstamp, tstamp_type);
err = output(net, sk, data, skb2);
if (err)
goto blackhole;
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index ce8c14d8aff5..1e0cdad52207 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
fq->iif = dev->ifindex;

fq->q.stamp = skb->tstamp;
- fq->q.mono_delivery_time = skb->mono_delivery_time;
+ fq->q.tstamp_type = skb->tstamp_type;
fq->q.meat += skb->len;
fq->ecn |= ecn;
if (payload_len > fq->q.max_size)
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index ee95cdcc8747..fe7a337b6bc7 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -198,7 +198,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
fq->iif = dev->ifindex;

fq->q.stamp = skb->tstamp;
- fq->q.mono_delivery_time = skb->mono_delivery_time;
+ fq->q.tstamp_type = skb->tstamp_type;
fq->q.meat += skb->len;
fq->ecn |= ecn;
add_frag_mem_limit(fq->q.fqdir, skb->truesize);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 37201c4fb393..bf6e8c18fbb9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -975,7 +975,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
mark = inet_twsk(sk)->tw_mark;
else
mark = READ_ONCE(sk->sk_mark);
- skb_set_delivery_time(buff, tcp_transmit_time(sk), true);
+ skb_set_delivery_type_by_clockid(buff, tcp_transmit_time(sk), CLOCK_MONOTONIC);
}
if (txhash) {
/* autoflowlabel/skb_get_hash_flowi6 rely on buff->hash */
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 0e3cf11ae5fc..396b576390d0 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -54,8 +54,8 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb,
bpf_compute_data_pointers(skb);
filter_res = bpf_prog_run(filter, skb);
}
- if (unlikely(!skb->tstamp && skb->mono_delivery_time))
- skb->mono_delivery_time = 0;
+ if (unlikely(!skb->tstamp && skb->tstamp_type))
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK)
skb_orphan(skb);

diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 5e83e890f6a4..1941ebec23ff 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -104,8 +104,8 @@ TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb,
bpf_compute_data_pointers(skb);
filter_res = bpf_prog_run(prog->filter, skb);
}
- if (unlikely(!skb->tstamp && skb->mono_delivery_time))
- skb->mono_delivery_time = 0;
+ if (unlikely(!skb->tstamp && skb->tstamp_type))
+ skb->tstamp_type = SKB_CLOCK_REALTIME;

if (prog->exts_integrated) {
res->class = 0;
--
2.25.1


2024-05-04 03:22:21

by Abhishek Chauhan

[permalink] [raw]
Subject: [RFC PATCH bpf-next v6 3/3] selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets

With changes in the design to forward CLOCK_TAI in the skbuff
framework, existing selftest framework needs modification
to handle forwarding of UDP packets with CLOCK_TAI as clockid.

Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Abhishek Chauhan <[email protected]>
---
tools/include/uapi/linux/bpf.h | 15 ++++---
.../selftests/bpf/prog_tests/ctx_rewrite.c | 10 +++--
.../selftests/bpf/prog_tests/tc_redirect.c | 3 --
.../selftests/bpf/progs/test_tc_dtime.c | 39 +++++++++----------
4 files changed, 34 insertions(+), 33 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 90706a47f6ff..25ea393cf084 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -6207,12 +6207,17 @@ union { \
__u64 :64; \
} __attribute__((aligned(8)))

+/* The enum used in skb->tstamp_type. It specifies the clock type
+ * of the time stored in the skb->tstamp.
+ */
enum {
- BPF_SKB_TSTAMP_UNSPEC,
- BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
- /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
- * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
- * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
+ BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
+ BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
+ BPF_SKB_CLOCK_REALTIME = 0,
+ BPF_SKB_CLOCK_MONOTONIC = 1,
+ BPF_SKB_CLOCK_TAI = 2,
+ /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
+ * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
*/
};

diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
index 3b7c57fe55a5..71940f4ef0fb 100644
--- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
+++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
@@ -69,15 +69,17 @@ static struct test_case test_cases[] = {
{
N(SCHED_CLS, struct __sk_buff, tstamp),
.read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
- "w11 &= 3;"
- "if w11 != 0x3 goto pc+2;"
+ "if w11 == 0x4 goto pc+1;"
+ "goto pc+4;"
+ "if w11 == 0x3 goto pc+1;"
+ "goto pc+2;"
"$dst = 0;"
"goto pc+1;"
"$dst = *(u64 *)($ctx + sk_buff::tstamp);",
.write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
- "if w11 & 0x2 goto pc+1;"
+ "if w11 & 0x4 goto pc+1;"
"goto pc+2;"
- "w11 &= -2;"
+ "w11 &= -3;"
"*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;"
"*(u64 *)($ctx + sk_buff::tstamp) = $src;",
},
diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
index b1073d36d77a..327d51f59142 100644
--- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
@@ -890,9 +890,6 @@ static void test_udp_dtime(struct test_tc_dtime *skel, int family, bool bpf_fwd)

ASSERT_EQ(dtimes[INGRESS_FWDNS_P100], 0,
dtime_cnt_str(t, INGRESS_FWDNS_P100));
- /* non mono delivery time is not forwarded */
- ASSERT_EQ(dtimes[INGRESS_FWDNS_P101], 0,
- dtime_cnt_str(t, INGRESS_FWDNS_P101));
for (i = EGRESS_FWDNS_P100; i < SET_DTIME; i++)
ASSERT_GT(dtimes[i], 0, dtime_cnt_str(t, i));

diff --git a/tools/testing/selftests/bpf/progs/test_tc_dtime.c b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
index 74ec09f040b7..21f5be202e4b 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_dtime.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
@@ -222,13 +222,19 @@ int egress_host(struct __sk_buff *skb)
return TC_ACT_OK;

if (skb_proto(skb_type) == IPPROTO_TCP) {
- if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO &&
+ if (skb->tstamp_type == BPF_SKB_CLOCK_MONOTONIC &&
+ skb->tstamp)
+ inc_dtimes(EGRESS_ENDHOST);
+ else
+ inc_errs(EGRESS_ENDHOST);
+ } else if (skb_proto(skb_type) == IPPROTO_UDP) {
+ if (skb->tstamp_type == BPF_SKB_CLOCK_TAI &&
skb->tstamp)
inc_dtimes(EGRESS_ENDHOST);
else
inc_errs(EGRESS_ENDHOST);
} else {
- if (skb->tstamp_type == BPF_SKB_TSTAMP_UNSPEC &&
+ if (skb->tstamp_type == BPF_SKB_CLOCK_REALTIME &&
skb->tstamp)
inc_dtimes(EGRESS_ENDHOST);
else
@@ -252,7 +258,7 @@ int ingress_host(struct __sk_buff *skb)
if (!skb_type)
return TC_ACT_OK;

- if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO &&
+ if (skb->tstamp_type == BPF_SKB_CLOCK_MONOTONIC &&
skb->tstamp == EGRESS_FWDNS_MAGIC)
inc_dtimes(INGRESS_ENDHOST);
else
@@ -315,7 +321,6 @@ int egress_fwdns_prio100(struct __sk_buff *skb)
SEC("tc")
int ingress_fwdns_prio101(struct __sk_buff *skb)
{
- __u64 expected_dtime = EGRESS_ENDHOST_MAGIC;
int skb_type;

skb_type = skb_get_type(skb);
@@ -323,29 +328,24 @@ int ingress_fwdns_prio101(struct __sk_buff *skb)
/* Should have handled in prio100 */
return TC_ACT_SHOT;

- if (skb_proto(skb_type) == IPPROTO_UDP)
- expected_dtime = 0;
-
if (skb->tstamp_type) {
if (fwdns_clear_dtime() ||
- skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
- skb->tstamp != expected_dtime)
+ (skb->tstamp_type != BPF_SKB_CLOCK_MONOTONIC &&
+ skb->tstamp_type != BPF_SKB_CLOCK_TAI) ||
+ skb->tstamp != EGRESS_ENDHOST_MAGIC)
inc_errs(INGRESS_FWDNS_P101);
else
inc_dtimes(INGRESS_FWDNS_P101);
} else {
- if (!fwdns_clear_dtime() && expected_dtime)
+ if (!fwdns_clear_dtime())
inc_errs(INGRESS_FWDNS_P101);
}

- if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
+ if (skb->tstamp_type == BPF_SKB_CLOCK_MONOTONIC) {
skb->tstamp = INGRESS_FWDNS_MAGIC;
} else {
if (bpf_skb_set_tstamp(skb, INGRESS_FWDNS_MAGIC,
- BPF_SKB_TSTAMP_DELIVERY_MONO))
- inc_errs(SET_DTIME);
- if (!bpf_skb_set_tstamp(skb, INGRESS_FWDNS_MAGIC,
- BPF_SKB_TSTAMP_UNSPEC))
+ BPF_SKB_CLOCK_MONOTONIC))
inc_errs(SET_DTIME);
}

@@ -370,7 +370,7 @@ int egress_fwdns_prio101(struct __sk_buff *skb)

if (skb->tstamp_type) {
if (fwdns_clear_dtime() ||
- skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
+ skb->tstamp_type != BPF_SKB_CLOCK_MONOTONIC ||
skb->tstamp != INGRESS_FWDNS_MAGIC)
inc_errs(EGRESS_FWDNS_P101);
else
@@ -380,14 +380,11 @@ int egress_fwdns_prio101(struct __sk_buff *skb)
inc_errs(EGRESS_FWDNS_P101);
}

- if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
+ if (skb->tstamp_type == BPF_SKB_CLOCK_MONOTONIC) {
skb->tstamp = EGRESS_FWDNS_MAGIC;
} else {
if (bpf_skb_set_tstamp(skb, EGRESS_FWDNS_MAGIC,
- BPF_SKB_TSTAMP_DELIVERY_MONO))
- inc_errs(SET_DTIME);
- if (!bpf_skb_set_tstamp(skb, INGRESS_FWDNS_MAGIC,
- BPF_SKB_TSTAMP_UNSPEC))
+ BPF_SKB_CLOCK_MONOTONIC))
inc_errs(SET_DTIME);
}

--
2.25.1


2024-05-06 18:55:05

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 1/3] net: Rename mono_delivery_time to tstamp_type for scalabilty

Abhishek Chauhan wrote:
> mono_delivery_time was added to check if skb->tstamp has delivery
> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
> timestamp in ingress and delivery_time at egress.
>
> Renaming the bitfield from mono_delivery_time to tstamp_type is for
> extensibilty for other timestamps such as userspace timestamp
> (i.e. SO_TXTIME) set via sock opts.
>
> As we are renaming the mono_delivery_time to tstamp_type, it makes
> sense to start assigning tstamp_type based on enum defined
> in this commit.
>
> Earlier we used bool arg flag to check if the tstamp is mono in
> function skb_set_delivery_time, Now the signature of the functions
> accepts tstamp_type to distinguish between mono and real time.
>
> Also skb_set_delivery_type_by_clockid is a new function which accepts
> clockid to determine the tstamp_type.
>
> In future tstamp_type:1 can be extended to support userspace timestamp
> by increasing the bitfield.
>
> Link: https://lore.kernel.org/netdev/[email protected]/
> Signed-off-by: Abhishek Chauhan <[email protected]>
> ---
> Changes since v5
> - Avoided using garble function names as mentioned by
> Willem.
> - Implemented a conversion function stead of duplicating
> the same logic as mentioned by Willem.
> - Fixed indentation problems and minor documentation issues
> which mentions tstamp_type as a whole instead of bitfield
> notations. (Mentioned both by Willem and Martin)
>
> Changes since v4
> - Introduce new function to directly delivery_time and
> another to set tstamp_type based on clockid.
> - Removed un-necessary comments in skbuff.h as
> enums were obvious and understood.
>
> Changes since v3
> - Fixed inconsistent capitalization in skbuff.h
> - remove reference to MONO_DELIVERY_TIME_MASK in skbuff.h
> and point it to skb_tstamp_type now.
> - Explicitely setting SKB_CLOCK_MONO if valid transmit_time
> ip_send_unicast_reply
> - Keeping skb_tstamp inline with skb_clear_tstamp.
> - skb_set_delivery_time checks if timstamp is 0 and
> sets the tstamp_type to SKB_CLOCK_REAL.
> - Above comments are given by Willem
> - Found out that skbuff.h has access to uapi/linux/time.h
> So now instead of using CLOCK_REAL/CLOCK_MONO
> i am checking actual clockid_t directly to set tstamp_type
> example:- CLOCK_REALTIME/CLOCK_MONOTONIC
> - Compilation error fixed in
> net/ieee802154/6lowpan/reassembly.c
>
> Changes since v2
> - Minor changes to commit subject
>
> Changes since v1
> - Squashed the two commits into one as mentioned by Willem.
> - Introduced switch in skb_set_delivery_time.
> - Renamed and removed directionality aspects w.r.t tstamp_type
> as mentioned by Willem.
>
> include/linux/skbuff.h | 53 ++++++++++++++++------
> include/net/inet_frag.h | 4 +-
> net/bridge/netfilter/nf_conntrack_bridge.c | 6 +--
> net/core/dev.c | 2 +-
> net/core/filter.c | 10 ++--
> net/ieee802154/6lowpan/reassembly.c | 2 +-
> net/ipv4/inet_fragment.c | 2 +-
> net/ipv4/ip_fragment.c | 2 +-
> net/ipv4/ip_output.c | 9 ++--
> net/ipv4/tcp_output.c | 16 +++----
> net/ipv6/ip6_output.c | 6 +--
> net/ipv6/netfilter.c | 6 +--
> net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
> net/ipv6/reassembly.c | 2 +-
> net/ipv6/tcp_ipv6.c | 2 +-
> net/sched/act_bpf.c | 4 +-
> net/sched/cls_bpf.c | 4 +-
> 17 files changed, 80 insertions(+), 52 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 1c2902eaebd3..de3915e2bfdb 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -706,6 +706,11 @@ typedef unsigned int sk_buff_data_t;
> typedef unsigned char *sk_buff_data_t;
> #endif
>
> +enum skb_tstamp_type {
> + SKB_CLOCK_REALTIME,
> + SKB_CLOCK_MONOTONIC,
> +};
> +
> /**
> * DOC: Basic sk_buff geometry
> *
> @@ -823,10 +828,9 @@ typedef unsigned char *sk_buff_data_t;
> * @dst_pending_confirm: need to confirm neighbour
> * @decrypted: Decrypted SKB
> * @slow_gro: state present at GRO time, slower prepare step required
> - * @mono_delivery_time: When set, skb->tstamp has the
> - * delivery_time in mono clock base (i.e. EDT). Otherwise, the
> - * skb->tstamp has the (rcv) timestamp at ingress and
> - * delivery_time at egress.
> + * @tstamp_type: When set, skb->tstamp has the
> + * delivery_time in mono clock base Otherwise, the
> + * timestamp is considered real clock base.

Missing period. More importantly, no longer conditional. It always
captures the type of skb->tstamp.

> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1301,7 +1301,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
> tp = tcp_sk(sk);
> prior_wstamp = tp->tcp_wstamp_ns;
> tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
> - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
> + skb_set_delivery_type_by_clockid(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC);
> if (clone_it) {
> oskb = skb;
>
> @@ -1655,7 +1655,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
>
> skb_split(skb, buff, len);
>
> - skb_set_delivery_time(buff, skb->tstamp, true);
> + skb_set_delivery_type_by_clockid(buff, skb->tstamp, CLOCK_MONOTONIC);
> tcp_fragment_tstamp(skb, buff);

All these hardcoded monotonic calls in TCP can be the shorter version

skb_set_delivery_type(.., SKB_CLOCK_MONOTONIC);


2024-05-06 19:00:31

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 2/3] net: Add additional bit to support clockid_t timestamp type

Abhishek Chauhan wrote:
> tstamp_type is now set based on actual clockid_t compressed
> into 2 bits.
>
> To make the design scalable for future needs this commit bring in
> the change to extend the tstamp_type:1 to tstamp_type:2 to support
> other clockid_t timestamp.
>
> We now support CLOCK_TAI as part of tstamp_type as part of this
> commit with exisiting support CLOCK_MONOTONIC and CLOCK_REALTIME.
>
> Link: https://lore.kernel.org/netdev/[email protected]/
> Signed-off-by: Abhishek Chauhan <[email protected]>
> ---
> Changes since v5
> - Took care of documentation comments of tstamp_type
> in skbuff.h as mentioned by Willem.
> - Use of complete words instead of abbrevation in
> macro definitions as mentioned by Willem.
> - Fixed indentation problems
> - Removed BPF_SKB_TSTAMP_UNSPEC and marked it
> Deprecated as documentation, and introduced
> BPF_SKB_CLOCK_REALTIME instead.
> - BUILD_BUG_ON for additional enums introduced.
> - __ip_make_skb and ip6_make_skb now has
> tcp checks to mark tcp packet as mono tstamp base.
> - separated the selftests/bpf changes into another patch.
> - Made changes as per Martin in selftest bpf code and
> tool/include/uapi/linux/bpf.h
>
> Changes since v4
> - Made changes to BPF code in filter.c as per
> Martin's comments
> - Minor fixes on comments given on documentation
> from Willem in skbuff.h (removed obvious ones)
> - Made changes to ctx_rewrite.c and test_tc_dtime.c
> - test_tc_dtime.c i am not really sure if i took care
> of all the changes as i am not too familiar with
> the framework.
> - Introduce common mask SKB_TSTAMP_TYPE_MASK instead
> of multiple SKB mask.
> - Optimisation on BPF code as suggested by Martin.
> - Set default case to SKB_CLOCK_REALTME.
>
> Changes since v3
> - Carefully reviewed BPF APIs and made changes in
> BPF code as well.
> - Re-used actual clockid_t values since skbuff.h
> indirectly includes uapi/linux/time.h
> - Added CLOCK_TAI as part of the skb_set_delivery_time
> handling instead of CLOCK_USER
> - Added default in switch for unsupported and invalid
> timestamp with an WARN_ONCE
> - All of the above comments were given by Willem
> - Made changes in filter.c as per Martin's comments
> to handle invalid cases in bpf code with addition of
> SKB_TAI_DELIVERY_TIME_MASK
>
> Changes since v2
> - Minor changes to commit subject
>
> Changes since v1
> - identified additional changes in BPF framework.
> - Bit shift in SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK.
> - Made changes in skb_set_delivery_time to keep changes similar to
> previous code for mono_delivery_time and just setting tstamp_type
> bit 1 for userspace timestamp.
>
>
> include/linux/skbuff.h | 21 +++++++++++--------
> include/uapi/linux/bpf.h | 15 +++++++++-----
> net/core/filter.c | 44 +++++++++++++++++++++++-----------------
> net/ipv4/ip_output.c | 5 ++++-
> net/ipv4/raw.c | 2 +-
> net/ipv6/ip6_output.c | 5 ++++-
> net/ipv6/raw.c | 2 +-
> net/packet/af_packet.c | 7 +++----
> 8 files changed, 61 insertions(+), 40 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index de3915e2bfdb..fe7d8dbef77e 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -709,6 +709,8 @@ typedef unsigned char *sk_buff_data_t;
> enum skb_tstamp_type {
> SKB_CLOCK_REALTIME,
> SKB_CLOCK_MONOTONIC,
> + SKB_CLOCK_TAI,
> + __SKB_CLOCK_MAX = SKB_CLOCK_TAI,
> };
>
> /**
> @@ -829,8 +831,7 @@ enum skb_tstamp_type {
> * @decrypted: Decrypted SKB
> * @slow_gro: state present at GRO time, slower prepare step required
> * @tstamp_type: When set, skb->tstamp has the
> - * delivery_time in mono clock base Otherwise, the
> - * timestamp is considered real clock base.
> + * delivery_time clock base of skb->tstamp.
> * @napi_id: id of the NAPI struct this skb came from
> * @sender_cpu: (aka @napi_id) source CPU in XPS
> * @alloc_cpu: CPU which did the skb allocation.
> @@ -958,7 +959,7 @@ struct sk_buff {
> /* private: */
> __u8 __mono_tc_offset[0];
> /* public: */
> - __u8 tstamp_type:1; /* See skb_tstamp_type */
> + __u8 tstamp_type:2; /* See skb_tstamp_type */
> #ifdef CONFIG_NET_XGRESS
> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
> __u8 tc_skip_classify:1;
> @@ -1088,15 +1089,16 @@ struct sk_buff {
> #endif
> #define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)
>
> -/* if you move tc_at_ingress or mono_delivery_time
> +/* if you move tc_at_ingress or tstamp_type
> * around, you also must adapt these constants.
> */
> #ifdef __BIG_ENDIAN_BITFIELD
> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 7)
> -#define TC_AT_INGRESS_MASK (1 << 6)
> +#define SKB_TSTAMP_TYPE_MASK (3 << 6)
> +#define SKB_TSTAMP_TYPE_RSHIFT (6)
> +#define TC_AT_INGRESS_MASK (1 << 5)
> #else
> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 0)
> -#define TC_AT_INGRESS_MASK (1 << 1)
> +#define SKB_TSTAMP_TYPE_MASK (3)
> +#define TC_AT_INGRESS_MASK (1 << 2)
> #endif
> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset)
>
> @@ -4213,6 +4215,9 @@ static inline void skb_set_delivery_type_by_clockid(struct sk_buff *skb,
> case CLOCK_MONOTONIC:
> tstamp_type = SKB_CLOCK_MONOTONIC;
> break;
> + case CLOCK_TAI:
> + tstamp_type = SKB_CLOCK_TAI;
> + break;
> default:
> WARN_ON_ONCE(1);
> kt = 0;
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 90706a47f6ff..25ea393cf084 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -6207,12 +6207,17 @@ union { \
> __u64 :64; \
> } __attribute__((aligned(8)))
>
> +/* The enum used in skb->tstamp_type. It specifies the clock type
> + * of the time stored in the skb->tstamp.
> + */
> enum {
> - BPF_SKB_TSTAMP_UNSPEC,
> - BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
> - /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
> - * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
> - * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
> + BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
> + BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
> + BPF_SKB_CLOCK_REALTIME = 0,
> + BPF_SKB_CLOCK_MONOTONIC = 1,
> + BPF_SKB_CLOCK_TAI = 2,
> + /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
> + * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
> */
> };
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index a3781a796da4..9f3df4a0d1ee 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -7726,16 +7726,20 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
> return -EOPNOTSUPP;
>
> switch (tstamp_type) {
> - case BPF_SKB_TSTAMP_DELIVERY_MONO:
> + case BPF_SKB_CLOCK_MONOTONIC:
> if (!tstamp)
> return -EINVAL;
> skb->tstamp = tstamp;
> skb->tstamp_type = SKB_CLOCK_MONOTONIC;
> break;
> - case BPF_SKB_TSTAMP_UNSPEC:
> - if (tstamp)
> + case BPF_SKB_CLOCK_TAI:
> + if (!tstamp)
> return -EINVAL;
> - skb->tstamp = 0;
> + skb->tstamp = tstamp;
> + skb->tstamp_type = SKB_CLOCK_TAI;
> + break;
> + case BPF_SKB_CLOCK_REALTIME:
> + skb->tstamp = tstamp;
> skb->tstamp_type = SKB_CLOCK_REALTIME;

Only since there is another reason to respin.

The previous code did not do this, but let's order cases by their enum
value, starting with realtime.

Also in anticipation with possible future expansions.



2024-05-06 19:04:34

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 3/3] selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets

Abhishek Chauhan wrote:
> With changes in the design to forward CLOCK_TAI in the skbuff
> framework, existing selftest framework needs modification
> to handle forwarding of UDP packets with CLOCK_TAI as clockid.
>
> Link: https://lore.kernel.org/netdev/[email protected]/
> Signed-off-by: Abhishek Chauhan <[email protected]>
> ---
> tools/include/uapi/linux/bpf.h | 15 ++++---
> .../selftests/bpf/prog_tests/ctx_rewrite.c | 10 +++--
> .../selftests/bpf/prog_tests/tc_redirect.c | 3 --
> .../selftests/bpf/progs/test_tc_dtime.c | 39 +++++++++----------
> 4 files changed, 34 insertions(+), 33 deletions(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 90706a47f6ff..25ea393cf084 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -6207,12 +6207,17 @@ union { \
> __u64 :64; \
> } __attribute__((aligned(8)))
>
> +/* The enum used in skb->tstamp_type. It specifies the clock type
> + * of the time stored in the skb->tstamp.
> + */
> enum {
> - BPF_SKB_TSTAMP_UNSPEC,
> - BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
> - /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
> - * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
> - * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
> + BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
> + BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
> + BPF_SKB_CLOCK_REALTIME = 0,
> + BPF_SKB_CLOCK_MONOTONIC = 1,
> + BPF_SKB_CLOCK_TAI = 2,
> + /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
> + * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
> */
> };
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
> index 3b7c57fe55a5..71940f4ef0fb 100644
> --- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
> +++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
> @@ -69,15 +69,17 @@ static struct test_case test_cases[] = {
> {
> N(SCHED_CLS, struct __sk_buff, tstamp),
> .read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
> - "w11 &= 3;"
> - "if w11 != 0x3 goto pc+2;"
> + "if w11 == 0x4 goto pc+1;"
> + "goto pc+4;"
> + "if w11 == 0x3 goto pc+1;"
> + "goto pc+2;"

Not an expert on this code, and I see that the existing code already
has this below, but: isn't it odd and unnecessary to jump to an
unconditional jump statement?

> "$dst = 0;"
> "goto pc+1;"
> "$dst = *(u64 *)($ctx + sk_buff::tstamp);",
> .write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
> - "if w11 & 0x2 goto pc+1;"
> + "if w11 & 0x4 goto pc+1;"
> "goto pc+2;"
> - "w11 &= -2;"
> + "w11 &= -3;"
> "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;"
> "*(u64 *)($ctx + sk_buff::tstamp) = $src;",
> },

2024-05-06 19:56:50

by Abhishek Chauhan

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 1/3] net: Rename mono_delivery_time to tstamp_type for scalabilty



On 5/6/2024 11:51 AM, Willem de Bruijn wrote:
> Abhishek Chauhan wrote:
>> mono_delivery_time was added to check if skb->tstamp has delivery
>> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
>> timestamp in ingress and delivery_time at egress.
>>
>> Renaming the bitfield from mono_delivery_time to tstamp_type is for
>> extensibilty for other timestamps such as userspace timestamp
>> (i.e. SO_TXTIME) set via sock opts.
>>
>> As we are renaming the mono_delivery_time to tstamp_type, it makes
>> sense to start assigning tstamp_type based on enum defined
>> in this commit.
>>
>> Earlier we used bool arg flag to check if the tstamp is mono in
>> function skb_set_delivery_time, Now the signature of the functions
>> accepts tstamp_type to distinguish between mono and real time.
>>
>> Also skb_set_delivery_type_by_clockid is a new function which accepts
>> clockid to determine the tstamp_type.
>>
>> In future tstamp_type:1 can be extended to support userspace timestamp
>> by increasing the bitfield.
>>
>> Link: https://lore.kernel.org/netdev/[email protected]/
>> Signed-off-by: Abhishek Chauhan <[email protected]>
>> ---
>> Changes since v5
>> - Avoided using garble function names as mentioned by
>> Willem.
>> - Implemented a conversion function stead of duplicating
>> the same logic as mentioned by Willem.
>> - Fixed indentation problems and minor documentation issues
>> which mentions tstamp_type as a whole instead of bitfield
>> notations. (Mentioned both by Willem and Martin)
>>
>> Changes since v4
>> - Introduce new function to directly delivery_time and
>> another to set tstamp_type based on clockid.
>> - Removed un-necessary comments in skbuff.h as
>> enums were obvious and understood.
>>
>> Changes since v3
>> - Fixed inconsistent capitalization in skbuff.h
>> - remove reference to MONO_DELIVERY_TIME_MASK in skbuff.h
>> and point it to skb_tstamp_type now.
>> - Explicitely setting SKB_CLOCK_MONO if valid transmit_time
>> ip_send_unicast_reply
>> - Keeping skb_tstamp inline with skb_clear_tstamp.
>> - skb_set_delivery_time checks if timstamp is 0 and
>> sets the tstamp_type to SKB_CLOCK_REAL.
>> - Above comments are given by Willem
>> - Found out that skbuff.h has access to uapi/linux/time.h
>> So now instead of using CLOCK_REAL/CLOCK_MONO
>> i am checking actual clockid_t directly to set tstamp_type
>> example:- CLOCK_REALTIME/CLOCK_MONOTONIC
>> - Compilation error fixed in
>> net/ieee802154/6lowpan/reassembly.c
>>
>> Changes since v2
>> - Minor changes to commit subject
>>
>> Changes since v1
>> - Squashed the two commits into one as mentioned by Willem.
>> - Introduced switch in skb_set_delivery_time.
>> - Renamed and removed directionality aspects w.r.t tstamp_type
>> as mentioned by Willem.
>>
>> include/linux/skbuff.h | 53 ++++++++++++++++------
>> include/net/inet_frag.h | 4 +-
>> net/bridge/netfilter/nf_conntrack_bridge.c | 6 +--
>> net/core/dev.c | 2 +-
>> net/core/filter.c | 10 ++--
>> net/ieee802154/6lowpan/reassembly.c | 2 +-
>> net/ipv4/inet_fragment.c | 2 +-
>> net/ipv4/ip_fragment.c | 2 +-
>> net/ipv4/ip_output.c | 9 ++--
>> net/ipv4/tcp_output.c | 16 +++----
>> net/ipv6/ip6_output.c | 6 +--
>> net/ipv6/netfilter.c | 6 +--
>> net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
>> net/ipv6/reassembly.c | 2 +-
>> net/ipv6/tcp_ipv6.c | 2 +-
>> net/sched/act_bpf.c | 4 +-
>> net/sched/cls_bpf.c | 4 +-
>> 17 files changed, 80 insertions(+), 52 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 1c2902eaebd3..de3915e2bfdb 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -706,6 +706,11 @@ typedef unsigned int sk_buff_data_t;
>> typedef unsigned char *sk_buff_data_t;
>> #endif
>>
>> +enum skb_tstamp_type {
>> + SKB_CLOCK_REALTIME,
>> + SKB_CLOCK_MONOTONIC,
>> +};
>> +
>> /**
>> * DOC: Basic sk_buff geometry
>> *
>> @@ -823,10 +828,9 @@ typedef unsigned char *sk_buff_data_t;
>> * @dst_pending_confirm: need to confirm neighbour
>> * @decrypted: Decrypted SKB
>> * @slow_gro: state present at GRO time, slower prepare step required
>> - * @mono_delivery_time: When set, skb->tstamp has the
>> - * delivery_time in mono clock base (i.e. EDT). Otherwise, the
>> - * skb->tstamp has the (rcv) timestamp at ingress and
>> - * delivery_time at egress.
>> + * @tstamp_type: When set, skb->tstamp has the
>> + * delivery_time in mono clock base Otherwise, the
>> + * timestamp is considered real clock base.
>
> Missing period. More importantly, no longer conditional. It always
> captures the type of skb->tstamp.
>
I think i should move the patchset 2 documentation to this patch itself.

@tstamp_type: When set, skb->tstamp has the
+ * delivery_time clock base of skb->tstamp.
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -1301,7 +1301,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
>> tp = tcp_sk(sk);
>> prior_wstamp = tp->tcp_wstamp_ns;
>> tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
>> - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
>> + skb_set_delivery_type_by_clockid(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC);
>> if (clone_it) {
>> oskb = skb;
>>
>> @@ -1655,7 +1655,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
>>
>> skb_split(skb, buff, len);
>>
>> - skb_set_delivery_time(buff, skb->tstamp, true);
>> + skb_set_delivery_type_by_clockid(buff, skb->tstamp, CLOCK_MONOTONIC);
>> tcp_fragment_tstamp(skb, buff);
>
> All these hardcoded monotonic calls in TCP can be the shorter version
>
> skb_set_delivery_type(.., SKB_CLOCK_MONOTONIC);
I think i should directly call skb_set_delivery_time if i know that TCP always uses Monotonic clock base,
rather than calling the wrapper api which does nothing but switch and then calls skb_set_delivery_Time.

Makes sense. I will make the changes in v7 .
>

2024-05-06 19:59:13

by Abhishek Chauhan

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 2/3] net: Add additional bit to support clockid_t timestamp type



On 5/6/2024 12:00 PM, Willem de Bruijn wrote:
> Abhishek Chauhan wrote:
>> tstamp_type is now set based on actual clockid_t compressed
>> into 2 bits.
>>
>> To make the design scalable for future needs this commit bring in
>> the change to extend the tstamp_type:1 to tstamp_type:2 to support
>> other clockid_t timestamp.
>>
>> We now support CLOCK_TAI as part of tstamp_type as part of this
>> commit with exisiting support CLOCK_MONOTONIC and CLOCK_REALTIME.
>>
>> Link: https://lore.kernel.org/netdev/[email protected]/
>> Signed-off-by: Abhishek Chauhan <[email protected]>
>> ---
>> Changes since v5
>> - Took care of documentation comments of tstamp_type
>> in skbuff.h as mentioned by Willem.
>> - Use of complete words instead of abbrevation in
>> macro definitions as mentioned by Willem.
>> - Fixed indentation problems
>> - Removed BPF_SKB_TSTAMP_UNSPEC and marked it
>> Deprecated as documentation, and introduced
>> BPF_SKB_CLOCK_REALTIME instead.
>> - BUILD_BUG_ON for additional enums introduced.
>> - __ip_make_skb and ip6_make_skb now has
>> tcp checks to mark tcp packet as mono tstamp base.
>> - separated the selftests/bpf changes into another patch.
>> - Made changes as per Martin in selftest bpf code and
>> tool/include/uapi/linux/bpf.h
>>
>> Changes since v4
>> - Made changes to BPF code in filter.c as per
>> Martin's comments
>> - Minor fixes on comments given on documentation
>> from Willem in skbuff.h (removed obvious ones)
>> - Made changes to ctx_rewrite.c and test_tc_dtime.c
>> - test_tc_dtime.c i am not really sure if i took care
>> of all the changes as i am not too familiar with
>> the framework.
>> - Introduce common mask SKB_TSTAMP_TYPE_MASK instead
>> of multiple SKB mask.
>> - Optimisation on BPF code as suggested by Martin.
>> - Set default case to SKB_CLOCK_REALTME.
>>
>> Changes since v3
>> - Carefully reviewed BPF APIs and made changes in
>> BPF code as well.
>> - Re-used actual clockid_t values since skbuff.h
>> indirectly includes uapi/linux/time.h
>> - Added CLOCK_TAI as part of the skb_set_delivery_time
>> handling instead of CLOCK_USER
>> - Added default in switch for unsupported and invalid
>> timestamp with an WARN_ONCE
>> - All of the above comments were given by Willem
>> - Made changes in filter.c as per Martin's comments
>> to handle invalid cases in bpf code with addition of
>> SKB_TAI_DELIVERY_TIME_MASK
>>
>> Changes since v2
>> - Minor changes to commit subject
>>
>> Changes since v1
>> - identified additional changes in BPF framework.
>> - Bit shift in SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK.
>> - Made changes in skb_set_delivery_time to keep changes similar to
>> previous code for mono_delivery_time and just setting tstamp_type
>> bit 1 for userspace timestamp.
>>
>>
>> include/linux/skbuff.h | 21 +++++++++++--------
>> include/uapi/linux/bpf.h | 15 +++++++++-----
>> net/core/filter.c | 44 +++++++++++++++++++++++-----------------
>> net/ipv4/ip_output.c | 5 ++++-
>> net/ipv4/raw.c | 2 +-
>> net/ipv6/ip6_output.c | 5 ++++-
>> net/ipv6/raw.c | 2 +-
>> net/packet/af_packet.c | 7 +++----
>> 8 files changed, 61 insertions(+), 40 deletions(-)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index de3915e2bfdb..fe7d8dbef77e 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -709,6 +709,8 @@ typedef unsigned char *sk_buff_data_t;
>> enum skb_tstamp_type {
>> SKB_CLOCK_REALTIME,
>> SKB_CLOCK_MONOTONIC,
>> + SKB_CLOCK_TAI,
>> + __SKB_CLOCK_MAX = SKB_CLOCK_TAI,
>> };
>>
>> /**
>> @@ -829,8 +831,7 @@ enum skb_tstamp_type {
>> * @decrypted: Decrypted SKB
>> * @slow_gro: state present at GRO time, slower prepare step required
>> * @tstamp_type: When set, skb->tstamp has the
>> - * delivery_time in mono clock base Otherwise, the
>> - * timestamp is considered real clock base.
>> + * delivery_time clock base of skb->tstamp.
>> * @napi_id: id of the NAPI struct this skb came from
>> * @sender_cpu: (aka @napi_id) source CPU in XPS
>> * @alloc_cpu: CPU which did the skb allocation.
>> @@ -958,7 +959,7 @@ struct sk_buff {
>> /* private: */
>> __u8 __mono_tc_offset[0];
>> /* public: */
>> - __u8 tstamp_type:1; /* See skb_tstamp_type */
>> + __u8 tstamp_type:2; /* See skb_tstamp_type */
>> #ifdef CONFIG_NET_XGRESS
>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
>> __u8 tc_skip_classify:1;
>> @@ -1088,15 +1089,16 @@ struct sk_buff {
>> #endif
>> #define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)
>>
>> -/* if you move tc_at_ingress or mono_delivery_time
>> +/* if you move tc_at_ingress or tstamp_type
>> * around, you also must adapt these constants.
>> */
>> #ifdef __BIG_ENDIAN_BITFIELD
>> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 7)
>> -#define TC_AT_INGRESS_MASK (1 << 6)
>> +#define SKB_TSTAMP_TYPE_MASK (3 << 6)
>> +#define SKB_TSTAMP_TYPE_RSHIFT (6)
>> +#define TC_AT_INGRESS_MASK (1 << 5)
>> #else
>> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 0)
>> -#define TC_AT_INGRESS_MASK (1 << 1)
>> +#define SKB_TSTAMP_TYPE_MASK (3)
>> +#define TC_AT_INGRESS_MASK (1 << 2)
>> #endif
>> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset)
>>
>> @@ -4213,6 +4215,9 @@ static inline void skb_set_delivery_type_by_clockid(struct sk_buff *skb,
>> case CLOCK_MONOTONIC:
>> tstamp_type = SKB_CLOCK_MONOTONIC;
>> break;
>> + case CLOCK_TAI:
>> + tstamp_type = SKB_CLOCK_TAI;
>> + break;
>> default:
>> WARN_ON_ONCE(1);
>> kt = 0;
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 90706a47f6ff..25ea393cf084 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -6207,12 +6207,17 @@ union { \
>> __u64 :64; \
>> } __attribute__((aligned(8)))
>>
>> +/* The enum used in skb->tstamp_type. It specifies the clock type
>> + * of the time stored in the skb->tstamp.
>> + */
>> enum {
>> - BPF_SKB_TSTAMP_UNSPEC,
>> - BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
>> - /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
>> - * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
>> - * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
>> + BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
>> + BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
>> + BPF_SKB_CLOCK_REALTIME = 0,
>> + BPF_SKB_CLOCK_MONOTONIC = 1,
>> + BPF_SKB_CLOCK_TAI = 2,
>> + /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
>> + * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
>> */
>> };
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index a3781a796da4..9f3df4a0d1ee 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -7726,16 +7726,20 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
>> return -EOPNOTSUPP;
>>
>> switch (tstamp_type) {
>> - case BPF_SKB_TSTAMP_DELIVERY_MONO:
>> + case BPF_SKB_CLOCK_MONOTONIC:
>> if (!tstamp)
>> return -EINVAL;
>> skb->tstamp = tstamp;
>> skb->tstamp_type = SKB_CLOCK_MONOTONIC;
>> break;
>> - case BPF_SKB_TSTAMP_UNSPEC:
>> - if (tstamp)
>> + case BPF_SKB_CLOCK_TAI:
>> + if (!tstamp)
>> return -EINVAL;
>> - skb->tstamp = 0;
>> + skb->tstamp = tstamp;
>> + skb->tstamp_type = SKB_CLOCK_TAI;
>> + break;
>> + case BPF_SKB_CLOCK_REALTIME:
>> + skb->tstamp = tstamp;
>> skb->tstamp_type = SKB_CLOCK_REALTIME;
>
> Only since there is another reason to respin.
>
> The previous code did not do this, but let's order cases by their enum
> value, starting with realtime.
>
> Also in anticipation with possible future expansions.
>
Noted I will take care of this.

>

2024-05-06 20:51:28

by Abhishek Chauhan

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 3/3] selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets



On 5/6/2024 12:04 PM, Willem de Bruijn wrote:
> Abhishek Chauhan wrote:
>> With changes in the design to forward CLOCK_TAI in the skbuff
>> framework, existing selftest framework needs modification
>> to handle forwarding of UDP packets with CLOCK_TAI as clockid.
>>
>> Link: https://lore.kernel.org/netdev/[email protected]/
>> Signed-off-by: Abhishek Chauhan <[email protected]>
>> ---
>> tools/include/uapi/linux/bpf.h | 15 ++++---
>> .../selftests/bpf/prog_tests/ctx_rewrite.c | 10 +++--
>> .../selftests/bpf/prog_tests/tc_redirect.c | 3 --
>> .../selftests/bpf/progs/test_tc_dtime.c | 39 +++++++++----------
>> 4 files changed, 34 insertions(+), 33 deletions(-)
>>
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 90706a47f6ff..25ea393cf084 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -6207,12 +6207,17 @@ union { \
>> __u64 :64; \
>> } __attribute__((aligned(8)))
>>
>> +/* The enum used in skb->tstamp_type. It specifies the clock type
>> + * of the time stored in the skb->tstamp.
>> + */
>> enum {
>> - BPF_SKB_TSTAMP_UNSPEC,
>> - BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
>> - /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
>> - * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
>> - * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
>> + BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
>> + BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
>> + BPF_SKB_CLOCK_REALTIME = 0,
>> + BPF_SKB_CLOCK_MONOTONIC = 1,
>> + BPF_SKB_CLOCK_TAI = 2,
>> + /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
>> + * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
>> */
>> };
>>
>> diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>> index 3b7c57fe55a5..71940f4ef0fb 100644
>> --- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>> +++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>> @@ -69,15 +69,17 @@ static struct test_case test_cases[] = {
>> {
>> N(SCHED_CLS, struct __sk_buff, tstamp),
>> .read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>> - "w11 &= 3;"
>> - "if w11 != 0x3 goto pc+2;"
>> + "if w11 == 0x4 goto pc+1;"
>> + "goto pc+4;"
>> + "if w11 == 0x3 goto pc+1;"
>> + "goto pc+2;"
>
> Not an expert on this code, and I see that the existing code already
> has this below, but: isn't it odd and unnecessary to jump to an
> unconditional jump statement?
>
I am closely looking into your comment and i will evalute it(Martin can correct me
if the jumps are correct or not as i am new to BPF as well) but i found out that
JSET = "&" and not "==". So the above two ins has to change from -

"if w11 == 0x4 goto pc+1;" ==>(needs to be corrected to) "if w11 & 0x4 goto pc+1;"
"if w11 == 0x3 goto pc+1;" ==> (needs to be correct to) "if w11 & 0x3 goto pc+1;"


>> "$dst = 0;"
>> "goto pc+1;"
>> "$dst = *(u64 *)($ctx + sk_buff::tstamp);",
>> .write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>> - "if w11 & 0x2 goto pc+1;"
>> + "if w11 & 0x4 goto pc+1;"
>> "goto pc+2;"
>> - "w11 &= -2;"
>> + "w11 &= -3;"
Martin,
Also i am not sure why the the dissembly complains because the value of SKB_TSTAMP_TYPE_MASK = 3 and we are
negating it ~3 = -3.

Can't match disassembly(left) with pattern(right):
r11 = *(u8 *)(r1 +129) ; r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset)
if w11 & 0x4 goto pc+1 ; if w11 & 0x4 goto pc+1
goto pc+2 ; goto pc+2
w11 &= -4 ; w11 &= -3

>> "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;"
>> "*(u64 *)($ctx + sk_buff::tstamp) = $src;",
>> },

2024-05-06 20:55:40

by Abhishek Chauhan

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 3/3] selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets



On 5/6/2024 1:50 PM, Abhishek Chauhan (ABC) wrote:
>
>
> On 5/6/2024 12:04 PM, Willem de Bruijn wrote:
>> Abhishek Chauhan wrote:
>>> With changes in the design to forward CLOCK_TAI in the skbuff
>>> framework, existing selftest framework needs modification
>>> to handle forwarding of UDP packets with CLOCK_TAI as clockid.
>>>
>>> Link: https://lore.kernel.org/netdev/[email protected]/
>>> Signed-off-by: Abhishek Chauhan <[email protected]>
>>> ---
>>> tools/include/uapi/linux/bpf.h | 15 ++++---
>>> .../selftests/bpf/prog_tests/ctx_rewrite.c | 10 +++--
>>> .../selftests/bpf/prog_tests/tc_redirect.c | 3 --
>>> .../selftests/bpf/progs/test_tc_dtime.c | 39 +++++++++----------
>>> 4 files changed, 34 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>>> index 90706a47f6ff..25ea393cf084 100644
>>> --- a/tools/include/uapi/linux/bpf.h
>>> +++ b/tools/include/uapi/linux/bpf.h
>>> @@ -6207,12 +6207,17 @@ union { \
>>> __u64 :64; \
>>> } __attribute__((aligned(8)))
>>>
>>> +/* The enum used in skb->tstamp_type. It specifies the clock type
>>> + * of the time stored in the skb->tstamp.
>>> + */
>>> enum {
>>> - BPF_SKB_TSTAMP_UNSPEC,
>>> - BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
>>> - /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
>>> - * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
>>> - * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
>>> + BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
>>> + BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
>>> + BPF_SKB_CLOCK_REALTIME = 0,
>>> + BPF_SKB_CLOCK_MONOTONIC = 1,
>>> + BPF_SKB_CLOCK_TAI = 2,
>>> + /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
>>> + * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
>>> */
>>> };
>>>
>>> diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>> index 3b7c57fe55a5..71940f4ef0fb 100644
>>> --- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>> +++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>> @@ -69,15 +69,17 @@ static struct test_case test_cases[] = {
>>> {
>>> N(SCHED_CLS, struct __sk_buff, tstamp),
>>> .read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>>> - "w11 &= 3;"
>>> - "if w11 != 0x3 goto pc+2;"
>>> + "if w11 == 0x4 goto pc+1;"
>>> + "goto pc+4;"
>>> + "if w11 == 0x3 goto pc+1;"
>>> + "goto pc+2;"
>>
>> Not an expert on this code, and I see that the existing code already
>> has this below, but: isn't it odd and unnecessary to jump to an
>> unconditional jump statement?
>>
> I am closely looking into your comment and i will evalute it(Martin can correct me
> if the jumps are correct or not as i am new to BPF as well) but i found out that
> JSET = "&" and not "==". So the above two ins has to change from -
>
> "if w11 == 0x4 goto pc+1;" ==>(needs to be corrected to) "if w11 & 0x4 goto pc+1;"
> "if w11 == 0x3 goto pc+1;" ==> (needs to be correct to) "if w11 & 0x3 goto pc+1;"
>
>
>>> "$dst = 0;"
>>> "goto pc+1;"
>>> "$dst = *(u64 *)($ctx + sk_buff::tstamp);",
>>> .write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>>> - "if w11 & 0x2 goto pc+1;"
>>> + "if w11 & 0x4 goto pc+1;"
>>> "goto pc+2;"
>>> - "w11 &= -2;"
>>> + "w11 &= -3;"
> Martin,
> Also i am not sure why the the dissembly complains because the value of SKB_TSTAMP_TYPE_MASK = 3 and we are
> negating it ~3 = -3.
>
> Can't match disassembly(left) with pattern(right):
> r11 = *(u8 *)(r1 +129) ; r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset)
> if w11 & 0x4 goto pc+1 ; if w11 & 0x4 goto pc+1
> goto pc+2 ; goto pc+2
> w11 &= -4 ; w11 &= -3
>
Martin,
Please ignore this. It has to -4 and not -3. I figured it out.

>>> "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;"
>>> "*(u64 *)($ctx + sk_buff::tstamp) = $src;",
>>> },

2024-05-06 21:29:56

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 1/3] net: Rename mono_delivery_time to tstamp_type for scalabilty

Abhishek Chauhan (ABC) wrote:
>
>
> On 5/6/2024 11:51 AM, Willem de Bruijn wrote:
> > Abhishek Chauhan wrote:
> >> mono_delivery_time was added to check if skb->tstamp has delivery
> >> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
> >> timestamp in ingress and delivery_time at egress.
> >>
> >> Renaming the bitfield from mono_delivery_time to tstamp_type is for
> >> extensibilty for other timestamps such as userspace timestamp
> >> (i.e. SO_TXTIME) set via sock opts.
> >>
> >> As we are renaming the mono_delivery_time to tstamp_type, it makes
> >> sense to start assigning tstamp_type based on enum defined
> >> in this commit.
> >>
> >> Earlier we used bool arg flag to check if the tstamp is mono in
> >> function skb_set_delivery_time, Now the signature of the functions
> >> accepts tstamp_type to distinguish between mono and real time.
> >>
> >> Also skb_set_delivery_type_by_clockid is a new function which accepts
> >> clockid to determine the tstamp_type.
> >>
> >> In future tstamp_type:1 can be extended to support userspace timestamp
> >> by increasing the bitfield.
> >>
> >> Link: https://lore.kernel.org/netdev/[email protected]/
> >> Signed-off-by: Abhishek Chauhan <[email protected]>
> >> ---
> >> Changes since v5
> >> - Avoided using garble function names as mentioned by
> >> Willem.
> >> - Implemented a conversion function stead of duplicating
> >> the same logic as mentioned by Willem.
> >> - Fixed indentation problems and minor documentation issues
> >> which mentions tstamp_type as a whole instead of bitfield
> >> notations. (Mentioned both by Willem and Martin)
> >>
> >> Changes since v4
> >> - Introduce new function to directly delivery_time and
> >> another to set tstamp_type based on clockid.
> >> - Removed un-necessary comments in skbuff.h as
> >> enums were obvious and understood.
> >>
> >> Changes since v3
> >> - Fixed inconsistent capitalization in skbuff.h
> >> - remove reference to MONO_DELIVERY_TIME_MASK in skbuff.h
> >> and point it to skb_tstamp_type now.
> >> - Explicitely setting SKB_CLOCK_MONO if valid transmit_time
> >> ip_send_unicast_reply
> >> - Keeping skb_tstamp inline with skb_clear_tstamp.
> >> - skb_set_delivery_time checks if timstamp is 0 and
> >> sets the tstamp_type to SKB_CLOCK_REAL.
> >> - Above comments are given by Willem
> >> - Found out that skbuff.h has access to uapi/linux/time.h
> >> So now instead of using CLOCK_REAL/CLOCK_MONO
> >> i am checking actual clockid_t directly to set tstamp_type
> >> example:- CLOCK_REALTIME/CLOCK_MONOTONIC
> >> - Compilation error fixed in
> >> net/ieee802154/6lowpan/reassembly.c
> >>
> >> Changes since v2
> >> - Minor changes to commit subject
> >>
> >> Changes since v1
> >> - Squashed the two commits into one as mentioned by Willem.
> >> - Introduced switch in skb_set_delivery_time.
> >> - Renamed and removed directionality aspects w.r.t tstamp_type
> >> as mentioned by Willem.
> >>
> >> include/linux/skbuff.h | 53 ++++++++++++++++------
> >> include/net/inet_frag.h | 4 +-
> >> net/bridge/netfilter/nf_conntrack_bridge.c | 6 +--
> >> net/core/dev.c | 2 +-
> >> net/core/filter.c | 10 ++--
> >> net/ieee802154/6lowpan/reassembly.c | 2 +-
> >> net/ipv4/inet_fragment.c | 2 +-
> >> net/ipv4/ip_fragment.c | 2 +-
> >> net/ipv4/ip_output.c | 9 ++--
> >> net/ipv4/tcp_output.c | 16 +++----
> >> net/ipv6/ip6_output.c | 6 +--
> >> net/ipv6/netfilter.c | 6 +--
> >> net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
> >> net/ipv6/reassembly.c | 2 +-
> >> net/ipv6/tcp_ipv6.c | 2 +-
> >> net/sched/act_bpf.c | 4 +-
> >> net/sched/cls_bpf.c | 4 +-
> >> 17 files changed, 80 insertions(+), 52 deletions(-)
> >>
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 1c2902eaebd3..de3915e2bfdb 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -706,6 +706,11 @@ typedef unsigned int sk_buff_data_t;
> >> typedef unsigned char *sk_buff_data_t;
> >> #endif
> >>
> >> +enum skb_tstamp_type {
> >> + SKB_CLOCK_REALTIME,
> >> + SKB_CLOCK_MONOTONIC,
> >> +};
> >> +
> >> /**
> >> * DOC: Basic sk_buff geometry
> >> *
> >> @@ -823,10 +828,9 @@ typedef unsigned char *sk_buff_data_t;
> >> * @dst_pending_confirm: need to confirm neighbour
> >> * @decrypted: Decrypted SKB
> >> * @slow_gro: state present at GRO time, slower prepare step required
> >> - * @mono_delivery_time: When set, skb->tstamp has the
> >> - * delivery_time in mono clock base (i.e. EDT). Otherwise, the
> >> - * skb->tstamp has the (rcv) timestamp at ingress and
> >> - * delivery_time at egress.
> >> + * @tstamp_type: When set, skb->tstamp has the
> >> + * delivery_time in mono clock base Otherwise, the
> >> + * timestamp is considered real clock base.
> >
> > Missing period. More importantly, no longer conditional. It always
> > captures the type of skb->tstamp.
> >
> I think i should move the patchset 2 documentation to this patch itself.
>
> @tstamp_type: When set, skb->tstamp has the
> + * delivery_time clock base of skb->tstamp.
> >> --- a/net/ipv4/tcp_output.c
> >> +++ b/net/ipv4/tcp_output.c
> >> @@ -1301,7 +1301,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
> >> tp = tcp_sk(sk);
> >> prior_wstamp = tp->tcp_wstamp_ns;
> >> tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
> >> - skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
> >> + skb_set_delivery_type_by_clockid(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC);
> >> if (clone_it) {
> >> oskb = skb;
> >>
> >> @@ -1655,7 +1655,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
> >>
> >> skb_split(skb, buff, len);
> >>
> >> - skb_set_delivery_time(buff, skb->tstamp, true);
> >> + skb_set_delivery_type_by_clockid(buff, skb->tstamp, CLOCK_MONOTONIC);
> >> tcp_fragment_tstamp(skb, buff);
> >
> > All these hardcoded monotonic calls in TCP can be the shorter version
> >
> > skb_set_delivery_type(.., SKB_CLOCK_MONOTONIC);
> I think i should directly call skb_set_delivery_time if i know that TCP always uses Monotonic clock base,
> rather than calling the wrapper api which does nothing but switch and then calls skb_set_delivery_Time.
>
> Makes sense. I will make the changes in v7 .

Thanks, +1 on both


2024-05-06 23:41:20

by Abhishek Chauhan

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 3/3] selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets



On 5/6/2024 1:54 PM, Abhishek Chauhan (ABC) wrote:
>
>
> On 5/6/2024 1:50 PM, Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 5/6/2024 12:04 PM, Willem de Bruijn wrote:
>>> Abhishek Chauhan wrote:
>>>> With changes in the design to forward CLOCK_TAI in the skbuff
>>>> framework, existing selftest framework needs modification
>>>> to handle forwarding of UDP packets with CLOCK_TAI as clockid.
>>>>
>>>> Link: https://lore.kernel.org/netdev/[email protected]/
>>>> Signed-off-by: Abhishek Chauhan <[email protected]>
>>>> ---
>>>> tools/include/uapi/linux/bpf.h | 15 ++++---
>>>> .../selftests/bpf/prog_tests/ctx_rewrite.c | 10 +++--
>>>> .../selftests/bpf/prog_tests/tc_redirect.c | 3 --
>>>> .../selftests/bpf/progs/test_tc_dtime.c | 39 +++++++++----------
>>>> 4 files changed, 34 insertions(+), 33 deletions(-)
>>>>
>>>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>>>> index 90706a47f6ff..25ea393cf084 100644
>>>> --- a/tools/include/uapi/linux/bpf.h
>>>> +++ b/tools/include/uapi/linux/bpf.h
>>>> @@ -6207,12 +6207,17 @@ union { \
>>>> __u64 :64; \
>>>> } __attribute__((aligned(8)))
>>>>
>>>> +/* The enum used in skb->tstamp_type. It specifies the clock type
>>>> + * of the time stored in the skb->tstamp.
>>>> + */
>>>> enum {
>>>> - BPF_SKB_TSTAMP_UNSPEC,
>>>> - BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
>>>> - /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
>>>> - * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
>>>> - * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
>>>> + BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
>>>> + BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
>>>> + BPF_SKB_CLOCK_REALTIME = 0,
>>>> + BPF_SKB_CLOCK_MONOTONIC = 1,
>>>> + BPF_SKB_CLOCK_TAI = 2,
>>>> + /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
>>>> + * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
>>>> */
>>>> };
>>>>
>>>> diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>>> index 3b7c57fe55a5..71940f4ef0fb 100644
>>>> --- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>>> +++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>>> @@ -69,15 +69,17 @@ static struct test_case test_cases[] = {
>>>> {
>>>> N(SCHED_CLS, struct __sk_buff, tstamp),
>>>> .read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>>>> - "w11 &= 3;"
>>>> - "if w11 != 0x3 goto pc+2;"
>>>> + "if w11 == 0x4 goto pc+1;"
>>>> + "goto pc+4;"
>>>> + "if w11 == 0x3 goto pc+1;"
>>>> + "goto pc+2;"
>>>
>>> Not an expert on this code, and I see that the existing code already
>>> has this below, but: isn't it odd and unnecessary to jump to an
>>> unconditional jump statement?
>>>
>> I am closely looking into your comment and i will evalute it(Martin can correct me
>> if the jumps are correct or not as i am new to BPF as well) but i found out that
>> JSET = "&" and not "==". So the above two ins has to change from -
>>
>> "if w11 == 0x4 goto pc+1;" ==>(needs to be corrected to) "if w11 & 0x4 goto pc+1;"
>> "if w11 == 0x3 goto pc+1;" ==> (needs to be correct to) "if w11 & 0x3 goto pc+1;"
>>
>>
Willem, I looked at the jumps in the above code. They look correct to me.
Martin can check too if i am doing anything wrong here other than the JSET "&".

Ideally pc(program counter) points to the next instruction.

"if w11 & 0x4 goto pc+1;"
"goto pc+4;"
[pc+0] "if w11 & 0x3 goto pc+1;" <== PC is going to be here
[pc+1] "goto pc+2;"
[pc+2] "$dst = 0;"
[pc+3] "goto pc+1;"
[pc+4] "$dst = *(u64 *)($ctx + sk_buff::tstamp);", <== This is where the code is intended to jump to for "goto pc+4;"



>>>> "$dst = 0;"
>>>> "goto pc+1;"
>>>> "$dst = *(u64 *)($ctx + sk_buff::tstamp);",
>>>> .write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>>>> - "if w11 & 0x2 goto pc+1;"
>>>> + "if w11 & 0x4 goto pc+1;"
>>>> "goto pc+2;"
>>>> - "w11 &= -2;"
>>>> + "w11 &= -3;"


2024-05-07 00:45:54

by Martin KaFai Lau

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 2/3] net: Add additional bit to support clockid_t timestamp type

On 5/3/24 8:13 PM, Abhishek Chauhan wrote:
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index fe86cadfa85b..c3d852eecb01 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1457,7 +1457,10 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>
> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
> skb->mark = cork->mark;
> - skb->tstamp = cork->transmit_time;
> + if (sk_is_tcp(sk))

This seems not catching all IPPROTO_TCP case. In particular, the percpu
"ipv4_tcp_sk" is SOCK_RAW. sk_is_tcp() is checking SOCK_STREAM:

void __init tcp_v4_init(void)
{

/* ... */
res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
IPPROTO_TCP, &init_net);

/* ... */
}

"while :; do ./test_progs -t tc_redirect/tc_redirect_dtime || break; done"
failed pretty often exactly in this case.

> + skb_set_delivery_type_by_clockid(skb, cork->transmit_time, CLOCK_MONOTONIC);
> + else
> + skb_set_delivery_type_by_clockid(skb, cork->transmit_time, sk->sk_clockid);
> /*
> * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
> * on dst refcount

[ ... ]

> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 05067bd44775..797a9764e8fe 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1924,7 +1924,10 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
>
> skb->priority = READ_ONCE(sk->sk_priority);
> skb->mark = cork->base.mark;
> - skb->tstamp = cork->base.transmit_time;
> + if (sk_is_tcp(sk))
> + skb_set_delivery_type_by_clockid(skb, cork->base.transmit_time, CLOCK_MONOTONIC);
> + else
> + skb_set_delivery_type_by_clockid(skb, cork->base.transmit_time, sk->sk_clockid);
>
> ip6_cork_steal_dst(skb, cork);
> IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);


2024-05-07 00:55:15

by Martin KaFai Lau

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 3/3] selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets

On 5/6/24 1:50 PM, Abhishek Chauhan (ABC) wrote:
>
>
> On 5/6/2024 12:04 PM, Willem de Bruijn wrote:
>> Abhishek Chauhan wrote:
>>> With changes in the design to forward CLOCK_TAI in the skbuff
>>> framework, existing selftest framework needs modification
>>> to handle forwarding of UDP packets with CLOCK_TAI as clockid.
>>>
>>> Link: https://lore.kernel.org/netdev/[email protected]/
>>> Signed-off-by: Abhishek Chauhan <[email protected]>
>>> ---
>>> tools/include/uapi/linux/bpf.h | 15 ++++---
>>> .../selftests/bpf/prog_tests/ctx_rewrite.c | 10 +++--
>>> .../selftests/bpf/prog_tests/tc_redirect.c | 3 --
>>> .../selftests/bpf/progs/test_tc_dtime.c | 39 +++++++++----------
>>> 4 files changed, 34 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>>> index 90706a47f6ff..25ea393cf084 100644
>>> --- a/tools/include/uapi/linux/bpf.h
>>> +++ b/tools/include/uapi/linux/bpf.h
>>> @@ -6207,12 +6207,17 @@ union { \
>>> __u64 :64; \
>>> } __attribute__((aligned(8)))
>>>
>>> +/* The enum used in skb->tstamp_type. It specifies the clock type
>>> + * of the time stored in the skb->tstamp.
>>> + */
>>> enum {
>>> - BPF_SKB_TSTAMP_UNSPEC,
>>> - BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
>>> - /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
>>> - * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
>>> - * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
>>> + BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
>>> + BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
>>> + BPF_SKB_CLOCK_REALTIME = 0,
>>> + BPF_SKB_CLOCK_MONOTONIC = 1,
>>> + BPF_SKB_CLOCK_TAI = 2,
>>> + /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
>>> + * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
>>> */
>>> };
>>>
>>> diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>> index 3b7c57fe55a5..71940f4ef0fb 100644
>>> --- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>> +++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>> @@ -69,15 +69,17 @@ static struct test_case test_cases[] = {
>>> {
>>> N(SCHED_CLS, struct __sk_buff, tstamp),
>>> .read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>>> - "w11 &= 3;"
>>> - "if w11 != 0x3 goto pc+2;"
>>> + "if w11 == 0x4 goto pc+1;"
>>> + "goto pc+4;"
>>> + "if w11 == 0x3 goto pc+1;"
>>> + "goto pc+2;"
>>
>> Not an expert on this code, and I see that the existing code already
>> has this below, but: isn't it odd and unnecessary to jump to an
>> unconditional jump statement?
>>
> I am closely looking into your comment and i will evalute it(Martin can correct me
> if the jumps are correct or not as i am new to BPF as well) but i found out that
> JSET = "&" and not "==". So the above two ins has to change from -

Yes, this should be bitwise "&" instead of "==".

The bpf CI did report this:
https://github.com/kernel-patches/bpf/actions/runs/8947652196/job/24579927178

Please monitor the bpf CI test result.

Do you have issue running the test locally?

>
> "if w11 == 0x4 goto pc+1;" ==>(needs to be corrected to) "if w11 & 0x4 goto pc+1;"
> "if w11 == 0x3 goto pc+1;" ==> (needs to be correct to) "if w11 & 0x3 goto pc+1;"
>
>
>>> "$dst = 0;"
>>> "goto pc+1;"
>>> "$dst = *(u64 *)($ctx + sk_buff::tstamp);",
>>> .write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>>> - "if w11 & 0x2 goto pc+1;"
>>> + "if w11 & 0x4 goto pc+1;"
>>> "goto pc+2;"
>>> - "w11 &= -2;"
>>> + "w11 &= -3;"
> Martin,
> Also i am not sure why the the dissembly complains because the value of SKB_TSTAMP_TYPE_MASK = 3 and we are
> negating it ~3 = -3.
>
> Can't match disassembly(left) with pattern(right):
> r11 = *(u8 *)(r1 +129) ; r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset)
> if w11 & 0x4 goto pc+1 ; if w11 & 0x4 goto pc+1
> goto pc+2 ; goto pc+2
> w11 &= -4 ; w11 &= -3
>
>>> "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;"
>>> "*(u64 *)($ctx + sk_buff::tstamp) = $src;",
>>> },


2024-05-07 11:39:45

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 2/3] net: Add additional bit to support clockid_t timestamp type

Martin KaFai Lau wrote:
> On 5/3/24 8:13 PM, Abhishek Chauhan wrote:
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index fe86cadfa85b..c3d852eecb01 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1457,7 +1457,10 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
> >
> > skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
> > skb->mark = cork->mark;
> > - skb->tstamp = cork->transmit_time;
> > + if (sk_is_tcp(sk))
>
> This seems not catching all IPPROTO_TCP case. In particular, the percpu
> "ipv4_tcp_sk" is SOCK_RAW. sk_is_tcp() is checking SOCK_STREAM:
>
> void __init tcp_v4_init(void)
> {
>
> /* ... */
> res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
> IPPROTO_TCP, &init_net);
>
> /* ... */
> }
>
> "while :; do ./test_progs -t tc_redirect/tc_redirect_dtime || break; done"
> failed pretty often exactly in this case.
>

Interesting. The TCP stack opens non TCP sockets.

Initializing sk->sk_clockid for this socket should address that.


2024-05-07 19:09:43

by Abhishek Chauhan

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 2/3] net: Add additional bit to support clockid_t timestamp type



On 5/7/2024 4:39 AM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 5/3/24 8:13 PM, Abhishek Chauhan wrote:
>>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>>> index fe86cadfa85b..c3d852eecb01 100644
>>> --- a/net/ipv4/ip_output.c
>>> +++ b/net/ipv4/ip_output.c
>>> @@ -1457,7 +1457,10 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>>>
>>> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>>> skb->mark = cork->mark;
>>> - skb->tstamp = cork->transmit_time;
>>> + if (sk_is_tcp(sk))
>>
>> This seems not catching all IPPROTO_TCP case. In particular, the percpu
>> "ipv4_tcp_sk" is SOCK_RAW. sk_is_tcp() is checking SOCK_STREAM:
>>
>> void __init tcp_v4_init(void)
>> {
>>
>> /* ... */
>> res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
>> IPPROTO_TCP, &init_net);
>>
>> /* ... */
>> }
>>
>> "while :; do ./test_progs -t tc_redirect/tc_redirect_dtime || break; done"
>> failed pretty often exactly in this case.
>>
>
> Interesting. The TCP stack opens non TCP sockets.
>
> Initializing sk->sk_clockid for this socket should address that.
>
Willem, Are you suggesting your point from the previous patch ?

"I think we want to avoid special casing if we can. Note the if.

If TCP always uses monotonic, we could consider initializing
sk_clockid to CLOCK_MONONOTIC in tcp_init_sock.

I guess TCP logic currently entirely ignores sk_clockid. If we are to
start using this, then setsocktop SO_TXTIME must explicitly fail or
ignore for TCP sockets, or silently skip the write.

All of that is more complexity. Than is maybe warranted for this one
case. So no objections from me to special casing using sk_is_tcp(sk)
either."

Few places we need to initialize the clock base for tcp to monotonic
1. tcp_init_sock
2. void __init tcp_v4_init(void) in tcp_ipv4.c
3. static int __net_init tcpv6_net_init(struct net *net)
4. Ignore setsockopts for SO_TXTIME if the sk->protocol is tcp.

Is it safe to assume the TCP will never use any other close base ?


OR


For now we can do just protocol level check in ip_make_skb and ip6_make_skb
like
if (iph->protocol == IPPROTO_TCP)
/* ... */
else
/* ... */






2024-05-07 19:16:00

by Abhishek Chauhan

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 3/3] selftests/bpf: Handle forwarding of UDP CLOCK_TAI packets



On 5/6/2024 5:54 PM, Martin KaFai Lau wrote:
> On 5/6/24 1:50 PM, Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 5/6/2024 12:04 PM, Willem de Bruijn wrote:
>>> Abhishek Chauhan wrote:
>>>> With changes in the design to forward CLOCK_TAI in the skbuff
>>>> framework,  existing selftest framework needs modification
>>>> to handle forwarding of UDP packets with CLOCK_TAI as clockid.
>>>>
>>>> Link: https://lore.kernel.org/netdev/[email protected]/
>>>> Signed-off-by: Abhishek Chauhan <[email protected]>
>>>> ---
>>>>   tools/include/uapi/linux/bpf.h                | 15 ++++---
>>>>   .../selftests/bpf/prog_tests/ctx_rewrite.c    | 10 +++--
>>>>   .../selftests/bpf/prog_tests/tc_redirect.c    |  3 --
>>>>   .../selftests/bpf/progs/test_tc_dtime.c       | 39 +++++++++----------
>>>>   4 files changed, 34 insertions(+), 33 deletions(-)
>>>>
>>>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>>>> index 90706a47f6ff..25ea393cf084 100644
>>>> --- a/tools/include/uapi/linux/bpf.h
>>>> +++ b/tools/include/uapi/linux/bpf.h
>>>> @@ -6207,12 +6207,17 @@ union {                    \
>>>>       __u64 :64;            \
>>>>   } __attribute__((aligned(8)))
>>>>   +/* The enum used in skb->tstamp_type. It specifies the clock type
>>>> + * of the time stored in the skb->tstamp.
>>>> + */
>>>>   enum {
>>>> -    BPF_SKB_TSTAMP_UNSPEC,
>>>> -    BPF_SKB_TSTAMP_DELIVERY_MONO,    /* tstamp has mono delivery time */
>>>> -    /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
>>>> -     * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
>>>> -     * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
>>>> +    BPF_SKB_TSTAMP_UNSPEC = 0,        /* DEPRECATED */
>>>> +    BPF_SKB_TSTAMP_DELIVERY_MONO = 1,    /* DEPRECATED */
>>>> +    BPF_SKB_CLOCK_REALTIME = 0,
>>>> +    BPF_SKB_CLOCK_MONOTONIC = 1,
>>>> +    BPF_SKB_CLOCK_TAI = 2,
>>>> +    /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
>>>> +     * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
>>>>        */
>>>>   };
>>>>   diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>>> index 3b7c57fe55a5..71940f4ef0fb 100644
>>>> --- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>>> +++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
>>>> @@ -69,15 +69,17 @@ static struct test_case test_cases[] = {
>>>>       {
>>>>           N(SCHED_CLS, struct __sk_buff, tstamp),
>>>>           .read  = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>>>> -             "w11 &= 3;"
>>>> -             "if w11 != 0x3 goto pc+2;"
>>>> +             "if w11 == 0x4 goto pc+1;"
>>>> +             "goto pc+4;"
>>>> +             "if w11 == 0x3 goto pc+1;"
>>>> +             "goto pc+2;"
>>>
>>> Not an expert on this code, and I see that the existing code already
>>> has this below, but: isn't it odd and unnecessary to jump to an
>>> unconditional jump statement?
>>>
>> I am closely looking into your comment and i will evalute it(Martin can correct me
>> if the jumps are correct or not as i am new to BPF as well) but i found out that
>> JSET = "&" and not "==". So the above two ins has to change from -
>
> Yes, this should be bitwise "&" instead of "==".
>
> The bpf CI did report this: https://github.com/kernel-patches/bpf/actions/runs/8947652196/job/24579927178
>
> Please monitor the bpf CI test result.
>
> Do you have issue running the test locally?
>
Yes, To be honest. I am facing compilation issues when i follow the documentation to Make BPF on latest kernel.

This is slowing down my development with this patch.

Very similar to the problem described here :- https://github.com/jsitnicki/ebpf-summit-2020/issues/1

local/mnt/workspace/kernel_master/linux-next/tools/testing/selftests/bpf/tools/build/bpftool/bootstrap/libbpf/include/bpf/bpf_core_read.h:379:26: note: expanded from macro '___arrow2'
#define ___arrow2(a, b) a->b
~^
skeleton/pid_iter.bpf.c:19:9: note: forward declaration of 'struct bpf_link'
struct bpf_link link;
^
skeleton/pid_iter.bpf.c:105:7: error: incomplete definition of type 'struct bpf_link'
if (BPF_CORE_READ(link, type) == bpf_core_enum_value(enum bpf_link_type___local,
^~~~~~~~~~~~~~~~~~~~~~~~~

>>
>> "if w11 == 0x4 goto pc+1;" ==>(needs to be corrected to) "if w11 & 0x4 goto pc+1;"
>>   "if w11 == 0x3 goto pc+1;" ==> (needs to be correct to) "if w11 & 0x3 goto pc+1;"
>>
>>
>>>>                "$dst = 0;"
>>>>                "goto pc+1;"
>>>>                "$dst = *(u64 *)($ctx + sk_buff::tstamp);",
>>>>           .write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
>>>> -             "if w11 & 0x2 goto pc+1;"
>>>> +             "if w11 & 0x4 goto pc+1;"
>>>>                "goto pc+2;"
>>>> -             "w11 &= -2;"
>>>> +             "w11 &= -3;"
>> Martin,
>> Also i am not sure why the the dissembly complains because the value of SKB_TSTAMP_TYPE_MASK = 3 and we are
>> negating it ~3 = -3.
>>
>>    Can't match disassembly(left) with pattern(right):
>>    r11 = *(u8 *)(r1 +129)  ;  r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset)
>>    if w11 & 0x4 goto pc+1  ;  if w11 & 0x4 goto pc+1
>>    goto pc+2               ;  goto pc+2
>>    w11 &= -4               ;  w11 &= -3
>>
>>>>                "*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;"
>>>>                "*(u64 *)($ctx + sk_buff::tstamp) = $src;",
>>>>       },
>

2024-05-07 19:18:37

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 2/3] net: Add additional bit to support clockid_t timestamp type

Abhishek Chauhan (ABC) wrote:
>
>
> On 5/7/2024 4:39 AM, Willem de Bruijn wrote:
> > Martin KaFai Lau wrote:
> >> On 5/3/24 8:13 PM, Abhishek Chauhan wrote:
> >>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> >>> index fe86cadfa85b..c3d852eecb01 100644
> >>> --- a/net/ipv4/ip_output.c
> >>> +++ b/net/ipv4/ip_output.c
> >>> @@ -1457,7 +1457,10 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
> >>>
> >>> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
> >>> skb->mark = cork->mark;
> >>> - skb->tstamp = cork->transmit_time;
> >>> + if (sk_is_tcp(sk))
> >>
> >> This seems not catching all IPPROTO_TCP case. In particular, the percpu
> >> "ipv4_tcp_sk" is SOCK_RAW. sk_is_tcp() is checking SOCK_STREAM:
> >>
> >> void __init tcp_v4_init(void)
> >> {
> >>
> >> /* ... */
> >> res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
> >> IPPROTO_TCP, &init_net);
> >>
> >> /* ... */
> >> }
> >>
> >> "while :; do ./test_progs -t tc_redirect/tc_redirect_dtime || break; done"
> >> failed pretty often exactly in this case.
> >>
> >
> > Interesting. The TCP stack opens non TCP sockets.
> >
> > Initializing sk->sk_clockid for this socket should address that.
> >
> Willem, Are you suggesting your point from the previous patch ?
>

No, just for this custom socket to initialize the sk_clockid. It is
not a TCP socket, but only used by TCP.

2024-05-07 19:43:49

by Abhishek Chauhan

[permalink] [raw]
Subject: Re: [RFC PATCH bpf-next v6 2/3] net: Add additional bit to support clockid_t timestamp type



On 5/7/2024 12:18 PM, Willem de Bruijn wrote:
> Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 5/7/2024 4:39 AM, Willem de Bruijn wrote:
>>> Martin KaFai Lau wrote:
>>>> On 5/3/24 8:13 PM, Abhishek Chauhan wrote:
>>>>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>>>>> index fe86cadfa85b..c3d852eecb01 100644
>>>>> --- a/net/ipv4/ip_output.c
>>>>> +++ b/net/ipv4/ip_output.c
>>>>> @@ -1457,7 +1457,10 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>>>>>
>>>>> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>>>>> skb->mark = cork->mark;
>>>>> - skb->tstamp = cork->transmit_time;
>>>>> + if (sk_is_tcp(sk))
>>>>
>>>> This seems not catching all IPPROTO_TCP case. In particular, the percpu
>>>> "ipv4_tcp_sk" is SOCK_RAW. sk_is_tcp() is checking SOCK_STREAM:
>>>>
>>>> void __init tcp_v4_init(void)
>>>> {
>>>>
>>>> /* ... */
>>>> res = inet_ctl_sock_create(&sk, PF_INET, SOCK_RAW,
>>>> IPPROTO_TCP, &init_net);
>>>>
>>>> /* ... */
>>>> }
>>>>
>>>> "while :; do ./test_progs -t tc_redirect/tc_redirect_dtime || break; done"
>>>> failed pretty often exactly in this case.
>>>>
>>>
>>> Interesting. The TCP stack opens non TCP sockets.
>>>
>>> Initializing sk->sk_clockid for this socket should address that.
>>>
>> Willem, Are you suggesting your point from the previous patch ?
>>
>
> No, just for this custom socket to initialize the sk_clockid. It is
> not a TCP socket, but only used by TCP.
Thanks Willem,
Noted! Which means there are only two places these custom RAW tcp socket
are getting called

1. tcp_ipv4.c
2. tcp_ipv6.c

I will take care of initializing sk_clockid to monotonic in the next patch
in the above two files.

Let me know if i missed out anything.