Patch 1 :- This patch takes care of only renaming the mono delivery
timestamp to tstamp_type with no change in functionality of
existing available code in kernel also
Starts assigning tstamp_type with either mono or real and
introduces a new enum in the skbuff.h, again no change in functionality
of the existing available code in kernel , just making the code scalable.
Patch 2 :- Additional bit was added to support tai timestamp type to
avoid tstamp drops in the forwarding path when testing TC-ETF.
With this patch i am updating bpf filter.c and some of the BPF
unit test framework which tests redirect test scenarios.
Need reviews on those patches
Abhishek Chauhan (2):
net: Rename mono_delivery_time to tstamp_type for scalabilty
net: Add additional bit to support clockid_t timestamp type
include/linux/skbuff.h | 74 ++++++++++++++-----
include/net/inet_frag.h | 4 +-
include/uapi/linux/bpf.h | 1 +
net/bridge/netfilter/nf_conntrack_bridge.c | 6 +-
net/core/dev.c | 2 +-
net/core/filter.c | 50 +++++++------
net/ieee802154/6lowpan/reassembly.c | 2 +-
net/ipv4/inet_fragment.c | 2 +-
net/ipv4/ip_fragment.c | 2 +-
net/ipv4/ip_output.c | 11 +--
net/ipv4/raw.c | 2 +-
net/ipv4/tcp_output.c | 16 ++--
net/ipv6/ip6_output.c | 8 +-
net/ipv6/netfilter.c | 6 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
net/ipv6/raw.c | 2 +-
net/ipv6/reassembly.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
net/packet/af_packet.c | 7 +-
net/sched/act_bpf.c | 4 +-
net/sched/cls_bpf.c | 4 +-
.../selftests/bpf/prog_tests/ctx_rewrite.c | 10 ++-
.../selftests/bpf/progs/test_tc_dtime.c | 24 ++++--
23 files changed, 153 insertions(+), 90 deletions(-)
--
2.25.1
mono_delivery_time was added to check if skb->tstamp has delivery
time in mono clock base (i.e. EDT) otherwise skb->tstamp has
timestamp in ingress and delivery_time at egress.
Renaming the bitfield from mono_delivery_time to tstamp_type is for
extensibilty for other timestamps such as userspace timestamp
(i.e. SO_TXTIME) set via sock opts.
As we are renaming the mono_delivery_time to tstamp_type, it makes
sense to start assigning tstamp_type based on enum defined
in this commit.
Earlier we used bool arg flag to check if the tstamp is mono in
function skb_set_delivery_time, Now the signature of the functions
accepts tstamp_type to distinguish between mono and real time.
Introduce a new function to set tstamp_type based on clockid.
In future tstamp_type:1 can be extended to support userspace timestamp
by increasing the bitfield.
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Abhishek Chauhan <[email protected]>
---
Changes since v4
- Introduce new function to directly delivery_time and
another to set tstamp_type based on clockid.
- Removed un-necessary comments in skbuff.h as
enums were obvious and understood.
Changes since v3
- Fixed inconsistent capitalization in skbuff.h
- remove reference to MONO_DELIVERY_TIME_MASK in skbuff.h
and point it to skb_tstamp_type now.
- Explicitely setting SKB_CLOCK_MONO if valid transmit_time
ip_send_unicast_reply
- Keeping skb_tstamp inline with skb_clear_tstamp.
- skb_set_delivery_time checks if timstamp is 0 and
sets the tstamp_type to SKB_CLOCK_REAL.
- Above comments are given by Willem
- Found out that skbuff.h has access to uapi/linux/time.h
So now instead of using CLOCK_REAL/CLOCK_MONO
i am checking actual clockid_t directly to set tstamp_type
example:- CLOCK_REALTIME/CLOCK_MONOTONIC
- Compilation error fixed in
net/ieee802154/6lowpan/reassembly.c
Changes since v2
- Minor changes to commit subject
Changes since v1
- Squashed the two commits into one as mentioned by Willem.
- Introduced switch in skb_set_delivery_time.
- Renamed and removed directionality aspects w.r.t tstamp_type
as mentioned by Willem.
include/linux/skbuff.h | 54 ++++++++++++++++------
include/net/inet_frag.h | 4 +-
net/bridge/netfilter/nf_conntrack_bridge.c | 6 +--
net/core/dev.c | 2 +-
net/core/filter.c | 10 ++--
net/ieee802154/6lowpan/reassembly.c | 2 +-
net/ipv4/inet_fragment.c | 2 +-
net/ipv4/ip_fragment.c | 2 +-
net/ipv4/ip_output.c | 9 ++--
net/ipv4/tcp_output.c | 16 +++----
net/ipv6/ip6_output.c | 6 +--
net/ipv6/netfilter.c | 6 +--
net/ipv6/netfilter/nf_conntrack_reasm.c | 2 +-
net/ipv6/reassembly.c | 2 +-
net/ipv6/tcp_ipv6.c | 2 +-
net/sched/act_bpf.c | 4 +-
net/sched/cls_bpf.c | 4 +-
17 files changed, 81 insertions(+), 52 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 238e292696e6..e464d0ebc9c1 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -708,6 +708,11 @@ typedef unsigned int sk_buff_data_t;
typedef unsigned char *sk_buff_data_t;
#endif
+enum skb_tstamp_type {
+ SKB_CLOCK_REALTIME,
+ SKB_CLOCK_MONOTONIC,
+};
+
/**
* DOC: Basic sk_buff geometry
*
@@ -825,10 +830,9 @@ typedef unsigned char *sk_buff_data_t;
* @dst_pending_confirm: need to confirm neighbour
* @decrypted: Decrypted SKB
* @slow_gro: state present at GRO time, slower prepare step required
- * @mono_delivery_time: When set, skb->tstamp has the
- * delivery_time in mono clock base (i.e. EDT). Otherwise, the
- * skb->tstamp has the (rcv) timestamp at ingress and
- * delivery_time at egress.
+ * @tstamp_type: When set, skb->tstamp has the
+ * delivery_time in mono clock base Otherwise, the
+ * timestamp is considered real clock base.
* @napi_id: id of the NAPI struct this skb came from
* @sender_cpu: (aka @napi_id) source CPU in XPS
* @alloc_cpu: CPU which did the skb allocation.
@@ -956,7 +960,7 @@ struct sk_buff {
/* private: */
__u8 __mono_tc_offset[0];
/* public: */
- __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */
+ __u8 tstamp_type:1; /* See skb_tstamp_type */
#ifdef CONFIG_NET_XGRESS
__u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
__u8 tc_skip_classify:1;
@@ -4175,7 +4179,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb,
static inline void __net_timestamp(struct sk_buff *skb)
{
skb->tstamp = ktime_get_real();
- skb->mono_delivery_time = 0;
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
}
static inline ktime_t net_timedelta(ktime_t t)
@@ -4183,11 +4187,35 @@ static inline ktime_t net_timedelta(ktime_t t)
return ktime_sub(ktime_get_real(), t);
}
+static inline void skb_set_tstamp_type_frm_clkid(struct sk_buff *skb,
+ ktime_t kt, clockid_t clockid)
+{
+ skb->tstamp = kt;
+
+ if (!kt) {
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
+ return;
+ }
+
+ switch (clockid) {
+ case CLOCK_REALTIME:
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
+ break;
+ case CLOCK_MONOTONIC:
+ skb->tstamp_type = SKB_CLOCK_MONOTONIC;
+ break;
+ }
+}
+
static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
- bool mono)
+ u8 tstamp_type)
{
skb->tstamp = kt;
- skb->mono_delivery_time = kt && mono;
+
+ if (kt)
+ skb->tstamp_type = tstamp_type;
+ else
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
}
DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
@@ -4197,8 +4225,8 @@ DECLARE_STATIC_KEY_FALSE(netstamp_needed_key);
*/
static inline void skb_clear_delivery_time(struct sk_buff *skb)
{
- if (skb->mono_delivery_time) {
- skb->mono_delivery_time = 0;
+ if (skb->tstamp_type) {
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
if (static_branch_unlikely(&netstamp_needed_key))
skb->tstamp = ktime_get_real();
else
@@ -4208,7 +4236,7 @@ static inline void skb_clear_delivery_time(struct sk_buff *skb)
static inline void skb_clear_tstamp(struct sk_buff *skb)
{
- if (skb->mono_delivery_time)
+ if (skb->tstamp_type)
return;
skb->tstamp = 0;
@@ -4216,7 +4244,7 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
static inline ktime_t skb_tstamp(const struct sk_buff *skb)
{
- if (skb->mono_delivery_time)
+ if (skb->tstamp_type)
return 0;
return skb->tstamp;
@@ -4224,7 +4252,7 @@ static inline ktime_t skb_tstamp(const struct sk_buff *skb)
static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond)
{
- if (!skb->mono_delivery_time && skb->tstamp)
+ if (skb->tstamp_type != SKB_CLOCK_MONOTONIC && skb->tstamp)
return skb->tstamp;
if (static_branch_unlikely(&netstamp_needed_key) || cond)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 153960663ce4..5af6eb14c5db 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -76,7 +76,7 @@ struct frag_v6_compare_key {
* @stamp: timestamp of the last received fragment
* @len: total length of the original datagram
* @meat: length of received fragments so far
- * @mono_delivery_time: stamp has a mono delivery time (EDT)
+ * @tstamp_type: stamp has a mono delivery time (EDT)
* @flags: fragment queue flags
* @max_size: maximum received fragment size
* @fqdir: pointer to struct fqdir
@@ -97,7 +97,7 @@ struct inet_frag_queue {
ktime_t stamp;
int len;
int meat;
- u8 mono_delivery_time;
+ u8 tstamp_type;
__u8 flags;
u16 max_size;
struct fqdir *fqdir;
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index c3c51b9a6826..816bb0fde718 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -32,7 +32,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
struct sk_buff *))
{
int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
- bool mono_delivery_time = skb->mono_delivery_time;
+ u8 tstamp_type = skb->tstamp_type;
unsigned int hlen, ll_rs, mtu;
ktime_t tstamp = skb->tstamp;
struct ip_frag_state state;
@@ -82,7 +82,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
if (iter.frag)
ip_fraglist_prepare(skb, &iter);
- skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb, tstamp, tstamp_type);
err = output(net, sk, data, skb);
if (err || !iter.frag)
break;
@@ -113,7 +113,7 @@ static int nf_br_ip_fragment(struct net *net, struct sock *sk,
goto blackhole;
}
- skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb2, tstamp, tstamp_type);
err = output(net, sk, data, skb2);
if (err)
goto blackhole;
diff --git a/net/core/dev.c b/net/core/dev.c
index 8bdc59074b29..df352e952fc5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2146,7 +2146,7 @@ EXPORT_SYMBOL(net_disable_timestamp);
static inline void net_timestamp_set(struct sk_buff *skb)
{
skb->tstamp = 0;
- skb->mono_delivery_time = 0;
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
if (static_branch_unlikely(&netstamp_needed_key))
skb->tstamp = ktime_get_real();
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 94d403f925c4..957c2fc724eb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7731,13 +7731,13 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
if (!tstamp)
return -EINVAL;
skb->tstamp = tstamp;
- skb->mono_delivery_time = 1;
+ skb->tstamp_type = SKB_CLOCK_MONOTONIC;
break;
case BPF_SKB_TSTAMP_UNSPEC:
if (tstamp)
return -EINVAL;
skb->tstamp = 0;
- skb->mono_delivery_time = 0;
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
break;
default:
return -EINVAL;
@@ -9444,7 +9444,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
*insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
- /* skb->tc_at_ingress && skb->mono_delivery_time,
+ /* skb->tc_at_ingress && skb->tstamp_type:1,
* read 0 as the (rcv) timestamp.
*/
*insn++ = BPF_MOV64_IMM(value_reg, 0);
@@ -9469,7 +9469,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
* the bpf prog is aware the tstamp could have delivery time.
* Thus, write skb->tstamp as is if tstamp_type_access is true.
* Otherwise, writing at ingress will have to clear the
- * mono_delivery_time bit also.
+ * mono_delivery_time (skb->tstamp_type:1)bit also.
*/
if (!prog->tstamp_type_access) {
__u8 tmp_reg = BPF_REG_AX;
@@ -9479,7 +9479,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
/* goto <store> */
*insn++ = BPF_JMP_A(2);
- /* <clear>: mono_delivery_time */
+ /* <clear>: mono_delivery_time or (skb->tstamp_type:1) */
*insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK);
*insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET);
}
diff --git a/net/ieee802154/6lowpan/reassembly.c b/net/ieee802154/6lowpan/reassembly.c
index 2a983cf450da..26506dd4b357 100644
--- a/net/ieee802154/6lowpan/reassembly.c
+++ b/net/ieee802154/6lowpan/reassembly.c
@@ -130,7 +130,7 @@ static int lowpan_frag_queue(struct lowpan_frag_queue *fq,
goto err;
fq->q.stamp = skb->tstamp;
- fq->q.mono_delivery_time = skb->mono_delivery_time;
+ fq->q.tstamp_type = skb->tstamp_type;
if (frag_type == LOWPAN_DISPATCH_FRAG1)
fq->q.flags |= INET_FRAG_FIRST_IN;
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index faaec92a46ac..d179a2c84222 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -619,7 +619,7 @@ void inet_frag_reasm_finish(struct inet_frag_queue *q, struct sk_buff *head,
skb_mark_not_on_list(head);
head->prev = NULL;
head->tstamp = q->stamp;
- head->mono_delivery_time = q->mono_delivery_time;
+ head->tstamp_type = q->tstamp_type;
if (sk)
refcount_add(sum_truesize - head_truesize, &sk->sk_wmem_alloc);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 534b98a0744a..c101bb1b9d3c 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -355,7 +355,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
qp->iif = dev->ifindex;
qp->q.stamp = skb->tstamp;
- qp->q.mono_delivery_time = skb->mono_delivery_time;
+ qp->q.tstamp_type = skb->tstamp_type;
qp->q.meat += skb->len;
qp->ecn |= ecn;
add_frag_mem_limit(qp->q.fqdir, skb->truesize);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 1fe794967211..591226dcde26 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -764,7 +764,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
{
struct iphdr *iph;
struct sk_buff *skb2;
- bool mono_delivery_time = skb->mono_delivery_time;
+ u8 tstamp_type = skb->tstamp_type;
struct rtable *rt = skb_rtable(skb);
unsigned int mtu, hlen, ll_rs;
struct ip_fraglist_iter iter;
@@ -856,7 +856,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
}
}
- skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb, tstamp, tstamp_type);
err = output(net, sk, skb);
if (!err)
@@ -912,7 +912,7 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
/*
* Put this fragment into the sending queue.
*/
- skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb2, tstamp, tstamp_type);
err = output(net, sk, skb2);
if (err)
goto fail;
@@ -1649,7 +1649,8 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
arg->csumoffset) = csum_fold(csum_add(nskb->csum,
arg->csum));
nskb->ip_summed = CHECKSUM_NONE;
- nskb->mono_delivery_time = !!transmit_time;
+ if (transmit_time)
+ nskb->tstamp_type = SKB_CLOCK_MONOTONIC;
if (txhash)
skb_set_hash(nskb, txhash, PKT_HASH_TYPE_L4);
ip_push_pending_frames(sk, &fl4);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 99a1d88f7f47..51c7399738c0 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1300,7 +1300,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
tp = tcp_sk(sk);
prior_wstamp = tp->tcp_wstamp_ns;
tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
- skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
+ skb_set_tstamp_type_frm_clkid(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC);
if (clone_it) {
oskb = skb;
@@ -1654,7 +1654,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
skb_split(skb, buff, len);
- skb_set_delivery_time(buff, skb->tstamp, true);
+ skb_set_tstamp_type_frm_clkid(buff, skb->tstamp, CLOCK_MONOTONIC);
tcp_fragment_tstamp(skb, buff);
old_factor = tcp_skb_pcount(skb);
@@ -2758,7 +2758,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
/* "skb_mstamp_ns" is used as a start point for the retransmit timer */
tp->tcp_wstamp_ns = tp->tcp_clock_cache;
- skb_set_delivery_time(skb, tp->tcp_wstamp_ns, true);
+ skb_set_tstamp_type_frm_clkid(skb, tp->tcp_wstamp_ns, CLOCK_MONOTONIC);
list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
tcp_init_tso_segs(skb, mss_now);
goto repair; /* Skip network transmission */
@@ -3741,12 +3741,12 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
now = tcp_clock_ns();
#ifdef CONFIG_SYN_COOKIES
if (unlikely(synack_type == TCP_SYNACK_COOKIE && ireq->tstamp_ok))
- skb_set_delivery_time(skb, cookie_init_timestamp(req, now),
- true);
+ skb_set_tstamp_type_frm_clkid(skb, cookie_init_timestamp(req, now),
+ CLOCK_MONOTONIC);
else
#endif
{
- skb_set_delivery_time(skb, now, true);
+ skb_set_tstamp_type_frm_clkid(skb, now, CLOCK_MONOTONIC);
if (!tcp_rsk(req)->snt_synack) /* Timestamp first SYNACK */
tcp_rsk(req)->snt_synack = tcp_skb_timestamp_us(skb);
}
@@ -3833,7 +3833,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
bpf_skops_write_hdr_opt((struct sock *)sk, skb, req, syn_skb,
synack_type, &opts);
- skb_set_delivery_time(skb, now, true);
+ skb_set_tstamp_type_frm_clkid(skb, now, CLOCK_MONOTONIC);
tcp_add_tx_delay(skb, tp);
return skb;
@@ -4017,7 +4017,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
- skb_set_delivery_time(syn, syn_data->skb_mstamp_ns, true);
+ skb_set_tstamp_type_frm_clkid(syn, syn_data->skb_mstamp_ns, CLOCK_MONOTONIC);
/* Now full SYN+DATA was cloned and sent (or not),
* remove the SYN from the original skb (syn_data)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index b9dd3a66e423..a9e819115622 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -859,7 +859,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
struct rt6_info *rt = (struct rt6_info *)skb_dst(skb);
struct ipv6_pinfo *np = skb->sk && !dev_recursion_level() ?
inet6_sk(skb->sk) : NULL;
- bool mono_delivery_time = skb->mono_delivery_time;
+ u8 tstamp_type = skb->tstamp_type;
struct ip6_frag_state state;
unsigned int mtu, hlen, nexthdr_offset;
ktime_t tstamp = skb->tstamp;
@@ -955,7 +955,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (iter.frag)
ip6_fraglist_prepare(skb, &iter);
- skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb, tstamp, tstamp_type);
err = output(net, sk, skb);
if (!err)
IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
@@ -1016,7 +1016,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
/*
* Put this fragment into the sending queue.
*/
- skb_set_delivery_time(frag, tstamp, mono_delivery_time);
+ skb_set_delivery_time(frag, tstamp, tstamp_type);
err = output(net, sk, frag);
if (err)
goto fail;
diff --git a/net/ipv6/netfilter.c b/net/ipv6/netfilter.c
index 53d255838e6a..e0c2347b4dc6 100644
--- a/net/ipv6/netfilter.c
+++ b/net/ipv6/netfilter.c
@@ -126,7 +126,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
struct sk_buff *))
{
int frag_max_size = BR_INPUT_SKB_CB(skb)->frag_max_size;
- bool mono_delivery_time = skb->mono_delivery_time;
+ u8 tstamp_type = skb->tstamp_type;
ktime_t tstamp = skb->tstamp;
struct ip6_frag_state state;
u8 *prevhdr, nexthdr = 0;
@@ -192,7 +192,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (iter.frag)
ip6_fraglist_prepare(skb, &iter);
- skb_set_delivery_time(skb, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb, tstamp, tstamp_type);
err = output(net, sk, data, skb);
if (err || !iter.frag)
break;
@@ -225,7 +225,7 @@ int br_ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
goto blackhole;
}
- skb_set_delivery_time(skb2, tstamp, mono_delivery_time);
+ skb_set_delivery_time(skb2, tstamp, tstamp_type);
err = output(net, sk, data, skb2);
if (err)
goto blackhole;
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index ce8c14d8aff5..1e0cdad52207 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -264,7 +264,7 @@ static int nf_ct_frag6_queue(struct frag_queue *fq, struct sk_buff *skb,
fq->iif = dev->ifindex;
fq->q.stamp = skb->tstamp;
- fq->q.mono_delivery_time = skb->mono_delivery_time;
+ fq->q.tstamp_type = skb->tstamp_type;
fq->q.meat += skb->len;
fq->ecn |= ecn;
if (payload_len > fq->q.max_size)
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index ee95cdcc8747..fe7a337b6bc7 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -198,7 +198,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
fq->iif = dev->ifindex;
fq->q.stamp = skb->tstamp;
- fq->q.mono_delivery_time = skb->mono_delivery_time;
+ fq->q.tstamp_type = skb->tstamp_type;
fq->q.meat += skb->len;
fq->ecn |= ecn;
add_frag_mem_limit(fq->q.fqdir, skb->truesize);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index bb7c3caf4f85..b9eafd771aa9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -975,7 +975,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
mark = inet_twsk(sk)->tw_mark;
else
mark = READ_ONCE(sk->sk_mark);
- skb_set_delivery_time(buff, tcp_transmit_time(sk), true);
+ skb_set_tstamp_type_frm_clkid(buff, tcp_transmit_time(sk), CLOCK_MONOTONIC);
}
if (txhash) {
/* autoflowlabel/skb_get_hash_flowi6 rely on buff->hash */
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 0e3cf11ae5fc..396b576390d0 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -54,8 +54,8 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb,
bpf_compute_data_pointers(skb);
filter_res = bpf_prog_run(filter, skb);
}
- if (unlikely(!skb->tstamp && skb->mono_delivery_time))
- skb->mono_delivery_time = 0;
+ if (unlikely(!skb->tstamp && skb->tstamp_type))
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
if (skb_sk_is_prefetched(skb) && filter_res != TC_ACT_OK)
skb_orphan(skb);
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 5e83e890f6a4..1941ebec23ff 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -104,8 +104,8 @@ TC_INDIRECT_SCOPE int cls_bpf_classify(struct sk_buff *skb,
bpf_compute_data_pointers(skb);
filter_res = bpf_prog_run(prog->filter, skb);
}
- if (unlikely(!skb->tstamp && skb->mono_delivery_time))
- skb->mono_delivery_time = 0;
+ if (unlikely(!skb->tstamp && skb->tstamp_type))
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
if (prog->exts_integrated) {
res->class = 0;
--
2.25.1
tstamp_type is now set based on actual clockid_t compressed
into 2 bits.
To make the design scalable for future needs this commit bring in
the change to extend the tstamp_type:1 to tstamp_type:2 to support
other clockid_t timestamp.
We now support CLOCK_TAI as part of tstamp_type as part of this
commit with exisiting support CLOCK_MONOTONIC and CLOCK_REALTIME.
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Abhishek Chauhan <[email protected]>
---
Changes since v4
- Made changes to BPF code in filter.c as per
Martin's comments
- Minor fixes on comments given on documentation
from Willem in skbuff.h (removed obvious ones)
- Made changes to ctx_rewrite.c and test_tc_dtime.c
- test_tc_dtime.c i am not really sure if i took care
of all the changes as i am not too familiar with
the framework.
- Introduce common mask SKB_TSTAMP_TYPE_MASK instead
of multiple SKB mask.
- Optimisation on BPF code as suggested by Martin.
- Set default case to SKB_CLOCK_REALTME.
Changes since v3
- Carefully reviewed BPF APIs and made changes in
BPF code as well.
- Re-used actual clockid_t values since skbuff.h
indirectly includes uapi/linux/time.h
- Added CLOCK_TAI as part of the skb_set_delivery_time
handling instead of CLOCK_USER
- Added default in switch for unsupported and invalid
timestamp with an WARN_ONCE
- All of the above comments were given by Willem
- Made changes in filter.c as per Martin's comments
to handle invalid cases in bpf code with addition of
SKB_TAI_DELIVERY_TIME_MASK
Changes since v2
- Minor changes to commit subject
Changes since v1
- identified additional changes in BPF framework.
- Bit shift in SKB_MONO_DELIVERY_TIME_MASK and TC_AT_INGRESS_MASK.
- Made changes in skb_set_delivery_time to keep changes similar to
previous code for mono_delivery_time and just setting tstamp_type
bit 1 for userspace timestamp.
include/linux/skbuff.h | 26 +++++++----
include/uapi/linux/bpf.h | 1 +
net/core/filter.c | 46 +++++++++++--------
net/ipv4/ip_output.c | 2 +-
net/ipv4/raw.c | 2 +-
net/ipv6/ip6_output.c | 6 +--
net/ipv6/raw.c | 2 +-
net/packet/af_packet.c | 7 ++-
.../selftests/bpf/prog_tests/ctx_rewrite.c | 10 ++--
.../selftests/bpf/progs/test_tc_dtime.c | 24 ++++++++--
10 files changed, 80 insertions(+), 46 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e464d0ebc9c1..3ad0de07d261 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -711,6 +711,8 @@ typedef unsigned char *sk_buff_data_t;
enum skb_tstamp_type {
SKB_CLOCK_REALTIME,
SKB_CLOCK_MONOTONIC,
+ SKB_CLOCK_TAI,
+ __SKB_CLOCK_MAX = SKB_CLOCK_TAI,
};
/**
@@ -831,8 +833,8 @@ enum skb_tstamp_type {
* @decrypted: Decrypted SKB
* @slow_gro: state present at GRO time, slower prepare step required
* @tstamp_type: When set, skb->tstamp has the
- * delivery_time in mono clock base Otherwise, the
- * timestamp is considered real clock base.
+ * delivery_time in mono clock base or clock base of skb->tstamp.
+ * Otherwise, the timestamp is considered real clock base
* @napi_id: id of the NAPI struct this skb came from
* @sender_cpu: (aka @napi_id) source CPU in XPS
* @alloc_cpu: CPU which did the skb allocation.
@@ -960,7 +962,7 @@ struct sk_buff {
/* private: */
__u8 __mono_tc_offset[0];
/* public: */
- __u8 tstamp_type:1; /* See skb_tstamp_type */
+ __u8 tstamp_type:2; /* See skb_tstamp_type */
#ifdef CONFIG_NET_XGRESS
__u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
__u8 tc_skip_classify:1;
@@ -1090,15 +1092,17 @@ struct sk_buff {
#endif
#define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)
-/* if you move tc_at_ingress or mono_delivery_time
+/* if you move tc_at_ingress or tstamp_type:2
* around, you also must adapt these constants.
*/
#ifdef __BIG_ENDIAN_BITFIELD
-#define SKB_MONO_DELIVERY_TIME_MASK (1 << 7)
-#define TC_AT_INGRESS_MASK (1 << 6)
+#define SKB_TSTAMP_TYPE_MASK (3 << 6)
+#define SKB_TSTAMP_TYPE_RSH (6)
+#define TC_AT_INGRESS_RSH (5)
+#define TC_AT_INGRESS_MASK (1 << 5)
#else
-#define SKB_MONO_DELIVERY_TIME_MASK (1 << 0)
-#define TC_AT_INGRESS_MASK (1 << 1)
+#define SKB_TSTAMP_TYPE_MASK (3)
+#define TC_AT_INGRESS_MASK (1 << 2)
#endif
#define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset)
@@ -4204,6 +4208,12 @@ static inline void skb_set_tstamp_type_frm_clkid(struct sk_buff *skb,
case CLOCK_MONOTONIC:
skb->tstamp_type = SKB_CLOCK_MONOTONIC;
break;
+ case CLOCK_TAI:
+ skb->tstamp_type = SKB_CLOCK_TAI;
+ break;
+ default:
+ WARN_ONCE(true, "clockid %d not supported", clockid);
+ skb->tstamp_type = SKB_CLOCK_REALTIME;
}
}
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index cee0a7915c08..1376ed5ece10 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -6209,6 +6209,7 @@ union { \
enum {
BPF_SKB_TSTAMP_UNSPEC,
BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
+ BPF_SKB_TSTAMP_DELIVERY_TAI, /* tstamp has tai delivery time */
/* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
* the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
* and try to deduce it by ingress, egress or skb->sk->sk_clockid.
diff --git a/net/core/filter.c b/net/core/filter.c
index 957c2fc724eb..c67622f4fe98 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7733,6 +7733,12 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
skb->tstamp = tstamp;
skb->tstamp_type = SKB_CLOCK_MONOTONIC;
break;
+ case BPF_SKB_TSTAMP_DELIVERY_TAI:
+ if (!tstamp)
+ return -EINVAL;
+ skb->tstamp = tstamp;
+ skb->tstamp_type = SKB_CLOCK_TAI;
+ break;
case BPF_SKB_TSTAMP_UNSPEC:
if (tstamp)
return -EINVAL;
@@ -9388,17 +9394,17 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
{
__u8 value_reg = si->dst_reg;
__u8 skb_reg = si->src_reg;
- /* AX is needed because src_reg and dst_reg could be the same */
- __u8 tmp_reg = BPF_REG_AX;
-
- *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
- SKB_BF_MONO_TC_OFFSET);
- *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
- SKB_MONO_DELIVERY_TIME_MASK, 2);
- *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
- *insn++ = BPF_JMP_A(1);
- *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
-
+ BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI);
+ *insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
+ *insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK);
+#ifdef __BIG_ENDIAN_BITFIELD
+ *insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSH);
+#else
+ BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1));
+#endif
+ *insn++ = BPF_JMP32_IMM(BPF_JNE, value_reg, SKB_TSTAMP_TYPE_MASK, 1);
+ /* Both the bits set then mark it BPF_SKB_TSTAMP_UNSPEC */
+ *insn++ = BPF_MOV64_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
return insn;
}
@@ -9430,6 +9436,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
__u8 value_reg = si->dst_reg;
__u8 skb_reg = si->src_reg;
+BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI);
#ifdef CONFIG_NET_XGRESS
/* If the tstamp_type is read,
* the bpf prog is aware the tstamp could have delivery time.
@@ -9440,11 +9447,12 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
__u8 tmp_reg = BPF_REG_AX;
*insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
- *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
- TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
- *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
- TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
- /* skb->tc_at_ingress && skb->tstamp_type:1,
+ /* check if ingress mask bits is set */
+ *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
+ *insn++ = BPF_JMP_A(4);
+ *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, SKB_TSTAMP_TYPE_MASK, 1);
+ *insn++ = BPF_JMP_A(2);
+ /* skb->tc_at_ingress && skb->tstamp_type:2,
* read 0 as the (rcv) timestamp.
*/
*insn++ = BPF_MOV64_IMM(value_reg, 0);
@@ -9469,7 +9477,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
* the bpf prog is aware the tstamp could have delivery time.
* Thus, write skb->tstamp as is if tstamp_type_access is true.
* Otherwise, writing at ingress will have to clear the
- * mono_delivery_time (skb->tstamp_type:1)bit also.
+ * mono_delivery_time (skb->tstamp_type:2)bit also.
*/
if (!prog->tstamp_type_access) {
__u8 tmp_reg = BPF_REG_AX;
@@ -9479,8 +9487,8 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
*insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
/* goto <store> */
*insn++ = BPF_JMP_A(2);
- /* <clear>: mono_delivery_time or (skb->tstamp_type:1) */
- *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK);
+ /* <clear>: skb->tstamp_type:2 */
+ *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_TSTAMP_TYPE_MASK);
*insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET);
}
#endif
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 591226dcde26..f195b31d6e75 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
skb->mark = cork->mark;
- skb->tstamp = cork->transmit_time;
+ skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
/*
* Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
* on dst refcount
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index dcb11f22cbf2..8b370369cdd8 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -360,7 +360,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
skb->protocol = htons(ETH_P_IP);
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc->mark;
- skb->tstamp = sockc->transmit_time;
+ skb_set_tstamp_type_frm_clkid(skb, sockc->transmit_time, sk->sk_clockid);
skb_dst_set(skb, &rt->dst);
*rtp = NULL;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a9e819115622..63e4cc30d18d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -955,7 +955,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
if (iter.frag)
ip6_fraglist_prepare(skb, &iter);
- skb_set_delivery_time(skb, tstamp, tstamp_type);
+ skb_set_tstamp_type_frm_clkid(skb, tstamp, tstamp_type);
err = output(net, sk, skb);
if (!err)
IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
@@ -1016,7 +1016,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
/*
* Put this fragment into the sending queue.
*/
- skb_set_delivery_time(frag, tstamp, tstamp_type);
+ skb_set_tstamp_type_frm_clkid(frag, tstamp, tstamp_type);
err = output(net, sk, frag);
if (err)
goto fail;
@@ -1924,7 +1924,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = cork->base.mark;
- skb->tstamp = cork->base.transmit_time;
+ skb_set_tstamp_type_frm_clkid(skb, cork->base.transmit_time, sk->sk_clockid);
ip6_cork_steal_dst(skb, cork);
IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 0d896ca7b589..5649362577ab 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -621,7 +621,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
skb->protocol = htons(ETH_P_IPV6);
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc->mark;
- skb->tstamp = sockc->transmit_time;
+ skb_set_tstamp_type_frm_clkid(skb, sockc->transmit_time, sk->sk_clockid);
skb_put(skb, length);
skb_reset_network_header(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8c6d3fbb4ed8..6a4a86c26d2a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2056,8 +2056,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
skb->dev = dev;
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = READ_ONCE(sk->sk_mark);
- skb->tstamp = sockc.transmit_time;
-
+ skb_set_tstamp_type_frm_clkid(skb, sockc.transmit_time, sk->sk_clockid);
skb_setup_tx_timestamp(skb, sockc.tsflags);
if (unlikely(extra_len == 4))
@@ -2585,7 +2584,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
skb->dev = dev;
skb->priority = READ_ONCE(po->sk.sk_priority);
skb->mark = READ_ONCE(po->sk.sk_mark);
- skb->tstamp = sockc->transmit_time;
+ skb_set_tstamp_type_frm_clkid(skb, sockc->transmit_time, po->sk.sk_clockid);
skb_setup_tx_timestamp(skb, sockc->tsflags);
skb_zcopy_set_nouarg(skb, ph.raw);
@@ -3063,7 +3062,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
skb->dev = dev;
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc.mark;
- skb->tstamp = sockc.transmit_time;
+ skb_set_tstamp_type_frm_clkid(skb, sockc.transmit_time, sk->sk_clockid);
if (unlikely(extra_len == 4))
skb->no_fcs = 1;
diff --git a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
index 3b7c57fe55a5..71940f4ef0fb 100644
--- a/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
+++ b/tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c
@@ -69,15 +69,17 @@ static struct test_case test_cases[] = {
{
N(SCHED_CLS, struct __sk_buff, tstamp),
.read = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
- "w11 &= 3;"
- "if w11 != 0x3 goto pc+2;"
+ "if w11 == 0x4 goto pc+1;"
+ "goto pc+4;"
+ "if w11 == 0x3 goto pc+1;"
+ "goto pc+2;"
"$dst = 0;"
"goto pc+1;"
"$dst = *(u64 *)($ctx + sk_buff::tstamp);",
.write = "r11 = *(u8 *)($ctx + sk_buff::__mono_tc_offset);"
- "if w11 & 0x2 goto pc+1;"
+ "if w11 & 0x4 goto pc+1;"
"goto pc+2;"
- "w11 &= -2;"
+ "w11 &= -3;"
"*(u8 *)($ctx + sk_buff::__mono_tc_offset) = r11;"
"*(u64 *)($ctx + sk_buff::tstamp) = $src;",
},
diff --git a/tools/testing/selftests/bpf/progs/test_tc_dtime.c b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
index 74ec09f040b7..19dba6d88265 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_dtime.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
@@ -227,6 +227,12 @@ int egress_host(struct __sk_buff *skb)
inc_dtimes(EGRESS_ENDHOST);
else
inc_errs(EGRESS_ENDHOST);
+ } else if (skb_proto(skb_type) == IPPROTO_UDP) {
+ if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI &&
+ skb->tstamp)
+ inc_dtimes(EGRESS_ENDHOST);
+ else
+ inc_errs(EGRESS_ENDHOST);
} else {
if (skb->tstamp_type == BPF_SKB_TSTAMP_UNSPEC &&
skb->tstamp)
@@ -255,6 +261,9 @@ int ingress_host(struct __sk_buff *skb)
if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO &&
skb->tstamp == EGRESS_FWDNS_MAGIC)
inc_dtimes(INGRESS_ENDHOST);
+ else if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI &&
+ skb->tstamp == EGRESS_FWDNS_MAGIC)
+ inc_dtimes(INGRESS_ENDHOST);
else
inc_errs(INGRESS_ENDHOST);
@@ -323,12 +332,14 @@ int ingress_fwdns_prio101(struct __sk_buff *skb)
/* Should have handled in prio100 */
return TC_ACT_SHOT;
- if (skb_proto(skb_type) == IPPROTO_UDP)
+ if (skb_proto(skb_type) == IPPROTO_UDP &&
+ skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_TAI)
expected_dtime = 0;
if (skb->tstamp_type) {
if (fwdns_clear_dtime() ||
- skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
+ (skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO &&
+ skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_TAI) ||
skb->tstamp != expected_dtime)
inc_errs(INGRESS_FWDNS_P101);
else
@@ -338,7 +349,8 @@ int ingress_fwdns_prio101(struct __sk_buff *skb)
inc_errs(INGRESS_FWDNS_P101);
}
- if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
+ if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO ||
+ skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI) {
skb->tstamp = INGRESS_FWDNS_MAGIC;
} else {
if (bpf_skb_set_tstamp(skb, INGRESS_FWDNS_MAGIC,
@@ -370,7 +382,8 @@ int egress_fwdns_prio101(struct __sk_buff *skb)
if (skb->tstamp_type) {
if (fwdns_clear_dtime() ||
- skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
+ (skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO &&
+ skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_TAI) ||
skb->tstamp != INGRESS_FWDNS_MAGIC)
inc_errs(EGRESS_FWDNS_P101);
else
@@ -380,7 +393,8 @@ int egress_fwdns_prio101(struct __sk_buff *skb)
inc_errs(EGRESS_FWDNS_P101);
}
- if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
+ if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO ||
+ skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI) {
skb->tstamp = EGRESS_FWDNS_MAGIC;
} else {
if (bpf_skb_set_tstamp(skb, EGRESS_FWDNS_MAGIC,
--
2.25.1
Abhishek Chauhan wrote:
> mono_delivery_time was added to check if skb->tstamp has delivery
> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
> timestamp in ingress and delivery_time at egress.
>
> Renaming the bitfield from mono_delivery_time to tstamp_type is for
> extensibilty for other timestamps such as userspace timestamp
> (i.e. SO_TXTIME) set via sock opts.
>
> As we are renaming the mono_delivery_time to tstamp_type, it makes
> sense to start assigning tstamp_type based on enum defined
> in this commit.
>
> Earlier we used bool arg flag to check if the tstamp is mono in
> function skb_set_delivery_time, Now the signature of the functions
> accepts tstamp_type to distinguish between mono and real time.
>
> Introduce a new function to set tstamp_type based on clockid.
>
> In future tstamp_type:1 can be extended to support userspace timestamp
> by increasing the bitfield.
>
> Link: https://lore.kernel.org/netdev/[email protected]/
> Signed-off-by: Abhishek Chauhan <[email protected]>
> +static inline void skb_set_tstamp_type_frm_clkid(struct sk_buff *skb,
> + ktime_t kt, clockid_t clockid)
> +{
Please don't garble words to save a few characters: .._from_clockid.
And this is essentially skb_set_delivery_type, just taking another
type. So skb_set_delivery_type_(by|from)_clockid.
Also, instead of reimplementing the same logic with a different
type, could implement as a conversion function that calls the main
function. It won't save lines. But will avoid duplicate logic that
needs to be kept in sync whenever there are future changes (fragile).
static inline void skb_set_delivery_type_by_clockid(struct sk_buff *skb,
ktime_t kt, clockid_t clockid)
{
u8 tstamp_type = SKB_CLOCK_REAL;
switch(clockid) {
case CLOCK_REALTIME:
break;
case CLOCK_MONOTONIC:
tstamp_type = SKB_CLOCK_MONO;
break;
default:
WARN_ON_ONCE(1);
kt = 0;
};
skb_set_delivery_type(skb, kt, tstamp_type);
}
> + skb->tstamp = kt;
> +
> + if (!kt) {
> + skb->tstamp_type = SKB_CLOCK_REALTIME;
> + return;
> + }
> +
> + switch (clockid) {
> + case CLOCK_REALTIME:
> + skb->tstamp_type = SKB_CLOCK_REALTIME;
> + break;
> + case CLOCK_MONOTONIC:
> + skb->tstamp_type = SKB_CLOCK_MONOTONIC;
> + break;
> + }
> +}
> +
> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
> - bool mono)
> + u8 tstamp_type)
Indentation change: error?
> @@ -9444,7 +9444,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
> *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
> - /* skb->tc_at_ingress && skb->mono_delivery_time,
> + /* skb->tc_at_ingress && skb->tstamp_type:1,
Is the :1 a stale comment after we discussed how to handle the 2-bit
field going forward? I.e., not by ignoring the second bit.
Abhishek Chauhan wrote:
> tstamp_type is now set based on actual clockid_t compressed
> into 2 bits.
>
> To make the design scalable for future needs this commit bring in
> the change to extend the tstamp_type:1 to tstamp_type:2 to support
> other clockid_t timestamp.
>
> We now support CLOCK_TAI as part of tstamp_type as part of this
> commit with exisiting support CLOCK_MONOTONIC and CLOCK_REALTIME.
>
> Link: https://lore.kernel.org/netdev/[email protected]/
> Signed-off-by: Abhishek Chauhan <[email protected]>
> ---
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index e464d0ebc9c1..3ad0de07d261 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -711,6 +711,8 @@ typedef unsigned char *sk_buff_data_t;
> enum skb_tstamp_type {
> SKB_CLOCK_REALTIME,
> SKB_CLOCK_MONOTONIC,
> + SKB_CLOCK_TAI,
> + __SKB_CLOCK_MAX = SKB_CLOCK_TAI,
> };
>
> /**
> @@ -831,8 +833,8 @@ enum skb_tstamp_type {
> * @decrypted: Decrypted SKB
> * @slow_gro: state present at GRO time, slower prepare step required
> * @tstamp_type: When set, skb->tstamp has the
> - * delivery_time in mono clock base Otherwise, the
> - * timestamp is considered real clock base.
> + * delivery_time in mono clock base or clock base of skb->tstamp.
drop "in mono clock base or "
> + * Otherwise, the timestamp is considered real clock base
drop this: whenever in realtime clock base, tstamp_type is zero, so
the above shorter statement always holds.
> * @napi_id: id of the NAPI struct this skb came from
> * @sender_cpu: (aka @napi_id) source CPU in XPS
> * @alloc_cpu: CPU which did the skb allocation.
> @@ -960,7 +962,7 @@ struct sk_buff {
> /* private: */
> __u8 __mono_tc_offset[0];
> /* public: */
> - __u8 tstamp_type:1; /* See skb_tstamp_type */
> + __u8 tstamp_type:2; /* See skb_tstamp_type */
> #ifdef CONFIG_NET_XGRESS
> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
> __u8 tc_skip_classify:1;
> @@ -1090,15 +1092,17 @@ struct sk_buff {
> #endif
> #define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)
>
> -/* if you move tc_at_ingress or mono_delivery_time
> +/* if you move tc_at_ingress or tstamp_type:2
> * around, you also must adapt these constants.
> */
> #ifdef __BIG_ENDIAN_BITFIELD
> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 7)
> -#define TC_AT_INGRESS_MASK (1 << 6)
> +#define SKB_TSTAMP_TYPE_MASK (3 << 6)
> +#define SKB_TSTAMP_TYPE_RSH (6)
> +#define TC_AT_INGRESS_RSH (5)
I had to find BPF_RSH to understand this abbreviation.
use SHIFT instead of RSH, as that is so domain specific?
> +#define TC_AT_INGRESS_MASK (1 << 5)
> #else
> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 0)
> -#define TC_AT_INGRESS_MASK (1 << 1)
> +#define SKB_TSTAMP_TYPE_MASK (3)
> +#define TC_AT_INGRESS_MASK (1 << 2)
> #endif
> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset)
>
> - if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
> + if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO ||
> + skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI) {
Peculiar indentation?
Just FYI that I'm not the best person to review the BPF part.
Thankfully Martin is helping you with that.
On 4/25/2024 7:31 AM, Willem de Bruijn wrote:
> Abhishek Chauhan wrote:
>> mono_delivery_time was added to check if skb->tstamp has delivery
>> time in mono clock base (i.e. EDT) otherwise skb->tstamp has
>> timestamp in ingress and delivery_time at egress.
>>
>> Renaming the bitfield from mono_delivery_time to tstamp_type is for
>> extensibilty for other timestamps such as userspace timestamp
>> (i.e. SO_TXTIME) set via sock opts.
>>
>> As we are renaming the mono_delivery_time to tstamp_type, it makes
>> sense to start assigning tstamp_type based on enum defined
>> in this commit.
>>
>> Earlier we used bool arg flag to check if the tstamp is mono in
>> function skb_set_delivery_time, Now the signature of the functions
>> accepts tstamp_type to distinguish between mono and real time.
>>
>> Introduce a new function to set tstamp_type based on clockid.
>>
>> In future tstamp_type:1 can be extended to support userspace timestamp
>> by increasing the bitfield.
>>
>> Link: https://lore.kernel.org/netdev/[email protected]/
>> Signed-off-by: Abhishek Chauhan <[email protected]>
>
>> +static inline void skb_set_tstamp_type_frm_clkid(struct sk_buff *skb,
>> + ktime_t kt, clockid_t clockid)
>> +{
>
> Please don't garble words to save a few characters: .._from_clockid.
>
Noted and apologies for using garble words here. I will correct it.
> And this is essentially skb_set_delivery_type, just taking another
> type. So skb_set_delivery_type_(by|from)_clockid.
>
> Also, instead of reimplementing the same logic with a different
> type, could implement as a conversion function that calls the main
> function. It won't save lines. But will avoid duplicate logic that
> needs to be kept in sync whenever there are future changes (fragile).
>
I thought about doing this but as you remember that some places in the network stack,
we are passing tstamp_type and some places we are passing clockid.
So in the previous patchset we decided with two variants.
1. One that assigns the tstamp_type directly
2. Other one which computes tstamp_type based on clockid
But i agree on the above comment that if we implement two different variants
then in future it requires maintenance to both functions.
I will make sure i handle both cases in one func.
> static inline void skb_set_delivery_type_by_clockid(struct sk_buff *skb,
> ktime_t kt, clockid_t clockid)
> {
> u8 tstamp_type = SKB_CLOCK_REAL;
>
> switch(clockid) {
> case CLOCK_REALTIME:
> break;
> case CLOCK_MONOTONIC:
> tstamp_type = SKB_CLOCK_MONO;
> break;
> default:
> WARN_ON_ONCE(1);
> kt = 0;
> };
>
> skb_set_delivery_type(skb, kt, tstamp_type);
> }
>
>
>> + skb->tstamp = kt;
>> +
>> + if (!kt) {
>> + skb->tstamp_type = SKB_CLOCK_REALTIME;
>> + return;
>> + }
>> +
>> + switch (clockid) {
>> + case CLOCK_REALTIME:
>> + skb->tstamp_type = SKB_CLOCK_REALTIME;
>> + break;
>> + case CLOCK_MONOTONIC:
>> + skb->tstamp_type = SKB_CLOCK_MONOTONIC;
>> + break;
>> + }
>> +}
>> +
>> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>> - bool mono)
>> + u8 tstamp_type)
>
> Indentation change: error?
>>> @@ -9444,7 +9444,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
>> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
>> *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
>> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
>> - /* skb->tc_at_ingress && skb->mono_delivery_time,
>> + /* skb->tc_at_ingress && skb->tstamp_type:1,
>
> Is the :1 a stale comment after we discussed how to handle the 2-bit
This is first patch which does not add tstamp_type:2 at the moment.
This series is divided into two patches
1. One patchset => Just rename (So the comment is still skb->tstamp_type:1)
2. Second patchset => add another bit (comment is changed to skb->tstamp_type:2)
> field going forward? I.e., not by ignoring the second bit.
>
>
On 4/25/2024 7:42 AM, Willem de Bruijn wrote:
> Abhishek Chauhan wrote:
>> tstamp_type is now set based on actual clockid_t compressed
>> into 2 bits.
>>
>> To make the design scalable for future needs this commit bring in
>> the change to extend the tstamp_type:1 to tstamp_type:2 to support
>> other clockid_t timestamp.
>>
>> We now support CLOCK_TAI as part of tstamp_type as part of this
>> commit with exisiting support CLOCK_MONOTONIC and CLOCK_REALTIME.
>>
>> Link: https://lore.kernel.org/netdev/[email protected]/
>> Signed-off-by: Abhishek Chauhan <[email protected]>
>> ---
>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index e464d0ebc9c1..3ad0de07d261 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -711,6 +711,8 @@ typedef unsigned char *sk_buff_data_t;
>> enum skb_tstamp_type {
>> SKB_CLOCK_REALTIME,
>> SKB_CLOCK_MONOTONIC,
>> + SKB_CLOCK_TAI,
>> + __SKB_CLOCK_MAX = SKB_CLOCK_TAI,
>> };
>>
>> /**
>> @@ -831,8 +833,8 @@ enum skb_tstamp_type {
>> * @decrypted: Decrypted SKB
>> * @slow_gro: state present at GRO time, slower prepare step required
>> * @tstamp_type: When set, skb->tstamp has the
>> - * delivery_time in mono clock base Otherwise, the
>> - * timestamp is considered real clock base.
>> + * delivery_time in mono clock base or clock base of skb->tstamp.
>
> drop "in mono clock base or "
>
Noted
>> + * Otherwise, the timestamp is considered real clock base
>
Noted
> drop this: whenever in realtime clock base, tstamp_type is zero, so
> the above shorter statement always holds.
>
>> * @napi_id: id of the NAPI struct this skb came from
>> * @sender_cpu: (aka @napi_id) source CPU in XPS
>> * @alloc_cpu: CPU which did the skb allocation.
>> @@ -960,7 +962,7 @@ struct sk_buff {
>> /* private: */
>> __u8 __mono_tc_offset[0];
>> /* public: */
>> - __u8 tstamp_type:1; /* See skb_tstamp_type */
>> + __u8 tstamp_type:2; /* See skb_tstamp_type */
>> #ifdef CONFIG_NET_XGRESS
>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
>> __u8 tc_skip_classify:1;
>> @@ -1090,15 +1092,17 @@ struct sk_buff {
>> #endif
>> #define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)
>>
>> -/* if you move tc_at_ingress or mono_delivery_time
>> +/* if you move tc_at_ingress or tstamp_type:2
>> * around, you also must adapt these constants.
>> */
>> #ifdef __BIG_ENDIAN_BITFIELD
>> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 7)
>> -#define TC_AT_INGRESS_MASK (1 << 6)
>> +#define SKB_TSTAMP_TYPE_MASK (3 << 6)
>> +#define SKB_TSTAMP_TYPE_RSH (6)
>> +#define TC_AT_INGRESS_RSH (5)
>
> I had to find BPF_RSH to understand this abbreviation.
>
> use SHIFT instead of RSH, as that is so domain specific?
>
Noted! I will use complete words instead of abbreviations
>> +#define TC_AT_INGRESS_MASK (1 << 5)
>> #else
>> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 0)
>> -#define TC_AT_INGRESS_MASK (1 << 1)
>> +#define SKB_TSTAMP_TYPE_MASK (3)
>> +#define TC_AT_INGRESS_MASK (1 << 2)
>> #endif
>> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset)
>>
>
>> - if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
>> + if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO ||
>> + skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI) {
>
> Peculiar indentation?
>
Let me check why the indentation here is messed up. Ideally i run checkpatch(shows 0 errors or warnings)
and also check before raising a patch. Internally it looks good but on the patch it shows differently.
> Just FYI that I'm not the best person to review the BPF part.
> Thankfully Martin is helping you with that.
>
I will wait for comments from Martin as well.
>
>
On 4/25/24 12:02 PM, Abhishek Chauhan (ABC) wrote:
>>>> @@ -9444,7 +9444,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
>>> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
>>> *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
>>> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
>>> - /* skb->tc_at_ingress && skb->mono_delivery_time,
>>> + /* skb->tc_at_ingress && skb->tstamp_type:1,
>> Is the :1 a stale comment after we discussed how to handle the 2-bit
> This is first patch which does not add tstamp_type:2 at the moment.
> This series is divided into two patches
> 1. One patchset => Just rename (So the comment is still skb->tstamp_type:1)
> 2. Second patchset => add another bit (comment is changed to skb->tstamp_type:2)
I would suggest to completely avoid the ":1" or ":2" part in patch 1. Just use
"... && skb->tstamp_type". The number of bits does not matter. The tstamp_type
will still be considered as a whole even if it would become 3 bits (unlikely) in
the future.
On 4/25/2024 4:50 PM, Martin KaFai Lau wrote:
> On 4/25/24 12:02 PM, Abhishek Chauhan (ABC) wrote:
>>>>> @@ -9444,7 +9444,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
>>>> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
>>>> *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
>>>> TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
>>>> - /* skb->tc_at_ingress && skb->mono_delivery_time,
>>>> + /* skb->tc_at_ingress && skb->tstamp_type:1,
>>> Is the :1 a stale comment after we discussed how to handle the 2-bit
>> This is first patch which does not add tstamp_type:2 at the moment.
>> This series is divided into two patches
>> 1. One patchset => Just rename (So the comment is still skb->tstamp_type:1)
>> 2. Second patchset => add another bit (comment is changed to skb->tstamp_type:2)
>
> I would suggest to completely avoid the ":1" or ":2" part in patch 1. Just use "... && skb->tstamp_type". The number of bits does not matter. The tstamp_type will still be considered as a whole even if it would become 3 bits (unlikely) in the future.
Okay i will just keep it as skb->tstamp_type instead of adding bitfields.
On 4/24/24 3:20 PM, Abhishek Chauhan wrote:
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index e464d0ebc9c1..3ad0de07d261 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -711,6 +711,8 @@ typedef unsigned char *sk_buff_data_t;
> enum skb_tstamp_type {
> SKB_CLOCK_REALTIME,
> SKB_CLOCK_MONOTONIC,
> + SKB_CLOCK_TAI,
> + __SKB_CLOCK_MAX = SKB_CLOCK_TAI,
> };
>
> /**
> @@ -831,8 +833,8 @@ enum skb_tstamp_type {
> * @decrypted: Decrypted SKB
> * @slow_gro: state present at GRO time, slower prepare step required
> * @tstamp_type: When set, skb->tstamp has the
> - * delivery_time in mono clock base Otherwise, the
> - * timestamp is considered real clock base.
> + * delivery_time in mono clock base or clock base of skb->tstamp.
> + * Otherwise, the timestamp is considered real clock base
> * @napi_id: id of the NAPI struct this skb came from
> * @sender_cpu: (aka @napi_id) source CPU in XPS
> * @alloc_cpu: CPU which did the skb allocation.
> @@ -960,7 +962,7 @@ struct sk_buff {
> /* private: */
> __u8 __mono_tc_offset[0];
> /* public: */
> - __u8 tstamp_type:1; /* See skb_tstamp_type */
> + __u8 tstamp_type:2; /* See skb_tstamp_type */
> #ifdef CONFIG_NET_XGRESS
> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
> __u8 tc_skip_classify:1;
> @@ -1090,15 +1092,17 @@ struct sk_buff {
> #endif
> #define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)
>
> -/* if you move tc_at_ingress or mono_delivery_time
> +/* if you move tc_at_ingress or tstamp_type:2
> * around, you also must adapt these constants.
> */
> #ifdef __BIG_ENDIAN_BITFIELD
> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 7)
> -#define TC_AT_INGRESS_MASK (1 << 6)
> +#define SKB_TSTAMP_TYPE_MASK (3 << 6)
> +#define SKB_TSTAMP_TYPE_RSH (6)
> +#define TC_AT_INGRESS_RSH (5)
TC_AT_INGRESS_RSH is not used.
> +#define TC_AT_INGRESS_MASK (1 << 5)
> #else
> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 0)
> -#define TC_AT_INGRESS_MASK (1 << 1)
> +#define SKB_TSTAMP_TYPE_MASK (3)
> +#define TC_AT_INGRESS_MASK (1 << 2)
> #endif
> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset)
>
> @@ -4204,6 +4208,12 @@ static inline void skb_set_tstamp_type_frm_clkid(struct sk_buff *skb,
> case CLOCK_MONOTONIC:
> skb->tstamp_type = SKB_CLOCK_MONOTONIC;
> break;
> + case CLOCK_TAI:
> + skb->tstamp_type = SKB_CLOCK_TAI;
> + break;
> + default:
> + WARN_ONCE(true, "clockid %d not supported", clockid);
> + skb->tstamp_type = SKB_CLOCK_REALTIME;
> }
> }
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index cee0a7915c08..1376ed5ece10 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
The bpf.h needs to be sync to tools/include/uapi/linux/bpf.h.
Otherwise, the bpf CI cannot compile the tests:
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
Please monitor the bpf CI test result after submitting the patches.
> @@ -6209,6 +6209,7 @@ union { \
> enum {
> BPF_SKB_TSTAMP_UNSPEC,
> BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
> + BPF_SKB_TSTAMP_DELIVERY_TAI, /* tstamp has tai delivery time */
> /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
> * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
> * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
SKB_CLOCK_TAI is properly defined as an enum now and there is a
WARN for clock other than REAL, MONO, and TAI. I think it is
time to remove UNSPEC and give it back the proper name REALTIME.
I want to take this chance to do some renaming:
/* The enum used in skb->tstamp_type. It specifies the clock type
* of the time stored in the skb->tstamp.
*/
enum {
BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
BPF_SKB_CLOCK_REALTIME = 0, /* Realtime clock */
BPF_SKB_CLOCK_MONOTONIC = 1, /* Monotonic clock */
BPF_SKB_CLOCK_TAI = 2, /* TAI clock */
/* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
* the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
*/
};
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 957c2fc724eb..c67622f4fe98 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -7733,6 +7733,12 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
> skb->tstamp = tstamp;
> skb->tstamp_type = SKB_CLOCK_MONOTONIC;
> break;
> + case BPF_SKB_TSTAMP_DELIVERY_TAI:
> + if (!tstamp)
> + return -EINVAL;
> + skb->tstamp = tstamp;
> + skb->tstamp_type = SKB_CLOCK_TAI;
> + break;
> case BPF_SKB_TSTAMP_UNSPEC:
> if (tstamp)
Allow to store any realtime tstamp here since BPF_SKB_TSTAMP_UNSPEC
becomes BPF_SKB_CLOCK_REALTIME.
Like:
BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
u64, tstamp, u32, tstamp_type)
{
/* ... */
case BPF_SKB_CLOCK_TAI:
if (!tstamp)
return -EINVAL;
skb->tstamp = tstamp;
skb->tstamp_type = SKB_CLOCK_TAI;
break;
case BPF_SKB_CLOCK_REALTIME:
skb->tstamp = tstamp;
skb->tstamp_type = SKB_CLOCK_REALTIME;
break;
/* ... */
}
> return -EINVAL;
> @@ -9388,17 +9394,17 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
> {
> __u8 value_reg = si->dst_reg;
> __u8 skb_reg = si->src_reg;
> - /* AX is needed because src_reg and dst_reg could be the same */
> - __u8 tmp_reg = BPF_REG_AX;
> -
> - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
> - SKB_BF_MONO_TC_OFFSET);
> - *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
> - SKB_MONO_DELIVERY_TIME_MASK, 2);
> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
> - *insn++ = BPF_JMP_A(1);
> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
> -
> + BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI);
Add these also:
BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != BPF_SKB_CLOCK_MONOTONIC);
BUILD_BUG_ON(SKB_CLOCK_TAI != BPF_SKB_CLOCK_TAI);
> + *insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
> + *insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK);
> +#ifdef __BIG_ENDIAN_BITFIELD
> + *insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSH);
> +#else
> + BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1));
> +#endif
> + *insn++ = BPF_JMP32_IMM(BPF_JNE, value_reg, SKB_TSTAMP_TYPE_MASK, 1);
> + /* Both the bits set then mark it BPF_SKB_TSTAMP_UNSPEC */
> + *insn++ = BPF_MOV64_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
The kernel should not have both bits set in skb->tstamp_type. No need to
add two extra bpf insns to check this. If there is a bug in the kernel,
it is better to be uncovered instead of hiding it under BPF_SKB_TSTAMP_UNSPEC (which
is renamed to BPF_SKB_CLOCK_REALTIME anyway).
Hence, the last two bpf insns should be removed.
> return insn;
> }
>
> @@ -9430,6 +9436,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
> __u8 value_reg = si->dst_reg;
> __u8 skb_reg = si->src_reg;
>
> +BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI);
It is a dup of the one in bpf_convert_tstamp_type_read and can be removed.
> #ifdef CONFIG_NET_XGRESS
> /* If the tstamp_type is read,
> * the bpf prog is aware the tstamp could have delivery time.
> @@ -9440,11 +9447,12 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
> __u8 tmp_reg = BPF_REG_AX;
>
> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
> - *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
> - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
> - /* skb->tc_at_ingress && skb->tstamp_type:1,
> + /* check if ingress mask bits is set */
> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
> + *insn++ = BPF_JMP_A(4);
> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, SKB_TSTAMP_TYPE_MASK, 1);
> + *insn++ = BPF_JMP_A(2);
> + /* skb->tc_at_ingress && skb->tstamp_type:2,
> * read 0 as the (rcv) timestamp.
> */
> *insn++ = BPF_MOV64_IMM(value_reg, 0);
> @@ -9469,7 +9477,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
> * the bpf prog is aware the tstamp could have delivery time.
> * Thus, write skb->tstamp as is if tstamp_type_access is true.
> * Otherwise, writing at ingress will have to clear the
> - * mono_delivery_time (skb->tstamp_type:1)bit also.
> + * mono_delivery_time (skb->tstamp_type:2)bit also.
> */
> if (!prog->tstamp_type_access) {
> __u8 tmp_reg = BPF_REG_AX;
> @@ -9479,8 +9487,8 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
> *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
> /* goto <store> */
> *insn++ = BPF_JMP_A(2);
> - /* <clear>: mono_delivery_time or (skb->tstamp_type:1) */
> - *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK);
> + /* <clear>: skb->tstamp_type:2 */
> + *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_TSTAMP_TYPE_MASK);
> *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET);
> }
> #endif
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 591226dcde26..f195b31d6e75 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>
> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
> skb->mark = cork->mark;
> - skb->tstamp = cork->transmit_time;
> + skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
hmm... I think this will break for tcp. This sequence in particular:
tcp_v4_timewait_ack()
tcp_v4_send_ack()
ip_send_unicast_reply()
ip_push_pending_frames()
ip_finish_skb()
__ip_make_skb()
/* sk_clockid is REAL but cork->transmit_time should be in mono */
skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);;
I think I hit it from time to time when running the test in this patch set.
[ ... ]
> diff --git a/tools/testing/selftests/bpf/progs/test_tc_dtime.c b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
> index 74ec09f040b7..19dba6d88265 100644
> --- a/tools/testing/selftests/bpf/progs/test_tc_dtime.c
> +++ b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
Please separate the selftests/bpf changes into another patch.
> @@ -227,6 +227,12 @@ int egress_host(struct __sk_buff *skb)
> inc_dtimes(EGRESS_ENDHOST);
> else
> inc_errs(EGRESS_ENDHOST);
> + } else if (skb_proto(skb_type) == IPPROTO_UDP) {
> + if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI &&
> + skb->tstamp)
> + inc_dtimes(EGRESS_ENDHOST);
> + else
> + inc_errs(EGRESS_ENDHOST);
> } else {
> if (skb->tstamp_type == BPF_SKB_TSTAMP_UNSPEC &&
> skb->tstamp)
> @@ -255,6 +261,9 @@ int ingress_host(struct __sk_buff *skb)
> if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO &&
> skb->tstamp == EGRESS_FWDNS_MAGIC)
> inc_dtimes(INGRESS_ENDHOST);
> + else if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI &&
> + skb->tstamp == EGRESS_FWDNS_MAGIC)
> + inc_dtimes(INGRESS_ENDHOST);
> else
> inc_errs(INGRESS_ENDHOST);
>
> @@ -323,12 +332,14 @@ int ingress_fwdns_prio101(struct __sk_buff *skb)
> /* Should have handled in prio100 */
> return TC_ACT_SHOT;
>
> - if (skb_proto(skb_type) == IPPROTO_UDP)
> + if (skb_proto(skb_type) == IPPROTO_UDP &&
> + skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_TAI)
> expected_dtime = 0;
The IPPROTO_UDP check and expected_dtime can be removed. The UDP test
can expect the same EGRESS_ENDHOST_MAGIC in the skb->tstamp since
the TAI tstamp is also forwarded from egress to ingress now.
>
> if (skb->tstamp_type) {
> if (fwdns_clear_dtime() ||
> - skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
> + (skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO &&
> + skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_TAI) ||
> skb->tstamp != expected_dtime)
> inc_errs(INGRESS_FWDNS_P101);
> else
> @@ -338,7 +349,8 @@ int ingress_fwdns_prio101(struct __sk_buff *skb)
> inc_errs(INGRESS_FWDNS_P101);
> }
>
> - if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
> + if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO ||
> + skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI) {
No need to check BPF_SKB_TSTAMP_DELIVERY_TAI such that the
bpf_skb_set_tstamp() helper can still be tested.
There are some other minor changes needed for the test_tc_dtime.c and
the tc_redirect.c. I quickly made the changes and put them here (first patch):
https://git.kernel.org/pub/scm/linux/kernel/git/martin.lau/bpf-next.git/log/?h=skb.tstamp_type
> skb->tstamp = INGRESS_FWDNS_MAGIC;
> } else {
> if (bpf_skb_set_tstamp(skb, INGRESS_FWDNS_MAGIC,
> @@ -370,7 +382,8 @@ int egress_fwdns_prio101(struct __sk_buff *skb)
>
> if (skb->tstamp_type) {
> if (fwdns_clear_dtime() ||
> - skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
> + (skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO &&
> + skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_TAI) ||
> skb->tstamp != INGRESS_FWDNS_MAGIC)
> inc_errs(EGRESS_FWDNS_P101);
> else
> @@ -380,7 +393,8 @@ int egress_fwdns_prio101(struct __sk_buff *skb)
> inc_errs(EGRESS_FWDNS_P101);
> }
>
> - if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
> + if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO ||
> + skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI) {
> skb->tstamp = EGRESS_FWDNS_MAGIC;
> } else {
> if (bpf_skb_set_tstamp(skb, EGRESS_FWDNS_MAGIC,
On 4/25/2024 9:37 PM, Martin KaFai Lau wrote:
> On 4/24/24 3:20 PM, Abhishek Chauhan wrote:
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index e464d0ebc9c1..3ad0de07d261 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -711,6 +711,8 @@ typedef unsigned char *sk_buff_data_t;
>> enum skb_tstamp_type {
>> SKB_CLOCK_REALTIME,
>> SKB_CLOCK_MONOTONIC,
>> + SKB_CLOCK_TAI,
>> + __SKB_CLOCK_MAX = SKB_CLOCK_TAI,
>> };
>> /**
>> @@ -831,8 +833,8 @@ enum skb_tstamp_type {
>> * @decrypted: Decrypted SKB
>> * @slow_gro: state present at GRO time, slower prepare step required
>> * @tstamp_type: When set, skb->tstamp has the
>> - * delivery_time in mono clock base Otherwise, the
>> - * timestamp is considered real clock base.
>> + * delivery_time in mono clock base or clock base of skb->tstamp.
>> + * Otherwise, the timestamp is considered real clock base
>> * @napi_id: id of the NAPI struct this skb came from
>> * @sender_cpu: (aka @napi_id) source CPU in XPS
>> * @alloc_cpu: CPU which did the skb allocation.
>> @@ -960,7 +962,7 @@ struct sk_buff {
>> /* private: */
>> __u8 __mono_tc_offset[0];
>> /* public: */
>> - __u8 tstamp_type:1; /* See skb_tstamp_type */
>> + __u8 tstamp_type:2; /* See skb_tstamp_type */
>> #ifdef CONFIG_NET_XGRESS
>> __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */
>> __u8 tc_skip_classify:1;
>> @@ -1090,15 +1092,17 @@ struct sk_buff {
>> #endif
>> #define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset)
>> -/* if you move tc_at_ingress or mono_delivery_time
>> +/* if you move tc_at_ingress or tstamp_type:2
>> * around, you also must adapt these constants.
>> */
>> #ifdef __BIG_ENDIAN_BITFIELD
>> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 7)
>> -#define TC_AT_INGRESS_MASK (1 << 6)
>> +#define SKB_TSTAMP_TYPE_MASK (3 << 6)
>> +#define SKB_TSTAMP_TYPE_RSH (6)
>> +#define TC_AT_INGRESS_RSH (5)
>
> TC_AT_INGRESS_RSH is not used.
>
Noted. I will remove it.
>> +#define TC_AT_INGRESS_MASK (1 << 5)
>> #else
>> -#define SKB_MONO_DELIVERY_TIME_MASK (1 << 0)
>> -#define TC_AT_INGRESS_MASK (1 << 1)
>> +#define SKB_TSTAMP_TYPE_MASK (3)
>> +#define TC_AT_INGRESS_MASK (1 << 2)
>> #endif
>> #define SKB_BF_MONO_TC_OFFSET offsetof(struct sk_buff, __mono_tc_offset)
>> @@ -4204,6 +4208,12 @@ static inline void skb_set_tstamp_type_frm_clkid(struct sk_buff *skb,
>> case CLOCK_MONOTONIC:
>> skb->tstamp_type = SKB_CLOCK_MONOTONIC;
>> break;
>> + case CLOCK_TAI:
>> + skb->tstamp_type = SKB_CLOCK_TAI;
>> + break;
>> + default:
>> + WARN_ONCE(true, "clockid %d not supported", clockid);
>> + skb->tstamp_type = SKB_CLOCK_REALTIME;
>> }
>> }
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index cee0a7915c08..1376ed5ece10 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>
> The bpf.h needs to be sync to tools/include/uapi/linux/bpf.h.
Oh i see. There is a bpf.h in tools as well . I was not aware of this part. I will further check this.
> Otherwise, the bpf CI cannot compile the tests:
>
> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
>
> Please monitor the bpf CI test result after submitting the patches.
>
>> @@ -6209,6 +6209,7 @@ union { \
>> enum {
>> BPF_SKB_TSTAMP_UNSPEC,
>> BPF_SKB_TSTAMP_DELIVERY_MONO, /* tstamp has mono delivery time */
>> + BPF_SKB_TSTAMP_DELIVERY_TAI, /* tstamp has tai delivery time */
>> /* For any BPF_SKB_TSTAMP_* that the bpf prog cannot handle,
>> * the bpf prog should handle it like BPF_SKB_TSTAMP_UNSPEC
>> * and try to deduce it by ingress, egress or skb->sk->sk_clockid.
>
> SKB_CLOCK_TAI is properly defined as an enum now and there is a
> WARN for clock other than REAL, MONO, and TAI. I think it is
> time to remove UNSPEC and give it back the proper name REALTIME.
>
> I want to take this chance to do some renaming:
>
> /* The enum used in skb->tstamp_type. It specifies the clock type
> * of the time stored in the skb->tstamp.
> */
> enum {
> BPF_SKB_TSTAMP_UNSPEC = 0, /* DEPRECATED */
> BPF_SKB_TSTAMP_DELIVERY_MONO = 1, /* DEPRECATED */
> BPF_SKB_CLOCK_REALTIME = 0, /* Realtime clock */
> BPF_SKB_CLOCK_MONOTONIC = 1, /* Monotonic clock */
> BPF_SKB_CLOCK_TAI = 2, /* TAI clock */
> /* For any future BPF_SKB_CLOCK_* that the bpf prog cannot handle,
> * the bpf prog can try to deduce it by ingress/egress/skb->sk->sk_clockid.
> */
> };
>
Okay let me evalute this and get back
>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 957c2fc724eb..c67622f4fe98 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -7733,6 +7733,12 @@ BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
>> skb->tstamp = tstamp;
>> skb->tstamp_type = SKB_CLOCK_MONOTONIC;
>> break;
>> + case BPF_SKB_TSTAMP_DELIVERY_TAI:
>> + if (!tstamp)
>> + return -EINVAL;
>> + skb->tstamp = tstamp;
>> + skb->tstamp_type = SKB_CLOCK_TAI;
>> + break;
>> case BPF_SKB_TSTAMP_UNSPEC:
>> if (tstamp)
>
> Allow to store any realtime tstamp here since BPF_SKB_TSTAMP_UNSPEC
> becomes BPF_SKB_CLOCK_REALTIME.
>
> Like:
>
> BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
> u64, tstamp, u32, tstamp_type)
> {
> /* ... */
> case BPF_SKB_CLOCK_TAI:
> if (!tstamp)
> return -EINVAL;
> skb->tstamp = tstamp;
> skb->tstamp_type = SKB_CLOCK_TAI;
> break;
> case BPF_SKB_CLOCK_REALTIME:
> skb->tstamp = tstamp;
> skb->tstamp_type = SKB_CLOCK_REALTIME;
> break;
>
> /* ... */
> }
Noted!
>
>> return -EINVAL;
>
>> @@ -9388,17 +9394,17 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
>> {
>> __u8 value_reg = si->dst_reg;
>> __u8 skb_reg = si->src_reg;
>> - /* AX is needed because src_reg and dst_reg could be the same */
>> - __u8 tmp_reg = BPF_REG_AX;
>> -
>> - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
>> - SKB_BF_MONO_TC_OFFSET);
>> - *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
>> - SKB_MONO_DELIVERY_TIME_MASK, 2);
>> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
>> - *insn++ = BPF_JMP_A(1);
>> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
>> -
>> + BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI);
>
> Add these also:
>
Noted!
> BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
> BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != BPF_SKB_CLOCK_MONOTONIC);
> BUILD_BUG_ON(SKB_CLOCK_TAI != BPF_SKB_CLOCK_TAI);
>
>> + *insn++ = BPF_LDX_MEM(BPF_B, value_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
>> + *insn++ = BPF_ALU32_IMM(BPF_AND, value_reg, SKB_TSTAMP_TYPE_MASK);
>> +#ifdef __BIG_ENDIAN_BITFIELD
>> + *insn++ = BPF_ALU32_IMM(BPF_RSH, value_reg, SKB_TSTAMP_TYPE_RSH);
>> +#else
>> + BUILD_BUG_ON(!(SKB_TSTAMP_TYPE_MASK & 0x1));
>> +#endif
>> + *insn++ = BPF_JMP32_IMM(BPF_JNE, value_reg, SKB_TSTAMP_TYPE_MASK, 1);
>> + /* Both the bits set then mark it BPF_SKB_TSTAMP_UNSPEC */
>> + *insn++ = BPF_MOV64_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
>
> The kernel should not have both bits set in skb->tstamp_type. No need to
> add two extra bpf insns to check this. If there is a bug in the kernel,
> it is better to be uncovered instead of hiding it under BPF_SKB_TSTAMP_UNSPEC (which
> is renamed to BPF_SKB_CLOCK_REALTIME anyway).
> Hence, the last two bpf insns should be removed.
>
Got it !.
>> return insn;
>> }
>> @@ -9430,6 +9436,7 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
>> __u8 value_reg = si->dst_reg;
>> __u8 skb_reg = si->src_reg;
>> +BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI);
>
> It is a dup of the one in bpf_convert_tstamp_type_read and can be removed.
>
Noted!
>> #ifdef CONFIG_NET_XGRESS
>> /* If the tstamp_type is read,
>> * the bpf prog is aware the tstamp could have delivery time.
>> @@ -9440,11 +9447,12 @@ static struct bpf_insn *bpf_convert_tstamp_read(const struct bpf_prog *prog,
>> __u8 tmp_reg = BPF_REG_AX;
>> *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg, SKB_BF_MONO_TC_OFFSET);
>> - *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg,
>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK);
>> - *insn++ = BPF_JMP32_IMM(BPF_JNE, tmp_reg,
>> - TC_AT_INGRESS_MASK | SKB_MONO_DELIVERY_TIME_MASK, 2);
>> - /* skb->tc_at_ingress && skb->tstamp_type:1,
>> + /* check if ingress mask bits is set */
>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
>> + *insn++ = BPF_JMP_A(4);
>> + *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, SKB_TSTAMP_TYPE_MASK, 1);
>> + *insn++ = BPF_JMP_A(2);
>> + /* skb->tc_at_ingress && skb->tstamp_type:2,
>> * read 0 as the (rcv) timestamp.
>> */
>> *insn++ = BPF_MOV64_IMM(value_reg, 0);
>> @@ -9469,7 +9477,7 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
>> * the bpf prog is aware the tstamp could have delivery time.
>> * Thus, write skb->tstamp as is if tstamp_type_access is true.
>> * Otherwise, writing at ingress will have to clear the
>> - * mono_delivery_time (skb->tstamp_type:1)bit also.
>> + * mono_delivery_time (skb->tstamp_type:2)bit also.
>> */
>> if (!prog->tstamp_type_access) {
>> __u8 tmp_reg = BPF_REG_AX;
>> @@ -9479,8 +9487,8 @@ static struct bpf_insn *bpf_convert_tstamp_write(const struct bpf_prog *prog,
>> *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg, TC_AT_INGRESS_MASK, 1);
>> /* goto <store> */
>> *insn++ = BPF_JMP_A(2);
>> - /* <clear>: mono_delivery_time or (skb->tstamp_type:1) */
>> - *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_MONO_DELIVERY_TIME_MASK);
>> + /* <clear>: skb->tstamp_type:2 */
>> + *insn++ = BPF_ALU32_IMM(BPF_AND, tmp_reg, ~SKB_TSTAMP_TYPE_MASK);
>> *insn++ = BPF_STX_MEM(BPF_B, skb_reg, tmp_reg, SKB_BF_MONO_TC_OFFSET);
>> }
>> #endif
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index 591226dcde26..f195b31d6e75 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>> skb->mark = cork->mark;
>> - skb->tstamp = cork->transmit_time;
>> + skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
>
> hmm... I think this will break for tcp. This sequence in particular:
>
> tcp_v4_timewait_ack()
> tcp_v4_send_ack()
> ip_send_unicast_reply()
> ip_push_pending_frames()
> ip_finish_skb()
> __ip_make_skb()
> /* sk_clockid is REAL but cork->transmit_time should be in mono */
> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);;
>
> I think I hit it from time to time when running the test in this patch set.
>
do you think i need to check for protocol type here . since tcp uses Mono and the rest according to the new design is based on
sk->sk_clockid
if (iph->protocol == IPPROTO_TCP)
skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, CLOCK_MONOTONIC);
else
skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
> [ ... ]
>
>> diff --git a/tools/testing/selftests/bpf/progs/test_tc_dtime.c b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
>> index 74ec09f040b7..19dba6d88265 100644
>> --- a/tools/testing/selftests/bpf/progs/test_tc_dtime.c
>> +++ b/tools/testing/selftests/bpf/progs/test_tc_dtime.c
>
> Please separate the selftests/bpf changes into another patch.
>
I will do that.
>> @@ -227,6 +227,12 @@ int egress_host(struct __sk_buff *skb)
>> inc_dtimes(EGRESS_ENDHOST);
>> else
>> inc_errs(EGRESS_ENDHOST);
>> + } else if (skb_proto(skb_type) == IPPROTO_UDP) {
>> + if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI &&
>> + skb->tstamp)
>> + inc_dtimes(EGRESS_ENDHOST);
>> + else
>> + inc_errs(EGRESS_ENDHOST);
>> } else {
>> if (skb->tstamp_type == BPF_SKB_TSTAMP_UNSPEC &&
>> skb->tstamp)
>> @@ -255,6 +261,9 @@ int ingress_host(struct __sk_buff *skb)
>> if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO &&
>> skb->tstamp == EGRESS_FWDNS_MAGIC)
>> inc_dtimes(INGRESS_ENDHOST);
>> + else if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI &&
>> + skb->tstamp == EGRESS_FWDNS_MAGIC)
>> + inc_dtimes(INGRESS_ENDHOST);
>> else
>> inc_errs(INGRESS_ENDHOST);
>> @@ -323,12 +332,14 @@ int ingress_fwdns_prio101(struct __sk_buff *skb)
>> /* Should have handled in prio100 */
>> return TC_ACT_SHOT;
>> - if (skb_proto(skb_type) == IPPROTO_UDP)
>> + if (skb_proto(skb_type) == IPPROTO_UDP &&
>> + skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_TAI)
>> expected_dtime = 0;
>
> The IPPROTO_UDP check and expected_dtime can be removed. The UDP test
> can expect the same EGRESS_ENDHOST_MAGIC in the skb->tstamp since
> the TAI tstamp is also forwarded from egress to ingress now.
>
>> if (skb->tstamp_type) {
>> if (fwdns_clear_dtime() ||
>> - skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
>> + (skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO &&
>> + skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_TAI) ||
>> skb->tstamp != expected_dtime)
>> inc_errs(INGRESS_FWDNS_P101);
>> else
>> @@ -338,7 +349,8 @@ int ingress_fwdns_prio101(struct __sk_buff *skb)
>> inc_errs(INGRESS_FWDNS_P101);
>> }
>> - if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
>> + if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO ||
>> + skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI) {
>
> No need to check BPF_SKB_TSTAMP_DELIVERY_TAI such that the
> bpf_skb_set_tstamp() helper can still be tested.
>
> There are some other minor changes needed for the test_tc_dtime.c and
> the tc_redirect.c. I quickly made the changes and put them here (first patch):
>
> https://git.kernel.org/pub/scm/linux/kernel/git/martin.lau/bpf-next.git/log/?h=skb.tstamp_type
>
Thanks for your help Martin. I will check the changes you have made in BPF test framework and see what i have missed.
>
>
>> skb->tstamp = INGRESS_FWDNS_MAGIC;
>> } else {
>> if (bpf_skb_set_tstamp(skb, INGRESS_FWDNS_MAGIC,
>> @@ -370,7 +382,8 @@ int egress_fwdns_prio101(struct __sk_buff *skb)
>> if (skb->tstamp_type) {
>> if (fwdns_clear_dtime() ||
>> - skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO ||
>> + (skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_MONO &&
>> + skb->tstamp_type != BPF_SKB_TSTAMP_DELIVERY_TAI) ||
>> skb->tstamp != INGRESS_FWDNS_MAGIC)
>> inc_errs(EGRESS_FWDNS_P101);
>> else
>> @@ -380,7 +393,8 @@ int egress_fwdns_prio101(struct __sk_buff *skb)
>> inc_errs(EGRESS_FWDNS_P101);
>> }
>> - if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO) {
>> + if (skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_MONO ||
>> + skb->tstamp_type == BPF_SKB_TSTAMP_DELIVERY_TAI) {
>> skb->tstamp = EGRESS_FWDNS_MAGIC;
>> } else {
>> if (bpf_skb_set_tstamp(skb, EGRESS_FWDNS_MAGIC,
>
On 4/24/24 3:20 PM, Abhishek Chauhan wrote:
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index a9e819115622..63e4cc30d18d 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -955,7 +955,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
> if (iter.frag)
> ip6_fraglist_prepare(skb, &iter);
>
> - skb_set_delivery_time(skb, tstamp, tstamp_type);
> + skb_set_tstamp_type_frm_clkid(skb, tstamp, tstamp_type);
> err = output(net, sk, skb);
> if (!err)
> IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
> @@ -1016,7 +1016,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
> /*
> * Put this fragment into the sending queue.
> */
> - skb_set_delivery_time(frag, tstamp, tstamp_type);
> + skb_set_tstamp_type_frm_clkid(frag, tstamp, tstamp_type);
> err = output(net, sk, frag);
> if (err)
> goto fail;
When replying another thread and looking closer at the ip6 changes, these two
line changes should not be needed.
On 4/26/24 11:46 AM, Abhishek Chauhan (ABC) wrote:
>>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>>> index 591226dcde26..f195b31d6e75 100644
>>> --- a/net/ipv4/ip_output.c
>>> +++ b/net/ipv4/ip_output.c
>>> @@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>>> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>>> skb->mark = cork->mark;
>>> - skb->tstamp = cork->transmit_time;
>>> + skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
>> hmm... I think this will break for tcp. This sequence in particular:
>>
>> tcp_v4_timewait_ack()
>> tcp_v4_send_ack()
>> ip_send_unicast_reply()
>> ip_push_pending_frames()
>> ip_finish_skb()
>> __ip_make_skb()
>> /* sk_clockid is REAL but cork->transmit_time should be in mono */
>> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);;
>>
>> I think I hit it from time to time when running the test in this patch set.
>>
> do you think i need to check for protocol type here . since tcp uses Mono and the rest according to the new design is based on
> sk->sk_clockid
> if (iph->protocol == IPPROTO_TCP)
> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, CLOCK_MONOTONIC);
> else
> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
Looks ok. iph->protocol is from sk->sk_protocol. I would defer to Willem input here.
There is at least one more place that needs this protocol check, __ip6_make_skb().
On 4/26/2024 4:50 PM, Martin KaFai Lau wrote:
> On 4/26/24 11:46 AM, Abhishek Chauhan (ABC) wrote:
>>>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>>>> index 591226dcde26..f195b31d6e75 100644
>>>> --- a/net/ipv4/ip_output.c
>>>> +++ b/net/ipv4/ip_output.c
>>>> @@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>>>> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>>>> skb->mark = cork->mark;
>>>> - skb->tstamp = cork->transmit_time;
>>>> + skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
>>> hmm... I think this will break for tcp. This sequence in particular:
>>>
>>> tcp_v4_timewait_ack()
>>> tcp_v4_send_ack()
>>> ip_send_unicast_reply()
>>> ip_push_pending_frames()
>>> ip_finish_skb()
>>> __ip_make_skb()
>>> /* sk_clockid is REAL but cork->transmit_time should be in mono */
>>> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);;
>>>
>>> I think I hit it from time to time when running the test in this patch set.
>>>
>> do you think i need to check for protocol type here . since tcp uses Mono and the rest according to the new design is based on
>> sk->sk_clockid
>> if (iph->protocol == IPPROTO_TCP)
>> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, CLOCK_MONOTONIC);
>> else
>> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
>
> Looks ok. iph->protocol is from sk->sk_protocol. I would defer to Willem input here.
>
> There is at least one more place that needs this protocol check, __ip6_make_skb().
Sounds good. I will wait for Willem to comment here.
On 4/26/2024 4:55 PM, Martin KaFai Lau wrote:
> On 4/24/24 3:20 PM, Abhishek Chauhan wrote:
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index a9e819115622..63e4cc30d18d 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -955,7 +955,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>> if (iter.frag)
>> ip6_fraglist_prepare(skb, &iter);
>> - skb_set_delivery_time(skb, tstamp, tstamp_type);
>> + skb_set_tstamp_type_frm_clkid(skb, tstamp, tstamp_type);
>> err = output(net, sk, skb);
>> if (!err)
>> IP6_INC_STATS(net, ip6_dst_idev(&rt->dst),
>> @@ -1016,7 +1016,7 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>> /*
>> * Put this fragment into the sending queue.
>> */
>> - skb_set_delivery_time(frag, tstamp, tstamp_type);
>> + skb_set_tstamp_type_frm_clkid(frag, tstamp, tstamp_type);
>> err = output(net, sk, frag);
>> if (err)
>> goto fail;
>
> When replying another thread and looking closer at the ip6 changes, these two line changes should not be needed.
Similar code exists in ip_output.c for ipv4 packets => ip_do_fragment => I was thinking do we need
require that code or not.
Since in both functionality are the same, only difference is protocol.
From what i see is the for Frag cases in both ipv4 and ipv6, previously skb->tstamp_type was being
set for each fragments.
If we are planning to not do it for ip6 the same should follow for ip4 too.
Abhishek Chauhan (ABC) wrote:
>
>
> On 4/26/2024 4:50 PM, Martin KaFai Lau wrote:
> > On 4/26/24 11:46 AM, Abhishek Chauhan (ABC) wrote:
> >>>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> >>>> index 591226dcde26..f195b31d6e75 100644
> >>>> --- a/net/ipv4/ip_output.c
> >>>> +++ b/net/ipv4/ip_output.c
> >>>> @@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
> >>>> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
> >>>> skb->mark = cork->mark;
> >>>> - skb->tstamp = cork->transmit_time;
> >>>> + skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
> >>> hmm... I think this will break for tcp. This sequence in particular:
Good catch, thanks!
> >>>
> >>> tcp_v4_timewait_ack()
> >>> tcp_v4_send_ack()
> >>> ip_send_unicast_reply()
> >>> ip_push_pending_frames()
> >>> ip_finish_skb()
> >>> __ip_make_skb()
> >>> /* sk_clockid is REAL but cork->transmit_time should be in mono */
> >>> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);;
> >>>
> >>> I think I hit it from time to time when running the test in this patch set.
> >>>
> >> do you think i need to check for protocol type here . since tcp uses Mono and the rest according to the new design is based on
> >> sk->sk_clockid
> >> if (iph->protocol == IPPROTO_TCP)
> >> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, CLOCK_MONOTONIC);
> >> else
> >> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
> >
> > Looks ok. iph->protocol is from sk->sk_protocol. I would defer to Willem input here.
> >
> > There is at least one more place that needs this protocol check, __ip6_make_skb().
>
> Sounds good. I will wait for Willem to comment here.
This would be sk_is_tcp(sk).
I think we want to avoid special casing if we can. Note the if.
If TCP always uses monotonic, we could consider initializing
sk_clockid to CLOCK_MONONOTIC in tcp_init_sock.
I guess TCP logic currently entirely ignores sk_clockid. If we are to
start using this, then setsocktop SO_TXTIME must explicitly fail or
ignore for TCP sockets, or silently skip the write.
All of that is more complexity. Than is maybe warranted for this one
case. So no objections from me to special casing using sk_is_tcp(sk)
either.
On 4/30/2024 1:26 PM, Willem de Bruijn wrote:
> Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 4/26/2024 4:50 PM, Martin KaFai Lau wrote:
>>> On 4/26/24 11:46 AM, Abhishek Chauhan (ABC) wrote:
>>>>>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>>>>>> index 591226dcde26..f195b31d6e75 100644
>>>>>> --- a/net/ipv4/ip_output.c
>>>>>> +++ b/net/ipv4/ip_output.c
>>>>>> @@ -1457,7 +1457,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>>>>>> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>>>>>> skb->mark = cork->mark;
>>>>>> - skb->tstamp = cork->transmit_time;
>>>>>> + skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
>>>>> hmm... I think this will break for tcp. This sequence in particular:
>
> Good catch, thanks!
>
>>>>>
>>>>> tcp_v4_timewait_ack()
>>>>> tcp_v4_send_ack()
>>>>> ip_send_unicast_reply()
>>>>> ip_push_pending_frames()
>>>>> ip_finish_skb()
>>>>> __ip_make_skb()
>>>>> /* sk_clockid is REAL but cork->transmit_time should be in mono */
>>>>> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);;
>>>>>
>>>>> I think I hit it from time to time when running the test in this patch set.
>>>>>
>>>> do you think i need to check for protocol type here . since tcp uses Mono and the rest according to the new design is based on
>>>> sk->sk_clockid
>>>> if (iph->protocol == IPPROTO_TCP)
>>>> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, CLOCK_MONOTONIC);
>>>> else
>>>> skb_set_tstamp_type_frm_clkid(skb, cork->transmit_time, sk->sk_clockid);
>>>
>>> Looks ok. iph->protocol is from sk->sk_protocol. I would defer to Willem input here.
>>>
>>> There is at least one more place that needs this protocol check, __ip6_make_skb().
>>
>> Sounds good. I will wait for Willem to comment here.
>
> This would be sk_is_tcp(sk).
>
> I think we want to avoid special casing if we can. Note the if.
>
> If TCP always uses monotonic, we could consider initializing
> sk_clockid to CLOCK_MONONOTIC in tcp_init_sock.
>
> I guess TCP logic currently entirely ignores sk_clockid. If we are to
> start using this, then setsocktop SO_TXTIME must explicitly fail or
> ignore for TCP sockets, or silently skip the write.
>
> All of that is more complexity. Than is maybe warranted for this one
> case. So no objections from me to special casing using sk_is_tcp(sk)
> either.
>
>
Thanks Willem and Martin for all the support and reviews.
I will take care of this in the next RFC patch
- adding the sk_is_tcp check and setting tstamp_type to mono for tcp
>
>
> BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
> u64, tstamp, u32, tstamp_type)
> {
> /* ... */
> case BPF_SKB_CLOCK_TAI:
> if (!tstamp)
> return -EINVAL;
> skb->tstamp = tstamp;
> skb->tstamp_type = SKB_CLOCK_TAI;
> break;
> case BPF_SKB_CLOCK_REALTIME:
> skb->tstamp = tstamp;
> skb->tstamp_type = SKB_CLOCK_REALTIME;
> break;
>
> /* ... */
> }
>
>> return -EINVAL;
>
>> @@ -9388,17 +9394,17 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
>> {
>> __u8 value_reg = si->dst_reg;
>> __u8 skb_reg = si->src_reg;
>> - /* AX is needed because src_reg and dst_reg could be the same */
>> - __u8 tmp_reg = BPF_REG_AX;
>> -
>> - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
>> - SKB_BF_MONO_TC_OFFSET);
>> - *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
>> - SKB_MONO_DELIVERY_TIME_MASK, 2);
>> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
>> - *insn++ = BPF_JMP_A(1);
>> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
>> -
>> + BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI);
>
> Add these also:
>
> BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
> BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != BPF_SKB_CLOCK_MONOTONIC);
> BUILD_BUG_ON(SKB_CLOCK_TAI != BPF_SKB_CLOCK_TAI);
>
Martin, The above suggestion of adding BUILD_BUG_ON always gives me a warning stating the following.
Some systems considers warning as error if compiler flags are enabled. I believe this requires your suggestion before i raise RFC v6 patchset to either keep the
BUILD_BUG_ON or remove it completely.
/local/mnt/workspace/kernel_master/linux-next/net/core/filter.c:9395:34: warning: comparison between ‘enum skb_tstamp_type’ and ‘enum <anonymous>’ [-Wenum-compare]
9395 | BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
| ^~
/local/mnt/workspace/kernel_master/linux-next/include/linux/compiler_types.h:451:9: note: in definition of macro ‘__compiletime_assert’
451 | if (!(condition)) \
| ^~~~~~~~~
/local/mnt/workspace/kernel_master/linux-next/include/linux/compiler_types.h:471:2: note: in expansion of macro ‘_compiletime_assert’
471 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
/local/mnt/workspace/kernel_master/linux-next/include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
/local/mnt/workspace/kernel_master/linux-next/include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^~~~~~~~~~~~~~~~
/local/mnt/workspace/kernel_master/linux-next/net/core/filter.c:9395:2: note: in expansion of macro ‘BUILD_BUG_ON’
9395 | BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
| ^~~~~~~~~~~~
/local/mnt/workspace/kernel_master/linux-next/net/core/filter.c:9396:35: warning: comparison between ‘enum skb_tstamp_type’ and ‘enum <anonymous>’ [-Wenum-compare]
9396 | BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != BPF_SKB_CLOCK_MONOTONIC);
| ^~
/local/mnt/workspace/kernel_master/linux-next/include/linux/compiler_types.h:451:9: note: in definition of macro ‘__compiletime_assert’
451 | if (!(condition)) \
| ^~~~~~~~~
| ^~
On 5/3/24 2:33 PM, Abhishek Chauhan (ABC) wrote:
>
>> BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
>> u64, tstamp, u32, tstamp_type)
>> {
>> /* ... */
>> case BPF_SKB_CLOCK_TAI:
>> if (!tstamp)
>> return -EINVAL;
>> skb->tstamp = tstamp;
>> skb->tstamp_type = SKB_CLOCK_TAI;
>> break;
>> case BPF_SKB_CLOCK_REALTIME:
>> skb->tstamp = tstamp;
>> skb->tstamp_type = SKB_CLOCK_REALTIME;
>> break;
>>
>> /* ... */
>> }
>>
>>> return -EINVAL;
>>
>>> @@ -9388,17 +9394,17 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
>>> {
>>> __u8 value_reg = si->dst_reg;
>>> __u8 skb_reg = si->src_reg;
>>> - /* AX is needed because src_reg and dst_reg could be the same */
>>> - __u8 tmp_reg = BPF_REG_AX;
>>> -
>>> - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
>>> - SKB_BF_MONO_TC_OFFSET);
>>> - *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
>>> - SKB_MONO_DELIVERY_TIME_MASK, 2);
>>> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
>>> - *insn++ = BPF_JMP_A(1);
>>> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
>>> -
>>> + BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI);
>>
>> Add these also:
>>
>> BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
>> BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != BPF_SKB_CLOCK_MONOTONIC);
>> BUILD_BUG_ON(SKB_CLOCK_TAI != BPF_SKB_CLOCK_TAI);
>>
>
> Martin, The above suggestion of adding BUILD_BUG_ON always gives me a warning stating the following.
>
> Some systems considers warning as error if compiler flags are enabled. I believe this requires your suggestion before i raise RFC v6 patchset to either keep the
> BUILD_BUG_ON or remove it completely.
cast it?
>
> /local/mnt/workspace/kernel_master/linux-next/net/core/filter.c:9395:34: warning: comparison between ‘enum skb_tstamp_type’ and ‘enum <anonymous>’ [-Wenum-compare]
> 9395 | BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
> | ^~
> /local/mnt/workspace/kernel_master/linux-next/include/linux/compiler_types.h:451:9: note: in definition of macro ‘__compiletime_assert’
> 451 | if (!(condition)) \
> | ^~~~~~~~~
> /local/mnt/workspace/kernel_master/linux-next/include/linux/compiler_types.h:471:2: note: in expansion of macro ‘_compiletime_assert’
> 471 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> | ^~~~~~~~~~~~~~~~~~~
> /local/mnt/workspace/kernel_master/linux-next/include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
> 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> | ^~~~~~~~~~~~~~~~~~
> /local/mnt/workspace/kernel_master/linux-next/include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
> 50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
> | ^~~~~~~~~~~~~~~~
> /local/mnt/workspace/kernel_master/linux-next/net/core/filter.c:9395:2: note: in expansion of macro ‘BUILD_BUG_ON’
> 9395 | BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
> | ^~~~~~~~~~~~
> /local/mnt/workspace/kernel_master/linux-next/net/core/filter.c:9396:35: warning: comparison between ‘enum skb_tstamp_type’ and ‘enum <anonymous>’ [-Wenum-compare]
> 9396 | BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != BPF_SKB_CLOCK_MONOTONIC);
> | ^~
> /local/mnt/workspace/kernel_master/linux-next/include/linux/compiler_types.h:451:9: note: in definition of macro ‘__compiletime_assert’
> 451 | if (!(condition)) \
> | ^~~~~~~~~
>
> | ^~
>
>
>
On 5/3/2024 2:41 PM, Martin KaFai Lau wrote:
> On 5/3/24 2:33 PM, Abhishek Chauhan (ABC) wrote:
>>
>>> BPF_CALL_3(bpf_skb_set_tstamp, struct sk_buff *, skb,
>>> u64, tstamp, u32, tstamp_type)
>>> {
>>> /* ... */
>>> case BPF_SKB_CLOCK_TAI:
>>> if (!tstamp)
>>> return -EINVAL;
>>> skb->tstamp = tstamp;
>>> skb->tstamp_type = SKB_CLOCK_TAI;
>>> break;
>>> case BPF_SKB_CLOCK_REALTIME:
>>> skb->tstamp = tstamp;
>>> skb->tstamp_type = SKB_CLOCK_REALTIME;
>>> break;
>>>
>>> /* ... */
>>> }
>>>
>>>> return -EINVAL;
>>>
>>>> @@ -9388,17 +9394,17 @@ static struct bpf_insn *bpf_convert_tstamp_type_read(const struct bpf_insn *si,
>>>> {
>>>> __u8 value_reg = si->dst_reg;
>>>> __u8 skb_reg = si->src_reg;
>>>> - /* AX is needed because src_reg and dst_reg could be the same */
>>>> - __u8 tmp_reg = BPF_REG_AX;
>>>> -
>>>> - *insn++ = BPF_LDX_MEM(BPF_B, tmp_reg, skb_reg,
>>>> - SKB_BF_MONO_TC_OFFSET);
>>>> - *insn++ = BPF_JMP32_IMM(BPF_JSET, tmp_reg,
>>>> - SKB_MONO_DELIVERY_TIME_MASK, 2);
>>>> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_UNSPEC);
>>>> - *insn++ = BPF_JMP_A(1);
>>>> - *insn++ = BPF_MOV32_IMM(value_reg, BPF_SKB_TSTAMP_DELIVERY_MONO);
>>>> -
>>>> + BUILD_BUG_ON(__SKB_CLOCK_MAX != BPF_SKB_TSTAMP_DELIVERY_TAI);
>>>
>>> Add these also:
>>>
>>> BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
>>> BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != BPF_SKB_CLOCK_MONOTONIC);
>>> BUILD_BUG_ON(SKB_CLOCK_TAI != BPF_SKB_CLOCK_TAI);
>>>
>>
>> Martin, The above suggestion of adding BUILD_BUG_ON always gives me a warning stating the following.
>>
>> Some systems considers warning as error if compiler flags are enabled. I believe this requires your suggestion before i raise RFC v6 patchset to either keep the
>> BUILD_BUG_ON or remove it completely.
>
> cast it?
>
Thanks Martin. Will do the same. Casting worked for me!.
>>
>> /local/mnt/workspace/kernel_master/linux-next/net/core/filter.c:9395:34: warning: comparison between ‘enum skb_tstamp_type’ and ‘enum <anonymous>’ [-Wenum-compare]
>> 9395 | BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
>> | ^~
>> /local/mnt/workspace/kernel_master/linux-next/include/linux/compiler_types.h:451:9: note: in definition of macro ‘__compiletime_assert’
>> 451 | if (!(condition)) \
>> | ^~~~~~~~~
>> /local/mnt/workspace/kernel_master/linux-next/include/linux/compiler_types.h:471:2: note: in expansion of macro ‘_compiletime_assert’
>> 471 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
>> | ^~~~~~~~~~~~~~~~~~~
>> /local/mnt/workspace/kernel_master/linux-next/include/linux/build_bug.h:39:37: note: in expansion of macro ‘compiletime_assert’
>> 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
>> | ^~~~~~~~~~~~~~~~~~
>> /local/mnt/workspace/kernel_master/linux-next/include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
>> 50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
>> | ^~~~~~~~~~~~~~~~
>> /local/mnt/workspace/kernel_master/linux-next/net/core/filter.c:9395:2: note: in expansion of macro ‘BUILD_BUG_ON’
>> 9395 | BUILD_BUG_ON(SKB_CLOCK_REALTIME != BPF_SKB_CLOCK_REALTIME);
>> | ^~~~~~~~~~~~
>> /local/mnt/workspace/kernel_master/linux-next/net/core/filter.c:9396:35: warning: comparison between ‘enum skb_tstamp_type’ and ‘enum <anonymous>’ [-Wenum-compare]
>> 9396 | BUILD_BUG_ON(SKB_CLOCK_MONOTONIC != BPF_SKB_CLOCK_MONOTONIC);
>> | ^~
>> /local/mnt/workspace/kernel_master/linux-next/include/linux/compiler_types.h:451:9: note: in definition of macro ‘__compiletime_assert’
>> 451 | if (!(condition)) \
>> | ^~~~~~~~~
>>
>> | ^~
>>
>>
>>
>