Bridge driver today has no support to forward the userspace timestamp
packets and ends up resetting the timestamp. ETF qdisc checks the
packet coming from userspace and encounters to be 0 thereby dropping
time sensitive packets. These changes will allow userspace timestamps
packets to be forwarded from the bridge to NIC drivers.
Signed-off-by: Abhishek Chauhan <[email protected]>
---
Note:- I am a little skeptical of using bool inside the skbuff
structure as no one today has used bool so far in the struct.
(Expecting some comments from upstream for sure)
I am also touching the heart of sk buff so i hope this is reviewed
thoroughly. I have crossed checked multiple times on all the ipv4
/ipv6 paths where userspace timestamp is populated. I tried as much
as possible to cover all the references and made sure i put my changes
in place.
Bug description:- If the physical network interface is bridged the
etf packets are dropped since bridge driver before forwarding the packet
is setting the userspace timestamp to 0.
Bridge driver call stack
[ 157.120189] now is set to 1706054553072734733
[ 157.120194] tx time from SKB is 0 <== SKB when reaches the etf qdisc is 0
[ 157.120195] q->last time is 0
[ 157.120197] CPU: 3 PID: 9206 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
[ 157.120201] Hardware name: Qualcomm SA8775P Ride (DT)
[ 157.120202] Call trace:
[ 157.120203] dump_backtrace+0xb0/0x130
[ 157.120212] show_stack+0x1c/0x30
[ 157.120215] dump_stack_lvl+0x74/0x8c
[ 157.120220] dump_stack+0x14/0x24
[ 157.120223] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
[ 157.120230] dev_qdisc_enqueue+0x2c/0x110
[ 157.120234] __dev_xmit_skb+0x114/0x644
[ 157.120236] __dev_queue_xmit+0x31c/0x774
[ 157.120238] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
[ 157.120253] br_forward_finish+0xdc/0xec [bridge] <== This function is culprit as its making the tstamp as 0
[root@ecbldauto-lvarm04-lnx ~]# [ 157.120263] __br_forward+0xd8/0x210 [bridge]
[ 157.120272] br_forward+0x12c/0x150 [bridge]
[ 157.120281] br_dev_xmit+0x288/0x49c [bridge]
[ 157.120290] dev_hard_start_xmit+0xe4/0x2b4
[ 157.120292] __dev_queue_xmit+0x6ac/0x774
[ 157.120294] neigh_resolve_output+0x128/0x1ec
[ 157.120297] ip_finish_output2+0x184/0x54c
[ 157.120300] __ip_finish_output+0xa4/0x19c
[ 157.120302] ip_finish_output+0x38/0xf0
[ 157.120303] ip_output+0x13c/0x1f4
[ 157.120305] ip_send_skb+0x54/0x10c
[ 157.120307] udp_send_skb+0x128/0x394
[ 157.120310] udp_sendmsg+0x7e8/0xa6c
[ 157.120311] inet_sendmsg+0x48/0x70
[ 157.120313] sock_sendmsg+0x54/0x60
[ 157.120315] ____sys_sendmsg+0x1f8/0x254
[ 157.120316] ___sys_sendmsg+0x84/0xcc
[ 157.120318] __sys_sendmsg+0x60/0xb0
[ 157.120319] __arm64_sys_sendmsg+0x28/0x30
[ 157.120320] invoke_syscall.constprop.0+0x7c/0xd0
[ 157.120323] el0_svc_common.constprop.0+0x140/0x150
[ 157.120325] do_el0_svc+0x38/0xa0
[ 157.120327] el0_svc+0x38/0x1d0
[ 157.120329] el0t_64_sync_handler+0xb4/0x130
[ 157.120330] el0t_64_sync+0x17c/0x180
After my changes:-
[ 2215.130148] now is set to 1706056610952501031
[ 2215.130154] tx time from SKB is 1706056610953467393 <== Time is forwarded to etf correctly
[ 2215.130155] q->last time is 1706056591423364609
[ 2215.130158] CPU: 1 PID: 108166 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
[ 2215.130162] Hardware name: Qualcomm SA8775P Ride (DT) [ 2215.130163] Call trace:
[ 2215.130164] dump_backtrace+0xb0/0x130
[ 2215.130172] show_stack+0x1c/0x30 [root@ecbldauto-lvarm04-lnx ~]#
[ 2215.130175] dump_stack_lvl+0x74/0x8c [ 2215.130181] dump_stack+0x14/0x24
[ 2215.130184] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
[ 2215.130191] dev_qdisc_enqueue+0x2c/0x110
[ 2215.130197] __dev_xmit_skb+0x114/0x644
[ 2215.130200] __dev_queue_xmit+0x31c/0x774
[ 2215.130202] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
[ 2215.130217] br_forward_finish+0xe4/0xf0 [bridge]
[ 2215.130226] __br_forward+0xd8/0x20c [bridge]
[ 2215.130235] br_forward+0x12c/0x150 [bridge]
[ 2215.130243] br_dev_xmit+0x288/0x49c [bridge]
[ 2215.130252] dev_hard_start_xmit+0xe4/0x2b4
[ 2215.130254] __dev_queue_xmit+0x6ac/0x774
[ 2215.130257] neigh_hh_output+0xcc/0x140
[ 2215.130260] ip_finish_output2+0x300/0x54c
[ 2215.130262] __ip_finish_output+0xa4/0x19c
[ 2215.130263] ip_finish_output+0x38/0xf0
[ 2215.130265] ip_output+0x13c/0x1f4
[ 2215.130267] ip_send_skb+0x54/0x110
[ 2215.130269] udp_send_skb+0x128/0x394
[ 2215.130271] udp_sendmsg+0x7e8/0xa6c
[ 2215.130272] inet_sendmsg+0x48/0x70
[ 2215.130275] sock_sendmsg+0x54/0x60
[ 2215.130277] ____sys_sendmsg+0x1f8/0x254
[ 2215.130278] ___sys_sendmsg+0x84/0xcc
[ 2215.130279] __sys_sendmsg+0x60/0xb0
[ 2215.130281] __arm64_sys_sendmsg+0x28/0x30
[ 2215.130282] invoke_syscall.constprop.0+0x7c/0xd0
[ 2215.130285] el0_svc_common.constprop.0+0x140/0x150
[ 2215.130287] do_el0_svc+0x38/0xa0
[ 2215.130289] el0_svc+0x38/0x1d0
[ 2215.130291] el0t_64_sync_handler+0xb4/0x130
[ 2215.130292] el0t_64_sync+0x17c/0x180
include/linux/skbuff.h | 13 +++++++++++++
include/net/inet_sock.h | 1 +
include/net/sock.h | 1 +
net/core/sock.c | 1 +
net/ipv4/ip_output.c | 3 +++
net/ipv4/raw.c | 1 +
net/ipv6/ip6_output.c | 2 ++
net/ipv6/raw.c | 1 +
net/packet/af_packet.c | 3 +++
9 files changed, 26 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2dde34c29203..b098b7d30b56 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -744,6 +744,7 @@ typedef unsigned char *sk_buff_data_t;
* @tstamp: Time we arrived/left
* @skb_mstamp_ns: (aka @tstamp) earliest departure time; start point
* for retransmit timer
+ * @user_delivery_time: states that timestamp was populated from userspace
* @rbnode: RB tree node, alternative to next/prev for netem/tcp
* @list: queue head
* @ll_node: anchor in an llist (eg socket defer_list)
@@ -879,6 +880,8 @@ struct sk_buff {
ktime_t tstamp;
u64 skb_mstamp_ns; /* earliest departure time */
};
+ /* States that time is from userspace */
+ bool user_delivery_time;
/*
* This is the control buffer. It is free to use for every
* layer. Please put your private variables there. If you
@@ -4208,6 +4211,16 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
if (skb->mono_delivery_time)
return;
+ /* When userspace timestamp packets are forwarded via bridge
+ * the br_forward_finish clears the tstamp and the tstamp
+ * from the userspace is lost. Hence the check for user
+ * delivery time. With the below check now tc-etf qdisc will
+ * not end up dropping the packets if the packet is forwarded via
+ * bridge interface.
+ */
+ if (skb->user_delivery_time)
+ return;
+
skb->tstamp = 0;
}
diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index d94c242eb3ed..e7523545a493 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -175,6 +175,7 @@ struct inet_cork {
__u16 gso_size;
u64 transmit_time;
u32 mark;
+ bool user_delivery_time;
};
struct inet_cork_full {
diff --git a/include/net/sock.h b/include/net/sock.h
index a9d99a9c583f..d54af99129a3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1871,6 +1871,7 @@ struct sockcm_cookie {
u64 transmit_time;
u32 mark;
u32 tsflags;
+ bool user_delivery_time;
};
static inline void sockcm_init(struct sockcm_cookie *sockc,
diff --git a/net/core/sock.c b/net/core/sock.c
index 88bf810394a5..6cadedce7036 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2837,6 +2837,7 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64)))
return -EINVAL;
sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
+ sockc->user_delivery_time = true;
break;
/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
case SCM_RIGHTS:
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 5b5a0adb927f..61b59e41530e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1321,6 +1321,7 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
cork->mark = ipc->sockc.mark;
cork->priority = ipc->priority;
cork->transmit_time = ipc->sockc.transmit_time;
+ cork->user_delivery_time = ipc->sockc.user_delivery_time;
cork->tx_flags = 0;
sock_tx_timestamp(sk, ipc->sockc.tsflags, &cork->tx_flags);
@@ -1455,6 +1456,8 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
skb->mark = cork->mark;
skb->tstamp = cork->transmit_time;
+ skb->user_delivery_time = cork->user_delivery_time;
+
/*
* Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
* on dst refcount
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index aea89326c697..f7af2edcd4d5 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -353,6 +353,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc->mark;
skb->tstamp = sockc->transmit_time;
+ skb->user_delivery_time = sockc->user_delivery_time;
skb_dst_set(skb, &rt->dst);
*rtp = NULL;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a722a43dd668..98aa180cee9e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1396,6 +1396,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
cork->base.length = 0;
cork->base.transmit_time = ipc6->sockc.transmit_time;
+ cork->base.user_delivery_time = ipc6->sockc.user_delivery_time;
return 0;
}
@@ -1922,6 +1923,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = cork->base.mark;
skb->tstamp = cork->base.transmit_time;
+ skb->user_delivery_time = cork->base.user_delivery_time;
ip6_cork_steal_dst(skb, cork);
IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 03dbb874c363..07f8013df556 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -616,6 +616,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc->mark;
skb->tstamp = sockc->transmit_time;
+ skb->user_delivery_time = sockc->user_delivery_time;
skb_put(skb, length);
skb_reset_network_header(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9bbc2686690..6f119cb4addf 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2057,6 +2057,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = READ_ONCE(sk->sk_mark);
skb->tstamp = sockc.transmit_time;
+ skb->user_delivery_time = sockc.user_delivery_time;
skb_setup_tx_timestamp(skb, sockc.tsflags);
@@ -2586,6 +2587,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
skb->priority = READ_ONCE(po->sk.sk_priority);
skb->mark = READ_ONCE(po->sk.sk_mark);
skb->tstamp = sockc->transmit_time;
+ skb->user_delivery_time = sockc->user_delivery_time;
skb_setup_tx_timestamp(skb, sockc->tsflags);
skb_zcopy_set_nouarg(skb, ph.raw);
@@ -3064,6 +3066,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc.mark;
skb->tstamp = sockc.transmit_time;
+ skb->user_delivery_time = sockc.user_delivery_time;
if (unlikely(extra_len == 4))
skb->no_fcs = 1;
--
2.25.1
On 2/15/2024 1:56 PM, Abhishek Chauhan wrote:
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
>
> Signed-off-by: Abhishek Chauhan <[email protected]>
> ---
> Note:- I am a little skeptical of using bool inside the skbuff
> structure as no one today has used bool so far in the struct.
> (Expecting some comments from upstream for sure)
there are a large number of u8 <flag>:1 bitfields -- suggest you do the same
Abhishek Chauhan wrote:
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
>
> Signed-off-by: Abhishek Chauhan <[email protected]>
> ---
> Note:- I am a little skeptical of using bool inside the skbuff
> structure as no one today has used bool so far in the struct.
> (Expecting some comments from upstream for sure)
>
> I am also touching the heart of sk buff so i hope this is reviewed
> thoroughly. I have crossed checked multiple times on all the ipv4
> /ipv6 paths where userspace timestamp is populated. I tried as much
> as possible to cover all the references and made sure i put my changes
> in place.
>
> Bug description:- If the physical network interface is bridged the
> etf packets are dropped since bridge driver before forwarding the packet
> is setting the userspace timestamp to 0.
>
> Bridge driver call stack
>
> [ 157.120189] now is set to 1706054553072734733
> [ 157.120194] tx time from SKB is 0 <== SKB when reaches the etf qdisc is 0
> [ 157.120195] q->last time is 0
> [ 157.120197] CPU: 3 PID: 9206 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
> [ 157.120201] Hardware name: Qualcomm SA8775P Ride (DT)
> [ 157.120202] Call trace:
> [ 157.120203] dump_backtrace+0xb0/0x130
> [ 157.120212] show_stack+0x1c/0x30
> [ 157.120215] dump_stack_lvl+0x74/0x8c
> [ 157.120220] dump_stack+0x14/0x24
> [ 157.120223] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
> [ 157.120230] dev_qdisc_enqueue+0x2c/0x110
> [ 157.120234] __dev_xmit_skb+0x114/0x644
> [ 157.120236] __dev_queue_xmit+0x31c/0x774
> [ 157.120238] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
> [ 157.120253] br_forward_finish+0xdc/0xec [bridge] <== This function is culprit as its making the tstamp as 0
> [root@ecbldauto-lvarm04-lnx ~]# [ 157.120263] __br_forward+0xd8/0x210 [bridge]
> [ 157.120272] br_forward+0x12c/0x150 [bridge]
> [ 157.120281] br_dev_xmit+0x288/0x49c [bridge]
> [ 157.120290] dev_hard_start_xmit+0xe4/0x2b4
> [ 157.120292] __dev_queue_xmit+0x6ac/0x774
> [ 157.120294] neigh_resolve_output+0x128/0x1ec
> [ 157.120297] ip_finish_output2+0x184/0x54c
> [ 157.120300] __ip_finish_output+0xa4/0x19c
> [ 157.120302] ip_finish_output+0x38/0xf0
> [ 157.120303] ip_output+0x13c/0x1f4
> [ 157.120305] ip_send_skb+0x54/0x10c
> [ 157.120307] udp_send_skb+0x128/0x394
> [ 157.120310] udp_sendmsg+0x7e8/0xa6c
> [ 157.120311] inet_sendmsg+0x48/0x70
> [ 157.120313] sock_sendmsg+0x54/0x60
> [ 157.120315] ____sys_sendmsg+0x1f8/0x254
> [ 157.120316] ___sys_sendmsg+0x84/0xcc
> [ 157.120318] __sys_sendmsg+0x60/0xb0
> [ 157.120319] __arm64_sys_sendmsg+0x28/0x30
> [ 157.120320] invoke_syscall.constprop.0+0x7c/0xd0
> [ 157.120323] el0_svc_common.constprop.0+0x140/0x150
> [ 157.120325] do_el0_svc+0x38/0xa0
> [ 157.120327] el0_svc+0x38/0x1d0
> [ 157.120329] el0t_64_sync_handler+0xb4/0x130
> [ 157.120330] el0t_64_sync+0x17c/0x180
>
> After my changes:-
> [ 2215.130148] now is set to 1706056610952501031
> [ 2215.130154] tx time from SKB is 1706056610953467393 <== Time is forwarded to etf correctly
> [ 2215.130155] q->last time is 1706056591423364609
> [ 2215.130158] CPU: 1 PID: 108166 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
> [ 2215.130162] Hardware name: Qualcomm SA8775P Ride (DT) [ 2215.130163] Call trace:
> [ 2215.130164] dump_backtrace+0xb0/0x130
> [ 2215.130172] show_stack+0x1c/0x30 [root@ecbldauto-lvarm04-lnx ~]#
> [ 2215.130175] dump_stack_lvl+0x74/0x8c [ 2215.130181] dump_stack+0x14/0x24
> [ 2215.130184] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
> [ 2215.130191] dev_qdisc_enqueue+0x2c/0x110
> [ 2215.130197] __dev_xmit_skb+0x114/0x644
> [ 2215.130200] __dev_queue_xmit+0x31c/0x774
> [ 2215.130202] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
> [ 2215.130217] br_forward_finish+0xe4/0xf0 [bridge]
> [ 2215.130226] __br_forward+0xd8/0x20c [bridge]
> [ 2215.130235] br_forward+0x12c/0x150 [bridge]
> [ 2215.130243] br_dev_xmit+0x288/0x49c [bridge]
> [ 2215.130252] dev_hard_start_xmit+0xe4/0x2b4
> [ 2215.130254] __dev_queue_xmit+0x6ac/0x774
> [ 2215.130257] neigh_hh_output+0xcc/0x140
> [ 2215.130260] ip_finish_output2+0x300/0x54c
> [ 2215.130262] __ip_finish_output+0xa4/0x19c
> [ 2215.130263] ip_finish_output+0x38/0xf0
> [ 2215.130265] ip_output+0x13c/0x1f4
> [ 2215.130267] ip_send_skb+0x54/0x110
> [ 2215.130269] udp_send_skb+0x128/0x394
> [ 2215.130271] udp_sendmsg+0x7e8/0xa6c
> [ 2215.130272] inet_sendmsg+0x48/0x70
> [ 2215.130275] sock_sendmsg+0x54/0x60
> [ 2215.130277] ____sys_sendmsg+0x1f8/0x254
> [ 2215.130278] ___sys_sendmsg+0x84/0xcc
> [ 2215.130279] __sys_sendmsg+0x60/0xb0
> [ 2215.130281] __arm64_sys_sendmsg+0x28/0x30
> [ 2215.130282] invoke_syscall.constprop.0+0x7c/0xd0
> [ 2215.130285] el0_svc_common.constprop.0+0x140/0x150
> [ 2215.130287] do_el0_svc+0x38/0xa0
> [ 2215.130289] el0_svc+0x38/0x1d0
> [ 2215.130291] el0t_64_sync_handler+0xb4/0x130
> [ 2215.130292] el0t_64_sync+0x17c/0x180
>
>
> include/linux/skbuff.h | 13 +++++++++++++
> include/net/inet_sock.h | 1 +
> include/net/sock.h | 1 +
> net/core/sock.c | 1 +
> net/ipv4/ip_output.c | 3 +++
> net/ipv4/raw.c | 1 +
> net/ipv6/ip6_output.c | 2 ++
> net/ipv6/raw.c | 1 +
> net/packet/af_packet.c | 3 +++
> 9 files changed, 26 insertions(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2dde34c29203..b098b7d30b56 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -744,6 +744,7 @@ typedef unsigned char *sk_buff_data_t;
> * @tstamp: Time we arrived/left
> * @skb_mstamp_ns: (aka @tstamp) earliest departure time; start point
> * for retransmit timer
> + * @user_delivery_time: states that timestamp was populated from userspace
> * @rbnode: RB tree node, alternative to next/prev for netem/tcp
> * @list: queue head
> * @ll_node: anchor in an llist (eg socket defer_list)
> @@ -879,6 +880,8 @@ struct sk_buff {
> ktime_t tstamp;
> u64 skb_mstamp_ns; /* earliest departure time */
> };
> + /* States that time is from userspace */
> + bool user_delivery_time;
> /*
> * This is the control buffer. It is free to use for every
> * layer. Please put your private variables there. If you
> @@ -4208,6 +4211,16 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
> if (skb->mono_delivery_time)
> return;
>
> + /* When userspace timestamp packets are forwarded via bridge
> + * the br_forward_finish clears the tstamp and the tstamp
> + * from the userspace is lost. Hence the check for user
> + * delivery time. With the below check now tc-etf qdisc will
> + * not end up dropping the packets if the packet is forwarded via
> + * bridge interface.
> + */
> + if (skb->user_delivery_time)
> + return;
> +
> skb->tstamp = 0;
> }
>
> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> index d94c242eb3ed..e7523545a493 100644
> --- a/include/net/inet_sock.h
> +++ b/include/net/inet_sock.h
> @@ -175,6 +175,7 @@ struct inet_cork {
> __u16 gso_size;
> u64 transmit_time;
> u32 mark;
> + bool user_delivery_time;
> };
There's no need for a cork member, as by definition the fields in this
struct are coming from userspace.
There is a very high bar to adding new fields to sk_buff, because it
is used by many paths and would be enormous if stuck with fields for
every intersection between a pair of features.
The goal here is for the bridge to disambiguate earliest delivery time
timestamps from which? From those looped through ip forwarding? Why
does the bridge zero the tstamp field at all? That might help finding
a reasonable implementation.
We have run in the issue of labeling the value of skb->tstamp before.
With redirect and looping it is definitely subtle.
On 2/16/2024 10:11 AM, Willem de Bruijn wrote:
> Abhishek Chauhan wrote:
>> Bridge driver today has no support to forward the userspace timestamp
>> packets and ends up resetting the timestamp. ETF qdisc checks the
>> packet coming from userspace and encounters to be 0 thereby dropping
>> time sensitive packets. These changes will allow userspace timestamps
>> packets to be forwarded from the bridge to NIC drivers.
>>
>> Signed-off-by: Abhishek Chauhan <[email protected]>
>> ---
>> Note:- I am a little skeptical of using bool inside the skbuff
>> structure as no one today has used bool so far in the struct.
>> (Expecting some comments from upstream for sure)
>>
>> I am also touching the heart of sk buff so i hope this is reviewed
>> thoroughly. I have crossed checked multiple times on all the ipv4
>> /ipv6 paths where userspace timestamp is populated. I tried as much
>> as possible to cover all the references and made sure i put my changes
>> in place.
>>
>> Bug description:- If the physical network interface is bridged the
>> etf packets are dropped since bridge driver before forwarding the packet
>> is setting the userspace timestamp to 0.
>>
>> Bridge driver call stack
>>
>> [ 157.120189] now is set to 1706054553072734733
>> [ 157.120194] tx time from SKB is 0 <== SKB when reaches the etf qdisc is 0
>> [ 157.120195] q->last time is 0
>> [ 157.120197] CPU: 3 PID: 9206 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
>> [ 157.120201] Hardware name: Qualcomm SA8775P Ride (DT)
>> [ 157.120202] Call trace:
>> [ 157.120203] dump_backtrace+0xb0/0x130
>> [ 157.120212] show_stack+0x1c/0x30
>> [ 157.120215] dump_stack_lvl+0x74/0x8c
>> [ 157.120220] dump_stack+0x14/0x24
>> [ 157.120223] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
>> [ 157.120230] dev_qdisc_enqueue+0x2c/0x110
>> [ 157.120234] __dev_xmit_skb+0x114/0x644
>> [ 157.120236] __dev_queue_xmit+0x31c/0x774
>> [ 157.120238] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
>> [ 157.120253] br_forward_finish+0xdc/0xec [bridge] <== This function is culprit as its making the tstamp as 0
>> [root@ecbldauto-lvarm04-lnx ~]# [ 157.120263] __br_forward+0xd8/0x210 [bridge]
>> [ 157.120272] br_forward+0x12c/0x150 [bridge]
>> [ 157.120281] br_dev_xmit+0x288/0x49c [bridge]
>> [ 157.120290] dev_hard_start_xmit+0xe4/0x2b4
>> [ 157.120292] __dev_queue_xmit+0x6ac/0x774
>> [ 157.120294] neigh_resolve_output+0x128/0x1ec
>> [ 157.120297] ip_finish_output2+0x184/0x54c
>> [ 157.120300] __ip_finish_output+0xa4/0x19c
>> [ 157.120302] ip_finish_output+0x38/0xf0
>> [ 157.120303] ip_output+0x13c/0x1f4
>> [ 157.120305] ip_send_skb+0x54/0x10c
>> [ 157.120307] udp_send_skb+0x128/0x394
>> [ 157.120310] udp_sendmsg+0x7e8/0xa6c
>> [ 157.120311] inet_sendmsg+0x48/0x70
>> [ 157.120313] sock_sendmsg+0x54/0x60
>> [ 157.120315] ____sys_sendmsg+0x1f8/0x254
>> [ 157.120316] ___sys_sendmsg+0x84/0xcc
>> [ 157.120318] __sys_sendmsg+0x60/0xb0
>> [ 157.120319] __arm64_sys_sendmsg+0x28/0x30
>> [ 157.120320] invoke_syscall.constprop.0+0x7c/0xd0
>> [ 157.120323] el0_svc_common.constprop.0+0x140/0x150
>> [ 157.120325] do_el0_svc+0x38/0xa0
>> [ 157.120327] el0_svc+0x38/0x1d0
>> [ 157.120329] el0t_64_sync_handler+0xb4/0x130
>> [ 157.120330] el0t_64_sync+0x17c/0x180
>>
>> After my changes:-
>> [ 2215.130148] now is set to 1706056610952501031
>> [ 2215.130154] tx time from SKB is 1706056610953467393 <== Time is forwarded to etf correctly
>> [ 2215.130155] q->last time is 1706056591423364609
>> [ 2215.130158] CPU: 1 PID: 108166 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
>> [ 2215.130162] Hardware name: Qualcomm SA8775P Ride (DT) [ 2215.130163] Call trace:
>> [ 2215.130164] dump_backtrace+0xb0/0x130
>> [ 2215.130172] show_stack+0x1c/0x30 [root@ecbldauto-lvarm04-lnx ~]#
>> [ 2215.130175] dump_stack_lvl+0x74/0x8c [ 2215.130181] dump_stack+0x14/0x24
>> [ 2215.130184] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
>> [ 2215.130191] dev_qdisc_enqueue+0x2c/0x110
>> [ 2215.130197] __dev_xmit_skb+0x114/0x644
>> [ 2215.130200] __dev_queue_xmit+0x31c/0x774
>> [ 2215.130202] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
>> [ 2215.130217] br_forward_finish+0xe4/0xf0 [bridge]
>> [ 2215.130226] __br_forward+0xd8/0x20c [bridge]
>> [ 2215.130235] br_forward+0x12c/0x150 [bridge]
>> [ 2215.130243] br_dev_xmit+0x288/0x49c [bridge]
>> [ 2215.130252] dev_hard_start_xmit+0xe4/0x2b4
>> [ 2215.130254] __dev_queue_xmit+0x6ac/0x774
>> [ 2215.130257] neigh_hh_output+0xcc/0x140
>> [ 2215.130260] ip_finish_output2+0x300/0x54c
>> [ 2215.130262] __ip_finish_output+0xa4/0x19c
>> [ 2215.130263] ip_finish_output+0x38/0xf0
>> [ 2215.130265] ip_output+0x13c/0x1f4
>> [ 2215.130267] ip_send_skb+0x54/0x110
>> [ 2215.130269] udp_send_skb+0x128/0x394
>> [ 2215.130271] udp_sendmsg+0x7e8/0xa6c
>> [ 2215.130272] inet_sendmsg+0x48/0x70
>> [ 2215.130275] sock_sendmsg+0x54/0x60
>> [ 2215.130277] ____sys_sendmsg+0x1f8/0x254
>> [ 2215.130278] ___sys_sendmsg+0x84/0xcc
>> [ 2215.130279] __sys_sendmsg+0x60/0xb0
>> [ 2215.130281] __arm64_sys_sendmsg+0x28/0x30
>> [ 2215.130282] invoke_syscall.constprop.0+0x7c/0xd0
>> [ 2215.130285] el0_svc_common.constprop.0+0x140/0x150
>> [ 2215.130287] do_el0_svc+0x38/0xa0
>> [ 2215.130289] el0_svc+0x38/0x1d0
>> [ 2215.130291] el0t_64_sync_handler+0xb4/0x130
>> [ 2215.130292] el0t_64_sync+0x17c/0x180
>>
>>
>> include/linux/skbuff.h | 13 +++++++++++++
>> include/net/inet_sock.h | 1 +
>> include/net/sock.h | 1 +
>> net/core/sock.c | 1 +
>> net/ipv4/ip_output.c | 3 +++
>> net/ipv4/raw.c | 1 +
>> net/ipv6/ip6_output.c | 2 ++
>> net/ipv6/raw.c | 1 +
>> net/packet/af_packet.c | 3 +++
>> 9 files changed, 26 insertions(+)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 2dde34c29203..b098b7d30b56 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -744,6 +744,7 @@ typedef unsigned char *sk_buff_data_t;
>> * @tstamp: Time we arrived/left
>> * @skb_mstamp_ns: (aka @tstamp) earliest departure time; start point
>> * for retransmit timer
>> + * @user_delivery_time: states that timestamp was populated from userspace
>> * @rbnode: RB tree node, alternative to next/prev for netem/tcp
>> * @list: queue head
>> * @ll_node: anchor in an llist (eg socket defer_list)
>> @@ -879,6 +880,8 @@ struct sk_buff {
>> ktime_t tstamp;
>> u64 skb_mstamp_ns; /* earliest departure time */
>> };
>> + /* States that time is from userspace */
>> + bool user_delivery_time;
>> /*
>> * This is the control buffer. It is free to use for every
>> * layer. Please put your private variables there. If you
>> @@ -4208,6 +4211,16 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
>> if (skb->mono_delivery_time)
>> return;
>>
>> + /* When userspace timestamp packets are forwarded via bridge
>> + * the br_forward_finish clears the tstamp and the tstamp
>> + * from the userspace is lost. Hence the check for user
>> + * delivery time. With the below check now tc-etf qdisc will
>> + * not end up dropping the packets if the packet is forwarded via
>> + * bridge interface.
>> + */
>> + if (skb->user_delivery_time)
>> + return;
>> +
>> skb->tstamp = 0;
>> }
>>
>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>> index d94c242eb3ed..e7523545a493 100644
>> --- a/include/net/inet_sock.h
>> +++ b/include/net/inet_sock.h
>> @@ -175,6 +175,7 @@ struct inet_cork {
>> __u16 gso_size;
>> u64 transmit_time;
>> u32 mark;
>> + bool user_delivery_time;
>> };
>
> There's no need for a cork member, as by definition the fields in this
> struct are coming from userspace.
>
> There is a very high bar to adding new fields to sk_buff, because it
> is used by many paths and would be enormous if stuck with fields for
> every intersection between a pair of features.
>
> The goal here is for the bridge to disambiguate earliest delivery time
> timestamps from which? From those looped through ip forwarding? Why
> does the bridge zero the tstamp field at all? That might help finding
> a reasonable implementation.
>
> We have run in the issue of labeling the value of skb->tstamp before.
> With redirect and looping it is definitely subtle.
Thanks for your comments Willem,
There is a clear explanation of why this is needed as part of the below link
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
From the above link Martin KaFai Lau has mentioned and i quote.
"In the future, another bit (e.g. skb->user_delivery_time) can be added
for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid."
Bridge driver from what i understand should not alter the skb if its the forwarding path.
Please correct me if i am wrong here.
Abhishek Chauhan (ABC) wrote:
>
>
> On 2/16/2024 10:11 AM, Willem de Bruijn wrote:
> > Abhishek Chauhan wrote:
> >> Bridge driver today has no support to forward the userspace timestamp
> >> packets and ends up resetting the timestamp. ETF qdisc checks the
> >> packet coming from userspace and encounters to be 0 thereby dropping
> >> time sensitive packets. These changes will allow userspace timestamps
> >> packets to be forwarded from the bridge to NIC drivers.
> >>
> >> Signed-off-by: Abhishek Chauhan <[email protected]>
> >> ---
> >> Note:- I am a little skeptical of using bool inside the skbuff
> >> structure as no one today has used bool so far in the struct.
> >> (Expecting some comments from upstream for sure)
> >>
> >> I am also touching the heart of sk buff so i hope this is reviewed
> >> thoroughly. I have crossed checked multiple times on all the ipv4
> >> /ipv6 paths where userspace timestamp is populated. I tried as much
> >> as possible to cover all the references and made sure i put my changes
> >> in place.
> >>
> >> Bug description:- If the physical network interface is bridged the
> >> etf packets are dropped since bridge driver before forwarding the packet
> >> is setting the userspace timestamp to 0.
> >>
> >> Bridge driver call stack
> >>
> >> [ 157.120189] now is set to 1706054553072734733
> >> [ 157.120194] tx time from SKB is 0 <== SKB when reaches the etf qdisc is 0
> >> [ 157.120195] q->last time is 0
> >> [ 157.120197] CPU: 3 PID: 9206 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
> >> [ 157.120201] Hardware name: Qualcomm SA8775P Ride (DT)
> >> [ 157.120202] Call trace:
> >> [ 157.120203] dump_backtrace+0xb0/0x130
> >> [ 157.120212] show_stack+0x1c/0x30
> >> [ 157.120215] dump_stack_lvl+0x74/0x8c
> >> [ 157.120220] dump_stack+0x14/0x24
> >> [ 157.120223] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
> >> [ 157.120230] dev_qdisc_enqueue+0x2c/0x110
> >> [ 157.120234] __dev_xmit_skb+0x114/0x644
> >> [ 157.120236] __dev_queue_xmit+0x31c/0x774
> >> [ 157.120238] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
> >> [ 157.120253] br_forward_finish+0xdc/0xec [bridge] <== This function is culprit as its making the tstamp as 0
> >> [root@ecbldauto-lvarm04-lnx ~]# [ 157.120263] __br_forward+0xd8/0x210 [bridge]
> >> [ 157.120272] br_forward+0x12c/0x150 [bridge]
> >> [ 157.120281] br_dev_xmit+0x288/0x49c [bridge]
> >> [ 157.120290] dev_hard_start_xmit+0xe4/0x2b4
> >> [ 157.120292] __dev_queue_xmit+0x6ac/0x774
> >> [ 157.120294] neigh_resolve_output+0x128/0x1ec
> >> [ 157.120297] ip_finish_output2+0x184/0x54c
> >> [ 157.120300] __ip_finish_output+0xa4/0x19c
> >> [ 157.120302] ip_finish_output+0x38/0xf0
> >> [ 157.120303] ip_output+0x13c/0x1f4
> >> [ 157.120305] ip_send_skb+0x54/0x10c
> >> [ 157.120307] udp_send_skb+0x128/0x394
> >> [ 157.120310] udp_sendmsg+0x7e8/0xa6c
> >> [ 157.120311] inet_sendmsg+0x48/0x70
> >> [ 157.120313] sock_sendmsg+0x54/0x60
> >> [ 157.120315] ____sys_sendmsg+0x1f8/0x254
> >> [ 157.120316] ___sys_sendmsg+0x84/0xcc
> >> [ 157.120318] __sys_sendmsg+0x60/0xb0
> >> [ 157.120319] __arm64_sys_sendmsg+0x28/0x30
> >> [ 157.120320] invoke_syscall.constprop.0+0x7c/0xd0
> >> [ 157.120323] el0_svc_common.constprop.0+0x140/0x150
> >> [ 157.120325] do_el0_svc+0x38/0xa0
> >> [ 157.120327] el0_svc+0x38/0x1d0
> >> [ 157.120329] el0t_64_sync_handler+0xb4/0x130
> >> [ 157.120330] el0t_64_sync+0x17c/0x180
> >>
> >> After my changes:-
> >> [ 2215.130148] now is set to 1706056610952501031
> >> [ 2215.130154] tx time from SKB is 1706056610953467393 <== Time is forwarded to etf correctly
> >> [ 2215.130155] q->last time is 1706056591423364609
> >> [ 2215.130158] CPU: 1 PID: 108166 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
> >> [ 2215.130162] Hardware name: Qualcomm SA8775P Ride (DT) [ 2215.130163] Call trace:
> >> [ 2215.130164] dump_backtrace+0xb0/0x130
> >> [ 2215.130172] show_stack+0x1c/0x30 [root@ecbldauto-lvarm04-lnx ~]#
> >> [ 2215.130175] dump_stack_lvl+0x74/0x8c [ 2215.130181] dump_stack+0x14/0x24
> >> [ 2215.130184] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
> >> [ 2215.130191] dev_qdisc_enqueue+0x2c/0x110
> >> [ 2215.130197] __dev_xmit_skb+0x114/0x644
> >> [ 2215.130200] __dev_queue_xmit+0x31c/0x774
> >> [ 2215.130202] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
> >> [ 2215.130217] br_forward_finish+0xe4/0xf0 [bridge]
> >> [ 2215.130226] __br_forward+0xd8/0x20c [bridge]
> >> [ 2215.130235] br_forward+0x12c/0x150 [bridge]
> >> [ 2215.130243] br_dev_xmit+0x288/0x49c [bridge]
> >> [ 2215.130252] dev_hard_start_xmit+0xe4/0x2b4
> >> [ 2215.130254] __dev_queue_xmit+0x6ac/0x774
> >> [ 2215.130257] neigh_hh_output+0xcc/0x140
> >> [ 2215.130260] ip_finish_output2+0x300/0x54c
> >> [ 2215.130262] __ip_finish_output+0xa4/0x19c
> >> [ 2215.130263] ip_finish_output+0x38/0xf0
> >> [ 2215.130265] ip_output+0x13c/0x1f4
> >> [ 2215.130267] ip_send_skb+0x54/0x110
> >> [ 2215.130269] udp_send_skb+0x128/0x394
> >> [ 2215.130271] udp_sendmsg+0x7e8/0xa6c
> >> [ 2215.130272] inet_sendmsg+0x48/0x70
> >> [ 2215.130275] sock_sendmsg+0x54/0x60
> >> [ 2215.130277] ____sys_sendmsg+0x1f8/0x254
> >> [ 2215.130278] ___sys_sendmsg+0x84/0xcc
> >> [ 2215.130279] __sys_sendmsg+0x60/0xb0
> >> [ 2215.130281] __arm64_sys_sendmsg+0x28/0x30
> >> [ 2215.130282] invoke_syscall.constprop.0+0x7c/0xd0
> >> [ 2215.130285] el0_svc_common.constprop.0+0x140/0x150
> >> [ 2215.130287] do_el0_svc+0x38/0xa0
> >> [ 2215.130289] el0_svc+0x38/0x1d0
> >> [ 2215.130291] el0t_64_sync_handler+0xb4/0x130
> >> [ 2215.130292] el0t_64_sync+0x17c/0x180
> >>
> >>
> >> include/linux/skbuff.h | 13 +++++++++++++
> >> include/net/inet_sock.h | 1 +
> >> include/net/sock.h | 1 +
> >> net/core/sock.c | 1 +
> >> net/ipv4/ip_output.c | 3 +++
> >> net/ipv4/raw.c | 1 +
> >> net/ipv6/ip6_output.c | 2 ++
> >> net/ipv6/raw.c | 1 +
> >> net/packet/af_packet.c | 3 +++
> >> 9 files changed, 26 insertions(+)
> >>
> >> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >> index 2dde34c29203..b098b7d30b56 100644
> >> --- a/include/linux/skbuff.h
> >> +++ b/include/linux/skbuff.h
> >> @@ -744,6 +744,7 @@ typedef unsigned char *sk_buff_data_t;
> >> * @tstamp: Time we arrived/left
> >> * @skb_mstamp_ns: (aka @tstamp) earliest departure time; start point
> >> * for retransmit timer
> >> + * @user_delivery_time: states that timestamp was populated from userspace
> >> * @rbnode: RB tree node, alternative to next/prev for netem/tcp
> >> * @list: queue head
> >> * @ll_node: anchor in an llist (eg socket defer_list)
> >> @@ -879,6 +880,8 @@ struct sk_buff {
> >> ktime_t tstamp;
> >> u64 skb_mstamp_ns; /* earliest departure time */
> >> };
> >> + /* States that time is from userspace */
> >> + bool user_delivery_time;
> >> /*
> >> * This is the control buffer. It is free to use for every
> >> * layer. Please put your private variables there. If you
> >> @@ -4208,6 +4211,16 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
> >> if (skb->mono_delivery_time)
> >> return;
> >>
> >> + /* When userspace timestamp packets are forwarded via bridge
> >> + * the br_forward_finish clears the tstamp and the tstamp
> >> + * from the userspace is lost. Hence the check for user
> >> + * delivery time. With the below check now tc-etf qdisc will
> >> + * not end up dropping the packets if the packet is forwarded via
> >> + * bridge interface.
> >> + */
> >> + if (skb->user_delivery_time)
> >> + return;
> >> +
> >> skb->tstamp = 0;
> >> }
> >>
> >> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
> >> index d94c242eb3ed..e7523545a493 100644
> >> --- a/include/net/inet_sock.h
> >> +++ b/include/net/inet_sock.h
> >> @@ -175,6 +175,7 @@ struct inet_cork {
> >> __u16 gso_size;
> >> u64 transmit_time;
> >> u32 mark;
> >> + bool user_delivery_time;
> >> };
> >
> > There's no need for a cork member, as by definition the fields in this
> > struct are coming from userspace.
> >
> > There is a very high bar to adding new fields to sk_buff, because it
> > is used by many paths and would be enormous if stuck with fields for
> > every intersection between a pair of features.
> >
> > The goal here is for the bridge to disambiguate earliest delivery time
> > timestamps from which? From those looped through ip forwarding? Why
> > does the bridge zero the tstamp field at all? That might help finding
> > a reasonable implementation.
> >
> > We have run in the issue of labeling the value of skb->tstamp before.
> > With redirect and looping it is definitely subtle.
>
> Thanks for your comments Willem,
>
> There is a clear explanation of why this is needed as part of the below link
> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
>
> From the above link Martin KaFai Lau has mentioned and i quote.
>
> "In the future, another bit (e.g. skb->user_delivery_time) can be added
> for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid."
Let's CC: Martin on this thread extending his work.
In some cases SO_TXTIME sets timestamps for in monotonic, for FQ.
The issue here is that it may set in against other clocks, such as
CLOCK_TAI for the ETF qdisc.
Instead of adding yet another rarely used bit to sk_buff, could we
modify the existing bit to be clockid_delivery_time, and default
sk_clockid to CLOCK_MONOTONIC for TCP?
There is no conflict between SCM_TXTIME and TCP, because the explicit
cmsg makes no sense for TCP and indeed only RAW and DGRAM sockets
interpret it.
On 2/19/2024 8:26 AM, Willem de Bruijn wrote:
> Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 2/16/2024 10:11 AM, Willem de Bruijn wrote:
>>> Abhishek Chauhan wrote:
>>>> Bridge driver today has no support to forward the userspace timestamp
>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>> time sensitive packets. These changes will allow userspace timestamps
>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>
>>>> Signed-off-by: Abhishek Chauhan <[email protected]>
>>>> ---
>>>> Note:- I am a little skeptical of using bool inside the skbuff
>>>> structure as no one today has used bool so far in the struct.
>>>> (Expecting some comments from upstream for sure)
>>>>
>>>> I am also touching the heart of sk buff so i hope this is reviewed
>>>> thoroughly. I have crossed checked multiple times on all the ipv4
>>>> /ipv6 paths where userspace timestamp is populated. I tried as much
>>>> as possible to cover all the references and made sure i put my changes
>>>> in place.
>>>>
>>>> Bug description:- If the physical network interface is bridged the
>>>> etf packets are dropped since bridge driver before forwarding the packet
>>>> is setting the userspace timestamp to 0.
>>>>
>>>> Bridge driver call stack
>>>>
>>>> [ 157.120189] now is set to 1706054553072734733
>>>> [ 157.120194] tx time from SKB is 0 <== SKB when reaches the etf qdisc is 0
>>>> [ 157.120195] q->last time is 0
>>>> [ 157.120197] CPU: 3 PID: 9206 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
>>>> [ 157.120201] Hardware name: Qualcomm SA8775P Ride (DT)
>>>> [ 157.120202] Call trace:
>>>> [ 157.120203] dump_backtrace+0xb0/0x130
>>>> [ 157.120212] show_stack+0x1c/0x30
>>>> [ 157.120215] dump_stack_lvl+0x74/0x8c
>>>> [ 157.120220] dump_stack+0x14/0x24
>>>> [ 157.120223] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
>>>> [ 157.120230] dev_qdisc_enqueue+0x2c/0x110
>>>> [ 157.120234] __dev_xmit_skb+0x114/0x644
>>>> [ 157.120236] __dev_queue_xmit+0x31c/0x774
>>>> [ 157.120238] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
>>>> [ 157.120253] br_forward_finish+0xdc/0xec [bridge] <== This function is culprit as its making the tstamp as 0
>>>> [root@ecbldauto-lvarm04-lnx ~]# [ 157.120263] __br_forward+0xd8/0x210 [bridge]
>>>> [ 157.120272] br_forward+0x12c/0x150 [bridge]
>>>> [ 157.120281] br_dev_xmit+0x288/0x49c [bridge]
>>>> [ 157.120290] dev_hard_start_xmit+0xe4/0x2b4
>>>> [ 157.120292] __dev_queue_xmit+0x6ac/0x774
>>>> [ 157.120294] neigh_resolve_output+0x128/0x1ec
>>>> [ 157.120297] ip_finish_output2+0x184/0x54c
>>>> [ 157.120300] __ip_finish_output+0xa4/0x19c
>>>> [ 157.120302] ip_finish_output+0x38/0xf0
>>>> [ 157.120303] ip_output+0x13c/0x1f4
>>>> [ 157.120305] ip_send_skb+0x54/0x10c
>>>> [ 157.120307] udp_send_skb+0x128/0x394
>>>> [ 157.120310] udp_sendmsg+0x7e8/0xa6c
>>>> [ 157.120311] inet_sendmsg+0x48/0x70
>>>> [ 157.120313] sock_sendmsg+0x54/0x60
>>>> [ 157.120315] ____sys_sendmsg+0x1f8/0x254
>>>> [ 157.120316] ___sys_sendmsg+0x84/0xcc
>>>> [ 157.120318] __sys_sendmsg+0x60/0xb0
>>>> [ 157.120319] __arm64_sys_sendmsg+0x28/0x30
>>>> [ 157.120320] invoke_syscall.constprop.0+0x7c/0xd0
>>>> [ 157.120323] el0_svc_common.constprop.0+0x140/0x150
>>>> [ 157.120325] do_el0_svc+0x38/0xa0
>>>> [ 157.120327] el0_svc+0x38/0x1d0
>>>> [ 157.120329] el0t_64_sync_handler+0xb4/0x130
>>>> [ 157.120330] el0t_64_sync+0x17c/0x180
>>>>
>>>> After my changes:-
>>>> [ 2215.130148] now is set to 1706056610952501031
>>>> [ 2215.130154] tx time from SKB is 1706056610953467393 <== Time is forwarded to etf correctly
>>>> [ 2215.130155] q->last time is 1706056591423364609
>>>> [ 2215.130158] CPU: 1 PID: 108166 Comm: a.out Tainted: G W OE X ------- --- 5.14.0-999.323ES.test.el9.aarch64 #1
>>>> [ 2215.130162] Hardware name: Qualcomm SA8775P Ride (DT) [ 2215.130163] Call trace:
>>>> [ 2215.130164] dump_backtrace+0xb0/0x130
>>>> [ 2215.130172] show_stack+0x1c/0x30 [root@ecbldauto-lvarm04-lnx ~]#
>>>> [ 2215.130175] dump_stack_lvl+0x74/0x8c [ 2215.130181] dump_stack+0x14/0x24
>>>> [ 2215.130184] etf_enqueue_timesortedlist+0x114/0x20c [sch_etf]
>>>> [ 2215.130191] dev_qdisc_enqueue+0x2c/0x110
>>>> [ 2215.130197] __dev_xmit_skb+0x114/0x644
>>>> [ 2215.130200] __dev_queue_xmit+0x31c/0x774
>>>> [ 2215.130202] br_dev_queue_push_xmit+0xd4/0x120 [bridge]
>>>> [ 2215.130217] br_forward_finish+0xe4/0xf0 [bridge]
>>>> [ 2215.130226] __br_forward+0xd8/0x20c [bridge]
>>>> [ 2215.130235] br_forward+0x12c/0x150 [bridge]
>>>> [ 2215.130243] br_dev_xmit+0x288/0x49c [bridge]
>>>> [ 2215.130252] dev_hard_start_xmit+0xe4/0x2b4
>>>> [ 2215.130254] __dev_queue_xmit+0x6ac/0x774
>>>> [ 2215.130257] neigh_hh_output+0xcc/0x140
>>>> [ 2215.130260] ip_finish_output2+0x300/0x54c
>>>> [ 2215.130262] __ip_finish_output+0xa4/0x19c
>>>> [ 2215.130263] ip_finish_output+0x38/0xf0
>>>> [ 2215.130265] ip_output+0x13c/0x1f4
>>>> [ 2215.130267] ip_send_skb+0x54/0x110
>>>> [ 2215.130269] udp_send_skb+0x128/0x394
>>>> [ 2215.130271] udp_sendmsg+0x7e8/0xa6c
>>>> [ 2215.130272] inet_sendmsg+0x48/0x70
>>>> [ 2215.130275] sock_sendmsg+0x54/0x60
>>>> [ 2215.130277] ____sys_sendmsg+0x1f8/0x254
>>>> [ 2215.130278] ___sys_sendmsg+0x84/0xcc
>>>> [ 2215.130279] __sys_sendmsg+0x60/0xb0
>>>> [ 2215.130281] __arm64_sys_sendmsg+0x28/0x30
>>>> [ 2215.130282] invoke_syscall.constprop.0+0x7c/0xd0
>>>> [ 2215.130285] el0_svc_common.constprop.0+0x140/0x150
>>>> [ 2215.130287] do_el0_svc+0x38/0xa0
>>>> [ 2215.130289] el0_svc+0x38/0x1d0
>>>> [ 2215.130291] el0t_64_sync_handler+0xb4/0x130
>>>> [ 2215.130292] el0t_64_sync+0x17c/0x180
>>>>
>>>>
>>>> include/linux/skbuff.h | 13 +++++++++++++
>>>> include/net/inet_sock.h | 1 +
>>>> include/net/sock.h | 1 +
>>>> net/core/sock.c | 1 +
>>>> net/ipv4/ip_output.c | 3 +++
>>>> net/ipv4/raw.c | 1 +
>>>> net/ipv6/ip6_output.c | 2 ++
>>>> net/ipv6/raw.c | 1 +
>>>> net/packet/af_packet.c | 3 +++
>>>> 9 files changed, 26 insertions(+)
>>>>
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 2dde34c29203..b098b7d30b56 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -744,6 +744,7 @@ typedef unsigned char *sk_buff_data_t;
>>>> * @tstamp: Time we arrived/left
>>>> * @skb_mstamp_ns: (aka @tstamp) earliest departure time; start point
>>>> * for retransmit timer
>>>> + * @user_delivery_time: states that timestamp was populated from userspace
>>>> * @rbnode: RB tree node, alternative to next/prev for netem/tcp
>>>> * @list: queue head
>>>> * @ll_node: anchor in an llist (eg socket defer_list)
>>>> @@ -879,6 +880,8 @@ struct sk_buff {
>>>> ktime_t tstamp;
>>>> u64 skb_mstamp_ns; /* earliest departure time */
>>>> };
>>>> + /* States that time is from userspace */
>>>> + bool user_delivery_time;
>>>> /*
>>>> * This is the control buffer. It is free to use for every
>>>> * layer. Please put your private variables there. If you
>>>> @@ -4208,6 +4211,16 @@ static inline void skb_clear_tstamp(struct sk_buff *skb)
>>>> if (skb->mono_delivery_time)
>>>> return;
>>>>
>>>> + /* When userspace timestamp packets are forwarded via bridge
>>>> + * the br_forward_finish clears the tstamp and the tstamp
>>>> + * from the userspace is lost. Hence the check for user
>>>> + * delivery time. With the below check now tc-etf qdisc will
>>>> + * not end up dropping the packets if the packet is forwarded via
>>>> + * bridge interface.
>>>> + */
>>>> + if (skb->user_delivery_time)
>>>> + return;
>>>> +
>>>> skb->tstamp = 0;
>>>> }
>>>>
>>>> diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
>>>> index d94c242eb3ed..e7523545a493 100644
>>>> --- a/include/net/inet_sock.h
>>>> +++ b/include/net/inet_sock.h
>>>> @@ -175,6 +175,7 @@ struct inet_cork {
>>>> __u16 gso_size;
>>>> u64 transmit_time;
>>>> u32 mark;
>>>> + bool user_delivery_time;
>>>> };
>>>
>>> There's no need for a cork member, as by definition the fields in this
>>> struct are coming from userspace.
>>>
>>> There is a very high bar to adding new fields to sk_buff, because it
>>> is used by many paths and would be enormous if stuck with fields for
>>> every intersection between a pair of features.
>>>
>>> The goal here is for the bridge to disambiguate earliest delivery time
>>> timestamps from which? From those looped through ip forwarding? Why
>>> does the bridge zero the tstamp field at all? That might help finding
>>> a reasonable implementation.
>>>
>>> We have run in the issue of labeling the value of skb->tstamp before.
>>> With redirect and looping it is definitely subtle.
>>
>> Thanks for your comments Willem,
>>
>> There is a clear explanation of why this is needed as part of the below link
>> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
>>
>> From the above link Martin KaFai Lau has mentioned and i quote.
>>
>> "In the future, another bit (e.g. skb->user_delivery_time) can be added
>> for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid."
>
> Let's CC: Martin on this thread extending his work.
>
> In some cases SO_TXTIME sets timestamps for in monotonic, for FQ.
>
> The issue here is that it may set in against other clocks, such as
> CLOCK_TAI for the ETF qdisc.
>
> Instead of adding yet another rarely used bit to sk_buff, could we
> modify the existing bit to be clockid_delivery_time, and default
> sk_clockid to CLOCK_MONOTONIC for TCP?
>
> There is no conflict between SCM_TXTIME and TCP, because the explicit
> cmsg makes no sense for TCP and indeed only RAW and DGRAM sockets
> interpret it.
So if i understand your comment correctly. You are asking me to change the
1. Existing implementation of mono_delivery_time and extend it by changing the
bit to clockid_delivery_time
2. Set the clockid_delivery_time by default to CLOCK_MONOTONIC for TCP
3. When ever ETF is enabled with CLOCK_TAI then also use the same bit to
distinguish if the clockid is set from userspace and cmsg is passed
from userspace with tstamp
Note :- ETF also supports other clock id such as REALTIME, BOOTTIME, TAI and
MONOTONIC.
Martin, Please provide your comments too.