The bridge driver currently has no support for forwarding packets that
carry a userspace timestamp and ends up resetting the timestamp. The ETF
qdisc then checks the packet coming from userspace, finds its timestamp
to be 0, and drops the time-sensitive packet. These changes allow
packets with userspace timestamps to be forwarded from the bridge to
NIC drivers.

Set the same bit (mono_delivery_time) to avoid dropping userspace
tstamp packets in the forwarding path.

The existing functionality of mono_delivery_time remains unaltered; it
is only extended with userspace tstamp support for the bridge
forwarding path.

Signed-off-by: Abhishek Chauhan <[email protected]>
---
Changes since v2
- Updated the commit subject and message.
- Addressed comments from Willem to reuse mono_delivery_time, with
  comments and documentation in the header and source files.
- Addressed the comment from Andrew about the typo in the comment.
- Ran the existing selftest (so_txtime.sh) to make sure the existing
  implementation is not impacted, as requested by Paolo.
- Also ran internal validation of UDP packets using
  iperf/so_priority/so_txtime with MQPRIO + ETF offload.
- Test results are included below.
Test 1 :- FQ + ETF (SW path)
[root@ecbldauto-lvarm04-lnx ~]# ./so_txtime.sh
[ 280.640551] q->last time is 1707955476143297550
[ 283.338947] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 284.078429] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
SO_TXTIME ipv4 clock monotonic
payload:a delay:109 expected:0 (us)
SO_TXTIME ipv6 clock monotonic
payload:a delay:140 expected:0 (us)
SO_TXTIME ipv6 clock monotonic
payload:a delay:12739 expected:10000 (us)
SO_TXTIME ipv4 clock monotonic
payload:a delay:10054 expected:10000 (us)
payload:b delay:20043 expected:20000 (us)
SO_TXTIME ipv6 clock monotonic
payload:b delay:20078 expected:20000 (us)
payload:a delay:20177 expected:20000 (us)
SO_TXTIME ipv4 clock tai
send: pkt a at -1707955482913ms dropped: invalid txtime
[ 287.070504] now is set to 1707955482913404839
[ 287.070509] tx time from SKB is 0
./so_txtime: recv: timeout: Resource temporarily unavailable
SO_TXTIME ipv6 clock tai
send: pkt a at 0ms dropped: invalid txtime
[ 287.070510] q->last time is 0
[ 287.420590] now is set to 1707955483263491298
[ 287.420596] tx time from SKB is 1707955483263454527
./so_txtime: recv: timeout: Resource temporarily unavailable
SO_TXTIME ipv6 clock tai
[ 287.420597] q->last time is 0
[ 287.700598] now is set to 1707955483543498954
[ 287.700604] tx time from SKB is 1707955483553463173
payload:a delay:9655 expected:10000 (us)
SO_TXTIME ipv4 clock tai
[ 287.700605] q->last time is 0
[ 288.100532] now is set to 1707955483943432391
[ 288.100537] tx time from SKB is 1707955483953413016
payload:a delay:9668 expected:10000 (us)[ 288.100538] q->last time is 1707955483553463173
[ 288.100546] now is set to 1707955483943446975
[ 288.100547] tx time from SKB is 1707955483963413016
payload:b delay:20484 expected:20000 (us)
SO_TXTIME ipv6 clock tai
[ 288.100547] q->last time is 1707955483553463173
[ 288.440582] now is set to 1707955484283482495
[ 288.440587] tx time from SKB is 1707955484303452808
payload:b delay:9648 expected:10000 (us)[ 288.440588] q->last time is 1707955483963413016
[ 288.440598] now is set to 1707955484283499370
payload:a delay:22037 expected:20000 (us)
[ 288.440599] tx time from SKB is 1707955484293452808
OK. All tests passed
Test case 2 (MQPRIO + ETF HW offload)
[root@ecbldauto-lvarm04-lnx ~]# tc qdisc add dev eth0 handle 100: parent root mqprio num_tc 4 \
map 0 2 1 3 3 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 1@2 1@3\
hw 0
[root@ecbldauto-lvarm04-lnx ~]#
tc qdisc replace dev eth0 parent 100:4 etf \
clockid CLOCK_TAI delta 40000 offload skip_sock_check
[ 89.145838] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue test log 3, number of queues 4, qopt enable 1, tbs queue bit 1
[ 89.145846] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue 3
[root@ecbldauto-lvarm04-lnx ~]# ./a.out -4 -c tai -S 192.168.1.1 -D 192.168.1.2 a,1,b,2
SO_TXTIME ipv4 clock tai
glob_tstat = 1707955395256170394
[ 199.623650] now is set to 1707955395256215810
[ 199.623655] tx time from SKB is 1707955395257170394
[ 199.623656] q->last time is 0
[ 199.623663] now is set to 1707955395256230029
[ 199.623664] tx time from SKB is 1707955395258170394
[ 199.623665] q->last time is 0
[ 199.624589] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 257170394 nsec
[ 199.625573] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 258170394 nsec
Changes since v1
- Changed the commit subject, as I am modifying the mono_delivery_time
  bit with clockid_delivery_time.
- Followed Willem's suggestion to use the same bit for userspace
  delivery time, as there are no conflicts between TCP and SCM_TXTIME:
  an explicit cmsg makes no sense for TCP, and only RAW and DGRAM
  sockets interpret it.
- A clear explanation of why this is needed is mentioned below; this
  extends the work done by Martin for mono_delivery_time:
  https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
- The version 1 patch, which states the exact problem with tc-etf, and
  the discussion that took place can be found at:
  https://lore.kernel.org/all/[email protected]/
include/linux/skbuff.h | 4 ++++
net/ipv4/ip_output.c | 7 +++++++
net/ipv4/raw.c | 7 +++++++
net/ipv6/ip6_output.c | 8 +++++++-
net/ipv6/raw.c | 8 +++++++-
net/packet/af_packet.c | 8 +++++++-
6 files changed, 39 insertions(+), 3 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2dde34c29203..58586d56b19f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -820,6 +820,10 @@ typedef unsigned char *sk_buff_data_t;
* delivery_time in mono clock base (i.e. EDT). Otherwise, the
* skb->tstamp has the (rcv) timestamp at ingress and
* delivery_time at egress.
+ * This bit is also set for tstamp coming from userspace which
+ * acts as an information in the bridge forwarding path to avoid
+ * resetting the tstamp value when user sets the timestamp using
+ * SO_TXTIME sockopts.
* @napi_id: id of the NAPI struct this skb came from
* @sender_cpu: (aka @napi_id) source CPU in XPS
* @alloc_cpu: CPU which did the skb allocation.
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 5b5a0adb927f..4ae6aea8f8d6 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1455,6 +1455,13 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
skb->mark = cork->mark;
skb->tstamp = cork->transmit_time;
+ /* Timestamp coming from userspace using CMSG is stored as part
+ * of transmit_time as part of cork. To ensure bridge does not
+ * drop the tstamp in the forwarding path.We are reusing bit
+ * mono_delivery_time to avoid reset of tstamp in bridge
+ * forwarding path.
+ */
+ skb->mono_delivery_time = !!skb->tstamp;
/*
* Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
* on dst refcount
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index aea89326c697..6e67c0203be8 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -353,6 +353,13 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc->mark;
skb->tstamp = sockc->transmit_time;
+ /* Timestamp coming from userspace using CMSG is stored as part
+ * of transmit_time as part of sockcmcookie. To ensure bridge does not
+ * drop the tstamp in the forwarding path. We are reusing bit
+ * mono_delivery_time to avoid reset of tstamp in bridge
+ * forwarding path.
+ */
+ skb->mono_delivery_time = !!skb->tstamp;
skb_dst_set(skb, &rt->dst);
*rtp = NULL;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a722a43dd668..f5b5e13a920f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1922,7 +1922,13 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = cork->base.mark;
skb->tstamp = cork->base.transmit_time;
-
+ /* Timestamp coming from userspace using CMSG is stored as part
+ * of transmit_time as part of cork. To ensure bridge does not
+ * drop the tstamp in the forwarding path. We are reusing bit
+ * mono_delivery_time to avoid reset of tstamp in bridge
+ * forwarding path.
+ */
+ skb->mono_delivery_time = !!skb->tstamp;
ip6_cork_steal_dst(skb, cork);
IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
if (proto == IPPROTO_ICMPV6) {
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 03dbb874c363..d2e2a1ec3de4 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -616,7 +616,13 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = sockc->mark;
skb->tstamp = sockc->transmit_time;
-
+ /* Timestamp coming from userspace using CMSG is stored as part
+ * of transmit_time as part of sockcmcookie. To ensure bridge does not
+ * drop the tstamp in the forwarding path.We are reusing bit
+ * mono_delivery_time to avoid reset of tstamp in bridge
+ * forwarding path.
+ */
+ skb->mono_delivery_time = !!skb->tstamp;
skb_put(skb, length);
skb_reset_network_header(skb);
iph = ipv6_hdr(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9bbc2686690..949e936b5786 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2057,7 +2057,13 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
skb->priority = READ_ONCE(sk->sk_priority);
skb->mark = READ_ONCE(sk->sk_mark);
skb->tstamp = sockc.transmit_time;
-
+ /* Timestamp coming from userspace using CMSG is stored as part
+ * of transmit_time as part of sockcmcookie. To ensure bridge does not
+ * drop the tstamp in the forwarding path. We are reusing bit
+ * mono_delivery_time to avoid reset of tstamp in bridge
+ * forwarding path.
+ */
+ skb->mono_delivery_time = !!skb->tstamp;
skb_setup_tx_timestamp(skb, sockc.tsflags);
if (unlikely(extra_len == 4))
--
2.25.1
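
As a reading aid for the problem this patch describes, the sketch below models the forwarding behaviour in plain userspace C. It is not kernel code: `struct model_skb`, `model_clear_tstamp()` and `model_etf_accepts()` are hypothetical names introduced here. The first function follows the logic of the kernel's `skb_clear_tstamp()`; the second is a heavily reduced stand-in for the ETF qdisc's validity check, which in the real qdisc also inspects the socket and its clockid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct sk_buff: only the two fields discussed here. */
struct model_skb {
	int64_t tstamp;          /* nanoseconds; 0 means "no timestamp" */
	bool mono_delivery_time; /* the bit this patch reuses */
};

/* Same logic as skb_clear_tstamp(): the timestamp survives the
 * forwarding path only when the bit is set.
 */
static void model_clear_tstamp(struct model_skb *skb)
{
	if (skb->mono_delivery_time)
		return;
	skb->tstamp = 0;
}

/* Reduced model (an assumption, not the real ETF code): a txtime that
 * lies before "now" or before the last dequeued txtime is rejected,
 * so a zeroed timestamp is always dropped.
 */
static bool model_etf_accepts(const struct model_skb *skb,
			      int64_t now, int64_t last)
{
	return skb->tstamp >= now && skb->tstamp >= last;
}
```

In this model, a packet sent with a txtime but without the bit has its timestamp cleared and is then rejected, while the same packet with the bit set keeps its txtime and is accepted.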
Abhishek Chauhan wrote:
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2dde34c29203..58586d56b19f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -820,6 +820,10 @@ typedef unsigned char *sk_buff_data_t;
> * delivery_time in mono clock base (i.e. EDT). Otherwise, the
> * skb->tstamp has the (rcv) timestamp at ingress and
> * delivery_time at egress.
> + * This bit is also set for tstamp coming from userspace which
> + * acts as an information in the bridge forwarding path to avoid
> + * resetting the tstamp value when user sets the timestamp using
> + * SO_TXTIME sockopts.
There are multiple applications of this information aside from
bridging. I'd drop that and instead rewrite the existing. Something
like
"delivery_time in mono clock base (i.e., EDT) or a clock base chosen
by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at
ingress."
> * @napi_id: id of the NAPI struct this skb came from
> * @sender_cpu: (aka @napi_id) source CPU in XPS
> * @alloc_cpu: CPU which did the skb allocation.
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 5b5a0adb927f..4ae6aea8f8d6 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1455,6 +1455,13 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
> skb->mark = cork->mark;
> skb->tstamp = cork->transmit_time;
> + /* Timestamp coming from userspace using CMSG is stored as part
> + * of transmit_time as part of cork. To ensure bridge does not
> + * drop the tstamp in the forwarding path.We are reusing bit
> + * mono_delivery_time to avoid reset of tstamp in bridge
> + * forwarding path.
> + */
> + skb->mono_delivery_time = !!skb->tstamp;
This patch adds too much verbose commentary, repeated multiple times,
for such a small change. Keep only the comment in skbuff.h.
> /*
> * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
> * on dst refcount
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index aea89326c697..6e67c0203be8 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -353,6 +353,13 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
> skb->priority = READ_ONCE(sk->sk_priority);
> skb->mark = sockc->mark;
> skb->tstamp = sockc->transmit_time;
> + /* Timestamp coming from userspace using CMSG is stored as part
> + * of transmit_time as part of sockcmcookie. To ensure bridge does not
> + * drop the tstamp in the forwarding path. We are reusing bit
> + * mono_delivery_time to avoid reset of tstamp in bridge
> + * forwarding path.
> + */
> + skb->mono_delivery_time = !!skb->tstamp;
> skb_dst_set(skb, &rt->dst);
> *rtp = NULL;
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index a722a43dd668..f5b5e13a920f 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1922,7 +1922,13 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
> skb->priority = READ_ONCE(sk->sk_priority);
> skb->mark = cork->base.mark;
> skb->tstamp = cork->base.transmit_time;
> -
> + /* Timestamp coming from userspace using CMSG is stored as part
> + * of transmit_time as part of cork. To ensure bridge does not
> + * drop the tstamp in the forwarding path. We are reusing bit
> + * mono_delivery_time to avoid reset of tstamp in bridge
> + * forwarding path.
> + */
> + skb->mono_delivery_time = !!skb->tstamp;
> ip6_cork_steal_dst(skb, cork);
> IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
> if (proto == IPPROTO_ICMPV6) {
> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
> index 03dbb874c363..d2e2a1ec3de4 100644
> --- a/net/ipv6/raw.c
> +++ b/net/ipv6/raw.c
> @@ -616,7 +616,13 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
> skb->priority = READ_ONCE(sk->sk_priority);
> skb->mark = sockc->mark;
> skb->tstamp = sockc->transmit_time;
> -
> + /* Timestamp coming from userspace using CMSG is stored as part
> + * of transmit_time as part of sockcmcookie. To ensure bridge does not
> + * drop the tstamp in the forwarding path.We are reusing bit
> + * mono_delivery_time to avoid reset of tstamp in bridge
> + * forwarding path.
> + */
> + skb->mono_delivery_time = !!skb->tstamp;
> skb_put(skb, length);
> skb_reset_network_header(skb);
> iph = ipv6_hdr(skb);
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index c9bbc2686690..949e936b5786 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2057,7 +2057,13 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
> skb->priority = READ_ONCE(sk->sk_priority);
> skb->mark = READ_ONCE(sk->sk_mark);
> skb->tstamp = sockc.transmit_time;
> -
> + /* Timestamp coming from userspace using CMSG is stored as part
> + * of transmit_time as part of sockcmcookie. To ensure bridge does not
> + * drop the tstamp in the forwarding path. We are reusing bit
> + * mono_delivery_time to avoid reset of tstamp in bridge
> + * forwarding path.
> + */
> + skb->mono_delivery_time = !!skb->tstamp;
Search for all occurrences of skb->tstamp getting initialized from
sockc.transmit_time. af_packet.c has three such cases.
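
One way to keep every such call site consistent would be a small helper that sets the timestamp and the bit together. The following is only a userspace sketch of that idea: `model_skb_set_txtime()` and `struct model_skb` are hypothetical names, not existing kernel APIs, and nothing in this thread says the final patch takes this form.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the two sk_buff fields involved. */
struct model_skb {
	int64_t tstamp;
	bool mono_delivery_time;
};

/* Hypothetical helper: store the userspace transmit time and mark the
 * skb so that the forwarding path does not reset it. A zero
 * transmit_time leaves the bit clear, matching "!!skb->tstamp" in the
 * patch.
 */
static inline void model_skb_set_txtime(struct model_skb *skb,
					int64_t transmit_time)
{
	skb->tstamp = transmit_time;
	skb->mono_delivery_time = !!transmit_time;
}
```

With a helper of this shape, each place that currently assigns `skb->tstamp` from `transmit_time` would become a single call, which makes a missed site easier to spot with a search.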
> skb_setup_tx_timestamp(skb, sockc.tsflags);
>
> if (unlikely(extra_len == 4))
> --
> 2.25.1
>
On 3/1/2024 10:45 AM, Willem de Bruijn wrote:
> Abhishek Chauhan wrote:
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 2dde34c29203..58586d56b19f 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -820,6 +820,10 @@ typedef unsigned char *sk_buff_data_t;
>> * delivery_time in mono clock base (i.e. EDT). Otherwise, the
>> * skb->tstamp has the (rcv) timestamp at ingress and
>> * delivery_time at egress.
>> + * This bit is also set for tstamp coming from userspace which
>> + * acts as an information in the bridge forwarding path to avoid
>> + * resetting the tstamp value when user sets the timestamp using
>> + * SO_TXTIME sockopts.
>
> There are multiple applications of this information aside from
> bridging. I'd drop that and instead rewrite the existing. Something
> like
>
> "delivery_time in mono clock base (i.e., EDT) or a clock base chosen
> by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at
> ingress."
>
Will make the changes accordingly.
>> * @napi_id: id of the NAPI struct this skb came from
>> * @sender_cpu: (aka @napi_id) source CPU in XPS
>> * @alloc_cpu: CPU which did the skb allocation.
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index 5b5a0adb927f..4ae6aea8f8d6 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -1455,6 +1455,13 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>> skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>> skb->mark = cork->mark;
>> skb->tstamp = cork->transmit_time;
>> + /* Timestamp coming from userspace using CMSG is stored as part
>> + * of transmit_time as part of cork. To ensure bridge does not
>> + * drop the tstamp in the forwarding path.We are reusing bit
>> + * mono_delivery_time to avoid reset of tstamp in bridge
>> + * forwarding path.
>> + */
>> + skb->mono_delivery_time = !!skb->tstamp;
>
> This patch adds too much verbose commentary, repeated multiple times,
> for such a small change. Keep only the comment in skbuff.h.
>
Got it. I was thinking of the same. I will make the change.
>> /*
>> * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
>> * on dst refcount
>> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
>> index aea89326c697..6e67c0203be8 100644
>> --- a/net/ipv4/raw.c
>> +++ b/net/ipv4/raw.c
>> @@ -353,6 +353,13 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
>> skb->priority = READ_ONCE(sk->sk_priority);
>> skb->mark = sockc->mark;
>> skb->tstamp = sockc->transmit_time;
>> + /* Timestamp coming from userspace using CMSG is stored as part
>> + * of transmit_time as part of sockcmcookie. To ensure bridge does not
>> + * drop the tstamp in the forwarding path. We are reusing bit
>> + * mono_delivery_time to avoid reset of tstamp in bridge
>> + * forwarding path.
>> + */
>> + skb->mono_delivery_time = !!skb->tstamp;
>> skb_dst_set(skb, &rt->dst);
>> *rtp = NULL;
>>
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index a722a43dd668..f5b5e13a920f 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1922,7 +1922,13 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
>> skb->priority = READ_ONCE(sk->sk_priority);
>> skb->mark = cork->base.mark;
>> skb->tstamp = cork->base.transmit_time;
>> -
>> + /* Timestamp coming from userspace using CMSG is stored as part
>> + * of transmit_time as part of cork. To ensure bridge does not
>> + * drop the tstamp in the forwarding path. We are reusing bit
>> + * mono_delivery_time to avoid reset of tstamp in bridge
>> + * forwarding path.
>> + */
>> + skb->mono_delivery_time = !!skb->tstamp;
>> ip6_cork_steal_dst(skb, cork);
>> IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
>> if (proto == IPPROTO_ICMPV6) {
>> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
>> index 03dbb874c363..d2e2a1ec3de4 100644
>> --- a/net/ipv6/raw.c
>> +++ b/net/ipv6/raw.c
>> @@ -616,7 +616,13 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
>> skb->priority = READ_ONCE(sk->sk_priority);
>> skb->mark = sockc->mark;
>> skb->tstamp = sockc->transmit_time;
>> -
>> + /* Timestamp coming from userspace using CMSG is stored as part
>> + * of transmit_time as part of sockcmcookie. To ensure bridge does not
>> + * drop the tstamp in the forwarding path.We are reusing bit
>> + * mono_delivery_time to avoid reset of tstamp in bridge
>> + * forwarding path.
>> + */
>> + skb->mono_delivery_time = !!skb->tstamp;
>> skb_put(skb, length);
>> skb_reset_network_header(skb);
>> iph = ipv6_hdr(skb);
>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index c9bbc2686690..949e936b5786 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -2057,7 +2057,13 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
>> skb->priority = READ_ONCE(sk->sk_priority);
>> skb->mark = READ_ONCE(sk->sk_mark);
>> skb->tstamp = sockc.transmit_time;
>> -
>> + /* Timestamp coming from userspace using CMSG is stored as part
>> + * of transmit_time as part of sockcmcookie. To ensure bridge does not
>> + * drop the tstamp in the forwarding path. We are reusing bit
>> + * mono_delivery_time to avoid reset of tstamp in bridge
>> + * forwarding path.
>> + */
>> + skb->mono_delivery_time = !!skb->tstamp;
>
> Search for all occurrences of skb->tstamp getting initialized from
> sockc.transmit_time. af_packet.c has three such cases.
>
Let me check and add at every instance.
>> skb_setup_tx_timestamp(skb, sockc.tsflags);
>>
>> if (unlikely(extra_len == 4))
>> --
>> 2.25.1
>>
>
>