A BPF application, e.g., a TCP congestion control, might benefit from or
even require precise (=hardware) packet timestamps. These timestamps are
already available through __sk_buff.hwtstamp and
bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
not allowed to set SO_TIMESTAMPING* on sockets.
Enable BPF programs to actively request the generation of timestamps
from a stream socket. The ioctl(SIOCSHWTSTAMP) on the network device,
which is also required, must still be done separately, in user space.
This patch had previously been submitted in a two-part series (first
link below). The second patch has been independently applied in commit
7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)")
(second link below).
On the earlier submission, there was the open question whether to only
allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
For a BPF program, this won't make a difference: A timestamp, when
accessed through the fields mentioned above, is directly read from
skb_shared_info.hwtstamps, independent of the places where NEW/OLD is
relevant. See bpf_convert_ctx_access() besides others.
I am unsure, though, when it comes to the interconnection of user space
and BPF "space", when both are interested in the timestamps. I think it
would cause an unsolvable conflict when user space is bound to use
SO_TIMESTAMPING_OLD with a BPF program only allowed to set
SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm
mistaken.
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/all/[email protected]/
Cc: Arnd Bergmann <[email protected]>
Cc: Deepa Dinamani <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Signed-off-by: Jörn-Thorben Hinz <[email protected]>
---
include/uapi/linux/bpf.h | 3 ++-
net/core/filter.c | 2 ++
tools/include/uapi/linux/bpf.h | 3 ++-
tools/testing/selftests/bpf/progs/bpf_tracing_net.h | 2 ++
tools/testing/selftests/bpf/progs/setget_sockopt.c | 4 ++++
5 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 754e68ca8744..8825d0648efe 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2734,7 +2734,8 @@ union bpf_attr {
* **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
* **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
- * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
+ * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**,
+ * **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**.
* * **IPPROTO_TCP**, which supports the following *optname*\ s:
* **TCP_CONGESTION**, **TCP_BPF_IW**,
* **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
diff --git a/net/core/filter.c b/net/core/filter.c
index 8c9f67c81e22..4f5280874fd8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5144,6 +5144,8 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
case SO_MAX_PACING_RATE:
case SO_BINDTOIFINDEX:
case SO_TXREHASH:
+ case SO_TIMESTAMPING_NEW:
+ case SO_TIMESTAMPING_OLD:
if (*optlen != sizeof(int))
return -EINVAL;
break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7f24d898efbb..09eaafa6ab43 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2734,7 +2734,8 @@ union bpf_attr {
* **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
* **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
- * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
+ * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**,
+ * **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**.
* * **IPPROTO_TCP**, which supports the following *optname*\ s:
* **TCP_CONGESTION**, **TCP_BPF_IW**,
* **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 1bdc680b0e0e..95f5f169819e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -15,8 +15,10 @@
#define SO_RCVLOWAT 18
#define SO_BINDTODEVICE 25
#define SO_MARK 36
+#define SO_TIMESTAMPING_OLD 37
#define SO_MAX_PACING_RATE 47
#define SO_BINDTOIFINDEX 62
+#define SO_TIMESTAMPING_NEW 65
#define SO_TXREHASH 74
#define __SO_ACCEPTCON (1 << 16)
diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c b/tools/testing/selftests/bpf/progs/setget_sockopt.c
index 7a438600ae98..54205d10793c 100644
--- a/tools/testing/selftests/bpf/progs/setget_sockopt.c
+++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c
@@ -48,6 +48,10 @@ static const struct sockopt_test sol_socket_tests[] = {
{ .opt = SO_MARK, .new = 0xeb9f, .expected = 0xeb9f, },
{ .opt = SO_MAX_PACING_RATE, .new = 0xeb9f, .expected = 0xeb9f, },
{ .opt = SO_TXREHASH, .flip = 1, },
+ { .opt = SO_TIMESTAMPING_NEW, .new = SOF_TIMESTAMPING_RX_HARDWARE,
+ .expected = SOF_TIMESTAMPING_RX_HARDWARE, },
+ { .opt = SO_TIMESTAMPING_OLD, .new = SOF_TIMESTAMPING_RX_HARDWARE,
+ .expected = SOF_TIMESTAMPING_RX_HARDWARE, },
{ .opt = 0, },
};
--
2.39.2
Jörn-Thorben Hinz wrote:
> A BPF application, e.g., a TCP congestion control, might benefit from or
> even require precise (=hardware) packet timestamps. These timestamps are
> already available through __sk_buff.hwtstamp and
> bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
> not allowed to set SO_TIMESTAMPING* on sockets.
>
> Enable BPF programs to actively request the generation of timestamps
> from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
> network device must still be done separately, in user space.
>
> This patch had previously been submitted in a two-part series (first
> link below). The second patch has been independently applied in commit
> 7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)")
> (second link below).
>
> On the earlier submission, there was the open question whether to only
> allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
>
> For a BPF program, this won't make a difference: A timestamp, when
> accessed through the fields mentioned above, is directly read from
> skb_shared_info.hwtstamps, independent of the places where NEW/OLD is
> relevant. See bpf_convert_ctx_access() besides others.
>
> I am unsure, though, when it comes to the interconnection of user space
> and BPF "space", when both are interested in the timestamps. I think it
> would cause an unsolvable conflict when user space is bound to use
> SO_TIMESTAMPING_OLD with a BPF program only allowed to set
> SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm
> mistaken.
The difference between OLD and NEW only affects the system calls. It
is not reflected in how the data is stored in the skb, or how BPF can
read the data. A process setting SO_TIMESTAMPING_OLD will still allow
BPF to read data using SO_TIMESTAMPING_NEW.
But the one place where I see a conflict is in setting sock_flag
SOCK_TSTAMP_NEW. That affects what getsockopt returns and which cmsg
is written:
    if (sock_flag(sk, SOCK_TSTAMP_NEW))
            put_cmsg_scm_timestamping64(msg, tss);
    else
            put_cmsg_scm_timestamping(msg, tss);
So a process could issue setsockopt SO_TIMESTAMPING_OLD followed by
a BPF program that issues setsockopt SO_TIMESTAMPING_NEW and this
would flip SOCK_TSTAMP_NEW.
Just allowing BPF to set SO_TIMESTAMPING_OLD does not fix it, as it
just adds the inverse case.
A related problem is how the BPF program knows which of the two
variants to set. The BPF program is usually compiled and loaded
independently of the running process.
Perhaps one option is to fail the setsockopt if it would flip
sock_flag SOCK_TSTAMP_NEW. But only if called from BPF, as otherwise
it changes existing ABI.
Then a BPF program can attempt to set SO_TIMESTAMPING_NEW, be
prepared to handle a particular errno, and retry with
SO_TIMESTAMPING_OLD.
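For illustration only, the proposed rule and the resulting retry logic
could be modeled in plain user-space C as below. This is not kernel
code: struct model_sock, model_bpf_set_timestamping() and
model_request_timestamps() are made up for this sketch, and EBUSY is
merely an assumed placeholder for the "particular errno", which is not
specified above. A real BPF program would call bpf_setsockopt() instead.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the socket state discussed above. */
struct model_sock {
	bool tstamp_configured;	/* user space already set SO_TIMESTAMPING_* */
	bool tstamp_new;	/* models sock_flag(sk, SOCK_TSTAMP_NEW) */
	int tsflags;		/* models sk->sk_tsflags */
};

/*
 * Models the proposed BPF-only rule: refuse a request that would flip
 * SOCK_TSTAMP_NEW on a socket that already has timestamping configured.
 * EBUSY is an assumed placeholder; the thread names no specific errno.
 */
static int model_bpf_set_timestamping(struct model_sock *sk, bool want_new,
				      int flags)
{
	if (sk->tstamp_configured && sk->tstamp_new != want_new)
		return -EBUSY;

	sk->tstamp_configured = true;
	sk->tstamp_new = want_new;
	sk->tsflags = flags;
	return 0;
}

/* What a BPF program would do: try NEW first, fall back to OLD. */
static int model_request_timestamps(struct model_sock *sk, int flags)
{
	int err = model_bpf_set_timestamping(sk, true, flags);

	if (err == -EBUSY)
		err = model_bpf_set_timestamping(sk, false, flags);
	return err;
}
```

In this model, a socket on which user space already chose OLD keeps
SOCK_TSTAMP_NEW cleared after the retry, while an unconfigured socket
ends up with NEW.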
> Link: https://lore.kernel.org/lkml/[email protected]/
> Link: https://lore.kernel.org/all/[email protected]/
> Cc: Arnd Bergmann <[email protected]>
> Cc: Deepa Dinamani <[email protected]>
> Cc: Willem de Bruijn <[email protected]>
> Signed-off-by: Jörn-Thorben Hinz <[email protected]>
On 1/16/24 7:17 AM, Willem de Bruijn wrote:
> Jörn-Thorben Hinz wrote:
>> A BPF application, e.g., a TCP congestion control, might benefit from or
>> even require precise (=hardware) packet timestamps. These timestamps are
>> already available through __sk_buff.hwtstamp and
>> bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
>> not allowed to set SO_TIMESTAMPING* on sockets.
This patch only uses SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about
the others? E.g., SOF_TIMESTAMPING_TX_* will affect sk->sk_error_queue, which
does not seem good. If the rx tstamp is useful, shouldn't the tx tstamp be
useful as well?
>>
>> Enable BPF programs to actively request the generation of timestamps
>> from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
>> network device must still be done separately, in user space.
hmm... so both the ioctl(SIOCSHWTSTAMP) on the netdevice and
SOF_TIMESTAMPING_RX_HARDWARE on the sk must be done?
I am likely missing something. When the skb is created in the driver rx path,
the sk is not known yet. How does SOF_TIMESTAMPING_RX_HARDWARE of the sk affect
skb_shinfo(skb)->hwtstamps?
Martin KaFai Lau wrote:
> On 1/16/24 7:17 AM, Willem de Bruijn wrote:
> > Jörn-Thorben Hinz wrote:
> >> A BPF application, e.g., a TCP congestion control, might benefit from or
> >> even require precise (=hardware) packet timestamps. These timestamps are
> >> already available through __sk_buff.hwtstamp and
> >> bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
> >> not allowed to set SO_TIMESTAMPING* on sockets.
>
> This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about
> others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the sk->sk_error_queue
> which seems not good. If rx tstamp is useful, tx tstamp should be useful also?
Good point. Or they should not be allowed to be set from BPF.
That significantly changes process behavior, e.g., by returning POLLERR.
> >>
> >> Enable BPF programs to actively request the generation of timestamps
> >> from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
> >> network device must still be done separately, in user space.
>
> hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and the
> SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?
>
> I likely miss something. When skb is created in the driver rx path, the sk is
> not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE of the sk affects the
> skb_shinfo(skb)->hwtstamps?
Indeed it does not seem to do anything in the datapath.
Requesting SOF_TIMESTAMPING_RX_SOFTWARE will call net_enable_timestamp
to start timestamping packets.
But SOF_TIMESTAMPING_RX_HARDWARE does no such thing.
Drivers do use it in ethtool get_ts_info to signal hardware
capabilities. But those must be configured using the ioctl.
It is there more for consistency with the other timestamp recording
options, I suppose.
On 1/17/24 7:55 AM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 1/16/24 7:17 AM, Willem de Bruijn wrote:
>>> Jörn-Thorben Hinz wrote:
>>>> A BPF application, e.g., a TCP congestion control, might benefit from or
>>>> even require precise (=hardware) packet timestamps. These timestamps are
>>>> already available through __sk_buff.hwtstamp and
>>>> bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
>>>> not allowed to set SO_TIMESTAMPING* on sockets.
>>
>> This patch only uses the SOF_TIMESTAMPING_RX_HARDWARE in the selftest. How about
>> others? e.g. the SOF_TIMESTAMPING_TX_* that will affect the sk->sk_error_queue
>> which seems not good. If rx tstamp is useful, tx tstamp should be useful also?
>
> Good point. Or should not be allowed to be set from BPF.
>
> That significantly changes process behavior, e.g., by returning POLLERR.
>
>>>>
>>>> Enable BPF programs to actively request the generation of timestamps
>>>> from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
>>>> network device must still be done separately, in user space.
>>
>> hmm... so both ioctl(SIOCSHWTSTAMP) of the netdevice and the
>> SOF_TIMESTAMPING_RX_HARDWARE of the sk must be done?
>>
>> I likely miss something. When skb is created in the driver rx path, the sk is
>> not known yet though. How the SOF_TIMESTAMPING_RX_HARDWARE of the sk affects the
>> skb_shinfo(skb)->hwtstamps?
>
> Indeed it does not seem to do anything in the datapath.
>
> Requesting SOF_TIMESTAMPING_RX_SOFTWARE will call net_enable_timestamp
> to start timestamping packets.
>
> But SOF_TIMESTAMPING_RX_HARDWARE does no such thing.
>
> Drivers do use it in ethtool get_ts_info to signal hardware
> capabilities. But those must be configured using the ioctl.
>
> It is there more for consistency with the other timestamp recording
> options, I suppose.
>
Thanks for the explanation on the SOF_TIMESTAMPING_RX_{HARDWARE,SOFTWARE}.
__sk_buff.hwtstamp should have the NIC rx timestamp then as long as the NIC is
ioctl configured.
Jorn, do you need RX_SOFTWARE? From looking at net_timestamp_set(), any socket
that has requested RX_SOFTWARE should be enough to get a skb->tstamp for all
skbs. A workaround is to manually create a socket and turn on RX_SOFTWARE.
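A minimal user-space sketch of that workaround, assuming a throwaway UDP
socket is acceptable (the helper name open_rx_software_tstamp_socket()
is made up for illustration):

```c
#include <sys/socket.h>
#include <unistd.h>
#include <linux/net_tstamp.h>

/*
 * Open a throwaway UDP socket and request software RX timestamps on it.
 * Returns the fd on success (keep it open for as long as the timestamps
 * are needed) or -1 on error.
 */
static int open_rx_software_tstamp_socket(void)
{
	int flags = SOF_TIMESTAMPING_RX_SOFTWARE;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return -1;

	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
		       &flags, sizeof(flags)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The socket is never used for traffic here; it only exists to keep the
RX_SOFTWARE request alive, so closing it would drop the request again.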
It would still be nice to get proper bpf_setsockopt() support for RX_SOFTWARE,
but it should be considered together with how SOF_TIMESTAMPING_TX_* should work
in a bpf prog, considering that TX tstamping does not have a workaround like
RX_SOFTWARE.
It is probably cleaner to have a separate bit in sk->sk_tsflags for bpf, such
that the bpf prog won't be affected by userspace turning it on/off and it also
won't change userspace's expectations (e.g. sk_error_queue and POLLERR).
The part that needs more thought for the tx tstamp is how to notify the bpf prog
to consume it. Potentially the kernel can invoke a bpf prog to collect the tx
timestamp when the bpf bit in sk->sk_tsflags is set. An example of how TCP-CC
would use it will help in thinking about the approach here.
Jörn-Thorben Hinz wrote:
> On Tue, 2024-01-16 at 10:17 -0500, Willem de Bruijn wrote:
> > Jörn-Thorben Hinz wrote:
> > > A BPF application, e.g., a TCP congestion control, might benefit
> > > from or
> > > even require precise (=hardware) packet timestamps. These
> > > timestamps are
> > > already available through __sk_buff.hwtstamp and
> > > bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs
> > > were
> > > not allowed to set SO_TIMESTAMPING* on sockets.
> > >
> > > Enable BPF programs to actively request the generation of
> > > timestamps
> > > from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
> > > network device must still be done separately, in user space.
> > >
> > > This patch had previously been submitted in a two-part series
> > > (first
> > > link below). The second patch has been independently applied in
> > > commit
> > > 7f6ca95d16b9 ("net: Implement missing
> > > getsockopt(SO_TIMESTAMPING_NEW)")
> > > (second link below).
> > >
> > > On the earlier submission, there was the open question whether to
> > > only
> > > allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
> > >
> > > For a BPF program, this won't make a difference: A timestamp, when
> > > accessed through the fields mentioned above, is directly read from
> > > skb_shared_info.hwtstamps, independent of the places where NEW/OLD
> > > is
> > > relevant. See bpf_convert_ctx_access() besides others.
> > >
> > > I am unsure, though, when it comes to the interconnection of user
> > > space
> > > and BPF "space", when both are interested in the timestamps. I
> > > think it
> > > would cause an unsolvable conflict when user space is bound to use
> > > SO_TIMESTAMPING_OLD with a BPF program only allowed to set
> > > SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm
> > > mistaken.
> >
> > The difference between OLD and NEW only affects the system calls. It
> > is not reflected in how the data is stored in the skb, or how BPF can
> > read the data. A process setting SO_TIMESTAMPING_OLD will still allow
> > BPF to read data using SO_TIMESTAMPING_NEW.
> >
> > But the one place where I see a conflict is in setting sock_flag
> > SOCK_TSTAMP_NEW. That affects what getsockopt returns and which cmsg
> > is written:
> >
> >     if (sock_flag(sk, SOCK_TSTAMP_NEW))
> >             put_cmsg_scm_timestamping64(msg, tss);
> >     else
> >             put_cmsg_scm_timestamping(msg, tss);
> >
> > So a process could issue setsockopt SO_TIMESTAMPING_OLD followed by
> > a BPF program that issues setsockopt SO_TIMESTAMPING_NEW and this
> > would flip SOCK_TSTAMP_NEW.
> >
> > Just allowing BPF to set SO_TIMESTAMPING_OLD does not fix it, as it
> > just adds the inverse case.
> Thanks for elaborating on this. I see I only thought of half the
> possible conflicting situations.
>
> >
> > A related problem is how does the BPF program know which of the two
> > variants to set. The BPF program is usually compiled and loaded
> > independently of the running process.
> True, that is an additional challenge. And with respect to CO-RE, I
> think a really portable BPF program could (or at least should) not even
> decide on NEW or OLD at compile time.
>
> >
> > Perhaps one option is to fail the setsockopt if it would flip
> > sock_flag SOCK_TSTAMP_NEW. But only if called from BPF, as else it
> > changes existing ABI.
> >
> > Then a BPF program can attempt to set SO_TIMESTAMPING_NEW, be
> > prepared to handle a particular errno, and retry with
> > SO_TIMESTAMPING_OLD.
> Hmm, would be possible, yes. But sounds like a weird and unexpected
> special-case behavior to the occasional BPF user.
Agreed. So perhaps we're back to where we say: this is a new feature
for BPF, only support it on modern environments that use
SO_TIMESTAMPING_NEW?