2022-01-13 07:03:07

by Menglong Dong

[permalink] [raw]
Subject: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

From: Menglong Dong <[email protected]>

The description of 'dst_port' in 'struct bpf_sock' is not accurated.
In fact, 'dst_port' is not in network byte order, it is 'partly' in
network byte order.

We can see it in bpf_sock_convert_ctx_access():

> case offsetof(struct bpf_sock, dst_port):
> *insn++ = BPF_LDX_MEM(
> BPF_FIELD_SIZEOF(struct sock_common, skc_dport),
> si->dst_reg, si->src_reg,
> bpf_target_off(struct sock_common, skc_dport,
> sizeof_field(struct sock_common,
> skc_dport),
> target_size));

It simply passes 'sock_common->skc_dport' to 'bpf_sock->dst_port',
which makes that the low 16-bits of 'dst_port' is equal to 'skc_port'
and is in network byte order, but the high 16-bites of 'dst_port' is
0. And the actual port is 'bpf_ntohs((__u16)dst_port)', and
'bpf_ntohl(dst_port)' is totally not the right port.

This is different form 'remote_port' in 'struct bpf_sock_ops' or
'struct __sk_buff':

> case offsetof(struct __sk_buff, remote_port):
> BUILD_BUG_ON(sizeof_field(struct sock_common, skc_dport) != 2);
>
> *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, sk),
> si->dst_reg, si->src_reg,
> offsetof(struct sk_buff, sk));
> *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
> bpf_target_off(struct sock_common,
> skc_dport,
> 2, target_size));
> #ifndef __BIG_ENDIAN_BITFIELD
> *insn++ = BPF_ALU32_IMM(BPF_LSH, si->dst_reg, 16);
> #endif

We can see that it will left move 16-bits in little endian, which makes
the whole 'remote_port' is in network byte order, and the actual port
is bpf_ntohl(remote_port).

Note this in the document of 'dst_port'. ( Maybe this should be unified
in the code? )

Signed-off-by: Menglong Dong <[email protected]>
---
include/uapi/linux/bpf.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b0383d371b9a..891a182a749a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5500,7 +5500,11 @@ struct bpf_sock {
__u32 src_ip4;
__u32 src_ip6[4];
__u32 src_port; /* host byte order */
- __u32 dst_port; /* network byte order */
+ __u32 dst_port; /* low 16-bits are in network byte order,
+ * and high 16-bits are filled by 0.
+ * So the real port in host byte order is
+ * bpf_ntohs((__u16)dst_port).
+ */
__u32 dst_ip4;
__u32 dst_ip6[4];
__u32 state;
--
2.34.1



2022-01-13 18:55:53

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Wed, Jan 12, 2022 at 11:03 PM <[email protected]> wrote:
>
> From: Menglong Dong <[email protected]>
>
> The description of 'dst_port' in 'struct bpf_sock' is not accurated.
> In fact, 'dst_port' is not in network byte order, it is 'partly' in
> network byte order.
>
> We can see it in bpf_sock_convert_ctx_access():
>
> > case offsetof(struct bpf_sock, dst_port):
> > *insn++ = BPF_LDX_MEM(
> > BPF_FIELD_SIZEOF(struct sock_common, skc_dport),
> > si->dst_reg, si->src_reg,
> > bpf_target_off(struct sock_common, skc_dport,
> > sizeof_field(struct sock_common,
> > skc_dport),
> > target_size));
>
> It simply passes 'sock_common->skc_dport' to 'bpf_sock->dst_port',
> which makes that the low 16-bits of 'dst_port' is equal to 'skc_port'
> and is in network byte order, but the high 16-bites of 'dst_port' is
> 0. And the actual port is 'bpf_ntohs((__u16)dst_port)', and
> 'bpf_ntohl(dst_port)' is totally not the right port.
>
> This is different form 'remote_port' in 'struct bpf_sock_ops' or
> 'struct __sk_buff':
>
> > case offsetof(struct __sk_buff, remote_port):
> > BUILD_BUG_ON(sizeof_field(struct sock_common, skc_dport) != 2);
> >
> > *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, sk),
> > si->dst_reg, si->src_reg,
> > offsetof(struct sk_buff, sk));
> > *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
> > bpf_target_off(struct sock_common,
> > skc_dport,
> > 2, target_size));
> > #ifndef __BIG_ENDIAN_BITFIELD
> > *insn++ = BPF_ALU32_IMM(BPF_LSH, si->dst_reg, 16);
> > #endif
>
> We can see that it will left move 16-bits in little endian, which makes
> the whole 'remote_port' is in network byte order, and the actual port
> is bpf_ntohl(remote_port).
>
> Note this in the document of 'dst_port'. ( Maybe this should be unified
> in the code? )
>
> Signed-off-by: Menglong Dong <[email protected]>

Acked-by: Song Liu <[email protected]>

> ---
> include/uapi/linux/bpf.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b0383d371b9a..891a182a749a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5500,7 +5500,11 @@ struct bpf_sock {
> __u32 src_ip4;
> __u32 src_ip6[4];
> __u32 src_port; /* host byte order */
> - __u32 dst_port; /* network byte order */
> + __u32 dst_port; /* low 16-bits are in network byte order,
> + * and high 16-bits are filled by 0.
> + * So the real port in host byte order is
> + * bpf_ntohs((__u16)dst_port).
> + */
> __u32 dst_ip4;
> __u32 dst_ip6[4];
> __u32 state;
> --
> 2.34.1
>

2022-01-21 20:05:55

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Wed, Jan 12, 2022 at 11:03 PM <[email protected]> wrote:
>
> From: Menglong Dong <[email protected]>
>
> The description of 'dst_port' in 'struct bpf_sock' is not accurated.
> In fact, 'dst_port' is not in network byte order, it is 'partly' in
> network byte order.
>
> We can see it in bpf_sock_convert_ctx_access():
>
> > case offsetof(struct bpf_sock, dst_port):
> > *insn++ = BPF_LDX_MEM(
> > BPF_FIELD_SIZEOF(struct sock_common, skc_dport),
> > si->dst_reg, si->src_reg,
> > bpf_target_off(struct sock_common, skc_dport,
> > sizeof_field(struct sock_common,
> > skc_dport),
> > target_size));
>
> It simply passes 'sock_common->skc_dport' to 'bpf_sock->dst_port',
> which makes that the low 16-bits of 'dst_port' is equal to 'skc_port'
> and is in network byte order, but the high 16-bites of 'dst_port' is
> 0. And the actual port is 'bpf_ntohs((__u16)dst_port)', and
> 'bpf_ntohl(dst_port)' is totally not the right port.
>
> This is different form 'remote_port' in 'struct bpf_sock_ops' or
> 'struct __sk_buff':
>
> > case offsetof(struct __sk_buff, remote_port):
> > BUILD_BUG_ON(sizeof_field(struct sock_common, skc_dport) != 2);
> >
> > *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, sk),
> > si->dst_reg, si->src_reg,
> > offsetof(struct sk_buff, sk));
> > *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
> > bpf_target_off(struct sock_common,
> > skc_dport,
> > 2, target_size));
> > #ifndef __BIG_ENDIAN_BITFIELD
> > *insn++ = BPF_ALU32_IMM(BPF_LSH, si->dst_reg, 16);
> > #endif
>
> We can see that it will left move 16-bits in little endian, which makes
> the whole 'remote_port' is in network byte order, and the actual port
> is bpf_ntohl(remote_port).
>
> Note this in the document of 'dst_port'. ( Maybe this should be unified
> in the code? )

Looks like
__sk_buff->remote_port
bpf_sock_ops->remote_port
sk_msg_md->remote_port
are doing the right thing,
but bpf_sock->dst_port is not correct?

I think it's better to fix it,
but probably need to consolidate it with
convert_ctx_accesses() that deals with narrow access.
I suspect reading u8 from three flavors of 'remote_port'
won't be correct.
'dst_port' works with a narrow load, but gets endianness wrong.

2022-01-21 21:04:21

by Menglong Dong

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

Hello!

On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
<[email protected]> wrote:
>
[...]
>
> Looks like
> __sk_buff->remote_port
> bpf_sock_ops->remote_port
> sk_msg_md->remote_port
> are doing the right thing,
> but bpf_sock->dst_port is not correct?
>
> I think it's better to fix it,
> but probably need to consolidate it with
> convert_ctx_accesses() that deals with narrow access.
> I suspect reading u8 from three flavors of 'remote_port'
> won't be correct.

What's the meaning of 'narrow access'? Do you mean to
make 'remote_port' u16? Or 'remote_port' should be made
accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
won't work, as it only is accessible with u32.

I can simply make 'dst_port' endianness right with what
'remote_port' do.

Thanks!
Menglong Dong


> 'dst_port' works with a narrow load, but gets endianness wrong.

2022-01-21 21:05:13

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Thu, Jan 20, 2022 at 11:02:27AM +0800, Menglong Dong wrote:
> Hello!
>
> On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
> <[email protected]> wrote:
> >
> [...]
> >
> > Looks like
> > __sk_buff->remote_port
> > bpf_sock_ops->remote_port
> > sk_msg_md->remote_port
> > are doing the right thing,
> > but bpf_sock->dst_port is not correct?
> >
> > I think it's better to fix it,
> > but probably need to consolidate it with
> > convert_ctx_accesses() that deals with narrow access.
> > I suspect reading u8 from three flavors of 'remote_port'
> > won't be correct.
>
> What's the meaning of 'narrow access'? Do you mean to
> make 'remote_port' u16? Or 'remote_port' should be made
> accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
> won't work, as it only is accessible with u32.

u8 access to remote_port won't pass the verifier,
but u8 access to dst_port will.
Though it will return incorrect data.
See how convert_ctx_accesses() handles narrow loads.
I think we need to generalize it for different endian fields.

2022-01-21 22:15:31

by Menglong Dong

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Thu, Jan 20, 2022 at 12:17 PM Alexei Starovoitov
<[email protected]> wrote:
>
> On Thu, Jan 20, 2022 at 11:02:27AM +0800, Menglong Dong wrote:
> > Hello!
> >
> > On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
> > <[email protected]> wrote:
> > >
> > [...]
> > >
> > > Looks like
> > > __sk_buff->remote_port
> > > bpf_sock_ops->remote_port
> > > sk_msg_md->remote_port
> > > are doing the right thing,
> > > but bpf_sock->dst_port is not correct?
> > >
> > > I think it's better to fix it,
> > > but probably need to consolidate it with
> > > convert_ctx_accesses() that deals with narrow access.
> > > I suspect reading u8 from three flavors of 'remote_port'
> > > won't be correct.
> >
> > What's the meaning of 'narrow access'? Do you mean to
> > make 'remote_port' u16? Or 'remote_port' should be made
> > accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
> > won't work, as it only is accessible with u32.
>
> u8 access to remote_port won't pass the verifier,
> but u8 access to dst_port will.
> Though it will return incorrect data.
> See how convert_ctx_accesses() handles narrow loads.
> I think we need to generalize it for different endian fields.

Yeah, I understand narrower load in convert_ctx_accesses()
now. Seems u8 access to dst_port can't pass the verifier too,
which can be seen form bpf_sock_is_valid_access():

$ switch (off) {
$ case offsetof(struct bpf_sock, state):
$ case offsetof(struct bpf_sock, family):
$ case offsetof(struct bpf_sock, type):
$ case offsetof(struct bpf_sock, protocol):
$ case offsetof(struct bpf_sock, dst_port): // u8 access is not allowed
$ case offsetof(struct bpf_sock, src_port):
$ case offsetof(struct bpf_sock, rx_queue_mapping):
$ case bpf_ctx_range(struct bpf_sock, src_ip4):
$ case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
$ case bpf_ctx_range(struct bpf_sock, dst_ip4):
$ case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
$ bpf_ctx_record_field_size(info, size_default);
$ return bpf_ctx_narrow_access_ok(off, size, size_default);
$ }

I'm still not sure what should we do now. Should we make all
remote_port and dst_port narrower accessable and endianness
right? For example the remote_port in struct bpf_sock_ops:

--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8414,6 +8414,7 @@ static bool sock_ops_is_valid_access(int off, int size,
return false;
info->reg_type = PTR_TO_PACKET_END;
break;
+ case bpf_ctx_range(struct bpf_sock_ops, remote_port):
case offsetof(struct bpf_sock_ops, skb_tcp_flags):
bpf_ctx_record_field_size(info, size_default);
return bpf_ctx_narrow_access_ok(off, size,

If remote_port/dst_port are made narrower accessable, the
result will be right. Therefore, *((u16*)&sk->remote_port) will
be the port with network byte order. And the port in host byte
order can be get with:
bpf_ntohs(*((u16*)&sk->remote_port))
or
bpf_htonl(sk->remote_port)

Thanks!
Menglong Dong

2022-01-22 00:32:46

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Thu, Jan 20, 2022 at 6:18 AM Menglong Dong <[email protected]> wrote:
>
> On Thu, Jan 20, 2022 at 12:17 PM Alexei Starovoitov
> <[email protected]> wrote:
> >
> > On Thu, Jan 20, 2022 at 11:02:27AM +0800, Menglong Dong wrote:
> > > Hello!
> > >
> > > On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
> > > <[email protected]> wrote:
> > > >
> > > [...]
> > > >
> > > > Looks like
> > > > __sk_buff->remote_port
> > > > bpf_sock_ops->remote_port
> > > > sk_msg_md->remote_port
> > > > are doing the right thing,
> > > > but bpf_sock->dst_port is not correct?
> > > >
> > > > I think it's better to fix it,
> > > > but probably need to consolidate it with
> > > > convert_ctx_accesses() that deals with narrow access.
> > > > I suspect reading u8 from three flavors of 'remote_port'
> > > > won't be correct.
> > >
> > > What's the meaning of 'narrow access'? Do you mean to
> > > make 'remote_port' u16? Or 'remote_port' should be made
> > > accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
> > > won't work, as it only is accessible with u32.
> >
> > u8 access to remote_port won't pass the verifier,
> > but u8 access to dst_port will.
> > Though it will return incorrect data.
> > See how convert_ctx_accesses() handles narrow loads.
> > I think we need to generalize it for different endian fields.
>
> Yeah, I understand narrower load in convert_ctx_accesses()
> now. Seems u8 access to dst_port can't pass the verifier too,
> which can be seen form bpf_sock_is_valid_access():
>
> $ switch (off) {
> $ case offsetof(struct bpf_sock, state):
> $ case offsetof(struct bpf_sock, family):
> $ case offsetof(struct bpf_sock, type):
> $ case offsetof(struct bpf_sock, protocol):
> $ case offsetof(struct bpf_sock, dst_port): // u8 access is not allowed
> $ case offsetof(struct bpf_sock, src_port):
> $ case offsetof(struct bpf_sock, rx_queue_mapping):
> $ case bpf_ctx_range(struct bpf_sock, src_ip4):
> $ case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
> $ case bpf_ctx_range(struct bpf_sock, dst_ip4):
> $ case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
> $ bpf_ctx_record_field_size(info, size_default);
> $ return bpf_ctx_narrow_access_ok(off, size, size_default);
> $ }
>
> I'm still not sure what should we do now. Should we make all
> remote_port and dst_port narrower accessable and endianness
> right? For example the remote_port in struct bpf_sock_ops:
>
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8414,6 +8414,7 @@ static bool sock_ops_is_valid_access(int off, int size,
> return false;
> info->reg_type = PTR_TO_PACKET_END;
> break;
> + case bpf_ctx_range(struct bpf_sock_ops, remote_port):

Ahh. bpf_sock_ops don't have it.
But bpf_sk_lookup and sk_msg_md have it.

bpf_sk_lookup->remote_port
supports narrow access.

When it accesses sport from bpf_sk_lookup_kern.

and we have tests that do u8 access from remote_port.
See verifier/ctx_sk_lookup.c

> case offsetof(struct bpf_sock_ops, skb_tcp_flags):
> bpf_ctx_record_field_size(info, size_default);
> return bpf_ctx_narrow_access_ok(off, size,
>
> If remote_port/dst_port are made narrower accessable, the
> result will be right. Therefore, *((u16*)&sk->remote_port) will
> be the port with network byte order. And the port in host byte
> order can be get with:
> bpf_ntohs(*((u16*)&sk->remote_port))
> or
> bpf_htonl(sk->remote_port)

So u8, u16, u32 will work if we make them narrow-accessible, right?

The summary if I understood it:
. only bpf_sk_lookup->remote_port is doing it correctly for u8,u16,u32 ?
. bpf_sock->dst_port is not correct for u32,
since it's missing bpf_ctx_range() ?
. __sk_buff->remote_port
bpf_sock_ops->remote_port
sk_msg_md->remote_port
correct for u32 access only. They don't support narrow access.

but wait
we have a test for bpf_sock->dst_port in progs/test_sock_fields.c.
How does it work then?

I think we need more eyes on the problem.
cc-ing more experts.

2022-01-25 08:38:24

by Martin KaFai Lau

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Thu, Jan 20, 2022 at 09:17:27PM -0800, Alexei Starovoitov wrote:
> On Thu, Jan 20, 2022 at 6:18 AM Menglong Dong <[email protected]> wrote:
> >
> > On Thu, Jan 20, 2022 at 12:17 PM Alexei Starovoitov
> > <[email protected]> wrote:
> > >
> > > On Thu, Jan 20, 2022 at 11:02:27AM +0800, Menglong Dong wrote:
> > > > Hello!
> > > >
> > > > On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
> > > > <[email protected]> wrote:
> > > > >
> > > > [...]
> > > > >
> > > > > Looks like
> > > > > __sk_buff->remote_port
> > > > > bpf_sock_ops->remote_port
> > > > > sk_msg_md->remote_port
> > > > > are doing the right thing,
> > > > > but bpf_sock->dst_port is not correct?
> > > > >
> > > > > I think it's better to fix it,
> > > > > but probably need to consolidate it with
> > > > > convert_ctx_accesses() that deals with narrow access.
> > > > > I suspect reading u8 from three flavors of 'remote_port'
> > > > > won't be correct.
> > > >
> > > > What's the meaning of 'narrow access'? Do you mean to
> > > > make 'remote_port' u16? Or 'remote_port' should be made
> > > > accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
> > > > won't work, as it only is accessible with u32.
> > >
> > > u8 access to remote_port won't pass the verifier,
> > > but u8 access to dst_port will.
> > > Though it will return incorrect data.
> > > See how convert_ctx_accesses() handles narrow loads.
> > > I think we need to generalize it for different endian fields.
> >
> > Yeah, I understand narrower load in convert_ctx_accesses()
> > now. Seems u8 access to dst_port can't pass the verifier too,
> > which can be seen form bpf_sock_is_valid_access():
> >
> > $ switch (off) {
> > $ case offsetof(struct bpf_sock, state):
> > $ case offsetof(struct bpf_sock, family):
> > $ case offsetof(struct bpf_sock, type):
> > $ case offsetof(struct bpf_sock, protocol):
> > $ case offsetof(struct bpf_sock, dst_port): // u8 access is not allowed
> > $ case offsetof(struct bpf_sock, src_port):
> > $ case offsetof(struct bpf_sock, rx_queue_mapping):
> > $ case bpf_ctx_range(struct bpf_sock, src_ip4):
> > $ case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
> > $ case bpf_ctx_range(struct bpf_sock, dst_ip4):
> > $ case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
> > $ bpf_ctx_record_field_size(info, size_default);
> > $ return bpf_ctx_narrow_access_ok(off, size, size_default);
> > $ }
> >
> > I'm still not sure what should we do now. Should we make all
> > remote_port and dst_port narrower accessable and endianness
> > right? For example the remote_port in struct bpf_sock_ops:
> >
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -8414,6 +8414,7 @@ static bool sock_ops_is_valid_access(int off, int size,
> > return false;
> > info->reg_type = PTR_TO_PACKET_END;
> > break;
> > + case bpf_ctx_range(struct bpf_sock_ops, remote_port):
>
> Ahh. bpf_sock_ops don't have it.
> But bpf_sk_lookup and sk_msg_md have it.
>
> bpf_sk_lookup->remote_port
> supports narrow access.
>
> When it accesses sport from bpf_sk_lookup_kern.
>
> and we have tests that do u8 access from remote_port.
> See verifier/ctx_sk_lookup.c
>
> > case offsetof(struct bpf_sock_ops, skb_tcp_flags):
> > bpf_ctx_record_field_size(info, size_default);
> > return bpf_ctx_narrow_access_ok(off, size,
> >
> > If remote_port/dst_port are made narrower accessable, the
> > result will be right. Therefore, *((u16*)&sk->remote_port) will
> > be the port with network byte order. And the port in host byte
> > order can be get with:
> > bpf_ntohs(*((u16*)&sk->remote_port))
> > or
> > bpf_htonl(sk->remote_port)
>
> So u8, u16, u32 will work if we make them narrow-accessible, right?
>
> The summary if I understood it:
> . only bpf_sk_lookup->remote_port is doing it correctly for u8,u16,u32 ?
> . bpf_sock->dst_port is not correct for u32,
> since it's missing bpf_ctx_range() ?
> . __sk_buff->remote_port
> bpf_sock_ops->remote_port
> sk_msg_md->remote_port
> correct for u32 access only. They don't support narrow access.
>
> but wait
> we have a test for bpf_sock->dst_port in progs/test_sock_fields.c.
> How does it work then?
>
> I think we need more eyes on the problem.
> cc-ing more experts.
iiuc, I think both bpf_sk_lookup and bpf_sock allow narrow access.
bpf_sock only allows ((__u8 *)&bpf_sock->dst_port)[0] but
not ((__u8 *)&bpf_sock->dst_port)[1]. bpf_sk_lookup allows reading
a byte at [0], [1], [2], and [3].

The test_sock_fields.c currently works because it is comparing
with another __u16: "sk->dst_port == srv_sa6.sin6_port".
It should also work with bpf_ntohS() which usually is what the
userspace program expects when dealing with port instead of using bpf_ntohl()?
Thus, I think we can keep the lower 16 bits way that bpf_sock->dst_port
and bpf_sk_lookup->remote_port (and also bpf_sock_addr->user_port ?) are
using. Also, changing it to the upper 16 bits will break existing
bpf progs.

For narrow access with any number of bytes at any offset may be useful
for IP[6] addr. Not sure about the port though. Ideally it should only
allow sizeof(__u16) read at offset 0. However, I think at this point it makes
sense to make them consistent with how bpf_sk_lookup does it also,
i.e. allow byte [0], [1], [2], and [3] access.

would love to hear how others think about it.

2022-01-25 08:40:07

by Martin KaFai Lau

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Mon, Jan 24, 2022 at 05:03:20PM -0800, Alexei Starovoitov wrote:
> On Mon, Jan 24, 2022 at 4:35 PM Martin KaFai Lau <[email protected]> wrote:
> >
> > On Thu, Jan 20, 2022 at 09:17:27PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Jan 20, 2022 at 6:18 AM Menglong Dong <[email protected]> wrote:
> > > >
> > > > On Thu, Jan 20, 2022 at 12:17 PM Alexei Starovoitov
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Thu, Jan 20, 2022 at 11:02:27AM +0800, Menglong Dong wrote:
> > > > > > Hello!
> > > > > >
> > > > > > On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > [...]
> > > > > > >
> > > > > > > Looks like
> > > > > > > __sk_buff->remote_port
> > > > > > > bpf_sock_ops->remote_port
> > > > > > > sk_msg_md->remote_port
> > > > > > > are doing the right thing,
> > > > > > > but bpf_sock->dst_port is not correct?
> > > > > > >
> > > > > > > I think it's better to fix it,
> > > > > > > but probably need to consolidate it with
> > > > > > > convert_ctx_accesses() that deals with narrow access.
> > > > > > > I suspect reading u8 from three flavors of 'remote_port'
> > > > > > > won't be correct.
> > > > > >
> > > > > > What's the meaning of 'narrow access'? Do you mean to
> > > > > > make 'remote_port' u16? Or 'remote_port' should be made
> > > > > > accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
> > > > > > won't work, as it only is accessible with u32.
> > > > >
> > > > > u8 access to remote_port won't pass the verifier,
> > > > > but u8 access to dst_port will.
> > > > > Though it will return incorrect data.
> > > > > See how convert_ctx_accesses() handles narrow loads.
> > > > > I think we need to generalize it for different endian fields.
> > > >
> > > > Yeah, I understand narrower load in convert_ctx_accesses()
> > > > now. Seems u8 access to dst_port can't pass the verifier too,
> > > > which can be seen form bpf_sock_is_valid_access():
> > > >
> > > > $ switch (off) {
> > > > $ case offsetof(struct bpf_sock, state):
> > > > $ case offsetof(struct bpf_sock, family):
> > > > $ case offsetof(struct bpf_sock, type):
> > > > $ case offsetof(struct bpf_sock, protocol):
> > > > $ case offsetof(struct bpf_sock, dst_port): // u8 access is not allowed
> > > > $ case offsetof(struct bpf_sock, src_port):
> > > > $ case offsetof(struct bpf_sock, rx_queue_mapping):
> > > > $ case bpf_ctx_range(struct bpf_sock, src_ip4):
> > > > $ case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
> > > > $ case bpf_ctx_range(struct bpf_sock, dst_ip4):
> > > > $ case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
> > > > $ bpf_ctx_record_field_size(info, size_default);
> > > > $ return bpf_ctx_narrow_access_ok(off, size, size_default);
> > > > $ }
> > > >
> > > > I'm still not sure what should we do now. Should we make all
> > > > remote_port and dst_port narrower accessable and endianness
> > > > right? For example the remote_port in struct bpf_sock_ops:
> > > >
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -8414,6 +8414,7 @@ static bool sock_ops_is_valid_access(int off, int size,
> > > > return false;
> > > > info->reg_type = PTR_TO_PACKET_END;
> > > > break;
> > > > + case bpf_ctx_range(struct bpf_sock_ops, remote_port):
> > >
> > > Ahh. bpf_sock_ops don't have it.
> > > But bpf_sk_lookup and sk_msg_md have it.
> > >
> > > bpf_sk_lookup->remote_port
> > > supports narrow access.
> > >
> > > When it accesses sport from bpf_sk_lookup_kern.
> > >
> > > and we have tests that do u8 access from remote_port.
> > > See verifier/ctx_sk_lookup.c
> > >
> > > > case offsetof(struct bpf_sock_ops, skb_tcp_flags):
> > > > bpf_ctx_record_field_size(info, size_default);
> > > > return bpf_ctx_narrow_access_ok(off, size,
> > > >
> > > > If remote_port/dst_port are made narrower accessable, the
> > > > result will be right. Therefore, *((u16*)&sk->remote_port) will
> > > > be the port with network byte order. And the port in host byte
> > > > order can be get with:
> > > > bpf_ntohs(*((u16*)&sk->remote_port))
> > > > or
> > > > bpf_htonl(sk->remote_port)
> > >
> > > So u8, u16, u32 will work if we make them narrow-accessible, right?
> > >
> > > The summary if I understood it:
> > > . only bpf_sk_lookup->remote_port is doing it correctly for u8,u16,u32 ?
> > > . bpf_sock->dst_port is not correct for u32,
> > > since it's missing bpf_ctx_range() ?
> > > . __sk_buff->remote_port
> > > bpf_sock_ops->remote_port
> > > sk_msg_md->remote_port
> > > correct for u32 access only. They don't support narrow access.
> > >
> > > but wait
> > > we have a test for bpf_sock->dst_port in progs/test_sock_fields.c.
> > > How does it work then?
> > >
> > > I think we need more eyes on the problem.
> > > cc-ing more experts.
> > iiuc, I think both bpf_sk_lookup and bpf_sock allow narrow access.
> > bpf_sock only allows ((__u8 *)&bpf_sock->dst_port)[0] but
> > not ((__u8 *)&bpf_sock->dst_port)[1]. bpf_sk_lookup allows reading
> > a byte at [0], [1], [2], and [3].
> >
> > The test_sock_fields.c currently works because it is comparing
> > with another __u16: "sk->dst_port == srv_sa6.sin6_port".
> > It should also work with bpf_ntohS() which usually is what the
> > userspace program expects when dealing with port instead of using bpf_ntohl()?
> > Thus, I think we can keep the lower 16 bits way that bpf_sock->dst_port
> > and bpf_sk_lookup->remote_port (and also bpf_sock_addr->user_port ?) are
> > using. Also, changing it to the upper 16 bits will break existing
> > bpf progs.
> >
> > For narrow access with any number of bytes at any offset may be useful
> > for IP[6] addr. Not sure about the port though. Ideally it should only
> > allow sizeof(__u16) read at offset 0. However, I think at this point it makes
> > sense to make them consistent with how bpf_sk_lookup does it also,
> > i.e. allow byte [0], [1], [2], and [3] access.
>
> Sounds like the proposal is to do:
> diff --git a/net/core/filter.c b/net/core/filter.c
> index a06931c27eeb..1a8c97bc1927 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8276,9 +8276,9 @@ bool bpf_sock_is_valid_access(int off, int size,
> enum bpf_access_type type,
> case offsetof(struct bpf_sock, family):
> case offsetof(struct bpf_sock, type):
> case offsetof(struct bpf_sock, protocol):
> - case offsetof(struct bpf_sock, dst_port):
> case offsetof(struct bpf_sock, src_port):
> case offsetof(struct bpf_sock, rx_queue_mapping):
> + case bpf_ctx_range(struct bpf_sock, dst_port):
> case bpf_ctx_range(struct bpf_sock, src_ip4):
>
> and then document bpf_sock->dst_port and bpf_sk_lookup->remote_port
also bpf_sock_addr->user_port

> behavior and their difference vs
> __sk_buff->remote_port
> bpf_sock_ops->remote_port
> sk_msg_md->remote_port
> ?
Yes, agree on the code change and adding doc.

> I suspect we cannot remove lshift_16 from them either,
> since it might break some prog as well.
Right, I believe the existing lshift_16 has to stay.

2022-01-25 08:55:37

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Mon, Jan 24, 2022 at 4:35 PM Martin KaFai Lau <[email protected]> wrote:
>
> On Thu, Jan 20, 2022 at 09:17:27PM -0800, Alexei Starovoitov wrote:
> > On Thu, Jan 20, 2022 at 6:18 AM Menglong Dong <[email protected]> wrote:
> > >
> > > On Thu, Jan 20, 2022 at 12:17 PM Alexei Starovoitov
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, Jan 20, 2022 at 11:02:27AM +0800, Menglong Dong wrote:
> > > > > Hello!
> > > > >
> > > > > On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > [...]
> > > > > >
> > > > > > Looks like
> > > > > > __sk_buff->remote_port
> > > > > > bpf_sock_ops->remote_port
> > > > > > sk_msg_md->remote_port
> > > > > > are doing the right thing,
> > > > > > but bpf_sock->dst_port is not correct?
> > > > > >
> > > > > > I think it's better to fix it,
> > > > > > but probably need to consolidate it with
> > > > > > convert_ctx_accesses() that deals with narrow access.
> > > > > > I suspect reading u8 from three flavors of 'remote_port'
> > > > > > won't be correct.
> > > > >
> > > > > What's the meaning of 'narrow access'? Do you mean to
> > > > > make 'remote_port' u16? Or 'remote_port' should be made
> > > > > accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
> > > > > won't work, as it only is accessible with u32.
> > > >
> > > > u8 access to remote_port won't pass the verifier,
> > > > but u8 access to dst_port will.
> > > > Though it will return incorrect data.
> > > > See how convert_ctx_accesses() handles narrow loads.
> > > > I think we need to generalize it for different endian fields.
> > >
> > > Yeah, I understand narrower load in convert_ctx_accesses()
> > > now. Seems u8 access to dst_port can't pass the verifier too,
> > > which can be seen form bpf_sock_is_valid_access():
> > >
> > > $ switch (off) {
> > > $ case offsetof(struct bpf_sock, state):
> > > $ case offsetof(struct bpf_sock, family):
> > > $ case offsetof(struct bpf_sock, type):
> > > $ case offsetof(struct bpf_sock, protocol):
> > > $ case offsetof(struct bpf_sock, dst_port): // u8 access is not allowed
> > > $ case offsetof(struct bpf_sock, src_port):
> > > $ case offsetof(struct bpf_sock, rx_queue_mapping):
> > > $ case bpf_ctx_range(struct bpf_sock, src_ip4):
> > > $ case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
> > > $ case bpf_ctx_range(struct bpf_sock, dst_ip4):
> > > $ case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
> > > $ bpf_ctx_record_field_size(info, size_default);
> > > $ return bpf_ctx_narrow_access_ok(off, size, size_default);
> > > $ }
> > >
> > > I'm still not sure what should we do now. Should we make all
> > > remote_port and dst_port narrower accessable and endianness
> > > right? For example the remote_port in struct bpf_sock_ops:
> > >
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -8414,6 +8414,7 @@ static bool sock_ops_is_valid_access(int off, int size,
> > > return false;
> > > info->reg_type = PTR_TO_PACKET_END;
> > > break;
> > > + case bpf_ctx_range(struct bpf_sock_ops, remote_port):
> >
> > Ahh. bpf_sock_ops don't have it.
> > But bpf_sk_lookup and sk_msg_md have it.
> >
> > bpf_sk_lookup->remote_port
> > supports narrow access.
> >
> > When it accesses sport from bpf_sk_lookup_kern.
> >
> > and we have tests that do u8 access from remote_port.
> > See verifier/ctx_sk_lookup.c
> >
> > > case offsetof(struct bpf_sock_ops, skb_tcp_flags):
> > > bpf_ctx_record_field_size(info, size_default);
> > > return bpf_ctx_narrow_access_ok(off, size,
> > >
> > > If remote_port/dst_port are made narrower accessable, the
> > > result will be right. Therefore, *((u16*)&sk->remote_port) will
> > > be the port with network byte order. And the port in host byte
> > > order can be get with:
> > > bpf_ntohs(*((u16*)&sk->remote_port))
> > > or
> > > bpf_htonl(sk->remote_port)
> >
> > So u8, u16, u32 will work if we make them narrow-accessible, right?
> >
> > The summary if I understood it:
> > . only bpf_sk_lookup->remote_port is doing it correctly for u8,u16,u32 ?
> > . bpf_sock->dst_port is not correct for u32,
> > since it's missing bpf_ctx_range() ?
> > . __sk_buff->remote_port
> > bpf_sock_ops->remote_port
> > sk_msg_md->remote_port
> > correct for u32 access only. They don't support narrow access.
> >
> > but wait
> > we have a test for bpf_sock->dst_port in progs/test_sock_fields.c.
> > How does it work then?
> >
> > I think we need more eyes on the problem.
> > cc-ing more experts.
> iiuc, I think both bpf_sk_lookup and bpf_sock allow narrow access.
> bpf_sock only allows ((__u8 *)&bpf_sock->dst_port)[0] but
> not ((__u8 *)&bpf_sock->dst_port)[1]. bpf_sk_lookup allows reading
> a byte at [0], [1], [2], and [3].
>
> The test_sock_fields.c currently works because it is comparing
> with another __u16: "sk->dst_port == srv_sa6.sin6_port".
> It should also work with bpf_ntohS() which usually is what the
> userspace program expects when dealing with port instead of using bpf_ntohl()?
> Thus, I think we can keep the lower 16 bits way that bpf_sock->dst_port
> and bpf_sk_lookup->remote_port (and also bpf_sock_addr->user_port ?) are
> using. Also, changing it to the upper 16 bits will break existing
> bpf progs.
>
> For narrow access with any number of bytes at any offset may be useful
> for IP[6] addr. Not sure about the port though. Ideally it should only
> allow sizeof(__u16) read at offset 0. However, I think at this point it makes
> sense to make them consistent with how bpf_sk_lookup does it also,
> i.e. allow byte [0], [1], [2], and [3] access.

Sounds like the proposal is to do:
diff --git a/net/core/filter.c b/net/core/filter.c
index a06931c27eeb..1a8c97bc1927 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8276,9 +8276,9 @@ bool bpf_sock_is_valid_access(int off, int size,
enum bpf_access_type type,
case offsetof(struct bpf_sock, family):
case offsetof(struct bpf_sock, type):
case offsetof(struct bpf_sock, protocol):
- case offsetof(struct bpf_sock, dst_port):
case offsetof(struct bpf_sock, src_port):
case offsetof(struct bpf_sock, rx_queue_mapping):
+ case bpf_ctx_range(struct bpf_sock, dst_port):
case bpf_ctx_range(struct bpf_sock, src_ip4):

and then document bpf_sock->dst_port and bpf_sk_lookup->remote_port
behavior and their difference vs
__sk_buff->remote_port
bpf_sock_ops->remote_port
sk_msg_md->remote_port
?

I suspect we cannot remove lshift_16 from them either,
since it might break some prog as well.

2022-01-25 08:55:37

by Menglong Dong

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Tue, Jan 25, 2022 at 8:35 AM Martin KaFai Lau <[email protected]> wrote:
>
> On Thu, Jan 20, 2022 at 09:17:27PM -0800, Alexei Starovoitov wrote:
> > On Thu, Jan 20, 2022 at 6:18 AM Menglong Dong <[email protected]> wrote:
> > >
> > > On Thu, Jan 20, 2022 at 12:17 PM Alexei Starovoitov
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, Jan 20, 2022 at 11:02:27AM +0800, Menglong Dong wrote:
> > > > > Hello!
> > > > >
> > > > > On Thu, Jan 20, 2022 at 6:03 AM Alexei Starovoitov
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > [...]
> > > > > >
> > > > > > Looks like
> > > > > > __sk_buff->remote_port
> > > > > > bpf_sock_ops->remote_port
> > > > > > sk_msg_md->remote_port
> > > > > > are doing the right thing,
> > > > > > but bpf_sock->dst_port is not correct?
> > > > > >
> > > > > > I think it's better to fix it,
> > > > > > but probably need to consolidate it with
> > > > > > convert_ctx_accesses() that deals with narrow access.
> > > > > > I suspect reading u8 from three flavors of 'remote_port'
> > > > > > won't be correct.
> > > > >
> > > > > What's the meaning of 'narrow access'? Do you mean to
> > > > > make 'remote_port' u16? Or 'remote_port' should be made
> > > > > accessible with u8? In fact, '*((u16 *)&skops->remote_port + 1)'
> > > > > won't work, as it only is accessible with u32.
> > > >
> > > > u8 access to remote_port won't pass the verifier,
> > > > but u8 access to dst_port will.
> > > > Though it will return incorrect data.
> > > > See how convert_ctx_accesses() handles narrow loads.
> > > > I think we need to generalize it for different endian fields.
> > >
> > > Yeah, I understand narrower load in convert_ctx_accesses()
> > > now. Seems u8 access to dst_port can't pass the verifier too,
> > > which can be seen form bpf_sock_is_valid_access():
> > >
> > > $ switch (off) {
> > > $ case offsetof(struct bpf_sock, state):
> > > $ case offsetof(struct bpf_sock, family):
> > > $ case offsetof(struct bpf_sock, type):
> > > $ case offsetof(struct bpf_sock, protocol):
> > > $ case offsetof(struct bpf_sock, dst_port): // u8 access is not allowed
> > > $ case offsetof(struct bpf_sock, src_port):
> > > $ case offsetof(struct bpf_sock, rx_queue_mapping):
> > > $ case bpf_ctx_range(struct bpf_sock, src_ip4):
> > > $ case bpf_ctx_range_till(struct bpf_sock, src_ip6[0], src_ip6[3]):
> > > $ case bpf_ctx_range(struct bpf_sock, dst_ip4):
> > > $ case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
> > > $ bpf_ctx_record_field_size(info, size_default);
> > > $ return bpf_ctx_narrow_access_ok(off, size, size_default);
> > > $ }
> > >
> > > I'm still not sure what should we do now. Should we make all
> > > remote_port and dst_port narrower accessable and endianness
> > > right? For example the remote_port in struct bpf_sock_ops:
> > >
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -8414,6 +8414,7 @@ static bool sock_ops_is_valid_access(int off, int size,
> > > return false;
> > > info->reg_type = PTR_TO_PACKET_END;
> > > break;
> > > + case bpf_ctx_range(struct bpf_sock_ops, remote_port):
> >
> > Ahh. bpf_sock_ops don't have it.
> > But bpf_sk_lookup and sk_msg_md have it.
> >
> > bpf_sk_lookup->remote_port
> > supports narrow access.
> >
> > When it accesses sport from bpf_sk_lookup_kern.
> >
> > and we have tests that do u8 access from remote_port.
> > See verifier/ctx_sk_lookup.c
> >
> > > case offsetof(struct bpf_sock_ops, skb_tcp_flags):
> > > bpf_ctx_record_field_size(info, size_default);
> > > return bpf_ctx_narrow_access_ok(off, size,
> > >
> > > If remote_port/dst_port are made narrower accessable, the
> > > result will be right. Therefore, *((u16*)&sk->remote_port) will
> > > be the port with network byte order. And the port in host byte
> > > order can be get with:
> > > bpf_ntohs(*((u16*)&sk->remote_port))
> > > or
> > > bpf_htonl(sk->remote_port)
> >
> > So u8, u16, u32 will work if we make them narrow-accessible, right?
> >
> > The summary if I understood it:
> > . only bpf_sk_lookup->remote_port is doing it correctly for u8,u16,u32 ?
> > . bpf_sock->dst_port is not correct for u32,
> > since it's missing bpf_ctx_range() ?
> > . __sk_buff->remote_port
> > bpf_sock_ops->remote_port
> > sk_msg_md->remote_port
> > correct for u32 access only. They don't support narrow access.
> >
> > but wait
> > we have a test for bpf_sock->dst_port in progs/test_sock_fields.c.
> > How does it work then?
> >
> > I think we need more eyes on the problem.
> > cc-ing more experts.
> iiuc, I think both bpf_sk_lookup and bpf_sock allow narrow access.
> bpf_sock only allows ((__u8 *)&bpf_sock->dst_port)[0] but
> not ((__u8 *)&bpf_sock->dst_port)[1]. bpf_sk_lookup allows reading
> a byte at [0], [1], [2], and [3].
>
> The test_sock_fields.c currently works because it is comparing
> with another __u16: "sk->dst_port == srv_sa6.sin6_port".
> It should also work with bpf_ntohS() which usually is what the
> userspace program expects when dealing with port instead of using bpf_ntohl()?
> Thus, I think we can keep the lower 16 bits way that bpf_sock->dst_port
> and bpf_sk_lookup->remote_port (and also bpf_sock_addr->user_port ?) are
> using. Also, changing it to the upper 16 bits will break existing
> bpf progs.
>
> For narrow access with any number of bytes at any offset may be useful
> for IP[6] addr. Not sure about the port though. Ideally it should only
> allow sizeof(__u16) read at offset 0. However, I think at this point it makes
> sense to make them consistent with how bpf_sk_lookup does it also,
> i.e. allow byte [0], [1], [2], and [3] access.

I don't think it makes much sense to make dst_port allow byte [0],
[1], [2], and [3] access. The whole part of dst_port is in host byte
order, byte access can make the result inconsistent. For example,
byte[2],byte[3] are the port part for big endian, but byte[0],byte[1]
for little endian. Am I right?

So how about it if we do these:
- keep what remote_port and dst_port do
- make all remote_port (bpf_sock_ops, __sk_buff, sk_msg_md,
etc) consistent in byte access
- document dst_port for it's different with remote_port

Glad to hear some better idea :/

>
> would love to hear how others think about it.

2022-01-26 07:36:52

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Thu, Jan 13, 2022 at 08:02 AM CET, [email protected] wrote:
> From: Menglong Dong <[email protected]>
>
> The description of 'dst_port' in 'struct bpf_sock' is not accurated.
> In fact, 'dst_port' is not in network byte order, it is 'partly' in
> network byte order.
>
> We can see it in bpf_sock_convert_ctx_access():
>
>> case offsetof(struct bpf_sock, dst_port):
>> *insn++ = BPF_LDX_MEM(
>> BPF_FIELD_SIZEOF(struct sock_common, skc_dport),
>> si->dst_reg, si->src_reg,
>> bpf_target_off(struct sock_common, skc_dport,
>> sizeof_field(struct sock_common,
>> skc_dport),
>> target_size));
>
> It simply passes 'sock_common->skc_dport' to 'bpf_sock->dst_port',
> which makes that the low 16-bits of 'dst_port' is equal to 'skc_port'
> and is in network byte order, but the high 16-bites of 'dst_port' is
> 0. And the actual port is 'bpf_ntohs((__u16)dst_port)', and
> 'bpf_ntohl(dst_port)' is totally not the right port.
>
> This is different form 'remote_port' in 'struct bpf_sock_ops' or
> 'struct __sk_buff':
>
>> case offsetof(struct __sk_buff, remote_port):
>> BUILD_BUG_ON(sizeof_field(struct sock_common, skc_dport) != 2);
>>
>> *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, sk),
>> si->dst_reg, si->src_reg,
>> offsetof(struct sk_buff, sk));
>> *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->dst_reg,
>> bpf_target_off(struct sock_common,
>> skc_dport,
>> 2, target_size));
>> #ifndef __BIG_ENDIAN_BITFIELD
>> *insn++ = BPF_ALU32_IMM(BPF_LSH, si->dst_reg, 16);
>> #endif
>
> We can see that it will left move 16-bits in little endian, which makes
> the whole 'remote_port' is in network byte order, and the actual port
> is bpf_ntohl(remote_port).
>
> Note this in the document of 'dst_port'. ( Maybe this should be unified
> in the code? )
>
> Signed-off-by: Menglong Dong <[email protected]>
> ---
> include/uapi/linux/bpf.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b0383d371b9a..891a182a749a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5500,7 +5500,11 @@ struct bpf_sock {
> __u32 src_ip4;
> __u32 src_ip6[4];
> __u32 src_port; /* host byte order */
> - __u32 dst_port; /* network byte order */
> + __u32 dst_port; /* low 16-bits are in network byte order,
> + * and high 16-bits are filled by 0.
> + * So the real port in host byte order is
> + * bpf_ntohs((__u16)dst_port).
> + */
> __u32 dst_ip4;
> __u32 dst_ip6[4];
> __u32 state;

I'm probably missing something obvious, but is there anything stopping
us from splitting the field, so that dst_ports is 16-bit wide?

I gave a quick check to the change below and it seems to pass verifier
checks and sock_field tests.

IDK, just an idea. Didn't give it a deeper thought.

--8<--

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4a2f7041ebae..344d62ccafba 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5574,7 +5574,8 @@ struct bpf_sock {
__u32 src_ip4;
__u32 src_ip6[4];
__u32 src_port; /* host byte order */
- __u32 dst_port; /* network byte order */
+ __u16 unused;
+ __u16 dst_port; /* network byte order */
__u32 dst_ip4;
__u32 dst_ip6[4];
__u32 state;
diff --git a/net/core/filter.c b/net/core/filter.c
index a06931c27eeb..c56b8ba82de5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8276,7 +8276,6 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
case offsetof(struct bpf_sock, family):
case offsetof(struct bpf_sock, type):
case offsetof(struct bpf_sock, protocol):
- case offsetof(struct bpf_sock, dst_port):
case offsetof(struct bpf_sock, src_port):
case offsetof(struct bpf_sock, rx_queue_mapping):
case bpf_ctx_range(struct bpf_sock, src_ip4):
@@ -8285,6 +8284,9 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
bpf_ctx_record_field_size(info, size_default);
return bpf_ctx_narrow_access_ok(off, size, size_default);
+ case offsetof(struct bpf_sock, dst_port):
+ bpf_ctx_record_field_size(info, sizeof(__u16));
+ return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16));
}

return size == size_default;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4a2f7041ebae..344d62ccafba 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5574,7 +5574,8 @@ struct bpf_sock {
__u32 src_ip4;
__u32 src_ip6[4];
__u32 src_port; /* host byte order */
- __u32 dst_port; /* network byte order */
+ __u16 unused;
+ __u16 dst_port; /* network byte order */
__u32 dst_ip4;
__u32 dst_ip6[4];
__u32 state;

2022-01-26 11:45:55

by Martin KaFai Lau

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Tue, Jan 25, 2022 at 08:24:27PM +0100, Jakub Sitnicki wrote:
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index b0383d371b9a..891a182a749a 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -5500,7 +5500,11 @@ struct bpf_sock {
> > __u32 src_ip4;
> > __u32 src_ip6[4];
> > __u32 src_port; /* host byte order */
> > - __u32 dst_port; /* network byte order */
> > + __u32 dst_port; /* low 16-bits are in network byte order,
> > + * and high 16-bits are filled by 0.
> > + * So the real port in host byte order is
> > + * bpf_ntohs((__u16)dst_port).
> > + */
> > __u32 dst_ip4;
> > __u32 dst_ip6[4];
> > __u32 state;
>
> I'm probably missing something obvious, but is there anything stopping
> us from splitting the field, so that dst_ports is 16-bit wide?
>
> I gave a quick check to the change below and it seems to pass verifier
> checks and sock_field tests.
>
> IDK, just an idea. Didn't give it a deeper thought.
>
> --8<--
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 4a2f7041ebae..344d62ccafba 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -5574,7 +5574,8 @@ struct bpf_sock {
> __u32 src_ip4;
> __u32 src_ip6[4];
> __u32 src_port; /* host byte order */
> - __u32 dst_port; /* network byte order */
> + __u16 unused;
> + __u16 dst_port; /* network byte order */
This will break the existing bpf prog.

> __u32 dst_ip4;
> __u32 dst_ip6[4];
> __u32 state;
> diff --git a/net/core/filter.c b/net/core/filter.c
> index a06931c27eeb..c56b8ba82de5 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8276,7 +8276,6 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
> case offsetof(struct bpf_sock, family):
> case offsetof(struct bpf_sock, type):
> case offsetof(struct bpf_sock, protocol):
> - case offsetof(struct bpf_sock, dst_port):
> case offsetof(struct bpf_sock, src_port):
> case offsetof(struct bpf_sock, rx_queue_mapping):
> case bpf_ctx_range(struct bpf_sock, src_ip4):
> @@ -8285,6 +8284,9 @@ bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
> case bpf_ctx_range_till(struct bpf_sock, dst_ip6[0], dst_ip6[3]):
> bpf_ctx_record_field_size(info, size_default);
> return bpf_ctx_narrow_access_ok(off, size, size_default);
> + case offsetof(struct bpf_sock, dst_port):
> + bpf_ctx_record_field_size(info, sizeof(__u16));
> + return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16));
> }
>
> return size == size_default;
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 4a2f7041ebae..344d62ccafba 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -5574,7 +5574,8 @@ struct bpf_sock {
> __u32 src_ip4;
> __u32 src_ip6[4];
> __u32 src_port; /* host byte order */
> - __u32 dst_port; /* network byte order */
> + __u16 unused;
> + __u16 dst_port; /* network byte order */
> __u32 dst_ip4;
> __u32 dst_ip6[4];
> __u32 state;

2022-01-26 12:58:30

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Tue, Jan 25, 2022 at 2:45 PM Martin KaFai Lau <[email protected]> wrote:
>
> On Tue, Jan 25, 2022 at 08:24:27PM +0100, Jakub Sitnicki wrote:
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index b0383d371b9a..891a182a749a 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -5500,7 +5500,11 @@ struct bpf_sock {
> > > __u32 src_ip4;
> > > __u32 src_ip6[4];
> > > __u32 src_port; /* host byte order */
> > > - __u32 dst_port; /* network byte order */
> > > + __u32 dst_port; /* low 16-bits are in network byte order,
> > > + * and high 16-bits are filled by 0.
> > > + * So the real port in host byte order is
> > > + * bpf_ntohs((__u16)dst_port).
> > > + */
> > > __u32 dst_ip4;
> > > __u32 dst_ip6[4];
> > > __u32 state;
> >
> > I'm probably missing something obvious, but is there anything stopping
> > us from splitting the field, so that dst_ports is 16-bit wide?
> >
> > I gave a quick check to the change below and it seems to pass verifier
> > checks and sock_field tests.
> >
> > IDK, just an idea. Didn't give it a deeper thought.
> >
> > --8<--
> >
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4a2f7041ebae..344d62ccafba 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -5574,7 +5574,8 @@ struct bpf_sock {
> > __u32 src_ip4;
> > __u32 src_ip6[4];
> > __u32 src_port; /* host byte order */
> > - __u32 dst_port; /* network byte order */
> > + __u16 unused;
> > + __u16 dst_port; /* network byte order */
> This will break the existing bpf prog.

I think Jakub's idea is partially expressed:
+ case offsetof(struct bpf_sock, dst_port):
+ bpf_ctx_record_field_size(info, sizeof(__u16));
+ return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16));

Either 'unused' needs to be after dst_port or
bpf_sock_is_valid_access() needs to allow offset at 'unused'
and at 'dst_port'.
And allow u32 access though the size is actually u16.
Then the existing bpf progs (without recompiling) should work?

2022-01-26 13:09:02

by Martin KaFai Lau

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Tue, Jan 25, 2022 at 03:02:37PM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 25, 2022 at 2:45 PM Martin KaFai Lau <[email protected]> wrote:
> >
> > On Tue, Jan 25, 2022 at 08:24:27PM +0100, Jakub Sitnicki wrote:
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index b0383d371b9a..891a182a749a 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -5500,7 +5500,11 @@ struct bpf_sock {
> > > > __u32 src_ip4;
> > > > __u32 src_ip6[4];
> > > > __u32 src_port; /* host byte order */
> > > > - __u32 dst_port; /* network byte order */
> > > > + __u32 dst_port; /* low 16-bits are in network byte order,
> > > > + * and high 16-bits are filled by 0.
> > > > + * So the real port in host byte order is
> > > > + * bpf_ntohs((__u16)dst_port).
> > > > + */
> > > > __u32 dst_ip4;
> > > > __u32 dst_ip6[4];
> > > > __u32 state;
> > >
> > > I'm probably missing something obvious, but is there anything stopping
> > > us from splitting the field, so that dst_ports is 16-bit wide?
> > >
> > > I gave a quick check to the change below and it seems to pass verifier
> > > checks and sock_field tests.
> > >
> > > IDK, just an idea. Didn't give it a deeper thought.
> > >
> > > --8<--
> > >
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 4a2f7041ebae..344d62ccafba 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -5574,7 +5574,8 @@ struct bpf_sock {
> > > __u32 src_ip4;
> > > __u32 src_ip6[4];
> > > __u32 src_port; /* host byte order */
> > > - __u32 dst_port; /* network byte order */
> > > + __u16 unused;
> > > + __u16 dst_port; /* network byte order */
> > This will break the existing bpf prog.
>
> I think Jakub's idea is partially expressed:
> + case offsetof(struct bpf_sock, dst_port):
> + bpf_ctx_record_field_size(info, sizeof(__u16));
> + return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16));
>
> Either 'unused' needs to be after dst_port or
> bpf_sock_is_valid_access() needs to allow offset at 'unused'
> and at 'dst_port'.
> And allow u32 access though the size is actually u16.
> Then the existing bpf progs (without recompiling) should work?
Yes, I think that should work with the existing bpf progs.
I suspect putting 'dst_port' first and then followed by 'unused'
may be easier. That will also serve as a natural doc for the
current behavior (the value is in the lower 16 bits).

It can be extended to bpf_sk_lookup? bpf_sk_lookup can read at any
offset of these 4 bytes, so may need to read 0 during
convert_ctx_accesses?

2022-01-28 12:20:25

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH bpf-next] bpf: Add document for 'dst_port' of 'struct bpf_sock'

On Wed, Jan 26, 2022 at 12:53 AM CET, Martin KaFai Lau wrote:
> On Tue, Jan 25, 2022 at 03:02:37PM -0800, Alexei Starovoitov wrote:
>> On Tue, Jan 25, 2022 at 2:45 PM Martin KaFai Lau <[email protected]> wrote:
>> >
>> > On Tue, Jan 25, 2022 at 08:24:27PM +0100, Jakub Sitnicki wrote:
>> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> > > > index b0383d371b9a..891a182a749a 100644
>> > > > --- a/include/uapi/linux/bpf.h
>> > > > +++ b/include/uapi/linux/bpf.h
>> > > > @@ -5500,7 +5500,11 @@ struct bpf_sock {
>> > > > __u32 src_ip4;
>> > > > __u32 src_ip6[4];
>> > > > __u32 src_port; /* host byte order */
>> > > > - __u32 dst_port; /* network byte order */
>> > > > + __u32 dst_port; /* low 16-bits are in network byte order,
>> > > > + * and high 16-bits are filled by 0.
>> > > > + * So the real port in host byte order is
>> > > > + * bpf_ntohs((__u16)dst_port).
>> > > > + */
>> > > > __u32 dst_ip4;
>> > > > __u32 dst_ip6[4];
>> > > > __u32 state;
>> > >
>> > > I'm probably missing something obvious, but is there anything stopping
>> > > us from splitting the field, so that dst_ports is 16-bit wide?
>> > >
>> > > I gave a quick check to the change below and it seems to pass verifier
>> > > checks and sock_field tests.
>> > >
>> > > IDK, just an idea. Didn't give it a deeper thought.
>> > >
>> > > --8<--
>> > >
>> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> > > index 4a2f7041ebae..344d62ccafba 100644
>> > > --- a/include/uapi/linux/bpf.h
>> > > +++ b/include/uapi/linux/bpf.h
>> > > @@ -5574,7 +5574,8 @@ struct bpf_sock {
>> > > __u32 src_ip4;
>> > > __u32 src_ip6[4];
>> > > __u32 src_port; /* host byte order */
>> > > - __u32 dst_port; /* network byte order */
>> > > + __u16 unused;
>> > > + __u16 dst_port; /* network byte order */
>> > This will break the existing bpf prog.
>>
>> I think Jakub's idea is partially expressed:
>> + case offsetof(struct bpf_sock, dst_port):
>> + bpf_ctx_record_field_size(info, sizeof(__u16));
>> + return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16));
>>
>> Either 'unused' needs to be after dst_port or
>> bpf_sock_is_valid_access() needs to allow offset at 'unused'
>> and at 'dst_port'.
>> And allow u32 access though the size is actually u16.
>> Then the existing bpf progs (without recompiling) should work?
> Yes, I think that should work with the existing bpf progs.
> I suspect putting 'dst_port' first and then followed by 'unused'
> may be easier. That will also serve as a natural doc for the
> current behavior (the value is in the lower 16 bits).

You're right. I can't count. Now fixed in [1].

>
> It can be extended to bpf_sk_lookup? bpf_sk_lookup can read at any
> offset of these 4 bytes, so may need to read 0 during
> convert_ctx_accesses?

Let's see what the feedback to [1] will be.

[1] https://lore.kernel.org/bpf/[email protected]/T/#t