2023-11-17 09:32:18

by Kunwu Chan

[permalink] [raw]
Subject: [PATCH] net: sched: Fix an endian bug in tcf_proto_create

net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol
net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol

Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")

Signed-off-by: Kunwu Chan <[email protected]>
---
net/sched/cls_api.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 1976bd163986..f73f39f61f66 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
goto errout;
}
tp->classify = tp->ops->classify;
- tp->protocol = protocol;
+ tp->protocol = cpu_to_be16(protocol);
tp->prio = prio;
tp->chain = chain;
spin_lock_init(&tp->lock);
--
2.34.1


2023-11-17 12:07:03

by Pedro Tammela

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix an endian bug in tcf_proto_create

On 17/11/2023 06:31, Kunwu Chan wrote:
> net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
> net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol
> net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol
>
> Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
>
> Signed-off-by: Kunwu Chan <[email protected]>
> ---
> net/sched/cls_api.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 1976bd163986..f73f39f61f66 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
> goto errout;
> }
> tp->classify = tp->ops->classify;
> - tp->protocol = protocol;
> + tp->protocol = cpu_to_be16(protocol);
> tp->prio = prio;
> tp->chain = chain;
> spin_lock_init(&tp->lock);
I don't believe there's something to fix here either

2023-11-20 10:04:50

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix an endian bug in tcf_proto_create

On Fri, Nov 17, 2023 at 09:06:45AM -0300, Pedro Tammela wrote:
> On 17/11/2023 06:31, Kunwu Chan wrote:
> > net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
> > net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol
> > net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol
> >
> > Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
> >
> > Signed-off-by: Kunwu Chan <[email protected]>
> > ---
> > net/sched/cls_api.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> > index 1976bd163986..f73f39f61f66 100644
> > --- a/net/sched/cls_api.c
> > +++ b/net/sched/cls_api.c
> > @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
> > goto errout;
> > }
> > tp->classify = tp->ops->classify;
> > - tp->protocol = protocol;
> > + tp->protocol = cpu_to_be16(protocol);
> > tp->prio = prio;
> > tp->chain = chain;
> > spin_lock_init(&tp->lock);
> I don't believe there's something to fix here either

Hi Pedro and Kunwu,

I suspect that updating the byte order of protocol isn't correct
here - else I'd assume we would have seen a user-visible bug on
little-endian systems buy now.

But nonetheless I think there is a problem, which is that the appropriate
types aren't being used, which means the tooling isn't helping us wrt any
bugs that might subsequently be added or already lurking. So I think an
appropriate question is, what is the endien and width of protocol, and how
can we use an appropriate type throughout the call-path?

2023-11-20 11:18:17

by Kunwu Chan

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix an endian bug in tcf_proto_create

Hi Simon,

Thanks for your reply.
For a lot of newcomers who aren't proficient in this part of the code,
like me, it might be confusing what is the correct endien and width of
a protocol.

In response to your question, I wonder if it is necessary to implement a
unified checking mechanism with a strict parameter validation for all
invocation parameters?

For example, add an input parameter to the 'tcf_proto_create' to
represent the endien and width of the protocol, and check the validity
of the input parameter at the beginning of the function.

I don't have a good idea of how to make sure that the right type is used
in the call path.
This is just my personal opinion, welcome to discuss.

On 2023/11/20 18:04, Simon Horman wrote:
> On Fri, Nov 17, 2023 at 09:06:45AM -0300, Pedro Tammela wrote:
>> On 17/11/2023 06:31, Kunwu Chan wrote:
>>> net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
>>> net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol
>>> net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol
>>>
>>> Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
>>>
>>> Signed-off-by: Kunwu Chan <[email protected]>
>>> ---
>>> net/sched/cls_api.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>> index 1976bd163986..f73f39f61f66 100644
>>> --- a/net/sched/cls_api.c
>>> +++ b/net/sched/cls_api.c
>>> @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
>>> goto errout;
>>> }
>>> tp->classify = tp->ops->classify;
>>> - tp->protocol = protocol;
>>> + tp->protocol = cpu_to_be16(protocol);
>>> tp->prio = prio;
>>> tp->chain = chain;
>>> spin_lock_init(&tp->lock);
>> I don't believe there's something to fix here either
>
> Hi Pedro and Kunwu,
>
> I suspect that updating the byte order of protocol isn't correct
> here - else I'd assume we would have seen a user-visible bug on
> little-endian systems buy now.
>
> But nonetheless I think there is a problem, which is that the appropriate
> types aren't being used, which means the tooling isn't helping us wrt any
> bugs that might subsequently be added or already lurking. So I think an
> appropriate question is, what is the endien and width of protocol, and how
> can we use an appropriate type throughout the call-path?

2023-11-20 15:33:49

by Pedro Tammela

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix an endian bug in tcf_proto_create

On 20/11/2023 07:04, Simon Horman wrote:
> On Fri, Nov 17, 2023 at 09:06:45AM -0300, Pedro Tammela wrote:
>> On 17/11/2023 06:31, Kunwu Chan wrote:
>>> net/sched/cls_api.c:390:22: warning: incorrect type in assignment (different base types)
>>> net/sched/cls_api.c:390:22: expected restricted __be16 [usertype] protocol
>>> net/sched/cls_api.c:390:22: got unsigned int [usertype] protocol
>>>
>>> Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
>>>
>>> Signed-off-by: Kunwu Chan <[email protected]>
>>> ---
>>> net/sched/cls_api.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>>> index 1976bd163986..f73f39f61f66 100644
>>> --- a/net/sched/cls_api.c
>>> +++ b/net/sched/cls_api.c
>>> @@ -387,7 +387,7 @@ static struct tcf_proto *tcf_proto_create(const char *kind, u32 protocol,
>>> goto errout;
>>> }
>>> tp->classify = tp->ops->classify;
>>> - tp->protocol = protocol;
>>> + tp->protocol = cpu_to_be16(protocol);
>>> tp->prio = prio;
>>> tp->chain = chain;
>>> spin_lock_init(&tp->lock);
>> I don't believe there's something to fix here either
>
> Hi Pedro and Kunwu,
>
> I suspect that updating the byte order of protocol isn't correct
> here - else I'd assume we would have seen a user-visible bug on
> little-endian systems buy now.
>
> But nonetheless I think there is a problem, which is that the appropriate
> types aren't being used, which means the tooling isn't helping us wrt any
> bugs that might subsequently be added or already lurking. So I think an
> appropriate question is, what is the endien and width of protocol, and how
> can we use an appropriate type throughout the call-path?

Agreed and I'm all in for improving any tooling integration.
I believe a better patch would be to have protocol as a be16 since it's
creation everywhere. I looked quickly and it will be a "viral" change,
meaning a couple of places will require a one line change.

2023-11-27 14:07:55

by Oliver Sang

[permalink] [raw]
Subject: Re: [PATCH] net: sched: Fix an endian bug in tcf_proto_create



Hello,

kernel test robot noticed "kernel-selftests.net.fib_tests.sh.IPv4_rp_filter_tests.rp_filter_passes_loopback_packets.fail" on:

commit: e706cea62528a99bf2a6ea5e420b9726540dde39 ("[PATCH] net: sched: Fix an endian bug in tcf_proto_create")
url: https://github.com/intel-lab-lkp/linux/commits/Kunwu-Chan/net-sched-Fix-an-endian-bug-in-tcf_proto_create/20231117-173323
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git 18de1e517ed37ebaf33e771e46faf052e966e163
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH] net: sched: Fix an endian bug in tcf_proto_create

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

group: net



compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


besides, we also noticed below tests failed upon this patch.

=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/group:
lkp-csl-d01/kernel-selftests/debian-12-x86_64-20220629.cgz/x86_64-rhel-8.3-bpf/gcc-12/net

commit:
18de1e517ed37 ("gve: add gve_features_check()")
e706cea62528a ("net: sched: Fix an endian bug in tcf_proto_create")

18de1e517ed37eba e706cea62528a99bf2a6ea5e420
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv4.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv4_multiproto_mode.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv6.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv4_packets_over_UDPv6_multiproto_mode.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv4.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv4_multiproto_mode.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv6.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.IPv6_packets_over_UDPv6_multiproto_mode.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.Unicast_MPLS_packets_over_UDPv4.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.Unicast_MPLS_packets_over_UDPv6.fail
:10 100% 10:10 kernel-selftests.net.bareudp.sh.fail
:10 100% 10:10 kernel-selftests.net.drop_monitor_tests.sh..Capturing_active_software_drops.fail
:10 100% 10:10 kernel-selftests.net.drop_monitor_tests.sh.fail
:10 100% 10:10 kernel-selftests.net.fib_tests.sh.IPv4_rp_filter_tests.rp_filter_passes_local_packets.fail
:10 100% 10:10 kernel-selftests.net.fib_tests.sh.IPv4_rp_filter_tests.rp_filter_passes_loopback_packets.fail
:10 100% 10:10 kernel-selftests.net.rtnetlink.sh.tc_htb_hierarchy.fail
:10 100% 10:10 kernel-selftests.net.test_ingress_egress_chaining.sh.fail



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]



# timeout set to 1500
# selftests: net: bareudp.sh
# TEST: IPv4 packets over UDPv4 [FAIL]
# TEST: IPv4 packets over UDPv6 [FAIL]
# TEST: IPv6 packets over UDPv4 [FAIL]
# TEST: IPv6 packets over UDPv6 [FAIL]
# TEST: IPv4 packets over UDPv4 (multiproto mode) [FAIL]
# TEST: IPv6 packets over UDPv4 (multiproto mode) [FAIL]
# TEST: IPv4 packets over UDPv6 (multiproto mode) [FAIL]
# TEST: IPv6 packets over UDPv6 (multiproto mode) [FAIL]
# TEST: Unicast MPLS packets over UDPv4 [FAIL]
# TEST: Unicast MPLS packets over UDPv6 [FAIL]
# Some tests failed.
not ok 49 selftests: net: bareudp.sh # exit=1



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231127/[email protected]



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki