Hi Sriram Yagnaraman,
we noticed that two newly added tests failed in our test environment.
Could you tell us what dependencies and requirements are needed to run them?
Thanks a lot!
Hello,
kernel test robot noticed "kernel-selftests.net.fib_tests.sh.fail" on:
commit: 8ae9efb859c05a54ac92b3336c6ca0597c9c8cdb ("selftests: fib_tests: Add multipath list receive tests")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:
group: net
compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/202309191658.c00d8b8-oliver.sang@intel.com
# timeout set to 1500
# selftests: net: fib_tests.sh
#
# Single path route test
# Start point
# TEST: IPv4 fibmatch [ OK ]
# TEST: IPv6 fibmatch [ OK ]
# Nexthop device deleted
# TEST: IPv4 fibmatch - no route [ OK ]
# TEST: IPv6 fibmatch - no route [ OK ]
...
#
# Fib6 garbage collection test
# TEST: ipv6 route garbage collection [ OK ]
#
# IPv4 multipath list receive tests
# TEST: Multipath route hit ratio (.06) [FAIL]
#
# IPv6 multipath list receive tests
# TEST: Multipath route hit ratio (.10) [FAIL]
#
# Tests passed: 223
# Tests failed: 2
not ok 17 selftests: net: fib_tests.sh # exit=1
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230919/202309191658.c00d8b8-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
CC: Ido, who helped a lot with writing these tests.
> -----Original Message-----
> From: kernel test robot <[email protected]>
> Sent: Tuesday, 19 September 2023 10:32
> To: Sriram Yagnaraman <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; David
> S. Miller <[email protected]>; [email protected];
> [email protected]
> Subject: [linus:master] [selftests] 8ae9efb859: kernel-
> selftests.net.fib_tests.sh.fail
>
>
> hi, Sriram Yagnaraman,
>
> we noticed two new added tests failed in our test environment.
> want to consult with you what's the dependency and requirement to run
> them?
> Thanks a lot!
Sorry for the delayed response. I will look at this and get back.
I am not an expert with lkp-tests but will try to set it up on my local environment and reproduce the problem.
On Mon, Sep 25, 2023 at 06:18:34PM +0000, Sriram Yagnaraman wrote:
> CC: Ido, who helped a lot with writing these tests.
> > # IPv4 multipath list receive tests
> > # TEST: Multipath route hit ratio (.06) [FAIL]
> > #
> > # IPv6 multipath list receive tests
> > # TEST: Multipath route hit ratio (.10) [FAIL]
I found two possible problems. The first is that in the IPv4 case we
might get more trace point hits than packets (ratio higher than 1)
because of the additional FIB lookups for source validation. Fixed by
disabling source validation:
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index e7d2a530618a..66d0db7a2614 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -2437,6 +2437,9 @@ ipv4_mpath_list_test()
 	run_cmd "ip -n ns2 route add 203.0.113.0/24
 		nexthop via 172.16.201.2 nexthop via 172.16.202.2"
 	run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.fib_multipath_hash_policy=1"
+	run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.veth2.rp_filter=0"
+	run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.all.rp_filter=0"
+	run_cmd "ip netns exec ns2 sysctl -qw net.ipv4.conf.default.rp_filter=0"
 	set +e
 
 	local dmac=$(ip -n ns2 -j link show dev veth2 | jq -r '.[]["address"]')
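For context on why the hunk above clears rp_filter on both "all" and "veth2": per the kernel's ip-sysctl documentation, source validation on an interface uses the maximum of conf/all/rp_filter and conf/<interface>/rp_filter, so clearing only one of the two may leave validation enabled. A minimal shell sketch of that rule follows. The helper name effective_rp_filter is hypothetical and is not part of fib_tests.sh.

```shell
#!/bin/sh
# effective_rp_filter ALL_VALUE IFACE_VALUE
# Hypothetical helper: prints the rp_filter mode that applies on an
# interface, i.e. the larger of net.ipv4.conf.all.rp_filter and
# net.ipv4.conf.<iface>.rp_filter.
effective_rp_filter() {
    all=$1
    iface=$2
    if [ "$all" -gt "$iface" ]; then
        echo "$all"
    else
        echo "$iface"
    fi
}

# Possible use inside the test namespace (needs root and an existing ns2):
#   all=$(ip netns exec ns2 sysctl -n net.ipv4.conf.all.rp_filter)
#   dev=$(ip netns exec ns2 sysctl -n net.ipv4.conf.veth2.rp_filter)
#   effective_rp_filter "$all" "$dev"

effective_rp_filter 1 0   # "all" alone keeps source validation on
effective_rp_filter 0 0   # validation is off only when both are 0
```

With the values above, the first call reports mode 1 and the second reports mode 0, which is the state the patch aims for.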
The second problem (which I believe is the one you encountered) is that
we might miss certain trace point hits if they happen from the ksoftirqd
task instead of the mausezahn task. Fixed by:
@@ -2449,7 +2452,7 @@ ipv4_mpath_list_test()
 	# words, the FIB lookup tracepoint needs to be triggered for every
 	# packet.
 	local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
-	run_cmd "perf stat -e fib:fib_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
+	run_cmd "perf stat -a -e fib:fib_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
 	local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
 	local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
 	list_rcv_eval $tmp_file $diff
@@ -2494,7 +2497,7 @@ ipv6_mpath_list_test()
 	# words, the FIB lookup tracepoint needs to be triggered for every
 	# packet.
 	local t0_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
-	run_cmd "perf stat -e fib6:fib6_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
+	run_cmd "perf stat -a -e fib6:fib6_table_lookup --filter 'err == 0' -j -o $tmp_file -- $cmd"
 	local t1_rx_pkts=$(link_stats_get ns2 veth2 rx packets)
 	local diff=$(echo $t1_rx_pkts - $t0_rx_pkts | bc -l)
 	list_rcv_eval $tmp_file $diff
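To illustrate what the "hit ratio" in the test output means: it is the number of tracepoint hits counted by perf divided by the number of packets received on veth2, so hits that are not attributed to the measured task pull the ratio toward 0. The sketch below is not the list_rcv_eval implementation from fib_tests.sh. The function name hit_ratio_ok, the 0.95 pass threshold, and the sample counts are assumptions chosen only for illustration.

```shell
#!/bin/sh
# hit_ratio_ok HITS PACKETS [THRESHOLD]
# Hypothetical helper: prints HITS/PACKETS with two decimals and returns
# 0 if the ratio reaches THRESHOLD, 1 if it does not, 2 on bad input.
# The default threshold of 0.95 is an assumption, not taken from the test.
hit_ratio_ok() {
    hits=$1
    pkts=$2
    min=${3:-0.95}
    awk -v h="$hits" -v p="$pkts" -v m="$min" 'BEGIN {
        if (p <= 0) exit 2          # avoid dividing by zero
        r = h / p
        printf "%.2f\n", r
        if (r >= m) exit 0
        exit 1
    }'
}

# Illustrative counts only: most lookups missed by the counter ...
hit_ratio_ok 6 100 || echo "ratio too low"
# ... versus nearly every packet producing a counted lookup.
hit_ratio_ok 99 100 && echo "ratio ok"
```

A ratio well below 1 with no packet loss suggests the lookups happened but were not counted, which is consistent with the ksoftirqd explanation given above.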
Ran both tests in a loop:
# for i in $(seq 1 20); do ./fib_tests.sh -t ipv4_mpath_list; done
# for i in $(seq 1 20); do ./fib_tests.sh -t ipv6_mpath_list; done
And verified that the results are stable. Also verified that the tests
reliably fail when reverting both fixes:
8423be8926aa ipv6: ignore dst hint for multipath routes
6ac66cb03ae3 ipv4: ignore dst hint for multipath routes
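For anyone repeating the stability check, the loops above print each run's result but do not tally failures. A small wrapper along these lines could do that. It is only a sketch: the name run_repeated is hypothetical, and it relies on the command exiting non-zero on failure, as fib_tests.sh does in the report above (exit=1).

```shell
#!/bin/sh
# run_repeated COUNT CMD [ARGS...]
# Hypothetical helper: runs CMD COUNT times, discards its output, prints
# how many runs failed, and returns 0 only if every run succeeded.
run_repeated() {
    count=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures/$count runs failed"
    [ "$failures" -eq 0 ]
}

# Possible use (needs root and a kernel tree with the selftests):
#   run_repeated 20 ./fib_tests.sh -t ipv4_mpath_list
#   run_repeated 20 ./fib_tests.sh -t ipv6_mpath_list
```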
Can you please test with the proposed modifications?
Thanks
Hi Ido Schimmel,
On Sun, Oct 01, 2023 at 05:50:20PM +0300, Ido Schimmel wrote:
> Can you please test with the proposed modifications?
We applied the above patches on top of 8ae9efb859, and both tests now pass:
# IPv4 multipath list receive tests
# TEST: Multipath route hit ratio (.99) [ OK ]
#
# IPv6 multipath list receive tests
# TEST: Multipath route hit ratio (1.00) [ OK ]
#
# Tests passed: 225
# Tests failed: 0
ok 17 selftests: net: fib_tests.sh
Tested-by: kernel test robot <[email protected]>