2023-07-20 06:08:21

by Mirsad Todorovac

[permalink] [raw]
Subject: [PROBLEM] seltests: net/forwarding/sch_ets.sh [HANG]

Hi,

Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
stop at the exact same line on both boxes (os I reckon it is more than an
accident):

# selftests: net/forwarding: sch_ets.sh
# TEST: ping vlan 10 [ OK ]
# TEST: ping vlan 11 [ OK ]
# TEST: ping vlan 12 [ OK ]
# Running in priomap mode
# Testing ets bands 3 strict 3, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 3 strict 3, streams 1 2
# TEST: band 1 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 2 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
# TEST: bands 1:2 [ OK ]
# INFO: Expected ratio 2.00 Measured ratio 1.99
# Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.00 Measured ratio .99
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 1.00 Measured ratio 1.00
# Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.42 Measured ratio 1.42
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 3.33 Measured ratio 3.33
# Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.60 Measured ratio 1.59
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 3.33 Measured ratio 3.33
# Testing ets bands 2 quanta 5000 2500, streams 0 1
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 2.00 Measured ratio 1.99
# Running in classifier mode
# Testing ets bands 3 strict 3, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 3 strict 3, streams 1 2
# TEST: band 1 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 2 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1

I tried to run 'set -x' enabled version standalone, but that one finished
correctly (?).

It could be something previous scripts left, but right now I don't have a clue.
I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.

Best regards,
Mirsad Todorovac


2023-07-20 11:16:07

by Petr Machata

[permalink] [raw]
Subject: Re: [PROBLEM] seltests: net/forwarding/sch_ets.sh [HANG]


Mirsad Todorovac <[email protected]> writes:

> Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
> stop at the exact same line on both boxes (os I reckon it is more than an
> accident):
>
> # selftests: net/forwarding: sch_ets.sh
> # TEST: ping vlan 10 [ OK ]
> # TEST: ping vlan 11 [ OK ]
> # TEST: ping vlan 12 [ OK ]
> # Running in priomap mode
> # Testing ets bands 3 strict 3, streams 0 1
> # TEST: band 0 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 3 strict 3, streams 1 2
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 2 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
> # TEST: band 0 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
> # TEST: bands 1:2 [ OK ]
> # INFO: Expected ratio 2.00 Measured ratio 1.99
> # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
> # TEST: bands 0:1 [ OK ]
> # INFO: Expected ratio 1.00 Measured ratio .99
> # TEST: bands 0:2 [ OK ]
> # INFO: Expected ratio 1.00 Measured ratio 1.00
> # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
> # TEST: bands 0:1 [ OK ]
> # INFO: Expected ratio 1.42 Measured ratio 1.42
> # TEST: bands 0:2 [ OK ]
> # INFO: Expected ratio 3.33 Measured ratio 3.33
> # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
> # TEST: bands 0:1 [ OK ]
> # INFO: Expected ratio 1.60 Measured ratio 1.59
> # TEST: bands 0:2 [ OK ]
> # INFO: Expected ratio 3.33 Measured ratio 3.33
> # Testing ets bands 2 quanta 5000 2500, streams 0 1
> # TEST: bands 0:1 [ OK ]
> # INFO: Expected ratio 2.00 Measured ratio 1.99
> # Running in classifier mode
> # Testing ets bands 3 strict 3, streams 0 1
> # TEST: band 0 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 3 strict 3, streams 1 2
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 2 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>
> I tried to run 'set -x' enabled version standalone, but that one finished
> correctly (?).
>
> It could be something previous scripts left, but right now I don't have a clue.
> I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.

If you run it standalone without set -x, does it finish as well? That
would imply that the reproducer needs to include the previous tests as
well.

It looks like the test is stuck in ets_test_mixed in classifier_mode.
A way to run just this test would be:

TESTS="classifier_mode ets_test_mixed" ./sch_ets.sh

Looking at the code, the only place that I can see that waits on
anything is the "{ kill %% && wait %%; } 2>/dev/null" line in
stop_traffic() (in lib.sh). Maybe something like this would let
us see if that's the case:

modified tools/testing/selftests/net/forwarding/lib.sh
@@ -1468,8 +1470,10 @@ start_tcp_traffic()

stop_traffic()
{
+ echo killing MZ
# Suppress noise from killing mausezahn.
{ kill %% && wait %%; } 2>/dev/null
+ echo killed MZ
}

2023-07-20 16:33:38

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: [PROBLEM] seltests: net/forwarding/sch_ets.sh [HANG]

On 7/20/23 11:43, Petr Machata wrote:
>
> Mirsad Todorovac <[email protected]> writes:
>
>> Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
>> stop at the exact same line on both boxes (os I reckon it is more than an
>> accident):
>>
>> # selftests: net/forwarding: sch_ets.sh
>> # TEST: ping vlan 10 [ OK ]
>> # TEST: ping vlan 11 [ OK ]
>> # TEST: ping vlan 12 [ OK ]
>> # Running in priomap mode
>> # Testing ets bands 3 strict 3, streams 0 1
>> # TEST: band 0 [ OK ]
>> # INFO: Expected ratio >95% Measured ratio 100.00
>> # TEST: band 1 [ OK ]
>> # INFO: Expected ratio <5% Measured ratio 0
>> # Testing ets bands 3 strict 3, streams 1 2
>> # TEST: band 1 [ OK ]
>> # INFO: Expected ratio >95% Measured ratio 100.00
>> # TEST: band 2 [ OK ]
>> # INFO: Expected ratio <5% Measured ratio 0
>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>> # TEST: band 0 [ OK ]
>> # INFO: Expected ratio >95% Measured ratio 100.00
>> # TEST: band 1 [ OK ]
>> # INFO: Expected ratio <5% Measured ratio 0
>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>> # TEST: bands 1:2 [ OK ]
>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>> # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>> # TEST: bands 0:1 [ OK ]
>> # INFO: Expected ratio 1.00 Measured ratio .99
>> # TEST: bands 0:2 [ OK ]
>> # INFO: Expected ratio 1.00 Measured ratio 1.00
>> # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>> # TEST: bands 0:1 [ OK ]
>> # INFO: Expected ratio 1.42 Measured ratio 1.42
>> # TEST: bands 0:2 [ OK ]
>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>> # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>> # TEST: bands 0:1 [ OK ]
>> # INFO: Expected ratio 1.60 Measured ratio 1.59
>> # TEST: bands 0:2 [ OK ]
>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>> # Testing ets bands 2 quanta 5000 2500, streams 0 1
>> # TEST: bands 0:1 [ OK ]
>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>> # Running in classifier mode
>> # Testing ets bands 3 strict 3, streams 0 1
>> # TEST: band 0 [ OK ]
>> # INFO: Expected ratio >95% Measured ratio 100.00
>> # TEST: band 1 [ OK ]
>> # INFO: Expected ratio <5% Measured ratio 0
>> # Testing ets bands 3 strict 3, streams 1 2
>> # TEST: band 1 [ OK ]
>> # INFO: Expected ratio >95% Measured ratio 100.00
>> # TEST: band 2 [ OK ]
>> # INFO: Expected ratio <5% Measured ratio 0
>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>
>> I tried to run 'set -x' enabled version standalone, but that one finished
>> correctly (?).
>>
>> It could be something previous scripts left, but right now I don't have a clue.
>> I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.
>
> If you run it standalone without set -x, does it finish as well?

Added that. Yes, standlone run finishes correctly, with or without 'set -x':

root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding# ./sch_ets.sh
TEST: ping vlan 10 [ OK ]
TEST: ping vlan 11 [ OK ]
TEST: ping vlan 12 [ OK ]
Running in priomap mode
Testing ets bands 3 strict 3, streams 0 1
TEST: band 0 [ OK ]
INFO: Expected ratio >95% Measured ratio 100.00
TEST: band 1 [ OK ]
INFO: Expected ratio <5% Measured ratio 0
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 3 strict 3, streams 1 2
TEST: band 1 [ OK ]
INFO: Expected ratio >95% Measured ratio 100.00
TEST: band 2 [ OK ]
INFO: Expected ratio <5% Measured ratio 0
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
TEST: band 0 [ OK ]
INFO: Expected ratio >95% Measured ratio 100.00
TEST: band 1 [ OK ]
INFO: Expected ratio <5% Measured ratio 0
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
TEST: bands 1:2 [ OK ]
INFO: Expected ratio 2.00 Measured ratio 1.99
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
TEST: bands 0:1 [ OK ]
INFO: Expected ratio 1.00 Measured ratio 1.00
TEST: bands 0:2 [ OK ]
INFO: Expected ratio 1.00 Measured ratio .99
killing MZ
killed MZ
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
TEST: bands 0:1 [ OK ]
INFO: Expected ratio 1.42 Measured ratio 1.42
TEST: bands 0:2 [ OK ]
INFO: Expected ratio 3.33 Measured ratio 3.33
killing MZ
killed MZ
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
TEST: bands 0:1 [ OK ]
INFO: Expected ratio 1.60 Measured ratio 1.59
TEST: bands 0:2 [ OK ]
INFO: Expected ratio 3.33 Measured ratio 3.33
killing MZ
killed MZ
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 2 quanta 5000 2500, streams 0 1
TEST: bands 0:1 [ OK ]
INFO: Expected ratio 2.00 Measured ratio 1.99
killing MZ
killed MZ
killing MZ
killed MZ
Running in classifier mode
Testing ets bands 3 strict 3, streams 0 1
TEST: band 0 [ OK ]
INFO: Expected ratio >95% Measured ratio 100.00
TEST: band 1 [ OK ]
INFO: Expected ratio <5% Measured ratio 0
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 3 strict 3, streams 1 2
TEST: band 1 [ OK ]
INFO: Expected ratio >95% Measured ratio 100.00
TEST: band 2 [ OK ]
INFO: Expected ratio <5% Measured ratio 0
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
TEST: band 0 [ OK ]
INFO: Expected ratio >95% Measured ratio 100.00
TEST: band 1 [ OK ]
INFO: Expected ratio <5% Measured ratio 0
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
TEST: bands 1:2 [ OK ]
INFO: Expected ratio 2.00 Measured ratio 1.99
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
TEST: bands 0:1 [ OK ]
INFO: Expected ratio 1.00 Measured ratio .99
TEST: bands 0:2 [ OK ]
INFO: Expected ratio 1.00 Measured ratio .99
killing MZ
killed MZ
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
TEST: bands 0:1 [ OK ]
INFO: Expected ratio 1.42 Measured ratio 1.42
TEST: bands 0:2 [ OK ]
INFO: Expected ratio 3.33 Measured ratio 3.33
killing MZ
killed MZ
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
TEST: bands 0:1 [ OK ]
INFO: Expected ratio 1.60 Measured ratio 1.60
TEST: bands 0:2 [ OK ]
INFO: Expected ratio 3.33 Measured ratio 3.33
killing MZ
killed MZ
killing MZ
killed MZ
killing MZ
killed MZ
Testing ets bands 2 quanta 5000 2500, streams 0 1
TEST: bands 0:1 [ OK ]
INFO: Expected ratio 2.00 Measured ratio 2.00
killing MZ
killed MZ
killing MZ
killed MZ
root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding#

> That would imply that the reproducer needs to include the previous tests as
> well.

This is entirely possible, as timeouts and CTRL+C events do not seem to be caught
and the cleanup is not done ...

sch_ets_core.sh: trap cleanup EXIT

Some tests time out even after settings:timeout=240, so IMHO this should be taken into account.

Best regards,
Mirsad Todorovac

> It looks like the test is stuck in ets_test_mixed in classifier_mode.
> A way to run just this test would be:
>
> TESTS="classifier_mode ets_test_mixed" ./sch_ets.sh
>
> Looking at the code, the only place that I can see that waits on
> anything is the "{ kill %% && wait %%; } 2>/dev/null" line in
> stop_traffic() (in lib.sh). Maybe something like this would let
> us see if that's the case:
>
> modified tools/testing/selftests/net/forwarding/lib.sh
> @@ -1468,8 +1470,10 @@ start_tcp_traffic()
>
> stop_traffic()
> {
> + echo killing MZ
> # Suppress noise from killing mausezahn.
> { kill %% && wait %%; } 2>/dev/null
> + echo killed MZ
> }


2023-07-20 16:35:56

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: [PROBLEM] seltests: net/forwarding/sch_ets.sh [HANG]

On 7/20/23 18:07, Mirsad Todorovac wrote:
> On 7/20/23 11:43, Petr Machata wrote:
>>
>> Mirsad Todorovac <[email protected]> writes:
>>
>>> Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
>>> stop at the exact same line on both boxes (os I reckon it is more than an
>>> accident):
>>>
>>> # selftests: net/forwarding: sch_ets.sh
>>> # TEST: ping vlan 10                                                  [ OK ]
>>> # TEST: ping vlan 11                                                  [ OK ]
>>> # TEST: ping vlan 12                                                  [ OK ]
>>> # Running in priomap mode
>>> # Testing ets bands 3 strict 3, streams 0 1
>>> # TEST: band 0                                                        [ OK ]
>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>> # TEST: band 1                                                        [ OK ]
>>> # INFO: Expected ratio <5% Measured ratio 0
>>> # Testing ets bands 3 strict 3, streams 1 2
>>> # TEST: band 1                                                        [ OK ]
>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>> # TEST: band 2                                                        [ OK ]
>>> # INFO: Expected ratio <5% Measured ratio 0
>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>> # TEST: band 0                                                        [ OK ]
>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>> # TEST: band 1                                                        [ OK ]
>>> # INFO: Expected ratio <5% Measured ratio 0
>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>>> # TEST: bands 1:2                                                     [ OK ]
>>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>>> # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>>> # TEST: bands 0:1                                                     [ OK ]
>>> # INFO: Expected ratio 1.00 Measured ratio .99
>>> # TEST: bands 0:2                                                     [ OK ]
>>> # INFO: Expected ratio 1.00 Measured ratio 1.00
>>> # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>>> # TEST: bands 0:1                                                     [ OK ]
>>> # INFO: Expected ratio 1.42 Measured ratio 1.42
>>> # TEST: bands 0:2                                                     [ OK ]
>>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>>> # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>>> # TEST: bands 0:1                                                     [ OK ]
>>> # INFO: Expected ratio 1.60 Measured ratio 1.59
>>> # TEST: bands 0:2                                                     [ OK ]
>>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>>> # Testing ets bands 2 quanta 5000 2500, streams 0 1
>>> # TEST: bands 0:1                                                     [ OK ]
>>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>>> # Running in classifier mode
>>> # Testing ets bands 3 strict 3, streams 0 1
>>> # TEST: band 0                                                        [ OK ]
>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>> # TEST: band 1                                                        [ OK ]
>>> # INFO: Expected ratio <5% Measured ratio 0
>>> # Testing ets bands 3 strict 3, streams 1 2
>>> # TEST: band 1                                                        [ OK ]
>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>> # TEST: band 2                                                        [ OK ]
>>> # INFO: Expected ratio <5% Measured ratio 0
>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>>
>>> I tried to run 'set -x' enabled version standalone, but that one finished
>>> correctly (?).
>>>
>>> It could be something previous scripts left, but right now I don't have a clue.
>>> I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.
>>
>> If you run it standalone without set -x, does it finish as well?
>
> Added that. Yes, standlone run finishes correctly, with or without 'set -x':
>
> root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding# ./sch_ets.sh
> TEST: ping vlan 10                                                  [ OK ]
> TEST: ping vlan 11                                                  [ OK ]
> TEST: ping vlan 12                                                  [ OK ]
> Running in priomap mode
> Testing ets bands 3 strict 3, streams 0 1
> TEST: band 0                                                        [ OK ]
> INFO: Expected ratio >95% Measured ratio 100.00
> TEST: band 1                                                        [ OK ]
> INFO: Expected ratio <5% Measured ratio 0
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 3 strict 3, streams 1 2
> TEST: band 1                                                        [ OK ]
> INFO: Expected ratio >95% Measured ratio 100.00
> TEST: band 2                                                        [ OK ]
> INFO: Expected ratio <5% Measured ratio 0
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
> TEST: band 0                                                        [ OK ]
> INFO: Expected ratio >95% Measured ratio 100.00
> TEST: band 1                                                        [ OK ]
> INFO: Expected ratio <5% Measured ratio 0
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
> TEST: bands 1:2                                                     [ OK ]
> INFO: Expected ratio 2.00 Measured ratio 1.99
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
> TEST: bands 0:1                                                     [ OK ]
> INFO: Expected ratio 1.00 Measured ratio 1.00
> TEST: bands 0:2                                                     [ OK ]
> INFO: Expected ratio 1.00 Measured ratio .99
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
> TEST: bands 0:1                                                     [ OK ]
> INFO: Expected ratio 1.42 Measured ratio 1.42
> TEST: bands 0:2                                                     [ OK ]
> INFO: Expected ratio 3.33 Measured ratio 3.33
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
> TEST: bands 0:1                                                     [ OK ]
> INFO: Expected ratio 1.60 Measured ratio 1.59
> TEST: bands 0:2                                                     [ OK ]
> INFO: Expected ratio 3.33 Measured ratio 3.33
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 2 quanta 5000 2500, streams 0 1
> TEST: bands 0:1                                                     [ OK ]
> INFO: Expected ratio 2.00 Measured ratio 1.99
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Running in classifier mode
> Testing ets bands 3 strict 3, streams 0 1
> TEST: band 0                                                        [ OK ]
> INFO: Expected ratio >95% Measured ratio 100.00
> TEST: band 1                                                        [ OK ]
> INFO: Expected ratio <5% Measured ratio 0
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 3 strict 3, streams 1 2
> TEST: band 1                                                        [ OK ]
> INFO: Expected ratio >95% Measured ratio 100.00
> TEST: band 2                                                        [ OK ]
> INFO: Expected ratio <5% Measured ratio 0
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
> TEST: band 0                                                        [ OK ]
> INFO: Expected ratio >95% Measured ratio 100.00
> TEST: band 1                                                        [ OK ]
> INFO: Expected ratio <5% Measured ratio 0
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
> TEST: bands 1:2                                                     [ OK ]
> INFO: Expected ratio 2.00 Measured ratio 1.99
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
> TEST: bands 0:1                                                     [ OK ]
> INFO: Expected ratio 1.00 Measured ratio .99
> TEST: bands 0:2                                                     [ OK ]
> INFO: Expected ratio 1.00 Measured ratio .99
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
> TEST: bands 0:1                                                     [ OK ]
> INFO: Expected ratio 1.42 Measured ratio 1.42
> TEST: bands 0:2                                                     [ OK ]
> INFO: Expected ratio 3.33 Measured ratio 3.33
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
> TEST: bands 0:1                                                     [ OK ]
> INFO: Expected ratio 1.60 Measured ratio 1.60
> TEST: bands 0:2                                                     [ OK ]
> INFO: Expected ratio 3.33 Measured ratio 3.33
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> Testing ets bands 2 quanta 5000 2500, streams 0 1
> TEST: bands 0:1                                                     [ OK ]
> INFO: Expected ratio 2.00 Measured ratio 2.00
> killing MZ
> killed MZ
> killing MZ
> killed MZ
> root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding#
>
>> That would imply that the reproducer needs to include the previous tests as
>> well.
>
> This is entirely possible, as timeouts and CTRL+C events do not seem to be caught
> and the cleanup is not done ...
>
> sch_ets_core.sh:    trap cleanup EXIT
>
> Some tests time out even after settings:timeout=240, so IMHO this should be taken into account.
>
> Best regards,
> Mirsad Todorovac
>
>> It looks like the test is stuck in ets_test_mixed in classifier_mode.
>> A way to run just this test would be:
>>
>>      TESTS="classifier_mode ets_test_mixed" ./sch_ets.sh
>>
>> Looking at the code, the only place that I can see that waits on
>> anything is the "{ kill %% && wait %%; } 2>/dev/null" line in
>> stop_traffic() (in lib.sh). Maybe something like this would let
>> us see if that's the case:
>>
>> modified   tools/testing/selftests/net/forwarding/lib.sh
>> @@ -1468,8 +1470,10 @@ start_tcp_traffic()
>>   stop_traffic()
>>   {
>> +    echo killing MZ
>>       # Suppress noise from killing mausezahn.
>>       { kill %% && wait %%; } 2>/dev/null
>> +    echo killed MZ
>>   }

FYI, this is the [incomplete] list of the tests that time out even after
being assigned long timeout of 240s instead of default.

marvin@defiant:~/linux/kernel/linux_torvalds$ grep TIMEOUT ../kselftest-6.5-rc2-net-forwarding-8.log
not ok 13 selftests: net/forwarding: custom_multipath_hash.sh # TIMEOUT 240 seconds
not ok 18 selftests: net/forwarding: gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
not ok 19 selftests: net/forwarding: gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
not ok 21 selftests: net/forwarding: gre_multipath_nh_res.sh # TIMEOUT 240 seconds
not ok 22 selftests: net/forwarding: gre_multipath_nh.sh # TIMEOUT 240 seconds
not ok 27 selftests: net/forwarding: ip6gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
not ok 34 selftests: net/forwarding: ip6gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
not ok 58 selftests: net/forwarding: no_forwarding.sh # TIMEOUT 240 seconds
not ok 67 selftests: net/forwarding: router_mpath_nh_res.sh # TIMEOUT 240 seconds
not ok 68 selftests: net/forwarding: router_mpath_nh.sh # TIMEOUT 240 seconds
not ok 70 selftests: net/forwarding: router_multipath.sh # TIMEOUT 240 seconds
marvin@defiant:~/linux/kernel/linux_torvalds$

Hope this helps.

Best regards,
Mirsad Todorovac

2023-07-21 20:33:52

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: [PROBLEM] [FULL TEST] seltests: net/forwarding/sch_ets.sh [HANG]

Hi,

On 7/20/23 18:07, Mirsad Todorovac wrote:
> On 7/20/23 11:43, Petr Machata wrote:
>>
>> Mirsad Todorovac <[email protected]> writes:
>>
>>> Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
>>> stop at the exact same line on both boxes (os I reckon it is more than an
>>> accident):
>>>
>>> # selftests: net/forwarding: sch_ets.sh

I have ran the full test (the net/forwarding part), with only sch_ets.sh being enabled
the 'set -x' option, as it was the most interesting part.

This is my best effort at diagnosing the situation.

The hang appears on an unsuspecting command:

[...]
# ++ local if_name=veth0
# ++ local add_del=del
# ++ local array
# ++ shift
# ++ shift
# ++ array=("${@}")
# ++ ip link set dev veth0 down
# ++ ip link set dev veth0 nomaster
# ++ vrf_destroy vveth0
# ++ local vrf_name=vveth0
# ++ local tb_id
# ++ __vrf_td_id_lookup vveth0
# ++ local vrf_name=vveth0
# ++ return 1
# ++ tb_id=1
# ++ ip -6 route del table 1 unreachable default metric 4278198272
# ++ ip -4 route del table 1 unreachable default metric 4278198272
# ++ ip link del dev vveth0
# ++ vrf_cleanup
# ++ ip -6 rule add pref 0 table local
# ++ ip -6 rule del pref 32765
# ++ ip -4 rule add pref 0 table local
# ++ ip -4 rule del pref 32765

Please find the full kselftest execution log attached at your convenience.

Hope this helps, because I do not really see what makes this command hang indefinitely.

Config is the same, kernel still 6.5-rc2, to test in the same conditions.

Best regards,
Mirsad Todorovac


Attachments:
kselftest-6.5-rc2-net-forwarding-9.log.xz (10.75 kB)
lshw.txt.xz (6.34 kB)
Download all attachments

2023-07-22 00:52:46

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: [PROBLEM] [ADDITIONAL DIAG] seltests: net/forwarding/sch_ets.sh [HANG]

On 7/20/23 18:25, Mirsad Todorovac wrote:
> On 7/20/23 18:07, Mirsad Todorovac wrote:
>> On 7/20/23 11:43, Petr Machata wrote:
>>>
>>> Mirsad Todorovac <[email protected]> writes:
>>>
>>>> Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
>>>> stop at the exact same line on both boxes (os I reckon it is more than an
>>>> accident):
>>>>
>>>> # selftests: net/forwarding: sch_ets.sh
>>>> # TEST: ping vlan 10                                                  [ OK ]
>>>> # TEST: ping vlan 11                                                  [ OK ]
>>>> # TEST: ping vlan 12                                                  [ OK ]
>>>> # Running in priomap mode
>>>> # Testing ets bands 3 strict 3, streams 0 1
>>>> # TEST: band 0                                                        [ OK ]
>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>> # TEST: band 1                                                        [ OK ]
>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>> # Testing ets bands 3 strict 3, streams 1 2
>>>> # TEST: band 1                                                        [ OK ]
>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>> # TEST: band 2                                                        [ OK ]
>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>>> # TEST: band 0                                                        [ OK ]
>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>> # TEST: band 1                                                        [ OK ]
>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>>>> # TEST: bands 1:2                                                     [ OK ]
>>>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>>>> # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>>>> # TEST: bands 0:1                                                     [ OK ]
>>>> # INFO: Expected ratio 1.00 Measured ratio .99
>>>> # TEST: bands 0:2                                                     [ OK ]
>>>> # INFO: Expected ratio 1.00 Measured ratio 1.00
>>>> # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>>>> # TEST: bands 0:1                                                     [ OK ]
>>>> # INFO: Expected ratio 1.42 Measured ratio 1.42
>>>> # TEST: bands 0:2                                                     [ OK ]
>>>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>>>> # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>>>> # TEST: bands 0:1                                                     [ OK ]
>>>> # INFO: Expected ratio 1.60 Measured ratio 1.59
>>>> # TEST: bands 0:2                                                     [ OK ]
>>>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>>>> # Testing ets bands 2 quanta 5000 2500, streams 0 1
>>>> # TEST: bands 0:1                                                     [ OK ]
>>>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>>>> # Running in classifier mode
>>>> # Testing ets bands 3 strict 3, streams 0 1
>>>> # TEST: band 0                                                        [ OK ]
>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>> # TEST: band 1                                                        [ OK ]
>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>> # Testing ets bands 3 strict 3, streams 1 2
>>>> # TEST: band 1                                                        [ OK ]
>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>> # TEST: band 2                                                        [ OK ]
>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>>>
>>>> I tried to run 'set -x' enabled version standalone, but that one finished
>>>> correctly (?).
>>>>
>>>> It could be something previous scripts left, but right now I don't have a clue.
>>>> I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.
>>>
>>> If you run it standalone without set -x, does it finish as well?
>>
>> Added that. Yes, standlone run finishes correctly, with or without 'set -x':
>>
>> root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding# ./sch_ets.sh
>> TEST: ping vlan 10                                                  [ OK ]
>> TEST: ping vlan 11                                                  [ OK ]
>> TEST: ping vlan 12                                                  [ OK ]
>> Running in priomap mode
>> Testing ets bands 3 strict 3, streams 0 1
>> TEST: band 0                                                        [ OK ]
>> INFO: Expected ratio >95% Measured ratio 100.00
>> TEST: band 1                                                        [ OK ]
>> INFO: Expected ratio <5% Measured ratio 0
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 3 strict 3, streams 1 2
>> TEST: band 1                                                        [ OK ]
>> INFO: Expected ratio >95% Measured ratio 100.00
>> TEST: band 2                                                        [ OK ]
>> INFO: Expected ratio <5% Measured ratio 0
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>> TEST: band 0                                                        [ OK ]
>> INFO: Expected ratio >95% Measured ratio 100.00
>> TEST: band 1                                                        [ OK ]
>> INFO: Expected ratio <5% Measured ratio 0
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>> TEST: bands 1:2                                                     [ OK ]
>> INFO: Expected ratio 2.00 Measured ratio 1.99
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>> TEST: bands 0:1                                                     [ OK ]
>> INFO: Expected ratio 1.00 Measured ratio 1.00
>> TEST: bands 0:2                                                     [ OK ]
>> INFO: Expected ratio 1.00 Measured ratio .99
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>> TEST: bands 0:1                                                     [ OK ]
>> INFO: Expected ratio 1.42 Measured ratio 1.42
>> TEST: bands 0:2                                                     [ OK ]
>> INFO: Expected ratio 3.33 Measured ratio 3.33
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>> TEST: bands 0:1                                                     [ OK ]
>> INFO: Expected ratio 1.60 Measured ratio 1.59
>> TEST: bands 0:2                                                     [ OK ]
>> INFO: Expected ratio 3.33 Measured ratio 3.33
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 2 quanta 5000 2500, streams 0 1
>> TEST: bands 0:1                                                     [ OK ]
>> INFO: Expected ratio 2.00 Measured ratio 1.99
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Running in classifier mode
>> Testing ets bands 3 strict 3, streams 0 1
>> TEST: band 0                                                        [ OK ]
>> INFO: Expected ratio >95% Measured ratio 100.00
>> TEST: band 1                                                        [ OK ]
>> INFO: Expected ratio <5% Measured ratio 0
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 3 strict 3, streams 1 2
>> TEST: band 1                                                        [ OK ]
>> INFO: Expected ratio >95% Measured ratio 100.00
>> TEST: band 2                                                        [ OK ]
>> INFO: Expected ratio <5% Measured ratio 0
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>> TEST: band 0                                                        [ OK ]
>> INFO: Expected ratio >95% Measured ratio 100.00
>> TEST: band 1                                                        [ OK ]
>> INFO: Expected ratio <5% Measured ratio 0
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>> TEST: bands 1:2                                                     [ OK ]
>> INFO: Expected ratio 2.00 Measured ratio 1.99
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>> TEST: bands 0:1                                                     [ OK ]
>> INFO: Expected ratio 1.00 Measured ratio .99
>> TEST: bands 0:2                                                     [ OK ]
>> INFO: Expected ratio 1.00 Measured ratio .99
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>> TEST: bands 0:1                                                     [ OK ]
>> INFO: Expected ratio 1.42 Measured ratio 1.42
>> TEST: bands 0:2                                                     [ OK ]
>> INFO: Expected ratio 3.33 Measured ratio 3.33
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>> TEST: bands 0:1                                                     [ OK ]
>> INFO: Expected ratio 1.60 Measured ratio 1.60
>> TEST: bands 0:2                                                     [ OK ]
>> INFO: Expected ratio 3.33 Measured ratio 3.33
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> Testing ets bands 2 quanta 5000 2500, streams 0 1
>> TEST: bands 0:1                                                     [ OK ]
>> INFO: Expected ratio 2.00 Measured ratio 2.00
>> killing MZ
>> killed MZ
>> killing MZ
>> killed MZ
>> root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding#
>>
>>> That would imply that the reproducer needs to include the previous tests as
>>> well.
>>
>> This is entirely possible, as timeouts and CTRL+C events do not seem to be caught
>> and the cleanup is not done ...
>>
>> sch_ets_core.sh:    trap cleanup EXIT
>>
>> Some tests time out even after settings:timeout=240, so IMHO this should be taken into account.
>>
>> Best regards,
>> Mirsad Todorovac
>>
>>> It looks like the test is stuck in ets_test_mixed in classifier_mode.
>>> A way to run just this test would be:
>>>
>>>      TESTS="classifier_mode ets_test_mixed" ./sch_ets.sh
>>>
>>> Looking at the code, the only place that I can see that waits on
>>> anything is the "{ kill %% && wait %%; } 2>/dev/null" line in
>>> stop_traffic() (in lib.sh). Maybe something like this would let
>>> us see if that's the case:
>>>
>>> modified   tools/testing/selftests/net/forwarding/lib.sh
>>> @@ -1468,8 +1470,10 @@ start_tcp_traffic()
>>>   stop_traffic()
>>>   {
>>> +    echo killing MZ
>>>       # Suppress noise from killing mausezahn.
>>>       { kill %% && wait %%; } 2>/dev/null
>>> +    echo killed MZ
>>>   }
>
> FYI, this is the [incomplete] list of the tests that time out even after
> being assigned long timeout of 240s instead of default.
>
> marvin@defiant:~/linux/kernel/linux_torvalds$ grep TIMEOUT ../kselftest-6.5-rc2-net-forwarding-8.log
> not ok 13 selftests: net/forwarding: custom_multipath_hash.sh # TIMEOUT 240 seconds
> not ok 18 selftests: net/forwarding: gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
> not ok 19 selftests: net/forwarding: gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
> not ok 21 selftests: net/forwarding: gre_multipath_nh_res.sh # TIMEOUT 240 seconds
> not ok 22 selftests: net/forwarding: gre_multipath_nh.sh # TIMEOUT 240 seconds
> not ok 27 selftests: net/forwarding: ip6gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
> not ok 34 selftests: net/forwarding: ip6gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
> not ok 58 selftests: net/forwarding: no_forwarding.sh # TIMEOUT 240 seconds
> not ok 67 selftests: net/forwarding: router_mpath_nh_res.sh # TIMEOUT 240 seconds
> not ok 68 selftests: net/forwarding: router_mpath_nh.sh # TIMEOUT 240 seconds
> not ok 70 selftests: net/forwarding: router_multipath.sh # TIMEOUT 240 seconds
> marvin@defiant:~/linux/kernel/linux_torvalds$

The test seem to be stuck despite adding SIGTERM cleanup to all tests
that timed out:

# selftests: net/forwarding: sch_ets.sh
# TEST: ping vlan 10 [ OK ]
# TEST: ping vlan 11 [ OK ]
# TEST: ping vlan 12 [ OK ]
# Running in priomap mode
# Testing ets bands 3 strict 3, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 3 strict 3, streams 1 2
# TEST: band 1 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 2 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
# TEST: bands 1:2 [ OK ]
# INFO: Expected ratio 2.00 Measured ratio 2.00
# Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.00 Measured ratio .99
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 1.00 Measured ratio .99
# Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.42 Measured ratio 1.42
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 3.33 Measured ratio 3.33
# Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.60 Measured ratio 1.59
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 3.33 Measured ratio 3.33
# Testing ets bands 2 quanta 5000 2500, streams 0 1
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 2.00 Measured ratio 1.99
# Running in classifier mode
# Testing ets bands 3 strict 3, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 3 strict 3, streams 1 2
# TEST: band 1 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 2 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1

What is happening is a stuck mousezahn loop:

root 558266 2732 0 01:17 pts/2 00:00:23 mausezahn veth0.10 -p 8000 -A 192.0.2.1 -B 192.0.2.2 -c 0 -a own -b ba:33:37:81:dc:5
root 558273 2732 0 01:17 pts/2 00:00:23 mausezahn veth0.11 -p 8000 -A 192.0.2.17 -B 192.0.2.18 -c 0 -a own -b ba:33:37:81:dc

$ sudo strace -p 558266
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)

I am not sure if this will ever complete, or is it a runaway loop?

Probably it is stuck, for nothing was written to the test log for 71 minutes.

I hope we are coming closer to the cause.

Previously I didn't figure out what hanged.

FWIW, I will post the patches for the scripts that timed out for your review.

Kind regards,
Mirsad

2023-07-22 01:50:34

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: [PROBLEM] [ADDITIONAL DIAG] seltests: net/forwarding/sch_ets.sh [HANG]

On 7/22/23 02:31, Mirsad Todorovac wrote:
> On 7/20/23 18:25, Mirsad Todorovac wrote:
>> On 7/20/23 18:07, Mirsad Todorovac wrote:
>>> On 7/20/23 11:43, Petr Machata wrote:
>>>>
>>>> Mirsad Todorovac <[email protected]> writes:
>>>>
>>>>> Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
>>>>> stop at the exact same line on both boxes (os I reckon it is more than an
>>>>> accident):
>>>>>
>>>>> # selftests: net/forwarding: sch_ets.sh
>>>>> # TEST: ping vlan 10                                                  [ OK ]
>>>>> # TEST: ping vlan 11                                                  [ OK ]
>>>>> # TEST: ping vlan 12                                                  [ OK ]
>>>>> # Running in priomap mode
>>>>> # Testing ets bands 3 strict 3, streams 0 1
>>>>> # TEST: band 0                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 3 strict 3, streams 1 2
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 2                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>>>> # TEST: band 0                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>>>>> # TEST: bands 1:2                                                     [ OK ]
>>>>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>>>>> # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>>>>> # TEST: bands 0:1                                                     [ OK ]
>>>>> # INFO: Expected ratio 1.00 Measured ratio .99
>>>>> # TEST: bands 0:2                                                     [ OK ]
>>>>> # INFO: Expected ratio 1.00 Measured ratio 1.00
>>>>> # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>>>>> # TEST: bands 0:1                                                     [ OK ]
>>>>> # INFO: Expected ratio 1.42 Measured ratio 1.42
>>>>> # TEST: bands 0:2                                                     [ OK ]
>>>>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>>>>> # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>>>>> # TEST: bands 0:1                                                     [ OK ]
>>>>> # INFO: Expected ratio 1.60 Measured ratio 1.59
>>>>> # TEST: bands 0:2                                                     [ OK ]
>>>>> # INFO: Expected ratio 3.33 Measured ratio 3.33
>>>>> # Testing ets bands 2 quanta 5000 2500, streams 0 1
>>>>> # TEST: bands 0:1                                                     [ OK ]
>>>>> # INFO: Expected ratio 2.00 Measured ratio 1.99
>>>>> # Running in classifier mode
>>>>> # Testing ets bands 3 strict 3, streams 0 1
>>>>> # TEST: band 0                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 3 strict 3, streams 1 2
>>>>> # TEST: band 1                                                        [ OK ]
>>>>> # INFO: Expected ratio >95% Measured ratio 100.00
>>>>> # TEST: band 2                                                        [ OK ]
>>>>> # INFO: Expected ratio <5% Measured ratio 0
>>>>> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>>>>
>>>>> I tried to run 'set -x' enabled version standalone, but that one finished
>>>>> correctly (?).
>>>>>
>>>>> It could be something previous scripts left, but right now I don't have a clue.
>>>>> I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.
>>>>
>>>> If you run it standalone without set -x, does it finish as well?
>>>
>>> Added that. Yes, standlone run finishes correctly, with or without 'set -x':
>>>
>>> root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding# ./sch_ets.sh
>>> TEST: ping vlan 10                                                  [ OK ]
>>> TEST: ping vlan 11                                                  [ OK ]
>>> TEST: ping vlan 12                                                  [ OK ]
>>> Running in priomap mode
>>> Testing ets bands 3 strict 3, streams 0 1
>>> TEST: band 0                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 strict 3, streams 1 2
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 2                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>> TEST: band 0                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>>> TEST: bands 1:2                                                     [ OK ]
>>> INFO: Expected ratio 2.00 Measured ratio 1.99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.00 Measured ratio 1.00
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 1.00 Measured ratio .99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.42 Measured ratio 1.42
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 3.33 Measured ratio 3.33
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.60 Measured ratio 1.59
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 3.33 Measured ratio 3.33
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 2 quanta 5000 2500, streams 0 1
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 2.00 Measured ratio 1.99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Running in classifier mode
>>> Testing ets bands 3 strict 3, streams 0 1
>>> TEST: band 0                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 strict 3, streams 1 2
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 2                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>>> TEST: band 0                                                        [ OK ]
>>> INFO: Expected ratio >95% Measured ratio 100.00
>>> TEST: band 1                                                        [ OK ]
>>> INFO: Expected ratio <5% Measured ratio 0
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
>>> TEST: bands 1:2                                                     [ OK ]
>>> INFO: Expected ratio 2.00 Measured ratio 1.99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.00 Measured ratio .99
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 1.00 Measured ratio .99
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.42 Measured ratio 1.42
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 3.33 Measured ratio 3.33
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 1.60 Measured ratio 1.60
>>> TEST: bands 0:2                                                     [ OK ]
>>> INFO: Expected ratio 3.33 Measured ratio 3.33
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> Testing ets bands 2 quanta 5000 2500, streams 0 1
>>> TEST: bands 0:1                                                     [ OK ]
>>> INFO: Expected ratio 2.00 Measured ratio 2.00
>>> killing MZ
>>> killed MZ
>>> killing MZ
>>> killed MZ
>>> root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding#
>>>
>>>> That would imply that the reproducer needs to include the previous tests as
>>>> well.
>>>
>>> This is entirely possible, as timeouts and CTRL+C events do not seem to be caught
>>> and the cleanup is not done ...
>>>
>>> sch_ets_core.sh:    trap cleanup EXIT
>>>
>>> Some tests time out even after settings:timeout=240, so IMHO this should be taken into account.
>>>
>>> Best regards,
>>> Mirsad Todorovac
>>>
>>>> It looks like the test is stuck in ets_test_mixed in classifier_mode.
>>>> A way to run just this test would be:
>>>>
>>>>      TESTS="classifier_mode ets_test_mixed" ./sch_ets.sh
>>>>
>>>> Looking at the code, the only place that I can see that waits on
>>>> anything is the "{ kill %% && wait %%; } 2>/dev/null" line in
>>>> stop_traffic() (in lib.sh). Maybe something like this would let
>>>> us see if that's the case:
>>>>
>>>> modified   tools/testing/selftests/net/forwarding/lib.sh
>>>> @@ -1468,8 +1470,10 @@ start_tcp_traffic()
>>>>   stop_traffic()
>>>>   {
>>>> +    echo killing MZ
>>>>       # Suppress noise from killing mausezahn.
>>>>       { kill %% && wait %%; } 2>/dev/null
>>>> +    echo killed MZ
>>>>   }
>>
>> FYI, this is the [incomplete] list of the tests that time out even after
>> being assigned long timeout of 240s instead of default.
>>
>> marvin@defiant:~/linux/kernel/linux_torvalds$ grep TIMEOUT ../kselftest-6.5-rc2-net-forwarding-8.log
>> not ok 13 selftests: net/forwarding: custom_multipath_hash.sh # TIMEOUT 240 seconds
>> not ok 18 selftests: net/forwarding: gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
>> not ok 19 selftests: net/forwarding: gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
>> not ok 21 selftests: net/forwarding: gre_multipath_nh_res.sh # TIMEOUT 240 seconds
>> not ok 22 selftests: net/forwarding: gre_multipath_nh.sh # TIMEOUT 240 seconds
>> not ok 27 selftests: net/forwarding: ip6gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
>> not ok 34 selftests: net/forwarding: ip6gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
>> not ok 58 selftests: net/forwarding: no_forwarding.sh # TIMEOUT 240 seconds
>> not ok 67 selftests: net/forwarding: router_mpath_nh_res.sh # TIMEOUT 240 seconds
>> not ok 68 selftests: net/forwarding: router_mpath_nh.sh # TIMEOUT 240 seconds
>> not ok 70 selftests: net/forwarding: router_multipath.sh # TIMEOUT 240 seconds
>> marvin@defiant:~/linux/kernel/linux_torvalds$
>
> The test seem to be stuck despite adding SIGTERM cleanup to all tests
> that timed out:
>
> # selftests: net/forwarding: sch_ets.sh
> # TEST: ping vlan 10                                                  [ OK ]
> # TEST: ping vlan 11                                                  [ OK ]
> # TEST: ping vlan 12                                                  [ OK ]
> # Running in priomap mode
> # Testing ets bands 3 strict 3, streams 0 1
> # TEST: band 0                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 3 strict 3, streams 1 2
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 2                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
> # TEST: band 0                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
> # TEST: bands 1:2                                                     [ OK ]
> # INFO: Expected ratio 2.00 Measured ratio 2.00
> # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
> # TEST: bands 0:1                                                     [ OK ]
> # INFO: Expected ratio 1.00 Measured ratio .99
> # TEST: bands 0:2                                                     [ OK ]
> # INFO: Expected ratio 1.00 Measured ratio .99
> # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
> # TEST: bands 0:1                                                     [ OK ]
> # INFO: Expected ratio 1.42 Measured ratio 1.42
> # TEST: bands 0:2                                                     [ OK ]
> # INFO: Expected ratio 3.33 Measured ratio 3.33
> # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
> # TEST: bands 0:1                                                     [ OK ]
> # INFO: Expected ratio 1.60 Measured ratio 1.59
> # TEST: bands 0:2                                                     [ OK ]
> # INFO: Expected ratio 3.33 Measured ratio 3.33
> # Testing ets bands 2 quanta 5000 2500, streams 0 1
> # TEST: bands 0:1                                                     [ OK ]
> # INFO: Expected ratio 2.00 Measured ratio 1.99
> # Running in classifier mode
> # Testing ets bands 3 strict 3, streams 0 1
> # TEST: band 0                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 3 strict 3, streams 1 2
> # TEST: band 1                                                        [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 2                                                        [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>
> What is happening is a stuck mousezahn loop:
>
> root      558266    2732  0 01:17 pts/2    00:00:23 mausezahn veth0.10 -p 8000 -A 192.0.2.1 -B 192.0.2.2 -c 0 -a own -b ba:33:37:81:dc:5
> root      558273    2732  0 01:17 pts/2    00:00:23 mausezahn veth0.11 -p 8000 -A 192.0.2.17 -B 192.0.2.18 -c 0 -a own -b ba:33:37:81:dc
>
> $ sudo strace -p 558266
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
> ioctl(5, SIOCGIFINDEX, {ifr_name="veth0.10"}) = -1 ENODEV (No such device)
>
> I am not sure if this will ever complete, or is it a runaway loop?
>
> Probably it is stuck, for nothing was written to the test log for 71 minutes.
>
> I hope we are coming closer to the cause.
>
> Previously I didn't figure out what hanged.
>
> FWIW, I will post the patches for the scripts that timed out for your review.

Hi,

More homework:

I can trace the infinite mausezahn mode to here:

lib.sh
1420 __start_traffic()
1421 {
1422 local pktsize=$1; shift
1423 local proto=$1; shift
1424 local h_in=$1; shift # Where the traffic egresses the host
1425 local sip=$1; shift
1426 local dip=$1; shift
1427 local dmac=$1; shift
1428
1429 $MZ $h_in -p $pktsize -A $sip -B $dip -c 0 \
1430 -a own -b $dmac -t "$proto" -q "$@" &
1431 sleep 1
1432 }

but I don't know why in this test it runs away for "veth0.10" in an infinite loop
and is obviously nowhere killed for hours.

I seem to have no new ideas now so I guess I should take a break.

Kind regards,
Mirsad

2023-07-22 13:56:10

by Petr Machata

[permalink] [raw]
Subject: Re: [PROBLEM] [ADDITIONAL DIAG] seltests: net/forwarding/sch_ets.sh [HANG]


Mirsad Todorovac <[email protected]> writes:

> On 7/20/23 18:25, Mirsad Todorovac wrote:
>> On 7/20/23 18:07, Mirsad Todorovac wrote:
>>> root@defiant:/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/forwarding#
>>>
>>>> That would imply that the reproducer needs to include the previous tests as
>>>> well.
>>>
>>> This is entirely possible, as timeouts and CTRL+C events do not seem to be caught
>>> and the cleanup is not done ...
>>>
>>> sch_ets_core.sh:    trap cleanup EXIT
>>>
>>> Some tests time out even after settings:timeout=240, so IMHO this should be taken into account.
>>>
>>> Best regards,
>>> Mirsad Todorovac
>>>
>>>> It looks like the test is stuck in ets_test_mixed in classifier_mode.
>>>> A way to run just this test would be:
>>>>
>>>>      TESTS="classifier_mode ets_test_mixed" ./sch_ets.sh
>>>>
>>>> Looking at the code, the only place that I can see that waits on
>>>> anything is the "{ kill %% && wait %%; } 2>/dev/null" line in
>>>> stop_traffic() (in lib.sh). Maybe something like this would let
>>>> us see if that's the case:
>>>>
>>>> modified   tools/testing/selftests/net/forwarding/lib.sh
>>>> @@ -1468,8 +1470,10 @@ start_tcp_traffic()
>>>>   stop_traffic()
>>>>   {
>>>> +    echo killing MZ
>>>>       # Suppress noise from killing mausezahn.
>>>>       { kill %% && wait %%; } 2>/dev/null
>>>> +    echo killed MZ
>>>>   }
>> FYI, this is the [incomplete] list of the tests that time out even after
>> being assigned long timeout of 240s instead of default.
>> marvin@defiant:~/linux/kernel/linux_torvalds$ grep TIMEOUT
>> ../kselftest-6.5-rc2-net-forwarding-8.log
>> not ok 13 selftests: net/forwarding: custom_multipath_hash.sh # TIMEOUT 240 seconds
>> not ok 18 selftests: net/forwarding: gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
>> not ok 19 selftests: net/forwarding: gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
>> not ok 21 selftests: net/forwarding: gre_multipath_nh_res.sh # TIMEOUT 240 seconds
>> not ok 22 selftests: net/forwarding: gre_multipath_nh.sh # TIMEOUT 240 seconds
>> not ok 27 selftests: net/forwarding: ip6gre_custom_multipath_hash.sh # TIMEOUT 240 seconds
>> not ok 34 selftests: net/forwarding: ip6gre_inner_v4_multipath.sh # TIMEOUT 240 seconds
>> not ok 58 selftests: net/forwarding: no_forwarding.sh # TIMEOUT 240 seconds
>> not ok 67 selftests: net/forwarding: router_mpath_nh_res.sh # TIMEOUT 240 seconds
>> not ok 68 selftests: net/forwarding: router_mpath_nh.sh # TIMEOUT 240 seconds
>> not ok 70 selftests: net/forwarding: router_multipath.sh # TIMEOUT 240 seconds
>> marvin@defiant:~/linux/kernel/linux_torvalds$
>
> The test seem to be stuck despite adding SIGTERM cleanup to all tests
> that timed out:
>
> # selftests: net/forwarding: sch_ets.sh
> # TEST: ping vlan 10 [ OK ]
> # TEST: ping vlan 11 [ OK ]
> # TEST: ping vlan 12 [ OK ]
> # Running in priomap mode
> # Testing ets bands 3 strict 3, streams 0 1
> # TEST: band 0 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 3 strict 3, streams 1 2
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 2 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
> # TEST: band 0 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
> # TEST: bands 1:2 [ OK ]
> # INFO: Expected ratio 2.00 Measured ratio 2.00
> # Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
> # TEST: bands 0:1 [ OK ]
> # INFO: Expected ratio 1.00 Measured ratio .99
> # TEST: bands 0:2 [ OK ]
> # INFO: Expected ratio 1.00 Measured ratio .99
> # Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
> # TEST: bands 0:1 [ OK ]
> # INFO: Expected ratio 1.42 Measured ratio 1.42
> # TEST: bands 0:2 [ OK ]
> # INFO: Expected ratio 3.33 Measured ratio 3.33
> # Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
> # TEST: bands 0:1 [ OK ]
> # INFO: Expected ratio 1.60 Measured ratio 1.59
> # TEST: bands 0:2 [ OK ]
> # INFO: Expected ratio 3.33 Measured ratio 3.33
> # Testing ets bands 2 quanta 5000 2500, streams 0 1
> # TEST: bands 0:1 [ OK ]
> # INFO: Expected ratio 2.00 Measured ratio 1.99
> # Running in classifier mode
> # Testing ets bands 3 strict 3, streams 0 1
> # TEST: band 0 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 3 strict 3, streams 1 2
> # TEST: band 1 [ OK ]
> # INFO: Expected ratio >95% Measured ratio 100.00
> # TEST: band 2 [ OK ]
> # INFO: Expected ratio <5% Measured ratio 0
> # Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
>
> What is happening is a stuck mousezahn loop:
>
> root 558266 2732 0 01:17 pts/2 00:00:23 mausezahn veth0.10 -p 8000 -A 192.0.2.1 -B 192.0.2.2 -c 0 -a own -b ba:33:37:81:dc:5
> root 558273 2732 0 01:17 pts/2 00:00:23 mausezahn veth0.11 -p 8000 -A 192.0.2.17 -B 192.0.2.18 -c 0 -a own -b ba:33:37:81:dc

Sure, the test is stuck and therefore didn't reap its MZ invocations.
The test is supposed to invoke stop_traffic(), which then does { kill %%
&& wait %%; }, which should do it.

The question is, does it reach that statement? If yes, the wait is
likely where it's stuck, but why? If not, where else could it be stuck?
I don't see any other place where the test loops endlessly or waits.

> I am not sure if this will ever complete, or is it a runaway loop?

It won't, that's what the "-c 0" on the command like does.

> Probably it is stuck, for nothing was written to the test log for 71 minutes.
>
> I hope we are coming closer to the cause.
>
> Previously I didn't figure out what hanged.

The script did :)

The fact that there's mausezahn running alongside is relevant, but not
the cause.