2018-12-17 19:43:38

by Rafael Tinoco

Subject: selftests/net: udpgso: LTS kernels supportability ?

Shuah,

I was recently investigating some errors coming out of our functional
tests, and Dan and I got into a discussion that might not be new to
you but interests us: how to better use kselftests as a regression
mechanism/tool in our LKFT (https://lkft.linaro.org).

David / Willem,

I'm only using udpgso as an example for what I'd like to ask Shuah. Feel
free to jump into the discussion if you think it's worth it.

All,

Regarding: udpgso AND https://bugs.linaro.org/show_bug.cgi?id=3980

udpgso tests are failing in kernels below 4.18 for two main reasons:

1) On older kernels, udp4_ufo_fragment does not seem to require the GSO
SKB to be larger than the MTU (4th test case in udpgso.c).

2) setsockopt(...UDP_SEGMENT) is not supported on older kernels (the
commit "udp: generate gso with UDP_SEGMENT" and its fixes seem to be
needed).

With that explained, finally the question/discussion:

Shouldn't we enforce a versioning mechanism for tests that exercise
recently added features? I mean, some of the test cases inside the udpgso
selftest are good enough for older kernels...

But because we have no control over "kernel features" and "supported
test cases", we at Linaro end up blacklisting entire selftests that
have new-feature-oriented tests because of only one or two test cases.

This has already been solved in other functional test projects by
checking the running kernel version and deciding which test cases to
run.
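A minimal C sketch of that version-gating idea, assuming a test only needs to compare the major.minor fields of the release string reported by uname(2). The helper names are hypothetical and not part of kselftest. Stable and distribution kernels can backport features, so a check like this can misjudge what the running kernel actually supports.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Return nonzero if a release string such as "4.14.89-rc1" is at least
 * want_major.want_minor. Unparseable strings are treated as "too old". */
static int release_at_least(const char *release, int want_major, int want_minor)
{
	int major, minor;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return 0;
	return major > want_major ||
	       (major == want_major && minor >= want_minor);
}

/* Hypothetical gate: compare the running kernel (via uname(2)). */
static int running_kernel_at_least(int want_major, int want_minor)
{
	struct utsname u;

	if (uname(&u) != 0)
		return 0;
	return release_at_least(u.release, want_major, want_minor);
}
```

A test case that depends on 4.18 behaviour would call running_kernel_at_least(4, 18) and skip when it returns 0.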

Would that be something we should pursue? (We could try to make patches
here and there, like this case, whenever we face this.) Or... should we
stick with mainline/next only when talking about kselftest and forget
about LTS kernels?

Note: situations like this are very time-consuming to investigate before
we can tell whether there was a regression or the older kernel simply
does not support the test case.

Thank you for your attention.

Rafael
--
Rafael D. Tinoco
Linaro - Kernel Validation


2018-12-17 23:13:42

by Shuah Khan

Subject: Re: selftests/net: udpgso: LTS kernels supportability ?

Hi Rafael,

On 12/17/18 10:53 AM, Rafael David Tinoco wrote:
> Regarding: udpgso AND https://bugs.linaro.org/show_bug.cgi?id=3980
>
> udpgso tests are failing in kernels below 4.18 for two main reasons:
>
> 1) On older kernels, udp4_ufo_fragment does not seem to require the GSO
> SKB to be larger than the MTU (4th test case in udpgso.c).
>
> 2) setsockopt(...UDP_SEGMENT) is not supported on older kernels (the
> commit "udp: generate gso with UDP_SEGMENT" and its fixes seem to be
> needed).

This case is easy, right? Based on the test output below, I can see that
the failure is due to:

./udpgso: setsockopt udp segment: Protocol not available

setsockopt() is returning an error that clearly indicates this option
isn't supported. This will be a test change to report the test as a skip
as opposed to a fail.

We have a solution for this: the test should SKIP as opposed to FAIL.
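A minimal sketch of such a skip, assuming the kselftest exit-code convention (KSFT_SKIP is 4) and defining UDP_SEGMENT/SOL_UDP locally in case the build host's headers are older. The probe and classifier names are hypothetical; udpgso.c itself is structured differently. The point is that ENOPROTOOPT, which glibc's strerror() renders as "Protocol not available", maps to SKIP while any other error stays a FAIL.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumed kselftest exit codes (as in kselftest.h), defined locally. */
#define KSFT_PASS 0
#define KSFT_FAIL 1
#define KSFT_SKIP 4

/* Values from the Linux UAPI headers, defined in case the build host's
 * libc headers predate UDP_SEGMENT. */
#ifndef SOL_UDP
#define SOL_UDP 17
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103
#endif

/* Map the errno of setsockopt(UDP_SEGMENT) to a test result. ENOPROTOOPT
 * means the running kernel does not know the option at all, so the test
 * case does not apply: report SKIP instead of FAIL. */
static int classify_udp_segment_errno(int err)
{
	if (err == 0)
		return KSFT_PASS;
	if (err == ENOPROTOOPT)
		return KSFT_SKIP;
	return KSFT_FAIL;
}

/* Hypothetical helper: probe UDP_SEGMENT support on a throwaway socket. */
static int probe_udp_segment(void)
{
	int gso_size = 1400;
	int fd, err = 0;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return KSFT_FAIL;	/* cannot even create a UDP socket */
	if (setsockopt(fd, SOL_UDP, UDP_SEGMENT, &gso_size, sizeof(gso_size)))
		err = errno;
	close(fd);

	if (err)
		fprintf(stderr, "setsockopt udp segment: %s\n", strerror(err));
	return classify_udp_segment_errno(err);
}
```

A test would call probe_udp_segment() before its UDP_SEGMENT cases and exit with KSFT_SKIP when that is what it returns.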

> With that explained, finally the question/discussion:
>
> Shouldn't we enforce a versioning mechanism for tests that exercise
> recently added features? I mean, some of the test cases inside the udpgso
> selftest are good enough for older kernels...

Right, we do have a generic way to handle that: detect whether the
feature is supported and skip if it isn't, instead of relying on the
kernel version, which is going to be hard to maintain.

>
> But because we have no control over "kernel features" and "supported
> test cases", we at Linaro end up blacklisting entire selftests that
> have new-feature-oriented tests because of only one or two test cases.
>
> This has already been solved in other functional test projects by
> checking the running kernel version and deciding which test cases to
> run.
>

I would like to see effort going into fixing tests to skip when a
feature isn't supported. I think that is the solution that will be
maintainable in the long run.

> Would that be something we should pursue? (We could try to make patches
> here and there, like this case, whenever we face this.) Or... should we
> stick with mainline/next only when talking about kselftest and forget
> about LTS kernels?
>

There is a middle-of-the-road solution: run Kselftest from the same
kernel release on LTS kernels and report those results, since running
mainline/next Kselftest on LTS is turning out to add overhead in
interpreting the results.

Kselftest mainline/next tends to be in a state where there could be bugs
in tests, like the one you found in the example you used to describe
the problem. As we find them, we fix them. That is just the nature of
mainline/next.

Maybe for LTS kernels it is better for you to stay with Kselftest from
the same release or close to it. For example, running 4.20 Kselftest on
4.4 is going to result in more skips (or false fails) than running 4.4
Kselftest on 4.4, even though it might provide better coverage. It is a
judgment call on the overhead vs. the advantage of running newer
Kselftest from mainline/next on LTS.
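A rough sketch of that middle-of-the-road setup: derive which linux-stable branch (named linux-X.Y.y) to take Kselftest from, given the running kernel's release string, for example as reported by uname(2). The function name is hypothetical, and a mainline -rc kernel maps to a branch that may not exist yet, so a real harness would need a fallback.

```c
#include <stdio.h>

/* Build the linux-stable branch name ("linux-X.Y.y") that matches a kernel
 * release string, e.g. "4.4.169" -> "linux-4.4.y". Returns 0 on success,
 * -1 if the release cannot be parsed or the buffer is too small. */
static int stable_branch_for_release(const char *release, char *buf, size_t len)
{
	int major, minor, n;

	if (sscanf(release, "%d.%d", &major, &minor) != 2)
		return -1;
	n = snprintf(buf, len, "linux-%d.%d.y", major, minor);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output would have been truncated */
	return 0;
}
```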

I don't think versioning (skip- or release-based) can fully address the
problem you are seeing, considering the fluid nature of mainline/next.

thanks,
-- Shuah

2018-12-18 11:39:25

by Rafael Tinoco

Subject: Re: selftests/net: udpgso: LTS kernels supportability ?

On 12/17/18 4:42 PM, shuah wrote:
> Hi Rafael,
>
> On 12/17/18 10:53 AM, Rafael David Tinoco wrote:
>> Regarding: udpgso AND https://bugs.linaro.org/show_bug.cgi?id=3980
>>
>> udpgso tests are failing in kernels below 4.18 for two main reasons:
>>
>> 1) On older kernels, udp4_ufo_fragment does not seem to require the GSO
>> SKB to be larger than the MTU (4th test case in udpgso.c).
>>
>> 2) setsockopt(...UDP_SEGMENT) is not supported on older kernels (the
>> commit "udp: generate gso with UDP_SEGMENT" and its fixes seem to be
>> needed).
>
> This case is easy, right? Based on the test output below, I can see that
> the failure is due to:
>
> ./udpgso: setsockopt udp segment: Protocol not available
>
> setsockopt() is returning an error that clearly indicates this option
> isn't supported. This will be a test change to report the test as a skip
> as opposed to a fail.

You are referring to (2). (1) isn't that straightforward.

> We have a solution for this: the test should SKIP as opposed to FAIL.
>
>> With that explained, finally the question/discussion:
>>
>> Shouldn't we enforce a versioning mechanism for tests that exercise
>> recently added features? I mean, some of the test cases inside the udpgso
>> selftest are good enough for older kernels...
>
> Right, we do have a generic way to handle that: detect whether the
> feature is supported and skip if it isn't, instead of relying on the
> kernel version, which is going to be hard to maintain.

For case (1), you can't distinguish a real failure from an older kernel
simply behaving differently than the test case expects.

> I don't think versioning (skip- or release-based) can fully address the
> problem you are seeing, considering the fluid nature of mainline/next.

Alright. I needed a statement in that direction to decide how to better
address our "regressions" for LTS kernels.

Thanks a lot.

--
Rafael D. Tinoco
Linaro - Kernel Validation

2018-12-18 14:55:02

by Shuah Khan

Subject: Re: selftests/net: udpgso: LTS kernels supportability ?

On 12/18/18 4:37 AM, Rafael David Tinoco wrote:
> On 12/17/18 4:42 PM, shuah wrote:
>> On 12/17/18 10:53 AM, Rafael David Tinoco wrote:
>>> But because we have no control over "kernel features" and "supported
>>> test cases", we at Linaro end up blacklisting entire selftests that
>>> have new-feature-oriented tests because of only one or two test cases.
>>>

Can you share the blacklisted tests?

thanks,
-- Shuah

2018-12-18 16:41:56

by Rafael Tinoco

Subject: Re: selftests/net: udpgso: LTS kernels supportability ?

On 12/18/18 12:53 PM, shuah wrote:
> On 12/18/18 4:37 AM, Rafael David Tinoco wrote:
>> On 12/17/18 4:42 PM, shuah wrote:
>>> On 12/17/18 10:53 AM, Rafael David Tinoco wrote:
>>>> But because we have no control over "kernel features" and "supported
>>>> test cases", we at Linaro end up blacklisting entire selftests that
>>>> have new-feature-oriented tests because of only one or two test cases.
>>>>
>
> Can you share the blacklisted tests?

Sure!

The following YAML file:

https://github.com/Linaro/qa-reports-known-issues/blob/master/kselftests-production.yaml

contains the list of all "known issues" for kselftests running in LKFT.

One example entry (this particular one):

- environments: *environments_all
  notes: >
    LKFT: net: udpgso.sh PASS on 4.18 and failed on 4.17 and below
  projects:
    - lkft/linux-stable-rc-4.14-oe
    - lkft/linux-stable-rc-4.9-oe
    - lkft/linux-stable-rc-4.4-oe
    - lkft/linaro-hikey-stable-rc-4.4-oe
  test_names:
    - kselftest/net_udpgso.sh
    - kselftest-vsyscall-mode-none/net_udpgso.sh
    - kselftest-vsyscall-mode-native/net_udpgso.sh
  url: https://bugs.linaro.org/show_bug.cgi?id=3980
  active: true
  intermittent: false

This entry tells us:

It's a known issue for all environments (arm, arm64, x86, x86_64, QEMU
targets, etc.) and for kernels 4.14, 4.9 and 4.4. The "url" points to
our internal bug for the known issue/investigation.

With that, we don't consider it a regression when this test fails on
4.14, 4.9 or 4.4 in any existing environment (boards, QEMU/KVM, hosts,
different archs).
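Roughly, the matching that such an entry drives could look like the following C sketch. It is not the qa-reports implementation; the struct and function names are hypothetical, and it assumes `*environments_all` matches every environment, so only the project and the test name are checked (the "intermittent" flag is ignored).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one known-issues entry, mirroring the
 * "projects", "test_names" and "active" fields of the YAML entry. */
struct known_issue {
	const char *const *projects;	/* NULL-terminated list */
	const char *const *test_names;	/* NULL-terminated list */
	int active;
};

/* Nonzero if s appears in the NULL-terminated list. */
static int in_list(const char *const *list, const char *s)
{
	for (; *list; list++)
		if (strcmp(*list, s) == 0)
			return 1;
	return 0;
}

/* A failure counts as a known issue rather than a regression when an
 * active entry lists both the project (kernel branch) and the test name. */
static int is_known_issue(const struct known_issue *ki, const char *project,
			  const char *test_name)
{
	return ki->active && in_list(ki->projects, project) &&
	       in_list(ki->test_names, test_name);
}
```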

I hope this helps,

Cheers o/
--
Rafael D. Tinoco
Linaro - Kernel Validation