2023-08-07 20:39:03

by Mirsad Todorovac

Subject: selftests: net/af_unix test_unix_oob [FAILED]

Hi all,

On a 6.5-rc5 kernel built from the vanilla torvalds tree, running Ubuntu 22.04 LTS (Jammy Jellyfish)
on a self-assembled Ryzen 7950 box, the test_unix_oob selftest unexpectedly fails:

# selftests: net/af_unix: test_unix_oob
# Test 2 failed, sigurg 23 len 63 OOB %

It is this code:

        /* Test 2:
         * Verify that the first OOB is over written by
         * the 2nd one and the first OOB is returned as
         * part of the read, and sigurg is received.
         */
        wait_for_data(pfd, POLLIN | POLLPRI);
        len = 0;
        while (len < 70)
                len = recv(pfd, buf, 1024, MSG_PEEK);
        len = read_data(pfd, buf, 1024);
        read_oob(pfd, &oob);
        if (!signal_recvd || len != 127 || oob != '#') {
                fprintf(stderr, "Test 2 failed, sigurg %d len %d OOB %c\n",
                        signal_recvd, len, oob);
                die(1);
        }

This test passed on 6.5-rc4, so could this be a regression?

marvin@defiant:~/linux/kernel/linux_torvalds$ grep test_unix_oob ../kselftest-6.5-rc4-1.log
/net/af_unix/test_unix_oob
# selftests: net/af_unix: test_unix_oob
ok 2 selftests: net/af_unix: test_unix_oob
marvin@defiant:~/linux/kernel/linux_torvalds$

Hope this helps.

NOTE: the kernel is vanilla torvalds tree, only "dirty" because the selftests were modified.

Kind regards,
Mirsad Todorovac


Attachments:
config-6.5.0-rc5-debug-dirty.xz (56.40 kB)

2023-08-07 21:52:47

by Kuniyuki Iwashima

Subject: Re: selftests: net/af_unix test_unix_oob [FAILED]

From: Mirsad Todorovac
Date: Mon, 7 Aug 2023 21:44:41 +0200
> Hi all,
>
> In the kernel 6.5-rc5 build on Ubuntu 22.04 LTS (jammy jellyfish) on a Ryzen 7950 assembled box,
> vanilla torvalds tree kernel, the test test_unix_oob unexpectedly fails:
>
> # selftests: net/af_unix: test_unix_oob
> # Test 2 failed, sigurg 23 len 63 OOB %
>
> It is this code:
>
>         /* Test 2:
>          * Verify that the first OOB is over written by
>          * the 2nd one and the first OOB is returned as
>          * part of the read, and sigurg is received.
>          */
>         wait_for_data(pfd, POLLIN | POLLPRI);
>         len = 0;
>         while (len < 70)
>                 len = recv(pfd, buf, 1024, MSG_PEEK);
>         len = read_data(pfd, buf, 1024);
>         read_oob(pfd, &oob);
>         if (!signal_recvd || len != 127 || oob != '#') {
>                 fprintf(stderr, "Test 2 failed, sigurg %d len %d OOB %c\n",
>                         signal_recvd, len, oob);
>                 die(1);
>         }
>
> In 6.5-rc4, this test was OK, so it might mean we have a regression?

Thanks for reporting.

I confirmed the test doesn't fail on net-next at least, but it's based
on v6.5-rc4.

---8<---
[root@localhost ~]# ./test_unix_oob
[root@localhost ~]# echo $?
0
[root@localhost ~]# uname -r
6.5.0-rc4-01192-g66244337512f
---8<---

I'll check 6.5-rc5 later.



2023-08-07 23:41:10

by Mirsad Todorovac

Subject: Re: selftests: net/af_unix test_unix_oob [FAILED]

On 8/7/23 22:46, Kuniyuki Iwashima wrote:
> From: Mirsad Todorovac
> Date: Mon, 7 Aug 2023 21:44:41 +0200
>> Hi all,
>>
>> In the kernel 6.5-rc5 build on Ubuntu 22.04 LTS (jammy jellyfish) on a Ryzen 7950 assembled box,
>> vanilla torvalds tree kernel, the test test_unix_oob unexpectedly fails:
>>
>> # selftests: net/af_unix: test_unix_oob
>> # Test 2 failed, sigurg 23 len 63 OOB %
>>
>> It is this code:
>>
>>         /* Test 2:
>>          * Verify that the first OOB is over written by
>>          * the 2nd one and the first OOB is returned as
>>          * part of the read, and sigurg is received.
>>          */
>>         wait_for_data(pfd, POLLIN | POLLPRI);
>>         len = 0;
>>         while (len < 70)
>>                 len = recv(pfd, buf, 1024, MSG_PEEK);
>>         len = read_data(pfd, buf, 1024);
>>         read_oob(pfd, &oob);
>>         if (!signal_recvd || len != 127 || oob != '#') {
>>                 fprintf(stderr, "Test 2 failed, sigurg %d len %d OOB %c\n",
>>                         signal_recvd, len, oob);
>>                 die(1);
>>         }
>>
>> In 6.5-rc4, this test was OK, so it might mean we have a regression?
>
> Thanks for reporting.
>
> I confirmed the test doesn't fail on net-next at least, but it's based
> on v6.5-rc4.
>
> ---8<---
> [root@localhost ~]# ./test_unix_oob
> [root@localhost ~]# echo $?
> 0
> [root@localhost ~]# uname -r
> 6.5.0-rc4-01192-g66244337512f
> ---8<---
>
> I'll check 6.5-rc5 later.

Hi, Kuniyuki,

There is a new development: I could reproduce the Test 2 failure on kernels as early as 6.0-rc1.
The catch is that the failure shows up only sporadically, which suggests a race.

I am currently attempting a bisect.
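
Since the failure is sporadic, a single passing run cannot mark a commit as good. Below is a
minimal sketch of a per-step check, assuming a POSIX shell. The helper name flaky_bad is
hypothetical and not part of the selftests. It follows the git bisect run convention that
exit status 0 means "good" and 1 means "bad":

```shell
# flaky_bad RUNS CMD [ARGS...]
# Hypothetical helper: run CMD up to RUNS times.
# Exit status 1 ("bad") as soon as one run fails,
# exit status 0 ("good") if all RUNS runs pass.
flaky_bad() (
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Stop at the first failing run; its output is not needed here.
        "$@" >/dev/null 2>&1 || exit 1
        i=$((i + 1))
    done
    exit 0
)

# Example (run count is arbitrary; path as used in this thread):
# flaky_bad 1000 tools/testing/selftests/net/af_unix/test_unix_oob
```

On each freshly booted candidate kernel, the exit status could then decide between
"git bisect good" and "git bisect bad". A pass only means that no failure showed up in
that many runs, so "good" verdicts remain probabilistic.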

Kind regards,
Mirsad


2023-08-08 17:55:35

by Mirsad Todorovac

Subject: Re: selftests: net/af_unix test_unix_oob [FAILED]

On 8/8/23 01:09, Mirsad Todorovac wrote:
> On 8/7/23 22:46, Kuniyuki Iwashima wrote:
>> From: Mirsad Todorovac
>> Date: Mon, 7 Aug 2023 21:44:41 +0200
>>> Hi all,
>>>
>>> In the kernel 6.5-rc5 build on Ubuntu 22.04 LTS (jammy jellyfish) on a Ryzen 7950 assembled box,
>>> vanilla torvalds tree kernel, the test test_unix_oob unexpectedly fails:
>>>
>>> # selftests: net/af_unix: test_unix_oob
>>> # Test 2 failed, sigurg 23 len 63 OOB %
>>>
>>> It is this code:
>>>
>>>           /* Test 2:
>>>            * Verify that the first OOB is over written by
>>>            * the 2nd one and the first OOB is returned as
>>>            * part of the read, and sigurg is received.
>>>            */
>>>           wait_for_data(pfd, POLLIN | POLLPRI);
>>>           len = 0;
>>>           while (len < 70)
>>>                   len = recv(pfd, buf, 1024, MSG_PEEK);
>>>           len = read_data(pfd, buf, 1024);
>>>           read_oob(pfd, &oob);
>>>           if (!signal_recvd || len != 127 || oob != '#') {
>>>                   fprintf(stderr, "Test 2 failed, sigurg %d len %d OOB %c\n",
>>>                   signal_recvd, len, oob);
>>>                   die(1);
>>>           }
>>>
>>> In 6.5-rc4, this test was OK, so it might mean we have a regression?
>>
>> Thanks for reporting.
>>
>> I confirmed the test doesn't fail on net-next at least, but it's based
>> on v6.5-rc4.
>>
>>    ---8<---
>>    [root@localhost ~]# ./test_unix_oob
>>    [root@localhost ~]# echo $?
>>    0
>>    [root@localhost ~]# uname -r
>>    6.5.0-rc4-01192-g66244337512f
>>    ---8<---
>>
>> I'll check 6.5-rc5 later.
>
> Hi, Kuniyuki,
>
> It seems that there is a new development. I could reproduce the error with the failed test 2
> as early as 6.0-rc1. However, the gotcha is that the error appears to be sporadically manifested
> (possibly a race)?
>
> I am currently attempting a bisect.

The bisect has shown that the condition already existed in the 5.11 torvalds tree.

It appears to depend on the chosen config (I used the merged configs from selftests/*/config), but it
is also present in the Ubuntu production build:

marvin@defiant:~$ cd linux/kernel/linux_torvalds
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
Test 2 failed, sigurg 23 len 63 OOB %
marvin@defiant:~/linux/kernel/linux_torvalds$ uname -rms
Linux 6.4.8-060408-generic x86_64
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
Test 1 failed sigurg 0 len 63
marvin@defiant:~/linux/kernel/linux_torvalds$

It happens on rare occasions, so it seems to be a hard-to-spot race.

A normal selftest run executes test_unix_oob only once, so it would not notice the failure except by accident, which is how the problem came to my attention.
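
The repeated loops above could be condensed into a small helper that counts how often each
failure message appears. This is a minimal sketch, assuming a POSIX shell and that the test
prints nothing on success (as in the runs shown in this thread). The name tally_failures is
hypothetical and not part of the selftests:

```shell
# tally_failures RUNS CMD [ARGS...]
# Hypothetical helper: run CMD RUNS times and print how often each
# distinct output line (stdout and stderr combined) occurred,
# most frequent first.
tally_failures() (
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Keep going after a failing run; only the printed message matters.
        "$@" 2>&1 || true
        i=$((i + 1))
    done | sort | uniq -c | sort -rn
)

# Example (run count is arbitrary; path as used in this thread):
# tally_failures 1000 tools/testing/selftests/net/af_unix/test_unix_oob
```

Dividing each count by the number of runs gives a rough failure rate per message, which
makes it easier to compare kernels and configs than eyeballing raw loop output.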

However, the problem seems to be config-driven rather than kernel-version-driven.

marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..100000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 1 Inline failed, sigurg 0 len 63
Test 1 Inline failed, sigurg 0 len 63
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 failed, sigurg 23 len 63 OOB %
marvin@defiant:~/linux/kernel/linux_torvalds$ uname -rms
Linux 6.5.0-060500rc4-generic x86_64
marvin@defiant:~/linux/kernel/linux_torvalds$

At times I was able to reproduce it with certain configs, but now something odd is happening.

I will keep investigating.

Kind regards,
Mirsad

2023-08-14 09:37:00

by Mirsad Todorovac

Subject: Re: selftests: net/af_unix test_unix_oob [FAILED]

On 8/8/23 10:53, Mirsad Todorovac wrote:
> On 8/8/23 01:09, Mirsad Todorovac wrote:
>> On 8/7/23 22:46, Kuniyuki Iwashima wrote:
>>> From: Mirsad Todorovac
>>> Date: Mon, 7 Aug 2023 21:44:41 +0200
>>>> Hi all,
>>>>
>>>> In the kernel 6.5-rc5 build on Ubuntu 22.04 LTS (jammy jellyfish) on a Ryzen 7950 assembled box,
>>>> vanilla torvalds tree kernel, the test test_unix_oob unexpectedly fails:
>>>>
>>>> # selftests: net/af_unix: test_unix_oob
>>>> # Test 2 failed, sigurg 23 len 63 OOB %
>>>>
>>>> It is this code:
>>>>
>>>>           /* Test 2:
>>>>            * Verify that the first OOB is over written by
>>>>            * the 2nd one and the first OOB is returned as
>>>>            * part of the read, and sigurg is received.
>>>>            */
>>>>           wait_for_data(pfd, POLLIN | POLLPRI);
>>>>           len = 0;
>>>>           while (len < 70)
>>>>                   len = recv(pfd, buf, 1024, MSG_PEEK);
>>>>           len = read_data(pfd, buf, 1024);
>>>>           read_oob(pfd, &oob);
>>>>           if (!signal_recvd || len != 127 || oob != '#') {
>>>>                   fprintf(stderr, "Test 2 failed, sigurg %d len %d OOB %c\n",
>>>>                   signal_recvd, len, oob);
>>>>                   die(1);
>>>>           }
>>>>
>>>> In 6.5-rc4, this test was OK, so it might mean we have a regression?
>>>
>>> Thanks for reporting.
>>>
>>> I confirmed the test doesn't fail on net-next at least, but it's based
>>> on v6.5-rc4.
>>>
>>>    ---8<---
>>>    [root@localhost ~]# ./test_unix_oob
>>>    [root@localhost ~]# echo $?
>>>    0
>>>    [root@localhost ~]# uname -r
>>>    6.5.0-rc4-01192-g66244337512f
>>>    ---8<---
>>>
>>> I'll check 6.5-rc5 later.
>>
>> Hi, Kuniyuki,
>>
>> It seems that there is a new development. I could reproduce the error with the failed test 2
>> as early as 6.0-rc1. However, the gotcha is that the error appears to be sporadically manifested
>> (possibly a race)?
>>
>> I am currently attempting a bisect.
>
> Bisect had shown that the condition existed already at 5.11 torvalds tree.
>
> It has to do with the configs chosen (I used the configs from selftests/*/config merged), but it
> is also present in the Ubuntu production build:
>
> marvin@defiant:~$ cd linux/kernel/linux_torvalds
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> Test 2 failed, sigurg 23 len 63 OOB %
> marvin@defiant:~/linux/kernel/linux_torvalds$ uname -rms
> Linux 6.4.8-060408-generic x86_64
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> Test 1 failed sigurg 0 len 63
> marvin@defiant:~/linux/kernel/linux_torvalds$
>
> It happens on rare occasions, so it seems to be a hard-to-spot race.
>
> Normal test running test_unix_oob once never noticed that, save by accident, which brought the problem to attention ...
>
> However, the problem seems to be config-driven rather than kernel-version-driven.
>
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..100000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> Test 3.1 Inline failed, len 1 oob % atmark 0
> Test 1 Inline failed, sigurg 0 len 63
> Test 1 Inline failed, sigurg 0 len 63
> Test 1 Inline failed, sigurg 0 len 63
> Test 2 Inline failed, len 63 atmark 1
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 2 Inline failed, len 63 atmark 1
> Test 3.1 Inline failed, len 1 oob % atmark 0
> Test 2 failed, sigurg 23 len 63 OOB %
> marvin@defiant:~/linux/kernel/linux_torvalds$ uname -rms
> Linux 6.5.0-060500rc4-generic x86_64
> marvin@defiant:~/linux/kernel/linux_torvalds$
>
> At moments, I was able to reproduce with certain configs, but now something odd happens.
>
> I will keep investigating.

Please note that the bug persists in 6.5-rc6:

marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..100000}; do !!; done
for a in {0..100000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 failed, sigurg 23 len 63 OOB %
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 1 Inline failed, sigurg 0 len 63
Test 1 Inline failed, sigurg 0 len 63
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 2 failed, sigurg 23 len 63 OOB %
Test 1 Inline failed, sigurg 0 len 63
Test 2 failed, sigurg 23 len 63 OOB %
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
marvin@defiant:~/linux/kernel/linux_torvalds$

The bug can be triggered as a non-privileged user, but it is not clear whether it is exploitable to elevate privileges.

Best regards,
Mirsad Todorovac

2023-08-21 04:12:41

by Mirsad Todorovac

Subject: Re: selftests: net/af_unix test_unix_oob [FAILED][NEW]

On 8/14/23 10:54, Mirsad Todorovac wrote:
> On 8/8/23 10:53, Mirsad Todorovac wrote:
>> On 8/8/23 01:09, Mirsad Todorovac wrote:
>>> On 8/7/23 22:46, Kuniyuki Iwashima wrote:
>>>> From: Mirsad Todorovac
>>>> Date: Mon, 7 Aug 2023 21:44:41 +0200
>>>>> Hi all,
>>>>>
>>>>> In the kernel 6.5-rc5 build on Ubuntu 22.04 LTS (jammy jellyfish) on a Ryzen 7950 assembled box,
>>>>> vanilla torvalds tree kernel, the test test_unix_oob unexpectedly fails:
>>>>>
>>>>> # selftests: net/af_unix: test_unix_oob
>>>>> # Test 2 failed, sigurg 23 len 63 OOB %
>>>>>
>>>>> It is this code:
>>>>>
>>>>>           /* Test 2:
>>>>>            * Verify that the first OOB is over written by
>>>>>            * the 2nd one and the first OOB is returned as
>>>>>            * part of the read, and sigurg is received.
>>>>>            */
>>>>>           wait_for_data(pfd, POLLIN | POLLPRI);
>>>>>           len = 0;
>>>>>           while (len < 70)
>>>>>                   len = recv(pfd, buf, 1024, MSG_PEEK);
>>>>>           len = read_data(pfd, buf, 1024);
>>>>>           read_oob(pfd, &oob);
>>>>>           if (!signal_recvd || len != 127 || oob != '#') {
>>>>>                   fprintf(stderr, "Test 2 failed, sigurg %d len %d OOB %c\n",
>>>>>                   signal_recvd, len, oob);
>>>>>                   die(1);
>>>>>           }
>>>>>
>>>>> In 6.5-rc4, this test was OK, so it might mean we have a regression?
>>>>
>>>> Thanks for reporting.
>>>>
>>>> I confirmed the test doesn't fail on net-next at least, but it's based
>>>> on v6.5-rc4.
>>>>
>>>>    ---8<---
>>>>    [root@localhost ~]# ./test_unix_oob
>>>>    [root@localhost ~]# echo $?
>>>>    0
>>>>    [root@localhost ~]# uname -r
>>>>    6.5.0-rc4-01192-g66244337512f
>>>>    ---8<---
>>>>
>>>> I'll check 6.5-rc5 later.
>>>
>>> Hi, Kuniyuki,
>>>
>>> It seems that there is a new development. I could reproduce the error with the failed test 2
>>> as early as 6.0-rc1. However, the gotcha is that the error appears to be sporadically manifested
>>> (possibly a race)?
>>>
>>> I am currently attempting a bisect.
>>
>> Bisect had shown that the condition existed already at 5.11 torvalds tree.
>>
>> It has to do with the configs chosen (I used the configs from selftests/*/config merged), but it
>> is also present in the Ubuntu production build:
>>
>> marvin@defiant:~$ cd linux/kernel/linux_torvalds
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> Test 2 failed, sigurg 23 len 63 OOB %
>> marvin@defiant:~/linux/kernel/linux_torvalds$ uname -rms
>> Linux 6.4.8-060408-generic x86_64
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> Test 1 failed sigurg 0 len 63
>> marvin@defiant:~/linux/kernel/linux_torvalds$
>>
>> It happens on rare occasions, so it seems to be a hard-to-spot race.
>>
>> Normal test running test_unix_oob once never noticed that, save by accident, which brought the problem to attention ...
>>
>> However, the problem seems to be config-driven rather than kernel-version-driven.
>>
>> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..100000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
>> Test 3.1 Inline failed, len 1 oob % atmark 0
>> Test 1 Inline failed, sigurg 0 len 63
>> Test 1 Inline failed, sigurg 0 len 63
>> Test 1 Inline failed, sigurg 0 len 63
>> Test 2 Inline failed, len 63 atmark 1
>> Test 3 Inline failed, sigurg 23 len 63 data x
>> Test 3 Inline failed, sigurg 23 len 63 data x
>> Test 3 Inline failed, sigurg 23 len 63 data x
>> Test 3 Inline failed, sigurg 23 len 63 data x
>> Test 2 Inline failed, len 63 atmark 1
>> Test 3.1 Inline failed, len 1 oob % atmark 0
>> Test 2 failed, sigurg 23 len 63 OOB %
>> marvin@defiant:~/linux/kernel/linux_torvalds$ uname -rms
>> Linux 6.5.0-060500rc4-generic x86_64
>> marvin@defiant:~/linux/kernel/linux_torvalds$
>>
>> At moments, I was able to reproduce with certain configs, but now something odd happens.
>>
>> I will keep investigating.
>
> Please note that the bug persisted in 6.5-rc6:
>
> marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..100000}; do !!; done
> for a in {0..100000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
> Test 2 failed, sigurg 23 len 63 OOB %
> Test 2 Inline failed, len 63 atmark 1
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 2 failed, sigurg 23 len 63 OOB %
> Test 3.1 Inline failed, len 1 oob % atmark 0
> Test 3 Inline failed, sigurg 23 len 63 data x
> Test 1 Inline failed, sigurg 0 len 63
> Test 1 Inline failed, sigurg 0 len 63
> Test 3.1 Inline failed, len 1 oob % atmark 0
> Test 1 Inline failed, sigurg 0 len 63
> Test 2 failed, sigurg 23 len 63 OOB %
> Test 1 Inline failed, sigurg 0 len 63
> Test 2 failed, sigurg 23 len 63 OOB %
> Test 3.1 Inline failed, len 1 oob % atmark 0
> Test 3.1 Inline failed, len 1 oob % atmark 0
> marvin@defiant:~/linux/kernel/linux_torvalds$
>
> The bug can be triggered as a non-privileged user, but it is not clear whether it is exploitable to elevate privileges.

Hi again,

I have tried selftests/net/af_unix/test_unix_oob and got:

marvin@defiant:~/linux/kernel/linux_torvalds$ for a in {0..1000}; do tools/testing/selftests/net/af_unix/test_unix_oob ; done
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 1 Inline failed, sigurg 0 len 63
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 1 Inline failed, sigurg 0 len 63
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 1 Inline failed, sigurg 0 len 63
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 failed, sigurg 23 len 63 OOB %
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 failed, sigurg 23 len 63 OOB %
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 3 Inline failed, sigurg 23 len 63 data x
Test 1 Inline failed, sigurg 0 len 63
Test 1 Inline failed, sigurg 0 len 63
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 1 Inline failed, sigurg 0 len 63
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 Inline failed, len 63 atmark 1
Test 2 failed, sigurg 23 len 63 OOB %
Test 1 Inline failed, sigurg 0 len 63
Test 1 Inline failed, sigurg 0 len 63
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 2 failed, sigurg 23 len 63 OOB %
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 2 failed, sigurg 23 len 63 OOB %
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 2 failed, sigurg 23 len 63 OOB %
Test 1 Inline failed, sigurg 0 len 63
Test 1 Inline failed, sigurg 0 len 63
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 Inline failed, len 63 atmark 1
Test 2 failed, sigurg 23 len 63 OOB %
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 2 failed, sigurg 23 len 63 OOB %
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3 Inline failed, sigurg 23 len 63 data x
Test 1 Inline failed, sigurg 0 len 63
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 1 Inline failed, sigurg 0 len 63
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 3 Inline failed, sigurg 23 len 63 data x
Test 1 Inline failed, sigurg 0 len 63
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 2 Inline failed, len 63 atmark 1
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 Inline failed, len 63 atmark 1
Test 1 Inline failed, sigurg 0 len 63
Test 2 Inline failed, len 63 atmark 1
Test 3 Inline failed, sigurg 23 len 63 data x
Test 2 Inline failed, len 63 atmark 1
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 Inline failed, len 63 atmark 1
Test 3.1 Inline failed, len 1 oob % atmark 0
Test 2 failed, sigurg 23 len 63 OOB %
Test 2 failed, sigurg 23 len 63 OOB %
marvin@defiant:~/linux/kernel/linux_torvalds$

The kernel is 6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650, built from the vanilla torvalds tree, on Ubuntu 22.04.

Best regards,
Mirsad Todorovac


Attachments:
config-6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650.xz (56.34 kB)