2017-05-26 07:45:18

by Stefan Seyfried

[permalink] [raw]
Subject: [BUG] unit/test-gatt failure uninitialized pointer(?)

Hi all,

in the Open Build Service packages for openSUSE are built in VMs without network connection. This leads to failure in "make check". Debugging found:

/robustness/unkown-request - init
/robustness/unkown-request - setup
/robustness/unkown-request - setup complete
/robustness/unkown-request - run

Program received signal SIGSEGV, Segmentation fault.
0x0000555555598225 in timeout_cb (user_data=0x5555557c82f0) at src/shared/att.c:405
405 if (att->pending_req && att->pending_req->id == timeout->id) {
(gdb) print att->pending_req->id
Cannot access memory at address 0x4545454545454545
(gdb) print att->pending_req
$1 = (struct att_send_op *) 0x4545454545454545
(gdb) print timeout->id
$2 = 12
(gdb) bt
#0 0x0000555555598225 in timeout_cb (user_data=0x5555557c82f0) at src/shared/att.c:405
#1 0x00005555555a499d in timeout_callback (user_data=<optimized out>) at src/shared/timeout-glib.c:34
#2 0x00007ffff7b101d3 in ?? () from /usr/lib64/libglib-2.0.so.0
#3 0x00007ffff7b0f75a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#4 0x00007ffff7b0fb10 in ?? () from /usr/lib64/libglib-2.0.so.0
#5 0x00007ffff7b0fe32 in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#6 0x0000555555597e84 in tester_run () at src/shared/tester.c:830
#7 0x0000555555594a4c in main (argc=<optimized out>, argv=<optimized out>) at unit/test-gatt.c:4474
(gdb)

This started after I tried to update the package to bluez-5.45, it
still happens with current bluez git master.

Hope this helps, I am now commenting out this test for now.
--
Stefan Seyfried

"For a successful technology, reality must take precedence over
public relations, for nature cannot be fooled." -- Richard Feynman


2017-05-29 12:45:42

by Stefan Seyfried

[permalink] [raw]
Subject: Re: [BUG] unit/test-gatt failure uninitialized pointer(?)

On 29.05.2017 14:10, Stefan Seyfried wrote:
> It should probably be reproducible by unloading and blacklisting the af_alg module on your machine.

I just tried this and this reproduces the infinite hang just fine. It does not crash the test, though.

--
Stefan Seyfried

"For a successful technology, reality must take precedence over
public relations, for nature cannot be fooled." -- Richard Feynman

2017-05-29 12:10:43

by Stefan Seyfried

[permalink] [raw]
Subject: Re: [BUG] unit/test-gatt failure uninitialized pointer(?)

Hi Luiz,

On 29.05.2017 13:42, Luiz Augusto von Dentz wrote:
> unit/test-gatt -p /robustness/unkown-request

This hangs for apparently infinite time, but does not crash. So you are probably right, that some previous test does
create a timeout handler which is triggered by this infinite wait and then crashes the test.

This test VMs are not slow, so it's not a performance problem. They just have limited connectivity.

I debugged this with strace and found the following difference:

Working:

13:55:57 write(1, "\33[1;39m/robustness/unkown-reques"..., 55/robustness/unkown-request - run
) = 55 <0.000053>
13:55:57 socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0, [5, 6]) = 0 <0.000051>
13:55:57 fstat(5, {st_dev=makedev(0, 9), st_ino=281430, st_mode=S_IFSOCK|0777, st_nlink=1, st_uid=10329, st_gid=100,
st_blksize=4096, st_blocks=0, st_size=0, st_atime=0, st_mtime=0, st_ctime=0}) = 0 <0.000049>
13:55:57 fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) <0.000049>
13:55:57 socket(AF_ALG, SOCK_SEQPACKET|SOCK_CLOEXEC, 0) = 7 <0.000048>
13:55:57 bind(7, {sa_family=AF_ALG,
sa_data="skcipher\0\0\0\0\0\0\0\0\0\0\0\0\0\0ecb(aes)\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"},
88) = 0 <0.000071>
13:55:57 open("/dev/urandom", O_RDONLY) = 8 <0.000054>
13:55:57 socket(AF_ALG, SOCK_SEQPACKET|SOCK_CLOEXEC, 0) = 9 <0.000062>
13:55:57 bind(9, {sa_family=AF_ALG,
sa_data="hash\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0cmac(aes)\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"},
88) = 0 <0.000061>
13:55:57 getsockopt(5, SOL_SOCKET, SO_DOMAIN, [1], [4]) = 0 <0.000052>
13:55:57 fstat(6, {st_dev=makedev(0, 9), st_ino=281431, st_mode=S_IFSOCK|0777, st_nlink=1, st_uid=10329, st_gid=100,
st_blksize=4096, st_blocks=0, st_size=0, st_atime=0, st_mtime=0, st_ctime=0}) = 0 <0.000049>
13:55:57 fcntl(6, F_GETFL) = 0x2 (flags O_RDWR) <0.000050>

Not working:

11:54:59 write(1, "\33[1;39m/robustness/unkown-reques"..., 55/robustness/unkown-request - run
) = 55 <0.000354>
11:54:59 socketpair(AF_UNIX, SOCK_SEQPACKET|SOCK_CLOEXEC, 0, [5, 6]) = 0 <0.000224>
11:54:59 fstat(5, {st_dev=makedev(0, 9), st_ino=26139, st_mode=S_IFSOCK|0777, st_nlink=1, st_uid=399, st_gid=399,
st_blksize=4096, st_blocks=0, st_size=0, st_atime=0, st_mtime=0, st_ctime=0}) = 0 <0.000174>
11:54:59 fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) <0.000079>
11:54:59 socket(AF_ALG, SOCK_SEQPACKET|SOCK_CLOEXEC, 0) = -1 EAFNOSUPPORT (Address family not supported by protocol)
<0.000287>
11:54:59 getsockopt(5, SOL_SOCKET, SO_DOMAIN, [1], [4]) = 0 <0.000191>
11:54:59 fstat(6, {st_dev=makedev(0, 9), st_ino=26140, st_mode=S_IFSOCK|0777, st_nlink=1, st_uid=399, st_gid=399,
st_blksize=4096, st_blocks=0, st_size=0, st_atime=0, st_mtime=0, st_ctime=0}) = 0 <0.000062>
11:54:59 fcntl(6, F_GETFL) = 0x2 (flags O_RDWR) <0.000223>
11:54:59 write(6, "\2\27\0", 3) = 3 <0.000061>
11:54:59 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 4, -1) = 1
([{fd=5, revents=POLLIN}]) <0.000208>
11:54:59 read(5, "\2\27\0", 23) = 3 <0.000062>
11:54:59 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}, {fd=6, events=POLLIN}], 4,
-1) = 1 ([{fd=5, revents=POLLOUT}]) <0.000040>
11:54:59 writev(5, [{iov_base="\3\0\2", iov_len=3}], 1) = 3 <0.000060>
11:54:59 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN|POLLOUT}, {fd=6, events=POLLIN}], 4,
-1) = 2 ([{fd=5, revents=POLLOUT}, {fd=6, revents=POLLIN}]) <0.000054>
11:54:59 read(6, "\3\0\2", 512) = 3 <0.000035>
11:54:59 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 4, 0) = 0
(Timeout) <0.000198>
11:54:59 write(6, "\277\0", 2) = 2 <0.000208>
11:54:59 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 4, -1) = 1
([{fd=5, revents=POLLIN}]) <0.000203>
11:54:59 read(5, "\277\0", 23) = 2 <0.000197>
11:54:59 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 4, -1

So it seems the VM has no support for AF_ALG and thus the test hangs infinitely, the old timeout kicks in and crashes
the test.

It should probably be reproducible by unloading and blacklisting the af_alg module on your machine.

Best regards,

Stefan
--
Stefan Seyfried

"For a successful technology, reality must take precedence over
public relations, for nature cannot be fooled." -- Richard Feynman

2017-05-29 11:42:21

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [BUG] unit/test-gatt failure uninitialized pointer(?)

Hi Stefan,

On Mon, May 29, 2017 at 12:06 PM, Stefan Seyfried
<[email protected]> wrote:
> On 29.05.2017 09:14, Stefan Seyfried wrote:
>> I did not yet use valgrind inside the OBS build VM, but I can try to do that, that might give additional hints.
>
> /robustness/unkown-request - init
> /robustness/unkown-request - setup
> /robustness/unkown-request - setup complete
> /robustness/unkown-request - run
> ==12262== Invalid read of size 8
> ==12262== at 0x13FF29: timeout_cb (att.c:405)
> ==12262== by 0x14C45C: timeout_callback (timeout-glib.c:34)
> ==12262== by 0x4E86412: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x13FBAB: tester_run (tester.c:830)
> ==12262== by 0x13C83D: main (test-gatt.c:4488)
> ==12262== Address 0x5a9c970 is 32 bytes inside a block of size 192 free'd
> ==12262== at 0x4C2D27B: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==12262== by 0x1414EB: can_read_data (att.c:920)
> ==12262== by 0x14C022: watch_callback (io-glib.c:170)
> ==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x13FBAB: tester_run (tester.c:830)
> ==12262== by 0x13C83D: main (test-gatt.c:4488)
> ==12262== Block was alloc'd at
> ==12262== at 0x4C2C04F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==12262== by 0x13E90D: btd_malloc (util.c:45)
> ==12262== by 0x14052E: bt_att_new (att.c:1000)
> ==12262== by 0x13E3F1: create_context.constprop.27 (test-gatt.c:670)
> ==12262== by 0x13ED57: run_callback (tester.c:415)
> ==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x13FBAB: tester_run (tester.c:830)
> ==12262== by 0x13C83D: main (test-gatt.c:4488)
> ==12262==
> ==12262== Invalid read of size 8
> ==12262== at 0x13FF39: timeout_cb (att.c:408)
> ==12262== by 0x14C45C: timeout_callback (timeout-glib.c:34)
> ==12262== by 0x4E86412: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x13FBAB: tester_run (tester.c:830)
> ==12262== by 0x13C83D: main (test-gatt.c:4488)
> ==12262== Address 0x5a9c980 is 48 bytes inside a block of size 192 free'd
> ==12262== at 0x4C2D27B: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==12262== by 0x1414EB: can_read_data (att.c:920)
> ==12262== by 0x14C022: watch_callback (io-glib.c:170)
> ==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x13FBAB: tester_run (tester.c:830)
> ==12262== by 0x13C83D: main (test-gatt.c:4488)
> ==12262== Block was alloc'd at
> ==12262== at 0x4C2C04F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==12262== by 0x13E90D: btd_malloc (util.c:45)
> ==12262== by 0x14052E: bt_att_new (att.c:1000)
> ==12262== by 0x13E3F1: create_context.constprop.27 (test-gatt.c:670)
> ==12262== by 0x13ED57: run_callback (tester.c:415)
> ==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
> ==12262== by 0x13FBAB: tester_run (tester.c:830)
> ==12262== by 0x13C83D: main (test-gatt.c:4488)
> ==12262==
>
> it does not crash under valgrind, but also does not finish, I aborted it with ctrl-C after a few minutes.
>
> It seems to hit a 30 seconds timeout btw, if that's of any use:
>
> abuild@susi:~/rpmbuild/BUILD/bluez-5.45> time unit/test-gatt >/dev/null
> Segmentation fault (core dumped)
>
> real 0m30.061s
> user 0m0.044s
> sys 0m0.024s

That test in specific does not do any request, it only respond so I
think it is perhaps some of the previous test. Just confirm this you
can run just /robustness/unkown-request test alone with:

unit/test-gatt -p /robustness/unkown-request

Btw, it may as well be that this problem has been around since the
beginning but we only find out about it now with addition of more test
which makes the entire set to be over 30 seconds on slow hosts.

> Hope this sheds some light on the issue.
>
> Best regards,
>
> Stefan
> --
> Stefan Seyfried
>
> "For a successful technology, reality must take precedence over
> public relations, for nature cannot be fooled." -- Richard Feynman



--
Luiz Augusto von Dentz

2017-05-29 09:06:12

by Stefan Seyfried

[permalink] [raw]
Subject: Re: [BUG] unit/test-gatt failure uninitialized pointer(?)

On 29.05.2017 09:14, Stefan Seyfried wrote:
> I did not yet use valgrind inside the OBS build VM, but I can try to do that, that might give additional hints.

/robustness/unkown-request - init
/robustness/unkown-request - setup
/robustness/unkown-request - setup complete
/robustness/unkown-request - run
==12262== Invalid read of size 8
==12262== at 0x13FF29: timeout_cb (att.c:405)
==12262== by 0x14C45C: timeout_callback (timeout-glib.c:34)
==12262== by 0x4E86412: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x13FBAB: tester_run (tester.c:830)
==12262== by 0x13C83D: main (test-gatt.c:4488)
==12262== Address 0x5a9c970 is 32 bytes inside a block of size 192 free'd
==12262== at 0x4C2D27B: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12262== by 0x1414EB: can_read_data (att.c:920)
==12262== by 0x14C022: watch_callback (io-glib.c:170)
==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x13FBAB: tester_run (tester.c:830)
==12262== by 0x13C83D: main (test-gatt.c:4488)
==12262== Block was alloc'd at
==12262== at 0x4C2C04F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12262== by 0x13E90D: btd_malloc (util.c:45)
==12262== by 0x14052E: bt_att_new (att.c:1000)
==12262== by 0x13E3F1: create_context.constprop.27 (test-gatt.c:670)
==12262== by 0x13ED57: run_callback (tester.c:415)
==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x13FBAB: tester_run (tester.c:830)
==12262== by 0x13C83D: main (test-gatt.c:4488)
==12262==
==12262== Invalid read of size 8
==12262== at 0x13FF39: timeout_cb (att.c:408)
==12262== by 0x14C45C: timeout_callback (timeout-glib.c:34)
==12262== by 0x4E86412: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x13FBAB: tester_run (tester.c:830)
==12262== by 0x13C83D: main (test-gatt.c:4488)
==12262== Address 0x5a9c980 is 48 bytes inside a block of size 192 free'd
==12262== at 0x4C2D27B: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12262== by 0x1414EB: can_read_data (att.c:920)
==12262== by 0x14C022: watch_callback (io-glib.c:170)
==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x13FBAB: tester_run (tester.c:830)
==12262== by 0x13C83D: main (test-gatt.c:4488)
==12262== Block was alloc'd at
==12262== at 0x4C2C04F: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12262== by 0x13E90D: btd_malloc (util.c:45)
==12262== by 0x14052E: bt_att_new (att.c:1000)
==12262== by 0x13E3F1: create_context.constprop.27 (test-gatt.c:670)
==12262== by 0x13ED57: run_callback (tester.c:415)
==12262== by 0x4E85994: g_main_context_dispatch (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E85D57: ??? (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x4E86071: g_main_loop_run (in /usr/lib64/libglib-2.0.so.0.5200.2)
==12262== by 0x13FBAB: tester_run (tester.c:830)
==12262== by 0x13C83D: main (test-gatt.c:4488)
==12262==

it does not crash under valgrind, but also does not finish, I aborted it with ctrl-C after a few minutes.

It seems to hit a 30 seconds timeout btw, if that's of any use:

abuild@susi:~/rpmbuild/BUILD/bluez-5.45> time unit/test-gatt >/dev/null
Segmentation fault (core dumped)

real 0m30.061s
user 0m0.044s
sys 0m0.024s

Hope this sheds some light on the issue.

Best regards,

Stefan
--
Stefan Seyfried

"For a successful technology, reality must take precedence over
public relations, for nature cannot be fooled." -- Richard Feynman

2017-05-29 07:14:02

by Stefan Seyfried

[permalink] [raw]
Subject: Re: [BUG] unit/test-gatt failure uninitialized pointer(?)

Hi Luiz,

On 29.05.2017 09:03, Luiz Augusto von Dentz wrote:
> Hi Stefan,
>
> On Fri, May 26, 2017 at 10:45 AM, Stefan Seyfried <[email protected]> wrote:
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x0000555555598225 in timeout_cb (user_data=0x5555557c82f0) at src/shared/att.c:405
>> 405 if (att->pending_req && att->pending_req->id == timeout->id) {
>> (gdb) print att->pending_req->id
>> Cannot access memory at address 0x4545454545454545
>> (gdb) print att->pending_req
>> $1 = (struct att_send_op *) 0x4545454545454545
>> (gdb) print timeout->id
>> $2 = 12
>> (gdb) bt
>> #0 0x0000555555598225 in timeout_cb (user_data=0x5555557c82f0) at src/shared/att.c:405
>> #1 0x00005555555a499d in timeout_callback (user_data=<optimized out>) at src/shared/timeout-glib.c:34
>> #2 0x00007ffff7b101d3 in ?? () from /usr/lib64/libglib-2.0.so.0
>> #3 0x00007ffff7b0f75a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
>> #4 0x00007ffff7b0fb10 in ?? () from /usr/lib64/libglib-2.0.so.0
>> #5 0x00007ffff7b0fe32 in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
>> #6 0x0000555555597e84 in tester_run () at src/shared/tester.c:830
>> #7 0x0000555555594a4c in main (argc=<optimized out>, argv=<optimized out>) at unit/test-gatt.c:4474
>> (gdb)
>>
>> This started after I tried to update the package to bluez-5.45, it
>> still happens with current bluez git master.
>
> Are there any special tool you are running this? Or it is just gdb?
> Under valgrind everything looks fine:

I am unable to reproduce this on my laptop, but only in the OBS VM build.
The special thing about the OBS VMs is, that they have no network interfaces configured
(to ensure an isolated build).
I *guess* that this is what triggers this timeout and this timeout then references either
freed or uninitialized pointers.

I did not yet use valgrind inside the OBS build VM, but I can try to do that, that might give additional hints.

> /robustness/unkown-request - init
> /robustness/unkown-request - setup
> /robustness/unkown-request - setup complete
> /robustness/unkown-request - run
> /robustness/unkown-request - test passed
> /robustness/unkown-request - teardown
> /robustness/unkown-request - teardown complete
> /robustness/unkown-request - done
>
>
> Test Summary
> ------------
> /robustness/unkown-request Passed 0.070 seconds

Exactly what happens on my machine locally ;)
In the OBS VM after run there is a "long" pause (5? or 10? seconds, did not measure it) and then the segv happens.

Thanks and best regards,

Stefan
--
Stefan Seyfried

"For a successful technology, reality must take precedence over
public relations, for nature cannot be fooled." -- Richard Feynman

2017-05-29 07:03:36

by Luiz Augusto von Dentz

[permalink] [raw]
Subject: Re: [BUG] unit/test-gatt failure uninitialized pointer(?)

Hi Stefan,

On Fri, May 26, 2017 at 10:45 AM, Stefan Seyfried
<[email protected]> wrote:
> Hi all,
>
> in the Open Build Service packages for openSUSE are built in VMs without network connection. This leads to failure in "make check". Debugging found:
>
> /robustness/unkown-request - init
> /robustness/unkown-request - setup
> /robustness/unkown-request - setup complete
> /robustness/unkown-request - run
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000555555598225 in timeout_cb (user_data=0x5555557c82f0) at src/shared/att.c:405
> 405 if (att->pending_req && att->pending_req->id == timeout->id) {
> (gdb) print att->pending_req->id
> Cannot access memory at address 0x4545454545454545
> (gdb) print att->pending_req
> $1 = (struct att_send_op *) 0x4545454545454545
> (gdb) print timeout->id
> $2 = 12
> (gdb) bt
> #0 0x0000555555598225 in timeout_cb (user_data=0x5555557c82f0) at src/shared/att.c:405
> #1 0x00005555555a499d in timeout_callback (user_data=<optimized out>) at src/shared/timeout-glib.c:34
> #2 0x00007ffff7b101d3 in ?? () from /usr/lib64/libglib-2.0.so.0
> #3 0x00007ffff7b0f75a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> #4 0x00007ffff7b0fb10 in ?? () from /usr/lib64/libglib-2.0.so.0
> #5 0x00007ffff7b0fe32 in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
> #6 0x0000555555597e84 in tester_run () at src/shared/tester.c:830
> #7 0x0000555555594a4c in main (argc=<optimized out>, argv=<optimized out>) at unit/test-gatt.c:4474
> (gdb)
>
> This started after I tried to update the package to bluez-5.45, it
> still happens with current bluez git master.

Are there any special tool you are running this? Or it is just gdb?
Under valgrind everything looks fine:

/robustness/unkown-request - init
/robustness/unkown-request - setup
/robustness/unkown-request - setup complete
/robustness/unkown-request - run
/robustness/unkown-request - test passed
/robustness/unkown-request - teardown
/robustness/unkown-request - teardown complete
/robustness/unkown-request - done


Test Summary
------------
/robustness/unkown-request Passed 0.070 seconds


> Hope this helps, I am now commenting out this test for now.
> --
> Stefan Seyfried
>
> "For a successful technology, reality must take precedence over
> public relations, for nature cannot be fooled." -- Richard Feynman
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html



--
Luiz Augusto von Dentz

2017-05-28 20:11:47

by Stefan Seyfried

[permalink] [raw]
Subject: Re: [BUG] unit/test-midi spurious failure

Hi Felipe,

Am 28.05.2017 um 19:55 schrieb Felipe Tonello:
> Hi Stefan,
>
> On Fri, May 26, 2017 at 9:17 AM, Stefan Seyfried
> <[email protected]> wrote:
>> Hi all,
>>
>> I get spurious failures in unit/test-midi

>> ERROR:unit/test-midi.c:287:compare_events: assertion failed (ev1->data.ext.len == ev2->data.ext.len): (5 == 4)

>> ERROR:unit/test-midi.c:287:compare_events: assertion failed (ev1->data.ext.len == ev2->data.ext.len): (1024 == 1023)
>> Aborted (core dumped)
>> ERROR:unit/test-midi.c:287:compare_events: assertion failed (ev1->data.ext.len == ev2->data.ext.len): (5 == 4)
>> Aborted (core dumped)
>>
>> About one in four runs fails:
>>
>> #!/bin/bash
>> G=0; B=0;
>> for i in `seq 1 1000`; do
>> if unit/test-midi; then
>> G=$((G+1)
>> else
>> B=$((B+1))
>> fi
>> done
>> echo "good: $G bad: $B"
>>
>> good: 730 bad: 270
>>
>> I have no idea what this means, if this is an error in the
>> test code or a sign something is wrong with the testing system.
>>
>> It started to happen with bluez-5.45, still happens with git master.
>
> Does it happen on BlueZ 5.44? If so, then I believe it is a bug in the
> MIDI code.

I have not seen it in BlueZ 5.44, only in 5.45 and master.
But because it is a spurious failure, I might have just been lucky (I
only notice failing builds in the openSUSE Build Service if I actively
look for them, but if some dependencies are rebuilt, then the bluez
package also gets automatically rebuilt and the spurious failure might
"fix" itself)

> Thanks for finding it out.

"make check" is part of the openSUSE RPM packaging process, to find such
things, so I did not have to do any real work to find this :-)

Best regards,

Stefan
--
Stefan Seyfried

"For a successful technology, reality must take precedence over
public relations, for nature cannot be fooled." -- Richard Feynman

2017-05-28 17:55:03

by Felipe Ferreri Tonello

[permalink] [raw]
Subject: Re: [BUG] unit/test-midi spurious failure

Hi Stefan,

On Fri, May 26, 2017 at 9:17 AM, Stefan Seyfried
<[email protected]> wrote:
> Hi all,
>
> I get spurious failures in unit/test-midi
>
> Debugging gives:
>
> ALSA SysEx events to Raw BLE packets - init
> ALSA SysEx events to Raw BLE packets - setup
> ALSA SysEx events to Raw BLE packets - setup complete
> ALSA SysEx events to Raw BLE packets - run
> **
> ERROR:unit/test-midi.c:287:compare_events: assertion failed (ev1->data.ext.len == ev2->data.ext.len): (5 == 4)
>
> Program received signal SIGABRT, Aborted.
> 0x00007ffff746a8d7 in raise () from /lib64/libc.so.6
> Missing separate debuginfos, use: zypper install libasound2-debuginfo-1.1.2-1.2.x86_64 libglib-2_0-0-debuginfo-2.48.2-1.10.x86_64 libpcre1-debuginfo-8.33-4.51.x86_64
> (gdb) bt
> #0 0x00007ffff746a8d7 in raise () from /lib64/libc.so.6
> #1 0x00007ffff746bcaa in abort () from /lib64/libc.so.6
> #2 0x00007ffff7b3bbc5 in g_assertion_message () from /usr/lib64/libglib-2.0.so.0
> #3 0x00007ffff7b3bf61 in g_assertion_message_cmpnum () from /usr/lib64/libglib-2.0.so.0
> #4 0x000055555555662c in compare_events (ev1=0x55555575bb20 <event4>, ev2=0x7fffffffe590) at unit/test-midi.c:285
> #5 0x000055555555693b in compare_events_cb (parser=0x7fffffffe660, user_data=0x7fffffffe640) at unit/test-midi.c:577
> #6 0x0000555555556bcb in test_midi_writer (data=0x55555575bb00 <midi4>) at unit/test-midi.c:603
> #7 0x0000555555557878 in run_callback (user_data=0x555555763600) at src/shared/tester.c:415
> #8 0x00007ffff7b18015 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> #9 0x00007ffff7b18388 in ?? () from /usr/lib64/libglib-2.0.so.0
> #10 0x00007ffff7b1864a in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
> #11 0x0000555555558721 in tester_run () at src/shared/tester.c:830
> #12 0x0000555555556429 in main (argc=1, argv=0x7fffffffe9c8) at unit/test-midi.c:631
> (gdb)
>
> It does not happen every time, but maybe once in 10 runs,
> the assertion failure numbers are different:
> ERROR:unit/test-midi.c:287:compare_events: assertion failed (ev1->data.ext.len == ev2->data.ext.len): (1024 == 1023)
> Aborted (core dumped)
> ERROR:unit/test-midi.c:287:compare_events: assertion failed (ev1->data.ext.len == ev2->data.ext.len): (5 == 4)
> Aborted (core dumped)
>
> About one in four runs fails:
>
> #!/bin/bash
> G=0; B=0;
> for i in `seq 1 1000`; do
> if unit/test-midi; then
> G=$((G+1)
> else
> B=$((B+1))
> fi
> done
> echo "good: $G bad: $B"
>
> good: 730 bad: 270
>
> I have no idea what this means, if this is an error in the
> test code or a sign something is wrong with the testing system.
>
> It started to happen with bluez-5.45, still happens with git master.

Does it happen on BlueZ 5.44? If so, then I believe it is a bug in the
MIDI code.

Thanks for finding it out.

Felipe

2017-05-26 08:17:00

by Stefan Seyfried

[permalink] [raw]
Subject: [BUG] unit/test-midi spurious failure

Hi all,

I get spurious failures in unit/test-midi

Debugging gives:

ALSA SysEx events to Raw BLE packets - init
ALSA SysEx events to Raw BLE packets - setup
ALSA SysEx events to Raw BLE packets - setup complete
ALSA SysEx events to Raw BLE packets - run
**
ERROR:unit/test-midi.c:287:compare_events: assertion failed (ev1->data.ext.len == ev2->data.ext.len): (5 == 4)

Program received signal SIGABRT, Aborted.
0x00007ffff746a8d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: zypper install libasound2-debuginfo-1.1.2-1.2.x86_64 libglib-2_0-0-debuginfo-2.48.2-1.10.x86_64 libpcre1-debuginfo-8.33-4.51.x86_64
(gdb) bt
#0 0x00007ffff746a8d7 in raise () from /lib64/libc.so.6
#1 0x00007ffff746bcaa in abort () from /lib64/libc.so.6
#2 0x00007ffff7b3bbc5 in g_assertion_message () from /usr/lib64/libglib-2.0.so.0
#3 0x00007ffff7b3bf61 in g_assertion_message_cmpnum () from /usr/lib64/libglib-2.0.so.0
#4 0x000055555555662c in compare_events (ev1=0x55555575bb20 <event4>, ev2=0x7fffffffe590) at unit/test-midi.c:285
#5 0x000055555555693b in compare_events_cb (parser=0x7fffffffe660, user_data=0x7fffffffe640) at unit/test-midi.c:577
#6 0x0000555555556bcb in test_midi_writer (data=0x55555575bb00 <midi4>) at unit/test-midi.c:603
#7 0x0000555555557878 in run_callback (user_data=0x555555763600) at src/shared/tester.c:415
#8 0x00007ffff7b18015 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#9 0x00007ffff7b18388 in ?? () from /usr/lib64/libglib-2.0.so.0
#10 0x00007ffff7b1864a in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#11 0x0000555555558721 in tester_run () at src/shared/tester.c:830
#12 0x0000555555556429 in main (argc=1, argv=0x7fffffffe9c8) at unit/test-midi.c:631
(gdb)

It does not happen every time, but maybe once in 10 runs,
the assertion failure numbers are different:
ERROR:unit/test-midi.c:287:compare_events: assertion failed (ev1->data.ext.len == ev2->data.ext.len): (1024 == 1023)
Aborted (core dumped)
ERROR:unit/test-midi.c:287:compare_events: assertion failed (ev1->data.ext.len == ev2->data.ext.len): (5 == 4)
Aborted (core dumped)

About one in four runs fails:

#!/bin/bash
G=0; B=0;
for i in `seq 1 1000`; do
if unit/test-midi; then
G=$((G+1)
else
B=$((B+1))
fi
done
echo "good: $G bad: $B"

good: 730 bad: 270

I have no idea what this means, if this is an error in the
test code or a sign something is wrong with the testing system.

It started to happen with bluez-5.45, still happens with git master.
--
Stefan Seyfried

"For a successful technology, reality must take precedence over
public relations, for nature cannot be fooled." -- Richard Feynman