2015-06-17 21:55:47

by Andrei Vagin

Subject: linux-next: BUG: unable to handle kernel NULL pointer dereference in cfq_print_leaf_weight()

Hi All,

I ran the CRIU tests on the 4.1.0-rc8-next-20150617 kernel and hit
this bug. Maybe someone will be interested in looking at it.

[ 30.130897] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[ 30.130970] IP: [<ffffffff813bc7e1>] cfq_print_leaf_weight+0x31/0x50
[ 30.131013] PGD 3b77a067 PUD 39883067 PMD 0
[ 30.131013] Oops: 0000 [#1] SMP
[ 30.131013] Modules linked in: tun netlink_diag af_packet_diag udp_diag tcp_diag inet_diag unix_diag ppdev crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_balloon serio_raw i2c_piix4 parport_pc parport virtio_blk virtio_net cirrus drm_kms_helper ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi
[ 30.131013] CPU: 0 PID: 613 Comm: criu Not tainted 4.1.0-rc8-next-20150617 #1
[ 30.131013] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 30.131013] task: ffff880039eb0000 ti: ffff880039db0000 task.ti: ffff880039db0000
[ 30.131013] RIP: 0010:[<ffffffff813bc7e1>] [<ffffffff813bc7e1>] cfq_print_leaf_weight+0x31/0x50
[ 30.131013] RSP: 0018:ffff880039db3d08 EFLAGS: 00010286
[ 30.131013] RAX: 0000000000000000 RBX: ffff88003a524200 RCX: 0000000000000001
[ 30.131013] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff88003a524800
[ 30.131013] RBP: ffff880039db3d18 R08: 0000000000000001 R09: 0000000000000000
[ 30.131013] R10: 0000000000000080 R11: 0000000000000000 R12: ffffffff81cba6c8
[ 30.131013] R13: 0000000000000001 R14: ffff880039db3f18 R15: ffff88003a524200
[ 30.131013] FS: 00007f2b6c495740(0000) GS:ffff88003d200000(0000) knlGS:0000000000000000
[ 30.131013] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 30.131013] CR2: 000000000000001c CR3: 000000003de3c000 CR4: 00000000000407f0
[ 30.131013] Stack:
[ 30.131013] ffff880039db3d28 ffff88003a524200 ffff880039db3d48 ffffffff8113c9d3
[ 30.131013] ffff880039db3d58 0000000000000000 ffff880039773380 0000000000000001
[ 30.131013] ffff880039db3d58 ffffffff812b9df6 ffff880039db3dd8 ffffffff81257b7d
[ 30.131013] Call Trace:
[ 30.131013] [<ffffffff8113c9d3>] cgroup_seqfile_show+0x43/0xc0
[ 30.131013] [<ffffffff812b9df6>] kernfs_seq_show+0x26/0x30
[ 30.131013] [<ffffffff81257b7d>] seq_read+0x10d/0x3c0
[ 30.131013] [<ffffffff812ba7c9>] kernfs_fop_read+0x129/0x180
[ 30.131013] [<ffffffff8122f9d7>] __vfs_read+0x37/0x100
[ 30.131013] [<ffffffff81349db3>] ? security_file_permission+0xa3/0xc0
[ 30.131013] [<ffffffff8122ff86>] ? rw_verify_area+0x56/0xe0
[ 30.131013] [<ffffffff81230096>] vfs_read+0x86/0x130
[ 30.131013] [<ffffffff81230f28>] SyS_read+0x58/0xd0
[ 30.131013] [<ffffffff817c386e>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 30.131013] Code: 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b bf d8 00 00 00 e8 93 fe d7 ff 48 85 c0 74 0f 48 63 15 27 f0 8f 00 48 8b 84 d0 28 01 00 00 <8b> 50 1c 48 89 df 31 c0 48 c7 c6 f8 f8 ad 81 e8 1b b9 e9 ff 48
[ 30.131013] RIP [<ffffffff813bc7e1>] cfq_print_leaf_weight+0x31/0x50
[ 30.131013] RSP <ffff880039db3d08>
[ 30.131013] CR2: 000000000000001c
[ 30.142367] ---[ end trace bad3e020b932bbb1 ]---


2015-06-18 21:09:14

by Andrei Vagin

Subject: Re: linux-next: BUG: unable to handle kernel NULL pointer dereference in cfq_print_leaf_weight()

2015-06-18 0:55 GMT+03:00 Andrey Wagin <[email protected]>:
> Hi All,
>
> I ran the CRIU tests on the 4.1.0-rc8-next-20150617 kernel and hit
> this bug. Maybe someone will be interested in looking at it.

The bug does not reproduce if at least one of the devices uses the cfq scheduler:

[root@avagin-fc19-cr ~]# cat /sys/block/sda/queue/scheduler
noop deadline [cfq]
[root@avagin-fc19-cr ~]# cat /sys/fs/cgroup/blkio/blkio.leaf_weight
1000

[root@avagin-fc19-cr ~]# echo noop > /sys/block/sda/queue/scheduler
[root@avagin-fc19-cr ~]# cat /sys/block/sda/queue/scheduler
[noop] deadline cfq
[root@avagin-fc19-cr ~]# cat /sys/fs/cgroup/blkio/blkio.leaf_weight
Killed

blkcg_to_cfqgd(blkcg) returns NULL.
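
That matches the oops: CR2 is 0x1c and RAX is 0, and the faulting instruction bytes `<8b> 50 1c` decode to a 32-bit load from 0x1c(%rax), i.e. a field read through a NULL pointer. Below is a minimal userspace sketch of the pattern and of one possible guard. It is not the kernel code and not the actual fix; every `toy_` name, the padded struct layout, and the choice to print 0 when no policy data exists are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-cgroup cfq data. The padding only
 * mimics "leaf_weight lives at offset 0x1c"; the real layout differs. */
struct toy_cfq_group_data {
    char pad[0x1c];
    unsigned int leaf_weight;
};

/* NULL + 0x1c is the address the oops reports in CR2. */
_Static_assert(offsetof(struct toy_cfq_group_data, leaf_weight) == 0x1c,
               "leaf_weight is expected at offset 0x1c in this toy layout");

struct toy_blkcg {
    /* Assumed to stay NULL while no device uses the cfq scheduler. */
    struct toy_cfq_group_data *cfq_data;
};

/* Models blkcg_to_cfqgd(): the lookup may legitimately return NULL. */
static struct toy_cfq_group_data *toy_blkcg_to_cfqgd(struct toy_blkcg *blkcg)
{
    return blkcg ? blkcg->cfq_data : NULL;
}

/* Guarded read: formats the leaf weight, or 0 when there is no policy
 * data. Returns the number of characters written, like snprintf(). */
static int toy_print_leaf_weight(struct toy_blkcg *blkcg, char *buf, size_t len)
{
    struct toy_cfq_group_data *cgd = toy_blkcg_to_cfqgd(blkcg);
    unsigned int val = 0;

    assert(buf != NULL && len > 0);

    if (cgd)    /* without this check the read below faults at 0x1c */
        val = cgd->leaf_weight;

    return snprintf(buf, len, "%u\n", val);
}
```

With `cfq_data` set and `leaf_weight` equal to 1000, the helper writes "1000" followed by a newline; with `cfq_data` left NULL it writes "0" instead of faulting.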


2015-06-19 16:24:36

by Jens Axboe

Subject: Re: linux-next: BUG: unable to handle kernel NULL pointer dereference in cfq_print_leaf_weight()

On 06/17/2015 03:55 PM, Andrey Wagin wrote:
> Hi All,
>
> I ran the CRIU tests on the 4.1.0-rc8-next-20150617 kernel and hit
> this bug. Maybe someone will be interested in looking at it.

This should fix it:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-4.2/core&id=9470e4a693db84bee7becbba8de01af02bb23c9f

--
Jens Axboe

2015-06-19 20:26:59

by Andrei Vagin

Subject: Re: linux-next: BUG: unable to handle kernel NULL pointer dereference in cfq_print_leaf_weight()

2015-06-19 19:24 GMT+03:00 Jens Axboe <[email protected]>:
> On 06/17/2015 03:55 PM, Andrey Wagin wrote:
>>
>> Hi All,
>>
>> I ran the CRIU tests on the 4.1.0-rc8-next-20150617 kernel and hit
>> this bug. Maybe someone will be interested in looking at it.
>
>
> This should fix it:
>
> http://git.kernel.dk/cgit/linux-block/commit/?h=for-4.2/core&id=9470e4a693db84bee7becbba8de01af02bb23c9f

Hi Jens,

Thank you for the patch. I think we need to fix __cfq_set_weight and
__cfq_set_weight_device too.

>
> --
> Jens Axboe
>

2015-06-22 20:13:37

by Andrei Vagin

Subject: Re: linux-next: BUG: unable to handle kernel NULL pointer dereference in cfq_print_leaf_weight()

2015-06-19 23:26 GMT+03:00 Andrey Wagin <[email protected]>:
> 2015-06-19 19:24 GMT+03:00 Jens Axboe <[email protected]>:
>> On 06/17/2015 03:55 PM, Andrey Wagin wrote:
>>>
>>> Hi All,
>>>
>>> I ran the CRIU tests on the 4.1.0-rc8-next-20150617 kernel and hit
>>> this bug. Maybe someone will be interested in looking at it.
>>
>>
>> This should fix it:
>>
>> http://git.kernel.dk/cgit/linux-block/commit/?h=for-4.2/core&id=9470e4a693db84bee7becbba8de01af02bb23c9f
>
> Hi Jens,
>
> Thank you for the patch. I think we need to fix __cfq_set_weight and
> __cfq_set_weight_device too.

I see that you have fixed these functions too, but the CRIU tests still
fail, because they try to restore the value of blkio.weight and get
EINVAL. This works fine on the upstream kernel.

To me the suggested interface looks weird. What if a device that uses
cfq isn't permanent? If I detach the device, the cgroup configuration
will be destroyed, and if I then attach the device again, I will need
to apply the cgroup parameters again.
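
To make the concern concrete, here is a toy userspace model, not kernel code. It assumes that the per-cgroup weight storage exists only while some device uses cfq, and that the guarded setter returns -EINVAL when that storage is missing. The `toy_` names and the default value of 500 are hypothetical placeholders.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_DEFAULT_WEIGHT 500u   /* arbitrary placeholder default */

struct toy_blkcg {
    unsigned int *weight;   /* assumed NULL while no cfq device exists */
};

/* A device starts using cfq: allocate storage holding the default weight. */
static int toy_attach_cfq_device(struct toy_blkcg *cg)
{
    assert(cg != NULL);
    if (cg->weight)
        return 0;
    cg->weight = malloc(sizeof *cg->weight);
    if (!cg->weight)
        return -ENOMEM;
    *cg->weight = TOY_DEFAULT_WEIGHT;
    return 0;
}

/* The last cfq device goes away: the configured weight is dropped with it. */
static void toy_detach_cfq_device(struct toy_blkcg *cg)
{
    assert(cg != NULL);
    free(cg->weight);
    cg->weight = NULL;
}

/* Guarded setter: no oops, but the write is refused when nothing backs it. */
static int toy_set_weight(struct toy_blkcg *cg, unsigned int val)
{
    assert(cg != NULL);
    if (!cg->weight)
        return -EINVAL;
    *cg->weight = val;
    return 0;
}
```

In this model a restore step that calls `toy_set_weight()` before any cfq device is attached gets -EINVAL, and a weight set earlier falls back to the default after a detach followed by a re-attach, which are the two behaviours described above.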

Thanks,
Andrew