Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933201AbbDISYe (ORCPT ); Thu, 9 Apr 2015 14:24:34 -0400
Received: from ares41.inai.de ([46.4.122.207]:60294 "EHLO ares41.inai.de"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1755911AbbDISYb (ORCPT ); Thu, 9 Apr 2015 14:24:31 -0400
Date: Thu, 9 Apr 2015 20:24:30 +0200 (CEST)
From: Jan Engelhardt
To: Linus Torvalds
cc: "Rafael J. Wysocki", Jens Axboe, Linux Kernel Mailing List
Subject: Re: NULL deref around blkmq in v4.0-rc1–rc7
User-Agent: Alpine 2.20 (LSU 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thursday 2015-04-09 19:38, Linus Torvalds wrote:
>>
>> I reran bisect just to be sure.
>> It now shows v4.0-rc1~9 is bad, v4.0-rc1~9^1 is ok, and v4.0-rc~9^2 is
>> ok too. So this means that the combination of the both ~9 childs work
>> badly together.
>
>Ok, that's just _odd_.
>[...]
>So I get the feeling that the oops you are seeing is likely not
>consistent, and may depend on allocation patterns or similar.

It is fairly consistent (that is, reproducible): only about 1 in 15
attempts (I have not really kept track) does not die.

With frame pointers:

BUG: unable to handle kernel paging request at 0000000000001000
IP: [] scsi_init_cmd_errh+0x2a/0x62
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: xfs crc32c_generic libcrc32c dm_crypt xts gf128mul
algif_skcipher af_alg sd_mod mptsas scsi_transport_sas mptscsih mptbase
dm_mod sg ipv6
CPU: 0 PID: 403 Comm: kworker/u2:1 Not tainted 4.0.0-rc7+ #55
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ffff88007b686f60 ti: ffff88007bcb4000 task.ti: ffff88007bcb4000
RIP: 0010:[] [] scsi_init_cmd_errh+0x2a/0x62
RSP: 0018:ffff88007bcb77a8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88007bf8d800 RCX: 0000000000000018
RDX: ffff88007bf7ab70 RSI: 0000000000000000 RDI: 0000000000001000
RBP: ffff88007bcb77a8 R08: ffff88007beb9c40 R09: 0000000000000000
R10: 0000000000000000 R11: ffffea0001fe17c0 R12: ffff88007bf7ab70
R13: 0000000000000000 R14: ffff88007bf8d800 R15: ffff88007bf7aa00
FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000001000 CR3: 000000007cb0d000 CR4: 00000000000007f0
Stack:
 ffff88007bcb7818 ffffffff81286d59 ffff88007b686f60 ffff88007bc24000
 ffff88007bf7ab78 ffff88007bf8d968 ffff88007be56c00 ffff88007bc24000
 ffff88007cbfb400 ffff88007bcb7850 ffff88007be56c08 ffff88007bf7aa00
Call Trace:
 [] scsi_queue_rq+0x2e8/0x3d2
 [] __blk_mq_run_hw_queue+0x19b/0x2a2
 [] ? blk_mq_merge_queue_io+0x75/0x147
 [] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs]
 [] blk_mq_run_hw_queue+0x4f/0x99
 [] blk_sq_make_request+0x163/0x170
 [] generic_make_request+0x97/0xd6
 [] submit_bio+0x10d/0x12c
 [] ? __lru_cache_add+0x1e/0x3f
 [] mpage_bio_submit+0x25/0x2c
 [] mpage_readpages+0xf8/0x10c
 [] ? __xfs_get_blocks+0x2f9/0x2f9 [xfs]
 [] xfs_vm_readpages+0x18/0x1a [xfs]
 [] __do_page_cache_readahead+0x137/0x1d3
 [] ondemand_readahead+0x20a/0x21b
 [] page_cache_sync_readahead+0x38/0x3a
 [] generic_file_read_iter+0x191/0x4fb
 [] ? xfs_ilock+0x32/0x5d [xfs]
 [] xfs_file_read_iter+0x1c2/0x213 [xfs]
 [] new_sync_read+0x74/0x98
 [] __vfs_read+0x14/0x3b
 [] vfs_read+0x74/0xc1
 [] kernel_read+0x3c/0x4a
 [] prepare_binprm+0x117/0x11f
 [] do_execveat_common.isra.31+0x3b2/0x5d8
 [] do_execve+0x27/0x29
 [] ____call_usermodehelper+0x10a/0x138
 [] ? call_usermodehelper+0x49/0x49
 [] ret_from_fork+0x58/0x90
 [] ? call_usermodehelper+0x49/0x49
Code: c3 55 48 89 fa 48 c7 87 b0 00 00 00 00 00 00 00 c7 87 f4 00 00 00 00 00 00 00 48 8b bf 10 01 00 00 31 c0 b9 18 00 00 00 48 89 e5 ab 66 83 ba cc 00 00 00 00 75 2a 48 8b 8a d8 00 00 00 8a 01
RIP  [] scsi_init_cmd_errh+0x2a/0x62
 RSP
CR2: 0000000000001000
---[ end trace fbec0fe487830b1d ]---

>and %rdi is 0x1000. It seems to be simply
>
>    memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
>
>where 'cmd->sense_buffer' has some insane value ("PAGE_SIZE" or just a
>flipped bit, or whatever)

Since this has been observed on two separate, unrelated systems, I
doubt that it is a bit flip caused by broken hardware.

Oh, and if anybody would like it, I can hand out the VirtualBox image.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
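
For anyone wanting to cross-check the quoted memset() reading against the
register dump, here is a small userspace sketch. It assumes that
SCSI_SENSE_BUFFERSIZE is 96 (as I believe include/scsi/scsi_cmnd.h defines
it) and that the compiler turned the memset into 4-byte stores; under those
assumptions the store count comes out as 0x18, which matches RCX, while
RAX is 0 (the fill value) and RDI equals CR2 (the bad destination).
SENSE_BUFFERSIZE, dword_stores() and clear_sense() are names made up for
this illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value of SCSI_SENSE_BUFFERSIZE for this kernel version. */
#define SENSE_BUFFERSIZE 96

/* Number of 32-bit stores needed to clear `bytes` bytes. */
static size_t dword_stores(size_t bytes)
{
    assert(bytes % sizeof(uint32_t) == 0);
    return bytes / sizeof(uint32_t);
}

/*
 * Hypothetical stand-in for the faulting step: clear the sense buffer,
 * but reject a pointer in the first 64 KiB (such as the 0x1000 seen in
 * %rdi) instead of writing through it.
 */
static int clear_sense(unsigned char *sense_buffer)
{
    if ((uintptr_t)sense_buffer < 0x10000)
        return -1;
    memset(sense_buffer, 0, SENSE_BUFFERSIZE);
    return 0;
}
```

Calling dword_stores(SENSE_BUFFERSIZE) gives the count to compare with RCX,
and clear_sense() returns -1 for a pointer like the one in the oops. The
range check is only a debugging aid for narrowing down where the pointer
goes bad; it is not a proposed fix.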