Date: Thu, 9 Apr 2015 10:38:20 -0700
Subject: Re: NULL deref around xfs in v4.0-rc1–rc7
From: Linus Torvalds
To: Jan Engelhardt, "Rafael J. Wysocki", Jens Axboe
Cc: Linux Kernel Mailing List

On Wed, Apr 8, 2015 at 8:20 AM, Jan Engelhardt wrote:
> On Wednesday 2015-04-08 15:41, Jan Engelhardt wrote:
>
>>Starting somewhere around v4.0-rc1 and persisting through commit
>>v4.0-rc7, there is a new NULL deference apparently happening in
>>conjunction with xfs. This inhibits this machine's booting,
>>as xfs is used for the root filesystem.
>>
>>First bisection points at first-bad commit v4.0-rc1~8, and since that is
>>a merge commit, I'll be investigating some more hand-chosen commits (and
>>then people to Cc) as we speak.
>
> I reran bisect just to be sure.
> It now shows v4.0-rc1~9 is bad, v4.0-rc1~9^1 is ok, and v4.0-rc~9^2 is
> ok too. So this means that the combination of the both ~9 childs work
> badly together.

Ok, that's just _odd_.

That v4.0-rc1~9 is just the pm+acpi merge, and has absolutely nothing
to do with XFS or the block code. In fact, looking at the diff from
its direct parent, it doesn't even really change any relevant code.

So I get the feeling that the oops you are seeing is likely not
consistent, and may depend on allocation patterns or similar. Because
the bisect doesn't make any sense at all.
It looks much more like a pure block-mq bug, but one that needs some
very special condition to trigger. Jens, does this look familiar or
trigger any ideas:

    BUG: unable to handle kernel paging request at 0000000000001000
    IP: [] scsi_init_cmd_errh+0x26/0x5d

(The whole oops is on lkml).

Jan, can you reproduce the oops with frame pointers so that we get a
better call trace? Although it looks fairly normal: the trapping code is

    rep stos %eax,%es:*(%rdi)

and %rdi is 0x1000. It seems to be simply

    memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);

where 'cmd->sense_buffer' has some insane value ("PAGE_SIZE" or just a
flipped bit, or whatever)

Jens?

                 Linus
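
The reasoning about the faulting memset() can be illustrated with a small
userspace sketch. It is not kernel code and not a proposed fix: the struct
`fake_scsi_cmnd`, the helper `init_cmd_errh_checked()` and the threshold
`MIN_PLAUSIBLE_ADDR` are hypothetical names made up for this example, and
the buffer size of 96 is assumed to match SCSI_SENSE_BUFFERSIZE. It only
shows why a 'sense_buffer' holding 0x1000 makes the store fault at exactly
that address, and how such a value can be recognised before it is written
through.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match the kernel's SCSI_SENSE_BUFFERSIZE; illustrative only. */
#define SENSE_BUFFERSIZE 96

/* Hypothetical cut-off: anything below this is treated as a bogus pointer. */
#define MIN_PLAUSIBLE_ADDR ((uintptr_t)0x10000)

/* Stand-in for the one field of the SCSI command that matters here. */
struct fake_scsi_cmnd {
    unsigned char *sense_buffer;
};

/*
 * Zero the sense buffer, as the memset() in the oops does, but refuse to
 * write through a pointer that is NULL or suspiciously small (0x1000, for
 * example). Returns 0 on success and -1 if the pointer looks corrupted.
 */
int init_cmd_errh_checked(struct fake_scsi_cmnd *cmd)
{
    if ((uintptr_t)cmd->sense_buffer < MIN_PLAUSIBLE_ADDR)
        return -1;

    /* This store is what "rep stos" implements; %rdi holds sense_buffer. */
    memset(cmd->sense_buffer, 0, SENSE_BUFFERSIZE);
    return 0;
}
```

Without the check, the memset() would start writing at whatever address
'sense_buffer' contains, which is why the reported fault address and %rdi
are both 0x1000.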