Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754029AbYKXTPi (ORCPT ); Mon, 24 Nov 2008 14:15:38 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752520AbYKXTP1 (ORCPT ); Mon, 24 Nov 2008 14:15:27 -0500
Received: from accolon.hansenpartnership.com ([76.243.235.52]:59453 "EHLO
	accolon.hansenpartnership.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1752191AbYKXTPZ (ORCPT );
	Mon, 24 Nov 2008 14:15:25 -0500
Subject: Re: next-20081119: general protection fault: get_next_timer_interrupt()
From: James Bottomley
To: Thomas Gleixner
Cc: Alexander Beregalov, LKML, linux-next@vger.kernel.org, Ingo Molnar,
	linux-scsi@vger.kernel.org, David Miller
In-Reply-To:
References:
Content-Type: text/plain
Date: Mon, 24 Nov 2008 14:15:17 -0500
Message-Id: <1227554117.25499.46.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.22.3.1 (2.22.3.1-1.fc9)
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4586
Lines: 93

On Mon, 2008-11-24 at 18:43 +0100, Thomas Gleixner wrote:
> > scsi0 : LSI SAS based MegaRAID driver
> > Driver 'sd' needs updating - please use bus_type methods
> > scsi 0:0:0:0: Direct-Access ATA SAMSUNG HE160HJ 0-24 PQ: 0 ANSI: 5
> > ------------[ cut here ]------------
> > WARNING: at lib/debugobjects.c:215 debug_print_object+0x4f/0x57()
> > ODEBUG: free active object type: timer_list
>
> That's the cause for your boot crash. The scsi/blk code is freeing a
> page which contains an active timer, so the timer code references gone
> memory. You triggered it because DEBUG_PAGEALLOC unmaps the page when
> it's freed.
>
> James, or other scsi experts please.
>
> > Modules linked in:
> > Pid: 580, comm: scsi_scan_0 Tainted: G W 2.6.28-rc5-next-20081119 #9
> > Call Trace:
> >  [] warn_slowpath+0xae/0xd5
> >  [] ? debug_check_no_obj_freed+0x75/0x1c8
> >  [] debug_print_object+0x4f/0x57
> >  [] debug_check_no_obj_freed+0x9c/0x1c8
> >  [] kmem_cache_free+0x64/0xc0
> >  [] ? blk_release_queue+0x61/0x66
> >  [] blk_release_queue+0x61/0x66
> >  [] kobject_release+0x52/0x68
> >  [] ? kobject_release+0x0/0x68
> >  [] kref_put+0x43/0x4f
> >  [] kobject_put+0x47/0x4b
> >  [] blk_cleanup_queue+0x57/0x5c
> >  [] scsi_free_queue+0x9/0xb
> >  [] scsi_device_dev_release_usercontext+0xdc/0x127
> >  [] ? scsi_device_dev_release_usercontext+0x0/0x127
> >  [] execute_in_process_context+0x2a/0x70
> >  [] scsi_device_dev_release+0x17/0x19
> >  [] device_release+0x43/0x68
> >  [] kobject_release+0x52/0x68
> >  [] ? kobject_release+0x0/0x68
> >  [] kref_put+0x43/0x4f
> >  [] kobject_put+0x47/0x4b
> >  [] put_device+0x15/0x17
> >  [] scsi_destroy_sdev+0x48/0x4c
> >  [] scsi_probe_and_add_lun+0xb5d/0xb81
> >  [] ? scsi_alloc_target+0x22b/0x267
> >  [] __scsi_scan_target+0x9d/0x598
> >  [] ? trace_hardirqs_on_caller+0x1f/0x153
> >  [] ? __mutex_lock_common+0x371/0x3be
> >  [] ? scsi_scan_host_selected+0xb6/0x133
> >  [] ? trace_hardirqs_on_caller+0x1f/0x153
> >  [] ? scsi_scan_host_selected+0xb6/0x133
> >  [] scsi_scan_channel+0x52/0x78
> >  [] scsi_scan_host_selected+0xf1/0x133
> >  [] ? do_scan_async+0x0/0x127
> >  [] do_scsi_scan_host+0x6b/0x70
> >  [] ? do_scan_async+0x0/0x127
> >  [] do_scan_async+0x17/0x127
> >  [] ? do_scan_async+0x0/0x127
> >  [] kthread+0x49/0x76
> >  [] child_rip+0xa/0x11
> >  [] ? restore_args+0x0/0x30
> >  [] ? kthread+0x0/0x76
> >  [] ? child_rip+0x0/0x11
> > ---[ end trace 4eaa2a86a8e2da22 ]---

Well, not sure. Most likely candidate is the new block timer code.

What seems to be happening is that the queue is being released with
either an outstanding request (refcounting problem) or ticking timer
with no work (block timer problem). The way scanning works is that we
create a request queue for each device we probe and then delete it again
if nothing appears after the bus settle time.

The argument against this is that it should show up on every scanned
bus. However, these are getting rarer; I was just about to write that I
hadn't seen it when I remembered that all my SCSI testing systems are
currently running hotplug reporting busses (i.e. they don't do
scanning). Fortunately, I've also booted voyager recently, which does
use parallel SCSI and doesn't see this either, so it could also be
megaraid_sas specific.

Could you turn on SCSI logging so we can see the sequences? Since this
is boot time, it's probably easiest to just enable all logging:

    echo 0xffffffff > /sys/module/scsi_mod/parameters/scsi_logging_level

(the kernel must be compiled with CONFIG_SCSI_LOGGING=y)

James
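
If logging everything turns out to be too noisy, a narrower value for
scsi_logging_level can be computed instead of 0xffffffff. The following
is only a minimal sketch: it assumes the level is packed as ten 3-bit
fields (error, timeout, scan, ...) at the bit offsets I recall from
drivers/scsi/scsi_logging.h, so check that header in your tree before
relying on it. The names SHIFTS and logging_level are made up for this
example and are not part of any kernel interface.

```python
# Assumed bit offset of each 3-bit logging facility inside
# scsi_logging_level (verify against drivers/scsi/scsi_logging.h).
SHIFTS = {
    "error": 0,
    "timeout": 3,
    "scan": 6,
    "mlqueue": 9,
    "mlcomplete": 12,
    "llqueue": 15,
    "llcomplete": 18,
    "hlqueue": 21,
    "hlcomplete": 24,
    "ioctl": 27,
}


def logging_level(**levels):
    """Pack per-facility levels (0..7) into one integer value."""
    value = 0
    for name, level in levels.items():
        if name not in SHIFTS:
            raise KeyError("unknown logging facility: " + name)
        if not 0 <= level <= 7:
            raise ValueError("level must be in 0..7: " + name)
        value |= level << SHIFTS[name]
    return value


if __name__ == "__main__":
    # Maximum verbosity for the scan and error facilities only.
    print(hex(logging_level(scan=7, error=7)))
```

The printed value can be written to the same sysfs file as above in
place of 0xffffffff.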