Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753607Ab1DKOxd (ORCPT ); Mon, 11 Apr 2011 10:53:33 -0400 Received: from mx1.redhat.com ([209.132.183.28]:12891 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752268Ab1DKOxc (ORCPT ); Mon, 11 Apr 2011 10:53:32 -0400 Subject: Strange block/scsi/workqueue issue From: Steven Whitehouse To: linux-kernel@vger.kernel.org, Jens Axboe , Tejun Heo Content-Type: text/plain; charset="UTF-8" Organization: Red Hat UK Ltd Date: Mon, 11 Apr 2011 15:56:03 +0100 Message-ID: <1302533763.2596.23.camel@dolmen> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6946 Lines: 142 Hi, Sometime around the -rc1 time (I'm not sure exactly, but it seemed to start around then) my test box stopped booting. Lacking a serial console I didn't get around to looking further before I went on holiday, but since the problem is still there in the latest Linus' tree, I've been looking again today. My original problem was that the messages were scrolling off the screen before I could read them. Using a video camera to try and capture them was unsuccessful. Today I used a serial console, and magically it seemed that the initial boot issue had gone away - so I guess there is at least an element of timing dependence. However, not very long after boot, I got the attached messages. These look suspiciously similar (bearing in mind I could only see the last few lines) to the messages I was getting with the boot issue. I should also say that the box was idle aside from the usual f14 start up tasks, so there was not much I/O going on at the time. Is the workqueue by any chance trying to wait synchronously for itself to finish? Steve. INFO: task kworker/2:1:382 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/2:1 D ffff8800cb5d2500 4528 382 2 0x00000000 ffff8800c88d5880 0000000000000086 00000000001d2500 ffff8800c88d57c8 0000000000000086 ffff8800c88d5fd8 ffff8800c92d8b80 ffff8800c88d5fd8 ffff8800c88d4000 ffff8800c88d4000 ffff8800c88d4000 ffff8800c88d5fd8 Call Trace: [] ? finish_task_switch+0x3a/0x110 [] ? _raw_spin_unlock_irq+0x46/0x70 [] ? finish_task_switch+0x6f/0x110 [] ? finish_task_switch+0x3a/0x110 [] schedule_timeout+0x295/0x310 [] ? sub_preempt_count+0xa9/0xe0 [] wait_for_common+0x120/0x170 [] ? try_to_wake_up+0x360/0x360 [] wait_for_completion+0x18/0x20 [] wait_on_cpu_work+0xec/0x100 [] ? do_work_for_cpu+0x30/0x30 [] wait_on_work+0xdb/0x150 [] ? wait_on_cpu_work+0x100/0x100 [] __cancel_work_timer+0x83/0x130 [] cancel_delayed_work_sync+0xd/0x10 [] blk_sync_queue+0x24/0x50 [] blk_cleanup_queue+0xf/0x60 [] scsi_free_queue+0x9/0x10 [] scsi_device_dev_release_usercontext+0xeb/0x140 [] ? trace_hardirqs_on_caller+0x14d/0x190 [] ? scsi_device_cls_release+0x20/0x20 [] execute_in_process_context+0x86/0xa0 [] scsi_device_dev_release+0x17/0x20 [] device_release+0x22/0x90 [] ? retint_restore_args+0x13/0x13 [] kobject_release+0x45/0x90 [] ? kobject_del+0x40/0x40 [] kref_put+0x37/0x70 [] kobject_put+0x27/0x60 [] put_device+0x12/0x20 [] scsi_request_fn+0xb9/0x4a0 [] __blk_run_queue+0x6a/0x110 [] blk_delay_work+0x26/0x40 [] process_one_work+0x197/0x520 [] ? process_one_work+0x131/0x520 [] ? blk_make_request+0x90/0x90 [] worker_thread+0x15c/0x330 [] ? manage_workers.clone.20+0x240/0x240 [] ? manage_workers.clone.20+0x240/0x240 [] kthread+0xa6/0xb0 [] kernel_thread_helper+0x4/0x10 [] ? finish_task_switch+0x6f/0x110 [] ? _raw_spin_unlock_irq+0x46/0x70 [] ? retint_restore_args+0x13/0x13 [] ? __init_kthread_worker+0x70/0x70 [] ? gs_change+0x13/0x13 INFO: lockdep is turned off. INFO: task kworker/2:1:382 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kworker/2:1 D ffff8800cb5d2500 4528 382 2 0x00000000 ffff8800c88d5880 0000000000000086 00000000001d2500 ffff8800c88d57c8 0000000000000086 ffff8800c88d5fd8 ffff8800c92d8b80 ffff8800c88d5fd8 ffff8800c88d4000 ffff8800c88d4000 ffff8800c88d4000 ffff8800c88d5fd8 Call Trace: [] ? finish_task_switch+0x3a/0x110 [] ? _raw_spin_unlock_irq+0x46/0x70 [] ? finish_task_switch+0x6f/0x110 [] ? finish_task_switch+0x3a/0x110 [] schedule_timeout+0x295/0x310 [] ? sub_preempt_count+0xa9/0xe0 [] wait_for_common+0x120/0x170 [] ? try_to_wake_up+0x360/0x360 [] wait_for_completion+0x18/0x20 [] wait_on_cpu_work+0xec/0x100 [] ? do_work_for_cpu+0x30/0x30 [] wait_on_work+0xdb/0x150 [] ? wait_on_cpu_work+0x100/0x100 [] __cancel_work_timer+0x83/0x130 [] cancel_delayed_work_sync+0xd/0x10 [] blk_sync_queue+0x24/0x50 [] blk_cleanup_queue+0xf/0x60 [] scsi_free_queue+0x9/0x10 [] scsi_device_dev_release_usercontext+0xeb/0x140 [] ? trace_hardirqs_on_caller+0x14d/0x190 [] ? scsi_device_cls_release+0x20/0x20 [] execute_in_process_context+0x86/0xa0 [] scsi_device_dev_release+0x17/0x20 [] device_release+0x22/0x90 [] ? retint_restore_args+0x13/0x13 [] kobject_release+0x45/0x90 [] ? kobject_del+0x40/0x40 [] kref_put+0x37/0x70 [] kobject_put+0x27/0x60 [] put_device+0x12/0x20 [] scsi_request_fn+0xb9/0x4a0 [] __blk_run_queue+0x6a/0x110 [] blk_delay_work+0x26/0x40 [] process_one_work+0x197/0x520 [] ? process_one_work+0x131/0x520 [] ? blk_make_request+0x90/0x90 [] worker_thread+0x15c/0x330 [] ? manage_workers.clone.20+0x240/0x240 [] ? manage_workers.clone.20+0x240/0x240 [] kthread+0xa6/0xb0 [] kernel_thread_helper+0x4/0x10 [] ? finish_task_switch+0x6f/0x110 [] ? _raw_spin_unlock_irq+0x46/0x70 [] ? retint_restore_args+0x13/0x13 [] ? __init_kthread_worker+0x70/0x70 [] ? gs_change+0x13/0x13 INFO: lockdep is turned off. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/