Date: Wed, 6 Aug 2008 14:28:05 -0700
From: Andrew Morton
To: Alan Stern
Cc: linux-kernel@vger.kernel.org, ospite@studenti.unina.it, Matthew Wilcox, Nick Piggin
Subject: Re: BUG in VFS or block layer
Message-Id: <20080806142805.9db6f52f.akpm@linux-foundation.org>

On Wed, 6 Aug 2008 16:40:02 -0400 (EDT) Alan Stern wrote:

> This bug is certainly present in 2.6.27-rc2, and it may be present in
> several earlier kernel versions too, perhaps as far back as 2.6.25.
> What happens is that in the presence of certain I/O errors, user
> processes can hang.  Here is a procedure for recreating the bug:
>
> Make sure that CONFIG_USB_GADGET and CONFIG_USB_GADGET_DUMMY_HCD are
> enabled as modules, as well as CONFIG_USB_FILE_STORAGE.  Apply the
> patch below and build the drivers.
> Then create a blank image file with 1 MB of zeros and load the drivers:
>
> 	dd if=/dev/zero of=image bs=1M count=1
> 	modprobe dummy-hcd
> 	modprobe g-file-storage file=image
>
> The patch causes a number of sector-read errors to occur, replicating
> those in the log attached to
>
> 	http://marc.info/?l=linux-usb&m=121802760208717&w=2
>
> On my system this leads hald-probe-storage to hang during exit with the
> following stack trace:
>
> [ 85.330010] hald-probe-st D e6d17cac 0 2031 1658
> [ 85.330174] e6d17cd0 00000046 00000000 e6d17cac ee78d160 e6d83a20 e6d83b74 00000000
> [ 85.330600] efb76940 e6d17cb8 c04c447b c17e2024 e6d17cc0 c04c2f0b e6d17cc8 c19d1180
> [ 85.331066] c17e2024 c19c8210 e6d17cdc c059c67d e707da48 e6d17cec c0447f64 e6d17d10
> [ 85.331534] Call Trace:
> [ 85.331693] [] ? generic_unplug_device+0x16/0x3a
> [ 85.331860] [] ? blk_unplug+0xc/0xe
> [ 85.332023] [] io_schedule+0x1e/0x28
> [ 85.332149] [] sync_page+0x46/0x4c
> [ 85.332276] [] __wait_on_bit_lock+0x30/0x59
> [ 85.332403] [] ? sync_page+0x0/0x4c
> [ 85.332566] [] __lock_page+0x4e/0x56
> [ 85.332692] [] ? wake_bit_function+0x0/0x43
> [ 85.332858] [] lock_page+0x27/0x2a
> [ 85.332985] [] truncate_inode_pages_range+0x218/0x27a
> [ 85.333117] [] ? trace_hardirqs_on_caller+0xe1/0x102
> [ 85.333283] [] ? trace_hardirqs_on+0xb/0xd
> [ 85.333448] [] truncate_inode_pages+0xc/0x12
> [ 85.333576] [] kill_bdev+0x2c/0x2f
> [ 85.333704] [] __blkdev_put+0x4c/0x12b
> [ 85.333830] [] ? d_free+0x25/0x37
> [ 85.333992] [] blkdev_put+0xa/0xc
> [ 85.334117] [] blkdev_close+0x25/0x28
> [ 85.334243] [] __fput+0xae/0x13c
> [ 85.334368] [] fput+0x17/0x19
> [ 85.334492] [] filp_close+0x50/0x5a
> [ 85.334618] [] put_files_struct+0x68/0xaa
> [ 85.334746] [] exit_files+0x2e/0x32
> [ 85.334871] [] do_exit+0x1df/0x64a
> [ 85.334997] [] ? set_tsk_thread_flag+0xb/0xd
> [ 85.335162] [] ? trace_hardirqs_on+0xb/0xd
> [ 85.335326] [] sys_exit_group+0x0/0x11
> [ 85.335453] [] get_signal_to_deliver+0x2ee/0x32d
> [ 85.335582] [] do_notify_resume+0x6b/0x5ff
> [ 85.335711] [] ? trace_hardirqs_on+0xb/0xd
> [ 85.335875] [] ? __mutex_unlock_slowpath+0xf4/0x103
> [ 85.336042] [] ? mutex_unlock+0x8/0xa
> [ 85.336204] [] ? block_llseek+0x94/0xa2
> [ 85.336368] [] ? vfs_read+0x78/0xa8
> [ 85.336531] [] ? do_sync_read+0x0/0xe9
> [ 85.336694] [] ? sys_read+0x53/0x5d
> [ 85.336857] [] work_notifysig+0x13/0x19
> [ 85.336984] =======================
>
> I don't know enough about the VFS or block layers to track this down
> any further.
>
> Alan Stern
>
>
> Index: usb-2.6/drivers/usb/gadget/file_storage.c
> ===================================================================
> --- usb-2.6.orig/drivers/usb/gadget/file_storage.c
> +++ usb-2.6/drivers/usb/gadget/file_storage.c
> @@ -1569,6 +1569,11 @@ static int do_read(struct fsg_dev *fsg)
>  		}
>  		file_offset = ((loff_t) lba) << 9;
>  
> +		if (lba >= 100) {
> +			curlun->sense_data = 0x030000;
> +			return -EINVAL;
> +		}
> +
>  		/* Carry out the file reads */
>  		amount_left = fsg->data_size_from_cmnd;
>  		if (unlikely(amount_left == 0))

What the VFS will do is

- lock the page

- put the page into a BIO and send it down to the block layer

- later, wait for IO completion.  It does this by running
  lock_page[_killable](), which waits for the page to come unlocked.

The page comes unlocked via the device driver, usually within the IO
completion interrupt.

A common cause of userspace lockups during IO errors is that the driver
layer screwed up and didn't run the completion callback.

Now, according to the above trace, the above code sequence _did_ work
OK.  Or at least, it ran to completion.  It was later, when we tried to
truncate a file, that we stumbled across a permanently locked page.

So it would appear that the VFS read() code successfully completed, but
left locked pages behind it, which caused the truncate to hang.
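The lock/submit/wait handshake described above can be illustrated with a rough user-space model. This is a sketch under stated assumptions, not kernel code: model_page, model_lock_page(), model_unlock_page(), model_end_io() and model_read_page() are hypothetical names standing in for struct page, lock_page(), unlock_page() and a driver's BIO completion callback, and a pthread plays the role of the completion interrupt. What it is meant to show is the invariant that the completion path must unlock the page on the error path as well as on success.

```c
/*
 * User-space model (NOT kernel code) of the page-lock handshake.
 * All names are hypothetical stand-ins: model_lock_page() for
 * lock_page(), model_unlock_page() for unlock_page(), and
 * model_end_io() for the driver's BIO completion callback.
 * Compile with -pthread.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page {
	pthread_mutex_t mu;
	pthread_cond_t cv;
	bool locked;	/* stands in for PG_locked */
	bool uptodate;	/* stands in for PG_uptodate */
};

/* Sleep until the page is unlocked, then take the lock. */
static void model_lock_page(struct model_page *p)
{
	pthread_mutex_lock(&p->mu);
	while (p->locked)
		pthread_cond_wait(&p->cv, &p->mu);
	p->locked = true;
	pthread_mutex_unlock(&p->mu);
}

/* Release the lock and wake every sleeper. */
static void model_unlock_page(struct model_page *p)
{
	pthread_mutex_lock(&p->mu);
	assert(p->locked);	/* unlocking an unlocked page is a bug */
	p->locked = false;
	pthread_cond_broadcast(&p->cv);
	pthread_mutex_unlock(&p->mu);
}

struct model_io {
	struct model_page *page;
	bool fail;		/* simulate a sector-read error */
};

/*
 * Completion path: record the result, then unlock the page.  The unlock
 * must happen on success AND on error; if it is skipped on error, the
 * page stays locked and every later model_lock_page() sleeps forever.
 */
static void *model_end_io(void *arg)
{
	struct model_io *io = arg;

	pthread_mutex_lock(&io->page->mu);
	io->page->uptodate = !io->fail;
	pthread_mutex_unlock(&io->page->mu);
	model_unlock_page(io->page);
	return NULL;
}

/* Returns 0 if the page became up to date, -1 otherwise. */
static int model_read_page(bool fail)
{
	struct model_page page = { .locked = false, .uptodate = false };
	struct model_io io = { .page = &page, .fail = fail };
	pthread_t irq;
	bool ok = false;

	pthread_mutex_init(&page.mu, NULL);
	pthread_cond_init(&page.cv, NULL);

	model_lock_page(&page);			/* 1. lock the page */
	if (pthread_create(&irq, NULL, model_end_io, &io) != 0) {
		model_unlock_page(&page);	/* could not "submit" */
	} else {				/* 2. "submit the BIO" */
		model_lock_page(&page);		/* 3. wait for completion */
		ok = page.uptodate;
		model_unlock_page(&page);
		pthread_join(irq, NULL);
	}

	pthread_cond_destroy(&page.cv);
	pthread_mutex_destroy(&page.mu);
	return ok ? 0 : -1;
}
```

A caller such as main() should see model_read_page(true) return -1 promptly rather than hang, because model_end_io() unlocks the page on the error path too. Deleting that model_unlock_page() call makes step 3 block forever, which is the user-space analogue of a driver that never runs its completion callback for a failed request.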
Aside: why does this code in do_generic_file_read() return -EIO when it
got a signal?

page_not_up_to_date:
		/* Get exclusive access to the page ... */
		if (lock_page_killable(page))
			goto readpage_eio;

One possible problem is here:

readpage:
		/* Start the actual read. The read will unlock the page. */
		error = mapping->a_ops->readpage(filp, page);

		if (unlikely(error)) {
			if (error == AOP_TRUNCATED_PAGE) {
				page_cache_release(page);
				goto find_page;
			}
			goto readpage_error;
		}

The VFS layer assumes that if ->readpage() returned a synchronous error,
then the page was already unlocked within ->readpage().  Usually this
means that the driver layer had to run the BIO completion callback to do
that unlocking.

It is possible that the USB code forgot to do this.  This would explain
what you're seeing.

So... would you be able to verify that the USB layer is correctly
calling bio->bi_end_io() for the offending requests?

Aside2: why does this code:

readpage:
		/* Start the actual read. The read will unlock the page. */
		error = mapping->a_ops->readpage(filp, page);

		if (unlikely(error)) {
			if (error == AOP_TRUNCATED_PAGE) {
				page_cache_release(page);
				goto find_page;
			}
			goto readpage_error;
		}

		if (!PageUptodate(page)) {
			if (lock_page_killable(page))
				goto readpage_eio;

return EIO if lock_page_killable() saw a signal?