Date: Wed, 6 Aug 2008 15:55:47 -0700
From: Andrew Morton <akpm@linux-foundation.org>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-kernel@vger.kernel.org, ospite@studenti.unina.it, matthew@wil.cx,
       nickpiggin@yahoo.com.au
Subject: Re: BUG in VFS or block layer
Message-Id: <20080806155547.619f13f8.akpm@linux-foundation.org>
In-Reply-To: <Pine.LNX.4.44L0.0808061838340.2145-100000@iolanthe.rowland.org>
References: <20080806142805.9db6f52f.akpm@linux-foundation.org>
	<Pine.LNX.4.44L0.0808061838340.2145-100000@iolanthe.rowland.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2530
Lines: 68

On Wed, 6 Aug 2008 18:40:54 -0400 (EDT)
Alan Stern <stern@rowland.harvard.edu> wrote:

> On Wed, 6 Aug 2008, Andrew Morton wrote:
> 
> > What the VFS will do is
> > 
> > - lock the page
> > 
> > - put the page into a BIO and send it down to the block layer
> > 
> > - later, wait for IO completion.  It does this by running
> >   lock_page[_killable](), which will wait for the page to come unlocked.
> > 
> >   The page comes unlocked via the device driver, usually within the
> >   IO completion interrupt.
> > 
> > 
> > A common cause of userspace lockups during IO errors is that the driver
> > layer screwed up and didn't run the completion callback.
> > 
> > Now, according to the above trace, the above code sequence _did_ work
> > OK.  Or at least, it ran to completion.  It was later, when we tried to
> > truncate a file that we stumbled across a permanently-locked page.
> > 
> > So it would appear that the VFS read() code successfully completed, but
> > left locked pages behind it, which caused the truncate to hang.
> 
> ...
> 
> > One possible problem is here:
> > 
> > readpage:
> > 		/* Start the actual read. The read will unlock the page. */
> > 		error = mapping->a_ops->readpage(filp, page);
> > 
> > 		if (unlikely(error)) {
> > 			if (error == AOP_TRUNCATED_PAGE) {
> > 				page_cache_release(page);
> > 				goto find_page;
> > 			}
> > 			goto readpage_error;
> > 		}
> > 
> > the VFS layer assumes that if ->readpage() returned a synchronous error
> > then the page was already unlocked within ->readpage().  Usually this
> > means that the driver layer had to run the BIO completion callback to
> > do that unlocking.  It is possible that the USB code forgot to do this.
> > This would explain what you're seeing.
> > 
> > So...  would you be able to verify that the USB layer is correctly
> > calling bio->bi_end_io() for the offending requests?
> 
> The USB layer doesn't handle that; the SCSI layer takes care of it.  
> Possibly the I/O error confuses the code in and around 
> scsi_end_request().  I'll have to do some testing to find out.
> 

Well...  looking at your patch to
drivers/usb/gadget/file_storage.c:do_read(), it appears that
do_scsi_command() just drops do_read()'s error code on the floor rather
than returning it to the scsi layer?
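
As a rough illustration of the failure mode under discussion, here is a
small userspace model.  It is only a sketch: every name in it
(model_page, model_bio, model_command_buggy and so on) is made up for
the example, nothing is copied from the kernel or from file_storage.c,
and it makes no claim about what do_scsi_command() really does.  It
just shows why a lower layer which swallows an error without running
the completion callback would leave the page locked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's struct page / struct bio. */
struct model_page {
	bool locked;
	bool uptodate;
};

struct model_bio {
	struct model_page *page;
	void (*end_io)(struct model_bio *bio, int error);
};

/* Completion callback: in this model, the only place a page is unlocked. */
static void model_end_io(struct model_bio *bio, int error)
{
	assert(bio != NULL && bio->page != NULL);
	bio->page->uptodate = (error == 0);
	bio->page->locked = false;
}

/* Lowest layer: the read always fails with an I/O error (-5 models -EIO). */
static int model_do_read(void)
{
	return -5;
}

/* Assumed buggy dispatcher: drops the error and never completes the bio. */
static int model_command_buggy(struct model_bio *bio)
{
	(void)bio;
	(void)model_do_read();	/* error code dropped on the floor */
	return 0;
}

/* Assumed fixed dispatcher: completes the bio and propagates the error. */
static int model_command_fixed(struct model_bio *bio)
{
	int error = model_do_read();

	bio->end_io(bio, error);	/* this is what unlocks the page */
	return error;
}

/*
 * Model of the read path: lock the page, hand it to the lower layer,
 * then report whether the page was left locked behind us.
 */
static bool model_read_leaves_page_locked(int (*command)(struct model_bio *))
{
	struct model_page page = { .locked = true, .uptodate = false };
	struct model_bio bio = { .page = &page, .end_io = model_end_io };

	(void)command(&bio);
	return page.locked;
}
```

With the buggy dispatcher the page stays locked, so anything which
later does lock_page() on it (truncate, for example) would wait
forever.  With the fixed one the page comes back unlocked and not
uptodate.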
