From: Tejun Heo
Date: Tue, 02 Dec 2008 13:18:36 +0900
To: "Nicholas A. Bellinger"
CC: FUJITA Tomonori, Mike Anderson, Mike Christie, Christoph Hellwig, James Bottomley, Andrew Morton, Alan Stern, Hannes Reinecke, Boaz Harrosh, Jens Axboe, linux-scsi, LKML, "Linux-iSCSI.org Target Dev"
Subject: Re: Changes to Linux/SCSI target mode infrastructure for v2.6.28
Message-ID: <4934B71C.4030907@kernel.org>
References: <1228182727.13241.160.camel@haakon2.linux-iscsi.org> <1228183480.13241.163.camel@haakon2.linux-iscsi.org> <1228187439.13241.176.camel@haakon2.linux-iscsi.org>
In-Reply-To: <1228187439.13241.176.camel@haakon2.linux-iscsi.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Nicholas A. Bellinger wrote:
>>> So far during my initial testing, I am running into two different
>>> exceptions. One is a NULL pointer dereference OOPS after half a dozen
>>> Open/iSCSI login/logouts in block/elevator.c:elv_dequeue_request().
>>> Here is the trace from SCSI softirq context:
>>>
>>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-0.png
>>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-1.png

Can you build with debug info (CONFIG_DEBUG_INFO) and find out which
line is the offending one?

>>> The other one is a BUG_ON in block/blk-timeout.c:177 in blk_add_timer()
>>> that happens after a few hundred MB of READ_10 traffic, which also
>>> appears to pass through elv_dequeue_request() at some point:
>>>
>>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-2.png
>>> http://linux-iscsi.org/builds/user/nab/2.6.28-rc6-oops-4.png

Hmmm... this means blk_add_timer() is being called after the request
has already been completed. All the problems discovered so far have had
to do with the timeout going off without the low-level driver knowing
about the request. I don't have much of an idea here; it'll probably be
best to trace what's going on using blktrace or printks. Maybe this is
caused by list corruption, as with the first issue, or by request
completion racing with requeueing?

--
tejun