MIME-Version: 1.0
References: <1421920228.7061.48.camel@haakon3.risingtidesystems.com>
 <1421948107.30821.2.camel@haakon3.risingtidesystems.com>
 <1422658406.5117.31.camel@haakon3.risingtidesystems.com>
Date: Tue, 3 Feb 2015 08:28:19 +0800
Subject: Re: General protection fault in iscsi_rx_thread_pre_handler
From: Gavin Guo
To: "Nicholas A. Bellinger"
Cc: linux-scsi@vger.kernel.org, target-devel@vger.kernel.org, linux-kernel
Content-Type: text/plain; charset=UTF-8
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Nicholas,

On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo wrote:
> Hi Nicholas,
>
> On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger wrote:
>> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
>>> Hi Nicholas,
>>>
>>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger wrote:
>>> > On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>>> >> Hi Nicolas,
>>> >>
>>> >> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger wrote:
>>> >> > Hi Gavin,
>>> >> >
>>> >> > On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>>> >> >> Hi all,
>>> >> >>
>>> >> >> The general protection fault screenshot is attached.
>>> >> >>
>>> >> >> Summary:
>>> >> >> The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
>>> >> >> the fault is in list_del of iscsi_del_ts_from_active_list. And it
>>> >> >> looks like deleting the iscsi_thread_set *ts two times. The point to
>>> >> >> delete including iscsi_get_ts_from_inactive_list, was also checked but
>>> >> >> still can't find the clue. Really appreciate if anyone can provide any
>>> >> >> idea on the bug.
>>> >> >>
>>> >
>>> >> >
>>> >> > Thanks for your detailed analysis.
>>> >> >
>>> >> > A similar bug was reported off-list some months back by a person using
>>> >> > iser-target + RoCE export on v3.12.y code. Just to confirm, your
>>> >> > environment is using traditional iscsi-target + TCP export, right..?
>>> >>
>>> >> I am sorry that I'm not an expert of the field and already google RoCE
>>> >> on the internet but still don't really know what RoCE is. However, I
>>> >> can provide the informations. We used iscsiadm on the initiator side
>>> >> and lio_node and tcm_node commands to create the targets for
>>> >> connection. I think it should be normal iscsi-target using TCP
>>> >> export.
>>> >>
>>> >
>>> > Yep, that would be traditional iscsi-target + TCP export.
>>> >
>>> >> >
>>> >> > At the time, a different set of iser-target related changes ended up
>>> >> > avoiding this issue on his particular setup, so we thought it was likely
>>> >> > a race triggered by login failures specific to iser-target code.
>>> >> >
>>> >> > There was a untested patch (included inline below) to drop the legacy
>>> >> > active_ts_list usage all-together, but IIRC he was not able to reproduce
>>> >> > further so the patch didn't get picked up for mainline.
>>> >> >
>>> >> > If your able to reliability reproduce, please try with the following
>>> >> > patch and let us know your progress.
>>> >>
>>> >> Thanks for your time reading the mail. I'll let you know the result.
>>> >
>>> > Just curious, are you able to reliability reproduce this bug in a VM..?
>>>
>>> Thanks for your caring, the machine is on the customer side, I've
>>> asked and now waiting for their response.
>>
>> Hi Gavin,
>>
>> Just curious if there has been any update on this yet..?
>>
>> --nab
>>
>
> Really thanks for your attention. I'm also currently waiting for the
> customer's reply and will send the email again to ask for the result.
> However, I think the symptom may be hard to replicate that's why the
> customer didn't reply me for a long time. Thanks for your time again.
>
> Thanks,
> Gavin

Sorry for making you wait so long. I just got the response from the
customer, they said the general protection fault happened just 2 times
in the past and cannot be reliably reproduced. And I am now waiting
for the verification test.

Thanks,
Gavin