Subject: Re: [PATCH v2 02/16] scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly
To: Johannes Thumshirn
Cc: "Martin K . Petersen", Christoph Hellwig, Hannes Reinecke,
    Linux Kernel Mailinglist, Linux SCSI Mailinglist, Martin Schwidefsky,
    Heiko Carstens, Anil Gurumurthy, Sudarsana Kalluru,
    "James E.J. Bottomley", Tyrel Datwyler, Benjamin Herrenschmidt,
    Paul Mackerras, Michael Ellerman, Johannes Thumshirn, James Smart,
    Dick Kennedy, "supporter:QLOGIC QLA2XXX FC-SCSI DRIVER",
    "open list:S390 ZFCP DRIVER",
    "open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)",
    "open list:FCOE SUBSYSTEM (libfc, libfcoe, fcoe)"
From: Steffen Maier
Date: Fri, 28 Oct 2016 11:53:46 +0200
Message-Id: <4b411836-e76f-b67a-3d49-ad3d51b8f216@linux.vnet.ibm.com>
In-Reply-To: <20161013162405.aoxy3bdkc4bqtwsk@linux-x5ow.site>
References: <2ea07f3f-88eb-b795-fa37-a223bf80e581@linux.vnet.ibm.com>
    <20161013162405.aoxy3bdkc4bqtwsk@linux-x5ow.site>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/13/2016 06:24 PM, Johannes Thumshirn wrote:
> On Thu, Oct 13, 2016 at 05:15:25PM +0200, Steffen Maier wrote:
>> I'm puzzled.
>>
>> $ git bisect start fc_bsg master
>>> 3087864ce3d7282f59021245d8a5f83ef1caef18 is the first bad commit
>>> commit 3087864ce3d7282f59021245d8a5f83ef1caef18
>>> Author: Johannes Thumshirn
>>> Date: Wed Oct 12 15:06:28 2016 +0200
>>>
>>> scsi: don't use fc_bsg_job::request and fc_bsg_job::reply directly
>>>
>>> Don't use fc_bsg_job::request and fc_bsg_job::reply directly, but use
>>> helper variables bsg_request and bsg_reply. This will be helpfull when
>>> transitioning to bsg-lib.
>>>
>>> Signed-off-by: Johannes Thumshirn
>>>
>>> :040000 040000 140c4b6829d5cfaec4079716e0795f63f8bc3bd2 0d9fe225615679550be91fbd9f84c09ab1e280fc M drivers
>>
>> From there (on the reverse bisect path) I get the following Oops,
>> except for the full patch set having another stack trace as in my previous
>> mail (dying in zfcp code).
>>
>
> [...]
>
>>
>>> @@ -3937,6 +3944,7 @@ fc_bsg_request_handler(struct request_queue *q, struct Scsi_Host *shost,
>>>   struct request *req;
>>>   struct fc_bsg_job *job;
>>>   enum fc_dispatch_result ret;
>>> + struct fc_bsg_reply *bsg_reply;
>>>
>>>   if (!get_device(dev))
>>>     return;
>>> @@ -3973,8 +3981,9 @@ fc_bsg_request_handler(struct request_queue *q, struct Scsi_Host *shost,
>>>   /* check if we have the msgcode value at least */
>>>   if (job->request_len < sizeof(uint32_t)) {
>>>     BUG_ON(job->reply_len < sizeof(uint32_t));
>>> -   job->reply->reply_payload_rcv_len = 0;
>>> -   job->reply->result = -ENOMSG;
>>> +   bsg_reply = job->reply;
>>> +   bsg_reply->reply_payload_rcv_len = 0;
>>> +   bsg_reply->result = -ENOMSG;

Compiler optimization re-ordered the above two lines, so the first pointer
dereference is bsg_reply->result [field offset 0], where bsg_reply is NULL.
The assignment tries to write to memory at address NULL, causing the kernel
page fault.

Does your suggested change for [PATCH v3 02/16], shuffling the
job->request_len checks, address the above kernel page fault?

>>>     job->reply_len = sizeof(uint32_t);
>>>     fc_bsg_jobdone(job);
>>>     spin_lock_irq(q->queue_lock);
>>>
>
> Ahm and what exactly can break here? It's just assigning variables. Now
> I'm puzzled too.

-- 
Mit freundlichen Grüßen / Kind regards
Steffen Maier

Linux on z Systems Development

IBM Deutschland Research & Development GmbH
Chair of the Supervisory Board: Martina Koederitz
Management: Dirk Wittkopp
Registered office: Boeblingen
Registration court: Amtsgericht Stuttgart, HRB 243294
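
P.S. To make the fault described above concrete, here is a minimal userspace sketch. It is not the change proposed for v3 and not the kernel code itself. The types demo_bsg_reply and demo_bsg_job and the function demo_fail_job are hypothetical stand-ins that keep only the two fields touched in the hunk, laid out so that result sits at offset 0 as observed in the Oops. It shows why a store through a NULL bsg_reply hits address 0 regardless of which of the two stores the compiler emits first, and how a NULL check before the stores would avoid that.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct fc_bsg_reply: only the two
 * fields written in the hunk, with result first ("field offset 0"). */
struct demo_bsg_reply {
	uint32_t result;
	uint32_t reply_payload_rcv_len;
};

/* Hypothetical stand-in for struct fc_bsg_job; reply may be NULL. */
struct demo_bsg_job {
	struct demo_bsg_reply *reply;
	unsigned int reply_len;
};

/* A store to ->result through a NULL pointer targets address 0 + 0. */
static_assert(offsetof(struct demo_bsg_reply, result) == 0,
	      "result must be the first field");

/*
 * Fail a job with -ENOMSG, but only if a reply buffer exists.
 * Returns 0 on success, -EINVAL if job->reply is NULL.
 */
int demo_fail_job(struct demo_bsg_job *job)
{
	struct demo_bsg_reply *bsg_reply = job->reply;

	if (!bsg_reply)
		return -EINVAL;

	/* The two stores are independent, so the compiler may emit them in
	 * either order; without the check above, whichever store comes
	 * first would fault on a NULL pointer. */
	bsg_reply->reply_payload_rcv_len = 0;
	bsg_reply->result = (uint32_t)-ENOMSG;
	job->reply_len = sizeof(uint32_t);
	return 0;
}
```

Calling demo_fail_job() on a job whose reply pointer is NULL returns -EINVAL instead of writing to address 0. Whether such a check belongs in fc_bsg_request_handler(), or whether the reply buffer should simply be set up before this path can run, is a separate question.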