Date: Sat, 8 Aug 2020 11:05:42 -0400
From: Alan Stern
To: Martin Kepplinger
Cc: James Bottomley, Bart Van Assche, Can Guo, martin.petersen@oracle.com, linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org, kernel@puri.sm
Subject: Re: [PATCH] scsi: sd: add runtime pm to open / release
Message-ID: <20200808150542.GB256751@rowland.harvard.edu>
References: <1596037482.4356.37.camel@HansenPartnership.com> <20200729182515.GB1580638@rowland.harvard.edu> <1596047349.4356.84.camel@HansenPartnership.com> <20200730151030.GB6332@rowland.harvard.edu> <9b80ca7c-39f8-e52d-2535-8b0baf93c7d1@puri.sm> <425990b3-4b0b-4dcf-24dc-4e7e60d5869d@puri.sm> <20200807143002.GE226516@rowland.harvard.edu>

On Sat, Aug 08, 2020 at 08:59:09AM +0200, Martin Kepplinger wrote:
> On 07.08.20 16:30, Alan Stern wrote:
> > On Fri, Aug 07, 2020 at 11:51:21AM +0200, Martin Kepplinger wrote:
> >> it's really strange: below is the change I'm trying. Of course that's
> >> only for testing the functionality, not what a patch would look like.
> >>
> >> While I remember that it had worked, now (weirdly, ever since I tried
> >> mounting via fstab) it doesn't anymore!
> >>
> >> What I understand (not much): I handle the error with "retry" via the
> >> new flag, but scsi_decide_disposition() returns SUCCESS because of "no
> >> more retries"; but it's the first and only time it's called.
> >
> > Are you saying that scmd->allowed is set to 0? Or is scsi_noretry_cmd()
> > returning a nonzero value? Whichever is true, why does it happen that
> > way?
>
> scsi_noretry_cmd() is returning 1 (it's retry 1 of 5 allowed).
>
> why is it returning 1? REQ_FAILFAST_DEV is set. It's DID_OK, then "if
> (status_byte(scmd->result) != CHECK_CONDITION)" apparently is not true,
> then at the end it returns 1 because of REQ_FAILFAST_DEV.
>
> that seems to come from the block layer. why and when? could I change
> that so that the scsi error handling stays in control?

The only place I see where that flag might get set is in
blk_mq_bio_to_request() in block/blk-mq.c, which does:

	if (bio->bi_opf & REQ_RAHEAD)
		rq->cmd_flags |= REQ_FAILFAST_MASK;

So apparently read-ahead reads are supposed to fail fast (i.e., without
retries), presumably because they are optional after all.

> > What is the failing command? Is it a READ(10)?
>
> Not sure how I'd answer that, but here's the test to trigger the error:
>
> mount /dev/sda1 /mnt
> cd /mnt
> ls
> cp file ~/ (if ls "works" and doesn't yet trigger the error)
>
> and these are the (familiar-looking) logs when doing so. again: the
> workaround mentioned in scsi_error is in place, and the new
> expected_media_change flag *is* set and gets cleared, as it should be.
> Still, REQ_FAILFAST_DEV seems to override what I want to do in scsi_error:
>
> [ 55.557629] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=0s
> [ 55.557639] sd 0:0:0:0: [sda] tag#0 Sense Key : 0x6 [current]
> [ 55.557646] sd 0:0:0:0: [sda] tag#0 ASC=0x28 ASCQ=0x0
> [ 55.557657] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x28 28 00 00 08 fc e0 00 00 01 00

Yes, 0x28 is READ(10). Likely this is a read-ahead request, although I
don't know how we can tell for sure.

> [ 55.557666] blk_update_request: I/O error, dev sda, sector 589024 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 55.568899] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [ 55.574691] blk_update_request: I/O error, dev sda, sector 589025 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 55.585756] sd 0:0:0:0: [sda] tag#0 device offline or changed
> [ 55.591562] blk_update_request: I/O error, dev sda, sector 589026 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
> [ 55.602274] sd 0:0:0:0: [sda] tag#0 device offline or changed
> (... goes on with the same)

Is such a drastic response really appropriate for the failure of a
read-ahead request? It seems like a more logical response would be to let
the request fail but keep the device online.

Of course, that would only solve part of your problem -- your log would
still get filled with those "I/O error" messages even though they wouldn't
be fatal.

Probably a better approach would be to make the new
expecting_media_change flag override scsi_noretry_cmd().

But this is not my area of expertise. Maybe someone else will have more
to say.

Alan Stern
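For reference, the CDB in the log can be decoded by hand. The userspace C sketch below (not kernel code; `struct read10_fields` and `decode_read10()` are hypothetical names made up for this illustration) pulls the big-endian LBA out of bytes 2..5 and the transfer length out of bytes 7..8 of a READ(10) CDB. Applied to `28 00 00 08 fc e0 00 00 01 00`, it gives LBA 0x0008fce0 = 589024 and a length of one block, which lines up with "sector 589024" in the blk_update_request line. Since that line counts 512-byte sectors, the match is consistent with the device using 512-byte logical blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define READ_10_OPCODE 0x28 /* SBC READ(10) */

/* Hypothetical container for the two READ(10) fields of interest. */
struct read10_fields {
	uint32_t lba;    /* CDB bytes 2..5, big-endian logical block address */
	uint16_t blocks; /* CDB bytes 7..8, big-endian transfer length */
};

/* Decode a 10-byte CDB; returns false if it is not a READ(10). */
static bool decode_read10(const uint8_t cdb[10], struct read10_fields *out)
{
	assert(cdb != NULL && out != NULL);
	if (cdb[0] != READ_10_OPCODE)
		return false;
	out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
		   ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	out->blocks = (uint16_t)((cdb[7] << 8) | cdb[8]);
	return true;
}
```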
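The `flags 0x80700` field in the blk_update_request lines may help with the question of whether this is a read-ahead. The sketch below decodes it, assuming the v5.8-era layout of `enum req_flag_bits` in include/linux/blk_types.h (REQ_OP_BITS = 8, so REQ_FAILFAST_DEV/TRANSPORT/DRIVER are bits 8..10 and REQ_RAHEAD is bit 19). The `ASSUMED_*` constants, `struct req_flag_summary` and `summarize_req_flags()` are hand-written for illustration and should be checked against the headers of the kernel actually in use. If that layout holds, 0x80700 is exactly REQ_RAHEAD plus REQ_FAILFAST_MASK, which is what the blk_mq_bio_to_request() code quoted above would produce for a read-ahead bio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED copy of the v5.8-era bit layout; verify before relying on it. */
#define ASSUMED_REQ_OP_BITS 8
#define ASSUMED_REQ_OP_MASK ((1u << ASSUMED_REQ_OP_BITS) - 1)
#define ASSUMED_REQ_FAILFAST_DEV (1u << (ASSUMED_REQ_OP_BITS + 0))
#define ASSUMED_REQ_FAILFAST_TRANSPORT (1u << (ASSUMED_REQ_OP_BITS + 1))
#define ASSUMED_REQ_FAILFAST_DRIVER (1u << (ASSUMED_REQ_OP_BITS + 2))
#define ASSUMED_REQ_FAILFAST_MASK \
	(ASSUMED_REQ_FAILFAST_DEV | ASSUMED_REQ_FAILFAST_TRANSPORT | \
	 ASSUMED_REQ_FAILFAST_DRIVER)
#define ASSUMED_REQ_RAHEAD (1u << (ASSUMED_REQ_OP_BITS + 11))

/* Hypothetical summary of the bits relevant to this thread. */
struct req_flag_summary {
	uint32_t op;       /* low op bits; 0 would be REQ_OP_READ */
	bool readahead;    /* REQ_RAHEAD set */
	bool failfast_all; /* all three REQ_FAILFAST_* bits set */
	uint32_t other;    /* any bits not accounted for above */
};

static void summarize_req_flags(uint32_t cmd_flags, struct req_flag_summary *s)
{
	assert(s != NULL);
	s->op = cmd_flags & ASSUMED_REQ_OP_MASK;
	s->readahead = (cmd_flags & ASSUMED_REQ_RAHEAD) != 0;
	s->failfast_all = (cmd_flags & ASSUMED_REQ_FAILFAST_MASK) ==
			  ASSUMED_REQ_FAILFAST_MASK;
	s->other = cmd_flags & ~(ASSUMED_REQ_OP_MASK | ASSUMED_REQ_RAHEAD |
				 ASSUMED_REQ_FAILFAST_MASK);
}
```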
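To make the interaction concrete, here is a heavily simplified userspace model of the "maybe_retry" tail of scsi_decide_disposition() and of the DID_OK path of scsi_noretry_cmd(), following the description in this thread. Everything in it is hypothetical: `struct model_cmd`, the `model_*` functions, and where the override sits are illustrations, not kernel code. It assumes, as Martin reports, that the sense check has already classified the unit attention as retryable. It shows why REQ_FAILFAST_DEV ends the command on the first attempt, and how letting a per-device expecting_media_change flag short-circuit the no-retry check, as suggested above, would allow a retry instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SUCCESS / NEEDS_RETRY dispositions. */
enum model_disposition { MODEL_SUCCESS, MODEL_NEEDS_RETRY };

/* Hypothetical, flattened view of the fields the decision looks at. */
struct model_cmd {
	bool host_ok;                /* host byte is DID_OK */
	bool check_condition;        /* status byte is CHECK_CONDITION */
	bool failfast_dev;           /* REQ_FAILFAST_DEV set (e.g. read-ahead) */
	bool passthrough;            /* blk_rq_is_passthrough() would be true */
	bool expecting_media_change; /* hypothetical per-device flag */
	int retries;                 /* attempts used so far */
	int allowed;                 /* attempts permitted */
};

/* Model of the DID_OK path of scsi_noretry_cmd() only. */
static bool model_noretry(const struct model_cmd *c)
{
	assert(c != NULL && c->host_ok); /* other host bytes are not modelled */
	if (!c->check_condition)
		return false;
	/* "check_type": fail-fast and passthrough requests are not retried */
	return c->failfast_dev || c->passthrough;
}

/* Sketch of the suggested override: the flag wins over fail-fast. */
static bool model_noretry_with_override(const struct model_cmd *c)
{
	assert(c != NULL);
	if (c->expecting_media_change)
		return false;
	return model_noretry(c);
}

/* Model of the "maybe_retry" tail of scsi_decide_disposition(). */
static enum model_disposition model_maybe_retry(struct model_cmd *c,
						bool use_override)
{
	bool noretry = use_override ? model_noretry_with_override(c)
				    : model_noretry(c);

	if (++c->retries <= c->allowed && !noretry)
		return MODEL_NEEDS_RETRY;
	return MODEL_SUCCESS; /* "no more retries": report to upper level */
}
```

With a fail-fast read and 5 allowed attempts, the plain model returns SUCCESS after using only the first attempt; with the override enabled and the flag set, the same command gets NEEDS_RETRY.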