Date: Thu, 15 Jun 2023 07:44:39 +0900
From: Damien Le Moal
Organization: Western Digital Research
Subject: Re: Fwd: Waking up from resume locks up on sr device
To: Bart Van Assche, Alan Stern
Cc: Hannes Reinecke, Joe Breuer, Bagas Sanjaya, Pavel Machek, "Rafael J. Wysocki", Len Brown, Greg Kroah-Hartman, Kees Cook, Tony Luck, "Guilherme G. Piccoli", Thorsten Leemhuis, "James E.J. Bottomley", "Martin K. Petersen", Phillip Potter, Linux Power Management, Linux Kernel Mailing List, Linux Hardening, Linux Regressions, Linux SCSI, Dan Williams, Adrian Hunter, Martin Kepplinger, Kai-Heng Feng
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/15/23 03:04, Bart Van Assche wrote:
> On 6/14/23 07:26, Alan Stern wrote:
>> On Wed, Jun 14, 2023 at 04:35:50PM +0900, Damien Le Moal wrote:
>>> Or... Why the heck scsi_rescan_device() is calling device_lock() ? This
>>> is the only place in scsi code I can see that takes this lock. I suspect
>>> this is to serialize either rescans, or serialize with resume, or both.
>>> For serializing rescans, we can use another lock. For serializing with
>>> PM, we should wait for PM transitions...
>>> Something is not right here.
>>
>> Here's what commit e27829dc92e5 ("scsi: serialize ->rescan against
>> ->remove", written by Christoph Hellwig) says:
>>
>>     Lock the device embedded in the scsi_device to protect against
>>     concurrent calls to ->remove.
>>
>> That's the commit which added the device_lock() call.
>
> Even if scsi_rescan_device() would use another mechanism for
> serialization against sd_remove() and sr_remove(), we still need to
> solve the issue that the ATA code calls scsi_rescan_device() before
> resuming has finished. scsi_rescan_device() issues I/O. Issuing I/O to a
> device is not allowed before that device has been resumed.

I am not convinced of that: scsi suspend quiesces the queue, which prevents
I/Os coming from the block layer, but not internal scsi midlayer commands,
which is what scsi_rescan_device() issues.

In any case, I think that the best (and quickest) fix for this issue for now
is to have libata define a device link that makes the scsi device a "parent"
of the ata device (whose parent is currently the ata link). This way, PM
operation ordering will ensure that the scsi device is resumed before the ata
device. What I really do not like about this, though, is that the suspend side
would be done in the reverse order: ata first and then scsi. We really want
the opposite here, to ensure that the request queue is quiesced before we
suspend ata. That said, there is no such synchronization right now, so this is
probably already happening, apparently without causing issues.
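The ordering trade-off of the device-link proposal can be made concrete with a small userspace model. This is only a sketch that assumes the documented device-link rule (a supplier is resumed before its consumers and suspended after them) and nothing else about the PM core; `struct model_dev`, `resume_order()` and `suspend_order()` are hypothetical names, not kernel APIs. With the scsi device as the supplier of the ata device, as proposed above, resume runs scsi then ata, and suspend is forced into ata then scsi.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 16

/* Hypothetical model of a device with at most one device-link supplier. */
struct model_dev {
	const char *name;
	int supplier;	/* index of the supplier in the array, or -1 for none */
};

/*
 * Compute a resume order in which every supplier comes before its
 * consumers, using a simple repeated-pass topological sort.
 * Returns fewer than n entries if the links form a cycle.
 */
static size_t resume_order(const struct model_dev *devs, size_t n,
			   size_t *order)
{
	int placed[MODEL_MAX_DEVS] = { 0 };
	size_t count = 0;

	if (n > MODEL_MAX_DEVS)
		return 0;
	while (count < n) {
		size_t before = count;

		for (size_t i = 0; i < n; i++) {
			int s = devs[i].supplier;

			assert(s < (int)n);	/* supplier index must be valid */
			if (placed[i])
				continue;
			/* Ready once it has no supplier or its supplier is placed. */
			if (s < 0 || placed[s]) {
				placed[i] = 1;
				order[count++] = i;
			}
		}
		if (count == before)	/* no progress: the links form a cycle */
			break;
	}
	return count;
}

/* Suspend order is the exact reverse: consumers before suppliers. */
static void suspend_order(const size_t *resume, size_t n, size_t *order)
{
	for (size_t i = 0; i < n; i++)
		order[i] = resume[n - 1 - i];
}
```

Because suspend is simply the reverse of resume in this model, no choice of supplier gives "scsi first" for both transitions; swapping the roles (ata as supplier) flips both orders.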
So ideally:
1) Make the ata device the parent of the scsi device using a device link.
2) For suspend, the scsi device will be suspended first, followed by the ata
   device, which is what we want.
3) For resume, the ata device will be resumed first, followed by the scsi
   device. Since libata calls scsi_rescan_device() from a work task,
   asynchronously from the ata resume context, we need to synchronize that
   work so that it waits for the scsi device resume to complete (but do we
   really need to, given that we are going to issue internal commands only?).

Alan, Rafael,

For the synchronization of step (3), if I understand the PM code correctly,
using device_pm_wait_for_dev() would work only if async resume is on. It would
be ineffective for the sync case. How can we best deal with this?

-- 
Damien Le Moal
Western Digital Research
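Step (3) boils down to "the rescan work must not issue commands until the scsi device has resumed". As a hedged illustration only, the sketch below models that wait in userspace with a pthread-based stand-in for the kernel's `struct completion` (`complete()` / `wait_for_completion()`). Whether a completion, device_pm_wait_for_dev(), or something else is the right mechanism is exactly the open question above; every `model_*` name is hypothetical, and the model ignores device_lock() and the PM core's own locking, so it says nothing about whether such a wait could itself deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion (hypothetical). */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void model_init_completion(struct model_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Mark the event as done and wake every waiter. */
static void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Block until model_complete() has been called; the loop handles spurious wakeups. */
static void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical scsi device state plus a log of which step ran when. */
struct model_sdev {
	struct model_completion resumed;
	pthread_mutex_t log_lock;
	char log[4];
	int nlog;
};

static void model_log(struct model_sdev *sdev, char step)
{
	pthread_mutex_lock(&sdev->log_lock);
	assert(sdev->nlog < (int)sizeof(sdev->log));
	sdev->log[sdev->nlog++] = step;
	pthread_mutex_unlock(&sdev->log_lock);
}

/* Rescan work item: waits for resume before "issuing commands" ('R'). */
static void *model_rescan_work(void *arg)
{
	struct model_sdev *sdev = arg;

	model_wait_for_completion(&sdev->resumed);
	model_log(sdev, 'R');
	return NULL;
}

/* Resume path: finish resuming ('S'), then release any waiting rescan. */
static void model_sdev_resume(struct model_sdev *sdev)
{
	model_log(sdev, 'S');
	model_complete(&sdev->resumed);
}

/*
 * Start the rescan work first (modeling a work task that may run early),
 * then resume the device. Returns the number of logged steps, or -1.
 */
static int model_run(struct model_sdev *sdev)
{
	pthread_t worker;

	model_init_completion(&sdev->resumed);
	pthread_mutex_init(&sdev->log_lock, NULL);
	sdev->nlog = 0;
	if (pthread_create(&worker, NULL, model_rescan_work, sdev) != 0)
		return -1;
	model_sdev_resume(sdev);
	pthread_join(worker, NULL);
	return sdev->nlog;
}
```

Whatever order the threads are scheduled in, the log always reads "S" then "R", which is the property step (3) asks for, independent of whether async resume is enabled.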