2023-04-25 06:27:29

by Kai-Heng Feng

[permalink] [raw]
Subject: [PATCH v2] ata: libata: Defer rescan on suspended device

During system resume, if an EH is schduled after ATA host is resumed
(i.e. ATA_PFLAG_PM_PENDING cleared), but before the disk device is
fully resumed, the device_lock hold by scsi_rescan_device() is never
released so the dpm_resume() of the disk is blocked forerver.

That's because scsi_attach_vpd() is expecting the disk device is in
operational state, as it doesn't work on suspended device.

To avoid such deadlock, defer rescan if the disk is still suspended so
the resume process of the disk device can proceed. At the end of the
resume process, use the complete() callback to schedule the rescan task.

Signed-off-by: Kai-Heng Feng <[email protected]>
---
v2:
- Schedule rescan task at the end of system resume phase.
- Wording.

drivers/ata/libata-core.c | 11 +++++++++++
drivers/ata/libata-eh.c | 11 +++++++++--
include/linux/libata.h | 1 +
3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 14c17c3bda4e..564d72bf1ec6 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5093,6 +5093,16 @@ static int ata_port_pm_poweroff(struct device *dev)
return 0;
}

+static void ata_port_pm_complete(struct device *dev)
+{
+ struct ata_port *ap = to_ata_port(dev);
+
+ if (ap->pflags & ATA_PFLAG_DEFER_RESCAN)
+ schedule_work(&(ap->scsi_rescan_task));
+
+ ap->pflags &= ~ATA_PFLAG_DEFER_RESCAN;
+}
+
static const unsigned int ata_port_resume_ehi = ATA_EHI_NO_AUTOPSY
| ATA_EHI_QUIET;

@@ -5158,6 +5168,7 @@ static const struct dev_pm_ops ata_port_pm_ops = {
.thaw = ata_port_pm_resume,
.poweroff = ata_port_pm_poweroff,
.restore = ata_port_pm_resume,
+ .complete = ata_port_pm_complete,

.runtime_suspend = ata_port_runtime_suspend,
.runtime_resume = ata_port_runtime_resume,
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index a6c901811802..0881b590fb7e 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -15,6 +15,7 @@
#include <linux/blkdev.h>
#include <linux/export.h>
#include <linux/pci.h>
+#include <linux/suspend.h>
#include <scsi/scsi.h>
#include <scsi/scsi_host.h>
#include <scsi/scsi_eh.h>
@@ -2983,8 +2984,14 @@ static int ata_eh_revalidate_and_attach(struct ata_link *link,
*/
ehc->i.flags |= ATA_EHI_SETMODE;

- /* schedule the scsi_rescan_device() here */
- schedule_work(&(ap->scsi_rescan_task));
+ /* Schedule the scsi_rescan_device() here.
+ * Defer the rescan if it's in process of
+ * suspending or resuming.
+ */
+ if (pm_suspend_target_state != PM_SUSPEND_ON)
+ ap->pflags |= ATA_PFLAG_DEFER_RESCAN;
+ else
+ schedule_work(&(ap->scsi_rescan_task));
} else if (dev->class == ATA_DEV_UNKNOWN &&
ehc->tries[dev->devno] &&
ata_class_enabled(ehc->classes[dev->devno])) {
diff --git a/include/linux/libata.h b/include/linux/libata.h
index a759dfbdcc91..aaa549444abc 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -189,6 +189,7 @@ enum {
ATA_PFLAG_UNLOADING = (1 << 9), /* driver is being unloaded */
ATA_PFLAG_UNLOADED = (1 << 10), /* driver is unloaded */

+ ATA_PFLAG_DEFER_RESCAN = (1 << 16), /* peform deferred rescan on system resume */
ATA_PFLAG_SUSPENDED = (1 << 17), /* port is suspended (power) */
ATA_PFLAG_PM_PENDING = (1 << 18), /* PM operation pending */
ATA_PFLAG_INIT_GTM_VALID = (1 << 19), /* initial gtm data valid */
--
2.34.1


2023-04-25 08:43:32

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2] ata: libata: Defer rescan on suspended device

Hi Kai-Heng,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.3 next-20230424]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Kai-Heng-Feng/ata-libata-Defer-rescan-on-suspended-device/20230425-142001
base: linus/master
patch link: https://lore.kernel.org/r/20230425061746.503145-1-kai.heng.feng%40canonical.com
patch subject: [PATCH v2] ata: libata: Defer rescan on suspended device
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230425/[email protected]/config)
compiler: m68k-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/bed073367820f7e00147a8f39ed5f0d9d5eb6c67
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Kai-Heng-Feng/ata-libata-Defer-rescan-on-suspended-device/20230425-142001
git checkout bed073367820f7e00147a8f39ed5f0d9d5eb6c67
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash drivers/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

drivers/ata/libata-eh.c: In function 'ata_eh_revalidate_and_attach':
>> drivers/ata/libata-eh.c:2991:29: error: 'pm_suspend_target_state' undeclared (first use in this function); did you mean 'scsi_target_state'?
2991 | if (pm_suspend_target_state != PM_SUSPEND_ON)
| ^~~~~~~~~~~~~~~~~~~~~~~
| scsi_target_state
drivers/ata/libata-eh.c:2991:29: note: each undeclared identifier is reported only once for each function it appears in


vim +2991 drivers/ata/libata-eh.c

2927
2928 static int ata_eh_revalidate_and_attach(struct ata_link *link,
2929 struct ata_device **r_failed_dev)
2930 {
2931 struct ata_port *ap = link->ap;
2932 struct ata_eh_context *ehc = &link->eh_context;
2933 struct ata_device *dev;
2934 unsigned int new_mask = 0;
2935 unsigned long flags;
2936 int rc = 0;
2937
2938 /* For PATA drive side cable detection to work, IDENTIFY must
2939 * be done backwards such that PDIAG- is released by the slave
2940 * device before the master device is identified.
2941 */
2942 ata_for_each_dev(dev, link, ALL_REVERSE) {
2943 unsigned int action = ata_eh_dev_action(dev);
2944 unsigned int readid_flags = 0;
2945
2946 if (ehc->i.flags & ATA_EHI_DID_RESET)
2947 readid_flags |= ATA_READID_POSTRESET;
2948
2949 if ((action & ATA_EH_REVALIDATE) && ata_dev_enabled(dev)) {
2950 WARN_ON(dev->class == ATA_DEV_PMP);
2951
2952 /*
2953 * The link may be in a deep sleep, wake it up.
2954 *
2955 * If the link is in deep sleep, ata_phys_link_offline()
2956 * will return true, causing the revalidation to fail,
2957 * which leads to a (potentially) needless hard reset.
2958 *
2959 * ata_eh_recover() will later restore the link policy
2960 * to ap->target_lpm_policy after revalidation is done.
2961 */
2962 if (link->lpm_policy > ATA_LPM_MAX_POWER) {
2963 rc = ata_eh_set_lpm(link, ATA_LPM_MAX_POWER,
2964 r_failed_dev);
2965 if (rc)
2966 goto err;
2967 }
2968
2969 if (ata_phys_link_offline(ata_dev_phys_link(dev))) {
2970 rc = -EIO;
2971 goto err;
2972 }
2973
2974 ata_eh_about_to_do(link, dev, ATA_EH_REVALIDATE);
2975 rc = ata_dev_revalidate(dev, ehc->classes[dev->devno],
2976 readid_flags);
2977 if (rc)
2978 goto err;
2979
2980 ata_eh_done(link, dev, ATA_EH_REVALIDATE);
2981
2982 /* Configuration may have changed, reconfigure
2983 * transfer mode.
2984 */
2985 ehc->i.flags |= ATA_EHI_SETMODE;
2986
2987 /* Schedule the scsi_rescan_device() here.
2988 * Defer the rescan if it's in process of
2989 * suspending or resuming.
2990 */
> 2991 if (pm_suspend_target_state != PM_SUSPEND_ON)
2992 ap->pflags |= ATA_PFLAG_DEFER_RESCAN;
2993 else
2994 schedule_work(&(ap->scsi_rescan_task));
2995 } else if (dev->class == ATA_DEV_UNKNOWN &&
2996 ehc->tries[dev->devno] &&
2997 ata_class_enabled(ehc->classes[dev->devno])) {
2998 /* Temporarily set dev->class, it will be
2999 * permanently set once all configurations are
3000 * complete. This is necessary because new
3001 * device configuration is done in two
3002 * separate loops.
3003 */
3004 dev->class = ehc->classes[dev->devno];
3005
3006 if (dev->class == ATA_DEV_PMP)
3007 rc = sata_pmp_attach(dev);
3008 else
3009 rc = ata_dev_read_id(dev, &dev->class,
3010 readid_flags, dev->id);
3011
3012 /* read_id might have changed class, store and reset */
3013 ehc->classes[dev->devno] = dev->class;
3014 dev->class = ATA_DEV_UNKNOWN;
3015
3016 switch (rc) {
3017 case 0:
3018 /* clear error info accumulated during probe */
3019 ata_ering_clear(&dev->ering);
3020 new_mask |= 1 << dev->devno;
3021 break;
3022 case -ENOENT:
3023 /* IDENTIFY was issued to non-existent
3024 * device. No need to reset. Just
3025 * thaw and ignore the device.
3026 */
3027 ata_eh_thaw_port(ap);
3028 break;
3029 default:
3030 goto err;
3031 }
3032 }
3033 }
3034
3035 /* PDIAG- should have been released, ask cable type if post-reset */
3036 if ((ehc->i.flags & ATA_EHI_DID_RESET) && ata_is_host_link(link)) {
3037 if (ap->ops->cable_detect)
3038 ap->cbl = ap->ops->cable_detect(ap);
3039 ata_force_cbl(ap);
3040 }
3041
3042 /* Configure new devices forward such that user doesn't see
3043 * device detection messages backwards.
3044 */
3045 ata_for_each_dev(dev, link, ALL) {
3046 if (!(new_mask & (1 << dev->devno)))
3047 continue;
3048
3049 dev->class = ehc->classes[dev->devno];
3050
3051 if (dev->class == ATA_DEV_PMP)
3052 continue;
3053
3054 ehc->i.flags |= ATA_EHI_PRINTINFO;
3055 rc = ata_dev_configure(dev);
3056 ehc->i.flags &= ~ATA_EHI_PRINTINFO;
3057 if (rc) {
3058 dev->class = ATA_DEV_UNKNOWN;
3059 goto err;
3060 }
3061
3062 spin_lock_irqsave(ap->lock, flags);
3063 ap->pflags |= ATA_PFLAG_SCSI_HOTPLUG;
3064 spin_unlock_irqrestore(ap->lock, flags);
3065
3066 /* new device discovered, configure xfermode */
3067 ehc->i.flags |= ATA_EHI_SETMODE;
3068 }
3069
3070 return 0;
3071
3072 err:
3073 *r_failed_dev = dev;
3074 return rc;
3075 }
3076

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests