Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp155211ybl; Wed, 22 Jan 2020 18:16:26 -0800 (PST) X-Google-Smtp-Source: APXvYqy+UMggbMw7esRpwWHuaysMCfsWz5+owqLQSB4XhfZNoUv4RMxLJo6dIB4+fKCp03qq0voj X-Received: by 2002:a05:6830:10c6:: with SMTP id z6mr10123316oto.203.1579745786619; Wed, 22 Jan 2020 18:16:26 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1579745786; cv=none; d=google.com; s=arc-20160816; b=NZDItTyPhsISUXPUIxtSE/v50yCS5VzobRmYpbAaF+wnZY2ECcsHAH4eGBMZFAM1t6 5a13pRAeL4+LwdTGJLsSQMD68qPKFweWfKa9Qf0b7YAq8xZgU+BEjgLOdTu4WD4dNjKp N90ZZZyCHsZgBvndIPhYegIu+M4MitZwmRvWPuvxRR7hJLX+X6w+PetcHWc6XmKVDzFm iOWALV28XS/u0iDa7md7oUMMoPr8jOsyGp4IS2scNmdmjG0tu+xcyG7Uky730j9dwoM2 XknA3whohMxbzfqfsolD9ADo7pP2gwEludEWIi/GJKQ5go7vOxergkLeDZZm3+WjJbRG RUEA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=tMkvrQ8d6snTBedSYACRJ1y02gl0g6UsGyaGiOa7S5w=; b=mNaZZGtNqYExyRnisynsGCrW5T3MUoB6eUMk8PLvz4wsU1b614VpDT1bNLVUjB9VO3 E+CG2OR1IRJV5Rv7k+wnkRj9nSuyQnLqRrdDhAYGg/8K6JfUcRr3tMg0tTwA0C9JiV3+ EEKoU2GmypAxBPlItLb63lm4jbZFQq8+Nik7qgFGRIoQJSCEBZ8q0d8b2YXbOCZ7La85 Si223acIprftnvYhRoIVDDfSj8Sv8lcwFAXYLpj6jTCkHfsV65bq4xCJHz5XCjQg1KT2 VGOcygxxWD0zOwATXE6byHzkefPg4ti16LzRHNyjl6FeAmjDb8CEjjuS4KLTNDdzym9i M1JA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r1si350389otk.251.2020.01.22.18.16.14; Wed, 22 Jan 2020 18:16:26 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726205AbgAWCOI (ORCPT + 99 others); Wed, 22 Jan 2020 21:14:08 -0500 Received: from lgeamrelo11.lge.com ([156.147.23.51]:36456 "EHLO lgeamrelo11.lge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725911AbgAWCOI (ORCPT ); Wed, 22 Jan 2020 21:14:08 -0500 Received: from unknown (HELO lgemrelse6q.lge.com) (156.147.1.121) by 156.147.23.51 with ESMTP; 23 Jan 2020 11:14:05 +0900 X-Original-SENDERIP: 156.147.1.121 X-Original-MAILFROM: chanho.min@lge.com Received: from unknown (HELO ?10.178.36.63?) (10.178.36.63) by 156.147.1.121 with ESMTP; 23 Jan 2020 11:14:05 +0900 X-Original-SENDERIP: 10.178.36.63 X-Original-MAILFROM: chanho.min@lge.com Subject: Re: [PATCH] PM: core: Fix handling of devices deleted during system-wide resume To: "Rafael J. Wysocki" , Linux PM Cc: LKML , Greg Kroah-Hartman , Daewoong Kim , Lee Gunho , "Rafael J. Wysocki" References: <2601275.1tEomSadG4@kreacher> From: Chanho Min Message-ID: Date: Thu, 23 Jan 2020 11:14:04 +0900 User-Agent: Mozilla/5.0 (X11; Linux i686; rv:60.0) Gecko/20100101 Thunderbird/60.9.0 MIME-Version: 1.0 In-Reply-To: <2601275.1tEomSadG4@kreacher> Content-Type: text/plain; charset=euc-kr; format=flowed Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > From: Rafael J. Wysocki > > If a device is deleted by one of its system-wide resume callbacks > (for example, because it does not appear to be present or accessible > any more) along with its children, the resume of the children may > continue leading to use-after-free errors and other issues > (potentially). > > Namely, if the device's children are resumed asynchronously, their > resume may have been scheduled already before the device's callback > runs and so the device may be deleted while dpm_wait_for_superior() > is being executed for them. The memory taken up by the parent device > object may be freed then while dpm_wait() is waiting for the parent's > resume callback to complete, which leads to a use-after-free. > Moreover, the resume of the children is really not expected to > continue after they have been unregistered, so it must be terminated > right away in that case.Seokjoo Lee > > To address this problem, modify dpm_wait_for_superior() to check > if the target device is still there in the system-wide PM list of > devices and if so, to increment its parent's reference counter, both > under dpm_list_mtx which prevents device_del() running for the child > from dropping the parent's reference counter prematurely. > > If the device is not present in the system-wide PM list of devices > any more, the resume of it cannot continue, so check that again after > dpm_wait() returns, which means that the parent's callback has been > completed, and pass the result of that check to the caller of > dpm_wait_for_superior() to allow it to abort the device's resume > if it is not there any more. > > Link: https://lore.kernel.org/linux-pm/1579568452-27253-1-git-send-email-chanho.min@lge.com > Reported-by: Chanho Min > Signed-off-by: Rafael J. Wysocki > --- > drivers/base/power/main.c | 42 +++++++++++++++++++++++++++++++++++++----- > 1 file changed, 37 insertions(+), 5 deletions(-) > > Index: linux-pm/drivers/base/power/main.c > =================================================================== > --- linux-pm.orig/drivers/base/power/main.cSeokjoo Lee > +++ linux-pm/drivers/base/power/main.c20. 1. 23. ???? 8:11?? Rafael J. Wysocki ??(??) ?? ??: > @@ -273,10 +273,38 @@ static void dpm_wait_for_suppliers(struc > device_links_read_unlock(idx); > } > > -static void dpm_wait_for_superior(struct device *dev, bool async) > +static bool dpm_wait_for_superior(struct device *dev, bool async) > { > - dpm_wait(dev->parent, async); > + struct device *parent;board > + > + /* > + * If the device is resumed asynchronously and the parent's callback > + * deletes both the device and the parent itself, the parent object may > + * be freed while this function is running, so avoid that by reference > + * counting the parent once more unless the device has been deleted > + * already (in which case return right away). > + */ > + mutex_lock(&dpm_list_mtx); > + > + if (!device_pm_initialized(dev)) {20. 1. 23. ???? 8:11?? Rafael J. Wysocki ??(??) ?? ??: > + mutex_unlock(&dpm_list_mtx); > + return false; > + } > + > + parent = get_device(dev->parent); > + > + mutex_unlock(&dpm_list_mtx); > + > + dpm_wait(parent, async); > + put_device(parent); > + > dpm_wait_for_suppliers(dev, async); > + > + /* > + * If the parent's callback has deleted the device, attempting to resume > + * it would be invalid, so avoid doing that then. > + */ > + return device_pm_initialized(dev);20. 1. 23. ???? 8:11?? Rafael J. Wysocki ??(??) ?? ??: > } > > static void dpm_wait_for_consumers(struct device *dev, bool async) > @@ -621,7 +649,8 @@ static int device_resume_noirq(struct de > if (!dev->power.is_noirq_suspended) > goto Out; > > - dpm_wait_for_superior(dev, async); > + if (!dpm_wait_for_superior(dev, async)) > + goto Out; > > skip_resume = dev_pm_may_skip_resume(dev); > > @@ -829,7 +858,8 @@ static int device_resume_early(struct de > if (!dev->power.is_late_suspended) > goto Out; > > - dpm_wait_for_superior(dev, async);Seokjoo Lee > + if (!dpm_wait_for_superior(dev, async)) > + goto Out; > > callback = dpm_subsys_resume_early_cb(dev, state, &info); > > @@ -944,7 +974,9 @@ static int device_resume(struct device * > goto Complete; > } > > - dpm_wait_for_superior(dev, async); > + if (!dpm_wait_for_superior(dev, async)) > + goto Complete; > + > dpm_watchdog_set(&wd, dev); > device_lock(dev);Thanks, This seems to solve the rare hang on our target. Actually, the problem is occurred in v4.4. Shouldn't it apply to -stable? Chanho