From: "Rafael J. Wysocki"
Date: Thu, 12 Dec 2019 17:51:25 +0100
Subject: Re: [PATCH] cpufreq: Avoid leaving stale IRQ work items during CPU offline
To: Linux PM, Peter Zijlstra
Cc: LKML, Viresh Kumar, Anson Huang, Peng Fan, "Rafael J. Wysocki"
In-Reply-To: <2691942.bH9KnLg61H@kreacher>

On Wed, Dec 11, 2019 at 11:28 AM Rafael J. Wysocki wrote:
>
> From: Rafael J. Wysocki
>
> The scheduler code calling cpufreq_update_util() may run during CPU
> offline on the target CPU after the IRQ work lists have been flushed
> for it, so the target CPU should be prevented from running code that
> may queue up an IRQ work item on it at that point.
>
> Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
> is set for at least one cpufreq policy in the system, because that
> allows the CPU going offline to run the utilization update callback
> of the cpufreq governor on behalf of another (online) CPU in some
> cases.
>
> If that happens, the cpufreq governor callback may queue up an IRQ
> work on the CPU running it, which is going offline, and that IRQ work
> will not be flushed after that point. Moreover, that IRQ work cannot
> be flushed until the "offlining" CPU goes back online, so if any
> other CPU calls irq_work_sync() to wait for the completion of that
> IRQ work, it will have to wait until the "offlining" CPU is back
> online, which may never happen. In particular, a system-wide
> deadlock may occur during CPU online as a result of that.
>
> The failing scenario is as follows. CPU0 is the boot CPU, so it
> creates a cpufreq policy and becomes the "leader" of it
> (policy->cpu). It cannot go offline, because it is the boot CPU.
> Next, other CPUs join the cpufreq policy as they go online and they
> leave it when they go offline. The last CPU to go offline, say CPU3,
> may queue up an IRQ work while running the governor callback on
> behalf of CPU0 after leaving the cpufreq policy, because of the
> dvfs_possible_from_any_cpu effect described above. Then, CPU0 is
> the only online CPU in the system and the stale IRQ work is still
> queued on CPU3. When, say, CPU1 goes back online, it will run
> irq_work_sync() to wait for that IRQ work to complete and so it
> will wait for CPU3 to go back online (which may never happen), but
> (worse yet) CPU0 is waiting for CPU1 at that point too, so a
> system-wide deadlock occurs.
>
> To address this problem, note that CPUs which cannot run cpufreq
> utilization update code for themselves (for example, because they
> have left the cpufreq policies that they belonged to) should also
> be prevented from running that code on behalf of other CPUs that
> belong to a cpufreq policy with dvfs_possible_from_any_cpu set.
> In other words, in that case the cpufreq_update_util_data pointer
> must be non-NULL for the CPU running the code, not just for the CPU
> that is the target of the cpufreq utilization update in progress.
>
> Accordingly, change cpufreq_this_cpu_can_update() into a regular
> function in kernel/sched/cpufreq.c (instead of a static inline in a
> header file) and make it check the cpufreq_update_util_data pointer
> of the local CPU if dvfs_possible_from_any_cpu is set for the target
> cpufreq policy.
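
For reference, the reworked check boils down to something like this (a
sketch of the kernel/sched/cpufreq.c change; it relies on the existing
per-CPU cpufreq_update_util_data pointer, which gets cleared when a CPU
leaves its policy on the way offline):

bool cpufreq_this_cpu_can_update(struct cpufreq_policy *policy)
{
	/* The local CPU may always update a policy that it belongs to. */
	if (cpumask_test_cpu(smp_processor_id(), policy->cpus))
		return true;

	/*
	 * A remote update is fine only if the policy allows it and the
	 * local CPU still has its own utilization update hook installed,
	 * i.e. it is not in the middle of going offline.
	 */
	return policy->dvfs_possible_from_any_cpu &&
	       rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
}
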
>
> Also update the schedutil governor to do the
> cpufreq_this_cpu_can_update() check in the non-fast-switch case too,
> to avoid the stale IRQ work issues.
>
> Fixes: 99d14d0e16fa ("cpufreq: Process remote callbacks from any CPU if the platform permits")
> Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
> Reported-by: Anson Huang
> Cc: 4.14+ # 4.14+
> Signed-off-by: Rafael J. Wysocki
> ---
>
> Peter,
>
> I want to address the issue this way because IMO the right place to
> do the check is the cpufreq governor. Governors that don't use
> cpufreq_this_cpu_can_update() at all need not worry, and neither do
> governors that don't use IRQ works.
>
> The cpufreq governor is given an opportunity to update the frequency
> of the CPU and it needs to decide whether or not to really do that.
>
> Please let me know if you have any concerns.

Given the response so far, this doesn't seem to be controversial, so
I'm going to queue it up for 5.5. For reference, the schedutil part of
the change is sketched below.
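
It boils down to dropping the fast_switch_enabled gating around the
existing check in sugov_should_update_freq(), roughly (a sketch of the
diff, not the final patch):

-	if (sg_policy->policy->fast_switch_enabled &&
-	    !cpufreq_this_cpu_can_update(sg_policy->policy))
+	if (!cpufreq_this_cpu_can_update(sg_policy->policy))
 		return false;

That way the "can this CPU update the policy?" decision is made before
an IRQ work item can be queued on slow-switch (kthread-based) platforms
too, not just on fast-switch ones.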