Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp828764iob; Thu, 12 May 2022 05:40:19 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzFFP1PCVoM30lX1uBTezje0Ugg+Nj1eCtMPC6gzTMug9QrykYEy0l/X4VuEPog8/ieIb8O X-Received: by 2002:a17:906:58d1:b0:6f4:6e61:dae with SMTP id e17-20020a17090658d100b006f46e610daemr29659958ejs.468.1652359219655; Thu, 12 May 2022 05:40:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652359219; cv=none; d=google.com; s=arc-20160816; b=GwFk7LWJBibj4+1bUWck9MdepaNUdFAu0zmmqO3r2B6fW7Ne5VkLVPxftqeTQ0X+0/ io9eWnlLRwI0nh/whlAHev59k1ynlBELvp3be+OvZOvJ3VkGMxJQ2wFsWxfjHecRACb8 mrHdy2MuVK3JUaG9+rz+6vXzOjdR6w6mzNuaEnVdfaVuvQUYG1txTZK8Dlzxcq41yyvb XSAAgWWBk5YLSGmbV4o7YoEdUGJUKiuKNVpGck1gNg7jVdK3SkbCRwJ9Ec1NWWT38vSN iwXv1NlTFUvlpUpFNR0DvOBlxNbiVWoFMXXq3DRg4UZ539OUKJgdN5sihk7xpROR5hPA +Q9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version; bh=6T461oZUF/Gcihy0sNyV0keeWDDkJXZ5dXIGK7Rlpm4=; b=mgG6234yX8MAeme8MprjRhviWbdiV2yvidvGwdNXCfknbuq0E5nA3UUjqVpjJyeART tBYlFfI8WbPlZiag96lCwbRz7oZ/OhJk829Lko5IyVgATOZbJsXbzYlXXy/n4gYcQ+BV Jg6R2GRq+CxtpYq4JE+sIWikQ14vWD0daUU8uKJ0rciCUSXD2UnEOlavOyIhebwj6elS EXTKtna4B5E0MQE8e2buJAlvdreqAFn10cO5JgEhmNfeGeyM3nwRKmGX0UdKZEDshEo9 ambPbTn1AKS+ykwjDRSflfwCFy6a3M1DeuBRy4HU7Ere9PVvEp2ZpkqWEzVkqQndZ3zw JsXw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ji17-20020a170907981100b006f496889e5dsi5425851ejc.516.2022.05.12.05.39.53; Thu, 12 May 2022 05:40:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244169AbiEKNud (ORCPT + 99 others); Wed, 11 May 2022 09:50:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46360 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244192AbiEKNu2 (ORCPT ); Wed, 11 May 2022 09:50:28 -0400 Received: from mail-yb1-f173.google.com (mail-yb1-f173.google.com [209.85.219.173]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D9FA248248; Wed, 11 May 2022 06:50:23 -0700 (PDT) Received: by mail-yb1-f173.google.com with SMTP id v59so4076068ybi.12; Wed, 11 May 2022 06:50:23 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=6T461oZUF/Gcihy0sNyV0keeWDDkJXZ5dXIGK7Rlpm4=; b=FhD/mJI6beFgnO2dlADIJAJfIwvW4jGtAWwL2F6iLqVRVLiIYcmrQ5fANhRa0jUSR3 zhMAso//l5f6lyO9LocDtRgfLwV0noYvlt3Cx0PLFs52u8VYB/6941Z+OCSXnZyj3xqP Iv0FaQkeSQ0H+fM+GQPEhRdnANRLhZ6JS+oz+QroNCibcI7jxCEMZ80snwm3bLDQA3vr 9/fmafGknoUQmjXyHjwiMaOgPrAuYvlaM4EDcpHm1g50m2O+luvYHTZY1FhExDhMKLmH ysDFaRVTzG54mTBsm2CleiAJmucQZvhjduGg/nVwQeysGeljsU30u1yqTemhW52SLeNY ZFog== X-Gm-Message-State: AOAM531zux7S254RJAk/GdC6oyG3hdu/7QKEpQ35H76W6snLmKOXX1Dg 8KP1dKMyT6B10Uwm0N4DLhfIF362FKAYC363A54= X-Received: by 2002:a25:e792:0:b0:645:7ddb:b5eb with SMTP id e140-20020a25e792000000b006457ddbb5ebmr23615068ybh.482.1652277022345; Wed, 11 May 2022 06:50:22 -0700 (PDT) MIME-Version: 1.0 References: <20220510035259.5ep52sgahd2a6rie@vireshk-i7> <20220510154236.88753-1-schspa@gmail.com> <20220511043515.fn2gz6q3kcpdai5p@vireshk-i7> <20220511122114.wccgyur6g3qs6fps@vireshk-i7> In-Reply-To: From: "Rafael J. Wysocki" Date: Wed, 11 May 2022 15:50:11 +0200 Message-ID: Subject: Re: [PATCH v3] cpufreq: fix race on cpufreq online To: Schspa Shi Cc: "Rafael J. Wysocki" , Viresh Kumar , Linux Kernel Mailing List , Linux PM Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 11, 2022 at 3:42 PM Schspa Shi wrote: > > "Rafael J. Wysocki" writes: > > > On Wed, May 11, 2022 at 2:59 PM Rafael J. Wysocki wrote: > >> > >> On Wed, May 11, 2022 at 2:21 PM Viresh Kumar wrote: > >> > > >> > On 11-05-22, 16:10, Schspa Shi wrote: > >> > > Viresh Kumar writes: > >> > > > I am not sure, but maybe there were issues in calling init() with rwsem held, as > >> > > > it may want to call some API from there. > >> > > > > >> > > > >> > > I have checked all the init() implement of the fellowing files, It should be OK. > >> > > Function find command: > >> > > ag "init[\s]+=" drivers/cpufreq > >> > > > >> > > All the init() implement only initialize policy object without holding this lock > >> > > and won't call cpufreq APIs need to hold this lock. > >> > > >> > Okay, we can see if someone complains later then :) > >> > > >> > > > I don't think you can do that safely. offline() or exit() may depend on > >> > > > policy->cpus being set to all CPUs. > >> > > OK, I will move this after exit(). and there will be no effect with those > >> > > two APIs. But policy->cpus must be clear before release policy->rwsem. > >> > > >> > Hmm, I don't think depending on the values of policy->cpus is a good idea to be > >> > honest. This design is inviting bugs to come in at another place. We need a > >> > clear flag for this, a new flag or something like policy_list. > > > > Why? > > > >> > Also I see the same bug happening while the policy is removed. The kobject is > >> > put after the rwsem is dropped. > > > > This shouldn't be a problem because of the wait_for_completion() in > > cpufreq_policy_put_kobj(). It is known that cpufreq_sysfs_release() > > has run when cpufreq_policy_put_kobj() returns, so it is safe to free > > the policy then. > > > >> > > > static inline bool policy_is_inactive(struct cpufreq_policy *policy) > >> > > > { > >> > > > - return cpumask_empty(policy->cpus); > >> > > > + return unlikely(cpumask_empty(policy->cpus) || > >> > > > + list_empty(&policy->policy_list)); > >> > > > } > >> > > > > >> > > > >> > > I don't think this fully solves my problem. > >> > > 1. There is some case which cpufreq_online failed after the policy is added to > >> > > cpufreq_policy_list. > >> > > >> > And I missed that :( > >> > > >> > > 2. policy->policy_list is not protected by &policy->rwsem, and we > >> > > can't relay on this to > >> > > indict the policy is fine. > >> > > >> > Ahh.. > >> > > >> > > >From this point of view, we can fix this problem through the state of > >> > > this linked list. > >> > > But the above two problems need to be solved first. > >> > > >> > I feel overriding policy_list for this is going to make it complex/messy. > >> > > >> > Maybe something like this then: > >> > >> There are two things. > >> > >> One is the possible race with respect to the sysfs access occurring > >> during failing initialization and the other is that ->offline() or > >> ->exit() can be called with or without holding the policy rwsem > >> depending on the code path. > >> > >> Namely, cpufreq_offline() calls them under the policy rwsem, but > >> cpufreq_remove_dev() calls ->exit() outside the rwsem. Also they are > >> called outside the rwsem in cpufreq_online(). > >> > >> Moreover, ->offline() and ->exit() cannot expect policy->cpus to be > >> populated, because they are called when it is empty from > >> cpufreq_offline(). > >> > >> So the $subject patch is correct AFAICS even though it doesn't address > >> all of the above. > > > > TBH, I'm not sure why show() doesn't check policy_is_inactive() under the rwsem. > > > > There is a exist bugs, and somebody try to fixed, please see commit > Fixes: 2f66196208c9 ("cpufreq: check if policy is inactive early in > __cpufreq_get()") Well, exactly. This only addressed one bug out of a category. > > Moreover, I'm not sure why the locking dance in store() is necessary. > > The store interface hold cpu_hotplug_lock via > cpus_read_trylock(); > , cannot run in parallel with cpufreq_online() & cpufreq_offline(). So the reason why is to prevent store() from running in parallel with the two functions above. Which generally is because the policy configuration is in-flight then. However, I'm wondering about what exactly would break then.