Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp702398iob; Thu, 12 May 2022 02:54:00 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzyBOI1jMK0a4rX9P7sZp7q7ueFf1cFJGW5Ke3TItFvUWJf/zpK5Juv1VKB7OG8sM83gbJD X-Received: by 2002:a17:902:8487:b0:15f:b2c:435 with SMTP id c7-20020a170902848700b0015f0b2c0435mr20003220plo.33.1652349240359; Thu, 12 May 2022 02:54:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652349240; cv=none; d=google.com; s=arc-20160816; b=i8o3+vDA7z0a7NtTWW3q42eSZgJkDEGC5w3+mDACYzNgoOtnID4Ia93jU6jyXokFv2 b2JYEZS/ilDTO3OPO/3/K4E9qlGEr0pZflB2anbYEApJxgBu3ULzIcxSPSYj1xeIuBzU vW9nkf7CPwvbi4WJBA/X95V68kGqK3YxfiBQdgRjmZVUKlkagubjPSsufd7ZPO5FShgQ +Fw9UFGJU1dowoqnb+jofH/LZNFS66RqkYZy6sTBEH525y+fjZn+p5fnYDz5lDxf/RqS SoqzgPRQBOTe1dA6n3VASDDq9S9Wj1Ns8Kr3S7FF/5EoeMfw+1f2CCqIQTAn9CrUC8Lq ktVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version; bh=U2CF1sPzePx4zRQ9D03pSv7PxSEBSI7R4J8C6F4VN9Q=; b=ACaYooKqOIzRK+fEzaLQ9tmjgi2pm2ybYue7XzCkvlo3hf3LlDpuvHEkJel/UxWGlt 9NXDVbd+1OCMNdBN5wYbUd3mFp9b0ueX14lGdlJigFc4192iq+lqR4QmcaSG+4oj5cWL gSPEywEcHEfran5EpyT5wu/m5dgfBKlGF6oVHSJz7RBV6aS+JhnJ+o0oxZlVGFA18Llu o6SS1TQfl/2ADcIvN/e/GRDiGGWwghgSiv1/EFQi+PUSU0PBeKmlz1NuRh5LSkHrGeMA NkmhOWAfFIHn0bA05GPLIJk1n5aU/V8MILOCvQ8pCcKH3TYsdSkbn80bvi/9s+WYCUYV 40xA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z11-20020a170903018b00b00153b2d16621si6496514plg.553.2022.05.12.02.53.48; Thu, 12 May 2022 02:54:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S243988AbiEKNmv (ORCPT + 99 others); Wed, 11 May 2022 09:42:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48346 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231907AbiEKNmo (ORCPT ); Wed, 11 May 2022 09:42:44 -0400 Received: from mail-yb1-f171.google.com (mail-yb1-f171.google.com [209.85.219.171]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 41A9B73568; Wed, 11 May 2022 06:42:43 -0700 (PDT) Received: by mail-yb1-f171.google.com with SMTP id x17so4109139ybj.3; Wed, 11 May 2022 06:42:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=U2CF1sPzePx4zRQ9D03pSv7PxSEBSI7R4J8C6F4VN9Q=; b=JwkLIvXl35qI8grBgCd+SiIb5faDWkJwuCPu2+BR7uWmgguPqKaUDPeo2uMEWjh+aO bYrXznvyUHKmC+BfnVkmjV4evN2+F8ZbzBsrbV9y04VP9volE9cOh3efky1n5FewJULI LtXZsveMtuO2iEQ7avpEvT/N6c/dLqhRCch8RmlIWicFNz2lk4rQBQ13njKxHZvwozh8 wFOkB7FLhF3RZSStl9/9/dnc5TIRJ95SRUF6Kcek1BxRcasIL0KUVjagfZf3wFmjZclM EExbyNGlnHkKiLyoJ1khPtQxd8AbzNa15xr3VASqytEDx932ghoblbDZeNFzM7tObtRQ N6nw== X-Gm-Message-State: AOAM533hEKkDfjK5Cm1PpZy5t36gVimMhbLT6I9fd4/ftFpp5z0EtP3o ssFrPiruGEjFEPvOJYox9mDBbniLUW5mJdgPzdI= X-Received: by 2002:a25:e792:0:b0:645:7ddb:b5eb with SMTP id e140-20020a25e792000000b006457ddbb5ebmr23579494ybh.482.1652276562503; Wed, 11 May 2022 06:42:42 -0700 (PDT) MIME-Version: 1.0 References: <20220510035259.5ep52sgahd2a6rie@vireshk-i7> <20220510154236.88753-1-schspa@gmail.com> <20220511043515.fn2gz6q3kcpdai5p@vireshk-i7> <20220511122114.wccgyur6g3qs6fps@vireshk-i7> In-Reply-To: From: "Rafael J. Wysocki" Date: Wed, 11 May 2022 15:42:31 +0200 Message-ID: Subject: Re: [PATCH v3] cpufreq: fix race on cpufreq online To: Schspa Shi , Viresh Kumar Cc: Linux Kernel Mailing List , Linux PM , "Rafael J. Wysocki" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, May 11, 2022 at 3:19 PM Rafael J. Wysocki wrote: > > On Wed, May 11, 2022 at 2:59 PM Rafael J. Wysocki wrote: > > > > On Wed, May 11, 2022 at 2:21 PM Viresh Kumar wrote: > > > > > > On 11-05-22, 16:10, Schspa Shi wrote: > > > > Viresh Kumar writes: > > > > > I am not sure, but maybe there were issues in calling init() with rwsem held, as > > > > > it may want to call some API from there. > > > > > > > > > > > > > I have checked all the init() implement of the fellowing files, It should be OK. > > > > Function find command: > > > > ag "init[\s]+=" drivers/cpufreq > > > > > > > > All the init() implement only initialize policy object without holding this lock > > > > and won't call cpufreq APIs need to hold this lock. > > > > > > Okay, we can see if someone complains later then :) > > > > > > > > I don't think you can do that safely. offline() or exit() may depend on > > > > > policy->cpus being set to all CPUs. > > > > OK, I will move this after exit(). and there will be no effect with those > > > > two APIs. But policy->cpus must be clear before release policy->rwsem. > > > > > > Hmm, I don't think depending on the values of policy->cpus is a good idea to be > > > honest. This design is inviting bugs to come in at another place. We need a > > > clear flag for this, a new flag or something like policy_list. > > Why? > > > > Also I see the same bug happening while the policy is removed. The kobject is > > > put after the rwsem is dropped. > > This shouldn't be a problem because of the wait_for_completion() in > cpufreq_policy_put_kobj(). It is known that cpufreq_sysfs_release() > has run when cpufreq_policy_put_kobj() returns, so it is safe to free > the policy then. > > > > > > static inline bool policy_is_inactive(struct cpufreq_policy *policy) > > > > > { > > > > > - return cpumask_empty(policy->cpus); > > > > > + return unlikely(cpumask_empty(policy->cpus) || > > > > > + list_empty(&policy->policy_list)); > > > > > } > > > > > > > > > > > > > I don't think this fully solves my problem. > > > > 1. There is some case which cpufreq_online failed after the policy is added to > > > > cpufreq_policy_list. > > > > > > And I missed that :( > > > > > > > 2. policy->policy_list is not protected by &policy->rwsem, and we > > > > can't relay on this to > > > > indict the policy is fine. > > > > > > Ahh.. > > > > > > > >From this point of view, we can fix this problem through the state of > > > > this linked list. > > > > But the above two problems need to be solved first. > > > > > > I feel overriding policy_list for this is going to make it complex/messy. > > > > > > Maybe something like this then: > > > > There are two things. > > > > One is the possible race with respect to the sysfs access occurring > > during failing initialization and the other is that ->offline() or > > ->exit() can be called with or without holding the policy rwsem > > depending on the code path. > > > > Namely, cpufreq_offline() calls them under the policy rwsem, but > > cpufreq_remove_dev() calls ->exit() outside the rwsem. Also they are > > called outside the rwsem in cpufreq_online(). > > > > Moreover, ->offline() and ->exit() cannot expect policy->cpus to be > > populated, because they are called when it is empty from > > cpufreq_offline(). > > > > So the $subject patch is correct AFAICS even though it doesn't address > > all of the above. Actually, I meant the last version of it, that is: https://patchwork.kernel.org/project/linux-pm/patch/20220510154236.88753-1-schspa@gmail.com/ > TBH, I'm not sure why show() doesn't check policy_is_inactive() under the rwsem. > > Moreover, I'm not sure why the locking dance in store() is necessary.