Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp675331iob; Thu, 12 May 2022 02:09:41 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzCOPlBYoAoa9MKTlSTqayS9iwfrzW5+F5DLSvoEl4EAkxDIjYUwlqffmw8B59ZCjCc+nBW X-Received: by 2002:a05:6a00:1914:b0:50e:128e:ef42 with SMTP id y20-20020a056a00191400b0050e128eef42mr28847014pfi.65.1652346581772; Thu, 12 May 2022 02:09:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652346581; cv=none; d=google.com; s=arc-20160816; b=bykW/oUQmNhsnNLCeuUDO3gYgRGDr+wxgYtmi5jwPHkiDNzOPOkZ7fK0ljxWg40N0U bC3tLJuOnJt37FWXuSQZVobvG4P2pSnfisY4JsqiOq0/PS/x/D8l/1D5/DyIdZKtxYDg Fl46uQMnIo5Z+J3t/6P+V0+ql4tkXnkxiumga5rovSvbL4uJSMyp4SJ3OfmJj8wK+NsE OmYL5kg/trcFv7zIMwxGUZqprDjZSakaTxma9ZzUJRA6lguJqaUSpjG8BFn7BvfbmTND uzxhZuGqG7Xsdfy8fG5BsWQS/wOB+hqU0VUsUCRf2qnjxgsWYv1D7/m625MuPMReWlgQ J61A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=LmKpLR3KpqpjhFrNdA9cs+ta9/b/YlHR5WXRY1TMskg=; b=SBegSKUeO838waWcrG7kxPDJh+SiVpUz1k2wjVlKZ+hf/ixF7zge5D3MWxLa7p9DUl 1gkVdu6G8HFRlg0yCIjnaVAmmCQPDDzTQ146AJQguSpoiaE/cJMKmGxOhBJH21BjaKgv FybtGAcouZIeuMKm5CaZoNq3QjEDdr1xlHxQPmPB5eWgpkRKbtp/VdfeghgMyv72uc+T rxDRCHHcksp2yCMFn2hhidqgHKq/T8JTDCIWbRwcLOA4rqz3voGgWCiR9syKFBz6A3qI 6AmfzIy3OaJIAeYPxRxltPZOVFPjB5PJhu7sLosbZdxT6CetLYmJxUyRLNNDaXG36UYK d/TQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ANJ9TCDF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id bd34-20020a056a0027a200b0050605f7112bsi5427675pfb.130.2022.05.12.02.09.27; Thu, 12 May 2022 02:09:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=ANJ9TCDF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244014AbiEKNm5 (ORCPT + 99 others); Wed, 11 May 2022 09:42:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243999AbiEKNmy (ORCPT ); Wed, 11 May 2022 09:42:54 -0400 Received: from mail-oa1-x32.google.com (mail-oa1-x32.google.com [IPv6:2001:4860:4864:20::32]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FEB613483C; Wed, 11 May 2022 06:42:52 -0700 (PDT) Received: by mail-oa1-x32.google.com with SMTP id 586e51a60fabf-ed9ac77cbbso2869592fac.1; Wed, 11 May 2022 06:42:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=LmKpLR3KpqpjhFrNdA9cs+ta9/b/YlHR5WXRY1TMskg=; b=ANJ9TCDF1OKYyfNeOvsA8MBtwBpDguU0QQxIWOGUvudSGL5OmTzf3AVmw4kvGJmDea ve2oPAAYI0Ss6dh3PSe0LFhX3509IZikdKJMzhWsCwlCqkhtiCRvKi23KgfNs2nLugII eQqd7Z1F5Ajc1aAZfv8MKrudg8AhoFrjjq/kLnXTG6s9DQHdUibUQPKvS57wzt+pD8rQ KL1QrX2Tix9PrcbC/n00g1ZZnakZZgX6oIg+fw7ZU3O7CjTXnzETh2gvWDH1Grdrn6iL UiAJLysUmVb3s7FqaLY0DIAu65gRryHR3WegDXnBvYh4LQU8fTVBnwC4QfyFbRasiF83 D5WQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=LmKpLR3KpqpjhFrNdA9cs+ta9/b/YlHR5WXRY1TMskg=; b=vzPh6gznmeJAU+tT3vBROxrImi6APUlOLV6mdQ3Hy+9oBCP6Q8BH1i0z1IGohVzWVQ oKyLOtA9SOpDKsZT4zYWRHWQ/WQTR4A+BlWMtPalCyB32MnuBcQUl1G35xrI8pU/L0qe 2hFMc7t3PNWryWrJiHL9fMd6PVXdl3RnS05/MNjzCZH88KbkQtbaoxgjyR4IJQBKVRqE MhvPSIawWhYMJ75L2SbTnYXYw7AadCqz7Hj0sMuqvozNXIOFhBP5S60UPLSp61tg9RWQ fc1PVPftakt18S5bSzWnxMhiUnmkQO/cWxfWvND8FlXiQL/CKuTTVfjVKTCpcrY9VFjH lBdw== X-Gm-Message-State: AOAM532/Yj6WFu51zjDaAv5ysDX/KhwVnFiAAouwul3YvPBapwMTKWWL 9AmqiRLKhTsyW6K1TdEBKczFDuouLgi95jHIhuKlZk8yUZa6kQ== X-Received: by 2002:a05:6870:5b89:b0:e9:bb4c:a6f1 with SMTP id em9-20020a0568705b8900b000e9bb4ca6f1mr2767148oab.52.1652276571601; Wed, 11 May 2022 06:42:51 -0700 (PDT) MIME-Version: 1.0 References: <20220510035259.5ep52sgahd2a6rie@vireshk-i7> <20220510154236.88753-1-schspa@gmail.com> <20220511043515.fn2gz6q3kcpdai5p@vireshk-i7> <20220511122114.wccgyur6g3qs6fps@vireshk-i7> In-Reply-To: From: Schspa Shi Date: Wed, 11 May 2022 21:42:40 +0800 Message-ID: Subject: Re: [PATCH v3] cpufreq: fix race on cpufreq online To: "Rafael J. Wysocki" Cc: Viresh Kumar , Linux Kernel Mailing List , Linux PM Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org "Rafael J. Wysocki" writes: > On Wed, May 11, 2022 at 2:59 PM Rafael J. Wysocki wrote: >> >> On Wed, May 11, 2022 at 2:21 PM Viresh Kumar wrote: >> > >> > On 11-05-22, 16:10, Schspa Shi wrote: >> > > Viresh Kumar writes: >> > > > I am not sure, but maybe there were issues in calling init() with rwsem held, as >> > > > it may want to call some API from there. >> > > > >> > > >> > > I have checked all the init() implement of the fellowing files, It should be OK. >> > > Function find command: >> > > ag "init[\s]+=" drivers/cpufreq >> > > >> > > All the init() implement only initialize policy object without holding this lock >> > > and won't call cpufreq APIs need to hold this lock. >> > >> > Okay, we can see if someone complains later then :) >> > >> > > > I don't think you can do that safely. offline() or exit() may depend on >> > > > policy->cpus being set to all CPUs. >> > > OK, I will move this after exit(). and there will be no effect with those >> > > two APIs. But policy->cpus must be clear before release policy->rwsem. >> > >> > Hmm, I don't think depending on the values of policy->cpus is a good idea to be >> > honest. This design is inviting bugs to come in at another place. We need a >> > clear flag for this, a new flag or something like policy_list. > > Why? > >> > Also I see the same bug happening while the policy is removed. The kobject is >> > put after the rwsem is dropped. > > This shouldn't be a problem because of the wait_for_completion() in > cpufreq_policy_put_kobj(). It is known that cpufreq_sysfs_release() > has run when cpufreq_policy_put_kobj() returns, so it is safe to free > the policy then. > >> > > > static inline bool policy_is_inactive(struct cpufreq_policy *policy) >> > > > { >> > > > - return cpumask_empty(policy->cpus); >> > > > + return unlikely(cpumask_empty(policy->cpus) || >> > > > + list_empty(&policy->policy_list)); >> > > > } >> > > > >> > > >> > > I don't think this fully solves my problem. >> > > 1. There is some case which cpufreq_online failed after the policy is added to >> > > cpufreq_policy_list. >> > >> > And I missed that :( >> > >> > > 2. policy->policy_list is not protected by &policy->rwsem, and we >> > > can't relay on this to >> > > indict the policy is fine. >> > >> > Ahh.. >> > >> > > >From this point of view, we can fix this problem through the state of >> > > this linked list. >> > > But the above two problems need to be solved first. >> > >> > I feel overriding policy_list for this is going to make it complex/messy. >> > >> > Maybe something like this then: >> >> There are two things. >> >> One is the possible race with respect to the sysfs access occurring >> during failing initialization and the other is that ->offline() or >> ->exit() can be called with or without holding the policy rwsem >> depending on the code path. >> >> Namely, cpufreq_offline() calls them under the policy rwsem, but >> cpufreq_remove_dev() calls ->exit() outside the rwsem. Also they are >> called outside the rwsem in cpufreq_online(). >> >> Moreover, ->offline() and ->exit() cannot expect policy->cpus to be >> populated, because they are called when it is empty from >> cpufreq_offline(). >> >> So the $subject patch is correct AFAICS even though it doesn't address >> all of the above. > > TBH, I'm not sure why show() doesn't check policy_is_inactive() under the rwsem. > There is a exist bugs, and somebody try to fixed, please see commit Fixes: 2f66196208c9 ("cpufreq: check if policy is inactive early in __cpufreq_get()") > Moreover, I'm not sure why the locking dance in store() is necessary. The store interface hold cpu_hotplug_lock via cpus_read_trylock(); , cannot run in parallel with cpufreq_online() & cpufreq_offline(). --- BRs Schspa Shi