From: Schspa Shi
Date: Mon, 9 May 2022 23:06:01 +0800
Subject: Re: [PATCH] cpufreq: fix race on cpufreq online
To: Viresh Kumar
Cc: rafael@kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org

Viresh Kumar writes:

> I had to dig the old patch first before starting to review what your
> next one does.
>
> On 21-04-22, 03:15, Schspa Shi wrote:
>> When cpufreq online failed, policy->cpus are not empty while
>> cpufreq sysfs file available, we may access some data freed.
>>
>> Take policy->clk as an example:
>>
>> static int cpufreq_online(unsigned int cpu)
>> {
>>   ...
>>   // policy->cpus != 0 at this time
>>   down_write(&policy->rwsem);
>>   ret = cpufreq_add_dev_interface(policy);
>
> Please keep some code to help understand where we went from here. I am
> sure you meant that we will error out in this case, but you removed
> the relevant code.
>

Yes, I will add this to the next version of the patch.

>>   up_write(&policy->rwsem);
>>
>>   return 0;
>>
>> out_destroy_policy:
>>   for_each_cpu(j, policy->real_cpus)
>>     remove_cpu_dev_symlink(policy, get_cpu_device(j));
>>   up_write(&policy->rwsem);
>> ...
>> out_exit_policy:
>>   if (cpufreq_driver->exit)
>>     cpufreq_driver->exit(policy);
>>       clk_put(policy->clk);
>>       // policy->clk is a wild pointer
>> ...
>>                                     ^
>>                                     |
>>                             Another process access
>>                             __cpufreq_get
>>                               cpufreq_verify_current_freq
>>                                 cpufreq_generic_get
>>                                   // access wild pointer of policy->clk;
>>                                     |
>>                                     |
>> out_offline_policy:                 |
>>   cpufreq_policy_free(policy);      |
>>     // deleted here, and will wait for no body reference
>>     cpufreq_policy_put_kobj(policy);
>> }
>>
>> Signed-off-by: Schspa Shi
>> ---
>>  drivers/cpufreq/cpufreq.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index 80f535cc8a75..0d58b0f8f3af 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1533,8 +1533,6 @@ static int cpufreq_online(unsigned int cpu)
>>       for_each_cpu(j, policy->real_cpus)
>>               remove_cpu_dev_symlink(policy, get_cpu_device(j));
>>
>> -     up_write(&policy->rwsem);
>> -
>>  out_offline_policy:
>>       if (cpufreq_driver->offline)
>>               cpufreq_driver->offline(policy);
>> @@ -1543,6 +1541,9 @@ static int cpufreq_online(unsigned int cpu)
>>       if (cpufreq_driver->exit)
>>               cpufreq_driver->exit(policy);
>>
>> +     cpumask_clear(policy->cpus);
>> +     up_write(&policy->rwsem);
>
> This is simply buggy as now an error out to out_offline_policy or
> out_exit_policy will try to release a semaphore which was never taken
> in the first place. This works fine only if we failed late, i.e. via
> out_destroy_policy.
>

I am very sorry for this oversight. To fix this issue, there is no need
to move cpufreq_driver->exit(policy) and cpufreq_driver->offline(policy)
inside policy->rwsem. I made this change because both calls are made
under the policy->rwsem write lock in cpufreq_offline(). I think we
should keep the offline & exit calls inside policy->rwsem for better
symmetry.

static int cpufreq_offline(unsigned int cpu)
{
        ...
        down_write(&policy->rwsem);
        ...
        /*
         * Perform the ->offline() during light-weight tear-down, as
         * that allows fast recovery when the CPU comes back.
         */
        if (cpufreq_driver->offline) {
                cpufreq_driver->offline(policy);
        } else if (cpufreq_driver->exit) {
                cpufreq_driver->exit(policy);
                policy->freq_table = NULL;
        }

unlock:
        up_write(&policy->rwsem);
        return 0;
}

> The very first thing we need to do now is revert this patch. Lemme
> send a patch for that and you can send a fresh fix over that once you
> have a stable fix.

For the next version of the fix, I'd be willing to keep the exit and
offline calls inside policy->rwsem. But it's also OK with me to keep
the offline & exit calls outside of policy->rwsem.

---
BRs
Schspa Shi
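
To make the imbalance Viresh describes concrete, here is a small userspace sketch in C. It is not kernel code and not the eventual fix: the toy_* names, the fail_point enum and the struct layouts are all made up for illustration. It only models the rule that an unwind label may release the lock if every goto that reaches it comes from a point where the lock is already held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a rw_semaphore: only tracks the write side. */
struct toy_rwsem {
	bool write_held;
};

static void toy_down_write(struct toy_rwsem *sem)
{
	assert(!sem->write_held);
	sem->write_held = true;
}

static void toy_up_write(struct toy_rwsem *sem)
{
	/* Releasing a lock that was never taken is the bug discussed above. */
	assert(sem->write_held);
	sem->write_held = false;
}

/* Hypothetical failure injection points for the toy online path. */
enum fail_point { FAIL_NONE, FAIL_EARLY, FAIL_LATE };

struct toy_policy {
	struct toy_rwsem rwsem;
	unsigned int cpus;	/* non-zero while the policy looks "live" */
	bool exited;
};

int toy_online(struct toy_policy *policy, enum fail_point fail)
{
	int ret = 0;

	policy->cpus = 1;

	/* Early failure: the lock has not been taken yet. */
	if (fail == FAIL_EARLY) {
		ret = -1;
		goto out_exit_policy;
	}

	toy_down_write(&policy->rwsem);

	/* Late failure: the lock is held, so unwind through the unlock. */
	if (fail == FAIL_LATE) {
		ret = -1;
		goto out_destroy_policy;
	}

	toy_up_write(&policy->rwsem);
	return 0;

out_destroy_policy:
	/* Only reachable with the lock held, so unlocking here is balanced. */
	policy->cpus = 0;
	toy_up_write(&policy->rwsem);

out_exit_policy:
	/* Reachable with or without the lock ever taken: must not unlock. */
	policy->cpus = 0;
	policy->exited = true;
	return ret;
}
```

Moving the toy_up_write() call down into out_exit_policy would trip the assert in toy_up_write() on the FAIL_EARLY path, which is the same shape of problem as releasing policy->rwsem from out_offline_policy or out_exit_policy.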