Received: by 2002:a25:ab43:0:0:0:0:0 with SMTP id u61csp3104239ybi; Mon, 17 Jun 2019 16:42:12 -0700 (PDT) X-Google-Smtp-Source: APXvYqwLVLttPUFk9nNS0UHeEBIL+ycs9xt0fBt7NJulAuLmuWTWROZscPB7QKzZPlIdjgqQW4Vs X-Received: by 2002:a17:90a:1c17:: with SMTP id s23mr1875848pjs.108.1560814932739; Mon, 17 Jun 2019 16:42:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1560814932; cv=none; d=google.com; s=arc-20160816; b=LrprzNFcICEHJPsOtwXR/V+pINl/GB6/bpbDqWBi4OZdzzKjCRnbdgbY7M8d3kMkzS DRDeSOX3cCEGSGFdTHD52uixmLvGY1osyTBIfHcxN4pttoke48eOvUWtuJUO7kf8a5c6 GtoArmaUvaLt+LFE35SXXxYtTpM75BlpT18eFvbGK6/G1JpJ+pbeKlrrKZRW9LrSlIVN gHwDdi3qMpIyca+6LPvpMihWLosqyeEwO+ueUtmcGHFYPRIsqM0OOtvTMdPXNg7KTrHr YZEMbA+4/tL7cPDa4Ca76JA5qFE8fE41egqs0OygXZ+R5PQ/mdEcqxZ0q/ekZflsEBk/ h1MQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=PkOYi+QtNmokv1IxPfoOCf8vuAgX7taot0ooYy8LhUs=; b=cEadLI2aoKzKxOFRx3cBtlSU80cI8fqBHs4m7UBlTybeJrshX8yymmWM3dkKZjJ89G pK5WK7nyNt527cknkLxCd+fRHFLXxO87M3YBFL5UOXdTlIJp6cGer3wWsyAqMW/v7lev 9t3KqPLIo+U5V+8bgHRSxsWfp94Gq1yfdS8JZBiqy6v6AkdFQe88hwyVotb2JmpVYub0 Qx3N5OB6ctUa3l5HayGV7+PQhykDYo0S5LBB2EvzCR4ZpXMhNhkC2c/FTqwQQ5B51Ns1 CtksRsCk2aR1q5zo+xwZQZi/xnvHCmyPPqjE9saVDRG+lfZyQN58ZPHAa58MhLDKuUYD a07g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=m6qCyMZD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 1si11387832pli.427.2019.06.17.16.41.57; Mon, 17 Jun 2019 16:42:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=m6qCyMZD; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728040AbfFQXlw (ORCPT + 99 others); Mon, 17 Jun 2019 19:41:52 -0400 Received: from mail.kernel.org ([198.145.29.99]:42114 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726724AbfFQXlw (ORCPT ); Mon, 17 Jun 2019 19:41:52 -0400 Received: from mail-wr1-f52.google.com (mail-wr1-f52.google.com [209.85.221.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id CD7AC21530 for ; Mon, 17 Jun 2019 23:41:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1560814911; bh=pIjyNyWPgEfxOgkE5R5Wmxea9h4OvnpDOVDODY7GIEE=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=m6qCyMZDbBid3RYL83C1MlGxMkh6jjCC3XZh9EOQlqrRBcQfPQrT2iqn6PqyhInGj e4rDm8agWGdC+4O5Mv+PBNbe8vZ9MTwNVZ00KRK4ViOqo6KennYcjLkpSkCyXbeCWO Ia5+INYTGg0VG7QVbP2z4MUbb/Yo3mW1EKxMXPr4= Received: by mail-wr1-f52.google.com with SMTP id n4so11821721wrw.13 for ; Mon, 17 Jun 2019 16:41:50 -0700 (PDT) X-Gm-Message-State: APjAAAWUDCZrBe5juJEPOKmSuL8TtH4RZg1+noesQUIcRxa3KRPu9VwG PmCNwaPabxY7j2CuV59tWBjFzAbbXSyuTQfhvWkRcA== X-Received: by 2002:a5d:6a42:: with SMTP id t2mr13574335wrw.352.1560814909361; Mon, 17 Jun 2019 16:41:49 -0700 (PDT) MIME-Version: 1.0 References: <1559944837-149589-1-git-send-email-fenghua.yu@intel.com> <1559944837-149589-4-git-send-email-fenghua.yu@intel.com> <20190610035302.GA162238@romley-ivt3.sc.intel.com> <20190610060234.GD162238@romley-ivt3.sc.intel.com> <20190617202702.GB217081@romley-ivt3.sc.intel.com> <20190617231104.GF217081@romley-ivt3.sc.intel.com> In-Reply-To: <20190617231104.GF217081@romley-ivt3.sc.intel.com> From: Andy Lutomirski Date: Mon, 17 Jun 2019 16:41:38 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v4 3/5] x86/umwait: Add sysfs interface to control umwait C0.2 state To: Fenghua Yu Cc: Andy Lutomirski , Thomas Gleixner , Ingo Molnar , Borislav Petkov , H Peter Anvin , Ashok Raj , Tony Luck , Ravi V Shankar , linux-kernel , x86 Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jun 17, 2019 at 4:20 PM Fenghua Yu wrote: > > On Mon, Jun 17, 2019 at 04:02:50PM -0700, Andy Lutomirski wrote: > > On Mon, Jun 17, 2019 at 1:36 PM Fenghua Yu wrote: > > > > > > On Mon, Jun 10, 2019 at 06:41:31AM -0700, Andy Lutomirski wrote: > > > > > > > > > > > > > On Jun 9, 2019, at 11:02 PM, Fenghua Yu wrote: > > > > > > > > > >> On Sun, Jun 09, 2019 at 09:24:18PM -0700, Andy Lutomirski wrote: > > > > >>> On Sun, Jun 9, 2019 at 9:02 PM Fenghua Yu wrote: > > > > >>> > > > > >>>> On Sat, Jun 08, 2019 at 03:50:32PM -0700, Andy Lutomirski wrote: > > > > >>>>> On Fri, Jun 7, 2019 at 3:10 PM Fenghua Yu wrote: > > > > >>>>> > > > > >>>>> C0.2 state in umwait and tpause instructions can be enabled or disabled > > > > >>>>> on a processor through IA32_UMWAIT_CONTROL MSR register. > > > > >>>>> > > > > >>>>> By default, C0.2 is enabled and the user wait instructions result in > > > > >>>>> lower power consumption with slower wakeup time. > > > > >>>>> > > > > >>>>> But in real time systems which require faster wakeup time although power > > > > >>>>> savings could be smaller, the administrator needs to disable C0.2 and all > > > > >>>>> C0.2 requests from user applications revert to C0.1. > > > > >>>>> > > > > >>>>> A sysfs interface "/sys/devices/system/cpu/umwait_control/enable_c02" is > > > > >>>>> created to allow the administrator to control C0.2 state during run time. > > > > >>>> > > > > >>>> This looks better than the previous version. I think the locking is > > > > >>>> still rather confused. You have a mutex that you hold while changing > > > > >>>> the value, which is entirely reasonable. But, of the code paths that > > > > >>>> write the MSR, only one takes the mutex. > > > > >>>> > > > > >>>> I think you should consider making a function that just does: > > > > >>>> > > > > >>>> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0); > > > > >>>> > > > > >>>> and using it in all the places that update the MSR. The only thing > > > > >>>> that should need the lock is the sysfs code to avoid accidentally > > > > >>>> corrupting the value, but that code should also use WRITE_ONCE to do > > > > >>>> its update. > > > > >>> > > > > >>> Based on the comment, the illustrative CPU online and enable_c02 store > > > > >>> functions would be: > > > > >>> > > > > >>> umwait_cpu_online() > > > > >>> { > > > > >>> wrmsr(MSR_IA32_UMWAIT_CONTROL, READ_ONCE(umwait_control_cached), 0); > > > > >>> return 0; > > > > >>> } > > > > >>> > > > > >>> enable_c02_store() > > > > >>> { > > > > >>> mutex_lock(&umwait_lock); > > > > >>> umwait_control_c02 = (u32)!c02_enabled; > > > > >>> WRITE_ONCE(umwait_control_cached, 2 | get_umwait_control_max_time()); > > > > >>> on_each_cpu(umwait_control_msr_update, NULL, 1); > > > > >>> mutex_unlock(&umwait_lock); > > > > >>> } > > > > >>> > > > > >>> Then suppose umwait_control_cached = 100000 initially and only CPU0 is > > > > >>> running. Admin change bit 0 in MSR from 0 to 1 to disable C0.2 and is > > > > >>> onlining CPU1 in the same time: > > > > >>> > > > > >>> 1. On CPU1, read umwait_control_cached to eax as 100000 in > > > > >>> umwait_cpu_online() > > > > >>> 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store() > > > > >>> 3. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online() > > > > >>> 4. On CPU0, wrmsr with 100001 in enabled_c02_store() > > > > >>> > > > > >>> The result is CPU0 and CPU1 have different MSR values. > > > > >> > > > > >> Yes, but only transiently, because you didn't finish your example. > > > > >> > > > > >> Step 5: enable_c02_store() does on_each_cpu(), and CPU 1 gets updated. > > > > > > > > > > There is no sync on wrmsr on CPU0 and CPU1. > > > > > > > > What do you mean by sync? > > > > > > > > > So a better sequence to > > > > > describe the problem is changing the order of wrmsr: > > > > > > > > > > 1. On CPU1, read umwait_control_cached to eax as 100000 in > > > > > umwait_cpu_online() > > > > > 2. On CPU0, write 100001 to umwait_control_cached in enable_c02_store() > > > > > 3. On CPU0, wrmsr with 100001 in on_each_cpu() in enabled_c02_store() > > > > > 4. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online() > > > > > > > > > > So CPU1 and CPU0 have different MSR values. This won't be transient. > > > > > > > > You are still ignoring the wrmsr on CPU1 due to on_each_cpu(). > > > > > > > > > > Initially umwait_control_cached is 100000 and CPU0 is online while CPU1 > > > is going to be online: > > > > > > 1. On CPU1, cpu_online_mask=0x3 in start_secondary() > > > 2. On CPU1, read umwait_control_cached to eax as 100000 in umwait_cpu_online() > > > 3. On CPU0, write 100001 to umwait_control_cached in enable_c02_store() > > > 4. On CPU0, execute one_each_cpu() in enabled_c02_store(): > > > wrmsr with 100001 on CPU0 > > > wrmsr with 100001 on CPU1 > > > 5. On CPU1, wrmsr with eax=100000 in umwaint_cpu_online() > > > > > > So the MSR is 100000 on CPU1 and 100001 on CPU0. The MSRs are different on > > > the CPUs. > > > > > > Is this a right sequence to demonstrate locking issue without the mutex > > > locking? > > > > > > > Fair enough. I would fix it differently, though: > > > > static void update_this_cpu_umwait_msr(void) > > { > > WARN_ON_ONCE(!irqs_disabled()); /* or local_irq_save() */ > > > > /* We need to prevent umwait_control from being changed *and* > > completing its WRMSR between our read and our WRMSR. By turning IRQs > > off here, we ensure that no sysfs write happens on this CPU and we > > also make sure that any concurrent sysfs write from a different CPU > > will not finish updating us via IPI until we're done. */ > > wrmsrl(MSR_..., READ_ONCE(umwait_control), 0); > > } > > If no other objections, then I will keep the current mutex lock/unlock to > protect wrmsr and the umwait_control_cached variable. > I don't think that's sufficient. In your current code, you hold the mutex in some places and not in others, and there's no explanation. And I think you're relying on the IRQs-off protection in at least one code path already, so you're not gaining any simplicity. At the very least, you need to add some extensive comments everywhere if you want to keep the mutex, but I think it's simpler and clearer if you just use the same logic everywhere, for example, as I proposed above.