Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp7763267rwd; Tue, 6 Jun 2023 16:15:27 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6DCJ7F/vk8tHiuGAit0tOARrir9XpDlJQTfpOV8+emlmZBQUJeHo0rkjFF8o4KyZQdF1XL X-Received: by 2002:a05:6e02:151:b0:335:7e65:dfa with SMTP id j17-20020a056e02015100b003357e650dfamr4761798ilr.30.1686093327718; Tue, 06 Jun 2023 16:15:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686093327; cv=none; d=google.com; s=arc-20160816; b=G9//mTjK4uuqXC6YaD/CN8HoJbuPiQc+gR2XoHUViB+l16zlmnd/xXB4HD0tkeKCXD 9VScTyeWYu+dxwMKAGyTLgwgdY2GdZVYoOOh79Bo58HwL51lm4bmuRHFsxAZhFIMCCYD F85JtnGfeEW7CGIoNgKxl4xKiqASwqZ3xCNeNeyok2jNMP+10qebU1qCCFSmC5ewunn7 yJ8LVwiiFSfWqoUi9HvJUWoNtkv7VagMK/dCbs59Zr7w2mTrPGS/pc4xFQ6e71XF50Sl J+zwzC3MAoiwSu/FxQ+X+XEru9xLcNHuMYIyhHqrrb7lY8K4wDsC8XUMsLvIOSlHghG1 KQ9A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=KDNeipcrG/FC01ibfxC2YJ7aaXE1cUL268eOBCkQVsw=; b=klYE3/IZT3fUg/Ks1k0yD8OzvZA+fcGai/LHjiJ84qgAOzYzHL2257K8P8ZEID531z ORXoRJpn5cOosqScZ8nvv+BAc+3y72kpLhIcJkGu39c6SQlGFjjXp9wGKDVd0jnHuicP J2qFNtg7J34u1yM72EEIvk+cl02JrthLrBJ1SrlQGFHpr7VglhDjUwTC9Wsfl3jxByga EfN4YrxQT4vlQf9ymtTxCYywXrN+sxoyQ4F/ZKfjxfw7SvUWhWuZ+kllPR4nrdcKP0c+ 0Dkh41vhrzw5fiLaW1Pd+NniMC4VgKCiaLyJv7uTSu7m9QKfyO9vkNmIZfSlcwUDRY6/ fqCw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=M4fVxmsz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y30-20020a63b51e000000b0052cb473f95esi7746941pge.214.2023.06.06.16.15.13; Tue, 06 Jun 2023 16:15:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20221208 header.b=M4fVxmsz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240176AbjFFXHK (ORCPT + 99 others); Tue, 6 Jun 2023 19:07:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46658 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240177AbjFFXHI (ORCPT ); Tue, 6 Jun 2023 19:07:08 -0400 Received: from mail-pl1-x649.google.com (mail-pl1-x649.google.com [IPv6:2607:f8b0:4864:20::649]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7AC04E7E for ; Tue, 6 Jun 2023 16:07:07 -0700 (PDT) Received: by mail-pl1-x649.google.com with SMTP id d9443c01a7336-1b0427acfc3so24583855ad.0 for ; Tue, 06 Jun 2023 16:07:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1686092827; x=1688684827; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=KDNeipcrG/FC01ibfxC2YJ7aaXE1cUL268eOBCkQVsw=; b=M4fVxmszIMy2OeC29oPt4CNRr6vvBhxdssqEX54sx7F5yZY9AlT8PM6trIsIsNYh2m zhS0OIGFWvNZ7CDdZM/l1mWk4Q3ogJHa5KGU77Mz+rkZxTuc9UNuOLvgPFry3K7rNfNL CpT27bjhgzhhXRkfEw3hSK0DduoB8srC5cm4VK1POAZa/PSqcymJP3g72A1d5Ko0Hu5S cgoEu4DfN5hLVcrJtsFfVQ/ehgnk68jH83wCB9j0FHjwIbLyY/E8bOBqRO4Mrey37mEJ fXiDnGGij+Yg1ep/nBUwUuH38iLQJ7bOG7Ir77Li/WlIc+xIeQjXiLaTUhzRyo915FkG /9hQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1686092827; x=1688684827; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=KDNeipcrG/FC01ibfxC2YJ7aaXE1cUL268eOBCkQVsw=; b=W9/E/hI41nNwsoSkgyxfWFvCyYgB00HozFkqfHjcXsoyOCan8HDL1OxdVkbhAhNAAH l1DnLHHJSkuQ/j45fYQ3oCC4wKAg+nVIE1eX1HLITzq7TvIENRjWJ5MquGEutA44BEoj i9UuxkNYpf96thpFs/TZh2u0g/j/FwXWpOJogm4EU42FrtazeyjwRACfOGDwaK49UuBT zt0LyKWKDrZ2MinDpJ6JiVjyfKEt59j0ZuP5GkLgit1BtYlZFzWICPzJtGtgEWKM8js5 lXsM1CPnpipd2eg1uAx79zF1AUst81fYy6U/vRIcfSIed+rZPVXBRcSnuX02qL4sdJJs JnaA== X-Gm-Message-State: AC+VfDyYt+kUB86YHXTY4Skn1GzOXMMvMDFKQ3b0LwbZAeh99jS1PNJb El1iAbAywMIZvjc5MfTk/hmRKCDJXeI= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a17:90a:bb18:b0:253:3eb4:c1f2 with SMTP id u24-20020a17090abb1800b002533eb4c1f2mr945446pjr.5.1686092826879; Tue, 06 Jun 2023 16:07:06 -0700 (PDT) Date: Tue, 6 Jun 2023 16:07:05 -0700 In-Reply-To: Mime-Version: 1.0 References: <20230605004334.1930091-1-mizhang@google.com> Message-ID: Subject: Re: [PATCH] KVM: x86/mmu: Remove KVM MMU write lock when accessing indirect_shadow_pages From: Sean Christopherson To: Mingwei Zhang Cc: Jim Mattson , Paolo Bonzini , "H. Peter Anvin" , kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon Content-Type: text/plain; charset="us-ascii" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jun 06, 2023, Mingwei Zhang wrote: > > > > > > > > I don't understand the need for READ_ONCE() here. That implies that > > > > there is something tricky going on, and I don't think that's the case. > > > > > > READ_ONCE() is just telling the compiler not to remove the read. Since > > > this is reading a global variable, the compiler might just read a > > > previous copy if the value has already been read into a local > > > variable. But that is not the case here... > > > > > > Note I see there is another READ_ONCE for > > > kvm->arch.indirect_shadow_pages, so I am reusing the same thing. > > > > I agree with Jim, using READ_ONCE() doesn't make any sense. I suspect it may have > > been a misguided attempt to force the memory read to be as close to the write_lock() > > as possible, e.g. to minimize the chance of a false negative. > > Sean :) Your suggestion is the opposite with Jim. He is suggesting > doing nothing, but your suggestion is doing way more than READ_ONCE(). Not really. Jim is asserting that the READ_ONCE() is pointless, and I completely agree. I am also saying that I think there is a real memory ordering issue here, and that it was being papered over by the READ_ONCE() in kvm_mmu_pte_write(). > > So I think this? > > Hmm. I agree with both points above, but below, the change seems too > heavyweight. smp_wb() is a mfence(), i.e., serializing all > loads/stores before the instruction. Doing that for every shadow page > creation and destruction seems a lot. No, the smp_*b() variants are just compiler barriers on x86. > In fact, the case that only matters is '0->1' which may potentially > confuse kvm_mmu_pte_write() when it reads 'indirect_shadow_count', but > the majority of the cases are 'X => X + 1' where X != 0. So, those > cases do not matter. So, if we want to add barriers, we only need it > for 0->1. Maybe creating a new variable and not blocking > account_shadow() and unaccount_shadow() is a better idea? > > Regardless, the above problem is related to interactions among > account_shadow(), unaccount_shadow() and kvm_mmu_pte_write(). It has > nothing to do with the 'reexecute_instruction()', which is what this > patch is about. So, I think having a READ_ONCE() for > reexecute_instruction() should be good enough. What do you think. The reexecute_instruction() case should be fine without any fanciness, it's nothing more than a heuristic, i.e. neither a false positive nor a false negative will impact functional correctness, and nothing changes regardless of how many times the compiler reads the variable outside of mmu_lock. I was thinking that it would be better to have a single helper to locklessly access indirect_shadow_pages, but I agree that applying the barriers to reexecute_instruction() introduces a different kind of confusion. Want to post a v2 of yours without a READ_ONCE(), and I'll post a separate fix for the theoretical kvm_mmu_pte_write() race? And then Paolo can tell me that there's no race and school me on lockless programming once more ;-)