Received: by 2002:a05:6a10:a841:0:0:0:0 with SMTP id d1csp1624865pxy; Thu, 29 Apr 2021 10:46:47 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyWSxMqqDKscNNiKiFvmWdKoOpJ2H2eta9/5cLiIjw01dqBaoiLTcY3TYiAQ0JUP2CK9o7D X-Received: by 2002:a17:906:b251:: with SMTP id ce17mr1083880ejb.149.1619718407404; Thu, 29 Apr 2021 10:46:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1619718407; cv=none; d=google.com; s=arc-20160816; b=QxzvZ6vMyvLtCzSdwCUEmn8rwkCHRiCxAlDdzJUPowFPenQx3tVA95G7+3QDEoioVP pa6+/RZ5/K7MmFzbkRedqPd/DL1vLl0biKyCg+2aMOuR8fKH3gmL05aLVySOK2yaBUy9 tCSG2lQru5DVDzocA0UcLLiZ/YnBNRuhW2+WuBw8uOPOtC0wfOsgOiOxh32EIoFWsUnb esxx3dzrdpM3Vl9R5YUwrnWBCDoFHSMRHZgFkfpLwegjL7nubW/HYaW1guuC21hQILHd MATVdaixUjLZzC5OhSnvd+39CkmUO9bKiDXJAO6bdQW+/laT0sq/WYHDmqC3JpprCnTq 4KXQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=GbYmESgQnmInPmbhpZ7rWt4zsWsa0tFAztscShVWN9U=; b=hf7YAjD2cBLzS6mwLuYC9bfWGJnHD7wrNkHZRE8/SQNuWpygoF3gncHxrUHN9KJSYT SD+YS7UOz/i+uJFa7IRfyMocHmMHv25Z7MOkus4s9DB+pUGNHscOIHrURuc7mydXfxn1 aRGjLoMRssyPCGeNdNtaWPdCZo8j9lu6GENUrOI9UZn+hCMRfEWo7ZNjlSznDFd40nKd uHCxoP0wDNsXvNO7t/mdsp2e8d8cE/BGTJWVJo/vqACzq+h682iY9ZOa+Z9edVwmYOaJ 7sE9elYDh+So76shtDCX09doBs7dVMyfckbO+NJ43YWdsyMGo7/qZuaCot+F+zmrcV00 BL1w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=vo2hrfvn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id zg23si715023ejb.163.2021.04.29.10.46.22; Thu, 29 Apr 2021 10:46:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=vo2hrfvn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240968AbhD2RqK (ORCPT + 99 others); Thu, 29 Apr 2021 13:46:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55836 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240931AbhD2RqJ (ORCPT ); Thu, 29 Apr 2021 13:46:09 -0400 Received: from mail-ot1-x336.google.com (mail-ot1-x336.google.com [IPv6:2607:f8b0:4864:20::336]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4620BC06138C for ; Thu, 29 Apr 2021 10:45:21 -0700 (PDT) Received: by mail-ot1-x336.google.com with SMTP id d3-20020a9d29030000b029027e8019067fso60216382otb.13 for ; Thu, 29 Apr 2021 10:45:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=GbYmESgQnmInPmbhpZ7rWt4zsWsa0tFAztscShVWN9U=; b=vo2hrfvnOI6K5Ln4dS+wWzLtKsRVK+bII/RzGZi/zH190Occ5Gd1Ik+idDOaOpnbta 3tO3kUKyXRI4aBXIlJvFthSwTUmxRPSBoX8Mfw+PdFgqvn8fBB+6wR3GceKKQV66mJKm EQmKQClJUnZRdiy1R/LlPTUbXPcqlbmypPtTLlOResxjAy2PLcMqkHMuO/c0QBLd1dPX AdoUVG7nWCgh6hO+xG5DpB8AIHPuNzVYkW7B84gY0zdzKezbb4LvklN/3zzC4ghTMha6 dH84LoDHmRSAM+YwqE6XmtbWHLg/QNmi9ZYhst4CaC9uwj7PgZwudqsFQ4vyUgX52NLS W9/g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=GbYmESgQnmInPmbhpZ7rWt4zsWsa0tFAztscShVWN9U=; b=NBm0dOF9rrreivynsKtdIWEXu4aa00fWmQxnEFxTy6duKeffapviNeugCDbbx2Fp2N DD+z1ZcXSDBCGU8dIof1FgMH9qwhBr3JcsGL9FbNMqYF48CJ7zDJZEcWx+oZYxVZ3g7S i5iXrrlu9vO8hFzi7UD03rAx56/CzNR7fRmfaMe/k2PznojWv54NEn2eMIl+oqCBf6IT U/EwBVIlz4GIvh1aAZITthxue8h2Zeyc+VZB1eW0marbzaxFR1EjXiV1y3askyc5u6NY H/Ei89k9eNxiOuRCpmT7S8k13WOWi5DcK8gcXo7CuSE4XXGKtVtcy2J1x4QjZt+SuxFl G+Aw== X-Gm-Message-State: AOAM530vPn5C9m5VGxtcTO0ziWNweyS5g+ZRgwbgCBngy5DjwVT9+MMr P4XS2v7q/bIEVf3NO5R9ps1751+J5ilAkrl3Qx5sQIVs4wE= X-Received: by 2002:a9d:7857:: with SMTP id c23mr453371otm.208.1619718318872; Thu, 29 Apr 2021 10:45:18 -0700 (PDT) MIME-Version: 1.0 References: <20210427223635.2711774-1-bgardon@google.com> <20210427223635.2711774-6-bgardon@google.com> <997f9fe3-847b-8216-c629-1ad5fdd2ffae@redhat.com> <5b4a0c30-118c-da1f-281c-130438a1c833@redhat.com> <16b2f0f3-c9a8-c455-fff0-231c2fe04a8e@redhat.com> <623c2305-91ae-4617-357e-fe7d9147b656@redhat.com> In-Reply-To: <623c2305-91ae-4617-357e-fe7d9147b656@redhat.com> From: Ben Gardon Date: Thu, 29 Apr 2021 10:45:07 -0700 Message-ID: Subject: Re: [PATCH 5/6] KVM: x86/mmu: Protect kvm->memslots with a mutex To: Paolo Bonzini Cc: Sean Christopherson , LKML , kvm , Peter Xu , Peter Shier , Junaid Shahid , Jim Mattson , Yulei Zhang , Wanpeng Li , Vitaly Kuznetsov , Xiao Guangrong Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Apr 29, 2021 at 12:02 AM Paolo Bonzini wrote: > > On 29/04/21 02:40, Sean Christopherson wrote: > > On Thu, Apr 29, 2021, Paolo Bonzini wrote: > >> it's not ugly and it's still relatively easy to explain. > > > > LOL, that's debatable. > > From your remark below it looks like we have different priorities on > what to avoid modifying. > > I like the locks to be either very coarse or fine-grained enough for > them to be leaves, as I find that to be the easiest way to avoid > deadlocks and complex hierarchies. For this reason, I treat unlocking > in the middle of a large critical section as "scary by default"; you > have to worry about which invariants might be required (in the case of > RCU, which pointers might be stored somewhere and would be invalidated), > and which locks are taken at that point so that the subsequent relocking > would change the lock order from AB to BA. The idea of dropping the srcu read lock to allocate the rmaps scares me too. I have no idea what it protects, in addition to the memslots, and where we might be making assumptions about things staying valid because of it. Is there anywhere that we enumerate the things protected by that SRCU? I wonder if we have complete enough annotations for the SRCU / lockdep checker to find a problem if we were dropping the SRCU read lock in a bad place. > > This applies to every path leading to the unlock/relock. So instead > what matters IMO is shielding architecture code from the races that Ben > had to point out to me, _and the possibility to apply easily explained > rules_ outside more complex core code. > > So, well, "relatively easy" because it's indeed subtle. But if you > consider what the locking rules are, "you can choose to protect > slots->arch data with this mutex and it will have no problematic > interactions with the memslot copy/update code" is as simple as it can get. > > >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > >> index 2799c6660cce..48929dd5fb29 100644 > >> --- a/virt/kvm/kvm_main.c > >> +++ b/virt/kvm/kvm_main.c > >> @@ -1377,16 +1374,17 @@ static int kvm_set_memslot(struct kvm *kvm, > >> goto out_slots; > >> update_memslots(slots, new, change); > >> - slots = install_new_memslots(kvm, as_id, slots); > >> + install_new_memslots(kvm, as_id, slots); > >> kvm_arch_commit_memory_region(kvm, mem, old, new, change); > >> - > >> - kvfree(slots); > >> return 0; > >> out_slots: > >> - if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) > >> + if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) { > >> + slot = id_to_memslot(slots, old->id); > >> + slot->flags &= ~KVM_MEMSLOT_INVALID; > > > > Modifying flags on an SRCU-protect field outside of said protection is sketchy. > > It's probably ok to do this prior to the generation update, emphasis on > > "probably". Of course, the VM is also likely about to be killed in this case... > > > >> slots = install_new_memslots(kvm, as_id, slots); > > > > This will explode if memory allocation for KVM_MR_MOVE fails. In that case, > > the rmaps for "slots" will have been cleared by kvm_alloc_memslot_metadata(). > > I take your subsequent reply as a sort-of-review that the above approach > works, though we may disagree on its elegance and complexity. I'll try Paolo's suggestion about using a second dup_slots to avoid backing the higher level rmap arrays with dynamic memory and send out a V2. > > Paolo > > > The SRCU index is already tracked in vcpu->srcu_idx, why not temporarily drop > > the SRCU lock if activate_shadow_mmu() needs to do work so that it can take > > slots_lock? That seems simpler and I think would avoid modifying the common > > memslot code. > > > > kvm_arch_async_page_ready() is the only path for reaching kvm_mmu_reload() that > > looks scary, but that should be impossible to reach with the correct MMU context. > > We could always and an explicit sanity check on the rmaps being avaiable. >