Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp400074pxu; Tue, 5 Jan 2021 14:28:06 -0800 (PST) X-Google-Smtp-Source: ABdhPJx8k3N6Ns4SiOfg1bZ7YCX1cDtdZ/0hGpedgFbpQAJArIGFHjV351wILaoR/5nErJ2V3T8F X-Received: by 2002:a05:6402:7d7:: with SMTP id u23mr1825105edy.325.1609885686633; Tue, 05 Jan 2021 14:28:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1609885686; cv=none; d=google.com; s=arc-20160816; b=rWGiF9T645N9J4NLYTdyT0PPA7aKUZiGt9C05RjjRDryVYi6FS9pirrEZeSKJE1tM0 0YCGW1cRSazHKJO38Kmij5LDK1viLp4sYP0hEsONyQKDs3S47zczik8HYSE1j1+gyRq3 kakIDW3HSR8auhkfAClRipayWFcXRDa21kltCLFvMKH/IhGb4bL+jCypYms9Sy1pUzYr /YwnPpo6/pbJgcqrf0oM9frugFly2BclcFfPeSx0d/DHW7DVpScMLpitDDU3EMu73E3H GviCwlOrEXqZT+4ZaioaP1FTRE7UxrpemonExybr6gpLlbb1f8kXMMgq2hP9rgcMNMC0 E/Zg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=NI7JRZUNfaLXyoeGZ8aqowVREIgk5DG9F7C0+4KUbB0=; b=wsqCmBoPJmwZ1G47WKkV9AfP3zHEpsbCyWKgR78XUAVywgIHtvVmQaC2RiceoZ23ik Dc4ZKgn09ROrgvizXaM+rz4TYsmuZsgLY/GVsjUM+QxlDNbZbr1rLuQJ+VrK9Ae/n/B7 SEtf4+p6D4IwjZ9DaRT6MRvYxLVc5WtjOeHLTA4NVZxYmXhjfzIh+t4KfR5yHYGy5CEZ 36YavBxgHXhuMjloUqs1QDM9Hbw2m225ZTmljBlDqARquJZRLzTfIJGduyJToezBgPf1 E7o4VqmEns6mpnuGAH8aLgwX9/AoCrEgeX4fppFtFvYCQHtImLMLXR24OCaQVVmzmgGV aTFg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=cKXPT0s2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id a14si191832eds.103.2021.01.05.14.27.43; Tue, 05 Jan 2021 14:28:06 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=cKXPT0s2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730733AbhAETWl (ORCPT + 99 others); Tue, 5 Jan 2021 14:22:41 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46560 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730398AbhAETWl (ORCPT ); Tue, 5 Jan 2021 14:22:41 -0500 Received: from mail-io1-xd2d.google.com (mail-io1-xd2d.google.com [IPv6:2607:f8b0:4864:20::d2d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B5131C061795 for ; Tue, 5 Jan 2021 11:22:00 -0800 (PST) Received: by mail-io1-xd2d.google.com with SMTP id z5so373012iob.11 for ; Tue, 05 Jan 2021 11:22:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=NI7JRZUNfaLXyoeGZ8aqowVREIgk5DG9F7C0+4KUbB0=; b=cKXPT0s20xckt3pn4JszbxoqroM4ZFtl1KTkYSLpKTVM4pxFdo8qyExlNNGT8BYpMb NZ2RIqDP3gWJ6bkmP4nvlIVvZXmWQ5joEet2CujZKwpWcJ7N5n/auk1BW+/+tVoS+/s4 lT8gVE8FyVV7Ciz+NDT9hiz3e4292ghz1R8TSVmowELFoufgEE91l6yLtA+GErZZEkDD YbByOYNSLpJO+b4s+aYQ+miKLn/QmnSxdVQZ0VKNvimxO5HFTijBWlM7SZ7oh+QA3lXf zI3VjqvvmueJ2mny9sPweOKYQ1sjfncPveIewLrgy3QOtiSHODxs+Eow1AFox5W0KogK VNpA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=NI7JRZUNfaLXyoeGZ8aqowVREIgk5DG9F7C0+4KUbB0=; b=V8OFD6FDHnjeQwdaqSCAffw9i+myM0vs28E/j0RxozNCPirfvCBXB5clpBa73ytj57 RQ8CdkQk7VHTWH7WHyXqUicoi1uU/Ktl9At7Z3flc5xOFnA2D7VrPHTUoum2W2IkT6/J NGZDYLTU6uOEuet2mtTneFJh+YNt5Rem1AB3kqCwARzZ7372tb/MtNkILpxBjLnfE86d hHp04mgCHGi/STj+h8NIVaZmk5ntZd7JsXBrQlkEau89+vjIzQ7PIRk1vwpyxpx1mKOB K/BiVBo+as/55+YpzN/Zsb36Uc9/fXigYjv4ECbjKGJUVHGlWSdimR6yhIL2XC94DAlf Wu+g== X-Gm-Message-State: AOAM531w+lNRnqarX7H+v/1LO1S9b4z9S+lYxWKF0UHufsyIHkNRzyUe oE+TJStXZMCaF9xyB5CuPZaFwO4BxpKCAB38T6Vt/Q== X-Received: by 2002:a02:cf9a:: with SMTP id w26mr958587jar.25.1609874519283; Tue, 05 Jan 2021 11:21:59 -0800 (PST) MIME-Version: 1.0 References: <4bf6fcae-20e7-3eae-83ec-51fb52110487@oracle.com> <8A352C2E-E7D2-4873-807F-635A595DCAEF@gmail.com> In-Reply-To: From: Ben Gardon Date: Tue, 5 Jan 2021 11:21:48 -0800 Message-ID: Subject: Re: reproducible BUG() in kvm_mmu_get_root() in TDP MMU To: Sean Christopherson Cc: Paolo Bonzini , leohou1402 , "maciej.szmigiero@oracle.com" , "linux-kernel@vger.kernel.org" , "kvm@vger.kernel.org" , "cannonmatthews@google.com" , "peterx@redhat.com" , "pshier@google.com" , "pfeiner@google.com" , "junaids@google.com" , "jmattson@google.com" , "yulei.kernel@gmail.com" , "kernellwp@gmail.com" , "vkuznets@redhat.com" Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jan 5, 2021 at 10:46 AM Sean Christopherson wrote: > > On Tue, Jan 05, 2021, Paolo Bonzini wrote: > > On 05/01/21 18:49, Ben Gardon wrote: > > > for_each_tdp_mmu_root(kvm, root) { > > > kvm_mmu_get_root(kvm, root); > > > > > > kvm_mmu_put_root(kvm, root); > > > } > > > > > > In these cases the get and put root calls are there to ensure that the > > > root is not freed while the function is running, however they do this > > > too well. If the put root call reduces the root's root_count to 0, it > > > should be removed from the roots list and freed before the MMU lock is > > > released. However the above pattern never bothers to free the root. > > > The following would fix this bug: > > > > > > -kvm_mmu_put_root(kvm, root); > > > +if (kvm_mmu_put_root(kvm, root)) > > > + kvm_tdp_mmu_free_root(kvm, root); > > > > Is it worth writing a more complex iterator struct, so that > > for_each_tdp_mmu_root takes care of the get and put? > > Ya, and maybe with an "as_id" variant to avoid the get/put? Not sure that's a > worthwhile optimization though. I'll see about adding such an iterator. I don't think avoiding the get / put is really worthwhile in this case since they're cheap operations and putting them in the iterator makes it less obvious that they're missing if those functions ever need to yield. > > On a related topic, there are a few subtleties with respect to > for_each_tdp_mmu_root() that we should document/comment. The flows that drop > mmu_lock while iterating over the roots don't protect against the list itself > from being modified. E.g. the next entry could be deleted, or a new root > could be added. I think I've convinced myself that there are no existing bugs, > but we should document that the exact current behavior is required for > correctness. > > The use of "unsafe" list_for_each_entry() in particular is unintuitive, as using > the "safe" variant would dereference a deleted entry in the "next entry is > deleted" scenario. > > And regarding addomg a root, using list_add_tail() instead of list_add() in > get_tdp_mmu_vcpu_root() would cause iteration to visit a root that was added > after the iteration started (though I don't think this would ever be problematic > in practice?). A lot of these observations are safe because the operations using this iterator only consider one root at a time and aren't really interested in the state of the list. Your point about the dangers of adding and removing roots while one of these functions is running is valid, but like the legacy / shadow MMU implementation, properties which need to be guaranteed across multiple roots need to be managed at a higher level. I believe that with the legacy / shadow MMU the creation of a new root, while enabling write protection dirty logging for example, could result in entries in the new root being considered, and others not considered. That's why we set the dirty logging flag on the memslot before we write protect SPTEs. I'm not sure where exactly to document these properties, but I agree it would be a good thing to do in a future patch set.