From: David Matlack
Date: Mon, 15 Nov 2021 11:29:40 -0800
Subject: Re: [PATCH] KVM: x86/mmu: Update number of zapped pages even if page list is stable
To: Sean Christopherson
Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Ben Gardon
References: <20211111221448.2683827-1-seanjc@google.com>

On Mon, Nov 15, 2021 at 11:23 AM Sean Christopherson wrote:
>
> On Mon, Nov 15, 2021, David Matlack wrote:
> > On Thu, Nov 11, 2021 at 2:14 PM Sean Christopherson wrote:
> > >
> > > When zapping obsolete pages, update the running count of zapped
> > > pages regardless of whether or not the list has become unstable
> > > due to zapping a shadow page with its own child shadow pages.
> > > If the VM is backed by mostly 4kb pages, KVM can zap an absurd
> > > number of SPTEs without bumping the batch count and thus without
> > > yielding. In the worst case scenario, this can cause an RCU stall.
> > >
> > >  rcu: INFO: rcu_sched self-detected stall on CPU
> > >  rcu:     52-....: (20999 ticks this GP) idle=7be/1/0x4000000000000000
> > >           softirq=15759/15759 fqs=5058
> > >  (t=21016 jiffies g=66453 q=238577)
> > >  NMI backtrace for cpu 52
> > >  Call Trace:
> > >  ...
> > >   mark_page_accessed+0x266/0x2f0
> > >   kvm_set_pfn_accessed+0x31/0x40
> > >   handle_removed_tdp_mmu_page+0x259/0x2e0
> > >   __handle_changed_spte+0x223/0x2c0
> > >   handle_removed_tdp_mmu_page+0x1c1/0x2e0
> > >   __handle_changed_spte+0x223/0x2c0
> > >   handle_removed_tdp_mmu_page+0x1c1/0x2e0
> > >   __handle_changed_spte+0x223/0x2c0
> > >   zap_gfn_range+0x141/0x3b0
> > >   kvm_tdp_mmu_zap_invalidated_roots+0xc8/0x130
> >
> > This is a useful patch but I don't see the connection with this
> > stall. The stall is detected in kvm_tdp_mmu_zap_invalidated_roots,
> > which runs after kvm_zap_obsolete_pages. How would rescheduling
> > during kvm_zap_obsolete_pages help?
>
> Ah shoot, I copy+pasted the wrong splat. The correct, relevant
> backtrace is:

Ok, that makes more sense :). Also, that was a soft lockup rather than
an RCU stall.

>  mark_page_accessed+0x266/0x2e0
>  kvm_set_pfn_accessed+0x31/0x40
>  mmu_spte_clear_track_bits+0x136/0x1c0
>  drop_spte+0x1a/0xc0
>  mmu_page_zap_pte+0xef/0x120
>  __kvm_mmu_prepare_zap_page+0x205/0x5e0
>  kvm_mmu_zap_all_fast+0xd7/0x190
>  kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10
>  kvm_page_track_flush_slot+0x5c/0x80
>  kvm_arch_flush_shadow_memslot+0xe/0x10
>  kvm_set_memslot+0x1a8/0x5d0
>  __kvm_set_memory_region+0x337/0x590
>  kvm_vm_ioctl+0xb08/0x1040
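
For readers who want the shape of the change being discussed: below is a
simplified sketch of the kvm_zap_obsolete_pages() batching loop in
arch/x86/kvm/mmu/mmu.c with the fix applied. It is an approximation, not
the verbatim patch; the loop structure and helper names
(BATCH_ZAP_PAGES, is_obsolete_sp(), __kvm_mmu_prepare_zap_page(),
zapped_obsolete_pages) are from a reading of the upstream code around
this era and may differ in detail, and the snippet naturally depends on
KVM-internal definitions rather than being standalone.

	/*
	 * Sketch of the batched zap loop after the fix; invalid-page
	 * handling and the original comments are elided.
	 */
	static void kvm_zap_obsolete_pages(struct kvm *kvm)
	{
		struct kvm_mmu_page *sp, *node;
		int nr_zapped, batch = 0;
		bool unstable;

	restart:
		list_for_each_entry_safe_reverse(sp, node,
		      &kvm->arch.active_mmu_pages, link) {
			/* Only pages made obsolete by the fast zap are targets. */
			if (!is_obsolete_sp(kvm, sp))
				break;

			/* Yield after a full batch so the loop can't hog the CPU. */
			if (batch >= BATCH_ZAP_PAGES &&
			    cond_resched_rwlock_write(&kvm->mmu_lock)) {
				batch = 0;
				goto restart;
			}

			unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
					&kvm->arch.zapped_obsolete_pages,
					&nr_zapped);
			/*
			 * The fix: count zapped SPTEs unconditionally.
			 * Previously batch was bumped only on the unstable
			 * path (a parent page was zapped and the walk had to
			 * restart), so a VM backed mostly by 4kb pages could
			 * zap leaf pages indefinitely without ever advancing
			 * the count, and the yield check above never fired.
			 */
			batch += nr_zapped;

			if (unstable)
				goto restart;
		}
	}

That also matches the corrected backtrace: the soft lockup is under
__kvm_mmu_prepare_zap_page() via kvm_mmu_zap_all_fast(), i.e. exactly
the loop above failing to reschedule.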