Date: Thu, 11 Jan 2024 08:00:05 -0800
In-Reply-To: <832697b9-3652-422d-a019-8c0574a188ac@proxmox.com>
X-Mailing-List: linux-kernel@vger.kernel.org
References: <832697b9-3652-422d-a019-8c0574a188ac@proxmox.com>
Subject: Re: Temporary KVM guest hangs connected to KSM and NUMA balancer
From: Sean Christopherson
To: Friedrich Weber
Cc: kvm@vger.kernel.org, Paolo Bonzini, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Thu, Jan 04, 2024, Friedrich Weber wrote:
> Hi,
>
> some of our (Proxmox VE) users have been reporting [1] that guests
> occasionally become unresponsive with high CPU usage for some time
> (varying between ~1 and more than 60 seconds). After that time, the
> guests come back and continue running fine. Windows guests seem most
> affected (not responding to pings during the hang, RDP sessions time
> out). But we also got reports about Linux guests. This issue was not
> present while we provided (host) kernel 5.15 and was first reported when
> we rolled out a kernel based on 6.2. The reports seem to concern NUMA
> hosts only. Users reported that the issue becomes easier to trigger the
> more memory is assigned to the guests. Setting mitigations=off was
> reported to alleviate (but not eliminate) the issue. The issue seems to
> disappear after disabling KSM.
>
> We can reproduce the issue with a Windows guest on a NUMA host, though
> only occasionally and not very reliably. Using a bpftrace script like
> [7] we found the hangs to correlate with long-running invocations of
> `task_numa_work` (more than 500ms), suggesting a connection to the NUMA
> balancer. Indeed, we can't reproduce the issue after disabling the NUMA
> balancer with `echo 0 > /proc/sys/kernel/numa_balancing` [2] and got a
> user confirming this fixes the issue for them [3].
>
> Since the Windows reproducer is not very stable, we tried to find a
> Linux guest reproducer and have found one (described below [0]) that
> triggers a very similar (hopefully the same) issue. The reproducer
> triggers the hangs also if the host is on current Linux 6.7-rc8
> (610a9b8f). A kernel bisect points to the following as the commit
> introducing the issue:
>
> f47e5bbb ("KVM: x86/mmu: Zap only TDP MMU leafs in zap range and
> mmu_notifier unmap")
>
> which is why I cc'ed Sean and Paolo. Because of the possible KSM
> connection I cc'ed Andrew and linux-mm.
>
> Indeed, on f47e5bbb~1 = a80ced6e ("KVM: SVM: fix panic on out-of-bounds
> guest IRQ") the reproducer does not trigger the hang, and on f47e5bbb it
> triggers the hang.
>
> Currently I don't know enough about the KVM/KSM/NUMA balancer code to
> tell how the patch may trigger these issues. Any idea who we could ask
> about this, or how we could further debug this would be greatly appreciated!

This is a known issue. It's mostly a KVM bug[1][2] (fix posted[3]), but I suspect
that a bug in the dynamic preemption model logic[4] is also contributing to the
behavior by causing KVM to yield on preempt models where it really shouldn't.

[1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
[2] https://lore.kernel.org/all/bug-218259-28872@https.bugzilla.kernel.org%2F
[3] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com
[4] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
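
For anyone wanting to check whether a host is in the configuration described in the report (NUMA balancing enabled and KSM running), a minimal sketch follows. It only reads the two knobs mentioned in the thread, `/proc/sys/kernel/numa_balancing` and `/sys/kernel/mm/ksm/run`. The helper names `read_knob` and `host_matches_report` are made up for illustration, and the check does not tell you whether the kernel contains the bisected commit or the posted fix.

```python
from pathlib import Path

# Kernel knobs referenced in the thread, relative to the filesystem root.
NUMA_BALANCING = "proc/sys/kernel/numa_balancing"
KSM_RUN = "sys/kernel/mm/ksm/run"


def read_knob(root, rel):
    """Return the knob's integer value, or None if missing or unreadable."""
    try:
        return int((Path(root) / rel).read_text().strip())
    except (OSError, ValueError):
        return None


def host_matches_report(root="/"):
    """True if NUMA balancing is enabled (nonzero) and KSM is running (1).

    `root` is a hypothetical parameter that lets the check run against a
    fake directory tree instead of the live host.
    """
    numa = read_knob(root, NUMA_BALANCING)
    ksm = read_knob(root, KSM_RUN)
    return bool(numa) and ksm == 1
```

Calling `host_matches_report()` with no arguments inspects the running host. A `False` result may simply mean one of the files is absent, for example on a kernel built without KSM.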