Received: by 2002:a05:6a10:eb17:0:0:0:0 with SMTP id hx23csp3939911pxb; Tue, 7 Sep 2021 10:49:09 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzm9zd/lk/yzGgbY+PsTfIccLoyrSJwTF6g1/vMm2J3dtN0tK9NjNbm47UQ4TRkXXdg8TZv X-Received: by 2002:a17:906:3bc9:: with SMTP id v9mr1327924ejf.168.1631036948973; Tue, 07 Sep 2021 10:49:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1631036948; cv=none; d=google.com; s=arc-20160816; b=0iKLDobogYUMUJUtcDp1QmT4+hQkv7WsX+RUu5B6wOVeiyEgTDqtJ9aRqcmwj95+bo 5D4zAP67Y4ZkFPeC/BtuC73rnYFz0DSLnMZW7Qm7omvXyNVe8kHCvG6VEgYpCE7acy7a Kf24OkHKADLtGozXYuO78ZUj38fVUmmP27jHNmzjEpeyrKdte90zVoV1JXKecB0u6leh w1ehiL0h9imcFxkKW0ILQedmKgXxzebXrLhy1serfcs46O2QHeRTtJ6Bj/bzyiU3K0aW lnJxvaZHCSQUJQfHfkGxptIxDV8eGpCphE9jUX27RfFqDgni50nXDeSJATLRkWzRNF46 WD7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=vjTxGNoImuIhDp4ZU4F0M0RmU0pYqlzioHk3BE15X7E=; b=eoHWZV3Hr5ky44YPIdadvgkr7kAIfknsa21PNkExhJpTCZpvCY9bWum7xp0j18tD+m TJbB+el33MKBtOoZKISmw1gAhbWnOAsgDsqr9x3car0690hxsY8JyPkLe3yxXXNYyxjq ycOldDovQ1X6YPt38hR1NXTnR9FFExA/yHqsIeA5boz//caC2LA6BqHuBFdRn1MszfFb 5V85vAED6YgrCfLDk44KtKzGUc+gJ+5adtVkUHamql3aXnj0rGbkSDTMvZsX3Und/gP3 +gZ7SxJPyihuNvMffnpjCIqfl5cRaRoWcsYXW0zksRluPuIKntzruXN8AflkdtyjebbU 1KeA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=VIPyCSzm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id q12si11365942edw.452.2021.09.07.10.48.45; Tue, 07 Sep 2021 10:49:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=VIPyCSzm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242931AbhIGRbN (ORCPT + 99 others); Tue, 7 Sep 2021 13:31:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57032 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229914AbhIGRbM (ORCPT ); Tue, 7 Sep 2021 13:31:12 -0400 Received: from mail-pf1-x42b.google.com (mail-pf1-x42b.google.com [IPv6:2607:f8b0:4864:20::42b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4BEDBC061575 for ; Tue, 7 Sep 2021 10:30:06 -0700 (PDT) Received: by mail-pf1-x42b.google.com with SMTP id x19so8723681pfu.4 for ; Tue, 07 Sep 2021 10:30:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=vjTxGNoImuIhDp4ZU4F0M0RmU0pYqlzioHk3BE15X7E=; b=VIPyCSzmwYwkIDNutJGP7K+imsB3i7QYdFqn0+uGDBW236ho9XuYGWGiP1RAjoB1lf ctXsE1WqK8PrhAiZ3tbDvD8FuAIYJHXtR9FpGrnfKcpRu2lmuyaOS4kR0ZIOE4+kQsPK 6W2pE/oQ89Cw4XXELHjP+t/oo/AGpw7uDL8I430IUgrBhtdqrxGzjXjFybCUZWLX1pB8 dz0991PR3zWwyb8kbw0zNnwSVPhZNH0fB7W2lekkaOt8LnlUJVl33Yh96QkR0UpZlwyN f8JgryVr371ehPSN01I1yVVU/FDVCOoMLY3svn+fws1L8zqbnE7ovKxbMoq0UIUNHz9m CUxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=vjTxGNoImuIhDp4ZU4F0M0RmU0pYqlzioHk3BE15X7E=; b=SgPENBBmGly6VQTVFkD8+rabpDTZY4K5ZjYHqDjE9JPKSWlK4WpRtLKxuipnBNRRIt KatngpZp5mPKLA5mpmGMokOAIxKW0b+5jfobWJ7x4GlnPeTFkzh7v0tmWcTD53GVnrQr LfnWURBTiqB/0DyhDdscYzL7OHIbZ9TZD+OqYs89XRKNXCKb/0zBwY5TzkbUh5CM8XtD Rypw8UgZlN2m4cZdRkWnaNe0s2Q5DnjOiv0QPnm2EJTBJO86PyA65XCzAUfyZbN0EXTK 6JA/JtxHEAZU0U03mrYdwkutmYaqsk2PyktLXiJnZqUtKZ6VSjca34+vzmQP+lMpsIkh uqeA== X-Gm-Message-State: AOAM531sQorGMDz659PoKL656ZJlAFJTKOs4KUIEHIadXc8Zut+4Txu2 FIQCeOvDKgA5fr6iaSJe8Y1j4w== X-Received: by 2002:a63:4610:: with SMTP id t16mr18030575pga.176.1631035805550; Tue, 07 Sep 2021 10:30:05 -0700 (PDT) Received: from google.com (157.214.185.35.bc.googleusercontent.com. [35.185.214.157]) by smtp.gmail.com with ESMTPSA id c21sm11503205pfd.200.2021.09.07.10.30.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 07 Sep 2021 10:30:05 -0700 (PDT) Date: Tue, 7 Sep 2021 17:30:01 +0000 From: Sean Christopherson To: syzbot Cc: bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jarkko@kernel.org, jmattson@google.com, joro@8bytes.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, mingo@redhat.com, pbonzini@redhat.com, syzkaller-bugs@googlegroups.com, tglx@linutronix.de, vkuznets@redhat.com, wanpengli@tencent.com, x86@kernel.org, Linus Torvalds , Ben Gardon Subject: Re: [syzbot] WARNING: kmalloc bug in memslot_rmap_alloc Message-ID: References: <0000000000006096fa05cb454a9c@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <0000000000006096fa05cb454a9c@google.com> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org +Linus and Ben On Sun, Sep 05, 2021, syzbot wrote: > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 8419 at mm/util.c:597 kvmalloc_node+0x111/0x120 mm/util.c:597 > Modules linked in: > CPU: 0 PID: 8419 Comm: syz-executor520 Not tainted 5.14.0-syzkaller #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > RIP: 0010:kvmalloc_node+0x111/0x120 mm/util.c:597 ... > Call Trace: > kvmalloc include/linux/mm.h:806 [inline] > kvmalloc_array include/linux/mm.h:824 [inline] > kvcalloc include/linux/mm.h:829 [inline] > memslot_rmap_alloc+0xf6/0x310 arch/x86/kvm/x86.c:11320 > kvm_alloc_memslot_metadata arch/x86/kvm/x86.c:11388 [inline] > kvm_arch_prepare_memory_region+0x48d/0x610 arch/x86/kvm/x86.c:11462 > kvm_set_memslot+0xfe/0x1700 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1505 > __kvm_set_memory_region+0x761/0x10e0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1668 > kvm_set_memory_region arch/x86/kvm/../../../virt/kvm/kvm_main.c:1689 [inline] > kvm_vm_ioctl_set_memory_region arch/x86/kvm/../../../virt/kvm/kvm_main.c:1701 [inline] > kvm_vm_ioctl+0x4c6/0x2330 arch/x86/kvm/../../../virt/kvm/kvm_main.c:4236 KVM is tripping the WARN_ON_ONCE(size > INT_MAX) added in commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls"). The allocation size is absurd and doomed to fail in this particular configuration (syzkaller is just throwing garbage at KVM), but for humongous virtual machines it's feasible that KVM could run afoul of the sanity check for an otherwise legitimate allocation. The allocation in question is for KVM's "rmap" to translate a guest pfn to a host virtual address. The size of the rmap in question is an unsigned long per 4kb page in a memslot, i.e. on x86-64, 8 bytes per 4096 bytes of guest memory in a memslot. With INT_MAX=0x7fffffff, KVM will trip the WARN and fail rmap allocations for memslots >= 1tb, and Google already has VMs that create 1.5tb memslots (12tb of total guest memory spread across 8 virtual NUMA nodes). One caveat is that KVM's newfangled "TDP MMU" was designed specifically to avoid the rmap allocation (among other things), precisely because of its scalability issues. I.e. it's unlikely KVM's so called "legacy MMU" that relies on the rmaps would be used for such large VMs. However, KVM's legacy MMU is still the only option for shadowing nested EPT/NPT, i.e. the rmap allocation would be problematic if/when nested virtualization is enabled in large VMs. KVM also has other allocations based on memslot size that are _not_ avoided by KVM's TDP MMU and may eventually be problematic, though presumably not for quite some time as it would require petabyte? memslots. E.g. a different metadata array requires 4 bytes per 2mb of guest memory. I don't have any clever ideas to handle this from the KVM side, at least not in the short term. Long term, I think it would be doable to reduce the rmap size for large memslots by 512x, but any change of that nature would be very invasive to KVM and be fairly risky. It also wouldn't prevent syskaller from triggering this WARN at will.