Date: Tue, 24 Aug 2021 17:40:33 +0000
From: Sean Christopherson
To: Maxim Levitsky
Cc: Vitaly Kuznetsov, Eduardo Habkost, kvm@vger.kernel.org, Paolo Bonzini,
	Wanpeng Li, Jim Mattson, "Dr. David Alan Gilbert", Nitesh Narayan Lal,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 4/4] KVM: x86: Fix stack-out-of-bounds memory access from ioapic_write_indirect()
References: <20210823143028.649818-1-vkuznets@redhat.com>
 <20210823143028.649818-5-vkuznets@redhat.com>
 <20210823185841.ov7ejn2thwebcwqk@habkost.net>
 <87mtp7jowv.fsf@vitty.brq.redhat.com>
 <87k0kakip9.fsf@vitty.brq.redhat.com>
 <2df0b6d18115fb7f2701587b7937d8ddae38e36a.camel@redhat.com>
In-Reply-To: <2df0b6d18115fb7f2701587b7937d8ddae38e36a.camel@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Aug 24, 2021, Maxim Levitsky wrote:
> On Tue, 2021-08-24 at 16:42 +0200, Vitaly Kuznetsov wrote:
> > Eduardo Habkost writes:
> > 
> > > On Tue, Aug 24, 2021 at 3:13 AM Vitaly Kuznetsov wrote:
> > > > Eduardo Habkost writes:
> > > > 
> > > > > On Mon, Aug 23, 2021 at 04:30:28PM +0200, Vitaly Kuznetsov wrote:
> > > > > > diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> > > > > > index ff005fe738a4..92cd4b02e9ba 100644
> > > > > > --- a/arch/x86/kvm/ioapic.c
> > > > > > +++ b/arch/x86/kvm/ioapic.c
> > > > > > @@ -319,7 +319,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
> > > > > >  	unsigned index;
> > > > > >  	bool mask_before, mask_after;
> > > > > >  	union kvm_ioapic_redirect_entry *e;
> > > > > > -	unsigned long vcpu_bitmap;
> > > > > > +	unsigned long vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];

The preferred pattern is:

	DECLARE_BITMAP(vcpu_bitmap, KVM_MAX_VCPUS);

> > > > > Is there a way to avoid this KVM_MAX_VCPUS-sized variable on the
> > > > > stack? This might hit us back when we increase KVM_MAX_VCPUS to
> > > > > a few thousand VCPUs (I was planning to submit a patch for that
> > > > > soon).
> > > > What's the short- or mid-term target?
> > > Short term target is 2048 (which was already tested).
> > > Mid-term target (not tested yet) is 4096, maybe 8192.
> > > > Note, we're allocating KVM_MAX_VCPUS bits (not bytes!) here, this means
> > > > that for e.g. 2048 vCPUs we need 256 bytes of the stack only. In case
> > > > the target is much higher than that, we will need to either switch to
> > > > dynamic allocation or e.g. use pre-allocated per-CPU variables and make
> > > > this a preempt-disabled region. I, however, would like to understand if
> > > > the problem with allocating this from stack is real or not first.
> > > Is 256 bytes too much here, or would that be OK?
> > 
> > AFAIR, on x86_64 stack size (both regular and irq) is 16k, eating 256

Don't forget i386! :-)

> > bytes of it is probably OK. I'd start worrying when we go to 1024 (8k
> > vCPUs) and above (but this is subjective of course).

256 is fine, 1024 would indeed be problematic, e.g. CONFIG_FRAME_WARN defaults
to 1024 on 32-bit kernels. That's not a hard limit per se, but ideally KVM will
stay warn-free on all flavors of x86.

> On the topic of enlarging these bitmaps to cover all vCPUs.
> 
> I also share the worry of having the whole bitmap on kernel stack for very
> large number of vcpus.
> Maybe we need to abstract and use a bitmap for a sane number of vcpus,
> and use otherwise a 'kmalloc'ed buffer?

That's a future problem. More specifically, it's the problem of whoever wants
to push KVM_MAX_VCPUS > ~2048. There are a lot of ways to solve the problem,
e.g. this I/O APIC code runs under a spinlock so a dedicated bitmap in struct
kvm_ioapic could be used to avoid a large stack allocation.

> Also in theory large bitmaps might affect performance a bit.

Maybe. The only possible degradation for small VMs, i.e. VMs that don't need
the full bitmap, is if the compiler puts other variables below the bitmap and
causes sub-optimal cache line usage. But I suspect/hope the compiler is smart
enough to use GPRs and/or organize the local variables on the stack so that
doesn't happen.