From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Qiuhao Li, Gaoning Pan, Yongkang Jia, syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com, Maxim Levitsky, Paolo Bonzini, Tadeusz Struk
Subject: [PATCH 5.15 904/913] KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
Date: Tue, 5 Apr 2022 09:32:46 +0200
Message-Id: <20220405070406.915680217@linuxfoundation.org>
In-Reply-To: <20220405070339.801210740@linuxfoundation.org>
References: <20220405070339.801210740@linuxfoundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Paolo Bonzini

commit 2a8859f373b0a86f0ece8ec8312607eacf12485d upstream.

FNAME(cmpxchg_gpte) is an inefficient mess.  It is at least decent if it
can go through get_user_pages_fast(), but if it cannot then it tries to
use memremap(); that is not just terribly slow, it is also wrong because
it assumes that the VM_PFNMAP VMA is contiguous.

The right way to do it would be to do the same thing as
hva_to_pfn_remapped() does since commit add6a0cd1c5b ("KVM: MMU: try to
fix up page faults before giving up", 2016-07-05), using follow_pte()
and fixup_user_fault() to determine the correct address to use for
memremap().  To do this, one could for example extract hva_to_pfn()
for use outside virt/kvm/kvm_main.c.  But really there is no reason to
do that either, because there is already a perfectly valid address to
do the cmpxchg() on, only it is a userspace address.  That means doing
user_access_begin()/user_access_end() and writing the code in assembly
to handle exceptions correctly.  Worse, the guest PTE can be 8-byte
even on i686 so there is the extra complication of using cmpxchg8b to
account for.  But at least it is an efficient mess.

(Thanks to Linus for suggesting improvement on the inline assembly).

Reported-by: Qiuhao Li
Reported-by: Gaoning Pan
Reported-by: Yongkang Jia
Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com
Debugged-by: Tadeusz Struk
Tested-by: Maxim Levitsky
Cc: stable@vger.kernel.org
Fixes: bd53cb35a3e9 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs")
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman
---
 arch/x86/kvm/mmu/paging_tmpl.h |   77 +++++++++++++++++++----------------------
 1 file changed, 37 insertions(+), 40 deletions(-)

--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -34,9 +34,8 @@
 	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
 	#ifdef CONFIG_X86_64
 	#define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
-	#define CMPXCHG cmpxchg
+	#define CMPXCHG "cmpxchgq"
 	#else
-	#define CMPXCHG cmpxchg64
 	#define PT_MAX_FULL_LEVELS 2
 	#endif
 #elif PTTYPE == 32
@@ -52,7 +51,7 @@
 	#define PT_GUEST_DIRTY_SHIFT PT_DIRTY_SHIFT
 	#define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT
 	#define PT_HAVE_ACCESSED_DIRTY(mmu) true
-	#define CMPXCHG cmpxchg
+	#define CMPXCHG "cmpxchgl"
 #elif PTTYPE == PTTYPE_EPT
 	#define pt_element_t u64
 	#define guest_walker guest_walkerEPT
@@ -65,7 +64,9 @@
 	#define PT_GUEST_DIRTY_SHIFT 9
 	#define PT_GUEST_ACCESSED_SHIFT 8
 	#define PT_HAVE_ACCESSED_DIRTY(mmu) ((mmu)->ept_ad)
-	#define CMPXCHG cmpxchg64
+	#ifdef CONFIG_X86_64
+	#define CMPXCHG "cmpxchgq"
+	#endif
 	#define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL
 #else
 	#error Invalid PTTYPE value
@@ -147,43 +148,39 @@ static int FNAME(cmpxchg_gpte)(struct kv
 			       pt_element_t __user *ptep_user, unsigned index,
 			       pt_element_t orig_pte, pt_element_t new_pte)
 {
-	int npages;
-	pt_element_t ret;
-	pt_element_t *table;
-	struct page *page;
-
-	npages = get_user_pages_fast((unsigned long)ptep_user, 1, FOLL_WRITE, &page);
-	if (likely(npages == 1)) {
-		table = kmap_atomic(page);
-		ret = CMPXCHG(&table[index], orig_pte, new_pte);
-		kunmap_atomic(table);
-
-		kvm_release_page_dirty(page);
-	} else {
-		struct vm_area_struct *vma;
-		unsigned long vaddr = (unsigned long)ptep_user & PAGE_MASK;
-		unsigned long pfn;
-		unsigned long paddr;
-
-		mmap_read_lock(current->mm);
-		vma = find_vma_intersection(current->mm, vaddr, vaddr + PAGE_SIZE);
-		if (!vma || !(vma->vm_flags & VM_PFNMAP)) {
-			mmap_read_unlock(current->mm);
-			return -EFAULT;
-		}
-		pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
-		paddr = pfn << PAGE_SHIFT;
-		table = memremap(paddr, PAGE_SIZE, MEMREMAP_WB);
-		if (!table) {
-			mmap_read_unlock(current->mm);
-			return -EFAULT;
-		}
-		ret = CMPXCHG(&table[index], orig_pte, new_pte);
-		memunmap(table);
-		mmap_read_unlock(current->mm);
-	}
+	int r = -EFAULT;
+
+	if (!user_access_begin(ptep_user, sizeof(pt_element_t)))
+		return -EFAULT;
+
+#ifdef CMPXCHG
+	asm volatile("1:" LOCK_PREFIX CMPXCHG " %[new], %[ptr]\n"
+		     "mov $0, %[r]\n"
+		     "setnz %b[r]\n"
+		     "2:"
+		     _ASM_EXTABLE_UA(1b, 2b)
+		     : [ptr] "+m" (*ptep_user),
+		       [old] "+a" (orig_pte),
+		       [r] "+q" (r)
+		     : [new] "r" (new_pte)
+		     : "memory");
+#else
+	asm volatile("1:" LOCK_PREFIX "cmpxchg8b %[ptr]\n"
+		     "movl $0, %[r]\n"
+		     "jz 2f\n"
+		     "incl %[r]\n"
+		     "2:"
+		     _ASM_EXTABLE_UA(1b, 2b)
+		     : [ptr] "+m" (*ptep_user),
+		       [old] "+A" (orig_pte),
+		       [r] "+rm" (r)
+		     : [new_lo] "b" ((u32)new_pte),
+		       [new_hi] "c" ((u32)(new_pte >> 32))
+		     : "memory");
+#endif
 
-	return (ret != orig_pte);
+	user_access_end();
+	return r;
 }
 
 static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu,