Date: Mon, 8 Mar 2021 13:21:06 -0700
From: Alex Williamson
To: Zeng Tao
Cc: Cornelia Huck, Kevin Tian, Andrew Morton, Peter Xu, Giovanni Cabiddu,
    Michel Lespinasse, Jann Horn, Max Gurtovoy, Jason Gunthorpe
Subject: Re: [PATCH] vfio/pci: make the vfio_pci_mmap_fault reentrant
Message-ID: <20210308132106.49da42e2@omen.home.shazbot.org>
In-Reply-To: <1615201890-887-1-git-send-email-prime.zeng@hisilicon.com>
References: <1615201890-887-1-git-send-email-prime.zeng@hisilicon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 8 Mar 2021 19:11:26 +0800
Zeng Tao wrote:

> We have met the following error when testing with DPDK testpmd:
> [ 1591.733256] kernel BUG at mm/memory.c:2177!
> [ 1591.739515] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> [ 1591.747381] Modules linked in: vfio_iommu_type1 vfio_pci vfio_virqfd vfio pv680_mii(O)
> [ 1591.760536] CPU: 2 PID: 227 Comm: lcore-worker-2 Tainted: G O 5.11.0-rc3+ #1
> [ 1591.770735] Hardware name: , BIOS HixxxxFPGA 1P B600 V121-1
> [ 1591.778872] pstate: 40400009 (nZcv daif +PAN -UAO -TCO BTYPE=--)
> [ 1591.786134] pc : remap_pfn_range+0x214/0x340
> [ 1591.793564] lr : remap_pfn_range+0x1b8/0x340
> [ 1591.799117] sp : ffff80001068bbd0
> [ 1591.803476] x29: ffff80001068bbd0 x28: 0000042eff6f0000
> [ 1591.810404] x27: 0000001100910000 x26: 0000001300910000
> [ 1591.817457] x25: 0068000000000fd3 x24: ffffa92f1338e358
> [ 1591.825144] x23: 0000001140000000 x22: 0000000000000041
> [ 1591.832506] x21: 0000001300910000 x20: ffffa92f141a4000
> [ 1591.839520] x19: 0000001100a00000 x18: 0000000000000000
> [ 1591.846108] x17: 0000000000000000 x16: ffffa92f11844540
> [ 1591.853570] x15: 0000000000000000 x14: 0000000000000000
> [ 1591.860768] x13: fffffc0000000000 x12: 0000000000000880
> [ 1591.868053] x11: ffff0821bf3d01d0 x10: ffff5ef2abd89000
> [ 1591.875932] x9 : ffffa92f12ab0064 x8 : ffffa92f136471c0
> [ 1591.883208] x7 : 0000001140910000 x6 : 0000000200000000
> [ 1591.890177] x5 : 0000000000000001 x4 : 0000000000000001
> [ 1591.896656] x3 : 0000000000000000 x2 : 0168044000000fd3
> [ 1591.903215] x1 : ffff082126261880 x0 : fffffc2084989868
> [ 1591.910234] Call trace:
> [ 1591.914837] remap_pfn_range+0x214/0x340
> [ 1591.921765] vfio_pci_mmap_fault+0xac/0x130 [vfio_pci]
> [ 1591.931200] __do_fault+0x44/0x12c
> [ 1591.937031] handle_mm_fault+0xcc8/0x1230
> [ 1591.942475] do_page_fault+0x16c/0x484
> [ 1591.948635] do_translation_fault+0xbc/0xd8
> [ 1591.954171] do_mem_abort+0x4c/0xc0
> [ 1591.960316] el0_da+0x40/0x80
> [ 1591.965585] el0_sync_handler+0x168/0x1b0
> [ 1591.971608] el0_sync+0x174/0x180
> [ 1591.978312] Code: eb1b027f 540000c0 f9400022 b4fffe02 (d4210000)
>
> The cause is that the vfio_pci_mmap_fault() function is not reentrant:
> if multiple threads access the same address and take a page fault at
> the same time, we hit the above error.
>
> Fix the issue by making vfio_pci_mmap_fault() reentrant. There is also
> a second issue: when io_remap_pfn_range() fails, we need to undo the
> __vfio_pci_add_vma(). Fix that by moving __vfio_pci_add_vma() down
> after io_remap_pfn_range().
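
For context on where this trips: mm/memory.c:2177 in this kernel is the
pte_none() sanity check in remap_pte_range(), which assumes every PTE in
the range it is asked to fill is still empty.  A second thread calling
io_remap_pfn_range() over a range the first thread has already populated
therefore hits the BUG_ON().  Condensed from memory, so treat this as a
sketch of the loop rather than an exact quote of mm/memory.c:

        /* remap_pte_range(), condensed: each PTE must be empty before it
         * is filled, so remapping an already-populated range is fatal. */
        do {
                BUG_ON(!pte_none(*pte));
                set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
                pfn++;
        } while (pte++, addr += PAGE_SIZE, addr != end);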
>
> Fixes: 11c4cd07ba11 ("vfio-pci: Fault mmaps to enable vma tracking")
> Signed-off-by: Zeng Tao
> ---
>  drivers/vfio/pci/vfio_pci.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 65e7e6b..6928c37 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1613,6 +1613,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>          struct vm_area_struct *vma = vmf->vma;
>          struct vfio_pci_device *vdev = vma->vm_private_data;
>          vm_fault_t ret = VM_FAULT_NOPAGE;
> +        unsigned long pfn;
>
>          mutex_lock(&vdev->vma_lock);
>          down_read(&vdev->memory_lock);
> @@ -1623,18 +1624,23 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>                  goto up_out;
>          }
>
> -        if (__vfio_pci_add_vma(vdev, vma)) {
> -                ret = VM_FAULT_OOM;
> +        if (!follow_pfn(vma, vma->vm_start, &pfn)) {
>                  mutex_unlock(&vdev->vma_lock);
>                  goto up_out;
>          }
>
> -        mutex_unlock(&vdev->vma_lock);

If I understand correctly, I think you're using (perhaps slightly
abusing) the vma_lock to extend the serialization of the vma_list
manipulation to include io_remap_pfn_range(), such that you can test
whether the pte has already been populated using follow_pfn().  In that
case we return VM_FAULT_NOPAGE without trying to repopulate the page,
and therefore avoid the BUG_ON in remap_pte_range() triggered by trying
to overwrite an existing pte, and, less importantly, a duplicate vma in
our list.

I wonder if use of follow_pfn() is still strongly discouraged for this
use case.  I'm surprised that it's left to the fault handler to provide
this serialization; is this because we're filling the entire vma rather
than only the faulting page?

As we move to unmap_mapping_range()[1] we remove all of the complexity
of managing a list of vmas to zap based on whether device memory is
enabled, including the vma_lock.  Are we going to need to replace that
with another lock here, or is there a better approach to handling
concurrency of this fault handler?  Jason/Peter?  Thanks,

Alex

[1]https://lore.kernel.org/kvm/161401267316.16443.11184767955094847849.stgit@gimli.home/

>
>          if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> -                               vma->vm_end - vma->vm_start, vma->vm_page_prot))
> +                               vma->vm_end - vma->vm_start, vma->vm_page_prot)) {
>                  ret = VM_FAULT_SIGBUS;
> +                mutex_unlock(&vdev->vma_lock);
> +                goto up_out;
> +        }
> +
> +        if (__vfio_pci_add_vma(vdev, vma))
> +                ret = VM_FAULT_OOM;
>
> +        mutex_unlock(&vdev->vma_lock);
>  up_out:
>          up_read(&vdev->memory_lock);
>          return ret;
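
To make sure we're reading the same thing, the fault handler with this
patch applied ends up looking roughly as below.  This is just the hunks
above reassembled with their surrounding context (the pieces outside the
hunks are quoted from memory of the existing function, so treat it as a
restatement rather than a verbatim copy of the tree):

static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma;
        struct vfio_pci_device *vdev = vma->vm_private_data;
        vm_fault_t ret = VM_FAULT_NOPAGE;
        unsigned long pfn;

        /* vma_lock now covers the populated-PTE test, the remap and the
         * vma_list insertion, so concurrent faults serialize here. */
        mutex_lock(&vdev->vma_lock);
        down_read(&vdev->memory_lock);

        if (!__vfio_pci_memory_enabled(vdev)) {
                ret = VM_FAULT_SIGBUS;
                mutex_unlock(&vdev->vma_lock);
                goto up_out;
        }

        /* A concurrent fault already populated this vma; nothing to do. */
        if (!follow_pfn(vma, vma->vm_start, &pfn)) {
                mutex_unlock(&vdev->vma_lock);
                goto up_out;
        }

        if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
                               vma->vm_end - vma->vm_start,
                               vma->vm_page_prot)) {
                ret = VM_FAULT_SIGBUS;
                mutex_unlock(&vdev->vma_lock);
                goto up_out;
        }

        /* Only track the vma once the remap has actually succeeded. */
        if (__vfio_pci_add_vma(vdev, vma))
                ret = VM_FAULT_OOM;

        mutex_unlock(&vdev->vma_lock);
up_out:
        up_read(&vdev->memory_lock);
        return ret;
}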
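
On the "only the faulting page" question, purely as a sketch and not a
proposal (the function name is made up and the vma tracking is
deliberately left out, since the unmap_mapping_range() series is what
removes the need for it), a per-page version could look something like
the below.  vmf_insert_pfn() already returns VM_FAULT_NOPAGE if a racing
thread installed the same PTE first, so the PTE fill itself wouldn't
need any extra lock:

static vm_fault_t vfio_pci_mmap_fault_one_page(struct vm_fault *vmf)
{
        struct vm_area_struct *vma = vmf->vma;
        struct vfio_pci_device *vdev = vma->vm_private_data;
        /* vm_pgoff holds the base PFN of the BAR, as set up in
         * vfio_pci_mmap(), so offset it by the faulting page. */
        unsigned long pfn = vma->vm_pgoff +
                            ((vmf->address - vma->vm_start) >> PAGE_SHIFT);
        vm_fault_t ret;

        down_read(&vdev->memory_lock);

        if (!__vfio_pci_memory_enabled(vdev)) {
                ret = VM_FAULT_SIGBUS;
                goto out;
        }

        /* Fill only the faulting page; a concurrent fault on the same
         * address simply finds the PTE already present. */
        ret = vmf_insert_pfn(vma, vmf->address, pfn);
out:
        up_read(&vdev->memory_lock);
        return ret;
}

Whether the extra per-page fault overhead matters for these mappings is
a separate question, of course.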