Subject: Re: [PATCH 07/13] mm: close race in generic_access_phys
To: Daniel Vetter, DRI Development, LKML
CC: Jason Gunthorpe, Dan Williams, Kees Cook, Rik van Riel,
 Benjamin Herrenschmidt, Dave Airlie, Hugh Dickins, Andrew Morton,
 Jérôme Glisse, Jan Kara, Daniel Vetter
References: <20201007164426.1812530-1-daniel.vetter@ffwll.ch>
 <20201007164426.1812530-8-daniel.vetter@ffwll.ch>
From: John Hubbard
Message-ID: <852a74ec-339b-4c7f-9e29-b9736111849a@nvidia.com>
Date: Wed, 7 Oct 2020 17:44:46 -0700
In-Reply-To: <20201007164426.1812530-8-daniel.vetter@ffwll.ch>

On 10/7/20 9:44 AM, Daniel Vetter wrote:
> Way back it was a reasonable assumptions that iomem mappings never
> change the pfn range they point at. But this has changed:
>
> - gpu drivers dynamically manage their memory nowadays, invalidating
>   ptes with unmap_mapping_range when buffers get moved
>
> - contiguous dma allocations have moved from dedicated carvetouts to

s/carvetouts/carveouts/

>   cma regions. This means if we miss the unmap the pfn might contain
>   pagecache or anon memory (well anything allocated with GFP_MOVEABLE)
>
> - even /dev/mem now invalidates mappings when the kernel requests that
>   iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
>   ("/dev/mem: Revoke mappings when a driver claims the region")

Thanks for putting these references into the log, it's very helpful.
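As an aside for anyone else following along: the first bullet is the
part that makes a cached pfn dangerous. A driver that moves buffers
tears down the stale user ptes with something roughly like the sketch
below. This is hypothetical and hand-written for illustration, not
from any real driver; the my_* structure and field names are invented,
though unmap_mapping_range() is the real API:

	/*
	 * Hypothetical sketch: when a buffer migrates, tear down every
	 * userspace pte still pointing at the old pfns. The next fault
	 * inserts ptes for the buffer's new location, so anything that
	 * cached the old pfn (as the pre-patch generic_access_phys()
	 * did via follow_phys()) is left pointing at memory the driver
	 * may reuse for something else.
	 */
	static void my_drv_move_buffer(struct my_device *dev,
				       struct my_buffer *buf)
	{
		/* holebegin/holelen are byte offsets in dev->mapping */
		unmap_mapping_range(dev->mapping, buf->mmap_offset,
				    buf->size, 1 /* even_cows */);

		/* ... now migrate the backing storage ... */
	}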
...

> diff --git a/mm/memory.c b/mm/memory.c
> index fcfc4ca36eba..8d467e23b44e 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4873,28 +4873,68 @@ int follow_phys(struct vm_area_struct *vma,
>  	return ret;
>  }
>  
> +/**
> + * generic_access_phys - generic implementation for iomem mmap access
> + * @vma: the vma to access
> + * @addr: userspace addres, not relative offset within @vma
> + * @buf: buffer to read/write
> + * @len: length of transfer
> + * @write: set to FOLL_WRITE when writing, otherwise reading
> + *
> + * This is a generic implementation for &vm_operations_struct.access for an
> + * iomem mapping. This callback is used by access_process_vm() when the @vma is
> + * not page based.
> + */
>  int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
>  			void *buf, int len, int write)
>  {
>  	resource_size_t phys_addr;
>  	unsigned long prot = 0;
>  	void __iomem *maddr;
> +	pte_t *ptep, pte;
> +	spinlock_t *ptl;
>  	int offset = addr & (PAGE_SIZE-1);
> +	int ret = -EINVAL;
> +
> +	if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> +		return -EINVAL;
> +
> +retry:
> +	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> +		return -EINVAL;
> +	pte = *ptep;
> +	pte_unmap_unlock(ptep, ptl);
>  
> -	if (follow_phys(vma, addr, write, &prot, &phys_addr))
> +	prot = pgprot_val(pte_pgprot(pte));
> +	phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
> +
> +	if ((write & FOLL_WRITE) && !pte_write(pte))
>  		return -EINVAL;
>  
>  	maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
>  	if (!maddr)
>  		return -ENOMEM;
>  
> +	if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
> +		goto out_unmap;
> +
> +	if (pte_same(pte, *ptep)) {

The ioremap area is something I'm sorta new to, so a newbie question:
is it possible for the same pte to already be there, ever? If so, we'd
be stuck in an infinite loop here. I'm sure that's not the case, but
it's not yet obvious to me why it's impossible. Resource reservations
maybe?

thanks,
-- 
John Hubbard
NVIDIA
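P.S. To spell out the shape I was expecting here, below is a
hand-written paraphrase of a snapshot-and-revalidate loop, not code
from the tree (function body only, using the same locals as the
patch). The point of this shape is that finding the *same* pte again
is the success case, so the loop terminates; it only spins while the
mapping keeps changing underneath us:

	retry:
		if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
			return -EINVAL;
		pte = *ptep;				/* snapshot under ptl */
		pte_unmap_unlock(ptep, ptl);

		prot = pgprot_val(pte_pgprot(pte));
		phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;

		/* can race with unmap_mapping_range() from here on */
		maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
		if (!maddr)
			return -ENOMEM;

		if (follow_pte(vma->vm_mm, addr, &ptep, &ptl)) {
			iounmap(maddr);
			return -EINVAL;
		}
		if (!pte_same(pte, *ptep)) {		/* changed under us? */
			pte_unmap_unlock(ptep, ptl);
			iounmap(maddr);
			goto retry;			/* retry only on change */
		}

		/* pte was stable across the ioremap: safe to do the access */
		if (write)
			memcpy_toio(maddr + offset, buf, len);
		else
			memcpy_fromio(buf, maddr + offset, len);

		pte_unmap_unlock(ptep, ptl);
		iounmap(maddr);
		return len;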