Received: by 2002:a05:6a10:206:0:0:0:0 with SMTP id 6csp216865pxj; Fri, 28 May 2021 02:11:32 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxz1Bl/g6xG1n0tQDqbVFEN4JX0MhcAgab35lxFQLkuz/wwi5UHSHHKtBi4uJKVoKml7nB3 X-Received: by 2002:a17:907:204b:: with SMTP id pg11mr8334254ejb.433.1622193092325; Fri, 28 May 2021 02:11:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1622193092; cv=none; d=google.com; s=arc-20160816; b=WzDtdtqif5PFrLc/6Vs5pzGA5gwQxNicSNkVO9Nir5uffyoozgxQCDAPqfdL4fwm+4 D+KswP/rsTENbUWVTWkVEJFSrWofcYlHmxcDlz0gNHv93DA/wQyxdLm+RiBWmK17KFB2 MZ1nlum7Epvi8nrZU5rdF08mX3EOPxjYtDxnccwzjaHqr6bBQQw7JGcgLkI3yjQHW5uu a15ePMpPWJqw10hJFP3XgaL2xkhO1TzZd8o51T5u+2bMDhKL4xkgvpk0G7E9Usj2fFLf LiOEzI5yTUpT/UKsjTUocJaH5gIUUk/z3EvR8C4ztJjdv4mh9TOFwth3le+/7cZyoUNY khnQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:organization :from:references:cc:to:subject:dkim-signature; bh=+WiQc46VNYZyiaojk5h9JxqplWT0OeXUCmh+32Qpg80=; b=dZC1GDWsJ3wov4PsDY9vRl4CdWbZ8hbmQxwvUrcKmcrFnp0z+X6dWuGHiikih2Q9kO NP1BDhIirZLQAwIP2E6kg5B8jRBNnUyEfKWqMO6XO6G+lK+eaFIfeC9elTg6fNRyPb6q Qmj0Ble87xlpkX0THs0CBjkEyGA4ZkE+F4TE16I32AakgKZV2oCIjgc+t09SJC8wqiXw E2B5ZasQ2IBaM5S07AEkfuTTucL4KxF3Wa4SsPPgxtNNqQV/uTnvewVq33l51cID4KLK 7lJkPYcmCO5zUw3vVGeZlfsFhlN2YO5TYiaShSrYQ2tX5FOpYySynEft79yK6fGG8QJV oyLg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PkiezJ65; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id f20si4437604edq.178.2021.05.28.02.11.09; Fri, 28 May 2021 02:11:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=PkiezJ65; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235453AbhE1I7l (ORCPT + 99 others); Fri, 28 May 2021 04:59:41 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:56680 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235297AbhE1I7k (ORCPT ); Fri, 28 May 2021 04:59:40 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1622192286; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+WiQc46VNYZyiaojk5h9JxqplWT0OeXUCmh+32Qpg80=; b=PkiezJ65gKThqItW0miX2G3cHe1bi307Sl5gvkUN0OR+fpJyv29KTreEhqEeqimPlS28yO l2X3mMZUKos3Snl5P/RttUf3Mh9i+GckCJ4F+wl4BWZFo3dpkM5O/UliEq+l7kpRAWKloK WhTNEEwqtSWBU+/0nIgYotyTARGVEaI= Received: from mail-wm1-f71.google.com (mail-wm1-f71.google.com [209.85.128.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-250-a0W0t97dPku1ZiTacPBLAA-1; Fri, 28 May 2021 04:58:04 -0400 X-MC-Unique: a0W0t97dPku1ZiTacPBLAA-1 Received: by mail-wm1-f71.google.com with SMTP id f141-20020a1c1f930000b029017ce5240ed6so1119502wmf.5 for ; Fri, 28 May 2021 01:58:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-language:content-transfer-encoding; bh=+WiQc46VNYZyiaojk5h9JxqplWT0OeXUCmh+32Qpg80=; b=J4Mfn7v5C1pCbtL2nNG4w8ZaOFJuGKVg1PUQ8HD0D94OQG/T6tWzDbHxbr9EP36f7z zAj+n+qfAgoxSDja/6L7dE69S1OtoqOFn56hMhT8xSGr/s1W88L5W/9VNwRkSZD+3Z96 qHXujLF0lBsxhKFtXxRFgg5YI3PQhAhY4RONEn/lwbMNQNjBvbP/LSfLG0/KElaAIdF1 6gvItLeNRyuDzwBgit5UQ5BaOyd8IDydz2Q4WqQgeQ8M0YlX8Sg9OalMOx0QUned9XnK 2EfHrGWo4Qdin+zVi6+9IoEcRYg7FaCDqPaV1FcUiF/sGl1QAdj/6i9SfIZ8nStLRwza lSYA== X-Gm-Message-State: AOAM530KycSp5oLk5QgxANxcV0YoQBPfRiTQgvTI0o2lpfJdCxVnLunm oPQaohrOTZSp+et4Mtd3ZfgY+DkXngYMYAXejAModMHPzxjOFEBYUv2dczX3rEsbj3FpE9zh4rS SdcYW8niblEQJVhnzeDseP6Y4 X-Received: by 2002:a1c:9a89:: with SMTP id c131mr7583528wme.49.1622192283320; Fri, 28 May 2021 01:58:03 -0700 (PDT) X-Received: by 2002:a1c:9a89:: with SMTP id c131mr7583507wme.49.1622192283052; Fri, 28 May 2021 01:58:03 -0700 (PDT) Received: from [192.168.3.132] (p5b0c6870.dip0.t-ipconnect.de. [91.12.104.112]) by smtp.gmail.com with ESMTPSA id u14sm13405368wmc.41.2021.05.28.01.58.02 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 28 May 2021 01:58:02 -0700 (PDT) Subject: Re: [PATCH v4] /dev/mem: Revoke mappings when a driver claims the region To: Dan Williams , Bjorn Helgaas Cc: Greg KH , Arnd Bergmann , Ingo Molnar , Kees Cook , Matthew Wilcox , Russell King , Andrew Morton , Linux Kernel Mailing List , Linux MM , Linux PCI , Daniel Vetter , =?UTF-8?Q?Krzysztof_Wilczy=c5=84ski?= , Jason Gunthorpe , Christoph Hellwig References: <159009507306.847224.8502634072429766747.stgit@dwillia2-desk3.amr.corp.intel.com> <20210527205845.GA1421476@bjorn-Precision-5520> From: David Hildenbrand Organization: Red Hat Message-ID: <7eaf4c10-292e-18d7-e8ce-3a6b72122381@redhat.com> Date: Fri, 28 May 2021 10:58:01 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 27.05.21 23:30, Dan Williams wrote: > On Thu, May 27, 2021 at 1:58 PM Bjorn Helgaas wrote: >> >> [+cc Daniel, Krzysztof, Jason, Christoph, linux-pci] >> >> On Thu, May 21, 2020 at 02:06:17PM -0700, Dan Williams wrote: >>> Close the hole of holding a mapping over kernel driver takeover event of >>> a given address range. >>> >>> Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges") >>> introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the >>> kernel against scenarios where a /dev/mem user tramples memory that a >>> kernel driver owns. However, this protection only prevents *new* read(), >>> write() and mmap() requests. Established mappings prior to the driver >>> calling request_mem_region() are left alone. >>> >>> Especially with persistent memory, and the core kernel metadata that is >>> stored there, there are plentiful scenarios for a /dev/mem user to >>> violate the expectations of the driver and cause amplified damage. >>> >>> Teach request_mem_region() to find and shoot down active /dev/mem >>> mappings that it believes it has successfully claimed for the exclusive >>> use of the driver. Effectively a driver call to request_mem_region() >>> becomes a hole-punch on the /dev/mem device. >> >> This idea of hole-punching /dev/mem has since been extended to PCI >> BARs via [1]. >> >> Correct me if I'm wrong: I think this means that if a user process has >> mmapped a PCI BAR via sysfs, and a kernel driver subsequently requests >> that region via pci_request_region() or similar, we punch holes in the >> the user process mmap. The driver might be happy, but my guess is the >> user starts seeing segmentation violations for no obvious reason and >> is not happy. >> >> Apart from the user process issue, the implementation of [1] is >> problematic for PCI because the mmappable sysfs attributes now depend >> on iomem_init_inode(), an fs_initcall, which means they can't be >> static attributes, which ultimately leads to races in creating them. > > See the comments in iomem_get_mapping(), and revoke_iomem(): > > /* > * Check that the initialization has completed. Losing the race > * is ok because it means drivers are claiming resources before > * the fs_initcall level of init and prevent iomem_get_mapping users > * from establishing mappings. > */ > > ...the observation being that it is ok for the revocation inode to > come on later in the boot process because userspace won't be able to > use the fs yet. So any missed calls to revoke_iomem() would fall back > to userspace just seeing the resource busy in the first instance. I.e. > through the normal devmem_is_allowed() exclusion. > >> >> So I'm raising the question of whether this hole-punch is the right >> strategy. >> >> - Prior to revoke_iomem(), __request_region() was very >> self-contained and really only depended on the resource tree. Now >> it depends on a lot of higher-level MM machinery to shoot down >> mappings of other tasks. This adds quite a bit of complexity and >> some new ordering constraints. >> >> - Punching holes in the address space of an existing process seems >> unfriendly. Maybe the driver's __request_region() should fail >> instead, since the driver should be prepared to handle failure >> there anyway. > > It's prepared to handle failure, but in this case it is dealing with a > root user of 2 minds. > >> >> - [2] suggests that the hole punch protects drivers from /dev/mem >> writers, especially with persistent memory. I'm not really >> convinced. The hole punch does nothing to prevent a user process >> from mmapping and corrupting something before the driver loads. > > The motivation for this was a case that was swapping between /dev/mem > access and /dev/pmem0 access and they forgot to stop using /dev/mem > when they switched to /dev/pmem0. If root wants to use /dev/mem it can > use it, if root wants to stop the driver from loading it can set > mopdrobe policy or manually unbind, and if root asks the kernel to > load the driver while it is actively using /dev/mem something has to > give. Given root has other options to stop a driver the decision to > revoke userspace access when root messes up and causes a collision > seems prudent to me. > Is there a real use case for mapping pmem via /dev/mem or could we just prohibit the access to these areas completely? What's the use case for "swapping between /dev/mem access and /dev/pmem0 access" ? -- Thanks, David / dhildenb