Subject: Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool
To: Rob Herring
Cc: Claire Chang, Michael Ellerman, Benjamin Herrenschmidt,
 Paul Mackerras, Joerg Roedel, Will Deacon, Frank Rowand,
 Konrad Rzeszutek Wilk, Boris Ostrovsky, Juergen Gross,
 Stefano Stabellini, Christoph Hellwig, Marek Szyprowski, Grant Likely,
 Heinrich Schuchardt, Thierry Reding, Ingo Molnar,
 Thiago Jung Bauermann, Peter Zijlstra, Greg Kroah-Hartman,
 Saravana Kannan, "Wysocki, Rafael J", Heikki Krogerus, Andy Shevchenko,
 Randy Dunlap, Dan Williams, Bartosz Golaszewski,
 devicetree@vger.kernel.org, "linux-kernel@vger.kernel.org",
 linuxppc-dev, Linux IOMMU, xen-devel@lists.xenproject.org, Tomasz Figa,
 Nicolas Boichat
References:
 <20210106034124.30560-1-tientzu@chromium.org>
 <20210106034124.30560-6-tientzu@chromium.org>
 <20210120165348.GA220770@robh.at.kernel.org>
 <313f8052-a591-75de-c4c2-ee9ea8f02e7f@arm.com>
From: Robin Murphy
Message-ID: <1a570c5c-e0da-7d86-4384-4a4c50193c94@arm.com>
Date: Thu, 21 Jan 2021 17:29:13 +0000
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021-01-21 15:48, Rob Herring wrote:
> On Wed, Jan 20, 2021 at 7:10 PM Robin Murphy wrote:
>>
>> On 2021-01-20 21:31, Rob Herring wrote:
>>> On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy wrote:
>>>>
>>>> On 2021-01-20 16:53, Rob Herring wrote:
>>>>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
>>>>>> Introduce the new compatible string, restricted-dma-pool,
>>>>>> for restricted DMA. One can specify the address and length
>>>>>> of the restricted DMA memory region by restricted-dma-pool
>>>>>> in the device tree.
>>>>>
>>>>> If this goes into DT, I think we should be able to use
>>>>> dma-ranges for this purpose instead. Normally, 'dma-ranges'
>>>>> is for physical bus restrictions, but there's no reason it
>>>>> can't be used for policy or to express restrictions the
>>>>> firmware has enabled.
>>>>
>>>> There would still need to be some way to tell SWIOTLB to pick
>>>> up the corresponding chunk of memory and to prevent the kernel
>>>> from using it for anything else, though.
>>>
>>> Don't we already have that problem if dma-ranges had a very
>>> small range? We just get lucky because the restriction is
>>> generally much more RAM than needed.
>>
>> Not really - if a device has a naturally tiny addressing capability
>> that doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer
>> will be allocated then it's unlikely to work well, but that's just
>> crap system design. Yes, memory pressure in ZONE_DMA{32} is
>> particularly problematic for such limited devices, but it's
>> irrelevant to the issue at hand here.
>
> Yesterday's crap system design is today's security feature. Couldn't
> this feature make crap system design work better?

Indeed! Say you bring out your shiny new "Strawberry Flan 4" machine
with all the latest connectivity, but tragically its PCIe can only
address 25% of the RAM. So you decide to support deploying it in two
configurations: one where it runs normally for best performance, and
another "secure" one where it dedicates that quarter of RAM as a
restricted DMA pool for any PCIe devices - that way, even if that hotel
projector you plug in turns out to be a rogue Thunderbolt endpoint, it
can never snarf your private keys off your eMMC out of the page cache.

(Yes, it is the thinnest of strawmen, but it sets the scene for the
point you raised...)

...which is that in both cases the dma-ranges will still be identical.
So how is the kernel going to know whether to steal that whole area
from memblock before anything else can allocate from it, or not?

I don't disagree that even in Claire's original intended case it would
be semantically correct to describe the hardware-firewalled region with
dma-ranges. It just turns out not to be necessary, and you're already
arguing for not adding anything in DT that doesn't need to be.

>> What we have here is a device that's not allowed to see *kernel*
>> memory at all. It's been artificially constrained to a particular
>> region by a TZASC or similar, and the only data which should ever
>> be placed in that
>
> May have been constrained, but that's entirely optional.
>
> In the optional case where the setup is entirely up to the OS, I
> don't think this belongs in the DT at all. Perhaps that should be
> solved first.

Yes! Let's definitely consider that case! Say you don't have any
security or physical limitations but want to use a bounce pool for some
device anyway because reasons (perhaps copying streaming DMA data to a
better guaranteed alignment gives an overall performance win). Now the
*only* relevant thing to communicate to the kernel is to, ahem, reserve
a large chunk of memory, and use it for this special purpose. Isn't that
literally what reserved-memory bindings are for?

>> region is data intended for that device to see. That way if it
>> tries to go rogue it physically can't start slurping data intended
>> for other devices or not mapped for DMA at all. The bouncing is an
>> important part of this - I forget the title off-hand but there was
>> an interesting paper a few years ago which demonstrated that even
>> with an IOMMU, streaming DMA of in-place buffers could reveal
>> enough adjacent data from the same page to mount an attack on the
>> system. Memory pressure should be immaterial since the size of each
>> bounce pool carveout will presumably be tuned for the needs of the
>> given device.
>>
>>> In any case, wouldn't finding all the dma-ranges do this? We're
>>> already walking the tree to find the max DMA address now.
>>
>> If all you can see are two "dma-ranges" properties, how do you
>> propose to tell that one means "this is the extent of what I can
>> address, please set my masks and dma-range-map accordingly and try
>> to allocate things where I can reach them" while the other means
>> "take this output range away from the page allocator and hook it up
>> as my dedicated bounce pool, because it is Serious Security Time"?
>> Especially since getting that choice wrong either way would be a
>> Bad Thing.
>
> Either we have some heuristic based on the size or we add some hint.
> The point is let's build on what we already have for defining DMA
> accessible memory in DT rather than some parallel mechanism.

The point I'm trying to bang home is that it's really not about the DMA
accessibility, it's about the purpose of the memory itself. Even when
DMA accessibility *is* relevant it's already implied by that purpose,
from the point of view of the implementation. The only difference it
might make is to the end user if they want to ascertain whether the
presence of such a pool represents protection against an untrusted
device or just some DMA optimisation tweak.

Robin.
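
For illustration only, the "Strawberry Flan 4" argument above might look
roughly like the device tree fragment below. This is a sketch under
stated assumptions, not the binding as proposed or merged: the node
names, the label, the "vendor,hypothetical-device" compatible, and every
address and size are made up (4 GiB of RAM is assumed, of which the bus
can reach 1 GiB). Only the restricted-dma-pool compatible string comes
from the patch under discussion, and memory-region is assumed here
simply because it is the usual reserved-memory way for a device to refer
to a region. The dma-ranges line is the same in both configurations;
only the reserved-memory node and the reference to it say what the
memory is for.

```dts
/dts-v1/;

/ {
	#address-cells = <2>;
	#size-cells = <2>;

	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* "Secure" configuration only; left out in the normal one. */
		restricted_dma: restricted-dma-pool@40000000 {
			compatible = "restricted-dma-pool";
			/* Hypothetical carveout: 1 GiB starting at 1 GiB. */
			reg = <0x0 0x40000000 0x0 0x40000000>;
		};
	};

	soc {
		compatible = "simple-bus";
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/*
		 * Identical in both configurations: the bus can only reach
		 * this 1 GiB window, whatever the memory is used for.
		 */
		dma-ranges = <0x0 0x40000000 0x0 0x40000000 0x0 0x40000000>;

		device@10000000 {
			compatible = "vendor,hypothetical-device";
			reg = <0x0 0x10000000 0x0 0x1000>;
			/* "Secure" configuration only. */
			memory-region = <&restricted_dma>;
		};
	};
};
```

With the reserved-memory node and the memory-region reference removed,
what remains describes the normal configuration, and its dma-ranges
property is unchanged. That is the sense in which dma-ranges alone
cannot tell the kernel whether to take the region away from memblock.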