Subject: Re: [PATCH] iommu/vt-d: Fix scatterlist offset handling
To: David Woodhouse, joro@8bytes.org
Cc: ashok.raj@intel.com, leedom@chelsio.com, Harsh@chelsio.com,
 herbert@gondor.apana.org.au, iommu@lists.linux-foundation.org,
 linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
References: <644c3e01654f8bd48d669c36e424959d6ef0e27e.1506607370.git.robin.murphy@arm.com>
 <1507035334.29211.105.camel@infradead.org>
From: Robin Murphy
Date: Tue, 3 Oct 2017 19:05:17 +0100
In-Reply-To: <1507035334.29211.105.camel@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/10/17 13:55, David Woodhouse wrote:
> On Thu, 2017-09-28 at 15:14 +0100, Robin Murphy wrote:
>> The intel-iommu DMA ops fail to correctly handle scatterlists where
>> sg->offset is greater than PAGE_SIZE - the IOVA allocation is computed
>> appropriately based on the page-aligned portion of the offset, but the
>> mapping is set up relative to sg->page, which means it fails to actually
>> cover the whole buffer (and in the worst case doesn't cover it at all):
>>
>>     (sg->dma_address + sg->dma_len) ----+
>>     sg->dma_address ---------+          |
>>     iov_pfn------+           |          |
>>                  |           |          |
>>                  v           v          v
>> iova:   a        b        c        d        e        f
>>         |--------|--------|--------|--------|--------|
>>                           <...calculated....>
>>                  [_____mapped______]
>> pfn:    0        1        2        3        4        5
>>         |--------|--------|--------|--------|--------|
>>                  ^           ^          ^
>>                  |           |          |
>>     sg->page ----+           |          |
>>     sg->offset --------------+          |
>>     (sg->offset + sg->length) ----------+
>
> I'd still dearly love to see some clear documentation of what it means
> for sg->offset to be outside the page referenced by sg->page.

I think the key is that for each SG segment, sg->page doesn't necessarily
represent "a" page, but the first of one or more contiguous pages.
Disregarding offsets for the moment, here's a typical example of a 120KB
buffer from the block layer as processed by iommu_dma_map_sg():

[   16.092649] == initial (4) ==
[   16.095591] 0: virt ffff800001372000 phys 0x0000000081372000 dma 0x0000000000000000
[   16.095591]    offset 0x00000000 length 0x0000e000 dma_len 0x00000000
[   16.109541] 1: virt ffff800001380000 phys 0x0000000081380000 dma 0x0000000000000000
[   16.109541]    offset 0x00000000 length 0x0000d000 dma_len 0x00000000
[   16.123491] 2: virt ffff80000138e000 phys 0x000000008138e000 dma 0x0000000000000000
[   16.123491]    offset 0x00000000 length 0x00002000 dma_len 0x00000000
[   16.137440] 3: virt ffff800001390000 phys 0x0000000081390000 dma 0x0000000000000000
[   16.137440]    offset 0x00000000 length 0x00001000 dma_len 0x00000000
[   16.216167] == final (2) ==
[   16.219106] 0: virt ffff800001372000 phys 0x0000000081372000 dma 0x00000000ffb60000
[   16.219106]    offset 0x00000000 length 0x0000e000 dma_len 0x0000e000
[   16.233056] 1: virt ffff800001380000 phys 0x0000000081380000 dma 0x00000000ffb70000
[   16.233056]    offset 0x00000000 length 0x0000d000 dma_len 0x00010000

i.e. segments of 14 pages, 13 pages, 2 pages and 1 page respectively
(and we further merge the resulting DMA-contiguous segments on top of
that).

Now, there are indeed plenty of drivers and subsystems which do work on
lists of explicitly single pages - anything doing some variant of
"addr = kmap_atomic(sg_page(sg)) + sg->offset;" is easy to spot - but I
don't think DMA API implementations are in a position to make any kind
of assumption; nearly all of them just shut up and handle sg->length
bytes from sg_phys(sg) without questioning the caller, and I reckon
that's exactly what they should be doing.

> Or is it really not "outside", and it's *only* valid for the offset to
> be > PAGE_OFFSET when it's a huge page, so we can check that with a
> BUG_ON() ?
>
> In particular, I'd like to know what is intended in the Xen PV case,
> where there isn't a straight correspondence between pfn and mfn. Is the
> out-of-range sg->offset intended to refer to the next *pfn* after
> sg->page, or to the next *mfn* after sg->page?

Logically, it should mean the same thing as whatever a length of more
than 1 page means to Xen - judging by blkif_queue_rw_req() at least,
that seems to be a BUG_ON() in both cases.

> I confess I've only followed this thread vaguely, but I haven't seen a
> *coherent* explanation except in the huge page case (in which case I
> want to see that BUG_ON in the patch) of why this isn't just totally
> bogus.

As I've said before, I'd certainly consider it a denormalised case, but
not a bogus one, and certainly not something that is the DMA API's job
to police. Having now audited every dma_map_ops::map_sg implementation I
could find, the only ones not using sg_phys()/sg_virt() or some other
construction immune to the absolute offset value (MIPS even explicitly
normalises it) are intel-iommu and arch/frv, and the latter is clearly
broken anyway as it ignores sg->length.

Robin.
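
For anyone who wants the "handle sg->length bytes from sg_phys(sg)"
point in concrete terms, here is a rough userspace sketch in C. It is
not the intel-iommu code and not the patch itself: struct seg and the
seg_*() helpers are hypothetical stand-ins for struct scatterlist and
sg_phys(), and a 4KB page size is assumed. It only illustrates that
deriving the start pfn and page count from the physical start address
gives the same answer whether or not the offset has been normalised.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                    /* assumption: 4KB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Hypothetical stand-in for the scatterlist fields discussed above. */
struct seg {
	uint64_t page_phys;   /* physical address of the page sg->page names */
	unsigned int offset;  /* like sg->offset, may exceed PAGE_SIZE */
	unsigned int length;  /* like sg->length */
};

/* Analogue of sg_phys(): physical address of the first byte of the buffer. */
static uint64_t seg_phys(const struct seg *s)
{
	return s->page_phys + s->offset;
}

/* First page frame the mapping has to cover, whatever the offset is. */
static uint64_t seg_first_pfn(const struct seg *s)
{
	return seg_phys(s) >> PAGE_SHIFT;
}

/* Number of pages touched by [seg_phys, seg_phys + length). */
static uint64_t seg_nr_pages(const struct seg *s)
{
	uint64_t mask = ~(uint64_t)(PAGE_SIZE - 1);
	uint64_t start = seg_phys(s) & mask;
	uint64_t end = (seg_phys(s) + s->length + PAGE_SIZE - 1) & mask;

	return (end - start) >> PAGE_SHIFT;
}

/* Fold whole pages of the offset into the page address ("normalise"). */
static void seg_normalise(struct seg *s)
{
	uint64_t before = seg_phys(s);

	s->page_phys += (uint64_t)(s->offset >> PAGE_SHIFT) << PAGE_SHIFT;
	s->offset &= PAGE_SIZE - 1;

	/* The buffer itself must not move. */
	assert(seg_phys(s) == before);
}
```

With page_phys = 1 * PAGE_SIZE, offset = PAGE_SIZE + 0x400 and
length = PAGE_SIZE + 0x200 (roughly the situation in the diagram),
seg_first_pfn() is 2 and seg_nr_pages() is 2, i.e. pfns 2 and 3. A
mapping that starts from the pfn of the page itself would cover pfns 1
and 2 and miss the tail of the buffer. After seg_normalise() the segment
has page_phys = 2 * PAGE_SIZE and offset = 0x400, and both helpers
return the same values as before.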