From: Daniel Vetter
Date: Thu, 26 Apr 2018 11:45:26 +0200
Subject: Re: noveau vs arm dma ops
To: Christoph Hellwig
Cc: Thierry Reding, Christian König,
 "moderated list:DMA BUFFER SHARING FRAMEWORK",
 Linux Kernel Mailing List, amd-gfx list, Jerome Glisse, dri-devel,
 Dan Williams, Logan Gunthorpe,
 "open list:DMA BUFFER SHARING FRAMEWORK",
 iommu@lists.linux-foundation.org, Linux ARM
In-Reply-To: <20180426090942.GA18811@infradead.org>
References: <20180425054855.GA17038@infradead.org>
 <20180425064335.GB28100@infradead.org>
 <20180425074151.GA2271@ulmo>
 <20180425085439.GA29996@infradead.org>
 <20180425100429.GR25142@phenom.ffwll.local>
 <20180425153312.GD27076@infradead.org>
 <20180426090942.GA18811@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 26, 2018 at 11:09 AM, Christoph Hellwig wrote:
> On Wed, Apr 25, 2018 at 11:35:13PM +0200, Daniel Vetter wrote:
>> > get_required_mask() is supposed to tell you if you are safe.  However
>> > we are missing lots of implementations of it for iommus so you might get
>> > some false negatives, improvements welcome.  It's been on my list of
>> > things to fix in the DMA API, but it is nowhere near the top.
>>
>> It hasn't come up in a while in some fireworks, so I honestly don't
>> remember exactly what the issues have been.
>> But
>>
>> commit d766ef53006c2c38a7fe2bef0904105a793383f2
>> Author: Chris Wilson
>> Date:   Mon Dec 19 12:43:45 2016 +0000
>>
>>     drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping
>>
>> and the various bits of code that a
>>
>> $ git grep SWIOTLB -- drivers/gpu
>>
>> turns up is what we're doing to hack around that stuff. And in general
>> (there are some exceptions) gpus should be able to address everything,
>> so I never fully understood where that's even coming from.
>
> I'm pretty sure I've seen some oddly low dma masks in GPU drivers.  E.g.
> duplicated in various AMD files:
>
> 	adev->need_dma32 = false;
> 	dma_bits = adev->need_dma32 ? 32 : 40;
> 	r = pci_set_dma_mask(adev->pdev, DMA_BIT_MASK(dma_bits));
> 	if (r) {
> 		adev->need_dma32 = true;
> 		dma_bits = 32;
> 		dev_warn(adev->dev, "amdgpu: No suitable DMA available.\n");
> 	}
>
> synopsys:
>
> drivers/gpu/drm/bridge/synopsys/dw-hdmi-i2s-audio.c: pdevinfo.dma_mask = DMA_BIT_MASK(32);
> drivers/gpu/drm/bridge/synopsys/dw-hdmi.c: pdevinfo.dma_mask = DMA_BIT_MASK(32);
> drivers/gpu/drm/bridge/synopsys/dw-hdmi.c: pdevinfo.dma_mask = DMA_BIT_MASK(32);
>
> etnaviv gets it right:
>
> drivers/gpu/drm/etnaviv/etnaviv_gpu.c: u32 dma_mask = (u32)dma_get_required_mask(gpu->dev);
>
>
> But yes, the swiotlb hackery really irks me.  I just have some more
> important and bigger fires to fight first, but I plan to get back to the
> root cause of that eventually.
>
>> >> - dma api hides the cache flushing requirements from us. GPUs love
>> >> non-snooped access, and worse give userspace control over that. We want
>> >> a strict separation between mapping stuff and flushing stuff. With the
>> >> IOMMU api we mostly have the former, but for the latter arch maintainers
>> >> regularly tell us they won't allow that. So we have drm_clflush.c.
>> >
>> > The problem is that a cache flushing API entirely separate is hard. That
>> > being said if you look at my generic dma-noncoherent API series it tries
>> > to move that way.
>> > So far it is in early stages and apparently rather
>> > buggy unfortunately.
>>
>> I'm assuming this stuff here?
>>
>> https://lkml.org/lkml/2018/4/20/146
>>
>> Anyway got lost in all that work a bit, looks really nice.
>
> That url doesn't seem to work currently.  But I am talking about the
> thread titled '[RFC] common non-cache coherent direct dma mapping ops'
>
>> Yeah the above is pretty much what we do on x86. dma-api believes
>> everything is coherent, so dma_map_sg does the mapping we want and
>> nothing else (minus swiotlb fun). Cache flushing, allocations, all
>> done by the driver.
>
> Which sounds like the right thing to do to me.
>
>> On arm that doesn't work. The iommu api seems like a good fit, except
>> the dma-api tends to get in the way a bit (drm/msm apparently has
>> similar problems to tegra), and if you need contiguous memory
>> dma_alloc_coherent is the only way to get at contiguous memory. There
>> was a huge discussion years ago about that, and direct cma access was
>> shot down because it would have exposed too much of the caching
>> attribute mangling required (most arm platforms need wc-pages to not
>> be in the kernel's linear map apparently).
>
> Simple cma_alloc() doesn't do anything about cache handling, it
> just is a very dumb allocator for large contiguous regions inside
> a big pool.
>
> I'm not the CMA maintainer, but in general I'd love to see an
> EXPORT_SYMBOL_GPL slapped onto cma_alloc/release and drivers use
> that where needed.  Using that plus dma_map*/dma_unmap* sounds like
> a much saner interface than dma_alloc_attrs + DMA_ATTR_NON_CONSISTENT
> or DMA_ATTR_NO_KERNEL_MAPPING.
>
> You don't happen to have a pointer to that previous discussion?

I'll try to dig it up. I tried to stay as far away from that
discussion as possible (since I have the luxury to not care for intel
gpus).
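
To make the dma mask point above a bit more concrete, here is a small
userspace sketch of the arithmetic (not kernel code, and not how any
driver actually does it). DMA_BIT_MASK() is the same expression the
kernel macro uses; buf_fits_mask() is a made-up helper purely for
illustration. The idea is that a buffer sitting above 4 GiB does not
fit a 32-bit mask but does fit a 40-bit one, and the former is roughly
the situation where bounce buffering (swiotlb) can come into play.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same expression as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper (not a kernel API): returns true if the whole
 * buffer [addr, addr + size) is addressable under the given mask.
 */
static bool buf_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
	assert(size > 0);

	if (addr > mask)
		return false;

	/* Compare against the remaining room so addr + size cannot overflow. */
	return size - 1 <= mask - addr;
}
```

With this, buf_fits_mask(0x100000000ULL, 4096, DMA_BIT_MASK(32)) is
false while the same call with DMA_BIT_MASK(40) is true, which is the
difference between the 32 and 40 bit cases in the amdgpu snippet.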
>> Anything that separates these 3 things more (allocation pools, mapping
>> through IOMMUs and flushing cpu caches) sounds like the right
>> direction to me. Even if that throws some portability across platforms
>> away - drivers who want to control things in this much detail aren't
>> really portable (without some serious work) anyway.
>
> As long as we stay away from the dma coherent allocations separating
> them is fine, and I'm working towards it.  dma coherent allocations on
> the other hand are very hairy beasts, and provided by very different
> implementations depending on the architecture, or often even depending
> on the specifics of individual implementations inside the architecture.
>
> So for your GPU/media case it seems like you'd better stay away from
> them as much as you can.

Hm, at least on x86 we do set up write-combine mappings ourselves (by
calling relevant functions, which also change the kernel mapping to
avoid aliasing fun). Of course that means the gpu drivers are less
portable, but then they're not really all that portable anyway - the
buffer handling code always needs to be tuned for the platform you're
running on. GPUs really love write-combining everything (we do the
same for our mmio mappings to kernel/userspace, see stuff like
io-mapping.h). But in many other cases dma_alloc_coherent is only used
because it's the convenient/only way to allocate the memory we need.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch