Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp447137yba; Fri, 5 Apr 2019 09:44:49 -0700 (PDT) X-Google-Smtp-Source: APXvYqzHRw3h52yrq0rA8BqWIbfdzXTriu8ihRpBwdCjcWG3SHZAbVqgfAkfOqSuOZyDeKDxYuDm X-Received: by 2002:aa7:8145:: with SMTP id d5mr13552980pfn.215.1554482689565; Fri, 05 Apr 2019 09:44:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1554482689; cv=none; d=google.com; s=arc-20160816; b=iUevoOXbmWWDidQ7osfVkkoMih9f966LFLEIMNit2yymWkoZ50Btp6m+9Dbg2UoXdi tJ/+4o3jXEutu+Pj/F7UJYEFux8G+Y7rRS+hvdl8W+5U/+w2ItmbhQbaRhUsgRQPyPgo czgidrrhIrcUDZqMpTiCaVItWwnXfaKRol07T8l5B8uIE1G2S5TPSQKV/5vNJYxmZZji M2lOUUgWFRUujiJV+wzJadvzwsGA7B3u3lf/6h9zF3uryCnPNDmKNUeTqsdPKPag1ysN NO9kh0IClF7ieGm1In/hC75Bl5Bjxq3k8ZL9aLUtMv+5BDMI7FU1F7igodacEzE0Rl++ f3Gg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=vH7k4NScGxX+PZ2Q4Xnqz0ADhGw7qSOZY2XWeo4wwzY=; b=dOcz2VDHMreDq/M/HxIjXiotZwPAoP2DyWX3iQlD/1tpNFpk8XA4n27gSUPWW6F15k glPrw5JmdRI8bvt37grMGl/Jzitk4BSOe9lQoedbyZqIl6mD12wCk45+8m5eqFnazwkZ 57l1OeyMuWP0n45C0hqB97fdDDOM7eFVqJrI0FXtgaXIJzRwZQH2u2zUO326dle/XGkk vS4aRbsizkf/NWIp0xad0iYcHCOrfk0tW+xUAxT/bjUFdwDJiSqNYNJz1R4BUEI2hJnk PFb/DzsJoqK3yo7j82a7kAUGqOYdyHkwu3pTO9Oc32A8AtcUzGnhfdjfzXA3nZv5xCcZ z5YA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f63si13156348pff.107.2019.04.05.09.44.34; Fri, 05 Apr 2019 09:44:49 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731474AbfDEQmi (ORCPT + 99 others); Fri, 5 Apr 2019 12:42:38 -0400 Received: from usa-sjc-mx-foss1.foss.arm.com ([217.140.101.70]:52966 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730565AbfDEQmi (ORCPT ); Fri, 5 Apr 2019 12:42:38 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A6BF8168F; Fri, 5 Apr 2019 09:42:37 -0700 (PDT) Received: from [10.1.196.69] (e112269-lin.cambridge.arm.com [10.1.196.69]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6CCA93F68F; Fri, 5 Apr 2019 09:42:35 -0700 (PDT) Subject: Re: [PATCH v2 3/3] drm/panfrost: Add initial panfrost driver To: Alyssa Rosenzweig Cc: Rob Herring , Tomeu Vizoso , Neil Armstrong , Maxime Ripard , Robin Murphy , Will Deacon , linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, David Airlie , iommu@lists.linux-foundation.org, "Marty E . Plummer" , Sean Paul , linux-arm-kernel@lists.infradead.org References: <20190401074730.12241-1-robh@kernel.org> <20190401074730.12241-4-robh@kernel.org> <5efdc3cb-7367-65e1-d1bf-14051db5da10@arm.com> <20190405161632.GA9160@rosenzweig.io> From: Steven Price Message-ID: <34a7038e-34f0-0cc4-4fc4-9b7dda356df6@arm.com> Date: Fri, 5 Apr 2019 17:42:33 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1 MIME-Version: 1.0 In-Reply-To: <20190405161632.GA9160@rosenzweig.io> Content-Type: text/plain; charset=utf-8 Content-Language: en-GB Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 05/04/2019 17:16, Alyssa Rosenzweig wrote: >> I'm also somewhat surprised that you don't need loads of other >> properties from the GPU - in particular knowing the number of shader >> cores is useful for allocating the right amount of memory for TLS (and >> can't be obtained purely from the GPU_ID). > > Since I have no idea what TLS is (and in my notes, I've come across the Sorry - "Thread Local Storage" - e.g. registers spilled to memory from a shader program. > acronym once ever and have it as a "??"), I'm not sure how to respond to > that... We don't know how to allocate memory for the GPU-internal data > structures (the tiler heap, for instance, but also a few others I've > just named "misc_0" and "scratchpad" -- guessing one of those is for > "TLS"). With kbase, I took the worst-case strategy of allocating > gigantic chunks on startup with tiny commit counts and GROW_ON_GPF set. > With the new driver, well, our memory consumption is scary since > implementing GROW_ON_GPF in an upstream-friendly way is a bit more work > and isn't expected to hit the 5.2 window. Yes GROW_ON_GPF is pretty much required for the tiler heap - it's not (reasonably) possible to determine how big it should be. The Arm user space driver does the same approach (tiny commit count, but allow it to grow). > > Given this is kernel facing, I'm hoping 're able to share some answers > here? At the moment I don't have any permission to share details which aren't already public in the kbase driver. Hopefully that situation will change. I'm also very much not an expert on anything but the kernel driver (I tried to stay away from shader compilers and all that graphics knowledge...). The details of the job descriptors is only really publicly documented in terms of the "replay workaround" which is quite limited. > >> I think this comment might have survived since the very earliest version >> of the Midgard driver! :) > > ^_^ > >> But I'm not sure anything will attempt to lock a region spanning two >> pages like that. > > At least at the moment, I align everything kernel-facing to page > granularity in userspace, so it's not a cornercase I'm going to hit > anytime soon. Still probably better to have it technically correct. > >> To be fair only for BASE_HW_ISSUE_6367/T60X - but yes it's not a >> pleasant workaround. There's no way on that hardware to reliably drain >> the write buffer other than waiting. > > *wishing T60X disappeared intensifies* ;) I think we all felt like that :) Still the Nexus 10 wasn't a bad tablet, and the Chromebook was an exciting first! > Granted there are enough other errata specific to it that aren't worked > around here that, well, it makes you wonder ;) A lot of the errata are things you only hit with soak testing. So to a large extent you "get lucky". >> Do we have a good way of user space determining which requirements are >> supported by the driver? At the moment it's just the one. kbase outgeew >> the original u16 and has an ugly "compat_core_req", so I suspect you're >> going to need to add several more along the way. > > Oh, so that's why compat_/core_req is split off! I thought somebody just > thought it would be "fun" to break the UABI ;) No that's a case of us actually not breaking the UABI for once :) > I've definitely issues using the wrong core_req field for the kbase I > had setup, that set me back a little bit on RK3399/T860 bringup *purses > lips* > > To be fair, bunches of the kbase reqs are for soft jobs, which I don't > feel are a good fit for how the Panfrost kernel works. If we need to > implement functionality corresponding to softjobs, that would likely be > done with dedicated ioctl(s), instead of affecting the core_req field. > > On that note > >> You might also want to consider being able to submit more than one job >> chain at a time - but that could easily be implemented using a new ioctl >> in the future. > > The issue with that at the bottom is having to introduce something akin > to kbase's annoyingly intra-job-chain dependency management (read: I > still don't understand how FBOs are supposed to work with kbase ;) ), > which AFAIK we push off to userspace right now via standard fencing. If > we want to submit batches at a time, we would potentially need to > express those somewhat complex dependency trees, which is a lot of work > for diminishing returns at this stage. Future ioctl indeed... You should be able to express the dependencies using fences. At the time kbase was started there was no fence mechanism in the kernel. We invented horrible things like UMP[1] and KDS[2] for cross-driver sharing. It all comes down to how small your job chains are - if you don't need to squeeze too many through the hardware you should be fine. But there's going to be some performance gain to be had implementing it. [1] I forget what it actually stands for, but was an attempt to do something like dma_buf [2] "Kernel Dependency System" - a not-so-good version of dma_fence >> There's no SUBMIT_CL in this posting? I think you just need s/_CL//. > > +1 > >> You are probably going to want flags for at least: >> >> * No execute/No read/No write (good for security, especially with >> buffer sharing between processes) >> >> * Alignment (shader programs have quite strict alignment requirements, >> I believe kbase just ensures that the shader memory block doesn't cross >> a 16MB boundary which covers most cases). >> >> * Page fault behaviour (kbase has GROW_ON_GPF) >> >> * Coherency management > > +1 for all of these. This is piped through in userspace (for kbase), but > the corresponding functionality isn't there yet in the Panfrost kernel. > You're right there should at least be a flags field for future use. > >> One issue that I haven't got to the bottom of is that I can trigger a >> lockdep splat: > > Oh, "fun"... > >> This is with the below simple reproducer: > > @Rob, ideas? > >> Other than that in my testing (on a Firefly RK3288) I didn't experience >> any problems pushing jobs from the ARM userspace blob through it. > > Nice! > > Besides what was mentioned above, any other functionality you'll need > for that? (e.g. the infamous replay workaround...) If you don't implement the replay workaround I'm very happy :) The main missing part for the Arm user space is feature registers. That and the lack of SAME_VA is horrible to emulate (keep allocating until it happens to land in a free area of user space memory). Arm user space also makes use of cached memory with explicit cache sync operations. It of course works fine with uncached and ignoring the sync, but again I'm not sure how much performance is being lost. Steve