Received: by 2002:a05:7412:3784:b0:e2:908c:2ebd with SMTP id jk4csp2155331rdb; Tue, 3 Oct 2023 11:57:49 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEEKk+4cjXPJQBA8ttwVuaCGY/RmdA3L/p0CQ9p/NHgR+73luKV5b8tp/C8b/ORYoxZ3Ces X-Received: by 2002:a05:6870:d202:b0:1bf:7b3:5116 with SMTP id g2-20020a056870d20200b001bf07b35116mr494349oac.47.1696359468939; Tue, 03 Oct 2023 11:57:48 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1696359468; cv=none; d=google.com; s=arc-20160816; b=J5uATq/zw/TVYKqERnwe7ETsYoFCfvl4K4Z6VcpMZhOZ0Ztc8J5bamT+Qs3CXrqdim iPEdNvnUyTjtfFKNlvMRxB7Rcy/1s2rLYEHNoW3yVOAN6Johq0TEiYg7AwWHZcdDqnsb oYNRGEtlK6E/s0aCMqlnIqaUJayDS6dWovVA9mlbzgOjJOELITk8yxrXqJJnmLr8ubWF Kd4ckysZFRiEs5TTiQOBeimuEDfKPZ/nn2AEvYzyodr6PN6U20DsY69SMhcvGpA7MdzC if6zbZ4ytzNyOfYfKmosZqiHJpt+LN7rNaB09bluN5d3ahqic9SgwfV8Sq5Z4qa1zcUl pZYw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=ABQ9coSbOi5xSWoH2nI+Dck8SPnQ1D9d6r3leGWAvt0=; fh=+bvfTsi1YD5sTECElJJLznX0jquzC17rxrCLPX+l+QY=; b=SOBnPrf4M2EoCZkmUo8yQ/rPb7425KmBgD4qYXh3ZqRw4LRyeZzCLP+kExOaYBPU3H piMKQUBuLP3/LS5jPcGlgQyAynIijLhJqu6FWhZdMwxRQ6heMwhSJqVpFFzBJULJX7Az R9nzIIjlTeJc5qoBxJznFwS8uw9BrUvHu6q0Tt7GL+Gtr6SlaRdeTnb2RfIGR4Ft8skv ucXdOXb8qoJMuuk3sp6PliW4Lg52BK2UjxOZNCIY4p3L1WklvhE0VMwXmx5SHeIh88Db KLCYD3DZEDEwfD2JIEtMmaktnMPeoGKON04rTfl79/Wp6VjelqgD12BF3OltMbr4/h6I VDpA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RMt24Z2s; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id p21-20020a637415000000b00573f94e8b83si1966420pgc.265.2023.10.03.11.57.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Oct 2023 11:57:48 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=RMt24Z2s; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 299FA806533D; Tue, 3 Oct 2023 11:57:46 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231376AbjJCS5k (ORCPT + 99 others); Tue, 3 Oct 2023 14:57:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42290 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232106AbjJCS5j (ORCPT ); Tue, 3 Oct 2023 14:57:39 -0400 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.9]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C22959B for ; Tue, 3 Oct 2023 11:57:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1696359456; x=1727895456; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=2RmLHXlDrvpms2MO/iCl0m6YdD9Afj1b4iuJowttADI=; b=RMt24Z2satFHFN/hr7GO/jCd5AAgWoPDruZse2W1xA1ZLPEBczb12FXC dDXEYIBGZEWyM440k6lsLR22h6zv3Seqm3ejhLNPTyzVDGDIEmPiNzSvr 38c8/MjKJpaG+q55n77dfH4bp0CIvoqrSgvURAwjEL1ecl/nuEIkxz7R3 +iCgK0KkeQ17A0WGh9gvA9P/OIjoQ3lXcfRO7oRi93C4begs4Dv3JGhHR 8255SVFJnWEeKkgVAOTrfB2ukRmM7u6A3Zr71OooNkrROTNQRqbqLG4Se l9uSNrMRtvK5DgVN1wnyTqjRco5gWtwuZ4kaDR1H6GhWx5mP3eyNl98ME Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10852"; a="1549143" X-IronPort-AV: E=Sophos;i="6.03,198,1694761200"; d="scan'208";a="1549143" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orvoesa101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Oct 2023 11:57:35 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10852"; a="816808372" X-IronPort-AV: E=Sophos;i="6.03,198,1694761200"; d="scan'208";a="816808372" Received: from fhoeg-mobl1.ger.corp.intel.com (HELO [10.249.254.234]) ([10.249.254.234]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Oct 2023 11:57:31 -0700 Message-ID: Date: Tue, 3 Oct 2023 20:57:29 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects Content-Language: en-US To: Danilo Krummrich , Boris Brezillon Cc: airlied@gmail.com, daniel@ffwll.ch, matthew.brost@intel.com, sarah.walker@imgtec.com, donald.robson@imgtec.com, christian.koenig@amd.com, faith@gfxstrand.net, dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org References: <20230928191624.13703-1-dakr@redhat.com> <20230928191624.13703-5-dakr@redhat.com> <20231003120554.547090bc@collabora.com> <20231003162143.490e3ef0@collabora.com> From: =?UTF-8?Q?Thomas_Hellstr=c3=b6m?= In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Tue, 03 Oct 2023 11:57:46 -0700 (PDT) On 10/3/23 18:55, Danilo Krummrich wrote: > It seems like we're mostly aligned on this series, except for the key > controversy we're discussing for a few versions now: locking of the internal > lists. Hence, let's just re-iterate the options we have to get this out of the > way. > > (1) The spinlock dance. This basically works for every use case, updating the VA > space from the IOCTL, from the fence signaling path or anywhere else. > However, it has the downside of requiring spin_lock() / spin_unlock() for > *each* list element when locking all external objects and validating all > evicted objects. Typically, the amount of extobjs and evicted objects > shouldn't be excessive, but there might be exceptions, e.g. Xe. > > (2) The dma-resv lock dance. This is convinient for drivers updating the VA > space from a VM_BIND ioctl() and is especially efficient if such drivers > have a huge amount of external and/or evicted objects to manage. However, > the downsides are that it requires a few tricks in drivers updating the VA > space from the fence signaling path (e.g. job_run()). Design wise, I'm still > skeptical that it is a good idea to protect internal data structures with > external locks in a way that it's not clear to callers that a certain > function would access one of those resources and hence needs protection. > E.g. it is counter intuitive that drm_gpuvm_bo_put() would require both the > dma-resv lock of the corresponding object and the VM's dma-resv lock held. > (Additionally, there were some concerns from amdgpu regarding flexibility in > terms of using GPUVM for non-VM_BIND uAPIs and compute, however, AFAICS > those discussions did not complete and to me it's still unclear why it > wouldn't work.) > > (3) Simply use an internal mutex per list. This adds a tiny (IMHO negligible) > overhead for drivers updating the VA space from a VM_BIND ioctl(), namely > a *single* mutex_lock()/mutex_unlock() when locking all external objects > and validating all evicted objects. And it still requires some tricks for > drivers updating the VA space from the fence signaling path. However, it's > as simple as it can be and hence way less error prone as well as > self-contained and hence easy to use. Additionally, it's flexible in a way > that we don't have any expections on drivers to already hold certain locks > that the driver in some situation might not be able to acquire in the first > place. > > (4) Arbitrary combinations of the above. For instance, the current V5 implements > both (1) and (2) (as either one or the other). But also (1) and (3) (as in > (1) additionally to (3)) would be an option, where a driver could opt-in for > the spinlock dance in case it updates the VA space from the fence signaling > path. > > I also considered a few other options as well, however, they don't seem to be > flexible enough. For instance, as by now we could use SRCU for the external > object list. However, this falls apart once a driver wants to remove and re-add > extobjs for the same VM_BO instance. (For the same reason it wouldn't work for > evicted objects.) > > Personally, after seeing the weird implications of (1), (2) and a combination of > both, I tend to go with (3). Optionally, with an opt-in for (1). The reason for > the latter is that with (3) the weirdness of (1) by its own mostly disappears. > > Please let me know what you think, and, of course, other ideas than the > mentioned ones above are still welcome. > > - Danilo > Here are the locking principles Daniel put together and Dave once called out for us to be applying when reviewing DRM code. These were prompted by very fragile and hard to understand locking patterns in the i915 driver and I think the xe vm_bind locking design was made with these in mind, (not sure exactly who wrote what, though so can't say for sure). https://blog.ffwll.ch/2022/07/locking-engineering.html https://blog.ffwll.ch/2022/08/locking-hierarchy.html At least to me, this motivates using the resv design unless we strictly need lower level locks that are taken in the eviction paths or userptr invalidation paths, but doesn't rule out spinlocks or lock dropping tricks where these are really necessary. But pretty much rules out RCU / SRCU from what I can tell. It also calls for documenting how individual members of structs are protected when ever possible. Thanks, Thomas