Received: by 2002:a19:771d:0:0:0:0:0 with SMTP id s29csp1242125lfc; Wed, 1 Jun 2022 12:50:33 -0700 (PDT) X-Google-Smtp-Source: ABdhPJw59SSwX9J/E0oNb/gQf0MnBRb8fR+WAqDr4Vss2jTJzrjDIcaDwJ0h1EGWtg+qlvEa8RkW X-Received: by 2002:a17:902:728a:b0:166:3b4f:f6eb with SMTP id d10-20020a170902728a00b001663b4ff6ebmr839537pll.16.1654113033393; Wed, 01 Jun 2022 12:50:33 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1654113033; cv=none; d=google.com; s=arc-20160816; b=Vpnf8rGfzZZLi2r5zdjecQGHso5FmTgF+Rf8AUN6+dp4cgMwoeUSRiZ/5vupI+yYHE Gf9inaTXOHREEAiEPtoFImvdqI7I6BbWPq9bbNluj2++u2gYI1MMsHQ7bG+56gi98KB0 gUtUdYvWQb3QI4dtndQisdpc8kh/JUY3zswZvod4OegfDXBPJr4VIEYGGRnzsNoxte6k +fMzBaISUkRC9uvalqv5U1IJs1I1nHPzVDaTMhYStBRzgdgqG2rsfS946+8oTUFgT/RA 6+CPI1W0Nx9cOAwxkbGew0uQ1Bg1BT8dgNTOZCovP/7Coh5Yppt79VbT8EGnKf9VL8Y2 ID+Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=wr9xpxSibC0ctyCpjnvowaocMYx7SBWLKjpEyTnQD44=; b=khjvjxxeHvkpSlGv171F3WUK0Q8kM8bLLoa40YZ6uQyJxdAK4/zCzZOhcNQbWKHvqs SAad3nezfrjxgjc3G8cnBU/YlrnhEAS0X7c5l4Z7tDkF5S/izd514rUN6rj0kK+3JD5b rspacb766iYMi/bTBGiWOnjDTP1OjIvqTLBp8W2pz6lbuBhZeZOjEvDQkdylqHTUR3SU 2T6u3MKUiOUC/korIpUxadzgh2IO+DGn63GSJIRe2CvRBoz0LLsYEzpGgmc6i1fFTUQE 9gb23lsxRaO3hExzoXtSji5vg93fDLyAkIYW+xGirTfce54o9uP8G6lu9yFgzc9lEYf0 YUMw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=PvlmpA2r; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id b6-20020a170902d88600b00158eabc46casi3340057plz.604.2022.06.01.12.50.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Jun 2022 12:50:33 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=PvlmpA2r; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1279E1F5AD5; Wed, 1 Jun 2022 12:13:12 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231274AbiEaXbb (ORCPT + 99 others); Tue, 31 May 2022 19:31:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34022 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S241462AbiEaXb2 (ORCPT ); Tue, 31 May 2022 19:31:28 -0400 Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 770979A98B for ; Tue, 31 May 2022 16:31:26 -0700 (PDT) Received: by mail-ed1-x534.google.com with SMTP id n28so10708296edb.9 for ; Tue, 31 May 2022 16:31:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=wr9xpxSibC0ctyCpjnvowaocMYx7SBWLKjpEyTnQD44=; b=PvlmpA2rjw0N8/GZkXsGQSPJ+gcFfBQQagwSoQXhhsGWaWkC4/aViDXoh7vjWD2eMO PoypXG4Mwckqsh5TG/Ju9Baovrq2i16PZVUv7d5Djd5yr3LFRSyoWcy11iOcaS7Hjfj2 b4d01OclJMNng7WSUtYzSLPHRqnPtig5Ygvs+56tpXaVeQ5NHj43hL+6qrDoh5FGOYhI k97kD4qqlanzUJ5rYdQg7avhqu0ICGXWLZV1rvIbZISX6MLx8Fftp16BlLkYtqXjwlCz S8D7zJUw1ef7Qa6oqcvNvnYS5L4h/DR7QPjaW+Ets1RZf5ftjXdx46Cnsu0lMe7jsV0J SYtw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=wr9xpxSibC0ctyCpjnvowaocMYx7SBWLKjpEyTnQD44=; b=BpZQTc1FAXoEaf0i2iXCj30XSYeO1URiCH0oWrc2rGfpkTrBhSWM9492MNGgFzqHeC 8LXgOV0l0SzKhagysSy9hfPR0quqXXB0NRfdQglkMRbvL94TABRL4F4/Ta2mzcZNUF/l t/MYjI9rc/U1SKf1EI38QHzNTMN1hNNyWb60GoZJGhr9EFGPeP+j/3YCIfThcPZyVu7q IjLpN7JY/ndKrSp3grpJOVMkCa/9p4ZcDs092FovwslIQzAM41mKFAWFtkBRG2ol3AEF o3XCbJ8It/sQIDX6KEYwzSmqZxGYaoRquEGmg0MwRwg6ChscBrdqt21obJxuLWsbSL0E JbJA== X-Gm-Message-State: AOAM530Mucx/httGXe5jUMr/GV6h7kjHiM6fIh6wMv1+ZFqvJrqL1Jzq 7HU1Va8MUIgz7kqltO1jR81LS//Lg0z1yWjk7uCgQQ== X-Received: by 2002:a05:6402:354c:b0:42b:4e22:203b with SMTP id f12-20020a056402354c00b0042b4e22203bmr52476539edd.308.1654039884720; Tue, 31 May 2022 16:31:24 -0700 (PDT) MIME-Version: 1.0 References: <20220516171315.2400578-1-tjmercier@google.com> <175c5af3-9224-9c8e-0784-349dad9a2954@amd.com> <0875fa95-3a25-a354-1433-201fca81ed3e@amd.com> <38da6dcd-b395-f32f-5a47-6a8f2c6a4331@amd.com> In-Reply-To: <38da6dcd-b395-f32f-5a47-6a8f2c6a4331@amd.com> From: "T.J. Mercier" Date: Tue, 31 May 2022 16:31:13 -0700 Message-ID: Subject: Re: [PATCH v2] dma-buf: Move sysfs work out of DMA-BUF export path To: =?UTF-8?Q?Christian_K=C3=B6nig?= Cc: Greg Kroah-Hartman , Suren Baghdasaryan , Kalesh Singh , Minchan Kim , Greg Kroah-Hartman , John Stultz , Sumit Semwal , Hridya Valsaraju , kernel-team@android.com, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org, Daniel Vetter Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-9.5 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, May 29, 2022 at 11:12 PM Christian K=C3=B6nig wrote: > > Am 25.05.22 um 23:05 schrieb T.J. Mercier: > > On Wed, May 25, 2022 at 7:38 AM Daniel Vetter wrote: > >> On Tue, May 17, 2022 at 08:13:24AM +0200, Greg Kroah-Hartman wrote: > >>> On Mon, May 16, 2022 at 05:08:05PM -0700, T.J. Mercier wrote: > >>>> On Mon, May 16, 2022 at 12:21 PM Christian K=C3=B6nig > >>>> wrote: > >>>>> Am 16.05.22 um 20:08 schrieb T.J. Mercier: > >>>>>> On Mon, May 16, 2022 at 10:20 AM Christian K=C3=B6nig > >>>>>> wrote: > >>>>>>> Am 16.05.22 um 19:13 schrieb T.J. Mercier: > >>>>>>>> Recently, we noticed an issue where a process went into direct r= eclaim > >>>>>>>> while holding the kernfs rw semaphore for sysfs in write (exclus= ive) > >>>>>>>> mode. This caused processes who were doing DMA-BUF exports and r= eleases > >>>>>>>> to go into uninterruptible sleep since they needed to acquire th= e same > >>>>>>>> semaphore for the DMA-BUF sysfs entry creation/deletion. In orde= r to avoid > >>>>>>>> blocking DMA-BUF export for an indeterminate amount of time whil= e > >>>>>>>> another process is holding the sysfs rw semaphore in exclusive m= ode, > >>>>>>>> this patch moves the per-buffer sysfs file creation to the defau= lt work > >>>>>>>> queue. Note that this can lead to a short-term inaccuracy in the= dmabuf > >>>>>>>> sysfs statistics, but this is a tradeoff to prevent the hot path= from > >>>>>>>> being blocked. A work_struct is added to dma_buf to achieve this= , but as > >>>>>>>> it is unioned with the kobject in the sysfs_entry, dma_buf does = not > >>>>>>>> increase in size. > >>>>>>> I'm still not very keen of this approach as it strongly feels lik= e we > >>>>>>> are working around shortcoming somewhere else. > >>>>>>> > >>>>>> My read of the thread for the last version is that we're running i= nto > >>>>>> a situation where sysfs is getting used for something it wasn't > >>>>>> originally intended for, but we're also stuck with this sysfs > >>>>>> functionality for dmabufs. > >>>>>> > >>>>>>>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-B= UF stats in sysfs") > >>>>>>>> Originally-by: Hridya Valsaraju > >>>>>>>> Signed-off-by: T.J. Mercier > >>>>>>>> > >>>>>>>> --- > >>>>>>>> See the originally submitted patch by Hridya Valsaraju here: > >>>>>>>> https://nam11.safelinks.protection.outlook.com/?url=3Dhttps%3A%2= F%2Flkml.org%2Flkml%2F2022%2F1%2F4%2F1066&data=3D05%7C01%7Cchristian.ko= enig%40amd.com%7C8f00afd44b9744c45f5708da3e926503%7C3dd8961fe4884e608e11a82= d994e183d%7C0%7C0%7C637891095771223650%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4w= LjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&= sdata=3DpubWqUyqhCWpXHhJHsoqarc3GLtB6IFB1rhgfsL4a1M%3D&reserved=3D0 > >>>>>>>> > >>>>>>>> v2 changes: > >>>>>>>> - Defer only sysfs creation instead of creation and teardown per > >>>>>>>> Christian K=C3=B6nig > >>>>>>>> > >>>>>>>> - Use a work queue instead of a kthread for deferred work per > >>>>>>>> Christian K=C3=B6nig > >>>>>>>> --- > >>>>>>>> drivers/dma-buf/dma-buf-sysfs-stats.c | 56 +++++++++++++++++= +++------- > >>>>>>>> include/linux/dma-buf.h | 14 ++++++- > >>>>>>>> 2 files changed, 54 insertions(+), 16 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma= -buf/dma-buf-sysfs-stats.c > >>>>>>>> index 2bba0babcb62..67b0a298291c 100644 > >>>>>>>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c > >>>>>>>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c > >>>>>>>> @@ -11,6 +11,7 @@ > >>>>>>>> #include > >>>>>>>> #include > >>>>>>>> #include > >>>>>>>> +#include > >>>>>>>> > >>>>>>>> #include "dma-buf-sysfs-stats.h" > >>>>>>>> > >>>>>>>> @@ -168,10 +169,46 @@ void dma_buf_uninit_sysfs_statistics(void) > >>>>>>>> kset_unregister(dma_buf_stats_kset); > >>>>>>>> } > >>>>>>>> > >>>>>>>> +static void sysfs_add_workfn(struct work_struct *work) > >>>>>>>> +{ > >>>>>>>> + struct dma_buf_sysfs_entry *sysfs_entry =3D > >>>>>>>> + container_of(work, struct dma_buf_sysfs_entry, sys= fs_add_work); > >>>>>>>> + struct dma_buf *dmabuf =3D sysfs_entry->dmabuf; > >>>>>>>> + > >>>>>>>> + /* > >>>>>>>> + * A dmabuf is ref-counted via its file member. If this ha= ndler holds the only > >>>>>>>> + * reference to the dmabuf, there is no need for sysfs kob= ject creation. This is an > >>>>>>>> + * optimization and a race; when the reference count drops= to 1 immediately after > >>>>>>>> + * this check it is not harmful as the sysfs entry will st= ill get cleaned up in > >>>>>>>> + * dma_buf_stats_teardown, which won't get called until th= e final dmabuf reference > >>>>>>>> + * is released, and that can't happen until the end of thi= s function. > >>>>>>>> + */ > >>>>>>>> + if (file_count(dmabuf->file) > 1) { > >>>>>>> Please completely drop that. I see absolutely no justification fo= r this > >>>>>>> additional complexity. > >>>>>>> > >>>>>> This case gets hit around 5% of the time in my testing so the else= is > >>>>>> not a completely unused branch. > >>>>> Well I can only repeat myself: This means that your userspace is > >>>>> severely broken! > >>>>> > >>>>> DMA-buf are meant to be long living objects > >>>> This patch addresses export *latency* regardless of how long-lived t= he > >>>> object is. Even a single, long-lived export will benefit from this > >>>> change if it would otherwise be blocked on adding an object to sysfs= . > >>>> I think attempting to improve this latency still has merit. > >>> Fixing the latency is nice, but as it's just pushing the needed work = off > >>> to another code path, it will take longer overall for the sysfs stuff= to > >>> be ready for userspace to see. > >>> > >>> Perhaps we need to step back and understand what this code is suppose= d > >>> to be doing. As I recall, it was created because some systems do not > >>> allow debugfs anymore, and they wanted the debugging information that > >>> the dmabuf code was exposing to debugfs on a "normal" system. Moving > >>> that logic to sysfs made sense, but now I am wondering why we didn't = see > >>> these issues in the debugfs code previously? > >>> > >>> Perhaps we should go just one step further and make a misc device nod= e > >>> for dmabug debugging information to be in and just have userspace > >>> poll/read on the device node and we spit the info that used to be in > >>> debugfs out through that? That way this only affects systems when th= ey > >>> want to read the information and not normal code paths? Yeah that's = a > >>> hack, but this whole thing feels overly complex now. > >> A bit late on this discussion, but just wanted to add my +1 that we sh= ould > >> either redesign the uapi, or fix the underlying latency issue in sysfs= , or > >> whatever else is deemed the proper fix. > >> > >> Making uapi interfaces async in ways that userspace can't discover is = a > >> hack that we really shouldn't consider, at least for upstream. All kin= ds > >> of hilarious things might start to happen when an object exists, but n= ot > >> consistently in all the places where it should be visible. There's a > >> reason sysfs has all these neat property groups so that absolutely > >> everything is added atomically. Doing stuff later on just because usua= lly > >> no one notices that the illusion falls apart isn't great. > >> > >> Unfortunately I don't have a clear idea here what would be the right > >> solution :-/ One idea perhaps: Should we dynamically enumerate the obj= ects > >> when userspace does a readdir()? That's absolutely not how sysfs works= , > >> but procfs works like that and there's discussions going around about > >> moving these optimizations to other kernfs implementations. At least t= here > >> was a recent lwn article on this: > >> > >> https://nam11.safelinks.protection.outlook.com/?url=3Dhttps%3A%2F%2Flw= n.net%2FArticles%2F895111%2F&data=3D05%7C01%7Cchristian.koenig%40amd.co= m%7C8f00afd44b9744c45f5708da3e926503%7C3dd8961fe4884e608e11a82d994e183d%7C0= %7C0%7C637891095771223650%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQI= joiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=3DQ58OZ= i79vmKMCZLL0pY7NniIW6hmSqyWjlEaZgqzYtM%3D&reserved=3D0 > >> > >> But that would be serious amounts of work I guess. > >> -Daniel > >> -- > >> Daniel Vetter" > >> Software Engineer, Intel Corporation > >> https://nam11.safelinks.protection.outlook.com/?url=3Dhttp%3A%2F%2Fblo= g.ffwll.ch%2F&data=3D05%7C01%7Cchristian.koenig%40amd.com%7C8f00afd44b9= 744c45f5708da3e926503%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C63789109= 5771223650%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJB= TiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=3DpOIl5yszzak4TPqjBYyL= 0mHjj%2F1nYRfNJbNPQTXBhbA%3D&reserved=3D0 > > Hi Daniel, > > > > My team has been discussing this, and I think we're approaching a > > consensus on a way forward that involves deprecating the existing > > uapi. > > > > I actually proposed a similar (but less elegant) idea to the readdir() > > one. A new "dump_dmabuf_data" sysfs file that a user would write to, > > which would cause a one-time creation of the per-buffer files. These > > could be left around to become stale, or get cleaned up after first > > read. However to me it seems impossible to correctly deal with > > multiple simultaneous users with this technique. We're not currently > > planning to pursue this. > > > > Thanks for the link to the article. That on-demand creation sounds > > like it would allow us to keep the existing structure and files for > > DMA-buf, assuming there is not a similar lock contention issue when > > adding a new node to the virtual tree. :) > > I think that this on demand creation is even worse than the existing > ideas, but if you can get Greg to accept the required sysfs changes than > that's at least outside of my maintenance domain any more :) Hah, ok. After chatting with Steven it sounds like an attempt at on demand creation for sysfs is not likely to happen soon, as the focus is on getting it working for tracefs first and letting it stew there for a while to polish it. I'll check with Greg when he's available again next week about whether that is a direction we should even consider before moving forward from here. > > > Regards, > Christian.