Received: by 2002:a6b:500f:0:0:0:0:0 with SMTP id e15csp419584iob; Wed, 18 May 2022 05:14:11 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxHkUAQ1GKFuykKFIGyTbrNu7XZbnKKV12ZjjuHfeZy2bMeU6gp0lGmxihBhZ5oAYB+ffLf X-Received: by 2002:a63:89c8:0:b0:3db:9da:797e with SMTP id v191-20020a6389c8000000b003db09da797emr23289740pgd.358.1652876051351; Wed, 18 May 2022 05:14:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1652876051; cv=none; d=google.com; s=arc-20160816; b=zdRfjJZn/wF9Nri/VyAPAqHk53z7IcEI/3iQ4GzasnTOW8s0+wYRFy6C+NgwxdJraI dmpHozaYCfNV0f5roFPiO63D7iRjNOQsfbaBV9CSKA18jcrI83TlWDl5LjPfxY5Arn8X lwpBy0K2ksCmF98aOBngcUaHzwPOGPPN9IhQ3NUT4uTUs5UCUOtga9QqNr64dV25erYi xB4Ey7xL37rI+JZ44YtLwTV1+3XANHvImzuXEUscws9jKxz+/Gv5BMOLSqv0jpVr8gTJ ro6do1RANedBNCi1tc3BjYtcs+QQhyd85NFfp0WRP5wtXzsfwnGXVjBUMs4xovuugIbU C8og== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Wuv0Ina3m25Wqo3zMRb3pJeGakYZ7Zhwpl9ULfN/YwY=; b=KZjP44/25v6Igdcs6hCf5CbLlwRQduCY9UGuDbErLJtGnlGCt3HvjMuDsC8Qp+g2iU a3WltIzPzfDUBbMN/pI03ycuHGFOHeDF6yzKEaL91BdyKBzYq2pjX9X6unYgnzA97kmq uD+8NuhisawEqJh68hZNoR6D+VZN/q9kWX6y8BIFFKDqJCDmO+3QOUMtWRfr2IZMkL/c 98RqUbiIKW05AxM9/jRFHzEAtpmm90tsKqnedFrJjtkib3/nf1+sdlRI8JDWVBdXaKXf Gl9Y0Xgr8oosmi3EG2EYT69Mj4eI2LnuSihd8P69lsCjOcZsxf/QwDayfIljtF00TCS2 J1wQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=t7TyWoBQ; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id o11-20020a170902778b00b00153b7df656asi2436384pll.490.2022.05.18.05.14.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 18 May 2022 05:14:11 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=t7TyWoBQ; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 84FBBB0D11; Wed, 18 May 2022 05:07:36 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236239AbiERMHE (ORCPT + 99 others); Wed, 18 May 2022 08:07:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53036 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236201AbiERMHC (ORCPT ); Wed, 18 May 2022 08:07:02 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8CF9AF1D5; Wed, 18 May 2022 05:06:57 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 4D6C76120D; Wed, 18 May 2022 12:06:57 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 13886C385A5; Wed, 18 May 2022 12:06:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1652875616; bh=QotIh7ZOcCKwfWvV5sJOexyAfJaV8SzuQ0JxUEs2VIk=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=t7TyWoBQWift1DQwXw80K3wSMoi8EEezmi+HW8MyoJ32dXLuwWMmmZO2mfmci9gqc ZoJsmDXJuo84Il6+QuJF2aG0qv8KdOGYEE5cN94b4uaWLw5HD/Qt0Xbe2WYY+ueHcA sBsrbUsOnOeIpt9XTCvqvmawGE8SQuMA6T6fII+0= Date: Wed, 18 May 2022 14:06:53 +0200 From: Greg Kroah-Hartman To: "T.J. Mercier" Cc: Christian =?iso-8859-1?Q?K=F6nig?= , Suren Baghdasaryan , Kalesh Singh , Minchan Kim , Greg Kroah-Hartman , John Stultz , Sumit Semwal , Daniel Vetter , Hridya Valsaraju , kernel-team@android.com, linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v2] dma-buf: Move sysfs work out of DMA-BUF export path Message-ID: References: <20220516171315.2400578-1-tjmercier@google.com> <175c5af3-9224-9c8e-0784-349dad9a2954@amd.com> <0875fa95-3a25-a354-1433-201fca81ed3e@amd.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Spam-Status: No, score=-2.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, May 17, 2022 at 04:09:36PM -0700, T.J. Mercier wrote: > On Mon, May 16, 2022 at 11:13 PM Greg Kroah-Hartman > wrote: > > > > On Mon, May 16, 2022 at 05:08:05PM -0700, T.J. Mercier wrote: > > > On Mon, May 16, 2022 at 12:21 PM Christian K?nig > > > wrote: > > > > > > > > Am 16.05.22 um 20:08 schrieb T.J. Mercier: > > > > > On Mon, May 16, 2022 at 10:20 AM Christian K?nig > > > > > wrote: > > > > >> Am 16.05.22 um 19:13 schrieb T.J. Mercier: > > > > >>> Recently, we noticed an issue where a process went into direct reclaim > > > > >>> while holding the kernfs rw semaphore for sysfs in write (exclusive) > > > > >>> mode. This caused processes who were doing DMA-BUF exports and releases > > > > >>> to go into uninterruptible sleep since they needed to acquire the same > > > > >>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid > > > > >>> blocking DMA-BUF export for an indeterminate amount of time while > > > > >>> another process is holding the sysfs rw semaphore in exclusive mode, > > > > >>> this patch moves the per-buffer sysfs file creation to the default work > > > > >>> queue. Note that this can lead to a short-term inaccuracy in the dmabuf > > > > >>> sysfs statistics, but this is a tradeoff to prevent the hot path from > > > > >>> being blocked. A work_struct is added to dma_buf to achieve this, but as > > > > >>> it is unioned with the kobject in the sysfs_entry, dma_buf does not > > > > >>> increase in size. > > > > >> I'm still not very keen of this approach as it strongly feels like we > > > > >> are working around shortcoming somewhere else. > > > > >> > > > > > My read of the thread for the last version is that we're running into > > > > > a situation where sysfs is getting used for something it wasn't > > > > > originally intended for, but we're also stuck with this sysfs > > > > > functionality for dmabufs. > > > > > > > > > >>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs") > > > > >>> Originally-by: Hridya Valsaraju > > > > >>> Signed-off-by: T.J. Mercier > > > > >>> > > > > >>> --- > > > > >>> See the originally submitted patch by Hridya Valsaraju here: > > > > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2022%2F1%2F4%2F1066&data=05%7C01%7Cchristian.koenig%40amd.com%7C794614324d114880a25508da37672e4b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637883213566903705%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=bGlA2FeubfSeL5XDHYyWMZqUXfScoCphZjjK4jrqQJs%3D&reserved=0 > > > > >>> > > > > >>> v2 changes: > > > > >>> - Defer only sysfs creation instead of creation and teardown per > > > > >>> Christian K?nig > > > > >>> > > > > >>> - Use a work queue instead of a kthread for deferred work per > > > > >>> Christian K?nig > > > > >>> --- > > > > >>> drivers/dma-buf/dma-buf-sysfs-stats.c | 56 ++++++++++++++++++++------- > > > > >>> include/linux/dma-buf.h | 14 ++++++- > > > > >>> 2 files changed, 54 insertions(+), 16 deletions(-) > > > > >>> > > > > >>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c > > > > >>> index 2bba0babcb62..67b0a298291c 100644 > > > > >>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c > > > > >>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c > > > > >>> @@ -11,6 +11,7 @@ > > > > >>> #include > > > > >>> #include > > > > >>> #include > > > > >>> +#include > > > > >>> > > > > >>> #include "dma-buf-sysfs-stats.h" > > > > >>> > > > > >>> @@ -168,10 +169,46 @@ void dma_buf_uninit_sysfs_statistics(void) > > > > >>> kset_unregister(dma_buf_stats_kset); > > > > >>> } > > > > >>> > > > > >>> +static void sysfs_add_workfn(struct work_struct *work) > > > > >>> +{ > > > > >>> + struct dma_buf_sysfs_entry *sysfs_entry = > > > > >>> + container_of(work, struct dma_buf_sysfs_entry, sysfs_add_work); > > > > >>> + struct dma_buf *dmabuf = sysfs_entry->dmabuf; > > > > >>> + > > > > >>> + /* > > > > >>> + * A dmabuf is ref-counted via its file member. If this handler holds the only > > > > >>> + * reference to the dmabuf, there is no need for sysfs kobject creation. This is an > > > > >>> + * optimization and a race; when the reference count drops to 1 immediately after > > > > >>> + * this check it is not harmful as the sysfs entry will still get cleaned up in > > > > >>> + * dma_buf_stats_teardown, which won't get called until the final dmabuf reference > > > > >>> + * is released, and that can't happen until the end of this function. > > > > >>> + */ > > > > >>> + if (file_count(dmabuf->file) > 1) { > > > > >> Please completely drop that. I see absolutely no justification for this > > > > >> additional complexity. > > > > >> > > > > > This case gets hit around 5% of the time in my testing so the else is > > > > > not a completely unused branch. > > > > > > > > Well I can only repeat myself: This means that your userspace is > > > > severely broken! > > > > > > > > DMA-buf are meant to be long living objects > > > This patch addresses export *latency* regardless of how long-lived the > > > object is. Even a single, long-lived export will benefit from this > > > change if it would otherwise be blocked on adding an object to sysfs. > > > I think attempting to improve this latency still has merit. > > > > Fixing the latency is nice, but as it's just pushing the needed work off > > to another code path, it will take longer overall for the sysfs stuff to > > be ready for userspace to see. > > > > Perhaps we need to step back and understand what this code is supposed > > to be doing. As I recall, it was created because some systems do not > > allow debugfs anymore, and they wanted the debugging information that > > the dmabuf code was exposing to debugfs on a "normal" system. Moving > > that logic to sysfs made sense, but now I am wondering why we didn't see > > these issues in the debugfs code previously? > > > The debugfs stuff doesn't happen on every export, right? I do not remember. If not, then why not do what that does? :) > > Perhaps we should go just one step further and make a misc device node > > for dmabug debugging information to be in and just have userspace > > poll/read on the device node and we spit the info that used to be in > > debugfs out through that? That way this only affects systems when they > > want to read the information and not normal code paths? Yeah that's a > > hack, but this whole thing feels overly complex now. > > > > And deprecate sysfs support? As Android is the only current user, that would be easy to do. > I'm happy to try out anything you think might be a better way. As far > as complexity of this patch, this revision is a much simpler version > of the one from Hridya you already reviewed. I agree we should work to resolve this now, but just that going forward, perhaps this whole api should be revisited now that we see just how much complexity and extra system load it has added to to the system due to how some users interact with the dmabuf apis. You never learn just how bad an API really is until it gets used a lot, so it's no one's fault. But let's learn and move forward if possible to make things better. thanks, greg k-h