From: Hao Xu
Date: Tue, 11 Jul 2023 16:17:11 +0800
Subject: Re: [PATCH v2 1/6] fs: split off vfs_getdents function of getdents64 syscall
To: Dominique Martinet, Christian Brauner
Cc: Alexander Viro, Jens Axboe, Pavel Begunkov, Stefan Roesch, Clay Harris, Dave Chinner, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, io-uring@vger.kernel.org
Message-ID: <7b47fd90-5db5-ec52-8ac2-59ac54c38acb@linux.dev>

On 5/25/23 19:00, Dominique Martinet wrote:
> Christian Brauner wrote on Thu, May 25, 2023 at 11:22:08AM +0200:
>>> What was confusing is that default_llseek updates f_pos under the
>>> inode_lock (write), and getdents also takes that lock (for read only in
>>> shared implem), so I assumed getdents also was just protected by this
>>> read lock, but I guess that was a bad assumption (as I kept pointing
>>> out, a shared read lock isn't good enough, we definitely agree there)
>>>
>>>
>>> In practice, in the non-registered file case io_uring is also calling
>>> fdget, so the lock is held exactly the same as the syscall and I wasn't
>>
>> No, it really isn't.
>> fdget() doesn't take f_pos_lock at all:
>>
>> fdget()
>> -> __fdget()
>> -> __fget_light()
>> -> __fget()
>> -> __fget_files()
>> -> __fget_files_rcu()
>
> Ugh, I managed to not notice that I was looking at fdget_pos and that
> it's not the same as fdget by the time I wrote two paragraphs... These
> functions all have too many wrappers and too similar names for a quick
> look before work.
>
>> If that were true then any system call that passes an fd and uses
>> fdget() would try to acquire a mutex on f_pos_lock. We'd be serializing
>> every *at based system call on f_pos_lock whenever we have multiple fds
>> referring to the same file trying to operate on it concurrently.
>>
>> We do have fdget_pos() and fdput_pos() as a special purpose fdget() for
>> a select group of system calls that require this synchronization.
>
> Right, that makes sense, and invalidates everything I said after that
> anyway but it's not like looking stupid ever killed anyone.
>
> Ok so it would require adding a new wrapper from struct file to struct
> fd that'd eventually take the lock and set FDPUT_POS_UNLOCK for... not
> fdput_pos but another function for that stopping short of fdput...
> Then just call that around both vfs_llseek and vfs_getdents calls; which
> is the easy part.
>
> (Or possibly call mutex_lock directly like Dylan did in [1]...)
> [1] https://lore.kernel.org/all/20220222105504.3331010-1-dylany@fb.com/T/#m3609dc8057d0bc8e41ceab643e4d630f7b91bde6
>
> I'll be honest though I'm thankful for your explanations but I think
> I'll just do like Stefan and stop trying for now: the only reason I've
> started this was because I wanted to play with io_uring for a new toy
> project and it felt awkward without a getdents for crawling a tree; and
> I'm long past the point where I should have thrown the towel and just
> make that a sequential walk.
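The locking idea discussed above (a wrapper that takes the position lock and records that it did, held across both the seek and the getdents step) can be sketched as a small userspace model. This is not kernel code and does not use io_uring or getdents64: a pthread mutex stands in for f_pos_lock, a plain long for f_pos, and the names model_file, model_fd, MODEL_POS_UNLOCK, model_lock_pos(), model_unlock_pos() and model_seek_and_getdents() are all hypothetical. The `shared` flag loosely mirrors the fact that fdget_pos() only takes the lock for files that need it; the flag in model_fd mirrors FDPUT_POS_UNLOCK so that the unlock helper drops the lock without releasing any reference. Reading the position and advancing it is a read-modify-write, which is why a lock admitting several readers at once is not enough.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: NENTRIES fake directory entries, read BATCH at a time. */
#define NENTRIES 1000
#define BATCH 7
#define MAX_THREADS 16
#define MODEL_POS_UNLOCK 1u /* hypothetical analogue of FDPUT_POS_UNLOCK */

/* Hypothetical stand-in for struct file. */
struct model_file {
	pthread_mutex_t pos_lock; /* plays the role of f_pos_lock */
	long pos;                 /* plays the role of f_pos */
	bool shared;              /* "this file needs the position lock" */
};

/* Hypothetical stand-in for struct fd. */
struct model_fd {
	struct model_file *file;
	unsigned int flags;
};

/* Wrapper from an already-held file: lock if needed and remember it. */
static struct model_fd model_lock_pos(struct model_file *file)
{
	struct model_fd f = { .file = file, .flags = 0 };
	if (file->shared) {
		pthread_mutex_lock(&file->pos_lock);
		f.flags |= MODEL_POS_UNLOCK;
	}
	return f;
}

/* Drop only the lock; no reference is released ("short of fdput"). */
static void model_unlock_pos(struct model_fd f)
{
	if (f.flags & MODEL_POS_UNLOCK)
		pthread_mutex_unlock(&f.file->pos_lock);
}

/* Fake getdents: hand out entry ids from pos, then advance pos. */
static int model_getdents(struct model_file *file, int *out, int max)
{
	int n = 0;
	while (n < max && file->pos < NENTRIES)
		out[n++] = (int)file->pos++;
	return n;
}

/* One request: optional seek (offset < 0 keeps the current position),
 * then a read, both inside the same critical section. */
static int model_seek_and_getdents(struct model_file *file, long offset,
				   int *out, int max)
{
	struct model_fd f = model_lock_pos(file);
	if (offset >= 0)
		f.file->pos = offset;         /* the llseek step */
	int n = model_getdents(f.file, out, max); /* the getdents step */
	model_unlock_pos(f);
	return n;
}

static atomic_int seen[NENTRIES];

static void *worker(void *arg)
{
	struct model_file *file = arg;
	int buf[BATCH];
	int n;
	while ((n = model_seek_and_getdents(file, -1, buf, BATCH)) > 0)
		for (int i = 0; i < n; i++)
			atomic_fetch_add(&seen[buf[i]], 1);
	return NULL;
}

/* Returns true if nthreads racing readers saw every entry exactly once. */
static bool run_concurrent_readers(int nthreads)
{
	struct model_file file = { .pos = 0, .shared = true };
	pthread_t tids[MAX_THREADS];
	int created = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return false;
	pthread_mutex_init(&file.pos_lock, NULL);
	for (int i = 0; i < NENTRIES; i++)
		atomic_store(&seen[i], 0);
	while (created < nthreads &&
	       pthread_create(&tids[created], NULL, worker, &file) == 0)
		created++;
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&file.pos_lock);
	if (created == 0)
		return false;
	for (int i = 0; i < NENTRIES; i++)
		if (atomic_load(&seen[i]) != 1)
			return false;
	return true;
}
```

Because the lock is held across both steps, run_concurrent_readers() reports every entry handed out exactly once no matter how many threads share the same model_file; replacing the mutex with a shared read lock would let two requests start from the same position and return duplicate entries.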
>
> There's too many "conditional patches" (NOWAIT, end of dir indicator)
> that I don't care about and require additional work to rebase
> continuously so I'll just leave it up to someone else who does care.
>
> So to that someone: feel free to continue from these branches (I've
> included the fix for kernfs_fop_readdir that Dan Carpenter reported):
> https://github.com/martinetd/linux/commits/io_uring_getdents
> https://github.com/martinetd/liburing/commits/getdents
>
> Or just start over, there's not that much code now hopefully the
> baseline requirements have gotten a little bit clearer.
>
>
> Sorry for stirring the mess and leaving halfway, if nobody does continue
> I might send a v3 when I have more time/energy in a few months, but it
> won't be quick.
>

Hi Dominique,

I'd like to take this if you don't mind.

Regards,
Hao