From: Mateusz Guzik
Date: Fri, 29 Sep 2023 23:39:15 +0200
Subject: Re: [PATCH v2] vfs: shave work on failed file open
To: Christian Brauner
Cc: Jann Horn, Linus Torvalds, viro@zeniv.linux.org.uk,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
References: <20230928-kulleraugen-restaurant-dd14e2a9c0b0@brauner>
    <20230928-themen-dilettanten-16bf329ab370@brauner>
    <20230929-kerzen-fachjargon-ca17177e9eeb@brauner>
    <20230929-test-lauf-693fda7ae36b@brauner>

On 9/29/23, Mateusz Guzik wrote:
> On 9/29/23, Christian Brauner wrote:
>> On Fri, Sep 29, 2023 at 03:31:29PM +0200, Jann Horn wrote:
>>> On Fri, Sep 29, 2023 at 11:20 AM Christian Brauner wrote:
>>> > > But yes, that protection would be broken by SLAB_TYPESAFE_BY_RCU,
>>> > > since then the "f_count is zero" is no longer a final thing.
>>> >
>>> > I've tried coming up with a patch that is simple enough so the pattern
>>> > is easy to follow and then converting all places to rely on a pattern
>>> > that combines lookup_fd_rcu() or similar with get_file_rcu(). The
>>> > obvious thing is that we'll force a few places to now always acquire a
>>> > reference when they don't really need one right now and that already
>>> > may cause performance issues.
>>>
>>> (Those places are probably used way less often than the hot
>>> open/fget/close paths though.)
>>>
>>> > We also can't fully get rid of plain get_file_rcu() uses itself
>>> > because of users such as mm->exe_file. They don't go from one of the
>>> > rcu fdtable lookup helpers to the struct file obviously. They rcu
>>> > replace the file pointer in their struct ofc so we could change
>>> > get_file_rcu() to take a struct file __rcu **f and then comparing that
>>> > the passed in pointer hasn't changed before we managed to do
>>> > atomic_long_inc_not_zero(). Which afaict should work for such cases.
>>> >
>>> > But overall we would introduce a fairly big and at the same time
>>> > subtle semantic change. The idea is pretty neat and it was fun to do
>>> > but I'm just not convinced we should do it given how ubiquitous struct
>>> > file is used and now to make the semantics even more special by
>>> > allowing refcounts.
>>> >
>>> > I've kept your original release_empty_file() proposal in vfs.misc
>>> > which I think is a really nice change.
>>> >
>>> > Let me know if you all passionately disagree. ;)
>>
>> So I'm appending the patch I had played with and a fix from Jann on top.
>> @Linus, if you have an opinion, let me know what you think.
>>
>> Also available here:
>> https://gitlab.com/brauner/linux/-/commits/vfs.file.rcu
>>
>> Might be interesting if this could be perfed to see if there is any real
>> gain for workloads with massive numbers of fds.
>>
>
> I would feel safer with a guaranteed way to tell that the file was
> reallocated.
>
> I think this could track allocs/frees with a sequence counter embedded
> into the object, say odd means deallocated and even means allocated.
>
> Then you would know for a fact whether you raced with the file getting
> whacked and would never have to wonder if you double-checked
> everything you needed (like that f_mode thing).
>
> This would also mean that consumers which get away with poking around
> the file without getting a ref could still do it, this is at least
> true for tid_fd_mode. All of them would need patching though.
>
> Extending struct file is not ideal by any means, but the good news is that:
> 1. there is a 4 byte hole in there, if one is fine with an int-sized
>    counter
> 2. if one insists on 8 bytes, the struct is 232 bytes on my kernel
>    (debian). still some room up to 256, so it may be tolerable?
>

So to be clear, obtaining the initial count would require a dedicated
accessor.
First you would find the file obj, wait for the count to reach
"allocated" state, validate the source still has the right pointer,
validate the count did not change (with acq fences sprinkled in there).
At the end of it you know that the seq counter you got from the file
was there when the file was still "installed".

Then you can poke around and validate you poked around the right thing
by once more validating the counter.

Maybe I missed something, but the idea in general should work.

-- 
Mateusz Guzik
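
For illustration only, here is a minimal userspace sketch of the accessor
described above, written with C11 atomics instead of kernel primitives.
Every name in it (struct obj, obj_slot, obj_seq_begin(), obj_seq_validate(),
MAX_TRIES) is hypothetical and not taken from any patch in this thread. It
assumes the object memory is type-stable, as with SLAB_TYPESAFE_BY_RCU, so
the counter survives free and reuse. It uses sequentially consistent atomics
where real code would presumably get away with weaker acquire/release
ordering, and it bounds the "wait for allocated" step so the example cannot
spin forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRIES 1000	/* hypothetical bound on the retry loop */

/* Hypothetical stand-in for struct file. */
struct obj {
	atomic_uint seq;	/* even: allocated, odd: deallocated */
	atomic_uint mode;	/* example payload, think f_mode */
};

/* The "source" holding the pointer, e.g. an fdtable slot. */
typedef _Atomic(struct obj *) obj_slot;

/* One-time setup of type-stable memory: starts out deallocated (odd). */
static void obj_init_freed(struct obj *o)
{
	atomic_init(&o->seq, 1);
	atomic_init(&o->mode, 0);
}

/* Allocation side: fill in the payload, then flip odd -> even. */
static void obj_mark_allocated(struct obj *o, unsigned int mode)
{
	atomic_store(&o->mode, mode);
	atomic_fetch_add(&o->seq, 1);
}

/* Free side: flip even -> odd before the memory can be reused. */
static void obj_mark_freed(struct obj *o)
{
	unsigned int old = atomic_fetch_add(&o->seq, 1);

	assert((old & 1) == 0);	/* must have been allocated */
	(void)old;
}

/*
 * Dedicated accessor: find the object, require the "allocated" state,
 * re-check that the source still points at it, re-check the counter.
 * On success *seqp is a counter value that was current while the object
 * was still installed in the slot.
 */
static bool obj_seq_begin(obj_slot *slot, struct obj **objp,
			  unsigned int *seqp)
{
	for (int tries = 0; tries < MAX_TRIES; tries++) {
		struct obj *o = atomic_load(slot);
		unsigned int seq;

		if (!o)
			return false;
		seq = atomic_load(&o->seq);
		if (seq & 1)
			continue;	/* not in "allocated" state yet */
		if (atomic_load(slot) != o)
			continue;	/* source no longer points here */
		if (atomic_load(&o->seq) != seq)
			continue;	/* freed or reused meanwhile */
		*objp = o;
		*seqp = seq;
		return true;
	}
	return false;
}

/* After poking around: true if the object was not freed in between. */
static bool obj_seq_validate(struct obj *o, unsigned int seq)
{
	/* Order the caller's earlier payload reads before the re-check. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load(&o->seq) == seq;
}
```

A reader would call obj_seq_begin(), read whatever fields it needs (here
with atomic_load(&o->mode)), and trust the values only if
obj_seq_validate() then returns true. Any free, including a free followed
by a reallocation of the same memory, changes the counter and makes the
validation fail.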