Received: by 2002:a89:d88:0:b0:1fa:5c73:8e2d with SMTP id eb8csp2166947lqb; Mon, 27 May 2024 09:52:55 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCXw5ukjOLs3+0N6C3ttI3SPjN6WmLKpUoPuwztSjwerU5VgRb+O0OJwaVZBiRpnkhU4h88qRbA3xtnpsaBqRm6Alkvoj5i4THlUJVNU/Q== X-Google-Smtp-Source: AGHT+IH5VCXBIMYpAwz3K5668b1ds35DIXiOglQ//hxUA8tjbP9QofRZDYn9K/bSk17hxzKjMvZk X-Received: by 2002:a05:6a20:9708:b0:1b0:14ee:3b1b with SMTP id adf61e73a8af0-1b212e1cdcemr7902342637.38.1716828774789; Mon, 27 May 2024 09:52:54 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1716828774; cv=pass; d=google.com; s=arc-20160816; b=HD6SWMQeRjlCbxhdHhVDasGaXJBPdvlrZ9jal4xi9EZVIpnS8x/zGw1XEa0Uiyix9s brdIu0IeEqNx6OrIMHpt51T0RSrK9SeT+quGAHoNfI++eNoVPtFCxEmsygC8aKsFsA4h bqWtMOPJS372ilMIry6rtkFdswgIe3JsmgkGPlq1ZBVYbr5l3x0pPr75awsYc6LxAoOA xlHXJUzCpkMixXa3s5SkFt9Gl3mbfKU2s7thdJfJ3VQKGc/NwU/+8kSsyaE6tSeLKSln oEAL9EHSWOFmaWLnpfh8dXhPTJOHaQN4YrW6nNNheR5ZbkFXX4+So2co7MgEwlxamUDs 8j0Q== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:message-id:references:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:in-reply-to:date :dkim-signature; bh=EJWqJxrkn9eN5G99j0riJdNLIsQh7HlIjrBMGMc+Qc0=; fh=g08URXOATlmqLvhdG7kKJrBGYfdOgu2KOHwQ+LyIJ1M=; b=O+HS0Q8N4pcXQIuT00GadL9LhpgV34gQANqAjDfAPoC4VIzqkI0ROwZPKd6pdYzRex oOxB/KEeuqoPmhII823/7sLxxDpoETqsuGfsaUAdvHn2bfrwr7/AjSm+pRqmFjX9le3d TXLJ5tw6niDPQq00u1HTyVYLDn30U407/AQbyTGyy/YQDwRw8K6aZgca8j50QL2vntos dxx6Z32IUrG0TTtFTAuOVx9voCWErUve98YKbi5DqBvd5BQ02hFCwZm8wzHY3RzM+Yze 91bkGIF+XhQTxxMpUKDm05gsXmR9RHBJWz1TsGd/ik+DZyyOVORrO4Unu0VBjcKxgOt7 OQ3w==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=H1M6ni55; arc=pass (i=1 spf=pass spfdomain=flex--aliceryhl.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-191197-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-191197-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id d9443c01a7336-1f44c99d24bsi63865025ad.379.2024.05.27.09.52.54 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 27 May 2024 09:52:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-191197-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=H1M6ni55; arc=pass (i=1 spf=pass spfdomain=flex--aliceryhl.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-191197-linux.lists.archive=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-191197-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 9BE2CB315AB for ; Mon, 27 May 2024 16:14:36 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id AE48B13AD05; Mon, 27 May 2024 16:04:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="H1M6ni55" Received: from mail-lj1-f201.google.com (mail-lj1-f201.google.com [209.85.208.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B6A3713A40D for ; Mon, 27 May 2024 16:04:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716825843; cv=none; b=l4OEDaCiGJiJVpXCWk9/G3iEmXhzkVMxntWWBgWE9h965Qei3qnp0989VSJbSrli/7Dc8a5903zrZ3AsKsIpOna1MMfR8nCzpd57mJy6H4KFpM37wYnCrFIzJqGG0kvsrz+erqT4A/mvbJ7g2ELaHrFo7awSzJWc2cjg4abaD50= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1716825843; c=relaxed/simple; bh=4EmFyncAhhLiKY+1D8a0clxDIqQWULOdjf0ovBAqZOg=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=hwScvf5SwXgm+4eiN1VMiQMtyHUTn35/cZiyhtgKDXIw+4u4MAYBpk13Isu5NoSftKH2qTqqz1G7G0785wQpX/zBchObIT4acI5BzMhW3RkFCuP3RskRI/1+gOjLiUyrAsURQLSSed9F6Hs7GomtBonQL3lZhhxzl8RO0O9xIYo= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=H1M6ni55; arc=none smtp.client-ip=209.85.208.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com Received: by mail-lj1-f201.google.com with SMTP id 38308e7fff4ca-2e95a1d5ff1so26598371fa.0 for ; Mon, 27 May 2024 09:04:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1716825840; x=1717430640; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=EJWqJxrkn9eN5G99j0riJdNLIsQh7HlIjrBMGMc+Qc0=; b=H1M6ni55WKT5hNa6QGjItDS8EnlyiMwcMYm7r0Yar834aTI/MhAr6DAUme4y9CkPmX E/GW1CJ2ChhAKqHMwykpg1e9cV4bw9dxdn5o7LBK/ghdDjfZRk5mXeCmn3nipmw/gi2g bfH/oYsFqgLgyODVKHvW8H4shJqDjBcfwzaj7wLUGdiYS5j5riuO3rsaWQSc7L7iFtWz B1HC09oQXUGxNLy/8demm/W6OmZR9k4H0NV7pIxwttu9bPHDDPLSvbdXivNJxarp0M/F ePEyDQCV2QE0IV4vk/TZL/TutSEehNP1wjboQQ2L/c48NmTXEsXay2fsW+dMNG1Fyfm1 Temw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716825840; x=1717430640; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=EJWqJxrkn9eN5G99j0riJdNLIsQh7HlIjrBMGMc+Qc0=; b=AYVLn1Lq4NQB+H4bbF3sVoCp1pxWCHbaqQ1/DjjbL1C7uROK/vTQ5M88QDqNm4hx4b AdjBnLqMR67vhoIg+wFi8JTdbzjCQw8RhT5N6d9EbX7fFUK0yIOuqI6E+/6pW66zQl5j ArBcXaVXe/Ymbl3CiWfVv6FE4Bh8Y3bByC9VreDa8+0t0hb3nDZpgNtI1SWdUZEkUsWD U4jZC6Js9suiccSpcTld0jHXZtctHSTHvHjogisIgb2ixAA4WDxHyU2stMEU6DBmC0Eo YRKVV1PBG5i5X09SBo4Z8FR8mMkJI8xiuwHsbm4KqmGxfxg2/rM3c7nR0PeXWcjQeQyt ep0A== X-Forwarded-Encrypted: i=1; AJvYcCXt+IsA72n7/dL9DdYGT5qkEMq+F8O8mbVnlEEXBiMLNkZrG/sYJcLBqMDdELbrfWl6is+XNiDxVLFcfJJGpVN3Q/lhCb6WrQwJR/Io X-Gm-Message-State: AOJu0YySi1RvRsBzD8yctXxjxvcs/7o12Ik30ZqwM+B5euLIPHO/dMQg l8d2biOTRWzhKsom+XBQOlD8/CWzg9RlVWfKICmEn/iuhITPcRc8mce9+g8IwZCUruhn+6Zk4ob FUo+nL6PHLgF8Aw== X-Received: from aliceryhl2.c.googlers.com ([fda3:e722:ac3:cc00:68:949d:c0a8:572]) (user=aliceryhl job=sendgmr) by 2002:a2e:9155:0:b0:2e2:a6dc:8289 with SMTP id 38308e7fff4ca-2e95b25667emr88131fa.7.1716825839717; Mon, 27 May 2024 09:03:59 -0700 (PDT) Date: Mon, 27 May 2024 16:03:56 +0000 In-Reply-To: <20240524213245.GT2118490@ZenIV> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20240524213245.GT2118490@ZenIV> X-Mailer: git-send-email 2.45.1.288.g0e0cd299f1-goog Message-ID: <20240527160356.3909000-1-aliceryhl@google.com> Subject: Re: [PATCH v6 3/8] rust: file: add Rust abstraction for `struct file` From: Alice Ryhl To: viro@zeniv.linux.org.uk Cc: a.hindborg@samsung.com, alex.gaynor@gmail.com, aliceryhl@google.com, arve@android.com, benno.lossin@proton.me, bjorn3_gh@protonmail.com, boqun.feng@gmail.com, brauner@kernel.org, cmllamas@google.com, dan.j.williams@intel.com, dxu@dxuuu.xyz, gary@garyguo.net, gregkh@linuxfoundation.org, joel@joelfernandes.org, keescook@chromium.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, maco@android.com, ojeda@kernel.org, peterz@infradead.org, rust-for-linux@vger.kernel.org, surenb@google.com, tglx@linutronix.de, tkjos@android.com, tmgross@umich.edu, wedsonaf@gmail.com, willy@infradead.org, yakoyoku@gmail.com Content-Type: text/plain; charset="utf-8" Al Viro writes: > > > You obviously are aware of this but I'm just spelling it out. Iirc, > > > there will practically only ever be one light refcount per file. > > > > > > For a light refcount to be used we know that the file descriptor table > > > isn't shared with any other task. So there are no threads that could > > > concurrently access the file descriptor table. We also know that the > > > file descriptor table cannot become shared while we're in system call > > > context because the caller can't create new threads and they can't > > > unshare the file descriptor table. > > > > > > So there's only one fdget() caller (Yes, they could call fdget() > > > multiple times and then have to do fdput() multiple times but that's a > > > level of weirdness that we don't need to worry about.). > > > > Hmm. Is it not the case that different processes with different file > > descriptor tables could reference the same underlying `struct file` and > > both use light refcounts to do so, as long as each fd table is not > > shared? So there could be multiple light refcounts to the same `struct > > file` at the same time on different threads. > > Relevant rules: > > * Each file pointer in any descriptor table contributes to refcount > of file. > > * All assignments to task->files are done by the task itself or, > during task creation, by its parent The latter happens before the task > runs for the first time. The former is done with task_lock(current) > held. > > * current->files is always stable. The object it points to > is guaranteed to stay alive at least until you explicitly change > current->files. > * task->files is stable while you are holding task_lock(task). > The object it points to is guaranteed to stay alive until you release > task_lock(task). > * task->files MAY be fetched (racily) without either of the > above, but it should not be dereferenced - the memory may be freed > and reused right after you've fetched the pointer. > > * descriptor tables are refcounted by table->count. > * descriptor table is created with ->count equal to 1 and > destroyed when its ->count reaches 0. > * each task with task->files == table contributes to table->count. > * before the task dies, its ->files becomes NULL (see exit_files()). > * when task is born (see copy_process() and copy_files())) the parent > is responsible for setting the value of task->files and making sure that > refcounts are correct; that's the only case where one is allowed to acquire > an extra reference to existing table (handling of clone(2) with COPY_FILES). > > * the only descriptor table one may modify is that pointed to > by current->files. Any access to other threads' descriptor tables is > read-only. > > * struct fd is fundamentally thread-local. It should never be > passed around, put into shared data structures, etc. > > * if you have done fdget(N), the matching fdput() MUST be done > before the caller modifies the Nth slot of its descriptor table, > spawns children that would share the descriptor table. > > * fdget() MAY borrow a reference from caller's descriptor table. > That can be done if current->files->count is equal to 1. > In that case we can be certain that the file reference we fetched from > our descriptor table will remain unchanged (and thus contributing to refcount > of file) until fdput(). Indeed, > + at the time of fdget() no other thread has task->files pointing > to our table (otherwise ->count would be greater than 1). > + our thread will remain the sole owner of descriptor table at > least until fdput(). Indeed, the first additional thread with task->files > pointing to our table would have to have been spawned by us and we are > forbidden to do that (rules for fdget() use) > + no other thread could modify our descriptor table (they would > have to share it first). > + we are allowed to modify our table, but we are forbidden to touch > the slot we'd copied from (rules for fdget() use). > > In other words, if current->files->count is equal to 1 at fdget() time > we can skip incrementing refcount. Matching fdput() would need to > skip decrement, of course. Note that we must record that (borrowed > vs. cloned) in struct fd - the condition cannot be rechecked at fdput() > time, since the table that had been shared at fdget() time might no longer > be shared by the time of fdput(). This is great! It matches my understanding. I didn't know the details about current->files and task->files. You should copy this to the kernel documentation somewhere. :) > > And this does *not* apply to `fdget_pos`, which checks the refcount of > > the `struct file` instead of the refcount of the fd table. > > False. fdget_pos() is identical to fdget() as far as file refcount > handling goes. The part that is different is that grabbing ->f_pos_lock > is sensitive to file refcount in some cases. This is orthogonal to > "does this struct fd contribute to file refcount". Sorry, I see now that I didn't phrase that quite right. What I meant is that there are ways of sharing a `struct file` reference during an fdget scope that are not dangerous, but where it *would be* dangerous if it was an fdget_pos scope instead. Specifically, the reason they are dangerous is that they can lead to a data race on the file position if the fdget_pos scope did not take the f_pos_lock mutex. For example, during an `fdget(N)` scope, you can always do a `get_file` and then send it to another process and `fd_install` it into that other process. There's no way that this could result in the deletion of the Nth entry of `current->files`. However, during an `fdget_pos(N)` scope, then it is *not* the case that it's always okay to send a `get_file` reference to another thread and `fd_install` it. Because after the remote process returns from the syscall in which we `fd_install`ed the file, the remote process could proceed to call another syscall that in turn modifies the file position. And if the original `fdget_pos(N)` scope modifies the file position after sending the `get_file` reference, then that could be a data race on f_pos. > Again, "light" references are tied to thread; they can only be created > if we are guaranteed that descriptor table's slot they came from will > remain unchanged for as long as the reference is used. > > And yes, there may be several light references to the same file - both > in different processes that do not share descriptor table *and* in the > same thread, if e.g. sendfile(in_fd, out_fd, ...) is called with > in_fd == out_fd. Thanks for confirming this! I hope this reply along with my reply to Christian Brauner also addresses your other thread. Let me know if it doesn't. Alice