Received: by 2002:ab2:7903:0:b0:1fb:b500:807b with SMTP id a3csp1145186lqj; Mon, 3 Jun 2024 11:26:38 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCVVKtllGLykWR+rSza6v9jk/C4pSIqZLh5YfM+YhG5pmZtiGnHxI8xuTkg/SQOXUSMv5ismpxQ3HxtstKaZRAVKDjuQlqZ7BEsDv0VbDQ== X-Google-Smtp-Source: AGHT+IGXPdXBH6XSLlyiTOV/y7XcH4q4fWFIplDSni/q+Qag4wwZ7IiFpKxAocXONO+sQlapPG6W X-Received: by 2002:a05:622a:164e:b0:43d:f712:d5bd with SMTP id d75a77b69052e-43ff52aae83mr107021221cf.47.1717439197742; Mon, 03 Jun 2024 11:26:37 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1717439197; cv=pass; d=google.com; s=arc-20160816; b=LYy1Hzp0FyyzRYsow7p0SaIOmuzM4T5qxVqRzzz2v126YB5SvrLZKl4GkEBF2JsMKF UXYUjroZnNby3n2AhT8IZpBK8um28BffEMdQvYleV/lRDqZHYHWgyyAwY0rOsP4Ww5vN uQc8DKtBZ55ThF5H8NFl9KUm+uH0ygSLAgybzUnIOr+/WUlv0yxK+vPilISiuC/F9jHK gqef2quso5TiMBYDrEsuJlOjaUVQY+OsIiOWp1Lz93MB5m4xqFipJrKXVDe7JT8n+TPE sqKXkc0E/NrywuhqbS1C66VsHcFIAlPKz8VS5+lkMaL+z9LmZKCpgPPRkPHqwCw4oxlU wPJQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:feedback-id:references :in-reply-to:message-id:subject:cc:from:to:date:dkim-signature; bh=h5UBR5s+DqAv1y4ocFeN+mluknCcFmnFq+yPUyc1U4o=; fh=l/PEToSkI6aTmWghg6aZZvJ0DKXCR3tW5enXMnUlsGk=; b=Jnq1urWYjAoJhv7Q2O8AsAu1KjGv4075ZYbXkxWWo0qpt/AhAQGrUabX45MTA9eoiE gRnBidDxHKjBNm8UlplkGyMatWv8yHbXpF47/nyLLRKlYQ1r3v0ni7PjjheFObsYsPEU kYNIJInTwsHvCvf+Ywlo/jFAP4p06SYp8SaFukNP5uLmokFU7AbljwC5G0RvYFk3QHnu jMew5QxF24XM3ujhSqcIPgRzTZZQxrxslVb8V5tToHEFvbb1k4pUvWrxc8ip/re15OyX xhU/5bqmUFAR0wy0wNY3IUmBIRspe4DH3U3Ce9vwG1iiftzh9S6+k7AGvDYadPyM/OtB Ao4Q==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@proton.me header.s=protonmail header.b=I8FUfYep; arc=pass (i=1 spf=pass spfdomain=proton.me dkim=pass dkdomain=proton.me dmarc=pass fromdomain=proton.me); spf=pass (google.com: domain of linux-kernel+bounces-199561-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-199561-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=proton.me Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id d75a77b69052e-43ff23ae858si4018301cf.30.2024.06.03.11.26.37 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Jun 2024 11:26:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-199561-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@proton.me header.s=protonmail header.b=I8FUfYep; arc=pass (i=1 spf=pass spfdomain=proton.me dkim=pass dkdomain=proton.me dmarc=pass fromdomain=proton.me); spf=pass (google.com: domain of linux-kernel+bounces-199561-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-199561-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=proton.me Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 671A81C22594 for ; Mon, 3 Jun 2024 18:26:37 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id C49B3138484; Mon, 3 Jun 2024 18:26:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=proton.me header.i=@proton.me header.b="I8FUfYep" Received: from mail-4316.protonmail.ch (mail-4316.protonmail.ch [185.70.43.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B735137C4A; Mon, 3 Jun 2024 18:26:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.70.43.16 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717439187; cv=none; b=FfU4UsonkYOZJyqVfS1XPZlugO9usxpaV2dmOPxzsBxn97PwDWprhs21VXA3/MhH7t2JuKR+/zXuk/vANf6Ipd7318BqarT+xtpYVAku2biFZcofhGQuirVJm+6NowqZu6GZWdiOJfVfdbuCHL2uuhhvA0Y4uDPXvEzWQ956kNI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1717439187; c=relaxed/simple; bh=IYDSrevwN0l8xOdEMkQFQ8XiL8ZhTn5e0JZtNVYcOa4=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=qQvHIQcMvuV04aqLVOZLzEbSMygi0QC/3csdGlzqXTpJxUcBMzwGuJ6WyzK46cB4gLiQO/iFNAvWIYLAa6edE9UdsZ7EdnHhCdsVQqpzf+cSNYjG0BPaYDYftJUAE3U9LxgMMBh7rTiVkzmspOpoUzPt5XDaPCvGOjT2TUbhCqQ= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=proton.me; spf=pass smtp.mailfrom=proton.me; dkim=pass (2048-bit key) header.d=proton.me header.i=@proton.me header.b=I8FUfYep; arc=none smtp.client-ip=185.70.43.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=proton.me Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=proton.me DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=proton.me; s=protonmail; t=1717439176; x=1717698376; bh=h5UBR5s+DqAv1y4ocFeN+mluknCcFmnFq+yPUyc1U4o=; h=Date:To:From:Cc:Subject:Message-ID:In-Reply-To:References: Feedback-ID:From:To:Cc:Date:Subject:Reply-To:Feedback-ID: Message-ID:BIMI-Selector; b=I8FUfYepxDICOZxNDvK48bFsvqV1CguSjcb0QSHtzMRml22V3rC+9nSm5vAtudFnF UodAOyU3g6SkUHLvYVG5nlVAZenDlK8SzZPQZy+KLD8sItktsP26Xd35ypC97aPcWM E4bazlmUI/6BnYvHTkRLWqto38XR+WaF4DvsM+2atcXDmFDngLOYX9uKQS5Tdv3My+ eN1URtby09xA2/F6jAddhdTnsypZKlAY8/EV+Qyofj7N4khu3I5WxE0KBpRon81z3s QtmDJy9yLfE+HJkepQecBZT8NoNhYPwZ0qeFpioJzE8ByonstAJG2CbOsFhcI50zpY t/mQ2YI8dZISw== Date: Mon, 03 Jun 2024 18:26:08 +0000 To: Andreas Hindborg From: Benno Lossin Cc: Jens Axboe , Christoph Hellwig , Keith Busch , Damien Le Moal , Bart Van Assche , Hannes Reinecke , Ming Lei , "linux-block@vger.kernel.org" , Andreas Hindborg , Wedson Almeida Filho , Greg KH , Matthew Wilcox , Miguel Ojeda , Alex Gaynor , Boqun Feng , Gary Guo , =?utf-8?Q?Bj=C3=B6rn_Roy_Baron?= , Alice Ryhl , Chaitanya Kulkarni , Luis Chamberlain , Yexuan Yang <1182282462@bupt.edu.cn>, =?utf-8?Q?Sergio_Gonz=C3=A1lez_Collado?= , Joel Granados , "Pankaj Raghav (Samsung)" , Daniel Gomez , Niklas Cassel , Philipp Stanner , Conor Dooley , Johannes Thumshirn , =?utf-8?Q?Matias_Bj=C3=B8rling?= , open list , "rust-for-linux@vger.kernel.org" , "lsf-pc@lists.linux-foundation.org" , "gost.dev@samsung.com" Subject: Re: [PATCH v4 1/3] rust: block: introduce `kernel::block::mq` module Message-ID: <925fe0fe-9303-4f49-b473-c3a3ecc5e2e6@proton.me> In-Reply-To: <87mso2me0p.fsf@metaspace.dk> References: <20240601134005.621714-1-nmi@metaspace.dk> <20240601134005.621714-2-nmi@metaspace.dk> <87mso2me0p.fsf@metaspace.dk> Feedback-ID: 71780778:user:proton X-Pm-Message-ID: 6e3342b49160368ba5b3571626e6270cd0c4f503 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 03.06.24 14:01, Andreas Hindborg wrote: > Benno Lossin writes: >> On 01.06.24 15:40, Andreas Hindborg wrote: >>> +impl seal::Sealed for Initialized {} >>> +impl GenDiskState for Initialized { >>> + const DELETE_ON_DROP: bool =3D false; >>> +} >>> +impl seal::Sealed for Added {} >>> +impl GenDiskState for Added { >>> + const DELETE_ON_DROP: bool =3D true; >>> +} >>> + >>> +impl GenDisk { >>> + /// Try to create a new `GenDisk`. >>> + pub fn try_new(tagset: Arc>) -> Result { >> >> Since there is no non-try `new` function, I think we should name this >> function just `new`. >=20 > Right, I am still getting used to the new naming scheme. Do you know if > it is documented anywhere? I don't think it is documented, it might only be a verbal convention at the moment. Although [1] is suggesting `new` for general constructors. Since this is the only constructor, one could argue that the recommendation is to use `new` (which I personally find a good idea). [1]: https://rust-lang.github.io/api-guidelines/naming.html [...] >>> +impl OperationsVTable { >>> + /// This function is called by the C kernel. A pointer to this fun= ction is >>> + /// installed in the `blk_mq_ops` vtable for the driver. >>> + /// >>> + /// # Safety >>> + /// >>> + /// - The caller of this function must ensure `bd` is valid >>> + /// and initialized. The pointees must outlive this function. >> >> Until when do the pointees have to be alive? "must outlive this >> function" could also be the case if the pointees die immediately after >> this function returns. >=20 > It should not be plural. What I intended to communicate is that what > `bd` points to must be valid for read for the duration of the function > call. I think that is what "The pointee must outlive this function" > states? Although when we talk about lifetime of an object pointed to by > a pointer, I am not sure about the correct way to word this. Do we talk > about the lifetime of the pointer or the lifetime of the pointed to > object (the pointee). We should not use the same wording for the pointer > and the pointee. >=20 > How about: >=20 > /// - The caller of this function must ensure that the pointee of `bd= ` is > /// valid for read for the duration of this function. But this is not enough for it to be sound, right? You create an `ARef` from `bd.rq`, which potentially lives forever. You somehow need to require that the pointer `bd` stays valid for reads and (synchronized) writes until the request is ended (probably via `blk_mq_end_request`). >>> + /// - This function must not be called with a `hctx` for which >>> + /// `Self::exit_hctx_callback()` has been called. >>> + /// - (*bd).rq must point to a valid `bindings:request` for which >>> + /// `OperationsVTable::init_request_callback` was called >> >> Missing `.` at the end. >=20 > Thanks. >=20 >> >>> + unsafe extern "C" fn queue_rq_callback( >>> + _hctx: *mut bindings::blk_mq_hw_ctx, >>> + bd: *const bindings::blk_mq_queue_data, >>> + ) -> bindings::blk_status_t { >>> + // SAFETY: `bd.rq` is valid as required by the safety requirem= ent for >>> + // this function. >>> + let request =3D unsafe { &*(*bd).rq.cast::>() }; >>> + >>> + // One refcount for the ARef, one for being in flight >>> + request.wrapper_ref().refcount().store(2, Ordering::Relaxed); >>> + >>> + // SAFETY: We own a refcount that we took above. We pass that = to `ARef`. >>> + // By the safety requirements of this function, `request` is a= valid >>> + // `struct request` and the private data is properly initializ= ed. >>> + let rq =3D unsafe { Request::aref_from_raw((*bd).rq) }; >> >> I think that you need to require that the request is alive at least >> until `blk_mq_end_request` is called for the request (since at that >> point all `ARef`s will be gone). >> Also if this is not guaranteed, the safety requirements of >> `AlwaysRefCounted` are violated (since the object can just disappear >> even if it has refcount > 0 [the refcount refers to the Rust refcount in >> the `RequestDataWrapper`, not the one in C]). >=20 > Yea, for the last invariant of `Request`: >=20 > /// * `self` is reference counted by atomic modification of > /// self.wrapper_ref().refcount(). >=20 > I will add this to the safety comment at the call site: >=20 > // - `rq` will be alive until `blk_mq_end_request` is called and is > // reference counted by `ARef` until then. Seems like you already want to use this here :) [...] >>> + /// This function is called by the C kernel. A pointer to this fun= ction is >>> + /// installed in the `blk_mq_ops` vtable for the driver. >>> + /// >>> + /// # Safety >>> + /// >>> + /// This function may only be called by blk-mq C infrastructure. `= set` must `set` doesn't exist (`_set` does), you are also not using this requirement. >>> + /// point to an initialized `TagSet`. >>> + unsafe extern "C" fn init_request_callback( >>> + _set: *mut bindings::blk_mq_tag_set, >>> + rq: *mut bindings::request, >>> + _hctx_idx: core::ffi::c_uint, >>> + _numa_node: core::ffi::c_uint, >>> + ) -> core::ffi::c_int { >>> + from_result(|| { >>> + // SAFETY: The `blk_mq_tag_set` invariants guarantee that = all >>> + // requests are allocated with extra memory for the reques= t data. >> >> What guarantees that the right amount of memory has been allocated? >> AFAIU that is guaranteed by the `TagSet` (but there is no invariant). >=20 > It is by C API contract. `TagSet`::try_new` (now `new`) writes > `cmd_size` into the `struct blk_mq_tag_set`. That is picked up by > `blk_mq_alloc_tag_set` to allocate the right amount of space for each req= uest. >=20 > The invariant here is on the C type. Perhaps the wording is wrong. I am > not exactly sure how to express this. How about this: >=20 > // SAFETY: We instructed `blk_mq_alloc_tag_set` to allocate r= equests > // with extra memory for the request data when we called it i= n > // `TagSet::new`. I think you need a safety requirement on the function: `rq` points to a valid `Request`. Then you could just use `Request::wrapper_ptr` instead of the line below. >>> + let pdu =3D unsafe { bindings::blk_mq_rq_to_pdu(rq) }.cast= ::(); >>> + >>> + // SAFETY: The refcount field is allocated but not initial= ized, this >>> + // valid for write. >>> + unsafe { RequestDataWrapper::refcount_ptr(pdu).write(Atomi= cU64::new(0)) }; >>> + >>> + Ok(0) >>> + }) >>> + } >> >> [...] >> >>> + /// Notify the block layer that a request is going to be processed= now. >>> + /// >>> + /// The block layer uses this hook to do proper initializations su= ch as >>> + /// starting the timeout timer. It is a requirement that block dev= ice >>> + /// drivers call this function when starting to process a request. >>> + /// >>> + /// # Safety >>> + /// >>> + /// The caller must have exclusive ownership of `self`, that is >>> + /// `self.wrapper_ref().refcount() =3D=3D 2`. >>> + pub(crate) unsafe fn start_unchecked(this: &ARef) { >>> + // SAFETY: By type invariant, `self.0` is a valid `struct requ= est`. By >>> + // existence of `&mut self` we have exclusive access. >> >> We don't have a `&mut self`. But the safety requirements ask for a >> unique `ARef`. >=20 > Thanks, I'll rephrase to: >=20 > // SAFETY: By type invariant, `self.0` is a valid `struct request= ` and > // we have exclusive access. >=20 >> >>> + unsafe { bindings::blk_mq_start_request(this.0.get()) }; >>> + } >>> + >>> + fn try_set_end(this: ARef) -> Result, ARef>= { >>> + // We can race with `TagSet::tag_to_rq` >>> + match this.wrapper_ref().refcount().compare_exchange( >>> + 2, >>> + 0, >>> + Ordering::Relaxed, >>> + Ordering::Relaxed, >>> + ) { >>> + Err(_old) =3D> Err(this), >>> + Ok(_) =3D> Ok(this), >>> + } >>> + } >>> + >>> + /// Notify the block layer that the request has been completed wit= hout errors. >>> + /// >>> + /// This function will return `Err` if `this` is not the only `ARe= f` >>> + /// referencing the request. >>> + pub fn end_ok(this: ARef) -> Result<(), ARef> { >>> + let this =3D Self::try_set_end(this)?; >>> + let request_ptr =3D this.0.get(); >>> + core::mem::forget(this); >>> + >>> + // SAFETY: By type invariant, `self.0` is a valid `struct requ= est`. By >>> + // existence of `&mut self` we have exclusive access. >> >> Same here, but in this case, the `ARef` is unique, since you called >> `try_set_end`. You could make it a `# Guarantee` of `try_set_end`: "If >> `Ok(aref)` is returned, then the `aref` is unique." >=20 > Makes sense. I have not seen `# Guarantee` used anywhere. Do you have a l= ink for that use? Alice used it a couple of times, eg in [2]. I plan on putting it in the safety standard. [2]: https://lore.kernel.org/rust-for-linux/20230601134946.3887870-2-alicer= yhl@google.com/ --- Cheers, Benno