Subject: Re: [PATCH v2 4/5] cachefiles: cyclic allocation of msg_id to avoid reuse
From: Jeff Layton
To: Baokun Li, netfs@lists.linux.dev, dhowells@redhat.com
Cc: hsiangkao@linux.alibaba.com, jefflexu@linux.alibaba.com, zhujia.zj@bytedance.com, linux-erofs@lists.ozlabs.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, yangerkun@huawei.com,
 houtao1@huawei.com, yukuai3@huawei.com, wozizhi@huawei.com, Baokun Li
Date: Mon, 20 May 2024 09:24:27 -0400
References: <20240515125136.3714580-1-libaokun@huaweicloud.com> <20240515125136.3714580-5-libaokun@huaweicloud.com> <4b1584787dd54bb95d700feae1ca498c40429551.camel@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2024-05-20 at 20:42 +0800, Baokun Li wrote:
> On 2024/5/20 18:04, Jeff Layton wrote:
> > On Mon, 2024-05-20 at 12:06 +0800, Baokun Li wrote:
> > > Hi Jeff,
> > >
> > > Thank you very much for your review!
> > >
> > > On 2024/5/19 19:11, Jeff Layton wrote:
> > > > On Wed, 2024-05-15 at 20:51 +0800, libaokun@huaweicloud.com wrote:
> > > > > From: Baokun Li
> > > > >
> > > > > Reusing the msg_id after a maliciously completed reopen request may cause
> > > > > a read request to remain unprocessed and result in a hang, as shown below:
> > > > >
> > > > >            t1       |      t2       |      t3
> > > > > -------------------------------------------------
> > > > > cachefiles_ondemand_select_req
> > > > >  cachefiles_ondemand_object_is_close(A)
> > > > >  cachefiles_ondemand_set_object_reopening(A)
> > > > >  queue_work(fscache_object_wq, &info->work)
> > > > >                 ondemand_object_worker
> > > > >                  cachefiles_ondemand_init_object(A)
> > > > >                   cachefiles_ondemand_send_req(OPEN)
> > > > >                     // get msg_id 6
> > > > >                     wait_for_completion(&req_A->done)
> > > > > cachefiles_ondemand_daemon_read
> > > > >  // read msg_id 6 req_A
> > > > >  cachefiles_ondemand_get_fd
> > > > >  copy_to_user
> > > > >                                 // Malicious completion msg_id 6
> > > > >                                 copen 6,-1
> > > > >                                 cachefiles_ondemand_copen
> > > > >                                  complete(&req_A->done)
> > > > >                                  // will not set the object to close
> > > > >                                  // because ondemand_id && fd is valid.
> > > > >
> > > > >                 // ondemand_object_worker() is done
> > > > >                 // but the object is still reopening.
> > > > >
> > > > >                                 // new open req_B
> > > > >                                 cachefiles_ondemand_init_object(B)
> > > > >                                  cachefiles_ondemand_send_req(OPEN)
> > > > >                                  // reuse msg_id 6
> > > > > process_open_req
> > > > >  copen 6,A.size
> > > > >  // The expected failed copen was executed successfully
> > > > >
> > > > > Expect copen to fail, and when it does, it closes fd, which sets the
> > > > > object to close, and then close triggers reopen again. However, due to
> > > > > msg_id reuse resulting in a successful copen, the anonymous fd is not
> > > > > closed until the daemon exits.
> > > > > Therefore read requests waiting for reopen to complete may
> > > > > trigger a hung task.
> > > > >
> > > > > To avoid this issue, allocate the msg_id cyclically to avoid reusing the
> > > > > msg_id for a very short duration of time.
> > > > >
> > > > > Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
> > > > > Signed-off-by: Baokun Li
> > > > > ---
> > > > >  fs/cachefiles/internal.h |  1 +
> > > > >  fs/cachefiles/ondemand.c | 20 ++++++++++++++++----
> > > > >  2 files changed, 17 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
> > > > > index 8ecd296cc1c4..9200c00f3e98 100644
> > > > > --- a/fs/cachefiles/internal.h
> > > > > +++ b/fs/cachefiles/internal.h
> > > > > @@ -128,6 +128,7 @@ struct cachefiles_cache {
> > > > >  	unsigned long req_id_next;
> > > > >  	struct xarray ondemand_ids; /* xarray for ondemand_id allocation */
> > > > >  	u32 ondemand_id_next;
> > > > > +	u32 msg_id_next;
> > > > >  };
> > > > >
> > > > >  static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
> > > > > diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
> > > > > index f6440b3e7368..b10952f77472 100644
> > > > > --- a/fs/cachefiles/ondemand.c
> > > > > +++ b/fs/cachefiles/ondemand.c
> > > > > @@ -433,20 +433,32 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
> > > > >  		smp_mb();
> > > > >
> > > > >  		if (opcode == CACHEFILES_OP_CLOSE &&
> > > > > -			!cachefiles_ondemand_object_is_open(object)) {
> > > > > +		    !cachefiles_ondemand_object_is_open(object)) {
> > > > >  			WARN_ON_ONCE(object->ondemand->ondemand_id == 0);
> > > > >  			xas_unlock(&xas);
> > > > >  			ret = -EIO;
> > > > >  			goto out;
> > > > >  		}
> > > > >
> > > > > -		xas.xa_index = 0;
> > > > > +		/*
> > > > > +		 * Cyclically find a free xas to avoid msg_id reuse that would
> > > > > +		 * cause the daemon to successfully copen a stale msg_id.
> > > > > +		 */
> > > > > +		xas.xa_index = cache->msg_id_next;
> > > > >  		xas_find_marked(&xas, UINT_MAX, XA_FREE_MARK);
> > > > > +		if (xas.xa_node == XAS_RESTART) {
> > > > > +			xas.xa_index = 0;
> > > > > +			xas_find_marked(&xas, cache->msg_id_next - 1, XA_FREE_MARK);
> > > > > +		}
> > > > >  		if (xas.xa_node == XAS_RESTART)
> > > > >  			xas_set_err(&xas, -EBUSY);
> > > > > +
> > > > >  		xas_store(&xas, req);
> > > > > -		xas_clear_mark(&xas, XA_FREE_MARK);
> > > > > -		xas_set_mark(&xas, CACHEFILES_REQ_NEW);
> > > > > +		if (xas_valid(&xas)) {
> > > > > +			cache->msg_id_next = xas.xa_index + 1;
> > > > If you have a long-standing stuck request, could this counter wrap
> > > > around and you still end up with reuse?
> > > Yes, msg_id_next is declared to be of type u32 in the hope that when
> > > xa_index == UINT_MAX, a wrap around occurs so that msg_id_next
> > > goes to zero. Limiting xa_index to no more than UINT_MAX is to avoid
> > > the xarray being too deep.
> > >
> > > If msg_id_next is equal to the id of a long-standing stuck request
> > > after the wrap-around, it is true that the reuse in the above problem
> > > may also occur.
> > >
> > > But I feel that a long stuck request is problematic in itself, it means
> > > that after we have sent 4294967295 requests, the first one has not
> > > been processed yet, and even if we send a million requests per
> > > second, this one hasn't been completed for more than an hour.
> > >
> > > We have a keep-alive process that pulls the daemon back up as
> > > soon as it exits, and there is a timeout mechanism for requests in
> > > the daemon to prevent the kernel from waiting for long periods
> > > of time. In other words, we should avoid the situation where
> > > a request is stuck for a long period of time.
> > >
> > > If you think UINT_MAX is not enough, perhaps we could raise
> > > the maximum value of msg_id_next to ULONG_MAX?
> > > > Maybe this should be using ida_alloc/free instead, which would
> > > > prevent that too?
> > > >
> > > The id reuse here is that the kernel has finished the open request
> > > req_A and freed its id_A and used it again when sending the open
> > > request req_B, but the daemon is still working on req_A, so the
> > > copen id_A succeeds but operates on req_B.
> > >
> > > The id that is being used by the kernel will not be allocated here
> > > so it seems that ida_alloc/free does not prevent reuse either,
> > > could you elaborate a bit more how this works?
> > >
> > ida_alloc and free absolutely prevent reuse while the id is in use.
> > That's sort of the point of those functions. Basically it uses a set of
> > bitmaps in an xarray to track which IDs are in use, so ida_alloc only
> > hands out values which are not in use. See the comments over
> > ida_alloc_range() in lib/idr.c.
> >
> Thank you for the explanation!
>
> The logic now provides the same guarantees as ida_alloc/free.
> The "reused" id, indeed, is no longer in use in the kernel, but it is still
> in use in the userland, so a multi-threaded daemon could be handling
> two different requests for the same msg_id at the same time.
>
> Previously, the logic for allocating msg_ids was to start at 0 and look
> for a free xas.xa_index, so it was possible for an id to be allocated to a
> new request just as the id was being freed.
>
> With the change to cyclic allocation, the kernel will not use the same
> id again until INT_MAX requests have been sent, and during the time
> it takes to send requests, the daemon has enough time to process
> requests whose ids are still in use by the daemon, but have already
> been freed in the kernel.
>

If you're checking for collisions somewhere else, then this should be
fine:

Acked-by: Jeff Layton
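
For anyone following the thread, the difference between the old lowest-free allocation and the cyclic allocation discussed above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: struct id_pool, ID_SPACE, alloc_lowest(), alloc_cyclic() and release_id() are hypothetical names, a plain bool array stands in for the xarray's XA_FREE_MARK, and the ID space is shrunk to 8 entries so that wrap-around is easy to see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tiny ID space; the patch searches up to UINT_MAX. */
#define ID_SPACE 8u

struct id_pool {
	bool in_use[ID_SPACE];	/* stand-in for the xarray free mark */
	uint32_t next;		/* plays the role of cache->msg_id_next */
};

/* Lowest-free policy: always scan from 0, as before the patch. */
static int alloc_lowest(struct id_pool *p)
{
	for (uint32_t i = 0; i < ID_SPACE; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return (int)i;
		}
	}
	return -1;	/* every ID is in use */
}

/*
 * Cyclic policy: scan from `next` to the end of the space, then wrap
 * and scan from 0 up to `next - 1`. IDs still in use are skipped, so
 * only a freed ID can be handed out again, and only after the cursor
 * has travelled around the space.
 */
static int alloc_cyclic(struct id_pool *p)
{
	for (uint32_t n = 0; n < ID_SPACE; n++) {
		uint32_t i = (p->next + n) % ID_SPACE;

		if (!p->in_use[i]) {
			p->in_use[i] = true;
			p->next = (i + 1) % ID_SPACE;
			return (int)i;
		}
	}
	return -1;	/* every ID is in use */
}

/* Mark an ID as free again; the cursor is deliberately left alone. */
static void release_id(struct id_pool *p, int id)
{
	assert(id >= 0 && (uint32_t)id < ID_SPACE);
	p->in_use[id] = false;
}
```

Starting from an empty pool, allocating and then releasing an ID with alloc_lowest() hands out 0 again on the next call, which is the immediate reuse the patch wants to avoid. With alloc_cyclic() the next call returns 1, and 0 comes back only after the cursor has wrapped. For scale, cycling through 4294967295 IDs at a million requests per second takes about 4295 seconds, a little under 72 minutes, which matches the "more than an hour" estimate in the thread.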