Subject: Re: [PATCH v5 0/5] block: add a sequence number to disks
From: Jens Axboe
Date: Wed, 28 Jul 2021 13:22:18 -0600
To: Matteo Croce, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, Christoph Hellwig
Cc: linux-kernel@vger.kernel.org, Lennart Poettering, Luca Boccassi, Alexander Viro, Damien Le Moal, Tejun Heo, Javier González, Niklas Cassel, Johannes Thumshirn, Hannes Reinecke, Matthew Wilcox, JeffleXu
In-Reply-To: <20210712230530.29323-1-mcroce@linux.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 7/12/21 5:05 PM, Matteo Croce wrote:
> From: Matteo Croce
>
> Associating uevents with block devices in userspace is difficult and racy:
> the uevent netlink socket is lossy, and on slow and overloaded systems it
> has very high latency. Block devices do not have exclusive owners in
> userspace; any process can set one up (e.g. loop devices). Moreover, device
> names can be reused (e.g. loop0 can be reused again and again). A userspace
> process setting up a block device and watching for its events therefore
> cannot reliably tell whether an event relates to the device it just set up
> or to an earlier instance with the same name.
>
> Being able to set a UUID on a loop device would solve the race conditions,
> but it does not allow deriving an ordering from uevents: if you see a uevent
> with a UUID that does not match the device you are waiting for, you cannot
> tell whether the right uevent has not arrived yet or was already sent and
> you missed it. So you cannot tell whether you should wait for it or not.
>
> Being able to set devices up in a namespace would solve the race conditions
> too, but only where namespacing is feasible in the first place. Many
> userspace processes need to set devices up for the root namespace, so this
> solution cannot always work.
>
> Changing the loop device naming implementation to always use monotonically
> increasing device numbers, instead of reusing the lowest free number, would
> also solve the problem, but it would be very disruptive to userspace and
> likely break many existing use cases. It would also be awkward on
> long-running machines, as loop device names would quickly grow to many
> digits.
>
> Furthermore, this problem does not affect only loop devices: partition
> probing is asynchronous and very slow on busy systems. It is very easy to
> run into races when using LO_FLAGS_PARTSCAN and watching for the partitions
> to show up, as it can take a long time for the uevents to be delivered after
> setting them up.
>
> Associating a unique, monotonically increasing sequence number with the
> lifetime of each block device, retrievable with an ioctl immediately after
> setting the device up, solves the race conditions with uevents. It also
> lets userspace processes know whether they should wait for the uevent they
> need or whether it was dropped and they should move on.
>
> This benefits not only loop devices and block devices with multiple
> partitions, but also, for example, removable media such as USB sticks or
> CD-ROMs/DVD-ROMs.
>
> The first patch is the core one; patches 2-4 expose the information in
> different ways, and the last one makes the loop device generate a "media
> changed" event upon attach, detach or reconfigure, so that the sequence
> number is increased.
>
> If merged, this feature will be used immediately by userspace:
> https://github.com/systemd/systemd/issues/17469#issuecomment-762919781

Applied for 5.15, with #2 done manually since it didn't apply cleanly.

-- 
Jens Axboe
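For readers wondering how userspace would consume the sequence number described in the cover letter, here is a minimal sketch of the wait-or-move-on decision. It rests on assumptions not stated in this thread: that the ioctl is `BLKGETDISKSEQ`, defined as `_IOR(0x12, 128, __u64)` in `<linux/fs.h>` from 5.15 onward (the encoding below follows the asm-generic layout only, so it does not hold on powerpc, mips or sparc), and that each uevent carries the sequence number as a `DISKSEQ` property. The uevent source is modelled as a plain iterable of hypothetical `(devname, diskseq)` pairs rather than a real netlink listener, and `get_diskseq`, `classify_uevent` and `wait_for_uevent` are illustrative names, not an existing API.

```python
import fcntl
import os
import struct
from enum import Enum


def _ior(type_: int, nr: int, size: int) -> int:
    """Encode a read ioctl number like the asm-generic _IOR() macro.

    Layout: dir (2 bits, _IOC_READ = 2) | size (14 bits) | type (8) | nr (8).
    Architectures such as powerpc, mips and sparc use a different layout.
    """
    return (2 << 30) | (size << 16) | (type_ << 8) | nr


# Assumption: BLKGETDISKSEQ is _IOR(0x12, 128, __u64) in <linux/fs.h> (5.15+).
BLKGETDISKSEQ = _ior(0x12, 128, 8)


def get_diskseq(path: str) -> int:
    """Return the disk sequence number of the block device at `path`."""
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    try:
        # The kernel writes a native-endian __u64 into the 8-byte buffer.
        buf = fcntl.ioctl(fd, BLKGETDISKSEQ, bytes(8))
    finally:
        os.close(fd)
    return struct.unpack("=Q", buf)[0]


class Verdict(Enum):
    MINE = "mine"      # event belongs to the instance we set up
    STALE = "stale"    # event from an earlier instance: keep waiting
    MISSED = "missed"  # sequence moved past ours: our event came earlier or was dropped


def classify_uevent(expected_seq: int, event_seq: int) -> Verdict:
    """Compare a uevent's sequence number with the value read right after setup."""
    if event_seq == expected_seq:
        return Verdict.MINE
    if event_seq < expected_seq:
        return Verdict.STALE
    return Verdict.MISSED


def wait_for_uevent(devname, expected_seq, events) -> bool:
    """Consume hypothetical (devname, diskseq) pairs until the outcome is known.

    Returns True once our event arrives, and False if a newer sequence
    number shows it was missed (or if the stream ends first).
    """
    for name, seq in events:
        if name != devname:
            continue  # events for other devices are irrelevant here
        verdict = classify_uevent(expected_seq, seq)
        if verdict is Verdict.MINE:
            return True
        if verdict is Verdict.MISSED:
            return False
        # Verdict.STALE: a late event for an older instance; keep waiting.
    return False
```

A tool that attaches a loop device would, under these assumptions, call `get_diskseq("/dev/loop0")` right after attaching and pass the result to `wait_for_uevent("loop0", seq, listener)`, where `listener` is whatever uevent source it already uses. The plain `<` and `>` comparisons are only meaningful because the counter increases monotonically, which is exactly the property the cover letter relies on; a UUID alone could not distinguish "not yet" from "already missed".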