Date: Wed, 23 Jun 2021 16:07:54 +0200
From: Lennart Poettering
To: Hannes Reinecke
Cc: Matteo Croce, linux-block@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, Jens Axboe,
    linux-kernel@vger.kernel.org, Luca Boccassi, Alexander Viro,
    Damien Le Moal, Tejun Heo, Javier González, Niklas Cassel,
    Johannes Thumshirn, Matthew Wilcox, Christoph Hellwig, JeffleXu
Subject: Re: [PATCH v3 0/6] block: add a sequence number to disks
References: <20210623105858.6978-1-mcroce@linux.microsoft.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 23.06.21 14:03, Hannes Reinecke (hare@suse.de) wrote:

> On 6/23/21 12:58 PM, Matteo Croce wrote:
> > From: Matteo Croce
> >
> > With this series a monotonically increasing number is added to disks,
> > precisely in the genhd struct, and it's exported in sysfs and uevent.
> >
> > This helps the userspace correlate events for devices that reuse the
> > same device, like loop.
> >
> I'm failing to see the point here.
> Apparently you are assuming that there is a userspace tool tracking events,
> and has a need to correlate events related to different instances of the
> disk.
> But if you have an userspace application tracking events, why can't the same
> application track the 'add' and 'remove' events to track the lifetime of the
> devices, and implement its own numbering based on that?
>
> Why do we need to burden the kernel with this?

The problem is that tracking the "add" and "remove" events simply
cannot be done safely right now for block devices whose names are
frequently reused.

Consider the loopback block device subsystem: whenever some tool wants
a loopback block device it asks the kernel for one, and the kernel
allocates from the bottom, hence /dev/loop0 is the most frequently
used loopback block device. If a large number of concurrently running
programs repeatedly and quickly allocate and deallocate block devices,
they will all sooner or later get /dev/loop0. If they then want to
watch the "add" and "remove" uevents for that device for their own use
of it, there's a very good chance they'll end up seeing the previous
user's "add" and "remove" events. There's simply no way right now to
associate the uevents you see with *your* *own* use of /dev/loop0, and
to distinguish them from uevents that were queued for a prior use of
/dev/loop0 and were just slow to be processed.
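
To make the race concrete, here is a minimal Python sketch. It is a
toy model only, not kernel or udev code: ToyKernel, Uevent and the
"seq" field are hypothetical names made up for illustration, and the
sequence number is assumed to behave like the monotonically increasing
per-disk number the series describes. It shows that a listener matching
by name alone picks up the previous user's stale event, while matching
by name plus sequence number finds its own.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Uevent:
    action: str  # "add" or "remove"
    name: str    # kernel device name, e.g. "loop0"
    seq: int     # hypothetical per-allocation sequence number


class ToyKernel:
    """Toy model: one reusable device name, uevents queued asynchronously."""

    def __init__(self):
        self._seq = count(1)   # monotonically increasing, never reused
        self.queue = deque()   # uevents not yet delivered to listeners

    def attach(self, name="loop0"):
        seq = next(self._seq)
        self.queue.append(Uevent("add", name, seq))
        return seq

    def detach(self, name, seq):
        self.queue.append(Uevent("remove", name, seq))


def first_match_by_name(events, name):
    # What a listener can do today: match on the device name only.
    return next((e for e in events if e.name == name), None)


def first_match_by_seq(events, name, seq):
    # With a sequence number: match on name *and* the caller's own seq.
    return next((e for e in events if e.name == name and e.seq == seq), None)


kernel = ToyKernel()

# Program A uses loop0 and releases it; its uevents are still queued.
seq_a = kernel.attach()
kernel.detach("loop0", seq_a)

# Program B immediately gets the same name and starts listening.
seq_b = kernel.attach()

by_name = first_match_by_name(kernel.queue, "loop0")
by_seq = first_match_by_seq(kernel.queue, "loop0", seq_b)

print("matched by name:", by_name)   # A's stale "add" event
print("matched by seq: ", by_seq)    # B's own "add" event
```

In this model, program B matching by name would go on to see A's
"remove" next and wrongly conclude that its own device went away.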

Or to say this differently: loopback devices are named from a very
small, dense pool of names, and are frequently and quickly reused.
Uevents are enqueued asynchronously and can take a long time to reach
the listeners (after all, they have to travel through two AF_NETLINK
sockets and udev), and the only way to match up device uses with their
uevents is by kernel device names, which are useless as stable
identifiers.

This applies not only to loopback block devices, but to many other
block device subsystems too. For example, nbd allocates from the
bottom too, i.e. /dev/nbd0 is the most likely name to be used. The
same goes for SCSI devices: if you quickly plug/unplug/replug a bunch
of USB sticks, you'll likely always get /dev/sda...

Lennart

--
Lennart Poettering, Berlin