Date: Wed, 23 Jun 2021 16:55:14 +0200
From: Lennart Poettering
To: Hannes Reinecke
Cc: Luca Boccassi, Matteo Croce, Christoph Hellwig, linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, Jens Axboe, Linux Kernel Mailing List, Alexander Viro, Damien Le Moal, Tejun Heo, Javier González, Niklas Cassel, Johannes Thumshirn, Matthew Wilcox, JeffleXu
Subject: Re: [PATCH v3 1/6] block: add disk sequence number
References: <20210623105858.6978-1-mcroce@linux.microsoft.com> <20210623105858.6978-2-mcroce@linux.microsoft.com> <3be63d9f-d8eb-7657-86dc-8d57187e5940@suse.de> <1b55bc67b75e5cf982c0c1e8f45096f2eb6e8590.camel@debian.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 23.06.21 16:21, Hannes Reinecke (hare@suse.de) wrote:

> > We need this so that we can reliably correlate events to instances of a
> > device. Events alone cannot solve this problem, because events _are_
> > the problem.
> >
> In which sense?
> Yes, events can be delayed (if you list to uevents), but if you listen to
> kernel events there shouldn't be a delay, right?

uevents are delivered to userspace via an AF_NETLINK socket. The
AF_NETLINK socket is basically an asynchronous buffer.

I mean, consider what you are saying: you establish the AF_NETLINK
uevent watching socket, then you allocate /dev/loop0. Since you cannot
do that atomically, you'll first have to do one, and then the other.
But if you do that in two steps, then in the middle some other process
might get scheduled that quickly allocates /dev/loop0 and releases it
again, before your code gets to run. So now you have in your
AF_NETLINK socket buffer the uevents for that other process' use of
the device, and you cannot sanely distinguish them from your own.

Of course you could do it the other way: allocate the device first,
and only then allocate the AF_NETLINK uevent socket. But then you
might or might not lose the "add" event for the device you just
allocated. And you don't know if you should wait for it or not.

This isn't even a constructed issue, this is the common case if you
have multiple processes all simultaneously trying to acquire a
loopback block device, because they all will end up eyeing /dev/loop0
at the same time.

But it gets worse IRL because of various factors. For example,
partition probing is asynchronous, so if you use LO_FLAGS_PARTSCAN and
want to watch for some partition device associated with your loopback
block device to show up, this can take *really* long, so the race
window is large. Or you actually use udev (like most userspace
probably should) because you want the metainfo it collects about the
device, in which case it will take even longer for the uevent to
reach you. That is, the time window where a previous user's uevents
and your own for the same loopback device "overlap" can be quite
large, and you cannot determine if they are yours or the previous
user's uevents, unless we have these new sequence numbers.

Lennart

--
Lennart Poettering, Berlin
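
To illustrate the argument above, here is a minimal sketch in Python
that simulates the socket buffer instead of opening a real AF_NETLINK
socket. It assumes each uevent carries a per-use sequence number; the
UEvent type, the diskseq field name, the helper function, and the
values 41 and 42 are all hypothetical and chosen for illustration
only, not taken from the patch.

```python
from typing import List, NamedTuple


class UEvent(NamedTuple):
    """Simplified stand-in for a uevent read from the netlink buffer."""
    action: str   # illustrative action name, e.g. "add" or "remove"
    devname: str  # device name, e.g. "loop0"
    diskseq: int  # hypothetical per-use sequence number


def events_for_my_use(buffer: List[UEvent], devname: str,
                      my_diskseq: int) -> List[UEvent]:
    """Keep only the events that belong to our own use of the device."""
    return [ev for ev in buffer
            if ev.devname == devname and ev.diskseq == my_diskseq]


# Simulated buffer contents: another process allocated and released
# loop0 after we opened the socket but before we attached our own image.
buffer = [
    UEvent("add", "loop0", 41),     # other process' use
    UEvent("remove", "loop0", 41),  # other process' use
    UEvent("add", "loop0", 42),     # our use
    UEvent("change", "loop0", 42),  # our use
]

# Assumed to be learned when we attach the device ourselves.
MY_DISKSEQ = 42

# Matching on the device name alone cannot separate the two uses.
by_name_only = [ev for ev in buffer if ev.devname == "loop0"]

# Matching on name plus sequence number isolates our own events.
mine = events_for_my_use(buffer, "loop0", MY_DISKSEQ)

print(len(by_name_only), "events match by name;", len(mine), "are ours")
```

Filtering by device name alone returns every buffered event for loop0,
including those from the other process. Adding the sequence number to
the match is what makes the stale events recognizable.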