Date: Tue, 7 Apr 2020 17:53:29 +0200
From: Lennart Poettering
To: Miklos Szeredi
Cc: Ian Kent, David Howells, Christian Brauner, Linus Torvalds, Al Viro,
    dray@redhat.com, Karel Zak, Miklos Szeredi, Steven Whitehouse,
    Jeff Layton, andres@anarazel.de, keyrings@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    Aleksa Sarai
Subject: Re: Upcoming: Notifications, FS notifications and fsinfo()
Message-ID: <20200407155329.GA39803@gardel-login>
References: <20200402155020.GA31715@gardel-login>
 <20200403110842.GA34663@gardel-login>
 <20200403150143.GA34800@gardel-login>
 <20200406172917.GA37692@gardel-login>

On Tue, 07.04.20 15:59, Miklos Szeredi (miklos@szeredi.hu) wrote:

> On Tue, Apr 7, 2020 at 4:22 AM Ian Kent wrote:
>
> > > Right now, when you have n mounts, and any mount changes, or one is
> > > added or removed, then we have to parse the whole mount table again,
> > > asynchronously, processing all n entries again, every frickin
> > > time. This means the work to process n mounts popping up at boot is
> > > O(n²). That sucks, it should be obvious to anyone. Now if we get that
> > > fixed, by some mount API that can send us minimal notifications about
> > > what happened and where, then this becomes O(n), which is totally OK.
>
> Something's not right with the above statement. Hint: if there are
> lots of events in quick succession, you can batch them quite easily to
> prevent overloading the system.
>
> Wrote a pair of utilities to check out the capabilities of the current
> API. The first one just creates N mounts, optionally sleeping
> between each. The second one watches /proc/self/mountinfo and
> generates individual (add/del/change) events based on POLLPRI and
> comparing contents with the previous instance.
>
> First use case: create 10,000 mounts, then start the watcher and
> create 1,000 mounts with a 50ms sleep between them. Total time (user +
> system) consumed by the watcher: 25s. This is indeed pretty dismal,
> and a per-mount query will help tremendously. But it's still "just"
> 25ms per mount, so if the mounts are far apart (which is what this
> test is about), this won't thrash the system. Note how this is
> self-regulating: if the load is high, it will automatically batch more
> requests, preventing overload. It is also prone to lose pairs of add
> + remove in these cases (and so is the ring-buffer-based one from
> David).

We will batch requests too in systemd, of course, necessarily, given
that the /proc/self/mountinfo inotify stuff is async. The thing,
though, is that this means we buy lower CPU usage (working around the
O(n²) issue) by introducing artificially higher latencies. We usually
want to boot quickly, not artificially slowly.

Sure, one can come up with some super smart scheme for how to tweak
the artificial latencies, how to grow them, how to shrink them
depending on a perceived flood of events, some backing-off scheme. But
that's just polishing a turd, if all we want is proper queued change
notification without the O(n²) behaviour.

I mean, the fix for an O(n²) algorithm is to make it O(n) or so. By
coalescing wake-up events you just lower the n again, probably
linearly, but that still means we pay O(n²), which sucks.

Lennart

--
Lennart Poettering, Berlin
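
For reference, below is a minimal sketch of the kind of watcher being
discussed. It is not Miklos's utility, just an illustration under
simple assumptions (fixed-size arrays, a naive linear diff, add/del
events only): it polls an open /proc/self/mountinfo fd for POLLPRI,
re-reads the file on every wakeup, and diffs the mount-ID set against
the previous snapshot. Each wakeup coalesces everything that changed
since the last read, and each wakeup pays the full re-parse that the
thread is arguing about.

/*
 * Minimal sketch of a /proc/self/mountinfo watcher (not the utility
 * from this thread): poll the open fd for POLLPRI, re-read the file
 * on every wakeup and diff the mount IDs against the previous
 * snapshot to emit add/del events.  "change" detection and robust
 * error handling are left out; a real watcher would also use a hash
 * table instead of the naive linear diff below.
 */
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MOUNTS 65536

/* Re-parse the whole table: this is the O(n) pass paid on every event. */
static size_t read_mount_ids(FILE *f, int *ids) {
        char line[4096];
        size_t n = 0;

        rewind(f);
        while (n < MAX_MOUNTS && fgets(line, sizeof(line), f))
                if (sscanf(line, "%d", &ids[n]) == 1) /* first field: mount ID */
                        n++;
        return n;
}

static int contains(const int *ids, size_t n, int id) {
        for (size_t i = 0; i < n; i++)
                if (ids[i] == id)
                        return 1;
        return 0;
}

int main(void) {
        static int old_ids[MAX_MOUNTS], new_ids[MAX_MOUNTS];
        size_t old_n, new_n;

        FILE *f = fopen("/proc/self/mountinfo", "re");
        if (!f) {
                perror("fopen /proc/self/mountinfo");
                return EXIT_FAILURE;
        }

        old_n = read_mount_ids(f, old_ids);

        for (;;) {
                /* The kernel signals mount table changes as POLLERR|POLLPRI
                 * on an open mountinfo fd; block here until that happens. */
                struct pollfd p = { .fd = fileno(f), .events = POLLPRI };

                if (poll(&p, 1, -1) < 0) {
                        perror("poll");
                        return EXIT_FAILURE;
                }

                /* Everything that changed since the last wakeup is coalesced
                 * into this single re-parse. */
                new_n = read_mount_ids(f, new_ids);

                for (size_t i = 0; i < new_n; i++)
                        if (!contains(old_ids, old_n, new_ids[i]))
                                printf("add: mount id %d\n", new_ids[i]);
                for (size_t i = 0; i < old_n; i++)
                        if (!contains(new_ids, new_n, old_ids[i]))
                                printf("del: mount id %d\n", old_ids[i]);

                memcpy(old_ids, new_ids, new_n * sizeof(int));
                old_n = new_n;
        }
}

Measured against the numbers above, each wakeup re-reads on the order
of 11,000 entries, so the quoted ~25ms per event is plausibly dominated
by that re-parse; a per-mount query API would remove exactly this pass.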