From: Linus Torvalds
Date: Sat, 23 Sep 2023 10:48:51 -0700
Subject: Re: [GIT PULL v2] timestamp fixes
To: Amir Goldstein
Cc: Jeff Layton, Christian Brauner, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, Jan Kara, "Darrick J. Wong"

On Fri, 22 Sept 2023 at 23:36, Amir Goldstein wrote:
>
> Apparently, they are willing to handle the "year 2486" issue ;)

Well, we could certainly do the same at the VFS layer.

But I suspect 10ns resolution is entirely overkill, since on a lot of
platforms you don't even have timers with that resolution.

I feel like 100ns is a much more reasonable resolution, and is quite
close to a single system call (think "one thousand cycles at 10GHz").

> But the resolution change is counter to the purpose of multigrain
> timestamps - if two syscalls updated the same or two different inodes
> within a 100ns tick, apparently, there are some workloads that
> care to know about it and fs needs to store this information persistently.

Those workloads are broken garbage, and we should *not* use that kind
of sh*t to decide on VFS internals.

Honestly, if the main reason for the multigrain resolution is
something like that, I think we should forget about MG *entirely*.
Somebody needs to be told to get their act together.

We have *never* guaranteed nanosecond resolution on timestamps, and I
think we should put our foot down and say that we never will.

Partly because we have platforms where that kind of timer resolution
just does not exist.

Partly because it's stupid to expect that kind of resolution anyway.

And partly because any load that assumes that kind of resolution is
already broken.

End result: we should ABSOLUTELY NOT have as a target to support some
insane resolution.

100ns resolution for file access times is - and I'll happily go down
in history for saying this - enough for anybody. If you need finer
resolution than that, you'd better do it yourself in user space.

And no, this is not a "but some day we'll have terahertz CPU's and
100ns is an eternity". Moore's law is dead, we're not going to see
terahertz CPUs, and people who say "but quantum" have bought into a
technological fairytale.

100ns is plenty, and has the advantage of having a very safe range.

That said, we don't have to do powers-of-ten. In fact, in many ways,
it would probably be a good idea to think of the fractional seconds in
powers of two.
That tends to make it cheaper to do conversions, without
having to do a full 64-bit divide (a constant divide turns into a
fancy multiply, but it's still painful on 32-bit architectures).

So, for example, we could easily make the format be a fixed-point
format with "sign bit, 38 bit seconds, 25 bit fractional seconds",
which gives us about 30ns resolution, and a range of almost 9000
years. Which is nice, in how it covers all of written history and all
four-digit years (we'd keep the 1970 base).

And 30ns resolution really *is* pretty much the limit of a single
system call. I could *wish* we had system calls that fast, or CPU's
that fast. Not the case right now, and sadly doesn't seem to be the
case in the foreseeable future - if ever - either. It would be a really
good problem to have.

And the nice thing about that would be that conversion to timespec64
would be fairly straightforward:

    struct timespec64 to_timespec(fstime_t fstime)
    {
        struct timespec64 res;
        unsigned int frac;

        /* low 25 bits: binary fraction of a second */
        frac = fstime & 0x1ffffffu;
        /* arithmetic shift keeps the sign of the seconds */
        res.tv_sec = fstime >> 25;
        /* scale the fraction from 2^-25s units to nanoseconds */
        res.tv_nsec = frac * 1000000000ull >> 25;
        return res;
    }

    fstime_t to_fstime(struct timespec64 a)
    {
        fstime_t sec = (fstime_t) a.tv_sec << 25;
        unsigned frac;

        /* scale nanoseconds to 2^-25s units (truncating) */
        frac = ((unsigned long long) a.tv_nsec << 25) / 1000000000ull;
        return sec | frac;
    }

and both of those generate good code (that large divide by a constant
in to_fstime() is not great, but the compiler can turn it into a
multiply).

The above could be improved upon (nicer rounding and overflow
handling, and a few modifications to generate even nicer code), but
it's not horrendous as-is. On x86-64, to_timespec becomes a very
reasonable

        movq    %rdi, %rax
        andl    $33554431, %edi
        imulq   $1000000000, %rdi, %rdx
        sarq    $25, %rax
        shrq    $25, %rdx

and to some degree that's the critical function (that code would show
up in 'stat()').

Of course, I might have screwed up the above conversion functions,
they are untested garbage, but they look close enough to being in the
right ballpark.

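Since the functions above are explicitly untested, here is a minimal
userspace sketch of the same "sign bit, 38 bit seconds, 25 bit
fractional seconds" layout that can be compiled and checked outside the
kernel. It is an illustration under assumptions, not the kernel's
implementation: fstime_t as a plain int64_t, the struct fs_timespec64
stand-in for timespec64, the macro names and the fstime_in_range()
helper are all hypothetical. It also uses multiply/divide by 2^25
instead of shifting negative values, so it does not depend on how the
compiler shifts signed integers. Both directions still truncate, as in
the original. For reference, 2^38 seconds is roughly 8,710 years on
either side of 1970, and 2^-25 seconds is roughly 29.8ns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical, chosen for this sketch only. */
#define FSTIME_FRAC_BITS 25
#define FSTIME_SEC_BITS  38
#define FSTIME_ONE       ((int64_t)1 << FSTIME_FRAC_BITS)  /* one second */
#define FSTIME_FRAC_MASK (FSTIME_ONE - 1)
#define NSEC_PER_SEC     1000000000LL

typedef int64_t fstime_t;   /* assumed: signed 64-bit fixed point */

/* Userspace stand-in for the kernel's struct timespec64. */
struct fs_timespec64 {
	int64_t tv_sec;
	long    tv_nsec;
};

/* True if the value fits in sign + 38 bits of seconds, normalized nsec. */
bool fstime_in_range(struct fs_timespec64 a)
{
	const int64_t limit = (int64_t)1 << FSTIME_SEC_BITS;

	return a.tv_sec >= -limit && a.tv_sec < limit &&
	       a.tv_nsec >= 0 && a.tv_nsec < NSEC_PER_SEC;
}

/* Pack seconds and nanoseconds; the fraction is truncated to 2^-25s units. */
fstime_t to_fstime(struct fs_timespec64 a)
{
	int64_t frac;

	assert(fstime_in_range(a));
	frac = ((int64_t)a.tv_nsec * FSTIME_ONE) / NSEC_PER_SEC;
	/* seconds part is a multiple of 2^25, so adding equals OR-ing */
	return a.tv_sec * FSTIME_ONE + frac;
}

/* Unpack; tv_sec is the floor of the value, tv_nsec is always >= 0. */
struct fs_timespec64 to_timespec(fstime_t fstime)
{
	struct fs_timespec64 res;
	int64_t frac = fstime & FSTIME_FRAC_MASK;

	/* exact division: (fstime - frac) is a multiple of 2^25 */
	res.tv_sec = (fstime - frac) / FSTIME_ONE;
	res.tv_nsec = (long)((frac * NSEC_PER_SEC) >> FSTIME_FRAC_BITS);
	return res;
}
```

With these definitions, tv_sec = -1, tv_nsec = 500000000 packs to
-16777216 and unpacks to exactly the same pair, while tv_nsec =
999999999 comes back as 999999970: the round trip never moves a
timestamp forward and loses at most 30ns.
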
Anyway, we really need to push back at any crazies who say "I want
nanosecond resolution, because I'm special and my mother said so".

              Linus