To: Thomas Gleixner, Carlos O'Donell, Zack Weinberg, Cyril Hrubis
Cc: Dmitry Safonov, Andrei Vagin, GNU C Library, Linux Kernel Mailing List
From: Petr Špaček
Organization: CZ.NIC
Subject: Re: [Y2038][time namespaces] Question regarding CLOCK_REALTIME support plans in Linux time namespaces
Date: Wed, 25 Nov 2020 18:06:47 +0100

On 20. 11. 20 1:14, Thomas Gleixner wrote:
> On Thu, Nov 19 2020 at 13:37, Carlos O'Donell wrote:
>> On 11/6/20 7:47 PM, Thomas Gleixner wrote:
>>> Would CONFIG_DEBUG_DISTORTED_CLOCK_REALTIME be a way to go? IOW,
>>> something which is clearly in the debug section of the kernel which won't
>>> get turned on by distros (*cough*) and comes with a description that any
>>> bug reports against it vs. time correctness are going to be ignored.
>>
>> Yes. I would be requiring CONFIG_DEBUG_DISTORTED_CLOCK_REALTIME.
>>
>> Let me be clear, though: the distros have *+debug kernels in which this
>> CONFIG_DEBUG_* could get turned on? In Fedora *+debug kernels we enable all
>> sorts of things like CONFIG_DEBUG_OBJECTS_* and CONFIG_DEBUG_SPINLOCK etc.
>> etc. etc.
>
> That's why I wrote '(*cough*)'. It's entirely clear to me that this
> would be enabled for whatever raisins.
>
>> I would push Fedora/RHEL to ship this in the *+debug kernels. That way I
>> can have this on for the local test/build cycle. Would you be OK with that?
>
> Distros ship a lot of weird things. Though that config would probably be
> saner than some of the horrors shipped in enterprise production kernels.
>
>> We could have it disabled by default but enabled via proc, like
>> unprivileged_userns_clone was at one point?
>
> Yes, that'd be mandatory. But see below.
>
>> I want to avoid accidental use in Fedora *+debug kernels unless the
>> developer is actively going to run tests that require time
>> manipulation, e.g. thousands of DNSSEC tests with timeouts [1].
>
> ...
>
>> In the case of the DNSSEC protocol, conversations have real-time values
>> in them which cause "expiration", so packet captures are useful only if
>> the real-time clock reflects the values from the original conversation.
>> In our case the packet captures come from the real Internet, i.e. we do
>> not have the private keys used to sign the packets, so we cannot change
>> the time values.
>>
>> This use case also implies support for settime(): during the course of a
>> test we shorten time windows where "nothing happens" and the server and
>> client are waiting for an event, e.g. for cache expiration on the
>> client. This window can be hours long, so it really _does_ make a
>> difference. Oh yes, and for these time jumps we need to move monotonic
>> time as well.
>
> I hope you are aware that the time namespace offsets have to be set
> _before_ the process starts and can't be changed afterwards,
> i.e. settime() is not an option.
>
> That might limit the usability for your use case, and this can't be
> changed at all because there might be armed timers and other
> time-related things which would start to go into full confusion mode.
>
> The supported use case is container live migration, and that _is_ very
> careful about restoring time and armed timers, and if their user space
> tools screw it up then they can keep the bits and pieces.
>
> So in order to utilize that you'd have to checkpoint the container,
> manipulate the offsets and restore it.
>
> The point is that on changing the time offset after the fact the kernel
> would have to chase _all_ armed timers which belong to that namespace
> and are related to the affected clock and readjust them to the new
> distortion of namespace time.
> Otherwise they might expire way too late (which is kinda ok from a
> correctness POV, but not what you expect) or too early, which is
> clearly a NONO. Finding them is not trivial because some of them are
> part of a syscall and on the stack.
>
> What's worse is that if the host's CLOCK_REALTIME is set, then it'd have
> to go through _all_ time namespaces, adjust the offsets, and find all
> timers of all tasks in each namespace.
>
> In contrast, the real clock_settime(CLOCK_REALTIME) is not a big
> problem, simply because all it takes is to change the time and then kick
> all CPUs to reevaluate their first expiring timer. If the clock jumped
> backward then they rearm their hardware and are done; if it jumped
> forward they expire the ones which are affected and all is good.
>
> The original POSIX timer implementation did not have separate time bases,
> and on clock_settime() _all_ armed CLOCK_REALTIME timers in the system
> had to be chased down, reevaluated and readjusted. Guess how well that
> worked and what kind of limitation that implied.
>
> Aside from this, there are other things, e.g. file times, packet
> timestamps etc., which are based on CLOCK_REALTIME. What to do about
> them? Translate these to/from namespace time or not? There is a long
> list of other horrors which are related to that.
>
> So _you_ might say that you don't care about file times, RTC, timers
> expiring at the wrong time, packet timestamps and whatever.
>
> But then the next test dude comes around and wants to test exactly
> these interfaces, and we have to slap the time namespace conversions for
> REALTIME and TAI all over the place because we already support the
> minimal thing.
>
> Can you see why this is a slippery slope and why I'm extremely reluctant
> to even provide the minimal 'distort realtime when the namespace starts'
> support?
>
>> Hopefully this illustrates that a real-time namespace is not a "request
>> for a pony" :-)
>
> I can understand your pain and why you want to distort time, but please
> understand that timekeeping is complex. The primary focus must be
> correctness, scalability and maintainability, which is already hard
> enough to achieve. Just for perspective: it took us only 8 years to
> get the kernel halfway 2038-ready (filesystems still outstanding).
>
> So from my point of view asking for distorted time still _is_ a request
> for ponies.
>
> The fixed offsets for clock MONOTONIC/BOOTTIME are straightforward,
> absolutely make sense and have a limited scope of exposure. Clock
> REALTIME/TAI are very different beasts which entail a slew of horrors.
> Adding settime() to the mix makes it exponentially harder.

Point taken, I can see it is complex as hell. Maybe settime() would not
be necessary if the checkpoint+restore operation is cheap enough,
assuming time jumps can be achieved by manipulating images. I will
eventually explore criu.org to find out.

Thank you for your time!

-- 
Petr Špaček @ CZ.NIC