From: ebiederm@xmission.com (Eric W. Biederman)
To: Thomas Gleixner
Cc: Arnd Bergmann, avagin@virtuozzo.com, dima@arista.com,
    Linux Kernel Mailing List, 0x7f454c46@gmail.com, adrian@lisas.de,
    Andy Lutomirski, Christian Brauner, gorcunov@openvz.org,
    "H. Peter Anvin", Ingo Molnar, Jeff Dike, Oleg Nesterov,
    xemul@virtuozzo.com, Shuah Khan,
    containers@lists.linux-foundation.org, criu@openvz.org, Linux API,
    the arch/x86 maintainers, Alexey Dobriyan,
    linux-kselftest@vger.kernel.org
References: <20180919205037.9574-1-dima@arista.com>
    <874lej6nny.fsf@xmission.com>
    <20180924205119.GA14833@outlook.office365.com>
    <874leezh8n.fsf@xmission.com>
    <20180925014150.GA6302@outlook.office365.com>
    <87zhw4rwiq.fsf@xmission.com> <87mus1ftb9.fsf@xmission.com>
    <877ej2xc23.fsf_-_@xmission.com>
Date: Wed, 03 Oct 2018 06:50:47 +0200
In-Reply-To: (Thomas Gleixner's message of "Tue, 2 Oct 2018 22:06:28 +0200 (CEST)")
Message-ID: <87in2jskew.fsf@xmission.com>
Subject: Re: Setting monotonic time?

Thomas Gleixner writes:

> On Tue, 2 Oct 2018, Arnd Bergmann wrote:
>> On Mon, Oct 1, 2018 at 8:53 PM Thomas Gleixner wrote:
>> >
>> > On Mon, 1 Oct 2018, Eric W. Biederman wrote:
>> > > In the context of process migration there is a simpler subproblem that I
>> > > think it is worth exploring if we can do something about.
>> > >
>> > > For a cluster of machines all running with synchronized
>> > > clocks. CLOCK_REALTIME matches. CLOCK_MONOTONIC does not match between
>> > > machines. Not having a matching CLOCK_MONOTONIC prevents successful
>> > > process migration between nodes in that cluster.
>> > >
>> > > Would it be possible to allow setting CLOCK_MONOTONIC at the very
>> > > beginning of time? So that all of the nodes in a cluster can be in
>> > > sync?
>> > >
>> > > No change in skew just in offset for CLOCK_MONOTONIC.
>> > >
>> > > There are also dragons involved in coordinating things so that
>> > > CLOCK_MONOTONIC gets set before CLOCK_MONOTONIC gets used. So I don't
>> > > know if allowing CLOCK_MONOTONIC to be set would be practical but it
>> > > seems worth exploring all on its own.
>> >
>> > It's used very early on in the kernel, so that would be a major surprise
>> > for many things including user space which has expectations on clock
>> > monotonic.
>> >
>> > It would be reasonably easy to add CLOCK_MONOTONIC_SYNC which can be set in
>> > the way you described and then in namespaces make it possible to magically
>> > map CLOCK_MONOTONIC to CLOCK_MONOTONIC_SYNC.
>> >
>> > It still wouldn't allow to have different NTP/PTP time domains, but might
>> > be a good start to address the main migration headaches.
>>
>> If we make CLOCK_MONOTONIC settable this way in a namespace,
>> do you think that should include device drivers that report timestamps
>> in CLOCK_MONOTONIC base, or only the timekeeping clock and timer
>> interfaces?
>
> Uurgh. That gets messy very fast.
>
>> Examples for drivers that can report timestamps are input, sound, v4l,
>> and drm. I think most of these can report stamps in either monotonic
>> or realtime base, while socket timestamps notably are always in
>> realtime.
>>
>> We can probably get away with not setting the timebase for those
>> device drivers as long as the checkpoint/restart and migration features
>> are not expected to restore the state of an open character device
>> in that way. I don't know if that is a reasonable assumption to make
>> for the examples I listed.
>
> No idea. I'm not a container migration wizard.

Direct access to hardware/drivers, as opposed to access through an
abstraction like the vfs (an abstraction over block devices), can
legitimately be handled by hotplug events: I unplug one keyboard and
plug in another.
I don't know whether the input layer is more of a general abstraction
or more of a hardware device. I have not dug into it, but from what I
have heard my guess is that it is an abstraction.

The scary difficulty here is if, after restart, input is reporting times
in CLOCK_MONOTONIC while the applications in the namespace are talking
about times in CLOCK_MONOTONIC_SYNC. Then there is an issue, as even
with a fixed offset the times don't match up.

So what a time namespace absolutely needs to do is figure out how to
deal with all of the kernel interfaces that report times, and how to
report them in the current time namespace.

Eric
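
The mismatch described above can be illustrated with a small userspace
sketch. It is only a model under stated assumptions, not kernel code:
the namespace clock is taken to be the host CLOCK_MONOTONIC plus a
fixed offset chosen at restore time, and the helper names
(to_namespace_time, offset_for_restore, event_ages) are hypothetical,
introduced here purely for illustration.

```python
import time

NSEC_PER_SEC = 1_000_000_000


def offset_for_restore(saved_namespace_ns: int, host_monotonic_ns: int) -> int:
    """Offset that lets the namespace clock continue from its checkpointed value.

    Assumption: the namespace clock is host CLOCK_MONOTONIC plus a fixed offset.
    """
    return saved_namespace_ns - host_monotonic_ns


def to_namespace_time(host_monotonic_ns: int, offset_ns: int) -> int:
    """Translate a host CLOCK_MONOTONIC reading into the namespace's view."""
    return host_monotonic_ns + offset_ns


def event_ages(host_now_ns: int, event_host_ns: int, offset_ns: int) -> tuple[int, int]:
    """Return (naive_age, translated_age) of a driver event, in nanoseconds.

    naive_age compares the application's namespace time with a timestamp the
    driver reported in raw host time; translated_age first maps the event
    timestamp into the namespace, which is what every reporting interface
    would have to do.
    """
    app_now_ns = to_namespace_time(host_now_ns, offset_ns)
    naive_age = app_now_ns - event_host_ns
    translated_age = app_now_ns - to_namespace_time(event_host_ns, offset_ns)
    return naive_age, translated_age


if __name__ == "__main__":
    # Hypothetical scenario: the process was checkpointed when its clock read
    # 1006 s and is restored on a host that booted 6 s ago.
    host_now = 6 * NSEC_PER_SEC
    offset = offset_for_restore(1006 * NSEC_PER_SEC, host_now)

    # An input event stamped by the driver 1 s ago, in raw host time.
    naive, translated = event_ages(host_now, 5 * NSEC_PER_SEC, offset)
    print("naive age (s):     ", naive // NSEC_PER_SEC)
    print("translated age (s):", translated // NSEC_PER_SEC)

    # Reading the real host clock (Unix only) and applying the same offset.
    real_host_now = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    print("namespace now (ns):", to_namespace_time(real_host_now, offset))
```

In this model the untranslated timestamp is wrong by exactly the
namespace offset, so an interface that keeps reporting raw host times
hands the application values on a different timeline from the clock it
reads itself.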