References: <20200317083043.226593-1-areber@redhat.com>
In-Reply-To: <20200317083043.226593-1-areber@redhat.com>
From: Arnd Bergmann
Date: Wed, 18 Mar 2020 11:18:53 +0100
Subject: Re: clone3: allow creation of time namespace with offset
To: Adrian Reber
Cc: Christian Brauner, Eric Biederman, Pavel Emelyanov, Oleg Nesterov,
 Dmitry Safonov <0x7f454c46@gmail.com>, Andrei Vagin,
 "linux-kernel@vger.kernel.org", Mike Rapoport, Radostin Stoyanov,
 Michael Kerrisk, Cyrill Gorcunov, Thomas Gleixner
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 17, 2020 at 9:32 AM Adrian Reber wrote:
>
> This is an attempt to add time namespace support to clone3(). I am not
> really sure which way clone3() should handle time namespaces. The time
> namespace through /proc cannot be used with clone3() because the offsets
> for the time namespace need to be written before a process has been
> created in that time namespace. This means it is necessary to somehow
> tell clone3() the offsets for the clocks.
>
> The time namespace offers the possibility to set offsets for
> CLOCK_MONOTONIC and CLOCK_BOOTTIME.
> My first approach was to extend
> 'struct clone_args' with '__aligned_u64 monotonic_offset' and
> '__aligned_u64 boottime_offset'. The problem with this approach was that
> it was not possible to set nanoseconds for the clocks in the time
> namespace.
>
> One of the motivations for clone3() with CLONE_NEWTIME was to enable
> CRIU to restore a process in a time namespace with the corresponding
> offsets. And although the nanosecond value can probably never be
> restored to the same value it had during checkpointing, because the
> clock keeps on running between CRIU pausing all processes and CRIU
> actually reading the value of the clocks, the nanosecond value is still
> necessary for CRIU to not restore a process where the clock jumps back
> due to CRIU restoring it with a nanosecond value that is too small.
>
> Requiring nanoseconds as well as seconds for two clocks during clone3()
> means that it would require 4 additional members to 'struct clone_args':
>
>         __aligned_u64 tls;
>         __aligned_u64 set_tid;
>         __aligned_u64 set_tid_size;
> +       __aligned_u64 boottime_offset_seconds;
> +       __aligned_u64 boottime_offset_nanoseconds;
> +       __aligned_u64 monotonic_offset_seconds;
> +       __aligned_u64 monotonic_offset_nanoseconds;
>  };

Wouldn't it be sufficient to have the two nanosecond values, rather
than both seconds and nanoseconds? With 64-bit nanoseconds you can
represent several hundred years, and these clocks always start at
zero during boot.

Regardless of this, I think you need a signed offset, not unsigned.

       Arnd