From: Thomas Gleixner
To: Pavel Tikhomirov, Andrey Vagin
Cc: Frederic Weisbecker, LKML, Anna-Maria Behnsen, Peter Zijlstra, syzbot+5c54bd3eb218bb595aa9@syzkaller.appspotmail.com, Dmitry Vyukov, Sebastian Siewior, Michael Kerrisk, Christian Brauner, Alexander Mikhalitsyn, Pavel Emelyanov
Subject: Re: [RFD] posix-timers: CRIU woes
References: <20230425181827.219128101@linutronix.de> <20230425183312.932345089@linutronix.de> <87zg6i2xn3.ffs@tglx> <87v8h62vwp.ffs@tglx> <878rdy32ri.ffs@tglx> <87v8h126p2.ffs@tglx> <875y911xeg.ffs@tglx> <87ednpyyeo.ffs@tglx> <87r0rnciqo.ffs@tglx>
Date: Thu, 11 May 2023 15:42:10 +0200
Message-ID: <87ilczc7d9.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 11 2023 at 17:52, Pavel Tikhomirov wrote:
> On 11.05.2023 17:36, Thomas Gleixner wrote:
>> On Thu, May 11 2023 at 11:17, Pavel Tikhomirov wrote:
>>> On 10.05.2023 16:16, Andrey Vagin wrote:
>>>>>
>>>>> So because of that half thought out user space ABI we are now up the
>>>>> regression creek without a paddle, unless CRIU can accomodate to a
>>>>> different restore mechanism to lift this restriction from the kernel.
>>>>>
>>>> If you give us a new API to create timers with specified id-s, we will
>>>> figure out how to live with it. It isn't good to ask users to update
>>>> CRIU to work on new kernels, but here are reasons and event improvements
>>>> for CRIU, so I think it's worth it.
>>>
>>> I agree, any API to create timers with specified id-s would work for new
>>> CRIU versions.
>>
>> The real question is whether this will cause any upheaval when a new
>> kernel meets a non-updated CRIU stack.
>
> Creation of posix timer would hang forever in this loop
> https://github.com/checkpoint-restore/criu/blob/33dd66c6fc93c47213aaa0447a94d97ba1fa56ba/criu/pie/restorer.c#L1185
> if old criu is run on new kernel (without consecutive id allocation) AFAICS.

Yes, because that "sanity" check

     if ((long)next_id > args->posix_timers[i].spt.it_id)

which tries to establish whether the kernel provides timer IDs in strict
increasing order does not work for that case.

It "works" to detect the IDR case on older kernels by chance, but not
under all circumstances. Assume the following case:

      Global IDR has a free slot at index 1

      Restore tries to create a timer for index 2

That will also loop forever, unless some other process creates a timer
and occupies the free slot at index 1, right?

So this needs a fix anyway, which should be done so that the new kernel
case is at least properly detected.

But even then there is still the problem of "it worked before I upgraded
the kernel".

IOW, we are still up a creek without a paddle, unless you would be
willing to utilize the existing CRIU bug to distribute the 'deal with
new kernel' mechanics as a bug bounty :)

Fix for the loop termination below.

Thanks,

        tglx
---
 criu/pie/restorer.c |   24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

--- a/criu/pie/restorer.c
+++ b/criu/pie/restorer.c
@@ -1169,10 +1169,10 @@ static int timerfd_arm(struct task_resto
 static int create_posix_timers(struct task_restore_args *args)
 {
 	int ret, i;
-	kernel_timer_t next_id;
+	kernel_timer_t next_id, timer_id;
 	struct sigevent sev;
 
-	for (i = 0; i < args->posix_timers_n; i++) {
+	for (i = 0, next_id = 0; i < args->posix_timers_n; i++) {
 		sev.sigev_notify = args->posix_timers[i].spt.it_sigev_notify;
 		sev.sigev_signo = args->posix_timers[i].spt.si_signo;
 #ifdef __GLIBC__
@@ -1183,25 +1183,27 @@ static int create_posix_timers(struct ta
 		sev.sigev_value.sival_ptr = args->posix_timers[i].spt.sival_ptr;
 
 		while (1) {
-			ret = sys_timer_create(args->posix_timers[i].spt.clock_id, &sev, &next_id);
+			ret = sys_timer_create(args->posix_timers[i].spt.clock_id, &sev, &timer_id);
 			if (ret < 0) {
 				pr_err("Can't create posix timer - %d\n", i);
 				return ret;
 			}
 
-			if (next_id == args->posix_timers[i].spt.it_id)
+			if (timer_id != next_id) {
+				pr_err("Can't create timers, kernel don't give them consequently\n");
+				return -1;
+			}
+
+			next_id++;
+
+			if (timer_id == args->posix_timers[i].spt.it_id)
 				break;
 
-			ret = sys_timer_delete(next_id);
+			ret = sys_timer_delete(timer_id);
 			if (ret < 0) {
-				pr_err("Can't remove temporaty posix timer 0x%x\n", next_id);
+				pr_err("Can't remove temporaty posix timer 0x%x\n", timer_id);
 				return ret;
 			}
-
-			if ((long)next_id > args->posix_timers[i].spt.it_id) {
-				pr_err("Can't create timers, kernel don't give them consequently\n");
-				return -1;
-			}
 		}
 	}
 
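
The termination argument of the patch can be checked outside the kernel
with a small user-space model. The sketch below is not CRIU code and makes
no system calls: fake_kernel, fake_timer_create(), fake_timer_delete(),
restore_old() and restore_fixed() are hypothetical names, and the two
allocation modes are simplifying assumptions (a per-process counter that
never reuses IDs versus a lowest-free-slot allocator standing in for the
IDR). The old loop carries an iteration cap only so that the model can
report the hang instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS           64
#define ERR_NOT_CONSECUTIVE (-1)	/* loop detected a non-consecutive allocator */
#define ERR_WOULD_SPIN      (-2)	/* model gave up; the real loop would hang */
#define ERR_NO_SLOT         (-3)	/* model allocator ran out of slots */

/* Hypothetical stand-in for the kernel's timer ID allocation policy. */
enum alloc_mode { ALLOC_CONSECUTIVE, ALLOC_LOWEST_FREE };

struct fake_kernel {
	enum alloc_mode mode;
	int next;		/* next ID handed out in consecutive mode */
	bool used[MAX_SLOTS];	/* occupied slots in lowest-free mode */
};

static int fake_timer_create(struct fake_kernel *k)
{
	if (k->mode == ALLOC_CONSECUTIVE)
		return k->next++;	/* IDs are never reused */

	for (int id = 0; id < MAX_SLOTS; id++) {
		if (!k->used[id]) {
			k->used[id] = true;
			return id;	/* lowest free slot wins */
		}
	}
	return ERR_NO_SLOT;
}

static void fake_timer_delete(struct fake_kernel *k, int id)
{
	if (k->mode == ALLOC_LOWEST_FREE) {
		assert(id >= 0 && id < MAX_SLOTS);
		k->used[id] = false;	/* slot becomes reusable at once */
	}
}

/* Old check: bail out only when the returned ID overshoots the wanted one. */
static int restore_old(struct fake_kernel *k, int wanted_id, int max_iter)
{
	for (int n = 1; n <= max_iter; n++) {
		int id = fake_timer_create(k);

		if (id < 0)
			return id;
		if (id == wanted_id)
			return n;	/* number of create calls needed */
		fake_timer_delete(k, id);
		if (id > wanted_id)
			return ERR_NOT_CONSECUTIVE;
	}
	return ERR_WOULD_SPIN;
}

/* Patched check: every returned ID must equal the expected next ID. */
static int restore_fixed(struct fake_kernel *k, int wanted_id)
{
	int next_id = 0;

	assert(wanted_id >= 0);
	for (;;) {
		int id = fake_timer_create(k);

		if (id < 0)
			return id;
		if (id != next_id)
			return ERR_NOT_CONSECUTIVE;
		next_id++;
		if (id == wanted_id)
			return next_id;	/* number of create calls needed */
		fake_timer_delete(k, id);
	}
}
```

With slot 0 occupied and slot 1 free (the case described above),
restore_old(&k, 2, 1000) keeps getting ID 1 back and runs into the cap,
while restore_fixed(&k, 2) returns ERR_NOT_CONSECUTIVE on the first
iteration. On the consecutive model both variants reach ID 2 after three
create calls.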