Date: Mon, 18 Jan 2021 18:25:41 +0100
From: Piotr Figiel
To: Mathieu Desnoyers
Cc: Peter Zijlstra, paulmck, Boqun Feng, Alexey Dobriyan,
    "Eric W. Biederman", Andrew Morton, Kees Cook, Alexey Gladkov,
    Christian Brauner, Michel Lespinasse, Bernd Edlinger, Andrei Vagin,
    linux-kernel, linux-fsdevel, Peter Oskolkov, Kamil Yurtsever,
    Chris Kennelly, Paul Turner
Subject: Re: [PATCH v2] fs/proc: Expose RSEQ configuration
References: <20210114185445.996-1-figiel@google.com>
    <1530232798.13459.1610725460826.JavaMail.zimbra@efficios.com>
In-Reply-To: <1530232798.13459.1610725460826.JavaMail.zimbra@efficios.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

thanks for the review.

On Fri, Jan 15, 2021 at 10:44:20AM -0500, Mathieu Desnoyers wrote:
> ----- On Jan 14, 2021, at 1:54 PM, Piotr Figiel figiel@google.com wrote:
>
> Added PeterZ, Paul and Boqun to CC. They are also listed as
> maintainers of rseq. Please CC them in your next round of patches.

OK.

> > Since C/R preserves TLS memory and addresses RSEQ ABI will be
> > restored using the address registered before C/R.
>
> How do you plan to re-register the rseq TLS for each thread upon
> restore ?

In the CRIU restorer there is a point where code runs on behalf of the
restored thread after its memory has been restored but before control is
passed to the application code. I'm going to call the rseq() syscall
there with the checkpointed values of the ABI address and signature
(obtained via the newly added procfs file).

> I suspect you move the return IP to the abort either at checkpoint or
> restore if you detect that the thread is running in a rseq critical
> section.

Actually, in the prototype implementation I use PTRACE_SINGLESTEP during
checkpointing (with some safeguards) to force the kernel to jump out of
the critical section before the register values are fetched. The
drawback is that the first instruction of the abort handler gets
executed during checkpointing. I'll likely rework it to update the
instruction pointer directly, getting the abort address with
PTRACE_PEEKTEXT (via the RSEQ ABI).

I think one option is a kernel interface that triggers the abort at
userspace's request, without the need for such hacks. This could be a
ptrace extension. Alternatively, attaching could trigger the RSEQ logic,
but this is potentially a breaking change for debuggers.

> > Detection whether the thread is in a critical section during C/R is
> > needed to enforce behavior of RSEQ abort during C/R. Attaching with
> > ptrace() before registers are dumped itself doesn't cause RSEQ
> > abort.
>
> Right, because the RSEQ abort is only done when going back to
> user-space, and AFAIU the checkpointed process will cease to exist,
> and won't go back to user-space, therefore bypassing any RSEQ abort.

The checkpointed process doesn't have to cease to exist; it can actually
continue, and when it's unpaused the kernel will schedule it and should
call the abort handler for the RSEQ CS. But this happens on the
checkpointing side, after the process state has already been
checkpointed. For C/R it is important that the checkpoint (the
serialized process state) is safe wrt RSEQ.

> > Restoring the instruction pointer within the critical section is
> > problematic because rseq_cs may get cleared before the control is
> > passed to the migrated application code leading to RSEQ invariants
> > not being preserved.
>
> The commit message should state that both the per-thread rseq TLS area
> address and the signature are dumped within this new proc file.

I'll include this in v3, thanks.

> AFAIU lock_trace prevents concurrent exec() from modifying the task's
> content. What prevents a concurrent rseq register/unregister to be
> executed concurrently with proc_pid_rseq ?

Yes, in its current form only ptrace prevents this, as that was the
intended use case. Do you think it would make sense to add a mutex to
task_struct so that a casual reader (a sysadmin, perhaps) always sees
consistent values? It would be taken only here and in the syscall when
the registration is set.
(Alternatively, an SMP barrier could be used to enforce the ordering so
that the ABI address is always written first, and the signature wouldn't
be meaningful when the ABI address is 0, but simply being consistent is
probably better.)

> I wonder if all those parentheses are needed. Wouldn't it be enough to have:

Will remove, thanks.

Best regards,
Piotr.
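A minimal sketch of the restore step described in the reply (re-registering each thread with the rseq() syscall using the checkpointed ABI address and signature), assuming the checkpoint also recorded the registered length. The helper name `restore_rseq_registration` is hypothetical, not CRIU code; it issues the raw system call via syscall(2). It has to run on the restored thread itself, because rseq() registers the area for the calling thread only, and the length must be what the kernel expects (sizeof(struct rseq), 32 bytes for the original ABI).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical restorer helper: re-register the calling thread's rseq
 * area with the values dumped at checkpoint time.
 * Returns 0 on success (or when nothing was registered), -1 with errno set
 * on failure.
 */
static long restore_rseq_registration(uint64_t abi_addr, uint32_t abi_len,
                                      uint32_t sig)
{
    if (abi_addr == 0)
        return 0; /* the thread had no rseq area registered */
#ifdef __NR_rseq
    /* rseq(rseq, rseq_len, flags, sig); flags == 0 means "register". */
    return syscall(__NR_rseq, (void *)(uintptr_t)abi_addr,
                   (unsigned long)abi_len, 0UL, (unsigned long)sig);
#else
    /* Kernel headers too old to know the rseq syscall number. */
    (void)abi_len;
    (void)sig;
    errno = ENOSYS;
    return -1;
#endif
}
```

Registration fails (for example with EINVAL or EBUSY) if the calling thread already has an rseq area registered, so in this sketch the restorer would need to run before anything else in that thread registers one.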
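The instruction-pointer rework mentioned in the reply (emulating the abort instead of single-stepping) comes down to the range check the kernel applies on return to user space: an IP inside [start_ip, start_ip + post_commit_offset) is moved to abort_ip. Below is a sketch in C, assuming the checkpointer has already copied the active struct rseq_cs out of the tracee (for example word by word with PTRACE_PEEKTEXT, following the rseq_cs pointer in the thread's struct rseq). `struct rseq_cs_mirror` is a local stand-in for the <linux/rseq.h> layout and the function names are hypothetical. The sketch omits checks the kernel also performs, such as validating the signature stored just before abort_ip and honoring the RSEQ_CS_FLAG_NO_RESTART_ON_* flags.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the field layout of struct rseq_cs in
 * <linux/rseq.h>, so this sketch builds without kernel headers. */
struct rseq_cs_mirror {
    uint32_t version;
    uint32_t flags;
    uint64_t start_ip;
    uint64_t post_commit_offset;
    uint64_t abort_ip;
};

/* True if ip is in [start_ip, start_ip + post_commit_offset). For
 * ip < start_ip the unsigned subtraction wraps to a huge value, so a
 * single comparison covers both bounds. */
static bool rseq_ip_in_cs(const struct rseq_cs_mirror *cs, uint64_t ip)
{
    return ip - cs->start_ip < cs->post_commit_offset;
}

/* Hypothetical checkpoint-side fixup. cs is the descriptor copied out of
 * the tracee, or NULL if the thread's rseq_cs pointer was 0 (no critical
 * section active). Returns the IP to store in the checkpoint image. */
static uint64_t rseq_checkpoint_ip(const struct rseq_cs_mirror *cs,
                                   uint64_t ip)
{
    if (cs != NULL && rseq_ip_in_cs(cs, ip))
        return cs->abort_ip; /* emulate the abort the kernel would do */
    return ip;
}
```

When the kernel performs the abort itself, it also clears the rseq_cs field of the thread's struct rseq; a checkpointer emulating the abort would presumably want the saved memory image to reflect that as well.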
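For the consistency question in the reply, the mutex option can be modelled in user space: the writer (standing in for the rseq() syscall) and the reader (standing in for the procfs show function) take the same lock, so a reader can never observe the address of one registration paired with the signature of another. This is only an illustration with hypothetical names; an in-kernel version would put the lock in task_struct as suggested.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-task registration state guarded by a single mutex. */
struct rseq_reg {
    pthread_mutex_t lock;
    uint64_t abi_addr; /* 0 when unregistered */
    uint32_t sig;
};

/* Writer side: update address and signature as one unit. */
static void rseq_reg_set(struct rseq_reg *r, uint64_t abi_addr, uint32_t sig)
{
    pthread_mutex_lock(&r->lock);
    r->abi_addr = abi_addr;
    r->sig = sig;
    pthread_mutex_unlock(&r->lock);
}

/* Unregister: clear both fields under the same lock. */
static void rseq_reg_clear(struct rseq_reg *r)
{
    rseq_reg_set(r, 0, 0);
}

/* Reader side: copy out a consistent (address, signature) pair. */
static void rseq_reg_snapshot(struct rseq_reg *r, uint64_t *abi_addr,
                              uint32_t *sig)
{
    pthread_mutex_lock(&r->lock);
    *abi_addr = r->abi_addr;
    *sig = r->sig;
    pthread_mutex_unlock(&r->lock);
}
```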