From: Andy Lutomirski
To: Alex Belits
Cc: frederic@kernel.org, rostedt@goodmis.org, Prasun Kapoor, mingo@kernel.org, davem@davemloft.net, linux-api@vger.kernel.org, peterz@infradead.org, linux-arch@vger.kernel.org, catalin.marinas@arm.com, tglx@linutronix.de, will@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH v3 04/13] task_isolation: userspace hard isolation from kernel
Date: Thu, 9 Apr 2020 11:00:16 -0700

> On Apr 9, 2020, at 8:21 AM, Alex Belits wrote:
>
> The existing nohz_full mode is designed as a "soft" isolation mode
> that makes tradeoffs
> to minimize userspace interruptions while
> still attempting to avoid overheads in the kernel entry/exit path,
> to provide 100% kernel semantics, etc.
>
> However, some applications require a "hard" commitment from the
> kernel to avoid interruptions, in particular userspace device driver
> style applications, such as high-speed networking code.
>
> This change introduces a framework to allow applications
> to elect to have the "hard" semantics as needed, specifying
> prctl(PR_TASK_ISOLATION, PR_TASK_ISOLATION_ENABLE) to do so.
>
> The kernel must be built with the new TASK_ISOLATION Kconfig flag
> to enable this mode, and the kernel booted with an appropriate
> "isolcpus=nohz,domain,CPULIST" boot argument to enable
> nohz_full and isolcpus. The "task_isolation" state is then indicated
> by setting a new task struct field, task_isolation_flag, to the
> value passed by prctl(), and also setting a TIF_TASK_ISOLATION
> bit in the thread_info flags. When the kernel is returning to
> userspace from the prctl() call and sees TIF_TASK_ISOLATION set,
> it calls the new task_isolation_start() routine to arrange for
> the task to avoid being interrupted in the future.
>
> With interrupts disabled, task_isolation_start() ensures that kernel
> subsystems that might cause a future interrupt are quiesced. If it
> doesn't succeed, it adjusts the syscall return value to indicate that
> fact, and userspace can retry as desired. In addition to stopping
> the scheduler tick, the code takes any actions that might avoid
> a future interrupt to the core, such as a worker thread being
> scheduled that could be quiesced now (e.g. the vmstat worker)
> or a future IPI to the core to clean up some state that could be
> cleaned up now (e.g. the mm lru per-cpu cache).
>
> Once the task has returned to userspace after issuing the prctl(),
> if it enters the kernel again via system call, page fault, or any
> other exception or irq, the kernel will kill it with SIGKILL.
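For illustration only, a minimal userspace sketch of the flow the quoted description implies: pin the thread to a CPU from the isolcpus list, then call prctl(PR_TASK_ISOLATION, PR_TASK_ISOLATION_ENABLE) and retry if quiescing fails. PR_TASK_ISOLATION and PR_TASK_ISOLATION_ENABLE exist only in the patched uapi headers of this series, so without them the sketch compiles to a stub that fails with ENOSYS. The helper names (pin_to_cpu, try_enable_isolation, enable_isolation_retry) are hypothetical, and treating EAGAIN as the "try again" error is an assumption; the quoted text only says the syscall return value is adjusted.

```c
#define _GNU_SOURCE            /* for cpu_set_t and sched_setaffinity() */
#include <errno.h>
#include <sched.h>
#include <sys/prctl.h>

/* Pin the calling thread (pid 0) to one CPU, assumed to be one of the
 * CPUs listed in the isolcpus=nohz,domain,CPULIST boot argument. */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

/* One attempt to enter task isolation. The PR_TASK_ISOLATION* constants
 * are assumed to come from the patched headers of this series; on a
 * stock system they are undefined and the attempt fails with ENOSYS. */
int try_enable_isolation(void)
{
#if defined(PR_TASK_ISOLATION) && defined(PR_TASK_ISOLATION_ENABLE)
    return prctl(PR_TASK_ISOLATION, (unsigned long)PR_TASK_ISOLATION_ENABLE,
                 0UL, 0UL, 0UL);
#else
    errno = ENOSYS;
    return -1;
#endif
}

/* Retry policy: a failed quiesce is reported through the return value.
 * ASSUMPTION: EAGAIN marks a transient failure worth retrying; any other
 * errno is treated as permanent and returned to the caller immediately. */
int enable_isolation_retry(int (*attempt)(void), int max_attempts)
{
    for (int i = 0; i < max_attempts; i++) {
        if (attempt() == 0)
            return 0;          /* isolated: avoid entering the kernel now */
        if (errno != EAGAIN)
            return -1;         /* permanent failure, errno left as-is */
    }
    errno = EAGAIN;            /* attempts exhausted (or max_attempts <= 0) */
    return -1;
}
```

A caller would pin first, e.g. pin_to_cpu(3) for an isolated CPU 3, then call enable_isolation_retry(try_enable_isolation, 10). After success, anything that enters the kernel (a printf(), a page fault on memory not yet touched) would, under the semantics quoted above, end in SIGKILL, so all setup has to happen before the prctl().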

I could easily imagine myself using task isolation, but not with the SIGKILL semantics. SIGKILL causes data loss. Please at least let users choose what signal to send.
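The data-loss point follows from SIGKILL being uncatchable: sigaction() rejects it with EINVAL, so an isolated task gets no chance to flush or checkpoint anything. A catchable signal such as SIGUSR1 would let a handler record the violation first. A minimal sketch, assuming a hypothetical variant of the interface in which the user selects the signal (the patch as quoted always sends SIGKILL); on_isolation_violation() and install_isolation_handler() are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Set by the handler; the main loop would poll it and save state. */
static volatile sig_atomic_t isolation_broken;

/* Only async-signal-safe work here: record that isolation was broken. */
static void on_isolation_violation(int sig)
{
    (void)sig;
    isolation_broken = 1;
}

/* Install the handler for sig. Returns 0 on success, or -1 with errno
 * set; for SIGKILL (and SIGSTOP) sigaction() always fails with EINVAL. */
int install_isolation_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_isolation_violation;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}
```

The handler only sets a sig_atomic_t flag because most I/O functions are not async-signal-safe; the actual flushing would happen in normal context once the flag is observed.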