X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com>
From: Pasha Tatashin
Date: Mon, 11 Mar 2024 15:55:06 -0400
Subject: Re: [RFC 00/14] Dynamic Kernel Stacks
To: Mateusz Guzik
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de, brauner@kernel.org,
 bristot@redhat.com, bsegall@google.com, dave.hansen@linux.intel.com,
 dianders@chromium.org, dietmar.eggemann@arm.com, hca@linux.ibm.com,
 hch@infradead.org, hpa@zytor.com, jacob.jun.pan@linux.intel.com,
 jgg@ziepe.ca, jpoimboe@kernel.org, jroedel@suse.de, juri.lelli@redhat.com,
 kent.overstreet@linux.dev, kinseyho@google.com,
 kirill.shutemov@linux.intel.com, lstoakes@gmail.com, luto@kernel.org,
 mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com,
 mingo@redhat.com, mst@redhat.com, npiggin@gmail.com, peterz@infradead.org,
 pmladek@suse.com, rick.p.edgecombe@intel.com, rostedt@goodmis.org,
 surenb@google.com, tglx@linutronix.de, urezki@gmail.com,
 vincent.guittot@linaro.org, vschneid@redhat.com

On Mon, Mar 11, 2024 at 3:21 PM Mateusz Guzik wrote:
>
> On 3/11/24, Pasha Tatashin wrote:
> > On Mon, Mar 11, 2024 at 1:09 PM Mateusz Guzik wrote:
> >> 1. what about faults when the thread holds a bunch of arbitrary locks
> >> or has preemption disabled? is the allocation lockless?
> >
> > Each thread has a stack with 4 pages.
> > Pre-allocated page: This page is always allocated and mapped at thread
> > creation.
> > Dynamic pages (3): These pages are mapped dynamically upon stack faults.
> >
> > A per-CPU data structure holds 3 dynamic pages for each CPU. These
> > pages are used to handle stack faults occurring when a running thread
> > faults (even within interrupt-disabled contexts). Typically, only one
> > page is needed, but in the rare case where the thread accesses beyond
> > that, we might use up to all three pages in a single fault. This
> > structure allows for atomic handling of stack faults, preventing
> > conflicts from other processes. Additionally, the thread's 16K-aligned
> > virtual address (VA) and guaranteed pre-allocated page means no page
> > table allocation is required during the fault.
> >
> > When a thread leaves the CPU in normal kernel mode, we check a flag to
> > see if it has experienced stack faults. If so, we charge the thread
> > for the new stack pages and refill the per-CPU data structure with any
> > missing pages.
> >
>
> So this also has to happen if the thread holds a bunch of arbitrary
> semaphores and goes off cpu with them? Anyhow, see below.

Yes, this is alright: if a thread is allowed to sleep, it should not hold
any alloc_pages() locks.

> >> 2. what happens if there is no memory from which to map extra pages in
> >> the first place? you may be in position where you can't go off cpu
> >
> > When the per-CPU data structure cannot be refilled, and a new thread
> > faults, we issue a message indicating a critical stack fault. This
> > triggers a system-wide panic similar to a guard page access violation
> >
>
> OOM handling is fundamentally what I was worried about. I'm confident
> this failure mode makes the feature unsuitable for general-purpose
> deployments.

The primary goal of this series is to enhance system safety, not to
introduce additional risks. Memory saving is a welcome side effect.
Please see below for explanations.

>
> Now, I have no vote here, it may be this is perfectly fine as an
> optional feature, which it is in your patchset. However, if this is to
> go in, the option description definitely needs a big fat warning about
> possible panics if enabled.
>
> I fully agree something(tm) should be done about stacks and the
> current usage is a massive bummer. I wonder if things would be ok if
> they shrinked to just 12K? Perhaps that would provide big enough

The current split of 1 pre-allocated page and 3 dynamic pages is just
WIP; we could just as well change it to 2 pre-allocated and 2 dynamic
pages, or 3/1, etc.

At Google, we still use 8K stacks (we did not move to 16K when upstream
did in 2014) and are only now encountering extreme cases where the 8K
limit is reached.
Consequently, we plan to increase the limit to 16K. Dynamic Kernel Stacks
allow us to keep an 8K pre-allocated stack while handling page faults only
in exceptionally rare circumstances.

Another option is to increase THREAD_SIZE to 32K and keep 16K
pre-allocated. This is the same as what upstream has today, but it avoids
guard-page panics, making systems safer for everyone.

Pasha
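
For readers following the thread, here is a minimal user-space sketch of the
bookkeeping described in the quoted text: a per-CPU reserve of dynamic pages
that the fault path draws from without calling an allocator, and a refill
step when the thread leaves the CPU. It is only a model under stated
assumptions, not code from the series. The names cpu_stack_pool,
thread_stack, stack_fault(), stack_switch_out() and the alloc_budget
parameter are all hypothetical, and the 1 pre-allocated / 3 dynamic split is
the WIP setting mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_PAGES    4                      /* 16K stack, 4K pages */
#define PREALLOC_PAGES 1                      /* mapped at thread creation */
#define DYNAMIC_PAGES  (STACK_PAGES - PREALLOC_PAGES)

/* Hypothetical per-CPU reserve used only by the stack fault path. */
struct cpu_stack_pool {
	int free_pages;                       /* 0..DYNAMIC_PAGES */
};

/* Hypothetical per-thread stack state. */
struct thread_stack {
	int mapped_pages;                     /* pages currently backed */
	bool faulted;                         /* checked at switch-out */
};

/*
 * Fault path: back every page up to and including page_index
 * (0 is the pre-allocated page) using only the per-CPU reserve.
 * Returns false when the access is outside the stack or the reserve
 * cannot cover it; the thread describes that case as a panic.
 */
static bool stack_fault(struct cpu_stack_pool *pool,
			struct thread_stack *ts, int page_index)
{
	int needed;

	assert(page_index >= 0);
	if (page_index >= STACK_PAGES)
		return false;                 /* beyond the stack: guard page */

	needed = page_index + 1 - ts->mapped_pages;
	if (needed <= 0)
		return true;                  /* already mapped */
	if (needed > pool->free_pages)
		return false;                 /* reserve exhausted */

	pool->free_pages -= needed;
	ts->mapped_pages += needed;
	ts->faulted = true;
	return true;
}

/*
 * Switch-out path: if the thread faulted, top the reserve back up.
 * alloc_budget models how many pages the allocator can supply now.
 * Returns the number of pages actually added to the reserve.
 */
static int stack_switch_out(struct cpu_stack_pool *pool,
			    struct thread_stack *ts, int alloc_budget)
{
	int missing, got;

	if (!ts->faulted)
		return 0;
	ts->faulted = false;

	missing = DYNAMIC_PAGES - pool->free_pages;
	got = missing < alloc_budget ? missing : alloc_budget;
	pool->free_pages += got;
	return got;
}
```

In this model, a fault that needs more pages than the reserve holds is the
failure mode raised in question 2: it can only happen if an earlier refill
came up short (alloc_budget smaller than the number of missing pages).
Changing PREALLOC_PAGES to 2 or 3 gives the 2/2 and 3/1 splits.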