From: Pasha Tatashin
Date: Thu, 14 Mar 2024 10:03:55 -0400
Subject: Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks
To: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 akpm@linux-foundation.org, x86@kernel.org, bp@alien8.de,
 brauner@kernel.org, bristot@redhat.com, bsegall@google.com,
 dave.hansen@linux.intel.com, dianders@chromium.org,
 dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com,
 hch@infradead.org, hpa@zytor.com, jacob.jun.pan@linux.intel.com,
 jgg@ziepe.ca, jpoimboe@kernel.org, jroedel@suse.de,
 juri.lelli@redhat.com, kent.overstreet@linux.dev, kinseyho@google.com,
 kirill.shutemov@linux.intel.com, lstoakes@gmail.com, luto@kernel.org,
 mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com,
 mingo@redhat.com, mjguzik@gmail.com, mst@redhat.com, npiggin@gmail.com,
 peterz@infradead.org, pmladek@suse.com, rick.p.edgecombe@intel.com,
 rostedt@goodmis.org, surenb@google.com, urezki@gmail.com,
 vincent.guittot@linaro.org, vschneid@redhat.com

On Wed, Mar 13, 2024 at 12:12 PM Thomas Gleixner wrote:
>
> On Wed, Mar 13 2024 at 11:28, Pasha Tatashin wrote:
> > On Wed, Mar 13, 2024 at 9:43 AM Pasha Tatashin
> > wrote:
> >> Here's a potential solution that is fast, avoids locking, and ensures atomicity:
> >>
> >> 1. Kernel Stack VA Space
> >> Dedicate a virtual address range ([KSTACK_START_VA - KSTACK_END_VA])
> >> exclusively for kernel stacks. This simplifies validation of faulting
> >> addresses to be part of a stack.
> >>
> >> 2. Finding the faulty task
> >> - Use ALIGN(fault_address, THREAD_SIZE) to calculate the end of the
> >> topmost stack page (since stack addresses are aligned to THREAD_SIZE).
> >> - Store the task_struct pointer as the last word on this topmost page,
> >> that is always present as it is a pre-allocated stack page.
> >>
> >> 3. Stack Padding
> >> Increase padding to 8 bytes on x86_64 (TOP_OF_KERNEL_STACK_PADDING 8)
> >> to accommodate the task_struct pointer.
> >
> > Alternatively, do not even look-up the task_struct in
> > dynamic_stack_fault(), but only install the mapping to the faulting
> > address, store va in the per-cpu array, and handle the rest in
> > dynamic_stack() during context switching. At that time spin locks can
> > be taken, and we can do a find_vm_area(addr) call.
> >
> > This way, we would not need to modify TOP_OF_KERNEL_STACK_PADDING to
> > keep task_struct in there.
>
> Why not simply doing the 'current' update right next to the stack
> switching in __switch_to_asm() which has no way of faulting.
>
> That needs to validate whether anything uses current between the stack
> switch and the place where current is updated today. I think nothing
> should do so, but I would not be surprised either if it would be the
> case. Such code would already today just work by chance I think,
>
> That should not be hard to analyze and fixup if necessary.
>
> So that's fixable, but I'm not really convinced that all of this is safe
> and correct under all circumstances. That needs a lot more analysis than
> just the trivial one I did for switch_to().

Agreed, if the current task pointer can be switched later, after loads
and stores to the stack, that would be a better solution. I will
incorporate this approach into my next version.

I also concur that this proposal necessitates more rigorous analysis.
This work remains in the investigative phase, where I am seeking a
viable solution to the problem.

The core issue is that kernel stacks consume excessive memory for
certain workloads. However, we cannot simply reduce their size, as
this leads to machine crashes in the infrequent instances where stacks
do run deep.

Thanks,
Pasha
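
For reference, the lookup described in step 2 of the quoted proposal
(round the faulting address up to THREAD_SIZE, then read the owner
pointer from the last word of the topmost page) can be sketched in
user-space C roughly as below. This is only an illustration and not the
code from the series: the THREAD_SIZE value, struct task, ALIGN_UP(),
owner_slot(), task_from_fault_addr() and demo_lookup() are all
assumptions made up for the example, and aligned_alloc() stands in for
a THREAD_SIZE-aligned kernel stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed for illustration: 16 KiB stacks. The real THREAD_SIZE
 * depends on the architecture and kernel configuration. */
#define THREAD_SIZE 16384UL

/* Round x up to a multiple of a (a must be a power of two); this is
 * the same rounding the kernel's ALIGN() macro performs. */
#define ALIGN_UP(x, a) \
	(((uintptr_t)(x) + ((uintptr_t)(a) - 1)) & ~((uintptr_t)(a) - 1))

/* Hypothetical stand-in for the kernel's task_struct. */
struct task {
	int pid;
};

/* Location of the owner pointer: the last word below the stack top. */
struct task **owner_slot(uintptr_t stack_top)
{
	return (struct task **)(stack_top - sizeof(struct task *));
}

/* Map an address inside a stack to the task recorded at its top. */
struct task *task_from_fault_addr(uintptr_t fault_addr)
{
	uintptr_t stack_top = ALIGN_UP(fault_addr, THREAD_SIZE);

	return *owner_slot(stack_top);
}

/* Build one fake stack, record its owner, and look it up again from
 * an address in the middle of the stack. Returns the pid found, or -1
 * if the allocation failed. */
int demo_lookup(void)
{
	struct task t = { .pid = 42 };
	unsigned char *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
	uintptr_t base;
	int pid;

	if (!stack)
		return -1;
	base = (uintptr_t)stack;

	/* Store the owner in the last word of the topmost page. */
	*owner_slot(base + THREAD_SIZE) = &t;

	/* An address exactly on a THREAD_SIZE boundary rounds to itself,
	 * so the lowest address of a stack does not map to its own top. */
	assert(ALIGN_UP(base, THREAD_SIZE) == base);

	pid = task_from_fault_addr(base + 4096 + 123)->pid;
	free(stack);
	return pid;
}
```

Calling demo_lookup() from a small main() exercises the lookup. The
assert in the demo marks the one case this rounding does not cover: a
fault at the very lowest address of a stack is already aligned, so a
real implementation would presumably need to treat that address
separately.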