From: Pasha Tatashin
Date: Mon, 11 Mar 2024 20:08:16 -0400
Subject: Re: [RFC 11/14] x86: add support for Dynamic Kernel Stacks
To: Andy Lutomirski
Cc: Linux Kernel Mailing List, linux-mm@kvack.org, Andrew Morton, "the arch/x86 maintainers", Borislav Petkov, Christian Brauner, bristot@redhat.com, Ben Segall, Dave Hansen, dianders@chromium.org, dietmar.eggemann@arm.com, eric.devolder@oracle.com, hca@linux.ibm.com, "hch@infradead.org", "H. Peter Anvin", Jacob Pan, Jason Gunthorpe, jpoimboe@kernel.org, Joerg Roedel, juri.lelli@redhat.com, Kent Overstreet, kinseyho@google.com, "Kirill A. Shutemov", lstoakes@gmail.com, mgorman@suse.de, mic@digikod.net, michael.christie@oracle.com, Ingo Molnar, mjguzik@gmail.com, "Michael S. Tsirkin", Nicholas Piggin, "Peter Zijlstra (Intel)", Petr Mladek, Rick P Edgecombe, Steven Rostedt, Suren Baghdasaryan, Thomas Gleixner, Uladzislau Rezki, vincent.guittot@linaro.org, vschneid@redhat.com
In-Reply-To: <1ac305b1-d28f-44f6-88e5-c85d9062f9e8@app.fastmail.com>
References: <20240311164638.2015063-1-pasha.tatashin@soleen.com> <20240311164638.2015063-12-pasha.tatashin@soleen.com> <3e180c07-53db-4acb-a75c-1a33447d81af@app.fastmail.com> <1ac305b1-d28f-44f6-88e5-c85d9062f9e8@app.fastmail.com>

> >> There are some other options: you could pre-map
> >
> > Pre-mapping would be expensive. It would mean pre-mapping the dynamic
> > pages for every scheduled thread, and we'd still need to check the
> > access bit every time a thread leaves the CPU.
>
> That's a write to four consecutive words in memory, with no locking required.

You convinced me; this might not be that bad. At thread creation time
we will save the locations of the thread's unmapped PTEs, and set them
on every schedule. There is a slight increase in scheduling cost, but
perhaps it is not as bad as I initially thought. This approach,
however, makes the dynamic stack feature much safer, and it can easily
be extended to all arches that support access/dirty bit tracking.

> >
> > Dynamic thread faults
> > should be considered rare events and thus shouldn't significantly
> > affect the performance of normal context switch operations. With 8K
> > stacks, we might encounter only 0.00001% of stacks requiring an extra
> > page, and even fewer needing 16K.
>
> Well yes, but if you crash 0.0001% of the time due to the microcode not liking you, you lose. :)
>
> >
> >> Also, I think the whole memory allocation concept in this whole series is a bit odd. Fundamentally, we *can't* block on these stack faults -- we may be in a context where blocking will deadlock. We may be in the page allocator. Panicing due to kernel stack allocation would be very unpleasant.
> >
> > We never block during handling stack faults. There's a per-CPU page
> > pool, guaranteeing availability for the faulting thread. The thread
> > simply takes pages from this per-CPU data structure and refills the
> > pool when leaving the CPU. The faulting routine is efficient,
> > requiring a fixed number of loads without any locks, stalling, or even
> > cmpxchg operations.
>
> You can't block when scheduling, either. What if you can't refill the pool?

Why can't we (I am not a scheduler guy)? IRQs are not yet disabled, so
what prevents us from blocking while the old process has not yet been
removed from the CPU?
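
To make the pre-mapping idea above concrete, here is a rough user-space
model of it: lend per-CPU pool pages to the thread's unmapped stack PTEs
on schedule-in, then check the accessed bit on schedule-out and either
keep or take back each page. This is only a sketch under stated
assumptions, not code from the series. Every name in it (cpu_pool,
thread, stack_sched_in, stack_sched_out, pool_refill, demo_alloc) is
hypothetical, the pte[] array merely stands in for real page-table
words, and TLB flushing, locking and IRQ state are left out entirely.
Only the present (bit 0) and accessed (bit 5) positions follow the x86
PTE layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DYN_PAGES     4            /* hypothetical: dynamic stack pages per thread */
#define PTE_PRESENT   (1ull << 0)  /* x86 present bit */
#define PTE_ACCESSED  (1ull << 5)  /* x86 accessed bit */
#define PFN_SHIFT     12
#define PFN_NONE      0ull         /* sentinel: empty pool slot */

/* Hypothetical per-CPU reserve: one spare page per dynamic stack slot. */
struct cpu_pool {
    uint64_t pfn[DYN_PAGES];
};

/* Hypothetical per-thread state; pte[] stands in for the real page table. */
struct thread {
    uint64_t pte[DYN_PAGES];   /* four consecutive PTE words */
    bool owned[DYN_PAGES];     /* page already belongs to this thread */
};

/* Schedule-in: lend a pool page to every slot the thread does not own. */
static void stack_sched_in(struct cpu_pool *pool, struct thread *t)
{
    for (int i = 0; i < DYN_PAGES; i++) {
        if (t->owned[i] || pool->pfn[i] == PFN_NONE)
            continue;
        /* Unowned slots must be unmapped while the thread is off-CPU. */
        assert(!(t->pte[i] & PTE_PRESENT));
        t->pte[i] = (pool->pfn[i] << PFN_SHIFT) | PTE_PRESENT;
    }
}

/*
 * Schedule-out: a lent page whose accessed bit is set now belongs to the
 * thread; an untouched page is unmapped and stays in the pool.
 * Returns the number of pool pages the thread consumed.
 */
static int stack_sched_out(struct cpu_pool *pool, struct thread *t)
{
    int used = 0;

    for (int i = 0; i < DYN_PAGES; i++) {
        if (t->owned[i] || !(t->pte[i] & PTE_PRESENT))
            continue;
        if (t->pte[i] & PTE_ACCESSED) {
            t->owned[i] = true;
            pool->pfn[i] = PFN_NONE;
            used++;
        } else {
            t->pte[i] = 0;     /* a real kernel would also need a TLB flush */
        }
    }
    return used;
}

/*
 * Refill empty slots from a caller-supplied allocator that may fail by
 * returning PFN_NONE. Returns the number of slots still empty.
 */
static int pool_refill(struct cpu_pool *pool, uint64_t (*alloc)(void))
{
    int missing = 0;

    for (int i = 0; i < DYN_PAGES; i++) {
        if (pool->pfn[i] == PFN_NONE)
            pool->pfn[i] = alloc();
        if (pool->pfn[i] == PFN_NONE)
            missing++;
    }
    return missing;
}

/* Hypothetical allocator for experiments: hands out increasing PFNs. */
static uint64_t demo_alloc(void)
{
    static uint64_t next = 100;
    return next++;
}
```

In this model the open question from the thread shows up as a nonzero
return value from pool_refill(): the sketch simply leaves the slot
empty, so the next thread scheduled on that CPU gets fewer pre-mapped
pages. What a real implementation should do in that case is exactly
what is still being discussed here.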