From: Andy Lutomirski
Date: Fri, 1 Oct 2021 11:48:48 -0700
Subject: Re: [patch 4/5] sched: Delay task stack freeing on RT
To: Thomas Gleixner
Cc: Peter Zijlstra, LKML, Ingo Molnar, Sebastian Andrzej Siewior,
 Masami Hiramatsu
In-Reply-To: <87o8884q02.ffs@tglx>
References: <20210928122339.502270600@linutronix.de>
 <20210928122411.593486363@linutronix.de> <87o8884q02.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 1, 2021 at 10:24 AM Thomas Gleixner wrote:
>
> On Fri, Oct 01 2021 at 09:12, Andy Lutomirski wrote:
> > On Wed, Sep 29, 2021 at 4:54 AM Peter Zijlstra wrote:
> >> Having this logic split across two files seems unfortunate and prone to
> >> 'accidents'. Is there a real down-side to unconditionally doing it in
> >> delayed_put_task_struct() ?
> >>
> >> /me goes out for lunch... meanwhile tglx points at: 68f24b08ee89.
> >>
> >> Bah.. Andy?
> >
> > Could we make whatever we do here unconditional?
>
> Sure. I just was unsure about your reasoning in 68f24b08ee89.

Mmm, right. The reasoning is that there are a lot of workloads that
frequently wait for a task to exit and immediately start a new task --
most shell scripts, for example. I think I tested this with the
following amazing workload:

  while true; do true; done

and we want to reuse the same stack each time from the cached stack
lookaside list instead of vfreeing and vmallocing a stack each time.
Deferring the release to the lookaside list breaks it.

Although I suppose the fact that it works well right now is a bit
fragile -- we're waking the parent (sh, etc.) before releasing the
stack, but nothing gets to run until the stack is released.

> > And what actually causes the latency? If it's vfree, shouldn't the
> > existing use of vfree_atomic() in free_thread_stack() handle it? Or
> > is it the accounting?
>
> The accounting muck because it can go into the allocator and sleep in
> the worst case, which is nasty even on !RT kernels.

Wait, unaccounting memory can go into the allocator? That seems quite
nasty.

> But thinking some more, there is actually a way nastier issue on RT in
> the following case:
>
>   CPU 0                           CPU 1
>   T1
>    spin_lock(L1)
>    rt_mutex_lock()
>     schedule()
>
>                                   T2
>                                    do_exit()
>                                     do_task_dead()
>   spin_unlock(L1)
>    wake(T1)
>                                   __schedule()
>                                    switch_to(T1)
>                                     finish_task_switch()
>                                      put_task_stack()
>                                       account()
>                                        ....
>                                        spin_lock(L2)
>
> So if L1 == L2 or L1 and L2 have a reverse dependency then this can
> just deadlock.
>
> We've never observed that, but the above case is obviously hard to
> hit. Nevertheless it's there.

Hmm. ISTM it would be conceptually cleaner for do_exit() to handle its
own freeing in its own preemptible context. Obviously that can't
really work, since we can't free a task_struct or a task stack while
we're running on it. But I wonder if we could approximate it by
putting this work in a workqueue so that it all runs in a normal
schedulable context. To make the shell script case work nicely, we
want to release the task stack before notifying anyone waiting for the
dying task to exit, but maybe that's doable. It could involve some
nasty exit_signal hackery, though.
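
Very roughly, a sketch of that workqueue idea -- completely untested,
and the struct/function names (stack_release, schedule_stack_release)
are invented for illustration; this is not the patch under discussion:

#include <linux/sched.h>
#include <linux/sched/task.h>        /* get_task_struct(), put_task_struct() */
#include <linux/sched/task_stack.h>  /* put_task_stack() */
#include <linux/slab.h>
#include <linux/workqueue.h>

struct stack_release {
	struct work_struct work;
	struct task_struct *task;
};

static void stack_release_workfn(struct work_struct *work)
{
	struct stack_release *sr =
		container_of(work, struct stack_release, work);

	/*
	 * Workqueue context is fully preemptible and may sleep, so the
	 * accounting under put_task_stack() -- including a worst-case
	 * trip into the allocator -- is harmless here, on RT and !RT
	 * alike.
	 */
	put_task_stack(sr->task);
	put_task_struct(sr->task);
	kfree(sr);
}

/*
 * Would be called from the scheduler path (e.g. finish_task_switch())
 * instead of calling put_task_stack() directly.
 */
static void schedule_stack_release(struct task_struct *tsk)
{
	struct stack_release *sr = kmalloc(sizeof(*sr), GFP_ATOMIC);

	if (!sr) {
		/* No memory: fall back to releasing in place. */
		put_task_stack(tsk);
		return;
	}

	get_task_struct(tsk);	/* hold the task until the work runs */
	sr->task = tsk;
	INIT_WORK(&sr->work, stack_release_workfn);
	schedule_work(&sr->work);
}

The obvious wart is exactly the one above: the parent can be woken and
running before the work item has released the stack, so the
lookaside-list reuse from 68f24b08ee89 is still defeated unless the
exit notification is reordered after the release.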