From: Max Kellermann
Date: Wed, 12 Jun 2024 08:49:02 +0200
Subject: Re: Bad psi_group_cpu.tasks[NR_MEMSTALL] counter
To: Suren Baghdasaryan
Cc: Johannes Weiner, Peter Zijlstra, linux-kernel@vger.kernel.org

On Wed, Jun 12, 2024 at 7:01 AM Suren Baghdasaryan wrote:
> Instead I think what might be happening is that the task is terminated
> while it's in memstall.

How is it possible to terminate a task that's in memstall? This must
be between psi_memstall_enter() and psi_memstall_leave(), but I had
already checked all the callers and found nothing suspicious; no
obvious way to escape the section without psi_memstall_leave(). In my
understanding, it's impossible to terminate a task that's currently
stuck in the kernel. First, it needs to leave the kernel and go back
to userspace, doesn't it?

> I think if your theory was
> correct and psi_task_change() was called while task's cgroup is
> destroyed then task_psi_group() would have returned an invalid pointer
> and we would crash once that value is dereferenced.

I was thinking of something slightly different; something about the
cgroup being deleted or a task being terminated and the bookkeeping of
the PSI flags going wrong, maybe some data race. I found the whole PSI
code with per-task flags, per-cpu per-cgroup counters and flags
somewhat obscure (but somebody else's code is always obscure, of
course); I thought there was a lot of potential for mistakes with the
bookkeeping, but I found nothing specific.

Anyway, thanks for looking into this - I hope we can get a grip on
this issue, as it's preventing me from using PSI values for actual
process management; the servers that go into this state will always
appear overloaded and that would lead to killing all the workload
processes forever.

Max
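
For illustration only, the bookkeeping concern above can be modelled with a toy userspace sketch in Python. This is not the kernel implementation: the `Group` and `Task` classes, `forget_task()` and `group_looks_stalled()` are hypothetical names invented here, and only `psi_memstall_enter()`, `psi_memstall_leave()` and `NR_MEMSTALL` are borrowed from the discussion. The sketch shows only why an unmatched enter would leave the counter stuck above zero; it does not show that such a path exists in the kernel.

```python
# Toy model of PSI memstall bookkeeping. NOT kernel code; an assumption-laden sketch.
from dataclasses import dataclass, field

NR_MEMSTALL = 0  # index into the toy per-group counter list (layout is hypothetical)


@dataclass
class Group:
    # One counter per tracked state; here only the memstall counter is modelled.
    tasks: list = field(default_factory=lambda: [0])


@dataclass
class Task:
    group: Group
    in_memstall: bool = False  # per-task flag


def psi_memstall_enter(task: Task) -> None:
    """Mark the task as stalled and count it in its group (no-op if already set)."""
    if task.in_memstall:
        return
    task.in_memstall = True
    task.group.tasks[NR_MEMSTALL] += 1


def psi_memstall_leave(task: Task) -> None:
    """Clear the per-task flag and drop the task's contribution to the counter."""
    if not task.in_memstall:
        return
    task.in_memstall = False
    task.group.tasks[NR_MEMSTALL] -= 1


def forget_task(task: Task) -> None:
    """Hypothetical buggy path: the task goes away without a matching leave."""
    task.group = None  # nobody decrements the counter on the task's behalf


def group_looks_stalled(group: Group) -> bool:
    """A monitor that trusts the counter sees a stall while it is non-zero."""
    return group.tasks[NR_MEMSTALL] > 0


if __name__ == "__main__":
    group = Group()

    ok = Task(group)
    psi_memstall_enter(ok)
    psi_memstall_leave(ok)
    print("balanced:", group.tasks[NR_MEMSTALL])

    leaked = Task(group)
    psi_memstall_enter(leaked)
    forget_task(leaked)
    print("after unmatched enter:", group.tasks[NR_MEMSTALL])
    print("looks stalled:", group_looks_stalled(group))
```

In this model a balanced enter/leave pair returns the counter to zero, while a single unmatched enter keeps the group looking stalled indefinitely, which matches the "always appear overloaded" symptom described above.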