From: Shakeel Butt
Date: Thu, 18 Jan 2024 23:47:51 -0800
Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
To: Roman Gushchin
Cc: Linus Torvalds, Josh Poimboeuf, Vlastimil Babka, Jeff Layton,
    Chuck Lever, Johannes Weiner, Michal Hocko, linux-kernel@vger.kernel.org,
    Jens Axboe, Tejun Heo, Vasily Averin, Michal Koutny, Waiman Long,
    Muchun Song, Jiri Kosina, cgroups@vger.kernel.org, linux-mm@kvack.org
References: <6667b799702e1815bd4e4f7744eddbc0bd042bb7.camel@kernel.org>
    <20240117193915.urwueineol7p4hg7@treble>

On Wed, Jan 17, 2024 at 2:20 PM Roman Gushchin wrote:
>
> On Wed, Jan 17, 2024 at 01:02:19PM -0800, Shakeel Butt wrote:
> > On Wed, Jan 17, 2024 at 12:21 PM Linus Torvalds wrote:
> > >
> > > On Wed, 17 Jan 2024 at 11:39, Josh Poimboeuf wrote:
> > > >
> > > > That's a good point. If the microbenchmark isn't likely to be even
> > > > remotely realistic, maybe we should just revert the revert until if/when
> > > > somebody shows a real world impact.
> > > >
> > > > Linus, any objections to that?
> > >
> > > We use SLAB_ACCOUNT for much more common allocations like queued
> > > signals, so I would tend to agree with Jeff that it's probably just
> > > some not very interesting microbenchmark that shows any file locking
> > > effects from SLAB_ALLOC, not any real use.
> > >
> > > That said, those benchmarks do matter. It's very easy to say "not
> > > relevant in the big picture" and then the end result is that
> > > everything is a bit of a pig.
> > >
> > > And the regression was absolutely *ENORMOUS*. We're not talking "a few
> > > percent". We're talking a 33% regression that caused the revert:
> > >
> > >    https://lore.kernel.org/lkml/20210907150757.GE17617@xsang-OptiPlex-9020/
> > >
> > > I wish our SLAB_ACCOUNT wasn't such a pig. Rather than account every
> > > single allocation, it would be much nicer to account at a bigger
> > > granularity, possibly by having per-thread counters first before
> > > falling back to the obj_cgroup_charge. Whatever.
> > >
> > > It's kind of stupid to have a benchmark that just allocates and
> > > deallocates a file lock in quick succession spend lots of time
> > > incrementing and decrementing cgroup charges for that repeated
> > > alloc/free.
> > >
> > > However, that problem with SLAB_ACCOUNT is not the fault of file
> > > locking, but more of a slab issue.
> > >
> > > End result: I think we should bring in Vlastimil and whoever else is
> > > doing SLAB_ACCOUNT things, and have them look at that side.
> > >
> > > And then just enable SLAB_ACCOUNT for file locks. But very much look
> > > at silly costs in SLAB_ACCOUNT first, at least for trivial
> > > "alloc/free" patterns..
> > >
> > > Vlastimil? Who would be the best person to look at that SLAB_ACCOUNT
> > > thing? See commit 3754707bcc3e (Revert "memcg: enable accounting for
> > > file lock caches") for the history here.
> > >
> >
> > Roman last looked into optimizing this code path. I suspect
> > mod_objcg_state() to be more costly than obj_cgroup_charge(). I will
> > try to measure this path and see if I can improve it.
>
> It's roughly an equal split between mod_objcg_state() and obj_cgroup_charge().
> And each is comparable (by order of magnitude) to the slab allocation cost
> itself. On the free() path a significant cost comes simple from reading
> the objcg pointer (it's usually a cache miss).
>
> So I don't see how we can make it really cheap (say, less than 5% overhead)
> without caching pre-accounted objects.
>
> I thought about merging of charge and stats handling paths, which _maybe_ can
> shave off another 20-30%, but there still will be a double-digit% accounting
> overhead.
>
> I'm curious to hear other ideas and suggestions.
>
> Thanks!

I profiled (perf record -a) the same benchmark, i.e. lock1_processes, on
an Ice Lake machine with 72 cores and got the following results:

  12.72%  lock1_processes  [kernel.kallsyms]  [k] mod_objcg_state
  10.89%  lock1_processes  [kernel.kallsyms]  [k] kmem_cache_free
   8.40%  lock1_processes  [kernel.kallsyms]  [k] slab_post_alloc_hook
   8.36%  lock1_processes  [kernel.kallsyms]  [k] kmem_cache_alloc
   5.18%  lock1_processes  [kernel.kallsyms]  [k] refill_obj_stock
   5.18%  lock1_processes  [kernel.kallsyms]  [k] _copy_from_user

On annotating mod_objcg_state(), the following irq-disabling instructions
are taking about 30% of its time:

   6.64 │ pushfq
  10.26 │ popq   -0x38(%rbp)
   6.05 │ mov    -0x38(%rbp),%rcx
   7.60 │ cli

For kmem_cache_free() and kmem_cache_alloc(), the following instruction,
which corresponds to __update_cpu_freelist_fast(), was expensive:

  16.33 │ cmpxchg16b %gs:(%rsi)

For slab_post_alloc_hook(), the cost is spread all over the function, and
refill_obj_stock() looks very similar to mod_objcg_state().

I will dig more in the next couple of days.
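
To make the batching idea from the quoted discussion concrete (local counters
in front of the shared charge, so a tight alloc/free loop rarely touches the
shared counter), here is a toy userspace model. It is only a sketch and not
kernel code: SharedCounter, LocalStock, run_alloc_free_loop and the batch
size are all hypothetical names and values picked for illustration, and the
model ignores irq state, preemption and multiple cgroups.

```python
class SharedCounter:
    """Stands in for the expensive shared charge counter (hypothetical)."""

    def __init__(self):
        self.charged = 0   # bytes currently charged
        self.updates = 0   # number of times the shared counter was touched

    def charge(self, nbytes):
        self.charged += nbytes
        self.updates += 1

    def uncharge(self, nbytes):
        self.charged -= nbytes
        self.updates += 1


class LocalStock:
    """Local cache of pre-charged bytes in front of a SharedCounter."""

    def __init__(self, shared, batch):
        self.shared = shared
        self.batch = batch
        self.cached = 0    # bytes charged to `shared` but not in use

    def charge(self, nbytes):
        if self.cached < nbytes:
            # Slow path: charge a whole batch to the shared counter.
            refill = max(self.batch, nbytes)
            self.shared.charge(refill)
            self.cached += refill
        # Fast path: consume from the local cache only.
        self.cached -= nbytes

    def uncharge(self, nbytes):
        # Fast path: give the bytes back to the local cache.
        self.cached += nbytes
        if self.cached > 2 * self.batch:
            # Slow path: return the surplus so the cache stays bounded.
            surplus = self.cached - self.batch
            self.shared.uncharge(surplus)
            self.cached -= surplus

    def drain(self):
        # Return everything still cached to the shared counter.
        if self.cached:
            self.shared.uncharge(self.cached)
            self.cached = 0


def run_alloc_free_loop(iterations, obj_size, batch):
    """Charge and uncharge one object repeatedly.

    Returns (shared counter updates, bytes still charged after drain).
    """
    shared = SharedCounter()
    stock = LocalStock(shared, batch)
    for _ in range(iterations):
        stock.charge(obj_size)
        stock.uncharge(obj_size)
    stock.drain()
    return shared.updates, shared.charged
```

In this model, batch=0 degenerates to touching the shared counter on every
charge and uncharge (2000 updates for 1000 iterations), while batch=4096
with 200-byte objects needs only two updates: one refill and the final
drain. How much of that carries over to the real paths would have to be
measured.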