From: Josh Poimboeuf
To: Linus Torvalds, Jeff Layton, Chuck Lever, Shakeel Butt,
    Roman Gushchin, Johannes Weiner, Michal Hocko
Cc: linux-kernel@vger.kernel.org, Jens Axboe, Tejun Heo, Vasily Averin,
    Michal Koutny, Waiman Long, Muchun Song, Jiri Kosina,
    cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
Date: Wed, 17 Jan 2024 08:14:43 -0800
X-Mailer: git-send-email 2.43.0

A container can exceed its memcg limits by allocating a bunch of file
locks.

This bug was originally fixed by commit 0f12156dff28 ("memcg: enable
accounting for file lock caches"), but was later reverted by commit
3754707bcc3e ("Revert "memcg: enable accounting for file lock caches"")
due to performance issues.

Unfortunately those performance issues were never addressed and the bug
has remained unfixed for over two years.

Fix it by default but allow users to disable it with a cmdline option
(flock_accounting=off).

Signed-off-by: Josh Poimboeuf
---
 .../admin-guide/kernel-parameters.txt         | 17 +++++++++++
 fs/locks.c                                    | 30 +++++++++++++++++--
 2 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6ee0f9a5da70..91987b06bc52 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1527,6 +1527,23 @@
 			See Documentation/admin-guide/sysctl/net.rst for
 			fb_tunnels_only_for_init_ns
 
+	flock_accounting=
+			[KNL] Enable/disable accounting for kernel
+			memory allocations related to file locks.
+			Format: { on | off }
+			Default: on
+			on:	Enable kernel memory accounting for file
+				locks.  This prevents task groups from
+				exceeding their memcg allocation limits.
+				However, it may cause slowdowns in the
+				flock() system call.
+			off:	Disable kernel memory accounting for
+				file locks.  This may allow a rogue task
+				to DoS the system by forcing the kernel
+				to allocate memory beyond the task
+				group's memcg limits.  Not recommended
+				unless you have trusted user space.
+
 	floppy=		[HW]
 			See Documentation/admin-guide/blockdev/floppy.rst.
 
diff --git a/fs/locks.c b/fs/locks.c
index cc7c117ee192..235ac56c557d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2905,15 +2905,41 @@ static int __init proc_locks_init(void)
 fs_initcall(proc_locks_init);
 #endif
 
+static bool flock_accounting __ro_after_init = true;
+
+static int __init flock_accounting_cmdline(char *str)
+{
+	if (!str)
+		return -EINVAL;
+
+	if (!strcmp(str, "off"))
+		flock_accounting = false;
+	else if (!strcmp(str, "on"))
+		flock_accounting = true;
+	else
+		return -EINVAL;
+
+	return 0;
+}
+early_param("flock_accounting", flock_accounting_cmdline);
+
+#define FLOCK_ACCOUNTING_MSG "WARNING: File lock accounting is disabled, container-triggered host memory exhaustion possible!\n"
+
 static int __init filelock_init(void)
 {
 	int i;
+	slab_flags_t flags = SLAB_PANIC;
+
+	if (!flock_accounting)
+		pr_err(FLOCK_ACCOUNTING_MSG);
+	else
+		flags |= SLAB_ACCOUNT;
 
 	flctx_cache = kmem_cache_create("file_lock_ctx",
-			sizeof(struct file_lock_context), 0, SLAB_PANIC, NULL);
+			sizeof(struct file_lock_context), 0, flags, NULL);
 
 	filelock_cache = kmem_cache_create("file_lock_cache",
-			sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
+			sizeof(struct file_lock), 0, flags, NULL);
 
 	for_each_possible_cpu(i) {
 		struct file_lock_list_struct *fll = per_cpu_ptr(&file_lock_list, i);
-- 
2.43.0
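
For readers who want to poke at the option handling without booting a
kernel, the following is a minimal userspace sketch of the same logic.
It is not part of the patch. Every demo_* name and both DEMO_SLAB_*
values are hypothetical stand-ins invented for illustration; they are
not the kernel's SLAB_PANIC or SLAB_ACCOUNT definitions. The sketch only
mirrors the patch's decision logic: accounting defaults to on, "off"
and "on" are the only accepted values, and an unrecognized value is
rejected with -EINVAL and leaves the current setting untouched.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel slab flags (illustrative values only). */
#define DEMO_SLAB_PANIC   0x1u
#define DEMO_SLAB_ACCOUNT 0x2u

/* Mirrors the patch's default: accounting is enabled unless turned off. */
static bool demo_flock_accounting = true;

/*
 * Userspace model of the patch's early_param handler.
 * Accepts exactly "on" or "off"; anything else (including NULL)
 * returns -EINVAL and leaves the setting unchanged.
 */
static int demo_parse_flock_accounting(const char *str)
{
	if (!str)
		return -EINVAL;

	if (!strcmp(str, "off"))
		demo_flock_accounting = false;
	else if (!strcmp(str, "on"))
		demo_flock_accounting = true;
	else
		return -EINVAL;

	return 0;
}

/*
 * Model of the flag selection in filelock_init(): the panic flag is
 * always set, and the accounting flag is added only when enabled.
 */
static unsigned int demo_cache_flags(void)
{
	unsigned int flags = DEMO_SLAB_PANIC;

	if (demo_flock_accounting)
		flags |= DEMO_SLAB_ACCOUNT;

	return flags;
}
```

Calling demo_parse_flock_accounting("off") and then demo_cache_flags()
yields only the panic flag, which corresponds to the case where the
patch prints its warning and creates both caches without accounting.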