Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp961422pxb; Wed, 6 Apr 2022 05:24:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxMS6O+IDoGJ5ohnyuTP+A6WkPzzP37/AE5UvmAuhPhf2mjP2fw/gIhd2bFG+7jg91HLn6t X-Received: by 2002:a63:3688:0:b0:382:8bf0:aad7 with SMTP id d130-20020a633688000000b003828bf0aad7mr7087855pga.602.1649247854548; Wed, 06 Apr 2022 05:24:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649247854; cv=none; d=google.com; s=arc-20160816; b=SHgp65uHJERfJTXHnpT92KZCk/KpvfK4p1c9jjSMyMpuoYY9gEBYhC8kl+bEMoNOnu VYp9fKa9EZ2yU3kVZ4Zysg8vIADGBoTxn5UPcvyFQA7c9Zqh19UUSR2qus0o2UAMEtB1 cd4aWzNkWfFXbYlF/7W+Vmj9kzLYeE/piM5rQ9yp0ymrGo5/tF7qZAQKityqb7R16NCZ FZm96hJiwUiYL/akjl+Lh6qHRlCrIVfSV8S7g8v/oJJW07x2TeyiLtMaAcWN5bwpLFxc dAqGAgaSh0AHSIVZB3KS7eICXOsKfcv5lewcTYHZatsQMvz5etALxCGbYLUNGhOREXEF 04fQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=tVD3oKGoxASbHmsHpyVRzfzb1b2/Pq8uJGtUAyAY3z4=; b=N1rYOb3a5fI2HoC/6bSq5rRymEeVe2UY6aNWb6gCTE3Hx7ZhStJMs0xulrKADwyWWv RhbSVVPQb2Wo2a3rsTiCROEPo2i1/D9swJIXjXPVTyfgmfY3lb3vj1ub7RWLcd8KO3Wq swJCh/MRRaNjBwewkbyqnwsUL7gN1WHdLC9y9SobWKD4uRrEeD3tz79B4WX/vYD+4+DL SBcekyZIaMWfdTIkJdyt0twdXEsSod0XjYOROvTj4/u8r6Lp0QD+5GFR0SrENvnuQUXt 0QtrHbOXR5cDixfNt/ndj1SlLerGzYlAqyBotUrXrSnqCRTx73TlhBMNWl+z5q7iqPNX rfBA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=Zomx+r0L; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id x8-20020a170902ec8800b00153b2d164c0si16589678plg.200.2022.04.06.05.24.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Apr 2022 05:24:14 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=Zomx+r0L; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id AD7FB57423C; Wed, 6 Apr 2022 04:33:47 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1840794AbiDFBLy (ORCPT + 99 others); Tue, 5 Apr 2022 21:11:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57424 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349782AbiDEJv2 (ORCPT ); Tue, 5 Apr 2022 05:51:28 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D46F31C10C; Tue, 5 Apr 2022 02:49:27 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6D80D61577; Tue, 5 Apr 2022 09:49:27 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7E507C385A1; Tue, 5 Apr 2022 09:49:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1649152166; bh=IECC3XuBdG5GLST7u+ger1qa2WxBbPkq2C29u83lBFo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=Zomx+r0LSDkYIwqDfbHrKHmf1AdBtgbChdNdj10pg+sAdWohmqS7MC3lYNpDhpZiw YvxXhT2zJOC/yjcZMEPlo26jotQUuMN5PjG9SCDXjOBsQqO7WJQsiCXPLhX6/rTsXm qH5x4nbgNqtgYdPj6L4lLGc4JtidmkJ7fLzVj7dA= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Fedor Pchelkin , Alexey Khoroshilov , Linus Torvalds , Sasha Levin , Christian Brauner Subject: [PATCH 5.15 679/913] fs: fd tables have to be multiples of BITS_PER_LONG Date: Tue, 5 Apr 2022 09:29:01 +0200 Message-Id: <20220405070400.189202123@linuxfoundation.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220405070339.801210740@linuxfoundation.org> References: <20220405070339.801210740@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Linus Torvalds [ Upstream commit 1c24a186398f59c80adb9a967486b65c1423a59d ] This has always been the rule: fdtables have several bitmaps in them, and as a result they have to be sized properly for bitmaps. We walk those bitmaps in chunks of 'unsigned long' in serveral cases, but even when we don't, we use the regular kernel bitops that are defined to work on arrays of 'unsigned long', not on some byte array. Now, the distinction between arrays of bytes and 'unsigned long' normally only really ends up being noticeable on big-endian systems, but Fedor Pchelkin and Alexey Khoroshilov reported that copy_fd_bitmaps() could be called with an argument that wasn't even a multiple of BITS_PER_BYTE. And then it fails to do the proper copy even on little-endian machines. The bug wasn't in copy_fd_bitmap(), but in sane_fdtable_size(), which didn't actually sanitize the fdtable size sufficiently, and never made sure it had the proper BITS_PER_LONG alignment. That's partly because the alignment historically came not from having to explicitly align things, but simply from previous fdtable sizes, and from count_open_files(), which counts the file descriptors by walking them one 'unsigned long' word at a time and thus naturally ends up doing sizing in the proper 'chunks of unsigned long'. But with the introduction of close_range(), we now have an external source of "this is how many files we want to have", and so sane_fdtable_size() needs to do a better job. This also adds that explicit alignment to alloc_fdtable(), although there it is mainly just for documentation at a source code level. The arithmetic we do there to pick a reasonable fdtable size already aligns the result sufficiently. In fact,clang notices that the added ALIGN() in that function doesn't actually do anything, and does not generate any extra code for it. It turns out that gcc ends up confusing itself by combining a previous constant-sized shift operation with the variable-sized shift operations in roundup_pow_of_two(). And probably due to that doesn't notice that the ALIGN() is a no-op. But that's a (tiny) gcc misfeature that doesn't matter. Having the explicit alignment makes sense, and would actually matter on a 128-bit architecture if we ever go there. This also adds big comments above both functions about how fdtable sizes have to have that BITS_PER_LONG alignment. Fixes: 60997c3d45d9 ("close_range: add CLOSE_RANGE_UNSHARE") Reported-by: Fedor Pchelkin Reported-by: Alexey Khoroshilov Link: https://lore.kernel.org/all/20220326114009.1690-1-aissur0002@gmail.com/ Tested-and-acked-by: Christian Brauner Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin --- fs/file.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) diff --git a/fs/file.c b/fs/file.c index 97d212a9b814..c01c29417ae6 100644 --- a/fs/file.c +++ b/fs/file.c @@ -87,6 +87,21 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt) copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds); } +/* + * Note how the fdtable bitmap allocations very much have to be a multiple of + * BITS_PER_LONG. This is not only because we walk those things in chunks of + * 'unsigned long' in some places, but simply because that is how the Linux + * kernel bitmaps are defined to work: they are not "bits in an array of bytes", + * they are very much "bits in an array of unsigned long". + * + * The ALIGN(nr, BITS_PER_LONG) here is for clarity: since we just multiplied + * by that "1024/sizeof(ptr)" before, we already know there are sufficient + * clear low bits. Clang seems to realize that, gcc ends up being confused. + * + * On a 128-bit machine, the ALIGN() would actually matter. In the meantime, + * let's consider it documentation (and maybe a test-case for gcc to improve + * its code generation ;) + */ static struct fdtable * alloc_fdtable(unsigned int nr) { struct fdtable *fdt; @@ -102,6 +117,7 @@ static struct fdtable * alloc_fdtable(unsigned int nr) nr /= (1024 / sizeof(struct file *)); nr = roundup_pow_of_two(nr + 1); nr *= (1024 / sizeof(struct file *)); + nr = ALIGN(nr, BITS_PER_LONG); /* * Note that this can drive nr *below* what we had passed if sysctl_nr_open * had been set lower between the check in expand_files() and here. Deal @@ -269,11 +285,25 @@ static unsigned int count_open_files(struct fdtable *fdt) return i; } +/* + * Note that a sane fdtable size always has to be a multiple of + * BITS_PER_LONG, since we have bitmaps that are sized by this. + * + * 'max_fds' will normally already be properly aligned, but it + * turns out that in the close_range() -> __close_range() -> + * unshare_fd() -> dup_fd() -> sane_fdtable_size() we can end + * up having a 'max_fds' value that isn't already aligned. + * + * Rather than make close_range() have to worry about this, + * just make that BITS_PER_LONG alignment be part of a sane + * fdtable size. Becuase that's really what it is. + */ static unsigned int sane_fdtable_size(struct fdtable *fdt, unsigned int max_fds) { unsigned int count; count = count_open_files(fdt); + max_fds = ALIGN(max_fds, BITS_PER_LONG); if (max_fds < NR_OPEN_DEFAULT) max_fds = NR_OPEN_DEFAULT; return min(count, max_fds); -- 2.34.1