From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Miklos Szeredi, Linus Torvalds
Subject: [PATCH 4.9 04/32] af_unix: fix garbage collect vs MSG_PEEK
Date: Mon, 2 Aug 2021 15:44:24 +0200
Message-Id: <20210802134333.066918619@linuxfoundation.org>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20210802134332.931915241@linuxfoundation.org>
References: <20210802134332.931915241@linuxfoundation.org>
User-Agent: quilt/0.66
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

From: Miklos Szeredi

commit cbcf01128d0a92e131bd09f1688fe032480b65ca upstream.

unix_gc() assumes that candidate sockets can never gain an external
reference (i.e. be installed into an fd) while the unix_gc_lock is held.
Except for MSG_PEEK this is guaranteed by modifying inflight count under
the unix_gc_lock.

MSG_PEEK does not touch any variable protected by unix_gc_lock (file count
is not), yet it needs to be serialized with garbage collection. Do this by
locking/unlocking unix_gc_lock:

 1) increment file count

 2) lock/unlock barrier to make sure incremented file count is visible
    to garbage collection

 3) install file into fd

This is a lock barrier (unlike smp_mb()) that ensures that garbage
collection is run completely before or completely after the barrier.

Cc:
Signed-off-by: Miklos Szeredi
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 net/unix/af_unix.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1507,6 +1507,53 @@ out:
 	return err;
 }
 
+static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
+{
+	scm->fp = scm_fp_dup(UNIXCB(skb).fp);
+
+	/*
+	 * Garbage collection of unix sockets starts by selecting a set of
+	 * candidate sockets which have reference only from being in flight
+	 * (total_refs == inflight_refs). This condition is checked once during
+	 * the candidate collection phase, and candidates are marked as such, so
+	 * that non-candidates can later be ignored. While inflight_refs is
+	 * protected by unix_gc_lock, total_refs (file count) is not, hence this
+	 * is an instantaneous decision.
+	 *
+	 * Once a candidate, however, the socket must not be reinstalled into a
+	 * file descriptor while the garbage collection is in progress.
+	 *
+	 * If the above conditions are met, then the directed graph of
+	 * candidates (*) does not change while unix_gc_lock is held.
+	 *
+	 * Any operations that changes the file count through file descriptors
+	 * (dup, close, sendmsg) does not change the graph since candidates are
+	 * not installed in fds.
+	 *
+	 * Dequeing a candidate via recvmsg would install it into an fd, but
+	 * that takes unix_gc_lock to decrement the inflight count, so it's
+	 * serialized with garbage collection.
+	 *
+	 * MSG_PEEK is special in that it does not change the inflight count,
+	 * yet does install the socket into an fd. The following lock/unlock
+	 * pair is to ensure serialization with garbage collection. It must be
+	 * done between incrementing the file count and installing the file into
+	 * an fd.
+	 *
+	 * If garbage collection starts after the barrier provided by the
+	 * lock/unlock, then it will see the elevated refcount and not mark this
+	 * as a candidate. If a garbage collection is already in progress
+	 * before the file count was incremented, then the lock/unlock pair will
+	 * ensure that garbage collection is finished before progressing to
+	 * installing the fd.
+	 *
+	 * (*) A -> B where B is on the queue of A or B is on the queue of C
+	 * which is on the queue of listening socket A.
+	 */
+	spin_lock(&unix_gc_lock);
+	spin_unlock(&unix_gc_lock);
+}
+
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds)
 {
 	int err = 0;
@@ -2140,7 +2187,7 @@ static int unix_dgram_recvmsg(struct soc
 		sk_peek_offset_fwd(sk, size);
 
 		if (UNIXCB(skb).fp)
-			scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+			unix_peek_fds(&scm, skb);
 	}
 	err = (flags & MSG_TRUNC) ? skb->len - skip : size;
 
@@ -2385,7 +2432,7 @@ unlock:
 			/* It is questionable, see note in unix_dgram_recvmsg.
 			 */
 			if (UNIXCB(skb).fp)
-				scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+				unix_peek_fds(&scm, skb);
 
 			sk_peek_offset_fwd(sk, chunk);
 
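
For readers who want to experiment with the three-step ordering described in the commit message (increment file count, lock/unlock barrier, install), here is a rough user-space sketch. It is not kernel code and not part of the patch. It assumes a pthread mutex standing in for unix_gc_lock, and every name in it (gc_lock, total_refs, inflight_refs, is_candidate, gc_pass, peek_and_install, run_peek_vs_gc) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel state (not kernel API). */
static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER; /* plays unix_gc_lock */
static atomic_int total_refs;     /* "file count": NOT protected by gc_lock */
static int inflight_refs;         /* in-flight count: protected by gc_lock */
static atomic_bool is_candidate;  /* true only while a GC pass treats the socket as garbage */

/* One GC pass: the candidate decision and all work on it happen under gc_lock. */
static void *gc_pass(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&gc_lock);
	if (atomic_load(&total_refs) == inflight_refs)
		atomic_store(&is_candidate, true);
	/* ... a real collector would walk the candidate graph here ... */
	atomic_store(&is_candidate, false);
	pthread_mutex_unlock(&gc_lock);
	return NULL;
}

/* The MSG_PEEK side; returns true if the "install" overlapped a GC pass
 * that still regarded the socket as a candidate. */
static bool peek_and_install(void)
{
	atomic_fetch_add(&total_refs, 1);   /* 1) increment file count */
	pthread_mutex_lock(&gc_lock);       /* 2) lock/unlock barrier */
	pthread_mutex_unlock(&gc_lock);
	return atomic_load(&is_candidate);  /* 3) "install": must not be a candidate */
}

/* Race one GC pass against one peek, `rounds` times; returns violations seen
 * (or -1 if a thread could not be created). */
static int run_peek_vs_gc(int rounds)
{
	int violations = 0;

	assert(rounds > 0);
	for (int i = 0; i < rounds; i++) {
		pthread_t gc;

		atomic_store(&total_refs, 1);   /* only the in-flight reference exists */
		inflight_refs = 1;
		if (pthread_create(&gc, NULL, gc_pass, NULL) != 0)
			return -1;
		if (peek_and_install())
			violations++;
		pthread_join(gc, NULL);
	}
	return violations;
}
```

Calling run_peek_vs_gc() from a small main() and linking with -pthread should report zero violations: a GC pass that takes the lock first finishes before the install, and one that takes it afterwards sees the raised count and never marks the socket. If step 2 is removed from peek_and_install(), the sketch no longer has that guarantee and the check could observe a candidate at install time.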