From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Miklos Szeredi, Linus Torvalds
Subject: [PATCH 4.19 05/17] af_unix: fix garbage collect vs MSG_PEEK
Date: Thu, 29 Jul 2021 15:54:06 +0200
Message-Id: <20210729135137.432633743@linuxfoundation.org>
In-Reply-To: <20210729135137.260993951@linuxfoundation.org>
References: <20210729135137.260993951@linuxfoundation.org>

From: Miklos Szeredi

commit cbcf01128d0a92e131bd09f1688fe032480b65ca upstream.

unix_gc() assumes that candidate sockets can never gain an external
reference (i.e. be installed into an fd) while the unix_gc_lock is held.
Except for MSG_PEEK this is guaranteed by modifying inflight count under
the unix_gc_lock.

MSG_PEEK does not touch any variable protected by unix_gc_lock (file count
is not), yet it needs to be serialized with garbage collection. Do this by
locking/unlocking unix_gc_lock:

 1) increment file count

 2) lock/unlock barrier to make sure incremented file count is visible to
    garbage collection

 3) install file into fd

This is a lock barrier (unlike smp_mb()) that ensures that garbage
collection is run completely before or completely after the barrier.

Cc:
Signed-off-by: Miklos Szeredi
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 net/unix/af_unix.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 49 insertions(+), 2 deletions(-)

--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1517,6 +1517,53 @@ out:
 	return err;
 }
 
+static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
+{
+	scm->fp = scm_fp_dup(UNIXCB(skb).fp);
+
+	/*
+	 * Garbage collection of unix sockets starts by selecting a set of
+	 * candidate sockets which have reference only from being in flight
+	 * (total_refs == inflight_refs). This condition is checked once during
+	 * the candidate collection phase, and candidates are marked as such, so
+	 * that non-candidates can later be ignored. While inflight_refs is
+	 * protected by unix_gc_lock, total_refs (file count) is not, hence this
+	 * is an instantaneous decision.
+	 *
+	 * Once a candidate, however, the socket must not be reinstalled into a
+	 * file descriptor while the garbage collection is in progress.
+	 *
+	 * If the above conditions are met, then the directed graph of
+	 * candidates (*) does not change while unix_gc_lock is held.
+	 *
+	 * Any operations that changes the file count through file descriptors
+	 * (dup, close, sendmsg) does not change the graph since candidates are
+	 * not installed in fds.
+	 *
+	 * Dequeing a candidate via recvmsg would install it into an fd, but
+	 * that takes unix_gc_lock to decrement the inflight count, so it's
+	 * serialized with garbage collection.
+	 *
+	 * MSG_PEEK is special in that it does not change the inflight count,
+	 * yet does install the socket into an fd. The following lock/unlock
+	 * pair is to ensure serialization with garbage collection. It must be
+	 * done between incrementing the file count and installing the file into
+	 * an fd.
+	 *
+	 * If garbage collection starts after the barrier provided by the
+	 * lock/unlock, then it will see the elevated refcount and not mark this
+	 * as a candidate. If a garbage collection is already in progress
+	 * before the file count was incremented, then the lock/unlock pair will
+	 * ensure that garbage collection is finished before progressing to
+	 * installing the fd.
+	 *
+	 * (*) A -> B where B is on the queue of A or B is on the queue of C
+	 * which is on the queue of listening socket A.
+	 */
+	spin_lock(&unix_gc_lock);
+	spin_unlock(&unix_gc_lock);
+}
+
 static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds)
 {
 	int err = 0;
@@ -2142,7 +2189,7 @@ static int unix_dgram_recvmsg(struct soc
 		sk_peek_offset_fwd(sk, size);
 
 		if (UNIXCB(skb).fp)
-			scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+			unix_peek_fds(&scm, skb);
 	}
 	err = (flags & MSG_TRUNC) ? skb->len - skip : size;
 
@@ -2383,7 +2430,7 @@ unlock:
 			/* It is questionable, see note in unix_dgram_recvmsg.
 			 */
 			if (UNIXCB(skb).fp)
-				scm.fp = scm_fp_dup(UNIXCB(skb).fp);
+				unix_peek_fds(&scm, skb);
 
 			sk_peek_offset_fwd(sk, chunk);
 
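
For readers who want to see the userspace behaviour that the comment in
unix_peek_fds() refers to ("MSG_PEEK ... does install the socket into an
fd"), here is a minimal sketch. It is not part of the patch and does not
reproduce the race with unix_gc(); it only shows that peeking at a message
carrying SCM_RIGHTS hands the receiver a new descriptor while the message
stays queued. The helper names send_fd(), recv_fd() and
demo_peek_then_receive() are made up for illustration, and a pipe end
stands in for the passed file.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one byte plus `fd` as SCM_RIGHTS data. */
static int send_fd(int sock, int fd)
{
	char byte = 'x';
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Hypothetical helper: receive one message with the given recvmsg()
 * flags (0 or MSG_PEEK) and return the descriptor it carried, or -1.
 */
static int recv_fd(int sock, int flags)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, flags) != 1)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/*
 * Peek at a queued message, then really receive it. Returns 0 when both
 * calls produced a valid descriptor and the two descriptors differ,
 * i.e. the peek installed its own fd and left the message queued.
 */
int demo_peek_then_receive(void)
{
	int sv[2], pipefd[2], peeked, received, ok;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (pipe(pipefd) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	ok = send_fd(sv[0], pipefd[0]) == 0;
	peeked = recv_fd(sv[1], MSG_PEEK);	/* message stays queued */
	received = recv_fd(sv[1], 0);		/* message is dequeued */
	ok = ok && peeked >= 0 && received >= 0 && peeked != received;

	if (peeked >= 0)
		close(peeked);
	if (received >= 0)
		close(received);
	close(pipefd[0]);
	close(pipefd[1]);
	close(sv[0]);
	close(sv[1]);
	return ok ? 0 : -1;
}
```

Calling demo_peek_then_receive() from a small main() and checking for a
zero return is enough to try it. In the kernel, the MSG_PEEK call is the
path that reaches unix_peek_fds() in the diff above, while the plain
receive takes the dequeue path described in the comment.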