From: SeongJae Park
Subject: [PATCH net 1/2] Revert "coallocate socket_wq with socket itself"
Date: Tue, 5 May 2020 09:28:40 +0200
Message-ID: <20200505072841.25365-2-sjpark@amazon.com>
In-Reply-To: <20200505072841.25365-1-sjpark@amazon.com>
References: <20200505072841.25365-1-sjpark@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This reverts commit 333f7909a8573145811c4ab7d8c9092301707721.

Commit 6d7855c54e1e ("sockfs: switch to ->free_inode()") made the
deallocation of 'socket_alloc' asynchronous, deferring it through RCU in
the same way as 'sock.wq'. The following commit 333f7909a857
("coallocate socket_wq with socket itself") then gave the two the same
life cycle.

These changes simplified the code, but they also made 'socket_alloc'
live longer than before. As a result, user programs that intensively
repeat socket allocation and deallocation can cause memory pressure on
recent kernels.

To avoid the problem, this commit separates the life cycles of
'socket_alloc' and 'sock.wq' again. The following commit will make the
deallocation of 'socket_alloc' synchronous again.
---
 drivers/net/tap.c      |  5 +++--
 drivers/net/tun.c      |  8 +++++---
 include/linux/if_tap.h |  1 +
 include/linux/net.h    |  4 ++--
 include/net/sock.h     |  4 ++--
 net/core/sock.c        |  2 +-
 net/socket.c           | 19 ++++++++++++++-----
 7 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index 1f4bdd94407a..7912039a4846 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -518,7 +518,8 @@ static int tap_open(struct inode *inode, struct file *file)
 		goto err;
 	}
 
-	init_waitqueue_head(&q->sock.wq.wait);
+	RCU_INIT_POINTER(q->sock.wq, &q->wq);
+	init_waitqueue_head(&q->wq.wait);
 	q->sock.type = SOCK_RAW;
 	q->sock.state = SS_CONNECTED;
 	q->sock.file = file;
@@ -576,7 +577,7 @@ static __poll_t tap_poll(struct file *file, poll_table *wait)
 		goto out;
 
 	mask = 0;
-	poll_wait(file, &q->sock.wq.wait, wait);
+	poll_wait(file, &q->wq.wait, wait);
 
 	if (!ptr_ring_empty(&q->ring))
 		mask |= EPOLLIN | EPOLLRDNORM;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 650c937ed56b..16a5f3b80edf 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -160,6 +160,7 @@ struct tun_pcpu_stats {
 struct tun_file {
 	struct sock sk;
 	struct socket socket;
+	struct socket_wq wq;
 	struct tun_struct __rcu *tun;
 	struct fasync_struct *fasync;
 	/* only used for fasnyc */
@@ -2173,7 +2174,7 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 		goto out;
 	}
 
-	add_wait_queue(&tfile->socket.wq.wait, &wait);
+	add_wait_queue(&tfile->wq.wait, &wait);
 
 	while (1) {
 		set_current_state(TASK_INTERRUPTIBLE);
@@ -2193,7 +2194,7 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 	}
 
 	__set_current_state(TASK_RUNNING);
-	remove_wait_queue(&tfile->socket.wq.wait, &wait);
+	remove_wait_queue(&tfile->wq.wait, &wait);
 
 out:
 	*err = error;
@@ -3434,7 +3435,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 	tfile->flags = 0;
 	tfile->ifindex = 0;
 
-	init_waitqueue_head(&tfile->socket.wq.wait);
+	init_waitqueue_head(&tfile->wq.wait);
+	RCU_INIT_POINTER(tfile->socket.wq, &tfile->wq);
 
 	tfile->socket.file = file;
 	tfile->socket.ops = &tun_socket_ops;
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 915a187cfabd..8e66866c11be 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -62,6 +62,7 @@ struct tap_dev {
 struct tap_queue {
 	struct sock sk;
 	struct socket sock;
+	struct socket_wq wq;
 	int vnet_hdr_sz;
 	struct tap_dev __rcu *tap;
 	struct file *file;
diff --git a/include/linux/net.h b/include/linux/net.h
index 6451425e828f..28c929bebb4a 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -116,11 +116,11 @@ struct socket {
 
 	unsigned long flags;
 
+	struct socket_wq *wq;
+
 	struct file *file;
 	struct sock *sk;
 	const struct proto_ops *ops;
-
-	struct socket_wq wq;
 };
 
 struct vm_area_struct;
diff --git a/include/net/sock.h b/include/net/sock.h
index 328564525526..20799a333570 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1841,7 +1841,7 @@ static inline void sock_graft(struct sock *sk, struct socket *parent)
 {
 	WARN_ON(parent->sk);
 	write_lock_bh(&sk->sk_callback_lock);
-	rcu_assign_pointer(sk->sk_wq, &parent->wq);
+	rcu_assign_pointer(sk->sk_wq, parent->wq);
 	parent->sk = sk;
 	sk_set_socket(sk, parent);
 	sk->sk_uid = SOCK_INODE(parent)->i_uid;
@@ -2119,7 +2119,7 @@ static inline void sock_poll_wait(struct file *filp, struct socket *sock,
 				  poll_table *p)
 {
 	if (!poll_does_not_wait(p)) {
-		poll_wait(filp, &sock->wq.wait, p);
+		poll_wait(filp, &sock->wq->wait, p);
 		/* We need to be sure we are in sync with the
 		 * socket flags modification.
 		 *
diff --git a/net/core/sock.c b/net/core/sock.c
index 8f71684305c3..7fa3241b5507 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2869,7 +2869,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
 	if (sock) {
 		sk->sk_type = sock->type;
-		RCU_INIT_POINTER(sk->sk_wq, &sock->wq);
+		RCU_INIT_POINTER(sk->sk_wq, sock->wq);
 		sock->sk = sk;
 		sk->sk_uid = SOCK_INODE(sock)->i_uid;
 	} else {
diff --git a/net/socket.c b/net/socket.c
index 2eecf1517f76..e274ae4b45e4 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -249,13 +249,20 @@ static struct kmem_cache *sock_inode_cachep __ro_after_init;
 static struct inode *sock_alloc_inode(struct super_block *sb)
 {
 	struct socket_alloc *ei;
+	struct socket_wq *wq;
 
 	ei = kmem_cache_alloc(sock_inode_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
-	init_waitqueue_head(&ei->socket.wq.wait);
-	ei->socket.wq.fasync_list = NULL;
-	ei->socket.wq.flags = 0;
+	wq = kmalloc(sizeof(*wq), GFP_KERNEL);
+	if (!wq) {
+		kmem_cache_free(sock_inode_cachep, ei);
+		return NULL;
+	}
+	init_waitqueue_head(&wq->wait);
+	wq->fasync_list = NULL;
+	wq->flags = 0;
+	ei->socket.wq = wq;
 
 	ei->socket.state = SS_UNCONNECTED;
 	ei->socket.flags = 0;
@@ -271,6 +278,7 @@ static void sock_free_inode(struct inode *inode)
 	struct socket_alloc *ei;
 
 	ei = container_of(inode, struct socket_alloc, vfs_inode);
+	kfree(ei->socket.wq);
 	kmem_cache_free(sock_inode_cachep, ei);
 }
 
@@ -610,7 +618,7 @@ static void __sock_release(struct socket *sock, struct inode *inode)
 		module_put(owner);
 	}
 
-	if (sock->wq.fasync_list)
+	if (sock->wq->fasync_list)
 		pr_err("%s: fasync list not empty!\n", __func__);
 
 	if (!sock->file) {
@@ -1299,12 +1307,13 @@ static int sock_fasync(int fd, struct file *filp, int on)
 {
 	struct socket *sock = filp->private_data;
 	struct sock *sk = sock->sk;
-	struct socket_wq *wq = &sock->wq;
+	struct socket_wq *wq;
 
 	if (sk == NULL)
 		return -EINVAL;
 
 	lock_sock(sk);
+	wq = sock->wq;
 	fasync_helper(fd, filp, on, &wq->fasync_list);
 
 	if (!wq->fasync_list)
-- 
2.17.1
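
The commit message refers to user programs that intensively repeat socket allocation and deallocation. The following user-space sketch shows roughly what such a workload might look like. It is not part of the patch and not the reproducer used by the author: the helper name churn_sockets(), the choice of AF_UNIX/SOCK_STREAM, and the iteration count are all assumptions made for illustration. Each socket()/close() pair makes the kernel allocate and later release one 'socket_alloc'; when the release is deferred through RCU, freed objects can accumulate faster than they are reclaimed.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): open and immediately close
 * `iterations` sockets. Returns the number of completed open/close cycles.
 * AF_UNIX is an arbitrary choice; any socket family allocates a
 * 'socket_alloc' in the kernel.
 */
static int churn_sockets(int iterations)
{
	int done = 0;

	assert(iterations >= 0);
	for (int i = 0; i < iterations; i++) {
		int fd = socket(AF_UNIX, SOCK_STREAM, 0);

		if (fd < 0)
			break;	/* e.g. EMFILE or ENOMEM: stop early */
		if (close(fd) == 0)
			done++;
	}
	return done;
}
```

Calling churn_sockets() in a tight loop from main() while watching kernel slab usage (for example via /proc/slabinfo) is one possible way to observe the effect; whether memory pressure actually appears depends on the kernel version and the machine.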