Received: by 2002:a05:7412:b995:b0:f9:9502:5bb8 with SMTP id it21csp6286181rdb; Mon, 1 Jan 2024 17:10:00 -0800 (PST) X-Google-Smtp-Source: AGHT+IFJ8dmZNnM8x5RZVUTD6nkq6XjbUPdT1hRH1ga818ZohWjknCDqA6je05WIdPsLp+YHcX8O X-Received: by 2002:a17:90a:d987:b0:28c:825e:2cf4 with SMTP id d7-20020a17090ad98700b0028c825e2cf4mr5373410pjv.41.1704157800169; Mon, 01 Jan 2024 17:10:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1704157800; cv=none; d=google.com; s=arc-20160816; b=aE34skUNc5Xa5IwpQN6W/NOS+huqa2J7kFB1bjugOFQ+lJuv6KztZwQObmyNoTqVUH YQGcCpfXsNQxCCutJIReTUQDZyjFIvDdq0i4Z7Oec3LsphXm8kjo+cEdvY0OZTAWKxzR z/KhSxLnK+gXEzGN/0pyxXWbFhHNyxqMTSXBSdm1KCVnkeSf6NIhOf1IY9MRwjI9oD+m kbF+eIMabaOKMpb1IOe1ALJXZT/L8Vgx23D+hrEOvfPG2EcyWo7K6mMwoKcv6O3ideut attgCBtuUjQzyNWRytF+9jnfZGzvRZuXb/oWREZBBRaiXLPusTAI57FiKQ6sZPXWk9WZ RsFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:list-unsubscribe:list-subscribe:list-id :precedence:date:message-id; bh=KsnRjfzApOdeFNM5KfBIlUGTxd7WDYYKae96+a4bfn4=; fh=edFWOogpuT73DPSs4b8OhQEcoxf0Uezvsyb15fm2eus=; b=xRy2KY+MBhKJ0jzaqgugptVBuAW0Q9cM2j34bX6XAxKsCWJo0XoLOZvp7RLR9RXV5h hue7AU+IDee0+YvCyQ+JaZJmunU76GHxPoegHpCg71sZfgOOAy+INOuWxc3zNq7/9pZH iwC0vu7PQjhScEhE8saokvLZqoOl+b+rUsLCDakes7IX4iCE5YEwkJmVeWBMRXsXFYXh s1GcrMkgy4E5Db/CbLD6pvxnr7mAaD2ddIM3yPAfpfIwEemXM8aWEhV/brW5vnWN8VmP 6S9BGhHhJ9FfMe+ovGSFxZyXqqJHl8wDGi1cIAvdq2M5sx1HSZuC3cgt+P8ruYHBhTTw WpLg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-13965-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-13965-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id k15-20020a17090a590f00b0028bf69208a9si18402556pji.86.2024.01.01.17.09.59 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 01 Jan 2024 17:10:00 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-13965-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-13965-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-13965-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 13CBEB211E7 for ; Tue, 2 Jan 2024 01:09:58 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 9610EA4D; Tue, 2 Jan 2024 01:09:47 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0A87238F; Tue, 2 Jan 2024 01:09:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.163.174]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4T3vrG2VwjzsV2R; Tue, 2 Jan 2024 09:09:02 +0800 (CST) Received: from dggpeml500026.china.huawei.com (unknown [7.185.36.106]) by mail.maildlp.com (Postfix) with ESMTPS id D747914011A; Tue, 2 Jan 2024 09:09:35 +0800 (CST) Received: from [10.174.178.66] (10.174.178.66) by dggpeml500026.china.huawei.com (7.185.36.106) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Tue, 2 Jan 2024 09:09:35 +0800 Message-ID: Date: Tue, 2 Jan 2024 09:09:34 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.0.2 Subject: Re: [PATCH v2] ipc/mqueue: fix potential sleeping issue in mqueue_flush_file To: , CC: , , , , , , , , , , References: <20231220021208.2634523-1-shaozhengchao@huawei.com> From: shaozhengchao In-Reply-To: <20231220021208.2634523-1-shaozhengchao@huawei.com> Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 7bit X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To dggpeml500026.china.huawei.com (7.185.36.106) +ping Does anyone have ideas with this patch? On 2023/12/20 10:12, Zhengchao Shao wrote: > I analyze the potential sleeping issue of the following processes: > Thread A Thread B > ... netlink_create //ref = 1 > do_mq_notify ... > sock = netlink_getsockbyfilp ... //ref = 2 > info->notify_sock = sock; ... > ... netlink_sendmsg > ... skb = netlink_alloc_large_skb //skb->head is vmalloced > ... netlink_unicast > ... sk = netlink_getsockbyportid //ref = 3 > ... netlink_sendskb > ... __netlink_sendskb > ... skb_queue_tail //put skb to sk_receive_queue > ... sock_put //ref = 2 > ... ... > ... netlink_release > ... deferred_put_nlk_sk //ref = 1 > mqueue_flush_file > spin_lock > remove_notification > netlink_sendskb > sock_put //ref = 0 > sk_free > ... > __sk_destruct > netlink_sock_destruct > skb_queue_purge //get skb from sk_receive_queue > ... > __skb_queue_purge_reason > kfree_skb_reason > __kfree_skb > ... > skb_release_all > skb_release_head_state > netlink_skb_destructor > vfree(skb->head) //sleeping while holding spinlock > > In netlink_sendmsg, if the memory pointed to by skb->head is allocated by > vmalloc, and is put to sk_receive_queue queue, also the skb is not freed. > When the mqueue executes flush, the sleeping bug will occur. Use mutex > lock instead of spin lock in mqueue_flush_file. > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") > Signed-off-by: Zhengchao Shao > --- > v2: CCed some networking maintainer & netdev list > --- > ipc/mqueue.c | 48 ++++++++++++++++++++++++------------------------ > 1 file changed, 24 insertions(+), 24 deletions(-) > > diff --git a/ipc/mqueue.c b/ipc/mqueue.c > index 5eea4dc0509e..f6f92e3f82e4 100644 > --- a/ipc/mqueue.c > +++ b/ipc/mqueue.c > @@ -118,9 +118,9 @@ struct posix_msg_tree_node { > * Solution: use _release and _acquire barriers. > * > * 3) There is intentionally no barrier when setting current->state > - * to TASK_INTERRUPTIBLE: spin_unlock(&info->lock) provides the > + * to TASK_INTERRUPTIBLE: mutex_unlock(&info->lock) provides the > * release memory barrier, and the wakeup is triggered when holding > - * info->lock, i.e. spin_lock(&info->lock) provided a pairing > + * info->lock, i.e. mutex_lock(&info->lock) provided a pairing > * acquire memory barrier. > */ > > @@ -132,7 +132,7 @@ struct ext_wait_queue { /* queue of sleeping tasks */ > }; > > struct mqueue_inode_info { > - spinlock_t lock; > + struct mutex lock; > struct inode vfs_inode; > wait_queue_head_t wait_q; > > @@ -312,7 +312,7 @@ static struct inode *mqueue_get_inode(struct super_block *sb, > inode->i_size = FILENT_SIZE; > /* mqueue specific info */ > info = MQUEUE_I(inode); > - spin_lock_init(&info->lock); > + mutex_init(&info->lock); > init_waitqueue_head(&info->wait_q); > INIT_LIST_HEAD(&info->e_wait_q[0].list); > INIT_LIST_HEAD(&info->e_wait_q[1].list); > @@ -523,11 +523,11 @@ static void mqueue_evict_inode(struct inode *inode) > > ipc_ns = get_ns_from_inode(inode); > info = MQUEUE_I(inode); > - spin_lock(&info->lock); > + mutex_lock(&info->lock); > while ((msg = msg_get(info)) != NULL) > list_add_tail(&msg->m_list, &tmp_msg); > kfree(info->node_cache); > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > > list_for_each_entry_safe(msg, nmsg, &tmp_msg, m_list) { > list_del(&msg->m_list); > @@ -640,7 +640,7 @@ static ssize_t mqueue_read_file(struct file *filp, char __user *u_data, > char buffer[FILENT_SIZE]; > ssize_t ret; > > - spin_lock(&info->lock); > + mutex_lock(&info->lock); > snprintf(buffer, sizeof(buffer), > "QSIZE:%-10lu NOTIFY:%-5d SIGNO:%-5d NOTIFY_PID:%-6d\n", > info->qsize, > @@ -649,7 +649,7 @@ static ssize_t mqueue_read_file(struct file *filp, char __user *u_data, > info->notify.sigev_notify == SIGEV_SIGNAL) ? > info->notify.sigev_signo : 0, > pid_vnr(info->notify_owner)); > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > buffer[sizeof(buffer)-1] = '\0'; > > ret = simple_read_from_buffer(u_data, count, off, buffer, > @@ -665,11 +665,11 @@ static int mqueue_flush_file(struct file *filp, fl_owner_t id) > { > struct mqueue_inode_info *info = MQUEUE_I(file_inode(filp)); > > - spin_lock(&info->lock); > + mutex_lock(&info->lock); > if (task_tgid(current) == info->notify_owner) > remove_notification(info); > > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > return 0; > } > > @@ -680,13 +680,13 @@ static __poll_t mqueue_poll_file(struct file *filp, struct poll_table_struct *po > > poll_wait(filp, &info->wait_q, poll_tab); > > - spin_lock(&info->lock); > + mutex_lock(&info->lock); > if (info->attr.mq_curmsgs) > retval = EPOLLIN | EPOLLRDNORM; > > if (info->attr.mq_curmsgs < info->attr.mq_maxmsg) > retval |= EPOLLOUT | EPOLLWRNORM; > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > > return retval; > } > @@ -724,7 +724,7 @@ static int wq_sleep(struct mqueue_inode_info *info, int sr, > /* memory barrier not required, we hold info->lock */ > __set_current_state(TASK_INTERRUPTIBLE); > > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > time = schedule_hrtimeout_range_clock(timeout, 0, > HRTIMER_MODE_ABS, CLOCK_REALTIME); > > @@ -734,7 +734,7 @@ static int wq_sleep(struct mqueue_inode_info *info, int sr, > retval = 0; > goto out; > } > - spin_lock(&info->lock); > + mutex_lock(&info->lock); > > /* we hold info->lock, so no memory barrier required */ > if (READ_ONCE(ewp->state) == STATE_READY) { > @@ -752,7 +752,7 @@ static int wq_sleep(struct mqueue_inode_info *info, int sr, > } > list_del(&ewp->list); > out_unlock: > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > out: > return retval; > } > @@ -1125,7 +1125,7 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr, > if (!info->node_cache) > new_leaf = kmalloc(sizeof(*new_leaf), GFP_KERNEL); > > - spin_lock(&info->lock); > + mutex_lock(&info->lock); > > if (!info->node_cache && new_leaf) { > /* Save our speculative allocation into the cache */ > @@ -1166,7 +1166,7 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr, > simple_inode_init_ts(inode); > } > out_unlock: > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > wake_up_q(&wake_q); > out_free: > if (ret) > @@ -1230,7 +1230,7 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr, > if (!info->node_cache) > new_leaf = kmalloc(sizeof(*new_leaf), GFP_KERNEL); > > - spin_lock(&info->lock); > + mutex_lock(&info->lock); > > if (!info->node_cache && new_leaf) { > /* Save our speculative allocation into the cache */ > @@ -1242,7 +1242,7 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr, > > if (info->attr.mq_curmsgs == 0) { > if (f.file->f_flags & O_NONBLOCK) { > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > ret = -EAGAIN; > } else { > wait.task = current; > @@ -1261,7 +1261,7 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr, > > /* There is now free space in queue. */ > pipelined_receive(&wake_q, info); > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > wake_up_q(&wake_q); > ret = 0; > } > @@ -1391,7 +1391,7 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification) > info = MQUEUE_I(inode); > > ret = 0; > - spin_lock(&info->lock); > + mutex_lock(&info->lock); > if (notification == NULL) { > if (info->notify_owner == task_tgid(current)) { > remove_notification(info); > @@ -1424,7 +1424,7 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification) > info->notify_user_ns = get_user_ns(current_user_ns()); > inode_set_atime_to_ts(inode, inode_set_ctime_current(inode)); > } > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > out_fput: > fdput(f); > out: > @@ -1470,7 +1470,7 @@ static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old) > inode = file_inode(f.file); > info = MQUEUE_I(inode); > > - spin_lock(&info->lock); > + mutex_lock(&info->lock); > > if (old) { > *old = info->attr; > @@ -1488,7 +1488,7 @@ static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old) > inode_set_atime_to_ts(inode, inode_set_ctime_current(inode)); > } > > - spin_unlock(&info->lock); > + mutex_unlock(&info->lock); > fdput(f); > return 0; > }