Subject: Re: [PATCH v3 0/4] Implement FUTEX_WAIT_MULTIPLE operation
From: André Almeida
To: Hillf Danton
Cc: linux-kernel@vger.kernel.org, kernel@collabora.com
Date: Thu, 5 Mar 2020 11:48:07 -0300
In-Reply-To: <20200229105130.15436-1-hdanton@sina.com>
References: <20200229105130.15436-1-hdanton@sina.com>

On 2/29/20 7:51 AM, Hillf Danton wrote:
>
> On Thu, 13 Feb 2020 18:45:21 -0300 Andre Almeida wrote:
>>
>> When a write or a read operation on an eventfd file succeeds, it
>> will try to wake up all threads that are waiting to perform some
>> operation on the file. The lock (ctx->wqh.lock) that guards access
>> to the file value (ctx->count) is the same lock used to control
>> access to the waitqueue. When all of those threads wake up, they
>> will compete to get this lock. On top of that, poll() also
>> manipulates the waitqueue and needs to hold this same lock. This
>> lock is especially hard to acquire when poll() calls
>> poll_freewait(), where it tries to free all waitqueues associated
>> with this poll. While doing that, it will compete with a lot of
>> read and write operations that have just been woken.
>
> I want to know if an rwsem is likely to help mitigate the tension
> between the readers and writers in your workloads.
>

Thanks for the suggestion, Hillf. However, keep in mind that the lock
that is causing the tension (ctx->wqh.lock) is shared between the
eventfd read/write code and the poll() syscall, so a solution that
mitigates the tension would need to cover both code paths. And since
wqh.lock isn't a read/write lock, even if a lot of readers manage to
take the rwsem in parallel, they will still compete for wqh.lock
itself. (A userspace sketch of this contention pattern follows at the
end of this mail.)

> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -40,6 +40,7 @@ struct eventfd_ctx {
>  	__u64 count;
>  	unsigned int flags;
>  	int id;
> +	struct rw_semaphore rwsem;
>  };
>  
>  /**
> @@ -212,6 +213,8 @@ static ssize_t eventfd_read(struct file
>  	if (count < sizeof(ucnt))
>  		return -EINVAL;
>  
> +	/* take sem in write mode for event read */
> +	down_write(&ctx->rwsem);
>  	spin_lock_irq(&ctx->wqh.lock);
>  	res = -EAGAIN;
>  	if (ctx->count > 0)
> @@ -229,7 +232,9 @@ static ssize_t eventfd_read(struct file
>  				break;
>  			}
>  			spin_unlock_irq(&ctx->wqh.lock);
> +			up_write(&ctx->rwsem);
>  			schedule();
> +			down_write(&ctx->rwsem);
>  			spin_lock_irq(&ctx->wqh.lock);
>  		}
>  		__remove_wait_queue(&ctx->wqh, &wait);
> @@ -241,6 +246,7 @@ static ssize_t eventfd_read(struct file
>  			wake_up_locked_poll(&ctx->wqh, EPOLLOUT);
>  	}
>  	spin_unlock_irq(&ctx->wqh.lock);
> +	up_write(&ctx->rwsem);
>  
>  	if (res > 0 && put_user(ucnt, (__u64 __user *)buf))
>  		return -EFAULT;
> @@ -262,6 +268,8 @@ static ssize_t eventfd_write(struct file
>  		return -EFAULT;
>  	if (ucnt == ULLONG_MAX)
>  		return -EINVAL;
> +	/* take sem in read mode for event write */
> +	down_read(&ctx->rwsem);
>  	spin_lock_irq(&ctx->wqh.lock);
>  	res = -EAGAIN;
>  	if (ULLONG_MAX - ctx->count > ucnt)
> @@ -279,7 +287,9 @@ static ssize_t eventfd_write(struct file
>  				break;
>  			}
>  			spin_unlock_irq(&ctx->wqh.lock);
> +			up_read(&ctx->rwsem);
>  			schedule();
> +			down_read(&ctx->rwsem);
>  			spin_lock_irq(&ctx->wqh.lock);
>  		}
>  		__remove_wait_queue(&ctx->wqh, &wait);
> @@ -291,6 +301,7 @@ static ssize_t eventfd_write(struct file
>  			wake_up_locked_poll(&ctx->wqh, EPOLLIN);
>  	}
>  	spin_unlock_irq(&ctx->wqh.lock);
> +	up_read(&ctx->rwsem);
>  
>  	return res;
>  }
> @@ -408,6 +419,7 @@ static int do_eventfd(unsigned int count
>  	init_waitqueue_head(&ctx->wqh);
>  	ctx->count = count;
>  	ctx->flags = flags;
> +	init_rwsem(&ctx->rwsem);
>  	ctx->id = ida_simple_get(&eventfd_ida, 0, 0, GFP_KERNEL);
>  
>  	fd = anon_inode_getfd("[eventfd]", &eventfd_fops, ctx,
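
For reference, here is a minimal userspace sketch of the contention
pattern described above: many threads block reading an eventfd while a
writer signals it and another thread poll()s the same fd, so every
wakeup, every poll_wait() registration and every poll_freewait()
teardown funnels through the single ctx->wqh.lock. This is only an
illustrative, hypothetical reproducer with arbitrary thread and
iteration counts, not the benchmark behind the numbers discussed in
this thread:

/* repro.c - hypothetical eventfd/poll contention sketch.
 * Build: cc -O2 -pthread repro.c -o repro */
#include <sys/eventfd.h>
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define READERS 64
#define ROUNDS  10000

static int efd;

static void *reader(void *arg)
{
	uint64_t val;
	int i;

	for (i = 0; i < ROUNDS; i++)
		/* Blocks on ctx->wqh; every wakeup retakes ctx->wqh.lock. */
		if (read(efd, &val, sizeof(val)) != sizeof(val))
			break;
	return NULL;
}

static void *poller(void *arg)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	int i;

	/* Each poll() adds and then frees a waitqueue entry under
	 * the same ctx->wqh.lock the readers and the writer take. */
	for (i = 0; i < ROUNDS; i++)
		poll(&pfd, 1, 1);
	return NULL;
}

int main(void)
{
	pthread_t t[READERS + 1];
	uint64_t one = 1;
	long i;

	/* EFD_SEMAPHORE: each read consumes exactly one write of 1,
	 * so the reads and writes below balance out exactly. */
	efd = eventfd(0, EFD_SEMAPHORE);
	if (efd < 0) {
		perror("eventfd");
		return 1;
	}

	for (i = 0; i < READERS; i++)
		pthread_create(&t[i], NULL, reader, NULL);
	pthread_create(&t[READERS], NULL, poller, NULL);

	/* Writer: each successful write wakes the waiters on ctx->wqh. */
	for (i = 0; i < (long)READERS * ROUNDS; i++)
		write(efd, &one, sizeof(one));

	for (i = 0; i <= READERS; i++)
		pthread_join(t[i], NULL);
	close(efd);
	return 0;
}

With CONFIG_LOCK_STAT enabled, /proc/lock_stat should show the
contention concentrating on &ctx->wqh.lock while this runs, with or
without an rwsem layered on top of it.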