2023-01-07 01:55:48

by Hongchen Zhang

[permalink] [raw]
Subject: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

Use spinlock in pipe_read/write cost too much time,IMO
pipe->{head,tail} can be protected by __pipe_{lock,unlock}.
On the other hand, we can use __pipe_{lock,unlock} to protect
the pipe->{head,tail} in pipe_resize_ring and
post_one_notification.

Reminded by Matthew, I tested this patch using UnixBench's pipe
test case on a x86_64 machine,and get the following data:
1) before this patch
System Benchmarks Partial Index BASELINE RESULT INDEX
Pipe Throughput 12440.0 493023.3 396.3
========
System Benchmarks Index Score (Partial Only) 396.3

2) after this patch
System Benchmarks Partial Index BASELINE RESULT INDEX
Pipe Throughput 12440.0 507551.4 408.0
========
System Benchmarks Index Score (Partial Only) 408.0

so we get ~3% speedup.

Reminded by Andrew, I tested this patch with the test code in
Linus's 0ddad21d3e99 add get following result:
1) before this patch
13,136.54 msec task-clock # 3.870 CPUs utilized
1,186,779 context-switches # 90.342 K/sec
668,867 cpu-migrations # 50.917 K/sec
895 page-faults # 68.131 /sec
29,875,711,543 cycles # 2.274 GHz
12,372,397,462 instructions # 0.41 insn per cycle
2,480,235,723 branches # 188.804 M/sec
47,191,943 branch-misses # 1.90% of all branches

3.394806886 seconds time elapsed

0.037869000 seconds user
0.189346000 seconds sys

2) after this patch

12,395.63 msec task-clock # 4.138 CPUs utilized
1,193,381 context-switches # 96.274 K/sec
585,543 cpu-migrations # 47.238 K/sec
1,063 page-faults # 85.756 /sec
27,691,587,226 cycles # 2.234 GHz
11,738,307,999 instructions # 0.42 insn per cycle
2,351,299,522 branches # 189.688 M/sec
45,404,526 branch-misses # 1.93% of all branches

2.995280878 seconds time elapsed

0.010615000 seconds user
0.206999000 seconds sys
After adding this patch, the time used on this test program becomes less.

Signed-off-by: Hongchen Zhang <[email protected]>

v3:
- fixes the error reported by kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-lkp/[email protected]
- add perf stat data for the test code in Linus's 0ddad21d3e99 in
commit message.
v2:
- add UnixBench test data in commit message
- fixes the test error reported by kernel test robot <[email protected]>
by adding the missing fs.h header file.
---
fs/pipe.c | 22 +---------------------
include/linux/pipe_fs_i.h | 12 ++++++++++++
kernel/watch_queue.c | 8 ++++----
3 files changed, 17 insertions(+), 25 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 42c7ff41c2db..4355ee5f754e 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -98,16 +98,6 @@ void pipe_unlock(struct pipe_inode_info *pipe)
}
EXPORT_SYMBOL(pipe_unlock);

-static inline void __pipe_lock(struct pipe_inode_info *pipe)
-{
- mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
-}
-
-static inline void __pipe_unlock(struct pipe_inode_info *pipe)
-{
- mutex_unlock(&pipe->mutex);
-}
-
void pipe_double_lock(struct pipe_inode_info *pipe1,
struct pipe_inode_info *pipe2)
{
@@ -253,8 +243,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
*/
was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
for (;;) {
- /* Read ->head with a barrier vs post_one_notification() */
- unsigned int head = smp_load_acquire(&pipe->head);
+ unsigned int head = pipe->head;
unsigned int tail = pipe->tail;
unsigned int mask = pipe->ring_size - 1;

@@ -322,14 +311,12 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)

if (!buf->len) {
pipe_buf_release(pipe, buf);
- spin_lock_irq(&pipe->rd_wait.lock);
#ifdef CONFIG_WATCH_QUEUE
if (buf->flags & PIPE_BUF_FLAG_LOSS)
pipe->note_loss = true;
#endif
tail++;
pipe->tail = tail;
- spin_unlock_irq(&pipe->rd_wait.lock);
}
total_len -= chars;
if (!total_len)
@@ -506,16 +493,13 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
* it, either the reader will consume it or it'll still
* be there for the next write.
*/
- spin_lock_irq(&pipe->rd_wait.lock);

head = pipe->head;
if (pipe_full(head, pipe->tail, pipe->max_usage)) {
- spin_unlock_irq(&pipe->rd_wait.lock);
continue;
}

pipe->head = head + 1;
- spin_unlock_irq(&pipe->rd_wait.lock);

/* Insert it into the buffer array */
buf = &pipe->bufs[head & mask];
@@ -1260,14 +1244,12 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
if (unlikely(!bufs))
return -ENOMEM;

- spin_lock_irq(&pipe->rd_wait.lock);
mask = pipe->ring_size - 1;
head = pipe->head;
tail = pipe->tail;

n = pipe_occupancy(head, tail);
if (nr_slots < n) {
- spin_unlock_irq(&pipe->rd_wait.lock);
kfree(bufs);
return -EBUSY;
}
@@ -1303,8 +1285,6 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
pipe->tail = tail;
pipe->head = head;

- spin_unlock_irq(&pipe->rd_wait.lock);
-
/* This might have made more room for writers */
wake_up_interruptible(&pipe->wr_wait);
return 0;
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 6cb65df3e3ba..f5084daf6eaf 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -2,6 +2,8 @@
#ifndef _LINUX_PIPE_FS_I_H
#define _LINUX_PIPE_FS_I_H

+#include <linux/fs.h>
+
#define PIPE_DEF_BUFFERS 16

#define PIPE_BUF_FLAG_LRU 0x01 /* page is on the LRU */
@@ -223,6 +225,16 @@ static inline void pipe_discard_from(struct pipe_inode_info *pipe,
#define PIPE_SIZE PAGE_SIZE

/* Pipe lock and unlock operations */
+static inline void __pipe_lock(struct pipe_inode_info *pipe)
+{
+ mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
+}
+
+static inline void __pipe_unlock(struct pipe_inode_info *pipe)
+{
+ mutex_unlock(&pipe->mutex);
+}
+
void pipe_lock(struct pipe_inode_info *);
void pipe_unlock(struct pipe_inode_info *);
void pipe_double_lock(struct pipe_inode_info *, struct pipe_inode_info *);
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index a6f9bdd956c3..92e46cfe9419 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -108,7 +108,7 @@ static bool post_one_notification(struct watch_queue *wqueue,
if (!pipe)
return false;

- spin_lock_irq(&pipe->rd_wait.lock);
+ __pipe_lock(pipe);

mask = pipe->ring_size - 1;
head = pipe->head;
@@ -135,17 +135,17 @@ static bool post_one_notification(struct watch_queue *wqueue,
buf->offset = offset;
buf->len = len;
buf->flags = PIPE_BUF_FLAG_WHOLE;
- smp_store_release(&pipe->head, head + 1); /* vs pipe_read() */
+ pipe->head = head + 1;

if (!test_and_clear_bit(note, wqueue->notes_bitmap)) {
- spin_unlock_irq(&pipe->rd_wait.lock);
+ __pipe_unlock(pipe);
BUG();
}
wake_up_interruptible_sync_poll_locked(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
done = true;

out:
- spin_unlock_irq(&pipe->rd_wait.lock);
+ __pipe_unlock(pipe);
if (done)
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
return done;

base-commit: c8451c141e07a8d05693f6c8d0e418fbb4b68bb7
--
2.34.1


2023-01-13 03:29:03

by Hongchen Zhang

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

Hi All,
any question about this patch, can it be merged?

Thanks
On 2023/1/7 am 9:23, Hongchen Zhang wrote:
> Use spinlock in pipe_read/write cost too much time,IMO
> pipe->{head,tail} can be protected by __pipe_{lock,unlock}.
> On the other hand, we can use __pipe_{lock,unlock} to protect
> the pipe->{head,tail} in pipe_resize_ring and
> post_one_notification.
>
> Reminded by Matthew, I tested this patch using UnixBench's pipe
> test case on a x86_64 machine,and get the following data:
> 1) before this patch
> System Benchmarks Partial Index BASELINE RESULT INDEX
> Pipe Throughput 12440.0 493023.3 396.3
> ========
> System Benchmarks Index Score (Partial Only) 396.3
>
> 2) after this patch
> System Benchmarks Partial Index BASELINE RESULT INDEX
> Pipe Throughput 12440.0 507551.4 408.0
> ========
> System Benchmarks Index Score (Partial Only) 408.0
>
> so we get ~3% speedup.
>
> Reminded by Andrew, I tested this patch with the test code in
> Linus's 0ddad21d3e99 add get following result:
> 1) before this patch
> 13,136.54 msec task-clock # 3.870 CPUs utilized
> 1,186,779 context-switches # 90.342 K/sec
> 668,867 cpu-migrations # 50.917 K/sec
> 895 page-faults # 68.131 /sec
> 29,875,711,543 cycles # 2.274 GHz
> 12,372,397,462 instructions # 0.41 insn per cycle
> 2,480,235,723 branches # 188.804 M/sec
> 47,191,943 branch-misses # 1.90% of all branches
>
> 3.394806886 seconds time elapsed
>
> 0.037869000 seconds user
> 0.189346000 seconds sys
>
> 2) after this patch
>
> 12,395.63 msec task-clock # 4.138 CPUs utilized
> 1,193,381 context-switches # 96.274 K/sec
> 585,543 cpu-migrations # 47.238 K/sec
> 1,063 page-faults # 85.756 /sec
> 27,691,587,226 cycles # 2.234 GHz
> 11,738,307,999 instructions # 0.42 insn per cycle
> 2,351,299,522 branches # 189.688 M/sec
> 45,404,526 branch-misses # 1.93% of all branches
>
> 2.995280878 seconds time elapsed
>
> 0.010615000 seconds user
> 0.206999000 seconds sys
> After adding this patch, the time used on this test program becomes less.
>
> Signed-off-by: Hongchen Zhang <[email protected]>
>
> v3:
> - fixes the error reported by kernel test robot <[email protected]>
> Link: https://lore.kernel.org/oe-lkp/[email protected]
> - add perf stat data for the test code in Linus's 0ddad21d3e99 in
> commit message.
> v2:
> - add UnixBench test data in commit message
> - fixes the test error reported by kernel test robot <[email protected]>
> by adding the missing fs.h header file.
> ---
> fs/pipe.c | 22 +---------------------
> include/linux/pipe_fs_i.h | 12 ++++++++++++
> kernel/watch_queue.c | 8 ++++----
> 3 files changed, 17 insertions(+), 25 deletions(-)
>
> diff --git a/fs/pipe.c b/fs/pipe.c
> index 42c7ff41c2db..4355ee5f754e 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -98,16 +98,6 @@ void pipe_unlock(struct pipe_inode_info *pipe)
> }
> EXPORT_SYMBOL(pipe_unlock);
>
> -static inline void __pipe_lock(struct pipe_inode_info *pipe)
> -{
> - mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
> -}
> -
> -static inline void __pipe_unlock(struct pipe_inode_info *pipe)
> -{
> - mutex_unlock(&pipe->mutex);
> -}
> -
> void pipe_double_lock(struct pipe_inode_info *pipe1,
> struct pipe_inode_info *pipe2)
> {
> @@ -253,8 +243,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
> */
> was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
> for (;;) {
> - /* Read ->head with a barrier vs post_one_notification() */
> - unsigned int head = smp_load_acquire(&pipe->head);
> + unsigned int head = pipe->head;
> unsigned int tail = pipe->tail;
> unsigned int mask = pipe->ring_size - 1;
>
> @@ -322,14 +311,12 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>
> if (!buf->len) {
> pipe_buf_release(pipe, buf);
> - spin_lock_irq(&pipe->rd_wait.lock);
> #ifdef CONFIG_WATCH_QUEUE
> if (buf->flags & PIPE_BUF_FLAG_LOSS)
> pipe->note_loss = true;
> #endif
> tail++;
> pipe->tail = tail;
> - spin_unlock_irq(&pipe->rd_wait.lock);
> }
> total_len -= chars;
> if (!total_len)
> @@ -506,16 +493,13 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
> * it, either the reader will consume it or it'll still
> * be there for the next write.
> */
> - spin_lock_irq(&pipe->rd_wait.lock);
>
> head = pipe->head;
> if (pipe_full(head, pipe->tail, pipe->max_usage)) {
> - spin_unlock_irq(&pipe->rd_wait.lock);
> continue;
> }
>
> pipe->head = head + 1;
> - spin_unlock_irq(&pipe->rd_wait.lock);
>
> /* Insert it into the buffer array */
> buf = &pipe->bufs[head & mask];
> @@ -1260,14 +1244,12 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
> if (unlikely(!bufs))
> return -ENOMEM;
>
> - spin_lock_irq(&pipe->rd_wait.lock);
> mask = pipe->ring_size - 1;
> head = pipe->head;
> tail = pipe->tail;
>
> n = pipe_occupancy(head, tail);
> if (nr_slots < n) {
> - spin_unlock_irq(&pipe->rd_wait.lock);
> kfree(bufs);
> return -EBUSY;
> }
> @@ -1303,8 +1285,6 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
> pipe->tail = tail;
> pipe->head = head;
>
> - spin_unlock_irq(&pipe->rd_wait.lock);
> -
> /* This might have made more room for writers */
> wake_up_interruptible(&pipe->wr_wait);
> return 0;
> diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
> index 6cb65df3e3ba..f5084daf6eaf 100644
> --- a/include/linux/pipe_fs_i.h
> +++ b/include/linux/pipe_fs_i.h
> @@ -2,6 +2,8 @@
> #ifndef _LINUX_PIPE_FS_I_H
> #define _LINUX_PIPE_FS_I_H
>
> +#include <linux/fs.h>
> +
> #define PIPE_DEF_BUFFERS 16
>
> #define PIPE_BUF_FLAG_LRU 0x01 /* page is on the LRU */
> @@ -223,6 +225,16 @@ static inline void pipe_discard_from(struct pipe_inode_info *pipe,
> #define PIPE_SIZE PAGE_SIZE
>
> /* Pipe lock and unlock operations */
> +static inline void __pipe_lock(struct pipe_inode_info *pipe)
> +{
> + mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
> +}
> +
> +static inline void __pipe_unlock(struct pipe_inode_info *pipe)
> +{
> + mutex_unlock(&pipe->mutex);
> +}
> +
> void pipe_lock(struct pipe_inode_info *);
> void pipe_unlock(struct pipe_inode_info *);
> void pipe_double_lock(struct pipe_inode_info *, struct pipe_inode_info *);
> diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
> index a6f9bdd956c3..92e46cfe9419 100644
> --- a/kernel/watch_queue.c
> +++ b/kernel/watch_queue.c
> @@ -108,7 +108,7 @@ static bool post_one_notification(struct watch_queue *wqueue,
> if (!pipe)
> return false;
>
> - spin_lock_irq(&pipe->rd_wait.lock);
> + __pipe_lock(pipe);
>
> mask = pipe->ring_size - 1;
> head = pipe->head;
> @@ -135,17 +135,17 @@ static bool post_one_notification(struct watch_queue *wqueue,
> buf->offset = offset;
> buf->len = len;
> buf->flags = PIPE_BUF_FLAG_WHOLE;
> - smp_store_release(&pipe->head, head + 1); /* vs pipe_read() */
> + pipe->head = head + 1;
>
> if (!test_and_clear_bit(note, wqueue->notes_bitmap)) {
> - spin_unlock_irq(&pipe->rd_wait.lock);
> + __pipe_unlock(pipe);
> BUG();
> }
> wake_up_interruptible_sync_poll_locked(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
> done = true;
>
> out:
> - spin_unlock_irq(&pipe->rd_wait.lock);
> + __pipe_unlock(pipe);
> if (done)
> kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> return done;
>
> base-commit: c8451c141e07a8d05693f6c8d0e418fbb4b68bb7
>

2023-01-13 10:07:29

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

On Fri, Jan 13, 2023 at 4:19 AM Hongchen Zhang
<[email protected]> wrote:
>
> Hi All,
> any question about this patch, can it be merged?
>
> Thanks
> On 2023/1/7 am 9:23, Hongchen Zhang wrote:
> > Use spinlock in pipe_read/write cost too much time,IMO
> > pipe->{head,tail} can be protected by __pipe_{lock,unlock}.
> > On the other hand, we can use __pipe_{lock,unlock} to protect
> > the pipe->{head,tail} in pipe_resize_ring and
> > post_one_notification.
> >
> > Reminded by Matthew, I tested this patch using UnixBench's pipe
> > test case on a x86_64 machine,and get the following data:
> > 1) before this patch
> > System Benchmarks Partial Index BASELINE RESULT INDEX
> > Pipe Throughput 12440.0 493023.3 396.3
> > ========
> > System Benchmarks Index Score (Partial Only) 396.3
> >
> > 2) after this patch
> > System Benchmarks Partial Index BASELINE RESULT INDEX
> > Pipe Throughput 12440.0 507551.4 408.0
> > ========
> > System Benchmarks Index Score (Partial Only) 408.0
> >
> > so we get ~3% speedup.
> >
> > Reminded by Andrew, I tested this patch with the test code in
> > Linus's 0ddad21d3e99 add get following result:

Happy new 2023 Hongchen Zhang,

Thanks for the update and sorry for the late response.

Should be "...s/add/and get following result:"

I cannot say much about the patch itself or tested it in my build-environment.

Best regards,
-Sedat-

> > 1) before this patch
> > 13,136.54 msec task-clock # 3.870 CPUs utilized
> > 1,186,779 context-switches # 90.342 K/sec
> > 668,867 cpu-migrations # 50.917 K/sec
> > 895 page-faults # 68.131 /sec
> > 29,875,711,543 cycles # 2.274 GHz
> > 12,372,397,462 instructions # 0.41 insn per cycle
> > 2,480,235,723 branches # 188.804 M/sec
> > 47,191,943 branch-misses # 1.90% of all branches
> >
> > 3.394806886 seconds time elapsed
> >
> > 0.037869000 seconds user
> > 0.189346000 seconds sys
> >
> > 2) after this patch
> >
> > 12,395.63 msec task-clock # 4.138 CPUs utilized
> > 1,193,381 context-switches # 96.274 K/sec
> > 585,543 cpu-migrations # 47.238 K/sec
> > 1,063 page-faults # 85.756 /sec
> > 27,691,587,226 cycles # 2.234 GHz
> > 11,738,307,999 instructions # 0.42 insn per cycle
> > 2,351,299,522 branches # 189.688 M/sec
> > 45,404,526 branch-misses # 1.93% of all branches
> >
> > 2.995280878 seconds time elapsed
> >
> > 0.010615000 seconds user
> > 0.206999000 seconds sys
> > After adding this patch, the time used on this test program becomes less.
> >
> > Signed-off-by: Hongchen Zhang <[email protected]>
> >
> > v3:
> > - fixes the error reported by kernel test robot <[email protected]>
> > Link: https://lore.kernel.org/oe-lkp/[email protected]
> > - add perf stat data for the test code in Linus's 0ddad21d3e99 in
> > commit message.
> > v2:
> > - add UnixBench test data in commit message
> > - fixes the test error reported by kernel test robot <[email protected]>
> > by adding the missing fs.h header file.
> > ---
> > fs/pipe.c | 22 +---------------------
> > include/linux/pipe_fs_i.h | 12 ++++++++++++
> > kernel/watch_queue.c | 8 ++++----
> > 3 files changed, 17 insertions(+), 25 deletions(-)
> >
> > diff --git a/fs/pipe.c b/fs/pipe.c
> > index 42c7ff41c2db..4355ee5f754e 100644
> > --- a/fs/pipe.c
> > +++ b/fs/pipe.c
> > @@ -98,16 +98,6 @@ void pipe_unlock(struct pipe_inode_info *pipe)
> > }
> > EXPORT_SYMBOL(pipe_unlock);
> >
> > -static inline void __pipe_lock(struct pipe_inode_info *pipe)
> > -{
> > - mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
> > -}
> > -
> > -static inline void __pipe_unlock(struct pipe_inode_info *pipe)
> > -{
> > - mutex_unlock(&pipe->mutex);
> > -}
> > -
> > void pipe_double_lock(struct pipe_inode_info *pipe1,
> > struct pipe_inode_info *pipe2)
> > {
> > @@ -253,8 +243,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
> > */
> > was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
> > for (;;) {
> > - /* Read ->head with a barrier vs post_one_notification() */
> > - unsigned int head = smp_load_acquire(&pipe->head);
> > + unsigned int head = pipe->head;
> > unsigned int tail = pipe->tail;
> > unsigned int mask = pipe->ring_size - 1;
> >
> > @@ -322,14 +311,12 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
> >
> > if (!buf->len) {
> > pipe_buf_release(pipe, buf);
> > - spin_lock_irq(&pipe->rd_wait.lock);
> > #ifdef CONFIG_WATCH_QUEUE
> > if (buf->flags & PIPE_BUF_FLAG_LOSS)
> > pipe->note_loss = true;
> > #endif
> > tail++;
> > pipe->tail = tail;
> > - spin_unlock_irq(&pipe->rd_wait.lock);
> > }
> > total_len -= chars;
> > if (!total_len)
> > @@ -506,16 +493,13 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
> > * it, either the reader will consume it or it'll still
> > * be there for the next write.
> > */
> > - spin_lock_irq(&pipe->rd_wait.lock);
> >
> > head = pipe->head;
> > if (pipe_full(head, pipe->tail, pipe->max_usage)) {
> > - spin_unlock_irq(&pipe->rd_wait.lock);
> > continue;
> > }
> >
> > pipe->head = head + 1;
> > - spin_unlock_irq(&pipe->rd_wait.lock);
> >
> > /* Insert it into the buffer array */
> > buf = &pipe->bufs[head & mask];
> > @@ -1260,14 +1244,12 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
> > if (unlikely(!bufs))
> > return -ENOMEM;
> >
> > - spin_lock_irq(&pipe->rd_wait.lock);
> > mask = pipe->ring_size - 1;
> > head = pipe->head;
> > tail = pipe->tail;
> >
> > n = pipe_occupancy(head, tail);
> > if (nr_slots < n) {
> > - spin_unlock_irq(&pipe->rd_wait.lock);
> > kfree(bufs);
> > return -EBUSY;
> > }
> > @@ -1303,8 +1285,6 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
> > pipe->tail = tail;
> > pipe->head = head;
> >
> > - spin_unlock_irq(&pipe->rd_wait.lock);
> > -
> > /* This might have made more room for writers */
> > wake_up_interruptible(&pipe->wr_wait);
> > return 0;
> > diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
> > index 6cb65df3e3ba..f5084daf6eaf 100644
> > --- a/include/linux/pipe_fs_i.h
> > +++ b/include/linux/pipe_fs_i.h
> > @@ -2,6 +2,8 @@
> > #ifndef _LINUX_PIPE_FS_I_H
> > #define _LINUX_PIPE_FS_I_H
> >
> > +#include <linux/fs.h>
> > +
> > #define PIPE_DEF_BUFFERS 16
> >
> > #define PIPE_BUF_FLAG_LRU 0x01 /* page is on the LRU */
> > @@ -223,6 +225,16 @@ static inline void pipe_discard_from(struct pipe_inode_info *pipe,
> > #define PIPE_SIZE PAGE_SIZE
> >
> > /* Pipe lock and unlock operations */
> > +static inline void __pipe_lock(struct pipe_inode_info *pipe)
> > +{
> > + mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
> > +}
> > +
> > +static inline void __pipe_unlock(struct pipe_inode_info *pipe)
> > +{
> > + mutex_unlock(&pipe->mutex);
> > +}
> > +
> > void pipe_lock(struct pipe_inode_info *);
> > void pipe_unlock(struct pipe_inode_info *);
> > void pipe_double_lock(struct pipe_inode_info *, struct pipe_inode_info *);
> > diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
> > index a6f9bdd956c3..92e46cfe9419 100644
> > --- a/kernel/watch_queue.c
> > +++ b/kernel/watch_queue.c
> > @@ -108,7 +108,7 @@ static bool post_one_notification(struct watch_queue *wqueue,
> > if (!pipe)
> > return false;
> >
> > - spin_lock_irq(&pipe->rd_wait.lock);
> > + __pipe_lock(pipe);
> >
> > mask = pipe->ring_size - 1;
> > head = pipe->head;
> > @@ -135,17 +135,17 @@ static bool post_one_notification(struct watch_queue *wqueue,
> > buf->offset = offset;
> > buf->len = len;
> > buf->flags = PIPE_BUF_FLAG_WHOLE;
> > - smp_store_release(&pipe->head, head + 1); /* vs pipe_read() */
> > + pipe->head = head + 1;
> >
> > if (!test_and_clear_bit(note, wqueue->notes_bitmap)) {
> > - spin_unlock_irq(&pipe->rd_wait.lock);
> > + __pipe_unlock(pipe);
> > BUG();
> > }
> > wake_up_interruptible_sync_poll_locked(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
> > done = true;
> >
> > out:
> > - spin_unlock_irq(&pipe->rd_wait.lock);
> > + __pipe_unlock(pipe);
> > if (done)
> > kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
> > return done;
> >
> > base-commit: c8451c141e07a8d05693f6c8d0e418fbb4b68bb7
> >
>

2023-01-16 02:05:56

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

On Fri, Jan 13, 2023 at 10:32 AM Sedat Dilek <[email protected]> wrote:
>
> On Fri, Jan 13, 2023 at 4:19 AM Hongchen Zhang
> <[email protected]> wrote:
> >
> > Hi All,
> > any question about this patch, can it be merged?
> >
> > Thanks
> > On 2023/1/7 am 9:23, Hongchen Zhang wrote:
> > > Use spinlock in pipe_read/write cost too much time,IMO
> > > pipe->{head,tail} can be protected by __pipe_{lock,unlock}.
> > > On the other hand, we can use __pipe_{lock,unlock} to protect
> > > the pipe->{head,tail} in pipe_resize_ring and
> > > post_one_notification.
> > >
> > > Reminded by Matthew, I tested this patch using UnixBench's pipe
> > > test case on a x86_64 machine,and get the following data:
> > > 1) before this patch
> > > System Benchmarks Partial Index BASELINE RESULT INDEX
> > > Pipe Throughput 12440.0 493023.3 396.3
> > > ========
> > > System Benchmarks Index Score (Partial Only) 396.3
> > >
> > > 2) after this patch
> > > System Benchmarks Partial Index BASELINE RESULT INDEX
> > > Pipe Throughput 12440.0 507551.4 408.0
> > > ========
> > > System Benchmarks Index Score (Partial Only) 408.0
> > >
> > > so we get ~3% speedup.
> > >
> > > Reminded by Andrew, I tested this patch with the test code in
> > > Linus's 0ddad21d3e99 add get following result:
>
> Happy new 2023 Hongchen Zhang,
>
> Thanks for the update and sorry for the late response.
>
> Should be "...s/add/and get following result:"
>
> I cannot say much about the patch itself or tested it in my build-environment.
>
> Best regards,
> -Sedat-
>

I have applied v3 on top of Linux v6.2-rc4.

Used pipebench for a quick testing.

# fdisk -l /dev/sdb
Disk /dev/sdb: 14,91 GiB, 16013942784 bytes, 31277232 sectors
Disk model: SanDisk iSSD P4
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x74f02dea

Device Boot Start End Sectors Size Id Type
/dev/sdb1 2048 31277231 31275184 14,9G 83 Linux

# cat /dev/sdb | pipebench > /dev/null
Summary:
Piped 14.91 GB in 00h01m34.20s: 162.12 MB/second

Not tested/benchmarked with the kernel w/o your patch.

-Sedat-

2023-01-16 02:26:48

by Hongchen Zhang

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

Hi sedat,


On 2023/1/16 am9:52, Sedat Dilek wrote:
> On Fri, Jan 13, 2023 at 10:32 AM Sedat Dilek <[email protected]> wrote:
>>
>> On Fri, Jan 13, 2023 at 4:19 AM Hongchen Zhang
>> <[email protected]> wrote:
>>>
>>> Hi All,
>>> any question about this patch, can it be merged?
>>>
>>> Thanks
>>> On 2023/1/7 am 9:23, Hongchen Zhang wrote:
>>>> Use spinlock in pipe_read/write cost too much time,IMO
>>>> pipe->{head,tail} can be protected by __pipe_{lock,unlock}.
>>>> On the other hand, we can use __pipe_{lock,unlock} to protect
>>>> the pipe->{head,tail} in pipe_resize_ring and
>>>> post_one_notification.
>>>>
>>>> Reminded by Matthew, I tested this patch using UnixBench's pipe
>>>> test case on a x86_64 machine,and get the following data:
>>>> 1) before this patch
>>>> System Benchmarks Partial Index BASELINE RESULT INDEX
>>>> Pipe Throughput 12440.0 493023.3 396.3
>>>> ========
>>>> System Benchmarks Index Score (Partial Only) 396.3
>>>>
>>>> 2) after this patch
>>>> System Benchmarks Partial Index BASELINE RESULT INDEX
>>>> Pipe Throughput 12440.0 507551.4 408.0
>>>> ========
>>>> System Benchmarks Index Score (Partial Only) 408.0
>>>>
>>>> so we get ~3% speedup.
>>>>
>>>> Reminded by Andrew, I tested this patch with the test code in
>>>> Linus's 0ddad21d3e99 add get following result:
>>
>> Happy new 2023 Hongchen Zhang,
>>
>> Thanks for the update and sorry for the late response.
>>
>> Should be "...s/add/and get following result:"
>>
>> I cannot say much about the patch itself or tested it in my build-environment.
>>
>> Best regards,
>> -Sedat-
>>
>
> I have applied v3 on top of Linux v6.2-rc4.
>
> Used pipebench for a quick testing.
>
> # fdisk -l /dev/sdb
> Disk /dev/sdb: 14,91 GiB, 16013942784 bytes, 31277232 sectors
> Disk model: SanDisk iSSD P4
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disklabel type: dos
> Disk identifier: 0x74f02dea
>
> Device Boot Start End Sectors Size Id Type
> /dev/sdb1 2048 31277231 31275184 14,9G 83 Linux
>
> # cat /dev/sdb | pipebench > /dev/null
> Summary:
> Piped 14.91 GB in 00h01m34.20s: 162.12 MB/second
>
> Not tested/benchmarked with the kernel w/o your patch.
>
> -Sedat-
>
OK, If there is any problem, let's continue to discuss it
and hope it can be merged into the main line.

Best Regards,
Hongchen Zhang

2023-01-16 02:59:52

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

On Mon, Jan 16, 2023 at 3:16 AM Hongchen Zhang
<[email protected]> wrote:
>
> Hi sedat,
>
>
> On 2023/1/16 am9:52, Sedat Dilek wrote:
> > On Fri, Jan 13, 2023 at 10:32 AM Sedat Dilek <[email protected]> wrote:
> >>
> >> On Fri, Jan 13, 2023 at 4:19 AM Hongchen Zhang
> >> <[email protected]> wrote:
> >>>
> >>> Hi All,
> >>> any question about this patch, can it be merged?
> >>>
> >>> Thanks
> >>> On 2023/1/7 am 9:23, Hongchen Zhang wrote:
> >>>> Use spinlock in pipe_read/write cost too much time,IMO
> >>>> pipe->{head,tail} can be protected by __pipe_{lock,unlock}.
> >>>> On the other hand, we can use __pipe_{lock,unlock} to protect
> >>>> the pipe->{head,tail} in pipe_resize_ring and
> >>>> post_one_notification.
> >>>>
> >>>> Reminded by Matthew, I tested this patch using UnixBench's pipe
> >>>> test case on a x86_64 machine,and get the following data:
> >>>> 1) before this patch
> >>>> System Benchmarks Partial Index BASELINE RESULT INDEX
> >>>> Pipe Throughput 12440.0 493023.3 396.3
> >>>> ========
> >>>> System Benchmarks Index Score (Partial Only) 396.3
> >>>>
> >>>> 2) after this patch
> >>>> System Benchmarks Partial Index BASELINE RESULT INDEX
> >>>> Pipe Throughput 12440.0 507551.4 408.0
> >>>> ========
> >>>> System Benchmarks Index Score (Partial Only) 408.0
> >>>>
> >>>> so we get ~3% speedup.
> >>>>
> >>>> Reminded by Andrew, I tested this patch with the test code in
> >>>> Linus's 0ddad21d3e99 add get following result:
> >>
> >> Happy new 2023 Hongchen Zhang,
> >>
> >> Thanks for the update and sorry for the late response.
> >>
> >> Should be "...s/add/and get following result:"
> >>
> >> I cannot say much about the patch itself or tested it in my build-environment.
> >>
> >> Best regards,
> >> -Sedat-
> >>
> >
> > I have applied v3 on top of Linux v6.2-rc4.
> >
> > Used pipebench for a quick testing.
> >
> > # fdisk -l /dev/sdb
> > Disk /dev/sdb: 14,91 GiB, 16013942784 bytes, 31277232 sectors
> > Disk model: SanDisk iSSD P4
> > Units: sectors of 1 * 512 = 512 bytes
> > Sector size (logical/physical): 512 bytes / 512 bytes
> > I/O size (minimum/optimal): 512 bytes / 512 bytes
> > Disklabel type: dos
> > Disk identifier: 0x74f02dea
> >
> > Device Boot Start End Sectors Size Id Type
> > /dev/sdb1 2048 31277231 31275184 14,9G 83 Linux
> >
> > # cat /dev/sdb | pipebench > /dev/null
> > Summary:
> > Piped 14.91 GB in 00h01m34.20s: 162.12 MB/second
> >
> > Not tested/benchmarked with the kernel w/o your patch.
> >
> > -Sedat-
> >
> OK, If there is any problem, let's continue to discuss it
> and hope it can be merged into the main line.
>

Can you give me a hand on the perf stat line?

I tried with:

$ /usr/bin/perf stat --repeat=1 ./0ddad21d3e99

But that gives in both cases no context-switches and cpu-migrations values.

-Sedat-

2023-01-16 03:22:43

by maobibo

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

Hongchen,

I have a glance with this patch, it simply replaces with
spinlock_irqsave with mutex lock. There may be performance
improvement with two processes competing with pipe, however
for N processes, there will be complex context switches
and ipi interruptts.

Can you find some cases with more than 2 processes competing
pipe, rather than only unixbench?

regards
bibo, mao

在 2023/1/13 11:19, Hongchen Zhang 写道:
> Hi All,
> any question about this patch, can it be merged?
>
> Thanks
> On 2023/1/7 am 9:23, Hongchen Zhang wrote:
>> Use spinlock in pipe_read/write cost too much time,IMO
>> pipe->{head,tail} can be protected by __pipe_{lock,unlock}.
>> On the other hand, we can use __pipe_{lock,unlock} to protect
>> the pipe->{head,tail} in pipe_resize_ring and
>> post_one_notification.
>>
>> Reminded by Matthew, I tested this patch using UnixBench's pipe
>> test case on a x86_64 machine,and get the following data:
>> 1) before this patch
>> System Benchmarks Partial Index  BASELINE       RESULT    INDEX
>> Pipe Throughput                   12440.0     493023.3    396.3
>>                                                          ========
>> System Benchmarks Index Score (Partial Only)              396.3
>>
>> 2) after this patch
>> System Benchmarks Partial Index  BASELINE       RESULT    INDEX
>> Pipe Throughput                   12440.0     507551.4    408.0
>>                                                          ========
>> System Benchmarks Index Score (Partial Only)              408.0
>>
>> so we get ~3% speedup.
>>
>> Reminded by Andrew, I tested this patch with the test code in
>> Linus's 0ddad21d3e99 add get following result:
>> 1) before this patch
>>           13,136.54 msec task-clock           #    3.870 CPUs utilized
>>           1,186,779      context-switches     #   90.342 K/sec
>>             668,867      cpu-migrations       #   50.917 K/sec
>>                 895      page-faults          #   68.131 /sec
>>      29,875,711,543      cycles               #    2.274 GHz
>>      12,372,397,462      instructions         #    0.41  insn per cycle
>>       2,480,235,723      branches             #  188.804 M/sec
>>          47,191,943      branch-misses        #    1.90% of all branches
>>
>>         3.394806886 seconds time elapsed
>>
>>         0.037869000 seconds user
>>         0.189346000 seconds sys
>>
>> 2) after this patch
>>
>>           12,395.63 msec task-clock          #    4.138 CPUs utilized
>>           1,193,381      context-switches    #   96.274 K/sec
>>             585,543      cpu-migrations      #   47.238 K/sec
>>               1,063      page-faults         #   85.756 /sec
>>      27,691,587,226      cycles              #    2.234 GHz
>>      11,738,307,999      instructions        #    0.42  insn per cycle
>>       2,351,299,522      branches            #  189.688 M/sec
>>          45,404,526      branch-misses       #    1.93% of all branches
>>
>>         2.995280878 seconds time elapsed
>>
>>         0.010615000 seconds user
>>         0.206999000 seconds sys
>> After adding this patch, the time used on this test program becomes less.
>>
>> Signed-off-by: Hongchen Zhang <[email protected]>
>>
>> v3:
>>    - fixes the error reported by kernel test robot <[email protected]>
>>      Link: https://lore.kernel.org/oe-lkp/[email protected]
>>    - add perf stat data for the test code in Linus's 0ddad21d3e99 in
>>      commit message.
>> v2:
>>    - add UnixBench test data in commit message
>>    - fixes the test error reported by kernel test robot <[email protected]>
>>      by adding the missing fs.h header file.
>> ---
>>   fs/pipe.c                 | 22 +---------------------
>>   include/linux/pipe_fs_i.h | 12 ++++++++++++
>>   kernel/watch_queue.c      |  8 ++++----
>>   3 files changed, 17 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/pipe.c b/fs/pipe.c
>> index 42c7ff41c2db..4355ee5f754e 100644
>> --- a/fs/pipe.c
>> +++ b/fs/pipe.c
>> @@ -98,16 +98,6 @@ void pipe_unlock(struct pipe_inode_info *pipe)
>>   }
>>   EXPORT_SYMBOL(pipe_unlock);
>>   -static inline void __pipe_lock(struct pipe_inode_info *pipe)
>> -{
>> -    mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
>> -}
>> -
>> -static inline void __pipe_unlock(struct pipe_inode_info *pipe)
>> -{
>> -    mutex_unlock(&pipe->mutex);
>> -}
>> -
>>   void pipe_double_lock(struct pipe_inode_info *pipe1,
>>                 struct pipe_inode_info *pipe2)
>>   {
>> @@ -253,8 +243,7 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>>        */
>>       was_full = pipe_full(pipe->head, pipe->tail, pipe->max_usage);
>>       for (;;) {
>> -        /* Read ->head with a barrier vs post_one_notification() */
>> -        unsigned int head = smp_load_acquire(&pipe->head);
>> +        unsigned int head = pipe->head;
>>           unsigned int tail = pipe->tail;
>>           unsigned int mask = pipe->ring_size - 1;
>>   @@ -322,14 +311,12 @@ pipe_read(struct kiocb *iocb, struct iov_iter *to)
>>                 if (!buf->len) {
>>                   pipe_buf_release(pipe, buf);
>> -                spin_lock_irq(&pipe->rd_wait.lock);
>>   #ifdef CONFIG_WATCH_QUEUE
>>                   if (buf->flags & PIPE_BUF_FLAG_LOSS)
>>                       pipe->note_loss = true;
>>   #endif
>>                   tail++;
>>                   pipe->tail = tail;
>> -                spin_unlock_irq(&pipe->rd_wait.lock);
>>               }
>>               total_len -= chars;
>>               if (!total_len)
>> @@ -506,16 +493,13 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
>>                * it, either the reader will consume it or it'll still
>>                * be there for the next write.
>>                */
>> -            spin_lock_irq(&pipe->rd_wait.lock);
>>                 head = pipe->head;
>>               if (pipe_full(head, pipe->tail, pipe->max_usage)) {
>> -                spin_unlock_irq(&pipe->rd_wait.lock);
>>                   continue;
>>               }
>>                 pipe->head = head + 1;
>> -            spin_unlock_irq(&pipe->rd_wait.lock);
>>                 /* Insert it into the buffer array */
>>               buf = &pipe->bufs[head & mask];
>> @@ -1260,14 +1244,12 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
>>       if (unlikely(!bufs))
>>           return -ENOMEM;
>>   -    spin_lock_irq(&pipe->rd_wait.lock);
>>       mask = pipe->ring_size - 1;
>>       head = pipe->head;
>>       tail = pipe->tail;
>>         n = pipe_occupancy(head, tail);
>>       if (nr_slots < n) {
>> -        spin_unlock_irq(&pipe->rd_wait.lock);
>>           kfree(bufs);
>>           return -EBUSY;
>>       }
>> @@ -1303,8 +1285,6 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots)
>>       pipe->tail = tail;
>>       pipe->head = head;
>>   -    spin_unlock_irq(&pipe->rd_wait.lock);
>> -
>>       /* This might have made more room for writers */
>>       wake_up_interruptible(&pipe->wr_wait);
>>       return 0;
>> diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
>> index 6cb65df3e3ba..f5084daf6eaf 100644
>> --- a/include/linux/pipe_fs_i.h
>> +++ b/include/linux/pipe_fs_i.h
>> @@ -2,6 +2,8 @@
>>   #ifndef _LINUX_PIPE_FS_I_H
>>   #define _LINUX_PIPE_FS_I_H
>>   +#include <linux/fs.h>
>> +
>>   #define PIPE_DEF_BUFFERS    16
>>     #define PIPE_BUF_FLAG_LRU    0x01    /* page is on the LRU */
>> @@ -223,6 +225,16 @@ static inline void pipe_discard_from(struct pipe_inode_info *pipe,
>>   #define PIPE_SIZE        PAGE_SIZE
>>     /* Pipe lock and unlock operations */
>> +static inline void __pipe_lock(struct pipe_inode_info *pipe)
>> +{
>> +    mutex_lock_nested(&pipe->mutex, I_MUTEX_PARENT);
>> +}
>> +
>> +static inline void __pipe_unlock(struct pipe_inode_info *pipe)
>> +{
>> +    mutex_unlock(&pipe->mutex);
>> +}
>> +
>>   void pipe_lock(struct pipe_inode_info *);
>>   void pipe_unlock(struct pipe_inode_info *);
>>   void pipe_double_lock(struct pipe_inode_info *, struct pipe_inode_info *);
>> diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
>> index a6f9bdd956c3..92e46cfe9419 100644
>> --- a/kernel/watch_queue.c
>> +++ b/kernel/watch_queue.c
>> @@ -108,7 +108,7 @@ static bool post_one_notification(struct watch_queue *wqueue,
>>       if (!pipe)
>>           return false;
>>   -    spin_lock_irq(&pipe->rd_wait.lock);
>> +    __pipe_lock(pipe);
>>         mask = pipe->ring_size - 1;
>>       head = pipe->head;
>> @@ -135,17 +135,17 @@ static bool post_one_notification(struct watch_queue *wqueue,
>>       buf->offset = offset;
>>       buf->len = len;
>>       buf->flags = PIPE_BUF_FLAG_WHOLE;
>> -    smp_store_release(&pipe->head, head + 1); /* vs pipe_read() */
>> +    pipe->head = head + 1;
>>         if (!test_and_clear_bit(note, wqueue->notes_bitmap)) {
>> -        spin_unlock_irq(&pipe->rd_wait.lock);
>> +        __pipe_unlock(pipe);
>>           BUG();
>>       }
>>       wake_up_interruptible_sync_poll_locked(&pipe->rd_wait, EPOLLIN | EPOLLRDNORM);
>>       done = true;
>>     out:
>> -    spin_unlock_irq(&pipe->rd_wait.lock);
>> +    __pipe_unlock(pipe);
>>       if (done)
>>           kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
>>       return done;
>>
>> base-commit: c8451c141e07a8d05693f6c8d0e418fbb4b68bb7
>>

2023-01-16 04:45:32

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

On Mon, Jan 16, 2023 at 11:16:13AM +0800, maobibo wrote:
> Hongchen,
>
> I have a glance with this patch, it simply replaces with
> spinlock_irqsave with mutex lock. There may be performance
> improvement with two processes competing with pipe, however
> for N processes, there will be complex context switches
> and ipi interruptts.
>
> Can you find some cases with more than 2 processes competing
> pipe, rather than only unixbench?

What real applications have pipes with more than 1 writer & 1 reader?
I'm OK with slowing down the weird cases if the common cases go faster.

2023-01-16 22:03:24

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

On Mon, Jan 16, 2023 at 04:38:01AM +0000, Matthew Wilcox wrote:
> On Mon, Jan 16, 2023 at 11:16:13AM +0800, maobibo wrote:
> > Hongchen,
> >
> > I have a glance with this patch, it simply replaces with
> > spinlock_irqsave with mutex lock. There may be performance
> > improvement with two processes competing with pipe, however
> > for N processes, there will be complex context switches
> > and ipi interruptts.
> >
> > Can you find some cases with more than 2 processes competing
> > pipe, rather than only unixbench?
>
> What real applications have pipes with more than 1 writer & 1 reader?
> I'm OK with slowing down the weird cases if the common cases go faster.

From commit 0ddad21d3e99c743a3aa473121dc5561679e26bb:
While this isn't a common occurrence in the traditional "use a pipe as a
data transport" case, where you typically only have a single reader and
a single writer process, there is one common special case: using a pipe
as a source of "locking tokens" rather than for data communication.

In particular, the GNU make jobserver code ends up using a pipe as a way
to limit parallelism, where each job consumes a token by reading a byte
from the jobserver pipe, and releases the token by writing a byte back
to the pipe.

2023-01-16 23:45:55

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

On Mon, 16 Jan 2023 21:10:37 +0000 Al Viro <[email protected]> wrote:

> On Mon, Jan 16, 2023 at 04:38:01AM +0000, Matthew Wilcox wrote:
> > On Mon, Jan 16, 2023 at 11:16:13AM +0800, maobibo wrote:
> > > Hongchen,
> > >
> > > I have a glance with this patch, it simply replaces with
> > > spinlock_irqsave with mutex lock. There may be performance
> > > improvement with two processes competing with pipe, however
> > > for N processes, there will be complex context switches
> > > and ipi interruptts.
> > >
> > > Can you find some cases with more than 2 processes competing
> > > pipe, rather than only unixbench?
> >
> > What real applications have pipes with more than 1 writer & 1 reader?
> > I'm OK with slowing down the weird cases if the common cases go faster.
>
> >From commit 0ddad21d3e99c743a3aa473121dc5561679e26bb:
> While this isn't a common occurrence in the traditional "use a pipe as a
> data transport" case, where you typically only have a single reader and
> a single writer process, there is one common special case: using a pipe
> as a source of "locking tokens" rather than for data communication.
>
> In particular, the GNU make jobserver code ends up using a pipe as a way
> to limit parallelism, where each job consumes a token by reading a byte
> from the jobserver pipe, and releases the token by writing a byte back
> to the pipe.

The author has tested this patch with Linus's test code from 0ddad21d3e
and the results were OK
(https://lkml.kernel.org/r/[email protected]).

I've been stalling on this patch until Linus gets back to his desk,
which now appears to have happened.

Hongchen, when convenient, please capture this discussion (as well as
the testing results with Linus's sample code) in the changelog and send
us a v4, with Linus on cc?

2023-01-17 07:15:26

by Sedat Dilek

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

On Mon, Jan 16, 2023 at 11:16 PM Andrew Morton
<[email protected]> wrote:
>
> On Mon, 16 Jan 2023 21:10:37 +0000 Al Viro <[email protected]> wrote:
>
> > On Mon, Jan 16, 2023 at 04:38:01AM +0000, Matthew Wilcox wrote:
> > > On Mon, Jan 16, 2023 at 11:16:13AM +0800, maobibo wrote:
> > > > Hongchen,
> > > >
> > > > I have a glance with this patch, it simply replaces with
> > > > spinlock_irqsave with mutex lock. There may be performance
> > > > improvement with two processes competing with pipe, however
> > > > for N processes, there will be complex context switches
> > > > and ipi interruptts.
> > > >
> > > > Can you find some cases with more than 2 processes competing
> > > > pipe, rather than only unixbench?
> > >
> > > What real applications have pipes with more than 1 writer & 1 reader?
> > > I'm OK with slowing down the weird cases if the common cases go faster.
> >
> > >From commit 0ddad21d3e99c743a3aa473121dc5561679e26bb:
> > While this isn't a common occurrence in the traditional "use a pipe as a
> > data transport" case, where you typically only have a single reader and
> > a single writer process, there is one common special case: using a pipe
> > as a source of "locking tokens" rather than for data communication.
> >
> > In particular, the GNU make jobserver code ends up using a pipe as a way
> > to limit parallelism, where each job consumes a token by reading a byte
> > from the jobserver pipe, and releases the token by writing a byte back
> > to the pipe.
>
> The author has tested this patch with Linus's test code from 0ddad21d3e
> and the results were OK
> (https://lkml.kernel.org/r/[email protected]).
>

Yesterday, I had some time to play with and without this patch on my
Debian/unstable AMD64 box.

[ TEST-CASE ]

BASE: Linux v6.2-rc4

PATCH: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

TEST-CASE: Taken from commit 0ddad21d3e99

RUN: gcc-12 -o 0ddad21d3e99 0ddad21d3e99.c

Link: https://lore.kernel.org/all/[email protected]/
Link: https://git.kernel.org/linus/0ddad21d3e99


[ INSTRUCTIONS ]

echo 0 | sudo tee /proc/sys/kernel/kptr_restrict
/proc/sys/kernel/perf_event_paranoid

10 runs: /usr/bin/perf stat --repeat=10 ./0ddad21d3e99

echo 1 | sudo tee /proc/sys/kernel/kptr_restrict
/proc/sys/kernel/perf_event_paranoid


[ BEFORE ]

Performance counter stats for './0ddad21d3e99' (10 runs):

23.985,50 msec task-clock # 3,246
CPUs utilized ( +- 0,20% )
1.112.822 context-switches # 46,696
K/sec ( +- 0,30% )
403.033 cpu-migrations # 16,912
K/sec ( +- 0,28% )
1.508 page-faults # 63,278
/sec ( +- 2,95% )
39.436.000.959 cycles # 1,655 GHz
( +- 0,22% )
29.364.329.413 stalled-cycles-frontend # 74,91%
frontend cycles idle ( +- 0,24% )
22.139.448.400 stalled-cycles-backend # 56,48%
backend cycles idle ( +- 0,23% )
18.565.538.523 instructions # 0,47
insn per cycle
# 1,57 stalled
cycles per insn ( +- 0,17% )
4.059.885.546 branches # 170,359
M/sec ( +- 0,17% )
59.991.226 branch-misses # 1,48% of
all branches ( +- 0,19% )

7,3892 +- 0,0127 seconds time elapsed ( +- 0,17% )


[ AFTER ]

Performance counter stats for './0ddad21d3e99' (10 runs):

24.175,94 msec task-clock # 3,362
CPUs utilized ( +- 0,11% )
1.139.152 context-switches # 47,119
K/sec ( +- 0,12% )
407.994 cpu-migrations # 16,876
K/sec ( +- 0,26% )
1.555 page-faults # 64,319
/sec ( +- 3,11% )
40.904.849.091 cycles # 1,692 GHz
( +- 0,13% )
30.587.623.034 stalled-cycles-frontend # 74,84%
frontend cycles idle ( +- 0,15% )
23.145.533.537 stalled-cycles-backend # 56,63%
backend cycles idle ( +- 0,16% )
18.762.964.037 instructions # 0,46
insn per cycle
# 1,63 stalled
cycles per insn ( +- 0,11% )
4.057.182.849 branches # 167,817
M/sec ( +- 0,09% )
63.887.806 branch-misses # 1,58% of
all branches ( +- 0,25% )

7,19157 +- 0,00644 seconds time elapsed ( +- 0,09% )


[ RESULT ]

seconds time elapsed: - 2,67%

The test-case c-file is attached and for the case the above lines were
truncated I have attached as a README file.

Feel free to add a...

Tested-by: Sedat Dilek <[email protected]> # LLVM v15.0.3 (x86-64)

If you need further information, please let me know.

-Sedat-

> I've been stalling on this patch until Linus gets back to his desk,
> which now appears to have happened.
>
> Hongchen, when convenient, please capture this discussion (as well as
> the testing results with Linus's sample code) in the changelog and send
> us a v4, with Linus on cc?
>


Attachments:
0ddad21d3e99.c (787.00 B)
README_zhanghongchen-pipe-v3-0ddad21d3e99 (3.16 kB)
Download all attachments

2023-01-29 02:29:32

by Hongchen Zhang

[permalink] [raw]
Subject: Re: [PATCH v3] pipe: use __pipe_{lock,unlock} instead of spinlock

Hi Andrew,

Sorry to reply to you so late, because I took a long holiday.

On 2023/1/17 am 6:16, Andrew Morton wrote:
> On Mon, 16 Jan 2023 21:10:37 +0000 Al Viro <[email protected]> wrote:
>
>> On Mon, Jan 16, 2023 at 04:38:01AM +0000, Matthew Wilcox wrote:
>>> On Mon, Jan 16, 2023 at 11:16:13AM +0800, maobibo wrote:
>>>> Hongchen,
>>>>
>>>> I have a glance with this patch, it simply replaces with
>>>> spinlock_irqsave with mutex lock. There may be performance
>>>> improvement with two processes competing with pipe, however
>>>> for N processes, there will be complex context switches
>>>> and ipi interruptts.
>>>>
>>>> Can you find some cases with more than 2 processes competing
>>>> pipe, rather than only unixbench?
>>>
>>> What real applications have pipes with more than 1 writer & 1 reader?
>>> I'm OK with slowing down the weird cases if the common cases go faster.
>>
>> >From commit 0ddad21d3e99c743a3aa473121dc5561679e26bb:
>> While this isn't a common occurrence in the traditional "use a pipe as a
>> data transport" case, where you typically only have a single reader and
>> a single writer process, there is one common special case: using a pipe
>> as a source of "locking tokens" rather than for data communication.
>>
>> In particular, the GNU make jobserver code ends up using a pipe as a way
>> to limit parallelism, where each job consumes a token by reading a byte
>> from the jobserver pipe, and releases the token by writing a byte back
>> to the pipe.
>
> The author has tested this patch with Linus's test code from 0ddad21d3e
> and the results were OK
> (https://lkml.kernel.org/r/[email protected]).
>
> I've been stalling on this patch until Linus gets back to his desk,
> which now appears to have happened.
>
> Hongchen, when convenient, please capture this discussion (as well as
> the testing results with Linus's sample code) in the changelog and send
> us a v4, with Linus on cc?
>
I will send you a v4 and cc to Linus.

Thanks.
Hongchen Zhang