2010-06-03 08:01:27

by Henrik Rydberg

Subject: [PATCH 0/4] input: evdev: Dynamic buffers (rev3)

Dmitry,

Please find enclosed the third version of the evdev buffer patches.

This version has one more patch; the locking mechanism has been broken
out into its own file, buflock.h, adding a fair amount of
documentation. The mechanism has been moderately tested, showing
graceful dropping of packets as the buffer size gets smaller and the
number of simultaneous readers gets larger.

In the second patch, the per-client buffer is replaced by the buflock
reader, and the spinlock name is changed to reflect the reduced area
it covers. The third patch only has trivial changes, and the fourth
patch is unchanged, but included for completeness.

Cheers,
Henrik

---

Henrik Rydberg (4):
input: Introduce buflock, a one-to-many circular buffer mechanism
input: evdev: Use multi-reader buffer to save space (rev3)
input: evdev: Convert to dynamic event buffer (rev3)
input: Use driver hint to compute the evdev buffer size

drivers/input/buflock.h | 133 +++++++++++++++++++++++++++++++++++++++++++++++
drivers/input/evdev.c | 77 +++++++++++++++++-----------
include/linux/input.h | 7 +++
3 files changed, 187 insertions(+), 30 deletions(-)
create mode 100644 drivers/input/buflock.h


2010-06-03 08:01:35

by Henrik Rydberg

Subject: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

In spite of the many lock patterns and fifo helpers in the kernel, the
case of a single writer feeding many readers via a circular buffer
seems to be uncovered. This patch adds the buflock, a minimalistic
interface implementing SMP-safe locking for such a buffer. Under
normal operation, given adequate buffer size, the operation is
lock-less. The template is given the name buflock to emphasize that
the locking depends on the buffer read/write clashes.

Signed-off-by: Henrik Rydberg <[email protected]>
---
drivers/input/buflock.h | 133 +++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 133 insertions(+), 0 deletions(-)
create mode 100644 drivers/input/buflock.h

diff --git a/drivers/input/buflock.h b/drivers/input/buflock.h
new file mode 100644
index 0000000..3a4322c
--- /dev/null
+++ b/drivers/input/buflock.h
@@ -0,0 +1,133 @@
+#ifndef _BUFLOCK_H
+#define _BUFLOCK_H
+/*
+ * Circular buffer lock for single writer, multiple readers
+ *
+ * Copyright (c) 2010 Henrik Rydberg
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+/*
+ * Mechanism for circular buffer locking:
+ *
+ * Single writer does not block.
+ *
+ * Readers block on buffer wrap-around only.
+ *
+ * These locks are particularly suitable when a single writer must not
+ * be starved, and when there are several threads reading the same buffer.
+ *
+ * The structure is similar to seqlocks, with the main difference being
+ * that readers retry only when the writer simultaneously overwrites the
+ * data currently being read.
+ *
+ * In practice, given enough buffer size, the mechanism is lock-less.
+ *
+ * Like seqlocks, buflocks are not very cache friendly, and require the
+ * buffer to be valid in all threads.
+ *
+ * Multiple writers or re-entrant readers require additional locking.
+ *
+ */
+
+#include <linux/spinlock.h>
+
+struct buflock_writer {
+ unsigned int head;
+ unsigned int next_head;
+};
+
+struct buflock_reader {
+ unsigned int head;
+ unsigned int tail;
+};
+
+/*
+ * Write to buffer without locking
+ *
+ * bw - the buflock_writer keeping track of the write position
+ * buf - the buffer to write to (array of item type)
+ * size - the size of the circular buffer (must be a power of two)
+ * item - the item to write
+ *
+ * There is no locking involved during write, so this method is
+ * suitable to use in interrupt context.
+ */
+#define buflock_write(bw, buf, size, item) \
+ do { \
+ bw.next_head = (bw.head + 1) & ((size) - 1); \
+ smp_wmb(); \
+ buf[bw.head] = item; \
+ smp_wmb(); \
+ bw.head = bw.next_head; \
+ smp_wmb(); \
+ } while (0)
+
+
+/*
+ * Synchronize reader with current writer
+ *
+ * br - the buflock_reader keeping track of the read position
+ * bw - the buflock_writer keeping track of the write position
+ *
+ * Synchronize the reader head with the writer head, effectively
+ * telling the reader thread that there is new data to read.
+ *
+ * The reader head will always follow the writer head. As a
+ * consequence, the number of items stored in the read buffer might
+ * decrease during sync, as an effect of wrap-around. To avoid
+ * non-deterministic behavior during polls, the read buffer is
+ * guaranteed to be non-empty after synchronization.
+ *
+ */
+#define buflock_sync_reader(br, bw) \
+ do { \
+ if (br.tail != bw.head) \
+ br.head = bw.head; \
+ } while (0)
+
+/*
+ * True if reader is empty
+ *
+ * br - the buflock_reader keeping track of the read position
+ *
+ */
+#define buflock_reader_empty(br) (br.head == br.tail)
+
+/*
+ * Read from buffer, retry during wrap-around
+ *
+ * br - the buflock_reader keeping track of the read position
+ * bw - the buflock_writer keeping track of the write position
+ * buf - the buffer to read from (array of item type)
+ * size - the size of the circular buffer (must be a power of two)
+ * item - the item to read to
+ *
+ * Read the oldest item available in the buffer.
+ *
+ * During normal operation, with adequate buffer size, this method will not
+ * block, regardless of the number of concurrent readers. The method will
+ * only block momentarily during a write to the same position being read
+ * from, which happens when the buffer gets full. In such cases, the value
+ * eventually read will be the new value written to the buffer.
+ *
+ */
+#define buflock_read(br, bw, buf, size, item) \
+ do { \
+ unsigned int _h, _nh; \
+ do { \
+ _h = bw.head; \
+ smp_rmb(); \
+ item = buf[br.tail]; \
+ smp_rmb(); \
+ _nh = bw.next_head; \
+ smp_rmb(); \
+ } while (unlikely(br.tail - _h < _nh - _h)); \
+ br.tail = (br.tail + 1) & ((size) - 1); \
+ } while (0)
+
+
+#endif /* _BUFLOCK_H */
--
1.6.3.3

2010-06-03 08:02:08

by Henrik Rydberg

Subject: [PATCH 4/4] input: Use driver hint to compute the evdev buffer size

Some devices, in particular MT devices, produce a lot of data. This
leads to a high frequency of lost packets in evdev, which by default
uses a fairly small event buffer. Let the drivers hint the average
number of events per packet for the device by calling the
input_set_events_per_packet(), and use that information when computing
the evdev buffer size.

Signed-off-by: Henrik Rydberg <[email protected]>
---
drivers/input/evdev.c | 5 ++++-
include/linux/input.h | 7 +++++++
2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 2e2a339..f08b1d2 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -11,6 +11,7 @@
#define EVDEV_MINOR_BASE 64
#define EVDEV_MINORS 32
#define EVDEV_MIN_BUFFER_SIZE 64
+#define EVDEV_BUF_PACKETS 8

#include <linux/poll.h>
#include <linux/sched.h>
@@ -790,7 +791,9 @@ static void evdev_cleanup(struct evdev *evdev)

static int evdev_compute_buffer_size(struct input_dev *dev)
{
- return EVDEV_MIN_BUFFER_SIZE;
+ int nev = dev->hint_events_per_packet * EVDEV_BUF_PACKETS;
+ nev = max(nev, EVDEV_MIN_BUFFER_SIZE);
+ return roundup_pow_of_two(nev);
}

/*
diff --git a/include/linux/input.h b/include/linux/input.h
index bd00786..35b015d 100644
--- a/include/linux/input.h
+++ b/include/linux/input.h
@@ -1162,6 +1162,8 @@ struct input_dev {
unsigned long ffbit[BITS_TO_LONGS(FF_CNT)];
unsigned long swbit[BITS_TO_LONGS(SW_CNT)];

+ unsigned int hint_events_per_packet;
+
unsigned int keycodemax;
unsigned int keycodesize;
void *keycode;
@@ -1439,6 +1441,11 @@ static inline void input_mt_slot(struct input_dev *dev, int slot)

void input_set_capability(struct input_dev *dev, unsigned int type, unsigned int code);

+static inline void input_set_events_per_packet(struct input_dev *dev, int nev)
+{
+ dev->hint_events_per_packet = nev;
+}
+
static inline void input_set_abs_params(struct input_dev *dev, int axis, int min, int max, int fuzz, int flat)
{
dev->absmin[axis] = min;
--
1.6.3.3

2010-06-03 08:02:15

by Henrik Rydberg

Subject: [PATCH 2/4] input: evdev: Use multi-reader buffer to save space (rev3)

Preparing for larger buffer needs, convert the current per-client
circular buffer to a single buffer with multiple clients. Use the
buflock mechanism, where clients wait during buffer collision only.

Signed-off-by: Henrik Rydberg <[email protected]>
---
drivers/input/evdev.c | 57 ++++++++++++++++++++++++-------------------------
1 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 2ee6c7a..7f0c7da 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -21,6 +21,7 @@
#include <linux/major.h>
#include <linux/device.h>
#include "input-compat.h"
+#include "buflock.h"

struct evdev {
int exist;
@@ -33,13 +34,13 @@ struct evdev {
spinlock_t client_lock; /* protects client_list */
struct mutex mutex;
struct device dev;
+ struct buflock_writer writer;
+ struct input_event buffer[EVDEV_BUFFER_SIZE];
};

struct evdev_client {
- struct input_event buffer[EVDEV_BUFFER_SIZE];
- int head;
- int tail;
- spinlock_t buffer_lock; /* protects access to buffer, head and tail */
+ struct buflock_reader reader;
+ spinlock_t entry_lock; /* protects reentrant fops reads */
struct fasync_struct *fasync;
struct evdev *evdev;
struct list_head node;
@@ -48,18 +49,12 @@ struct evdev_client {
static struct evdev *evdev_table[EVDEV_MINORS];
static DEFINE_MUTEX(evdev_table_mutex);

-static void evdev_pass_event(struct evdev_client *client,
- struct input_event *event)
+static inline void evdev_sync_event(struct evdev_client *client,
+ struct evdev *evdev, int type)
{
- /*
- * Interrupts are disabled, just acquire the lock
- */
- spin_lock(&client->buffer_lock);
- client->buffer[client->head++] = *event;
- client->head &= EVDEV_BUFFER_SIZE - 1;
- spin_unlock(&client->buffer_lock);
-
- if (event->type == EV_SYN)
+ /* sync reader, no locks required */
+ buflock_sync_reader(client->reader, evdev->writer);
+ if (type == EV_SYN)
kill_fasync(&client->fasync, SIGIO, POLL_IN);
}

@@ -78,14 +73,17 @@ static void evdev_event(struct input_handle *handle,
event.code = code;
event.value = value;

+ /* lock-less write */
+ buflock_write(evdev->writer, evdev->buffer, EVDEV_BUFFER_SIZE, event);
+
rcu_read_lock();

client = rcu_dereference(evdev->grab);
if (client)
- evdev_pass_event(client, &event);
+ evdev_sync_event(client, evdev, type);
else
list_for_each_entry_rcu(client, &evdev->client_list, node)
- evdev_pass_event(client, &event);
+ evdev_sync_event(client, evdev, type);

rcu_read_unlock();

@@ -269,7 +267,7 @@ static int evdev_open(struct inode *inode, struct file *file)
goto err_put_evdev;
}

- spin_lock_init(&client->buffer_lock);
+ spin_lock_init(&client->entry_lock);
client->evdev = evdev;
evdev_attach_client(evdev, client);

@@ -325,19 +323,19 @@ static ssize_t evdev_write(struct file *file, const char __user *buffer,
}

static int evdev_fetch_next_event(struct evdev_client *client,
+ struct evdev *evdev,
struct input_event *event)
{
int have_event;

- spin_lock_irq(&client->buffer_lock);
+ spin_lock_irq(&client->entry_lock);

- have_event = client->head != client->tail;
- if (have_event) {
- *event = client->buffer[client->tail++];
- client->tail &= EVDEV_BUFFER_SIZE - 1;
- }
+ have_event = !buflock_reader_empty(client->reader);
+ if (have_event)
+ buflock_read(client->reader, evdev->writer,
+ evdev->buffer, EVDEV_BUFFER_SIZE, *event);

- spin_unlock_irq(&client->buffer_lock);
+ spin_unlock_irq(&client->entry_lock);

return have_event;
}
@@ -353,12 +351,12 @@ static ssize_t evdev_read(struct file *file, char __user *buffer,
if (count < input_event_size())
return -EINVAL;

- if (client->head == client->tail && evdev->exist &&
+ if (buflock_reader_empty(client->reader) && evdev->exist &&
(file->f_flags & O_NONBLOCK))
return -EAGAIN;

retval = wait_event_interruptible(evdev->wait,
- client->head != client->tail || !evdev->exist);
+ !buflock_reader_empty(client->reader) || !evdev->exist);
if (retval)
return retval;

@@ -366,7 +364,7 @@ static ssize_t evdev_read(struct file *file, char __user *buffer,
return -ENODEV;

while (retval + input_event_size() <= count &&
- evdev_fetch_next_event(client, &event)) {
+ evdev_fetch_next_event(client, evdev, &event)) {

if (input_event_to_user(buffer + retval, &event))
return -EFAULT;
@@ -384,7 +382,8 @@ static unsigned int evdev_poll(struct file *file, poll_table *wait)
struct evdev *evdev = client->evdev;

poll_wait(file, &evdev->wait, wait);
- return ((client->head == client->tail) ? 0 : (POLLIN | POLLRDNORM)) |
+ return (buflock_reader_empty(client->reader) ? 0 :
+ (POLLIN | POLLRDNORM)) |
(evdev->exist ? 0 : (POLLHUP | POLLERR));
}

--
1.6.3.3

2010-06-03 08:02:28

by Henrik Rydberg

Subject: [PATCH 3/4] input: evdev: Convert to dynamic event buffer (rev3)

Allocate the event buffer dynamically, and prepare to compute the
buffer size in a separate function. This patch defines the size
computation to be identical to the current code, and does not contain
any logical changes.

Signed-off-by: Henrik Rydberg <[email protected]>
---
drivers/input/evdev.c | 23 +++++++++++++++++++----
1 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
index 7f0c7da..2e2a339 100644
--- a/drivers/input/evdev.c
+++ b/drivers/input/evdev.c
@@ -10,7 +10,7 @@

#define EVDEV_MINOR_BASE 64
#define EVDEV_MINORS 32
-#define EVDEV_BUFFER_SIZE 64
+#define EVDEV_MIN_BUFFER_SIZE 64

#include <linux/poll.h>
#include <linux/sched.h>
@@ -35,7 +35,8 @@ struct evdev {
struct mutex mutex;
struct device dev;
struct buflock_writer writer;
- struct input_event buffer[EVDEV_BUFFER_SIZE];
+ unsigned int bufsize;
+ struct input_event *buffer;
};

struct evdev_client {
@@ -74,7 +75,7 @@ static void evdev_event(struct input_handle *handle,
event.value = value;

/* lock-less write */
- buflock_write(evdev->writer, evdev->buffer, EVDEV_BUFFER_SIZE, event);
+ buflock_write(evdev->writer, evdev->buffer, evdev->bufsize, event);

rcu_read_lock();

@@ -121,6 +122,7 @@ static void evdev_free(struct device *dev)
struct evdev *evdev = container_of(dev, struct evdev, dev);

input_put_device(evdev->handle.dev);
+ kfree(evdev->buffer);
kfree(evdev);
}

@@ -333,7 +335,7 @@ static int evdev_fetch_next_event(struct evdev_client *client,
have_event = !buflock_reader_empty(client->reader);
if (have_event)
buflock_read(client->reader, evdev->writer,
- evdev->buffer, EVDEV_BUFFER_SIZE, *event);
+ evdev->buffer, evdev->bufsize, *event);

spin_unlock_irq(&client->entry_lock);

@@ -786,6 +788,11 @@ static void evdev_cleanup(struct evdev *evdev)
}
}

+static int evdev_compute_buffer_size(struct input_dev *dev)
+{
+ return EVDEV_MIN_BUFFER_SIZE;
+}
+
/*
* Create new evdev device. Note that input core serializes calls
* to connect and disconnect so we don't need to lock evdev_table here.
@@ -830,6 +837,14 @@ static int evdev_connect(struct input_handler *handler, struct input_dev *dev,
evdev->dev.release = evdev_free;
device_initialize(&evdev->dev);

+ evdev->bufsize = evdev_compute_buffer_size(dev);
+ evdev->buffer = kzalloc(evdev->bufsize * sizeof(struct input_event),
+ GFP_KERNEL);
+ if (!evdev->buffer) {
+ error = -ENOMEM;
+ goto err_free_evdev;
+ }
+
error = input_register_handle(&evdev->handle);
if (error)
goto err_free_evdev;
--
1.6.3.3

2010-06-04 06:34:38

by Dmitry Torokhov

Subject: Re: [PATCH 4/4] input: Use driver hint to compute the evdev buffer size

Hi Henrik,

On Thu, Jun 03, 2010 at 10:01:02AM +0200, Henrik Rydberg wrote:
>
> +static inline void input_set_events_per_packet(struct input_dev *dev, int nev)

A kerneldoc-style comment explaining the purpose of the new function would
be very beneficial.
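
Something along these lines, perhaps (just a sketch, wording up to you):

/**
 * input_set_events_per_packet - tell handlers about expected packet size
 * @dev: the input device
 * @nev: average number of events generated by the device per packet
 *
 * Drivers may call this to hint how many events are normally generated
 * between EV_SYN/SYN_REPORT markers, so that handlers such as evdev can
 * use the information when sizing their event buffers.
 */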

> +{
> + dev->hint_events_per_packet = nev;
> +}
> +

Thanks.

--
Dmitry

2010-06-04 06:37:41

by Dmitry Torokhov

Subject: Re: [PATCH 3/4] input: evdev: Convert to dynamic event buffer (rev3)

On Thu, Jun 03, 2010 at 10:01:01AM +0200, Henrik Rydberg wrote:
>
> + evdev->bufsize = evdev_compute_buffer_size(dev);
> + evdev->buffer = kzalloc(evdev->bufsize * sizeof(struct input_event),
> + GFP_KERNEL);

kcalloc().
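
I.e. something like:

        evdev->buffer = kcalloc(evdev->bufsize, sizeof(struct input_event),
                                GFP_KERNEL);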

> + if (!evdev->buffer) {
> + error = -ENOMEM;
> + goto err_free_evdev;
> + }
> +
> error = input_register_handle(&evdev->handle);
> if (error)
> goto err_free_evdev;

--
Dmitry

2010-06-04 06:57:09

by Dmitry Torokhov

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

Hi Henrik,

On Thu, Jun 03, 2010 at 10:00:59AM +0200, Henrik Rydberg wrote:
> In spite of the many lock patterns and fifo helpers in the kernel, the
> case of a single writer feeding many readers via a circular buffer
> seems to be uncovered. This patch adds the buflock, a minimalistic
> interface implementing SMP-safe locking for such a buffer. Under
> normal operation, given adequate buffer size, the operation is
> lock-less. The template is given the name buflock to emphasize that
> the locking depends on the buffer read/write clashes.
>
> Signed-off-by: Henrik Rydberg <[email protected]>
> ---
> drivers/input/buflock.h | 133 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 133 insertions(+), 0 deletions(-)
> create mode 100644 drivers/input/buflock.h
>
> diff --git a/drivers/input/buflock.h b/drivers/input/buflock.h
> new file mode 100644
> index 0000000..3a4322c
> --- /dev/null
> +++ b/drivers/input/buflock.h
> @@ -0,0 +1,133 @@
> +#ifndef _BUFLOCK_H
> +#define _BUFLOCK_H
> +/*
> + * Circular buffer lock for single writer, multiple readers
> + *
> + * Copyright (c) 2010 Henrik Rydberg
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +
> +/*
> + * Mechanism for circular buffer locking:
> + *
> + * Single writer does not block.
> + *
> + * Readers block on buffer wrap-around only.
> + *
> + * These locks are particularly suitable when a single writer must not
> + * be starved, and when there are several threads reading the same buffer.
> + *
> + * The structure is similar to seqlocks, with the main difference being
> + * that readers retry only when the writer simultaneously overwrites the
> + * data currently being read.
> + *
> + * In practise, given enough buffer size, the mechanism is lock-less.
> + *
> + * Like seqlocks, buflocks are not very cache friendly, and require the
> + * buffer to be valid in all threads.
> + *
> + * Multiple writers or re-entrant readers require additional locking.
> + *
> + */
> +
> +#include <linux/spinlock.h>
> +
> +struct buflock_writer {
> + unsigned int head;
> + unsigned int next_head;
> +};

Since there can be only one writer thread, should we just create "struct
buflock" and pull head and next_head into it along with the buffer
itself and element size?

Also, maybe we could extend kfifo with the notion of multiple readers?

In any case, it should not live in include/linux/input/ as it may be
useful outside of the input subsystem.

> +
> +struct buflock_reader {
> + unsigned int head;
> + unsigned int tail;
> +};
> +
> +/*
> + * Write to buffer without locking

Implies that there is an option of doing so with locking. Just change it to
"write". Also please use standard kerneldoc-style markups.

> + *
> + * bw - the buflock_writer keeping track of the write position
> + * buf - the buffer to write to (array of item type)
> + * size - the size of the circular buffer (must be a power of two)
> + * item - the item to write
> + *
> + * There is no locking involved during write, so this method is
> + * suitable to use in interrupt context.

This is a misleading statement. You can say that this operation does not
sleep and thus is suitable for use in atomic contexts.

> + */
> +#define buflock_write(bw, buf, size, item) \
> + do { \
> + bw.next_head = (bw.head + 1) & ((size) - 1); \
> + smp_wmb(); \

Why do we need the write barrier here?

> + buf[bw.head] = item; \
> + smp_wmb(); \

I think this is the only barrier that is needed. You want to make sure
that we advance head only after we complete write. Also, why SMP only?
Can't we get into trouble if we rearrange writes and take interrupt and
schedule away from this thread?

> + bw.head = bw.next_head; \
> + smp_wmb(); \

Why do we need the write barrier here?

> + } while (0)
> +

This (and the rest) should be a static inline function so that we have
type safety, etc, etc.

> +
> +/*
> + * Syncronize reader with current writer
> + *
> + * br - the buflock_reader keeping track of the read position
> + * bw - the buflock_writer keeping track of the write position
> + *
> + * Synchronize the reader head with the writer head, effectively
> + * telling the reader thread that there is new data to read.
> + *
> + * The reader head will always follow the writer head. As a
> + * consequence, the number of items stored in the read buffer might
> + * decrease during sync, as an effect of wrap-around. To avoid
> + * non-deterministic behavior during polls, the read buffer is
> + * guaranteed to be non-empty after synchronization.
> + *
> + */
> +#define buflock_sync_reader(br, bw) \
> + do { \
> + if (br.tail != bw.head) \
> + br.head = bw.head; \

Why condition? Simple assignment is cheaper.

> + } while (0)
> +
> +/*
> + * True if reader is empty
> + *
> + * br - the buflock_reader keeping track of the read position
> + *
> + */
> +#define buflock_reader_empty(br) (br.head == br.tail)
> +
> +/*
> + * Read from buffer, retry during wrap-around
> + *
> + * br - the buflock_reader keeping track of the read position
> + * bw - the buflock_writer keeping track of the write position
> + * buf - the buffer to read from (array of item type)
> + * size - the size of the circular buffer (must be a power of two)
> + * item - the item to read to
> + *
> + * Read the oldest item available in the buffer.
> + *
> + * During normal operation, with adequate buffer size, this method will not
> + * block, regardless of the number of concurrent readers. The method will
> + * only block momentarily during a write to the same position being read
> + * from, which happens when the buffer gets full. In such cases, the value
> + * eventually read will be the new value written to the buffer.
> + *
> + */
> +#define buflock_read(br, bw, buf, size, item) \
> + do { \
> + unsigned int _h, _nh; \
> + do { \
> + _h = bw.head; \
> + smp_rmb(); \
> + item = buf[br.tail]; \
> + smp_rmb(); \
> + _nh = bw.next_head; \
> + smp_rmb(); \
> + } while (unlikely(br.tail - _h < _nh - _h)); \
> + br.tail = (br.tail + 1) & ((size) - 1); \
> + } while (0)

Again, are we sure we need all these barriers? Spinlock may end up less
expensive... Anyway, Oleg Nesterov knows more than anyone about data
coherency issues (CCed).

--
Dmitry

2010-06-04 06:59:23

by Dmitry Torokhov

Subject: Re: [PATCH 0/4] input: evdev: Dynamic buffers (rev3)

Hi Henrik,

On Thu, Jun 03, 2010 at 10:00:58AM +0200, Henrik Rydberg wrote:
> Dmitry,
>
> Please find enclosed the third version of the evdev buffer patches.
>
> This version has one more patch; the locking mechanism has been broken
> out into its own file, buflock.h, adding a fair amount of
> documentation. The mechanism has been moderately tested, showing
> graceful dropping of packets as the buffer size gets smaller and the
> number of simultaneous readers gets larger.
>
> In the second patch, the per-client buffer is replaced by the buflock
> reader, and the spinlock name is changed to reflect the reduced area
> it covers. The third patch only has trivial changes, and the fourth
> patch is unchanged, but included for completeness.
>

I think the patchset moves things in the right direction, however I am
unsure at the moment about the buflock implementation. Any chance you could
redo the series without the buflock (with readers taking dev->event_lock
for now), and once buflock is sorted out we can switch to it as a followup?

Thanks.

--
Dmitry

2010-06-04 08:43:46

by Henrik Rydberg

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

Hi Dmitry,
>> +struct buflock_writer {
>> + unsigned int head;
>> + unsigned int next_head;
>> +};
>
> Since there can be only one writer thread should we just create "struct
> buflock" and pull head and next head into it along with the buffer
> itself and element size?

It is possible, but there are some arguments against it:

1. What type to give the buffer?
2. Static or dynamic buffering?
3. Can size be both a compile-time constant and a variable?

In short, I think that by _not_ including the actual buffer, the method
ultimately becomes more useful.

> Also, maybe we could extend kfifo with the notion of multiple readers?

If merging the data and algorithm as you suggest, that would be a logical step,
yes. To me, the most ideal would be to split the kfifo into data, writers
and readers. But that would require API changes.

>
> In any case, it shoudl not live in include/linux/input/ as it may be
> useful ouside of input subsystem.

Agreed.

>
>> +
>> +struct buflock_reader {
>> + unsigned int head;
>> + unsigned int tail;
>> +};
>> +
>> +/*
>> + * Write to buffer without locking
>
> Implies that there is an option of doing so with locking. Juts change to
> write. Also please use standard kerneldoc-style markups.

Ok.

>
>> + *
>> + * bw - the buflock_writer keeping track of the write position
>> + * buf - the buffer to write to (array of item type)
>> + * size - the size of the circular buffer (must be a power of two)
>> + * item - the item to write
>> + *
>> + * There is no locking involved during write, so this method is
>> + * suitable to use in interrupt context.
>
> This is a misleading statement. You can say that this operation does not
> sleep and thus is suitable for use in atomic contexts.

Ok.

>
>> + */
>> +#define buflock_write(bw, buf, size, item) \
>> + do { \
>> + bw.next_head = (bw.head + 1) & ((size) - 1); \
>> + smp_wmb(); \
>
> Why do we need the write barrier here?

reader_loads_next_head
-> interrupt modifying next_head then the buffer then the head
reader_loads_buffer
reader_loads_head

In this scenario, the reader ends up seeing next_head < head, but the position
written was next_head. The reader will get a false picture of which portion of
the buffer was modified.

>
>> + buf[bw.head] = item; \
>> + smp_wmb(); \
>
> I think this is the only barrier that is needed. You want to make sure
> that we advance head only after we complete write. Also, why SMP only?
> Can't we get into trouble if we rearrange writes and take interrupt and
> schedule away from this thread?

This would be true for a single-reader fifo, if we do not care about what
happens when the buffer wraps around. Regarding reordering, my impression was
that this cannot happen across smp_wmb(), but I might very well be wrong.

>
>> + bw.head = bw.next_head; \
>> + smp_wmb(); \
>
> Why do we need the write barrier here?

This is following the pattern of seqlocks. My understanding is that since we
will later rely on head being written, the last smp_wmb() is "for the road".

>
>> + } while (0)
>> +
>
> This (and the rest) should be a static inline function so that we have
> type safety, etc, etc.

And this is precisely what I wanted to avoid by not including the buffer in the
buflock structures.

>
>> +
>> +/*
>> + * Syncronize reader with current writer
>> + *
>> + * br - the buflock_reader keeping track of the read position
>> + * bw - the buflock_writer keeping track of the write position
>> + *
>> + * Synchronize the reader head with the writer head, effectively
>> + * telling the reader thread that there is new data to read.
>> + *
>> + * The reader head will always follow the writer head. As a
>> + * consequence, the number of items stored in the read buffer might
>> + * decrease during sync, as an effect of wrap-around. To avoid
>> + * non-deterministic behavior during polls, the read buffer is
>> + * guaranteed to be non-empty after synchronization.
>> + *
>> + */
>> +#define buflock_sync_reader(br, bw) \
>> + do { \
>> + if (br.tail != bw.head) \
>> + br.head = bw.head; \
>
> Why condition? Simple assignment is cheaper.

The condition takes care of a problem that is present also in the current evdev
code: when the buffer is very small and wraps around a lot, it may well be that
a write increases the head so that head == tail. If this happens between the
point where a poll is triggered and the point where the data is actually read,
there will be no data
to read. In an application like "cat", this will close the file and the program
will exit.

By ensuring that the writer never creates a situation where head == tail, this
problem is avoided.

>
>> + } while (0)
>> +
>> +/*
>> + * True if reader is empty
>> + *
>> + * br - the buflock_reader keeping track of the read position
>> + *
>> + */
>> +#define buflock_reader_empty(br) (br.head == br.tail)
>> +
>> +/*
>> + * Read from buffer, retry during wrap-around
>> + *
>> + * br - the buflock_reader keeping track of the read position
>> + * bw - the buflock_writer keeping track of the write position
>> + * buf - the buffer to read from (array of item type)
>> + * size - the size of the circular buffer (must be a power of two)
>> + * item - the item to read to
>> + *
>> + * Read the oldest item available in the buffer.
>> + *
>> + * During normal operation, with adequate buffer size, this method will not
>> + * block, regardless of the number of concurrent readers. The method will
>> + * only block momentarily during a write to the same position being read
>> + * from, which happens when the buffer gets full. In such cases, the value
>> + * eventually read will be the new value written to the buffer.
>> + *
>> + */
>> +#define buflock_read(br, bw, buf, size, item) \
>> + do { \
>> + unsigned int _h, _nh; \
>> + do { \
>> + _h = bw.head; \
>> + smp_rmb(); \
>> + item = buf[br.tail]; \
>> + smp_rmb(); \
>> + _nh = bw.next_head; \
>> + smp_rmb(); \
>> + } while (unlikely(br.tail - _h < _nh - _h)); \
>> + br.tail = (br.tail + 1) & ((size) - 1); \
>> + } while (0)
>
> Again, are we sure we need all these barriers? Spinlock may end up less
> expensive... Anyway, Oleg Nesterov knows more than anyone about data
> coherency issues (CCed).

These barriers I am less certain of, so additional eyes would be very helpful.

Thanks,
Henrik

2010-06-04 16:11:57

by Henrik Rydberg

Subject: Re: [PATCH 0/4] input: evdev: Dynamic buffers (rev3)

Dmitry Torokhov wrote:
> Hi Henrik,
[...]
>
> I think the patchet moves things in the right direction, however I am
> unsure at the moment about buflock implementation. Any chance you coudl
> redo the series without the buflock (with readers taking dev->event_lock
> for now) and once buflock is sorted we can switch to is as a followup).
>
> Thanks.
>

Good idea, I will do that.

Thanks,
Henrik

2010-06-04 16:36:31

by Dmitry Torokhov

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

On Fri, Jun 04, 2010 at 10:43:33AM +0200, Henrik Rydberg wrote:
> Hi Dmitry,
> >> +struct buflock_writer {
> >> + unsigned int head;
> >> + unsigned int next_head;
> >> +};
> >
> > Since there can be only one writer thread should we just create "struct
> > buflock" and pull head and next head into it along with the buffer
> > itself and element size?
>
> It is possible, but there are some arguments against it:
>
> 1. What type to give the buffer?

u8.

> 2. Static or dynamic buffering?

You mean resizeable?

> 3. Can size be both a compile-time constant and a variable?
>

Obviously not compile-time only.

> In short, I think that by _not_ including the actual buffer, the method
> ultimately becomes more useful.
>
> > Also, maybe we could extend kfifo with the notion of multiple readers?
>
> If merging the data and algorithm as you suggest, that would be a logical step,
> yes. To me, the most ideal would be to modify split the kfifo into data, writers
> and readers. But that would require api changes.
>
> >
> > In any case, it shoudl not live in include/linux/input/ as it may be
> > useful ouside of input subsystem.
>
> Agreed.
>
> >
> >> +
> >> +struct buflock_reader {
> >> + unsigned int head;
> >> + unsigned int tail;
> >> +};
> >> +
> >> +/*
> >> + * Write to buffer without locking
> >
> > Implies that there is an option of doing so with locking. Juts change to
> > write. Also please use standard kerneldoc-style markups.
>
> Ok.
>
> >
> >> + *
> >> + * bw - the buflock_writer keeping track of the write position
> >> + * buf - the buffer to write to (array of item type)
> >> + * size - the size of the circular buffer (must be a power of two)
> >> + * item - the item to write
> >> + *
> >> + * There is no locking involved during write, so this method is
> >> + * suitable to use in interrupt context.
> >
> > This is a misleading statement. You can say that this operation does not
> > sleep and thus is suitable for use in atomic contexts.
>
> Ok.
>
> >
> >> + */
> >> +#define buflock_write(bw, buf, size, item) \
> >> + do { \
> >> + bw.next_head = (bw.head + 1) & ((size) - 1); \
> >> + smp_wmb(); \
> >
> > Why do we need the write barrier here?
>
> reader_loads_next_head
> -> interrupt modifying next_head then the buffer then the head
> reader_loads_buffer
> reader_loads_head
>
> In this scenario, the reader ends up seeing next_head < head, but the position
> written was next_head. The reader will get a false picture of which portion of
> the buffer was modified.

I see.

>
> >
> >> + buf[bw.head] = item; \
> >> + smp_wmb(); \
> >
> > I think this is the only barrier that is needed. You want to make sure
> > that we advance head only after we complete write. Also, why SMP only?
> > Can't we get into trouble if we rearrange writes and take interrupt and
> > schedule away from this thread?
>
> This would be true for a single-reader fifo, if we do not care about what
> happens when the buffer wraps around. Regarding reordering, my impression was
> that this cannot happen across smp_wmb(), but I might very well be wrong.
>
> >
> >> + bw.head = bw.next_head; \
> >> + smp_wmb(); \
> >
> > Why do we need the write barrier here?
>
> This is following the pattern of seqlocks. My understanding is that since we
> will later rely on head being written, the last smp_wmb() is "for the road".
>
> >
> >> + } while (0)
> >> +
> >
> > This (and the rest) should be a static inline function so that we have
> > type safety, etc, etc.
>
> And this is precisely what I wanted to avoid by not including the buffer in the
> buflock structures.
>
> >
> >> +
> >> +/*
> >> + * Syncronize reader with current writer
> >> + *
> >> + * br - the buflock_reader keeping track of the read position
> >> + * bw - the buflock_writer keeping track of the write position
> >> + *
> >> + * Synchronize the reader head with the writer head, effectively
> >> + * telling the reader thread that there is new data to read.
> >> + *
> >> + * The reader head will always follow the writer head. As a
> >> + * consequence, the number of items stored in the read buffer might
> >> + * decrease during sync, as an effect of wrap-around. To avoid
> >> + * non-deterministic behavior during polls, the read buffer is
> >> + * guaranteed to be non-empty after synchronization.
> >> + *
> >> + */
> >> +#define buflock_sync_reader(br, bw) \
> >> + do { \
> >> + if (br.tail != bw.head) \
> >> + br.head = bw.head; \
> >
> > Why condition? Simple assignment is cheaper.
>

Ah, crap, I misread the condition... Anyway, thanks for the
explanation.

> The condition takes care of a problem that is present also in the current evdev
> code: When the buffer is very small and wraps around a lot, it may well be that
> a write increases the head so that head == tail. If this happens between a point
> where a poll is triggered and the actual data being read, there will be no data
> to read. In an application like "cat", this will close the file and the program
> will exit.
>
> By ensuring that the writer never creates a situation where head == tail, this
> problem is avoided.
>
> >
> >> + } while (0)
> >> +
> >> +/*
> >> + * True if reader is empty
> >> + *
> >> + * br - the buflock_reader keeping track of the read position
> >> + *
> >> + */
> >> +#define buflock_reader_empty(br) (br.head == br.tail)
> >> +
> >> +/*
> >> + * Read from buffer, retry during wrap-around
> >> + *
> >> + * br - the buflock_reader keeping track of the read position
> >> + * bw - the buflock_writer keeping track of the write position
> >> + * buf - the buffer to read from (array of item type)
> >> + * size - the size of the circular buffer (must be a power of two)
> >> + * item - the item to read to
> >> + *
> >> + * Read the oldest item available in the buffer.
> >> + *
> >> + * During normal operation, with adequate buffer size, this method will not
> >> + * block, regardless of the number of concurrent readers. The method will
> >> + * only block momentarily during a write to the same position being read
> >> + * from, which happens when the buffer gets full. In such cases, the value
> >> + * eventually read will be the new value written to the buffer.
> >> + *
> >> + */
> >> +#define buflock_read(br, bw, buf, size, item) \
> >> + do { \
> >> + unsigned int _h, _nh; \
> >> + do { \
> >> + _h = bw.head; \
> >> + smp_rmb(); \
> >> + item = buf[br.tail]; \
> >> + smp_rmb(); \
> >> + _nh = bw.next_head; \
> >> + smp_rmb(); \
> >> + } while (unlikely(br.tail - _h < _nh - _h)); \
> >> + br.tail = (br.tail + 1) & ((size) - 1); \
> >> + } while (0)
> >
> > Again, are we sure we need all these barriers? Spinlock may end up less
> > expensive... Anyway, Oleg Nesterov knows more than anyone about data
> > coherency issues (CCed).
>
> These barriers I am less certain of, so additional eyes would be very helpful.
>
> Thanks,
> Henrik
>

--
Dmitry

2010-06-04 16:37:06

by Henrik Rydberg

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

>> +#define buflock_write(bw, buf, size, item) \
>> + do { \
>> + bw.next_head = (bw.head + 1) & ((size) - 1); \
>> + smp_wmb(); \
>
> Why do we need the write barrier here?

I believe my first answer to this question was foggy indeed, so allow me to try
again, with a timeline:

Scenario 1, correct write order:

writer store_next_head store_buf store_head
reader load_head load_buf load_next_head

Result: head != next_head, incoherent read detected

Scenario 2, incorrect write order:

writer store_buf store_next_head store_head
reader load_head load_buf load_next_head

Result: head == next_head, incoherent read not detected

Based on the assumption that scenario 2 could happen if the smp_wmb() is not
present, the barrier is needed.

Henrik

2010-06-04 17:08:17

by Jonathan Cameron

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

On 06/04/10 09:43, Henrik Rydberg wrote:
> Hi Dmitry,
>>> +struct buflock_writer {
>>> + unsigned int head;
>>> + unsigned int next_head;
>>> +};
>>
>> Since there can be only one writer thread should we just create "struct
>> buflock" and pull head and next head into it along with the buffer
>> itself and element size?
>
> It is possible, but there are some arguments against it:
>
> 1. What type to give the buffer?
> 2. Static or dynamic buffering?
> 3. Can size be both a compile-time constant and a variable?
>
> In short, I think that by _not_ including the actual buffer, the method
> ultimately becomes more useful.
>
>> Also, maybe we could extend kfifo with the notion of multiple readers?
>
> If merging the data and algorithm as you suggest, that would be a logical step,
> yes. To me, the most ideal would be to modify split the kfifo into data, writers
> and readers. But that would require api changes.
>
>>
>> In any case, it shoudl not live in include/linux/input/ as it may be
>> useful ouside of input subsystem.
>
> Agreed.
I've just opened a debate on linux-iio about whether we want our event infrastructure to
support multiple readers. If enough people care, then this looks like some infrastructure
we will be wanting to use as well, so I would definitely support putting this outside
of input.

Jonathan

2010-06-04 19:15:27

by Oleg Nesterov

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

On 06/04, Henrik Rydberg wrote:
>
> additional eyes would be very helpful

I am puzzled. I don't understand what this patch does at all ;)

Could you please provide a simple example or answer my questions?

> >> + * During normal operation, with adequate buffer size, this method will not
> >> + * block, regardless of the number of concurrent readers.

I don't understand this "regardless of the number of concurrent" comment.
buflock_read(br) modifies br.tail, it can't be used lockless.

Or, do you mean that every reader should use its own buflock_reader?

If yes. Afaics, we have one buflock_writer, and one buf "connected"
to this buflock_writer. In that case I don't understand why this
buf doesn't live in "struct buflock_writer", it can be char[].
This way both read/write macros do not need buf and size args.
typeof(item) could be used for read/write into this buf.

But still I can't understand how this all works.

> >> +#define buflock_read(br, bw, buf, size, item) \
> >> + do { \
> >> + unsigned int _h, _nh; \
> >> + do { \
> >> + _h = bw.head; \
> >> + smp_rmb(); \
> >> + item = buf[br.tail]; \
> >> + smp_rmb(); \
> >> + _nh = bw.next_head; \
> >> + smp_rmb(); \
> >> + } while (unlikely(br.tail - _h < _nh - _h)); \
> >> + br.tail = (br.tail + 1) & ((size) - 1); \
> >> + } while (0)

How can the reader know there is something new/valid in buf it
can read?

I guess it should call buflock_sync_reader() at least once, to
"attach" to the writer/buf, and then check buflock_reader_empty() ?

But. If the reader calls buflock_read() constantly, sooner or
later buflock_reader_empty() becomes T again.

Probably the reader should call buflock_sync_reader() + check
buflock_reader_empty() == F every time before buflock_read() ?

In this case I do not understand why we have 2 separate
helpers, and why we need buflock_reader->head.

Perhaps it is the writer who calls buflock_sync_reader() and tells
the reader it has the new data? In this case I do not understand
the "feeding many readers" part.



And in any case I do not understand what guarantees this API
provides.

Whatever we do, buflock_read() can race with the writer and read
the invalid item.

Suppose that buflock_read(br, item) gets preempted "inside" the
item = buf[br.tail] assignment.

The writer calls buflock_write() SIZE times.

The reader resumes, continues its memcpy() operation, and succeeds.

But the "item" it returns is not consistent, it is neither the old
value nor the new.

No?

Oleg.

2010-06-04 19:43:20

by Henrik Rydberg

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

Hi Oleg,

Thank you very much for looking at this.

> I am puzzled. I don't understand what this patch does at all ;)
>
> Could you please provide a simple example or answer my questions?

I will do my best. It seems you figured out most of it even without the evdev
example for which it was created. But additional usage documentation ought to be
in place, in other words. Noted.
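
For reference, the intended usage is roughly the following (an evdev-style
sketch; SIZE and the surrounding declarations are made up for the example):

        struct buflock_writer bw;
        struct buflock_reader br;       /* one instance per reader */
        struct input_event buf[SIZE];   /* SIZE must be a power of two */
        struct input_event ev;

        /* writer side, e.g. the event interrupt path */
        buflock_write(bw, buf, SIZE, ev);
        buflock_sync_reader(br, bw);    /* repeated for each attached reader */

        /* reader side */
        if (!buflock_reader_empty(br))
                buflock_read(br, bw, buf, SIZE, ev);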

>>>> + * During normal operation, with adequate buffer size, this method will not
>>>> + * block, regardless of the number of concurrent readers.
>
> I don't understand this "regardless of the number of concurrent" comment.
> buflock_read(br) modifies br.tail, it can't be used lockless.
>
> Or, do you mean that every reader should use its own buflock_reader?

Yes.

> If yes. Afaics, we have one buflock_writer, and one buf "connected"
> to this buflock_writer. In that case I don't understand why this
> buf doesn't live in "struct buflock_writer", it can be char[].
> This way both read/write macros do not need buf and size args.
> typeof(item) could be used for read/write into this buf.

Both Dmitry and yourself say the same thing here, also noted. If looking for an
explanation anyway, it would go something like "do not automatically put things
together if not absolutely necessary".

> But still I can't understand how this all works.
>
>>>> +#define buflock_read(br, bw, buf, size, item) \
>>>> + do { \
>>>> + unsigned int _h, _nh; \
>>>> + do { \
>>>> + _h = bw.head; \
>>>> + smp_rmb(); \
>>>> + item = buf[br.tail]; \
>>>> + smp_rmb(); \
>>>> + _nh = bw.next_head; \
>>>> + smp_rmb(); \
>>>> + } while (unlikely(br.tail - _h < _nh - _h)); \
>>>> + br.tail = (br.tail + 1) & ((size) - 1); \
>>>> + } while (0)
>
> How can the reader know there is something new/valid in buf it
> can read?
>
> I guess it should call buflock_sync_reader() at least once, to
> "attach" to the writer/buf, and then check buflock_reader_empty() ?
>
> But. If the reader calls buflock_read() constantly, sooner or
> later buflock_reader_empty() becomes T again.
>
> Probably the reader should call buflock_sync_reader() + check
> buflock_reader_empty() == F every time before buflock_read() ?
>
> In this case I do not understand why do we have 2 separate
> helpers, and why do we need buflock_reader->head.
>
> Perhaps it is writer who calls buflock_sync_reader() and tells
> the reader it has the new data? In this case I do not understand
> the "feeding many readers" part.

Yes, the writer thread "synchronizes" the readers. Having separate reader/writer
heads helps minimize the contact area between threads. There is also a
practical reason, in that the writer is able to choose which readers should get
data. In the input layer, this maps to the notion of grabbing a device.
Admittedly a bit of a special case.

> And in any case I do not understand which guarantees this API
> provides.
>
> Whatever we do, buflock_read() can race with the writer and read
> the invalid item.

Yep.

> Suppose that buflock_read(br, item) gets the preemption "inside" of
> item = buf[br.tail] asignment.
>
> The writer calls buflock_write() SIZE times.
>
> The reader resumes, continues its memcpy() operation, and suceeds.
>
> But the "item" it returns is not consistent, it is neither the old
> value nor the new.

True. However, one could argue this is a highly unlikely case given the
(current) usage. Or, one could remedy it by not wrapping the indexes modulo SIZE.
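
That is, let head and tail run freely and mask only at the array access, so
the reader can tell that the writer has lapped it even after a full
wrap-around. A rough, untested sketch (ignoring the 2^32 counter wrap):

        /* writer */
        bw.next_head = bw.head + 1;
        smp_wmb();
        buf[bw.head & (size - 1)] = item;
        smp_wmb();
        bw.head = bw.next_head;

        /* reader */
        unsigned int nh;
        do {
                item = buf[br.tail & (size - 1)];
                smp_rmb();
                nh = bw.next_head;
        } while (unlikely(nh - br.tail > size));
        br.tail++;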

>
> No?

Yes. :-)

Regarding the barriers used in the code, would it be possible to get a picture
of exactly how bad those operations are for performance? Is it true that a
simple spinlock might be faster on average, for instance? Do you see any other
solutions?

Thanks,
Henrik

2010-06-05 01:37:08

by Andrew Morton

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

On Thu, 3 Jun 2010 10:00:59 +0200 "Henrik Rydberg" <[email protected]> wrote:

> In spite of the many lock patterns and fifo helpers in the kernel, the
> case of a single writer feeding many readers via a circular buffer
> seems to be uncovered. This patch adds the buflock, a minimalistic
> interface implementing SMP-safe locking for such a buffer. Under
> normal operation, given adequate buffer size, the operation is
> lock-less. The template is given the name buflock to emphasize that
> the locking depends on the buffer read/write clashes.
>

Seems that reviewers have already covered most of the oddities.

> +/*
> + * Write to buffer without locking
> + *
> + * bw - the buflock_writer keeping track of the write position
> + * buf - the buffer to write to (array of item type)
> + * size - the size of the circular buffer (must be a power of two)
> + * item - the item to write
> + *
> + * There is no locking involved during write, so this method is
> + * suitable to use in interrupt context.
> + */

And if the buffer fills up, it silently overwrites old data?

There are many options in this sort of thing. Certain choices have
been made here, and they should be spelled out exhaustively, please.

> +#define buflock_write(bw, buf, size, item) \
> + do { \
> + bw.next_head = (bw.head + 1) & ((size) - 1); \
> + smp_wmb(); \
> + buf[bw.head] = item; \
> + smp_wmb(); \
> + bw.head = bw.next_head; \
> + smp_wmb(); \
> + } while (0)

I don't think there's a reason why these all had to be implemented as
bloaty, un-typesafe macros? Especially as buggy ones which reference
their arguments multiple times!

Code it in C if possible, please.

2010-06-05 11:21:59

by Henrik Rydberg

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

Andrew Morton wrote:
> On Thu, 3 Jun 2010 10:00:59 +0200 "Henrik Rydberg" <[email protected]> wrote:
>
>> In spite of the many lock patterns and fifo helpers in the kernel, the
>> case of a single writer feeding many readers via a circular buffer
>> seems to be uncovered. This patch adds the buflock, a minimalistic
>> interface implementing SMP-safe locking for such a buffer. Under
>> normal operation, given adequate buffer size, the operation is
>> lock-less. The template is given the name buflock to emphasize that
>> the locking depends on the buffer read/write clashes.
>>
>
> Seems that reviewers have already covered most of the oddities.
>
>> +/*
>> + * Write to buffer without locking
>> + *
>> + * bw - the buflock_writer keeping track of the write position
>> + * buf - the buffer to write to (array of item type)
>> + * size - the size of the circular buffer (must be a power of two)
>> + * item - the item to write
>> + *
>> + * There is no locking involved during write, so this method is
>> + * suitable to use in interrupt context.
>> + */
>
> And if the buffer fills up, it silently overwrites old data?
>
> There are many options in this sort of thing. Certain choices have
> been made here and they should be spelled out exhaustively please.
>
>> +#define buflock_write(bw, buf, size, item) \
>> + do { \
>> + bw.next_head = (bw.head + 1) & ((size) - 1); \
>> + smp_wmb(); \
>> + buf[bw.head] = item; \
>> + smp_wmb(); \
>> + bw.head = bw.next_head; \
>> + smp_wmb(); \
>> + } while (0)
>
> I don't think there's a reason why these all had to be implemented as
> bloaty, un-typesafe macros? Especially as buggy ones which reference
> their arguments multiple times!
>
> Code it in C if possible, please.

Thanks Andrew, Dmitry, Oleg and Jonathan for your reviews. Next round of the
buflock stuff will be targeting the wider audience in include/linux/.

Henrik

2010-06-05 17:42:21

by Oleg Nesterov

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

Hi Henrik,

On 06/04, Henrik Rydberg wrote:
>
> But additional usage documentation ought to be
> in place, in other words. Noted.

Yes, thanks.

In particular, a small example (even in pseudo-code) in buflock.h can help
the reader quickly understand how this buflock actually works.

> > Whatever we do, buflock_read() can race with the writer and read
> > the invalid item.
>
> True. However, one could argue this is a highly unlikely case given the
> (current) usage.

Agreed, but then I'd strongly suggest you document this in the header.
Potential users of this API should know the limitations.

> Or, one could remedy it by not wrapping the indexes modulo SIZE.

You mean, change the implementation? Yes.

One more question. As you rightly pointed out, this is similar to seqlocks.
Did you consider the option to merely use them?

IOW,
struct buflock_writer {
        seqcount_t lock;
        unsigned int head;
};

In this case the implementation is obvious and correct.
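
Roughly (untested, just to illustrate the seqcount variant):

        /* writer */
        write_seqcount_begin(&bw.lock);
        buf[bw.head] = item;
        bw.head = (bw.head + 1) & (size - 1);
        write_seqcount_end(&bw.lock);

        /* reader */
        unsigned int seq;
        do {
                seq = read_seqcount_begin(&bw.lock);
                item = buf[br.tail];
        } while (read_seqcount_retry(&bw.lock, seq));
        br.tail = (br.tail + 1) & (size - 1);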

Afaics, compared to the current implementation it has only one drawback:
the reader has to restart if it races with any write, while with your
code it only restarts if the writer writes to the item we are trying
to read.

> Regarding the barriers used in the code, would it be possible to get a picture
> of exactly how bad those operations are for performance?

Oh, sorry, I don't know, and this obviously differs depending on arch.
I never knew how these barriers actually work in hardware, just have
foggy ideas about the "side effects" they have ;)

And I agree with Dmitry, the last smp_Xmb() in buflock_write/read looks
unneeded. Both helpers do not care about the subsequent LOAD/STORE's.

write_seqcount_begin() has the "final" wmb, yes. But this is because
it does care. We are going to modify something under this write_lock,
the result of these subsequent STORE's shouldn't be visible to the reader
before it sees the result of ++sequence.

> Is it true that a
> simple spinlock might be faster on average, for instance?

Maybe. But without spinlocks the writer can never be delayed by a
reader. I guess this was your motivation.

Oleg.

2010-06-05 18:34:11

by Henrik Rydberg

Subject: Re: [PATCH 1/4] input: Introduce buflock, a one-to-many circular buffer mechanism

Hi Oleg,

thanks for having another look at this.

[...]
>>> Whatever we do, buflock_read() can race with the writer and read
>>> the invalid item.
>> True. However, one could argue this is a highly unlikely case given the
>> (current) usage.
>
> Agreed, but then I'd strongly suggest you to document this in the header.
> The possible user of this API should know the limitations.
>
>> Or, one could remedy it by not wrapping the indexes modulo SIZE.
>
> You mean, change the implementation? Yes.

I feel this is the only option now.

> One more question. As you rightly pointed out, this is similar to seqlocks.
> Did you consider the option to merely use them?
>
> IOW,
> struct buflock_writer {
> seqcount_t lock;
> unsigned int head;
> };
>
> In this case the implementation is obvious and correct.
>
> Afaics, compared to the current implentation it has the only drawback:
> the reader has to restart if it races with any write, while with your
> code it only restarts if the writer writes to the item we are trying
> to read.

Yes, I did consider it, but it is suboptimal. :-)

We fixed the immediate problem in another (worse but simpler) way, so this
implementation is now pursued more out of academic interest.

>> Regarding the barriers used in the code, would it be possible to get a picture
>> of exactly how bad those operations are for performance?
>
> Oh, sorry I don't know, and this obvioulsy differs depending on arch.
> I never Knew how these barriers actually work in hardware, just have
> the foggy ideas about the "side effects" they have ;)
>
> And I agree with Dmitry, the last smp_Xmb() in buflock_write/read looks
> unneeded. Both helpers do not care about the subsequent LOAD/STORE's.
>
> write_seqcount_begin() has the "final" wmb, yes. But this is because
> it does care. We are going to modify something under this write_lock,
> the result of these subsequent STORE's shouldn't be visible to reader
> before it sees the result of ++sequence.

The relation between storing the writer head and synchronizing the reader head
is similar in structure, in my view. On the other hand, it might be possible to
remove one of the writer heads altogether, which would make things simpler still.

>> Is it true that a
>> simple spinlock might be faster on average, for instance?
>
> May be. But without spinlock's the writer can be never delayed by
> reader. I guess this was your motivation.

Yes, one of them. The other was a lock where readers do not wait for each other.

Thanks!

Henrik