2013-07-08 13:20:40

by Eliezer Tamir

[permalink] [raw]
Subject: [PATCH net-next] net: rename low latency sockets functions to busy poll

Rename functions in include/net/ll_poll.h to busy wait.
Clarify documentation about expected power use increase.
Rename POLL_LL to POLL_BUSY_LOOP.
Add need_resched() testing to poll/select busy loops.

Note, that in select and poll can_busy_poll is dynamic and is
updated continuously to reflect the existence of supported
sockets with valid queue information.

Signed-off-by: Eliezer Tamir <[email protected]>
---

Documentation/sysctl/net.txt | 12 +++++---
fs/select.c | 60 ++++++++++++++++++++++++---------------
include/net/ll_poll.h | 46 ++++++++++++++++--------------
include/uapi/asm-generic/poll.h | 2 +
net/core/datagram.c | 3 +-
net/ipv4/tcp.c | 6 ++--
net/socket.c | 12 ++++----
7 files changed, 80 insertions(+), 61 deletions(-)

diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
index e658bbf..7323b88 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.txt
@@ -53,22 +53,24 @@ Default: 64
low_latency_read
----------------
Low latency busy poll timeout for socket reads. (needs CONFIG_NET_LL_RX_POLL)
-Approximate time in us to spin waiting for packets on the device queue.
+Approximate time in us to busy loop waiting for packets on the device queue.
This sets the default value of the SO_LL socket option.
-Can be set or overridden per socket by setting socket option SO_LL.
-Recommended value is 50. May increase power usage.
+Can be set or overridden per socket by setting socket option SO_LL, which is
+the preferred method of enabling.
+If you need to enable the feature globally via sysctl, a value of 50 is recommended.
+Will increase power usage.
Default: 0 (off)

low_latency_poll
----------------
Low latency busy poll timeout for poll and select. (needs CONFIG_NET_LL_RX_POLL)
-Approximate time in us to spin waiting for packets on the device queue.
+Approximate time in us to busy loop waiting for events.
Recommended value depends on the number of sockets you poll on.
For several sockets 50, for several hundreds 100.
For more than that you probably want to use epoll.
Note that only sockets with SO_LL set will be busy polled, so you want to either
selectively set SO_LL on those sockets or set sysctl.net.low_latency_read globally.
-May increase power usage.
+Will increase power usage.
Default: 0 (off)

rmem_default
diff --git a/fs/select.c b/fs/select.c
index f28a585..25cac5f 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -402,9 +402,9 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
poll_table *wait;
int retval, i, timed_out = 0;
unsigned long slack = 0;
- unsigned int ll_flag = ll_get_flag();
- u64 ll_start = ll_start_time(ll_flag);
- u64 ll_time = ll_run_time();
+ unsigned int busy_flag = net_busy_loop_on() ? POLL_BUSY_LOOP : 0;
+ u64 busy_start = busy_loop_start_time(busy_flag);
+ u64 busy_end = busy_loop_end_time();

rcu_read_lock();
retval = max_select_fd(n, fds);
@@ -427,7 +427,7 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
retval = 0;
for (;;) {
unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;
- bool can_ll = false;
+ bool can_busy_loop = false;

inp = fds->in; outp = fds->out; exp = fds->ex;
rinp = fds->res_in; routp = fds->res_out; rexp = fds->res_ex;
@@ -456,7 +456,7 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
mask = DEFAULT_POLLMASK;
if (f_op && f_op->poll) {
wait_key_set(wait, in, out,
- bit, ll_flag);
+ bit, busy_flag);
mask = (*f_op->poll)(f.file, wait);
}
fdput(f);
@@ -475,11 +475,18 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
retval++;
wait->_qproc = NULL;
}
- if (mask & POLL_LL)
- can_ll = true;
/* got something, stop busy polling */
- if (retval)
- ll_flag = 0;
+ if (retval) {
+ can_busy_loop = false;
+ busy_flag = 0;
+
+ /*
+ * only remember a returned
+ * POLL_BUSY_LOOP if we asked for it
+ */
+ } else if (busy_flag & mask)
+ can_busy_loop = true;
+
}
}
if (res_in)
@@ -498,8 +505,9 @@ int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
break;
}

- /* only if on, have sockets with POLL_LL and not out of time */
- if (ll_flag && can_ll && can_poll_ll(ll_start, ll_time))
+ /* only if found POLL_BUSY_LOOP sockets && not out of time */
+ if (!need_resched() && can_busy_loop &&
+ busy_loop_range(busy_start, busy_end))
continue;

/*
@@ -734,7 +742,8 @@ struct poll_list {
* if pwait->_qproc is non-NULL.
*/
static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait,
- bool *can_ll, unsigned int ll_flag)
+ bool *can_busy_poll,
+ unsigned int busy_flag)
{
unsigned int mask;
int fd;
@@ -748,10 +757,10 @@ static inline unsigned int do_pollfd(struct pollfd *pollfd, poll_table *pwait,
mask = DEFAULT_POLLMASK;
if (f.file->f_op && f.file->f_op->poll) {
pwait->_key = pollfd->events|POLLERR|POLLHUP;
- pwait->_key |= ll_flag;
+ pwait->_key |= busy_flag;
mask = f.file->f_op->poll(f.file, pwait);
- if (mask & POLL_LL)
- *can_ll = true;
+ if (mask & busy_flag)
+ *can_busy_poll = true;
}
/* Mask out unneeded events. */
mask &= pollfd->events | POLLERR | POLLHUP;
@@ -770,9 +779,10 @@ static int do_poll(unsigned int nfds, struct poll_list *list,
ktime_t expire, *to = NULL;
int timed_out = 0, count = 0;
unsigned long slack = 0;
- unsigned int ll_flag = ll_get_flag();
- u64 ll_start = ll_start_time(ll_flag);
- u64 ll_time = ll_run_time();
+ unsigned int busy_flag = net_busy_loop_on() ? POLL_BUSY_LOOP : 0;
+ u64 busy_start = busy_loop_start_time(busy_flag);
+ u64 busy_end = busy_loop_end_time();
+

/* Optimise the no-wait case */
if (end_time && !end_time->tv_sec && !end_time->tv_nsec) {
@@ -785,7 +795,7 @@ static int do_poll(unsigned int nfds, struct poll_list *list,

for (;;) {
struct poll_list *walk;
- bool can_ll = false;
+ bool can_busy_loop = false;

for (walk = list; walk != NULL; walk = walk->next) {
struct pollfd * pfd, * pfd_end;
@@ -800,10 +810,13 @@ static int do_poll(unsigned int nfds, struct poll_list *list,
* this. They'll get immediately deregistered
* when we break out and return.
*/
- if (do_pollfd(pfd, pt, &can_ll, ll_flag)) {
+ if (do_pollfd(pfd, pt, &can_busy_loop,
+ busy_flag)) {
count++;
pt->_qproc = NULL;
- ll_flag = 0;
+ /* found something, stop busy polling */
+ busy_flag = 0;
+ can_busy_loop = false;
}
}
}
@@ -820,8 +833,9 @@ static int do_poll(unsigned int nfds, struct poll_list *list,
if (count || timed_out)
break;

- /* only if on, have sockets with POLL_LL and not out of time */
- if (ll_flag && can_ll && can_poll_ll(ll_start, ll_time))
+ /* only if found POLL_BUSY_LOOP sockets && not out of time */
+ if (!need_resched() && can_busy_loop &&
+ busy_loop_range(busy_start, busy_end))
continue;

/*
diff --git a/include/net/ll_poll.h b/include/net/ll_poll.h
index 0d620ba..f14dd88 100644
--- a/include/net/ll_poll.h
+++ b/include/net/ll_poll.h
@@ -37,9 +37,9 @@ extern unsigned int sysctl_net_ll_poll __read_mostly;
#define LL_FLUSH_FAILED -1
#define LL_FLUSH_BUSY -2

-static inline unsigned int ll_get_flag(void)
+static inline bool net_busy_loop_on(void)
{
- return sysctl_net_ll_poll ? POLL_LL : 0;
+ return sysctl_net_ll_poll;
}

/* a wrapper to make debug_smp_processor_id() happy
@@ -47,7 +47,7 @@ static inline unsigned int ll_get_flag(void)
* we only care that the average is bounded
*/
#ifdef CONFIG_DEBUG_PREEMPT
-static inline u64 ll_sched_clock(void)
+static inline u64 busy_loop_sched_clock(void)
{
u64 rc;

@@ -58,7 +58,7 @@ static inline u64 ll_sched_clock(void)
return rc;
}
#else /* CONFIG_DEBUG_PREEMPT */
-static inline u64 ll_sched_clock(void)
+static inline u64 busy_loop_sched_clock(void)
{
return sched_clock();
}
@@ -67,7 +67,7 @@ static inline u64 ll_sched_clock(void)
/* we don't mind a ~2.5% imprecision so <<10 instead of *1000
* sk->sk_ll_usec is a u_int so this can't overflow
*/
-static inline u64 ll_sk_run_time(struct sock *sk)
+static inline u64 sk_busy_loop_end_time(struct sock *sk)
{
return (u64)ACCESS_ONCE(sk->sk_ll_usec) << 10;
}
@@ -75,27 +75,29 @@ static inline u64 ll_sk_run_time(struct sock *sk)
/* in poll/select we use the global sysctl_net_ll_poll value
* only call sched_clock() if enabled
*/
-static inline u64 ll_run_time(void)
+static inline u64 busy_loop_end_time(void)
{
return (u64)ACCESS_ONCE(sysctl_net_ll_poll) << 10;
}

-/* if flag is not set we don't need to know the time */
-static inline u64 ll_start_time(unsigned int flag)
+/* if flag is not set we don't need to know the time
+ * so we want to avoid a potentially expensive sched_clock()
+ */
+static inline u64 busy_loop_start_time(unsigned int flag)
{
- return flag ? ll_sched_clock() : 0;
+ return flag ? busy_loop_sched_clock() : 0;
}

-static inline bool sk_valid_ll(struct sock *sk)
+static inline bool sk_can_busy_loop(struct sock *sk)
{
return sk->sk_ll_usec && sk->sk_napi_id &&
!need_resched() && !signal_pending(current);
}

/* careful! time_in_range64 will evaluate now twice */
-static inline bool can_poll_ll(u64 start_time, u64 run_time)
+static inline bool busy_loop_range(u64 start_time, u64 run_time)
{
- u64 now = ll_sched_clock();
+ u64 now = busy_loop_sched_clock();

return time_in_range64(now, start_time, start_time + run_time);
}
@@ -103,10 +105,10 @@ static inline bool can_poll_ll(u64 start_time, u64 run_time)
/* when used in sock_poll() nonblock is known at compile time to be true
* so the loop and end_time will be optimized out
*/
-static inline bool sk_poll_ll(struct sock *sk, int nonblock)
+static inline bool sk_busy_loop(struct sock *sk, int nonblock)
{
- u64 start_time = ll_start_time(!nonblock);
- u64 run_time = ll_sk_run_time(sk);
+ u64 start_time = busy_loop_start_time(!nonblock);
+ u64 end_time = sk_busy_loop_end_time(sk);
const struct net_device_ops *ops;
struct napi_struct *napi;
int rc = false;
@@ -137,7 +139,7 @@ static inline bool sk_poll_ll(struct sock *sk, int nonblock)
LINUX_MIB_LOWLATENCYRXPACKETS, rc);

} while (!nonblock && skb_queue_empty(&sk->sk_receive_queue) &&
- can_poll_ll(start_time, run_time));
+ busy_loop_range(start_time, end_time));

rc = !skb_queue_empty(&sk->sk_receive_queue);
out:
@@ -158,27 +160,27 @@ static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
}

#else /* CONFIG_NET_LL_RX_POLL */
-static inline unsigned long ll_get_flag(void)
+static inline unsigned long net_busy_loop_on(void)
{
return 0;
}

-static inline u64 ll_start_time(unsigned int flag)
+static inline u64 busy_loop_start_time(unsigned int flag)
{
return 0;
}

-static inline u64 ll_run_time(void)
+static inline u64 busy_loop_end_time(void)
{
return 0;
}

-static inline bool sk_valid_ll(struct sock *sk)
+static inline bool sk_can_busy_loop(struct sock *sk)
{
return false;
}

-static inline bool sk_poll_ll(struct sock *sk, int nonblock)
+static inline bool sk_busy_poll(struct sock *sk, int nonblock)
{
return false;
}
@@ -191,7 +193,7 @@ static inline void sk_mark_ll(struct sock *sk, struct sk_buff *skb)
{
}

-static inline bool can_poll_ll(u64 start_time, u64 run_time)
+static inline bool busy_loop_range(u64 start_time, u64 run_time)
{
return false;
}
diff --git a/include/uapi/asm-generic/poll.h b/include/uapi/asm-generic/poll.h
index 4aee586..a969498 100644
--- a/include/uapi/asm-generic/poll.h
+++ b/include/uapi/asm-generic/poll.h
@@ -30,7 +30,7 @@

#define POLLFREE 0x4000 /* currently only for epoll */

-#define POLL_LL 0x8000
+#define POLL_BUSY_LOOP 0x8000

struct pollfd {
int fd;
diff --git a/net/core/datagram.c b/net/core/datagram.c
index 9cbaba9..6e9ab31 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -208,7 +208,8 @@ struct sk_buff *__skb_recv_datagram(struct sock *sk, unsigned int flags,
}
spin_unlock_irqrestore(&queue->lock, cpu_flags);

- if (sk_valid_ll(sk) && sk_poll_ll(sk, flags & MSG_DONTWAIT))
+ if (sk_can_busy_loop(sk) &&
+ sk_busy_loop(sk, flags & MSG_DONTWAIT))
continue;

/* User doesn't want to wait */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 46ed9af..15cbfa9 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1554,9 +1554,9 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
struct sk_buff *skb;
u32 urg_hole = 0;

- if (sk_valid_ll(sk) && skb_queue_empty(&sk->sk_receive_queue)
- && (sk->sk_state == TCP_ESTABLISHED))
- sk_poll_ll(sk, nonblock);
+ if (sk_can_busy_loop(sk) && skb_queue_empty(&sk->sk_receive_queue) &&
+ (sk->sk_state == TCP_ESTABLISHED))
+ sk_busy_loop(sk, nonblock);

lock_sock(sk);

diff --git a/net/socket.c b/net/socket.c
index 4da14cb..45afa64 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1148,7 +1148,7 @@ EXPORT_SYMBOL(sock_create_lite);
/* No kernel lock held - perfect */
static unsigned int sock_poll(struct file *file, poll_table *wait)
{
- unsigned int ll_flag = 0;
+ unsigned int busy_flag = 0;
struct socket *sock;

/*
@@ -1156,16 +1156,16 @@ static unsigned int sock_poll(struct file *file, poll_table *wait)
*/
sock = file->private_data;

- if (sk_valid_ll(sock->sk)) {
+ if (sk_can_busy_loop(sock->sk)) {
/* this socket can poll_ll so tell the system call */
- ll_flag = POLL_LL;
+ busy_flag = POLL_BUSY_LOOP;

/* once, only if requested by syscall */
- if (wait && (wait->_key & POLL_LL))
- sk_poll_ll(sock->sk, 1);
+ if (wait && (wait->_key & POLL_BUSY_LOOP))
+ sk_busy_loop(sock->sk, 1);
}

- return ll_flag | sock->ops->poll(file, sock, wait);
+ return busy_flag | sock->ops->poll(file, sock, wait);
}

static int sock_mmap(struct file *file, struct vm_area_struct *vma)


2013-07-08 16:37:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On Mon, Jul 8, 2013 at 6:20 AM, Eliezer Tamir
<[email protected]> wrote:
>
> - /* only if on, have sockets with POLL_LL and not out of time */
> - if (ll_flag && can_ll && can_poll_ll(ll_start, ll_time))
> + /* only if found POLL_BUSY_LOOP sockets && not out of time */
> + if (!need_resched() && can_busy_loop &&
> + busy_loop_range(busy_start, busy_end))
> continue;

Better, but still horribly ugly. I also suspect the need_resched()
test should be done after testing can_busy_loop, just in case the
compiler can avoid having to load things off the current pointer.

I also think that we should clear busy_flag if we don't continue, no?

I also don't understand why the code keeps both busy_start and
busy_end around. It all looks completely pointless. Why have two
variables, and why make the comparisons more complicated, and the code
look more complex than it actually is?

So the code *should* just do

if (need_busy_loop) {
if (!need_resched() && !busy_loop_timeout(start_time))
continue;
}
busy_flag = 0;

or something like that. Then, calculate busy_end there, since it's
just an addition. Keeping two 64-bit variables around not only makes
the code more complex, it generates worse code.

I also suspect there's a lot of other micro-optimizations that could
be done: while keeping *two* 64-bit timeouts is just completely insane
on a 32-bit architecture, keeping even just one is debatable. I
suspect the timeouts should be just "unsigned long", so that 32-bit
architectures don't bother with unnecessary 64-bit clocks. 32-bit
clocks are several seconds even if you count nanoseconds, and you
actually only do microseconds anyway (so instead of shifting the
microseconds left by ten like you do, shift the nanoseconds of
sched_clock right by ten, and now you only need 32-bit values).

select() and poll() performance is actually fairly critical, so trying
to avoid bad code generation here is likely worth it.

Linus

2013-07-08 17:14:35

by Eliezer Tamir

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On 08/07/2013 19:37, Linus Torvalds wrote:
> On Mon, Jul 8, 2013 at 6:20 AM, Eliezer Tamir
> <[email protected]> wrote:
>>
>> - /* only if on, have sockets with POLL_LL and not out of time */
>> - if (ll_flag && can_ll && can_poll_ll(ll_start, ll_time))
>> + /* only if found POLL_BUSY_LOOP sockets && not out of time */
>> + if (!need_resched() && can_busy_loop &&
>> + busy_loop_range(busy_start, busy_end))
>> continue;
>
> Better, but still horribly ugly. I also suspect the need_resched()
> test should be done after testing can_busy_loop, just in case the
> compiler can avoid having to load things off the current pointer.

I think there is no way for the compiler to know the value of
can_busy_loop at compile time. It depends on the replies we get
from polling the sockets. ll_flag was there to make sure the compiler
will know when things are defined out.

I would be very happy to hear suggestions for better names for things.

> I also think that we should clear busy_flag if we don't continue, no?

I'm not sure. If the time the user specified to busy-poll is not over,
and the reason we didn't do it last time was that the sockets did not
report that they have valid polling information (perhaps a route changed
or a device we reset), but how we do have sockets that can busy-poll,
wouldn't polling be the right thing to do?

> I also don't understand why the code keeps both busy_start and
> busy_end around. It all looks completely pointless. Why have two
> variables, and why make the comparisons more complicated, and the code
> look more complex than it actually is?

Originally the code used time_after() and only kept the start time.
People on the list voiced concerns that using sched_clock() might be
risky since we may end up on another CPU with a different time.

We used sched_clock() because we don't need the time to be very
accurate, this is only a cut-off time to make sure we never spin
forever when no event ever happens.

I then changed this to use time_in_range() so that if we wake up with a
completely random time, we will be out of the range and fail safely.

If you think that is wrong we can go back to use time_after().

> I also suspect there's a lot of other micro-optimizations that could
> be done: while keeping *two* 64-bit timeouts is just completely insane
> on a 32-bit architecture, keeping even just one is debatable. I
> suspect the timeouts should be just "unsigned long", so that 32-bit
> architectures don't bother with unnecessary 64-bit clocks. 32-bit
> clocks are several seconds even if you count nanoseconds, and you
> actually only do microseconds anyway (so instead of shifting the
> microseconds left by ten like you do, shift the nanoseconds of
> sched_clock right by ten, and now you only need 32-bit values).

OK, but please answer my questions above, it is starting to be late here
and I would really like to send a fix that everyone will find
acceptable today.

Thanks,
Eliezer

2013-07-08 19:37:11

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On Mon, Jul 8, 2013 at 10:14 AM, Eliezer Tamir
<[email protected]> wrote:
>
> I think there is no way for the compiler to know the value of
> can_busy_loop at compile time. It depends on the replies we get
> from polling the sockets. ll_flag was there to make sure the compiler
> will know when things are defined out.

No, my point was that we want to handle the easily seen register test
first, and not even have to load current().

The compiler may end up scheduling the code to load current anyway,
but the way you wrote it it's pretty much guaranteed that it will do
it.

>> I also think that we should clear busy_flag if we don't continue, no?
>
> I'm not sure. If the time the user specified to busy-poll is not over,
> and the reason we didn't do it last time was that the sockets did not
> report that they have valid polling information (perhaps a route changed
> or a device we reset), but how we do have sockets that can busy-poll,
> wouldn't polling be the right thing to do?

Well, we would have slept already, and then microseconds will have
passed. Also, if we schedule, we may overflow a 32-bit time which can
screw up the later optimization:

> Originally the code used time_after() and only kept the start time.
> People on the list voiced concerns that using sched_clock() might be
> risky since we may end up on another CPU with a different time.

No, I agree with the "don't call sched_clock() unless necessary".

It's the end_time I disagree with. There's no point. Just do the
start-time (preferably as just a "unsigned long" in microseconds), and
let it be.

In fact, I'd argue for initializing start_time to zero, and have the
"have we timed out" logic load it only if necessary, rather than
initializing it based on whether POLL_BUSY_WAIT was set or not.
Because one common case - even with POLL_BUSY_WAIT - is that we go
through the loop exactly once, and the data exists on the very first
try. And that is in fact the case we want to optimize and not do any
extra work for at all.

So I would actually argue that the whole timeout code might as well be
something like

unsigned long start_time = 0;
...
if (want_busy_poll && !need_resched()) {
unsigned long now = busy_poll_sched_clock();
if (!start_time) {
start_time = now + sysctl.busypoll;
continue;
}
if (time_before(start_time, now))
continue;
}

or similar. Just one time variable, and it doesn't even need to be
64-bit on 32-bit architectures. Which makes all the code generation
much simpler. And notice how this has basically no cost if we already
found data, because we won't even get here. So there's only cost if we
actually might want to busy-poll.

> OK, but please answer my questions above, it is starting to be late here
> and I would really like to send a fix that everyone will find
> acceptable today.

I think it's getting closer, and I'm ok with the last final details
being sorted out later. I just can't reasonably test any of my
suggestions, so I'd like to get it to a point where when I pull, I
don't feel like I'm pulling core code that I really detest. And the
VFS layer in particular I'm pretty attached to.

Linus

2013-07-08 19:46:14

by Eliezer Tamir

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On 08/07/2013 22:37, Linus Torvalds wrote:
> On Mon, Jul 8, 2013 at 10:14 AM, Eliezer Tamir
> <[email protected]> wrote:
>>
>> I think there is no way for the compiler to know the value of
>> can_busy_loop at compile time. It depends on the replies we get
>> from polling the sockets. ll_flag was there to make sure the compiler
>> will know when things are defined out.
>
> No, my point was that we want to handle the easily seen register test
> first, and not even have to load current().
>
> The compiler may end up scheduling the code to load current anyway,
> but the way you wrote it it's pretty much guaranteed that it will do
> it.

I see. OK.

> In fact, I'd argue for initializing start_time to zero, and have the
> "have we timed out" logic load it only if necessary, rather than
> initializing it based on whether POLL_BUSY_WAIT was set or not.
> Because one common case - even with POLL_BUSY_WAIT - is that we go
> through the loop exactly once, and the data exists on the very first
> try. And that is in fact the case we want to optimize and not do any
> extra work for at all.
>
> So I would actually argue that the whole timeout code might as well be
> something like
>
> unsigned long start_time = 0;
> ...
> if (want_busy_poll && !need_resched()) {
> unsigned long now = busy_poll_sched_clock();
> if (!start_time) {
> start_time = now + sysctl.busypoll;
> continue;
> }
> if (time_before(start_time, now))
> continue;
> }
>

OK.

Thanks,
Eliezer

2013-07-08 19:59:45

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On Mon, 08 Jul 2013 22:46:04 +0300
Eliezer Tamir <[email protected]> wrote:

> On 08/07/2013 22:37, Linus Torvalds wrote:
> > On Mon, Jul 8, 2013 at 10:14 AM, Eliezer Tamir
> > <[email protected]> wrote:
> >>
> >> I think there is no way for the compiler to know the value of
> >> can_busy_loop at compile time. It depends on the replies we get
> >> from polling the sockets. ll_flag was there to make sure the compiler
> >> will know when things are defined out.
> >
> > No, my point was that we want to handle the easily seen register test
> > first, and not even have to load current().
> >
> > The compiler may end up scheduling the code to load current anyway,
> > but the way you wrote it it's pretty much guaranteed that it will do
> > it.
>
> I see. OK.
>
> > In fact, I'd argue for initializing start_time to zero, and have the
> > "have we timed out" logic load it only if necessary, rather than
> > initializing it based on whether POLL_BUSY_WAIT was set or not.
> > Because one common case - even with POLL_BUSY_WAIT - is that we go
> > through the loop exactly once, and the data exists on the very first
> > try. And that is in fact the case we want to optimize and not do any
> > extra work for at all.
> >
> > So I would actually argue that the whole timeout code might as well be
> > something like
> >
> > unsigned long start_time = 0;
> > ...
> > if (want_busy_poll && !need_resched()) {
> > unsigned long now = busy_poll_sched_clock();
> > if (!start_time) {
> > start_time = now + sysctl.busypoll;
> > continue;
> > }
> > if (time_before(start_time, now))
> > continue;
> > }
> >
>
> OK.

Since this is special case in the hot path, it looks like a good case
for static branch logic.

2013-07-08 20:05:21

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On Mon, 08 Jul 2013 22:46:04 +0300
Eliezer Tamir <[email protected]> wrote:

> On 08/07/2013 22:37, Linus Torvalds wrote:
> > On Mon, Jul 8, 2013 at 10:14 AM, Eliezer Tamir
> > <[email protected]> wrote:
> >>
> >> I think there is no way for the compiler to know the value of
> >> can_busy_loop at compile time. It depends on the replies we get
> >> from polling the sockets. ll_flag was there to make sure the compiler
> >> will know when things are defined out.
> >
> > No, my point was that we want to handle the easily seen register test
> > first, and not even have to load current().
> >
> > The compiler may end up scheduling the code to load current anyway,
> > but the way you wrote it it's pretty much guaranteed that it will do
> > it.
>
> I see. OK.
>
> > In fact, I'd argue for initializing start_time to zero, and have the
> > "have we timed out" logic load it only if necessary, rather than
> > initializing it based on whether POLL_BUSY_WAIT was set or not.
> > Because one common case - even with POLL_BUSY_WAIT - is that we go
> > through the loop exactly once, and the data exists on the very first
> > try. And that is in fact the case we want to optimize and not do any
> > extra work for at all.
> >
> > So I would actually argue that the whole timeout code might as well be
> > something like
> >
> > unsigned long start_time = 0;
> > ...
> > if (want_busy_poll && !need_resched()) {
> > unsigned long now = busy_poll_sched_clock();
> > if (!start_time) {
> > start_time = now + sysctl.busypoll;
> > continue;
> > }
> > if (time_before(start_time, now))
> > continue;
> > }
> >
>

Since this code is in hot path, and a special case, looks like a good
candidate for static branch.

2013-07-08 20:10:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On Mon, Jul 8, 2013 at 1:05 PM, Stephen Hemminger
<[email protected]> wrote:
>> >
>> > unsigned long start_time = 0;
>> > ...
>> > if (want_busy_poll && !need_resched()) {
>> > unsigned long now = busy_poll_sched_clock();
>> > if (!start_time) {
>> > start_time = now + sysctl.busypoll;
>> > continue;
>> > }
>> > if (time_before(start_time, now))
>> > continue;
>> > }
>> >
>>
>
> Since this code is in hot path, and a special case, looks like a good
> candidate for static branch.

No, it's not static. It's an initializer, yes, but it's not a static
one, and it gets run on every single select()/poll(). Well, every
single one that goes through the loop more than once.

So it's very much a dynamic condition. It's just placed that way to
avoid doing the (relatively expensive) time read for the case where
data is available immediately.

And the "data is available immediately" case is actually fairly
common. Sometimes it's because the producer/network is faster than the
consumer. And sometimes it's because you get big chunks at a time, but
the reader/writer ends up always going through a select() loop without
actually reading/writing everything in a loop until it gets a
full/empty error.

And I bet that both of those cases happen even with programs/networks
that end up using the magic low-latency/busyloop polling flags. So
avoiding the timer read if at all possible is definitely a good idea.

Linus

2013-07-09 02:27:09

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

From: Linus Torvalds <[email protected]>
Date: Mon, 8 Jul 2013 12:37:06 -0700

> I think it's getting closer, and I'm ok with the last final details
> being sorted out later. I just can't reasonably test any of my
> suggestions, so I'd like to get it to a point where when I pull, I
> don't feel like I'm pulling core code that I really detest. And the
> VFS layer in particular I'm pretty attached to.

So I've applied Eliezer's rename patch and will send another pull
request once he also takes care of the reduction of the number and
size of the variables used to track the poll timestamps.

Thanks.

2013-07-09 22:25:18

by Jonathan Corbet

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On Mon, 08 Jul 2013 16:20:34 +0300
Eliezer Tamir <[email protected]> wrote:

> Rename POLL_LL to POLL_BUSY_LOOP.

So pardon me if I speak out of turn, but it occurs to me to
wonder...should the SO_LL socket option be renamed in a similar fashion
before this interface escapes into the wild?

Thanks,

jon

2013-07-09 23:06:16

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

From: Jonathan Corbet <[email protected]>
Date: Tue, 9 Jul 2013 16:25:14 -0600

> On Mon, 08 Jul 2013 16:20:34 +0300
> Eliezer Tamir <[email protected]> wrote:
>
>> Rename POLL_LL to POLL_BUSY_LOOP.
>
> So pardon me if I speak out of turn, but it occurs to me to
> wonder...should the SO_LL socket option be renamed in a similar fashion
> before this interface escapes into the wild?

Sure and we can rename include/net/ll_poll.h to something more
fitting as well.

I'll make sure this happens before 3.11 gets even close to release.

2013-07-10 03:29:22

by Eliezer Tamir

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On 10/07/2013 02:06, David Miller wrote:
> From: Jonathan Corbet <[email protected]>
> Date: Tue, 9 Jul 2013 16:25:14 -0600
>
>> On Mon, 08 Jul 2013 16:20:34 +0300
>> Eliezer Tamir <[email protected]> wrote:
>>
>>> Rename POLL_LL to POLL_BUSY_LOOP.
>>
>> So pardon me if I speak out of turn, but it occurs to me to
>> wonder...should the SO_LL socket option be renamed in a similar fashion
>> before this interface escapes into the wild?
>
> Sure and we can rename include/net/ll_poll.h to something more
> fitting as well.
>
> I'll make sure this happens before 3.11 gets even close to release.

David,

If the following names changes are acceptable I will try to send out
a patch today.

1. include/net/ll_poll.h -> include/net/busy_poll.h

2. ndo_ll_poll -> ndo_busy_poll

- not technically accurate since the ndo callback does not itself busy
poll, it's just used to implement it.

maybe ndo_napi_id_poll? or ndo_id_poll? I don't really like any of them,
so a suggestion would be nice.

3. sysctl_net_ll_{read,poll} -> sysctl_net_busy_{read,poll}
- along with matching file name changes.

4. {sk,skb}_mark_ll -> {sk,skb}_mark_napi_id

5. LL_SO -> BUSY_POLL_SO

Are you OK with the names?
Did I miss anything?

Thanks,
Eliezer

2013-07-10 04:41:59

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

From: Eliezer Tamir <[email protected]>
Date: Wed, 10 Jul 2013 06:29:16 +0300

> If the following names changes are acceptable I will try to send out
> a patch today.
>
> 1. include/net/ll_poll.h -> include/net/busy_poll.h

Agreed.

> 2. ndo_ll_poll -> ndo_busy_poll
>
> - not technically accurate since the ndo callback does not itself busy
> poll, it's just used to implement it.

I think this name change is accurate, it expresses the two elements of
what it does. It's busy waiting, in that it's doing a synchronous
scan of the device's RX queue, and it's polling just like NAPI polling
does.

> maybe ndo_napi_id_poll? or ndo_id_poll? I don't really like any of them,
> so a suggestion would be nice.

This would make it sound like it's some new version of the existing
NAPI poll.

Well... what would be great would be to come up with some single
interface that drivers can implement rather than having to have
both napi->poll and netdevice_ops->ndo_ll_poll(). But that's a task
for a later date.

Therefore, ndo_busy_poll is probably best for now.

> 3. sysctl_net_ll_{read,poll} -> sysctl_net_busy_{read,poll}
> - along with matching file name changes.

Agreed.

> 4. {sk,skb}_mark_ll -> {sk,skb}_mark_napi_id

Agreed.

> 5. LL_SO -> BUSY_POLL_SO

Agreed.

> Did I miss anything?

Nope, looks complete. And also do the manpage update.

2013-07-10 05:21:30

by Eliezer Tamir

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On 10/07/2013 07:41, David Miller wrote:
> From: Eliezer Tamir <[email protected]>
> Date: Wed, 10 Jul 2013 06:29:16 +0300
>
>> If the following names changes are acceptable I will try to send out
>> a patch today.

>> 2. ndo_ll_poll -> ndo_busy_poll
>>
>> - not technically accurate since the ndo callback does not itself busy
>> poll, it's just used to implement it.
>
> I think this name change is accurate, it expresses the two elements of
> what it does. It's busy waiting, in that it's doing a synchronous
> scan of the device's RX queue, and it's polling just like NAPI polling
> does.

OK

> Well... what would be great would be to come up with some single
> interface that drivers can implement rather than having to have
> both napi->poll and netdevice_ops->ndo_ll_poll(). But that's a task
> for a later date.
>
> Therefore, ndo_busy_poll is probably best for now.

I will think about this, maybe we could even unify ndo_poll_controller.
It seems like said unified method would have to have an extra parameter
that would indicate from which context it was called:
1. from napi poll (bh)
2. from poll controller (with interrupts disabled)
3. from busy poll (user context)
-of course not for today.

> Nope, looks complete. And also do the manpage update.

Where do I find the repository for the manpages?

2013-07-10 06:10:53

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH net-next] net: rename low latency sockets functions to busy poll

On Wed, 2013-07-10 at 08:21 +0300, Eliezer Tamir wrote:

> Where do I find the repository for the manpages?
>

MAN-PAGES: MANUAL PAGES FOR LINUX -- Sections 2, 3, 4, 5, and 7
M: Michael Kerrisk <[email protected]>
W: http://www.kernel.org/doc/man-pages
L: [email protected]
S: Maintained