Subject: Re: [PATCH net-next v3 2/3] udp: implement memory accounting helpers
From: Paolo Abeni
To: Eric Dumazet
Cc: netdev@vger.kernel.org, "David S. Miller", James Morris,
	Trond Myklebust, Alexander Duyck, Daniel Borkmann, Eric Dumazet,
	Tom Herbert, Hannes Frederic Sowa, Edward Cree,
	linux-nfs@vger.kernel.org
Date: Thu, 29 Sep 2016 16:01:14 +0200
Message-ID: <1475157674.4676.52.camel@redhat.com>
In-Reply-To: <1475155472.28155.164.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 2016-09-29 at 06:24 -0700, Eric Dumazet wrote:
> On Thu, 2016-09-29 at 11:31 +0200, Paolo Abeni wrote:
> > On Wed, 2016-09-28 at 18:42 -0700, Eric Dumazet wrote:
> > > On Wed, 2016-09-28 at 12:52 +0200, Paolo Abeni wrote:
> > > >
> > > > +static void udp_rmem_release(struct sock *sk, int partial)
> > > > +{
> > > > +	struct udp_sock *up = udp_sk(sk);
> > > > +	int fwd, amt;
> > > > +
> > > > +	if (partial && !udp_under_memory_pressure(sk))
> > > > +		return;
> > > > +
> > > > +	/* we can have concurrent releases; if we catch any conflict
> > > > +	 * we let only one of them do the work
> > > > +	 */
> > > > +	if (atomic_dec_if_positive(&up->can_reclaim) < 0)
> > > > +		return;
> > > > +
> > > > +	fwd = __udp_forward(up, atomic_read(&sk->sk_rmem_alloc));
> > > > +	if (fwd < SK_MEM_QUANTUM + partial) {
> > > > +		atomic_inc(&up->can_reclaim);
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > > > +	atomic_sub(amt, &up->mem_allocated);
> > > > +	atomic_inc(&up->can_reclaim);
> > > > +
> > > > +	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > > > +	sk->sk_forward_alloc = fwd - amt;
> > > > +}
> > >
> > > This is racy... all these atomics make me nervous...
> >
> > Ah, perhaps I got it: if we have a concurrent memory scheduling, we
> > could end up with a value of mem_allocated below the real need.
> >
> > That mismatch will not drift: at worst we can end up with mem_allocated
> > being a single SK_MEM_QUANTUM below what is strictly needed.
> >
> > A possible alternative could be:
> >
> > static void udp_rmem_release(struct sock *sk, int partial)
> > {
> > 	struct udp_sock *up = udp_sk(sk);
> > 	int fwd, amt, alloc_old, alloc;
> >
> > 	if (partial && !udp_under_memory_pressure(sk))
> > 		return;
> >
> > 	alloc = atomic_read(&up->mem_allocated);
> > 	fwd = alloc - atomic_read(&sk->sk_rmem_alloc);
> > 	if (fwd < SK_MEM_QUANTUM + partial)
> > 		return;
> >
> > 	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
> > 	alloc_old = atomic_cmpxchg(&up->mem_allocated, alloc, alloc - amt);
> > 	/* if a concurrent update is detected, just do nothing; if said
> > 	 * update is due to another memory release, that release takes
> > 	 * care of reclaiming the memory for us, too.
> > 	 * Otherwise we will be able to release on a later dequeue, since
> > 	 * we will eventually stop colliding with the writer when it
> > 	 * consumes all the forward-allocated memory
> > 	 */
> > 	if (alloc_old != alloc)
> > 		return;
> >
> > 	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
> > 	sk->sk_forward_alloc = fwd - amt;
>
> Can still be done from multiple cpus.
>
> Add some ndelay() or udelay() before, to simulate the fact that the
> current cpu could be interrupted by an NMI handler (perf for
> example)... or a hard IRQ handler...
>
> Then make sure your tests involve 16 concurrent cpus dealing with one
> udp socket...

Thank you again for reviewing this.

I'm working on this sort of test right now.
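For reference, a minimal userspace sketch of the kind of stress test
Eric describes could look like the code below; the thread count, packet
size and port are arbitrary assumptions, not the actual test being run.
The kernel side of his suggestion would be a temporary udelay() between
the atomic_read() and the atomic_cmpxchg() in udp_rmem_release(), to
widen the race window:

/* Hypothetical stress test: NSENDERS threads flood one UDP socket
 * while a reader drains it, so that enqueue (memory scheduling) and
 * dequeue (memory release) race as hard as possible.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#define NSENDERS	16
#define NPKTS		100000

static struct sockaddr_in dst;
static volatile int done;

static void *sender(void *arg)
{
	char buf[512] = { 0 };
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int i;

	for (i = 0; i < NPKTS; i++)
		sendto(fd, buf, sizeof(buf), 0,
		       (struct sockaddr *)&dst, sizeof(dst));
	close(fd);
	return NULL;
}

static void *reader(void *arg)
{
	char buf[512];
	int fd = *(int *)arg;

	/* busy drain, keeping the receive queue under constant churn */
	while (!done)
		recv(fd, buf, sizeof(buf), MSG_DONTWAIT);
	return NULL;
}

int main(void)
{
	pthread_t snd[NSENDERS], rcv;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int i;

	dst.sin_family = AF_INET;
	dst.sin_port = htons(7777);
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		perror("bind");
		return 1;
	}

	pthread_create(&rcv, NULL, reader, &fd);
	for (i = 0; i < NSENDERS; i++)
		pthread_create(&snd[i], NULL, sender, NULL);
	for (i = 0; i < NSENDERS; i++)
		pthread_join(snd[i], NULL);

	done = 1;
	pthread_join(rcv, NULL);
	close(fd);
	return 0;
}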
> > }
> >
> > which is even lazier in reclaiming, but should never underestimate the
> > needed forward allocation, and under pressure should eventually free
> > the needed memory.
>
> If this code is rarely used, why don't you simply use a real spinlock,
> so that we do not have to worry about all this?
>
> A spinlock acquisition/release is a _single_ locked operation.
> Faster than the 3 atomics you have in the last version.
> spinlock code (ticket or MCS) avoids starvation.

I'd like to avoid adding a lock, if possible, to avoid any possible
source of contention.

> Then, you can safely update multiple fields in the socket.
>
> And you get nice lockdep support as a bonus.
>
> cmpxchg() is fine when a single field needs an exclusion. But there you
> have multiple fields to update at once:
>
> sk_memory_allocated_add() and sk_memory_allocated_sub() can work using
> atomic_long_add_return() and atomic_long_sub() because their caller
> owns the socket lock and can safely update sk->sk_forward_alloc without
> additional locking, but UDP won't have this luxury after your patches.

When we reach __sk_mem_reduce_allocated() we are sure we can free the
specified amount of memory, so we only need to ensure consistent
sk_prot->memory_allocated updates. The current atomic operation
suffices for this.

Paolo
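For comparison, a spinlock-based variant along the lines Eric suggests
could look roughly like the sketch below. The up->rmem_lock field is
hypothetical (it is not part of the posted patches), and mem_allocated
is assumed to become a plain int protected by that lock:

/* Sketch only, not the posted code: one critical section covers both
 * mem_allocated and sk_forward_alloc, which is exactly what the
 * cmpxchg()-based version cannot provide.
 */
static void udp_rmem_release(struct sock *sk, int partial)
{
	struct udp_sock *up = udp_sk(sk);
	int fwd, amt;

	if (partial && !udp_under_memory_pressure(sk))
		return;

	spin_lock_bh(&up->rmem_lock);
	fwd = up->mem_allocated - atomic_read(&sk->sk_rmem_alloc);
	if (fwd < SK_MEM_QUANTUM + partial)
		goto out;

	amt = (fwd - partial) & ~(SK_MEM_QUANTUM - 1);
	up->mem_allocated -= amt;

	__sk_mem_reduce_allocated(sk, amt >> SK_MEM_QUANTUM_SHIFT);
	sk->sk_forward_alloc = fwd - amt;
out:
	spin_unlock_bh(&up->rmem_lock);
}

This trades the lock-free fast path for simpler reasoning and lockdep
coverage, which is the trade-off being debated above.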