Date: Sun, 23 Aug 2015 08:04:43 +0200
From: Ingo Molnar
To: George Spelvin
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    torvalds@linux-foundation.org, Dave Hansen, Peter Zijlstra,
    David Rientjes, Rik van Riel, Rasmus Villemoes
Subject: Re: [PATCH 0/3] mm/vmalloc: Cache the /proc/meminfo vmalloc statistics
Message-ID: <20150823060443.GA9882@gmail.com>
In-Reply-To: <20150823044839.5727.qmail@ns.horizon.com>
References: <20150823044839.5727.qmail@ns.horizon.com>


* George Spelvin wrote:

> Linus wrote:
>
> > I don't think any of this can be called "correct", in that the
> > unlocked accesses to the cached state are clearly racy, but I think
> > it's very much "acceptable".
>
> I'd think you could easily fix that with a seqlock-like system.
>
> What makes it so simple is that you can always fall back to
> calc_vmalloc_info() if there's any problem, rather than looping or
> blocking.
>
> The basic idea is that you have a seqlock counter, but if either of
> the two lsbits is set, the cached information is stale.
>
> Basically, you need a seqlock and a spinlock. The seqlock does
> most of the work, and the spinlock ensures that there's only one
> updater of the cache.
>
> vmap_unlock() does set_bit(0, &seq->sequence). This marks the
> information as stale.
>
> get_vmalloc_info() reads the seqlock. There are two cases:
>
>  - If the two lsbits are 10, the cached information is valid.
>    Copy it out, re-check the seqlock, and loop if the sequence
>    number changes.
>
>  - In any other case, the cached information is not valid:
>
>    - Try to obtain the spinlock, but do not block if it's unavailable.
>    - If the lock is acquired:
>      - Set the sequence to (sequence | 3) + 1 (we're the only writer).
>      - This bumps the sequence number and leaves the lsbits at 00
>        (invalid).
>      - Memory barrier TBD. Do the RCU ops in calc_vmalloc_info() do it
>        for us?
>    - Call calc_vmalloc_info().
>    - If we obtained the spinlock earlier:
>      - Copy our vmi to cached_info.
>      - smp_wmb()
>      - set_bit(1, &seq->sequence). This marks the information as valid,
>        as long as bit 0 is still clear.
>      - Release the spinlock.
>
> Basically, bit 0 says "vmalloc info has changed", and bit 1 says
> "vmalloc cache has been updated". This clears bit 0 before starting
> the update, so that an update during calc_vmalloc_info() will force
> a new update.
>
> So the four states are basically:
>
>   00 - calc_vmalloc_info() in progress
>   01 - vmap_unlock() during calc_vmalloc_info()
>   10 - cached_info is valid
>   11 - vmap_unlock() has invalidated cached_info, awaiting refresh
>
> Logically, the sequence number should be initialized to ...01, but
> the code above handles 00 okay.

I think this is too complex.
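( Just to make the comparison concrete, here is a rough, *untested* sketch
  of the scheme above. The names (vmalloc_info_seq et al) and the exact
  barrier placement are my guesses - including a guess for the 'memory
  barrier TBD' step: )

	static unsigned long vmalloc_info_seq = 1;	/* lsbits 01: stale */
	static DEFINE_SPINLOCK(vmalloc_info_update_lock);
	static struct vmalloc_info cached_info;

	/* Called from vmap_unlock(): mark the cached info stale: */
	static inline void vmalloc_info_invalidate(void)
	{
		set_bit(0, &vmalloc_info_seq);
	}

	void get_vmalloc_info(struct vmalloc_info *vmi)
	{
		unsigned long seq = READ_ONCE(vmalloc_info_seq);
		bool updater;

		while ((seq & 3) == 2) {	/* 10: cached_info is valid */
			smp_rmb();
			*vmi = cached_info;
			smp_rmb();
			if (READ_ONCE(vmalloc_info_seq) == seq)
				return;		/* consistent snapshot */
			seq = READ_ONCE(vmalloc_info_seq);
		}

		/* Cache is stale. At most one CPU becomes the updater: */
		updater = spin_trylock(&vmalloc_info_update_lock);
		if (updater) {
			/* Bump the count, leave the lsbits at 00: */
			seq = READ_ONCE(vmalloc_info_seq);
			WRITE_ONCE(vmalloc_info_seq, (seq | 3) + 1);
			smp_wmb();
		}

		calc_vmalloc_info(vmi);

		if (updater) {
			cached_info = *vmi;
			smp_wmb();
			/* Revalidate - unless bit 0 got set meanwhile: */
			set_bit(1, &vmalloc_info_seq);
			spin_unlock(&vmalloc_info_update_lock);
		}
	}

Readers never block: if the trylock is lost they just do the full
calculation, as you say. But that's four states and three barriers to get
right, for /proc/meminfo.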
How about something simple like the patch below (on top of the third
patch)? It makes the vmalloc info transactional - /proc/meminfo will
always print a consistent set of numbers.

(Not that we really care about races there, but it looks really simple
to solve, so why not.)

( I also moved the function-static cache next to the flag and seqlock -
  this should further compress the cache footprint. )

Have I missed anything?

Very lightly tested: booted in a VM.

Thanks,

	Ingo

=========================>

 mm/vmalloc.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ef48e557df5a..66726f41e726 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -278,7 +278,15 @@ EXPORT_SYMBOL(vmalloc_to_pfn);
 
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(vmap_area_lock);
 
+/*
+ * Seqlock and flag for the vmalloc info cache printed in /proc/meminfo.
+ *
+ * The assumption of the optimization is that it's read frequently, but
+ * modified infrequently.
+ */
+static DEFINE_SEQLOCK(vmap_info_lock);
 static int vmap_info_changed;
+static struct vmalloc_info vmap_info_cache;
 
 static inline void vmap_lock(void)
 {
@@ -2752,10 +2760,14 @@ static void calc_vmalloc_info(struct vmalloc_info *vmi)
 
 void get_vmalloc_info(struct vmalloc_info *vmi)
 {
-	static struct vmalloc_info cached_info;
+	if (!READ_ONCE(vmap_info_changed)) {
+		unsigned int seq;
+
+		do {
+			seq = read_seqbegin(&vmap_info_lock);
+			*vmi = vmap_info_cache;
+		} while (read_seqretry(&vmap_info_lock, seq));
 
-	if (!vmap_info_changed) {
-		*vmi = cached_info;
 		return;
 	}
 
@@ -2764,8 +2776,9 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
 
 	calc_vmalloc_info(vmi);
 
-	barrier();
-	cached_info = *vmi;
+	write_seqlock(&vmap_info_lock);
+	vmap_info_cache = *vmi;
+	write_sequnlock(&vmap_info_lock);
 }
 
 #endif
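( For reference, the consumer all of this is caching for - roughly what
  fs/proc/meminfo.c does on every read of /proc/meminfo. I'm quoting the
  reader side from memory, so the exact format strings may differ: )

	struct vmalloc_info vmi;

	get_vmalloc_info(&vmi);

	seq_printf(m,
		   "VmallocTotal:   %8lu kB\n"
		   "VmallocUsed:    %8lu kB\n"
		   "VmallocChunk:   %8lu kB\n",
		   (unsigned long)VMALLOC_TOTAL >> 10,
		   vmi.used >> 10,
		   vmi.largest_chunk >> 10);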