Date: Wed, 11 Feb 2015 17:48:17 +1300
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Christoph Lameter
Cc: akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, penberg@kernel.org, iamjoonsoo.kim@lge.com, brouer@redhat.com
Subject: Re: [PATCH 2/3] slub: Support for array operations
Message-ID: <20150211174817.44cc5562@redhat.com>
In-Reply-To: <20150210194811.902155759@linux.com>
References: <20150210194804.288708936@linux.com> <20150210194811.902155759@linux.com>

On Tue, 10 Feb 2015 13:48:06 -0600 Christoph Lameter wrote:

> The major portions are there but there is no support yet for
> directly allocating per cpu objects. There could also be more
> sophisticated code to exploit the batch freeing.
>
> Signed-off-by: Christoph Lameter
> [...]
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c
> +++ linux/mm/slub.c
[...]
> @@ -2516,8 +2521,78 @@ EXPORT_SYMBOL(kmem_cache_alloc_node_trac
>  #endif
>  #endif
>
> +int slab_array_alloc_from_partial(struct kmem_cache *s,
> +			size_t nr, void **p)
> +{
> +	void **end = p + nr;
> +	struct kmem_cache_node *n = get_node(s, numa_mem_id());
> +	int allocated = 0;
> +	unsigned long flags;
> +	struct page *page, *page2;
> +
> +	if (!n->nr_partial)
> +		return 0;
> +
> +	spin_lock_irqsave(&n->list_lock, flags);

This is quite an expensive lock with irqsave.

> +	list_for_each_entry_safe(page, page2, &n->partial, lru) {
> +		void *freelist;
> +
> +		if (page->objects - page->inuse > end - p)
> +			/* More objects free in page than we want */
> +			break;
> +		list_del(&page->lru);
> +		slab_lock(page);

Yet another lock cost.

> +		freelist = page->freelist;
> +		page->inuse = page->objects;
> +		page->freelist = NULL;
> +		slab_unlock(page);
> +		/* Grab all available objects */
> +		while (freelist) {
> +			*p++ = freelist;
> +			freelist = get_freepointer(s, freelist);
> +			allocated++;
> +		}
> +	}
> +	spin_unlock_irqrestore(&n->list_lock, flags);
> +	return allocated;

I estimate (on my CPU) that the locking cost itself is more than 32ns,
plus the irqsave, which I have also found quite expensive: around 14ns
on its own. That gives an estimated 46ns per call. The single-element
slub fast path costs 18-19ns, so bulking 3-4 elements should be enough
to amortize the locking cost (46ns / ~18.5ns is roughly 2.5 objects).
Guess we are still good :-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer
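
P.S. In case it helps, a minimal userspace toy of the amortization
point above. This is my own sketch, not the kernel code; every name in
it (alloc_one, alloc_bulk, pool_lock, POOL_SIZE) is made up for
illustration. The single-object path pays one lock round-trip per
object, while the bulk path pays it once per call, which is the same
trade the patch makes at n->list_lock.

/*
 * Toy freelist, userspace only.  alloc_one() pays the lock cost per
 * object; alloc_bulk() pays it once for up to nr objects, so a fixed
 * per-call cost (analogous to the lock+irqsave discussed above) is
 * divided by the batch size.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define POOL_SIZE 1024

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static void *pool[POOL_SIZE];	/* stack of free objects */
static int pool_top;		/* number of objects currently free */

/* Single-object path: one lock/unlock round-trip per object. */
static void *alloc_one(void)
{
	void *obj = NULL;

	pthread_mutex_lock(&pool_lock);
	if (pool_top > 0)
		obj = pool[--pool_top];
	pthread_mutex_unlock(&pool_lock);
	return obj;
}

/* Bulk path: one lock/unlock round-trip for up to nr objects. */
static int alloc_bulk(size_t nr, void **p)
{
	int taken = 0;

	pthread_mutex_lock(&pool_lock);
	while (taken < (int)nr && pool_top > 0)
		p[taken++] = pool[--pool_top];
	pthread_mutex_unlock(&pool_lock);
	return taken;
}

int main(void)
{
	void *objs[16];
	int i, n;

	for (i = 0; i < POOL_SIZE; i++)	/* fill the toy pool */
		pool[pool_top++] = malloc(32);

	for (i = 0; i < 16; i++)	/* 16 lock round-trips... */
		objs[i] = alloc_one();
	for (i = 0; i < 16; i++)
		free(objs[i]);

	n = alloc_bulk(16, objs);	/* ...versus one round-trip */
	printf("bulk path grabbed %d objects under a single lock\n", n);
	for (i = 0; i < n; i++)
		free(objs[i]);
	return 0;	/* remaining pool objects reclaimed at exit */
}

Compile with "gcc -pthread". The absolute numbers will differ per
machine; the structure of the locked sections is the whole point.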