Date: Tue, 14 May 2013 12:03:11 -0500
From: Seth Jennings
To: Bob Liu
Cc: Andrew Morton, Greg Kroah-Hartman, Nitin Gupta, Minchan Kim, Konrad Rzeszutek Wilk, Dan Magenheimer, Robert Jennings, Jenifer Hopper, Mel Gorman, Johannes Weiner, Rik van Riel, Larry Woodman, Benjamin Herrenschmidt, Dave Hansen, Joe Perches, Joonsoo Kim, Cody P Schafer, Hugh Dickens, Paul Mackerras, linux-mm@kvack.org, linux-kernel@vger.kernel.org, devel@driverdev.osuosl.org
Subject: Re: [PATCHv11 2/4] zbud: add to mm/
Message-ID: <20130514170311.GD4024@medulla>
References: <1368448803-2089-1-git-send-email-sjenning@linux.vnet.ibm.com> <1368448803-2089-3-git-send-email-sjenning@linux.vnet.ibm.com> <5191FA1C.8070805@oracle.com>
In-Reply-To: <5191FA1C.8070805@oracle.com>

On Tue, May 14, 2013 at 04:47:24PM +0800, Bob Liu wrote: > Hi Seth, > > On 05/13/2013 08:40 PM, Seth Jennings wrote: > > zbud is an special purpose allocator for storing compressed pages. It is > > designed to store up to two compressed pages per physical page. While this > > design limits storage density, it has simple and deterministic reclaim > > properties that make it preferable to a higher density approach when reclaim > > will be used. > > > > zbud works by storing compressed pages, or "zpages", together in pairs in a > > single memory page called a "zbud page". The first buddy is "left > > justifed" at the beginning of the zbud page, and the last buddy is "right > > justified" at the end of the zbud page. The benefit is that if either > > buddy is freed, the freed buddy space, coalesced with whatever slack space > > that existed between the buddies, results in the largest possible free region > > within the zbud page. > > > > zbud also provides an attractive lower bound on density. The ratio of zpages > > to zbud pages can not be less than 1. This ensures that zbud can never "do > > harm" by using more pages to store zpages than the uncompressed zpages would > > have used on their own. > > > > This patch adds zbud to mm/ for later use by zswap. > > > > Signed-off-by: Seth Jennings > > --- > > Good job! And I'm testing it! Thanks for the review!
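[Editorial illustration] To make the calling convention concrete, here is a rough caller-side sketch against the API declared in the patch below. The my_* names are hypothetical (zswap is the intended real user) and error handling is trimmed; this is not part of the patch.

/*
 * Illustrative-only consumer of the zbud API below; the my_* names are
 * made up and error handling is trimmed.
 */
#include <linux/zbud.h>
#include <linux/gfp.h>
#include <linux/string.h>
#include <linux/errno.h>

/* Called by zbud_reclaim_page() for each buddy stored in the victim page */
static int my_evict(struct zbud_pool *pool, unsigned long handle)
{
	/* write the compressed data back to its backing store here, then... */
	zbud_free(pool, handle);	/* freeing is deferred while under reclaim */
	return 0;			/* non-zero would mean "could not evict" */
}

static struct zbud_ops my_ops = {
	.evict = my_evict,
};

static int my_store(struct zbud_pool *pool, void *src, int len,
		    unsigned long *handle)
{
	void *dst;
	int ret;

	ret = zbud_alloc(pool, len, GFP_KERNEL, handle);
	if (ret)
		return ret;	/* -EINVAL (bad size/gfp) or -ENOMEM */

	/* the handle is opaque; it must be mapped before the data is touched */
	dst = zbud_map(pool, *handle);
	memcpy(dst, src, len);
	zbud_unmap(pool, *handle);
	return 0;
}

static struct zbud_pool *my_pool_create(void)
{
	/* gfp here covers only the pool metadata; no __GFP_HIGHMEM */
	return zbud_create_pool(GFP_KERNEL, &my_ops);
}

Under memory pressure the user would call zbud_reclaim_page(pool, retries), which walks the pool LRU from the tail and hands each stored handle to my_evict() above, for up to "retries" zbud pages.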
> > > include/linux/zbud.h | 22 ++ > > mm/Kconfig | 10 + > > mm/Makefile | 1 + > > mm/zbud.c | 564 ++++++++++++++++++++++++++++++++++++++++++++++++++ > > 4 files changed, 597 insertions(+) > > create mode 100644 include/linux/zbud.h > > create mode 100644 mm/zbud.c > > > > diff --git a/include/linux/zbud.h b/include/linux/zbud.h > > new file mode 100644 > > index 0000000..954252b > > --- /dev/null > > +++ b/include/linux/zbud.h > > @@ -0,0 +1,22 @@ > > +#ifndef _ZBUD_H_ > > +#define _ZBUD_H_ > > + > > +#include > > + > > +struct zbud_pool; > > + > > +struct zbud_ops { > > + int (*evict)(struct zbud_pool *pool, unsigned long handle); > > +}; > > + > > +struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops); > > +void zbud_destroy_pool(struct zbud_pool *pool); > > +int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, > > + unsigned long *handle); > > +void zbud_free(struct zbud_pool *pool, unsigned long handle); > > +int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries); > > +void *zbud_map(struct zbud_pool *pool, unsigned long handle); > > +void zbud_unmap(struct zbud_pool *pool, unsigned long handle); > > +int zbud_get_pool_size(struct zbud_pool *pool); > > + > > +#endif /* _ZBUD_H_ */ > > diff --git a/mm/Kconfig b/mm/Kconfig > > index e742d06..908f41b 100644 > > --- a/mm/Kconfig > > +++ b/mm/Kconfig > > @@ -477,3 +477,13 @@ config FRONTSWAP > > and swap data is stored as normal on the matching swap device. > > > > If unsure, say Y to enable frontswap. > > + > > +config ZBUD > > + tristate "Buddy allocator for compressed pages" > > + default n > > + help > > + zbud is an special purpose allocator for storing compressed pages. > > + It is designed to store up to two compressed pages per physical page. > > + While this design limits storage density, it has simple and > > + deterministic reclaim properties that make it preferable to a higher > > + density approach when reclaim will be used. > > diff --git a/mm/Makefile b/mm/Makefile > > index 72c5acb..95f0197 100644 > > --- a/mm/Makefile > > +++ b/mm/Makefile > > @@ -58,3 +58,4 @@ obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o > > obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o > > obj-$(CONFIG_CLEANCACHE) += cleancache.o > > obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o > > +obj-$(CONFIG_ZBUD) += zbud.o > > diff --git a/mm/zbud.c b/mm/zbud.c > > new file mode 100644 > > index 0000000..e5bd0e6 > > --- /dev/null > > +++ b/mm/zbud.c > > @@ -0,0 +1,564 @@ > > +/* > > + * zbud.c - Buddy Allocator for Compressed Pages > > + * > > + * Copyright (C) 2013, Seth Jennings, IBM > > + * > > + * Concepts based on zcache internal zbud allocator by Dan Magenheimer. > > + * > > + * zbud is an special purpose allocator for storing compressed pages. It is > > + * designed to store up to two compressed pages per physical page. While this > > + * design limits storage density, it has simple and deterministic reclaim > > + * properties that make it preferable to a higher density approach when reclaim > > + * will be used. > > + * > > + * zbud works by storing compressed pages, or "zpages", together in pairs in a > > + * single memory page called a "zbud page". The first buddy is "left > > + * justifed" at the beginning of the zbud page, and the last buddy is "right > > + * justified" at the end of the zbud page. 
The benefit is that if either > > + * buddy is freed, the freed buddy space, coalesced with whatever slack space > > + * that existed between the buddies, results in the largest possible free region > > + * within the zbud page. > > + * > > + * zbud also provides an attractive lower bound on density. The ratio of zpages > > + * to zbud pages can not be less than 1. This ensures that zbud can never "do > > + * harm" by using more pages to store zpages than the uncompressed zpages would > > + * have used on their own. > > + * > > + * zbud pages are divided into "chunks". The size of the chunks is fixed at > > + * compile time and determined by NCHUNKS_ORDER below. Dividing zbud pages > > + * into chunks allows organizing unbuddied zbud pages into a manageable number > > + * of unbuddied lists according to the number of free chunks available in the > > + * zbud page. > > + * > > + * The zbud API differs from that of conventional allocators in that the > > + * allocation function, zbud_alloc(), returns an opaque handle to the user, > > + * not a dereferenceable pointer. The user must map the handle using > > + * zbud_map() in order to get a usable pointer by which to access the > > + * allocation data and unmap the handle with zbud_unmap() when operations > > + * on the allocation data are complete. > > + */ > > + > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +/***************** > > + * Structures > > +*****************/ > > +/** > > + * struct zbud_page - zbud page metadata overlay > > + * @page: typed reference to the underlying struct page > > + * @donotuse: this overlays the page flags and should not be used > > + * @first_chunks: the size of the first buddy in chunks, 0 if free > > Nitpick, the name here seems not directly to me. > But I don't have a better idea yet. > Maybe first_buddy_size/first_buddy_nrchunks or buddy0_size. Yeah, I went through a few different names here. I abandoned the buddy0|1 naming in favor of "first" and "last". Adding _buddy might make it clearer but it adds length and the fields are documented. So I'm inclined to keep them as they are for now just to reduce churn. > > > + * @last_chunks: the size of the last buddy in chunks, 0 if free > > + * @buddy: links the zbud page into the unbuddied/buddied lists in the pool > > + * @lru: links the zbud page into the lru list in the pool > > + * > > + * This structure overlays the struct page to store metadata needed for a > > + * single storage page in for zbud. There is a BUILD_BUG_ON in zbud_init() > > + * that ensures this structure is not larger that struct page. > > + * > > + * The PG_reclaim flag of the underlying page is used for indicating > > + * that this zbud page is under reclaim (see zbud_reclaim_page()) > > + */ > > +struct zbud_page { > > + union { > > + struct page page; > > + struct { > > + unsigned long donotuse; > > + u16 first_chunks; > > + u16 last_chunks; > > + struct list_head buddy; > > + struct list_head lru; > > + }; > > + }; > > +}; > > + > > +/* > > + * NCHUNKS_ORDER determines the internal allocation granularity, effectively > > + * adjusting internal fragmentation. It also determines the number of > > + * freelists maintained in each pool. NCHUNKS_ORDER of 6 means that the > > + * allocation granularity will be in chunks of size PAGE_SIZE/64, and there > > + * will be 64 freelists per pool. 
> > + */ > > +#define NCHUNKS_ORDER 6 > > + > > +#define CHUNK_SHIFT (PAGE_SHIFT - NCHUNKS_ORDER) > > +#define CHUNK_SIZE (1 << CHUNK_SHIFT) > > +#define NCHUNKS (PAGE_SIZE >> CHUNK_SHIFT) > > + > > +/** > > + * struct zbud_pool - stores metadata for each zbud pool > > + * @lock: protects all pool lists and first|last_chunk fields of any > > + * zbud page in the pool > > + * @unbuddied: array of lists tracking zbud pages that only contain one buddy; > > + * the lists each zbud page is added to depends on the size of > > + * its free region. > > + * @buddied: list tracking the zbud pages that contain two buddies; > > + * these zbud pages are full > > Lack of list_head lru. Ah yes, will add. > > > + * @pages_nr: number of zbud pages in the pool. > > + * @ops: pointer to a structure of user defined operations specified at > > + * pool creation time. > > + * > > + * This structure is allocated at pool creation time and maintains metadata > > + * pertaining to a particular zbud pool. > > + */ > > +struct zbud_pool { > > + spinlock_t lock; > > + struct list_head unbuddied[NCHUNKS]; > > + struct list_head buddied; > > + struct list_head lru; > > + atomic_t pages_nr; > > + struct zbud_ops *ops; > > +}; > > + > > +/***************** > > + * Helpers > > +*****************/ > > +/* Just to make the code easier to read */ > > +enum buddy { > > + FIRST, > > + LAST > > +}; > > + > > +/* Converts an allocation size in bytes to size in zbud chunks */ > > +static inline int size_to_chunks(int size) > > +{ > > + return (size + CHUNK_SIZE - 1) >> CHUNK_SHIFT; > > +} > > + > > +#define for_each_unbuddied_list(_iter, _begin) \ > > + for ((_iter) = (_begin); (_iter) < NCHUNKS; (_iter)++) > > + > > +/* Initializes a zbud page from a newly allocated page */ > > +static inline struct zbud_page *init_zbud_page(struct page *page) > > +{ > > + struct zbud_page *zbpage = (struct zbud_page *)page; > > + zbpage->first_chunks = 0; > > + zbpage->last_chunks = 0; > > + INIT_LIST_HEAD(&zbpage->buddy); > > + INIT_LIST_HEAD(&zbpage->lru); > > + return zbpage; > > +} > > + > > +/* Resets a zbud page so that it can be properly freed */ > > Better with comment: the caller must hold the pool->lock? You don't have to hold the pool lock for reset_zbud_page(). At the point you call it, the page is already removed from all the zbud pool structures. > > > +static inline struct page *reset_zbud_page(struct zbud_page *zbpage) > > +{ > > + struct page *page = &zbpage->page; > > + set_page_private(page, 0); > > + page->mapping = NULL; > > + page->index = 0; > > + page_mapcount_reset(page); > > + init_page_count(page); > > + INIT_LIST_HEAD(&page->lru); > > + return page; > > +} > > + > > +/* > > + * Encodes the handle of a particular buddy within a zbud page > > + * Pool lock should be held as this function accesses first|last_chunks > > + */ > > +static inline unsigned long encode_handle(struct zbud_page *zbpage, > > + enum buddy bud) > > +{ > > + unsigned long handle; > > + > > + /* > > + * For now, the encoded handle is actually just the pointer to the data > > + * but this might not always be the case. A little information hiding. 
> > + */ > > + handle = (unsigned long)page_address(&zbpage->page); > > + if (bud == FIRST) > > + return handle; > > + handle += PAGE_SIZE - (zbpage->last_chunks << CHUNK_SHIFT); > > + return handle; > > +} > > + > > +/* Returns the zbud page where a given handle is stored */ > > +static inline struct zbud_page *handle_to_zbud_page(unsigned long handle) > > +{ > > + return (struct zbud_page *)(virt_to_page(handle)); > > +} > > + > > +/* Returns the number of free chunks in a zbud page */ > > +static inline int num_free_chunks(struct zbud_page *zbpage) > > +{ > > + /* > > + * Rather than branch for different situations, just use the fact that > > + * free buddies have a length of zero to simplify everything. > > + */ > > + return NCHUNKS - zbpage->first_chunks - zbpage->last_chunks; > > +} > > + > > +/***************** > > + * API Functions > > +*****************/ > > +/** > > + * zbud_create_pool() - create a new zbud pool > > + * @gfp: gfp flags when allocating the zbud pool structure > > + * @ops: user-defined operations for the zbud pool > > + * > > + * Return: pointer to the new zbud pool or NULL if the metadata allocation > > + * failed. > > + */ > > +struct zbud_pool *zbud_create_pool(gfp_t gfp, struct zbud_ops *ops) > > +{ > > + struct zbud_pool *pool; > > + int i; > > + > > + pool = kmalloc(sizeof(struct zbud_pool), gfp); > > + if (!pool) > > + return NULL; > > + spin_lock_init(&pool->lock); > > + for_each_unbuddied_list(i, 0) > > + INIT_LIST_HEAD(&pool->unbuddied[i]); > > + INIT_LIST_HEAD(&pool->buddied); > > + INIT_LIST_HEAD(&pool->lru); > > + atomic_set(&pool->pages_nr, 0); > > + pool->ops = ops; > > + return pool; > > +} > > +EXPORT_SYMBOL_GPL(zbud_create_pool); > > + > > +/** > > + * zbud_destroy_pool() - destroys an existing zbud pool > > + * @pool: the zbud pool to be destroyed > > + */ > > +void zbud_destroy_pool(struct zbud_pool *pool) > > +{ > > + kfree(pool); > > Pages in zbud pool should also be freed here? or if they are freed > before call this function some check may be needed. > But there isn't a problem currently since no actual user. Calling zbud_destroy_pool() on a pool with outstanding allocations is a usage error. I can add that to the documentation. Not sure if we want to go as far as adding checks to make sure that all the lists are empty. > > > +} > > +EXPORT_SYMBOL_GPL(zbud_destroy_pool); > > + > > +/** > > + * zbud_alloc() - allocates a region of a given size > > + * @pool: zbud pool from which to allocate > > + * @size: size in bytes of the desired allocation > > + * @gfp: gfp flags used if the pool needs to grow > > + * @handle: handle of the new allocation > > + * > > + * This function will attempt to find a free region in the pool large > > + * enough to satisfy the allocation request. First, it tries to use > > + * free space in the most recently used zbud page, at the beginning of > > + * the pool LRU list. If that zbud page is full or doesn't have the > > + * required free space, a best fit search of the unbuddied lists is > > + * performed. If no suitable free region is found, then a new page > > + * is allocated and added to the pool to satisfy the request. > > + * > > + * gfp should not set __GFP_HIGHMEM as highmem pages cannot be used > > + * as zbud pool pages. > > + * > > + * Return: 0 if success and handle is set, otherwise -EINVAL is the size or > > + * gfp arguments are invalid or -ENOMEM if the pool was unable to allocate > > + * a new page. 
> > + */ > > +int zbud_alloc(struct zbud_pool *pool, int size, gfp_t gfp, > > + unsigned long *handle) > > +{ > > + int chunks, i, freechunks; > > + struct zbud_page *zbpage = NULL; > > + enum buddy bud; > > + struct page *page; > > + > > + if (size <= 0 || size > PAGE_SIZE || gfp & __GFP_HIGHMEM) > > + return -EINVAL; > > + chunks = size_to_chunks(size); > > + spin_lock(&pool->lock); > > + > > + /* > > + * First, try to use the zbpage we last used (at the head of the > > + * LRU) to increase LRU locality of the buddies. This is first fit. > > + */ > > + if (!list_empty(&pool->lru)) { > > + zbpage = list_first_entry(&pool->lru, struct zbud_page, lru); > > + if (num_free_chunks(zbpage) >= chunks) { > > + if (zbpage->first_chunks == 0) { > > + list_del(&zbpage->buddy); > > + bud = FIRST; > > + goto found; > > + } > > + if (zbpage->last_chunks == 0) { > > + list_del(&zbpage->buddy); > > + bud = LAST; > > + goto found; > > + } > > + } > > + } > > I'd prefer to drop above lines to keep things simple since no way to > prove the benefit of it. Both you and Dan have pointed this out. While I think there is benefit, I can't prove it. In an effort to not be hypocritical, I'll pull it out until such time as I can demonstrate and measure the benefit. > > > + > > + /* Second, try to find an unbuddied zbpage. This is best fit. */ > > + zbpage = NULL; > > + for_each_unbuddied_list(i, chunks) { > > + if (!list_empty(&pool->unbuddied[i])) { > > + zbpage = list_first_entry(&pool->unbuddied[i], > > + struct zbud_page, buddy); > > + list_del(&zbpage->buddy); > > + if (zbpage->first_chunks == 0) > > + bud = FIRST; > > + else > > + bud = LAST; > > + goto found; > > + } > > + } > > + > > + /* Lastly, couldn't find unbuddied zbpage, create new one */ > > + spin_unlock(&pool->lock); > > + page = alloc_page(gfp); > > + if (!page) > > + return -ENOMEM; > > + spin_lock(&pool->lock); > > + atomic_inc(&pool->pages_nr); > > + zbpage = init_zbud_page(page); > > + bud = FIRST; > > + > > +found: > > + if (bud == FIRST) > > + zbpage->first_chunks = chunks; > > + else > > + zbpage->last_chunks = chunks; > > + > > + if (zbpage->first_chunks == 0 || zbpage->last_chunks == 0) { > > + /* Add to unbuddied list */ > > + freechunks = num_free_chunks(zbpage); > > + list_add(&zbpage->buddy, &pool->unbuddied[freechunks]); > > + } else { > > + /* Add to buddied list */ > > + list_add(&zbpage->buddy, &pool->buddied); > > + } > > + > > + /* Add/move zbpage to beginning of LRU */ > > + if (!list_empty(&zbpage->lru)) > > + list_del(&zbpage->lru); > > + list_add(&zbpage->lru, &pool->lru); > > + > > + *handle = encode_handle(zbpage, bud); > > + spin_unlock(&pool->lock); > > + > > + return 0; > > +} > > +EXPORT_SYMBOL_GPL(zbud_alloc); > > + > > +/** > > + * zbud_free() - frees the allocation associated with the given handle > > + * @pool: pool in which the allocation resided > > + * @handle: handle associated with the allocation returned by zbud_alloc() > > + * > > + * In the case that the zbud page in which the allocation resides is under > > + * reclaim, as indicated by the PG_reclaim flag being set, this function > > + * only sets the first|last_chunks to 0. The page is actually freed > > + * once both buddies are evicted (see zbud_reclaim_page() below). 
> > + */ > > +void zbud_free(struct zbud_pool *pool, unsigned long handle) > > +{ > > + struct zbud_page *zbpage; > > + int freechunks; > > + > > + spin_lock(&pool->lock); > > + zbpage = handle_to_zbud_page(handle); > > + > > + /* If first buddy, handle will be page aligned */ > > + if (handle & ~PAGE_MASK) > > + zbpage->last_chunks = 0; > > + else > > + zbpage->first_chunks = 0; > > + > > + if (PageReclaim(&zbpage->page)) { > > + /* zbpage is under reclaim, reclaim will free */ > > + spin_unlock(&pool->lock); > > + return; > > + } > > + > > + /* Remove from existing buddy list */ > > + list_del(&zbpage->buddy); > > + > > + if (zbpage->first_chunks == 0 && zbpage->last_chunks == 0) { > > + /* zbpage is empty, free */ > > + list_del(&zbpage->lru); > > + __free_page(reset_zbud_page(zbpage)); > > + atomic_dec(&pool->pages_nr); > > + } else { > > + /* Add to unbuddied list */ > > + freechunks = num_free_chunks(zbpage); > > + list_add(&zbpage->buddy, &pool->unbuddied[freechunks]); > > + } > > + > > + spin_unlock(&pool->lock); > > +} > > +EXPORT_SYMBOL_GPL(zbud_free); > > + > > +#define list_tail_entry(ptr, type, member) \ > > + list_entry((ptr)->prev, type, member) > > + > > +/** > > + * zbud_reclaim_page() - evicts allocations from a pool page and frees it > > + * @pool: pool from which a page will attempt to be evicted > > + * @retires: number of pages on the LRU list for which eviction will > > + * be attempted before failing > > + * > > + * zbud reclaim is different from normal system reclaim in that the reclaim is > > + * done from the bottom, up. This is because only the bottom layer, zbud, has > > + * information on how the allocations are organized within each zbud page. This > > + * has the potential to create interesting locking situations between zbud and > > + * the user, however. > > + * > > + * To avoid these, this is how zbud_reclaim_page() should be called: > > + > > + * The user detects a page should be reclaimed and calls zbud_reclaim_page(). > > + * zbud_reclaim_page() will remove a zbud page from the pool LRU list and call > > + * the user-defined eviction handler with the pool and handle as arguments. > > + * > > + * If the handle can not be evicted, the eviction handler should return > > + * non-zero. zbud_reclaim_page() will add the zbud page back to the > > + * appropriate list and try the next zbud page on the LRU up to > > + * a user defined number of retries. > > + * > > + * If the handle is successfully evicted, the eviction handler should > > + * return 0 _and_ should have called zbud_free() on the handle. zbud_free() > > + * contains logic to delay freeing the page if the page is under reclaim, > > + * as indicated by the setting of the PG_reclaim flag on the underlying page. > > + * > > + * If all buddies in the zbud page are successfully evicted, then the > > + * zbud page can be freed. > > + * > > + * Returns: 0 if page is successfully freed, otherwise -EINVAL if there are > > + * no pages to evict or an eviction handler is not registered, -EAGAIN if > > + * the retry limit was hit. 
> > + */ > > +int zbud_reclaim_page(struct zbud_pool *pool, unsigned int retries) > > +{ > > + int i, ret, freechunks; > > + struct zbud_page *zbpage; > > + unsigned long first_handle = 0, last_handle = 0; > > + > > + spin_lock(&pool->lock); > > + if (!pool->ops || !pool->ops->evict || list_empty(&pool->lru) || > > + retries == 0) { > > + spin_unlock(&pool->lock); > > + return -EINVAL; > > + } > > + for (i = 0; i < retries; i++) { > > + zbpage = list_tail_entry(&pool->lru, struct zbud_page, lru); > > + list_del(&zbpage->lru); > > + list_del(&zbpage->buddy); > > + /* Protect zbpage against free */ > > + SetPageReclaim(&zbpage->page); > > + /* > > + * We need encode the handles before unlocking, since we can > > + * race with free that will set (first|last)_chunks to 0 > > + */ > > + first_handle = 0; > > + last_handle = 0; > > + if (zbpage->first_chunks) > > + first_handle = encode_handle(zbpage, FIRST); > > + if (zbpage->last_chunks) > > + last_handle = encode_handle(zbpage, LAST); > > + spin_unlock(&pool->lock); > > + > > + /* Issue the eviction callback(s) */ > > + if (first_handle) { > > + ret = pool->ops->evict(pool, first_handle); > > + if (ret) > > + goto next; > > + } > > + if (last_handle) { > > + ret = pool->ops->evict(pool, last_handle); > > + if (ret) > > + goto next; > > Will go to next anyway! Yes, yes it will :) > > > + } > > +next: > > + spin_lock(&pool->lock); > > + ClearPageReclaim(&zbpage->page); > > + if (zbpage->first_chunks == 0 && zbpage->last_chunks == 0) { > > + /* > > + * Both buddies are now free, free the zbpage and > > + * return success. > > + */ > > + __free_page(reset_zbud_page(zbpage)); > > + atomic_dec(&pool->pages_nr); > > + spin_unlock(&pool->lock); > > + return 0; > > + } else if (zbpage->first_chunks == 0 || > > + zbpage->last_chunks == 0) { > > + /* add to unbuddied list */ > > + freechunks = num_free_chunks(zbpage); > > + list_add(&zbpage->buddy, &pool->unbuddied[freechunks]); > > + } else { > > + /* add to buddied list */ > > + list_add(&zbpage->buddy, &pool->buddied); > > + } > > + > > + /* add to beginning of LRU */ > > + list_add(&zbpage->lru, &pool->lru); > > + } > > + spin_unlock(&pool->lock); > > + return -EAGAIN; > > +} > > +EXPORT_SYMBOL_GPL(zbud_reclaim_page); > > + > > +/** > > + * zbud_map() - maps the allocation associated with the given handle > > + * @pool: pool in which the allocation resides > > + * @handle: handle associated with the allocation to be mapped > > + * > > + * While trivial for zbud, the mapping functions for others allocators > > + * implementing this allocation API could have more complex information encoded > > + * in the handle and could create temporary mappings to make the data > > + * accessible to the user. 
> > + * > > + * Returns: a pointer to the mapped allocation > > + */ > > +void *zbud_map(struct zbud_pool *pool, unsigned long handle) > > +{ > > + return (void *)(handle); > > +} > > +EXPORT_SYMBOL_GPL(zbud_map); > > + > > +/** > > + * zbud_unmap() - maps the allocation associated with the given handle > > + * @pool: pool in which the allocation resides > > + * @handle: handle associated with the allocation to be unmapped > > + */ > > +void zbud_unmap(struct zbud_pool *pool, unsigned long handle) > > +{ > > +} > > +EXPORT_SYMBOL_GPL(zbud_unmap); > > + > > +/** > > + * zbud_get_pool_size() - gets the zbud pool size in pages > > + * @pool: pool whose size is being queried > > + * > > + * Returns: size in pages of the given pool > > + */ > > +int zbud_get_pool_size(struct zbud_pool *pool) > > +{ > > + return atomic_read(&pool->pages_nr); > > Should hold pool->lock? > I saw some other place dec/inc pool->pages_nr with holding pool->lock. It is atomic specifically to avoid having to hold the lock. However, I think I do hold the lock everywhere else, so it is confusing. I'll document this and possibly move the other references outside the lock so it is obvious. Pending these changes, can I get your ack? Thanks, Seth > > > +} > > +EXPORT_SYMBOL_GPL(zbud_get_pool_size); > > + > > +static int __init init_zbud(void) > > +{ > > + /* Make sure we aren't overflowing the underlying struct page */ > > + BUILD_BUG_ON(sizeof(struct zbud_page) != sizeof(struct page)); > > + /* Make sure we can represent any chunk offset with a u16 */ > > + BUILD_BUG_ON(sizeof(u16) * BITS_PER_BYTE < PAGE_SHIFT - CHUNK_SHIFT); > > + pr_info("loaded\n"); > > + return 0; > > +} > > + > > +static void __exit exit_zbud(void) > > +{ > > + pr_info("unloaded\n"); > > +} > > + > > +module_init(init_zbud); > > +module_exit(exit_zbud); > > + > > +MODULE_LICENSE("GPL"); > > +MODULE_AUTHOR("Seth Jennings "); > > +MODULE_DESCRIPTION("Buddy Allocator for Compressed Pages"); > > > > -- > Regards, > -Bob