Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp5935270yba; Thu, 11 Apr 2019 08:41:15 -0700 (PDT) X-Google-Smtp-Source: APXvYqyPaFBHSfcis8ATPlWp32ri2UbRQZtjYk9OpJGQjJcfID8fC1zsUrMFXORgMaodW5s+WE2h X-Received: by 2002:a17:902:a7:: with SMTP id a36mr32977229pla.111.1554997275607; Thu, 11 Apr 2019 08:41:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1554997275; cv=none; d=google.com; s=arc-20160816; b=D6x9rqeZk9+L7nfJgIFubcGcsOh0d9awijq+6xl3M3XENkTFQMNoGJHZHEt/e3GcQ6 UKrcdKXB1vZNWet5yZo9g6O7J4ycZxWlVwF6qMvTFP6m2bC2wb9htpf0LVpjnVMSENes 2dmUtttIsiWT6TvdAnDAgsM8buNz/SSIVrDIlIMcu1lJyg4tZuz+W9wuf/Pa8AweFaXX 0bVjw/jgOXrFc/ZHt3DcmVu8CDi8rJLXeBqaWxtw5Nw7gAm2uCpOaOkmjWBJVkcSC6JB mMoD8/FxKpRG5AOvyuLMYTlOctdG6kfqevyMkEbmeSGMZDfZO9Mcz1x+jbCIkcP4Cw9a O75g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:references:cc:to:from:subject:dkim-signature; bh=cSOQ7Gs++diSHQm1M9fFoOA1/dE9XvDkWRHKalGstic=; b=XIWaObZIB0LcgYXVIuYe2nWIBM+I0d+x+jxtKq12zUnXYLlafRs1KkShCYTFYFJJF/ 8vSQYjUKXE87e1ZFnxaHWPkk/4fhOiOCLwiO9aImUg/xaBgyYi/fRtMFzuB0PHcyspqT eBLCZP1ExJYK4DX2hzjQJfFOlavSMJ6r5Eivr2xsHSvkrFlxo71hN9WoXTonEHQhhJ7l 9htrgdwRzEvJukSaUYwyG7HJP63v/PGMef597j5PXcLg2nfHJFtEriWrBL8Mbgc/k9S0 iqJ9zMN9gSHhrs/kZ9ZQmx/b3GmVBA2hdvjARLwhTMTWr1nP3glCi33XkCPhJ9HC+rQc lceg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=NET3Rf8Q; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i10si35294532plb.384.2019.04.11.08.40.59; Thu, 11 Apr 2019 08:41:15 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=NET3Rf8Q; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726656AbfDKPjD (ORCPT + 99 others); Thu, 11 Apr 2019 11:39:03 -0400 Received: from mail-wr1-f66.google.com ([209.85.221.66]:41061 "EHLO mail-wr1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726073AbfDKPjC (ORCPT ); Thu, 11 Apr 2019 11:39:02 -0400 Received: by mail-wr1-f66.google.com with SMTP id r4so7944206wrq.8 for ; Thu, 11 Apr 2019 08:39:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:from:to:cc:references:message-id:date:user-agent :mime-version:in-reply-to:content-transfer-encoding:content-language; bh=cSOQ7Gs++diSHQm1M9fFoOA1/dE9XvDkWRHKalGstic=; b=NET3Rf8QPYoMsqtb4EY8M/PQTQrDtNpZznGMI/5vGDVmw0WA/ccFoCAQPfE84npu/O F+Qr/8LcC2HFzQN84LeLM+kwCskAF2FuTBYaJzyv+8WUSkOoz7wl8gwHsbAkoS6TwHgk rUkjTcjh/RtrTfotGMCKZYOY7w1aY+Mz1RwI5NBuycu9VS/xvP6d6JQRXUp5/sI2e4+V TyfZdXNKoqapsm8LTJAnMIG+N+Clmnad61iIhLjBUtgaBBhrcPAIrvHKfL71fh1EAxvK g8/bjgLPOJXe1bSXEw16jHASw3hAhllMqH2v9xBFd7W/Ud5h9RPX48HwpMPDgsOHTcTx TUbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=cSOQ7Gs++diSHQm1M9fFoOA1/dE9XvDkWRHKalGstic=; b=Uy966LoBEgROwpr8f8FDFjoKDpGbgadJnYrht+ZB3to1too81IVY6U6dug8gMkRlBx nXHCk5iPAyoEx6GyvUXPp2Xf96aWODZjq7o1NPAWw8iQ8YntksV760+AuKeQgWdS/2jn ae/qHj9/IGj2GHOMxrdEnMF2gT/FzBYXJHtF7WhMWWge2u4Tp/B0jkJ8FNgh+ghWUMp9 3kxhGNXSFdn/KvSPFAvh12YeLj3QPNxK7TtFnGs3bBxuz3CUrxK4ZwSCpUgRzZylJcVn w9hqibgjZiA5iu9hJhIM5UDajxg06TaYkGdz6552E5X8hOPZ8zcwGJSxnOERywd8mFs8 +DuA== X-Gm-Message-State: APjAAAUt3Jn3NQN5T6ydDwL6oGLg91vkm0b1NYe30CcEe6iQf2XcK4wc uTe6tpFw1YWPptixyYL7IjM= X-Received: by 2002:adf:dd82:: with SMTP id x2mr21471541wrl.214.1554997139970; Thu, 11 Apr 2019 08:38:59 -0700 (PDT) Received: from ?IPv6:::1? (lan.nucleusys.com. [92.247.61.126]) by smtp.gmail.com with ESMTPSA id g132sm5590006wme.3.2019.04.11.08.38.58 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Apr 2019 08:38:59 -0700 (PDT) Subject: [PATCH 4/4] z3fold: support page migration From: Vitaly Wool To: Linux-MM , linux-kernel@vger.kernel.org Cc: Andrew Morton , Oleksiy.Avramchenko@sony.com, Dan Streetman References: Message-ID: Date: Thu, 11 Apr 2019 17:38:56 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: en-US Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Now that we are not using page address in handles directly, we can make z3fold pages movable to decrease the memory fragmentation z3fold may create over time. This patch starts advertising non-headless z3fold pages as movable and uses the existing kernel infrastructure to implement moving of such pages per memory management subsystem's request. It thus implements 3 required callbacks for page migration: * isolation callback: z3fold_page_isolate(): try to isolate the page by removing it from all lists. Pages scheduled for some activity and mapped pages will not be isolated. Return true if isolation was successful or false otherwise * migration callback: z3fold_page_migrate(): re-check critical conditions and migrate page contents to the new page provided by the memory subsystem. Returns 0 on success or negative error code otherwise * putback callback: z3fold_page_putback(): put back the page if z3fold_page_migrate() for it failed permanently (i. e. not with -EAGAIN code). Signed-off-by: Vitaly Wool --- mm/z3fold.c | 241 +++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 231 insertions(+), 10 deletions(-) diff --git a/mm/z3fold.c b/mm/z3fold.c index bebc10083f1c..d9eabfdad0fe 100644 --- a/mm/z3fold.c +++ b/mm/z3fold.c @@ -24,10 +24,18 @@ #include #include +#include +#include #include #include #include +#include +#include +#include +#include #include +#include +#include #include #include #include @@ -97,6 +105,7 @@ struct z3fold_buddy_slots { * @middle_chunks: the size of the middle buddy in chunks, 0 if free * @last_chunks: the size of the last buddy in chunks, 0 if free * @first_num: the starting number (for the first handle) + * @mapped_count: the number of objects currently mapped */ struct z3fold_header { struct list_head buddy; @@ -110,6 +119,7 @@ struct z3fold_header { unsigned short last_chunks; unsigned short start_middle; unsigned short first_num:2; + unsigned short mapped_count:2; }; /** @@ -130,6 +140,7 @@ struct z3fold_header { * @compact_wq: workqueue for page layout background optimization * @release_wq: workqueue for safe page release * @work: work_struct for safe page release + * @inode: inode for z3fold pseudo filesystem * * This structure is allocated at pool creation time and maintains metadata * pertaining to a particular z3fold pool. @@ -149,6 +160,7 @@ struct z3fold_pool { struct workqueue_struct *compact_wq; struct workqueue_struct *release_wq; struct work_struct work; + struct inode *inode; }; /* @@ -227,6 +239,59 @@ static inline void free_handle(unsigned long handle) } } +static struct dentry *z3fold_do_mount(struct file_system_type *fs_type, + int flags, const char *dev_name, void *data) +{ + static const struct dentry_operations ops = { + .d_dname = simple_dname, + }; + + return mount_pseudo(fs_type, "z3fold:", NULL, &ops, 0x33); +} + +static struct file_system_type z3fold_fs = { + .name = "z3fold", + .mount = z3fold_do_mount, + .kill_sb = kill_anon_super, +}; + +static struct vfsmount *z3fold_mnt; +static int z3fold_mount(void) +{ + int ret = 0; + + z3fold_mnt = kern_mount(&z3fold_fs); + if (IS_ERR(z3fold_mnt)) + ret = PTR_ERR(z3fold_mnt); + + return ret; +} + +static void z3fold_unmount(void) +{ + kern_unmount(z3fold_mnt); +} + +static const struct address_space_operations z3fold_aops; +static int z3fold_register_migration(struct z3fold_pool *pool) +{ + pool->inode = alloc_anon_inode(z3fold_mnt->mnt_sb); + if (IS_ERR(pool->inode)) { + pool->inode = NULL; + return 1; + } + + pool->inode->i_mapping->private_data = pool; + pool->inode->i_mapping->a_ops = &z3fold_aops; + return 0; +} + +static void z3fold_unregister_migration(struct z3fold_pool *pool) +{ + if (pool->inode) + iput(pool->inode); + } + /* Initializes the z3fold header of a newly allocated z3fold page */ static struct z3fold_header *init_z3fold_page(struct page *page, struct z3fold_pool *pool) @@ -259,8 +324,14 @@ static struct z3fold_header *init_z3fold_page(struct page *page, } /* Resets the struct page fields and frees the page */ -static void free_z3fold_page(struct page *page) +static void free_z3fold_page(struct page *page, bool headless) { + if (!headless) { + lock_page(page); + __ClearPageMovable(page); + unlock_page(page); + } + ClearPagePrivate(page); __free_page(page); } @@ -317,12 +388,12 @@ static unsigned long encode_handle(struct z3fold_header *zhdr, enum buddy bud) } /* Returns the z3fold page where a given handle is stored */ -static inline struct z3fold_header *handle_to_z3fold_header(unsigned long handle) +static inline struct z3fold_header *handle_to_z3fold_header(unsigned long h) { - unsigned long addr = handle; + unsigned long addr = h; if (!(addr & (1 << PAGE_HEADLESS))) - addr = *(unsigned long *)handle; + addr = *(unsigned long *)h; return (struct z3fold_header *)(addr & PAGE_MASK); } @@ -366,7 +437,7 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked) clear_bit(NEEDS_COMPACTING, &page->private); spin_lock(&pool->lock); if (!list_empty(&page->lru)) - list_del(&page->lru); + list_del_init(&page->lru); spin_unlock(&pool->lock); if (locked) z3fold_page_unlock(zhdr); @@ -420,7 +491,7 @@ static void free_pages_work(struct work_struct *w) continue; spin_unlock(&pool->stale_lock); cancel_work_sync(&zhdr->work); - free_z3fold_page(page); + free_z3fold_page(page, false); cond_resched(); spin_lock(&pool->stale_lock); } @@ -486,6 +557,9 @@ static int z3fold_compact_page(struct z3fold_header *zhdr) if (test_bit(MIDDLE_CHUNK_MAPPED, &page->private)) return 0; /* can't move middle chunk, it's used */ + if (unlikely(PageIsolated(page))) + return 0; + if (zhdr->middle_chunks == 0) return 0; /* nothing to compact */ @@ -546,6 +620,12 @@ static void do_compact_page(struct z3fold_header *zhdr, bool locked) return; } + if (unlikely(PageIsolated(page) || + test_bit(PAGE_STALE, &page->private))) { + z3fold_page_unlock(zhdr); + return; + } + z3fold_compact_page(zhdr); add_to_unbuddied(pool, zhdr); z3fold_page_unlock(zhdr); @@ -705,10 +785,14 @@ static struct z3fold_pool *z3fold_create_pool(const char *name, gfp_t gfp, pool->release_wq = create_singlethread_workqueue(pool->name); if (!pool->release_wq) goto out_wq; + if (z3fold_register_migration(pool)) + goto out_rwq; INIT_WORK(&pool->work, free_pages_work); pool->ops = ops; return pool; +out_rwq: + destroy_workqueue(pool->release_wq); out_wq: destroy_workqueue(pool->compact_wq); out_unbuddied: @@ -730,6 +814,7 @@ static struct z3fold_pool *z3fold_create_pool(const char *name, gfp_t gfp, static void z3fold_destroy_pool(struct z3fold_pool *pool) { kmem_cache_destroy(pool->c_handle); + z3fold_unregister_migration(pool); destroy_workqueue(pool->release_wq); destroy_workqueue(pool->compact_wq); kfree(pool); @@ -837,6 +922,7 @@ static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp, set_bit(PAGE_HEADLESS, &page->private); goto headless; } + __SetPageMovable(page, pool->inode->i_mapping); z3fold_page_lock(zhdr); found: @@ -895,7 +981,7 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle) spin_lock(&pool->lock); list_del(&page->lru); spin_unlock(&pool->lock); - free_z3fold_page(page); + free_z3fold_page(page, true); atomic64_dec(&pool->pages_nr); } return; @@ -931,7 +1017,8 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle) z3fold_page_unlock(zhdr); return; } - if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) { + if (unlikely(PageIsolated(page)) || + test_and_set_bit(NEEDS_COMPACTING, &page->private)) { z3fold_page_unlock(zhdr); return; } @@ -1012,10 +1099,12 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries) if (test_and_set_bit(PAGE_CLAIMED, &page->private)) continue; - zhdr = page_address(page); + if (unlikely(PageIsolated(page))) + continue; if (test_bit(PAGE_HEADLESS, &page->private)) break; + zhdr = page_address(page); if (!z3fold_page_trylock(zhdr)) { zhdr = NULL; continue; /* can't evict at this point */ @@ -1076,7 +1165,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries) next: if (test_bit(PAGE_HEADLESS, &page->private)) { if (ret == 0) { - free_z3fold_page(page); + free_z3fold_page(page, true); atomic64_dec(&pool->pages_nr); return 0; } @@ -1153,6 +1242,8 @@ static void *z3fold_map(struct z3fold_pool *pool, unsigned long handle) break; } + if (addr) + zhdr->mapped_count++; z3fold_page_unlock(zhdr); out: return addr; @@ -1179,6 +1270,7 @@ static void z3fold_unmap(struct z3fold_pool *pool, unsigned long handle) buddy = handle_to_buddy(handle); if (buddy == MIDDLE) clear_bit(MIDDLE_CHUNK_MAPPED, &page->private); + zhdr->mapped_count--; z3fold_page_unlock(zhdr); } @@ -1193,6 +1285,128 @@ static u64 z3fold_get_pool_size(struct z3fold_pool *pool) return atomic64_read(&pool->pages_nr); } +bool z3fold_page_isolate(struct page *page, isolate_mode_t mode) +{ + struct z3fold_header *zhdr; + struct z3fold_pool *pool; + + VM_BUG_ON_PAGE(!PageMovable(page), page); + VM_BUG_ON_PAGE(PageIsolated(page), page); + + if (test_bit(PAGE_HEADLESS, &page->private)) + return false; + + zhdr = page_address(page); + z3fold_page_lock(zhdr); + if (test_bit(NEEDS_COMPACTING, &page->private) || + test_bit(PAGE_STALE, &page->private)) + goto out; + + pool = zhdr_to_pool(zhdr); + + if (zhdr->mapped_count == 0) { + kref_get(&zhdr->refcount); + if (!list_empty(&zhdr->buddy)) + list_del_init(&zhdr->buddy); + spin_lock(&pool->lock); + if (!list_empty(&page->lru)) + list_del(&page->lru); + spin_unlock(&pool->lock); + z3fold_page_unlock(zhdr); + return true; + } +out: + z3fold_page_unlock(zhdr); + return false; +} + +int z3fold_page_migrate(struct address_space *mapping, struct page *newpage, + struct page *page, enum migrate_mode mode) +{ + struct z3fold_header *zhdr, *new_zhdr; + struct z3fold_pool *pool; + struct address_space *new_mapping; + + VM_BUG_ON_PAGE(!PageMovable(page), page); + VM_BUG_ON_PAGE(!PageIsolated(page), page); + + zhdr = page_address(page); + pool = zhdr_to_pool(zhdr); + + if (!trylock_page(page)) + return -EAGAIN; + + if (!z3fold_page_trylock(zhdr)) { + unlock_page(page); + return -EAGAIN; + } + if (zhdr->mapped_count != 0) { + z3fold_page_unlock(zhdr); + unlock_page(page); + return -EBUSY; + } + new_zhdr = page_address(newpage); + memcpy(new_zhdr, zhdr, PAGE_SIZE); + newpage->private = page->private; + page->private = 0; + z3fold_page_unlock(zhdr); + spin_lock_init(&new_zhdr->page_lock); + new_mapping = page_mapping(page); + __ClearPageMovable(page); + ClearPagePrivate(page); + + get_page(newpage); + z3fold_page_lock(new_zhdr); + if (new_zhdr->first_chunks) + encode_handle(new_zhdr, FIRST); + if (new_zhdr->last_chunks) + encode_handle(new_zhdr, LAST); + if (new_zhdr->middle_chunks) + encode_handle(new_zhdr, MIDDLE); + set_bit(NEEDS_COMPACTING, &newpage->private); + new_zhdr->cpu = smp_processor_id(); + spin_lock(&pool->lock); + list_add(&newpage->lru, &pool->lru); + spin_unlock(&pool->lock); + __SetPageMovable(newpage, new_mapping); + z3fold_page_unlock(new_zhdr); + + queue_work_on(new_zhdr->cpu, pool->compact_wq, &new_zhdr->work); + + page_mapcount_reset(page); + unlock_page(page); + put_page(page); + return 0; +} + +void z3fold_page_putback(struct page *page) +{ + struct z3fold_header *zhdr; + struct z3fold_pool *pool; + + zhdr = page_address(page); + pool = zhdr_to_pool(zhdr); + + z3fold_page_lock(zhdr); + if (!list_empty(&zhdr->buddy)) + list_del_init(&zhdr->buddy); + INIT_LIST_HEAD(&page->lru); + if (kref_put(&zhdr->refcount, release_z3fold_page_locked)) { + atomic64_dec(&pool->pages_nr); + return; + } + spin_lock(&pool->lock); + list_add(&page->lru, &pool->lru); + spin_unlock(&pool->lock); + z3fold_page_unlock(zhdr); +} + +static const struct address_space_operations z3fold_aops = { + .isolate_page = z3fold_page_isolate, + .migratepage = z3fold_page_migrate, + .putback_page = z3fold_page_putback, +}; + /***************** * zpool ****************/ @@ -1290,8 +1504,14 @@ MODULE_ALIAS("zpool-z3fold"); static int __init init_z3fold(void) { + int ret; + /* Make sure the z3fold header is not larger than the page size */ BUILD_BUG_ON(ZHDR_SIZE_ALIGNED > PAGE_SIZE); + ret = z3fold_mount(); + if (ret) + return ret; + zpool_register_driver(&z3fold_zpool_driver); return 0; @@ -1299,6 +1519,7 @@ static int __init init_z3fold(void) static void __exit exit_z3fold(void) { + z3fold_unmount(); zpool_unregister_driver(&z3fold_zpool_driver); } -- 2.17.1