From: "Kirill A. Shutemov"
To: Andrea Arcangeli, Andrew Morton
Cc: Al Viro, Hugh Dickins, Wu Fengguang, Jan Kara, Mel Gorman,
    linux-mm@kvack.org, Andi Kleen, Matthew Wilcox,
    "Kirill A. Shutemov", Hillf Danton, Dave Hansen,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCHv3, RFC 00/34] Transparent huge page cache
Date: Fri, 5 Apr 2013 14:59:24 +0300
Message-Id: <1365163198-29726-1-git-send-email-kirill.shutemov@linux.intel.com>

Here's the third RFC. Thanks, everybody, for the feedback.

The patchset is already pretty big, and I want to stop adding new features
to keep it reviewable. Next I'll concentrate on benchmarking and tuning.

Therefore some features will be left out of the initial transparent huge
page cache implementation:
 - page collapsing;
 - migration;
 - tmpfs/shmem;

There are a few features which are not implemented yet and could potentially
block upstreaming:

1. Currently we allocate a 2M page even if we create only a 1-byte file on
   ramfs. I don't think that's a problem by itself: with anon THP we also
   try to allocate huge pages whenever possible. The problem is that ramfs
   pages are unevictable, so we can't just split them and push them to swap
   as we do with anon THP. At some point we need a mechanism to split the
   last page of a file under memory pressure to reclaim some memory.

2. We don't have knobs for disabling the transparent huge page cache
   per-mount or per-file. Should we have a mount option and fadvise flags
   as part of the initial implementation?

Any thoughts?

The patchset is also on git:

git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git thp/pagecache

v3:
 - set RADIX_TREE_PRELOAD_NR to 512 only if we build with THP;
 - rewrite lru_add_page_tail() to address a few bugs;
 - memcg accounting;
 - represent file thp pages in meminfo and friends;
 - dump page order in filemap trace;
 - add missed flush_dcache_page() in zero_huge_user_segment;
 - random cleanups based on feedback.

v2:
 - mmap();
 - fix add_to_page_cache_locked() and delete_from_page_cache();
 - introduce mapping_can_have_hugepages();
 - call split_huge_page() only for head page in filemap_fault();
 - wait_split_huge_page(): serialize over i_mmap_mutex too;
 - lru_add_page_tail: avoid PageUnevictable on active/inactive lru lists;
 - fix off-by-one in zero_huge_user_segment();
 - THP_WRITE_ALLOC/THP_WRITE_FAILED counters;

Kirill A. Shutemov (34):
  mm: drop actor argument of do_generic_file_read()
  block: implement add_bdi_stat()
  mm: implement zero_huge_user_segment and friends
  radix-tree: implement preload for multiple contiguous elements
  memcg, thp: charge huge cache pages
  thp, mm: avoid PageUnevictable on active/inactive lru lists
  thp, mm: basic defines for transparent huge page cache
  thp, mm: introduce mapping_can_have_hugepages() predicate
  thp: represent file thp pages in meminfo and friends
  thp, mm: rewrite add_to_page_cache_locked() to support huge pages
  mm: trace filemap: dump page order
  thp, mm: rewrite delete_from_page_cache() to support huge pages
  thp, mm: trigger bug in replace_page_cache_page() on THP
  thp, mm: locking tail page is a bug
  thp, mm: handle tail pages in page_cache_get_speculative()
  thp, mm: add event counters for huge page alloc on write to a file
  thp, mm: implement grab_thp_write_begin()
  thp, mm: naive support of thp in generic read/write routines
  thp, libfs: initial support of thp in simple_read/write_begin/write_end
  thp: handle file pages in split_huge_page()
  thp: wait_split_huge_page(): serialize over i_mmap_mutex too
  thp, mm: truncate support for transparent huge page cache
  thp, mm: split huge page on mmap file page
  ramfs: enable transparent huge page cache
  x86-64, mm: proper alignment mappings with hugepages
  mm: add huge_fault() callback to vm_operations_struct
  thp: prepare zap_huge_pmd() to uncharge file pages
  thp: move maybe_pmd_mkwrite() out of mk_huge_pmd()
  thp, mm: basic huge_fault implementation for generic_file_vm_ops
  thp: extract fallback path from do_huge_pmd_anonymous_page() to a function
  thp: initial implementation of do_huge_linear_fault()
  thp: handle write-protect exception to file-backed huge pages
  thp: call __vma_adjust_trans_huge() for file-backed VMA
  thp: map file-backed huge pages on fault

 arch/x86/kernel/sys_x86_64.c   |  12 +-
 drivers/base/node.c            |  10 +
 fs/libfs.c                     |  48 +++-
 fs/proc/meminfo.c              |   6 +
 fs/ramfs/inode.c               |   6 +-
 include/linux/backing-dev.h    |  10 +
 include/linux/huge_mm.h        |  36 ++-
 include/linux/mm.h             |   8 +
 include/linux/mmzone.h         |   1 +
 include/linux/pagemap.h        |  33 ++-
 include/linux/radix-tree.h     |  11 +
 include/linux/vm_event_item.h  |   2 +
 include/trace/events/filemap.h |   7 +-
 lib/radix-tree.c               |  33 ++-
 mm/filemap.c                   | 298 ++++++++++++++++++++-----
 mm/huge_memory.c               | 474 +++++++++++++++++++++++++++++++++-------
 mm/memcontrol.c                |   2 -
 mm/memory.c                    |  41 +++-
 mm/mmap.c                      |   3 +
 mm/page_alloc.c                |   7 +-
 mm/swap.c                      |  20 +-
 mm/truncate.c                  |  13 ++
 mm/vmstat.c                    |   2 +
 23 files changed, 902 insertions(+), 181 deletions(-)