Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1759850AbcDEVKT (ORCPT ); Tue, 5 Apr 2016 17:10:19 -0400
Received: from mail-pa0-f44.google.com ([209.85.220.44]:35480 "EHLO
	mail-pa0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752392AbcDEVKR (ORCPT );
	Tue, 5 Apr 2016 17:10:17 -0400
Date: Tue, 5 Apr 2016 14:10:13 -0700 (PDT)
From: Hugh Dickins
X-X-Sender: hugh@eggly.anvils
To: Andrew Morton
cc: "Kirill A. Shutemov", Andrea Arcangeli, Andres Lagar-Cavilla,
	Yang Shi, Ning Qu, Hugh Dickins,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 00/31] huge tmpfs: THPagecache implemented by teams
Message-ID:
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 6323
Lines: 122

Here is my "huge tmpfs" implementation of Transparent Huge Pagecache,
rebased to v4.6-rc2 plus the "mm: easy preliminaries to THPagecache"
series.

The design is just the same as before, when I posted against v3.19:
using a team of pagecache pages placed within a huge-order extent,
instead of using a compound page (see 04/31 for more info on that).

Patches 01-17 are much as before, but with whatever changes were
needed for the rebase, and bugfixes folded back in.  Patches 18-22
add memcg and smaps visibility.  But the more important ones are
patches 23-29, which add recovery: reassembling a hugepage after
fragmentation or swapping.  Patches 30-31 reflect gfpmask doubts:
you might prefer that I fold 31 back in and keep 30 internal.

It was lack of recovery which stopped me from proposing inclusion
of the series a year ago: this series now is fully featured, and
ready for v4.7 - but I expect we shall want to wait a release to
give time to consider the alternatives.

I currently believe that the same functionality (including the team
implementation's support for small files, standard mlocking, and
recovery) can be achieved with compound pages, but not easily:
I think the huge tmpfs functionality should be made available soon,
then converted at leisure to compound pages, if that works out (but
it's not a job I want to do - what we have here is good enough).

Huge tmpfs has been in use within Google for about a year: it's
been a success, and gaining ever wider adoption.  Several TODOs
have not yet been toDONE, because they just haven't surfaced as
real-life issues yet: that includes NUMA migration, which is at
the top of my list, but so far we've done well enough without it.

01 huge tmpfs: prepare counts in meminfo, vmstat and SysRq-m
02 huge tmpfs: include shmem freeholes in available memory
03 huge tmpfs: huge=N mount option and /proc/sys/vm/shmem_huge
04 huge tmpfs: try to allocate huge pages, split into a team
05 huge tmpfs: avoid team pages in a few places
06 huge tmpfs: shrinker to migrate and free underused holes
07 huge tmpfs: get_unmapped_area align & fault supply huge page
08 huge tmpfs: try_to_unmap_one use page_check_address_transhuge
09 huge tmpfs: avoid premature exposure of new pagetable
10 huge tmpfs: map shmem by huge page pmd or by page team ptes
11 huge tmpfs: disband split huge pmds on race or memory failure
12 huge tmpfs: extend get_user_pages_fast to shmem pmd
13 huge tmpfs: use Unevictable lru with variable hpage_nr_pages
14 huge tmpfs: fix Mlocked meminfo, track huge & unhuge mlocks
15 huge tmpfs: fix Mapped meminfo, track huge & unhuge mappings
16 kvm: plumb return of hva when resolving page fault.
17 kvm: teach kvm to map page teams as huge pages.
18 huge tmpfs: mem_cgroup move charge on shmem huge pages
19 huge tmpfs: mem_cgroup shmem_pmdmapped accounting
20 huge tmpfs: mem_cgroup shmem_hugepages accounting
21 huge tmpfs: show page team flag in pageflags
22 huge tmpfs: /proc//smaps show ShmemHugePages
23 huge tmpfs recovery: framework for reconstituting huge pages
24 huge tmpfs recovery: shmem_recovery_populate to fill huge page
25 huge tmpfs recovery: shmem_recovery_remap & remap_team_by_pmd
26 huge tmpfs recovery: shmem_recovery_swapin to read from swap
27 huge tmpfs recovery: tweak shmem_getpage_gfp to fill team
28 huge tmpfs recovery: debugfs stats to complete this phase
29 huge tmpfs recovery: page migration call back into shmem
30 huge tmpfs: shmem_huge_gfpmask and shmem_recovery_gfpmask
31 huge tmpfs: no kswapd by default on sync allocations

 Documentation/cgroup-v1/memory.txt     |    2
 Documentation/filesystems/proc.txt     |   20
 Documentation/filesystems/tmpfs.txt    |  106 +
 Documentation/sysctl/vm.txt            |   46
 Documentation/vm/pagemap.txt           |    2
 Documentation/vm/transhuge.txt         |   38
 Documentation/vm/unevictable-lru.txt   |   15
 arch/mips/mm/gup.c                     |   15
 arch/s390/mm/gup.c                     |   19
 arch/sparc/mm/gup.c                    |   19
 arch/x86/kvm/mmu.c                     |  150 +
 arch/x86/kvm/paging_tmpl.h             |    6
 arch/x86/mm/gup.c                      |   15
 drivers/base/node.c                    |   20
 drivers/char/mem.c                     |   23
 fs/proc/meminfo.c                      |   11
 fs/proc/page.c                         |    6
 fs/proc/task_mmu.c                     |   28
 include/linux/huge_mm.h                |   14
 include/linux/kvm_host.h               |    2
 include/linux/memcontrol.h             |   17
 include/linux/migrate.h                |    2
 include/linux/migrate_mode.h           |    2
 include/linux/mm.h                     |    3
 include/linux/mm_types.h               |    1
 include/linux/mmzone.h                 |    5
 include/linux/page-flags.h             |   10
 include/linux/shmem_fs.h               |   29
 include/trace/events/migrate.h         |    7
 include/trace/events/mmflags.h         |    7
 include/uapi/linux/kernel-page-flags.h |    3
 ipc/shm.c                              |    6
 kernel/sysctl.c                        |   33
 mm/compaction.c                        |    5
 mm/filemap.c                           |   10
 mm/gup.c                               |   19
 mm/huge_memory.c                       |  363 +++-
 mm/internal.h                          |   26
 mm/memcontrol.c                        |  187 +-
 mm/memory-failure.c                    |    7
 mm/memory.c                            |  225 +
 mm/mempolicy.c                         |   13
 mm/migrate.c                           |   37
 mm/mlock.c                             |  183 +-
 mm/mmap.c                              |   16
 mm/page-writeback.c                    |    2
 mm/page_alloc.c                        |   55
 mm/rmap.c                              |  129 -
 mm/shmem.c                             | 2066 ++++++++++++++++++++++-
 mm/swap.c                              |    5
 mm/truncate.c                          |    2
 mm/util.c                              |    1
 mm/vmscan.c                            |   47
 mm/vmstat.c                            |    3
 tools/vm/page-types.c                  |    2
 virt/kvm/kvm_main.c                    |   14
 56 files changed, 3627 insertions(+), 472 deletions(-)
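
For readers unfamiliar with the layout described above, here is a
minimal user-space C sketch of the index arithmetic implied by placing
a team of small pagecache pages within one huge-order extent.  It is
not code from the series: it assumes 4 KiB base pages and 2 MiB extents
(512 pages per team), as on x86_64, and every sketch_* name below is
hypothetical and does not appear in the patches.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages and 2 MiB huge extents (x86_64 defaults). */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_HPAGE_SHIFT	21
/* Number of small pages making up one team: 512 under the assumption above. */
#define SKETCH_TEAM_NR		(1UL << (SKETCH_HPAGE_SHIFT - SKETCH_PAGE_SHIFT))

/* File page index of the first page of the extent containing @index. */
static inline unsigned long sketch_team_head_index(unsigned long index)
{
	return index & ~(SKETCH_TEAM_NR - 1);
}

/* Position of @index within its team, in the range 0 .. SKETCH_TEAM_NR - 1. */
static inline unsigned long sketch_team_offset(unsigned long index)
{
	return index & (SKETCH_TEAM_NR - 1);
}

/* Number of huge-order extents needed to cover a file of @bytes bytes. */
static inline unsigned long long sketch_extents_for_size(unsigned long long bytes)
{
	return (bytes + (1ULL << SKETCH_HPAGE_SHIFT) - 1) >> SKETCH_HPAGE_SHIFT;
}

/* Sanity check: head index plus offset always reconstructs the index. */
static inline void sketch_check_index(unsigned long index)
{
	assert(sketch_team_head_index(index) + sketch_team_offset(index) == index);
}
```

Under these assumptions, file page index 513 belongs to the team whose
head is at index 512, at offset 1 within that team, and a file one byte
longer than 2 MiB spans two extents.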