Date: Thu, 9 Nov 2017 18:09:07 -0700
From: Tycho Andersen
To: Dave Hansen
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    kernel-hardening@lists.openwall.com, Marco Benatto,
    Juerg Haefliger, x86@kernel.org
Subject: Re: [PATCH v6 03/11] mm, x86: Add support for eXclusive Page Frame Ownership (XPFO)
Message-ID: <20171110010907.qfkqhrbtdkt5y3hy@smitten>
References: <20170907173609.22696-1-tycho@docker.com>
 <20170907173609.22696-4-tycho@docker.com>
 <34454a32-72c2-c62e-546c-1837e05327e1@intel.com>
 <20170920223452.vam3egenc533rcta@smitten>
 <97475308-1f3d-ea91-5647-39231f3b40e5@intel.com>
 <20170921000901.v7zo4g5edhqqfabm@docker>

Hi Dave,

On Wed, Sep 20, 2017 at 05:27:02PM -0700, Dave Hansen wrote:
> On 09/20/2017 05:09 PM, Tycho Andersen wrote:
> >> I think the only thing that will really help here is if you batch the
> >> allocations. For instance, you could make sure that the per-cpu-pageset
> >> lists always contain either all kernel or all user data. Then remap the
> >> entire list at once and do a single flush after the entire list is consumed.
> > Just so I understand, the idea would be that we only flush when the
> > type of allocation alternates, so:
> >
> > kmalloc(..., GFP_KERNEL);
> > kmalloc(..., GFP_KERNEL);
> > /* remap+flush here */
> > kmalloc(..., GFP_HIGHUSER);
> > /* remap+flush here */
> > kmalloc(..., GFP_KERNEL);
>
> Not really. We keep a free list per migrate type, and a per_cpu_pages
> (pcp) list per migratetype:
>
> > struct per_cpu_pages {
> >         int count;      /* number of pages in the list */
> >         int high;       /* high watermark, emptying needed */
> >         int batch;      /* chunk size for buddy add/remove */
> >
> >         /* Lists of pages, one per migrate type stored on the pcp-lists */
> >         struct list_head lists[MIGRATE_PCPTYPES];
> > };
>
> The migratetype is derived from the GFP flags in
> gfpflags_to_migratetype(). In general, GFP_HIGHUSER and GFP_KERNEL come
> from different migratetypes, so they come from different free lists.
>
> In your case above, the GFP_HIGHUSER allocation come through the
> MIGRATE_MOVABLE pcp list while the GFP_KERNEL ones come from the
> MIGRATE_UNMOVABLE one. Since we add a bunch of pages to those lists at
> once, you could do all the mapping/unmapping/flushing on a bunch of
> pages at once
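
(For anyone following along at home: as I understand it,
gfpflags_to_migratetype() boils down to roughly the following -- this is
a paraphrase from memory rather than a quote of include/linux/gfp.h, so
the details may be slightly off:)

/*
 * Rough paraphrase of gfpflags_to_migratetype(); check the real thing in
 * include/linux/gfp.h before relying on any detail here.
 */
static inline int gfpflags_to_migratetype(const gfp_t gfp_flags)
{
	/* With mobility grouping disabled, everything is unmovable. */
	if (unlikely(page_group_by_mobility_disabled))
		return MIGRATE_UNMOVABLE;

	/*
	 * Otherwise the migratetype is taken straight from the mobility
	 * bits, so anything allocated with __GFP_MOVABLE lands on the
	 * MIGRATE_MOVABLE pcp list, whether or not it is actually
	 * destined for userspace.
	 */
	return (gfp_flags & (__GFP_RECLAIMABLE | __GFP_MOVABLE))
		>> GFP_MOVABLE_SHIFT;
}

So the pcp list a page lands on really is just a function of the mobility
bits in the caller's gfp mask.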
So I've been playing around with an implementation of this, which is
basically:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3d9c1b486e1f..47b46ff1148a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2348,6 +2348,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		if (is_migrate_cma(get_pcppage_migratetype(page)))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
 					      -(1 << order));
+		xpfo_pcp_refill(page, migratetype, order);
 	}
 
 	/*
diff --git a/mm/xpfo.c b/mm/xpfo.c
index 080235a2f129..b381d83c6e78 100644
--- a/mm/xpfo.c
+++ b/mm/xpfo.c
@@ -260,3 +265,85 @@ void xpfo_temp_unmap(const void *addr, size_t size, void **mapping,
 		kunmap_atomic(mapping[i]);
 }
 EXPORT_SYMBOL(xpfo_temp_unmap);
+
+void xpfo_pcp_refill(struct page *page, enum migratetype migratetype, int order)
+{
+	int i;
+	bool flush_tlb = false;
+
+	if (!static_branch_unlikely(&xpfo_initialized))
+		return;
+
+	for (i = 0; i < 1 << order; i++) {
+		struct xpfo *xpfo;
+
+		xpfo = lookup_xpfo(page + i);
+		if (!xpfo)
+			continue;
+
+		if (unlikely(!xpfo->initialized)) {
+			spin_lock_init(&xpfo->maplock);
+			atomic_set(&xpfo->mapcount, 0);
+			xpfo->initialized = true;
+		}
+
+		xpfo->trace.max_entries = 20;
+		xpfo->trace.skip = 1;
+		xpfo->trace.entries = xpfo->entries;
+		xpfo->trace.nr_entries = 0;
+		xpfo->trace2.max_entries = 20;
+		xpfo->trace2.skip = 1;
+		xpfo->trace2.entries = xpfo->entries2;
+		xpfo->trace2.nr_entries = 0;
+
+		xpfo->migratetype = migratetype;
+
+		save_stack_trace(&xpfo->trace);
+
+		if (migratetype == MIGRATE_MOVABLE) {
+			/* GFP_HIGHUSER */
+			set_kpte(page_address(page + i), page + i, __pgprot(0));
+			if (!test_and_set_bit(XPFO_PAGE_UNMAPPED, &xpfo->flags))
+				flush_tlb = true;
+			set_bit(XPFO_PAGE_USER, &xpfo->flags);
+		} else {
+			/*
+			 * GFP_KERNEL and everything else; for now we just
+			 * leave it mapped
+			 */
+			set_kpte(page_address(page + i), page + i, PAGE_KERNEL);
+			if (test_and_clear_bit(XPFO_PAGE_UNMAPPED, &xpfo->flags))
+				flush_tlb = true;
+			clear_bit(XPFO_PAGE_USER, &xpfo->flags);
+		}
+	}
+
+	if (flush_tlb)
+		xpfo_flush_kernel_tlb(page, order);
+}

But I'm getting some faults:

[    1.897311] BUG: unable to handle kernel paging request at ffff880139b75012
[    1.898244] IP: ext4_fill_super+0x2f3b/0x33c0
[    1.898827] PGD 1ea6067
[    1.898828] P4D 1ea6067
[    1.899170] PUD 1ea9067
[    1.899508] PMD 119478063
[    1.899850] PTE 139b75000
[    1.900211]
[    1.900760] Oops: 0000 [#1] SMP
[    1.901160] Modules linked in:
[    1.901565] CPU: 3 PID: 990 Comm: exe Not tainted 4.13.0+ #85
[    1.902348] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[    1.903420] task: ffff88011ae7cb00 task.stack: ffffc9001a338000
[    1.904108] RIP: 0010:ext4_fill_super+0x2f3b/0x33c0
[    1.904649] RSP: 0018:ffffc9001a33bce0 EFLAGS: 00010246
[    1.905240] RAX: 00000000000000f0 RBX: ffff880139b75000 RCX: ffffffff81c456b8
[    1.906047] RDX: 0000000000000001 RSI: 0000000000000082 RDI: 0000000000000246
[    1.906874] RBP: ffffc9001a33bda8 R08: 0000000000000000 R09: 0000000000000183
[    1.908053] R10: ffff88011a9e0800 R11: ffffffff818493e0 R12: ffff88011a9e0800
[    1.908920] R13: ffff88011a9e6800 R14: 000000000077fefa R15: 0000000000000000
[    1.909775] FS:  00007f8169747700(0000) GS:ffff880139d80000(0000) knlGS:0000000000000000
[    1.910667] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    1.911293] CR2: ffff880139b75012 CR3: 000000011a965000 CR4: 00000000000006e0
[    1.912050] Call Trace:
[    1.912356]  ? register_shrinker+0x80/0x90
[    1.912826]  mount_bdev+0x177/0x1b0
[    1.913234]  ? ext4_calculate_overhead+0x4a0/0x4a0
[    1.913744]  ext4_mount+0x10/0x20
[    1.914115]  mount_fs+0x2d/0x140
[    1.914490]  ? __alloc_percpu+0x10/0x20
[    1.914903]  vfs_kern_mount.part.20+0x58/0x110
[    1.915394]  do_mount+0x1cc/0xca0
[    1.915758]  ? _copy_from_user+0x6b/0xa0
[    1.916198]  ? memdup_user+0x3d/0x70
[    1.916576]  SyS_mount+0x93/0xe0
[    1.916915]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[    1.917401] RIP: 0033:0x7f8169264b5a
[    1.917777] RSP: 002b:00007fff6ce82bc8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[    1.918576] RAX: ffffffffffffffda RBX: 0000000000fb2030 RCX: 00007f8169264b5a
[    1.919313] RDX: 00007fff6ce84e61 RSI: 00007fff6ce84e70 RDI: 00007fff6ce84e66
[    1.920042] RBP: 0000000000008000 R08: 0000000000000000 R09: 0000000000000000
[    1.920771] R10: 0000000000008001 R11: 0000000000000202 R12: 0000000000000000
[    1.921512] R13: 0000000000000000 R14: 00007fff6ce82c70 R15: 0000000000445c20
[    1.922254] Code: 83 ee 01 48 c7 c7 70 e6 97 81 e8 1d 0c e2 ff 48 89 de 48 c7 c7 a4 48 96 81 e8 0e 0c e2 ff 8b 85 5c ff ff ff 41 39 44 24 40 75 0e 43 12 04 41 0f 44 c7 89 85 5c ff ff ff 48 c7 c7 ad 48 96 81
[    1.924489] RIP: ext4_fill_super+0x2f3b/0x33c0 RSP: ffffc9001a33bce0
[    1.925334] CR2: ffff880139b75012
[    1.942161] ---[ end trace fe884f328a0a7338 ]---

This is the code:

	if ((grp == sbi->s_groups_count) &&
	    !(gdp->bg_flags & cpu_to_le16(EXT4_BG_INODE_ZEROED)))

in fs/ext4/super.c:ext4_check_descriptors() that's ultimately failing. It
looks like this allocation comes from sb_bread_unmovable(), which, although
it says unmovable, seems to allocate the memory with:

  MOVABLE IO NOFAIL HARDWALL DIRECT_RECLAIM KSWAPD_RECLAIM

which I guess is from the additional flags in grow_dev_page() somewhere
down the stack.
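
Decoding that mask by hand (so take it with a grain of salt; this is my
own decomposition, not something a tool spat out), it looks equivalent to
GFP_USER with __GFP_FS masked off, plus __GFP_MOVABLE and __GFP_NOFAIL:

/*
 * Hand-decoded equivalent of the gfp mask above; the attribution of the
 * individual bits is my guess, not taken from a trace.
 */
gfp_t gfp = (GFP_USER & ~__GFP_FS)  /* IO | HARDWALL | DIRECT_RECLAIM | KSWAPD_RECLAIM */
	    | __GFP_MOVABLE         /* this is what trips the MIGRATE_MOVABLE heuristic */
	    | __GFP_NOFAIL;         /* presumably added in grow_dev_page() */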
Anyway... it seems this is a kernel allocation that's using
MIGRATE_MOVABLE, so perhaps we need some more fine-tuned heuristic than
"all MOVABLE allocations are unmapped via XPFO, and all the others are
mapped". Do you have any ideas?

> Or, you could hook your code into the places where the migratetype of
> memory is changed (set_pageblock_migratetype(), plus where we fall
> back). Those changes are much more rare than page allocation.

I guess this has the same issue: sometimes the kernel allocates MOVABLE
stuff that it wants to use itself.

Thanks,

Tycho