From: "Levin, Alexander (Sasha Levin)"
To: Andy Lutomirski
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Borislav Petkov, Linus Torvalds, Andrew Morton, Mel Gorman, linux-mm@kvack.org, Nadav Amit, Rik van Riel, Dave Hansen, Arjan van de Ven, Peter Zijlstra
Subject: Re: [PATCH v2 00/10] PCID and improved laziness
Date: Sun, 18 Jun 2017 21:29:51 +0000
Message-ID: <20170618212948.mt33zbajt5n6saed@sasha-lappy>

On Tue, Jun 13, 2017 at 09:56:18PM -0700, Andy Lutomirski wrote:
>There are three performance benefits here:
>
>1. TLB flushing is slow. (I.e. the flush itself takes a while.)
>   This avoids many of them when switching tasks by using PCID. In
>   a stupid little benchmark I did, it saves about 100ns on my laptop
>   per context switch.
>   I'll try to improve that benchmark.
>
>2. Mms that have been used recently on a given CPU might get to keep
>   their TLB entries alive across process switches with this patch
>   set. TLB fills are pretty fast on modern CPUs, but they're even
>   faster when they don't happen.
>
>3. Lazy TLB is way better. We used to do two stupid things when we
>   ran kernel threads: we'd send IPIs to flush user contexts on their
>   CPUs and then we'd write to CR3 for no particular reason as an excuse
>   to stop further IPIs. With this patch, we do neither.
>
>This will, in general, perform suboptimally if paravirt TLB flushing
>is in use (currently just Xen, I think, but Hyper-V is in the works).
>The code is structured so we could fix it in one of two ways: we
>could take a spinlock when touching the percpu state so we can update
>it remotely after a paravirt flush, or we could be more careful about
>exactly how we access the state and use cmpxchg16b to do atomic
>remote updates. (On SMP systems without cmpxchg16b, we'd just skip
>the optimization entirely.)

Hey Andy,

I've started seeing the following in -next:

------------[ cut here ]------------
kernel BUG at arch/x86/mm/tlb.c:47!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 5302 Comm: kworker/u9:1 Not tainted 4.12.0-rc5+ #142
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
Workqueue: writeback wb_workfn (flush-259:0)
task: ffff880030ad0040 task.stack: ffff880036e78000
RIP: 0010:leave_mm+0x33/0x40 arch/x86/mm/tlb.c:50
RSP: 0018:ffff880036e7d4c8 EFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff88006a65e240 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffffb1475fa0 RDI: 0000000000000000
RBP: ffff880036e7d638 R08: 1ffff10006dcfad1 R09: ffff880030ad0040
R10: ffff880036e7d3b8 R11: 0000000000000000 R12: 1ffff10006dcfa9e
R13: ffff880036e7d6c0 R14: ffff880036e7d680 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c420019318 CR3: 0000000047a28000 CR4: 00000000000406f0
Call Trace:
 flush_tlb_func_local arch/x86/mm/tlb.c:239 [inline]
 flush_tlb_mm_range+0x26d/0x370 arch/x86/mm/tlb.c:317
 flush_tlb_page arch/x86/include/asm/tlbflush.h:253 [inline]
 ptep_clear_flush+0xd5/0x110 mm/pgtable-generic.c:86
 page_mkclean_one+0x242/0x540 mm/rmap.c:867
 rmap_walk_file+0x5e3/0xd20 mm/rmap.c:1681
 rmap_walk+0x1cd/0x2f0 mm/rmap.c:1699
 page_mkclean+0x2a0/0x380 mm/rmap.c:928
 clear_page_dirty_for_io+0x37e/0x9d0 mm/page-writeback.c:2703
 mpage_submit_page+0x77/0x230 fs/ext4/inode.c:2131
 mpage_process_page_bufs+0x427/0x500 fs/ext4/inode.c:2261
 mpage_prepare_extent_to_map+0x78d/0xcf0 fs/ext4/inode.c:2638
 ext4_writepages+0x13be/0x3dd0 fs/ext4/inode.c:2784
 do_writepages+0xff/0x170 mm/page-writeback.c:2357
 __writeback_single_inode+0x1d9/0x1480 fs/fs-writeback.c:1319
 writeback_sb_inodes+0x6e2/0x1260 fs/fs-writeback.c:1583
 wb_writeback+0x45d/0xed0 fs/fs-writeback.c:1759
 wb_do_writeback fs/fs-writeback.c:1891 [inline]
 wb_workfn+0x2b5/0x1460 fs/fs-writeback.c:1927
 process_one_work+0xbfa/0x1d30 kernel/workqueue.c:2097
 worker_thread+0x221/0x1860 kernel/workqueue.c:2231
 kthread+0x35f/0x430 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:425
Code: 48 3d 80 96 f8 b1 74 22 65 8b 05 f1 42 8c 53 83 f8 01 74 17 55 31 d2 48 c7 c6 80 96 f8 b1 31 ff 48 89 e5 e8 60 ff ff ff 5d c3 c3 <0f> 0b 90 66 2e 0f 1f 84 00 00 00 00 00 48 c7 c0 b4 10 73 b2 55
RIP: leave_mm+0x33/0x40 arch/x86/mm/tlb.c:50 RSP: ffff880036e7d4c8
---[ end trace 3b5d5a6fb6e394f8 ]---
Kernel panic - not syncing: Fatal exception
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: 0x2b800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 86400 seconds..

Don't really have an easy way to reproduce it...

--
Thanks,
Sasha