Date: Thu, 28 Sep 2017 01:01:07 +0300
From: Yury Norov
To: Will Deacon
Cc: peterz@infradead.org, paulmck@linux.vnet.ibm.com,
    kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org,
    rruigrok@codeaurora.org, linux-arch@vger.kernel.org,
    akpm@linux-foundation.org, catalin.marinas@arm.com
Subject: Re: [RFC PATCH 0/2] Missing READ_ONCE in core and arch-specific pgtable code leading to crashes
Message-ID: <20170927220107.fms6yauxwccgu2sj@yury-thinkpad>
In-Reply-To: <1506527369-19535-1-git-send-email-will.deacon@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 27, 2017 at 04:49:27PM +0100, Will Deacon wrote:
> Hi,
>
> We recently had a crash report[1] on arm64 that involved a bad dereference
> in the page_vma_mapped code during ext4 writeback with THP active. I can
> reproduce this on -rc2:
>
> [ 254.032812] PC is at check_pte+0x20/0x170
> [ 254.032948] LR is at page_vma_mapped_walk+0x2e0/0x540
> [...]
> [ 254.036114] Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
> [ 254.036361] Call trace:
> [ 254.038977] [] check_pte+0x20/0x170
> [ 254.039137] [] page_vma_mapped_walk+0x2e0/0x540
> [ 254.039332] [] page_mkclean_one+0xac/0x278
> [ 254.039489] [] rmap_walk_file+0xf0/0x238
> [ 254.039642] [] rmap_walk+0x64/0xa0
> [ 254.039784] [] page_mkclean+0x90/0xa8
> [ 254.040029] [] clear_page_dirty_for_io+0x84/0x2a8
> [ 254.040311] [] mpage_submit_page+0x34/0x98
> [ 254.040518] [] mpage_process_page_bufs+0x164/0x170
> [ 254.040743] [] mpage_prepare_extent_to_map+0x134/0x2b8
> [ 254.040969] [] ext4_writepages+0x484/0xe30
> [ 254.041175] [] do_writepages+0x44/0xe8
> [ 254.041372] [] __filemap_fdatawrite_range+0xbc/0x110
> [ 254.041568] [] file_write_and_wait_range+0x48/0xd8
> [ 254.041739] [] ext4_sync_file+0x80/0x4b8
> [ 254.041907] [] vfs_fsync_range+0x64/0xc0
> [ 254.042106] [] SyS_msync+0x194/0x1e8
>
> After digging into the issue, I found that we appear to be racing with
> a concurrent pmd update in page_vma_mapped_walk, presumably due to a THP
> splitting operation. Looking at the code there:
>
>     pvmw->pmd = pmd_offset(pud, pvmw->address);
>     if (pmd_trans_huge(*pvmw->pmd) || is_pmd_migration_entry(*pvmw->pmd)) {
>         [...]
>     } else {
>         if (!check_pmd(pvmw))
>             return false;
>     }
>     if (!map_pte(pvmw))
>         goto next_pte;
>
> what happens in the crashing scenario is that we see all zeroes for the
> PMD in pmd_trans_huge(*pvmw->pmd), and so go to the 'else' case (migration
> isn't enabled, so the test is removed at compile-time). check_pmd then does:
>
>     pmde = READ_ONCE(*pvmw->pmd);
>     return pmd_present(pmde) && !pmd_trans_huge(pmde);
>
> and reads a valid table entry for the PMD because the splitting has completed
> (i.e. the first dereference reads from the pmdp_invalidate in the splitting
> code, whereas the second dereference reads from the following pmd_populate).
> It returns true because we should descend to the PTE level in map_pte.
> map_pte does:
>
>     pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address);
>
> which on arm64 (and this appears to be the same on x86) ends up doing:
>
>     (pmd_page_paddr((*(pvmw->pmd))) + pte_index(pvmw->address) * sizeof(pte_t))
>
> as part of its calculation. However, this is horribly broken because GCC
> inlines everything and reuses the register it loaded for the initial
> pmd_trans_huge check (when we loaded the value of zero) here, so we end up
> calculating a junk pointer and crashing when we dereference it. Disassembly
> at the end of the mail[2] for those who are curious.
>
> The moral of the story is that read-after-read (same address) ordering *only*
> applies if READ_ONCE is used consistently. This means we need to fix page
> table dereferences in the core code as well as the arch code to avoid this
> problem. The two RFC patches in this series fix arm64 (which is a bigger fix
> than necessary since I clean things up too) and page_vma_mapped_walk.
>
> Comments welcome.
>
> Will
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-September/532786.html
> [2]

Hi Will,

The fix works for me. Thanks.

My cross-compiler is:

$ /home/yury/work/thunderx-tools-28/bin/aarch64-thunderx-linux-gnu-gcc --version
aarch64-thunderx-linux-gnu-gcc (Cavium Inc. build 28) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.

Tested-by: Yury Norov

Yury

> // page_vma_mapped_walk
> // pvmw->pmd = pmd_offset(pud, pvmw->address);
> ldr     x0, [x19, #24]                  // pvmw->pmd
>
> // if (pmd_trans_huge(*pvmw->pmd) || is_pmd_migration_entry(*pvmw->pmd)) {
> ldr     x1, [x0]                        // *pvmw->pmd
> cbz     x1, ffff0000082336a0
> tbz     w1, #1, ffff000008233788        // pmd_trans_huge?
>
> // else if (!check_pmd(pvmw))
> ldr     x0, [x0]                        // READ_ONCE in check_pmd
> tst     x0, x24                         // pmd_present?
> b.eq    ffff000008233538                // b.none
> tbz     w0, #1, ffff000008233538        // pmd_trans_huge?
>
> // if (!map_pte(pvmw))
> ldr     x0, [x19, #16]                  // pvmw->address
>
> // pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address);
> and     x1, x1, #0xfffffffff000         // Reusing the old value of *pvmw->pmd!!!
> [...]
>
> --->8
>
> Will Deacon (2):
>   arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
>   mm: page_vma_mapped: Ensure pmd is loaded with READ_ONCE outside of
>     lock
>
>  arch/arm64/include/asm/hugetlb.h     |   2 +-
>  arch/arm64/include/asm/kvm_mmu.h     |  18 +--
>  arch/arm64/include/asm/mmu_context.h |   4 +-
>  arch/arm64/include/asm/pgalloc.h     |  42 +++---
>  arch/arm64/include/asm/pgtable.h     |  29 ++--
>  arch/arm64/kernel/hibernate.c        | 148 +++++++++---------
>  arch/arm64/mm/dump.c                 |  54 ++++---
>  arch/arm64/mm/fault.c                |  44 +++---
>  arch/arm64/mm/hugetlbpage.c          |  94 ++++++------
>  arch/arm64/mm/kasan_init.c           |  62 ++++----
>  arch/arm64/mm/mmu.c                  | 281 ++++++++++++++++++-----------------
>  arch/arm64/mm/pageattr.c             |  30 ++--
>  mm/page_vma_mapped.c                 |  25 ++--
>  13 files changed, 427 insertions(+), 406 deletions(-)
>
> --
> 2.1.4
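For readers following the thread: the lesson in the cover letter boils down to loading a page-table entry once and making every later decision from that one local copy. Below is a minimal, self-contained C sketch of that pattern. It is not code from either patch. All names (READ_ONCE_U64, DESC_VALID, DESC_TABLE, DESC_ADDR_MASK, PTRS_PER_TABLE, pmd_val_is_table, pte_addr_snapshot, pte_addr_stale_replay) are hypothetical, and the bit layout is only loosely modelled on an arm64 4 KiB-granule table descriptor. The second function does not try to coax a compiler into the bad code generation; it replays the interleaving from the disassembly deterministically, with the stale and fresh values passed in explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the kernel's READ_ONCE(), valid
 * only for a 64-bit scalar. The volatile access forces the compiler to
 * perform the load at this point instead of reusing a value it already
 * holds in a register from an earlier plain dereference. */
#define READ_ONCE_U64(x) (*(const volatile uint64_t *)&(x))

/* Hypothetical descriptor layout, loosely modelled on arm64 with a 4 KiB
 * granule (assumption): bit 0 = valid, bit 1 = table (vs. block/huge),
 * bits 47:12 = address of the next-level table. */
#define DESC_VALID     (1ULL << 0)
#define DESC_TABLE     (1ULL << 1)
#define DESC_ADDR_MASK 0x0000fffffffff000ULL
#define PTRS_PER_TABLE 512u /* 4 KiB table / 8-byte entries */

/* True if this PMD value points to a PTE table we may descend into. */
static bool pmd_val_is_table(uint64_t pmdval)
{
	return (pmdval & (DESC_VALID | DESC_TABLE)) == (DESC_VALID | DESC_TABLE);
}

/* Fixed pattern: load the PMD exactly once, then make every decision,
 * including the PTE address calculation, from that single snapshot.
 * Returns 0 when there is no PTE table to descend into. */
uint64_t pte_addr_snapshot(uint64_t *pmdp, unsigned int index)
{
	uint64_t pmdval = READ_ONCE_U64(*pmdp);

	assert(index < PTRS_PER_TABLE);
	if (!pmd_val_is_table(pmdval))
		return 0;
	return (pmdval & DESC_ADDR_MASK) + (uint64_t)index * sizeof(uint64_t);
}

/* Deterministic replay of the bad code generation in the disassembly:
 * 'stale' is the value left in a register by the first plain dereference
 * (zero, after pmdp_invalidate), 'fresh' is what the READ_ONCE in
 * check_pmd() observed (a valid table, after pmd_populate). The check
 * passes on 'fresh', but the address is built from 'stale'. */
uint64_t pte_addr_stale_replay(uint64_t stale, uint64_t fresh, unsigned int index)
{
	assert(index < PTRS_PER_TABLE);
	if (!pmd_val_is_table(fresh))
		return 0;
	return (stale & DESC_ADDR_MASK) + (uint64_t)index * sizeof(uint64_t);
}
```

With pmd = 0x40000000 | DESC_VALID | DESC_TABLE, pte_addr_snapshot(&pmd, 3) yields 0x40000018, whereas pte_addr_stale_replay(0, pmd, 3) yields 0x18: the table check passed on the fresh value, but the address was computed from the stale zero, which is the junk pointer described above.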