Date: Tue, 26 Nov 2019 13:30:43 +0100
From: Joerg Roedel
To: Ingo Molnar
Cc: Joerg Roedel, Dave Hansen, Andy Lutomirski, Thomas Gleixner,
    Ingo Molnar, Borislav Petkov, Peter Zijlstra, hpa@zytor.com,
    x86@kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH -tip, v2] x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
Message-ID: <20191126123043.GH21753@suse.de>
References: <20191126100942.13059-1-joro@8bytes.org> <20191126111119.GA110513@gmail.com>
In-Reply-To: <20191126111119.GA110513@gmail.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Ingo,

On Tue, Nov 26, 2019 at 12:11:19PM +0100, Ingo Molnar wrote:
> The vmalloc_sync_all() also iterating over the LDT range is buggy,
> because for the
> LDT the mappings are *intentionally* and fundamentally
> different between processes, i.e. not synchronized.

Yes, you are right, your patch description is much better; thanks for
making it clearer and more correct.

> Furthermore I'm not sure we need to iterate over the PKMAP range either:
> those are effectively permanent PMDs as well, and they are not part of
> the vmalloc.c lazy deallocation scheme in any case - they are handled
> entirely separately in mm/highmem.c et al.

I looked a bit at that, and I didn't find an explicit place where the
PKMAP PMD gets established. It probably happens implicitly on the first
kmap() call, so we are safe as long as the first call to kmap() happens
before the kernel starts the first userspace process. But that is not an
issue that should be handled by vmalloc_sync_all(), whose name already
implies that it only cares about the vmalloc range.

So your change to only iterate to VMALLOC_END makes sense, and we should
establish the PKMAP PMD at a defined place to make sure it exists when
we start the first process.

> Note that this is *completely* untested - I might have wrecked PKMAP in
> my ignorance. Mind giving it a careful review and a test?

My testing environment for 32 bit is quite limited these days, but I
tested it in my PTI-x32 environment and the patch below works perfectly
fine there and still fixes the ldt_gdt selftest.

Regards,

	Joerg

> ===========================>
>
> Subject: x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
> From: Joerg Roedel
> Date: Tue, 26 Nov 2019 11:09:42 +0100
>
> From: Joerg Roedel
>
> The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc()
> ranges: before such vmap ranges are reused we make sure that they are
> unmapped from every task's page tables.
>
> This is really easy on pagetable setups where the kernel page tables
> are shared between all tasks - this is the case on 32-bit kernels
> with SHARED_KERNEL_PMD = 1.
>
> But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating
> over the pgd_list and clearing all pmd entries in the pgds that
> are cleared in the init_mm.pgd, which is the reference pagetable
> that the vmalloc() code uses.
>
> In that context the current practice of vmalloc_sync_all() iterating
> until FIXADDR_TOP is buggy:
>
>	for (address = VMALLOC_START & PMD_MASK;
>	     address >= TASK_SIZE_MAX && address < FIXADDR_TOP;
>	     address += PMD_SIZE) {
>		struct page *page;
>
> Because iterating up to FIXADDR_TOP will involve a lot of non-vmalloc
> address ranges:
>
>	VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIXMAP
>
> This is mostly harmless for the FIXMAP and CPU_ENTRY_AREA ranges,
> which don't clear their pmds, but it's lethal for the LDT range,
> which relies on having different mappings in different processes,
> and 'synchronizing' them in the vmalloc sense corrupts those
> pagetable entries (clearing them).
>
> This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD
> off and makes this the dominant mapping mode on 32-bit.
>
> To make the LDT work again, vmalloc_sync_all() must only iterate over
> the volatile parts of the kernel address range that are identical
> between all processes.
>
> So the correct check in vmalloc_sync_all() is "address < VMALLOC_END",
> which makes sure the VMALLOC areas are synchronized and the LDT
> mapping is not falsely overwritten.
>
> The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either,
> but this is not really a problem since their PMDs get established
> during bootup and never change.
>
> This change fixes the ldt_gdt selftest in my setup.
>
> Reported-by: Borislav Petkov
> Tested-by: Borislav Petkov
> Signed-off-by: Joerg Roedel
> Cc:
> Cc: Andy Lutomirski
> Cc: Borislav Petkov
> Cc: Brian Gerst
> Cc: Dave Hansen
> Cc: H. Peter Anvin
> Cc: Linus Torvalds
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Fixes: 7757d607c6b3 ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
> Link: https://lkml.kernel.org/r/20191126100942.13059-1-joro@8bytes.org
> Signed-off-by: Ingo Molnar
> ---
>  arch/x86/mm/fault.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: tip/arch/x86/mm/fault.c
> ===================================================================
> --- tip.orig/arch/x86/mm/fault.c
> +++ tip/arch/x86/mm/fault.c
> @@ -197,7 +197,7 @@ void vmalloc_sync_all(void)
>  		return;
>
>  	for (address = VMALLOC_START & PMD_MASK;
> -	     address >= TASK_SIZE_MAX && address < FIXADDR_TOP;
> +	     address >= TASK_SIZE_MAX && address < VMALLOC_END;
>  	     address += PMD_SIZE) {
>  		struct page *page;