From: Andy Lutomirski
Date: Thu, 18 Jul 2019 12:04:49 -0700
Subject: Re: [PATCH 3/3] mm/vmalloc: Sync unmappings in vunmap_page_range()
To: Joerg Roedel
Cc: Andy Lutomirski, Joerg Roedel, Dave Hansen, Peter Zijlstra,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andrew Morton,
    LKML, Linux-MM
In-Reply-To: <20190718091745.GG13091@suse.de>
References: <20190717071439.14261-1-joro@8bytes.org>
    <20190717071439.14261-4-joro@8bytes.org>
    <20190718091745.GG13091@suse.de>

On Thu, Jul 18, 2019 at 2:17 AM Joerg Roedel wrote:
>
> Hi Andy,
>
> On Wed, Jul 17, 2019 at 02:24:09PM -0700, Andy Lutomirski wrote:
> > On Wed, Jul 17, 2019 at 12:14 AM Joerg Roedel wrote:
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index 4fa8d84599b0..322b11a374fd 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -132,6 +132,8 @@ static void vunmap_page_range(unsigned long addr, unsigned long end)
> > >  			continue;
> > >  		vunmap_p4d_range(pgd, addr, next);
> > >  	} while (pgd++, addr = next, addr != end);
> > > +
> > > +	vmalloc_sync_all();
> > >  }
> >
> > I'm confused. Shouldn't the code in _vm_unmap_aliases handle this?
> > As it stands, won't your patch hurt performance on x86_64? If x86_32
> > is a special snowflake here, maybe flush_tlb_kernel_range() should
> > handle this?
>
> Imo this is the logical place to handle this. The code first unmaps the
> area from the init_mm page-table and then syncs that page-table to all
> other page-tables in the system, so it is the one place to update the
> page-tables.

I find it problematic that there is no meaningful documentation of what
vmalloc_sync_all() is supposed to do. The closest I can find is this
comment on sync_global_pgds(), which the x86_64 implementation calls:

/*
 * When memory was added make sure all the processes MM have
 * suitable PGD entries in the local PGD level page.
 */
void sync_global_pgds(unsigned long start, unsigned long end)
{

which is obviously entirely inapplicable. If I'm understanding correctly,
the underlying issue here is that the vmalloc fault mechanism can
propagate PGD entry *addition*, but nothing (not even
flush_tlb_kernel_range()) propagates PGD entry *removal*.

I find it suspicious that only x86 has this. How do other architectures
handle it?

At the very least, I think this series needs a comment in
vmalloc_sync_all() explaining exactly what the function promises to do.
But maybe a better fix is to add code to flush_tlb_kernel_range() to sync
the vmalloc area if the flushed range overlaps the vmalloc area. Or, even
better, improve x86_32 the way we did x86_64: adjust the memory mapping
code so that top-level paging entries are never deleted in the first
place.
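
A minimal sketch of that flush_tlb_kernel_range() idea, purely
illustrative and not part of the patch under discussion: it assumes the
vmalloc_sync_all() interface of this era and the x86 VMALLOC_START /
VMALLOC_END bounds, and it elides the existing cross-CPU flush.

#include <linux/vmalloc.h>	/* vmalloc_sync_all() */
#include <asm/pgtable.h>	/* VMALLOC_START, VMALLOC_END */
#include <asm/tlbflush.h>

void flush_tlb_kernel_range(unsigned long start, unsigned long end)
{
	/*
	 * Hypothetical: if the flushed range overlaps the vmalloc area,
	 * propagate top-level page-table changes to all page-tables
	 * before flushing, so a later vmalloc fault cannot trip over a
	 * stale top-level entry that will never be removed by a fault.
	 */
	if (start < VMALLOC_END && end > VMALLOC_START)
		vmalloc_sync_all();

	/* ... existing IPI-based flush of [start, end) on all CPUs ... */
}

Whether such a sync belongs here, in vunmap_page_range() as in the patch
above, or nowhere at all (by never freeing top-level entries, as on
x86_64) is exactly the question this thread is debating.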