From: Linus Torvalds
Date: Sat, 9 Sep 2017 11:47:33 -0700
Subject: Re: Current mainline git (24e700e291d52bd2) hangs when building e.g. perf
To: Borislav Petkov
Cc: Markus Trippelsdorf, Andy Lutomirski, Ingo Molnar, Thomas Gleixner, Peter Zijlstra, LKML, Ingo Molnar, Tom Lendacky

On Sat, Sep 9, 2017 at 11:29 AM, Borislav Petkov wrote:
> On Sat, Sep 09, 2017 at 11:26:27AM -0700, Linus Torvalds wrote:
>> But the fact that that fixes it for you does indicate that it's not
>> just a stale TLB entry or something, it really is some CPU using page
>> tables after they have been free'd and been re-allocated to something
>> else (and *then* they may point to garbage).
>
> Cool, I was trying to think of a good use case how we'd hit that. I
> guess you just gave one. :)

The thing is, even with the delayed TLB flushing, I don't think it
should be *so* delayed that we should be seeing a TLB fill from
garbage page tables.

But the part in Andy's patch that worries me the most is that

+       cpumask_clear_cpu(cpu, mm_cpumask(mm));

in enter_lazy_tlb(). It means that we won't be notified by people
invalidating the page tables, and while we then do re-validate the TLB
when we switch back from lazy mode, I still worry.

I'm not entirely convinced by that tlb_gen logic.

I can't actually see anything *wrong* in the tlb_gen logic, but it worries me.

               Linus
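
For readers following along, the window being worried about can be sketched as a toy user-space model. This is not the kernel's code and not Andy's patch: every toy_* name, the struct layout, and the single shared generation counter are assumptions made purely for illustration. It only shows the shape of the scheme described above: a CPU that clears its bit on entering lazy mode stops being notified of invalidations, and relies on a generation compare when it switches back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for an mm: who gets notified, and a flush generation. */
struct toy_mm {
	uint32_t cpumask;	/* bit n set: CPU n is notified of invalidations */
	uint64_t tlb_gen;	/* bumped on every page-table invalidation */
};

/* Hypothetical per-CPU state. */
struct toy_cpu {
	uint64_t local_tlb_gen;	/* generation this CPU's TLB is known to match */
	bool lazy;
	unsigned int flushes;	/* how many flushes this CPU has performed */
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

/* Model a local flush: the CPU catches up to the mm's current generation. */
static void toy_flush_local(struct toy_mm *mm, int cpu)
{
	toy_cpus[cpu].local_tlb_gen = mm->tlb_gen;
	toy_cpus[cpu].flushes++;
}

/* Model of the hunk in question: the CPU drops out of the notification mask. */
static void toy_enter_lazy_tlb(struct toy_mm *mm, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_cpus[cpu].lazy = true;
	mm->cpumask &= ~(UINT32_C(1) << cpu);
}

/* Someone invalidates page tables: only CPUs still in the mask get flushed. */
static void toy_invalidate(struct toy_mm *mm)
{
	mm->tlb_gen++;
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (mm->cpumask & (UINT32_C(1) << cpu))
			toy_flush_local(mm, cpu);
	}
}

/* True while the CPU may still hold translations from an older generation. */
static bool toy_is_stale(const struct toy_mm *mm, int cpu)
{
	return toy_cpus[cpu].local_tlb_gen != mm->tlb_gen;
}

/* Switch back from lazy mode: re-validate by comparing generations. */
static bool toy_leave_lazy_tlb(struct toy_mm *mm, int cpu)
{
	bool stale = toy_is_stale(mm, cpu);

	toy_cpus[cpu].lazy = false;
	mm->cpumask |= UINT32_C(1) << cpu;
	if (stale)
		toy_flush_local(mm, cpu);
	return stale;
}
```

In this model, everything between toy_invalidate() and toy_leave_lazy_tlb() is the interval in which the lazy CPU has missed a notification. The generation compare catches it at switch-back, but nothing in the model says what that CPU may do with the old page tables in the meantime, which is the part that stays worrying.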