Date: Tue, 29 Aug 2017 14:28:44 -0400
From: Jerome Glisse
To: Linus Torvalds
Cc: Andrea Arcangeli, Adam Borowski, Takashi Iwai, Bernhard Held,
    Nadav Amit, Paolo Bonzini, Wanpeng Li, Radim Krčmář, Joerg Roedel,
    "Kirill A. Shutemov", Andrew Morton, kvm,
    "linux-kernel@vger.kernel.org", Michal Hocko
Subject: Re: kvm splat in mmu_spte_clear_track_bits
Message-ID: <20170829182844.GA7546@redhat.com>

On Tue, Aug 29, 2017 at 09:10:59AM -0700, Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 7:09 AM, Andrea Arcangeli wrote:
> > Hello,
> >
> > On Tue, Aug 29, 2017 at 02:59:23PM +0200, Adam Borowski wrote:
> >> On Tue, Aug 29, 2017 at 02:45:41PM +0200, Takashi Iwai wrote:
> >> > [Put more people to Cc, sorry for growing too much...]
> >>
> >> We're all interested in 4.13.0 not crashing on us, so that's ok.
> >>
> >> > On Tue, 29 Aug 2017 11:19:13 +0200,
> >> > Bernhard Held wrote:
> >> > >
> >> > > On 08/28/2017 at 06:56 PM, Nadav Amit wrote:
> >> > > > Don’t blame me for the TLB stuff... My money is on aac2fea94f7a .
> >> > >
> >> > > Amit, thanks for your courage to expose your patch!
> >> > >
> >> > > I'm more and more confident that aac2fea94f7a is the culprit. Maybe it
> >> > > just accelerates the triggering of the splash. To be more sure the
> >> > > kernel needs to be tested for a couple of days. It would be great if
> >> > > others could assist in testing aac2fea94f7a.
> >> >
> >> > I'm testing with the revert for a while and it seems working.
> >>
> >> With nothing but aac2fea94f7a reverted, no explosions for me either.
> >
> > The aforementioned commit has 3 bugs.
>
> Yes. I'm reverting it from my tree.
>
> We should really *really* just tell the stupid MMU notifier users that
> they can't sleep.

There is no way around sleeping if we ever want to support things like
GPUs. To invalidate a page table on a GPU you need to schedule commands
to do so on the GPU command queue and wait for the GPU to signal that it
has invalidated its page table/TLB and caches.

We had this discussion before. Either we want to support all the new
fancy GPGPU, AI and all the APIs they rely on, or we should tell them:
sorry guys, not on Linux.

>
> The MMU notifiers are not going to destroy our VM layer. I hate the
> damn crap, and this kind of garbage is an example of why.

The issue here is that nobody calls
mmu_notifier_invalidate_range_start/end(), hence why people relied on
invalidate_range() to not sleep like start/end.

Now we can make the decision that start/end can sleep while
invalidate_range() can't, but then we also need to make sure that
range_start/end is always called.

Cheers,
Jérôme