Received: by 2002:ac0:a581:0:0:0:0:0 with SMTP id m1-v6csp1181566imm; Tue, 3 Jul 2018 06:58:34 -0700 (PDT) X-Google-Smtp-Source: ADUXVKITrdYhQl3Hp3gpc4gpyu1xK26sLnrfk50zO0VoroIfNE4Wkl4wHs2M+DQDnKk4X/SP40DX X-Received: by 2002:a17:902:9b98:: with SMTP id y24-v6mr30169549plp.177.1530626314203; Tue, 03 Jul 2018 06:58:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1530626314; cv=none; d=google.com; s=arc-20160816; b=LXg5ZCrcHLTdIGru4dG5ETGzQIxI8gbr4fg/VETVmFJumHy9hBDeQai5hJf/iG/4HW keXPJyX4ONNCl3BLeJfI3S5fXyhbqyP1c3c1Ab7IWW1gjvdUdQPeKks33PfdZV9o01gA K9nwViKWKaX0INKz3KwV/2pRBt5dmVpbIqEMgTli03WTycicx0GGKZmHf5kxVrqhlq0x pxFC66awc/6QUY9r7afHM6RJ50gixVAYNgPci0ChE318iS9eqU99+OjUp/H8tTvFcuuy ZCkgPn472GX1CSE1YCd4U9Bp/Lo2474gLY3/tlCH32kJMZHcmFUXe8X7uhb+dNCem1L7 qbpw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date :arc-authentication-results; bh=8l9bcqS3JlnWe6ptVSm9cbThfkujCXb805GivFEyJVI=; b=LOLDxVwywBDEa+4WH8k1cv+cZ7gikFag5OABSJHjzHfDqQKFI5LoWsa2xPfMltLuNS RrNw8kpELN5OJB0p3kFSSCNotFp1424utcQiCBF8UQ5LEUOTubRur8FvIJA04I8tr+AS WDpBZTk1xjpUBOltkZ5yflDrTaOATCjAU6kqo5yKOR65Jb+/PyuBh1gdm3qDixEpVcSX VwZZE8LsH0wvijWzcNKMMSeIlfL3xJ0oUrQnoj9i5+Xq5cm0KRbzwaQoP+ZpXU7zZ1Na cxJpw4jBjVmuiNwa9IqjatbkGPEgEuqZE/dcp+iuMI0fy0BhlIdtUtpP7/7r4NUfxK5D kPyQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s35-v6si1069883pgl.656.2018.07.03.06.58.19; Tue, 03 Jul 2018 06:58:34 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753414AbeGCN5o (ORCPT + 99 others); Tue, 3 Jul 2018 09:57:44 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:43989 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753200AbeGCN5n (ORCPT ); Tue, 3 Jul 2018 09:57:43 -0400 Received: from hsi-kbw-5-158-153-52.hsi19.kabel-badenwuerttemberg.de ([5.158.153.52] helo=nanos.tec.linutronix.de) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1faLo7-00026a-EG; Tue, 03 Jul 2018 15:57:27 +0200 Date: Tue, 3 Jul 2018 15:57:27 +0200 (CEST) From: Thomas Gleixner To: Bin Yang cc: Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, LKML , Peter Zijlstra Subject: Re: [PATCH] x86/mm: fix cpu stuck issue in __change_page_attr_set_clr In-Reply-To: <1530180340-18593-1-git-send-email-bin.yang@intel.com> Message-ID: References: <1530180340-18593-1-git-send-email-bin.yang@intel.com> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Bin, On Thu, 28 Jun 2018, Bin Yang wrote: thanks for submitting this. > This issue can be easily triggered by free_initmem() functuion on > x86_64 cpu. Please do not use the output of git show for submitting a patch. Use git format-patch(1). > If cpu supports "pdpe1gb", kernel will try to use 1G large page. > In worst case, it needs to check every 4K page in 1G range in > try_preserve_large_page function. If __change_page_attr_set_clr > needs to change lots of 4K pages, cpu will be stuck for long time. Unfortunately you missed to describe what the actual interface function is which is called from a driver or whatever, including the parameters. > @@ -552,16 +552,20 @@ static int > try_preserve_large_page(pte_t *kpte, unsigned long address, > struct cpa_data *cpa) > { > + static unsigned long address_cache; > + static unsigned long do_split_cache = 1; What are the life time and protection rules of these static variables and why are they static in the first place. > unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn, old_pfn; > pte_t new_pte, old_pte, *tmp; > pgprot_t old_prot, new_prot, req_prot; > int i, do_split = 1; > enum pg_level level; > > - if (cpa->force_split) > + spin_lock(&pgd_lock); > + if (cpa->force_split) { > + do_split_cache = 1; Returns with pgd_lock held which will immediately lockup the system on the next spin_lock(&pgd_lock) operation. Also what's the point of storing the already available information of cpa->force_split in yet another variable? This enforces taking the lock on every invocation for no reason. > return 1; > + } > > - spin_lock(&pgd_lock); > /* > * Check for races, another CPU might have split this page > * up already: > @@ -627,13 +631,25 @@ try_preserve_large_page(pte_t *kpte, unsigned long address, > > new_prot = static_protections(req_prot, address, pfn); > > + addr = address & pmask; > + pfn = old_pfn; > + /* > + * If an address in same range had been checked just now, re-use the > + * cache value without full range check. In the worst case, it needs to > + * check every 4K page in 1G range, which causes cpu stuck for long > + * time. > + */ > + if (!do_split_cache && > + address_cache >= addr && address_cache < nextpage_addr && On the first call, address_cache contains 0. On any subsequent call address_caches contains the address of the previous invocation of try_preserve_large_page(). But addr = address & pmask! So you are comparing apples and oranges. Not sure how that is supposed to work correctly. > + pgprot_val(new_prot) == pgprot_val(old_prot)) { > + do_split = do_split_cache; This is an very obscure way to write: do_split = 0; because do_split_cache must be 0 otherwise it cannot reach that code path. > + goto out_unlock; > + } > /* > * We need to check the full range, whether > * static_protection() requires a different pgprot for one of > * the pages in the range we try to preserve: > */ > - addr = address & pmask; > - pfn = old_pfn; > for (i = 0; i < (psize >> PAGE_SHIFT); i++, addr += PAGE_SIZE, pfn++) { > pgprot_t chk_prot = static_protections(req_prot, addr, pfn); > > @@ -670,6 +686,8 @@ try_preserve_large_page(pte_t *kpte, unsigned long address, > } > > out_unlock: > + address_cache = address; > + do_split_cache = do_split; > spin_unlock(&pgd_lock); So here you store the 'cache' values and while this code suggests that it protects the 'cache' via pgd_lock (due to lack of comments), the protection is actually cpa_lock. But, that cache information stays around when cpa_lock is dropped, i.e. when the current (partial) operation has been done and this information stays stale for the next user. That does not make sense. I'm still puzzled how that can ever happen at all and which problem you are trying to solve. The first invocation of try_to_preserve_large_page() will either preserve the 1GB page and consume the number of 4K pages which fit into it, or it splits it into 2M chunks and tries again. The retry will either preserve the 2M chunk and and consume the number of 4K pages which fit into it, or it splits it into 4K pages. Once split into 4K, subsequent invocations will return before reaching the magic cache checks. Can you please describe the call chain, the driver facing function used and the parameters which are handed in? Thanks, tglx