Date: Fri, 23 Jun 2023 20:17:40 +0200
From: Michal Hocko
To: Sebastian Andrzej Siewior
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    "Luis Claudio R. Goncalves", Andrew Morton, Boqun Feng, Ingo Molnar,
    John Ogness, Mel Gorman, Peter Zijlstra, Petr Mladek, Tetsuo Handa,
    Thomas Gleixner, Waiman Long, Will Deacon
Subject: Re: [PATCH v2 2/2] mm/page_alloc: Use write_seqlock_irqsave() instead write_seqlock() + local_irq_save().

On Fri 23-06-23 19:12:32, Sebastian Andrzej Siewior wrote:
> __build_all_zonelists() acquires zonelist_update_seq by first disabling
> interrupts via local_irq_save() and then acquiring the seqlock with
> write_seqlock().
> This is troublesome and leads to problems on PREEMPT_RT. The problem is
> that the inner spinlock_t becomes a sleeping lock on PREEMPT_RT and must
> not be acquired with disabled interrupts.
>
> The API provides write_seqlock_irqsave() which does the right thing in
> one step.
> printk_deferred_enter() has to be invoked in non-migrate-able context to
> ensure that deferred printing is enabled and disabled on the same CPU.
> This is the case after zonelist_update_seq has been acquired.
>
> There was discussion on the first submission that the order should be:
> 	local_irq_disable();
> 	printk_deferred_enter();
> 	write_seqlock();
>
> to avoid pitfalls like having an unaccounted printk() coming from
> write_seqlock_irqsave() before printk_deferred_enter() is invoked. The
> only origin of such a printk() can be a lockdep splat because the
> lockdep annotation happens after the sequence count is incremented.
> This is exceptional and subject to change.
>
> It was also pointed out that PREEMPT_RT can be affected by the printk
> problem since its write_seqlock_irqsave() does not really disable
> interrupts. This isn't the case because PREEMPT_RT's printk
> implementation differs from the mainline implementation in two important
> aspects:
> - Printing happens in dedicated threads and not during the
>   invocation of printk().
> - In emergency cases where synchronous printing is used, a different
>   driver is used which does not use tty_port::lock.
>
> Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
> printk output.
>
> Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
> Signed-off-by: Sebastian Andrzej Siewior

Thanks for extending the changelog. This is much clearer IMO.

One nit below which I haven't noticed before.
Anyway
Acked-by: Michal Hocko

> ---
>  mm/page_alloc.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 47421bedc12b7..99b7e7d09c5c0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5808,11 +5808,10 @@ static void __build_all_zonelists(void *data)
>  	unsigned long flags;
>
>  	/*
> -	 * Explicitly disable this CPU's interrupts before taking seqlock
> -	 * to prevent any IRQ handler from calling into the page allocator
> -	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
> +	 * The zonelist_update_seq must be acquired with irqsave because the
> +	 * reader can be invoked from IRQ with GFP_ATOMIC.
>  	 */
> -	local_irq_save(flags);
> +	write_seqlock_irqsave(&zonelist_update_seq, flags);
>  	/*
>  	 * Explicitly disable this CPU's synchronous printk() before taking
>  	 * seqlock to prevent any printk() from trying to hold port->lock, for

This is not the case anymore because the locking order has flipped
(printk is now deferred after the seqlock is taken, not before). I would
just extend the comment above with something like:
	 * Also disable synchronous printk() to prevent any printk() from trying
	 * to hold port->lock, for tty_insert_flip_string_and_push_buffer() on
	 * other CPU might be calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with
	 * port->lock held.

> @@ -5820,7 +5819,6 @@ static void __build_all_zonelists(void *data)
>  	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
>  	 */
>  	printk_deferred_enter();
> -	write_seqlock(&zonelist_update_seq);
>
>  #ifdef CONFIG_NUMA
>  	memset(node_load, 0, sizeof(node_load));
> @@ -5857,9 +5855,8 @@ static void __build_all_zonelists(void *data)
>  #endif
>  	}
>
> -	write_sequnlock(&zonelist_update_seq);
>  	printk_deferred_exit();
> -	local_irq_restore(flags);
> +	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
>  }
>
>  static noinline void __init
> --
> 2.40.1

-- 
Michal Hocko
SUSE Labs
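
For anyone reading the thread later: the reason the write side needs
irqsave at all (a reader entered from IRQ context while the sequence
count is odd) can be sketched in userspace C. This is only a simplified
model under stated assumptions, not the kernel's seqlock_t. All demo_*
names are hypothetical, and the "interrupt" is just a function call made
in the middle of the write section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sequence counter; not the kernel's seqlock_t. */
struct demo_seq {
	atomic_uint sequence;	/* odd while a write section is open */
};

/* Open a write section: the count becomes odd. */
static void demo_write_begin(struct demo_seq *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/* Close a write section: the count becomes even again. */
static void demo_write_end(struct demo_seq *s)
{
	assert(atomic_load(&s->sequence) & 1u);	/* must be inside a write section */
	atomic_fetch_add(&s->sequence, 1);
}

/* A reader may only proceed while the count is even; otherwise it retries. */
static bool demo_reader_must_retry(struct demo_seq *s)
{
	return (atomic_load(&s->sequence) & 1u) != 0;
}

/*
 * Model a writer on one CPU. If irqs_off is false, a simulated interrupt
 * handler runs a reader in the middle of the write section. Returns true
 * if that reader would have to retry, i.e. spin on a writer that cannot
 * make progress until the handler returns.
 */
static bool demo_writer_hits_livelock(bool irqs_off)
{
	struct demo_seq seq;
	bool stuck = false;

	atomic_init(&seq.sequence, 0);
	demo_write_begin(&seq);
	if (!irqs_off)
		stuck = demo_reader_must_retry(&seq);	/* handler interrupts the writer */
	demo_write_end(&seq);
	return stuck;
}
```

With irqs_off set, the simulated handler never runs inside the write
section, so no reader on the same CPU can observe the odd count. The
model says nothing about the printk ordering discussed above.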