Message-ID: <649fa1a7-4efd-8cc7-92c7-ac7944adc283@I-love.SAKURA.ne.jp>
Date: Sat, 29 Jul 2023 20:05:43 +0900
Subject: Re: [PATCH v2 1/2] seqlock: Do the lockdep annotation before locking in do_write_seqcount_begin_nested()
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
To: Sebastian Andrzej Siewior, Michal Hocko
Cc: Petr Mladek, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    "Luis Claudio R. Goncalves", Andrew Morton, Boqun Feng, Ingo Molnar,
    John Ogness, Mel Gorman, Peter Zijlstra, Thomas Gleixner, Waiman Long,
    Will Deacon
References: <20230623171232.892937-1-bigeasy@linutronix.de>
 <20230623171232.892937-2-bigeasy@linutronix.de>
 <20230626081254.XmorFrhs@linutronix.de>
 <20230727151029.e_M9bi8N@linutronix.de>

On 2023/07/29 14:31, Tetsuo Handa wrote:
> On 2023/07/28 0:10, Sebastian Andrzej Siewior wrote:
>> On 2023-06-28 21:14:16 [+0900], Tetsuo Handa wrote:
>>>> Anyway, please do not do this change only because of printk().
>>>> IMHO, the current ordering is more logical and the printk() problem
>>>> should be solved another way.
>>>
>>> Then, since [PATCH 1/2] cannot be applied, [PATCH 2/2] is automatically
>>> rejected.
>>
>> My understanding is that this patch gets applied and your objection will
>> be noted.
>
> My preference is that zonelist_update_seq is not checked by !__GFP_DIRECT_RECLAIM
> allocations, which is a low-hanging fruit towards GFP_LOCKLESS mentioned at
> https://lkml.kernel.org/r/ZG3+l4qcCWTPtSMD@dhcp22.suse.cz and
> https://lkml.kernel.org/r/ZJWWpGZMJIADQvRS@dhcp22.suse.cz .
>
> Maybe we can defer checking zonelist_update_seq till the retry check like below,
> for this is really an infrequent event.
>

An updated version with comments added.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7d3460c7a480..92ecf5f2f76b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3638,26 +3638,44 @@ EXPORT_SYMBOL_GPL(fs_reclaim_release);
 
 /*
  * Zonelists may change due to hotplug during allocation. Detect when zonelists
- * have been rebuilt so allocation retries. Reader side does not lock and
- * retries the allocation if zonelist changes. Writer side is protected by the
- * embedded spin_lock.
+ * have been rebuilt so __GFP_DIRECT_RECLAIM allocation retries. Writer side
+ * makes this sequence odd before rebuilding zonelists and makes this sequence
+ * even after rebuilding zonelists. Since writer side disables context switching
+ * when rebuilding zonelists, and !__GFP_DIRECT_RECLAIM allocation will not
+ * retry when zonelists changed, reader side does not need to hold a lock (but
+ * needs to use data_race() annotation), making locking dependency simpler.
  */
-static DEFINE_SEQLOCK(zonelist_update_seq);
+static unsigned int zonelist_update_seq;
 
 static unsigned int zonelist_iter_begin(void)
 {
 	if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
-		return read_seqbegin(&zonelist_update_seq);
+		/* See comment above. */
+		return data_race(READ_ONCE(zonelist_update_seq));
 
 	return 0;
 }
 
-static unsigned int check_retry_zonelist(unsigned int seq)
+static unsigned int check_retry_zonelist(gfp_t gfp, unsigned int seq)
 {
-	if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE))
-		return read_seqretry(&zonelist_update_seq, seq);
+	if (IS_ENABLED(CONFIG_MEMORY_HOTREMOVE) && (gfp & __GFP_DIRECT_RECLAIM)) {
+		/* See comment above. */
+		unsigned int seq2 = data_race(READ_ONCE(zonelist_update_seq));
 
-	return seq;
+		/*
+		 * "seq != seq2" indicates that __build_all_zonelists() has
+		 * started or has finished rebuilding zonelists, hence retry.
+		 * "seq == seq2 && (seq2 & 1)" indicates that
+		 * __build_all_zonelists() is still rebuilding zonelists
+		 * with context switching disabled, hence retry.
+		 * "seq == seq2 && !(seq2 & 1)" indicates that
+		 * __build_all_zonelists() did not rebuild zonelists, hence
+		 * no retry.
+		 */
+		return unlikely(seq != seq2 || (seq2 & 1));
+	}
+
+	return 0;
 }
 
 /* Perform direct synchronous page reclaim */
@@ -4146,7 +4164,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * a unnecessary OOM kill.
 	 */
 	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
-	    check_retry_zonelist(zonelist_iter_cookie))
+	    check_retry_zonelist(gfp_mask, zonelist_iter_cookie))
 		goto restart;
 
 	/* Reclaim has failed us, start killing things */
@@ -4172,7 +4190,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	 * a unnecessary OOM kill.
 	 */
 	if (check_retry_cpuset(cpuset_mems_cookie, ac) ||
-	    check_retry_zonelist(zonelist_iter_cookie))
+	    check_retry_zonelist(gfp_mask, zonelist_iter_cookie))
 		goto restart;
 
 	/*
@@ -5136,22 +5154,17 @@ static void __build_all_zonelists(void *data)
 	int nid;
 	int __maybe_unused cpu;
 	pg_data_t *self = data;
+	static DEFINE_SPINLOCK(lock);
 	unsigned long flags;
 
-	/*
-	 * Explicitly disable this CPU's interrupts before taking seqlock
-	 * to prevent any IRQ handler from calling into the page allocator
-	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
-	 */
-	local_irq_save(flags);
-	/*
-	 * Explicitly disable this CPU's synchronous printk() before taking
-	 * seqlock to prevent any printk() from trying to hold port->lock, for
-	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
-	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
-	 */
-	printk_deferred_enter();
-	write_seqlock(&zonelist_update_seq);
+#ifdef CONFIG_PREEMPT_RT
+	migrate_disable();
+#endif
+	/* Serialize increments of zonelist_update_seq. */
+	spin_lock_irqsave(&lock, flags);
+	zonelist_update_seq++;
+	/* Tell check_retry_zonelist() that we started rebuilding zonelists. */
+	smp_wmb();
 
 #ifdef CONFIG_NUMA
 	memset(node_load, 0, sizeof(node_load));
@@ -5188,9 +5201,13 @@ static void __build_all_zonelists(void *data)
 #endif
 	}
 
-	write_sequnlock(&zonelist_update_seq);
-	printk_deferred_exit();
-	local_irq_restore(flags);
+	/* Tell check_retry_zonelist() that we finished rebuilding zonelists. */
+	smp_wmb();
+	zonelist_update_seq++;
+	spin_unlock_irqrestore(&lock, flags);
+#ifdef CONFIG_PREEMPT_RT
+	migrate_enable();
+#endif
 }
 
 static noinline void __init
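
For illustration only, below is a minimal user-space sketch of the odd/even
retry protocol the patch relies on. Everything in it is made up for the
example (fake_zonelist_seq, begin_iter, need_retry, writer_rebuild), and C11
seq_cst atomics stand in for READ_ONCE()/smp_wmb() and the spinlock; it is a
sketch of the idea, not kernel code.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* even: no rebuild in progress, odd: rebuild in progress */
static atomic_uint fake_zonelist_seq;

/* Reader side: snapshot the sequence before walking the shared data. */
static unsigned int begin_iter(void)
{
	return atomic_load(&fake_zonelist_seq);
}

/* Reader side: retry if a rebuild started, finished, or is still running. */
static bool need_retry(unsigned int seq)
{
	unsigned int seq2 = atomic_load(&fake_zonelist_seq);

	return seq != seq2 || (seq2 & 1);
}

/* Writer side: make the count odd, rebuild, then make it even again. */
static void writer_rebuild(void)
{
	atomic_fetch_add(&fake_zonelist_seq, 1);	/* now odd */
	/* ... rebuild the shared data here ... */
	atomic_fetch_add(&fake_zonelist_seq, 1);	/* even again */
}

int main(void)
{
	unsigned int cookie = begin_iter();

	printf("retry before rebuild: %d\n", need_retry(cookie));	/* prints 0 */
	writer_rebuild();	/* simulate a concurrent memory hotplug event */
	printf("retry after rebuild:  %d\n", need_retry(cookie));	/* prints 1 */
	return 0;
}

A reader that snapshots an even value and later reads the same even value
knows no rebuild ran in between; any other combination (value changed, or
value is odd) forces a retry, which mirrors check_retry_zonelist() above.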