Received: by 2002:a05:6a10:6d10:0:0:0:0 with SMTP id gq16csp970785pxb; Fri, 22 Apr 2022 15:38:10 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwKcCjLg3TfchHl50qAc7nHVsJpZwVuCNVcGqMmLCX9Y1d6lioFvgh8PMcvFuYqJb5g+hoE X-Received: by 2002:a17:903:1208:b0:151:93fd:d868 with SMTP id l8-20020a170903120800b0015193fdd868mr6606005plh.121.1650667089991; Fri, 22 Apr 2022 15:38:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1650667089; cv=none; d=google.com; s=arc-20160816; b=r2TQwE1iCMVmycp5sRhB40DfUFYl9/7Xy94kIiQBuoftnFKAj8nwRU/f3DqV8FETm5 /B6WeAH8niFGjO26DoQi75CktasMZGEr5RdP1/PJrzgVZ7coeMBwtHsGpy8o0EXan3TD 8hnlv00gTtWYX0xQr9LJSa5alkVKig+bBW4Q+0i1AshJBJMxyCuZs8ahuW2sTPj0Epxp 7PGqTF4Vfm5NDr2tDVyMrf1OzLHpL0f70bpcbLLV66DFAmxZXg3i6tgX1jL55y5mYtvX xDnUhSzJ1EOOfq+Zd86hIEZ2dOzEUc9XfU9AfMCOn1NrlqH0SN/7rrIoO9caWvirgAm/ o6VA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature; bh=tZpAn7vwl5NxdkUegDfw2pB7KU9327kDjSkpjq1Mgks=; b=Aw6QCONxRjtfhGH8gKpIjt2Cleb457+h2EP9tSFfzCMmkfAIDtkAWxLyzMH+elnodK PNXIHvpUi9/8l5BpapwerndX60qLZL6DUo/SwVYldWc2Cn1MtoXQe0nwkXnL0kse1I/n frY9E/7b8fWS78eGz7SsyS1/qTbOC9j/+cForK7kgRI6rJEEyAAcdW1ngRQBkO34qpnB F5n322QdT8x4d6VenEnwvyTzolFTwxrNwhcA7w/f9UfiWjYxJqtuCgN7SFuA/87SRwVX vxBTjWna9sXBbY9IZQ7nUlC6nWWz+MpIgV3HIC7rN83QzYVLlmUOJfshyRxPB3CletcE Dr9Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=K5KZu5yP; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id h11-20020a63f90b000000b003aa2d1654c0si10260200pgi.729.2022.04.22.15.38.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Apr 2022 15:38:09 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.com header.s=susede1 header.b=K5KZu5yP; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=suse.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 9766515044B; Fri, 22 Apr 2022 13:25:26 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1449343AbiDVPSg (ORCPT + 99 others); Fri, 22 Apr 2022 11:18:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32852 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1448908AbiDVPSH (ORCPT ); Fri, 22 Apr 2022 11:18:07 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E92D162F5 for ; Fri, 22 Apr 2022 08:15:12 -0700 (PDT) Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 9E9A72129B; Fri, 22 Apr 2022 15:15:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1650640511; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=tZpAn7vwl5NxdkUegDfw2pB7KU9327kDjSkpjq1Mgks=; b=K5KZu5yPBQ+a5fGQKubuB0QYypr+ihr1wVHc0bI38aFeIxA6oCQDQSewFJR3E08ay4pPIj 7CvzmPVXOP8IW3ouqxgMy7cWpkkUxOwgNOBjcqu13crGBVpSMEywbcM2LJmSIscaRNebCi FIKUAZrQUgfooA345OEsENgE5XLuxmA= Received: from suse.cz (unknown [10.163.26.26]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 138942C145; Fri, 22 Apr 2022 15:15:11 +0000 (UTC) Date: Fri, 22 Apr 2022 17:15:10 +0200 From: Petr Mladek To: John Ogness Cc: Sergey Senozhatsky , Steven Rostedt , Thomas Gleixner , linux-kernel@vger.kernel.org, Greg Kroah-Hartman Subject: Re: [PATCH printk v3 14/15] printk: extend console_lock for proper kthread support Message-ID: References: <20220419234637.357112-1-john.ogness@linutronix.de> <20220419234637.357112-15-john.ogness@linutronix.de> <20220421124119.GB11747@pathway.suse.cz> <875yn2h5ku.fsf@jogness.linutronix.de> <87o80tp5lv.fsf@jogness.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87o80tp5lv.fsf@jogness.linutronix.de> X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri 2022-04-22 16:20:52, John Ogness wrote: > On 2022-04-22, Petr Mladek wrote: > > IMHO, it is actually a generic problem of the complex locking scheme > > when there are too many combinations of the protected data. > > Sure. We are in a delicate situation of continuing to support the old > locking scheme while transitioning to a new one. > > > In the current state, the problem seems to be only with CON_ENABLED > > flag but there might be other hidden races in the future. > > > > IMHO, it would be much easier when there are the following rules: > > > > + console_lock() blocks taking con->lock > > + con->lock blocks taking console_lock() > > + Different con->lock might be taken in parallel > > > > The result would be: > > > > + global variables need to be guarded by the big console_lock() > > + con->lock should be enough to guard per-console variables > > + the big console_lock() would serialize also access to > > per-console variables. > > It looks like you are talking about nested locking. This was my original > idea but I had problems relating to kthread stopping. However, the code > has changed a lot since then and now when I look at it, it does not look > like it would be a problem. Getting rid of CON_THD_BLOCKED would greatly > simplify the relationship between console_lock and kthreads. > > For this we would need the console list to become a list_head so that it > is doubly linked (in order to unlock in reverse order). That probably > would be a good idea anyway. It is a bit bizarre that printk implements > its own linked list. Another problem is that the ordering is not stable. The console might come and go. > > Of course, it is not that simple. I am not 100% that we could > > even achieve this. > > It just might be that simple. I will explore it again. > > > Anyway, I think about the following wrapper: > > > > void single_console_lock(struct console *con) > > { > > for (;;) { > > error = wait_event_interruptible(log_wait, > > con->flags & CON_THB_BLOCKED); > > > > if (error) > > continue; > > > > mutex_lock(&con->lock); > > > > if (!con->flags & CON_THB_BLOCKED) > > break; > > > > mutex_unlock(&con->lock); > > } > > } > > > > void single_console_unlock(struct console *con) > > { > > mutex_unlock(&con->lock); > > } > > > > We should use it everywhere instead of the simple mutex_lock(con->lock) > > and mutex_lock(con->lock). And we could remove mutex_lock()/unlock() > > from code called under the big console_lock(). > > Hmmm. Waiting on @log_wait is not correct. A @log_wait wakeup with the > kthread already in the blocked state is unusual. There would need to be > a per-console waitqueue for when the kthread unlocks its mutex. Yeah, it was a simplification. It would be much better to add extra waitqueue for this purpose. > Maybe something like: > > void single_console_lock(struct console *con) > { > for (;;) { > error = wait_event_interruptible(con->lock_wait, > !(con->flags & CON_THB_BLOCKED)); > if (error) > continue; > > mutex_lock(&con->lock); > > if (!(con->flags & CON_THB_BLOCKED)) > break; > > mutex_unlock(&con->lock); > } > } > > And in printk_kthread_func(), after the kthread unlocks its con->lock, > it calls: > > if (wq_has_sleeper(&con->lock_wait)) > wake_up_interruptible_all(&con->lock_wait); You are right. It will need to be done in two situations: + in __console_unlock() when CON_THB_BLOCKED flag is cleared and the big console_lock is released. + in single_console_unlock() because there might be other single_console_lock() waiters. > But single_console_lock() would not be allowed to be called under > console_lock(), so I don't see how it is useful. Yes. The point is that only console_lock(), __console_unlock(), single_console_lock(), single_console_unlock() will be allowed to call mutex_lock()/mutex_unlock() directly. Any other code will need to use these wrappers to get/release the lock. I mean that the manipulation of the mutex and CON_THB_BLOCKED flag will be hidden in these wrappers. We might also want to replace CON_THB_BLOCKED flag with a separate variable (con->locked) to avoid problems with compiler optimizations. Otherwise, we might still need to use WRITE_ONCE()/READ_ONCE() when manipulating con->flags. Maybe, I should prepare a POC to make it more clear and see if it could work. > con->flags is always > modified under @console_sem to make sure the console does not disappear. > > Anyway, I will first look into the nested locking solution. That seems > more promising to me and it would go a long way to simplify the locking > hierarchy. Please, do not spend too much time on this. The solution must be simple in principle. If it gets complicated than it will likely be worse than the current code. Alternative solution would be to reduce the number of variables affected by the race. I mean: + replace CON_THB_BLOCKED flag with con->blocked to avoid the needed of READ_ONCE()/WRITE_ONCE(). + check con->blocked right after taking con->lock in printk_kthread_func() so that all the other accesses are safe. Something like: static int printk_kthread_func(void *data) { [...] for (;;) { error = wait_event_interruptible(log_wait, printer_should_wake(con, seq)); /* LMM(printk_kthread_func:A) */ if (kthread_should_stop() || !printk_kthreads_available) break; if (error) continue; error = mutex_lock_interruptible(&con->lock); if (error) continue; if (con->locked) { mutex_unlock(&con->lock); continue; } /* * Everything below is safe because we know that the console * is not locked by console_lock(); */ if (!console_is_usable(con)) { mutex_unlock(&con->lock); continue; } if ((flags & CON_THD_BLOCKED) || !console_kthread_printing_tryenter()) { mutex_unlock(&con->lock); continue; } [...] { console_lock(); con->kthread = NULL; __console_unlock(); [...] } But it is basically open-coding of the single_console_lock() wrapper. Best Regards, Petr