Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753874Ab3COQl3 (ORCPT ); Fri, 15 Mar 2013 12:41:29 -0400 Received: from mx1.redhat.com ([209.132.183.28]:11975 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751447Ab3COQl2 (ORCPT ); Fri, 15 Mar 2013 12:41:28 -0400 Date: Fri, 15 Mar 2013 17:39:16 +0100 From: Oleg Nesterov To: Andrew Morton Cc: Linus Torvalds , Andi Kleen , Lucas De Marchi , Benjamin Herrenschmidt , Linux Kernel Mailing List , Paul Mackerras , david@gibson.dropbear.id.au, Kees Cook , Serge Hallyn , "Rafael J. Wysocki" , Feng Hong , Lucas De Marchi Subject: Re: [PATCH 1/1] poweroff: change orderly_poweroff() to use schedule_work() Message-ID: <20130315163916.GA31995@redhat.com> References: <20130312182210.GA15862@redhat.com> <20130312191118.GA17439@redhat.com> <20130312203514.GA23488@redhat.com> <20130313174641.GA28083@redhat.com> <20130313174705.GB28083@redhat.com> <20130314152819.7fb1242b493e8bad2d34671b@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20130314152819.7fb1242b493e8bad2d34671b@linux-foundation.org> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3959 Lines: 135 On 03/14, Andrew Morton wrote: > > On Wed, 13 Mar 2013 18:47:05 +0100 Oleg Nesterov wrote: > > > This means that orderly_poweroff() becomes async even if we do not > > run the command and always succeeds, schedule_work() can only fail > > if the work is already pending. We can export __orderly_poweroff() > > and change the non-atomic callers which want the old semantics. > > > > ... > > > > @@ -2218,21 +2237,9 @@ static int __orderly_poweroff(void) > > */ > > int orderly_poweroff(bool force) > > { > > + if (force) /* do not override the pending "true" */ > > + poweroff_force = true; > > + schedule_work(&poweroff_work); > > + return 0; > > } > > afaict the current version of orderly_poweroff() will never return - > either __orderly_poweroff() will block until the machine shuts down or > kernel_power_off() will do so. Note that __orderly_poweroff() uses UMH_WAIT_EXEC, not UMH_WAIT_PROC, so it returns right after /sbin/poweroff starts to execute. So it is already asynchronous unless execve() fails. > However with this patch there is a path via which orderly_poweroff() > can return to its caller, I think? See above, but please also read the changelog. With this patch orderly_poweroff() is always async, even if exec fails, but > If so, the caller might be rather > surprised and we're exercising never-before-used code paths. In fact > if the surprised caller goes oops, the poweroff might not occur at all. This should not happen. Anyway. Please also note that now we can export __orderly_poweroff() and probably change it, it can have another argument "bool use_UMH_WAIT_PROC". int __orderly_poweroff(bool force, bool sync) { int wait = sync ? UMH_WAIT_EXEC : UMH_WAIT_EXEC; ret = call_usermodehelper(argv[0], argv, envp, wait); if (force) { // EXEC failed or /sbin/poweroff didn't do its work if (ret || sync) kernel_power_off(); } } The non-atomic callers can use __orderly_poweroff(sync => true). --------------------------------------------------------------------------- And, Andrew, et all... Could you help with another mentioned problem? It is really trivial, but exactly because it is trivial I do not know what should I do. To remind, say, argv_split(poweroff_cmd) can race with sysctl changing this string, in this case it can write to the memory after argv[] array. We can fix this, or we can rewrite argv_split/free: void argv_free(char **argv) { kfree(argv[-1]); kfree(argv); } char **argv_split(gfp_t gfp, const char *str, int *argcp) { char *argv_str; bool was_space; char **argv, **argv_ret; int argc; argv_str = kstrndup(str, KMALLOC_MAX_SIZE, gfp); if (!argv_str) return NULL; argc = count_argc(argv_str); argv = kmalloc(sizeof(*argv) * (argc + 2), gfp); if (!argv) { kfree(argv_str); return NULL; } *argv = argv_str; argv_ret = ++argv; for (was_space = true; *argv_str; argv_str++) { if (isspace(*argv_str)) { was_space = true; *argv_str = 0; } else if (was_space) { was_space = false; *argv++ = argv_str; } } *argv = NULL; if (argcp) *argcp = argc; return argv_ret; } This way it uses a single kstrndup() to keep the arguments and it is always safe. But, whatever we do with argv_split(), it can hit the string "in between". Personally I think we do not really care, but... Perhaps we should add proc_dostring_lock() which takes some lock and modify the callers of argv_split() (or add argv_split_lock) ? Or perhaps we should introduce the rwsem which should protect every sysctl-string and proc_dostring() should take this lock? Help! I'd prefer to rewrite argv_split(), but I agree with any suggestion in advance. Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/