References: <20230929021213.2364883-1-joel@joelfernandes.org> <87bkdl55qm.fsf@email.froward.int.ebiederm.org>
In-Reply-To: <87bkdl55qm.fsf@email.froward.int.ebiederm.org>
From: Joel Fernandes
Date: Mon, 2 Oct 2023 14:18:58 -0400
Subject: Re: [PATCH] kexec: Fix reboot race during device_shutdown()
To: "Eric W. Biederman"
Cc: linux-kernel@vger.kernel.org, Steven Rostedt, Ricardo Ribalda,
    Ross Zwisler, Rob Clark, Linus Torvalds, kexec@lists.infradead.org

Hi Eric,

On Fri, Sep 29, 2023 at 12:01 PM Eric W. Biederman wrote:
>
> "Joel Fernandes (Google)" writes:
>
> > During kexec reboot, it is possible for a race to occur between
> > device_shutdown() and userspace. This causes accesses to GPU after pm_runtime
> > suspend has already happened. Fix this by calling freeze_processes() before
> > device_shutdown().
>
> Is there any reason why this same race with between sys_kexec and the
> adreno_ioctl can not happen during a normal reboot?

Thanks for the response. It can happen during a normal reboot. I think
the reason it does not show up in the wild is that the "reboot" command
implementation typically sends SIGSTOP or SIGKILL to all processes,
which effectively prevents the race. In any case, there is also a
school of thought that says the kernel should be resilient to crashes,
and that a userspace workaround involving sending signals could be seen
as papering over the real issue. I sympathize and agree with that
school of thought as well.

> Is there any reason why there is not a .shutdown method to prevent the
> race?
> I would think the thing to do is to prevent this race in
> kernel_restart_prepare or in the GPUs .shutdown method. As I don't see
> anything that would prevent this during a normal reboot.

What you're describing is essentially what I remember trying. The issue
is not in the GPU driver. Rather, the interconnect in the SoC is shut
down, which causes an "SError" exception if the CPU then tries to
access the affected memory locations, as also seen in the stack trace.
I was not able to trace exactly when the interconnect becomes
unavailable. There may be a more intricate fix where we signal the GPU
driver to stop accessing the bus, but my suspicion is that it would add
a lot of complexity and perhaps leave the door open to similar issues.

> > Such freezing is already being done if kernel supports KEXEC_JUMP and
> > kexec_image->preserve_context is true. However, doing it if either of these are
> > not true prevents crashes/races.
>
> The KEXEC_JUMP case is something else entirely. It is supposed to work
> like suspend to RAM. Maybe reboot should as well, but I am
> uncomfortable making a generic device fix kexec specific.

I see your point of view. I think regular reboot should also be fixed
to avoid similar crash possibilities. I am happy to make a change for
that, similar to this patch, if we want to proceed that way.

Thoughts?

thanks,

 - Joel
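
The ordering argument in this thread (freeze userspace first, then shut
devices down) can be illustrated with a small toy model. This is only a
sketch in Python, not kernel code: the Device and Task classes and the
simulate_reboot() function are hypothetical names invented for the
illustration, and the RuntimeError merely stands in for the SError
described above. It assumes the worst case, in which every task gets to
run after the device has been shut down.

```python
class Device:
    """Toy stand-in for a device behind an SoC interconnect (hypothetical)."""

    def __init__(self):
        self.powered = True

    def shutdown(self):
        # Stands in for device_shutdown(): the bus is unavailable afterwards.
        self.powered = False

    def access(self):
        if not self.powered:
            # Stands in for the SError raised on access to the dead bus.
            raise RuntimeError("access after shutdown")
        return "ok"


class Task:
    """Toy stand-in for a userspace task issuing ioctls (hypothetical)."""

    def __init__(self, device):
        self.device = device
        self.frozen = False

    def ioctl(self):
        if self.frozen:
            return None  # a frozen task never reaches the driver
        return self.device.access()


def simulate_reboot(num_tasks, freeze_first):
    """Return how many tasks fault when they run after device shutdown."""
    device = Device()
    tasks = [Task(device) for _ in range(num_tasks)]
    if freeze_first:
        for task in tasks:
            task.frozen = True  # stands in for freeze_processes()
    device.shutdown()
    faults = 0
    for task in tasks:  # assumed worst case: every task runs after shutdown
        try:
            task.ioctl()
        except RuntimeError:
            faults += 1
    return faults


print(simulate_reboot(3, freeze_first=False))  # prints 3
print(simulate_reboot(3, freeze_first=True))   # prints 0
```

In the model, freezing before shutdown removes the faults without the
device having to know anything about its callers, which is the property
the patch relies on. The model says nothing about where in the reboot
path the freeze should live, which is the open question in the thread.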