Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp2414286imu; Thu, 17 Jan 2019 13:49:39 -0800 (PST) X-Google-Smtp-Source: ALg8bN4Dw0eTFXkXgbxZmskFyv7HZzOarl5OqBmTJxmB+j2hg3WKuJs9Qvke2g6d3Jy+SJHEuRfO X-Received: by 2002:a63:8c6:: with SMTP id 189mr15136329pgi.322.1547761779534; Thu, 17 Jan 2019 13:49:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547761779; cv=none; d=google.com; s=arc-20160816; b=UkXjjn0mxmU1M1vkBjgoxzl67VXrbiEkfhbMeY3sWXSJhGPiVmKVNrMRhQuRmYYi6C iNwmYp1BwODWVIvnvoMCBhQ+SeY8uG8V7d2AZ9jMROjW1iSiqVIluyl8l5tcLYLIpdvg eCFN5Ne6kpowiyZbvgT7t2FCBWQ36KRim07VZ8vhXK0tlBpr88gsTy2z4CSy4CKdLxQJ dWt0AgE0F/nWveKiinunqR/iTtMwkDLLpEPCIhLz1Br4VMkwgsMCJwtdXjWX9UgNJk8K 6VPoQwx7koWgDhvwZPvnRLUbYg9+ccLwQ/guPf4bX+GBsTgBWcix3EZ3PdO+D6AVO1EV ekOQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :content-language:accept-language:in-reply-to:references:message-id :date:thread-index:thread-topic:subject:cc:to:from; bh=YwRMRMiFPSi5K92JaZKC/qHxFd/oxMLfOMAEGQI0pcU=; b=P92HHB3Hzr3y3a0H+eS6con6q7cvRHV64yBfnnZMfNDerkisr6nb7ucx2JFd62+gFV 0eYCUNbCKl5EAgqyuiUNZBWjZQ5HF3uo7tKcIMUysaPTIbwIf+ekl+PeEyWjEZ4R1c0m 4XM1UlLXKY18dA9i7IoS0buTr1iBubDo3bXnlqRGiK0A9S4+K8LNQRI2S+aDVlmkiZcB /XiU0nEO5lMhOX3eJh7sQRqMZQQguGr9jS3LtDV7jT6z1tOVjZHx6STqR+ckcrfDR8iv Hs/C18AYQoSpW1dVE3umziY9DaNBPZJYxspAm429QQB9/0l0f00Jwc60y6cWIV27NIr2 bAqw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d37si2750533plb.140.2019.01.17.13.49.24; Thu, 17 Jan 2019 13:49:39 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728035AbfAQVoT convert rfc822-to-8bit (ORCPT + 99 others); Thu, 17 Jan 2019 16:44:19 -0500 Received: from mx08-00178001.pphosted.com ([91.207.212.93]:43912 "EHLO mx07-00178001.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726900AbfAQVoS (ORCPT ); Thu, 17 Jan 2019 16:44:18 -0500 Received: from pps.filterd (m0046660.ppops.net [127.0.0.1]) by mx08-00178001.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x0HLf6jw019920; Thu, 17 Jan 2019 22:44:13 +0100 Received: from beta.dmz-eu.st.com (beta.dmz-eu.st.com [164.129.1.35]) by mx08-00178001.pphosted.com with ESMTP id 2pybxyktfs-1 (version=TLSv1 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 17 Jan 2019 22:44:13 +0100 Received: from zeta.dmz-eu.st.com (zeta.dmz-eu.st.com [164.129.230.9]) by beta.dmz-eu.st.com (STMicroelectronics) with ESMTP id 72C3131; Thu, 17 Jan 2019 21:44:12 +0000 (GMT) Received: from Webmail-eu.st.com (sfhdag3node2.st.com [10.75.127.8]) by zeta.dmz-eu.st.com (STMicroelectronics) with ESMTP id 4B4A629C8; Thu, 17 Jan 2019 21:44:12 +0000 (GMT) Received: from SFHDAG7NODE2.st.com (10.75.127.20) by SFHDAG3NODE2.st.com (10.75.127.8) with Microsoft SMTP Server (TLS) id 15.0.1347.2; Thu, 17 Jan 2019 22:44:11 +0100 Received: from SFHDAG7NODE2.st.com ([fe80::d548:6a8f:2ca4:2090]) by SFHDAG7NODE2.st.com ([fe80::d548:6a8f:2ca4:2090%20]) with mapi id 15.00.1347.000; Thu, 17 Jan 2019 22:44:11 +0100 From: Loic PALLARDY To: Loic PALLARDY , Bjorn Andersson CC: xiang xiao , "ohad@wizery.com" , "linux-remoteproc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Arnaud POULIQUEN , "benjamin.gaignard@linaro.org" , "s-anna@ti.com" Subject: RE: [PATCH 1/1] remoteproc: fix recovery procedure Thread-Topic: [PATCH 1/1] remoteproc: fix recovery procedure Thread-Index: AQHUqAn/EJZV5bhF3ki4eEklyP85XqWr6HGAgAJ6h6CABVkmgIAANTLAgAAYA2A= Date: Thu, 17 Jan 2019 21:44:11 +0000 Message-ID: References: <1547031403-34535-1-git-send-email-loic.pallardy@st.com> <20190117180007.GX9278@minitux> In-Reply-To: Accept-Language: fr-FR, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.75.127.44] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-01-17_08:,, signatures=0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > -----Original Message----- > From: linux-remoteproc-owner@vger.kernel.org owner@vger.kernel.org> On Behalf Of Loic PALLARDY > Sent: jeudi 17 janvier 2019 21:52 > To: Bjorn Andersson > Cc: xiang xiao ; ohad@wizery.com; linux- > remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud > POULIQUEN ; benjamin.gaignard@linaro.org; s- > anna@ti.com > Subject: RE: [PATCH 1/1] remoteproc: fix recovery procedure > > Hi Bjorn, > > > -----Original Message----- > > From: Bjorn Andersson > > Sent: jeudi 17 janvier 2019 19:00 > > To: Loic PALLARDY > > Cc: xiang xiao ; ohad@wizery.com; linux- > > remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud > > POULIQUEN ; benjamin.gaignard@linaro.org; > s- > > anna@ti.com > > Subject: Re: [PATCH 1/1] remoteproc: fix recovery procedure > > > > On Mon 14 Jan 12:23 PST 2019, Loic PALLARDY wrote: > > > > > Hi Xiang, > > > > > > > -----Original Message----- > > > > From: xiang xiao > > > > Sent: samedi 12 janvier 2019 19:29 > > > > To: Loic PALLARDY > > > > Cc: bjorn.andersson@linaro.org; ohad@wizery.com; linux- > > > > remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud > > > > POULIQUEN ; > > benjamin.gaignard@linaro.org; s- > > > > anna@ti.com > > > > Subject: Re: [PATCH 1/1] remoteproc: fix recovery procedure > > > > > > > > Thanks Loic for picking this up again. > > > > > > Hi Loic, > > > > The change just hide the problem, I think. The big issue is: > > > > 1.virtio devices aren't destroyed by rpproc_stop > > > Virtio devices are destroyed by rproc_stop() as vdev is registered as rproc > > sub device. > > > rproc_stop() is calling rproc_stop_subdevices() which is in charge of > > removing virtio device and associated children. > > > rproc_vdev_do_stop() --> rproc_remove_virtio_dev() --> > > unregister_virtio_device() > > > > > > > Xiang is right, unregister_virtio_device() ends up decrementing the > > refcount of device and might free it, but it's not guaranteed. > > But it that case calling rproc_shutdown() doesn't guarantee devices are free, > it is the same. > The only difference will be that rproc_vdev will be released by rproc and > then reallocated. So virtio device allocation is restarting with a virgin memory > buffer. But you will have some ghost devices and restart may failed too. > I post a fix [1] last summer to be sure virtio device won't be released while > still registered or registering... So there is still potential issue. > > > > > So I think we need to decouple the rproc_vdev and virtio_device, to > > allow the latter to potentially outlive the prior. > > > I checked how to decouple at least the allocation because one issue here. > The main issue is that all references are done based on container_of(). > I look for a fix having the less impacts on the current code, but still possible to > create cross pointer references between rproc_vdev and virtio device. > It will clean up the memory allocation procedure, but the problem is still > there if sub virtio devices not well release. > We need to not be able to restart remote processor if at least one sub device > was not correctly release... > > > > Please find below trace of a recovery on my ST SOC. My 2 rpmsg tty are > > removed and re-inserted correctly > > > root@stm32mp1:~# ls /dev/ttyRPMSG* > > > /dev/ttyRPMSG0 /dev/ttyRPMSG1 > > > root@stm32mp1:~# [ 154.832523] remoteproc remoteproc0: crash > > detected in m4: type watchdog > > > [ 154.837725] remoteproc remoteproc0: handling crash #2 in m4 > > > [ 154.843319] remoteproc remoteproc0: recovering m4 > > > [ 154.849185] rpmsg_tty virtio0.rpmsg-tty-channel.-1.0: rpmsg tty device > 0 > > is removed > > > [ 154.857572] rpmsg_tty virtio0.rpmsg-tty-channel.-1.1: rpmsg tty device > 1 > > is removed > > > [ 155.382327] remoteproc remoteproc0: warning: remote FW shutdown > > without ack > > > [ 155.387857] remoteproc remoteproc0: stopped remote processor m4 > > > [ 155.398988] m4@0#vdev0buffer: assigned reserved memory node > > vdev0buffer@10044000 > > > [ 155.405910] virtio_rpmsg_bus virtio0: creating channel rpmsg-tty- > channel > > addr 0x0 > > > [ 155.413422] rpmsg_tty virtio0.rpmsg-tty-channel.-1.0: new channel: > > 0x400 -> 0x0 : ttyRPMSG0 > > > [ 155.421038] virtio_rpmsg_bus virtio0: creating channel rpmsg-tty- > channel > > addr 0x1 > > > [ 155.429088] rpmsg_tty virtio0.rpmsg-tty-channel.-1.1: new channel: > > 0x401 -> 0x1 : ttyRPMSG1 > > > [ 155.437338] virtio_rpmsg_bus virtio0: rpmsg host is online > > > [ 155.442401] m4@0#vdev0buffer: registered virtio0 (type 7) > > > [ 155.461154] remoteproc remoteproc0: remote processor m4 is now up > > > ls /dev/ttyRPMSG* > > > /dev/ttyRPMSG0 /dev/ttyRPMSG1 > > > root@stm32mp1:~# > > > > > > As vdev is including in a larger struct allocated by rproc, it is safe > > > to set it to 0 before initializing virtio device while rproc subdevice > > > sequence is respected. > > > > > > > It's likely that this works in most use cases, but if for some reason > > there's additional references held those will operate on the object past > > your clearing of it. > > In fact, as the memory is free/kzalloc, virtio device fields are not all at 0 as > during boot sequence. > As mentioned below issue is coming from kobject state_initialized field which > is not in a correct state. > This field is only set by kobject_init(). > I think normal way of working is to release memory when a device is no more > used. > But another solution could be to reset it in kobject_cleanup() or > kobject_del() in order to have a symmetrical procedure. Reading some literature, it is a bad idea. Having a look to device_initialize () function description, it is clearly mention device struct must be 0 (except fields provided by user) before. (Same in kobject documentation) Extract drivers/base/core.c [1] * All fields in @dev must be initialized by the caller to 0, except * for those explicitly set to some other value. The simplest * approach is to use kzalloc() to allocate the structure containing * @dev. So memset or kfree/kzalloc of virtio_device manadatory. Regards, Loic [1]: https://elixir.bootlin.com/linux/latest/source/drivers/base/core.c#L1482 > > Regards, > Loic > [1] https://patchwork.kernel.org/patch/10544757/ > > > > > Regards, > > Bjorn > > > > > > 2.and then rpmsg child devices aren't destroyed too > > > > Then, when the remote start and create rpmsg channel again, the > > > > duplicated channel will appear in kernel. > > > > To fix this problem, we need go through rpproc_shutdown/rproc_boot > to > > > > destroy all devices(virtio and rpmsg) and create them again. > > > Rproc_shutdown/rproc_boot is solving the issue too, except if > > rproc_boot() was called several times and so rproc->power atomic not > equal > > to 1. > > > Using only rproc_stop() and rproc_start() allows to preserve rproc- > >power > > and so to be silent from rproc user pov. > > > > > > Regards, > > > Loic > > > > > > > > Thanks > > > > Xiang > > > > > > > > On Wed, Jan 9, 2019 at 6:56 PM Loic Pallardy > > wrote: > > > > > > > > > > Commit 7e83cab824a87e83cab824a8 ("remoteproc: Modify recovery > > path > > > > > to use rproc_{start,stop}()") replaces rproc_{shutdown,boot}() with > > > > > rproc_{stop,start}(), which skips destroy the virtio device at stop > > > > > but re-initializes it again at start. > > > > > > > > > > Issue is that struct virtio_dev is not correctly reinitialized like done > > > > > at initial allocation thanks to kzalloc() and kobject is considered as > > > > > already initialized by kernel. That is due to the fact struct virtio_dev > > > > > is allocated and released at vdev resource handling level managed > and > > > > > virtio device is registered and unregistered at rproc subdevices level. > > > > > > > > > > This patch initializes struct virtio_dev to 0 before using it and > > > > > registering it. > > > > > > > > > > Fixes: 7e83cab824a8 ("remoteproc: Modify recovery path to use > > > > rproc_{start,stop}()") > > > > > > > > > > Reported-by: Xiang Xiao > > > > > Signed-off-by: Loic Pallardy > > > > > --- > > > > > drivers/remoteproc/remoteproc_virtio.c | 2 ++ > > > > > 1 file changed, 2 insertions(+) > > > > > > > > > > diff --git a/drivers/remoteproc/remoteproc_virtio.c > > > > b/drivers/remoteproc/remoteproc_virtio.c > > > > > index 183fc42a510a..88eade99395c 100644 > > > > > --- a/drivers/remoteproc/remoteproc_virtio.c > > > > > +++ b/drivers/remoteproc/remoteproc_virtio.c > > > > > @@ -332,6 +332,8 @@ int rproc_add_virtio_dev(struct rproc_vdev > > *rvdev, > > > > int id) > > > > > struct virtio_device *vdev = &rvdev->vdev; > > > > > int ret; > > > > > > > > > > + /* Reset vdev struct as you don't know how it has been > previously > > > > used */ > > > > > + memset(vdev, 0, sizeof(struct virtio_device)); > > > > > vdev->id.device = id, > > > > > vdev->config = &rproc_virtio_config_ops, > > > > > vdev->dev.parent = dev; > > > > > -- > > > > > 2.7.4 > > > > >