Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp2380151imu; Thu, 17 Jan 2019 13:09:27 -0800 (PST) X-Google-Smtp-Source: ALg8bN42MgRnOhbquvBd0PlL3iqb8N1efHndfVRJJI42w8KjOOKvawlqrk4kqGkdQYz4ahLmbyeg X-Received: by 2002:a63:f811:: with SMTP id n17mr15293622pgh.23.1547759367560; Thu, 17 Jan 2019 13:09:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1547759367; cv=none; d=google.com; s=arc-20160816; b=El+AsGcQriLDwvL3toSatamUu01/KJHMQIUAagwGg5CLBNVBtNr4FtisEJ9ZjwXBqU H8eE2qpajwaSEq4UqPNQhjPy44GQHr5vFXB/O016MwBhoAs1kDtHRkTXjDxZOmAlIgjE SSxZZnmeJ2EzTCU9nYGcr55rWHhds5vOa99xqcOCDnFIH534GVeyhZD9hNXRGUL6EyyB MOB5eHc6p6TUR6dd0Hp1AJfcMsaiMQf5NHK6q1TX661YZL2zxXKG2gqtm5sf6GADn64J jZw+0iKqFylm7ZLWLZVRMG0evuqZeHxCqK3i3WU0I1LK+W4E38ETax6wI6769Si/T1OJ 6GEA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:content-transfer-encoding :content-language:accept-language:in-reply-to:references:message-id :date:thread-index:thread-topic:subject:cc:to:from; bh=0UL/KVkMI8JbmYTW7f2A9myPoIT88igGSROxgQuw3Rw=; b=iRKpR34MlhOpJQAfz9VibPMrdILAkHj6QInSR7jzbKn/nBJB7LBJP7WmSnZmAV5AaJ eduYaHdtPbwSAbqIKEpswv/BM78evyhnMNCH3KK56eE34oy8JnI+lDdml/c03ibaIqEo DpmcJ7pUK9z+aYYn9tdODkwua5su3Uzot6tzb6j67mCKwhUiK69b8KJuoBAWYdAnYF4k na/k+FWXznz/yNtWMa8DoTs/hZuggjs7SWiSlei5nmo5XJ3+NDwVtDnYEPad6DS2TDb1 OMxGKbyk1wi8qMwy95SA5ZUFZJiAB2l9oLwAoZG26iz+6SGp+90gj+WNHOLBovwidA+/ FNdA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b34si2607489pld.305.2019.01.17.13.09.08; Thu, 17 Jan 2019 13:09:27 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728261AbfAQUwQ convert rfc822-to-8bit (ORCPT + 99 others); Thu, 17 Jan 2019 15:52:16 -0500 Received: from mx08-00178001.pphosted.com ([91.207.212.93]:49371 "EHLO mx07-00178001.pphosted.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727337AbfAQUwP (ORCPT ); Thu, 17 Jan 2019 15:52:15 -0500 Received: from pps.filterd (m0046661.ppops.net [127.0.0.1]) by mx08-00178001.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x0HKonl0013946; Thu, 17 Jan 2019 21:52:09 +0100 Received: from beta.dmz-eu.st.com (beta.dmz-eu.st.com [164.129.1.35]) by mx08-00178001.pphosted.com with ESMTP id 2pyby1knrm-1 (version=TLSv1 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NOT); Thu, 17 Jan 2019 21:52:09 +0100 Received: from zeta.dmz-eu.st.com (zeta.dmz-eu.st.com [164.129.230.9]) by beta.dmz-eu.st.com (STMicroelectronics) with ESMTP id 2CAA231; Thu, 17 Jan 2019 20:52:08 +0000 (GMT) Received: from Webmail-eu.st.com (sfhdag3node1.st.com [10.75.127.7]) by zeta.dmz-eu.st.com (STMicroelectronics) with ESMTP id 0496B2857; Thu, 17 Jan 2019 20:52:08 +0000 (GMT) Received: from SFHDAG7NODE2.st.com (10.75.127.20) by SFHDAG3NODE1.st.com (10.75.127.7) with Microsoft SMTP Server (TLS) id 15.0.1347.2; Thu, 17 Jan 2019 21:52:07 +0100 Received: from SFHDAG7NODE2.st.com ([fe80::d548:6a8f:2ca4:2090]) by SFHDAG7NODE2.st.com ([fe80::d548:6a8f:2ca4:2090%20]) with mapi id 15.00.1347.000; Thu, 17 Jan 2019 21:52:07 +0100 From: Loic PALLARDY To: Bjorn Andersson CC: xiang xiao , "ohad@wizery.com" , "linux-remoteproc@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Arnaud POULIQUEN , "benjamin.gaignard@linaro.org" , "s-anna@ti.com" Subject: RE: [PATCH 1/1] remoteproc: fix recovery procedure Thread-Topic: [PATCH 1/1] remoteproc: fix recovery procedure Thread-Index: AQHUqAn/EJZV5bhF3ki4eEklyP85XqWr6HGAgAJ6h6CABVkmgIAANTLA Date: Thu, 17 Jan 2019 20:52:07 +0000 Message-ID: References: <1547031403-34535-1-git-send-email-loic.pallardy@st.com> <20190117180007.GX9278@minitux> In-Reply-To: <20190117180007.GX9278@minitux> Accept-Language: fr-FR, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-ms-exchange-transport-fromentityheader: Hosted x-originating-ip: [10.75.127.44] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:,, definitions=2019-01-17_07:,, signatures=0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Bjorn, > -----Original Message----- > From: Bjorn Andersson > Sent: jeudi 17 janvier 2019 19:00 > To: Loic PALLARDY > Cc: xiang xiao ; ohad@wizery.com; linux- > remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud > POULIQUEN ; benjamin.gaignard@linaro.org; s- > anna@ti.com > Subject: Re: [PATCH 1/1] remoteproc: fix recovery procedure > > On Mon 14 Jan 12:23 PST 2019, Loic PALLARDY wrote: > > > Hi Xiang, > > > > > -----Original Message----- > > > From: xiang xiao > > > Sent: samedi 12 janvier 2019 19:29 > > > To: Loic PALLARDY > > > Cc: bjorn.andersson@linaro.org; ohad@wizery.com; linux- > > > remoteproc@vger.kernel.org; linux-kernel@vger.kernel.org; Arnaud > > > POULIQUEN ; > benjamin.gaignard@linaro.org; s- > > > anna@ti.com > > > Subject: Re: [PATCH 1/1] remoteproc: fix recovery procedure > > > > > Thanks Loic for picking this up again. > > > > Hi Loic, > > > The change just hide the problem, I think. The big issue is: > > > 1.virtio devices aren't destroyed by rpproc_stop > > Virtio devices are destroyed by rproc_stop() as vdev is registered as rproc > sub device. > > rproc_stop() is calling rproc_stop_subdevices() which is in charge of > removing virtio device and associated children. > > rproc_vdev_do_stop() --> rproc_remove_virtio_dev() --> > unregister_virtio_device() > > > > Xiang is right, unregister_virtio_device() ends up decrementing the > refcount of device and might free it, but it's not guaranteed. But it that case calling rproc_shutdown() doesn't guarantee devices are free, it is the same. The only difference will be that rproc_vdev will be released by rproc and then reallocated. So virtio device allocation is restarting with a virgin memory buffer. But you will have some ghost devices and restart may failed too. I post a fix [1] last summer to be sure virtio device won't be released while still registered or registering... So there is still potential issue. > > So I think we need to decouple the rproc_vdev and virtio_device, to > allow the latter to potentially outlive the prior. > I checked how to decouple at least the allocation because one issue here. The main issue is that all references are done based on container_of(). I look for a fix having the less impacts on the current code, but still possible to create cross pointer references between rproc_vdev and virtio device. It will clean up the memory allocation procedure, but the problem is still there if sub virtio devices not well release. We need to not be able to restart remote processor if at least one sub device was not correctly release... > > Please find below trace of a recovery on my ST SOC. My 2 rpmsg tty are > removed and re-inserted correctly > > root@stm32mp1:~# ls /dev/ttyRPMSG* > > /dev/ttyRPMSG0 /dev/ttyRPMSG1 > > root@stm32mp1:~# [ 154.832523] remoteproc remoteproc0: crash > detected in m4: type watchdog > > [ 154.837725] remoteproc remoteproc0: handling crash #2 in m4 > > [ 154.843319] remoteproc remoteproc0: recovering m4 > > [ 154.849185] rpmsg_tty virtio0.rpmsg-tty-channel.-1.0: rpmsg tty device 0 > is removed > > [ 154.857572] rpmsg_tty virtio0.rpmsg-tty-channel.-1.1: rpmsg tty device 1 > is removed > > [ 155.382327] remoteproc remoteproc0: warning: remote FW shutdown > without ack > > [ 155.387857] remoteproc remoteproc0: stopped remote processor m4 > > [ 155.398988] m4@0#vdev0buffer: assigned reserved memory node > vdev0buffer@10044000 > > [ 155.405910] virtio_rpmsg_bus virtio0: creating channel rpmsg-tty-channel > addr 0x0 > > [ 155.413422] rpmsg_tty virtio0.rpmsg-tty-channel.-1.0: new channel: > 0x400 -> 0x0 : ttyRPMSG0 > > [ 155.421038] virtio_rpmsg_bus virtio0: creating channel rpmsg-tty-channel > addr 0x1 > > [ 155.429088] rpmsg_tty virtio0.rpmsg-tty-channel.-1.1: new channel: > 0x401 -> 0x1 : ttyRPMSG1 > > [ 155.437338] virtio_rpmsg_bus virtio0: rpmsg host is online > > [ 155.442401] m4@0#vdev0buffer: registered virtio0 (type 7) > > [ 155.461154] remoteproc remoteproc0: remote processor m4 is now up > > ls /dev/ttyRPMSG* > > /dev/ttyRPMSG0 /dev/ttyRPMSG1 > > root@stm32mp1:~# > > > > As vdev is including in a larger struct allocated by rproc, it is safe > > to set it to 0 before initializing virtio device while rproc subdevice > > sequence is respected. > > > > It's likely that this works in most use cases, but if for some reason > there's additional references held those will operate on the object past > your clearing of it. In fact, as the memory is free/kzalloc, virtio device fields are not all at 0 as during boot sequence. As mentioned below issue is coming from kobject state_initialized field which is not in a correct state. This field is only set by kobject_init(). I think normal way of working is to release memory when a device is no more used. But another solution could be to reset it in kobject_cleanup() or kobject_del() in order to have a symmetrical procedure. Regards, Loic [1] https://patchwork.kernel.org/patch/10544757/ > > Regards, > Bjorn > > > > 2.and then rpmsg child devices aren't destroyed too > > > Then, when the remote start and create rpmsg channel again, the > > > duplicated channel will appear in kernel. > > > To fix this problem, we need go through rpproc_shutdown/rproc_boot to > > > destroy all devices(virtio and rpmsg) and create them again. > > Rproc_shutdown/rproc_boot is solving the issue too, except if > rproc_boot() was called several times and so rproc->power atomic not equal > to 1. > > Using only rproc_stop() and rproc_start() allows to preserve rproc->power > and so to be silent from rproc user pov. > > > > Regards, > > Loic > > > > > > Thanks > > > Xiang > > > > > > On Wed, Jan 9, 2019 at 6:56 PM Loic Pallardy > wrote: > > > > > > > > Commit 7e83cab824a87e83cab824a8 ("remoteproc: Modify recovery > path > > > > to use rproc_{start,stop}()") replaces rproc_{shutdown,boot}() with > > > > rproc_{stop,start}(), which skips destroy the virtio device at stop > > > > but re-initializes it again at start. > > > > > > > > Issue is that struct virtio_dev is not correctly reinitialized like done > > > > at initial allocation thanks to kzalloc() and kobject is considered as > > > > already initialized by kernel. That is due to the fact struct virtio_dev > > > > is allocated and released at vdev resource handling level managed and > > > > virtio device is registered and unregistered at rproc subdevices level. > > > > > > > > This patch initializes struct virtio_dev to 0 before using it and > > > > registering it. > > > > > > > > Fixes: 7e83cab824a8 ("remoteproc: Modify recovery path to use > > > rproc_{start,stop}()") > > > > > > > > Reported-by: Xiang Xiao > > > > Signed-off-by: Loic Pallardy > > > > --- > > > > drivers/remoteproc/remoteproc_virtio.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > diff --git a/drivers/remoteproc/remoteproc_virtio.c > > > b/drivers/remoteproc/remoteproc_virtio.c > > > > index 183fc42a510a..88eade99395c 100644 > > > > --- a/drivers/remoteproc/remoteproc_virtio.c > > > > +++ b/drivers/remoteproc/remoteproc_virtio.c > > > > @@ -332,6 +332,8 @@ int rproc_add_virtio_dev(struct rproc_vdev > *rvdev, > > > int id) > > > > struct virtio_device *vdev = &rvdev->vdev; > > > > int ret; > > > > > > > > + /* Reset vdev struct as you don't know how it has been previously > > > used */ > > > > + memset(vdev, 0, sizeof(struct virtio_device)); > > > > vdev->id.device = id, > > > > vdev->config = &rproc_virtio_config_ops, > > > > vdev->dev.parent = dev; > > > > -- > > > > 2.7.4 > > > >