Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753732AbdF1B54 (ORCPT ); Tue, 27 Jun 2017 21:57:56 -0400 Received: from sci-ig2.spreadtrum.com ([222.66.158.135]:29750 "EHLO SHSQR01.spreadtrum.com" rhost-flags-OK-FAIL-OK-OK) by vger.kernel.org with ESMTP id S1753558AbdF1B5t (ORCPT ); Tue, 27 Jun 2017 21:57:49 -0400 Date: Wed, 28 Jun 2017 09:54:32 +0800 From: Orson Zhai To: Greg Kroah-Hartman CC: Zhongping Tan =?utf-8?B?KOiwreS4reW5syk=?= , Arnd Bergmann , Linux Kernel Mailing List Subject: Re: [RFC PATCH] char: misc: Init misc->list in a safe way Message-ID: <20170628015431.GA32195@spreadtrum.com> Mail-Followup-To: Greg Kroah-Hartman , Zhongping Tan =?utf-8?B?KOiwreS4reW5syk=?= , Arnd Bergmann , Linux Kernel Mailing List References: <20170626093123.23838-1-orson.zhai@spreadtrum.com> <8f919e212d5149e286422d58a5732356@SHMBX02.spreadtrum.com> <20170627062917.GA10376@kroah.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20170627062917.GA10376@kroah.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-MAIL: SHSQR01.spreadtrum.com v5S1vRvA065297 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3645 Lines: 116 Hi Greg & Arnd, On Tue, Jun 27, 2017 at 08:29:17AM +0200, Greg Kroah-Hartman wrote: > > A: Because it messes up the order in which people normally read text. > Q: Why is top-posting such a bad thing? > A: Top-posting. > Q: What is the most annoying thing in e-mail? > > A: No. > Q: Should I include quotations after my reply? > > http://daringfireball.net/2007/07/on_top We are so sorry for any troubles to you. I will take the role of Zhongping to continue discussion here. > > On Tue, Jun 27, 2017 at 02:02:13AM +0000, Zhongping Tan (谭中平) wrote: > > Ok, firstly we need to discuss the list usage, for list head we need > > do initialization, but for list node we don't need initialization at > > all. > > I don't understand, why is your misc driver touching that field at all? > Do I need to go and make it "private" somehow? There maybe some mis-understanding caused by not very well english expression. I'll clarify what had happended to us. > > > And for misc_list head, we use LIST_HEAD to define and initialize > > it. So I don't know why we put INIT_LIST_HEAD(&misc->list) in function > > misc_register, any bugs when without it? > > Again, what is wrong with the code today? What driver is this causing > problems for? It maybe a little bit long story. In recent, our test team report a crash of non-stop reboot test on a mobile phone after 46 hours. After some investigation, we got the information as following, &misc_list = 0xFFFFFFFF822AC780 -> ( next_=_0xFFFFFFFFA0087158 -> ( next = 0xFFFFFFFFA0087158 -> ( next = 0xFFFFFFFFA0087158, prev = 0xFFFFFFFFA0087158 -> ( next = 0xFFFFFFFFA0087158, prev = 0xFFFFFFFFA0087158)), prev = 0xFFFFFFFFA0087158 -> ( next = 0xFFFFFFFFA0087158, prev = 0xFFFFFFFFA0087158)), prev = 0xFFFF88007BF55618) it seems that misc_list fall into a loop which cause surfaceflinger (a service from Android) dead. And we got futher more information after that, (struct miscdevice*)0xFFFFFFFFA0087140 = 0xFFFFFFFFA0087140 -> ( minor = 20, name = 0xFFFFFFFFA008606D -> "fm", fops = 0xFFFFFFFFA0086700 -> ( owner = 0xFFFFFFFFA0087480, llseek = 0x0, read = 0xFFFFFFFFA0081860, write = 0x0, read_iter = 0x0, write_iter = 0x0, iterate = 0x0, poll = 0x0, unlocked_ioctl = 0xFFFFFFFFA00832C0, compat_ioctl = 0xFFFFFFFFA0083870, mmap = 0x0, open = 0xFFFFFFFFA0081B80, flush = 0x0, release = 0xFFFFFFFFA00838C0, fsync = 0x0, aio_fsync = 0x0, fasync = 0x0, lock = 0x0, sendpage = 0x0, get_unmapped_area = 0x0, check_flags = 0x0, flock = 0x0, splice_write = 0x0, splice_read = 0x0, setlease = 0x0, fallocate = 0x0, show_fdinfo = 0x0), list = ( next = 0xFFFFFFFFA0087158, prev = 0xFFFFFFFFA0087158), parent = 0x0, this_device = 0xFFFF88007BD32000, groups = 0x0, nodename = 0x0, mode = 0) We found the device is "fm". We highly suspect that fm driver call misc_register twice and reinitialize list to make ->pre & ->next pointing to himself. Meanwhile, we checked fm driver and found nothing obviously wrong in the code. Consider that this is a crash after 46 hours continuous power-on/off, it maybe caused by some special cases we are hard to know for now. We think it might make some sence to add protection code into misc_register() at first. Hope this could help to understand our situation. We'll try to provide any detail inforamtion about this if necessary. Thanks, Orson > > thanks, > > greg k-h