2017-06-26 14:10:23

by Arnd Bergmann

[permalink] [raw]
Subject: Re: 答复: [RFC PATCH] char: misc: Init misc->list in a safe way

On Mon, Jun 26, 2017 at 4:00 PM, Zhongping Tan (谭中平)
<[email protected]> wrote:
> hi Arnd:
> Another way to describe this question, misc_register shouldn't modify the member of the miscdevice especially when return error. Let the caller to ensure the list have been initialized,
> or when return error, please don't initialize the list.

Why not? The caller should only initialize a couple of fields in the
structure (name, minor, fops, ...)
not never even look at the list entry, which is really internal to the
misc_register()/misc_unregister().

Arnd


Subject: RE: 答复: [RFC PATCH] char: misc: Init misc ->list in a safe way

Ok, firstly we need to discuss the list usage, for list head we need do initialization, but for list node we don't need initialization at all. And for misc_list head, we use LIST_HEAD to define and initialize it. So I don't know why we put INIT_LIST_HEAD(&misc->list) in function misc_register, any bugs when without it?

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Arnd Bergmann
Sent: Monday, June 26, 2017 10:10 PM
To: Zhongping Tan (谭中平)
Cc: Orson Zhai (翟京); Greg Kroah-Hartman; Linux Kernel Mailing List
Subject: Re: 答复: [RFC PATCH] char: misc: Init misc->list in a safe way

On Mon, Jun 26, 2017 at 4:00 PM, Zhongping Tan (谭中平)
<[email protected]> wrote:
> hi Arnd:
> Another way to describe this question, misc_register shouldn't modify
> the member of the miscdevice especially when return error. Let the caller to ensure the list have been initialized, or when return error, please don't initialize the list.

Why not? The caller should only initialize a couple of fields in the structure (name, minor, fops, ...) not never even look at the list entry, which is really internal to the misc_register()/misc_unregister().

Arnd

2017-06-27 06:29:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: 答复 : [RFC PATCH] char: misc: Init misc->list in a safe way


A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Tue, Jun 27, 2017 at 02:02:13AM +0000, Zhongping Tan (谭中平) wrote:
> Ok, firstly we need to discuss the list usage, for list head we need
> do initialization, but for list node we don't need initialization at
> all.

I don't understand, why is your misc driver touching that field at all?
Do I need to go and make it "private" somehow?

> And for misc_list head, we use LIST_HEAD to define and initialize
> it. So I don't know why we put INIT_LIST_HEAD(&misc->list) in function
> misc_register, any bugs when without it?

Again, what is wrong with the code today? What driver is this causing
problems for?

thanks,

greg k-h

2017-06-28 01:57:56

by Orson Zhai

[permalink] [raw]
Subject: Re: [RFC PATCH] char: misc: Init misc->list in a safe way

Hi Greg & Arnd,

On Tue, Jun 27, 2017 at 08:29:17AM +0200, Greg Kroah-Hartman wrote:
>
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
>
> A: No.
> Q: Should I include quotations after my reply?
>
> http://daringfireball.net/2007/07/on_top

We are so sorry for any troubles to you. I will take the role of Zhongping to
continue discussion here.

>
> On Tue, Jun 27, 2017 at 02:02:13AM +0000, Zhongping Tan (谭中平) wrote:
> > Ok, firstly we need to discuss the list usage, for list head we need
> > do initialization, but for list node we don't need initialization at
> > all.
>
> I don't understand, why is your misc driver touching that field at all?
> Do I need to go and make it "private" somehow?

There maybe some mis-understanding caused by not very well english expression.
I'll clarify what had happended to us.

>
> > And for misc_list head, we use LIST_HEAD to define and initialize
> > it. So I don't know why we put INIT_LIST_HEAD(&misc->list) in function
> > misc_register, any bugs when without it?
>
> Again, what is wrong with the code today? What driver is this causing
> problems for?

It maybe a little bit long story.

In recent, our test team report a crash of non-stop reboot test on a mobile phone after 46 hours.

After some investigation, we got the information as following,

&misc_list = 0xFFFFFFFF822AC780 -> (
next_=_0xFFFFFFFFA0087158 -> (
next = 0xFFFFFFFFA0087158 -> (
next = 0xFFFFFFFFA0087158,
prev = 0xFFFFFFFFA0087158 -> (
next = 0xFFFFFFFFA0087158,
prev = 0xFFFFFFFFA0087158)),
prev = 0xFFFFFFFFA0087158 -> (
next = 0xFFFFFFFFA0087158,
prev = 0xFFFFFFFFA0087158)),
prev = 0xFFFF88007BF55618)

it seems that misc_list fall into a loop which cause surfaceflinger (a service from Android) dead.

And we got futher more information after that,

(struct miscdevice*)0xFFFFFFFFA0087140 = 0xFFFFFFFFA0087140 -> (
minor = 20,
name = 0xFFFFFFFFA008606D -> "fm",
fops = 0xFFFFFFFFA0086700 -> (
owner = 0xFFFFFFFFA0087480,
llseek = 0x0,
read = 0xFFFFFFFFA0081860,
write = 0x0,
read_iter = 0x0,
write_iter = 0x0,
iterate = 0x0,
poll = 0x0,
unlocked_ioctl = 0xFFFFFFFFA00832C0,
compat_ioctl = 0xFFFFFFFFA0083870,
mmap = 0x0,
open = 0xFFFFFFFFA0081B80,
flush = 0x0,
release = 0xFFFFFFFFA00838C0,
fsync = 0x0,
aio_fsync = 0x0,
fasync = 0x0,
lock = 0x0,
sendpage = 0x0,
get_unmapped_area = 0x0,
check_flags = 0x0,
flock = 0x0,
splice_write = 0x0,
splice_read = 0x0,
setlease = 0x0,
fallocate = 0x0,
show_fdinfo = 0x0),
list = (
next = 0xFFFFFFFFA0087158,
prev = 0xFFFFFFFFA0087158),
parent = 0x0,
this_device = 0xFFFF88007BD32000,
groups = 0x0,
nodename = 0x0,
mode = 0)

We found the device is "fm". We highly suspect that fm driver call misc_register twice and reinitialize list to make ->pre & ->next pointing to himself.

Meanwhile, we checked fm driver and found nothing obviously wrong in the code.
Consider that this is a crash after 46 hours continuous power-on/off, it maybe caused by some special cases we are hard to know for now.
We think it might make some sence to add protection code into misc_register() at first.

Hope this could help to understand our situation.

We'll try to provide any detail inforamtion about this if necessary.

Thanks,
Orson

>
> thanks,
>
> greg k-h

2017-06-28 05:18:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH] char: misc: Init misc->list in a safe way

On Wed, Jun 28, 2017 at 09:54:32AM +0800, Orson Zhai wrote:
> We found the device is "fm". We highly suspect that fm driver call
> misc_register twice and reinitialize list to make ->pre & ->next
> pointing to himself.
>
> Meanwhile, we checked fm driver and found nothing obviously wrong in the code.

Do you have a pointer to this driver? Is it in the kernel tree?

> Consider that this is a crash after 46 hours continuous power-on/off,
> it maybe caused by some special cases we are hard to know for now.

What would cause this driver to want to register/unregister itself? Is
it "recycling" the misc structure, or creating it new each time?

And what kernel version are you testing here?

> We think it might make some sence to add protection code into
> misc_register() at first.

To protect from "foolish" callers? Usually we fix the calling code to
not do foolish things. :)

thanks,

greg k-h

2017-06-28 10:34:36

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC PATCH] char: misc: Init misc->list in a safe way

On Wed, Jun 28, 2017 at 7:18 AM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Wed, Jun 28, 2017 at 09:54:32AM +0800, Orson Zhai wrote:
>> We found the device is "fm". We highly suspect that fm driver call
>> misc_register twice and reinitialize list to make ->pre & ->next
>> pointing to himself.
>>
>> Meanwhile, we checked fm driver and found nothing obviously wrong in the code.
>
> Do you have a pointer to this driver? Is it in the kernel tree?

I found a version of some spreadtrum FM driver in the sources for the
Samsung Galaxy
J3, this is the driver https://pastebin.com/p7Y7xQNE

The driver has a single static 'misc_device' structure that will get
registered each
time the probe() function is called. The driver also supports both a static
platform_device definition (which a proper driver should not have, the device
should always be created either from platform code or from DT), and probing
from device tree.

If the DT has multiple "sprd,sprd_fm" device nodes, or a driver is lacking
the "#ifndef CONFIG_OF" guard around the static platform device, it should
always crash with the described symptom, but I don't see why it would only
happen after many hours of boot testing.

Arnd

2017-06-28 11:21:19

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH] char: misc: Init misc->list in a safe way

On Wed, Jun 28, 2017 at 12:34:28PM +0200, Arnd Bergmann wrote:
> On Wed, Jun 28, 2017 at 7:18 AM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > On Wed, Jun 28, 2017 at 09:54:32AM +0800, Orson Zhai wrote:
> >> We found the device is "fm". We highly suspect that fm driver call
> >> misc_register twice and reinitialize list to make ->pre & ->next
> >> pointing to himself.
> >>
> >> Meanwhile, we checked fm driver and found nothing obviously wrong in the code.
> >
> > Do you have a pointer to this driver? Is it in the kernel tree?
>
> I found a version of some spreadtrum FM driver in the sources for the
> Samsung Galaxy
> J3, this is the driver https://pastebin.com/p7Y7xQNE

Ah nice, Orson, is that the driver?

Any objection for me adding it to the kernel tree so we can fix up the
issues that Arnd points out in it (not to mention the coding style
issues...)

thanks,

greg k-h

2017-06-29 06:54:31

by Chunyan Zhang

[permalink] [raw]
Subject: Re: [RFC PATCH] char: misc: Init misc->list in a safe way

Hi Greg,

On 28 June 2017 at 19:21, Greg Kroah-Hartman <[email protected]> wrote:
> On Wed, Jun 28, 2017 at 12:34:28PM +0200, Arnd Bergmann wrote:
>> On Wed, Jun 28, 2017 at 7:18 AM, Greg Kroah-Hartman
>> <[email protected]> wrote:
>> > On Wed, Jun 28, 2017 at 09:54:32AM +0800, Orson Zhai wrote:
>> >> We found the device is "fm". We highly suspect that fm driver call
>> >> misc_register twice and reinitialize list to make ->pre & ->next
>> >> pointing to himself.
>> >>
>> >> Meanwhile, we checked fm driver and found nothing obviously wrong in the code.
>> >
>> > Do you have a pointer to this driver? Is it in the kernel tree?
>>
>> I found a version of some spreadtrum FM driver in the sources for the
>> Samsung Galaxy
>> J3, this is the driver https://pastebin.com/p7Y7xQNE
>
> Ah nice, Orson, is that the driver?

For some unknown reason, Orson was missing in this list :)
I'm adding him back.

This is not the latest version of our FM driver. You can find the
version we're using within Spreadtrum here [1], and the kernel version
we're using is v4.4.49.

>
> Any objection for me adding it to the kernel tree so we can fix up the

We have no objection to adding this driver to the kernel tree.

> issues that Arnd points out in it (not to mention the coding style
> issues...)

Actually, our FM driver owner Songhe (cc'ed here) had checked by
adding logs, the probe() function was called only once during the
whole initialization.
They have made a few times of review on the driver and haven't found
any suspicion in it, so Zhongping thought out the solution you can see
in this patch.

Many thanks for your help,
Chunyan

>
> thanks,
>
> greg k-h

2017-06-29 07:03:52

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [RFC PATCH] char: misc: Init misc->list in a safe way

On Thu, Jun 29, 2017 at 02:54:23PM +0800, Chunyan Zhang wrote:
> Hi Greg,
>
> On 28 June 2017 at 19:21, Greg Kroah-Hartman <[email protected]> wrote:
> > On Wed, Jun 28, 2017 at 12:34:28PM +0200, Arnd Bergmann wrote:
> >> On Wed, Jun 28, 2017 at 7:18 AM, Greg Kroah-Hartman
> >> <[email protected]> wrote:
> >> > On Wed, Jun 28, 2017 at 09:54:32AM +0800, Orson Zhai wrote:
> >> >> We found the device is "fm". We highly suspect that fm driver call
> >> >> misc_register twice and reinitialize list to make ->pre & ->next
> >> >> pointing to himself.
> >> >>
> >> >> Meanwhile, we checked fm driver and found nothing obviously wrong in the code.
> >> >
> >> > Do you have a pointer to this driver? Is it in the kernel tree?
> >>
> >> I found a version of some spreadtrum FM driver in the sources for the
> >> Samsung Galaxy
> >> J3, this is the driver https://pastebin.com/p7Y7xQNE
> >
> > Ah nice, Orson, is that the driver?
>
> For some unknown reason, Orson was missing in this list :)
> I'm adding him back.
>
> This is not the latest version of our FM driver. You can find the
> version we're using within Spreadtrum here [1], and the kernel version
> we're using is v4.4.49.

You forgot the link in your footnote, so we can read the code :)

> > Any objection for me adding it to the kernel tree so we can fix up the
>
> We have no objection to adding this driver to the kernel tree.

Great, want me to make up the patch, or do you want to?

thanks,

greg k-h

2017-06-29 07:30:34

by Chunyan Zhang

[permalink] [raw]
Subject: Re: [RFC PATCH] char: misc: Init misc->list in a safe way

ti

On 29 June 2017 at 15:03, Greg Kroah-Hartman <[email protected]> wrote:
> On Thu, Jun 29, 2017 at 02:54:23PM +0800, Chunyan Zhang wrote:
>> Hi Greg,
>>
>> On 28 June 2017 at 19:21, Greg Kroah-Hartman <[email protected]> wrote:
>> > On Wed, Jun 28, 2017 at 12:34:28PM +0200, Arnd Bergmann wrote:
>> >> On Wed, Jun 28, 2017 at 7:18 AM, Greg Kroah-Hartman
>> >> <[email protected]> wrote:
>> >> > On Wed, Jun 28, 2017 at 09:54:32AM +0800, Orson Zhai wrote:
>> >> >> We found the device is "fm". We highly suspect that fm driver call
>> >> >> misc_register twice and reinitialize list to make ->pre & ->next
>> >> >> pointing to himself.
>> >> >>
>> >> >> Meanwhile, we checked fm driver and found nothing obviously wrong in the code.
>> >> >
>> >> > Do you have a pointer to this driver? Is it in the kernel tree?
>> >>
>> >> I found a version of some spreadtrum FM driver in the sources for the
>> >> Samsung Galaxy
>> >> J3, this is the driver https://pastebin.com/p7Y7xQNE
>> >
>> > Ah nice, Orson, is that the driver?
>>
>> For some unknown reason, Orson was missing in this list :)
>> I'm adding him back.
>>
>> This is not the latest version of our FM driver. You can find the
>> version we're using within Spreadtrum here [1], and the kernel version
>> we're using is v4.4.49.
>
> You forgot the link in your footnote, so we can read the code :)

Ah, my mistake :)

[1] https://github.com/sprdlinux/kernel/blob/for-greg/drivers/misc/sprdwcn_bsp/sc2342/radio/fmdrv_ops.c

>
>> > Any objection for me adding it to the kernel tree so we can fix up the
>>
>> We have no objection to adding this driver to the kernel tree.
>
> Great, want me to make up the patch, or do you want to?

How do we have the heart to take your precious time :)
Anyway, we can do that on our own.

Thank you,
Chunyan

>
> thanks,
>
> greg k-h