2015-06-14 09:05:19

by Pali Rohár

[permalink] [raw]
Subject: Possible broken MM code in dell-laptop.c?

Hello,

in drivers/platform/x86/dell-laptop.c is this part of code:

static int __init dell_init(void)
{
...
/*
* Allocate buffer below 4GB for SMI data--only 32-bit physical addr
* is passed to SMI handler.
*/
bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
if (!bufferpage) {
ret = -ENOMEM;
goto fail_buffer;
}
buffer = page_address(bufferpage);

ret = dell_setup_rfkill();

if (ret) {
pr_warn("Unable to setup rfkill\n");
goto fail_rfkill;
}
...
fail_rfkill:
free_page((unsigned long)bufferpage);
fail_buffer:
...
}

Then there is another part:

static void __exit dell_exit(void)
{
...
free_page((unsigned long)buffer);
}

I suspect that there is some problem with free_page() call. In dell_init
is called free_page() on bufferpage and in dell_exit() on buffer.

Matthew and Stuart, you introduced this inconsistency in commit:

-------------------------------------------------
commit 116ee77b2858d9c89c0327f3a47c8ba864bf4a96
Author: Stuart Hayes <[email protected]>
Committer: Matthew Garrett <[email protected]>
Date: Wed Feb 10 14:12:13 2010 -0500

dell-laptop: Use buffer with 32-bit physical address

Calls to communicate with system firmware via a SMI (using dcdbas)
need to use a buffer that has a physical address of 4GB or less.
Currently the dell-laptop driver does not guarantee this, and when
the
buffer address is higher than 4GB, the address is truncated to 32
bits
and the SMI handler writes to the wrong memory address.

Signed-off-by: Stuart Hayes <[email protected]>
Acked-by: Matthew Garrett <[email protected]>
-------------------------------------------------

Can you or somebody else (CCed linux-mm) look at this page related code?
I think it is wrong, but somebody authoritative should provide answer.

Thanks.

--
Pali Rohár
[email protected]


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part.

2015-06-15 20:36:59

by Darren Hart

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Sun, Jun 14, 2015 at 11:05:07AM +0200, Pali Roh?r wrote:
> Hello,
>
> in drivers/platform/x86/dell-laptop.c is this part of code:
>
> static int __init dell_init(void)
> {
> ...
> /*
> * Allocate buffer below 4GB for SMI data--only 32-bit physical addr
> * is passed to SMI handler.
> */
> bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
> if (!bufferpage) {
> ret = -ENOMEM;
> goto fail_buffer;
> }
> buffer = page_address(bufferpage);
>
> ret = dell_setup_rfkill();
>
> if (ret) {
> pr_warn("Unable to setup rfkill\n");
> goto fail_rfkill;
> }
> ...
> fail_rfkill:
> free_page((unsigned long)bufferpage);
> fail_buffer:
> ...
> }
>
> Then there is another part:
>
> static void __exit dell_exit(void)
> {
> ...
> free_page((unsigned long)buffer);

I believe you are correct, and this should be bufferpage. Have you observed any
failures?

--
Darren Hart
Intel Open Source Technology Center

2015-06-15 20:42:41

by Pali Rohár

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Monday 15 June 2015 22:36:45 Darren Hart wrote:
> On Sun, Jun 14, 2015 at 11:05:07AM +0200, Pali Rohár wrote:
> > Hello,
> >
> > in drivers/platform/x86/dell-laptop.c is this part of code:
> >
> > static int __init dell_init(void)
> > {
> > ...
> >
> > /*
> >
> > * Allocate buffer below 4GB for SMI data--only 32-bit physical
> > addr * is passed to SMI handler.
> > */
> >
> > bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
> > if (!bufferpage) {
> >
> > ret = -ENOMEM;
> > goto fail_buffer;
> >
> > }
> > buffer = page_address(bufferpage);
> >
> > ret = dell_setup_rfkill();
> >
> > if (ret) {
> >
> > pr_warn("Unable to setup rfkill\n");
> > goto fail_rfkill;
> >
> > }
> >
> > ...
> >
> > fail_rfkill:
> > free_page((unsigned long)bufferpage);
> >
> > fail_buffer:
> > ...
> > }
> >
> > Then there is another part:
> >
> > static void __exit dell_exit(void)
> > {
> > ...
> >
> > free_page((unsigned long)buffer);
>
> I believe you are correct, and this should be bufferpage. Have you
> observed any failures?

Rmmoding dell-laptop.ko works fine. There is no error in dmesg. I think
that buffer (and not bufferpage) should be passed to free_page(). So in
my opinion problem is at fail_rfkill: label and not in dell_exit().

But somebody from linux-mm should look at it...

--
Pali Rohár
[email protected]


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part.

2015-06-15 21:18:32

by Michal Hocko

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Sun 14-06-15 11:05:07, Pali Roh?r wrote:
> Hello,
>
> in drivers/platform/x86/dell-laptop.c is this part of code:
>
> static int __init dell_init(void)
> {
> ...
> /*
> * Allocate buffer below 4GB for SMI data--only 32-bit physical addr
> * is passed to SMI handler.
> */
> bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
[...]
> buffer = page_address(bufferpage);
[...]
> fail_rfkill:
> free_page((unsigned long)bufferpage);

This one should be __free_page because it consumes struct page* and it
is the proper counter part for alloc_page. free_page, just to make it
confusing, consumes an address which has to be translated to a struct
page.

I have no idea why the API has been done this way and yeah, it is really
confusing.

[...]
> static void __exit dell_exit(void)
> {
> ...
> free_page((unsigned long)buffer);


--
Michal Hocko
SUSE Labs

2015-06-15 21:28:08

by Pali Rohár

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Monday 15 June 2015 23:18:16 Michal Hocko wrote:
> On Sun 14-06-15 11:05:07, Pali Rohár wrote:
> > Hello,
> >
> > in drivers/platform/x86/dell-laptop.c is this part of code:
> >
> > static int __init dell_init(void)
> > {
> > ...
> >
> > /*
> >
> > * Allocate buffer below 4GB for SMI data--only 32-bit physical
> > addr * is passed to SMI handler.
> > */
> >
> > bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
>
> [...]
>
> > buffer = page_address(bufferpage);
>
> [...]
>
> > fail_rfkill:
> > free_page((unsigned long)bufferpage);
>
> This one should be __free_page because it consumes struct page* and
> it is the proper counter part for alloc_page. free_page, just to
> make it confusing, consumes an address which has to be translated to
> a struct page.
>
> I have no idea why the API has been done this way and yeah, it is
> really confusing.
>
> [...]
>
> > static void __exit dell_exit(void)
> > {
> > ...
> >
> > free_page((unsigned long)buffer);

So both, either:

free_page((unsigned long)buffer);

or

__free_page(bufferpage);

is correct?

--
Pali Rohár
[email protected]


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part.

2015-06-16 06:34:00

by Michal Hocko

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Mon 15-06-15 23:27:59, Pali Roh?r wrote:
> On Monday 15 June 2015 23:18:16 Michal Hocko wrote:
> > On Sun 14-06-15 11:05:07, Pali Roh?r wrote:
> > > Hello,
> > >
> > > in drivers/platform/x86/dell-laptop.c is this part of code:
> > >
> > > static int __init dell_init(void)
> > > {
> > > ...
> > >
> > > /*
> > >
> > > * Allocate buffer below 4GB for SMI data--only 32-bit physical
> > > addr * is passed to SMI handler.
> > > */
> > >
> > > bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
> >
> > [...]
> >
> > > buffer = page_address(bufferpage);
> >
> > [...]
> >
> > > fail_rfkill:
> > > free_page((unsigned long)bufferpage);
> >
> > This one should be __free_page because it consumes struct page* and
> > it is the proper counter part for alloc_page. free_page, just to
> > make it confusing, consumes an address which has to be translated to
> > a struct page.
> >
> > I have no idea why the API has been done this way and yeah, it is
> > really confusing.
> >
> > [...]
> >
> > > static void __exit dell_exit(void)
> > > {
> > > ...
> > >
> > > free_page((unsigned long)buffer);
>
> So both, either:
>
> free_page((unsigned long)buffer);
>
> or
>
> __free_page(bufferpage);
>
> is correct?

Yes. Although I would use __free_page variant as both seem to be
globally visible.

--
Michal Hocko
SUSE Labs

2015-06-16 07:15:36

by Pali Rohár

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Tuesday 16 June 2015 08:33:46 Michal Hocko wrote:
> On Mon 15-06-15 23:27:59, Pali Rohár wrote:
> > On Monday 15 June 2015 23:18:16 Michal Hocko wrote:
> > > On Sun 14-06-15 11:05:07, Pali Rohár wrote:
> > > > Hello,
> > > >
> > > > in drivers/platform/x86/dell-laptop.c is this part of code:
> > > >
> > > > static int __init dell_init(void)
> > > > {
> > > > ...
> > > >
> > > > /*
> > > >
> > > > * Allocate buffer below 4GB for SMI data--only 32-bit physical
> > > > addr * is passed to SMI handler.
> > > > */
> > > >
> > > > bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
> > >
> > > [...]
> > >
> > > > buffer = page_address(bufferpage);
> > >
> > > [...]
> > >
> > > > fail_rfkill:
> > > > free_page((unsigned long)bufferpage);
> > >
> > > This one should be __free_page because it consumes struct page* and
> > > it is the proper counter part for alloc_page. free_page, just to
> > > make it confusing, consumes an address which has to be translated to
> > > a struct page.
> > >
> > > I have no idea why the API has been done this way and yeah, it is
> > > really confusing.
> > >
> > > [...]
> > >
> > > > static void __exit dell_exit(void)
> > > > {
> > > > ...
> > > >
> > > > free_page((unsigned long)buffer);
> >
> > So both, either:
> >
> > free_page((unsigned long)buffer);
> >
> > or
> >
> > __free_page(bufferpage);
> >
> > is correct?
>
> Yes. Although I would use __free_page variant as both seem to be
> globally visible.
>

Michal, thank you for explaining this situation!

Darren, I will prepare patch which will fix code and use __free_page().

(Btw, execution on fail_rfkill label caused kernel panic)

--
Pali Rohár
[email protected]

2015-06-16 07:43:26

by Michal Hocko

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Tue 16-06-15 09:15:23, Pali Roh?r wrote:
[...]
> Michal, thank you for explaining this situation!
>
> Darren, I will prepare patch which will fix code and use __free_page().
>
> (Btw, execution on fail_rfkill label caused kernel panic)

I am sorry, I could have made it more clear in the very first email.
A panic is to be expected because free_page will translate the given
address to a struct page* but this is what the code gave it. So an
unrelated struct page would be freed (or maybe an invalid one).

--
Michal Hocko
SUSE Labs

2015-06-16 10:12:21

by Pavel Machek

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Mon 2015-06-15 22:42:30, Pali Roh?r wrote:
> On Monday 15 June 2015 22:36:45 Darren Hart wrote:
> > On Sun, Jun 14, 2015 at 11:05:07AM +0200, Pali Roh?r wrote:
> > > Hello,
> > >
> > > in drivers/platform/x86/dell-laptop.c is this part of code:
> > >
> > > static int __init dell_init(void)
> > > {
> > > ...
> > >
> > > /*
> > >
> > > * Allocate buffer below 4GB for SMI data--only 32-bit physical
> > > addr * is passed to SMI handler.
> > > */
> > >
> > > bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
> > > if (!bufferpage) {
> > >
> > > ret = -ENOMEM;
> > > goto fail_buffer;
> > >
> > > }
> > > buffer = page_address(bufferpage);
> > >
> > > ret = dell_setup_rfkill();
> > >
> > > if (ret) {
> > >
> > > pr_warn("Unable to setup rfkill\n");
> > > goto fail_rfkill;
> > >
> > > }
> > >
> > > ...
> > >
> > > fail_rfkill:
> > > free_page((unsigned long)bufferpage);
> > >
> > > fail_buffer:
> > > ...
> > > }
> > >
> > > Then there is another part:
> > >
> > > static void __exit dell_exit(void)
> > > {
> > > ...
> > >
> > > free_page((unsigned long)buffer);
> >
> > I believe you are correct, and this should be bufferpage. Have you
> > observed any failures?
>
> Rmmoding dell-laptop.ko works fine. There is no error in dmesg. I think
> that buffer (and not bufferpage) should be passed to free_page(). So in
> my opinion problem is at fail_rfkill: label and not in dell_exit().

You seem to be right. Interface is strange...

alloc_pages() returns struct page *,
__free_pages() takes struct page *,
free_pages() takes unsinged long.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-06-17 03:43:55

by Darren Hart

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Tue, Jun 16, 2015 at 09:15:23AM +0200, Pali Roh?r wrote:
> On Tuesday 16 June 2015 08:33:46 Michal Hocko wrote:
> > On Mon 15-06-15 23:27:59, Pali Roh?r wrote:
> > > On Monday 15 June 2015 23:18:16 Michal Hocko wrote:
> > > > On Sun 14-06-15 11:05:07, Pali Roh?r wrote:
> > > > > Hello,
> > > > >
> > > > > in drivers/platform/x86/dell-laptop.c is this part of code:
> > > > >
> > > > > static int __init dell_init(void)
> > > > > {
> > > > > ...
> > > > >
> > > > > /*
> > > > >
> > > > > * Allocate buffer below 4GB for SMI data--only 32-bit physical
> > > > > addr * is passed to SMI handler.
> > > > > */
> > > > >
> > > > > bufferpage = alloc_page(GFP_KERNEL | GFP_DMA32);
> > > >
> > > > [...]
> > > >
> > > > > buffer = page_address(bufferpage);
> > > >
> > > > [...]
> > > >
> > > > > fail_rfkill:
> > > > > free_page((unsigned long)bufferpage);
> > > >
> > > > This one should be __free_page because it consumes struct page* and
> > > > it is the proper counter part for alloc_page. free_page, just to
> > > > make it confusing, consumes an address which has to be translated to
> > > > a struct page.
> > > >
> > > > I have no idea why the API has been done this way and yeah, it is
> > > > really confusing.
> > > >
> > > > [...]
> > > >
> > > > > static void __exit dell_exit(void)
> > > > > {
> > > > > ...
> > > > >
> > > > > free_page((unsigned long)buffer);
> > >
> > > So both, either:
> > >
> > > free_page((unsigned long)buffer);
> > >
> > > or
> > >
> > > __free_page(bufferpage);
> > >
> > > is correct?
> >
> > Yes. Although I would use __free_page variant as both seem to be
> > globally visible.
> >

Michal - thanks for the context.

I'm surprised by your recommendation to use __free_page() out here in platform
driver land.

I'd also prefer that the driver consistently free the same address to avoid
confusion.

For these reasons, free_page((unsigned long)buffer) seems like the better
option.

Can you elaborate on why you feel __free_page() is a better choice?

--
Darren Hart
Intel Open Source Technology Center

2015-06-17 07:19:52

by Michal Hocko

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Tue 16-06-15 20:43:34, Darren Hart wrote:
[...]
> Michal - thanks for the context.
>
> I'm surprised by your recommendation to use __free_page() out here in platform
> driver land.
>
> I'd also prefer that the driver consistently free the same address to avoid
> confusion.
>
> For these reasons, free_page((unsigned long)buffer) seems like the better
> option.
>
> Can you elaborate on why you feel __free_page() is a better choice?

Well the allocation uses alloc_page and __free_page is the freeing
counterpart so it is natural to use it if the allocated page is
available. Which is the case here.

Anyway the code can be cleaned up by using __get_free_page for the
allocation, then you do not have to care about the struct page and get
the address right away without an additional code. free_page would be a
natural freeing path.
__get_free_page would be even a better API because it enforces that
the allocation is not from the highmem - which the driver already does
by not using __GFP_HIGHMEM.

--
Michal Hocko
SUSE Labs

2015-06-18 21:15:16

by Darren Hart

[permalink] [raw]
Subject: Re: Possible broken MM code in dell-laptop.c?

On Wed, Jun 17, 2015 at 09:19:39AM +0200, Michal Hocko wrote:
> On Tue 16-06-15 20:43:34, Darren Hart wrote:
> [...]
> > Michal - thanks for the context.
> >
> > I'm surprised by your recommendation to use __free_page() out here in platform
> > driver land.
> >
> > I'd also prefer that the driver consistently free the same address to avoid
> > confusion.
> >
> > For these reasons, free_page((unsigned long)buffer) seems like the better
> > option.
> >
> > Can you elaborate on why you feel __free_page() is a better choice?
>
> Well the allocation uses alloc_page and __free_page is the freeing
> counterpart so it is natural to use it if the allocated page is
> available. Which is the case here.
>
> Anyway the code can be cleaned up by using __get_free_page for the
> allocation, then you do not have to care about the struct page and get
> the address right away without an additional code. free_page would be a
> natural freeing path.
> __get_free_page would be even a better API because it enforces that
> the allocation is not from the highmem - which the driver already does
> by not using __GFP_HIGHMEM.
>

Thank you Michal, I guess I'm just tripping over an API with mismatched __ and
no __ prefix paired calls. Thanks for the clarification.

Pali, I'm fine with any of these options - it sounds as though __get_free_page()
may be a general improvement.

--
Darren Hart
Intel Open Source Technology Center