2006-03-15 19:31:35

by Vivek Goyal

[permalink] [raw]
Subject: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Hi,

Is there a reason why "start" and "end" field of "struct resource" are of
type unsigned long. My understanding is that "struct resource" can be used
to represent any system resource including physical memory. But unsigned
long is not suffcient to represent memory more than 4GB on PAE systems.

Currently /proc/iomem exports physical memory also apart from io device
memory. But on i386, it truncates any memory more than 4GB. This leads
to problems for kexec/kdump.

- Kexec reads /proc/iomem to determine the system memory layout and prepares
a memory map based on that and passes it to the kernel being kexeced. Given
the fact that memory more than 4GB has been truncated, new kernel never
gets to see and use that memory.

- Kdump also reads /proc/iomem to determine the physical memory layout of
the system and encodes this informaiton in ELF headers. After a crash
new kernel parses these ELF headers being used by previous kernel and
vmcore is prepared accordingly. As memory more than 4GB has been truncated,
kdump never sees that memory and never prepares ELF headers for it. Hence
vmcore is truncated and limited to 4GB even if there is more physical
memory in the system.

One of the possible solutions to this problem is that expand the size
of "start" and "end" to "unsigned long long". But whole of the PCI and
driver code has been written assuming start and end to be unsigned long
and compiler starts throwing warnings.

I am attaching a prototype patch which does minimal changes to make memory
more than 4GB appear in /proc/iomem. It does not take care of modifying
in tree drivers to supress warnings.

Looking for your suggestion what's the best way to handle this issue. If
above approach seems reasonable then I can go ahead and do the changes
for in tree drivers to handle the warnings.

Thanks
Vivek
-


Attachments:
(No filename) (1.81 kB)
struct-resource-start-field-expansion.patch (2.50 kB)
Download all attachments

2006-03-15 20:28:13

by Kumar Gala

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"


On Mar 15, 2006, at 2:13 PM, Eric W. Biederman wrote:

> Kumar Gala <[email protected]> writes:
>
>> On Mar 15, 2006, at 1:57 PM, Arjan van de Ven wrote:
>>
>>>
>>>> One of the possible solutions to this problem is that expand the
>>>> size
>>>> of "start" and "end" to "unsigned long long". But whole of the
>>>> PCI and
>>>> driver code has been written assuming start and end to be
>>>> unsigned long
>>>> and compiler starts throwing warnings.
>>>
>>>
>>> please use dma_addr_t then instead of unsigned long long
>>>
>>> this is the right size on all platforms afaik (could a ppc64 person
>>> verify this?> ;)
>>
>> Actually we really just want "start" and "end" to be u64 on all
>> platforms.
>> Linus was ok with this change but no one has gone through and
>> fixed everything
>> that would be required for it.
>
> Since it is faster to ask :)
>
> How is it that other pieces of code have problems?
> Warnings or something nasty?

Warnings primarily, however I think some places have assumptions
about size that have to be looked at.

- kumar

2006-03-15 19:47:48

by Kumar Gala

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"


On Mar 15, 2006, at 1:31 PM, Vivek Goyal wrote:

> Hi,
>
> Is there a reason why "start" and "end" field of "struct resource"
> are of
> type unsigned long. My understanding is that "struct resource" can
> be used
> to represent any system resource including physical memory. But
> unsigned
> long is not suffcient to represent memory more than 4GB on PAE
> systems.
>
> Currently /proc/iomem exports physical memory also apart from io
> device
> memory. But on i386, it truncates any memory more than 4GB. This leads
> to problems for kexec/kdump.
>
> - Kexec reads /proc/iomem to determine the system memory layout and
> prepares
> a memory map based on that and passes it to the kernel being
> kexeced. Given
> the fact that memory more than 4GB has been truncated, new kernel
> never
> gets to see and use that memory.
>
> - Kdump also reads /proc/iomem to determine the physical memory
> layout of
> the system and encodes this informaiton in ELF headers. After a
> crash
> new kernel parses these ELF headers being used by previous kernel
> and
> vmcore is prepared accordingly. As memory more than 4GB has been
> truncated,
> kdump never sees that memory and never prepares ELF headers for
> it. Hence
> vmcore is truncated and limited to 4GB even if there is more
> physical
> memory in the system.
>
> One of the possible solutions to this problem is that expand the size
> of "start" and "end" to "unsigned long long". But whole of the PCI and
> driver code has been written assuming start and end to be unsigned
> long
> and compiler starts throwing warnings.
>
> I am attaching a prototype patch which does minimal changes to make
> memory
> more than 4GB appear in /proc/iomem. It does not take care of
> modifying
> in tree drivers to supress warnings.
>
> Looking for your suggestion what's the best way to handle this
> issue. If
> above approach seems reasonable then I can go ahead and do the changes
> for in tree drivers to handle the warnings.
>
> Thanks
> Vivek
> -

Your patch is insufficient. You really need to audit all the users
of "struct resource". If you search the lkml archives you will see
that Deepak Saxena, GregKH, and myself have taken stabs at this. You
should start with Greg's patch which I'm guessing is rather out of
date now.

- kumar

2006-03-15 21:05:00

by Kumar Gala

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"


On Mar 15, 2006, at 2:53 PM, Benjamin LaHaise wrote:

> On Wed, Mar 15, 2006 at 02:31:14PM -0500, Vivek Goyal wrote:
>> Is there a reason why "start" and "end" field of "struct resource"
>> are of
>> type unsigned long. My understanding is that "struct resource" can
>> be used
>> to represent any system resource including physical memory. But
>> unsigned
>> long is not suffcient to represent memory more than 4GB on PAE
>> systems.
>> and compiler starts throwing warnings.
>
> Please make this depend on the kernel being compiled with PAE. We
> don't
> need to bloat 32 bit kernels needlessly.

I disagree. I think we need to look to see what the "bloat" is
before we go and make start/end config dependent.

It seems clear that drivers dont handle the fact that "start"/"end"
change an 32-bit vs 64-bit archs to begin with. By making this even
more config dependent seems to be asking for more trouble.

- kumar

2006-03-15 21:26:41

by linux-os (Dick Johnson)

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"


On Wed, 15 Mar 2006, Kumar Gala wrote:

>
> On Mar 15, 2006, at 2:53 PM, Benjamin LaHaise wrote:
>
>> On Wed, Mar 15, 2006 at 02:31:14PM -0500, Vivek Goyal wrote:
>>> Is there a reason why "start" and "end" field of "struct resource"
>>> are of
>>> type unsigned long. My understanding is that "struct resource" can
>>> be used
>>> to represent any system resource including physical memory. But
>>> unsigned
>>> long is not suffcient to represent memory more than 4GB on PAE
>>> systems.
>>> and compiler starts throwing warnings.
>>
>> Please make this depend on the kernel being compiled with PAE. We
>> don't
>> need to bloat 32 bit kernels needlessly.
>
> I disagree. I think we need to look to see what the "bloat" is
> before we go and make start/end config dependent.
>
> It seems clear that drivers dont handle the fact that "start"/"end"
> change an 32-bit vs 64-bit archs to begin with. By making this even
> more config dependent seems to be asking for more trouble.
>
> - kumar

ix86 machines need to make many operations just to increment
or decrement a 64-bit variable, plus the operations are not
atomic so they need to be protected. The bloat of using 64-bit
objects in 32-bit machines is very real and a tremendous problem.
That's why all the stuff in the kernel wasn't 'long long' to
begin with. It's execution bloat, a.k.a., time expansion that
is the problem.

Bump a 32-bit variable, addressed from an index register:

incl (%ebx)

Bump a 64-bit variable, addressed the same way.

addl $1,(%ebx) ; Need to add because inc doesn't carry
adcl $0,4(%ebx) ; Add 0 plus the carry-flag

If an interrupt or context-switch comes between the two operations,
the result is undefined, NotGood(tm).

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.54 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_


****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2006-03-15 20:39:21

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Kumar Gala <[email protected]> writes:

> Warnings primarily, however I think some places have assumptions about size
> that have to be looked at.

Ok. So nothing too bad just a thorough audit. Although any driver that
has a real problem with a 64bit type is already broken on every 64bit
arch.

And we do need the 64bit type so we can handle 64bit pci resources on
x86 and ppc32 at some point.

Eric

2006-03-15 21:19:04

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

On Wed, Mar 15, 2006 at 03:05:30PM -0600, Kumar Gala wrote:
> I disagree. I think we need to look to see what the "bloat" is
> before we go and make start/end config dependent.

Eh? 32 bit kernels get used in embedded systems, which includes those
with only 8MB of RAM. The upper 32 bits will never be anything other
than 0.

> It seems clear that drivers dont handle the fact that "start"/"end"
> change an 32-bit vs 64-bit archs to begin with. By making this even
> more config dependent seems to be asking for more trouble.

You can't get a non-32 bit value on a 32 bit platform, so why should a
driver be expected to handle anything?

-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[email protected]>.

2006-03-15 20:00:42

by Kumar Gala

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"


On Mar 15, 2006, at 1:57 PM, Arjan van de Ven wrote:

>
>> One of the possible solutions to this problem is that expand the size
>> of "start" and "end" to "unsigned long long". But whole of the PCI
>> and
>> driver code has been written assuming start and end to be unsigned
>> long
>> and compiler starts throwing warnings.
>
>
> please use dma_addr_t then instead of unsigned long long
>
> this is the right size on all platforms afaik (could a ppc64 person
> verify this?> ;)

Actually we really just want "start" and "end" to be u64 on all
platforms. Linus was ok with this change but no one has gone through
and fixed everything that would be required for it.

- kumar

2006-03-15 20:16:14

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Kumar Gala <[email protected]> writes:

> On Mar 15, 2006, at 1:57 PM, Arjan van de Ven wrote:
>
>>
>>> One of the possible solutions to this problem is that expand the size
>>> of "start" and "end" to "unsigned long long". But whole of the PCI and
>>> driver code has been written assuming start and end to be unsigned long
>>> and compiler starts throwing warnings.
>>
>>
>> please use dma_addr_t then instead of unsigned long long
>>
>> this is the right size on all platforms afaik (could a ppc64 person
>> verify this?> ;)
>
> Actually we really just want "start" and "end" to be u64 on all platforms.
> Linus was ok with this change but no one has gone through and fixed everything
> that would be required for it.

Since it is faster to ask :)

How is it that other pieces of code have problems?
Warnings or something nasty?

Eric

2006-03-15 20:32:15

by Vivek Goyal

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

On Wed, Mar 15, 2006 at 01:13:56PM -0700, Eric W. Biederman wrote:
> Kumar Gala <[email protected]> writes:
>
> > On Mar 15, 2006, at 1:57 PM, Arjan van de Ven wrote:
> >
> >>
> >>> One of the possible solutions to this problem is that expand the size
> >>> of "start" and "end" to "unsigned long long". But whole of the PCI and
> >>> driver code has been written assuming start and end to be unsigned long
> >>> and compiler starts throwing warnings.
> >>
> >>
> >> please use dma_addr_t then instead of unsigned long long
> >>
> >> this is the right size on all platforms afaik (could a ppc64 person
> >> verify this?> ;)
> >
> > Actually we really just want "start" and "end" to be u64 on all platforms.
> > Linus was ok with this change but no one has gone through and fixed everything
> > that would be required for it.
>
> Since it is faster to ask :)
>
> How is it that other pieces of code have problems?
> Warnings or something nasty?

Few problems which I have noticed so far.

- Many printk() warnings. Wherever start and end are being printed,
the format specifier being used is %lx. Needs to be changed to %Lx.

- Some folks save a pointer of type (unsigned long *) to start and end field
and then try to operate on it. This pointer type shall have to be changed
to something like u64*.

unsigned long *port, *end, *tport, *tend;
port = &dev->res.port_resource[idx].start;

- Some folks cast "start" to a pointer and then use it. Compiler gives warning.

addr_reg = (void __iomem *) addr->start;

-vivek

2006-03-15 20:58:44

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

On Wed, Mar 15, 2006 at 02:31:14PM -0500, Vivek Goyal wrote:
> Is there a reason why "start" and "end" field of "struct resource" are of
> type unsigned long. My understanding is that "struct resource" can be used
> to represent any system resource including physical memory. But unsigned
> long is not suffcient to represent memory more than 4GB on PAE systems.
> and compiler starts throwing warnings.

Please make this depend on the kernel being compiled with PAE. We don't
need to bloat 32 bit kernels needlessly.

-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[email protected]>.

2006-03-15 21:29:45

by Kumar Gala

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"


On Mar 15, 2006, at 3:13 PM, Benjamin LaHaise wrote:

> On Wed, Mar 15, 2006 at 03:05:30PM -0600, Kumar Gala wrote:
>> I disagree. I think we need to look to see what the "bloat" is
>> before we go and make start/end config dependent.
>
> Eh? 32 bit kernels get used in embedded systems, which includes those
> with only 8MB of RAM. The upper 32 bits will never be anything other
> than 0.

Why do people equate embedded with small amounts of memory. I know
of embedded systems which use 32-bit PowerPCs that have >4G of system
memory.

>> It seems clear that drivers dont handle the fact that "start"/"end"
>> change an 32-bit vs 64-bit archs to begin with. By making this even
>> more config dependent seems to be asking for more trouble.
>
> You can't get a non-32 bit value on a 32 bit platform, so why should a
> driver be expected to handle anything?

I dont follow. I would say that most drivers shouldn't care about
the fact that they are on a 32-bit platform or 64-bit platform. The
point is that drivers have made assumptions about being on 32-bit
platforms which breaks when a 32-bit platform supports a larger
physical address space.

- kumar

2006-03-15 19:57:49

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"


> One of the possible solutions to this problem is that expand the size
> of "start" and "end" to "unsigned long long". But whole of the PCI and
> driver code has been written assuming start and end to be unsigned long
> and compiler starts throwing warnings.


please use dma_addr_t then instead of unsigned long long

this is the right size on all platforms afaik (could a ppc64 person
verify this?> ;)

2006-03-15 20:10:31

by Kumar Gala

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"


On Mar 15, 2006, at 2:01 PM, Kumar Gala wrote:

>
> On Mar 15, 2006, at 1:57 PM, Arjan van de Ven wrote:
>
>>
>>> One of the possible solutions to this problem is that expand the
>>> size
>>> of "start" and "end" to "unsigned long long". But whole of the
>>> PCI and
>>> driver code has been written assuming start and end to be
>>> unsigned long
>>> and compiler starts throwing warnings.
>>
>>
>> please use dma_addr_t then instead of unsigned long long
>>
>> this is the right size on all platforms afaik (could a ppc64 person
>> verify this?> ;)
>
> Actually we really just want "start" and "end" to be u64 on all
> platforms. Linus was ok with this change but no one has gone
> through and fixed everything that would be required for it.

As my memory comes back to me on this. I also believe that Andrew
asked me for size comparisons between a kernel using 32-bit start/end
and 64-bit start/end on a 32-bit machine for a allyesconfig build. I
don't believe I ever got around to doing this or reporting the
numbers to him. It would be useful to have both code size
differences and run time (if possible).

- kumar

2006-03-15 21:32:39

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Benjamin LaHaise <[email protected]> writes:

> On Wed, Mar 15, 2006 at 03:05:30PM -0600, Kumar Gala wrote:
>> I disagree. I think we need to look to see what the "bloat" is
>> before we go and make start/end config dependent.
>
> Eh? 32 bit kernels get used in embedded systems, which includes those
> with only 8MB of RAM. The upper 32 bits will never be anything other
> than 0.

If the impact is very slight or unmeasurable this means the option
needs to fall under CONFIG_EMBEDDED, where you can change if
every last bit of RAM counts but otherwise you won't care.

>> It seems clear that drivers dont handle the fact that "start"/"end"
>> change an 32-bit vs 64-bit archs to begin with. By making this even
>> more config dependent seems to be asking for more trouble.
>
> You can't get a non-32 bit value on a 32 bit platform, so why should a
> driver be expected to handle anything?

Having > 32bit values on a 32bit platform is not the issue.

Some drivers appear to puke simply because the value is 64bit. Which
means the driver will have problems on any 64bit kernel. That kind
of behavior is worth purging.

Eric

2006-03-15 21:34:18

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

On Wed, Mar 15, 2006 at 02:29:32PM -0700, Eric W. Biederman wrote:
> If the impact is very slight or unmeasurable this means the option
> needs to fall under CONFIG_EMBEDDED, where you can change if
> every last bit of RAM counts but otherwise you won't care.

But we have a data type that is correct for this usage: dma_addr_t.

> Having > 32bit values on a 32bit platform is not the issue.
>
> Some drivers appear to puke simply because the value is 64bit. Which
> means the driver will have problems on any 64bit kernel. That kind
> of behavior is worth purging.

Forcing it to be a 64 bit value doesn't fix that problem, so that isn't
a valid excuse for adding bloat.

-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[email protected]>.

2006-03-15 21:34:04

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Benjamin LaHaise <[email protected]> writes:

> On Wed, Mar 15, 2006 at 03:05:30PM -0600, Kumar Gala wrote:
>> I disagree. I think we need to look to see what the "bloat" is
>> before we go and make start/end config dependent.
>
> Eh? 32 bit kernels get used in embedded systems, which includes those
> with only 8MB of RAM. The upper 32 bits will never be anything other
> than 0.

If the impact is very slight or unmeasurable this means the option
needs to fall under CONFIG_EMBEDDED, where you can change if
every last bit of RAM counts but otherwise you won't care.

>> It seems clear that drivers dont handle the fact that "start"/"end"
>> change an 32-bit vs 64-bit archs to begin with. By making this even
>> more config dependent seems to be asking for more trouble.
>
> You can't get a non-32 bit value on a 32 bit platform, so why should a
> driver be expected to handle anything?

Having > 32bit values on a 32bit platform is not the issue.

Some drivers appear to puke simply because the value is 64bit. Which
means the driver will have problems on any 64bit kernel. That kind
of behavior is worth purging.

Eric

2006-03-15 21:34:30

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Kumar Gala <[email protected]> writes:

> On Mar 15, 2006, at 2:53 PM, Benjamin LaHaise wrote:
>
>> On Wed, Mar 15, 2006 at 02:31:14PM -0500, Vivek Goyal wrote:
>>> Is there a reason why "start" and "end" field of "struct resource" are of
>>> type unsigned long. My understanding is that "struct resource" can be used
>>> to represent any system resource including physical memory. But unsigned
>>> long is not suffcient to represent memory more than 4GB on PAE systems.
>>> and compiler starts throwing warnings.
>>
>> Please make this depend on the kernel being compiled with PAE. We don't
>> need to bloat 32 bit kernels needlessly.
>
> I disagree. I think we need to look to see what the "bloat" is before we go
> and make start/end config dependent.
>
> It seems clear that drivers dont handle the fact that "start"/"end" change an
> 32-bit vs 64-bit archs to begin with. By making this even more config
> dependent seems to be asking for more trouble.

Especially as all resource access are rare slow path operations.

Depending on PAE and the like look like an optimization to consider after
getting the parts working.

Eric

2006-03-15 21:34:56

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Vivek Goyal <[email protected]> writes:

> Few problems which I have noticed so far.
>
> - Many printk() warnings. Wherever start and end are being printed,
> the format specifier being used is %lx. Needs to be changed to %Lx.

Sane, but we need to check the 64bit case as well.

> - Some folks save a pointer of type (unsigned long *) to start and end field
> and then try to operate on it. This pointer type shall have to be changed
> to something like u64*.
>
> unsigned long *port, *end, *tport, *tend;
> port = &dev->res.port_resource[idx].start;

Weird.

> - Some folks cast "start" to a pointer and then use it. Compiler gives warning.
>
> addr_reg = (void __iomem *) addr->start;

I'm not familiar enough with that part of the code off the top of my head
but that except for a few helper functions that kind of behavior should
be pretty much forbidden.

This feels like entering the guts of ugly barely working drivers at
the moment.

Eric

Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

On Wednesday 15 March 2006 15:30, Kumar Gala wrote:
> On Mar 15, 2006, at 3:13 PM, Benjamin LaHaise wrote:
> > On Wed, Mar 15, 2006 at 03:05:30PM -0600, Kumar Gala wrote:
> >> I disagree. I think we need to look to see what the "bloat" is
> >> before we go and make start/end config dependent.
> >
> > Eh? 32 bit kernels get used in embedded systems, which includes those
> > with only 8MB of RAM. The upper 32 bits will never be anything other
> > than 0.
>
> Why do people equate embedded with small amounts of memory.
They don't. I believe Kumar said "which includes those that have 8mB" and not
"which all have 8mB" :)
> I know
> of embedded systems which use 32-bit PowerPCs that have >4G of system
> memory.
>
> >> It seems clear that drivers dont handle the fact that "start"/"end"
> >> change an 32-bit vs 64-bit archs to begin with. By making this even
> >> more config dependent seems to be asking for more trouble.
> >
> > You can't get a non-32 bit value on a 32 bit platform, so why should a
> > driver be expected to handle anything?
>
> I dont follow. I would say that most drivers shouldn't care about
> the fact that they are on a 32-bit platform or 64-bit platform. The
> point is that drivers have made assumptions about being on 32-bit
> platforms which breaks when a 32-bit platform supports a larger
> physical address space.
Some platforms are way too different from a 32 bit one to have a driver
support both... so in some cases this wouldn't be good.
>
> - kumar
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
--hackmiester
Walk a mile in my shoes and you will be a mile away in a new pair of shoes.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFD/yYl3ApzN91C7BcRAoVVAJ97uhjh30nQ4hd9bQ90gJqiwsLEfgCeKSrg
bVfqEeJ09WhO6Y51WHEHb6o=
=VTUd
-----END PGP SIGNATURE-----

-----BEGIN GEEK CODE BLOCK-----
Version: Geek Code v3.1 (PHP)
GCS/CM/E/IT d-@ s: a- C++$ UBLS*++++$ P+ L+++$ E- W++$ !N-- !o+ K-- !w-- !O-
M++$ V-- PS@ PE@ Y--? PGP++ !t--- 5--? !X-- !R-- tv-- b+ DI++ D++ G+ e++++
h---- r+++ z++++
------END GEEK CODE BLOCK------

Quick contact info:
Work: [email protected]
Personal: [email protected]
Large files/spam: [email protected]
GTalk:hackmiester/AIM:hackmiester1337/Y!:hackm1ester/IRC:irc.7sinz.net/7sinz


Attachments:
(No filename) (2.45 kB)
(No filename) (189.00 B)
Download all attachments

2006-03-15 21:39:07

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

"linux-os \(Dick Johnson\)" <[email protected]> writes:
> ix86 machines need to make many operations just to increment
> or decrement a 64-bit variable, plus the operations are not
> atomic so they need to be protected. The bloat of using 64-bit
> objects in 32-bit machines is very real and a tremendous problem.
> That's why all the stuff in the kernel wasn't 'long long' to
> begin with. It's execution bloat, a.k.a., time expansion that
> is the problem.
>
> Bump a 32-bit variable, addressed from an index register:
>
> incl (%ebx)
>
> Bump a 64-bit variable, addressed the same way.
>
> addl $1,(%ebx) ; Need to add because inc doesn't carry
> adcl $0,4(%ebx) ; Add 0 plus the carry-flag
>
> If an interrupt or context-switch comes between the two operations,
> the result is undefined, NotGood(tm).

No it is well defined it just isn't atomic. But difference.
The thing is struct resource is accessed less often the file
offsets which are already 64bit. Basically they are only
used during driver initialization.

So while I agree in general that 64bit values should be avoided
this is one of those places where we can productively use a
64bit value, on a 32bit machine.

To keep using 32bit kernels on the newer x86 machines at some
point this will even become a requirement as 64bit BAR will
actually have 64bit values placed in them.

Eric

2006-03-15 21:51:38

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Benjamin LaHaise <[email protected]> writes:

> On Wed, Mar 15, 2006 at 02:29:32PM -0700, Eric W. Biederman wrote:
>> If the impact is very slight or unmeasurable this means the option
>> needs to fall under CONFIG_EMBEDDED, where you can change if
>> every last bit of RAM counts but otherwise you won't care.
>
> But we have a data type that is correct for this usage: dma_addr_t.

Well the name is wrong. Because these are in general not DMA addresses,
but it may have the other desired properties. So it may be
useable.

>> Having > 32bit values on a 32bit platform is not the issue.
>>
>> Some drivers appear to puke simply because the value is 64bit. Which
>> means the driver will have problems on any 64bit kernel. That kind
>> of behavior is worth purging.
>
> Forcing it to be a 64 bit value doesn't fix that problem, so that isn't
> a valid excuse for adding bloat.

It doesn't fix it but it finds it. Which is half the battle. Once
the existing references are fixed up it makes it hard to introduce new
breakage like that, because more people see it.

As for bloat this assumes there will be some measurable bloat.
Resources are used in such a limited fashion I will be surprised if
you can measure any bloat.

Eric

2006-03-15 21:58:05

by David Miller

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

From: Arjan van de Ven <[email protected]>
Date: Wed, 15 Mar 2006 20:57:45 +0100

> please use dma_addr_t then instead of unsigned long long
>
> this is the right size on all platforms afaik (could a ppc64 person
> verify this?> ;)

Nope, it's 32-bit on Sparc64 for example and we definitely
want to support 64-bit BARs there.

2006-03-15 22:08:22

by Greg KH

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

On Wed, Mar 15, 2006 at 02:31:14PM -0500, Vivek Goyal wrote:
> Hi,
>
> Is there a reason why "start" and "end" field of "struct resource" are of
> type unsigned long. My understanding is that "struct resource" can be used
> to represent any system resource including physical memory. But unsigned
> long is not suffcient to represent memory more than 4GB on PAE systems.

As Kumar has stated, people have tried to do this in the past. Please
search the archives for the problems that will happen when you change
this.

I agree it should be fixed, but if you want to do this, you need to
audit a _lot_ of kernel code and fix it up everywhere. Please start
with the old patches posted to lkml in the past and work from there.

good luck,

greg k-h

2006-03-15 22:08:33

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Benjamin LaHaise <[email protected]> writes:

> On Wed, Mar 15, 2006 at 02:29:32PM -0700, Eric W. Biederman wrote:
>> If the impact is very slight or unmeasurable this means the option
>> needs to fall under CONFIG_EMBEDDED, where you can change if
>> every last bit of RAM counts but otherwise you won't care.
>
> But we have a data type that is correct for this usage: dma_addr_t.

Actually now that I think back there are machines with < 4GiB of ram
with 64bit pci BAR values. It is more common to have 32bit values BAR
values and > 4GiB of ram.

So I don't see dma_addr_t in general being the right choice.

Although I do suspect that in most cases dma_addr_t <= the
size of what is in struct resource.

Nor do I think struct resource is nearly as important when being
efficient, as dma_addr_t. struct resource is only used during
driver setup which is a very rare event. A few extra instructions
there likely will get lost in the noise and most of the will probably
be removed because they are in an __init section anyway.

Eric

2006-03-15 22:10:57

by Andrew Morton

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

[email protected] (Eric W. Biederman) wrote:
>
> Benjamin LaHaise <[email protected]> writes:
>
> > On Wed, Mar 15, 2006 at 02:29:32PM -0700, Eric W. Biederman wrote:
> >> If the impact is very slight or unmeasurable this means the option
> >> needs to fall under CONFIG_EMBEDDED, where you can change if
> >> every last bit of RAM counts but otherwise you won't care.
> >
> > But we have a data type that is correct for this usage: dma_addr_t.
>
> Well the name is wrong. Because these are in general not DMA addresses,
> but it may have the other desired properties. So it may be
> useable.

Yes, dma_addr_t does the right thing but has the wrong name.

I guess it should have been called bus_addr_t (?). If so, I guess a
suitable compromise might be:

#ifndef bus_addr_t
#define bus_addr_t dma_addr_t
#endif

then use bus_addr_t in struct resource.

Yes, we'd like to see the impact on vmlinux size, please.

Many of the problems will be in printks. Whoever does this work should
capture stderr from before- and after- allmodconfig builds and diff them.

The appropriate way to printk a bus_addr_t will be

printk("%lld", (unsigned long long)val);

which will increase code size even if bus_addr_t is u32, sadly.


Finally, we shouldn't just slavishly fix up great piles of printks to fix
the warnings (actually they're bugs). Lots of the printks we have are
fairly useless, and there are other ways of finding out a device's IO and
IOMEM addresses anyway.

So IMO the preferred way of fixing a printk which is generating warnings is
to delete the thing. Chances are that if the affected code actually has a
maintainer, he won't notice anyway ;)

2006-03-15 22:13:10

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

On Wed, Mar 15, 2006 at 02:59:54PM -0700, Eric W. Biederman wrote:
> Actually now that I think back there are machines with < 4GiB of ram
> with 64bit pci BAR values. It is more common to have 32bit values BAR
> values and > 4GiB of ram.

Such machines on x86 would have to be compiled with PAE. Ditto any other
architecture, as you *have* to be able to represent those physical addresses,
which requires having page tables that can map them, which requires whatever
PAE is on the platform.

> Nor do I think struct resource is nearly as important when being
> efficient, as dma_addr_t. struct resource is only used during
> driver setup which is a very rare event. A few extra instructions
> there likely will get lost in the noise and most of the will probably
> be removed because they are in an __init section anyway.

Bloat for no good reason is a bad habit to get into.

-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[email protected]>.

2006-03-15 22:18:39

by David Miller

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

From: Andrew Morton <[email protected]>
Date: Wed, 15 Mar 2006 14:13:05 -0800

> [email protected] (Eric W. Biederman) wrote:
> >
> > Benjamin LaHaise <[email protected]> writes:
> >
> > > On Wed, Mar 15, 2006 at 02:29:32PM -0700, Eric W. Biederman wrote:
> > >> If the impact is very slight or unmeasurable this means the option
> > >> needs to fall under CONFIG_EMBEDDED, where you can change if
> > >> every last bit of RAM counts but otherwise you won't care.
> > >
> > > But we have a data type that is correct for this usage: dma_addr_t.
> >
> > Well the name is wrong. Because these are in general not DMA addresses,
> > but it may have the other desired properties. So it may be
> > useable.
>
> Yes, dma_addr_t does the right thing but has the wrong name.

No it doesn't.

It's 32-bit on Sparc64 because all DMA mappings go through
the IOMMU into a 32-bit window on PCI space.

But we do most certainly want to support full 64-bit BARs
in PCI devices on sparc64.

2006-03-16 14:47:03

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [RFC][PATCH] Expanding the size of "start" and "end" field in "struct resource"

Benjamin LaHaise <[email protected]> writes:

> On Wed, Mar 15, 2006 at 02:59:54PM -0700, Eric W. Biederman wrote:
>> Actually now that I think back there are machines with < 4GiB of ram
>> with 64bit pci BAR values. It is more common to have 32bit values BAR
>> values and > 4GiB of ram.
>
> Such machines on x86 would have to be compiled with PAE. Ditto any other
> architecture, as you *have* to be able to represent those physical addresses,
> which requires having page tables that can map them, which requires whatever
> PAE is on the platform.

It depends on who the user is. In the case that inspired this thread
user space can profit from the information even when the kernel can't.
In addition it appears kernel code quality can benefit as well.

>> Nor do I think struct resource is nearly as important when being
>> efficient, as dma_addr_t. struct resource is only used during
>> driver setup which is a very rare event. A few extra instructions
>> there likely will get lost in the noise and most of the will probably
>> be removed because they are in an __init section anyway.
>
> Bloat for no good reason is a bad habit to get into.

I agree bloat for no good reason is a bad habit to get into.
Premature optimization is a worse habit to get into, as is making
problems unnecessarily complex. Until working patches are produced,
and the impact can be assessed it is to soon to seriously worry about
something that back of the napkin calculations indicate suggest
will have no measurable impact.

Eric