Hello, everyone.
I have a question:
Why does Linux choose 896MB as the start of ZONE_HIGHMEM and
the end of ZONE_NORMAL? Is it just based on experience?
What are the advantages?
Hi Hayfeng,
On Tue, Apr 6, 2010 at 8:07 PM, hayfeng Lee <[email protected]> wrote:
> Hello, everyone.
> I have a question:
> Why does Linux choose 896MB as the start of ZONE_HIGHMEM and
> the end of ZONE_NORMAL? Is it just based on experience?
> What are the advantages?
This is not an advantage but a limitation of 32-bit processors and the
architecture. Only physical memory in the first 896MB is directly mapped
into the kernel virtual memory address space. This is called
ZONE_NORMAL. To access any physical memory in ZONE_HIGHMEM, the kernel
has to set up page table entries to indirectly map the physical memory
to a virtual memory address (I think around 128MB or so worth of page
table entries are reused for this purpose). On the other hand, on 64-bit
architectures, the entire physical memory is directly mapped and
accessible to the kernel. ZONE_HIGHMEM doesn't exist on 64-bit.
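As a rough illustration (a minimal sketch, not from any real driver; the
helper name is made up), this is the kmap()/kunmap() pattern used to touch
a page that may live in ZONE_HIGHMEM; for a lowmem page the calls are
nearly free because the permanent direct mapping is reused:

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/string.h>

/* Hypothetical helper, just to show the pattern. */
static void zero_possibly_highmem_page(struct page *page)
{
        void *vaddr = kmap(page);       /* create (or reuse) a kernel mapping */
        memset(vaddr, 0, PAGE_SIZE);
        kunmap(page);                   /* drop the temporary mapping again   */
}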
Take the above with a grain of salt; someone with better knowledge
of this intricate topic can give a more detailed explanation :)
Hope this helps, thanks,
-Joel
On 04/06/2010 08:02 AM, Joel Fernandes wrote:
> Hi Hayfeng,
>
> On Tue, Apr 6, 2010 at 8:07 PM, hayfeng Lee <[email protected]> wrote:
> Hello, everyone.
> I have a question:
> Why does Linux choose 896MB as the start of ZONE_HIGHMEM and
> the end of ZONE_NORMAL? Is it just based on experience?
> What are the advantages?
>
> This is not an advantage but a limitation of 32-bit processors and the
> architecture. Only physical memory in the first 896MB is directly mapped
> into the kernel virtual memory address space. This is called
> ZONE_NORMAL. To access any physical memory in ZONE_HIGHMEM, the kernel
> has to set up page table entries to indirectly map the physical memory
> to a virtual memory address (I think around 128MB or so worth of page
> table entries are reused for this purpose). On the other hand, on 64-bit
> architectures, the entire physical memory is directly mapped and
> accessible to the kernel. ZONE_HIGHMEM doesn't exist on 64-bit.
>
> Take the above with a grain of salt; someone with better knowledge
> of this intricate topic can give a more detailed explanation :)
>
The ELF ABI specifies that user space has 3 GB available to it. That
leaves 1 GB for the kernel. The kernel, by default, uses 128 MB for I/O
mapping, vmalloc, and kmap support, which leaves 896 MB for LOWMEM.
All of these boundaries are configurable; with PAE enabled the user
space boundary has to be on a 1 GB boundary.
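As a back-of-the-envelope sketch of that arithmetic (illustrative macro
names, not the kernel's own symbols, assuming the default 3G/1G split):

/* All of these are build-time configurable; defaults shown. */
#define KERNEL_VSPACE_MB   1024                /* 4 GB total minus the 3 GB user split */
#define VMALLOC_KMAP_MB     128                /* I/O mapping + vmalloc + kmap window  */
#define LOWMEM_MB   (KERNEL_VSPACE_MB - VMALLOC_KMAP_MB)   /* = 896 MB directly mapped */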
-hpa
On Tue, Apr 6, 2010 at 11:17 AM, H. Peter Anvin <[email protected]> wrote:
> On 04/06/2010 08:02 AM, Joel Fernandes wrote:
>> Hi Hayfeng,
>>
>> On Tue, Apr 6, 2010 at 8:07 PM, hayfeng Lee <[email protected]> wrote:
>>> Hello, everyone.
>>> I have a question:
>>> Why does Linux choose 896MB as the start of ZONE_HIGHMEM and
>>> the end of ZONE_NORMAL? Is it just based on experience?
>>> What are the advantages?
>>
>> This is not an advantage but a limitation of 32-bit processors and the
>> architecture. Only physical memory in the first 896MB is directly mapped
>> into the kernel virtual memory address space. This is called
>> ZONE_NORMAL. To access any physical memory in ZONE_HIGHMEM, the kernel
>> has to set up page table entries to indirectly map the physical memory
>> to a virtual memory address (I think around 128MB or so worth of page
>> table entries are reused for this purpose). On the other hand, on 64-bit
>> architectures, the entire physical memory is directly mapped and
>> accessible to the kernel. ZONE_HIGHMEM doesn't exist on 64-bit.
>>
>> Take the above with a grain of salt; someone with better knowledge
>> of this intricate topic can give a more detailed explanation :)
>>
>
> The ELF ABI specifies that user space has 3 GB available to it. That
> leaves 1 GB for the kernel. The kernel, by default, uses 128 MB for I/O
> mapping, vmalloc, and kmap support, which leaves 896 MB for LOWMEM.
>
> All of these boundaries are configurable; with PAE enabled the user
> space boundary has to be on a 1 GB boundary.
>
> -hpa
the VM split is also configurable when building the kernel (for 32-bit
processors).
On 04/06/2010 12:20 PM, Frank Hu wrote:
>>
>> The ELF ABI specifies that user space has 3 GB available to it. That
>> leaves 1 GB for the kernel. The kernel, by default, uses 128 MB for I/O
>> mapping, vmalloc, and kmap support, which leaves 896 MB for LOWMEM.
>>
>> All of these boundaries are configurable; with PAE enabled the user
>> space boundary has to be on a 1 GB boundary.
>>
>
> the VM split is also configurable when building the kernel (for 32-bit
> processors).
I did say "all these boundaries are configurable". Rather explicitly.
-hpa
Hi Peter,
On Wed, Apr 7, 2010 at 1:14 AM, H. Peter Anvin <[email protected]> wrote:
> On 04/06/2010 12:20 PM, Frank Hu wrote:
>>>
>>> The ELF ABI specifies that user space has 3 GB available to it. That
>>> leaves 1 GB for the kernel. The kernel, by default, uses 128 MB for I/O
>>> mapping, vmalloc, and kmap support, which leaves 896 MB for LOWMEM.
>>>
>>> All of these boundaries are configurable; with PAE enabled the user
>>> space boundary has to be on a 1 GB boundary.
>>>
>>
>> the VM split is also configurable when building the kernel (for 32-bit
>> processors).
>
> I did say "all these boundaries are configurable". Rather explicitly.
>
I thought the 896 MB was a hardware limitation on 32-bit architectures
and something that cannot be configured? Or am I missing something
here? Also, the VM splits refer to "virtual memory", while ZONE_* and
the 896MB we were discussing refer to "physical memory". How then is
discussing VM splits pertinent here?
Thanks,
-Joel
On 04/06/2010 01:01 PM, Joel Fernandes wrote:
> Hi Peter,
>
> On Wed, Apr 7, 2010 at 1:14 AM, H. Peter Anvin <[email protected]> wrote:
>> On 04/06/2010 12:20 PM, Frank Hu wrote:
>>>>
>>>> The ELF ABI specifies that user space has 3 GB available to it. That
>>>> leaves 1 GB for the kernel. The kernel, by default, uses 128 MB for I/O
>>>> mapping, vmalloc, and kmap support, which leaves 896 MB for LOWMEM.
>>>>
>>>> All of these boundaries are configurable; with PAE enabled the user
>>>> space boundary has to be on a 1 GB boundary.
>>>>
>>>
>>> the VM split is also configurable when building the kernel (for 32-bit
>>> processors).
>>
>> I did say "all these boundaries are configurable". Rather explicitly.
>>
>
> I thought the 896 MB was a hardware limitation on 32-bit architectures
> and something that cannot be configured? Or am I missing something
> here? Also, the VM splits refer to "virtual memory", while ZONE_* and
> the 896MB we were discussing refer to "physical memory". How then is
> discussing VM splits pertinent here?
>
It's not a hardware limitation. Rather, it has to do with how the 4 GB
of virtual address space is carved up. LOWMEM specifically refers to
the amount of memory which is permanently mapped into the virtual
address space, whereas HIGHMEM is mapped in and out on demand -- a
fairly expensive operation.
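A minimal sketch of what the permanent mapping means in practice, assuming
the default i386 PAGE_OFFSET of 0xC0000000 (the real kernel helpers are
__va()/__pa(); the names below are made up for illustration):

#define SKETCH_PAGE_OFFSET 0xC0000000UL  /* default 3G/1G split */

/* Lowmem: converting between virtual and physical is a fixed offset,
 * so no lookup is needed to find the corresponding address. */
static inline void *sketch_phys_to_virt(unsigned long phys)
{
        return (void *)(phys + SKETCH_PAGE_OFFSET);
}

static inline unsigned long sketch_virt_to_phys(const void *virt)
{
        return (unsigned long)virt - SKETCH_PAGE_OFFSET;
}

Highmem pages have no such fixed relationship, which is why they have to be
kmap()ed in and out.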
-hpa
On Tue, Apr 6, 2010 at 12:44 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/06/2010 12:20 PM, Frank Hu wrote:
>>>
>>> The ELF ABI specifies that user space has 3 GB available to it. That
>>> leaves 1 GB for the kernel. The kernel, by default, uses 128 MB for I/O
>>> mapping, vmalloc, and kmap support, which leaves 896 MB for LOWMEM.
>>>
>>> All of these boundaries are configurable; with PAE enabled the user
>>> space boundary has to be on a 1 GB boundary.
>>>
>>
>> the VM split is also configurable when building the kernel (for 32-bit
>> processors).
>
> I did say "all these boundaries are configurable". Rather explicitly.
>
> -hpa
>
I thought that you could only configure how to split the VM, like 1G/3G or
2G/2G, but the DMA zone size and the 128MB space for I/O are not
configurable. The NORMAL zone size would then be derived from the VM
split and the hard-coded DMA zone and 128 MB space sizes.
I am not a guru in this space... so I might be wrong.
On 04/06/2010 01:15 PM, Frank Hu wrote:
>
> I thought that you could only configure how to split the VM, like 1G/3G or
> 2G/2G, but the DMA zone size and the 128MB space for I/O are not
> configurable. The NORMAL zone size would then be derived from the VM
> split and the hard-coded DMA zone and 128 MB space sizes.
>
> I am not a guru in this space... so I might be wrong.
And you are. The vmalloc zone (not DMA zone -- that's something
entirely different) is configurable via the vmalloc= kernel command line
option.
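For example, booting with something like the following (the size is just
for illustration) grows the vmalloc area to 256 MB, and lowmem shrinks
correspondingly:

    vmalloc=256M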
-hpa
On 04/06/2010 04:27 PM, Youngwhan Song wrote:
> Nice explanation, Venkatram,
>
> Just one question popped up in my mind.
>
> What if the actual physical memory is only 256MB? How does the kernel divide
> virtual memory? Do we need to specify the region to the kernel? Or will
> the kernel decide it automatically?
>
If there is less than 896 MB of physical memory, the vmalloc region is
automatically extended (in your case, it will be 768 MB in size.) There
will be no HIGHMEM in such a case, and if you are compiling your own
kernel you will gain considerable speed by disabling HIGHMEM support
completely.
This, of course, was the norm back when Linux was first created, and a
typical amount of memory was 8 MB or so. That we'd have gigabytes of
memory seemed very distant at the time.
-hpa
If the last 128MB of the kernel's 1GB space is used for highmem,
while it's also used for I/O/vmalloc, how does this work?
Xianghua
On Tue, Apr 6, 2010 at 6:32 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/06/2010 04:27 PM, Youngwhan Song wrote:
>> Nice explanation, Venkatram,
>>
>> Just one question popped up in my mind.
>>
>> What if the actual physical memory is only 256MB? How does the kernel divide
>> virtual memory? Do we need to specify the region to the kernel? Or will
>> the kernel decide it automatically?
>>
>
> If there is less than 896 MB of physical memory, the vmalloc region is
> automatically extended (in your case, it will be 768 MB in size.) There
> will be no HIGHMEM in such a case, and if you are compiling your own
> kernel you will gain considerable speed by disabling HIGHMEM support
> completely.
>
> This, of course, was the norm back when Linux was first created, and a
> typical amount of memory was 8 MB or so. That we'd have gigabytes of
> memory seemed very distant at the time.
>
> -hpa
On 04/06/2010 07:04 PM, Venkatram Tummala wrote:
> Hey Xiao,
>
> The last 128MB is not used for highmem. The last 128MB is used for data
> structures (page tables etc.) to support highmem. Highmem is not
> something which is "INSIDE" the kernel's virtual address space. Highmem
> refers to a region of "physical memory" which can be mapped into the
> kernel's virtual address space through page tables.
>
> Regards,
> Venkatram Tummala
>
Not quite.
The vmalloc region is for *anything which is dynamically mapped*, which
includes I/O, vmalloc, and HIGHMEM (kmap).
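For instance, a driver mapping its device registers gets its virtual
addresses from that same region (the address, size and function names below
are made up purely for illustration):

#include <linux/errno.h>
#include <linux/io.h>

static void __iomem *regs;

static int sketch_map_device(void)
{
        regs = ioremap(0xFEB00000UL, 0x1000);   /* virtual address comes from the vmalloc area */
        if (!regs)
                return -ENOMEM;
        return 0;
}

static void sketch_unmap_device(void)
{
        iounmap(regs);                          /* give the address range back */
}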
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On 04/06/2010 09:05 PM, Chetan Nanda wrote:
>
> I have a question here: what if I have a 32-bit system with 2GB of RAM?
> In that case, would my 896MB - 2GB of RAM be inaccessible?
> My understanding of the subject is:
> Only 896 MB of physical RAM is directly mapped onto the kernel's 1G virtual
> address space. We still require page table entries to do that (the page
> tables would be an identity mapping). But for the rest of the RAM, i.e. whenever
> there is a need to access physical RAM beyond 896MB, that page will be
> mapped onto pages from the 128MB of kernel virtual address space (1GB - 896MB
> = 128MB), and AFAIK kmap is just for that.
>
> Please correct me if I am wrong.
>
Correct.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On 04/06/2010 10:57 PM, Venkatram Tummala wrote:
> Just a note, Chetan.
>
> We can't exactly say that we require "page table settings" to map that
> 896 MB of physical RAM. It is an identity-mapped segment (1-1 mapping),
> so we don't require the "page tables". The virtual address will be equal to
> the physical address + PAGE_OFFSET. It is just the addition of an offset.
>
No, we still need page tables for the identity-mapped segment.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Wed, Apr 7, 2010 at 12:48 AM, Venkatram Tummala
<[email protected]> wrote:
> I completely agree with you. I was just trying to clarify Xianghua's
> statement "last 128 MB is used for HIGHMEM". I got the feeling that he
> thought that the last 128MB can be used for vmalloc, I/O and for HIGHMEM. So, I
> was clarifying that the last 128MB is not "used for highmem" but is used to
> support highmem (among many other things). That was what I intended.
>
> On Tue, Apr 6, 2010 at 7:09 PM, H. Peter Anvin <[email protected]> wrote:
>>
>> On 04/06/2010 07:04 PM, Venkatram Tummala wrote:
>> > Hey Xiao,
>> >
>> > The last 128MB is not used for highmem. The last 128MB is used for data
>> > structures (page tables etc.) to support highmem. Highmem is not
>> > something which is "INSIDE" the kernel's virtual address space. Highmem
>> > refers to a region of "physical memory" which can be mapped into the
>> > kernel's virtual address space through page tables.
>> >
>> > Regards,
>> > Venkatram Tummala
>> >
>>
>> Not quite.
>>
>> The vmalloc region is for *anything which is dynamically mapped*, which
>> includes I/O, vmalloc, and HIGHMEM (kmap).
>>
>> -hpa
>>
>> --
>> H. Peter Anvin, Intel Open Source Technology Center
>> I work for Intel. I don't speak on their behalf.
>>
>
>
Thanks Venkatram, do these sound right:
1. All HIGHMEM (physical addresses beyond 896MB) is kmapped into
the last 128MB of kernel "virtual" address space (using page tables stored
in the last 128MB of physical address space). That also implies it's a very
limited virtual space for a large-memory system, and you need to kunmap when
you're done with it (so you can kmap other physical memory in).
I'm not familiar with large-memory systems, so I'm not sure how kmap copes
with that using this limited 128M window, assuming the kernel has a 1:3 split.
2. The last 128MB of physical address space can be used for page tables (kmap),
vmalloc, I/O, etc.
Regards,
Xianghua
Frank Hu <[email protected]> writes:
> I thought that you could only configure how to split the VM, like 1G/3G or
> 2G/2G, but the DMA zone size and the 128MB space for I/O are not
> configurable. The NORMAL zone size would then be derived from the VM
> split and the hard-coded DMA zone and 128 MB space sizes.
Only the DMA zone size is fixed - a hardware property of PC/AT-style DMA
controllers and the ISA bus.
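A minimal sketch of how that fixed zone is used (assuming an x86 box with a
legacy ISA-style device; the helper name is made up):

#include <linux/gfp.h>

/* GFP_DMA restricts the allocation to ZONE_DMA, i.e. the first 16 MB
 * that PC/AT-style DMA controllers can actually address. */
static void *alloc_isa_dma_page(void)
{
        return (void *)__get_free_page(GFP_KERNEL | GFP_DMA);
}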
--
Krzysztof Halasa
Dear Venkatram,
Thanks for your kindness and detailed explanation.
So your opinion is that choosing 896MB is just a matter of balance?
Then I want to know whether the decision for 896 was based on a lot
of experiments.
I think it is an important thing.
Best wishes.
2010/4/7 Venkatram Tummala <[email protected]>
>
> First of all,
>
> The total virtual address space of the user process + the virtual address space of the kernel should ALWAYS equal 4GB (on 32-bit).
>
> So, you have the luxury of deciding the split between user address space and kernel address space.
>
> Let's consider the two extreme alternatives to choosing 896MB.
>
> First extreme: You don't want to choose too little memory for the identity-mapped segment (e.g. 512MB instead of 896MB), because you don't need all that additional memory for the vmalloc area. As the identity-mapped segment is a 1-1 mapping, finding & accessing the corresponding physical address is easier & faster, so you want to have the maximum memory possible in this identity-mapped segment.
>
> Second extreme: You don't want to choose too much memory either (e.g. 960MB instead of 896MB), because that would leave you with insufficient memory for the vmalloc area.
>
> So, you have to balance the two extremes. The kernel guys decided that 128MB (1024MB - 896MB) is sufficient for the vmalloc area, so the rest of the address space is 1-1 mapped.
>
> Hope this is clear.
>
> Regards,
> Venkatram Tummala
>
>
>
>
>
>
> On Tue, Apr 6, 2010 at 10:08 PM, tek-life <[email protected]> wrote:
>>
>> Thanks for your detailed answer.
>> But I am also confused: why don't we choose 64MB of physical memory
>> space for ZONE_HIGHMEM?
>> It's known that during memory initialization the kernel creates the main
>> page table, and the kernel maps physical addresses by adding 3G. So the
>> kernel maps the 0 to 896MB physical memory space to the 3G~3G+896MB
>> virtual space. Why can't we map 0~512MB to 3G~3G+512MB, with the rest
>> (>512MB) given to ZONE_HIGHMEM for dynamic mapping?
>> The focus is on 896MB and not other values.
>> Why choose 896? Why not 512 or 960 or something else?
>>
>> 2010/4/7 Venkatram Tummala <[email protected]>
>> >
>> > Joel,
>> >
>> > To make things clear, 896 MB is not a hardware limitation. The 3GB:1GB split can be configured during the kernel build but the split cannot be changed dynamically.
>> >
>> > You are correct that ZONE_* refers to groupings of physical memory, but the very concept of zones is logical and not physical.
>> >
>> > Now, why does ZONE_NORMAL have only 896MB on a 32-bit system?
>> >
>> > If you recall the concept of virtual memory, you will remember that its aim is to provide an illusion to each user process that it has all the theoretical maximum memory possible on that specific architecture, which is 4GB in this case, and that it is the only process running on the system. The kernel internally deals with pages, swapping pages in & out to create this illusion. The advantage is that user processes do not have to care about how much physical memory is actually present in the system.
>> >
>> > So, out of this 4GB, it was conceptually decided that 3GB is the process's virtual address space and 1GB is the kernel virtual address space. The kernel maps these 3GB of user processes' virtual address space to physical memory using page tables. The kernel itself can address just 1GB of virtual addresses. This 1GB of virtual addresses is directly mapped (1-1 mapping) into physical memory without using page tables. If the kernel wants to address more physical memory, it has to kmap the high memory (ZONE_HIGHMEM), which sets up the page tables etc. So, you can imagine it like this: "Whenever a context switch occurs, the 3GB virtual address space of the previously running process is replaced by the virtual address space of the newly selected process, and the 1GB always remains with the kernel." Note that all of this is virtual (that is, conceptual); it is only an illusion.
>> >
>> > So, out of this 1GB of kernel virtual address space that is 1-1 mapped into physical memory (without requiring page tables), 0-16MB is used by device drivers, 896MB-1024MB is used by the kernel for vmalloc, kmap, etc., which leaves 16MB-896MB, and this range is "called" ZONE_NORMAL.
>> >
>> > Giving specific emphasis to the word "called" in the previous sentence.
>> >
>> > In summary, the kernel can only directly access 896 MB of physical RAM because it only has 1GB of virtual address space available, out of which the lower 16MB is used for DMA by device drivers and 896MB-1024MB is used to support kmap, vmalloc, etc. And note that this limitation is not because of the hardware, but because of the conceptual division of the virtual address space into user address space & kernel address space.
>> >
>> > For example, you can make the split 2G-2G instead of 3G-1G. So, the kernel can now use 2GB of virtual address space (directly mapped to 2GB of physical memory). You can also make the split 1GB:3GB instead of 3GB:1GB as already explained.
>> >
>> > Hope this clears the confusion.
>> >
>> > Regards,
>> > Venkatram Tummala
>> >
>> >
>> > On Tue, Apr 6, 2010 at 1:01 PM, Joel Fernandes <[email protected]> wrote:
>> >>
>> >> Hi Peter,
>> >>
>> >> On Wed, Apr 7, 2010 at 1:14 AM, H. Peter Anvin <[email protected]> wrote:
>> >> > On 04/06/2010 12:20 PM, Frank Hu wrote:
>> >> >>>
>> >> >>> The ELF ABI specifies that user space has 3 GB available to it. That
>> >> >>> leaves 1 GB for the kernel. The kernel, by default, uses 128 MB for I/O
>> >> >>> mapping, vmalloc, and kmap support, which leaves 896 MB for LOWMEM.
>> >> >>>
>> >> >>> All of these boundaries are configurable; with PAE enabled the user
>> >> >>> space boundary has to be on a 1 GB boundary.
>> >> >>>
>> >> >>
>> >> >> the VM split is also configurable when building the kernel (for 32-bit
>> >> >> processors).
>> >> >
>> >> > I did say "all these boundaries are configurable". Rather explicitly.
>> >> >
>> >>
>> >> I thought the 896 MB was a hardware limitation on 32-bit architectures
>> >> and something that cannot be configured? Or am I missing something
>> >> here? Also, the VM splits refer to "virtual memory", while ZONE_* and
>> >> the 896MB we were discussing refer to "physical memory". How then is
>> >> discussing VM splits pertinent here?
>> >>
>> >> Thanks,
>> >> -Joel
>> >>
>> >
>>
>
On 04/07/2010 09:48 AM, Himanshu Aggarwal wrote:
> I think for some architectures, the position of highmem is constrained
> by hardware as well. It is not always a kernel decision and not always
> configurable as in the case of x86.
This is correct.
> In case of MIPS32, low memory is between 0 and 512 MB and high memory
> starts above 512 MB. Also the user space is of size 2 GB.
>
> Please see the definition of macros PAGE_OFFSET and HIGHMEM_START at :
> http://lxr.linux.no/linux+v2.6.33/arch/mips/include/asm/mach-generic/spaces.h
Right so far...
> This is because MIPS32 processors have KSEG0 and KSEG1 segments lying
> between 0 and 512 MB and KSEG2/3 lies above it.
>
> Maybe someone on the group can confirm this.
Wrong. I have to say this thread has been just astonishing in the
amount of misinformation.
On MIPS32, userspace is 0-2 GB, kseg0 is 2.0-2.5 GB and kseg1 is 2.5-3.0
GB. kseg2/3 (3.0-4.0 GB), which invokes the TLB, is used for the
vmalloc/iomap/kmap area.
LOWMEM has to fit inside kseg0, so LOWMEM is limited to 512 MB in the
current Linux implementation.
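In rough numbers (illustrative constants, not the kernel's exact macro names):

/* Classic MIPS32 virtual layout described above. */
#define SKETCH_KSEG0_BASE  0x80000000UL  /* 512 MB, cached, fixed-mapped (no TLB)   */
#define SKETCH_KSEG1_BASE  0xA0000000UL  /* 512 MB, uncached, fixed-mapped (no TLB) */
#define SKETCH_KSEG2_BASE  0xC0000000UL  /* 1 GB, translated through the TLB        */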
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Wed, Apr 7, 2010 at 10:44 PM, H. Peter Anvin <[email protected]> wrote:
> On 04/07/2010 09:48 AM, Himanshu Aggarwal wrote:
>> I think for some architectures, the position of highmem is constrained
>> by hardware as well. It is not always a kernel decision and not always
>> configurable as in the case of x86.
>
>
> This is correct.
>
>> In case of MIPS32, low memory is between 0 and 512 MB and high memory
>> starts above 512 MB. Also the user space is of size 2 GB.
>>
>> Please see the definition of macros PAGE_OFFSET and HIGHMEM_START at :
>> http://lxr.linux.no/linux+v2.6.33/arch/mips/include/asm/mach-generic/spaces.h
>
> Right so far...
>
>> This is because MIPS32 processors have KSEG0 and KSEG1 segments lying
>> between 0 and 512 MB and KSEG2/3 lies above it.
>>
>> Maybe someone on the group can confirm this.
>
> Wrong. I have to say this thread has been just astonishing in the
> amount of misinformation.
>
> On MIPS32, userspace is 0-2 GB, kseg0 is 2.0-2.5 GB and kseg1 is 2.5-3.0
> GB. kseg2/3 (3.0-4.0 GB), which invokes the TLB, is used for the
> vmalloc/iomap/kmap area.
>
> LOWMEM has to fit inside kseg0, so LOWMEM is limited to 512 MB in the
> current Linux implementation.
http://www.johnloomis.org/microchip/pic32/memory/memory.html
So what is the memory division here on MIPS - again 1:3?
Is kseg2 already a 1 GB address space?
-Nobin