2005-10-11 20:16:35

by Jonathan M. McCune

Subject: using segmentation in the kernel

Hello,

We're starting work on a project for the 32-bit x86 Linux kernel that
involves using segmentation in the kernel. As a first effort, we'd
like to adjust the kernel code and data segment descriptors so that
the kernel code, data segment, bss, heap and stack all lie in the linear
address range between 3GB and 4GB. How could we implement this so that
it breaks the memory management subsystem the least (or not at all if
we are lucky ;-))?

Our current thinking is to modify only the base address and the limit
of the kernel code and data segment descriptors (__KERNEL_CS and
__KERNEL_DS). We would set the base address to 3GB and the limit to 1GB.
We would also change the kernel linker script (vmlinux.lds.S) by removing
the relocation caused by PAGE_OFFSET. This would mean that the kernel
would be linked to start at address 0 + 1MB in the logical address
space. Since we would set the base address of the kernel code and data
segment descriptors to 3GB, the processor would translate all
addresses emitted by the kernel so that the kernel would use addresses
of 3GB + 1MB and above in the linear address space. Hopefully, this
would mean that all the paging code in the kernel would continue
to work correctly.
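
The translation described above can be sketched in a few lines of C. This is only a user-space model, not kernel code: `struct toy_segment` and `toy_logical_to_linear()` are hypothetical names, and the model assumes a plain expand-up segment whose size is the limit plus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one segment; this is not a kernel structure. */
struct toy_segment {
    uint32_t base;  /* linear address where the segment starts */
    uint32_t size;  /* segment length in bytes (limit + 1)     */
};

/*
 * Translate a logical offset into a linear address.
 * Returns 0 on success, or -1 when the offset lies beyond the limit,
 * which is the case where the CPU would raise a protection fault.
 */
static int toy_logical_to_linear(const struct toy_segment *seg,
                                 uint32_t offset, uint32_t *linear)
{
    assert(seg != NULL && linear != NULL);
    if (offset >= seg->size)
        return -1;
    *linear = seg->base + offset;
    return 0;
}
```

With a base of 0xC0000000 and a size of 0x40000000, a kernel linked at offset 0x00100000 would land at linear address 0xC0100000, and any offset of 1GB or more would be rejected by the limit check.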

We do not understand the mm subsystem well enough to figure out whether
our method would work at all or, if it does, which parts of the mm
subsystem would be likely to break. Can someone who understands the mm
subsystem please help us here?


Thanks!
-Jon



2005-10-11 20:35:28

by Brian Gerst

Subject: Re: using segmentation in the kernel

Jonathan M. McCune wrote:
> Hello,
>
> We're starting work on a project for the 32-bit x86 Linux kernel that
> involves using segmentation in the kernel. As a first effort, we'd
> like to adjust the kernel code and data segment descriptors so that
> the kernel code, and data segment, bss, heap and stack exist in linear
> address range between 3GB and 4 GB. How could we implement this so that
> it breaks the memory management subsystem the least (or not at all if
> we are lucky ;-))?

Why send the kernel back to the 2.0 days? There is no valid reason for
doing this with the way x86 segmentation works, which is why it was
done away with in 2.1.

--
Brian Gerst

2005-10-11 20:55:22

by Alon Bar-Lev

Subject: Re: using segmentation in the kernel

Brian Gerst wrote:
> Jonathan M. McCune wrote:
>
>> Hello,
>>
> Why send the kernel back to the 2.0 days? There is no valid reason for
> doing this with the way x86 segmentation works, which is why it was
> done away with in 2.1.
>

But with segmentation you can set code to be read-only,
disallow execution from stack, separate modules so that they
will not affect kernel and more...

The main problem with segmentation is that it is x86 specific...

Best Regards,
Alon Bar-Lev.

2005-10-11 21:12:21

by Al Viro

Subject: Re: using segmentation in the kernel

On Tue, Oct 11, 2005 at 10:24:46PM +0200, Alon Bar-Lev wrote:
> Brian Gerst wrote:
> >Jonathan M. McCune wrote:
> >
> >>Hello,
> >>
> >Why send the kernel back to the 2.0 days? There is no valid reason for
> >doing this with the way x86 segmentation works, which is why it was
> >done away with in 2.1.
> >
>
> But with segmentation you can set code to be read-only,
> disallow execution from stack, separate modules so that they
> will not affect kernel and more...

You do realize that's BS, don't you?

* an attacker who could rewrite kernel code can instead switch a pointer to a
method in any of the method tables (or the pointer to the entire method table,
while we are at it).
* overwriting a return address is trivial once you have stack smashing, and
there are plenty of interesting functions in the kernel ready to elevate
privileges.
* modules rely on practically complete access to kernel data structures, so
no amount of playing with rings will change anything for them.

2005-10-11 21:15:34

by Brian Gerst

Subject: Re: using segmentation in the kernel

Alon Bar-Lev wrote:
> Brian Gerst wrote:
>
>> Jonathan M. McCune wrote:
>>
>>> Hello,
>>>
>> Why send the kernel back to the 2.0 days? There is no valid reason
>> for doing this with the way x86 segmentation works, which is why it
>> was done away with in 2.1.
>>
>
> But with segmentation you can set code to be read-only, disallow
> execution from stack, separate modules so that they will not affect
> kernel and more...
>
> The main problem with segmentation is that it is x86 specific...

Too much pain for not enough gain. Segments are not fine-grained
enough to work well. Look at the PaX and execshield hacks for
userspace. You are far better off working at the page-table level (RO
and NX pages), which has the advantage of being portable.
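
As a rough illustration of the page-table approach, the sketch below manipulates the architectural x86 PAE page-table entry bits for "writable" (bit 1) and "no-execute" (bit 63). The `TOY_*` macro names and `toy_pte_restrict()` are hypothetical and are not the kernel's own definitions; the code only models the bit arithmetic in user space.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions in an x86 PAE page-table entry.
 * The names are illustrative, not the kernel's own macros. */
#define TOY_PTE_PRESENT (1ULL << 0)
#define TOY_PTE_RW      (1ULL << 1)
#define TOY_PTE_NX      (1ULL << 63)  /* honoured only with PAE and EFER.NXE set */

/* Return a copy of pte with write and/or execute permission removed. */
static uint64_t toy_pte_restrict(uint64_t pte, int allow_write, int allow_exec)
{
    assert((pte & TOY_PTE_PRESENT) != 0);  /* only meaningful for mapped pages */
    if (!allow_write)
        pte &= ~TOY_PTE_RW;   /* read-only: clear the R/W bit */
    if (!allow_exec)
        pte |= TOY_PTE_NX;    /* non-executable: set the NX bit */
    return pte;
}
```

Kernel text would then be mapped with `allow_write = 0`, and stack or data pages with `allow_exec = 0`. Note that on x86 a supervisor-mode write to a read-only page faults only when CR0.WP is set.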

--
Brian Gerst

2005-10-12 09:05:43

by Arjan van de Ven

Subject: Re: using segmentation in the kernel

On Tue, 2005-10-11 at 22:24 +0200, Alon Bar-Lev wrote:
> Brian Gerst wrote:
> > Jonathan M. McCune wrote:
> >
> >> Hello,
> >>
> > Why send the kernel back to the 2.0 days? There is no valid reason for
> > doing this with the way x86 segmentation works, which is why it was
> > done away with in 2.1.
> >
>
> But with segmentation you can set code to be read-only,

you can do that without segmentation too, absolutely no problem

> disallow execution from stack,

That is why CPUs have NX nowadays. And it's not like the kernel is full
of buffer overflows; due to the 4KB stack space (total), there are very,
very few static buffers on the stack at all, simply because there's no
space for them.

> separate modules so that they
> will not affect kernel and more...

and I don't believe this one iota. The only way to do this is to run
modules in ring 1, at which point you are in deep shit anyway.



2005-10-12 13:03:41

by linux-os (Dick Johnson)

Subject: Re: using segmentation in the kernel


On Tue, 11 Oct 2005, Jonathan M. McCune wrote:

> Hello,
>
> We're starting work on a project for the 32-bit x86 Linux kernel that
> involves using segmentation in the kernel. As a first effort, we'd
> like to adjust the kernel code and data segment descriptors so that
> the kernel code, and data segment, bss, heap and stack exist in linear
> address range between 3GB and 4 GB. How could we implement this so that
> it breaks the memory management subsystem the least (or not at all if
> we are lucky ;-))?
>
> Our current thinking is to modify only the base address and the limit
> of the kernel code and data segment descriptors (__KERNEL_CS and
> __KERNEL_DS). We set the base address to 3GB and the limit to 1GB. We
> would also change the kernel linker script (vmlinux.lds.S) by removing
> the relocation caused by PAGE_OFFSET. This would mean that the kernel
> would be linked to start at address 0 + 1MB in logical address
> space. Since we would set the base address of the kernel code and data
> segment descriptors to 3GB, the processor would translate all
> addresses emitted by the kernel so that the kernel would use addresses
> of 3GB + 1MB and above in the linear address space. Hopefully, this
> would mean that all the paging code in the kernel would continue
> to work correctly.
>
> We do not understand the mm subsystem well enough to figure out if our
> method would work at all or if it works what things in the mm
> subsystem would be likely to break. Can someone who understands the mm
> subsystem please help us here?
>
>
> Thanks!
> -Jon
>

On the ix86 you have a problem. Let's say that you write some
code from scratch, that runs the CPU in 32-bit linear address-mode
without paging. Then you want to activate paging. To activate
paging, you MUST have provided some code and some data-space for
your descriptors, where there is a 1:1 mapping between virtual
and bus addresses. If you don't do this, at the instant you
change to paging mode, you crash. The CPU fetches garbage.

This is why the first few megabytes of Linux are unity-mapped.
You will always need to run the kernel out of an area where
a portion of that "segment" is unity-mapped. That segment
is where the descriptors for addressing, paging, and interrupts
must reside.

If you truly wanted to run the kernel from 3-4 GB as you state,
you must have RAM there, i.e., some physical stuff so that
a 1:1 mapping could be implemented. The 3-4 GB region is
where a lot of PCI addressing occurs on 32-bit machines and,
in fact, there are some "do-not-touch" addresses in that
region as well.

Remember that the kernel runs in virtual address mode, but
the descriptors that specify that mode need to be in physical
memory, addressed at the same offset. You can experiment
by making a module that attempts to turn off paging and
then turn it back on. The kernel will crash instantly.
However, if you write some code somewhere in low address-
space where the startup code already exists, that turns
off paging, then turns it back on; and your module code
calls this other code, the machine will work fine. You
need the interrupts off when you play.

So, basically you can't do what you want with any OS that
uses ix86 type CPUs. The question is: "What was it that
you really wanted to do?" What you gave us was the
"implementation details". What I want to know is what
you intend to accomplish. The ix86 architecture lends itself
to a lot of interesting things so if I knew your intentions
I might be able to help.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.48 BogoMips).
Warning : 98.36% of all statistics are fiction.

2005-10-12 15:39:05

by Alan

Subject: Re: using segmentation in the kernel

On Mer, 2005-10-12 at 11:05 +0200, Arjan van de Ven wrote:
> > separate modules so that they
> > will not affect kernel and more...
>
> and I don't believe this one iota. The only way to do this is to run
> modules in ring 1, at which point you are in deep shit anyway.

Not necessarily. It's how Xen works on x86-32, for example. It keeps
itself protected from the entire Linux instance by using segmentation on
32-bit processors (not 64-bit, however, as x86-64 has no segments in
64-bit mode).

Doing that without major work on the kernel itself would be hard, and
you'd need to isolate out things like page table updates and verify them
whenever modules wanted to touch such stuff.

Alan

2005-10-12 15:44:52

by Arjan van de Ven

Subject: Re: using segmentation in the kernel


> > and I don't believe this one iota. The only way to do this is to run
> > modules in ring 1, at which point you are in deep shit anyway.
>
> Not necessarily. It's how Xen works on x86-32 for example. It keeps
> itself protected from the entire Linux instance by using segmentation on

It only works if you make a very small syscall-like area which you use
to talk to the "real" kernel, which is not at all how Linux modules
work right now. At that point you're just about a userspace
application anyway. Might be an interesting research project, of
course...


> 32bit processors (not 64bit however as x86-64 has no segments in 64bit)

AFAIK x86-64 recently grew segments in 64-bit mode for another, unnamed
virtualization vendor.


2005-10-12 23:57:12

by Jonathan M. McCune

Subject: Re: using segmentation in the kernel

Alan Cox wrote:

>On Mer, 2005-10-12 at 11:05 +0200, Arjan van de Ven wrote:
>
>
>>> separate modules so that they
>>>will not affect kernel and more...
>>>
>>>
>>and I don't believe this one iota. The only way to do this is to run
>>modules in ring 1, at which point you are in deep shit anyway.
>>
>>
>
>Not necessarily. It's how Xen works on x86-32 for example. It keeps
>itself protected from the entire Linux instance by using segmentation on
>32bit processors (not 64bit however as x86-64 has no segments in 64bit)
>
>Doing that without major work on the kernel itself would be hard, and
>you'd need to isolate out things like page table updates and verify them
>whenever modules wanted to touch such stuff
>
>Alan
>
>
>

linux-os (Dick Johnson) wrote:

>On the ix86 you have a problem. Let's say that you write some
>code from scratch, that runs the CPU in 32-bit linear address-mode
>without paging. Then you want to activate paging. To activate
>paging, you MUST have provided some code and some data-space for
>your descriptors, where there is a 1:1 mapping between virtual
>and bus addresses. If you don't do this, at the instant you
>change to paging mode, you crash. The CPU fetches garbage.
>
>This is why the first few megabytes of Linux are unity-mapped.
>You will always need to run the kernel out of an area where
>a portion of that "segment" is unity-mapped. That segment
>is where the descriptors for addressing, paging, and interrupts
>must reside.
>
>If you truly wanted to run the kernel from 3-4 GB as you state,
>you must have RAM there, i.e., some physical stuff so that
>a 1:1 mapping could be implemented. The 3-4 GB region is
>where a lot of PCI addressing occurs on 32-bit machines and,
>in fact, there are some "do-not-touch" addresses in that
>region as well.
>
>Remember that the kernel runs in virtual address mode, but
>the descriptors that specify that mode need to be in physical
>memory, addressed at the same offset. You can experiment
>by making a module that attempts to turn off paging and
>then turn it back on. The kernel will crash instantly.
>However, if you write some code somewhere in low address-
>space where the startup code already exists, that turns
>off paging, then turns it back on; and your module code
>calls this other code, the machine will work fine. You
>need the interrupts off when you play.
>
>So, basically you can't do what you want with any OS that
>uses ix86 type CPUs. The question is; "What was it that
>you really wanted to do?". What you gave us was the
>"implementation details". What I want to know is what
>you intend to accomplish. The ix86 architecture lends itself
>to a lot of interesting things so if I knew your intentions
>I might be able to help.
>
>Cheers,
>Dick Johnson
>

Hello,

Thanks for all the responses. The project we are working on does
involve the use of Xen, so we have the advantage of Xen taking care
of the bootstrapping hassles with "unity mapping" parts of the kernel.
To put it another way, the architecture we are really interested in is
xen-i386. We are curious about the implications of restricting the
kernel's code and data segments such that the kernel cannot read/write
user space directly. We want to set the base address of the kernel
segment descriptors to 3GB and the limit to 1GB-64MB (Xen uses the
top 64MB). We were just wondering if the best way to achieve this
would be to change the kernel linker script and the segment base
addresses appropriately. Any insight into whether this would work at
all, or what would work and how to debug something like this, would be
greatly appreciated.
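
For concreteness, here is a hedged sketch of how a descriptor with a 3GB base and a 1GB-64MB limit would be encoded in the standard 8-byte x86 segment descriptor layout. `toy_make_descriptor()` and `toy_descriptor_size()` are hypothetical helpers, not kernel or Xen functions, and the sketch assumes page granularity (G=1), a 32-bit default operand size (D/B=1), and a segment size below 4GB. Under Xen the descriptor would presumably have to be installed through the hypervisor's interface rather than written directly, and the privilege level in the access byte may differ from bare metal; neither is modelled here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack base, size and access byte into an 8-byte segment descriptor.
 * size_bytes must be a non-zero multiple of 4KB because the limit is
 * stored in 4KB units when the granularity bit is set.
 */
static uint64_t toy_make_descriptor(uint32_t base, uint32_t size_bytes,
                                    uint8_t access)
{
    assert(size_bytes != 0 && (size_bytes & 0xFFFu) == 0);
    uint32_t limit = (size_bytes >> 12) - 1;      /* 20-bit limit in pages */
    uint32_t lo = (limit & 0xFFFFu)               /* limit[15:0]           */
                | ((base & 0xFFFFu) << 16);       /* base[15:0]            */
    uint32_t hi = ((base >> 16) & 0xFFu)          /* base[23:16]           */
                | ((uint32_t)access << 8)         /* type, S, DPL, P       */
                | (limit & 0x000F0000u)           /* limit[19:16]          */
                | (0xCu << 20)                    /* G=1, D/B=1            */
                | (base & 0xFF000000u);           /* base[31:24]           */
    return ((uint64_t)hi << 32) | lo;
}

/* Recover the segment size in bytes from a G=1 descriptor. */
static uint32_t toy_descriptor_size(uint64_t desc)
{
    uint32_t lo = (uint32_t)desc;
    uint32_t hi = (uint32_t)(desc >> 32);
    uint32_t limit = (lo & 0xFFFFu) | (hi & 0x000F0000u);
    return (limit + 1u) << 12;
}
```

With base 0xC0000000, size 0x3C000000 (1GB-64MB) and access byte 0x9A (present, ring 0, readable code), the function yields 0xC0C39A000000BFFF. As a cross-check, the same packing with base 0 and the maximum limit gives the familiar flat descriptor 0x00CF9A000000FFFF.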

Thanks,
-Jon



2005-10-13 08:52:04

by Denis Vlasenko

Subject: Re: using segmentation in the kernel

On Wednesday 12 October 2005 16:03, linux-os (Dick Johnson) wrote:
> On the ix86 you have a problem. Let's say that you write some
> code from scratch, that runs the CPU in 32-bit linear address-mode
> without paging. Then you want to activate paging. To activate
> paging, you MUST have provided some code and some data-space for
> your descriptors, where there is a 1:1 mapping between virtual
> and bus addresses. If you don't do this, at the instant you
> change to paging mode, you crash. The CPU fetches garbage.
>
> This is why the first few megabytes of Linux are unity-mapped.
> You will always need to run the kernel out of an area where
> a portion of that "segment" is unity-mapped. That segment
> is where the descriptors for addressing, paging, and interrupts
> must reside.
>
> If you truly wanted to run the kernel from 3-4 GB as you state,
> you must have RAM there, i.e., some physical stuff so that
> a 1:1 mapping could be implemented. The 3-4 GB region is
> where a lot of PCI addressing occurs on 32-bit machines and,
> in fact, there are some "do-not-touch" addresses in that
> region as well.

This is untrue. After paging is enabled, you can jump
to a non-unity-mapped location and remove the small unity-mapped region.
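
The sequence can be modelled with a toy page directory in which every entry maps one 4MB region. This is a user-space simulation only: `struct toy_pgdir` and the `toy_*` functions are hypothetical, and real code would also have to reload CR3 or otherwise flush the TLB after removing the mapping.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PDE_COUNT   1024u
#define TOY_PDE_SHIFT   22u      /* each entry covers 4MB */
#define TOY_PDE_PRESENT 0x1u

/* Toy page directory; this is not the hardware or kernel layout. */
struct toy_pgdir {
    uint32_t pde[TOY_PDE_COUNT];
};

/* Map the 4MB region at virt onto the 4MB physical region at phys. */
static void toy_map_4mb(struct toy_pgdir *pd, uint32_t virt, uint32_t phys)
{
    assert((virt & 0x003FFFFFu) == 0 && (phys & 0x003FFFFFu) == 0);
    pd->pde[virt >> TOY_PDE_SHIFT] = phys | TOY_PDE_PRESENT;
}

/* Remove the mapping for the 4MB region containing virt. */
static void toy_unmap_4mb(struct toy_pgdir *pd, uint32_t virt)
{
    pd->pde[virt >> TOY_PDE_SHIFT] = 0;
}

/* Returns 0 and stores the physical address, or -1 if virt is unmapped. */
static int toy_translate(const struct toy_pgdir *pd, uint32_t virt,
                         uint32_t *phys)
{
    uint32_t pde = pd->pde[virt >> TOY_PDE_SHIFT];
    if (!(pde & TOY_PDE_PRESENT))
        return -1;
    *phys = (pde & 0xFFC00000u) | (virt & 0x003FFFFFu);
    return 0;
}
```

Mapping physical 0 at both virtual 0 and virtual 0xC0000000 models the moment paging is switched on; after the jump to the high alias, `toy_unmap_4mb(&pd, 0)` drops the unity mapping while addresses at 0xC0000000 and above keep resolving to the same physical memory.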

> Remember that the kernel runs in virtual address mode, but
> the descriptors that specify that mode need to be in physical
> memory, addressed at the same offset. You can experiment

Some of us are smart enough to add an offset when doing virt<->phys
conversion, if needed.
--
vda
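
The virt<->phys conversion with an offset mentioned above amounts to a single addition or subtraction. The sketch below assumes the conventional 3GB split and uses hypothetical names (`TOY_PAGE_OFFSET`, `toy_virt_to_phys()`, `toy_phys_to_virt()`); it is similar in spirit to what the kernel does with PAGE_OFFSET, but it is not the kernel's code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed start of the kernel's linear mapping (3GB); illustrative only. */
#define TOY_PAGE_OFFSET 0xC0000000u

/* Convert a directly mapped kernel virtual address to a physical one. */
static uint32_t toy_virt_to_phys(uint32_t virt)
{
    assert(virt >= TOY_PAGE_OFFSET);
    return virt - TOY_PAGE_OFFSET;
}

/* Convert a physical address below 1GB to its kernel virtual alias. */
static uint32_t toy_phys_to_virt(uint32_t phys)
{
    assert(phys < 0x40000000u);  /* must fit between 3GB and 4GB */
    return phys + TOY_PAGE_OFFSET;
}
```

A descriptor table that lives at physical 0x00100000 is therefore reachable at virtual 0xC0100000, which is why it does not need to sit at the same numeric address in both spaces once paging is running.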