2005-04-13 04:25:29

by Tomko

Subject: Why system call need to copy the date from the userspace before using it

Hi all,

I am new to Linux and hope someone can help me.
While reading the source code of the Linux system calls, I found
that a system call needs to call copy_from_user() to copy the data from
user space to kernel space before using it. Why not use the data directly,
since the system call already has the address? Furthermore, how does the
kernel distinguish between user space and kernel space?

Thanks a lot,

TOM


2005-04-13 05:30:41

by Vadim Lobanov

Subject: Re: Why system call need to copy the date from the userspace before using it

On Wed, 13 Apr 2005, Tomko wrote:

> Hi all,
>
> I am new to linux , hope someone can help me.
> While i am reading the source code of the linux system call , i find
> that the system call need to call copy_from_user() to copy the data from
> user space to kernel space before using it . Why not use it directly as
> the system call has got the address ? Furthermore , how to distinguish
> between user space and kernel space ?
>
> Thx a lot,
>
> TOM

The quick and simple answer to this question is: data integrity.

The main thing to understand is that, from the perspective of the
kernel, any user input provided in the form of system calls must have
immutable data. Only if the data is immutable can the kernel code parse
it and decide what to do, without getting into really hairy race
conditions. And, for that matter, it's much simpler and less error-prone
to program code where you don't have to worry about the inputs changing
around you all the time.

So, you might say, what's wrong with the user code giving the kernel a
pointer to a userland buffer? After all, the calling task will be
blocked while the system call is being executed on its behalf. The
biggest problem is that the buffer can still be modified, while the
system call is executing, by another userland thread running in the same
virtual memory context. Or, for that matter, by another process that has
this chunk of memory shared with the original task. There are
innumerable ways for the data to potentially change in the middle of the
system call, and the simplest solution ends up being to copy the data to
kernelspace before working with it. That way, no userland tasks can
change it on you.
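
To make the race described above concrete, here is a minimal user-space sketch in C. It is not kernel code: struct request, the race_hook callback, and both handler names are hypothetical, memcpy() merely stands in for copy_from_user(), and the hook simulates another thread modifying the shared buffer between the check and the use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 16

/* Hypothetical request layout, living in memory the caller can still modify. */
struct request {
    size_t len;
    char payload[MAX_LEN];
};

/* Hypothetical hook standing in for "another thread runs right here". */
typedef void (*race_hook)(struct request *shared);

/* Unsafe: validates shared->len, then fetches it a second time after the hook. */
static size_t handle_in_place(struct request *shared, race_hook hook)
{
    if (shared->len > MAX_LEN)
        return 0;                 /* request rejected */
    if (hook)
        hook(shared);             /* the "other thread" changes len here */
    return shared->len;           /* second fetch: no longer the validated value */
}

/* Safer: take a private snapshot first, then validate and use only the copy. */
static size_t handle_snapshot(struct request *shared, race_hook hook)
{
    struct request local;

    memcpy(&local, shared, sizeof(local));   /* stand-in for copy_from_user() */
    if (local.len > MAX_LEN)
        return 0;                 /* request rejected */
    if (hook)
        hook(shared);             /* later changes to shared no longer matter */
    return local.len;
}

/* Simulated attacker: enlarges the length after it has been checked. */
static void grow_len(struct request *shared)
{
    shared->len = 4096;
}
```

With grow_len as the hook and an initial length of 8, handle_in_place() ends up using 4096 even though it validated 8, while handle_snapshot() keeps using 8.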

I'm sure there are other reasons for doing the copy, that someone will
be able to chime in with. Other input is always welcome. :-)

-Vadim Lobanov

2005-04-13 06:48:14

by Vadim Lobanov

Subject: RE: Why system call need to copy the date from the userspace before using it

On Wed, 13 Apr 2005, Eshwar wrote:

>
> >The quick and simple answer to this question is: data integrity.
>
> >The main thing to understand is that, from the perspective of the
> >kernel, any user input provided in the form of system calls must have
> >immutable data. Only if the data is immutable can the kernel code parse
> >it and decide what to do, without getting into really hairy race
> >conditions. And, for that matter, it's much simpler and less error-prone
> >to program code where you don't have to worry about the inputs changing
> >around you all the time.
>
> Does this approach lead to major performance bottleneck??
>

It should not be much of a performance bottleneck -- this kind of
operation lends itself naturally to parallelization, since it has few
(if any) dependencies. The only race I can think of off-hand is the
exit() syscall, but I'm sure that's already handled elsewhere (I'm just
not sure of the details). In the end, however, if you believe my previous
email, then you should believe that the copy has to happen in any case.

I don't have any actual data points on-hand. Perhaps someone else does?

-Vadim Lobanov

2005-04-13 10:30:53

by Jan-Benedict Glaw

Subject: Re: Why system call need to copy the date from the userspace before using it

On Wed, 2005-04-13 12:21:41 +0800, Tomko <[email protected]>
wrote in message <[email protected]>:
> While i am reading the source code of the linux system call , i find
> that the system call need to call copy_from_user() to copy the data from
> user space to kernel space before using it . Why not use it directly as
> the system call has got the address ? Furthermore , how to distinguish
> between user space and kernel space ?

Think about the memory access. The page that contains the data could be
swapped out, so the kernel isn't allowed to just access it, because it's
not there.

Regards, JBG

--
Jan-Benedict Glaw [email protected] . +49-172-7608481 _ O _
"A free opinion in a free mind | Against censorship | Against war _ _ O
for a free state full of free citizens" | on the Internet! | in Iraq! O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));


2005-04-13 10:44:20

by Tomko

Subject: Re: Why system call need to copy the date from the userspace before using it

Hi,
Thank you for your reply. May I ask some more questions?

Inside a system call, the kernel often copies data by calling
copy_from_user() rather than just using strcpy(). Is that because the
memory mapping in kernel space is different from user space? For example,
suppose a user program passes a pointer a to the kernel: is it
true that a appears to be at address 0xb000 in user space but is actually
at 0xc000 in kernel space?

Thanks a lot,
TOM

2005-04-13 11:11:16

by Catalin Marinas

Subject: Re: Why system call need to copy the date from the userspace before using it

Tomko <[email protected]> wrote:
> Inside the system call , the kernel often copy the data by calling
> copy_from_user() rather than just using strcpy(), is it because the
> memory mapping in kenel space is different from user space?

No, it is because this function checks whether the access to the user
space address is OK. There are situations when it can also sleep (page
not present).

--
Catalin

2005-04-13 11:33:48

by Benjamin Herrenschmidt

Subject: Re: Why system call need to copy the date from the userspace before using it

On Wed, 2005-04-13 at 12:21 +0800, Tomko wrote:
> Hi all,
>
> I am new to linux , hope someone can help me.
> While i am reading the source code of the linux system call , i find
> that the system call need to call copy_from_user() to copy the data from
> user space to kernel space before using it . Why not use it directly as
> the system call has got the address ? Furthermore , how to distinguish
> between user space and kernel space ?

Well, there is more than one reason. But, in general, you always need
to access user memory using specific accessors, like copy_to/from_user,
get/put_user, etc. Some of these reasons are:

- Userland can give you a bogus pointer. Doing a normal access from the
kernel via a bogus pointer can lead to all sorts of funny things, which
you really do not want to happen. What if userland gives the kernel a
destination pointer that points to ... the kernel itself
or some of its data structures? It would be way too easy for userland
to crash the kernel if the kernel "trusted" pointers from
userland. So one thing those functions do is check that the pointer
is within valid userland memory bounds.

- Even if within valid memory bounds, it may still be bogus, that is,
point to a page that is not mapped, or be a destination pointer pointing to
a read-only page, or cause all sorts of other faults when accessed.
Those special access functions are designed to "recover" from these
errors. Instead of the kernel crashing/Oops'ing because of the bad
access, the kernel page fault handler will "notice" that the access
comes from one of these special functions and will do some black magic so
that instead of crashing, the access function will just return with an
error that can then be passed back to userspace (usually EFAULT).

- Some architectures don't have user and kernel memory mapped at the
same time (think about x86 in 4G/4G mode for example). In that case,
accessing user memory requires some specific memory management tricks
that are architecture specific. Those functions take care of that.

There may even be more I don't have in mind at the moment, but the above
is already enough to justify having specific accessor functions for
kernel code to access userland originated pointers.
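
The first reason above, the bounds check, can be sketched in plain C. This is only an illustration under stated assumptions: the 0xC0000000 limit models the classic 3G/1G split of a 32-bit i386 address space, and user_range_ok() and check_user_buffer() are hypothetical names, not kernel functions. The comparison is arranged so that addr + size can never wrap around.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 3G/1G split: user space occupies [0, USER_LIMIT). */
#define USER_LIMIT 0xC0000000u

/*
 * True when [addr, addr + size) lies entirely below USER_LIMIT.
 * Comparing against USER_LIMIT - size avoids computing addr + size,
 * which could overflow a 32-bit value and wrap to a small number.
 */
static bool user_range_ok(uint32_t addr, uint32_t size)
{
    return size <= USER_LIMIT && addr <= USER_LIMIT - size;
}

/* Hypothetical syscall prologue: reject bad ranges with -EFAULT. */
static int check_user_buffer(uint32_t addr, uint32_t size)
{
    return user_range_ok(addr, size) ? 0 : -EFAULT;
}
```

A buffer at 0xB0000000 of 4096 bytes passes, a buffer starting at 0xC0000000 is rejected, and so is one at 0xFFFFFFF0 with length 0x20, whose end address would wrap around to a low value if computed naively.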

Ben.


2005-04-13 12:00:09

by Hacksaw

Subject: Re: Why system call need to copy the date from the userspace before using it

>>Why not use it directly
>Some of these reasons are:

It seems like you gave reasons why userland pointers shouldn't be trusted, not
why userland data should be copied into kernel land. All the problems you
mentioned would have to be solved by the kernel regardless of copying the data
around.

Ummm... Except for the "who's mapped now" problem. That's pretty weird. I guess
that's something that comes with trying to use tons of RAM in a 32-bit system.

I thought the big issue was the need to lock the page(s) during the call, and
maybe some tricky races which made the idea difficult.
--
The key is realizing the whole world is stupid and being happy anyway
http://www.hacksaw.org -- http://www.privatecircus.com -- KB1FVD


2005-04-13 12:40:46

by linux-os (Dick Johnson)

Subject: Re: Why system call need to copy the date from the userspace before using it

On Wed, 13 Apr 2005, Hacksaw wrote:

>>> Why not use it directly
>> Some of these reasons are:
>
> It seems like you gave reason why userland pointers shouldn't be trusted, not
> why userland data should be copied into kernel land. All the problems you
> mentioned would have to be solved by the kernel regardless of copying the data
> around.
>

You do not seem to understand. Assume that I did a read(fd, buf, len) and
the length would overflow a user-mode buffer. One needs to make sure
that the kernel is protected and the user gets a seg-fault. Since the
kernel, in kernel mode, can do anything it pleases, including destroying
itself, one needs to make sure that it won't. Therefore a special
kind of memcpy() was designed, called copy_to/from_user to protect the
kernel.

> Ummm... Except for the who's mapped now problem. That's pretty weird. I guess
> that's something that comes with trying to use tons of RAM in a 32 bit system.
>
> I thought the big issue was the need to lock the page(s) during the call, and
> maybe some tricky races which made the idea difficult.
> --

The kernel does NOT have to copy data from user-space before
using it. In fact, user-mode pointers are valid in kernel-space
when the kernel is performing a function on behalf of the user-
mode code. The problem is that data-space is usually allocated
in user-mode code (like using malloc()). When the kernel needs
to access that buffer, it has no clue how much the user-code
allocated. It can't trust that the user-code put in the right
buffer length. Therefore, it needs to set up a user-mode trap
if the access attempts to go beyond the buffer length.

One example of not copying to/from user mode is memory-mapped
data. The kernel knows how much data was actually mapped. It
also knows if it will page-fault when being accessed. If
DMA is being performed to such memory, it needs to be reserved
so it won't be paged. It also has to be non-cached so that
writes that the CPU didn't do can be read properly by the CPU.

Under these conditions, the kernel-mode code writes or DMAs
directly to some user buffer. User-mode code needs to find
out when new data are available, perhaps using select() or
poll().

If you are writing a driver, never call copy_to/from_user
with a spin-lock held. You need to allow page-faults to
occur because the user's RAM may have been "borrowed" by
somebody else (paged out). A page-fault needs to occur to
replace the user's RAM-data and reconnect to the user's
working-set.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

2005-04-13 18:39:06

by Theodore Ts'o

Subject: Re: Why system call need to copy the date from the userspace before using it

On Wed, Apr 13, 2005 at 08:40:05AM -0400, Richard B. Johnson wrote:
> The kernel does NOT have to copy data from user-space before
> using it.

Incorrect. It must, or the kernel code in question is by definition
buggy.

> In fact, user-mode pointers are valid in kernel-space
> when the kernel is performing a function on behalf of the user-
> mode code.

On some architectures, this is true. But not all architectures, and
not in all circumstances. For example, even on the x86 architecture,
in the 4G/4G mode, a user-mode pointer is *not* valid when kernel code
is running. You must use copy_to_user()/copy_from_user(). Simply
dereferencing a user-mode pointer is a BUG. It might work sometimes,
on some architectures, but not everywhere. Therefore, for correctly
written kernel code, you must not do it.

- Ted

2005-04-13 19:21:21

by linux-os (Dick Johnson)

Subject: Re: Why system call need to copy the date from the userspace before using it

On Wed, 13 Apr 2005, Theodore Ts'o wrote:

> On Wed, Apr 13, 2005 at 08:40:05AM -0400, Richard B. Johnson wrote:
>> The kernel does NOT have to copy data from user-space before
>> using it.
>
> Incorrect. It must, or the kernel code in question is by definition
> buggy.
>

What? Explain why a memory-mapped buffer can't be DMAed directly?

>> In fact, user-mode pointers are valid in kernel-space
>> when the kernel is performing a function on behalf of the user-
>> mode code.
>
> On some architectures, this is true. But not all architectures, and
> not in all circumstances. For example, even on the x86 architecture,
> in the 4G/4G mode, a user-mode pointer is *not* valid when kernel code
> is running. You must use copy_to_user()/copy_from_user(). Simply
> dereferencing a user-mode pointer is a BUG. It might work sometimes,
> on some architectures, but not everywhere. Therefore, for correctly
> written kernel code, you must not do it.

You apparently didn't even bother to read my explanation of why
copy_to/from_user was necessary unless the buffer(s) were memory-mapped,
marked reserved, and set to no-cache. In that case
you can DMA directly to/from user-space. Perhaps you just wanted
to argue?

> - Ted

Again, as long as you can guarantee that the RAM you are using
is reserved so the kernel won't use it for paged RAM, and as
long as it's accessible in both user-mode and kernel-mode,
which means memory-mapped, either the kernel or the user can
use it as it sees fit. If it can't, the kernel is buggy. In
fact, there is no way the kernel could prevent it from being
used in this manner. Since, by definition, the kernel
will leave reserved memory alone, and memory-mapped space
will not fault, there is no way for the kernel to even know
how it is being accessed.


Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

2005-04-14 02:10:43

by Tomko

Subject: Re: Why system call need to copy the date from the userspace before using it

Catalin Marinas wrote:

>Tomko <[email protected]> wrote:
>
>
>>Inside the system call , the kernel often copy the data by calling
>>copy_from_user() rather than just using strcpy(), is it because the
>>memory mapping in kenel space is different from user space?
>>
>>
>
>No, it is because this function checks whether the access to the user
>space address is OK. There are situations when it can also sleep (page
>not present).
>
>
>
What do you mean by "OK"? Kernel space should have the right to access any
memory address. Can you explain in detail what you mean by "OK"?

2005-04-14 02:19:12

by David Schwartz

Subject: RE: Why system call need to copy the date from the userspace before using it


> Catalin Marinas wrote:

> >Tomko <[email protected]> wrote:

> >>Inside the system call , the kernel often copy the data by calling
> >>copy_from_user() rather than just using strcpy(), is it because the
> >>memory mapping in kenel space is different from user space?

> >No, it is because this function checks whether the access to the user
> >space address is OK. There are situations when it can also sleep (page
> >not present).

> what u means "OK"? kernel space should have right to access any memory
> address , can u expained in details what u means "OK"?

Kernel space does have the right to access any memory but user space
doesn't, so the kernel can't just take an address from user space and use
it. Consider:

int i=open("/dev/null", O_RDWR);
write(i, NULL, 4);

Are you seriously suggesting the kernel should just read from address zero
because a user-space program asked it to?
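
What the kernel does instead can be observed from user space. The following is a small sketch, assuming a Linux system; errno_of_pipe_write() is a hypothetical helper name. A pipe is used here instead of /dev/null because a driver that never touches the buffer (the /dev/null write handler is generally such a case) has no opportunity to notice the bad pointer.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Writes len bytes from buf into a fresh pipe.
 * Returns 0 if write() succeeded, the errno value if write() failed,
 * or -1 if the pipe could not be created.
 */
static int errno_of_pipe_write(const void *buf, size_t len)
{
    int fds[2];
    int err;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;

    errno = 0;
    n = write(fds[1], buf, len);   /* the kernel must read from buf here */
    err = (n < 0) ? errno : 0;     /* save errno before close() can touch it */

    close(fds[0]);
    close(fds[1]);
    return err;
}
```

Called with a valid four-byte buffer, the helper should return 0; called with a NULL buffer, the expectation on Linux is EFAULT: the process is not killed and the kernel does not crash, the system call simply fails.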

DS


2005-04-14 14:01:13

by Helge Hafting

Subject: Re: Why system call need to copy the date from the userspace before using it

Tomko wrote:

> Catalin Marinas wrote:
>
>>
>> No, it is because this function checks whether the access to the user
>> space address is OK. There are situations when it can also sleep (page
>> not present).
>>
>>
>>
> what u means "OK"? kernel space should have right to access any
> memory address , can u expained in details what u means "OK"?

The user may not have any right to that memory, which means he
has no right to ask the kernel to mess with it either, even if the
kernel is authorized to do so.

Just as you can't ask a cop to jail someone. Sure, the cop has the power
to arrest, but he will check that there is reason to do so and not
automatically assume that you are right.

Helge Hafting

2005-04-16 04:51:15

by Hacksaw

Subject: Re: Why system call need to copy the date from the userspace before using it

Sorry if this bugs anyone, but I'm learning things here.

What I would expect the kernel to do is this:

system_call_data_prep (userdata, size) {
    if !4G/4G {
        for each page from userdata to userdata+size {
            if the page is swapped out, swap it in
            if the page is not owned by the user process, return -ENOWAYMAN
            otherwise, lock the page
        }
        return userdata;
    }
    else { // kernel land and userland are mutually exclusive
        copy the data into kernel land
        return kernelland_copy_of_userdata;
    }
}

(And then the syscall would need to run the opposite function,
system_call_data_unprep, to unlock pages.)

Hmm, maybe that interface sucks.

Is it anything close to that?

--
The best is the enemy of the good -- Voltaire
The Good Enough is the enemy of the Great -- Me
http://www.hacksaw.org -- http://www.privatecircus.com -- KB1FVD


2005-04-16 05:18:46

by Vadim Lobanov

Subject: Re: Why system call need to copy the date from the userspace before using it

On Sat, 16 Apr 2005, Hacksaw wrote:

> Sorry if this bugs anyone, but I'm learning things here.
>
> What I would expect the kernel to do is this:
>
> system_call_data_prep (userdata, size) {
>     if !4G/4G {
>         for each page from userdata to userdata+size {
>             if the page is swapped out, swap it in
>             if the page is not owned by the user process, return -ENOWAYMAN
>             otherwise, lock the page
>         }
>         return userdata;
>     }
>     else { // kernel land and userland are mutually exclusive
>         copy the data into kernel land
>         return kernelland_copy_of_userdata;
>     }
> }
>
> (And then the syscall would need to run the opposite function
> sys_call_data_unprep to unlock pages.)
>
> Hmm, maybe that interface sucks.

That's one approach. Unfortunately, it's not what the kernel currently
does. The root of the problem is -- it needs to copy the data, even if
the kernel can access userspace data. There are many reasons why
this is a simpler way to program the interface; if you want actual
concrete examples, let me know.

In order to accomplish the copy_from_user() procedure, from the i386
perspective, the kernel first figures out where userspace is telling it
to look for the data buffer. It checks if the LAST page belongs to
userland, and fails if not; this works because the kernel sits in higher
memory. Then it simply does the direct copy. If during the copy it hits
an invalid page, the exception handler code will run, realize that the
exception occurred because of the copy, and return an error code right
then and there.

Lots of details left out, but this is the 10,000 foot view, I think.
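
The two steps described above (range check first, then a copy that stops at the first bad page) can be modelled with a toy user-space simulation. Everything here is an assumption made for illustration: struct toy_mm, the 16-byte pages, and toy_copy_from_user() are invented names and sizes, and the "fault" is just a lookup in a table. The return value follows the copy_from_user() convention of reporting the number of bytes that could not be copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE   16u                       /* tiny pages keep the example readable */
#define USER_PAGES  4u                        /* pages 0..3 model user space */
#define USER_LIMIT  (PAGE_SIZE * USER_PAGES)  /* addresses at or above this model the kernel */

/* Hypothetical toy address space: which user pages are mapped, and their bytes. */
struct toy_mm {
    bool mapped[USER_PAGES];
    unsigned char mem[USER_LIMIT];
};

/* Returns the number of bytes NOT copied (0 means complete success). */
static size_t toy_copy_from_user(void *dst, const struct toy_mm *mm,
                                 size_t addr, size_t n)
{
    unsigned char *out = dst;
    size_t i;

    /* Step 1: the whole range must end below the user/kernel boundary. */
    if (n > USER_LIMIT || addr > USER_LIMIT - n)
        return n;

    /* Step 2: copy byte by byte and stop at the first unmapped page. */
    for (i = 0; i < n; i++) {
        size_t a = addr + i;

        if (!mm->mapped[a / PAGE_SIZE])
            return n - i;          /* simulated fault: report what is left */
        out[i] = mm->mem[a];
    }
    return 0;
}
```

In this model, a 16-byte copy starting at address 8 with page 1 unmapped copies the 8 bytes remaining in page 0 and reports 8 bytes left over, while a range that would cross USER_LIMIT is rejected before any byte is touched.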

-Vadim Lobanov

> Is it anything close to that?

2005-04-16 08:30:42

by Hacksaw

Subject: Re: Why system call need to copy the date from the userspace before using it

>if you want actual concrete examples, let me know.
I'd love a few, but maybe privately?


I can certainly see where always copying is simpler; I certainly consider this
to be an optimization, which must be looked at carefully, lest you end up with
something that speeds things up a little but adds a big maintenance headache.

But this strikes me as a potentially big speed up for movement through
devices. (Or is there already a mechanism for that?)

>It checks if the LAST page belongs to userland, and fails if not;
I can't claim to know how memory assignment goes. I suppose that this
statement means that the address space the userland program sees is contiguous?

If not, I could see a scenario where that would allow someone to get at data
that isn't theirs, by allocating around until they got a chunk far up in
memory, then just specifying a start address way lower with the right size to
end up on their chunk.

I'm assuming this isn't a workable scenario, right?
--
You are in a maze of twisty passages, all alike. Again.
http://www.hacksaw.org -- http://www.privatecircus.com -- KB1FVD


2005-04-16 19:36:13

by Vadim Lobanov

Subject: Re: Why system call need to copy the date from the userspace before using it

On Sat, 16 Apr 2005, Hacksaw wrote:

> >if you want actual concrete examples, let me know.
> I'd love a few, but maybe privately?
>
>
> I can certainly see where always copying is simpler; I certainly consider this
> to be an optimization, which must be looked at carefully, lest you end up with
> that which speed things up a little, but adds a big maintenance headache.
>
> But this strikes me as a potentially big speed up for movement through
> devices. (Or is there already a mechanism for that?)
>
> >It checks if the LAST page belongs to userland, and fails if not;
> I can't claim to know how memory assignment goes. I suppose that this
> statement means that the address space the userland program sees is continuous?
>
> If not I could see a scenario where that would allow someone to get at data
> that isn't theirs, by allocating around until they got an chunk far up in
> memory, then just specified a start address way lower with the right size to
> end up on their chunk.
>
> I'm assuming this isn't a workable scenario, right?

For the copy_from_user() operation, we're still talking about virtual
memory. In virtual memory terms, each userspace program resides in the
lower addresses, while the kernel takes up the higher addresses. The
user program can pass the kernel any virtual memory pointer it feels
like.

The kernel first checks that it won't try to read from itself, simply by
checking that the last page, belonging to the highest virtual address of
the supposed buffer, does not belong to kernel space. So far so good.

Now the only thing that can go wrong is that the user program told the
kernel that the buffer exists, but some or all of the pages are not
mapped in virtual memory. This is taken care of "transparently" during
the copy -- if we try to copy from a page that isn't mapped, the kernel
will catch the exception, realize that the buffer was bogus, and return
an error.

All of this works because virtual memory is much more restrictive than
physical memory in terms of what data resides where.
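
The scenario asked about (a low start address and a size chosen to end inside a valid chunk) can be tried from user space. This is a hedged sketch assuming Linux and an anonymous mmap(); write_across_hole() is a hypothetical helper. It maps three pages, unmaps the middle one, and hands the whole range to write() on a pipe.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Stores the requested length in *total and returns either the number of
 * bytes write() accepted or a negative errno value.
 */
static long write_across_hole(long *total)
{
    long ps = sysconf(_SC_PAGESIZE);
    long result;
    int fds[2];
    char *base;

    *total = 3 * ps;
    base = mmap(NULL, (size_t)*total, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -errno;
    memset(base, 'a', (size_t)*total);

    /* Punch a hole: first and last pages stay valid, the middle one does not. */
    if (munmap(base + ps, (size_t)ps) != 0 || pipe(fds) != 0) {
        result = -errno;
    } else {
        ssize_t n = write(fds[1], base, (size_t)*total);

        result = (n < 0) ? -errno : (long)n;
        close(fds[0]);
        close(fds[1]);
    }

    munmap(base, (size_t)*total);  /* unmapping a range with a hole is allowed */
    return result;
}
```

On a typical Linux system one would expect either a short count covering only the first page or -EFAULT. The only property the sketch relies on is that the kernel never reports all three pages as written, because the copy stops at the unmapped page in the middle.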

-Vadim Lobanov