2002-03-25 19:31:20

by Kevin Pedretti

Subject: do_exit() and lock_kernel() semantics

Hello,
do_exit() does a lock_kernel() before it destroys the dying
process's mm context (sets task_struct->mm to NULL in 2.4 and &init_mm
in 2.2). Does lock_kernel() somehow disable interrupts? It doesn't
look like it does.


Is there any way, from interrupt context, to check if a process is still
alive (not exiting) and prevent it from exiting until the ISR is over?
I guess if lock_kernel disables interrupts globally and waits for
in-progress interrupts to complete, then this isn't a problem.


More detail:
The reason I ask is that I'm working on/modifying a set of modules that
access user space from interrupt context. I know this is not a good
thing to do generally, but for performance reasons the original author
wanted to copy directly into a mlocked user space buffer from a network
receive interrupt. Since the buffer is mlocked, it is always guaranteed
to be there and no page faults will happen (right??? I'm new at this).
Thus, for each receive we have to convert the virt address of the
user-land receive buffer to a physical address (in the kernel region)
before doing the memcpy (copy_to_user doesn't work from interrupt
context). This all seems to work fine in practice. However, it seems
to me that there is a race that can happen if a process is in the middle
of dying and a receive interrupt happens. task->mm can be set to
NULL/init_mm out from under me while doing a receive (e.g. on another cpu).


Thanks for any help.

Kevin


2002-03-25 20:13:56

by Manfred Spraul

Subject: Re: do_exit() and lock_kernel() semantics

> Thus, for each receive we have to convert the virt address of the
> user-land receive buffer to a physical address (in the kernel region)
> before doing the memcpy (copy_to_user doesn't work from interrupt
> context).

Why do you want to do that at interrupt time?
I'd call map_user_kiobuf() when the user-land buffer is set up, and then
write directly (i.e. with kmap_atomic()) into the pages stored in
iobuf->maplist[]. It avoids the page table scan at interrupt time.

Which platform do you use? map_user_kiobuf() doesn't enforce cache
coherency internally, outside of i386 you might need additional
flush_cache_whatever (see Documentation/cachetlb.txt)

--
Manfred

2002-03-25 20:19:28

by Andrew Morton

Subject: Re: do_exit() and lock_kernel() semantics

Kevin Pedretti wrote:
>
> Hello,
> do_exit() does a lock_kernel() before it destroys the dying
> process's mm context (sets task_struct->mm to NULL in 2.4 and &init_mm
> in 2.2). Does lock_kernel() somehow disable interrupts? It doesn't
> look like it does.

Nope.

> Is there any way, from interrupt context, to check if a process is still
> alive (not exiting) and prevent it from exiting until the ISR is over?

See kernel/timer.c:count_active_tasks(). It does
read_lock(&tasklist_lock) to pin everything down
while it walks the task list in an interrupt.

And you're in luck - tasklist_lock is exported to modules.

> I guess if lock_kernel disables interrupts globally and waits for
> in-progress interrupts to complete, then this isn't a problem.
>
> More detail:
> The reason I ask is that I'm working on/modifying a set of modules that
> access user space from interrupt context. I know this is not a good
> thing to do generally, but for performance reasons the original author
> wanted to copy directly into a mlocked user space buffer from a network
> receive interrupt. Since the buffer is mlocked, it is always guaranteed
> to be there and no page faults will happen (right??? I'm new at this).
> Thus, for each receive we have to convert the virt address of the
> user-land receive buffer to a physical address (in the kernel region)
> before doing the memcpy (copy_to_user doesn't work from interrupt
> context).

That sounds sane. Pin the user pages, set up a kernel virtual
mapping of them. You can't rely on userspace having performed
the mlock of course; you'll need to pin the pages in-kernel.
Probably you can just use map_user_kiobuf().

> This all seems to work fine in practice. However, it seems
> to me that there is a race that can happen if a process is in the middle
> of dying and a receive interrupt happens. task->mm can be set to
> NULL/init_mm out from under me while doing a receive (e.g. on another cpu).
>

I guess that if you've pinned the pages, then you're safe even
if the task exits - the pages won't be going away. But your
interrupt will need to deal with the kiovec, not the process mm.

This could end up meaning that your final page_cache_release()
happens in interrupt context. We may have a problem with that
if the page is still on the global LRU. See the thread
starting at http://www.uwsg.iu.edu/hypermail/linux/kernel/0202.0/1157.html
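
The pinning argument above can be pictured as plain reference counting. The following is a minimal user-space sketch, not kernel code: `toy_page`, `toy_get_page()` and `toy_put_page()` are hypothetical stand-ins for the kernel's page reference counting, and the counter is deliberately non-atomic to keep the model short. It only shows that a page with an extra reference outlives the process's own reference, and that the final release then happens wherever the last holder drops it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted page; NOT the kernel's struct page. */
struct toy_page {
    int count;              /* number of holders (non-atomic: single-threaded model) */
    int *freed_flag;        /* set to 1 when the page is really freed, for observation */
    unsigned char data[64];
};

/* Allocate a page holding one reference (the "process" reference). */
static struct toy_page *toy_page_alloc(int *freed_flag)
{
    struct toy_page *p = calloc(1, sizeof(*p));

    if (p) {
        p->count = 1;
        p->freed_flag = freed_flag;
        *freed_flag = 0;
    }
    return p;
}

/* Take an extra reference, as a driver would when pinning the page. */
static void toy_get_page(struct toy_page *p)
{
    assert(p->count > 0);
    p->count++;
}

/* Drop one reference. Returns 1 if this call freed the page, 0 otherwise. */
static int toy_put_page(struct toy_page *p)
{
    assert(p->count > 0);
    if (--p->count > 0)
        return 0;
    *p->freed_flag = 1;
    free(p);
    return 1;
}
```

In this model the driver calls `toy_get_page()` at buffer setup. The process's exit then maps to one `toy_put_page()` that returns 0, and the page stays writable until the driver's own `toy_put_page()` returns 1. That last call is the one that could land in interrupt context in the real driver.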

-

2002-03-26 00:01:25

by Kevin Pedretti

Subject: Re: do_exit() and lock_kernel() semantics

Manfred Spraul wrote:

>>Thus, for each receive we have to convert the virt address of the
>>user-land receive buffer to a physical address (in the kernel region)
>>before doing the memcpy (copy_to_user doesn't work from interrupt
>>context).
>>
>
>Why do you want to do that at interrupt time?
>I'd call map_user_kiobuf() when the user-land buffer is set up, and then
>write directly (i.e. with kmap_atomic()) into the pages stored in
>iobuf->maplist[]. It avoids the page table scan at interrupt time.
>
>Which platform do you use? map_user_kiobuf() doesn't enforce cache
>coherency internally, outside of i386 you might need additional
>flush_cache_whatever (see Documentation/cachetlb.txt)
>
>--
> Manfred
>

I'm guessing the reason is that this module was initially developed on
2.0 and ported to 2.2 and 2.4. I think the kiobuf stuff is only in
2.4+, right? I should probably work on converting things, although our
production Cplant cluster is still using 2.2. It might help reduce our
latency, although I'm guessing the page table walk is pretty quick.

This module needs to work on Alpha, i386, and ia64 so I'd have to look
into the cache issues.

Thanks,
Kevin

2002-03-26 07:54:11

by Ashok Raj

Subject: RE: do_exit() and lock_kernel() semantics

1. Your driver must have a way to synchronize with the interrupt routine
accessing this buffer. The way we do this is by synchronizing access to the
buffer and making sure the file close path cleans it up, so the interrupt
routine does not touch the buffer once the process has exited (assuming you
provide access via file handles and handle the cleanup in file close).

2. You cannot do the user-virtual-to-kernel address translation during an
interrupt call. You must do it beforehand and cache the list of page
numbers (then convert the page number to a kva before doing the copy). You
must also be aware that if the buffer crosses page boundaries (i.e. a true
virtual address range spanning pages), you might need to do the copy in
multiple pieces, since there is no function in the Linux kernel to obtain a
kva for a uva.
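
Point 2 can be sketched in plain user-space C. This is only an illustration under assumptions: `toy_pinned_buf` is a hypothetical descriptor filled in once at setup time, `pages[i]` stands in for the kernel virtual address of the i-th pinned page, and `TOY_PAGE_SIZE` is an assumed page size. The copy routine never treats the buffer as contiguous; it splits the copy at each page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/*
 * Hypothetical descriptor for one user buffer, resolved once at setup time.
 * pages[i] stands in for the kernel virtual address of the i-th pinned page;
 * offset is where the buffer starts inside pages[0]; len is its total length.
 */
struct toy_pinned_buf {
    unsigned char **pages;
    size_t nr_pages;
    size_t offset;
    size_t len;
};

/*
 * Copy n bytes from src to byte position pos of the buffer, one chunk per
 * page. Returns the number of bytes copied, which is less than n if the
 * request runs past the end of the buffer.
 */
static size_t toy_copy_to_pinned(struct toy_pinned_buf *b, size_t pos,
                                 const void *src, size_t n)
{
    const unsigned char *s = src;
    size_t done = 0;

    if (pos >= b->len)
        return 0;
    if (n > b->len - pos)
        n = b->len - pos;

    while (done < n) {
        size_t at = b->offset + pos + done;     /* byte index from start of pages[0] */
        size_t idx = at / TOY_PAGE_SIZE;        /* which cached page */
        size_t in_page = at % TOY_PAGE_SIZE;    /* offset inside that page */
        size_t chunk = TOY_PAGE_SIZE - in_page; /* room left in that page */

        if (chunk > n - done)
            chunk = n - done;
        assert(idx < b->nr_pages);
        memcpy(b->pages[idx] + in_page, s + done, chunk);
        done += chunk;
    }
    return done;
}
```

With a buffer that starts 6 bytes before the end of its first page, a 10-byte write lands as 6 bytes in the first page and 4 bytes at the start of the second, without any address translation at copy time.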


2002-03-26 10:12:40

by Itai Nahshon

Subject: Re: do_exit() and lock_kernel() semantics

On Monday 25 March 2002 21:25 pm, Kevin Pedretti wrote:
> The reason I ask is that I'm working on/modifying a set of modules that
> accesses user space from interrupt context. I know this is not a good
> thing to do generally, but for performance reasons the original author
> wanted to copy directly into a mlocked user space buffer from a network

Some drivers (I know for sure about OSS drivers) do it the opposite way.

The driver allocates a buffer (or usually multiple buffers) in physical
memory. The buffers are directly accessible from the device hardware
for DMA etc. The interrupt routines normally would not touch the buffers
(although they could) but just tell the device how to use the buffers.

The user's process that needs to use the device can use the read/write
interface, or for better performance mmap the device (which maps
the buffers into a contiguous user space) and access the buffers directly.

With the mmap api, ioctls are used to tell the process how much new data
is available (for reading) or how much was consumed by the device (so
these buffers can be written with new data).
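
As a rough user-space sketch of that bookkeeping (not taken from any OSS driver): `toy_ring` is a hypothetical shared buffer with two running counters. `toy_ring_available()` stands in for what an ioctl would report to the reader, and `toy_ring_space()` for how much the writer may fill. All names and the buffer size are assumptions, and a real driver would also need locking or memory barriers between the two sides.

```c
#include <assert.h>
#include <string.h>

#define TOY_RING_SIZE 16u   /* power of two, so counter wraparound stays consistent */

/* Hypothetical shared ring: in a driver, buf would be the mmap'ed region. */
struct toy_ring {
    unsigned char buf[TOY_RING_SIZE];
    unsigned int head;   /* total bytes produced by the device side */
    unsigned int tail;   /* total bytes consumed by the process */
};

static void toy_ring_init(struct toy_ring *r)
{
    memset(r, 0, sizeof(*r));
}

/* What a "how much new data is there?" ioctl would return. */
static unsigned int toy_ring_available(const struct toy_ring *r)
{
    return r->head - r->tail;
}

/* What a "how much can I write?" ioctl would return. */
static unsigned int toy_ring_space(const struct toy_ring *r)
{
    return TOY_RING_SIZE - toy_ring_available(r);
}

/* Device side: store up to n bytes; returns how many were stored. */
static unsigned int toy_ring_produce(struct toy_ring *r, const void *src,
                                     unsigned int n)
{
    const unsigned char *s = src;
    unsigned int i;

    if (n > toy_ring_space(r))
        n = toy_ring_space(r);
    for (i = 0; i < n; i++)
        r->buf[(r->head + i) % TOY_RING_SIZE] = s[i];
    r->head += n;
    return n;
}

/* Process side: take up to n bytes; returns how many were taken. */
static unsigned int toy_ring_consume(struct toy_ring *r, void *dst,
                                     unsigned int n)
{
    unsigned char *d = dst;
    unsigned int i;

    if (n > toy_ring_available(r))
        n = toy_ring_available(r);
    for (i = 0; i < n; i++)
        d[i] = r->buf[(r->tail + i) % TOY_RING_SIZE];
    r->tail += n;
    return n;
}
```

The design point is that the data never gets copied between kernel and user buffers; only the two counters are exchanged, which is the role the ioctls play in the scheme described above.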

-- Itai