We like to design our software in a modular way. For example, a
libForwardTableManager.so for an InfiniBand switch manager might
manage 128 MBytes of shared-memory data. Ten or more applications
will call the APIs in libForwardTableManager.so to get/set the
forwarding table data.

Since the data managed by this shared library is globally shared,
there is a chance that someone might corrupt it through an invalid
pointer. Even if we never hand out a pointer, an application can
still corrupt the shared memory through an uninitialized pointer on
its stack. This type of problem is extremely hard to debug.
What I would like to do is use the mprotect() API to turn read/write
access to the globally shared memory on and off. That way, the only
code that can possibly corrupt the shared table is the APIs in
libForwardTableManager.so, which makes debugging this kind of problem
easier. If an application corrupts the protected memory, it will
cause a seg-fault, which also makes debugging simple.
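
Roughly what I have in mind, as a sketch (illustrative only, not the
real library code; the table size and the entry update are made up):

/* Shared table kept read-only except inside the library call that
 * updates it.  Illustrative sketch; a real manager would attach to a
 * named shared-memory object rather than an anonymous mapping. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define TABLE_BYTES (128u * 1024 * 1024)   /* 128 MB shared table */

static uint8_t *table;

static void table_init(void)
{
    table = mmap(NULL, TABLE_BYTES, PROT_READ,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (table == MAP_FAILED) {
        perror("mmap");
        exit(1);
    }
}

/* The only code path allowed to write the table: open the write
 * window, do the update, close the window again. */
static void table_set_entry(size_t off, uint8_t val)
{
    if (mprotect(table, TABLE_BYTES, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect(rw)");
        exit(1);
    }
    table[off] = val;
    if (mprotect(table, TABLE_BYTES, PROT_READ) != 0) {
        perror("mprotect(ro)");
        exit(1);
    }
}

int main(void)
{
    table_init();
    table_set_entry(42, 7);                /* legal update via the API */
    printf("entry 42 = %u\n", table[42]);  /* reads are always allowed */
    /* table[43] = 1; */                   /* a stray write here would seg-fault */
    return 0;
}
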
Questions for the Linux kernel gurus:

Is this a reasonable thing to do in Linux?

Any idea of the overhead of such a scheme, in terms of microseconds
added to each API call? I would like to see the overhead in the
sub-microsecond range, since an application might call the APIs in
libForwardTableManager.so at a rate of 100k API calls per second.
I used the TSC counter to profile the mprotect() overhead in QNX (a
micro-kernel RTOS). The overhead is 130 milliseconds for 6 MB of
shared memory, which is extremely high. I think the reason is that
all QNX APIs turn into IPC messages to the process-manager task,
which causes context switches to other tasks.
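
For reference, a Linux/x86 version of that kind of measurement could
look roughly like this (a sketch only, not my original QNX test code;
__rdtsc() is the compiler's TSC intrinsic, and the cycle count still
has to be converted using the TSC frequency):

/* Time a single mprotect() call over a 6 MB shared mapping with the
 * TSC.  Sketch only; numbers vary with CPU and kernel. */
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <x86intrin.h>                      /* __rdtsc() */

#define REGION_BYTES (6u * 1024 * 1024)     /* 6 MB, as in the QNX test */

int main(void)
{
    char *p = mmap(NULL, REGION_BYTES, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Touch every page so the timing is not dominated by page faults. */
    for (size_t i = 0; i < REGION_BYTES; i += 4096)
        ((volatile char *)p)[i] = 0;

    uint64_t t0 = __rdtsc();
    if (mprotect(p, REGION_BYTES, PROT_READ) != 0) {
        perror("mprotect");
        return 1;
    }
    uint64_t t1 = __rdtsc();

    printf("mprotect() over %u bytes: %llu TSC cycles\n",
           REGION_BYTES, (unsigned long long)(t1 - t0));
    return 0;
}
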
For x86 systems, is there a way to specify 4MB page-table entries
instead of 4K page-table entries when using the mmap() API? With
4MB page-table entries, mprotect() should take only 8 iterations to
change the access bits for 32 MB of shared memory, as compared to
8k iterations with 4K page-table entries. Am I correct on this?
Thanks
----------------------------------------------------------------
Tony Lee Nokia Networks, Inc.
Work:(650)864-6565 545 Whisman Drive - Bld C
Mountain View, CA 94043
> For x86 systems, is there a way to specify 4MB page-table entries
> instead of 4K page-table entries when using the mmap() API?
> With 4MB page-table entries, mprotect() should take only 8 iterations
> to change the access bits for 32 MB of shared memory, as compared to
> 8k iterations with 4K page-table entries. Am I correct on this?
The mainstream kernel has no real support for 4MB pages - some
experimental work has been done but little else. Even then the TLB
flush has a non-trivial overhead. On SMP the effect will be more
significant, since it must ensure that the other threads on other
processors also see the updated page tables.
Alan
> -----Original Message-----
> From: ext Andrew Morton [mailto:[email protected]]
> Sent: Friday, March 22, 2002 2:58 PM
> To: Lee Tony.P (NET/MtView)
> Cc: [email protected]
> Subject: Re: mprotect() api overhead.
>
>
> [email protected] wrote:
> >
> > ...
>
> Seems that mprotect() against a 6 megabyte region takes five
> microseconds in Linux. Which is too expensive for you.
>
> It would be better if you could map the same memory region twice:
> once with PROT_READ and once with PROT_READ|PROT_WRITE.
> Then just use the appropriate pointer at the appropriate time.
>
Andrew,
Thanks for the info.
5 microseconds is definitely a lot better than the number I got with
QNX. Mapping the same region twice doesn't help, though. Here's why:
App A calls my API, and the API keeps pointers to the read-write view
of the shared memory in local variables on its stack. App B has an
uninitialized pointer on its stack; a write through that pointer
lands in the read-write mapping and the shared memory is corrupted.
It is impossible to track down exactly who caused the corruption in
this case.
----------------------------------------------------------------
Tony Lee Nokia Networks, Inc.
Work:(650)864-6565 545 Whisman Drive - Bld C
Mountain View, CA 94043
[email protected] wrote:
>
> ...
Seems that mprotect() against a 6 megabyte region takes five microseconds
in Linux. Which is too expensive for you.
It would be better if you could map the same memory region twice:
once with PROT_READ and once with PROT_READ|PROT_WRITE.
Then just use the appropriate pointer at the appropriate time.
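
A minimal sketch of that double mapping, assuming a POSIX
shared-memory object (the object name and size are illustrative and
error handling is abbreviated). Only the table-manager code ever sees
the read-write pointer; everything else gets the read-only one.

/* Map one shared-memory object twice: a read-write view kept private
 * to the updater and a read-only view handed out for lookups. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define TABLE_BYTES (6u * 1024 * 1024)

int main(void)
{
    int fd = shm_open("/fwd_table_demo", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, TABLE_BYTES) != 0) {
        perror("shm_open/ftruncate");
        return 1;
    }

    /* Writable view: used only inside the table-manager code. */
    char *rw = mmap(NULL, TABLE_BYTES, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    /* Read-only view: the pointer handed to everyone else. */
    const char *ro = mmap(NULL, TABLE_BYTES, PROT_READ,
                          MAP_SHARED, fd, 0);
    if (rw == MAP_FAILED || ro == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    rw[0] = 'X';                     /* update through the writable view   */
    printf("ro[0] = %c\n", ro[0]);   /* visible through the read-only view */
    /* ((char *)ro)[0] = 'Y'; */     /* would seg-fault: view is read-only */

    shm_unlink("/fwd_table_demo");
    return 0;
}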
> -----Original Message-----
> From: ext Alan Cox [mailto:[email protected]]
> Sent: Friday, March 22, 2002 5:03 PM
> To: Lee Tony.P (NET/MtView)
> Cc: [email protected]
> Subject: Re: mprotect() api overhead.
>
>
> > For x86 systems, is there a way to specify 4MB page-table entries
> > instead of 4K page-table entries when using the mmap() API?
> > With 4MB page-table entries, mprotect() should take only 8
> > iterations to change the access bits for 32 MB of shared memory,
> > as compared to 8k iterations with 4K page-table entries.
> > Am I correct on this?
>
> The mainstream kernel has no real support for 4MB pages - some
> experimental work has been done but little else. Even then the TLB
> flush has a non-trivial overhead. On SMP the effect will be more
> significant, since it must ensure that the other threads on other
> processors also see the updated page tables.
Alan, thanks for the info.

I just talked to someone from HP Labs who worked on the PA-RISC chip
6 years ago. PA-RISC had a special page-table setup that lets one
application call an API in another application's virtual memory in
7 instructions and without a TLB flush. I was told such a "feature"
is in Itanium. That would be (will be?) utopia... :-)

As for the SMP case, it is less of an issue for my application: when
a user calls my API in the .so, the mprotect() (or that 7-instruction
HP sequence) opens access to the shared memory for them regardless of
which CPU they are running on. If another thread on another CPU needs
the window open, it will also call my API, which in turn calls
mprotect() to open the window for that CPU.

Think of it as one software module that can call APIs in another
module running in another virtual memory space, with low overhead,
but that also has memory protection against other software (exe or
.so) without entering the kernel. That would be extremely useful in
large-scale fault-tolerant software development.

Just imagine a world where Apache's mod_tcl.so crashes but the httpd
server keeps running, or zlib has a double-free bug but it cannot
touch Apache, since every API that can call into the Apache module is
somehow protected by the hardware page tables. It has to be fast,
otherwise I would just use CGI and standard Unix IPC, which give us
the protection but not the performance.
----------------------------------------------------------------
Tony Lee Nokia Networks, Inc.
> As for the SMP case, it is less of an issue for my application: when
> a user calls my API in the .so, the mprotect() (or that 7-instruction
> HP sequence) opens access to the shared memory for them regardless of
> which CPU they are running on.
That still requires cross-processor synchronization, so it will still
take the same hit.
Alan
On Sat, Mar 23, 2002 at 02:20:37AM +0000, Alan Cox wrote:
> > As for the SMP case, it is less of an issue for my application: when
> > a user calls my API in the .so, the mprotect() (or that 7-instruction
> > HP sequence) opens access to the shared memory for them regardless
> > of which CPU they are running on.
>
> That still requires cross-processor synchronization, so it will still
> take the same hit.
It's actually an instruction on ia64, so the overhead is fairly low
(similar to a cache miss). That said, Linux doesn't have the ability
to share portions of page tables between processes at present, so it
doesn't matter.
-ben
> From: <[email protected]>
> Date: Fri, 22 Mar 2002 22:10:20 PST
> We like to design our software in a modular way. For example, a
> libForwardTableManager.so for an InfiniBand switch manager might
> manage 128 MBytes of shared-memory data. Ten or more applications
> will call the APIs in libForwardTableManager.so to get/set the
> forwarding table data.
>[...]
> What I would like to do is use the mprotect() API to turn read/write
> access to the globally shared memory on and off. That way, the only
> code that can possibly corrupt the shared table is the APIs in
> libForwardTableManager.so
Tony, I think you need to rethink your API. E.g. what does the switch
manager do, and why did you decide to keep any data in shared memory,
of all things? Why do you need several applications to access the
switch forwarding table? Perhaps, if you answer those questions, you
will not need to bang on mprotect() so hard anymore.
-- Pete