LinuxLists.cc - zap_page_range(): TLB flush race

2000-04-08 19:59:20

Subject: zap_page_range(): TLB flush race

it seems we have a smp race in zap_page_range():

When we remove a page from the page tables, we must call:

flush_cache_page();
pte_clear();
flush_tlb_page();
free_page();

We must not free the page before we have called flush_tlb_xy(),
otherwise the second cpu could access memory that already freed.

but zap_page_range() calls free_page() before the flush_tlb() call.

Is that really a bug, has anyone a good idea how to fix that?

filemap_sync() calls flush_tlb_page() for each page, but IMHO this is a
really bad idea, the performance will suck with multi-threaded apps on
SMP.

Perhaps build a linked list, and free later?
We could abuse the next pointer from "struct page".
--
Manfred

2000-04-08 21:03:28

by Kanoj Sarcar

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

>
> it seems we have a smp race in zap_page_range():
>
> When we remove a page from the page tables, we must call:
>
> flush_cache_page();
> pte_clear();
> flush_tlb_page();
> free_page();
>
> We must not free the page before we have called flush_tlb_xy(),
> otherwise the second cpu could access memory that already freed.
>
> but zap_page_range() calls free_page() before the flush_tlb() call.
>
> Is that really a bug, has anyone a good idea how to fix that?

Why do you think this is a bug? After the pte_clear, we need to flush
tlb, so that if anyone wants to drag in the mapping (by accessing the
virtual address), he will fault (since translation is not in tlb) and
wait on mmap_sem. After that, when zap_page_range has freed the page,
and released the mmap_sem, the faulter will find he was trying to access
what is now invalid memory and get a signal/killed.

But a race does exist in establish_pte(), when the flush_tlb happens
_before_ the set_pte(), another thread might drag in the old translation
on a different cpu.

>
> filemap_sync() calls flush_tlb_page() for each page, but IMHO this is a
> really bad idea, the performance will suck with multi-threaded apps on
> SMP.

The best you can do probably is a flush_tlb_range?

Kanoj

>
> Perhaps build a linked list, and free later?
> We could abuse the next pointer from "struct page".
> --
> Manfred
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux.eu.org/Linux-MM/
>

2000-04-08 22:46:23

by Manfred Spraul

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

Kanoj Sarcar wrote:
>
> >
> > it seems we have a smp race in zap_page_range():
> >
> > When we remove a page from the page tables, we must call:
> >
> > flush_cache_page();
> > pte_clear();
> > flush_tlb_page();
> > free_page();
> >
> > We must not free the page before we have called flush_tlb_xy(),
> > otherwise the second cpu could access memory that already freed.
> >
> > but zap_page_range() calls free_page() before the flush_tlb() call.
> >
> > Is that really a bug, has anyone a good idea how to fix that?
>
> Why do you think this is a bug? After the pte_clear, we need to flush
> tlb, so that if anyone wants to drag in the mapping (by accessing the
> virtual address), he will fault (since translation is not in tlb) and
> wait on mmap_sem.

Because the second cpu can use a stale tlb entry. We must free the page
_after_ the flush, not before the flush.

Example:
cpu1, cpu2: execute 2 threads from one process.
cpu3: unrelated thread that allocates a new page.

cpu1:
1) writes to one page in a tight loop. The tlb entry won't be discared
by the cpu without an explicit flush.

cpu2:
2) sys_munmap()
* zap_page_range(): calls free_page() for each page in the area.
do_munmap() for a 500 MB block will take a few milliseconds.

cpu3:
3) somewhere: get_free_page(). Now it gets a pointer to a page that cpu1
still writes to.

cpu2:
4) zap_page_range() returns, now the tlb is flushed.

cpu1:
5) received the ipi, the local tlb is flushed, page fault.
* but: cpu1 stomped on the page that was allocated by cpu3.

>
> But a race does exist in establish_pte(), when the flush_tlb happens
> _before_ the set_pte(), another thread might drag in the old translation
> on a different cpu.
>

Yes, establish_pte() is broken. We should reverse the calls:

set_pte(); /* update the kernel page tables */
update_mmu(); /* update architecture specific page tables. */
flush_tlb(); /* and flush the hardware tlb */

> >
> > filemap_sync() calls flush_tlb_page() for each page, but IMHO this is a
> > really bad idea, the performance will suck with multi-threaded apps on
> > SMP.
>
> The best you can do probably is a flush_tlb_range?

filemap_sync() calls both flush_tlb_range() and flush_tlb_page() on each
page, I just don't know which call I should remove :-/

Btw, I couldn't find an update_mmu_range() function, what's the exact
purpose of update_mmu_cache()? mips64 uses this function.

Can't we merge update_mmu_cache() with flush_tlb_page()?

--
Manfred

2000-04-08 23:23:44

by Kanoj Sarcar

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

>
> Kanoj Sarcar wrote:
> >
> > >
> > > it seems we have a smp race in zap_page_range():
> > >
> > > When we remove a page from the page tables, we must call:
> > >
> > > flush_cache_page();
> > > pte_clear();
> > > flush_tlb_page();
> > > free_page();
> > >
> > > We must not free the page before we have called flush_tlb_xy(),
> > > otherwise the second cpu could access memory that already freed.
> > >
> > > but zap_page_range() calls free_page() before the flush_tlb() call.
> > >
> > > Is that really a bug, has anyone a good idea how to fix that?
> >
> > Why do you think this is a bug? After the pte_clear, we need to flush
> > tlb, so that if anyone wants to drag in the mapping (by accessing the
> > virtual address), he will fault (since translation is not in tlb) and
> > wait on mmap_sem.
>
> Because the second cpu can use a stale tlb entry. We must free the page
> _after_ the flush, not before the flush.
>
> Example:
> cpu1, cpu2: execute 2 threads from one process.
> cpu3: unrelated thread that allocates a new page.
>
> cpu1:
> 1) writes to one page in a tight loop. The tlb entry won't be discared
> by the cpu without an explicit flush.
>
> cpu2:
> 2) sys_munmap()
> * zap_page_range(): calls free_page() for each page in the area.
> do_munmap() for a 500 MB block will take a few milliseconds.
>
> cpu3:
> 3) somewhere: get_free_page(). Now it gets a pointer to a page that cpu1
> still writes to.
>
> cpu2:
> 4) zap_page_range() returns, now the tlb is flushed.
>
> cpu1:
> 5) received the ipi, the local tlb is flushed, page fault.
> * but: cpu1 stomped on the page that was allocated by cpu3.
>

Yup, you got it, I had forgotten this issue. Btw, a bunch of people are
aware of the pte handling races in Linux (Linus, Alan, David M etc). In
fact, around 2.2.10 timeframe, Alan and I did a quick study of such races.
I posted a patch, we had some conversation, but ultimately we have not
fixed any of these races.

If you are interested, you can read a discussion and the patch at
http://reality.sgi.com/kanoj_engr/smppte.patch

The above race that you point out is covered in a somewhat different
fashion: "1. During munmap, clones should be restricted while the file
pages are being written out to disk, else some writes are not synced
to disk at unmap time."

>
> >
> > But a race does exist in establish_pte(), when the flush_tlb happens
> > _before_ the set_pte(), another thread might drag in the old translation
> > on a different cpu.
> >
>
> Yes, establish_pte() is broken. We should reverse the calls:
>
> set_pte(); /* update the kernel page tables */
> update_mmu(); /* update architecture specific page tables. */
> flush_tlb(); /* and flush the hardware tlb */
>

People are aware of this too, it was introduced during the 390 merge.
I tried talking to the IBM guy about this, I didn't see a response from
him ...

I think what we now need is a critical mass, something that will make us
go "okay, lets just fix these races once and for all".

>
> > >
> > > filemap_sync() calls flush_tlb_page() for each page, but IMHO this is a
> > > really bad idea, the performance will suck with multi-threaded apps on
> > > SMP.
> >
> > The best you can do probably is a flush_tlb_range?
>
> filemap_sync() calls both flush_tlb_range() and flush_tlb_page() on each
> page, I just don't know which call I should remove :-/

Hmm, this is related to the above race I quoted... see the patch in this
regard.

Kanoj

>
> Btw, I couldn't find an update_mmu_range() function, what's the exact
> purpose of update_mmu_cache()? mips64 uses this function.
>
> Can't we merge update_mmu_cache() with flush_tlb_page()?
>
> --
> Manfred
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux.eu.org/Linux-MM/
>

2000-04-08 23:29:48

by Alan

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

> > Yes, establish_pte() is broken. We should reverse the calls:
> >
> > set_pte(); /* update the kernel page tables */
> > update_mmu(); /* update architecture specific page tables. */
> > flush_tlb(); /* and flush the hardware tlb */
> >
>
> People are aware of this too, it was introduced during the 390 merge.
> I tried talking to the IBM guy about this, I didn't see a response from
> him ...

Strange since I did and it included you

> I think what we now need is a critical mass, something that will make us
> go "okay, lets just fix these races once and for all".

Basically establish_pte() has to be architecture specific, as some processors
need different orders either to avoid races or to handle cpu specific
limitations.

Alan

2000-04-08 23:45:08

by David Miller

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

From: [email protected] (Kanoj Sarcar)
Date: Sat, 8 Apr 2000 14:11:05 -0700 (PDT)

> filemap_sync() calls flush_tlb_page() for each page, but IMHO this is a
> really bad idea, the performance will suck with multi-threaded apps on
> SMP.

The best you can do probably is a flush_tlb_range?

People, look at the callers of filemap_sync, it does range tlb/cache
flushes so the flushes in filemap_sync_pte() are in fact spurious.

Later,
David S. Miller
[email protected]

2000-04-08 23:49:36

by Kanoj Sarcar

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

>
> > > Yes, establish_pte() is broken. We should reverse the calls:
> > >
> > > set_pte(); /* update the kernel page tables */
> > > update_mmu(); /* update architecture specific page tables. */
> > > flush_tlb(); /* and flush the hardware tlb */
> > >
> >
> > People are aware of this too, it was introduced during the 390 merge.
> > I tried talking to the IBM guy about this, I didn't see a response from
> > him ...
>
> Strange since I did and it included you

Yes, I did get the first mail from the IBM guy (was he from Denmark, seem
to have seen ibm.de in his email?) explaining why the 390 wanted this
ordering ... In response, I pointed out that the 390 was either prone
to other races then, or was doing something in its low level handlers,
and could he please confirm what is the case?

Let me remember: he mentioned the old pte must be around for the ipte(?)
instruction to flush the tlb. If the new pte is dropped in before, the
flush fails, the stale tlb entry stays, problems happen. So I pointed out
other places which do set_pte, then flush_tlb. I also wanted to know
whether the flush_tlb somehow makes sure that other threads/cpus can not
pull in the old translation till the set_pte completes (something like
what freeze_pte_* does in my patch). I did not receive a response to this.

>
> > I think what we now need is a critical mass, something that will make us
> > go "okay, lets just fix these races once and for all".
>
> Basically establish_pte() has to be architecture specific, as some processors
> need different orders either to avoid races or to handle cpu specific
> limitations.

Even if you did that, wouldn't it just mean that the 390 would still be
prone to races, but other platforms are not? Of course, that's much
better than having all platforms be prone to the race!

And we should also handle the generic races with clones and ptes, an
example of which Manfred just demonstrated.

Kanoj
>
> Alan
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux.eu.org/Linux-MM/
>

2000-04-09 00:12:30

by Kanoj Sarcar

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

>
> > filemap_sync() calls flush_tlb_page() for each page, but IMHO this is a
> > really bad idea, the performance will suck with multi-threaded apps on
> > SMP.
>
> The best you can do probably is a flush_tlb_range?
>
> People, look at the callers of filemap_sync, it does range tlb/cache
> flushes so the flushes in filemap_sync_pte() are in fact spurious.
>
> Later,
> David S. Miller
> [email protected]
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux.eu.org/Linux-MM/
>

Hehehe ... now I am just arguing for the sake of arguing :-)

Let me give an example. I write a program with 2 threads, C1 and
C2, which mmap a file MAP_SHARED, and both write to it, with the
understanding that when they unmap it, all their previous writes will
be synced to disk. Okay, so C1 goes ahead and unmaps it, C2 thinks
"let me write to a page. If I don't get a signal indicating illegal
access, that means the write will show up on disk later. If I do
get a signal, I know the file has been unmapped, hence the write
will never make it to disk, so I need to take some recovery action".

Okay, so is this a good program? (May be not, they should probably
have synchronized the unmapping between themselves, but C2's
assumptions are not wrong).

If this is indeed a good program, you want to freeze C2's access
to the pages as soon as the unmap starts. We don't have a freeze
operation, the best next we can do is to flush the tlb as close
as possible to clearing the pte, which makes it really difficult
for C2 to drag in a tlb entry and continue writes that will not
be synced to disk.

So, though the filemap_sync_pte()s are theoretically unneeded,
they might actually be reducing a race ...

While on this topic, shouldn't filemap_unmap() invoke

filemap_sync(vma, start, len, MS_ASYNC|MS_INVALIDATE);

rather than

filemap_sync(vma, start, len, MS_ASYNC);

Not that the MS_ASYNC part makes much difference ...

Kanoj

2000-04-09 09:08:12

by Manfred Spraul

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

Alan Cox wrote:
>
>
> Basically establish_pte() has to be architecture specific, as some processors
> need different orders either to avoid races or to handle cpu specific
> limitations.
>
I don't know: IMHO we have far to many architecture specific functions
in that area:

set_pte()
establish_pte()

flush_tlb()
update_mmu_cache();
flush_cache();
flush_icache();

Can't we merge them?

<< 1)
set_pte(vma,pte,new_val);
* flushes the cache, changes one pte, updates the tlb.
<< 2)
set_pte_new(vma,pte,new_val);
* sets the pte, the old value was non-present. Most cpu
don't need to flush the tlb. (2.3.99 never flushes the tlb)
<< 3)
prepare_ptechange_{range,mm}(vma,start,end);
for()
__set_pte(vma,pte,new_val);
commit_ptechange_{range,mm}(vma,start,end);
* should be used if you change multiple pages.
<<<<<<<<<

I don't understand the purpose of flush_page_to_ram():
filemap_sync_pte() calls it if MS_INVALIDATE is not set, it's not called
if MS_INVALIDATE is set.
In both cases, the kernel pointer is accessed in filemap_write_page().

--
Manfred

2000-04-09 09:19:05

by David Miller

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

Date: Sun, 09 Apr 2000 11:10:13 +0200
From: Manfred Spraul <[email protected]>

I don't understand the purpose of flush_page_to_ram():

It's a (bad) attempt to deal with virtually indexed caches which
are larger than the page size of the machine. Generally, when
the kernel writes to a page which potentially can subsequently
accessed from user mappings, it must call flush_page_to_ram.

It flushes the kernel-view page out of the caches and thus
makes main memory up to date, this way when the user accesses
the page from his mapping he won't see stale data if he happens
to have the page mapped at a bad "virtual alias" of what the
kernel maps it at.

It sucks, I'd like to kill it along with flush_icache_page.

I have been working a lot recently on something which is clean and
hopefully can allow these things to die. But I don't want to talk
more about it until I am able to come up with an implementation which
I am happy with, because until that time it may as well not exist.

Later,
David S. Miller
[email protected]

2000-04-10 22:25:54

by Stephen C. Tweedie

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

Hi,

On Sun, Apr 09, 2000 at 12:37:05AM +0100, Alan Cox wrote:
>
> Basically establish_pte() has to be architecture specific, as some processors
> need different orders either to avoid races or to handle cpu specific
> limitations.

What exactly do different architectures need which set_pte() doesn't
already allow them to do magic in?

--Stephen

2000-04-10 23:23:30

by David Miller

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

Date: Mon, 10 Apr 2000 23:21:49 +0100
From: "Stephen C. Tweedie" <[email protected]>

On Sun, Apr 09, 2000 at 12:37:05AM +0100, Alan Cox wrote:
>
> Basically establish_pte() has to be architecture specific, as some processors
> need different orders either to avoid races or to handle cpu specific
> limitations.

What exactly do different architectures need which set_pte() doesn't
already allow them to do magic in?

Doing a properly synchronized PTE update and Cache/TLB flush when the
mapping can exist on multiple processors is not most efficiently done
if we take some generic setup.

The idea is that if we encapsulate the "flush_cache; set_pte;
flush_tlb" operations into a single arch-specific routine, the
implementation can then implement the most efficient solution possible
to this SMP problem.

For example, the fastest way to atomically update an existing PTE on
an SMP system using a software TLB miss scheme is wildly different
from that on an SMP system using a hardware replaced TLB.

For example, with a software TLB miss scheme it might be something
like this:

establish_pte() {
capture_cpus(mm->cpu_vm_mask);
everybody_flush_cache_page(mm->cpu_vm_mask, ...);
atomic_set_pte(ptep, entry);
everybody_flush_tlb_page(mm->cpu_vm_mask, ...);
release_cpus(mm->cpu_vm_mask);
}

With the obvious important optimizations for when mm->count is one,
etc.

The other case is when we are checking the dirty status of a pte
in vmscan, something similar is needed there as well:

pte_t atomic_pte_check_dirty() {
capture_cpus(mm->cpu_vm_mask);
entry = *ptep;
if (pte_dirty(entry)) {
everybody_flush_cache_page(mm->cpu_vm_mask, ...);
pte_clear(ptep);
everybody_flush_tlb_page(mm->cpu_vm_mask, ...);
}
release_cpus(mm->cpu_vm_mask);
return entry;
}

Later,
David S. Miller
[email protected]

2000-04-11 09:15:40

by Stephen C. Tweedie

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

Hi,

On Mon, Apr 10, 2000 at 04:12:18PM -0700, David S. Miller wrote:
> On Sun, Apr 09, 2000 at 12:37:05AM +0100, Alan Cox wrote:
> >
> > Basically establish_pte() has to be architecture specific, as some processors
> > need different orders either to avoid races or to handle cpu specific
> > limitations.
>
> What exactly do different architectures need which set_pte() doesn't
> already allow them to do magic in?
>
> Doing a properly synchronized PTE update and Cache/TLB flush when the
> mapping can exist on multiple processors is not most efficiently done
> if we take some generic setup.

OK, I'm sure there are optimisation issues, but I was worried about
correctness problems from what Alan said.

--Stephen

2000-04-11 11:58:33

by Alan

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

> What exactly do different architectures need which set_pte() doesn't
> already allow them to do magic in?

Some of them need a valid PTE to exist in order to flush a page from cache
to memory

2000-04-11 14:42:22

by Manfred Spraul

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

"Stephen C. Tweedie" wrote:
>
>
> OK, I'm sure there are optimisation issues, but I was worried about
> correctness problems from what Alan said.
>

They are correctness problems:
[I'm only reading the asm source, I might be wrong]

* s390 has hardware support for tlb flush ipis.
* it doesn't support hardware contexts (address space numbers, region
id,...)
* They need the old pte value and the virtual address for their flush
ipi.

-->
set_pte()
flush_tlb_page()
doesn't work.

Obviously their work-around
flush_tlb_page()
set_pte()
is wrong as well, and it breaks all other architectures :-/

They need an atomic set_and_flush_pte() macro.
OTHO on i386 a tlb flush during set_pte() would cause a dramatic
slowdown.

--
Manfred

2000-04-11 16:49:03

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

On Tue, 11 Apr 2000, Manfred Spraul wrote:

>* They need the old pte value and the virtual address for their flush
>ipi.

Why can't they flush all the address space unconditionally on the other
cpus? I can't find a valid reason for which they do need the old pte
value. The tlb should be a virtual->physical mapping only, the pte isn't
relevant at all with the TLB. however if they really need both old pte
address and the virtual address of the page, they can trivially pass the
parameters to the other CPUs acquring a spinlock and using some global
variable exactly as IA32 does to avoid flushing the whole TLB on the other
CPUs in the flush_tlb_page case.

>Obviously their work-around
> flush_tlb_page()
> set_pte()
>is wrong as well, and it breaks all other architectures :-/

I bet it breaks s390 too.

The other filemap_sync race with threads that Kanoj was talking about is
very less severe since it can't make the machine unstable, but it can only
forgot to write some bit using strange userspace app design (only _data_
corruption can happen to the shared mmaping of the patological app).

Andrea

2000-04-11 18:08:57

by Manfred Spraul

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

Andrea Arcangeli wrote:
>
> On Tue, 11 Apr 2000, Manfred Spraul wrote:
>
> >* They need the old pte value and the virtual address for their flush
> >ipi.
>
> Why can't they flush all the address space unconditionally on the other
> cpus?

They have a special instruction that flushes one mapping on all cpus in
the system.
It has 2 parameters:
* virtual address of the page
* segment of the physical page to be flushed???

Is someone out there with a s390 asm handbook? I only have these
comments:

+/*
+ * s390 has two ways of flushing TLBs
+ * 'ptlb' does a flush of the local processor
+ * 'ipte' invalidates a pte in a page table and flushes that out of
+ * the TLBs of all PUs of a SMP
+ */

+ /*
+ * S390 has 1mb segments, we are emulating 4MB segments
+ */
+
+ pto = (pte_t*) (((unsigned long) pte) & 0x7ffffc00);
+
+ __asm__ __volatile(" ic 0,2(%0)\n"
+ " ipte %1,%2\n"
+ " stc 0,2(%0)"
+: : "a" (pte), "a" (pto), "a" (addr): "0");
+}

> I can't find a valid reason for which they do need the old pte
> value. The tlb should be a virtual->physical mapping only, the pte isn't
> relevant at all with the TLB. however if they really need both old pte
> address and the virtual address of the page, they can trivially pass the
> parameters to the other CPUs acquring a spinlock and using some global
> variable exactly as IA32 does to avoid flushing the whole TLB on the other
> CPUs in the flush_tlb_page case.
>
> >Obviously their work-around
> > flush_tlb_page()
> > set_pte()
> >is wrong as well, and it breaks all other architectures :-/
>
> I bet it breaks s390 too.
>

Of course :-)

> The other filemap_sync race with threads that Kanoj was talking about is
> very less severe since it can't make the machine unstable, but it can only
> forgot to write some bit using strange userspace app design (only _data_
> corruption can happen to the shared mmaping of the patological app).

Yes.
Can we ignore the munmap+access case?
I'd say that if 2 threads race with munmap+access, then the behaviour is
undefined.
Tlb flushes are expensive, I'd like to avoid the second tlb flush as in
Kanoj's patch.

--
Manfred

2000-04-11 18:29:47

by Kanoj Sarcar

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

>
> Yes.
> Can we ignore the munmap+access case?
> I'd say that if 2 threads race with munmap+access, then the behaviour is
> undefined.
> Tlb flushes are expensive, I'd like to avoid the second tlb flush as in
> Kanoj's patch.
>

To handle clones on SMP systems properly, you have to stop at least other
threads from writing to the page during unmap time, and possibly loading
the old translation during translation-changing time. Probably the only
generic way to do this is to twiddle the ptes and flush the tlb's, unless
you start making big chunks of code architecture dependent. Note that in
my patch, in most cases, the tlb flush position has changed, not the
number of flushes ....

Kanoj

2000-04-12 10:08:23

by Jamie Lokier

[permalink] [raw]

Subject: Re: zap_page_range(): TLB flush race

Manfred Spraul wrote:
> Can we ignore the munmap+access case?
> I'd say that if 2 threads race with munmap+access, then the behaviour is
> undefined.
> Tlb flushes are expensive, I'd like to avoid the second tlb flush as in
> Kanoj's patch.

No, you can't ignore it. A variation called mprotect+access is used by
garbage collection systems that expect to receive SEGVs when access is
to a protected region.

At very least, you'd have to document the race very clearly, and provide
a workaround.

-- Jamie