2009-03-04 05:58:32

by Nicolas Pitre

Subject: [RFC] atomic highmem kmap page pinning

I've implemented highmem for ARM. Yes, some ARM machines do have lots
of memory...

The problem is that most ARM machines have a non-IO-coherent cache,
meaning that the dma_map_* set of functions must clean and/or invalidate
the affected memory manually. And because the majority of those
machines have a VIVT cache, the cache maintenance operations must be
performed using virtual addresses.

In dma_map_page(), a highmem page could still be mapped and cached
even after kunmap() was called on it. As long as highmem pages are
mapped, page_address(page) is non-null and we can use that to
synchronize the cache.

It is unlikely but still possible for kmap() to race, recycle the
virtual address obtained above, and use it for another page. In
that case, the new mapping could end up with dirty cache lines for
another page, and the unsuspecting cache invalidation loop in
dma_map_page() won't notice, resulting in data loss. Hence the need for
some kind of kmap page pinning which can be used in any context,
including IRQ context.
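
For illustration, here is a rough sketch (not part of the patch, and not the
actual ARM code) of how the dma_map_page() path could use such a pinning
interface; dma_cache_maint_highmem() and invalidate_cache_line() are made-up
names standing in for the real cache maintenance code:

	static void dma_cache_maint_highmem(struct page *page)
	{
		/* pin the existing kmap mapping, if any (interface added below) */
		unsigned long vaddr = (unsigned long)kmap_high_get(page);
		unsigned long i;

		if (!vaddr)
			return;		/* no kernel mapping, hence no cached lines to worry about */

		/* the mapping is now pinned, so kmap() cannot recycle it under us */
		for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
			invalidate_cache_line(i);

		kunmap_high(page);	/* drop the pinning reference */
	}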

This is an RFC patch implementing the necessary part in the core code, as
suggested by RMK. Please comment.

diff --git a/mm/highmem.c b/mm/highmem.c
index b36b83b..548ca77 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -113,9 +113,9 @@ static void flush_all_zero_pkmaps(void)
*/
void kmap_flush_unused(void)
{
- spin_lock(&kmap_lock);
+ spin_lock_irq(&kmap_lock);
flush_all_zero_pkmaps();
- spin_unlock(&kmap_lock);
+ spin_unlock_irq(&kmap_lock);
}

static inline unsigned long map_new_virtual(struct page *page)
@@ -145,10 +145,10 @@ start:

__set_current_state(TASK_UNINTERRUPTIBLE);
add_wait_queue(&pkmap_map_wait, &wait);
- spin_unlock(&kmap_lock);
+ spin_unlock_irq(&kmap_lock);
schedule();
remove_wait_queue(&pkmap_map_wait, &wait);
- spin_lock(&kmap_lock);
+ spin_lock_irq(&kmap_lock);

/* Somebody else might have mapped it while we slept */
if (page_address(page))
@@ -184,19 +184,43 @@ void *kmap_high(struct page *page)
* For highmem pages, we can't trust "virtual" until
* after we have the lock.
*/
- spin_lock(&kmap_lock);
+ spin_lock_irq(&kmap_lock);
vaddr = (unsigned long)page_address(page);
if (!vaddr)
vaddr = map_new_virtual(page);
pkmap_count[PKMAP_NR(vaddr)]++;
BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 2);
- spin_unlock(&kmap_lock);
+ spin_unlock_irq(&kmap_lock);
return (void*) vaddr;
}

EXPORT_SYMBOL(kmap_high);

/**
+ * kmap_high_get - pin a highmem page into memory
+ * @page: &struct page to pin
+ *
+ * Returns the page's current virtual memory address, or NULL if no mapping
+ * exists. When and only when a non null address is returned then a
+ * matching call to kunmap_high() is necessary.
+ *
+ * This can be called from interrupt context.
+ */
+void *kmap_high_get(struct page *page)
+{
+ unsigned long vaddr, flags;
+
+ spin_lock_irqsave(&kmap_lock, flags);
+ vaddr = (unsigned long)page_address(page);
+ if (vaddr) {
+ BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
+ pkmap_count[PKMAP_NR(vaddr)]++;
+ }
+ spin_unlock_irqrestore(&kmap_lock, flags);
+ return (void*) vaddr;
+}
+
+/**
* kunmap_high - map a highmem page into memory
* @page: &struct page to unmap
*/
@@ -204,9 +228,10 @@ void kunmap_high(struct page *page)
{
unsigned long vaddr;
unsigned long nr;
+ unsigned long flags;
int need_wakeup;

- spin_lock(&kmap_lock);
+ spin_lock_irqsave(&kmap_lock, flags);
vaddr = (unsigned long)page_address(page);
BUG_ON(!vaddr);
nr = PKMAP_NR(vaddr);
@@ -232,7 +257,7 @@ void kunmap_high(struct page *page)
*/
need_wakeup = waitqueue_active(&pkmap_map_wait);
}
- spin_unlock(&kmap_lock);
+ spin_unlock_irqrestore(&kmap_lock, flags);

/* do wake-up, if needed, race-free outside of the spin lock */
if (need_wakeup)


Nicolas


2009-03-04 07:40:11

by Andrew Morton

Subject: Re: [RFC] atomic highmem kmap page pinning

On Wed, 04 Mar 2009 00:58:13 -0500 (EST) Nicolas Pitre <[email protected]> wrote:

> I've implemented highmem for ARM. Yes, some ARM machines do have lots
> of memory...
>
> The problem is that most ARM machines have a non IO coherent cache,
> meaning that the dma_map_* set of functions must clean and/or invalidate
> the affected memory manually. And because the majority of those
> machines have a VIVT cache, the cache maintenance operations must be
> performed using virtual addresses.
>
> In dma_map_page(), an highmem pages could still be mapped and cached
> even after kunmap() was called on it. As long as highmem pages are
> mapped, page_address(page) is non null and we can use that to
> synchronize the cache.
>
> It is unlikely but still possible for kmap() to race and recycle the
> obtained virtual address above, and use it for another page though. In
> that case, the new mapping could end up with dirty cache lines for
> another page, and the unsuspecting cache invalidation loop in
> dma_map_page() won't notice resulting in data loss. Hence the need for
> some kind of kmap page pinning which can be used in any context,
> including IRQ context.
>
> This is a RFC patch implementing the necessary part in the core code, as
> suggested by RMK. Please comment.

Seems harmless enough to me.

> +void *kmap_high_get(struct page *page)
> +{
> + unsigned long vaddr, flags;
> +
> + spin_lock_irqsave(&kmap_lock, flags);
> + vaddr = (unsigned long)page_address(page);
> + if (vaddr) {
> + BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
> + pkmap_count[PKMAP_NR(vaddr)]++;
> + }
> + spin_unlock_irqrestore(&kmap_lock, flags);
> + return (void*) vaddr;
> +}

We could remove a pile of ugly casts if we changed PKMAP_NR() to take a
void*. Not that this is relevant.
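
Something like this, purely for illustration (the real per-arch definitions
differ a bit), is what I mean:

	/* hypothetical PKMAP_NR() accepting a pointer, so callers could drop the casts */
	#define PKMAP_NR(virt)	(((unsigned long)(virt) - PKMAP_BASE) >> PAGE_SHIFT)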

2009-03-04 08:16:34

by Minchan Kim

Subject: Re: [RFC] atomic highmem kmap page pinning

On Wed, 04 Mar 2009 00:58:13 -0500 (EST)
Nicolas Pitre <[email protected]> wrote:

> I've implemented highmem for ARM. Yes, some ARM machines do have lots
> of memory...
>
> The problem is that most ARM machines have a non IO coherent cache,
> meaning that the dma_map_* set of functions must clean and/or invalidate
> the affected memory manually. And because the majority of those
> machines have a VIVT cache, the cache maintenance operations must be
> performed using virtual addresses.
>
> In dma_map_page(), an highmem pages could still be mapped and cached
> even after kunmap() was called on it. As long as highmem pages are
> mapped, page_address(page) is non null and we can use that to
> synchronize the cache.
> It is unlikely but still possible for kmap() to race and recycle the
> obtained virtual address above, and use it for another page though. In
> that case, the new mapping could end up with dirty cache lines for
> another page, and the unsuspecting cache invalidation loop in
> dma_map_page() won't notice resulting in data loss. Hence the need for
> some kind of kmap page pinning which can be used in any context,
> including IRQ context.
>
> This is a RFC patch implementing the necessary part in the core code, as
> suggested by RMK. Please comment.

I am not sure I understand your concern completely.
I can understand that it can be recycled, but why is it racing?
Now, the kmap() semantics are that it can't be called in interrupt context.

As far as I understand, disabling IRQs to prevent this problem has a rather big cost.

I think it would be better to make page_address() return NULL in the case
where pkmap_count is less than one, or where it is not the previous page's mapping.


> diff --git a/mm/highmem.c b/mm/highmem.c
> index b36b83b..548ca77 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -113,9 +113,9 @@ static void flush_all_zero_pkmaps(void)
> */
> void kmap_flush_unused(void)
> {
> - spin_lock(&kmap_lock);
> + spin_lock_irq(&kmap_lock);
> flush_all_zero_pkmaps();
> - spin_unlock(&kmap_lock);
> + spin_unlock_irq(&kmap_lock);
> }
>
> static inline unsigned long map_new_virtual(struct page *page)
> @@ -145,10 +145,10 @@ start:
>
> __set_current_state(TASK_UNINTERRUPTIBLE);
> add_wait_queue(&pkmap_map_wait, &wait);
> - spin_unlock(&kmap_lock);
> + spin_unlock_irq(&kmap_lock);
> schedule();
> remove_wait_queue(&pkmap_map_wait, &wait);
> - spin_lock(&kmap_lock);
> + spin_lock_irq(&kmap_lock);
>
> /* Somebody else might have mapped it while we slept */
> if (page_address(page))
> @@ -184,19 +184,43 @@ void *kmap_high(struct page *page)
> * For highmem pages, we can't trust "virtual" until
> * after we have the lock.
> */
> - spin_lock(&kmap_lock);
> + spin_lock_irq(&kmap_lock);
> vaddr = (unsigned long)page_address(page);
> if (!vaddr)
> vaddr = map_new_virtual(page);
> pkmap_count[PKMAP_NR(vaddr)]++;
> BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 2);
> - spin_unlock(&kmap_lock);
> + spin_unlock_irq(&kmap_lock);
> return (void*) vaddr;
> }
>
> EXPORT_SYMBOL(kmap_high);
>
> /**
> + * kmap_high_get - pin a highmem page into memory
> + * @page: &struct page to pin
> + *
> + * Returns the page's current virtual memory address, or NULL if no mapping
> + * exists. When and only when a non null address is returned then a
> + * matching call to kunmap_high() is necessary.
> + *
> + * This can be called from interrupt context.
> + */
> +void *kmap_high_get(struct page *page)
> +{
> + unsigned long vaddr, flags;
> +
> + spin_lock_irqsave(&kmap_lock, flags);
> + vaddr = (unsigned long)page_address(page);
> + if (vaddr) {
> + BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
> + pkmap_count[PKMAP_NR(vaddr)]++;
> + }
> + spin_unlock_irqrestore(&kmap_lock, flags);
> + return (void*) vaddr;
> +}
> +
> +/**
> * kunmap_high - map a highmem page into memory
> * @page: &struct page to unmap
> */
> @@ -204,9 +228,10 @@ void kunmap_high(struct page *page)
> {
> unsigned long vaddr;
> unsigned long nr;
> + unsigned long flags;
> int need_wakeup;
>
> - spin_lock(&kmap_lock);
> + spin_lock_irqsave(&kmap_lock, flags);
> vaddr = (unsigned long)page_address(page);
> BUG_ON(!vaddr);
> nr = PKMAP_NR(vaddr);
> @@ -232,7 +257,7 @@ void kunmap_high(struct page *page)
> */
> need_wakeup = waitqueue_active(&pkmap_map_wait);
> }
> - spin_unlock(&kmap_lock);
> + spin_unlock_irqrestore(&kmap_lock, flags);
>
> /* do wake-up, if needed, race-free outside of the spin lock */
> if (need_wakeup)
>
>
> Nicolas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/


--
Kind Regards
Minchan Kim

2009-03-04 17:26:19

by Nicolas Pitre

Subject: Re: [RFC] atomic highmem kmap page pinning

On Wed, 4 Mar 2009, Minchan Kim wrote:

> On Wed, 04 Mar 2009 00:58:13 -0500 (EST)
> Nicolas Pitre <[email protected]> wrote:
>
> > I've implemented highmem for ARM. Yes, some ARM machines do have lots
> > of memory...
> >
> > The problem is that most ARM machines have a non IO coherent cache,
> > meaning that the dma_map_* set of functions must clean and/or invalidate
> > the affected memory manually. And because the majority of those
> > machines have a VIVT cache, the cache maintenance operations must be
> > performed using virtual addresses.
> >
> > In dma_map_page(), an highmem pages could still be mapped and cached
> > even after kunmap() was called on it. As long as highmem pages are
> > mapped, page_address(page) is non null and we can use that to
> > synchronize the cache.
> > It is unlikely but still possible for kmap() to race and recycle the
> > obtained virtual address above, and use it for another page though. In
> > that case, the new mapping could end up with dirty cache lines for
> > another page, and the unsuspecting cache invalidation loop in
> > dma_map_page() won't notice resulting in data loss. Hence the need for
> > some kind of kmap page pinning which can be used in any context,
> > including IRQ context.
> >
> > This is a RFC patch implementing the necessary part in the core code, as
> > suggested by RMK. Please comment.
>
> I am not sure if i understand your concern totally.
> I can understand it can be recycled. but Why is it racing ?

Suppose this sequence of events:

- dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.

--> - vaddr = page_address(page) is non null. In this case
it is likely that the page has valid cache lines
associated with vaddr. Remember that the cache is VIVT.

--> - for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
invalidate_cache_line(i);

*** preemption occurs in the middle of the loop above ***

- kmap_high() is called for a different page.

--> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
is called. The pkmap_count value for the page passed
to dma_map_page() above happens to be 1, so it is
unmapped. But prior to that, flush_cache_kmaps()
cleared the cache for it. So far so good.

- A fresh pkmap entry is assigned for this kmap request.
The Murphy law says it will eventually happen to use
the same vaddr as the one which used to belong to the
other page being processed by dma_map_page() in the
preempted thread above.

- The caller of kmap_high() starts dirtying the cache using the
new virtual mapping for its page.

*** the first thread is rescheduled ***

- The for loop is resumed, but now cached data
belonging to a different physical page is
being discarded!

And this is not only a preemption issue. ARM can be SMP as well, where
this scenario is just as likely, and disabling preemption in
dma_map_page() won't prevent it.

> Now, kmap semantic is that it can't be called in interrupt context.

I know. And in this case I don't need the full kmap_high() semantics.
What I need is a guarantee that, if I start invalidating cache lines
from a highmem page, its virtual mapping won't go away. Meaning that I
need to increase pkmap_count whenever it is not zero. And if it is zero
then there is simply no cache invalidation to worry about. And that
pkmap_count increment must be possible from any context, as its primary
user would be dma_map_page().

> As far as I understand, To make irq_disable to prevent this problem is
> rather big cost.

How big? Could you please elaborate on the significance of this cost?

> I think it would be better to make page_address can return null in that case
> where pkmap_count is less than one

This is already the case, and when it happens there is no cache
invalidation to perform, as I said above. The race is possible when
pkmap_count is 1 or becomes 1.

> or it's not previous page mapping.

Even if the cache invalidation loop checked on every iteration whether the
page mapping had changed, which would be terribly inefficient, there would
still be a race window for the mapping to change between the mapping test
and the actual cache line invalidation instruction.
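
To illustrate, using the same pseudocode as the loop above: even a paranoid
variant such as

	for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32) {
		if (page_address(page) != (void *)vaddr)	/* recheck the mapping */
			break;
		/* <-- the pkmap entry can still be recycled right here --> */
		invalidate_cache_line(i);
	}

leaves a window between the check and the invalidation, so rechecking cannot
replace pinning.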


Nicolas

2009-03-04 23:09:21

by Minchan Kim

Subject: Re: [RFC] atomic highmem kmap page pinning

Hi, Nicolas.

On Wed, 04 Mar 2009 12:26:00 -0500 (EST)
Nicolas Pitre <[email protected]> wrote:

> On Wed, 4 Mar 2009, Minchan Kim wrote:
>
> > On Wed, 04 Mar 2009 00:58:13 -0500 (EST)
> > Nicolas Pitre <[email protected]> wrote:
> >
> > > I've implemented highmem for ARM. Yes, some ARM machines do have lots
> > > of memory...
> > >
> > > The problem is that most ARM machines have a non IO coherent cache,
> > > meaning that the dma_map_* set of functions must clean and/or invalidate
> > > the affected memory manually. And because the majority of those
> > > machines have a VIVT cache, the cache maintenance operations must be
> > > performed using virtual addresses.
> > >
> > > In dma_map_page(), an highmem pages could still be mapped and cached
> > > even after kunmap() was called on it. As long as highmem pages are
> > > mapped, page_address(page) is non null and we can use that to
> > > synchronize the cache.
> > > It is unlikely but still possible for kmap() to race and recycle the
> > > obtained virtual address above, and use it for another page though. In
> > > that case, the new mapping could end up with dirty cache lines for
> > > another page, and the unsuspecting cache invalidation loop in
> > > dma_map_page() won't notice resulting in data loss. Hence the need for
> > > some kind of kmap page pinning which can be used in any context,
> > > including IRQ context.
> > >
> > > This is a RFC patch implementing the necessary part in the core code, as
> > > suggested by RMK. Please comment.
> >
> > I am not sure if i understand your concern totally.
> > I can understand it can be recycled. but Why is it racing ?
>
> Suppose this sequence of events:
>
> - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
>
> --> - vaddr = page_address(page) is non null. In this case
> it is likely that the page has valid cache lines
> associated with vaddr. Remember that the cache is VIVT.
>
> --> - for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
> invalidate_cache_line(i);
>
> *** preemption occurs in the middle of the loop above ***
>
> - kmap_high() is called for a different page.
>
> --> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
> is called. The pkmap_count value for the page passed
> to dma_map_page() above happens to be 1, so it is
> unmapped. But prior to that, flush_cache_kmaps()
> cleared the cache for it. So far so good.

Thanks for the kind explanation. :)

I thought the kmap() and dma_map_page() usage was as follows:

kmap(page);
...
dma_map_page(...)
invalidate_cache_line

kunmap(page);

In this case, how does the pkmap_count value for the page passed to dma_map_page() become 1?
The caller has to make sure dma_map_page() completes before kunmap().

Am I missing something?

>
> - A fresh pkmap entry is assigned for this kmap request.
> The Murphy law says it will eventually happen to use
> the same vaddr as the one which used to belong to the
> other page being processed by dma_map_page() in the
> preempted thread above.
>
> - The caller of kmap_high() start dirtying the cache using the
> new virtual mapping for its page.
>
> *** the first thread is rescheduled ***
>
> - The for loop is resumed, but now cached data
> belonging to a different physical page is
> being discarded!
> And this is not only a preemption issue. ARM can be SMP as well where
> this scenario is just as likely, and disabling preemption in
> dma_map_page() won't prevent it.
>
> > Now, kmap semantic is that it can't be called in interrupt context.
>
> I know. And in this case I don't need the full kmap_high() semantics.
> What I need is a guarantee that, if I start invalidating cache lines
> from an highmem page, its virtual mapping won't go away. Meaning that I
> need to increase pkmap_count whenever it is not zero. And if it is zero
> then there is simply no cache invalidation to worry about. And that
> pkmap_count increment must be possible from any context as its primary
> user would be dma_map_page().
>
> > As far as I understand, To make irq_disable to prevent this problem is
> > rather big cost.
>
> How big? Could you please elaborate on the significance of this cost?

I don't have a number. That is up to you as the one submitting this patch.
The kernel already uses kmap in many filesystems and drivers, and even in mm.
So, to get this patch merged, you should provide benchmark results.
Perhaps some of the server folks can help you get the data.

> > I think it would be better to make page_address can return null in that case
> > where pkmap_count is less than one
>
> This is already the case, and when it happens then there is no cache
> invalidation to perform like I say above. The race is possible when
> pkmap_count is 1 or becomes 1.
>
> > or it's not previous page mapping.
>
> Even if the cache invalidation loop checks on every iteration if the
> page mapping changed which would be terribly inefficient, there is still
> a race window for the mapping to change between the mapping test
> and the actual cache line invalidation instruction.
>
>
> Nicolas


--
Kind Regards
Minchan Kim

2009-03-04 23:47:01

by Russell King - ARM Linux

Subject: Re: [RFC] atomic highmem kmap page pinning

On Thu, Mar 05, 2009 at 08:07:17AM +0900, Minchan Kim wrote:
> On Wed, 04 Mar 2009 12:26:00 -0500 (EST)
> Nicolas Pitre <[email protected]> wrote:
>
> > On Wed, 4 Mar 2009, Minchan Kim wrote:
> >
> > > On Wed, 04 Mar 2009 00:58:13 -0500 (EST)
> > > Nicolas Pitre <[email protected]> wrote:
> > >
> > > > I've implemented highmem for ARM. Yes, some ARM machines do have lots
> > > > of memory...
> > > >
> > > > The problem is that most ARM machines have a non IO coherent cache,
> > > > meaning that the dma_map_* set of functions must clean and/or invalidate
> > > > the affected memory manually. And because the majority of those
> > > > machines have a VIVT cache, the cache maintenance operations must be
> > > > performed using virtual addresses.
> > > >
> > > > In dma_map_page(), an highmem pages could still be mapped and cached
> > > > even after kunmap() was called on it. As long as highmem pages are
> > > > mapped, page_address(page) is non null and we can use that to
> > > > synchronize the cache.
> > > > It is unlikely but still possible for kmap() to race and recycle the
> > > > obtained virtual address above, and use it for another page though. In
> > > > that case, the new mapping could end up with dirty cache lines for
> > > > another page, and the unsuspecting cache invalidation loop in
> > > > dma_map_page() won't notice resulting in data loss. Hence the need for
> > > > some kind of kmap page pinning which can be used in any context,
> > > > including IRQ context.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > > > This is a RFC patch implementing the necessary part in the core code, as
> > > > suggested by RMK. Please comment.
> > >
> > > I am not sure if i understand your concern totally.
> > > I can understand it can be recycled. but Why is it racing ?
> >
> > Suppose this sequence of events:
> >
> > - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
> >
> > --> - vaddr = page_address(page) is non null. In this case
> > it is likely that the page has valid cache lines
> > associated with vaddr. Remember that the cache is VIVT.
> >
> > --> - for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
> > invalidate_cache_line(i);
> >
> > *** preemption occurs in the middle of the loop above ***
> >
> > - kmap_high() is called for a different page.
> >
> > --> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
> > is called. The pkmap_count value for the page passed
> > to dma_map_page() above happens to be 1, so it is
> > unmapped. But prior to that, flush_cache_kmaps()
> > cleared the cache for it. So far so good.
>
> Thanks for kind explanation.:)
>
> I thought kmap and dma_map_page usage was following.
>
> kmap(page);
> ...
> dma_map_page(...)
> invalidate_cache_line
>
> kunmap(page);

No, that's not the usage at all. kmap() can't be called from the
contexts which dma_map_page() is called from (IOW, IRQ contexts, as
pointed out in the paragraph I underlined above).

We're talking about dma_map_page() _internally_ calling kmap_high_get()
to _atomically_ and _safely_ check whether the page was kmapped. If
it was kmapped, we need to pin the page and return its currently mapped
address for cache handling, and then release that reference.

None of the existing kmap support comes anywhere near to providing a
mechanism for this because it can't be used in the contexts under which
dma_map_page() is called.

If we could do it with existing interfaces, we wouldn't need a new
interface, would we?

2009-03-05 00:27:39

by Minchan Kim

Subject: Re: [RFC] atomic highmem kmap page pinning

On Wed, 4 Mar 2009 23:46:33 +0000
Russell King - ARM Linux <[email protected]> wrote:

> On Thu, Mar 05, 2009 at 08:07:17AM +0900, Minchan Kim wrote:
> > On Wed, 04 Mar 2009 12:26:00 -0500 (EST)
> > Nicolas Pitre <[email protected]> wrote:
> >
> > > On Wed, 4 Mar 2009, Minchan Kim wrote:
> > >
> > > > On Wed, 04 Mar 2009 00:58:13 -0500 (EST)
> > > > Nicolas Pitre <[email protected]> wrote:
> > > >
> > > > > I've implemented highmem for ARM. Yes, some ARM machines do have lots
> > > > > of memory...
> > > > >
> > > > > The problem is that most ARM machines have a non IO coherent cache,
> > > > > meaning that the dma_map_* set of functions must clean and/or invalidate
> > > > > the affected memory manually. And because the majority of those
> > > > > machines have a VIVT cache, the cache maintenance operations must be
> > > > > performed using virtual addresses.
> > > > >
> > > > > In dma_map_page(), an highmem pages could still be mapped and cached
> > > > > even after kunmap() was called on it. As long as highmem pages are
> > > > > mapped, page_address(page) is non null and we can use that to
> > > > > synchronize the cache.
> > > > > It is unlikely but still possible for kmap() to race and recycle the
> > > > > obtained virtual address above, and use it for another page though. In
> > > > > that case, the new mapping could end up with dirty cache lines for
> > > > > another page, and the unsuspecting cache invalidation loop in
> > > > > dma_map_page() won't notice resulting in data loss. Hence the need for
> > > > > some kind of kmap page pinning which can be used in any context,
> > > > > including IRQ context.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> > > > > This is a RFC patch implementing the necessary part in the core code, as
> > > > > suggested by RMK. Please comment.
> > > >
> > > > I am not sure if i understand your concern totally.
> > > > I can understand it can be recycled. but Why is it racing ?
> > >
> > > Suppose this sequence of events:
> > >
> > > - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
> > >
> > > --> - vaddr = page_address(page) is non null. In this case
> > > it is likely that the page has valid cache lines
> > > associated with vaddr. Remember that the cache is VIVT.
> > >
> > > --> - for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
> > > invalidate_cache_line(i);
> > >
> > > *** preemption occurs in the middle of the loop above ***
> > >
> > > - kmap_high() is called for a different page.
> > >
> > > --> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
> > > is called. The pkmap_count value for the page passed
> > > to dma_map_page() above happens to be 1, so it is
> > > unmapped. But prior to that, flush_cache_kmaps()
> > > cleared the cache for it. So far so good.
> >
> > Thanks for kind explanation.:)
> >
> > I thought kmap and dma_map_page usage was following.
> >
> > kmap(page);
> > ...
> > dma_map_page(...)
> > invalidate_cache_line
> >
> > kunmap(page);
>
> No, that's not the usage at all. kmap() can't be called from the
> contexts which dma_map_page() is called from (iow, IRQ contexts as
> pointed out in the paragraph I underlined above.)
>
> We're talking about dma_map_page() _internally_ calling kmap_get_page()
> to _atomically_ and _safely_ check whether the page was kmapped. If
> it was kmapped, we need to pin the page and return its currently mapped
> address for cache handling and then release that reference.

Thanks, Russell.
I see. That was thing I missed. :)

> None of the existing kmap support comes anywhere near to providing a
> mechanism for this because it can't be used in the contexts under which
> dma_map_page() is called.

Right.

> If we could do it with existing interfaces, we wouldn't need a new
> interface would we?

OK.
As I said previously, I don't like kmap_high()'s IRQ disabling.
It's already used in many places, so the effect of disabling IRQs might be rather big.

How about a new interface which is like the KM_IRQ kmap_atomic slots,
serialized with a kmap_atomic_lock?

Let's Cc the other experts.

--
Kind Regards
Minchan Kim

2009-03-05 00:30:32

by Minchan Kim

Subject: Re: [RFC] atomic highmem kmap page pinning

It seems Andrea's mail address has changed.
I will resend to Andrea's new mail address.

On Thu, Mar 5, 2009 at 9:25 AM, Minchan Kim <[email protected]> wrote:
> On Wed, 4 Mar 2009 23:46:33 +0000
> Russell King - ARM Linux <[email protected]> wrote:
>
>> On Thu, Mar 05, 2009 at 08:07:17AM +0900, Minchan Kim wrote:
>> > On Wed, 04 Mar 2009 12:26:00 -0500 (EST)
>> > Nicolas Pitre <[email protected]> wrote:
>> >
>> > > On Wed, 4 Mar 2009, Minchan Kim wrote:
>> > >
>> > > > On Wed, 04 Mar 2009 00:58:13 -0500 (EST)
>> > > > Nicolas Pitre <[email protected]> wrote:
>> > > >
>> > > > > I've implemented highmem for ARM.  Yes, some ARM machines do have lots
>> > > > > of memory...
>> > > > >
>> > > > > The problem is that most ARM machines have a non IO coherent cache,
>> > > > > meaning that the dma_map_* set of functions must clean and/or invalidate
>> > > > > the affected memory manually.  And because the majority of those
>> > > > > machines have a VIVT cache, the cache maintenance operations must be
>> > > > > performed using virtual addresses.
>> > > > >
>> > > > > In dma_map_page(), an highmem pages could still be mapped and cached
>> > > > > even after kunmap() was called on it.  As long as highmem pages are
>> > > > > mapped, page_address(page) is non null and we can use that to
>> > > > > synchronize the cache.
>> > > > > It is unlikely but still possible for kmap() to race and recycle the
>> > > > > obtained virtual address above, and use it for another page though.  In
>> > > > > that case, the new mapping could end up with dirty cache lines for
>> > > > > another page, and the unsuspecting cache invalidation loop in
>> > > > > dma_map_page() won't notice resulting in data loss.  Hence the need for
>> > > > > some kind of kmap page pinning which can be used in any context,
>> > > > > including IRQ context.
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> > > > > This is a RFC patch implementing the necessary part in the core code, as
>> > > > > suggested by RMK. Please comment.
>> > > >
>> > > > I am not sure if i understand your concern totally.
>> > > > I can understand it can be recycled. but Why is it racing ?
>> > >
>> > > Suppose this sequence of events:
>> > >
>> > >   - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
>> > >
>> > >   -->     - vaddr = page_address(page) is non null. In this case
>> > >             it is likely that the page has valid cache lines
>> > >             associated with vaddr. Remember that the cache is VIVT.
>> > >
>> > >           -->     - for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
>> > >                           invalidate_cache_line(i);
>> > >
>> > >   *** preemption occurs in the middle of the loop above ***
>> > >
>> > >   - kmap_high() is called for a different page.
>> > >
>> > >   -->     - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
>> > >             is called.  The pkmap_count value for the page passed
>> > >             to dma_map_page() above happens to be 1, so it is
>> > >             unmapped.  But prior to that, flush_cache_kmaps()
>> > >             cleared the cache for it.  So far so good.
>> >
>> > Thanks for kind explanation.:)
>> >
>> > I thought kmap and dma_map_page usage was following.
>> >
>> > kmap(page);
>> > ...
>> > dma_map_page(...)
>> >   invalidate_cache_line
>> >
>> > kunmap(page);
>>
>> No, that's not the usage at all.  kmap() can't be called from the
>> contexts which dma_map_page() is called from (iow, IRQ contexts as
>> pointed out in the paragraph I underlined above.)
>>
>> We're talking about dma_map_page() _internally_ calling kmap_get_page()
>> to _atomically_ and _safely_ check whether the page was kmapped.  If
>> it was kmapped, we need to pin the page and return its currently mapped
>> address for cache handling and then release that reference.
>
> Thanks, Russel.
> I see. That was thing I missed. :)
>
>> None of the existing kmap support comes anywhere near to providing a
>> mechanism for this because it can't be used in the contexts under which
>> dma_map_page() is called.
>
> Right.
>
>> If we could do it with existing interfaces, we wouldn't need a new
>> interface would we?
>
> OK.
> As previous said, I don't like kmap_high's irq disable.
> It's already used in many place. so irq'disable effect might be rather big.
>
> How about new interface which is like KM_IRQ's kmap_atomic slot
>  with serializing kmap_atomic_lock?
>
> Let's Cced other experts.
>
> --
> Kinds Regards
> Minchan Kim
>



--
Kind regards,
Minchan Kim

2009-03-05 02:38:20

by Nicolas Pitre

Subject: Re: [RFC] atomic highmem kmap page pinning

On Thu, 5 Mar 2009, Minchan Kim wrote:

> I thought kmap and dma_map_page usage was following.
>
> kmap(page);
> ...
> dma_map_page(...)
> invalidate_cache_line
>
> kunmap(page);
>
> In this case, how do pkmap_count value for the page passed to dma_map_page become 1 ?
> The caller have to make sure to complete dma_map_page before kunmap.


The caller doesn't have to call kmap() on pages it intends to use for
DMA.

> Do I miss something ?

See above.

> > > As far as I understand, To make irq_disable to prevent this problem is
> > > rather big cost.
> >
> > How big? Could you please elaborate on the significance of this cost?
>
> I don't have a number. It depends on you for submitting this patch.

My assertion is that the cost is negligible. This is why I'm asking you
why you think this is a big cost.


Nicolas

2009-03-05 04:23:09

by Minchan Kim

Subject: Re: [RFC] atomic highmem kmap page pinning

On Wed, 04 Mar 2009 21:37:43 -0500 (EST)
Nicolas Pitre <[email protected]> wrote:

> On Thu, 5 Mar 2009, Minchan Kim wrote:
>
> > I thought kmap and dma_map_page usage was following.
> >
> > kmap(page);
> > ...
> > dma_map_page(...)
> > invalidate_cache_line
> >
> > kunmap(page);
> >
> > In this case, how do pkmap_count value for the page passed to dma_map_page become 1 ?
> > The caller have to make sure to complete dma_map_page before kunmap.
>
>
> The caller doesn't have to call kmap() on pages it intends to use for
> DMA.

Thanks for pointing that out.
Russell also explained it.
Sorry for my misunderstanding.

I would like your explanation below to be included in the git changelog.
---
- dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.

--> - vaddr = page_address(page) is non null. In this case
it is likely that the page has valid cache lines
associated with vaddr. Remember that the cache is VIVT.

--> - for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
invalidate_cache_line(i);

*** preemption occurs in the middle of the loop above ***

- kmap_high() is called for a different page.

--> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
is called. The pkmap_count value for the page passed
to dma_map_page() above happens to be 1, so it is
unmapped. But prior to that, flush_cache_kmaps()
cleared the cache for it. So far so good.

- A fresh pkmap entry is assigned for this kmap request.
The Murphy law says it will eventually happen to use
the same vaddr as the one which used to belong to the
other page being processed by dma_map_page() in the
preempted thread above.

- The caller of kmap_high() start dirtying the cache using the
new virtual mapping for its page.

*** the first thread is rescheduled ***

- The for loop is resumed, but now cached data
belonging to a different physical page is
being discarded!
---

> > Do I miss something ?
>
> See above.
>
> > > > As far as I understand, To make irq_disable to prevent this problem is
> > > > rather big cost.
> > >
> > > How big? Could you please elaborate on the significance of this cost?
> >
> > I don't have a number. It depends on you for submitting this patch.
>
> My assertion is that the cost is negligible. This is why I'm asking you
> why you think this is a big cost.

Of course, I am not sure whether it's a big cost or not.
But kmap is already used in many filesystems and drivers,
so whether the cost is big depends on the workload type.

However, this patch is only needed for VIVT, non-coherent caches.
Is that right?

If so, it will add unnecessary overhead on other architectures
which don't have this problem.

I think that's not desirable, even if the cost is small.
If we had another method which avoids the unnecessary overhead, it would be better.
Unfortunately, I don't have any way to solve this right now.

Let's wait for the other server folks' opinions. :)

> Nicolas


--
Kind Regards
Minchan Kim

2009-03-05 04:57:54

by Nicolas Pitre

Subject: Re: [RFC] atomic highmem kmap page pinning

On Thu, 5 Mar 2009, Minchan Kim wrote:

> On Wed, 04 Mar 2009 21:37:43 -0500 (EST)
> Nicolas Pitre <[email protected]> wrote:
>
> > My assertion is that the cost is negligible. This is why I'm asking you
> > why you think this is a big cost.
>
> Of course, I am not sure whether it's big cost or not.
> But I thought it already is used in many fs, driver.
> so, whether it's big cost depends on workload type .
>
> However, This patch is needed for VIVT and no coherent cache.
> Is right ?
>
> If it is right, it will add unnessary overhead in other architecture
> which don't have this problem.
>
> I think it's not desirable although it is small cost.
> If we have a other method which avoids unnessary overhead, It would be better.
> Unfortunately, I don't have any way to solve this, now.

OK. What about this patch then:

From c4db60c3a2395476331b62e08cf1f64fc9af8d54 Mon Sep 17 00:00:00 2001
From: Nicolas Pitre <[email protected]>
Date: Wed, 4 Mar 2009 22:49:41 -0500
Subject: [PATCH] atomic highmem kmap page pinning

Most ARM machines have a non-IO-coherent cache, meaning that the
dma_map_*() set of functions must clean and/or invalidate the affected
memory manually before DMA occurs. And because the majority of those
machines have a VIVT cache, the cache maintenance operations must be
performed using virtual addresses.

When a highmem page is kunmap'd, its mapping (and cache) remains in place
in case it is kmap'd again. However if dma_map_page() is then called with
such a page, some cache maintenance on the remaining mapping must be
performed. In that case, page_address(page) is non null and we can use
that to synchronize the cache.

It is unlikely but still possible for kmap() to race and recycle the
virtual address obtained above, and use it for another page before some
on-going cache invalidation loop in dma_map_page() is done. In that case,
the new mapping could end up with dirty cache lines for another page,
and the unsuspecting cache invalidation loop in dma_map_page() might
simply discard those dirty cache lines resulting in data loss.

For example, let's consider this sequence of events:

- dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.

--> - vaddr = page_address(page) is non null. In this case
it is likely that the page has valid cache lines
associated with vaddr. Remember that the cache is VIVT.

--> for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
invalidate_cache_line(i);

*** preemption occurs in the middle of the loop above ***

- kmap_high() is called for a different page.

--> - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
is called. The pkmap_count value for the page passed
to dma_map_page() above happens to be 1, so the page
is unmapped. But prior to that, flush_cache_kmaps()
cleared the cache for it. So far so good.

- A fresh pkmap entry is assigned for this kmap request.
The Murphy law says this pkmap entry will eventually
happen to use the same vaddr as the one which used to
belong to the other page being processed by
dma_map_page() in the preempted thread above.

- The kmap_high() caller starts dirtying the cache using the
just assigned virtual mapping for its page.

*** the first thread is rescheduled ***

- The for(...) loop is resumed, but now cached
data belonging to a different physical page is
being discarded!

And this is not only a preemption issue as ARM can be SMP as well,
making the above scenario just as likely. Hence the need for some kind
of pkmap page pinning which can be used in any context, primarily for
the benefit of dma_map_page() on ARM.

This provides the necessary interface to cope with the above issue if
ARCH_NEEDS_KMAP_HIGH_GET is defined, otherwise the resulting code is
unchanged.

Signed-off-by: Nicolas Pitre <[email protected]>

diff --git a/mm/highmem.c b/mm/highmem.c
index b36b83b..cc61399 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -67,6 +67,25 @@ pte_t * pkmap_page_table;

static DECLARE_WAIT_QUEUE_HEAD(pkmap_map_wait);

+/*
+ * Most architectures have no use for kmap_high_get(), so let's abstract
+ * the disabling of IRQ out of the locking in that case to save on a
+ * potential useless overhead.
+ */
+#ifdef ARCH_NEEDS_KMAP_HIGH_GET
+#define spin_lock_kmap() spin_lock_irq(&kmap_lock)
+#define spin_unlock_kmap() spin_unlock_irq(&kmap_lock)
+#define spin_lock_kmap_any(flags) spin_lock_irqsave(&kmap_lock, flags)
+#define spin_unlock_kmap_any(flags) spin_unlock_irqrestore(&kmap_lock, flags)
+#else
+#define spin_lock_kmap() spin_lock(&kmap_lock)
+#define spin_unlock_kmap() spin_unlock(&kmap_lock)
+#define spin_lock_kmap_any(flags) \
+ do { spin_lock(&kmap_lock); (void)(flags); } while (0)
+#define spin_unlock_kmap_any(flags) \
+ do { spin_unlock(&kmap_lock); (void)(flags); } while (0)
+#endif
+
static void flush_all_zero_pkmaps(void)
{
int i;
@@ -113,9 +132,9 @@ static void flush_all_zero_pkmaps(void)
*/
void kmap_flush_unused(void)
{
- spin_lock(&kmap_lock);
+ spin_lock_kmap();
flush_all_zero_pkmaps();
- spin_unlock(&kmap_lock);
+ spin_unlock_kmap();
}

static inline unsigned long map_new_virtual(struct page *page)
@@ -145,10 +164,10 @@ start:

__set_current_state(TASK_UNINTERRUPTIBLE);
add_wait_queue(&pkmap_map_wait, &wait);
- spin_unlock(&kmap_lock);
+ spin_unlock_kmap();
schedule();
remove_wait_queue(&pkmap_map_wait, &wait);
- spin_lock(&kmap_lock);
+ spin_lock_kmap();

/* Somebody else might have mapped it while we slept */
if (page_address(page))
@@ -184,29 +203,59 @@ void *kmap_high(struct page *page)
* For highmem pages, we can't trust "virtual" until
* after we have the lock.
*/
- spin_lock(&kmap_lock);
+ spin_lock_kmap();
vaddr = (unsigned long)page_address(page);
if (!vaddr)
vaddr = map_new_virtual(page);
pkmap_count[PKMAP_NR(vaddr)]++;
BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 2);
- spin_unlock(&kmap_lock);
+ spin_unlock_kmap();
return (void*) vaddr;
}

EXPORT_SYMBOL(kmap_high);

+#ifdef ARCH_NEEDS_KMAP_HIGH_GET
+/**
+ * kmap_high_get - pin a highmem page into memory
+ * @page: &struct page to pin
+ *
+ * Returns the page's current virtual memory address, or NULL if no mapping
+ * exists. When and only when a non null address is returned then a
+ * matching call to kunmap_high() is necessary.
+ *
+ * This can be called from any context.
+ */
+void *kmap_high_get(struct page *page)
+{
+ unsigned long vaddr, flags;
+
+ spin_lock_kmap_any(flags);
+ vaddr = (unsigned long)page_address(page);
+ if (vaddr) {
+ BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
+ pkmap_count[PKMAP_NR(vaddr)]++;
+ }
+ spin_unlock_kmap_any(flags);
+ return (void*) vaddr;
+}
+#endif
+
/**
* kunmap_high - map a highmem page into memory
* @page: &struct page to unmap
+ *
+ * If ARCH_NEEDS_KMAP_HIGH_GET is not defined then this may be called
+ * only from user context.
*/
void kunmap_high(struct page *page)
{
unsigned long vaddr;
unsigned long nr;
+ unsigned long flags;
int need_wakeup;

- spin_lock(&kmap_lock);
+ spin_lock_kmap_any(flags);
vaddr = (unsigned long)page_address(page);
BUG_ON(!vaddr);
nr = PKMAP_NR(vaddr);
@@ -232,7 +281,7 @@ void kunmap_high(struct page *page)
*/
need_wakeup = waitqueue_active(&pkmap_map_wait);
}
- spin_unlock(&kmap_lock);
+ spin_unlock_kmap_any(flags);

/* do wake-up, if needed, race-free outside of the spin lock */
if (need_wakeup)

2009-03-05 22:23:57

by Minchan Kim

Subject: Re: [RFC] atomic highmem kmap page pinning

On Thu, Mar 5, 2009 at 1:57 PM, Nicolas Pitre <[email protected]> wrote:
> On Thu, 5 Mar 2009, Minchan Kim wrote:
>
>> On Wed, 04 Mar 2009 21:37:43 -0500 (EST)
>> Nicolas Pitre <[email protected]> wrote:
>>
>> > My assertion is that the cost is negligible.  This is why I'm asking you
>> > why you think this is a big cost.
>>
>> Of course, I am not sure whether it's big cost or not.
>> But I thought it already is used in many fs, driver.
>> so, whether it's big cost depends on workload type .
>>
>> However, This patch is needed for VIVT and no coherent cache.
>> Is right ?
>>
>> If it is right, it will add unnessary overhead in other architecture
>> which don't have this problem.
>>
>> I think it's not desirable although it is small cost.
>> If we have a other method which avoids unnessary overhead, It would be better.
>> Unfortunately, I don't have any way to solve this, now.
>
> OK.  What about this patch then:

It looks good to me except one thing below.
Reviewed-by: MinChan Kim <[email protected]>

> From c4db60c3a2395476331b62e08cf1f64fc9af8d54 Mon Sep 17 00:00:00 2001
> From: Nicolas Pitre <[email protected]>
> Date: Wed, 4 Mar 2009 22:49:41 -0500
> Subject: [PATCH] atomic highmem kmap page pinning
>
> Most ARM machines have a non IO coherent cache, meaning that the
> dma_map_*() set of functions must clean and/or invalidate the affected
> memory manually before DMA occurs.  And because the majority of those
> machines have a VIVT cache, the cache maintenance operations must be
> performed using virtual
> addresses.
>
> When a highmem page is kunmap'd, its mapping (and cache) remains in place
> in case it is kmap'd again. However if dma_map_page() is then called with
> such a page, some cache maintenance on the remaining mapping must be
> performed. In that case, page_address(page) is non null and we can use
> that to synchronize the cache.
>
> It is unlikely but still possible for kmap() to race and recycle the
> virtual address obtained above, and use it for another page before some
> on-going cache invalidation loop in dma_map_page() is done. In that case,
> the new mapping could end up with dirty cache lines for another page,
> and the unsuspecting cache invalidation loop in dma_map_page() might
> simply discard those dirty cache lines resulting in data loss.
>
> For example, let's consider this sequence of events:
>
>        - dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
>
>        -->     - vaddr = page_address(page) is non null. In this case
>                it is likely that the page has valid cache lines
>                associated with vaddr. Remember that the cache is VIVT.
>
>                -->     for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
>                                invalidate_cache_line(i);
>
>        *** preemption occurs in the middle of the loop above ***
>
>        - kmap_high() is called for a different page.
>
>        -->     - last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
>                  is called.  The pkmap_count value for the page passed
>                  to dma_map_page() above happens to be 1, so the page
>                  is unmapped.  But prior to that, flush_cache_kmaps()
>                  cleared the cache for it.  So far so good.
>
>                - A fresh pkmap entry is assigned for this kmap request.
>                  The Murphy law says this pkmap entry will eventually
>                  happen to use the same vaddr as the one which used to
>                  belong to the other page being processed by
>                  dma_map_page() in the preempted thread above.
>
>        - The kmap_high() caller start dirtying the cache using the
>          just assigned virtual mapping for its page.
>
>        *** the first thread is rescheduled ***
>
>                        - The for(...) loop is resumed, but now cached
>                          data belonging to a different physical page is
>                          being discarded !
>
> And this is not only a preemption issue as ARM can be SMP as well,
> making the above scenario just as likely. Hence the need for some kind
> of pkmap page pinning which can be used in any context, primarily for
> the benefit of dma_map_page() on ARM.
>
> This provides the necessary interface to cope with the above issue if
> ARCH_NEEDS_KMAP_HIGH_GET is defined, otherwise the resulting code is
> unchanged.
>
> Signed-off-by: Nicolas Pitre <[email protected]>
>
> diff --git a/mm/highmem.c b/mm/highmem.c
> index b36b83b..cc61399 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -67,6 +67,25 @@ pte_t * pkmap_page_table;
>
>  static DECLARE_WAIT_QUEUE_HEAD(pkmap_map_wait);
>
> +/*
> + * Most architectures have no use for kmap_high_get(), so let's abstract
> + * the disabling of IRQ out of the locking in that case to save on a
> + * potential useless overhead.
> + */
> +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
> +#define spin_lock_kmap()             spin_lock_irq(&kmap_lock)
> +#define spin_unlock_kmap()           spin_unlock_irq(&kmap_lock)
> +#define spin_lock_kmap_any(flags)    spin_lock_irqsave(&kmap_lock, flags)
> +#define spin_unlock_kmap_any(flags)  spin_unlock_irqrestore(&kmap_lock, flags)
> +#else
> +#define spin_lock_kmap()             spin_lock(&kmap_lock)
> +#define spin_unlock_kmap()           spin_unlock(&kmap_lock)
> +#define spin_lock_kmap_any(flags)    \
> +       do { spin_lock(&kmap_lock); (void)(flags); } while (0)
> +#define spin_unlock_kmap_any(flags)  \
> +       do { spin_unlock(&kmap_lock); (void)(flags); } while (0)
> +#endif
> +
>  static void flush_all_zero_pkmaps(void)
>  {
>        int i;
> @@ -113,9 +132,9 @@ static void flush_all_zero_pkmaps(void)
>  */
>  void kmap_flush_unused(void)
>  {
> -       spin_lock(&kmap_lock);
> +       spin_lock_kmap();
>        flush_all_zero_pkmaps();
> -       spin_unlock(&kmap_lock);
> +       spin_unlock_kmap();
>  }
>
>  static inline unsigned long map_new_virtual(struct page *page)
> @@ -145,10 +164,10 @@ start:
>
>                        __set_current_state(TASK_UNINTERRUPTIBLE);
>                        add_wait_queue(&pkmap_map_wait, &wait);
> -                       spin_unlock(&kmap_lock);
> +                       spin_unlock_kmap();
>                        schedule();
>                        remove_wait_queue(&pkmap_map_wait, &wait);
> -                       spin_lock(&kmap_lock);
> +                       spin_lock_kmap();
>
>                        /* Somebody else might have mapped it while we slept */
>                        if (page_address(page))
> @@ -184,29 +203,59 @@ void *kmap_high(struct page *page)
>         * For highmem pages, we can't trust "virtual" until
>         * after we have the lock.
>         */
> -       spin_lock(&kmap_lock);
> +       spin_lock_kmap();
>        vaddr = (unsigned long)page_address(page);
>        if (!vaddr)
>                vaddr = map_new_virtual(page);
>        pkmap_count[PKMAP_NR(vaddr)]++;
>        BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 2);
> -       spin_unlock(&kmap_lock);
> +       spin_unlock_kmap();
>        return (void*) vaddr;
>  }
>
>  EXPORT_SYMBOL(kmap_high);
>
> +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
> +/**
> + * kmap_high_get - pin a highmem page into memory
> + * @page: &struct page to pin
> + *
> + * Returns the page's current virtual memory address, or NULL if no mapping
> + * exists.  When and only when a non null address is returned then a
> + * matching call to kunmap_high() is necessary.
> + *
> + * This can be called from any context.
> + */
> +void *kmap_high_get(struct page *page)
> +{
> +       unsigned long vaddr, flags;
> +
> +       spin_lock_kmap_any(flags);
> +       vaddr = (unsigned long)page_address(page);
> +       if (vaddr) {
> +               BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
> +               pkmap_count[PKMAP_NR(vaddr)]++;
> +       }
> +       spin_unlock_kmap_any(flags);
> +       return (void*) vaddr;
> +}
> +#endif

Let's add an empty function for architectures without ARCH_NEEDS_KMAP_HIGH_GET, roughly like this sketch (just to illustrate what I mean):
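
	/* sketch only: a no-op stub for the !ARCH_NEEDS_KMAP_HIGH_GET case */
	static inline void *kmap_high_get(struct page *page)
	{
		return NULL;
	}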

> +
>  /**
>  * kunmap_high - map a highmem page into memory
>  * @page: &struct page to unmap
> + *
> + * If ARCH_NEEDS_KMAP_HIGH_GET is not defined then this may be called
> + * only from user context.
>  */
>  void kunmap_high(struct page *page)
>  {
>        unsigned long vaddr;
>        unsigned long nr;
> +       unsigned long flags;
>        int need_wakeup;
>
> -       spin_lock(&kmap_lock);
> +       spin_lock_kmap_any(flags);
>        vaddr = (unsigned long)page_address(page);
>        BUG_ON(!vaddr);
>        nr = PKMAP_NR(vaddr);
> @@ -232,7 +281,7 @@ void kunmap_high(struct page *page)
>                 */
>                need_wakeup = waitqueue_active(&pkmap_map_wait);
>        }
> -       spin_unlock(&kmap_lock);
> +       spin_unlock_kmap_any(flags);
>
>        /* do wake-up, if needed, race-free outside of the spin lock */
>        if (need_wakeup)
>



--
Kind regards,
Minchan Kim

2009-03-05 22:59:47

by Russell King - ARM Linux

Subject: Re: [RFC] atomic highmem kmap page pinning

On Fri, Mar 06, 2009 at 07:23:44AM +0900, Minchan Kim wrote:
> > +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
> > +/**
> > + * kmap_high_get - pin a highmem page into memory
> > + * @page: &struct page to pin
> > + *
> > + * Returns the page's current virtual memory address, or NULL if no mapping
> > + * exists. ?When and only when a non null address is returned then a
> > + * matching call to kunmap_high() is necessary.
> > + *
> > + * This can be called from any context.
> > + */
> > +void *kmap_high_get(struct page *page)
> > +{
> > + ? ? ? unsigned long vaddr, flags;
> > +
> > + ? ? ? spin_lock_kmap_any(flags);
> > + ? ? ? vaddr = (unsigned long)page_address(page);
> > + ? ? ? if (vaddr) {
> > + ? ? ? ? ? ? ? BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
> > + ? ? ? ? ? ? ? pkmap_count[PKMAP_NR(vaddr)]++;
> > + ? ? ? }
> > + ? ? ? spin_unlock_kmap_any(flags);
> > + ? ? ? return (void*) vaddr;
> > +}
> > +#endif
>
> Let's add empty function for architecture of no ARCH_NEEDS_KMAP_HIGH_GET,

The reasoning being?

2009-03-05 23:14:37

by Minchan Kim

Subject: Re: [RFC] atomic highmem kmap page pinning

On Fri, Mar 6, 2009 at 7:59 AM, Russell King - ARM Linux
<[email protected]> wrote:
> On Fri, Mar 06, 2009 at 07:23:44AM +0900, Minchan Kim wrote:
>> > +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
>> > +/**
>> > + * kmap_high_get - pin a highmem page into memory
>> > + * @page: &struct page to pin
>> > + *
>> > + * Returns the page's current virtual memory address, or NULL if no mapping
>> > + * exists.  When and only when a non null address is returned then a
>> > + * matching call to kunmap_high() is necessary.
>> > + *
>> > + * This can be called from any context.
>> > + */
>> > +void *kmap_high_get(struct page *page)
>> > +{
>> > +       unsigned long vaddr, flags;
>> > +
>> > +       spin_lock_kmap_any(flags);
>> > +       vaddr = (unsigned long)page_address(page);
>> > +       if (vaddr) {
>> > +               BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
>> > +               pkmap_count[PKMAP_NR(vaddr)]++;
>> > +       }
>> > +       spin_unlock_kmap_any(flags);
>> > +       return (void*) vaddr;
>> > +}
>> > +#endif
>>
>> Let's add empty function for architecture of no ARCH_NEEDS_KMAP_HIGH_GET,
>
> The reasoning being?

I thought it could be used in common ARM functions,
so that for VIVT it would work, while for VIPT it could be a no-op,
preventing a compile error.

But I don't know where kmap_high_get() will be used, since I didn't see any
patch which uses it.
If it is only used in architecture-specific places, please forget my comment.

--
Kind regards,
Minchan Kim

2009-03-07 22:28:29

by Nicolas Pitre

Subject: Re: [RFC] atomic highmem kmap page pinning

On Fri, 6 Mar 2009, Minchan Kim wrote:

> On Fri, Mar 6, 2009 at 7:59 AM, Russell King - ARM Linux
> <[email protected]> wrote:
> > On Fri, Mar 06, 2009 at 07:23:44AM +0900, Minchan Kim wrote:
> >> > +#ifdef ARCH_NEEDS_KMAP_HIGH_GET
> >> > +/**
> >> > + * kmap_high_get - pin a highmem page into memory
> >> > + * @page: &struct page to pin
> >> > + *
> >> > + * Returns the page's current virtual memory address, or NULL if no mapping
> >> > + * exists.  When and only when a non null address is returned then a
> >> > + * matching call to kunmap_high() is necessary.
> >> > + *
> >> > + * This can be called from any context.
> >> > + */
> >> > +void *kmap_high_get(struct page *page)
> >> > +{
> >> > +       unsigned long vaddr, flags;
> >> > +
> >> > +       spin_lock_kmap_any(flags);
> >> > +       vaddr = (unsigned long)page_address(page);
> >> > +       if (vaddr) {
> >> > +               BUG_ON(pkmap_count[PKMAP_NR(vaddr)] < 1);
> >> > +               pkmap_count[PKMAP_NR(vaddr)]++;
> >> > +       }
> >> > +       spin_unlock_kmap_any(flags);
> >> > +       return (void*) vaddr;
> >> > +}
> >> > +#endif
> >>
> >> Let's add empty function for architecture of no ARCH_NEEDS_KMAP_HIGH_GET,
> >
> > The reasoning being?
>
> I thought it can be used in common arm function.
> so, for VIVT, it can be work but for VIPT, it can be nulled as
> preventing compile error.

The issue is not about VIVT vs VIPT, but rather about the fact that IO
transactions don't snoop the cache. So this is needed even for current
VIPT implementations.

> But, I don't know where we use kmap_high_get since I didn't see any
> patch which use it.
> If it is only used architecture specific place, pz, forgot my comment.

Yes, that's the case. And it is much preferable to have a compilation
error than to provide an empty stub which would silently mask misuses.


Nicolas