2006-01-27 06:46:06

by Pierre Ossman

Subject: How to map high memory for block io

I'm having some problems getting high memory support to work smoothly in
my driver. The documentation doesn't indicate what I might be doing
wrong so I'll have to ask here.

The problem seems to be that kmap & co maps a single page into kernel
memory. So when I happen to cross page boundaries I start corrupting
some unrelated parts of the kernel. I would prefer not having to
consider page boundaries in an already messy PIO loop, so I've been
trying to find either a routine to map an entire sg entry or some way to
force the block layer to not give me stuff crossing pages.

As you can guess I have not found anything that can do what I want, so
some pointers would be nice.

Rgds
Pierre
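
The failure mode being described, as a sketch: kmap() gives a window onto a single page, but sg->length can describe more than that. The function and fifo_port below are illustrative, not from any real driver:

static void broken_pio_read(struct scatterlist *sg, unsigned int fifo_port)
{
        char *buf = kmap(sg->page) + sg->offset;
        unsigned int i;

        /*
         * If sg->offset + sg->length > PAGE_SIZE, these stores run off
         * the end of the one-page mapping and scribble over whatever
         * happens to be mapped at the following vaddr.
         */
        for (i = 0; i < sg->length; i++)
                buf[i] = inb(fifo_port);

        kunmap(sg->page);
}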


2006-01-27 10:24:04

by Jens Axboe

Subject: Re: How to map high memory for block io

On Fri, Jan 27 2006, Pierre Ossman wrote:
> I'm having some problems getting high memory support to work smoothly in
> my driver. The documentation doesn't indicate what I might be doing
> wrong so I'll have to ask here.
>
> The problem seems to be that kmap & co maps a single page into kernel
> memory. So when I happen to cross page boundaries I start corrupting
> some unrelated parts of the kernel. I would prefer not having to
> consider page boundaries in an already messy PIO loop, so I've been
> trying to find either a routine to map an entire sg entry or some way to
> force the block layer to not give me stuff crossing pages.
>
> As you can guess I have not found anything that can do what I want, so
> some pointers would be nice.

Honestly, just don't bother if you are doing PIO anyways. Just tell the
block layer that you want io bounced for you instead.

--
Jens Axboe
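
What Jens is suggesting, as a minimal sketch against the 2.6-era queue API (the setup function name is illustrative): set the bounce limit so the block layer copies highmem pages into lowmem before the driver ever sees them.

static void my_setup_queue(request_queue_t *q)
{
        /*
         * Bounce every page above the direct-mapped region; after
         * this, all sg entries handed to the driver are in lowmem.
         */
        blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);
}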

2006-01-27 10:33:48

by Pierre Ossman

Subject: Re: How to map high memory for block io

Jens Axboe wrote:
> On Fri, Jan 27 2006, Pierre Ossman wrote:
>
>> I'm having some problems getting high memory support to work smoothly in
>> my driver. The documentation doesn't indicate what I might be doing
>> wrong so I'll have to ask here.
>>
>> The problem seems to be that kmap & co maps a single page into kernel
>> memory. So when I happen to cross page boundaries I start corrupting
>> some unrelated parts of the kernel. I would prefer not having to
>> consider page boundaries in an already messy PIO loop, so I've been
>> trying to find either a routine to map an entire sg entry or some way to
>> force the block layer to not give me stuff crossing pages.
>>
>> As you can guess I have not found anything that can do what I want, so
>> some pointers would be nice.
>>
>
> Honestly, just don't bother if you are doing PIO anyways. Just tell the
> block layer that you want io bounced for you instead.
>
>

This is the MMC layer so there is some separation between the block
layer and the drivers. Also, the transfers won't necessarily be from the
block layer so a generic solution is desired. I don't suppose there is
some way of accessing the bounce buffer routines in a non-bio context?

Rgds
Pierre

2006-01-27 10:41:12

by Jens Axboe

Subject: Re: How to map high memory for block io

On Fri, Jan 27 2006, Pierre Ossman wrote:
> Jens Axboe wrote:
> > On Fri, Jan 27 2006, Pierre Ossman wrote:
> >
> >> I'm having some problems getting high memory support to work smoothly in
> >> my driver. The documentation doesn't indicate what I might be doing
> >> wrong so I'll have to ask here.
> >>
> >> The problem seems to be that kmap & co maps a single page into kernel
> >> memory. So when I happen to cross page boundaries I start corrupting
> >> some unrelated parts of the kernel. I would prefer not having to
> >> consider page boundaries in an already messy PIO loop, so I've been
> >> trying to find either a routine to map an entire sg entry or some way to
> >> force the block layer to not give me stuff crossing pages.
> >>
> >> As you can guess I have not found anything that can do what I want, so
> >> some pointers would be nice.
> >>
> >
> > Honestly, just don't bother if you are doing PIO anyways. Just tell the
> > block layer that you want io bounced for you instead.
> >
> >
>
> This is the MMC layer so there is some separation between the block
> layer and the drivers. Also, the transfers won't necessarily be from the
> block layer so a generic solution is desired. I don't suppose there is
> some way of accessing the bounce buffer routines in a non-bio context?

Only the mapping routines are appropriate at that point, or things get
complicated. You could still do a two-page mapping, if you are careful
about using different KMAP_ types.

--
Jens Axboe
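
A sketch of the two-page mapping Jens mentions, assuming an sg entry known to cover exactly two physically contiguous pages (so sg->page + 1 is meaningful) and using two different 2.6-era KM_* slots so both atomic mappings can exist at once:

static void copy_from_two_page_sg(struct scatterlist *sg, char *dst)
{
        unsigned int first = PAGE_SIZE - sg->offset;
        char *src;

        src = kmap_atomic(sg->page, KM_BIO_SRC_IRQ);
        memcpy(dst, src + sg->offset, first);
        kunmap_atomic(src, KM_BIO_SRC_IRQ);

        /*
         * The second window lands at an unrelated vaddr - the two
         * mappings never adjoin, so the copy must be split.
         */
        src = kmap_atomic(sg->page + 1, KM_BIO_DST_IRQ);
        memcpy(dst + first, src, sg->length - first);
        kunmap_atomic(src, KM_BIO_DST_IRQ);
}

The split is exactly what Pierre objects to below: there is no way to get the two mappings at adjoining virtual addresses.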

2006-01-27 12:14:28

by Pierre Ossman

Subject: Re: How to map high memory for block io

Jens Axboe wrote:
> On Fri, Jan 27 2006, Pierre Ossman wrote:
>
>> Jens Axboe wrote:
>>
>>> On Fri, Jan 27 2006, Pierre Ossman wrote:
>>>
>>>
>>>> I'm having some problems getting high memory support to work smoothly in
>>>> my driver. The documentation doesn't indicate what I might be doing
>>>> wrong so I'll have to ask here.
>>>>
>>>> The problem seems to be that kmap & co maps a single page into kernel
>>>> memory. So when I happen to cross page boundaries I start corrupting
>>>> some unrelated parts of the kernel. I would prefer not having to
>>>> consider page boundaries in an already messy PIO loop, so I've been
>>>> trying to find either a routine to map an entire sg entry or some way to
>>>> force the block layer to not give me stuff crossing pages.
>>>>
>>>> As you can guess I have not found anything that can do what I want, so
>>>> some pointers would be nice.
>>>>
>>>>
>>> Honestly, just don't bother if you are doing PIO anyways. Just tell the
>>> block layer that you want io bounced for you instead.
>>>
>>>
>>>
>> This is the MMC layer so there is some separation between the block
>> layer and the drivers. Also, the transfers won't necessarily be from the
>> block layer so a generic solution is desired. I don't suppose there is
>> some way of accessing the bounce buffer routines in a non-bio context?
>>
>
> Only the mapping routines are appropriate at that point, or things get
> complicated. You could still do a two-page mapping, if you are careful
> about using different KMAP_ types.
>
>

That would still make things rather difficult since there is no way to
get both maps into adjoining vaddrs. Is there no way to say "don't cross
page boundaries"? Setting a segment size of PAGE_SIZE still causes
problems when the offset isn't 0.

Russell, would having a "highmem not supported" flag in the host
structure be an acceptable solution? mmc_block could then use that to
tell the block layer that bounce buffers are needed. As for other,
future, users they would have to take care not to give those drivers
highmem sg lists.

The current buggy code was modeled after another MMC driver (mmci). So
I suspect there might be more occurrences like this. Perhaps an audit
should be added as a janitor project?

Rgds
Pierre

2006-01-27 12:37:18

by Jens Axboe

Subject: Re: How to map high memory for block io

On Fri, Jan 27 2006, Pierre Ossman wrote:
> Jens Axboe wrote:
> > On Fri, Jan 27 2006, Pierre Ossman wrote:
> >
> >> Jens Axboe wrote:
> >>
> >>> On Fri, Jan 27 2006, Pierre Ossman wrote:
> >>>
> >>>
> >>>> I'm having some problems getting high memory support to work smoothly in
> >>>> my driver. The documentation doesn't indicate what I might be doing
> >>>> wrong so I'll have to ask here.
> >>>>
> >>>> The problem seems to be that kmap & co maps a single page into kernel
> >>>> memory. So when I happen to cross page boundaries I start corrupting
> >>>> some unrelated parts of the kernel. I would prefer not having to
> >>>> consider page boundaries in an already messy PIO loop, so I've been
> >>>> trying to find either a routine to map an entire sg entry or some way to
> >>>> force the block layer to not give me stuff crossing pages.
> >>>>
> >>>> As you can guess I have not found anything that can do what I want, so
> >>>> some pointers would be nice.
> >>>>
> >>>>
> >>> Honestly, just don't bother if you are doing PIO anyways. Just tell the
> >>> block layer that you want io bounced for you instead.
> >>>
> >>>
> >>>
> >> This is the MMC layer so there is some separation between the block
> >> layer and the drivers. Also, the transfers won't necessarily be from the
> >> block layer so a generic solution is desired. I don't suppose there is
> >> some way of accessing the bounce buffer routines in a non-bio context?
> >>
> >
> > Only the mapping routines are appropriate at that point, or things get
> > complicated. You could still do a two-page mapping, if you are careful
> > about using different KMAP_ types.
> >
> >
>
> That would still make things rather difficult since there is no way to
> get both maps into adjoining vaddrs. Is there no way to say "don't cross
> page boundaries"? Setting a segment size of PAGE_SIZE still causes
> problems when the offset isn't 0.

To be absolutely sure, you can just disallow multiple pages in a bio.
Ala:

static int my_merge_bvec_fn(request_queue_t *q, struct bio *bio,
                            struct bio_vec *bvec)
{
        /*
         * bio_add_page() refuses a page when this returns fewer bytes
         * than bvec->bv_len. Accept the first page in full, reject
         * everything after it, so a bio never covers more than one page.
         */
        if (bio->bi_vcnt)
                return 0;
        return bvec->bv_len;
}

init_code()
{
        ...
        blk_queue_merge_bvec(q, my_merge_bvec_fn);
}

--
Jens Axboe

2006-01-27 13:16:31

by Pierre Ossman

Subject: Re: How to map high memory for block io

Jens Axboe wrote:
> On Fri, Jan 27 2006, Pierre Ossman wrote:
>> That would still make things rather difficult since there is no way to
>> get both maps into adjoining vaddrs. Is there no way to say "don't cross
>> page boundaries"? Setting a segment size of PAGE_SIZE still causes
>> problems when the offset isn't 0.
>>
>
> To be absolutely sure, you can just disallow multiple pages in a bio.
>
>

The only stuff that reaches the MMC drivers is the scatter list. So
anything that operates on the request queue or any other block layer
specifics is probably out of the question since it breaks the abstraction.

Doesn't seem like a generic solution is easily implemented. I'll start
hacking together some way of specifying that highmem isn't supported so
that mmc_block can indicate this to the block layer.

Rgds
Pierre

2006-01-27 13:48:19

by Pierre Ossman

Subject: Re: How to map high memory for block io

Pierre Ossman wrote:
>
> Doesn't seem like a generic solution is easily implemented. I'll start
> hacking together some way of specifying that highmem isn't supported so
> that mmc_block can indicate this to the block layer.
>

If I set the limit to BLK_BOUNCE_HIGH then (page_address(sg->page) +
sg->offset) is guaranteed to be directly accessible for the entire
sg->length on all architectures, right? This seems to be the assumption
in the USB ub driver at least.

Rgds
Pierre

2006-01-27 13:58:25

by Jens Axboe

Subject: Re: How to map high memory for block io

On Fri, Jan 27 2006, Pierre Ossman wrote:
> Pierre Ossman wrote:
> >
> > Doesn't seem like a generic solution is easily implemented. I'll start
> > hacking together some way of specifying that highmem isn't supported so
> > that mmc_block can indicate this to the block layer.
> >
>
> If I set the limit to BLK_BOUNCE_HIGH then (page_address(sg->page) +
> sg->offset) is guaranteed to be directly accessible for the entire
> sg->length on all architectures, right? This seems to be the assumption
> in the USB ub driver at least.

Yes, hence my initial suggestion to just do that.

--
Jens Axboe
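
With the bounce limit in place the PIO loop no longer needs kmap at all; a sketch of the resulting access pattern (fifo_port illustrative):

static void pio_write_entry(struct scatterlist *sg, unsigned int fifo_port)
{
        char *buf = page_address(sg->page) + sg->offset;
        unsigned int i;

        /*
         * Safe for the whole entry, even one spanning pages, since
         * bounced sg entries always sit in the direct mapping.
         */
        for (i = 0; i < sg->length; i++)
                outb(buf[i], fifo_port);
}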

2006-01-27 14:15:03

by Pierre Ossman

Subject: Re: How to map high memory for block io

Jens Axboe wrote:
> On Fri, Jan 27 2006, Pierre Ossman wrote:
>
>>
>> If I set the limit to BLK_BOUNCE_HIGH then (page_address(sg->page) +
>> sg->offset) is guaranteed to be directly accessible for the entire
>> sg->length on all architectures, right? This seems to be the assumption
>> in the USB ub driver at least.
>>
>
> Yes, hence my initial suggestion to just do that.
>
>

Just making sure I understood it correctly. These bugs tend to be a pain
to debug so I'd prefer to get it right from the start. :)

Thanks for the help.

Rgds
Pierre


2006-01-27 18:38:11

by Pierre Ossman

Subject: Re: How to map high memory for block io

Pierre Ossman wrote:
>
> Doesn't seem like a generic solution is easily implemented. I'll start
> hacking together some way of specifying that highmem isn't supported so
> that mmc_block can indicate this to the block layer.
>

Whilst doing this I discovered that the MMC layer already does some
bounce buffer limiting. It defaults to BLK_BOUNCE_HIGH, but when
dev->dma_mask is set it uses that. The problem is that the default value
for PCI devices covers highmem transfers.

Russell, what was your plan here? Should MMC drivers set dma_mask to
BLK_BOUNCE_HIGH when not doing DMA? Or perhaps 0?

Rgds
Pierre

2006-01-27 19:43:26

by Russell King

Subject: Re: How to map high memory for block io

On Fri, Jan 27, 2006 at 01:14:15PM +0100, Pierre Ossman wrote:
> Russell, would having a "highmem not supported" flag in the host
> structure be an acceptable solution? mmc_block could then use that to
> tell the block layer that bounce buffers are needed. As for other,
> future, users they would have to take care not to give those drivers
> highmem sg lists.

The mmc_block layer only tells the block layer what the driver told it.

> The current buggy code was modeled after another MMC driver (mmci). So
> I suspect there might be more occurrences like this. Perhaps an audit
> should be added as a janitor project?

I don't see what the problem is. A sg entry is a list of struct page
pointers, an offset, and a size. As such, it can't describe a transfer
which crosses a page because such a structure does not imply that one
struct page follows another struct page.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-01-27 20:04:36

by Pierre Ossman

Subject: Re: How to map high memory for block io

Russell King wrote:
>
> I don't see what the problem is. A sg entry is a list of struct page
> pointers, an offset, and a size. As such, it can't describe a transfer
> which crosses a page because such a structure does not imply that one
> struct page follows another struct page.
>
>

If the pages do not strictly follow each other then there is a lot of
broken code in the kernel. drivers/mmc/mmci.c and drivers/block/ub.c
being two occurrences since both assume they can access the entire entry
through a single mapping.

Rgds
Pierre

2006-01-27 20:09:19

by Jens Axboe

Subject: Re: How to map high memory for block io

On Fri, Jan 27 2006, Pierre Ossman wrote:
> Russell King wrote:
> >
> > I don't see what the problem is. A sg entry is a list of struct page
> > pointers, an offset, and a size. As such, it can't describe a transfer
> > which crosses a page because such a structure does not imply that one
> > struct page follows another struct page.
> >
> >
>
> If the pages do not strictly follow each other then there is a lot of
> broken code in the kernel. drivers/mmc/mmci.c and drivers/block/ub.c
> being two occurrences since both assume they can access the entire entry
> through a single mapping.

Assuming that page idx and idx + 1 in a bio are contiguous is a very bad
idea; no one has ever made any such guarantees. Besides, if you kmap()
it, things change. The pages in a bio have nothing to do with each
other, they are usually just a consequence of how the page cache ended
up allocating. So completely random usually, once you get a little
fragmentation.

Are you sure you are reading that code correctly? Code making such
assumptions would be breaking pretty quickly.

--
Jens Axboe

2006-01-27 20:12:20

by Pierre Ossman

Subject: Re: How to map high memory for block io

Russell King wrote:

> >
> > I don't see what the problem is. A sg entry is a list of struct page
> > pointers, an offset, and a size. As such, it can't describe a transfer
> > which crosses a page because such a structure does not imply that one
> > struct page follows another struct page.
> >
> >
>

If the pages do not strictly follow each other then there is a lot of
broken code in the kernel. drivers/mmc/mmci.c and drivers/block/ub.c
being two occurrences since both assume they can access the entire entry
through a single mapping.

Rgds
Pierre


2006-01-27 20:15:06

by Russell King

Subject: Re: How to map high memory for block io

On Fri, Jan 27, 2006 at 09:04:33PM +0100, Pierre Ossman wrote:
> Russell King wrote:
> > I don't see what the problem is. A sg entry is a list of struct page
> > pointers, an offset, and a size. As such, it can't describe a transfer
> > which crosses a page because such a structure does not imply that one
> > struct page follows another struct page.
>
> If the pages do not strictly follow each other then there is a lot of
> broken code in the kernel. drivers/mmc/mmci.c and drivers/block/ub.c
> being two occurrences since both assume they can access the entire entry
> through a single mapping.

We don't make that assumption. What we do is:

- map the current sg using kmap_atomic()
- copy up to sg->length into or out of that mapping
- unmap current sg
- if we have reached the end of this sg, move on to the next

What this means is that we assume sg->offset + sg->length <= PAGE_SIZE
in all cases, which is the same assumption architecture DMA code makes.
If that's invalid, there's likely to be a lot of architecture DMA support
which is broken.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core
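
Russell's loop as a sketch, for a PIO read (names and fifo_port illustrative). Note it is only safe under the assumption he states, sg->offset + sg->length <= PAGE_SIZE, since each mapping covers one page:

static void pio_read_sg(struct scatterlist *sg, unsigned int nents,
                        unsigned int fifo_port)
{
        unsigned int i, j;
        char *buf;

        for (i = 0; i < nents; i++, sg++) {
                /* map the current sg */
                buf = kmap_atomic(sg->page, KM_BIO_SRC_IRQ) + sg->offset;

                /* copy up to sg->length into that mapping */
                for (j = 0; j < sg->length; j++)
                        buf[j] = inb(fifo_port);

                /* unmap before moving on to the next entry */
                kunmap_atomic(buf - sg->offset, KM_BIO_SRC_IRQ);
        }
}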

2006-01-27 20:16:09

by Russell King

Subject: Re: How to map high memory for block io

On Fri, Jan 27, 2006 at 09:12:16PM +0100, Pierre Ossman wrote:
> Russell King wrote:
> > > I don't see what the problem is. A sg entry is a list of struct page
> > > pointers, an offset, and a size. As such, it can't describe a transfer
> > > which crosses a page because such a structure does not imply that one
> > > struct page follows another struct page.
>
> If the pages do not strictly follow each other then there is a lot of
> broken code in the kernel. drivers/mmc/mmci.c and drivers/block/ub.c
> being two occurrences since both assume they can access the entire entry
> through a single mapping.

Why am I getting duplicates of this message? Please don't forcefully
send me duplicates - it's a waste of my bandwidth. Thanks.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-01-27 20:21:10

by Jens Axboe

Subject: Re: How to map high memory for block io

On Fri, Jan 27 2006, Russell King wrote:
> On Fri, Jan 27, 2006 at 09:04:33PM +0100, Pierre Ossman wrote:
> > Russell King wrote:
> > > I don't see what the problem is. A sg entry is a list of struct page
> > > pointers, an offset, and a size. As such, it can't describe a transfer
> > > which crosses a page because such a structure does not imply that one
> > > struct page follows another struct page.
> >
> > If the pages do not strictly follow each other then there is a lot of
> > broken code in the kernel. drivers/mmc/mmci.c and drivers/block/ub.c
> > being two occurrences since both assume they can access the entire entry
> > through a single mapping.
>
> We don't make that assumption. What we do is:
>
> - map the current sg using kmap_atomic()
> - copy up to sg->length into or out of that mapping
> - unmap current sg
> - if we have reached the end of this sg, move on to the next
>
> What this means is that we assume sg->offset + sg->length <= PAGE_SIZE
> in all cases, which is the same assumption architecture DMA code makes.
> If that's invalid, there's likely to be a lot of architecture DMA support
> which is broken.

That is definitely valid, same goes for the bio_vec structure. They map
_a_ page, after all :-)

--
Jens Axboe

2006-01-27 20:26:52

by Russell King

Subject: Re: How to map high memory for block io

On Fri, Jan 27, 2006 at 09:22:06PM +0100, Jens Axboe wrote:
> That is definitely valid, same goes for the bio_vec structure. They map
> _a_ page, after all :-)

Okay. Pierre - are you saying that you have an sg entry where
sg->offset + sg->length > PAGE_SIZE, and hence is causing you to
cross a page boundary?

(Sorry, your initial mail got lost because I've tended to be over-eager
these days with the D key on lkml over the last week or so - I've
not been around much.)

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-01-27 20:29:04

by Pierre Ossman

Subject: Re: How to map high memory for block io

Russell King wrote:
> On Fri, Jan 27, 2006 at 09:04:33PM +0100, Pierre Ossman wrote:
>
>> Russell King wrote:
>>
>>> I don't see what the problem is. A sg entry is a list of struct page
>>> pointers, an offset, and a size. As such, it can't describe a transfer
>>> which crosses a page because such a structure does not imply that one
>>> struct page follows another struct page.
>>>
>> If the pages do not strictly follow each other then there is a lot of
>> broken code in the kernel. drivers/mmc/mmci.c and drivers/block/ub.c
>> being two occurrences since both assume they can access the entire entry
>> through a single mapping.
>>
>
> We don't make that assumption. What we do is:
>
> - map the current sg using kmap_atomic()
> - copy up to sg->length into or out of that mapping
> - unmap current sg
> - if we have reached the end of this sg, move on to the next
>

This is the algorithm I've been following, but...

> What this means is that we assume sg->offset + sg->length <= PAGE_SIZE
>

This doesn't seem to be true. While tracking down this problem I added
the following:

BUG_ON((host->cur_sg->length + host->cur_sg->offset) > PAGE_SIZE);


which subsequently triggered. That was the start of my quest for how to
do highmem properly.

Rgds
Pierre


2006-01-27 20:38:33

by Pierre Ossman

Subject: Re: How to map high memory for block io

Russell King wrote:
> On Fri, Jan 27, 2006 at 09:22:06PM +0100, Jens Axboe wrote:
>
>> That is definitely valid, same goes for the bio_vec structure. They map
>> _a_ page, after all :-)
>>
>
> Okay. Pierre - are you saying that you have an sg entry where
> sg->offset + sg->length > PAGE_SIZE, and hence is causing you to
> cross a page boundary?
>
>

That, and sg->length > PAGE_SIZE. On highmem systems this causes all
kinds of funky behaviour. Usually just bogus data in the buffers though.

> (Sorry, your initial mail got lost because I've tended to be over-eager
> these days with the D key on lkml over the last week or so - I've
> not been around much.)
>
>

The background wasn't really detailed on LKML anyway. What happened was
that I got several reports of very strange behaviour, all from people
running highmem systems. When debug messages were added, printing the sg
structure entries, we discovered that the problems seemed to occur when
we were crossing pages.

You can find the thread here if you feel like a lot of reading: :)

http://list.drzeus.cx/pipermail/sdhci-devel/2006-January/000301.html

Rgds
Pierre

2006-01-27 21:59:04

by Pierre Ossman

Subject: Re: How to map high memory for block io

Pierre Ossman wrote:
> Russell King wrote:
>
>> On Fri, Jan 27, 2006 at 09:22:06PM +0100, Jens Axboe wrote:
>>
>>
>>> That is definitely valid, same goes for the bio_vec structure. They map
>>> _a_ page, after all :-)
>>>
>>>
>> Okay. Pierre - are you saying that you have an sg entry where
>> sg->offset + sg->length > PAGE_SIZE, and hence is causing you to
>> cross a page boundary?
>>
>>
>>
>
> That, and sg->length > PAGE_SIZE. On highmem systems this causes all
> kinds of funky behaviour. Usually just bogus data in the buffers though.
>
>

Test done here, few minutes ago. Added this to the wbsd driver in its
kmap routine:

        if ((host->cur_sg->offset + host->cur_sg->length) > PAGE_SIZE)
                printk(KERN_DEBUG "wbsd: Big sg: %d, %d\n",
                       host->cur_sg->offset, host->cur_sg->length);

got:

[17385.425389] wbsd: Big sg: 0, 8192
[17385.436849] wbsd: Big sg: 0, 7168
[17385.436859] wbsd: Big sg: 0, 7168
[17385.454029] wbsd: Big sg: 2560, 5632
[17385.454216] wbsd: Big sg: 2560, 5632

And so on.

Rgds
Pierre

2006-01-27 22:55:09

by Russell King

Subject: Re: How to map high memory for block io

On Fri, Jan 27, 2006 at 10:58:59PM +0100, Pierre Ossman wrote:
> Test done here, few minutes ago. Added this to the wbsd driver in its
> kmap routine:
>
> if ((host->cur_sg->offset + host->cur_sg->length) > PAGE_SIZE)
> printk(KERN_DEBUG "wbsd: Big sg: %d, %d\n",
> host->cur_sg->offset, host->cur_sg->length);
>
> got:
>
> [17385.425389] wbsd: Big sg: 0, 8192
> [17385.436849] wbsd: Big sg: 0, 7168
> [17385.436859] wbsd: Big sg: 0, 7168
> [17385.454029] wbsd: Big sg: 2560, 5632
> [17385.454216] wbsd: Big sg: 2560, 5632

Jens - what's going on? These look like invalid sg entries to me.

If they are supposed to be like that, there will be additional problems
for block drivers ensuring cache coherency on PIO.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-01-28 19:17:10

by Jens Axboe

Subject: Re: How to map high memory for block io

On Fri, Jan 27 2006, Russell King wrote:
> On Fri, Jan 27, 2006 at 10:58:59PM +0100, Pierre Ossman wrote:
> > Test done here, few minutes ago. Added this to the wbsd driver in its
> > kmap routine:
> >
> > if ((host->cur_sg->offset + host->cur_sg->length) > PAGE_SIZE)
> > printk(KERN_DEBUG "wbsd: Big sg: %d, %d\n",
> > host->cur_sg->offset, host->cur_sg->length);
> >
> > got:
> >
> > [17385.425389] wbsd: Big sg: 0, 8192
> > [17385.436849] wbsd: Big sg: 0, 7168
> > [17385.436859] wbsd: Big sg: 0, 7168
> > [17385.454029] wbsd: Big sg: 2560, 5632
> > [17385.454216] wbsd: Big sg: 2560, 5632
>
> Jens - what's going on? These look like invalid sg entries to me.
>
> If they are supposed to be like that, there will be additional problems
> for block drivers ensuring cache coherency on PIO.

No freaking idea, must be coming out of the pci dma mapping. The IOMMU
doing funky stuff? How are these sg lists mapped?

--
Jens Axboe

2006-01-28 19:33:05

by Pierre Ossman

Subject: Re: How to map high memory for block io

Jens Axboe wrote:
> On Fri, Jan 27 2006, Russell King wrote:
>
>> On Fri, Jan 27, 2006 at 10:58:59PM +0100, Pierre Ossman wrote:
>>
>>> Test done here, few minutes ago. Added this to the wbsd driver in its
>>> kmap routine:
>>>
>>> if ((host->cur_sg->offset + host->cur_sg->length) > PAGE_SIZE)
>>> printk(KERN_DEBUG "wbsd: Big sg: %d, %d\n",
>>> host->cur_sg->offset, host->cur_sg->length);
>>>
>>> got:
>>>
>>> [17385.425389] wbsd: Big sg: 0, 8192
>>> [17385.436849] wbsd: Big sg: 0, 7168
>>> [17385.436859] wbsd: Big sg: 0, 7168
>>> [17385.454029] wbsd: Big sg: 2560, 5632
>>> [17385.454216] wbsd: Big sg: 2560, 5632
>>>
>> Jens - what's going on? These look like invalid sg entries to me.
>>
>> If they are supposed to be like that, there will be additional problems
>> for block drivers ensuring cache coherency on PIO.
>>
>
> No freaking idea, must be coming out of the pci dma mapping. The IOMMU
> doing funky stuff? How are these sg lists mapped?
>
>

This is an ISA (i.e. platform) device, so no PCI involved. There is also
no IOMMU on this system.

As for the mapping there doesn't seem to be anything fancy about it
(this is Russell's territory so this is just my naive view of it). The
queue is set up in mmc_queue.c and the sg is mapped using
blk_rq_map_sg() in mmc_block.c.

But if sg entries are not supposed to cross pages, then I guess that
means that any transfer is limited in size by PAGE_SIZE *
min(max_phys_seg, max_hw_seg), right?

Rgds
Pierre
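
For scale, assuming 4 KiB pages and a typical limit of 128 segments: 4 KiB x 128 = 512 KiB per request, whereas a clustered segment could otherwise carry many contiguous pages in one sg entry.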

2006-01-29 15:21:33

by Jens Axboe

Subject: Re: How to map high memory for block io

On Sat, Jan 28 2006, Pierre Ossman wrote:
> Jens Axboe wrote:
> > On Fri, Jan 27 2006, Russell King wrote:
> >
> >> On Fri, Jan 27, 2006 at 10:58:59PM +0100, Pierre Ossman wrote:
> >>
> >>> Test done here, few minutes ago. Added this to the wbsd driver in its
> >>> kmap routine:
> >>>
> >>> if ((host->cur_sg->offset + host->cur_sg->length) > PAGE_SIZE)
> >>> printk(KERN_DEBUG "wbsd: Big sg: %d, %d\n",
> >>> host->cur_sg->offset, host->cur_sg->length);
> >>>
> >>> got:
> >>>
> >>> [17385.425389] wbsd: Big sg: 0, 8192
> >>> [17385.436849] wbsd: Big sg: 0, 7168
> >>> [17385.436859] wbsd: Big sg: 0, 7168
> >>> [17385.454029] wbsd: Big sg: 2560, 5632
> >>> [17385.454216] wbsd: Big sg: 2560, 5632
> >>>
> >> Jens - what's going on? These look like invalid sg entries to me.
> >>
> >> If they are supposed to be like that, there will be additional problems
> >> for block drivers ensuring cache coherency on PIO.
> >>
> >
> > No freaking idea, must be coming out of the pci dma mapping. The IOMMU
> > doing funky stuff? How are these sg lists mapped?
> >
> >
>
> This is an ISA (i.e. platform) device, so no PCI involved. There is also
> no IOMMU on this system.
>
> As for the mapping there doesn't seem to be anything fancy about it
> (this is Russell's territory so this is just my naive view of it). The
> queue is set up in mmc_queue.c and the sg is mapped using
> blk_rq_map_sg() in mmc_block.c.
>
> But if sg entries are not supposed to cross pages, then I guess that
> means that any transfer is limited in size by PAGE_SIZE *
> min(max_phys_seg, max_hw_seg), right?

Ah, you need to disable clustering to prevent that from happening! I was
confused there for a while.

--
Jens Axboe
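
Clustering here refers to blk_rq_map_sg() coalescing physically contiguous bio_vecs into a single scatterlist segment. With QUEUE_FLAG_CLUSTER cleared on the queue, each bio_vec becomes its own segment, and since a bio_vec never covers more than one page, sg->offset + sg->length <= PAGE_SIZE holds again.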

2006-01-30 07:57:58

by Pierre Ossman

Subject: Re: How to map high memory for block io

Jens Axboe wrote:
> On Sat, Jan 28 2006, Pierre Ossman wrote:
>
>> Jens Axboe wrote:
>>
>>> On Fri, Jan 27 2006, Russell King wrote:
>>>
>>>
>>>> On Fri, Jan 27, 2006 at 10:58:59PM +0100, Pierre Ossman wrote:
>>>>
>>>>
>>>>> Test done here, few minutes ago. Added this to the wbsd driver in its
>>>>> kmap routine:
>>>>>
>>>>> if ((host->cur_sg->offset + host->cur_sg->length) > PAGE_SIZE)
>>>>> printk(KERN_DEBUG "wbsd: Big sg: %d, %d\n",
>>>>> host->cur_sg->offset, host->cur_sg->length);
>>>>>
>>>>> got:
>>>>>
>>>>> [17385.425389] wbsd: Big sg: 0, 8192
>>>>> [17385.436849] wbsd: Big sg: 0, 7168
>>>>> [17385.436859] wbsd: Big sg: 0, 7168
>>>>> [17385.454029] wbsd: Big sg: 2560, 5632
>>>>> [17385.454216] wbsd: Big sg: 2560, 5632
>>>>>
>>>>>
>>>> Jens - what's going on? These look like invalid sg entries to me.
>>>>
>>>> If they are supposed to be like that, there will be additional problems
>>>> for block drivers ensuring cache coherency on PIO.
>>>>
>>>>
>>> No freaking idea, must be coming out of the pci dma mapping. The IOMMU
>>> doing funky stuff? How are these sg lists mapped?
>>>
>>>
>>>
>> This is an ISA (i.e. platform) device, so no PCI involved. There is also
>> no IOMMU on this system.
>>
>> As for the mapping there doesn't seem to be anything fancy about it
>> (this is Russell's territory so this is just my naive view of it). The
>> queue is set up in mmc_queue.c and the sg is mapped using
>> blk_rq_map_sg() in mmc_block.c.
>>
>> But if sg entries are not supposed to cross pages, then I guess that
>> means that any transfer is limited in size by PAGE_SIZE *
>> min(max_phys_seg, max_hw_seg), right?
>>
>
> Ah, you need to disable clustering to prevent that from happening! I was
> confused there for a while.
>
>

And which is the lesser evil, highmem bounce buffers or disabling
clustering? I'd probably vote for the former since the MMC overhead can
be quite large.

(Readding Russell as cc since he seems to have gotten lost somewhere)

2006-01-30 08:07:22

by Jens Axboe

Subject: Re: How to map high memory for block io

On Mon, Jan 30 2006, Pierre Ossman wrote:
> Jens Axboe wrote:
> > On Sat, Jan 28 2006, Pierre Ossman wrote:
> >
> >> Jens Axboe wrote:
> >>
> >>> On Fri, Jan 27 2006, Russell King wrote:
> >>>
> >>>
> >>>> On Fri, Jan 27, 2006 at 10:58:59PM +0100, Pierre Ossman wrote:
> >>>>
> >>>>
> >>>>> Test done here, few minutes ago. Added this to the wbsd driver in its
> >>>>> kmap routine:
> >>>>>
> >>>>> if ((host->cur_sg->offset + host->cur_sg->length) > PAGE_SIZE)
> >>>>> printk(KERN_DEBUG "wbsd: Big sg: %d, %d\n",
> >>>>> host->cur_sg->offset, host->cur_sg->length);
> >>>>>
> >>>>> got:
> >>>>>
> >>>>> [17385.425389] wbsd: Big sg: 0, 8192
> >>>>> [17385.436849] wbsd: Big sg: 0, 7168
> >>>>> [17385.436859] wbsd: Big sg: 0, 7168
> >>>>> [17385.454029] wbsd: Big sg: 2560, 5632
> >>>>> [17385.454216] wbsd: Big sg: 2560, 5632
> >>>>>
> >>>>>
> >>>> Jens - what's going on? These look like invalid sg entries to me.
> >>>>
> >>>> If they are supposed to be like that, there will be additional problems
> >>>> for block drivers ensuring cache coherency on PIO.
> >>>>
> >>>>
> >>> No freaking idea, must be coming out of the pci dma mapping. The IOMMU
> >>> doing funky stuff? How are these sg lists mapped?
> >>>
> >>>
> >>>
> >> This is an ISA (i.e. platform) device, so no PCI involved. There is also
> >> no IOMMU on this system.
> >>
> >> As for the mapping there doesn't seem to be anything fancy about it
> >> (this is Russell's territory so this is just my naive view of it). The
> >> queue is set up in mmc_queue.c and the sg is mapped using
> >> blk_rq_map_sg() in mmc_block.c.
> >>
> >> But if sg entries are not supposed to cross pages, then I guess that
> >> means that any transfer is limited in size by PAGE_SIZE *
> >> min(max_phys_seg, max_hw_seg), right?
> >>
> >
> > Ah, you need to disable clustering to prevent that from happening! I was
> > confused there for a while.
> >
> >
>
> And which is the lesser evil, highmem bounce buffers or disabling
> clustering? I'd probably vote for the former since the MMC overhead can
> be quite large.

Disabling clustering is by far the least expensive way to accomplish it.

--
Jens Axboe

2006-01-31 18:39:09

by Pierre Ossman

Subject: Re: How to map high memory for block io

Jens Axboe wrote:
> On Mon, Jan 30 2006, Pierre Ossman wrote:
>
>> Jens Axboe wrote:
>>
>>>
>>> Ah, you need to disable clustering to prevent that from happening! I was
>>> confused there for a while.
>>>
>>>
>>>
>> And which is the lesser evil, highmem bounce buffers or disabling
>> clustering? I'd probably vote for the former since the MMC overhead can
>> be quite large.
>>
>
> Disabling clustering is by far the least expensive way to accomplish it.
>
>

Russell, what's your view on this? And how should we handle it with
regard to MMC drivers?

Rgds
Pierre

2006-03-01 23:29:31

by Russell King

Subject: Re: How to map high memory for block io

On Tue, Jan 31, 2006 at 07:39:02PM +0100, Pierre Ossman wrote:
> Jens Axboe wrote:
> > On Mon, Jan 30 2006, Pierre Ossman wrote:
> >
> >> Jens Axboe wrote:
> >>
> >>>
> >>> Ah, you need to disable clustering to prevent that from happening! I was
> >>> confused there for a while.
> >>>
> >>>
> >>>
> >> And which is the lesser evil, highmem bounce buffers or disabling
> >> clustering? I'd probably vote for the former since the MMC overhead can
> >> be quite large.
> >>
> >
> > Disabling clustering is by far the least expensive way to accomplish it.
> >
> >
>
> Russell, what's your view on this? And how should we handle it with
> regard to MMC drivers?

Okay, I've hit this same problem (but in a slightly different way) with
mmci.c. The way I'm proposing to fix this for mmci is to introduce a
new capability which says "clustering is supported by this driver."

I'm unconvinced that we can safely fiddle with the queue's flags once
the queue is in use, hence why I've gone for the init-time only solution.
Maybe Jens can comment on that?

(The side effect of this patch is that everyone ends up with clustering
disabled until they explicitly update their driver to enable it - which
results in us defaulting to a safe operation mode.)

diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x '*.orig' -x '*.rej' -x .git a/drivers/mmc/mmc_queue.c b/drivers/mmc/mmc_queue.c
--- a/drivers/mmc/mmc_queue.c	Sun Nov  6 22:16:40 2005
+++ b/drivers/mmc/mmc_queue.c	Wed Mar  1 23:18:33 2006
@@ -144,6 +144,14 @@ int mmc_init_queue(struct mmc_queue *mq,
 	blk_queue_max_hw_segments(mq->queue, host->max_hw_segs);
 	blk_queue_max_segment_size(mq->queue, host->max_seg_size);
 
+	/*
+	 * If the host does not support clustered requests, tell BIO.
+	 * It is not clear when we can alter this, so only do it at
+	 * initialisation time, before the queue is in use.
+	 */
+	if (!(host->caps & MMC_CAP_CLUSTER))
+		mq->queue->queue_flags &= ~(1 << QUEUE_FLAG_CLUSTER);
+
 	mq->queue->queuedata = mq;
 	mq->req = NULL;
 
diff -up -x BitKeeper -x ChangeSet -x SCCS -x _xlk -x '*.orig' -x '*.rej' -x .git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
--- a/include/linux/mmc/host.h	Sun Nov  6 22:20:12 2005
+++ b/include/linux/mmc/host.h	Wed Mar  1 23:10:18 2006
@@ -85,6 +85,7 @@ struct mmc_host {
 	unsigned long		caps;		/* Host capabilities */
 
 #define MMC_CAP_4_BIT_DATA	(1 << 0)	/* Can the host do 4 bit transfers */
+#define MMC_CAP_CLUSTER		(1 << 1)	/* host can support clustered BIOs */
 
 /* host specific block data */
 	unsigned int		max_seg_size;	/* see blk_queue_max_segment_size */


--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-03-02 07:21:51

by Jens Axboe

Subject: Re: How to map high memory for block io

On Wed, Mar 01 2006, Russell King wrote:
> On Tue, Jan 31, 2006 at 07:39:02PM +0100, Pierre Ossman wrote:
> > Jens Axboe wrote:
> > > On Mon, Jan 30 2006, Pierre Ossman wrote:
> > >
> > >> Jens Axboe wrote:
> > >>
> > >>>
> > >>> Ah, you need to disable clustering to prevent that from happening! I was
> > >>> confused there for a while.
> > >>>
> > >>>
> > >>>
> > >> And which is the lesser evil, highmem bounce buffers or disabling
> > >> clustering? I'd probably vote for the former since the MMC overhead can
> > >> be quite large.
> > >>
> > >
> > > Disabling clustering is by far the least expensive way to accomplish it.
> > >
> > >
> >
> > Russell, what's your view on this? And how should we handle it with
> > regard to MMC drivers?
>
> Okay, I've hit this same problem (but in a slightly different way) with
> mmci.c. The way I'm proposing to fix this for mmci is to introduce a
> new capability which says "clustering is supported by this driver."
>
> I'm unconvinced that we can safely fiddle with the queue's flags once
> the queue is in use, hence why I've gone for the init-time only solution.
> Maybe Jens can comment on that?

You can set it anytime, basically, provided you use the atomic bit
operations on it. Of course you may be left with clustered and
non-clustered requests in the queue until everything has drained out,
but I doubt that would be a problem :-)

--
Jens Axboe
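
A sketch of the atomic-bit variant Jens describes (helper name illustrative; whether it is sane to flip this on a live queue is exactly the open question above):

static void my_set_clustering(request_queue_t *q, int on)
{
        if (on)
                set_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
        else
                clear_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
}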

2006-03-02 07:26:53

by Pierre Ossman

Subject: Re: How to map high memory for block io

Russell King wrote:
> On Tue, Jan 31, 2006 at 07:39:02PM +0100, Pierre Ossman wrote:
>
>> Jens Axboe wrote:
>>
>>> On Mon, Jan 30 2006, Pierre Ossman wrote:
>>>
>>>
>>>> Jens Axboe wrote:
>>>>
>>>>
>>>>> Ah, you need to disable clustering to prevent that from happening! I was
>>>>> confused there for a while.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> And which is the lesser evil, highmem bounce buffers or disabling
>>>> clustering? I'd probably vote for the former since the MMC overhead can
>>>> be quite large.
>>>>
>>>>
>>> Disabling clustering is by far the least expensive way to accomplish it.
>>>
>>>
>>>
>> Russell, what's your view on this? And how should we handle it with
>> regard to MMC drivers?
>>
>
> Okay, I've hit this same problem (but in a slightly different way) with
> mmci.c. The way I'm proposing to fix this for mmci is to introduce a
> new capability which says "clustering is supported by this driver."
>
>

This will decrease performance more than necessary for drivers that can
do clustering, just not in highmem. So what about another flag that says
"highmem is supported by this driver"?

Rgds
Pierre

2006-03-02 09:42:01

by Russell King

Subject: Re: How to map high memory for block io

On Thu, Mar 02, 2006 at 08:26:50AM +0100, Pierre Ossman wrote:
> Russell King wrote:
> > Okay, I've hit this same problem (but in a slightly different way) with
> > mmci.c. The way I'm proposing to fix this for mmci is to introduce a
> > new capability which says "clustering is supported by this driver."
>
> This will decrease performance more than necessary for drivers that can
> do clustering, just not in highmem. So what about another flag that says
> "highmem is supported by this driver"?

I think you're asking Jens that question - I know of no way to tell
the block layer that clustering is fine for normal but not highmem.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-03-02 09:52:03

by Pierre Ossman

Subject: Re: How to map high memory for block io

Russell King wrote:
> On Thu, Mar 02, 2006 at 08:26:50AM +0100, Pierre Ossman wrote:
>
>> Russell King wrote:
>>
>>> Okay, I've hit this same problem (but in a slightly different way) with
>>> mmci.c. The way I'm proposing to fix this for mmci is to introduce a
>>> new capability which says "clustering is supported by this driver."
>>>
>> This will decrease performance more than necessary for drivers that can
>> do clustering, just not in highmem. So what about another flag that says
>> "highmem is supported by this driver"?
>>
>
> I think you're asking Jens that question - I know of no way to tell
> the block layer that clustering is fine for normal but not highmem.
>
>

That wasn't what I meant. What I was referring to was disabling highmem
altogether, the way that is done now through looking at the dma mask.

Rgds
Pierre

2006-03-02 10:04:38

by Russell King

Subject: Re: How to map high memory for block io

On Thu, Mar 02, 2006 at 10:52:04AM +0100, Pierre Ossman wrote:
> Russell King wrote:
> > I think you're asking Jens that question - I know of no way to tell
> > the block layer that clustering is fine for normal but not highmem.
>
> That wasn't what I meant. What I was referring to was disabling highmem
> altogether, the way that is done now through looking at the dma mask.

You need to set your struct device's dma_mask appropriately:

        u64 limit = BLK_BOUNCE_HIGH;

        if (host->dev->dma_mask && *host->dev->dma_mask)
                limit = *host->dev->dma_mask;

        blk_queue_bounce_limit(mq->queue, limit);

Hence, if dma_mask is a NULL pointer or zero, highmem will be bounced.
Neither PNP nor your platform device sets dma_mask, so highmem will
always be bounced in the case of wbsd - which from what you write above
is what you require anyway.

Note: The host can't reach the queue itself because the queues are
created dynamically - it doesn't know when the queue is created or
destroyed or which request comes from which queue. I'd also guess that
randomly changing the bounce limit will probably end up with a random
selection of requests which have been bounced and those which haven't
hitting the driver.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2006-03-02 10:26:16

by Pierre Ossman

Subject: Re: How to map high memory for block io

Russell King wrote:
> On Thu, Mar 02, 2006 at 10:52:04AM +0100, Pierre Ossman wrote:
>
>> Russell King wrote:
>>
>>> I think you're asking Jens that question - I know of no way to tell
>>> the block layer that clustering is fine for normal but not highmem.
>>>
>> That wasn't what I meant. What I was referring to was disabling highmem
>> altogether, the way that is done now through looking at the dma mask.
>>
>
> You need to set your struct device's dma_mask appropriately:
>
> u64 limit = BLK_BOUNCE_HIGH;
>
> if (host->dev->dma_mask && *host->dev->dma_mask)
> limit = *host->dev->dma_mask;
>
> blk_queue_bounce_limit(mq->queue, limit);
>
> Hence, if dma_mask is a NULL pointer or zero, highmem will be bounced.
>

This I know. My beef is the readability of:

        if (do_dma)
                *dev->dma_mask = DEVICE_DMA_MASK;
        else
                *dev->dma_mask = 0;

I.e. we use dma_mask even though we don't do DMA.

> Neither PNP nor your platform device sets dma_mask, so highmem will
> always be bounced in the case of wbsd - which from what you write above
> is what you require anyway.
>
>

wbsd isn't the issue, sdhci is. PCI sets a 32-bit mask by default so I
end up with highmem pages even when I'm not doing DMA.

Rgds
Pierre

2006-03-02 11:45:33

by Russell King

Subject: Re: How to map high memory for block io

On Thu, Mar 02, 2006 at 11:26:13AM +0100, Pierre Ossman wrote:
> This I know. My beef is the readability of:
>
> if (do_dma)
> *dev->dma_mask = DEVICE_DMA_MASK;
> else
> *dev->dma_mask = 0;
>
> I.e. we use dma_mask even though we don't do DMA.

Not a lot we can do about the readability issue - we end up with code
like that in some layer of the MMC. If it's in mmc_queue, it might be:

        u64 limit = BLK_BOUNCE_HIGH;

        if (host->caps & MMC_CAP_DMA &&
            host->dev->dma_mask && *host->dev->dma_mask)
                limit = *host->dev->dma_mask;

which is equally as (un)readable as your code.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of: 2.6 Serial core

2007-01-30 20:41:50

by Pierre Ossman

Subject: Re: How to map high memory for block io

Russell King wrote:
> Okay, I've hit this same problem (but in a slightly different way) with
> mmci.c. The way I'm proposing to fix this for mmci is to introduce a
> new capability which says "clustering is supported by this driver."
>
> I'm unconvinced that we can safely fiddle with the queue's flags once
> the queue is in use, hence why I've gone for the init-time only solution.
> Maybe Jens can comment on that?
>
> (The side effect of this patch is that everyone ends up with clustering
> disabled until they explicitly update their driver to enable it - which
> results in us defaulting to a safe operation mode.)
>
>

Ok, time to warm up this old chestnut. I "solved" my immediate problem
with wbsd, and this got forgotten. AFAICT, mmci.c and the rest of the
lot are still broken in the sense that they use kmap but exceed page
limits (which happens to work on non-highmem pages).

I think the right solution is to let them use page_address() instead.
Would that be correct?

Rgds

--
-- Pierre Ossman

Linux kernel, MMC maintainer http://www.kernel.org
PulseAudio, core developer http://pulseaudio.org
rdesktop, core developer http://www.rdesktop.org