[PATCH] [scsi] Remove __GFP_DMA
After commit 821de3a27bf33f11ec878562577c586cd5f83c64, it's no longer
necessary to allocate a DMA buffer in sd.c.
Signed-off-by: Bernhard Walle <[email protected]>
---
drivers/scsi/sd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1515,7 +1515,7 @@ static int sd_revalidate_disk(struct gen
 	if (!scsi_device_online(sdp))
 		goto out;
 
-	buffer = kmalloc(SD_BUF_SIZE, GFP_KERNEL | __GFP_DMA);
+	buffer = kmalloc(SD_BUF_SIZE, GFP_KERNEL);
 	if (!buffer) {
 		sd_printk(KERN_WARNING, sdkp, "sd_revalidate_disk: Memory "
 			  "allocation failure.\n");
On Mon, 21 May 2007, Bernhard Walle wrote:
> [PATCH] [scsi] Remove __GFP_DMA
Great, that avoids a DMA kmalloc slab. Any other GFP_DMAs left in the scsi
layer?
Acked-by: Christoph Lameter <[email protected]>
On 5/23/07, Christoph Lameter <[email protected]> wrote:
> Great, that avoids a DMA kmalloc slab. Any other GFP_DMAs left in the scsi
> layer?
Yes, here is another patch:
Signed-off-by: Aubrey.Li <[email protected]>
---
drivers/scsi/aacraid/commctrl.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/aacraid/commctrl.c b/drivers/scsi/aacraid/commctrl.c
index 72b0393..405722d 100644
--- a/drivers/scsi/aacraid/commctrl.c
+++ b/drivers/scsi/aacraid/commctrl.c
@@ -580,8 +580,8 @@ static int aac_send_raw_srb(struct aac_dev* dev, void __user * arg)
 		for (i = 0; i < upsg->count; i++) {
 			u64 addr;
 			void* p;
-			/* Does this really need to be GFP_DMA? */
-			p = kmalloc(upsg->sg[i].count,GFP_KERNEL|__GFP_DMA);
+
+			p = kmalloc(upsg->sg[i].count,GFP_KERNEL);
 			if(p == 0) {
 				dprintk((KERN_DEBUG"aacraid: Could not allocate SG buffer - size = %d buffer number %d of %d\n",
 				  upsg->sg[i].count,i,upsg->count));
@@ -624,8 +624,8 @@ static int aac_send_raw_srb(struct aac_dev* dev, void __user * arg)
 		for (i = 0; i < usg->count; i++) {
 			u64 addr;
 			void* p;
-			/* Does this really need to be GFP_DMA? */
-			p = kmalloc(usg->sg[i].count,GFP_KERNEL|__GFP_DMA);
+
+			p = kmalloc(usg->sg[i].count,GFP_KERNEL);
 			if(p == 0) {
 				kfree (usg);
 				dprintk((KERN_DEBUG"aacraid: Could not allocate SG buffer - size = %d buffer number %d of %d\n",
@@ -666,8 +666,8 @@ static int aac_send_raw_srb(struct aac_dev* dev, void __user * arg)
 		for (i = 0; i < upsg->count; i++) {
 			u64 addr;
 			void* p;
-			/* Does this really need to be GFP_DMA? */
-			p = kmalloc(usg->sg[i].count,GFP_KERNEL|__GFP_DMA);
+
+			p = kmalloc(usg->sg[i].count,GFP_KERNEL);
 			if(p == 0) {
 				dprintk((KERN_DEBUG"aacraid: Could not allocate SG buffer - size = %d buffer number %d of %d\n",
 					usg->sg[i].count,i,usg->count));
--
1.5.1.1
On Wed, 2007-05-23 at 10:41 +0800, Aubrey Li wrote:
> Yes, here is another patch
I'll defer to Mark on this one. However, please remember that you can't
just blindly remove GFP_DMA ... there are some cards which require it.
Aacraid is one example ... it has a set of cards that can only DMA to 31
bits. For them, the GFP_DMA is necessary: The allocation in question
is a scatterlist, which must be within the device DMA mask.
James
On Wed, 23 May 2007, James Bottomley wrote:
> I'll defer to Mark on this one. However, please remember that you
> can't just blindly remove GFP_DMA ... there are some cards which
> require it.
>
> Aacraid is one example ... it has a set of cards that can only DMA
> to 31 bits. For them, the GFP_DMA is necessary: The allocation in
> question is a scatterlist, which must be within the device DMA mask.
a question i asked a while back, and still haven't seen an answer for
-- given this in include/linux/gfp.h:
#define GFP_DMA __GFP_DMA
is there a qualitative difference between these two macros? is there
*supposed* to be? if there isn't, one would think that just one
variation would be sufficient.
rday
--
========================================================================
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA
http://fsdev.net/wiki/index.php?title=Main_Page
========================================================================
On Wed, 2007-05-23 at 11:55 -0400, Robert P. J. Day wrote:
> a question i asked a while back, and still haven't seen an answer for
> -- given this in include/linux/gfp.h:
>
> #define GFP_DMA __GFP_DMA
>
> is there a qualitative difference between these two macros? is there
> *supposed* to be? if there isn't, one would think that just one
> variation would be sufficient.
__GFP_ are the raw GFP flags ... the GFP_ are combinations of the flags
with predefined meanings. There's no convention that you have to use
one form or the other when making combinations of the allocation flags.
Historically that's led to things like
GFP_ATOMIC | __GFP_DMA (indicating additional DMA zone to the usual
atomic flags)
and
GFP_ATOMIC | GFP_DMA (indicating same thing, but with the defined flags)
You can argue that GFP_DMA has a pretty bogus GFP meaning and should
never appear on its own, which, to my mind, makes the former usage
preferable. However, GFP_DMA has been in Linux since pretty much the
dawn of ISA drivers, so I think it's conventionally well understood.
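For reference, the relevant definitions in include/linux/gfp.h of this
era look roughly like this (paraphrased; the exact values and flag sets
vary between kernel versions):

	/* primitive modifiers */
	#define __GFP_DMA	0x01u	/* zone modifier: allocate from ZONE_DMA */
	#define __GFP_WAIT	0x10u	/* may sleep */
	#define __GFP_HIGH	0x20u	/* may use emergency pools */
	#define __GFP_IO	0x40u	/* may start physical IO */
	#define __GFP_FS	0x80u	/* may call down into the FS */

	/* predefined combinations */
	#define GFP_ATOMIC	(__GFP_HIGH)
	#define GFP_KERNEL	(__GFP_WAIT | __GFP_IO | __GFP_FS)
	#define GFP_DMA		__GFP_DMA	/* the alias in question */

So GFP_DMA and __GFP_DMA are literally the same bit; the only difference
is the naming convention.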
James
The 31 bit limit for some of these cards is a problem; we currently only
do __GFP_DMA for bounce buffer sg elements allocated for user supplied
references in ioctls.
I figure we should be using pci_alloc_consistent calls for these
allocations to more accurately acquire memory within the 31 bit limit
where necessary. Could we switch to these to remove the need for the
__GFP_DMA flag in the aacraid driver?
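An untested sketch of what that could look like (bounce_one_element and
sg_count are made-up names standing in for the aacraid code quoted
above):

	/* allocate one bounce buffer from the consistent DMA pool
	 * instead of kmalloc(GFP_KERNEL|__GFP_DMA) + pci_map_single();
	 * pci_alloc_consistent() is supposed to honour the device's
	 * DMA mask */
	static int bounce_one_element(struct pci_dev *pdev, size_t sg_count)
	{
		dma_addr_t addr;
		void *p = pci_alloc_consistent(pdev, sg_count, &addr);

		if (!p)
			return -ENOMEM;	/* nothing inside the mask */

		/* ... copy the user data into p, hand addr to the adapter ... */

		pci_free_consistent(pdev, sg_count, p, addr);
		return 0;
	}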
-- Mark
On Wed, 23 May 2007 15:17:08 -0400
"Salyzyn, Mark" <[email protected]> wrote:
> The 31 bit limit for some of these cards is a problem; we currently only
> do __GFP_DMA for bounce buffer sg elements allocated for user supplied
> references in ioctls.
>
> I figure we should be using pci_alloc_consistent calls for these
> allocations to more accurately acquire memory within the 31 bit limit
> where necessary. Could we switch to these to remove the need for the
> __GFP_DMA flag in the aacraid driver?
That didn't use to work right on the AMD boards when I tried it last, as
we ended up with a buffer that was mapped by the IOMMU for some reason
and that was not below 2GB.
Alan Cox wrote:
> That didn't use to work right on the AMD boards when I tried it last, as
> we ended up with a buffer that was mapped by the IOMMU for some reason
> and that was not below 2GB.
The physical address you mean? If that is still happening then it needs
to get fixed. The allocation should not succeed if it can't provide
memory that's inside the DMA mask for the device.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/
> The physical address you mean? If that is still happening then it needs
> to get fixed. The allocation should not succeed if it can't provide
> memory that's inside the DMA mask for the device..
But the allocation can succeed: using GFP_DMA, at least, you can do it,
since you get memory below 2^24 that you don't need to map via the IOMMU.
So, is the sequence:
p = kmalloc(upsg->sg[i].count,GFP_KERNEL);
. . .
addr = pci_map_single(dev->pdev, p, upsg->sg[i].count, data_dir);
Going to ensure that we have a 31 bit (not 32 bit) physical address?
If not, then I reject this patch. We cannot consider replacement with
pci_alloc_consistent until it works on AMD, respecting the DMA masks.
Sincerely -- Mark Salyzyn
On Thu, 2007-05-24 at 09:24 -0400, Salyzyn, Mark wrote:
> So, is the sequence:
>
> p = kmalloc(upsg->sg[i].count,GFP_KERNEL);
> . . .
> addr = pci_map_single(dev->pdev, p, upsg->sg[i].count, data_dir);
>
> Going to ensure that we have a 31 bit (not 32 bit) physical address?
No, unfortunately. Implementing kmalloc_mask() and kmalloc_dev() was
something I said I'd do ... about two years ago.
> If not, then I reject this patch. We can not consider replacement with
> pci_alloc_consistent until it works on AMD respecting the DMA masks.
It should, I believe ... x86_64 has a complex allocation scheme where
for masks < 32 bit it first tries in GFP_DMA32 and sees if it gets lucky
before falling back to GFP_DMA. i386 just goes straight to GFP_DMA.
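Roughly, as a hypothetical helper (alloc_for_mask is a made-up name;
the real logic lives in the arch dma_alloc_coherent() paths):

	/* for a mask below 32 bits: try ZONE_DMA32 first and see if we
	 * get lucky, fall back to the 16MB ZONE_DMA only if the result
	 * is out of range */
	static void *alloc_for_mask(size_t size, u64 mask, gfp_t gfp)
	{
		unsigned int order = get_order(size);
		void *p = (void *)__get_free_pages(gfp | __GFP_DMA32, order);

		if (p && virt_to_phys(p) + size - 1 <= mask)
			return p;			/* got lucky */
		free_pages((unsigned long)p, order);	/* no-op for NULL */
		return (void *)__get_free_pages(gfp | __GFP_DMA, order);
	}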
James
On Thu, 24 May 2007, Salyzyn, Mark wrote:
> So, is the sequence:
>
> p = kmalloc(upsg->sg[i].count,GFP_KERNEL);
> . . .
> addr = pci_map_single(dev->pdev, p, upsg->sg[i].count, data_dir);
>
> Going to ensure that we have a 31 bit (not 32 bit) physical address?
Only if you have less than 2G of memory. So no.
On Thu, 24 May 2007, James Bottomley wrote:
> > Going to ensure that we have a 31 bit (not 32 bit) physical address?
>
> No, unfortunately. Implementing kmalloc_mask() and kmalloc_dev() was
> something I said I'd do ... about two years ago.
Tell me more about these ideas.
On Thu, 2007-05-24 at 10:00 -0700, Christoph Lameter wrote:
> Tell me more about these ideas.
Oh, it was this Kernel Summit presentation which discussed it:
http://licensing.steeleye.com/support/papers/kernel_summit_iommu.pdf
The writeup of the session is here:
http://lwn.net/Articles/144100/
The idea was basically to match an allocation to a device mask. I was
going to do a generic implementation (which would probably kmalloc,
check the physaddr and fall back to GFP_DMA if we were unlucky) but
allow the architectures to override.
However, the majority of the need (except for aacraid-like devices) was
solved by GFP_DMA32.
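A very rough sketch of that generic implementation (kmalloc_mask() and
kmalloc_dev() are only the proposed names; nothing like this is in the
tree):

	/* allocate normally, keep the buffer if its physical address
	 * happens to satisfy the mask, otherwise retry with GFP_DMA;
	 * architectures would be allowed to override this */
	static void *kmalloc_mask(size_t size, gfp_t gfp, u64 mask)
	{
		void *p = kmalloc(size, gfp);

		if (p && virt_to_phys(p) + size - 1 <= mask)
			return p;
		kfree(p);	/* kfree(NULL) is a no-op */
		return kmalloc(size, gfp | GFP_DMA);
	}

	static void *kmalloc_dev(struct device *dev, size_t size, gfp_t gfp)
	{
		return kmalloc_mask(size, gfp, *dev->dma_mask);
	}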
James
On Thu, 24 May 2007, James Bottomley wrote:
> The idea was basically to match an allocation to a device mask. I was
> going to do a generic implementation (which would probably kmalloc,
> check the physaddr and fall back to GFP_DMA if we were unlucky) but
> allow the architectures to override.
Hmmmm... We could actually implement something like it in the slab
allocators. The mask parameter would lead the allocator to check if the
objects are in a satisfactory range. If not, it could examine its partial
lists for slabs that satisfy the range. If that does not work then it
would eventually go to the page allocator to ask for a page in a fitting
range.
That won't be fast, though. How performance-sensitive are the allocations?
On Thu, 2007-05-24 at 10:22 -0700, Christoph Lameter wrote:
> Hmmmm... We could actually implement something like it in the slab
> allocators. The mask parameter would lead the allocator to check if the
> objects are in a satisfactory range. If not, it could examine its partial
> lists for slabs that satisfy the range. If that does not work then it
> would eventually go to the page allocator to ask for a page in a fitting
> range.
>
> That won't be fast, though. How performance-sensitive are the allocations?
Most of them aren't very. They're usually things that are set at driver
initialisation time (shared mailbox memory, etc.).
If the allocation is performance sensitive and has to be done at command
or ioctl submission time, we tend to feed the consistent or mapped
memory into a dma pool, which pre-allocates and solves the performance
issue.
In the aacraid case, the mbox is done at ioctl time, but isn't
incredibly performance sensitive.
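For reference, the dma pool pattern looks roughly like this in use (a
sketch; the pool name, element size and alignment are made up):

	/* set up once, at driver initialisation time */
	struct dma_pool *pool = dma_pool_create("aac_sg_pool", &pdev->dev,
						512 /* size */, 16 /* align */,
						0 /* boundary */);

	/* per command/ioctl: cheap allocation from pre-mapped memory */
	dma_addr_t dma;
	void *buf = dma_pool_alloc(pool, GFP_ATOMIC, &dma);
	if (buf) {
		/* ... hand dma to the hardware, fill buf from the CPU ... */
		dma_pool_free(pool, buf, dma);
	}

	/* teardown at driver removal */
	dma_pool_destroy(pool);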
James
On Wed, 2007-05-23 at 20:38 +0100, Alan Cox wrote:
> That didn't use to work right on the AMD boards when I tried it last, as
> we ended up with a buffer that was mapped by the IOMMU for some reason
> and that was not below 2GB.
Is that a bug in the x86_64 subsystem ... or just a bug in the HW?
I'm assuming what happened is that the GART aperture was placed above
the 2GB limit making the GART unable to act as an IOMMU for any devices
with a dma_mask < 32bit?
This does show up a problem in our dma_ functions: namely that if the
system says it has an IOMMU then we assume it's fully functional (i.e.
able to reach all addresses). There was a move afoot a while ago to
replace the single PCI_DMA_BUS_IS_PHYS macro which is used to tell the
system globally if we have an IOMMU with something like
dma_dev_uses_iommu(struct device *) which would do the same thing but on
a per-device basis. Does this need to be resurrected?
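A sketch of that per-device predicate (dma_dev_uses_iommu() was only
ever proposed, never merged; the trivial generic fallback below just
reproduces the global PCI_DMA_BUS_IS_PHYS answer):

	/* per-device query, overridable by the architecture: the x86_64
	 * GART case above would want to answer "no" for devices whose
	 * mask the aperture cannot cover */
	static inline int dma_dev_uses_iommu(struct device *dev)
	{
		return !PCI_DMA_BUS_IS_PHYS;	/* same for every device */
	}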
James