2006-11-21 20:33:31

by wbrana

[permalink] [raw]
Subject: [PATCH] snd-hda-intel: fix insufficient memory

Module allocates insufficient memory for multichannel and high quality
audio (96 kHz, 24 bit)
Patch for 2.6.19-* changes default/maximal size from 64/128 to 256/4096 kB.

--- sound/pci/hda/hda_intel.c.orig 2006-09-29 13:40:36.000000000 +0200
+++ sound/pci/hda/hda_intel.c 2006-11-05 16:45:13.000000000 +0100
@@ -1270,5 +1270,5 @@
snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV,
snd_dma_pci_data(chip->pci),
- 1024 * 64, 1024 * 128);
+ 1024 * 256, 1024 * 4096);
chip->pcm[pcm_dev] = pcm;
if (chip->pcm_devs < pcm_dev + 1)


2006-11-22 06:46:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] snd-hda-intel: fix insufficient memory

On Tue, 21 Nov 2006 21:33:29 +0100
[email protected] wrote:

> Module allocates insufficient memory for multichannel and high quality
> audio (96 kHz, 24 bit)
> Patch for 2.6.19-* changes default/maximal size from 64/128 to 256/4096 kB.
>
> --- sound/pci/hda/hda_intel.c.orig 2006-09-29 13:40:36.000000000 +0200
> +++ sound/pci/hda/hda_intel.c 2006-11-05 16:45:13.000000000 +0100
> @@ -1270,5 +1270,5 @@
> snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV,
> snd_dma_pci_data(chip->pci),
> - 1024 * 64, 1024 * 128);
> + 1024 * 256, 1024 * 4096);
> chip->pcm[pcm_dev] = pcm;
> if (chip->pcm_devs < pcm_dev + 1)

This was recently increased, but not by such a large amount:

snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV,
snd_dma_pci_data(chip->pci),
1024 * 64, 1024 * 1024);

is that sufficient?

2006-11-22 15:02:14

by Eric Dumazet

[permalink] [raw]
Subject: [RCU] adds a prefetch() in rcu_do_batch()

--- linux-2.6.19-rc6/kernel/rcupdate.c 2006-11-16 05:03:40.000000000 +0100
+++ linux-2.6.19-rc6-ed/kernel/rcupdate.c 2006-11-22 15:12:09.000000000 +0100
@@ -235,12 +235,14 @@ static void rcu_do_batch(struct rcu_data

list = rdp->donelist;
while (list) {
- next = rdp->donelist = list->next;
+ next = list->next;
+ prefetch(next);
list->func(list);
list = next;
if (++count >= rdp->blimit)
break;
}
+ rdp->donelist = list;

local_irq_disable();
rdp->qlen -= count;


Attachments:
(No filename) (633.00 B)
rcu.prefetch.patch (492.00 B)
Download all attachments

2006-11-22 17:19:51

by wbrana

[permalink] [raw]
Subject: Re: [PATCH] snd-hda-intel: fix insufficient memory

Module uses shorter buffer if memory isn't sufficient.
It doesn't change size of allocated memory automatically.
User has to change it with command like this:
echo 256 > /proc/asound/card0/pcm0p/sub0/prealloc
I can change it, but many regular users don't know it.
This card is used in new PCs, which have at least 256 MB RAM.
256 kB shouldn't be too much. Whole block isn't used if it isn't needed.

On 11/22/06, Andrew Morton <[email protected]> wrote:
> On Tue, 21 Nov 2006 21:33:29 +0100
> [email protected] wrote:
>
> > Module allocates insufficient memory for multichannel and high quality
> > audio (96 kHz, 24 bit)
> > Patch for 2.6.19-* changes default/maximal size from 64/128 to 256/4096 kB.
> >
> > --- sound/pci/hda/hda_intel.c.orig 2006-09-29 13:40:36.000000000 +0200
> > +++ sound/pci/hda/hda_intel.c 2006-11-05 16:45:13.000000000 +0100
> > @@ -1270,5 +1270,5 @@
> > snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV,
> > snd_dma_pci_data(chip->pci),
> > - 1024 * 64, 1024 * 128);
> > + 1024 * 256, 1024 * 4096);
> > chip->pcm[pcm_dev] = pcm;
> > if (chip->pcm_devs < pcm_dev + 1)
>
> This was recently increased, but not by such a large amount:
>
> snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV,
> snd_dma_pci_data(chip->pci),
> 1024 * 64, 1024 * 1024);
>
> is that sufficient?
>

2006-11-22 17:48:25

by Eric Dumazet

[permalink] [raw]
Subject: [PATCH] dont insert pipe dentries into dentry_hashtable.

--- linux-2.6.19-rc6/fs/pipe.c 2006-11-22 17:33:52.000000000 +0100
+++ linux-2.6.19-rc6-ed/fs/pipe.c 2006-11-22 17:53:02.000000000 +0100
@@ -830,7 +830,14 @@ void free_pipe_info(struct inode *inode)
static struct vfsmount *pipe_mnt __read_mostly;
static int pipefs_delete_dentry(struct dentry *dentry)
{
- return 1;
+ /*
+ * At creation time, we pretended this dentry was hashed
+ * (by clearing DCACHE_UNHASHED bit in d_flags)
+ * At delete time, we restore the truth : not hashed.
+ * (so that dput() can proceed correctly)
+ */
+ dentry->d_flags |= DCACHE_UNHASHED;
+ return 0;
}

static struct dentry_operations pipefs_dentry_operations = {
@@ -891,17 +898,22 @@ struct file *create_write_pipe(void)
if (!inode)
goto err_file;

- sprintf(name, "[%lu]", inode->i_ino);
+ this.len = sprintf(name, "[%lu]", inode->i_ino);
this.name = name;
- this.len = strlen(name);
- this.hash = inode->i_ino; /* will go */
+ this.hash = 0;
err = -ENOMEM;
dentry = d_alloc(pipe_mnt->mnt_sb->s_root, &this);
if (!dentry)
goto err_inode;

dentry->d_op = &pipefs_dentry_operations;
- d_add(dentry, inode);
+ /*
+ * We dont want to publish this dentry into global dentry hash table.
+ * We pretend dentry is already hashed, by unsetting DCACHE_UNHASHED
+ * This permits a working /proc/$pid/fd/XXX on pipes
+ */
+ dentry->d_flags &= ~DCACHE_UNHASHED;
+ d_instantiate(dentry, inode);
f->f_vfsmnt = mntget(pipe_mnt);
f->f_dentry = dentry;
f->f_mapping = inode->i_mapping;


Attachments:
(No filename) (1.08 kB)
pipe_nohash_dentry.patch (1.46 kB)
Download all attachments

2006-11-22 20:04:36

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] snd-hda-intel: fix insufficient memory

On Wed, 22 Nov 2006 18:19:49 +0100
[email protected] wrote:

> Module uses shorter buffer if memory isn't sufficient.
> It doesn't change size of allocated memory automatically.
> User has to change it with command like this:
> echo 256 > /proc/asound/card0/pcm0p/sub0/prealloc
> I can change it, but many regular users don't know it.
> This card is used in new PCs, which have at least 256 MB RAM.
> 256 kB shouldn't be too much. Whole block isn't used if it isn't needed.

That didn't answer my question.

> On 11/22/06, Andrew Morton <[email protected]> wrote:
> > On Tue, 21 Nov 2006 21:33:29 +0100
> > [email protected] wrote:
> >
> > > Module allocates insufficient memory for multichannel and high quality
> > > audio (96 kHz, 24 bit)
> > > Patch for 2.6.19-* changes default/maximal size from 64/128 to 256/4096 kB.
> > >
> > > --- sound/pci/hda/hda_intel.c.orig 2006-09-29 13:40:36.000000000 +0200
> > > +++ sound/pci/hda/hda_intel.c 2006-11-05 16:45:13.000000000 +0100
> > > @@ -1270,5 +1270,5 @@
> > > snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV,
> > > snd_dma_pci_data(chip->pci),
> > > - 1024 * 64, 1024 * 128);
> > > + 1024 * 256, 1024 * 4096);
> > > chip->pcm[pcm_dev] = pcm;
> > > if (chip->pcm_devs < pcm_dev + 1)
> >
> > This was recently increased, but not by such a large amount:
> >
> > snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV,
> > snd_dma_pci_data(chip->pci),
> > 1024 * 64, 1024 * 1024);
> >
> > is that sufficient?
> >

Are the new settings of 64kb and 1MB sufficient? If not, by how much must
they be increased, and why?

2006-11-22 21:36:54

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] dont insert pipe dentries into dentry_hashtable.

On Wed, 22 Nov 2006 18:48:41 +0100
Eric Dumazet <[email protected]> wrote:

> We currently insert pipe dentries into the global dentry hashtable.
> This is *suboptimal* because there is currently no way these entries can be
> used for a lookup(). (/proc/xxx/fd/xxx uses a different mechanism). Inserting
> them in dentry hashtable slow dcache lookups.
>
>
> To let __dpath() still work correctly (ie not adding a " (deleted)") after
> dentry name, we do :
>
> - Right after d_alloc(), pretend they are hashed by clearing the
> DCACHE_UNHASHED bit.
>
> - Call d_instantiate() instead of d_add() : dentry is not inserted in hash
> table.
>
> __dpath() & friends work as intended during dentry lifetime.
>
> - At dismantle time, once dput() must clear the dentry, setting again
> DCACHE_UNHASHED bit inside the custom d_delete() function provided by pipe
> code, so that dput() can just kill_it.
>
> This patch, combined with the next one (avoid RCU for never hashed dentries)
> reduced time of { pipe(p); close(p[0]); close(p[1]);} on my UP machine
> (1.6GHz Pentium-M) from 3.23 us to 2.86 us
> (But this patch does not depend on other patches, only bench results)

The DCACHE_UNHASHED games seem hacky.

Would it be cleaner to define a new dentry.d_flags bit which can be used to
indicate that this is a hashing-not-needed dentry, and to handle that over
in dcache.c?

2006-11-22 21:40:21

by Al Viro

[permalink] [raw]
Subject: Re: [PATCH] dont insert pipe dentries into dentry_hashtable.

On Wed, Nov 22, 2006 at 01:36:18PM -0800, Andrew Morton wrote:
> The DCACHE_UNHASHED games seem hacky.
>
> Would it be cleaner to define a new dentry.d_flags bit which can be used to
> indicate that this is a hashing-not-needed dentry, and to handle that over
> in dcache.c?

Much more invasive change, actually, and on fairly hot paths.

2006-11-23 04:12:42

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] dont insert pipe dentries into dentry_hashtable.

From: Andrew Morton <[email protected]>
Date: Wed, 22 Nov 2006 13:36:18 -0800

> The DCACHE_UNHASHED games seem hacky.

Note this DCACHE_UNHASHES scheme was Al Viro's suggested
implementation :-)

2006-11-23 20:17:23

by wbrana

[permalink] [raw]
Subject: Re: [PATCH] snd-hda-intel: fix insufficient memory

On 11/22/06, Andrew Morton <[email protected]> wrote:
>
> Are the new settings of 64kb and 1MB sufficient? If not, by how much must
> they be increased, and why?
>
>
Default size should be increased to 256 kB to allow same buffer length
( 16384 frames )
with 2 and 8 channels or same buffer time with 48 kHz/16 bit and 96 kHz/32 bit.
Maximal size should be at least 4096 kB, which should't limit buffer
time with any sound
like 192 kHz/8 channels/32 bit.

2006-11-23 21:11:04

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] snd-hda-intel: fix insufficient memory

On Thu, 23 Nov 2006 21:17:21 +0100
[email protected] wrote:

> On 11/22/06, Andrew Morton <[email protected]> wrote:
> >
> > Are the new settings of 64kb and 1MB sufficient? If not, by how much must
> > they be increased, and why?
> >
> >
> Default size should be increased to 256 kB to allow same buffer length
> ( 16384 frames )
> with 2 and 8 channels or same buffer time with 48 kHz/16 bit and 96 kHz/32 bit.
> Maximal size should be at least 4096 kB, which should't limit buffer
> time with any sound
> like 192 kHz/8 channels/32 bit.

Please send a new patch, with a *full* changelog which completely describes
the need for the change.

2006-11-30 01:24:10

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RCU] adds a prefetch() in rcu_do_batch()

On Wed, Nov 22, 2006 at 04:02:29PM +0100, Eric Dumazet wrote:
> On some workloads, (for example when lot of close() syscalls are done), RCU
> qlen can be quite large, and RCU heads are no longer in cpu cache when
> rcu_do_batch() is called.
>
> This patches adds a prefetch() in rcu_do_batch() to give CPU a hint to bring
> back cache lines containing 'struct rcu_head's.
>
> Most list manipulations macros include prefetch(), but not open coded ones (at
> least with current C compilers :) )
>
> I got a nice speedup on a trivial benchmark (3.48 us per iteration instead of
> 3.95 us on a 1.6 GHz Pentium-M)
> while (1) { pipe(p); close(fd[0]); close(fd[1]);}

Interesting! How much of the speedup was due to the prefetch() and how
much to removing the extra store to rdp->donelist?

Thanx, Paul

> Signed-off-by: Eric Dumazet <[email protected]>

> --- linux-2.6.19-rc6/kernel/rcupdate.c 2006-11-16 05:03:40.000000000 +0100
> +++ linux-2.6.19-rc6-ed/kernel/rcupdate.c 2006-11-22 15:12:09.000000000 +0100
> @@ -235,12 +235,14 @@ static void rcu_do_batch(struct rcu_data
>
> list = rdp->donelist;
> while (list) {
> - next = rdp->donelist = list->next;
> + next = list->next;
> + prefetch(next);
> list->func(list);
> list = next;
> if (++count >= rdp->blimit)
> break;
> }
> + rdp->donelist = list;
>
> local_irq_disable();
> rdp->qlen -= count;

2006-11-30 08:55:37

by Eric Dumazet

[permalink] [raw]
Subject: Re: [RCU] adds a prefetch() in rcu_do_batch()

On Thursday 30 November 2006 02:25, Paul E. McKenney wrote:
> On Wed, Nov 22, 2006 at 04:02:29PM +0100, Eric Dumazet wrote:
> > On some workloads, (for example when lot of close() syscalls are done),
> > RCU qlen can be quite large, and RCU heads are no longer in cpu cache
> > when rcu_do_batch() is called.
> >
> > This patches adds a prefetch() in rcu_do_batch() to give CPU a hint to
> > bring back cache lines containing 'struct rcu_head's.
> >
> > Most list manipulations macros include prefetch(), but not open coded
> > ones (at least with current C compilers :) )
> >
> > I got a nice speedup on a trivial benchmark (3.48 us per iteration
> > instead of 3.95 us on a 1.6 GHz Pentium-M)
> > while (1) { pipe(p); close(fd[0]); close(fd[1]);}
>
> Interesting! How much of the speedup was due to the prefetch() and how
> much to removing the extra store to rdp->donelist?

I only benchmarked the prefetch() case.

Then, when cooking the patch I found I could do the rdp->donelist affectation
after the loop. I am not sure it's worth to do another benchmark for this
trivial optimization (Please dont tell me its not a valid one :) )

Eric

2006-11-30 17:14:11

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [RCU] adds a prefetch() in rcu_do_batch()

On Thu, Nov 30, 2006 at 09:55:52AM +0100, Eric Dumazet wrote:
> On Thursday 30 November 2006 02:25, Paul E. McKenney wrote:
> > On Wed, Nov 22, 2006 at 04:02:29PM +0100, Eric Dumazet wrote:
> > > On some workloads, (for example when lot of close() syscalls are done),
> > > RCU qlen can be quite large, and RCU heads are no longer in cpu cache
> > > when rcu_do_batch() is called.
> > >
> > > This patches adds a prefetch() in rcu_do_batch() to give CPU a hint to
> > > bring back cache lines containing 'struct rcu_head's.
> > >
> > > Most list manipulations macros include prefetch(), but not open coded
> > > ones (at least with current C compilers :) )
> > >
> > > I got a nice speedup on a trivial benchmark (3.48 us per iteration
> > > instead of 3.95 us on a 1.6 GHz Pentium-M)
> > > while (1) { pipe(p); close(fd[0]); close(fd[1]);}
> >
> > Interesting! How much of the speedup was due to the prefetch() and how
> > much to removing the extra store to rdp->donelist?
>
> I only benchmarked the prefetch() case.
>
> Then, when cooking the patch I found I could do the rdp->donelist affectation
> after the loop. I am not sure it's worth to do another benchmark for this
> trivial optimization (Please dont tell me its not a valid one :) )

It would be a good idea to check it out. Modern CPUs can be a bit
on the tricky side. I have seen cases where removing instructions
slowed things down. And it can't be -that- hard to run the other
two cases!

Thanx, Paul