Hi!
I've just upgraded the system to 2.6.5-rc2-bk6, and I'm using
dm-crypt. It's a heavily used server, with 20-30 Mbit/s of traffic on
the wire on average, 24/7, and I just noticed that the load is very
high. Every 4-5 seconds pdflush takes a lot of CPU... Is this
intentional? I've found a similar question on kerneltrap
(http://kerneltrap.org/node/view/2756), but haven't found a solution
yet. I'm just wondering whether it's a problem or the normal
behavior. It's a 1.8 GHz P4 with 1 GB RAM and highmem enabled, using
256-bit AES through dm-crypt.
Regards,
--
Zoltan NAGY,
Network Administrator
On Monday 29 March 2004 16:50, Zoltan NAGY wrote:
> I've just upgraded the system to 2.6.5-rc2-bk6, and I'm using
> dm-crypt. [...] In every 4-5 sec pdflush takes a lot of cpu... Is this
> intentional? [...] It's a 1.8 P4 with 1G ram and highmem enabled, with 256 bit
> aes thru dm-crypt.
Same here (Duron 1.2 GHz, 512 MB RAM, IDE disks on VIA and Promise controllers),
and it doesn't matter whether I'm using cryptoloop or dm-crypt.
The system hangs for about one second (xmms stops playing, xterms don't
respond to key presses), and then everything is back to normal until the next
hang happens a few seconds later.
P.S. I'm not sure (I haven't checked), but AFAIK this started happening
with the 2.6.3 kernel.
--
Arkadiusz Miśkiewicz CS at FoE, Wroclaw University of Technology
arekm.pld-linux.org, 1024/3DB19BBD, JID: arekm.jabber.org, PLD/Linux
On Mon, Mar 29 2004, Zoltan NAGY wrote:
> I've just upgraded the system to 2.6.5-rc2-bk6, and I'm using
> dm-crypt. [...] In every 4-5 sec pdflush takes a lot of cpu... Is this
> intentional?
Try the -mm kernels instead; they should have much better behaviour for
pdflush/dm interactions.
--
Jens Axboe
Jens Axboe wrote:
> On Mon, Mar 29 2004, Zoltan NAGY wrote:
> > [...] In every 4-5 sec pdflush takes a lot of cpu... Is this
> > intentional? [...]
>
> Try the -mm kernels instead; they should have much better behaviour for
> pdflush/dm interactions.
How come? Isn't this problem just "gee, we have a lot of stuff to encrypt
during writeback"? If so, then it should be sufficient to poke a hole in
the encryption loop?
--- 25/drivers/md/dm-crypt.c~a Mon Mar 29 16:11:49 2004
+++ 25-akpm/drivers/md/dm-crypt.c Mon Mar 29 16:11:56 2004
@@ -669,6 +669,7 @@ static int crypt_map(struct dm_target *t
 		/* out of memory -> run queues */
 		if (remaining)
 			blk_congestion_wait(bio_data_dir(clone), HZ/100);
+		cond_resched();
 	}
 
 	/* drop reference, clones could have returned before we reach this */
_
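For readers outside the kernel, a rough userspace analogue of the "hole" in the loop is a voluntary yield point inside a long CPU-bound loop. This is an illustration only, not dm-crypt code: `toy_encrypt_block()` and `encrypt_with_yield()` are hypothetical names, the XOR "cipher" is a stand-in with no cryptographic value, and `sched_yield()` differs from `cond_resched()` in that it always asks the scheduler to run something else, whereas `cond_resched()` only reschedules when a reschedule is already pending.

```c
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-block cipher: XOR with a key byte.
 * NOT real encryption; it only gives the loop CPU work to do. */
static void toy_encrypt_block(uint8_t *block, size_t len, uint8_t key)
{
    for (size_t i = 0; i < len; i++)
        block[i] ^= key;
}

/* Encrypt `nblocks` blocks of `block_size` bytes in place, offering the
 * CPU back to the scheduler every `yield_every` blocks (0 = never).
 * Returns the number of yield points taken. */
static size_t encrypt_with_yield(uint8_t *buf, size_t nblocks,
                                 size_t block_size, uint8_t key,
                                 size_t yield_every)
{
    size_t yields = 0;

    for (size_t b = 0; b < nblocks; b++) {
        toy_encrypt_block(buf + b * block_size, block_size, key);

        /* The "hole": a voluntary preemption point, playing the role
         * cond_resched() plays in the patch above. */
        if (yield_every && (b + 1) % yield_every == 0) {
            sched_yield();
            yields++;
        }
    }
    return yields;
}
```

Without the yield point, the loop only gives up the CPU when the scheduler preempts it; with it, other runnable tasks (xmms, xterms) get a chance every few blocks.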
On Mon, Mar 29 2004, Andrew Morton wrote:
> How come? Isn't this problem just "gee, we have a lot of stuff to encrypt
> during writeback"? If so, then it should be sufficient to poke a hole in
> the encryption loop?
If that is the problem, then yeah, that'd work. I was assuming the 'load'
was just I/O load and pdflush got stuck on it.
> --- 25/drivers/md/dm-crypt.c~a Mon Mar 29 16:11:49 2004
> +++ 25-akpm/drivers/md/dm-crypt.c Mon Mar 29 16:11:56 2004
> @@ -669,6 +669,7 @@ static int crypt_map(struct dm_target *t
>  		/* out of memory -> run queues */
>  		if (remaining)
>  			blk_congestion_wait(bio_data_dir(clone), HZ/100);
> +		cond_resched();
>  	}
Reminds me, we have to kill that blk_congestion_wait() stuff too.
--
Jens Axboe
On Mon, 29.03.2004 at 16:12 -0800, Andrew Morton wrote:
> How come? Isn't this problem just "gee, we have a lot of stuff to encrypt
> during writeback"? If so, then it should be sufficient to poke a hole in
> the encryption loop?
>
> --- 25/drivers/md/dm-crypt.c~a Mon Mar 29 16:11:49 2004
> +++ 25-akpm/drivers/md/dm-crypt.c Mon Mar 29 16:11:56 2004
> @@ -669,6 +669,7 @@ static int crypt_map(struct dm_target *t
>  		/* out of memory -> run queues */
>  		if (remaining)
>  			blk_congestion_wait(bio_data_dir(clone), HZ/100);
> +		cond_resched();
>  	}
>  
>  	/* drop reference, clones could have returned before we reach this */
cryptoapi already does this (a cond_resched()) after every block. It also
happens with preemption enabled. I got feedback from a person who said that
renicing pdflush to 0 helped. So it looks like the CPU scheduler doesn't want
to schedule pdflush away. Hmm.
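Renicing pdflush to 0 by hand, as the person mentioned above did, amounts to calling `setpriority()` on each pdflush thread, the same thing `renice 0 -p <pid>` does. A minimal sketch, assuming a Linux `/proc` whose `/proc/<pid>/stat` lines look like `123 (pdflush) S ...`; the helper names `renice_pid()`, `read_comm()` and `renice_by_name()` are hypothetical. Since pdflush runs as root, this has to be run as root as well.

```c
#include <ctype.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Set the nice value of one process, like `renice <nice> -p <pid>`.
 * pid 0 means the calling process. Returns 0, or -1 with errno set. */
static int renice_pid(pid_t pid, int nice_val)
{
    return setpriority(PRIO_PROCESS, (id_t)pid, nice_val);
}

/* Copy the command name of `pid` into `out`: the text between the first
 * '(' and the last ')' of /proc/<pid>/stat. Returns 0 or -1. */
static int read_comm(pid_t pid, char *out, size_t outlen)
{
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *got = fgets(line, sizeof line, f);
    fclose(f);
    if (got == NULL)
        return -1;

    char *lp = strchr(line, '(');
    char *rp = strrchr(line, ')');
    if (lp == NULL || rp == NULL || rp <= lp || outlen == 0)
        return -1;

    size_t n = (size_t)(rp - lp - 1);
    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, lp + 1, n);
    out[n] = '\0';
    return 0;
}

/* Renice every process whose command name is exactly `name`.
 * Returns how many were reniced, or -1 if /proc cannot be opened. */
static int renice_by_name(const char *name, int nice_val)
{
    DIR *d = opendir("/proc");
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue; /* skip "self", "sys", ... */
        pid_t pid = (pid_t)strtol(de->d_name, NULL, 10);
        char comm[64];
        if (read_comm(pid, comm, sizeof comm) == 0 &&
            strcmp(comm, name) == 0 &&
            renice_pid(pid, nice_val) == 0)
            count++;
    }
    closedir(d);
    return count;
}
```

Run as root, `renice_by_name("pdflush", 0)` would move every pdflush thread from -10 to 0; on a kernel with no pdflush threads it simply returns 0.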
Christophe Saout wrote:
>
> cryptoapi already does this (a cond_resched()) after every block. It also
> happens with preemption enabled. I got feedback from a person who said that
> renicing pdflush to 0 helped. So it looks like the CPU scheduler doesn't want
> to schedule pdflush away. Hmm.
Oh, OK. Since pdflush was converted to use the kthread stuff, it has been
running at keventd's nice level of -10. You can probably renice it by hand.
diff -puN mm/pdflush.c~pdflush-nice-0 mm/pdflush.c
--- 25/mm/pdflush.c~pdflush-nice-0 2004-03-30 01:59:17.795116816 -0800
+++ 25-akpm/mm/pdflush.c 2004-03-30 02:02:29.865917616 -0800
@@ -177,6 +177,12 @@ static int __pdflush(struct pdflush_work
 static int pdflush(void *dummy)
 {
 	struct pdflush_work my_work;
+
+	/*
+	 * pdflush can spend a lot of time doing encryption via dm-crypt.  We
+	 * don't want to do that at keventd's priority.
+	 */
+	set_user_nice(current, 0);
 	return __pdflush(&my_work);
 }
 
_
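The patch works because a thread's nice value is inherited from whoever created it, so pdflush has to reset its own niceness as the first thing it does. A rough userspace analogue, assuming a fork()-based worker rather than a kernel thread: the child starts with the parent's nice value and calls `setpriority()` on itself before doing any work, playing the role of `set_user_nice(current, 0)`. `run_worker_at_nice()` is a hypothetical helper. Unlike the kernel's `set_user_nice()`, a userspace process without CAP_SYS_NICE (or a suitable RLIMIT_NICE) may only raise its nice value; resetting from -10 to 0, as the patch does, is always allowed.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SPAWN_FAILED (-100)

/* Fork a worker that first sets its own nice value to `worker_nice`
 * and then runs `work` (if non-NULL). Returns the nice value the worker
 * saw for itself after the reset, or SPAWN_FAILED. The parent's own
 * nice value is left untouched. */
static int run_worker_at_nice(int worker_nice, void (*work)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return SPAWN_FAILED;

    if (pid == 0) {
        /* Child: nice is inherited across fork(), so reset it first. */
        if (setpriority(PRIO_PROCESS, 0, worker_nice) != 0)
            _exit(255);

        errno = 0;
        int now = getpriority(PRIO_PROCESS, 0);
        if (now == -1 && errno != 0)
            _exit(255);

        if (work != NULL)
            work(); /* the CPU-heavy job runs at the reset priority */

        _exit(now + 20); /* map nice -20..19 onto exit codes 0..39 */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) ||
        WEXITSTATUS(status) > 39)
        return SPAWN_FAILED;
    return WEXITSTATUS(status) - 20;
}
```

The exit code is only a convenient way to report the child's nice value back to the parent; a real worker would just carry on with its job.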
Hello Jens,
> Try the -mm kernels instead; they should have much better behaviour for
> pdflush/dm interactions.
All right, with -mm5 it is _much_ better: the load is back to normal. I
suppose we should get all this stuff into mainline ASAP ;) I'm sure a
lot of people are using this combo.
Thanks for your help,
--
Zoltan NAGY,
Network Administrator