Er, oops, I think this one's my fault.
The recent driverfs additions for SCSI also added partition handling in
driverfs. The code is slightly more invasive than it should be so the IDE
driver needs to know how to use it (which it doesn't yet). In theory there's
a NULL pointer check in driverfs_create_partitions for precisely this case,
but it looks like the IDE code is forgetting to zero out a kmalloc of a struct
gendisk somewhere (hence the 5a5a... contents). At a cursory glance, this
seems to be in ide-probe.c, so does the attached patch fix it?
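The idea is simply this (a sketch of the fix, not the actual hunk; the
exact allocation site in ide-probe.c may differ):

	/* zero the gendisk after kmalloc so stale slab poison (the
	 * 0x5a5a... pattern) never reaches the NULL pointer checks in
	 * driverfs_create_partitions() */
	gd = kmalloc(sizeof(struct gendisk), GFP_KERNEL);
	if (gd)
		memset(gd, 0, sizeof(struct gendisk));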
I'll try to reproduce, but I'm all SCSI here except for my laptop.
James
At 16:41 04/07/02, James Bottomley wrote:
>Er, oops, I think this one's my fault.
>
>The recent driverfs additions for SCSI also added partition handling in
>driverfs. The code is slightly more invasive than it should be so the IDE
>driver needs to know how to use it (which it doesn't yet). In theory there's
>a NULL pointer check in driverfs_create_partitions for precisely this case,
>but it looks like the IDE code is forgetting to zero out a kmalloc of a struct
>gendisk somewhere (hence the 5a5a... contents). At a cursory glance, this
>seems to be in ide-probe.c, so does the attached patch fix it?
>
>I'll try to reproduce, but I'm all SCSI here except for my laptop.
Your patch fixed it. Please submit to Linus!
Best regards,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
James,
The whole reason for my replacement was to add driverfs to IDE and remove
devfs and ultimately "de-gooch" the kernel. So we are nearly 100 patches
in, and the primary reason for ousting me is still a failure. NICE!
On Thu, 4 Jul 2002, James Bottomley wrote:
> Er, oops, I think this one's my fault.
>
> The recent driverfs additions for SCSI also added partition handling in
> driverfs. The code is slightly more invasive than it should be so the IDE
> driver needs to know how to use it (which it doesn't yet). In theory there's
> a NULL pointer check in driverfs_create_partitions for precisely this case,
> but it looks like the IDE code is forgetting to zero out a kmalloc of a struct
> gendisk somewhere (hence the 5a5a... contents). At a cursory glance, this
> seems to be in ide-probe.c, so does the attached patch fix it?
>
> I'll try to reproduce, but I'm all SCSI here except for my laptop.
>
> James
>
>
Andre Hedrick
LAD Storage Consulting Group
[email protected] said:
> The whole reason for my replacement was to add driverfs to IDE and
> remove devfs and ultimately "de-gooch" the kernel. So we are nearly
> 100 patches in, and the primary reason for ousting me is still a failure.
> NICE!
Well, perhaps we have slightly different agendas. I think driverfs will solve
a whole series of enumeration and mapping problems that occur in the SCSI
mid-layer and which get especially tortuous with Fibre Channel. I also think
it will help us bring the SCSI and IDE views closer together.
I persuaded Linus to put the SCSI driverfs patches in the kernel even though I
knew they touched more than SCSI (the partitions code) and were not as modular
as I would have liked. The reason is that we need to get as much visibility
on this as possible before the code freeze. I'm fully prepared to sort out
any problems with this as they arise (and indeed the panic is already fixed).
I believe it's a variation of a principle attributable to a wise Australian:
get the right solution in, even if not quite the right implementation. That
way, everyone will be extremely motivated to help produce the right
implementation.
James
On Thu, 4 Jul 2002, James Bottomley wrote:
> [email protected] said:
> > The whole reason for my replacement was to add driverfs to IDE and
> > remove devfs and ultimately "de-gooch" the kernel. So we are nearly
> > 100 patches in, and the primary reason for ousting me is still a failure.
> > NICE!
>
> Well, perhaps we have slightly different agendas. I think driverfs will solve
> a whole series of enumeration and mapping problems that occur in the SCSI
> mid-layer and which get especially tortuous with Fibre Channel. I also think
> it will help us bring the SCSI and IDE views closer together.
Well, there was this model I started before I got booted: unify the
transport layer to operate under the classic/standard CDB model. However,
this would require fixing the FSes (which are bit-buckets for data
archives, and no more than a filing cabinet with cool features) and
extending block to do nothing but translate between FS <> STORAGE.
Next was to have asymmetric transfers, because disk drives reorder to
their own desire, regardless of what the meatballs in Linux think. Linux
could vastly improve its position in storage if it did two simple things:
1) 8K writes and 64K (or larger) reads.
2) ONE maybe TWO passes on elevator operations.
Since this is falling on deaf ears in general, oh well.
Maybe you can carry the banner of sanity.
> I persuaded Linus to put the SCSI driverfs patches in the kernel even though I
> knew they touched more than SCSI (the partitions code) and were not as modular
> as I would have liked. The reason is that we need to get as much visibility
> on this as possible before the code freeze. I'm fully prepared to sort out
> any problems with this as they arise (and indeed the panic is already fixed).
Great!
I have no problems with "driverfs".
> I believe it's a variation of a principle attributable to a wise Australian:
> get the right solution in, even if not quite the right implementation. That
> way, everyone will be extremely motivated to help produce the right
> implementation.
I prefer what Col. Medberry taught about being "Perfectly Lazy!":
"Son, DO IT ONCE and ONLY ONCE, so as not to REPEAT THE SAME ERROR!!!!"
This came after taking his course for the second time, and being ripped
out of the classroom and addressed in a very stern manner.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Thu, Jul 04 2002, Andre Hedrick wrote:
> 1) 8K writes and 64K (or larger) reads.
I've heard this before, but no one seems to have tested it yet. You know,
this is a couple of lines of change in ll_rw_blk.c and blkdev.h to
support this. Any reason you haven't done that, benched, and submitted
something to that effect? I'll even walk you through the 2.5 changes
needed to do this:
blkdev.h:
unsigned short max_sectors;
change to
unsigned short max_sectors[2];
ll_rw_blk.c:
ll_back_merge_fn()
if (req->nr_sectors + bio_sectors(bio) > q->max_sectors) {
change to
if (req->nr_sectors + bio_sectors(bio) > q->max_sectors[rq_data_dir(req)]) {
Ditto for ll_front_merge_fn() and ll_merge_requests_fn(). The line in
attempt_merge() can be killed.
generic_make_request()
BUG_ON(bio_sectors(bio) > q->max_sectors);
change to
BUG_ON(bio_sectors(bio) > q->max_sectors[bio_data_dir(bio)]);
And do the trivial thing to blk_queue_max_sectors() as well. Now all you
need to do is change ide-probe.c to set the values you want.
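Pulled together, it would look something like this (untested sketch; the
two-parameter blk_queue_max_sectors() and the 64K/8K values are just one
way to do it, and the queue pointer in ide-probe.c is assumed):

ll_rw_blk.c:
	/* set both limits at once; READ (0) and WRITE (1) index the array */
	void blk_queue_max_sectors(request_queue_t *q, unsigned short max_read,
				   unsigned short max_write)
	{
		q->max_sectors[READ] = max_read;
		q->max_sectors[WRITE] = max_write;
	}

ide-probe.c:
	/* hypothetical values: 64K reads, 8K writes, in 512-byte sectors */
	blk_queue_max_sectors(&drive->queue, 128, 16);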
> 2) ONE maybe TWO passes on elevator operations.
Explain.
> Since this is falling on deaf ears in general, oh well.
How so?
--
Jens Axboe
On Fri, 5 Jul 2002, Jens Axboe wrote:
> On Thu, Jul 04 2002, Andre Hedrick wrote:
> > 1) 8K writes and 64K (or larger) reads.
>
> I've heard this before, but no one seems to have tested it yet. You know,
> this is a couple of lines of change in ll_rw_blk.c and blkdev.h to
> support this. Any reason you haven't done that, benched, and submitted
> something to that effect? I'll even walk you through the 2.5 changes
> needed to do this:
[root@localhost mnt2]# bonnie -s 256
Bonnie 1.2: File './Bonnie.1557', size: 268435456, volumes: 1
Writing with putc()... done: 21846 kB/s 98.0 %CPU
Rewriting... done: 11099 kB/s 3.1 %CPU
Writing intelligently... done: 58316 kB/s 14.0 %CPU
Reading with getc()... done: 14346 kB/s 64.6 %CPU
Reading intelligently... done: 18026 kB/s 2.0 %CPU
Seeker 1...Seeker 3...Seeker 2...start 'em...done...done...done...
---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
localh 1* 256 21846 98.0 58316 14.0 11099 3.1 14346 64.6 18026 2.0 939.3 1.4
Yeah "bonnie" is a mickey mouse benchmark.
[root@localhost mnt2]# bonnie -s 1024
Bonnie 1.2: File './Bonnie.1598', size: 1073741824, volumes: 1
Writing with putc()... done: 20760 kB/s 97.8 %CPU
Rewriting... done: 11462 kB/s 3.2 %CPU
Writing intelligently... done: 31044 kB/s 7.6 %CPU
Reading with getc()... done: 15006 kB/s 69.2 %CPU
Reading intelligently... done: 15993 kB/s 1.6 %CPU
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
---Sequential Output (nosync)--- ---Sequential Input-- --Rnd Seek-
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --04k (03)-
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
localh 1*1024 20760 97.8 31044 7.6 11462 3.2 15006 69.2 15993 1.6 159.4 0.5
Using the hardware to help us, and by working with it, one can
basically boost the writes and slash the CPU usage.
> blkdev.h:
> unsigned short max_sectors;
>
> change to
>
> unsigned short max_sectors[2];
>
> ll_rw_blk.c:
> ll_back_merge_fn()
> if (req->nr_sectors + bio_sectors(bio) > q->max_sectors) {
>
> change to
>
> if (req->nr_sectors + bio_sectors(bio) > q->max_sectors[rq_data_dir(req)]) {
>
> Ditto for ll_front_merge_fn() and ll_merge_requests_fn(). The line in
> attempt_merge() can be killed.
>
> generic_make_request()
> BUG_ON(bio_sectors(bio) > q->max_sectors);
>
> change to
>
> BUG_ON(bio_sectors(bio) > q->max_sectors[bio_data_dir(bio)]);
>
> And do the trivial thing to blk_queue_max_sectors() as well. Now all you
> need to do is change ide-probe.c to set the values you want.
>
> > 2) ONE maybe TWO passes on elevator operations.
>
> Explain.
On writes, which we restrict to be small, the ordering is almost instant;
specifically, ONE, maybe TWO passes will sort them.
Reads may need more passes, as we optimize best on big reads.
> > Since this is falling on deaf ears in general, oh well.
>
> How so?
*BLINK*
It has generally been ignored, so I am glad to see a change, thanks!
I do not do 2.5; remember, I got booted and backstabbed.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Thu, Jul 04 2002, Andre Hedrick wrote:
> On Fri, 5 Jul 2002, Jens Axboe wrote:
>
> > On Thu, Jul 04 2002, Andre Hedrick wrote:
> > > 1) 8K writes and 64K (or larger) reads.
> >
> > I've heard this before, but no one seems to have tested it yet. You know,
> > this is a couple of lines of change in ll_rw_blk.c and blkdev.h to
> > support this. Any reason you haven't done that, benched, and submitted
> > something to that effect? I'll even walk you through the 2.5 changes
> > needed to do this:
>
>
> [root@localhost mnt2]# bonnie -s 256
[snip bonnie results]
These mean nothing to me -- what are they, the baseline or the changed
kernel? Or none of the above?!
> Using the hardware to help us, and by working with it, one can
> basically boost the writes and slash the CPU usage.
You need to add some context to that statement.
> > > 2) ONE maybe TWO passes on elevator operations.
> >
> > Explain.
>
> On writes, which we restrict to be small, the ordering is almost instant;
> specifically, ONE, maybe TWO passes will sort them.
>
> Reads may need more passes, as we optimize best on big reads.
So you are saying that writes don't need to be reordered as much,
because the drive typically does that? I guess that will always be true
with write-back caching; I doubt that holds for write-through.
And I don't quite follow the number of passes you compare, passes of
what? Insert and merge are a single pass per request, tops.
--
Jens Axboe
Jens, those numbers have meaning regardless.
The simple fact that "reads" were nearly constant dictates that small
reads suffer a penalty, while small writes only suffered disk limitations
for the most part.
To prove it yourself, just set max_sectors to 16 (i.e. 8K).
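Something along these lines (assuming the stock single-limit
blk_queue_max_sectors() in 2.5; 16 sectors == 8K):

	/* clamp every request on this queue to 8K, then rerun bonnie */
	blk_queue_max_sectors(q, 16);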
The object is to get you to try it out and see, and maybe backport
something to try async I/Os based on direction for 2.4.
Cheers,
On Fri, 5 Jul 2002, Jens Axboe wrote:
> On Thu, Jul 04 2002, Andre Hedrick wrote:
> > On Fri, 5 Jul 2002, Jens Axboe wrote:
> >
> > > On Thu, Jul 04 2002, Andre Hedrick wrote:
> > > > 1) 8K writes and 64K (or larger) reads.
> > >
> > > I've heard this before, but no one seems to have tested it yet. You know,
> > > this is a couple of lines of change in ll_rw_blk.c and blkdev.h to
> > > support this. Any reason you haven't done that, benched, and submitted
> > > something to that effect? I'll even walk you through the 2.5 changes
> > > needed to do this:
> >
> >
> > [root@localhost mnt2]# bonnie -s 256
>
> [snip bonnie results]
>
> These mean nothing to me -- what are they, the baseline or the changed
> kernel? Or none of the above?!
>
> > Using the hardware to help us, and by working with it, one can
> > basically boost the writes and slash the CPU usage.
>
> You need to add some context to that statement.
>
> > > > 2) ONE maybe TWO passes on elevator operations.
> > >
> > > Explain.
> >
> > On writes, which we restrict to be small, the ordering is almost instant;
> > specifically, ONE, maybe TWO passes will sort them.
> >
> > Reads may need more passes, as we optimize best on big reads.
>
> So you are saying that writes don't need to be reordered as much,
> because the drive typically does that? I guess that will always be true
> with write-back caching; I doubt that holds for write-through.
>
> And I don't quite follow the number of passes you compare, passes of
> what? Insert and merge are a single pass per request, tops.
>
> --
> Jens Axboe
>
Andre Hedrick
LAD Storage Consulting Group