2002-11-07 17:42:13

by MdkDev

Subject: 2.5.46: ide-cd cdrecord success report


Decided to replicate Adam Kropelin's CD burning test (burn a CD while
executing 'dd if=/dev/zero of=foo bs=1M'). Didn't have any problems - I
burned a 323 MB ISO image while running the aforementioned dd command.
cdrecord reported:

Track 01: 323 of 323 MB written (fifo 100%) [buf 99%] 4.2x.
Track 01: Total bytes read/written: 339247104/339247104 (165648 sectors).
Writing time: 566.244s
Average write speed 4.0x.
Min drive buffer fill was 99%
Fixating...
Fixating time: 77.859s
cdrecord: fifo had 5344 puts and 5344 gets.
cdrecord: fifo was 0 times empty and 5186 times full, min fill was 92%.

File foo contained 7363 1 MB records.

Hardware:
CPU - AMD XP 2100+
RAM - 512 MB
MB - MSI KT3 Ultra3 (VIA KT333 chipset)
HDD - 2 IBM Deskstar IDE disks (using integrated RAID controller PDC 20276
as an ordinary ATA133 controller)
CD burner - LiteOn LTR-16101B



2002-11-07 17:57:12

by Jens Axboe

Subject: Re: 2.5.46: ide-cd cdrecord success report

On Thu, Nov 07 2002, MdkDev wrote:
>
> Decided to replicate Adam Kropelin's CD burning test (burn a CD while
> executing 'dd if=/dev/zero of=foo bs=1M'). Didn't have any problems - I
> burned a 323 MB ISO image while running the aforementioned dd command.

Cool, are you using an IDE drive as the source of the ISO?

Thanks for the report. I'd also like reports such as this one (which I
really do appreciate) to contain an opinion of how well CD recording
works on your system now as compared to before. Anything from "didn't
notice any difference" to "it's much faster, I noticed that because" and
"bah it sucks right now, ..." would be fine :)

> HDD - 2 IBM Deskstar IDE disks (using integrated RAID controller PDC 20276
> as an ordinary ATA133 controller)
> CD burner - LiteOn LTR-16101B

IDE, indeed :-)

The deadline scheduler works really well with IDE drives; SCSI tends to
still _suck_.

--
Jens Axboe

2002-11-07 18:00:37

by Adam Kropelin

Subject: Re: 2.5.46: ide-cd cdrecord success report

On Thu, Nov 07, 2002 at 07:49:01PM +0200, MdkDev wrote:
>
> Decided to replicate Adam Kropelin's CD burning test (burn a CD while
> executing 'dd if=/dev/zero of=foo bs=1M'). Didn't have any problems - I
> burned a 323 MB ISO image while running the aforementioned dd command.
> cdrecord reported:
> Track 01: 323 of 323 MB written (fifo 100%) [buf 99%] 4.2x.
> Track 01: Total bytes read/written: 339247104/339247104 (165648 sectors).
> Writing time: 566.244s
> Average write speed 4.0x.
> Min drive buffer fill was 99%
> Fixating...
> Fixating time: 77.859s
> cdrecord: fifo had 5344 puts and 5344 gets.
> cdrecord: fifo was 0 times empty and 5186 times full, min fill was 92%.
>
> File foo contained 7363 1 MB records.
>
> Hardware:
> CPU - AMD XP 2100+
> RAM - 512 MB
> MB - MSI KT3 Ultra3 (VIA KT333 chipset)
> HDD - 2 IBM Deskstar IDE disks (using integrated RAID controller PDC 20276
> as an ordinary ATA133 controller)
> CD burner - LiteOn LTR-16101B

Thanks, this is good information. Was the destination for the 'dd' and
the source CD image on the same drive? What filesystem were you using?

I notice you used a 4x writer...I'll try lowering my write speed to 4x
and see if that makes a difference. I'll also see if I can rig up an
IDE disk instead of SCSI.

--Adam

by Robinson Maureira Castillo

Subject: Re: 2.5.46: ide-cd cdrecord success report

On Thu, 7 Nov 2002, Jens Axboe wrote:

> Thanks for the report. I'd also like reports such as this one (which I
> really do appreciate) to contain an opinion of how well CD recording
> works on your system now as compared to before. Anything from "didn't
> notice any difference" to "it's much faster, I noticed that because" and
> "bah it sucks right now, ..." would be fine :)
>

OK, this is what I've done:

for i in `seq 100`; do tar jxf linux-2.5.41.tar.bz2; rm -fr linux-2.5.41; done

and started burning a 571 MB ISO image at 10x. Then I started mozilla and
opened some pages with flash and java; xmms came next, so I'm listening to
mp3s (not a single skip - it used to skip in the "Fixating" part with
ide-scsi)... ah, and I'm reading email using Evolution. Everything under
Gnome 2.0 (RH 8.0).

Track 01: 571 of 571 MB written (fifo 100%) [buf 99%] 10.6x.
Track 01: Total bytes read/written: 599654400/599654400 (292800 sectors).
Writing time: 400.083s
Average write speed 9.9x.
Min drive buffer fill was 73%
Fixating...
Fixating time: 27.454s
cdrecord: fifo had 9446 puts and 9446 gets.
cdrecord: fifo was 0 times empty and 8430 times full, min fill was 76%.

This is a lot better than with ide-scsi; the system felt smoother as well.

Specs:
Celeron 600 MHz
256 MB RAM
cheap mobo, cheap UDMA66 and UDMA33 disks
hdb: Hewlett-Packard CD-Writer Plus 9300, ATAPI CD/DVD-ROM drive

The tar was run on the same disk the ISO image is on. I'm using ext3 with
data=ordered.

The only annoying thing is that if I leave the drive without a disc, my
syslog gets flooded with "end_request: I/O error, dev hdb, sector 0"
messages, but I guess this is somewhat related to nautilus.


Best regards
--
Robinson Maureira Castillo
Asesor DAI
INACAP

2002-11-08 01:46:43

by Adam Kropelin

Subject: Re: 2.5.46: ide-cd cdrecord success report

On Thu, Nov 07, 2002 at 08:14:09PM +0200, MdkDev wrote:
>
> > Thanks, this is good information. Was the destination for the 'dd' and
> > the source CD image on the same drive? What filesystem were you using?
>
> Yes, the destination for the dd and the source for the CD image were on
> the same drive. I'm using the ext3 filesystem.
>
> > I notice you used a 4x writer...I'll try lowering my write speed to 4x
> > and see if that makes a difference. I'll also see if I can rig up an
> > IDE disk instead of SCSI.
>
> The burner is capable of writing at 16x (CD-R) or 10x (CD-RW), but the speed
> was 4x in this case because of the media. I'll try to repeat the test using
> max speed.

I've done some more testing and narrowed the problem down a little. I
switched to an IDE drive to rule out SCSI and found that the problem is
still with me: a parallel `dd` to the image source drive still kills the burn.

Then I lowered the write speed from 12x down to 4x to match yours...and
then the problem went away. A look at `vmstat 1` shows that the heavy
write load causes the read throughput to drop to about 768 KB/sec. Since
1x is roughly 150 KB/sec, that's far too slow for 12x (~1800 KB/sec) but
just about right for 4x (~600 KB/sec). Makes sense, then, that a 4x burn
survives.

It's easy to duplicate without involving cdrecord at all. Just do...

dd if=/dev/hda1 of=/dev/null bs=1M

...in parallel with...

dd if=/dev/zero of=/mnt/foo bs=1M

When you kick in the write load, the read throughput drops to < 1
MB/sec. `vmstat 1` output below shows the transition...

--Adam

   procs                      memory    swap          io     system         cpu
 r  b  w   swpd    free    buff   cache  si  so    bi    bo   in   cs us sy id
 2  0  0   4272  112684   20356   36856   0   0     0    16 1016   72  5  0 95
 0  1  1   4272  108476   23676   36856   0   0  3320     0 1067  224 13  4 83
 1  0  1   4272   96656   35580   36856   0   0 11904     0 1192  474 30 39 30
 0  1  0   4272   84836   47356   36856   0   0 11776     0 1192  463 30 42 27
 0  1  1   4272   72956   59268   36856   0   0 11904    40 1313 1139 44 38 18
 0  1  0   4272   61076   71044   36856   0   0 11776     0 1200  478 29 43 29
 0  1  0   4272   49256   82828   36856   0   0 11784     0 1199  522 39 37 24
 0  1  0   4272   37196   94732   36856   0   0 11904     0 1199  497 35 38 26
 0  1  0   4272   25256  106508   36856   0   0 11776     0 1199  512 32 43 24
 1  2  3   4692    2468   75604   90132   0 420  9608 10888 1189  721 19 75  6
 0  2  3   4692    3540   46932  117580   0   0   808 10428 1093  287 30 70  0
 0  2  3   4692    3448   42916  121624   0   0  1024  8824 1099  266 65 35  0
 0  2  3   4692    3200   35152  129692   0   0  1024  6512 1088  294 55 45  0
 0  2  4   4692    1564   24640  141696   0   0   768  7688 1091  277 44 56  0
 1  1  2   4692    3492   19044  145484   0   0  1024  5956 1084  285 69 31  0
 0  2  3   4692    3492   12020  152448   0   0   768  7192 1085  298 55 45  0
 0  2  3   4692    3432   12444  152056   0   0   676  6852 1086  266 55 45  0
 0  2  3   4692    3612   12196  152288   0   0   768  5724 1083  287 55 45  0
 0  2  3   4692    3552   12580  151900   0   0   768  7756 1088  272 65 35  0
 2  2  3   4692    3492   12840  152048   0   0  1024  5032 1195  691 66 34  0
 0  2  3   4692    3792   13304  151212   0   0   772  7452 1191  586 60 40  0

2002-11-08 02:03:49

by Andrew Morton

Subject: Re: 2.5.46: ide-cd cdrecord success report

Adam Kropelin wrote:
>
> ..
> dd if=/dev/hda1 of=/dev/null bs=1M
>
> ...in parallel with...
>
> dd if=/dev/zero of=/mnt/foo bs=1M
>
> When you kick in the write load, the read throughput drops to < 1
> MB/sec. `vmstat 1` output below shows the transition...
>
> --Adam
>
>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd    free    buff   cache  si  so    bi    bo   in   cs us sy id
>  2  0  0   4272  112684   20356   36856   0   0     0    16 1016   72  5  0 95
>  0  1  1   4272  108476   23676   36856   0   0  3320     0 1067  224 13  4 83
>  1  0  1   4272   96656   35580   36856   0   0 11904     0 1192  474 30 39 30
>  0  1  0   4272   84836   47356   36856   0   0 11776     0 1192  463 30 42 27
>  0  1  1   4272   72956   59268   36856   0   0 11904    40 1313 1139 44 38 18
>  0  1  0   4272   61076   71044   36856   0   0 11776     0 1200  478 29 43 29
>  0  1  0   4272   49256   82828   36856   0   0 11784     0 1199  522 39 37 24
>  0  1  0   4272   37196   94732   36856   0   0 11904     0 1199  497 35 38 26
>  0  1  0   4272   25256  106508   36856   0   0 11776     0 1199  512 32 43 24
>  1  2  3   4692    2468   75604   90132   0 420  9608 10888 1189  721 19 75  6
>  0  2  3   4692    3540   46932  117580   0   0   808 10428 1093  287 30 70  0
>  0  2  3   4692    3448   42916  121624   0   0  1024  8824 1099  266 65 35  0
>  0  2  3   4692    3200   35152  129692   0   0  1024  6512 1088  294 55 45  0
>  0  2  4   4692    1564   24640  141696   0   0   768  7688 1091  277 44 56  0
>  1  1  2   4692    3492   19044  145484   0   0  1024  5956 1084  285 69 31  0

Your bandwidth there fell from 12 megs/sec to around 8. That is
reasonable, I think. It's just that the read-vs-write balance is
wrong for you.

Try changing drivers/block/deadline-iosched.c:fifo_batch to 16.
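
That's this knob near the top of the file. A sketch of the change (in a
stock 2.5.46 tree the default should be 64, i.e. four seeks' worth at the
default seek_cost of 16):

/*
 * fifo_batch is the dispatch budget per batch: each request moved to
 * the dispatch queue costs seek_cost (16 by default) unless it is
 * contiguous with the previous one. 16 ends the batch after a single
 * seek, so expired reads get serviced much sooner.
 */
static int fifo_batch = 64;             /* try 16 */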

And the application isn't performing enough caching, perhaps.

The VM could be fairly easily changed to defer writeback if there is
read activity happening on the same spindle. Been thinking about
that (and its relative, early flush) a bit. But that write has
to go to disk sometime.
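
In sketch form the heuristic could look like the below. This is purely
hypothetical code - there is no last_read timestamp today, the block
layer would have to maintain one per queue:

/*
 * hypothetical: defer background writeback while the spindle has
 * serviced a read within the last half second
 */
static int writeback_should_wait(struct backing_dev_info *bdi)
{
        return time_before(jiffies, bdi->last_read + HZ / 2);
}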

2002-11-08 02:42:29

by Adam Kropelin

Subject: Re: 2.5.46: ide-cd cdrecord success report

On Thu, Nov 07, 2002 at 06:10:17PM -0800, Andrew Morton wrote:
> Your bandwidth there fell from 12 megs/sec to around 8. That is
> reasonable, I think. It's just that the read-vs-write balance is
> wrong for you.

Agreed. I only thought this was a problem because you said it should
work. ;)

> Try changing drivers/block/deadline-iosched.c:fifo_batch to 16.

Works! A 12x burn succeeded with a parallel dd *and* a make -j20.
Overall disk throughput suffered by a couple MB/s, but there was a solid
2 MB/s left for the recorder.

> And the application isn't performing enough caching, perhaps.

Perhaps, but it was a steady downward spiral, and with a read
throughput far lower than needed I think the application would have
had to cache the entire image in order to survive.

> The VM could be fairly easily changed to defer writeback if there is
> read activity happening on the same spindle. Been thinking about
> that (and its relative, early flush) a bit. But that write has
> to go to disk sometime.

Sounds interesting. It seems this sort of behavior is precisely what
some workloads need and precisely the opposite of what others need. Such
is the life of a VM hacker, I suppose. :/

--Adam

2002-11-08 07:59:40

by Jens Axboe

Subject: Re: 2.5.46: ide-cd cdrecord success report

On Thu, Nov 07 2002, Adam Kropelin wrote:
> > Try changing drivers/block/deadline-iosched.c:fifo_batch to 16.
>
> Works! A 12x burn succeeded with a parallel dd *and* a make -j20.
> Overall disk throughput suffered by a couple MB/s but there was a solid
> 2 MB/s left for the recorder.

OK, I'm just about convinced now; I'll make 16 the default batch count.
I'm very happy to hear that the deadline scheduler gets the job done
there.

--
Jens Axboe

2002-11-08 11:19:32

by Ingo Oeser

Subject: Re: 2.5.46: ide-cd cdrecord success report

Hi Jens,

On Fri, Nov 08, 2002 at 09:05:58AM +0100, Jens Axboe wrote:
> OK, I'm just about convinced now; I'll make 16 the default batch count.
> I'm very happy to hear that the deadline scheduler gets the job done
> there.

Isn't it exactly seek_cost which you want?

Would you like to make it tunable from user space somehow?

Since Adam already noticed that there might not be a "perfect"
value for all, this is the logical next step.

PS: If you update it, please consider updating your comments
there, too ;-)

Regards

Ingo Oeser
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth

2002-11-08 11:36:53

by Jens Axboe

Subject: Re: 2.5.46: ide-cd cdrecord success report

On Fri, Nov 08 2002, Ingo Oeser wrote:
> Hi Jens,
>
> On Fri, Nov 08, 2002 at 09:05:58AM +0100, Jens Axboe wrote:
> > OK, I'm just about convinced now; I'll make 16 the default batch count.
> > I'm very happy to hear that the deadline scheduler gets the job done
> > there.
>
> Isn't it exactly seek_cost which you want?

Yes, it's one request for a non-contig range, or a number of contig ones.
It isn't quite clear that this is always a good thing. Since we are
moving requests from the sorted list, there's a chance that even though
serving X and then X+1 will incur a seek, it will be a small one. But
for now, yes, a seek is a seek; 16 is equal to seek_cost, so the batch
will be just the one seek.
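
Spelled out, the batch accounting in deadline_move_requests() amounts to
the below (simplified - the real code also walks the sort order and
stops at the last entry):

int batch_count = dd->fifo_batch;
int this_rq_cost;

do {
        deadline_move_to_dispatch(dd, rq);

        /*
         * dispatching a request costs a full seek_cost unless it
         * starts right where the previous one ended, in which case
         * it costs its size in 128KiB (256 sector) units
         */
        this_rq_cost = dd->seek_cost;
        if (rq->sector == last_sec)
                this_rq_cost = (rq->nr_sectors + 255) >> 8;

        batch_count -= this_rq_cost;
        if (batch_count <= 0)
                break;

        last_sec = rq->sector + rq->nr_sectors;
        rq = next_in_sort_order(rq);    /* rb_next() in the real code */
} while (rq);

So with fifo_batch == seek_cost == 16, the first request that isn't
contiguous with the last dispatched one exhausts the whole batch.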

> Would you like to make it tunable from user space somehow?

Yes of course, that has been the plan all along. I'm in fact doing that
right now.

> Since Adam already noticed, that there might not be a "perfect"
> value for all, this is the logical next step.

No one ever questioned the fact that these tunables should in fact be
tunable.

> PS: If you update, please consider an update of your comments
> there, too ;-)

The patch was sent hours ago, and I did.

--
Jens Axboe

2002-11-08 13:52:58

by Jens Axboe

Subject: Re: 2.5.46: ide-cd cdrecord success report

On Fri, Nov 08 2002, Jens Axboe wrote:
> > Would you like to make it tunable from user space somehow?
>
> Yes of course, that has been the plan all along. I'm in fact doing that
> right now.

Here's a patch that includes that feature; it puts the tunables in sysfs
(so you obviously need that mounted). In

/sys/block/<disk>/iosched

you will find (for the deadline scheduler):

bart:/sys/block # ls hda/iosched/
. .. fifo_batch front_merges read_expire seek_cost writes_starved

This patch also has the deadline rbtree changes. Since it's just a
prototype, I didn't bother extracting the bits.

Pat, there are a few bugs in sysfs writeable files. Al discovered that
sysfs_write_file doesn't return -EINVAL if no ->store is defined, which
means that user apps will repeatedly call write() on an unwriteable
file because it keeps returning 0. Irk. In addition, permission checks
are buggy as well. Try adding a file with just S_IRUGO. Opening such a
beast for writing will succeed just fine. I'm guessing not a whole lot
of writeable sysfs files exist yet? :-)
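
To see why the 0 return bites: the standard userspace write-all loop,
sketched below, treats 0 as "no progress yet" rather than as an error,
so it never terminates against such a file.

#include <unistd.h>

/* the write-all helper just about every app carries around somewhere */
static ssize_t write_all(int fd, const char *buf, size_t len)
{
        size_t done = 0;

        while (done < len) {
                ssize_t ret = write(fd, buf + done, len - done);

                if (ret < 0)
                        return -1;      /* a proper -EINVAL lands here */
                done += ret;            /* ret == 0 -> we spin forever */
        }
        return done;
}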

===== drivers/block/deadline-iosched.c 1.11 vs edited =====
--- 1.11/drivers/block/deadline-iosched.c Fri Nov 8 10:01:37 2002
+++ edited/drivers/block/deadline-iosched.c Fri Nov 8 14:51:05 2002
@@ -17,6 +17,7 @@
#include <linux/init.h>
#include <linux/compiler.h>
#include <linux/hash.h>
+#include <linux/rbtree.h>

/*
* feel free to try other values :-). read_expire value is the timeout for
@@ -33,7 +34,7 @@
*/
static int writes_starved = 2;

-static const int deadline_hash_shift = 8;
+static const int deadline_hash_shift = 10;
#define DL_HASH_BLOCK(sec) ((sec) >> 3)
#define DL_HASH_FN(sec) (hash_long(DL_HASH_BLOCK((sec)), deadline_hash_shift))
#define DL_HASH_ENTRIES (1 << deadline_hash_shift)
@@ -48,7 +49,7 @@
/*
* run time data
*/
- struct list_head sort_list[2]; /* sorted listed */
+ struct rb_root rb_list[2];
struct list_head read_fifo; /* fifo list */
struct list_head *dispatch; /* driver dispatch queue */
struct list_head *hash; /* request hash */
@@ -60,19 +61,34 @@
* settings that change how the i/o scheduler behaves
*/
unsigned int fifo_batch;
- unsigned long read_expire;
+ unsigned int read_expire;
unsigned int seek_cost;
unsigned int writes_starved;
+ unsigned int front_merges;
};

/*
* pre-request data.
*/
struct deadline_rq {
- struct list_head fifo;
+ /*
+ * rbtree index, key is the starting offset
+ */
+ struct rb_node rb_node;
+ sector_t rb_key;
+
+ struct request *request;
+
+ /*
+ * request hash, key is the ending offset (for back merge lookup)
+ */
struct list_head hash;
unsigned long hash_valid_count;
- struct request *request;
+
+ /*
+ * expire fifo
+ */
+ struct list_head fifo;
unsigned long expires;
};

@@ -81,23 +97,23 @@
#define RQ_DATA(rq) ((struct deadline_rq *) (rq)->elevator_private)

/*
- * rq hash
+ * the back merge hash support functions
*/
-static inline void __deadline_del_rq_hash(struct deadline_rq *drq)
+static inline void __deadline_hash_del(struct deadline_rq *drq)
{
drq->hash_valid_count = 0;
list_del_init(&drq->hash);
}

#define ON_HASH(drq) (drq)->hash_valid_count
-static inline void deadline_del_rq_hash(struct deadline_rq *drq)
+static inline void deadline_hash_del(struct deadline_rq *drq)
{
if (ON_HASH(drq))
- __deadline_del_rq_hash(drq);
+ __deadline_hash_del(drq);
}

static inline void
-deadline_add_rq_hash(struct deadline_data *dd, struct deadline_rq *drq)
+deadline_hash_add(struct deadline_data *dd, struct deadline_rq *drq)
{
struct request *rq = drq->request;

@@ -109,33 +125,30 @@

#define list_entry_hash(ptr) list_entry((ptr), struct deadline_rq, hash)
static struct request *
-deadline_find_hash(struct deadline_data *dd, sector_t offset)
+deadline_hash_find(struct deadline_data *dd, sector_t offset)
{
struct list_head *hash_list = &dd->hash[DL_HASH_FN(offset)];
struct list_head *entry, *next = hash_list->next;
- struct deadline_rq *drq;
- struct request *rq = NULL;

while ((entry = next) != hash_list) {
+ struct deadline_rq *drq = list_entry_hash(entry);
+ struct request *__rq = drq->request;
+
next = entry->next;

- drq = list_entry_hash(entry);
-
- BUG_ON(!drq->hash_valid_count);
+ BUG_ON(!ON_HASH(drq));

- if (!rq_mergeable(drq->request)
+ if (!rq_mergeable(__rq)
|| drq->hash_valid_count != dd->hash_valid_count) {
- __deadline_del_rq_hash(drq);
+ __deadline_hash_del(drq);
continue;
}

- if (drq->request->sector + drq->request->nr_sectors == offset) {
- rq = drq->request;
- break;
- }
+ if (__rq->sector + __rq->nr_sectors == offset)
+ return __rq;
}

- return rq;
+ return NULL;
}

static sector_t deadline_get_last_sector(struct deadline_data *dd)
@@ -154,86 +167,135 @@
return last_sec;
}

+/*
+ * rb tree support functions
+ */
+#define RB_NONE (2)
+#define RB_EMPTY(root) ((root)->rb_node == NULL)
+#define ON_RB(node) ((node)->rb_color != RB_NONE)
+#define RB_CLEAR(node) ((node)->rb_color = RB_NONE)
+#define deadline_rb_entry(node) rb_entry((node), struct deadline_rq, rb_node)
+#define DRQ_RB_ROOT(dd, drq) (&(dd)->rb_list[rq_data_dir((drq)->request)])
+
+static inline int
+__deadline_rb_add(struct deadline_data *dd, struct deadline_rq *drq)
+{
+ struct rb_node **p = &DRQ_RB_ROOT(dd, drq)->rb_node;
+ struct rb_node *parent = NULL;
+ struct deadline_rq *__drq;
+
+ while (*p) {
+ parent = *p;
+ __drq = deadline_rb_entry(parent);
+
+ if (drq->rb_key < __drq->rb_key)
+ p = &(*p)->rb_left;
+ else if (drq->rb_key > __drq->rb_key)
+ p = &(*p)->rb_right;
+ else
+ return 1;
+ }
+
+ rb_link_node(&drq->rb_node, parent, p);
+ return 0;
+}
+
+static void deadline_rb_add(struct deadline_data *dd, struct deadline_rq *drq)
+{
+ drq->rb_key = drq->request->sector;
+
+ if (!__deadline_rb_add(dd, drq)) {
+ rb_insert_color(&drq->rb_node, DRQ_RB_ROOT(dd, drq));
+ return;
+ }
+
+ /*
+ * this cannot happen
+ */
+ blk_dump_rq_flags(drq->request, "deadline_rb_add alias");
+ list_add_tail(&drq->request->queuelist, dd->dispatch);
+}
+
+static inline void
+deadline_rb_del(struct deadline_data *dd, struct deadline_rq *drq)
+{
+ if (ON_RB(&drq->rb_node)) {
+ rb_erase(&drq->rb_node, DRQ_RB_ROOT(dd, drq));
+ RB_CLEAR(&drq->rb_node);
+ }
+}
+
+static struct request *
+deadline_rb_find(struct deadline_data *dd, sector_t sector, int data_dir)
+{
+ struct rb_node *n = dd->rb_list[data_dir].rb_node;
+ struct deadline_rq *drq;
+
+ while (n) {
+ drq = deadline_rb_entry(n);
+
+ if (sector < drq->rb_key)
+ n = n->rb_left;
+ else if (sector > drq->rb_key)
+ n = n->rb_right;
+ else
+ return drq->request;
+ }
+
+ return NULL;
+}
+
static int
deadline_merge(request_queue_t *q, struct list_head **insert, struct bio *bio)
{
struct deadline_data *dd = q->elevator.elevator_data;
- const int data_dir = bio_data_dir(bio);
- struct list_head *entry, *sort_list;
struct request *__rq;
- int ret = ELEVATOR_NO_MERGE;
+ int ret;

/*
* try last_merge to avoid going to hash
*/
ret = elv_try_last_merge(q, bio);
if (ret != ELEVATOR_NO_MERGE) {
- *insert = q->last_merge;
- goto out;
+ __rq = list_entry_rq(q->last_merge);
+ goto out_insert;
}

/*
* see if the merge hash can satisfy a back merge
*/
- if ((__rq = deadline_find_hash(dd, bio->bi_sector))) {
+ __rq = deadline_hash_find(dd, bio->bi_sector);
+ if (__rq) {
BUG_ON(__rq->sector + __rq->nr_sectors != bio->bi_sector);

if (elv_rq_merge_ok(__rq, bio)) {
- *insert = &__rq->queuelist;
ret = ELEVATOR_BACK_MERGE;
goto out;
}
}

/*
- * scan list from back to find insertion point.
+ * check for front merge
*/
- entry = sort_list = &dd->sort_list[data_dir];
- while ((entry = entry->prev) != sort_list) {
- __rq = list_entry_rq(entry);
+ if (dd->front_merges) {
+ sector_t rb_key = bio->bi_sector + bio_sectors(bio);

- BUG_ON(__rq->flags & REQ_STARTED);
-
- if (!(__rq->flags & REQ_CMD))
- continue;
-
- /*
- * it's not necessary to break here, and in fact it could make
- * us loose a front merge. emperical evidence shows this to
- * be a big waste of cycles though, so quit scanning
- */
- if (!*insert && bio_rq_in_between(bio, __rq, sort_list)) {
- *insert = &__rq->queuelist;
- break;
- }
-
- if (__rq->flags & (REQ_SOFTBARRIER | REQ_HARDBARRIER))
- break;
-
- /*
- * checking for a front merge, hash will miss those
- */
- if (__rq->sector - bio_sectors(bio) == bio->bi_sector) {
- ret = elv_try_merge(__rq, bio);
- if (ret != ELEVATOR_NO_MERGE) {
- *insert = &__rq->queuelist;
- break;
+ __rq = deadline_rb_find(dd, rb_key, bio_data_dir(bio));
+ if (__rq) {
+ BUG_ON(rb_key != __rq->sector);
+
+ if (elv_rq_merge_ok(__rq, bio)) {
+ ret = ELEVATOR_FRONT_MERGE;
+ goto out;
}
}
}

- /*
- * no insertion point found, check the very front
- */
- if (!*insert && !list_empty(sort_list)) {
- __rq = list_entry_rq(sort_list->next);
-
- if (bio->bi_sector + bio_sectors(bio) < __rq->sector &&
- bio->bi_sector > deadline_get_last_sector(dd))
- *insert = sort_list;
- }
-
+ return ELEVATOR_NO_MERGE;
out:
+ q->last_merge = &__rq->queuelist;
+out_insert:
+ *insert = &__rq->queuelist;
return ret;
}

@@ -242,8 +304,19 @@
struct deadline_data *dd = q->elevator.elevator_data;
struct deadline_rq *drq = RQ_DATA(req);

- deadline_del_rq_hash(drq);
- deadline_add_rq_hash(dd, drq);
+ /*
+ * hash always needs to be repositioned, key is end sector
+ */
+ deadline_hash_del(drq);
+ deadline_hash_add(dd, drq);
+
+ /*
+ * if the merge was a front merge, we need to reposition request
+ */
+ if (req->sector != drq->rb_key) {
+ deadline_rb_del(dd, drq);
+ deadline_rb_add(dd, drq);
+ }

q->last_merge = &req->queuelist;
}
@@ -258,11 +331,16 @@
BUG_ON(!drq);
BUG_ON(!dnext);

- deadline_del_rq_hash(drq);
- deadline_add_rq_hash(dd, drq);
+ deadline_hash_del(drq);
+ deadline_hash_add(dd, drq);
+
+ if (req->sector != drq->rb_key) {
+ deadline_rb_del(dd, drq);
+ deadline_rb_add(dd, drq);
+ }

/*
- * if dnext expires before drq, assign it's expire time to drq
+ * if dnext expires before drq, assign its expire time to drq
* and move into dnext position (dnext will be deleted) in fifo
*/
if (!list_empty(&drq->fifo) && !list_empty(&dnext->fifo)) {
@@ -274,53 +352,56 @@
}

/*
- * move request from sort list to dispatch queue. maybe remove from rq hash
- * here too?
+ * move request from sort list to dispatch queue.
*/
static inline void
-deadline_move_to_dispatch(struct deadline_data *dd, struct request *rq)
+deadline_move_to_dispatch(struct deadline_data *dd, struct deadline_rq *drq)
{
- struct deadline_rq *drq = RQ_DATA(rq);
-
- list_move_tail(&rq->queuelist, dd->dispatch);
+ deadline_rb_del(dd, drq);
list_del_init(&drq->fifo);
+ list_add_tail(&drq->request->queuelist, dd->dispatch);
}

/*
- * move along sort list and move entries to dispatch queue, starting from rq
+ * move along sort list and move entries to dispatch queue, starting from drq
*/
-static void deadline_move_requests(struct deadline_data *dd, struct request *rq)
+static void deadline_move_requests(struct deadline_data *dd, struct deadline_rq *drq)
{
- struct list_head *sort_head = &dd->sort_list[rq_data_dir(rq)];
sector_t last_sec = deadline_get_last_sector(dd);
int batch_count = dd->fifo_batch;

do {
- struct list_head *nxt = rq->queuelist.next;
+ struct rb_node *rbnext = rb_next(&drq->rb_node);
+ struct deadline_rq *dnext = NULL;
+ struct request *__rq;
int this_rq_cost;

+ if (rbnext)
+ dnext = deadline_rb_entry(rbnext);
+
/*
* take it off the sort and fifo list, move
* to dispatch queue
*/
- deadline_move_to_dispatch(dd, rq);
+ deadline_move_to_dispatch(dd, drq);

/*
* if this is the last entry, don't bother doing accounting
*/
- if (nxt == sort_head)
+ if (dnext == NULL)
break;

+ __rq = drq->request;
this_rq_cost = dd->seek_cost;
- if (rq->sector == last_sec)
- this_rq_cost = (rq->nr_sectors + 255) >> 8;
+ if (__rq->sector == last_sec)
+ this_rq_cost = (__rq->nr_sectors + 255) >> 8;

batch_count -= this_rq_cost;
if (batch_count <= 0)
break;

- last_sec = rq->sector + rq->nr_sectors;
- rq = list_entry_rq(nxt);
+ last_sec = __rq->sector + __rq->nr_sectors;
+ drq = dnext;
} while (1);
}

@@ -343,25 +424,10 @@
return 0;
}

-static struct request *deadline_next_request(request_queue_t *q)
+static int deadline_dispatch_requests(struct deadline_data *dd)
{
- struct deadline_data *dd = q->elevator.elevator_data;
+ const int writes = !RB_EMPTY(&dd->rb_list[WRITE]);
struct deadline_rq *drq;
- struct list_head *nxt;
- struct request *rq;
- int writes;
-
- /*
- * if still requests on the dispatch queue, just grab the first one
- */
- if (!list_empty(&q->queue_head)) {
-dispatch:
- rq = list_entry_rq(q->queue_head.next);
- dd->last_sector = rq->sector + rq->nr_sectors;
- return rq;
- }
-
- writes = !list_empty(&dd->sort_list[WRITE]);

/*
* if we have expired entries on the fifo list, move some to dispatch
@@ -370,19 +436,18 @@
if (writes && (dd->starved++ >= dd->writes_starved))
goto dispatch_writes;

- nxt = dd->read_fifo.next;
- drq = list_entry_fifo(nxt);
- deadline_move_requests(dd, drq->request);
- goto dispatch;
+ drq = list_entry_fifo(dd->read_fifo.next);
+dispatch_requests:
+ deadline_move_requests(dd, drq);
+ return 1;
}

- if (!list_empty(&dd->sort_list[READ])) {
+ if (!RB_EMPTY(&dd->rb_list[READ])) {
if (writes && (dd->starved++ >= dd->writes_starved))
goto dispatch_writes;

- nxt = dd->sort_list[READ].next;
- deadline_move_requests(dd, list_entry_rq(nxt));
- goto dispatch;
+ drq = deadline_rb_entry(rb_first(&dd->rb_list[READ]));
+ goto dispatch_requests;
}

/*
@@ -391,14 +456,40 @@
*/
if (writes) {
dispatch_writes:
- nxt = dd->sort_list[WRITE].next;
- deadline_move_requests(dd, list_entry_rq(nxt));
dd->starved = 0;
- goto dispatch;
+
+ drq = deadline_rb_entry(rb_first(&dd->rb_list[WRITE]));
+ goto dispatch_requests;
+ }
+
+ return 0;
+}
+
+static struct request *deadline_next_request(request_queue_t *q)
+{
+ struct deadline_data *dd = q->elevator.elevator_data;
+ struct request *rq;
+
+ /*
+ * if there are still requests on the dispatch queue, grab the first one
+ */
+ if (!list_empty(dd->dispatch)) {
+dispatch:
+ rq = list_entry_rq(dd->dispatch->next);
+ dd->last_sector = rq->sector + rq->nr_sectors;
+ return rq;
}

- BUG_ON(!list_empty(&dd->sort_list[READ]));
- BUG_ON(writes);
+ if (deadline_dispatch_requests(dd))
+ goto dispatch;
+
+ /*
+ * if we have entries on the read or write sorted list, its a bug
+ * if deadline_dispatch_requests() didn't move any
+ */
+ BUG_ON(!RB_EMPTY(&dd->rb_list[READ]));
+ BUG_ON(!RB_EMPTY(&dd->rb_list[WRITE]));
+
return NULL;
}

@@ -409,32 +500,28 @@
struct deadline_rq *drq = RQ_DATA(rq);
const int data_dir = rq_data_dir(rq);

- /*
- * flush hash on barrier insert, as not to allow merges before a
- * barrier.
- */
if (unlikely(rq->flags & REQ_HARDBARRIER)) {
DL_INVALIDATE_HASH(dd);
q->last_merge = NULL;
}

- /*
- * add to sort list
- */
- if (!insert_here)
- insert_here = dd->sort_list[data_dir].prev;
-
- list_add(&rq->queuelist, insert_here);
+ if (unlikely(!(rq->flags & REQ_CMD))) {
+ if (!insert_here)
+ insert_here = dd->dispatch->prev;

- if (unlikely(!(rq->flags & REQ_CMD)))
+ list_add(&rq->queuelist, insert_here);
return;
+ }
+
+ deadline_rb_add(dd, drq);

if (rq_mergeable(rq)) {
- deadline_add_rq_hash(dd, drq);
+ deadline_hash_add(dd, drq);

if (!q->last_merge)
q->last_merge = &rq->queuelist;
- }
+ } else
+ blk_dump_rq_flags(rq, "not mergeable");

if (data_dir == READ) {
/*
@@ -450,8 +537,11 @@
struct deadline_rq *drq = RQ_DATA(rq);

if (drq) {
+ struct deadline_data *dd = q->elevator.elevator_data;
+
list_del_init(&drq->fifo);
- deadline_del_rq_hash(drq);
+ deadline_hash_del(drq);
+ deadline_rb_del(dd, drq);
}
}

@@ -459,9 +549,9 @@
{
struct deadline_data *dd = q->elevator.elevator_data;

- if (!list_empty(&dd->sort_list[WRITE]) ||
- !list_empty(&dd->sort_list[READ]) ||
- !list_empty(&q->queue_head))
+ if (!RB_EMPTY(&dd->rb_list[WRITE]) ||
+ !RB_EMPTY(&dd->rb_list[READ]) ||
+ !list_empty(dd->dispatch))
return 0;

BUG_ON(!list_empty(&dd->read_fifo));
@@ -473,7 +563,7 @@
{
struct deadline_data *dd = q->elevator.elevator_data;

- return &dd->sort_list[rq_data_dir(rq)];
+ return dd->dispatch;
}

static void deadline_exit(request_queue_t *q, elevator_t *e)
@@ -484,8 +574,8 @@
int i;

BUG_ON(!list_empty(&dd->read_fifo));
- BUG_ON(!list_empty(&dd->sort_list[READ]));
- BUG_ON(!list_empty(&dd->sort_list[WRITE]));
+ BUG_ON(!RB_EMPTY(&dd->rb_list[READ]));
+ BUG_ON(!RB_EMPTY(&dd->rb_list[WRITE]));

for (i = READ; i <= WRITE; i++) {
struct request_list *rl = &q->rq[i];
@@ -538,14 +628,15 @@
INIT_LIST_HEAD(&dd->hash[i]);

INIT_LIST_HEAD(&dd->read_fifo);
- INIT_LIST_HEAD(&dd->sort_list[READ]);
- INIT_LIST_HEAD(&dd->sort_list[WRITE]);
+ dd->rb_list[READ] = RB_ROOT;
+ dd->rb_list[WRITE] = RB_ROOT;
dd->dispatch = &q->queue_head;
dd->fifo_batch = fifo_batch;
dd->read_expire = read_expire;
dd->seek_cost = seek_cost;
dd->hash_valid_count = 1;
dd->writes_starved = writes_starved;
+ dd->front_merges = 1;
e->elevator_data = dd;

for (i = READ; i <= WRITE; i++) {
@@ -567,6 +658,7 @@
memset(drq, 0, sizeof(*drq));
INIT_LIST_HEAD(&drq->fifo);
INIT_LIST_HEAD(&drq->hash);
+ RB_CLEAR(&drq->rb_node);
drq->request = rq;
rq->elevator_private = drq;
}
@@ -578,6 +670,139 @@
return ret;
}

+struct deadline_fs_entry {
+ struct attribute attr;
+ ssize_t (*show)(elevator_t *, char *, size_t, loff_t);
+ ssize_t (*store)(elevator_t *, const char *, size_t, loff_t);
+};
+
+static ssize_t
+deadline_var_show(unsigned int var, char *page, size_t count, loff_t off)
+{
+ if (off)
+ return 0;
+
+ return sprintf(page, "%d\n", var);
+}
+
+static ssize_t
+deadline_var_store(unsigned int *var, const char *page, size_t count,loff_t off)
+{
+ char *p = (char *) page;
+
+ if (off)
+ return 0;
+
+ *var = simple_strtoul(p, &p, 10);
+ return count;
+}
+
+#define SHOW_FUNCTION(__FUNC, __VAR) \
+static ssize_t __FUNC(elevator_t *e, char *page, size_t cnt, loff_t off) \
+{ \
+ struct deadline_data *dd = (e)->elevator_data; \
+ return deadline_var_show(__VAR, (page), (cnt), (off)); \
+}
+
+SHOW_FUNCTION(deadline_fifo_show, dd->fifo_batch);
+SHOW_FUNCTION(deadline_readexpire_show, dd->read_expire);
+SHOW_FUNCTION(deadline_seekcost_show, dd->seek_cost);
+SHOW_FUNCTION(deadline_writesstarved_show, dd->writes_starved);
+SHOW_FUNCTION(deadline_frontmerges_show, dd->front_merges);
+#undef SHOW_FUNCTION
+
+#define STORE_FUNCTION(__FUNC, __VAR) \
+static ssize_t __FUNC(elevator_t *e, const char *page, size_t cnt, loff_t off) \
+{ \
+ struct deadline_data *dd = (e)->elevator_data; \
+ return deadline_var_store(__VAR, (page), (cnt), (off)); \
+}
+
+STORE_FUNCTION(deadline_fifo_store, &dd->fifo_batch);
+STORE_FUNCTION(deadline_readexpire_store, &dd->read_expire);
+STORE_FUNCTION(deadline_seekcost_store, &dd->seek_cost);
+STORE_FUNCTION(deadline_writesstarved_store, &dd->writes_starved);
+STORE_FUNCTION(deadline_frontmerges_store, &dd->front_merges);
+#undef STORE_FUNCTION
+
+static struct deadline_fs_entry deadline_fifo_entry = {
+ .attr = {.name = "fifo_batch", .mode = S_IRUGO | S_IWUSR },
+ .show = deadline_fifo_show,
+ .store = deadline_fifo_store,
+};
+static struct deadline_fs_entry deadline_readexpire_entry = {
+ .attr = {.name = "read_expire", .mode = S_IRUGO | S_IWUSR },
+ .show = deadline_readexpire_show,
+ .store = deadline_readexpire_store,
+};
+static struct deadline_fs_entry deadline_seekcost_entry = {
+ .attr = {.name = "seek_cost", .mode = S_IRUGO | S_IWUSR },
+ .show = deadline_seekcost_show,
+ .store = deadline_seekcost_store,
+};
+static struct deadline_fs_entry deadline_writesstarved_entry = {
+ .attr = {.name = "writes_starved", .mode = S_IRUGO | S_IWUSR },
+ .show = deadline_writesstarved_show,
+ .store = deadline_writesstarved_store,
+};
+static struct deadline_fs_entry deadline_frontmerges_entry = {
+ .attr = {.name = "front_merges", .mode = S_IRUGO | S_IWUSR },
+ .show = deadline_frontmerges_show,
+ .store = deadline_frontmerges_store,
+};
+
+static struct attribute *default_attrs[] = {
+ &deadline_fifo_entry.attr,
+ &deadline_readexpire_entry.attr,
+ &deadline_seekcost_entry.attr,
+ &deadline_writesstarved_entry.attr,
+ &deadline_frontmerges_entry.attr,
+ NULL,
+};
+
+static ssize_t deadline_attr_show(struct kobject *kobj, struct attribute *attr,
+ char *page, size_t count, loff_t off)
+{
+ elevator_t *e = container_of(kobj, elevator_t, kobj);
+ struct deadline_fs_entry *entry = container_of(attr, struct deadline_fs_entry, attr);
+
+ if (!entry->show)
+ return 0;
+
+ return entry->show(e, page, count, off);
+}
+
+static ssize_t deadline_attr_store(struct kobject *kobj, struct attribute *attr,
+ const char *page, size_t count, loff_t off)
+{
+ elevator_t *e = container_of(kobj, elevator_t, kobj);
+ struct deadline_fs_entry *entry = container_of(attr, struct deadline_fs_entry, attr);
+
+ if (!entry->store)
+ return -EINVAL;
+
+ return entry->store(e, page, count, off);
+}
+
+static struct sysfs_ops deadline_sysfs_ops = {
+ .show = &deadline_attr_show,
+ .store = &deadline_attr_store,
+};
+
+extern struct subsystem block_subsys;
+
+struct subsystem deadline_subsys = {
+ .parent = &block_subsys,
+ .sysfs_ops = &deadline_sysfs_ops,
+ .default_attrs = default_attrs,
+};
+
+static void deadline_register_fs(elevator_t *e)
+{
+ e->kobj.subsys = &deadline_subsys;
+ kobject_register(&e->kobj);
+}
+
static int __init deadline_slab_setup(void)
{
drq_pool = kmem_cache_create("deadline_drq", sizeof(struct deadline_rq),
@@ -586,6 +811,7 @@
if (!drq_pool)
panic("deadline: can't init slab pool\n");

+ subsystem_register(&deadline_subsys);
return 0;
}

@@ -600,6 +826,7 @@
.elevator_remove_req_fn = deadline_remove_request,
.elevator_queue_empty_fn = deadline_queue_empty,
.elevator_get_sort_head_fn = deadline_get_sort_head,
+ .elevator_register_fs_fn = deadline_register_fs,
.elevator_init_fn = deadline_init,
.elevator_exit_fn = deadline_exit,
};
===== drivers/block/elevator.c 1.35 vs edited =====
--- 1.35/drivers/block/elevator.c Fri Nov 8 09:57:28 2002
+++ edited/drivers/block/elevator.c Fri Nov 8 14:17:31 2002
@@ -381,6 +381,19 @@
return &q->queue_head;
}

+void elv_register_fs(struct gendisk *disk)
+{
+ request_queue_t *q = disk->queue;
+ elevator_t *e = &q->elevator;
+
+ kobject_init(&e->kobj);
+ snprintf(e->kobj.name, KOBJ_NAME_LEN, "%s", "iosched");
+ e->kobj.parent = &disk->kobj;
+
+ if (e->elevator_register_fs_fn)
+ e->elevator_register_fs_fn(e);
+}
+
elevator_t elevator_noop = {
.elevator_merge_fn = elevator_noop_merge,
.elevator_next_req_fn = elevator_noop_next_request,
===== drivers/block/genhd.c 1.58 vs edited =====
--- 1.58/drivers/block/genhd.c Mon Oct 21 09:53:07 2002
+++ edited/drivers/block/genhd.c Fri Nov 8 12:56:16 2002
@@ -119,6 +119,7 @@
blk_register_region(MKDEV(disk->major, disk->first_minor), disk->minors,
NULL, exact_match, exact_lock, disk);
register_disk(disk);
+ elv_register_fs(disk);
}

EXPORT_SYMBOL(add_disk);
===== drivers/block/ll_rw_blk.c 1.139 vs edited =====
--- 1.139/drivers/block/ll_rw_blk.c Fri Nov 8 09:57:28 2002
+++ edited/drivers/block/ll_rw_blk.c Fri Nov 8 14:57:56 2002
@@ -73,7 +73,7 @@
{
int ret;

- ret = queue_nr_requests / 4 - 1;
+ ret = queue_nr_requests / 8 - 1;
if (ret < 0)
ret = 1;
return ret;
@@ -86,7 +86,7 @@
{
int ret;

- ret = queue_nr_requests / 4 + 1;
+ ret = queue_nr_requests / 8 + 1;
if (ret > queue_nr_requests)
ret = queue_nr_requests;
return ret;
@@ -700,31 +700,22 @@
seg_size = nr_phys_segs = nr_hw_segs = 0;
bio_for_each_segment(bv, bio, i) {
if (bvprv && cluster) {
- int phys, seg;
-
- if (seg_size + bv->bv_len > q->max_segment_size) {
- nr_phys_segs++;
+ if (seg_size + bv->bv_len > q->max_segment_size)
goto new_segment;
- }
-
- phys = BIOVEC_PHYS_MERGEABLE(bvprv, bv);
- seg = BIOVEC_SEG_BOUNDARY(q, bvprv, bv);
- if (!phys || !seg)
- nr_phys_segs++;
- if (!seg)
+ if (!BIOVEC_PHYS_MERGEABLE(bvprv, bv))
goto new_segment;
-
- if (!BIOVEC_VIRT_MERGEABLE(bvprv, bv))
+ if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bv))
goto new_segment;

seg_size += bv->bv_len;
bvprv = bv;
continue;
- } else {
- nr_phys_segs++;
}
new_segment:
- nr_hw_segs++;
+ if (!bvprv || !BIOVEC_VIRT_MERGEABLE(bvprv, bv))
+ nr_hw_segs++;
+
+ nr_phys_segs++;
bvprv = bv;
seg_size = bv->bv_len;
}
@@ -1621,7 +1612,7 @@
struct list_head *next = rq->queuelist.next;
struct list_head *sort_head = elv_get_sort_head(q, rq);

- if (next != sort_head)
+ if (next != sort_head && next != &rq->queuelist)
attempt_merge(q, rq, list_entry_rq(next));
}

@@ -1630,7 +1621,7 @@
struct list_head *prev = rq->queuelist.prev;
struct list_head *sort_head = elv_get_sort_head(q, rq);

- if (prev != sort_head)
+ if (prev != sort_head && prev != &rq->queuelist)
attempt_merge(q, list_entry_rq(prev), rq);
}

@@ -2180,8 +2171,8 @@
queue_nr_requests = (total_ram >> 9) & ~7;
if (queue_nr_requests < 16)
queue_nr_requests = 16;
- if (queue_nr_requests > 128)
- queue_nr_requests = 128;
+ if (queue_nr_requests > 1024)
+ queue_nr_requests = 1024;

batch_requests = queue_nr_requests / 8;
if (batch_requests > 8)
===== fs/sysfs/inode.c 1.59 vs edited =====
--- 1.59/fs/sysfs/inode.c Wed Oct 30 21:27:35 2002
+++ edited/fs/sysfs/inode.c Fri Nov 8 14:33:59 2002
@@ -243,7 +243,7 @@
if (kobj && kobj->subsys)
ops = kobj->subsys->sysfs_ops;
if (!ops || !ops->store)
- return 0;
+ return -EINVAL;

page = (char *)__get_free_page(GFP_KERNEL);
if (!page)
===== include/linux/elevator.h 1.17 vs edited =====
--- 1.17/include/linux/elevator.h Mon Oct 28 18:51:57 2002
+++ edited/include/linux/elevator.h Fri Nov 8 13:45:30 2002
@@ -18,6 +18,8 @@
typedef int (elevator_init_fn) (request_queue_t *, elevator_t *);
typedef void (elevator_exit_fn) (request_queue_t *, elevator_t *);

+typedef void (elevator_register_fs_fn) (elevator_t *);
+
struct elevator_s
{
elevator_merge_fn *elevator_merge_fn;
@@ -34,7 +36,11 @@
elevator_init_fn *elevator_init_fn;
elevator_exit_fn *elevator_exit_fn;

+ elevator_register_fs_fn *elevator_register_fs_fn;
+
void *elevator_data;
+
+ struct kobject kobj;
};

/*
@@ -49,6 +55,7 @@
extern void elv_remove_request(request_queue_t *, struct request *);
extern int elv_queue_empty(request_queue_t *);
extern inline struct list_head *elv_get_sort_head(request_queue_t *, struct request *);
+extern void elv_register_fs(struct gendisk *);

#define __elv_add_request_pos(q, rq, pos) \
(q)->elevator.elevator_add_req_fn((q), (rq), (pos))


--
Jens Axboe

2002-11-08 18:19:41

by Rob Landley

Subject: Whither the "system without /proc" crowd?

On Friday 08 November 2002 13:58, Jens Axboe wrote:

> Here's a patch that includes that feature, puts the tunables in sysfs
> (so you obviously need that mounted). In
>
> /sys/block/<disk>/iosched

Stupid question time:

A great deal of text has been expended over the years by people desperately
trying to make sure you didn't need /proc mounted to have a usable system,
for some definition of usable. Now, with rootfs, initramfs, sysfs, and the
libfs-inspired "make a filesystem rather than an ioctl" policy, the main
argument against requiring the use of /proc is that it has a lot more gunk in
it (left over from the days when it was the only ramfs-type filesystem used
to export values) than anyone is comfortable with. (The argument against
/dev/pty largely seems to be inertia, now that the "number of ptys" config
tunable issue seems to have been cleared up.)

There seems to be some sort of nebulous plan for eventually stripping down
/proc, perhaps making a "crapfs" that's a union mount on top of /proc
providing deprecated legacy support for a release or two. But I haven't
heard it explicitly stated.

So my questions are:

1) Will some subset of /proc, /sys, /dev/pty, etc. become required at some
point in the future on everything but the most customized embedded systems?
Or is keeping the system usable without them still a goal?

2) Is there a plan to rehabilitate /proc?

(I ask because I don't know. Maybe I missed some important posts...)

Rob

--
http://penguicon.sf.net - Terry Pratchett, Eric Raymond, Pete Abrams, Illiad,
CmdrTaco, liquid nitrogen ice cream, and caffeinated jello. Well why not?