On Monday 12 November 2001 10:32, Christoph Hellwig wrote:
> On Mon, Nov 12, 2001 at 08:41:30AM -0600, Mark Peloquin wrote:
> >
> > > 1) gendisk_head called by EVMS's ldev_mgr - RedHat now uses an ac-based
> > > kernel, and gendisk_head was made private as of 2.4.9-ac9. I have no
> > > idea why this was done, but was able to fix the unresolved symbol by
> > > exporting it again as in a vanilla genhd.c. There is a comment saying
> > > that you should never access this directly, and the only reason it was
> > > exported was for source compatibility, so I guess this is why it was
> > > removed in the ac kernel. Can EVMS be fixed so that it doesn't need
> > > gendisk_head exported by the kernel?
> >
> > Since we build on the stock kernels, we have not yet dealt with
> > some of the differences found in the -ac kernels. Abstraction of
> > the gendisk_head is one of these. I believe the -ac kernel now
> > using functions to add and delete gendisk entries from the
> > gendisk list, and thus no longer requires the gendisk_head to
> > be exported. The change you made will work until we code for
> > the difference.
>
> You are supposed to use {add,del,get}_gendisk since 2.4.10/2.4.9-ac<~14>.
> Alan removed the export in 2.4.9-ac17, but I disagree with him as we
> want to maintain source compatibility.
>
> Note that direct access is racy as no locking is performed.
>
> Christoph
We have been looking back over the EVMS code that accesses the gendisk list,
and trying to figure out how to abstract out the direct access of
gendisk_head. Unfortunately, the currently supplied API get_gendisk() is not
sufficient. EVMS needs to traverse the entire gendisk list to determine all
of the available disks in the system. The only way to do this using
get_gendisk() would be to iterate through every possible major number,
calling get_gendisk() for each one. Obviously that is wasteful. Also (as I
believe has been pointed out already in other discussions), accesing the list
even in this manner is still unsafe, because the gendisk_lock is only held
while searching the list for the given major number, not while the caller
actually accesses the item that is returned.
So one of my questions is: what is in store for the gendisk list in 2.5?
There is a comment in add_gendisk() about some part of that code going away
in 2.5 (although it's vague as to what the comment refers to), but no
corresponding comment in del_gendisk(). Are these two APIs coming or going?
I've also noticed get_start_sect() and get_nr_sects() in genhd.c in the
latest 2.5 patches. Are there any more new APIs that will be coming?
In order for EVMS to run this list correctly during volume discovery, and to
sufficiently abstract access to the gendisk list variables, and to keep
everything SMP safe, it seems we would need APIs such as lock_gendisk_list(),
unlock_gendisk_list(), get_gendisk_first(), and get_gendisk_next(). I've
included a patch below (against 2.4.16) for genhd.c with examples of these
APIs. EVMS could then use these to lock the list, traverse the list and
process all entries, and then unlock.
Comments?
Kevin Corry
diff -Naur linux-2.4.16/drivers/block/genhd.c
linux-2.4.16-new/drivers/block/genhd.c
--- linux-2.4.16/drivers/block/genhd.c Wed Oct 17 16:46:29 2001
+++ linux-2.4.16-new/drivers/block/genhd.c Wed Dec 5 14:27:49 2001
@@ -123,6 +123,51 @@
EXPORT_SYMBOL(get_gendisk);
+void
+lock_gendisk_list(void)
+{
+ read_lock(&gendisk_lock);
+}
+
+EXPORT_SYMBOL(lock_gendisk_list);
+
+void
+unlock_gendisk_list(void)
+{
+ read_unlock(&gendisk_lock);
+}
+
+EXPORT_SYMBOL(unlock_gendisk_list);
+
+
+/**
+ * get_gendisk_first - returns the first item in the gendisk list. Must have
+ * first locked the gendisk list using lock_gendisk().
+ */
+struct gendisk *
+get_gendisk_first(void)
+{
+ return gendisk_head;
+}
+
+EXPORT_SYMBOL(get_gendisk_first);
+
+
+/**
+ * get_gendisk_next - return the gendisk entry immediately following the
+ * specified entry. Must have locked the gendisk list
+ * first using lock_gendisk().
+ * @cur: Current item from the gendisk list. Get the following item.
+ */
+struct gendisk *
+get_gendisk_next(struct gendisk *cur)
+{
+ return( cur ? cur->next : NULL );
+}
+
+EXPORT_SYMBOL(get_gendisk_next);
+
+
#ifdef CONFIG_PROC_FS
int
get_partition_list(char *page, char **start, off_t offset, int count)
diff -Naur linux-2.4.16/include/linux/genhd.h
linux-2.4.16-new/include/linux/genhd.h
--- linux-2.4.16/include/linux/genhd.h Thu Nov 22 13:47:05 2001
+++ linux-2.4.16-new/include/linux/genhd.h Wed Dec 5 14:29:46 2001
@@ -91,6 +91,10 @@
extern void add_gendisk(struct gendisk *gp);
extern void del_gendisk(struct gendisk *gp);
extern struct gendisk *get_gendisk(kdev_t dev);
+extern void lock_gendisk_list(void);
+extern void unlock_gendisk_list(void);
+extern struct gendisk *get_gendisk_first(void);
+extern struct gendisk *get_gendisk_next(struct gendisk *cur);
#endif /* __KERNEL__ */
On Wed, Dec 05, 2001 at 02:52:59PM -0600, Kevin Corry wrote:
> So one of my questions is: what is in store for the gendisk list in 2.5?
> There is a comment in add_gendisk() about some part of that code going away
> in 2.5 (although it's vague as to what the comment refers to),
There are two comments, first
* XXX: you should _never_ access this directly.
* the only reason this is exported is source compatiblity.
over the declaration of gendisk_head - since 2.5.1-pre2 gendisk_head
is static and the comment is gone.
The second is
* In 2.5 this will go away. Fix the drivers who rely on
* old behaviour.
This is in add_gendisk and means that we currently work around callers
trying to do add_gendisk on the same structure twice. This will go away
as soon as the 'sd' driver is fixed.
> but no
> corresponding comment in del_gendisk(). Are these two APIs coming or going?
> I've also noticed get_start_sect() and get_nr_sects() in genhd.c in the
> latest 2.5 patches. Are there any more new APIs that will be coming?
For early 2.5 the above API will remain - for mid to late 2.5 I plan to
get rid of per-major gendisk completly. My design on a replacement is not
yet written down, but the details are:
- the minor_shift member moves into the block queue
- each block queue gets a pointer to be used for partitioning, this will
be opaque to the drivers.
- a per-queue 'struct gendisk' equivalent (minus the fields that aren't used)
will go into above pointer.
- partitions will be registered as normal block devices, to the layers
outside the partitioning handling there will be no difference between
a block device and a partition.
Hope this helps..
> In order for EVMS to run this list correctly during volume discovery, and to
> sufficiently abstract access to the gendisk list variables, and to keep
> everything SMP safe, it seems we would need APIs such as lock_gendisk_list(),
> unlock_gendisk_list(), get_gendisk_first(), and get_gendisk_next(). I've
> included a patch below (against 2.4.16) for genhd.c with examples of these
> APIs. EVMS could then use these to lock the list, traverse the list and
> process all entries, and then unlock.
This API looks really ugly to me. Did you take a look at my 'walk_gendisk'
patch I sent to the evms list some time ago? (attached again).
Christoph
--
Of course it doesn't work. We've performed a software upgrade.
diff -uNr -Xdontdiff linux.shrink_icache/drivers/block/genhd.c linux.walk_gendisk/drivers/block/genhd.c
--- linux.shrink_icache/drivers/block/genhd.c Wed Oct 31 12:52:32 2001
+++ linux.walk_gendisk/drivers/block/genhd.c Fri Nov 2 15:06:00 2001
@@ -121,6 +121,32 @@
EXPORT_SYMBOL(get_gendisk);
+/**
+ * walk_gendisk - issue a command for every registered gendisk
+ * @walk: user-specified callback
+ * @data: opaque data for the callback
+ *
+ * This function walks through the gendisk chain and calls back
+ * into @walk for every element.
+ */
+int
+walk_gendisk(int (*walk)(struct gendisk *, void *), void *data)
+{
+ struct gendisk *gp;
+ int error = 0;
+
+ read_lock(&gendisk_lock);
+ for (gp = gendisk_head; gp; gp = gp->next)
+ if ((error = walk(gp, data)))
+ break;
+ read_unlock(&gendisk_lock);
+
+ return error;
+}
+
+EXPORT_SYMBOL(walk_gendisk);
+
+
#ifdef CONFIG_PROC_FS
int
get_partition_list(char *page, char **start, off_t offset, int count)
diff -uNr -Xdontdiff linux.shrink_icache/include/linux/genhd.h linux.walk_gendisk/include/linux/genhd.h
--- linux.shrink_icache/include/linux/genhd.h Fri Nov 2 15:02:48 2001
+++ linux.walk_gendisk/include/linux/genhd.h Fri Nov 2 15:06:00 2001
@@ -93,6 +93,7 @@
extern void add_gendisk(struct gendisk *gp);
extern void del_gendisk(struct gendisk *gp);
extern struct gendisk *get_gendisk(kdev_t dev);
+extern int walk_gendisk(int (*walk)(struct gendisk *, void *), void *);
#endif /* __KERNEL__ */
On Wednesday 05 December 2001 15:53, Christoph Hellwig wrote:
> On Wed, Dec 05, 2001 at 02:52:59PM -0600, Kevin Corry wrote:
> > So one of my questions is: what is in store for the gendisk list in 2.5?
> > There is a comment in add_gendisk() about some part of that code going
> > away in 2.5 (although it's vague as to what the comment refers to),
>
> There are two comments, first
>
> * XXX: you should _never_ access this directly.
> * the only reason this is exported is source compatiblity.
>
> over the declaration of gendisk_head - since 2.5.1-pre2 gendisk_head
> is static and the comment is gone.
Yes, I noticed that. This is why we are trying to find a different method for
walking the gendisk list. :)
> The second is
>
> * In 2.5 this will go away. Fix the drivers who rely on
> * old behaviour.
>
> This is in add_gendisk and means that we currently work around callers
> trying to do add_gendisk on the same structure twice. This will go away
> as soon as the 'sd' driver is fixed.
Ok.
> > but no
> > corresponding comment in del_gendisk(). Are these two APIs coming or
> > going? I've also noticed get_start_sect() and get_nr_sects() in genhd.c
> > in the latest 2.5 patches. Are there any more new APIs that will be
> > coming?
>
> For early 2.5 the above API will remain - for mid to late 2.5 I plan to
> get rid of per-major gendisk completly. My design on a replacement is not
> yet written down, but the details are:
>
> - the minor_shift member moves into the block queue
> - each block queue gets a pointer to be used for partitioning, this will
> be opaque to the drivers.
> - a per-queue 'struct gendisk' equivalent (minus the fields that aren't
> used) will go into above pointer.
> - partitions will be registered as normal block devices, to the layers
> outside the partitioning handling there will be no difference between
> a block device and a partition.
>
> Hope this helps..
Sounds good. Any chance you could just leave out the partitioning stuff and
let EVMS handle it?
> > In order for EVMS to run this list correctly during volume discovery, and
> > to sufficiently abstract access to the gendisk list variables, and to
> > keep everything SMP safe, it seems we would need APIs such as
> > lock_gendisk_list(), unlock_gendisk_list(), get_gendisk_first(), and
> > get_gendisk_next(). I've included a patch below (against 2.4.16) for
> > genhd.c with examples of these APIs. EVMS could then use these to lock
> > the list, traverse the list and process all entries, and then unlock.
>
> This API looks really ugly to me. Did you take a look at my 'walk_gendisk'
> patch I sent to the evms list some time ago? (attached again).
>
> Christoph
Agreed, they are ugly. I didn't really write them to try to get them accepted
or anything, just to generate some discussion.
I must have missed your walk_gendisk() patch the last time (do you remember
how long ago you posted that?). It looks like it should accomplish just what
we need for discovery, and provide the correct locking. Of course, Ram is
gong to shoot me now, since I believe he suggested something like this
earlier today. :)
We will go with the walk_gendisk() method, and add your genhd.c and genhd.h
patches to our set of patches for EVMS.
-Kevin
On Wed, Dec 05, 2001 at 04:18:33PM -0600, Kevin Corry wrote:
> > a block device and a partition.
> >
> > Hope this helps..
>
> Sounds good. Any chance you could just leave out the partitioning stuff and
> let EVMS handle it?
You know my opinion to EVMS - so I'd rather avoid makeing it mandatory.
But if the 2.5 actually looks then like I expect it now partitioning code
will be much simpler than - all code dealing with ondisk formats will
be in early, pre-mount-root userspace, based on Andries Brouwer's partx
(part of util-linux).
> I must have missed your walk_gendisk() patch the last time (do you remember
> how long ago you posted that?).
It was on Nov, 12 - but it was in a private thread that only started on
evms-devel. So Mark was the only EVMS person that got it. Sorry for
the confusion.
Christoph
--
Of course it doesn't work. We've performed a software upgrade.
On Wed, Dec 05, 2001 at 10:53:46PM +0100, Christoph Hellwig wrote:
> - each block queue gets a pointer to be used for partitioning, this will
> be opaque to the drivers.
Yuck!
Why do you want a reference to partitions?
IMHO: the whole major/minor thing is butt ugly for partitions.
Of course, I have no intention of implementing this, but I think:
* there should be an IDE major, and a SCSI major, etc.
(or perhaps just a hard disk major? it's rather ugly exposing
the difference, but at times useful)
* so, if the IDE major was 2, then major2/minor0 is the first hard
disk, major2/minor1 the second, etc...
* partition tables get the same interface as LVM. (partition tables
are really like VGs, that only permit one PV)
Actually, it's really already implemented (!). Just use LVM2.
It would be trivial for me to get parted to tell LVM2 to recognize
partition tables. (and also trivial, but slightly more painful,
to port the existing ptables in the kernel)
Also, this is 100% compatible with corry's desire to "let EVMS
handle it".
Andrew
PS sorry for the delay. My server died a few hours before I left for
Brazil...
On Tue, Dec 11, 2001 at 11:36:18PM +1100, Andrew Clausen wrote:
> On Wed, Dec 05, 2001 at 10:53:46PM +0100, Christoph Hellwig wrote:
> > - each block queue gets a pointer to be used for partitioning, this will
> > be opaque to the drivers.
>
> Yuck!
>
> Why do you want a reference to partitions?
I don't want - that't the point.
The whole block layer and the drivers should not know at all about
specific partitions and volume managers.
If we want to still support the current partition APIs it will not
be entirely possible to reach this goal, but Al convienced me in a
long discussion that 2.6 is to early to get rid of those yet.
Christoph
--
Of course it doesn't work. We've performed a software upgrade.