2007-10-20 05:07:28

by emist

Subject: [PATCH] Bug fix for the s390 dcssblk driver

# This patch fixes a memory corruption bug in the s390 dcssblk driver.
# The bug occurs when an attempt to change the type of a segment
# returns an error. At this point the driver tries to remove the segment in
# question while some of the device's attributes are in use. This causes the
# driver to hang.
#
# questions/comments @ [email protected]

diff -urN linux-2.6.23.1/drivers/s390/block/dcssblk.c linuxx/drivers/s390/block/dcssblk.c
--- linux-2.6.23.1/drivers/s390/block/dcssblk.c 2007-10-12 12:43:44.000000000 -0400
+++ linuxx/drivers/s390/block/dcssblk.c 2007-10-20 00:51:19.000000000 -0400
@@ -253,8 +253,12 @@
SEGMENT_EXCLUSIVE);
if (rc < 0) {
BUG_ON(rc == -EINVAL);
- if (rc != -EAGAIN)
- goto removeseg;
+ if (rc != -EAGAIN){
+ PRINT_DEBUG("Could not reload segment %s in the specified format, reloading\n",
+ dev_info->segment_name);
+ rc = segment_modify_shared(dev_info->segment_name, SEGMENT_SHARED);
+ goto out;
+ }
} else {
dev_info->is_shared = 0;
set_disk_ro(dev_info->gd, 0);


Attachments:
dcssblk_fix (1.03 kB)

2007-10-20 17:24:48

by emist

Subject: Re: [PATCH] Bug fix for the s390 dcssblk driver

Frans Pop wrote:
> emist wrote:
>> The following patch fixes an issue in the s390 dcssblk driver. The
>> issue occurs when an attempt to change a segment's type through the
>> device attribute file "shared" fails. The driver then removes the
>> device in question, and with it the device attribute that is still
>> handling the call. The result is a hang, as the driver removes memory
>> out from under its own feet.
>>
>> I am not sure this explanation makes sense or is entirely accurate;
>> it is what I currently believe after encountering and fixing the
>> error. Anyway, here is the patch; I hope it helps.
>
> Hi,
>
> If you don't get any response to your patch in the next few days, I
> suggest you resend it and then CC the [email protected] list and
> possibly also the maintainer at [email protected].
>
> In general you should always try to CC the relevant list/people as listed in
> the MAINTAINERS file and not just the linux-kernel list, both for patches
> and when reporting problems.
>
> Cheers,
> Frans Pop
>

Thanks Frans, I will do as you suggest.

Have a good one,

Igor H.

2007-10-21 10:09:40

by Heiko Carstens

Subject: Re: [PATCH] Bug fix for the s390 dcssblk driver

On Sat, Oct 20, 2007 at 01:24:34PM -0400, emist wrote:
> Frans Pop wrote:
> > emist wrote:
> >> The following patch fixes an issue in the s390 dcssblk driver. The
> >> issue occurs when an attempt to change a segment's type through the
> >> device attribute file "shared" fails. The driver then removes the
> >> device in question, and with it the device attribute that is still
> >> handling the call. The result is a hang, as the driver removes memory
> >> out from under its own feet.
> >>
> >> I am not sure this explanation makes sense or is entirely accurate;
> >> it is what I currently believe after encountering and fixing the
> >> error. Anyway, here is the patch; I hope it helps.
> >
> > Hi,
> >
> > If you don't get any response to your patch in the next few days, I
> > suggest you resend it and then CC the [email protected] list and
> > possibly also the maintainer at [email protected].
> >
> > In general you should always try to CC the relevant list/people as listed in
> > the MAINTAINERS file and not just the linux-kernel list, both for patches
> > and when reporting problems.
> >
> > Cheers,
> > Frans Pop
> >
>
> Thanks Frans, I will do as you suggest.
>
> Have a good one,
>
> Igor H.

Gerald or Carsten (cc'ed) should look into this.
Thanks for reporting.

2007-10-22 03:47:06

by emist

Subject: Re: [PATCH] Bug fix for the s390 dcssblk driver

# This patch fixes a memory corruption bug in the s390 dcssblk driver.
# The bug occurs when an attempt to change the type of a segment
# returns an error. At this point the driver tries to remove the segment in
# question while some of the device's attributes are in use. This causes the
# driver to hang.
#
# questions/comments @ [email protected]


diff -urN linux-2.6.23.1/drivers/s390/block/dcssblk.c linuxx/drivers/s390/block/dcssblk.c
--- linux-2.6.23.1/drivers/s390/block/dcssblk.c 2007-10-20 01:19:29.000000000 -0400
+++ linuxx/drivers/s390/block/dcssblk.c 2007-10-20 01:16:13.000000000 -0400
@@ -230,8 +230,15 @@
SEGMENT_SHARED);
if (rc < 0) {
BUG_ON(rc == -EINVAL);
- if (rc != -EAGAIN)
- goto removeseg;
+ if (rc != -EAGAIN){
+ PRINT_DEBUG
+ ("Could not reload segment %s in the specified format, reloading\n",
+ dev_info->segment_name);
+ rc = segment_modify_shared(dev_info->
+ segment_name,
+ SEGMENT_EXCLUSIVE);
+ goto out;
+ }
} else {
dev_info->is_shared = 1;
switch (dev_info->segment_type) {
@@ -253,8 +260,12 @@
SEGMENT_EXCLUSIVE);
if (rc < 0) {
BUG_ON(rc == -EINVAL);
- if (rc != -EAGAIN)
- goto removeseg;
+ if (rc != -EAGAIN){
+ PRINT_DEBUG("Could not reload segment %s in the specified format, reloading\n",
+ dev_info->segment_name);
+ rc = segment_modify_shared(dev_info->segment_name, SEGMENT_SHARED);
+ goto out;
+ }
} else {
dev_info->is_shared = 0;
set_disk_ro(dev_info->gd, 0);


Attachments:
dcssblk_fix (1.50 kB)

2007-10-22 11:37:50

by Cornelia Huck

Subject: Re: [PATCH] Bug fix for the s390 dcssblk driver

On Sun, 21 Oct 2007 23:46:49 -0400,
emist <[email protected]> wrote:

> # This patch fixes a memory corruption bug in the s390 dcssblk driver.
> # The bug occurs when an attempt to change the type of a segment
> # returns an error. At this point the driver tries to remove the segment in
> # question while some of the device's attributes are in use. This causes the
> # driver to hang.

Hm, seems we missed another of those device attributes exhibiting
suicidal tendencies...

Tejun has a patchset allowing device attributes to commit suicide (see
http://marc.info/?l=linux-kernel&m=119027371416452&w=2), although I'm
not sure what its current status is. Until then, you would need to use
device_schedule_callback() to commit suicide.

This all of course only applies if killing the segment is better than
leaving it in its current state, but others can make a better judgement
on that :)

2007-10-23 13:22:56

by Gerald Schaefer

Subject: Re: [PATCH] Bug fix for the s390 dcssblk driver

On Mon, 2007-10-22 at 13:37 +0200, Cornelia Huck wrote:
> On Sun, 21 Oct 2007 23:46:49 -0400,
> emist <[email protected]> wrote:
>
> > # This patch fixes a memory corruption bug in the s390 dcssblk driver.
> > # The bug occurs when an attempt to change the type of a segment
> > # returns an error. At this point the driver tries to remove the segment in
> > # question while some of the device's attributes are in use. This causes the
> > # driver to hang.
>
> Hm, seems we missed another of those device attributes exhibiting
> suicidal tendencies...
>
> Tejun has a patchset allowing device attributes to commit suicide (see
> http://marc.info/?l=linux-kernel&m=119027371416452&w=2), although I'm
> not sure what its current status is. Until then, you would need to use
> device_schedule_callback() to commit suicide.
>
> This all of course only applies if killing the segment is better than
> leaving it in its current state, but others can make a better judgement
> on that :)

Hi,

thanks for reporting this bug. It seems we forgot to consider the
suicidal behavior of this driver when the device_unregister() code was
changed.

The best solution for now would be to use the scheduled callback
function, as Cornelia described. If segment_modify_shared() fails, the
DCSS segment will have been unloaded. Calling the function again with
the old "shared" flag will not help, because it will not reload the
segment. So we need to remove/unregister the device in this error
path, and for now this should be done with device_schedule_callback().

Signed-off-by: Gerald Schaefer <[email protected]>
---

dcssblk.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

Index: linux-2.6.23/drivers/s390/block/dcssblk.c
===================================================================
--- linux-2.6.23.orig/drivers/s390/block/dcssblk.c
+++ linux-2.6.23/drivers/s390/block/dcssblk.c
@@ -193,6 +193,12 @@ dcssblk_segment_warn(int rc, char* seg_n
}
}

+static void dcssblk_unregister_callback(struct device *dev)
+{
+ device_unregister(dev);
+ put_device(dev);
+}
+
/*
* device attribute for switching shared/nonshared (exclusive)
* operation (show + store)
@@ -276,8 +282,7 @@ removeseg:
blk_cleanup_queue(dev_info->dcssblk_queue);
dev_info->gd->queue = NULL;
put_disk(dev_info->gd);
- device_unregister(dev);
- put_device(dev);
+ rc = device_schedule_callback(dev, dcssblk_unregister_callback);
out:
up_write(&dcssblk_devices_sem);
return rc;

2007-10-23 22:04:06

by emist

Subject: Re: [PATCH] Bug fix for the s390 dcssblk driver

Gerald Schaefer wrote:
> On Mon, 2007-10-22 at 13:37 +0200, Cornelia Huck wrote:
>> On Sun, 21 Oct 2007 23:46:49 -0400,
>> emist <[email protected]> wrote:
>>
>>> # This patch fixes a memory corruption bug in the s390 dcssblk driver.
>>> # The bug occurs when an attempt to change the type of a segment
>>> # returns an error. At this point the driver tries to remove the segment in
>>> # question while some of the device's attributes are in use. This causes the
>>> # driver to hang.
>> Hm, seems we missed another of those device attributes exhibiting
>> suicidal tendencies...
>>
>> Tejun has a patchset allowing device attributes to commit suicide (see
>> http://marc.info/?l=linux-kernel&m=119027371416452&w=2), although I'm
>> not sure what its current status is. Until then, you would need to use
>> device_schedule_callback() to commit suicide.
>>
>> This all of course only applies if killing the segment is better than
>> leaving it in its current state, but others can make a better judgement
>> on that :)
>
> Hi,
>
> thanks for reporting this bug. It seems we forgot to consider the
> suicidal behavior of this driver when the device_unregister() code was
> changed.
>
> The best solution for now would be to use the scheduled callback
> function, as Cornelia described. If segment_modify_shared() fails, the
> DCSS segment will have been unloaded. Calling the function again with
> the old "shared" flag will not help, because it will not reload the
> segment. So we need to remove/unregister the device in this error
> path, and for now this should be done with device_schedule_callback().
>
> Signed-off-by: Gerald Schaefer <[email protected]>
> ---
>
> dcssblk.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.23/drivers/s390/block/dcssblk.c
> ===================================================================
> --- linux-2.6.23.orig/drivers/s390/block/dcssblk.c
> +++ linux-2.6.23/drivers/s390/block/dcssblk.c
> @@ -193,6 +193,12 @@ dcssblk_segment_warn(int rc, char* seg_n
> }
> }
>
> +static void dcssblk_unregister_callback(struct device *dev)
> +{
> + device_unregister(dev);
> + put_device(dev);
> +}
> +
> /*
> * device attribute for switching shared/nonshared (exclusive)
> * operation (show + store)
> @@ -276,8 +282,7 @@ removeseg:
> blk_cleanup_queue(dev_info->dcssblk_queue);
> dev_info->gd->queue = NULL;
> put_disk(dev_info->gd);
> - device_unregister(dev);
> - put_device(dev);
> + rc = device_schedule_callback(dev, dcssblk_unregister_callback);
> out:
> up_write(&dcssblk_devices_sem);
> return rc;
>
>
Hey,

That makes sense. I no longer have access to an s390 system, so I will
not be able to provide a tested patch for this bug. I will, however,
attempt to fix this issue and submit a patch using the scheduled
callback function; perhaps someone could verify that it works properly.

Have a good one,

Igor H.