2019-10-02 21:23:12

by Qian Cai

[permalink] [raw]
Subject: memory leaks in dasd_eckd_check_characteristics() error paths

For some reasons, dasd_eckd_check_characteristics() received -ENOMEM and then
dasd_generic_set_online() emits this message,

dasd: 0.0.0122 Setting the DASD online with discipline ECKD failed with rc=-12

After that, there are several memory leaks below. There are "config_data" and
then stored as,

/* store per path conf_data */
device->path[pos].conf_data = conf_data;

When it processes the error path in  dasd_generic_set_online(), it calls
dasd_delete_device() which nuke the whole "struct dasd_device" without freeing
the device->path[].conf_data first. Is it safe to free those in
dasd_free_device() without worrying about the double-free? Or, is it better to
free those in dasd_eckd_check_characteristics()'s goto error handling, i.e.,
out_err*?

--- a/drivers/s390/block/dasd.c
+++ b/drivers/s390/block/dasd.c
@@ -153,6 +153,9 @@ struct dasd_device *dasd_alloc_device(void)
  */
 void dasd_free_device(struct dasd_device *device)
 {
+       for (int i = 0; i < 8; i++)
+               kfree(device->path[i].conf_data);
+
        kfree(device->private);
        free_pages((unsigned long) device->ese_mem, 1);
        free_page((unsigned long) device->erp_mem);


unreferenced object 0x0fcee900 (size 256):
  comm "dasdconf.sh", pid 446, jiffies 4294940081 (age 170.340s)
  hex dump (first 32 bytes):
    dc 01 01 00 f0 f0 f2 f1 f0 f7 f9 f0 f0 c9 c2 d4  ................
    f7 f5 f0 f0 f0 f0 f0 f0 f0 c6 d9 c2 f7 f1 62 33  ..............b3
  backtrace:
    [<00000000a83b1992>] kmem_cache_alloc_trace+0x200/0x388
    [<00000000048ef3e2>] dasd_eckd_read_conf+0x408/0x1400 [dasd_eckd_mod]
    [<00000000ce31f195>] dasd_eckd_check_characteristics+0x3cc/0x938
[dasd_eckd_mod]
    [<00000000f6f1759b>] dasd_generic_set_online+0x150/0x4c0
    [<00000000efca1efa>] ccw_device_set_online+0x324/0x808
    [<00000000f9779774>] online_store_recog_and_online+0xe8/0x220
    [<00000000349a5446>] online_store+0x2ce/0x420
    [<000000005bd145f8>] kernfs_fop_write+0x1bc/0x270
    [<0000000005664197>] vfs_write+0xce/0x220
    [<0000000044a8bccb>] ksys_write+0xea/0x190
    [<0000000037335938>] system_call+0x296/0x2b4


2019-10-16 16:05:26

by Stefan Haberland

[permalink] [raw]
Subject: Re: memory leaks in dasd_eckd_check_characteristics() error paths

Hi,

thanks for reporting this.

On 02.10.19 21:33, Qian Cai wrote:
> For some reasons, dasd_eckd_check_characteristics() received -ENOMEM and then
> dasd_generic_set_online() emits this message,
>
> dasd: 0.0.0122 Setting the DASD online with discipline ECKD failed with rc=-12
>
> After that, there are several memory leaks below. There are "config_data" and
> then stored as,
>
> /* store per path conf_data */
> device->path[pos].conf_data = conf_data;
>
> When it processes the error path in  dasd_generic_set_online(), it calls
> dasd_delete_device() which nuke the whole "struct dasd_device" without freeing
> the device->path[].conf_data first.

Usually dasd_delete_device() calls dasd_generic_free_discipline() which
takes care of
the device->path[].conf_data in dasd_eckd_uncheck_device().
From a first look this looks sane.

So I need to spend a closer look if this does not happen correctly here.

> Is it safe to free those in
> dasd_free_device() without worrying about the double-free? Or, is it better to
> free those in dasd_eckd_check_characteristics()'s goto error handling, i.e.,
> out_err*?
>
> --- a/drivers/s390/block/dasd.c
> +++ b/drivers/s390/block/dasd.c
> @@ -153,6 +153,9 @@ struct dasd_device *dasd_alloc_device(void)
>   */
>  void dasd_free_device(struct dasd_device *device)
>  {
> +       for (int i = 0; i < 8; i++)
> +               kfree(device->path[i].conf_data);
> +
>         kfree(device->private);
>         free_pages((unsigned long) device->ese_mem, 1);
>         free_page((unsigned long) device->erp_mem);
>
>
> unreferenced object 0x0fcee900 (size 256):
>   comm "dasdconf.sh", pid 446, jiffies 4294940081 (age 170.340s)
>   hex dump (first 32 bytes):
>     dc 01 01 00 f0 f0 f2 f1 f0 f7 f9 f0 f0 c9 c2 d4  ................
>     f7 f5 f0 f0 f0 f0 f0 f0 f0 c6 d9 c2 f7 f1 62 33  ..............b3
>   backtrace:
>     [<00000000a83b1992>] kmem_cache_alloc_trace+0x200/0x388
>     [<00000000048ef3e2>] dasd_eckd_read_conf+0x408/0x1400 [dasd_eckd_mod]
>     [<00000000ce31f195>] dasd_eckd_check_characteristics+0x3cc/0x938
> [dasd_eckd_mod]
>     [<00000000f6f1759b>] dasd_generic_set_online+0x150/0x4c0
>     [<00000000efca1efa>] ccw_device_set_online+0x324/0x808
>     [<00000000f9779774>] online_store_recog_and_online+0xe8/0x220
>     [<00000000349a5446>] online_store+0x2ce/0x420
>     [<000000005bd145f8>] kernfs_fop_write+0x1bc/0x270
>     [<0000000005664197>] vfs_write+0xce/0x220
>     [<0000000044a8bccb>] ksys_write+0xea/0x190
>     [<0000000037335938>] system_call+0x296/0x2b4


2019-10-16 16:15:35

by Qian Cai

[permalink] [raw]
Subject: Re: memory leaks in dasd_eckd_check_characteristics() error paths

On Wed, 2019-10-16 at 15:29 +0200, Stefan Haberland wrote:
> Hi,
>
> thanks for reporting this.
>
> On 02.10.19 21:33, Qian Cai wrote:
> > For some reasons, dasd_eckd_check_characteristics() received -ENOMEM and then
> > dasd_generic_set_online() emits this message,
> >
> > dasd: 0.0.0122 Setting the DASD online with discipline ECKD failed with rc=-12
> >
> > After that, there are several memory leaks below. There are "config_data" and
> > then stored as,
> >
> > /* store per path conf_data */
> > device->path[pos].conf_data = conf_data;
> >
> > When it processes the error path in  dasd_generic_set_online(), it calls
> > dasd_delete_device() which nuke the whole "struct dasd_device" without freeing
> > the device->path[].conf_data first.
>
> Usually dasd_delete_device() calls dasd_generic_free_discipline() which
> takes care of
> the device->path[].conf_data in dasd_eckd_uncheck_device().
> From a first look this looks sane.
>
> So I need to spend a closer look if this does not happen correctly here.

When dasd_eckd_check_characteristics() failed here,

if (!private) {
private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
if (!private) {
dev_warn(&device->cdev->dev,
 "Allocating memory for private DASD data "
 "failed\n");
return -ENOMEM;
}
device->private = private;

The device->private is NULL.

Then, in dasd_eckd_uncheck_device(), it will return immediately.

if (!private)
return;

>
> > Is it safe to free those in
> > dasd_free_device() without worrying about the double-free? Or, is it better to
> > free those in dasd_eckd_check_characteristics()'s goto error handling, i.e.,
> > out_err*?
> >
> > --- a/drivers/s390/block/dasd.c
> > +++ b/drivers/s390/block/dasd.c
> > @@ -153,6 +153,9 @@ struct dasd_device *dasd_alloc_device(void)
> >   */
> >  void dasd_free_device(struct dasd_device *device)
> >  {
> > +       for (int i = 0; i < 8; i++)
> > +               kfree(device->path[i].conf_data);
> > +
> >         kfree(device->private);
> >         free_pages((unsigned long) device->ese_mem, 1);
> >         free_page((unsigned long) device->erp_mem);
> >
> >
> > unreferenced object 0x0fcee900 (size 256):
> >   comm "dasdconf.sh", pid 446, jiffies 4294940081 (age 170.340s)
> >   hex dump (first 32 bytes):
> >     dc 01 01 00 f0 f0 f2 f1 f0 f7 f9 f0 f0 c9 c2 d4  ................
> >     f7 f5 f0 f0 f0 f0 f0 f0 f0 c6 d9 c2 f7 f1 62 33  ..............b3
> >   backtrace:
> >     [<00000000a83b1992>] kmem_cache_alloc_trace+0x200/0x388
> >     [<00000000048ef3e2>] dasd_eckd_read_conf+0x408/0x1400 [dasd_eckd_mod]
> >     [<00000000ce31f195>] dasd_eckd_check_characteristics+0x3cc/0x938
> > [dasd_eckd_mod]
> >     [<00000000f6f1759b>] dasd_generic_set_online+0x150/0x4c0
> >     [<00000000efca1efa>] ccw_device_set_online+0x324/0x808
> >     [<00000000f9779774>] online_store_recog_and_online+0xe8/0x220
> >     [<00000000349a5446>] online_store+0x2ce/0x420
> >     [<000000005bd145f8>] kernfs_fop_write+0x1bc/0x270
> >     [<0000000005664197>] vfs_write+0xce/0x220
> >     [<0000000044a8bccb>] ksys_write+0xea/0x190
> >     [<0000000037335938>] system_call+0x296/0x2b4
>
>

2019-10-16 16:36:40

by Stefan Haberland

[permalink] [raw]
Subject: Re: memory leaks in dasd_eckd_check_characteristics() error paths

On 16.10.19 16:09, Qian Cai wrote:
> On Wed, 2019-10-16 at 15:29 +0200, Stefan Haberland wrote:
>> Hi,
>>
>> thanks for reporting this.
>>
>> On 02.10.19 21:33, Qian Cai wrote:
>>> For some reasons, dasd_eckd_check_characteristics() received -ENOMEM and then
>>> dasd_generic_set_online() emits this message,
>>>
>>> dasd: 0.0.0122 Setting the DASD online with discipline ECKD failed with rc=-12
>>>
>>> After that, there are several memory leaks below. There are "config_data" and
>>> then stored as,
>>>
>>> /* store per path conf_data */
>>> device->path[pos].conf_data = conf_data;
>>>
>>> When it processes the error path in  dasd_generic_set_online(), it calls
>>> dasd_delete_device() which nuke the whole "struct dasd_device" without freeing
>>> the device->path[].conf_data first.
>> Usually dasd_delete_device() calls dasd_generic_free_discipline() which
>> takes care of
>> the device->path[].conf_data in dasd_eckd_uncheck_device().
>> From a first look this looks sane.
>>
>> So I need to spend a closer look if this does not happen correctly here.
> When dasd_eckd_check_characteristics() failed here,
>
> if (!private) {
> private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> if (!private) {
> dev_warn(&device->cdev->dev,
>  "Allocating memory for private DASD data "
>  "failed\n");
> return -ENOMEM;
> }
> device->private = private;
>
> The device->private is NULL.
>
> Then, in dasd_eckd_uncheck_device(), it will return immediately.
>
> if (!private)
> return;

Yes but in this case there is no per_path configuration data stored.
This is done after the private structure is allocated successfully.


>>> Is it safe to free those in
>>> dasd_free_device() without worrying about the double-free? Or, is it better to
>>> free those in dasd_eckd_check_characteristics()'s goto error handling, i.e.,
>>> out_err*?
>>>
>>> --- a/drivers/s390/block/dasd.c
>>> +++ b/drivers/s390/block/dasd.c
>>> @@ -153,6 +153,9 @@ struct dasd_device *dasd_alloc_device(void)
>>>   */
>>>  void dasd_free_device(struct dasd_device *device)
>>>  {
>>> +       for (int i = 0; i < 8; i++)
>>> +               kfree(device->path[i].conf_data);
>>> +
>>>         kfree(device->private);
>>>         free_pages((unsigned long) device->ese_mem, 1);
>>>         free_page((unsigned long) device->erp_mem);
>>>
>>>
>>> unreferenced object 0x0fcee900 (size 256):
>>>   comm "dasdconf.sh", pid 446, jiffies 4294940081 (age 170.340s)
>>>   hex dump (first 32 bytes):
>>>     dc 01 01 00 f0 f0 f2 f1 f0 f7 f9 f0 f0 c9 c2 d4  ................
>>>     f7 f5 f0 f0 f0 f0 f0 f0 f0 c6 d9 c2 f7 f1 62 33  ..............b3
>>>   backtrace:
>>>     [<00000000a83b1992>] kmem_cache_alloc_trace+0x200/0x388
>>>     [<00000000048ef3e2>] dasd_eckd_read_conf+0x408/0x1400 [dasd_eckd_mod]
>>>     [<00000000ce31f195>] dasd_eckd_check_characteristics+0x3cc/0x938
>>> [dasd_eckd_mod]
>>>     [<00000000f6f1759b>] dasd_generic_set_online+0x150/0x4c0
>>>     [<00000000efca1efa>] ccw_device_set_online+0x324/0x808
>>>     [<00000000f9779774>] online_store_recog_and_online+0xe8/0x220
>>>     [<00000000349a5446>] online_store+0x2ce/0x420
>>>     [<000000005bd145f8>] kernfs_fop_write+0x1bc/0x270
>>>     [<0000000005664197>] vfs_write+0xce/0x220
>>>     [<0000000044a8bccb>] ksys_write+0xea/0x190
>>>     [<0000000037335938>] system_call+0x296/0x2b4
>>

2019-10-16 17:08:08

by Qian Cai

[permalink] [raw]
Subject: Re: memory leaks in dasd_eckd_check_characteristics() error paths

On Wed, 2019-10-16 at 16:56 +0200, Stefan Haberland wrote:
> On 16.10.19 16:09, Qian Cai wrote:
> > On Wed, 2019-10-16 at 15:29 +0200, Stefan Haberland wrote:
> > > Hi,
> > >
> > > thanks for reporting this.
> > >
> > > On 02.10.19 21:33, Qian Cai wrote:
> > > > For some reasons, dasd_eckd_check_characteristics() received -ENOMEM and then
> > > > dasd_generic_set_online() emits this message,
> > > >
> > > > dasd: 0.0.0122 Setting the DASD online with discipline ECKD failed with rc=-12
> > > >
> > > > After that, there are several memory leaks below. There are "config_data" and
> > > > then stored as,
> > > >
> > > > /* store per path conf_data */
> > > > device->path[pos].conf_data = conf_data;
> > > >
> > > > When it processes the error path in  dasd_generic_set_online(), it calls
> > > > dasd_delete_device() which nuke the whole "struct dasd_device" without freeing
> > > > the device->path[].conf_data first.
> > >
> > > Usually dasd_delete_device() calls dasd_generic_free_discipline() which
> > > takes care of
> > > the device->path[].conf_data in dasd_eckd_uncheck_device().
> > > From a first look this looks sane.
> > >
> > > So I need to spend a closer look if this does not happen correctly here.
> >
> > When dasd_eckd_check_characteristics() failed here,
> >
> > if (!private) {
> > private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
> > if (!private) {
> > dev_warn(&device->cdev->dev,
> >  "Allocating memory for private DASD data "
> >  "failed\n");
> > return -ENOMEM;
> > }
> > device->private = private;
> >
> > The device->private is NULL.
> >
> > Then, in dasd_eckd_uncheck_device(), it will return immediately.
> >
> > if (!private)
> > return;
>
> Yes but in this case there is no per_path configuration data stored.
> This is done after the private structure is allocated successfully.

Yes, you are right. It has been a while so I must lost a bit memory. Actually,
it looks like in dasd_eckd_check_characteristic() that device->private is set to
NULL from this path,

/* Read Configuration Data */
rc = dasd_eckd_read_conf(device);
if (rc)
goto out_err1;

out_err1:
kfree(private->conf_data);
kfree(device->private);
device->private = NULL;
return rc;

because dasd_eckd_read_conf() returns -ENOMEM and calls,

out_error:
kfree(rcd_buf);
*rcd_buffer = NULL;
*rcd_buffer_size = 0;
return ret;

It will only free its own config_data in one operational path, but there could
has already allocated in earlier paths in dasd_eckd_read_conf() without any
rollback and calls return,

for (lpm = 0x80; lpm; lpm>>= 1) {
if (!(lpm & opm))
continue;
rc = dasd_eckd_read_conf_lpm(device, &conf_data,
     &conf_len, lpm);
if (rc && rc != -EOPNOTSUPP) { /* -EOPNOTSUPP is ok */
DBF_EVENT_DEVID(DBF_WARNING, device->cdev,
"Read configuration data returned "
"error %d", rc);
return rc;
}

Later, dasd_eckd_uncheck_device() see private->device is NULL without cleaning
up. Does it make sense?

>
>
> > > > Is it safe to free those in
> > > > dasd_free_device() without worrying about the double-free? Or, is it better to
> > > > free those in dasd_eckd_check_characteristics()'s goto error handling, i.e.,
> > > > out_err*?
> > > >
> > > > --- a/drivers/s390/block/dasd.c
> > > > +++ b/drivers/s390/block/dasd.c
> > > > @@ -153,6 +153,9 @@ struct dasd_device *dasd_alloc_device(void)
> > > >   */
> > > >  void dasd_free_device(struct dasd_device *device)
> > > >  {
> > > > +       for (int i = 0; i < 8; i++)
> > > > +               kfree(device->path[i].conf_data);
> > > > +
> > > >         kfree(device->private);
> > > >         free_pages((unsigned long) device->ese_mem, 1);
> > > >         free_page((unsigned long) device->erp_mem);
> > > >
> > > >
> > > > unreferenced object 0x0fcee900 (size 256):
> > > >   comm "dasdconf.sh", pid 446, jiffies 4294940081 (age 170.340s)
> > > >   hex dump (first 32 bytes):
> > > >     dc 01 01 00 f0 f0 f2 f1 f0 f7 f9 f0 f0 c9 c2 d4  ................
> > > >     f7 f5 f0 f0 f0 f0 f0 f0 f0 c6 d9 c2 f7 f1 62 33  ..............b3
> > > >   backtrace:
> > > >     [<00000000a83b1992>] kmem_cache_alloc_trace+0x200/0x388
> > > >     [<00000000048ef3e2>] dasd_eckd_read_conf+0x408/0x1400 [dasd_eckd_mod]
> > > >     [<00000000ce31f195>] dasd_eckd_check_characteristics+0x3cc/0x938
> > > > [dasd_eckd_mod]
> > > >     [<00000000f6f1759b>] dasd_generic_set_online+0x150/0x4c0
> > > >     [<00000000efca1efa>] ccw_device_set_online+0x324/0x808
> > > >     [<00000000f9779774>] online_store_recog_and_online+0xe8/0x220
> > > >     [<00000000349a5446>] online_store+0x2ce/0x420
> > > >     [<000000005bd145f8>] kernfs_fop_write+0x1bc/0x270
> > > >     [<0000000005664197>] vfs_write+0xce/0x220
> > > >     [<0000000044a8bccb>] ksys_write+0xea/0x190
> > > >     [<0000000037335938>] system_call+0x296/0x2b4
>
>

2019-10-16 17:09:03

by Stefan Haberland

[permalink] [raw]
Subject: Re: memory leaks in dasd_eckd_check_characteristics() error paths

On 16.10.19 17:26, Qian Cai wrote:
> On Wed, 2019-10-16 at 16:56 +0200, Stefan Haberland wrote:
>> On 16.10.19 16:09, Qian Cai wrote:
>>> On Wed, 2019-10-16 at 15:29 +0200, Stefan Haberland wrote:
>>>> Hi,
>>>>
>>>> thanks for reporting this.
>>>>
>>>> On 02.10.19 21:33, Qian Cai wrote:
>>>>> For some reasons, dasd_eckd_check_characteristics() received -ENOMEM and then
>>>>> dasd_generic_set_online() emits this message,
>>>>>
>>>>> dasd: 0.0.0122 Setting the DASD online with discipline ECKD failed with rc=-12
>>>>>
>>>>> After that, there are several memory leaks below. There are "config_data" and
>>>>> then stored as,
>>>>>
>>>>> /* store per path conf_data */
>>>>> device->path[pos].conf_data = conf_data;
>>>>>
>>>>> When it processes the error path in  dasd_generic_set_online(), it calls
>>>>> dasd_delete_device() which nuke the whole "struct dasd_device" without freeing
>>>>> the device->path[].conf_data first.
>>>> Usually dasd_delete_device() calls dasd_generic_free_discipline() which
>>>> takes care of
>>>> the device->path[].conf_data in dasd_eckd_uncheck_device().
>>>> From a first look this looks sane.
>>>>
>>>> So I need to spend a closer look if this does not happen correctly here.
>>> When dasd_eckd_check_characteristics() failed here,
>>>
>>> if (!private) {
>>> private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
>>> if (!private) {
>>> dev_warn(&device->cdev->dev,
>>>  "Allocating memory for private DASD data "
>>>  "failed\n");
>>> return -ENOMEM;
>>> }
>>> device->private = private;
>>>
>>> The device->private is NULL.
>>>
>>> Then, in dasd_eckd_uncheck_device(), it will return immediately.
>>>
>>> if (!private)
>>> return;
>> Yes but in this case there is no per_path configuration data stored.
>> This is done after the private structure is allocated successfully.
> Yes, you are right. It has been a while so I must lost a bit memory. Actually,
> it looks like in dasd_eckd_check_characteristic() that device->private is set to
> NULL from this path,
>
> /* Read Configuration Data */
> rc = dasd_eckd_read_conf(device);
> if (rc)
> goto out_err1;
>
> out_err1:
> kfree(private->conf_data);
> kfree(device->private);
> device->private = NULL;
> return rc;
>
> because dasd_eckd_read_conf() returns -ENOMEM and calls,
>
> out_error:
> kfree(rcd_buf);
> *rcd_buffer = NULL;
> *rcd_buffer_size = 0;
> return ret;
>
> It will only free its own config_data in one operational path, but there could
> has already allocated in earlier paths in dasd_eckd_read_conf() without any
> rollback and calls return,
>
> for (lpm = 0x80; lpm; lpm>>= 1) {
> if (!(lpm & opm))
> continue;
> rc = dasd_eckd_read_conf_lpm(device, &conf_data,
>      &conf_len, lpm);
> if (rc && rc != -EOPNOTSUPP) { /* -EOPNOTSUPP is ok */
> DBF_EVENT_DEVID(DBF_WARNING, device->cdev,
> "Read configuration data returned "
> "error %d", rc);
> return rc;
> }
>
> Later, dasd_eckd_uncheck_device() see private->device is NULL without cleaning
> up. Does it make sense?

Yes, this looks like an error path not handling this correctly.

>
>>
>>>>> Is it safe to free those in
>>>>> dasd_free_device() without worrying about the double-free? Or, is it better to
>>>>> free those in dasd_eckd_check_characteristics()'s goto error handling, i.e.,
>>>>> out_err*?
>>>>>
>>>>> --- a/drivers/s390/block/dasd.c
>>>>> +++ b/drivers/s390/block/dasd.c
>>>>> @@ -153,6 +153,9 @@ struct dasd_device *dasd_alloc_device(void)
>>>>>   */
>>>>>  void dasd_free_device(struct dasd_device *device)
>>>>>  {
>>>>> +       for (int i = 0; i < 8; i++)
>>>>> +               kfree(device->path[i].conf_data);
>>>>> +
>>>>>         kfree(device->private);
>>>>>         free_pages((unsigned long) device->ese_mem, 1);
>>>>>         free_page((unsigned long) device->erp_mem);
>>>>>
>>>>>
>>>>> unreferenced object 0x0fcee900 (size 256):
>>>>>   comm "dasdconf.sh", pid 446, jiffies 4294940081 (age 170.340s)
>>>>>   hex dump (first 32 bytes):
>>>>>     dc 01 01 00 f0 f0 f2 f1 f0 f7 f9 f0 f0 c9 c2 d4  ................
>>>>>     f7 f5 f0 f0 f0 f0 f0 f0 f0 c6 d9 c2 f7 f1 62 33  ..............b3
>>>>>   backtrace:
>>>>>     [<00000000a83b1992>] kmem_cache_alloc_trace+0x200/0x388
>>>>>     [<00000000048ef3e2>] dasd_eckd_read_conf+0x408/0x1400 [dasd_eckd_mod]
>>>>>     [<00000000ce31f195>] dasd_eckd_check_characteristics+0x3cc/0x938
>>>>> [dasd_eckd_mod]
>>>>>     [<00000000f6f1759b>] dasd_generic_set_online+0x150/0x4c0
>>>>>     [<00000000efca1efa>] ccw_device_set_online+0x324/0x808
>>>>>     [<00000000f9779774>] online_store_recog_and_online+0xe8/0x220
>>>>>     [<00000000349a5446>] online_store+0x2ce/0x420
>>>>>     [<000000005bd145f8>] kernfs_fop_write+0x1bc/0x270
>>>>>     [<0000000005664197>] vfs_write+0xce/0x220
>>>>>     [<0000000044a8bccb>] ksys_write+0xea/0x190
>>>>>     [<0000000037335938>] system_call+0x296/0x2b4
>>

2019-10-19 08:19:20

by Stefan Haberland

[permalink] [raw]
Subject: Re: memory leaks in dasd_eckd_check_characteristics() error paths

On 16.10.19 17:28, Stefan Haberland wrote:
> On 16.10.19 17:26, Qian Cai wrote:
>> On Wed, 2019-10-16 at 16:56 +0200, Stefan Haberland wrote:
>>> On 16.10.19 16:09, Qian Cai wrote:
>>>> On Wed, 2019-10-16 at 15:29 +0200, Stefan Haberland wrote:
>>>>> Hi,
>>>>>
>>>>> thanks for reporting this.
>>>>>
>>>>> On 02.10.19 21:33, Qian Cai wrote:
>>>>>> For some reasons, dasd_eckd_check_characteristics() received -ENOMEM and then
>>>>>> dasd_generic_set_online() emits this message,
>>>>>>
>>>>>> dasd: 0.0.0122 Setting the DASD online with discipline ECKD failed with rc=-12
>>>>>>
>>>>>> After that, there are several memory leaks below. There are "config_data" and
>>>>>> then stored as,
>>>>>>
>>>>>> /* store per path conf_data */
>>>>>> device->path[pos].conf_data = conf_data;
>>>>>>
>>>>>> When it processes the error path in  dasd_generic_set_online(), it calls
>>>>>> dasd_delete_device() which nuke the whole "struct dasd_device" without freeing
>>>>>> the device->path[].conf_data first.
>>>>> Usually dasd_delete_device() calls dasd_generic_free_discipline() which
>>>>> takes care of
>>>>> the device->path[].conf_data in dasd_eckd_uncheck_device().
>>>>> From a first look this looks sane.
>>>>>
>>>>> So I need to spend a closer look if this does not happen correctly here.
>>>> When dasd_eckd_check_characteristics() failed here,
>>>>
>>>> if (!private) {
>>>> private = kzalloc(sizeof(*private), GFP_KERNEL | GFP_DMA);
>>>> if (!private) {
>>>> dev_warn(&device->cdev->dev,
>>>>  "Allocating memory for private DASD data "
>>>>  "failed\n");
>>>> return -ENOMEM;
>>>> }
>>>> device->private = private;
>>>>
>>>> The device->private is NULL.
>>>>
>>>> Then, in dasd_eckd_uncheck_device(), it will return immediately.
>>>>
>>>> if (!private)
>>>> return;
>>> Yes but in this case there is no per_path configuration data stored.
>>> This is done after the private structure is allocated successfully.
>> Yes, you are right. It has been a while so I must lost a bit memory. Actually,
>> it looks like in dasd_eckd_check_characteristic() that device->private is set to
>> NULL from this path,
>>
>> /* Read Configuration Data */
>> rc = dasd_eckd_read_conf(device);
>> if (rc)
>> goto out_err1;
>>
>> out_err1:
>> kfree(private->conf_data);
>> kfree(device->private);
>> device->private = NULL;
>> return rc;
>>
>> because dasd_eckd_read_conf() returns -ENOMEM and calls,
>>
>> out_error:
>> kfree(rcd_buf);
>> *rcd_buffer = NULL;
>> *rcd_buffer_size = 0;
>> return ret;
>>
>> It will only free its own config_data in one operational path, but there could
>> has already allocated in earlier paths in dasd_eckd_read_conf() without any
>> rollback and calls return,
>>
>> for (lpm = 0x80; lpm; lpm>>= 1) {
>> if (!(lpm & opm))
>> continue;
>> rc = dasd_eckd_read_conf_lpm(device, &conf_data,
>>      &conf_len, lpm);
>> if (rc && rc != -EOPNOTSUPP) { /* -EOPNOTSUPP is ok */
>> DBF_EVENT_DEVID(DBF_WARNING, device->cdev,
>> "Read configuration data returned "
>> "error %d", rc);
>> return rc;
>> }
>>
>> Later, dasd_eckd_uncheck_device() see private->device is NULL without cleaning
>> up. Does it make sense?
> Yes, this looks like an error path not handling this correctly.
>

I will provide a patch for this. Most likely I will cover this in the
error handling of the function.
Thanks again for reporting.

Regards,
Stefan