2009-03-15 10:47:23

by Jiri Slaby

[permalink] [raw]
Subject: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero

Hi,

sometimes, when booting up/resuming from disk, I get an oops[1].

obj_desc->common_field.access_bit_width is zero, but even after the
loop. Division before the loop is apparently OK.

This is the case:
/* Mask off any extra bits in the last datum */

buffer_tail_bits = obj_desc->common_field.bit_length %
obj_desc->common_field.access_bit_width;

.L39:
xorl %edx, %edx #
movzbl 37(%rbx), %esi #
<variable>.common_field.access_bit_width,
<variable>.common_field.access_bit_width
movl 24(%rbx), %eax # <variable>.common_field.bit_length,
<variable>.common_field.bit_length
--------------- here:
divl %esi # <variable>.common_field.access_bit_width
movl %edx, %ecx #, tmp121
testl %edx, %edx # tmp121
je .L41 #,

[1] http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops.png


2009-03-15 17:54:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero



On Sun, 15 Mar 2009, Jiri Slaby wrote:
>
> sometimes, when booting up/resuming from disk, I get an oops[1].
>
> obj_desc->common_field.access_bit_width is zero, but even after the loop.
> Division before the loop is apparently OK.
>
> This is the case:
> /* Mask off any extra bits in the last datum */
>
> buffer_tail_bits = obj_desc->common_field.bit_length %
> obj_desc->common_field.access_bit_width;

Hmm. "bit_length" is zero too, according to the oops (it's in %eax).

Looks like the whole "obj_desc" is basically uninitialized. I have no idea
why, and why it is apparently not reliable. ACPI is odd.

Does everything work if you just do something like the appended?

It's insane, but so is ACPI.

Linus
---
drivers/acpi/acpica/exfldio.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpica/exfldio.c b/drivers/acpi/acpica/exfldio.c
index ef58ac4..9297adb 100644
--- a/drivers/acpi/acpica/exfldio.c
+++ b/drivers/acpi/acpica/exfldio.c
@@ -698,6 +698,9 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
}
ACPI_MEMSET(buffer, 0, buffer_length);

+ if (!obj_desc->common_field.bit_length)
+ return_ACPI_STATUS(AE_OK);
+
/* Compute the number of datums (access width data items) */

datum_count = ACPI_ROUND_UP_TO(obj_desc->common_field.bit_length,

2009-03-15 19:08:25

by Moore, Robert

[permalink] [raw]
Subject: RE: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero

Can you please send or post the acpidump for the machine? Thanks.

>-----Original Message-----
>From: [email protected] [mailto:linux-acpi-
>[email protected]] On Behalf Of Jiri Slaby
>Sent: Sunday, March 15, 2009 3:47 AM
>To: Len Brown
>Cc: [email protected]; [email protected]; Linux kernel mailing list;
>[email protected]
>Subject: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
>
>Hi,
>
>sometimes, when booting up/resuming from disk, I get an oops[1].
>
>obj_desc->common_field.access_bit_width is zero, but even after the
>loop. Division before the loop is apparently OK.
>
>This is the case:
> /* Mask off any extra bits in the last datum */
>
> buffer_tail_bits = obj_desc->common_field.bit_length %
> obj_desc->common_field.access_bit_width;
>
>.L39:
> xorl %edx, %edx #
> movzbl 37(%rbx), %esi #
><variable>.common_field.access_bit_width,
><variable>.common_field.access_bit_width
> movl 24(%rbx), %eax # <variable>.common_field.bit_length,
><variable>.common_field.bit_length
>--------------- here:
> divl %esi # <variable>.common_field.access_bit_width
> movl %edx, %ecx #, tmp121
> testl %edx, %edx # tmp121
> je .L41 #,
>
>[1] http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops.png
>--
>To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
>the body of a message to [email protected]
>More majordomo info at http://vger.kernel.org/majordomo-info.html

2009-03-15 19:39:15

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero

On 15.3.2009 20:08, Moore, Robert wrote:
> Can you please send or post the acpidump for the machine? Thanks.

Sure:
http://www.fi.muni.cz/~xslaby/sklad/adump

2009-03-16 03:51:33

by Lin Ming

[permalink] [raw]
Subject: Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero


> Hi,
>
> sometimes, when booting up/resuming from disk, I get an oops[1].
>
> obj_desc->common_field.access_bit_width is zero, but even after the
> loop. Division before the loop is apparently OK.

Would please try below debug patch to see which region filed is
accessed?

diff --git a/drivers/acpi/acpica/exfldio.c b/drivers/acpi/acpica/exfldio.c
index ef58ac4..10ac0c8 100644
--- a/drivers/acpi/acpica/exfldio.c
+++ b/drivers/acpi/acpica/exfldio.c
@@ -683,6 +683,7 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
u32 datum_count;
u32 field_datum_count;
u32 i;
+ struct acpi_buffer name_buffer = { ACPI_ALLOCATE_BUFFER, NULL };

ACPI_FUNCTION_TRACE(ex_extract_from_field);

@@ -765,6 +766,20 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
raw_datum >> obj_desc->common_field.start_field_bit_offset;
}

+ if (obj_desc->common_field.bit_length == 0 ||
+ obj_desc->common_field.access_bit_width == 0) {
+ status = acpi_get_name(obj_desc->common_field.node,
+ ACPI_FULL_PATHNAME, &name_buffer);
+ if (ACPI_FAILURE(status)) {
+ printk(KERN_DEBUG "ACPI Debug: %s\n",
+ acpi_format_exception(status));
+ } else {
+ printk(KERN_DEBUG "ACPI Debug: field node path: %s\n",
+ (char *) name_buffer.pointer);
+ kfree(name_buffer.pointer);
+ }
+ }
+
/* Mask off any extra bits in the last datum */

buffer_tail_bits = obj_desc->common_field.bit_length %

---
Lin Ming

>
> This is the case:
> /* Mask off any extra bits in the last datum */
>
> buffer_tail_bits = obj_desc->common_field.bit_length %
> obj_desc->common_field.access_bit_width;
>
> .L39:
> xorl %edx, %edx #
> movzbl 37(%rbx), %esi #
> <variable>.common_field.access_bit_width,
> <variable>.common_field.access_bit_width
> movl 24(%rbx), %eax # <variable>.common_field.bit_length,
> <variable>.common_field.bit_length
> --------------- here:
> divl %esi # <variable>.common_field.access_bit_width
> movl %edx, %ecx #, tmp121
> testl %edx, %edx # tmp121
> je .L41 #,
>
> [1] http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops.png
> --

2009-03-16 16:31:50

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero

On 16.3.2009 04:42, Lin Ming wrote:
>> sometimes, when booting up/resuming from disk, I get an oops[1].
>>
>> obj_desc->common_field.access_bit_width is zero, but even after the
>> loop. Division before the loop is apparently OK.
>
> Would please try below debug patch to see which region filed is
> accessed?

Yes, except the fact, we wouldn't see anything :) -- I'll add inside the
if a return statement or something to disallow the oops to flood screen.

Also, it doesn't happen regularly. Mostly several first tries after cold
start. I hope this is a regression and not a HW problem, but it emerged
after switching to 2.6.29-*, I will keep you informed.

Thanks.

2009-03-18 08:09:15

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero

On 16.3.2009 17:31, Jiri Slaby wrote:
> On 16.3.2009 04:42, Lin Ming wrote:
>>> sometimes, when booting up/resuming from disk, I get an oops[1].
>>>
>>> obj_desc->common_field.access_bit_width is zero, but even after the
>>> loop. Division before the loop is apparently OK.
>>
>> Would please try below debug patch to see which region filed is
>> accessed?
>
> Yes, except the fact, we wouldn't see anything :) -- I'll add inside the
> if a return statement or something to disallow the oops to flood screen.
>
> Also, it doesn't happen regularly. Mostly several first tries after cold
> start. I hope this is a regression and not a HW problem, but it emerged
> after switching to 2.6.29-*, I will keep you informed.

Hmm, I didn't see it with rc8 and this patch applied so far (no 'ACPI
Debug' in dmesg). Also I don't see anything changed within drivers/acpi/
et al. between rc7 and rc8. Maybe the timing changed somewhere else...
I'm confused, but keep trying.

2009-03-19 16:06:57

by Jiri Slaby

[permalink] [raw]
Subject: Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero

On 18.3.2009 09:08, Jiri Slaby wrote:
> On 16.3.2009 17:31, Jiri Slaby wrote:
>> On 16.3.2009 04:42, Lin Ming wrote:
>>>> sometimes, when booting up/resuming from disk, I get an oops[1].
>>>>
>>>> obj_desc->common_field.access_bit_width is zero, but even after the
>>>> loop. Division before the loop is apparently OK.
>>>
>>> Would please try below debug patch to see which region filed is
>>> accessed?
>
> I'm confused, but keep trying.

Got it again. obj_desc->common_field.node is martian too (0x4000000), so
the added acpi_get_name dies:
http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops1.png

Whole common_field seems to be mangled. Ideas?

2009-03-20 00:59:00

by Lin Ming

[permalink] [raw]
Subject: Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero

On Fri, 2009-03-20 at 00:06 +0800, Jiri Slaby wrote:
> On 18.3.2009 09:08, Jiri Slaby wrote:
> > On 16.3.2009 17:31, Jiri Slaby wrote:
> >> On 16.3.2009 04:42, Lin Ming wrote:
> >>>> sometimes, when booting up/resuming from disk, I get an oops[1].
> >>>>
> >>>> obj_desc->common_field.access_bit_width is zero, but even after the
> >>>> loop. Division before the loop is apparently OK.
> >>>
> >>> Would please try below debug patch to see which region filed is
> >>> accessed?
> >
> > I'm confused, but keep trying.
>
> Got it again. obj_desc->common_field.node is martian too (0x4000000), so
> the added acpi_get_name dies:
> http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops1.png
>
> Whole common_field seems to be mangled. Ideas?

Thanks for the info. We are looking at this.

Lin Ming

2009-03-20 05:03:58

by Lin Ming

[permalink] [raw]
Subject: Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero

On Fri, 2009-03-20 at 00:06 +0800, Jiri Slaby wrote:
> On 18.3.2009 09:08, Jiri Slaby wrote:
> > On 16.3.2009 17:31, Jiri Slaby wrote:
> >> On 16.3.2009 04:42, Lin Ming wrote:
> >>>> sometimes, when booting up/resuming from disk, I get an oops[1].
> >>>>
> >>>> obj_desc->common_field.access_bit_width is zero, but even after the
> >>>> loop. Division before the loop is apparently OK.
> >>>
> >>> Would please try below debug patch to see which region filed is
> >>> accessed?
> >
> > I'm confused, but keep trying.
>
> Got it again. obj_desc->common_field.node is martian too (0x4000000), so
> the added acpi_get_name dies:
> http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops1.png
>
> Whole common_field seems to be mangled. Ideas?

I should save the field object and node path name at the entry of
acpi_ex_extract_from_field.
Would you please try below debug patch?
I also add a return statement before the fault to disallow the oops to
flood screen.

Thanks.

diff --git a/drivers/acpi/acpica/exfldio.c b/drivers/acpi/acpica/exfldio.c
index ef58ac4..5109f5b 100644
--- a/drivers/acpi/acpica/exfldio.c
+++ b/drivers/acpi/acpica/exfldio.c
@@ -670,6 +670,34 @@ acpi_ex_write_with_update_rule(union acpi_operand_object *obj_desc,
*
******************************************************************************/

+void debug_dump_field(union acpi_operand_object *obj_desc)
+{
+ struct acpi_object_field_common *obj_field;
+
+ if (!obj_desc) {
+ printk(KERN_DEBUG "ACPI Debug: NULL field object\n");
+ return;
+ }
+
+ printk(KERN_DEBUG "next_object: %p\n", obj_desc->common.next_object);
+ printk(KERN_DEBUG "descriptor_type: %x\n", obj_desc->common.descriptor_type);
+ printk(KERN_DEBUG "type: %x\n", obj_desc->common.type);
+ printk(KERN_DEBUG "reference_count: %x\n", obj_desc->common.reference_count);
+ printk(KERN_DEBUG "flags: %x\n", obj_desc->common.flags);
+
+ obj_field = &obj_desc->common_field;
+
+ printk(KERN_DEBUG "field_flags: %x\n", obj_field->field_flags);
+ printk(KERN_DEBUG "attribute: %x\n", obj_field->attribute);
+ printk(KERN_DEBUG "access_byte_width: %x\n", obj_field->access_byte_width);
+ printk(KERN_DEBUG "node: %p\n", obj_field->node);
+ printk(KERN_DEBUG "bit_length: %x\n", obj_field->bit_length);
+ printk(KERN_DEBUG "base_byte_offset: %x\n", obj_field->base_byte_offset);
+ printk(KERN_DEBUG "value: %x\n", obj_field->value);
+ printk(KERN_DEBUG "start_field_bit_offset: %x\n", obj_field->start_field_bit_offset);
+ printk(KERN_DEBUG "access_bit_width: %x\n", obj_field->access_bit_width);
+}
+
acpi_status
acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
void *buffer, u32 buffer_length)
@@ -683,9 +711,24 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
u32 datum_count;
u32 field_datum_count;
u32 i;
+ struct acpi_buffer name_buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+ union acpi_operand_object backup_obj;

ACPI_FUNCTION_TRACE(ex_extract_from_field);

+ /* backup the object */
+ memcpy(&backup_obj, obj_desc, sizeof(*obj_desc));
+
+ name_buffer.pointer = NULL;
+ if (obj_desc->common_field.node) {
+ status = acpi_get_name(obj_desc->common_field.node,
+ ACPI_FULL_PATHNAME, &name_buffer);
+ if (ACPI_FAILURE(status)) {
+ printk(KERN_DEBUG "ACPI Debug: %s\n",
+ acpi_format_exception(status));
+ }
+ }
+
/* Validate target buffer and clear it */

if (buffer_length <
@@ -765,6 +808,29 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
raw_datum >> obj_desc->common_field.start_field_bit_offset;
}

+ if (obj_desc->common_field.bit_length == 0 ||
+ obj_desc->common_field.access_bit_width == 0) {
+
+ /* oops, dump the object */
+ if (name_buffer.pointer) {
+ printk(KERN_DEBUG "ACPI Debug: field node path: %s\n",
+ (char *) name_buffer.pointer);
+ }
+
+ printk(KERN_DEBUG "ACPI Debug: The original object:\n");
+ debug_dump_field(&backup_obj);
+
+ printk(KERN_DEBUG "ACPI Debug: The bad object:\n");
+ debug_dump_field(obj_desc);
+
+ kfree(name_buffer.pointer);
+
+ /* we return here to disallow the oops to flood screen */
+ return_ACPI_STATUS(AE_ERROR);
+ }
+
+ kfree(name_buffer.pointer);
+
/* Mask off any extra bits in the last datum */

buffer_tail_bits = obj_desc->common_field.bit_length %