2019-01-22 16:12:51

by Ross Lagerwall

Subject: [PATCH 0/2] Fix crash in cper_estatus_check()

I recently encountered a crash in cper_estatus_check() when called by
bert_init(). Patches follow to fix the problem. Note that I cannot fully
test the patches since the hardware error record on that machine has
been cleared.

The crash log:

[ 125.666350] BUG: unable to handle kernel paging request at ffffc9004046d02c
[ 125.666503] PGD 1f6dce067 P4D 1f6dce067 PUD 1e6532067 PMD 1e3d11067 PTE 0
[ 125.666696] Oops: 0000 [#1] SMP KASAN NOPTI
[ 125.666837] CPU: 7 PID: 1 Comm: swapper/0 Not tainted 4.19.0+0 #1
[ 125.666983] Hardware name: Dell Inc. PowerEdge M520/0DW6GX, BIOS 1.8.6 08/30/2013
[ 125.667171] RIP: e030:cper_estatus_check+0x7e/0xf0
[ 125.667315] Code: 41 29 c5 48 98 48 01 c3 48 89 d8 4c 29 e0 48 39 e8 7d 4a 48 8d 7b 18 be 04 00 00 00 e8 bb 6f 9f ff 48 8d 7b 14 be 02 00 00 00 <44> 8b 73 18 e8 a9 6f 9f ff 0f b6 4b 15 44 89 ee 66 83 f9 03 19 d2
[ 125.667554] RSP: e02b:ffff8881e65efce0 EFLAGS: 00010246
[ 125.667699] RAX: fffff5200808da06 RBX: ffffc9004046d014 RCX: ffffffff8192bf25
[ 125.667849] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffc9004046d028
[ 125.668009] RBP: 0000000000000700 R08: fffff5200808da06 R09: fffff5200808da06
[ 125.668207] R10: 0000000000000001 R11: fffff5200808da05 R12: ffffc9004046cc14
[ 125.668358] R13: 0000000000000300 R14: 00000000000000c0 R15: ffffc9004046cc00
[ 125.668519] FS: 0000000000000000(0000) GS:ffff8881e77c0000(0000) knlGS:0000000000000000
[ 125.668698] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 125.668844] CR2: ffffc9004046d02c CR3: 000000000260c000 CR4: 0000000000042660
[ 125.668999] Call Trace:
[ 125.669139] bert_init+0x21c/0x362
[ 125.669279] ? setup_bert_disable+0x12/0x12
[ 125.669420] ? pci_get_dev_by_id+0x57/0x70
[ 125.669560] ? pci_get_device+0x86/0xc0
[ 125.669738] ? pci_create_sysfs_dev_files+0x1a6/0x330
[ 125.669883] ? setup_bert_disable+0x12/0x12
[ 125.670026] ? set_debug_rodata+0x11/0x11
[ 125.670166] ? do_one_initcall+0x8b/0x253
[ 125.670306] do_one_initcall+0x8b/0x253
[ 125.670447] ? perf_trace_initcall_level+0x250/0x250
[ 125.670592] ? __wake_up_common+0x140/0x1d0
[ 125.670736] ? kasan_unpoison_shadow+0x30/0x40
[ 125.670879] ? kasan_unpoison_shadow+0x30/0x40
[ 125.671023] ? set_debug_rodata+0x11/0x11
[ 125.671164] kernel_init_freeable+0x269/0x304
[ 125.671346] ? rest_init+0xc0/0xc0
[ 125.671485] kernel_init+0xf/0x130
[ 125.671623] ? rest_init+0xc0/0xc0
[ 125.671761] ? rest_init+0xc0/0xc0
[ 125.671901] ret_from_fork+0x35/0x40
[ 125.672063] Modules linked in:
[ 125.672201] CR2: ffffc9004046d02c
[ 125.672349] ---[ end trace a17cd87742b2c49e ]---
[ 125.683693] RIP: e030:cper_estatus_check+0x7e/0xf0
[ 125.683840] Code: 41 29 c5 48 98 48 01 c3 48 89 d8 4c 29 e0 48 39 e8 7d 4a 48 8d 7b 18 be 04 00 00 00 e8 bb 6f 9f ff 48 8d 7b 14 be 02 00 00 00 <44> 8b 73 18 e8 a9 6f 9f ff 0f b6 4b 15 44 89 ee 66 83 f9 03 19 d2
[ 125.684103] RSP: e02b:ffff8881e65efce0 EFLAGS: 00010246
[ 125.684247] RAX: fffff5200808da06 RBX: ffffc9004046d014 RCX: ffffffff8192bf25
[ 125.684397] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffffc9004046d028
[ 125.684548] RBP: 0000000000000700 R08: fffff5200808da06 R09: fffff5200808da06
[ 125.684699] R10: 0000000000000001 R11: fffff5200808da05 R12: ffffc9004046cc14
[ 125.684850] R13: 0000000000000300 R14: 00000000000000c0 R15: ffffc9004046cc00
[ 125.685009] FS: 0000000000000000(0000) GS:ffff8881e77c0000(0000) knlGS:0000000000000000
[ 125.685224] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 125.685371] CR2: ffffc9004046d02c CR3: 000000000260c000 CR4: 0000000000042660
[ 125.685566] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

Thanks,

Ross Lagerwall (2):
acpi/apei: Avoid possible OOB when accessing BERT region
efi/cper: Avoid possible OOB when checking generic data block

drivers/acpi/apei/bert.c | 23 ++++++++++-------------
drivers/firmware/efi/cper.c | 10 ++++++----
2 files changed, 16 insertions(+), 17 deletions(-)

--
2.17.2



2019-01-22 16:11:42

by Ross Lagerwall

Subject: [PATCH 1/2] acpi/apei: Avoid possible OOB when accessing BERT region

Check that the length recorded in the generic error status block is
within the region before checking the contents of the region itself.
Otherwise it may result in an OOB access if the system firmware has
generated a status block with an invalid length (larger than the mapped
region). Also move the block_status check so that it only happens after
the block has been verified to be within the mapped region.

Signed-off-by: Ross Lagerwall <[email protected]>
---
drivers/acpi/apei/bert.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/acpi/apei/bert.c b/drivers/acpi/apei/bert.c
index 12771fcf0417..0d948d0a41af 100644
--- a/drivers/acpi/apei/bert.c
+++ b/drivers/acpi/apei/bert.c
@@ -42,15 +42,7 @@ static void __init bert_print_all(struct acpi_bert_region *region,
 	int remain = region_len;
 	u32 estatus_len;

-	if (!estatus->block_status)
-		return;
-
-	while (remain > sizeof(struct acpi_bert_region)) {
-		if (cper_estatus_check(estatus)) {
-			pr_err(FW_BUG "Invalid error record.\n");
-			return;
-		}
-
+	while (remain >= sizeof(struct acpi_bert_region)) {
 		estatus_len = cper_estatus_len(estatus);
 		if (remain < estatus_len) {
 			pr_err(FW_BUG "Truncated status block (length: %u).\n",
@@ -58,6 +50,15 @@ static void __init bert_print_all(struct acpi_bert_region *region,
 			return;
 		}

+		/* No more error records. */
+		if (!estatus->block_status)
+			return;
+
+		if (cper_estatus_check(estatus)) {
+			pr_err(FW_BUG "Invalid error record.\n");
+			return;
+		}
+
 		pr_info_once("Error records from previous boot:\n");

 		cper_estatus_print(KERN_INFO HW_ERR, estatus);
@@ -70,10 +71,6 @@ static void __init bert_print_all(struct acpi_bert_region *region,
 		estatus->block_status = 0;

 		estatus = (void *)estatus + estatus_len;
-		/* No more error records. */
-		if (!estatus->block_status)
-			return;
-
 		remain -= estatus_len;
 	}
 }
--
2.17.2
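
For illustration only (not part of the patch): a minimal user-space sketch of the ordering this change establishes, namely validating the recorded length against the remaining region before looking at the block's contents. struct status_block and count_valid_records() are hypothetical names, and the block length is simplified to header plus data_length, whereas the kernel obtains it from cper_estatus_len(), which may account for more.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the generic error status header. */
struct status_block {
	uint32_t block_status;	/* zero means "no more records" */
	uint32_t data_length;	/* payload bytes following the header */
};

/*
 * Walk a region of region_len bytes and count the records that lie
 * entirely inside it. Returns -1 if a block claims to extend past the
 * end of the region.
 */
static int count_valid_records(const unsigned char *region, size_t region_len)
{
	size_t remain = region_len;
	int count = 0;

	while (remain >= sizeof(struct status_block)) {
		struct status_block hdr;
		size_t block_len;

		/* The header itself is known to be in bounds here. */
		memcpy(&hdr, region, sizeof(hdr));

		/* 1. Length check first, written so it cannot wrap. */
		if (hdr.data_length > remain - sizeof(hdr))
			return -1;
		block_len = sizeof(hdr) + hdr.data_length;

		/* 2. Only now inspect the contents of the block. */
		if (!hdr.block_status)
			break;

		count++;
		region += block_len;
		remain -= block_len;
	}

	return count;
}
```

In this model, a header claiming 100 payload bytes inside a 32-byte region is rejected before any of its payload is touched.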


2019-01-22 16:12:08

by Ross Lagerwall

Subject: [PATCH 2/2] efi/cper: Avoid possible OOB when checking generic data block

When checking a generic status block, we iterate over all the generic
data blocks. The loop condition only checks that the start of the
generic data block is valid (within estatus->data_length) but not the
whole block. Because the size of data blocks (excluding error data) may
vary depending on the revision and the revision is contained within the
data block, ensure that enough of the current data block is valid before
dereferencing any members otherwise an OOB access may occur if
estatus->data_length is invalid. This relies on the fact that
struct acpi_hest_generic_data_v300 is a superset of the earlier version.
Also rework the other checks to avoid potential underflow.

Signed-off-by: Ross Lagerwall <[email protected]>
---
drivers/firmware/efi/cper.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
index a7902fccdcfa..7cc18874b9d0 100644
--- a/drivers/firmware/efi/cper.c
+++ b/drivers/firmware/efi/cper.c
@@ -546,7 +546,7 @@ EXPORT_SYMBOL_GPL(cper_estatus_check_header);
 int cper_estatus_check(const struct acpi_hest_generic_status *estatus)
 {
 	struct acpi_hest_generic_data *gdata;
-	unsigned int data_len, gedata_len;
+	unsigned int data_len, record_len;
 	int rc;

 	rc = cper_estatus_check_header(estatus);
@@ -555,10 +555,12 @@ int cper_estatus_check(const struct acpi_hest_generic_status *estatus)
 	data_len = estatus->data_length;

 	apei_estatus_for_each_section(estatus, gdata) {
-		gedata_len = acpi_hest_get_error_length(gdata);
-		if (gedata_len > data_len - acpi_hest_get_size(gdata))
+		if (sizeof(struct acpi_hest_generic_data) > data_len)
 			return -EINVAL;
-		data_len -= acpi_hest_get_record_size(gdata);
+		record_len = acpi_hest_get_record_size(gdata);
+		if (record_len > data_len)
+			return -EINVAL;
+		data_len -= record_len;
 	}
 	if (data_len)
 		return -EINVAL;
--
2.17.2
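
For illustration only (not part of the patch): a small user-space sketch of the underflow that the reworked checks avoid. With unsigned arithmetic, subtracting a header size from a smaller remaining length wraps to a very large value, so a comparison against the difference can accept a record that does not fit. fits_wrapping() and fits_safe() are hypothetical names, and the sizes used with them are arbitrary.

```c
#include <stdbool.h>

/*
 * Shape of the old check: "payload_len > remaining - hdr_size".
 * When remaining < hdr_size the subtraction wraps around, and the
 * record is wrongly reported as fitting.
 */
static bool fits_wrapping(unsigned int remaining, unsigned int hdr_size,
			  unsigned int payload_len)
{
	return !(payload_len > remaining - hdr_size);
}

/*
 * Reworked shape: make sure the header fits before subtracting, so the
 * difference can never wrap.
 */
static bool fits_safe(unsigned int remaining, unsigned int hdr_size,
		      unsigned int payload_len)
{
	if (hdr_size > remaining)
		return false;

	return payload_len <= remaining - hdr_size;
}
```

With 16 bytes remaining, a 72-byte header and an 8-byte payload, fits_wrapping() accepts the record while fits_safe() rejects it.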


2019-01-23 11:55:35

by Borislav Petkov

Subject: Re: [PATCH 2/2] efi/cper: Avoid possible OOB when checking generic data block

On Tue, Jan 22, 2019 at 04:09:12PM +0000, Ross Lagerwall wrote:
> When checking a generic status block, we iterate over all the generic
> data blocks. The loop condition only checks that the start of the
> generic data block is valid (within estatus->data_length) but not the
> whole block. Because the size of data blocks (excluding error data) may
> vary depending on the revision and the revision is contained within the
> data block, ensure that enough of the current data block is valid before
> dereferencing any members otherwise an OOB access may occur if

Please write out the OOB abbreviation in your commit messages.

> estatus->data_length is invalid. This relies on the fact that
> struct acpi_hest_generic_data_v300 is a superset of the earlier version.
> Also rework the other checks to avoid potential underflow.
>
> Signed-off-by: Ross Lagerwall <[email protected]>
> ---
> drivers/firmware/efi/cper.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c
> index a7902fccdcfa..7cc18874b9d0 100644
> --- a/drivers/firmware/efi/cper.c
> +++ b/drivers/firmware/efi/cper.c
> @@ -546,7 +546,7 @@ EXPORT_SYMBOL_GPL(cper_estatus_check_header);
>  int cper_estatus_check(const struct acpi_hest_generic_status *estatus)
>  {
>  	struct acpi_hest_generic_data *gdata;
> -	unsigned int data_len, gedata_len;
> +	unsigned int data_len, record_len;
>  	int rc;
>
>  	rc = cper_estatus_check_header(estatus);
> @@ -555,10 +555,12 @@ int cper_estatus_check(const struct acpi_hest_generic_status *estatus)
>  	data_len = estatus->data_length;
>
>  	apei_estatus_for_each_section(estatus, gdata) {
> -		gedata_len = acpi_hest_get_error_length(gdata);
> -		if (gedata_len > data_len - acpi_hest_get_size(gdata))
> +		if (sizeof(struct acpi_hest_generic_data) > data_len)
>  			return -EINVAL;

<---- newline here.

Also, add a new line before the data_len assignment above, in the function.

> -		data_len -= acpi_hest_get_record_size(gdata);
> +		record_len = acpi_hest_get_record_size(gdata);

record_size, so that it matches the name of the function used to
compute it.

Btw, trying to grok this code is making my head spin.

> +		if (record_len > data_len)
> +			return -EINVAL;

<---- newline here.

Btw, those checks in the loop you can abstract away into a separate
function so that you end up with something more readable like:

	apei_estatus_for_each_section(estatus, gdata) {
		record_size = check_hest_record_size(gdata, data_len);
		if (!record_size)
			return -EINVAL;

		data_len -= record_size;
	}

for example.

> +		data_len -= record_len;
>  	}
>  	if (data_len)
>  		return -EINVAL;
> --

Thx.

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
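
For illustration only (not from the thread): a user-space sketch of what a helper along the lines suggested above might look like, returning the full record size, or 0 if the record does not fit in the remaining data. The structures and check_record_size() are hypothetical, simplified stand-ins for the ACPI definitions; the assumption that a major revision of 3 or higher selects the larger header is part of this model and should be checked against the real headers.

```c
#include <stdint.h>

/* Hypothetical base header; the revision field sits in the common part. */
struct gdata_hdr {
	uint16_t revision;		/* major version in the high byte */
	uint16_t reserved;
	uint32_t error_data_length;	/* payload bytes after the header */
};

/* Hypothetical newer layout: a superset of the base header. */
struct gdata_hdr_v300 {
	struct gdata_hdr base;
	uint64_t time_stamp;
};

/*
 * Return the total size of the record (header plus payload), or 0 if it
 * does not fit within data_len bytes.
 */
static unsigned int check_record_size(const struct gdata_hdr *g,
				      unsigned int data_len)
{
	unsigned int hdr_size;

	/* The common part must fit before the revision can be read. */
	if (sizeof(struct gdata_hdr) > data_len)
		return 0;

	if ((g->revision >> 8) >= 3)
		hdr_size = (unsigned int)sizeof(struct gdata_hdr_v300);
	else
		hdr_size = (unsigned int)sizeof(struct gdata_hdr);

	/* The revision-specific header must fit as well. */
	if (hdr_size > data_len)
		return 0;

	/* Compare against the difference; it cannot wrap at this point. */
	if (g->error_data_length > data_len - hdr_size)
		return 0;

	return hdr_size + g->error_data_length;
}
```

Because a valid record always has a non-zero header size, 0 is unambiguous as the failure value in this sketch, which is what lets the caller's loop body shrink to a single check.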

2019-01-28 10:07:24

by Ross Lagerwall

Subject: Re: [PATCH 2/2] efi/cper: Avoid possible OOB when checking generic data block

On 1/23/19 11:54 AM, Borislav Petkov wrote:
> On Tue, Jan 22, 2019 at 04:09:12PM +0000, Ross Lagerwall wrote:
>> When checking a generic status block, we iterate over all the generic
>> data blocks. The loop condition only checks that the start of the
>> generic data block is valid (within estatus->data_length) but not the
>> whole block. Because the size of data blocks (excluding error data) may
>> vary depending on the revision and the revision is contained within the
>> data block, ensure that enough of the current data block is valid before
>> dereferencing any members otherwise an OOB access may occur if
snip
>> -		data_len -= acpi_hest_get_record_size(gdata);
>> +		record_len = acpi_hest_get_record_size(gdata);
>
> record_size, so that it matches the name of the function used to
> compute it.
>
> Btw, trying to grok this code is making my head spin.
>
>> +		if (record_len > data_len)
>> +			return -EINVAL;
>
> <---- newline here.
>
> Btw, those checks in the loop you can abstract away into a separate
> function so that you end up with something more readable like:
>
> 	apei_estatus_for_each_section(estatus, gdata) {
> 		record_size = check_hest_record_size(gdata, data_len);
> 		if (!record_size)
> 			return -EINVAL;
>
> 		data_len -= record_size;
> 	}
>
> for example.
>

There are only two if statements in the loop body -- I don't think it is
necessary to abstract this into a separate function (which still
requires having one if statement in the loop body).

I've made the other changes you suggested and sent a V2.

Thanks,
--
Ross Lagerwall