Subject: Re: [PATCH] block: partitions: efi: Always check for alternative GPT at end of drive
To: Ard Biesheuvel, "Elliott, Robert (Persistent Memory)"
References: <1461632806-5946-1-git-send-email-jwerner@chromium.org> <20160426102014.o7k77uzi32h73y3b@ws.net.home> <20160426183353.GB16601@linux-uzut.site> <94D0CD8314A33A4D9D801C0FE68B402963904365@G4W3296.americas.hpqcorp.net>
Cc: Davidlohr Bueso, Karel Zak, Julius Werner, "linux-efi@vger.kernel.org", "linux-kernel@vger.kernel.org", "linux-block@vger.kernel.org", Gwendal Grignou, Doug Anderson
From: "Austin S. Hemmelgarn"
Message-ID: <5720B7CA.1050002@gmail.com>
Date: Wed, 27 Apr 2016 08:59:54 -0400
X-Mailing-List: linux-kernel@vger.kernel.org

On 2016-04-27 02:00, Ard Biesheuvel wrote:
> On 26 April 2016 at 22:34, Elliott, Robert (Persistent Memory)
> wrote:
>>
>>
>>> -----Original Message-----
>>> From: linux-kernel-owner@vger.kernel.org
>>> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Davidlohr Bueso
>>> Sent: Tuesday, April 26, 2016 1:34 PM
>>> To: Karel Zak
>>> Cc: Julius Werner ; linux-efi@vger.kernel.org;
>>> linux-kernel@vger.kernel.org; linux-block@vger.kernel.org; Gwendal
>>> Grignou ; Doug Anderson
>>> Subject: Re: [PATCH] block: partitions: efi: Always check for
>>> alternative GPT at end of drive
>>>
>>> On Tue, 26 Apr 2016, Karel Zak wrote:
>>>
>>>> On Mon, Apr 25, 2016 at 06:06:46PM -0700, Julius Werner wrote:
>>>>> The GUID Partition Table layout maintains two synonymous partition
>>>>> tables on a block device, one starting in sector 1 and one in the
>>>>> very last sectors of the block device. This is useful if one of
>>>>> the tables gets accidentally corrupted (e.g. through a partial
>>>>> write because of an unexpected power loss).
>>>>>
>>>>> Linux normally only boots if the primary GPT is valid. It will not
>>>>> even try to find the alternative GPT to an invalid primary one
>>>>> unless the "gpt" command line option forces more aggressive
>>>>> detection. This doesn't really make any sense... if the "gpt"
>>>>> option is not set, the code validates the protective or hybrid MBR
>>>>> in sector 0 anyway before it even starts looking for the actual
>>>>> GPTs.
>>>>> If we get to the point where a valid protective or hybrid MBR
>>>>> was found but the primary GPT was not found (valid), checking the
>>>>> alternative GPT is our best bet: we know that this
>>>
>>> 'best bet' in a kernel is not enough :) Which is why userland tools
>>> can fix and/or do any sort of crazy stuff with the backup and recover
>>> the primary etc etc.
>>
>> Drive blocks go bad; the redundant GPTs are there to let the
>> system keep booting and running if that happens.
>>
>> Rewriting the bad GPTs is what should require user intervention.
>>
>>>
>>>>> block device is meant to use GPT (because any other partitioning
>>>>> system would've presumably overwritten sector 0), and we know that
>>>>> if the alternative GPT is valid it should contain more accurate
>>>>> information than parsing the protective/hybrid MBR with
>>>>> msdos_partition() would yield (which would otherwise be what
>>>>> happens next).
>>>
>>>> I guess "force_gpt" (and "gpt" on kernel command line) exists to
>>>> force users to think and care about a reason why the device has
>>>> unreadable (broken) primary GPT header.
>>>
>>> Yes, from find_valid_gpt():
>>>
>>>  * If the Primary GPT header is not valid, the Alternate GPT header
>>>  * is not checked unless the 'gpt' kernel command line option is passed.
>>>  * This protects against devices which misreport their size, and forces
>>>  * the user to decide to use the Alternate GPT.
>>>
>>> ... so users are at least forced in some way to think about this.
>>>
>>>> It seems like bad (and dangerous) idea to silently ignore corrupted
>>>> primary GPT header and boot from such device.
>>>
>>> Yeah, there's no way in hell I trust a backup gpt in kernel space.
>>> We simply have no way of distinguishing between good and bad devices.
>>>
>>>> And note that alternative GPT header at the end of the device is
>>>> just a guess.
>>>> The proper location of the alternative header is specified within
>>>> primary header (pgpt->alternate_lba). The header at the end of
>>>> the device (as used for "force_gpt") is a fallback solution only.
>>>
>>> And this only illustrates the ambiguity of the backup.
>>
>> The UEFI specification is not ambiguous - you should always look
>> for the backup GPT Header at the last LBA:
>>
>> "Two GPT Header structures are stored on the device: the primary
>> and the backup. The primary GPT Header must be located in LBA 1
>> (i.e., the second logical block), and the backup GPT Header must
>> be located in the last LBA of the device."
>>
>> If the primary GPT Header is corrupted (e.g., CRC is bad), you
>> cannot trust any fields in it, including the Alternate LBA field.
>> The Alternate LBA field is there to help you tolerate failures
>> while growing or shrinking the block device size (not important
>> for individual physical drives, but an issue for logical drives
>> presented by RAID controllers).
>>
>
> What the UEFI spec stipulates is not really relevant for the kernel.
> So the firmware must use the backup GPT if the CRC of the primary one
> indicates that it is corrupted, fine. Once we are in the kernel, the
> policy is currently different, which makes sense since we are not only
> mounting the boot device, but other block devices as well.
>
No, it is relevant, considering that it's the authoritative standard for
the GPT format. Sure, we have to deal with other block devices. The fact
is, though, that we currently refuse to do anything with a disk that has
a corrupted primary GPT but a valid secondary. I agree that the user
needs to be notified somehow that something is wrong, but refusing to
work is not user-friendly behavior, and doesn't really give much
specific information about what's wrong (keep in mind, most typical
desktop users won't look at kernel logs, and a lot of people using
embedded devices can't).
For what it's worth, Windows 7 and newer will properly read partitions on a disk with a corrupt primary GPT and a valid secondary, and I'd be willing to bet that OS X does so as well.
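
To make the behavior being argued for concrete, here is a rough Python
sketch (not the kernel's code, just an illustration) of the fallback
order: validate the primary header at LBA 1 and, if it fails, ignore its
Alternate LBA field, look at the last LBA of the device, and warn. The
names SECTOR, header_is_valid(), pick_gpt_header() and make_header() are
made up for this example. It assumes 512-byte logical blocks and a disk
image held in memory, and it only checks the header signature, header
CRC32 and MyLBA; a real implementation would also have to validate the
partition entry array CRC and the usable LBA range.

```python
import struct
import zlib

SECTOR = 512                  # assumed logical block size
GPT_SIGNATURE = b"EFI PART"   # signature at offset 0 of a GPT header
MIN_HEADER_SIZE = 92          # size of the defined GPT header fields


def header_is_valid(block: bytes, expected_lba: int) -> bool:
    """Check signature, header CRC32 and MyLBA of one candidate header."""
    if len(block) < MIN_HEADER_SIZE or block[:8] != GPT_SIGNATURE:
        return False
    header_size, stored_crc = struct.unpack_from("<II", block, 12)
    if not MIN_HEADER_SIZE <= header_size <= len(block):
        return False
    # The header CRC is computed with the CRC field itself set to zero.
    zeroed = block[:16] + b"\x00" * 4 + block[20:header_size]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != stored_crc:
        return False
    (my_lba,) = struct.unpack_from("<Q", block, 24)
    return my_lba == expected_lba


def pick_gpt_header(disk):
    """Return ("primary", 1), ("backup", last_lba) or (None, None)."""
    last_lba = len(disk) // SECTOR - 1
    if last_lba < 1:
        return None, None

    def read(lba):
        return bytes(disk[lba * SECTOR:(lba + 1) * SECTOR])

    if header_is_valid(read(1), 1):
        return "primary", 1
    # Primary is bad, so none of its fields (including Alternate LBA)
    # are trusted; fall back to the fixed location at the last LBA.
    if header_is_valid(read(last_lba), last_lba):
        print("warning: primary GPT header invalid, using backup at LBA",
              last_lba)
        return "backup", last_lba
    return None, None


def make_header(my_lba: int, alternate_lba: int) -> bytes:
    """Hypothetical helper: build a minimal header with a correct CRC."""
    fields = struct.pack("<8sIIIIQQ", GPT_SIGNATURE, 0x00010000,
                         MIN_HEADER_SIZE, 0, 0, my_lba, alternate_lba)
    header = fields.ljust(MIN_HEADER_SIZE, b"\x00")
    crc = zlib.crc32(header) & 0xFFFFFFFF
    return header[:16] + struct.pack("<I", crc) + header[20:]
```

With a 64-sector image carrying headers at LBA 1 and LBA 63, flipping a
byte in the primary header makes pick_gpt_header() fall back to the
backup and print the warning, rather than rejecting the disk outright.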