Subject: Re: [BISECTED] v4.15-rc: Boot regression on x86_64/AMD
From: Christian König
To: Linus Torvalds
Cc: Bjorn Helgaas, Aaro Koskinen, Andy Shevchenko, Linux Kernel Mailing List, linux-pci@vger.kernel.org, Boris Ostrovsky, Juergen Gross
Date: Tue, 9 Jan 2018 20:31:14 +0100
Message-ID: <5d794f18-f54c-3ca6-3869-5f5e2825b1ad@amd.com>
References: <20180105220412.fzpwqe4zljdawr36@darkstar.musicnaut.iki.fi> <628e2b58-b16b-5792-b4ef-88bec15ab779@amd.com>

On 09.01.2018 at 20:18, Linus Torvalds wrote:
> On Tue, Jan 9, 2018 at 2:37 AM, Christian König wrote:
>> I tested a bit with Aaro and came up with the attached patch, it adds a 16GB
>> guard between the end of memory and the new window for the PCIe root hub.
>> But I agree with you that this is just a hack and not a real solution.
> Guys, that last statement makes no sense.
>
> The *real* hack was that original patch that caused problems.
>
> I mean, just look at it. It has a completely made up - and bad -
> default start, and then it tries to forcibly just create the maximum
> window it possibly can. Well, not quite, but almost.
>
> Now *THAT* is hacky, and fragile, and a bad idea. It's a fundamentally
> bad idea exactly because it assumes
>
>  (a) we have perfect knowledge
>
>  (b) that window that wasn't even enabled or configured in the first
> place should be the maximum window.
>
> both of those assumptions seem completely bogus, and seem to have no
> actual reason.
>
> This comment in that code really does say it all:
>
>         /* Just grab the free area behind system memory for this */
>
> very lackadaisical.
>
> I agree that the 16GB guard is _also_ random, but it's certainly not
> less random or hacky.
>
> But I really wonder why you want to be that close to memory at all.

Actually, I don't want to be close to the end of memory at all. It's just
what I found a certain other OS is doing, and I thought: Hey, that has the
best chance of working...

But yeah, thinking about it, I agree with you that this was probably not a
good idea.

> What was wrong with the patch that just put it the hell out of any
> normal memory range, and just changed the default start from one
> random (and clearly bad) value to _another_ random but at least
> out-of-the-way value?

Well, Bjorn didn't like it because I couldn't come up with a good
explanation of why 256GB is a good value in general (it is a good value for
our particular use case).

> IOW, this change
>
> -       res->start = 0x100000000ull;
> +       res->start = 0xbd00000000ull;
>
> really has a relatively solid explanation for it: "pick a high address
> that is likely out of the way". That's *much* better than "pick an
> address that is right after memory".
>
> Now, could there be a better high address to pick? Probably. It would
> be really nice to have a *reason* for the address to be picked.
>
> But honestly, even "it doesn't break Aaro's machine" is a better
> reason than many, in the absence of other reasons.
>
> For example, was there a reason for that random 756GB address? Is the
> limit of the particular AMD 64-bit bar perhaps at the 1TB mark (and
> that "res->end" value is because "close to it, but not at the top")?

The "res->end" value is actually a hardware limit documented in the BIOS
and Kernel Developer's Guide (BKDG) for AMD CPUs
(https://support.amd.com/TechDocs/49125_15h_Models_30h-3Fh_BKDG.pdf). I
should probably add a comment explaining this.

> So I think "just above RAM" is a _horrible_ default starting point.
> The random 16GB guard is _better_, but it honestly doesn't seem any
> better than the simpler original patch.
>
> A starting point like "halfway to from the hardware limit" would
> actually be a better reason. Or just "we picked an end-point, let's
> pick a starting point that gives us a _sufficient_ - but not excessive
> - window".

Well, that is exactly what the 256GB patch was doing. So as long as you are
fine with that, I'm perfectly fine with using that one.

Christian.

> Or any number of other possible starting points. Almost _anything_ is
> better than "end of RAM".
>
> That "above end of RAM" might be a worst-case fall-back value (and in
> fact, I think that _is_ pretty close to what the PCI code uses for the
> case of "we don't have any parent at all, so we'll just have to assume
> it's a transparent bridge"), but I don't think it's necessarily what
> you should _strive_ for.
>
> So the hackyness of whatever the fix is really should be balanced with
> the hackyness of the original patch that introduced this problem.
>
> Linus
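
For reference, the address arithmetic in this thread can be checked with a small standalone sketch. The two start values are taken from the quoted diff. The end address is an assumption: the thread never states the "res->end" value, so it is derived here from the "256GB patch" wording as start + 256 GiB. The helper names (addr_to_gib, window_gib, print_summary) are hypothetical and not kernel functions. "GB" in the thread is read as GiB (2^30 bytes).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define GIB (1ull << 30)

/* Start addresses quoted in the thread's diff. */
#define OLD_START 0x100000000ull   /* original default start */
#define NEW_START 0xbd00000000ull  /* "high address that is likely out of the way" */

/*
 * ASSUMPTION: the thread does not give res->end. This end is derived from
 * the "256GB patch" wording as start + 256 GiB and is illustrative only.
 */
#define ASSUMED_END (NEW_START + 256 * GIB)

/* Convert a byte address to whole GiB. */
uint64_t addr_to_gib(uint64_t addr)
{
	return addr / GIB;
}

/* Size in GiB of the half-open window [start, end). */
uint64_t window_gib(uint64_t start, uint64_t end)
{
	assert(end > start);
	return (end - start) / GIB;
}

/* Print the figures discussed in the thread. */
void print_summary(void)
{
	printf("old start:   %llu GiB\n",
	       (unsigned long long)addr_to_gib(OLD_START));
	printf("new start:   %llu GiB\n",
	       (unsigned long long)addr_to_gib(NEW_START));
	printf("assumed end: %llu GiB\n",
	       (unsigned long long)addr_to_gib(ASSUMED_END));
	printf("window size: %llu GiB\n",
	       (unsigned long long)window_gib(NEW_START, ASSUMED_END));
}
```

Calling print_summary() from a main() prints the figures. The new start works out to the 756GB address mentioned above, and under the stated assumption the window ends below the 1TB mark.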