2022-02-08 15:27:18

by Ash Logan

Subject: [RFC] Upstreaming Linux for Nintendo Wii U

Hello,

I'm the lead dev on a downstream kernel with support for the Wii U[1],
Nintendo's previous-gen game console. You might have seen Emmanuel
submitting some of the more self-contained drivers recently[2][3].
I've gotten to the point where I'd like to look
at upstreaming the platform. Since we need to refactor all the patches
for upstreaming anyway, I thought it would be good to talk to the
experts ahead of time ;)

Some quick details about the platform:
- Tri-core PowerPC "Espresso" (750CL) @ 1.24GHz
- 2GiB DDR3-1600 (and a little over 32MiB of SRAM)
- "Latte" proprietary SoC with USB, SDIO, SATA, crypto, ARM9
coprocessor, Radeon R7xx GPU
- Curiously, the entire graphics pipeline from the original Wii, usually
powered off

The bulk of the interesting work for Linux is in the SoC, which is
pretty similar to the original Wii's in layout (we expect to share a lot
of drivers), with the addition of some more modern blocks.

The state of the downstream work:
- Basic platform init works, "easy" drivers like SDIO, SATA, accelerated
cryptography, etc. all here - some are even upstreamed already.
- Bootloader duties are performed by linux-loader[4], a small firmware
for the ARM coprocessor that idles once Linux starts.
- linux-loader handles a dtbImage right now and has a hardcoded memory
area to pass commandline parameters, parsed from a config file. I don't
expect that to be acceptable; eventually I'd like to move it to loading
vmlinuz directly and pulling the dtb off the SD card, similar to the
Raspberry Pi. Alternatively, we could use petitboot, but kexec doesn't
seem to work right now.
- Linux itself runs tolerably (though given the hardware it should be
faster), with framebuffer graphics and basic support for most hardware,
with the notable exceptions of the WiFi card and the GPU.
- No SMP - will cover this later.

That's about the state of things. I'm not sure how much is or isn't
upstreamable, but right now I'm only thinking about getting the basic
platform support up and some core hardware. On that front, there are a
few decisions that need to be made and help that needs to be had, which
is where I hope you all can give some insight:

- USB only works with patches to the USB core[5] that appear to have
failed upstreaming before[6]. I don't really understand these well
enough to say what particular hardware restriction they're working
around. I do know that there's a curious restriction on DMA addressing
where most SoC blocks (including USB) can't see the SRAM at address 0,
but we worked around this using reserved-mem in the devicetree. Almost
all of the peripherals on Wii U are connected over USB, so having a
working stack is pretty important.
- The Radeon, despite being a mostly standard card, has its GPUF0MMReg
area mapped into the SoC's mmio, with no PCI bus in sight. The Linux
drivers (radeon, too old for amdgpu) seem to expect PCI, so some pretty
extensive patching would be needed to get that moving - not to mention
things like the proprietary HDMI encoder, which seems similar to the
PS4's[7]. Downstream, we have an fbdev driver, which I obviously don't
expect to get accepted.
- Both of those issues together mean I'm not convinced an initial port
would have any viable output device. I would like to fix USB, though
barring that we could use a flat framebuffer that linux-loader leaves
enabled.
- Right now I've made a new platform (like ps3) rather than joining the
GameCube and Wii in embedded6xx, since that is marked as BROKEN_ON_SMP.
The Wii U is a 3-core system, though a CPU bug[8] prevents existing
userspaces working with it. Bit of a "cross that bridge when we get
there" situation, though I'm reluctant to prevent that possibility by
using a BROKEN_ON_SMP platform.
- Like the Wii before it, the Wii U has a small amount of RAM at address
zero, a gap, then a large amount of RAM at a higher address. Instead of
the "map everything and reserve the gap" approach of the Wii, we loop
over each memblock and map only true RAM[9]. This seems to work, but as
far as I can tell is unique amongst powerpc32 platforms, so it's worth
pointing out. (Note: I've been told this doesn't work anymore after some
KUAP changes[10], so this point might be moot; haven't investigated)
- Due to the aforementioned DMA restrictions and possibly a fatal
bytemasking bug on uncached mappings[11], I have been wondering if it'd
be better to just give up on the SRAM at address 0 altogether and use it
as VRAM or something, loading the kernel at a higher address.
- Like the Wii, the Wii U also takes a bit of a loose approach to cache
coherency, and has several SoC peripherals with big-endian registers,
requiring driver patching. USB already has devicetree quirks, but others
require more drastic measures. I expect we'll take that on a
driver-by-driver basis.
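
To make the difference between the two mapping strategies concrete, here
is a toy Python model of them. It is only a sketch, not the kernel code
in [9]: the function names are made up for illustration, and the two
regions are taken from the devicetree excerpt posted later in this
thread, with MEM0 left out.

```python
# Toy model of the two approaches to a RAM layout with a hole in it.
# Regions are (base, size) pairs. The values are MEM1 and MEM2 from the
# devicetree excerpt later in this thread (MEM0 omitted as an assumption).
REGIONS = [
    (0x00000000, 0x02000000),  # MEM1, 32MiB at address zero
    (0x10000000, 0x80000000),  # MEM2, 2GiB at a higher address
]


def map_everything(regions):
    """Wii-style: one mapping over the whole span, holes marked reserved.

    Returns (mapped_ranges, reserved_ranges).
    """
    ordered = sorted(regions)
    start = ordered[0][0]
    end = max(base + size for base, size in ordered)
    reserved = []
    cursor = start
    for base, size in ordered:
        if base > cursor:
            # Nothing backs this range, so it has to be reserved.
            reserved.append((cursor, base - cursor))
        cursor = max(cursor, base + size)
    return [(start, end - start)], reserved


def map_only_ram(regions):
    """Downstream Wii U style: one mapping per region, nothing over holes."""
    return sorted(regions), []


if __name__ == "__main__":
    for name, strategy in (("map everything", map_everything),
                           ("map only RAM", map_only_ram)):
        mapped, reserved = strategy(REGIONS)
        print(name)
        for base, size in mapped:
            print(f"  mapped   {base:#010x} + {size:#010x}")
        for base, size in reserved:
            print(f"  reserved {base:#010x} + {size:#010x}")
```

In this model the first strategy produces a single mapping with one
reserved range covering the hole, while the second produces two mappings
and never touches the hole at all.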

In terms of platform bringup, the key issue is whether to be embedded6xx
or not and what output device to use. Beyond that it's just things like
IRQ controller drivers, should be pretty straightforward. I think on our
end, we'll start rebasing to 5.15 (LTS) and start sending patches from
there. I know getting closer to HEAD is preferable, this project has
just moved very slowly in the past and being on LTS has been a lifesaver.

Please let me know your thoughts, suggestions and questions, I'm new to
this and want to make sure we're sending you the best submissions we can.

Thanks,
Ash
https://heyquark.com/aboutme

[1] https://linux-wiiu.org
[2] https://lkml.org/lkml/2021/5/19/391
[3] https://lkml.org/lkml/2021/10/14/1150
[4] https://gitlab.com/linux-wiiu/linux-loader
[5] https://gitlab.com/linux-wiiu/linux-wiiu/-/merge_requests/8/diffs
[6] https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-March/080705.html
[7] https://github.com/eeply/ps4-linux/commit/b2e54fcc05d4ed77bcea4ba3f3bdc33cb3b318e0
[8] https://fail0verflow.com/blog/2014/console-hacking-2013-omake/#espresso
(3rd paragraph, "In fact, the SMPization of the 750 in the Espresso is
not perfect...")
[9] https://gitlab.com/linux-wiiu/linux-wiiu/-/blob/fabcfd93d47ba0d2105eec7f3b5d7785f2a69445/arch/powerpc/mm/pgtable_32.c#L273-L282
[10] https://lkml.org/lkml/2021/6/3/204
[11] https://bugs.dolphin-emu.org/issues/12565


2022-02-11 18:15:58

by Michael Ellerman

Subject: Re: [RFC] Upstreaming Linux for Nintendo Wii U

Ash Logan writes:
> Hello,

Hi Ash,

I can't really answer all your questions, but I can chime in on one or
two things ...

> - Right now I've made a new platform (like ps3) rather than joining the
> GameCube and Wii in embedded6xx, since that is marked as BROKEN_ON_SMP.
> The Wii U is a 3-core system, though a CPU bug[8] prevents existing
> userspaces working with it. Bit of a "cross that bridge when we get
> there" situation, though I'm reluctant to prevent that possibility by
> using a BROKEN_ON_SMP platform.

I'm happy for it to be a new platform. I'd almost prefer it to be a
separate platform, that way you can make changes in your platform code
without worrying (as much) about breaking other platforms.

> - Like the Wii before it, the Wii U has a small amount of RAM at address
> zero, a gap, then a large amount of RAM at a higher address. Instead of
> the "map everything and reserve the gap" approach of the Wii, we loop
> over each memblock and map only true RAM[9]. This seems to work, but as
> far as I can tell is unique amongst powerpc32 platforms, so it's worth
> pointing out. (Note: I've been told this doesn't work anymore after some
> KUAP changes[10], so this point might be moot; haven't investigated)

We'd need more detail on that I guess. Currently all the 32-bit
platforms use the flat memory model, which assumes RAM is a single
contiguous block. Though that doesn't mean it all has to be used or
mapped, like the Wii does. To properly support your layout you should be
using sparsemem, but it's possible that's more trouble than it's worth,
I'm not sure. How far apart are the low and high blocks of RAM, and what
are their sizes?

> - Due to the aforementioned DMA restrictions and possibly a fatal
> bytemasking bug on uncached mappings[11], I have been wondering if it'd
> be better to just give up on the SRAM at address 0 altogether and use it
> as VRAM or something, loading the kernel at a higher address.

Don't you have exceptions entering down at low addresses? Even so you
could possibly trampoline them up to the kernel at a high address.

> In terms of platform bringup, the key issue is whether to be embedded6xx
> or not and what output device to use. Beyond that it's just things like
> IRQ controller drivers, should be pretty straightforward. I think on our
> end, we'll start rebasing to 5.15 (LTS) and start sending patches from
> there. I know getting closer to HEAD is preferable, this project has
> just moved very slowly in the past and being on LTS has been a lifesaver.

As I said I'm happy for it to be a new platform. If there ends up being
a lot of shared code we can always refactor, but embedded6xx is only
~1500 LOC anyway.

One thing that has come up with previous console port submissions is the
requirement for patches to be signed off. The docs are here if you
aren't familiar with them:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin

Otherwise your plan sounds good to me, 4.19 is pretty old so getting up
to 5.15 would be a good start. Then submit whatever bits you can and
chip away at it.

cheers

2022-02-12 01:45:59

by Christophe Leroy

Subject: Re: [RFC] Upstreaming Linux for Nintendo Wii U

Hi Ash,

On 11/02/2022 at 12:29, Michael Ellerman wrote:
> Ash Logan writes:
>> - Like the Wii before it, the Wii U has a small amount of RAM at address
>> zero, a gap, then a large amount of RAM at a higher address. Instead of
>> the "map everything and reserve the gap" approach of the Wii, we loop
>> over each memblock and map only true RAM[9]. This seems to work, but as
>> far as I can tell is unique amongst powerpc32 platforms, so it's worth
>> pointing out. (Note: I've been told this doesn't work anymore after some
>> KUAP changes[10], so this point might be moot; haven't investigated)
>
> We'd need more detail on that I guess. Currently all the 32-bit
> platforms use the flat memory model, which assumes RAM is a single
> contiguous block. Though that doesn't mean it all has to be used or
> mapped, like the Wii does. To properly support your layout you should be
> using sparsemem, but it's possible that's more trouble than it's worth,
> I'm not sure. How far apart are the low and high blocks of RAM, and what
> are their sizes?

Can you provide details on what's happening with the KUAP changes?

You are pointing to series https://lkml.org/lkml/2021/6/3/204

Does it work when CONFIG_PPC_KUAP is not selected, or does it fail then
too?

Are you able to bisect which commit of that series is the culprit?

Thanks
Christophe

2022-02-14 07:53:52

by Ash Logan

Subject: Re: [RFC] Upstreaming Linux for Nintendo Wii U

Hi Christophe,

On 12/2/22 00:11, Christophe Leroy wrote:
> Hi Ash,
>
> On 11/02/2022 at 12:29, Michael Ellerman wrote:
>> Ash Logan writes:
>>> - Like the Wii before it, the Wii U has a small amount of RAM at address
>>> zero, a gap, then a large amount of RAM at a higher address. Instead of
>>> the "map everything and reserve the gap" approach of the Wii, we loop
>>> over each memblock and map only true RAM[9]. This seems to work, but as
>>> far as I can tell is unique amongst powerpc32 platforms, so it's worth
>>> pointing out. (Note: I've been told this doesn't work anymore after some
>>> KUAP changes[10], so this point might be moot; haven't investigated)
>>
>> We'd need more detail on that I guess. Currently all the 32-bit
>> platforms use the flat memory model, which assumes RAM is a single
>> contiguous block. Though that doesn't mean it all has to be used or
>> mapped, like the Wii does. To properly support your layout you should be
>> using sparsemem, but it's possible that's more trouble than it's worth,
>> I'm not sure. How far apart are the low and high blocks of RAM, and what
>> are their sizes?
>
> Can you provide details on what's happening with the KUAP changes?
>
> You are pointing to series https://lkml.org/lkml/2021/6/3/204
>
> Does it work when CONFIG_PPC_KUAP is not selected, or does it fail then
> too?
>
> Are you able to bisect which commit of that series is the culprit?

Emmanuel told me about this during their work on 5.13 which I wasn't
involved in, and now can't remember any of the details, so I guess I
don't actually have any more information.
I'm working on getting a baseline setup for 5.15 (just udbg and the
like), so if there is an issue I should soon find out about it and will
get back to you.

> Thanks
> Christophe

Thanks,
Ash

2022-02-14 12:38:49

by Ash Logan

Subject: Re: [RFC] Upstreaming Linux for Nintendo Wii U

Thanks for your response!

On 11/2/22 22:29, Michael Ellerman wrote:
> Ash Logan writes:
>> Hello,
>
> Hi Ash,
>
> I can't really answer all your questions, but I can chime in on one or
> two things ...
>
>> - Right now I've made a new platform (like ps3) rather than joining the
>> GameCube and Wii in embedded6xx, since that is marked as BROKEN_ON_SMP.
>> The Wii U is a 3-core system, though a CPU bug[8] prevents existing
>> userspaces working with it. Bit of a "cross that bridge when we get
>> there" situation, though I'm reluctant to prevent that possibility by
>> using a BROKEN_ON_SMP platform.
>
> I'm happy for it to be a new platform. I'd almost prefer it to be a
> separate platform, that way you can make changes in your platform code
> without worrying (as much) about breaking other platforms.

Sounds good to me! Since a lot of the architecture is the same as the
Wii and GameCube, maybe once things are working well for Wii U we can
look at refactoring those out too - a "nintendo" platform? Not a concern
for now though.

>> - Like the Wii before it, the Wii U has a small amount of RAM at address
>> zero, a gap, then a large amount of RAM at a higher address. Instead of
>> the "map everything and reserve the gap" approach of the Wii, we loop
>> over each memblock and map only true RAM[9]. This seems to work, but as
>> far as I can tell is unique amongst powerpc32 platforms, so it's worth
>> pointing out. (Note: I've been told this doesn't work anymore after some
>> KUAP changes[10], so this point might be moot; haven't investigated)
>
> We'd need more detail on that I guess. Currently all the 32-bit
> platforms use the flat memory model, which assumes RAM is a single
> contiguous block. Though that doesn't mean it all has to be used or
> mapped, like the Wii does. To properly support your layout you should be
> using sparsemem, but it's possible that's more trouble than it's worth,
> I'm not sure. How far apart are the low and high blocks of RAM, and what
> are their sizes?

From the devicetree:

memory {
    device_type = "memory";
    reg = <0x00000000 0x02000000    /* MEM1 - 32MiB */
           0x08000000 0x00300000    /* MEM0 - 3MiB */
           0x10000000 0x80000000>;  /* MEM2 - 2GiB */
};

We could probably drop MEM0 without anybody missing it, so let's say a
224MiB gap between the end of MEM1 and the start of MEM2 (at 256MiB).
sparsemem does look like a good option, though I note it depends on
ppc64, so yeah, might be a lot of trouble for the benefit of two
platforms (Wii and Wii U).
I'm currently attempting to get something baseline running on 5.15, will
see if the memblock thing still works so I can have a patch for RFC.
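
For reference, the hole sizes implied by that reg property can be worked
out with a few lines of Python. This is only a back-of-the-envelope
sketch: the region names come from the devicetree comments above, while
the helper name gaps() is made up for illustration. MEM2 starts at the
256MiB mark, so with MEM0 dropped the hole below it comes out at 224MiB.

```python
MIB = 1024 * 1024

# (name, base, size) triples copied from the devicetree reg property above.
REGIONS = [
    ("MEM1", 0x00000000, 0x02000000),
    ("MEM0", 0x08000000, 0x00300000),
    ("MEM2", 0x10000000, 0x80000000),
]


def gaps(regions):
    """Return (prev_name, next_name, hole_bytes) for each hole between regions."""
    ordered = sorted(regions, key=lambda r: r[1])
    holes = []
    for (pname, pbase, psize), (nname, nbase, _) in zip(ordered, ordered[1:]):
        hole = nbase - (pbase + psize)
        if hole > 0:
            holes.append((pname, nname, hole))
    return holes


if __name__ == "__main__":
    without_mem0 = [r for r in REGIONS if r[0] != "MEM0"]
    for label, regions in (("with MEM0", REGIONS),
                           ("without MEM0", without_mem0)):
        for prev, nxt, hole in gaps(regions):
            print(f"{label}: {prev} -> {nxt}: {hole // MIB} MiB hole")
```

Keeping MEM0 splits the hole in two (96MiB and 125MiB) rather than
removing it, so a flat memory model has to span unused address space
either way.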

>> - Due to the aforementioned DMA restrictions and possibly a fatal
>> bytemasking bug on uncached mappings[11], I have been wondering if it'd
>> be better to just give up on the SRAM at address 0 altogether and use it
>> as VRAM or something, loading the kernel at a higher address.
>
> Don't you have exceptions entering down at low addresses? Even so you
> could possibly trampoline them up to the kernel at a high address.

Maybe? Looking through head_book3s_32.S that appears to be the case.
Will probably stick with physaddr 0 for now then.

>> In terms of platform bringup, the key issue is whether to be embedded6xx
>> or not and what output device to use. Beyond that it's just things like
>> IRQ controller drivers, should be pretty straightforward. I think on our
>> end, we'll start rebasing to 5.15 (LTS) and start sending patches from
>> there. I know getting closer to HEAD is preferable, this project has
>> just moved very slowly in the past and being on LTS has been a lifesaver.
>
> As I said I'm happy for it to be a new platform. If there ends up being
> a lot of shared code we can always refactor, but embedded6xx is only
> ~1500 LOC anyway.
>
> One thing that has come up with previous console port submissions is the
> requirement for patches to be signed off. The docs are here if you
> aren't familiar with them:
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin

No problem, will make sure everything is signed off by the people involved.

> Otherwise your plan sounds good to me, 4.19 is pretty old so getting up
> to 5.15 would be a good start. Then submit whatever bits you can and
> chip away at it.
>
> cheers

Thanks,
Ash