Switch From Scratch

2019-01-19

I recently went ahead and bought yet another variation of the Nintendo Switch. This one is different though: a Jetson-TX1 devkit.

It's a devkit for the Tegra X1, the SoC underlying the Nintendo Switch. I originally got this new piece of hardware to port KFS, my Horizon/NX kernel reimplementation, to ARM64. But having just finished the god-awful PR to bring documentation coverage to 100%, I needed a break from the Kernel and some time to work on something silly.

So I have this devkit for the same SoC that the Nintendo Switch OS, Horizon/NX, is built for. Soo... How hard could it be to run the original Horizon/NX on this thing? This might sound like a silly idea at first, but there are some really good reasons to do this. The Jetson TX1 has UART exposed and very easy to connect to, providing a very simple way to get debug output. It also sports a JTAG debugger, which might allow debugging the kernel, trustzone, and everything in-between (although JTAG debuggers for the Cortex-A57 are ridiculously expensive).

So with this silly idea in mind, I went ahead...

First steps

The first thing I did was try booting Hekate with Fusee-Gelee. Strictly speaking, FG is unnecessary on a Jetson-TX1: RCM is usable out of the box, and in fact the first thing we're kind of supposed to do when receiving a Jetson-TX1 is use RCM mode to flash an up-to-date Linux.

So anyways, I went ahead and ran fusee-launcher with Hekate. At first, fusee-launcher wouldn't find the Jetson-TX1, but that was easily fixed: the devkit uses a different USB VID/PID pair than the Nintendo Switch. Fusee-Launcher is already well-equipped to deal with this:

# python3 fusee-launcher.py -V 955 -P 7721 hekate-weird
Setting ourselves up to smash the stack...
Uploading payload...
Smashing the stack...
The USB device stopped responding-- sure smells like we've smashed its stack. :)
Launch complete!
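
One thing that is easy to miss in the command above: as far as I can tell, fusee-launcher reads the -V and -P values as hexadecimal, so 955 means 0x0955 (Nvidia's vendor ID) and 7721 means 0x7721. A tiny Rust sketch of that parsing, where parse_usb_id is a stand-in written for illustration and not the launcher's actual code:

```rust
/// Parse a USB vendor/product ID given as hexadecimal text, with or without
/// a leading "0x". Hypothetical helper, for illustration only.
fn parse_usb_id(text: &str) -> Result<u16, std::num::ParseIntError> {
    let digits = text
        .strip_prefix("0x")
        .or_else(|| text.strip_prefix("0X"))
        .unwrap_or(text);
    // Radix 16: "955" is read as 0x0955, not as decimal 955.
    u16::from_str_radix(digits, 16)
}
```

So parse_usb_id("955") and parse_usb_id("0x0955") name the same vendor, which is why the short form works on the command line.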

No way, it works! But... no output on the UART console. Looks like hekate just froze. What's wrong? I dug into the source code, and discovered it's supposed to print a message to UART very early on:

void ipl_main()
{
	config_hw();

	//Pivot the stack so we have enough space.
	pivot_stack(0x90010000);

	//Tegra/Horizon configuration goes to 0x80000000+, package2 goes to 0xA9800000, we place our heap in between.
	heap_init(0x90020000);

	uart_send(DEBUG_UART_PORT, (u8 *)"Hekate: Hello!\r\n", 18);
    // ...
}

Four functions in and I was failing already. I figured I needed a debugging primitive so I could figure out if the flow of execution reached one of those points. I remembered that there exists a register I could write to in order to cause the CPU to reboot to RCM. I could probably use this to figure out if I reached some code or not:

// Stolen shamelessly from stuckpixel's reboot_to_rcm.
void reboot_to_rcm() {
    u64 virt_addr = 0x7000E400;
    u64 scratch0 = virt_addr + 0x50; //Scratch register 0 is at PMC base + 0x50
    *(u32 *)scratch0 = 1 << 1; //Bit 1 set to 1 makes the SoC go into RCM on reboot
    *(u32 *)virt_addr |= 1 << 4; //reboot without clearing scratch 0
    while(1); //we should never reach here
}

I put this function right at the start of the main, and lo and behold, we rebooted to RCM. Good, so at least we do reach Hekate's ipl_main! By moving the reboot_to_rcm around, I pin-pointed the culprit to when I was returning from heap_init. At that point, I started asking around, and CTCaer immediately had the right insight: the SDRAM configuration must be wrong.

You see, before using the RAM, it needs to be configured. When hekate starts, it's using IRAM (integrated RAM). config_hw configures various parts of the SoC, such as the DRAM, and pivot_stack moves the stack to the DRAM. But if the DRAM was misconfigured, we might end up freezing here.
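
A cheap way to catch this kind of problem before pivoting the stack is to write a few patterns to the new memory and read them back. Here is a minimal sketch in Rust, assuming the region is handed over as a plain slice (on real hardware it would be built from the DRAM base address). The name dram_looks_sane is made up for this example; it is not a hekate function.

```rust
use std::ptr;

/// Write a handful of patterns to every word of `region` and read them back.
/// Returns the index of the first word that did not hold its value.
fn dram_looks_sane(region: &mut [u32]) -> Result<(), usize> {
    const PATTERNS: [u32; 4] = [0x0000_0000, 0xFFFF_FFFF, 0xAAAA_AAAA, 0x5555_5555];
    for &pattern in PATTERNS.iter() {
        for (index, word) in region.iter_mut().enumerate() {
            // Mix the index in so that aliased addresses are caught as well.
            let value = pattern ^ (index as u32);
            // Volatile accesses keep the compiler from optimizing the test away.
            unsafe { ptr::write_volatile(word, value) };
        }
        for (index, word) in region.iter().enumerate() {
            let expected = pattern ^ (index as u32);
            if unsafe { ptr::read_volatile(word) } != expected {
                return Err(index);
            }
        }
    }
    Ok(())
}
```

Running something like this from IRAM, right after the DRAM setup and before touching the stack pointer, would turn a silent freeze into a failure that can be reported.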

After a bit of fighting, I managed to extract the SDRAM configuration from the Linux installed on the Jetson-TX1. It is stored in the BCT, and the UART console tells us which entry in the BCT (there are multiple configs) it is using. By copying the config into hekate, I managed to get DRAM working!

At this point, hekate worked, but I had no way to interface with it. I had to write a bit of code that redirected the GUI to the UART so I could dig into the menus. I didn't spend too much time on it, so it can be a bit buggy.

The sdMMC

The next step was trying to "launch" CFW. I was quickly met with "Failed to mount SDcard". Ugh. CTCaer, once again, had the right insight immediately: "gpio/pinmux nightmare". This wasn't immediately clear to me, but a bit of googling later, I discovered that Nvidia actually had a bit of hardware to allow each GPIO pin to serve multiple pieces of hardware. The pinmuxing hardware is in charge of configuring which device a GPIO pin is currently targeting, among other things.

I spent a looooooot of time staring at the DTS and the _sdmmc_config_sdmmc1 function responsible for setting up the pinmuxing hardware. After several hours staring at the code and comparing various DTS (notably comparing the Jetson TX1 and F0F Switch-Linux Device Trees), I finally figured out that the Enable SD Card Power pin was indeed different: the Nintendo Switch uses GPIO Port E Pin 4, whereas the Jetson TX1 uses GPIO Port Z Pin 3.
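
To make the difference concrete: the Tegra GPIO controller groups pins into ports of 8 (A to Z, then AA and up), and the Linux device tree bindings flatten a port and pin into port * 8 + pin. A small sketch of that arithmetic in Rust, where the Board enum and both function names are made up for illustration and only the single-letter ports are handled:

```rust
/// Boards this sketch knows about (hypothetical enum, not from hekate).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Board {
    NintendoSwitch,
    JetsonTx1,
}

/// Flatten a single-letter port ('A'..='Z') and a pin (0..=7) into a GPIO number.
fn gpio_number(port: char, pin: u8) -> Option<u32> {
    if !port.is_ascii_uppercase() || pin > 7 {
        return None;
    }
    let port_index = port as u32 - 'A' as u32;
    Some(port_index * 8 + u32::from(pin))
}

/// GPIO driving "Enable SD Card Power" on each board, per the DTS comparison.
fn sd_power_enable_gpio(board: Board) -> u32 {
    let (port, pin) = match board {
        Board::NintendoSwitch => ('E', 4),
        Board::JetsonTx1 => ('Z', 3),
    };
    gpio_number(port, pin).expect("port and pin are hard-coded and valid")
}
```

Keeping the pin behind a per-board lookup like this, rather than hard-coding it in the SD setup function, is the kind of change that makes the next board difference easier to spot.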

A quick fix later, and sdMMC was working properly!

Installing Horizon/NX

At this point, I have a fully working Hekate (at least as far as I can see). The next step is going to be to install Horizon/NX on there. I was suddenly hit by a pleasant and an unpleasant surprise: the eMMC was working properly, but it was only 16GB large (vs 32GB on the Switch). This meant I couldn't just take a NAND backup and restore it on the Jetson TX1 eMMC, I'd have to muck around with the GPT. And I knew from previous conversations with rajkosto that the GPT on the switch can be a bit capricious...

I went ahead and took a dump of a real switch in order to access the real GPT. I also got a dump of the GPT of the normal Jetson-TX1 Linux install. I then went ahead and compared the two. Mostly, what I needed to do was to take the Switch GPT, and change the max_lba (last usable block on disk) and the size of the USER partition so they matched the GPT from the Jetson-TX1. And of course, recalculate the CRC32 checksums. Nothing a hex editor couldn't do in a couple minutes... After a few hours of work, I had a GPT I could flash!
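
For reference, the checksum part is mechanical. The GPT header stores its own size at offset 12, its checksum at offset 16 and the last usable LBA at offset 48, all little endian, and the checksum is a standard CRC-32 over the header with the checksum field zeroed. Here is a minimal Rust sketch of that step; patch_last_usable_lba is a name I made up, and this is not the tooling I actually used:

```rust
/// Bitwise CRC-32 (reflected, polynomial 0xEDB88320), the variant GPT uses.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // All-ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Overwrite the "last usable LBA" field of a GPT header, then recompute the
/// header checksum in place. Returns None if the buffer is not a sane header.
fn patch_last_usable_lba(header: &mut [u8], last_usable_lba: u64) -> Option<()> {
    if header.len() < 92 {
        return None;
    }
    // Offset 12: size of the header in bytes (little endian, usually 92).
    let size = u32::from_le_bytes([header[12], header[13], header[14], header[15]]) as usize;
    if size < 92 || size > header.len() {
        return None;
    }
    // Offset 48: last usable LBA.
    header[48..56].copy_from_slice(&last_usable_lba.to_le_bytes());
    // Offset 16: header CRC32, computed with the field itself zeroed.
    header[16..20].copy_from_slice(&[0u8; 4]);
    let crc = crc32(&header[..size]);
    header[16..20].copy_from_slice(&crc.to_le_bytes());
    Some(())
}
```

Note that resizing the USER partition changes the partition entry array too, and that array has its own CRC-32 stored at offset 88 of the header. It has to be refreshed before the header checksum is computed.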

Meanwhile, I started looking for how I was going to actually access the NAND. @Thog suggested booting u-boot from RCM in order to gain access to UMS (if you've ever used memloader to mount your switch's eMMC on your computer, that's actually u-boot you were using). He even got me a script that would use the Jetson Driver Kit to boot u-boot from RCM <3.

Thanks to this, I could mess around with the Jetson's eMMC as if it were a USB drive on my computer, and even if I messed up the partitioning super badly, I could always flash a new firmware and be good to go. So I went ahead and flashed the GPT:

cat jetsontx1.gpt > /dev/sdc

And loaded the disk into HacDiskMount... Fun discovery: GPT actually has its headers in two different locations in order to fight corruption! And HacDiskMount helpfully told me that the primary and secondary GPT weren't matching. I fixed this by opening the disk in fdisk and letting it write the (existing) partition scheme. That would overwrite the secondary GPT with the correct data from the primary.

We now have our switch partitioned. Let's get to the flashing! I got ChoiDuJour to generate all the files I'd need to flash, and went ahead and flashed the various BCPKG2* partitions, as those required no encryption. I then faced two problems: I needed to encrypt the other partitions with the BIS Keys, which I did not have, and I needed to flash the BOOT0 partition, which contained the FUSE burning code...

I ReFuse

I'm absolutely terrified of the idea that I might accidentally burn a fuse. That would be extremely annoying, even more so now that SciresM gave me an awesome insight:

SciresM: @roblâbla burnt fuses of zero will allow you to boot early early 1.0.0s
SciresM: (pre 1.0.0-7)
SciresM: so that is pretty cool :)

Oh fuck. That is pretty cool. Now there's a ton of pressure to avoid burning the fuses...

The BOOT0 contains many important things. It contains the package1ldr, the first piece of code to run after the bootROM, which contains the fuse burning logic. It also contains the NX-Bootloader, the Secure Monitor, and the Warmboot. And most importantly, it contains the Keyblobs, which hold the master keys to the kingdom. Most of those components need to be there for Hekate to successfully boot Horizon/NX.

Also, Keyblobs need to be encrypted too! And somewhat annoyingly, they need the TSEC key (same key necessary to derive the BIS Keys), which requires package1 to be present to be dumped... What a Catch-22. With the help of shchmue, I did a bit of hacky patchwork around the hekate function that dumps the TSEC Key, making it use a package1 file from the SD Card instead of attempting to read it from the BOOT0. Thanks to this (and a few hours of debugging), I got both the TSEC key and the BIS key, which would unlock the situation.

Encrypting the keyblobs turned out to be a bit complicated. Nobody had written any code to do it. Furthermore, I was faced with a bit of a fun problem: the SBK on a Jetson TX1 is 00000000000000000000000000000000, and I was intending to keep it that way. But hactool doesn't like that key. It's using it as a marker for "no key present" in its code. So I wrote a bit of code for linkle, my switch multitool, to generate encrypted keyblobs from a keyblob, a keyblob_key and a keyblob_mac_key. I then modded ChoiDuJour to have it generate a complete BOOT0 using those keyblobs.
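
The underlying problem is that an all-zero key is a perfectly valid AES key, so it can't double as a "not set" marker. A minimal sketch of the alternative in Rust, tracking presence separately from the key bytes. KeySlot and parse_key are illustrative names of mine; this is not code from hactool or linkle.

```rust
/// A 128-bit key slot that tracks presence separately from the key bytes.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct KeySlot(Option<[u8; 16]>);

impl KeySlot {
    fn set(&mut self, key: [u8; 16]) {
        self.0 = Some(key);
    }

    fn get(&self) -> Option<&[u8; 16]> {
        self.0.as_ref()
    }
}

/// Parse exactly 32 hex digits into a key. Returns None on bad input.
fn parse_key(hex: &str) -> Option<[u8; 16]> {
    if hex.len() != 32 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut key = [0u8; 16];
    for (i, byte) in key.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(key)
}
```

With this shape, a slot holding sixteen zero bytes is distinguishable from an empty slot, so the Jetson's SBK is accepted like any other key.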

I now had a BOOT0 ready for use on my switch, but there remained a problem: it was a dangerous BOOT0 containing fuse-burning code. I had to do something about this. I decided to take the nuclear option: I zeroed out both instances of the Package1Ldr's .text section in the BOOT0. This section is unused anyways, since hekate fully replaces the bootloader. With this done, I could flash the BOOT0.

I booted ums on the mmc 0.1 partition in order to access BOOT0, and started flashing:

cat BOOT0 > /dev/sdc

Now I just had to reboot hekate...

Initializing...

Identified pkg1 ('20161121183008'),
Keyblob version 0

Loaded pkg1 and keyblob
Generated keys
Decrypting pkg1
Unpacking pkg1
Unpacking warmboot from 90025610 to 8000D000 size 65504
Skipping 1
Unpacking warmboot from 90045534 to 8000D000 size 3828
Decrypted and unpacked pkg1
Patching Warmboot
Patching Security Monitor
Loaded warmboot.bin and secmon
Read pkg2
Parsed ini1
Patching kernel initial processes
Rebuilt and loaded pkg2

Booting...

Success \o/. It'd be nice to verify that we have an actually working kernel though. I went ahead and enabled debug_mode in the hekate IPL, which gave me this:

Break() called. 0100000000000002

Success \o/. NCM is failing, probably because I didn't flash the SYSTEM partition. But we're getting into the kernel, and even getting into userland!

Brick 1

This is where things went south. For some reason, at some point after all this, my Jetson TX1 started acting up. It started with a life-wrecking experience where it would not respond to RCM, gave no output on UART, and the CR2 LED (indicating the 3v3/1v8 power rail) was off... I had somehow bricked my Jetson TX1. Welp.

I did a bit of experimenting, and after checking the board with a multimeter, I realized something odd: the 1v8 rail was sitting at 4.8V. That is not normal. Filled with sadness, I decided to RMA the unit to get a replacement.

Attempt 2

Date: 2019-02-15

Received the new board. Got an extra camera module. Nice. Not sure what I'll do with that yet.

After talking to CTCaer a bit more, it was brought to my attention that the PMIC (the thing that manages power on the Jetson and Switch) needed to be configured properly for the board. If it's misconfigured, it might end up sending incorrect voltages to the various devices, potentially frying them! :scream:.

Hekate's PMIC configuration is a bit messy. The good news is, there is a central driver that everything should go through, in max7762x.c. The bad news is that a lot of Hekate's code kinda just manually talks to the chip, issuing raw I2C commands instead of going through the driver. Especially in the config_hw function.

So here's the plan:

I'm going to change all those raw i2c_send_byte calls to the appropriate max77620_regulator_set_voltage calls.
I'm going to fix the _pmic_regulators global so that it matches the Jetson TX1 instead of the Nintendo Switch. I'll figure out the correct values using nvidia's Device Tree.
Finally, I'm going to need to find all the Pinmux Configuration from hekate, and make sure they are correct as well, since a wrong pinmux configuration might lead to sending the wrong voltage levels to a device. For the pinmux configuration, nvidia has a nice repository with the configs for various boards and scripts to generate the header files for the different projects using them (linux, uboot, etc...). I'll take the values from there.
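
The reason going through the driver matters is that a driver can refuse a voltage that is out of range for a given rail, which a raw I2C write never will. A rough sketch of that idea in Rust. The struct, the function, and any limits plugged into it are illustrative assumptions of mine, not values from hekate's _pmic_regulators table or from nvidia's Device Tree.

```rust
use std::convert::TryFrom;

/// Voltage limits of one PMIC rail, in microvolts (illustrative layout).
#[derive(Clone, Copy, Debug)]
struct Regulator {
    min_uv: u32,
    max_uv: u32,
    step_uv: u32,
}

#[derive(Debug, PartialEq)]
enum VoltageError {
    OutOfRange,
    NotOnAStep,
}

/// Convert a target voltage into the selector value that would be written
/// to the rail's voltage register, rejecting anything the rail can't take.
fn voltage_selector(reg: &Regulator, target_uv: u32) -> Result<u8, VoltageError> {
    if reg.step_uv == 0 || target_uv < reg.min_uv || target_uv > reg.max_uv {
        return Err(VoltageError::OutOfRange);
    }
    let delta = target_uv - reg.min_uv;
    if delta % reg.step_uv != 0 {
        return Err(VoltageError::NotOnAStep);
    }
    // The selector has to fit in the 8-bit register.
    u8::try_from(delta / reg.step_uv).map_err(|_| VoltageError::OutOfRange)
}
```

With a per-board table of such limits, asking for 4.8V on a rail that tops out far lower fails at the call site instead of reaching the hardware.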

Sidenote: Nvidia's naming convention for the boards is, erm, fun. Evidently, my board is a Jetson TX1, and the "id" for it is a p2371-2180, based on the fact that in the driver package, the jetson-tx1.conf script is symlinked to the p2371-2180-devkit.conf file. However, that same file also uses the p2597-2180 DTS... So I guess those two are equivalent.

Thanks

Many thanks to everyone who helped me out here! I wouldn't have done this without you <3.

Thog and Ac_K, for helping me and listening to my endless rambling :P
CTCaer, helped me a bunch with the initial hekate debugging
shchmue, helped me figure out how to dump the TSEC Keys
And everyone else that motivated me to get this working!

Tools

You can find all the modifications I've made here:

hekate: https://github.com/roblabla/hekate (missing UART GUI)
linkle: https://github.com/megaton-hammer/linkle

The Arc Subfield Pattern

While working on my Horizon/NX reimplementation, I came across a somewhat fun pattern. Horizon/NX has a structure that looks like this:

struct KPort {
    KAutoObject refcount;
    KServerPort serverside;
    KClientPort clientside;
};

This structure is allocated via a SlabHeap. Both the Client and Server are handed out separately, yet they both refer to the same (global) refcount. Once that refcount drops to 0, the whole KPort (including both substructures) will be deallocated.

Horizon/NX is written in C, so all operations are manual anyways. But porting this pattern to Rust, while making it as Rust-y as possible (read: using RAII), is non-trivial. We basically want the ability to map an Arc<T> to an Arc<T.field>. Hmm...

The boilerplate

Let's start by writing some boilerplate:

use std::sync::Arc;

// Their contents don't really matter...
pub struct ServerPort(());
pub struct ClientPort(());

pub struct Port {
    server: ServerPort,
    client: ClientPort
}

impl Port {
    pub fn new() -> Arc<Port> {
        Arc::new(Port {
            server: ServerPort(()),
            client: ClientPort(())
        })
    }
}

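Nothing crazy so far. Here's an important observation: for the construct I'm about to show to be safe, it should be impossible to construct a ServerPort/ClientPort outside of a Port! The private () field is what prevents code outside this module from building one.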

Now, we want to create a wrapper around ClientPort that represents the shared ownership with a Port. (you'd want to do this for ServerPort also).

use std::ops::Deref;

struct ClientPortArc(*const ClientPort);

impl Port {
    fn client(this: Arc<Self>) -> ClientPortArc {
        unsafe {
            // Safety: Arc::into_raw returns a valid pointer to a live Port,
            // and we only take the address of one of its fields.
            ClientPortArc(&(*Arc::into_raw(this)).client)
        }
    }
}

impl Deref for ClientPortArc {
    type Target = ClientPort;
    fn deref(&self) -> &ClientPort {
        unsafe {
            // Safety: We guarantee that the raw pointer is valid at
            // construction. Furthermore, since we leak an Arc with into_raw,
            // we know the pointer shouldn't get freed from under our feet.
            &*self.0
        }
    }
}

We now have our wrapper type. Now, we need to drop it. This is where the magic comes in. We have a pointer to a subfield. We know that subfield comes from a wrapper struct (Port). We now want to get a pointer back to the wrapper struct, hopefully in a generic way (let's not hardcode offsets, please).

Introducing: `container_of`

container_of is a magical macro. I first met it in the Linux Kernel, in their implementation of intrusive linked lists. It takes three arguments: a wrapper type name, the name of a member in that type, and a pointer to that member. Through pointer arithmetic, it will deduce a pointer to the wrapper type.

In other words, we can do this:

let client: ClientPortArc = panic!("whatever");
let port = container_of!(client.0, Port, client);

And get back a pointer to the Port that contains the ClientPort. I won't bore you with the details of how the container_of macro actually works: it's not complicated, but it's kinda messy, and it has some interesting implications with UB (which is especially complicated in Rust, given that what is and isn't UB is still not fully defined). But here's a link if you want all the details. As for the Rust version, it can be found in the intrusive-collections crate.

Going back to our ClientPort. We can now use this to recover the Arc we leaked, and let it drop the refcount!
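
For a rough idea of what the macro boils down to, here is a minimal sketch using std::mem::offset_of!, which is stable since Rust 1.77. This is my own simplification, not the intrusive-collections implementation, and the Outer/Inner types and the outer_of function are placeholders:

```rust
use std::mem::offset_of;

struct Inner(u32);

struct Outer {
    tag: u64,
    inner: Inner,
}

/// Recover a pointer to the `Outer` that holds `inner`.
///
/// Safety: `inner` must point to the `inner` field of a live `Outer`, and
/// must have been derived from a pointer to that whole `Outer`.
unsafe fn outer_of(inner: *const Inner) -> *const Outer {
    // Step back by the field's byte offset to reach the start of the struct.
    unsafe {
        inner
            .cast::<u8>()
            .sub(offset_of!(Outer, inner))
            .cast::<Outer>()
    }
}
```

The second safety requirement is the subtle one: a pointer obtained from a reference to the field alone is only allowed to touch that field, which is why taking the field address with ptr::addr_of! on a pointer to the whole struct is the safer starting point.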

impl Drop for ClientPortArc {
    fn drop(&mut self) {
        unsafe {
            // Safety: We are guaranteed that ClientPortArc was created from an
            // Arc, and is built from within the wrapper.
            // Rebuilding the Arc and dropping it releases the leaked refcount.
            drop(Arc::from_raw(container_of!(self.0, Port, client)));
        }
    }
}

This should all work, but it's a lot of boilerplate to write each time we want to create such a wrapper. Can we do better?

Macros. They're good for you.

Each time I find such a pattern, I try to wrap it in a macro, and thoroughly document it. Now, there is one very important thing about macros: they can be extremely opaque to readers. As such, it's important for the macro designer to make it look as close as possible to normal Rust code, so we can match people's expectations.

Let's start with how we want our macro to look:

arc_wrapper! {
    /// A wrapper around a ClientPort, providing shared ownership with the
    /// parent Port.
    pub struct ClientPortArc(ClientPort) wrapping Port, client {
        /// Get an owned reference to the Client part of this Port.
        fn client();
    }
}

It's kinda messy, but I couldn't think of a better way. The function defined inside the {} will be added to Port, and will have the same prototype as our client function from earlier.

Now, let's try to implement the macro.

macro_rules! arc_wrapper {
    ($(#[$meta:meta])* $vis:vis struct $ty:ident($innerty:path) wrapping $container:path, $field:ident {
        $(#[$methodmeta:meta])*
        fn $methodname:ident();
    }) => {

Nothing crazy so far. Something interesting to keep in mind: doc-comments in the form of /// will get turned into #[doc = "comment"] before being passed to macros. Hence why we take a list of meta here. Also, the $vis fragment specifier is only stable from Rust 1.30 onward.

        $(#[$meta])*
        $vis struct $ty(*const $innerty);
        
        impl $container {
            $(#[$methodmeta])*
            $vis fn $methodname(s: Arc<Self>) -> $ty {
                unsafe { $ty(&(*Arc::into_raw(s)).$field) }
            }
        }

We're defining our new type, and creating a getter on the container struct to return our wrapper.

        impl Deref for $ty {
            type Target = $innerty;
            
            fn deref(&self) -> &$innerty {
                unsafe {
                    &*self.0
                }
            }
        }
        
        impl Drop for $ty {
            fn drop(&mut self) {
                unsafe {
                    Arc::from_raw(container_of!(self.0, $container, $field));
                }
            }
        }
    }
}
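
To check that the refcount really is shared, here is a hand-expanded, self-contained sketch of roughly what the macro generates for ClientPort. It differs from the code above in two ways that are my own choices: it computes the container pointer with std::mem::offset_of! (Rust 1.77+) instead of container_of!, and it takes the field's address with ptr::addr_of! so the pointer stays derived from the whole allocation rather than from a reference to one field.

```rust
use std::mem::offset_of;
use std::ops::Deref;
use std::ptr;
use std::sync::Arc;

pub struct ServerPort(());
pub struct ClientPort(());

pub struct Port {
    server: ServerPort,
    client: ClientPort,
}

impl Port {
    pub fn new() -> Arc<Port> {
        Arc::new(Port { server: ServerPort(()), client: ClientPort(()) })
    }

    /// Turn one strong reference to the Port into a handle on its client half.
    pub fn client(this: Arc<Self>) -> ClientPortArc {
        let raw: *const Port = Arc::into_raw(this);
        // Safety: `raw` comes from a live Arc, so projecting to a field is fine.
        ClientPortArc(unsafe { ptr::addr_of!((*raw).client) })
    }
}

pub struct ClientPortArc(*const ClientPort);

impl Deref for ClientPortArc {
    type Target = ClientPort;

    fn deref(&self) -> &ClientPort {
        // Safety: the leaked strong reference keeps the Port alive.
        unsafe { &*self.0 }
    }
}

impl Drop for ClientPortArc {
    fn drop(&mut self) {
        // Safety: self.0 points at the `client` field of a Port owned by an
        // Arc whose strong reference was leaked in `Port::client`.
        unsafe {
            let port = self.0
                .cast::<u8>()
                .sub(offset_of!(Port, client))
                .cast::<Port>();
            // Rebuilding the Arc and dropping it gives the reference back.
            drop(Arc::from_raw(port));
        }
    }
}
```

Calling Port::client(Arc::clone(&port)) bumps the strong count of port by one, and dropping the returned ClientPortArc brings it back down, which is exactly the KPort behaviour we were after.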
