py/emitnative: Put a pointer to the native prelude in child_table array.

Some architectures (like esp32 xtensa) cannot read byte-wise from
executable memory.  This means the prelude for native functions -- which is
usually located after the machine code for the native function -- must be
placed in separate memory that can be read byte-wise.  Prior to this commit
this was achieved by enabling N_PRELUDE_AS_BYTES_OBJ for the emitter and
MICROPY_EMIT_NATIVE_PRELUDE_AS_BYTES_OBJ for the runtime.  The prelude was
then placed in a bytes object, pointed to by the module's constant table.

This commit changes that behaviour so that a pointer to the prelude is
stored either directly in mp_obj_fun_bc_t.child_table (when the function
has no children), or in mp_obj_fun_bc_t.child_table[num_children] if
num_children > 0.  The reasons for doing this are:

1. It decouples the native emitter from runtime requirements: the emitted
   code no longer needs to know whether the system it runs on can read
   byte-wise from executable memory.

2. It makes all ports have the same emitter behaviour: there is no longer
   an N_PRELUDE_AS_BYTES_OBJ option.

3. The module's constant table is now used only for actual constants in the
   Python code.  This allows further optimisations to be done with the
   constants (eg constant deduplication).

Code size change for those ports that enable the native emitter:
   unix x64:   +80 +0.015%
      stm32:   +24 +0.004% PYBV10
    esp8266:   +88 +0.013% GENERIC
      esp32:   -20 -0.002% GENERIC[incl -112(data)]
        rp2:   +32 +0.005% PICO

Signed-off-by: Damien George <damien@micropython.org>
diff --git a/tools/mpy-tool.py b/tools/mpy-tool.py
index 3ebbdd1..2974e35 100755
--- a/tools/mpy-tool.py
+++ b/tools/mpy-tool.py
@@ -824,7 +824,7 @@
         for rc in self.children:
             rc.disassemble()
 
-    def freeze_children(self):
+    def freeze_children(self, prelude_ptr=None):
         # Freeze children and generate table of children.
         if len(self.children):
             for rc in self.children:
@@ -834,10 +834,12 @@
             print("static const mp_raw_code_t *const children_%s[] = {" % self.escaped_name)
             for rc in self.children:
                 print("    &raw_code_%s," % rc.escaped_name)
+            if prelude_ptr:
+                print("    (void *)%s," % prelude_ptr)
             print("};")
             print()
 
-    def freeze_raw_code(self, qstr_links=(), type_sig=0):
+    def freeze_raw_code(self, prelude_ptr=None, qstr_links=(), type_sig=0):
         # Generate mp_raw_code_t.
         print("static const mp_raw_code_t raw_code_%s = {" % self.escaped_name)
         print("    .kind = %s," % RawCode.code_kind_str[self.code_kind])
@@ -849,6 +851,8 @@
         print("    #endif")
         if len(self.children):
             print("    .children = (void *)&children_%s," % self.escaped_name)
+        elif prelude_ptr:
+            print("    .children = (void *)%s," % prelude_ptr)
         else:
             print("    .children = NULL,")
         print("    #if MICROPY_PERSISTENT_CODE_SAVE")
@@ -1112,8 +1116,25 @@
 
         print("};")
 
-        self.freeze_children()
-        self.freeze_raw_code(self.qstr_links, self.type_sig)
+        prelude_ptr = None
+        if self.code_kind == MP_CODE_NATIVE_PY:
+            prelude_ptr = "fun_data_%s_prelude_macro" % self.escaped_name
+            print("#if MICROPY_EMIT_NATIVE_PRELUDE_SEPARATE_FROM_MACHINE_CODE")
+            n = len(self.fun_data) - self.prelude_offset
+            print("static const byte fun_data_%s_prelude[%u] = {" % (self.escaped_name, n), end="")
+            for i in range(n):
+                print(" 0x%02x," % self.fun_data[self.prelude_offset + i], end="")
+            print("};")
+            print("#define %s &fun_data_%s_prelude[0]" % (prelude_ptr, self.escaped_name))
+            print("#else")
+            print(
+                "#define %s &fun_data_%s[%u]"
+                % (prelude_ptr, self.escaped_name, self.prelude_offset)
+            )
+            print("#endif")
+
+        self.freeze_children(prelude_ptr)
+        self.freeze_raw_code(prelude_ptr, self.qstr_links, self.type_sig)
 
 
 class MPYSegment: