py: Improve allocation policy of qstr data.

Prior to this patch each interned string lived in its own malloc'd
chunk.  On average this wastes N/2 bytes per interned string, where N is
the allocation quantum of the memory allocator in bytes (16 bytes on
32-bit archs).
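
The N/2 figure follows from rounding each request up to a whole
quantum.  A minimal sketch of that arithmetic, assuming a 16-byte
quantum and no other per-allocation overhead (the names QUANTUM,
rounded_size and waste_separate are illustrative, not from the patch):

```c
#include <assert.h>
#include <stddef.h>

// Assumed allocator quantum: 16 bytes, the figure quoted for 32-bit archs.
#define QUANTUM 16

// Bytes actually consumed once a request is rounded up to a whole quantum.
static size_t rounded_size(size_t n) {
    assert(n > 0);
    return (n + QUANTUM - 1) / QUANTUM * QUANTUM;
}

// Bytes lost when an n-byte string is given a chunk of its own.
static size_t waste_separate(size_t n) {
    return rounded_size(n) - n;
}
```

Under these assumptions a 5-byte string wastes 11 bytes, and sizes spread
evenly over one quantum waste a little under N/2 bytes each on average.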

With this patch interned strings are concatenated into the same malloc'd
chunk when possible.  Such chunks are enlarged in place when possible,
and shrunk to fit when a new chunk is needed.
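
The policy above can be sketched roughly as follows.  This is not the
code from the patch: the struct fields only mirror the three state
variables added in mpstate.h, and CHUNK_MIN, try_grow_in_place,
release_tail_in_place and pool_add are hypothetical names.  Standard C
has no "resize without moving" call, so both stand-ins do nothing and
the sketch always falls through to a fresh chunk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_MIN 32  // hypothetical minimum size of a fresh chunk

typedef struct {
    char *chunk;   // most recently allocated chunk
    size_t alloc;  // bytes allocated for that chunk
    size_t used;   // bytes of that chunk already holding string data
} pool_t;

// Stand-in for an allocator call that grows a block without moving it.
// Assumption: always fails here, since plain C offers no such call.
static int try_grow_in_place(char *p, size_t old_size, size_t new_size) {
    (void)p; (void)old_size; (void)new_size;
    return 0;
}

// Stand-in for shrinking a block to fit without moving it.
// A real allocator would hand the unused tail back; this is a no-op.
static void release_tail_in_place(char *p, size_t used) {
    (void)p; (void)used;
}

// Append a string to the current chunk, growing it or starting a new one.
// Old chunks are never freed: interned data must stay valid.
static char *pool_add(pool_t *s, const char *str, size_t len) {
    size_t need = len + 1;  // include the terminating NUL
    if (s->chunk == NULL || s->used + need > s->alloc) {
        size_t grown = s->used + need;
        if (s->chunk != NULL && try_grow_in_place(s->chunk, s->alloc, grown)) {
            s->alloc = grown;  // enlarged in place, keep appending
        } else {
            if (s->chunk != NULL) {
                release_tail_in_place(s->chunk, s->used);  // shrink to fit
            }
            size_t size = need > CHUNK_MIN ? need : CHUNK_MIN;
            char *fresh = malloc(size);
            if (fresh == NULL) {
                return NULL;
            }
            s->chunk = fresh;
            s->alloc = size;
            s->used = 0;
        }
    }
    char *dest = s->chunk + s->used;
    memcpy(dest, str, len);
    dest[len] = '\0';
    s->used += need;
    return dest;
}
```

Short strings added back to back land next to each other in one chunk, so
only the tail of the last chunk is left unused rather than the tail of
every string's allocation.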

RAM savings with this patch are highly varied, but should always show an
improvement (unless only 3 or 4 strings are interned).  The new version
typically uses about 70% of the memory previously needed for the qstr
data, and can reduce the total memory footprint of a running script by
around 10%.

Costs about 120 bytes of code size on Thumb2 archs (the exact figure
depends on how many calls to gc_realloc are made).
diff --git a/py/mpstate.h b/py/mpstate.h
index 42593e4..dd185a7 100644
--- a/py/mpstate.h
+++ b/py/mpstate.h
@@ -131,6 +131,12 @@
     // END ROOT POINTER SECTION
     ////////////////////////////////////////////////////////////
 
+    // pointer and sizes to store interned string data
+    // (qstr_last_chunk can be root pointer but is also stored in qstr pool)
+    byte *qstr_last_chunk;
+    mp_uint_t qstr_last_alloc;
+    mp_uint_t qstr_last_used;
+
     // Stack top at the start of program
     // Note: this entry is used to locate the end of the root pointer section.
     char *stack_top;