NOTE: This code is not yet in a state to be committed, but I wanted to 
publish something at this point in order to start public discussions on 
the approach before I develop it further.  The code, as published, has 
been tested on a model running a kernel that enables memory tagging, but 
there are a number of issues that still need to be resolved before I 
would even consider asking for a merge.  I'm not asking for a review of 
the code as much as a review of the approach at this point in time.

I'm posting this now, before the Cauldron, so that hopefully we can have 
some useful discussions on the matter there.

ARMv8.5 adds an extension known as MTE (memory tagging extension); the 
extension allows granules of memory (16 bytes per granule) to be 
individually 'coloured' with a 4-bit tag value.  Unused bits in the top 
byte of a 64-bit address pointer can then be set to describe the colour 
expected at that address and a protection fault can be raised if there 
is a mismatch.  This can be used for a number of purposes, but it is 
primarily intended to assist with debugging a number of run-time 
faults that are common in code, including buffer overrun faults and 
use-after-free type errors.
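
As a rough illustration of the addressing scheme (this is not code from 
the patch), the sketch below manipulates the colour field of a 64-bit 
address with plain integer arithmetic.  The helper names are 
hypothetical, and the placement of the 4-bit colour in bits 59:56 is an 
assumption about the MTE layout; the text above only says that it lives 
in the top byte.

```c
#include <assert.h>
#include <stdint.h>

#define MTAG_GRANULE_SIZE 16u  /* bytes per granule, as described above */
#define MTAG_TAG_SHIFT    56   /* assumption: colour held in bits 59:56 */
#define MTAG_TAG_MASK     ((uint64_t) 0xf << MTAG_TAG_SHIFT)

/* Return ADDR with its colour field replaced by TAG (0..15).  */
static inline uint64_t
mtag_set_colour (uint64_t addr, unsigned int tag)
{
  assert (tag < 16);
  return (addr & ~MTAG_TAG_MASK) | ((uint64_t) tag << MTAG_TAG_SHIFT);
}

/* Extract the colour that a pointer expects to find in memory.  */
static inline unsigned int
mtag_get_colour (uint64_t addr)
{
  return (unsigned int) ((addr & MTAG_TAG_MASK) >> MTAG_TAG_SHIFT);
}

/* Index of the granule containing ADDR, ignoring the top byte.  */
static inline uint64_t
mtag_granule_index (uint64_t addr)
{
  return (addr & 0x00ffffffffffffffULL) / MTAG_GRANULE_SIZE;
}
```

Two pointers to the same location that carry different colours differ 
as integers, which is why a recoloured pointer does not compare equal 
to the original.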

Nomenclature: The AArch64 extension is called MTE.  I've tried to use 
the term 'memory tagging' (mtag) in the generic code to keep the layers 
separate.  Ideally mtag can be used on multiple architectures.

This patch proposes a way to utilize the extension to provide such 
protection in Glibc's malloc() API.  Essentially the code here is 
divided into four logical sets of changes, though for the purposes of 
this discussion I've rolled this up into a single patch set.

Part 1 is a simple change to the configuration code to allow memory 
tagging support to be built into glibc.
Part 2 introduces a new (currently internal) API within glibc that 
provides access to the architecture-specific memory tagging operations. 
The API essentially compiles down to either no-ops or existing standard 
library functions when the extension is disabled.
Part 3 is the bulk of the changes to malloc/malloc.c to use the API to 
colour memory; I've tried to ensure that when the extension is disabled 
there is no overhead on existing users.  If the extension is enabled 
during the build, but disabled at run time (e.g. to support systems that 
do not have the extension), then there are some minor overheads, but 
they are hopefully not significant.
Part 4, finally, is a set of target-specific changes for AArch64; when MTE is 
enabled we have to be very careful about buffer overruns on read 
operations.  Consequently we have to constrain some of the string 
operations to ensure that they do not unsafely read across a tag granule 
boundary.  This code is very preliminary at present - eventually we 
would want to be able to select the code at run time and revert back to 
the standard code if tagging is disabled.
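
To make the granule-boundary constraint from part 4 concrete, here is a 
minimal sketch (not the code in the patch, and with hypothetical helper 
names) of the check a wide read would need to satisfy: a load of n bytes 
starting at p is only known to be safe if it stays within the granule 
that contains p.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE ((uintptr_t) 16)  /* MTE granule size in bytes */

/* Number of bytes from P up to the next granule boundary (1..16).  */
static inline size_t
bytes_to_granule_end (const void *p)
{
  return (size_t) (GRANULE_SIZE - ((uintptr_t) p & (GRANULE_SIZE - 1)));
}

/* Non-zero if an N-byte load starting at P touches only the granule
   that contains P.  A zero-length load is not meaningful here.  */
static inline int
load_stays_in_granule (const void *p, size_t n)
{
  assert (n > 0);
  return n <= bytes_to_granule_end (p);
}
```

A 16-byte load from a 16-byte-aligned address always passes this check, 
so one plausible way to constrain a string routine is to align the 
pointer first and then use aligned wide loads.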

Parts 2 and 3 are obviously the focus of the discussion I'd like to have 
at present.

For part 2, the API is currently entirely private within glibc, but 
potentially in future (once we're happy that the concept is stable) it 
might be useful to open this up to users as well.

Part 3, which is the bulk of the changes, colours all memory obtained 
through the malloc API.  Each call to malloc (or any of the more 
aligned variants) or realloc will return a coloured pointer if the 
extension is enabled:

- for realloc I've chosen to recolour all affected memory even if the 
same logical address is returned (the pointer will contain a different 
colour, ensuring that before-and-after pointers will not compare equal). 
This is perhaps the most extreme position, but in some cases it might 
catch code that wrongly assumes the address is unchanged, or that 
continues to use the pre-realloc address.

- colouring is mostly done at the outermost level of the library.  This 
is not necessarily the most efficient point to do this, but it certainly 
creates the least disruption in the code base.  The main exception to 
this is realloc where the separation of the layers is not quite as clean 
as for the other entry points.  The advantage of colouring at the 
outermost level is that calloc() can combine the clearing of the memory 
with the colouring process if the architecture supports that (MTE does).

- one colour is retained for use by libc itself.  This can be used to 
detect when user code goes outside the allocated buffer region.

- free() recolours the memory; this is a run-time overhead but is useful 
for catching use-after-free accesses.
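
The colouring policy in the list above can be modelled with a toy 
simulation.  This is only a sketch under stated assumptions, not the 
malloc.c changes: the tag store is an ordinary array rather than real 
tag memory, all names are hypothetical, colour 0 is assumed to be the 
one retained for libc, and free is assumed to recolour back to that 
colour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NGRANULES   64u  /* size of the simulated heap, in granules */
#define LIBC_COLOUR 0u   /* assumption: the colour retained for libc */

/* One colour per granule; everything starts out as LIBC_COLOUR.  */
static uint8_t tag_store[NGRANULES];
static unsigned int next_colour = 1;

/* Cycle through colours 1..15, never handing out LIBC_COLOUR.  */
static unsigned int
toy_new_colour (void)
{
  unsigned int c = next_colour;
  next_colour = next_colour == 15 ? 1 : next_colour + 1;
  return c;
}

/* Colour COUNT granules starting at granule FIRST.  */
static void
toy_colour_region (size_t first, size_t count, unsigned int colour)
{
  assert (first + count <= NGRANULES && colour < 16);
  for (size_t i = first; i < first + count; i++)
    tag_store[i] = (uint8_t) colour;
}

/* Model of an allocation: colour the block and return the colour that
   the returned pointer would carry.  */
static unsigned int
toy_alloc (size_t first, size_t count)
{
  unsigned int c = toy_new_colour ();
  toy_colour_region (first, count, c);
  return c;
}

/* Model of free: recolour the block so stale pointers mismatch.  */
static void
toy_free (size_t first, size_t count)
{
  toy_colour_region (first, count, LIBC_COLOUR);
}

/* Would an access to granule G through a pointer of colour PTR_COLOUR
   pass the tag check?  */
static int
toy_access_ok (unsigned int ptr_colour, size_t g)
{
  return g < NGRANULES && tag_store[g] == ptr_colour;
}
```

In this model an access one granule past the end of a block fails 
because that granule still carries the libc colour, and an access 
through a pointer after toy_free fails because the block has been 
recoloured.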

Limitations in the prototype:

MTE has a granule size (minimum colourable memory block) of 16 bytes. 
This happens to fit well with malloc's block header, which on aarch64 is 
also 16 bytes in size and thus leads to little overhead in the data 
structures.  I haven't attempted yet to look at support for other sizes, 
but I suspect that things will become a bit messy if the granule is 
larger than the block header (it will certainly be less efficient).
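
For reference, the arithmetic implied by a 16-byte granule is simple: 
any region to be coloured has to be rounded up to a whole number of 
granules.  A minimal sketch follows, with hypothetical helper names; 
the patch itself may express this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE_SIZE ((size_t) 16)  /* MTE granule size in bytes */

/* Round N up to the next multiple of the granule size.  */
static inline size_t
round_up_to_granule (size_t n)
{
  assert (n <= SIZE_MAX - (GRANULE_SIZE - 1));  /* no wrap-around */
  return (n + GRANULE_SIZE - 1) & ~(GRANULE_SIZE - 1);
}

/* Number of granules that must be coloured to cover N bytes.  */
static inline size_t
granules_needed (size_t n)
{
  return round_up_to_granule (n) / GRANULE_SIZE;
}
```

A 40-byte request, for example, covers three granules (48 bytes), so 
the last 8 bytes share a colour with the user's data even though they 
were not requested.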

At present, the code simply assumes that all memory could be tagged 
(though it works correctly if it is not).  We are in discussions with 
the kernel folk about the possible syscall API extensions that might be 
needed to request taggable memory from the kernel.

I've written enough for now.  Let the discussions begin...

-----

[mtag] Allow memory tagging to be enabled from the command line

This patch adds the configuration machinery to allow memory tagging to be
enabled from the command line via the configure option 
--enable-memory-tagging.

The current default is off, though in time we may change that once the API
is more stable.

[AArch64][mtag] Basic support for memory tagging

This patch adds the basic support for memory tagging.  This is very much
preliminary code and is unlikely to be in its final form.

- generic/libc-mtag.h - default implementation of the memory tagging
   interface.  Maps most functions onto no-ops, a few are mapped back onto
   existing APIs (e.g. memset).
- aarch64/libc-mtag.h - implementation for AArch64.
- aarch64/__mtag_* - helper functions for memory tagging (unoptimized).
- malloc/malloc.c - updates to support tagging of memory allocations.
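
To give a flavour of what a generic fallback of this kind could look 
like, here is a minimal sketch.  The function names are hypothetical 
and are not the names used in the patch; the only properties taken 
from the description above are that most operations are no-ops and 
that a combined clear-and-colour operation falls back to memset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: colour N bytes at P to match P's tag.  With tagging
   disabled there is nothing to do, so just hand the pointer back.  */
static inline void *
toy_mtag_tag_region (void *p, size_t n)
{
  (void) n;
  return p;
}

/* Hypothetical: return P with a fresh colour.  No-op fallback.  */
static inline void *
toy_mtag_new_tag (void *p)
{
  return p;
}

/* Hypothetical: set N bytes at P to C and colour them in one pass.
   Without tagging this is just memset.  */
static inline void *
toy_mtag_memset_with_tag (void *p, int c, size_t n)
{
  assert (p != NULL);
  return memset (p, c, n);
}
```

Because the fallbacks are trivial inline functions, a compiler can 
normally remove the no-op calls entirely, which is what keeps the 
disabled case free of overhead.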

[AArch64][mtag] Mitigations for string functions when MTE is enabled.

This is an initial set of patches to mitigate problems in the string
functions when MTE is enabled.  Most of the changes are sub-optimal, but
they should avoid the boundary conditions that can cause spurious MTE
faults.