<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="generator" content="Asciidoctor 2.0.23">
<title>Unicode</title>
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Open+Sans:300,300italic,400,400italic,600,600italic%7CNoto+Serif:400,400italic,700,700italic%7CDroid+Sans+Mono:400,700">
<link rel="stylesheet" href="./asciidoctor.css">
<link rel="stylesheet" href="./mlton.css">

</head>
<body class="article">
<div id="mlton-header">
<div id="mlton-header-text">
<h2>
<a href="./Home">
MLton
20241230
</a>
</h2>
</div>
</div>
<div id="header">
<h1>Unicode</h1>
</div>
<div id="content">
<div class="sect1">
<h2 id="_support_in_the_definition_of_standard_ml">Support in The Definition of Standard ML</h2>
<div class="sectionbody">
<div class="paragraph">
<p>There is no real support for Unicode in the
<a href="DefinitionOfStandardML">Definition</a>; there are only a few throw-away
sentences along the lines of "the characters with numbers 0 to 127
coincide with the ASCII character set."</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_support_in_the_standard_ml_basis_library">Support in The Standard ML Basis Library</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Neither is there real support for Unicode in the <a href="BasisLibrary">Basis
Library</a>.  The general consensus (which includes the opinions of the
editors of the Basis Library) is that the <code>WideChar</code> and <code>WideString</code>
structures are insufficient for the purposes of Unicode.  There is no
<code>LargeChar</code> structure, which is itself a deficiency, since a
programmer cannot write code against the largest supported character
size.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_current_support_in_mlton">Current Support in MLton</h2>
<div class="sectionbody">
<div class="paragraph">
<p>MLton, as a minor extension over the Definition, supports UTF-8 byte
sequences in text constants.  This feature enables "UTF-8 convenience"
(but not comprehensive Unicode support); in particular, one can copy
text from a browser, paste it into a string constant in an editor, and,
if the string is printed to a terminal, it will (typically) appear as
the original text.  See the
<a href="SuccessorML#ExtendedTextConsts">extended text constants feature of
Successor ML</a> for more details.</p>
</div>
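<div class="paragraph">
<p>As a rough illustration of the byte-level behavior, the following Python sketch (an analogy, not SML) assumes that a UTF-8 text constant of the 8-bit <code>string</code> type keeps the source bytes exactly, one byte per character element; <code>as_utf8_bytes</code> is a hypothetical helper standing in for the editor that saves the pasted text as UTF-8.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-python" data-lang="python">
```python
# Python analogy, not SML: assumes an 8-bit string keeps the UTF-8 bytes as-is.


def as_utf8_bytes(text):
    """Hypothetical helper: the UTF-8 bytes an editor saves for `text`."""
    return text.encode("utf-8")


pasted = "Grüße, λ"          # text copied from a browser
raw = as_utf8_bytes(pasted)  # what an 8-bit string would hold, byte for byte
print(len(pasted))           # 8 characters (code points)
print(len(raw))              # 11 bytes: ü, ß and λ take two bytes each
# Passing the bytes through untouched and decoding them as UTF-8, as a
# UTF-8 terminal does, yields the original text again.
print(raw.decode("utf-8") == pasted)  # True
```
</code></pre>
</div>
</div>
<div class="paragraph">
<p>Under the same assumption, byte-oriented operations count bytes rather than characters (11 versus 8 here), which is one reason this is only "UTF-8 convenience" rather than comprehensive Unicode support.</p>
</div>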
<div class="paragraph">
<p>MLton, also as a minor extension over the Definition, supports
<code>\Uxxxxxxxx</code> numeric escapes (eight hexadecimal digits,
alongside the Definition's four-digit <code>\uxxxx</code> escapes) in text
constants and has preliminary internal support for 16- and 32-bit
characters and strings.</p>
</div>
<div class="paragraph">
<p>MLton provides <code>WideChar</code> and <code>WideString</code> structures, corresponding
to 32-bit characters and strings, respectively.</p>
</div>
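<div class="paragraph">
<p>To see why 32-bit characters suffice, the following Python sketch (again an analogy, not MLton code) decodes an escape of the form <code>\Uxxxxxxxx</code>, assumed here to consist of exactly eight hexadecimal digits, using a hypothetical helper <code>decode_U_escape</code>, and rejects values beyond the largest Unicode code point, U+10FFFF. A Python <code>str</code> element holds one whole code point, much as a 32-bit <code>WideChar</code> does, while the same character needs two 16-bit units or four UTF-8 bytes.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-python" data-lang="python">
```python
# Python analogy, not SML: assumes \Uxxxxxxxx means exactly 8 hex digits.
import string

# Largest Unicode code point; it needs 21 bits, so one 32-bit char can hold it.
MAX_CODE_POINT = 0x10FFFF


def decode_U_escape(escape):
    """Hypothetical helper: return the code point named by an escape of the
    form \\Uxxxxxxxx (backslash, 'U', then exactly eight hex digits)."""
    digits = escape[2:]
    if not escape.startswith("\\U") or len(digits) != 8:
        raise ValueError("expected \\U followed by eight hex digits")
    if not all(c in string.hexdigits for c in digits):
        raise ValueError("non-hexadecimal digit in escape")
    value = int(digits, 16)
    if value not in range(MAX_CODE_POINT + 1):
        raise ValueError("value lies beyond the Unicode code space")
    return value


cp = decode_U_escape("\\U0001F600")
ch = chr(cp)  # a Python str element holds one whole code point
print(hex(cp))                           # 0x1f600
print(len(ch))                           # 1 element, like one 32-bit WideChar
print(len(ch.encode("utf-32-le")) // 4)  # 1 unit of 32 bits
print(len(ch.encode("utf-16-le")) // 2)  # 2 units of 16 bits (a surrogate pair)
print(len(ch.encode("utf-8")))           # 4 bytes
```
</code></pre>
</div>
</div>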
</div>
</div>
<div class="sect1">
<h2 id="_questions_and_discussions">Questions and Discussions</h2>
<div class="sectionbody">
<div class="paragraph">
<p>There are periodic flurries of questions and discussion about Unicode
in MLton/SML.  In December 2004, there was a discussion that led to
some seemingly sound design decisions.  The discussion started at:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://www.mlton.org/pipermail/mlton/2004-December/026396.html" class="bare">http://www.mlton.org/pipermail/mlton/2004-December/026396.html</a></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>There is a good summary of points at:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://www.mlton.org/pipermail/mlton/2004-December/026440.html" class="bare">http://www.mlton.org/pipermail/mlton/2004-December/026440.html</a></p>
</li>
</ul>
</div>
<div class="paragraph">
<p>In November 2005, there was a follow-up discussion and the beginning of
some coding:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://www.mlton.org/pipermail/mlton/2005-November/028300.html" class="bare">http://www.mlton.org/pipermail/mlton/2005-November/028300.html</a></p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_also_see">Also see</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The <a href="fxp">fxp</a> XML parser has some support for dealing with Unicode
documents.</p>
</div>
</div>
</div>
</div>
<div id="mlton-footer">
<div id="mlton-footer-text">
<div>
Last updated Thu Oct 21 15:53:06 2021 -0400 by Matthew Fluet.
</div>
</div>
</div>
</body>
</html>