Why does this code generate "The Text Could Not Be Converted" at the .toText?

Bryan_Dunphy · November 21, 2020, 4:53pm

This is sample code. The real code is getting an unknown number from a UInt32 array.

Var s As String = Encodings.UTF32.Chr(1604).DefineEncoding(Encodings.UTF32)
System.DebugLog("1604 is " + s)
Var t As Text = s.ToText

Jon_Ogden · November 21, 2020, 4:55pm

Because if the character you are trying to convert is not valid for Text, you will get this error. There are a lot of String characters that are NOT Text characters.

Bryan_Dunphy · November 21, 2020, 5:04pm

The values in the array came from hard-coded test constants in C code where a custom function called “UCS4Print” is used to display the characters. The C code runs perfectly when compiled at the Mac command line.

I’m trying to convert the C code’s functions (punycode en/decode) to Xojo for use in my app. So what do I do now?

Jon_Ogden · November 21, 2020, 5:16pm

Why do you need to use Text? Why not just deploy as string?

Kem_Tekinay · November 21, 2020, 6:05pm

Can you post the C and Xojo code you have so far?

Bryan_Dunphy · November 21, 2020, 8:03pm

Original C code

struct punycode
{
const char *name;
size_t inlen;
uint32_t in[100];
const char *out;
int rc;
};

In reality the test cases go from A to S.

static const struct punycode punycode[] = {
{
"(A) Arabic (Egyptian)", 17,
{
0x0644, 0x064A, 0x0647, 0x0645, 0x0627, 0x0628, 0x062A, 0x0643,
0x0644, 0x0645, 0x0648, 0x0634, 0x0639, 0x0631, 0x0628, 0x064A,
0x061F}, "egbpdaj6bu4bxfgehfvwxn", IDN2_OK}
};

static int debug = 0;
static int error_count = 0;
static int break_on_error = 0;

static void
ucs4print (const uint32_t * str, size_t len)
{
size_t i;

printf ("\t;; ");
for (i = 0; i < len; i++)
{
printf ("U+%04x ", str[i]);
if ((i + 1) % 4 == 0)
printf (" ");
if ((i + 1) % 8 == 0 && i + 1 < len)
printf ("\n\t;; ");
}
puts ("");
}

#include "punycode.h"

int
main (void)
{
char *p;
uint32_t *q;
int rc;
size_t i, outlen;

p = (char *) malloc (sizeof (*p) * BUFSIZ);
if (p == NULL)
fail ("malloc() returned NULL\n");

q = (uint32_t *) malloc (sizeof (*q) * BUFSIZ);
if (q == NULL)
fail ("malloc() returned NULL\n");

for (i = 0; i < sizeof (punycode) / sizeof (punycode[0]); i++)
{
if (debug)
printf ("PUNYCODE entry %d: %s\n", (int) i, punycode[i].name);

  if (debug)
{
  printf ("in:\n");
  ucs4print (punycode[i].in, punycode[i].inlen);
}

  outlen = BUFSIZ;
  rc = _idn2_punycode_encode_internal (punycode[i].inlen, punycode[i].in,
				   &outlen, p);
  if (rc != punycode[i].rc)
{
  fail ("punycode_encode() entry %d failed: %d\n", (int) i, rc);
  if (debug)
    printf ("FATAL\n");
  continue;
}

  if (rc == IDN2_OK)
p[outlen] = '\0';

  if (debug && rc == IDN2_OK)
{
  printf ("computed out: %s\n", p);
  printf ("expected out: %s\n", punycode[i].out);
}
  else if (debug)
printf ("returned %d expected %d\n", rc, punycode[i].rc);

  if (rc == IDN2_OK)
{
  if (strlen (punycode[i].out) != strlen (p) ||
      memcmp (punycode[i].out, p, strlen (p)) != 0)
    {
      fail ("punycode() entry %d failed\n", (int) i);
      if (debug)
	printf ("ERROR\n");
    }
  else if (debug)
    printf ("OK\n\n");
}
  else if (debug)
printf ("OK\n\n");

  if (debug)
{
  printf ("in: %s\n", punycode[i].out);
}

  outlen = BUFSIZ;
  rc = _idn2_punycode_decode_internal (strlen (punycode[i].out),
				   punycode[i].out, &outlen, q);
  if (rc != punycode[i].rc)
{
  fail ("punycode() entry %d failed: %d\n", (int) i, rc);
  if (debug)
    printf ("FATAL\n");
  continue;
}

  if (debug && rc == IDN2_OK)
{
  printf ("computed out:\n");
  ucs4print (q, outlen);
  printf ("expected out:\n");
  ucs4print (punycode[i].in, punycode[i].inlen);
}
  else if (debug)
printf ("returned %d expected %d\n", rc, punycode[i].rc);

  if (rc == IDN2_OK)
{
  if (punycode[i].inlen != outlen ||
      memcmp (punycode[i].in, q, outlen) != 0)
    {
      fail ("punycode_decode() entry %d failed\n", (int) i);
      if (debug)
	printf ("ERROR\n");
    }
  else if (debug)
    printf ("OK\n\n");
}
  else if (debug)
printf ("OK\n\n");
}

free (q);
free (p);

return 0;
}

My Xojo version
Code from the Open event of the test app’s only window

TestCases.Append(New Punycode_Data(_
"(A) Arabic (Egyptian)", 17, Array(&h0644, _
&h064A, &h0647, &h0645, &h0627, &h0628, _
&h062A, &h0643, &h0644, &h0645, &h0648, _
&h0634, &h0639, &h0631, &h0628, &h064A, _
&h061F), "egbpdaj6bu4bxfgehfvwxn", IDN2_OK))

Code from Window.Activate

Me.Listbox1.Visible = False
For Each v As Punycode_Data In TestCases
  PopulateRow(v)
Next
Me.Listbox1.Visible = True

PopulateRow

For Each v As Punycode_Data In TestCases
  Var textInput As Text = ""
  Var ub As UInteger = data.Input_Data.Ubound
  For ndx As UInteger = 0 To ub
    Var c As Text = Encodings.UTF32.Chr(_
    data.Input_Data(ndx)).ToText ' This is the failing toText
    textInput = textInput + c
  Next
  Var enc As Text = Punycode.ToPunycode(textInput)
  Var expected As Boolean = (enc = data.Expected_Output)
  Var dec As Text = Punycode.FromPunycode(enc)
  Var match As Boolean = (dec = textInput)
  Window1.Listbox1.AddRow
  Window1.Listbox1.CellTypeAt(Window1.Listbox1._
  LastRowIndex, 2) = Listbox.CellTypes.CheckBox
  Window1.Listbox1.CellTypeAt(Window1.Listbox1._
  LastRowIndex, 4) = Listbox.CellTypes.CheckBox
  Window1.Listbox1.CellValueAt(Window1.Listbox1._
  LastRowIndex, 0) = textInput
  Window1.Listbox1.CellValueAt(Window1.Listbox1._
  LastRowIndex, 1) = enc
  Window1.Listbox1.CellValueAt(Window1.Listbox1._
  LastRowIndex, 3) = dec
  Window1.Listbox1.CellCheckBoxValueAt(Window1.Listbox1.LastRowIndex, 2) = expected
  Window1.Listbox1.CellCheckBoxValueAt(Window1.Listbox1.LastRowIndex, 4) = match
Next

Kem_Tekinay · November 22, 2020, 5:31am

I assume the array you’re attempting to convert is this?

Array(&h0644, _
&h064A, &h0647, &h0645, &h0627, &h0628, _
&h062A, &h0643, &h0644, &h0645, &h0648, _
&h0634, &h0639, &h0631, &h0628, &h064A, _
&h061F)

If so, it looks to me like the array represents character codes, so you just need to convert them to a string, something like this:

var codes() as UInt64 = array(&h0644, _
&h064A, &h0647, &h0645, &h0627, &h0628, _
&h062A, &h0643, &h0644, &h0645, &h0648, _
&h0634, &h0639, &h0631, &h0628, &h064A, _
&h061F)

var chars() as string

for each code as UInt64 in codes
  chars.AddRow String.Chr( code )
next

var s as string = String.FromArray( chars, "" )

That gives you the string:

ليهمابتكلموشعربي؟

Kem_Tekinay · November 22, 2020, 5:32am

Let me add that C is not my forte so I didn’t analyze that code to see if I’m on the right track.

Tim_Hare · November 22, 2020, 6:59am

Why use UTF32.Chr()? Try UTF8 instead. And I agree with Kem, why Text instead of String?

Bryan_Dunphy · November 22, 2020, 8:54am

The final app needs to run under Desktop, Web, and i(Pad)OS. I THOUGHT I read somewhere that at least one of those platforms does not support String?

Bryan_Dunphy · November 22, 2020, 9:02am

I assumed the values were UTF32 because I googled UCS4 and discovered that it had been absorbed into UTF32.
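
For values in the Unicode range, a UCS-4 value and a UTF-32 code unit are the same number, so each array entry can be read directly as a code point. A minimal sketch in C of the range check that makes a 32-bit value usable as a single character; the function name `is_unicode_scalar` is made up for illustration and is not part of the punycode sources:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if v can stand for one Unicode
   character, i.e. it is at most U+10FFFF and is not in the
   surrogate range U+D800..U+DFFF. */
bool is_unicode_scalar(uint32_t v)
{
    if (v > 0x10FFFF)
        return false;
    if (v >= 0xD800 && v <= 0xDFFF)
        return false;
    return true;
}
```

Every value in test case (A), such as 0x0644, passes this check.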

TimStreater · November 22, 2020, 9:13am

These would appear to be Unicode values, see Unicode/UTF-8-character table - starting from code position 0480.

For instance, 0x0644 is listed there as:

Unicode code point: U+0644
character: ل
UTF-8 (hex.): d9 84
name: ARABIC LETTER LAM
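
To show how the code point and the UTF-8 bytes in that table entry relate, here is a rough C sketch of a UTF-8 encoder. The function name `utf8_encode` is an assumption for illustration; it does not come from the punycode sources or from Xojo.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes into out and returns the number of bytes,
   or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)      /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                    /* 3 bytes */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes */
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling `utf8_encode(0x0644, buf)` returns 2 and leaves the bytes d9 84 in `buf`, matching the table entry above.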

Kem_Tekinay · November 22, 2020, 2:31pm

The code you wrote wouldn’t run in iOS anyway, and iOS will support string soon enough.

AlbertoD · November 22, 2020, 2:31pm

Yes, for now iOS uses Text but if you visit the Xojo Roadmap page you will find that the #1 on the list is bringing API 2.0 to iOS and, with that, the change to String.

Bryan_Dunphy · November 22, 2020, 10:09pm

Thanks for the hint. I discovered Text.FromUnicodeCodepoint and used that to create

For Each v As Punycode_Data In TestCases
  Var textInput As Text = ""
  Var ub As UInteger = v.Input_Data.Ubound
  For ndx As UInteger = 0 To ub
    textInput = textInput + Text.FromUnicodeCodepoint(v.Input_Data(ndx))
  Next

This code is working for now.