Symbol Conversion UTF-8 ?

I need to convert from an (Integer)Number …to… (String)Chr

If I convert from (Int)129 to (Str)Chr, I get: apparently nothing; at least in a TextField or MsgBox it shows up as a space character
If I convert from (Int)129 to (Str)Hex, I get: “0x81” which makes sense
If I convert from (Str)0x81 hex back to (Str)Chr, I get: “?” (yes, the dark question-mark symbol)

In PHP, if I do the same conversion from (Int)129, I get the (string)“ü” which makes sense

That said, I don’t think Xojo is storing the right Chr after my conversion from INT > CHR.

Does anyone know what is happening? How do I double-check this char conversion? Is it an encoding problem?

Yes, it is an encoding issue. Xojo defaults to UTF-8; PHP does not.

[code]Dim value As Integer = 129

Dim char As String = Chr(value) // doesn’t print since &h81 is a control character
Dim hex As String = Hex(value) // 81

Dim back As Integer = Val("&h" + hex) // 129[/code]

… and PHP strings are not UTF-8 as far as I know, so &h81 shows up as “ü” because of the encoding in your PHP settings.
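To see what the encoding difference means at the byte level, here is a minimal sketch. It assumes, as the Xojo documentation describes, that Chr returns a UTF-8 encoded character for values above 127, while ChrB returns one raw byte. The variable names are only illustrative.

[code]Dim viaChr As String = Chr(129)    // the character U+0081, stored as UTF-8
Dim viaChrB As String = ChrB(129)  // a single raw byte, no encoding attached

// EncodeHex shows the bytes each string actually holds
MsgBox "Chr: " + EncodeHex(viaChr) + "   ChrB: " + EncodeHex(viaChrB)[/code]

Under that assumption, the first string would be expected to hold two bytes and the second only one, which is why the two are not interchangeable once the data leaves the application.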

OK, but I need to write this Chr value into a TCP socket.
Is it OK to write Chr(129) even if it does not display on screen?

Or maybe it would be a better idea to write Hex(129)?

Do you need to send the value or the string representation?
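The difference between the value and its string representation can be sketched like this. It is an illustration only; the variable names are made up for the example.

[code]Dim rawByte As String = ChrB(129)  // the value itself: one byte, &h81
Dim hexText As String = Hex(129)   // its textual representation: the characters "81"

// LenB counts bytes, so the two strings report different sizes
MsgBox Str(LenB(rawByte)) + " vs. " + Str(LenB(hexText))[/code]

A binary protocol normally expects the first form. Sending the hex text would put the character codes of “8” and “1” on the wire instead of the byte &h81.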

Not sure, I am converting this from a PHP code and this variable is a String (as per var_dump) so I presume I have to write a string.

PHP Code:

    $frameHead[0] = 129;
    $frameHead[1] = ($masked === true) ? $payloadLength + 128 : $payloadLength;

    // convert frame-head to string:
    foreach (array_keys($frameHead) as $i) {
        $frameHead[$i] = chr($frameHead[$i]);
    }

In Xojo, I get different results.

The MsgBox is empty (or shows one space):

    Dim a As String = Chr(129)
    Dim b As Integer = Val(a)
    MsgBox Str(b)

The MsgBox is correct:

    Dim c As String = Hex(&h81)
    Dim d As Integer = Val("&h81")
    MsgBox Str(d)

Of course it is, you need to use Asc, not Val.

    Dim a As String = Chr(129)
    Dim b As Integer = Asc(a)
    MsgBox Str(b) // 129

Very confusing. They say that PHP is much more difficult to learn but I think Xojo is outside the box, lol. Thanks!

By the way, what is the outcome of a MemoryBlock ? Is it a String ?

A MemoryBlock is a block of memory. A String is a data type.
You can assign a string to a MemoryBlock and vice versa and Xojo will do the necessary conversion.

Ok but I understand that a String stores text, an Integer stores numbers, a Float/Double stores numbers with decimal places, an Array stores multiple values (in PHP multiple datatypes, in Xojo one datatype only) and so on, but I don’t understand what a MemoryBlock stores or in what it differs from a String or an Array.

What does a block of memory look like? For instance, a String looks like “Hello World!” and an Integer looks like 12345, but what about a MemoryBlock?

[quote=114054:@Walter Sander]If I convert from (Int)129 to (Str)Chr, I get: apparently nothing; at least in a TextField or MsgBox it shows up as a space character
If I convert from (Int)129 to (Str)Hex, I get: “0x81” which makes sense
If I convert from (Str)0x81 hex back to (Str)Chr, I get: “?” (yes, the dark question-mark symbol)

In PHP, if I do the same conversion from (Int)129, I get the (string)“ü” which makes sense

That said, I don’t think Xojo is storing the right Chr after my conversion from INT > CHR.[/quote]

The proper way of getting a character from a numeric value is Chr(x). It returns the character at position x in the character table (ASCII, if you will). The example you cite for PHP seems to produce ü from the extended ASCII character table described at
http://www.theasciicode.com.ar/extended-ascii-code/letter-u-with-umlaut-diaeresis-lowercase-u-umlaut-ascii-code-129.html

You will never get anywhere if you try character 129 on a PC under Windows.

Absolutely no regular encoding for Mac or Windows uses position 129 for ü (Unicode point 00FC). In Mac Roman it is ‘A ring’ Å and in Windows CP 1252 and ISO 8859 129 is not assigned, so it displays nothing.

The normal way of getting a particular character from its numeric position is Chr(129), not Str(129). Chr(129) would display Å on a Mac and nothing on a PC.

The best way to deal with accented characters across all systems is to use Unicode, as we do in Xojo with &U00FC, which displays ü. I looked for PHP and Unicode on the net, but there does not seem to be an equivalent for converting strings into Unicode.

The best you could do is to build a dictionary which translates the ASCII values in the table used by PHP into valid Unicode points, so you can convert back and forth.

All right. I must be slow. The chr(129) ü actually comes from the antiquated CP-437 original IBM PC character table shown at the link I posted above, as well as at http://www.ascii-codes.com/

That was way before Unicode, and it was left aside as soon as Windows came about in 1985.

Xojo uses UTF-8 which transparently fetches characters from all sorts of systems, but CP-437 is decidedly too old.
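If the goal really is to interpret the byte 129 the way CP-437 does, one possible approach is to label the raw byte with a DOS encoding and then convert it. This is a sketch only, and it assumes that Encodings.DOSLatinUS corresponds to code page 437.

[code]Dim raw As String = ChrB(129)  // the byte &h81, with no encoding attached

// Assumption: DOSLatinUS is Xojo's name for code page 437
Dim dosText As String = DefineEncoding(raw, Encodings.DOSLatinUS)
Dim utf8Text As String = ConvertEncoding(dosText, Encodings.UTF8)

MsgBox utf8Text[/code]

DefineEncoding only tells Xojo how to interpret the existing bytes, while ConvertEncoding produces new bytes for the same characters in UTF-8.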

What does the PHP function write into the Socket then ? Isn’t that a String ?

$frameHead[0] = chr($frameHead[0]); // this is a ü symbol, and print_r or var_dump reports its type as string

Maybe what I need to write into that TCPSocket is a binary value. I don’t know; it looks confusing to me.

PHP handles variables much more easily because we don’t have to say whether something is going to be an Integer or a String. PHP will figure it out for you, and if you change the variable value from 1 to “1” it will automatically be changed from Integer to String.

[quote=114169:@Walter Sander]What does the PHP function write into the Socket then ? Isn’t that a String ?

$frameHead[0] = chr($frameHead[0]); // this is a ü symbol, and print_r or var_dump reports its type as string

Maybe what I need to write into that TCPSocket is a binary value. I don’t know; it looks confusing to me.

PHP handles variables much more easily because we don’t have to say whether something is going to be an Integer or a String. PHP will figure it out for you, and if you change the variable value from 1 to “1” it will automatically be changed from Integer to String.[/quote]

If you like untyped variables in PHP so much, you can use the Xojo Variant type, which will do the same. But don’t complain when you get a bug from it.

If I say “Pomme” in French or “Apple” in English, both mean the same fruit. With a 1982 IBM PC and apparently your version of PHP, chr(129) means ü. With a Mac using a Mac Roman encoded font, chr(159) means ü. For a Windows machine, chr(252) means ü. Now what does the socket send? If it is the web, here is the table for web encoding: http://www.utexas.edu/learn/html/spchar.html. For a web browser, &uuml; means ü.

I know, it is confusing. In Xojo, &U00FC means ü no matter the system: Mac, Linux, Windows, Web. UTF-8 makes all systems cooperate to fetch the right Unicode character.

This forum is not about PHP programming, so it is up to you to do the proper research. However, you may want to look at the PHP chr manual:
http://php.net/manual/en/function.chr.php

This, in particular, seems to be what you need :

[code]Another quick and short function to get unicode char by its code.

<?php
/**
 * Return unicode char by its code
 *
 * @param int $u
 * @return char
 */
function unichr($u) {
    return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
}
?>[/code]

PHP is not a general purpose language like Xojo. The name indicates that: Personal Home Page Tools (years later changed to something else). So you can’t compare the two.

No, bytes.

[quote=114054:@Walter Sander]I need to convert from an (Integer)Number …to… (String)Chr[/quote]

Reading your question again, after explaining in detail the many ways to get a ü, I realize I did not ask the most important questions:

  • Where does that integer come from ?
  • What do you want to obtain ?

If the integer is 129 and the character you want is ü, then there is no direct way to get that in Xojo, as I tried to outline above. If you simply want to get ü, then on a PC the integer must be 252 and on a Mac 159.

Eli Ott, thanks a lot. Grateful :slight_smile:

Michel, I know this forum is not about PHP, but I need to convert from an existing PHP demo script because I haven’t found a demo in Xojo and I understand they are different languages for different purposes and thus incomparable. I just aim to achieve the same result of this script :slight_smile: I really really appreciate your super comprehensive answers/explanations.

The integer is an INT (I believe it represents the Decimal value from an ASCII table). This is the PHP Script:

[code]function hybi10Encode($payload, $type = 'text', $masked = true) {
$frameHead = array();
$frame = '';
$payloadLength = strlen($payload);

switch ($type) {
    case 'text':
        // first byte indicates FIN, Text-Frame (10000001):
        $frameHead[0] = 129;
        break;

    case 'close':
        // first byte indicates FIN, Close Frame(10001000):
        $frameHead[0] = 136;
        break;

    case 'ping':
        // first byte indicates FIN, Ping frame (10001001):
        $frameHead[0] = 137;
        break;

    case 'pong':
        // first byte indicates FIN, Pong frame (10001010):
        $frameHead[0] = 138;
        break;
}

// set mask and payload length (using 1, 3 or 9 bytes)
if ($payloadLength > 65535) {
    $payloadLengthBin = str_split(sprintf('%064b', $payloadLength), 8);
    $frameHead[1] = ($masked === true) ? 255 : 127;
    for ($i = 0; $i < 8; $i++) {
        $frameHead[$i + 2] = bindec($payloadLengthBin[$i]);
    }

    // most significant bit MUST be 0 (close connection if frame too big)
    if ($frameHead[2] > 127) {
        $this->close(1004);
        return false;
    }
} elseif ($payloadLength > 125) {
    $payloadLengthBin = str_split(sprintf('%016b', $payloadLength), 8);
    $frameHead[1] = ($masked === true) ? 254 : 126;
    $frameHead[2] = bindec($payloadLengthBin[0]);
    $frameHead[3] = bindec($payloadLengthBin[1]);
} else {
    $frameHead[1] = ($masked === true) ? $payloadLength + 128 : $payloadLength;
}

// convert frame-head to string:
foreach (array_keys($frameHead) as $i) {
    $frameHead[$i] = chr($frameHead[$i]);
}

if ($masked === true) {
    // generate a random mask:
    $mask = array();
    for ($i = 0; $i < 4; $i++) {
        $mask[$i] = chr(rand(0, 255));
    }

    $frameHead = array_merge($frameHead, $mask);
}
$frame = implode('', $frameHead);
// append payload to frame:
for ($i = 0; $i < $payloadLength; $i++) {
    $frame .= ($masked === true) ? $payload[$i] ^ $mask[$i % 4] : $payload[$i];
}

return $frame;

}[/code]

What I want to obtain is a masked frame to send through my TCP Socket (which is a WebSocket connection actually)
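As a rough idea of how the PHP function above could be carried over to Xojo, here is a sketch that builds the frame as raw bytes in a MemoryBlock. It is not a tested port: EncodeTextFrame is a hypothetical name, only text frames are handled, the close(1004) check is left out, and it assumes the payload is already encoded as UTF-8.

[code]Function EncodeTextFrame(payload As String, masked As Boolean) As String
  // Hypothetical helper: text frames only, loosely following hybi10Encode
  Dim payloadLength As Integer = LenB(payload)

  // Two header bytes, plus 2 or 8 bytes of extended length when needed
  Dim headerSize As Integer = 2
  If payloadLength > 65535 Then
    headerSize = 10
  ElseIf payloadLength > 125 Then
    headerSize = 4
  End If

  Dim maskSize As Integer = 0
  Dim maskBit As Integer = 0
  If masked Then
    maskSize = 4
    maskBit = 128
  End If

  Dim frame As New MemoryBlock(headerSize + maskSize + payloadLength)
  frame.LittleEndian = False // length fields go out in network byte order

  frame.Byte(0) = 129 // FIN + text frame (10000001)
  If payloadLength > 65535 Then
    frame.Byte(1) = maskBit + 127
    frame.UInt64Value(2) = payloadLength
  ElseIf payloadLength > 125 Then
    frame.Byte(1) = maskBit + 126
    frame.UInt16Value(2) = payloadLength
  Else
    frame.Byte(1) = maskBit + payloadLength
  End If

  // Copy the payload bytes in after the header and the mask
  Dim offset As Integer = headerSize + maskSize
  If payloadLength > 0 Then
    frame.StringValue(offset, payloadLength) = payload
  End If

  If masked Then
    // Four random mask bytes, then XOR each payload byte in place
    Dim rng As New Random
    For i As Integer = 0 To 3
      frame.Byte(headerSize + i) = rng.InRange(0, 255)
    Next
    For i As Integer = 0 To payloadLength - 1
      frame.Byte(offset + i) = frame.Byte(offset + i) Xor frame.Byte(headerSize + (i Mod 4))
    Next
  End If

  Return frame.StringValue(0, frame.Size)
End Function[/code]

The returned string holds bytes rather than text, so it could be passed as-is to the Write method of a TCPSocket. Working with numeric byte values throughout avoids the Chr and encoding questions entirely.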

[quote=114136:@Walter Sander]Ok but I understand that a String stores text, an Integer stores numbers, a Float/Double stores numbers with decimal places, an Array stores multiple values (in PHP multiple datatypes, in Xojo one datatype only) and so on, but I don’t understand what a MemoryBlock stores or in what it differs from a String or an Array.

What does a block of memory look like? For instance, a String looks like “Hello World!” and an Integer looks like 12345, but what about a MemoryBlock?[/quote]

Memoryblocks store bytes - raw bytes
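As a small illustration of what “raw bytes” means, the sketch below fills a two-byte MemoryBlock with numeric values and then reads the same bytes back as a string. The values are arbitrary examples.

[code]Dim mb As New MemoryBlock(2)
mb.Byte(0) = 129  // &h81
mb.Byte(1) = 5    // any value from 0 to 255

// The same two bytes, viewed as a string with no encoding attached
Dim asString As String = mb.StringValue(0, mb.Size)
MsgBox EncodeHex(asString)[/code]

So a MemoryBlock “looks like” a row of numbered slots, each holding a number from 0 to 255. Whether those numbers mean text, integers, or anything else depends on how you read them.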

Norman, thanks. I think I was confused because a hex code really looks like a byte code.