unpack
count = unpack (template, expr[, arr])
This function does the reverse of pack — it unpacks the binary data structure expr into the array arr, returning the number of elements stored in the array. If arr is omitted, then unpack extracts a single value and returns it instead.
This is similar to the Perl function of the same name. template has the same format as pack and is a sequence of characters that specify the type of each value, as follows:
• a — an ASCII string, unstripped
• A — an ASCII string, with trailing nulls and spaces removed
• c — a signed 8–bit character value
• C — an unsigned 8–bit character value
• d — a double-precision floating point number in native format
• f — a single-precision floating point number in native format
• i — a signed integer (32–bit) value
• I — an unsigned integer (32–bit) value
• l — a signed long value
• L — an unsigned long value
• n — a short integer value in network (big-endian) order
• N — a long (32–bit) value in network (big-endian) order
• p — a pointer to a null terminated byte string
• P — a pointer to a null terminated Unicode string
• s — a signed short (16–bit) value
• S — an unsigned short (16–bit) value
• x — skip forward a byte
• X — back up a byte
• z — a Unicode string in big-endian (network) order, null padded
• Z — a Unicode string in little-endian order, null padded
• @ — go to absolute position for next field
Each character may be followed by a number that specifies a repeat count. The character and repeat count comprise a field specifier. Field specifiers may be separated by white space. For all type specifiers except a, A, x, X, and @, the repeat count specifies how many values are unpacked from the input data (for pack, how many values from the list of arguments are used). If the repeat count is *, then all remaining values are used.
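As a rough illustration of repeat counts and network byte order, the following sketch packs one 16-bit value and then unpacks it two ways. The function name byte_order_demo and its variable names are hypothetical, and the results noted in the comments follow from the descriptions above rather than from a tested run.

```
function byte_order_demo() {
    local bytes[];
    local packed;
    local n;

    # 258 is 0x0102, so network (big-endian) order stores byte 1, then byte 2.
    packed = pack("n", 258);

    # C2 has a repeat count of 2: two unsigned 8-bit values are unpacked.
    n = unpack("C2", packed, bytes);
    # Expected from the description: n is 2 and bytes holds 1 and 2.

    # Unpacking with the same template used to pack recovers the value.
    return unpack("n", packed);
}
```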
If you specify * as the first character of the template, you are indicating that the second argument is a pointer to a buffer to unpack, and not the buffer itself. Normally, this is not needed for ACL strings but the dl_call function could return such a pointer. If a number follows the leading *, it is the size of the buffer. If the size is omitted, the unpack function will not detect if the template exceeds the bounds of the buffer. This feature should be used with caution since an incorrect template or invalid input could cause Arbortext Editor to terminate unexpectedly.
For specifiers a and A, only a single value is used. The repeat count specifies the size of the field in the input data structure and is also the size of the output string for type a. If type A is specified, trailing nulls and spaces are stripped from the output string. The type specifiers x, X, and @ do not use up any values. The repeat count for x and X specifies how many bytes to skip forward or back up in the input string from the current position before unpacking the next field. The field specifier @* moves to the end of the string. For example:
unpack("@*X3a3", "abcxyz")
returns the last three bytes of the string, “xyz”.
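The same six-byte string can also be split into fixed-width fields or read from a forward offset. The sketch below is a minimal example under the rules just described; field_demo and its variable names are hypothetical, and the expected results in the comments are derived from the text, not from a tested run.

```
function field_demo() {
    local parts[];
    local n;

    # Two 3-byte fields separated by white space; each a field yields one value.
    n = unpack("a3 a3", "abcxyz", parts);
    # Expected: n is 2 and parts holds "abc" and "xyz".

    # x3 skips the first three bytes, so the single value returned should
    # match the result of the "@*X3a3" template: "xyz".
    return unpack("x3a3", "abcxyz");
}
```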
Since Arbortext Editor does not support floating point as a basic type, the unpacked values corresponding to the f and d type specifiers are returned as strings. For example:
unpack("d", pack("d", "72.27"))
returns “72.27”.
The a and A designators convert a possibly multi-byte string in the system character set to a Unicode string. Note that if a field length is specified (for example, a7) and the field ends in the middle of a multi-byte character, the trailing bytes may not be converted to Unicode.
The c and C designators unpack 8-bit characters, not Unicode characters.
The p designator points to a null terminated byte string. The resulting string variable could then be converted by calling unpack or mbstoucs.
The z and Z designators create a byte string of Unicode characters, since strings are normally 8-bit Latin1 (ISO 8859-1) characters.
Here is an example of how to byte-swap a file containing Unicode characters:
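# Read in 512-byte chunks; an even size keeps 16-bit characters whole.
while ((len = read(inf, buf, 512)) > 0) {
    # Interpret the bytes as big-endian 16-bit characters.
    ustr = unpack("z*", buf);
    # Write them back in little-endian order, swapping each byte pair.
    write(outf, pack("Z*", ustr));
}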
This works whether the original file was in big-endian or little-endian order.
If a function called by dl_call returns a pointer to a string, the ACL code must convert it to an ACL string using unpack. A typical method is unpack("p", result). This works, but unpack("*a*", result) is much better. Using “p” simply widens the 8-bit string to 16 bits without any multi-byte to Unicode conversion, so it does not work if the string contains any multi-byte characters. Using “*a*” converts the result from a multi-byte string to Unicode using the current locale. Both methods assume that the result is an 8-bit string. If it is a Unicode string, use unpack with “P”, “*z*”, or “*Z*”. In all cases, the ACL code must know whether the result is an 8-bit string or a Unicode string.
Before using “*a*”, “*z*”, or “*Z*” to unpack the result, make sure the pointer is a valid pointer and not a null pointer.
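A small wrapper can combine the pointer check with the recommended “*a*” conversion. This is only a sketch: ptr_to_string is a hypothetical helper, ptr stands for a value previously returned by dl_call from a function that yields an 8-bit, null-terminated string, and the test against 0 assumes that a null pointer compares equal to 0 in ACL.

```
function ptr_to_string(ptr) {
    # Assumption: a null pointer returned through dl_call compares equal to 0.
    if (ptr == 0) {
        return "";
    }

    # The leading * marks ptr as a pointer to the buffer rather than the
    # buffer itself; a* converts the multi-byte string to Unicode using
    # the current locale. No buffer size is given, so unpack cannot detect
    # a template that runs past the end of the buffer.
    return unpack("*a*", ptr);
}
```

For a function that returns a Unicode string, the same wrapper would need “P”, “*z*”, or “*Z*” in place of “*a*”.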
The Perl type specifiers %, b, B, h, H, and u are currently not supported.