A method to encode number array into printable text in AS2  

     

There are many methods to convert a number array into a printable text string. The most commonly used one is simply print out the decimal form of the number, and use a delimiter such as "," to separate numbers.

This method is simple and straight forward. However, when you look at the result string you will realize that it is a little bit waste of space. For example, an integer that is less than 256 technically needs only 1 byte in binary format, but in the former format, 3 characters will be used to express number 255. the larger the number is, the more bytes it requires than it should be.

The delimiter "," is also quite inefficient because the one byte it occupies serves only one purpose when it is capable of representing 256 different states.

So, I think theatrically the efficiency of this method has significant room of improvement.

Let's begin with the number encoding.

The basic idea is to use extra characters other than 0-9 to encode a number. For example if we add "a-z" and "A-Z" to the encoding alphabet, we can encode 61 within one byte. This way we can shorten the length of encoded number dramatically.

Then how do we deal with negative numbers and decimal fractions?
The decimal solution use "-" to mark a negative number. We can use this method too but it is not the best way. There is a method called "zigzag" encoding, which use a simple calculation to convert both negative numbers and positive numbers into positive numbers. For negative numbers, it first convert the number to its positive format, then multiply the result by 2 and then minus 1. So we have a positive odd number. For positive numbers, just multiply it by 2. So the result is a even positive number. This way we can eliminate the use of negative mark. So 0,-1,1,-2,2 will be converted to 0,1,2,3,4 .

For decimal fractions, we also convert them to integer first. I chose a method that try to find a scale that can eliminate the fraction when it is multiplied to the number. Then we encode both the result integer and the scale, so we can recover the original number by this 2 numbers later in the decoding process.

How do we separate numbers without delimiters? One solution is to fixed length for every number in the array. This method is most efficient when numbers in the array are not much different from each other. But here I choose a flexible solution that every number marks its own end within the encoding.

The solution mimics the implementation of variant binary format. The format uses the highest bit of every byte to identify whether it is the end of the data. So only 7 bits out of 8 of a byte is used to encode data. Which means it trade the capacity of the byte with flexibility. For numbers less than 8 characters long, it is more efficient than the delimiter solution. And it is more efficient than fixed length solution if its members varies a lot in sizes.

Practically I divide the alphabet in halves, and use only one half to encode the last character of the number and the other for the rest.

In real implementation I actually divide the whole alphabet into 3 parts. "0-9" are used for encoding scales, "a-z" are used for encoding last character of the number and "A-Z" are used for others. The scale part is only necessary when the following number uses a different scale than the previous one in order to further reduce the length of encoded string.

My test shows that the encoded string comes out of this method saves 50% of space than Array.toString() method averagely and In some cases it is only 20% long of the latter.

The attachment is the AS2.0 implementation of the encoding method described above.

The usage is simple. Call the static encode() method with an array as parameter, the result value will be the encoded string. For the decoding process, just call the decode() method and pass in the string, the original array will be returned.

AttachmentSize
PrtEncoder.as8.34 KB