You can think of a (single byte character) string as a base-256 encoded number where "x00" represents 0, ' ' (space, i.e., "x20") represents 32 and so on until "xFF", which represents 255.
A representation only with numbers 0-9 can be accomplished simply by changing the representation to base 10.
Note that "base64 encoding" is not actually a base conversion. base64 breaks the input into groups of 3 bytes (24 bits) and does the base conversion on those groups individually. This works well because a number with 24 bits can be represented with four digits in base 64 (2^24 = 64^4).
This is more or less what el.pescado does – he splits the input data into 8-bit pieces and then converts the number into base 10. However, this technique has one disadvantage relatively to base 64 encoding – it does not align correctly with the byte boundary. To represent a number with 8-bits (0-255 when unsigned) we need three digits in base 10. However, the left-most digit has less information than the others. It can either be 0, 1 or 2 (for unsigned numbers).
A digit in base 10 stores log(10)/log(2) bits. No matter the chunk size you choose, you're never going to be able to align the representations with 8-bit bytes (in the sense of "aligning" I've described in the paragraph before). Consequently, the most compact representation is a base conversion (which you can see as if it were a "base encoding" with only one big chunk).
Here is an example with bcmath.
bcscale(0);
function base256ToBase10(string $string) {
//argument is little-endian
$result = "0";
for ($i = strlen($string)-1; $i >= 0; $i--) {
$result = bcadd($result,
bcmul(ord($string[$i]), bcpow(256, $i)));
}
return $result;
}
function base10ToBase256(string $number) {
$result = "";
$n = $number;
do {
$remainder = bcmod($n, 256);
$n = bcdiv($n, 256);
$result .= chr($remainder);
} while ($n > 0);
return $result;
}
For
$string = "Mary had a little lamb";
$base10 = base256ToBase10($string);
echo $base10,"
";
$base256 = base10ToBase256($base10);
echo $base256;
we get
36826012939234118013885831603834892771924668323094861
Mary had a little lamb
Since each digit encodes only log(10)/log(2)=~3.32193
bits expect the number to tend to be 140% longer (not 200% longer, as would be with el.pescado's answer).
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…