Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm representing the nucleotides A, C, G, T as 0, 1, 2, 3, and afterwards I need to convert the resulting quaternary (base-4) sequence to decimal. Is there a way to do this in Perl? I'm not sure whether pack/unpack can do it.



1 Answer

Each base-4 digit corresponds to exactly 2 bits, so the conversion is easy to handle efficiently.

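# Number of bits in a native unsigned integer (32 or 64).
my $uvsize = length(pack('J>', 0)) * 8;
# Map each base-4 digit to its zero-padded 2-bit form: 0 => '00' ... 3 => '11'.
my %base4to2 = map { $_ => sprintf('%02b', $_) } 0..3;

sub base4to10 {
   my ($s) = @_;
   $s =~ s/(.)/$base4to2{$1}/sg;                  # Digits to bits.
   $s = substr(("0" x $uvsize) . $s, -$uvsize);   # Left-pad to the integer width.
   return unpack('J>', pack('B*', $s));           # Bits to a native unsigned integer.
}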

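This handles inputs of up to 16 digits on builds with 32-bit integers, and up to 32 digits on builds with 64-bit integers. Longer inputs are silently truncated to their low-order digits, because substr keeps only the last $uvsize bits.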

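On builds where floating-point numbers hold more bits than native integers, it's possible to support slightly larger numbers by using floats: 26 digits on builds with IEEE doubles (53-bit significand), 56 digits on builds with IEEE quads (113-bit significand). This would require a different implementation.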
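One such different implementation could accumulate the digits arithmetically. This is only a minimal sketch, not the answerer's code: the name base4to10_arith is made up for illustration, and it relies on Perl's usual behaviour of switching to floating point when an integer result overflows (so it must not run under "use integer").

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): convert a base-4 digit
# string by repeated multiply-and-add instead of bit packing.
sub base4to10_arith {
    my ($s) = @_;
    die "not a base-4 string: '$s'\n" unless $s =~ /\A[0-3]+\z/;

    my $n = 0;
    for my $digit (split //, $s) {
        # Shift the running value one base-4 place left, then add the digit.
        $n = $n * 4 + $digit;
    }
    return $n;
}

print base4to10_arith('123'), "\n";   # 1*16 + 2*4 + 3 = 27
```

Once the value no longer fits in a native integer it is held as a float, so it stays exact only up to the digit counts given above; printing such values in full would need something like sprintf('%.0f', ...).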

Anything larger than that requires a module such as Math::BigInt so that Perl can store the result.
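A rough sketch of the Math::BigInt route, assuming the same digit-to-bits expansion as above and the documented ability of Math::BigInt->new to parse strings with a "0b" prefix. The name base4to_bigint is hypothetical.

```perl
use strict;
use warnings;
use Math::BigInt;

# Map each base-4 digit to its zero-padded 2-bit form.
my %digit_to_bits = map { $_ => sprintf('%02b', $_) } 0..3;

# Hypothetical helper (not from the answer): returns a Math::BigInt
# object, so the input length is not tied to the native integer size.
sub base4to_bigint {
    my ($s) = @_;
    die "not a base-4 string: '$s'\n" unless $s =~ /\A[0-3]+\z/;

    # Expand the digits to a binary string and let Math::BigInt parse it.
    (my $bits = $s) =~ s/(.)/$digit_to_bits{$1}/g;
    return Math::BigInt->new("0b$bits");
}

# 40 base-4 digits need 80 bits, more than a 64-bit integer can hold.
print base4to_bigint('3' x 40), "\n";   # 1208925819614629174706175
```

The returned object overloads arithmetic and stringification, so it can mostly be used like an ordinary number, at some cost in speed.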


Faster and simpler:

my %base4to16 = (
   '0' => '0',   '00' => '0',   '20' => '8',
   '1' => '1',   '01' => '1',   '21' => '9',
   '2' => '2',   '02' => '2',   '22' => 'A',
   '3' => '3',   '03' => '3',   '23' => 'B',
                 '10' => '4',   '30' => 'C',
                 '11' => '5',   '31' => 'D',
                 '12' => '6',   '32' => 'E',
                 '13' => '7',   '33' => 'F',
);

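sub base4to10 {
   my ($s) = @_;
   $s = "0$s" if length($s) % 2;     # Pad to an even length so that pairs align from the right.
   $s =~ s/(..)/$base4to16{$1}/sg;   # Each pair of base-4 digits becomes one hex digit.
   return hex($s);
}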
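
To tie this back to the question, here is a hedged end-to-end sketch that starts from the nucleotide letters. It assumes the A, C, G, T to 0, 1, 2, 3 encoding from the question and uses the same pairs-to-hex idea. The name nucleotides_to_decimal and the table %pair_to_hex are made up for illustration.

```perl
use strict;
use warnings;

# Build the pair-to-hex table programmatically: '00' => '0' ... '33' => 'F'.
my %pair_to_hex;
for my $hi (0..3) {
    for my $lo (0..3) {
        $pair_to_hex{"$hi$lo"} = sprintf('%X', $hi * 4 + $lo);
    }
}

# Hypothetical helper (not from the answer): nucleotide string to decimal.
sub nucleotides_to_decimal {
    my ($seq) = @_;
    die "unexpected character in '$seq'\n" unless $seq =~ /\A[ACGT]+\z/;

    (my $digits = $seq) =~ tr/ACGT/0123/;          # Letters to base-4 digits.
    $digits = "0$digits" if length($digits) % 2;   # Align pairs from the right.
    $digits =~ s/(..)/$pair_to_hex{$1}/g;          # Digit pairs to hex digits.
    return hex($digits);
}

print nucleotides_to_decimal('GATTACA'), "\n";   # "2033010" in base 4 = 9156
```

Note that leading A's map to leading zeros, so sequences such as "AAC" and "C" produce the same number; if the length matters, it has to be stored separately.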
