sorting - Sort mixed text lines (alphanum) in Perl

Question

asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I have a text file where every line is structured like this:

P[containerVrsn:U(0)recordVrsn:U(0)size:U(212)ownGid:G[mdp:U(1090171666)seqNo:U(81920)]logicalDbNo:U(1)classVrsn:U(1)timeStamp:U(0)dbRecord:T[classNo:U(1064620)size:U(184)updateVersion:U(3)checksum:U(748981000)

I need to sort the lines of the file by seqNo in ascending order (smallest to largest). The sequence number can be practically any non-negative integer, starting from zero. Any idea how this can be done efficiently?

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:26:32+0000

The Schwartzian Transform, as suggested in Toto's answer, is probably the fastest way to sort your lines here. But you said you're a Perl newbie, so I'd like to show how the lines can be sorted the traditional way.

By default, Perl's sort function sorts a list in string order, comparing elements character by character. But you can supply a custom comparison function and let sort use it to compare the elements. While it runs, sort repeatedly compares two elements (here: lines) of your list and needs to know whether the first is less than, equal to, or greater than the second.
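As a small illustration with made-up numbers, the default string order differs from a numeric comparison supplied as an inline block. The block relies on the special variables $a and $b described below; this is just a sketch of the general mechanism, not part of the seqNo solution itself:

```perl
use strict;
use warnings;

my @nums = (100, 9, 10);

# Default sort compares the elements as strings: "10" lt "100" lt "9"
my @as_strings = sort @nums;

# A comparison block using $a and $b compares them numerically instead
my @as_numbers = sort { $a <=> $b } @nums;

print "@as_strings\n";    # prints "10 100 9"
print "@as_numbers\n";    # prints "9 10 100"
```

The same idea applies to the file lines: the comparison only has to extract the right number from each line before comparing.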

If you supply a comparison function, sort calls it with the two elements to compare placed in the special variables $a and $b. You do not need to (and must not) declare $a and $b with my; they are package variables that sort sets for you. Your comparison function could look like this:

sub by_seqNo
{
    # extract the sequence number from $a and $b;
    # \( matches the literal "(" and (\d+) captures the digits after it
    my ($seqA) = ($a =~ /seqNo:U\((\d+)/);
    my ($seqB) = ($b =~ /seqNo:U\((\d+)/);

    # numerically compare the sequence numbers (returns -1/0/+1)
    $seqA <=> $seqB;
}

The first two statements extract the digits after seqNo:U( and store them in $seqA and $seqB. The last statement compares these sequence numbers numerically with the <=> operator; because it is the last expression evaluated in the sub, its result is the sub's return value. Combined with the sort function this gives:

my @sorted = sort by_seqNo @lines;
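Put together, a complete script might look like the following minimal sketch. The three records are shortened, made-up examples, and the file is simulated with an in-memory filehandle so the script runs on its own; the filename mentioned in the comment is hypothetical. It also assumes every line contains a seqNo field.

```perl
use strict;
use warnings;

# Comparison function from above: order lines by the number after "seqNo:U("
sub by_seqNo
{
    my ($seqA) = ($a =~ /seqNo:U\((\d+)/);
    my ($seqB) = ($b =~ /seqNo:U\((\d+)/);
    $seqA <=> $seqB;
}

# Shortened, made-up sample records. For a real file you would open it instead,
# e.g. open my $fh, '<', 'records.txt' (hypothetical filename)
my $text = <<'END';
P[ownGid:G[mdp:U(1)seqNo:U(81920)]logicalDbNo:U(1)]
P[ownGid:G[mdp:U(2)seqNo:U(7)]logicalDbNo:U(1)]
P[ownGid:G[mdp:U(3)seqNo:U(0)]logicalDbNo:U(1)]
END

open my $fh, '<', \$text or die "Cannot open: $!";
my @lines = <$fh>;    # each line keeps its trailing newline
close $fh;

my @sorted = sort by_seqNo @lines;
print @sorted;        # the lines with seqNo 0, 7 and 81920, in that order
```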

The Schwartzian Transform (ST) is faster than this solution because it performs the (expensive) extraction of the seqNo exactly once per line. The "traditional" approach, on the other hand, extracts the seqNo twice for every comparison, and sorting n lines takes on the order of n log n comparisons.
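For comparison, here is a minimal sketch of the Schwartzian Transform applied to this problem. It is my own illustration of the general technique, not a copy of Toto's answer; the sample lines are made up, and it assumes every line contains a seqNo field.

```perl
use strict;
use warnings;

# Made-up, shortened sample lines
my @lines = (
    "P[ownGid:G[mdp:U(1)seqNo:U(81920)]]\n",
    "P[ownGid:G[mdp:U(2)seqNo:U(7)]]\n",
    "P[ownGid:G[mdp:U(3)seqNo:U(0)]]\n",
);

# Schwartzian Transform (read from the bottom up):
#   1. map each line to a pair [line, seqNo], running the regex once per line
#   2. sort the pairs numerically by the cached seqNo
#   3. map the pairs back to the original lines
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map  { [ $_, /seqNo:U\((\d+)/ ] }
    @lines;

print @sorted;
```

The trade-off is a little extra memory for the temporary pairs in exchange for running the regex n times instead of twice per comparison.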
