# Sorting string lexicographically in Perl

This situation comes from soralind, and it is an extension of this post.

(Minor update 2014-08-16: fix the bug of subroutine weight_sum  )

The aim is to sort strings lexicographically NOT alphabetically.  For example, an alphabet with predetermined order:  `T A G C` . For a set of strings: `AT AC TG TC` , to list them alphabetically:

``````AC
AT
TC
TG
``````

lexicographical order:

``````TG
TC
AT
AC
``````

Similarly, Schwartzian transform by weight is used.

``````#!/usr/bin/env perl
use strict;

my @alphabet = qw/T A G C/;
my @list = qw/AT AC TG TC/;

print "\$_\n" for sort_lexicographically (\@alphabet, \@list);

sub sort_lexicographically {
my ( \$alphabet, \$list ) = @_;
my \$n = scalar @\$alphabet;  # size of alphabet

# set weight for every letter
my %weight = ();
my \$w      = 0;
for my \$key (@alphabet) {
\$w++;
\$weight{\$key} = \$w;
}

# SUM weight of a string
sub weight_sum {
my ( \$weight, \$s ) = @_;
my \$sum = 0;
my @letters = reverse split //, \$s;
for my \$i ( 0 .. \$#letters ) {
\$sum += \$\$weight{ \$letters[\$i] } * ( \$n**\$i );
}
return \$sum;
}

# sort by Schwartzian transform
@list = map { \$_-&gt; }
sort { \$a-&gt; &lt;=&gt; \$b-&gt; }    # sort by weight
map { [ weight_sum( \%weight, \$_ ), \$_ ] } @list;

return @list;
}
``````

Here are the weights of some strings. It’s kind of 4-bit digital, isn’t it? 4 is the size of alphabet.

``````string  weight
T       1
A       2
G       3
C       4
TT      5
TA      6
TG      7
TC      8
AT      9
AA      10
AG      11
AC      12
GT      13
GA      14
GG      15
GC      16
CT      17
CA      18
CG      19
CC      20
``````

This sorting strategy could also be easily implemented in other programming languages.