I wrote a bioinformatics package in Go that uses a function to check whether a letter (a `byte`) is a valid DNA/RNA/protein letter.
The easy way is to store the letters of the alphabet in a map and check whether a given letter exists in it.
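The map-based check can be sketched as follows. This is a minimal illustration, not the package's actual code: the alphabet `ACGTacgt` and the names `dnaLetters`, `letterSet`, and `isValidWithMap` are assumptions made for the example.

```go
package main

import "fmt"

// dnaLetters is a hypothetical alphabet used only for illustration.
var dnaLetters = []byte("ACGTacgt")

// letterSet holds every valid letter as a key.
var letterSet = func() map[byte]bool {
	m := make(map[byte]bool, len(dnaLetters))
	for _, b := range dnaLetters {
		m[b] = true
	}
	return m
}()

// isValidWithMap reports whether letter is a key of letterSet.
// Every call performs a full map lookup, including hashing the key.
func isValidWithMap(letter byte) bool {
	_, ok := letterSet[letter]
	return ok
}

func main() {
	fmt.Println(isValidWithMap('A'), isValidWithMap('X')) // true false
}
```

The lookup is simple to write, but each call goes through the runtime's map access path.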
However, when I profiled the code with `go tool pprof`, I found that the map's hash functions (`mapaccess2`, `memhash8`, `memhash`) took a lot of time (see figure below).
Then I found a faster way: storing the letters in a slice. In detail, each letter (a `byte`) is saved at position `int(letter)` of the slice.
To check a letter, just check the value of `slice[int(letter)]`: a non-zero value means a valid letter.
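A minimal sketch of the slice-based lookup described above, assuming a hypothetical DNA alphabet `ACGTacgt`; the names `buildTable`, `dnaTable`, and `isValidWithSlice` are invented for this example. The table has 256 entries so that every possible `byte` value is a valid index.

```go
package main

import "fmt"

// buildTable returns a 256-entry table in which table[int(letter)] holds
// the letter itself for each valid letter and 0 for everything else.
func buildTable(alphabet []byte) []byte {
	table := make([]byte, 256)
	for _, letter := range alphabet {
		table[int(letter)] = letter
	}
	return table
}

// dnaTable is built once from a hypothetical DNA alphabet.
var dnaTable = buildTable([]byte("ACGTacgt"))

// isValidWithSlice reports whether the table entry for letter is non-zero.
func isValidWithSlice(letter byte) bool {
	return dnaTable[int(letter)] != 0
}

func main() {
	fmt.Println(isValidWithSlice('G'), isValidWithSlice('X')) // true false
}
```

One consequence of this layout is that the zero byte cannot itself be a valid letter, since 0 is used to mean "not in the alphabet".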
[update at 2016-06-02] Two `switch` versions were also tested.
They were faster than the `map` version, but still slower than the `slice` version.
Besides, their speed was affected by the number of `case` clauses in the `switch`, i.e. the bigger the alphabet, the slower it runs.
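For comparison, a `switch`-based check might look like the sketch below. It is not necessarily how the tested versions were written; the alphabet and the name `isValidWithSwitch` are assumptions for illustration.

```go
package main

import "fmt"

// isValidWithSwitch reports whether letter matches one of the listed cases.
// A larger alphabet means more case values to compare against.
func isValidWithSwitch(letter byte) bool {
	switch letter {
	case 'A', 'C', 'G', 'T', 'a', 'c', 'g', 't':
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isValidWithSwitch('C'), isValidWithSwitch('X')) // true false
}
```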
See the benchmark result:
Tests | Iterations | Time/operation |
---|---|---|
BenchmarkCheckLetterWithMap-4 | 2000000000 | 0.18 ns/op |
BenchmarkCheckLetterWithSwitch-4 | 1000000000 | 0.02 ns/op |
BenchmarkCheckLetterWithSwitchWithLargerAlphabetSize-4 | 1000000000 | 0.03 ns/op |
BenchmarkCheckLetterWithSlice-4 | 2000000000 | 0.01 ns/op |
source code: checkLetter_test.go
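A comparison of this kind can be set up with the standard `testing` package. The sketch below is not the contents of `checkLetter_test.go`; the alphabet and the names `letterMap`, `letterTable`, `benchMap`, `benchSlice`, and `sink` are assumptions. It calls `testing.Benchmark` from `main` so it runs as an ordinary program; in a `_test.go` file the functions would instead be named `BenchmarkXxx` and run with `go test -bench .`.

```go
package main

import (
	"fmt"
	"testing"
)

// letters is a hypothetical alphabet used only for illustration.
var letters = []byte("ACGTacgt")

var (
	letterMap   = map[byte]bool{}
	letterTable = make([]byte, 256)
)

func init() {
	for _, b := range letters {
		letterMap[b] = true
		letterTable[int(b)] = b
	}
}

// sink receives every result so the compiler cannot discard the lookups.
var sink bool

// benchMap times the map lookup over b.N iterations.
func benchMap(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_, sink = letterMap[letters[i%len(letters)]]
	}
}

// benchSlice times the slice lookup over b.N iterations.
func benchSlice(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = letterTable[int(letters[i%len(letters)])] != 0
	}
}

func main() {
	fmt.Println("map:  ", testing.Benchmark(benchMap))
	fmt.Println("slice:", testing.Benchmark(benchSlice))
}
```

Storing each result in the package-level `sink` is a common precaution: without it, the compiler may optimize the lookup away and the benchmark ends up timing an empty loop.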