Safely handle Ctrl-C in Golang


A framework for safely handling Ctrl-C in Golang.

cancel := make(chan struct{})

// do some jobs
chWaitJobs := make(chan int)
go func(cancel chan struct{}) { // one example
    // do jobs that may take long time
LOOP:
    for {
        select {
        case <-cancel:
            break LOOP
        default:
        }

        // do something
    }

    // tell the main goroutine that this job has finished
    chWaitJobs <- 1
}(cancel)

chExitSignalMonitor := make(chan struct{})
cleanupDone := make(chan int)

signalChan := make(chan os.Signal, 1)
signal.Notify(signalChan, os.Interrupt)

go func() {
    select {
    case <-signalChan:
        // log is assumed to be a leveled logger that provides Criticalf;
        // the standard library log package has no such method
        log.Criticalf("received an interrupt, cleaning up ...")

        // broadcast cancel signal, so other jobs can exit safely
        close(cancel)

        // cleaning work

        cleanupDone <- 1
    case <-chExitSignalMonitor:
        cleanupDone <- 1
    }
}()

// wait jobs done
<-chWaitJobs
close(chExitSignalMonitor)
<-cleanupDone



Read more →

Sequence Parsing Strategies in SeqKit



Illustration of FASTA/Q file parsing strategies. (A) and (C) The main thread parses one sequence, waits (blocked) for it to be processed, and then parses the next one. (B) The sequence parsing thread continuously (non-blocked) parses sequences and passes them to the main thread. The width of the rectangles representing sequence parsing and sequence processing is proportional to running time. Sequence parsing speeds in (A) and (B) are the same, and both are much slower than that in (C). The speeds of sequence processing are identical in (A), (B) and (C). In (B), chunks of sequences in the buffer can be processed in parallel, but most of the time the main thread needs to manipulate the sequences serially.

Read more →

Recent experience of programming


During the past three months, I’ve written two tools (SeqKit and csvtk) and extended a few packages (bio and util), all in Go.

Here is some of the experience I’ve gained.

Code organization

src           # source code
docs          # documents
tests         # tests
benchmarks    # benchmark results
examples      # examples


Version control

  • Use git.
  • Create a repository on GitHub.


  • Write comments for all public variables and functions. Use lint tools to check this.
  • Build a project website (mkdocs/hugo) and host it on


Very important!

  • Unit Test.
    • Cover all functions, especially in frequently used packages.
    • Use the test tool of the programming language.
  • Function Test.
    • Use automated tools, e.g. ssshtest - Stupid Simple (ba)Sh Testing - A functional software testing framework


  • Packaging release files.
  • Automated testing.

Read more →

C note

c note


data types



#define PI 3.1415926


const double pi = 3.1415;

Read more →

map is not the fastest in go


I wrote a bioinformatics package in Go, in which I used a function to check whether a letter (byte) is a valid DNA/RNA/Protein letter.

The easy way is to store the letters of the alphabet in a map and check for the existence of a letter. However, when I used go tool pprof to profile the performance, I found that the hash functions of the map (mapaccess2, memhash8, memhash) took much of the time (see figure below).

Then I found a faster way: storing the letters in a slice, that is, saving each letter (byte) at position int(letter) of the slice. To check a letter, just check the value of slice[int(letter)]; a non-zero value means a valid letter.
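
The two approaches can be sketched side by side as follows. This is a minimal illustration rather than the package's actual code: the alphabet in dnaLetters and all function and variable names are assumptions made for the example.

```go
package main

import "fmt"

// dnaLetters is an assumed example alphabet, not the one from the package.
const dnaLetters = "ACGTNacgtn"

var (
	letterMap   = make(map[byte]bool) // map version
	letterSlice = make([]byte, 256)   // slice version, one slot per byte value
)

func init() {
	for i := 0; i < len(dnaLetters); i++ {
		b := dnaLetters[i]
		letterMap[b] = true
		letterSlice[int(b)] = b // save the letter at position int(letter)
	}
}

// isValidWithMap looks the letter up in the map, which hashes the key.
func isValidWithMap(b byte) bool {
	return letterMap[b]
}

// isValidWithSlice indexes the slice directly; non-zero means valid.
func isValidWithSlice(b byte) bool {
	return letterSlice[int(b)] != 0
}

func main() {
	for _, b := range []byte("AxG-") {
		fmt.Printf("%c map=%v slice=%v\n", b, isValidWithMap(b), isValidWithSlice(b))
	}
}
```

The slice needs 256 entries because a byte can take 256 values, so any letter can be used as an index without a bounds problem.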

[update at 2016-06-02] Two switch versions were also tested. They were faster than the map version, but still slower than the slice version. Besides, the switch version is affected by the number of case clauses in the switch, i.e. the bigger the alphabet size, the slower it runs.

See the benchmark result:

Tests                                                    Iterations   Time/operation
BenchmarkCheckLetterWithMap-4                            2000000000   0.18 ns/op
BenchmarkCheckLetterWithSwitch-4                         1000000000   0.02 ns/op
BenchmarkCheckLetterWithSwitchWithLargerAlphabetSize-4   1000000000   0.03 ns/op
BenchmarkCheckLetterWithSlice-4                          2000000000   0.01 ns/op

Read more →

Fetch taxon information by species name or taxid

python bioinf

[updates] I wrote a tool to do the same job and it’s even more powerful.

gTaxon - a fast cross-platform NCBI taxonomy data querying tool, with cmd client and REST API server for both local and remote server. http:/

This post presents a script for fetching taxon information by species name or taxid.

Take home message:

1). Use a cache to avoid repeated searches.

2). The object of could be treated as a list, but it could not be pickled correctly. Using JSON is also not OK. The right way is to cache the XML text.

search = Entrez.efetch(id=taxid, db="taxonomy", retmode="xml")
# data =
##  read and parse xml
data_xml =
data = list(Entrez.parse(StringIO(data_xml)))

3). Pickle files are fragile. A flag file can be used to detect whether the data was dumped correctly.

4). Use multiple threads to speed up fetching.

Read more →

Data science learning path on python



  1. NumPy - NumPy is the fundamental package for scientific computing with Python
  2. pandas - pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
  3. matplotlib - matplotlib is a python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms
  4. seaborn - Statistical data visualization using matplotlib


  1. Statsmodels - Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests.
  2. scikit-learn - Machine Learning in Python
  3. theano - Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently


Read more →

Install Python Numba


Official site:

Easiest way

conda install numba


Read more →

Make a Linux service auto start



  1. Create a service script in /etc/init.d/ and make it executable with chmod a+x.
  2. Test it: sudo service xxxxx start
  3. Finally, enable it at boot: sudo systemctl enable xxxxx

Read more →

Migrate from wordpress to hugo


Why abandon wordpress

It’s too heavy for my VPS, even with a cache plugin. Static pages are much faster!

Why Hugo

See official doc:

  • Hugo is written in Go. It’s just a tiny executable binary, available for most popular operating systems.
  • All pages are written in Markdown.
  • Hugo includes a super fast web server that monitors file changes and syncs the content in almost real time (~200 ms for 100 pages).

Read more →