Manipulation on CSV/TSV files

bioinf devop

CSV (Comma Separated Values) and TSV (Tab Separated Values) files are common data exchange formats in many fields, including Bioinformatics. CSV is the more flexible of the two, because content inside quoting characters (double quotation marks, most of the time) can contain the field separator (comma). Therefore we cannot simply split a line on the field separator.
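
To illustrate the quoting problem, here is a minimal sketch using Python's standard csv module. The sample record is made up for this example; it only shows that naive splitting cuts a quoted field in two while a CSV-aware reader keeps it whole.

```python
import csv
import io

# Made-up sample data: the description field contains a comma inside quotes.
sample = 'gene,description,length\nrecA,"recombinase A, DNA repair",1062\n'

# Naive splitting on the separator breaks the quoted field apart.
naive = sample.splitlines()[1].split(",")

# csv.reader follows the quoting rules and keeps the field intact.
rows = list(csv.reader(io.StringIO(sample)))

print(naive)    # four pieces instead of three fields
print(rows[1])  # three fields, quotes removed

# For TSV input, the same reader can be used with delimiter="\t".
```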

Read more →



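Any information transmitted over the Internet can be maliciously intercepted. Even so, we still keep a lot of important data online, such as private email and bank transactions. This is possible because something called SSL/TLS/HTTPS protects our information by encrypting the communication between us and the website's server. If a website considers its user data sensitive and plans to use SSL/TLS/HTTPS encryption, it must first apply for a certificate from a company or organization with CA (Certificate Authority) status. Companies and organizations with CA status have all been audited globally and are considered trustworthy.

CNNIC can forge a fake certificate for any website at will and substitute it for the website's real certificate, and thereby steal any of our data! As for the purpose, well, you know.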

Read more →

Coding experience of this week


I did some analysis on high-throughput sequencing data this week. Here are some experiences worth sharing.

1. Make scripts flexible and reproducible

Use an argument parser to handle different running conditions/steps. Make sure it is easy to change parameters from the command line.

Use a counting “verbose” option and the logging module for multiple levels of output, e.g. “quiet” -> “info” -> “verbose output” -> “debug info”, to avoid repeatedly changing debug code.
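
The idea above could be sketched roughly as follows with argparse and logging. The flag names, the custom VERBOSE level (15) and the mapping from flag count to level are assumptions made for this example, not taken from the original scripts.

```python
import argparse
import logging

# Assumed custom level between INFO (20) and DEBUG (10) for "verbose output".
VERBOSE = 15
logging.addLevelName(VERBOSE, "VERBOSE")


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical analysis script")
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="increase output level (-v, -vv)")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="only report warnings and errors")
    return parser


def log_level(verbose, quiet):
    """Map the command-line flags to a logging level."""
    if quiet:
        return logging.WARNING
    return {0: logging.INFO, 1: VERBOSE}.get(verbose, logging.DEBUG)


# In a real script, call parse_args() without arguments to read sys.argv.
args = build_parser().parse_args(["-vv"])
logging.basicConfig(level=log_level(args.verbose, args.quiet),
                    format="%(levelname)s: %(message)s")

logging.info("starting analysis")
logging.log(VERBOSE, "extra detail, shown with -v or -vv")
logging.debug("debug detail, shown with -vv only")
```

The debug statements stay in the code permanently; only the command-line flags decide whether they are printed.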

Read more →

Manipulation on FASTA format file

bioinf devop

FASTA format is a basic sequence format in the field of Bioinformatics. It’s easy to manipulate and parse.

Please try another tool of mine, SeqKit, a cross-platform and ultrafast toolkit for FASTA/Q file manipulation!

In my practice, I do a lot of work with FASTA format files, and I wrote some scripts to parse and analyze them. There are also some great toolkits in the Bio* family, such as BioPerl, Biopython and BioJava.
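
To show how simple the format is to parse, here is a minimal sketch of a FASTA reader in plain Python. It is not one of the scripts mentioned above; the function name read_fasta and the sample records are made up for this example, and for real work a tested tool such as SeqKit or Biopython is the safer choice.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample input with one multi-line record.
sample = io.StringIO(">seq1 test\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(read_fasta(sample))
for name, seq in records:
    print(name, len(seq))
```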

Read more →

Python note

python note

I plan to use Python as one of my main programming languages because of its huge number of libraries.

last update 2015-8-14


Data structure

  • complex structure: data.setdefault('names', []).append('Ruby')
  • get element with a default: data.get('foo', 0) # not data['foo']
  • list comprehension: [3*x for x in vec if x > 3] # [(x, x**2) for x in vec]
  • iterate over a dict: for x, y in data.iteritems():
  • iterate through two sequences at the same time: for x, y in zip(a, b):
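
The idioms in this list can be tried together in a short runnable sketch. The variable names and values are made up for this example. Note that dict.iteritems() exists only in Python 2; the sketch uses items(), which is the Python 3 equivalent.

```python
# Build a nested structure without checking whether the key exists first.
data = {}
data.setdefault("names", []).append("Ruby")
data.setdefault("names", []).append("Perl")

# Read a key with a default instead of risking a KeyError.
count = data.get("foo", 0)

# List comprehensions, with and without a filter.
vec = [2, 4, 6]
tripled = [3 * x for x in vec if x > 3]
squares = [(x, x ** 2) for x in vec]

# Iterate over key/value pairs (iteritems() in Python 2).
pairs = [(key, value) for key, value in data.items()]

# Iterate through two sequences at the same time.
zipped = [(x, y) for x, y in zip("ab", [1, 2])]

print(data, count, tripled, squares, pairs, zipped)
```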

Read more →

Install Python applications on Linux

linux python

Python is very popular. A lot of bioinformatics software is written in Python, so we have to learn how to install Python applications, particularly when we have no root privileges.

Install a private Python first

# unpack the source tarball and install into a private prefix
tar -zxf Python-2.7.8.tgz
cd Python-2.7.8
./configure --prefix=/db/home/shenwei/local/app/python
make install

# register the private installation in the shell startup file
echo export LOCAL_PYTHON=/db/home/shenwei/local/app/python >> ~/.bashrc
echo export PYTHONPATH=\$LOCAL_PYTHON:\$LOCAL_PYTHON/lib/python2.7/site-packages:\$PYTHONPATH >> ~/.bashrc
echo export PATH=\$LOCAL_PYTHON/bin:\$PATH >> ~/.bashrc
. ~/.bashrc
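
After reloading the shell configuration, one way to check which interpreter is actually being picked up is a short script like the sketch below. It uses only the standard library; the expectation that the printed paths fall under the private prefix is an assumption about the setup above.

```python
import sys
import sysconfig

# Path of the interpreter that is running this script.
print(sys.executable)

# Installation prefix; for a private build this should be the --prefix value.
print(sys.prefix)

# Directory where pure-Python packages get installed.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```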

Read more →

Lost my code by “mv”-ing an executable onto the source file!

linux data

As the title says, I lost my code, which had cost me more than half a day of work.

All possible rescue solutions failed:

  1. The shell command mv was used, which overwrote the source file. That is irreversible.
  2. I was supposed to upload the code to GitHub right after this rename operation, but I didn’t.
  3. Dropbox syncing was paused.
  4. There was no backup; I had even deleted the old version from the trash.
  5. The text editor was closed. Otherwise the code would still have been held in RAM.

Read more →

Shell Note

linux note

Yet another learning note, after the Perl note and the Golang note.

last update: 2016-03-31

Read more →

Standards for Command Line Interfaces


Some common practices summarized from other great tools.

last update: 2014-09-08


The name should be descriptive and easy to remember.


Use an option parser library to parse options. Long option names are recommended because they are more readable.

Some necessary options

  • --help  Print introduction, usage and examples.
  • --version  Version is needed!
  • --verbose  Print additional information.

Additional options

  • -t Number of threads (concurrency).
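
The options listed above could be wired up as in the following sketch with Python's argparse. The tool name mytool, the version string and the long name --threads are made up for this example; argparse adds --help automatically.

```python
import argparse

__version__ = "0.1.0"  # hypothetical version string


def build_parser():
    parser = argparse.ArgumentParser(
        prog="mytool",  # hypothetical tool name
        description="Short introduction shown by --help.",
        epilog="Example: mytool --threads 4 --verbose",
    )
    parser.add_argument("--version", action="version",
                        version="%(prog)s " + __version__)
    parser.add_argument("--verbose", action="store_true",
                        help="print additional information")
    parser.add_argument("-t", "--threads", type=int, default=1,
                        help="number of threads (concurrency)")
    return parser


# In a real tool, call parse_args() without arguments to read sys.argv.
args = build_parser().parse_args(["-t", "4", "--verbose"])
print(args.threads, args.verbose)
```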

Read more →


devop linux



                    |-> subjob2 |  
JOB1:    subjob1 -> |-> subjob3 | -> subjob5 -> subjob6
                    |-> subjob4 |

                    |-> subjob2 |  
JOB2:    subjob1 -> |-> subjob3 | -> subjob5 -> subjob6
                    |-> subjob4 |

                    |-> subjob2 |  
JOB3:    subjob1 -> |-> subjob3 | -> subjob5 -> subjob6
                    |-> subjob4 |
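
The dependency pattern in the diagram (one step, then three parallel steps, then two sequential steps) could be sketched in Python as below. This is only an illustration of the shape of the workflow: the function names are made up, the subjobs are placeholders, and running the three jobs concurrently is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subjob(job, name):
    """Placeholder for real work; returns a label so the order can be checked."""
    return "{}:{}".format(job, name)


def run_job(job):
    """Run subjob1, then subjob2-4 in parallel, then subjob5 and subjob6."""
    log = [run_subjob(job, "subjob1")]
    with ThreadPoolExecutor(max_workers=3) as pool:
        # map() returns results in input order, even if they finish out of order.
        parallel = pool.map(lambda name: run_subjob(job, name),
                            ["subjob2", "subjob3", "subjob4"])
        log.extend(parallel)
    log.append(run_subjob(job, "subjob5"))
    log.append(run_subjob(job, "subjob6"))
    return log


# Assumption: the three jobs are independent and may run at the same time.
with ThreadPoolExecutor(max_workers=3) as jobs:
    results = list(jobs.map(run_job, ["JOB1", "JOB2", "JOB3"]))

for log in results:
    print(" -> ".join(log))
```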


Read more →