Steven asked that I create a karyotype file (GitHub Issue) from the NCBI P. verrucosa genome (GCA_014529365.1) in the following format:
name\tlength
(read: name<tab>length)
This was a very quick process, using the FastA index file and awk. In fact, the Jupyter Notebook entry and this notebook entry have taken me far, far longer to put together than the commands to generate the desired output file. The commands were recorded in the following Jupyter Notebook file.
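For readers who just want the gist, here is a minimal sketch of the general approach. It is not the exact command from the notebook. It assumes a FastA index (.fai) already exists (e.g. created with samtools faidx), where column 1 is the sequence name and column 2 is the sequence length. The function name fai_to_karyotype and the file names in the usage comments are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the notebook): print "name<tab>length"
# from a FastA index file. In a .fai file, column 1 is the sequence name
# and column 2 is the sequence length in bases.
fai_to_karyotype() {
  awk 'BEGIN { FS = OFS = "\t" } { print $1, $2 }' "$1"
}

# Example usage (placeholder file names):
#   samtools faidx genome.fna                 # writes genome.fna.fai
#   fai_to_karyotype genome.fna.fai > karyotype-name_length.tab
```

Setting both FS and OFS to a tab keeps the output strictly tab-delimited, which is what the requested name\tlength format calls for.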
RESULTS
Output folder:
20230215-pver-GCA_014529365.1-karytoype/
Output file (txt)
20230215-pver-GCA_014529365.1-karytoype/GCA_014529365.1-pver-karytotype-name_length.tab
Preview:
JAAVTL010000001.1	2095917
JAAVTL010000002.1	2081954
JAAVTL010000003.1	1617595
JAAVTL010000004.1	1576134
JAAVTL010000005.1	1560107
JAAVTL010000006.1	1451149
JAAVTL010000007.1	1442001
JAAVTL010000008.1	1404416
JAAVTL010000009.1	1375744
JAAVTL010000010.1	1318009
MD5 checksum:
5aafd422505f26c0793a3b88abe0359f
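If you download the output file, one way to confirm it matches the checksum above is sketched below. This assumes GNU md5sum is available (on macOS the equivalent tool is md5, with different output formatting). The function name verify_md5 is a hypothetical helper, not something from the notebook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) only if the MD5 of file $1
# matches the expected hex digest $2. Assumes GNU md5sum is installed.
verify_md5() {
  [ "$(md5sum < "$1" | awk '{ print $1 }')" = "$2" ]
}

# Example usage with the file name and checksum listed above:
#   verify_md5 GCA_014529365.1-pver-karytotype-name_length.tab \
#     5aafd422505f26c0793a3b88abe0359f && echo "checksum OK"
```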