![GitHub Installs](https://img.shields.io/endpoint?url=https://r-pkg.github.io/install-stats/LiuyangLee/gclink/badge.json&label=GitHub%20Installs&style=flat-square)
![CRAN Downloads](https://img.shields.io/badge/dynamic/json?url=https://cranlogs.r-pkg.org/badges/grand-total/gclink&query=$.count&label=CRAN%20Downloads&color=blue&style=flat-square)

# gclink: Gene-Cluster Discovery, Annotation and Visualization

## Overview

`gclink` performs end-to-end analysis of gene clusters (e.g., photosynthesis, carbon/nitrogen/sulfur cycling, carotenoid, antibiotic, or viral genes) from (meta)genomes. It provides:

- Parsing of tab-delimited BLAST (Basic Local Alignment Search Tool) results produced by tools such as NCBI BLAST+ and Diamond BLASTp
- Contiguous cluster detection
- Publication-ready visualization


## Key Features

### Adaptive Workflow
- Works with or without coding-sequence input
- Skips plotting when functional grouping is absent
- Supports custom gene lists for universal cluster detection

### Cluster Detection
- Density-based identification via `AllGeneNum` and `MinConSeq` parameters
- Handles incomplete gene annotation coverage
- Optional insertion of hypothetical ORFs at cluster boundaries
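
How `AllGeneNum` and `MinConSeq` interact is documented in `?gclink::gclink`. As a rough intuition only, a density criterion of this kind can be pictured as a sliding window over ORF positions on one contig: a region qualifies when a span of at most `window` ORFs holds at least `min_hits` annotated genes. The sketch below illustrates that generic idea in base R; `dense_window()`, `window` and `min_hits` are hypothetical names, the data are invented, and the code is not taken from the package.

```r
# Hypothetical helper for illustration; NOT part of gclink.
# positions: ORF indices on one contig that matched a target gene
# window:    maximum span, in ORFs, that a cluster may cover
# min_hits:  minimum number of matched ORFs required inside that span
dense_window <- function(positions, window, min_hits) {
  positions <- sort(unique(positions))
  for (first in positions) {
    # Matched ORFs that fall inside the window starting at this hit
    inside <- positions[positions >= first & positions < first + window]
    if (length(inside) >= min_hits) {
      return(c(first = min(inside), last = max(inside), n_hits = length(inside)))
    }
  }
  NULL  # no window is dense enough
}

# Toy data: an isolated hit, a dense run with two gaps, and another isolated hit
orf_hits <- c(3, 97:112, 114:120, 122:132, 400)
dense_window(orf_hits, window = 50, min_hits = 25)
```

In this toy example the isolated hits at positions 3 and 400 do not qualify, while the run from 97 to 132 does, even though two ORFs inside it have no annotation.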

### Visualization
- Publication-ready arrow plots based on `gggenes`, with customizable:
  - Color themes
  - Functional group levels
  - Genome subsets

## Installation

```r
# Install from CRAN
install.packages("gclink")

# Install from GitHub
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("LiuyangLee/gclink")
```

## Case 1: Using blastp result
```r
# Case 1: Using blastp result with Full pipeline (Find Cluster + Extract FASTA + Plot Cluster)
library(gclink)
data(blastp_df)
data(seq_data)
data(photosynthesis_gene_list)
data(PGC_group)
gc_list <- gclink(in_blastp_df = blastp_df,
                  in_seq_data = seq_data,
                  in_gene_list = photosynthesis_gene_list,
                  in_GC_group  = PGC_group,
                  AllGeneNum = 50,
                  MinConSeq  = 25,
                  apply_length_filter = TRUE,
                  down_IQR   = 10,
                  up_IQR     = 10,
                  orf_before_first = 0,
                  orf_after_last = 0,
                  levels_gene_group = c('bch','puh','puf','crt','acsF','assembly','regulator',
                                        'hypothetical ORF'),
                  color_theme = c('#3BAA51','#6495ED','#DD2421','#EF9320','#F8EB00',
                                  '#FF0683','#956548','grey'),
                  genome_subset = NULL)
gc_meta = gc_list[["GC_meta"]]
gc_seq = gc_list[["GC_seq"]]
gc_plot = gc_list[["GC_plot"]]
head(gc_meta)   # Cluster metadata
head(gc_seq)    # FASTA sequences
print(gc_plot)  # Visualization
```

### 1 Input Data Preview
#### 1.1 A dataframe of Diamond BLASTp output (e.g., head(`blastp_df`))
| qaccver                                                  | saccver                                                                 | pident | length | mismatch | gapopen | qstart | qend | sstart | send | evalue    | bitscore |
|----------------------------------------------------------|-------------------------------------------------------------------------|--------|--------|----------|---------|--------|------|--------|------|-----------|----------|
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_67 | enzymerhodopsin_XP_002954798.1_Volvox_carteri                          | 26.6   | 576    | 343      | 15      | 157    | 666  | 332    | 893  | 8.18e-41  | 161      |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_113 | petB_Candidatus_Methylomirabilis_oxyfera_DAMO_1671_MOX                 | 76.6   | 248    | 58       | 0       | 14     | 261  | 9      | 256  | 5.43e-149 | 417      |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_114 | petC_Candidatus_Nitronauta_litoralis_G3M70_16785_NLI                   | 50.8   | 177    | 73       | 2       | 8      | 184  | 27     | 189  | 3.83e-59  | 184      |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_523 | cruC_Humisphaera_borealis_IPV69_18620_HBS                             | 31.5   | 365    | 208      | 11      | 42     | 378  | 48     | 398  | 1.45e-41  | 151      |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_616 | rfpB_KL662192_1_938                                                   | 33.0   | 227    | 137      | 3       | 4      | 223  | 3      | 221  | 2.53e-32  | 124      |
| Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_754 | bchI_p_Myxococcota--c_WYAZ01--o_WYAZ01--GCA_016703535.1---JADJBV010000002.1_754 | 100.0 | 343 | 0 | 0 | 1 | 343 | 1 | 343 | 4.73e-249 | 677 |
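
The columns above are the twelve default fields of BLAST tabular output (`-outfmt 6` in NCBI BLAST+). A minimal sketch of loading such a file into a data frame with matching column names, using base R only; the temporary file and its single row (copied from the preview) stand in for a real results file, and `my_blastp_df` is a placeholder name:

```r
# Names of the 12 default fields in BLAST tabular output
blast_cols <- c("qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Stand-in for a real results file (hypothetical path)
blast_file <- tempfile(fileext = ".tsv")
writeLines(paste("Kuafubacteriaceae--GCA_016703535.1---JADJBV010000002.1_113",
                 "petB_Candidatus_Methylomirabilis_oxyfera_DAMO_1671_MOX",
                 76.6, 248, 58, 0, 14, 261, 9, 256, "5.43e-149", 417, sep = "\t"),
           blast_file)

# The file has no header line, so the column names are supplied explicitly
my_blastp_df <- read.delim(blast_file, header = FALSE, col.names = blast_cols,
                           stringsAsFactors = FALSE)
str(my_blastp_df)
```

A data frame built this way has the same layout as `blastp_df` and could be supplied as `in_blastp_df`.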

#### 1.2 (Optional) A dataframe with SeqName (ORF identifier, Prodigal format: `ORF_id # start # end # strand # ...`) and Sequence (e.g., head(`seq_data`))
| SeqName                                                                               | Sequence                                                                                                                                                                                                                                                 |
|---------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_1 # 3 # 266 # 1 # ID=85_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.807 | CCGGACGCGCCGCCCGCCCCGAAGGCCCCGCCGGCCGCCCCCACCTATCCGCTCGAAGGCGCGCTCGGTATCAGCCGCGTGCGCCTCGTGCGCGCCACGCCCTGCGGCCTCACCGGCCGCGAGCTCGGCGCCGGCGAGGAGGCCCTCCTCGTCCACTTCGACGACGGACGCCCGCCCCTCGCGGTCGCCCCCGACGCGCTCCCGACGCCCCCCGGCGACGGGACGCCCCCCACCGGCGCTCCGCCGGAAGGAGACCCCGCATGA |
| Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_2 # 263 # 490 # 1 # ID=85_2;partial=00;start_type=ATG;rbs_motif=AGGAG;rbs_spacer=5-10bp;gc_cont=0.737 | ATGACCCGCCCCGAAGACGCCCCGCCCACCCACGAAGCCGCGGACCGCGCCGTGCGCTCCCTCTTCCAGATCGGTCGCCTCTGGGCCTCCCACGGCCTCGAGATGGGTCGCATGACCTTGCGGACCGCCGCCAAGACCCTCGAGAGCACCGCCGAGACCCTCGAGGACCTCTCCCAGCGCGTCGCCCCCGACGACGAGCGCCCCGCGGACGAACGCGCCGCCGACTGA |
| Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_3 # 667 # 2184 # -1 # ID=85_3;partial=00;start_type=ATG;rbs_motif=AGGAGG;rbs_spacer=5-10bp;gc_cont=0.775 | ATGAGCGCGATCGAAGGGACCCGGCCTCGGGACGGCGAGGCCCGCATGCCCGTGGAGGCGACCCCCGTGGAGGCCATCGGGGGCCTCGTCGCCCGGGCGCGTGACGCCGGCTTCGACCACGCGGCCCGGCCCCTCGCCGAGCGCGCGGGGCTGCTGCGCGCGCTCGCGGACGCCATCCTCGCCGACGGGGAGGCCATCGTCGCGCTCCTCGAGGAGGAGACGGGCAAGCCGGCGGCGGAGGCGTGGCTCCACGAGGTCGTGCCGACGGCGGACCTCGGGAGCTGGTGGAGCAGCCAGGGGCCGGCGCACCTCGCGACGGAAGCCGTGCGCCTCGACCCGCTCGCCTACCCTGGCAAGCGCGCGCGCGTCGAGGTGGTCCCGCGTGGCGTCGTGGCGCTGATCACGCCTTGGAACTTCCCGGTGGCGATCCCGCTGCGGACGCTCTTCCCGGCGCTCCTCGCGGGCAACGGCGTCGTCTGGAAGCCGTCCGAGCACACGCCGCGGGTGGCGGCGCGCGTGCACGGGATCGTGCGCGAGGTCTTCGGGCCGGACCTGGTCGAGCTGGTGCAGGGCGCCGGCGCGCAGGGGGCGGCGCTGGTCGAGGCGGACGTGGACGCGGTGGTGTTCACGGGCAGCGTGGCGACCGGGCGGAAGGTCGGCGCGGCGGCGGGGCGGGCGCTCACGCCGGCGTCGCTCGAGCTCGGCGGCAAGGACGCGGCCGTGGTGCTCGACGACGCGGACCTGGAGCGCACGGCCCGGGGCCTGCTCTGGGCGGCGATGGCGAACGCGGGGCAGAACTGCGCCGGGCTCGAGCGCGTCTACGCGGTGGCGGAGGTCGCCGGCCCGCTGAAGGCGCGGCTCGGTGAGCTGGCCGGAGAGCTGGTGCCCGGGCGCGACGTGGGGCCGCTGGTGACCGAGGCGCAGCTCGCGACGGTGGAGCGGCACGTGCGCGAGGCGGTCGACGGGGGCGCGGAGGTGCTGGCCGGCGGCGAGCGGCTCGAGCGGGGCGGGCGCTGGTTCGCGCCGACCGTGCTGGCGGAGGTCGAGCCGTCTTCGGCGGCGCTCCGGGAGGAGACGTTCGGGCCGGTGGTCGTCGTGCAGACGGTGGCGGACGAGGCGGCGGCCGTGGCGGCGGCGAACGACTCGCGCTTCGGGCTGACGGCGAGCGTCTGGACGCGGGACGCGGCGCGCGGGGAGGCGGTCGCACGGCGGCTCCGGGCGGGCGTCGTGACGGTGAACAACCACGCCTTCACCGGGGCCATCCCGGCGCTGCCCTGGGGCGGCGTCGGCGAGACGGGCTTCGGGGTGACGAACTCGCCGCACGCGCTCCACGCATTGGTGCGGCCGCGGGCCGTGGTCGTGGACGGCAACGCGCGGCCGGAGCTCTACTGGCACCCCTACGACGAGGCGCTCGAGCGGCTCGGGAAGGGCATGGCGGCGCTCCGCGGCAAGGGCGGGCCGATCACGAAGGTGCGCGCCGTGGCCAGGCTGCTCGGGGCGCTCCGCCGGCGCTTCTGA |
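
A table like this can be built from a Prodigal nucleotide FASTA file. The following is a sketch in base R, assuming every record has at least one sequence line; the file content is a shortened stand-in, and `my_seq_data` and `coords` are placeholder names:

```r
# Tiny stand-in for a Prodigal FASTA file (hypothetical content, sequences shortened)
fasta_file <- tempfile(fileext = ".fna")
writeLines(c(
  ">contigA_1 # 3 # 266 # 1 # ID=85_1;partial=10",
  "CCGGACGCGCCG",
  "CCCGCATGA",
  ">contigA_2 # 263 # 490 # -1 # ID=85_2;partial=00",
  "ATGACCCGCCCCTGA"
), fasta_file)

lines     <- readLines(fasta_file)
is_header <- startsWith(lines, ">")
record_id <- cumsum(is_header)  # record index for every line

my_seq_data <- data.frame(
  SeqName  = sub("^>", "", lines[is_header]),
  # Join wrapped sequence lines that belong to the same record
  Sequence = vapply(split(lines[!is_header], record_id[!is_header]),
                    paste, character(1), collapse = "", USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

# Split the Prodigal header into its " # "-separated fields
fields <- strsplit(my_seq_data$SeqName, " # ", fixed = TRUE)
coords <- data.frame(
  orf    = vapply(fields, `[`, character(1), 1),
  start  = as.integer(vapply(fields, `[`, character(1), 2)),
  end    = as.integer(vapply(fields, `[`, character(1), 3)),
  strand = as.integer(vapply(fields, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)
```

The `coords` table only shows how the header fields map onto start, end and strand; it is not an input to `gclink()`.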

#### 1.3 (Optional) Gene group (e.g., head(`PGC_group`))
| gene     | gene_group | gene_label |
|----------|------------|------------|
| bciE     | bci        | E          |
| bchB     | bch        | B          |
| bchC     | bch        | C          |
| bchD     | bch        | D          |

#### 1.4 (Optional) Candidate gene list (e.g., head(`photosynthesis_gene_list`))
`bciE`, `bchB`, `bchC`, `bchD`, `bchE`
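
For a gene family other than the bundled photosynthesis set, the two optional inputs can be assembled by hand. A minimal sketch with nitrogenase gene names chosen purely for illustration; it assumes `in_gene_list` takes a character vector and `in_GC_group` takes a data frame with the three columns shown in 1.3. `my_gene_list` and `my_GC_group` are placeholder names:

```r
# Hypothetical candidate genes, used only as an example
my_gene_list <- c("nifH", "nifD", "nifK", "nifE", "nifN", "nifB")

# Derive the group (first three letters) and the label (remaining letters)
my_GC_group <- data.frame(
  gene       = my_gene_list,
  gene_group = substr(my_gene_list, 1, 3),
  gene_label = substr(my_gene_list, 4, nchar(my_gene_list)),
  stringsAsFactors = FALSE
)
my_GC_group
```

Splitting the name at a fixed position only works for names that follow this pattern; groups can equally be assigned by hand.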

### 2 Output Data Preview
#### 2.1 Gene cluster information (`GC_meta`)
| gene | qaccver | saccver | pident | length | mismatch | gapopen | qstart | qend | sstart | send | evalue | bitscore | genome | orf | contig | genome_contig | orf_position | gene_cluster | GC_orf_position | GC_present_length | GC_absent_length | GC_length | SeqName | Sequence | start | end | direction | gene_group | gene_label | Pgenome | Pstart | Pend | Pdirection |
|------|---------|---------|--------|--------|----------|---------|--------|------|--------|------|--------|----------|--------|-----|--------|--------------|-------------|--------------|----------------|------------------|-----------------|----------|---------|----------|-------|-----|-----------|------------|------------|---------|--------|------|------------|
| pufC | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_97 | pufC_Rhodospirillum_centenum_RC1_2101_RCE | 53.1 | 335 | 147 | 7 | 3 | 329 | 6 | 338 | 7.66E-112 | 333 | Houyibacteriaceae--LLY-WYZ-15_3 | k141_102864_97 | k141_102864 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864 | 97 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 1 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_97 # 117640 # 118917 # -1 # ID=85_97;partial=00;start_type=GTG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.710 | GTGAAGAAGATCGCCATCGCCTTCGTGAGCACCTGGCTCCTCATCGGGGCCGTCTACGCCTACGAGCCGACCGAGACCTCGCAGATCGGCGCCGACGGCGTCGCCATGCAGGTCACGCAGACCGAGGACGAGCTCGCCGCGCGCGTGGAGGCGAACACCGTCCCGCCGGCCATCCCGATGCCCCAGAGCAGCGGCGTGCTGGCGGCCGAGGAGTACGAGAACGTGCAGGTCCTCGGCCACCTCAACACGGCCCAGTTCACCCGGCTGATGACCTCCATCACGCTCTGGGTCGCGCCGGAGCAGGGCTGCGCCTACTGCCACAACACGAACAACCTGGCCTCCGACGAGCTCTACACGAAGCGCGTGGCGCGTCGGATGATCCAGATGACCTGGCACATCAACGAGAACTGGCAGTCGCACGTCCAGGAGACCGGCGTGACCTGCTACACGTGCCACCGCGGCAACAACGTGCCCCAGCACATCTGGTTCGAGACGCCGCCCGACGACCACGGCATGGTGGGCTGGCGTGGCTCGCAGAACGCCCCGAACGACCGGACGGGGATCAGCTCCCTGCCGAACGACGTGTTCGAGGTGTTCCTCGAGGAGGACGCGAGCATCCGGGTCCAGTCGGCCGGGGAGGCCTTCCCGAACGAGAACCGCGCGTCCATCAAGCAGGCCGAGTGGACCTATGGGCTGATGATGCACTTCTCCGAGTCGCTCGGGGTGAACTGCACGGCTTGCCACAACTCGCGCTCCTGGAACGACTGGAGCCAGAGCCCGGCCCGCCGCGGGACGGCCTGGCACGGCATCCGGATGGCGCGAAACCTCAACAACCACTGGCTGACGCCGCTGCGCGATCAGTTCCCGCCGAACCGGCTCGGCGAGCTGGGTGACGCCCCGAAGGCCAACTGCGCGACGTGCCACCAGGGCGCGTACCGCCCCCTGCTCGGGCACCGCATGCTCGAGGACTTCCCGTCCCTCGTACGGGCGATGCCGCAGCCCGAGATCGAGCCGGAGCCGGAGCCGGAGCCCGAGCTGGAAGGCGAGGGCGAGGCCGGCGGGCAGCTCGAGCCGGAGGGGGAGGCGCCCGCCGCCGAAGCCCCCGAGGGCACGAACGCTGCGCCGACGGCGATGGCTGCGCCGGCGGCGATGGCCGCTCCGACGGGGATGGCCGCGCCGGCGGCGATGGCTGCGCCGGCGGCGATGGCTGCTCCGGCGGTGGCCGAGCCGACGCCCATGGCCGCGCCGGCGGCGATGGCGGCCCCGGCACCGAACTGA | 117640 | 118917 | -1 | puf | C | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 0 | 1277 | FALSE |
| pufM | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_98 | pufM_p_Myxococcota--c_Polyangia--o_Polyangiales--ERR1726576_bin.13---k141_102738_3 | 100 | 437 | 0 | 0 | 1 | 437 | 1 | 437 | 4.73E-308 | 834 | Houyibacteriaceae--LLY-WYZ-15_3 | k141_102864_98 | k141_102864 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864 | 98 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 2 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_98 # 118914 # 120224 # -1 # ID=85_98;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.704 | ATGGCCCGCTACCAGAACATCTTCACGCAGATCCAAGTCGTCGGTCCGCCGGACACGCCGCCGCCGATCGACCCGGACTTCCGTACGAAGAAGACGCGCATGTCGCGGCTCCTCGGGTGGTTCGGCAACCCGCAGATCGGCCCCGTCTACCTGGGCTACACCGGCCTGGCGTCCGCGATCAGCTTCTTCATCGCTTTCGAGATCATCGGGCTCAACATGCTGGCCTCGGTGGACTGGGACGTCGTTCAGTTCATCCGCCAGCTCCCCTGGCTCGCGCTCGAACCGCCCCCGCCCTCTGCCGGGCTCTCCATCCCGACGCTTCAGGAGGGCGGCTGGTGGCTCATGGCCGGCTTCTTCCTCACGGCGTCGGTCATTCTCTGGTGGATTCGCACCTATCGGCGCGCACGCGCCCTGAAGATGGGCACGCACGTCGCGTGGGCCTTCGCCTCGGCGATCTGGCTCTACCTCGTCCTCGGCTTCATTCGCCCCTTGCTGATGGGGAGCTGGGGGGAGGCGGTGCCCTTCGGCATCTTCCCGCACCTCGACTGGACCGCCGCCTTCTCCGTTCGCTACGGCAACCTCTTCTACAACCCCTTCCACTGCCTCTCGATCGTCTTCCTCTACGGGTCGACGCTCCTCTTCGCCATGCACGGCGCGACGGTGCTCGCGCTCGGGCACGTGGGCGGTGAGCGTGAGGTGAGCCAGGTGGTCGACCGCGGCACGGCGGCCGAGCGCGGGGCGCTCTTCTGGCGCTGGACGATGGGCTTCAACGCGACCTTCGAGTCCATCCACCGCTGGGCCTGGTGGTTCGCGGTGCTCACGCCGCTCACCGGAGGCATCGGCATCCTCCTGACCGGCACCGCCGTCGACAACTGGTATCAGTGGGCCGTCGAGCACGACTTCGCGCCGGCCTATGAGGAGTCCTACGAGGTCGTCCCCGACCCGGTCGACGACCCGGCGAACGAGGACCTGCCCGGTATGCGCGGTGAGTCCACCGCGCAGTGGGAGCCGACCCCCTACGTGCCCGCCGAGGAGCCGGAGGCGCCCGAGGATGGTGCGGACGGCGCGGCCGCGGTCGAAGGCGTCGACGCCGAGGGCGGCGAGGATGCCGCCGCGGATCCCGCGAGCGAGGGCACGAGCGGCCAGCCGGAGACCGGCGCCGCGGCCCCGGAGAGCGAGCGCCTTCCGGACGAAGCGGCGGCGGCCGAGCCCGAAGGGGCTGCGCCGGAGCCCGAACCCCCCGCGCCGTCCGAGACGGCTGCCCCGAGCGAACCCGAGGCGCCCAGCGCGATGACCCCGGAGCAACCGTGA | 118914 | 120224 | -1 | puf | M | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 1274 | 2584 | FALSE |
| pufL | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_99 | pufL_p_Myxococcota--c_Polyangia--o_Polyangiales--ERR1726567_bin.15---k141_184359_2 | 100 | 275 | 0 | 0 | 1 | 275 | 1 | 275 | 2.63E-214 | 583 | Houyibacteriaceae--LLY-WYZ-15_3 | k141_102864_99 | k141_102864 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864 | 99 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 3 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_99 # 120270 # 121094 # -1 # ID=85_99;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.648 | ATGGGCCTACTGAGCTTCGAGCGGCGATATCGAGTCCGAGGAGGCACGCTCCTCGGGGGCGACCTATTCGATTTCTGGGTCGGGCCCTTCTACGTGGGGCTCTTCGGCGTCACGACGATCTTCTTCACGATCGTCGGCACCGCGCTGATCCTCTGGGAGGCCTCCCGGGGTGACACCTGGAACCCCTGGCTGATCAACATCCAGCCGCCTCCAATCGAGTACGGGCTCGCCTTCGCGCCCCTCGATCAGGGGGGCATCTGGCAGCTGGTCACCATCTGCGCCATCGGCGCCTTCGGATCCTGGGCGCTCCGACAGGCGGAGATCAGCCGCAAGCTCGGCATGGGCTACCACGTGCCCATCGCCTACGGCGTCGCGGTCTTCGCCTACGTCACGCTCGTGGTGATTCGCCCGGTGATGCTGGGCGCCTGGGGCCACGGCTTCCCCTACGGCATCTTCAGCCACCTCGATTGGGTGTCGAACGTCGGGTACCAGTACCTGCACTTCCACTACAACCCGGCCCACATGATCGCGGTGAGCTTCTTCTTCACCACGACGCTCGCGCTCTCCCTCCACGGCGGTTTGATCCTCTCCGCCGTGAATCCGCCGAAGGGAGAGAAGGTGAAGACCGCCGAGTACGAGGACGGGTTCTTCCGTGACCACATCGGCTACTCGATCGGCGCCCTGGGCATTCATCGACTCGGCCTCTTCCTGGCGCTGAGCGCCGGGATCTGGAGCGCGATCTGCATTCTCATCAGCGGCCCGATGTGGACCAAGGGGTGGCCCGAGTGGTGGGACTGGTGGCTCAACCTCCCCGTGTGGAGCTGA | 120270 | 121094 | -1 | puf | L | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 2630 | 3454 | FALSE |
| bchO | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_100 | bchO_Pararhodospirillum_photometricum_RSPPHO_00117_RPM | 44.9 | 265 | 144 | 1 | 33 | 295 | 28 | 292 | 6.97E-60 | 194 | Houyibacteriaceae--LLY-WYZ-15_3 | k141_102864_100 | k141_102864 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864 | 100 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 4 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_100 # 121191 # 122102 # -1 # ID=85_100;partial=00;start_type=ATG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.762 | ATGAGCTCGGCCGTCGAAGAGCAGCGCGTCGAGCACCCGCGGGTCGAGCAGCAGCCCATCGAGCAGCAGCGCGTCGAGCACCAGCGCGTCGAGCGTTCGGGCGTGCGGTGGAACGTCGCCCGCCGCGGCGCCGGACCCACGCTCCTGGCGCTCCACGGGACCGGCAGCTCGAGCCGCTCCTTCTGCGCCCTCGCGGCCACGCTCGGTGCTCGCTTCACCGTCGTGGCGCCCGATCTACCCGGCCACGCCGGGAGCCGGATCGATCGCCGCTTCCGCCTCTCGCTCCCCTCGATCGCCGCCGCCCTCGGCGAGCTCATCGAGGCGCTCGCCGTCCAGCCGGCGCTGGTCCTCGCTCACTCCGCGGGCGCGGCGGTGGCGGCGCGCGCCATGCTCGACGGGGCTCTCCGCCCGGCGCTCTTCGTCGGGCTCGGCGCGGCCCTGACGCCCCTCGAGGGGCTCGCCCGGCTCGGCGCGCGCCCGGCGGCCGCGATGCTCGCCCGCTCGCCCATCACGCGGCGGGTGGCGCGCCGGGCTGGAGGCGCCCTCGTCGGACCGATCCTGCGCAGCGTCGGATCCACCGTCGGCCCCGAGGCCACACAGCGCTATCGGGAGCTCGCCCGCGATCCCGCCCACGTCGGGGCGGTCTTCTCGATGCTCGCCCAGTGGGATCTCGACGGGCTCCACGCGGCGCTACCACGCCTGGACGTACCGACCCTGCTCCTCGGCGGCGCCCGCGACGGCGCCACCCCGATCGCCCAGCAGCGCGCCCTCGCACGTCGCCTCCCGGCCGCGCGCGCGCACGTCGTCCTCGGCGCCGGGCACCTGCTCCACGAGGAGCGACCCGCCGAGATCGCGCGCCTCGTCGAGGCCGAGTGGAACAGATTGGACGGCGGTCGTGTCAAAAATGCTTGA | 121191 | 122102 | -1 | bch | O | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 3551 | 4462 | FALSE |
| bchD | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_101 | bchD_p_Myxococcota--c_Polyangia--o_Polyangiales--GCA_002699025.1---PABA01000098.1_81 | 100 | 587 | 0 | 0 | 1 | 587 | 1 | 587 | 0 | 1064 | Houyibacteriaceae--LLY-WYZ-15_3 | k141_102864_101 | k141_102864 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864 | 101 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 5 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_101 # 122099 # 123859 # -1 # ID=85_101;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.792 | ATGAGCGGCTGGCCCGACGTGGCGCGCGTCGCCGAGCTCCTGAGCGTCGACCCGGACGGCCTCGGAGGCGTGCGCCTGCGGGGTCGCCCGGGGCCGCACCGGCGCCGGGTGCTCGAGTGGGTGCGCGAGAGGCTGGCCCCGGAGGCGCCCTTCCGGCGCCTGCCCGCGCACGTGACCGAGGATCGGCTCCTCGGGGGCCTCGCGCTCGCGGAGACCTTGCGTTCGGGGCGGGCCGTCATGGAGCAGGGCGTGCTCGCGCGGAGCGACGGCGGCCTGCTCGTCGTGGCCATGGCCGAGCGGGCCGAGCGGGAGGTCGTGGCGCACCTCTGCGCGGCCCTCGACCGCGGCGCGATCACCGTCGAACGCGACGGCATGAGCGCCGAGGCGTCCTGCCGCGTGGGCCTCATCGCGCTCGACGAGGGCATCGACGAGGAGCACGTCGACCCGGCGCTCGCCGACCGGCTCGCCTTCGCGCTGGACCTCGACGCGCTCGATCCGCGGGGAGGGGCGGCGCCGGAACACGGACCCGAGGAGGTCGCGCGAGCCCGCGCCCGCCTCCCGCACGTGAGCCTCGGCGACGACATCATCGCGGCCCTCTCGGAGGCGGCCCAGGCCCTCGGCGTGGAGGCGCTCCGGCCGCTCCTGCTCGCGGCGAAGGCGGCCCGCGCGCACGCGGCGCTCCTCGGCCGGACCCGCGTCGAGGAGGAAGACGCCGGGATGGCGGCGCGCCTCGTCCTCGGCCCGAGGGCGACGCGAGCGCCGAGCGCCGAGCCCGAAGAGGCGGCCGAGCGCGAGGCCGAAGAGGGCGACCCCGACCCGGGAGGCGCCGGCGCGGCTGCAGCCGGCGAACGGGCGGACGGCGCCGACGAGGCCCCGCCGGGCGAGGTCCCGCTCGGCGATCTCGTCTTGGCGGCGGCCGAGAGCGGCATCCCGGCGGGGCTGCTCGACGCCCTCGACGTCGGGACCACCCGGCGGGCCGGCGCGACCGGTCGGAGCGGGGCGACGCGCATCGGCCCGAGCGGCGGCCGCCCGGCGGGGACGCGCGCCGCGCCGCCCACCCGAGGCCAGCGCCTGAACGTCGTCGAGACCCTCCGCGCCGCCGCGCCCTGGCAGCGGCTCCGCGGGGGCGGCTTCGGCGCGGGCGTGCGCGTCCGGCCGGAGGACTTCCGTGTCACCCGTCACCGGCAGCCGATCGAGAGCTGCGTGATCTTCGCCGTCGACGCGTCCGGCTCCGCCGCGCTTCGACGCCTGGCCGAGGCGAAGGGCGCCGTCGAGCGCGTGCTCGGCGACTGCTACGTGCGGCGCGACCACGTCGCCCTCGTCGCGTTCCGCCAGGACGGCGCCGAGCTGCTCCTGCCCCCGACGCGCTCCCTCGCCCGCGTGCGTCGCAGCCTGGCTGCCCTCGCCGGCGGCGGCGCGACCCCCCTCGCCGCGGGGATCGACGCCGCCCATCGGCTCGCCCTCGACGCCCGCGGGCGCGGCCGCGAGCCCATCGTGGTCGTCATGACCGACGGGCGGGCGAACGTGACCCGGGACGGCCGCCGGGACCCCGCGGTCGCCACCACGGACGCCCTCGAGAGCGCGCGCGGGCTCCAGCGAGCCGCCGTGCCGACCCTCTTCCTCGACACGGCCCCACGCCCCCGGCGCCGTGCCCGCGAGCTCGCCGAGGCCATGGACGCCCGCTACCTGCCGCTGCCCTACCTCGACGCGGCGGGGATCTCACGCCACGTCCAAGCGCTCGCCCGCGAGGGAGCCCGATGA | 122099 | 123859 | -1 | bch | D | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 4459 | 6219 | FALSE |
| bchI | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_102 | bchI_p_Myxococcota--c_Polyangia--o_Polyangiales--GCA_002699025.1---PABA01000098.1_82 | 100 | 339 | 0 | 0 | 1 | 339 | 1 | 339 | 1.97E-239 | 652 | Houyibacteriaceae--LLY-WYZ-15_3 | k141_102864_102 | k141_102864 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864 | 102 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 6 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_102 # 123863 # 124879 # -1 # ID=85_102;partial=00;start_type=ATG;rbs_motif=GGA/GAG/AGG;rbs_spacer=5-10bp;gc_cont=0.745 | ATGACGCCCTATCCCTTCACCGCCATCGTCGCGCAGGACGAGCTCAAGCTCGCCCTGCAGATCGCCACCGTCGACCGCAGCATCGGCGGGGTCCTCGCCTTCGGCGACCGCGGCACCGGCAAGTCGACCACCATCCGCGCGCTCGCCCGGCTCCTGCCGCCGATGCGCGTCGTCGCCAGCTGCCCGTACCACTGTGATCCGGCCGACGCGCGCGCTCGCTGTCCGCACTGTGCCGAAGCCGCAGGGGAGCGGGAGGCGATCGAGACGCCCGTGCCGGTCGTGGACCTGCCCCTCGGCGCCACCGAGGATCGCGTCGTCGGCGCGCTCGATCTCGAGGCGGCCCTCACGCGCGGGGAGCGCCGCTTCTCACCGGGCCTGCTCGCCGCGGCGCATCGAGGCTTCCTCTACATCGACGAGGTCAACCTCCTCCCCGATCACCTCGTGGATCTGCTGCTCGACGTCGCGGCCTCGGGCGAGAACGTGGTCGAGCGCGAGGGCCTGAGCGTGCGCCACCCCGCGCGCTTCGTGCTGATCGGCAGCGGAAACCCGGAGGAGGGCGAGCTGCGCCCCCAGCTGCTCGATCGCTTCGGCCTCTCGCTCGAGGTCCGCACGCCGGACGAGGTCGCGACGCGCGTCGAGGTCGTCAAGCGGCGCATGCGCTACGATCAGGACCCGGAGGCCTTCGCGGCCGCCTGGGCGGAGGACGAGGCGGCCCTCATCGTTCGCCTCCGGGACGCGCGGGCGCGCTTGCCCGAGGTGGCCGTCAGCGACGCCGTGATCGAGCGCGCGAGCCGGCTCTGCCAGGCGCTCGGCACCGACGGGCTCCGGGGGGAGCTGACCTTGATCCGCGCCGCGCGCGCGGCCGCCAGCCTCGACGCGCAGCGGGAGGTCGCCGACGTGCACCTCGCCCAGGTCGCCCCCCTCGCGCTCCGCCACCGGCTGCGACGCGCCCCCCTGGACGACGTCGGCTCGGGCGCGCGCGTGCAGAAGGCCGTCGAGGACGTGCTCGGGGGCTGA | 123863 | 124879 | -1 | bch | I | Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1 | 6223 | 7239 | FALSE |
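
Several columns in this table can be read off the `qaccver` naming pattern `genome---contig_orfnumber`, and the `Pstart`/`Pend` values in the preview equal `start`/`end` shifted so that the first gene of the cluster begins at 0. A sketch of both relationships in base R, inferred from the rows above rather than taken from the package source:

```r
qaccver <- c("Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_97",
             "Houyibacteriaceae--LLY-WYZ-15_3---k141_102864_98")

genome       <- sub("---.*$", "", qaccver)        # text before the "---" separator
orf          <- sub("^.*---", "", qaccver)        # text after the "---" separator
contig       <- sub("_[0-9]+$", "", orf)          # drop the trailing ORF number
orf_position <- as.integer(sub("^.*_", "", orf))  # trailing ORF number

parsed <- data.frame(genome, orf, contig, orf_position, stringsAsFactors = FALSE)
parsed$genome_contig <- paste(parsed$genome, parsed$contig, sep = "---")

# Plot coordinates relative to the first gene of the cluster
start  <- c(117640, 118914)
end    <- c(118917, 120224)
Pstart <- start - min(start)
Pend   <- end - min(start)
```

For these two rows the result reproduces the preview: `orf_position` 97 and 98, `Pstart` 0 and 1274, `Pend` 1277 and 2584.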

#### 2.2 Gene cluster sequence (`GC_seq`)
```
>pufC_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
GTGAAGAAGATCGCCATCGCCTTCGTGAGCACCTGGCTCCTCATCGGGGCCGTCTACGCCTACGAGCCGACCGAGACCTCGCAGATCGGCGCCGACGGCGTCGCCATGCAGGTCACGCAGACCGAGGACGAGCTCGCCGCGCGCGTGGAGGCGAACACCGTCCCGCCGGCCATCCCGATGCCCCAGAGCAGCGGCGTGCTGGCGGCCGAGGAGTACGAGAACGTGCAGGTCCTCGGCCACCTCAACACGGCCCAGTTCACCCGGCTGATGACCTCCATCACGCTCTGGGTCGCGCCGGAGCAGGGCTGCGCCTACTGCCACAACACGAACAACCTGGCCTCCGACGAGCTCTACACGAAGCGCGTGGCGCGTCGGATGATCCAGATGACCTGGCACATCAACGAGAACTGGCAGTCGCACGTCCAGGAGACCGGCGTGACCTGCTACACGTGCCACCGCGGCAACAACGTGCCCCAGCACATCTGGTTCGAGACGCCGCCCGACGACCACGGCATGGTGGGCTGGCGTGGCTCGCAGAACGCCCCGAACGACCGGACGGGGATCAGCTCCCTGCCGAACGACGTGTTCGAGGTGTTCCTCGAGGAGGACGCGAGCATCCGGGTCCAGTCGGCCGGGGAGGCCTTCCCGAACGAGAACCGCGCGTCCATCAAGCAGGCCGAGTGGACCTATGGGCTGATGATGCACTTCTCCGAGTCGCTCGGGGTGAACTGCACGGCTTGCCACAACTCGCGCTCCTGGAACGACTGGAGCCAGAGCCCGGCCCGCCGCGGGACGGCCTGGCACGGCATCCGGATGGCGCGAAACCTCAACAACCACTGGCTGACGCCGCTGCGCGATCAGTTCCCGCCGAACCGGCTCGGCGAGCTGGGTGACGCCCCGAAGGCCAACTGCGCGACGTGCCACCAGGGCGCGTACCGCCCCCTGCTCGGGCACCGCATGCTCGAGGACTTCCCGTCCCTCGTACGGGCGATGCCGCAGCCCGAGATCGAGCCGGAGCCGGAGCCGGAGCCCGAGCTGGAAGGCGAGGGCGAGGCCGGCGGGCAGCTCGAGCCGGAGGGGGAGGCGCCCGCCGCCGAAGCCCCCGAGGGCACGAACGCTGCGCCGACGGCGATGGCTGCGCCGGCGGCGATGGCCGCTCCGACGGGGATGGCCGCGCCGGCGGCGATGGCTGCGCCGGCGGCGATGGCTGCTCCGGCGGTGGCCGAGCCGACGCCCATGGCCGCGCCGGCGGCGATGGCGGCCCCGGCACCGAACTGA
>pufM_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGGCCCGCTACCAGAACATCTTCACGCAGATCCAAGTCGTCGGTCCGCCGGACACGCCGCCGCCGATCGACCCGGACTTCCGTACGAAGAAGACGCGCATGTCGCGGCTCCTCGGGTGGTTCGGCAACCCGCAGATCGGCCCCGTCTACCTGGGCTACACCGGCCTGGCGTCCGCGATCAGCTTCTTCATCGCTTTCGAGATCATCGGGCTCAACATGCTGGCCTCGGTGGACTGGGACGTCGTTCAGTTCATCCGCCAGCTCCCCTGGCTCGCGCTCGAACCGCCCCCGCCCTCTGCCGGGCTCTCCATCCCGACGCTTCAGGAGGGCGGCTGGTGGCTCATGGCCGGCTTCTTCCTCACGGCGTCGGTCATTCTCTGGTGGATTCGCACCTATCGGCGCGCACGCGCCCTGAAGATGGGCACGCACGTCGCGTGGGCCTTCGCCTCGGCGATCTGGCTCTACCTCGTCCTCGGCTTCATTCGCCCCTTGCTGATGGGGAGCTGGGGGGAGGCGGTGCCCTTCGGCATCTTCCCGCACCTCGACTGGACCGCCGCCTTCTCCGTTCGCTACGGCAACCTCTTCTACAACCCCTTCCACTGCCTCTCGATCGTCTTCCTCTACGGGTCGACGCTCCTCTTCGCCATGCACGGCGCGACGGTGCTCGCGCTCGGGCACGTGGGCGGTGAGCGTGAGGTGAGCCAGGTGGTCGACCGCGGCACGGCGGCCGAGCGCGGGGCGCTCTTCTGGCGCTGGACGATGGGCTTCAACGCGACCTTCGAGTCCATCCACCGCTGGGCCTGGTGGTTCGCGGTGCTCACGCCGCTCACCGGAGGCATCGGCATCCTCCTGACCGGCACCGCCGTCGACAACTGGTATCAGTGGGCCGTCGAGCACGACTTCGCGCCGGCCTATGAGGAGTCCTACGAGGTCGTCCCCGACCCGGTCGACGACCCGGCGAACGAGGACCTGCCCGGTATGCGCGGTGAGTCCACCGCGCAGTGGGAGCCGACCCCCTACGTGCCCGCCGAGGAGCCGGAGGCGCCCGAGGATGGTGCGGACGGCGCGGCCGCGGTCGAAGGCGTCGACGCCGAGGGCGGCGAGGATGCCGCCGCGGATCCCGCGAGCGAGGGCACGAGCGGCCAGCCGGAGACCGGCGCCGCGGCCCCGGAGAGCGAGCGCCTTCCGGACGAAGCGGCGGCGGCCGAGCCCGAAGGGGCTGCGCCGGAGCCCGAACCCCCCGCGCCGTCCGAGACGGCTGCCCCGAGCGAACCCGAGGCGCCCAGCGCGATGACCCCGGAGCAACCGTGA
>pufL_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGGGCCTACTGAGCTTCGAGCGGCGATATCGAGTCCGAGGAGGCACGCTCCTCGGGGGCGACCTATTCGATTTCTGGGTCGGGCCCTTCTACGTGGGGCTCTTCGGCGTCACGACGATCTTCTTCACGATCGTCGGCACCGCGCTGATCCTCTGGGAGGCCTCCCGGGGTGACACCTGGAACCCCTGGCTGATCAACATCCAGCCGCCTCCAATCGAGTACGGGCTCGCCTTCGCGCCCCTCGATCAGGGGGGCATCTGGCAGCTGGTCACCATCTGCGCCATCGGCGCCTTCGGATCCTGGGCGCTCCGACAGGCGGAGATCAGCCGCAAGCTCGGCATGGGCTACCACGTGCCCATCGCCTACGGCGTCGCGGTCTTCGCCTACGTCACGCTCGTGGTGATTCGCCCGGTGATGCTGGGCGCCTGGGGCCACGGCTTCCCCTACGGCATCTTCAGCCACCTCGATTGGGTGTCGAACGTCGGGTACCAGTACCTGCACTTCCACTACAACCCGGCCCACATGATCGCGGTGAGCTTCTTCTTCACCACGACGCTCGCGCTCTCCCTCCACGGCGGTTTGATCCTCTCCGCCGTGAATCCGCCGAAGGGAGAGAAGGTGAAGACCGCCGAGTACGAGGACGGGTTCTTCCGTGACCACATCGGCTACTCGATCGGCGCCCTGGGCATTCATCGACTCGGCCTCTTCCTGGCGCTGAGCGCCGGGATCTGGAGCGCGATCTGCATTCTCATCAGCGGCCCGATGTGGACCAAGGGGTGGCCCGAGTGGTGGGACTGGTGGCTCAACCTCCCCGTGTGGAGCTGA
>bchO_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGAGCTCGGCCGTCGAAGAGCAGCGCGTCGAGCACCCGCGGGTCGAGCAGCAGCCCATCGAGCAGCAGCGCGTCGAGCACCAGCGCGTCGAGCGTTCGGGCGTGCGGTGGAACGTCGCCCGCCGCGGCGCCGGACCCACGCTCCTGGCGCTCCACGGGACCGGCAGCTCGAGCCGCTCCTTCTGCGCCCTCGCGGCCACGCTCGGTGCTCGCTTCACCGTCGTGGCGCCCGATCTACCCGGCCACGCCGGGAGCCGGATCGATCGCCGCTTCCGCCTCTCGCTCCCCTCGATCGCCGCCGCCCTCGGCGAGCTCATCGAGGCGCTCGCCGTCCAGCCGGCGCTGGTCCTCGCTCACTCCGCGGGCGCGGCGGTGGCGGCGCGCGCCATGCTCGACGGGGCTCTCCGCCCGGCGCTCTTCGTCGGGCTCGGCGCGGCCCTGACGCCCCTCGAGGGGCTCGCCCGGCTCGGCGCGCGCCCGGCGGCCGCGATGCTCGCCCGCTCGCCCATCACGCGGCGGGTGGCGCGCCGGGCTGGAGGCGCCCTCGTCGGACCGATCCTGCGCAGCGTCGGATCCACCGTCGGCCCCGAGGCCACACAGCGCTATCGGGAGCTCGCCCGCGATCCCGCCCACGTCGGGGCGGTCTTCTCGATGCTCGCCCAGTGGGATCTCGACGGGCTCCACGCGGCGCTACCACGCCTGGACGTACCGACCCTGCTCCTCGGCGGCGCCCGCGACGGCGCCACCCCGATCGCCCAGCAGCGCGCCCTCGCACGTCGCCTCCCGGCCGCGCGCGCGCACGTCGTCCTCGGCGCCGGGCACCTGCTCCACGAGGAGCGACCCGCCGAGATCGCGCGCCTCGTCGAGGCCGAGTGGAACAGATTGGACGGCGGTCGTGTCAAAAATGCTTGA
>bchD_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGAGCGGCTGGCCCGACGTGGCGCGCGTCGCCGAGCTCCTGAGCGTCGACCCGGACGGCCTCGGAGGCGTGCGCCTGCGGGGTCGCCCGGGGCCGCACCGGCGCCGGGTGCTCGAGTGGGTGCGCGAGAGGCTGGCCCCGGAGGCGCCCTTCCGGCGCCTGCCCGCGCACGTGACCGAGGATCGGCTCCTCGGGGGCCTCGCGCTCGCGGAGACCTTGCGTTCGGGGCGGGCCGTCATGGAGCAGGGCGTGCTCGCGCGGAGCGACGGCGGCCTGCTCGTCGTGGCCATGGCCGAGCGGGCCGAGCGGGAGGTCGTGGCGCACCTCTGCGCGGCCCTCGACCGCGGCGCGATCACCGTCGAACGCGACGGCATGAGCGCCGAGGCGTCCTGCCGCGTGGGCCTCATCGCGCTCGACGAGGGCATCGACGAGGAGCACGTCGACCCGGCGCTCGCCGACCGGCTCGCCTTCGCGCTGGACCTCGACGCGCTCGATCCGCGGGGAGGGGCGGCGCCGGAACACGGACCCGAGGAGGTCGCGCGAGCCCGCGCCCGCCTCCCGCACGTGAGCCTCGGCGACGACATCATCGCGGCCCTCTCGGAGGCGGCCCAGGCCCTCGGCGTGGAGGCGCTCCGGCCGCTCCTGCTCGCGGCGAAGGCGGCCCGCGCGCACGCGGCGCTCCTCGGCCGGACCCGCGTCGAGGAGGAAGACGCCGGGATGGCGGCGCGCCTCGTCCTCGGCCCGAGGGCGACGCGAGCGCCGAGCGCCGAGCCCGAAGAGGCGGCCGAGCGCGAGGCCGAAGAGGGCGACCCCGACCCGGGAGGCGCCGGCGCGGCTGCAGCCGGCGAACGGGCGGACGGCGCCGACGAGGCCCCGCCGGGCGAGGTCCCGCTCGGCGATCTCGTCTTGGCGGCGGCCGAGAGCGGCATCCCGGCGGGGCTGCTCGACGCCCTCGACGTCGGGACCACCCGGCGGGCCGGCGCGACCGGTCGGAGCGGGGCGACGCGCATCGGCCCGAGCGGCGGCCGCCCGGCGGGGACGCGCGCCGCGCCGCCCACCCGAGGCCAGCGCCTGAACGTCGTCGAGACCCTCCGCGCCGCCGCGCCCTGGCAGCGGCTCCGCGGGGGCGGCTTCGGCGCGGGCGTGCGCGTCCGGCCGGAGGACTTCCGTGTCACCCGTCACCGGCAGCCGATCGAGAGCTGCGTGATCTTCGCCGTCGACGCGTCCGGCTCCGCCGCGCTTCGACGCCTGGCCGAGGCGAAGGGCGCCGTCGAGCGCGTGCTCGGCGACTGCTACGTGCGGCGCGACCACGTCGCCCTCGTCGCGTTCCGCCAGGACGGCGCCGAGCTGCTCCTGCCCCCGACGCGCTCCCTCGCCCGCGTGCGTCGCAGCCTGGCTGCCCTCGCCGGCGGCGGCGCGACCCCCCTCGCCGCGGGGATCGACGCCGCCCATCGGCTCGCCCTCGACGCCCGCGGGCGCGGCCGCGAGCCCATCGTGGTCGTCATGACCGACGGGCGGGCGAACGTGACCCGGGACGGCCGCCGGGACCCCGCGGTCGCCACCACGGACGCCCTCGAGAGCGCGCGCGGGCTCCAGCGAGCCGCCGTGCCGACCCTCTTCCTCGACACGGCCCCACGCCCCCGGCGCCGTGCCCGCGAGCTCGCCGAGGCCATGGACGCCCGCTACCTGCCGCTGCCCTACCTCGACGCGGCGGGGATCTCACGCCACGTCCAAGCGCTCGCCCGCGAGGGAGCCCGATGA
>bchI_Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1
ATGACGCCCTATCCCTTCACCGCCATCGTCGCGCAGGACGAGCTCAAGCTCGCCCTGCAGATCGCCACCGTCGACCGCAGCATCGGCGGGGTCCTCGCCTTCGGCGACCGCGGCACCGGCAAGTCGACCACCATCCGCGCGCTCGCCCGGCTCCTGCCGCCGATGCGCGTCGTCGCCAGCTGCCCGTACCACTGTGATCCGGCCGACGCGCGCGCTCGCTGTCCGCACTGTGCCGAAGCCGCAGGGGAGCGGGAGGCGATCGAGACGCCCGTGCCGGTCGTGGACCTGCCCCTCGGCGCCACCGAGGATCGCGTCGTCGGCGCGCTCGATCTCGAGGCGGCCCTCACGCGCGGGGAGCGCCGCTTCTCACCGGGCCTGCTCGCCGCGGCGCATCGAGGCTTCCTCTACATCGACGAGGTCAACCTCCTCCCCGATCACCTCGTGGATCTGCTGCTCGACGTCGCGGCCTCGGGCGAGAACGTGGTCGAGCGCGAGGGCCTGAGCGTGCGCCACCCCGCGCGCTTCGTGCTGATCGGCAGCGGAAACCCGGAGGAGGGCGAGCTGCGCCCCCAGCTGCTCGATCGCTTCGGCCTCTCGCTCGAGGTCCGCACGCCGGACGAGGTCGCGACGCGCGTCGAGGTCGTCAAGCGGCGCATGCGCTACGATCAGGACCCGGAGGCCTTCGCGGCCGCCTGGGCGGAGGACGAGGCGGCCCTCATCGTTCGCCTCCGGGACGCGCGGGCGCGCTTGCCCGAGGTGGCCGTCAGCGACGCCGTGATCGAGCGCGCGAGCCGGCTCTGCCAGGCGCTCGGCACCGACGGGCTCCGGGGGGAGCTGACCTTGATCCGCGCCGCGCGCGCGGCCGCCAGCCTCGACGCGCAGCGGGAGGTCGCCGACGTGCACCTCGCCCAGGTCGCCCCCCTCGCGCTCCGCCACCGGCTGCGACGCGCCCCCCTGGACGACGTCGGCTCGGGCGCGCGCGTGCAGAAGGCCGTCGAGGACGTGCTCGGGGGCTGA
```
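
Each FASTA header above is the `gene` value joined to the `gene_cluster` identifier with an underscore. A sketch of rebuilding such records from `GC_meta`-style columns and writing them to a file; the two-row data frame and the output path are placeholders, and the sequences are truncated for brevity:

```r
meta <- data.frame(
  gene         = c("pufC", "pufM"),
  gene_cluster = "Houyibacteriaceae--LLY-WYZ-15_3---k141_102864---1",
  Sequence     = c("GTGAAGAAGATCGCC", "ATGGCCCGCTACCAG"),  # truncated placeholders
  stringsAsFactors = FALSE
)

headers <- paste0(">", meta$gene, "_", meta$gene_cluster)

# Interleave header and sequence lines: header 1, sequence 1, header 2, ...
fasta_lines <- as.vector(rbind(headers, meta$Sequence))

out_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, out_file)
```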

#### 2.3 Gene cluster plot (`GC_plot`)
<img width="6000" height="900" alt="gc_plot case1" src="https://github.com/user-attachments/assets/37a15149-d86a-4d5c-a2bd-b39b69deb863" />



## Case 2: Using eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) format result
```r
# Case 2: Using eggNOG result with Full pipeline (Find Cluster + Extract FASTA + Plot Cluster)
library(gclink)
data(eggnog_df)
data(seq_data)
data(KO_group)
KOs = c("K02291","K09844","K20611","K13789",
        "K09846","K08926","K08927","K08928",
        "K08929","K13991","K04035","K04039",
        "K11337","K03404","K11336","K04040",
        "K03403","K03405","K04037","K03428",
        "K04038","K06049","K10960","K11333",
        "K11334","K11335","K08226","K08226",
        "K09773")
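# KEGG_ko values in eggNOG output carry a "ko:" prefix, so add it to the candidate list
rename_KOs = paste0("ko:", KOs)
# Map eggNOG columns onto the BLAST-style column names used in Case 1
eggnog_df$qaccver = eggnog_df$`#query`
eggnog_df$saccver = eggnog_df$KEGG_ko
eggnog_df$bitscore = eggnog_df$score
eggnog_df$gene = eggnog_df$KEGG_ko
# eggnog_df$evalue already has the expected name, so no renaming is needed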
gc_list_2 = gclink(in_blastp_df = eggnog_df,
                  in_seq_data = seq_data,
                  in_gene_list = rename_KOs,
                  in_GC_group  = KO_group,
                  AllGeneNum = 50,
                  MinConSeq  = 25,
                  apply_evalue_filter = FALSE,
                  min_evalue = 1,
                  apply_score_filter = TRUE,
                  min_score = 10,
                  orf_before_first = 1,
                  orf_after_last = 1,
                  levels_gene_group = c('bch','puh','puf','crt',
                                        'acsF','assembly','hypothetical ORF'),
                  color_theme = c('#3BAA51','#6495ED','#DD2421','#EF9320',
                                  '#F8EB00','#FF0683','grey'))
gc_meta_2 = gc_list_2[["GC_meta"]]
gc_seq_2 = gc_list_2[["GC_seq"]]
gc_plot_2 = gc_list_2[["GC_plot"]]
head(gc_meta_2)   # Cluster metadata
head(gc_seq_2)    # FASTA sequences
print(gc_plot_2)  # Visualization
```

### 1 Input Data Preview
#### 1.1 A dataframe of Diamond BLASTp output from eggNOG (e.g., head(`eggnog_df`))
| #query | seed_ortholog | evalue | score | eggNOG_OGs | max_annot_lvl | COG_category | Description | Preferred_name | GOs | EC | KEGG_ko | KEGG_Pathway | KEGG_Module | KEGG_Reaction | KEGG_rclass | BRITE | KEGG_TC | CAZy | BiGG_Reaction | PFAMs |
|--------|--------------|--------|-------|------------|---------------|--------------|-------------|---------------|-----|----|---------|--------------|-------------|---------------|-------------|-------|---------|------|---------------|-------|
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_1 | 439375.Oant_2732 | 1.57E-45 | 162 | COG3293@1\|root,COG3293@2\|Bacteria,1PVIT@1224\|Proteobacteria,2TURP@28211\|Alphaproteobacteria,1J3RT@118882\|Brucellaceae | 28211\|Alphaproteobacteria | L | Transposase DDE domain | - | - | - | ko:K07492 | - | - | - | - | ko00000 | - | - | - | DDE_Tnp_1,DDE_Tnp_1_2,DUF4096 |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_2 | 1173264.KI913949_gene2450 | 3.58E-17 | 83.6 | COG3335@1\|root,COG3415@1\|root,COG3335@2\|Bacteria,COG3415@2\|Bacteria,1G39S@1117\|Cyanobacteria,1HCKE@1150\|Oscillatoriales | 1117\|Cyanobacteria | L | COGs COG3415 Transposase and inactivated derivatives | - | - | - | ko:K07494 | - | - | - | - | ko00000 | - | - | - | DDE_3,HTH_32,HTH_Tnp_IS630 |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_3 | 794903.OPIT5_03400 | 3.03E-30 | 114 | COG3335@1\|root,COG3335@2\|Bacteria | 2\|Bacteria | L | DDE superfamily endonuclease | - | - | - | ko:K07494 | - | - | - | - | ko00000 | - | - | - | DDE_3,HTH_Tnp_IS630 |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_5 | 502025.Hoch_2790 | 2.78E-50 | 191 | 2AY84@1\|root,31QA9@2\|Bacteria,1QMYF@1224\|Proteobacteria,4374U@68525\|delta/epsilon subdivisions,2X20E@28221\|Deltaproteobacteria,2YWTZ@29\|Myxococcales | 28221\|Deltaproteobacteria | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_11 | 105420.BBPO01000003_gene1121 | 2.00E-11 | 72.8 | COG2887@1\|root,COG2887@2\|Bacteria,2GJC5@201174\|Actinobacteria,2NGJC@228398\|Streptacidiphilus | 201174\|Actinobacteria | L | Protein of unknown function (DUF2800) | recB | - | - | ko:K07465 | - | - | - | - | ko00000 | - | - | - | PDDEXK_1 |
| Kuafuiibacteriaceae--GCA_016703535.1---JADJBV010000001.1_12 | 1122915.AUGY01000071_gene4398 | 2.13E-37 | 152 | COG1201@1\|root,COG1201@2\|Bacteria,1UHYQ@1239\|Firmicutes,4ISB0@91061\|Bacilli,277Q5@186822\|Paenibacillaceae | 91061\|Bacilli | L | helicase superfamily c-terminal domain | - | - | - | - | - | - | - | - | - | - | - | - | DUF1998,Helicase_C |
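
eggNOG-mapper writes its annotations as a tab-delimited file whose header line starts with `#query` and which typically carries `##` comment lines before and after the table. A sketch of loading such a file while keeping the original column names; the content below is a shortened stand-in with only four of the columns, and `my_eggnog_df` is a placeholder name:

```r
# Stand-in for a real annotations file (hypothetical content)
annot_file <- tempfile(fileext = ".annotations")
writeLines(c(
  "## example comment line",
  paste("#query", "evalue", "score", "KEGG_ko", sep = "\t"),
  paste("genomeA---contig1_1", "1.57e-45", "162", "ko:K07492", sep = "\t"),
  paste("genomeA---contig1_2", "2.78e-50", "191", "-", sep = "\t"),
  "## 2 queries scanned"
), annot_file)

lines <- readLines(annot_file)
lines <- lines[!startsWith(lines, "##")]  # drop comments, keep the "#query" header

# check.names = FALSE keeps "#query" as is; comment.char = "" stops "#" being a comment
my_eggnog_df <- read.delim(text = lines, check.names = FALSE, comment.char = "",
                           quote = "", stringsAsFactors = FALSE)
names(my_eggnog_df)
```

Keeping `#query` unmodified matters because the Case 2 code refers to the column as `` eggnog_df$`#query` ``.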

#### 1.2 (Optional) A dataframe with SeqName (ORF identifier, Prodigal format: `ORF_id # start # end # strand # ...`) and Sequence (e.g., head(`seq_data`))
Same as in Case 1.

#### 1.3 (Optional) KO/gene group (e.g., head(`KO_group`))
| gene       | gene_group | gene_label |
|------------|------------|------------|
| ko:K04035  | acsF       | acsF       |
| ko:K08226  | assembly   | bch2       |
| ko:K04039  | bch        | B          |
| ko:K11337  | bch        | C          |
| ko:K03404  | bch        | D          |
| ko:K11336  | bch        | F          |

#### 1.4 (Optional) Candidate KO/gene list
`ko:K04035`, `ko:K08226`, `ko:K04039`, `ko:K11337`, `ko:K03404`, `ko:K11336`

### 2 Output Data Preview
#### 2.1 Gene cluster information (`GC_meta`)
Similar to Case 1.
#### 2.2 Gene cluster sequence (`GC_seq`)
Similar to Case 1.
#### 2.3 Gene cluster plot (`GC_plot`)
<img width="6000" height="900" alt="gc_plot case2" src="https://github.com/user-attachments/assets/19982c2f-b235-41d9-8d49-03fde3e4ba2c" />


## Documentation

Full function reference:
```r
?gclink::gclink
```

## Citation

If you use `gclink` in your research, please cite:

> Li, L., Huang, D., Hu, Y., Rudling, N. M., Canniffe, D. P., Wang, F., & Wang, Y.
> "Globally distributed Myxococcota with photosynthesis gene clusters illuminate the origin and evolution of a potentially chimeric lifestyle."
> *Nature Communications* (2023), 14, 6450.
> https://doi.org/10.1038/s41467-023-42193-7

## Dependencies

- R (≥ 3.5)
- dplyr (≥ 1.1.4)
- gggenes (≥ 0.5.1)
- ggplot2 (≥ 3.5.2)

## License

GPL-3 © [Liuyang Li](https://orcid.org/0000-0001-6004-9437)

## Contact

- Maintainer: Liuyang Li <cyanobacteria@yeah.net>
- Bug reports: https://github.com/LiuyangLee/gclink/issues
