This dataset contains PCHiC interactions for human and mouse, obtained after processing with HiCUP and CHiCAGO.

There are two sub-folders, one for each species. In each sub-folder, you will find the following files:

1) bait_coords_${assembly}.txt (where ${assembly} is hg38 for human and mm10 for mouse).
This file contains the coordinates and TSS annotations (+/-1kb) of the baited fragments. There are six tab-separated columns: ID, chromosome, start, end, TSS_ID, gene_ID.

2) frag_coords_${assembly}.txt (where ${assembly} is hg38 for human and mm10 for mouse).
This file contains the coordinates of all restriction fragments, baited or not. There are four tab-separated columns: chromosome, start, end, ID.

3) all_interactions.txt
This file contains the PCHiC interactions, combined across all available samples. There are several tab-separated columns, starting with the following ones:
- chr_bait: chromosome of the baited fragment.
- start_bait: start position of the baited fragment.
- end_bait: end position of the baited fragment.
- chr: chromosome of the contacted fragment.
- start: start position of the contacted fragment.
- end: end position of the contacted fragment.
- type: type of the contacted fragment (baited or unbaited).
- distance: distance between the mid-point coordinates of the baited and contacted fragments, given in absolute value.
The remaining columns give the CHiCAGO interaction score of the contact in each of the analyzed samples, one column per sample. If an interaction was not seen in a given sample, its score is set to NA.

For each species, you will also find the interactions separately for each sample, in the sub-folder named "interactions_samples". The data is provided for each sample in a tab-delimited file, with the following columns:
- bait_chr: chromosome of the baited fragment.
- bait_start: start position of the baited fragment.
- bait_end: end position of the baited fragment.
- chr: chromosome of the contacted fragment.
- start: start position of the contacted fragment.
- end: end position of the contacted fragment.
- N_reads: the number of reads that support this interaction.
- score: the score of the interaction computed by CHiCAGO.
- baited_frag: type of the contacted fragment (baited or unbaited).
- dist: distance between the mid-point coordinates of the baited and contacted fragments, given in absolute value.

All coordinates are 1-based, that is, the first base of the chromosome is numbered 1. The chromosome names follow the UCSC notation: chr1, chr2,..., chrX, chrY.

4) subsampled_interactions.txt
This file contains a table of interactions, obtained after subsampling to homogenize the number of interactions per sample (Methods).

5) nb_mapped_reads.txt
This file contains the number of PCHi-C mapped reads, for each restriction fragment and for each sample (technical and biological replicates provided separately).

6) mappability_statistics.txt
This file contains the theoretical mappability statistics, obtained with artificial read re-mapping.

7) aberrant_interactions.txt
This file contains a table of aberrant interactions, which take place at large distances (>1Mb or trans) in one species, but at small distances (100kb) in the other species. Because these interactions may derive from genome assembly errors, we discarded them (as well as all other interactions involving the same baits or contacted fragments) from all analyses.

8) fragment_statistics.txt
This file contains various statistics (GC content, repeat content, BLAT results, etc.) for all restriction fragments.

##############################################################################

Here is a brief overview of the procedure that we used to obtain these interactions.

Obtaining the restriction fragments:
1 - Digest the genome with the corresponding restriction enzymes with hicup_digester (from the HiCUP pipeline)
2 - Obtain bait positions from the Supplementary Data in Schoenfelder et al. 2015
3 - Convert coordinates between genome assemblies (hg19 to hg38, mm9 to mm10) with LiftOver
4 - Obtain the list of all restriction fragments and baited fragments with create_baitmap.pl (from the CHiCAGO pipeline)
5 - Convert "Ensembl conventional coordinates notation" (e.g., 1:0:16007) to "UCSC conventional coordinates notation" (e.g., chr1:1:16007)

Obtaining the interactions:
1 - Download SRA files to obtain PCHi-C reads (SRA_to_fastq.sh)
2 - Run the HiCUP pipeline on each sample (run_HiCup.sh)
3 - Merge and deduplicate interactions when biological replicates are available (run_merge_dedup.sh)
4 - Run CHiCAGO on each sample (run_bam2chicago + run_chicago.sh)
5 - Collect all interactions for each species (uniq_interactions_file_human.py or uniq_interactions_file_mouse.py)
6 - Add statistics to the interaction files (add_infos_all_interactions.py + add_infos_sample_interactions.py)
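
As an illustration of the coordinate-notation conversion step in the restriction-fragment procedure above (Ensembl-style 1:0:16007 to UCSC-style chr1:1:16007), here is a minimal Python sketch. It is not the script that was used to build this dataset. The function name ensembl_to_ucsc is hypothetical, and the rule it applies (add the "chr" prefix, add 1 to the start, leave the end unchanged) is an assumption inferred only from the single example given above. Special cases such as the mitochondrial chromosome or unplaced scaffolds are not handled.

```python
def ensembl_to_ucsc(coord: str) -> str:
    """Convert 'chrom:start:end' (0-based start, no 'chr' prefix)
    to 'chrN:start:end' (1-based start, UCSC-style chromosome name).

    Assumption: only the start is shifted by one; the end is unchanged.
    """
    chrom, start, end = coord.split(":")
    # Add the UCSC-style prefix unless it is already present.
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return f"{chrom}:{int(start) + 1}:{int(end)}"


# Example taken from the procedure description.
print(ensembl_to_ucsc("1:0:16007"))  # prints chr1:1:16007
```

The same shift would have to be applied consistently to every fragment and bait file, so that all coordinates follow the 1-based convention stated for this dataset.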