This dataset contains the simulated PCHiC interactions for human and mouse. There are two sub-folders, one for each species.

In each sub-folder, you will find the following files:

1) simulated_all_interactions.txt

This file contains the simulated PCHiC interactions, combined across all available samples. There are several tab-separated columns, starting with the following ones:

- chr_bait: chromosome of the baited fragment.
- start_bait: start position of the baited fragment.
- end_bait: end position of the baited fragment.
- chr: chromosome of the contacted fragment.
- start: start position of the contacted fragment.
- end: end position of the contacted fragment.
- type: type of the contacted fragment (baited or unbaited).
- distance: distance between the mid-point coordinates of the baited and contacted fragment. The distance is given in absolute value.

The following columns correspond to the presence/absence of the corresponding contact, in the simulated data for each of the analyzed samples. If an interaction was not seen in a given sample, the value is set to NA. If the interaction was present, the value is set to 1.

For each species, you will also find the interactions separately for each sample, in the sub-folder named "interactions_samples". The data is provided for each sample in a tab-delimited file, with the following columns:

- bait_chr: chromosome of the baited fragment.
- bait_start: start position of the baited fragment.
- bait_end: end position of the baited fragment.
- chr: chromosome of the contacted fragment.
- start: start position of the contacted fragment.
- end: end position of the contacted fragment.
- N_reads: the number of reads that support this interaction.
- score: the score of the interaction computed by CHiCAGO.
- baited_frag: type of the contacted fragment (baited or unbaited).
- dist: distance between the mid-point coordinates of the baited and contacted fragment. The distance is given in absolute value.

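As an illustration of the per-sample columns listed above, here is a minimal Python sketch that reads a tab-delimited table and recomputes the mid-point distance. It assumes that the file has a header row with the column names given above and that the mid-point is (start + end) / 2; the exact rounding used to produce the "dist" column is not documented here. The example row and the helper names (midpoint_distance, read_rows) are made up for illustration.

```python
import csv
import io


def midpoint_distance(bait_start, bait_end, start, end):
    """Absolute distance between fragment mid-points.

    Assumption: the mid-point is (start + end) / 2, with no rounding.
    """
    bait_mid = (bait_start + bait_end) / 2
    contact_mid = (start + end) / 2
    return abs(contact_mid - bait_mid)


def read_rows(handle):
    """Read a tab-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Made-up example row that uses the per-sample column names.
example = (
    "bait_chr\tbait_start\tbait_end\tchr\tstart\tend\t"
    "N_reads\tscore\tbaited_frag\tdist\n"
    "chr1\t1000\t2000\tchr1\t51000\t53000\t12\t6.5\tunbaited\t50500\n"
)

rows = read_rows(io.StringIO(example))
recomputed = [
    midpoint_distance(
        int(r["bait_start"]), int(r["bait_end"]), int(r["start"]), int(r["end"])
    )
    for r in rows
]
```

To apply this to a real file, replace io.StringIO(example) with an opened file handle for one of the files in "interactions_samples".
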
All coordinates are provided with the convention that the first base of the chromosome is numbered 1. The chromosome names follow the UCSC notation, that is, chr1, chr2, ..., chrX, chrY.

##############################################################################

2) simulated_subsampled_interactions.txt

This file contains a subsampled set of simulated interactions. The same number of interactions is drawn for each sample. This set is used for the analysis of contact conservation (figure 5 and associated supplemental figures).

3) simulated_subsampled_global_interactions.txt

This file contains a subsampled set of simulated interactions. Here we match the number of interactions in the pooled (combined across all samples) simulated dataset to the number of interactions in the pooled PCHiC dataset.

##############################################################################

Here is a brief overview of the procedure that we used to obtain these interactions with "simulation/simulations_bait_other.py" and "simulation/run_simulations.sh".

For each sample:

1 - calculate the linear genomic distance between the center position of each baited fragment and each interacting fragment
2 - divide the observed interactions into 5 kb distance classes, from 25 kb to 10 Mb (3990 classes in total)
3 - compute the fraction of contacts that are found in each distance class, across all baited fragments of the sample, resulting in a distance distribution
4 - for each baited fragment, compute the contact probability for all fragments found on the same chromosome, within the 25 kb - 10 Mb distance window, as the average probability of the overlapping distance classes
5 - randomly draw contacts based on this empirical probability distribution, respecting the number of interactions observed in the real PCHiC data for each bait

The "all_interactions" file is obtained after running "script/reads2interactions/uniq_interactions_file_human.py" or "script/reads2interactions/uniq_interactions_file_mouse.py".