Long-Read Data Analysis - Vietnam Genome Project

LONG-READ DATA ANALYSIS

Sample summary

Long-read sequencing of two samples (one male, one female) for the Vietnamese reference genome

- VN920 (Cell 1,2)

- VN7 (Cell 3,4)

Batch Report

Sequencing information such as concordance rate, loading rate...

Raw data Report

Sequencing information on adapters, polymerase reads, subreads...

CCS metrics

Circular consensus sequencing (CCS) analysis of the raw data, with all parameter information

HiFi Read metrics

HiFi read information: read count and number of bases for the two samples in each cell

Analysis log

Output of the Cromwell pipeline, with all parameter information

PacBio – HiFi gDNA SMRTbell Express 2.0 library prep

Detailed methods:

HiFi gDNA libraries were prepared using the SMRTbell Express Template Prep Kit 2.0 (PacBio, 100-938-900) according to the standard protocol (PacBio, PN 101-853-100 Version 05 (August 2021)), except there was no AMPure PB clean up after the adapter ligation and the nuclease treatment step was done following an older protocol (PN 101-853-100 Version 03 (January 2020)) using the SMRTbell Enzyme Cleanup Kit (PacBio, 101-746-400). 10-12 µg of gDNA was sheared to a mode size of ~15-20 kb using the Megaruptor 2 with Long Hydropores (Diagenode, E07010002). The sheared DNA was purified with AMPure PB beads (PacBio, PCB-100-265-900) before being quantified on the Qubit fluorometer using the Qubit dsDNA HS assay kit (Invitrogen, Q32854). The sheared DNA size distribution was confirmed on the Agilent Femto Pulse using the Genomic DNA 165 kb Kit (Agilent, FP-1002-0275). 9-10 µg of purified, sheared DNA was used to make each library as follows. The DNA was treated to remove single-stranded overhangs, followed by a DNA damage repair reaction and an end-repair/A-tailing reaction. Overhang adapters were ligated to the A-tailed library fragments and a nuclease treatment was done to remove damaged or non-intact SMRTbell templates, followed by purification with AMPure PB beads. The libraries were size selected using the BluePippin and a 0.75% Agarose Cassette with the S1 Marker (Sage Science, BLF7510) to enrich for fragments >10 kb, followed by purification with AMPure PB beads. The final purified, size selected libraries were quantified on the Qubit fluorometer using the Qubit dsDNA HS assay kit (Invitrogen, Q32854) to assess concentration, and the Agilent Femto Pulse using the Genomic DNA 165 kb Kit (Agilent, FP-1002-0275) to assess fragment size distribution.

Short methods:

HiFi gDNA libraries were prepared using the SMRTbell Express Template Prep Kit 2.0 (PacBio, 100-938-900). The gDNA was sheared to a mode size of ~15-20 kb, and 9-10 µg of purified, sheared DNA was used to make each library according to the standard protocol (PacBio, PN 101-853-100 Version 05 (August 2021)), except for the nuclease treatment step, which followed a different protocol (PN 101-853-100 Version 03 (January 2020)) using the SMRTbell Enzyme Cleanup Kit (PacBio, 101-746-400). The final purified libraries were quantified on the Qubit fluorometer using the Qubit dsDNA HS assay kit (Invitrogen, Q32854) to assess concentration, and the Agilent Femto Pulse using the Genomic DNA 165 kb Kit (Agilent, FP-1002-0275) to assess fragment size distribution.

PacBio – Sequel II sequencing v2

Sequencing was performed using the PacBio Sequel II (software/chemistry v11.0). The libraries were prepared for sequencing according to the SMRT Link (v11.0) sample setup calculator, following the standard protocol with AMPure PB bead purification, using Sequencing Primer v5, Sequel II Binding Kit v2.2 and Sequel II DNA Internal Control v1.0. The polymerase-bound libraries were each sequenced on two SMRT Cells with a 30-hour movie time plus a 2-hour pre-extension, using the Sequel II Sequencing 2.0 Kit (PacBio, 101-820-200) and SMRT Cell 8M (PacBio, 101-389-001).

Raw data

Main output files:

subreads.bam = Main raw data file. Contains unaligned base calls from high-quality regions.

subreads.bam.pbi = PacBio BAM index for subreads.bam. Records auxiliary identifying information and precomputed summary statistics per read. May be required for analysis in SMRT Link.

sts.xml = Contains summary statistics about the collection/cell and its post-processing.

subreads.xml = File needed for importing the data into SMRT Link.
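
Before importing a cell into SMRT Link, it can help to confirm that all four raw files are present. The following is a minimal sketch, not part of the facility's pipeline: the function name `missing_raw_files` is hypothetical, and it assumes the files in a cell directory end with the names listed above (for example with a movie-name prefix).

```python
from pathlib import Path

# File name endings taken from the list above. Real files are assumed to
# carry a movie-name prefix, so only the ending of each name is compared.
EXPECTED_SUFFIXES = ("subreads.bam", "subreads.bam.pbi", "sts.xml", "subreads.xml")


def missing_raw_files(cell_dir):
    """Return the expected endings for which no file exists in cell_dir."""
    names = [p.name for p in Path(cell_dir).iterdir() if p.is_file()]
    missing = []
    for suffix in EXPECTED_SUFFIXES:
        if not any(name.endswith(suffix) for name in names):
            missing.append(suffix)
    return missing
```

An empty list means every expected file was found; anything else names the endings that still need to be located.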

CCS (HiFi) analysis (including 5mC CpG detection)

After sequencing, the raw subreads were processed to generate CCS (HiFi) reads using the default settings of the CCS application (v6.3.0) in SMRT Link (v11.0).

Default CCS analysis settings:

Parameter	Default value
Minimum CCS read length	10
Maximum CCS read length	50,000
Process All Reads	On (overrides the above read length cut-offs)
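
The interaction between these three settings can be illustrated with a short sketch. This is not the CCS application's code: the function name `passes_length_filter` is hypothetical, and treating both cut-offs as inclusive is an assumption, since the table does not say.

```python
def passes_length_filter(read_length, min_length=10, max_length=50_000,
                         process_all_reads=True):
    """Decide whether a CCS read is kept under the settings in the table above."""
    if process_all_reads:
        # "Process All Reads: On" overrides both read length cut-offs.
        return True
    # Assumption: both cut-offs are inclusive.
    return min_length <= read_length <= max_length
```

With the defaults used here, every read is kept regardless of length; the cut-offs only take effect if Process All Reads is switched off.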

Main output files:

ccs.report.csv.zip = CCS per read details

with_5mC.bam = BAM file containing all the HiFi reads in the sample that include 5mC calls.

m*.hifi_reads.bam = BAM file containing all HiFi reads.

m*.hifi_reads.fasta.gz = HiFi reads FASTA file.

m*.hifi_reads.fastq.gz = HiFi reads FASTQ file.

ccs_accuracy_hist.png = Read quality distribution.

ccs_hifi_read_length_yield_plot.png = Yield of HiFi data by read length.

ccs_readlength_hist_plot.png = Read length distribution of HiFi reads.

ccs_npasses_hist.png = Number of passes for HiFi reads.

m5c_detections.png = CpG Methylation in Reads: The cumulative percentage of CpG sites in the sample plotted against the predicted probability of methylation.

m5c_detections_hist.png = CpG Methylation in Reads (Histogram): Histogram displaying the percentage of CpG sites in the sample versus the predicted probability of methylation.
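
Basic HiFi read metrics (read count and number of bases) can be recomputed from the m*.hifi_reads.fastq.gz file as a cross-check. The sketch below is an illustration only, not the SMRT Link report code: the function name `summarize_fastq` is hypothetical, it assumes a standard four-line-per-record FASTQ, and the N50 value is an extra that the reports above do not necessarily list.

```python
import gzip


def summarize_fastq(path):
    """Return (read_count, total_bases, n50) for a gzip-compressed FASTQ file."""
    lengths = []
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Assumes 4 lines per record; the sequence is the second line.
            if line_number % 4 == 1:
                lengths.append(len(line.strip()))
    total_bases = sum(lengths)
    # N50: walking through reads from longest to shortest, the length of the
    # read at which the running total first reaches half of all bases.
    n50 = 0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total_bases:
            n50 = length
            break
    return len(lengths), total_bases, n50
```

Running this once per cell gives per-cell counts that can be compared with the HiFi read metrics for each sample.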

Library preparation and sequencing were performed at the University of Queensland Sequencing Facility (University of Queensland, Brisbane, Australia).
