@chrisamiller
Created November 7, 2025 20:30
Working with FASTQs on the command line

We're going to work with data from a human cell line posted here: https://storage.googleapis.com/bfx_workshop_tmp/Exome_Tumor.tar

  • Make a directory called "week03", and download the tarball to your computer using the command line (wget or curl).

  • Use tar -xvf to extract Exome_Tumor.tar, then cd into the extracted directory and look around with ls. We're not going to use all of this data in this week's homework; for now, unzip the fastq files and focus on those.

  • Look at the first three records (not first three lines!) of each fastq file. Take a close look at the read names and how they match up across files.

  • How many paired-end sequences (read pairs) do these files contain?

  • How many total nucleotides of sequence are contained in these two files?

  • What is the read length? Is the read length consistent for every record?
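The questions above all rest on one property of the format: a FASTQ record is four lines (read name, sequence, "+", qualities). The sketch below shows one possible way to turn that into commands. It assumes every record is exactly four lines (no wrapped sequences). The file names (sample_R1.fastq, sample_R2.fastq), the reads, and the helper function names are made up for illustration and are not the workshop data.

```shell
set -eu

# Build a tiny synthetic paired-end FASTQ pair (hypothetical names and reads,
# NOT the workshop data) so the helpers below can be tried safely.
workdir=$(mktemp -d)
printf '%s\n' \
  '@read1/1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read2/1' 'ACGTAC'   '+' 'IIIIII' \
  '@read3/1' 'ACGTACGT' '+' 'IIIIIIII' \
  '@read4/1' 'ACGTACGT' '+' 'IIIIIIII' > "$workdir/sample_R1.fastq"
printf '%s\n' \
  '@read1/2' 'TTGCAACG' '+' 'IIIIIIII' \
  '@read2/2' 'TTGCAACG' '+' 'IIIIIIII' \
  '@read3/2' 'TTGCAACG' '+' 'IIIIIIII' \
  '@read4/2' 'TTGCAACG' '+' 'IIIIIIII' > "$workdir/sample_R2.fastq"

# Print the first N records of a file (4 lines per record).
# Usage: first_records FILE N
first_records() {
  head -n "$(( $2 * 4 ))" "$1"
}

# Count the records in one file: total line count divided by 4.
count_records() {
  echo "$(( $(wc -l < "$1") / 4 ))"
}

# Sum the sequence lengths across one or more files.
# The sequence is line 2 of each 4-line record.
count_bases() {
  awk 'NR % 4 == 2 { total += length($0) } END { print total + 0 }' "$@"
}

# Tabulate read lengths across one or more files: one "length count" row
# per distinct length, sorted by length.
read_lengths() {
  awk 'NR % 4 == 2 { n[length($0)]++ } END { for (len in n) print len, n[len] }' "$@" | sort -n
}

first_records "$workdir/sample_R1.fastq" 3    # first three records (12 lines)
count_records "$workdir/sample_R1.fastq"      # prints 4
count_bases "$workdir/sample_R1.fastq" "$workdir/sample_R2.fastq"   # prints 62
read_lengths "$workdir/sample_R1.fastq" "$workdir/sample_R2.fastq"  # more than one row means lengths vary
```

To apply this to the homework, pass the unzipped fastq files in place of the sample paths. In properly paired files, record N of the first file and record N of the second file come from the same fragment, so the record count of either file is the number of pairs. Printing the first three records of each file shows whether the read names line up.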
