IGB / IGBF-1422

Find example data sets where soft clipping reveals polymorphisms

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None
    • Story Points: 0.5
    • Sprint: Fall 2018 Sprint 3

      Description

      Find real soft clip data for testing purposes. The ideal dataset would have the following:
      -paired-end data
      -data with known structural variants
      -preferably aligned in BAM format

      One user who uses soft clipping in their research suggested the following links:
      http://jimb.stanford.edu/giab-resources/
      https://toolbox.google.com/datasetsearch
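
A candidate dataset can be screened against the criteria above before it is downloaded in full. The following is a minimal sketch, not part of IGB: it assumes the alignments are available as SAM text (for example, streamed from a BAM file with `samtools view -h`), and the function names and the `min_clip` threshold are hypothetical. It relies only on the SAM format itself: column 2 is the FLAG (bit 0x1 marks a paired read) and column 6 is the CIGAR string, where the `S` operation marks soft-clipped bases.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
PAIRED_FLAG = 0x1  # SAM FLAG bit set when the read is part of a pair


def soft_clipped_bases(cigar):
    """Return the total number of soft-clipped bases in a CIGAR string."""
    if cigar == "*":  # CIGAR unavailable (for example, an unmapped read)
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def summarize(sam_lines, min_clip=10):
    """Count reads, paired reads, and reads with at least min_clip soft-clipped bases.

    min_clip is an arbitrary illustrative threshold, not a recommended value.
    """
    counts = {"reads": 0, "paired": 0, "soft_clipped": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        flag, cigar = int(fields[1]), fields[5]
        counts["reads"] += 1
        if flag & PAIRED_FLAG:
            counts["paired"] += 1
        if soft_clipped_bases(cigar) >= min_clip:
            counts["soft_clipped"] += 1
    return counts
```

To use it on a real file, the lines of `samtools view -h <file>.bam` output can be passed to `summarize`, for example by reading them from standard input. A dataset with a high paired count and a non-trivial soft-clipped count would be worth a closer look; whether the clips coincide with known structural variants still has to be checked separately.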

            Activity

            aduong Anh Moss (Inactive) created issue -
            aduong Anh Moss (Inactive) made changes -
            Field Original Value New Value
            Link This issue relates to IGBF-1291 [ IGBF-1291 ]
            aduong Anh Moss (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            aduong Anh Moss (Inactive) made changes -
            Summary
            Original: Real soft clip data
            New: Find real soft clip data
            Description
            Original:
            Look for real world data for soft clipping.

            -pair-end would be great
            -with known structural variants
            -aligned as a BAM file, preferably

            New:
            Find real soft clip data for testing purposes. The ideal dataset would have the following:
            -paired-end data
            -data with known structural variants
            -preferably aligned in BAM format

            One user who uses soft clipping in their research suggested the following links:
            ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/NHGRI_Illumina300X_novoalign_bams/
            ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/
            aduong Anh Moss (Inactive) made changes -
            Description (only the suggested links changed; the rest of the text is the same in both values)
            Original links:
            ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/NHGRI_Illumina300X_novoalign_bams/
            ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/
            New links:
            http://jimb.stanford.edu/giab-resources/
            https://toolbox.google.com/datasetsearch
            aduong Anh Moss (Inactive) added a comment -

            Here is a link to the following dataset:
            https://figshare.com/articles/BAM_and_BAI_files/4530707/1

            Data posted by: Reetta Holmila: Massively parallel deep sequencing of plasma cfDNA methylation in HCC and controls (categories: biomarkers)

            This page lists both the BAM and BAI files.

            ann.loraine Ann Loraine added a comment -

            Quick followup:

            Is this DNA-Seq data or bisulfite sequencing data?

            aduong Anh Moss (Inactive) added a comment -

            I looked over the data information, and it does appear to be bisulfite sequencing data. I found this paragraph detailing the data:

            "Massively parallel deep bisulfite sequencing of plasma cfDNA methylation in HCC
            Published on 09 Jan 2017 - 00:14 by Reetta Holmila
            In this study, we applied targeted massively parallel semiconductor sequencing to assess methylation on a panel of genes (FBLN1, HINT2, LAMC1, LTBP1, LTBP2, PSMA2, PSMA7, PXDN, TGFB1, UBE2L3, VIM and YWHAZ) in plasma circulating cell-free DNA (cfDNA) and to evaluate the potential of these genes as HCC biomarkers in two different series, one from France (42 HCC cases and 42 controls) and one from Thailand (42 HCC cases, 26 chronic liver disease cases and 42 controls). We also analyzed a set of HCC and adjacent tissues and liver cell lines to further compare with ‘The Cancer Genome Atlas’ (TCGA) data."

            Link to that page: https://figshare.com/projects/Massively_parallel_deep_bisulfite_sequencing_of_plasma_cfDNA_methylation_in_HCC/18185

            ann.loraine Ann Loraine added a comment - - edited

            Also need to locate BAM files from sequencing of non-bisulfite converted samples from individuals with structural variations.

            ann.loraine Ann Loraine made changes -
            Assignee Anh Moss [ aduong ] Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine added a comment -

            Un-assigning as Anh is moving on to her next rotation. Good luck with everything!

            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Status In Progress [ 3 ] Open [ 1 ]
            ann.loraine Ann Loraine made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ]
            ann.loraine Ann Loraine made changes -
            Attachment SoftClipRegion.png [ 14162 ]
            ann.loraine Ann Loraine added a comment - - edited

            Data sets from genome sequencing are saved in the IGB Dropbox, in the "Jira issues" folder for this issue.
            The data come from a Twitter thread with Brent Pedersen of the Aaron Quinlan lab at the University of Utah. See attached.
            Marking this as closed (for now), as we now have a good handle on how to get more data and where to find potential reviewers and representative users:

            • Brent Pedersen, Quinlan lab (working on structural variants)
            • Greenwood Genetics (human genetics clinic in South Carolina)
            • Canadian Bioinformatics workshop on genomic medicine instructors & attendees (attended by Nowlan & Ann, 2018)
            • Wake Forest human genetics and genomics faculty
            • Steve Chervitz Trutane (Personalis)
            • Clinical variant analysts (https://www.linkedin.com/in/shelly-sorrells-phd-2a46333b/)
            ann.loraine Ann Loraine made changes -
            Resolution Done [ 10000 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            ann.loraine Ann Loraine made changes -
            Summary
            Original: Find real soft clip data
            New: Find example data sets where soft clipping reveals polymorphisms
            ann.loraine Ann Loraine made changes -
            Assignee Ann Loraine [ aloraine ] Anh Moss [ aduong ]
            ann.loraine Ann Loraine made changes -
            Workflow Loraine Lab Workflow [ 18128 ] Fall 2019 Workflow Update [ 19954 ]
            ann.loraine Ann Loraine made changes -
            Workflow Fall 2019 Workflow Update [ 19954 ] Revised Fall 2019 Workflow Update [ 22075 ]

              People

              • Assignee: aduong Anh Moss (Inactive)
              • Reporter: aduong Anh Moss (Inactive)
              • Votes: 0
              • Watchers: 2

                Dates

                • Created:
                  Updated:
                  Resolved: