imported>Gydbwiki at 14:25, 1 April 2015

2015-04-01T14:25:07Z

New page

[[GPRO_Wiki|Return to Index]]

__FORCETOC__

=Menu Tools: Data preprocessing=

==Data preprocessing==

In next-generation sequencing (NGS), sequence preprocessing is the process of transforming raw reads into assembly-ready sequences while generating the associated informative reports in parallel. Raw data preprocessing includes tasks such as converting raw trace files from proprietary to standard formats, deriving template information, base calling, vector screening, quality evaluation and control, disk management, associated tracking and reporting operations, demultiplexing, sequence trimming/clipping, and elimination of artifacts.

Preprocessing of raw data is thus a necessity that has given rise to a number of Unix-based free software tools using a wide range of paradigms. Managing and using these tools requires some familiarity with Linux commands. With this as a primary consideration, we implemented GPRO with a multifunctional, user-friendly interface (Figure 7.1) that lets users work with the most representative preprocessing tools installed on the remote server with only user-level skills (click-and-go actions). You can access the GPRO preprocessing interface by clicking the tab “Data Preprocessing” in the main menu, highlighted in Figure 7.1 below.

<br>
[[Image:gp_7_1.jpg|center|900px]]
<table align=center style="width:900px">
<tr>
<td>
'''Figure 7.1'''. Interface for Data preprocessing
</td>
</tr>
</table>
<br>

== Menu ==

Figure 7.2 outlines the menu within the preprocessing interface, showing how the different tools installed on our server and currently accessible through GPRO are organized. Almost all of these preprocessing tools are free tools developed by third parties, so you must cite them if you obtain publishable results.

<br>
[[Image:gp_7_2.jpg|center|900px]]
<table align=center style="width:900px">
<tr>
<td>
'''Figure 7.2'''. Preprocessing menu.
</td>
</tr>
</table>
<br>

The following is a brief description of each interface tab.

===Converters===

This tab provides access to some scripts for format conversion (as shown in Figure 7.2). It contains two tabs, "Color space" and "Nucleotide space". The first links to a script that converts SOLiD-based (color-space) fasta files coupled with quality files into either color-space fastq (csFastq) or conventional nucleotide-based fastq.
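
The kind of conversion reached from the "Color space" tab can be illustrated with a minimal Python sketch. This is not the script installed on the server. It assumes the standard SOLiD two-base encoding, reads that start with a known primer base followed by color calls (for example <code>T0123</code>), no missing color calls ("."), one quality value per color call, and Phred+33 encoding in the resulting fastq. The function names are hypothetical.

```python
# Illustrative sketch only, NOT the converter script used by GPRO.
# Assumptions: standard SOLiD two-base encoding, a leading primer base,
# no "." (missing) color calls, Phred+33 quality encoding in the output.

# For each current base, the base reached by color 0, 1, 2 and 3.
NEXT_BASE = {"A": "ACGT", "C": "CATG", "G": "GTAC", "T": "TGCA"}


def decode_color_space(cs_read):
    """Translate a color-space read such as 'T0123' into nucleotides."""
    base = cs_read[0]  # primer base; it is not part of the decoded read
    decoded = []
    for color in cs_read[1:]:
        base = NEXT_BASE[base][int(color)]
        decoded.append(base)
    return "".join(decoded)


def to_fastq_record(read_id, cs_read, qualities):
    """Build one nucleotide-space fastq record from a read and its qualities."""
    sequence = decode_color_space(cs_read)
    if len(qualities) != len(sequence):
        raise ValueError("expected one quality value per color call")
    quality_string = "".join(chr(q + 33) for q in qualities)
    return "@{}\n{}\n+\n{}\n".format(read_id, sequence, quality_string)


print(to_fastq_record("read_1", "T0123", [30, 30, 20, 10]))
```

Because each color encodes the transition between two adjacent bases, a single wrong color call changes every base decoded after it, which is one reason to keep the color-space (csFastq) output when the downstream tool supports it.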

===Private user tools===
If you have your own server coupled with GPRO, you also have a tab you can use for running other proprietary tools or your personal scripts (if you need more details about how to proceed, please contact us).

===Processing and cleaning===

This tab provides access to three distinct software packages for preprocessing and cleaning via the preprocessing interface. These are:

# Cutadapt ([[literature:100853|Martin 2011]]), for removing primers and adapters from the sequences, among many other actions. For more details, please visit [http://code.google.com/p/cutadapt/ the web site].
# FASTX-Toolkit, a collection of tools (summarized in Figure 7.2) for fasta and fastq preprocessing. For more details, please visit [http://hannonlab.cshl.edu/fastx_toolkit/ the FASTX-Toolkit web site].
# Prinseq ([[literature:100869|Schmieder and Edwards 2011]]), a tool for filtering, reformatting, and/or trimming sequence data. For more information, please visit [http://prinseq.sourceforge.net/ the web site].
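
To give an idea of what adapter removal involves, the following Python sketch cuts a read at the first exact occurrence of an adapter. It is a simplified stand-in and not the method used by any of the tools listed above, which handle more general cases such as mismatches and partial adapters. The function name and the adapter sequence are hypothetical examples.

```python
# Simplified illustration of 3' adapter removal (exact matching only).
# NOT the algorithm of Cutadapt or any other tool listed above.


def trim_3prime_adapter(sequence, quality, adapter):
    """Cut the read (and its quality string) where the adapter starts."""
    position = sequence.find(adapter)
    if position == -1:
        # Adapter not found: return the read unchanged.
        return sequence, quality
    return sequence[:position], quality[:position]


# Hypothetical example read: 8 nt of insert followed by the adapter.
adapter = "AGATCGGAAGAGC"
read = "ACGTACGT" + adapter
qual = "I" * len(read)

trimmed_seq, trimmed_qual = trim_3prime_adapter(read, qual, adapter)
print(trimmed_seq, trimmed_qual)
```

The quality string is trimmed together with the sequence so that both keep the same length, which the fastq format requires.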

===Quality analyses===

You can use this tab to perform quality analyses using [http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ FastQC].
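
One of the summaries such a quality analysis typically reports is the mean base quality at each read position. The Python sketch below shows how that statistic can be computed from fastq quality strings. It is only an illustration of the idea, not FastQC's implementation, and it assumes Phred+33 encoded qualities; the function name is hypothetical.

```python
# Illustration of a per-position mean quality summary.
# NOT FastQC's implementation; assumes Phred+33 quality strings.


def per_position_mean_quality(quality_strings, offset=33):
    """Return the mean Phred score at each read position."""
    totals = []  # sum of scores seen at each position
    counts = []  # number of reads that reach each position
    for quality in quality_strings:
        for index, symbol in enumerate(quality):
            if index == len(totals):
                totals.append(0)
                counts.append(0)
            totals[index] += ord(symbol) - offset
            counts[index] += 1
    return [total / count for total, count in zip(totals, counts)]


# Two reads of different length: "I" is Phred 40 and "5" is Phred 20.
print(per_position_mean_quality(["II", "5"]))
```

Reads of different lengths are handled by counting, for every position, only the reads that are long enough to reach it.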

== How to proceed with the preprocessing interface ==

Managing any of the aforementioned tools via the interface is quite intuitive. As shown in Figure 7.3, launching the preprocessing interface also opens an FTP connection between your PC and your user account on the server. The first step is to drag the files you want to process from your PC to your user account.

Then create (by right-clicking) an output folder, which you can name as you wish, and select the file format (fastq, fasta or fasta + qual) in the interface box named "format".

Subsequently, select in the menu the tool you are going to use (the interface will automatically display the applications and requirements of the selected tool). Then use the mouse to drag the input files you want to preprocess to the top box (small red line in Figure 7.3) and the output folder where you want the resulting files to the output folder box (larger red line).

<br>
[[Image:gp_7_3.jpg|center|900px]]
<table align=center style="width:900px">
<tr>
<td>
'''Figure 7.3'''. Managing the preprocessing interface.
</td>
</tr>
</table>
<br>

Finally, at the bottom of the interface there is an interactive form listing all the command options and parameters provided by the selected tool (Figure 7.4). Select the options or fill in the parameter boxes where required, and you are ready to launch the preprocessing analysis. At this point you have two options. You can click the tab "Run program" (Figure 7.3) to launch the analysis as configured, or you can click the tab "Append and command", in which case your command string appears in the queue box below, allowing you to prepare further analyses. In this way you can simultaneously launch the same command on multiple files, for which you only need to drag a new input file to the input box. More interestingly, if you keep the option "Use output file created by previous command as the next input file" selected, you can design an ad hoc preprocessing pipeline for a particular data file. That is, you design one command for demultiplexing your file, then another for trimming the first 10 nucleotides at the 5′ end of the output of the previous command, then another for eliminating from that output all sequences whose quality falls below a threshold, and so on.
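
The idea of chaining queued commands, where each command takes the output of the previous one as its input, can be sketched in Python as follows. This is a conceptual illustration and not how GPRO runs the queue on the server. The demultiplexing step is omitted for brevity, and the function names, the mean-quality criterion, the threshold of 20 and the Phred+33 encoding are all assumptions made for the example.

```python
# Conceptual sketch of a preprocessing queue: each step receives the
# output of the previous step. NOT the GPRO implementation.
# A record is a tuple: (read id, sequence, Phred+33 quality string).


def trim_five_prime(records, n=10):
    """Remove the first n nucleotides (5' end) of every read."""
    for read_id, sequence, quality in records:
        yield read_id, sequence[n:], quality[n:]


def filter_mean_quality(records, threshold=20.0, offset=33):
    """Keep only reads whose mean Phred score reaches the threshold."""
    for read_id, sequence, quality in records:
        if not quality:
            continue  # nothing is left of this read after trimming
        mean = sum(ord(symbol) - offset for symbol in quality) / len(quality)
        if mean >= threshold:
            yield read_id, sequence, quality


def run_queue(records, steps):
    """Apply the queued steps in order, feeding each output forward."""
    for step in steps:
        records = step(records)
    return list(records)


reads = [
    ("r1", "ACGTACGTACGTAAAA", "I" * 16),  # high quality (Phred 40)
    ("r2", "ACGTACGTACGTAAAA", "#" * 16),  # low quality (Phred 2)
]
result = run_queue(reads, [trim_five_prime, filter_mean_quality])
print(result)
```

The order of the steps matters: trimming before filtering means that the quality criterion is evaluated only on the bases that remain after trimming.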

<br>
[[Image:gp_7_4.jpg|center|800px]]
<table align=center style="width:800px">
<tr>
<td>
'''Figure 7.4'''. Form for selecting commands and parameters.
</td>
</tr>
</table>
<br>

[[GPRO_Wiki|Return to Index]]

[[Category:GPRO manual]]
