----------------------------------------------------------------------
How To Access Data
STACKdb v3.1.1
----------------------------------------------------------------------
Release Date: January 2003
The STACKdb v3.1.1 data is provided in relational database format for
use with the stackPACK v2.2 viewing, management and output generation
software, also provided with the release. Each STACKdb category is
stored as an individual stackPACK project. In addition, the STACKdb data
is provided in FastA formatted flat files for immediate access,
searching and analysis. This document describes how to view and export
STACKdb data using the stackPACK web-based interface and command line
tools.
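
Because the flat files are in FastA format, they can be read with a
few lines of code. The following is a minimal Python sketch, not part
of stackPACK; the two records in the example are invented for
illustration and do not come from STACKdb.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FastA-formatted text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Invented two-record example (NOT real STACKdb data).
example = io.StringIO(">seq1 example header\nACGTNNNN\nACGT\n>seq2\nTTGA\n")
records = list(read_fasta(example))
for header, sequence in records:
    print(header.split()[0], len(sequence))
```

To read a real flat file, pass an open file handle instead of the
in-memory example, e.g. open("category.fasta"), where the file name is
a placeholder for one of the files shipped with the release.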
OVERVIEW OF VIEWING AND EXPORTING FUNCTIONS
Data type                            View       Export
-----------------------------------  ---------  ------------------------
                                     WebProbe   WebReport   Command Line
-----------------------------------  ---------  ------------------------
Sequence data:
- Input data in unmasked format (1)  -(2)       x           x
- Input data in masked format        x          x           x
- Clonelink consensus sequence       x          x           x
- Primary consensus sequence         x          x           x
- Alternate consensus sequence       x          x           x
- Intermediate (phrap) contig        x          -           -
  consensus sequence
- Cluster members                    x          -(3)        x
- Cluster singletons                 x          x           x
- Phrap singletons                   x          x           x
Alignments:
- Final alignment                    x          x           x
- Intermediate (phrap) alignment     x          x           x
Other:
- Cluster member list in csv format  -          -           x
- Alignment analyses                 x          x           x
- Project summary report             x          -           x
- Non-redundant project summary      -          x           x
  report in csv format
- Quality scores                     -          -           -
Sequence annotation (4):
- Direction                          x          x           x
- Length                             x          x           x
- Clone ID                           x          x           x
- Clone library                      x          x           x
- Tissue type                        x          x           x
- Organism                           x          x           x
------------------------------------------------------------------------
(1) STACKdb v3.1.1 was produced by converting STACKdb v3.1 data to
stackPACK v2.2 format. STACKdb v3.1 was built with stackPACK v2.1.1,
which does not provide output in original unmasked format, so the
original data cannot be output from STACKdb v3.1.1.
(2) Sequence data can only be viewed in original unmasked format if the
masking step was skipped during the processing pipeline.
(3) The clonelink, cluster, contig and consensus members can be output
in FastA format by clicking on the appropriate accession number
in the cluster tree within WebProbe.
(4) Sequence annotation can only be parsed, and thus viewed and
exported, if it is in a defined and recognizable field within the
sequence header information. Please refer to the output format
specification document provided with this release for details on
sequence annotation parsing within STACKdb v3.1.1.
1. ACCESSING DATA FROM THE WEB INTERFACE
The STACKdb data is stored in stackPACK's relational database and is
managed, viewed and exported using the following web interface
components:
- WebProjectManager: Project management and manipulation.
- WebProbe: Results viewing and analysis.
- WebReport: Output reporting for download and evaluation.
Please note that WebPipe and certain parts of WebProjectManager, which
are responsible for project creation and data processing in stackPACK,
have been disabled in the version provided for STACKdb viewing and
export. If you would like to cluster your own data using the stackPACK
transcript reconstruction and variation analysis management system,
please contact Electric Genetics at sales@egenetics.com. Academic and
non-profit institutions may download the full stackPACK toolset from:
http://www.sanbi.ac.za/CODES/
1.1 WebProjectManager
The project manager lists the various STACKdb categories and allows some
basic project management and manipulation such as the display of summary
reports. Users are also able to filter the full project list by name,
owner or description by entering all or part of the project information
in the text box and clicking on 'Search'.
Data processing, project deletion and project description editing within
WebProjectManager are only possible for full stackPACK users.
--------------------------------------------------------------------
How To Use WebProjectManager:
- Click on WebProjectManager in the menu bar for a list of the
various STACKdb categories
- Click on the actual project name for a summary report
- Select the tick box next to the relevant project name(s) and click
on:
o 'Edit_Description' to edit the project description (disabled)
o 'Add_Sequences' to incrementally add new sequence data to
existing projects (disabled)
o 'Delete_Project(s)' to delete the project (disabled)
--------------------------------------------------------------------
Please refer to WebProjectManager in the online manual found at
stackPACK's Support link for further details.
1.2 WebProbe
WebProbe helps users gain insight into the tissue clusters by providing
viewing tools that link various consensus sequences, alignments,
alignment analyses and external data sources like UniGene. STACKdb
entries have internal accession numbers for the clonelinks, clusters,
contigs and consensus sequences produced in the clustering pipeline and
these, as well as the original GenBank sequence accession numbers, may
be used to query the viewer.
--------------------------------------------------------------------
How To Use WebProbe:
- Click on WebProbe in the menu bar
- Enter the project name and owner, or use the 'Search' option to
filter the full project list
- Query WebProbe:
o For cluster data by entering an internal or GenBank accession
number in the 'Accession' text box and clicking on 'View'
o For clusters with potential alternate expression forms by
clicking on 'Alternates_List'
o For a summary report by clicking on 'Summary'
--------------------------------------------------------------------
WebProbe allows the user to do the following:
- Retrieve specific cluster family information by querying with internal
clonelink, cluster, contig, consensus or original GenBank accession
number.
- Download clonelink, cluster and contig member sequences in FastA
format.
- Link out to the corresponding cluster in UniGene.
- Obtain a summary of any STACKdb category, including:
o Total number of input sequences
o Total number of clonelinked clusters
o Total number of multi-sequence clusters
o Total number of sequences in multi-sequence clusters
o Total number of singletons
NOTE: The same summary report is also obtainable from
WebProjectManager.
- Obtain a report of all alternate consensus sequences of any STACKdb
category.
- Access the following consensus and alignment views associated with
each cluster:
o Clonelink Consensus Sequence View
The clonelink consensus sequence produced during the clonelinking
step. Multi-sequence clusters and/or singletons are joined to
form clonelinks by virtue of shared clone IDs.
o All Final Consensus Sequence(s) View
All final consensus sequences for a particular contig produced
during the alignment analysis step. The final consensus sequences
are displayed in rank order in this view and include both primary
and alternate consensus sequences.
o PHRAP Consensus Sequence View
The intermediate contig consensus sequence produced during the
assembly step.
o PHRAP Alignment View
Assembly and initial sequence alignment produced during the
assembly step.
o Alignment Analysis View
An overview of the entire contig produced during the alignment
analysis step, highlighting any potential alternate expression
forms that may occur.
o Final Consensus Sequence View
The final consensus sequence for a particular contig produced
during the alignment analysis step. This could be a primary or
alternate consensus sequence.
o Final Alignment View
Final sequence alignment produced during the alignment analysis
step. Potential alternate expression forms for each contig are
identified and a separate alignment is displayed for each
potential alternate expression form.
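
The clonelinking idea described under the Clonelink Consensus Sequence
View (joining clusters and singletons that share clone IDs) can be
illustrated with a small union-find sketch in Python. This is only an
illustration and not stackPACK's implementation: the entry names and
clone IDs are invented, and the sketch assumes that links are
transitive (A joins B and B joins C puts all three in one clonelink).

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_by_shared_clone_id(clone_ids: Dict[str, Set[str]]) -> List[List[str]]:
    """Group entries that share at least one clone ID, transitively."""
    parent = {name: name for name in clone_ids}

    def find(name: str) -> str:
        # Follow parent links to the group representative (path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Remember the first entry seen for each clone ID and join later
    # entries carrying the same clone ID to it.
    first_seen: Dict[str, str] = {}
    for name, ids in clone_ids.items():
        for clone_id in ids:
            if clone_id in first_seen:
                union(first_seen[clone_id], name)
            else:
                first_seen[clone_id] = name

    groups = defaultdict(list)
    for name in clone_ids:
        groups[find(name)].append(name)
    return sorted(sorted(group) for group in groups.values())


# Invented entries and clone IDs (NOT real STACKdb data).
example = {
    "cluster_A": {"clone1", "clone2"},
    "cluster_B": {"clone2"},
    "singleton_C": {"clone3"},
    "singleton_D": {"clone3", "clone4"},
    "cluster_E": {"clone5"},
}
links = group_by_shared_clone_id(example)
print(links)
```

In this example cluster_A and cluster_B are joined through clone2,
singleton_C and singleton_D through clone3, and cluster_E stays alone.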
NOTE:
- Data analysis is simplified by four different display options in
WebProbe for the alignment views: plain text, coloured bases,
differences only and highlighted differences.
- All consensus sequence members can be output in FastA or csv format
using one of the predefined reports.
- All alignments can be output in ACE, ClustalW or MSF format, using
the predefined command line or web interface reports, for further
searching, editing and use in other programs.
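
As an example of using an exported alignment in another program, the
following Python sketch reads a ClustalW-formatted alignment and lists
the columns in which the sequences differ, similar in spirit to the
"differences only" display. It assumes the usual ClustalW text layout
(a first line beginning with "CLUSTAL", then blocks of "name sequence"
lines); the alignment shown is invented and is not stackPACK output.

```python
from typing import Dict, List


def parse_clustal(text: str) -> Dict[str, str]:
    """Return {sequence name: aligned sequence} from ClustalW text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a ClustalW alignment: missing CLUSTAL header")
    alignment: Dict[str, str] = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines, which start with
        # whitespace instead of a sequence name.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, segment = fields[0], fields[1]
        # Blocks repeat the names; append each segment in order.
        alignment[name] = alignment.get(name, "") + segment
    return alignment


def variable_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based column indexes where the aligned sequences differ."""
    rows = list(alignment.values())
    return [i for i, column in enumerate(zip(*rows)) if len(set(column)) > 1]


# Invented two-sequence alignment (NOT real STACKdb data).
example = "\n".join([
    "CLUSTAL W (1.83) multiple sequence alignment",
    "",
    "seq1      ACGT-ACGT",
    "seq2      ACGTTACGT",
    "          **** ****",
    "",
    "seq1      GGCC",
    "seq2      GG-C",
    "          ** *",
])
alignment = parse_clustal(example)
differences = variable_columns(alignment)
print(alignment)
print(differences)
```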
ALTERNATE EXPRESSION FORMS
Contigs may have more than one possible consensus sequence, due to
alternate splicing, chimeras or other isoforms of the gene represented
by the cluster.
--------------------------------------------------------------------
How to search for clusters with possible alternate expression forms:
During the clustering pipeline, the program looks for subalignments
that might represent alternate expression forms. To find all clusters
in a tissue category with alternate expression forms:
- Go to stackPACK WebProbe by clicking on WebProbe in the top menu bar
- Enter the project name and owner, or use the 'Search' option to
filter the full project list
- Request an Alternates Report by clicking on 'Alternates_List'
stackPACK will provide you with a list of cluster contigs that
contain potential alternate expression forms.
--------------------------------------------------------------------
Please refer to WebProbe in the online manual found at stackPACK's
Support link for further details.
1.3 WebReport
WebReport provides a list of predefined reports that can be selected to
download STACKdb entries for further data evaluation.
--------------------------------------------------------------------
How To Use WebReport:
- Click on WebReport in the menu bar
- Enter the project name and owner, or use the 'Search' option to
filter the full project list
- Choose the type of report you wish to generate
- Click on 'Next' to select the report options
- Click on 'Export' to generate the report
--------------------------------------------------------------------
The following web reports are available:
- All input sequences in masked or original (unmasked) FastA format.
- Consensus sequences in FastA format
o All clonelinked consensus sequences, multi-sequence cluster
primary consensus sequences and/or multi-sequence cluster
alternate consensus sequences.
o Note: The alternate consensus sequence report is provided with
the release for each of the STACKdb categories.
- Singleton sequences in FastA format
o Cluster and/or phrap singleton options per project.
- Alignments in MSF, ClustalW or ACE format
o Final or intermediate (phrap) alignments, per accession or
per project.
- Alignment analyses
o The Alignment Analysis logs, per accession or for the whole
project.
- Non-redundant output of the entire project in FastA format
o This selection produces a comprehensive output report
containing all consensus and singleton data from the project.
o This report is provided for each of the STACKdb categories
with the release.
- Non-redundant summary of the entire project, in comma-delimited
format.
o This report is provided as an overview to the entire project
that can easily be manipulated by the user.
o All information about each EST or mRNA sequence is provided
in a single row, including various stackPACK internal IDs,
original EST or mRNA ID, length, clone library and clone ID
information.
o The resultant comma delimited file can be imported into most
spreadsheet programs and sorted for further analysis.
Please refer to WebReport in the online manual found at stackPACK's
Support link for further details.
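
The comma-delimited summary can also be processed outside a
spreadsheet. The Python sketch below counts members per cluster and
sorts rows by length. The column names (cluster_id, sequence_id,
length, clone_library) and the presence of a header row are
assumptions made for illustration only; the actual columns are
described in the output format specification document provided with
the release.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO


def load_rows(handle: TextIO) -> List[Dict[str, str]]:
    """Read a comma-delimited report with a header row into dicts."""
    return list(csv.DictReader(handle))


def members_per_cluster(rows: List[Dict[str, str]],
                        key: str = "cluster_id") -> Counter:
    """Count how many rows (sequences) belong to each cluster."""
    return Counter(row[key] for row in rows)


def sort_by_length(rows: List[Dict[str, str]],
                   key: str = "length") -> List[Dict[str, str]]:
    """Return rows ordered from longest to shortest sequence."""
    return sorted(rows, key=lambda row: int(row[key]), reverse=True)


# Invented rows with HYPOTHETICAL column names (NOT real STACKdb data).
example_csv = io.StringIO(
    "cluster_id,sequence_id,length,clone_library\n"
    "cl1,EST0001,412,libA\n"
    "cl1,EST0002,655,libB\n"
    "cl2,EST0003,298,libA\n"
)
rows = load_rows(example_csv)
counts = members_per_cluster(rows)
longest_first = sort_by_length(rows)
print(dict(counts))
print([row["sequence_id"] for row in longest_first])
```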
--------------------------------------------------------------------
How do I create a BLASTable database of data processed by stackPACK?
- Using one of stackPACK's FastA-formatted reports, such as the
non-redundant report provided with the distribution, extract the
data you wish to search using BLAST.
- Format the resulting FastA file using NCBI's 'formatdb' program.
- For detailed information on how to run formatdb or set up a BLASTable
database, please refer to the BLAST OPTIONS in the NCBI BLAST
directory (http://cbr-rbc.nrc-cnrc.gc.ca/documentation/blast/formatdb.html)
--------------------------------------------------------------------
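
The formatting step above can be scripted. The Python sketch below
builds a formatdb command line for a nucleotide FastA file using the
-i (input file), -p F (nucleotide, not protein) and -n (database base
name) options, and only runs it when asked to. The file and database
names are placeholders, not files shipped with the release.

```python
import shutil
import subprocess
from typing import List


def formatdb_command(fasta_path: str, db_name: str = "") -> List[str]:
    """Build a formatdb command for a nucleotide FastA file."""
    command = ["formatdb", "-i", fasta_path, "-p", "F"]
    if db_name:
        command += ["-n", db_name]  # base name for the database files
    return command


def run_formatdb(fasta_path: str, db_name: str = "") -> None:
    """Run formatdb, failing early if it is not installed."""
    if shutil.which("formatdb") is None:
        raise FileNotFoundError("formatdb was not found on PATH")
    subprocess.run(formatdb_command(fasta_path, db_name), check=True)


# Placeholder names (assumptions, not files from the release).
command = formatdb_command("nonredundant_report.fasta", "stackdb_nr")
print(" ".join(command))
```

Calling run_formatdb() with the same arguments would execute the
command; the example only prints it so that nothing is written to
disk.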
2. ACCESSING DATA FROM THE COMMAND LINE
A series of scripts is provided that allows STACKdb users to export
STACKdb data from the command line using a number of predefined
reports. These reports correspond to those found in WebReport, with
the exception of the stack_ReportClusterMemberEst.py report, which
is not available from the web interface.
The following command line reports are available:
- stack_ReportAllSequences.py
o All masked or original unmasked input sequences, in FastA format.
- stack_ReportConsensus.py
o All clonelinked consensus sequences, multi-sequence cluster
primary consensus sequences and/or multi-sequence cluster
alternate consensus sequences, in FastA format.
- stack_ReportAllSingleton.py
o All cluster and/or phrap singleton sequences, in FastA format.
- stack_ReportClusterMemberEst.py
o List of all constituent EST or mRNA sequences per cluster
accession or for the whole project, in FastA or comma-delimited
format.
- stack_ReportAlignment.py
o Final or intermediate (phrap) alignments, per accession or for
the whole project, in MSF, ClustalW or ACE formats.
- stack_ReportClusterAlignmentAnalysis.py
o The Alignment Analysis logs, per accession or for the whole
project.
- stack_ReportNonRedundant.py
o Non-redundant output of the entire project, in FastA or
comma-delimited format.
Please refer to section 5, "Exporting Data from the Command Line," in
the stackPACK command line manual found at stackPACK's Support link
for a detailed description of each report.
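
Before scripting any exports, it can help to check which of the report
scripts listed above are reachable from the shell. The Python sketch
below only looks the script names up on PATH; it does not run them,
because their arguments are documented in the command line manual and
are not assumed here.

```python
import shutil
from typing import Dict, Optional

# Report scripts named in this document, with a short description.
REPORT_SCRIPTS = {
    "stack_ReportAllSequences.py": "input sequences (FastA)",
    "stack_ReportConsensus.py": "consensus sequences (FastA)",
    "stack_ReportAllSingleton.py": "singleton sequences (FastA)",
    "stack_ReportClusterMemberEst.py": "cluster members (FastA or csv)",
    "stack_ReportAlignment.py": "alignments (MSF, ClustalW or ACE)",
    "stack_ReportClusterAlignmentAnalysis.py": "alignment analysis logs",
    "stack_ReportNonRedundant.py": "non-redundant output (FastA or csv)",
}


def find_report_scripts(search_path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each script name to its full path, or None if not found.

    search_path uses the same format as the PATH environment variable;
    None means the current PATH is searched.
    """
    return {name: shutil.which(name, path=search_path) for name in REPORT_SCRIPTS}


available = find_report_scripts()
for name, location in available.items():
    status = location if location else "not found on PATH"
    print(f"{name}: {status}")
```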
3. DOCUMENTATION
The following documentation is available from stackPACK's Introduction
link, to assist the user in getting the most out of searching STACKdb:
- A brief guide to the interface.
- Detailed user manual with links available from every page of the
interface.
- Naming conventions used in the stackPACK interface.
- Input data format descriptions.
- Command line user manual.
4. DO YOU STILL HAVE ANY QUESTIONS?
Please contact Electric Genetics at:
phone +27 21 959 3964
fax +27 21 959 2512
e-mail support@egenetics.com