----------------------------------------------------------------------

                        How To Access Data  
                          STACKdb v3.1.1

----------------------------------------------------------------------

Release Date: January 2003

The STACKdb v3.1.1 data is provided in relational database format for 
use with the stackPACK v2.2 viewing, management and output generation 
software, also provided with the release. Each STACKdb category is 
stored as an individual stackPACK project. In addition, the STACKdb 
data is provided in FastA-formatted flat files for immediate access, 
searching and analysis. This document describes how to view and export 
STACKdb data using the stackPACK web-based interface and command line 
tools. 
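
As an illustration of immediate access to the flat files, the following 
Python sketch reads FastA-formatted text and reports the length of each 
sequence. It assumes only the generic FastA layout (a header line 
starting with '>' followed by one or more sequence lines). The function 
names and the demonstration records are made up for this example and are 
not part of the release.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FastA-formatted text stream."""
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif line and header is not None:
            # Sequence lines may be wrapped; collect them until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def lengths_by_id(handle: TextIO) -> Dict[str, int]:
    """Map the first whitespace-delimited token of each header to its length."""
    lengths = {}
    for header, sequence in read_fasta(handle):
        tokens = header.split()
        identifier = tokens[0] if tokens else ""
        lengths[identifier] = len(sequence)
    return lengths


# Demonstration on an in-memory stand-in for a flat file (made-up records).
demo = io.StringIO(">example1 made-up header\nACGTAC\nGT\n>example2\nGGCC\n")
print(lengths_by_id(demo))
```

To use it on a real flat file, pass an open file handle instead of the 
in-memory demonstration text.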


OVERVIEW OF VIEWING AND EXPORTING FUNCTIONS

Data type                            View       Export
-----------------------------------  ---------  ------------------------
                                     WebProbe   WebReport   Command Line
-----------------------------------  ---------  ------------------------
Sequence data:
- Input data in unmasked format (1)  -(2)       x           x
- Input data in masked format        x          x           x
- Clonelink consensus sequence       x          x           x
- Primary consensus sequence         x          x           x
- Alternate consensus sequence       x          x           x
- Intermediate (phrap) contig        x          -           -
  consensus sequence
- Cluster members                    x          -(3)        x
- Cluster singletons                 x          x           x
- Phrap singletons                   x          x           x

Alignments:
- Final alignment                    x          x           x
- Intermediate (phrap) alignment     x          x           x

Other:
- Cluster member list in csv format  -          -           x
- Alignment analyses                 x          x           x
- Project summary report             x          -           x
- Non-redundant project summary      -          x           x
  report in csv format
- Quality scores                     -          -           -

Sequence annotation (4):
- Direction                          x          x           x
- Length                             x          x           x
- Clone ID                           x          x           x
- Clone library                      x          x           x
- Tissue type                        x          x           x
- Organism                           x          x           x
------------------------------------------------------------------------

(1)STACKdb v3.1 data was converted to stackPACK v2.2 format in order to
   produce STACKdb v3.1.1. stackPACK v2.1.1, which was used to produce 
   STACKdb v3.1, cannot output data in original unmasked format, so the 
   original data cannot be output from STACKdb v3.1.1.

(2)Sequence data can only be viewed in original unmasked format if the 
   masking step was skipped during the processing pipeline.

(3)The clonelink, cluster, contig and consensus members can be output 
   in FastA format by clicking on the appropriate accession number 
   in the cluster tree within WebProbe.

(4)Sequence annotation can only be parsed and thus viewed and exported 
   if it is in a defined and recognizable field within the sequence 
   header information. Please refer to the output format specification 
   document provided with this release for details on sequence 
   annotation parsing within STACKdb v3.1.1.



1. ACCESSING DATA FROM THE WEB INTERFACE

The STACKdb data is stored in stackPACK's relational database and is 
managed, viewed and exported using the following web interface 
components:
- WebProjectManager: Project management and manipulation.
- WebProbe: Results viewing and analysis.
- WebReport: Output reporting for download and evaluation.

Please note that WebPipe and certain parts of WebProjectManager, which 
are responsible for project creation and data processing in stackPACK, 
have been disabled in the version provided for STACKdb viewing and 
export. If you would like to cluster your own data using the stackPACK 
transcript reconstruction and variation analysis management system, 
please contact Electric Genetics at sales@egenetics.com. Academic and 
non-profit institutions may download the full stackPACK toolset from: 
http://www.sanbi.ac.za/CODES/


1.1 WebProjectManager

The project manager lists the various STACKdb categories and allows some
basic project management and manipulation such as the display of summary 
reports. Users are also able to filter the full project list by name, 
owner or description by entering all or part of the project information 
in the text box and clicking on 'Search'. 

Data processing, project deletion and project description editing within 
WebProjectManager are only possible for full stackPACK users.

  --------------------------------------------------------------------
  How To Use WebProjectManager:

  - Click on WebProjectManager in the menu bar for a list of the 
    various STACKdb categories
  - Click on the actual project name for a summary report
  - Select the tick box next to the relevant project name(s) and click 
    on:
     o 'Edit_Description' to edit the project description (disabled)
     o 'Add_Sequences' to incrementally add new sequence data to 
        existing projects (disabled)
     o 'Delete_Project(s)' to delete the project (disabled)
  --------------------------------------------------------------------

Please refer to WebProjectManager in the online manual found at 
stackPACK's Support link for further details.


1.2 WebProbe

WebProbe helps users gain insight into the tissue clusters by providing 
viewing tools that link various consensus sequences, alignments, 
alignment analyses and external data sources like UniGene. STACKdb 
entries have internal accession numbers for the clonelinks, clusters, 
contigs and consensus sequences produced in the clustering pipeline and 
these, as well as the original GenBank sequence accession numbers, may 
be used to query the viewer.

  --------------------------------------------------------------------
  How To Use WebProbe:

  - Click on WebProbe in the menu bar
  - Enter the project name and owner, or use the 'Search' option to 
    filter the full project list
  - Query WebProbe: 
     o For cluster data by entering an internal or GenBank accession 
       number in the 'Accession' text box and clicking on 'View'
     o For clusters with potential alternate expression forms by 
       clicking on 'Alternates_List'
     o For a summary report by clicking on 'Summary'
  --------------------------------------------------------------------

WebProbe allows the user to do the following:  
- Retrieve specific cluster family information by querying with internal 
  clonelink, cluster, contig, consensus or original GenBank accession 
  number.

- Download clonelink, cluster and contig member sequences in FastA 
  format.

- Link out to the corresponding cluster in UniGene.

- Obtain a summary of any STACKdb category, including:
   o Total number of input sequences
   o Total number of clonelinked clusters
   o Total number of multi-sequence clusters 
   o Total number of sequences in multi-sequence clusters
   o Total number of singletons 

   NOTE: The same summary report is also obtainable from 
         WebProjectManager.

- Obtain a report of all alternate consensus sequences of any STACKdb 
  category.

- Access the following consensus and alignment views associated with 
  each cluster:

   o Clonelink Consensus Sequence View
     The clonelink consensus sequence produced during the clonelinking 
     step. Multi-sequence clusters and/or singletons are joined to
     form clonelinks by virtue of shared clone IDs.

   o All Final Consensus Sequence(s) View
     All final consensus sequences for a particular contig produced 
     during the alignment analysis step. The final consensus sequences 
     are displayed in rank order in this view and include both primary 
     and alternate consensus sequences.

   o PHRAP Consensus Sequence View
     The intermediate contig consensus sequence produced during the 
     assembly step.

   o PHRAP Alignment View
     Assembly and initial sequence alignment produced during the 
     assembly step.

   o Alignment Analysis View
     An overview of the entire contig produced during the alignment 
     analysis step, highlighting any potential alternate expression 
     forms that may occur.

   o Final Consensus Sequence View
     The final consensus sequence for a particular contig produced 
     during the alignment analysis step. This could be a primary or 
     alternate consensus sequence.

   o Final Alignment View
     Final sequence alignment produced during the alignment analysis 
     step. Potential alternate expression forms for each contig are 
     identified and a separate alignment is displayed for each 
     potential alternate expression form.

    NOTE: 
    - Data analysis is simplified by four different display options in 
      WebProbe for the alignment views: plain text, coloured bases, 
      differences only and highlighted differences. 
    - You are able to output all consensus sequence members in FastA or 
      csv formats using one of the pre-defined reports.
    - You are able to output all alignments in ACE, ClustalW or MSF 
      format using one of the pre-defined reports, available from the 
      web interface or the command line, for further searching, 
      editing and use in other programs.
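
As one illustration of reusing an exported alignment in another 
program, the Python sketch below reads ClustalW-formatted text and lists 
the columns in which the aligned sequences differ, loosely mirroring the 
'differences only' display option. It assumes the generic ClustalW 
layout (a first line beginning with CLUSTAL, then blocks of name and 
sequence segment). Whether a stackPACK export contains anything beyond 
that layout has not been verified here, and the function names are 
illustrative only.

```python
from typing import Dict, List


def parse_clustal(text: str) -> Dict[str, str]:
    """Return {sequence name: full aligned sequence} from ClustalW text."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("expected a first line beginning with 'CLUSTAL'")
    segments: Dict[str, List[str]] = {}
    for line in lines[1:]:
        # Skip blank lines and conservation lines (they start with whitespace).
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[0] is the name, fields[1] the aligned segment of this block;
        # an optional trailing residue count is ignored.
        segments.setdefault(fields[0], []).append(fields[1])
    return {name: "".join(parts) for name, parts in segments.items()}


def differing_columns(alignment: Dict[str, str]) -> List[int]:
    """Return 0-based column indices where the sequences do not all agree."""
    rows = list(alignment.values())
    if not rows:
        return []
    width = min(len(row) for row in rows)
    return [i for i in range(width) if len({row[i] for row in rows}) > 1]
```

Gap characters are compared like any other character, so a column with 
a gap in one sequence and a base in another is reported as a difference.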


ALTERNATE EXPRESSION FORMS

Contigs may have more than one possible consensus sequence, owing to 
alternative splicing, chimeric sequences or other isoforms of the gene 
represented by the cluster. 

  --------------------------------------------------------------------
  How to search for clusters with possible alternate expression forms:

  During the clustering pipeline, the program looks for subalignments 
  that might represent alternate expression forms. To find all clusters 
  in a tissue category with alternate expression forms:

  - Go to stackPACK WebProbe by clicking on WebProbe in the top menu bar
  - Enter the project name and owner, or use the 'Search' option to 
    filter the full project list
  - Request an Alternates Report by clicking on 'Alternates_List'

  stackPACK will provide you with a list of cluster contigs that 
  contain potential alternate expression forms.
  --------------------------------------------------------------------

Please refer to WebProbe in the online manual found at stackPACK's 
Support link for further details.


1.3 WebReport

WebReport provides a list of predefined reports that can be selected to 
download STACKdb entries for further data evaluation.

  --------------------------------------------------------------------
  How To Use WebReport:
  - Click on WebReport in the menu bar 
  - Enter the project name and owner, or use the 'Search' option to 
    filter the full project list
  - Choose the type of report you wish to generate
  - Click on 'Next' to select the report options
  - Click on 'Export' to generate the report
  --------------------------------------------------------------------

The following web reports are available:
- All input sequences, masked or original unmasked, in FastA format.

- Consensus sequences in FastA format
   o All clonelinked consensus sequences, multi-sequence cluster 
     primary consensus sequences and/or multi-sequence cluster 
     alternate consensus sequences.
   o Note: The alternate consensus sequence option report is 
     provided for each of the STACKdb categories with the release.

- Singleton sequences in FastA format
   o Cluster and/or phrap singleton options per project.

- Alignments in MSF, ClustalW or ACE format
   o Final or intermediate (phrap) alignments, per accession or 
     per project.

- Alignment analyses 
   o The Alignment Analysis logs, per accession or for the whole 
     project.

- Non-redundant output of the entire project in FastA format
   o This selection produces a comprehensive output report 
     containing all consensus and singleton data from the project. 
   o This report is provided for each of the STACKdb categories 
     with the release.

- Non-redundant summary of the entire project, in comma-delimited 
  format. 
   o This report is provided as an overview to the entire project 
     that can easily be manipulated by the user. 
   o All information about each EST or mRNA sequence is provided 
     in a single row, including various stackPACK internal IDs, 
     original EST or mRNA ID, length, clone library and clone ID 
     information. 
   o The resultant comma delimited file can be imported into most 
     spreadsheet programs and sorted for further analysis. 
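
The same sorting and grouping can be done without a spreadsheet. The 
Python sketch below loads a comma-delimited report, groups its rows by 
one column and sorts them by a numeric column. The column headings used 
in the test data ('cluster_id', 'sequence_id', 'length') are 
hypothetical: take the actual headings from the first line of the 
report or from the output format specification document.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO

Row = Dict[str, str]


def load_rows(handle: TextIO) -> List[Row]:
    """Read a comma-delimited report whose first line holds column headings."""
    return list(csv.DictReader(handle))


def group_by(rows: List[Row], column: str) -> Dict[str, List[Row]]:
    """Collect rows that share the same value in the given column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


def sort_by_number(rows: List[Row], column: str, descending: bool = True) -> List[Row]:
    """Sort rows by a column holding whole numbers, such as a sequence length."""
    return sorted(rows, key=lambda row: int(row[column]), reverse=descending)


# Made-up data with hypothetical headings, standing in for a real report.
demo = io.StringIO(
    "cluster_id,sequence_id,length\n"
    "c1,s1,300\n"
    "c2,s2,450\n"
    "c1,s3,120\n"
)
for row in sort_by_number(load_rows(demo), "length"):
    print(row["cluster_id"], row["sequence_id"], row["length"])
```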


Please refer to WebReport in the online manual found at stackPACK's 
Support link for further details.

  --------------------------------------------------------------------
  How do I create a BLASTable database of data processed by stackPACK?

  - Using one of stackPACK's FastA-formatted reports, such as the 
    non-redundant report provided with the distribution, extract the 
    data you wish to search using BLAST. 
  - Format the resulting FastA file using NCBI's 'formatdb' program. 
  - For detailed information on how to run formatdb or set up a 
    BLASTable database, please refer to the BLAST OPTIONS in the NCBI 
    BLAST directory 
    (http://cbr-rbc.nrc-cnrc.gc.ca/documentation/blast/formatdb.html)
  --------------------------------------------------------------------
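
The formatting step can be sketched in Python as follows. The sketch 
assumes that NCBI's 'formatdb' program is installed and on the PATH, and 
it uses formatdb's -i (input file), -p F (nucleotide rather than 
protein) and -n (database base name) options. The file and database 
names are placeholders, and the function names are illustrative only.

```python
import shutil
import subprocess
from typing import List, Optional


def formatdb_command(fasta_path: str, db_name: Optional[str] = None) -> List[str]:
    """Build the formatdb argument list for a nucleotide FastA file."""
    command = ["formatdb", "-i", fasta_path, "-p", "F"]
    if db_name:
        # -n sets the base name of the database files that formatdb writes.
        command += ["-n", db_name]
    return command


def build_blast_db(fasta_path: str, db_name: Optional[str] = None) -> None:
    """Run formatdb, failing early if the program cannot be found."""
    if shutil.which("formatdb") is None:
        raise FileNotFoundError("formatdb was not found on the PATH")
    # check=True raises CalledProcessError if formatdb exits with an error.
    subprocess.run(formatdb_command(fasta_path, db_name), check=True)


# Placeholder file name; replace it with the FastA report you exported.
print(" ".join(formatdb_command("nonredundant.fasta", "stack_nr")))
```

Only the last line runs when the sketch is executed as written; call 
build_blast_db() with a real FastA file to create the database.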



2. ACCESSING DATA FROM THE COMMAND LINE

A series of scripts is provided that allows STACKdb users to export 
STACKdb data from the command line using a number of predefined 
reports. These reports correspond to those found in WebReport, with 
the exception of the stack_ReportClusterMemberEst.py report, which 
is not available from the web interface.

The following command line reports are available:
- stack_ReportAllSequences.py    
   o All masked or original unmasked input sequences, in FastA format.

- stack_ReportConsensus.py
   o All clonelinked consensus sequences, multi-sequence cluster 
     primary consensus sequences and/or multi-sequence cluster 
     alternate consensus sequences, in FastA format.

- stack_ReportAllSingleton.py 
   o All cluster and/or phrap singleton sequences, in FastA format.

- stack_ReportClusterMemberEst.py
   o List of all constituent EST or mRNA sequences per cluster 
     accession or for the whole project, in FastA or comma-delimited 
     format.

- stack_ReportAlignment.py
   o Final or intermediate (phrap) alignments, per accession or for 
     the whole project, in MSF, ClustalW or ACE formats.

- stack_ReportClusterAlignmentAnalysis.py
   o The Alignment Analysis logs, per accession or for the whole 
     project.

- stack_ReportNonRedundant.py
   o Non-redundant output of the entire project, in FastA or 
     comma-delimited format.

Please refer to section 5, "Exporting Data from the Command Line," in 
the stackPACK command line manual found at stackPACK's Support link 
for a detailed description of each report.
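
Before running a report, it can help to confirm that the scripts are 
reachable from your shell. The Python sketch below looks up each of the 
script names listed above on the PATH. It does not run the scripts, 
because their command line options are described only in the stackPACK 
command line manual and are not assumed here. The function name is 
illustrative only.

```python
import shutil
from typing import Dict, Iterable, Optional

# Script names as listed in this document.
REPORT_SCRIPTS = (
    "stack_ReportAllSequences.py",
    "stack_ReportConsensus.py",
    "stack_ReportAllSingleton.py",
    "stack_ReportClusterMemberEst.py",
    "stack_ReportAlignment.py",
    "stack_ReportClusterAlignmentAnalysis.py",
    "stack_ReportNonRedundant.py",
)


def locate_report_scripts(names: Iterable[str] = REPORT_SCRIPTS) -> Dict[str, Optional[str]]:
    """Map each script name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in locate_report_scripts().items():
    print(f"{name}: {path if path else 'not found on PATH'}")
```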


3. DOCUMENTATION

The following documentation is available from stackPACK's Introduction 
link, to assist the user in getting the most out of searching STACKdb:
- A brief guide to the interface.
- Detailed user manual with links available from every page of the
  interface.
- Naming conventions used in the stackPACK interface. 
- Input data format descriptions.
- Command line user manual.


4. DO YOU STILL HAVE ANY QUESTIONS?

Please contact Electric Genetics at: 
 phone  +27 21 959 3964
 fax    +27 21 959 2512
 e-mail support@egenetics.com