----------------------------------------------------------------------
Release Notes
stackPACK v2.2.1
----------------------------------------------------------------------
Release date: February 2003
INTRODUCTION
This version of stackPACK replaces stackPACK v2.1.1 for HP Tru64 only
and includes many features requested by customers. StackPACK v2.2.1
for HP Tru64 has the same functionality as stackPACK v2.2 for
Red Hat Linux on Intel-based PCs, Silicon Graphics Irix, and Sun
Solaris, and is a special release that addresses a number of
Tru64-specific issues encountered during development.
StackPACK v2.2.1 contains significant improvements over stackPACK
v2.1.1, and focuses mainly on increasing the accuracy of consensus
sequences and improving the ability to handle large datasets.
Enhancements made towards these goals include:
- The incremental addition of new data to existing clusters, while
maintaining the cluster history.
- The inclusion of phred quality scores to improve the accuracy of
consensus sequence generation.
- The ability to assemble and generate consensus sequences with
original unmasked data, whilst still retaining the benefits of
accurate clustering using masked data.
In addition, continued improvements in parameterization, data
management, viewing and extraction functions enable more rapid
assessment and manipulation of alignments and alignment analyses.
SYSTEM INFORMATION
1. StackPACK v2.2.1 is available for the following platforms:
   Hardware    OS            Version
   ----------  ------------  -------
   HP          Tru64 UNIX    4.0F
   HP          Tru64 UNIX    5.1A
2. StackPACK requires the following third-party software:
   Software                 Version               Location
   -----------------------  --------------------  ----------------------------------------------------
   d2_cluster and CRAW(1)   latest                Academic: Biotique Systems: bpoh@biotiquesystems.com
                                                  Academic: University of Houston: bsmalley@uh.edu
                                                  Commercial: Electric Genetics: support@egenetics.com
   Phrap and Cross_Match    1996 or 1999          Academic: http://www.phrap.org
                                                  Commercial: http://www.codoncode.com/
                                                  Commercial: http://www.geospiza.com/products/index.htm
   RepBase (optional)       user's choice         Academic: http://www.girinst.org/index.html
                                                  Commercial: http://www.geospiza.com/products/index.htm
                                                  Commercial: http://www.girinst.org/index.html
   RepeatMasker (optional)  April 1999 or newer   Academic: http://repeatmasker.genome.washington.edu/
                                                  Commercial: http://www.geospiza.com/products/index.htm
   Apache                   1.3 or newer          http://www.apache.org
   Python(2)                python2-2.2.1         http://www.python.org/2.2.1/
   MySQL(3)                 3.23.27 or newer      http://www.mysql.com/downloads/mysql-3.23.html
    - Server                (MySQL 3.23.xx)
    - Libraries and header  (MySQL-devel-3.23.xx)
      files for development
    - Client programs       (MySQL-client-3.23.xx)
   MySQLdb(4)               0.9.1                 http://www.mysql.com/downloads/api-python.html
NOTE:
(1) Commercial customers do not need to obtain d2_cluster and CRAW
separately; they are included in the stackPACK distribution file.
Academic customers: please state clearly the supported platform
for which you would like precompiled d2_cluster and CRAW binaries,
and use the following as the e-mail subject: "Precompiled
d2_cluster and CRAW for <your platform of choice>."
(2) Use Python-2.2.1, compiled with the --with-threads configure option.
(3) Although MySQL binaries can be obtained elsewhere, we strongly
recommend the binaries available from
http://www.mysql.com/downloads/mysql-3.23.html, as these are the
best supported.
MySQL-devel-3.23.xx and MySQL-client-3.23.xx are required to
install MySQLdb.
(4) To build MySQLdb, it may be necessary to edit setup.py manually
so that it specifies the correct locations of the MySQL and Python
header and library files, as described in the README file provided
in the archive.
WHAT'S NEW IN THIS RELEASE
StackPACK v2.2.1 contains many improvements and new features, most of
which were requested by our academic and commercial customers.
1. StackPACK v2.2.1 focuses on enhancing the quality of consensus
sequences that are generated:
- Inclusion of phred quality scores either from the web interface or
from the command line.
- Ability to assemble sequences and generate consensus sequences with
original unmasked data, whilst still retaining the benefits of
accurate clustering using masked data. Alignments, alignment
analyses and consensus sequences can then be viewed in the web
interface in their original unmasked format.
2. Several enhancements have been made to improve the ability of
stackPACK to handle large datasets:
- Incremental addition of new data to existing clusters while
maintaining the cluster history. This can be done from either the
web interface or the command line.
- Reduction of memory usage for both the web interface and all steps
in the pipeline.
- Significant speed improvements to the clonelinking algorithm.
3. New viewing and extraction functions enable rapid, simplified
analysis and manipulation of alignments and alignment analyses.
Data exchange with third-party programs is also simplified, making
it easier to assess highlighted areas of potential interest.
New viewing functions include:
- Viewing of the intermediate phrap consensus sequence.
- Improved parsing and viewing of sequence annotation.
- Display of phrap singletons in the cluster family tree view.
- Viewing of previous cluster versions in cases where clusters
have been deprecated or changed due to the incremental addition
of new data to existing clusters.
New reporting and output functions include:
- Output of all constituent sequences for a particular consensus,
contig and/or clonelink.
- Extraction of all original unmasked input sequences in a project,
in FASTA format.
- Extraction of intermediate and final alignments in MSF and
ClustalW format, either for a single alignment or for a whole
project.
- Extraction of phrap alignments in ACE format, either for a
particular cluster or for a whole project.
- Extraction of the Alignment Analysis CRAW logs for a particular
contig or for a whole project.
- Restriction of the non-redundant output report to clonelink
consensus sequences, contig consensus sequences and/or singleton
sequences.
- Extraction of all phrap singletons for a project, in FASTA format.
The singleton output can be filtered by sequence size.
All of these new reporting functions can be run from either the web
interface or the command line.
4. Several improvements in terms of ease of use, flexibility and
parameterization have been implemented, giving users the freedom to
optimize their clustering results and adapt the system according to
their needs. These include:
- Numerous enhancements ensuring a more robust automated installation.
- Creation of multiple projects with the same name provided that they
are owned by different users.
- A configuration file option for all pipeline steps, enabling the
use of multiple customized configuration files, stored in any
location, for different pipeline steps and/or different projects.
- Implementation of progress indication for all steps within the
stackPACK pipeline.
- Context-sensitive on-line support.
5. Significant improvements to data management functions have been
implemented in this release, including:
- Project filtering by name, owner or description within
WebProjectManager.
- Project description editing at any time from the WebProjectManager.
- Multiple project deletion from within WebProjectManager.
- Project search functions within WebProbe and WebReport.
- Project filtering by owner and/or description from the command line.
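Item 3 above describes extracting all phrap singletons for a project in FASTA format, with the output filterable by sequence size. The Python sketch below shows one way to apply a similar length filter to an exported FASTA file afterwards. It is not stackPACK's own implementation; the function names read_fasta and filter_by_length are hypothetical, and the sketch assumes standard FASTA layout (a '>' header line followed by one or more sequence lines).

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            # Sequence lines may be wrapped; collect them for joining.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Record], min_len: int = 0,
                     max_len: Optional[int] = None) -> Iterator[Record]:
    """Keep records whose sequence length lies within [min_len, max_len]."""
    for header, seq in records:
        if len(seq) >= min_len and (max_len is None or len(seq) <= max_len):
            yield header, seq
```

Both functions are generators, so an open file object can be passed to read_fasta directly and only one record is held in memory at a time, which suits large singleton files.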
KNOWN PROBLEMS
1. Tru64-specific known problems within stackPACK v2.2.1 include:
- stackCORBAd does not spawn multiple threads during processing,
leading to degraded performance when many users access the web
interface at the same time. The results are not affected.
- ACE files generated during the stack_Assemble step are corrupted
when they are inserted back into the database. The corrupt files
do not affect pipeline processing or the integrity of the cluster
results, and are only observed when users output data using the
stack_ReportAlignment.py report with the --Format=ACE option.
This is due to a memory allocation error within myODBC, the ODBC
driver required to connect to the MySQL database. Electric
Genetics is working with the MySQL support engineers to address
this issue.
The intact ACE files can be captured before they are inserted back
into the database by adding the --leavefiles option to the end of
the stack_Assemble command line as follows:
    stack_Assemble <project> --leavefiles
The location of these ACE files is reported when stack_Assemble
has finished processing.
The remaining known problems apply to stackPACK v2.2 and v2.2.1 on
all platforms:
2. Some of the new features in stackPACK v2.2.1 are not compatible with
projects that have been created with stackPACK v2.1.1 and converted
to stackPACK v2.2.1 format. These include:
- Output of alignments in ACE format.
- Output of sequences in their original unmasked format.
- Use of the --use-unmasked stack_Assemble option.
3. The stack_Link algorithm does not take singleton sequences into
account and will only link clusters with shared clone ID
information. Note that even if the redundancy parameter equals 1,
singleton sequences with matching clone IDs will not be linked,
and only clusters will be considered during the stack_Link step.
4. The 'Last Modified' date displayed in both the command line and
WebProjectManager summary reports is not updated when projects are
modified from the command line.
5. NCBI entries that have been deleted and replaced by a new entry
with a different accession number may have the old accession number
appended to the new accession number within the NCBI GenBank
ACCESSION field, e.g. U15570 L36804. In these cases stackPACK
strips everything after the first space, retaining only the first
accession number.
6. StackPACK writes temporary files to the location specified by
STACKPACK_TMP under the [STACKPACK] heading in the stackPACK
configuration file. These temporary files are generally deleted as
the data is processed. If the processing pipeline is interrupted,
the temporary files may not be deleted and must then be removed
manually. If this temporary directory becomes full, some of the
steps in the pipeline may not complete. It is therefore important
to inspect this location periodically for accumulated files, and to
ensure that enough disk space is allocated for STACKPACK_TMP.
7. The hierarchical navigational icons that represent the various
cluster consensus and alignment views within WebProbe may become
misaligned when using certain font settings on Netscape under Linux.
This can be rectified by setting the Netscape variable width font
to 14 in Edit: Preferences: Font.
8. Anomalous behavior may be experienced when using Netscape FastTrack
as the web-server. This does not affect the integrity of the data,
and can be avoided by using Apache as the web-server.
9. Browser limitations may restrict the size or length of the
clusters that can be viewed in WebProbe. This can be rectified by
increasing your browser's time-out value.
10. If the MySQL database is overloaded or the maximum number of
connections is exceeded, stackPACK may lose its connection to the
database. Processes using stackCORBAd, such as WebProbe, will then
terminate with a database error and must be run again. This can be
rectified by restarting stackCORBAd. Kill stackCORBAd as follows
and it will restart itself automatically:
    killall -9 stackCORBAd
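Known problem 3 (stack_Link only links clusters that share clone ID information, never singleton sequences) can be illustrated with a small model of clone-based linking. The Python sketch below is hypothetical and is not the stack_Link algorithm itself: it assumes each cluster is represented by the clone IDs of its member sequences, joins clusters that share a clone ID using a union-find structure, and never considers singletons because they are simply not part of the input. The function name link_clusters and the use of None for "no clone information" are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def link_clusters(clusters: Dict[str, List[Optional[str]]]) -> List[List[str]]:
    """Group cluster IDs that share at least one clone ID (hypothetical model).

    clusters maps a cluster ID to the clone IDs of its member sequences;
    None marks a sequence without clone information. Singleton sequences
    are deliberately absent from the input, so they can never be linked.
    """
    parent = {cid: cid for cid in clusters}

    def find(x: str) -> str:
        # Follow parent pointers to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen: Dict[str, str] = {}  # clone ID -> first cluster containing it
    for cid, clone_ids in clusters.items():
        for clone in clone_ids:
            if clone is None:
                continue  # no clone information, nothing to link on
            if clone in first_seen:
                root_a, root_b = find(first_seen[clone]), find(cid)
                if root_a != root_b:
                    parent[root_b] = root_a  # merge the two groups
            else:
                first_seen[clone] = cid

    groups: Dict[str, List[str]] = defaultdict(list)
    for cid in clusters:
        groups[find(cid)].append(cid)
    return sorted(sorted(group) for group in groups.values())
```

Because linking is transitive in this model, clusters c1 and c3 end up in the same group if each shares a clone ID with c2. A singleton sequence carrying a matching clone ID would still not join any group, since it never enters the input.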
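The accession-number handling described in known problem 5 amounts to keeping only the text before the first space in the ACCESSION field. A minimal Python sketch, assuming the field is available as a plain string; the function name strip_secondary_accessions is hypothetical, and trimming surrounding whitespace first is an assumption of the sketch rather than documented stackPACK behavior.

```python
def strip_secondary_accessions(field: str) -> str:
    """Return only the first accession number in a GenBank ACCESSION field.

    Everything after the first space is discarded, so an old accession
    number appended to the new one is dropped. Surrounding whitespace is
    trimmed first (an assumption of this sketch).
    """
    return field.strip().split(" ", 1)[0]
```

For the example above, strip_secondary_accessions("U15570 L36804") returns "U15570", and a field containing a single accession number is returned unchanged.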
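Known problem 6 recommends inspecting STACKPACK_TMP periodically for accumulated files and checking that enough disk space is available. The Python sketch below reads STACKPACK_TMP from the [STACKPACK] section of the configuration file and reports old files and free space without deleting anything. It assumes an INI-style configuration file containing a line such as STACKPACK_TMP = /some/dir; the actual stackPACK configuration syntax may differ, and the function names and the 24-hour threshold are hypothetical.

```python
import configparser
import os
import shutil
import time


def stackpack_tmp_dir(config_path):
    """Read STACKPACK_TMP from the [STACKPACK] section of an INI-style file.

    Assumes a line like 'STACKPACK_TMP = /some/dir'; the real stackPACK
    configuration syntax may differ.
    """
    cfg = configparser.ConfigParser(interpolation=None)  # treat '%' literally
    if not cfg.read(config_path):
        raise FileNotFoundError(config_path)
    return cfg.get("STACKPACK", "STACKPACK_TMP")


def inspect_tmp_dir(tmp_dir, older_than_hours=24.0):
    """Return (stale_files, free_bytes) for tmp_dir without deleting anything.

    A file counts as stale if it has not been modified for older_than_hours;
    such files may be left over from an interrupted pipeline run.
    """
    cutoff = time.time() - older_than_hours * 3600
    stale = []
    for root, _dirs, files in os.walk(tmp_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale), shutil.disk_usage(tmp_dir).free
```

Reporting rather than deleting is deliberate: a file that looks stale may still belong to a pipeline step that is running, so removal remains a manual decision, as the note above advises.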
DO YOU STILL HAVE ANY QUESTIONS?
Please do not hesitate to contact Electric Genetics with any questions,
comments or suggestions for improvement at:
Tel: +27 (21) 959 3964
Fax: +27 (21) 959 2512
E-mail: support@egenetics.com