About FASTA Database Indexing Process

This page describes how the administration team at ProteomeCommons.org archives popular FASTA databases. Institutions typically update and publish new versions of their FASTA databases on a monthly basis. The ProteomeCommons.org administration team downloads these databases and uploads them to Tranche.
Download Method

NCBI nr

The NCBI nr database is a single file, nr.gz, downloaded from the folder ftp://ftp.ncbi.nih.gov/blast/db/FASTA/. Before going through the long process of uploading a new version of the data, check that the date of the file on NCBI's FTP site is newer than the most recently uploaded version on Tranche. NCBI does not store archives of its data sets, so older versions of NCBI nr cannot be obtained from NCBI's website. NCBI generally updates its databases on a monthly basis.

7 Different IPI FASTA Databases

As of this writing, IPI publishes 7 FASTA databases: Arath, Bovin, Chick, Danre, Human, Mouse, Rat. If IPI starts to publish any new ones, archive those as well. One set of archives, Brare, is no longer being updated by IPI, and thus we are not archiving it. Download the latest version of each database from the proper folders on the IPI FTP website: ftp://ftp.ebi.ac.uk/pub/databases/IPI/old/. When you download a version of the data, put the files into a folder for that version before uploading. Each set of databases is typically updated on a monthly basis. IPI versions its published databases, so keeping track of what has already been archived to Tranche is easy.

The GPM's cRAP (common Repository of Adventitious Proteins)

The GPM's cRAP (pronounced "see-RAP") database is the set of files at ftp://ftp.thegpm.org/fasta/cRAP/. Do not include the directory /archive/ in the download. Before uploading a new version of the data, check that the date of the files on The GPM's FTP site is newer than the most recently uploaded version on Tranche. cRAP is not archived by The GPM through their FTP site, so archiving old sets of cRAP is not possible. The database is generally updated on a monthly basis.

ExPASy's Swiss-Prot and TrEMBL, the UniProt Knowledgebase

This is by far the largest of the FASTA archives: between 2 and 6 GB in size.
Download the entire contents of ftp://ftp.expasy.org/databases/uniprot/current_release/knowledgebase/complete/, including the README and documentation directory. Before uploading, check that the date of the files on ExPASy's FTP site is newer than the most recently uploaded version on Tranche. Archives are not kept on ExPASy's FTP server, so older versions cannot be downloaded. The database is generally updated on a monthly basis.

Upload Method

Refer to the upload documentation for a full guide. The following is a list of the upload parameters and how they should be set for these FASTA databases.

Title
Description

The descriptions for all of the uploads always have the same format:

Annotations

Because these uploads are updates to older data sets, each FASTA upload needs to be annotated with the Tranche hash of the old version when applicable. It is not necessary to update the old project with the hash of the new version, because the cache updater handles this automatically.

License
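Returning to the Download Method: several of the databases above require checking that the file on the FTP site is newer than the most recently uploaded version on Tranche. That check could be scripted roughly as follows. This is only a sketch, not the administration team's actual tooling. It assumes the FTP server answers the MDTM command (not every server does) and that the date of the last Tranche upload has been recorded separately. The function names parse_mdtm, remote_modified, and needs_upload are hypothetical.

```python
from datetime import datetime, timezone
from ftplib import FTP
from typing import Optional


def parse_mdtm(response: str) -> datetime:
    """Parse an FTP MDTM reply such as '213 20090115123000' into a UTC datetime."""
    code, _, stamp = response.strip().partition(" ")
    if code != "213" or len(stamp) < 14:
        raise ValueError(f"unexpected MDTM reply: {response!r}")
    # Ignore optional fractional seconds after the 14-digit timestamp.
    return datetime.strptime(stamp[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def remote_modified(host: str, path: str) -> datetime:
    """Ask an anonymous FTP server for a file's modification time (needs network access)."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login
        return parse_mdtm(ftp.sendcmd("MDTM " + path))


def needs_upload(remote: datetime, last_uploaded: Optional[datetime]) -> bool:
    """True if nothing has been uploaded yet or the remote file is newer than the last upload."""
    return last_uploaded is None or remote > last_uploaded
```

For NCBI nr, for example, one might call needs_upload(remote_modified("ftp.ncbi.nih.gov", "/blast/db/FASTA/nr.gz"), last_uploaded), where last_uploaded is the recorded date of the previous Tranche upload. If the server rejects MDTM, compare the dates shown in the FTP directory listing by hand instead.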
Comments or Questions? Please contact the site's administrators.