Now that the preparatory work is done, we can finally start to implement the BLAST search itself. The SeqSimilaritySearcher interface also defines a method that returns the databases that can be searched; this is quickly implemented with the SimpleSequenceDBInstallation class from the package org.biojava.bio.seq.db. I populate this installation in the constructor of the Blast class, using a constant array of file names. This suffices for my purposes; of course, it would be nicer if the class looked for the available files itself...
    public Blast() {
        this.seqDBs = new SimpleSequenceDBInstallation();
        // Register one SequenceDB per FASTA file listed in BLAST_DBS.
        for (int i = 0; i < BLAST_DBS.length; i++) {
            try {
                this.seqDBs.addSequenceDB(
                        SearchFactory.fastaProt2DB(BLAST_DBS[i]), null);
            } catch (java.io.FileNotFoundException e) {
                // A missing file is reported; the remaining databases are still added.
                e.printStackTrace();
            }
        }
    }

    public Set getSearchableDBs() {
        return this.seqDBs.getSequenceDBs();
    }
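As a sketch of the nicer variant hinted at above, the class could scan a directory for FASTA files instead of relying on a constant array. The class name `BlastDbFinder`, its methods, and the `.fasta`/`.fa` extensions are assumptions made for illustration; they are not part of BioJava or of the original Blast class.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BioJava or the original code.
final class BlastDbFinder {

    /** Assumption: database files are recognised by a .fasta or .fa suffix. */
    static boolean isFastaName(String name) {
        return name.endsWith(".fasta") || name.endsWith(".fa");
    }

    /** Returns the sorted names of all FASTA files directly inside dir. */
    static List<String> findFastaFiles(File dir) {
        List<String> names = new ArrayList<>();
        File[] files = dir.listFiles();
        if (files == null) {      // dir does not exist or is not a directory
            return names;
        }
        Arrays.sort(files);       // listFiles() makes no ordering guarantee
        for (File f : files) {
            if (f.isFile() && isFastaName(f.getName())) {
                names.add(f.getName());
            }
        }
        return names;
    }
}
```

The constructor could then loop over the returned names instead of over the constant array.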
The important method doing the search has the following signature:
public SeqSimilaritySearchResult search(SymbolList querySeq, SequenceDB db, Map searchParameters) throws BioException {
For a detailed description of the parameters and the usage of the method, I refer you to the BioJava API documentation. I'll just mention here that the parameters are necessarily quite generic, so an IllegalArgumentException is thrown if they are not compatible with BLAST. In this implementation, the default search mode is blastp for protein searches.
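One way to perform such a compatibility check is to validate the keys of the parameter map before the command line is built. The following is only a minimal sketch: the class `BlastParamCheck` is hypothetical, and its whitelist is a small illustrative subset of option letters, not the complete list of options that blastall accepts.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical validator; the whitelist is an illustrative subset only.
final class BlastParamCheck {

    // Assumed to be acceptable search options (the database option -d
    // is added separately from the SequenceDB, so it is not listed here).
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("p", "e", "m", "F", "M", "v", "b"));

    /** Throws IllegalArgumentException for unknown keys or missing values. */
    static void validate(Map<String, ?> searchParameters) {
        for (Map.Entry<String, ?> entry : searchParameters.entrySet()) {
            if (!ALLOWED.contains(entry.getKey())) {
                throw new IllegalArgumentException(
                        "Unsupported BLAST option: -" + entry.getKey());
            }
            if (entry.getValue() == null) {
                throw new IllegalArgumentException(
                        "Missing value for option -" + entry.getKey());
            }
        }
    }
}
```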
The assembly of the command line and the call to the external program are straightforward; I'll omit the special safety precautions (are there parameters? Is the program specified? etc.) here. The environment variable BLASTDB must be set to the path to the data files for BLAST.
    final String BLAST_PROGRAM = "blastall";

    // Program path, then one "-key value" pair per search parameter.
    String commandLine = SearchFactory.getBlastPath() + BLAST_PROGRAM;
    Iterator it = searchParameters.entrySet().iterator();
    while (it.hasNext()) {
        Map.Entry entry = (Map.Entry) it.next();
        commandLine += " -" + entry.getKey() + " " + entry.getValue();
    }
    commandLine += " -d " + db.getName();

    // BLAST locates its data files through the BLASTDB environment variable.
    String[] envs = {"BLASTDB=" + SearchFactory.getBlastDBPath()};
    Process blast = Runtime.getRuntime().exec(commandLine, envs);
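Concatenating everything into one string works as long as no path or value contains whitespace. A possible alternative, sketched here under assumptions, collects the arguments in a list that can be handed to `java.lang.ProcessBuilder`. The class `BlastCommand` is hypothetical; its `blastPath` and `dbName` parameters stand in for the values of `SearchFactory.getBlastPath()` and `db.getName()`, and the paths in `main` are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original Blast class.
final class BlastCommand {

    /** Builds: program, then "-key", "value" pairs, then "-d", dbName. */
    static List<String> build(String blastPath,
                              Map<String, ?> searchParameters,
                              String dbName) {
        List<String> args = new ArrayList<>();
        args.add(blastPath + "blastall");
        for (Map.Entry<String, ?> entry : searchParameters.entrySet()) {
            args.add("-" + entry.getKey());
            args.add(String.valueOf(entry.getValue()));
        }
        args.add("-d");
        args.add(dbName);
        return args;
    }

    public static void main(String[] argv) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("p", "blastp");
        List<String> args = build("/usr/local/blast/", params, "swissprot");
        System.out.println(args);

        // Starting the process would look like this (not executed here):
        // ProcessBuilder pb = new ProcessBuilder(args);
        // pb.environment().put("BLASTDB", "/path/to/blast/data");
        // Process blast = pb.start();
    }
}
```

Note one difference in behaviour: `ProcessBuilder.environment()` starts from the environment of the current process and adds BLASTDB to it, whereas `Runtime.exec(String, String[])` replaces the environment with the given array.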
The next problem is how to pass the input sequence to BLAST. Luckily, the program can read it from standard input. The correct format (so that BLAST terminates and the results can be parsed) is a '>', followed by the sequence ID, a line break, the sequence itself, another line break, an empty line, and an end-of-input symbol (ctrl-d). Omit the ctrl-d, and BLAST waits for another sequence; omit the >ID, and the parsing of the output goes wrong. So we have:
    String qSeqID = SearchFactory.createID(((Sequence) querySeq).getName());
    PrintWriter writer = new PrintWriter(new BufferedWriter(
            new OutputStreamWriter(blast.getOutputStream())));
    writer.println(">" + qSeqID);
    writer.println(querySeq.seqString());
    writer.println();
    writer.print('\u0004'); // ctrl-d
    writer.flush();
    writer.close();
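The input format described above can be isolated in a small helper, which makes it easy to inspect without starting BLAST. This is a sketch only: `FastaQuery` and `format` are hypothetical names, a fixed `\n` is used as the line break instead of the platform line separator, and the trailing ctrl-d is kept simply because the original code sends it.

```java
// Hypothetical helper, not part of the original Blast class.
final class FastaQuery {

    /** Formats a query as: header line, sequence line, empty line, ctrl-d. */
    static String format(String id, String sequence) {
        return ">" + id + "\n"
                + sequence + "\n"
                + "\n"
                + '\u0004';
    }

    public static void main(String[] args) {
        String text = format("query1", "MKTAYIAKQR");
        // Make the control character visible when printing.
        System.out.println(text.replace("\u0004", "<ctrl-d>"));
    }
}
```

The resulting string could then be written to the process with a single `writer.print(...)` call, followed by `flush()` and `close()` as before.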
All that remains is to wait for BLAST to finish and to check whether everything went well (return code 0). If there was an error, you could try to parse the output or error stream for more sophisticated error handling; I just assumed that a parameter was wrong, so an IllegalArgumentException gets thrown.
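    try {
        blast.waitFor();
    } catch (InterruptedException e3) {
        // Restore the interrupt flag and give up; calling exitValue() on a
        // process that is still running would throw IllegalThreadStateException.
        Thread.currentThread().interrupt();
        throw new BioException("Interrupted while waiting for BLAST");
    }
    if (blast.exitValue() != 0) {
        throw new IllegalArgumentException(
                "BLAST exited with code " + blast.exitValue());
    }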
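For the more sophisticated error handling mentioned above, the text that BLAST writes to its error stream could be collected and attached to the exception. The following is a minimal sketch, assuming a hypothetical helper `StreamText.readAll`; in the search method it would be called with `blast.getErrorStream()`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the original Blast class.
final class StreamText {

    /** Reads the whole stream as text, joining the lines with '\n'. */
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in))) {
            boolean first = true;
            String line;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    sb.append('\n');
                }
                sb.append(line);
                first = false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The exception could then be built as `new IllegalArgumentException("BLAST failed: " + StreamText.readAll(blast.getErrorStream()))`. Keep in mind that a child process can block once the pipe buffers for its output or error stream fill up, so for large outputs the streams may need to be read before or while waiting for the process.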
URL of this page: http://www.joerg-ruedenauer.de/Software/blast/blast2.html
Author of this page: Jörg Rüdenauer
Last modified: 14.07.2002