Dear all,
I would be grateful if you could help me with the following question:
BLASTP 2.2.13 [Nov-27-2005]
Query: sp|Q4U9M9|104K_THEAN 104 kDa microneme-rhoptry antigen precursor (p104).
Database: /import/bc2/data/databases/BLASTDB_NCBI/sprot
Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start,
sp|Q4U9M9|104K_THEAN sp|Q4U9M9|104K_THEAN 100.00 70 0 0
sp|Q4U9M9|104K_THEAN sp|P15711|104K_THEPA 58.57 70 27 2
The following code parses the above file:
#!/usr/bin/env perl
use strict;
use warnings;

my $file = "myblastOuput.txt";
open(my $fh, '<', $file) or die "Cannot open $file: $!";
while (my $line = <$fh>) {
    next if $line =~ /^#/;    # only read non-commented lines
    chomp $line;              # drop the trailing newline
    # get the different fields (tab-separated)
    my ($query_ids, $subject_ids, $percent_id, $align_length,
        $num_mismatches, $num_gaps, $query_start, $query_end,
        $subject_start, $subject_end, $e_value, $bit_score)
        = split( /\t/, $line, 12);
}
close($fh);
What is not clear to me is this: "the line_array consists of 12 variables, which correspond to the elements of the array? Each variable contains one of the values obtained by splitting the line on tabs, which should also be 12 in total for each line." Is this correct?
I look forward to hearing from you,
mariaig
Put code in code tags so that it’s easier to see spaces and whatnot.
my ($query_ids, $subject_ids, $percent_id, $align_length, $num_mismatches, $num_gaps, $query_start, $query_end, $subject_start, $subject_end, $e_value, $bit_score) = split( /t/, $line, 12);
This code splits the line on a regular expression that matches the literal character ‘t’. Surely not? What did you intend? Space? Whitespace? Tab? (Tab is \t.)
The fields are assigned to the variables as listed, $query_ids gets the first field, $subject_ids the second and so forth.
Because you specified a limit of 12, split returns at most 12 fields: anything left after the 11th separator, further separators included, ends up in the 12th variable.
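To see the effect of the pattern and of the limit, here is a small sketch. The sample line is hypothetical: the first six values are taken from the post, the remaining ones (including the 13th field "extra") are invented purely for illustration, and the fields are assumed to be tab-separated.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical 13-field line: the first six values come from the post,
# the rest are invented for illustration.
my $line = join("\t",
    'sp|Q4U9M9|104K_THEAN', 'sp|P15711|104K_THEPA',
    '58.57', '70', '27', '2',
    '1', '70', '1', '67', '2e-16', '82.8', 'extra') . "\n";

chomp $line;    # drop the trailing newline before splitting

# With a limit of 12, split stops after 11 separators;
# whatever is left goes into the 12th element, tabs included.
my @limited = split /\t/, $line, 12;

# Without a limit, every tab separates a field.
my @all = split /\t/, $line;

# A literal 't' as the pattern cuts at every letter t instead.
my @wrong = split /t/, 'subject start';

printf "limited: %d fields\n", scalar @limited;
printf "all:     %d fields\n", scalar @all;
printf "wrong:   %s\n", join('|', @wrong);
```

If your lines really have exactly 12 fields, the limit makes no difference; it only matters when there is something after the 12th field.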
You have to understand that in Perl, a list like this:
($var1, $var2, $var3)
behaves like an array of 3 variables.
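A minimal sketch of that list assignment, with made-up values; the variable names are only placeholders:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @fields = ('a', 'b', 'c', 'd');    # made-up values

# Each variable in the list takes the element at the same position;
# the unmatched fourth element is simply dropped.
my ($var1, $var2, $var3) = @fields;

# If the right-hand side is shorter, the leftover variables are undef.
my ($x, $y, $z) = ('only', 'two');

print "$var1 $var2 $var3\n";
print defined $z ? "z is set\n" : "z is undef\n";
```

The same thing happens with the result of split: the first field goes to the first variable, the second to the second, and so on.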
Interesting. mariaig did use a backslash in the original post (as can be seen looking at the NNTP version) but it seems to have been elided by the quoting. Let’s try an experiment. Here are three identical pieces of text, quoted in three ways:
here comes quoted backslash t " " and here is a pattern / / in slashes
here comes quoted backslash t " " and here is a pattern / / in slashes
here comes quoted backslash t " " and here is a pattern / / in slashes
Oops!
mariaig, it’s probably the Camel taking revenge for you implying that Perl is somehow the same as PHP. What a gross insult!
And it’s generally a bad idea to write your own code for parsing file formats. Somebody’s usually done the job for you. If you’re planning to do a lot of BLAST processing and related things, you should probably study Bioperl. But you need to get a better grasp of Perl itself first. There are several good books in the O’Reilly series. Or you could start at Perl programming documentation - perldoc.perl.org
Dear ken_yap,
thank you very much for your helpful reply.
best
mariaig
Dear djh-novell,
thank you for your reply,
best,
mariaig