I have an interesting problem at work at the moment. The team have some novel enzymes which we are trying to characterise. We know the overall groups that these enzymes belong to, but those groups, lipases and proteases, are very, very broad. A quick search through BRENDA turns up a huge variety of both. I can do a BLAST and find the closest matches to our protein sequences; the problem is finding the closest match that has been characterised. The sheer amount of genome sequence data being generated is flooding the protein data banks at NCBI and EMBL. Having a characterised protein for comparison helps in determining novelty, and may also give you a baseline to start from when characterising your own protein.
I have a few options. I can do the standard BLAST and check every single result for its level of characterisation, but this is not the preferred alternative.
I can start playing around with various databases. The optimal database to BLAST against would be BRENDA, but I can't find an option to do this.
The SWISS-PROT database would also be an option, but I get very few hits, and no significant ones, using this method.
Alternatively, I can BLAST against the PDB database. This is the database for structural information and, almost by definition, any protein that has had its structure determined is likely to be well characterised. However, this is only appropriate for groups of enzymes and proteins that already have a reasonable number of solved structures, which is not the case for the vast majority of proteins out there.
Using TrEMBL and other annotated databases is another option, but again these tend to have a high noise-to-signal ratio for what I want.
Basically I need a database for BLAST that only contains characterised proteins that have been published, and I'm out of ideas as to where to get one!
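In the meantime, one stopgap might be to automate the "check every single result" option: run the standard BLAST with tabular output (`-outfmt 6`) and keep only the hits whose accession appears in a list of proteins already known to be characterised. The sketch below is only that, a sketch. It assumes the default 12-column tabular layout, and the function names, the accession list, the E-value cutoff and the sample hits are all made up for illustration; where the list of characterised accessions would come from is exactly the part I still don't have.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast_tabular(handle):
    """Yield one dict per hit from tab-separated BLAST output."""
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST_COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        yield hit


def characterised_hits(hits, characterised_ids, max_evalue=1e-5):
    """Keep hits to characterised subjects, best bit score first.

    characterised_ids is a hypothetical, hand-curated set of accessions;
    max_evalue is an arbitrary illustrative cutoff.
    """
    kept = [
        h for h in hits
        if h["sseqid"] in characterised_ids and h["evalue"] <= max_evalue
    ]
    return sorted(kept, key=lambda h: h["bitscore"], reverse=True)


# Made-up hits for a single query, purely for illustration.
SAMPLE = (
    "query1\tsubject_B\t80.0\t310\t60\t2\t1\t310\t1\t310\t1e-120\t400\n"
    "query1\tsubject_A\t45.2\t300\t160\t4\t1\t298\t5\t304\t1e-50\t210\n"
    "query1\tsubject_C\t31.0\t250\t170\t6\t10\t255\t3\t250\t2e-20\t95\n"
    "query1\tsubject_D\t25.0\t80\t58\t2\t40\t118\t7\t86\t0.5\t30\n"
)
CHARACTERISED = {"subject_A", "subject_C", "subject_D"}  # hypothetical list

best = characterised_hits(
    parse_blast_tabular(io.StringIO(SAMPLE)), CHARACTERISED
)
for hit in best:
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```

In the sample, the top-scoring hit is dropped because it is not on the characterised list, and the weakest hit is dropped by the E-value cutoff, so what remains is the closest characterised match rather than the closest match overall.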