Knowledge Extraction from Biological Databases
This is a line of research that we are not currently pursuing, but
that still interests us. Some time ago, with Temple F. Smith, we addressed
the problem of finding the query that selects the database subset
closest to a given arbitrary subset, a problem we now call
reverse querying. We addressed the problem informally in Guigó
et al. (1991), and more rigorously in Guigó and Smith
(1993). In the latter paper, we mapped the semantic problem of finding
appropriate descriptions in a first-order language (a database query
language) onto the algebraic problem of finding similar sets in a set
algebra. Using the properties of a set similarity measure, we were
able to design an efficient algorithm, which was later implemented in
a program (Guigó et al., 1993). We were particularly interested
in the case in which the given database subset is the set of protein
sequences in a database that match a given (possibly randomly generated)
pattern, and the query is built on the functional annotation of the
database. During development, the method was tested by automatically
searching a protein sequence database for functional amino acid
patterns, and a few interesting cases were discovered (Vega et
al., 1990; Guigó and Smith, 1991).
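
The reverse querying idea can be illustrated with a small sketch. It is not the algorithm from the papers cited above: it simply enumerates conjunctive queries by brute force, and it assumes the Jaccard coefficient as the set similarity measure, which the text does not specify. The names `jaccard`, `reverse_query`, and the toy annotation table are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| (assumed measure; 1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def reverse_query(target, annotations, max_terms=2):
    """Return (query, score) for the conjunctive query whose result is closest to `target`.

    annotations: dict mapping an annotation term to the set of entry IDs carrying it.
    A query is a tuple of terms, read as their AND (set intersection).
    Brute-force enumeration, for illustration only.
    """
    best_query, best_score = (), 0.0
    for k in range(1, max_terms + 1):
        for terms in combinations(sorted(annotations), k):
            selected = set.intersection(*(annotations[t] for t in terms))
            score = jaccard(selected, target)
            if score > best_score:
                best_query, best_score = terms, score
    return best_query, best_score


# Hypothetical toy database: annotation term -> entries annotated with it.
annotations = {
    "kinase": {"P1", "P2", "P3", "P4"},
    "membrane": {"P3", "P4", "P5"},
    "zinc-binding": {"P6"},
}
# Hypothetical target subset, e.g. the entries matching some sequence pattern.
target = {"P3", "P4"}

query, score = reverse_query(target, annotations)
print(query, score)
```

In this toy case the query `kinase AND membrane` selects exactly the target subset, so it is returned with similarity 1.0. Exhaustive enumeration grows combinatorially with the number of annotation terms, which is why an efficient algorithm exploiting the properties of the similarity measure matters in practice.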
- R. Guigó, I. Vazquez, S. Rao, and T. F. Smith.
"A protein sequence database cross-field association system."
In Proceedings of the 26th Hawaii International Conference on System Sciences: Biotechnology Computing,
Vol. I, pp. 822-833. IEEE Computer Society Press, Los Alamitos, CA, 1993.
- R. Guigó and T.F. Smith.
"Inferring correlation between database queries: Analysis of protein sequence patterns."
IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:1030-1041 (1993).
- R. Guigó and T.F. Smith.
"A common pattern between tgf-beta family and glutaredoxin."
Biochemical Journal, 280:833-834 (1991).
- M.A. Vega, R. Guigó and T.F. Smith.
"Autoimmune response in AIDS."
Nature, 345:26 (1990).