Biomedical Text Mining Tools

Tuesday, August 2, 2011

Listservs:

Bio-NLP
TREC Genomics Track
Association for Uncertainty in AI
BioCreAtIvE
DBWorld
SIGHIT sighit-members@acm.org

General Data Mining and Knowledge Discovery Sites:

KDnuggets
Machine Learning Projects
ACM SigKDD
Statistical Data Mining Tutorials
UCI Machine Learning Repository
a collection of databases used by the machine learning community for the empirical analysis of machine learning algorithms.
Google Ranking Factors
describes over 100 factors that are thought to be used by Google to rank web pages.

Resources and Tools for Text Representation and Visualization:

Online Registry of Biomedical Informatics Tools (ORBIT) Project is a community-wide effort to create and maintain a structured, searchable metadata registry for informatics software, knowledge bases, data sets and design resources. http://orbit.nlm.nih.gov/
NaCTeM - National Centre for Text Mining (http://www.nactem.ac.uk/services.php)
develops and offers several tools for text processing, including TerMine, AcroMine, Cheshire/TerMine, and U-Compare (http://u-compare.org/).
Open Text Mining Interface
Visualcomplexity
maintains a diverse collection of projects that visualize complex networks.
Gender Genie
uses a simplified version of an algorithm developed by Moshe Koppel, Bar-Ilan University in Israel, and Shlomo Argamon, Illinois Institute of Technology, to predict the gender of an author.
Termino
marks up text by identifying a variety of different types of biomedical terms that can be linked to databases like the UMLS and Gene Ontology via unique ids.
Corpora for biomedical natural language processing
ACL Wiki
This is the main page of the ACL Wiki for Computational Linguistics, a wiki that is running under the auspices of The Association for Computational Linguistics.
Martin Krallinger - Biology and Text Mining related links
An NLP-oriented compendium of tools, resources, groups, evaluations, etc.
Martin Krallinger - BioNLP resources
FetchProt Corpus
tagged full-text articles describing experiments on proteins to validate tyrosine kinase.
TextArc
just for fun!
Benchmarks and Corpora for BioNLP
annotated evaluation data sets for various IR/IE/TM tasks
NLP registry
natural language processing software.
BioNLP.org
natural language processing of biology text.
LingPipe
a suite of Java tools designed to perform linguistic analysis on natural language data.
Information Visualization Software Repository
for visualizing knowledge domains.
WordNet
a lexical database for the English language - nouns, verbs, adjectives, and adverbs organized into synonym sets.
BLIMP
a forum for collection, compilation and exchange of publications on biomedical text mining.
GATE open source software for text processing - includes wrappers for ABNER, MetaMap, AbGene, GENIA sentence splitters and taggers, Penn BioTagger, MutationFinder, NormaGene and LingPipe, among others.
BioEnEx - a bio-entity mention recognition system that can annotate multiple biomedical semantic types with high performance. http://fmchowdhury.googlepages.com/bioenex.

Knowledge Environments (Information Portals, Online Communities):

Flybase
E. Coli Community
WormBase
Textpresso for C. elegans
Mouse Genome Informatics
Got Mice?
Saccharomyces Genome Database
BioPAX: Biological Pathways Exchange
a collaborative effort to create a data exchange format for biological pathway data.
SMRIDB
The Stanley Medical Research Institute online genomics database (SMRIDB) is a comprehensive web-based system for understanding the genetic effects of human brain disease (i.e. bipolar, schizophrenia, and depression). This database contains fully annotated clinical metadata and gene expression patterns generated within 12 controlled studies across 6 different microarray platforms.
SHARing Point Server
Neuroscience Database Gateway
Alzheimer Research Forum
interactive site maintained by journalists with the latest news, interviews, live chats, etc. Has lists of reagents, ongoing drug development, clinical trials, and much more. Provides information for researchers, physicians and patients.
Schizophrenia Research Forum
similar to the Alzheimer Forum, but dealing with schizophrenia and other major mental illnesses.
Internet Mental Health
information for researchers, physicians and patients.
WebMD and Medscape
large, comprehensive medical sites for researchers, physicians and patients.
UIC's Corner for Collaborative Informatics

Sites that are devoted to genes, proteins, and other bioinformatic resources:

SENT
Web-based tool for semantic features in text
PIE
extracts protein interaction information from input text.
NextBio
finds genes, articles and public gene expression data that are related to query term.
Whatizit
MarkerInfoFinder
Babelomics
ORegAnno
An open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation.
Biorag
Bio Resource for Array Genes is a free online resource for easy access to collective and integrated information from various public biological resources for human, mouse, rat, fly and C. elegans genes that are represented in Unigene clusters.
Ontogene
focuses on the extraction of semantic relations (e.g. bind, activate, block) between specific biological entities (such as genes and proteins) from the scientific literature.
GeneLibrarian
a platform to provide users well-organized information about any specific group of genes (e.g. one cluster of genes from a microarray chip) they might be interested in.
Bio Resource for Array Genes
a free online resource for easy access to collective and integrated information from various public biological resources for human, mouse, rat, fly and C. elegans genes.
CBioC: Collaborative Bio Curation
allows anyone to search for and vote on statements related to a particular protein, gene, disease, or interaction extracted from PubMed articles.
Medminer
site for extracting information about gene-gene and gene-drug interactions from the abstracts of papers in Medline.
MedGene
links genes and diseases in Medline by a variety of statistical measures.
Pubmatrix
allows the searcher to rapidly and systematically compare any list of terms against any other list of terms in PubMed. It reports the frequency of co-occurrence for all pairwise comparisons between the two lists as a matrix table.
Pubgene
the PubGene Webtools allow users to analyze gene expression data with literature network information, browse literature neighbors of a given gene, search literature articles for a set of genes, search ontology terms related to a given gene, search MeSH terms found with a set of genes, and search for official nomenclature.
HAPI
the High-density Array Pattern Interpreter (HAPI) will accept gene expression array data in tab-delimited format for clusters of up to 250 genes and create keyword hierarchies from the published literature linked to those genes. Keyword hierarchies are intended to help interpret the biological similarities of the genes in the cluster.
MedBlast
searching articles related to a biological sequence.
Chilibot
Mining MEDLINE for gene/protein/keyword relationships.
Genes2Diseases (G2D)
a database of candidate genes for mapped inherited human diseases.
GoMiner
organizes a list of genes for biological interpretations in the context of the Gene Ontology.
Semantic Gene Organizer (SGO)
helps identify gene-gene and keyword-gene associations using gene concepts (via latent semantic indexing [LSI]) in Medline.
GNF SymAtlas
database of genes and expression data
Organism and Bioinformatics Resources
large collection of organism biology and bioinformatics resources
Harvester
caches and cross-links public bioinformatic databases and prediction servers to provide fast access to protein specific bioinformatic information.
DRAGON
a collection of programs for analysis and extraction of biological data.
IHOP
search for abstract sentences that mention a specific gene, and create a gene network of co-mentioned genes.
PILGRM is for the biologist with a set of proteins relevant to a disease, biological function or tissue of interest who wants to find additional players in that process. It uses a data-driven method that adds value to literature search results by mining compendia of publicly available gene expression datasets using lists of relevant and irrelevant genes (standards). http://pilgrm.princeton.edu/
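
The Pubmatrix entry above describes a pairwise co-occurrence table built from two term lists. The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Pubmatrix's own implementation: the real service queries PubMed, whereas the sketch counts matches in a small in-memory list of made-up abstracts, and the function name cooccurrence_matrix and all sample data are hypothetical.

```python
from itertools import product


def cooccurrence_matrix(terms_a, terms_b, documents):
    """Count, for every (a, b) pair, the documents that mention both terms."""
    lowered = [doc.lower() for doc in documents]
    matrix = {}
    for a, b in product(terms_a, terms_b):
        # A document counts once per pair if both terms occur anywhere in it
        # (case-insensitive substring match; no stemming or synonym handling).
        count = sum(1 for doc in lowered if a.lower() in doc and b.lower() in doc)
        matrix.setdefault(a, {})[b] = count
    return matrix


# Hypothetical mini-corpus standing in for PubMed abstracts.
abstracts = [
    "BRCA1 mutations are associated with breast cancer.",
    "TP53 is frequently mutated in breast cancer and colon cancer.",
    "BRCA1 interacts with TP53 in DNA repair.",
]
genes = ["BRCA1", "TP53"]
diseases = ["breast cancer", "colon cancer"]

matrix = cooccurrence_matrix(genes, diseases, abstracts)
for gene in genes:
    print(gene, [matrix[gene][disease] for disease in diseases])
```

Substring matching is the simplest possible choice here; a real system would more likely rely on indexed search and controlled vocabularies such as MeSH.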

Sites that are, or Contain Lists of, Search Engines and Repositories that include Biomedical Topics:

VADLO
a biomedical search engine wherein queries can be restricted to protocols, online tools, seminars, databases, and software.
DOAJ
Directory of Open Access Journals - covers free, full text, quality controlled scientific and scholarly journals.
Nature Precedings
archive of pre-publication research, unpublished manuscripts, presentations, posters, technical papers, etc. in biology, medicine (except clinical trials), chemistry and earth sciences.
Quadsearch
metasearch engine for scientific articles. Also computes H-index for authors although it does not disambiguate author names.
OReFiL
allows for searching for online resources (URLs) described in scientific articles.
PubMedCentral
the NIH free digital archive of full-text biomedical and life sciences journal literature (much of which is open access).
CiteSeer
indexes PostScript and PDF research articles on the Web, focusing primarily on the literature in computer and information science.
Rexa
a digital library and search engine covering the computer science research literature and the people who create it.
arXiv
Physics E-Print Archive: mostly physics, math and computer science, but has some biomedically relevant articles (e.g. on biophysics and information retrieval), including unpublished articles and those in press.
Cogprints E-Print Archive
author-submitted archive for papers in any field related to cognition.
Google Scholar
searches literature on the web and from various publishers.
Scirus
searches both the literature and the web on scientific topics.
KartOO
web metasearch engine that displays the output in visual form and permits one to zoom in and out while refining one's query.
Grokker
web search engine that displays the output in visual form and permits one to zoom in and out.
Teoma
In contrast to Google, which ranks web pages according to how many other sites link to them over the entire web, Teoma first attempts to define small communities of web pages that are devoted to the topic covered by the query. Teoma then ranks web pages according to how many other sites within that same topical community link to them. Teoma also provides suggestions to limit and refine searches.
CompletePlanet
compendium of search engines, including U. S. Patents, Cancer Net, Census Bureau, Library of Congress, and many more on all topics.
MediLexicon
compendium of search engines for abbreviations, medical terms and a variety of other medically related topics (e.g., upcoming conferences).
HONselect
site that certifies medically related web sites for objectivity.
SUM Search
site for clinical queries, similar but not identical to PubMed clinical queries.
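
The Quadsearch entry above mentions computing an H-index for authors. As a minimal sketch of the standard definition (the largest h such that h papers have at least h citations each), and not a description of how Quadsearch itself computes it, the calculation looks like this. The function name h_index and the citation counts are hypothetical.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's papers.
print(h_index([10, 8, 5, 4, 3]))
```

Because the input is just a list of per-paper citation counts, papers from different authors who share a name would be pooled together, which is why the lack of author name disambiguation noted above matters.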

Sites that Augment the Standard PubMed Search Service:

MESHy unanticipated knowledge discovery through statistical ranking of MeSH term pairs http://tools.bat.ina.certh.gr/meshy/

PIE
searching PubMed literature for protein interaction information http://www.ncbi.nlm.nih.gov/CBBresearch/Wilbur/IRET/PIE/
PolySearch
The typical query supported by PolySearch is "Given X, find all Y's" where X or Y can be diseases, tissues, cell compartments, gene/protein names, SNPs, mutations, drugs and metabolites.
Jane (Journal/Author Name Estimator)
helps identify the journals and authors related to any input text.
EAGL
Reference.MD
searching/browsing for biomedical concepts and their links to information taken from MeSH, UMLS, Drugs@FDA, FDA AERS.
BioText
for searching full-text of Open Access articles. Allows for searching and browsing figures and their captions.
PubReMiner
performs detailed analysis of PubMed search results, similar in some ways to our Anne O'Tate tool. For any given PubMed query, it computes frequency tables for publication years, authors, journals, words, or MeSH that can be used to refine the query.
EBIMed
analyses PubMed abstracts to offer a complete overview on associations between UniProt protein/gene names, GO annotations, Drugs and Species. The results are shown in a table that displays all the associations and links to the sentences that support them and to the original abstracts.
MEDIE
an intelligent search engine to retrieve biomedical correlations from MEDLINE. You can find abstracts/sentences in MEDLINE by specifying semantics of correlations; for example, "What activates p53" and "What causes colon cancer".
ADAM
Another Database of Abbreviations in Medline.
Acromine
acronym dictionary automatically constructed from the whole of MEDLINE.
Stanford Abbreviations server
an online dictionary of abbreviations in PubMed articles.
PubMed Gold
finds PDFs for PubMed citations by automatically searching Google.
PubMed Assistant
a biologist-friendly interface for enhanced PubMed search
ReleMed
ranks MEDLINE articles by relevance
PubFinder
a tool for improving retrieval rate of relevant PubMed abstracts
PubNet
publication network graph utility
PubFocus
semantic PubMed/MEDLINE citation analytics
Alibaba
PubMed as a graph
MedKit
PubMed imposes an upper limit of 10,000 on downloads of PMID lists or citations, and MEDLINE files are too large for most off-the-shelf XML parsers. MedKit is a Java package that works around these limitations and provides other useful functionality, e.g. random sampling. Its four modules (querier, sampler, fetcher and parser) can work independently or be pipelined in various combinations.
SLIM
Slider Interface for MEDLINE/PubMed searches (BETA)
MEVA
upload the result of a PubMed query to MEVA and get a summary of selected MEDLINE fields like MeSH and author names.
BIOWIZARD
allows everyone in the scientific community to rank and discuss PubMed articles.
GoPubMed
your query is submitted to PubMed and the resulting abstracts are classified using Gene Ontology terms.
HubMed
an alternative interface to MEDLINE.
TWEASE
finds individual sentences for matches to your query, not the MEDLINE abstract as a whole.
eTBlast
input an entire paragraph and it returns MEDLINE abstracts that are similar to it.
ExpertMapper
a tool to help identify a medical expert on a given topic.
FABLE
finds MEDLINE articles that mention human genes and proteins more thoroughly than other systems.
Pubcrawler
alerting service that searches custom queries in PubMed or Genbank automatically and notifies the user by email when new relevant papers or sequences appear in the literature.
Vivisimo
demo site that searches PubMed (or the Web) and arranges the output into clusters of articles that are most thematically similar to each other.
BioEx
consists of three components at this point - a) definition question answering system (QA), b) image-based question answering (ImageQA), c) information retrieval system (IR).
BioIE
a rule-based system that extracts informative sentences from the biomedical literature.
askMEDLINE
a free-text, natural language query tool for MEDLINE/PubMed.
PICO
(Patient, Intervention, Comparison, Outcome) search interface, a method of searching MEDLINE/PubMed that encourages the creation of a well-formulated search.
Visual MeSH
browser to assist searchers in choosing appropriate MeSH terms.
Xplormed
to assist browsing articles by topics using chains of word associations.
AkwanMed
enhanced PubMed search interface which ranks articles by relevance
ConceptLink
makes concept maps of related MeSH terms.
Ask HERMES is a computational system that automatically analyzes large sets of documents pertaining to specific questions and generates short text from them as output. http://www.askhermes.org/
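
The PubReMiner entry above describes frequency tables computed over the results of a query. A rough sketch of that kind of tabulation follows. It assumes the search results have already been retrieved and parsed into dictionaries; the record layout, the field names and the function frequency_tables are all hypothetical and do not reflect PubReMiner's actual data format.

```python
from collections import Counter

# Hypothetical, already-parsed citation records (not real PubMed output).
records = [
    {"year": 2009, "journal": "Bioinformatics", "mesh": ["Data Mining", "Proteins"]},
    {"year": 2010, "journal": "Bioinformatics", "mesh": ["Data Mining", "MEDLINE"]},
    {"year": 2010, "journal": "BMC Bioinformatics", "mesh": ["MEDLINE"]},
]


def frequency_tables(records):
    """Build one frequency table per field from a list of citation records."""
    return {
        "year": Counter(r["year"] for r in records),
        "journal": Counter(r["journal"] for r in records),
        # Each record can carry several MeSH terms, so flatten before counting.
        "mesh": Counter(term for r in records for term in r["mesh"]),
    }


tables = frequency_tables(records)
for field, table in tables.items():
    print(field, table.most_common())
```

The most frequent values in each table are natural candidates for narrowing the original query, for example by restricting it to one journal or one MeSH term.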
