Mohanapriya, D and Beena, R (2020) Enhancing Prediction of Drug Indication and Side Effects through Named Entity Recognition and Jointly Learning of Syntactic Structures of Sentences. Enhancing Prediction of Drug Indication and Side Effects through Named Entity Recognition and Jointly Learning of Syntactic Structures of Sentences, 7 (6). pp. 170-186. ISSN 2515-8260
Ist-paper.pdf - Published Version
Abstract
Drug discovery requires considerable time and cost to find a suitable drug for
treating patients effectively. Both the unintended and the beneficial effects of drugs
must be recognized, because unforeseen actions of candidate drugs may cause severe
injury to patients. Text mining is one effective technique: it can find hidden
relations between genes, diseases and drugs in large volumes of data. Predict drug
Indications and Side effects using TOpic modeling and
Natural language processing (PISTON) is a text mining method used to find
associations between drugs and diseases and between drugs and side effects. Natural
Language Processing (NLP) is used to identify words that indicate an association
between drugs and genes in sentences, collected from the literature, in which words
representing drugs and genes co-occur. The relation between drugs and genes is
represented by a drug-topic probability matrix built through topic modeling. From this
matrix, the drugs for phenotypes can be identified by training a classifier on the
high-rank topics of drugs. It also
predicts the association between drugs and side effects. However, the expressive power
of named entities and their potential to improve the quality of discovered topics have
not received much attention in PISTON. This paper therefore proposes an Improved
PISTON (IPISTON), which enhances the quality of the discovered topics through a named
entity recognition system and by inducing syntactic structure from unannotated
sentences. First, sentences are extracted from the collected literature data and a
dependency graph is constructed using NLP. Next, a Gene Regulation Score (GRS) is
calculated for each sentence to define the relationship between genes and diseases.
Topic modeling is enhanced by finding the biomedical entities in the biomedical
repository using a Conditional Random Field (CRF) and a Bi-directional Long-Short Term
Memory-CRF (BLSTM-CRF).
CRF is a sequence modeling framework that finds biomedical entities through their
conditional probability distributions over the collected documents.
BLSTM-CRF is a deep learning technique used to enhance the performance of CRF-based
named entity recognition. Moreover, the syntactic structure of sentences is calculated
through a syntactic distance measure. The syntactic structure, the biomedical entities
and the drug-topic probability matrix are given as input to CRF, BLSTM-CRF, Naïve
Bayes, CART and Logistic classifiers to predict drug-phenotype and drug-side effect
associations.
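
The abstract refers to a drug-topic probability matrix from which the high-rank topics of each drug are selected. As a rough illustration only, and not the authors' implementation, the sketch below assumes that some topic model has already produced a topic distribution for each sentence, averages those distributions per drug to form one matrix row per drug, and then picks the top-ranked topics. The function names, the drug names, the averaging step, and all numbers are hypothetical.

```python
from collections import defaultdict


def drug_topic_matrix(sentence_topics):
    """Average per-sentence topic distributions into one row per drug.

    sentence_topics: list of (drug, topic_probs) pairs, where topic_probs is a
    list of probabilities over the same K topics. The distributions are assumed
    to come from a topic model; averaging them is an assumption of this sketch.
    """
    sums = {}
    counts = defaultdict(int)
    for drug, probs in sentence_topics:
        if drug not in sums:
            sums[drug] = [0.0] * len(probs)
        if len(probs) != len(sums[drug]):
            raise ValueError("all topic distributions must have the same length")
        for k, p in enumerate(probs):
            sums[drug][k] += p
        counts[drug] += 1
    # Dividing by the sentence count keeps each row a probability distribution.
    return {drug: [s / counts[drug] for s in row] for drug, row in sums.items()}


def top_topics(matrix, n=2):
    """Return the indices of the n highest-probability topics for each drug."""
    return {
        drug: sorted(range(len(row)), key=lambda k: row[k], reverse=True)[:n]
        for drug, row in matrix.items()
    }


# Toy input: hypothetical drugs and made-up distributions over 3 topics.
toy = [
    ("drug_a", [0.7, 0.2, 0.1]),
    ("drug_a", [0.5, 0.4, 0.1]),
    ("drug_b", [0.1, 0.1, 0.8]),
]
matrix = drug_topic_matrix(toy)
ranked = top_topics(matrix, n=2)
```

The probabilities of the selected topics could then serve as features for a classifier such as those listed in the abstract; that step is omitted here.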
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Drug discovery, drug-phenotype association, drug-side effect association, named entity recognition, Conditional Random Field, Bi-directional Long-Short Term Memory, syntactic structure. |
Divisions: | PSG College of Arts and Science > Department of Computer Science |
Depositing User: | Mr Team Mosys |
Date Deposited: | 29 Feb 2024 11:11 |
Last Modified: | 29 Feb 2024 11:11 |
URI: | http://ir.psgcas.ac.in/id/eprint/2115 |