Add code for sentiment analysis #417
base: master
Required for fixing gramener#377. Since v0.24, sklearn's column transformers require feature names to appear in the same order in .fit and .predict. We can still send URL parameters in any order, but the MLHandler must put them in the correct order. See sklearn's release notes for more: https://scikit-learn.org/stable/whats_new/v0.24.html#sklearn-compose
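A minimal sketch of the reordering step described above, assuming the feature values arrive as a URL query string and the fit-time column order is known. The function name reorder_features and the column names are hypothetical illustrations, not part of Gramex or sklearn.

```python
# Hypothetical helper: put URL parameters into the column order used at .fit() time.
from urllib.parse import parse_qs


def reorder_features(query: str, fit_columns: list[str]) -> list[list[str]]:
    """Return one row per value, with columns in `fit_columns` order.

    URL parameters may arrive in any order, but a model wrapping a sklearn
    ColumnTransformer expects the order it saw during .fit().
    Parameters not listed in `fit_columns` are ignored.
    """
    params = parse_qs(query)
    missing = [col for col in fit_columns if col not in params]
    if missing:
        raise ValueError(f"missing features: {missing}")
    # Every feature must supply the same number of values to form complete rows
    if len({len(params[col]) for col in fit_columns}) != 1:
        raise ValueError("each feature must have the same number of values")
    return [list(row) for row in zip(*(params[col] for col in fit_columns))]


print(reorder_features("petal_width=0.2&sepal_length=5.1", ["sepal_length", "petal_width"]))
# [['5.1', '0.2']]
```

With pandas, selecting columns by a list (df[fit_columns]) performs the same reordering on a DataFrame.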
…amex into jd-transformers
Cool! @jaidevd, could you please review? Do let me know when to merge.
@MSanKeys963 The target branch has to be
@MSanKeys963 Other than these two changes, LGTM.
@MSanKeys963 This is still showing merge conflicts. Please take a look.
@jaidevd I've fixed all the issues mentioned above. Please let me know if there's anything else.
Thanks, @MSanKeys963. @sanand0, this is ready for merge.
@sanand0 I've fixed all the issues. Please check.
For example, this is how we optionally import Elasticsearch:
from gramex.config import app_log

def gramexlog(conf):
    try:
        from elasticsearch import Elasticsearch, helpers
    except ImportError:
        app_log.error('gramexlog: elasticsearch missing. pip install elasticsearch')
        return
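The same optional-import pattern could be generalised into a small helper, for instance to guard a heavy dependency such as a transformers library (which package is optional for sentiment analysis here is an assumption). This sketch uses the standard logging module as a stand-in for Gramex's app_log; import_optional is a hypothetical name.

```python
# Hypothetical helper generalising the try/except ImportError pattern above.
import importlib
import logging
from types import ModuleType
from typing import Optional

# Stand-in logger; inside Gramex, gramex.config.app_log would be used instead
app_log = logging.getLogger("optional-imports")


def import_optional(name: str, pip_name: Optional[str] = None) -> Optional[ModuleType]:
    """Import `name` if it is installed; otherwise log an install hint and return None."""
    try:
        return importlib.import_module(name)
    except ImportError:  # also catches ModuleNotFoundError
        app_log.error("%s missing. pip install %s", name, pip_name or name)
        return None
```

A caller would then write transformers = import_optional("transformers") and disable the feature when the result is None, so the rest of the server still starts without the package.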
Removed class NLPHandler() and added sentiment analysis functionality in class MLHandler().
To set up a Gramex service for performing sentiment analysis, use the following configuration:
Getting predictions
GET the sentiment of short pieces of text as follows:
curl -X GET --data-urlencode "text=This movie is so bad, it's good." http://localhost:9988/
The output will be:
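The same prediction request could be made from Python with only the standard library. This sketch puts text into the URL query string (what curl -G would send) and assumes the service runs at http://localhost:9988/ and accepts text as a query parameter; the helper names are hypothetical, and the response format is not assumed.

```python
# Hypothetical client for the GET prediction endpoint, stdlib only.
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "http://localhost:9988/"  # assumed address of the sentiment service


def predict_url(text: str, endpoint: str = ENDPOINT) -> str:
    """Build the prediction URL with `text` URL-encoded into the query string."""
    return endpoint + "?" + urlencode({"text": text})


def predict(text: str) -> str:
    """GET the prediction and return the raw response body."""
    with urlopen(predict_url(text)) as response:
        return response.read().decode("utf-8")


print(predict_url("This movie is so bad, it's good."))
# http://localhost:9988/?text=This+movie+is+so+bad%2C+it%27s+good.
```

Only predict_url runs without a server; predict needs the Gramex service to be running.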
Files containing text to be classified can also be POSTed to the endpoint with _action=predict. Any file supported by gramex.cache.open will work. (Download a sample here.) The output will be:
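A file for _action=predict could be prepared as CSV, one of the formats gramex.cache.open reads. This sketch assumes a single column named text, mirroring the field named in the scoring section; whether predict requires exactly that column name is an assumption, and texts_to_csv is a hypothetical helper.

```python
# Hypothetical helper: write texts to CSV content suitable for uploading.
import csv
import io


def texts_to_csv(texts: list[str]) -> str:
    """Return CSV text with a `text` header and one text per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text"])
    for text in texts:
        # csv quotes fields containing commas, so free text stays in one column
        writer.writerow([text])
    return buffer.getvalue()


print(texts_to_csv(["This movie is so bad, it's good.", "Loved every minute"]))
```

The csv module handles quoting, which matters because review text often contains commas.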
Measuring model performance
Files containing the text and label fields can be POSTed to the endpoint with _action=score to get the ROC AUC score of the model against the dataset. (Download a sample dataset here.) The output will be something like:
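To make the returned score easier to interpret, here is a from-scratch sketch of ROC AUC: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half. It illustrates the metric only and is not Gramex's implementation; the 1/0 label encoding is a hypothetical choice.

```python
# Illustration of ROC AUC via pairwise comparisons (Mann-Whitney formulation).
from itertools import product


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(positives, negatives)
    )
    return wins / (len(positives) * len(negatives))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A score of 0.5 means the model ranks no better than chance and 1.0 means every positive is ranked above every negative. For the example above, 3 of the 4 positive/negative pairs are ordered correctly, which matches sklearn.metrics.roc_auc_score on the same input.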
Training the model
The model can be trained on a dataset by setting _action=train and POSTing the file. The output will show the score of the trained model on the dataset:
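A training upload could be assembled with the standard library as a multipart/form-data POST. This sketch only builds the request; the form field name file, the text/csv part type and the endpoint URL are assumptions, so check the Gramex MLHandler documentation for the exact upload format. build_upload is a hypothetical helper.

```python
# Hypothetical helper: build (but do not send) a multipart POST uploading a training file.
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_upload(endpoint: str, filename: str, content: bytes, **params: str) -> Request:
    """Return a POST Request whose body holds `content` as a form file field named "file"."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: text/csv\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = endpoint + "?" + urlencode(params)  # e.g. _action=train goes in the query string
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


req = build_upload("http://localhost:9988/", "train.csv", b"text,label\nGreat film,1\n", _action="train")
print(req.full_url)  # http://localhost:9988/?_action=train
```

Passing the result to urllib.request.urlopen(req) would send it to a running server; a library such as requests can build the same multipart body with less code.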
Multiple training options for the transformer are supported, including the number of epochs, batch size and weight decay. These can all be specified in the POST request as follows:
The output is the score of the trained model on the dataset after 3 epochs:
The output is the score of the trained model on the dataset after 3 epochs and a batch size of 32:
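Training options could be passed as URL parameters alongside _action=train. This sketch borrows the option names num_train_epochs, per_device_train_batch_size and weight_decay from Hugging Face's TrainingArguments; whether MLHandler accepts exactly these names is an assumption, and train_url is a hypothetical helper.

```python
# Hypothetical helper: build a training URL with optional training parameters.
from urllib.parse import urlencode

ENDPOINT = "http://localhost:9988/"  # assumed address of the sentiment service


def train_url(endpoint: str = ENDPOINT, **options) -> str:
    """Return the training URL: _action=train first, then options in sorted order."""
    params = {"_action": "train", **dict(sorted(options.items()))}
    return endpoint + "?" + urlencode(params)  # urlencode calls str() on numbers


print(train_url(num_train_epochs=3, per_device_train_batch_size=32))
# http://localhost:9988/?_action=train&num_train_epochs=3&per_device_train_batch_size=32
```

Sorting the options is only for a stable, readable URL; as noted earlier in this PR, URL parameters may be sent in any order.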