This section gives an overview of what Natural Language Processing (NLP) is and why a developer might want to use it, using Stanford CoreNLP as a concrete example.
Stanford CoreNLP is a popular Natural Language Processing toolkit supporting many core NLP tasks.
To download and install it, either download a release package and include the necessary *.jar
files in your classpath, or add the dependency from Maven Central. See the download page for more details. For example:
curl http://nlp.stanford.edu/software/stanford-corenlp-full-2015-12-09.zip -o corenlp.zip
unzip corenlp.zip
cd corenlp
export CLASSPATH="$CLASSPATH:`pwd`/*"
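Alternatively, if you use Maven, you can declare CoreNLP and its models as dependencies instead of managing the jars by hand. The snippet below is a sketch assuming the 3.6.0 release (the version corresponding to the 2015-12-09 package above); check the download page or Maven Central for the version you actually want:

<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
</dependency>
<!-- the trained models ship as a separate artifact with the "models" classifier -->
<dependency>
    <groupId>edu.stanford.nlp</groupId>
    <artifactId>stanford-corenlp</artifactId>
    <version>3.6.0</version>
    <classifier>models</classifier>
</dependency>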
There are three supported ways to run the CoreNLP tools: (1) using the base fully customizable API, (2) using the Simple CoreNLP API, or (3) using the CoreNLP server. A simple usage example for each is given below. As a motivating use case, these examples will be for predicting the syntactic parse of a sentence.
CoreNLP API
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class CoreNLPDemo {
  public static void main(String[] args) {
    // 1. Set up a CoreNLP pipeline. This should be done once per type of annotation,
    //    as it's fairly slow to initialize.
    // Creates a StanfordCoreNLP object with tokenization, sentence splitting, and parsing.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // 2. Run the pipeline on some text.
    String text = "the quick brown fox jumped over the lazy dog";  // Add your text here!
    Annotation document = new Annotation(text);  // create an empty Annotation just with the given text
    pipeline.annotate(document);                 // run all Annotators on this text

    // 3. Read off the result.
    // Get the list of sentences in the document.
    List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
      // Get the parse tree for each sentence.
      Tree parseTree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      // Do something interesting with the parse tree!
      System.out.println(parseTree);
    }
  }
}
Simple CoreNLP
import edu.stanford.nlp.simple.Document;
import edu.stanford.nlp.simple.Sentence;
import edu.stanford.nlp.trees.Tree;

public class CoreNLPDemo {
  public static void main(String[] args) {
    String text = "The quick brown fox jumped over the lazy dog";  // your text here!
    Document document = new Document(text);  // implicitly runs the tokenizer
    for (Sentence sentence : document.sentences()) {
      Tree parseTree = sentence.parse();  // implicitly runs the parser
      // Do something with your parse tree!
      System.out.println(parseTree);
    }
  }
}
CoreNLP Server
Start the server with the following (setting your classpath appropriately):
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer [port] [timeout]
Get JSON-formatted output for a given set of annotators and print it to standard output:
wget --post-data 'The quick brown fox jumped over the lazy dog.' 'localhost:9000/?properties={"annotators":"tokenize,ssplit,parse","outputFormat":"json"}' -O -
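An equivalent request with curl (assuming the server is running on the default port 9000) is shown below; note the --globoff flag, which stops curl from treating the braces in the URL as a globbing pattern:

curl --globoff --data 'The quick brown fox jumped over the lazy dog.' 'localhost:9000/?properties={"annotators":"tokenize,ssplit,parse","outputFormat":"json"}'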
To get the parse tree from the JSON response, navigate to sentences[i].parse.
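For example, if the jq command-line JSON processor happens to be installed (it is not part of CoreNLP), the first sentence's parse tree can be pulled out of the server's response directly:

wget --post-data 'The quick brown fox jumped over the lazy dog.' 'localhost:9000/?properties={"annotators":"tokenize,ssplit,parse","outputFormat":"json"}' -O - | jq -r '.sentences[0].parse'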