Rosetta Stone
Usage:
java programs.Parse [Options]
This parser breaks sentences into phrases. It is a minimal
commitment barrier category parser.
The minimal commitment analysis assigns an underspecified syntactic analysis
to lexically analyzed input. The current emphasis is on noun phrases;
however, the entire input string is bracketed. The analysis can be
thought of as the result of skimming the input to extract only NP's
and PrepP's.
The options include the following:
The prerequisites to run this program include:
| CLASSPATH |
The classpath needs to include the installation path to
$MMTx/mmtx/classes, where
$MMTX is the top-level directory that contains the MMTx software.
The classpath needs to include the installation path to the
$MMTx/mmtx/config
directory.
The classpath needs to include the path to MySQL's JDBC driver
jar file (or the path to your own JDBC driver, if you use a database
other than MySQL), found in the
$MMTx/mmtx/mm.mysql.jdbc-1.2c directory.
(An example cshrc with the correct class paths set is provided
in the $MMTX/mmtx/config directory.)
|
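The three classpath entries listed above can be assembled as in the following minimal sketch. The class name, the method name, and the install location /usr/local/MMTx are made up for illustration. The sketch also assumes that the driver entry is the mm.mysql.jdbc-1.2c directory itself; if your driver ships as a jar file, the entry should point at that jar instead.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class MmtxClasspathSketch {

    /** Joins the three entries named in the prerequisites into one CLASSPATH value. */
    static String buildClasspath(String mmtxTop) {
        List<String> entries = Arrays.asList(
            mmtxTop + "/mmtx/classes",               // compiled MMTx classes
            mmtxTop + "/mmtx/config",                // configuration directory
            mmtxTop + "/mmtx/mm.mysql.jdbc-1.2c");   // ASSUMPTION: driver location used as-is
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // "/usr/local/MMTx" is a hypothetical install location.
        String top = args.length > 0 ? args[0] : "/usr/local/MMTx";
        System.out.println(buildClasspath(top));
    }
}
```

The printed value can be pasted into a setenv CLASSPATH or export CLASSPATH line in your shell startup file.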
Input and Output File Options
The default behavior is to read input from standard input and write to
standard output. This program has a bias toward document-based processing:
it requires the entire document to be seen before any of it is processed.
In practice, if input is coming from standard input, processing will not
begin until an end of file has been sent. Put another way, entering a
line feed on standard input does not trigger any processing. Processing
begins only when the end-of-file marker has been seen: Control-D in the
Bourne shell, or Control-Z followed by a carriage return in DOS.
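The read-everything-first behavior described above can be pictured with the following sketch. It is not the MMTx source code; the class and method names are hypothetical, and the final print statement only stands in for the real processing step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

public class ReadWholeDocumentSketch {

    /** Collects every line until end of file; nothing is handed on before then. */
    static String readAll(Reader in) {
        StringBuilder document = new StringBuilder();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            // readLine() returns null only at end of file (Control-D or Control-Z).
            while ((line = reader.readLine()) != null) {
                document.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return document.toString();
    }

    public static void main(String[] args) {
        String document = readAll(new InputStreamReader(System.in));
        // Placeholder for processing: this line is reached only after end of file.
        System.out.println("Read " + document.length() + " characters");
    }
}
```

Typing lines at the terminal produces no output from this sketch until the end-of-file key is pressed, which mirrors the behavior of the parser on standard input.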
| Short Name | Long Name | Default Value | Purpose |
| __ | --fileName= | stdIn | Name of the file to process |
| __ | --outputFileName= | stdOut | Name of the output file to write to |
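As an illustration of the two file options, the sketch below builds a command line that reads from one file and writes to another. The file names abstract.txt and abstract.phrases and the class ParseCommandSketch are made up for the example; only the program name and the two flags come from this page.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseCommandSketch {

    /** Builds the argument list for running programs.Parse on one file. */
    static List<String> command(String inputFile, String outputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("programs.Parse");
        cmd.add("--fileName=" + inputFile);
        cmd.add("--outputFileName=" + outputFile);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical file names, used only for illustration.
        List<String> cmd = command("abstract.txt", "abstract.phrases");
        // prints: java programs.Parse --fileName=abstract.txt --outputFileName=abstract.phrases
        System.out.println(String.join(" ", cmd));
        // To launch it for real (MMTx must be installed and CLASSPATH set):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```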
Input Format Descriptions
The default behavior is to auto-detect MEDLINE citation format or free
text. The following flags override this feature.
| Short Name | Long Name | Default Value | Purpose |
| __ | --medlineCitations | false | The input is a collection of MEDLINE citations |
| __ | --mrcon | false | The input is a collection of MRCON rows |
| __ | --freeText | true | The input is free text |
| __ | --fieldedText | false | The input (file or standard input) is fielded text |
| __ | --textField= | 2 | For fielded text, which field contains the text |
| __ | --fieldSeparator= | "|" | For fielded text, which character is the separator |
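To make the fielded-text options concrete, here is a small sketch of how a text field might be pulled out of one pipe-separated line. It is an illustration, not the MMTx implementation. It assumes that fields are counted from 1, so that the default --textField=2 means the second field; the record shown and the class name are made up.

```java
import java.util.regex.Pattern;

public class FieldedTextSketch {

    /**
     * Returns the text field of one fielded line.
     * ASSUMPTION: fields are counted from 1, so textField=2 is the second field.
     */
    static String textField(String line, String separator, int textField) {
        // Pattern.quote stops "|" from being read as regex alternation;
        // the -1 limit keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (textField < 1 || textField > fields.length) {
            throw new IllegalArgumentException("line has no field " + textField);
        }
        return fields[textField - 1];
    }

    public static void main(String[] args) {
        String line = "rec001|heart attack in older adults|2001"; // made-up record
        // prints: heart attack in older adults
        System.out.println(textField(line, "|", 2));
    }
}
```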
Output Display Options
| Short Name | Long Name | Default Value | Purpose |
| -T | --tagger_output | false | Display tagger output |
| -p | --plain_syntax | true | Display the phrases |
| -x | --syntax | false | Display the MincoMan style output from the phrase extractor |
| __ | --numberOfPhrases | false | Report the number of phrases the input has |
Options to retrieve ever more levels of detail
| Short Name | Long Name | Default Value | Purpose |
| __ | --collections | false | Display Collection information |
| __ | --documents | false | Display Documents |
| __ | --sections | false | Display Sections |
| __ | --sentences | false | Display Sentences |
| __ | --phrases | false | Display Phrases |
| __ | --nps | false | Display Noun Phrases |
| __ | --lexicalElements | false | Display Lexical Elements |
| __ | --lexicalEntries | false | Display Lexical Entries |
| __ | --tokens | false | Display tokens |
| __ | --pipedOutput | false | Display in a pipe delimited format |
| __ | --details | false | Display the gory details |
Processing Options
Options of the underlying data model
| Short Name | Long Name | Default Value | Purpose |
| -t | --tag_text | false | Tag the text (NOTE: When used on the command line, turns tagging off) |
| __ | --lexicalLookup= | 2 | Lookup Algorithm options 1-3 |
| __ | --ambiguousAcronyms | false | Disambiguate sentence boundaries using the acronyms and abbreviations file |
Configuration Options
| Short Name | Long Name | Default Value | Purpose |
| __ | --configName= | mmtx cfg | The name of the configuration file |
| -R | --MMTX_ROOT= | <installdir>/mmtx | MMTX Root path |
| __ | --MMTX_USERNAME= | mmtxUser | Database Account Name |
| __ | --MMTX_HOSTNAME= | <localhost> | MMTX host name |
| __ | --ambiguousAcronymsFile= | data/lexicon/ambiguousAcronymsFile.txt | Location of the acronyms and abbreviations file needed in the tokenizer |
| __ | --inflectionTable= | inflStatic2001Lexicon | The Lexicon's inflection table used |
| __ | --lexiconVersion= | Static2001Lexicon | Lexicon Version |
| __ | --nmm | false | Flag that switches between MetaMap output and non-MetaMap output style. This flag is useful when combined with the --pipedOutput flag and display flags such as --sentences, --phrases, --nps, --variants and other levels of detail. |
Tagger Specific Options
| Short Name | Long Name | Default Value | Purpose |
| __ | --useTagger | true | Use the tagger |
| __ | --dontUseTagger | false | Don't use the tagger [Same as --tag_text] |
| -t | --tag_text | false | [Don't] tag the text |
| __ | --taggerMachineName= | nls2 | Tagger Server |
| __ | --tagger= | XeroxParc | The name of the tagger that is hooked in |
| __ | --taggerPortNumber= | 1774 | Tagger Server Port number |