Usage:
java programs.LexicalLookup [Options]
LexicalLookup retrieves lexical elements from the given text.
These lexical elements may be terms found in a lexicon, items
identified by a pattern, such as numbers or dates, or items
identified by some other mechanism.
The options are described below, after the prerequisites.
The prerequisites to run this program include:
| CLASSPATH |
The classpath needs to include:
- the $MMTx/mmtx/classes directory, where $MMTx is the top level
  directory that contains the MMTx software;
- the $MMTx/mmtx/config directory;
- the path to the JDBC driver jar file. For MySQL this is the
  $MMTx/mmtx/mm.mysql.jdbc-1.2c directory; if you use a different
  database, use the path to your own JDBC driver instead.
(An example cshrc with the correct class paths set can be found
in the $MMTx/mmtx/config directory.)
|
| $MMTx/config/mmtxRepository.cfg |
The mmtxRepository.cfg has to exist. This file is created at
installation time, and should not be edited.
|
| $MMTx/config/mmtx.cfg |
$MMTx/config/mmtx.cfg should exist. Users can edit this
configuration file to hold settings specific to their local
needs. Any option that can be given on the command line,
including any of the options below, can be put into this file.
It is typically used to hold database passwords, commonly used
options and the like.
|
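As a minimal sketch of the CLASSPATH prerequisite above, the following Bourne shell lines combine the three entries into one classpath. The install location /usr/local/MMTx is an assumption for illustration, as is the choice of Bourne shell syntax rather than the cshrc example shipped with MMTx.

```shell
# Assumption: MMTx is installed under /usr/local/MMTx; substitute your own path.
MMTx=/usr/local/MMTx

# Classes, configuration directory, and the MySQL JDBC driver directory.
CLASSPATH="$MMTx/mmtx/classes:$MMTx/mmtx/config:$MMTx/mmtx/mm.mysql.jdbc-1.2c"
export CLASSPATH

echo "$CLASSPATH"
```

If CLASSPATH already holds other entries, append the existing value after another colon rather than replacing it.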
Options are applied in a fixed order. Options from mmtx.cfg override
the default options from mmtxRegistry.cfg. Options from a second
configuration file, specified on the command line with the
--configName= option, override the settings from both of those
files. Options given directly on the command line override all others.
Option Hierarchy:
- User specified command line options (highest precedence),
  which override
- Command line specified configuration file(s), which override
- $MMTx/config/mmtx.cfg, which overrides
- $MMTx/config/mmtxRegistry.cfg (lowest precedence).
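The hierarchy above amounts to "the highest-precedence source that sets an option wins". The sketch below illustrates that rule only; resolve is a hypothetical helper, not part of MMTx, and the numbers stand for a setting such as the value of --lexicalLookup=.

```shell
# resolve prints the value from the highest-precedence source that set one.
# Arguments are given from lowest to highest precedence; an empty string
# means that source did not set the option.
resolve() {
  value=
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      value=$candidate
    fi
  done
  echo "$value"
}

# Order: mmtxRegistry.cfg, mmtx.cfg, --configName= file, command line.
resolve 2 "" 3 1    # the command line value wins
resolve 2 3 "" ""   # mmtx.cfg wins over the registry default
```

The two calls print 1 and then 3.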
Input and Output File Options
The default behavior is to read input from standard input and write to
standard output. This program is biased toward document based processing:
it requires the entire document to be seen before any of it is processed.
Consequently, when input comes from standard input, processing does not
begin when a line feed is seen. It begins only once the end of file
marker has been sent: a control-D in the Bourne shell, or a control-Z
followed by a carriage return in DOS.
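When the text comes from a pipe rather than the keyboard, the end of file arrives as soon as the writing command finishes, so no control-D is needed. The sketch below shows the pattern. LOOKUP is a hypothetical variable introduced here so the lines run without MMTx installed; it defaults to cat as a stand-in.

```shell
# Stand-in so the snippet runs without MMTx. On a real install, set
#   LOOKUP='java programs.LexicalLookup'
LOOKUP=${LOOKUP:-cat}

# printf exits after writing one sentence, which closes the pipe and
# delivers the end of file the program waits for.
output=$(printf 'The patient took 20 mg of aspirin on 12/01/2001.\n' | $LOOKUP)
echo "$output"
```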
| Short Name | Long Name | Default Value | Purpose |
| __ | --fileName= | stdIn | Name of file to process |
| __ | --outputFileName= | stdOut | Name of the output file to write to |
Input Format Descriptions
The default behavior is to auto-detect MEDLINE citation format or free
text. The following flags override this auto-detection.
| Short Name | Long Name | Default Value | Purpose |
| __ | --medlineCitations | false | The input is a collection of MEDLINE citations |
| __ | --mrcon | false | The input is a collection of MRCON rows |
| __ | --freeText | true | The input is free text |
| __ | --fieldedText | false | The input file or standard input is fielded text |
| __ | --textField= | 2 | For fielded text, which field contains the text |
| __ | --fieldSeparator= | the pipe character (|) | For fielded text, the separator character |
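As an illustration of the fielded text options, the sketch below builds a one-line sample record and prints the field that --textField=2 would select if fields are counted from 1. That numbering is an assumption; the option table does not state it. The sample record is hypothetical.

```shell
# Hypothetical three-field record using the default "|" separator.
sample=$(mktemp)
printf 'doc1|heart attack|2001\n' > "$sample"

# Assumption: fields are counted from 1, as cut does. Under that reading,
# --textField=2 would select the same field that this command prints.
text=$(cut -d '|' -f 2 "$sample")
echo "$text"

# The separator must be quoted on a shell command line, for example:
#   java programs.LexicalLookup --fieldedText --textField=2 '--fieldSeparator=|' --fileName="$sample"
rm -f "$sample"
```

Without the quotes, the shell would treat the | as the start of a pipeline instead of passing it to the program.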
Options to retrieve ever more levels of detail
| Short Name | Long Name | Default Value | Purpose |
| __ | --collections | false | Display Collection information |
| __ | --documents | false | Display Documents |
| __ | --sections | false | Display Sections |
| __ | --sentences | false | Display Sentences |
| __ | --lexicalElements | false | Display Lexical Elements |
| __ | --lexicalEntries | false | Display Lexical Entries |
| __ | --tokens | false | Display tokens |
| __ | --pipedOutput | false | Display in a pipe delimited format |
| __ | --details | false | Display the gory details |
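Several display flags can be combined on one command line. The sketch below assembles such a command and prints it instead of running it. The input file notes.txt is hypothetical, and it is an assumption that giving a flag without a value is what switches it on.

```shell
# Assumption: notes.txt is a hypothetical input file.
# Display flags are given without a value; options ending in "=" take one.
set -- --fileName=notes.txt --sentences --tokens --pipedOutput

# Printed for inspection; remove "echo" to run it on an MMTx install.
echo java programs.LexicalLookup "$@"
```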
Processing Options
| Short Name | Long Name | Default Value | Purpose |
| __ | --lexicalLookup= | 2 | Lexical lookup algorithm option, 1-3 |
| __ | --ambiguousAcronyms | false | Disambiguate sentence boundaries using the acronyms and abbreviations file |
Configuration Options
| Short Name | Long Name | Default Value | Purpose |
| __ | --configName= | mmtx.cfg | The name of the configuration file |
| -R | --MMTX_ROOT= | <installdir>/mmtx | MMTX Root path |
| __ | --MMTX_USERNAME= | mmtxUser | Database Account Name |
| __ | --MMTX_HOSTNAME= | <localhost> | Database Host Name |
| __ | --ambiguousAcronymsFile= | data/lexicon/ambiguousAcronymsFile.txt | Location of the acronyms and abbreviations file needed in the tokenizer |
| __ | --inflectionTable= | inflStatic2001Lexicon | The Lexicon's inflection table used |
| __ | --lexiconVersion= | Static2001Lexicon | Lexicon Version |
| __ | --nmm | false | Flag that switches between MetaMap output and non MetaMap output style. This flag is useful when combined with --pipedOutput and display flags such as --sentences, --phrases, --nps, --variants and other levels of detail. |