MetaMapLite

Overview:

The primary goal of MetaMapLite is to provide a near real-time named-entity recognizer that is less rigorous than MetaMap but much faster, while allowing users to customize and augment its behavior for specific purposes.

MetaMapLite uses some of the tables originally developed for MetaMap. Currently, MetaMapLite does not support dynamic variant generation. Named entities are found using longest match. Restriction by UMLS source and semantic type is optional. Part-of-speech tagging, which improves precision by a small amount at the cost of speed, is also optional. Negation detection is available using either Wendy Chapman's ConText or a native negation detection algorithm based on Wendy Chapman's NegEx, which is somewhat less effective but faster.

You can use MetaMapLite on the web at the Interactive MetaMapLite Page. (A ReSTful service provided by the server is described on the "MetaMapLite ReSTful client page".)

Prerequisites:

MetaMapLite requires a minimum of 16GB of disk space when it has been uncompressed.
MetaMapLite requires a minimum of 2GB of memory to run. At least 4GB is recommended.
You will need a working version of bunzip2 or WinZip or 7-Zip to uncompress the MetaMapLite download file depending on which one you download. If you do not have a copy of bunzip2, it is available from http://www.bzip.org/. Similarly, WinZip is available from http://www.winzip.com/. And 7-Zip is available at http://www.7-zip.org.
To run MetaMapLite, you will need the Java Runtime Environment (JRE). We have tested MetaMapLite with JRE 1.8. The JRE is available from: http://www.java.com
To use MetaMapLite, you must comply with the MetaMap Terms and Conditions.
To download MetaMapLite, you must have accepted the terms of the UMLS Metathesaurus License Agreement, which requires you to respect the copyrights of the constituent vocabularies and to file a brief annual report on your use of the UMLS. You also must have activated a UMLS Terminology Services (UTS) account.

For details of the licenses see the UMLS Metathesaurus License Agreement and How to License and Access the Unified Medical Language System® (UMLS®) Data.
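
The disk and Java prerequisites above can be checked from a shell before downloading. The following is a minimal sketch, assuming a POSIX shell with df and awk; the function names has_free_space and has_java are illustrative and not part of MetaMapLite:

```shell
# Sketch: check the disk and Java prerequisites listed above.
# has_free_space and has_java are illustrative names, not MetaMapLite tools.

# Succeeds if the filesystem holding directory $1 has at least $2 GB free.
has_free_space() {
    dir=$1
    min_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Succeeds if a java executable is on the PATH.
has_java() {
    command -v java >/dev/null 2>&1
}
```

For example, `has_free_space . 16 || echo "less than 16GB free here" >&2` warns when the current directory's filesystem is below the stated minimum. The check does not verify the JRE version.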

Downloads

A note about the inclusion of Covid-19/SARS-Cov-2 strings in the 2022 data sets

MetaMapLite 3.6.2rc8 binary only Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
2022AB UMLS Level 0+4+9 DataSet (WinZip - 1.06g), [sha1sum], [md5sum]
2022AA UMLS Level 0+4+9 DataSet (WinZip - 1.08g), [sha1sum], [md5sum]

To use, extract the archive public_mm_lite_3.6.2rc8_binaryonly.zip and dataset archive public_mm_data_lite_usabase_2022aa.zip in the same directory:

$ unzip public_mm_lite_3.6.2rc8_binaryonly.zip
$ unzip public_mm_data_lite_usabase_2022aa.zip

2022AB UMLS Level 0 Dataset (WinZip - 943m), [sha1sum], [md5sum]
2022AA UMLS Level 0 Dataset (WinZip - 918m), [sha1sum], [md5sum]

To use, extract the archive public_mm_lite_3.6.2rc8_binaryonly.zip and dataset archive public_mm_data_lite_base_2022aa.zip in the same directory:

$ unzip public_mm_lite_3.6.2rc8_binaryonly.zip
$ unzip public_mm_data_lite_base_2022aa.zip

A note about the inclusion of Covid-19/SARS-Cov-2 strings in the 2020 data sets

MetaMapLite 3.6.2rc6 binaryonly Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
2020AB UMLS Level 0+4+9 Dataset (WinZip - 1g), [sha1sum], [md5sum]
2020AA UMLS Level 0+4+9 Dataset (WinZip - 1g), [sha1sum], [md5sum]
Note for users who downloaded the 2020AA USAbase data set distribution before May 15th: the 2020AA USAbase data set originally published on this website was missing the SNOMEDCT_US vocabulary. The affected archives have the following checksums:
```
md5sum: aacca5e1e3a3791a5ecd8f4d91473cd2  public_mm_data_lite_usabase_2020aa.7z
sha1sum: 675ec4545373b156a04712b3ca72fcdeab90fc6d  public_mm_data_lite_usabase_2020aa.7z
md5sum: 000fac4b1be197f86386e4e5e1dabb49  public_mm_data_lite_usabase_2020aa.zip
sha1sum: 1c0a16bdeb5560ce40d7a8be5333aeb0a8cfa2a5  public_mm_data_lite_usabase_2020aa.zip
```
The archives have been replaced with ones containing the SNOMEDCT_US vocabulary.
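
A local copy of the 2020AA USAbase archive can be compared against the checksums in the note above to see whether it is one of the affected files. This is a minimal sketch, assuming GNU coreutils sha1sum is installed; archive_status is an illustrative name, not a MetaMapLite tool:

```shell
# Sketch: flag a 2020AA USAbase archive that predates the SNOMEDCT_US fix.
# archive_status is an illustrative helper; sha1sum is assumed to be available.

# SHA-1 sums of the affected archives, copied from the note above.
AFFECTED_SHA1="675ec4545373b156a04712b3ca72fcdeab90fc6d
1c0a16bdeb5560ce40d7a8be5333aeb0a8cfa2a5"

# Prints "affected" if the SHA-1 of file $1 matches a listed sum,
# "ok" if it does not, and "missing" if the file does not exist.
archive_status() {
    [ -f "$1" ] || { echo "missing"; return 1; }
    sum=$(sha1sum "$1" | awk '{ print $1 }')
    if printf '%s\n' "$AFFECTED_SHA1" | grep -qxF "$sum"; then
        echo "affected"
    else
        echo "ok"
    fi
}
```

Running `archive_status public_mm_data_lite_usabase_2020aa.zip` on an affected download prints "affected", in which case the archive should be downloaded again.
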
2020AA UMLS Level 0 Dataset (WinZip - 877m), [sha1sum], [md5sum]

To use, extract the archive public_mm_lite_3.6.2rc6_binaryonly.zip and dataset archive (public_mm_data_lite_base_2020aa.zip or public_mm_data_lite_usabase_2020aa.zip) in the same directory:

$ unzip public_mm_lite_3.6.2rc6_binaryonly.zip
$ unzip public_mm_data_lite_base_2020aa.zip

Change to the 'public_mm_lite' directory and use the "--indexdir" option to specify the location of the dataset (shown using a relative path):

$ cd public_mm_lite
$ ./metamaplite.sh --indexdir=data/ivf/2020AA/Base file

The path for the level 0+4+9 dataset is data/ivf/2020AA/USAbase.
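
The two dataset paths above differ only in their last component, so the "--indexdir" value can be built from the release name and the dataset level. The following is a minimal sketch; mml_indexdir is a hypothetical helper, not part of the MetaMapLite distribution:

```shell
# Sketch: build the --indexdir value from a dataset release and level.
# mml_indexdir is an illustrative helper, not part of MetaMapLite.

# $1: release directory name, e.g. 2020AA
# $2: "base" for the Level 0 dataset, "usabase" for the Level 0+4+9 dataset
mml_indexdir() {
    case $2 in
        base)    echo "data/ivf/$1/Base" ;;
        usabase) echo "data/ivf/$1/USAbase" ;;
        *)       echo "unknown level: $2" >&2; return 1 ;;
    esac
}
```

From the 'public_mm_lite' directory, `./metamaplite.sh --indexdir="$(mml_indexdir 2020AA usabase)" file` would then select the level 0+4+9 dataset.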

A note about the inclusion of Covid-19/SARS-Cov-2 strings in the 2020AA data sets

MetaMapLite 3.6.2rc5 binary only Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
2020AA UMLS Level 0+4+9 Dataset (WinZip - 1g), [sha1sum], [md5sum]
Note for users who downloaded the 2020AA USAbase data set distribution before May 15th: the 2020AA USAbase data set originally published on this website was missing the SNOMEDCT_US vocabulary. The affected archives have the following checksums:
```
md5sum: aacca5e1e3a3791a5ecd8f4d91473cd2  public_mm_data_lite_usabase_2020aa.7z
sha1sum: 675ec4545373b156a04712b3ca72fcdeab90fc6d  public_mm_data_lite_usabase_2020aa.7z
md5sum: 000fac4b1be197f86386e4e5e1dabb49  public_mm_data_lite_usabase_2020aa.zip
sha1sum: 1c0a16bdeb5560ce40d7a8be5333aeb0a8cfa2a5  public_mm_data_lite_usabase_2020aa.zip
```
The archives have been replaced with ones containing the SNOMEDCT_US vocabulary.
2020AA UMLS Level 0 Dataset (WinZip - 877m), [sha1sum], [md5sum]

To use, extract the archive public_mm_lite_3.6.2rc5_binaryonly.zip and dataset archive (public_mm_data_lite_base_2020aa.zip or public_mm_data_lite_usabase_2020aa.zip) in the same directory:

$ unzip public_mm_lite_3.6.2rc5_binaryonly.zip
$ unzip public_mm_data_lite_base_2020aa.zip

Change to the 'public_mm_lite' directory and use the "--indexdir" option to specify the location of the dataset (shown using a relative path):

$ cd public_mm_lite
$ ./metamaplite.sh --indexdir=data/ivf/2020AA/Base file

The path for the level 0+4+9 dataset is data/ivf/2020AA/USAbase.

MetaMapLite 2018 3.6.2rc3 binaryonly Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
2018AB UMLS Level 0 Dataset (WinZip - 721m), [sha1sum], [md5sum]
2018AB UMLS Level 0+4+9 Dataset (WinZip - 1g), [sha1sum], [md5sum]

To use, extract the archive public_mm_lite_3.6.2rc3_binaryonly.zip and dataset archive (public_mm_data_lite_base_2018ab_ascii.zip or public_mm_data_lite_usabase_2018ab_ascii.zip) in the same directory:

$ unzip public_mm_lite_3.6.2rc3_binaryonly.zip
$ unzip public_mm_data_lite_base_2018ab_ascii.zip

Change to the 'public_mm_lite' directory and use the "--indexdir" option to specify the location of the dataset (shown using a relative path):

$ cd public_mm_lite
$ ./metamaplite.sh --indexdir=data/ivf/2018ABascii/Base file

The path for the level 0+4+9 dataset is data/ivf/2018ABascii/USAbase.

The 3.6.2rc3 version of MetaMapLite is a release candidate for version 3.6.2 with the following changes:

Fixed error in tokenization when calling OpenNLP's Part-of-Speech tagger
Merged UTF-8 handling code from UTF branch into master

MetaMapLite 2018 3.6.2rc3 with Category 0 (Base) 2018AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2018 3.6.2rc3 with Category 0+4+9 (USAbase) 2018AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2018 3.6.2rc3 binaryonly Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
MetaMapLite 2018 3.6.2rc3 USABase (Category 0+4+9) data Version (WinZip - 1.2GB), [sha1sum], [md5sum]

The 3.6.2rc2 version of MetaMapLite is a release candidate for version 3.6.2 that fixes the following issues:

When using EntityLookup4 (i.e., setting metamaplite.enable.scoring = false), disabling part-of-speech tagging (i.e., setting metamaplite.enable.postagging = false) significantly reduces the number of entities found. On one collection, the median dropped from 50 entities per document (with postagging = true) to 0 entities per document (with postagging = false).
When using MetaMapLite, EntityLookup4 is initialized every time processDocumentList is called and again each time processDocument is called, while EntityLookup5 is only re-initialized when needed.
When using a non-standard data directory, the property opennlp.en-pos.bin.path: $DATA_DIR/models/en-pos-maxent.bin must be set. This property is not supplied in the template config file, so MetaMapLite falls back to the hardcoded default value, which results in a crash. It may be helpful to add this property to the generated config file so that users customizing their data directory know to adjust the properties accordingly.
When using a non-standard data directory, the following properties must be set for MMI file output or null pointer exceptions are thrown:
- metamaplite.index.directory: $DATA_DIR/ivf/2017AA/Base/strict/indices/
- metamaplite.ivf.meshtcrelaxedindex: $DATA_DIR/ivf/2017AA/Base/strict/indices/meshtcrelaxed
These properties are not supplied in the template config file, and their absence results in null pointer exceptions. It may be helpful to add these properties to the generated config file.
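
The three data-directory properties named above can be appended to a properties file in one step. This is a minimal sketch, assuming a POSIX shell; write_mml_overrides and the choice of output file are hypothetical, and the 2017AA/Base/strict layout is copied from the text and may differ for other datasets:

```shell
# Sketch: append the data-directory properties named above to a properties file.
# write_mml_overrides is an illustrative helper, not part of MetaMapLite.
# The 2017AA/Base/strict layout is taken from the text above.

# $1: data directory, $2: properties file to append to
write_mml_overrides() {
    data_dir=${1%/}   # drop one trailing slash so paths do not contain "//"
    cat >> "$2" <<EOF
opennlp.en-pos.bin.path: $data_dir/models/en-pos-maxent.bin
metamaplite.index.directory: $data_dir/ivf/2017AA/Base/strict/indices/
metamaplite.ivf.meshtcrelaxedindex: $data_dir/ivf/2017AA/Base/strict/indices/meshtcrelaxed
EOF
}
```

For example, `write_mml_overrides /opt/mml/data my.properties` appends the three lines with /opt/mml/data substituted for $DATA_DIR. How the resulting file is passed to MetaMapLite is described in the README rather than here.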

MetaMapLite 2018 3.6.2rc2 with Category 0 (Base) 2018AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2018 3.6.2rc2 with Category 0+4+9 (USAbase) 2018AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2018 3.6.2rc2 binaryonly Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
MetaMapLite 2018 3.6.2rc2 USABase (Category 0+4+9) data Version (WinZip - 1.2GB), [sha1sum], [md5sum]

The 3.6.1p1 version of MetaMapLite is a bugfix release that fixes the following issue:

Fixes an error where docid is not propagated to Entity records in output result.

MetaMapLite 2017 3.6.1p1 with Category 0 (Base) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2017 3.6.1p1 with Category 0+4+9 (USAbase) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2017 3.6.1p1 binaryonly Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
MetaMapLite 2017 3.6.1p1 USABase (Category 0+4+9) data Version (WinZip - 1.2GB), [sha1sum], [md5sum]

The 3.6.1 version of MetaMapLite is a bugfix release that fixes the following issue:

Fixes an error in the method that removes entities subsumed by a larger entity: some entities that were not subsumed were also being removed.

MetaMapLite 2017 3.6.1 with Category 0 (Base) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2017 3.6.1 with Category 0+4+9 (USAbase) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2017 3.6.1 binaryonly Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
MetaMapLite 2017 3.6.1 USABase (Category 0+4+9) data Version (WinZip - 1.2GB), [sha1sum], [md5sum]

The 3.6 version of MetaMapLite is a bugfix release that fixes the following issues:

Fixes an error in the longest match algorithm in which entities that were subsumed by a longer entity were not removed.
Includes an example of creating a result formatter.
Readme documentation has been updated.

MetaMapLite 2017 3.6 with Category 0 (Base) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2017 3.6 with Category 0+4+9 (USAbase) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2017 3.6 binaryonly Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
MetaMapLite 2017 3.6 USABase (Category 0+4+9) data Version (WinZip - 1.2GB), [sha1sum], [md5sum]

The 3.5 version of MetaMapLite is a bugfix release that fixes the following issues:

The negation status of a concept was not reflected in the MMI fielded output.
The location of chunker model file was not user modifiable.
The default properties file was missing a reference to the treecodes file used for MMI fielded output.
Readme documentation has been updated.

MetaMapLite 2017 3.5 with Category 0 (Base) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2017 3.5 with Category 0+4+9 (USAbase) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2017 3.5 binaryonly Version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]

The 3.4 version of MetaMapLite optionally adds scoring of concept mapping results, similar to the original MetaMap, for BRAT output, and ranked indexing results for MMI output using MetaMap's Ranked Indexing algorithm. MMI results may differ somewhat from MetaMap's because of differences in MetaMapLite's mapping scores, which are supplied as input to the MMI Ranked Indexing algorithm.

MetaMapLite 2017 3.4 with Category 0 (Base) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]
MetaMapLite 2017 3.4 with Category 0+4+9 (USAbase) 2017AA UMLS dataset (WinZip - 1g), [sha1sum], [md5sum]

MetaMapLite 2016 3.1 SNAPSHOT Version (WinZip - 601 MB), [sha1sum], [md5sum]
MetaMapLite 2016 3.1 SNAPSHOT Version (Bzip2 Tar - 544 MB), [sha1sum], [md5sum]

MetaMapLite 2016 3.0 SNAPSHOT Version (Bzip2 Tar - 544 MB), [sha1sum], [md5sum]

Example of using MetaMapLite in a servlet instance. The archive provides a minimal example of an Ant project demonstrating the use of MetaMapLite in a servlet instance.

Documentation

MetaMapLite README Documentation

MetaMapLite 3.6.2rc5 README Documentation
MetaMapLite 3.6.1 README Documentation
MetaMapLite 3.6 README Documentation
MetaMapLite 3.1 README Documentation
MetaMapLite 3.0 README Documentation (Last Updated: September 26, 2016)

MetaMapLite Source Code

MetaMapLite Github Page

Publications

MetaMap Lite: an evaluation of a new Java implementation of MetaMap. Demner-Fushman D, Rogers WJ, Aronson AR. JAMIA. Volume 24, Issue 4, July 2017. DOI: 10.1093/jamia/ocw177. URL: https://academic.oup.com/jamia/issue/24/4. ALT URL: https://www.ncbi.nlm.nih.gov/pubmed/28130331.

Sources

The Source code for MetaMapLite is supplied with the distribution in the directory public_mm_lite/src. The source code is also available at the MetaMapLite Github Page.
