TOOLS: MetaMap

MetaMapLite

Overview:

The primary goal of MetaMapLite is to provide a near real-time named-entity recognizer that is not as rigorous as MetaMap but is much faster, while allowing users to customize and augment its behavior for specific purposes.

MetaMapLite uses some of the tables originally developed for MetaMap. Currently, MetaMapLite does not support dynamic variant generation. Named entities are found using longest match. Restriction by UMLS source and semantic type is optional. Part-of-speech tagging, which slightly improves precision at the cost of speed, is also optional. Negation detection is available using either Wendy Chapman's ConText or a native negation detection algorithm based on Wendy Chapman's NegEx, which is somewhat less effective but faster.
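To illustrate what "longest match" means, here is a toy sketch in shell and awk. At each token position it tries the longest dictionary term first and then skips past whatever matched, so "heart attack" wins over "heart". It is not MetaMapLite's implementation; the function name longest_match, the one-term-per-line dictionary file, and the lowercase/alphanumeric normalization are assumptions made for this example.

```shell
#!/bin/sh
# Toy greedy longest-match lookup (illustration only, not MetaMapLite code).
# Usage: longest_match DICT_FILE "text"   (DICT_FILE: one term per line)
longest_match() {
  dict=$1
  text=$2
  printf '%s\n' "$text" | awk -v dict="$dict" '
    # Lowercase and turn every run of non-alphanumerics into one space.
    function norm(s) { s = tolower(s); gsub(/[^a-z0-9]+/, " ", s); return s }
    BEGIN {
      maxlen = 1
      while ((getline line < dict) > 0) {
        n = split(norm(line), w, " ")
        if (n == 0) continue
        term = w[1]
        for (k = 2; k <= n; k++) term = term " " w[k]
        terms[term] = 1
        if (n > maxlen) maxlen = n   # longest term, in tokens
      }
      close(dict)
    }
    {
      n = split(norm($0), tok, " ")
      i = 1
      while (i <= n) {
        matched = 0
        # Try the longest window first, then shorter ones.
        for (len = maxlen; len >= 1; len--) {
          if (i + len - 1 > n) continue
          cand = tok[i]
          for (j = 1; j < len; j++) cand = cand " " tok[i + j]
          if (cand in terms) { print cand; i += len; matched = 1; break }
        }
        if (!matched) i++
      }
    }'
}
```

For example, with a dictionary containing heart, heart attack, and attack, the text "Heart attack and attack." yields heart attack followed by attack.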

You can use MetaMapLite on the web at the Interactive MetaMapLite Page. (A RESTful service provided by the server is described on the "MetaMapLite ReSTful client page".)

Prerequisites:

Downloads

MetaMapLite 3.6.2rc8 and UMLS 2022

A note about the inclusion of Covid-19/SARS-Cov-2 strings in the 2022 data sets

To use the level 0+4+9 (USAbase) dataset, extract the archive public_mm_lite_3.6.2rc8_binaryonly.zip and the dataset archive public_mm_data_lite_usabase_2022aa.zip in the same directory:
$ unzip public_mm_lite_3.6.2rc8_binaryonly.zip
$ unzip public_mm_data_lite_usabase_2022aa.zip
	    
To use the level 0 (Base) dataset, extract the archive public_mm_lite_3.6.2rc8_binaryonly.zip and the dataset archive public_mm_data_lite_base_2022aa.zip in the same directory:
$ unzip public_mm_lite_3.6.2rc8_binaryonly.zip
$ unzip public_mm_data_lite_base_2022aa.zip
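Before extracting, it can be worth checking each archive against its published checksum. A minimal sketch, assuming GNU coreutils' sha1sum is available (on macOS, shasum -a 1 plays the same role); verify_sha1 is a hypothetical helper name, and the expected value must be copied from the checksum published for that archive.

```shell
#!/bin/sh
# Compare a file's SHA-1 digest with an expected value.
# Prints "OK: FILE" and returns 0 on a match; reports to stderr and
# returns 1 otherwise. Assumes GNU coreutils sha1sum.
verify_sha1() {
  file=$1
  expected=$2
  # Hash stdin so the output is "digest  -" regardless of the file name.
  actual=$(sha1sum < "$file" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
    return 0
  fi
  echo "MISMATCH: $file (got $actual)" >&2
  return 1
}

# Example (replace <published-sha1> with the published value):
# verify_sha1 public_mm_lite_3.6.2rc8_binaryonly.zip <published-sha1> &&
#   unzip public_mm_lite_3.6.2rc8_binaryonly.zip
```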
	    

MetaMapLite 3.6.2rc6 and UMLS 2020

A note about the inclusion of Covid-19/SARS-Cov-2 strings in the 2020 data sets

  • MetaMapLite 3.6.2rc6 binary-only version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
  • 2020AB UMLS Level 0+4+9 Dataset (WinZip - 1g), [sha1sum], [md5sum]
  • 2020AA UMLS Level 0+4+9 Dataset (WinZip - 1g), [sha1sum], [md5sum]

    Note for users who downloaded the 2020AA USAbase data set distribution before May 15th: the 2020AA USAbase data set originally published on this website was missing the SNOMEDCT_US vocabulary. The affected archives have the following checksums:

    md5sum: aacca5e1e3a3791a5ecd8f4d91473cd2  public_mm_data_lite_usabase_2020aa.7z
    sha1sum: 675ec4545373b156a04712b3ca72fcdeab90fc6d  public_mm_data_lite_usabase_2020aa.7z
    md5sum: 000fac4b1be197f86386e4e5e1dabb49  public_mm_data_lite_usabase_2020aa.zip
    sha1sum: 1c0a16bdeb5560ce40d7a8be5333aeb0a8cfa2a5  public_mm_data_lite_usabase_2020aa.zip
    		
    The archives have been replaced with ones containing the SNOMEDCT_US vocabulary.

  • 2020AA UMLS Level 0 Dataset (WinZip - 877m), [sha1sum], [md5sum]
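The MD5 values listed in the note above can be checked against a local copy of the 2020AA USAbase archive. A small sketch, assuming GNU md5sum; the function name is hypothetical. It prints whether the file matches one of the affected archives and returns 0 only when it does.

```shell
#!/bin/sh
# Detect the pre-May-15th 2020AA USAbase archives that lack SNOMEDCT_US,
# using the MD5 values published in the note (.7z and .zip variants).
# Returns 0 if the file is affected, 1 otherwise. Assumes GNU md5sum.
is_affected_usabase_2020aa() {
  sum=$(md5sum < "$1" | cut -d ' ' -f 1)
  case $sum in
    aacca5e1e3a3791a5ecd8f4d91473cd2 | 000fac4b1be197f86386e4e5e1dabb49)
      echo "affected: re-download $1"
      return 0 ;;
    *)
      echo "not affected: $1"
      return 1 ;;
  esac
}

# Example:
# is_affected_usabase_2020aa public_mm_data_lite_usabase_2020aa.zip
```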
To use, extract the archive public_mm_lite_3.6.2rc6_binaryonly.zip and the dataset archive (public_mm_data_lite_base_2020aa.zip or public_mm_data_lite_usabase_2020aa.zip) in the same directory:
$ unzip public_mm_lite_3.6.2rc6_binaryonly.zip
$ unzip public_mm_data_lite_base_2020aa.zip
	    
Change to the 'public_mm_lite' directory and use the "--indexdir" option to specify the location of the dataset (shown using a relative path):
$ cd public_mm_lite
$ ./metamaplite.sh --indexdir=data/ivf/2020AA/Base file
	    
The path for the level 0+4+9 dataset is data/ivf/2020AA/USAbase.
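If you switch between the level 0 and level 0+4+9 datasets, a small helper can build the --indexdir path and let you check that it exists before running. A sketch only: mml_indexdir is a hypothetical name, and it assumes the data/ivf/<release>/<Base|USAbase> layout shown above, relative to public_mm_lite.

```shell
#!/bin/sh
# Build the --indexdir path for a dataset release and flavor.
# Base = level 0 dataset, USAbase = level 0+4+9 dataset.
mml_indexdir() {
  release=$1   # e.g. 2020AA (the 2018AB data uses 2018ABascii)
  flavor=$2    # Base or USAbase
  case $flavor in
    Base | USAbase) printf 'data/ivf/%s/%s\n' "$release" "$flavor" ;;
    *) echo "unknown dataset flavor: $flavor" >&2; return 1 ;;
  esac
}

# Usage from inside public_mm_lite (mytext.txt is a hypothetical input):
# indexdir=$(mml_indexdir 2020AA USAbase) &&
#   [ -d "$indexdir" ] &&
#   ./metamaplite.sh --indexdir="$indexdir" mytext.txt
```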

MetaMapLite 3.6.2rc5 and 2020AA datasets

A note about the inclusion of Covid-19/SARS-Cov-2 strings in the 2020AA data sets

  • MetaMapLite 3.6.2rc5 binary-only version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset. (WinZip - 250m), [sha1sum], [md5sum]
  • 2020AA UMLS Level 0+4+9 Dataset (WinZip - 1g), [sha1sum], [md5sum]

    Note for users who downloaded the 2020AA USAbase data set distribution before May 15th: the 2020AA USAbase data set originally published on this website was missing the SNOMEDCT_US vocabulary. The affected archives have the following checksums:

    md5sum: aacca5e1e3a3791a5ecd8f4d91473cd2  public_mm_data_lite_usabase_2020aa.7z
    sha1sum: 675ec4545373b156a04712b3ca72fcdeab90fc6d  public_mm_data_lite_usabase_2020aa.7z
    md5sum: 000fac4b1be197f86386e4e5e1dabb49  public_mm_data_lite_usabase_2020aa.zip
    sha1sum: 1c0a16bdeb5560ce40d7a8be5333aeb0a8cfa2a5  public_mm_data_lite_usabase_2020aa.zip
    		
    The archives have been replaced with ones containing the SNOMEDCT_US vocabulary.

  • 2020AA UMLS Level 0 Dataset (WinZip - 877m), [sha1sum], [md5sum]
To use, extract the archive public_mm_lite_3.6.2rc5_binaryonly.zip and the dataset archive (public_mm_data_lite_base_2020aa.zip or public_mm_data_lite_usabase_2020aa.zip) in the same directory:
$ unzip public_mm_lite_3.6.2rc5_binaryonly.zip
$ unzip public_mm_data_lite_base_2020aa.zip
	    
Change to the 'public_mm_lite' directory and use the "--indexdir" option to specify the location of the dataset (shown using a relative path):
$ cd public_mm_lite
$ ./metamaplite.sh --indexdir=data/ivf/2020AA/Base file
	    
The path for the level 0+4+9 dataset is data/ivf/2020AA/USAbase.

MetaMapLite 3.6.2rc3 and 2018AB datasets

To use, extract the archive public_mm_lite_3.6.2rc3_binaryonly.zip and the dataset archive (public_mm_data_lite_base_2018ab_ascii.zip or public_mm_data_lite_usabase_2018ab_ascii.zip) in the same directory:
$ unzip public_mm_lite_3.6.2rc3_binaryonly.zip
$ unzip public_mm_data_lite_base_2018ab_ascii.zip
	    
Change to the 'public_mm_lite' directory and use the "--indexdir" option to specify the location of the dataset (shown using a relative path):
$ cd public_mm_lite
$ ./metamaplite.sh --indexdir=data/ivf/2018ABascii/Base file
	    
The path for the level 0+4+9 dataset is data/ivf/2018ABascii/USAbase.

MetaMapLite 3.6.2rc3

The 3.6.2rc3 version of MetaMapLite is a release candidate for version 3.6.2 that includes the following changes:

  • Fixed an error in tokenization when calling OpenNLP's part-of-speech tagger
  • Merged UTF-8 handling code from the UTF branch into master

MetaMapLite 3.6.2rc2

The 3.6.2rc2 version of MetaMapLite is a release candidate for version 3.6.2 that fixes the following issues:

  • When using EntityLookup4 (i.e., setting metamaplite.enable.scoring = false), disabling part-of-speech tagging (i.e., setting metamaplite.enable.postagging = false) significantly reduced the number of entities found. On one test collection, the median number of entities per document dropped from 50 (postagging = true) to 0 (postagging = false).
  • When using MetaMapLite, EntityLookup4 was initialized every time processDocumentList was called and again each time processDocument was called, while EntityLookup5 is only re-initialized when needed.
  • When using a non-standard data directory, the property opennlp.en-pos.bin.path: $DATA_DIR/models/en-pos-maxent.bin had to be set. This property was not supplied in the template config file, so MetaMapLite (MML) fell back to a hard-coded default value, which resulted in a crash. It may be helpful to add this property to the generated config file so that users who customize their data directory know to adjust it accordingly.
  • When using a non-standard data directory, the following properties must be set for MMI file output or null pointer exceptions are thrown:
    • metamaplite.index.directory: $DATA_DIR/ivf/2017AA/Base/strict/indices/
    • metamaplite.ivf.meshtcrelaxedindex: $DATA_DIR/ivf/2017AA/Base/strict/indices/meshtcrelaxed
    These properties were not supplied in the template config file, which resulted in null pointer exceptions. It may be helpful to add them to the generated config file.
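The properties listed above can be written to a config file for a non-standard data directory. A sketch, assuming Java properties syntax (key=value, which java.util.Properties treats the same as the key: value form used above) and using the 2017AA paths from the notes as placeholders; write_mml_properties is a hypothetical helper, and how the file is passed to MetaMapLite is described in the README documentation.

```shell
#!/bin/sh
# Write the properties that must be set when the data directory is not the
# default one. Usage: write_mml_properties DATA_DIR OUTPUT_FILE
# The 2017AA/Base index paths are the example values from the notes.
write_mml_properties() {
  data_dir=$1
  out=$2
  cat > "$out" <<EOF
opennlp.en-pos.bin.path=$data_dir/models/en-pos-maxent.bin
metamaplite.index.directory=$data_dir/ivf/2017AA/Base/strict/indices/
metamaplite.ivf.meshtcrelaxedindex=$data_dir/ivf/2017AA/Base/strict/indices/meshtcrelaxed
EOF
}

# Example (hypothetical paths):
# write_mml_properties /opt/mml_data ./mml.properties
```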

MetaMapLite 3.6.1p1

The 3.6.1p1 version of MetaMapLite is a bugfix release that fixes the following issue:

  • Fixes an error where docid was not propagated to Entity records in the output results.

MetaMapLite 3.6.1

The 3.6.1 version of MetaMapLite is a bugfix release that fixes the following issue:

  • Fixes an error in the method that removes entities subsumed by a larger entity, which caused some entities that were not subsumed to be removed as well.

MetaMapLite 3.6

The 3.6 version of MetaMapLite is a bugfix release that fixes the following issues:

  • Fixes an error in the longest match algorithm in which entities subsumed by a longer entity were not removed.
  • Includes an example of creating a result formatter.
  • Readme documentation has been updated.
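The subsumption rule behind the 3.6 and 3.6.1 fixes can be stated compactly: an entity is dropped only if its span lies inside a strictly longer span, and entities not contained in another span must survive. A toy awk sketch over "start end text" lines (end exclusive), an input format assumed for this example rather than MetaMapLite's Entity records; remove_subsumed is a hypothetical name.

```shell
#!/bin/sh
# Read "start end text" lines on stdin; print only the entities that are
# not contained in a strictly longer span. Input order is preserved.
remove_subsumed() {
  awk '
    { s[NR] = $1 + 0; e[NR] = $2 + 0; line[NR] = $0 }
    END {
      for (i = 1; i <= NR; i++) {
        keep = 1
        for (j = 1; j <= NR; j++) {
          if (j == i) continue
          # j covers i and is strictly longer, so i is subsumed
          if (s[j] <= s[i] && e[i] <= e[j] && e[j] - s[j] > e[i] - s[i]) {
            keep = 0
            break
          }
        }
        if (keep) print line[i]
      }
    }'
}
```

For example, given 0 12 heart attack, 0 5 heart, and 17 23 attack, only the first and last lines are printed.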

MetaMapLite 3.5

The 3.5 version of MetaMapLite is a bugfix release that fixes the following issues:

  • The negation status of a concept was not reflected in the MMI fielded output.
  • The location of the chunker model file was not user-modifiable.
  • The default properties file was missing a reference to the treecodes file used for MMI fielded output.
  • Readme documentation has been updated.

MetaMapLite 3.4

The 3.4 version of MetaMapLite optionally adds scoring, similar to the original MetaMap, to concept mapping results for BRAT output, and produces ranked indexing results for MMI output using MetaMap's Ranked Indexing algorithm. MMI results may differ somewhat from MetaMap's because MetaMapLite's mapping scores, which are supplied as input to the MMI Ranked Indexing algorithm, differ from MetaMap's.

MetaMapLite 2016 3.1 SNAPSHOT

MetaMapLite 2016 3.0 SNAPSHOT

Example MetaMapLite Servlet

Documentation

MetaMapLite README Documentation

MetaMapLite Source Code

Publications

MetaMap Lite: an evaluation of a new Java implementation of MetaMap. Demner-Fushman D, Rogers WJ, Aronson AR. JAMIA, Volume 24, Issue 4, July 2017. DOI: 10.1093/jamia/ocw177. URL: https://academic.oup.com/jamia/issue/24/4. ALT URL: https://www.ncbi.nlm.nih.gov/pubmed/28130331.

Sources

The source code for MetaMapLite is supplied with the distribution in the directory public_mm_lite/src. The source code is also available at the MetaMapLite GitHub Page.