FAQ for IRMNG
Author: Tony Rees, CSIRO Marine and Atmospheric Research, Hobart
Last updated: 30 January 2011
Q: What is the purpose of IRMNG?
A: IRMNG, the Interim Register of Marine and Nonmarine Genera, exists to provide a machine- and human-queryable system that can answer some basic questions about organisms based on the genus component (or, in around 50% of cases, the genus+species component) of their scientific name. In the first instance, such questions comprise:
- What is the correct spelling (orthography) and authorship of the relevant genus name?
- Where does this taxon (i.e. species / genus) fit in a taxonomic hierarchy (i.e., attribution to family wherever possible, plus relevant higher taxa)?
- Is this a known extant or fossil taxon?
- Is this a known marine or nonmarine taxon?
- Is this genus name unique or non-unique (i.e., a homonym) either within a taxonomic group, or across multiple taxonomic groups or codes?
- What genus and/or species names are spelled similarly to my input (search) term? (in the event that either the search term is misspelled, or variant / misspellings exist in the IRMNG database)
Additional questions that may also be answerable in a subset of cases:
- Nomenclatural and taxonomic information regarding this taxon, e.g.
  - Nomenclatural: is the genus name validly published or a nomen nudum, a replacement for a previously published name, a published misspelling, etc.
  - Taxonomic: is this genus currently considered a synonym of another (valid) genus name (and if so, which name)
- (Partial) list of species names for this genus – including nomenclatural and taxonomic information in some cases
- Citation of the place of publication of a genus name (e.g. as given in Nomenclator Zoologicus or elsewhere)
- Link to online image of the original publication instance (e.g. via BHL, JSTOR or other).
When initially conceived in 2006, this information was not available in a comprehensive, internally consistent form in any other system, although at some future point it may be (for example via the proposed Global Names Architecture or its components).
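As an illustration of the "spelled similarly" question listed above, the following is a minimal sketch only. It uses Python's standard difflib module as a stand-in; it is not a description of the matching method IRMNG itself uses, and the name list, function name and cutoff value are assumptions made for the example.

```python
import difflib

# Small illustrative sample of genus names; a real lookup would query the
# IRMNG database rather than a hard-coded list (assumption for this sketch).
KNOWN_GENERA = ["Wagneria", "Homo", "Quercus", "Carcharodon", "Escherichia"]


def similar_genera(term, names=KNOWN_GENERA, cutoff=0.8, limit=5):
    """Return known genus names spelled similarly to `term`, best match first."""
    # Compare in lower case so that capitalisation does not affect the score.
    lookup = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(
        term.lower(), list(lookup), n=limit, cutoff=cutoff
    )
    return [lookup[hit] for hit in hits]


# A misspelled search term still finds the intended name.
print(similar_genera("Wagnerai"))
print(similar_genera("Qercus"))
```

The cutoff of 0.8 is arbitrary; a lower value returns more distant candidates at the cost of more false matches.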
Q: How does IRMNG differ from other, apparently comparable biodiversity databases?
A: IRMNG aspires to maximize coverage at the level of genus across all groups, i.e. animals, plants, algae, protists, fungi, prokaryotes, and viruses, both extant and fossil, and also to include flags indicating extant/fossil and basic habitat status as above. Other comparable biodiversity databases are limited either to a single taxonomic group (e.g. Index Nominum Genericorum for plants, Index Fungorum for fungi, Catalog of Fishes for fish, etc.), or to taxa of a particular type (e.g. Paleo Database for fossils, World Register of Marine Species for marine species only), or to taxa from a particular geographic region (e.g. Fauna Europaea, Australian Plant Checklist, Species 2000 New Zealand). The most comparable broad-spectrum initiative (though excluding fossils) is probably the Catalogue of Life; however, this omits much detail at genus level (such as genus authorities and genus-level synonyms) and, in addition, presently aims for completeness at species level, so proceeds much more slowly towards completion.
Another noteworthy compilation, that of Nomenclator Zoologicus, has excellent coverage of many zoological genus names from 1758-2004 approx., but omits family allocation, habitat flags, and consideration of current taxonomic validity in all but a few cases.
Q: What is the significance of "Interim" in the IRMNG context?
A: In the context of IRMNG, "Interim" indicates that this is largely a first-pass compilation of data from a wide range of sources, which may contain some internal inconsistencies and data errors and which has not subsequently received the degree of scrutiny and validation found in more authoritative, single-group sources. These aspects of IRMNG should improve over time; in the first instance, however, it is deemed desirable to have a system with the range of IRMNG available for use in the interim rather than unavailable on account of certain residual taxonomic or data issues.
Q: How has IRMNG been populated?
A: For obvious reasons, IRMNG draws heavily on pre-existing genus-level compilations which, in a number of cases, have been generously made available to the project by their respective compilers. In approximate order of incorporation, the major sources utilised to date have been as follows:
- Parker, S.P. (ed.), 1982. Synopsis and Classification of Living Organisms. McGraw-Hill, New York. [Print source] (Initial family and higher level classification – 6,800 family names)
- The Taxonomicon & Systema Naturae 2000 online compilation, 2006 version, courtesy Dr. S. Brands, Netherlands (112,000 genus names plus additional 2,300 family names) – current web address: http://taxonomicon.taxonomy.nl/
- Catalogue of Life 2006 version, incorporating contributions from over 40 GSDs (Global species databases) plus ITIS, the Integrated Taxonomic Information System, courtesy Catalogue of Life partnership (36,000 additional genus names, 2,100 additional families, 1,282,000 species names) - current web address (latest version): http://www.catalogueoflife.org/
- Museum Victoria KEmu database (Oct 2006) (9,000 additional genus names, 900 additional families, 56,000 additional species names)
- Sepkoski, J.J., 2002. A compendium of fossil marine animal genera. Bulletins of American Paleontology, 364. Ithaca, NY (27,000 additional genus names, no families but sorted by order). Available online at http://strata.geology.wisc.edu/jack/
- Benton, M. (ed.), 1993. The Fossil Record 2. Chapman & Hall, London. (2,900 additional fossil + extant families). Spreadsheet version available online at http://www.fossilrecord.net/fossilrecord/index.html
- Index Nominum Genericorum (2007 version) for plant genera, courtesy Dr. E. Farr (35,000 additional plant genus names, 400 additional families) – current web address http://botany.si.edu/ing/
- Aphia databases maintained at VLIZ, Belgium (supporting European Register of Marine Species and 17 other region or taxon-specific databases), 2006 version, courtesy ERMS editors (3,300 additional genus names, 120 additional families, 45,000 additional species names)
- Australian Faunal Directory (October 2007 version) (9,800 additional genus names, 190 additional families, 55,000 additional species names) - current web address http://www.environment.gov.au/biodiversity/abrs/online-resources/fauna/afd/
- Unpublished (as at 2007) Species 2000 New Zealand compilation, courtesy Dr. D. Gordon (1,800 additional genus names, 54 additional families, 10,000 additional species names)
- List of Names with Standing in Prokaryotic Nomenclature (2008 version), courtesy Dr. J-P. Euzéby (all taxonomic allocations checked, plus 450 additional prokaryote genus names, 77 additional families) – current web address http://www.bacterio.cict.fr/
- Nomenclator Zoologicus (2006 electronic version), (205,000 additional genus names, 440 additional families) – current web address http://ubio.mbl.edu/NomenclatorZoologicus/
- Melville, R.V. & Smith, J.D.D., (eds). Official Lists and Indexes of Names and Works in Zoology. ICZN, London. (Approx. 50% of taxonomic status information on generic names from relevant ICZN Opinions uploaded to IRMNG, covering 1,800 genera)
- Index Fungorum, electronic database and nomenclator for fungi (2009 version) (all taxonomic allocations checked, plus 1,800 additional genus names, 150 additional family names) – current web address http://www.indexfungorum.org/
- GBIF taxonomy, May 2010 (incorporating Catalogue of Life 2009, Paleobiology Database and numerous other sources not otherwise consulted): upgraded taxonomic placement for 46,000 genera not previously placed to family level – current web address http://www.gbif.org/
In addition, a wide range of print sources, more recent updates (e.g. fishes current taxonomy from FishBase courtesy Dr. N. Bailly) and smaller electronic compilations including CAAB (Codes for Australian Aquatic Biota) and others contribute the balance of current IRMNG holdings (additional 6,000 genus names, 2,900 families, 10,000 species names).
From the above list it is also clear that other major sources exist which could potentially be utilized in IRMNG, including IPNI (http://www.ipni.org/), uBio (http://www.ubio.org/), the Paleobiology Database (http://www.paleodb.org/) and more, but these have not yet been utilized, mainly for reasons of time.
Q: How complete is IRMNG at this time?
A: This is difficult to answer exactly, since no reliable estimates exist of the total numbers of extant and fossil names, valid or synonymous, at genus, family or species level. However, from a range of sources and informed guesses, the author estimates that there may be a total of 6.5–7m published species names to date, of which approximately 2.2m are valid (the latter increasing at around 25,000 per year); 470,000 published genus names, of which perhaps 250,000 are valid (increasing at around 2,500 per year); and perhaps 30,000 published family names, of which maybe 17,000 are valid, for both extant and fossil taxa. On the basis of these approximations, IRMNG currently includes "most" valid published family names plus a small subset of synonyms; "most" published genus names, both valid and non-valid (446,000 of perhaps 470,000, i.e. around 95%); and only a subset of species names at this time (1.45m out of perhaps 6.75m, or a little over 20%, though the figure rises to around 50% if synonyms are excluded).
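The coverage percentages quoted above can be reproduced with a short calculation. This is a sketch only: the inputs are the estimates given in the answer (with 6.75m taken as the midpoint of the 6.5–7m species estimate), and the variable and function names are chosen for this example.

```python
# Estimates quoted in the answer above; none of these are exact counts.
genera_held = 446_000
genera_published = 470_000
species_held = 1_450_000
species_published = 6_750_000  # midpoint of the 6.5-7m estimate


def percent(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


genus_coverage = percent(genera_held, genera_published)
species_coverage = percent(species_held, species_published)

print(f"Genus names held:   {genus_coverage}%")    # 94.9, i.e. around 95%
print(f"Species names held: {species_coverage}%")  # 21.5, a little over 20%
```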
Q: How many homonyms / non-unique genus names are in IRMNG?
A: One important function of IRMNG is to indicate, at least as far as the data already held allow, whether a particular genus name is unique or whether it occurs in multiple instances, either between or within taxonomic groups. Currently there are almost 69,000 genus-level homonyms (around 29,000 separate names) included in IRMNG, representing around 15% of all names or approx. one name in every 7 (this figure also includes nomina nuda plus a small number of misspellings that accidentally coincide with a different, correctly spelled name). The name with the largest number of instances is probably Wagneria, with 12 listed instances in zoology and 2 in botany. At most one instance can be valid in each domain; the remainder are invalid, and a subset of these may be either synonyms (for example, replaced by subsequently published new names) or orthographic variants/misspellings of otherwise valid names.
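The distinction between homonym instances and separate names can be sketched as a simple grouping of records by genus name. The record layout, the placeholder names ("Aus", "Bus", "Cus"), the authorities and the function names below are all hypothetical and do not reflect the actual IRMNG schema or contents.

```python
from collections import defaultdict

# Hypothetical records: (genus name, authority, nomenclatural domain).
# These are placeholder values, not real IRMNG entries.
records = [
    ("Aus", "Smith, 1850", "zoology"),
    ("Aus", "Jones, 1902", "zoology"),
    ("Aus", "Brown, 1875", "botany"),
    ("Bus", "Smith, 1851", "zoology"),
    ("Cus", "Lee, 1990", "botany"),
    ("Cus", "Gray, 1840", "zoology"),
]


def find_homonyms(records):
    """Map each non-unique genus name to its list of (authority, domain) instances."""
    by_name = defaultdict(list)
    for name, authority, domain in records:
        by_name[name].append((authority, domain))
    # A name is a homonym when it has more than one publication instance.
    return {name: inst for name, inst in by_name.items() if len(inst) > 1}


def spans_domains(instances):
    """True if the instances of a name occur in more than one domain."""
    return len({domain for _, domain in instances}) > 1


homonyms = find_homonyms(records)
instance_count = sum(len(inst) for inst in homonyms.values())
print(f"{instance_count} homonym instances across {len(homonyms)} separate names")
```

In the same terms, the figures quoted above correspond to an instance count of almost 69,000 across around 29,000 separate names, and 69,000 of 446,000 is roughly 15%, or about one name in every 6.5.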
Q: What editing is required for IRMNG compilation?
A: The majority of name data (taxon names and authorities) are imported into IRMNG from the relevant data sources without modification, except in the case of database errors apparent from cross-comparison with external sources and a limited amount of authority normalization to produce a consistent "house style" (including expansion of botanical authors for genera when supplied in abbreviated form, to match the format used in Index Nominum Genericorum).
Family attribution may be adjusted from that given with incoming data where a more recent, authoritative source is available, and editorial input may be required to decide which source to follow in instances where opinion is divided. Missing data (such as authorities, nomenclatural/taxonomic comments, and habitat and extant/fossil flags) are frequently added from a variety of supplementary sources, and in these cases editorial decisions are sometimes required as to which instance of a genus name is involved (often self-evident, but sometimes not so). Editorial decisions are also required to decide whether two highly similar names and cited authorities in different base datasets represent the same or different genus publication instances. For example, some animal names may also be represented as plants, plants or protists as fungi, corals as sponges, etc., particularly in early literature; where such cases are detected, a decision is then made either to retain both records as separate instances or to combine them into a single record for IRMNG.
Editorial input is also involved in determining the status of names from some of the less authoritative sources as either genuine new instances, or as misspellings of names already on the list, in which case a note is added together with a pointer to the name variant deemed to be the correct spelling.
Q: What gaps remain to be filled in IRMNG?
A: IRMNG can be deemed complete (at genus level) when (a) all published genus names to date are included (a moving target of course); (b) all name variants not yet "verified" from appropriate trusted sources (i.e., Nomenclator-grade compilations) are either verified from other sources e.g. primary literature, or assessed to be misspellings of "verified" names already on the list; (c) all genus names are assigned to actual families rather than "placeholders" such as "Mollusca (unallocated)"; (d) the higher taxonomic categories are all filled (e.g., no gap between family and class, or between order and phylum); (e) all genera have an assigned (and perhaps, separately verified) status flag for extant/fossil and marine/nonmarine status (or both as applicable); and (f) the taxonomic status of all generic names is known (i.e. valid or non-valid; if a synonym, what is the current valid name). Progress according to these various metrics is shown below, as at June 2010.
(a) Genus names held
As detailed above, IRMNG currently holds some 446,000 of an estimated 470,000 published genus names (the latter increasing at perhaps 2,500 per year), indicating that some 24,000 names are missing at present (although this figure could vary by perhaps +/- 10,000 according to the basis of the estimates used).
(b) Verified versus Unverified Names
Approximately 12,400 of the 446,000 genus names in IRMNG are "unverified" from appropriate authoritative sources at this time. Experience suggests that perhaps 50% of these will turn out to be database errors in sources used to construct IRMNG, and the remaining 50% "good" new names verifiable from additional sources.
(c) Genera assigned to "placeholder" rather than actual families
At present, approximately 180,000 of the 446,000 genus names in IRMNG are allocated to "placeholders" at family level (example: "Mollusca (unallocated)") rather than true families. Mechanisms to address this deficiency are currently being investigated.
(d) Families assigned to "placeholder" rather than actual orders
At present, approximately 2,500 of the 19,300 family names in IRMNG are allocated to "placeholders" at order level (example: "Mollusca (unallocated)") rather than true orders. This deficiency is being corrected over time.
(e) Completeness of flagging (extant/fossil/both, marine/nonmarine/both) at genus level
Approximately 298,000 of the 446,000 genus names in IRMNG (67%) are currently allocated an extant/fossil status flag and 148,000 are not, while 373,000 (84%) are allocated a marine/nonmarine status flag and 73,000 are not.
(f) Assessment of current taxonomic status of generic names
This is a low priority for IRMNG at this time; however, at present some 161,000 of the 446,000 genus names (36%) are flagged as currently valid and 52,000 (11%) are flagged as non-valid, of which 38,000 are pointed to the relevant valid name instance, leaving 53% without valid/non-valid flags at this time.
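The genus-level progress figures in (b), (c), (e) and (f) can be tabulated against the 446,000 genus names held. The following is a sketch only, using the counts quoted in the text; the labels and function name are chosen for this example, and because the source counts are themselves rounded, a computed percentage may differ by a point from one quoted above.

```python
TOTAL_GENERA = 446_000  # genus names held, as quoted in (a)

# Counts of genus names meeting each criterion, taken from the text above.
progress = {
    "verified from authoritative sources": TOTAL_GENERA - 12_400,
    "assigned to a true family": TOTAL_GENERA - 180_000,
    "extant/fossil flag allocated": 298_000,
    "marine/nonmarine flag allocated": 373_000,
    "flagged as currently valid": 161_000,
}


def share(count, total=TOTAL_GENERA):
    """Return `count` as a whole-number percentage of `total`."""
    return round(100 * count / total)


for label, count in progress.items():
    print(f"{label}: {count:,} of {TOTAL_GENERA:,} ({share(count)}%)")
```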
Q: How is IRMNG currently maintained, and what are its long term development plans?
A: IRMNG construction commenced in 2006 following an analysis of needs in respect of taxonomic names management for OBIS (the Ocean Biogeographic Information System), and is currently considered a contribution to the International OBIS system from OBIS Australia (www.obis.org.au), which is hosted at CSIRO Marine and Atmospheric Research (CMAR) in Australia. To date, CMAR has contributed in kind to IRMNG development and ongoing hosting as part of its commitment to OBIS AU, and small amounts of OBIS and GBIF funds have contributed to aspects of its population. At present IRMNG continues to be maintained by CMAR as an ongoing contribution to OBIS and other initiatives that may have a use for it as a taxonomic information system. IRMNG may also evolve into a component of the Global Names Architecture (GNA) / Global Names Usage Bank (GNUB) at some point (see www.globalnames.org/); however, scoping of those activities and potential IRMNG interaction is still at a relatively early stage.
Last modified 31-01-2011