Number: 2000-025-1-800
Title: IUPAC International Chemical Identifier
Task Group
Chairman: A. McNaught
Members: S. Heller and S. Stein
Remarks:
- Initiated by the ad hoc Committee on Chemical Identity and Nomenclature Systems
- In July 2004, the Identifier was renamed INChI (formerly IChI) to
acknowledge the development work at NIST.
- In November 2004, the Identifier was renamed IUPAC International
Chemical Identifier (InChI), to allow trademark, copyright and licensing
issues to be resolved.
Completion Date: 2005 - project completed
Objective:
The objective of the IUPAC Chemical Identifier Project is to establish
a unique label, the IUPAC Chemical Identifier, which would be a non-proprietary
identifier for chemical substances that could be used in printed and
electronic data sources thus enabling easier linking of diverse data
compilations.
Description:
Develop a set of algorithms for the standard representation of
chemical structures that will be readily accessible to chemists in
all countries at no cost. The standard chemical representation could
be used as input into existing and newly developed computer programs
to generate an IUPAC name and a unique IUPAC identifier.
> See detailed description
Progress:
Our initial work has focussed on the development of algorithms
for converting an input organic chemical structure to a unique (canonical)
form. This, in effect, involves the unique numbering of each atom,
with equivalent atoms being assigned identical numbers. "Serializing"
the result to create a string is the final, straightforward step
in creating an identifier.
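The numbering-then-serializing idea can be illustrated with a minimal Python sketch. It is not the project's algorithm: it assumes a simple iterative refinement of atom classes (in the spirit of methods reported in the literature), and the function names `equivalence_classes` and `serialize` and the output format are hypothetical. Atoms that the refinement cannot tell apart receive identical numbers. The resulting string does not depend on the input atom order, but it is not guaranteed to be unique for every structure, which a real identifier would require.

```python
from typing import List, Tuple


def equivalence_classes(elements: List[str], bonds: List[Tuple[int, int]]) -> List[int]:
    """Assign each atom a class number; atoms that look alike share a number.

    Illustrative refinement only, not the IChI procedure.
    """
    n = len(elements)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    def rank(keys):
        # Dense ranking: equal keys get equal numbers, starting at 1.
        order = {key: i + 1 for i, key in enumerate(sorted(set(keys)))}
        return [order[key] for key in keys]

    # Start from (element, number of neighbors), then refine using neighbor classes.
    classes = rank([(elements[i], len(neighbors[i])) for i in range(n)])
    while True:
        keys = [
            (classes[i], tuple(sorted(classes[j] for j in neighbors[i])))
            for i in range(n)
        ]
        refined = rank(keys)
        if len(set(refined)) == len(set(classes)):
            return classes  # no class was split, so the numbering is stable
        classes = refined


def serialize(elements: List[str], bonds: List[Tuple[int, int]]) -> str:
    """Turn the class numbering into a string (hypothetical format)."""
    classes = equivalence_classes(elements, bonds)
    atoms = sorted(f"{elements[i]}{classes[i]}" for i in range(len(elements)))
    links = sorted(
        "-".join(str(c) for c in sorted((classes[a], classes[b]))) for a, b in bonds
    )
    return ",".join(atoms) + "/" + ",".join(links)
```

For the heavy-atom skeleton C-C-C, the two terminal carbons receive the same number, and the skeleton C-C-O serializes to the same string whether the atoms are listed as C, C, O or as O, C, C.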
As discussed in the Cambridge IUPAC meeting to consider the feasibility
of the project in August 2000, most of the ideas employed in this
work have been reported in the technical literature. The principal
task of this project has been to identify and implement a workable,
robust set of procedures that will provide effective IChI processing
for a large proportion of organic chemical structures in common use.
At the Cambridge meeting it was agreed to develop a "layered" approach,
where different levels of structural information are separately represented
in the identifier. Work has consequently proceeded by step-by-step
building of the individual layers. Since the order of application
of the layers could affect the final labeling, this process is somewhat
more complex than might initially appear.
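Why the order of application matters can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the layer names ("connectivity", "isotope"), the per-atom key values, and the function `number_atoms` are hypothetical and do not come from the project. Each layer contributes one sort key per atom, and atoms are numbered by sorting on the keys in the order in which the layers are applied.

```python
from typing import Dict, List

# Hypothetical per-atom keys for two layers; the values are made up.
ATOMS: Dict[str, Dict[str, int]] = {
    "a": {"connectivity": 1, "isotope": 2},
    "b": {"connectivity": 2, "isotope": 1},
    "c": {"connectivity": 2, "isotope": 2},
}


def number_atoms(atoms: Dict[str, Dict[str, int]], layer_order: List[str]) -> Dict[str, int]:
    """Number atoms 1..n by sorting on layer keys in the given order."""
    ranked = sorted(
        atoms,
        key=lambda name: tuple(atoms[name][layer] for layer in layer_order),
    )
    return {name: position + 1 for position, name in enumerate(ranked)}


connectivity_first = number_atoms(ATOMS, ["connectivity", "isotope"])
isotope_first = number_atoms(ATOMS, ["isotope", "connectivity"])
```

With these made-up keys, applying the connectivity layer first numbers the atoms a, b, c, while applying the isotope layer first numbers them b, a, c. A layered identifier therefore has to fix the order of the layers once and apply it consistently.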
The layers under development are:
The first of these items does not seem to have been addressed adequately
in the literature, although appropriate processing algorithms have
been found in mathematical journals.
We hope to complete these remaining tasks within two months and then
to implement the IChI processor as a standalone program that can automatically
process standard "MOL-files". When this is available, assistance will
be sought to further test, and possibly refine, the IChI name generation
process.
Depending on results of these tests and discussions, it will be decided
whether improvements or additional features are desirable, and, if
so, whether these need to be followed by another round of testing.
For instance, it needs to be determined whether the first version
should allow a canonical representation of partially-specified stereochemical
structures.
Finally, as discussed in the Cambridge meeting, there are no plans
to include the following structural representations in the first version:
March 2002 update
The first beta-test version of the program is now available. It runs
as a conventional Windows application under 32-bit Microsoft Windows
operating systems. Neither the underlying algorithms nor the program
have been perfected - this distribution is intended primarily to allow
others to participate in the further development.
This program treats only covalently bonded compounds and uses Molfiles
(and SDfiles) as input. Along with the executable programs, the distribution
package contains documentation and example structure files.
The package can be obtained from Steve Stein by e-mail to [email protected].
Unless requested otherwise, the package will be delivered as a 'zip'
file in an e-mail attachment to the return address.
A demonstration of Identifier generation within a (Windows) structure-drawing
program, working in conjunction with the beta test program, can be
obtained from Alan McNaught by e-mail to [email protected].
There was a discussion of the project at the "CAS/IUPAC
Conference on Chemical Identifiers and XML for Chemistry" on July
1, 2002 in Columbus, Ohio. On the preceding day (June 30th) at the
same location the Project Group met to review progress and consider
comments received.
July 2002 update
At the Task Group meeting in Columbus, OH, on 30 June 2002,
Steve Stein reviewed the progress made by NIST in developing the test
version of the IUPAC Chemical Identifier. The test version handles
simple organic molecules. To date, in all of the testing (almost 70
copies have been distributed) there are no known examples of chemicals
that the program does not handle. A number of suggestions (described
below) were made regarding testing and output. The overall view was
that the project is progressing considerably faster than expected.
> Download report - pdf file (118 KB)
A lecture by Steve Stein on the project was given the following day
at the CAS/IUPAC Conference on Chemical Identifiers and XML for Chemistry
and a copy of the slides presented can be viewed at: http://www.hellers.com/steve/pub-talks/columbus-702/frame.htm
November 2003 update
A combined meeting for two related IUPAC projects, the XML
Data Dictionary Project (#2002-022-1-024) and this Chemical Identifier
Project (#2000-025-1-800), was held at the National Institute of Standards
and Technology (NIST, Gaithersburg, Maryland, US) on November 12-14,
2003.
A report on that meeting is published in Chem. Int. July-Aug 2004.
A full account of the meeting is available at <www.warr.com/inchi.pdf>
July 2004 update
A new test version of the IUPAC-NIST Chemical Identifier (INChI)
is now available. It replaces the previous test version issued last
November. All features planned for inclusion in the final release
have now been implemented and the final format for the Identifier has
been proposed. The new name of the Identifier, INChI (formerly IUPAC
Chemical Identifier, IChI), acknowledges the development work at NIST. The
test program accepts input in the form of Molfiles (or SDfiles) and
CML files. An Application Program Interface (API) for communicating
with external programs is under development.
A single INChI is generated for a single input structure, which can
contain multiple components. Identifiers can be created for organic
compounds with Z/E and sp3 stereochemistry, tautomers, and isotopes
as well as salts, organometallic compounds and protonated forms of
a compound.
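As an illustration of what "multiple components" means for a single input structure, the Python sketch below splits a structure into its disconnected parts, such as the two ions of a salt. It is an assumption about one plausible preprocessing step, not a description of the INChI program; the function name `components` and the atom-index representation are hypothetical.

```python
from typing import List, Tuple


def components(n_atoms: int, bonds: List[Tuple[int, int]]) -> List[List[int]]:
    """Group atom indices into connected components (illustrative only)."""
    neighbors: List[List[int]] = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    seen = [False] * n_atoms
    result: List[List[int]] = []
    for start in range(n_atoms):
        if seen[start]:
            continue
        # Depth-first walk over everything bonded, directly or not, to `start`.
        seen[start] = True
        stack = [start]
        component: List[int] = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for other in neighbors[atom]:
                if not seen[other]:
                    seen[other] = True
                    stack.append(other)
        result.append(sorted(component))
    return result
```

An input with five atoms and bonds 0-1, 1-2 and 3-4 yields two components, [0, 1, 2] and [3, 4]; two unbonded ions yield one component each.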
Test programs (for Microsoft Windows), documentation and sample structure
files are available upon request from Steve Stein <[email protected]>.
The project team very much welcomes comments concerning the INChI
and will be glad to assist in its testing or implementation.
November 2004 update
To allow trademark, copyright and licensing issues to be resolved
before distribution of version 1.0, the name of the Identifier was
changed to IUPAC International Chemical Identifier (InChI).
April 2005 - project completed
Version 1 of IUPAC's International Chemical Identifier (InChI) has
now been released; software, documentation, source code and licensing
conditions are available from the IUPAC website at www.iupac.org/inchi.
Promotion and extension continue through project 2004-039-1-800.
> See release
> FAQ (prepared by Nick Day of the Unilever Centre for Molecular
Informatics, Cambridge University; http://wwmm.ch.cam.ac.uk/inchifaq/)
Clipping
> That INChI Feeling, Reactive Reports, Sep 2004 (issue 40)
> Unique labels for compounds, C&EN, 26 Nov 2002
> That ICHI feeling ..., The Alchemist, 24 Apr 2002
> What's in a Name?, The Alchemist, 21 Mar 2002
Last Update: 14 April 2005
<project announcement published in Chem. Int. 23(3) 2001>