max planck institut

informatik

informatik

Resnik's measure is a measure of semantic similarity between ontology terms. It is based on the information content of a term,

which uses the term probability. We define the relative frequency of a GO term in the UniProt database as its probability.

Resnik's measure for comparing two terms *t _{1}* and

We use the abbreviation Res for referring to Resnik's measure. By default, these columns are hidden in the results of functional

comparison.

Ref.: Resnik, J Artif Intell Res (1999), 11:95-130.

Lin's measure of semantic similarity is also based on the information content of GO terms. Lin's measure for comparing two terms

*t _{1}* and

Again,

similarity to 1 for terms with maximum similarity. We use the abbreviation Lin for referring to Lin's measure.

Ref.: Lin, Proc 15th Int'l Conf. on Machine Learning (ICML-98) (1998), 296-304.

The *simRel* score is a functional similarity measure for comparing two GO terms with each other. It is based on Resnik's and Lin's

similarity measures. The *simRel* score ranges from 0 for terms that have no similarity to 1 for terms with maximum similarity. It is

calculated as follows:

*t _{1}* and

similarity to 1 for terms with maximum similarity. We use the abbreviation

Ref.: Schlicker

Jiang and Conrath defined a distance measure between GO terms that is based on the information content. The similarity measure

using Jiang and Conrath's distance is defined as follows:

*S(t _{1}, t_{2})* is the set of common ancestors of terms

We use the abbreviation Jiang for referring to this score. By default, these columns are hidden in the results of functional

comparison.

Ref.: Couto

Functional similarity measures are used to compare two proteins or protein families. There are two types of functional similarity

measures, structure-based and semantic similarity-based. The different functional similarity measures between two proteins

or protein families *p* and *q* annotated with the sets *GO ^{p}* and

The *UI* score is defined as follows:

*g ^{p}* and

contains

By default, these columns are hidden in the results of functional comparison.

Ref.: Guo

The *simGIC* score is based on the *UI* score. Instead of counting the number of nodes in the union and intersection of the two induced

graphs, it sums up their information content. It is defined as follows:

The *simGIC* score ranges from 0 for no similarity to 1 for highest similarity. By default, these columns are hidden in the

results of functional comparison.

Ref.: Pesquita *et al*., Proc 10th Annual Bio-Ontologies Meeting (2007)

The term overlap (*TO*) and normalized term overlap (*NTO*) scores are based on the number of annotated terms shared between two proteins.

The *TO* score is defined as the number of terms shared between *g ^{p}* and

The

Ref.: Mistry and Pavlidis, BMC Bioinformatics (2008), 9:327

A *GOscore* is a measure of functional similarity between two proteins or protein families with respect to either biological process

(*BPscore*), molecular function (*MFscore*), or cellular component (*CCscore*). Considering two gene products A and B annotated

with the sets GO^{A} and GO^{B} of GO terms with sizes N and M, respectively, a similarity matrix S is calculated. This matrix contains

all pair wise similarity values of mappings GO^{A}_{i} of gene product A and mappings GO^{B}_{j} of gene product B:

The matrix S is not necessarily symmetric or square since the proteins can have different types and numbers of GO mappings. The

rows and the columns of S represent two different directional comparisons, row vectors correspond to a comparison of A to B

and column vectors to a comparison of B to A. The best hits for the comparison of A with B are determined as maximum values in the

rows in matrix S (row maxima). The maximum values in the columns of S (column maxima) are the best hits for the direction B to A.

The averages over the row maxima and the column maxima give similarity values for the comparison of A to B and the comparison of

B to A, respectively:

The *GOscore* is then computed as the maximum of *rowScore* and *columnScore*:

It can be computed either by Resnik's measure, Lin's measure or *simRel*. We use the abbreviations BP, MF, and CC for referring to

the *BPscore*, *MFscore*, and *CCscore*, respectively.

Ref.: Schlicker *et al*., BMC Bioinformatics (2006), 7(1):302

*GOscore max* is a measure of functional similarity between two proteins or protein families with respect to either biological process

(*BPscore*), molecular function (*MFscore*), or cellular component (*CCscore*). The matrix S is computed as described for *GOscore BM*.

*GOscore max* is defined as the maximum over all s_{ij}:

It can be computed either by Resnik's measure, Lin's measure or *simRel*.

Ref.: Lord *et al*., Bioinformatics (2003), 19(10):1275-1283

*GOscore avg* is a measure of functional similarity between two proteins or protein families with respect to either biological process (*BPscore*),

molecular function (*MFscore*), or cellular component (*CCscore*). The matrix S is computed as described for *GOscore BM*.

*GOscore max* is defined as the maximum over all s_{ij}:

It can be computed either by Resnik's measure, Lin's measure or *simRel*.

Ref.: Lord *et al*., Bioinformatics (2003), 19(10):1275-1283

The *funSim* score is calculated from the *BPscore* and the *MFscore* of a pair of proteins or protein families. It is defined as follows:

Here, max(*BPscore*) and max(*MFscore*) denote the maximal score for biological process and molecular function, respectively.

The *funSim* score is computed using *simRel*, and *GOscore*. It ranges from 0 for no functional similarity to 1 for maximal functional similarity.

Ref.: Schlicker *et al*., BMC Bioinformatics (2006), 7(1):302

The *rfunSim* score is calculated from the *funSim* of a pair of proteins or protein families. It is defined as square root of the *funSim* score. It

ranges from 0 for no functional similarity to 1 for maximal functional similarity.

Ref.: Schlicker *et al*., Genome Biol (2007), 8(3):R33

The *funSimAll* score is calculated from the *BPscore*, *MFscore* and the *CCscore* of a pair of proteins or protein families. It is defined as:

Here, *max(BPscore)*, *max(MFscore)* and *max(CCscore)* denote the maximum possible score for biological process, molecular function, and

cellular component, respectively. The *funSim* score is computed using *simRel*, and *GOscore*. It ranges from 0 for no functional similarity to 1

for maximal functional similarity.

The *rfunSimAll* score is calculated as the square root of the *funSimAll* score of a pair of proteins or protein families.

It ranges from 0 for no functional similarity to 1 for maximal functional similarity.