Erratum

March 2000

It has come to my attention that some of the equations in the paper "Improved Heterogeneous Distance Functions" are inconsistent: some squares and square roots do not line up correctly.

The quick solution is that Equations (20), (22) and (26) should have square roots over the summations (as should Equations (19) and (21), though those are less important), so that all of the equations are consistent with Figure 12, which is correct. These equations are not necessarily "wrong," because they would be correct if Equations (19), (21) and (25) did not square the values they summed, but as written they are not consistent with those latter equations.


The Value Difference Metric (VDM) was originally defined as in Equation (8), with no square root:

(8)

In the section on HVDM, the distance between input vectors was defined as the square root of the sum of the squares of the individual attribute distances, each of which could be a linear distance (for continuous attributes) or a VDM distance (for discrete attributes):

(11)

(12)

Since VDM is already a sum of squares, it is in a sense already squared. Therefore, we needed to take its square root so that it would not be squared one too many times when it is summed into Equation (11). Section 3.2, in fact, showed that taking the square root improved results empirically as well, so this was done in Equation (15):

N2: (15)
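
For readers without the paper at hand, the relationship among Equations (8), (11) and (15) described above can be written out roughly as follows. This is a sketch reconstructed from the prose of this erratum, not a verbatim copy of the paper. The notation is assumed: P_{a,x,c} stands for the conditional probability of output class c given that attribute a has value x, C is the number of classes, m is the number of attributes, and the exponent in the VDM sum is taken to be 2.

```latex
\begin{align*}
\text{(8), no square root:}\quad
  & \mathrm{vdm}_a(x, y) = \sum_{c=1}^{C} \left| P_{a,x,c} - P_{a,y,c} \right|^{2} \\
\text{(11), overall distance:}\quad
  & \mathrm{HVDM}(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{a=1}^{m} d_a(x_a, y_a)^{2}} \\
\text{(15), N2 with square root:}\quad
  & \mathrm{normalized\_vdm2}_a(x, y) = \sqrt{\mathrm{vdm}_a(x, y)} \\
\text{so, for a discrete attribute:}\quad
  & d_a(x, y)^{2} = \mathrm{normalized\_vdm2}_a(x, y)^{2} = \mathrm{vdm}_a(x, y)
\end{align*}
```

The last line is the point of the square root in (15): once rooted, the VDM term enters the sum in (11) squared exactly once.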

In the implementation, the original VDM value is summed in as is, rather than taking its square root and then squaring the result. This may explain the inconsistency that arose in the following sections.
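
The equivalence behind that shortcut can be checked with a minimal Python sketch. It assumes, for illustration only, that every attribute is discrete and that each attribute value is summarized by a list of class probabilities. The function names `vdm`, `hvdm_by_definition` and `hvdm_as_implemented` and the sample numbers are hypothetical and are not taken from the paper's code.

```python
import math

def vdm(p_x, p_y):
    """Unrooted VDM: sum of squared differences of class probabilities."""
    return sum((px - py) ** 2 for px, py in zip(p_x, p_y))

def normalized_vdm2(p_x, p_y):
    """Rooted VDM, i.e. the square root of the unrooted sum."""
    return math.sqrt(vdm(p_x, p_y))

def hvdm_by_definition(attr_pairs):
    """Square each rooted attribute distance, sum, then take the outer root."""
    return math.sqrt(sum(normalized_vdm2(px, py) ** 2 for px, py in attr_pairs))

def hvdm_as_implemented(attr_pairs):
    """Sum the unrooted VDM values directly, then take the outer root."""
    return math.sqrt(sum(vdm(px, py) for px, py in attr_pairs))

# Hypothetical class-probability vectors for two discrete attributes.
pairs = [([0.7, 0.3], [0.2, 0.8]), ([0.5, 0.5], [0.9, 0.1])]
by_definition = hvdm_by_definition(pairs)
as_implemented = hvdm_as_implemented(pairs)
print(math.isclose(by_definition, as_implemented))
```

Both functions return the same distance up to floating-point rounding, which is why the implementation can skip the root-then-square step.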

In Section 4.2, Equations (19) and (21) should have square roots over the sum in order to be consistent with the equations in Figure 12, though in practice taking the square root of that sum is unnecessary, since the nearest neighbors found without the square root are the same as those found with it.
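
The reason is that the square root is monotonically increasing, so it never changes which stored instance is closest. A small Python sketch illustrates this. It uses plain squared differences as a stand-in for the per-attribute distances of Section 4.2, and the helper names and sample points are hypothetical.

```python
import math

def squared_distance(x, y):
    """Sum of squared per-attribute differences, with no outer square root."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def nearest_index(query, points, take_root):
    """Index of the stored point closest to query, with or without the root."""
    def dist(p):
        d2 = squared_distance(query, p)
        return math.sqrt(d2) if take_root else d2
    return min(range(len(points)), key=lambda i: dist(points[i]))

# Hypothetical stored instances and query.
points = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
query = (2.5, 1.5)
with_root = nearest_index(query, points, take_root=True)
without_root = nearest_index(query, points, take_root=False)
print(with_root == without_root)
```

The same argument applies to the k nearest neighbors, since the whole ordering of distances is preserved.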

More importantly, though, Equation (20) should have been a copy of Equation (15) (i.e., with the square root) rather than Equation (8). Similarly, there should be a square root over the sum of squares in Equations (22) and (26). Also, the function "vdm_a()" referred to in Equations (22) and (26) should be taken to be the one defined in the fixed version of Equation (20), i.e., the same as "normalized_vdm2()" as defined in Equation (15).

An alternative would be to leave the square roots off these equations (as they currently appear in the paper) and remove the "square" (i.e., the "2" exponent) in Equations (19), (21) and (25). This would make the equations self-consistent within these sections, and would actually most closely reflect how the functions are implemented, but the equations would then not be expressed in the same way as in Figure 12.

I apologize for any inconvenience these errors may have caused.

--Randy Wilson


Acknowledgement

Thanks are due to Alexey Tsymbal of Finland for pointing out the inconsistency in these equations.