Last updated on February 9th, 2014

Here’s how to calculate the Jaccard similarity coefficient and Jaccard distance between two or more images.
But first, some quick definitions…
The Jaccard index is the same thing as the Jaccard similarity coefficient. We call it a similarity coefficient since we want to measure how similar two things are.
The Jaccard distance is a measure of how dissimilar two things are. We can calculate the Jaccard distance as 1 minus the Jaccard index.
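Written as formulas, the two definitions look like this. The symbols are assumptions introduced here for illustration: A and B stand for the sets of foreground pixels in the two images, and |·| counts the pixels in a set.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|},
\qquad
d_J(A, B) = 1 - J(A, B)
```

As long as the union is not empty, J lies between 0 (no overlap at all) and 1 (identical shapes), so the distance lies between 0 and 1 as well.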
For this to make sense, let’s first set up our scenario.
We have Alice, RobotBob and Carol. Alice draws a white line. RobotBob and Carol try to copy Alice. We want to measure how similar RobotBob’s and Carol’s lines are to Alice’s line.
Here are their line drawings, where a white pixel represents part of the line and a black pixel represents the background.
Intuitively, we can see that Carol does a better job of drawing the line than RobotBob (Carol’s line is straighter, like Alice’s line, while RobotBob’s line is all over the place). But how do we quantify this? How similar are the images? This is what the Jaccard similarity coefficient seeks to answer. So let’s get into the MATLAB code!
% JaccardTest.m
% Compute the Jaccard similarity coefficient (index) of two images.
% Also shows how to find the Jaccard distance.
%
% Kawahara (2013).

% A value of "1" = the line object (foreground).
% A value of "0" = the background.

% Alice draws a vertical line.
Alice = [0 1 0; 0 1 0; 0 1 0];

% RobotBob tries to draw a line.
RobotBob = [0 0 0; 0 1 1; 0 0 1];

% Carol tries to draw a line.
Carol = [0 1 0; 0 1 0; 1 1 0];

% Let's see their three drawings.
figure;
subplot(1,3,1); imagesc(Alice); axis image; colormap gray;
title('Alice''s nice line drawing');
subplot(1,3,2); imagesc(RobotBob); axis image; colormap gray;
title('RobotBob tries to draw Alice''s line');
subplot(1,3,3); imagesc(Carol); axis image; colormap gray;
title('Carol tries to draw Alice''s line');

% How similar are Alice's and RobotBob's drawings of a line?
% An intuitive way to measure this is to compare the white "line"
% pixels (a value of "1") in the two drawings and see how many white
% pixels overlap compared to the total number of white line pixels.

% We compute the intersection of the two lines using the "AND" operator "&".
intersectImg = Alice & RobotBob;
figure; imagesc(intersectImg); axis image; colormap gray;
title('intersection');

% We compute the union of the two lines using the "OR" operator "|".
unionImg = Alice | RobotBob;
figure; imagesc(unionImg); axis image; colormap gray;
title('union');

% There is only one pixel that overlaps (intersects).
numerator = sum(intersectImg(:));

% There are 5 pixels in the union.
denominator = sum(unionImg(:));

% So intuitively we might expect that a similarity of 1/5 would
% be a good indication. This is exactly what the Jaccard index gives.
jaccardIndex = numerator/denominator
% jaccardIndex =
%     0.2000

% Jaccard distance shows how dissimilar the two line drawings are.
jaccardDistance = 1 - jaccardIndex
% jaccardDistance =
%     0.8000

%% How similar are Alice's and Carol's line drawings?

% We can compute the Jaccard index in a single line,
jaccardIndex_ac = sum(Alice(:) & Carol(:)) / sum(Alice(:) | Carol(:))
% jaccardIndex_ac =
%     0.7500
%
% As expected, we can see that Alice's and Carol's drawings of a line are
% much MORE "similar" than Alice's and RobotBob's drawings (0.2).

% Let's check the Jaccard distance.
jaccardDistance_ac = 1 - jaccardIndex_ac
% jaccardDistance_ac =
%     0.2500
%
% As expected, we can see there is LESS "distance" between Alice's and
% Carol's drawings of a line than Alice's and RobotBob's drawings (0.8).
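Since the same calculation is repeated for each copy, it can be wrapped up once and applied to any number of drawings. The following is only a minimal sketch: the anonymous function `jaccard` and the `copies` and `names` variables are hypothetical names introduced here, not part of the original script, and it assumes all images are binary and the same size.

```matlab
% Hypothetical helper (not in the original script): Jaccard index of two
% same-sized binary images. nnz counts the non-zero (white) pixels.
% If both images are entirely background, the union is empty and the
% result is NaN (0/0).
jaccard = @(A, B) nnz(A & B) / nnz(A | B);

Alice    = [0 1 0; 0 1 0; 0 1 0];
RobotBob = [0 0 0; 0 1 1; 0 0 1];
Carol    = [0 1 0; 0 1 0; 1 1 0];

% Compare every copy against Alice's reference drawing.
copies = {RobotBob, Carol};
names  = {'RobotBob', 'Carol'};
for k = 1:numel(copies)
    idx = jaccard(Alice, copies{k});
    fprintf('%s: index = %.4f, distance = %.4f\n', names{k}, idx, 1 - idx);
end
% RobotBob: index = 0.2000, distance = 0.8000
% Carol: index = 0.7500, distance = 0.2500
```

Adding another drawing to compare only means appending it to `copies` and `names`.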
There you go! Hopefully this helps to clear up how the Jaccard index can be applied to images. Note that another way to think about this is that we didn’t really measure how similar the images were; we measured how similar the shapes embedded in the images were.
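One way to see the difference between comparing shapes and comparing images is to place the same drawings on a larger black canvas. The sketch below is illustrative only: `jaccard` and `agreement` are hypothetical helpers, and the 7x7 canvas size is an arbitrary choice. The extra background leaves the Jaccard index untouched, while a plain count of matching pixels goes up.

```matlab
Alice    = [0 1 0; 0 1 0; 0 1 0];
RobotBob = [0 0 0; 0 1 1; 0 0 1];

% Embed each 3x3 drawing in the middle of a 7x7 all-background canvas.
AliceBig = zeros(7);  AliceBig(3:5, 3:5) = Alice;
BobBig   = zeros(7);  BobBig(3:5, 3:5)   = RobotBob;

% Hypothetical helpers, introduced here for illustration.
jaccard   = @(A, B) nnz(A & B) / nnz(A | B);   % overlap of the white shapes
agreement = @(A, B) nnz(A == B) / numel(A);    % fraction of matching pixels

jSmall = jaccard(Alice, RobotBob)       % 0.2000
jBig   = jaccard(AliceBig, BobBig)      % 0.2000 (background is ignored)
aSmall = agreement(Alice, RobotBob)     % 0.5556 (5 of 9 pixels match)
aBig   = agreement(AliceBig, BobBig)    % 0.9184 (45 of 49 pixels match)
```

The Jaccard index only looks at pixels that are white in at least one drawing, which is why it behaves as a measure of the shapes and not of the whole image.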