Root Mean Square Error tutorial – MATLAB

Here’s how to calculate the root mean square error.

Assume you have one set of numbers that represent the Actual values you want to predict.

Actual = [1 2 3 4];

Then assume you have another set of numbers, the Predicted values, that try to predict the Actual values.

Predicted = [1 3 1 4];

How do you evaluate how close Predicted values are to the Actual values?

Well, you could use the root mean square error (RMSE) to give a sense of the error in the Predicted values.

Here’s some MATLAB code that does exactly that.
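The full listing is not part of this excerpt, so here is a minimal sketch of the standard RMSE calculation (take the error, square it, average it, take the square root). The variable names err and rmse are assumptions, not necessarily the ones used in the full post.

```matlab
% Actual and Predicted values from the text above.
Actual = [1 2 3 4];
Predicted = [1 3 1 4];

% 1) Error: how far each prediction is from the actual value.
err = Predicted - Actual;      % [0 1 -2 0]

% 2) Square the errors, 3) take their mean, 4) take the square root.
rmse = sqrt(mean(err.^2));

disp(rmse);                    % sqrt(1.25), roughly 1.1180
```

A perfect prediction gives an RMSE of 0, and the value grows as the predictions drift away from the actual values.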

MATLAB – how to calculate the Jaccard similarity coefficient/index and distance between images

Here’s how to calculate the Jaccard similarity coefficient and Jaccard distance between two or more images.

But first, some quick definitions

The Jaccard index is the same thing as the Jaccard similarity coefficient. We call it a similarity coefficient since we want to measure how similar two things are.

The Jaccard distance is a measure of how dissimilar two things are. We can calculate the Jaccard distance as 1 minus the Jaccard index.

For this to make sense, let’s first set up our scenario.

We have Alice, RobotBob and Carol. Alice draws a white line. RobotBob and Carol try to copy Alice. We want to measure how similar RobotBob’s and Carol’s lines are to Alice’s line.
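As a rough sketch of the definitions above, the Jaccard index of two binary images is the number of pixels that are white in both, divided by the number of pixels that are white in either. The tiny 3x3 images below are made-up stand-ins for the drawn lines, and the helper name jaccardIndex is an assumption.

```matlab
% Hypothetical 3x3 binary images: true (1) marks a white line pixel.
alice    = logical([0 1 0; 0 1 0; 0 1 0]);
robotBob = logical([0 1 0; 0 1 0; 0 0 1]);
carol    = logical([0 1 0; 0 1 0; 0 1 0]);

% Jaccard index = |A and B| / |A or B| (intersection over union).
% Note: this is NaN (0/0) if both images are completely black.
jaccardIndex = @(a, b) nnz(a & b) / nnz(a | b);

bobIndex   = jaccardIndex(alice, robotBob);  % 2 shared / 4 total = 0.5
carolIndex = jaccardIndex(alice, carol);     % identical images = 1

% Jaccard distance = 1 - Jaccard index.
bobDistance   = 1 - bobIndex;                % 0.5
carolDistance = 1 - carolIndex;              % 0
```

In this toy case Carol's copy is identical to Alice's line (index 1, distance 0), while RobotBob's copy shares only two of four white pixels.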

Cluster commands

Often I find myself running jobs on the cluster and I can never remember the basic commands. So here are some useful commands that you can use to run jobs on the cluster.

*** note that I’m running jobs on Simon Fraser University’s (SFU) cluster and I have no idea if these will work on your specific configuration ***

Assume your userid is billybob
Assume you have a cluster script called test.pbs

MATLAB – Calculate L2 Euclidean distance

Here’s how to calculate the L2 Euclidean distance between points in MATLAB.

The whole kicker is you can simply use the MATLAB function pdist2(p1, p2, 'euclidean') and be done with it (note that pdist2 ships with the Statistics and Machine Learning Toolbox, not base MATLAB). p1 is a matrix of points and p2 is another matrix of points (or they can be a single point).

However, initially I wasn’t really clear about what was going on. So if you are still a bit confused, let’s chat about it…

The scenario: You have one point, aPoint, that you wish to compare against a bunch of other points bunchOfPoints. For simplicity, let’s work in two-dimensional (2D) space. Note that 2D means each point is composed of two pieces of information (i.e. has 2 components).

We start by defining our points, then we calculate the L2 distance by hand, then we use the built-in pdist2() function to show we get the same result. And then finally, as a little bonus, we show how to get the minimum L2 Euclidean distance at the end.

Here’s how to calculate the equation by hand if you’re interested.
http://en.wikipedia.org/wiki/Norm_%28mathematics%29#Euclidean_norm

% disL2
 
% Define our points.
aPoint = [1,4]; % A single point with 2 components.
bunchOfPoints = [2,3; 1,4; 0,1]; % A bunch of other points.
 
% Make 'aPoint' the same size as 'bunchOfPoints'.
aPointMatrix = repmat(aPoint,size(bunchOfPoints,1),1);
 
% Calculate by hand.
%% L2 Euclidean Norm.
% http://en.wikipedia.org/wiki/Norm_%28mathematics%29#Euclidean_norm
% 1) Take the difference between the two -> aPointMatrix-bunchOfPoints
% 2) Square the difference to get rid of positive/negative: 
%       (aPointMatrix-bunchOfPoints).^2
% 3) Sum this up along the rows.
%       (sum(((aPointMatrix-bunchOfPoints).^2), 2))
% 4) Take the square root of this.
%       (sum(((aPointMatrix-bunchOfPoints).^2), 2)).^0.5
pointsDifSquare = (sum(((aPointMatrix-bunchOfPoints).^2), 2)).^0.5
 
% output = 1.4142 0 3.1623 (as a 3x1 column vector)
 
%% Or we can just use this handy built-in function...
d = pdist2(aPoint,bunchOfPoints,'euclidean') 
 
% same values! = 1.4142 0 3.1623 (as a 1x3 row vector)
 
%% Bonus, how to find the min distance!
[theMinDistance, indexOftheMinDistance] = min(d)
 
% Yeah! As expected theMinDistance = 0 and indexOftheMinDistance = 2.
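If you don't have the toolbox that provides pdist2, the by-hand version can be tightened up a little. This is only a sketch: it swaps repmat for bsxfun (which also works on older MATLAB releases), and the variable names diffs and closestPoint are assumptions.

```matlab
% Same points as above.
aPoint = [1,4];
bunchOfPoints = [2,3; 1,4; 0,1];

% Subtract aPoint from every row without building a repmat copy.
diffs = bsxfun(@minus, bunchOfPoints, aPoint);

% Square, sum along each row, then take the square root.
d = sqrt(sum(diffs.^2, 2));            % 3x1 column: 1.4142 0 3.1623

% Closest point and its distance.
[theMinDistance, indexOftheMinDistance] = min(d);
closestPoint = bunchOfPoints(indexOftheMinDistance, :);   % [1 4]
```

The values match the pdist2 result; the only difference is that d comes back as a column here instead of a row.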

MATLAB – how to make a movie of plots

Making a video of your moving graphs/charts is surprisingly easy to do in MATLAB. However, most of my online searches turned up old, outdated methods. Here’s how to make a movie or a video in MATLAB.

I kept getting this freakin’ error using the old methods (i.e. the avifile() function):
“Windows Media Player cannot play the file. The Player might not support the file type or might not support the codec that was used to compress the file.”

So I found MATLAB recommends you use the VideoWriter() class…
http://www.mathworks.com/help/techdoc/ref/videowriterclass.html

They have a nice little example in the documentation, but for the impatient, here’s my quick and dirty implementation of it (with some modifications/additions of course).
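The full listing is not part of this excerpt, so here is a minimal sketch of the usual VideoWriter pattern: open a writer, grab each plot as a frame, write it, then close the writer. The file name myPlotMovie.avi, the frame rate and the sine-wave plot are all placeholders.

```matlab
% Hypothetical output file; with no profile given, VideoWriter
% uses its default 'Motion JPEG AVI' profile.
outputFile = 'myPlotMovie.avi';
writer = VideoWriter(outputFile);
writer.FrameRate = 10;        % Frames per second (set before open).
open(writer);

fig = figure;
x = linspace(0, 2*pi, 100);
for k = 1:20
    % Draw one frame of the animation: a shifting sine wave.
    plot(x, sin(x + 0.3*k));
    axis([0 2*pi -1 1]);      % Keep the axes fixed between frames.
    drawnow;

    % Capture the figure and append it to the video.
    frame = getframe(fig);
    writeVideo(writer, frame);
end

% Closing the writer finishes the file.
close(writer);
close(fig);
```

Try not to resize the figure while the loop runs, since every frame written to the same video needs to have the same size.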

MATLAB – How to scale/normalize values in a matrix to be between 0 and 1

I hate that I have to keep looking this up…

Here’s how to scale or normalize your numbers in MATLAB so they lie between 0 and 1.
Using I(:) flattens the matrix into a single column, so one min and one max cover a matrix of any dimensionality.

I = [ 1 2 3; 4 5 6]; % Some n x m matrix I that contains unscaled values.
scaledI = (I - min(I(:))) ./ (max(I(:)) - min(I(:)));
min(scaledI(:)) % the min is 0
max(scaledI(:)) % the max is 1

All of the values in scaledI are now between 0 and 1.
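Two variations that come up often are guarding against a matrix where every value is the same (which would divide by zero) and scaling each column on its own. Here is a minimal sketch of both; mapping a constant matrix to all zeros is an assumption about what you would want, and bsxfun is used so the code also runs on older MATLAB releases.

```matlab
I = [1 2 3; 4 5 6];   % Same example matrix as above.

% Whole-matrix scaling with a guard for the constant case.
lo = min(I(:));
hi = max(I(:));
if hi == lo
    scaledI = zeros(size(I));          % Assumption: constant input -> zeros.
else
    scaledI = (I - lo) ./ (hi - lo);   % [0 0.2 0.4; 0.6 0.8 1]
end

% Column-wise scaling: each column gets its own min and max.
% (A constant column would still divide by zero here.)
colMin = min(I, [], 1);
colMax = max(I, [], 1);
scaledCols = bsxfun(@rdivide, bsxfun(@minus, I, colMin), colMax - colMin);
% scaledCols = [0 0 0; 1 1 1]
```

On newer releases (R2017b and later, as far as I know) the built-in rescale(I) does the whole-matrix version in a single call.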

SQL vs ANSI ~ Why I now avoid the old SQL syntax like it has a bad disease

Old SQL syntax vs. New ANSI syntax.

The short: to answer the question of whether you should use SQL vs ANSI, use ANSI.

Why?

Let me tell you a story…
A young man is frantically programming a website. This is the first real site he has ever worked on, so the learning curve is fairly steep. The website is to be used as an interface to a database storing a couple hundred thousand records.

Being quite used to desktop programming, this junior programmer is impatient and is coding using the old trial-and-error approach.

One query he is working on involves just a few tables being joined together. Pretty simple stuff… It went something like,

SELECT *
FROM specimens SP, box B
WHERE SP.box_fk = B.box_id

Then in a frenzied dash of debugging, he realizes he doesn’t want these tables joined so he comments out the last line,

SELECT *
FROM specimens SP, box B
-- WHERE SP.box_fk = B.box_id

He then faithfully tests his changes on his test database, notices no strange behavior and moves it over to the production side.

At his usual frantic pace, he executes the above statement three or four times from the webpage with no response. So he closes it, puzzled for a second, and then, brushing it from his mind, he begins working on another project.

About 3 and a half hours later, his supervisor comes running in asking why he’s getting a million phone calls and emails from IT saying his queries are shutting down the production databases.

Oops…

And ever since this unfortunate incident, this young programmer has used the ANSI syntax religiously.

So what happened?
1) A cross join.
Commenting out the last line resulted in every record in one table being joined with every record in the other, producing an enormous number of rows. Clearly not what he wanted.
2) The test database contained only a small number of records. Small enough, in fact, that the rows produced by the cross join could be successfully returned. However, the production database contained hundreds of thousands of records; this cross join was probably trying to pull back millions of rows.

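By the way, the new ANSI syntax (the good and holy way) of the same query would look like this,

SELECT *
FROM specimens SP
INNER JOIN box B ON SP.box_fk = B.box_id

Here the join condition is part of the JOIN itself, so you can't silently lose it by commenting out a WHERE line; in most databases a JOIN with a missing ON clause is a syntax error rather than a cross join.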

Two things I learned.
1) Make sure to have a decent amount of test data in your development database.
2) Use the new ANSI syntax.

MATLAB – How to check if a file or a folder exists

So you want to check if a file or a folder exists in MATLAB? Here’s how to do it.

% The file/folder to find.
fPath = 'aLittleFile.m';
 
% To see what each of these "magic numbers" mean, go to, 
% http://www.mathworks.com/help/techdoc/ref/exist.html
if isequal(exist(fPath,'file'),2) % 2 means it's a file.
    % We have a file!
    display('a file!');
elseif isequal(exist(fPath, 'dir'),7) % 7 = directory.
    % We have a folder!
    display('a folder');
else
    % We have an invalid file or folder.
    display('an error!');
end

Note that we use isequal to check whether it is actually a file or a folder. If we took the isequal check out, the first if statement would be true for both a file and a folder (for a folder, exist(fPath,'file') returns 7, which is nonzero, so the if condition would be true)!
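On newer MATLAB releases (R2017b and later, as far as I know) there is a way to skip the magic numbers entirely: isfile and isfolder return plain true/false. Here is a minimal sketch; the variable kind is only there for illustration.

```matlab
% The file/folder to find.
fPath = 'aLittleFile.m';

if isfile(fPath)
    kind = 'file';      % An existing file.
elseif isfolder(fPath)
    kind = 'folder';    % An existing folder.
else
    kind = 'neither';   % Nothing with that name at that location.
end
disp(kind);
```

One difference to keep in mind is that exist also searches the MATLAB path, while isfile and isfolder only look at the location you give them, so the two approaches can disagree for files that live elsewhere on the path.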

Run a MATLAB function/script with parameters/arguments from the command line

Here’s how to run a MATLAB function with parameters from the command line.

> matlab -r "littleFunction batman superman"

where littleFunction is the name of your MATLAB file (i.e. littleFunction.m), batman is the first parameter and superman is the second parameter. Note the quotes around the function name and the parameters! Note that the function name does NOT include the “.m” extension.

If you need a bit more of an example, read on…

First we create a little function with two parameters.

%%%%%%
% Create a function.
function littleFunction(parameterA, parameterB)
 
display(parameterA);
display(parameterB);
 
% Uncomment this to exit MATLAB when complete.
%exit;

Note that the little exit; at the end can be uncommented if you want to close MATLAB immediately when it reaches the end of your code.

Then we navigate to the directory where littleFunction lives, and from the command line we type,

> matlab -r "littleFunction batman superman"

When we run this command, MATLAB will start and run this function. You will see batman and superman being displayed.

parameterA =
 
batman
 
parameterB =
 
superman

And you’re done!
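One thing to watch out for: when parameters are passed this way, every one of them arrives as text (a char array), even if it looks like a number. Here is a minimal sketch of handling that, assuming a hypothetical function saved as addTwo.m.

```matlab
% Hypothetical function, saved as addTwo.m.
function addTwo(a, b)

% Parameters typed as "addTwo 2 3" arrive as the text '2' and '3',
% so convert them to numbers before doing any arithmetic.
if ischar(a), a = str2double(a); end
if ischar(b), b = str2double(b); end

total = a + b;
disp(total);

end
```

Running matlab -r "addTwo 2 3" should then display 5. You can also put a normal function call inside the quotes, e.g. matlab -r "addTwo(2, 3)", in which case the parameters arrive as real numbers and the conversion is simply skipped.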