Introduction to Matlab[1]
MATLAB® is a computer program by MathWorks, Inc. for people doing numerical computation. Matlab handles numerical calculations and high-quality graphics, provides a convenient interface to built-in state-of-the-art subroutine libraries, and incorporates a high-level programming language.
The purpose of this document is to introduce you to some of the basic capabilities of Matlab which you will find useful for a range of data analysis, plotting, or modelling applications. You will learn to work with Matlab by solving a series of exercises to read, analyse and plot a variety of micrometeorological data.
0. Getting started
For the purpose of this introduction, we will assume that you will be using Matlab on a PC-Windows machine, although the appearance and functionality of Matlab are largely platform independent.
Begin by starting Matlab from the Start menu. The command window will appear, and this is where you enter commands. Matlab is “command driven”, unlike Excel and most other Windows-based programs.
In this document the symbol >> represents the Matlab prompt in the command window. Commands following it indicate that you should type these in; Matlab commands or responses are printed in Bookman Font. Pushing the Enter key causes the command to be executed.
Matlab has extensive help resources, available under the "Help" tab of the MATLAB window, or type
>> help topic
where topic is the name of the function or subject you want to look up.
Hint: Use the ↑ (up-arrow) key to recall previous commands, which you can then edit or re-execute. Type a letter or the first part of a command followed by the ↑ key, and the most recent matching command will be recalled.
1. Basic input and display
Matlab stands for MATrix LABoratory, and was originally written to perform matrix algebra[2]. It is very efficient at doing this, as well as having the capabilities of any normal programming language. In most instances, the data we will be dealing with are simple matrices, either single values, or one- or two-dimensional arrays. An example of a one dimensional array (vector) is a single row or a single column of values.
Assign a value into Matlab as a variable
>> fred=10.5
Matlab will reply with fred= 10.5000
Use a semi-colon to prevent the screen output (this is crucial when working with large datasets!)
>> Fred=12;
A row array can be entered as (try these)
>> row1=[1,2.5,1/3,6,pi,Fred]
>> row2=1:10
>> row3=0:0.2:1     % increment by 0.2 (if no increment is given, the default is 1)
Get a list of the variables currently in the workspace, and their sizes
>> whos
  Name      Size      Bytes  Class
  Fred      1x1           8  double array
  ans       1x1           8  double array
  fred      1x1           8  double array
  row1      1x6          48  double array
  row2      1x10         80  double array
  row3      1x6          48  double array
Grand total is 25 elements using 200 bytes
Note that Matlab is case-sensitive, hence fred and Fred are different variables.
You can also use the functions
>> size(row2)
>> length(row3)
Use semicolons to indicate the end of a row or to create a column array.
>> col1=[1;2;3;4]
>> table1=[4,3,3,4; 5:8; 9:12]
To see the value of a variable, just type its name (if it's huge and you need to stop the display, press Ctrl-C)
>> row3
You can transpose an array (so a 1 × 6 row vector becomes a 6 × 1 column vector)
>> col2=row3'
You can index individual values in an array to extract a subset of the data
>> x=col1(4)
>> y=table1(2,3)
>> col3=table1(:,3)     % the colon is shorthand for “every row”, so this is all of column 3
>> row4=table1(2,:)     % likewise, every column of row 2
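The indexing forms above can be combined freely. The following is a minimal sketch that reuses table1 from this section; the keyword end (not introduced above) stands for the last index along a dimension, and the variable names block, lastrow and corner are only illustrative.

```matlab
% Rebuild the example table from this section
table1 = [4,3,3,4; 5:8; 9:12];

% Rows 2 to 3, columns 1 to 2 (a 2-by-2 sub-table)
block = table1(2:3, 1:2);

% 'end' stands for the last index in that dimension
lastrow = table1(end, :);      % same as table1(3,:)
corner  = table1(end, end);    % bottom-right element

% Indexing also works on the left-hand side, to overwrite values
table1(1, 2:3) = 0;            % set two elements of row 1 to zero
```

Typing block, lastrow or corner at the prompt afterwards displays each result.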
2. Basic operators and functions
Matlab’s power comes from the way in which it can perform computations on large datasets with great efficiency. In micrometeorology we are mostly concerned with calculations that involve manipulating 1-dimensional (1 × n or n × 1) or 2-dimensional (m × n) matrices, much as we perform column-by-column calculations of time series data in Excel.
Let’s first clear Matlab’s workspace
>> clear
Input some variables, so that x=5.5, a=[1.2,1.4,1.1,1.4,1.3], b=[0.5,0.2,0.4,0.1,0.9].
Use of operators is fairly straightforward, so see what you get trying
>> y=x+1
>> y=x^(x/2)
>> c=a-b
>> c=a*x
>> c=a*b
Note that when you subtracted b from a you ended up with an array the same size as the originals, but the final multiplication operation resulted in an error, when what we wanted to do was just multiply the elements of one row by the matching elements of the other. To do this, you must use a period in front of the operator[3], so the multiplication operator becomes .*
>> c=a.*b
>> c=a./b
>> c=a.^3
Forming equations is easy, but you have to be very careful where you put brackets to determine the order of operations (just as in Excel), and also use the period notation where necessary (when in doubt, use it! Find out from the help resources what the period signifies here).
>> c=a.^2./(pi*b)
There are a host of basic Matlab functions, e.g.
>> c=mean(a)
>> c=sum(a)
>> c=cumsum(a)
>> max(a)
Most of the short names of these functions give a fairly intuitive indication of what the function does. See the Help window for what else is available and to see the definition and usage of a particular function. Try to find out what function you would use to:
i. convert a decimal day number into month, day, hour, minute
ii. read in data from an Excel spreadsheet file.
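The help entry for a function also tells you whether it can return more than one output. As a small sketch using the array a from this section: max can report where the maximum occurs, and cumsum returns a running total. The variable names amax, imax, running and difference are illustrative only.

```matlab
a = [1.2,1.4,1.1,1.4,1.3];

% max can return the position of the maximum as a second output
[amax, imax] = max(a);     % the first occurrence is reported when there are ties

% cumsum returns a running total with the same size as its input
running = cumsum(a);

% the last running total equals the plain sum (to within rounding)
difference = abs(running(end) - sum(a));
```

Here the value 1.4 occurs twice, and imax points to the first of the two.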
3. Simple graphics
Matlab is a powerful tool for visualising data and its basic plotting functions are very simple. Try the following examples.
Plot the sine function from 0 to 2π. First create a dataset which has 201 elements, at pi/100 increments, then calculate its sine:
>> x=0:pi/100:2*pi;
>> y=sin(x);
>> plot(x,y)
>> title('Graph of the sine function')
>> xlabel('x')
>> ylabel('sin(x)')
>> grid on
You can inspect plotted data very easily using the zoom command, and using the mouse to select an area of the graph. Zoom in on the above graph until you can see the individual data points; a double-click will return the original graph.
>> zoom on
Variations of the plot function allow you to select line and marker types:
>> plot(x,y,'r*')     % plots the sine function using red asterisks
>> plot(x,y,'go:')    % with green circles and a dotted line
Type help plot for more information.
You can include more than one dataset on a graph, e.g.
>> y2=sin(x-0.25);
>> y3=sin(x-0.5);
>> plot(x,y,x,y2,x,y3)
or a shorthand version is
>> plot(x,[y;y2;y3])     % each row of the matrix is plotted as a separate line
Add a legend
>> legend('sin(x)','sin(x-0.25)','sin(x-0.5)')
Much of the time we are dealing with time series data, e.g. hourly values of temperature, so the x-axis usually consists of time or date.
Repeated plot commands will just replace the previous figure. To create a new figure window:
>> figure
>> plot(x,y3)
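If you want to add a curve to an existing figure instead of replacing it, the hold command can be used. This is a minimal sketch only (hold is not introduced above); x, y and y3 are recreated so that the example stands alone.

```matlab
x  = 0:pi/100:2*pi;
y  = sin(x);
y3 = sin(x-0.5);

figure                 % open a new figure window
plot(x,y,'b-')         % first curve
hold on                % keep the existing curve when plotting again
plot(x,y3,'r--')       % second curve is added to the same axes
hold off               % later plot commands replace the figure again
legend('sin(x)','sin(x-0.5)')
```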
4. Some more sophisticated stuff — indexing and data error handling
If you are dealing with large datasets, but only want to graph a part of it, or reduce it down in size, you need to be able to select out certain parts.
Plot the cosine function from 101 to 200 (the “time” variable) then select a subset of data for times 150:170. We use array indexes in order to do this:
>> time=101:200;
>> y=cos(time);
Find the indexes of the time values we want, and store these in an array called ind.
>> ind=find(time>=150 & time<=170);
>> plot(time(ind),y(ind))
or cut the original dataset down to just the data of interest
>> time=time(ind);
>> y=y(ind);
>> plot(time,y)
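The same selection can also be written without find, by using the logical expression itself as the index (often called logical indexing). The sketch below starts again from the full 101:200 time array; the names keep, tsub, ysub and same are illustrative.

```matlab
time = 101:200;
y = cos(time);

% keep is a logical array: true where the condition holds
keep = (time>=150 & time<=170);

% a logical array can be used directly as an index
tsub = time(keep);
ysub = y(keep);

% find gives the same elements via their numeric positions
ind  = find(keep);
same = isequal(ysub, y(ind));
```

Both forms select the same 21 points; find is the one to use when you need the positions themselves.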
Handling errors in the data can be tricky. Environmental data can include errors for many reasons, e.g. instrument malfunctions. Let’s add a typical error “spike” to our new cosine dataset, then try to plot the result:
>> y(9)=1000;
>> plot(time,y)     % note how the auto-scaling means we can’t see our “good” data!
We could rescale the y-axis but this is messy and still shows part of the spike:
>> axis([time(1),time(length(time)),-1,1])     % format is axis([xmin,xmax,ymin,ymax])
or we could delete the datapoint. One way is to replace it with Matlab’s special value “not a number”, or NaN, which will appear on the graph as a gap. In computations, the NaN will be carried through to the result, so it might need to be filtered out at some stage.
>> y(9)=NaN;
>> plot(time,y)
When we know about certain error conditions, say when spikes always result in very large or very small values, or are always the same number, we can filter them out:
>> ind=find(y>100);
>> y(ind)=NaN;
Let’s say we have a long dataset of temperatures, and when the logger is not working, our dataset contains zeros. Part of the dataset is
>> Temp=[25 23.5 23 23.8 0 0 26 26.5 26.7 25.9];
To avoid having our graph spike down to zero when this happens, we can find and replace these
>> ind=find(Temp==0);     % note the double equal sign! Find out what this means!
>> Temp2=Temp;            % create a duplicate dataset to preserve the raw data
>> Temp2(ind)=NaN;
CAREFUL: maybe it is possible to have a valid datapoint with a value of exactly zero.
Neither dataset now gives a usable average: the NaN values in Temp2 carry through to the result, and the zeros in the original Temp bias the mean low:
>> Tavg=mean(Temp2)     % returns NaN
>> Tavg=mean(Temp)      % too low, because the zeros are included
If we want to compute the average without deleting the bad datapoints we can use:
>> ind=find(Temp~=0);       % i.e. find indexes of “good” (non-zero) data
>> Tavg=mean(Temp(ind));    % mean of just the valid datapoints
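Once bad points have been flagged with NaN, they can also be excluded from a calculation without going back to the raw data. The sketch below uses the built-in function isnan (not introduced above); the names good, Tavg2 and nbad are illustrative.

```matlab
Temp  = [25 23.5 23 23.8 0 0 26 26.5 26.7 25.9];
Temp2 = Temp;
Temp2(Temp==0) = NaN;        % flag the logger dropouts

good  = ~isnan(Temp2);       % true for valid datapoints
Tavg2 = mean(Temp2(good));   % average of the valid points only
nbad  = sum(~good);          % number of flagged points
```

This gives the same average as the find(Temp~=0) approach, but works on the cleaned dataset, so it also covers points that were flagged for other reasons.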
5. Reading data from a text file
In your working directory, create the following text file: junk.dat, consisting of time (say) and data columns (you can use the notepad program to do this):
11 6 3
12 8 7
13 1 0
14 2.2 -9
You can read a text file into Matlab using
>> load 'junk.dat'     % if the file is in a different directory, include the entire path
You will now have an array called junk. Break it up into column arrays (vectors):
>> time=junk(:,1)     % the colon is shorthand for “every value in column 1”
>> stuff1=junk(:,2)
>> stuff2=junk(:,3)
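The whole sequence can be sketched in script form. To keep the example self-contained it writes junk.dat itself with fprintf instead of Notepad, and it uses the function form of load, which returns the array under a name of your choosing (data is an illustrative name here).

```matlab
% Write the four-line example file (normally you would already have it)
fid = fopen('junk.dat','w');
fprintf(fid,'%g %g %g\n',[11 6 3; 12 8 7; 13 1 0; 14 2.2 -9]');
fclose(fid);

% Function form of load: the result goes into a variable we name
data = load('junk.dat');

time   = data(:,1);
stuff1 = data(:,2);
stuff2 = data(:,3);
```

Note the transpose in the fprintf call: fprintf works through a matrix column by column, so the table is transposed to write one row of the file at a time.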
6. Using Scripts and Functions
Here’s where Matlab really takes off. If you are analysing a dataset, you want to be able to repeat it later. You should never ever delete raw data, no matter how sure you are that it is in error or that your analyzed data product is correct. Matlab allows you to keep your raw data, and you only need to keep the analysis instructions (the equations etc) as a script file or as a function. Within these files you can do all of the data processing you like, including modifying or ignoring bad data points, without resorting to editing your raw data.
One huge advantage of Matlab over Excel is that you only need to type each equation once, rather than copy it multiple times, so error checking is really easy.
Before you create script files or functions, you need to let Matlab know where to look for them. Examine Matlab’s default search path:
>> path
Add a new path, while retaining the existing paths (look at the following examples):
>> path(path,'a:\')     % only if you have a floppy disk in the computer
>> path(path,'I:\micromet\matlab')
To create script files and functions, use Matlab’s editor (File/New/m-file), and to edit an existing one use File/open, or if the editor is already open, use its file functions.
SCRIPTS
A script is just a collection of standard Matlab instructions, and these execute when you type the script name. A script file must end with the extension .m, so a valid script name on Matlab’s search path would be soiltemp.m. The script file can contain blank lines and comments, using the % sign. An example of the contents of a script file is
% soiltemp.m
% To read and plot soil temperatures
% Dave Campbell 30 March 1999
load 'soilT.dat'
time=soilT(:,1);
soil_temp=soilT(:,10);
plot(time,soil_temp)
FUNCTIONS
A function is a similar collection of Matlab instructions, however it only returns specified variables to Matlab’s workspace. It also runs much faster because Matlab compiles it when first called. You typically supply input variables and receive output.
A function must have a header line. An example function is included below which computes saturation vapour pressure given an input air temperature (it only has one instruction, but it could have thousands). Create this function yourself, and make sure you can run it, e.g.
>> es=satvap_T(20)     % input 20 °C
To be able to process the function satvap_T, your working directory must contain an m-file of the name satvap_T.m (where obviously 'satvap_T' can be replaced by whatever function name you choose). For the present function, to compute saturation vapor pressure using Teten's formula, the m-file would contain the following:
function [es] = satvap_T(T);
% es = satvap_T(T);
% Function to calculate saturation vapour pressure using Teten's formula
% Input T : air temperature (deg. C)
% Output es : sat. vapour pressure (kPa)
es = 0.61365.*exp((T.*17.502)./(240.97+T));
Here’s some of Matlab’s simplicity and power for you. Create a temperature array from -10 °C to +40 °C using 0.1 °C increments, and plot the saturation vapour pressure (svp) function:
>> Tair=-10:0.1:40;
>> es=satvap_T(Tair);
>> plot(Tair,es)
Add a title, x- and y-axis labels, and print the graph
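Because every operator in satvap_T is the element-wise form, the function accepts a scalar, a vector or a matrix. The sketch below uses an anonymous function with the same formula so that it runs without the m-file; the handle name satvap and the label texts are illustrative choices, not part of the original function.

```matlab
% Same formula as satvap_T.m, written as an anonymous function
satvap = @(T) 0.61365.*exp((T.*17.502)./(240.97+T));

es0  = satvap(0);            % scalar in, scalar out (0.61365 kPa at 0 deg. C)
Tair = -10:0.1:40;           % vector in ...
es   = satvap(Tair);         % ... vector of the same size out

plot(Tair,es)
title('Saturation vapour pressure')
xlabel('Air temperature (deg. C)')
ylabel('e_s (kPa)')
```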
7. Working with matrices
Since Matlab was developed to perform computations with matrices, we need to review some matrix basics.
A matrix is a set of numbers arranged in rows and columns so as to form a rectangular array. The numbers are called the elements, or entries, of the matrix. Matrices have wide applications in engineering, physics, economics, and statistics as well as in various branches of mathematics. The term matrix was introduced by the 19th-century English mathematician Arthur Cayley, who developed the formal algebraic aspect of matrices (although elements of matrix algebra were already used in ancient Mesopotamia and East Asia much earlier). The algebra of matrices, known as linear algebra, constitutes the most efficient way to work with, and solve, systems of linear equations[4].
For our purposes, we will mostly use matrices simply as a tool to organize data in tables (usually numbers, but character strings are also possible), much like the rows and columns in a spreadsheet. In Matlab, matrices are similar to arrays in other programming languages (such as Fortran, BASIC, or C), but be aware that matrix operations are different from array operations! However, some basic knowledge of matrix properties and matrix algebra will help us to organize, analyze and plot data efficiently using Matlab.
We have already seen in Section 1 how matrices can be created and entered in Matlab. The dimension of a matrix is given by the number of rows and columns. A matrix A with n rows and m columns is an (n × m) matrix. Matrix elements are referred to by indexing. For matrix A, the element in the i-th row and the j-th column is written A(i, j), or A(row, column).
In Matlab, a matrix of dimension (1 × 1) is effectively just a number or a scalar. A matrix of dimension (1 × m) is a row-vector of length m. Conversely, a column-vector of length m has dimension (m × 1). See Section 1 for how to enter row- or column-vectors and how to convert from one to the other (transpose).
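These definitions can be checked directly with size. A short sketch, using a small made-up matrix (all variable names and values are illustrative):

```matlab
A = [1 2 3; 4 5 6];        % 2 rows, 3 columns

[nrows, ncols] = size(A);  % size can return each dimension separately
element = A(2,3);          % row 2, column 3

s = 7;                     % a scalar is a 1-by-1 matrix
r = [1 2 3];               % row vector, 1-by-3
c = r';                    % transpose gives a column vector, 3-by-1
```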
Basic matrix algebra and array operations
To enter the matrix
   1 2
   3 4
and store it in a variable “A”, do this:
>> A = [1 2; 3 4]     % you can use spaces or commas to separate the columns, but rows must be separated by a semicolon
Enter another matrix B:
   1 2
   0 1
>> B=[1 2; 0 1];
Multiply A times B:
>> A*B
ans =
     1     4
     3    10
Now multiply B times A:
>> B*A
ans =
     7    10
     3     4
Notice that the two products are different: matrix multiplication is non-commutative. Use the help resources to find out why this is so. What algorithm does Matlab perform with the instruction
>> B*A?
Now try the same thing, but using the array multiplication operator .* instead of the matrix operation:
>> A.*B     % note the result; and then do
>> B.*A     % compare with the above.
In array multiplication, A(i,j) is multiplied by B(i,j) element by element, and thus this operation is commutative. Obviously A and B must be of the same size, unless one of them is a scalar. What happens if one of them is a scalar? Try it out!
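For the scalar case, the scalar is applied to every element, and for a scalar operand the operators * and .* give the same result. A short sketch with the two matrices defined above (the names P1, P2, S1, S2 and S3 are illustrative):

```matlab
A = [1 2; 3 4];
B = [1 2; 0 1];

P1 = A.*B;      % element by element: [1*1 2*2; 3*0 4*1]
P2 = B.*A;      % same result, the operation is commutative

S1 = 2.*A;      % the scalar multiplies every element
S2 = 2*A;       % for a scalar operand, * and .* agree
S3 = A + 10;    % addition with a scalar also applies to every element
```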
Similar care must be taken with the division operators (note that there is right division and left division in Matlab). Refer to the help resources.
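As a hedged illustration of the difference: the left-division operator \ solves a system of linear equations, while ./ divides element by element. The right-hand side b and the second matrix B below are invented values for this example only.

```matlab
A = [1 2; 3 4];
b = [5; 6];

x = A\b;              % left division: solves A*x = b
residual = A*x - b;   % should be zero to within rounding

B = [1 2; 4 8];
Q = A./B;             % array right division, element by element
```

Multiplying back with A*x is a quick way to confirm that x really solves the system.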
We can easily add or subtract matrices:
>> A + B     % see what happens
>> A - B     % see what happens
Again, A and B must be of the same size, unless one of them is a scalar. What happens if one of them is a scalar? Try it out!
Matrix manipulation
Define the matrix A:
>> A=[35 1 6; 3 32 7; 31 9 2; 8 28 33];
To find out how many rows and columns your matrix has:
>> size(A)
ans =
     4     3
What is the mean of A ?
>> mean(A)
ans =
19.2500 17.5000 12.0000
Notice that Matlab averages down each column.
If you want to average across rows – simply average the transpose of A:
>> mean(A')
ans =
14 14 14 23
Note how the transpose of A is written: it is A'. Type:
>> A'
to see what the transpose looks like. Do you see a symmetry between a matrix and its transpose?
We want to find the mean of all the numbers in matrix A:
>> mean(mean(A))
ans =
16.2500
Find the minimum number in all matrix A:
>> min(min(A))
ans =
1
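The same results can be obtained without transposing or nesting the calls, using the optional dimension argument of mean and the colon index A(:), which strings all elements of A into one long column. A sketch (the variable names are illustrative):

```matlab
A = [35 1 6; 3 32 7; 31 9 2; 8 28 33];

colmeans = mean(A,1);    % average down each column (the default)
rowmeans = mean(A,2);    % average across each row, returned as a column

allmean = mean(A(:));    % A(:) is every element as one long column
allmin  = min(A(:));
```

Note that mean(A,2) returns the row averages as a column vector, whereas mean(A') returns them as a row.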
You can readily sort the matrix A
>> sort(A)
ans =
     3     1     2
     8     9     6
    31    28     7
    35    32    33
Notice that each column is sorted separately. Use the help resources of Matlab to find out how you can sort the rows of the matrix according to the order in a given column – an operation that is most likely more useful for you.
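One function that does this is sortrows, which reorders whole rows and so keeps each row intact. The sketch below sorts A by its second column; the second half shows an equivalent route through the index output of sort (the names S, S2, sorted and order are illustrative).

```matlab
A = [35 1 6; 3 32 7; 31 9 2; 8 28 33];

% Reorder whole rows so that column 2 is ascending
S = sortrows(A,2);

% The second output of sort gives the same ordering explicitly
[sorted, order] = sort(A(:,2));
S2 = A(order,:);
```

The index vector order is useful in its own right: it can be applied to any other array with the same number of rows, e.g. a matching time column.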
In micrometeorological data analysis, we often want to stratify data according to certain criteria; we call this "data mining". For some purposes, sorting is a valuable tool. More often, we will want to identify and select data that satisfy certain conditions. For example, we will want to select all data of variable X (data stored in array X) for which the concurrent variable Y (data stored in array Y) lies within a certain range of values. We may want to look at evaporation flux data, but only for conditions when the net radiation is positive; or we may want to evaluate the average temperature for each hour of the day over a given month (i.e., the monthly averaged daily course of temperature).
To achieve this, Matlab provides a simple function that is much more powerful than initially meets the eye: Z=find(X). Look this function up in the help resources. Z is a vector that contains the linear indices of all elements of array X that are non-zero. X can be replaced by a logical expression that imposes a criterion on a data array by some relational operator, as in the example below.
You can find the indices of all elements less than 10:
>> I=find(A<10);
>> A(I)
ans =
     3
     8
     1
     9
     6
     7
     2
If you have another matrix B that has the same structure as A (e.g., both are arrays with hourly observed data of different variables), the expression B(I) will return all elements of B for which A is less than 10 (in the example above), and mean(B(I)) will return the average of those selected elements.
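The two selection tasks mentioned earlier (evaporation only when net radiation is positive, and the average temperature at a given hour of the day) can be sketched in the same way. All numbers and variable names below (Rn, E, hour, T and so on) are invented purely for illustration.

```matlab
% Invented hourly values, for illustration only
Rn = [-50 -20 100 350 420 -10];       % net radiation (W m^-2)
E  = [0.0 0.01 0.10 0.30 0.35 0.02];  % evaporation (mm h^-1)

I     = find(Rn > 0);    % indices of hours with positive net radiation
Eday  = E(I);            % evaporation for those hours only
Emean = mean(Eday);      % their average

% Hour-of-day averaging: mean of all values recorded at one hour
hour = [0 12 0 12 0 12];     % hour label of each observation
T    = [10 20 12 22 14 24];  % temperatures (deg. C)
T12  = mean(T(hour==12));    % average over all 12:00 readings
```

Repeating the last line for each hour from 0 to 23 would give the averaged daily course of temperature.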
You see that sort, find, mean and other Matlab functions are extremely powerful tools for data analysis and data mining. Use the help window in Matlab to peruse the variety of functions that are available in each category. Of course, we often have specialized data formats or analysis needs. Matlab easily allows us to construct our own specialized functions and work towards putting together our own Matlab toolbox.
-----------------------
[1] This document draws heavily from “Introduction to Matlab” by Dr. Dave Campbell, University of Waikato, Hamilton, New Zealand; “Matlab BASICS” by Prof. Gabriel Katul, Duke University, NC, USA; and “Introduction to Matlab” by Prof. Hans Peter Schmid (at that time at Indiana University, IN, USA). Any errors in this document are mine.
[2] See Section 7, below.
[3] See Section 7 about the significance of the .* operator.
[4] For more on this topic see, e.g., http://mathworld.wolfram.com/Matrix.html