The Chinese University of Hong Kong
Department of Computer Science and
Engineering

Final Year Project
Trading Strategy and Portfolio
Management (LWC 1301)
Implementing Portfolio Selection By Using Data Mining

Tseng Ling Chun (1155005610)
Supervisor: Professor Chan Lai Wan
Marker: Professor Xu Lei

Table of Contents
1. Introduction
   1.1 Financial Portfolios
   1.2 Data Mining and Decision Trees
   1.3 Flow of Report
2. Classification and Regression Trees (CART)
   2.1 Detailed Description
   2.2 Tree Construction
      2.2.1 Application of Impurity Function in CART
   2.3 Splitting Rules
3. Optimizing Size of Tree
   3.1 Parameterization of Trees
   3.2 Cost-Complexity Function
   3.3 V-Fold Cross-Validation
4. Iterative Dichotomiser 3 (ID3)
   4.1 Entropy and Information Gain
5. Data Used
   5.1 Platforms and Open Source Libraries
      5.1.1 Testing and Development Environment
      5.1.2 Robotrader
      5.1.3 Java Object Oriented Neural Engine
      5.1.4 TA-Lib (Technical Analysis Library)
      5.1.5 WEKA
   5.2 Historical Stock Data Source
      5.2.1 Raw Data Format
   5.3 Pre-processing of Information
      5.3.1 Simple Moving Average (SMA)
      5.3.2 Exponential Moving Average (EMA)
      5.3.3 Relative Strength Index (RSI)
      5.3.4 Momentum and Rate of Change
6. Experiments and Results
   6.1 Trading Strategy
      6.1.1 Planning of Trading Strategy
      6.1.2 Choice of Stock
      6.1.3 Finding Rising / Dropping Stock in the Following 30 Days
         6.1.3.1 Flow of Testing
         6.1.3.2 Choice of Method
         6.1.3.3 Test Without Using SMA in CART
         6.1.3.4 Test Using SMA in CART, Adding Momentum-Related Attributes and Modifying the Class
         6.1.3.5 Classification Using the ID3 Algorithm
   6.2 Portfolio Management
      6.2.1 Planning of Portfolio Management
      6.2.2 Choice of Stock List
      6.2.3 Finding Rising Stock in the Following Year
         6.2.3.1 Flow of Testing
         6.2.3.2 Test Using Exact Values
         6.2.3.3 Test Using Proportion Values
         6.2.3.4 Summary on Choice of First Step
      6.2.4 Choosing Low-Correlation Stocks from the List
         6.2.4.1 Flow of Testing
         6.2.4.2 Method
         6.2.4.3 Testing Using Monthly Percentage-Change Data
7. Conclusion
8. Difficulties and Challenges
9. Contribution of Work
Appendix A
   A.1 Decision Tree from CART
   A.2 ID3 Testing Result
   A.3 Sample Data Used in ID3 Algorithm
   A.4 Sample Data Used in CART Algorithm
   A.5 Sample Data Used in K-means Clustering


1. Introduction
With the fast expansion of computer technologies and the Internet in the past decade, a massive number of financial investment tools and portfolios have been published by different financial institutions. Since many people trade in different markets through different financial institutions and portfolios, this situation attracts many economists and mathematicians to investigate different portfolio simulations so as to obtain higher returns from the market with less risk. In parallel, the ability of computers to handle the enormous amounts of data generated by the markets has improved, and during this improvement different data mining techniques have been put to use.
In this report, we try to select the best portfolio among different combinations by using ideas from data mining techniques, including Classification and Regression Trees (CART) and Iterative Dichotomiser 3 (ID3).

1.1 Financial Portfolios
"Portfolio" is meant in the sense of "collection"; in finance it directly means a collection of different investments. It is important not to put all resources into one single security, because different securities have different characteristics, such as liquidity, over different time series. Diversification of portfolios has therefore become a hot issue recently, out of concern for reducing systematic and specific risk. However, these concerns are not the main subject of this report.

1.2 Data Mining and Decision Trees
Data mining comprises techniques for digging out useful information hidden in vast amounts of data. It can be used to reduce noise, analyze data from different dimensions, and summarize the relationships between data. Basically, it is a process of finding correlations or patterns in a large relational database. Data mining methods include association, clustering, classification, regression, deviation detection, summarization and sequential mining. Our report uses Classification and Regression Trees to select a portfolio.
A decision tree is a non-parametric learning method with a schematic tree shape, used for classification and regression. The goal of a decision tree is to derive a model that can predict the values of target variables by applying simple decision rules, which are induced from the features of the data used.
The following learning example shows that a decision tree model learns from data and approximates a curve with if-then-else rules.

Figure 1.1: Example of a decision tree model with if-then-else decision rules

1.3 Flow of Report
The structure of this report is as follows. Chapter 2 presents the decision tree methodology used in the report, CART, introducing important components such as impurity measures and the Gini splitting rule. Chapter 3 shows how to optimize the tree in order to get a better result by applying cross-validation to the model. Chapter 4 introduces the second method, ID3. Chapter 5 gives a detailed description of the data used for the experiments, chapter 6 provides the experiments and results, and chapter 7 concludes.


2. Classification and Regression Trees (CART)
2.1 Detailed Description
Classification and Regression Trees (CART) is a non-parametric method that uses available data in the form of

where X represents the matrix of

explanatory variables and Y represents the vector of data classes. There exists a situation that the available data might not belong to any classes and therefore further computation is need for the data to meet the characteristic of Y, computed available data imply X.
To select a portfolio or stock, we have to first know what kind of the portfolio or stock is, that means what class Y they belong to. For example, a stock can be defined as bearish or bullish or subjectively a neutral market, therefore class vector are predefined classes of stocks. X

then may contain available data such as technical variables related to the predefined three classes. The detailed construction of vector y will be told in another chapter.
Since there are so many observations inside a learning sample, we now assume each of them has its own class, therefore we can treat them as combinations of available data X and class vector Y and they are used to extract useful data patterns from the learning sample by “learning”. For example, the decision tree T gets outcomes from those samples and try to evaluate what is the relationship between X and Y. After explaining, the model can insert new data out of the sample pool into classes from Y. This process is important since market information is always updating in a large amount and this trained model is then available to give suggestion of to buy, sell or hold from new data.
Now consider the example illustrated in Figure 2.1, built with CART. CART tends to simplify the data by splitting them with a minimal number of questions. Note that CART only answers yes/no questions, such as "Is X1 ≤ 0.5?". In this example, the nodes tagged Apple, Orange, Banana, Grape and Melon are the terminal nodes t_k, where k = 1, 2, ..., n. If the answer to a question is yes, the left branch is taken.
[Figure 2.1: Example of a classification tree with a minimal number of questions. The internal nodes ask the questions X1 ≤ 0.5, X2 ≤ 0.5, X1 ≤ 0.75 and X2 ≤ 0.25; the terminal nodes are Apple, Orange, Banana, Grape and Melon.]
CART goes through all the available variables to find the candidate splits s, each a combination of available data X and a suitable question value. Among all these splits, the optimal split s* is the one that divides the data into two parts with maximum homogeneity. The process is repeated on each part until all splits are "optimal"; the resulting tree then has an "optimal" size, meaning it contains a minimal number of questions.

Figure 2.2: Result of the application of CART to the tree of Figure 2.1 with 2 variables. Four splits are enough for the data to be separated into the different nodes; red lines indicate the questions.

2.2 Tree Construction
Since in this project we are going to apply a decision tree to a vast amount of data, constructing the tree that separates the observations and extracts the relationships between them is an important topic. It involves three major steps:
1. A maximum tree T_MAX is constructed;
2. The tree is adjusted to the right size; and
3. New data are fed into the newly constructed tree.
First of all, the available data X = (X_1, X_2, ..., X_P), where P is the number of variables, and the class vector Y has the same length as the number of observations, because each observation belongs to exactly one class. We let J be the number of unique classes; in the three-class stock example above, J = 3.

We now let t_P be the parent node, and t_L and t_R the left and right child nodes; a fraction p_L of the parent's observations goes to the left node and a fraction p_R = 1 - p_L to the right node. If n_P is the number of observations in the parent node, and n_L and n_R are the numbers of observations in the left and right child nodes, then:

    p_L = \frac{n_L}{n_P}, \qquad p_R = \frac{n_R}{n_P} = 1 - p_L        (Eq. 2.1)

As mentioned in Ch. 2.1, CART splits the initial data (parent node) into two separate parts (child nodes) in order to find the most homogeneous groups and so obtain an "optimal" tree. The word homogeneity is made precise by the impurity function i(t).

2.2.1 Application of Impurity Function in CART
The impurity function measures the purity of a region containing data from different classes. Assume there are K classes; then the impurity function is a function of the probabilities p_1, p_2, ..., p_K with which a region's data belong to the classes y_1, y_2, ..., y_K. Two major properties of this function are:

1. It achieves its maximum at the point (1/K, 1/K, ..., 1/K), i.e. at the uniform distribution.
2. It achieves its minimum at the points (1, 0, ..., 0), (0, 1, ..., 0), ..., (0, 0, ..., 1), where all data of a region belong to a single class.

The impurity function provides a means of splitting a node into left and right child nodes with maximum homogeneity compared to the parent node. The chosen split is the one that maximizes the change of the impurity function at node t:

    \Delta i(t) = i(t) - \sum_{c \in C} p(c)\, i(c)        (Eq. 2.2)

where C is the set of child nodes of t and p(c) is the fraction of the observations of t that falls into child c.

To apply this to CART, we let the two fractions p_L and p_R be the estimated probabilities of the left and right child nodes, and define the goodness of a split s for node t as:

    \Delta i(s, t) = i(t_P) - p_L\, i(t_L) - p_R\, i(t_R)        (Eq. 2.3)

By Eq. 2.3, the optimization problem solved at each node is:

    s^* = \arg\max_{s \in S} \Delta i(s, t)        (Eq. 2.4)

Eq. 2.4 is the algorithm by which CART searches, over all variables constituting the matrix space X, for the best split s*, the one that maximizes the change of the impurity function. By applying Eq. 2.4 repeatedly, a maximum tree T_MAX is established: the tree containing the maximum number of nodes for the given data sample set. Eq. 2.4 is applied to the original data set and then to the resulting data partitions until the condition of Eq. 2.5 holds:

    \max_j p(j|t) = 1 \quad \text{for all } t \in \tilde{T}        (Eq. 2.5)

where \tilde{T} is the set of terminal nodes of the tree T and p(j|t) is the estimated posterior probability of class j given that a point is in node t. This condition means that, in each terminal node of the maximum tree T_MAX, all observations are of the same class j.


2.3 Splitting Rules
In this report, we employ the concept of the Gini index to define the impurity function. With the Gini index, the impurity function i(t) takes the form:

    i(t) = \sum_{k \neq l} p(k|t)\, p(l|t) = 1 - \sum_{j} p^2(j|t)        (Eq. 2.6)

where k, l = 1, ..., J are class indices and:

    p(j|t) = \frac{n_j(t)}{n(t)}        (Eq. 2.7)

where n_j(t) is the number of observations from the variable set X that belong to class j, and n(t) is the total number of observations in node t. These observations are assigned to their corresponding node t by a given split s.
Substituting the Gini impurity function into Eq. 2.3 and Eq. 2.4, we obtain:

    \Delta i(s, t) = -\sum_j p^2(j|t_P) + p_L \sum_j p^2(j|t_L) + p_R \sum_j p^2(j|t_R)        (Eq. 2.8)

    s^* = \arg\max_{s \in S} \Big[ -\sum_j p^2(j|t_P) + p_L \sum_j p^2(j|t_L) + p_R \sum_j p^2(j|t_R) \Big]        (Eq. 2.9)

The Gini algorithm tends to look for the largest, "most important" class and isolate it from the rest of the data.
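To make the Gini splitting rule concrete, the following minimal Java sketch (illustrative only, not taken from our project code) computes the impurity of Eq. 2.6 and the goodness of a split of Eq. 2.8 from per-node class counts:

/** Sketch of the Gini impurity (Eq. 2.6) and goodness of split (Eq. 2.8). */
public class GiniSplit {

    /** i(t) = 1 - sum_j p(j|t)^2, from the class counts n_j(t) of a node. */
    static double gini(int[] classCounts) {
        int n = 0;
        for (int c : classCounts) n += c;          // n(t)
        if (n == 0) return 0.0;
        double sumSq = 0.0;
        for (int c : classCounts) {
            double p = (double) c / n;             // p(j|t) = n_j(t) / n(t), Eq. 2.7
            sumSq += p * p;
        }
        return 1.0 - sumSq;
    }

    /** Delta i(s,t) = i(t_P) - p_L i(t_L) - p_R i(t_R)  (Eq. 2.3 / Eq. 2.8). */
    static double goodnessOfSplit(int[] parent, int[] left, int[] right) {
        int nP = 0, nL = 0, nR = 0;
        for (int c : parent) nP += c;
        for (int c : left)   nL += c;
        for (int c : right)  nR += c;
        double pL = (double) nL / nP;              // Eq. 2.1
        double pR = (double) nR / nP;
        return gini(parent) - pL * gini(left) - pR * gini(right);
    }
}

CART evaluates goodnessOfSplit for every candidate split s at a node and keeps the split s* that maximizes it, exactly as in Eq. 2.9.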


3. Optimizing Size of Tree
From Chapter 2, the process of creating a tree is: (1) apply Eq. 2.9 to the learning sample data set, then (2) apply it again to each newly created node of the tree; (3) this process repeats until the condition of Eq. 2.5 holds for every terminal node, or until the tree size has been optimized. In this chapter, we show what the optimized tree size is and why the maximum tree is not always the best choice.

3.1 Parameterization of Trees
Since this report aims to optimize the tree so that the CART technique gives a satisfying result for portfolio selection, applying Eq. 2.9 until the condition of Eq. 2.5 holds would achieve complete class purity. However, because the homogeneity of the classes at each stage is increased by filtering observations of other classes out into other nodes, even the smallest or "least important" random noise is taken into account as part of a final decision. We only want the reliable parts of the tree, not the noise: new data passed through the trained tree will also pass through the noisy parts, and the tree's classification will then be wrong with a higher probability.
In other words, a tree that is too large causes an over-parameterization problem because of the noise, while a tree that is too small causes under-parameterization, since it cannot learn the significant patterns in the learning sample. The solution is to apply cross-validation to subtrees of T of different sizes and compare their performance.


3.2 Cost-Complexity Function
Tree complexity measures how large the tree is, and depends on how many nodes it has. There are two extreme cases: (1) a maximum tree T_MAX makes perfect predictions on the learning sample, but is penalized by its large size, since a larger portion of noisy parts is produced; (2) a very small tree receives a much lower penalty for its size, but its predictive ability is limited, since it may ignore the significant parts of the learning sample, as mentioned in Chapter 3.1.
To find the balance between tree size, which is penalized, and predictive power, which depends on tree size, we apply a cost-complexity function: tree pruning.
Imagine cutting off the growing process at various times and points and evaluating the misclassification error at each of them. This leads to an error versus tree size diagram. There are two types of classification error:

Error

Error

Tree Size
(a)

Tree Size
(b)

Figure 3.1 (a) shows training error where those mistakes made on training set while (b) shows testing error where those mistakes made on testing set.


Training error decreases as the tree size increases. Testing error behaves differently: it first decreases with increasing tree size, as we expect results similar to those obtained in the training stage. As the tree grows further, however, a larger amount of the noisy structure learned in training is applied to the testing data, and the model becomes less accurate on it; the model is in fact fitting the noise rather than the test data. The testing error curve therefore starts to increase once the tree is so large that it cannot perform well on the test data: the overfitting problem occurs.
To obtain the optimal result by considering the tradeoff between tree size and error, we first define the misclassification error at node t:

    r(t) = 1 - \max_j p(j|t)        (Eq. 3.1)

    R(t) = r(t)\, p(t)        (Eq. 3.2)

where p(t) is the probability that an observation falls into node t. The misclassification error of the tree is therefore:

    R(T) = \sum_{t \in \tilde{T}} R(t)        (Eq. 3.3)

where \tilde{T} is the set of terminal nodes. We then define the size of T as |\tilde{T}|, the number of terminal nodes, for any subtree T \preceq T_MAX. The pruning method, the cost-complexity function, is defined as:

    R_\alpha(T) = R(T) + \alpha\, |\tilde{T}|        (Eq. 3.4)

where \alpha \ge 0 is the complexity parameter and R(T) is the cost component.
Since the set of subtrees of T_MAX, each of which can be considered a pruned tree, is finite, we can write it as {T_1, T_2, ..., T_N} with decreasing numbers of nodes. A tree T(\alpha) is considered "optimal" for some \alpha, and it remains "optimal" until another parameter \alpha' in Eq. 3.4 makes a different tree more "optimal", at which point it is replaced. This process carries on until the most optimal tree is found.
An optimal tree T(\alpha) exists when:

    R_\alpha(T(\alpha)) = \min_{T \preceq T_MAX} R_\alpha(T)        (Eq. 3.5)

    \text{if } R_\alpha(T) = R_\alpha(T(\alpha)), \text{ then } T(\alpha) \preceq T        (Eq. 3.6)
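As a small illustration of Eq. 3.4 (a sketch only; TreeNode and its fields are hypothetical names, not classes from our project), the cost-complexity of any subtree can be computed recursively from its leaves:

/** Sketch of Eq. 3.3 and Eq. 3.4 on a binary tree of nodes. */
final class TreeNode {
    TreeNode left, right;   // both null for a terminal node
    double nodeError;       // R(t) = r(t) * p(t) for this node (Eq. 3.1 / 3.2)

    boolean isLeaf() { return left == null && right == null; }

    int numLeaves() {                       // |T~|: number of terminal nodes
        return isLeaf() ? 1 : left.numLeaves() + right.numLeaves();
    }

    double treeError() {                    // Eq. 3.3: sum of R(t) over leaves
        return isLeaf() ? nodeError : left.treeError() + right.treeError();
    }

    double costComplexity(double alpha) {   // Eq. 3.4: R(T) + alpha * |T~|
        return treeError() + alpha * numLeaves();
    }
}

For a fixed alpha, pruning compares the cost-complexity of a subtree with that of collapsing it into a single leaf, and keeps whichever is cheaper.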

After determining the misclassification error for optimization of tree size, we can then apply the technique of V-fold cross-validation to determine the optimal tree.

3.3 V-Fold Cross-Validation
When a tree is built by learning a specific dataset and separate, independent testing datasets are then run through the tree, the classification error first decreases with increasing tree size until it reaches a minimum; if the tree grows beyond this minimum point, the classification error increases again. An example is shown in Figure 3.1 (b).
However, it is wasteful to hold back data from the learning dataset for separate tests, and testing on independent data frequently is also expensive. V-fold cross-validation is therefore used to test tree sizes independently without setting aside a test dataset and reducing the data used to build the tree.


The brief principle of V-fold cross-validation is as follows:
1. Since all the data in the dataset are allowed to be used to build a tree, the tree is first intentionally grown larger than the "optimal" one. We refer to this tree as the unpruned tree, or the maximum tree T_MAX, which fits the learning dataset perfectly.
2. The whole learning dataset is then separated into a number of groups called "folds", such that the distributions of the classes of the available variables are similar across the groups. The number of groups is V.
3. Assume there are 10 partitions. The model first combines 9 of the partitions into a new pseudo-learning dataset and builds a test tree on it. This tree is fit to only 90% of the data of the unpruned tree; the unused 10% of the learning data is independent of it and can be used as a test sample for the test tree.
4. During testing, the classification error of test tree 1 is measured and treated as an independent error estimate for tree 1.
5. Steps 3 and 4 are repeated with different pseudo-learning datasets, V times in total.
6. Once 10 test trees have been built and 10 classification errors extracted, the process stops.
7. We then average the classification errors by tree size into an average error rate. This average error rate for a specific tree size is the "cross-validation cost" CV. After computing CV for each tested tree size, the tree size that produces the minimum CV is found.
8. The unpruned tree is then pruned back to the size with minimum CV. During pruning, we remove the "least important" nodes, as determined by the cost-complexity function of Chapter 3.2.
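The procedure above can be sketched in Java as follows (a minimal illustration, not our project code: Tree, buildTree and errorRate are hypothetical stand-ins for the real learning and evaluation routines):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class VFoldCV {

    interface Tree { int classify(double[] features); }

    /** Placeholder: grow a tree of the given size on the selected rows. */
    static Tree buildTree(List<double[]> x, List<Integer> y,
                          List<Integer> rows, int treeSize) {
        return features -> 0;   // stub: always predicts class 0
    }

    /** Fraction of the held-out rows that the tree misclassifies. */
    static double errorRate(Tree t, List<double[]> x, List<Integer> y,
                            List<Integer> rows) {
        int wrong = 0;
        for (int i : rows) if (t.classify(x.get(i)) != y.get(i)) wrong++;
        return rows.isEmpty() ? 0.0 : (double) wrong / rows.size();
    }

    /** Average held-out error over V folds: the cross-validation cost CV. */
    static double crossValidationCost(List<double[]> x, List<Integer> y,
                                      int V, int treeSize) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < x.size(); i++) idx.add(i);
        Collections.shuffle(idx, new Random(42));   // fixed seed for repeatability

        double total = 0.0;
        for (int v = 0; v < V; v++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> test  = new ArrayList<>();
            for (int i = 0; i < idx.size(); i++)
                (i % V == v ? test : train).add(idx.get(i));  // hold out fold v
            total += errorRate(buildTree(x, y, train, treeSize), x, y, test);
        }
        return total / V;
    }
}

Computing crossValidationCost for each candidate tree size and keeping the size with the minimum CV implements steps 7 and 8 above.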
However, selecting the optimal tree by the minimum CV value E_CV(T) may not give a unique answer: more than one tree size can produce (nearly) the same minimal cost, so the candidate answers may lie anywhere inside a range of values. Also, if the value of V is less than the number of observations in the available variable set X, V-fold cross-validation might give a different result from the original cross-validation.


4. Iterative Dichotomiser 3 (ID3)
Besides CART, this project also uses the ID3 technique, which differs slightly from CART, to obtain the best selection.
ID3 was first developed by Ross Quinlan in 1986. The algorithm creates a simple and efficient tree with the smallest depth. The major difference between ID3 and CART lies in the splitting method: CART is a binary tree model, so each split can only create 2 child nodes under 1 parent node, whereas an ID3 tree can have multiple children and siblings at the same time. Since the main method of ID3 is based on the Concept Learning System (CLS), the training procedure over a learning dataset C is:
1. If all data in C are positive, create a TRUE node and stop immediately.
2. If all data in C are negative, create a FALSE node and stop immediately.
3. Otherwise, divide the training data in C into subsets C1, C2, ..., CN according to their attribute values.
4. Repeat steps 1-3 for each Ci, where i = 1, 2, ..., N.
In this procedure, ID3 goes through all attributes of the training dataset and selects the attribute that best separates the given dataset; it stops once the best attribute has been found. Note that ID3 does not look back at data it has already passed through.
The requirements on the learning dataset used are as follows:
1. Each attribute must take a fixed set of values.
2. The attributes must be predefined before being used as examples.
3. Continuous classes are not directly allowed.
4. There must be a sufficient amount of data: ID3 needs a large amount of data to distinguish useful patterns from chance occurrences, and more data results in better accuracy.

4.1 Entropy and Information Gain
In order to decide which attribute should be chosen, a technique called Information Gain is introduced, based on the entropy function:

    Entropy(C) = -\sum_{i} p(j_i) \log_2 p(j_i)        (Eq. 4.1)

where p(j_i) is the probability of class j_i in the training dataset C, defined as the number of examples of class j_i divided by the total number of examples:

    p(j_i) = \frac{|C_{j_i}|}{|C|}

All attributes are evaluated to see which of them reduces the impurity the most when it is used to divide C. Suppose the number of possible values of attribute A_i is w, and A_i divides the dataset C into subsets D_1, D_2, ..., D_w. The entropy after splitting on A_i is the weighted entropy of the subsets:

    Entropy(C, A_i) = \sum_{k=1}^{w} \frac{|D_k|}{|C|}\, Entropy(D_k)        (Eq. 4.2)

The information gain of A_i is then:

    IG(A_i) = Entropy(C) - Entropy(C, A_i)        (Eq. 4.3)

The information gain measures the difference between the entropies before and after the split, i.e. how much uncertainty has been removed by splitting dataset C on attribute A_i. The larger the information gain, the larger the portion of uncertainty that has been removed. Note that if IG(A) is too small, the process halts automatically.
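The following Java sketch (illustrative only, not from our project code) computes the entropy of Eq. 4.1 and the information gain of Eq. 4.3 from precomputed class counts:

public class InfoGain {

    /** Entropy(C) = - sum_i p(j_i) * log2 p(j_i)   (Eq. 4.1). */
    static double entropy(int[] classCounts) {
        int n = 0;
        for (int c : classCounts) n += c;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;                  // treat 0 * log 0 as 0
            double p = (double) c / n;
            h -= p * (Math.log(p) / Math.log(2));  // log base 2
        }
        return h;
    }

    /** IG(A) = Entropy(C) - sum_k (|D_k|/|C|) * Entropy(D_k)   (Eq. 4.2 / 4.3);
     *  subsetCounts[k] holds the class counts of subset D_k. */
    static double informationGain(int[] totalCounts, int[][] subsetCounts) {
        int n = 0;
        for (int c : totalCounts) n += c;
        double remainder = 0.0;
        for (int[] dk : subsetCounts) {
            int nk = 0;
            for (int c : dk) nk += c;
            remainder += ((double) nk / n) * entropy(dk);
        }
        return entropy(totalCounts) - remainder;
    }
}

ID3 computes informationGain for every remaining attribute at a node and splits on the attribute with the largest gain.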


5. Data Used
5.1 Platforms and Open Source Libraries
For testing and finding the trading strategy, some open-source libraries and applications are used to visualize the stock charts and the trading results; some libraries are used to calculate technical analysis indicators for the testing.

5.1.1 Testing and Development Environment
For the development, Windows 7 is used as the development platform. Since there are finance / data mining libraries based on Java, Java was chosen as the development language. For the Integrated Development Environment, Eclipse was chosen, since it provides plugins for obtaining the source code of the open-source applications.

Figure 5.1: Logo of Eclipse


5.1.2 Robotrader
Robotrader is a simulation platform for automated stock trading (official website: http://jrobotrader.atspace.com). It is an open-source application built in Java under the LGPLv2 license. The main uses of this software are to:
● Visualize the stock chart
● Obtain and convert stock data
● Provide the main framework for automated trading

Simple structure of Robotrader:

Directory: GUI
Description: Code of the GUI; links the different modules and shows the total return report.
Important file: ReportModule.java

Directory: Market
Description: Classes holding the data of the market, such as the user's current money, the data of a certain stock (including pre-computed data), and the registration of the trading strategies (indicators).
Important files: HistoricData.java, IIndicatorContainer.java

Directory: Quotedb
Description: Downloading / reading of stock files.
Important files: InstrumentQuoteFile.java, YahooUSHistoricLoader.java

Directory: Trader
Description: Stores the various trading strategies; our implementation and testing of trading is based here.

Directory: Stat
Description: Generates statistics data.

Table 5.1: Structure of Robotrader

[Figure 5.2: The relationship between the components of Robotrader. Traders register with the Market; as each day passes, stock data are loaded through quotedb, the Trader is invoked, and reports and statistics are output through stat.]

5.1.3 Java Object Oriented Neural Engine
JOONE is a neural network framework written in Java. This library is used in the prediction algorithm.

Figure 5.3: Logo of Java Object Oriented Neural Engine (JOONE)


5.1.4 TA-Lib (Technical Analysis Library)
TA-Lib is an open-source library that performs technical analysis of financial market data (official webpage: http://ta-lib.org/). TA-Lib is released under a BSD license and is available in different programming languages. In the testing, the Java version of TA-Lib is used.

5.1.5 WEKA
Weka is machine-learning software written in Java. It provides a GUI for doing machine learning easily, and also a back-end Java library for machine-learning programs. In this project we mainly use the library instead of testing through the GUI directly: we combine Robotrader and Weka into one testing platform on which to test the strategies.
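As a minimal sketch of how a Weka tree learner can be driven from Java (the file name train.arff is hypothetical, and J48, Weka's C4.5 learner, stands in here for whichever tree classifier is plugged in; the report does not list the exact classes we registered):

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaTreeDemo {
    public static void main(String[] args) throws Exception {
        // Load a training set; "train.arff" is a hypothetical file name.
        Instances data = new DataSource("train.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // class = last attribute

        J48 tree = new J48();          // a decision-tree learner from Weka
        tree.buildClassifier(data);
        System.out.println(tree);      // prints the tree, as shown in Ch. 6

        // 10-fold cross-validation (cf. Ch. 3.3) with a confusion matrix.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new java.util.Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());
    }
}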

5.2 Historical Stock Data Source
5.2.1 Raw Data Format
The data used are EOD (End Of Day) data extracted from yahoo.us. The following table shows the format:

Field          | Type    | Description
Stock          | String  | Stock number in the market
Date           | Integer | The date of that row of data
Open           | Float   | Opening price of the day
High           | Float   | Highest price of the day
Low            | Float   | Lowest price of the day
Close          | Float   | Closing price of the day
Volume         | Integer | Total traded amount of the stock for the day
Adjusted Close | Float   | Adjusted closing price of the stock (the original price corrected for splits / dividends; see http://help.yahoo.com/kb/index?page=content&y=PROD_FIN&locale=en_CA&id=SLN2311)

Table 5.2: EOD data format
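A hedged sketch of reading one such row (assuming Yahoo's historical CSV header Date,Open,High,Low,Close,Volume,Adj Close; the class and field names below are illustrative, not from our code):

/** One EOD bar parsed from a Yahoo historical-prices CSV line. */
public class EodBar {
    String stock;     // stock code, e.g. "0005.HK"
    String date;      // kept as text here; the table above stores it as an integer
    double open, high, low, close, adjClose;
    long volume;

    static EodBar parse(String stock, String csvLine) {
        String[] f = csvLine.split(",");
        EodBar b = new EodBar();
        b.stock    = stock;
        b.date     = f[0];
        b.open     = Double.parseDouble(f[1]);
        b.high     = Double.parseDouble(f[2]);
        b.low      = Double.parseDouble(f[3]);
        b.close    = Double.parseDouble(f[4]);
        b.volume   = Long.parseLong(f[5]);
        b.adjClose = Double.parseDouble(f[6]);
        return b;
    }
}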

5.3 Pre-processing of Information
Each raw record only contains the stock price information of a single day; such discrete data make relationships such as trends hard to find. Some derived stock information, in the form of technical indicators, is therefore pre-processed for the testing. The following are technical indicators that may be used as extra information.

5.3.1 Simple Moving Average (SMA)
The simple moving average is the unweighted mean of the price over the n previous days. By changing n, the average price of the stock over n days can be shown. An SMA can also be applied to volume.
The formula of the SMA is:

    SMA_n(t) = \frac{1}{n} \sum_{i=0}^{n-1} P(t - i)        (Eq. 5.1)

Figure 5.4: HSBC Holdings plc from 2010-11 to 2011-06; the blue area is the stock price and the red line is the SMA with n = 15.

5.3.2 Exponential Moving Average (EMA)
The exponential moving average is a weighted mean of the price over the n previous days, similar to the SMA, except that a day further in the past receives a smaller weight, so more distant days have a smaller effect on the indicator. A common recursive form is:

    EMA(t) = \alpha\, P(t) + (1 - \alpha)\, EMA(t - 1), \qquad \alpha = \frac{2}{n + 1}        (Eq. 5.2)

Figure 5.5: HSBC Holdings plc from 2010-11 to 2011-06; blue is the stock price, red is the EMA with n = 15.
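Both averages are easy to compute directly; the sketch below (illustrative, assuming prices[t] is the close of day t, oldest first) implements Eq. 5.1 and Eq. 5.2:

public class MovingAverages {

    /** SMA_n(t): unweighted mean of the n closes ending at day t (Eq. 5.1). */
    static double sma(double[] prices, int t, int n) {
        double sum = 0.0;
        for (int i = t - n + 1; i <= t; i++) sum += prices[i];
        return sum / n;
    }

    /** EMA with alpha = 2/(n+1), seeded with the first close (Eq. 5.2);
     *  older days receive exponentially smaller weights. */
    static double ema(double[] prices, int t, int n) {
        double alpha = 2.0 / (n + 1);
        double e = prices[0];
        for (int i = 1; i <= t; i++) e = alpha * prices[i] + (1 - alpha) * e;
        return e;
    }
}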


5.3.3 Relative Strength Index (RSI)
The Relative Strength Index is a technical indicator that shows the trend of a given period. It was developed by J. Welles Wilder in 1978. The basic equations of the RSI concept over an n-day window are:

    RS = \frac{\text{average gain over } n \text{ days}}{\text{average loss over } n \text{ days}}        (Eq. 5.3)

    RSI = 100 - \frac{100}{1 + RS}        (Eq. 5.4)

For a day on which the price rises:

    gain_t = P(t) - P(t - 1), \qquad loss_t = 0        (Eq. 5.5)

For a day on which the price drops:

    gain_t = 0, \qquad loss_t = P(t - 1) - P(t)        (Eq. 5.6)

5.3.4 Momentum and Rate of Change
Momentum (MTM) and rate of change (ROC) are similar indicators; both analyse the rate at which the price changes.
For the MTM:

    MTM_n(t) = P(t) - P(t - n)        (Eq. 5.7)

For the ROC:

    ROC_n(t) = \frac{P(t) - P(t - n)}{P(t - n)} \times 100        (Eq. 5.8)
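The remaining indicators can be sketched the same way (illustrative Java, assuming prices[t] is the close of day t, oldest first; in the actual tests TA-Lib can compute these for us):

public class Indicators {

    /** RSI over the n days ending at day t (Eqs. 5.3-5.6). */
    static double rsi(double[] prices, int t, int n) {
        double gain = 0.0, loss = 0.0;
        for (int i = t - n + 1; i <= t; i++) {
            double change = prices[i] - prices[i - 1];
            if (change > 0) gain += change;  // rising day: add to gains (Eq. 5.5)
            else            loss -= change;  // dropping day: add to losses (Eq. 5.6)
        }
        if (loss == 0) return 100.0;         // no losses in the window
        double rs = (gain / n) / (loss / n); // Eq. 5.3
        return 100.0 - 100.0 / (1.0 + rs);   // Eq. 5.4
    }

    /** MTM_n(t) = P(t) - P(t - n)   (Eq. 5.7). */
    static double momentum(double[] prices, int t, int n) {
        return prices[t] - prices[t - n];
    }

    /** ROC_n(t) = (P(t) - P(t - n)) / P(t - n) * 100   (Eq. 5.8). */
    static double rateOfChange(double[] prices, int t, int n) {
        return (prices[t] - prices[t - n]) / prices[t - n] * 100.0;
    }
}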


6. Experiments and Results
6.1 Trading Strategy
A trading strategy is about finding rules or a model that can achieve a higher return; transactions can then depend on the rules that were found, without further human decisions.
To find the rules of the trading strategy, this part of the project uses classification methods to classify whether a stock has a high chance of rising or dropping in a certain period.

6.1.1 Planning of Trading Strategy
In finding a trading strategy by classification, we apply different classification methods to the stock. We also find out which type of attribute is more suitable for discovering the rules.

6.1.2 Choice of Stock
For this part, we select 0005.HK for testing. The period is from 20000101 to 20131231: training on 20000101 to 20091231 (about 3000 instances) and testing on 20100101 to 20131231 (about 1000 instances).
The reason for this choice is that the stock shows many types of trend over this period; the more trends appear, the more accurate the prediction of the stock price can be.


6.1.3 Finding Rising / Dropping Stock in the Following 30 Days
In this part, we apply two decision tree methods to classify whether the stock has a rising or dropping trend.

6.1.3.1 Flow of Testing
1. Introduce the method
2. Introduce the attributes
3. Study the result

6.1.3.2 Choice of Method
For the trading strategy part, we choose two decision tree methods to do the classification: the first is CART (Classification And Regression Tree), and the second is a decision tree built by the ID3 algorithm.

6.1.3.3 Test Without Using SMA in CART
In this part we test using the data without SMA. For the input setting, we guess that the exact price values will have their own meaning in the trend.
Attributes:

Name          | Value               | Description
past_30price  | -30closing_price()  | closing price 30 days before the current day
currentprice  | closing_price()     | closing price of the current day
past_30volume | -30closing_volume() | volume 30 days before the current day
volume        | closing_volume()    | volume of the current day
class         | {c0, c1, c2}        | c0: (+30closing_price)/closing_price > 1.1;
              |                     | c2: (+30closing_price)/closing_price < 0.9;
              |                     | c1: other

(+x means x days after the current day; -y means y days before the current day.)
This is the resulting tree:

currentprice < 73.7750015258789
| past_30price < 72.95000076293945
| | past_30price < 56.92500114440918
| | | currentprice < 59.77499961853027: c0(23.0/0.0)
| | | currentprice >= 59.77499961853027: c1(27.0/2.0)
| | past_30price >= 56.92500114440918: c0(59.0/7.0)
| past_30price >= 72.95000076293945
| | currentprice < 67.25: c2(25.0/4.0)
| | currentprice >= 67.25
| | | currentprice < 72.95000076293945: c1(16.0/5.0)
| | | currentprice >= 72.95000076293945: c2(7.0/4.0)
currentprice >= 73.7750015258789
| currentprice < 122.45000076293945
| | past_30volume < 1.926585E7
| | | currentprice < 76.6500015258789
| | | | past_30price < 81.17500305175781: c1(7.0/1.0)
| | | | past_30price >= 81.17500305175781: c0(8.0/3.0)
| | | currentprice >= 76.6500015258789
| | | | currentprice < 118.95000076293945: c1(1015.0/160.0)
| | | | currentprice >= 118.95000076293945
| | | | | volume < 5656500.0
| | | | | | currentprice < 120.70000076293945: c2(17.0/2.0)
| | | | | | currentprice >= 120.70000076293945: c1(6.0/2.0)
| | | | | volume >= 5656500.0: c1(69.0/7.0)
| | past_30volume >= 1.926585E7
| | | past_30price < 119.8499984741211
| | | | currentprice < 84.57500076293945
| | | | | past_30price < 80.79999923706055: c1(61.0/4.0)
| | | | | past_30price >= 80.79999923706055
| | | | | | currentprice < 81.82500076293945: c1(23.0/9.0)
| | | | | | currentprice >= 81.82500076293945: c2(16.0/5.0)
| | | | currentprice >= 84.57500076293945: c1(109.0/17.0)
| | | past_30price >= 119.8499984741211
| | | | past_30price < 130.20000457763672: c2(19.0/17.0)
| | | | past_30price >= 130.20000457763672: c1(9.0/1.0)
| currentprice >= 122.45000076293945
| | currentprice < 148.79999542236328: c1(929.0/49.0)
| | currentprice >= 148.79999542236328
| | | past_30price < 139.8499984741211: c2(8.0/1.0)
| | | past_30price >= 139.8499984741211: c1(12.0/3.0)


Study of the result:
The constructed tree depends almost entirely on the price, to which it is highly tied. The tree appears merely to memorize patterns of certain periods (e.g. the branch currentprice < 118.95000076293945 carries many of the classifications), and further evaluation on the testing data shows that the tree's decisions are barely related to the truth: the true positive rate of the c0 classification is 0%.
6.1.3.4 Test Using SMA in CART, Adding Momentum-Related Attributes and Modifying the Class
The previous test showed that exact price values lead to a bias problem, so we add momentum-related attributes (percentage changes). For the class, after a series of tests we judge the thresholds 1.1 and 0.9 to be set too high / too low. By the volatility formula

    Volatility = SD(\text{change}) \times \sqrt{\text{days}}

the volatility of 0005 is approximately 0.05, so we modify the thresholds to 1.05 and 0.95.
Attributes:

Name           | Value                                     | Description
past_30average | average(from -59 to -30 closing_price())  | SMA of price from day -59 to day -30
30average      | average(from -29 to 0 closing_price())    | SMA of price from day -29 to the current day
past_30volume  | average(from -59 to -30 closing_volume()) | SMA of volume from day -59 to day -30
30volume       | average(from -29 to 0 closing_volume())   | SMA of volume from day -29 to the current day
price_MT       | ln(30average/past_30average)              | percentage change of the price
volume_MT      | ln(30volume/past_30volume)                | percentage change of the volume
class          | {c0, c1, c2}                              | c0: (past_30average)/30average > 1.05;
               |                                           | c2: (past_30average)/30average < 0.95;
               |                                           | c1: other

(+x means x days after the current day; -y means y days before the current day.)
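A sketch of how one training instance of this test can be assembled (illustrative only, not our project code; prices and volumes are the daily closes and volumes, oldest first):

public class InstanceBuilder {

    /** Mean over the inclusive index range [from, to]. */
    static double mean(double[] v, int from, int to) {
        double s = 0.0;
        for (int i = from; i <= to; i++) s += v[i];
        return s / (to - from + 1);
    }

    /** {past_30average, 30average, past_30volume, 30volume, price_MT, volume_MT}
     *  for day t, as defined in the table above. */
    static double[] attributes(double[] prices, double[] volumes, int t) {
        double past30avg = mean(prices,  t - 59, t - 30);
        double avg30     = mean(prices,  t - 29, t);
        double past30vol = mean(volumes, t - 59, t - 30);
        double vol30     = mean(volumes, t - 29, t);
        double priceMT   = Math.log(avg30 / past30avg);  // momentum of price
        double volumeMT  = Math.log(vol30 / past30vol);  // momentum of volume
        return new double[]{past30avg, avg30, past30vol, vol30, priceMT, volumeMT};
    }

    /** Class label with the 1.05 / 0.95 thresholds chosen from the volatility. */
    static String label(double ratio) {   // the ratio defined for the class row
        if (ratio > 1.05) return "c0";
        if (ratio < 0.95) return "c2";
        return "c1";
    }
}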
The resulting tree:

30average < 83.06250047683716
30average < 83.06250047683716
| past_30average < 75.5733317732811
| | past_30average < 48.232500433921814
| | | 30volume < 4.164597165625E7: c2(3.0/0.0)
| | | 30volume >= 4.164597165625E7: c1(16.0/4.0)
| | past_30average >= 48.232500433921814
| | | 30average < 68.06916570663452: c0(86.0/1.0)
| | | 30average >= 68.06916570663452
| | | | 30volume < 3.4278431484375E7: c0(22.0/2.0)
| | | | 30volume >= 3.4278431484375E7: c1(4.0/0.0)
| past_30average >= 75.5733317732811
| | past_30volume < 1.680683175E7: c0(47.0/1.0)
| | past_30volume >= 1.680683175E7: c2(53.0/0.0)
30average >= 83.06250047683716
| past_30volume < 2.03377383515625E7
| | 30volume < 1.077808501953125E7
| | | past_30volume < 6514848.3203125
| | | | 30average < 114.25833308696747
| | | | | 30average < 88.99583327770233
….
| | | | | | | | price_MT < 0.07610166271935453: c1(5.0/0.0)
| | | | | | | | price_MT >= 0.07610166271935453
| | | | | | | | | past_30average < 133.67000150680542: c1(3.0/1.0)
| | | | | | | | | past_30average >= 133.67000150680542: c0(3.0/0.0)
| | | | | past_30average >= 139.19499897956848
| | | | | | past_30volume < 2.7342849921875E7
| | | | | | | past_30average < 141.0199966430664: c2(17.0/1.0)
| | | | | | | past_30average >= 141.0199966430664
| | | | | | | | past_30volume < 2.29203465625E7: c0(7.0/2.0)
| | | | | | | | past_30volume >= 2.29203465625E7
| | | | | | | | | past_30average < 144.56166315078735: c1(19.0/2.0)
| | | | | | | | | past_30average >= 144.56166315078735: c2(3.0/0.0)
| | | | | | past_30volume >= 2.7342849921875E7: c2(40.0/1.0)


Number of Leaf Nodes: 140
(The full tree is reproduced in Appendix A.1.)

Putting the testing data through the classification tree gives:

classified as: | c0  | c1  | c2  | real class
               | 150 | 26  | 6   | c0
               | 184 | 148 | 37  | c1
               | 45  | 19  | 63  | c2

True positive rate: c0 82%, c1 40%, c2 50%.

Study of the result:
This decision tree no longer depends on only a single attribute. Although the accuracy in the training section is not very high, the tree performs quite well on the testing data set. With three possible actions, an accuracy above 33% should be considered quite good. Also, not many rising signals are turned into dropping signals; the prediction of rises is quite good in this classification.


6.1.3.5 Classification Using the ID3 Algorithm
In this part of the project, we implement the classification with an ID3 decision tree. Since this implementation of the decision tree only accepts inputs of a fixed discrete type, we convert the attributes into {true, false} values.
Attributes:

Name            | Value                                                                                      | Description
pre30pricerise  | average(from -29 to 0 closing_price()) - average(from -59 to -30 closing_price()) > 0     | whether the price SMA rose against the previous 30 days
pbigrise        | average(from -29 to 0 closing_price()) / average(from -59 to -30 closing_price()) > 1.05  | whether the price SMA ratio rose above 1.05
pbigdrop        | average(from -29 to 0 closing_price()) / average(from -59 to -30 closing_price()) < 0.95  | whether the price SMA ratio dropped below 0.95
pre30volumerise | average(from -29 to 0 closing_volume()) - average(from -59 to -30 closing_volume()) > 0   | whether the volume SMA rose against the previous 30 days
vbigrise        | average(from -29 to 0 closing_volume()) / average(from -59 to -30 closing_volume()) > 1.05 | whether the volume SMA ratio rose above 1.05
vbigdrop        | average(from -29 to 0 closing_volume()) / average(from -59 to -30 closing_volume()) < 0.95 | whether the volume SMA ratio dropped below 0.95
class           | {c0, c1, c2}                                                                               | c0: (past_30average)/30average > 1.05;
                |                                                                                            | c2: (past_30average)/30average < 0.95;
                |                                                                                            | c1: other


Decision Tree:

pbigdrop = false
| pbigrise = false
| | vbigdrop = false
| | | pre30pricerise = false
| | | | vbigrise = false
| | | | | pre30volumnrise = false: c1
| | | | | pre30volumnrise = true: c1
| | | | vbigrise = true: c1
| | | pre30pricerise = true
| | | | vbigrise = false
| | | | | pre30volumnrise = false: c1
| | | | | pre30volumnrise = true: c1
| | | | vbigrise = true: c1
| | vbigdrop = true
| | | pre30pricerise = false: c1
| | | pre30pricerise = true: c1
| pbigrise = true
| | vbigrise = false
| | | pre30volumnrise = false
| | | | vbigdrop = false: c1
| | | | vbigdrop = true: c1
| | | pre30volumnrise = true: c1
| | vbigrise = true: c1 pbigdrop = true
| vbigdrop = false
| | vbigrise = false
| | | pre30volumnrise = false: c1
| | | pre30volumnrise = true: c1
| | vbigrise = true: c1
| vbigdrop = true: c1

classified as: | c0 | c1  | c2 | real class
               | 0  | 51  | 0  | c0
               | 0  | 565 | 0  | c1
               | 0  | 62  | 0  | c2

Testing result:
Using attributes with only two possible values turns out to be a total failure: nothing is classified as c0 (big rise) or c2 (big drop); every instance falls into c1.


6.2 Portfolio Management
Portfolio management is about distributing an investment over different assets in different proportions. These investments include securities such as shares, bonds, real estate, etc. The main reason for using portfolio management is not only to increase the return of the total investment; most basically, it is to increase the return while keeping the volatility within an acceptable range.
This part of the project looks for a strategy of choosing stocks that reaches the aims of portfolio management: increasing the return and lowering the volatility of the total investment.

6.2.1 Planning of Portfolio Management
In this portfolio management we divide the stock choosing into two parts.
In the first part, we find stocks that have a high chance of rising in the following year; the main reason for doing this is to optimize the return of the chosen stocks.
In the second part, we select various stocks from the first part to construct a stock combination; the main reason for doing this is to optimize the total volatility of the investment combination, reaching a more stable return.

6.2.2 Choice of Stock List
For this part, we select the following stocks for finding the stock combination:

0001  0002  0003  0004  0005  0006  0008  0010  0011  0012  0013
0014  0016  0017  0019  0020  0023  0041  0069  0083  0097  0101
0142  0179  0267  0291  0293  0315  0363  0941  1038

These are the stocks that were components of the HSI in 2000.
The reason for choosing stocks from the 2000 list is to obtain a fair test. The HSI changes its components depending on the performance of the stocks, where "performance" always means good performance: stocks that did not perform well in the past are dropped. Choosing components from the current list would therefore give stocks with a higher chance of a good result (rising), a form of survivorship bias, so we choose the stocks of the earlier list.


6.2.3 Finding Rising Stock in the Following Year
In this part we use the same technique as in the trading strategy part, CART (Classification and Regression Tree), since this method performed reasonably well there.
6.2.3.1 Flow of Testing
1. Introduce the method
2. Introduce the attributes
3. Study the result

6.2.3.2 Test Using Exact Values
In the first attempt, we use CART to do the classification, applying the method from the trading strategy test with the attribute window modified from 30 days to 260 days, which corresponds roughly to 1 trading year. Since we want to find the long-period trend, we changed the window to 260 days.
Here is the attribute table:

Name            | Value                                       | Description
past_260average | average(from -519 to -260 closing_price())  | SMA of price from 2 years ago to 1 year ago
260average      | average(from -259 to 0 closing_price())     | SMA of price from last year to the current day
past_260volume  | average(from -519 to -260 closing_volume()) | SMA of volume from 2 years ago to 1 year ago
260volume       | average(from -259 to 0 closing_volume())    | SMA of volume from last year to the current day
class           | {c0, c1, c2}                                | c0: (+260closing_price)/260average > 1.2;
                |                                             | c2: (+260closing_price)/260average < 0.8;
                |                                             | c1: other

(+x means x days after the current day; -y means y days before the current day.)
Tree result:

past_260volume < 8142249.042480469
| past_260average < 98.87307676672935
| | 260average < 86.81057678163052
| | | past_260average < 89.88461546599865:
| | | past_260average >= 89.88461546599865:
| | 260average >= 86.81057678163052:
| past_260average >= 98.87307676672935: past_260volume >= 8142249.042480469
| past_260average < 133.7886544317007:
| past_260average >= 133.7886544317007
| | 260average < 126.74211592972279
| | | past_260average < 142.62557727098465:
| | | past_260average >= 142.62557727098465
| | | | 260average < 124.98875057697296:
| | | | 260average >= 124.98875057697296:
| | 260average >= 126.74211592972279
| | | 260average < 128.8205772638321
| | | | past_260average < 142.49903884530067:
| | | | past_260average >= 142.49903884530067:
| | | 260average >= 128.8205772638321:

Study of the result:
This is a bad result: just applying the same concept directly to a 1-year horizon amounts to using the stock price level to predict next year's drop/rise. Even though the classification rate is high, we do not think the tree learned anything meaningful. Moreover, the SMA covers a very long period (260 days) and only SMA attributes are included for learning; the volume of the stock also has almost no effect on the tree, although we both think volume is important for the classification. So we are not going to use this tree.


6.2.3.3 Test Using Proportion Values
In the second attempt, we still use CART to do the classification, but we modify the attributes.
Attributes:

Name                        | Value                                                        | Description
past_260average_MT          | closing_price() - average(from -259 to 0 closing_price())   | current closing price minus the price SMA of the last year; used to find the momentum of the stock price in the previous year
past_260_over_current       | closing_price() / average(from -259 to 0 closing_price())   | used to find the momentum rate change of the stock price over the previous year
past_260volume_MT           | closing_volume() - average(from -259 to 0 closing_volume()) | current volume minus the volume SMA of the last year; used to find the momentum of the volume in the previous year
past_260_over_currentvolume | closing_volume() / average(from -259 to 0 closing_volume()) | used to find the momentum rate change of the stock volume over the previous year
class                       | {c0, c1}                                                     | c0: (+260closing_price)/closing_price > 1.3; c1: other

(+x means x days after the current day; -y means y days before the current day.)


Tree result:

past_260_over_current < 0.9727344170056904
| past_260volume_MT < 4509568.572265625
| | past_260_over_currentvolume < 0.9747692049082883: c1(132.0/0.0)
| | past_260_over_currentvolume >= 0.9747692049082883
| | | past_260average_MT < -26.765577137470245
| | | | past_260average_MT < -28.83451946079731: c0(60.0/0.0)
| | | | past_260average_MT >= -28.83451946079731
| | | | | past_260average_MT < -28.60211554169655: c1(3.0/0.0)
| | | | | past_260average_MT >= -28.60211554169655: c0(16.0/0.0)
| | | past_260average_MT >= -26.765577137470245
| | | | past_260average_MT < -11.35817302763462: c1(114.0/0.0)
| | | | past_260average_MT >= -11.35817302763462
| | | | | past_260volume_MT < 2026206.537109375
| | | | | | past_260average_MT < -7.8269229382276535
| | | | | | | past_260average_MT < -8.202884435653687
| | | | | | | | past_260average_MT < -9.647596016526222
| | | | | | | | | past_260average_MT < -10.551922991871834: c0(9.0/1.0)
| | | | | | | | | past_260average_MT >= -10.551922991871834: c1(7.0/3.0)
| | | | | | | | past_260average_MT >= -9.647596016526222: c0(17.0/0.0)
| | | | | | | past_260average_MT >= -8.202884435653687: c1(6.0/3.0)
| | | | | | past_260average_MT >= -7.8269229382276535: c0(87.0/3.0)
| | | | | past_260volume_MT >= 2026206.537109375
| | | | | | past_260average_MT < -3.1514424234628677: c1(29.0/0.0)
| | | | | | past_260average_MT >= -3.1514424234628677
| | | | | | | past_260_over_currentvolume < 1.3149355996046213: c1(18.0/0.0)
| | | | | | | past_260_over_currentvolume >= 1.3149355996046213
| | | | | | | | past_260volume_MT < 2363666.533203125: c0(23.0/1.0)
| | | | | | | | past_260volume_MT >= 2363666.533203125
| | | | | | | | | past_260_over_current < 0.9664495898604564: c1(4.0/0.0)
| | | | | | | | | past_260_over_current >= 0.9664495898604564: c0(2.0/0.0)
| past_260volume_MT >= 4509568.572265625: c1(310.0/0.0)
past_260_over_current >= 0.9727344170056904: c1(1230.0/0.0)

Study of the result:
We modified the attributes of the stock to ratios instead of exact values. However, the built tree appears to overfit: it depends almost entirely on the proportion between the past and current stock price. The decision always ends at the first step, and the subsequent classifications seem unrelated. We do not think such a simplistic view makes a good decision tree.


6.2.3.4 Summary on Choice of First Step
The decision trees are not good for the longer-period (1 year) classification, so we think that choosing stocks with a high chance of rising on a yearly basis is difficult. We guess this is because the number of yearly patterns is too small for building the decision tree, so the tree is always biased towards a certain direction.


6.2.4 Choosing Low-Correlation Stocks from the List
In this part, we divide the stocks into different categories, each containing a different kind of stock; stocks in the same category have similar properties. As the basis for the test we use the percentage change of the price.

6.2.4.1 Flow of Testing
1. Introduce the method
2. Introduce the attributes
3. Study the result

6.2.4.2 Method
K-means clustering is a method that partitions the existing data into k clusters, where the value of k is given by the user. Every cluster has its own centroid, and after the clustering process the data are divided into k groups.
Working principle:
1. Randomly define k points as the initial centroids.
2. While any cluster assignment is still changing:
   a. for each data point, calculate its distance to each centroid;
   b. assign the point to the nearest cluster;
   c. for each cluster, reconstruct the centroid as the mean of its data points.
After the process of clustering, the data points are divided into k groups.


Figure 6.1: Clustering with k = 3.
There are many types of distance measure; details are as follows:
1. Euclidean distance:

    d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}        (Eq. 6.1)

where n represents the dimension and p and q represent points 1 and 2.
2. Manhattan distance:

    d(p, q) = \sum_{i=1}^{n} |p_i - q_i|        (Eq. 6.2)

where n represents the dimension and p and q represent points 1 and 2.
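The whole procedure fits in a short Java sketch (illustrative only, not our project code; each row of data is one stock's vector of monthly percentage changes, and the Euclidean distance of Eq. 6.1 is used):

import java.util.Random;

public class KMeans {

    static double euclidean(double[] p, double[] q) {   // Eq. 6.1
        double s = 0.0;
        for (int i = 0; i < p.length; i++) s += (p[i] - q[i]) * (p[i] - q[i]);
        return Math.sqrt(s);
    }

    /** Returns the cluster index of every row after the iterations converge. */
    static int[] cluster(double[][] data, int k, long seed) {
        Random rnd = new Random(seed);
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++)                 // step 1: random initial centroids
            centroids[c] = data[rnd.nextInt(data.length)].clone();

        int[] assign = new int[data.length];
        boolean changed = true;
        while (changed) {                           // step 2: repeat while anything moves
            changed = false;
            for (int i = 0; i < data.length; i++) { // 2a/2b: assign to nearest centroid
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (euclidean(data[i], centroids[c])
                            < euclidean(data[i], centroids[best])) best = c;
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            for (int c = 0; c < k; c++) {           // 2c: recompute centroid as the mean
                double[] mean = new double[data[0].length];
                int count = 0;
                for (int i = 0; i < data.length; i++)
                    if (assign[i] == c) {
                        count++;
                        for (int d = 0; d < mean.length; d++) mean[d] += data[i][d];
                    }
                if (count > 0) {
                    for (int d = 0; d < mean.length; d++) mean[d] /= count;
                    centroids[c] = mean;
                }
            }
        }
        return assign;
    }
}

Calling cluster(data, 5, someSeed) reproduces the setup of the next section, where the stocks are divided into 5 sets.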


6.2.4.3 Testing Using Monthly Percentage-Change Data
In this part, we use k-means clustering to divide the stocks into 5 sets. Each attribute of the test is the percentage change of a certain month. The reason for this input is that similar stocks have a high chance of showing similar movement trends.
Attributes:

Name               | Value                                    | Description
percentage_change1 | ln(closing_price() / -30closing_price()) | the percentage change of the stock in one month
...                | (repeated 36 times, for 3 years of data) |

(+x means x days after the current day; -y means y days before the current day.)

The result of the clustering is as follows. Since the data are 36-dimensional and cannot be plotted directly, the actual clustering is:
Cluster | Stock list                                                        | Number of stocks
0       | 0004, 0010, 0012, 0014, 0016, 0017, 0019, 0020, 0023, 0041,      | 14
        | 0069, 0083, 0101, 0267                                            |
1       | 0002, 0003, 0006, 0011, 0097, 0142, 0179, 0293, 0315, 0941, 1038 | 11
2       | 0001, 0013                                                        | 2
3       | 0291, 0363                                                        | 2
4       | 0005                                                              | 1

Study of the result:
The clustering result is quite similar to the conventional division: people with business knowledge would cluster the stocks much as the result does. For example, all the utility stocks (0002, 0003, 0006) are in cluster 1, and the Hutchison-group stocks (0001, 0013) are in cluster 2 (though one related stock, 1038, is in cluster 1).
At first sight the result may seem useless for the selection of stocks, since one could divide the stocks into different sectors by industry without any k-means algorithm, and choosing stocks that way is not a big problem. However, the result shows more than which stocks are similar: the stocks in the same cluster have similar price-changing directions, so investments within the same cluster may have similar returns over a certain period of time.
For the portfolio management, we may therefore divide the investment equally across the different clusters.


7. Conclusion
For this project, we divided the work into 2 parts: trading strategy and portfolio management. We further divided the portfolio management part into (1) increasing the return and (2) lowering the volatility of the portfolio.
The aim of this project is to investigate what kinds of methods can be used to increase the return of an investment. When investing in the stock market, increasing the rate of successful prediction by even 1% may bring a large profit.
In the trading strategy part, we found a quite good input set for training rules over a short period. For the stock market, numerical attributes are better than purely categorical input. For the same input, we found that different algorithms also matter in application, and the attributes may have complex relationships.
However, the same method cannot fit every situation. We could not find useful inputs for increasing the return in portfolio management (long-period selection of stocks): the long-term (yearly) testing is not as sensitive to the same inputs as the shorter-term (monthly) testing, so the testing results are not as good as the previous ones.
For lowering the volatility, we used the k-means clustering method, which separates the stocks in a way similar to the industry categories found on stock websites. There could be further applications of the group selection; however, for the sake of simplicity, reducing the volatility lets the model do the simplest thing: just hold an equal percentage of stock in each of the different groups, and the volatility may be lower than before (assuming prices move randomly).

Some of the inputs may seem quite simple for classifying the stock. However, in the testing we found that a good setup of the model is more important than extra inputs. In fact, there are only two raw variables in the stock market, price and volume; every other attribute is just a derivative of these two. Some tests are only prototypes of a concept. The most important outcome of this project is that we found which attributes are related to the result, which is important for the further improvement of this part of the program.


8. Difficulties and Challenges
Lack of knowledge of finance open-source software:
For the testing in the experiment part, we always needed to download the data and process it further into a form that is easy to analyse. However, neither of us had much idea about the available software and what it could do. We spent half a semester finding and setting up a complex piece of software that, in the end, did not work. Once we found Robotrader, a comparatively simple piece of software, we spent most of the time studying its structure and code.

EOD data limitation:
We find that EOD data are not suitable for finding the long-period trend of the stock price (we also do not know whether any long-period trend exists). It may simply be that we did not transform the data into useful inputs.

Trading period limitation:
Since we are not doing high-frequency trading, many technical indicators do not work when converted into inputs.

Choosing the machine learning method:
In the first semester, we chose a neural network for training the strategy. However, the neural network is a "black box" method: it is quite hard to determine whether the result is correct until the testing results arrive, and it was quite hard for us to improve the performance of the prediction. So we chose another type of prediction method, the decision tree, which generates rules and makes it easier for us to improve the prediction.

9. Contribution of Work
1st Semester:
Last summer, my final year project partner and I decided to complete a project on stock price prediction using a neural network approach. After agreeing on this topic, we started doing research and testing the systems that we found and established. However, the result was quite disappointing, since we could not fully understand the whole picture of neural networks and how they work on stock price prediction. We were confused about the application of recursive neural networks versus step-forward neural networks.
We therefore could not obtain what we expected from this project in the 1st semester. After the project presentation, our supervisor suggested we change the topic, since so much research related to it has been launched in the past decade that it is hard to make a great advance at the present stage.

2nd Semester:
After the failure of the 1st semester, we discussed what had made us fail, and finally decided to change the topic to a more appropriate one: an investigation of different trading strategies and of how to apply them to portfolio selection, rather than a focus only on investigating trading strategies.
In this semester, we used different data mining methods to achieve our goal. I suggested using CART, while my partner Frank suggested using ID3. We did not argue with each other, but decided to do the two together, and the result is quite encouraging.


Finally, I want to thank my partner Frank. This project has been a great learning experience: without his suggestion of the topic, we would barely have had the chance to learn in this area. In addition, he supported me a lot and always provided suggestions for me. This project would not have been done without his effort.


Bibliography
[1] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, Springer, 2009.
[2] 1.8 Decision Trees, Scikit-Learn Organization, 2013
Available HTTP: http://scikit-learn.org/stable/modules/tree.html
[3] J. Li. Classification/Decision Trees(1), STAT 597 Lecture Notes, The
Pennsylvania State University, 2011.
Available HTTP: http://sites.stat.psu.edu/~jiali/
[4] The Impurity Function, STAT 557 Lecture Notes, The Pennsylvania State
University, 2014.
Available HTTP: https://onlinecourses.science.psu.edu/stat557/node/85
[5] V-fold cross-validation, DTREG, 2010
Available HTTP: http://www.dtreg.com/crossvalidation.htm
[6] Chapter 4 – Decision Tree, asiaMiner, 1998
Available HTTP: http://120.105.96.8/lab/Past_Course/98-2/datamining/6.pdf
[7] W. Wang, and A. Gelman. Difficulty of selecting among multilevel models using predictive accuracy. New York: Columbia University, 2014.
Available HTTP: http://www.stat.columbia.edu/~gelman/research/unpublished/xval.pdf
[8] T. Y. Fun, Cyrus. Analyzing stock quotes using data mining technique, The
University of Hong Kong, 2013.
Available HTTP: http://i.cs.hku.hk/fyp/2012/fyp12031/Final_Report.pdf
[9] V-fold cross-validation, StatSoft Electronic Statistics Textbook, 2014.
[10] R. Schapire. COS 424 – Interacting with Data, Princeton University, 2007.
Available HTTP: http://www.cs.princeton.edu/courses/archive/spr07/cos424/scribe_notes/0220.pdf
[11] A. Sunden. Trading Based on Classification and Regression Trees, KTH Royal Institute of Technology, 2010.
[12] A. Andriyashin, W. Hardle, and R. Timofeev. Recursive Portfolio Selection with Decision Trees, Berlin: SFB 649 Economic Risk, 2008.
[13] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach, 3rd Edition, London: Pearson, 2009.


Appendix A
A.1 Decision Tree from CART
(6.1.3.4 Test with using SMA in CART by adding momentum related attribute and modify the class)
30average < 83.06250047683716
| past_30average < 75.5733317732811
| | past_30average < 48.232500433921814
| | | 30volume < 4.164597165625E7: c2(3.0/0.0)
| | | 30volume >= 4.164597165625E7: c1(16.0/4.0)
| | past_30average >= 48.232500433921814
| | | 30average < 68.06916570663452: c0(86.0/1.0)
| | | 30average >= 68.06916570663452
| | | | 30volume < 3.4278431484375E7: c0(22.0/2.0)
| | | | 30volume >= 3.4278431484375E7: c1(4.0/0.0)
| past_30average >= 75.5733317732811
| | past_30volume < 1.680683175E7: c0(47.0/1.0)
| | past_30volume >= 1.680683175E7: c2(53.0/0.0)
30average >= 83.06250047683716
| past_30volume < 2.03377383515625E7
| | 30volume < 1.077808501953125E7
| | | past_30volume < 6514848.3203125
| | | | 30average < 114.25833308696747
| | | | | 30average < 88.99583327770233
| | | | | | past_30average < 86.87916648387909: c0(2.0/0.0)
| | | | | | past_30average >= 86.87916648387909: c1(43.0/0.0)
| | | | | 30average >= 88.99583327770233
| | | | | | past_30average < 89.0083338022232: c0(27.0/1.0)
| | | | | | past_30average >= 89.0083338022232
| | | | | | | past_30average < 97.06666696071625
| | | | | | | | volume_MT < -0.21804166104012898: c2(30.0/2.0)
| | | | | | | | volume_MT >= -0.21804166104012898
| | | | | | | | | past_30average < 95.23333370685577
| | | | | | | | | | past_30average < 89.57499992847443: c1(6.0/0.0)
| | | | | | | | | | past_30average >= 89.57499992847443
| | | | | | | | | | | 30average < 90.00833404064178: c0(4.0/2.0)
| | | | | | | | | | | 30average >= 90.00833404064178
| | | | | | | | | | | | 30volume < 6488559.9765625: c2(21.0/1.0)
| | | | | | | | | | | | 30volume >= 6488559.9765625
| | | | | | | | | | | | | past_30average < 90.30833351612091
| | | | | | | | | | | | | | past_30average < 89.8125: c0(2.0/1.0)
| | | | | | | | | | | | | | past_30average >= 89.8125: c2(7.0/0.0)
| | | | | | | | | | | | | past_30average >= 90.30833351612091: c1(6.0/1.0)
| | | | | | | | | past_30average >= 95.23333370685577: c1(8.0/0.0)
| | | | | | | past_30average >= 97.06666696071625
| | | | | | | | past_30average < 116.51666676998138
| | | | | | | | | past_30average < 97.9208334684372: c1(10.0/0.0)
| | | | | | | | | past_30average >= 97.9208334684372
| | | | | | | | | | past_30average < 109.94166707992554
| | | | | | | | | | | 30volume < 5409938.328125: c1(2.0/0.0)
| | | | | | | | | | | 30volume >= 5409938.328125
| | | | | | | | | | | | volume_MT < -0.2725792239463332: c1(2.0/0.0)
| | | | | | | | | | | | volume_MT >= -0.2725792239463332: c0(9.0/1.0)
| | | | | | | | | | past_30average >= 109.94166707992554: c1(9.0/1.0)
| | | | | | | | past_30average >= 116.51666676998138: c0(15.0/2.0)
| | | | 30average >= 114.25833308696747: c2(47.0/0.0)
| | | past_30volume >= 6514848.3203125
| | | | past_30average < 85.07083308696747
| | | | | price_MT < -0.11886135627750044: c2(8.0/0.0)
| | | | | price_MT >= -0.11886135627750044
| | | | | | past_30volume < 9937318.3203125
| | | | | | | price_MT < -0.05290842702612067: c2(9.0/3.0)
| | | | | | | price_MT >= -0.05290842702612067: c1(11.0/3.0)
| | | | | | past_30volume >= 9937318.3203125: c1(52.0/0.0)
| | | | past_30average >= 85.07083308696747
| | | | | 30volume < 8215306.63671875
| | | | | | 30volume < 6421098.345703125
| | | | | | | 30average < 87.20416641235352: c0(17.0/0.0)
| | | | | | | 30average >= 87.20416641235352
| | | | | | | | volume_MT < 0.23452068635315515
| | | | | | | | | 30average < 89.77500057220459: c0(13.0/0.0)
| | | | | | | | | 30average >= 89.77500057220459
| | | | | | | | | | past_30volume < 7179101.6484375
| | | | | | | | | | | past_30average < 111.65833270549774
| | | | | | | | | | | | price_MT < -0.010370028402811551
| | | | | | | | | | | | | 30average < 91.65833365917206
| | | | | | | | | | | | | | past_30average < 89.25: c1(2.0/0.0)
| | | | | | | | | | | | | | past_30average >= 89.25: c2(3.0/0.0)
| | | | | | | | | | | | | 30average >= 91.65833365917206: c1(13.0/0.0)
| | | | | | | | | | | | price_MT >= -0.010370028402811551
| | | | | | | | | | | | | 30volume < 5697111.71875: c1(5.0/2.0)
| | | | | | | | | | | | | 30volume >= 5697111.71875: c0(3.0/0.0)
| | | | | | | | | | | past_30average >= 111.65833270549774: c2(2.0/0.0)
| | | | | | | | | | past_30volume >= 7179101.6484375: c0(11.0/3.0)
| | | | | | | | volume_MT >= 0.23452068635315515
| | | | | | | | | past_30average < 108.6958349943161
| | | | | | | | | | 30average < 96.48333370685577
| | | | | | | | | | | volume_MT < 0.5989818068801052: c1(25.0/10.0)
| | | | | | | | | | | volume_MT >= 0.5989818068801052
| | | | | | | | | | | | past_30average < 93.6958335: c2(5.0/0.0)
| | | | | | | | | | | | past_30average >= 93.6958335: c1(5.0/2.0)
| | | | | | | | | | 30average >= 96.48333370685577: c2(15.0/2.0)
| | | | | | | | | past_30average >= 108.6958349943161: c1(12.0/0.0)
| | | | | | 30volume >= 6421098.345703125
| | | | | | | past_30volume < 1.365124340625E7
| | | | | | | | past_30volume < 6653846.6640625
| | | | | | | | | past_30average < 100.30000030994415
| | | | | | | | | | 30volume < 6828783.34765625: c2(2.0/1.0)
| | | | | | | | | | 30volume >= 6828783.34765625: c1(3.0/0.0)
| | | | | | | | | past_30average >= 100.30000030994415: c0(3.0/0.0)
| | | | | | | | past_30volume >= 6653846.6640625
| | | | | | | | | 30average < 108.05000042915344
| | | | | | | | | | 30average < 98.37916719913483
| | | | | | | | | | | price_MT < 0.028488586996955848
| | | | | | | | | | | | past_30average < 88.27083313465118: c1(17.0/1.0)
| | | | | | | | | | | | past_30average >= 88.27083313465118
| | | | | | | | | | | | | past_30average < 88.51249980926514: c0(8.0/0.0)
| | | | | | | | | | | | | past_30average >= 88.51249980926514
| | | | | | | | | | | | | | past_30average < 89.51666688919067
| | | | | | | | | | | | | | | past_30average < 89.30833351612091
| | | | | | | | | | | | | | | | 30average < 88.00416672229767
| | | | | | | | | | | | | | | | | price_MT < 0.022957: c0(4.0/0.0)
| | | | | | | | | | | | | | | | | price_MT >= 0.022957: c1(8.0/1.0)
| | | | | | | | | | | | | | | | 30average >= 88.00416672229767: c1(6.0/0.0)
| | | | | | | | | | | | | | | past_30average >= 89.30833351612091: c0(5.0/0.0)
| | | | | | | | | | | | | | past_30average >= 89.51666688919067: c1(10.0/0.0)
| | | | | | | | | | | price_MT >= 0.028488586996955848: c1(32.0/1.0)
| | | | | | | | | | 30average >= 98.37916719913483: c0(6.0/0.0)
| | | | | | | | | 30average >= 108.05000042915344: c1(61.0/5.0)
| | | | | | | past_30volume >= 1.365124340625E7: c0(8.0/0.0)
| | | | | 30volume >= 8215306.63671875
| | | | | | past_30average < 120.50833451747894
| | | | | | | 30average < 93.59583342075348
| | | | | | | | 30average < 87.72083365917206
| | | | | | | | | past_30volume < 9248506.66796875
| | | | | | | | | | past_30average < 88.01249969005585
| | | | | | | | | | | 30volume < 9010618.3359375: c0(3.0/0.0)
| | | | | | | | | | | 30volume >= 9010618.3359375: c1(16.0/2.0)
| | | | | | | | | | past_30average >= 88.01249969005585: c0(11.0/0.0)
| | | | | | | | | past_30volume >= 9248506.66796875: c0(18.0/0.0)
| | | | | | | | 30average >= 87.72083365917206
| | | | | | | | | 30volume < 9662961.67578125: c1(28.0/2.0)
| | | | | | | | | 30volume >= 9662961.67578125
| | | | | | | | | | past_30average < 87.6749997138977: c0(2.0/0.0)
| | | | | | | | | | past_30average >= 87.6749997138977: c2(9.0/3.0)
| | | | | | | 30average >= 93.59583342075348
| | | | | | | | 30average < 107.48333442211151
| | | | | | | | | price_MT < -0.027045834387435536: c0(61.0/3.0)
| | | | | | | | | price_MT >= -0.027045834387435536
| | | | | | | | | | price_MT < -0.01687964036108963: c1(7.0/2.0)
| | | | | | | | | | price_MT >= -0.01687964036108963: c0(7.0/1.0)
| | | | | | | | 30average >= 107.48333442211151
| | | | | | | | | past_30average < 103.18750047683716
| | | | | | | | | | past_30average < 93.69999957084656: c2(2.0/0.0)
| | | | | | | | | | past_30average >= 93.69999957084656: c1(20.0/0.0)
| | | | | | | | | past_30average >= 103.18750047683716
| | | | | | | | | | 30average < 123.299999833107
| | | | | | | | | | | 30average < 116.06666648387909: c0(16.0/2.0)
| | | | | | | | | | | 30average >= 116.06666648387909
| | | | | | | | | | | | 30volume < 1.0631763359375E7
| | | | | | | | | | | | | past_30average < 118.69166767597198
| | | | | | | | | | | | | | past_30average < 106.08333: c0(2.0/0.0)
| | | | | | | | | | | | | | past_30average >= 106.08333: c1(24.0/5.0)
| | | | | | | | | | | | | past_30average >= 118.69166767597198: c2(3.0/0.0)
| | | | | | | | | | | | 30volume >= 1.0631763359375E7: c0(5.0/0.0)
| | | | | | | | | | 30average >= 123.299999833107: c0(15.0/0.0)
| | | | | | past_30average >= 120.50833451747894
| | | | | | | past_30volume < 8881448.365234375: c2(5.0/0.0)
| | | | | | | past_30volume >= 8881448.365234375
| | | | | | | | past_30average < 127.58333241939545: c1(17.0/0.0)
| | | | | | | | past_30average >= 127.58333241939545: c2(2.0/0.0)
| | 30volume >= 1.077808501953125E7
| | | 30volume < 2.28420066484375E7
| | | | past_30average < 92.55833280086517
| | | | | past_30average < 90.25416588783264
| | | | | | past_30average < 87.72916662693024
| | | | | | | past_30volume < 9556166.65625: c0(13.0/0.0)
| | | | | | | past_30volume >= 9556166.65625: c1(8.0/0.0)
| | | | | | past_30average >= 87.72916662693024: c1(15.0/1.0)
| | | | | past_30average >= 90.25416588783264
| | | | | | 30volume < 1.21910567578125E7: c2(15.0/1.0)
| | | | | | 30volume >= 1.21910567578125E7: c1(3.0/0.0)
| | | | past_30average >= 92.55833280086517
| | | | | past_30average < 131.8416666984558
| | | | | | 30volume < 2.03765516015625E7
| | | | | | | 30volume < 1.136689003515625E7
| | | | | | | | price_MT < -0.04136368921396885: c0(6.0/0.0)
| | | | | | | | price_MT >= -0.04136368921396885
| | | | | | | | | past_30average < 129.14999961853027
| | | | | | | | | | 30average < 115.40833258628845: c0(3.0/1.0)
| | | | | | | | | | 30average >= 115.40833258628845: c1(45.0/6.0)
| | | | | | | | | past_30average >= 129.14999961853027: c2(3.0/0.0)
| | | | | | | 30volume >= 1.136689003515625E7
| | | | | | | | past_30volume < 1.038330999609375E7
| | | | | | | | | past_30volume < 1.0308056671875E7
| | | | | | | | | | price_MT < -0.013769810465573058: c1(28.0/0.0)
| | | | | | | | | | price_MT >= -0.013769810465573058
| | | | | | | | | | | past_30average < 123.60833239555359
| | | | | | | | | | | | 30average < 124.09166646003723: c1(29.0/0.0)
| | | | | | | | | | | | 30average >= 124.09166646003723: c2(4.0/0.0)
| | | | | | | | | | | past_30average >= 123.60833239555359: c2(7.0/0.0)
| | | | | | | | | past_30volume >= 1.0308056671875E7: c2(3.0/0.0)
| | | | | | | | past_30volume >= 1.038330999609375E7
| | | | | | | | | past_30volume < 2.006368996875E7
| | | | | | | | | | price_MT < 0.04761932146681137: c1(345.0/4.0)
| | | | | | | | | | price_MT >= 0.04761932146681137
| | | | | | | | | | | past_30average < 122.23333370685: c0(4.0/0.0)
| | | | | | | | | | | past_30average >= 122.23333370685: c1(17.0/0.0)
| | | | | | | | | past_30volume >= 2.006368996875E7
| | | | | | | | | | past_30average < 129.7716679573059: c2(2.0/0.0)
| | | | | | | | | | past_30average >= 129.7716679573059: c1(2.0/0.0)
| | | | | | 30volume >= 2.03765516015625E7
| | | | | | | 30average < 127.9516670703888: c1(14.0/0.0)
| | | | | | | 30average >= 127.9516670703888: c0(10.0/2.0)
| | | | | past_30average >= 131.8416666984558
| | | | | | 30average < 140.32666850090027
| | | | | | | past_30average < 141.5266673564911
| | | | | | | | 30volume < 1.3384851640625E7: c2(8.0/0.0)
| | | | | | | | 30volume >= 1.3384851640625E7
| | | | | | | | | 30average < 140.01333475112915
| | | | | | | | | | 30average < 128.41166615486145
| | | | | | | | | | | past_30volume < 1.584262825E7
| | | | | | | | | | | | past_30average < 132.9549994468: c2(3.0/0.0)
| | | | | | | | | | | | past_30average >= 132.9549994468: c1(8.0/0.0)
| | | | | | | | | | | past_30volume >= 1.584262825E7: c0(8.0/1.0)
| | | | | | | | | | 30average >= 128.41166615486145: c1(91.0/7.0)
| | | | | | | | | 30average >= 140.01333475112915
| | | | | | | | | | past_30average < 135.96666598320007: c1(3.0/0.0)
| | | | | | | | | | past_30average >= 135.96666598320007: c0(14.0/3.0)
| | | | | | | past_30average >= 141.5266673564911: c0(24.0/1.0)
| | | | | | 30average >= 140.32666850090027
| | | | | | | past_30volume < 1.9316740078125E7
| | | | | | | | past_30volume < 1.3976396609375E7
| | | | | | | | | price_MT < -0.03302021305750843
| | | | | | | | | | past_30average < 140.36666870117188: c2(8.0/0.0)
| | | | | | | | | | past_30average >= 140.36666870117188: c1(2.0/0.0)
| | | | | | | | | price_MT >= -0.03302021305750843: c1(28.0/1.0)
| | | | | | | | past_30volume >= 1.3976396609375E7: c1(125.0/7.0)
| | | | | | | past_30volume >= 1.9316740078125E7
| | | | | | | | past_30average < 145.57166695594788
| | | | | | | | | past_30average < 144.38000059127808: c2(3.0/0.0)
| | | | | | | | | past_30average >= 144.38000059127808: c1(4.0/0.0)
| | | | | | | | past_30average >= 145.57166695594788: c2(4.0/0.0)
| | | 30volume >= 2.28420066484375E7
| | | | 30average < 122.91666626930237: c2(21.0/2.0)
| | | | 30average >= 122.91666626930237
| | | | | 30average < 142.6049988269806: c0(11.0/1.0)
| | | | | 30average >= 142.6049988269806: c1(11.0/1.0)
| past_30volume >= 2.03377383515625E7
| | past_30average < 117.99999868869781
| | | price_MT < 0.12036383786541163
| | | | past_30volume < 2.35881233125E7: c2(6.0/1.0)
| | | | past_30volume >= 2.35881233125E7
| | | | | past_30average < 81.12166583538055
| | | | | | past_30average < 76.39416575431824
| | | | | | | past_30average < 68.06916570663452: c0(2.0/0.0)
| | | | | | | past_30average >= 68.06916570663452
| | | | | | | | past_30average < 72.4608324766159
| | | | | | | | | past_30average < 70.5474990606308: c1(3.0/1.0)
| | | | | | | | | past_30average >= 70.5474990606308: c2(3.0/0.0)
| | | | | | | | past_30average >= 72.4608324766159: c1(6.0/0.0)
| | | | | | past_30average >= 76.39416575431824: c0(9.0/0.0)
| | | | | past_30average >= 81.12166583538055: c1(41.0/2.0)
| | | price_MT >= 0.12036383786541163: c2(10.0/0.0)
| | past_30average >= 117.99999868869781
| | | 30average < 121.36500036716461
| | | | 30average < 113.46416628360748
| | | | | 30average < 94.62666630744934
| | | | | | past_30average < 120.43833386898041: c2(5.0/0.0)
| | | | | | past_30average >= 120.43833386898041: c1(4.0/0.0)
| | | | | 30average >= 94.62666630744934: c2(15.0/0.0)
| | | | 30average >= 113.46416628360748: c0(36.0/0.0)
| | | 30average >= 121.36500036716461
| | | | past_30average < 129.61166787147522: c2(52.0/0.0)
| | | | past_30average >= 129.61166787147522
| | | | | past_30average < 139.19499897956848
| | | | | | 30volume < 2.393230975E7: c1(11.0/3.0)
| | | | | | 30volume >= 2.393230975E7
| | | | | | | 30volume < 2.48086630625E7: c0(2.0/0.0)
| | | | | | | 30volume >= 2.48086630625E7
| | | | | | | | price_MT < 0.07610166271935453: c1(5.0/0.0)
| | | | | | | | price_MT >= 0.07610166271935453
| | | | | | | | | past_30average < 133.67000150680542: c1(3.0/1.0)
| | | | | | | | | past_30average >= 133.67000150680542: c0(3.0/0.0)
| | | | | past_30average >= 139.19499897956848
| | | | | | past_30volume < 2.7342849921875E7
| | | | | | | past_30average < 141.0199966430664: c2(17.0/1.0)
| | | | | | | past_30average >= 141.0199966430664
| | | | | | | | past_30volume < 2.29203465625E7: c0(7.0/2.0)
| | | | | | | | past_30volume >= 2.29203465625E7
| | | | | | | | | past_30average < 144.56166315078735: c1(19.0/2.0)
| | | | | | | | | past_30average >= 144.56166315078735: c2(3.0/0.0)
| | | | | | past_30volume >= 2.7342849921875E7: c2(40.0/1.0)
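Each line of the listing above is one binary split, a full root-to-leaf path is one classification rule, and the pair of numbers at a leaf is (training instances reaching the leaf / instances misclassified there). As a reading aid only, the sketch below hand-translates the subtree under "30average < 83.0625..." into equivalent Java; it is not generated code and the class and method names are hypothetical.

// Reading aid: hand translation of the subtree under "30average < 83.0625..."
// from the CART listing above. Comments give the leaf counts (reached/misclassified).
public final class TreeReadingAid {
    static String classifyLowAverageSubtree(double avg30, double past30avg,
                                            double vol30, double past30vol) {
        if (past30avg < 75.5733317732811) {
            if (past30avg < 48.232500433921814) {
                return vol30 < 4.164597165625E7 ? "c2" : "c1";   // c2(3/0), c1(16/4)
            }
            if (avg30 < 68.06916570663452) {
                return "c0";                                     // c0(86/1)
            }
            return vol30 < 3.4278431484375E7 ? "c0" : "c1";      // c0(22/2), c1(4/0)
        }
        return past30vol < 1.680683175E7 ? "c0" : "c2";          // c0(47/1), c2(53/0)
    }

    public static void main(String[] args) {
        // A stock with these 30-day figures falls into class c0 along the third rule.
        System.out.println(classifyLowAverageSubtree(70.0, 60.0, 3.0E7, 1.5E7));
    }
}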

A.2 ID3 Testing Result
(6.1.3.5 Classification of using ID3 algorithm)
ID3 classified every instance in the test set as C1. The confusion matrix (rows: real class, columns: classified as) is:

real \ classified as    C0     C1    C2
C0                       0     51     0
C1                       0    565     0
C2                       0     62     0

Comparing with the CART algorithm, the true positive rate of each class is:

Class    ID3    CART
C0       0      0.82
C1       1      0.40
C2       0      0.50
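For reference, the sketch below recomputes the per-class true positive rates from the ID3 confusion matrix above; the diagonal entries are the correctly classified instances of each class.

// Minimal sketch: per-class true positive rate from a confusion matrix
// whose rows are the real classes and whose columns are the predicted classes.
public final class TruePositiveRate {
    public static void main(String[] args) {
        int[][] confusion = {
            {0, 51, 0},    // real C0: everything was predicted as C1
            {0, 565, 0},   // real C1
            {0, 62, 0},    // real C2
        };
        for (int c = 0; c < confusion.length; c++) {
            int rowTotal = 0;
            for (int v : confusion[c]) rowTotal += v;
            double tpr = rowTotal == 0 ? 0.0 : (double) confusion[c][c] / rowTotal;
            System.out.printf("C%d TPR = %.2f%n", c, tpr);
        }
    }
}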


A.3 Sample Data used in ID3 Algorithm
(6.1.3.5 Classification of using ID3 algorithm)

A.4 Sample Data used in CART Algorithm
(6.1.3.3 Test without using SMA in CART)


A.5 Sample Data used in K-means Clustering
(6.2.4.3 Testing using monthly percentage change data)

0001.hk and 0002.hk:
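The percentage-change features fed to k-means can be derived from the monthly closing prices in the way sketched below; the class name and the figures in main() are illustrative assumptions, not the actual 0001.hk data.

// Minimal sketch (hypothetical): monthly percentage changes as k-means features.
public final class MonthlyChange {

    // Percentage change from each month's close to the next.
    static double[] percentChanges(double[] monthlyClose) {
        double[] pct = new double[monthlyClose.length - 1];
        for (int i = 1; i < monthlyClose.length; i++) {
            pct[i - 1] = (monthlyClose[i] - monthlyClose[i - 1])
                         / monthlyClose[i - 1] * 100.0;
        }
        return pct;
    }

    public static void main(String[] args) {
        double[] hk0001 = {95.0, 97.5, 96.0, 99.1};  // illustrative closes only
        for (double p : percentChanges(hk0001)) {
            System.out.printf("%.2f%% ", p);
        }
        System.out.println();
    }
}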

