Support Vector Machines

Operations Management Project Report by-

Suryansh Kapoor

PGPM (2011 – 2013)

11P171

Section – ‘C’

Supervised by- Prof. Manoj Srivastava

Abstract

In today’s highly competitive world markets, high reliability plays an increasingly important role in the modern manufacturing industry. Accurate reliability predictions enable companies to make informed decisions when choosing among competing designs or architecture proposals. This is all the more important in specialized fields where operations management is a necessary requirement. Predicting machine reliability is therefore necessary in order to execute predictive maintenance, whose reported benefits include reduced downtime, lower maintenance costs, and fewer unexpected catastrophic failures. Here, Support Vector Machines (SVMs) come into play to predict the reliability of the necessary equipment. Various sources in medical research⁶ and other non-mining fields¹ report SVMs to be better than other classifying methods, such as Monte Carlo simulation, because SVM models have nonlinear mapping capabilities and so can more easily capture reliability data patterns than other models can. By minimizing structural risk rather than training error, the SVM model improves the generalization ability of the resulting models.

Contents

1. Objective

2. Literature Review

* Introduction to Reliability

* Introduction to Support Vector Machine

3. Case Study

4. Conclusion

5. References

Objective –

The objective of this project is to give a general and preliminary overview of SVM algorithms and the concepts and methodology associated with them, as given by various sources²,³ in different fields. Essentially, the specific methodology adopted by one source⁴ is presented for understanding, and its subsequent application to the field of mining engineering is the primary objective.

Literature review –

Introduction to reliability -

Reliability may be defined in several ways:

* The idea that something is fit for a purpose with respect to time;

* The capacity of a device or system to perform as designed;

* The resistance to failure of a device or system;

* The ability of a device or system to perform a required function under stated conditions for a specified period of time

* The probability that a functional unit will perform its required function for a specified interval under stated conditions.

* The ability of something to "fail well" (fail without catastrophic consequences)

Reliability analysis relies heavily on statistics, probability theory, and reliability theory. Many engineering techniques are used in reliability analysis, such as reliability prediction, Weibull analysis, thermal management, reliability testing and SVMs. Because of the large number of reliability techniques, their expense, and the varying degrees of reliability required for different situations, most projects develop a reliability program plan to specify the reliability tasks that will be performed for that specific system.

The purpose of reliability testing is to discover potential problems with the design as early as possible and, ultimately, provide confidence that the system meets its reliability requirements.

Reliability testing may be performed at several levels. Complex systems may be tested at component, circuit board, unit, assembly, subsystem and system levels. (The test level nomenclature varies among applications.) For example, performing environmental stress screening tests at lower levels, such as piece parts or small assemblies, catches problems before they cause failures at higher levels. Testing proceeds during each level of integration through full-up system testing, developmental testing, and operational testing, thereby reducing program risk. System reliability is calculated at each test level. Reliability growth techniques and failure reporting, analysis and corrective action systems (FRACAS) are often employed to improve reliability as testing progresses.

Introduction to Support Vector Machine⁴ –

Support vector machines (SVMs) are a set of related supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis. The original SVM algorithm was invented by Vladimir Vapnik, and the current standard incarnation (soft margin) was proposed by Corinna Cortes and Vladimir Vapnik. The standard SVM is a non-probabilistic binary linear classifier, i.e. it predicts, for each given input, which of two possible classes the input is a member of. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that predicts whether a new example falls into one category or the other. Intuitively, an SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.

More formally, a support vector machine constructs a hyperplane or set of hyperplanes in a high or infinite dimensional space, which can be used for classification, regression or other tasks. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (so-called functional margin), since in general the larger the margin the lower the generalization error of the classifier.

Whereas the original problem may be stated in a finite-dimensional space, it often happens that in that space the sets to be discriminated are not linearly separable. For this reason it was proposed that the original finite-dimensional space be mapped into a much higher-dimensional space, presumably making the separation easier in that space. SVM schemes use a mapping into a larger space so that inner products may be computed easily in terms of the variables in the original space, keeping the computational load reasonable. The inner products in the larger space are defined in terms of a kernel function K(x, y), which can be selected to suit the problem. The hyperplanes in the larger space are defined as the set of points whose inner product with a vector in that space is constant. The vectors defining the hyperplanes can be chosen to be linear combinations, with parameters αi, of images of feature vectors which occur in the database. With this choice of a hyperplane, the points x in the feature space which are mapped into the hyperplane are defined by the relation:

Σi αi K(xi, x) = constant
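Once the multipliers αi are known, a new point is classified from the sign of this kernel expansion. A minimal sketch follows; the points, labels, and α values are invented for illustration (in practice the αi come from solving the SVM training problem):

```python
import math

# Hypothetical training points, labels and multipliers (illustrative only;
# real alphas are obtained by solving the SVM optimization problem).
support_vectors = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0)]
labels = [1, 1, -1]
alphas = [0.4, 0.1, 0.5]
b = 0.0

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def decision(x):
    """f(x) = sum_i alpha_i * c_i * K(x_i, x) - b; the sign gives the class."""
    s = sum(a * c * rbf_kernel(sv, x)
            for a, c, sv in zip(alphas, labels, support_vectors))
    return s - b

print(decision((1.5, 0.8)))    # positive: falls on the first class's side
print(decision((-1.2, -0.9)))  # negative: falls on the second class's side
```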

Formalisation⁴ –

We are given some training data D, a set of n points of the form

D = { (xi, ci) | xi ∈ R^p, ci ∈ {−1, 1} }, i = 1, …, n

where ci is either 1 or −1, indicating the class to which the point xi belongs. Each xi is a p-dimensional real vector. We want to find the maximum-margin hyperplane that divides the points having ci = 1 from those having ci = −1. Any hyperplane can be written as the set of points x satisfying

w·x − b = 0

where · denotes the dot product. The vector w is a normal vector: it is perpendicular to the hyperplane. The parameter b/||w|| determines the offset of the hyperplane from the origin along the normal vector w.

[Figure: Maximum-margin hyperplane and margins for an SVM trained with samples from two classes. Samples on the margin are called the support vectors.]

We want to choose w and b to maximize the margin, i.e. the distance between the parallel hyperplanes that are as far apart as possible while still separating the data. These hyperplanes can be described by the equations

w·x − b = 1 and w·x − b = −1

Note that if the training data are linearly separable, we can select the two hyperplanes of the margin so that there are no points between them, and then try to maximize their distance. By using geometry, we find the distance between these two hyperplanes to be 2/||w||, so we want to minimize ||w||. As we also have to prevent data points from falling into the margin, we add the following constraint: for each i, either

w·xi − b ≥ 1 for xi of the first class, or

w·xi − b ≤ −1 for xi of the second.

This can be rewritten as:

ci(w·xi − b) ≥ 1, for all 1 ≤ i ≤ n

We can put this together to get the optimization problem:

Minimize (in w, b): ||w||

subject to (for any i = 1, …, n): ci(w·xi − b) ≥ 1
Primal form

The optimization problem presented in the preceding section is difficult to solve because it depends on ||w||, the norm of w, which involves a square root. Fortunately it is possible to alter the problem by substituting ||w|| with (1/2)||w||² (the factor of 1/2 being used for mathematical convenience) without changing the solution (the minimum of the original and the modified problem have the same w and b). This is a quadratic programming (QP) optimization problem. More clearly:

Minimize (in w, b): (1/2)||w||²

subject to (for any i = 1, …, n): ci(w·xi − b) ≥ 1

One could be tempted to express the previous problem by means of non-negative Lagrange multipliers αi as

min over w, b of { (1/2)||w||² − Σi αi [ci(w·xi − b) − 1] }

but this would be wrong. The reason is the following: suppose we can find a family of hyperplanes which divide the points; then all ci(w·xi − b) − 1 ≥ 0. Hence we could find the minimum by sending all αi to +∞, and this minimum would be reached for all the members of the family, not only for the best one, which can be chosen by solving the original problem.

Nevertheless the previous constrained problem can be expressed as

min over w, b of max over αi ≥ 0 of { (1/2)||w||² − Σi αi [ci(w·xi − b) − 1] }

that is, we look for a saddle point. In doing so, all the points which can be separated as ci(w·xi − b) − 1 > 0 do not matter, since we must set the corresponding αi to zero.

This problem can now be solved by standard quadratic programming techniques and programs. The solution can be expressed as a linear combination of the training vectors:

w = Σi αi ci xi

Only a few αi will be greater than zero. The corresponding xi are exactly the support vectors, which lie on the margin and satisfy ci(w·xi − b) = 1. From this one can derive that the support vectors also satisfy

w·xi − b = 1/ci = ci, i.e. b = w·xi − ci

which allows one to define the offset b. In practice, it is more robust to average over all NSV support vectors:

b = (1/NSV) Σi (w·xi − ci)

Dual form

Writing the classification rule in its unconstrained dual form reveals that the maximum-margin hyperplane, and therefore the classification task, is only a function of the support vectors, the training data that lie on the margin.

Using the fact that ||w||² = w·w and substituting w = Σi αi ci xi, one can show that the dual of the SVM reduces to the following optimization problem:

Maximize (in αi):

L(α) = Σi αi − (1/2) Σi,j αi αj ci cj xi·xj = Σi αi − (1/2) Σi,j αi αj ci cj k(xi, xj)

subject to (for any i = 1, …, n): αi ≥ 0

and to the constraint from the minimization in b:

Σi αi ci = 0

Here the kernel is defined by k(xi, xj) = xi·xj.

The α terms constitute a dual representation for the weight vector in terms of the training set:

w = Σi αi ci xi
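The dual can be checked by hand on a tiny example. The two points below are invented for illustration: for x1 = +2 (c1 = +1) and x2 = −2 (c2 = −1), the constraint Σi αi ci = 0 forces α1 = α2 = α, the dual objective reduces to 2α − 8α², and the maximizer is α = 1/8, giving w = 1/2 and b = 0:

```python
# Toy 1-D dataset (illustrative, not from the report).
xs = [2.0, -2.0]
cs = [1, -1]

def dual_objective(alphas):
    """L(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j c_i c_j (x_i . x_j)."""
    lin = sum(alphas)
    quad = sum(alphas[i] * alphas[j] * cs[i] * cs[j] * xs[i] * xs[j]
               for i in range(2) for j in range(2))
    return lin - 0.5 * quad

# The equality constraint sum_i alpha_i c_i = 0 forces alpha1 == alpha2,
# so a grid scan over that single free parameter suffices here.
best_alpha = max((a / 1000.0 for a in range(1, 1001)),
                 key=lambda a: dual_objective([a, a]))

# Recover w = sum_i alpha_i c_i x_i, and b from a support vector: c1(w*x1 - b) = 1.
w = sum(best_alpha * c * x for c, x in zip(cs, xs))
b = w * xs[0] - cs[0]

print(best_alpha, w, b)  # 0.125 0.5 0.0
```

The resulting margin 2/|w| = 4 equals the distance between the two points, as the maximum-margin picture predicts.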

Biased and unbiased hyperplanes

For simplicity reasons, sometimes it is required that the hyperplane pass through the origin of the coordinate system. Such hyperplanes are called unbiased, whereas general hyperplanes not necessarily passing through the origin are called biased. An unbiased hyperplane can be enforced by setting b = 0 in the primal optimization problem. The corresponding dual is identical to the dual given above without the equality constraint Σi αi ci = 0.

Non-linear classification

The original optimal hyperplane algorithm proposed by Vladimir Vapnik in 1963 was a linear classifier. However, in 1992, Bernhard Boser, Isabelle Guyon and Vapnik suggested a way to create non-linear classifiers by applying the kernel trick to maximum-margin hyperplanes. The resulting algorithm is formally similar, except that every dot product is replaced by a non-linear kernel function. This allows the algorithm to fit the maximum-margin hyperplane in a transformed feature space. The transformation may be non-linear and the transformed space high dimensional; thus though the classifier is a hyperplane in the high-dimensional feature space, it may be non-linear in the original input space.

If the kernel used is a Gaussian radial basis function, the corresponding feature space is a Hilbert space of infinite dimension. Maximum margin classifiers are well regularized, so the infinite dimension does not spoil the results. Some common kernels include,

* Polynomial (homogeneous): k(xi, xj) = (xi·xj)^d

* Polynomial (inhomogeneous): k(xi, xj) = (xi·xj + 1)^d

* Gaussian or Radial Basis Function: k(xi, xj) = exp(−γ ||xi − xj||²), for γ > 0. Sometimes parameterized using γ = 1/(2σ²)

* Hyperbolic tangent: k(xi, xj) = tanh(κ xi·xj + c), for some (not every) κ > 0 and c < 0
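As a sketch, these kernels can be written directly from their formulas; the test points and parameter values below are arbitrary choices for illustration:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def poly_homogeneous(x, y, d=2):
    """k(x, y) = (x . y)^d"""
    return dot(x, y) ** d

def poly_inhomogeneous(x, y, d=2):
    """k(x, y) = (x . y + 1)^d"""
    return (dot(x, y) + 1) ** d

def rbf(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2), gamma > 0"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def tanh_kernel(x, y, kappa=0.1, c=-1.0):
    """k(x, y) = tanh(kappa * x . y + c), kappa > 0 and c < 0"""
    return math.tanh(kappa * dot(x, y) + c)

x, y = (1.0, 2.0), (3.0, 0.5)
print(poly_homogeneous(x, y))    # (1*3 + 2*0.5)^2 = 16.0
print(poly_inhomogeneous(x, y))  # (4 + 1)^2 = 25.0
print(rbf(x, x))                 # 1.0: any point has RBF similarity 1 with itself
```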

The kernel is related to the transform φ(xi) by the equation k(xi, xj) = φ(xi)·φ(xj). The value w is also in the transformed space, with w = Σi αi ci φ(xi). Dot products with w for classification can again be computed by the kernel trick, i.e. w·φ(x) = Σi αi ci k(xi, x). However, there does not in general exist a value w' such that w·φ(x) = k(w', x).
Parameter selection

The effectiveness of an SVM depends on the selection of the kernel, the kernel's parameters, and the soft margin parameter C.

A common choice is a Gaussian kernel, which has a single parameter γ. The best combination of C and γ is often selected by a grid search with exponentially growing sequences of C and γ, for example, C ∈ {2⁻⁵, 2⁻³, …, 2¹³, 2¹⁵} and γ ∈ {2⁻¹⁵, 2⁻¹³, …, 2¹, 2³}. Each pair of parameters is checked using cross-validation, and the parameters with the best cross-validation accuracy are picked. The final model, which is used for testing and classifying new data, is then trained on the whole training set using the selected parameters.
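A minimal sketch of such a grid search follows. The `cross_val_accuracy` function is a hypothetical stand-in: a real implementation would train and cross-validate an SVM for each (C, γ) pair, whereas this placeholder simply peaks at an arbitrary point to make the search loop concrete:

```python
import math

# Exponentially growing grids, as suggested above.
C_grid = [2.0 ** e for e in range(-5, 16, 2)]       # 2^-5, 2^-3, ..., 2^15
gamma_grid = [2.0 ** e for e in range(-15, 4, 2)]   # 2^-15, 2^-13, ..., 2^3

def cross_val_accuracy(C, gamma):
    """Hypothetical placeholder for k-fold cross-validation accuracy.
    It is highest at C = 2^3, gamma = 2^-5, purely for demonstration."""
    return -((math.log2(C) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)

best_C, best_gamma = max(((C, g) for C in C_grid for g in gamma_grid),
                         key=lambda pair: cross_val_accuracy(*pair))
print(best_C, best_gamma)  # 8.0 0.03125, i.e. C = 2^3, gamma = 2^-5
```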

An Application -

The Load Haul Dump (LHD) machine is a principal piece of equipment in the mining industry. Failure of a single part of an LHD can cause downtime across the entire mineral transportation process. It is well known that equipment failure is preceded by certain indicators. Therefore, predicting LHD reliability is necessary in order to execute predictive maintenance.

In the given LHD system, the model for each machine should be consistent with the life period in which it resides. The life of a machine is divided into three parts: infant mortality, useful life, and wear-out. Reliability differs in each part of the equipment's life.

A Case Study –

Reliability prediction was done based on information on LHD in a mine.

The mine was situated in Central Coal Fields Limited. Mechanized bord and pillar mining was the method of operation.

The details of the specific district are –

Size of panel: 150 x 250 m
Size of pillar: 25 x 25 m
Average height of gallery: 2.4 m
Average width of gallery: 4.2 m
No. of headings in the panel: 6
Average gradient of the seam: 1 in 20
Gassiness: Degree 1

The general conditions of the working environment of the LHD were as below –

Ambient temperature: 35°-40° C
Humidity: 95-98%
Cross-gradient: 1 in 25
Gradient: 1 in 20
Steering angle: 90°
Side clearance: 1-1.5 m
Bottom clearance: 0.3-0.4 m
Make of water: Heavy
Airborne dust: Less, as the seam is watery

Concepts used –

Based on the above SVM models, we arrive at a system of equations to predict the reliability of different parts of the LHD.

The system failure rate (Weibull hazard function):

λ(t) = (β/Θ) (t/Θ)^(β−1)

where β = shape parameter, Θ = scale parameter, t = time taken.

We also have data on the mean time to failure:

MTTF (η) = 1/λ

Reliability: R(t) = e^(−(t/Θ)^β)

Also, R(t) = e^(−λt) (the special case of a constant failure rate, β = 1)
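These formulas can be sketched directly in code. The parameter values below are the hydraulic-system figures from the case-study table (β = 1.509, Θ = 367.86), with the same forecast horizon of t = 250 working hours:

```python
import math

beta, theta = 1.509, 367.86  # shape and scale parameters (hydraulic system)
t = 250.0                    # forecast horizon in working hours

def failure_rate(t, beta, theta):
    """Weibull hazard: lambda(t) = (beta/theta) * (t/theta)^(beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)

def reliability(t, beta, theta):
    """R(t) = exp(-(t/theta)^beta), the probability of surviving to time t."""
    return math.exp(-((t / theta) ** beta))

r = reliability(t, beta, theta)
print(failure_rate(t, beta, theta))  # failures per working hour at t = 250
print(r)                             # reliability over the 250-hour horizon
print(1 - r)                         # probability of failure, 1 - R(t)
```

Note that reliability decreases as the horizon t grows, as the formula requires.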

The following table shows the values taken and the reliability of the machine over a future period of 250 working hours –

Type of Failure  | β     | Θ      | η      | λ(t)    | R(t)
Motor            | 0.990 | 402.77 | 402.77 | 0.0025  | .
Hydraulic System | 1.509 | 367.86 | 330.00 | 0.003   | 0.47
Brake            | 1.034 | 303.92 | 299.80 | 0.0033  | .
Transmission     | 1.034 | 256.44 | 252.90 | 0.0038  | .
Tyres            | 1.005 | 298.37 | 297.70 | 0.00335 | .
Bucket Mechanism | 1.017 | 172.37 | 171.15 | 0.0058  | .
Trailing Cable   | 0.999 | 329.66 | 329.66 | 0.003   | .
Radiator         | 1.059 | 296.64 | 289.90 | 0.0045  | .

We have obtained the reliability values of the different parts of the LHD, forecasting with t = 250. The probability of failure is 1 − R(t).

Results Interpretation -

The following Graph shows the probability of failure of individual parts of the LHD over the next 250 working hours:-

[Bar graph: Probability of Failure of each LHD part, y-axis from 0 to 0.8]

Hence, we can infer from the graph that the bucket actuating mechanism is the most likely to fail and is therefore the least reliable part of the LHD. Similarly, the motor is the least likely to fail and is the most reliable.

Conclusion –

A method of reliability prediction based on SVMs has been presented in this project. SVM models have non-linear mapping capabilities, and so can more easily capture reliability data patterns than other models can.

Effective prediction of equipment failure data helps machine maintenance staff and equipment designers to arrange the repair schedule in order to increase throughput and productivity. An important outcome of this work is the reliability prediction of LHD machine parts over their life period. This can help us understand the various parameters and subsequently improve the design of future machines, raising productivity across various fields of operations management.

References –

1. Feng Ding, Zhengjia He, Yanyang Zi, Xuefeng Chen, Jiyong Tan, Hongrui Cao, Huaxin Chen, "Application of Support Vector Machine for Equipment Reliability Forecasting", IEEE International Conference on Industrial Informatics, DCC, Daejeon, Korea, July 13-16, 2008.

2. Ping-Feng Pai, Wei-Chiang Hong, and Yu-Shen Lee, "Determining Parameters of Support Vector Machines by Genetic Algorithms: Applications to Reliability Prediction", International Journal of Operations Research, Vol. 2, No. 1, pp. 1-7, 2005.

3. L.P. Wang, Support Vector Machines: Theory and Applications, Springer, Berlin, 2005.

4. G. Hansen, E. Sparrow, J.Y. Kokate, K.J. Leland, P.A. Iaizzo, "Wound status evaluation using color image processing", IEEE Transactions on Medical Imaging, Vol. 16, pp. 78-86, 1997. doi: 10.1109/42.552057.

5. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC394338/

PDF to Word

Similar Documents

Free Essay

Offiline Arabic Handwritten Character Recognizer Based on Feature Extraction and Support Vector Machine

...Offline handwritten Arabic character recognizer based on Feature extraction and Support vector machine Thahira banu, Assistant professor in MCA department Sankara College of Science and comerce, Coimbatore-35. thahirshanth@gmail.com. ABSTRACT: Since the problem of Arabic text recognition is a large and complex one, it makes sense to try a simple method to see what performance can be achieved. The characters are written by many people using a great variety of sizes, writing styles, instruments, and with a widely varying amount of care. Some of the characters or words are poorly formed and are hard to classify, even for a human. Of the 280 sample characters used for training, 280 have been used for test purposes. The captured image of a character is normalized and set to eight feature values as parameter values of a vector. Training has given for a character by SVM (Support Vector machine) algorithm. It attempts to work with a subset of the features in a character that a human would typically see for the identification of Arabic characters. 1. Introduction One of the most classical applications of the Artificial Neural Network is the Character Recognition System. Cost effective and less time consuming, businesses, post offices, banks, security systems, and even the field of robotics employ this system as the base of their operations. Handwriting recognition can be defined as the task of transforming text represented in the spatial form...

Words: 1773 - Pages: 8

Free Essay

Support Vecor Machine

...Nonparallel Support Vector Machines for Pattern Classification Lokesh Sharma sharma.123lokesh@gmail.com Anand Mishra anand.lnmiit@gmail.com Vaibhav Kumar Soni vbhvsoni22@gmail.com Sudhanshu Bansal sudhanshu.bansal@lnmiit.ac.in Prasant Rathore prasant.rathore@lnmiit.ac.in The LNM Institute of Information Technology, Jaipur (INDIA) Abstract—We introduce a nonparallel classifier knows as nonparllel support vector machine(NPSVM) for the purpose of binary classification. Proposed NPSVM is totally different from the existing non parallel classifier, such as the generalized eigenvalue proximal support vector machine (GEPSVM) and the twin support vector machine (TWSVM). NPSVM has several incomparable advantages:1) Two primal problems are constructed implementing the structural risk minimization principle; 2) The dual problems of these two primal problems have the same advantages as that of the standard SVMs, so that the kernel trick can be applied directly; 3)The dual problems have the same elegant formulation with that of standard SVMs and can certainly be solved efficiently by sequential minimization optimization algorithm, while existing GEPSVM or TWSVMs are not suitable for large scale problems; 4) It has the inherent sparseness as standard SVMs; 5) Existing TWSVMs are only the special cases of the NPSVM when the parameters of which are appropriately chosen. Experimental results on lots of datasets show the effectiveness of our method in both sparseness and classification...

Words: 1328 - Pages: 6

Free Essay

Natural L Anguage P Rocessing (a Lmost ) from S Cratch

...Journal of Machine Learning Research 12 (2011) 2493-2537 Submitted 1/10; Revised 11/10; Published 8/11 Natural Language Processing (Almost) from Scratch Ronan Collobert∗ Jason Weston† L´ on Bottou‡ e Michael Karlen Koray Kavukcuoglu§ Pavel Kuksa¶ RONAN @ COLLOBERT. COM JWESTON @ GOOGLE . COM LEON @ BOTTOU . ORG MICHAEL . KARLEN @ GMAIL . COM KORAY @ CS . NYU . EDU PKUKSA @ CS . RUTGERS . EDU NEC Laboratories America 4 Independence Way Princeton, NJ 08540 Editor: Michael Collins Abstract We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements. Keywords: natural language processing, neural networks 1. Introduction Will a computer program ever be able to convert a piece of English text into a programmer friendly data structure that describes the meaning of the natural language text? Unfortunately, no consensus has...

Words: 4649 - Pages: 19

Free Essay

Bio and Electrocardiogram

...Shivanesan S M. 1, Pradheep M. 1, Sharath K. 1, Aravind Prasad. 1, Manoj M. 1 Ganesan M. 2 Abstract- Electrocardiogram is the recording of the electrical potential of heart versus time. The analysis of ECG signal has great importance in the detection of cardiac abnormalities. In this paper we have dealt about the removal of noises in ECG signals and arrhythmia classification of the signal. The inputs for our analysis is taken from MIT-BIH database (Massachusetts Institute of Technology Beth Israel Hospital database). The denoising is done through wavelet transform and thresholding. Confirmatory tools such as Poincare plot and Detrended Fluctuation Analysis (DFA) are used to find out the healthiness of the signal. Then Support Vector Machine (SVM) is used to find out what type of arrhythmia is present in the signal. Keywords- Classification, DFA Electrocardiogram, MIT-BIH database, Poincare, SVM , Wavelets. I. INTRODUCTION In today’s environment there has been lot of threats due to heart disease and no proper diagnosis With the recent developments in technology, physicians have powerful tools to observe the working of the heart muscle and thus to establish their diagnosis. Among cardiovascular examinations, electrocardiogram (ECG) analysis is the most commonly used and very effective too. This is due to the fact that ECG presents useful information about the rhythm and the electrical activity of the heart. Thus, it is used for the diagnosis of cardiac ...

Words: 3286 - Pages: 14

Free Essay

Writer Adaptation for Handwriting Recognition in Hindi Language – a Survey

...International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Writer Adaptation for Handwriting Recognition in Hindi Language – A Survey Nupur Gulalkari1, Shreya Prabhu2, Anjali Pachpute3, Rekha Sugandhi4, Kapil Mehrotra5 1,2 , 3, 4 5 Computer Department, MIT College of Engineering, Pune, India GIST, Center for Development of Advanced Computing, Pune, India Abstract: With the advancement in technology, there is an increased use of pen-based touch screen devices and PDAs. These devices come with an alternative for the traditional alphanumeric or QWERTY keyboard which is input in the form of user’s handwriting. The handwriting is then converted into normal text form. However, these devices require prior training to be done by the user. There is a high demand for robust and accurate recognition systems in the practical applications of handwriting recognition. The real challenge lies with the selection of a classifier which gives accurate results in real-time, while making the system self-adaptive simultaneously. Thus, in this paper various classifiers have been studied so as to find the most appropriate classifier for anonline handwriting recognition system for handwriting in Hindi language that provides a way by which the touch screen device adapts itself to its user handwriting without prior training is studied. Keywords: Active-DTW, Markov Model, Self -adaptation, SVM, Writer adaptation 1. Introduction Hindi is the fourth most spoken language...

Words: 2640 - Pages: 11

Free Essay

Svmdd

... Consider ridge regression We want to learn = =1 Obtain w as = argmin 11 . . 1 ⋮ ⋮ = ⋮ ⋮ 1 = , = 1 ⋮ ⋮ 2 ( −( ) )2 + =1 1 ⋮ ⋮ =1 (for r-th training example) = argmin − 2 + 2 Notation: X is a matrix, x is a vector Solve by setting derivatives to zero, to get = ( + )−1 (Px1) (PxN)(NxP) (PxP) For a new example (PxN) (Nx1) = = ( + )−1 Getting to Dual form = ( + )−1  + = 1 where =  = 1 − = 1 − = − gives the dual solution, from which we can obtain w = or = =1 (here, xi is the i-th example) 11 . . 1 ⋮ ⋮ = ⋮ ⋮ 1 1 ⋮ ⋮ 1 ⋮ ⋮ Substituting w = in = we get = −  + =  = + − 1 − , We can compute as: = ( + )−1 where K = i.e. = , 11 . . ⋮ ⋮ ⋮ 1 1 ⋮ ⋮ ⋮ 11 ⋮ ⋮ ..... 1 1 ⋮ ⋮ =(xi.xj) (dot product) K: matrix of inner products of N vectors (Gram Matrix) K: matrix of similarities of example pairs (since dot product gives similarity between vectors) (1 , 1 ) . . . . . ⋮ K= ⋮ ( , 1 ) (1 , ) ⋮ ⋮ ( , ) Now, = = = , = =1 =1 (since w = ) , So in the dual form: Compute = ( + )−1 where K = , i.e. = , Evaluate on a new data point xnew as y = f = =1 , ...

Words: 2145 - Pages: 9

Free Essay

Fsr Technology

...HUMAN GAIT ANALYSIS Abstract: Sensible Shoes is a hands-free and eyes-free foot-computer interface that supports on-the-go interaction with surrounding environments. We recognize different low-level activities by measuring the user’s continuous weight distribution over the feet with twelve Force Sensing Resistor (FSR) sensors embedded in the insoles of shoes. Using the sensor data as inputs, a Support Vector Machine (SVM) classifier identifies up to eighteen mobile activities and a four-directional foot control gesture at approximately 98% accuracy. By understanding user’s present activities and foot gestures, this system offers a nonintrusive and always-available input method. We present the design and implementation of our system and several proof-of-concept applications. Overview: A person’s weight is not allocated symmetrically over the plantar. As the sole is not flat but arched, the weight mainly centers on the hallex, the first metatarse and the calcaneus. When sitting, the weight of a person’s upper body rest mostly on the chair and the weight on the feet is relatively small. When standing, the whole body’s weight is put evenly on both feet. Leaning left or right changes the weight distribution over the feet. When walking, the weight distribution changes with the pace; the weight on the front and rear part of the foot alternately increases and decreases because not all parts of the sole contact the ground at once. The changes in weight distribution on the feet...

Words: 652 - Pages: 3

Free Essay

Ewfdwefwefwf

...Fourth Week- July 2013 Sept 2013 First week- Aug 2013 Fourth Week- Aug 2013 Sept 2013 TAE - VII Guest Lecture/Industrial Visit  Early computing was entirely mechanical:      abacus (about 500 BC) mechanical adder/subtracter (Pascal, 1642) difference engine design (Babbage, 1827) binary mechanical computer (Zuse, 1941) electromechanical decimal machine (Aiken, 1944)  Mechanical and electromechanical machines have limited speed and reliability because of the many moving parts. Modern machines use electronics for most information transmission.  Computing is normally thought of as being divided into generations.  Each successive generation is marked by sharp changes in hardware and software technologies.  With some exceptions, most of the advances introduced in one generation are carried through to later generations.  We are currently in the fifth generation.  Technology        and Architecture Vacuum tubes and relay memories CPU driven by a program counter (PC) and accumulator Machines had only fixed-point arithmetic  Software and Applications Machine and assembly language Single user at a time No subroutine linkage mechanisms Programmed I/O required continuous use of CPU  Representative IAS, IBM 701 systems: ENIAC, Princeton  Technology       and Architecture Discrete transistors and core memories I/O processors, multiplexed memory access Floating-point arithmetic available Register Transfer...

Words: 2199 - Pages: 9

Free Essay

Assignment

...Active Learning with Support Vector Machines Kim Steenstrup Pedersen Department of Computer Science University of Copenhagen 2200 Copenhagen, Denmark kimstp@di.ku.dk Jan Kremer Department of Computer Science University of Copenhagen 2200 Copenhagen, Denmark jan.kremer@di.ku.dk Christian Igel Department of Computer Science University of Copenhagen 2200 Copenhagen, Denmark igel@di.ku.dk Abstract In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels (e.g., human annotations or results from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims at identifying data points that, if labeled and used for training, would most improve the learned model. Labels are then obtained only for the most promising data points. This speeds up learning and reduces labeling costs. Support vector machine (SVM) classifiers are particularly well-suited for active learning due to their convenient mathematical properties. They perform linear classification, typically in a kernel-induced feature space, which makes measuring the distance of a data point from the decision boundary straightforward. Furthermore, heuristics can efficiently estimate how strongly learning from a data point influences the current model. This information can be used to actively...

Words: 9180 - Pages: 37

Premium Essay

Predicting Student Academic Performance in an Engineering Dynamics Course: a Comparison of Four Types of Predictive Mathematical Models

...include the multiple linear regression model, the multilayer perception network model, the radial basis function network model, and the support vector machine model. The inputs (i.e., predictor variables) of the models include student's cumulative GPA, grades earned in four pre-requisite courses (statics, calculus I, calculus II, and physics), and scores on three dynamics mid-term exams (i.e., the exams given to students during the semester and before the final exam). The output of the models is students' scores on the dynamics final comprehensive exam. A total of 2907 data points were collected from 323 undergraduates in four semesters. Based on the four types of mathematical models and six different combinations of predictor variables, a total of 24 predictive mathematical models were developed from the present study. The analysis reveals that the type of mathematical model has only a slight effect on the average prediction accuracy (APA, which indicates on average how well a model predicts the final exam scores of all students in the dynamics course) and on the percentage of accurate predictions (PAP, which is calculated as the number of accurate predictions divided by the total number of predictions). The combination of predictor variables has only a slight effect on the APA, but a profound effect on the PAP. In general, the support vector machine models have the highest PAP as compared to the other three types of mathematical models. The research...

Words: 432 - Pages: 2

Free Essay

Rotordynamics

...the seals. The shaft or rotor is the rotating component of the system. Many industrial applications have flexible rotors, where the shaft is designed with a relatively long and thin geometry to maximize the space available for components such as impellers and seals. Additionally, machines are operated at high rotor speeds in order to maximize power output. The first recorded supercritical machine (operating above the first critical speed, or resonance mode) was a steam turbine manufactured by Gustaf de Laval in 1883. Modern high-performance machines normally operate above the first critical speed, generally considered the most important mode in the system, although they still avoid continuous operation at or near the critical speeds. Maintaining a critical speed margin of 15% between the operating speed and the nearest critical speed is common practice in industrial applications. The other two main components of rotor-dynamic systems are the bearings and the seals. The bearings support the rotating components of the system and provide the additional damping needed to stabilize the system and contain rotor vibration. Seals, on the other hand, prevent undesired leakage of processing or lubricating fluids inside the machine; however, they have rotor-dynamic properties that can cause large rotor vibrations when interacting with the rotor. Generally, vibration in rotor-dynamic systems can be categorized into synchronous or subsynchronous vibration depending...
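The 15% separation-margin rule quoted above can be checked mechanically. A minimal sketch, with all speed values invented:

```python
# Sketch of the 15% critical-speed separation-margin rule described above.
# The speeds are invented for illustration.
def margin_ok(operating_rpm, critical_rpms, margin=0.15):
    """True if the operating speed keeps at least `margin` relative
    separation from every critical speed in the list."""
    return all(abs(operating_rpm - c) / c >= margin for c in critical_rpms)

print(margin_ok(10000, [7000, 14000]))  # supercritical, clear of both modes
print(margin_ok(7500, [7000, 14000]))   # too close to the first critical speed
```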

Words: 12839 - Pages: 52

Premium Essay

Data Mining In Computer Science

...appropriate decisions at the appropriate time, to increase business profit. Data mining is closely related to another important area of research in Computer Science, namely Machine Learning. Machine learning is the field of research in which a machine learns from past data and makes informed and efficient decisions about the future. In a number of applications, for example optical character recognition, one needs to organize the past data in the form of training patterns. These training patterns are chosen so that the machine can make an appropriate decision when a previously unknown pattern presents itself. The training patterns are generally represented as features extracted from the data. In the case of data mining, creation of these patterns is generally not required, as we already have the data from which knowledge is to be discovered. We do, however, have to be able to extract efficient features from this data, so that a decision can be...
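A minimal illustration of learning from training patterns, using scikit-learn's bundled 8x8 digit images as a stand-in for an OCR data set; the classifier choice and its parameters are arbitrary:

```python
# Sketch of the training-pattern idea above: features extracted from labeled
# examples let a model classify previously unseen patterns. Uses scikit-learn's
# bundled digit images as a stand-in for an OCR data set.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()                   # 8x8 grayscale digit images
X, y = digits.data, digits.target        # features: 64 pixel intensities
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(gamma=0.001).fit(X_tr, y_tr)   # learn from past data...
acc = clf.score(X_te, y_te)              # ...decide on unseen patterns
print(f"accuracy on unseen digits: {acc:.2f}")
```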

Words: 2594 - Pages: 11

Free Essay

Comp Sci as

...What is the format of an IP address?
1. 4 bytes
2. First 2 – network address
3. Second 2 – host address

How is an IP address associated with a device on a network?
1. IP address specifically or automatically assigned
2. Each device has its own private IP address

What is the difference between a public IP address and a private IP address?
1. Public – dynamic (changes each time the device connects to the internet) or static (doesn't change because it is used for hosting web pages or services)
2. Private – assigned on LANs (automatically or chosen by the LAN administrator) and static; able to change, but rarely

What are a URL, an IP address, and a DNS? Why are they important?
1. URL – web address typed into a browser
2. IP address – series of numbers that tells a computer where to find information
3. DNS (Domain Name System) – collection of domain names; translates a URL into an IP address
4. Every URL has an IP address; IP addresses were too complicated, so URLs were used as shorter names for them

What happens when a user types in the IP address of a website rather than its URL?
1. The web page at that IP address shows up in the browser

Explain how a URL is used to locate a resource on the WWW and the role of the Domain Name Service
1. A URL (Uniform Resource Locator) is the web address a user types into a browser to reach a website
2. DNS translates the URL to an IP address to take the user to the desired site

What is streaming? Explain real-time and on-demand.
1. Streaming...
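The URL-to-IP chain in these notes can be traced with Python's standard library. A small sketch, using localhost so no external DNS server is needed:

```python
# Sketch of the URL -> DNS -> IP chain described above: extract the host name
# from the URL, then resolve it to an IP address. Uses localhost so the lookup
# works without network access; any URL could be substituted.
import socket
from urllib.parse import urlparse

url = "http://localhost/index.html"  # example URL typed into a browser
host = urlparse(url).hostname        # 1. pull the host name out of the URL
ip = socket.gethostbyname(host)      # 2. DNS (or hosts-file) lookup
print(f"{host} -> {ip}")             # 3. the browser connects to this IP
```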

Words: 1764 - Pages: 8

Free Essay

Control Architecture and Algorithms of Biped Robot

...Control Architecture and Algorithms of the Anthropomorphic Biped Robot Bip2000 Christine Azevedo and the BIP team INRIA - 655 Avenue de l'Europe 38330 Montbonnot, France ABSTRACT INRIA [1] and LMS [2] have designed and realized an anthropomorphic legged robot, BIP2000 (fig.1). A planar version achieves walking, and the whole robot is able to keep its balance on one foot while moving. The purpose of this paper is to present the principles and the architecture of the robot control we have used. After presenting the robotic system and the software architecture, we will detail the principles of the robot control. We will finally present implementation issues and experimental results. Keywords: Robot Control, Biped Robots, Walking Machines. 1. DESCRIPTION OF THE SYSTEM The design of the robot was inspired by human anthropometric data and dynamic capabilities. We recall here only the main characteristics of BIP2000, the reader being referred to [5] and [9] for more details. Fig1. The Biped Robot BIP2000 Fig2. BIP without Pelvis 1.1 Mechanical Structure of BIP2000 Designed by the Laboratoire de Mécanique des Solides of Poitiers [2], the robot has 15 active joints (fig.3). It is able to walk forward thanks to the rotation of the ankles, knees and hips, allowing flexion/extension of the biped in the sagittal plane (fig.4). The ability to change direction is provided by the trunk, the pelvis and the two hips' internal/external rotations. For the lateral equilibrium...

Words: 2364 - Pages: 10

Free Essay

Machine Learning

...Machine learning According to Alpaydin (2010), machine learning is an area of artificial intelligence that developed from pattern recognition and computational learning theory. It studies and constructs algorithms that can learn from data and make predictions on it. Such algorithms work by building a model from sample inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions. Machine learning is closely related to computational statistics, a discipline that aims at designing algorithms for implementing statistical methods on computers. It has strong ties to mathematical optimization, which contributes methods, theory and application domains to the field. Machine learning is used in a range of computing tasks where designing and programming explicit algorithms is infeasible (Marsland, 2009). Concepts of machine learning 1. Bayesian networks A Bayesian network is a probabilistic graphical model that represents a set of random variables and their conditional independencies via a directed acyclic graph. For example, a Bayesian network could represent the probabilistic relationships between the forex market and political unrest. Given instances of political unrest, the network can be used to compute the probability of the forex market dropping. Efficient algorithms exist that perform inference and learning...
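A toy version of the unrest/forex example above: a two-node Bayesian network where political unrest influences whether the forex market drops. All probabilities are invented, and inference is done by direct enumeration rather than a graphical-model library:

```python
# Toy two-node Bayesian network: Unrest -> Drop. All probabilities are
# invented for illustration; inference is by direct enumeration.
P_unrest = 0.1                  # P(Unrest = true)
P_drop_given = {True: 0.7,      # P(Drop | Unrest = true)
                False: 0.2}     # P(Drop | Unrest = false)

# Marginal: P(Drop) = sum over Unrest of P(Drop | Unrest) * P(Unrest)
p_drop = (P_drop_given[True] * P_unrest
          + P_drop_given[False] * (1 - P_unrest))

# Diagnostic query by Bayes' rule: P(Unrest | Drop)
p_unrest_given_drop = P_drop_given[True] * P_unrest / p_drop

print(f"P(drop) = {p_drop:.2f}")                        # 0.25
print(f"P(unrest | drop) = {p_unrest_given_drop:.2f}")  # 0.28
```

Observing a market drop raises the probability of unrest from 0.10 to 0.28, which is exactly the kind of query the network structure makes cheap to answer.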

Words: 987 - Pages: 4