A Weighted Tag Similarity Measure Based on a Collaborative Weight Model

G.R.J. Srinivas
Search and Information Extraction Lab, IIIT Hyderabad, India
srinivasg@research.iiit.ac.in

Niket Tandon
Max Planck Institute, Germany
ntandon@mpi-inf.mpg.de

Vasudeva Varma
Search and Information Extraction Lab, IIIT Hyderabad, India
vv@iiit.ac.in

ABSTRACT
The problem of measuring semantic relatedness between social tags remains largely open. Given the structure of social bookmarking systems, similarity measures need to be designed from a social bookmarking perspective. We address the fundamental problem of the weight model for tags, on which every similarity measure is based. We propose a weight model for tagging systems that considers the user dimension, unlike existing measures based on tag frequency alone. Visual analysis of tag clouds shows that the proposed model provides intuitively better weight scores than raw tag frequency. We also propose a weighted similarity model that is conceptually different from contemporary frequency-based similarity measures, and, based on it, we present weighted variations of several existing measures such as the Dice and cosine similarity measures. We evaluate the proposed similarity model using Spearman's rank correlation coefficient, with WordNet as the gold standard. Our method achieves a 20% improvement over traditional similarity measures such as Dice and cosine similarity, and also over recent tag similarity measures such as mutual information with distributional aggregation. Finally, we show the practical effectiveness of the proposed weighted similarity measures by performing search over tagged documents using Social SimRank on a large real-world dataset.

Keywords
Tagging, Vector Space Model, Tag weighting, Similarity Measures, Tag Similarity

1. INTRODUCTION
Social bookmarking systems like Delicious, Bibsonomy and CiteULike have become extremely popular in recent years [10]. Users share resources by adding keywords in the form of tags, leading to the creation of an aggregated tag index called a folksonomy¹. This large amount of user-generated content has created significant interest in the research community in exploiting the hidden semantics. Social bookmarking systems are built upon three dimensions: resources, users and tags. Existing models consider only two of the three dimensions, resources and tags, and ignore the user dimension, losing rich information in the process. For example, when the relevance (rank) of a tag with respect to a document is computed from overall frequencies alone, ignoring the user dimension, exceedingly high weights are assigned to generic, uninformative tags like web2.0 and Internet, and during weight normalization these generic tags push down important yet less frequent tags. This is a clear drawback of existing weight models. We address the loss of the user dimension first by proposing a weight model for tags that does not rely on tag frequency alone to assign importance to a tag. The weight model is built upon the vector space model with some variations. We observed that simple weighting approaches like TF-IDF do not work well as a weight model for social bookmarking systems, which makes this a challenging task. Another topic of active research is computing tag similarity, which finds use in a wide range of applications such as tag clustering, tag recommendation, query expansion and the semantic web. Several methods of computing similarities using ontological resources like WordNet have been proposed [3, 12, 13, 17]. However, these approaches cannot be applied directly to folksonomies. When users are free to choose tags, the resulting metadata can include homonyms and synonyms.
The terms in a folksonomy may have inherent ambiguity, as different users apply terms to documents in different ways. Folksonomies

¹A folksonomy is a system of classification derived from the practice of collaboratively tagging resources with descriptive strings, called tags, to annotate and categorize content.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: Miscellaneous

General Terms
Experimentation, Measurement

∗International Institute of Information Technology, Hyderabad
†Most of this work was done while the author was at IIIT Hyderabad.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SMUC’10, October 30, 2010, Toronto, Ontario, Canada. Copyright 2010 ACM 978-1-4503-0386-6/10/10 ...$10.00.

provide for no synonym control: for example, the tags mac, macintosh, and apple are all used to describe Apple Macintosh computers. Similarly, both singular and plural forms of terms appear (e.g., flower and flowers), creating a number of redundant tags. In addition, as most tagging systems do not allow word separators, many users use compound tags (combinations of words) to tag resources. Such uncontrolled vocabularies lead to ambiguity, polysemy and basic level variation [8, 18]. Hence, ontology-based similarity measures cannot be applied directly to folksonomies. There are also existing approaches in the literature for extracting tag similarities from folksonomies. The distribution of tag co-occurrence frequencies has been investigated by Cattuto et al. [6]. In [19], Zhang et al. infer global semantics from a folksonomy by applying statistical methods. In [8, 9], Golder, Halpin et al. perform extensive analyses to infer global semantics from folksonomies. In [15], Mohammad and Hirst conclude that distributional measures can easily provide domain-specific similarity measures for a large number of domains. In [4, 14], Hotho, Stumme et al. extend some of the traditional methods of finding semantic relatedness to folksonomies. The majority of the approaches mentioned above use the frequency of co-occurrence of tags to compute similarity. These approaches suffer from assigning high relatedness values to extremely generic terms and low relatedness values to relevant specialized terms. Consider a document about the future of videos, with assigned tags such as video, future, model, toread, and so on. Note that some users give self-organisation tags like toread. Now consider a query expansion task where the query is "programming model". For the expansion task, the similarity of 'video' is computed with the remaining tags, including sim(video, toread).
Existing similarity measures are directly proportional to the number of co-occurrences of the two tags. Here, the similarity value accumulates 1's in the numerator, giving a high value to sim(video, toread). This is clearly not intended. The problem arises because the existing similarity measures consider only co-occurrence, whereas the two tags have different relevance to the document. We address these problems by proposing the concept of weighted similarity. The weighted model uses the weights of tags in calculating similarities instead of the frequency of co-occurrence. In the previous example, assume we have the tag weights video: 1.0 and toread: 0.0. The weighted co-occurrence value is then proportional to Σ [weight(video) ∗ weight(toread)], taken over all co-occurrences of 'video' and 'toread'. This approach gives a very low weighted co-occurrence for these tags, hence the similarity value is low, as desired. We find that the proposed weighted similarity measures perform better than the existing measures. We use extensive evaluation to show the effectiveness of the proposed weighted similarity, based on the weight model we present. Further, we demonstrate the practical advantages of our weight model in tag visualization and show its effectiveness over existing frequency-based tag clouds. The proposed weighted similarity measures find use in several applications such as tag clustering, query expansion, tag recommendation and semantic search. Over a large real-world dataset, we demonstrate a more than two-fold improvement in precision when searching tagged

documents using Social SimRank with the weighted co-occurrence. The remainder of this paper is organized as follows. Section 2 discusses the formal folksonomy model. Section 3 explains the proposed weight model for tagging systems. Section 4 presents our weighted similarity measure concept. Section 5 presents the experimental setup and results over different benchmarks and applications such as social search. Finally, Section 6 provides concluding remarks and future work.

2. FOLKSONOMY MODEL
We use the formal definition of a folksonomy provided by Hotho et al. in [11]. Formal definition: a folksonomy is a tuple F := (U, T, R, Y) where U, T and R are finite sets whose elements are users, tags and resources respectively, and Y is a ternary relation between them, i.e. Y ⊆ U × T × R. A post is a triple (u, T_ur, r) where u ∈ U, r ∈ R and T_ur := {t ∈ T | (u, t, r) ∈ Y} is non-empty.

Figure 1: Example of a folksonomy

A folksonomy can be represented as a network, as shown in Figure 1. In this example there are 3 users, 3 resources and 4 tags. Each dot in the figure represents an annotation (tag posting). In this example user1 annotated goal.com with the tag football.
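The formal model above can be sketched directly as a set of triples. The snippet below is illustrative only: it encodes just user1's annotations (as reconstructed from Figure 1 and the per-user weight table later in the paper), and the helper name `post` is not from the paper.

```python
# Minimal sketch of the folksonomy tuple F = (U, T, R, Y).
# Only user1's annotations from the running example are included.
Y = {
    ("user1", "magazine", "cricinfo.com"),
    ("user1", "football", "goal.com"),
    ("user1", "sport",    "goal.com"),
    ("user1", "news",     "cnn.com"),
}

# U, T, R are the projections of the ternary relation Y.
U = {u for (u, t, r) in Y}
T = {t for (u, t, r) in Y}
R = {r for (u, t, r) in Y}

def post(u, r, Y):
    """T_ur: the set of tags user u assigned to resource r."""
    return {t2 for (u2, t2, r2) in Y if u2 == u and r2 == r}

print(sorted(post("user1", "goal.com", Y)))  # ['football', 'sport']
```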

3. WEIGHT MODEL FOR TAGS IN A FOLKSONOMY

Our weight model for tags is based on the Vector Space Model (VSM). In VSM, individual documents are represented as vectors in term space. Terms are words, phrases, or any other indexing units used to identify the content of a text. In the case of a folksonomy, we consider tags as terms and represent resources as vectors in tag space. Since different terms have varying importance in a text, an importance indicator, the term weight, is associated with every term. The term weighting scheme plays an important role in the similarity measures.

According to [16], the weight of a term is calculated using the following formula:

    a_ij = g_i ∗ t_ij ∗ d_j                                   (1)

where g_i is the global weight of the i-th term, t_ij is the local weight of the i-th term in the j-th document, and d_j is the normalization factor for the j-th document. Correspondingly, there are three components in a tag weighting model:

    w_td = g_t ∗ l_td ∗ n_d                                   (2)

where t ∈ T_r, w_td is the weight of tag t with respect to a document d, g_t is the global weight of the tag, l_td is the local weight of the tag in document d, and n_d is the normalization factor of document d. Let us visit the three components one by one, in the context of tagging systems:

3.0.1 Local weight
Local weight depends only on frequencies within the document, not on inter-document frequencies. In tagging, a single user will rarely assign exactly the same tag twice to a resource, so duplicates seldom appear among the tags a single user gives to a particular resource. However, we have observed that users tend to give morphological variations of words as tags. For example, consider the following set of tags given by one user for one resource: mathematics, algorithms, math, matrix, multiplication, parallel, maths, optimization. Here math and maths are the same, which indicates the importance of the tag math for that resource. Hence, we performed stemming during preprocessing. We experimented with two variants of term frequency for local weighting: simple term frequency (tf) and normalized term frequency (TF'), calculated as shown in formula 3:

    tf' = tf / document length (dl)                           (3)

We chose simple term frequency based weighting because tag frequencies within a post are never large, and normalized term frequency because some users tend to give too many tags to a resource.
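The two local-weight variants can be sketched as follows. This is a hedged illustration, not the paper's code: the function name `local_weights` is made up, and a real pipeline would apply stemming first (so that math and maths merge).

```python
# Local weight variants from Section 3: plain term frequency (tf) and
# length-normalized term frequency (TF' = tf / dl, formula 3).
def local_weights(tags, normalize=True):
    """tags: the list of tags forming one 'document' (may repeat after stemming)."""
    dl = len(tags)                      # document length
    tf = {}
    for t in tags:
        tf[t] = tf.get(t, 0) + 1
    if normalize:                       # TF' variant
        return {t: f / dl for t, f in tf.items()}
    return tf                           # plain tf variant

# 'math' and 'maths' are assumed already stemmed to the same token.
doc = ["mathematics", "algorithms", "math", "math", "parallel"]
print(local_weights(doc)["math"])  # 0.4  (2 occurrences / 5 tags)
```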

3.0.2 Global weight
Global weighting tries to give a discriminative value to each term in the corpus. It places emphasis on terms that are discriminating, based on the dispersion of a particular term in the corpus. Many schemes are based on the idea that the less frequently a term appears in the whole collection, the more discriminating it is. This is particularly relevant for tags, because tags are usually generic in nature; tags that go into more detail are more discriminating and generally less frequent.

3.0.3 Normalization
The third component of the weighting scheme is the normalization factor, which corrects for discrepancies in document lengths. In tagging systems, a resource that has been given more tags would be favoured if weights were not normalized, and it is not always true that a resource with more tags is more relevant than a resource with fewer tags. Hence it is useful to normalize the document vectors so that documents are not favoured based on their length. We have used cosine normalization, computed using the formula

    n_d = 1 / sqrt( Σ_{i=1..n} (g_ti ∗ l_ti,d)^2 )            (4)

where n is the number of terms in document d, t_i is the i-th term in the document, l_ti,d is the local weight of term t_i in the document, and g_ti is the global weight of term t_i in the corpus.

3.1 Proposed Tag Weighting Model
We have computed weights of tags in folksonomies using two different models, which differ in their notion of a document. We name them the Tag-Resource-User (TRU) model and the Tag-Resource (TR) model.

3.1.1 Tag-Resource-User (TRU) model
In this model we consider tags at the user level, hence the name Tag-Resource-User (TRU). We consider the set of tags (T_ur) given by a user to a resource as a document; each tag given by the user is a term in the document. We consider all the posts associated with a resource as a collection of documents (corpus) for computing the global weight, calculated using formula 5:

    g_tr = log( |U_r| / |{u ∈ U_r | (u, t, r) ∈ Y}| )         (5)

where U_r is the set of users who have annotated resource r. We compute the weight of a tag w.r.t. each document using the weighting formulae listed in Table 1. To obtain the weight of a tag w.r.t. a resource, w_tr, we add the weights of the tag across all users who tagged the resource, as shown in equation 6:

    w_tr = Σ_{u ∈ U_r} w_tur                                  (6)

Then we normalize the weights of the tags using cosine normalization.

    Table 1: Weighting methods used in TRU
    Weighting method    Formula
    TF-IDF              (tf in d) ∗ g_t
    TF'-IDF             ((tf in d) / dl) ∗ g_t
    TF'                 (tf in d) / dl

3.1.2 Tag-Resource (TR) model
In this model we consider tags at the resource level, hence the name Tag-Resource (TR). We consider the set of tags (T_r) given by all users to a resource as a document; the tags given by the users are the terms of the document. In this model, we calculate the global weight using the following formula:

    g_t = log( |R| / |{r ∈ R | (u, t, r) ∈ Y}| )              (7)

We compute the weight of a tag in each document using the weighting formulae listed in Table 2. We compute w_tr, which gives the weight of tag t for resource r, for all resources and tags; this will be used in computing the similarity between pairs of tags.

    Table 2: Weighting methods used in TR
    Weighting method    Formula
    TF-IDF              (tf in d) ∗ g_t ∗ n_d
    TF'-IDF             ((tf in d) / dl) ∗ g_t ∗ n_d
    TF'                 (tf in d) / dl
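The TRU pipeline for a single resource can be sketched end to end: per-user TF'-IDF weights (formulas 3 and 5), summation across users (formula 6), and cosine normalization (formula 4). This is a hedged illustration under the stated formulas; the function name `tru_weights` and the toy posts are not from the paper.

```python
import math

def tru_weights(posts):
    """TRU(TF'-IDF) weights for one resource r.
    posts: dict mapping user -> list of tags that user gave r."""
    n_users = len(posts)
    # User-level document frequency of each tag for this resource.
    df = {}
    for tags in posts.values():
        for t in set(tags):
            df[t] = df.get(t, 0) + 1
    # g_tr = log(|U_r| / |{u in U_r : (u, t, r) in Y}|)      (formula 5)
    g = {t: math.log(n_users / df[t]) for t in df}
    # TF'-IDF per user-document, summed across users          (formula 6)
    w = {}
    for tags in posts.values():
        dl = len(tags)
        for t in set(tags):
            tf_prime = tags.count(t) / dl                   # (formula 3)
            w[t] = w.get(t, 0.0) + tf_prime * g[t]
    # Cosine normalization of the resource's tag vector       (formula 4)
    norm = math.sqrt(sum(v * v for v in w.values()))
    if norm > 0:
        w = {t: v / norm for t, v in w.items()}
    return w

# Toy example from Section 1: 'video' is given by every user,
# so its global weight log(3/3) = 0 and it is pushed down.
posts = {"u1": ["video", "future"], "u2": ["video", "toread"], "u3": ["video"]}
w = tru_weights(posts)
```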

We have experimented with the different variations of weighting under the TRU and TR notions of a document, as listed in Tables 1 and 2. We now show the weights obtained using the above weight models for the tags in the folksonomy depicted in Figure 1. Using the TRU(TF-IDF) model, we get the weights shown in Table 3 for the tags given by user1 to the three URLs. Similarly, we compute the weights of the tags given by user2 and user3, and obtain the final weights shown in Table 4 after aggregating across all users. Using the TR(TF-IDF) model, we obtain the weights shown in Table 5 for all tags w.r.t. the three URLs.

    Table 3: Weights using TRU(TF-IDF) model according to user1's annotations
                  magazine  football  sport  news
    cricinfo.com  0.41      0         0      0
    goal.com      0         0.41      0.41   0
    cnn.com       0         0         0      0.41

    Table 4: Weights using TRU(TF-IDF) model
                  magazine  football  sport  news
    cricinfo.com  0.51      0         0.69   0.51
    goal.com      0         0.71      0.71   0
    cnn.com       0.42      0.57      0.57   0.42

    Table 5: Weights using TR(TF-IDF) model
                  magazine  football  sport  news
    cricinfo.com  0.5       0         0      0.5
    goal.com      0         1         0      0
    cnn.com       0.4       0.2       0      0.4

In Table 5, sport gets a weight of 0 because its global weight becomes zero: in this example, sport is attached to all the URLs. In real-world scenarios, however, a single tag is rarely used for all resources, because the information content of such a tag is low; such tags are not useful for practical applications like query suggestion and ranking. Thus the global weight penalizes very frequent tags, but as a downside it also penalizes some important terms that occur in every document. Along with these variations of tag weighting models, we have also experimented with a machine learning (ML) approach to obtain weights.

3.1.3 Tag weighting using ML approaches
In addition to the weighting models described in Sections 3.1.1 and 3.1.2, we have experimented with machine learning approaches for tag weighting. We view the weighting of a tag-resource annotation as a one-class classification problem: the probability that the annotation is relevant is taken as the weight of the tag w.r.t. the resource. We classified tags as relevant or non-relevant using different classification algorithms such as Adaboost, LibSVM and RandomForest. We trained the classifiers using frequency (tf) and the weights obtained using the formulae of Tables 1 and 2 in the TRU and TR models as features. These different variations of weighting techniques capture the importance of a tag for the resource. We then used the probability of a tag belonging to the relevant class as the weight of the tag w.r.t. the resource.

4. SIMILARITY MEASURES
In this section we define some existing similarity measures as well as our weighted similarity model. We compare the similarities obtained using our model with Dice, cosine, and mutual information with distributional aggregation. According to [14], mutual information with distributional aggregation is the best performing method, and Dice and cosine are among the best corpus-based measures; hence we consider these measures as baselines. We first define the baselines and then our weighted similarity measures. We use the following notation throughout the paper: σ(t1, t2) denotes the similarity of the pair of tags t1 and t2; t_i denotes a tag; T_i is the set of resources tagged with t_i; |T_i| is the cardinality of the set T_i.

4.1 Dice Similarity
Dice similarity for two sets X and Y is defined as

    sim = 2 |X ∩ Y| / (|X| + |Y|)                             (8)

Analogously, for folksonomies we compute the Dice similarity of a pair of tags using the following formula:

    σ(t1, t2) = 2 |T1 ∩ T2| / (|T1| + |T2|)                   (9)

For the tags in the folksonomy shown in Figure 1, we obtain the Dice similarities shown in Table 6. The value of σ_dice(football, sport) is relatively higher than σ_dice(football, news).

    Table 6: Similarities using Dice similarity
              magazine  football  sport  news
    magazine  -         0.5       0.8    1.00
    football  0.5       -         0.8    0.5
    sport     0.8       0.8       -      0.8
    news      1.00      0.5       0.8    -
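Formula 9 can be checked against the Figure 1 example in a few lines. The tag-to-resource sets below are a reconstruction implied by the Dice values in Table 6, not data given explicitly in the paper.

```python
# Reconstructed tag -> resource sets for the Figure 1 folksonomy.
T = {
    "magazine": {"cricinfo.com", "cnn.com"},
    "football": {"goal.com", "cnn.com"},
    "sport":    {"cricinfo.com", "goal.com", "cnn.com"},
    "news":     {"cricinfo.com", "cnn.com"},
}

def dice(t1, t2):
    """Set-based Dice similarity of two tags (formula 9)."""
    return 2 * len(T[t1] & T[t2]) / (len(T[t1]) + len(T[t2]))

print(dice("football", "sport"))   # 0.8
print(dice("magazine", "sport"))   # 0.8 -- same value, the limitation noted in the text
```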

It indicates that football and sport are more related than football and news. In this example, however, (football, sport) and (magazine, sport) are given the same similarity value, even though football and sport are more related than magazine and sport.

    Table 7: Similarities using Cosine similarity
              magazine  football  sport  news
    magazine  -         0.5       0.82   1.00
    football  0.5       -         0.82   0.5
    sport     0.82      0.82      -      0.82
    news      1.00      0.5       0.82   -

    Table 8: Similarities using MI
              magazine  football  sport  news
    magazine  -         0.09      0.28   0.63
    football  0.09      -         0.51   0.14
    sport     0.28      0.51      -      0.29
    news      0.63      0.14      0.29   -

    Table 9: Similarities using Weighted Dice(TF')
              magazine  football  sport  news
    magazine  -         0.68      1.35   1.34
    football  0.68      -         1.06   0.68
    sport     1.35      1.06      -      1.35
    news      1.34      0.68      1.35   -

    Table 10: Similarities using Weighted MI(TF')
              magazine  football  sport  news
    magazine  -         0.72      1.3    1.16
    football  0.72      -         1.2    0.72
    sport     1.3       1.2       -      1.3
    news      1.16      0.72      1.3    -

4.2 Cosine Similarity
Cosine similarity for two tags t1, t2 is defined as

    σ(t1, t2) = |T1 ∩ T2| / sqrt( |T1| ∗ |T2| )               (10)

For the tags in the folksonomy shown in Figure 1, we obtain the cosine similarities shown in Table 7. This measure faces the same problems as Dice similarity (Section 4.1).
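The set-based cosine of formula 10 differs from Dice only in the denominator. The sketch below reuses the same reconstructed tag-to-resource sets as in the Dice example; the sets are inferred from Table 7, not stated in the paper.

```python
import math

# Reconstructed tag -> resource sets for the Figure 1 folksonomy.
T = {
    "magazine": {"cricinfo.com", "cnn.com"},
    "football": {"goal.com", "cnn.com"},
    "sport":    {"cricinfo.com", "goal.com", "cnn.com"},
    "news":     {"cricinfo.com", "cnn.com"},
}

def cosine(t1, t2):
    """Set-based cosine similarity of two tags (formula 10)."""
    return len(T[t1] & T[t2]) / math.sqrt(len(T[t1]) * len(T[t2]))

print(round(cosine("magazine", "sport"), 2))  # 0.82, matching Table 7
```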

4.3 Distributional Mutual Information
According to [14], mutual information using distributional aggregation for a folksonomy is computed as

    σ(t1, t2) = Σ_{r1 ∈ T1} Σ_{r2 ∈ T2} p(r1, r2) log( p(r1, r2) / (p(r1) p(r2)) )   (11)

where

    p(r) = Σ_t w_tr / Σ_{t,r} w_tr ,   p(r1, r2) = Σ_t min(w_tr1, w_tr2) / Σ_{t,r} w_tr   (12)

For the tags in Figure 1, we obtain the similarities shown in Table 8 using mutual information. In this example, σ(sport, news) is the same as σ(magazine, news), which should not be the case.

4.4 Proposed Model - Weighted Similarity Measures
The similarity measures discussed so far, i.e. Dice, cosine and mutual information, give similarity values to tag pairs proportional to the co-occurrence count. This is undesirable, as shown by the sim('video', 'toread') example in Section 1. We compute the upper and lower bounds of the weighted co-occurrence. Consider two tags a, b whose similarity we want. If a and b are both completely relevant to a document, then the weighted co-occurrence for that document is upper bounded by 1. Whereas, if one of the tags is irrelevant to the document, then the weighted co-occurrence for that document becomes zero; this is the lower bound.

4.4.1 Weighted Dice Similarity
We propose a modified version of Dice similarity which uses the weight of a tag w.r.t. a resource in computing the similarity of tag pairs. We consider the association of a tag with a resource as a fuzzy relation, where the value of the association is the weight of the tag w.r.t. the resource. We define the weighted Dice similarity as

    sim(t1, t2) = Σ_{r ∈ T1 ∩ T2} w_t1r ∗ w_t2r / ( Σ_{r1 ∈ T1} w_t1r1 + Σ_{r2 ∈ T2} w_t2r2 )   (13)

Table 9 gives the similarities of the tag pairs computed using weighted Dice similarity with normalized term frequency weighting. In this case, (football, sport) is given a higher similarity value than (magazine, sport).

4.4.2 Weighted Mutual Information
We have also evaluated the impact of our weights on distributional mutual information. In weighted mutual information (weighted MI) we use the weights w_tr obtained from our weight model in formula 11. Table 10 shows the similarities of the pairs obtained using weighted MI with weights from the TR(TF-IDF) model.
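The weighted Dice measure of formula 13 can be sketched directly: co-occurrence counts are replaced by products of tag weights. The function name and the toy weights below are illustrative; the weights replay the 'video'/'toread' example from Section 1.

```python
def weighted_dice(t1, t2, w):
    """Weighted Dice similarity (formula 13).
    w: dict tag -> {resource: weight of the tag w.r.t. that resource}."""
    shared = set(w[t1]) & set(w[t2])                       # T1 ∩ T2
    num = sum(w[t1][r] * w[t2][r] for r in shared)         # weighted co-occurrence
    den = sum(w[t1].values()) + sum(w[t2].values())
    return num / den if den else 0.0

# 'video' is highly relevant to both documents, 'toread' is not,
# so their similarity stays low even though they co-occur twice.
w = {
    "video":  {"d1": 1.0, "d2": 0.9},
    "toread": {"d1": 0.0, "d2": 0.1},
}
print(weighted_dice("video", "toread", w))  # ~0.045: small despite two co-occurrences
```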

5. EVALUATION
In this section, we first describe the data used for our experiments.

5.1 Data collection
We used a publicly available crawl of Delicious² provided by DAI-Labor³. This dataset contains all public bookmarks of about 950,000 users, retrieved from Delicious between December 2007 and April 2008. The retrieval process resulted in about 132 million bookmarks, or 420 million tag assignments, posted between September 2003 and December 2007. For reasons of tractability, we randomly chose a smaller subset of 100 URLs from this dataset for our experiments. The subset contains 39,632 users, 100 URLs, 7,495 tags and 190,724 tag assignments.

² http://www.delicious.com
³ http://www.dai-labor.de


5.1.1 Labelled Data for ML Approaches
For training and testing, we randomly chose URL-tag tuples among the 420 million tuples. The tags of these tuples were then labelled manually as either relevant or irrelevant; the number of labelled examples is 717. We experimented with several learning algorithms, including RBF-kernel support vector machines as implemented in LIBSVM [7], Random Forest and Adaboost, amongst others. In Section 5.2.1 we report and evaluate the results obtained with the alternative algorithms. For evaluation, we rely on 10-fold cross-validation: the set of labelled examples is randomly partitioned into 10 equal-size parts, and an average score is computed over 10 runs. In each run, a different part is reserved for testing, and the remaining 9 parts are used as the training set.
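The cross-validation protocol above can be sketched as follows; the classifier itself (Adaboost, SVM, ...) is abstracted away, and the helper name `ten_fold_splits` is illustrative.

```python
import random

def ten_fold_splits(examples, seed=0):
    """Yield (train, test) pairs for 10-fold cross-validation:
    shuffle, split into 10 near-equal folds, hold one fold out per run."""
    ex = list(examples)
    random.Random(seed).shuffle(ex)
    folds = [ex[i::10] for i in range(10)]
    for i in range(10):
        test = folds[i]
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        yield train, test

data = list(range(717))  # stand-in for the 717 labelled url-tag tuples
splits = list(ten_fold_splits(data))
for train, test in splits:
    # every run uses all 717 examples, with disjoint train/test parts
    assert len(train) + len(test) == 717
```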

Figure 2: Tag cloud based on Frequency

5.2 Evaluation of Tag Weighting approaches
5.2.1 Weight Model Accuracy
Table 11 gives the cross-validation results for the weights obtained using different machine learning approaches such as SVM, Random Forest and Adaboost. These results indicate that Adaboost is the best performing approach for tag weighting.

    Table 11: Cross-validation results for weights learned
    Classifier     Precision  Recall  F1-Measure  ROC
    SVM            0.63       0.455   0.528       0.655
    J48            0.643      0.507   0.567       0.655
    BF-Tree        0.69       0.425   0.526       0.669
    Random-Forest  0.624      0.54    0.579       0.653
    Adaboost       0.685      0.466   0.555       0.684

Figure 3: Tag cloud based on the proposed model

5.2.2 Visual Analysis
Popular tag visualization techniques like tag clouds weight the tags in the visualization based on tag frequency. We use a modified tag cloud based on weights from our weight model, thereby assigning relevance-based weightage instead of frequency. We compare the tag clouds for the URL⁴ in Figures 2 and 3. Existing techniques assign higher weights to generic tags like web2.0 and Internet; these tags are uninformative and unhelpful in tag-based search. Our weight model is able to penalize high-frequency terms that are not relevant to the URL post.

5.3 Evaluation of Similarity Measures
There are two ways of evaluating tag similarity measures. One is to take two ranked lists of word pairs produced by two different similarity measures and compute the correlation between them using standard correlation coefficients. The other is an indirect evaluation via the performance of the similarity measures in tasks like automatic spelling correction or word sense disambiguation. We used the first approach. WordNet⁵ is a semantic lexicon of the English language, and there are a number of semantic relatedness measures based on WordNet. We used the evaluation method proposed in [5]

⁴ http://37signals.com/svn/archives2/dont_scale_99999_uptime_is_for_walmart.php
⁵ http://wordnetweb.princeton.edu/

using WordNet. According to [3], the method proposed by Jiang and Conrath [12] performs best among the WordNet-based measures. We obtained the semantic relatedness of pairs using the Jiang-Conrath distance and used it as the gold standard: we obtained a ranked list of 2000 tag pairs according to the Jiang-Conrath distance. We also obtained the similarities of those pairs of tags using Dice, MI and weighted MI, and ranked them. We used Spearman's rank correlation coefficient and Kendall's tau rank correlation coefficient to calculate the correlation between the ranked pairs; for computing Kendall's tau we used the efficient implementation of Knight's O(N log N) algorithm from [2]. Figure 4 depicts the Kendall τ and Spearman ρ correlation coefficients between each measure and the WordNet reference. We also compared our weighted Dice similarity measure with the Dice, cosine and mutual information similarity measures. The correlation coefficients measure the association between the rankings given by a similarity measure and the gold standard, i.e. WordNet. From the results shown in Figure 4, weighted Dice similarity with normalized term frequency correlates with the gold standard better than the other measures, making it the best performing method among those considered. We also evaluated our weighted similarity measures based on their performance in tag search.
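For illustration, both rank-correlation coefficients can be computed in a few lines. This sketch assumes no ties in the rankings; the Kendall implementation here is the naive O(n²) one, whereas the paper uses Knight's O(N log N) algorithm for its 2000 tag pairs.

```python
def ranks(xs):
    """Rank positions (0-based) of the values in xs, assuming no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman's rho via the rank-difference formula (no ties)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall(xs, ys):
    """Kendall's tau, naive O(n^2) pair counting (no ties)."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            concordant += s > 0
            discordant += s < 0
    return (concordant - discordant) / (n * (n - 1) / 2)

gold = [1.0, 0.8, 0.6, 0.4, 0.2]   # e.g. Jiang-Conrath scores
ours = [0.9, 0.7, 0.8, 0.3, 0.1]   # e.g. a weighted similarity measure
print(spearman(gold, ours))  # 0.9
print(kendall(gold, ours))   # 0.8
```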

Figure 4: Kendall’s and Spearman correlation coefficient comparison for all Sim measures. TRU sim measure with TF’ achieves the highest co-efficients, signifying the best performance.

Figure 5: Precision values at varying thresholds, for Dice and Weighted Dice

5.3.1 Evaluation of similarity measures in search performance
Let q = q1, q2, ..., qn be a query consisting of n query terms and A(p) = a1, a2, ..., am be the annotation set of web page p. Equation 14 shows the similarity calculation based on the Social SimRank proposed by Bao et al. in [1]:

    sim_SSR(q, p) = Σ_{i=1..n} Σ_{j=1..m} S_A(q_i, a_j)       (14)

Finding a good set of queries and relevant results for them is not an easy task. We used the approach of Bao et al. [1] of using DMOZ categories as the global ground truth. We had 10 queries and the relevant documents for those queries. As this is not sufficient to compute precision, we inject irrelevant documents into this set: for a query q1, we find two queries q2 and q3 that are most unrelated to q1 through manual inspection, and take the documents related to q2 and q3 as irrelevant to q1. Next, we compute ranking scores based on Social SimRank (Eq. 14), once using Dice similarity and once using our weighted Dice similarity. To compare the results, we compute precision values at different thresholds of the SimRank score. Figure 5 clearly shows the higher precision obtained by weighted Dice similarity, outperforming Dice similarity, and Figure 6 depicts its higher F1-measure. Further, three different users manually ranked the top 10 results of the 10 queries. To check the correlation of the ranking orders of the Dice and weighted Dice similarity measures with the manual rankings, we compute the average Kendall τ value over all ten queries. Figure 7 shows that for the majority of the queries, weighted Dice similarity outperforms Dice similarity and comes closer to the manual rankings.
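The scoring and thresholded-precision protocol above can be sketched as follows. This is an illustrative reconstruction of formula 14 and the evaluation loop, not the paper's implementation: `S_A` can be any tag similarity (e.g. the weighted Dice measure), and here a toy exact-match similarity stands in for it.

```python
def sim_ssr(query_terms, annotations, sim):
    """Social SimRank score of a page for a query (formula 14):
    sum of S_A over all (query term, annotation) pairs."""
    return sum(sim(q, a) for q in query_terms for a in annotations)

def precision_at_threshold(scored_pages, relevant, threshold):
    """Precision over pages whose SimRank score reaches the threshold.
    scored_pages: dict page -> score; relevant: set of relevant pages."""
    retrieved = {p for p, s in scored_pages.items() if s >= threshold}
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

toy_sim = lambda a, b: 1.0 if a == b else 0.0   # stand-in for S_A
scores = {
    "p1": sim_ssr(["football"], ["football", "news"], toy_sim),
    "p2": sim_ssr(["football"], ["magazine"], toy_sim),   # injected irrelevant page
}
print(precision_at_threshold(scores, {"p1"}, 0.5))  # 1.0
```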

Figure 6: F-Measure values at varying thresholds, for Dice and Weighted Dice

6. CONCLUSION AND FUTURE WORK
We have proposed a weight model for tags in a folksonomy. Among the variants of the weight model, normalized term frequency with the Tag-Resource-User (TRU) model performs best. We showed an application of our weight model to tag visualization, achieving a more intuitive tag cloud than frequency-based tag clouds. We introduced the concept of weighted similarity and proposed similarity measures that extend the traditional similarity measures using this concept. The proposed similarity measure outperforms the existing similarity measures on metrics like the Kendall and Spearman correlation coefficients, and gives impressive results on search using Social SimRank over a large real-world dataset. As an extension of this work, we plan to explore the effectiveness of our weighted similarity model in applications such as tag clustering, tag recommendation, and resource similarity.

Figure 7: Kendall coefficient values for Dice and Weighted Dice search rankings compared to the human-evaluated ranking.

7. REFERENCES

[1] S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su. Optimizing web search using social annotations. In Proceedings of the 16th International Conference on World Wide Web (WWW), pages 501-510. ACM, 2007.
[2] P. Boldi, M. Santini, and S. Vigna. Do your worst to make the best: Paradoxical effects in PageRank incremental computations. In Proceedings of the Third Workshop on Web Graphs (WAW), volume 3243 of Lecture Notes in Computer Science, pages 168-180. Springer, 2004.
[3] E. Budanitsky and G. Hirst. Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32:13-47, 2006.
[4] C. Cattuto, D. Benz, A. Hotho, and G. Stumme. Semantic analysis of tag similarity measures in collaborative tagging systems. In Proceedings of the 3rd Workshop on Ontology Learning and Population (OLP3), pages 39-43, Patras, Greece, July 2008.
[5] C. Cattuto, D. Benz, A. Hotho, and G. Stumme. Semantic grounding of tag relatedness in social bookmarking systems. In ISWC '08: Proceedings of the 7th International Conference on The Semantic Web, pages 615-631, Berlin, Heidelberg, 2008. Springer-Verlag.
[6] C. Cattuto, V. Loreto, and L. Pietronero. Semiotic dynamics and collaborative tagging. Proceedings of the National Academy of Sciences (PNAS), 104(5):1461-1464, January 2007.
[7] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines, 2001.
[8] S. Golder and B. A. Huberman. The structure of collaborative tagging systems. Journal of Information Science, 32(2):198-208, Aug. 2005.
[9] H. Halpin, V. Robu, and H. Shepard. The dynamics and semantics of collaborative tagging. In Proceedings of the 1st Semantic Authoring and Annotation Workshop (SAAW'06), 2006.
[10] T. Hammond, T. Hannay, B. Lund, and J. Scott. Social bookmarking tools (I). D-Lib Magazine, 2005.
[11] A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. Information retrieval in folksonomies: Search and ranking. In Y. Sure and J. Domingue, editors, The Semantic Web: Research and Applications, volume 4011 of Lecture Notes in Computer Science, pages 411-426, Heidelberg, June 2006. Springer.
[12] J. Jiang and D. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics, pages 19-33, 1997.
[13] D. Lin. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, pages 296-304. Morgan Kaufmann, 1998.
[14] B. Markines, C. Cattuto, F. Menczer, D. Benz, A. Hotho, and G. Stumme. Evaluating similarity measures for emergent semantics of social tagging. In WWW '09: Proceedings of the 18th International Conference on World Wide Web, pages 641-650, New York, NY, USA, 2009. ACM.
[15] S. Mohammad and G. Hirst. Distributional measures as proxies for semantic relatedness. Submitted for publication, 2005.
[16] N. Polettini. The vector space model in information retrieval: Term weighting problem, 2004.
[17] P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, 1995.
[18] L. F. Spiteri. Structure and form of folksonomy tags: The road to the public library catalogue. Webology, volume 4, chapter 2, 2007.
[19] L. Zhang, X. Wu, and Y. Yu. Emergent semantics from folksonomies: A quantitative study. Journal on Data Semantics, VI:168-186, 2006.
