Creating Usable Customer Intelligence from Social Media Data:
Network Analytics meets Text Mining

Killian Thiel Tobias Kötter Dr. Michael Berthold Dr. Rosaria Silipo Phil Winters

Killian.Thiel@uni-konstanz.de Tobias.koetter@uni-konstanz.de Michael.Berthold@uni-konstanz.de Rosaria.Silipo@KNIME.com Phil.Winters@KNIME.com

Copyright © 2012 by KNIME.com AG all rights reserved

Revision: 120403F


Table of Contents
Creating Usable Customer Intelligence from Social Media Data: Network Analytics meets Text Mining
Summary: “Water water everywhere and not a drop to drink”
Social Media Channel-Reporting Tools.
Social Media Scorecards
Predictive Analytic Techniques
The Case Study: A Major European Telco.
Public Social Media Data: Slashdot
Text Mining the Slashdot Data
Network Mining the Slashdot Data
Social Media Intelligence: Combining Text and Network Mining
New Insight - Merging Sentiment Analysis and Network Analysis
The KNIME Advantage and Conclusion


Summary: “Water water everywhere and not a drop to drink”
The Rime of the Ancient Mariner, Samuel Taylor Coleridge

Today’s wide variety of social media channels produces a huge amount of data. The challenge lies in accessing that data and transforming it into something usable and actionable. Generally, organizations want to use social media data to understand the needs and behavior of their customers, or of specific targeted groups of individuals, with respect to the organizations’ current or future products or services. There are three major approaches to looking at social media: channel-reporting tools, overview scorecard systems and predictive analytic techniques (primarily text mining). Each has its useful aspects, but each also has limitations. In this paper we discuss a fourth approach: using a predictive analytic platform that includes not only text mining but also network analysis and other predictive techniques such as clustering, both to overcome the limitations of the previous techniques and to generate new fact-based insight. This approach was first used at a major European Telco. To explain the approach in detail, this white paper works on publicly available data. We show not only sentiment analysis and influencers; by combining these techniques we are also able to show, in our example, that participants who are very negative in their sentiment are actually not highly regarded as thought leaders by the rest of the community. This is a striking result, which goes against the popular marketing adage that negative users have a very high effect on the community at large.

Note: To enable the reader to repeat our approach, we used the KNIME open source platform throughout this white paper. Sample data and workflows showing these techniques are available on the KNIME site at WWW.KNIME.COM.

Social Media Channel-Reporting Tools.
A wide range of tools, both open source and commercial, is now available to gain a first impression of a particular social media channel. Whether this is Twitter, Facebook, Google, YouTube or any of the other popular social media channels, tools or services can be found to provide an overview of that channel. These tools are generally surfaced as websites or web application components and serve as an interface to a cloud-based application that collects the data. Channel-reporting tools are particularly useful for gaining an instant overview of a focused subset of activity, or for looking at recent changes either in real time or for a fixed time frame. Virtually all provide some capacity for focusing or limiting what is looked at. This can be very helpful if you are just starting out on the road to understanding social media tools or if you have a responsibility to respond quickly to tactical activity on a particular channel. A good example of such an activity is monitoring a number of Twitter feeds related to the travel industry. If you work for an airline, you will want to know whenever that airline is mentioned in order to be able to respond quickly with an offer of help or clarification. While these tools can give you an initial overview, and can be very valuable tactically, they are not suitable for gaining a deeper insight into the behavior, needs, concerns, wishes or trends of the individuals contributing, since these tools and services do not actually provide the data, nor do they provide any context for the summarized data being presented.


Social Media Scorecards
These tools are generally cloud-based applications that pull many different social media data sources together including communities and blogs. They are able to do this because they generally incorporate a massive back end infrastructure that constantly crawls and captures new data as it occurs. They all provide an interface to filter the data and enter selection criteria to look across a broad range of channel choices. The results usually take some form of a visual scorecard that combines different graphical and tabular techniques for displaying the summarized information. Many allow an interactive “drill down” to see further details, most of them allowing you to drill right through to the original source of the data. These scorecards are very good for keeping an overview. They can give a high level perspective on such topics as positive and negative users and writings as well as the quantity and relative popularity of an individual or a topic. The disadvantages are again that the actual data collected and displayed by the cloud application cannot be made available for enhancement, contextual focusing or for doing any sort of predictive analytics.

Predictive Analytic Techniques
Predictive analytic techniques used on social media enable us to start generating new fact-based insight on the social media data. Generally, social media data contained in public forums is now accessible through standard APIs or tools that allow the data to be downloaded from the cloud. There are also specialized service providers that will scan and deliver all the data in a usable form. Traditionally, text mining has been used to perform sentiment analysis on social media data. Sentiment analysis takes the written text and translates it into different contexts, such as positive or negative. In addition, there are a number of new visualization techniques, such as the word cloud or tag cloud, that help translate the vast quantities of text-mined words into something that is more compact and therefore understandable. Text mining can be very powerful. Sentiment analysis depends on an appropriate subjectivity lexicon that understands the relative positive, neutral or negative context of a word or expression. It is both language and context specific. A good example can be seen in Figure 1.

I find PRODUCTX to be very good and useful, but it is a bit too expensive.
Figure 1: Example of text interpretation from sentiment analysis.

The expression (and therefore the PRODUCTX) is rated as positive, since there are two positive words “good” and “useful” – and one negative word “expensive”. In addition, one of the positive words is enhanced with the word “very” while the negative word is put into perspective by the qualifier “a bit”. The more advanced the lexica, the more detailed the analysis and the findings can be. Sentiment analysis using text mining can be very powerful and is a well-established, stand-alone predictive analytic technique.
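The scoring logic described for Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny lexicon, the modifier weights (1.5 for “very”, 0.5 for “bit”) and the function name `score_sentence` are hypothetical and are not taken from any real subjectivity lexicon or from KNIME.

```python
# Toy polarity lexicon and modifier weights: illustrative assumptions only.
POLARITY = {"good": 1.0, "useful": 1.0, "expensive": -1.0}
MODIFIERS = {"very": 1.5, "bit": 0.5}   # intensifier and downtoner
SKIP = {"a", "too"}                     # filler words that do not reset a modifier


def score_sentence(text):
    """Sum word polarities, scaling each by the modifier seen just before it."""
    score, weight = 0.0, 1.0
    for raw in text.lower().split():
        token = raw.strip(".,;:!?")
        if token in MODIFIERS:
            weight = MODIFIERS[token]
        elif token in POLARITY:
            score += weight * POLARITY[token]
            weight = 1.0
        elif token not in SKIP:
            weight = 1.0  # any other word cancels a pending modifier
    return score


example = "I find PRODUCTX to be very good and useful, but it is a bit too expensive."
print(score_sentence(example))  # a positive value means a positive rating
```

With these assumed weights, “very good” contributes more than “useful”, and “a bit too expensive” subtracts less than a plain “expensive” would, so the sentence as a whole stays positive.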


Network analysis of social media data, on the other hand, focuses on the relationships between individuals, using their communication on particular topics as the connectors between them. Figure 2 illustrates this well.

Figure 2: Example Network

These networks can become incredibly complex, but advanced networking techniques not only identify the network but can also translate it into either graphical representations or solid numeric features, which can be used for analytics. In this way, you can identify the users that act as leaders or influencers and the users that act as followers, as well as determine the relative strength of the leader for a particular topic or forum. Network analysis is a relatively new predictive analytic technique, and traditionally has been run as a standalone application. All other predictive analytic techniques, such as clustering, modeling and scoring, have relevance in social media but only if the data can be captured and translated into the traditional numeric and string features these techniques require. What is unique about this white paper is that by applying the KNIME platform we have been able to use both text mining and network mining within a single platform. For the first time, results have not only been provided by combining the two techniques, but data has been supplied in a form that can be used by other predictive analytics techniques.

The Case Study: A Major European Telco.
The original work combining text analysis and network mining was carried out for a major European Telecommunications company. They had invested heavily in social media platforms to enable their large community to share and discuss – for example on UEFA Cup Football via Facebook – or to support each other – for example with installation or usage questions via a community. While they had a very good social media scorecard, which gave them an immediate and tactical view of the related social media data, it was impossible to actually create new customer insight. It was also impossible for them to carry out sentiment analysis around their topics, since the lexicon used by the social media monitoring tools did not allow additions or modifications. From the scorecards, it was noticed that the most active or positive individuals were frequently later identified as staff members – and these could not be easily eliminated from the monitor tools. Over time, they also wanted to make the insight even more relevant by adding product words and other public information about the topics and individuals to try and create something even more relevant. None of this could be done.


So the public data was extracted from the cloud and brought in house. It was enhanced with product information and staff information so that very specific queries and techniques could be applied. Not only were text mining and network analytic techniques applied but, after converting the raw text back into normal data, clustering and modeling techniques were applied as well. The results were fascinating: insight into not only positive and negative sentiment, but an indication of how well the users were regarded within the community. Staff entries could be eliminated, and clusters of individuals with similar needs and response patterns were discovered. It soon became clear that the applied techniques were extremely powerful, and the desire to create a white paper documenting these new approaches was great. There was also a desire to share the approach in the form of working examples. This required us to find a publicly available source of data that everyone could use. We discovered a public social media data set very similar in shape and structure to the one used for the European Telco; this is the data that is used throughout this white paper.

Public Social Media Data: Slashdot
Slashdot is a popular website, which was created in 1997. It publishes frequent short news posts, mostly about technological questions, and allows its readers to comment on them. The user community is quite active, with more than 200 responses to a thread tending to be the rule rather than the exception. Most of the users are registered and leave comments under their nickname, although some participate anonymously. The data we used is a subset of the Slashdot homepage which is provided by Fundación Barcelona Media (http://caw2.barcelonamedia.org/node/25). The subset contains about 140,000 comments to 496 articles about politics from a total of about 24,000 users.

Text Mining the Slashdot Data
In a first step our goal is to identify negative and positive users, that is to determine whether the known (not anonymous) users express predominantly positive or negative opinions, attitudes, feelings, or sentiments in their comments and articles. In order to measure the sentiment of a user a level of attitude is determined, which measures whether a user writes his or her comments and articles mainly negatively or positively. The level of attitude can also be used to categorize the users afterwards. To categorize sentiment a lexicon containing words (clues) and, in addition to other information, their polarity is used. The polarity of a word specifies whether the word seems to evoke something positive or something negative. Possible polarity values are: positive, negative, both, and neutral. Naturally, the lexicon is incredibly important as not only the language but the contextual usage of the language for the given audience is significant. With KNIME, you can freely choose the lexicon that is most appropriate for your text data or – alternately – use KNIME to build or modify an available lexicon to suit your tasks. For the Slashdot data, the sentiment analysis of the user comments and articles is based on a publicly available lexicon called the MPQA subjectivity lexicon (http://www.cs.pitt.edu/mpqa/lexicons.html). In the MPQA lexicon, the words as well as their polarity have been identified and collected manually and automatically from annotated and non-annotated data.


Before applying text mining, it is important to understand the structure of the Slashdot data and how it is read into KNIME and transformed. An article is an initial contribution. A comment is a note or a reply to an article or to another comment. Each article with all its following comments and notes represents a document. Users write bits and pieces across many documents. To quantify a user's attitude, we then need to go through all documents and measure the amount of negativity and positivity he or she has expressed. As we have seen, a word can be seen as positive or negative just by itself or depending on the context. The frequency of negative and positive words throughout a document defines the attitude of the document. Similarly, the frequency of negative and positive words, among all words used by a specific user across all documents, defines the attitude of the user. The more negative words a user uses, the more negatively the user attitude is perceived. In contrast, the more positive words a user uses, the more positively the user attitude is perceived. We excluded the “anonymous” user from the analysis, since this represents a collection of many different users rather than a single user and therefore carries no interesting information. For each non-anonymous user, the frequencies of positive and negative words, respectively fpos(u) and fneg(u), are calculated over his/her contributions across all documents. The difference between these frequencies defines the user attitude as: λ(u) = fpos(u) - fneg(u). Positive λ values define positive users, negative λ values negative users. KNIME has a whole Text Processing category, with nodes solely devoted to reading, manipulating, and quantifying texts. The Text Processing nodes operate on a new Document data type. In the first part of the workflow built to text mine the Slashdot data, a “Table Reader” and a few traditional data manipulation nodes read the data, remove the anonymous user, and isolate the posts.
Each post is then converted into a Document data type to allow further text analysis operations. At the same time, another branch of the same workflow reads data from the MPQA corpus, extracts the polarity associated with each term, and creates two separate sets of words: the set of positive words and the set of negative words. Finally, the “Dictionary Tagger” node associates a polarity tag to each word of the Document column.
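Outside KNIME, the tagging and counting steps can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the two word sets stand in for the positive and negative word sets extracted from the lexicon, the user label "anonymous" is assumed, the names `tag` and `user_attitude` are hypothetical, and raw word counts are used in place of the frequencies computed in the workflow.

```python
from collections import Counter, defaultdict

# Tiny stand-in word sets; a real run would load them from a subjectivity lexicon.
POSITIVE = {"good", "great", "helpful"}
NEGATIVE = {"bad", "stupid", "broken"}


def tag(word):
    """Return "POS", "NEG", or None for a single lower-case word."""
    if word in POSITIVE:
        return "POS"
    if word in NEGATIVE:
        return "NEG"
    return None


def user_attitude(posts):
    """posts: iterable of (user, text) pairs. Returns {user: f_pos - f_neg}."""
    counts = defaultdict(Counter)
    for user, text in posts:
        if user == "anonymous":  # assumed label for anonymous contributions
            continue
        user_counts = counts[user]  # creates an entry even if nothing is tagged
        for raw in text.lower().split():
            label = tag(raw.strip(".,;:!?"))
            if label:
                user_counts[label] += 1
    return {user: c["POS"] - c["NEG"] for user, c in counts.items()}


posts = [
    ("alice", "Good article, great and helpful links."),
    ("bob", "Stupid idea. Bad and broken."),
    ("anonymous", "bad bad bad"),
    ("alice", "bad timing"),
]
print(user_attitude(posts))
```

Each user's contributions are pooled across all posts before the difference is taken, which mirrors the definition of λ(u) as a per-user rather than per-post quantity.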

Figure 3: Preprocessing of posts and corpus in the KNIME workflow used for text mining

Now that all words in the posts are tagged as positive or negative, we can proceed with the calculation of the level of attitude for each post and for each user.


There are two very important nodes in KNIME’s Text Processing category:
- The “BoW creator” node creates a bag of words (BoW) for a set of documents. A BoW consists of at least two columns: one containing the documents and one containing the terms occurring in the corresponding post.
- The “TF” node computes the relative term frequency (TF) of each term in each document and adds a column containing the term frequency value. The term frequency value is computed by dividing the absolute frequency of a term occurring in a post by the number of all terms of that post.
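The relative term frequency defined above is simple enough to write out directly. The following Python sketch is not the KNIME implementation; `relative_tf` and `bag_of_words` are hypothetical helper names, and posts are assumed to be already split into lists of terms.

```python
from collections import Counter


def relative_tf(terms):
    """Map each term to (absolute count in the post) / (number of terms in the post)."""
    total = len(terms)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(terms).items()}


def bag_of_words(docs):
    """docs: {doc_id: [terms]} -> list of (doc_id, term, tf) rows, one per distinct term."""
    rows = []
    for doc_id, terms in docs.items():
        for term, tf in relative_tf(terms).items():
            rows.append((doc_id, term, tf))
    return rows


for row in bag_of_words({"post-1": ["tax", "is", "bad", "tax"]}):
    print(row)
```

Because the count is divided by the length of the post, a word used twice in a four-word post weighs as much as a word used fifty times in a hundred-word post.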

Negative and positive term frequencies are then aggregated over user ID, to obtain the total frequency of negative and positive words for each user. The level of attitude of each user is then measured as the difference between the two term frequencies.

Figure 4: KNIME sub-workflow calculating and aggregating term frequency at the user level

We would like to categorize the users using only three categories, “positive”, “neutral”, and “negative”, based on their level of attitude. We assume that the user level of attitude is Gaussian distributed (Fig. 5) around a mean value µλ with a standard deviation σλ, and that most users around µλ are neutral. Therefore we assume that users with a level of attitude λ inside µλ ± σλ are neutral, while users with λ in the left tail of the Gaussian (λ ≤ µλ - σλ) are negative users and users with λ in the right tail of the Gaussian (λ ≥ µλ + σλ) are positive users. Based on the calculated values for µλ and σλ, the binning process results in 58 negative users, 21877 neutral, and 1188 positive users. Figure 6 shows a scatter plot of all known users. The X axis represents the frequency of positive words, the Y axis the frequency of negative words used by a user. Negative users are colored red, positive users are colored green, and neutral users are colored gray. The most prolific users, i.e. those that have written the most (positive and negative words), are positive or neutral users. The user with the highest number of words (positive and negative) is “99BottlesOfBeerInMyF”, which can be seen in the top right corner of Figure 6. However, he or she is not the user with the highest level of attitude.
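The three-way binning can be sketched as follows. This is a minimal illustration, assuming that the spread is the population standard deviation, that the right tail is read as λ ≥ µ + σ, and that a set of identical scores is treated as entirely neutral; the function name `bin_users` and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def bin_users(attitude):
    """attitude: {user: lambda}. Label users by where lambda falls relative to mu +/- sigma."""
    values = list(attitude.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    labels = {}
    for user, lam in attitude.items():
        if sigma == 0:
            labels[user] = "neutral"   # no spread at all: nobody stands out
        elif lam <= mu - sigma:
            labels[user] = "negative"  # left tail
        elif lam >= mu + sigma:
            labels[user] = "positive"  # right tail
        else:
            labels[user] = "neutral"
    return labels


scores = {"u1": -10, "u2": 0, "u3": 0, "u4": 0, "u5": 10}
print(bin_users(scores))
```

Because the thresholds are relative to the observed mean and spread, the same rule adapts to communities whose overall tone is more positive or more negative.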


Figure 5: Distribution of the level of attitude by user

Figure 6: Scatter plot of frequency of negative words vs. frequency of positive words for all users

The most positive user is “dada21” with over 2800 positive words and only about 1700 negative words. He or she is a frequent writer as well. A tag cloud of the 200 most frequent nouns and adjectives of “dada21” can be seen in Figure 7. Stop-words have been removed, positive words are colored green, negative words are colored red, and neutral words are colored gray. The most frequent word is “government” followed by “money” and “people”. It is clear that there are more positive than negative words.

Figure 7: Word cloud of user dada21

In contrast, the user with the lowest level of attitude is “pNutz”, with only 43 positive words and 109 negative. This user is not a frequent writer. It seems that this user simply wished to vent his or her anger once but did not want to participate in a reasonable conversation. A tag cloud of the 200 most frequent nouns and adjectives of “pNutz” can be seen in Figure 8. Stop-words have been removed, positive words are colored green, negative words are colored red, and neutral words are colored gray. Here, the most frequent word is “stupid” and the negative words outnumber the positive words.

Figure 8: Word cloud of user pNutz

At 418, the average word frequency (positive and negative) of positive users is almost twice as high as that of negative users, at 217. Thus, negative users tend to write less than positive users. The final workflow to process the texts from the Slashdot repository is shown in Figure 9.

Figure 9: The complete KNIME workflow for text processing


Network Mining the Slashdot Data
The main goal of network analysis is to detect leaders and followers based on the status and structural position of the users in the generated network. After filtering out all articles and comments submitted by anonymous users, we created a user network of 26 unconnected components. Of these 26 components, 25 contained only 3 or fewer vertices and one contained 24,055 vertices and 98,150 edges. Since we are interested in the main leaders and followers, we concentrated the analysis on this largest connected component. The network created from the Slashdot data is extremely complex, as visualized in Figure 10.
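Finding the components of such a network and keeping the largest one is a standard graph operation. The Python sketch below uses a breadth-first search and is not the KNIME Network Analysis implementation; it assumes, for illustration, that each edge links a commenting user to the author being replied to, and the name `connected_components` and the sample users are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """edges: iterable of (user, user). Returns components as sets, largest first."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)  # direction is ignored when looking for components
        adjacency[v].add(u)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


replies = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
largest = connected_components(replies)[0]
print(sorted(largest))
```

Restricting the analysis to the largest component discards small isolated groups whose scores would not be comparable with those of the main community.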

Figure 10: complete Slashdot Network

While the complete network is impressive, the power of network visualization techniques becomes clear when we focus on particular areas. In the Slashdot dataset we have subtopics that allow us to subset the data. Two examples of network visualization are around the topics of NASA and Science Fiction:


Figure 11: Focused Networking Diagrams around the topics of NASA and Science Fiction (panels: NASA; Science Fiction)

In our two examples, there are individuals that are very much at the center of all activity for their related topics. These are the leaders. Leaders are the users that post an article or comment that provokes a lot of discussion, e.g. comments about the post. These users might be of interest for opinion-making since they draw a lot of attention to their posts. Followers on the other hand comment a lot on other comments and articles but do not receive many comments on their own posts. In order to detect the main leaders and followers in our selected large component, we borrowed a centrality index from web analytics. This centrality index was mainly developed to improve the results of web searches by discovering the most authoritative web pages for a broad search topic. It assigns each vertex two different weights, called authority weight and hub weight. A vertex is assigned a high hub weight if it refers to many vertices with a high authority weight. Vertices are assigned a high authority weight if they are referenced by many vertices with a high hub weight. Therefore a high hub weight is assigned to users who frequently react to articles posted by others; in contrast a high authority weight describes those users whose articles generate a lot of comments. Figure 12 shows a scatter plot of the leader vs. follower score for all users belonging to the largest component. The X axis represents the follower score based on the hub weight whereas the Y axis represents the leader score based on the authority weight.
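The mutual definition of hub and authority weights can be computed by simple repeated updates. The Python sketch below is a minimal illustration, not the R plugin used in the workflow. It assumes that a directed edge (commenter, author) means "commenter replied to author", and it rescales each weight vector so that the largest value is 1, a choice made here only so that the top leader and top follower score 1; the names `hits` and `_scale` are hypothetical.

```python
def _scale(weights):
    """Rescale so that the largest weight is 1 (all zeros stay zero)."""
    top = max(weights.values())
    return {node: (value / top if top else 0.0) for node, value in weights.items()}


def hits(edges, iterations=50):
    """edges: list of (commenter, author). Returns (hub, authority) weight dicts."""
    nodes = {node for edge in edges for node in edge}
    if not nodes:
        return {}, {}
    hub = {node: 1.0 for node in nodes}
    authority = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub weights of everyone who replied to this user.
        new_authority = {node: 0.0 for node in nodes}
        for commenter, author in edges:
            new_authority[author] += hub[commenter]
        # Hub: sum of the authority weights of everyone this user replied to.
        new_hub = {node: 0.0 for node in nodes}
        for commenter, author in edges:
            new_hub[commenter] += new_authority[author]
        authority, hub = _scale(new_authority), _scale(new_hub)
    return hub, authority


replies = [("bob", "alice"), ("carol", "alice"), ("bob", "dave")]
hub, authority = hits(replies)
print(max(authority, key=authority.get), max(hub, key=hub.get))
```

In this toy network "alice" receives the most replies and ends up as the top leader, while "bob", who replies to both authors, ends up as the top follower.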


Figure 12: Scatter plot of leader vs. follower score for all users (labeled points: “dada21”, “Carl Bialik from the WSJ”, “Doc Ruby”)

The user that attracts immediate attention is “dada21”, who has the highest authority weight of 1 and a very high hub score of 0.9. This user thus receives a lot of comments from other users on his/her posts (high leader score) and at the same time comments often on other users' articles and comments. This user is indeed one of the most active users with regard to the political topics in Slashdot. Another user that might be of interest is the user with the pseudonym “Carl Bialik from the WSJ”. This user has a high authority weight of 0.4 but a very low hub weight of 0, implying that he is followed by a very high number of people, but never responds to anyone else’s entries. On the opposite side of the scatter plot, we find the user “Doc Ruby”. “Doc Ruby” has the highest hub weight of 1 and only a moderate authority weight of 0.2, meaning that he/she leaves a lot of comments on other users’ posts but rarely writes a post of his/her own and, if he/she does, rarely receives a comment. This makes him/her one of the top followers. Network creation and analysis as well as visualization were accomplished by using the KNIME Network Analysis suite in combination with the R and R Network mining 7 plugins, which can be automatically installed within the KNIME platform. Figure 13 shows the KNIME workflow that filters out all anonymous posts and users, creates the user network based on the Slashdot dataset, extracts the largest component, computes the authority and hub weight for each user using the R Network mining 7 plugin, and finally visualizes the authority and hub weight in a scatter plot.

Figure 13: Hybrid KNIME and R workflow for network mining

Social Media Intelligence: Combining Text and Network Mining
Text mining and network mining are by now widely used analytic approaches for revealing new insights in social media data. In texts from blogs, comments or posts, the sentiments and opinions of users on certain topics, products, or persons are often mined. However, each technique follows its own very specific goal. In text mining, the emphasis is on translating the textual data into sentiment in a carefully controlled process that focuses on words and expressions within a given context. However, the information about the actual creator of the text, the sentiment expressed in it, and the counts and numbers of readers and responders cannot reveal the relevance of that person with respect to all others in that community, nor can it reveal the relative interactions between that person and others and how they relate to each other. Network mining, on the other hand, does a very good job of identifying how individuals interact with each other. It does not rely on a categorically captured “thumbs up” or “star rating” of individuals to rate the importance of a person, but rather identifies those people of influence and those that are followers through physical nodes and connectors. Network mining is also very good at identifying anomalies, such as “I will vote for you and you will vote for me”, which is a classic challenge for text mining used in sentiment analysis. Yet, powerful as it is, network analysis alone cannot provide us with any information about the context. Until recently, there have been two major reasons for not combining these relatively new techniques. First, the academic world has developed the two techniques separately with a different analysis focus. Second, both techniques required dedicated machine-generated algorithms that implement different computational methods. This led to early separate standalone platforms, each one used to perform the specific technique required.
Two major paradigm shifts have changed all of this. First, open source data mining and predictive analytic platforms such as KNIME have emerged, which provide these techniques within the same environment. Second, businesses need both to identify positive and negative individuals around a particular topic and to establish each individual’s relative importance to the community.


Taking advantage of having both network mining and text processing available within the KNIME environment, we combined the results from the sentiment analysis with the results from the network analysis in order to better position each user inside his/her community in terms of influence (leaders vs. followers) and sentiment (positive, neutral, and negative users). The workflow that integrates the text mining results with the network analysis results can be seen in Figure 14.

Figure 14: KNIME workflow to combine network analysis results and text mining results

New Insight - Merging Sentiment Analysis and Network Analysis
In general the goal of marketing is to identify negative and neutral users and, by means of dedicated operations, to convince them to become positive users. However, working on all possible users might be impractical and expensive. Thus from time to time a new marketing trend emerges that tries to identify only the top influencers and to act on them. An even better strategy, in terms of saved resources, would be to identify only the top negative and/or neutral influencers. Indeed, it might not be as effective to try to influence an already positive user, since he or she is already writing mostly positive comments and an excessive marketing action might push his/her train of thought in an unwanted direction. Based on sentiment analysis and on text mining we might be able to position each user on a scatter plot with the attitude level on the X axis and the follower or the leader score on the Y axis. However, the real power of predictive analytics and visualization comes when we combine both scatter plots, as shown in Figure 15. The X axis represents the follower score and the Y axis represents the leader score. In addition, users are colored by their attitude: red for negative users, green for positive users, and gray for neutral users. We have here a clear separation between leaders and followers, with clear leaders being identified by those points above the diagonal, and followers below the diagonal. The top influencers are the users with the highest leader score. These users are most likely to have a high impact on other users, since their articles and comments are widely read and used as reference by other users.
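The combination step amounts to joining the per-user sentiment label with the per-user network scores and reading each user's position relative to the diagonal. A minimal Python sketch, assuming three dictionaries keyed by user name; the function names, the user names and the score values are hypothetical and not taken from the Slashdot results.

```python
def combine(follower_score, leader_score, attitude):
    """Join the three per-user tables and mark each user as leader or follower."""
    rows = []
    for user in follower_score.keys() & leader_score.keys() & attitude.keys():
        above_diagonal = leader_score[user] > follower_score[user]
        rows.append({
            "user": user,
            "follower_score": follower_score[user],
            "leader_score": leader_score[user],
            "attitude": attitude[user],
            "role": "leader" if above_diagonal else "follower",
        })
    # Highest leader score first, so the top influencers head the list.
    return sorted(rows, key=lambda row: row["leader_score"], reverse=True)


def top_leaders(rows, attitude, n=1):
    """Names of the n users with the given attitude and the highest leader score."""
    return [row["user"] for row in rows if row["attitude"] == attitude][:n]


rows = combine(
    follower_score={"user_a": 0.9, "user_b": 0.0, "user_c": 0.1, "user_d": 1.0},
    leader_score={"user_a": 1.0, "user_b": 0.4, "user_c": 0.3, "user_d": 0.2},
    attitude={"user_a": "positive", "user_b": "neutral",
              "user_c": "negative", "user_d": "negative"},
)
print(top_leaders(rows, "negative"))
```

Only users present in all three tables are kept, so a user without a sentiment label or outside the analyzed component does not appear in the combined view.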


Figure 15: Leader vs. follower score colored by attitude for all users

What first becomes clear is that very few negative users have a high leader score. One such user who clearly requires further investigation is “Catbeller”, the top negative attitude leader. The user “dada21” has the highest leader score but also the highest follower score. In addition, he or she is already positive anyway. The user “WebHostingGuy” has the highest leader score among the neutral users. A low follower score in combination with a high leader score identifies those users who get a lot of replies and only rarely post a comment on other users’ articles. Examples of such users are “WebHostingGuy” and “Snap E Tom”, who are the top leaders of the neutral users and have a very low follower score. Virtually all other negative attitude users are only occasionally followers and have almost no leader influence. This goes against the popular marketing adage that says that all negative attitude users are relevant. Also, all positive attitude thinkers to the right of the diagonal would not be relevant for our marketing actions. In fact, even though positive, they mainly just react to someone else's original thinking (whether positive or negative). On the other hand, there are a few positive attitude users above the diagonal who clearly are leaders. These users might be thought of as original thinkers, whose positive contributions are seen as relevant by the community. In general, though, it is neutral users who are best recognized as leaders in the community. The highest rated neutral attitude leader is “Carl Bialik from the WSJ”. If this really is Carl Bialik, a noted journalist from the Wall Street Journal, then it is the ultimate compliment to a journalist: strong neutral writing is appreciated and followed by readers everywhere. With this new fact-based insight, we can go beyond identifying individuals and start using grouping or clustering techniques, such as k-means, to identify groups of individuals with similar traits. A very simple example of this is shown in Figure 16.

Figure 16: Identified groups of similar readers

In this example, we have identified three groups, “Targeted Neutral”, “Targeted Positive” and “Targeted Negative”, by using the ratio of leader to follower score as well as the relative sentiment of the reader. All three of these groups would be interesting for further analysis. Next steps might include reintroducing the topics that interest a reader, as well as location, time of day/week and recency/frequency data (all available within the data), and then using techniques such as k-means to further cluster interesting groups of readers.

By combining text mining sentiment analysis with network analysis, we have created an extremely relevant fact-based insight that would not have been possible with either analytic technique alone. We have also created a new base of data describing each user in terms of his or her sentiment and influence. This new base of data is just a starting point for further user investigation, using for example traditional machine learning and predictive analytics techniques such as clustering and modeling.
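As a rough illustration of grouping by leader-to-follower ratio and sentiment, the following Python sketch assigns a user to one of the three target groups. It is an assumption-laden sketch, not the method behind Figure 16: the function name `segment`, the thresholds `min_ratio` and `neutral_band`, and the signed sentiment scale are all hypothetical.

```python
def segment(leader, follower, sentiment, min_ratio=2.0, neutral_band=0.1):
    """Assign a user to a target group, or None if the user is not targeted.

    leader, follower: non-negative scores from the network analysis.
    sentiment: signed attitude value (negative < 0 < positive), assumed scale.
    min_ratio and neutral_band are hypothetical thresholds.
    """
    # Ratio of leader to follower score; a zero follower score with a
    # positive leader score is treated as an unbounded ratio.
    if follower > 0:
        ratio = leader / follower
    else:
        ratio = float("inf") if leader > 0 else 0.0

    # Users who follow more than they lead are not targeted.
    if ratio < min_ratio:
        return None

    if sentiment > neutral_band:
        return "Targeted Positive"
    if sentiment < -neutral_band:
        return "Targeted Negative"
    return "Targeted Neutral"


# Invented example values, not taken from the Slashdot data.
print(segment(leader=0.8, follower=0.1, sentiment=0.0))
print(segment(leader=0.6, follower=0.2, sentiment=-0.5))
print(segment(leader=0.2, follower=0.7, sentiment=0.4))
```

A fixed rule like this is easy to explain to a marketing team; a clustering technique such as k-means would instead derive the group boundaries from the data.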

The KNIME Advantage and Conclusion
The combination of text mining and network mining, as shown in this paper, reveals new heterogeneous insights into social media customer behavior, which would not have been possible by using either technique on its own. Combining sentiment analysis of online forum posts with reference structures from the quotation network has allowed us to position negative and positive users in context with their relative weight as influencers or followers in the underlying discussion forum.

Although originally performed for a major European Telco and repeated here using publicly available data, this study has a strong relevance for other industries as well, as long as they can access information containing both text and networking relationship data. The approach can also be enhanced by including additional relevant data sources regarding particular focus areas, such as company and product names, political parties, known users of products, and so on. The additional data would further enhance the method's capability to identify, segment and drill down to interesting user groups.

This technique introduces additional features into the user data, improving the process of analytic investigation as popularized by the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology. This new fact-based insight provides a foundation for applying other types of investigative analytic, segmentation, predictive, and machine learning techniques. The ability to group individuals into clearly defined social media segments is of course highly relevant for all companies that already use data mining and customer intelligence techniques within their organization. In fact, a good understanding of the social media segments can make an invaluable contribution to decisions about how to invest in and shape the company’s social media and marketing strategies.

In conclusion, we see this technique as a basis for further data exploration. By using the KNIME platform, which supports not only traditional data mining, predictive analytics and machine learning, but also text mining and network analysis, we were able to create this new insight quickly and effectively.

Final Note: Because we used KNIME open source software and publicly available data sources, the complete workflows, including the data, can be freely downloaded from www.knime.com.

