IEEE International Conference on Data Engineering
Business Intelligence from Voice of Customer
L. Venkata Subramaniam, Tanveer A. Faruquie, Shajith Ikbal, Shantanu Godbole, Mukesh K. Mohania
IBM India Research Lab, India
{lvsubram,ftanveer,shajmoha,shantanugodbole,mkmukesh}@in.ibm.com
Abstract— In this paper, we present a first-of-a-kind system, called Business Intelligence from Voice of Customer (BIVoC), that can 1) combine unstructured and structured information in an information-intensive enterprise and 2) derive richer business insights from the combined data. Unstructured information, in this paper, refers to Voice of Customer (VoC) obtained from the interaction of customers with the enterprise, namely conversations with call-center agents, email, and SMS. Structured databases reflect only those business variables that are static over (a longer window of) time, such as educational qualification, age group, and employment details. In contrast, a combination of unstructured and structured data provides access to business variables that reflect the up-to-date, dynamic requirements of customers and, more importantly, indicate trends that are difficult to derive from a larger population of customers through any other means. For example, some of the variables reflected in unstructured data are problems with or interest in a certain product, expressions of dissatisfaction with the business provided, and previously unexplored categories of people showing a certain interest or problem. This gives the BIVoC system the ability to derive business insights that are richer, more valuable, and more crucial to enterprises than those of traditional business intelligence systems, which utilize only structured information. We demonstrate the effectiveness of the BIVoC system through one of our real-life engagements, where the problem is to determine how to improve agent productivity in a call center scenario. We also highlight major challenges faced while dealing with unstructured information, such as handling noise and linking with structured data.
I. INTRODUCTION
Business Intelligence (BI) refers to methodologies and technologies for the collection, integration, and analysis of all relevant information in a business for the purpose of better business decision making. In essence, this means discovering the variables crucial to the business and their correlation with the variables that define its success. There are several established products in the market to perform BI, such as SAS¹ and Cognos². These products use back-end structured information stored in a data warehouse, including customer profiles, transactions, billing, and usage data.
Structured information constitutes only a small portion of the large volumes of information available in enterprises. Gartner³ estimates that structured data constitutes only 20% of the complete enterprise data. The major portion of enterprise information is unstructured data, which includes, among others, communications from customers, training manuals, and product manuals. These communications from the customers,
1 http://www.sas.com
2 http://www.cognos.com
3 http://www.gartner.com
1084-4627/09 $25.00 © 2009 IEEE
DOI 10.1109/ICDE.2009.41
which we refer to in this paper as Voice of Customer (VoC), constitute a major portion of this data and are ever expanding.
VoC, when combined with structured data, is an interesting candidate from a BI perspective because it provides a dynamic view of customer needs, problems, opinions, sentiments, inclinations, and propensities that change from time to time. Access to these variables provides an opportunity to dynamically optimize and control the entire business process more effectively.
Realizing the importance of VoC, enterprises are now beginning to use it along with structured data to get insights that can improve performance in terms of quality, operational efficiency, and revenue. However, enterprises today resort to manual processes to derive insights from VoC. A small set of people called quality analysts (QAs) randomly select some voice recordings or emails and use survey verbatims to analyze metrics such as communication skills, problem resolution, lapses in quality, service time, and so on. This kind of manual analysis suffers from problems such as small sample sets and subjectivity between QAs. More importantly, it is a very time-consuming process and results in shallow analysis.
In this paper, we describe a system we developed called BIVoC (Business Intelligence from Voice of Customer), in which a significant portion of the VoC analysis and of the integration of VoC with structured information for deriving BI is automated. Advantages of BIVoC include:
1) Ability to derive richer business insights, as a result of the use of a new set of business variables extracted from the combination of structured and unstructured data.
2) Ability to derive accurate business insights, as a single system is used over a large collection of data to compute statistics.
3) Removal of subjectivity, as there are none of the disagreements and confusions that arise in manual analysis, where QAs typically differ in their perceptions.
4) Reduction in turnaround time, thus providing an opportunity to optimize and control business processes dynamically with a smaller response time.
In this paper, we illustrate the potential of our system with one of the solutions we derived for a call center partner in the car rental business. This partner wanted to know how to gain a competitive edge by improving its booking rates. We took the audio transcripts of their customer-agent conversations and linked them with structured records containing information such as business outcomes,
Authorized licensed use limited to: KnowledgeGate from IBM Market Insights. Downloaded on July 31, 2009 at 22:10 from IEEE Xplore. Restrictions apply.
agent names, car types, booking cost, booking duration, and so on. Text mining techniques in the BIVoC system were then used to identify the differences between the approaches and practices used by successful and unsuccessful agents.
This resulted in the discovery of call handling behaviours of the agents that influenced the booking rates positively and negatively. These findings were incorporated in the training program of selected agents, and their performance was measured against that of other agents not included in the training programme and also against their own past performance. This resulted in a 3% improvement in the booking rates. Extracting such business insights is almost impossible with manual analysis or existing BI systems.
The organization of this paper is as follows: Section 2 discusses some prior work in traditional BI and in BI using VoC. Section 3 takes a deeper look at VoC and discusses major issues and challenges in using it in a BI system. Section 4 describes the architecture of our BIVoC system. Sections 5 and 6 present use cases. Sections 7 and 8 discuss and conclude the paper.
A. Contributions of this paper
In this paper, we describe a BI system we have developed, called BIVoC, that can derive richer business insights from a combination of structured and unstructured information in information-intensive enterprises. The unique contributions of our system are:
1) An engine to link VoC to a structured database, which is a particularly challenging task given that VoC is usually very noisy.
2) An integrated architecture for extracting business insights through unified analysis of a large collection of structured and unstructured data.
3) An Automatic Speech Recognition (ASR) engine, where effort has been made to improve the recognition accuracy of specific parts of speech that are important for further analysis to derive business insights.
Additional contributions of this paper are the discussions of the various challenges faced while building the system.
II. BACKGROUND
Business Intelligence systems are used widely across many industries such as retail, finance, insurance and telecom. BI systems are typically used to monitor business conditions, track Key Performance Indicators (KPIs), serve as decision support systems, perform data mining and do predictive analysis.
They are used in a variety of ways like real time dashboards, interactive OLAP tools or static reports. Traditionally BI systems operate on structured data gathered in a data warehouse.
These systems usually use data such as transactional data, billing data, usage history and call records for applications such as churn prediction [24], customer lifetime value modeling [17],[15], campaign management [16], customer wallet estimation [12] and data mining [1]. There are several mature products available in the market which operate on structured
data. Recently, text mining technologies have been used to search, organize and extract value from voice of customer [14][4][19][20]. However, most unstructured data processing is still done manually by QAs.
There has been a lot of work on specific tools for contact centers. Some of these are geared towards automating manual processes and thus reducing contact center operation costs. These include call type classification for the purpose of categorizing calls [21], automatic call routing [10] [7], agent assisting and monitoring [13], building of domain models [18] and analyzing records of customer contacts [14]. Other tools are used to measure and track the KPIs of contact centers and self-service voice applications [5]. Companies like NICE⁴ and VERINT⁵ provide analysis tools for measuring and monitoring agent performance in terms of average handle time, tone, emotion and agent accent. They also use word spotting [23][22] technologies to index audio conversations and provide a framework to write rules to discover associations. However, these tools are not geared towards discovering patterns in the larger business interest, since they focus on tracking contact center performance rather than business metrics.
III. VOICE OF CUSTOMER
Voice of Customer (VoC), as defined in Section 1, refers to customer communications, such as conversational voice recordings, emails, text messages, chat transcripts, and agent notes. Most of the VoC is collected through contact centers.
VoC can enrich business insights and turn contact centers, which are often viewed as cost centers, into profit centers. Apart from deriving valuable insights through contact centers, it is possible to influence customers back through contact centers based on those insights. However, carrying this out effectively requires effective BI systems for VoC.
Figure 1 shows a few examples of sanitized VoC, where phrases that are highlighted provide valuable feedback to the enterprise. In these examples, we have highlighted phrases that refer to service quality issues and point to efficiency lapses.
They also reflect the sentiments and opinions of the customers and indicate the level of (dis)satisfaction of the customer or his churn propensity. They also point to the products, services, and features customers are interested in. Enterprises have access to this kind of valuable information only through the VoC. However, VoC is often noisy, with multilingual phrases, unconventional abbreviations and shorthand, and spelling and grammatical mistakes.
Figure 2 gives a high level overview of how VoC can be used to mine information for operational insights and customer value management. When the VoC is analyzed in the context of existing structured enterprise data it can also answer the whys apart from just the whos. For example, apart from discovering who is potentially going to churn, it is also possible to discover why a person is going to churn. This gives the organisation an opportunity to be more proactive.
4 http://www.nice.com
5 http://www.verint.com
Contact center notes:
1. the cust secratory called up and he inf tht he was not able to access GPRS
,he was not able to confirm whether its or ,and he told that he will call back with other details later and disconn teh call
2. Customer was charged SMS for
Rs.2013.But customer didnt give request for deactivation of 10000sms pack.Since om dwn,not able to chk active or not.But its shows active in new crm window.
Emails:
1. Call center officer asured that requstwill be carried out within 2 to
3 daysbut it seems that nothing has been initiated till date in this regard If neccessary correction is not done we would "not like to acceptgreat services of your company"
2. I have a , with postpaid as of now, an feel my bill is too high as per my undestanding, I almost feel robbed when paying my bill
Maybe, The plan is not appropriate ...
SMS:
1. Pl. confirm the receipt of payment of Rs. 500 paid on 19.05.07 vide receipt
1243213 at Karanagar. Thanks
2. No care for customer is what focus on. hai.custmer ko satisfied hi nahi karte. I’ve to leave as it is not solving my problem. Gudbye Keep NOT care customers
3. not activating unable to connect to
Call transcripts:
1. ME CHECK BECAUSE OF WHICH IS CHARGES
ULTIMATE I WANT TO DISCONTINUE WITH AUTO
DEBIT FACILITY LIKE TO YOUR ACCOUNT
YES SIR YOUR ACCOUNT AND I AM A TO YOU
LIKE TO SEND A SIGNED APPLICATION ALL
CANCELING IS THIS YOU OK THANK YOU CAN I
DO ANYTHING ELSE FOR YOU
2. PLEASE TELL ME HOW CAN I HELP YOU
CREDIT CARD FOR WHICH I WAS TOLD TO PAY
ONE TIME MEMBERSHIP FEES OF TWO HUNDRED
AND SEVENTY FIVE BUT LATER THEY DEBIT
THE AMOUNT FROM MY SAVINGS ACCOUNT IF
YOU ARE GIVING CREDIT CARD OF YOUR TO
OTHERS WHY YOU ARE CHARGING ME
Fig. 1. Sanitized Voice of Customer examples.
A. Challenges in using VoC
There are three major challenges in using VoC for BI. The first challenge is data quality. VoC data is noisy and contains not only spelling and grammatical mistakes, but also inconsistent and incomplete sentences. Sometimes the content is multilingual, with the customer expressing himself in two or more languages. As can be seen from Figure 1, in addition to the above-mentioned problems, text messages use non-standard linguistic forms. Call transcripts are especially noisy because they are generated using an automatic speech recognition (ASR) system. The best Word Error Rates (WER)⁶ reported in the literature for conversational telephone speech are in the range of 20%-40%, which leads to highly noisy transcripts from ASR.
In fact, the WER of call center speech transcription is expected to be even higher because of the large variability introduced by noise sources such as cross talk, key strokes, breathing sounds, long silences, hold music, false starts, sentence corrections, and rapid variation in the distance between mouth and microphone due to other activities performed during the conversation. Additionally, background noise picked up by agent and customer handsets and differences in the channels (such as land line, mobile phone, VOIP) used by customers make automatic transcription of call-center speech a much harder task. The customer's mood, such as pleasure, satisfaction, agitation, or frustration, brings in additional variability.
The second major challenge is in integrating the VoC with other enterprise structured and unstructured data. To integrate the VoC with the structured information of the customers and products, a crucial requirement is to identify and extract named entities, such as name, date of birth, part numbers, etc., mentioned in the VoC that convey customer and product identification information. In case of calls, erroneous ASR leads to partial recognition of named entities. Other VoC channels can also provide partial and erroneous named entities.
As we will see in later sections, specialized techniques at the text analytics stage can be used to handle such partially recognized or partially mentioned named entities, making the integration of calls with structured data possible.
The third challenge in using VoC for BI is in storing and processing large volumes of data. This is in fact a major issue for conversational speech data. One of the help desk accounts we worked with generated about 150GB of recordings every day. Typical state-of-the-art ASR systems take half the duration of the call on a 3-4GHz processor for transcription. This means that huge computational power is required to process these recordings. ASR systems can be made faster by avoiding computationally costly steps such as speaker adaptation and multi-pass recognition. However, this gain in speed always comes at the cost of an increase in WER, which in turn means noisier transcriptions.

Fig. 2. System overview.

6 The accuracy of an ASR system is measured in terms of Word Error Rate (WER), defined as

$$\mathrm{WER} = \frac{S + D + I}{N} \tag{1}$$

where S, D, and I are respectively the substitution, deletion, and insertion counts after aligning the recognized text against the reference text, and N is the total number of words in the reference text.
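As an illustration of Eq. (1), the following minimal sketch computes WER with a word-level edit distance. The function name `word_error_rate` and whitespace tokenization are assumptions made for this example; they are not part of the BIVoC system.

```python
def word_error_rate(reference, hypothesis):
    """Return (S + D + I) / N using a word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)
```

Because insertions are counted, the value can exceed 1 when the hypothesis is much longer than the reference.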
IV. BIVOC SYSTEM ARCHITECTURE
In Figure 3 we show the architecture of the BIVoC system. The final goal is to obtain useful associations between structured and/or unstructured concepts. In this section we outline the components and steps involved in obtaining BI from structured/unstructured data.
A. Data Processing
Contact centers produce gigabytes of data every day in the form of audio, email, SMS, etc. Voice is a prominent component of voice of customer data, because the bulk of customers contact agents over the phone. The contact center we have been working with does more than 70% of its business through voice. The first step in voice processing is to transcribe the audio into text. We now describe the steps involved in this transcription and outline the challenges in performing automatic speech recognition and in using the noisy transcription data.
1) Automatic speech recognition: Automatic Speech Recognition (ASR) is a crucial component of the BIVoC system because the performance of subsequent components depends heavily upon the accuracy of the speech transcriptions obtained. As mentioned in Section 3, crucial information relating to the identity of the customer must be extracted from the calls to link them with the structured database. The identity of the customer is typically reflected in named entities such as names, date of birth, telephone number, etc., uttered during the call. However, the high WER of the ASR system typically results in only a partial recognition of these named entities. For example, only the surname or the given name may get recognized, similar sounding names may get substituted, or only 6 digits of a 10-digit telephone number may get recognized. At the end of this sub-section we explain specialized techniques we have developed to 1) improve the named entity recognition and 2) more accurately link the calls to the structured database even with partially recognized named entities. Apart from the entities mentioned above, other entities such as problems mentioned, complaints given, and information requested are also important for deriving business insights.
ASR in the BIVoC system is performed using a state-of-the-art Hidden Markov Model (HMM) based large vocabulary continuous speech recognition system. It is a context-dependent phoneme system using a US English phoneme set of size 54. Like any typical ASR system, it is composed of an Acoustic Model (AM) and a Language Model (LM), which compute acoustic and linguistic scores respectively. Details of the development of each are explained below.
Fig. 3. System architecture.

Acoustic Model training is performed using the well-known Baum-Welch algorithm [2]. Approximately 210 hours of manually transcribed call-center conversational speech data is used to do a flat-start training. There are approximately 4000 context-dependent phonemes, each of which is modeled with a tri-state left-to-right HMM. The emission probability of each HMM state is modeled using Gaussian Mixture Models (GMM). During training, the optimal number of Gaussians is found for each distinct HMM state, with the total number of Gaussians in the whole system limited to a maximum of 50000. Feature vectors are extracted by dividing the speech signal into frames with a 25 millisecond frame size and a 10 millisecond frame shift. From each frame, 13-dimensional Perceptual Linear Prediction (PLP) coefficients [8] are extracted, and a 40-dimensional LDA feature is computed from the PLP coefficients of 9 consecutive frames.
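The framing and splicing arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions: the 8 kHz sampling rate (typical of telephone speech), the centered 9-frame window, and the edge padding by repetition are not given in the paper, and the PLP and LDA computations themselves are not implemented. Splicing 9 frames of 13 coefficients gives a 117-dimensional vector, which LDA would then project down to 40 dimensions.

```python
def num_frames(num_samples, sample_rate_hz, frame_ms=25, shift_ms=10):
    """Number of full frames obtained from a signal of num_samples samples."""
    frame_len = sample_rate_hz * frame_ms // 1000
    shift = sample_rate_hz * shift_ms // 1000
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // shift


def splice(frames, context=4):
    """Concatenate each frame with `context` neighbours on each side.

    Frames beyond the edges are replaced by the first or last frame
    (an assumption; other padding schemes are possible).
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for k in range(t - context, t + context + 1):
            window.extend(frames[min(max(k, 0), last)])
        spliced.append(window)
    return spliced
```

For example, one second of 8 kHz audio yields `num_frames(8000, 8000)` frames of 200 samples each, taken every 80 samples.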
The language model used in the BIVoC system is an interpolated N-gram model [11]. Independent N-gram models constructed from general purpose US English text and from call-center specific text are linearly combined, with a high weight given to the call-center specific model. The LM uses a vocabulary of approximately 40000 words.
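Linear interpolation of two language models can be sketched as below. For brevity the sketch uses unigram models, whereas the system described here uses N-gram models; the function names and the weight of 0.8 for the domain model are illustrative assumptions, since the paper only states that the call-center model receives a high weight.

```python
from collections import Counter


def unigram_model(text):
    """Maximum-likelihood unigram probabilities from whitespace-split text."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_domain, p_general, domain_weight=0.8):
    """P(w) = lambda * P_domain(w) + (1 - lambda) * P_general(w)."""
    vocabulary = set(p_domain) | set(p_general)
    return {
        word: domain_weight * p_domain.get(word, 0.0)
        + (1.0 - domain_weight) * p_general.get(word, 0.0)
        for word in vocabulary
    }
```

Since both inputs are probability distributions, the interpolated model also sums to one over the joint vocabulary.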
Performance of ASR: We have performed recognition experiments with the ASR system we have developed to measure its performance. The database used is our in-house call-center test database containing customer-agent conversational speech in the car booking and banking domains. Table I gives the results of the experiments. As can be seen from Table I, recognition of names is a difficult task. This is because the number of conflicting words in the vocabulary is very high (of the order of tens of thousands) when it comes to recognizing names.
Improvements: We have developed some specialized techniques to 1) improve the accuracy of linking calls to the structured database and 2) improve the accuracy of named entity recognition. These are explained below:
In a single call, typically multiple named entities are uttered, all pointing to the same customer identity. We utilize this fact to find the customer identity more accurately. For example, suppose that a customer has uttered a name, date of birth, and contact telephone number in a call; the ASR is going to recognize each of these only partially. As opposed to finding the identity based on individual entities, we take all the partially recognized entities together and match them with entries in the structured database. This is expected to result in more accurate recognition of identity than using individual partially recognized entities.

TABLE I
ASR PERFORMANCE
Entity           Word Error Rate, %
Entire Speech    45
Names            65
Numbers          45
To improve the named entity recognition, we first extract the top-N matching identities from the structured database using the multiple partially recognized entities from the call. These top-N identities are then used to limit the number of possibilities for a named entity to N values in the LM when performing a second-pass ASR. For example, in the case of name recognition, if we limit the number of conflicting names to only the N names from the top-N list, the overall accuracy of the entity recognition is expected to improve further. In fact, using this method we could improve the accuracy of name recognition by 10% absolute in comparison to the experiment reported in Table I.
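The two ideas above, joint matching of partially recognized entities and restriction of the name vocabulary to the top-N candidates, can be sketched as follows. This is a hypothetical illustration: the record fields, the sample data, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions, and the second-pass ASR itself is not shown.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Fuzzy string similarity in [0, 1]; a stand-in for a better measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def top_n_identities(recognized, records, n=2):
    """Rank records by the summed similarity over all recognized fields."""
    scored = [
        (sum(similarity(value, rec[field]) for field, value in recognized.items()),
         rec["id"])
        for rec in records
    ]
    scored.sort(reverse=True)
    return [rec_id for _, rec_id in scored[:n]]


def restricted_name_list(top_ids, records):
    """Names the second ASR pass would be limited to."""
    wanted = set(top_ids)
    return [rec["name"] for rec in records if rec["id"] in wanted]


# Hypothetical customer table and partially recognized entities.
RECORDS = [
    {"id": 1, "name": "john smith", "phone": "9845012345", "dob": "1975-03-12"},
    {"id": 2, "name": "joan smyth", "phone": "9845098765", "dob": "1980-07-01"},
    {"id": 3, "name": "ravi kumar", "phone": "9900112233", "dob": "1975-03-12"},
]
RECOGNIZED = {"name": "smith", "phone": "984501", "dob": "1975-03-12"}

TOP = top_n_identities(RECOGNIZED, RECORDS, n=2)
NAMES = restricted_name_list(TOP, RECORDS)
```

No single field identifies the customer here (the surname alone and the date of birth alone are each ambiguous), but the combined score ranks record 1 first.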
2) Cleaning Email and SMS data: Not only ASR output but also SMS and email data is noisy. This is because customers use very informal language and do not bother about spelling or grammar. Moreover, they often use two or more languages to express themselves. They also tend to use unconventional abbreviations and shorthand for words and sentences. Often emails and SMS messages are received from people who are not customers of the enterprise, and most of these contain junk information not related to enterprise operations.
Processing SMS and email involves two steps of cleaning. In the first step, we detect spam messages and non-English messages and discard them from further processing, as they do not contain useful information. For emails, we also remove headers, disclaimers and promotional material from the actual messages. We also segregate the agent conversation from the customer conversation so that only the customer conversation is used for processing. In the second step, we handle the noise in customer messages. The domain of noisy text correction is comparatively new, though considerable insight into probable approaches may be taken from the field of automatic spelling correctors [9]. Most of the effort involved in cleaning SMS goes into building domain-specific dictionaries that capture common variations of product names and services. We also build dictionaries for the common lingo used in text messaging. Still, a large number of words remain noisy and are not utilized fully.
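A dictionary-based normalization step of the kind described above might look like the following minimal sketch. The dictionary entries are hypothetical expansions of shorthand seen in Figure 1 (for instance, reading "inf" as "informed" is a guess), and the actual BIVoC dictionaries are not reproduced here.

```python
import re

# Hypothetical lingo dictionary; real dictionaries are domain specific.
LINGO = {
    "cust": "customer",
    "inf": "informed",
    "tht": "that",
    "chk": "check",
    "pl": "please",
}


def normalize_message(text, lingo=LINGO):
    """Replace each alphabetic token found in the dictionary by its expansion."""
    def expand(match):
        word = match.group(0)
        return lingo.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", expand, text)
```

Tokens that are not in the dictionary pass through unchanged, which mirrors the observation that many noisy words remain after cleaning.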
B. Data Linking
For a unified analysis of structured and unstructured variables, it is necessary to link the customer communications with the correct variables in the structured records. Customer communications such as calls are often handled independently by a different department or company and stored in different systems. All the meta-data related to the calls is stored temporarily during transaction time and deleted later. Moreover, the documents often do not contain entities that can uniquely identify structured records. For example, named entities spoken during the conversation, such as the customer's name, date of birth, or product name, do not on their own uniquely identify a customer. Even if a unique identifier for structured records is present, noise may prevent it from uniquely identifying the record.
Even though there may be some ambiguity in pin-pointing the correct record based on noisy independent entities, it is possible to disambiguate using a combination of all named entities spoken during the conversation [3].
In this section, we formulate the technical problem that we address and propose a solution for it. We start with the simplest problem setting and then gradually build up to the real problem. The simplest setting is the single-type entity identification problem, where we are given a structured database that contains a table with $k$ attributes $\{A_i\}$. We will refer to each row of the table as an entity $e$, having its own value $e.A_i$ for each of the $k$ attributes. In addition to the structured table containing entities, we have a collection $\{d_i\}$ of documents that refer to the entities. More formally, each document $d$ has a set of terms $\{t_i\}$. If $e_i$ is the central entity for this document, then each term $t_i$ will correspond to some attribute $e_i.A_j$ of entity $e_i$. For instance, a document about a transaction entity refers to the customer name, shop name, and date attributes of a specific transaction. Given the terms in a document, our goal is to identify the central entity (the specific transaction, in our example) from the given structured table.
One of the challenges is that no explicit identifier of the entity, such as a unique transaction number, may be available in the document. Additionally, the document is noisy, so that $t_i$ does not exactly match its corresponding attribute of $e_i$. The customer may mention a different transaction amount in her email or spell her name differently from that in the database. This naturally affects recall. It also affects precision when the noisy and partial information in a document leads to an incorrect entity being identified.
Traditionally, documents are scored against potential entities as follows:

$$\mathrm{score}(d, e) = \sum_{i} \mathrm{score}(t_i, e) = \sum_{i} \sum_{j} w_j \times \mathrm{sim}(t_i, e.A_j) \tag{2}$$

where $w_j$ is the weight associated with attribute $A_j$ and $\mathrm{sim}(t_i, e.A_j)$ is the fuzzy similarity score between the token $t_i$ and the entity attribute $e.A_j$. Our focus is not on specific attribute similarity measures; the best similarity measure available for specific attributes can be readily plugged into our architecture.
Although Eqn 2 suggests that each token needs to be matched against all attributes of the entity, in practice this can be avoided. We use annotators to extract relevant tokens from a document and then map each extracted token to a small subset of the attributes for determining matches. Using a Name annotator, for example, we can extract all the names from the document and match names only against the customer name and agent name attributes of the transaction table. This allows us to efficiently determine a score for an entity for a given document, and the best entity for the document is the one that produces the highest score. Again, the highest-scoring entity can be determined efficiently, without computing scores explicitly for all entities. Performing a fuzzy match on each extracted token in the document results in a ranked list of possible entities. We can then use the Fagin Merge algorithm [6] to efficiently merge the multiple ranked lists and find the highest-scoring entities for the entire document.
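To illustrate how ranked lists can be merged without scoring every entity, here is a minimal sketch of a threshold-style merge in the spirit of Fagin's work on combining ranked lists. It is an assumption that this resembles the variant used in BIVoC; the paper does not give details. The sketch assumes non-negative scores, a sum as the aggregate, and a score of 0 for entities missing from a list.

```python
import heapq


def threshold_merge(ranked_lists, k=1):
    """Return the k highest-scoring (entity, score) pairs.

    ranked_lists: one list per extracted token, each holding
    (entity_id, score) pairs sorted by descending score.
    """
    lookup = [dict(lst) for lst in ranked_lists]  # random access by entity
    seen = {}
    depth = max(len(lst) for lst in ranked_lists)
    for pos in range(depth):
        threshold = 0.0  # best total score any unseen entity could reach
        for lst in ranked_lists:
            if pos < len(lst):
                entity, score = lst[pos]
                threshold += score
                if entity not in seen:
                    seen[entity] = sum(d.get(entity, 0.0) for d in lookup)
        top = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # no unseen entity can beat the current top k
    return heapq.nlargest(k, seen.items(), key=lambda item: item[1])
```

With the hypothetical lists `[("T1", 0.9), ("T3", 0.5), ("T2", 0.2)]` for a name token and `[("T2", 1.0), ("T1", 1.0), ("T3", 0.1)]` for a date token, the merge stops after reading only the first position of each list, because T1 already reaches the threshold.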
This solution works when all documents are focused on the same entity type or, in other words, when we only need to decide between entities from the same table. As we generalize to the multi-type entity identification problem, we have multiple entity types (or alternatively, entities from multiple tables), and each document in the collection refers to a specific entity of one of these types. A transaction document can talk about a specific transaction from the transaction table, a customer document deals with issues specific to a customer and mentions details such as the name, address, etc. of a specific customer, while a credit card document can be about the billing date for a specific credit card. This central entity (along with its type) needs to be identified given the terms in the document. This adds to the entity identification challenge when the different types have overlapping attributes. For example, a transaction table and a customer table may both contain the customer's address, which may get mentioned in the document. But, independently of other attributes, the presence of a customer address in a document provides more evidence for a customer entity than for a transaction entity. We capture this in our representation by associating a different weight $w_{jk}$ with each attribute $A_j$ for each entity type $T_k$. The scoring function is updated for (entity, type) pairs as follows:

$$\mathrm{score}(d, e, T_k) = \sum_{i} \sum_{j} w_{jk} \times \mathrm{sim}(t_i, e.A_j) \tag{3}$$

Using this scoring function, the highest-scoring (entity, type) pair can be determined efficiently using a variant of the Fagin Merge algorithm used earlier.
It is interesting to note that this problem is not the same as categorizing documents into types. Imagine a document where a customer lists all his credit card numbers to identify himself. A traditional classifier that predicts the document category based on the annotations contained in it is likely to tag this as a credit card document. Only when we look at the attribute values in the database do we see that each credit card reference in the document contributes to a different credit card entity in the credit card table. But they all point to the same customer entity in the customer table. Therefore, the aggregate score for the (customer entity, customer type) pair turns out to be higher than that for each of the (credit card entity, credit card type) pairs.
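The credit card example can be worked through with a small sketch of the multi-type score. Everything here is hypothetical: the weights, the card numbers, the exact-match similarity, and the assumption that a customer's card numbers are reachable as attribute values of the customer entity (for instance through a join).

```python
def sim(token, value):
    """Exact match, standing in for a fuzzy similarity measure."""
    return 1.0 if token == value else 0.0


def score(tokens, entity, type_weights):
    """Sum over tokens and attributes of weight(attribute, type) * sim."""
    return sum(
        type_weights.get(attribute, 0.0) * sim(token, value)
        for token in tokens
        for attribute, value in entity["attributes"]
    )


# Hypothetical per-type attribute weights.
WEIGHTS = {
    "customer": {"name": 1.0, "card_number": 0.5},
    "credit_card": {"card_number": 1.0, "billing_date": 1.0},
}
ENTITIES = [
    {"id": "cust-1", "type": "customer",
     "attributes": [("name", "john smith"), ("card_number", "4111"),
                    ("card_number", "5500"), ("card_number", "3400")]},
    {"id": "card-4111", "type": "credit_card",
     "attributes": [("card_number", "4111"), ("billing_date", "05")]},
    {"id": "card-5500", "type": "credit_card",
     "attributes": [("card_number", "5500"), ("billing_date", "12")]},
    {"id": "card-3400", "type": "credit_card",
     "attributes": [("card_number", "3400"), ("billing_date", "20")]},
]
TOKENS = ["4111", "5500", "3400"]  # card numbers mentioned in the document

SCORES = {e["id"]: score(TOKENS, e, WEIGHTS[e["type"]]) for e in ENTITIES}
BEST = max(SCORES, key=SCORES.get)
```

Each credit card entity matches only one token, while the customer entity accumulates evidence from all three, so it wins even though its per-attribute weight for card numbers is lower.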
The highest-scoring pair can be identified using this scoring function when the attribute weights wik are provided. But these are typically hard to specify manually. Alternatively, these can be learned from the document collection itself if documents labeled with entity types are available; we can learn the weights from training data. It is however unrealistic to assume availability of any such labeled data; in the very least it will be very costly to obtain. Thus in our system, we learn weights in an unsupervised fashion using an EMstyle approach that obviates the need for training samples. We start from an initial estimate of the weights, which we use to assign each document to an entity of a specific type. From n this assignment, we re-estimate the weights as wij = P ij ij , in where nij is the number of occurrences of attribute Ai in documents assigned to type Tj . This two-step process is continued for a fixed number of iterations or until convergence.
Once these weights have been learnt from the document collection, they can be plugged into the scoring function in
Eqn 3.
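The two-step procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the exact-match `sim`, the uniform initial weights, the brute-force search (in place of the Fagin-style merge), the reading of $n_{ij}$ as the number of document terms matching attribute $A_i$ of the assigned entity, and the toy data are all hypothetical.

```python
from collections import defaultdict

def sim(term, value):
    # Assumed similarity: 1.0 on a case-insensitive exact match, else 0.0.
    return 1.0 if term.lower() == str(value).lower() else 0.0

def score(terms, entity, type_weights):
    # Eqn 3 for one (document, entity) pair: sum over terms t_i and
    # attributes A_j of w_jk * sim(t_i, e.A_j).
    return sum(type_weights[attr] * sim(t, val)
               for t in terms for attr, val in entity.items())

def best_entity(terms, entities, weights):
    # Brute-force search over all entities of all types.
    candidates = [(score(terms, e, weights[k]), k, e)
                  for k, rows in entities.items() for e in rows]
    return max(candidates, key=lambda c: c[0])

def learn_weights(docs, entities, iterations=5):
    # Initial estimate: uniform weights over the attributes of each type.
    weights = {k: {a: 1.0 / len(rows[0]) for a in rows[0]}
               for k, rows in entities.items()}
    for _ in range(iterations):
        counts = {k: defaultdict(float) for k in entities}
        for terms in docs:
            _, k, e = best_entity(terms, entities, weights)  # assignment step
            for attr, val in e.items():
                counts[k][attr] += sum(sim(t, val) for t in terms)
        for k in entities:  # re-estimation step: w_ij = n_ij / sum_i n_ij
            total = sum(counts[k].values())
            if total > 0:
                weights[k] = {a: counts[k][a] / total for a in weights[k]}
    return weights

# Toy data, invented for illustration only.
entities = {
    "customer": [{"name": "alice", "city": "boston"},
                 {"name": "bob", "city": "seattle"}],
    "card": [{"number": "1111", "holder": "alice"},
             {"number": "2222", "holder": "bob"}],
}
docs = [["alice", "boston"], ["bob", "2222"], ["1111", "alice", "card"]]
weights = learn_weights(docs, entities)
```

The learned `weights[type]` dictionary can then be passed to `score` as the weights of Eqn 3. A fixed iteration count is used here; a convergence check on the change in weights would serve equally well.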
C. Annotation
Simple applications of the well-known association rule extraction techniques used in data mining do not work well in text mining. Unlike the items in basket analysis, many of the items in a text, such as words and phrases, tend to depend on each other. So it is important to first extract the important concepts from the text. We use the term “concept” for a representation of the textual content, in order to distinguish it from a simple keyword with its surface expression. Each domain has its own important terms for analysis. We make a list of words extracted from call transcriptions, sorted by frequency, and ask domain experts to assign semantic categories to the words that they consider important. The domain experts are also asked to assign appropriate canonical forms to take care of synonymous expressions or variations in the expressions. The resulting dictionary consists of entries with surface representations, parts of speech (PoS), canonical representations, and semantic categories, as depicted by the following examples from the car rental domain.
• child seat [noun] → child seat [vehicle feature]
• NY [proper noun] → New York [place]
• master card [noun] → credit card [payment methods]
We observed that the number of frequently appearing words is relatively limited in a textual database, especially when the content belongs to a narrow domain. The workload for this dictionary creation has been relatively small in our experience.
In natural language, there are many ways to express the same concept. Users are therefore allowed to define patterns of grammatical forms, surface forms and/or domain dictionary terms.
Usually a domain expert has to create these patterns manually, based on experience. For example, to identify how car rental agents phrase their requests or mention value selling phrases, the user-defined patterns could be as follows.
• please + VERB → VERB[request]
• just + NUMERIC + dollars → mention of good rate[value selling]
• wonderful + rate → mention of good rate[value selling]
This allows us to associate communicative intentions with predicates by analyzing grammatical features and lexical information. For example, for “rude”, we can define the following expression patterns.
• X was rude. → rude[complaint]
• X was not rude. → not rude[commendation]
• Was X rude? → rude[question]
The previous dictionary look up process assigns semantic categories to each word without considering any features around the target word. The pattern extraction phase extracts groups of words or phrases and assigns them labels such as value selling and complaint.
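The two annotation phases can be illustrated with a small sketch. It assumes plain regular expressions stand in for the pattern language; the regexes, function names, and dictionary layout are hypothetical renderings of the examples in the text. The "please + VERB" pattern is omitted because it needs a part-of-speech tagger.

```python
import re

# Domain dictionary: surface form -> (canonical form, semantic category).
DICTIONARY = {
    "child seat": ("child seat", "vehicle feature"),
    "ny": ("New York", "place"),
    "master card": ("credit card", "payment methods"),
}

# Hypothetical regex versions of the user-defined patterns.
PATTERNS = [
    (re.compile(r"^\s*was\s+.+\s+rude\s*\?", re.I), ("rude", "question")),
    (re.compile(r"\bwas\s+not\s+rude\b", re.I), ("not rude", "commendation")),
    (re.compile(r"\bwas\s+rude\b", re.I), ("rude", "complaint")),
    (re.compile(r"\bjust\s+\d+\s+dollars\b", re.I),
     ("mention of good rate", "value selling")),
    (re.compile(r"\bwonderful\s+rate\b", re.I),
     ("mention of good rate", "value selling")),
]

def lookup_concepts(text):
    # Dictionary phase: each entry is matched without looking at its context.
    lowered = text.lower()
    return [DICTIONARY[s] for s in DICTIONARY
            if re.search(r"\b" + re.escape(s) + r"\b", lowered)]

def match_patterns(sentence):
    # Pattern phase: groups of words are labeled using their surroundings.
    return [label for regex, label in PATTERNS if regex.search(sentence)]
```

For instance, `match_patterns("The agent was not rude.")` yields the commendation label, while the same word "rude" in "The agent was rude." yields a complaint, which is the distinction a context-free dictionary lookup cannot make.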
D. Indexing and Reporting
Once appropriate concepts have been extracted from the documents, we can apply various statistical analysis methods in data mining to the set of concepts as well as to the structured data. As a result, even a simple function that examines the increase and decrease of occurrences of each concept in a certain period may allow us to analyze trends in the topics. Also, the semantic classification of concepts enables us to analyze the content of the texts from the viewpoints of various semantic categories. The dataset is indexed based on the annotations (semantic classifications). This allows quick reporting to be done on datasets containing even millions of documents. In the following subsections, we introduce some analysis functions that are useful for gathering insights [14].
Fig. 4. Two dimensional association analysis.
1) Relevancy Analysis with Relative Frequency: The basic idea of relative frequency is very simple. It compares the distribution of concepts within a specific data set, characterized by one or more concepts, with the distribution of those concepts in the entire data set. For example, given conversational data from the car rental reservation center, the distribution of the phrases in calls from customers enquiring about rates may be different from that in calls where customers make bookings. By sorting phrases in a category based on the relative frequencies, relevant concepts for a specific data set are revealed.
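A minimal sketch of relative frequency analysis follows. It assumes each call has already been reduced to a set of concepts; the function name and the toy calls are hypothetical.

```python
from collections import Counter

def relative_frequency(subset_docs, all_docs):
    """Rank concepts by (share of subset docs) / (share of all docs)."""
    sub = Counter(c for doc in subset_docs for c in set(doc))
    full = Counter(c for doc in all_docs for c in set(doc))
    ratios = {c: (sub[c] / len(subset_docs)) / (full[c] / len(all_docs))
              for c in sub}
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

# Toy calls, invented for illustration only.
calls = [
    {"rate enquiry", "discount"},
    {"rate enquiry", "discount", "SUV"},
    {"booking", "SUV"},
    {"booking", "credit card"},
]
rate_calls = [c for c in calls if "rate enquiry" in c]
ranking = relative_frequency(rate_calls, calls)
```

A ratio above 1 marks a concept that is over-represented in the selected calls. Here "discount" appears in every rate enquiry call but in only half of all calls, so it ranks above "SUV", whose share is the same in both sets.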
2) Two Dimensional Association Analysis: In order to obtain valuable insights it is important to find useful associations among concepts. Some of these concepts could be dimensions from unstructured data and others could be from structured data. Concepts extracted from text, however, often depend on each other: for example, they may form a compound word, or they may have other grammatical relationships with each other, such as between a verb and its object. Thus, the application of association rule extraction by considering the text data as a basket full of concepts expressed by words and phrases usually results in a list of item sets that correspond to typical compound words and predicate-argument pairs.
In order to extract the valuable relationships among concepts, it is important to pre-identify what kinds of relationships can be valuable. For example, given a set of car rental conversations, it may be valuable to know what kinds of cars get booked from a given location. Then we would need to target expressions that indicate car types such as “full-size” and “SUV” as well as places such as “New York,” “Los Angeles,” “Seattle,” and “Boston.”
Once we set up such car categories and location names, we need to develop a dictionary and information extraction rules for identifying mentions of each item as described in Section 4.2. For example, “SUV” may be indicated by “a seven seater,” and “full-size” may be indicated by “Chevy Impala.” As a result, we can fill in each cell in a two-dimensional table as in Table II by counting the number of texts that contain both the column and row labels.
TABLE II
TWO DIMENSIONAL ASSOCIATION ANALYSIS
(rows: vehicle type category; columns: location category)

| Vehicle type | New York | Los Angeles | Seattle | Boston |
|--------------|----------|-------------|---------|--------|
| SUV          |          |             |         |        |
| mid-size     |          |             |         |        |
| full-size    |          |             |         |        |
| luxury car   |          |             |         |        |

Because of the differences in recall and precision for information extraction for each concept, the absolute numbers may not be reliable. Still, if we can assume that the recall and precision for extracting each concept are coherent over the whole data set, we can calculate indices showing the strengths of the associations for each cell compared to the other associations in the table.
One simple measure of the correlation between a vertical item and a horizontal item would be
$$\frac{N_{cell} \times N}{N_{ver} \times N_{hor}} = \frac{N_{cell}/N}{(N_{ver}/N)(N_{hor}/N)} \qquad (4)$$
which is the point estimation of the exponential of the mutual information, where $N$ is the number of records in all of the data, $N_{cell}$ is the number of records with both the horizontal and vertical items, $N_{ver}$ is the number of records with the vertical item, and $N_{hor}$ is the number of records with the horizontal item. However, it can be inaccurate when the value of $N_{cell}$, $N_{ver}$, or $N$ is not sufficiently large. To avoid this problem, we use the left terminal value (smallest value) of the interval estimation instead of the point estimation.
Then the cells indicate smaller values than those obtained by the point estimation, considering the uncertainty of the three density values in the right-hand member. By using this type of measurement, we can identify pairs of concepts that exhibit stronger relationships than other pairs.
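The difference between the point estimate and a conservative interval-based estimate can be sketched as follows. The text does not say which interval estimation is used, so the normal-approximation interval for a proportion and the 95% level (`z = 1.96`) below are assumptions, as are the function names.

```python
import math

def point_index(n_cell, n_ver, n_hor, n):
    # Eqn 4: (N_cell / N) / ((N_ver / N) * (N_hor / N)).
    return (n_cell * n) / (n_ver * n_hor)

def _interval(count, n, z=1.96):
    # Assumed: normal-approximation interval for a proportion, clipped to [0, 1].
    p = count / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def conservative_index(n_cell, n_ver, n_hor, n, z=1.96):
    # Smallest index consistent with the three intervals: lower bound for
    # the cell density, upper bounds for the two marginal densities.
    if n_ver == 0 or n_hor == 0:
        return 0.0
    cell_lo, _ = _interval(n_cell, n, z)
    _, ver_hi = _interval(n_ver, n, z)
    _, hor_hi = _interval(n_hor, n, z)
    return cell_lo / (ver_hi * hor_hi)
```

With 20 co-occurrences among 1000 records the point estimate is 10, while the conservative value is markedly smaller; with only 2 co-occurrences among 100 records the point estimate is still 10 but the conservative value drops to 0, which is the intended penalty for small counts.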
Figure 4 is a screen shot of an association analysis view. Here we are associating the mentions of competitor credit cards in the emails with the category assigned to each email. The left-hand pane shows all the concepts between which association analysis can be performed. One can drill down through table cells right up to individual documents to gain understanding at a finer level.
V. USE CASE: AGENT PRODUCTIVITY IMPROVEMENT
In this section we share our experience in using customer-agent conversations in the context of other structured data to improve agent productivity.
A. Analysis Approach
We use telephonic conversation data from a car rental help desk. There are three types of calls:
1) Reservation Calls: Calls which got converted. Here, “converted” means the customer made a reservation for a car.
2) Unbooked Calls: Calls which did not get converted.
3) Service Calls: Customers enquiring about or changing a previous booking.
Every day about 1800 calls (about 25% of all calls) get recorded, covering about ninety agents. Our goal was to improve agent productivity by analyzing the recorded conversations. Agent productivity is measured in terms of the ratio of reserved calls to unbooked calls.
Using automatic speech recognition we transcribed the calls. We also linked each call to the structured reservation database to find out details about the call.
We identify the key concepts from the call transcripts and group them under appropriate semantic categories. Examples of some semantic categories we prepared are:
• Customer intention at start of call: From the customer’s first or second utterance, we extract the following intentions based on the patterns.
– Strong start: would like to make a booking, need to pick up a car, want to make a car reservation, . . .
– Weak start: can I know the rates for booking a car, I would like to know the rates for a full size car, . . .
• Discount-related phrases: discount, corporate program, motor club, buying club, . . . are registered in the domain dictionary as discount-related phrases.
• Value selling phrases: we extract phrases mentioning good rate and good vehicle by matching patterns related to such utterances.
– mention of good rate: good rate, wonderful price, save money, just need to pay this low amount, . . .
– mention of good vehicle: good car, fantastic car, latest model, . . .
In addition we also linked the call to obtain structured fields like the agent name, booking date, booked car type, call type and so on. Using the categories defined on the unstructured data and the structured data, we tried to find insights to improve agent productivity.
B. Analysis Results and Actionable Insights
Table III shows the result of two dimensional association analysis between customer intention and pick up. These results show that how the customer begins the call has a bearing on the outcome of the call. Further, by analyzing the Weak start calls that were successful, we found that in these calls agents were offering more discounts to convert the customers into reservation cases.
TABLE III
ASSOCIATION BETWEEN CUSTOMER INTENTIONS AND PICK UP RESULTS

| Customer Intent | Result: reservation | Result: unbooked |
|-----------------|---------------------|------------------|
| Strong start    | 63%                 | 37%              |
| Weak start      | 32%                 | 68%              |
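The association reported in Table III combines an intention label derived from the transcript with the call outcome taken from the reservation database. A minimal sketch of that combination is given below; the two regular expressions are hypothetical renderings of the example phrases for strong and weak starts, and the six calls are invented, not data from the study.

```python
import re
from collections import Counter

# Hypothetical regexes built from the example phrases for each category.
STRONG = re.compile(
    r"\b(make a (car )?(booking|reservation)|pick up a car)\b", re.I)
WEAK = re.compile(r"\bknow the rates?\b", re.I)

def start_intent(customer_utterances):
    # Only the customer's first two utterances are inspected.
    opening = " ".join(customer_utterances[:2])
    if STRONG.search(opening):
        return "Strong start"
    if WEAK.search(opening):
        return "Weak start"
    return "Other"

def outcome_percentages(calls):
    # calls: list of (customer_utterances, outcome_from_database) pairs.
    labeled = [(start_intent(u), outcome) for u, outcome in calls]
    cell = Counter(labeled)
    row_total = Counter(intent for intent, _ in labeled)
    return {key: 100.0 * n / row_total[key[0]] for key, n in cell.items()}

# Invented calls for illustration only.
calls = [
    (["Hello", "I would like to make a booking"], "reservation"),
    (["I need to pick up a car tomorrow"], "reservation"),
    (["I want to make a car reservation"], "unbooked"),
    (["I want to make a reservation", "for Friday"], "reservation"),
    (["Can I know the rates for booking a car"], "unbooked"),
    (["I would like to know the rates for a full size car"], "unbooked"),
]
table = outcome_percentages(calls)
```

Each entry of `table` is a row percentage, so the values for one intention sum to 100, matching the layout of Table III.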
Table IV shows the result of two dimensional association analysis between agent utterance after quoting rate and customer objection to rate.
TABLE IV
ASSOCIATION BETWEEN AGENT UTTERANCE AND CUSTOMER OBJECTION

| Agent Utterance | Result: reservation | Result: unbooked |
|-----------------|---------------------|------------------|
| Value selling   | 59%                 | 41%              |
| Discount        | 72%                 | 28%              |
We observed the effect of certain phrases on the customers.
The results here show that mentioning value selling phrases and discount phrases results in more bookings. Agents who were doing well were successful in converting weak starts into pick up. They were primarily doing this by offering more discounts to weak start customers. Also good agents in general used value selling phrases more often resulting in more bookings. Two types of insights were derived from the analysis:
1) Customer phrases and their effect on the outcome
2) Agent phrases that resulted in a positive outcome
C. Measuring Improvements in Agent Productivity
By implementing the actionable insights derived from the analysis in an actual car rental process, we verified improvements in booking. We divided the 90 agents in the car rental reservation center into two groups. One of them, consisting of 20 agents, was trained based on the insights from the text mining analysis. These 20 agents were told about the findings from the system. They were told that customers can be classified as weak and strong start customers based on the way they began their calls. They were also told that weak start customers typically did not end up booking a car.
The agents were told that offering discounts to weak start customers resulted in more bookings from them. They were also asked to use value selling phrases more generously in their conversations with the customers, and the list of positive value selling phrases was provided to them. The remaining 70 agents were not told about these findings. By comparing these two groups over a period of two months we hoped to see how the actionable insights contributed to improving agent performance. As the evaluation metric, we used the reservation ratio, that is, the ratio of the number of reservations to the number of unbooked calls. Following the training, the pick up ratio of the trained agents was higher by 3% compared to the general population of agents. Before training the ratios of both groups were comparable. We consider this difference meaningful: the p-value of the t-test is 0.0675, which is close to, though slightly above, the conventional significance level (α = 0.05).
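The evaluation can be illustrated with a short sketch that computes the reservation ratio and a two-sample t statistic for two groups of agents. The text does not state which variant of the t-test was used; Welch's unequal-variance form is assumed here, and the function names are hypothetical. Turning the statistic into a p-value requires the t distribution, which is left out.

```python
import math
from statistics import mean, variance

def reservation_ratio(reserved, unbooked):
    # Evaluation metric: number of reservations / number of unbooked calls.
    return reserved / unbooked

def welch_t(sample_a, sample_b):
    # Welch's t statistic and its approximate degrees of freedom
    # (assumed variant; samples are per-agent reservation ratios).
    va, vb = variance(sample_a), variance(sample_b)
    na, nb = len(sample_a), len(sample_b)
    se2 = va / na + vb / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

In use, `sample_a` would hold one reservation ratio per trained agent and `sample_b` one per untrained agent over the same two-month window.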
VI. USE CASE: CHURN PREDICTION
In this section we share our experience in implementing a solution for churn prediction and analysis using features extracted from customer emails and sms messages received at the contact center. The client we worked with is one of the biggest telecom service providers in wireless telephony, serving many different geographical regions and providing many subscription plans in prepaid and postpaid categories along with a host of value added services. Each geographical region is managed by its own business head, who is responsible for managing churn in that geography. Each of these heads implemented a different churn management strategy. While some were proactive and reached out to all customers whom they thought were going to churn, others reached out to high value customers only. Almost all of them agreed more or less on the key drivers that affected churn. For example, a few drivers that affect churn are competitor tariff, quality of problem resolution, service related issues, billing related issues, and low awareness of services. However, there was variation in the variables these people tracked to detect churn. Most of these variables were extracted from structured data and included bill values, payment history, variation in payment days, method of payment and usage pattern. Our objective was to use the voice of customers who had already churned to discover the presence of churn drivers in the voice of existing customers. We trained a classifier using the VoC of churners and non-churners to predict future churners. Here we list some of the challenges in building such a system.
Noisy text: One challenge was to extract dimensions that represent churn drivers from noisy emails and sms messages. As a first step we converted the sms messages to a common standard representation suitable for further processing. We invested a lot of effort into building domain dictionaries and dictionaries of common sms lingo for capturing variations in important products and services. Similarly, we used a dictionary to filter out sms messages that largely contained non-English words. Emails were relatively free from shorthand; however, they had to be segmented to separate agent voice from customer voice.
Integration: We analyzed churn for pre-paid customers because they formed almost 78% of our client's customer base and, unlike post-paid customers, these customers churned without warning and did not offer any opportunity for the enterprise to retain them. In order to associate a churn label with the emails and sms messages we linked them with the respective customer records in the database, which stores churn status and churn date, if any. We used our data linking engine to associate sms messages and emails with the corresponding records. Around 18% of emails could not be linked. Most of these emails were from people who were not customers of the enterprise.
Imbalanced data: We conducted analysis on 47460 emails, out of which only 3% came from churners. Similarly we analyzed 289314 text messages, out of which 7.6% came from churners. These are highly imbalanced classes, and identifying key features corresponding to churn drivers was a challenge. We conducted churn prediction experiments on communications from customers belonging to a particular geographical region. We took emails and sms messages for one month and identified potential churners based on these communications using the classifier we trained. The churners and non-churners for this geography were already known, and we compared the number of churners we were able to predict against the actual churners for that month. We found that we were able to detect 53.6% of churners correctly using emails.
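The sms clean-up steps described under "Noisy text" can be sketched as below. This is only an illustration: the lingo dictionary, the tiny English vocabulary, the 50% threshold, and the function names are all assumptions standing in for the much larger dictionaries built in the engagement.

```python
import re

# Hypothetical sms lingo dictionary: shorthand -> standard form.
SMS_LINGO = {"u": "you", "pls": "please", "plz": "please", "bal": "balance"}

# Tiny stand-in for an English word list.
ENGLISH_WORDS = {"please", "check", "my", "balance", "recharge", "you",
                 "the", "is", "not", "working"}

def normalize(message):
    # Lowercase, tokenize, and map shorthand to its standard form.
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    return [SMS_LINGO.get(t, t) for t in tokens]

def mostly_english(tokens, vocabulary=ENGLISH_WORDS, threshold=0.5):
    # Keep a message only if at least `threshold` of its tokens are known words.
    if not tokens:
        return False
    known = sum(t in vocabulary for t in tokens)
    return known / len(tokens) >= threshold
```

A message such as "Pls check my bal" normalizes to standard words and is kept, whereas a message in which most tokens fall outside the vocabulary is filtered out before classification.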
VII. LESSONS LEARNT
In this section we discuss our experience in using VoC for BI applications. We have worked with several services organizations like contact centers and direct contact cells. We share some of our experiences and bring forth the challenges and issues in such research engagements. We also highlight some lessons we learned while working with the services industry.
Data ownership: As researchers, we believe we understand problems around data management in business scenarios where a 360° view of the customer is important for managing customer relationship and engagement. In order to verify our hypotheses about the various customer and business dimensions we need to have an integrated view of these dimensions. These dimensions are located in different data sources (structured or unstructured) and geographies and are most often managed by different organizations. Particularly in contact centers and other outsourcing avenues, the only data interchanged between a contact center and a parent enterprise may be monthly summary reports measuring some dimension like customer satisfaction with particular services.
Security and privacy considerations also limit access rights of contact centers to only the front end of CRM systems using secure virtual desktops. For example, while the contact center records the agent-customer interactions and satisfaction ratings, it is very difficult for them to associate additional dimensions such as billing history, purchase frequency, etc., thus hampering integration efforts of the kind targeted by the BIVoC system. However, in cases where it is technically possible to bring together all the data sources, we believe BIVoC-like systems can prove to be valuable.
The other impact of security and privacy concerns is that rich proprietary data of the kind we have described is limited to the industrial segment. This data cannot reach the research community at large. Data in services organizations and problems associated with it, we believe, are goldmines for information management research. Scalability challenges, the interesting aspects of noise in unstructured data, and measures for success in services engagements are some of the new problems we came across in our BIVoC engagements.
Services engagement model: Service organizations (contact centers as an example) are process driven. Each process can be viewed as a sequence of steps which defines the work flow. Each step is a fixed job which people are expected to perform routinely. Like any other organization, services organizations strive for operational efficiency and cost effectiveness. Operational efficiency means the people management process should be able to scale up operations both in terms of volumes as well as variations in workload. Cost effectiveness without compromising quality helps in directly impacting the bottom line of the organization. However the services industry is all about servicing people; in our opinion this differentiates service arms from other business divisions.
Most middle management in services organizations expect research to provide them with software tools and implementations that can help save money by being more cost effective. Fortunately there are a few people who can see that research can provide value not just in reducing cost but in helping them rise in the value chain. In our experience crystallizing the engagement model as one of collaborative value creation rather than software/tool deployment is critical to success. Only with engagement from both sides can the gap between delivering technology and effectively using it to impact measurable bottom lines be closed.
VIII. CONCLUSIONS
We have described the BIVoC system resulting from our research engagements with various services industry clients.
Structured data warehousing and BI have traditionally helped in identifying ‘what’ is happening with an enterprise’s customers: what they are buying, who is leaving, and what they want.
However, only unstructured data coming from the voice of the customer can give us insight into ‘why’ certain effects and patterns are being seen. In particular we described VoC channels of unstructured data present in services arms of enterprises. We then described the BIVoC architecture that enables combined insight delivery over structured and unstructured VoC data.
With our agent productivity improvement use case in a voice-based car rental contact center account, we showed the combined analysis we could do over structured and unstructured data sources. We argue that such rich analysis is not possible without achieving the combined all-round view of customers and enterprises. We highlighted several technical challenges relating to speech recognition, entity annotation, linking text to structured records, mining patterns and rules
of interest, and visualizing results and relating them to business insights. We are currently applying the BIVoC system architecture in other targeted industry verticals through client engagements; we believe domain knowledge thus learned can be fed back into future product and service offerings.
REFERENCES
[1] R. Agrawal and T. Imielinski. Database Mining: A Performance Perspective. IEEE Transactions on Knowledge and Data Engineering, 1993.
[2] L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. Ann. Math. Statistics, vol. 41, no. 1, pp. 164-171, 1970.
[3] V. T. Chakravarthy, H. Gupta, P. Roy, and M. K. Mohania. Efficiently linking text documents with relevant structured information. VLDB, 2006.
[4] W. F. Cody, J. T. Kreulen, V. Krishna, and W. S. Spangler. The integration of business intelligence and knowledge management. IBM Systems Journal, pages 697–713, 2002.
[5] S. Douglas, D. Agarwal, T. Alonso, R. M. Bell, M. Gilbert, D. F. Swayne, and C. Volinsky. Mining customer care dialogs for “daily news”. IEEE Trans. on Speech and Audio Processing, 13(5):652–660, 2005.
[6] R. Fagin. Fuzzy queries in multimedia database systems. PODS, 1998.
[7] P. Haffner, G. Tur, and J. H. Wright. Optimizing SVMs for complex call classification. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, April 6-10 2003.
[8] H. Hermansky. Perceptual Linear Predictive (PLP) Analysis of Speech. Journal of the Acoustical Society of America, vol. 87, no. 4, pp. 1738-1752, 1990.
[9] K. Kukich. Techniques for automatically correcting words in text. In ACM Computing Surveys, volume 24, 1992.
[10] H.-K. J. Kuo and C.-H. Lee. Discriminative training of natural language call routers. IEEE Trans. on Speech and Audio Processing, 11(1):24–35, 2003.
[11] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[12] S. Merugu, S. Rosset, and C. Perlich. A new multi-view regression approach with an application to customer wallet estimation. In Proceedings of 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006.
[13] G. Mishne, D. Carmel, R. Hoory, A. Roytman, and A. Soffer. Automatic analysis of call-center conversations. In Conference on Information and Knowledge Management, Bremen, Germany, October 31-November 5 2005.
[14] T. Nasukawa and T. Nagano. Text analysis and knowledge mining system. IBM Systems Journal, pages 967–984, 2001.
[15] B. Raskutti and A. Herschtal. Predicting the product purchase patterns of corporate customers. In Proceedings of 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005.
[16] S. Rosset, E. Neumann, U. Eick, N. Vatnik, and I. Idan. Evaluation of prediction models for marketing campaigns. In Proceedings of 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.
[17] S. Rosset, E. Neumann, U. Eick, N. Vatnik, and I. Idan. Customer lifetime value modeling and its use for customer retention planning. In Proceedings of 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002.
[18] S. Roy and L. V. Subramaniam. Automatic generation of domain models for call centers from noisy transcriptions. In Proc. of COLING/ACL 06, pages 737–744, Sydney, Australia, July 2006.
[19] D. Sullivan. Document Warehousing and Text Mining. John Wiley and Sons, Inc., New York, 2001.
[20] H. Takeuchi, L. V. Subramaniam, T. Nasukawa, S. Roy, and S. Balakrishnan. A conversation-mining system for gathering insights to improve agent productivity. In Proc. of 9th IEEE International Conference on E-Commerce Technology and the 4th IEEE International Conference on Enterprise Computing, E-Commerce and E-Services (CEC-EEE 2007), Tokyo, Japan, July 23-26 2007.
[21] M. Tang, B. Pellom, and K. Hacioglu. Call-type classification and unsupervised training for the call center domain. In Automatic Speech Recognition and Understanding Workshop, St. Thomas, U S Virgin Islands, November 30-December 4 2003.
[22] R. C. Rose and D. B. Paul. A Hidden Markov Model Based Keyword Recognition System. In Proc. of ICASSP, 1990.
[23] M. Weintraub. LVCSR Log-Likelihood Ratio Scoring for Keyword Spotting. In Proc. of ICASSP, 1995.
[24] Y. Zhang, J. Qi, H. Shu, and J. Cao. A Hybrid KNN-LR classifier and its application to customer churn prediction. In Proc. of IEEE International Conference on Systems, Man and Cybernetics, 2007.