Lab Validation Report

MemSQL’s Distributed In-Memory Database
Real-time Analytics for the Big Data Revolution

By Tony Palmer, Senior ESG Lab Analyst

August 2013

© 2013 by The Enterprise Strategy Group, Inc. All Rights Reserved.

Lab Validation: MemSQL’s In-Memory Distributed Database


Contents

Introduction
    Background
    MemSQL
ESG Lab Validation
    Real-time Transaction Processing and Analytics
    Simple, Durable, Reliable
ESG Lab Validation Highlights
Issues to Consider
The Bigger Truth
Appendix

ESG Lab Reports

The goal of ESG Lab reports is to educate IT professionals about data center technology products for companies of all types and sizes. ESG Lab reports are not meant to replace the evaluation process that should be conducted before making purchasing decisions, but rather to provide insight into these emerging technologies. Our objective is to go over some of the more valuable feature/functions of products, show how they can be used to solve real customer problems, and identify any areas needing improvement. ESG Lab's expert third-party perspective is based on our own hands-on testing as well as on interviews with customers who use these products in production environments. This ESG Lab report was sponsored by MemSQL.

All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.


Introduction

The methodology presented in this report was designed to assess the performance capabilities and scalability of the MemSQL real-time analytics platform. Tests were designed to emulate a customer environment where real-world interactive analytic queries are executed while the platform simultaneously ingests new data in real-time.

Background

Data analytics have always played a key role in enabling businesses to harness value from electronically stored information. Banking on the potential value that data can bring to an organization, executives are demanding more from it and are expecting faster, more impactful results. As a result, business intelligence and data analytics was the fifth most cited response among the top 2013 IT priorities reported by respondents to ESG’s annual IT spending intentions survey.[1] As more information becomes available to businesses via new data sources (such as social networks, sensor data, and machine-generated log data), organizations want to extend their data analytics across their enterprises in a more predictive, real-time manner. The result is an increasing priority on data analytics activities, and subsequently, more pressure on business-analyst and IT teams to deliver results.

While data analytics is a priority, it is not without challenges. With the expanding importance of data analytics and the increased pressure to provide more real-time results as data volumes grow, many businesses are driven to consider deploying a new analytics platform. ESG research indicates the most cited data analytics challenge that respondents reported experiencing is tied to data integration complexities (47%), and the next three most frequently cited challenges all have elements related to processing, integrating, and analyzing larger data sets in less time (see Figure 1).[2]

Figure 1. Data Analytics Challenges Experienced

Which of the following data analytics challenges has your organization experienced? (Percent of respondents, N=270, multiple responses accepted)

• Data integration is complex: 47%
• Lack of skills necessary to properly manage large data sets and derive value from them: 34%
• Data set sizes limit our ability to perform analytics: 29%
• Unable to complete analytics in a reasonable period of time: 28%
• Current database license costs are too expensive: 25%
• Current data analytics license costs are too expensive: 21%
• Storage requirements are too expensive: 21%

Source: Enterprise Strategy Group, 2013.

[1] Source: ESG Research Report, 2013 IT Spending Intentions Survey, January 2013.
[2] Source: ESG Research Report, The Impact of Big Data on Data Analytics, September 2011.


MemSQL

MemSQL is a real-time analytics platform built on a highly scalable, in-memory database, designed for simultaneously handling real-time data transactions and analytic workloads. Designed for use cases that require instant access to both real-time and historical data, MemSQL is based on a distributed relational database architecture with an analytics engine that runs on SQL, the most popular database language.

Designed around horizontal scale-out on commodity hardware, MemSQL’s two-tiered architecture consists of two types of nodes: aggregators and leaves. Aggregator nodes are cluster-aware query routers that act as a gateway into the distributed system. The only data stored in aggregators is metadata. An aggregator queries the leaves, aggregates the results, and sends them back to the client. Leaf nodes function as storage and compute nodes. In the shared-nothing architecture, leaf nodes run independently of one another and communicate only with aggregators.

When queries are issued from the client machine to an aggregator, they are transformed and spread out to all the leaf nodes using a Distributed Query Optimizer. This optimizer ensures consistent distribution of query workloads so MemSQL can take advantage of the entire cluster’s resources.

Figure 2. MemSQL Cluster Architecture

MemSQL has two types of tables: reference tables and distributed tables. Each node in the cluster has an identical copy of all reference tables. Distributed tables are spread across all nodes in the cluster, so each node has a piece of each distributed table. This enables joins to be more efficient, with compute overhead offloaded to the leaf nodes.

Key features of the MemSQL analytics platform include:

• Real-time, distributed in-memory SQL analytics: MemSQL is designed to query results across millions of events in seconds while simultaneously processing real-time transactions. Data is stored in-memory, resulting in database insert latency of less than a millisecond.
• Design for complex analytics: Using a row-based storage engine, MemSQL provides multi-version concurrency control (MVCC) along with lock-free skip lists and hash tables. As a result, read operations never block write operations and vice-versa, resulting in extremely fast analytic results while simultaneously updating data.
• A massively parallel execution engine: Leveraging the clustered architecture, MemSQL uses all available CPU for every scan operation. Using hash partitioning, data is distributed uniformly across all leaf nodes so there are no hot spots. A distributed query optimizer ensures consistent distribution of query workloads across all cluster resources.
• Easy integration with existing data management technology: MemSQL provides comprehensive SQL-92 support and MySQL client compatibility, eliminating barriers for analysts and existing or future data technologies. MemSQL also supports standard interfaces (ODBC, JDBC, Excel, etc.), and fully supports a wide range of business intelligence tools.
• Horizontal scale-out: MemSQL’s distributed in-memory database can grow to thousands of nodes, easily accommodating hundreds of terabytes of data. MemSQL’s ability to horizontally scale on commodity hardware is designed to enable organizations to scale their applications easily with manageable economics. Each node added to a cluster increases both storage capacity and compute performance to scale query performance linearly.
• Simple management: MemSQL’s real-time dashboard provides visual insights into the software status, hardware performance, and system configuration in a single view. Additional features include cluster topology visualization, slow query analysis, and real-time alerts. MemSQL is designed to maintain high availability while monitoring and providing notification of hardware failures or other issues.
• ACID compliant, highly available, and fault-tolerant: Using a shared-nothing two-tiered architecture with no single point of failure, leaf nodes run independently and only communicate to aggregator nodes. Queries execute on individual nodes and intermediary results are merged at the aggregator tier. Transactions are committed to disk as a log and later compressed into full-database snapshots, which can be used for server node failure recovery. With replication, MemSQL distributes multiple copies of data on separate nodes, both within the data center and across data centers. Replication to a replacement or slave node can be initiated without having to pause or reconfigure the master.
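The two-tier flow described above (rows hash-partitioned across leaves, partial results merged at the aggregator) can be illustrated with a small sketch. This is not MemSQL code: the leaf count, the choice of `crc32` as the hash, and the function names `leaf_for`, `distribute`, `leaf_partial`, and `aggregator_avg` are all assumptions made for illustration.

```python
from collections import defaultdict
from zlib import crc32

NUM_LEAVES = 4  # assumed leaf count for the sketch


def leaf_for(shard_key: int) -> int:
    """Pick a leaf by hashing the shard key (hash partitioning)."""
    return crc32(str(shard_key).encode()) % NUM_LEAVES


def distribute(rows):
    """Spread rows across leaves; each leaf holds one piece of the table."""
    leaves = defaultdict(list)
    for row in rows:
        leaves[leaf_for(row["orderkey"])].append(row)
    return leaves


def leaf_partial(rows):
    """Work done on one leaf: a partial SUM and COUNT over its local rows."""
    return sum(r["quantity"] for r in rows), len(rows)


def aggregator_avg(leaves):
    """Work done on the aggregator: merge the partials into a global AVG."""
    partials = [leaf_partial(leaves[i]) for i in range(NUM_LEAVES)]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical data: 1,000 order rows with quantities cycling from 1 to 50.
rows = [{"orderkey": k, "quantity": k % 50 + 1} for k in range(1, 1001)]
leaves = distribute(rows)
result = aggregator_avg(leaves)
```

Because each leaf returns only a sum and a count, the aggregator merges a handful of numbers per leaf instead of moving rows, which is why adding leaves adds scan capacity without adding much work at the aggregator tier.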

MemSQL is ideal for use cases that require a fast database with industry-leading query performance and the ability to ingest and query data simultaneously to provide instant access to both real-time and historical data, such as:

• Operational analytics – Understand how the business is performing in real-time at the most granular level and immediately respond to customers.
• Operations security – Know immediately when customers are subject to malicious attacks to help prevent financial losses and the erosion of customer confidence.
• Marketing campaign optimization – Identify top-performing channels in real-time and shift investment away from underperforming channels to maximize the return on marketing spend.
• Supply chain management – Leverage sensor and machine data from warehouses and shipping channels to keep inventory lean and optimize timing and routes for resupply.
• Real-time trend analysis – Capitalize on fluctuations in the market as they occur.


ESG Lab Validation

The real-world performance and functional capabilities of MemSQL’s distributed in-memory database were assessed by ESG Lab via hands-on tests at MemSQL’s headquarters in San Francisco, California. The data set and queries were designed to emulate data and analyses typically found in real-world applications.

Real-time Transaction Processing and Analytics

ESG Lab tested MemSQL in the cloud, using Amazon EC2 virtual machines configured in clusters of 20, 40, and 80 nodes. In each configuration, there were four leaf nodes for each aggregator node. The data set used two tables adapted from TPC-H: orders and lineitem. Using the TPC dbgen tool as a model, MemSQL wrote tpchgen, a tool that generates real-time SQL statements. To perform inserts and updates, ESG Lab ran 64 tpchgen processes on each aggregator node in parallel.[3]

Figure 3. The ESG Lab Test Bed

ESG Lab Testing

Starting with a 20 node cluster consisting of 4 aggregator nodes and 16 leaf nodes, the tpchgen processes were kicked off on all 4 aggregator nodes simultaneously using cluster-ssh. ESG Lab used the MemSQL dashboard, seen in Figure 4, to monitor the status of all the clusters throughout the testing. The dashboard is simple and intuitive, showing CPU and memory usage for individual nodes and the entire cluster, as well as running performance monitoring that shows insert/update and query performance over time.

[3] Detailed descriptions of the test bed, database schema, and query statements can be found in the Appendix.


Figure 4. Dashboard – 20 Node Cluster: 4 Aggregator Nodes, 16 Leaf Nodes (panels: CPU, Memory)

The 20 node cluster sustained an insert rate of over 300,000 rows per second. While this workload was running, ESG Lab executed several queries, ranging from simple to complex. Table 1 describes the queries performed while inserting rows into the database.[4]

Table 1. Analytic Queries

Query      Description
Query 1    Basic Reporting: Pricing Summary
Query 2    Count Distinct: Parts/Supplier Relationship
Query 3    Complex Join: Discounted Revenue
Query 4    National Market Share
Query 5    Returned Item Report

MemSQL performs SQL to machine-code conversion to provide the most efficient and fastest query performance on terabytes of data. Each SQL statement that is entered for the first time is converted to x86 machine code and stored as a query plan with the parameters stored as variables. This process never caches the data, just the query statement. Each subsequent time that query is run, a highly efficient code path minimizes the number of CPU instructions used, further reducing query time by eliminating the need for query interpretation. This process provides many of the benefits of stored procedures without the rigidity of predetermined queries and allows the platform to remain agile and adapt to changing business requirements.

To assess the performance gains provided by MemSQL’s SQL to machine-code conversion process, ESG Lab ran the five selected queries once the database size had reached 500 million rows. Each query was run twice: once to compile, and once to show execution from the plan cache.
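The caching behavior described above, where the statement shape is cached and the literal values are passed in as parameters, can be sketched as follows. This is a minimal illustration, not MemSQL's implementation: the `PlanCache` class, the `parameterize` helper, and the regex-based literal detection are hypothetical, and the `compile` method is only a stand-in for the real SQL to machine-code step.

```python
import re

# Matches quoted strings or bare numbers; a deliberately simple tokenizer.
LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql: str):
    """Split a statement into a literal-free template and its parameters."""
    params = LITERAL.findall(sql)
    template = LITERAL.sub("?", sql)
    return template, params


class PlanCache:
    def __init__(self):
        self.plans = {}
        self.compilations = 0

    def compile(self, template: str):
        """Stand-in for the expensive step (SQL to machine code in MemSQL)."""
        self.compilations += 1
        return lambda params: (template, tuple(params))

    def run(self, sql: str):
        template, params = parameterize(sql)
        plan = self.plans.get(template)
        if plan is None:  # first time this query shape is seen
            plan = self.compile(template)
            self.plans[template] = plan
        return plan(params)  # cached plans skip compilation entirely


cache = PlanCache()
first = cache.run("select count(*) from lineitem where quantity < 24 and returnflag = 'R'")
second = cache.run("select count(*) from lineitem where quantity < 10 and returnflag = 'N'")
```

Both statements reduce to the same template, so only the first call pays the compilation cost. Only the template is stored, which mirrors the point that the statement is cached and the data is not.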

[4] Complete SQL statements can be found in the Appendix.


Figure 5. Queries From 500 Million Row Database (y-axis: Seconds; x-axis: Q1 to Q5; series: Compile & Query, Query From Plan Cache)

As Figure 5 shows, the plan cache reduced query time significantly for both simple and complex queries. Reduction in response time ranged from 28% to 88% across the set of queries being tested.

Next, the 20 node cluster was destroyed and a new 40 node cluster was created with 8 aggregator nodes and 32 leaf nodes. The tpchgen processes were kicked off on all 8 aggregator nodes simultaneously. As seen in Figure 6, the 40 node cluster was able to insert more than 600,000 rows per second.


Figure 6. Scaling to Medium Cluster, 8 Aggregators, 32 Leaf Nodes

ESG Lab ran the same set of queries, in the same sequence, as on the 20 node cluster and captured run times. Once all queries had been executed, ESG Lab destroyed the 40 node cluster and created a new 80 node cluster with 16 aggregator nodes and 64 leaf nodes.

Figure 7. Scaling to Large Cluster, 16 Aggregators, 64 Leaf Nodes


The 80 node configuration was able to insert nearly 1.2 million rows per second. Again, ESG Lab ran the same set of queries in the same sequence and captured run times. The results of all three tests are shown in Figure 8.

Figure 8. Real-time Analytics Performance–Active 500 Million Row Database (axes: Rows Per Second, Query Response Time (sec); categories: 20 Node Cluster, 40 Node Cluster, 80 Node Cluster)

As MemSQL scaled, the amount of data that could be ingested increased linearly while queries continued to execute remarkably fast, even as the database was simultaneously inserting nearly 1.2 million rows per second. Detailed query results for each cluster configuration are shown in Table 2.

Table 2. Simultaneous Transaction/Analytic Results on 500 Million Rows of Data

Cluster Size                 Rows Per Second   Query 1 (s)   Query 2 (s)   Query 3 (s)   Query 4 (s)   Query 5 (s)
4 Aggregators, 16 Leaves     303,526           34.17         0.89          30.46         4.95          10.2
8 Aggregators, 32 Leaves     600,091           17.62         0.93          20.46         2.50          8.5
16 Aggregators, 64 Leaves    1,145,510         8.45          0.94          11.66         1.18          6.31
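As a quick check on the linear-scaling reading of Table 2, the measured speed-ups can be compared with the ideal speed-up implied by the node count. The numbers below are copied from Table 2; the dictionary layout and variable names are only illustrative.

```python
# Measured values copied from Table 2.
results = {
    20: {"rows_per_sec": 303_526, "q1_sec": 34.17},
    40: {"rows_per_sec": 600_091, "q1_sec": 17.62},
    80: {"rows_per_sec": 1_145_510, "q1_sec": 8.45},
}

base_nodes = 20
base = results[base_nodes]

efficiency = {}
for nodes, r in results.items():
    ideal = nodes / base_nodes  # speed-up expected from perfectly linear scaling
    ingest_speedup = r["rows_per_sec"] / base["rows_per_sec"]
    q1_speedup = base["q1_sec"] / r["q1_sec"]
    efficiency[nodes] = (ingest_speedup / ideal, q1_speedup / ideal)
    print(f"{nodes} nodes: ingest {ingest_speedup:.2f}x, Query 1 {q1_speedup:.2f}x "
          f"(ideal {ideal:.0f}x)")
```

Ingest works out to roughly 1.98x at 40 nodes and 3.77x at 80 nodes against ideals of 2x and 4x, and Query 1 tracks the ideal closely as well. Not every query scales this way: Query 2 stays just under one second at every cluster size, and Queries 3 and 5 improve by less than the node count would suggest.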

A key capability of the MemSQL platform is fast deletes. Customers need to be able to delete data even faster than they can insert it so the system is not overwhelmed. When the data ingest rate is faster than the system can delete, customers are forced to limit the amount of data they retain for real-time analytics. A system that can delete large volumes of data quickly can increase the amount of data that can be retained for real-time analytics. ESG Lab tested deletes using MemSQL on both the 40 node and 80 node configurations. As seen in Figure 9, a delete of more than a billion rows completed in just over six minutes (nearly 2.7 million rows a second) on the 40 node cluster, and just under three minutes (nearly 5.5 million rows a second) with an 80 node cluster.


Figure 9. Deleting One Billion Rows (bars: 40 Node Cluster, 80 Node Cluster; x-axis: Seconds, 0 to 400)

Why This Matters

In ESG’s 2013 IT spending intentions survey, many organizations identified improving business intelligence and/or delivery of real-time business information as a key business initiative that will have an impact on their IT spending decisions.[5] Considering the volumes of data that organizations intend to analyze in shorter time frames, they need to evaluate whether or not their current approaches are adaptable to these demanding and constantly changing requirements. The time that data cleansing tasks take to complete on their largest data sets is a significant business challenge that organizations want new data technologies to address.[6]

ESG Lab validated outstanding performance and linear scalability of the MemSQL real-time analytics platform as it was scaled from 20 to 80 nodes. MemSQL was able to insert millions of rows per second while executing complex queries and presenting insight into huge data sets in seconds. In addition, MemSQL proved to be adept at data cleansing activities, deleting a billion rows of data from an 80 node cluster in under three minutes.

[5] Source: ESG Research Report, 2013 IT Spending Intentions Survey, January 2013.
[6] Source: ESG Research Report, The Convergence of Big Data Processing and Integrated Infrastructure, July 2012.


Simple, Durable, Reliable

MemSQL is designed to run on a cluster of commodity hardware and uses a shared-nothing two-tiered architecture eliminating single points of failure. MemSQL also comes with an easy-to-use yet robust dashboard, enabling users and administrators to manage the cluster and tune MemSQL to provide optimized performance.

ESG Lab Testing

ESG Lab analyzed the MemSQL dashboard in each of the three cluster configurations, from 20 to 80 nodes in total. The basic dashboard view for each of these clusters is shown in Figure 4, Figure 6, and Figure 7. The simple dashboard view is intuitive and easy to understand while providing a wealth of information, enabling the user to quickly grasp the state of the MemSQL platform. Like MemSQL itself, the dashboard is a real-time system, providing continuously updated information (see Table 3).

Table 3. Dashboard Information

Cluster Size
• Number of aggregator nodes
• Number of leaf nodes

Resource Utilization
• Average system CPU usage
• Average user CPU usage
• MemSQL memory used
• Total memory used
• Available memory

Instantaneous Performance
• Number of rows written per second
• Number of rows read per second

Node Performance
Each node is represented graphically, showing:
• User CPU %
• System CPU %
• Memory Used/Available

By selecting the expanded view of each node icon, as shown in Figure 10, the node graphic provides detailed bar charts for each core, showing MemSQL, system CPU, and memory usage. By providing this data both visually and numerically, and updating the data in real-time, the user is able to quickly and easily determine how MemSQL is utilizing all of the available resources in the cluster. The cluster can be tuned for optimal performance based on real-time performance data and instantaneous feedback for each change.


Figure 10. Expanded Dashboard

Should the user wish to explore the detailed behavior of the cluster in real-time, MemSQL provides an advanced view of all metrics of the MemSQL platform. In the advanced management dashboard, data is presented as a heat map, with one block for each node-metric in the cluster (see Figure 11). This provides clear visual indication of any imbalances or other opportunities to tune the system for optimum performance. As with the basic dashboard, all data is updated in real-time. The advanced management dashboard also provides the ability for the user to select a specific time range, and view the heat map and metrics for historical data. The heat map is an active object, and hovering the mouse over each block in the map provides a pop-up with more detailed information about the metric for that specific node in the cluster.


Figure 11. Advanced Management

MemSQL uses redundancy level 2 by default, meaning that it distributes two copies of all rows in the database across separate nodes, providing data durability in the case of node failure. To analyze the reliability and durability of the MemSQL platform, ESG Lab first executed a simple count and verified the total number of rows at 1,018,398,477. Next, a leaf node was taken offline to simulate a node failure.

Figure 12. Resilience

With one node offline, the cluster continued to operate, returning 1,018,398,477 rows when the same simple count was performed. As expected, losing one node out of the 40 total nodes had minimal impact on CPU and memory resources, with the time required to count all rows increasing slightly from 2.31 seconds for the full cluster to 2.99 seconds for the degraded cluster. ESG Lab then ran the same complex queries that were run before the node failure and verified that MemSQL returned exactly the same results, indicating that all data was available in the degraded cluster.

MemSQL provides the ability to rebalance the cluster, whereby data is redistributed evenly across the nodes in the cluster when nodes are added or taken offline. After taking a node offline, ESG Lab added a new node in the cluster, bringing the cluster back to 8 aggregator nodes and 32 leaf nodes, and initiated rebalancing. At the conclusion of the operation, the dashboard showed consistent memory usage across all 40 nodes, indicating the data was spread evenly across all nodes in the cluster.
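The reason a row count is unchanged after losing one leaf can be shown with a toy model of redundancy level 2. The report does not describe MemSQL's actual placement algorithm, so the neighbor-leaf replica placement, the partition and leaf counts, and the function names `place`, `build_cluster`, and `count_rows` below are all assumptions.

```python
NUM_LEAVES = 8       # assumed leaf count for the sketch
NUM_PARTITIONS = 32  # assumed partition count


def place(partition: int):
    """Return (primary leaf, replica leaf); the two always differ."""
    primary = partition % NUM_LEAVES
    replica = (primary + 1) % NUM_LEAVES
    return primary, replica


def build_cluster(rows):
    """Store each row on the primary and the replica leaf of its partition."""
    leaves = {leaf: {} for leaf in range(NUM_LEAVES)}
    for key in rows:
        partition = hash(key) % NUM_PARTITIONS
        for leaf in place(partition):
            leaves[leaf].setdefault(partition, set()).add(key)
    return leaves


def count_rows(leaves, offline=frozenset()):
    """Count every partition once, reading from any copy that is still online."""
    total = 0
    for partition in range(NUM_PARTITIONS):
        for leaf in place(partition):
            if leaf not in offline:
                total += len(leaves[leaf].get(partition, ()))
                break
        else:
            raise RuntimeError(f"partition {partition} has no online copy")
    return total


cluster = build_cluster(range(100_000))
healthy = count_rows(cluster)
degraded = count_rows(cluster, offline={3})
```

With two copies on separate leaves, any single leaf failure leaves one readable copy of every partition, so `healthy` and `degraded` agree. In this toy placement, losing two adjacent leaves at once would leave some partitions with no copy, which is the case the rebalancing step guards against by restoring full redundancy.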

Why This Matters

Big data is already a reality for many businesses: one-third of respondent organizations have at least 6TB of data in their single largest data analytics sets. Additionally, more than half of these organizations are pulling from at least three unique data sources, and nearly one-quarter are integrating data from five or more sources.[7] As more sources (each with large and growing volumes of data) are integrated, data analytics tools and processes may stretch to the breaking point. Analytics is gaining strategic significance within many organizations. In addition, more data analytics activities are running in real-time or near real-time, placing an even bigger premium on availability. As such, it makes sense that more than one-quarter of all organizations indicated that downtime for their data analytics platforms cannot exceed one hour without causing adverse business impact.

MemSQL is a shared-nothing architecture deployed on commodity hardware, with data replicated across multiple nodes, within or across data centers. MemSQL provides an always-on solution with no single point of failure. MemSQL also offers an intuitive, easy-to-use dashboard for real-time visibility into the state and performance of the cluster, enabling easy cluster maintenance and performance tuning.

ESG Lab validated the durability and reliability of the MemSQL distributed in-memory database by taking a node offline. With the cluster degraded, all data was still available, and losing one out of 40 nodes had minimal impact on performance or total available memory. ESG Lab then replaced the failed node and rebalanced the cluster to redistribute the data evenly across all nodes. Throughout testing, MemSQL provided highly available, high-performance, real-time analytics.

[7] Source: ESG Research Report, The Convergence of Big Data Processing and Integrated Infrastructure, July 2012.


ESG Lab Validation Highlights

• ESG Lab validated outstanding performance and linear scalability of the MemSQL distributed in-memory database as it was scaled from 20 to 80 nodes.
• MemSQL was able to consume millions of rows of data per second while executing complex queries and presenting rapid insight into terabyte scale data sets.
• MemSQL proved to be adept at data cleansing activities, deleting a billion rows of data from an 80 node cluster in less than three minutes.
• With the cluster degraded, ESG Lab validated the durability and reliability of MemSQL. All data was still available with minimal impact on performance or total available memory.
• ESG Lab rebalanced the cluster to redistribute the data evenly across all nodes with minimal impact.

Issues to Consider

• The test results presented in this report are based on benchmarks and tools deployed in a standard Amazon EC2 environment. Due to the many variables in each production data center, testing in your own environment is recommended if you choose to deploy MemSQL on your own hardware in your own data center.


The Bigger Truth

Whether measured by increased revenues, market share gains, reduced costs, or scientific breakthroughs, data analytics has always played a key role in the ability to harness value from electronically stored information. What has changed recently is that, as more business processes have become automated, information that was once stored in separate online and offline repositories and formats is now readily available for amalgamation and analysis to increase business insight and enhance decision support.

Business executives are asking more of their data and are expecting faster and more impactful answers. The result is an ever-increasing priority on data analytics activities and, subsequently, more pressure on existing business analyst and IT teams to deliver.

MemSQL’s distributed in-memory database is designed to consume terabyte-scale data sets rapidly while simultaneously querying the data, delivering a complete big data analytics solution that focuses on real-time results while using the familiar SQL query language. MemSQL enables runtime operational analytics of transactions and interactions, in real-time. Considering that in-memory computing is faster than traditional disks or SSDs by orders of magnitude and doesn’t require batch-loading to consume data, it comes as no surprise that MemSQL becomes more powerful as organizations scale out horizontally on commodity hardware.

ESG Lab validated outstanding performance and linear scalability of MemSQL’s distributed in-memory database as it scaled from 20 to 80 nodes. MemSQL was able to insert more than a million rows of data per second into the database while executing complex queries and presenting rapid insight into huge data sets at the same time. In addition, MemSQL proved to be adept at data cleansing activities, deleting a billion rows of data from an 80 node cluster in less than three minutes.

MemSQL demonstrated reliability and durability as well. MemSQL is ACID compliant, highly available, and fault tolerant. When a live node was taken offline, all data in the cluster was still available, with minimal impact on performance and total available memory. Throughout testing, MemSQL continued to provide highly available, high-performance, real-time analytics.

Data growth shows no signs of abating. As data accumulates, there is a corresponding burden on IT to maintain acceptable levels of performance, whether that is measured by the speed with which an application responds, the ability to aggregate and deliver data, or the business value of information. Organizations are recognizing that their growing data stores bring massive and largely untapped potential to improve business intelligence. At the same time, they also recognize the challenges that big data poses to existing analytics tools and processes, as well as the impact data growth is having on the bottom line in the form of increased requirements for storage capacity and compute power.

MemSQL is built from the ground up to take advantage of modern hardware, leveraging dozens of cores per machine, terabytes of memory, and horizontal scale-out on commodity hardware. If your organization needs a faster, easily scalable database to query big data and move faster to adapt to changing business conditions in real-time, ESG Lab would recommend that you take a serious look at MemSQL’s distributed in-memory database.


Appendix

Table 4. ESG Lab Test Bed

Amazon EC2 Configuration (used for both Aggregator and Leaf nodes)

EC2 Instance              m2.4xlarge
OS                        Ubuntu Linux
RAM                       64GB
EC2 Compute Units         26 (8 cores)
Local Storage (not used)  1.69TB

MemSQL Cluster Configuration

          Aggregator Nodes  Leaf Nodes  CPU Cores  Memory (GB)
Small                    4          16        160         1280
Medium                   8          32        320         2560
Large                   16          64        640         5120

Figure 13. Test Bed Database Schema

region: regionkey, name, comment
nation: nationkey, regionkey, comment
customer: custkey, name, address, nationkey, phone, acctbal, mktsegment, comment
orders: orderkey, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment
supplier: suppkey, name, address, phone, acctbal, comment, nationkey
lineitem: linenumber, orderkey, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, suppkey
part: partkey, name, mfgr, brand, type, size, container, retailprice, comment
partsupp: suppkey, availqty, supplycost, comment, partkey
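As a rough illustration of how part of the Figure 13 schema might be declared, the Python sketch below creates four of the tables in an in-memory SQLite database. SQLite is only a stand-in for MemSQL here. The column names come from Figure 13, but the column types, the primary and foreign key constraints, and the helper name `create_schema` are assumptions made for illustration; the report does not state them.

```python
import sqlite3

# Column names follow Figure 13. Types and key constraints are ASSUMED,
# since the report lists column names only.
SCHEMA = """
CREATE TABLE region (
    regionkey INTEGER PRIMARY KEY,
    name      TEXT,
    comment   TEXT
);
CREATE TABLE nation (
    nationkey INTEGER PRIMARY KEY,
    regionkey INTEGER REFERENCES region(regionkey),
    comment   TEXT
);
CREATE TABLE customer (
    custkey    INTEGER PRIMARY KEY,
    name       TEXT,
    address    TEXT,
    nationkey  INTEGER REFERENCES nation(nationkey),
    phone      TEXT,
    acctbal    REAL,
    mktsegment TEXT,
    comment    TEXT
);
CREATE TABLE orders (
    orderkey      INTEGER PRIMARY KEY,
    custkey       INTEGER REFERENCES customer(custkey),
    orderstatus   TEXT,
    totalprice    REAL,
    orderdate     TEXT,
    orderpriority TEXT,
    clerk         TEXT,
    shippriority  INTEGER,
    comment       TEXT
);
"""


def create_schema(conn):
    """Create the sketched tables and return their names in sorted order."""
    conn.executescript(SCHEMA)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(create_schema(connection))
```

The remaining tables (supplier, lineitem, part, partsupp) could be declared the same way from their column lists.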


Figure 14. Total Database Row Count

select count(*) from lineitem;

Figure 15. Query 1: Basic Reporting: Pricing Summary Report

select
    lineitem.returnflag,
    lineitem.linestatus,
    sum(lineitem.quantity) as sum_qty,
    sum(lineitem.extendedprice) as sum_base_price,
    sum(lineitem.extendedprice * (1 - lineitem.discount)) as sum_disc_price,
    sum(lineitem.extendedprice * (1 - lineitem.discount) * (1 + lineitem.tax)) as sum_charge,
    avg(lineitem.quantity) as avg_qty,
    avg(lineitem.extendedprice) as avg_price,
    avg(lineitem.discount) as avg_disc,
    count(*) as count_order
from orders, lineitem
where orders.orderkey = lineitem.orderkey
    and lineitem.shipdate = '1'
    and lineitem.quantity = '2'
    and lineitem.quantity = '3'
    and lineitem.quantity = date('1993-10-01')
    and orders.orderdate < date('1993-10-01') + interval '3' month
    and lineitem.returnflag = 'R'
    and customer.nationkey = nation.nationkey
group by customer.custkey, customer.name, customer.acctbal, customer.phone, nation.name, customer.address, customer.comment
order by revenue desc
limit 20;
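The aggregate expressions in the Figure 15 select list can be exercised on a toy table. The Python sketch below is an illustration only and is not the query ESG Lab ran: it uses SQLite as a stand-in for MemSQL, reads from lineitem alone rather than joining orders, and groups by returnflag and linestatus. The three rows, the shipdate cutoff parameter, and the function name `pricing_summary` are all assumptions.

```python
import sqlite3

# Made-up rows for illustration:
# (returnflag, linestatus, quantity, extendedprice, discount, tax, shipdate)
ROWS = [
    ("R", "F", 10, 100.0, 0.5, 0.0, "1993-10-05"),
    ("R", "F", 30, 300.0, 0.0, 0.5, "1993-11-20"),
    ("N", "O", 20, 200.0, 0.0, 0.0, "1998-11-30"),
]

# Same aggregate expressions as the Figure 15 select list. The shipdate
# filter, grouping, and ordering are ASSUMED for this sketch.
PRICING_SUMMARY = """
SELECT returnflag,
       linestatus,
       SUM(quantity)                                   AS sum_qty,
       SUM(extendedprice)                              AS sum_base_price,
       SUM(extendedprice * (1 - discount))             AS sum_disc_price,
       SUM(extendedprice * (1 - discount) * (1 + tax)) AS sum_charge,
       AVG(quantity)                                   AS avg_qty,
       AVG(extendedprice)                              AS avg_price,
       AVG(discount)                                   AS avg_disc,
       COUNT(*)                                        AS count_order
FROM lineitem
WHERE shipdate <= ?
GROUP BY returnflag, linestatus
ORDER BY returnflag, linestatus
"""


def pricing_summary(cutoff):
    """Return one summary row per (returnflag, linestatus) group
    for line items shipped on or before the ISO-format cutoff date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lineitem (returnflag TEXT, linestatus TEXT, "
        "quantity REAL, extendedprice REAL, discount REAL, tax REAL, "
        "shipdate TEXT)"
    )
    conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn.execute(PRICING_SUMMARY, (cutoff,)).fetchall()


if __name__ == "__main__":
    for row in pricing_summary("1998-12-01"):
        print(row)
```

Storing dates as ISO-format text lets SQLite compare them as strings; a production schema would more likely use a native date type.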


20 Asylum Street | Milford, MA 01757 | Tel: 508.482.0188 Fax: 508.482.0218 | www.esg-global.com
