Query Evaluation Using Dynamic Code Generation
Magnus Bjornsson, Senior Director of Engineering at Oracle
Typical query evaluation in a database uses a static evaluator which is built to handle all types of queries. For performance reasons it has become more and more common in recent years to dynamically build the evaluator based on the query itself (using JIT compilation). In this talk I’ll talk about the approach that we at Oracle/Endeca took in our own columnar, in-memory data store to dynamically generate the query evaluator at query time.
Making Enterprise Software That's As Easy To Install As Dropbox
Martin Martin, Software Architect at Infinio
There's a lot of great software that does cool stuff, but when it comes to software that is deeply embedded in your infrastructure, all too often it's too much of a project to deploy and try it out. We confronted the problem of intercepting and modifying the data stream between all the virtual machines in a data center and the backend storage arrays that host their virtual disks. This talk describes how networking works at the TCP and link levels, and how we subvert that to make installation so easy and nondisruptive that you could try out our product over lunch.
Data Movement for Distributed Execution
Derrick Rice, Software Engineer at HP Vertica
Networked data movement techniques are critical to scalability in distributed computing. As data sets have grown and analytics have increased in complexity, traditional approaches have run into some surprising problems. Exhaustion of ephemeral ports and OS buffers. Deadlock and unexpected performance degradations. Extraordinary overhead costs. Congestion control challenges and firmware corner cases.
This talk will introduce HP Vertica's data transmission layer and the challenges encountered in the context of its distributed execution engine. We will share our journey from a naive implementation to a topology-aware data flow. We will also look at what can be learned from other technologies and ask “What's next?” Looking forward, operating at scale will continue to reveal problems and require new techniques.
Scaling to over 1,000,000 requests per second
Beth Logan, Senior Director of Optimization at DataXu
DataXu’s decisioning technology handles over 1,000,000 ad requests per second. To put this into context, Google Search handles 5,000 to 10,000 transactions per second and Twitter handles 5,000 to 7,000 transactions per second. Behind this statistic is an incredible architecture that has enabled us to scale. We use a blend of open source and home grown tools to place ads, record their impact and learn and deploy our decisioning models automatically, all while running 24x7 in over 30 countries worldwide. In this presentation we will dive into some of these tools and discuss the challenges we faced and the tradeoffs we made.
Fractal Tree Indexing in MySQL and MongoDB
Tim Callaghan, VP of Engineering at Tokutek
As transactional and indexed reporting data sets continue to grow, traditional B-tree indexing struggles to keep up, especially when the working set of data cannot fit in RAM. Fractal Tree indexes were purpose built to overcome this limitation, while retaining the read properties we expect for our queries. We’ll start by covering the theoretical differences between the two indexing technologies. We’ll end the talk by discussing the benefits that Fractal Tree indexes bring to MySQL (TokuDB) and MongoDB (TokuMX). “Benchmarks or it doesn’t count”, so expect to see a few.
Scalable Collaborative Filtering on top of Apache Giraph
Maja Kabiljo, Software Engineer at Facebook
Apache Giraph is a highly performant distributed platform for doing graph and iterative computations. Collaborative filtering is a well known recommendation technique that is often solved with matrix-factorization based algorithms. This talk will detail our scalable implementation of SGD and ALS methods for collaborative filtering on top of Giraph. We will describe our novel methods for distributing the problem and the related Giraph extensions that allows us to scale to over a billion people and tens of millions of items. We will also review various additions required for handling Facebook’s data (for example, implicit and skewed item data). Finally, to complete our easy to use and holistic approach to scalable recommendations at Facebook, we detail our approach to quickly finding top-k recommendations per user.
Using Graph Partitioning in Distributed Systems Design
Alon Shalita, Software Engineer at Facebook; and Igor Kabiljo Software Engineer at Facebook
Large graph datasets, like online social networks or the world wide web, introduce new challenges to the field of systems design. Their size requires scaling resources horizontally by splitting data and queries across several computation units, but standard sharding and routing schemes that ignore the inherent graph structure of the datasets result in suboptimal performance characteristics. In this talk, we present an efficient distributed algorithm for graph partitioning, the problem of dividing a graph into equally sized components with as few edges connecting these components as possible, and show how its results can be used for optimizing distributed systems serving graph based datasets.
H-Store/VoltDB architecture vs. CEP systems and newer streaming architectures
John Hugg, Engineering at VoltDB
In 2007, researchers at MIT, Brown & Yale set out to build a new kind of relational database called H-Store. Commercially developed as VoltDB, it was suddenly possible to build applications that did millions of transactional operations per second at very low cost and with high fault tolerance. While suitable for micro-payments and other high volume, traditional transactional work, many early customers built systems for stream processing. As the product evolved, more and more features were added to support streaming, event processing and ingestion workloads, including materialized views, Kafka ingestion and push-to-HDFS data migration. This talk will explain, through customer use-cases and some development backstory, how the H-Store/VoltDB architecture compares to CEP systems and newer streaming architectures like Storm and Spark Streaming.
Scaling Cassandra and MySQL
Stefan Piesche, CTO at Constant Contact
CTCT used to scale data vertically in large DB2 databases attached to even larger SANs. Since this is not only cost prohibitive but poses significant scalability and availability issues, we have now 2 primary other data strategies.
Cassandra. We use Cassandra as a horizontally scalable data tier for key/value type data. We have around 350 Cassandra nodes spanning 2 data centers. That systems provides 10x the performance of the old RDBMS and 1/10th of the cost. This system is our consumer event tracking systems that scales to 100TB of data, 150BN records that arrive at a velocity of 10k/sec.
Sharded mysql. Our largest deploy is a 36TB system spanning 2 data centers. But, instead of just sharding the DB tier, we even shard the application tier using that system in order to provide complete transparency of the sharding mechanism. Our SOA allows for RESTful access of that data, without any knowledge of the underlying sharding mechanism. However, we have learned that this led to a substantial underutilization of the app tiers – a 96 node cluster of a Ruby Rails application – so we are looking into proprietary DB level sharding mechanisms as well.
The mixture of RDMBS and NOSQL data tiers has caused issues in our analytics platform, a 150TB Hadoop cluster. We use similar mechanism like Netflix does to read data from Cassandra nodes – reading from the SSTables to extract the data.
F4 - Photo Storage at Facebook
Joe Gasperetti, Production Engineer at Facebook; and Satadru Pan, Software Engineer at Facebook
Vertica Live Aggregate Projection: Partial Aggregation for Incremental Maintenance of Live Views With Deletes
Nga Tran, Software Engineer at HP Vertica
Live Aggregate Projections (LAPs) are a new type of projection, introduced in HP Vertica 7.1, that contain one or more columns of data that have been aggregated from a table. The data in a LAP are aggregated at load time, thus querying it is several times faster than applying the aggregation directly on the table.
Unlike other databases which do not support incremental maintenance of non-distributive aggregate functions (e.g. MIN and MAX) for updates and deletes, incremental maintenance is possible in Vertica because the data in LAPs are *partially* aggregated.
This talk focuses on the implementation of Vertica LAPs, for which the current supported aggregate functions are (distributive) SUM, COUNT and (non-distributive) MIN, MAX, and TOP-K. By aggregating data per load on each partition, Vertica allows incremental maintenance on both distributive and non-distributive aggregation while allowing data to be deleted.
Scaling Crashlytics Answers - Real time high-volume analytics processing with the lambda architecture.
Ed Solovey, Staff Software Engineer at Twitter
In the fifteen seconds that it takes you to read this, Answers will have processed 12,000,000 events in support of its actionable, insightful, and real-time mobile analytics dashboards. Learn how it leverages the lambda architecture and probabilistic algorithms to handle this influx of information. We'll dig into how the two pillars of the lambda architecture - off-line, batch processing and real-time, stream-compute processing come together to help us achieve a scalable, fault-tolerant, real-time data processing system.
Building Scalable Caching Systems With Mcrouter
Anton Likhtarov, Software Engineer at Facebook
Modern large scale web infrastructures rely heavily on distributed caching (e.g memcached) to process user requests. These caches serve as a temporary holding spot for the most commonly accessed data. However, this makes these services very sensitive to cache performance and reliability. Mcrouter is the lynchpin of Facebook’s caching infrastructure. It handles the basics of routing requests to the appropriate hosts and managing the responses in a highly performant way. In addition, there are a lot of features in mcrouter that have been designed to dramatically improve the reliability of the caching infrastructure. The problems that mcrouter addresses are not specific to Facebook, but distributed caching systems in general. As a result, Instagram and Reddit have also adopted mcrouter as the primary communication layer to their cache tiers. Mcrouter is open source software and we hope it will be useful in many other applications that rely on caching. This talk gives a very brief overview of mcrouter and the basics of integrating it into different pieces of infrastructure.
Geo-spatial Features in RocksDB
Igor Chanadi, Database Engineer at Facebook; and Yin Wang, Research Scientist at Facebook
RocksDB is an embeddable key-value store optimized for fast storage. It is based on Log-Structured merge-tree (LSM) architecture and is widely used across Facebook’s services. In this talk we’ll present how we implemented spatial indexing on top of RocksDB’s LSM architecture, which enables us to efficiently store geo-spatial data in RocksDB. We’ll also discuss how we optimized the spatial indexing for bulk-load, read-only and read-mostly workloads. Finally, we’ll talk about how we use the geo-spatial features to build database serving OpenStreetMaps data, which can then be used to render map tiles using Mapnik.
Benefits of Big Data: Handling Operations at Scale
Don O'Neill, VP of Engineering, Operations and Infrastructure at TripAdvisor
Don O’Neill from TripAdvisor presents Big Data business lessons learnt from handling operations on a site with over 280 million unique visitors every month - discussing Hadoop, log shipping and analytics, new operations monitoring, and anomaly detection.
Scaling Redis and Memcached at Wayfair
Ben Clark, Chief Architect at Wayfair
At Wayfair, we had to take the caching layer for our customer-facing web sites from a simple master/slave pair of Memcached nodes in 2012, to a set of consistently hashed clusters of in-memory cache servers and persistent key-value stores, in multiple data centers, in time for the holiday rush of 2013. Building on the work of giants and innovators, particularly Akamai, Last.fm, and Twitter, we used composable tools: Memcached, Redis, Ketama, Twemproxy, Zookeeper, to create a resilient distributed system. It's big. Well. That’s always relative. Maybe that’s too bold a claim, considering some of the others speakers at this conference. Let's say it seems big to us, and we've been through some explosive growth over the last few years! It's definitely inexpensive, strong, and fast, and Ben Clark will describe our techniques and add-ons, which are available on github, and explain how to do it yourself.
DBD: An Automated Design Tool for Vertica
Vivek Bharathan, Software Engineer at HP Vertica
Query performance in any database system is heavily dependent on the organization and structure of the data. The task of automatically generating an optimal design becomes essential when dealing with large datasets. Vertica is a distributed, massively parallel columnar database that physically organizes data into projections. Projections are attribute subsets from one or more tables with tuples sorted by one or more attributes, that are replicated on or distributed across cluster nodes.
The key challenges involved in projection design are picking appropriate column sets, sort orders, cluster data distributions and column encodings. In this talk we shall discuss Vertica’s customizable physical design tool, called the Database Designer (DBD), that is tasked with producing projection designs that are optimized for various scenarios and applications. In particular, we will focus on the challenges that the DBD faces, and its evolution over the years.
Look Back Videos
Goranka Bjedov, Capacity Engineer at Facebook
On February 4th 2014, Facebook celebrated its 10th anniversary by releasing the Look-Back Videos product. Every Facebook user was given a 62 second video of their most important events over the course of their Facebook presence. If the users didn’t have a lot of activity, a cover page was generated instead. As reported in the press and on Facebook engineering blog, this project was realized in less than a month worth of time.
This talk will focus on discussing technical challenges related to the project, some of which include: compute, network, storage, distribution, projections and modeling. But, more importantly, the talk will focus on which parts of infrastructure enabled successful undertaking on a project of this size. What are the infrastructure pieces that had to be in place to make this happen? What are the necessary parts that enable fast product development on a massive scale, while at the same time keeping the risk to the remainder of the service acceptable? How can you plan and execute a project of this magnitude and how do you mitigate risks?