Hadoop Summit North America 2013: Call for Lightning Talks

At this year’s Hadoop Summit San Jose, you will have the opportunity to share your initiatives with the Hadoop community through our Lightning Talks session.

A Lightning Talk is a chance for you to present and demonstrate your work or share your vision for Apache Hadoop and Big Data. If you’re interested in participating, submit your topic by June 7, 2013.

Topics will be voted on by the community, and the top eight topics will be added to the Hadoop Summit Lightning Talks agenda.

Enter your topics today!

*Lightning Talk speakers must register for the conference and are responsible for their own travel and expenses. By submitting a talk, you confirm that you have authority and permission from your company to present on the topic.

Hadoop Summit North America will be held at the San Jose Convention Center on June 26 & 27, 2013.

Lightning Talks


  • Rewriting Hadoop Compute from Scratch (or How YARN Came About)

    Apache Hadoop MapReduce has undergone a revolution and emerged as Apache Hadoop YARN, a generic compute platform. Apache Hadoop YARN has come a long way since its humble beginnings in 2011, growing from nothing to running at a scale of tens of thousands of nodes. With the last few releases, we are steadily moving Hadoop YARN out of its alpha status and toward widespread use.

    This talk will cover the journey of Apache Hadoop YARN from level zero to large-scale deployments. We will describe how we started with almost no code, how the project got open sourced, how the community jumped…

    2 votes
  • C3 - Compute Capacity Calculator for Hadoop

    Hadoop is getting more and more attention from big enterprises offering Data Analytics as a hosted service in the cloud, where one of the key aspects of operational management is dynamically estimating and provisioning the cluster capacity that map/reduce applications require to meet their storage and latency SLAs. In this presentation, we describe our work at Yahoo addressing the capacity estimation problem for the Yahoo production clusters. Typically, users develop and test their Map/Reduce jobs (or Pig scripts/Oozie workflows comprising multiple Map/Reduce jobs) on a limited-capacity shared test cluster, mostly with partial data sets.…

    1 vote
  • Powering Efficient Data Processing Pipelines with Oozie and HCatalog

    Internet-scale data presents many challenges for data pipelines, in terms of storage modeling on the grid and optimizing processing for efficiency. Of particular interest is the latency between the availability of the data needed before a pipeline-processing stage can begin working and the actual commencement of that stage. Enterprise pipelines can span hundreds of stages, forming a complex Directed Acyclic Graph (DAG) and mandating the use of a workflow manager like Apache Oozie to process dependencies in a fast and scalable way. The resulting latency gains in data discovery for each pipeline stage are cumulative and can have significant business…

    18 votes
  • Hadoop, Meet Superman – Faster Than A Speeding Bullet, More Powerful Than A Locomotive

    Possessing unbelievable strength… For the first time ever, there is a way to ensure robust data protection of sensitive information going into Hadoop without any compromises to performance, scalability, data type handling, or the extraction of business intelligence from that data.

    This lightning presentation will show how format-preserving encryption (FPE), which is presently undergoing official recognition at NIST, can deliver protection without breaking what you need to get your job done in Hadoop. We will provide a specific blueprint for how to secure sensitive assets…

    2 votes
  • Bridging the Gap Between Data Security and Data Insight

    Big Data is very powerful. Analysis of massive amounts of data gives companies the power to identify trends and improve business strategy. Big data is reliable and scalable and has superior performance. However, companies are now faced with compliance and security regulations that had not been considered before. In addition, the data environment used today must adapt to an ever-changing data security landscape in the future. Now is the time to bridge the gap between security regulations, privacy, and compliance while still providing powerful analysis and data insight to achieve the power behind a big…

    3 votes
  • Building a Modern BI application: Why Hadoop requires a different approach

    Helping everyday business users make sense of big data requires a fundamentally different approach to identifying, extracting, combining, and visualizing data. Delivering this breed of interactive self-service BI through a web browser allows simple, democratized access for everyday users. Building such a modern BI application involves working at the intersection of Hadoop, database research, HCI, graphics, and software robustness. This lightning talk will cover the challenges, techniques, and tips for modeling, wrangling, aggregating, and visualizing massive datasets on Hadoop.

    55 votes
  • Weave - The sensible YARN

    Hadoop YARN can be used as a generic cluster resource management framework to run any type of distributed application. However, YARN was designed primarily for batch jobs, which makes it challenging to use for long-running workloads such as real-time services. Moreover, YARN’s interfaces are too low-level for rapid application development, and even a simple application requires a great deal of boilerplate code, posing a high ramp-up cost that turns developers away.

    Weave is designed to greatly improve this situation. It provides a set of libraries that makes writing distributed applications easy through an abstraction layer…

    19 votes
  • In Hadoop MapReduce, Sort is like the Force

    Syncsort has been the performance leader in data sorting for 40 years. In Hadoop, sort is like “The Force”, binding everything together. This session focuses on the importance of Sort within the MapReduce data flow. Syncsort’s contribution to the Apache Hadoop project, MAPREDUCE-2454, introduced a new feature that allows alternative implementations of the Sort phase within the MapReduce data flow. The patch allows an alternative sort implementation to be invoked for both the Map-side and Reduce-side Sort in place of the native Hadoop sort; in other words, it makes the Sort phase pluggable. The patch goes far beyond just making the MapReduce Sort…

    94 votes
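    The idea of a pluggable sort phase can be sketched in miniature: the framework invokes whatever sort implementation was registered rather than hard-coding one. This is a plain-Python illustration with made-up names, not Hadoop's actual task internals or the patch's real interfaces:

    ```python
    # Minimal sketch of a "pluggable sort" seam: the framework calls whatever
    # sort implementation was registered, instead of hard-coding one.
    # All names here are illustrative, not Hadoop's actual classes.

    def native_sort(records):
        """Stand-in for the built-in sort of map output by key."""
        return sorted(records, key=lambda kv: kv[0])

    def alternative_sort(records):
        """Stand-in for a vendor-supplied sort plugin (same contract)."""
        # A real plugin might use an external or specialized technique;
        # here we just delegate to sorted() to keep the sketch runnable.
        return sorted(records, key=lambda kv: kv[0])

    class MiniMapReduce:
        """Toy framework whose sort phase is a swappable component."""

        def __init__(self, sort_impl=native_sort):
            self.sort_impl = sort_impl  # the pluggable piece

        def run(self, pairs, reduce_fn):
            shuffled = self.sort_impl(pairs)  # sort/shuffle phase
            # Group consecutive records with the same key and reduce them.
            out, current_key, values = {}, None, []
            for key, value in shuffled:
                if key != current_key:
                    if current_key is not None:
                        out[current_key] = reduce_fn(values)
                    current_key, values = key, []
                values.append(value)
            if current_key is not None:
                out[current_key] = reduce_fn(values)
            return out

    pairs = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
    print(MiniMapReduce(native_sort).run(pairs, sum))       # {'a': 5, 'b': 5}
    print(MiniMapReduce(alternative_sort).run(pairs, sum))  # same result
    ```

    Either implementation satisfies the same contract (records sorted by key), so the framework's grouping logic is unchanged when the plugin is swapped.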
  • Keep Calm and Contribute to Hadoop

    For projects like Hadoop to evolve, it’s critical that more commercial companies and individuals contribute. Syncsort is a unique company that has evolved from a commercial mainframe sort company, to doing Smarter Big Data ETL on Hadoop, and now to being an Apache open source contributor. Using a real-life case study, hear about one organization’s journey and how it benefited both the employees and, more importantly, the users of the Apache Hadoop project. We will share the key lessons we learned on WHAT TO DO and what NOT to do when contributing to an open source project, and the best way YOU can…

    20 votes
  • MySQL and Hadoop: HowTo

    Maintaining a multi-petabyte data warehouse is always a challenge. It is even harder if you need to run close-to-real-time reports. However, we can use external indexes in MySQL for our reports and connect MySQL to Hadoop with the new NoSQL/Hadoop connector to receive close-to-real-time data from MySQL. In this talk we will show how to do it.

    1 vote
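    The general pattern behind such close-to-real-time reporting - bulk aggregates computed periodically on Hadoop, with the freshest rows pulled from MySQL to close the gap - can be sketched in plain Python. All data, names, and numbers below are made up for illustration; the talk's actual connector is not shown:

    ```python
    # Toy sketch of a near-real-time report: overlay fresh deltas still
    # sitting in MySQL on top of a stale batch aggregate from Hadoop.
    # Both "stores" are plain dicts/lists here; data is illustrative.

    batch_totals = {"clicks": 1_000_000, "signups": 5_000}           # last Hadoop run
    recent_rows = [("clicks", 120), ("signups", 3), ("clicks", 80)]  # fresh from MySQL

    def near_realtime_report(batch, deltas):
        """Combine batch totals with per-metric deltas into one report."""
        report = dict(batch)  # start from the (stale) batch numbers
        for metric, amount in deltas:
            report[metric] = report.get(metric, 0) + amount
        return report

    print(near_realtime_report(batch_totals, recent_rows))
    # {'clicks': 1000200, 'signups': 5003}
    ```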
  • Hadoop's Gaping Security Hole

    Security is one of the primary roadblocks to broader enterprise adoption of Hadoop and other big data technologies. Unlike traditional databases and data stores, Hadoop clusters present a unique architectural challenge that current security technologies weren't built to handle. Popular Hadoop distributions have not made security a priority, and this has consequences for the enterprise. Yes, Hadoop’s core specifications are still being developed by the Apache community, but they have yet to adequately address enterprise requirements for robust security, policy enforcement, and compliance. Hadoop may have its challenges, but its distributed approach to data is the future of enterprise computing.…

    31 votes
  • Hive Correlation Optimizer

    Hive translates a SQL query into multiple MapReduce jobs based on the dependencies among relational operations and on operations that need to re-distribute (shuffle) data. For a complex SQL query, it is not uncommon to find correlations among the generated MapReduce jobs. Two representative correlations are (1) multiple MapReduce jobs sharing common input tables and (2) multiple MapReduce jobs shuffling the data in the same way. Without leveraging these two correlations, the MapReduce jobs generated by Hive can do unnecessary work on data loading and data shuffling, which results in long processing times for complex queries. In this talk, I will introduce…

    48 votes
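    The payoff of exploiting those correlations can be shown in miniature: two aggregations that scan the same input and group by the same key can be evaluated in one combined pass instead of two separate jobs. This is a plain-Python stand-in with made-up data; Hive's actual optimizer rewrites its MapReduce plan, not Python loops:

    ```python
    # Toy illustration of correlation optimization: two aggregates over the
    # same table, grouped by the same key, computed in one pass instead of two.

    orders = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("bob", 7.5)]

    def two_separate_jobs(table):
        """Unoptimized plan: each aggregate scans and groups the table itself."""
        totals, counts = {}, {}
        for user, amount in table:                 # scan #1
            totals[user] = totals.get(user, 0.0) + amount
        for user, _ in table:                      # scan #2 over the same input
            counts[user] = counts.get(user, 0) + 1
        return totals, counts

    def one_merged_job(table):
        """Optimized plan: both aggregates share one scan and one grouping."""
        merged = {}
        for user, amount in table:                 # single shared scan
            total, count = merged.get(user, (0.0, 0))
            merged[user] = (total + amount, count + 1)
        totals = {u: tc[0] for u, tc in merged.items()}
        counts = {u: tc[1] for u, tc in merged.items()}
        return totals, counts

    assert two_separate_jobs(orders) == one_merged_job(orders)
    print(one_merged_job(orders))
    # ({'alice': 12.5, 'bob': 12.5}, {'alice': 2, 'bob': 2})
    ```

    Both plans produce identical results; the merged one simply reads and re-distributes the data once, which is the saving the optimizer is after.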
  • Merging Hadoop clusters with isolation

    Although job duration in many Hadoop clusters is limited by the available hardware, Hadoop clusters are typically not 100% utilized across all hardware resources. This is because jobs typically are not running 24/7, and the jobs that do run typically don't consume 100% of CPU, memory, and storage bandwidth simultaneously. Users with multiple Hadoop clusters could dramatically reduce job duration by combining the separate underutilized clusters into a single big cluster. Typically users choose not to do that for several reasons: no guarantee their jobs would get a fair share of the resources, the need to have the same version and configuration for all…

    1 vote
  • Case Study: How a Just Do It Attitude Illuminates Multi-Million Dollar Opportunities

    It's easy to attend conferences and talk about implementations, but plans often get derailed by perceived difficulties. Here's how a very small, very dedicated team went from a small group of servers that wouldn't boot to reporting on big opportunities at eBay in about a month!

    1 vote
  • Case Study: How the Big Data game is played at WildTangent Games

    You’ve installed Hadoop, but vast amounts of data still reside in data warehouses and traditional sources. Learn how big data can be integrated ‘on the fly’, both in the data center and in the cloud, with other data sources and delivered to the desktop without requiring skills in MapReduce, Hive, Pig, or SQL. This elegant approach is easy to implement and works with existing data visualization tools on users’ desktops.

    1 vote
  • Hadoop is here: Now what? Managing Enterprise Data in the evolving Big Data landscape

    Creating a data management strategy that can handle different data assets across different platforms (relational, unstructured, multidimensional) and systems (OLTP, OLAP) can be a real challenge when leveraging and maintaining data as it flows through the organization. In this talk I will present the top 10 steps for building a great enterprise data management system that spans different data platforms.

    1 vote
  • Bringing the Value of Hadoop to the Enterprise

    Hadoop is an innovative new technology causing CIOs to rethink their data architecture, making this an exciting time to be a “big data” technologist. However, one of the key challenges in any technology's evolution is to move from the lab to the data center and create value for the masses in an organization. How do we better integrate the tools in the Hadoop and big data ecosystem to create seamless usability and value for businesses? What are the business problems and use cases that leading businesses are trying to solve? This session will discuss the speaker's views on a unified…

    1 vote
  • Leveraging your Hadoop cluster better - running performant code at scale

    Somebody once said that Hadoop is a way of running highly unperformant code at scale. I will show how we changed that and made our MapReduce jobs twice as fast. I will show how to analyze jobs at scale and optimize the job itself, instead of just tinkering with Hadoop options. The result is a much better utilized cluster and jobs that run in a fraction of the original time – running performant code at scale!

    10 votes
  • HVE: Hadoop Virtualization Extensions

    Hadoop is a key piece of infrastructure in the big data ecosystem. In this session, we will introduce the flexible deployment options for Hadoop provided by the cloud and go through the reliability, elasticity, and performance semantics of each option. Then, we will present our work to enhance reliability, elasticity, efficiency, and QoS for Hadoop running in cloud and virtualization environments, including virtualization-aware topology, virtualization-aware resource scheduling, application/infrastructure-aware QoS, etc.

    1 vote