Leapfire Solutions, LLC • Sunnyvale, California
Satisfying Join Queries in Elasticsearch for many use cases

Leapfire’s PreJoin Solution is an innovative join-compare solution that works well for many use cases. It needs no changes to Elasticsearch, no extra plugins, and uses the existing Elasticsearch Query DSL. Do Relational-style SQL Joins in Elasticsearch Fast.

Denormalizing data indexed into Elasticsearch does not cost much extra disk space due to Elasticsearch's Term Dictionary and compression. Denormalization is only infeasible if it leads to an explosion of document updates, as described below. PreJoin avoids excessive updates while satisfying queries that compare values from joined tables/indices. It is particularly effective for periodic-job use cases such as replacing an Oracle reporting database.

Proof of Concept: Norton Communications Platform

Use Case
  • Send Email to selected users
  • For example, send alert to all Norton 360 users running Windows
  • Periodic Jobs to select users and identify them by product serial number
  • 59 million results from 100 million documents
Reporting Database
  • Reporting queries on the fully-normalized database would take days to execute.
  • Maintain a second copy of the database in denormalized form.
  • The reporting copy of the database is written back into Oracle.

Why investigate a Distributed Database alternative?

  • Cost-Effectively scale horizontally across stock hardware
  • Run wider range of ad-hoc queries and metrics in near-real-time
  • Enable fast-to-market queries. Avoid writing SQL schema and program code to add new Summary Tables for new queries.

The need for Join

  • The Seats table comprises over 10 tables denormalized into 90 columns.
  • The Features table defines the product features (e.g., platform type).
  • It is not feasible to combine the Seats and Features tables.
    • The Seats table contains 100 million rows.
    • The Features table contains about 116,000 rows.
    • If combined, updating a Feature field would cause hundreds of thousands of rows to be updated.
  • Elasticsearch Scroll can do join/compare combinations very efficiently.

  SELECT S.PRODUCT_SERIAL FROM
      SEATS S
      JOIN FEATURES F ON F.GVNG_SKU = S.SKU 
      WHERE
          S.ENABLED = 1
          AND S.END_DATE > SYSDATE
          AND F.PLATFORM = -1
	  
Leapfire’s PreJoin Solution

  • An innovative join-compare solution that works well for many use cases. 
  • No changes to Elasticsearch. No extra plugins. Uses existing Elasticsearch Query DSL.
  • Write a simple, configurable scroll program to generate reusable pre-join documents.
  • Write a scroll program to satisfy reporting queries.


Performance Results

  • We tested with different data set sizes and results count.
  • Elasticsearch cluster of 3 dedicated master nodes and 3 data nodes. AWS r4.xlarge: 4vCPU + 30.5 GiB memory
  • The first three groups show evenly distributed data yielding about 5% of the documents as results.
  • The last group shows the skewed data that is present in the Norton reporting database yielding nearly 60% results.
  • The error rows show the current limits of the Java REST API.
  • After query satisfaction, it takes about 20-90 seconds per million to download the results to the client program.

Learn the PreJoin Solution!

  • Leapfire can provide training and mentoring so that your software engineers can effectively implement the PreJoin Solution.
  • Offered as a package of 10 consulting hours.
    • Start with a 2-3 hour working session.
    • One hour working sessions on subsequent days to allow engineers to make progress between sessions.
    • On Premise or Video Conference.
  • Sole Proprietorship: Service is provided by the principal, Geena Rollins.

Findings

  • Elasticsearch can:
    • Take database updates and index documents in near-real-time.
    • Enable fast time-to-market for introducing new ad-hoc queries.
  • Elasticsearch Scroll can:
    • Do join-compare combinations using the PreJoin solution.
    • Satisfy ad-hoc queries very quickly.
    • Scale up to millions of results and scale down to thousands of results.
    • Require some results-processing by applications.
  • Elastic can:
    • Provide Training, Support, X-Pack Extensions, and Production SLAs.
  • Symantec can:
    • Protect the world’s data for consumers and enterprises.
  • Leapfire can:
    • Accelerate your Elasticsearch Development; Do Elasticsearch Fast.

         

Elastic asks

“How did Geena become so knowledgeable of Elasticsearch?”

Elastic Training Classes that I have taken

▸ Core Elasticsearch
▸ Core Elasticsearch: Operations
▸ From 5 to 1000 Nodes
▸ Core Elasticsearch: Developer
▸ Advanced Elasticsearch: Data Modeling

“And another one bites the dust. Replication has saved my butt… Because, I’ve got an Elastic Heart.”