Category: Json ETL

  • Complex Data Integration: Best Practices

    With more than 25 years of experience as a Data Architect at both Fortune 500 companies and startups, I’ve seen many of the pain points of complex data integration. Let’s list the typical issues facing data integration. (Note: This article is a work in progress.) Poorly designed data models (this deserves 5 bullet points). Large and/or […]

  • ETL SQL into Elasticsearch

    Low-latency, complex data model SQL data synchronization with Elasticsearch. Elasticsearch (ES) is a NoSQL database optimized for the fuzzy logic of text search. SQL databases tend to struggle with text search. Such searches are CPU- and IO-intensive, and SQL databases are not as flexible with the search logic. So there is a strong need for a SQL […]

  • ETL NoSQL Best Practices

    Mongo, Redis, Elasticsearch or Couch. Legacy ETL tools are problematic for NoSQL databases. They assume tabular data and are not well suited to the hierarchical JSON data typically used in NoSQL databases. They also assume batch processing and JDBC-style connections. ETL for NoSQL database use cases likely requires a RESTful interface and streaming for low […]

  • Use Case: Automatic ETL schema generation

    Semi-structured source data with automatic ETL schema generation at the SQL destination. Problem: A Software as a Service (SaaS) company stores OLTP data in a mixed format of relational and semi-structured data. This multi-tenant data consists of workflow, form data and documents. The company must export the data for its clients in tabular form by pivoting […]

  • ETL Data Lake vs Data Warehouse

    Rules-Based/Metadata ETL Has Lowered the LOE of ETL Typically Associated with a Data Warehouse. METL fundamentally changes the cost calculus between a Data Lake and the IDW. A Data Lake is loosely integrated data, typically placed in Hadoop. An IDW has tightly integrated data stored in a relational database and/or […]