Category: Hadoop ETL

  • ETL NoSQL Best Practices

    Mongo, Redis, Elasticsearch or Couch: Legacy ETL tools are problematic for NoSQL databases. They assume tabular data and are poorly suited to the hierarchical JSON data typically used in NoSQL databases. They also assume batch processing and JDBC-style connections. ETL for NoSQL database use cases likely requires a RESTful interface and streaming for low…

  • Use Case: Automatic ETL schema generation

    Semi-structured source data with automatic ETL schema generation at the SQL destination. Problem: A Software as a Service (SaaS) company stores OLTP data in a mixed format of relational and semi-structured data. This multi-tenant data consists of workflow, form data and documents. The company must export the data for its clients in tabular form by pivoting the…

  • ETL Data Lake vs Data Warehouse

    Rules-Based/Metadata ETL Has Lowered the Level of Effort (LOE) of ETL Typically Associated with a Data Warehouse. Metadata ETL (METL) fundamentally changes the cost calculus between a Data Lake and an integrated data warehouse (IDW). A Data Lake is loosely integrated data typically placed in Hadoop. An IDW has tightly integrated data stored in either a relational database and/or…