{"componentChunkName":"component---src-templates-blog-post-js","path":"/post/opensource-data-lake-for-the-hybrid-cloud","result":{"data":{"headerImage":{"childImageSharp":{"fluid":{"aspectRatio":3.3992537313432836,"src":"/static/b72d38f0a9a131a445c0798c8f11b233/85c19/blog-post-intro.png","srcSet":"/static/b72d38f0a9a131a445c0798c8f11b233/c95ef/blog-post-intro.png 911w,\n/static/b72d38f0a9a131a445c0798c8f11b233/6d938/blog-post-intro.png 1822w,\n/static/b72d38f0a9a131a445c0798c8f11b233/85c19/blog-post-intro.png 3635w","srcWebp":"/static/b72d38f0a9a131a445c0798c8f11b233/bbedc/blog-post-intro.webp","srcSetWebp":"/static/b72d38f0a9a131a445c0798c8f11b233/8f106/blog-post-intro.webp 911w,\n/static/b72d38f0a9a131a445c0798c8f11b233/4b1a2/blog-post-intro.webp 1822w,\n/static/b72d38f0a9a131a445c0798c8f11b233/bbedc/blog-post-intro.webp 3635w","sizes":"(max-width: 3635px) 100vw, 3635px"}}},"relatedPosts":{"nodes":[{"fields":{"slug":"/blog-aws-kubernetes/"},"frontmatter":{"url":"aws-kubernetes/part-1","title":"The State of Kubernetes in AWS: Persistent Data Storage, Application Engineering and More","description":"When it comes to orchestrating containerized workloads, there are several options in the market, with [Kubernetes](https://kubernetes.io) being the most adopted and sought-after solution.","tags":["AWS","Kubernetes"],"date":"2022-12-20T16:44:23.317Z","image":{"childImageSharp":{"fluid":{"aspectRatio":1.5,"src":"/static/eb8228db77951dd583fd607fb3b3d3bd/836e2/kubernetes-and-aws.jpg","srcSet":"/static/eb8228db77951dd583fd607fb3b3d3bd/6e81a/kubernetes-and-aws.jpg 120w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/fbe0e/kubernetes-and-aws.jpg 240w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/836e2/kubernetes-and-aws.jpg 480w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/94285/kubernetes-and-aws.jpg 720w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/b1cc5/kubernetes-and-aws.jpg 960w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/097fa/kubernetes-and-aws.jpg 
1920w","srcWebp":"/static/eb8228db77951dd583fd607fb3b3d3bd/35871/kubernetes-and-aws.webp","srcSetWebp":"/static/eb8228db77951dd583fd607fb3b3d3bd/83552/kubernetes-and-aws.webp 120w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/2b5a3/kubernetes-and-aws.webp 240w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/35871/kubernetes-and-aws.webp 480w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/9754a/kubernetes-and-aws.webp 720w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/fcc10/kubernetes-and-aws.webp 960w,\n/static/eb8228db77951dd583fd607fb3b3d3bd/30cf3/kubernetes-and-aws.webp 1920w","sizes":"(max-width: 480px) 100vw, 480px"}}}}},{"fields":{"slug":"/kubernetes-node-management/"},"frontmatter":{"url":"karpenter","title":"Karpenter - A New Way to Manage Kubernetes Node Groups","description":"One of the most common discussions that happen when adopting Kubernetes is around autoscaling. You can autoscale your workloads horizontally or vertically, but the main challenge has always been the nodes.\n","tags":["Kubernetes","AWS"],"date":"2022-01-20T00:00:00.000Z","image":{"childImageSharp":{"fluid":{"aspectRatio":1.9047619047619047,"src":"/static/e0d4e328e64d982af16b722b7165263b/b460a/aws-karpenter.png","srcSet":"/static/e0d4e328e64d982af16b722b7165263b/d966b/aws-karpenter.png 120w,\n/static/e0d4e328e64d982af16b722b7165263b/67196/aws-karpenter.png 240w,\n/static/e0d4e328e64d982af16b722b7165263b/b460a/aws-karpenter.png 480w,\n/static/e0d4e328e64d982af16b722b7165263b/9a8d7/aws-karpenter.png 720w,\n/static/e0d4e328e64d982af16b722b7165263b/6e898/aws-karpenter.png 960w,\n/static/e0d4e328e64d982af16b722b7165263b/6050d/aws-karpenter.png 1200w","srcWebp":"/static/e0d4e328e64d982af16b722b7165263b/35871/aws-karpenter.webp","srcSetWebp":"/static/e0d4e328e64d982af16b722b7165263b/83552/aws-karpenter.webp 120w,\n/static/e0d4e328e64d982af16b722b7165263b/2b5a3/aws-karpenter.webp 240w,\n/static/e0d4e328e64d982af16b722b7165263b/35871/aws-karpenter.webp 
480w,\n/static/e0d4e328e64d982af16b722b7165263b/9754a/aws-karpenter.webp 720w,\n/static/e0d4e328e64d982af16b722b7165263b/fcc10/aws-karpenter.webp 960w,\n/static/e0d4e328e64d982af16b722b7165263b/9000d/aws-karpenter.webp 1200w","sizes":"(max-width: 480px) 100vw, 480px"}}}}},{"fields":{"slug":"/aws-kubernetes-part-2/"},"frontmatter":{"url":"aws-kubernetes/part-2","title":"The Current State of Kubernetes on AWS: Kubernetes Security, Scalability, Performance Engineering & More, Part 2","description":"In the first part of our two-part post on the current state of Kubernetes in AWS, we discussed how Kubernetes can help you handle stateful workloads with persistent data storage and standardize your application and data engineering approaches.","tags":["AWS","Kubernetes"],"date":"2021-12-09T08:30:41.061Z","image":{"childImageSharp":{"fluid":{"aspectRatio":1.5,"src":"/static/dddeb31efb8e1c04a57b32e10aa14653/836e2/kubernetes-security.jpg","srcSet":"/static/dddeb31efb8e1c04a57b32e10aa14653/6e81a/kubernetes-security.jpg 120w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/fbe0e/kubernetes-security.jpg 240w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/836e2/kubernetes-security.jpg 480w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/94285/kubernetes-security.jpg 720w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/b1cc5/kubernetes-security.jpg 960w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/097fa/kubernetes-security.jpg 1920w","srcWebp":"/static/dddeb31efb8e1c04a57b32e10aa14653/35871/kubernetes-security.webp","srcSetWebp":"/static/dddeb31efb8e1c04a57b32e10aa14653/83552/kubernetes-security.webp 120w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/2b5a3/kubernetes-security.webp 240w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/35871/kubernetes-security.webp 480w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/9754a/kubernetes-security.webp 720w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/fcc10/kubernetes-security.webp 960w,\n/static/dddeb31efb8e1c04a57b32e10aa14653/30cf3/kubernetes-security.webp 
1920w","sizes":"(max-width: 480px) 100vw, 480px"}}}}},{"fields":{"slug":"/gitops-why-is-it-relevant-now/"},"frontmatter":{"url":"gitops-why-is-it-relevant-now","title":"GitOps - Why is it Relevant Now?","description":"There seems to have been a lot of talk about GitOps just recently. This impression is certainly reinforced by the sessions and booths during KubeCon San Diego late 2019. Regardless of the discipline or services, GitOps was the keyword that was constantly repeated.","tags":["Kubernetes"],"date":"2020-01-21T17:00:00.000Z","image":{"childImageSharp":{"fluid":{"aspectRatio":1.3333333333333333,"src":"/static/602b397bd0ef200acbf6007f11c2f3f5/836e2/shutterstock_1019460151-1-.jpg","srcSet":"/static/602b397bd0ef200acbf6007f11c2f3f5/6e81a/shutterstock_1019460151-1-.jpg 120w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/fbe0e/shutterstock_1019460151-1-.jpg 240w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/836e2/shutterstock_1019460151-1-.jpg 480w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/94285/shutterstock_1019460151-1-.jpg 720w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/b1cc5/shutterstock_1019460151-1-.jpg 960w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/405f0/shutterstock_1019460151-1-.jpg 4856w","srcWebp":"/static/602b397bd0ef200acbf6007f11c2f3f5/35871/shutterstock_1019460151-1-.webp","srcSetWebp":"/static/602b397bd0ef200acbf6007f11c2f3f5/83552/shutterstock_1019460151-1-.webp 120w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/2b5a3/shutterstock_1019460151-1-.webp 240w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/35871/shutterstock_1019460151-1-.webp 480w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/9754a/shutterstock_1019460151-1-.webp 720w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/fcc10/shutterstock_1019460151-1-.webp 960w,\n/static/602b397bd0ef200acbf6007f11c2f3f5/cdeed/shutterstock_1019460151-1-.webp 4856w","sizes":"(max-width: 480px) 100vw, 
480px"}}}}},{"fields":{"slug":"/setting-up-a-multi-tenant-aws-eks-cluster/"},"frontmatter":{"url":"setting-up-a-multi-tenant-aws-eks-cluster","title":"Setting up a Multi-tenant Amazon EKS cluster: a few things to consider","description":"MyOps prides itself in heavy use of cloud-native technology, and Kubernetes is often the primary platform of choice to run containerized workloads. In this blog we discuss using name space, network policies, Integrating AWS IAM to EKS cluster/workloads, isolation techniques and much more.","tags":["Kubernetes","AWS"],"date":"2019-12-12T17:00:00.000Z","image":{"childImageSharp":{"fluid":{"aspectRatio":1.7647058823529411,"src":"/static/242e9209b664bee2a7dc6b090d3a07e1/836e2/setting-up-multi-tenant-aws-eks-cluster.jpg","srcSet":"/static/242e9209b664bee2a7dc6b090d3a07e1/6e81a/setting-up-multi-tenant-aws-eks-cluster.jpg 120w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/fbe0e/setting-up-multi-tenant-aws-eks-cluster.jpg 240w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/836e2/setting-up-multi-tenant-aws-eks-cluster.jpg 480w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/94285/setting-up-multi-tenant-aws-eks-cluster.jpg 720w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/b1cc5/setting-up-multi-tenant-aws-eks-cluster.jpg 960w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/e147c/setting-up-multi-tenant-aws-eks-cluster.jpg 5760w","srcWebp":"/static/242e9209b664bee2a7dc6b090d3a07e1/35871/setting-up-multi-tenant-aws-eks-cluster.webp","srcSetWebp":"/static/242e9209b664bee2a7dc6b090d3a07e1/83552/setting-up-multi-tenant-aws-eks-cluster.webp 120w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/2b5a3/setting-up-multi-tenant-aws-eks-cluster.webp 240w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/35871/setting-up-multi-tenant-aws-eks-cluster.webp 480w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/9754a/setting-up-multi-tenant-aws-eks-cluster.webp 720w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/fcc10/setting-up-multi-tenant-aws-eks-cluster.webp 
960w,\n/static/242e9209b664bee2a7dc6b090d3a07e1/b4d70/setting-up-multi-tenant-aws-eks-cluster.webp 5760w","sizes":"(max-width: 480px) 100vw, 480px"}}}}},{"fields":{"slug":"/walkthrough-ecs-local/"},"frontmatter":{"url":"walkthrough-ecs-local","title":"Walkthrough - ECS Local: Bringing ECS to your local environment","description":"As someone who works with AWS on a day-to-day basis, It's important to stay up to date with all the changes and new features of the different services on the platform. That's how one recent announcement caught my eye - The new capability of local testing of ECS.","tags":["Kubernetes","AWS"],"date":"2019-09-17T16:00:00.000Z","image":{"childImageSharp":{"fluid":{"aspectRatio":2.142857142857143,"src":"/static/12224681f2fd40bf0749423e29cf8d0c/836e2/technology-education-information-handover.jpg","srcSet":"/static/12224681f2fd40bf0749423e29cf8d0c/6e81a/technology-education-information-handover.jpg 120w,\n/static/12224681f2fd40bf0749423e29cf8d0c/fbe0e/technology-education-information-handover.jpg 240w,\n/static/12224681f2fd40bf0749423e29cf8d0c/836e2/technology-education-information-handover.jpg 480w,\n/static/12224681f2fd40bf0749423e29cf8d0c/94285/technology-education-information-handover.jpg 720w,\n/static/12224681f2fd40bf0749423e29cf8d0c/b1cc5/technology-education-information-handover.jpg 960w,\n/static/12224681f2fd40bf0749423e29cf8d0c/0ff54/technology-education-information-handover.jpg 1200w","srcWebp":"/static/12224681f2fd40bf0749423e29cf8d0c/35871/technology-education-information-handover.webp","srcSetWebp":"/static/12224681f2fd40bf0749423e29cf8d0c/83552/technology-education-information-handover.webp 120w,\n/static/12224681f2fd40bf0749423e29cf8d0c/2b5a3/technology-education-information-handover.webp 240w,\n/static/12224681f2fd40bf0749423e29cf8d0c/35871/technology-education-information-handover.webp 480w,\n/static/12224681f2fd40bf0749423e29cf8d0c/9754a/technology-education-information-handover.webp 
720w,\n/static/12224681f2fd40bf0749423e29cf8d0c/fcc10/technology-education-information-handover.webp 960w,\n/static/12224681f2fd40bf0749423e29cf8d0c/9000d/technology-education-information-handover.webp 1200w","sizes":"(max-width: 480px) 100vw, 480px"}}}}},{"fields":{"slug":"/opensource-data-lakes-for-the-hybrid-cloud-designing-an-oss-datalake/"},"frontmatter":{"url":"opensource-data-lakes-for-the-hybrid-cloud-designing-an-oss-datalake","title":"OpenSource Data Lake for the Hybrid Cloud - Part 2: Designing an OSS DataLake","description":"In part 1 of this series, we answered the question of WHY Open Source components are often an attractive option when building a data lake of any significant size. In this second installment, we describe HOW to cost-effectively build a data lake out of Open Source components.","tags":["Kubernetes","Big Data"],"date":"2019-08-27T16:00:00.000Z","image":{"childImageSharp":{"fluid":{"aspectRatio":1.6,"src":"/static/107087aec2d3327919bcfb2ab38201da/836e2/datalake-p2.jpg","srcSet":"/static/107087aec2d3327919bcfb2ab38201da/6e81a/datalake-p2.jpg 120w,\n/static/107087aec2d3327919bcfb2ab38201da/fbe0e/datalake-p2.jpg 240w,\n/static/107087aec2d3327919bcfb2ab38201da/836e2/datalake-p2.jpg 480w,\n/static/107087aec2d3327919bcfb2ab38201da/94285/datalake-p2.jpg 720w,\n/static/107087aec2d3327919bcfb2ab38201da/b1cc5/datalake-p2.jpg 960w,\n/static/107087aec2d3327919bcfb2ab38201da/32638/datalake-p2.jpg 6399w","srcWebp":"/static/107087aec2d3327919bcfb2ab38201da/35871/datalake-p2.webp","srcSetWebp":"/static/107087aec2d3327919bcfb2ab38201da/83552/datalake-p2.webp 120w,\n/static/107087aec2d3327919bcfb2ab38201da/2b5a3/datalake-p2.webp 240w,\n/static/107087aec2d3327919bcfb2ab38201da/35871/datalake-p2.webp 480w,\n/static/107087aec2d3327919bcfb2ab38201da/9754a/datalake-p2.webp 720w,\n/static/107087aec2d3327919bcfb2ab38201da/fcc10/datalake-p2.webp 960w,\n/static/107087aec2d3327919bcfb2ab38201da/85285/datalake-p2.webp 6399w","sizes":"(max-width: 480px) 100vw, 
480px"}}}}},{"fields":{"slug":"/opensource-data-lake-for-the-hybrid-cloud/"},"frontmatter":{"url":"opensource-data-lake-for-the-hybrid-cloud","title":"OpenSource Data Lake for the Hybrid Cloud - Part 1","description":"Data lakes have become the de-facto standard for Enterprises and Corporations looking to take advantage of their existing Data.\n","tags":["Kubernetes","Big Data"],"date":"2019-06-17T16:00:00.000Z","image":{"childImageSharp":{"fluid":{"aspectRatio":1.5,"src":"/static/8640602d41c9ebdbd88a4281c37bcae9/836e2/myops-data-lake-blog-profile-1-.jpg","srcSet":"/static/8640602d41c9ebdbd88a4281c37bcae9/6e81a/myops-data-lake-blog-profile-1-.jpg 120w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/fbe0e/myops-data-lake-blog-profile-1-.jpg 240w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/836e2/myops-data-lake-blog-profile-1-.jpg 480w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/94285/myops-data-lake-blog-profile-1-.jpg 720w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/b1cc5/myops-data-lake-blog-profile-1-.jpg 960w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/724c8/myops-data-lake-blog-profile-1-.jpg 1000w","srcWebp":"/static/8640602d41c9ebdbd88a4281c37bcae9/35871/myops-data-lake-blog-profile-1-.webp","srcSetWebp":"/static/8640602d41c9ebdbd88a4281c37bcae9/83552/myops-data-lake-blog-profile-1-.webp 120w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/2b5a3/myops-data-lake-blog-profile-1-.webp 240w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/35871/myops-data-lake-blog-profile-1-.webp 480w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/9754a/myops-data-lake-blog-profile-1-.webp 720w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/fcc10/myops-data-lake-blog-profile-1-.webp 960w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/36ebb/myops-data-lake-blog-profile-1-.webp 1000w","sizes":"(max-width: 480px) 100vw, 480px"}}}}},{"fields":{"slug":"/top-10-misconceptions-around-migrating-hadoop/"},"frontmatter":{"url":"top-10-misconceptions-around-migrating-hadoop","title":"Top 10 Misconceptions around 
Migrating Hadoop to the Cloud","description":"Lots of mid-size companies and Enterprises want to leverage the Cloud for their Data Processing requirements. But in reality migrating a production, Petabyte scale, multi-component Data Processing pipeline from on-prem to the Cloud can be a nightmare.","tags":["Big Data"],"date":"2018-11-26T17:00:00.000Z","image":{"childImageSharp":{"fluid":{"aspectRatio":1.5,"src":"/static/d5db71f736f36c26e2d3007f65b0dd52/836e2/cloud-elephant.jpg","srcSet":"/static/d5db71f736f36c26e2d3007f65b0dd52/6e81a/cloud-elephant.jpg 120w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/fbe0e/cloud-elephant.jpg 240w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/836e2/cloud-elephant.jpg 480w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/94285/cloud-elephant.jpg 720w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/b1cc5/cloud-elephant.jpg 960w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/0ff54/cloud-elephant.jpg 1200w","srcWebp":"/static/d5db71f736f36c26e2d3007f65b0dd52/35871/cloud-elephant.webp","srcSetWebp":"/static/d5db71f736f36c26e2d3007f65b0dd52/83552/cloud-elephant.webp 120w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/2b5a3/cloud-elephant.webp 240w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/35871/cloud-elephant.webp 480w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/9754a/cloud-elephant.webp 720w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/fcc10/cloud-elephant.webp 960w,\n/static/d5db71f736f36c26e2d3007f65b0dd52/9000d/cloud-elephant.webp 1200w","sizes":"(max-width: 480px) 100vw, 480px"}}}}},{"fields":{"slug":"/securing-kubernetes-secrets-how-to-efficiently-secure-access-to-etcd-and-protect-your-secrets/"},"frontmatter":{"url":"securing-kubernetes-secrets-how-to-efficiently-secure-access-to-etcd-and-protect-your-secrets","title":"Securing Kubernetes secrets: How to efficiently secure access to etcd and protect your secrets","description":"Etcd is a distributed, consistent and highly-available key value store used as the Kubernetes backing store for all cluster data, 
making it a core component of every K8s deployment. Due to its central role etcd may contain sensitive information related to access of the deployed services and their associated components,","tags":["Kubernetes","Security"],"date":"2018-06-20T16:00:00.000Z","image":{"childImageSharp":{"fluid":{"aspectRatio":0.7407407407407407,"src":"/static/62bd016a89ce5970467a24df70a52cf0/836e2/close-up-door-golden-67537.jpg","srcSet":"/static/62bd016a89ce5970467a24df70a52cf0/6e81a/close-up-door-golden-67537.jpg 120w,\n/static/62bd016a89ce5970467a24df70a52cf0/fbe0e/close-up-door-golden-67537.jpg 240w,\n/static/62bd016a89ce5970467a24df70a52cf0/836e2/close-up-door-golden-67537.jpg 480w,\n/static/62bd016a89ce5970467a24df70a52cf0/94285/close-up-door-golden-67537.jpg 720w,\n/static/62bd016a89ce5970467a24df70a52cf0/b1cc5/close-up-door-golden-67537.jpg 960w,\n/static/62bd016a89ce5970467a24df70a52cf0/fb46d/close-up-door-golden-67537.jpg 2820w","srcWebp":"/static/62bd016a89ce5970467a24df70a52cf0/35871/close-up-door-golden-67537.webp","srcSetWebp":"/static/62bd016a89ce5970467a24df70a52cf0/83552/close-up-door-golden-67537.webp 120w,\n/static/62bd016a89ce5970467a24df70a52cf0/2b5a3/close-up-door-golden-67537.webp 240w,\n/static/62bd016a89ce5970467a24df70a52cf0/35871/close-up-door-golden-67537.webp 480w,\n/static/62bd016a89ce5970467a24df70a52cf0/9754a/close-up-door-golden-67537.webp 720w,\n/static/62bd016a89ce5970467a24df70a52cf0/fcc10/close-up-door-golden-67537.webp 960w,\n/static/62bd016a89ce5970467a24df70a52cf0/d0805/close-up-door-golden-67537.webp 2820w","sizes":"(max-width: 480px) 100vw, 480px"}}}}}]},"socials":{"frontmatter":{"socials":{"linkedin":"https://www.linkedin.com/company/myops-yael","github":"https://github.com/opsguru-israel"}}},"markdownRemark":{"html":"<p>Data lakes have become the de-facto standard for Enterprises and Corporations looking to harness value and take advantage of their existing data. 
Ultimately, businesses do not care whether their workloads run on a public, private or hybrid cloud. They just want to make sure they do not miss out on the opportunities their data offers.</p>\n<p>In our experience, the public cloud is by far the cheapest option for deploying most data lake solutions.</p>\n<p>The public cloud offers:</p>\n<ul>\n<li>Independent scaling of storage and compute capacity (no need to run servers when they're not needed)</li>\n<li>Rapid innovation cycles, with virtually unlimited capacity that can be spun up and down to test ideas as fast as you think of them</li>\n<li>Simple storage for both relational and non-relational data</li>\n<li>Pay-as-you-go flexibility</li>\n<li>Spot or preemptible compute capacity, offering significant savings for spikes in batch analytics</li>\n</ul>\n<p>In this blog series, we share some of MyOps's experience building Open Source data lakes on public and hybrid clouds, and provide detailed architecture examples of the most critical components of any data lake.</p>\n<h2><strong>Why should I care about an Open Source Data Lake?</strong></h2>\n<p>Two common questions we hear from our customers are <em>“What value do Open Source solutions provide?”</em> and <em>“Why should I investigate building my own solution when my cloud provider can build one for me?”</em></p>\n<p>First, let’s analyse what you are really deciding when you choose between a cloud vendor solution and your own:</p>\n<p><strong><em>1) The True Total Cost of Ownership:</em></strong></p>\n<p>We will conduct a basic cost analysis of an ingestion pipeline running on AWS. 
We assume that we are evaluating Apache Kafka as an alternative to Kinesis Streams.</p>\n<h2><strong><em>Kinesis Streams</em></strong></h2>\n<p>Assume we have a single stream that is used primarily for data ingestion:</p>\n<p><img src=\"/img/kinesis-streams-myops-data-table.png\"></p>\n<p>To ingest 50,000 messages per second with 24 hours of data retention, you can expect an annual cost of <strong>$52,400</strong>. This number doubles to <strong>$104,800</strong> if your Kinesis stream retains data for longer (up to a week).</p>\n<h2><strong><em>Open Source:</em></strong></h2>\n<p>Now, let’s look at an alternative: Kafka and ZooKeeper running on Amazon EC2, or on Kubernetes using StatefulSets.</p>\n<p>We make a few assumptions here: compression (such as ‘gzip’) is enabled, and data is replicated (replication factor of 3) for redundancy. The cluster consists of:</p>\n<ul>\n<li>3 x <strong>m3.medium</strong> machines with 10GB io1 EBS volumes for ZooKeeper</li>\n<li>3 x <strong>m5.2xlarge</strong> machines with 5TB st1 EBS volumes for Kafka</li>\n</ul>\n<p>At on-demand pricing, infrastructure that can confidently handle a similar workload costs <strong>$1,840</strong> per month, or <strong>$22,080</strong> per year. This excludes any potential instance savings (for example, from reserved or spot instances). Even after factoring in some engineering effort, the annual cost of your Open Source solution is considerably lower than that of a vendor-based solution. This becomes especially evident over longer time frames:</p>\n<p><img src=\"/img/myops-data-lake-blog-graph-a.png\" alt=\"Open source Kafka vs AWS Kinesis cost graph.\"></p>\n<p>As shown above, we have factored in an initial expense of approximately <strong>$60,000</strong> for deploying an Open Source solution, either with your own staff or through a consultant such as MyOps. In this example, the break-even point arrives within the first year of operation at extended (one-week) retention, and after roughly two years at 24-hour retention. 
Overall, Kinesis is not cheaper than the Open Source example, offers less flexibility, and locks your services into a specific cloud vendor. If you factor in the data growth of your streaming solution, the break-even point arrives even sooner.</p>\n<h2><strong><em>Managed Kafka:</em></strong></h2>\n<p>AWS recently released a managed Kafka service, Amazon MSK, which can be configured with instances similar to those in our previous example. This can be a much more appealing option; however, it’s important to run a cost-benefit analysis for Amazon MSK as well. Depending on your configuration, our modelling shows that Amazon MSK can cost almost twice as much as the Open Source alternative.</p>\n<p>An Amazon MSK deployment with an almost identical configuration (3x <strong>kafka.m5.2xlarge</strong> brokers with similar storage) can cost <strong>$3,350</strong> a month, or <strong>$40,200</strong> a year. While this is already almost double the cost of the Open Source alternative, it’s also important to note some significant drawbacks of the service. At the time of writing, Amazon MSK requires that you deploy your Kafka cluster across a minimum of 3 availability zones. While this is great for redundancy, it carries a large hidden data transfer cost when your applications consume data from brokers in other availability zones. Given the $0.01/GB charge for cross-AZ data transfer, a pipeline processing 50,000 messages per second (with an average size of 10KB) will transfer ~500GB of data each day across availability zones. This adds ~<strong>$4,562</strong> per month and gives a more accurate cost estimate of <strong>$90,000 to $100,000</strong> per year.</p>\n<p><img src=\"/img/myops-data-lake-blog-graph-b.png\" alt=\"Open source Kafka vs AWS Kinesis cost graph.\"></p>\n<p>As your streaming throughput and retention requirements increase, vendor solutions quickly become less appealing. 
This is well summarised as a rent-versus-buy decision, much like choosing a place to live. We have seen time and again that an Open Source solution becomes cheaper and more appealing as volumes grow and time goes by.</p>\n<p>* Reference: <em><a href=\"https://calculator.s3.amazonaws.com/index.html\">AWS cost calculator</a></em>.</p>\n<p><img src=\"/img/myops-quote-data-lake-1-.png\"></p>\n<p><strong><em>2) Vendor lock-in:</em></strong></p>\n<p>Another key consideration when deciding how to deploy a data lake is vendor lock-in.</p>\n<p>The main question we believe you need to ask yourself is: given our business requirement to remain flexible across multiple cloud providers, will our teams be able to migrate our services from vendor A to vendor B without significant engineering effort, at relatively low cost, and in a timely manner?</p>\n<p>Obviously, some services are so tightly integrated into the vendor’s platform that it is almost never worth implementing an Open Source alternative. Object stores such as Amazon S3 and Google Cloud Storage are great examples. These global-scale systems are operationally very simple and, in some cases, even <em><a href=\"https://cloud.google.com/storage/docs/interoperability\">compatible</a></em> with each other’s APIs. Because of this, we recommend a hybrid approach in which <em>some</em> base primitives are consumed from a cloud vendor (Kubernetes, Blob Storage, etc.) while you retain the flexibility that Open Source systems provide.</p>\n<p>Developing against proprietary SDKs can become a significant sunk cost. Ultimately, if another cloud provider offers significant savings on compute, you want to be able to pursue that opportunity without worrying about portability.</p>\n<p>Thankfully, there’s a good answer to this. 
Kubernetes has become the industry standard for container orchestration and is widely valued for its portability. If we build on Kubernetes, the same deployment scripts can be used to deploy to AWS, GCP, Azure or on-premises with very little extra effort.</p>\n<p>By approaching the vendor lock-in problem pragmatically, we combine the power of “the cloud” with the portability to take your services anywhere. This sets up your business for long-term success.</p>\n<p><strong><em>3) Increased performance, visibility and customization:</em></strong></p>\n<p>Another reason to investigate your own customized solution for services such as Kafka is increased performance.</p>\n<p>We work with clients both large and small who want to squeeze every last drop of performance out of their existing compute resources. By choosing flexible instance types and storage, and by tuning kernel parameters, we have seen significant increases in throughput at no extra cost. Being able to attach to and configure the runtime layer (for example, the JVM) also helps with performance tuning.</p>\n<p>In our experience, vendor-provided visibility services are rather basic. By running your own Open Source solution, you are free to use whatever monitoring solution you already have, or to run our preferred Open Source system, Prometheus. With dedicated exporters, you can export any relevant service metric and gain better insight into the service internals at a much lower cost.</p>\n<p><strong><em>4) Security and compliance requirements:</em></strong></p>\n<p>Strict security and compliance requirements are another reason to deploy an Open Source data lake.</p>\n<p>Some regulations require strict data locality and/or no internet access (even for encrypted traffic). 
Many of the services offered by cloud vendors do not support this type of deployment.</p>\n<p>Additionally, there are cases where customer-managed encryption keys or HSM keys are required. These are not always supported by a given vendor service, or may come with reduced functionality when used. By deploying an Open Source solution, you control encryption at rest and in transit, which allows you to enforce strict security policies.</p>\n<p>We have deployed Open Source solutions in some of the most secure and regulated sectors on the planet. Many companies simply cannot use cloud vendor services that do not meet their security requirements.</p>\n<p><strong><em>5) Limited feature sets and regional availability:</em></strong></p>\n<p>Cloud managed services may not support all the features offered by the Open Source software they are based on. Taking the AWS managed Kafka service as an example, at the time of writing the following could be key blockers for your workloads:</p>\n<ul>\n<li>No in-place rolling upgrades; upgrading the software requires migrating to a new cluster</li>\n<li>Limited broker version support (1.1.1 and 2.1 only, at the time of writing)</li>\n<li>No custom third-party JARs, such as data balancers or metrics exporters</li>\n<li>No hosted schema registry</li>\n<li>Inter-zone network transfer costs incurred by clients producing to and consuming from brokers in other zones</li>\n</ul>\n<p>Additionally, even with a single cloud vendor, services are often available only in some regions. Cloud vendors prioritize feature releases in their largest or most strategic regions. 
For example, at the time of writing, Amazon MSK is not available in the Canadian region, so if your deployments require Canadian data locality, you will need to investigate your own solution there.</p>\n<p>It’s important to note that it took Amazon MQ (a managed message broker) <a href=\"https://aws.amazon.com/blogs/aws/amazon-mq-managed-message-broker-service-for-activemq/\">more than a year and a half</a> to become available in the Canadian region after the initial launch of the service.</p>\n<p>Now that we have established what might drive an organisation to adopt an Open Source data lake, we will discuss some of the design patterns and considerations for deploying one in the cloud.</p>\n<p>We expect this blog series to cover the end-to-end lifecycle of deploying and managing data lakes.</p>\n<p>If you have any questions, feel free to reach out to <em><a href=\"mailto:info@myops.co.il\">info@myops.co.il</a></em> to discuss further.</p>","frontmatter":{"url":"opensource-data-lake-for-the-hybrid-cloud","seo":{"title":"OpenSource Data Lake for the Hybrid Cloud - Part 1","description":"Data lakes have become the de-facto standard for Enterprises and Corporations looking to take advantage of their existing Data.","canonical":null,"image":{"childImageSharp":{"fluid":{"aspectRatio":1.4970059880239521,"src":"/static/8640602d41c9ebdbd88a4281c37bcae9/724c8/myops-data-lake-blog-profile-1-.jpg","srcSet":"/static/8640602d41c9ebdbd88a4281c37bcae9/84d81/myops-data-lake-blog-profile-1-.jpg 250w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/f0719/myops-data-lake-blog-profile-1-.jpg 500w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/724c8/myops-data-lake-blog-profile-1-.jpg 1000w","srcWebp":"/static/8640602d41c9ebdbd88a4281c37bcae9/36ebb/myops-data-lake-blog-profile-1-.webp","srcSetWebp":"/static/8640602d41c9ebdbd88a4281c37bcae9/1d872/myops-data-lake-blog-profile-1-.webp 250w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/4e6d4/myops-data-lake-blog-profile-1-.webp 
500w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/36ebb/myops-data-lake-blog-profile-1-.webp 1000w","sizes":"(max-width: 1000px) 100vw, 1000px","maxHeight":667,"maxWidth":1000}}}},"title":"OpenSource Data Lake for the Hybrid Cloud - Part 1","date":"2019-06-17T16:00:00.000Z","tags":["Kubernetes","Big Data"],"author":{"name":"MyOps","photo":{"extension":"png","publicURL":"/static/3ff870573bc56665ee67e3cf3f5fc163/logo-small.png","childImageSharp":{"fluid":{"aspectRatio":0.8759124087591241,"src":"/static/3ff870573bc56665ee67e3cf3f5fc163/b460a/logo-small.png","srcSet":"/static/3ff870573bc56665ee67e3cf3f5fc163/d966b/logo-small.png 120w,\n/static/3ff870573bc56665ee67e3cf3f5fc163/67196/logo-small.png 240w,\n/static/3ff870573bc56665ee67e3cf3f5fc163/b460a/logo-small.png 480w,\n/static/3ff870573bc56665ee67e3cf3f5fc163/eec14/logo-small.png 596w","srcWebp":"/static/3ff870573bc56665ee67e3cf3f5fc163/35871/logo-small.webp","srcSetWebp":"/static/3ff870573bc56665ee67e3cf3f5fc163/83552/logo-small.webp 120w,\n/static/3ff870573bc56665ee67e3cf3f5fc163/2b5a3/logo-small.webp 240w,\n/static/3ff870573bc56665ee67e3cf3f5fc163/35871/logo-small.webp 480w,\n/static/3ff870573bc56665ee67e3cf3f5fc163/c0cb3/logo-small.webp 596w","sizes":"(max-width: 480px) 100vw, 480px"}}}},"image":{"childImageSharp":{"fluid":{"aspectRatio":1.5,"src":"/static/8640602d41c9ebdbd88a4281c37bcae9/22a6f/myops-data-lake-blog-profile-1-.jpg","srcSet":"/static/8640602d41c9ebdbd88a4281c37bcae9/15aed/myops-data-lake-blog-profile-1-.jpg 300w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/a07a5/myops-data-lake-blog-profile-1-.jpg 600w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/22a6f/myops-data-lake-blog-profile-1-.jpg 1000w","srcWebp":"/static/8640602d41c9ebdbd88a4281c37bcae9/c8563/myops-data-lake-blog-profile-1-.webp","srcSetWebp":"/static/8640602d41c9ebdbd88a4281c37bcae9/4fec1/myops-data-lake-blog-profile-1-.webp 300w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/483a3/myops-data-lake-blog-profile-1-.webp 
600w,\n/static/8640602d41c9ebdbd88a4281c37bcae9/c8563/myops-data-lake-blog-profile-1-.webp 1000w","sizes":"(max-width: 1000px) 100vw, 1000px"}}}}}},"pageContext":{"id":"981da38b-6d22-5c59-b0f2-ac959bef42ca","categories":["Kubernetes","Big Data"]}},"staticQueryHashes":["2022990323","639612397"]}