Improving IT Operational Efficiency for AI / ML, Data Science, and Analytics with BlueData


(upbeat instrumental music) Hi, I’m Matt Maccaux, and I’m here today to talk about how HPE BlueData can provide infrastructure efficiency, agility back to the business, and a path to cloud and containers for IT operations as it relates to big data and advanced analytics. So what does this mean from an operational and IT perspective? We have teams of data scientists, data engineers, and data analysts, all making requests for different things that we have to support. They all want to bring their own tools to the table. They have different IDEs that they want to do their job with, whether it’s IntelliJ or a Jupyter Notebook, and they’re all going to have different sets of requirements. They need access to code or models from a number of repositories that we have to support. And most importantly of all, they want access to data. We have to provide that access in a safe and secure manner, so that they are not corrupting the data stores that we worked so hard to curate. Meanwhile, we need to know how long they’re requesting these environments for, and the performance characteristics of those environments, so that we can provide chargeback and showback to them, because as we know in IT, nothing comes for free. Now, all of this should be captured in metadata. This metadata drives interfaces like ServiceNow to provide a common interface for those sets of users, and then we use templates to drive automation. This may be your CI/CD or your DevOps platform, but what’s important is what sits under the covers: the infrastructure that you’ve built and rolled out over the years, and potentially the cloud infrastructure that you’re supporting. We need to be able to deploy these tools, these IDEs, and this code across all of that heterogeneous infrastructure in a consistent manner, all managed through a single pane of glass. BlueData provides that through true multi-tenancy.
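As a rough illustration of how request metadata can feed chargeback and showback, the Python sketch below records each environment request (who asked, which tenant, which tool, how many resources, for how long) and sums a cost per tenant. The `EnvironmentRequest` schema, the per-core-hour and per-GB-hour rates, and the tenant names are hypothetical assumptions made for illustration; they are not part of BlueData or ServiceNow.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical hourly rates; a real deployment would use its own cost model.
RATE_PER_CORE_HOUR = 0.05
RATE_PER_GB_HOUR = 0.01


@dataclass(frozen=True)
class EnvironmentRequest:
    """Hypothetical metadata captured when a user requests an environment."""
    user: str
    tenant: str
    tool: str        # e.g. "jupyter" or "intellij"
    cpu_cores: int
    memory_gb: int
    hours: float     # how long the environment was running

    def cost(self) -> float:
        """Cost = core-hours * core rate + GB-hours * memory rate."""
        core_hours = self.cpu_cores * self.hours
        gb_hours = self.memory_gb * self.hours
        return core_hours * RATE_PER_CORE_HOUR + gb_hours * RATE_PER_GB_HOUR


def showback_by_tenant(requests: list[EnvironmentRequest]) -> dict[str, float]:
    """Sum the cost of every request per tenant, rounded to cents."""
    totals: dict[str, float] = defaultdict(float)
    for req in requests:
        totals[req.tenant] += req.cost()
    return {tenant: round(total, 2) for tenant, total in totals.items()}


if __name__ == "__main__":
    requests = [
        EnvironmentRequest("ana", "data-science", "jupyter", 4, 16, 10.0),
        EnvironmentRequest("raj", "data-science", "jupyter", 2, 8, 5.0),
        EnvironmentRequest("lee", "data-engineering", "intellij", 8, 32, 2.0),
    ]
    print(showback_by_tenant(requests))
```

Here the data-science tenant comes to 4.5 (3.6 for the first request plus 0.9 for the second) and the data-engineering tenant to 1.44. Showback would simply report these figures to each team; chargeback would bill them.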
So under the covers, what BlueData is doing is using software-defined networks to create tenants, which are logical groupings of infrastructure resources. So maybe I have a data science tenant, or a data engineering tenant. Under the covers, a tenant may be running on bare-metal servers, on VMs, or potentially in the cloud. Maybe this tenant is running on EC2 because that’s the most efficient place for it, while this other tenant is running on premises, using existing cloud infrastructure. The important thing to note here is that we’re giving those users the choice to pick their tools and spin up these environments using containers. What BlueData does is take the set of applications and tools that those users want to use and wrap them, unmodified, in Docker containers. We can also take tools that already use Kubernetes, along with their containers, and deploy them on top of your Kubernetes orchestration framework, which BlueData manages. Finally, though, we haven’t talked about data. We need to be able to connect these environments to the data sources where the data resides. In most organizations, we have ETL, ingest, batch, and streaming jobs flowing into data lakes. This may be one lake, but in most organizations it is probably many lakes. The point is that we’ve got curated data flowing into this lake that we need to treat as read-only. The only people who get to put data into this lake are the ones who follow the operational processes to get it through that ETL ingestion engine, making sure that the data is fed back into the metadata engine so that it is available in the catalog. Now, we want to make this read-only data available to the various clusters that exist out here.
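To make the idea of a tenant as a logical grouping of resources more concrete, here is a minimal Python sketch, assuming a simple model in which each tenant owns a list of hosts of different kinds (bare metal, VM, or cloud) and a core quota. The `Tenant` and `Host` classes, the `launch` method, and the first-fit placement rule are hypothetical and only illustrate the concept; this is not how BlueData or Kubernetes actually schedules containers.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    kind: str  # "bare-metal", "vm", or "cloud" (for example an EC2 instance)
    free_cores: int


@dataclass
class Tenant:
    """Hypothetical model of a tenant: a logical grouping of hosts with a core quota."""
    name: str
    core_quota: int
    hosts: list[Host] = field(default_factory=list)
    # container name -> (unmodified image name, host it was placed on)
    containers: dict[str, tuple[str, str]] = field(default_factory=dict)
    used_cores: int = 0

    def launch(self, container: str, image: str, cores: int) -> str:
        """Place a container on the first host in this tenant with enough free cores."""
        if self.used_cores + cores > self.core_quota:
            raise RuntimeError(f"tenant {self.name!r} would exceed its core quota")
        for host in self.hosts:
            if host.free_cores >= cores:
                host.free_cores -= cores
                self.used_cores += cores
                self.containers[container] = (image, host.name)
                return host.name
        raise RuntimeError(f"no host in tenant {self.name!r} has {cores} free cores")


if __name__ == "__main__":
    data_science = Tenant(
        "data-science",
        core_quota=16,
        hosts=[Host("onprem-01", "bare-metal", 8), Host("ec2-a", "cloud", 16)],
    )
    print(data_science.launch("nb-ana", "jupyter/base-notebook", 6))  # onprem-01
    print(data_science.launch("nb-raj", "jupyter/base-notebook", 6))  # ec2-a
```

The user only sees the tenant; whether a container lands on an on-premises server or a cloud instance is a placement decision hidden behind it. First-fit is used purely for brevity, and a real scheduler would weigh cost, locality, and resource types.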
These clusters are Cloudera clusters, Spark clusters, whatever the case may be, and we want to be able to create connections from those clusters, in that multi-tenant environment, in a read-only way. But we also want to be able to make connections into analytical sandboxes that are in the same logical lakes, and over here, the access is read-write. This is where these users can join the read-only data with other data, and potentially bring in data from outside. All the while, we’re maintaining the integrity of the data lake and its underlying data here, while providing auditability and traceability over here. So we have heterogeneous infrastructure, and we’ve separated compute from storage, which allows us to scale the storage and compute tiers independently of one another, leveraging the automation that probably already exists in your organization and tying into the IT provisioning processes and tools that already exist. HPE BlueData allows you to leverage existing tools and processes, use software-defined networking and containerization, and tap into the data sources where they exist. That provides infrastructure efficiency, agility by spinning environments up and down on demand, and a path to cloud and containers.
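The read-only versus read-write split can be sketched in a few lines of Python. This is a minimal illustration, not BlueData’s API: `DataConnector`, `Access`, and the path strings are hypothetical names. Every read and write is recorded in an audit log for traceability, writes against the curated lake are rejected (the rejected attempt is still logged), and the sandbox accepts writes.

```python
from dataclasses import dataclass, field
from enum import Enum


class Access(Enum):
    READ_ONLY = "ro"
    READ_WRITE = "rw"


@dataclass
class DataConnector:
    """Hypothetical connection from a tenant cluster to one data location."""
    tenant: str
    path: str
    access: Access
    # Each entry is (user, action, object path) so every access is traceable.
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def read(self, user: str, obj: str) -> None:
        self.audit_log.append((user, "read", f"{self.path}/{obj}"))

    def write(self, user: str, obj: str) -> None:
        target = f"{self.path}/{obj}"
        if self.access is Access.READ_ONLY:
            # Record the rejected attempt, then refuse the write.
            self.audit_log.append((user, "write-denied", target))
            raise PermissionError(f"{self.path} is read-only for tenant {self.tenant!r}")
        self.audit_log.append((user, "write", target))


if __name__ == "__main__":
    lake = DataConnector("data-science", "lake/curated", Access.READ_ONLY)
    sandbox = DataConnector("data-science", "lake/sandbox/data-science", Access.READ_WRITE)

    lake.read("ana", "orders.parquet")
    try:
        lake.write("ana", "orders.parquet")
    except PermissionError as err:
        print("blocked:", err)
    sandbox.write("ana", "orders_joined.parquet")
    print(lake.audit_log)
    print(sandbox.audit_log)
```

Keeping the access mode on the connection rather than on the user means the same analyst can read curated data and write joined results without ever holding write access to the lake itself.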
