8 Security Standards For Safeguarding Customer Data

As a company, Plutoshift has many responsibilities towards our customers, teammates, vendors, and the environment. We manage critical data across many facets of our business. Being accountable for data privacy is at the very top of our priority list.

Our approach to data privacy and protection is straightforward. We are committed to upholding the highest internationally recognized privacy standards while maintaining our record of zero data security incidents.

In today’s world, it’s critical that businesses use data to make decisions. Our operational data platform was built for businesses to use their data to monitor physical infrastructure. Your data is the fuel that drives our platform’s engine. Just as fuel varies in quality from one supplier to the next, so does data from one company to the next. Your data is unique, and it gives your company a specific competitive advantage. We know you can’t afford to lose that edge while pursuing more efficient and effective operations.

That’s why we follow industry-leading security standards for data storage and protection. This encompasses security standards for how customer data is stored within our platform. It also includes user access requirements for things like passwords and administrative controls. Below are 8 ways we protect our customers’ data:

  1. All customer data is stored in a secure cloud container.
  2. Each container is assigned to a single customer; containers are never shared.
  3. Plutoshift hosts the solution in a secure cloud and each solution is unique to the customer.
  4. There is no sharing of data with any other vendors, partners, or third parties.
  5. We engage certified auditors to evaluate our policies and procedures.
  6. User access is tightly controlled by the use of secure passwords and CAPTCHA.
  7. Users are not able to change the models or predictions.
  8. Admin access is granted to only those designated users within the customer’s organization who need to have the ability to provide individual user access and/or delete, demote, and disable other users.

As part of our ongoing commitment to data protection, we will review our policies and practices on a quarterly basis and update our customers on any changes. 

In an increasingly data-first world, we appreciate your trust in Plutoshift to keep your operations running safely and efficiently.

Towards 3Z Podcast: Zero Emissions, Zero Downtime, Zero Waste and Digital Transformation

I was honored to join Albert Vazquez-Agusti on Towards 3Z’s first podcast to talk about zero emissions, zero downtime and zero waste in a world where industrial transformation and energy transition are a must for everyone’s safety and economic development.

During this podcast, you’ll learn:

  • How to deploy an enterprise data platform across several plants belonging to the same company
  • How Covid accelerated the adoption of automation across various workflows
  • How to manage conversations with customers when they are considering CAPEX versus OPEX accounting in enterprise software
  • The need to focus on thoroughly assessing prospective customers to avoid “pilot purgatory”

Click the link below to listen.

https://medium.com/albert-vazquez-agusti/towards-3z-podcast-with-prateek-joshi-from-plutoshift-episode-1-f3eb14767559

6 key ingredients of successful Machine Learning deployments

Machine Learning (ML) is a vehicle to achieve Artificial Intelligence (AI).

ML provides a framework to create intelligent systems that can process new data and produce useful output that can be used by humans. Automation technologies are the fastest-growing type of AI. Why? Because they are faster to implement, easy to deploy, and have high ROI. Leaders at an organization are often faced with the problem of figuring out how to make this technology work within their business.

Before any new technology is adopted, it needs to prove that it works. Business leaders need to create success templates to show how to make it work within their organizations. These success templates can then be used to drive enterprise-wide adoption.

How do we make machine learning deployments successful?

From our experience, there are 6 key ingredients to achieve this:

1. Identify a work process with repetitive steps

You should start by identifying the right work process. A good target here is a process where someone has to go through the same steps over and over again to get to a piece of information. Before deploying ML, ask whether such a process exists and whether the people involved would benefit from automating it. If so, solving it can directly increase productivity and revenue for the company. These work processes are actually very simple to describe, as shown below:
– “How much electricity did the membranes consume 3 days ago?”
– “How long do we take on average to fix our pumps when someone files a support ticket?”
– “How much money did we spend last month on chemical dosing?”

2. Gather data specific to that work process

Once you identify a work process, you need to gather data for it. You should be selective with your data. You need to understand what specific data is going to support this particular operation. If you try to digest all available data, it leads to chaos and suboptimal outcomes. If you’re disciplined around what data you need, it will drive focus on the outcomes and ensure that the ML deployment is manageable. We conducted a survey of 500 professionals to get their take on operation-specific digital transformation and we found 78% felt supported by their team leaders when they embarked on this approach. Here’s the full report: Instruments of Change: Professionals Achieving Success Through Operation-Specific Digital Transformation

3. Create a blueprint for the data workflow

Once you have a clear understanding of the data, the next step is to create a blueprint for the data workflow. A data workflow is a series of steps that a human would take to transform raw data into useful information. Instead of figuring out a way to work with all the available data across the entire company, you should pick a workflow that’s very specific to an operation and create a blueprint of how the data should be transformed. This allows you to understand what it takes to get something working. The output of this data workflow is the information that can be consumed by the operations team on a daily basis.

4. Automate the data workflow

Once you have the blueprint for the data workflow, you should automate it. An automated data workflow connects to the data sources, continuously pulls the data, and transforms it. The operations teams will be able to access the latest information at all times. New data that gets generated goes through this workflow as well.
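
To make this concrete, here is a minimal sketch of what an automated data workflow can look like in Python, using pandas and entirely hypothetical file names and column names (pump_readings.csv, flow_rate, energy_kwh). A real deployment would connect to live data sources and run under a scheduler rather than a simple loop.

```python
import time

import pandas as pd

# Hypothetical file paths standing in for real data sources and destinations
# (a sensor historian, a maintenance system, a dashboard table, etc.).
SOURCE_PATH = "pump_readings.csv"
OUTPUT_PATH = "daily_pump_summary.csv"


def pull_latest() -> pd.DataFrame:
    """Pull the latest raw readings from the source."""
    return pd.read_csv(SOURCE_PATH, parse_dates=["timestamp"])


def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the blueprint: drop bad rows, then roll readings up to daily metrics."""
    clean = raw.dropna(subset=["flow_rate", "energy_kwh"])
    return (
        clean.set_index("timestamp")
        .resample("1D")
        .agg({"flow_rate": "mean", "energy_kwh": "sum"})
    )


def run_once() -> None:
    summary = transform(pull_latest())
    summary.to_csv(OUTPUT_PATH)  # the information the operations team consumes


if __name__ == "__main__":
    # A scheduler (cron, Airflow, etc.) would normally own this loop.
    while True:
        run_once()
        time.sleep(60 * 60)  # re-run hourly so newly generated data flows through too
```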

5. Create and track the benefits scorecard

The main reason you’re creating the automated data workflow is to drive a specific outcome. This outcome should be measurable and should have a direct impact on the business. You should involve all the stakeholders in creating and tracking this benefits scorecard. The people implementing and using the ML system should hold themselves accountable with respect to this benefits scorecard. The time to realize those benefits should be 90 days or less.

6. Build the data infrastructure to scale

Once you successfully execute on this workflow, what do you do next? You should be able to replicate it with more workflows across the company. A PoC is not useful if it can’t scale across the entire organization. Make sure you have the data infrastructure that supports deploying a wide range of workflows. A good platform has the necessary data infrastructure built into it. It will enable you to create many workflows easily on top of it. The capabilities of the platform include automating all the work related to data — checking data quality, processing data, transforming data, storing data, retrieving data, visualizing data, keeping it API-ready, and validating data integrity. This will allow you to successfully use the platform to drive real business value at scale.

The Water Values Podcast: Digital Transformation with Prateek Joshi

CEO Prateek Joshi talks about digital transformation in the water sector. Prateek hits on a number of important and practical points in a wide-ranging discussion on data, AI, and machine learning in the water sector.

In this session, you’ll learn about: 

  • Prateek’s background & how it influenced his arc into the water sector
  • Water-intensive industries and using water data in those industries
  • Prateek’s view on digital transformation
  • How COVID influenced the digital transformation
  • The limitations of human-based decision-making
  • Common challenges for data-centric organizations
  • How to drive organizational behavior change with respect to data usage
  • The difference between AI and machine learning
  • Data quality and verification issues
  • The factors companies look for when selecting an AI system

Click the link below to listen:

https://episodes.castos.com/watervalues/TWV-192-Digital-Transformation-with-Prateek-Joshi.mp3


8 Dimensions of Data Quality

Large companies have enormous physical infrastructure. This infrastructure is well-instrumented and data is collected continuously. The Plutoshift platform uses this data to help these companies monitor their physical infrastructure. When we look at the data flowing in, we need to standardize and centralize it in an automated way. One of the first steps in monitoring physical infrastructure is to check data quality. How do we do that? What framework should we use to validate data quality?

The topic of data quality is vast. There are many ways to check and validate data quality.

To automate the work of monitoring physical infrastructure, we employ a variety of machine learning tools. You need to automate the work of looking for anomalous performance metrics and surfacing them. Deploying machine learning in these situations is basically a data infrastructure problem. If you have good data infrastructure, your machine learning tools will do a good job. Needless to say, your machine learning tools will look bad if the data infrastructure is not robust.

In our experience, there are 8 criteria we can use to ensure data quality in the world of physical infrastructure:

1. Consistency

There shouldn’t be any contradictions within the data. If you do a sweep of the entire data store, the observations should be consistent with each other. For example, let’s say that there’s a sensor monitoring the temperature of a system. The dataset shouldn’t contain the same timestamp with two different temperature values.
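
As an illustration, here is a small pandas sketch of a consistency check along these lines, using a hypothetical temperature dataset. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical sensor readings; in practice this comes from your data store.
readings = pd.DataFrame({
    "timestamp": ["2024-01-01 00:00", "2024-01-01 00:00", "2024-01-01 01:00"],
    "temperature": [72.1, 75.8, 72.4],
})

# Flag any timestamp that appears with more than one distinct temperature value.
conflicts = (
    readings.groupby("timestamp")["temperature"]
    .nunique()
    .loc[lambda counts: counts > 1]
)
print(conflicts)  # any timestamp listed here violates consistency
```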

2. Accuracy

The data should accurately reflect reality. You should be able to trust your instrumentation. In general, this is a consideration for the data collection systems. For example, let’s say that you’re looking at the data store for flow rates within a pipe. The data should accurately reflect what’s actually happening in the pipe. A machine learning model assumes the data is accurate when making a prediction. If the data itself is inaccurate, the machine learning model can’t do much.

3. Relevancy

The data should be relevant to the use case. You need data that enables you to achieve a specific goal. For example, let’s say we’re looking at the energy consumption problem. If you want to reduce energy consumption, you need to have data on the levers that are responsible for driving energy consumption. Machine learning can’t do much with high-quality data if it’s not relevant.

4. Auditability

We should be able to trace the changes made to the data. You can make sure that nothing gets overwritten permanently. By understanding the changes made to the data over time, you can detect useful patterns. For example, let’s say that you’re looking at a response tracker filled with user-inputted values. The ability to trace the changes made to the data gives us the ability to look at the evolution of the dataset.

5. Completeness

Completeness means that all elements of the data should be in our database. Fragmented data is one of the most common causes of subpar performance. In order to drive a use case, you need all elements of the data. Data completeness allows machine learning models to perform better. For example, let’s say you are looking at monitoring membranes within the physical infrastructure at a beverage company. The aim is to predict cleaning dates, and there are 5 key factors that affect the cleaning dates. If the dataset only has 3 of those, then the machine learning model can’t achieve the desired level of accuracy.
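
A completeness check along these lines can be sketched in a few lines of pandas. The file name and the five factor names below are hypothetical placeholders.

```python
import pandas as pd

# The 5 hypothetical factors assumed to drive membrane cleaning dates.
REQUIRED_FACTORS = ["feed_pressure", "permeate_flow", "conductivity",
                    "temperature", "ph"]

membrane_data = pd.read_csv("membrane_readings.csv")  # hypothetical export

# Which factors are missing entirely, and how sparse are the ones we do have?
missing_columns = [c for c in REQUIRED_FACTORS if c not in membrane_data.columns]
null_fraction = membrane_data.reindex(columns=REQUIRED_FACTORS).isna().mean()

print("Factors missing entirely:", missing_columns)
print("Fraction of null values per factor:")
print(null_fraction)
```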

6. Timeliness

We should get data with minimal latency. Data tells us something about the real world, and the sooner we know it, the sooner we can dissect it and take action. If something is happening in the real world, the data collection system should be able to get that data into the hands of the end user with minimal latency. For example, let’s say we’re looking at pump monitoring. In an emergency, the aim is to take action within the hour to minimize damage. If the data collection system sends you the data with a gap of 3 hours, then you’ll be too late.
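
Here is a small sketch of a timeliness check under these assumptions, with a hypothetical file name and a one-hour latency budget taken from the example above.

```python
import pandas as pd

MAX_LATENCY = pd.Timedelta(hours=1)  # assumed emergency-response window

# Hypothetical export; timestamps are assumed to be recorded in UTC.
pump_data = pd.read_csv("pump_readings.csv", parse_dates=["timestamp"])

latency = pd.Timestamp.now(tz="UTC") - pump_data["timestamp"].max().tz_localize("UTC")

if latency > MAX_LATENCY:
    print(f"Latest reading is {latency} old, too stale to act on within the hour.")
```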

7. Orderliness

The data should have a fixed structure and format. Data format plays an important role in building scalable products. For software to work at a large scale, the data needs to be in an agreed-upon shape. This allows machine learning systems to work at scale, which is really powerful given the amount of data they can handle. For example, let’s say we’re looking at monitoring cooling systems across 400 sites. A machine learning model is effective if the data from all those sites is in a standardized format. If all 400 sites have different data formats, then you’ll have to build separate workflows for each, which reduces your ability to scale.
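
A simple way to enforce orderliness is to validate each site’s data against an agreed-upon schema before it enters the workflow. The sketch below assumes hypothetical column names and a hypothetical site export file.

```python
import pandas as pd

# Agreed-upon format every site is expected to deliver (assumed for illustration).
EXPECTED_SCHEMA = {
    "timestamp": "datetime64[ns]",
    "site_id": "object",
    "coolant_temp_c": "float64",
    "power_kw": "float64",
}


def conforms(df: pd.DataFrame) -> bool:
    """Return True if the frame matches the agreed column names and dtypes."""
    if list(df.columns) != list(EXPECTED_SCHEMA):
        return False
    return all(str(df[col].dtype) == dtype for col, dtype in EXPECTED_SCHEMA.items())


site_frame = pd.read_csv("site_017.csv", parse_dates=["timestamp"])  # hypothetical
print("Site 017 conforms:", conforms(site_frame))
```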

8. Uniqueness

Data shouldn’t be duplicated. This one seems obvious, but data duplication is a very real issue that we face. In a given database, the data shouldn’t be duplicated. There’s no reason to store it more than once; it occupies more space and doesn’t serve any purpose. For example, let’s say we’re looking at pressure values within a steam system. For a given timestamp and location, we only need the value to occur once. If the values are duplicated, we need to deduplicate them before processing further.
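
Deduplication itself is a one-liner in most data tooling. Here is a pandas sketch using a hypothetical steam-pressure export, keyed on timestamp and location as in the example above.

```python
import pandas as pd

pressure = pd.read_csv("steam_pressure.csv")  # hypothetical export

# Keep one record per (timestamp, location) pair and drop the repeats.
deduplicated = pressure.drop_duplicates(subset=["timestamp", "location"], keep="first")
print(f"Removed {len(pressure) - len(deduplicated)} duplicate rows")
```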

100th Episode Of The Dan Smolen Podcast

Prateek Joshi, Founder and CEO of Plutoshift, discusses how A.I. makes the world a better place on the 100th episode of The Dan Smolen Podcast. The Dan Smolen Podcast is the best podcast covering future-of-work and meaningful-work topics and trends.

In this episode, Prateek:

  • Describes Plutoshift and his role in the company. Starts at 3:03
  • Defines A.I. and contrasts it with Machine Learning. Starts at 3:51
  • Addresses workforce concerns that A.I. takes jobs away from people. Starts at 8:52
  • Illustrates how Plutoshift helps clients involved with providing clean and potable water. Starts at 13:03
  • Identifies the training and advanced skill that he seeks in hired talent. Starts at 20:25
  • Tells us how, beyond his work, he adds fun and enjoyable activity to each day. Starts at 27:59


Listen to the full episode on The Dan Smolen Podcast.

Databases, Infrastructure, and Query Runtime

Recently, my team was tasked with making a switch from a combined MySQL and Cassandra infrastructure to one in which all of this data is stored entirely on a PostgreSQL server. This change was partially due to an increased drive to provide crucial flexibility to our customers, along with the fact that Cassandra was simply not necessary for this particular application, even with the high quantities of data we were receiving. On its face, the mere need for such a change almost looks backwards given how much the tech industry has moved away from SQL databases and towards NoSQL databases. But, in fact, NoSQL — or even hybrid systems — are not always best.

Performance Gain Considerations

In certain applications, one might find that the performance gains hoped to be reaped from NoSQL’s optimizations may not translate perfectly to production without some forethought. I would personally argue that SQL databases are often preferable (over something like Cassandra) in non-trivial applications, most of all when JOIN operations are required. Generally speaking, NoSQL databases — certainly Cassandra, among others — do not support JOIN. I will add that the vast majority of ORMs (for those who may not be familiar with the term, these are effectively systems for abstracting database relations into typically “object-oriented” style objects within one’s backend code) are built around SQL. Thus, the flexibility and readability afforded by these ORMs — at least when operating a database of non-trivial objects — can be a lifesaver for development time, database management, integrity, and readability. Indeed, I would even argue that, for most web applications, this often outweighs the sometimes marginal or even negligible performance increases that a NoSQL database may provide (of course, this is completely dependent on the nature and scale of the data, but that is perhaps a topic for another time).
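
To illustrate the point about ORMs and JOINs, here is a minimal sketch using SQLAlchemy, one widely used Python ORM. The Pump and Reading tables are hypothetical, and an in-memory SQLite database stands in for Postgres.

```python
from sqlalchemy import Column, Float, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()


# Hypothetical tables standing in for the kind of related data where JOINs matter.
class Pump(Base):
    __tablename__ = "pumps"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    readings = relationship("Reading", back_populates="pump")


class Reading(Base):
    __tablename__ = "readings"
    id = Column(Integer, primary_key=True)
    pump_id = Column(Integer, ForeignKey("pumps.id"), nullable=False)
    flow_rate = Column(Float)
    pump = relationship("Pump", back_populates="readings")


engine = create_engine("sqlite:///:memory:")  # stand-in for a Postgres URL
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:
    # The ORM expresses the JOIN declaratively and generates the SQL for us.
    rows = (
        session.query(Pump.name, Reading.flow_rate)
        .join(Reading, Reading.pump_id == Pump.id)
        .all()
    )
    print(rows)
```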

Cloud Infrastructure

However, none of this matters if the engineer is not paying close attention to their cloud infrastructure and the way that they are actually using their queries in production. In evaluating one engineer’s project, I found they were doing all of their insertion operations individually rather than attempting to batch or bulk insert them (when this was well within the scope of this particular application). It appeared they had been developing with a local setup and then deploying their project to the cloud where their database was running on a separate machine from their server. The end result in this case was rather comical, as once insertions were batched, even in Postgres, they were orders of magnitude faster than the piecemeal NoSQL insertions. They had not considered the simple fact of latency.

How did this original engineer miss this? I do not know, as this particular piece of software was inherited with little background knowledge. But, given that they were testing locally, I can assume that they elected for individual insertions. Making queries in this way can sometimes be less tricky than bulk insertions (which often have all sorts of constraints around them and require a bit more forethought, especially when it comes to Cassandra), and they found the performance was beyond satisfactory. What they did not consider, however, is the latency between the backend server and a Cassandra (or SQL) server hosted in any sort of distributed system (i.e., production). That latency was so much greater than the query runtime that it didn’t really matter which database was used. So it followed that the real-world performance was actually significantly improved by simply batching insertions in Postgres (though of course, batching is supported in Cassandra — but the change was necessary nonetheless).
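
To make the batching point concrete, here is a hedged sketch using psycopg2’s execute_values, which sends a whole batch of rows in a single round trip. The connection string, table, and sample rows are hypothetical.

```python
import psycopg2
from psycopg2.extras import execute_values

# Hypothetical connection string and table; adjust for your environment.
conn = psycopg2.connect("dbname=ops user=ops_user host=db.internal")

rows = [
    ("2024-01-01 00:00:00", 101.3),
    ("2024-01-01 00:01:00", 101.5),
]  # in production this would be thousands of readings per batch

with conn, conn.cursor() as cur:
    # One network round trip for the whole batch instead of one per row. When each
    # round trip costs milliseconds, this dwarfs any per-query runtime difference
    # between databases.
    execute_values(
        cur,
        "INSERT INTO pressure_readings (recorded_at, value) VALUES %s",
        rows,
    )
```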

The Moral of the Story

In any case, the moral of the story here, in my opinion, is that understanding your own cloud infrastructure is crucial to writing truly performant programs in the real world. Just because one database is purported to perform better than another under certain circumstances, without a solid understanding of the environment in which the application will be deployed, one cannot hope to see any appreciable performance gain.

Machine Learning In 20 Words Or Less

I’m often told that Machine Learning sounds complicated – but it doesn’t have to be. If I were asked to explain ML in 20 words or less, this is what it would sound like:

Understand the problem. Clean up the data. Investigate relationships. Engineer the dataset. Build the model. Tune to high performance.

At its core, ML is pretty straightforward. But it does need to follow a process. Here’s a more in-depth breakdown of the stages that can help you turn your data into proactive learnings: 

  • Understand – We can’t improve what we don’t understand, so our solutions are always grounded in a deep understanding of a process and the data related to that process.
  • Clean – The real world is messy, and data is almost never what we’ve been told. To get data ready for both analysis and (eventually) machine learning, we have to clean and process it.
  • Investigate – Before we can teach a machine what is important in a dataset, we have to understand it ourselves. Investigating data is really about driving a deeper understanding of a dataset, its correlations and relationships, identifying patterns, and so on. It’s rare that complex processes have simple solutions, but it’s often relatively simple analysis that sets us on the path of a solution.
  • Engineer – Machines are not smarter than humans; they are just great at fast math. But to learn best, they must be taught in very specific ways. This step is about prepping a dataset to train a model in the best way possible, as well as about bringing new information to the model to give it the best chance of seeing the signal we want.
  • Build & Tune – This is the fun part — creating, testing, and tuning predictive models. This stage includes retraining models as new data becomes available, as well as assessing model performance over time and doing maintenance work to make sure the model continues to deliver value. A minimal end-to-end sketch of these stages follows below.
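
Here is that sketch: a compact, hypothetical walk through the clean, investigate, engineer, and build-and-tune stages using pandas and scikit-learn. The file name, columns, and target are placeholders, not a real Plutoshift workflow.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical dataset: daily pump readings with an energy-consumption target.
df = pd.read_csv("pump_history.csv", parse_dates=["date"])

# Clean: the real world is messy, so drop rows with missing readings.
df = df.dropna(subset=["flow_rate", "pressure", "runtime_hours", "energy_kwh"])

# Investigate: simple correlations often point the way to a solution.
print(df[["flow_rate", "pressure", "runtime_hours", "energy_kwh"]].corr())

# Engineer: bring new information to the model (here, a 7-day rolling average).
df["flow_rate_7d_avg"] = df["flow_rate"].rolling(7, min_periods=1).mean()

features = ["flow_rate", "pressure", "runtime_hours", "flow_rate_7d_avg"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["energy_kwh"], test_size=0.2, random_state=0
)

# Build & tune: fit a model and search over a small hyperparameter grid.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
    cv=3,
)
search.fit(X_train, y_train)
print("Held-out R^2:", search.score(X_test, y_test))
```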

Don’t let complex terminology overwhelm you when it comes to using ML. All it takes is 20 words and 1 open mind.

Executive guide to assessing your data readiness in 5 steps

Within large companies, data is stored across many systems. If we specifically look at companies with large operations infrastructure, there are many different types of data they have to work with — sensors, inventory, maintenance, financials, and more. In order to perform operational tasks, this data has to be centralized and piped into different workflows. The data needs to be ready for that! This allows the operations teams to use that information and ensure that the business is running smoothly. How do you assess data readiness? How do you make sure that the frontline teams are well equipped to perform their tasks? From what we’ve seen, here are 5 things you need from your data on a daily basis:

1. Ability to access the data
As fundamental as it seems, accessibility has always been a big issue. The data that’s stored across many systems is difficult to access. It’s inside arcane systems that are not friendly to use. In today’s world, anyone in the organization should be able to pull the data via a simple API.
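
For illustration, the sketch below pulls a week of readings from a hypothetical internal REST endpoint with a single call; the URL and parameters are placeholders.

```python
import requests

# Hypothetical internal endpoint; the point is that one call gets the data.
response = requests.get(
    "https://data.example.internal/api/v1/readings",
    params={"asset": "pump-17", "start": "2024-01-01", "end": "2024-01-07"},
    timeout=30,
)
response.raise_for_status()
readings = response.json()  # ready to hand off to the next workflow step
print(f"Pulled {len(readings)} readings")
```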

2. Centralizing the data
Once we pull the data from different systems, what do we do with it? Operations teams need the data to be centralized so that they can perform their tasks. These tasks usually require data from multiple sources. Centralizing the data and keeping it ready for use is a useful step here.

3. Preprocessing the data in an automated way
To make the data useful after it’s centralized, it has to be preprocessed to make sure it’s ready for different types of transformations. Operations teams need to prepare the data and pipe it into various data workflows. This is usually done manually using Excel spreadsheets. Automating this step will be very helpful so that operations teams can focus on high-impact items.
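
As a sketch of what automating this step can look like, the snippet below replaces the usual spreadsheet work with a small pandas function: parse timestamps, drop repeats, coerce bad entries, and put readings on a regular 15-minute grid. The file and column names are hypothetical.

```python
import pandas as pd


def preprocess(raw: pd.DataFrame) -> pd.DataFrame:
    """The usual spreadsheet steps, automated: parse, clean, and regularize."""
    df = raw.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])           # consistent timestamps
    df = df.drop_duplicates(subset=["timestamp", "sensor_id"])  # remove repeats
    df["value"] = pd.to_numeric(df["value"], errors="coerce")   # coerce bad entries
    return (
        df.set_index("timestamp")
        .groupby("sensor_id")["value"]
        .resample("15min")
        .mean()                                                 # regular 15-minute grid
        .reset_index()
    )


centralized = pd.read_csv("centralized_export.csv")  # hypothetical centralized extract
ready = preprocess(centralized)
```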

4. Piping the preprocessed data into different workflows
To put the preprocessed data to use, we need workflows that can transform raw data into useful information. A data workflow usually consists of 5-7 steps of transformation, depending on what we’re aiming to achieve. These steps can be done manually using Excel, but it’s not a good use of anyone’s time because they are repetitive computations. Having a set of prebuilt workflows and automating the work of pushing the data through these workflows is a big time-saver. In addition to that, it will lead to a significant increase in the accuracy of the work. Machine Learning is very impactful on this front.

5. System of record for the centralized processed data
Operations teams have to frequently access historical information for many reasons. Having a system of record that can store the centralized processed data is very useful. Operations teams need to reference them for various tasks such as internal reporting, knowledge-based tasks, learning, best practices, and more.

Data readiness is critical and lays the foundation for success.

5 ways for business leaders to package AI into bite-sized pieces

Large companies have been tackling the issue of AI for the last few years. Business leaders are often faced with the problem of figuring out how to use this technology in a practical way. Any new technology needs to be packaged into bite-sized pieces to show that it works. These “success templates” can then be used to drive enterprise-wide adoption. But should they do it all at once? How do you ensure that you’re not boiling the ocean? How can a company package AI into bite-sized pieces so that their teams can consume it? From what we’ve worked on with our customers and seen in the market, there are 5 steps to do it:

1. Start with the use case
It always starts with a use case. Before launching any AI initiative, the question you should ask is whether or not there’s a burning need today. A need qualifies as “burning” if it has a large impact on your business. If solved, it can directly increase revenue and/or margins for the company. We need to describe this burning need in the form of a use case. These use cases are actually very simple to describe as shown below:
– “We’re using too much electricity to make our beverage product”
– “We’re taking too long to fix our pumps when someone files a support ticket”
– “We’re spending a large amount of money on chemicals to clean our water”

2. Pick a data workflow that’s specific to an operation
Once you figure out the use case, the next step is to figure out the data workflow. A data workflow is a series of steps that a human would take to transform raw data into useful information. Instead of figuring out a way to automate all the workflows across the entire company, you should pick a workflow that’s very specific to an operation. This allows you to understand what it takes to get something working. We conducted a survey of 500 professionals to get their take on this and we found 78% felt supported by their team leaders when they embarked on this approach. Here’s the full report: Instruments of Change: Professionals Achieving Success Through Operation-Specific Digital Transformation

3. Be selective with data
Once you pick a workflow, you need to understand what specific data is going to support this particular workflow. If you try to digest all available data, it leads to chaos and suboptimal outcomes. If you’re disciplined around what data you need, it will drive focus on the outcomes and ensure that the project is manageable.

4. Create a benefits scorecard collaboratively
The main reason you’re deploying AI is to drive a specific outcome. This outcome should be measurable and should have a direct impact on the business. You should include all stakeholders in creating a benefits scorecard. The people implementing the AI solution should hold themselves accountable with respect to this benefits scorecard. The time to realize those benefits should be short, e.g., 90 days.

5. Have the nuts-and-bolts in place that enable you to scale
Let’s say you successfully execute on this PoC. What’s next? You should be able to replicate it with more use cases across the company. There’s no point in doing this if the approach is not scalable. Make sure you have a data platform that supports deploying a wide range of use cases. The nuts-and-bolts of the platform should enable you to compose many workflows with ease. What does “nuts-and-bolts” include? It includes automating all the work related to data — checking data quality, processing data, transforming data, storing data, retrieving data, visualizing data, keeping it API-ready, and validating data integrity.