Data Data Everywhere and it’s all the same

We’ve all heard the stories of how our data is growing exponentially and, let’s face it, our storage spend is probably backing that up. Well, certainly that’s what the CFO will tell you!

But how often do we stop and really think about why it’s growing and how to control it?

I had some of the traditional thinking about this challenged in an interesting way a couple of weeks back by an old friend who has just taken on a new role with Actifio (www.actifio.com). As people from solution providers do, he was sharing some information on what they do and the value they provide, and then he threw up the following information:

It certainly struck a chord with me. The numbers were based on some IDC figures, and the gist of the graph is that a staggering 80% of the data in many organisations’ storage architectures is in fact copies of the production data sets.

According to IDC figures, around 80% of data in production storage is copies of the production data set

As you can see from the graph above, lots of that data is there for all the right reasons (dev and test, backups, DR), so it’s not that the capacity is wasted or shouldn’t be there. It’s not all Johnny in accounts and his holiday snaps!
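To put that 80% figure in perspective, here is a quick back-of-the-envelope sketch. The function name is made up for illustration and the only input is the 80% share quoted above; it simply turns "what fraction is copies" into "how much copy data sits alongside each unit of production data".

```python
def copies_per_production_unit(copy_fraction: float) -> float:
    """Return how many units of copy data exist for each unit of production data."""
    if not 0 <= copy_fraction < 1:
        raise ValueError("copy_fraction must be at least 0 and below 1")
    # If a fraction f of all stored data is copies, then (1 - f) is production data,
    # so the ratio of copies to production data is f / (1 - f).
    return copy_fraction / (1 - copy_fraction)


ratio = copies_per_production_unit(0.80)
print(f"Roughly {ratio:.0f} TB of copies for every 1 TB of production data")
```

In other words, if the 80% figure holds for your environment, every terabyte of production data is carrying around four terabytes of copies with it.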

Well, if all the data has a place and is valid, then what do we do about controlling it?

Firstly, there are definitely a number of technology solutions out there that can help. For example, I’ve worked with NetApp storage for around 9 years now, and their message has always been incredibly strong about storage efficiency, with some of the industry’s leading efficiency technologies around snapshots, de-duplication, compression and thin provisioning. Many other vendors now bring these technologies to market. Some do it well, some not so much, but the option is there.
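For anyone who hasn’t seen de-duplication up close, the basic idea can be sketched in a few lines of Python. This is a toy illustration only, not how NetApp or any other vendor implements it: the fixed 4 KiB block size, the function name and the sample data are all assumptions made for the example.

```python
import hashlib


def dedupe_stats(data: bytes, block_size: int = 4096) -> tuple[int, int]:
    """Return (total_blocks, unique_blocks) when data is split into fixed-size blocks."""
    seen = set()
    total = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Identical blocks produce identical fingerprints, so they are stored once.
        seen.add(hashlib.sha256(block).digest())
        total += 1
    return total, len(seen)


# Hypothetical sample: one block repeated ten times, plus one different block.
sample = (b"A" * 4096) * 10 + b"B" * 4096
total, unique = dedupe_stats(sample)
print(f"{total} blocks written, {unique} blocks actually stored")
```

The more copies of the same data you hold, the more identical blocks there are to collapse, which is why these features tend to pay off on storage that is full of copies.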

What else can we do to control the growth of data in our organisations? I did a little research and came up with 5 tips that you can follow, and then one thing you can look at as an emerging trend that may change the way you look at managing data in your business:

Classify and understand your data – know where it is, who has access to it, and whether anyone actually accesses it at all
Store it in the right place – we hear lots about automated tiering, but maybe more importantly, ensure you understand what storage tier your data should sit in and place it there at the outset
Look at an archiving policy – if you’re applying pressure to your production storage, look at what is filling it and ask whether it really needs to be there. If no one has accessed data for 5 years, does it need to sit on your production storage?
Manage data retirement – how much data in your organisation no longer has an owner? Look at how a strong governance solution can identify this data and help you to remove or archive it
Storage efficiency – earlier I mentioned NetApp and their storage efficiency technology. If your storage solution can dedupe and compress, make sure you use those features where you can
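As a starting point for the classification and archiving tips, a rough first pass can be done with a small script. The sketch below is only an illustration under a few assumptions: the function name is invented, the 5-year threshold is just the example from the tip above, and it trusts the file system’s last-access timestamp, which some systems do not update reliably (for example, volumes mounted with access-time updates disabled).

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days=5 * 365, now=None):
    """Yield (path, size_bytes, days_since_access) for files not accessed recently."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds in a day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # skip files we cannot read (permissions, broken links)
            if info.st_atime < cutoff:
                days_idle = int((now - info.st_atime) // 86400)
                yield path, info.st_size, days_idle
```

Calling `list(find_stale_files("/some/share"))` (the path here is a placeholder) gives you a list of archive candidates and their sizes, which is usually enough to start a conversation with the data owners.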

Back to the start of this article and my meeting with the chaps at Actifio: where do they sit in this? Well, those tips are all great if the data we are talking about is no longer needed or can be shifted out of the production environment, but what if the data is still key and critical? If you think about the graph I showed, most of that data is key to the business. It’s part of DR and backup, it operates in QA and dev environments, so it is needed within that production environment.

How do we deal with that, then? That’s where this emerging trend of copy data virtualisation can help.

Copy data virtualisation is an emerging trend for managing storage growth

What’s copy data virtualisation? It’s the ability for a solution from companies like Actifio or Catalogic (www.catalogicsoftware.com) to take a copy of production data and store it outside of the production environment. Unlike archiving or traditional backups, though, the data is housed in such a way that it can be manipulated and presented back to the business instantly for a range of uses. It’s not only a really efficient model for backup and recovery but also great for presenting test and dev environments, presenting data to a data analytics solution, or maybe extracting data and moving it to the cloud. All in all, it provides a hugely efficient and flexible way of handling the challenge of so many copies of our data sitting in production storage systems and, as we all know, efficiency and flexibility are all part of the future for business IT.
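To make the idea a little more concrete, here is a toy model of one stored copy being presented as several independent virtual copies. This is purely a conceptual sketch and not a description of how Actifio or Catalogic actually work: the class names, the block-level copy-on-write approach and the sample data are all assumptions chosen to illustrate the principle.

```python
class GoldenCopy:
    """A single stored copy of production data, held as a list of blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)


class VirtualCopy:
    """A writable view of a golden copy that stores only the blocks it changes."""

    def __init__(self, golden):
        self._golden = golden
        self._changes = {}  # block index -> replacement block

    def read(self, index):
        # Changed blocks come from this copy; everything else from the golden copy.
        return self._changes.get(index, self._golden.blocks[index])

    def write(self, index, block):
        if not 0 <= index < len(self._golden.blocks):
            raise IndexError("block index out of range")
        self._changes[index] = block  # the golden copy itself is never modified

    def blocks_stored(self):
        return len(self._changes)


golden = GoldenCopy(f"block-{i}" for i in range(1000))
dev = VirtualCopy(golden)        # presented to a dev/test team
analytics = VirtualCopy(golden)  # presented to an analytics tool

dev.write(7, "patched by dev")
total_stored = len(golden.blocks) + dev.blocks_stored() + analytics.blocks_stored()
print(dev.read(7), "|", analytics.read(7), "|", total_stored, "blocks stored")
```

In this toy model, two usable copies cost one extra block rather than two thousand, and each consumer still sees its own independent version of the data.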

Copy data virtualisation certainly addresses the data growth challenge in a new and interesting way, but don’t rule out the more traditional approaches we listed as well. Data growth is only going to continue to be a massive challenge for all of us charged with delivering business IT services, regardless of the size of the organisation. Don’t fear though: there is plenty of tech out there to help, with some great traditional approaches which are still hugely valid, but also some clever new emerging solutions that can change the way we manipulate and handle our data in the future.

If you have any questions, please feel free to contact me on Twitter or hunt me down on LinkedIn.

3 thoughts on “Data Data Everywhere and it’s all the same”

Michael Troiano (@miketrap) says:

October 28, 2014 at 6:33 pm

Thanks for the post, Paul. Well stated.

Storage Alchemist (@skenniston) says:

November 8, 2014 at 4:21 pm

Paul, I echo the sentiments about the post. Nice work!

It is clear that the paradigm of data management has to evolve. I do wonder about your 5 tips, however. I agree with them all 100%, but in practice I have yet to see clients embrace them. For example, classifying data is definitely the best and smartest way to get your arms around data management. There have been tools put forth and mandates from management to “classify your data” but at the end of the day, without a lot of policies to guide people in what data has what value, it is very hard to implement, so it just doesn’t happen.

This is why tools such as Catalogic and Actifio help with this growing problem. If you are going to have all that data, let’s find the best way to put it to work for the business. Catalogic’s software-only implementation for NTAP and VMware allows clients to leverage the data and snapshots they have for multiple use cases: not only recovery, but DR, analytics and the emerging test/dev to dev/op trend.

I look forward to more of your pieces Paul.
Steve

stringy99 says:
  
  November 9, 2014 at 1:38 am
  
  Thanks for the comments, Steve, and you are quite right. But like much in enterprise technology, there is very rarely a magic bullet to solve the problem, and getting control of your data is built from multiple layers: traditional technology, policy, user education and, of course, emerging technology such as yours around single copy data management.
  
  I think the best we can do for our customers is to educate and ensure they have access to a great suite of tools to help control that data growth.