The other night myself and Mrs Techstringy were discussing a work challenge. She works for a well-known charity and one of her roles is to book locations for fundraising activities, on this occasion the team were looking at booking places at railway stations and considering a number of locations, however all they really had to go on was a “gut feeling”.
As we discussed it we did a bit of searching and came across this website http://www.orr.gov.uk/statistics/published-stats/station-usage-estimates which contained information of footfall in every UK railway station over the last 20 years, this information was not only train geek heaven, it also allowed us to start to use the data available to make a more informed choice and to introduce possibilities that otherwise would not have been considered.
This little family exercise was an interesting reminder of the power of data and how with the right analysis we can make better decisions.
Using data to make better decisions is hardly news, with the ever-increasing amounts of data we are collecting and the greater access to powerful analytics, machine learning and AI engines, all of us are already riding the data train taking us to a world of revolutionary ideas, aren’t we?
The reality is, that most of us are not, but why?
For many, especially with data sets gathered over many years, it’s hard, hard to package our data in such a way that we can easily present it to analytics engines and get something useful from it.
But don’t let it stop you, there is potentially huge advantage to be had from using our data effectively, all we need is a little help to get there.
So what kind of steps can we take so we too can grab our ticket and board the data train?
Understand our data
The first thing may seem obvious, understand our data, we need to know, where is it? what is it? is it still relevant?
Without knowing these basics, it is going to be almost impossible to identify and package up the “useful” data.
The reality of data analytics is we just can’t throw everything at it, remember the old adage garbage in, garbage out, it’s not changed, if we feed our data analytics elephant a lot of rubbish, we aren’t going to like what comes out the other end!
Triage that data
Once we’ve identified it, we need to make sure we don’t feed our analytics engine a load of nonsense, it’s important to triage, throw out the stuff that no one ever looks at, the endless replication, the stuff of no business value, we all store rubbish in our data sets, things that shouldn’t be there in the first place, so weed it out, otherwise at best we are going to process irrelevant information, at worst we are going to skew the answers and make them worthless.
Make it usable
This is perhaps the biggest challenge of all, how do we make our massive onsite datasets useful to an analytics engine.
Well we could deploy an on-prem analytics suite, but for most of us this is unfeasible and the reality is, why bother? Amazon, Microsoft, Google, IBM to name but a few have fantastic analytics services ready and waiting for your data, however the trick is how to get it there.
The problem with data is it has weight, gravity, it’s the thing in a cloud led world that is still difficult to move around, it’s not only its size that makes it tricky, but there is our need to maintain control, meet security requirements, maintain compliance, these things can make moving our data into cloud analytics engines difficult.
This is where building an appropriate data strategy is important, we need to have a way to ensure our data is in the right place, at the right time, while maintaining control, security and compliance.
When looking to build a strategy that allows us to take advantage of cloud analytics tools, we have two basic options;
Take our data to the cloud
Taking our data to the cloud is more than just moving it there, it can’t just be a one off copy, ideally in this kind of setup, we need to move our data in, keep it synchronised with changing on-prem data stores and then move our analysed data back when we are finished, all of this with the minimum of intervention.
Bring the cloud to our data
Using cloud data services doesn’t have to mean moving our data to the cloud, we can bring the cloud to our data, services like Express Route into Azure or Direct Connect into AWS means that we can get all the bandwidth we need between our data and cloud analytics services, while our data stays exactly where we want it, in our datacentre, under our control and without the heavy lifting required for moving it into a public cloud data store.
Maybe it’s even a mix of the two, dependent on requirement, size and type of dataset, what’s important is that we have a strategy, a strategy that gives us the flexibility to do either.
Once we have our strategy in place and have the technology to enable it, we are good to go, well almost, finding the right analytics tools and of course what to do with the results when we have them, are all part of the solution, but having our data ready is a good start.
That journey does have to start somewhere, so first get to know your data, understand what’s important and get a way to ensure you can present it to the right tools for the job.
Once you have that, step aboard and take your journey on the data train.
If you want to know more on this subject and are in or around Liverpool on July 5th, why not join me and a team of industry experts as we discuss getting the very best from your data assets at our North West Data Forum.
And for more information on getting your data ready to move to the cloud, check out a recent podcast episode I did with Cloud Architect Kirk Ryan of NetApp as we discuss the why’s and how’s of ensuring our data is cloud ready.
One thought on “All Aboard the Data Train”