IEEE and Big Data
Christine “Chris” Miyachi chairs IEEE Cloud Computing. She works as a systems engineer and software architect at Xerox Corporation, and is a graduate of the Massachusetts Institute of Technology. In this Q&A, Miyachi discusses the implications of big data and the Internet of Things (IoT) for the cloud and for her professional work.
Q: Chris, would you share some thoughts on the various drivers for cloud computing and its relationship with big data?
A: Simply put, cloud computing services are a way for enterprises with big data on their hands to cost-effectively manage related challenges such as storage, processing, and analysis. The companies that offer cloud services make the investment in software and hardware and offer access at an affordable price. So cloud services allow customers to avoid the cyclical, capital-intensive investments that otherwise would be needed to keep pace with maintaining storage, processing and analysis technologies. Rather than buying and maintaining expensive IT resources for intermittent use, one can turn to cloud services and take advantage of their economies of scale and a la carte opportunities.
This scenario applies to research labs as well as start-ups. Today I see companies use virtual servers that connect with the cloud. This arrangement allows enterprises to be more nimble and less capital intensive. That said, many companies continue to maintain their own data infrastructure, for a variety of reasons.
I would point readers to the IEEE Big Data Initiative, which provides valuable resources on this topic and related areas.
Q: As the IoT connects a zillion devices and they all generate data, and “big data” becomes “much bigger data,” are there concerns within the cloud computing community that current technology has limitations?
A: Right now in cloud computing it almost appears as if there’s an infinite amount of data storage – but there isn’t. People in the field that I’ve spoken with tell me that, at some point, we’re going to fall off a cliff. And they think that IoT is that cliff. We can see a few possible strategies for dealing with that scenario. One is processing data at the edge of the network and only returning a portion of it to central storage and processing. Another possible strategy is dividing up the data into more manageable portions, if you will, so that it’s logically partitioned. Dividing the problem down to smaller microportions will decrease the size of the data needed.
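As a rough illustration of the first strategy, processing at the edge, the sketch below reduces a batch of raw sensor readings to a small summary before anything is sent to central storage. It is a minimal example under stated assumptions, not a description of any particular product: the function name `summarize_at_edge`, the `threshold` parameter, and the sample readings are all hypothetical.

```python
from statistics import mean


def summarize_at_edge(readings, threshold):
    """Reduce raw readings to a compact summary (hypothetical helper).

    Only the summary and the out-of-range values would be forwarded to
    central storage; the rest of the raw data stays on the device.
    """
    if not readings:
        return {"count": 0, "mean": None, "min": None, "max": None, "anomalies": []}
    return {
        "count": len(readings),
        "mean": mean(readings),
        "min": min(readings),
        "max": max(readings),
        # Keep individual values only when they exceed the assumed threshold.
        "anomalies": [r for r in readings if r > threshold],
    }


# Made-up temperature readings from a single device.
raw = [20.1, 20.3, 19.8, 35.2, 20.0]
summary = summarize_at_edge(raw, threshold=30.0)
print(summary)
```

Here five readings collapse into one small record plus a single flagged value, which is the kind of reduction the edge strategy relies on. What counts as worth forwarding would depend on the application.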
Q: We’ve just mentioned storage as a potential chokepoint with big data, but are there other aspects of cloud computing that might prove inadequate under the advent of IoT?
A: I’m not an expert on IoT, but I can tell you that processing and analysis are already an issue with the vast amounts of unstructured data that we currently have in storage.
I worked a lot with embedded systems before “big data” ever got its name. These embedded systems generate tremendous amounts of data that isn’t immediately used. It’s completely unstructured and it wasn’t made to be analyzed by any big data systems. Companies that have had legacy embedded systems now want to start analyzing this data because it might contain a goldmine of insights into how their products have been used.
Maybe that’s a proxy for the advent of IoT-generated big data. When I mentioned “logical partitioning,” it applies here. Maybe we’ll need to narrow our focus and look for one specific insight from massive data sets. Or use sampling to cut the processing down to a manageable scale.
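One way to picture the sampling idea is reservoir sampling, which keeps a fixed-size uniform random sample from a stream whose total length is not known in advance. This is offered only as one possible technique, not as the approach Miyachi has in mind; the function name `reservoir_sample` and the example stream are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Uses the classic single-pass reservoir method, so the stream never
    has to fit in memory all at once.
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Made-up stream of one million record IDs, reduced to 1,000 for analysis.
subset = reservoir_sample(range(1_000_000), k=1000, seed=42)
print(len(subset))
```

Analysis then runs on the sample rather than the full data set, trading some precision for a large cut in processing.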
Q: It sounds like these issues will be among the contemporary challenges for young professionals coming into the field. What are the prospects for careers in cloud computing?
A: A number of very large companies, many medium-sized companies, and a slew of start-ups are hiring young people out of school, so there’s no problem with the pipeline of talent. And newcomers have an immediate opportunity to tackle challenges that they didn’t learn about in school, because things are moving so fast that curricula can’t keep pace.
That said, IEEE Cloud Computing is developing young professional tracks for our conferences and bringing in young entrepreneurs to give “elevator talks” – five minutes on what they do and why. IEEE needs to reach out to establish the benefits of getting involved as a volunteer. We’ve got an upcoming IEEE Technology Time Machine conference taking place 20-21 October 2016 in San Diego, California. And we’re holding a cloud computing track in tandem with the annual International Conference on Consumer Electronics (ICCE) occurring 8-11 January 2017 in Las Vegas, Nevada, where we hope to invite young professionals to speak.
I would add something here about cloud computing and young people getting into the field. Computing today is cloud computing – there’s really no distinction. Students and young professionals don’t remember a time when there wasn’t an Internet, with data stored in the cloud.
Q: Do any issues in the cloud computing field keep you up at night?
A: Yes, two aspects of security in the cloud concern me. One is simply the potential for loss of data. The people I talk to in the industry think that there is potential for disaster if data in the cloud is lost, and that people aren’t properly protected. I’ve experienced this on a personal and professional level. A few weeks ago I was in Japan and our IEEE Cloud Computing Community Facebook page disappeared. We spent probably five years building it up. It’s a big loss and Facebook has not been able to help us recover the page. Some businesses I know rely solely on their Facebook page as their website, but luckily for us, we have a website and LinkedIn, Twitter, and Flipboard accounts.
The other topic that concerns me is the way companies collect data about people and use that data. There’s got to be a revolution there. Right now companies are collecting key data about individuals and we don’t know it, other than to be surprised when ads pop up that reflect our consumer habits. And I think there’s got to be a big change to that practice.
Q: So one issue is redundancy for security, the other is the transparency of data collection practices and the issue of who owns that data and controls its use?
A: Yes, and whether those issues should be resolved by regulation or by the market, I’m not sure. Perhaps enlightened, transparent data policies will become a business differentiator and people won’t do business with firms that don’t provide transparency and security for their data. Many firms scrub the data of any personally identifiable information before they use it. That’s one solution. For more on this topic, readers can consult IEEE Cloud Computing magazine, which has published some excellent articles on these issues.