Others Interview Questions and Answers (265) - Page 8

Explain "Velocity" in Big Data Parlance

A person stuck in a traffic jam will not wait for days to get information about the latest traffic situation so that he can take the shortest clear route home; he needs the information instantly. This is the role of velocity in big data: huge volumes of data need to be processed very fast, and the analysis needs to be completed quickly. The technologies currently keeping things fast for big data are streaming data processing (or complex event processing) and in-memory processing. There are several proprietary and open source tools for this; examples include MapReduce, S4, and Storm. Another factor to keep in mind is information retrieval, where technologies like NoSQL help, since they are optimized for fast retrieval of pre-computed information.
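As a toy illustration of the in-memory streaming idea (the sensor events and numbers here are invented), a sliding-window counter processes events as they arrive instead of batch-processing them from disk:

```python
from collections import deque

class SlidingWindowCounter:
    """Counts events over the last `window_seconds` seconds, entirely in memory."""
    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.events = deque()  # timestamps of recorded events, oldest first

    def record(self, timestamp):
        self.events.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        self._evict(now)
        return len(self.events)

    def _evict(self, now):
        # Drop events older than the window so state stays bounded
        # no matter how long the stream runs.
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()

# Simulated traffic-sensor events arriving at t = 0, 10, 30, 70 seconds.
counter = SlidingWindowCounter(window_seconds=60)
for t in (0, 10, 30, 70):
    counter.record(t)
print(counter.count(70))  # only t=30 and t=70 fall in the last 60 s -> 2
```

Real stream processors such as Storm distribute this kind of bounded, incremental state across a cluster; the key property is the same: each event is handled once, on arrival, with no full re-scan of history.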
Explain "Variety" in Big Data Parlance

One of the underlying characteristics of big data is the diversity of its data sources, which results in a variety of data types. Data is not perfectly ordered and ready for processing, and one of the biggest challenges is extracting meaningful information out of it. This is the role of variety in big data. The first technique used is SQL-NoSQL integration: bringing the relational and non-relational worlds together provides the most powerful analytics, the best of both, as well as storage solutions for various data types. Linked data and semantics are two techniques which have gained some popularity too. NLP plays a role in entity extraction, and statistics plays a big role in flattening out and extracting data sets. The open source statistical language R provides great integration points with several big data tools and solutions. Apache projects also have a couple of solutions which cater to this space and which, along with some proprietary technologies, are currently used to solve the problems of variety.
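A minimal sketch of the variety problem, with made-up sources and field names: normalizing a relational-style CSV export and a schemaless JSON document into one common record shape, then running a toy "entity extraction" over the free text:

```python
import csv, io, json, re

# Two hypothetical sources describing the same "customer feedback" concept:
# a relational-style CSV export and a schemaless JSON document.
csv_text = "id,name,comment\n1,Alice,Loved the delivery speed\n"
json_text = '{"user": {"id": 2, "name": "Bob"}, "text": "Call me at 555-0100"}'

def from_csv(text):
    # Map flat CSV columns onto the common record shape.
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": int(row["id"]), "name": row["name"], "text": row["comment"]}

def from_json(text):
    # Map the nested JSON layout onto the same shape.
    doc = json.loads(text)
    yield {"id": doc["user"]["id"], "name": doc["user"]["name"], "text": doc["text"]}

records = list(from_csv(csv_text)) + list(from_json(json_text))

# Toy "entity extraction": pull phone-number-like strings out of free text.
phones = [m for r in records for m in re.findall(r"\b\d{3}-\d{4}\b", r["text"])]
print(len(records), phones)  # 2 ['555-0100']
```

Real pipelines do the same mapping at scale, and real entity extraction uses NLP models rather than regular expressions, but the shape of the problem, many schemas in, one schema out, is the same.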
What is BigData and how much big it is?

Data in a form which cannot be represented in databases is known as unstructured/semi-structured data. A collection of a huge set of such data, which conventional software is unable to capture, manage, and process in a stipulated amount of time, is known as "big data". It is not an exact term; it is characterized by the accumulation of exponentially growing unstructured data. It describes data sets that are large and raw, and that conventional relational databases are unable to analyze.
As for 'how big is big', it is a moving target whose size increases with each passing day. As of 2012 it is represented by a few dozen terabytes to many petabytes of data in a single data set. It also depends on the context in which the term is used; for example, the sizes involved differ greatly if we compare astronomical data with data collected from an online feedback form.
Notwithstanding the fact that the data itself is overwhelming, the magnitude and complexity of extracting information from it and making sense of it is "big" too. Scientists around the world are looking for answers to these complexities; a good example is http://amplab.cs.berkeley.edu/.
Where do we find the growth of "Big Data"?

Mobile devices, remote sensing technologies, software logs, cameras, microphones, radio-frequency identification, wireless sensors, weather satellites and sensors, scientific experiments, social networks, internet text and documents, internet search indexing, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and often interdisciplinary scientific research, military surveillance, medical records, photography archives, video archives, and large-scale e-commerce all contribute. As more sensors, mobile devices, and cameras are added to the network, as more people share photos and music, and as more netizens join social networks, the size increases. A few examples of systems and the amount of data they generate:

CERN: The Large Hadron Collider project at CERN produced 22 PB of data this year, even after accepting only about 1% of the data generated, which arrives at roughly 100 MB per second.

FLICKR: More than 4 billion queries per day; ~35M photos in squid cache (total); ~2M photos in squid's RAM; ~470M photos, in 4 or 5 sizes each; 2 PB of raw storage.

FACEBOOK: As of July 2011, 750 million worldwide users uploaded approximately 100 terabytes of data every day to the social media platform. Extrapolated over a full year, that's enough data to manage the U.S. Library of Congress' entire print collection 3,600 times over.

Not only that, the growing per-capita capacity to store information is also responsible for this data explosion. Data storage was very expensive about a quarter of a century ago; as storage prices came down, more and more data got stored, and by current estimates the per-capita capacity to store information has roughly doubled every 40 months since the 1980s.

Is that the end of the story as far as the sources and rationale behind the growth of data? No; please add enterprise 'structured' data to the list too, which can provide wonderful insights. Metadata, data about data, which is growing twice as fast as digital data itself, also adds to the list.
Why is such a buzz about "Big Data"?

First let's look at some statistics. It appeared about 174,000 times in the New York Times Technology section, about 11,040 times in CNET news articles, and in 75 O'Reilly articles over the last year. What is it? "BIG DATA".
Tech companies have dedicated entire pages of their websites to it:

•IBM - http://www-01.ibm.com/software/data/bigdata/
•CISCO - http://www.cisco.com/en/US/solutions/ns340/ns517/ns224/big_data.html
•Oracle - http://www.oracle.com/us/technologies/big-data/index.html
•EMC2 - http://www.emc.com/microsites/bigdata/index.htm

A report was published by the World Economic Forum recently

Global Consulting firms like McKinsey have “big” things to say about it

The press can't stop talking about it; all the leading newspapers are discussing it, and tech websites have it all over. Why? Let's delve deeper into the matter.

According to Forrester's research, only about 5% of the data available to enterprises is effectively utilized; the rest is too difficult and too expensive to analyze. A McKinsey Global Institute report claims a retailer embracing big data has the potential to increase its operating margin by more than 60%. As the price of storing data came down and companies began to realize its hidden potential, they started focusing on it to drive business goals, set future goals, gather customer feedback, and so on. They realized that this trove of hidden treasure can also be used to create business value. This has been corroborated in the report published by the WEF, which declared data a new class of economic asset, like currency or gold. When used correctly, big data can yield insights to develop, refine, or redirect business initiatives; discover operational roadblocks; streamline supply chains; improve operational efficiency; better understand customers; create new revenue streams and differentiated competitive advantage; propose entirely new business models; and develop new products and services.
What is the importance of Security for Big Data Companies?

Security is very important for big data companies. There are two ways it can have adverse effects.

First, by storing information that is not legal to store, the company makes itself vulnerable; accidentally leaking information such as credit card details or social security numbers can also do a lot of damage to the company's reputation.

Second, an outright security breach by hackers can expose the entire data set. Technology plays a crucial role in ensuring security, but there is a need to control it more effectively.
What values are created by "Big Data"?

Education, Physics, Economics, Astronomy, Telecom, Healthcare, Financial Services, Management, Transportation, Digital Media, Retail, Law Enforcement, Energy and Utilities, Social Media, Online Services, Security are some of the domain and areas where “big data” is already creating a lot of value today and the list is increasing day by day.

First let's see what generic values it unlocks. According to McKinsey:
1) Big data can unlock significant value by making information transparent and usable at much higher frequency.
2) As organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance. Leading companies are using data collection and analysis to conduct controlled experiments and make better management decisions; others are using data for everything from basic low-frequency forecasting to high-frequency nowcasting to adjust their business levers just in time.
3) Big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
4) Sophisticated analytics can substantially improve decision-making.
5) Big data can be used to improve the development of the next generation of products and services.

Now let’s take a few examples from these domains and see how “big data” has added value to them.

Heavy Industrial Machinery: At GE, complex, high-volume sensor data has been in use for years to monitor and test industrial equipment such as turbines, jet engines, and locomotives. Big data techniques are now used to predict the performance and maintenance needs of these heavy machines, which also helps avoid unplanned downtime. They have enabled GE to use many more parameters and data points than were in use previously.

Retail: Sears was using about 10% of the data produced by its stores, and it took eight weeks to correctly calculate "price elasticity", a crucial metric in retail. With Hadoop and big data techniques, Sears is not only able to use 100% of the data it produces, it calculates price elasticity in almost real time. Big data has helped Sears set more competitive prices and move inventory according to current demand.
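Price elasticity itself is a simple ratio: the percentage change in quantity demanded divided by the percentage change in price. A rough sketch with invented numbers (not Sears' actual method, which runs over far larger data sets):

```python
def price_elasticity(q_old, q_new, p_old, p_new):
    """Elasticity of demand: % change in quantity over % change in price."""
    pct_quantity = (q_new - q_old) / q_old
    pct_price = (p_new - p_old) / p_old
    return pct_quantity / pct_price

# Hypothetical: cutting a price from $10 to $9 lifts weekly sales
# from 100 to 120 units.
e = price_elasticity(100, 120, 10.0, 9.0)
print(round(e, 2))  # -2.0 -> elastic demand: a 1% price cut yields ~2% more sales
```

The hard part in retail is not this arithmetic but computing it per product, per store, per week over the full sales history, which is exactly where the batch power of Hadoop-style processing comes in.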

Medical Science: The National Cancer Institute, along with UC Santa Cruz, is planning to create the world's largest repository of cancer genomes. They claim it will be used in "personalized" or "precision" care, whereby the treatment targets specific genetic changes found in an individual patient's cancer cells. It helps them complete the molecular characterization of cancers, which will be of immense value. The entire setup runs with the help of "big data" techniques.
What is the future of "Big Data"?

The future of "big data" from a business perspective looks exciting, so let's take a look at what technology has in store for it. The classic problem of "big data" is assembling the data and preparing it for analysis. The multitude of systems leaving digital traces store data in different formats; assembling, standardizing, normalizing, cleaning up, and selecting the crème de la crème of the data for analysis is the crux of the problem. This is currently handled by Hadoop and other technological advances such as high-speed data analysis and in-memory processing. The challenges that remain are processing huge data sets very quickly, defining a platform, making the technology much more accessible, removing its complexities, and making the data more secure. The speed of data processing will improve as in-memory and related technologies evolve. The next challenge will be to create a high-availability platform that takes the complexity out of processing and analyzing huge amounts of data. Building such a platform will involve developing tools that quickly digest huge amounts of data and extract the juice, and the process should be made simple enough that a layman in technology can perform the task. Techniques for effectively using the pointers that emerge post-analysis will also gain attention.
What is Erlang?

Erlang is a programming language designed for developing robust systems of programs that can be distributed among different computers in a network. Developed in the 1980s at the Ericsson Computer Science Laboratory to address a then-unfulfilled need in telecommunications programming, it has since evolved into a general-purpose, concurrency-oriented functional programming language suited for fault-tolerant, distributed, soft real-time systems with error recovery. It supports hot swapping, so code can be changed without stopping a system.
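Erlang's share-nothing, message-passing process model can be loosely mimicked in Python with threads and queues. This is only a rough analogy with hypothetical names (real Erlang processes are far lighter-weight, supervised, and restartable), but it shows the core idea: processes own their state and communicate only through messages.

```python
import queue
import threading

def echo_process(inbox):
    """A 'process' that owns its state and communicates only via messages."""
    while True:
        sender, msg = inbox.get()
        if msg == "stop":
            break
        sender.put(("echo", msg))  # reply with a message, never shared memory

inbox, reply = queue.Queue(), queue.Queue()
threading.Thread(target=echo_process, args=(inbox,), daemon=True).start()

# Send a message and wait for the asynchronous reply.
inbox.put((reply, "hello"))
result = reply.get()
print(result)  # ('echo', 'hello')
inbox.put((None, "stop"))
```

Because the only coupling between the two sides is the message format, either side can in principle be replaced independently, which is the property Erlang pushes much further with hot code swapping.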
What is Pair Programming?

Pair programming is the practice of two programmers working together at the same terminal on the same task. One person is the driver and the other is the observer. The driver has control of the pen/keyboard and contributes by coding or drawing design diagrams. The observer wears the hat of a strategist: he or she watches what the driver is doing, analyzing its correctness and how well it fits into the big picture of the software being developed. Developers switch roles, as well as pairs, over the duration of a project.
How does Code ownership improve quality of code in Pair Programming?

It is true that when developers own the code they write, they try to make it as good as possible, because, after all, their reputation is at stake. However, individual code ownership diminishes the chances of peer review, which has been found to be very helpful in increasing code quality. During pair programming, the code and design are reviewed immediately, so if a developer makes a mistake due to an oversight, it is very likely that his or her pair will point it out. Also, when dealing with a particularly difficult coding or design problem, it has been found that two developers will explore a larger solution space and find a better solution.
Who are software architects? What role do they play in the industry?

Architects create architectures, and their responsibilities encompass all that is involved in doing so. This would include articulating the architectural vision, conceptualising and experimenting with alternative architectural approaches, creating models, and validating the architecture against requirements and assumptions.

The role goes beyond technical activities to strategic and sometimes consulting ones. A sound sense of business and technical strategy is required to envision the "right" architectural approach to the customer's problem set. Activities in this area include creating technology roadmaps, making assertions about technology directions, and determining their consequences for technical strategy and, hence, architectural approach.
What are the qualities and qualifications that a Software Architect needs to possess?

The best architects command respect in the technical community and are good strategists, organisational consultants and leaders. Above all, they should be highly creative.
What is the importance of architecting software - say business solutions software?

Architecture plays an important role in dealing with complexity of the system in the following ways.
•Understanding of the system at a fairly high level, by all the stakeholders.
•Detailed understanding of the portions of the system that they work on, by all the developers.
•Partitioning work effectively across all the developers.
•Evaluating and making tradeoffs among requirements of differing priority.
•Anticipating major changes that will occur and designing a system that accommodates such changes flexibly and minimises maintenance cost.
•Facilitating the re-use of existing components, frameworks, libraries or third-party applications.
Where does Software Architecture start and where does it end in a typical software product/project scenario?

Software architecture extends from the initial understanding of customer requirements to the final delivery of the product. Beyond delivery, the architect can play a consultant's role in the evolution of subsequent versions of the product.
What are the different schools of thought in Software Architecture?

The main schools of thought are oriented toward:
•Patterns
•Functionality
•Reference Models

The pattern-oriented model aims to collect and reuse architectural patterns: as software architecture evolves in an organization, it leaves behind a collection of patterns, which can be reused to tackle similar issues. The functionality-oriented model identifies the individual components of the problem domain, their functionality, and their communication patterns with other components. The reference-model view uses existing reference models such as RM-ODP. A total architecting solution derives from all these views.
How has software architecture evolved over the years and where is it going?

In the early days, software architecture was largely an ad hoc affair, with descriptions relying on informal flowchart diagrams that were never maintained. Over the years, software architecture has evolved as system complexity has increased. Architecture description languages (ADLs) have emerged but have yet to find wide acceptance.
How can a typical software professional evolve into a Software Architect?

An architect needs a thorough knowledge of the product and service domain, relevant technologies, development processes, business strategy, and organizational politics, and must also possess consulting and leadership skills. From a technical role, he or she needs to acquire all these skills (technology, business strategy, consulting, leadership) to evolve into a software architect.
What is 'R' language?

R is a language and environment for statistical computing and graphics, similar to the S language originally developed at Bell Labs. It's an open source solution for data analysis, supported by a large and active worldwide research community. R is a case-sensitive, interpreted language. We can enter commands one at a time at the command prompt (>) or run a set of commands from a source file. It offers a wide variety of data types, including vectors, matrices, data frames, and lists.
Explain some of the features of 'R' language

- Most commercial statistical software platforms cost thousands of dollars; R is free!

- R is a comprehensive statistical platform, offering all manner of data analytic techniques.

- R has state-of-the-art graphics capabilities. If we want to visualize complex data, R has the most comprehensive and powerful feature set available.

- R is a powerful platform for interactive data analysis and exploration. For example, the results of any analytic step can easily be saved, manipulated, and used as input for additional analyses.

- Getting data into a usable form from multiple sources can be a challenging proposition. R can easily import data from a wide variety of sources, including text files, database management systems, statistical packages, and specialized data repositories.
