Data Science: What Is Data Science?
DATA SCIENCE
Data science means extracting useful and meaningful information from the huge amount of data collected from a variety of sources, in order to make decisions.
Driven by its growing range of applications, data science has expanded enormously in this century.
Data science is used in a wide range of areas:
- Better decision making, in domains where tricky decisions always have to be made
- Predictive analysis, such as predicting airline delays or the demand for a product
- Pattern discovery or pattern recognition, in the many domains where hidden patterns in data serve business purposes
- Politics, to capture votes or influence voters. Not every prediction turns out to be true, since each prediction is a matter of probability rather than certainty.
How Does Data Science Work?
When applying data science, the first step is to ask the right question and explore the problem one is trying to solve.
Next, one needs data as input.
Some exploratory analysis and preparation is then performed on that data, such as data cleaning and, for text data, stemming or lemmatization.
Data modeling follows: a machine learning algorithm is chosen and the resulting model is trained on the prepared data.
Finally comes the visualization of results, which prepares the way to communicate them to the concerned user.
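The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records are made-up toy values, the names `raw_records`, `clean`, `fit_line`, and `predict` are hypothetical, and a straight-line least-squares fit stands in for whatever machine learning algorithm a real project would choose.

```python
# Hypothetical toy data: (input, target) pairs; None marks a missing value.
raw_records = [
    (1.0, 12.0),
    (2.0, 19.0),
    (3.0, None),   # missing target, dropped during cleaning
    (4.0, 41.0),
    (5.0, 52.0),
    (None, 30.0),  # missing input, dropped during cleaning
]


def clean(records):
    """Data cleaning: keep only rows where both values are present."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(rows):
    """Modeling: fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input value."""
    slope, intercept = model
    return slope * x + intercept


rows = clean(raw_records)   # prepare the data
model = fit_line(rows)      # train the model

# Communicate the results: a plain-text report stands in for a chart here.
for x, y in rows:
    print(f"x={x:.1f}  actual={y:.1f}  predicted={predict(model, x):.1f}")
```

In a real project each function would typically be replaced by library code, but the order of the steps stays the same: clean, model, then report.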
The essential non-technical traits of a data scientist are the curiosity to ask the right questions, the common sense to use data creatively, and the communication skills to convey what the data shows.
Technically, the first prerequisite is machine learning, the backbone of data science. The second is modeling: you need to be good at identifying which algorithms suit the data and the problem at hand.
Then comes statistics, a core foundation of data science.
Some programming is also required, since at least some code has to be written to execute a data science project.
Next come databases: how to handle them and how to work with the data they hold. Then come the tools used in data science, such as Python, R, SAS, and Jupyter Notebook.
As far as data warehousing is concerned, there are Extract, Transform, Load (ETL), Structured Query Language (SQL), and Hadoop for handling structured as well as unstructured data.
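To make the ETL and SQL ideas concrete, here is a small sketch using only Python's standard library. It rests on assumptions: the CSV text, the `sales` table, and the function names `extract`, `transform`, and `load` are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse. Hadoop is not covered here.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or a service.
RAW_CSV = """region,amount
north, 120.50
south,80
north,not_available
east,45.25
"""


def extract(text):
    """Extract: parse the raw CSV text into one dictionary per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, convert amounts to float, skip bad rows."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # amount could not be parsed as a number
        cleaned.append((row["region"].strip(), amount))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a SQL table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, discarded on exit
load(conn, transform(extract(RAW_CSV)))

# SQL then answers questions about the loaded data, e.g. totals per region.
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
totals = conn.execute(query).fetchall()
print(totals)
```

Keeping the three stages as separate functions means the transform rules can be checked on their own before anything is written to the database.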
R in Data Science
For data visualization, R provides very good capabilities. Other options include Tableau, a proprietary tool, and Cognos, an IBM product, among many more.
Data science is an area of great demand: the demand is currently huge, while the supply of skilled people is very low.
So, there is a huge gap between demand and supply. Data science is useful in health care: in predicting disease, in diagnosis, and in several other related activities.
It is useful in finance, in insurance companies, and in banks. Marketing is a horizontal function that cuts across all industries.
There is demand for data science there as well. Then, of course, there is the technology arena.
Globally, there is a huge demand. This is a critical skill that is required now and will continue to be required in the future.
Author: Dr. Swati Sharma