Data Science AND Data Engineering
The current century is the century of data. Since the onset of internet-based technologies, there has been massive consumption and generation of data. The ability to store, transfer, and retrieve data has led to the creation of many tools and technologies, as well as newer disciplines for its study. Two such disciplines are Data Science and Data Engineering, and today we are going to discuss the difference between them.
- Data Science:
Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies. Mining large amounts of structured and unstructured data to identify patterns can help an organization rein in costs, increase efficiencies, recognize new market opportunities and increase the organization's competitive advantage.
The term data science has been in use since the 1960s, when it served as a substitute for computer science. In 2001, however, data science was presented as an independent discipline. Since then, researchers and scientists have tried to explain its meaning based on their own understanding and work.
According to Wikipedia, "Data Science is an interdisciplinary subject that exploits the methods and tools from statistics, application domain, and computer science to process data, structured or unstructured, in order to gain meaningful insights and knowledge".
reference: https://www.quora.com/What-is-data-science
- Responsibilities of Data Scientist:
A data scientist’s chief responsibility is data analysis, a process that begins with data collection and ends with business decisions made on the basis of the data scientist’s final analytics results. The data that data scientists analyze, often called big data, draws from a number of sources. Two types of data fall under the umbrella of big data: structured data and unstructured data. Structured data is organized, typically by categories that make it easy for a computer to sort, read and organize automatically.
Unstructured data, the fastest growing form of big data, is more likely to come from human input — customer reviews, emails, videos, social media posts, etc. This data is typically more difficult to sort through and less efficient to manage with technology. Because it isn’t streamlined, unstructured data can require a big investment to manage. Businesses typically rely on keywords to make sense of unstructured data as a way to pull out relevant data using searchable terms.
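The keyword approach described above can be sketched in a few lines of Python. This is only a minimal illustration: the topic names, keyword lists, sample reviews and the `tag_documents` function are all hypothetical, and real systems usually need more than exact word matching.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists; a real business would choose its own terms.
KEYWORDS = {
    "billing": ["refund", "charge", "invoice"],
    "shipping": ["delivery", "late", "package"],
}

def tag_documents(documents, keywords=KEYWORDS):
    """Map each topic to the indexes of documents mentioning one of its keywords."""
    matches = defaultdict(list)
    for index, text in enumerate(documents):
        # Lowercase and split into words so matching ignores case and punctuation.
        words = set(re.findall(r"[a-z']+", text.lower()))
        for topic, terms in keywords.items():
            if words.intersection(terms):
                matches[topic].append(index)
    return dict(matches)

# Made-up customer reviews standing in for unstructured input.
reviews = [
    "My package arrived two weeks late.",
    "I was charged twice and want a refund!",
    "Great product, works as described.",
]
print(tag_documents(reviews))
```

Note that the third review matches no keyword and is simply left untagged, and that "charged" does not match "charge". Both gaps hint at why unstructured data can take real investment to manage.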
Typically, businesses employ data scientists to handle this unstructured data, whereas other IT personnel are responsible for managing and maintaining structured data. Data scientists will likely deal with plenty of structured data in their careers too, but businesses increasingly want to leverage unstructured data in service of their revenue goals, which makes approaches to unstructured data key to the data scientist role.
- Data Engineering:
Data engineering includes what some companies might call Data Infrastructure or Data Architecture. A Data Engineer is the person who gathers and collects the data, stores it, does batch or real-time processing on it, and serves it via an API (application programming interface) to a data scientist, who can then easily query it.
Typically, a data engineer will have strong programming skills and a deep understanding of the big data ecosystem, and distributed systems in general.
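As a rough sketch of the collect, store, process and serve steps described above, the following Python example uses an in-memory SQLite database. Every name here (`collect_events`, `store_events`, `run_batch_job`, `get_event_count`, the table layout and the sample events) is an assumption made for illustration; a production system would typically sit on distributed storage and expose the last step through a real API.

```python
import sqlite3

def collect_events():
    """Stand-in for data collection; a real source might be logs or a message queue."""
    return [("alice", "click"), ("bob", "view"), ("alice", "view"), ("alice", "click")]

def store_events(conn, events):
    """Persist the raw events so later processing can be re-run."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, action TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", events)
    conn.commit()

def run_batch_job(conn):
    """Batch step: aggregate raw events into a per-user summary table."""
    conn.execute("DROP TABLE IF EXISTS user_counts")
    conn.execute(
        "CREATE TABLE user_counts AS "
        "SELECT user_id, COUNT(*) AS n_events FROM events GROUP BY user_id"
    )
    conn.commit()

def get_event_count(conn, user_id):
    """Serving step: the kind of lookup an API endpoint could expose."""
    row = conn.execute(
        "SELECT n_events FROM user_counts WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else 0

conn = sqlite3.connect(":memory:")
store_events(conn, collect_events())
run_batch_job(conn)
print(get_event_count(conn, "alice"))
```

Keeping the raw `events` table separate from the derived `user_counts` table means the batch job can be rerun at any time without collecting the data again.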
- Responsibilities of Data Engineer:
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
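The extraction, transformation, and loading (ETL) responsibility in the list above could look roughly like the sketch below. It is an assumption-laden toy: the two inline sources, the `sales` table and the `extract`, `transform` and `load` functions are hypothetical, and SQLite stands in for whatever SQL or AWS services a real team would use.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs from two different sources, with inconsistent formatting.
CSV_SOURCE = "customer,amount\nacme, 120.50\nglobex,80\n"
JSON_SOURCE = '[{"customer": "ACME", "amount": "30"}]'

def extract():
    """Extract: read rows from both sources into one list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(json.loads(JSON_SOURCE))
    return rows

def transform(rows):
    """Transform: normalize customer names and convert amounts to floats."""
    return [(row["customer"].strip().lower(), float(row["amount"])) for row in rows]

def load(conn, records):
    """Load: write the cleaned records into a SQL table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()

etl_conn = sqlite3.connect(":memory:")
load(etl_conn, transform(extract()))

# A simple metric computed from the loaded data: total sales per customer.
totals = dict(
    etl_conn.execute("SELECT customer, SUM(amount) FROM sales GROUP BY customer")
)
print(totals)
```

Because the transform step normalizes "acme" and "ACME" to the same key, the two sources are combined into a single total for that customer.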