1) Data Analytics vs Data Analysis vs Data Science
- Data Analysis: Finding answers from data (like totals, trends, comparisons).
- Data Analytics: Bigger view: analysis plus using the insights to support business decisions.
- Data Science: Advanced level: uses coding, statistics, and machine learning to build models and make predictions.
2) Data Analysis Process (Steps)
A common step-by-step flow:
- Understand the problem (what question to solve)
- Collect data
- Clean data (remove mistakes, blanks, duplicates)
- Analyze (calculations, SQL queries, charts)
- Visualize (dashboard/report)
- Share insights (what it means + what action to take)
3) What is Data Cleaning?
Data cleaning means making data correct and usable.
Common problems:
- Missing values (blank cells)
- Duplicates (same record repeated)
- Wrong format (dates stored as text)
- Outliers (very unusual values)
- Spelling issues (Noida / noida / NOIDA)
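The problems listed above can be sketched in a few lines of plain Python. This is only a minimal illustration: the sample rows, the field names (name, city, joined), and the two accepted date formats are assumptions made for the example. It handles missing values, duplicates, wrong formats, and spelling variants; outliers are left out.

```python
from datetime import datetime

# Hypothetical raw records; field names are illustrative only.
raw_rows = [
    {"name": "Asha", "city": "Noida", "joined": "2024-01-15"},
    {"name": "Asha", "city": "noida ", "joined": "2024-01-15"},  # duplicate once normalized
    {"name": "Ravi", "city": "NOIDA", "joined": "15/02/2024"},   # date in a different format
    {"name": "Meena", "city": "", "joined": "2024-03-01"},       # missing city
]

def parse_date(text):
    """Try each assumed date format; return None if none of them match."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None

def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        # Fix spelling variants (Noida / noida / NOIDA); blank becomes None.
        city = row["city"].strip().title() or None
        record = (row["name"].strip(), city, parse_date(row["joined"]))
        if record in seen:  # skip exact duplicates after normalization
            continue
        seen.add(record)
        cleaned.append({"name": record[0], "city": record[1], "joined": record[2]})
    return cleaned

cleaned_rows = clean(raw_rows)
```

Here cleaned_rows keeps three records: the second Asha row is dropped as a duplicate, every city is spelled "Noida", and the blank city is stored as None so it can be counted or filled later.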
4) Types of Data
- Structured: Proper tables (Excel/SQL). Example: Student marks sheet.
- Semi-structured: Has some structure but not perfect tables. Example: JSON, XML.
- Unstructured: No fixed format. Example: images, videos, emails, PDFs.
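To see why JSON counts as semi-structured, here is a small sketch that turns a JSON payload into table-like rows. The payload and its keys (id, name, contact, email, phone) are made up for illustration. Notice that the records share some keys but not all, so a missing field has to be filled with None to fit a fixed set of columns.

```python
import json

# Hypothetical JSON payload: nested, and the second record has an extra key.
payload = """
[
  {"id": 1, "name": "Asha", "contact": {"email": "asha@example.com"}},
  {"id": 2, "name": "Ravi", "contact": {"email": "ravi@example.com", "phone": "555-0100"}}
]
"""

records = json.loads(payload)

# Flatten each nested record into a row with the same four columns.
table = [
    {
        "id": r["id"],
        "name": r["name"],
        "email": r["contact"].get("email"),
        "phone": r["contact"].get("phone"),  # None when the key is absent
    }
    for r in records
]
```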
5) What is SQL and Why Important?
SQL (Structured Query Language) is the standard language for talking to relational databases and getting data out of them.
Data analysts use SQL to:
- fetch data quickly
- filter, sort, and group data
- join multiple tables (customer + orders)
6) SQL Query to Find Duplicate Records
Duplicates depend on which columns should be unique (example: email).
SELECT email, COUNT(*) AS total
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
This shows emails that appear more than once.
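To try the query without a database server, one option is Python's built-in sqlite3 module with an in-memory database. The table layout and the sample emails below are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# Hypothetical sample data: a@example.com appears three times.
emails = ["a@example.com", "b@example.com", "a@example.com", "c@example.com", "a@example.com"]
conn.executemany("INSERT INTO users (email) VALUES (?)", [(e,) for e in emails])

duplicates = conn.execute(
    """
    SELECT email, COUNT(*) AS total
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
    """
).fetchall()
```

With this sample data, duplicates holds one row: the repeated email and its count of 3.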
7) WHERE vs HAVING in SQL
- WHERE filters rows before grouping.
- HAVING filters groups after GROUP BY.
Example:
SELECT city, COUNT(*) AS total
FROM users
WHERE city IS NOT NULL
GROUP BY city
HAVING COUNT(*) > 100;
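A small runnable sketch of the difference, using Python's sqlite3 module. The sample cities are invented, and the threshold is lowered from 100 to 1 so a handful of rows is enough to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, city TEXT)")

# Hypothetical sample data, including two rows with no city.
cities = ["Noida", "Noida", "Noida", "Delhi", "Delhi", "Pune", None, None]
conn.executemany("INSERT INTO users (city) VALUES (?)", [(c,) for c in cities])

# WHERE removes the NULL-city rows first; HAVING then removes small groups.
with_where = conn.execute(
    """
    SELECT city, COUNT(*) AS total
    FROM users
    WHERE city IS NOT NULL
    GROUP BY city
    HAVING COUNT(*) > 1
    ORDER BY city
    """
).fetchall()

# Without WHERE, the NULL rows form their own group and pass the HAVING check.
without_where = conn.execute(
    """
    SELECT city, COUNT(*) AS total
    FROM users
    GROUP BY city
    HAVING COUNT(*) > 1
    ORDER BY city
    """
).fetchall()
```

Pune is dropped by HAVING in both queries because its group has only one row. The NULL group appears only in without_where, because no row-level filter removed it before grouping.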
8) What are JOINs? (INNER JOIN vs LEFT JOIN)
JOIN means combining data from two tables.
Example tables:
customers(customer_id, name)
orders(order_id, customer_id, amount)
INNER JOIN: only matching records from both tables.
SELECT c.name, o.amount
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;
LEFT JOIN: all customers + matching orders (a customer with no orders still shows, with NULL in the order columns).
SELECT c.name, o.amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
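Both joins can be compared side by side with Python's sqlite3 module. The sample customers and orders are assumptions for this sketch, and an ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")

# Hypothetical sample data: Meena has no orders.
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi"), (3, "Meena")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(101, 1, 250.0), (102, 1, 100.0), (103, 2, 75.0)],
)

query = """
    SELECT c.name, o.amount
    FROM customers c
    {join_type} JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id, o.order_id
"""
inner_rows = conn.execute(query.format(join_type="INNER")).fetchall()
left_rows = conn.execute(query.format(join_type="LEFT")).fetchall()
```

inner_rows has three rows (two for Asha, one for Ravi). left_rows has the same three plus ("Meena", None), because the LEFT JOIN keeps the customer even though no order matches.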
9) What is GROUP BY?
GROUP BY groups similar records and helps you calculate totals/average/count.
Example: total students per course
SELECT course, COUNT(*) AS total_students
FROM students
GROUP BY course;
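The same idea extends to more than one aggregate per group. The sketch below assumes a students table with a marks column (not part of the original example) and runs the query through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, course TEXT, marks INTEGER)")

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Asha", "SQL", 80), ("Ravi", "SQL", 90), ("Meena", "Excel", 70)],
)

# One output row per course, with a count and an average for each group.
per_course = conn.execute(
    """
    SELECT course, COUNT(*) AS total_students, AVG(marks) AS avg_marks
    FROM students
    GROUP BY course
    ORDER BY course
    """
).fetchall()
```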
10) Primary Key vs Foreign Key
- Primary Key (PK): Unique ID in a table (cannot repeat). Example: student_id.
- Foreign Key (FK): A column that links to another table’s primary key. Example: orders.customer_id links to customers.customer_id.
Simple meaning: PK = unique identity, FK = connection between tables.
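A short sketch of how a database enforces both rules, using Python's sqlite3 module. Two assumptions to note: the sample rows are invented, and SQLite only checks foreign keys after PRAGMA foreign_keys = ON is run on the connection (other databases usually enforce them by default). The try_insert helper is a hypothetical name used only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK checks are off by default
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL
    )
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (101, 1, 250.0)")

def try_insert(sql):
    """Hypothetical helper: report whether the database accepted the row."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

# Same customer_id again: breaks the primary key rule.
duplicate_pk = try_insert("INSERT INTO customers VALUES (1, 'Ravi')")
# customer_id 99 does not exist in customers: breaks the foreign key rule.
orphan_fk = try_insert("INSERT INTO orders VALUES (102, 99, 50.0)")
```

Both inserts are rejected: the first repeats an existing primary key, and the second points to a customer that does not exist.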
11) Excel: VLOOKUP vs XLOOKUP vs INDEX-MATCH
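- VLOOKUP: Old and common. Searches the first column of a range and returns a value from a column to its right only. Can break if columns are inserted or moved, because the return column is given as a fixed number.
- XLOOKUP: Newer and easier (Excel 2021 and Microsoft 365). Can look left or right, and has a built-in "if not found" option, so it is safer.
- INDEX-MATCH: Powerful combo of two functions. Works as a flexible lookup in any direction and is widely used, including in older Excel versions.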
Simple tip: If available, XLOOKUP is usually easiest.
12) What is a Pivot Table?
A Pivot Table is used to summarize large datasets quickly without writing formulas.
You can get:
- total sales by month
- count of students by course
- average marks by subject
It’s like making a report in 2 minutes.
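Under the hood, a pivot table is group-and-aggregate. The plain-Python sketch below mimics "total sales by month, split by region"; the sample rows and field names are made up for illustration, and this is not how Excel implements it.

```python
from collections import defaultdict

# Hypothetical sales rows.
sales = [
    {"month": "Jan", "region": "North", "amount": 100},
    {"month": "Jan", "region": "South", "amount": 150},
    {"month": "Feb", "region": "North", "amount": 200},
    {"month": "Jan", "region": "North", "amount": 50},
]

# Rows of the pivot = month, columns = region, values = sum of amount.
pivot = defaultdict(lambda: defaultdict(int))
for row in sales:
    pivot[row["month"]][row["region"]] += row["amount"]

# The "Grand Total" column: sum across regions for each month.
totals_by_month = {month: sum(regions.values()) for month, regions in pivot.items()}
```

In this sample, Jan/North adds up to 150 and the monthly totals are 300 for Jan and 200 for Feb.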
13) What is Data Visualization? What makes a dashboard good?
Data visualization means showing data in charts/graphs so people understand quickly.
A good dashboard:
- answers the main business questions
- is simple (not too many charts)
- has correct labels and clear numbers
- shows key KPIs (sales, profit, conversion, etc.)
14) Power BI/Tableau: Calculated Column vs Measure
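- Calculated Column: Computed for every row and stored in the table. Good for row-level logic.
- Measure: Calculated only when needed, so the result changes with filters and slicers (dynamic). Good for totals like sum, average.
These are Power BI terms; in Tableau the closest equivalents are row-level and aggregate calculated fields.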
Example idea:
- Column: Profit = Sales - Cost (per row)
- Measure: Total Profit = SUM(Profit) (changes with filters)
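The same idea in plain Python, as a rough analogy rather than real DAX or Tableau syntax. The sample rows and the total_profit function name are assumptions for this sketch: the column is computed once per row and stored, while the measure is recomputed every time the filter changes.

```python
# Hypothetical sales rows.
rows = [
    {"region": "North", "sales": 100, "cost": 60},
    {"region": "South", "sales": 200, "cost": 150},
    {"region": "North", "sales": 50, "cost": 20},
]

# Calculated column: one value per row, stored alongside the data.
for row in rows:
    row["profit"] = row["sales"] - row["cost"]

def total_profit(rows, region=None):
    """Measure: re-evaluated over whichever rows pass the current filter."""
    return sum(r["profit"] for r in rows if region is None or r["region"] == region)

all_regions = total_profit(rows)          # no filter applied
north_only = total_profit(rows, "North")  # same measure, different filter
```

The stored profit values never change, but the measure returns 120 with no filter and 70 when filtered to North.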
15) Correlation vs Causation
- Correlation: Two things move together (example: ice cream sales and cold drinks sales).
- Causation: One thing causes the other.
Important: Correlation does NOT always mean causation.
Example: Ice cream sales increase and drowning incidents increase in summer—not because ice cream causes drowning, but because summer increases both.
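The summer example can be checked with a small calculation. The numbers below are invented purely for illustration (they are not real data), and pearson is a hand-written helper for the standard Pearson correlation coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up weekly figures: both series rise as the temperature rises.
temperature = [20, 25, 30, 35, 40]
ice_cream_sales = [110, 160, 205, 260, 300]
drownings = [2, 3, 5, 6, 8]

r_ice_drown = pearson(ice_cream_sales, drownings)
r_temp_ice = pearson(temperature, ice_cream_sales)
r_temp_drown = pearson(temperature, drownings)
```

All three coefficients come out close to 1. The strong link between ice cream sales and drownings says nothing about one causing the other; both simply follow temperature, which is the common cause.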



