Last active
January 14, 2026 15:22
-
-
Save Visnusah/53b565e3f7fb4ea7a55b575a604f8fe5 to your computer and use it in GitHub Desktop.
Data Science Final Exam Mastery Guide
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| ๐ Data Science Final Exam Mastery Guide |
Author
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
๐ Data Science Final Exam Mastery Guide
Target Exam: BSc Computing (Year 2) - Data Science
Focus Areas: Statistics, Big Data (HDFS/MapReduce), Visualization, and Methodologies.
๐ Part 1: Analytics & Visualization
Tip
Visualization Rule of Thumb: Always choose the chart that makes comparison easiest for the human eye.
1. Choosing the Right Chart
2. Analytics Maturity Model
3. The "Correlation Trap"
Warning
[cite_start]Correlation$\neq$ Causation > If ice cream sales and drowning incidents both rise, it does not mean ice cream causes drowning[cite: 104, 109].
[cite_start]Correct Answer: A confounding variable (like hot weather) causes both to increase[cite: 110].
๐๏ธ Part 2: Big Data & Infrastructure
1. HDFS (Hadoop Distributed File System)
2. Database Architecture
3. MapReduce
genre, count)[cite: 17].๐งช Part 3: Methodologies & Lifecycle
1. OSEMN Framework
2. CRISP-DM
3. Hypothesis Testing
Important
[cite_start]The p-value Rule > * Scenario: p-value = 0.08, Significance Level ($\alpha$ ) = 0.05[cite: 53].$0.08 > 0.05$ , you fail to reject the null hypothesis[cite: 55]. There is insufficient evidence to prove the effect.
[cite_start]* Conclusion: Since
๐งฎ Part 4: Mathematical Solvers (Corrected)
The calculations in the source text contained errors. Below are the corrected steps.
1. Linear Regression
Scenario: Predict Salary Increase ($Y$ ) based on Training Hours ($X$ ).
Dataset:
Step 1: Means
Step 2: Slope ($\beta_1$ )
Step 3: Intercept ($\beta_0$ )
Step 4: Prediction (for 11 hours)
2. Probability (Bayes/Total Prob)
Scenario:
Question A: Probability of ANY defect?
$$P(Defect) = (0.60 \times 0.05) + (0.40 \times 0.10)$$
$$P(Defect) = 0.03 + 0.04 = \mathbf{0.07} \text{ (or 7%)}$$
Question B: If defective, prob it is Type A?
$$P(A | Defect) = \frac{P(A \cap Defect)}{P(Defect)}$$
$$P(A | Defect) = \frac{0.03}{0.07} \approx \mathbf{0.43} \text{ (or 43%)}$$
๐ก Part 5: Rapid Review Mnemonics
The "ACID" Test (Transactions)
Note
Atomicity: "All or Nothing." [cite_start]If a payment system crashes mid-transaction, Atomicity ensures the transaction is completely reversed[cite: 119, 120].
The "CAP" Theorem
Note
[cite_start]Availability + Partition Tolerance (AP): If a system must stay operational (accepting orders) during a network crash, it sacrifices Consistency for Availability[cite: 132, 134].
Anscombe's Quartet
Note
Lesson: "Never trust summary statistics alone." [cite_start]Different datasets can have the exact same mean and correlation but look completely different on a scatter plot[cite: 37, 38].