Introduction to Statistics
Statistics and measures of central tendency.
What is Statistics?
Statistics is the branch of mathematics that deals with the collection, organisation, presentation, analysis, and interpretation of numerical information. In Class 9, Chapter 9 introduces you to the foundational ideas of statistics — starting from what data actually means, all the way to building frequency distribution tables. These concepts form the backbone of the Statistics chapters in Class 10 and beyond, and are regularly tested in CBSE, Telangana, and Andhra Pradesh board examinations.
Every time a teacher records attendance, a scientist measures temperatures, or a government publishes census numbers — they are all working with data. Statistics gives us the tools to make sense of all that information.
Facts or figures — whether numerical or otherwise — that are collected with a definite purpose are called data. Examples include marks scored by students, heights of players, daily temperatures, or monthly rainfall.
Primary Data vs Secondary Data
The first important classification in statistics is understanding how data was originally collected. This determines whether it is primary or secondary data — a distinction that appears as a 1-mark question in many board exams.
Primary data: data collected directly by the investigator, for the first time, for a specific purpose.
- Freshly gathered, original information
- Collected through surveys, interviews, experiments, or direct observation
- More reliable and specific to the need
Example: A teacher personally measuring the heights of all students in her class right now.
Secondary data: data taken from a source that has already recorded it, rather than gathered fresh by the investigator.
- Sourced from registers, books, websites, government records, newspapers
- Already processed or compiled by someone else
- Quicker to obtain but may not perfectly match the investigator's need
Example: Using school registers from 2001–2010 to study enrollment trends.
Practice: Identify Primary or Secondary?
The textbook "Do This" activity asks you to classify these two situations. Think carefully — the key question is: who collected the data and when?
| Situation | Type | Reason |
|---|---|---|
| Collection of enrollment data of students in your school from 2001 to 2010 | Secondary | The data was already recorded in school registers by someone else in the past |
| Height of students in your class recorded by the physical education teacher | Primary (for the teacher) | The PE teacher measured the students directly, so it is primary data for the teacher. If you later reuse the teacher's record for your own study, it is secondary data for you |
Raw Data and Range
When data is collected but has not yet been arranged or organised in any way, it is called raw data. Consider the marks scored by 15 students in a mathematics test out of 100:
85, 92, 78, 46, 88, 93, 71, 69, 84, 77, 91, 82, 76, 89, 95
This unordered list is raw data. It is hard to draw conclusions directly from it. The first step is to find the range, which tells you how spread out the data is.
Range = Maximum value − Minimum value
Here: Range = 95 − 46 = 49
Once arranged in ascending order, the data becomes much easier to read:
46, 69, 71, 76, 77, 78, 82, 84, 85, 88, 89, 91, 92, 93, 95
| Question | Answer | Explanation |
|---|---|---|
| What is the range? | 49 | 95 (max) − 46 (min) = 49 |
| What is the middle value (8th value)? | 84 | After arranging 15 values in order, the 8th is the middle |
| How many students scored more than 80? | 9 students | Values above 80: 82, 84, 85, 88, 89, 91, 92, 93, 95 |
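The three answers in the table can be checked with a short Python sketch. The variable names are illustrative and are not taken from the textbook.

```python
# Marks of the 15 students, exactly as listed in the raw data.
marks = [85, 92, 78, 46, 88, 93, 71, 69, 84, 77, 91, 82, 76, 89, 95]

# Range = maximum value - minimum value.
data_range = max(marks) - min(marks)

# Arrange in ascending order. With 15 values the middle one is the 8th,
# which is index 7 because Python counts from 0.
ordered = sorted(marks)
middle_value = ordered[len(ordered) // 2]

# Collect the marks that are strictly greater than 80.
above_80 = [m for m in ordered if m > 80]

print(data_range)     # 49
print(middle_value)   # 84
print(len(above_80))  # 9
```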
Frequency Distribution Tables — Organising Data
When there are many data points (like marks of 50 students), writing them all out is messy and unhelpful. The solution is to count how many times each value appears — this count is called the frequency — and record it in a frequency distribution table using tally marks.
Consider the marks of 50 students in a test out of 10:
5,8,6,4,2,5,4,9,10,2,1,1,3,4,5,8,6,7,10,2,1,1,3,4,4,5,8,6,7,10,2,8,6,4,2,5,4,9,10,2,1,1,3,4,5,8,6,4,5,8
Step 1: Ungrouped Frequency Distribution (Individual Values)
Each distinct mark is listed, and tally marks are used to count how many students scored that mark. This gives an ungrouped frequency distribution table, also called a table of weighted observations.
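| Marks | Tally Marks (~~IIII~~ = bundle of 5) | Number of Students (Frequency) |
|---|---|---|
| 1 | ~~IIII~~ I | 6 |
| 2 | ~~IIII~~ I | 6 |
| 3 | III | 3 |
| 4 | ~~IIII~~ IIII | 9 |
| 5 | ~~IIII~~ II | 7 |
| 6 | ~~IIII~~ | 5 |
| 7 | II | 2 |
| 8 | ~~IIII~~ I | 6 |
| 9 | II | 2 |
| 10 | IIII | 4 |
| Total | | 50 |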
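The counting behind this table can be reproduced with a minimal Python sketch. It uses `collections.Counter` from the standard library. The `tally` helper and the text stand-in `~~IIII~~` for a bundle of five (four strokes crossed by a fifth) are assumptions made for illustration.

```python
from collections import Counter

# Marks of the 50 students (out of 10), as listed above.
marks = [5, 8, 6, 4, 2, 5, 4, 9, 10, 2, 1, 1, 3, 4, 5, 8, 6, 7, 10, 2,
         1, 1, 3, 4, 4, 5, 8, 6, 7, 10, 2, 8, 6, 4, 2, 5, 4, 9, 10, 2,
         1, 1, 3, 4, 5, 8, 6, 4, 5, 8]

# Text stand-in for four strokes crossed by a fifth (hypothetical notation).
BUNDLE = "~~IIII~~"


def tally(count):
    """Return a text tally: one BUNDLE per full five, then single strokes."""
    bundles, leftover = divmod(count, 5)
    parts = [BUNDLE] * bundles
    if leftover:
        parts.append("I" * leftover)
    return " ".join(parts)


# Counter maps each distinct mark to the number of times it occurs.
frequency = Counter(marks)

for mark in sorted(frequency):
    print(mark, tally(frequency[mark]), frequency[mark])
print("Total", sum(frequency.values()))
```

The total of the frequency column should always equal the number of observations, which is a quick check that no value was missed or counted twice.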
Step 2: Grouped Frequency Distribution (Class Intervals)
With 10 different marks, the ungrouped table already has 10 rows — manageable. But if the data ranged from 1 to 100, an ungrouped table would have 100 rows, which is impractical. The solution is to group the data into class intervals and count the frequency within each group.
| Marks (Class Interval) | Number of Students (Frequency) |
|---|---|
| 1 – 3 | 15 |
| 4 – 6 | 21 |
| 7 – 10 | 14 |
| Total | 50 |
This is called a Grouped Frequency Distribution Table. It summarises the data compactly and makes patterns much easier to see: here, the largest group of students (21 of 50) scored in the 4–6 range.
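As a rough check on the grouped table, the same 50 marks can be sorted into the three class intervals with a few lines of Python. The marks are repeated so the sketch runs on its own, and the names `classes` and `grouped` are illustrative.

```python
from collections import Counter

# Marks of the 50 students (out of 10), repeated from the raw data.
marks = [5, 8, 6, 4, 2, 5, 4, 9, 10, 2, 1, 1, 3, 4, 5, 8, 6, 7, 10, 2,
         1, 1, 3, 4, 4, 5, 8, 6, 7, 10, 2, 8, 6, 4, 2, 5, 4, 9, 10, 2,
         1, 1, 3, 4, 5, 8, 6, 4, 5, 8]

# Class intervals from the grouped table, as (lower limit, upper limit).
classes = [(1, 3), (4, 6), (7, 10)]

grouped = Counter()
for mark in marks:
    for lower, upper in classes:
        # Both limits belong to the class, so both comparisons use <=.
        if lower <= mark <= upper:
            grouped[(lower, upper)] += 1
            break

for lower, upper in classes:
    print(f"{lower} - {upper}: {grouped[(lower, upper)]}")
```

Note that the last class, 7–10, covers four marks while the other two cover three each, so the class sizes in this table are not all equal.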
Inclusive Classes vs Exclusive Classes
When we write class intervals in a grouped frequency table, there are two important formats — and confusing them is one of the most common mistakes in Class 9 Statistics board exams.
Inclusive classes: classes written as 30–39, 40–49, 50–59, ...
- Both the lower and upper limits are included in the class
- Classes do not overlap — 39 belongs to 30–39, and 40 belongs to 40–49
- Best suited for discrete data (like whole-number marks or counts)
Example: Orange weights 30–39 g, 40–49 g, 50–59 g, …
Exclusive classes: classes written as 30–40, 40–50, 50–60, ...
- The upper limit is excluded from the class — it belongs to the next class
- Consecutive classes appear to overlap (30–40 ends at 40 and 40–50 starts at 40), but by convention 40 goes into 40–50
- Best suited for continuous data (like heights, weights, temperatures)
Example: 30–40 includes every value from 30 up to, but not including, 40 (such as 39 or 39.9). The value 40 itself starts the class 40–50.
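The difference between the two formats comes down to one comparison sign, as this minimal Python sketch shows. The function names are hypothetical and chosen only for illustration.

```python
def in_inclusive_class(value, lower, upper):
    """Inclusive form (e.g. 30-39): both limits belong to the class."""
    return lower <= value <= upper


def in_exclusive_class(value, lower, upper):
    """Exclusive form (e.g. 30-40): the upper limit belongs to the next class."""
    return lower <= value < upper


print(in_inclusive_class(39, 30, 39))    # True: 39 is inside 30-39
print(in_exclusive_class(40, 30, 40))    # False: 40 is not inside 30-40
print(in_exclusive_class(40, 40, 50))    # True: 40 goes into 40-50
print(in_exclusive_class(39.9, 30, 40))  # True: continuous values below 40 fit
```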
Class Boundaries — Converting Inclusive to Exclusive
Inclusive classes like 30–39 have a gap between them (nothing covers exactly 39.5). To bridge these gaps, we use class boundaries:
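Lower boundary = Lower limit − 0.5
Upper boundary = Upper limit + 0.5
Here 0.5 is half the gap between the upper limit of one class and the lower limit of the next (for example, 40 − 39 = 1, and half of 1 is 0.5).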
| Inclusive Class | Class Boundaries (Exclusive) |
|---|---|
| 20 – 29 | 19.5 – 29.5 |
| 30 – 39 | 29.5 – 39.5 |
| 40 – 49 | 39.5 – 49.5 |
| 50 – 59 | 49.5 – 59.5 |
| 60 – 69 | 59.5 – 69.5 |
| 70 – 79 | 69.5 – 79.5 |
| 80 – 89 | 79.5 – 89.5 |
| 90 – 99 | 89.5 – 99.5 |
| 100 – 109 | 99.5 – 109.5 |
| 110 – 119 | 109.5 – 119.5 |
In the boundaries above, 49.5 appears as the upper boundary of the 39.5–49.5 class and the lower boundary of the 49.5–59.5 class. There seems to be a conflict!
Convention: By standard rule, a value that falls exactly on a class boundary is placed in the higher class. So 49.5 belongs to 49.5–59.5, not to 39.5–49.5.
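The conversion and the boundary convention can be sketched together in Python. The helper names `class_boundaries` and `find_class` are assumptions for illustration, and the sketch assumes every pair of consecutive classes is separated by the same gap.

```python
def class_boundaries(classes):
    """Convert inclusive (lower, upper) limits into class boundaries.

    The adjustment is half the gap between the upper limit of one class
    and the lower limit of the next (0.5 for whole-number limits that
    are one unit apart).
    """
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]


def find_class(value, boundaries):
    """Return the class containing value, or None if no class does.

    A value lying exactly on a shared boundary is placed in the higher
    class, which is why the test is lower <= value < upper.
    """
    for lower, upper in boundaries:
        if lower <= value < upper:
            return (lower, upper)
    return None


inclusive = [(20, 29), (30, 39), (40, 49), (50, 59)]
boundaries = class_boundaries(inclusive)

print(boundaries[1])                 # (29.5, 39.5)
print(find_class(49.5, boundaries))  # (49.5, 59.5)
```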
How to Build a Grouped Frequency Distribution Table
Follow these steps every time you are given raw data and asked to construct a grouped frequency distribution table — a very common 3-mark or 4-mark question in Telangana, AP, and CBSE board exams.
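One common way to carry out such a construction is sketched below in Python: find the range, choose a class width, form exclusive classes that cover every value, then count. This is an illustration under stated assumptions, not the textbook's own wording of the steps. The function name `grouped_frequency_table`, the starting point 40, and the class width 10 are hypothetical choices. The data are the 15 marks from the raw data example.

```python
def grouped_frequency_table(data, start, width):
    """Group data into exclusive classes start to start+width, and so on.

    Returns the range of the data and a dict mapping each class
    (lower, upper) to its frequency. A value equal to an upper limit is
    counted in the next class.
    """
    if width <= 0:
        raise ValueError("class width must be positive")
    if start > min(data):
        raise ValueError("start must not exceed the smallest value")

    # Step 1: the range shows how much ground the classes must cover.
    data_range = max(data) - min(data)

    # Step 2: build classes until the largest value is covered.
    classes = []
    lower = start
    while lower <= max(data):
        classes.append((lower, lower + width))
        lower += width

    # Step 3: count how many values fall in each class.
    table = {c: 0 for c in classes}
    for value in data:
        index = int((value - start) // width)
        table[classes[index]] += 1
    return data_range, table


marks = [85, 92, 78, 46, 88, 93, 71, 69, 84, 77, 91, 82, 76, 89, 95]
data_range, table = grouped_frequency_table(marks, start=40, width=10)

print("Range:", data_range)
for (lower, upper), count in table.items():
    print(f"{lower} - {upper}: {count}")
print("Total:", sum(table.values()))
```

A class with frequency 0, such as 50–60 here, still gets a row so that the classes remain continuous.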
Chapter 9 Introduction — Key Terms at a Glance
| Term | Meaning | Example |
|---|---|---|
| Data | Facts/figures collected for a purpose | Marks of students, daily rainfall |
| Primary Data | Collected fresh by the investigator | Teacher measuring heights today |
| Secondary Data | Already recorded by someone else | School register, census report |
| Raw Data | Unorganised, unprocessed data | 85,92,78,46,88,… (as collected) |
| Range | Max value − Min value | 95 − 46 = 49 |
| Frequency | Number of times a value/class appears | Mark 4 appeared 9 times |
| Inclusive Class | Both limits included; non-overlapping | 30–39, 40–49, 50–59, … |
| Exclusive Class | Upper limit excluded; overlapping form | 30–40, 40–50, 50–60, … |
| Class Boundary | Adjusted limits bridging inclusive gaps | 30–39 becomes 29.5–39.5 |
What This Introduction Prepares You For
The concepts introduced here — data types, tally marks, frequency tables, and class intervals — are the building blocks for the entire Chapter 9. In the exercises that follow, you will use grouped frequency distribution tables to draw histograms and frequency polygons (Exercise 9.1) and to calculate measures of central tendency like mean, median, and mode.
In Class 10, the same frequency table format is used in Statistics Chapter 14 to compute the mean using the assumed mean method and to draw cumulative frequency curves (ogives). Getting these fundamentals right in Class 9 makes Class 10 statistics significantly easier.
For Telangana and Andhra Pradesh SSC board exams, the introduction section of Statistics typically contributes 1-mark definition questions and 2-mark "identify primary or secondary data" problems. Understanding the difference between inclusive and exclusive classes can also earn you marks in table-construction questions.