4 Data exploration
4.3 Frequency distribution

The frequency (f) of a particular value is the number of times the value occurs in the data. The distribution of a variable is the pattern of frequencies, meaning the set of all possible values and the frequencies associated with these values. Frequency distributions are portrayed as frequency tables or charts.

Frequency distributions can show either the actual number of observations falling in each range or the percentage of observations. In the latter case, the distribution is called a relative frequency distribution.

Frequency distribution tables can be used for both categorical and numeric variables. Continuous variables should only be used with class intervals, which will be explained shortly.

Let's look at some examples of frequency distribution and relative frequency distribution for discrete variables.

Example 1 – Constructing a frequency distribution table

A survey was taken on Maple Avenue. In each of 20 homes, people were asked how many cars were registered to their households. The results were recorded as follows:

1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0

Use the following steps to present this data in a frequency distribution table.

  1. Divide the results (x) into intervals, and then count the number of results in each interval. In this case, the intervals would be the number of households with no car (0), one car (1), two cars (2) and so forth.
  2. Make a table with separate columns for the interval numbers (the number of cars per household), the tallied results, and the frequency of results in each interval. Label these columns Number of cars, Tally and Frequency.
  3. Read the list of data from left to right and place a tally mark in the appropriate row. For example, the first result is a 1, so place a tally mark in the row beside where 1 appears in the interval column (Number of cars). The next result is a 2, so place a tally mark in the row beside the 2, and so on. When you reach your fifth tally mark, draw a tally line through the preceding four marks to make your final frequency calculations easier to read.
  4. Add up the number of tally marks in each row and record them in the final column entitled Frequency.

Your frequency distribution table for this exercise should look like this:



Table 4.3.1
Frequency table for the number of cars registered in each household

Number of cars (x)    Frequency (f)
0                     4
1                     6
2                     5
3                     3
4                     2

By looking at this frequency distribution table quickly, we can see that out of 20 households surveyed, 4 households had no cars, 6 households had 1 car, etc.
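
The tallying steps in Example 1 can also be sketched in code. The following is a minimal illustration, assuming Python and its standard library; the variable names (cars, frequency, frequency_table) are chosen here for illustration and are not part of the original exercise.

```python
from collections import Counter

# Survey responses from Example 1: cars registered in each of 20 households.
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]

# Counter plays the role of the Tally column: one count per distinct value.
frequency = Counter(cars)

# Sort by number of cars so the rows appear in the same order as the table.
frequency_table = sorted(frequency.items())

print("Number of cars (x)  Frequency (f)")
for x, f in frequency_table:
    print(f"{x:<20}{f}")
```

The printed rows should agree with the frequencies in Table 4.3.1, and the frequencies should add up to the 20 households surveyed.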

Example 2 – Constructing a cumulative frequency distribution table

A cumulative frequency distribution table is a more detailed table. It looks almost the same as a frequency distribution table but it has added columns that give the cumulative frequency and the cumulative percentage of the results, as well.

At a recent chess tournament, all ten of the participants had to fill out a form that gave their name, address and age. The ages of the participants were recorded as follows:

36, 48, 54, 92, 57, 63, 66, 76, 66, 80

Use the following steps to present these data in a cumulative frequency distribution table.

  1. Divide the results into intervals, and then count the number of results in each interval. In this case, intervals of 10 are appropriate. Since 36 is the lowest age and 92 is the highest age, start the intervals at 35 to 44 and end the intervals with 85 to 94.
  2. Create a table similar to the frequency distribution table but with three extra columns.
    • In the first column or the Lower value column, list the lower value of the result intervals. For example, in the first row, you would put the number 35.
    • The next column is the Upper value column. Place the upper value of the result intervals. For example, you would put the number 44 in the first row.
    • The third column is the Frequency column. Record the number of times a result appears between the lower and upper values. In the first row, place the number 1.
    • The fourth column is the Cumulative frequency column. Here we add the cumulative frequency of the previous row to the frequency of the current row. Since this is the first row, the cumulative frequency is the same as the frequency. However, in the second row, the frequency for the 35–44 interval (i.e., 1) is added to the frequency for the 45–54 interval (i.e., 2). Thus, the cumulative frequency is 3, meaning we have 3 participants in the 35 to 54 age group.

      1 + 2 = 3

    • The next column is the Percentage column. In this column, list the percentage of the frequency. To do this, divide the frequency by the total number of results and multiply by 100. In this case, the frequency of the first row is 1 and the total number of results is 10. The percentage would then be 10.0.

      (1 ÷ 10) × 100 = 10.0

    • The last column is Cumulative percentage. In this column, divide the cumulative frequency by the total number of results and then, to make a percentage, multiply by 100. Note that the last number in this column should always equal 100.0. In this example, the cumulative frequency is 1 and the total number of results is 10, therefore the cumulative percentage of the first row is 10.0.

      (1 ÷ 10) × 100 = 10.0

    The cumulative frequency distribution table should look like this:

    
    Table 4.3.2
    Ages of participants at a chess tournament

    Lower value  Upper value  Frequency (f)  Cumulative frequency  Percentage  Cumulative percentage
    35           44           1              1                     10.0        10.0
    45           54           2              3                     20.0        30.0
    55           64           2              5                     20.0        50.0
    65           74           2              7                     20.0        70.0
    75           84           2              9                     20.0        90.0
    85           94           1              10                    10.0        100.0
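
The column-by-column steps in Example 2 can be sketched as a short loop. This is only an illustration, assuming Python; the names (ages, intervals, rows) are hypothetical, and the intervals are typed in by hand to match step 1 rather than derived automatically.

```python
# Ages of the ten chess tournament participants from Example 2.
ages = [36, 48, 54, 92, 57, 63, 66, 76, 66, 80]

# Intervals of width 10, starting at 35 to 44 and ending at 85 to 94.
intervals = [(35, 44), (45, 54), (55, 64), (65, 74), (75, 84), (85, 94)]

n = len(ages)       # total number of results
rows = []
cumulative = 0      # running total for the Cumulative frequency column

for lower, upper in intervals:
    # Frequency: how many ages lie between the lower and upper values.
    f = sum(1 for age in ages if lower <= age <= upper)
    # Cumulative frequency: previous cumulative frequency plus this frequency.
    cumulative += f
    # Percentage and cumulative percentage: divide by n, multiply by 100.
    percentage = 100 * f / n
    cumulative_percentage = 100 * cumulative / n
    rows.append((lower, upper, f, cumulative, percentage, cumulative_percentage))

for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of Table 4.3.2, and the last cumulative percentage should equal 100.0.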

Class intervals

If a variable takes a large number of values, then it is easier to present and handle the data by grouping the values into class intervals. Continuous variables are more likely to be presented in class intervals, while discrete variables can be grouped into class intervals or not.

To illustrate, suppose we set out age ranges for a study of young people, while allowing for the possibility that some older people may also fall into the scope of our study.

The frequency of a class interval is the number of observations that occur in a particular predefined interval. So, for example, if 20 people aged 5 to 9 appear in our study's data, the frequency for the 5–9 interval is 20.

The endpoints of a class interval are the lowest and highest values that a variable can take. So, the intervals in our study are 0 to 4 years, 5 to 9 years, 10 to 14 years, 15 to 19 years, 20 to 24 years, and 25 years and over. The endpoints of the first interval are 0 and 4 if the variable is discrete, and 0 and 4.999 if the variable is continuous. The endpoints of the other class intervals would be determined in the same way.

Class interval width is the difference between the lower endpoint of an interval and the lower endpoint of the next interval. Thus, if our study's continuous intervals are 0 to 4, 5 to 9, etc., the width of the first five intervals is 5, and the last interval is open, since no higher endpoint is assigned to it. The intervals could also be written as 0 to less than 5, 5 to less than 10, 10 to less than 15, 15 to less than 20, 20 to less than 25, and 25 and over.

Rules for data sets that contain a large number of observations

In summary, follow these basic rules when constructing a frequency distribution table for a data set that contains a large number of observations:

  • find the lowest and highest values of the variables
  • decide on the width of the class intervals
  • include all possible values of the variable.

In deciding on the width of the class intervals, you will have to find a compromise between having intervals short enough so that not all of the observations fall in the same interval, but long enough so that you do not end up with only one observation per interval.

It is also important to make sure that the class intervals are mutually exclusive and collectively exhaustive.
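
As a rough illustration of these rules, the sketch below builds equal-width intervals for integer data and checks that every observation falls into exactly one of them. It assumes Python and whole-number observations; make_intervals and count_matches are hypothetical helper names introduced here, and the data are the ages from Example 2.

```python
def make_intervals(start, highest, width):
    """Return inclusive (lower, upper) integer intervals from start up to highest."""
    intervals = []
    lower = start
    while lower <= highest:
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals


def count_matches(value, intervals):
    """Count how many intervals contain the value."""
    return sum(1 for lower, upper in intervals if lower <= value <= upper)


# Ages from Example 2; the starting point 35 and the width 10 are chosen by hand.
ages = [36, 48, 54, 92, 57, 63, 66, 76, 66, 80]
age_intervals = make_intervals(35, max(ages), 10)

# Mutually exclusive: no value is in more than one interval.
# Collectively exhaustive: no value is left out of every interval.
each_age_in_exactly_one = all(count_matches(age, age_intervals) == 1 for age in ages)

print(age_intervals)
print(each_age_in_exactly_one)
```

For a continuous variable the intervals would instead be written as "lower to less than upper", so the inclusive upper endpoint used here would need to change.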

Example 3 – Constructing a frequency distribution table for large numbers of observations

Thirty AA batteries were tested to determine how long they would last. The results, to the nearest minute, were recorded as follows:

423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401, 363, 391, 405, 382, 400, 381, 399, 415, 428, 422, 396, 372, 410, 419, 386, 390

Use the steps in Example 1 and the above rules to help you construct a frequency distribution table.

Answer

The lowest value is 363 and the highest is 431.

Using the given data and a class interval of 10, the interval for the first class is 360 to 369 and includes 363 (the lowest value). Remember, there should always be enough class intervals so that the highest value is included.

The completed frequency distribution table should look like this:



Table 4.3.3
Life of AA batteries, in minutes

Battery life, minutes (x)    Frequency (f)
360–369                      2
370–379                      3
380–389                      5
390–399                      7
400–409                      5
410–419                      4
420–429                      3
430–439                      1
Total                        30
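
The grouping in Example 3 can be reproduced with a few lines of code. This is a sketch under stated assumptions: it uses Python, and it relies on the first interval starting at a multiple of the width (360), so integer division finds each interval's lower endpoint. The names battery_life, width and battery_table are illustrative only.

```python
from collections import Counter

# Battery life in minutes for the 30 AA batteries in Example 3.
battery_life = [
    423, 369, 387, 411, 393, 394, 371, 377, 389, 409,
    392, 408, 431, 401, 363, 391, 405, 382, 400, 381,
    399, 415, 428, 422, 396, 372, 410, 419, 386, 390,
]

width = 10  # class interval width

# Map each observation to the lower endpoint of its interval, e.g. 363 -> 360.
counts = Counter((minutes // width) * width for minutes in battery_life)

# Walk from the lowest interval to the highest so that an empty interval,
# if there were one, would still appear with a frequency of 0.
lowers = range(min(counts), max(counts) + width, width)
battery_table = [(lower, lower + width - 1, counts[lower]) for lower in lowers]

for lower, upper, f in battery_table:
    print(f"{lower}–{upper}  {f}")
print("Total", sum(f for _, _, f in battery_table))
```

The output should match the frequencies in Table 4.3.3, with a total of 30.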

Example 4 – Constructing relative frequency and percentage frequency tables

An analyst studying the data from Example 3 might want to know not only how long batteries last, but also what proportion of the batteries falls into each class interval of battery life.

The relative frequency of a particular observation or class interval is found by dividing the frequency (f) by the number of observations (n): that is, (f ÷ n). Thus:

Relative frequency = frequency ÷ number of observations

The percentage frequency is found by multiplying each relative frequency value by 100. Thus:

Percentage frequency = relative frequency × 100 = f ÷ n × 100

Use the data from Example 3 to make a table giving the relative frequency and percentage frequency of each interval of battery life.

Here is what that table looks like:



Table 4.3.4
Life of AA batteries, in minutes

Battery life, minutes (x)    Frequency (f)    Relative frequency    Percent frequency
360–369                      2                0.07                  7
370–379                      3                0.1                   10
380–389                      5                0.17                  17
390–399                      7                0.23                  23
400–409                      5                0.17                  17
410–419                      4                0.13                  13
420–429                      3                0.1                   10
430–439                      1                0.03                  3
Total                        30               1                     100

An analyst of these information could now say that:

  • 7% of AA batteries have a life of from 360 minutes up to but less than 370 minutes, and that
  • the probability of any randomly selected AA battery having a life in this range is approximately 0.07.
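
The two formulas in Example 4 can be applied to the frequencies from Table 4.3.3 as follows. This is a minimal sketch, assuming Python; the dictionary keys are the lower endpoints of the class intervals, and rounding to two decimal places (relative frequency) and to whole numbers (percentage) is an assumption made here to match the way the table is printed.

```python
# Frequencies from Table 4.3.3, keyed by the lower endpoint of each interval.
frequencies = {360: 2, 370: 3, 380: 5, 390: 7, 400: 5, 410: 4, 420: 3, 430: 1}

n = sum(frequencies.values())  # number of observations (30)

relative_table = []
for lower, f in frequencies.items():
    relative = round(f / n, 2)        # relative frequency = f ÷ n
    percentage = round(100 * f / n)   # percentage frequency = f ÷ n × 100
    relative_table.append((lower, f, relative, percentage))

for row in relative_table:
    print(row)
```

The first row, for example, gives a relative frequency of 0.07 and a percentage frequency of 7 for the 360 to 369 interval.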

Example 5 – Visualization of the cumulative relative frequency distribution

As previously shown in Example 2, cumulative frequency is used to determine the number of observations that lie below a particular value in a data set. The cumulative frequency is calculated by adding each frequency from a frequency distribution table to the sum of its predecessors. The last value will always be equal to the total for all observations, since all frequencies will already have been added to the previous total. Let's look at another example of how to calculate the cumulative frequency.

The daily number of rock climbers in Lake Louise, Alberta was recorded over a 30-day period. The results are as follows:

31, 49, 19, 62, 24, 45, 23, 51, 55, 60, 40, 35, 54, 26, 57, 37, 43, 65, 18, 41, 50, 56, 4, 54, 39, 52, 35, 51, 63, 42.

The number of rock climbers ranges from 4 to 65. In order to create a frequency table, the data are best grouped in class intervals of 10. Each interval can be one row in the frequency table. The Frequency column lists the number of observations found within a class interval. For example, there are only 2 values in the interval from 10 to 20, so its frequency in the table is 2.

Use the Frequency column to calculate cumulative frequency.

  1. First, add the number from the Frequency column to its predecessor. For example, in the first row, we have only one observation and no predecessors. The cumulative frequency is one.
    1 + 0 = 1
  2. However, in the second row, there are two observations. Add these two to the previous cumulative frequency (one), and the result is three.
    1 + 2 = 3
  3. Record the results in the Cumulative frequency column.

The other entries in the table can be calculated similarly. Results are presented in Table 4.3.5.



Table 4.3.5
Frequency and cumulative frequency of daily number of rock climbers recorded in Lake Louise, Alberta, 30-day period

Number of rock climbers    Frequency (f)    Cumulative frequency
<10                        1                1
10 to <20                  2                1 + 2 = 3
20 to <30                  3                3 + 3 = 6
30 to <40                  5                6 + 5 = 11
40 to <50                  6                11 + 6 = 17
50 to <60                  9                17 + 9 = 26
>= 60                      4                26 + 4 = 30

Cumulative relative frequency is another way of expressing frequency distribution. It is obtained by computing the percentage of the cumulative frequency within each interval.

Cumulative percentage is calculated by dividing the cumulative frequency by the total number of observations (n), then multiplying it by 100 (the last value will always be equal to 100%). Thus,

cumulative relative frequency = (cumulative frequency ÷ n) × 100

The fourth column in Table 4.3.6 shows the calculation of the cumulative relative frequency of the daily number of rock climbers recorded in Lake Louise.



Table 4.3.6
Cumulative relative frequency of daily number of rock climbers recorded in Lake Louise, Alberta, 30-day period

Number of rock climbers    Frequency (f)    Cumulative frequency    Cumulative relative frequency (%)
<10                        1                1                       1 ÷ 30 × 100 = 3
10 to <20                  2                1 + 2 = 3               3 ÷ 30 × 100 = 10
20 to <30                  3                3 + 3 = 6               6 ÷ 30 × 100 = 20
30 to <40                  5                6 + 5 = 11              11 ÷ 30 × 100 = 37
40 to <50                  6                11 + 6 = 17             17 ÷ 30 × 100 = 57
50 to <60                  9                17 + 9 = 26             26 ÷ 30 × 100 = 87
>= 60                      4                26 + 4 = 30             30 ÷ 30 × 100 = 100
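
The running totals and percentages in Example 5 can be sketched with a running sum. The example below assumes Python and uses itertools.accumulate from the standard library; the list names are illustrative, and rounding to whole percentages is an assumption made to match the table.

```python
from itertools import accumulate

# Class intervals and frequencies for the daily number of rock climbers.
labels = ["<10", "10 to <20", "20 to <30", "30 to <40",
          "40 to <50", "50 to <60", ">= 60"]
frequencies = [1, 2, 3, 5, 6, 9, 4]

n = sum(frequencies)  # total number of observations (30 days)

# Cumulative frequency: each frequency added to the sum of its predecessors.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency (%) = (cumulative frequency ÷ n) × 100.
cumulative_relative = [round(100 * c / n) for c in cumulative]

for label, f, c, pct in zip(labels, frequencies, cumulative, cumulative_relative):
    print(f"{label:<12}{f:<4}{c:<4}{pct}")
```

The last cumulative frequency equals the total number of observations, and the last cumulative relative frequency equals 100.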

The cumulative relative frequency distribution can be visualized with a bar chart or a line chart, as in Chart 4.3.1 below. The value on the horizontal axis is the upper bound of the class interval.

Chart 4.3.1 Cumulative relative frequency of the daily number of rock climbers in Lake Louise, Alberta, during a 30-day period

Data table for Chart 4.3.1

Upper bound of the class interval of daily number of rock climbers    Cumulative relative frequency (%)
9                                                                     3
19                                                                    10
29                                                                    20
39                                                                    37
49                                                                    57
59                                                                    87
69                                                                    100

Chart 4.3.1 shows that for the majority of days (57%) in the period, the number of rock climbers was less than or equal to 49.

Frequency distribution can be visualized using:

  • a pie chart (nominal variable),
  • a bar chart (nominal or ordinal variable),
  • a line chart (ordinal or discrete variable),
  • or a histogram (continuous variable).

These types of charts will be presented in section 5 on data visualization. But first, we will look at other methods to summarize data using measures of central tendency and dispersion.

