Making Files Smaller: Data Compression

OCR GCSE Computer Science (J277) - Topic 1.2.5

Starter Activity (5 mins)

Quick Recall - File Sizes:

Think about different types of files. Which generally take up more storage space?

1. Rank these from typically smallest to largest file size:
(Plain text document, High-quality photo, Short audio clip, HD Movie)

2. Why might we want to make large files smaller?

Example Answers:

1. Plain text document < Short audio clip < High-quality photo < HD Movie.
2. To save storage space, send files faster (email/internet), reduce download/streaming times, save data allowance.

Lesson Outcomes

By the end of this lesson, you should be able to:

Define compression and explain why it is needed.
Explain the difference between lossy compression and lossless compression.
Describe how lossy compression works for images and audio (e.g., reducing colour depth, perceptual shaping).
Describe how lossless compression works, including Run Length Encoding (RLE).
Identify common file types associated with lossy (JPEG, MP3) and lossless compression.
Discuss the benefits and drawbacks of each compression type.

Introduction: Why Compress?

Read

Digital files, especially images, audio, and video, can be very large. Compression is the process of reducing the size of a data file by using fewer bits to represent the original information.

There are many reasons to compress data, including saving storage space and reducing the time it takes to transfer files over networks (like the internet). Further benefits include:

Download and buffering times decrease: smaller files mean fewer packets and a faster transmission time.
Traffic over the internet is reduced, so there is less chance of collisions or transmission errors.
Data allowances do not run out as quickly, which saves money.
Voice can be transmitted fast enough to keep up with speech in a video.
Less money needs to be spent on data storage (e.g., hard drives, cloud).
Backing up data is faster.

There are two main types of compression:

Lossy compression removes some data permanently. File size is reduced by removing certain redundant information; the eliminated data is unrecoverable, so the file is recreated without it. File sizes are much smaller, but there is some loss of quality.
Lossless compression allows the original data to be perfectly reconstructed. It works by looking for patterns in the data, and every bit of the original can be recovered from the compressed file. Compressed files are larger than with lossy compression. Run Length Encoding is one example.

This lesson explores how these methods work and when they are used.

Task 1: Lossy Image Compression Techniques

Describe ways we could reduce the file size of a digital image using lossy compression techniques. [3 marks]

Mark Scheme / Example Points:

Permanently deleting some data / file cannot be restored to original (1 mark).
Reducing the colour depth / using a smaller colour palette (1 mark).
Reducing the image resolution / reducing the number of pixels (1 mark).
Applying a compression algorithm that removes less noticeable details (1 mark).
Removing redundant data (e.g., areas of similar colour) (1 mark).

(Max 3 marks)
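
One of the points above, reducing the colour depth, can be illustrated with a minimal Python sketch. It assumes a tiny greyscale image stored as a list of 8-bit values (0 to 255). The function names and the pixel values are made up for illustration and are not part of the lesson materials.

```python
def reduce_colour_depth(pixels, old_bits=8, new_bits=4):
    """Keep only the top `new_bits` bits of each pixel value (lossy)."""
    shift = old_bits - new_bits
    return [p >> shift for p in pixels]


def restore_colour_depth(reduced, old_bits=8, new_bits=4):
    """Scale values back to the old range; the discarded bits stay lost."""
    shift = old_bits - new_bits
    return [r << shift for r in reduced]


# Hypothetical row of 8-bit greyscale pixels.
original = [200, 201, 203, 90, 95, 17]

reduced = reduce_colour_depth(original)   # 4 bits per pixel instead of 8
restored = restore_colour_depth(reduced)

print(reduced)               # [12, 12, 12, 5, 5, 1]
print(restored)              # [192, 192, 192, 80, 80, 16]
print(restored == original)  # False: the original cannot be recovered
```

Each pixel now needs half as many bits, but similar shades have merged into one, which is why the change is permanent.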

Task 2: Choose the Compression Type

For each use case below, select whether lossy compression or lossless compression would be more appropriate and briefly explain why.

Use Case	Lossy / Lossless	Reason
A music track for streaming
A text document (e.g., essay)
A video for online streaming
An image for a website
A Python program file
A database file

Task 3: Text Compression (Dictionary/Index)

A simple lossless technique uses a dictionary (or index) to replace repeated words/phrases with shorter codes. Consider this text:

Hickory, dickory, dock,
The mouse ran up the clock.
The clock struck one,
The mouse ran down.
Hickory, dickory, dock.

Using the dictionary provided, study how Line 1 is decoded, then write the codes for Line 2.

Dictionary:

Code	Text	Code	Text
0	End of Line	8	ran_
1	H	9	up_
2	ickory,_	10	the_
3	d	11	cl
4	ock	12	.
5	,	13	_struck_
6	The_	14	one,
7	mouse_	15	down.

Note: '_' represents a space character.

Line 1 Decoding:

The codes 1 2 3 2 3 4 5 0 decode as H + ickory,_ + d + ickory,_ + d + ock + , + End of Line, which reconstructs "Hickory, dickory, dock,".
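
The decoding of Line 1 can also be sketched in a few lines of Python. This is a minimal illustration, not the page's own simulation: the dictionary is copied from the table above, with '_' written as a real space and code 0 (End of Line) as a newline character, and the function name decode is an assumption.

```python
# Dictionary from the table above; "_" is written as a real space
# and code 0 (End of Line) as a newline character.
DICTIONARY = {
    0: "\n", 1: "H", 2: "ickory, ", 3: "d", 4: "ock", 5: ",",
    6: "The ", 7: "mouse ", 8: "ran ", 9: "up ", 10: "the ", 11: "cl",
    12: ".", 13: " struck ", 14: "one,", 15: "down.",
}


def decode(codes):
    """Replace each code with its dictionary text and join the pieces."""
    return "".join(DICTIONARY[code] for code in codes)


line_1_codes = [1, 2, 3, 2, 3, 4, 5, 0]
line_1 = decode(line_1_codes)
print(repr(line_1))  # 'Hickory, dickory, dock,\n'
```

Because every code maps back to exactly one piece of text, nothing is lost, which is what makes the technique lossless.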

Enter codes for Line 2 ("The mouse ran up the clock."):

Now, consider the entire compressed text sequence (all 5 lines).

1 2 3 2 3 4 5 0 6 7 8 9 10 11 4 12 0 6 11 4 13 14 0 6 7 8 15 0 1 2 3 2 3 4 12

If one byte represents each code in the sequence above, what is the file size (in bytes) of the compressed text (excluding the dictionary)?

The original text is 117 characters, counting the four line breaks (assume 1 byte/char). Calculate the reduction in file size as a percentage using the full compressed sequence.

Hint: Percentage Reduction = ((Original Size - Compressed Size) / Original Size) * 100
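
The hint formula can be written as a small Python function. The sizes used below are hypothetical values chosen to show the arithmetic; they are deliberately not the Task 3 figures, and the function name is an assumption.

```python
def percentage_reduction(original_size, compressed_size):
    """Return the reduction in file size as a percentage of the original."""
    return (original_size - compressed_size) / original_size * 100


# Hypothetical sizes in bytes, for illustration only.
print(percentage_reduction(200, 50))    # 75.0
print(percentage_reduction(1000, 500))  # 50.0
```

Note that the result describes how much was saved, so a 200-byte file compressed to 50 bytes is a 75% reduction, not 25%.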

Task 4: Audio Compression Calculation

Click the link below to open a separate page where you can calculate download times for different audio file qualities and connection speeds, and consider the advantages of audio compression.

Open Download Time Calculator Task
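
The linked calculator is not reproduced here. As a rough sketch of the kind of calculation involved, download time can be estimated as the file size in bits divided by the connection speed in bits per second. The file sizes and speed below are hypothetical, the function name is an assumption, and the sketch ignores network overheads.

```python
def download_time_seconds(file_size_mb, speed_mbps):
    """Estimate download time: size in megabits / speed in megabits per second."""
    file_size_megabits = file_size_mb * 8   # 8 bits in a byte
    return file_size_megabits / speed_mbps


# Hypothetical example: the same track stored uncompressed and as a lossy file.
uncompressed_mb = 40
compressed_mb = 4
speed_mbps = 8    # megabits per second

print(download_time_seconds(uncompressed_mb, speed_mbps))  # 40.0
print(download_time_seconds(compressed_mb, speed_mbps))    # 4.0
```

Take care with units: file sizes are usually given in bytes, while connection speeds are usually given in bits per second.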

Real-World Connection: Compression in Daily Life

Read

Compression is essential for the modern internet and digital media:

Web Images: Most images online use lossy compression (like JPEG, a format commonly used for digital photography) to load quickly in web browsers. Lossless compression (like PNG) is used when exact detail is crucial (e.g., logos, diagrams).
Music Streaming: Services like Spotify use lossy audio formats (like MP3 or AAC) to reduce bandwidth usage and storage, allowing smooth playback even on slower connections. Perceptual music shaping is key here: inaudible sounds, such as frequencies that humans cannot hear or quiet sounds that cannot be heard over louder ones, are removed to make the file smaller.
Video Streaming: YouTube, Netflix etc., heavily compress video using lossy methods to make streaming feasible. Quality settings often adjust the level of compression.
File Archiving: ZIP or RAR files use lossless compression to bundle multiple files and reduce their total size without losing any data, perfect for backups or sending attachments.

Myth Busters: Compression Misconceptions

Read

Myth 1: "Lossless compression doesn't make files much smaller."

Reality: While lossless compression typically achieves lower compression ratios than lossy, it can still significantly reduce file size, especially for data with repeating patterns (like text or simple graphics). The reduction depends heavily on the data itself.
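
The point that the reduction depends on the data can be tried out with Python's built-in zlib module, which provides lossless compression. This is an illustrative sketch only: the repeated phrase and the pseudo-random bytes are made-up test data, and exact compressed sizes may vary between zlib versions, so none are quoted.

```python
import random
import zlib

# Highly repetitive text: plenty of patterns for a lossless algorithm to exploit.
repetitive = b"Hickory, dickory, dock. " * 400

# Pseudo-random bytes of the same length: almost no patterns to exploit.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

for label, data in (("repetitive", repetitive), ("random", noisy)):
    compressed = zlib.compress(data)
    # Lossless: decompressing gives back exactly the original bytes.
    assert zlib.decompress(compressed) == data
    print(label, len(data), "->", len(compressed), "bytes")
```

The repetitive data shrinks to a small fraction of its size, while the random data barely shrinks at all.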

Myth 2: "Lossy compression completely ruins the quality of images/audio."

Reality: Good lossy compression algorithms are designed to remove data that humans are least likely to notice (in audio, this is known as perceptual music shaping). While high levels of compression *can* cause noticeable artifacts, moderate levels often provide a good balance between file size and perceived quality.

Myth 3: "You should always use lossless compression for everything."

Reality: Lossless is essential when *every single bit* matters (e.g., program code, text documents, medical scans). However, for media like photos, music, and video intended for viewing/listening, the much smaller file sizes offered by lossy compression are often more practical for storage and transmission, even with a slight, often imperceptible, quality reduction.

Myth 4: "Compressing a file multiple times with lossy compression is okay."

Reality: Re-compressing an already lossy-compressed file (e.g., opening a JPEG and saving it as a JPEG again) discards more data, so quality typically degrades further with each save. It's best to work with original or losslessly compressed files if further editing and saving are needed.

Key Takeaways

Read

Compression: Reduces file size for easier storage and faster transmission.
Lossy Compression: Permanently removes data (often data humans won't notice) to achieve significant size reduction (e.g., JPEG, MP3). Quality is reduced, often only slightly, and the original file cannot be recovered.
Lossless Compression: Reduces file size without losing any data by finding patterns (e.g., Run Length Encoding (RLE), ZIP). Original file can be perfectly restored. Less size reduction than lossy.
RLE: A lossless method that replaces sequences of repeating data with one instance of the data and a count (e.g., AAAAA becomes A5). It works well with image and sound data, where the same value may be repeated many times.
Benefits of Compression: Saves storage space, reduces download/upload times, lowers bandwidth usage, saves data costs.
Trade-offs: Lossy offers smaller files but sacrifices perfect quality; Lossless preserves quality but results in larger files than lossy.
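
The RLE takeaway above (AAAAA becomes A5) can be sketched in Python. This is a minimal illustration rather than a standard file format: it stores runs as (symbol, count) pairs, and the function names are assumptions.

```python
from itertools import groupby


def rle_encode(text):
    """Return a list of (symbol, run_length) pairs for consecutive repeats."""
    return [(symbol, len(list(run))) for symbol, run in groupby(text)]


def rle_decode(pairs):
    """Rebuild the original text by repeating each symbol run_length times."""
    return "".join(symbol * count for symbol, count in pairs)


pairs = rle_encode("AAAAABBBCCCCCCCC")
print(pairs)                                  # [('A', 5), ('B', 3), ('C', 8)]
print("".join(f"{s}{n}" for s, n in pairs))   # A5B3C8
print(rle_decode(pairs))                      # AAAAABBBCCCCCCCC
```

Data without repeats gains nothing from this approach: "ABCD" would become A1B1C1D1, which is longer than the original.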

Exam Practice Questions

1. Describe 3 reasons for compressing data. [3 marks]

Mark Scheme / Example Points:

(1 mark for each valid reason, max 3 marks)

Reduces file size to save storage space (on devices, servers, cloud).
Reduces time needed to transmit files over a network / the internet.
Reduces bandwidth usage (saving data allowance / cost).
Allows large files (like videos) to be streamed effectively without excessive buffering.
Faster to backup or archive data.
Reduces download times for users.
(Accept other valid reasons related to efficiency/cost/speed)

2. How does Run Length Encoding (RLE) work for sound data compression? [2 marks]

Mark Scheme / Example Points:

Sound is sampled many times per second, creating sequences of digital values. (1 mark)
RLE identifies consecutive, identical sample values (representing the same sound/note for a duration). (1 mark)
It stores the sample value once, along with the count of how many times it repeats consecutively. (1 mark for detail of storage)

(Max 2 marks)
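
To see the mark scheme points in action, here is a small Python sketch applied to a list of sample values. The sample values are hypothetical and the function name is an assumption. Real recordings often vary from sample to sample, so long runs like these are most likely in silence or very simple sounds.

```python
def rle_samples(samples):
    """Run-length encode a list of sample values as [value, count] pairs."""
    runs = []
    for value in samples:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # same value again: extend the current run
        else:
            runs.append([value, 1])   # new value: start a new run
    return runs


# Hypothetical 8-bit samples: a stretch of silence (0) followed by a held level.
samples = [0, 0, 0, 0, 0, 0, 120, 120, 120, 120, 118, 0, 0, 0]

runs = rle_samples(samples)
print(runs)  # [[0, 6], [120, 4], [118, 1], [0, 3]]
print(len(samples), "values stored as", len(runs), "pairs")
```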

3. Explain why lossy compression cannot be used for program code. [2 marks]

Mark Scheme / Example Points:

Lossy compression permanently removes data from the file. (1 mark)
Program code requires every single instruction/character to be exactly correct to function. (1 mark)
Removing even a small amount of data (e.g., a single character or instruction) would corrupt the code, likely causing it to fail, produce errors, or behave unpredictably. (1 mark for consequence)
The original code cannot be perfectly reconstructed from a lossy compressed version. (1 mark)

(Max 2 marks)

Extension Activities

1. Huffman Coding

Another common lossless compression technique is Huffman Coding. It assigns shorter binary codes to more frequent characters/symbols and longer codes to less frequent ones.

Research Task: How does Huffman Coding create its variable-length codes? Why is it considered a lossless technique?
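
As a starting point for the research task, the sketch below builds Huffman codes in Python by repeatedly merging the two least frequent groups of symbols. It is one possible implementation under simple assumptions (codes held as strings of '0' and '1', ties broken by insertion order); the function names and the sample text are illustrative.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a {symbol: bit string} table; frequent symbols get shorter codes."""
    # Each heap entry: (frequency, tie_breaker, {symbol: code_so_far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # edge case: only one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing 0 to one and 1 to the other.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


def decode(bits, codes):
    """Read bits left to right, emitting a symbol whenever a full code is matched."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)


text = "hickory dickory dock"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(encoded), "bits instead of", len(text) * 8)
print(decode(encoded, codes) == text)   # True: nothing is lost
```

No code is the start of another code, so the decoder never has to guess where one symbol ends and the next begins.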

2. Video Compression (Codecs)

Video compression is complex, often using lossy techniques. It involves compressing individual frames (intra-frame) and also only storing the differences between consecutive frames (inter-frame).

Research Task: What is a video codec (e.g., H.264, H.265/HEVC, AV1)? Briefly explain the difference between intra-frame and inter-frame compression in video.
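
The idea of storing only the differences between consecutive frames can be shown with a very simplified Python sketch. It treats each frame as a short list of pixel values and records only the pixels that changed. Real codecs are far more sophisticated, so this illustrates the principle only; the frames and function names are hypothetical.

```python
def frame_delta(previous, current):
    """List (position, new_value) for every pixel that changed between frames."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]


def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus the stored changes."""
    frame = list(previous)
    for position, value in delta:
        frame[position] = value
    return frame


# Hypothetical 8-pixel frames: only two pixels change between them.
frame_1 = [10, 10, 10, 10, 50, 50, 10, 10]
frame_2 = [10, 10, 10, 10, 10, 50, 50, 10]

delta = frame_delta(frame_1, frame_2)
print(delta)                                   # [(4, 10), (6, 50)]
print(apply_delta(frame_1, delta) == frame_2)  # True
```

When little changes between frames, the list of differences is much shorter than a full frame.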

What's Next?

Read

You now understand the basics of lossy and lossless data compression!

This concludes the main topics for Memory & Storage. You might want to review:

Primary Storage (RAM, ROM, Cache).
Secondary Storage (HDD, SSD, Optical).
Storage Units and Capacity Calculations.
Or move on to Data Representation (Topic 1.3).