Python Experiment: For VS Comprehension 1
When writing Python, we often face the question of which is better for building a list: a for statement or a comprehension. Many articles recommend comprehensions because they perform better than for statements.
BUT, while coding, you may run into questions like the ones below.
Is a comprehension the right choice in this situation?
Are there any situations in which we should avoid using a comprehension?
To answer these questions, I decided to run experiments on the difference between a for statement and a comprehension when generating a list.
In this post, I'll run the experiment in a simple situation: use the range function to generate a series of integers and push them to a list. If you want to check the whole code I used for this experiment, please visit here.
TL;DR
In brief, the results of this experiment are:
- It's better to use a comprehension instead of a for statement under the conditions below.
- If you use a for statement, it's also a good idea to bind result.append to a local name before the loop.
Experiment summary
- We'll create a list which contains a series of integers.
- The number of items pushed to the list ranges from 10 to 1M, increasing tenfold at each step.
- Prepare three functions to compare the performance of the for statement and the comprehension. I'll describe each function in detail later.
- Execute each function 10 times to calculate the average elapsed time.
Code description
Import libraries
First, import libraries.
import time
import matplotlib.pyplot as plt
from tqdm import tqdm
time is used to measure elapsed time, matplotlib is used to plot the experiment results, and tqdm is imported to show the progress of execution.
Define functions
Next, implement the functions which create a list.
def make_list_with_for_v1(item_num: int):
    result = []
    for i in range(item_num):
        result.append(i)


def make_list_with_for_v2(item_num: int):
    result = []
    append = result.append
    for i in range(item_num):
        append(i)


def make_list_with_comprehension(item_num: int):
    result = [i for i in range(item_num)]
The details of each function are:
- make_list_with_for_v1 creates a list using a for statement.
- make_list_with_for_v2 also creates a list with a for statement. Unlike make_list_with_for_v1, it binds result.append to the local name append before the loop. Because the method is looked up only once, the per-iteration cost of resolving result.append is avoided.
- make_list_with_comprehension creates a list using a comprehension.
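To make the idea behind make_list_with_for_v2 more concrete, here is a small sketch that is not part of the original experiment code. It shows that result.append is a bound method, so storing it in a local name still appends to the same list. The function name build_with_local_append is hypothetical, and unlike the functions above it returns the list so the result can be inspected.

```python
def build_with_local_append(item_num: int) -> list:
    """Hypothetical variant of make_list_with_for_v2 that returns its list."""
    result = []
    # Look up the bound method once, outside the loop.
    append = result.append
    for i in range(item_num):
        # Calls list.append on `result` without re-resolving the attribute.
        append(i)
    return result


numbers = build_with_local_append(5)
print(numbers)  # [0, 1, 2, 3, 4]
# A bound method remembers which list it belongs to.
print(numbers.append.__self__ is numbers)  # True
```

The only difference from the plain loop is where the attribute lookup happens; the list that gets built is the same.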
Carry out experiment
Let's start the experiment.
ITEM_NUMS = [10**i for i in range(1, 7)]
EXPERIMENT_ITERATION = 10

for_v1_elasped_times = []
for_v2_elasped_times = []
comprehension_elapsed_times = []

for item_num in tqdm(ITEM_NUMS):
    for_v1_elapsed_time = 0
    for_v2_elapsed_time = 0
    comprehension_v1_elapsed_time = 0
    for iteration in range(EXPERIMENT_ITERATION):
        # Experiment of make_list_with_for_v1
        st_for_v1 = time.time()
        make_list_with_for_v1(item_num)
        for_v1_elapsed_time += time.time() - st_for_v1
        # Experiment of make_list_with_for_v2
        st_for_v2 = time.time()
        make_list_with_for_v2(item_num)
        for_v2_elapsed_time += time.time() - st_for_v2
        # Experiment of make_list_with_comprehension
        st_comprehension = time.time()
        make_list_with_comprehension(item_num)
        comprehension_v1_elapsed_time += time.time() - st_comprehension
    # Average the accumulated times over all iterations
    for_v1_elapsed_time /= EXPERIMENT_ITERATION
    for_v2_elapsed_time /= EXPERIMENT_ITERATION
    comprehension_v1_elapsed_time /= EXPERIMENT_ITERATION
    for_v1_elasped_times.append(for_v1_elapsed_time)
    for_v2_elasped_times.append(for_v2_elapsed_time)
    comprehension_elapsed_times.append(comprehension_v1_elapsed_time)
The details of this code are:
- ITEM_NUMS contains the numbers of items which the result list will hold.
- EXPERIMENT_ITERATION defines how many times the experiment is executed under the same conditions.
- The three lists for_v1_elasped_times, for_v2_elasped_times, and comprehension_elapsed_times store the average elapsed time of each function for each list size.
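As an alternative to bookkeeping with time.time() by hand, the standard library's timeit module can run the same kind of measurement. The sketch below is only an assumption about how the loop could be restructured and is not the code used for the results in this post. The names average_seconds and build_with_comprehension are hypothetical, and the range of list sizes is smaller than in the experiment so that it runs quickly.

```python
import timeit


def build_with_comprehension(item_num: int) -> list:
    # Stand-in for make_list_with_comprehension (hypothetical name).
    return [i for i in range(item_num)]


def average_seconds(func, item_num: int, iterations: int = 10) -> float:
    """Average seconds for one call of func(item_num)."""
    # timeit.timeit calls the callable `number` times and returns total seconds.
    total = timeit.timeit(lambda: func(item_num), number=iterations)
    return total / iterations


# Smaller range than the post (10 to 10,000) to keep the sketch fast.
item_nums = [10**i for i in range(1, 5)]
averages = [average_seconds(build_with_comprehension, n) for n in item_nums]
for n, seconds in zip(item_nums, averages):
    print(f"{n:>6} items: {seconds:.8f} s")
```

By default timeit uses time.perf_counter as its timer and temporarily disables garbage collection during the measurement, so the numbers may differ slightly from those of the time.time() version.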
Visualize results
Finally, let's plot the relationship between the number of items in a list and the elapsed time.
plt.plot(ITEM_NUMS, for_v1_elasped_times, label="for v1", color="r")
plt.plot(ITEM_NUMS, for_v2_elasped_times, label="for v2", color="b")
plt.plot(ITEM_NUMS, comprehension_elapsed_times, label="comprehension", color="g")
plt.xlabel("the number of items in the result list")
plt.xscale("log")
plt.ylabel("Elapsed time [s]")
plt.legend()
plt.show()
In my environment, the result looks like the graph below.

As you can see in the result above, the comprehension is the fastest way to create a list under these conditions. It's also better to bind append to a local name before the for statement than not to.
Moreover, the difference in elapsed time between the for statement and the comprehension increases linearly with the number of items.
As a result, it's better to use a comprehension than a for statement.
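One way to check whether the gap really grows linearly is to divide the time difference by the number of items: if the gap is linear, the per-item difference should stay roughly constant. The helper below, per_item_gap, is a hypothetical name and not part of the original code. The toy numbers in the usage lines are made up purely to show the arithmetic and are not measurements.

```python
def per_item_gap(item_nums, slower_times, faster_times):
    """Return (slower - faster) / item_num for each list size."""
    return [
        (slow - fast) / n
        for n, slow, fast in zip(item_nums, slower_times, faster_times)
    ]


# Made-up illustrative values, NOT measured results.
toy_item_nums = [10, 100, 1000]
toy_for_times = [4.0, 40.0, 400.0]
toy_comprehension_times = [2.0, 20.0, 200.0]
print(per_item_gap(toy_item_nums, toy_for_times, toy_comprehension_times))
# [0.2, 0.2, 0.2]
```

With the real data, the call would be per_item_gap(ITEM_NUMS, for_v1_elasped_times, comprehension_elapsed_times), using the lists filled in by the experiment.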
MY FUTURE WORK
In this post, I ran a simple experiment to measure the performance difference between a for statement and a comprehension. I found that it's better to use a comprehension instead of a for statement in this situation.
BUT, I think there are situations in which we should avoid using a comprehension, so I'll run more experiments under a variety of conditions. Please leave a comment if you have ideas about what to examine or questions about this post!
THANK YOU FOR READING MY POST!