Removing duplicate elements from Python lists is a common data cleaning task that developers encounter regularly. Python provides multiple built-in methods and techniques to efficiently handle duplicate values while maintaining list order and data integrity.
This guide covers proven techniques for duplicate removal, with practical examples and performance tips. All code examples were created with Claude, an AI assistant built by Anthropic.
## Using `set()` to remove duplicates

```python
numbers = [1, 2, 3, 2, 1, 4, 5, 4]
unique_numbers = list(set(numbers))
print(unique_numbers)
```

```
[1, 2, 3, 4, 5]
```
The `set()` function provides the fastest way to remove duplicates from a Python list. Sets store only unique values by design, automatically discarding duplicates during conversion. Converting the list to a set and back to a list creates a new sequence containing just the unique elements.
This approach offers key advantages for data cleaning: it takes a single line of code, it runs faster than loop-based alternatives, and it works with any hashable data type, such as numbers or strings.
One important consideration: `set()` does not preserve the original order of elements. If maintaining sequence order matters for your use case, you'll need to explore alternative methods.
While `set()` excels at speed, Python offers several order-preserving methods to remove duplicates, from basic `for` loops to elegant `dict.fromkeys()` solutions.
## Using a `for` loop to preserve order

```python
numbers = [1, 2, 3, 2, 1, 4, 5, 4]
unique_numbers = []
for num in numbers:
    if num not in unique_numbers:
        unique_numbers.append(num)
print(unique_numbers)
```

```
[1, 2, 3, 4, 5]
```
This straightforward approach uses a `for` loop to iterate through the original list while building a new list of unique elements. The `not in` operator checks whether each number already exists in `unique_numbers` before adding it.
### Trade-offs compared to the `set()` method

While this method requires more code than using `set()`, it offers better control over the deduplication process. The trade-off is performance: the `not in` check becomes slower as the list grows because it must scan the entire `unique_numbers` list for each element, giving the loop quadratic worst-case behavior.
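A rough way to measure the gap is with the standard `timeit` module; the absolute numbers below are machine-dependent and indicative only:

```python
import timeit

# Build a list with many repeated values as the benchmark input
setup = "import random; numbers = [random.randrange(500) for _ in range(5000)]"

loop_stmt = """
unique = []
for n in numbers:
    if n not in unique:
        unique.append(n)
"""

# The loop rescans its result list on every element; set() hashes each once
print("for loop:", timeit.timeit(loop_stmt, setup=setup, number=100))
print("set():  ", timeit.timeit("list(set(numbers))", setup=setup, number=100))
```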
## Using a list comprehension with a tracking set

```python
numbers = [1, 2, 3, 2, 1, 4, 5, 4]
seen = set()
unique_numbers = [x for x in numbers if not (x in seen or seen.add(x))]
print(unique_numbers)
```

```
[1, 2, 3, 4, 5]
```
This elegant solution combines a list comprehension with a tracking set to maintain element order while achieving better performance than a basic loop. The `seen` set efficiently tracks encountered elements, while the list comprehension creates the final unique list.

The clever part lies in the condition `not (x in seen or seen.add(x))`. It leverages Python's short-circuit evaluation and the fact that `add()` returns `None`. Here's how it works:
- For a new element, `x in seen` returns `False`, so Python evaluates `seen.add(x)`. The `add()` method adds the element and returns `None`, which is falsy, so the whole condition evaluates to `True` and the element is kept
- For a repeated element, the `in` check returns `True` immediately, skipping the element without calling `add()` again

The result combines the speed benefits of sets with the order preservation of loops, making it an excellent choice for most deduplication tasks.
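For readers who find the one-liner dense, the same logic written as an explicit loop is functionally equivalent:

```python
numbers = [1, 2, 3, 2, 1, 4, 5, 4]

seen = set()
unique_numbers = []
for x in numbers:
    if x not in seen:            # the short-circuited 'x in seen' test
        seen.add(x)              # the side effect hidden in 'seen.add(x)'
        unique_numbers.append(x)

print(unique_numbers)  # [1, 2, 3, 4, 5]
```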
## Using the `dict.fromkeys()` method

```python
numbers = [1, 2, 3, 2, 1, 4, 5, 4]
unique_numbers = list(dict.fromkeys(numbers))
print(unique_numbers)
```

```
[1, 2, 3, 4, 5]
```
The `dict.fromkeys()` method creates a dictionary using the list elements as keys. Since dictionary keys must be unique, this automatically removes duplicates, and because dictionaries preserve insertion order in Python 3.7 and later, the original sequence survives. Converting back to a list with `list()` produces the final deduplicated sequence.
This approach strikes an excellent balance between code simplicity and performance. It works particularly well for basic data types like numbers and strings that can serve as dictionary keys.
Beyond Python's built-in methods, specialized libraries like `collections`, `pandas`, and NumPy offer powerful tools for handling duplicate values in complex data structures.
## Using `OrderedDict` from `collections`

```python
from collections import OrderedDict

numbers = [1, 2, 3, 2, 1, 4, 5, 4]
unique_numbers = list(OrderedDict.fromkeys(numbers))
print(unique_numbers)
```

```
[1, 2, 3, 4, 5]
```
The `OrderedDict` approach combines the benefits of dictionaries with guaranteed order preservation. Similar to `dict.fromkeys()`, it creates a dictionary using list elements as keys while maintaining their original sequence.

- The `fromkeys()` method automatically discards duplicates since dictionary keys must be unique
- Converting back with `list()` produces the final deduplicated sequence

While `OrderedDict` requires an import from the `collections` module, it provides a dependable solution when maintaining element order is crucial. The slight performance overhead compared to regular dictionaries rarely impacts real-world applications.
## Using the pandas `drop_duplicates()` method

```python
import pandas as pd

data = [(1, 'a'), (2, 'b'), (1, 'a'), (3, 'c')]
df = pd.DataFrame(data, columns=['num', 'letter'])
unique_rows = df.drop_duplicates().values.tolist()
print(unique_rows)
```

```
[[1, 'a'], [2, 'b'], [3, 'c']]
```
Pandas offers a powerful solution for removing duplicates from complex data structures. The `drop_duplicates()` method efficiently handles duplicate rows in a DataFrame, considering all columns by default when determining uniqueness. The `values.tolist()` chain then converts the deduplicated DataFrame back into a familiar Python list format.

This approach particularly shines when working with tabular data or when you need to remove duplicates based on multiple columns, as sketched below. Pandas handles the heavy lifting of comparing complex data structures while maintaining excellent performance.
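If uniqueness should be judged on only some columns, `drop_duplicates()` accepts a `subset` parameter, plus a `keep` parameter that controls which occurrence survives. A minimal sketch, with illustrative column names:

```python
import pandas as pd

df = pd.DataFrame(
    [(1, 'a'), (2, 'b'), (1, 'z'), (3, 'c')],
    columns=['num', 'letter'],
)

# Keep the first row for each unique value in the 'num' column only
deduped = df.drop_duplicates(subset=['num'], keep='first')
print(deduped.values.tolist())  # [[1, 'a'], [2, 'b'], [3, 'c']]
```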
## Using NumPy's `unique()` function with index tracking

```python
import numpy as np

numbers = [4, 1, 3, 2, 1, 4, 5, 3]
unique_indices = np.unique(numbers, return_index=True)[1]
unique_in_order = [numbers[i] for i in sorted(unique_indices)]
print(unique_in_order)
```

```
[4, 1, 3, 2, 5]
```
NumPy's `unique()` function with `return_index=True` returns both the unique values and their first-occurrence positions in the original array. The code leverages these indices to maintain the original order while removing duplicates.

- The `unique_indices` variable captures the positions where each unique number first appears in the list
- `sorted()` ensures elements appear in their original sequence
- The comprehension `[numbers[i] for i in sorted(unique_indices)]` rebuilds the list using only the first occurrences

This approach combines NumPy's efficient array operations with Python's built-in sorting capabilities. It works particularly well for numerical data where maintaining the original order matters.
Text processing often requires extracting unique words while preserving their original sequence, and Python's list comprehension with a tracking set delivers an elegant solution for this common task.
```python
text = "The quick brown fox jumps over the lazy dog. The dog was not amused."
words = text.lower().replace('.', '').split()
seen = set()
unique_words = [word for word in words if not (word in seen or seen.add(word))]
print(unique_words)
```

```
['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog', 'was', 'not', 'amused']
```
This code efficiently extracts unique words from a text string while preserving their original order. The process starts by converting the text to lowercase with `lower()` and removing periods with `replace()`. The `split()` method then creates a list of individual words.

- The `seen` set tracks encountered words
- The condition `not (word in seen or seen.add(word))` cleverly combines checking and tracking in one step

This technique proves especially useful when processing natural language text where maintaining the original word sequence matters. The solution balances readability with efficient memory usage.
When managing user data, a common challenge involves retaining only the most recent record for each unique user ID while discarding outdated entries. This example demonstrates an elegant dictionary-based solution for deduplicating and updating user profiles.
```python
user_records = [
    {"id": 101, "name": "Alice", "timestamp": "2023-01-15"},
    {"id": 102, "name": "Bob", "timestamp": "2023-01-16"},
    {"id": 101, "name": "Alice Smith", "timestamp": "2023-02-20"},
    {"id": 102, "name": "Robert", "timestamp": "2023-02-25"}
]

latest_records = {}
for record in user_records:
    user_id = record["id"]
    if user_id not in latest_records or record["timestamp"] > latest_records[user_id]["timestamp"]:
        latest_records[user_id] = record

unique_users = list(latest_records.values())
print([f"{user['id']}: {user['name']}" for user in unique_users])
```

```
['101: Alice Smith', '102: Robert']
```
This code efficiently handles user profile updates by maintaining only the most recent record for each unique user ID. The `latest_records` dictionary stores user records with their IDs as keys, automatically overwriting older entries when newer timestamps appear.

The core logic lies in the `if` condition. It checks two scenarios: either the user ID doesn't exist yet in `latest_records`, or the current record has a more recent timestamp than the stored one. When either condition is true, the code updates the dictionary with the current record. Note that comparing the timestamps as plain strings works here only because they use the ISO `YYYY-MM-DD` format, which sorts lexicographically in date order.
Python developers often encounter three key challenges when removing duplicates: handling unhashable types, preserving element order, and managing case sensitivity.
## Handling unhashable types like `list` when removing duplicates

Python's `set()` function cannot directly handle lists as elements because lists are mutable. When you try to convert nested lists into a set, Python raises a `TypeError`. The code below demonstrates this common pitfall.
```python
data = [[1, 2], [3, 4], [1, 2], [5, 6]]
unique_data = list(set(data))
print(unique_data)
```

```
TypeError: unhashable type: 'list'
```
The code fails because Python can't hash lists as elements within a `set()`. Lists can change after creation, which makes them incompatible with Python's hash-based data structures. Let's examine a working solution in the code below.
```python
data = [[1, 2], [3, 4], [1, 2], [5, 6]]
unique_data = []
for item in data:
    if item not in unique_data:
        unique_data.append(item)
print(unique_data)
```

```
[[1, 2], [3, 4], [5, 6]]
```
The solution uses a simple `for` loop with `not in` checks to handle unhashable types like lists. This approach works because it compares list elements directly by value instead of trying to hash them. While slower than `set()` on large inputs, since each membership check rescans the result list, it reliably removes duplicates while preserving the original order.
This pattern becomes especially important when processing complex data structures from APIs or file imports that contain nested arrays or objects.
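When the inner lists contain only hashable items, a faster order-preserving alternative converts each inner list to a tuple so it can be tracked in a set. A minimal sketch under that assumption:

```python
data = [[1, 2], [3, 4], [1, 2], [5, 6]]

seen = set()
unique_data = []
for item in data:
    key = tuple(item)  # tuples are hashable, so they can live in a set
    if key not in seen:
        seen.add(key)
        unique_data.append(item)

print(unique_data)  # [[1, 2], [3, 4], [5, 6]]
```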
## Fixing unexpected order changes when using `set()` to remove duplicates

The `set()` function efficiently removes duplicates but scrambles the sequence of elements in your list. This behavior can create unexpected results when order matters. The code below demonstrates how Python's set operations can shuffle the original sequence.
```python
numbers = [10, 5, 3, 5, 10, 8]
unique_numbers = list(set(numbers))
print(unique_numbers)  # element order is arbitrary, e.g. [8, 10, 3, 5]
```
The `set()` operation discards the original sequence, outputting elements in an arbitrary order that may differ between Python runs. The code below demonstrates a reliable solution that maintains the initial ordering.
```python
from collections import OrderedDict

numbers = [10, 5, 3, 5, 10, 8]
unique_numbers = list(OrderedDict.fromkeys(numbers))
print(unique_numbers)
```

```
[10, 5, 3, 8]
```
The `OrderedDict` solution elegantly preserves element sequence while removing duplicates. Unlike `set()`, which returns elements in arbitrary order, `OrderedDict.fromkeys()` maintains the original position of each element in the list. This approach works consistently across Python versions, including those before 3.7, where plain dictionaries did not yet guarantee insertion order.
## Managing case sensitivity when removing duplicates

Python's `set()` function treats strings with different letter cases as distinct elements. When deduplicating text data, this default case-sensitive behavior often produces unexpected results by keeping both the uppercase and lowercase versions of the same word.
```python
words = ["apple", "Apple", "banana", "orange"]
unique_words = list(set(words))
print(unique_words)  # both 'apple' and 'Apple' survive; order is arbitrary
```
The `set()` function treats "apple" and "Apple" as completely different strings, so both variants survive in the final output when we only want one version of each word. Let's examine a solution that handles case differences properly.
```python
words = ["apple", "Apple", "banana", "orange"]
seen = set()
unique_words = []
for word in words:
    if word.lower() not in seen:
        seen.add(word.lower())
        unique_words.append(word)
print(unique_words)
```

```
['apple', 'banana', 'orange']
```
The solution uses a tracking set to store lowercase versions of words while keeping the original case in the output list. The `seen` set checks for duplicates by converting each word to lowercase with `word.lower()`. When a new word appears, the lowercase version enters the tracking set and the original word joins the output list.
This pattern proves especially useful when cleaning data from user forms, processing search queries, or standardizing text datasets where case variations shouldn't create duplicates.
## How do you remove duplicates while preserving order?

To remove duplicates while preserving order, you can use a `set` to track seen elements and build a new list with unique values. The `set` provides constant-time lookups to check whether we've encountered an element before:

- Create an empty `set` to store seen values
- Check each element of the list against the `set`
- If an element is not yet in the `set`, add it to both our result list and the `set`

This approach maintains the original sequence because we process elements in their initial order. It achieves O(n) time complexity since `set` operations are constant time.
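Wrapped as a reusable helper (the function name is illustrative), the recipe looks like this:

```python
def dedupe_preserve_order(items):
    """Return a new list with duplicates removed, keeping first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # O(1) membership check against the set
            seen.add(item)
            result.append(item)
    return result

print(dedupe_preserve_order([1, 2, 3, 2, 1, 4, 5, 4]))  # [1, 2, 3, 4, 5]
```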
## What's the difference between `set()` and `dict.fromkeys()`?

While `set()` and `dict.fromkeys()` both remove duplicates, they serve different purposes. `set()` creates an unordered collection of unique elements, perfect for simple deduplication. `dict.fromkeys()` generates a dictionary where the input elements become keys, all mapped to the same default value.

The key distinction lies in data preservation. `set()` discards duplicates entirely and forgets the original order. `dict.fromkeys()` maintains the original items as ordered dictionary keys while adding the capability to associate values with them.
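A quick comparison of the two in action:

```python
numbers = [1, 2, 3, 2, 1, 4, 5, 4]

print(set(numbers))                  # {1, 2, 3, 4, 5} (no order guarantee)
print(dict.fromkeys(numbers))        # {1: None, 2: None, 3: None, 4: None, 5: None}
print(dict.fromkeys(numbers, 0))     # same keys, each mapped to the supplied value 0
print(list(dict.fromkeys(numbers)))  # [1, 2, 3, 4, 5] (insertion order preserved)
```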
## Can you remove duplicates from a list of dictionaries?

You can't directly remove duplicates from a list of dictionaries using `set()`, since dictionaries aren't hashable. However, you can convert the dictionaries to immutable tuples first: transform each dictionary into a tuple of sorted items using `tuple(sorted(d.items()))`.

This approach works because tuples of hashable elements are themselves hashable. After converting to a set of tuples to remove duplicates, transform the unique tuples back into dictionaries.
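Here is a minimal sketch of that round trip; it assumes every dictionary value is itself hashable:

```python
records = [{"a": 1, "b": 2}, {"b": 2, "a": 1}, {"a": 3}]

# Each dict becomes a hashable tuple of sorted (key, value) pairs
unique_tuples = {tuple(sorted(d.items())) for d in records}

# Convert the unique tuples back into dictionaries
unique_dicts = [dict(t) for t in unique_tuples]
print(unique_dicts)  # [{'a': 1, 'b': 2}, {'a': 3}] (order not guaranteed)
```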
## How do you remove duplicates based on a custom key?

Python's `dict.fromkeys()` efficiently removes duplicates while preserving order. For more control, combine a dictionary comprehension with a key function that specifies your deduplication criteria. This approach lets you define custom logic for determining what makes items unique.

This method works because dictionaries inherently maintain unique keys. The key function transforms your data into a form that captures the essence of what makes each item distinct.
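One way to express that idea is sketched below; the lowercase key function is just an illustrative choice, and note that later occurrences overwrite earlier ones:

```python
words = ["apple", "Apple", "BANANA", "banana", "orange"]

# Dictionary comprehension keyed on a custom function; the *last*
# occurrence of each key survives because later entries overwrite earlier ones
unique_by_key = {w.lower(): w for w in words}
print(list(unique_by_key.values()))  # ['Apple', 'banana', 'orange']
```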
## Does `set()` modify the original list?

The `set()` function creates an entirely new object without modifying your original list. When you convert a list to a set, Python generates a new data structure in memory that contains only unique elements. The original list remains unchanged unless you explicitly reassign it.

To permanently remove duplicates from your list, you'll need to convert the set back to a list and reassign it, a process that creates a third object in memory. This explains why `list(set())` is a common pattern for deduplication.