Python Hamming Distance














































Python Hamming Distance




Hamming Distance

The Hamming distance between two strings of the same length is the number of positions in which the corresponding symbols are different. This metric, proposed by Richard W. Hamming in his seminal paper "Error Detecting and Error Correcting Codes", Bell System Technical Journal 26(2):147-160 (1950), also expresses the least number of (intended or unintended) substitutions needed to change one string to another.

Here is a Wikipedia image with a brief explanation and examples:


                      https://en.wikipedia.org/wiki/Hamming_distance

Example: Hamming Distance between ATCGATCG and ATCCATGG is 2.

Besides being used in computer- and communications-related fields such as information theory, coding theory, and cryptography, the Hamming distance concept has also found its way into genomics for the comparison of genomic sequences. Its importance can also be judged by the fact that modern microprocessors have a specific instruction (POPCNT, "population count") for counting the number of bits set to 1.

Assuming two bit strings of equal length, x and y, the 'obvious' algorithm to calculate the Hamming distance between them is to count the number of bits in the result of the expression 'x xor y', as shown in the following Python code:

1
2
3
4
5
6
7
8
9
def hamming1(x,y):
    """Calculate the Hamming distance between two bit strings"""
    assert len(x) == len(y)
    count,z = 0,int(x,2)^int(y,2)
    while z:
        if z&1:
            count += 1
        z >>= 1
    return count

While searching the web the other day, I have come across a very elegant algorithm for the same purpose, which can be coded in Python as follows:

1
2
3
4
5
6
7
8
def hamming2(x,y):
    """Calculate the Hamming distance between two bit strings"""
    assert len(x) == len(y)
    count,z = 0,int(x,2)^int(y,2)
    while z:
        count += 1
        z &= z-1 # magic!
    return count

Apart from being 50% faster than the first one, its simplicity is awesome! After being counted, the lowest-order nonzero bit is cleared . 



More Articles of Aditi Kothiyal:

Name Views Likes
Python AdaBoost Mathematics Behind AdaBoost 421 1
Python PyCaret How to optimize the probability threshold % in binary classification 2069 0
Python K-means Predicting Iris Flower Species 1322 2
Python PyCaret How to ignore certain columns for model building 2624 0
Python PyCaret Experiment Logging 679 0
Python PyWin32 Open a File in Excel 941 0
Python Guppy GSL Introduction 219 2
Python Usage of Guppy With Example 1100 2
Python Naive Bayes Tutorial 552 2
Python Guppy Recent Memory Usage of a Program 892 2
Introduction to AdaBoost 289 1
Python AdaBoost Implementation of AdaBoost 512 1
Python AdaBoost Advantages and Disadvantages of AdaBoost 3713 1
Python K-Means Clustering Applications 332 2
Python Random Forest Algorithm Decision Trees 439 0
Python K-means Clustering PREDICTING IRIS FLOWER SPECIES 456 1
Python Random Forest Algorithm Bootstrap 475 0
Python PyCaret Util Functions 441 0
Python K-means Music Genre Classification 1763 1
Python PyWin Attach an Excel file to Outlook 1540 0
Python Guppy GSL Document and Test Example 248 2
Python Random Forest Algorithm Bagging 386 0
Python AdaBoost An Example of How AdaBoost Works 279 1
Python PyWin32 Getting Started PyWin32 602 0
Python Naive Bayes in Machine Learning 374 2
Python PyCaret How to improve results from hyperparameter tuning by increasing "n_iter" 1723 0
Python PyCaret Getting Started with PyCaret 2.0 356 1
Python PyCaret Tune Model 1325 1
Python PyCaret Create your own AutoML software 320 0
Python PyCaret Intoduction to PyCaret 296 1
Python PyCaret Compare Models 2696 1
Python PyWin Copying Data into Excel 1152 0
Python Guppy Error: expected function body after function declarator 413 2
Python Coding Random forest classifier using xgBoost 246 0
Python PyCaret How to tune "n parameter" in unsupervised experiments 658 0
Python PyCaret How to programmatically define data types in the setup function 1403 0
Python PyCaret Ensemble Model 805 1
Python Random forest algorithm Introduction 227 0
Python k-means Clustering Example 337 1
Python PyCaret Plot Model 1243 1
Python Hamming Distance 715 0
Python Understanding Random forest algorithm 309 0
Python PyCaret Sort a Dictionary by Keys 244 0
Python Coding Random forest classifier using sklearn 340 0
Python Guppy Introduction 368 2
Python How to use Guppy/Heapy for tracking down Memory Usage 1068 2
Python AdaBoost Summary and Conclusion 231 1
Python PyCaret Create Model 365 1
Python k -means Clusturing Introduction 325 2
Python k-means Clustering With Example 348 2

Comments