Automatic Vehicle Number Plate Recognition using OpenCV and Tesseract OCR A Real World Python project

Automatic Vehicle Number Plate Recognition is an image-processing technology that identifies vehicles by their license plates. It is used in a range of security and traffic applications. In this project we use Tesseract, an Optical Character Recognition (OCR) engine, to automatically read the text on vehicle registration plates.
Steps involved in License Plate Recognition
1. License Plate Detection: The first step is to detect the license plate on the car. We will use OpenCV's contour detection to look for rectangular objects that could be the number plate. Accuracy improves if we know the exact size, color and approximate location of the plate. In practice the detection algorithm is usually tuned to the camera position and the type of number plate used in a particular country. Detection gets trickier if the image may not contain a car at all; in that case we need an additional step that detects the car first and then the license plate.
2. Character Segmentation: Once we have detected the license plate, we crop it out and save it as a new image. This is also easy to do with OpenCV.

3. Character Recognition: The cropped image from the previous step should contain characters (letters and digits). We run OCR (Optical Character Recognition) on it to read the plate number.

Python-tesseract:
Python-tesseract (pytesseract) is an optical character recognition (OCR) tool for Python. That is, it will recognize and read the text embedded in images. Python-tesseract is a wrapper for Google's Tesseract-OCR Engine. It is also useful as a stand-alone script, because it can read the common image types such as jpeg, png, gif, bmp and tiff. When used as a script, Python-tesseract prints the recognized text instead of writing it to a file. The Tesseract engine can recognize more than 100 languages.
Installation:
pip install pytesseract

OpenCV:
OpenCV is an open source computer vision library. The library has more than 2500 optimized algorithms. These algorithms are often used to search and recognize faces, identify objects, recognize scenery and generate markers to overlay images using augmented reality, etc.

Installation:
pip install opencv-python

Note: make sure the pytesseract and opencv-python modules are installed properly. pytesseract is only a wrapper, so the Tesseract OCR engine itself must also be installed on your system. The code below also uses matplotlib (pip install matplotlib).
Note: have the dataset ready before you start. For best performance, preprocess the images as shown in the image processing techniques section. Keep the dataset folder in the same folder as your Python code, or specify the path to the dataset manually wherever it is needed.

Procedure:
# Loading the required python modules
import pytesseract    # Python wrapper for the Tesseract OCR engine
import matplotlib.pyplot as plt
import cv2      # OpenCV module
import glob
import os
Note: the name of each image file has to be the exact number on the license plate it shows. Example: if you have an image of a license plate with the number FTY349U, name the image file FTY349U.jpg.

Code: Perform OCR using the Tesseract Engine on license plates

# Specify the path to the license plate images folder as shown below
path_for_license_plates = os.path.join(os.getcwd(), "license-plates", "**", "*.jpg")
list_license_plates = []
predicted_license_plates = []

for path_to_license_plate in glob.glob(path_for_license_plates, recursive=True):

    # The file name without its extension is the actual plate number;
    # os.path.basename works with the platform's path separator
    license_plate_file = os.path.basename(path_to_license_plate)
    license_plate, _ = os.path.splitext(license_plate_file)

    # Here we append the actual license plate to a list
    list_license_plates.append(license_plate)

    # Read each license plate image file using OpenCV
    img = cv2.imread(path_to_license_plate)

    # We then pass each license plate image file to the Tesseract OCR
    # engine using the Python library wrapper for it. We get back
    # predicted_result for the license plate. We append the cleaned
    # result to a list so it can be compared with the actual plate.
    # Note: the whitelist must be written without spaces around "=".
    predicted_result = pytesseract.image_to_string(
        img, lang='eng',
        config='--oem 3 --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')

    filter_predicted_result = "".join(predicted_result.split()).replace(":", "").replace("-", "")
    predicted_license_plates.append(filter_predicted_result)
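
The filtering step above removes only whitespace, colons and hyphens. Depending on the Tesseract version and engine mode, the character whitelist may not be applied, so other stray symbols can still appear in the output. One possible alternative, sketched here with a hypothetical helper named normalize_plate_text, is to keep only uppercase letters and digits after OCR:

```python
import re


def normalize_plate_text(raw_text):
    """Uppercase the OCR output and drop everything except A-Z and 0-9."""
    return re.sub(r"[^A-Z0-9]", "", raw_text.upper())


print(normalize_plate_text(" gwt-2180:\n"))  # GWT2180
```

This is a post-processing choice, not part of pytesseract itself. It assumes plates contain only Latin letters and digits.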

Now we have the predicted plates, but we haven't yet seen the predictions. To view the data and the predictions we print them side by side, as shown below. We also calculate the accuracy of each prediction without using any built-in function.


print("Actual License Plate", " ", "Predicted License Plate", " ", "Accuracy") 
print("--------------------", " ", "-----------------------", " ", "--------") 
  
def calculate_predicted_accuracy(actual_list, predicted_list): 
    for actual_plate, predict_plate in zip(actual_list, predicted_list): 
        accuracy = "0 %"
        num_matches = 0
        if actual_plate == predict_plate: 
            accuracy = "100 %"
        else: 
            if len(actual_plate) == len(predict_plate): 
                for a, p in zip(actual_plate, predict_plate): 
                    if a == p: 
                        num_matches += 1
                # Multiply before rounding to avoid results such as 57.99999999999999
                accuracy = str(round(100 * num_matches / len(actual_plate), 2))
                accuracy += " %"
        print("     ", actual_plate, " ", predict_plate, "   ", accuracy) 
  
          
calculate_predicted_accuracy(list_license_plates, predicted_license_plates)
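
The function above prints a formatted string for each plate. If the scores are needed for further processing, a variant that returns numbers may be more convenient. The sketch below applies the same rule (position-by-position matches, and 0 when the lengths differ) and adds a small summary; the names plate_accuracy and summarize are made up for this example, and the sample predictions are illustrative rather than real OCR output.

```python
def plate_accuracy(actual, predicted):
    """Percentage of character positions that match (0.0 if lengths differ)."""
    if actual == predicted:
        return 100.0
    if len(actual) != len(predicted):
        return 0.0
    matches = sum(a == p for a, p in zip(actual, predicted))
    return round(100 * matches / len(actual), 2)


def summarize(actual_list, predicted_list):
    """Return (number of exact matches, mean accuracy in percent)."""
    scores = [plate_accuracy(a, p) for a, p in zip(actual_list, predicted_list)]
    exact = sum(score == 100.0 for score in scores)
    mean = round(sum(scores) / len(scores), 2) if scores else 0.0
    return exact, mean


# Illustrative values only
print(plate_accuracy("FTY349U", "FTY349O"))            # 85.71
print(summarize(["GWT2180", "AB12"], ["GWT2180", "AB13"]))  # (1, 87.5)
```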

Code: Image Processing Techniques

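# Read the license plate file and display it
test_license_plate = cv2.imread(os.getcwd() + "/license-plates/GWT2180.jpg")
# OpenCV loads images as BGR; convert to RGB so matplotlib shows the right colors
plt.imshow(cv2.cvtColor(test_license_plate, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.title('GWT2180 license plate')
plt.show()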


Image resizing:
Resize the image by a factor of 2 in both the horizontal and vertical directions using cv2.resize.

resize_test_license_plate = cv2.resize( 
    test_license_plate, None, fx = 2, fy = 2,  
    interpolation = cv2.INTER_CUBIC)

Converting to Gray-scale: Next, we convert the resized image to grayscale. This reduces the three color channels to a single intensity channel, which removes color information that OCR does not need and makes the characters easier to separate from the background.

grayscale_resize_test_license_plate = cv2.cvtColor( 
    resize_test_license_plate, cv2.COLOR_BGR2GRAY) 

Denoising the Image:
Gaussian Blur is a technique for denoising images. It suppresses high-frequency noise and smooths the character outlines, which in turn can make the characters more readable for the OCR engine.

gaussian_blur_license_plate = cv2.GaussianBlur( 
    grayscale_resize_test_license_plate, (5, 5), 0) 
Now, pass the transformed license plate file to the Tesseract OCR engine and see the predicted result.


new_predicted_result_GWT2180 = pytesseract.image_to_string(gaussian_blur_license_plate, lang='eng',
    config='--oem 3 --psm 6 -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
filter_new_predicted_result_GWT2180 = "".join(new_predicted_result_GWT2180.split()).replace(":", "").replace("-", "") 
print(filter_new_predicted_result_GWT2180)
Output:
GWT2180 

Similarly, apply this image processing to all other number plates that did not get 100% accuracy.
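
One way to organize that second pass is sketched below. It re-reads only the plates whose first prediction was wrong and leaves the correct ones untouched. The function retry_failed_plates and its read_plate argument are hypothetical: read_plate stands for whatever routine loads the image for a plate, preprocesses it and runs Tesseract. A dictionary lookup is used as a stand-in here so the example runs without images, and the sample predictions are illustrative.

```python
def retry_failed_plates(actual_plates, predicted_plates, read_plate):
    """Return a new prediction list with failed plates re-read via read_plate.

    read_plate is a hypothetical callable: plate name -> new predicted text.
    """
    improved = list(predicted_plates)
    for index, (actual, predicted) in enumerate(zip(actual_plates, predicted_plates)):
        if actual != predicted:
            improved[index] = read_plate(actual)
    return improved


# Stand-in for "load image, preprocess, run OCR"; values are illustrative
second_pass = {"GWT2180": "GWT2180"}.get

actual = ["FTY349U", "GWT2180"]
first_pass = ["FTY349U", "GWT2I80"]
print(retry_failed_plates(actual, first_pass, second_pass))
# ['FTY349U', 'GWT2180']
```

Because the file name is the plate number in this dataset, the plate name is enough to locate the image again.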


