Augmented Reality Sudoku Solver: Extracting Sudoku from Image
This article is a continuation of the Augmented Reality Sudoku Solver series, in which we build a GUI-based AR Sudoku Solver. In this article, we talk about the image-processing pipeline of the project and how to implement our approach in Python using OpenCV. We discuss the pipeline as a series of steps to make the code easier to follow.
We aim to solve any N x N sudoku, where N is a composite (non-prime) number. The project will be implemented in two forms -
- An option to load an image saved on the system or use the webcam to feed the image to the program and then play the game on the system.
- An option to use Augmented Reality and solve the sudoku by showing the puzzle to the webcam.
The project uses Dancing Links in the form of Algorithm X to find the solution of the Sudoku puzzle. Generalized Sudoku is a well-known NP-Complete problem, and Algorithm X is a means of implementing a backtracking depth-first search to find the appropriate solution. The project will be blogged in 4 parts -
- Part I — Understanding the Sudoku Solver i.e. the Algorithm that goes into solving the Sudoku.
- Part II — Processing Image from Camera to be able to extract the grid of the Sudoku.
- Part III — Processing the Image and the respective model to detect the numerical values in each cell.
- Part IV — Building the GUI using PyGame.
Part II: Processing the image from the camera —
For processing the image, we use the Python OpenCV library. It is a simple step-by-step approach in the following order -
- Detect the outer grid of the sudoku and return the area inside this grid.
- Apply warp perspective transformation by finding the corners of the contour to convert the sudoku from being part of the image to the complete image itself.
- Get the dimensions of the sudoku, i.e. extract the number of rows and columns in a sub-box of the puzzle.
- Get the final matrix, by applying a convolutional neural network model on each cell to extract the digit value.
Sudoku Image Processing Class -
Here, we first initialize the class with the following variables (a minimal initialization sketch follows the list) -
- Image: NumPy N-dimensional array containing the image data, if it is being loaded from the camera.
- Fname: The filename of the image, if being loaded from a file.
- Game_Size: This is the total number of cells in a row or column of the sudoku.
- Box_Row: This is the number of rows in a box of the sudoku.
- Box_Col: This is the number of columns in a box of the sudoku.
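Below is a minimal sketch of how such a class might be initialized; the class name and default values are illustrative assumptions, not the exact implementation.

```python
import cv2

class SudokuImage:
    # Hypothetical class name; the fields follow the variables described above.
    def __init__(self, image=None, fname=None, game_size=9, box_row=3, box_col=3):
        # Load from disk if a filename is given, otherwise use the camera frame.
        self.image = cv2.imread(fname) if fname is not None else image
        self.game_size = game_size   # cells per row/column of the puzzle
        self.box_row = box_row       # rows in one sub-box
        self.box_col = box_col       # columns in one sub-box
```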
Step 1: Detect the outer sudoku grid -
The “get_grid” method finds the largest contour and masks the area outside it with white pixels. Here we do the following -
We first convert the image to grayscale. Then we apply a Gaussian blur using a kernel of (5,5) with zero standard deviation along the x-axis. We then apply adaptive thresholding, which computes a threshold for each local region of the image and thresholds that region accordingly. You can learn more about adaptive thresholding here.
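A rough sketch of this pre-processing step; only the (5, 5) kernel comes from the description above, while the block size and constant passed to cv2.adaptiveThreshold are illustrative assumptions (image is the BGR frame).

```python
import cv2

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)      # (5, 5) kernel, zero sigma
thresh = cv2.adaptiveThreshold(
    blur, 255,
    cv2.ADAPTIVE_THRESH_GAUSSIAN_C,           # local, per-region thresholds
    cv2.THRESH_BINARY_INV,                    # invert so grid lines become white
    11, 2)                                    # block size and constant are assumptions
```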
In order to find the outer edges in the image, we need to find the contours in the image. A contour refers to the outline or silhouette of an object which in our case is the outline of the Sudoku Puzzle.
We use the findContours method of the OpenCV library: we pass the thresholded image, cv2.RETR_TREE to compute the hierarchical relationship between the contours of the image, and cv2.CHAIN_APPROX_SIMPLE to compress the contours in order to save space. From these contours, we then find the contour with the maximum area using cv2.contourArea as the key.
After finding the largest contour, we perform a sanity check to ensure that its area is greater than 250x250 pixels. Next, we generate two masks: a black mask in which the inner region of the selected contour is white, and a white mask in which the outer region of the selected contour is white. Then we copy the white-valued pixel positions of the original gray image onto the mask with the white outer region to get the images below. This area is our Region of Interest or ROI.
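A simplified sketch of the contour search and masking, continuing from the previous snippet; the OpenCV 4.x return signature of cv2.findContours is assumed, and the exact masking logic may differ from the original code.

```python
import cv2
import numpy as np

contours, _ = cv2.findContours(thresh, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)

if cv2.contourArea(largest) > 250 * 250:               # sanity check on size
    inner = np.zeros(gray.shape, np.uint8)
    cv2.drawContours(inner, [largest], -1, 255, -1)    # white inside the contour
    roi = np.full(gray.shape, 255, np.uint8)           # white outside the ROI
    roi[inner == 255] = gray[inner == 255]             # copy the puzzle region over
```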
We notice that the ROI image is slightly skewed and does not fit the complete image frame, i.e. the puzzle is not viewed head-on in its own plane. Ideally, we would want a top-down, bird's-eye view of the puzzle.
Step 2: Apply warp perspective transformation -
The “get_warped” method utilizes cv2.getPerspectiveTransform and cv2.warpPerspective to accomplish this transformation.
We first perform a sanity check to ensure that the previous step was successful and we have an image with our ROI. We then find the perimeter of the contour surrounding the puzzle using cv2.arcLength. Since we know that the box will be a rectangle, it will have four vertices, which we find using cv2.approxPolyDP. To approximate the contour, we use an approximation precision of 1.5% of its perimeter.
After finding the corner coordinates of this image, we choose the positions of the corner coordinates of the output warped image. To perform the perspective transformation, we need a transformation matrix, which is calculated using cv2.getPerspectiveTransform by passing the corner coordinates of the puzzle box from the original image followed by the four points we specified for our output image. We then apply the transformation using cv2.warpPerspective.
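A sketch of this step, continuing from the previous snippet; the output side length, and the assumption that cv2.approxPolyDP returns exactly four corners already in a consistent order, are simplifications (real code usually sorts the corners first).

```python
import cv2
import numpy as np

peri = cv2.arcLength(largest, True)
corners = cv2.approxPolyDP(largest, 0.015 * peri, True)   # ~1.5% of the perimeter

side = 450                                                 # assumed output image size
src = np.float32(corners.reshape(4, 2))                    # assumes 4 ordered corners
dst = np.float32([[0, 0], [side, 0], [side, side], [0, side]])
warp_matrix = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(roi, warp_matrix, (side, side))
```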
Step 3: Get the dimensions of the sudoku -
To get the dimensions using the “get_dimensions” method, we first threshold the image using Otsu’s Binarization. Post binarization, we are left with an image consisting of zeros and ones only. Now we apply simple matrix operations to deduce the sub-grid dimension.
To get the dimensions, consider an image of 22x22 pixels post-binarization as displayed below.
Here, ⬛ denotes a pixel of value 0 and 🔲 denotes a pixel of value 1.
⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛
⬛🔲⬛⬛🔲⬛🔲🔲⬛🔲⬛⬛⬛⬛⬛🔲⬛⬛⬛⬛⬛⬛
⬛⬛🔲⬛🔲⬛🔲🔲⬛⬛⬛⬛⬛⬛🔲🔲⬛🔲⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛🔲⬛⬛⬛⬛🔲🔲⬛🔲⬛⬛⬛🔲⬛⬛⬛
🔲⬛🔲⬛⬛🔲🔲🔲⬛🔲⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛⬛🔲⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛
🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲⬛🔲🔲🔲🔲🔲🔲🔲🔲🔲
🔲🔲⬛🔲🔲🔲🔲⬛🔲🔲🔲🔲🔲🔲⬛🔲⬛🔲🔲⬛🔲🔲
⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛🔲⬛🔲🔲⬛⬛⬛⬛⬛⬛
⬛🔲⬛⬛⬛⬛🔲🔲⬛⬛🔲⬛⬛⬛🔲🔲⬛🔲⬛⬛⬛⬛
⬛⬛⬛⬛🔲⬛🔲🔲⬛⬛⬛⬛⬛⬛🔲⬛⬛⬛⬛🔲⬛⬛
⬛⬛⬛⬛⬛⬛⬛🔲⬛⬛⬛🔲⬛⬛🔲🔲⬛⬛⬛⬛⬛🔲
⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛⬛🔲⬛⬛⬛⬛⬛⬛
🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲⬛🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲
🔲🔲⬛🔲🔲🔲🔲⬛🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲🔲
⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛
⬛🔲⬛⬛⬛⬛🔲🔲⬛🔲⬛⬛⬛⬛⬛🔲⬛⬛⬛⬛⬛🔲
⬛⬛⬛⬛🔲⬛🔲🔲⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛🔲⬛🔲⬛⬛⬛🔲⬛⬛🔲⬛⬛🔲🔲⬛⬛⬛
⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛
⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛🔲🔲⬛⬛⬛⬛⬛⬛
The white pixels in each sub-grid or the black pixels in the white grid lines can be treated as noise. We then find the sum of the image matrix across the vertical and the horizontal axis.
⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 4
⬛🔲⬛⬛🔲⬛ 🔲🔲 ⬛🔲⬛⬛⬛⬛ ⬛🔲 ⬛⬛⬛⬛⬛⬛ 6
⬛⬛🔲⬛🔲⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛🔲⬛⬛⬛⬛ 7
⬛⬛⬛⬛⬛⬛ 🔲⬛ ⬛⬛⬛🔲🔲⬛ 🔲⬛ ⬛⬛🔲⬛⬛⬛ 5 H
🔲⬛🔲⬛⬛🔲 🔲🔲 ⬛🔲⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 8 o
⬛⬛⬛⬛⬛⬛ ⬛🔲 ⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 3 r
🔲🔲🔲🔲🔲🔲 🔲🔲 🔲🔲🔲🔲⬛🔲 🔲🔲 🔲🔲🔲🔲🔲🔲 21 i
🔲🔲⬛🔲🔲🔲 🔲⬛ 🔲🔲🔲🔲🔲🔲 ⬛🔲 ⬛🔲🔲⬛🔲🔲 17 z
⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 4 o
⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛🔲⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 5 n
⬛🔲⬛⬛⬛⬛ 🔲🔲 ⬛⬛🔲⬛⬛⬛ 🔲🔲 ⬛🔲⬛⬛⬛⬛ 7 t
⬛⬛⬛⬛🔲⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 🔲⬛ ⬛⬛⬛🔲⬛⬛ 5 a
⬛⬛⬛⬛⬛⬛ ⬛🔲 ⬛⬛⬛🔲⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛🔲 5 l
⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ ⬛🔲 ⬛⬛⬛⬛⬛⬛ 3
🔲🔲🔲🔲🔲🔲 🔲🔲 🔲🔲⬛🔲🔲🔲 🔲🔲 🔲🔲🔲🔲🔲🔲 21 S
🔲🔲⬛🔲🔲🔲 🔲⬛ 🔲🔲🔲🔲🔲🔲 🔲🔲 🔲🔲🔲🔲🔲🔲 20 u
⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 4 m
⬛🔲⬛⬛⬛⬛ 🔲🔲 ⬛🔲⬛⬛⬛⬛ ⬛🔲 ⬛⬛⬛⬛⬛🔲 6
⬛⬛⬛⬛🔲⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 5
⬛⬛⬛⬛⬛🔲 ⬛🔲 ⬛⬛⬛🔲⬛⬛ 🔲⬛ ⬛🔲🔲⬛⬛⬛ 6
⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 4
⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 🔲🔲 ⬛⬛⬛⬛⬛⬛ 4
5 7 4 4 8 6 19 19 4 7 4 7 5 4 18 19 3 7 6 4 4 6
Vertical Sum
Each row and column sum is then checked against two-thirds of the image width or height respectively; if it exceeds that threshold, the corresponding entry in a new boolean array is marked as 1, else 0.
Height of Image = 22
Width of Image = 22
66% of Height = 15
66% of Width = 15
Vertical Sum = [5 7 4 4 8 6 19 19 4 7 4 7 5 4 18 19 3 7 6 4 4 6]
Horizontal Sum = [4 6 7 5 8 3 21 17 4 5 7 5 5 3 21 20 4 6 5 6 4 4]
Vertical Bool = [0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0]
Horizontal Bool = [0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0]
Total number of 0 to 1 Transitions in Vertical Bool = 2
Total number of 0 to 1 Transitions in Horizontal Bool = 2
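Each 0-to-1 transition marks the start of an internal box border, so two transitions per axis split this example into three box-bands per direction, which corresponds to 3x3 sub-boxes for a 9x9 puzzle. A sketch of this deduction on the grayscale warped image, with thresholds and variable names as assumptions:

```python
import cv2
import numpy as np

# Otsu's binarization on the grayscale warped puzzle image (values become 0/1).
_, binary = cv2.threshold(warped, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

h, w = binary.shape
vertical_sum = binary.sum(axis=0)                  # one value per column
horizontal_sum = binary.sum(axis=1)                # one value per row

# A column/row is a grid line if more than ~2/3 of its pixels are white.
vertical_bool = (vertical_sum > 2 * h / 3).astype(int)
horizontal_bool = (horizontal_sum > 2 * w / 3).astype(int)

# Count 0 -> 1 transitions; each marks the start of an internal box border.
v_transitions = int(np.sum((vertical_bool[:-1] == 0) & (vertical_bool[1:] == 1)))
h_transitions = int(np.sum((horizontal_bool[:-1] == 0) & (horizontal_bool[1:] == 1)))

boxes_across = v_transitions + 1                   # box-bands along the width
boxes_down = h_transitions + 1                     # box-bands along the height
```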
Step 4a): Pre-Process Digit Cells -
The “preprocess_digit” method is used to pre-process each cell in the puzzle for detection by the convolutional neural network. We first resize the image to 112x112 pixels. We then apply a Gaussian blur and threshold the image to convert the values to either 0 or 1. We then clear the border to remove noise from the edges. We finally resize the image back to 28x28 to make it suitable for our digit recognition model.
Next, we check if the number of white pixels in the cell image is less than 10, if so we assume that the cell is empty and return None. Then we scale the pixels with values less than 150 to 75% of their value and the rest to double their value.
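A sketch of this preprocessing, assuming skimage's clear_border for the border-clearing step, a 0/255 threshold, and a (5, 5) blur kernel; the scaling constants follow the description above.

```python
import cv2
import numpy as np
from skimage.segmentation import clear_border

def preprocess_digit(cell):
    cell = cv2.resize(cell, (112, 112))
    cell = cv2.GaussianBlur(cell, (5, 5), 0)
    _, cell = cv2.threshold(cell, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cell = clear_border(cell)                    # drop blobs touching the border
    cell = cv2.resize(cell, (28, 28))

    if cv2.countNonZero(cell) < 10:              # too few white pixels: empty cell
        return None

    # Suppress faint pixels, boost strong ones, then clip to the valid range.
    cell = np.where(cell < 150, cell * 0.75, cell * 2.0)
    return np.clip(cell, 0, 255).astype(np.uint8)
```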
Step 4b): Get the Final Matrix -
Here, we first get the warped image and the dimensions of the sudoku puzzle. Next, we initialize an empty matrix and load our trained model file. We then determine the width and height of each cell and iterate over each cell. Once we have done the preprocessing of the cell, we input it into our model to find the digit value of the cell.
We then perform a sanity check to ensure that no other cell in the same row, column, or box has the same value. If one does, we also look up the prediction score for that cell: the cell with the higher prediction score keeps the value, and the other is assigned its second-highest-scoring digit.
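A simplified sketch of the per-cell loop (the duplicate-resolution step is omitted, and a Keras-style model.predict API is an assumption):

```python
import numpy as np

def get_matrix(warped, game_size, model):
    matrix = np.zeros((game_size, game_size), dtype=int)
    cell_h = warped.shape[0] // game_size
    cell_w = warped.shape[1] // game_size

    for r in range(game_size):
        for c in range(game_size):
            cell = warped[r * cell_h:(r + 1) * cell_h,
                          c * cell_w:(c + 1) * cell_w]
            digit = preprocess_digit(cell)
            if digit is None:
                continue                                  # empty cell stays 0
            scores = model.predict(digit.reshape(1, 28, 28, 1))[0]
            matrix[r][c] = int(np.argmax(scores))
    return matrix
```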
Plot on Image -
The “plot_on_image” method is used to draw the solution over the image. We perform all the previously mentioned steps to reach the warped image stage. Then we iterate over each cell image and use cv2.putText to draw the value in the cell.
Note: We have plotted our solution over the warped image, so to transform it back to the original image, we first calculate the matrix inverse of the warping transformation matrix we got from cv2.getPerspectiveTransform and use cv2.warpPerspective to apply the inverse of the initial transformation, mapping the drawing back to the original image positions.
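A sketch of drawing the digits and undoing the warp; warp_matrix is the matrix obtained from cv2.getPerspectiveTransform earlier, a three-channel warped image is assumed for colored text, and puzzle/solution are hypothetical names for the original and solved grids.

```python
import cv2
import numpy as np

for r in range(game_size):
    for c in range(game_size):
        if puzzle[r][c] == 0:                             # only draw originally empty cells
            x = c * cell_w + cell_w // 4
            y = r * cell_h + 3 * cell_h // 4
            cv2.putText(warped, str(solution[r][c]), (x, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

# Invert the forward transform and warp the drawing back onto the source frame.
inverse = np.linalg.inv(warp_matrix)
restored = cv2.warpPerspective(warped, inverse,
                               (image.shape[1], image.shape[0]))
```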
You can find the complete Python implementation here. If you found this article helpful, please follow me on Medium and GitHub and star the Project Repository.
You can find a detailed explanation of the complete implementation of this project here.
Do check out other parts of this article series -