When Haidilao Hotpot Maze Meets Image Processing

Image processing, computer vision and AI + IoT + 5G have been changing our lives and will keep doing so. They should not be something mysterious, but something applied to daily life.

This is an article inspired by a free side dish + my curious soul.

I will divide my article into 2 sections: the fun part and the technical part.

The Fun Part will show you what this project does and what it looks like. That's it, simple and short :) But if you are as curious as I am and would like to know the implementation details, keep reading.

I treat the Technical Part as my technical notes, to remind me what I did to solve the problems and why I did it.

The Fun Part

It was a Sunday afternoon. We had been cooking for ages and were sick of it... so we finally decided to try the expensive hotpot place, Haidilao. Haidilao is famous for its premium service and its high prices.

I knew they offer nail and hand-care services to customers waiting in the queue, and yes, I actually got the service :)

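My goal was simple: I just wanted that free plate of food~ I never expected it to lead to all of this~

I will split this article into two parts: the fun part, and a technical look at how this little project was implemented.

The fun part shows what the project looks like and what it can do. Short and sweet. But if you are as curious as I am about how it works, keep reading into the technical part.

The technical part is essentially my notebook: it records what I did to solve the problem and why. It does not go into the step-by-step how-to.

One fine, sunny Sunday, after cooking at home for ages and being stuck in Australia for more than a year without being able to go home, life had become rather monotonous... In the end we decided to eat out and try Haidilao, which we had heard about for so long but had never been willing to splurge on. Everyone knows Haidilao's service is first-rate, with free manicures and hand care while you queue. I was going to try that too~!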

nail service

But what I did not know was that they would also hand me a little maze to play with to kill the waiting time. Even better, if I solved it while I was still in the queue, they would reward me with a free dish 😊!

maze

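Sure enough, that is how it was when we arrived! What I did not know was that while queuing we were also handed two small slips of paper with a maze printed on them. Solve it and you get a free dish~! Heh heh, isn't that a giveaway~ Way too easy~

In less than 10 minutes, Lily had beaten me.... Okay, ahem, if I can't beat you, I'll write a program to beat you. Extracting a maze from a photo looks fun~! It also falls squarely within image processing, which happens to be my favorite~

Let's first take a look at the whole program.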

A piece of cake~!

Within 10 minutes, Lily beat me.... huh.... okay, if I couldn't beat you, I could make a program to beat you. Extracting a maze from a photo looks like a good fit for Image Processing!

Here is what it looks like~!

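A quick introduction: click the 猪哥亮 icon and choose the maze photo you took. (Tip: keep the maze in the center of the photo and make sure the maze's outer shell (not the band outside it) is the outermost layer. Keep the maze paper as flat as possible and clearly distinguishable from the background.) Leave the rest to Haidilao Maze Buster~

If you also happen to be heading to Haidilao, the app's website is below. Give it a try, though there is no guarantee it works :)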

As you can see, the app takes a photo, extracts the maze from it, and solves the maze.

If you are about to enjoy your lovely Haidilao hotpot, I have made it available for you too :)


>>>Haidilao Maze Buster<<<



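The Technical Part

The video looks short, and the maze-solving part took only 2 or 3 days, but making it more stable and finding a suitable free hosting service took me at least 3 to 4 weeks, by my estimate....

It still has some shortcomings. For example, if an arrow blocks a maze opening, the entrance or exit cannot be detected. Still, I am quite happy with how it performs now; after all, I have plenty of other things to do and cannot put too much time into it. The specific limitations are covered below.

Here are the rough steps of how I implemented it:

👉 Extract the maze from the photo and straighten it so that it faces us squarely

👉 Find the entrance and exit

👉 Find the path through the maze

Next, in detail: how to extract the maze and straighten it

These are the detailed steps:

  1. White-balance the photo
  2. Mark the low-saturation regions (because I happened to find that the maze sits in such a region)
  3. Apply Morphology Close/Open to wipe out the details inside the maze, leaving only a blank area
  4. Find the outermost contour in the image (this is one of the limitations)
  5. Crop away everything except the maze area (the maze paper is still there at this point), which reduces the computation that follows
  6. Binarize the image so that every pixel is either black (0) or white (255), which cuts computation further and also helps the next step
  7. Take a seed image from a region in the middle of the image (this is why the maze must be in the center of the photo), then apply Morphological Reconstruction to extract the maze (now the maze paper is gone and only the clean maze remains, although the area the paper occupied is still inside the cropped image)
  8. Invert the black and white pixel values
  9. Get the convex hull of the maze
  10. Use the convex hull to find its 4 corners
  11. Crop the maze again so that the image contains nothing but the maze
  12. If the maze is tilted, rotate it upright

It is worth pointing out that white balance really is hugely important~! I feel I wasted 2 weeks precisely because I skipped this step. To me, white balance in image processing is like Normalization in AI. Without it, the same code may work on one image and then fail on another just because the lighting changed slightly.

Why do I say that? Because when the brightness, color temperature and so on of the light change (in ways that make no difference to our eyes), the image is completely different as far as the computer is concerned.

Let's compare the white-balanced photo with the original.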

The Technical Part

The demo video looks pretty short, and implementing it from scratch only took me 2 or 3 days, but... perfecting it and making it available online actually cost me 3 or 4 weeks....

After a couple of weeks of work it is still not perfect, but it is good enough to satisfy my requirements. So don't expect a commercial product that works in each and every scenario; I have to admit that it has its limitations, which I will explain in the following paragraphs.

So here is how I implemented it:

👉 Extract the maze from the photo and warp/rotate it so that it faces us directly

👉 Find its 2 entrances, the Start and the Exit

👉 Find the path :)

Now, let's talk about how to extract the maze from the photo and get it facing straight at us.

There are several steps:

  1. Get the white-balanced image first~!!
  2. Mask the low-saturation area, because the maze area happens to be a low-saturation area,
  3. Use Morphology Close/Open to remove the details of the maze, so that we are left with one big block of white where the maze is located,
  4. Find the outermost contour in the image (now you know one of the limitations),
  5. Crop the image so that it only contains the maze area, which reduces the computation later on,
  6. Threshold the image so that pixels can only take the values 0 and 255, which simplifies the following processing,
  7. Create a seed image from the center part of the maze image and apply Morphological Reconstruction to extract the maze,
  8. Invert the extracted maze so that it has white walls and black paths,
  9. Get the convex hull of the extracted maze,
  10. Use the convex hull to get its corners,
  11. Then crop the image again to extract the exact maze,
  12. Warp the maze if it is rotated, so that it is ready for the next step.

I cannot stress enough how important white balance is. I can tell you that 2 weeks of my precious time were wasted because I did not do the white balance. To my understanding, it plays the same role as Normalization does in Computer Vision for AI. Without it, your code will be fragile and endless frustration lies ahead.

What did I mean by this?

Any change in lighting, even one the human eye cannot notice, can jeopardize the result and break the program. I was scratching my head trying to figure out why it worked for one image but not for another.

To give you an idea of what the white-balanced image looks like:

As you can see above, it does change the look and feel of the image by equalizing the Red, Green and Blue channels.

This is very helpful, as it makes the rest of the code far less sensitive to changes in lighting, so the code is more stable.
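To make the idea of equalizing the channels concrete, here is a minimal sketch of one common white-balance scheme, the gray-world method. That the project uses this particular scheme is an assumption; the article only says the Red, Green and Blue channels are equalized. The name `gray_world_balance` and the nested-list image format are made up for illustration, and the sketch is kept dependency-free instead of using OpenCV.

```python
def gray_world_balance(pixels):
    """Scale each RGB channel so that all three channel means become equal.

    `pixels` is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
    (Hypothetical helper for illustration, not the project's actual code.)
    """
    flat = [px for row in pixels for px in row]
    n = len(flat)
    means = [sum(px[c] for px in flat) / n for c in range(3)]
    gray = sum(means) / 3.0
    # Guard against an all-zero channel to avoid dividing by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [
        [tuple(min(255, round(px[c] * gains[c])) for c in range(3)) for px in row]
        for row in pixels
    ]


# A tiny "photo" with a warm (reddish) color cast.
warm_photo = [[(200, 150, 100), (100, 75, 50)]]
balanced = gray_world_balance(warm_photo)
print(balanced)
```

After balancing, both pixels of this toy image come out neutral gray, which is the effect that keeps later thresholds from drifting with the lighting.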

I wasn't thinking of using the HSV color space at first, as RGB is the most popular color space. But by working in HSV, I found that the maze area is always the darkest area.

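As you can see, the color temperature of the picture seems to have changed; this is achieved by equalizing the red, green and blue channels. This step is very important: it keeps the code that follows from being affected by lighting changes, so the code is more reliable. At first I did not think of using the HSV color space, because most of the time I work in RGB. But by looking at the maze in HSV space, I found that the maze area is always the darkest part.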

This is great, as it lets me easily pick the maze paper out of the whole photo (Mask).
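A low-saturation mask of the kind listed in step 2 could be sketched like this. The threshold `max_saturation=0.25` and the function name are assumptions for illustration, not values from the project, and the standard-library `colorsys` module stands in for an OpenCV color conversion.

```python
import colorsys


def low_saturation_mask(pixels, max_saturation=0.25):
    """Return a 0/255 mask that is 255 where HSV saturation <= max_saturation.

    `pixels` is a list of rows of (r, g, b) tuples in 0..255.
    The threshold is an illustrative guess, not a tuned value.
    """
    mask = []
    for row in pixels:
        mask_row = []
        for r, g, b in row:
            _, saturation, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            mask_row.append(255 if saturation <= max_saturation else 0)
        mask.append(mask_row)
    return mask


# Paper-like and gray pixels (low saturation) next to colorful background pixels.
photo = [[(240, 240, 235), (200, 40, 30)],
         [(30, 30, 30), (20, 160, 60)]]
mask = low_saturation_mask(photo)
print(mask)
```

Both the near-white paper pixel and the dark gray pixel end up in the mask, because saturation ignores brightness; only the colorful pixels are rejected.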

As I mentioned above, by also applying morphological close and open I can erase the rest of the image and leave one big block of white that indicates where the maze is located:

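That is fantastic! It means I can easily extract the maze area without any elaborate image processing (image Mask). After that, Morphology Close/Open (MC/MO) wipes out the fine details inside the maze and the noise speckles around it.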

The morphological close "eats" the enclosed black pixels, so the walls inside are removed (see Mask1). Then I apply "open" to merge the isolated black holes together, which makes the result less noisy (see Mask2).
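For readers who have not met these two operations, here is a small dependency-free sketch on a binary grid (1 = white, 0 = black) with a 3x3 neighborhood. The kernel size and the helper names are assumptions; the project most likely calls an image library instead. Closing is a dilation followed by an erosion, and opening is an erosion followed by a dilation.

```python
def _morph(grid, pick):
    """Apply `pick` (max or min) over each pixel's in-bounds 3x3 neighborhood."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [grid[ny][nx]
                      for ny in range(max(0, y - 1), min(h, y + 2))
                      for nx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = pick(window)
    return out


def dilate(grid):
    return _morph(grid, max)   # white grows


def erode(grid):
    return _morph(grid, min)   # white shrinks


def close(grid):
    return erode(dilate(grid))  # fills small black holes inside white areas


def open_(grid):
    return dilate(erode(grid))  # removes small white specks


# A white block with a black "wall" pixel inside, plus a white noise speck.
grid = [
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
closed = close(grid)      # the wall pixel at (3, 3) is filled in
cleaned = open_(closed)   # the speck at (0, 6) disappears
```

In this toy grid the close step fills the interior wall pixel and the open step then drops the stray speck, leaving one solid block, which mirrors the Mask1 and Mask2 stages described above.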

With Mask2, we can now find the biggest, outermost contour and use it to crop the image: cut image.

Then we simplify the image further to only allow it to have 2 values, black(0) and white(255): thresholdimgrec2

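A quick explanation of why MC/MO is used: MC eats up the small, fragmented black parts (see image Mask1). Following up with MO further reduces the small, isolated, unconnected black blobs (image Mask2).

With Mask2 we can find the rough outer frame and use it to crop out the maze area (see image Cut Image).

To reduce computation further, I binarize Cut Image, which gives Thresholdimgrec2.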

With the image thresholdimgrec2, we can now apply a new operation called Morphological Reconstruction (MR) to further separate the maze from its paper.

👉 You can think of reconstruction as a way to isolate the connected regions of an image. For dilation(which is the default mode of the MR), reconstruction connects regions marked by local maxima in the seed image: neighboring pixels less-than-or-equal-to those seeds are connected to the seeded region. Local maxima with values larger than the seed image will get truncated to the seed value.
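On a binary image, reconstruction by dilation boils down to growing the seed inside the mask until nothing more can be reached. The sketch below shows that special case with a breadth-first flood fill; the function name and the 8-connectivity are assumptions for illustration. For grayscale images, scikit-image offers `skimage.morphology.reconstruction(seed, mask, method='dilation')`, though the article does not say which library call the project relies on.

```python
from collections import deque


def reconstruct(seed, mask):
    """Binary reconstruction by dilation: grow `seed` inside `mask`.

    Both inputs are grids of 0/1. Only seed pixels that are also set in the
    mask are used, and growth spreads to 8-connected mask pixels.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if seed[y][x] and mask[y][x]:
                out[y][x] = 1
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny in range(max(0, y - 1), min(h, y + 2)):
            for nx in range(max(0, x - 1), min(w, x + 2)):
                if mask[ny][nx] and not out[ny][nx]:
                    out[ny][nx] = 1
                    queue.append((ny, nx))
    return out


# A ring-shaped "maze" in the middle plus two unrelated blobs in the corners.
mask = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
]
seed = [[0] * 6 for _ in range(5)]
seed[2][2] = 1                      # one seed pixel taken from the central area
maze_only = reconstruct(seed, mask)
```

Only the component touched by the seed survives, which is why the maze has to sit in the center of the photo: that is where the seed is taken from.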

Once the exact maze is extracted, we can get its convex hull which will be used to get the locations of maze corners.
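As a rough illustration of the hull-then-corners idea, the sketch below computes a convex hull with Andrew's monotone chain algorithm and then picks four corners by extreme values of x + y and x - y. Both choices are assumptions; the project may well use `cv2.convexHull` and a different corner rule. Points are (x, y) in image coordinates, with y pointing down.

```python
def convex_hull(points):
    """Convex hull of (x, y) points via Andrew's monotone chain."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def four_corners(hull):
    """Pick (top_left, top_right, bottom_right, bottom_left) from hull points."""
    top_left = min(hull, key=lambda p: p[0] + p[1])
    bottom_right = max(hull, key=lambda p: p[0] + p[1])
    top_right = max(hull, key=lambda p: p[0] - p[1])
    bottom_left = min(hull, key=lambda p: p[0] - p[1])
    return top_left, top_right, bottom_right, bottom_left


# Pixels of a slightly rotated maze outline plus two interior points.
maze_pixels = [(2, 1), (9, 2), (8, 9), (1, 8), (5, 5), (4, 6)]
hull = convex_hull(maze_pixels)
corners = four_corners(hull)
print(corners)
```

The four corners, in a fixed order, are exactly what a perspective transform needs as its source points when the maze is straightened afterwards.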

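Using thresholdimgrec2 together with Morphological Reconstruction (MR), the maze is extracted further and the unwanted surrounding imagery is removed.

MR can be thought of as an operation that pulls the connected regions of an image apart from everything else. It takes part of the original image (image Thresholdimgrec2) as the seed image (image Cut Image) and spreads outward from that seed: every connected pixel is kept and unconnected ones are discarded. This gives us an even better maze image (image Rec3).

Once we have Rec3, its convex hull is easy to compute (image Convex Hull).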

The convex hull does not look perfect because the arrow at the exit point was not removed by the previous steps. But don't worry, it will be removed eventually.

Up to this point we have the extracted maze, but it looks a bit rotated.

Luckily, in OpenCV, there is a method called warpPerspective() which can help us to rotate the maze.

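The convex hull image does not look perfect, but that is fine; later processing will ignore the bad parts. At this point the maze has been extracted, but.... it looks a little tilted. Fortunately OpenCV has a function, warpPerspective(), that can straighten it for us.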

After the maze is extracted, I would like my code to detect the 2 entrances automatically.

Is it possible?

I didn't design this part initially, as I thought users could simply tap the screen to point out the Start and Exit points. But I found that is not user-friendly, so I make my code do it.

Here is the procedure:

  1. Run along the 4 edges and "ray-cast" inward until each ray hits a wall; this tells us where the "mountains", that is the entrances, are,
  2. Pick the Himalaya mountain on each side,
  3. Get the middle point of the Himalaya peak,
  4. Push the point a bit inward so that the later code does not crash.

The idea of ray-casting on each side is very interesting. Imagine a beam of light cast from one edge of the maze that bounces back. It feels like the way bats use ultrasound to detect obstacles. I then draw a graph of the distance between the light source (say it is at distance 0) and the point where the beam hits a wall. Once I have run the ray-casting on each side of the maze, the graphs look like this:

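Once the maze is extracted, can the entrance and exit be found automatically?

This is a typical vision problem, so image processing should be able to solve it. I did not design this part at first, because I thought users could simply tap the entrance and exit themselves, but that turned out to be rather cumbersome. So let the code handle it~

Here is what I did:

  1. From position 0 (outside the maze), fire a laser perpendicular to the maze's outer wall until it hits a wall and bounces back, and record the distance from the emitter to the wall. Repeat this from top to bottom (for the left and right sides of the maze) and from left to right (for the top and bottom sides).
  2. Find the Himalaya on that side (if there is one),
  3. Compute the position of the very middle of the Himalaya
  4. Push that point a little way into the maze (so the code does not crash; the Scikit-mpe library used here crashes if the given position is too close to the edge of the maze)

The laser-ray algorithm is very interesting: it tells you where the "gaps", that is the entrance and exit, are. It feels like a bat locating obstacles with ultrasound. Take a look: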

It looks beautiful~! On the sides with no entrance it is a flat line, but on the sides with an entrance, the peak shows not only how far the light travels before it bounces back but also where the entrance is. If you take a closer look at the X axis of the Left and Right graphs, the positions of the peaks on the X axis coincide perfectly with their entrances. This is exactly what I want.

By combining how far the light goes on the Y axis and the value of the peaks on X axis, the location (x,y) of the entrance can be found.
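The ray-casting profile described above can be sketched in a few lines. The grid convention (1 = wall, 0 = free), the function names and the toy maze are assumptions made for this example, not the project's code.

```python
def _free_run(line):
    """Count free cells (0) from the start of `line` until the first wall (1)."""
    distance = 0
    for cell in line:
        if cell:  # the ray hit a wall
            break
        distance += 1
    return distance


def ray_profile(maze, side):
    """Distance each ray travels from `side`: 'left', 'right', 'top' or 'bottom'."""
    # Rays from the left/right run along rows; rays from top/bottom along columns.
    lines = maze if side in ("left", "right") else [list(col) for col in zip(*maze)]
    # Rays from the right/bottom traverse each line backwards.
    if side in ("right", "bottom"):
        lines = [line[::-1] for line in lines]
    return [_free_run(line) for line in lines]


# Toy maze with an entrance on the left (row 2) and an exit on the right (row 3).
maze = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
left = ray_profile(maze, "left")
right = ray_profile(maze, "right")
top = ray_profile(maze, "top")
print(left, right, top)
```

The sides without an opening give a flat line of zeros, while the index of the peak in the left and right profiles is the row of the opening, and the peak height is how far the ray gets before it hits a wall.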

The case above is a perfect one, but sometimes a maze is warped because the hand holding it bends it. Let me show you such a case:

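These 4 plots look wonderful! They are just what I imagined~! For the sides without an opening (top, bottom) the line is flat, meaning every distance is 0. For the sides with an opening (Left, Right) we can clearly see a peak! The peak's position on the X axis corresponds to its Y value in the image, and the height of the peak is how deep that entrance goes, which tells us how far the start and end points can be pushed inward. That gives me the coordinates of the start and the end. Great~!

The maze above happens to have no distortion at all, so it is a perfect example. But sometimes the hand holding the maze paper bends it, and then several "mountains" may show up in the plot. Let's look at an example: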

In this case, there is more than one peak on the bottom side:

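As the naked eye can see, the bottom of this maze was bent inward by my hand, so rays fired from the bottom produce several "mountains".

Being stuck on a hard problem is boring, so I gave these key parts fun, vivid names. The bumps in image Bottom are all "mountains", and the tallest "mountain" is the "Himalaya".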

Being stuck on a problem is boring, so to entertain myself I started calling those peaks mountains, and the highest mountain is called Himalaya.

So clearly, if there are multiple mountains, we only want the Himalaya, as Mountain 1 could be nothing more than a bent part of an edge.

Now if we draw the Start and Exit points on the maze, it should look like this:
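One way to pick the Himalaya out of a ray profile is sketched below: split the profile into "mountains" (runs of values above a baseline), keep the one with the tallest peak, take its middle, and push the point a few pixels inward. The `baseline` and `inset` parameters and the function name are hypothetical; the article does not give the actual rule or values.

```python
def find_himalaya(profile, baseline=0, inset=2):
    """Return (position_along_edge, depth_into_maze) for the tallest mountain.

    A "mountain" is a contiguous run of profile values above `baseline`.
    Returns None when the profile is flat, meaning no entrance on this side.
    """
    mountains = []  # (start, end) index pairs, end exclusive
    start = None
    # The trailing sentinel closes a mountain that runs to the end of the edge.
    for i, height in enumerate(list(profile) + [baseline]):
        if height > baseline and start is None:
            start = i
        elif height <= baseline and start is not None:
            mountains.append((start, i))
            start = None
    if not mountains:
        return None
    start, end = max(mountains, key=lambda m: max(profile[m[0]:m[1]]))
    middle = (start + end - 1) // 2
    # Push inward, but never further than the ray actually travelled.
    depth = min(inset, profile[middle] - 1)
    return middle, depth


# Bottom edge of a bent maze: a small bump (Mountain 1) and the real entrance.
bottom = [0, 0, 3, 4, 3, 0, 0, 0, 20, 22, 21, 0]
entrance = find_himalaya(bottom)
print(entrance)
```

Here the small bump caused by the bent paper is ignored and the entrance lands in the middle of the tall mountain, two pixels inside the maze.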

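If several mountains exist at once, I am only interested in the highest peak (such a human trait, always drawn to towering summits). Now let's draw the start and end points on the image!

At this point the article is 80% done and the maze-solving part has not even started. Makes you think~

I have found that most of the time, when I set out to solve a problem, 90% of the time and energy goes not into the problem itself but into laying the groundwork for it and making the program more stable.

At last, on to finding the path through the maze~!

With the start and end points found and the rotated image straightened, it is time to solve the maze!

Thanks to the Scikit-MPE library, which provides a ready-made way to solve mazes, all I have to do is call it with the processed maze image plus the start and end points, and the library finds a path through the maze.

You may be disappointed that I have not explained the maze-solving algorithm at all, but such algorithms must be a dime a dozen online, so I will not spend time working one out myself. After all, great people learn to use existing resources to create and build more and greater projects! 😁

Finally, a few free Python hosting sites. Don't underestimate this: I spent at least 2 weeks, by my estimate, looking for free hosting that actually worked!

PythonAnywhere: It is okay~ The interface is dated. If your project needs no extra configuration, the platform is fine. But if, like my site, it needs extra configuration, it is not very flexible.

Microsoft Azure: It has a free tier and SSH access, but the connection dropped within a few minutes of logging in. After upgrading to the $35/month plan it was still the same. Terrible!

Heroku: It lets me run some bash commands, but with many restrictions. I know it supports Docker; I will try that next time~

Oracle Cloud: It has a free tier and can host a Python website, and best of all it gives SSH access to a virtual machine. You can even choose the VM's operating system. That is simply fantastic~! It is basically handing you a computer~! The absolute winner!

Many thanks also to the online communities that helped me; without their answers I might not have been able to build this project.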

As you can see, all I initially wanted to do was solve the maze by finding a path through it, but we are already 80% of the way through the article and that part has not even started yet. Going through all these steps, I realized that

👉 Most of the time when I am trying to solve a particular problem, 90% of my time is spent not on solving the problem itself but on getting the problem ready to be solved and on perfecting my solution.

What a lesson~! 🤔

Now, finally let's solve the maze~!

With the Start and Exit entrances and the clean, "corrected" maze, we are ready to solve the problem.

Thanks to Scikit-MPE, which provides code to solve the maze problem. 👍 All I need to do is provide the maze image along with the locations of the Start and Exit entrances.

I am sorry to disappoint you if you were expecting to see how the maze got solved. But I am not going to re-invent the wheel since solving a maze is a pretty popular topic, so I know there must be heaps of existing code to do so.
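For the curious, here is a tiny stand-in that shows what "solving" means once the maze grid and the two entrances are known. It is a plain breadth-first search, which is not how Scikit-MPE works internally and not what the project runs; the grid convention (1 = wall, 0 = free) and the names are assumptions for this example.

```python
from collections import deque


def solve_maze(maze, start, goal):
    """Shortest 4-connected path of (row, col) cells from start to goal, or None."""
    h, w = len(maze), len(maze[0])
    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        y, x = cell
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w
                    and maze[ny][nx] == 0 and (ny, nx) not in previous):
                previous[(ny, nx)] = cell
                queue.append((ny, nx))
    return None


maze = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
path = solve_maze(maze, start=(2, 0), goal=(3, 5))
print(path)
```

On a real photo the grid has far more cells, but the inputs are the same three things the article hands to the library: the cleaned maze image, the Start point and the Exit point.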

Finally, I would like to point out that the free Python hosting environment I use is Oracle Cloud.

Initially, I thought it would be easy to find a free hosting service for Python, as I know there are so many good cloud platforms for hosting Node.js and JavaScript projects, and Python is so popular. But it turns out there are not many choices for a free, easy-to-use Python hosting platform.

Believe it or not, I spent 2 weeks finding the platform. I tried:

PythonAnywhere: it is okay, but the UI looks a bit old-fashioned. If your project requires installing something outside Python's package system, it's a headache.

Microsoft Azure: It has a free tier, but when I SSHed into its VM, my connection was cut off every few minutes, and whether I could reconnect really depended on my luck. I then switched to a plan that cost me $35/month (not a free tier but a developer/testing environment), and the problem persisted.

Heroku: I can run some bash commands, but it is really limited. I know it also supports Docker, so I will probably give it another try next time.

Oracle Cloud: the one that provides SSH access from my local console, so I can do everything I could on my own computer. It also provides all sorts of OS images; I picked Ubuntu as I am familiar with it. It is the clear winner.

Finally, I would like to thank the online community for the help I got; without it I could not have finished this hobby project.