WWDC Highlights Part 3 – Not Banana App Using Core ML

This WWDC series highlights new technologies and insights I took away from WWDC 2017. My second post explored the new drag-and-drop API. In this post, I demonstrate how easy it is to use the new Core ML framework.

The Silicon Valley “not hotdog” reference was used many times at WWDC while showing off Core ML. You are probably tired of the reference by now, but I wanted to show just how easy it is to create an app that uses Core ML along with a device’s camera to predict what objects are being captured. Since we all have a fondness for bananas around here, I decided to create a “Not Banana” app using Core ML – Apple’s new machine learning framework.

Let’s get started. Create a new Single View App and limit device orientation to portrait (no need for landscape in this app).

Camera Setup

Since we will be using the user’s camera, we need to start by adding a camera usage description to the app’s Info.plist.
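The entry in Info.plist is the standard `NSCameraUsageDescription` key; the string value shown here is just an example and can say whatever fits your app.

```xml
<key>NSCameraUsageDescription</key>
<string>This app uses the camera to detect whether objects are bananas.</string>
```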

Next, we need to authorize the app to use the camera. We can create the following method that can be called from ViewController.viewDidLoad().
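A minimal sketch of that method (the name `authorizeCamera` is my own choice; `AVCaptureDevice.requestAccess(for:)` is the real AVFoundation API):

```swift
import AVFoundation

private func authorizeCamera() {
    // Prompts the user on first launch; subsequent calls return the stored answer.
    AVCaptureDevice.requestAccess(for: .video) { granted in
        if !granted {
            print("Camera access was denied")
        }
    }
}
```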

To start displaying camera frames, we need to setup an AVCaptureSession and add the back camera as input.
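A sketch of the session setup, assuming a `captureSession` property on the view controller and a helper name (`setupCaptureSession`) of my own choosing:

```swift
import AVFoundation

let captureSession = AVCaptureSession()

private func setupCaptureSession() {
    // Grab the back wide-angle camera and add it as the session's input.
    guard let backCamera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                   for: .video,
                                                   position: .back),
          let input = try? AVCaptureDeviceInput(device: backCamera) else { return }
    captureSession.addInput(input)
}
```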

For displaying camera output in the UI, we will use an AVCaptureVideoPreviewLayer. We can add this in ViewController.viewDidLoad() as well.
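Something like the following, assuming the `captureSession` property from the previous step:

```swift
// Fill the screen with the camera feed.
let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
previewLayer.frame = view.bounds
previewLayer.videoGravity = .resizeAspectFill
view.layer.addSublayer(previewLayer)
```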

Note: If you do not want the device to go idle while using the app, you can add the following.
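```swift
// Keep the screen awake while the camera is running.
UIApplication.shared.isIdleTimerDisabled = true
```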

Then, we need to start the capture session. ViewController.viewDidLoad() should look something like this.
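Pulling the pieces together (the helper names here match the sketches above and are my own):

```swift
override func viewDidLoad() {
    super.viewDidLoad()

    authorizeCamera()
    setupCaptureSession()

    let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
    previewLayer.frame = view.bounds
    previewLayer.videoGravity = .resizeAspectFill
    view.layer.addSublayer(previewLayer)

    UIApplication.shared.isIdleTimerDisabled = true
    captureSession.startRunning()
}
```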

If you launch the app on a device with a back camera, a permissions dialog should be displayed. After authorizing the app to use the camera it should display your app full screen. Now that we are displaying the camera to the user, we need to get a CVPixelBuffer from the capture session using AVCaptureVideoDataOutput.
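Wiring up the data output might look like this, assuming `ViewController` conforms to `AVCaptureVideoDataOutputSampleBufferDelegate` (shown in the next step); the queue label is arbitrary:

```swift
let videoOutput = AVCaptureVideoDataOutput()
// Deliver sample buffers on a background queue so the UI stays responsive.
videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
captureSession.addOutput(videoOutput)
```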

For this app, we will only be using one AVCaptureVideoDataOutputSampleBufferDelegate method (to grab the sample buffer output).
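A sketch of that delegate method:

```swift
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Grab the pixel buffer backing this video frame.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        // TODO: run the Core ML prediction on pixelBuffer
    }
}
```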

Core ML Integration

Now that we have a pixel buffer, we can start the Core ML integration. We first need a Core ML model that we will use to predict what objects are in images we pass. Apple has listed a few on its developer site. For this app, I chose the ResNet50 model. Simply download the model and drag it into the project—make sure it has been added under Target Membership. Once added, we can use it to create an objectModel that we can use for predictions.
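`Resnet50` below is the class Xcode generates automatically from the downloaded `Resnet50.mlmodel` file:

```swift
import CoreML

let objectModel = Resnet50()
```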

If we try to get the prediction directly from the pixelBuffer in our AVCaptureVideoDataOutputSampleBufferDelegate method, we will see an error in the logs, something like “Image is not valid width 224, instead is 1920”. The reason is that the ResNet50 model requires a 224×224 image as input. There are a few different ways to resize the pixel buffer; I chose to convert it to a CIImage, resize it, and then render it back to a CVPixelBuffer with the following extension.
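One way to write that extension (a sketch; the method name `resized(to:)` is my own):

```swift
import CoreImage

extension CVPixelBuffer {
    /// Scales the buffer to the given size by rendering through Core Image.
    func resized(to size: CGSize) -> CVPixelBuffer? {
        let ciImage = CIImage(cvPixelBuffer: self)
        let scaleX = size.width / CGFloat(CVPixelBufferGetWidth(self))
        let scaleY = size.height / CGFloat(CVPixelBufferGetHeight(self))
        let scaled = ciImage.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))

        // Create a destination buffer and render the scaled image into it.
        var output: CVPixelBuffer?
        CVPixelBufferCreate(kCFAllocatorDefault,
                            Int(size.width), Int(size.height),
                            kCVPixelFormatType_32BGRA, nil, &output)
        guard let outputBuffer = output else { return nil }
        CIContext().render(scaled, to: outputBuffer)
        return outputBuffer
    }
}
```

Note: squashing the full frame down to 224×224 changes its aspect ratio; cropping to a square first would preserve it, which is why the UI below masks off everything outside a 1:1 region.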

And using it in our delegate method.
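The delegate method from earlier now becomes:

```swift
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    // Resize the frame to the 224×224 input the model expects.
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer),
          let resizedBuffer = pixelBuffer.resized(to: CGSize(width: 224, height: 224)) else { return }
    // TODO: update the UI with the prediction
}
```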

For the final TODO, we could simply print the results from the ResNet50 prediction, but since this app is all about detecting bananas, let’s add an image view that swaps between a banana image and a not-banana image.

Not Banana CoreML

I set the banana image as the highlight image and the not banana image as the normal state. I also added a visual effect view to cover the area of the screen that is not in the 1:1 image that will be processed by the model. Finally, I created a method that updates the banana image view highlight state based on the Resnet50 prediction of the objects in a given CVPixelBuffer.
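A sketch of that method (`bananaImageView` is the image view described above; `prediction(image:)` and `classLabel` come from the interface Xcode generates for the Resnet50 model):

```swift
private func updateBananaImageView(with pixelBuffer: CVPixelBuffer) {
    guard let prediction = try? objectModel.prediction(image: pixelBuffer) else { return }
    // The highlighted state shows the banana image; normal shows not banana.
    let isBanana = prediction.classLabel.lowercased().contains("banana")
    DispatchQueue.main.async {
        self.bananaImageView.isHighlighted = isBanana
    }
}
```

In the delegate method, the final TODO then becomes a call to `updateBananaImageView(with: resizedBuffer)`.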

Replace the final todo.

Launch the app. 

Not Banana CoreML App

Pretty cool what you can do with minimal effort using Core ML! Check out these videos to learn more about using Core ML:



Subscribe to our blog for more WWDC highlights.

Kyle Balogh

Kyle is a mobile software engineer with experience in iOS and Android dating back to 2008. He has also dabbled in the world of automated test tools. When not spending time with his family he is learning new technologies to solve interesting problems.
