
Machine Learning tutorial: How to create a decision tree in RapidMiner using the Titanic passenger data set

 

 

Greetings! And welcome to another wam bam, thank you ma'am, mind blowing, flex showing, machine learning tutorial here at refactorthis.net!

This tutorial is based on a machine learning toolkit called RapidMiner by Rapid-I. RapidMiner is a full-featured, Java-based, open source machine learning toolkit with support for all of the popular machine learning algorithms used in data analytics today. The library supports the following machine learning algorithms (to name a few):

  • k-NN
  • Naive Bayes (kernel)
  • Decision Tree (Weight-based, Multiway)
  • Decision Stump
  • Random Tree
  • Random Forest
  • Neural Networks
  • Perceptron
  • Linear Regression
  • Polynomial Regression
  • Vector Linear Regression
  • Gaussian Process
  • Support Vector Machine (Linear, Evolutionary, PSO)
  • Additive Regression
  • Relative Regression
  • k-Means (kernel, fast)
  • And much much more!!
Excited yet?  I thought so!

How to create a decision tree using RapidMiner

When I first ran across screen shots of RapidMiner online, I thought to myself, "Oh boy.. I wonder how much this is going to cost...".  The UI looked so amazing.  It's like Visual Studio for Data Mining and Machine learning!  Much to my surprise, I found out that the application is open source and free!

Here is a quote from the RapidMiner site:

RapidMiner is unquestionably the world-leading open-source system for data mining. It is available as a stand-alone application for data analysis and as a data mining engine for the integration into own products. Thousands of applications of RapidMiner in more than 40 countries give their users a competitive edge.

I've been trying some machine learning "challenges" recently to sharpen my skills as a data scientist, and I decided to use RapidMiner to tackle the kaggle.com machine learning challenge called "Titanic: Machine Learning from Disaster". The data set is a CSV file that contains information on many of the passengers of the infamous Titanic voyage. The challenge provides two CSV files: a training file that contains all attributes as well as the label Survived, and a testing file that contains only the attributes (no Survived label). The goal is to predict the Survived label for the testing set based on what can be learned from the training set.
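To make the shape of the two files concrete, here is a minimal Python sketch (outside RapidMiner) that reads a training file and a testing file and separates the Survived label from the attributes. The rows below are placeholders invented for illustration and only a few columns are shown; the helper names `load_rows` and `split_label` are my own, not part of any library.

```python
import csv
import io

# Placeholder rows for illustration only; the real Kaggle files have
# more columns and many more rows.
TRAIN_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,30
2,1,1,female,40
"""
TEST_CSV = """PassengerId,Pclass,Sex,Age
3,2,female,25
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def split_label(rows, label="Survived"):
    """Separate the label column from the remaining attributes."""
    attributes = [{k: v for k, v in row.items() if k != label} for row in rows]
    labels = [row[label] for row in rows]
    return attributes, labels

# Training data carries the label; testing data does not.
train_attributes, train_labels = split_label(load_rows(TRAIN_CSV))
test_attributes = load_rows(TEST_CSV)

print(train_labels)
print(test_attributes[0])
```

With real files you would pass an opened file to `csv.DictReader` instead of the in-memory strings used here.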

Warning: Although I'm not going to provide the complete solution to this challenge, I warn you, if you are working on this challenge, then you should probably stop reading this tutorial.  I do provide some insights into the survival data found in the training data set.  It's best to try to work the challenge out on your own.  After all, we learn by TRYING, FAILING, TRYING AGAIN, THEN SUCCEEDING.  I'd also like to say that I'm going to do my very best to go easy on the THEORY of this post..  I know that some of my readers like to get straight to the action :)  You have been warned..

 

Why a decision tree?

A decision tree model is a great way to visualize a data set to determine which attributes of a data set influenced a particular classification (label).  A decision tree looks like a tree with branches, flipped upside down..  Perhaps a (cheesy) image will illustrate..

 

After you are finished laughing at my drawing, we may proceed.......  OK

In my example, imagine that we have a data set that is related to lifestyle and heart disease. Each row has a person, their sex, age, Smoker (y/n), Diet (good/poor), and a label Risk (Less Risk/More Risk). The data indicates that the biggest influence on Risk turns out to be the Smoker attribute. Smoker becomes the first branch in our tree. For smokers, the next most influential attribute happens to be Age; for non-smokers, however, the data indicates that their diet has a bigger influence on the risk. The tree keeps branching into new nodes until a classification is reached or the maximum "depth" that we establish is reached. So as you can see, a decision tree can be a great way to visualize how a decision is derived based on the attributes in your data.
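Once a tree like this has been learned, using it is nothing more than a series of nested questions. Here is a rough Python sketch of the example tree written by hand. The age cut-off of 50 and the label at each leaf are assumptions I made up for illustration; a real tree would learn them from the data.

```python
def classify_risk(smoker: bool, age: int, diet: str) -> str:
    """Walk the hand-written example tree from the root down to a leaf."""
    if smoker:
        # Smokers: the next split is on Age (50 is an assumed cut-off).
        return "More Risk" if age >= 50 else "Less Risk"
    # Non-smokers: the next split is on Diet.
    return "More Risk" if diet == "poor" else "Less Risk"

print(classify_risk(smoker=True, age=62, diet="good"))
print(classify_risk(smoker=False, age=62, diet="good"))
```

Notice that Diet is never consulted for smokers and Age is never consulted for non-smokers. Each branch only asks the questions that matter on that side of the tree.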

RapidMiner and data modeling

Ready to see how easy it is to create a prediction model using RapidMiner?  I thought so!

Create a new process

When you are working in RapidMiner, your project is known as a process.  So we will start by running RapidMiner and creating a new process.

 

 

The version of RapidMiner used in this tutorial is version 5.3.  Once the application is open, you will be presented with the following start screen.

 From this screen you will click on New Process

 You are presented with the main user interface for RapidMiner. One of the most compelling aspects of RapidMiner is its ease of use and intuitive user interface. The basic flow of this process is as follows:

  • Import your test and training data from CSV files into your RapidMiner repository.  This can be found in the repository menu under Import CSV file
  • Once your data has been imported into your repository, the datasets can be dragged onto your process surface for you to apply operators
  • You will add your training data to the process
  • Next, you will add your testing data to the process
  • Search the operators for Decision Tree and add the operator
  • In order to use your training data to generate a prediction on your testing data using the Decision Tree model, we will add an "Apply Model" operator to the process.  This operator has an input that you will associate with the output model of your Decision Tree operator.  There is also an input that takes "unlearned" data from the output of your testing dataset.
  • You will attach the outputs of Apply Model to the results connectors on the right side of the process surface.
  • Once you have designed your model, RapidMiner will show you any problems with your process and will offer "Quick fixes", if they exist, that you can double click to resolve.
  • Once all problems have been resolved, you can run your process and you will see the results that you wired up to the results side of the process surface.
  • Here are screenshots of the entire process for your review
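The flow in the list above (learn a model from labeled training data, then apply it to unlabeled data) can also be sketched in a few lines of Python. This is not what RapidMiner does internally. It is a deliberately tiny stand-in that learns a one-level tree (a "decision stump") by picking the attribute whose majority vote makes the fewest mistakes on the training rows. The function names and the toy rows, which reuse the lifestyle example from earlier, are made up for illustration.

```python
from collections import Counter, defaultdict

def learn_stump(rows, labels):
    """Learn a one-level tree: choose the attribute whose per-value
    majority vote misclassifies the fewest training rows."""
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for attribute in rows[0]:
        votes = defaultdict(Counter)
        for row, label in zip(rows, labels):
            votes[row[attribute]][label] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
        errors = sum(rule[row[attribute]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, attribute, rule)
    _, attribute, rule = best
    return {"attribute": attribute, "rule": rule, "default": default}

def apply_model(model, rows):
    """Label unlabeled rows; values never seen in training get the default label."""
    return [model["rule"].get(row[model["attribute"]], model["default"]) for row in rows]

# Toy training data (made up): attributes plus a separate label list.
train_rows = [
    {"Smoker": "y", "Diet": "good"},
    {"Smoker": "y", "Diet": "poor"},
    {"Smoker": "n", "Diet": "good"},
    {"Smoker": "n", "Diet": "poor"},
]
train_labels = ["More Risk", "More Risk", "Less Risk", "Less Risk"]

# Toy testing data (made up): attributes only, no label.
test_rows = [{"Smoker": "n", "Diet": "poor"}, {"Smoker": "y", "Diet": "good"}]

model = learn_stump(train_rows, train_labels)   # plays the role of the Decision Tree operator
predictions = apply_model(model, test_rows)     # plays the role of the Apply Model operator
print(model["attribute"], predictions)
```

The two function calls at the bottom mirror the two connections you make on the process surface: training data into the learner, then the learned model plus the unlabeled data into Apply Model.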

 Empty Process

 

Add the training data from the repository by dragging and dropping the dataset that you imported from your CSV file

 

Repeat the process and add the testing data underneath the training data

Now you can search in the operators window for Decision Tree operator.  Add it to your process.

The way that you associate the inputs and outputs of operators and data sets is by clicking on the output of one item and connecting it by clicking on the input of another item.  Here we are connecting the output of the training dataset to the input of the Decision Tree operator.

 

Next we will add the Apply Model operator

Then we will create the appropriate connections for the model

Observe the quick fixes in the problems window at the bottom.. you can double click the quick fixes to resolve the issues.

You will be prompted to make a simple decision regarding the problem that was detected. Once you resolve one problem, other problems may appear. Be sure to resolve all problems so that you can run your process.

Here is the process after resolving all problems.

 

Next, I select the decision tree operator and I adjust the following parameters:

Maximum Depth: change from 20 to 5.

Check both pruning-related boxes to make sure that the tree is not "pruned".
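To get a feel for what a maximum depth setting does, here is a small Python sketch of a tree builder that stops splitting once a depth limit is hit. It rests on several assumptions: it uses plain information gain as the split criterion (not necessarily the criterion the RapidMiner operator is configured with), it counts depth as the number of splits on the longest path (which may not match how RapidMiner counts it), it does no pruning at all, and the toy lifestyle rows are made up.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def majority(labels):
    """Most common label; used when a branch has to become a leaf."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, max_depth):
    """Return a leaf label, or an (attribute, branches) node."""
    if max_depth == 0 or len(set(labels)) == 1:
        return majority(labels)  # depth limit reached, or nothing left to separate
    base = entropy(labels)
    best_gain, best_attr = 1e-12, None  # ignore gains that are effectively zero
    for attr in rows[0]:
        remainder = 0.0
        for value in {row[attr] for row in rows}:
            subset = [label for row, label in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        if base - remainder > best_gain:
            best_gain, best_attr = base - remainder, attr
    if best_attr is None:
        return majority(labels)  # no attribute improves on the current node
    branches = {}
    for value in {row[best_attr] for row in rows}:
        keep = [i for i, row in enumerate(rows) if row[best_attr] == value]
        branches[value] = build_tree(
            [rows[i] for i in keep], [labels[i] for i in keep], max_depth - 1
        )
    return (best_attr, branches)

def depth(tree):
    """Number of splits on the longest path from the root to a leaf."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1].values())

# Made-up rows: (Smoker, Age, Diet, Risk)
DATA = [
    ("y", "older", "good", "More Risk"),
    ("y", "older", "poor", "More Risk"),
    ("y", "younger", "poor", "More Risk"),
    ("y", "younger", "good", "Less Risk"),
    ("n", "older", "good", "Less Risk"),
    ("n", "younger", "good", "Less Risk"),
    ("n", "older", "poor", "Less Risk"),
    ("n", "younger", "poor", "Less Risk"),
]
rows = [dict(zip(("Smoker", "Age", "Diet"), record[:3])) for record in DATA]
labels = [record[3] for record in DATA]

deep_tree = build_tree(rows, labels, max_depth=5)
shallow_tree = build_tree(rows, labels, max_depth=1)
print(depth(deep_tree), depth(shallow_tree))
```

A smaller limit gives you a tree that is easier to read at the cost of lumping more rows together under a majority label, which is the trade-off you are making when you lower the setting from 20 to 5.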

Once this has been done, you can Run your process and observe the results.  Since we connected both the model as well as the labeled result to the output connectors of the process, we are presented with a visual display of our Decision Tree (model) as well as the Test data set with the prediction applied.

(Decision Tree Model)

 

(The example test result set with the predictions applied)

 

As you can see, RapidMiner makes complex data analysis and machine learning tasks extremely easy with very little effort.

This concludes my tutorial on creating Decision Trees in RapidMiner.

Until next time,

 

Buddy James

 




About the author

My name is Buddy James.  I'm a Microsoft Certified Solutions Developer from the Nashville, TN area.  I'm a Software Engineer, an author, a blogger (http://www.refactorthis.net), a mentor, a thought leader, a technologist, a data scientist, and a husband.  I enjoy working with design patterns, data mining, c#, WPF, Silverlight, WinRT, XAML, ASP.NET, python, CouchDB, RavenDB, Hadoop, Android(MonoDroid), iOS (MonoTouch), and Machine Learning. I love technology and I love to develop software, collect data, analyze the data, and learn from the data.  When I'm not coding,  I'm determined to make a difference in the world by using data and machine learning techniques. (follow me at @budbjames).  


Xamarin 2.0 Product Review - Android and iOS development in C# just got easier

Xamarin 2.0 : The prayers of mobile developers have been answered

 

Hello, and welcome to my latest article on refactorthis.net.  This article is a product review of the latest release by Xamarin called Xamarin 2.0.

The state of mobile development

We are experiencing a revolution unlike any in the history of software development. Thanks to mobile devices, most of the world's population holds more computing power in their pocket than I could have dreamed about when I first started writing programs back in the days before the internet. I remember when we dialed into BBSes, or Bulletin Board Systems, which were communities that ran on your neighbor's computer and allowed us to send email and play text-based door games to pass the time. My, how times have changed. And with this change comes a new paradigm for software developers to learn in order to keep up with the now hyper-speed pace of devices and technology. We now live in a world where our TVs have IP addresses and our children have cell phones. At the forefront of this new way of software development is Xamarin.

On 02/20/2013 Xamarin announced the release of Xamarin 2.0.  The Xamarin 2.0 product release includes many ground breaking changes to Xamarin's mobile development tools for Android and iOS development with C#.  The Xamarin 2.0 release includes:

  • A new IDE called Xamarin Studio which is an IDE aimed specifically at mobile development using C#.
  • A new component store with a catalog of free and paid libraries to help developers create better mobile applications faster.
  • A new pricing tier which includes a free starter edition for developers starting out with mobile development.
  • Visual Studio integration that makes developing iOS applications in Visual Studio possible.

I've been extremely excited about the release of Xamarin 2.0.  This article is a product review of the new features of Xamarin 2.0 and my experience with the new features.

Xamarin.iOS for Visual Studio

I'm a .NET developer and for the past 10 years, Microsoft related technologies have put food on my table. C# is one of the most popular programming languages around today. Xamarin has led the way for C# developers interested in developing mobile applications using C#. The release of Xamarin 2.0 has provided what many thought to be impossible: iOS development from within Visual Studio. Here is my experience with Xamarin.iOS and Visual Studio.

One thing to understand is that even though Xamarin.iOS allows you to develop iOS applications from within Microsoft Visual Studio, you still need a machine running Mac OS X, the iPhone SDK, and Xcode to allow this integration to work.

Due to Apple's licensing, you can't build iOS applications on anything other than an Apple device. The Xamarin 2.0 release includes Xamarin.iOS, which allows us to write our code in C# using Visual Studio, then build or deploy the application using a build server, which is a system running Mac OS X and Xcode.

Xamarin.iOS will search the network for a build server that meets the requirements to build iOS applications.  Visual Studio will prompt the user with a list of all machines on the network that meet the build server requirements and allow you to choose which build server to use.

Since I don't own a Mac, I asked my brother-in-law if we could use his OS X machine to test the integration.

At first we ran into an issue, which the Xamarin.iOS host prompt helped me to diagnose. It turned out that we needed to change the firewall settings to open port 5000 so that the two machines could communicate to build the application.
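If you hit the same kind of problem, a quick way to check whether a firewall is blocking the connection is to try opening a TCP connection from the Windows machine to the Mac. Here is a minimal Python sketch of that check. The function name `can_connect` is my own, and the host name in the usage comment is a hypothetical placeholder for your Mac's name or IP address. This only tells you whether something accepts connections on the port. It says nothing about whether the Xamarin build host itself is healthy.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False

# Usage (hypothetical host name, replace with your Mac's name or IP):
#   can_connect("my-mac-build-host.local", 5000)
```

A result of False with the build host running on the Mac is a hint to look at the firewall settings on either machine.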

After my initial testing at my brother-in-law's place, I decided to continue my research from home.  

I wanted to see if there were any alternatives to buying a Mac in order to be able to develop iOS applications. I ran across a service called MacInCloud which allows you to rent a Mac development server in the cloud! They offer several options regarding payment plans. They even offer the ability to pay for use by the hour. Xamarin Studio is found in the list of supported software on their features page.

This is a great option for developers like me who wish to develop iOS applications but aren't ready to run out and buy a Mac just yet.

My final thoughts on the Visual Studio Xamarin.iOS development integration are as follows.

Is the process 100% transparent? No. Though I don't believe it will ever be possible to make a completely transparent experience. That would require running Visual Studio on OS X, or Apple removing their restrictions regarding iOS development. EDIT: It's been brought to my attention that since you will need to run a Mac system anyway, some people find it useful to run Visual Studio inside of a virtual machine on the Mac. This will make the integration even better. You may also run the two machines side by side and use a program like VNC Server/Viewer to switch between the two environments. As I've stated earlier, you can also rent the Mac build server in the cloud by using a service such as MacInCloud. If you find a useful combination that isn't mentioned here, feel free to share in the comments!

Is this the best option that .NET developers have for developing iOS applications using Visual Studio on Microsoft Windows? Yes. If you are a .NET developer and you are interested in developing applications for iOS, download Xamarin 2.0 and give it a try. Your mileage may vary.

Xamarin Studio: It's like MonoDevelop on steroids

One of the greatest parts of the Xamarin 2.0 release has got to be the addition of their new IDE, Xamarin Studio.  Xamarin Studio is a multi-platform IDE that is aimed specifically at mobile development using .NET.  The IDE is based on the older MonoDevelop IDE, however, there are new features that make this a completely different product.

MonoDevelop had a look and feel about it that suggested that it was aimed at Linux developers.  The new Xamarin Studio has moved the focus away from Linux developers and is geared more toward the Mac OS X crowd.  The motivation behind the new Xamarin Studio IDE was very much geared toward User Experience and productivity.

 

 

The IDE looks sleek and has a very minimalist design. The IDE exposes its features based on the context of the developer's current actions. For instance, you don't see debugging buttons unless you are in debug mode. You don't have useless windows hanging around when they aren't usable in the current context. IDEs have always been built for developers, so their makers have often added more features at the price of sacrificing user experience. Xamarin understands that developers are users too and that providing a perfect blend of UX and functionality is the best approach to use when designing an IDE. I think that this was a brilliant move on Xamarin's part and I wouldn't be surprised to see some of the other well known IDEs following this design in future releases.

 

The IDE's search bar is another great addition to the Xamarin Studio IDE.  The search bar can be found on the start screen as well as while developing an application.  The search offers an auto complete type of functionality which provides results for everything from key shortcuts to code specific namespaces and classes.

 

The search bar provides context specific search results based on your current location within the IDE.

The IDE offers other great tools such as:

  • A regex builder
  • An XML designer
  • Integrated Source Control
  • Unit testing capabilities
When run from an OS X machine, Xamarin Studio offers iOS specific project templates with the ability to create, build, and deploy iOS applications.

Xamarin 2.0 also offers project templates for creating Android applications. One of the great tools offered by Xamarin Studio is an Android user interface designer that lets the developer drag and drop Android user interface widgets to visually create the UI for the Android application. The UI designer in Xamarin Studio is more intuitive and feature rich than the original designer included in the Eclipse IDE! You also get the same Android UI designer from within Visual Studio when using MonoDroid for Android application development.
 
 
Xamarin Studio was built with the user in mind.  The IDE is built to increase productivity by providing world class user experience.  I'm very happy with the Xamarin Studio and I believe that I will probably use it more than Visual Studio when developing mobile applications.

Component Store

The component store is a wonderful resource for developers using Xamarin to develop Android and iOS applications using C#.  The component store ties directly into the Xamarin Studio IDE and also has a web interface for obtaining components that will greatly increase your productivity in developing professional looking mobile applications.  The Component store offers controls, frameworks, themes, web services, and other components that will make your life as an Android or iOS developer much easier.  They offer free and paid components to assist in your mobile development efforts.  You can even create your own libraries and submit them to the component store for review to be included in the list of components.  I believe Xamarin's component store will be a great resource for anyone developing Android and iOS applications using Xamarin's development tools.

 

 

 


Starter Edition

In the past, Xamarin offered a free trial for developers to try out their product. The free trial simply limited the functionality of the product. One of the biggest limitations in my opinion was the inability to deploy to a physical device to test your applications. When I first began testing Xamarin for Android development, this limitation was a show stopper for me. I have had no luck with the Android emulator and this seems to be a problem for many Android developers. The emulator is slow and unstable. So when I heard that Xamarin had changed their licensing and now offers a free starter edition, I was thrilled. The starter edition is free and allows testing an Android application on the developer's Android device. You can even deploy the application to the Android app stores. There are limitations on Android applications that are created under the Starter edition. They can't call third party native libraries (i.e., P/Invoke) and they are capped in size at a max of 32k of IL code. However, we as developers at least have the option to deploy to our devices and test the applications, so we can make an informed decision about upgrading to their new Independent developer license, which at $299 is the cheapest license that Xamarin has offered to date! So if you are like me and you've put off trying Xamarin because of the restrictions that bound you to the horrible Android emulator, there should be nothing stopping you from giving the new Xamarin 2.0 release a try.

 

Conclusion

This concludes my review of Xamarin 2.0. I'd like to say that I'm very excited about the possibilities that these enhancements provide for .NET developers who wish to develop applications for iOS and Android using C# and .NET.

The free starter edition provides a solution to my biggest issue with the Xamarin tool set, and that's the ability to test your applications built with MonoDroid on your device instead of the horrible emulator.

The Xamarin.iOS integration with Microsoft Visual Studio is the best option to allow developers to develop iOS applications while using their favorite IDE to do it.

The Xamarin Studio IDE is a beautiful, feature rich product that will change the way that we look at what an IDE should be.

If you are a .NET developer and you are looking into mobile development, now is the time to check out Xamarin 2.0. I'm anxious to see what the next release will bring. Oh, and don't forget about the Xamarin Evolve 2013 worldwide developer conference in Austin, Texas from April 14-17, 2013. I hope to see you there!

Thanks for reading!

Buddy James



