
Data mining prerequisites

Something in learning, something in life

Data mining software comes with either a text interface or a graphical one.


A good example of a text interface is the statistical programming language R. R is free and open source, and you can download it at www.r-project.org. It was the first software I learned for data processing. It's powerful, logical, and easy to apply, even for someone like me without a computer science background. There are thousands of ready-to-use R packages on CRAN, and you can browse them by topic at https://cran.r-project.org/web/views/

One tip: after installing R, download RStudio at https://www.rstudio.com. It makes working with R much easier, a bit like a helper that checks your code as you type.


Other options are Python and Matlab. Jupyter (which grew out of IPython) at http://jupyter.org is a popular interface for Python, although Python is just one of the kernels you can install on it. The easiest way to get Python, Jupyter, and hundreds of open source packages is to download one of the distributions offered by a couple of companies, for example Continuum Analytics at http://www.continuum.io and Enthought Canopy at http://goo.gl/tigTeU .
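To give a feel for what working in a text interface looks like, here is a minimal sketch in plain Python. It uses only the standard library, so it should run before you install any distribution. The tiny dataset and the function name `mean_by_group` are made up for illustration; a real project would more likely read a file and use one of the packages that ship with the distributions above.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical toy data; in practice this would come from a file on disk.
RAW = """species,petal_length
setosa,1.4
setosa,1.5
versicolor,4.5
versicolor,4.7
"""


def mean_by_group(text, group_col, value_col):
    """Return {group: mean of value_col} for CSV text with a header row."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row[group_col]].append(float(row[value_col]))
    return {name: mean(values) for name, values in groups.items()}


if __name__ == "__main__":
    # Print one line per group with its average value.
    for name, avg in mean_by_group(RAW, "species", "petal_length").items():
        print(f"{name}: {avg:.2f}")
```

You can paste this into a Jupyter cell or save it as a script and run it from the command line; either way you type the steps instead of clicking through them.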


Turning to graphical interfaces, you can try Microsoft Azure Machine Learning Studio at https://studio.azureml.net. The interface is really cool: you don't need to download anything. You just upload your dataset, connect it to widgets with arrows, and run it. I tried running an SVM on it and found that it saved me a lot of time I would otherwise have spent finding and configuring a specific package. However, it isn't free. Another online option is BigML at http://bigml.com, which is free for small tasks. Jobs are processed on its own servers, and it's a nice way to work, especially for decision trees.
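If you are curious what a decision tree does under the hood, the core idea is picking a split that separates the classes well. Below is a rough sketch of a one-level tree (often called a decision stump) on a single numeric feature, written in plain Python. This is only an illustration of the idea, not how BigML or any particular tool implements it, and the function names and toy data are my own assumptions.

```python
def best_stump(xs, ys):
    """Find the threshold on one numeric feature that best separates 0/1 labels.

    Returns (threshold, label_if_above, errors), where points with
    x <= threshold get the opposite label. Returns None if all feature
    values are identical, because there is nothing to split on.
    """
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between neighbouring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for t in candidates:
        for above in (0, 1):
            # Count training points this threshold/label pair misclassifies.
            errors = sum(
                1
                for x, y in zip(xs, ys)
                if (above if x > t else 1 - above) != y
            )
            if best is None or errors < best[2]:
                best = (t, above, errors)
    return best


def predict(stump, x):
    """Classify a single value with a stump returned by best_stump."""
    threshold, above, _ = stump
    return above if x > threshold else 1 - above


# Hypothetical toy data: small values belong to class 0, large ones to class 1.
XS = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
YS = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    stump = best_stump(XS, YS)
    print("threshold, label above, training errors:", stump)
    print("prediction for 5.0:", predict(stump, 5.0))
```

A full decision tree repeats this kind of split search recursively on each side and usually scores splits with a purity measure instead of a raw error count. The graphical tools do all of that for you.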


A similar program is RapidMiner, which has a free version. You drag in widgets, set your options, and bring in your dataset, and it's really nice to see the whole process laid out graphically. Other graphical tools include KNIME at http://www.knime.org

Another option is Orange at https://orange.biolab.si. It's a little different because it's based on Python, and there is also a text interface version of Orange.


Finally, you can do data mining with just the plain old terminal. Don't forget your old, true friend!
