
Which bucket should the data go in?

The general ideas of classification in machine learning

· Data mining

Classification is the attempt to place data into the right bucket. In machine learning, there are two broad ways to approach it.

The first is unsupervised learning, in which the data don’t have a named tag or criterion, and the groups are determined by similarity among the variables. In this type of analysis, we are looking for what the buckets should be.
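To make the idea of "finding the buckets" concrete, here is a minimal sketch of k-means clustering, one common unsupervised method, written in plain Python. The data points, the choice of two buckets, and the function name kmeans are all assumptions made up for illustration; real projects would usually rely on a tested library rather than this toy version.

```python
import math

def kmeans(points, k, iterations=20):
    """Group points into k buckets purely by similarity (Euclidean distance)."""
    # Assumption: the first k points serve as starting centers.
    # This keeps the sketch deterministic; it is not a good general strategy.
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Step 2: move each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

# Made-up, unlabeled data: nobody told us which bucket each point belongs to.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Note that the bucket numbers themselves carry no meaning. The algorithm only says which points belong together, and it is up to us to decide what each bucket represents.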


The opposite approach is supervised learning, also called labeled learning. It is like already knowing which slots you want to put the coins into. The data have a true class or outcome variable, and the models that go into the classification system are chosen by their accuracy. If you don’t have the most useful variables, you may have a hard time getting an accurate and useful classification model.
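As a rough illustration of the supervised case, the sketch below trains a nearest-centroid classifier on labeled examples and then measures its accuracy on held-out examples. Everything here is hypothetical: the two variables (links and exclamation marks per email), the toy numbers, and the helper names fit_centroids and predict are invented for the example, and nearest-centroid is just one simple classifier among many.

```python
import math
from collections import defaultdict

def fit_centroids(features, labels):
    """Learn one average point (centroid) for each known class."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {
        y: tuple(sum(col) / len(rows) for col in zip(*rows))
        for y, rows in grouped.items()
    }

def predict(centroids, x):
    """Put x into the bucket whose centroid is closest."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

# Made-up training data: (links, exclamation marks) with a known true class.
train_x = [(0, 1), (1, 0), (0, 0), (7, 9), (8, 6), (9, 8)]
train_y = ["legit", "legit", "legit", "junk", "junk", "junk"]

# Made-up held-out data, used only to check how accurate the model is.
test_x = [(1, 1), (8, 8), (0, 2), (6, 7)]
test_y = ["legit", "junk", "legit", "junk"]

centroids = fit_centroids(train_x, train_y)
predictions = [predict(centroids, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

The toy data are cleanly separated, so the two variables happen to be very informative. With less useful variables the centroids would sit closer together and the accuracy would drop, which is the difficulty described above.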


There are many domains where these approaches are handy, for example determining whether an email is legitimate or junk, or detecting fraud in credit card transactions. Given that I’m a radiologist, there is also radiological diagnosis, the attempt to classify patients into the right diagnostic category based on a range of image data and clinical symptoms. In many situations it is important to get good quantitative data as a way of empirically assessing the diagnoses.