Ensemble learning, also known as ensemble methods, is a machine learning technique that combines several base models to produce one optimal predictive model.
Ensemble learning is most often illustrated with decision trees, because they make both the definition and the practical value of ensemble methods easy to demonstrate (though ensemble methods are by no means limited to decision trees). A decision tree produces its prediction through a series of questions and conditions; a simple decision tree may distinguish just two decisions. At each node the tree considers one factor and either makes a decision or asks a further question. When building decision trees, several choices must be made, such as which features to split on and which questions to ask at each split.
Ensemble methods let us consider a sample of decision trees, decide which features to use and which questions to ask at each split, and make the final prediction by aggregating the results of the sampled decision trees.
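To make the idea of a decision tree concrete, here is a minimal hand-written tree as a chain of questions. The feature names and thresholds are invented for illustration only:

```python
def predict_play_outside(temperature_c, is_raining):
    """Toy decision tree: each node asks one question about one factor
    and either makes a decision or asks a further question."""
    if is_raining:            # first question: is it raining?
        return "stay inside"
    if temperature_c < 5:     # second question, reached only when dry
        return "stay inside"
    return "play outside"     # leaf: a final decision
```

An ensemble would train many such trees on different samples of the data and aggregate their answers instead of trusting a single tree.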
Ensemble methods can be divided into two groups:
– sequential ensemble methods, in which the base learners are generated sequentially. The basic motivation of sequential methods is to exploit the dependence between base learners: overall performance can be improved by assigning higher weights to previously mislabeled examples.
– parallel ensemble methods, in which the base learners are generated in parallel. The basic motivation of parallel methods is to exploit the independence between base learners, because the error can be reduced dramatically by averaging.
Most ensemble methods use a single base learning algorithm to produce homogeneous base learners, i.e. learners of the same type, leading to homogeneous ensembles.
There are also methods that use heterogeneous learners, i.e. learners of different types, leading to heterogeneous ensembles. For an ensemble to be more accurate than any of its individual members, the base learners must be as accurate and as diverse as possible.
Voting and averaging are the two simplest ensemble methods. Both are easy to understand and implement. Voting is used for classification and averaging for regression.
In both methods, the first step is to create multiple classification or regression models from some training data set. Each base model can be created from different splits of the same training data with the same algorithm, from the same data set with different algorithms, or by any other method.
Voting can be divided into majority voting and weighted voting, and averaging into simple and weighted averaging.
In majority voting, each model makes a prediction (a vote) for each test instance, and the final prediction is the one that receives more than half of the votes. If none of the predictions wins more than half of the votes, we may say that the ensemble method could not produce a stable prediction for this instance. Although majority voting is the commonly used technique, you can also take the most-voted prediction (even if it falls short of half the votes) as the final prediction; in some articles this variant is called "plurality voting".
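The two voting rules above can be sketched in a few lines of plain Python; the class labels are made up for illustration:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the majority prediction, or None when no label
    receives more than half of the votes (no stable prediction)."""
    label, votes = Counter(predictions).most_common(1)[0]
    if votes > len(predictions) / 2:
        return label
    return None  # the ensemble could not reach an absolute majority

def plurality_vote(predictions):
    """Return the most-voted prediction even without an absolute majority."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, `majority_vote(["cat", "dog", "bird"])` returns `None` because no label has more than half the votes, while `plurality_vote` would still pick one of them.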
Unlike majority voting, where every model has equal say, weighted voting lets us increase the importance of one or more models: the predictions of the better models are counted multiple times. Finding a good set of weights is up to you.
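Counting a better model's prediction "multiple times" is equivalent to summing a weight behind each label, which keeps the sketch short. A minimal version, with weights chosen arbitrarily by the caller:

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Sum the weight behind each predicted label and
    return the label with the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)
```

Here a single trusted model with weight 5 can overrule two weaker models: `weighted_vote(["a", "b", "b"], [5.0, 1.0, 1.0])` yields `"a"`.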
In simple averaging, the average of the models' predictions is calculated for each instance of the test data set. This method often reduces overfitting and produces a smoother regression model.
Weighted averaging is a slightly modified version of simple averaging, in which each model's prediction is multiplied by a weight before the average is computed.
Bagging, short for bootstrap aggregating, is one of the earliest, most intuitive and perhaps simplest ensemble-based algorithms, with surprisingly good performance. Diversity among the classifiers is obtained by using bootstrap replicas of the training data: different subsets of the training data are drawn at random, with replacement, from the entire training set. Each subset is used to train a different classifier of the same type. The individual classifiers are then combined by a simple majority vote: for each instance, the class chosen by most classifiers is the ensemble's decision. Because the training subsets may overlap substantially, additional measures can be taken to increase diversity, such as training each classifier on a subset of the training data or using relatively weak classifiers.
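A minimal bagging sketch, using a 1-nearest-neighbour classifier on 1-D points as the base learner purely for brevity (real bagging typically uses decision trees). Each learner sees its own bootstrap sample, drawn with replacement, and the final answer is the majority vote:

```python
import random
from collections import Counter

def one_nn_predict(train, x):
    """1-nearest-neighbour base learner on (value, label) pairs."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def bagging_predict(data, x, n_models=11, seed=0):
    """Train each base learner on a bootstrap replica of the data
    (sampled with replacement) and combine the individual
    predictions by simple majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in range(len(data))]
        votes.append(one_nn_predict(bootstrap, x))
    return Counter(votes).most_common(1)[0][0]
```

The fixed seed makes the sketch deterministic; in practice each run would use fresh random bootstrap samples.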
A random forest is an ensemble of decision trees; random forests perform bagging internally. A random forest builds many trees, sometimes thousands, and computes the best possible model for a given data set. Instead of considering all features when splitting a node, the random forest algorithm selects the best feature from a random subset of the features. This forces greater diversity among the trees and lowers the variance, which usually yields a much better model.
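The feature-subsampling step that distinguishes a random forest from plain bagging can be sketched on its own. The sqrt(n) subset size is a common default for classification, not a requirement:

```python
import math
import random

def candidate_features(n_features, rng):
    """At each split, a random forest considers only a random subset
    of the features; sqrt(n_features) is a common choice for
    classification.  The best split is then chosen from this subset."""
    k = max(1, int(math.sqrt(n_features)))
    return rng.sample(range(n_features), k)
```

A plain bagged tree would search all `n_features` at every split; restricting each split to a random subset decorrelates the trees.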
As with bagging, boosting also creates an ensemble of classifiers by resampling the data, and the classifiers are then combined by majority voting. In boosting, however, the resampling is strategically geared to provide the most informative training data for each subsequent classifier. In fact, each boosting iteration creates three weak classifiers.
The first classifier in the ensemble is trained on a bootstrap sample of the training data. The basic idea is then to check whether the training data has been learned correctly: if a given classifier has learned a particular region of the feature space incorrectly, and consequently misclassifies instances from that region, the next classifier may be able to learn this behaviour, and together with the learned behaviour of the other classifiers it may correct such inadequate training.
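The key mechanism, focusing later classifiers on the examples earlier ones got wrong, can be sketched as a weight update. The doubling factor here is an illustrative simplification, not the exact AdaBoost formula:

```python
def reweight(weights, correct):
    """Boosting-style reweighting in miniature: raise the weight of
    every example the current classifier misclassified, then
    renormalise so the weights still sum to 1.  The next classifier
    (or its resampling step) then concentrates on the hard cases."""
    updated = [w if ok else 2.0 * w for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated]
```

Starting from uniform weights, one misclassified example out of four ends up carrying twice the weight of each correctly classified one.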
Incremental learning refers to the ability of an algorithm to learn from new data that becomes available after the classifier (or model) has already been generated from a previously available data set. An algorithm is called an incremental learning algorithm if, for a sequence of training data sets (or instances), it generates a sequence of hypotheses in which the current hypothesis describes all the data observed so far but depends only on the previous hypotheses and the current training data. An incremental learning algorithm must therefore learn the new information and preserve previously acquired knowledge without access to the previously seen data. The commonly used approach to learning from additional data, namely discarding the existing classifier and retraining a new one on the old and new data combined, does not meet the definition of incremental learning, because it causes catastrophic forgetting of all previously learned information and requires access to the earlier data.
Error-correcting output codes (ECOC) are commonly used in information theory to correct bit flips caused by noisy communication channels, and in machine learning to convert binary classifiers, such as support vector machines, into multiclass classifiers by decomposing a multiclass problem into several two-class problems.
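A minimal ECOC decoding sketch. The codebook below is hypothetical: each class is assigned a codeword, each bit column corresponds to one binary classifier's two-class problem, and a test instance is decoded to the class whose codeword is nearest in Hamming distance, so a single misfiring binary classifier can be corrected:

```python
def hamming(a, b):
    """Number of positions at which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical codebook with minimum Hamming distance 3,
# enough to correct any single bit error.
CODEBOOK = {
    "apple": (0, 0, 0, 0, 0),
    "pear":  (1, 1, 1, 0, 0),
    "plum":  (0, 0, 1, 1, 1),
}

def decode(bits):
    """Map the binary classifiers' outputs to the nearest codeword."""
    return min(CODEBOOK, key=lambda c: hamming(CODEBOOK[c], bits))
```

For instance, `(1, 0, 0, 0, 0)` is one bit away from "apple" and at least two bits from every other codeword, so the flipped bit is corrected.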
The goal of any machine learning problem is to find the single model that best predicts our desired outcome. Instead of building one model and hoping it is the best or most accurate predictor we can make, ensemble methods take a myriad of models into account and average them into one final model. It is important to remember that decision trees are not the only base models used in ensemble methods, merely the ones currently most popular and important in data science.
In addition to these methods, it is common to build ensembles in deep learning by training diverse and accurate classifiers. Diversity can be achieved through different architectures, hyperparameter settings and training techniques.
Ensemble methods have been remarkably successful in setting record performance on difficult data sets and are regularly found among the winning solutions of Kaggle data science competitions. Choosing the right ensemble is more an art than an exact science.
Although ensemble methods can help us in machine learning by providing sophisticated algorithms and highly accurate results, they are often not preferred in industries where interpretability matters more. Nevertheless, the effectiveness of these methods is undeniable, and their benefits in appropriate applications can be enormous. In fields such as healthcare, even the smallest improvement in the accuracy of machine learning algorithms can be truly valuable.