How to adjust the parameters when the predicted values of LogisticRegression are all 0?
The parameters are described as follows:

Penalty: the penalty term, str type; the optional values are 'l1' and 'l2', and the default is 'l2'. It specifies the norm used in the penalty term. The newton-cg, sag and lbfgs solvers only support the L2 norm. L1 regularization assumes that the model parameters follow a Laplace distribution, while L2 assumes they follow a Gaussian distribution. The so-called norm simply adds a constraint on the parameters so that the model does not overfit. Whether such a constraint is strictly necessary cannot be answered in general; it can only be said that, with the constraint, we should in theory obtain a model with stronger generalization ability.

Dual: dual or primal formulation, bool type, default False. The dual formulation is only implemented for the L2 penalty with the liblinear solver. When the number of samples is greater than the number of features, dual is usually set to False.

Tol: the tolerance for the stopping criterion, float type, default 1e-4. When the change in the solution falls below this value, the solver stops and considers the optimal solution found.

C: the inverse of the regularization coefficient λ, float type, default 1.0. It must be a positive float. As in SVM, smaller values mean stronger regularization.
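
To make the penalty and C descriptions above concrete, here is a minimal, self-contained sketch; the synthetic data set and the particular C values are illustrative assumptions, not taken from the article. liblinear is used because it supports both L1 and L2:

```python
# Minimal illustration of penalty and C (assumed synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# L2 (the default): corresponds to a Gaussian prior on the coefficients.
clf_l2 = LogisticRegression(penalty="l2", C=1.0, solver="liblinear").fit(X, y)

# L1: corresponds to a Laplace prior; a smaller C means stronger regularization
# and tends to drive some coefficients exactly to zero.
clf_l1 = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

print("coefficients zeroed by L1:", int((clf_l1.coef_ == 0).sum()))
```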

Fit_intercept: whether to include an intercept (bias) term, bool type, default True.

Intercept_scaling: only useful when the solver is 'liblinear' and fit_intercept is True. Float type, default 1.

Class_weight: the weights of the classes in the classification model, given either as a dictionary or as the string 'balanced'. The default is None, i.e. all classes get the same weight. If you do supply it, you can either pass 'balanced' to let the library compute the weights itself, or enter the weight of each class yourself. For example, for a binary model with classes 0 and 1 we can set class_weight = {0: 0.9, 1: 0.1}, giving class 0 a weight of 90% and class 1 a weight of 10%. If class_weight is 'balanced', the library computes the weights from the training sample counts: the more samples a class has, the lower its weight, and the fewer samples, the higher its weight. With 'balanced', the class weights are computed as n_samples / (n_classes * np.bincount(y)), where n_samples is the number of samples, n_classes is the number of classes, and np.bincount(y) counts the samples of each class; for example, y = [0, 0, 0, 1] gives np.bincount(y) = [3, 1].
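
As a quick check of the 'balanced' formula, here is a tiny sketch; the toy label vector is an assumption used only for illustration:

```python
# n_samples / (n_classes * np.bincount(y)) on a tiny assumed label vector.
import numpy as np

y = np.array([0, 0, 0, 0, 1])            # 4 samples of class 0, 1 sample of class 1
weights = len(y) / (2 * np.bincount(y))  # n_classes = 2
print(weights)                           # [0.625 2.5 ] -> the rarer class gets the larger weight
```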

So what does class_weight do?

In the classification model, we often encounter two kinds of problems:

The first is a high cost of misclassification. For example, when separating legitimate users from illegitimate ones, misclassifying an illegitimate user as legitimate is very costly. We would rather classify some legitimate users as illegitimate and re-check them manually than let an illegitimate user slip through as legitimate. In that case, we can appropriately increase the weight of the illegitimate-user class.

The second is highly imbalanced samples. For example, suppose we have 10,000 binary samples of legitimate and illegitimate users, of which 9,995 are legitimate and only 5 are illegitimate. If we ignore the weights, the model can simply predict every test sample as a legitimate user, giving a theoretical accuracy of 99.95% that is nevertheless meaningless; this is exactly the situation in which every predicted value comes out as the same class. In this case, you can choose 'balanced' to let the library automatically increase the weight of the illegitimate-user samples. By raising the weight of a class, more samples are assigned to the high-weight class than would be without weighting, which addresses both of the problems above.
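
Here is a hedged sketch of exactly this scenario; the synthetic data and the 9995/5 split are assumptions used for illustration. Without weighting, the model predicts essentially only the majority class, which is the "all predictions are 0" symptom from the title; with class_weight='balanced' it starts predicting the minority class as well:

```python
# Imbalanced binary data: roughly 9995 negatives vs 5 positives (assumed synthetic data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=10000, n_features=10,
                           weights=[0.9995, 0.0005], random_state=42)

plain = LogisticRegression(max_iter=1000).fit(X, y)
balanced = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

print("positives predicted without weights:", int(plain.predict(X).sum()))
print("positives predicted with 'balanced':", int(balanced.predict(X).sum()))
```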

Random_state: random number seed, int type, optional, default None. Only useful when the solver is sag or liblinear.

Solver: the parameter that selects the optimization algorithm. There are five options: newton-cg, lbfgs, liblinear, sag and saga; the default is liblinear (newer scikit-learn versions have changed the default to lbfgs). The solver determines how the loss function of logistic regression is optimized; the five algorithms are:

Liblinear: uses the open-source LIBLINEAR library and iteratively optimizes the loss function internally with coordinate descent.

Lbfgs: a quasi-Newton method that uses the second-derivative matrix of the loss function, i.e. the Hessian matrix, to iteratively optimize the loss function.

Newton-cg: also from the Newton family of methods; it likewise uses the Hessian matrix of the loss function to iteratively optimize it.

Sag: stochastic average gradient descent, a variant of gradient descent. Unlike ordinary gradient descent, each iteration uses only a subset of the samples to compute the gradient, which makes it suitable when there is a lot of sample data.

Saga: a variant of sag; a stochastic optimization algorithm with linear convergence.

Summary:

Liblinear is suitable for small data sets, while sag and saga are suitable for large data sets because they are faster there.

For multi-class problems, only newton-cg, sag, saga and lbfgs can handle the multinomial loss, while liblinear is restricted to one-vs-rest (OvR). What does that mean? When using liblinear on a multi-class problem, one class must first be treated as one category and all the remaining classes as the other category; this is repeated, by analogy, until every class has been traversed and classified.

Newton-cg, sag and lbfgs all require the first or second continuous derivatives of the loss function, so they cannot be used with L1 regularization, whose penalty is not continuously differentiable, and only support L2 regularization. Liblinear and saga support both L1 and L2 regularization.

At the same time, sag uses only a subset of the samples for each gradient iteration, so do not choose it when the sample size is small; but if the sample size is very large, say more than 100,000, sag is the first choice. Sag, however, cannot be used with L1 regularization, so when you have a large number of samples and also need L1 regularization you have to make a trade-off: either reduce the sample size by subsampling, or fall back to L2 regularization.

From the description above, you might think that since newton-cg, lbfgs and sag all have so many restrictions, we could simply choose liblinear whenever the sample is not large. Wrong, because liblinear has weaknesses of its own! We know that logistic regression includes binary and multinomial logistic regression, and the common schemes for the multinomial case are one-vs-rest (OvR) and many-vs-many (MvM); MvM is usually more accurate than OvR. The frustrating part is that liblinear only supports OvR, not MvM, so if you need relatively accurate multinomial logistic regression you cannot choose liblinear. This also means that, in that case, you cannot use L1 regularization with liblinear.
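
Given the constraints just described, one way out already hinted at above is saga, which handles both a large sample and L1 regularization. A minimal sketch under assumed synthetic data and hyperparameters:

```python
# L1-regularized logistic regression on a larger sample via the saga solver
# (the data set and hyperparameters are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)

clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=200)
clf.fit(X, y)
print("non-zero coefficients:", int((clf.coef_ != 0).sum()))
```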

Max_iter: the maximum number of iterations of the algorithm, int type, default 100. It is only useful for the newton-cg, sag and lbfgs solvers and sets the maximum number of iterations within which the algorithm must converge.

Multi_class: the parameter selecting the classification scheme, str type; the options are 'ovr' and 'multinomial', and the default is 'ovr'. 'ovr' is the one-vs-rest (OvR) scheme mentioned above, and 'multinomial' is the many-vs-many (MvM) scheme. For binary logistic regression there is no difference between ovr and multinomial; the difference only matters for multi-class logistic regression.

What's the difference between OvR and MvM?

The idea behind OvR is simple: no matter how many classes there are, we treat the problem as binary logistic regression. Specifically, to build the classifier for class K we take all samples of class K as positive examples and all remaining samples as negative examples, then run binary logistic regression to obtain the classification model for class K. The models for the other classes are obtained in the same way.

MvM is comparatively more complicated. Here we explain its special case, one-vs-one (OvO). If the model has T classes, we each time pick the samples of two classes out of the T, call them T1 and T2, put all samples whose labels are T1 or T2 together, take T1 as the positive examples and T2 as the negative examples, and run binary logistic regression to obtain the model parameters. In total we need T(T-1)/2 binary classifiers; for example, T = 4 classes require 4*3/2 = 6 of them.

It can be seen that OvR is relatively simple, but its classification performance is usually somewhat worse (this holds for most sample distributions; under some distributions OvR may actually be better), whereas MvM classification is relatively accurate but not as fast as OvR. If ovr is selected, all five loss-optimization methods liblinear, newton-cg, lbfgs, sag and saga can be chosen; but if multinomial is selected, only newton-cg, lbfgs, sag and saga can be chosen.
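
A small sketch of the two settings on a three-class problem; iris is used here purely as an assumed example, and note that newer scikit-learn versions deprecate the multi_class argument even though it still illustrates the distinction:

```python
# multi_class='ovr' vs 'multinomial' on a 3-class data set (assumed example).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

ovr = LogisticRegression(multi_class="ovr", solver="liblinear").fit(X, y)
mvm = LogisticRegression(multi_class="multinomial", solver="lbfgs",
                         max_iter=500).fit(X, y)

print("OvR training accuracy:        ", ovr.score(X, y))
print("multinomial training accuracy:", mvm.score(X, y))
```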

Verbose: log verbosity, int type, default 0. With 0 the training process is not printed, with 1 results are printed occasionally, and with values greater than 1 output is printed for every sub-model.

Warm_start: warm-start flag, bool type, default False. If True, the next call to fit reuses the solution of the previous call as its initialization instead of starting from scratch.

N_jobs: the number of parallel jobs, int type, default 1. With 1 the program runs on one CPU core, with 2 on two cores, and with -1 it uses all CPU cores.
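
For completeness, here is a hedged sketch that puts several of the remaining parameters into one constructor call; the particular values are illustrative assumptions, not recommendations from the article:

```python
# max_iter, tol, warm_start, n_jobs and verbose combined in one call (assumed values).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

clf = LogisticRegression(
    solver="lbfgs",
    max_iter=200,     # raise this if the solver warns that it did not converge
    tol=1e-4,         # stopping tolerance
    warm_start=True,  # a later fit() reuses this solution as its starting point
    n_jobs=-1,        # use all CPU cores where parallelism applies
    verbose=0,        # 0: silent training
)
clf.fit(X, y)
print("iterations used:", clf.n_iter_)
```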

Summary:

Advantages: simple to implement, easy to understand and apply; the computational cost is low, it is fast, and it needs few storage resources.

Disadvantages: it is prone to underfitting, and the classification accuracy may not be high.

Others:

The purpose of logistic regression is to find the best-fitting parameters of a nonlinear function, the sigmoid, and the solving process can be carried out by an optimization algorithm.

Some improved optimization algorithms, such as sag, can update the parameters as new data arrive, without re-reading the whole data set for batch processing.
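
The closest thing in scikit-learn to updating the parameters as new data arrive is incremental fitting. LogisticRegression itself does not expose partial_fit, so the sketch below uses SGDClassifier with a logistic loss as an assumed stand-in to illustrate the idea (loss='log_loss' requires a reasonably recent scikit-learn; older versions call it 'log'):

```python
# Incremental (mini-batch) updates with a logistic loss (assumed stand-in for
# the online-update idea described above; the data is synthetic).
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss", random_state=0)

for _ in range(10):                              # batches arriving over time
    X_batch = rng.normal(size=(100, 5))
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=[0, 1])

print("coefficients after 10 incremental updates:", clf.coef_.ravel())
```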

An important problem in machine learning is how to handle missing data. There is no standard answer to this question; it depends on the needs of the actual application. There are several solutions, each with its own advantages and disadvantages.

The above are the parameters of sklearn's LogisticRegression; when the predicted values all come out as 0, adjusting them, class_weight in particular, is how to obtain a better classification result.