Introduction
2. Seven ways to select columns
4. Don't use "fit" on new data!
6. Encode categorical features
8. Chain steps with Pipeline
10. Why set a random state?
12. Pipeline vs make_pipeline
14. Handle missing values automatically
16. Tune a Pipeline
18. Examine grid search results
20. Plot a confusion matrix
22. Use the correct Pipeline methods
25. Improve a decision tree by pruning it
27. Impute missing values for categoricals
29. Add multiple text columns to a model
31. Know when shuffling is required
33. Create custom features with scikit-learn
35. Use pandas objects with scikit-learn
37. Create an interactive Pipeline diagram
39. Load a toy dataset into pandas
41. Encode binary features
43. Save time when encoding categoricals
45. Create feature interactions
47. Tune an ensemble
50. Solve many ML problems with one solution
What we will be doing!
scikit-learn Overview
How do we find training data?
Download data
Load our data into Jupyter Notebook
Cleaning our code a bit (building data class)
Using Enums
Converting text to numerical vectors, bag of words (BOW) explanation
Training/Test Split (make sure to "pip install scikit-learn"!)
Bag of words in sklearn (CountVectorizer)
fit_transform, fit, transform methods
Model Selection (SVM, Decision Tree, Naive Bayes, Logistic Regression) & Classification
predict method
Analysis & Evaluation (using clf.score() method)
F1 score
Improving our model (evenly distributing positive & negative examples and loading in more data)
Let's see our model in action! (qualitative testing)
TfidfVectorizer
GridSearchCV to automatically find the best parameters
Further NLP improvement opportunities
Saving our model (Pickle) and reloading it later
Category Classifier
Confusion Matrix