Python tester.dump_classifier_and_data函数代码示例

OGeek|极客世界-中国程序员成长平台 › 门户 › 编程› Python›Python编程经验

原作者: [db:作者] 来自: [db:来源] 收藏邀请

本文整理汇总了Python中tester.dump_classifier_and_data函数的典型用法代码示例。如果您正苦于以下问题：Python dump_classifier_and_data函数的具体用法？Python dump_classifier_and_data怎么用？Python dump_classifier_and_data使用的例子？那么恭喜您, 这里精选的函数代码示例或许可以为您提供帮助。

在下文中一共展示了dump_classifier_and_data函数的20个代码示例，这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞，您的评价将有助于我们的系统推荐出更棒的Python代码示例。

示例1: finish

def finish():
    try:
        if best_solution_so_far:

            print ""
            print "============================="
            print "Optimization Path:"
            print "============================="
            for solution in optimization_path:
                print_result_item(solution[0])
                print solution[1]
            print ""
            print "============================="
            print "Final Solution:"
            print "============================="
            print best_solution_so_far
            clf = best_solution_so_far[0][3]
            features_list = best_solution_so_far[1]
            dump_classifier_and_data(clf, my_dataset, features_list)
            print "Model saved with success."
        else:
            print ""
            print "No solution found"
    except Exception as e:
        print e

开发者ID:luiscruz，项目名称:udacity_data_analyst，代码行数:25，代码来源:poi_id.py

示例2: setup_and_test

def setup_and_test(my_dataset, features_list, classifier):
    # Dump classifier and features list, so we can test them
    dump_classifier_and_data(classifier, my_dataset, features_list)

    # load up student's classifier, dataset, and feature_list
    clf, dataset, feature_list = load_classifier_and_data()
    # Run testing script
    test_classifier(clf, dataset, feature_list)

    return

开发者ID:joashxu，项目名称:enron，代码行数:10，代码来源:utils.py

示例3: detect_poi

def detect_poi():
### Load the dictionary containing the dataset
    data_dict = pickle.load(open("final_project_dataset.pkl", "r") )
### Task 1: Remove outliers
    data_dict.pop('TOTAL',0)    
    
### Task 2: Select what features
### 'stk_pay_ratio','to_poi_ratio', 'from_poi_ratio','bonus_salary_ratio'
### features_list is a list of strings, each of which is a feature name.
### The first feature must be "poi".
    my_dataset = data_dict
    stk_pay_ratio(my_dataset)
    from_poi_ratio(my_dataset)
    to_poi_ratio(my_dataset)
    bonus_salary_ratio(my_dataset)
     
### Task 3: Feature Selection
### Generate a set of 15 feature lists from these 4 features
### This way, all possible combinations of these features are tested

    all_features_list = fList_set()

### Because of the small size of the dataset, the script uses stratified
### shuffle split cross validation in tester.py
    metrics = []    
    clf = GaussianNB()    
### ptest uses Stratified shuffle split cross validation and calculates the precision
### Find the precision for every list
    for i in range(0,15):
        metrics.append(ptest(clf,my_dataset,all_features_list[i]))
### Go for the feature list that produces the best precision.  
### For this dataset only, it is harder to get a high precision.
    best = np.array(metrics).argmax()  
    
### Run test_classifier to print evaluation metrics to console
    test_classifier(clf, my_dataset,all_features_list[best])

### Now use the same feature list to run the decison tree classifier
    features_list = all_features_list[best]
### Task 4: Try a varity of classifiers
    samples_split_values = [2,4]
    samples_leaf_values = [1,2]

    for split in samples_split_values:
        for leaf in samples_leaf_values:
            clf = tree.DecisionTreeClassifier(min_samples_split=split,\
            min_samples_leaf=leaf)
            test_classifier(clf, my_dataset, features_list)
            print_feature_importances(features_list, clf)
###Choose best classfier and feature set    
    clf = GaussianNB()   

### Dump classifier, dataset, and features_list
    dump_classifier_and_data(clf, my_dataset, features_list)

开发者ID:RaphaelTam，项目名称:Enron_Bad_Guys，代码行数:54，代码来源:poi.id.py

示例4: main

def main():
    data_dict = pickle.load(open("final_project_dataset.pkl", "r"))
    my_dataset = data_dict
    my_dataset = AddFeatures(my_dataset)
    # Exclude using Discretion.
    Exc1 = ["email_address"]
    # Replaced by creating better versions of the features
    Exc2 = ["to_messages", "from_messages", "from_this_person_to_poi", "from_poi_to_this_person"]
    # Exclude because Highly Correlated with stronger features
    Exc3 = [
        "deferral_payments",
        "expenses",
        "deferred_income",
        "restricted_stock_deferred",
        "director_fees",
        "long_term_incentive",
        "bonus",
        "total_payments",
        "salary",
        "total_stock_value",
        "restricted_stock",
        "exercised_stock_options",
        "other",
    ]
    exclude = Exc1 + Exc2 + Exc3
    # QueryDataSet(my_dataset)
    # ShowCorrel(my_dataset)
    features_list = next(my_dataset.itervalues()).keys()
    for i in exclude:
        features_list.remove(i)
    features_list.insert(0, features_list.pop(features_list.index("poi")))
    data = featureFormat(my_dataset, features_list, sort_keys=True)
    ### Extract features and labels from dataset for local testing
    labels, features = targetFeatureSplit(data)
    features_train, features_test, labels_train, labels_test = train_test_split(
        features, labels, test_size=0.1, random_state=42, stratify=labels
    )
    # clf=TuneSVM(features, labels,features_list)
    # clf=TuneKNN(features, labels,features_list)
    # clf=NoTuneDT(features, labels,features_list)
    # clf=TuneDT(features,labels,features_list)
    features_list.insert(0, "poi")
    dump_classifier_and_data(clf, my_dataset, features_list)
    test_classifier(clf, my_dataset, features_list)

开发者ID:datalord123，项目名称:MachineLearning，代码行数:44，代码来源:poi_id.py

示例5: Pipeline

        testing_features_list = [u'poi']
        for feature in features_list_score_order:
            testing_features_list.append(feature)
            pipe = Pipeline([('impute', Imputer(strategy='median')), 
                    ('classify', GaussianNB(priors=[(i/2.)*.1, (1 - (i/2.)*.1)]))])
            total_predictions, accuracy, precision, recall, f1, f2 = \
                test_classifier(pipe, my_dataset, testing_features_list, folds=200)
            acc.append(accuracy)
            prec.append(precision)
            reca.append(recall)
        acc_all.append(acc)
        prec_all.append(prec)
        reca_all.append(reca)
        results_dict['prec' + str(i)] = prec
        results_dict['reca' + str(i)] = reca
        results_dict['acc' + str(i)] = acc
#tuneNB()
test_df = pd.DataFrame(results_dict)
### Task 6: Dump your classifier, dataset, and features_list so anyone can
### check your results. You do not need to change anything below, but make sure
### that the version of poi_id.py that you submit can be run on its own and
### generates the necessary .pkl files for validating your results

features_list_score_order = [u'poi', u'exercised_stock_options', u'total_stock_value', u'bonus']
pipe = Pipeline([('impute', Imputer(strategy='median')), 
        ('classify', GaussianNB(priors=[.15, .85]))])
total_predictions, accuracy, precision, recall, f1, f2 = \
    test_classifier(pipe, my_dataset, features_list_score_order, folds=1000)   
        
dump_classifier_and_data(pipe, my_dataset, features_list_score_order)

开发者ID:eistre91，项目名称:DataAnalystNanodegree，代码行数:30，代码来源:poi_id.py

示例6: SelectKBest

#        #print self.X_fit+X
#        best_words=self.wt.transform(self.X_fit+X)
#	word_pca = self.pca.fit_transform(best_words)
#        qqq = np.array(word_pca)[np.arange(len(self.y_fit)),:]
#        best_pca_train = self.pt.fit_transform(qqq,self.y_fit)
#	self.clf.fit(best_pca_train,self.y_fit)
#        #x=remove_low_frequency_words(X)
#        best_pca_test = self.pt.transform(np.array(word_pca)[np.arange(len(X))+len(self.X_fit)])
#	#word_pca = self.pca.transform(best_words)
#        #best_pca = self.pt.transform(word_pca)
#        return self.clf.predict(best_pca_test)

## create filtered_gnb classifier
#word_transformer = SelectKBest(f_regression,200)
#pca              = PCA(n_components=86)
#pca_transformer  = SelectKBest(f_classif,20)
#classifier1      = DecisionTreeClassifier(min_samples_leaf=2)
#classifier2      = GaussianNB()
#classifier3      = KNeighborsClassifier()
#filtered_gnb=FilteredGNB(word_transformer,pca,pca_transformer,classifier1)

#print "FILTERED GNB CLASSIFIER USING ALL WORD FEATURES"
#test_classifier(filtered_gnb, my_dataset, ["poi"]+ words.tolist(),folds=5)

print "Gaussian NB with Word PCA Features:"
test_classifier(GaussianNB(), my_dataset, ["poi"]+ best_word_pca_features)

### Dump your classifier, dataset, and features_list so 
### anyone can run/check your results.
dump_classifier_and_data(GaussianNB(), my_dataset, ["poi"]+best_word_pca_features)

开发者ID:avisochek，项目名称:enron_fraud_detection，代码行数:30，代码来源:poi_id.py

示例7: build_model

# Without new features
_ = build_model(original_features, estimator, {}, use_kbest=True, k=['all'], use_scaler=True)

# With grand_total
_ = build_model(original_features + ['grand_total'], estimator, {}, use_kbest=True, k=['all'], use_scaler=True)

# With from_poi_ratio
_ = build_model(original_features + ['from_poi_ratio'], estimator, {}, use_kbest=True, k=['all'], use_scaler=True)

# With to_poi_ratio
_ = build_model(original_features + ['to_poi_ratio'], estimator, {}, use_kbest=True, k=['all'], use_scaler=True)


# ----------------------------------------------------------
#   Final Model
# ----------------------------------------------------------

final_model, final_features = build_model(original_features + ['grand_total'],
                                          estimator, {},
                                          use_kbest=True,
                                          use_scaler=True)

test_classifier(final_model, data_dict, final_features, folds=1000)

# ----------------------------------------------------------
#    Dump Classifier and Data
# ----------------------------------------------------------

dump_classifier_and_data(final_model, data_dict, final_features)

开发者ID:nhtruong，项目名称:ud120-projects，代码行数:29，代码来源:poi_id.py

示例8: evaluate_clasifier

def evaluate_clasifier(df, extras, algo, dump=False):
    """Evaluate and possibly store classifier and data"""

    if not dump:
        # Only redirect output for the search
        orig_stdout, logfile = init_logfile(extras, algo)

    ### Task 3: Create new feature(s)
    df = create_features(df, *extras)

    ### Extract features and labels from dataset for local testing
    dfx, dfy = features_split_df(df)

    ### Task 4: Try a varity of classifiers
    ### Please name your classifier clf for easy export below.
    ### Note that if you want to do PCA or other multi-stage operations,
    ### you'll need to use Pipelines. For more info:
    ### http://scikit-learn.org/stable/modules/pipeline.html

    ### Task 5: Tune your classifier to achieve better than .3 precision and recall
    ### using our testing script. Check the tester.py script in the final project
    ### folder for details on the evaluation method, especially the test_classifier
    ### function. Because of the small size of the dataset, the script uses
    ### stratified shuffle split cross validation. For more info:
    ### http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedShuffleSplit.html

    split_indices = StratifiedShuffleSplit(dfy, n_iter=1000, test_size=0.1)

    features_list = ['poi'] + dfx.columns.values.tolist()

    pipeline, params = create_pipeline(
        algo,
        extras,
        is_search=(not dump),
        max_features=len(dfx.columns))

    grid_searcher = GridSearchCV(
        pipeline,
        param_grid=params,
        cv=split_indices,
        n_jobs=-1,
        scoring=create_scorer(),
        verbose=0)

    t0 = time()
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', UserWarning)

        grid_searcher.fit(dfx, y=dfy)
        print '\nTime to fit: {:0>8}\n'.format(dt.timedelta(seconds=(time() - t0)))

        print "Best parameters set:"
        print grid_searcher.best_params_
        print ''

        print 'Grid score:'
        for params, mean_score, scores in grid_searcher.grid_scores_:
            print "%0.3f for %r" % (mean_score, params)
        print ''

        selector = grid_searcher.best_estimator_.named_steps['selection']
        scored = pd.DataFrame(zip(
            dfx.columns.tolist(),
            selector.scores_,
            selector.get_support()))

        scored.columns = ['Feature', 'Score', 'Selected']
        scored = scored.sort_values(by=['Score'], ascending=False)
        scored.index = range(1, len(scored) + 1)
        n_selected = len(scored[scored.Selected])
        print 'Scored features: {} selected'.format(n_selected)
        print scored
        print ''

        # n_pca_components = grid_searcher.best_estimator_.named_steps[
        #     'reducer'].n_components_

        # print "Reduced to {0} PCA components".format(n_pca_components)

        ### Task 6: Dump your classifier, dataset, and features_list so anyone can
        ### check your results. You do not need to change anything below, but make sure
        ### that the version of poi_id.py that you submit can be run on its own and
        ### generates the necessary .pkl files for validating your results.

        clf = grid_searcher.best_estimator_

        ### Store to my_dataset for easy export below.
        df = features_combine_df(dfx, dfy)
        my_dataset = df.to_dict(orient='index')

        test_classifier(clf, my_dataset, features_list)

        if dump:
            dump_classifier_and_data(clf, my_dataset, features_list)
        else:
            close_logfile(orig_stdout, logfile)

开发者ID:j-bennet，项目名称:udacity-nano-da，代码行数:96，代码来源:poi_id.py

示例9: zip

param_grid=tree_param_grid,
scoring="recall")



### Show results of parameter tuning
grid_search.fit(features_train, labels_train)
print "\nbest estimator: \n", (grid_search.best_estimator_),\
"\n best score:\n",grid_search.best_score_ ,\
"\n best params:\n",grid_search.best_params_


clf = grid_search.best_estimator_

features_selected_bool  =  clf.named_steps['skb'].get_support()
features_selected_list = [x for x, y in zip(features_selected_list[1:],
features_selected_bool ) if y]

print "\nselected features: ", features_selected_list



### Task 6: Dump your classifier, dataset, and features_list so anyone can
### check your results. You do not need to change anything below, but make sure
### that the version of poi_id.py that you submit can be run on its own and
### generates the necessary .pkl files for validating your results.



dump_classifier_and_data(clf, my_dataset, ["poi"]+features_selected_list)

开发者ID:olix20，项目名称:Enron-fraud-analysis，代码行数:30，代码来源:poi_id.py

示例10: Pipeline

pipe = Pipeline(steps=[('skbest', SelectKBest(score_func=f_classif)), ('clf', GaussianNB())])
cv = StratifiedShuffleSplit(labels,n_iter = 60,random_state = 42)
b_grid_search = grid_search.GridSearchCV(pipe, param_grid = clf_params,cv = cv,scoring = 'precision')
b_grid_search.fit(features_saved,labels_saved)

print 'Time:',round(time()-t0,3) ,'s\n'
t0 = time()

# pick a winner
best_clf_nb = b_grid_search.best_estimator_
print best_clf_nb

found_skb_nb=best_clf_nb.steps[0][1]
found_clf_nb=best_clf_nb.steps[1][1]

features=found_skb_nb.fit_transform(features_saved,labels_saved)
features_list_to_use_nb=np.asarray(all_features_list_saved)[found_skb_nb.get_support()].tolist()
print "\nFeatures used:"
print features_list_to_use_nb

test_classifier(found_clf_nb, dataset_to_export, ['poi']+features_list_to_use_nb)



### Task 6: Dump your classifier, dataset, and features_list so anyone can
### check your results. You do not need to change anything below, but make sure
### that the version of poi_id.py that you submit can be run on its own and
### generates the necessary .pkl files for validating your results.
dump_classifier_and_data(svm_clf, dataset_to_export, ['poi']+features_list)

开发者ID:buinyi，项目名称:Intro-To-Machine-Learning，代码行数:29，代码来源:poi_id.py

示例11: main

def main():
    print "=========="
    import sys
    #import os
    import pickle
    from time import time
    
    ## evaluation
    from sklearn.metrics import precision_score, recall_score 
    import matplotlib.pyplot as plt
    import pandas as pd
    #from ggplot import *
    import numpy as np
    
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.pipeline import Pipeline, FeatureUnion
    from sklearn.grid_search import GridSearchCV
    from sklearn.svm import SVC
    #from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.naive_bayes import GaussianNB
    from sklearn.cross_validation import StratifiedShuffleSplit
    from sklearn.cross_validation import train_test_split
    ###############################################################################
    ###############################################################################
    ###############################################################################
    ## current file running
    print "Running:", sys.argv[0].split("/")[-1]
    t_start_all = time()
    
    ### import helper functions
    sys.path.append("../tools/")
    from feature_format import featureFormat, targetFeatureSplit
    ## make sure 'tester' in same dir 
    from tester import dump_classifier_and_data
    
    ## moving loading dict code to be consistent with 'validate.py' ex from prev.
    ## lesson.
    ### Load the dictionary containing the dataset
    with open("final_project_dataset.pkl", "r") as data_file:
        data_dict = pickle.load(data_file)
    
    '''
    #Example structure of data_dict:
    >>> data_dict
            {'METTS MARK': {'salary': 365788, 
                            'to_messages': 807, 
                            'deferral_payments': 'NaN', 
                            'total_payments': 1061827, 
                            'exercised_stock_options': 'NaN', 
                            'bonus': 600000, 
                            'restricted_stock': 585062, 
                            'shared_receipt_with_poi': 702, 
                            'restricted_stock_deferred': 'NaN', 
                            'total_stock_value': 585062, 
                            'expenses': 94299, 
                            'loan_advances': 'NaN', 
                            'from_messages': 29, 
                            'other': 1740, 
                            'from_this_person_to_poi': 1, 
                            'poi': False, 
                            'director_fees': 'NaN', 
                            'deferred_income': 'NaN', 
                            'long_term_incentive': 'NaN', 
                            'email_address': '[email protected]', 
                            'from_poi_to_this_person': 38
                            }, 
            'BAXTER JOHN C': {'salary': 267102, 
                              'to_messages': 'NaN', 
                              'deferral_payments': 1295738, 
                              'total_payments': 5634343, 
                              'exercised_stock_options': 6680544, 
                              'bonus': 1200000, 
                              'restricted_stock': 3942714, 
                              'shared_receipt_with_poi': 'NaN', 
                              'restricted_stock_deferred': 'NaN', 
                              'total_stock_value': 10623258, 
                              'expenses': 11200, 
                              'loan_advances': 'NaN', 
                              'from_messages': 'NaN', 
                              'other': 2660303, 
                              'from_this_person_to_poi': 'NaN', 
                              'poi': False, 
                              'director_fees': 'NaN', 
                              'deferred_income': -1386055, 
                              'long_term_incentive': 1586055, 
                              'email_address': 'NaN', 
                              'from_poi_to_this_person': 'NaN'
                              },
            ...
    '''
    ###############################################################################
    ###############################################################################
    ###############################################################################
    print "----------"
    '''
    ##### Task 0. Data Exploration
#.........这里部分代码省略.........

开发者ID:ximiki，项目名称:udacity_P5，代码行数:101，代码来源:poi_id_B1_1605131351.py

示例12: zip

                feats.append(y)
        
        for x,y in zip(full_features_list[1:], clf_best.named_steps['skb'].scores_): 
            list_scores.append({'feature_list' : x, "scores" : y})
        
        print feats
        print pd.DataFrame(list_scores)        
        print "---------------------------------------------------------------"
        for param_name in sorted(grid_search.param_grid.keys()):
            print("\t%s: %r" % (param_name, best_parameters[param_name]))

    print "GridSearch time:" 
    time1 = round(time()-t0,2)
    print time1
    print "test_classifier time:" 
    t1 = time()
    test_classifier(clf_best, my_dataset, full_features_list)
    time2 = round(time()-t1, 2)
    print time2
    print "total time:", time2+time1
print "-----------------------------------------------------------------------"
############################################################################### 
## Tune classifier
### Generates the necessary .pkl files for validating results.

if full_report:
    for clf in [ dtc, gnc, knn, abc, rfc ]:
       test_classifier(clf, my_dataset, features_list)

dump_classifier_and_data(clf_best, my_dataset, full_features_list)

开发者ID:ondramie，项目名称:machine-learning-python-enron-employees，代码行数:30，代码来源:poi_id.py

示例13: make_pipeline

#              decision_function_shape='ovo', degree=3, gamma='auto',
#              kernel='linear', max_iter=-1, probability=False,
#              random_state=20160308, shrinking=False, tol=0.001,
#              verbose=False))
pipe = make_pipeline(
          Imputer(axis=0, copy=True, missing_values='NaN',
                  strategy='median', verbose=0),
          ExtraTreesClassifier(bootstrap=False, class_weight='balanced',
                               criterion='gini', max_depth=None,
                               max_features='sqrt', max_leaf_nodes=None,
                               min_samples_leaf=3, min_samples_split=2,
                               min_weight_fraction_leaf=0.0, n_estimators=30,
                               n_jobs=-1, oob_score=False,
                               random_state=20160308, verbose=0,
                               warm_start=False))
#pipe = make_pipeline(
#          Imputer(axis=0, copy=True, missing_values='NaN',
#                  strategy='median', verbose=0),
#          SelectFpr(alpha=0.05, score_func=f_classif),
#          ExtraTreesClassifier(bootstrap=False, class_weight='balanced',
#                               criterion='gini', max_depth=None,
#                               max_features='sqrt', max_leaf_nodes=None,
#                               min_samples_leaf=3, min_samples_split=2,
#                               min_weight_fraction_leaf=0.0, n_estimators=30,
#                               n_jobs=-1, oob_score=False,
#                               random_state=20160308, verbose=0,
#                               warm_start=False))

# Task 6: Dump your classifier, dataset, and features_list
dump_classifier_and_data(pipe, df.to_dict(orient='index'), ['poi'] + F_ALL_NEW)

开发者ID:pqmagicwu，项目名称:uda-da-p5-enron-fraud-detection，代码行数:30，代码来源:poi_id.py

示例14: Pipeline

t= time.time()
pipeline = Pipeline([('normalization', scaler), 
                     ('classifier', KNeighborsClassifier(n_neighbors=3, weights='uniform', algorithm='auto', 
                                                         leaf_size=30, p=1, metric='minkowski'))])

test_classifier(pipeline, enron_data, features_select(4))

print time.time()-t


# ###Data dump

# In[45]:


### Dump your classifier, dataset, and features_list so 
### anyone can run/check your results.

dump_classifier_and_data(pipeline, enron_data, features_select(4))


# ###Additional methods to explore include:
# 
# * using k-fold cross-validation to improve model validation

# In[ ]:

开发者ID:BlaneG，项目名称:Udacity_Intro_machine_learning，代码行数:26，代码来源:poi_id.py

示例15: time

dtc_clf = sklearn.tree.DecisionTreeClassifier() 
dtcclf = grid_search.GridSearchCV(dtc_clf, parameters, scoring = scoring, cv = cv)

dtcclf.fit(features, labels)
print dtcclf.best_estimator_
print dtcclf.best_score_
print 'Processing time:',round(time()-t0,3) ,'s'


#Classifier validation
##DecisionTreeClassifier Validation 1 (StratifiedShuffleSplit, folds = 1000)
t0 = time()
dtc_best_clf = dtcclf.best_estimator_
   
test_classifier(dtc_best_clf, enron_data, eng_feature_list)

print 'Processing time:',round(time()-t0,3) ,'s'


##DecisionTreeClassifier Validation 2 (Randomized, partitioned trials, n=1,000)
t0 = time()
dtc_best_clf = dtcclf.best_estimator_
   
evaluate.evaluate_clf(dtc_best_clf, features, labels, num_iters=1000, test_size=0.3)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print 'Processing time:',round(time()-t0,3) ,'s'

#Dump my classifier
dump_classifier_and_data(dtc_best_clf, enron_data, eng_feature_list)

开发者ID:myaqoob67，项目名称:Udacity_ML，代码行数:29，代码来源:poi_id.py

示例16: train_test_split

#print "accuracy score is ",accuracy
#print "recall score is ",recall
#print "precision score is ",precision


# Example starting point. Try investigating other evaluation techniques!
#from sklearn.cross_validation import train_test_split
#features_train, features_test, labels_train, labels_test = \
#    train_test_split(features, labels, test_size=0.3, random_state=42)

### Task 6: Dump your classifier, dataset, and features_list so anyone can
### check your results. You do not need to change anything below, but make sure
### that the version of poi_id.py that you submit can be run on its own and
### generates the necessary .pkl files for validating your results.

dump_classifier_and_data(clf, my_dataset, testfeatureswithpoi)


##MYCODE :convert my_dataset dictionary to list
#import csv
#dictlist=[]
#temp=[]
#fieldnames=['name']


#for name,detail in my_dataset.iteritems():
#   temp.append(name)
   
#   for key,value in detail.iteritems():
       
#	   temp.append(value)

开发者ID:Maggiebj，项目名称:P5_enron_project，代码行数:31，代码来源:poi_id.py

示例17: main


#.........这里部分代码省略.........
    best_clf_config_list = clf_collection.sort_values(['precision','recall',
                                            'accuracy','number of features'],
                                    ascending=[False,False,False,True])

    clf_collection.sort_values(['precision', 'recall',
                                'accuracy', 'number of features'],
                               ascending=[False, False, False, True])

    # dump the results of all the tested classifiers and related configurtion
    # and train/test setup
    clf_collection.to_csv("training_data.csv", sep=',', encoding='utf-8')

    # iterating through all the classifiers chosen
    print "Validating list of best classifiers: "
    for index, best_clf_config in best_clf_config_list.iterrows():
        # go for the best, instantiate it and dump the data
        best_clf_class_id = best_clf_config["class_id"]
        best_clf_params = best_clf_config["best parameters"]

        for id, clf_class, clf_kwargs, feat_scaling in gl_clf_list:
            if id == int(best_clf_class_id):

                try:
                    # instantiate classifier
                    best_clf = clf_class(**best_clf_params)
                    best_clf_org = clf_class(**best_clf_params)

                    if best_clf_config["features_scaled"]:
                        # train the algorithm
                        best_clf.fit(my_features_train_scaled, my_labels_train)
                        best_clf_org.fit(orig_features_train_scaled, orig_labels_train)
                    else:
                        # train the algorithm
                        best_clf_org.fit(orig_features_train, orig_labels_train)
                        best_clf.fit(my_features_train, my_labels_train)

                    print "start original data set"
                    # test with original data set
                    #v_o_total_predictions, v_o_accuracy, v_o_precision, v_o_recall, v_o_f1, v_o_f2 =\
                    #    test_classifier(best_clf_org, orig_dataset, orig_features_list)

                    #clf_best_collection.loc[1000 + index] = (best_clf_config["class_id"],
                    #                                  best_clf_config["clf"],
                    #                                  best_clf_config['features_scaled'],
                    #                                  len(orig_features_list),
                    #                                  str(my_features_list),
                    #                                  v_o_accuracy, v_o_precision,
                    #                                  v_o_recall, best_clf_params,
                    #                                  best_clf_config["best estimator"],
                    #                                  True,
                    #                                  create_new_message_features,
                    #                                  create_new_finance_features,
                    #                                  PCA_info)

                    # dump final information
                    dump_classifier_and_data(best_clf, my_dataset, my_features_list)

                    print "start original data set with new features"
                     #test with newly created features on top of the original data set
                    v_total_predictions, v_accuracy, v_precision, v_recall, v_f1, v_f2 =\
                        test_classifier(best_clf, my_dataset, my_features_list, do_perform_PCA,
                            pca_components, best_clf_config['features_scaled'])

                    clf_best_collection.loc[index] = (best_clf_config["class_id"],
                                                      best_clf_config["clf"],
                                                      best_clf_config['features_scaled'],
                                                      len(my_features_list),
                                                      str(my_features_list),
                                                      v_accuracy, v_precision, v_recall,
                                                      best_clf_params,
                                                      best_clf_config["best estimator"],
                                                      False,
                                                      create_new_message_features,
                                                      create_new_finance_features,
                                                      PCA_info
                                                      )
                except TypeError:
                    clf_best_collection.loc[index] = (best_clf_config["class_id"],
                                                      best_clf_config["clf"],
                                                      best_clf_config['features_scaled'],
                                                      len(my_features_list),
                                                      str(my_features_list),
                                                      "Error", "Error", "Error",
                                                      best_clf_params,
                                                      best_clf_config["best estimator"],
                                                      False,
                                                      create_new_message_features,
                                                      create_new_finance_features,
                                                      PCA_info
                                                     )

    cbc = clf_best_collection.sort_values(['precision', 'recall', 'accuracy'],
                                                 ascending=[False, False,
                                                            False])

    # write classifier validation result to file
    cbc.to_csv(output_file_results, sep=',', encoding='utf-8')

    print "###################################################################"
    print "###################################################################"

开发者ID:cellarstar，项目名称:udacity，代码行数:101，代码来源:poi_id.py

示例18: accuracy_score

    print "For optimum",score,":"
    for name in classifier_names:
        print "  ",name,":  ",best_performance[score][name]


# acc  = accuracy_score(pred, labels_test)
# print ""
# print "Accuracy:",acc," (Good predictions / All predictions)"
# pre = precision_score(pred, labels_test)
# print "Precision:",pre," (Real POIs / Predicted POIs)"
# rec  = recall_score(pred, labels_test)
# print "Recall:",rec," (Identified POIs / All POIs)"
### Task 5: Tune your classifier to achieve better than .3 precision and recall
### using our testing script. Check the tester.py script in the final project
### folder for details on the evaluation method, especially the test_classifier
### function. Because of the small size of the dataset, the script uses
### stratified shuffle split cross validation. For more info:
### http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedShuffleSplit.html

# Example starting point. Try investigating other evaluation techniques!
# features_train, features_test, labels_train, labels_test = \
    # train_test_split(features, labels, test_size=0.3, random_state=42)

### Task 6: Dump your classifier, dataset, and features_list so anyone can
### check your results. You do not need to change anything below, but make sure
### that the version of poi_id.py that you submit can be run on its own and
### generates the necessary .pkl files for validating your results.

# dump_classifier_and_data(clf, my_dataset, features_list)
dump_classifier_and_data(clf, data_dict, features_list)

开发者ID:DaniGate，项目名称:ud120-projects，代码行数:30，代码来源:poi_id.py

示例19: SelectKBest

                ('select_features', SelectKBest(f_classif, k=opt_features)),
                ('reduce_dim', PCA()),
                ('naive', GaussianNB())])
clf.fit(features_train, labels_train)
pred = clf.predict(features_test)
print("")
print("Efficiency of selected algorithm:")
print 'F1 score:\t', '{0:.2f}'.format(f1_score(labels_test, pred))
print 'Accuracy:\t',  '{0:.2f}'.format(accuracy_score(labels_test, pred))
print 'Precision:\t', '{0:.2f}'.format(precision_score(labels_test, pred))
print 'Recall:\t',  '{0:.2f}'.format(recall_score(labels_test, pred))

scores = clf.named_steps['select_features'].scores_
features_selected_bool = clf.named_steps['select_features'].get_support(indices=True)
features_selected = [features_list[i+1] for i in features_selected_bool]
features_scores = [scores[i] for i in features_selected_bool]

print("")
print('Feature scores:')
for i in range(len(features_scores)):
    print features_selected[i], '{0:.2f}'.format(features_scores[i])

features_selected.insert(0, 'poi')

### Task 6: Dump your classifier, dataset, and features_list so anyone can
### check your results. You do not need to change anything below, but make sure
### that the version of poi_id.py that you submit can be run on its own and
### generates the necessary .pkl files for validating your results.

dump_classifier_and_data(clf, my_dataset, features_selected)

开发者ID:nikita-barsukov，项目名称:intro-to-ml，代码行数:30，代码来源:poi_id.py

示例20: test_clf

test_clf(grid_search, labels, features, parameters)

clf = AdaBoostClassifier()
parameters = {'n_estimators': [10, 20, 30, 40, 50],
               'algorithm': ['SAMME', 'SAMME.R'],
               'learning_rate': [.5,.8, 1, 1.2, 1.5]}
grid_search = GridSearchCV(clf, parameters)
print '\nAdaBoost:'
test_clf(grid_search, labels, features, parameters)

### Task 5: Tune your classifier to achieve better than .3 precision and recall 
### using our testing script. Check the tester.py script in the final project
### folder for details on the evaluation method, especially the test_classifier
### function. Because of the small size of the dataset, the script uses
### stratified shuffle split cross validation. For more info: 
### http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedShuffleSplit.html

# Example starting point. Try investigating other evaluation techniques!
from sklearn.cross_validation import train_test_split
features_train, features_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.3, random_state=42)

from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()

### Task 6: Dump your classifier, dataset, and features_list so anyone can
### check your results. You do not need to change anything below, b

鲜花

握手

雷人

路过

鸡蛋

该文章已有0人参与评论

请发表评论

全部评论

专题导读

More+

10-27 六六分期app的软件客服如何联系？(六六分期

11-06 可心卡盟:win10系统火狐flash插件崩溃怎么

11-06 亲亲特价:怎么删除回收站图标

11-06 济南大学虚拟社区:鲁大师节能降温的具体办

11-06 xlueops.exe:无线网络安装向导

11-06 女斗合众国:win7系统cf与主机连接不稳定怎

11-06 0xc000022-[cf烟雾头]cf怎么调烟雾头

11-06 qizideyouhuo:应用程序无法正常启动0xc0000

11-06 ipz-185:win7系统vcf文件怎么打开

11-06 傻哥蹦迪:win10系统s4怎么打开usb调试

11-06 八神浩树gtaste:回收站清空了怎么恢复

11-06 妖尾之黑色守护:win10系统电脑没有1440x900

11-06 校园至尊魔王小说:win7系统浏览网页时字体

11-06 女斗合众国:win10系统访问共享文件夹提示请

11-06 tokyo hot n0654:恢复win7系统默认字体一招

11-06 雨酷仙境:设置win7系统转移临时文件夹腾出

11-06 阿穆纳伊之杖:win7系统开始菜单在右边还原

11-06 tunespotting:win10系统火狐flash插件总是

11-06 甘尔葛分析师：计谋网站seo关键词暴涨有什

11-06 蔡贵霖: 计谋网站seo关键词暴涨有什么秘密

11-06 博益网首页:ao3网页版进入不了解决方法

11-06 漏斗子专栏: 网站数据分析小白易懂精华篇

11-06 见证双虹怎么做:win7系统开启telnet命令的

11-06 颾狐蝶蜋:系统资源不足无法完成请求的服务

11-06 国光中学校歌:提交网站到alexa查询详细步骤

11-06 西安有情天:静态网页和动态网页的区别

11-06 红木雅尚斋:外部链接构造对网站的好处

11-06 前官礼遇：防止域名劫持–增强域安全性的10

11-06 密传二转答案: 中文分词算法有哪些

11-06 金泉家园邮编:百度快照劫持的表现及应对方

Python tester.test_classifier函数代码示例发布时间：2022-05-27

Python utils.update_storage函数代码示例发布时间：2022-05-27

Python util.grid_equal函数代码示例

1 Python 入门教程

Python入门教程 Python 是一种解释型、面向对象、动态数据类型的高级程序设计语言。 P

阅读：13772|2022-01-22

2 Python wikiutil.getFrontPage函数代码示例

Python wikiutil.getFrontPage函数代码示例

阅读：9579|2022-05-24

3 Python 简介

Python 简介 Python 是一个高层次的结合了解释性、编译性、互动性和面向对象的脚本

阅读：4066|2022-01-22

4 Python tests.group函数代码示例

Python tests.group函数代码示例

阅读：4039|2022-05-27

5 Python util.check_if_user_has_permission

Python util.check_if_user_has_permission函数代码示例

阅读：3819|2022-05-27

6 Python 操练实例98

Python 练习实例98 Python 100例题目：从键盘输入一个字符串，将小写字母全部转换成大

阅读：3499|2022-01-22

7 Python 环境搭建

Python 环境搭建本章节我们将向大家介绍如何在本地搭建 Python 开发环境。 Py

阅读：3022|2022-01-22

8 Python output.darkgreen函数代码示例

Python output.darkgreen函数代码示例

阅读：2638|2022-05-25

9 Python 基础语法

Python 基础语法 Python 语言与 Perl，C 和 Java 等语言有许多相似之处。但是，也

阅读：2624|2022-01-22

10 Python 中文编码

Python 中文编码前面章节中我们已经学会了如何用 Python 输出 Hello, World!，英文没

阅读：2290|2022-01-22

客服电话

电子邮件

Python tester.dump_classifier_and_data函数代码示例

示例1: finish

示例2: setup_and_test

示例3: detect_poi

示例4: main

示例5: Pipeline

示例6: SelectKBest

示例7: build_model

示例8: evaluate_clasifier

示例9: zip

示例10: Pipeline

示例11: main

示例12: zip

示例13: make_pipeline

示例14: Pipeline

示例15: time

示例16: train_test_split

示例17: main

示例18: accuracy_score

示例19: SelectKBest

示例20: test_clf

请发表评论

全部评论

上一篇：

下一篇：

Python util.grid_equal函数代码示例

Python util.get_worker_name函数代码示例

Python util.get_webmention_target函数代

Python util.get_uuid函数代码示例

Python util.get_type_by_name函数代码示例

Python util.grid_equal函数代码示例

Python util.get_worker_name函数代码示例

Python util.get_webmention_target函数代

Python util.get_uuid函数代码示例

Python util.get_type_by_name函数代码示例

Python util.get_stdout函数代码示例

关于我们

产品与服务

解决方案

139-2527-9053