云天徽上, posted on 2024-12-20 23:10:54

[Machine Learning Case Study] Predicting White Wine Quality with Random Forest (RF)

<section id="nice" data-tool="mdnice编辑器" data-website="https://www.mdnice.com" style="margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 10px; padding-right: 10px; background-attachment: scroll; background-clip: border-box; background-color: rgba(0, 0, 0, 0); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; font-family: Optima, 'Microsoft YaHei', PingFangSC-regular, serif; font-size: 16px; color: rgb(0, 0, 0); line-height: 1.5em; word-spacing: 0em; letter-spacing: 0em; word-break: break-word; overflow-wrap: break-word; text-align: left;"><blockquote class="custom-blockquote multiquote-1" data-tool="mdnice编辑器" style="margin-top: 20px; margin-bottom: 20px; margin-left: 0px; margin-right: 0px; padding-top: 10px; padding-bottom: 10px; padding-left: 20px; padding-right: 10px; border-top-style: none; border-bottom-style: none; border-left-style: solid; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; background-attachment: scroll; background-clip: border-box; background-color: rgba(0, 0, 0, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px; display: block; overflow-x: auto; overflow-y: auto;"><span style="display: none; color: rgb(0, 0, 0); font-size: 16px; line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: 
normal;"></span>
<p style="text-indent: 0em; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px; color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal; margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0px;">🧑 About the author: formerly <code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">Director of Algorithms</code> at a smart-city company, now a <code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">Senior Algorithm Engineer</code> at a logistics company serving the US market. Works in artificial intelligence, with expertise in Python data mining, visualization, and machine learning; has published AI-related patents and won multiple awards in AI competitions. A recognized quality creator in CSDN's AI category, offering AI-related technical consulting, project development, and customized solutions. If needed, send a private message on the site or use the VX contact card at the bottom of any article (ID: <code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">xf982831907</code>).</p>
</blockquote>
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">1. Introduction</span><span class="suffix" style="display: none;"></span></h2>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Quality assessment in the wine industry is a complex process that involves many chemical and sensory factors. Machine learning offers one way to predict wine quality from measurable properties. In this article we use <code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">RandomForestClassifier</code> from Python's scikit-learn library to predict the quality of white wine: we train a model on the chemical measurements in the dataset so that it can predict each wine's quality score.</p>
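<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">As a rough preview of the workflow described above, the sketch below trains a RandomForestClassifier on synthetic data. The random features, the labels 5 to 7, and the hyperparameters are assumptions made for illustration only; they are not the wine dataset or the settings used in this article.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the wine dataset): 300 samples, 11 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 11))

# Hypothetical labels in {5, 6, 7}, derived from the first feature so that
# the model has some signal to learn.
y = np.digitize(X[:, 0], bins=[-0.5, 0.5]) + 5

# Hold out 20% of the samples, keeping the class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Illustrative hyperparameters, not tuned values.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy on the synthetic hold-out set: {acc:.3f}")
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Note that a classifier treats each quality score as an unordered class, so the accuracy it reports counts a prediction of 5 for a true 6 the same as a prediction of 3.</p>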
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">2. About the Dataset</span><span class="suffix" style="display: none;"></span></h2>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The dataset contains eleven chemical measurements related to wine quality, plus the quality score itself, which is the prediction target. The columns are:</p>
<ul data-tool="mdnice编辑器" style="list-style-type: disc; margin-top: 8px; margin-bottom: 8px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 25px; padding-right: 0px; color: rgb(0, 0, 0);">
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">fixed acidity (non-volatile acids)</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">volatile acidity</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">citric acid</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">residual sugar</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">chlorides</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">free sulfur dioxide</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">total sulfur dioxide</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">density</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">pH (acidity/alkalinity)</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">sulphates</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">alcohol</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">quality (wine quality score, 0-10)</section></li></ul>
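<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The column list above can double as a sanity check once a file is loaded. The sketch below is one possible way to do that; the helper find_missing_columns is a hypothetical name introduced here, and it assumes the CSV header uses exactly the English names from the list.</p>

```python
# Column names taken from the list above; the CSV header is assumed to match them.
EXPECTED_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]


def find_missing_columns(columns):
    """Return the expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Works with any iterable of names, for example a pandas `DataFrame.columns` index.
example_header = ["fixed acidity", "pH", "alcohol", "quality"]
print(find_missing_columns(example_header))
```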
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">3. Environment Setup</span><span class="suffix" style="display: none;"></span></h2>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Before starting, make sure the following libraries are installed in your Python environment. The import statements in this article also use plotly and termcolor, so they are included here:</p>
<ul data-tool="mdnice编辑器" style="list-style-type: disc; margin-top: 8px; margin-bottom: 8px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 25px; padding-right: 0px; color: rgb(0, 0, 0);">
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">pandas</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">numpy</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">scikit-learn</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">matplotlib</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">seaborn</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">plotly</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">termcolor</section></li></ul>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">You can install them with the following command:</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;">pip&nbsp;install&nbsp;pandas&nbsp;numpy&nbsp;scikit-learn&nbsp;matplotlib&nbsp;seaborn&nbsp;plotly&nbsp;termcolor<br></code></pre>
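<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">To confirm the installation worked, one option is a quick check with the standard library, sketched below. The helper missing_modules is a hypothetical name; the module list mirrors the imports used in this article, which also include plotly and termcolor, and scikit-learn is looked up under its import name sklearn.</p>

```python
import importlib.util

# Import names for the packages this article imports.
# Note: the scikit-learn package is imported as "sklearn".
REQUIRED_MODULES = [
    "pandas", "numpy", "sklearn", "matplotlib", "seaborn", "plotly", "termcolor",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are available.")
```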
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4. Data Preprocessing</span><span class="suffix" style="display: none;"></span></h2>
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.1 Importing the Libraries and Loading the Data</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">First, import the libraries used in the analysis.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;warnings&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;warning&nbsp;handling</span><br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Third-party&nbsp;imports</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;pandas&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span>&nbsp;pd&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;data&nbsp;processing,&nbsp;CSV&nbsp;file&nbsp;I/O</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;numpy&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span>&nbsp;np&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;numerical&nbsp;operations&nbsp;and&nbsp;mathematical&nbsp;functions</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 
26px;">import</span>&nbsp;matplotlib.pyplot&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span>&nbsp;plt&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;data&nbsp;visualization</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;seaborn&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span>&nbsp;sns&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;statistical&nbsp;graphics</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;plotly.express&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span>&nbsp;px&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;interactive&nbsp;plotting</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span>&nbsp;sklearn.model_selection&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;train_test_split&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;data&nbsp;splitting&nbsp;for&nbsp;machine&nbsp;learning</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span>&nbsp;sklearn.preprocessing&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;MinMaxScaler,&nbsp;StandardScaler&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;feature&nbsp;standardization</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span>&nbsp;sklearn.metrics&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;accuracy_score&nbsp;<span class="hljs-comment" 
style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;model&nbsp;evaluation</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span>&nbsp;termcolor&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;colored&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;colored&nbsp;text&nbsp;printing</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span>&nbsp;sklearn.ensemble&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;RandomForestClassifier&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;random&nbsp;forest&nbsp;classifier&nbsp;model</span><br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;warning&nbsp;handling</span><br>warnings.filterwarnings(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'ignore'</span>)&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;For&nbsp;ignoring&nbsp;warnings</span><br></code></pre>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Next, load the dataset from the semicolon-separated CSV file.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;load&nbsp;data</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">try</span>:<br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Relative&nbsp;file&nbsp;path</span><br>&nbsp;&nbsp;&nbsp;&nbsp;filePath&nbsp;=&nbsp;<span class="hljs-string" style="color: #98c379; line-height: 26px;">"winequality-white.csv"</span><br>&nbsp;&nbsp;&nbsp;&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Read&nbsp;the&nbsp;CSV&nbsp;file&nbsp;and&nbsp;save&nbsp;it&nbsp;in&nbsp;"data"&nbsp;variable</span><br>&nbsp;&nbsp;&nbsp;&nbsp;data=&nbsp;pd.read_csv(filePath,sep=<span class="hljs-string" style="color: #98c379; line-height: 26px;">';'</span>)<br>&nbsp;&nbsp;&nbsp;&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Check&nbsp;loading&nbsp;data</span><br>&nbsp;&nbsp;&nbsp;&nbsp;print(colored(<span class="hljs-string" 
style="color: #98c379; line-height: 26px;">"THE&nbsp;DATASET&nbsp;LOADED&nbsp;SUCCESSFULLY..."</span>,&nbsp;<span class="hljs-string" style="color: #98c379; line-height: 26px;">"green"</span>,&nbsp;attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">except</span>&nbsp;FileNotFoundError:<br>&nbsp;&nbsp;&nbsp;&nbsp;print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"ERROR:&nbsp;File&nbsp;not&nbsp;found!"</span>,&nbsp;<span class="hljs-string" style="color: #98c379; line-height: 26px;">"red"</span>,&nbsp;attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">except</span>&nbsp;Exception&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span>&nbsp;e:<br>&nbsp;&nbsp;&nbsp;&nbsp;print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"ERROR:&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{e}</span>"</span>,&nbsp;<span class="hljs-string" style="color: #98c379; line-height: 26px;">"red"</span>,&nbsp;attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/e787a97f-0a72-4ee1-b392-d20ffa56eea6.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
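<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The loading code above prints an error and carries on, which leaves the data variable undefined if the file is missing. A possible alternative, sketched below, wraps the read in a function that raises instead. The name load_wine_csv and the in-memory sample with made-up values are assumptions for illustration; with the real file you would pass "winequality-white.csv".</p>

```python
import io

import pandas as pd


def load_wine_csv(source):
    """Read a semicolon-separated wine CSV and fail loudly if `quality` is missing."""
    frame = pd.read_csv(source, sep=";")
    if "quality" not in frame.columns:
        # A wrong separator typically yields one wide column, so this check catches it.
        raise ValueError("expected a 'quality' column; check the separator and the file")
    return frame


# Tiny in-memory sample with made-up values, used here instead of the real file.
sample = io.StringIO(
    '"fixed acidity";"alcohol";"quality"\n'
    "7.1;9.0;5\n"
    "6.5;10.2;6\n"
)
data_sample = load_wine_csv(sample)
print(data_sample.shape)
```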
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.2 Data Exploration</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Before any preprocessing, we should get a basic understanding of the data. Start by looking at the first few rows.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;View&nbsp;the&nbsp;first&nbsp;few&nbsp;rows&nbsp;of&nbsp;the&nbsp;dataset</span><br>dataset_rows&nbsp;=&nbsp;data.head(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">7</span>)&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;head()&nbsp;returns&nbsp;5&nbsp;rows&nbsp;by&nbsp;default</span><br><br>print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'As&nbsp;you&nbsp;can&nbsp;see,&nbsp;the&nbsp;first&nbsp;7&nbsp;rows&nbsp;in&nbsp;the&nbsp;dataset:\n'</span>,&nbsp;<span class="hljs-string" style="color: #98c379; line-height: 26px;">'green'</span>,&nbsp;attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Iterate&nbsp;over&nbsp;each&nbsp;row&nbsp;in&nbsp;the&nbsp;dataset_rows&nbsp;DataFrame</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">for</span>&nbsp;index,&nbsp;row&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">in</span>&nbsp;dataset_rows.iterrows():<br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;the&nbsp;1-based&nbsp;number&nbsp;of&nbsp;the&nbsp;current&nbsp;row</span><br>&nbsp;&nbsp;&nbsp;&nbsp;print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Row&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{index&nbsp;+&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>}</span>:"</span>,<span class="hljs-string" style="color: #98c379; line-height: 26px;">"white"</span>,attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br>&nbsp;&nbsp;&nbsp;&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;the&nbsp;content&nbsp;of&nbsp;the&nbsp;current&nbsp;row</span><br>&nbsp;&nbsp;&nbsp;&nbsp;print(row)<br>&nbsp;&nbsp;&nbsp;&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;a&nbsp;separator&nbsp;line</span><br>&nbsp;&nbsp;&nbsp;&nbsp;print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"--------------------------------------"</span>)<br><br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/1d396693-4f3b-4fa0-b376-464934bed379.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Next, look at the basic properties of the data: its shape, the number of features, and the total number of values.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;">print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"The&nbsp;shape&nbsp;="</span>,data.shape)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Show&nbsp;information&nbsp;about&nbsp;the&nbsp;dataset</span><br>num_rows,&nbsp;num_cols&nbsp;=&nbsp;data.shape<br>num_features&nbsp;=&nbsp;num_cols&nbsp;-&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span><br>num_data&nbsp;=&nbsp;num_rows&nbsp;*&nbsp;num_cols<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;the&nbsp;information</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Number&nbsp;of&nbsp;Rows:&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{num_rows}</span>"</span>)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Number&nbsp;of&nbsp;Columns:&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{num_cols}</span>"</span>)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 
26px;">f"Number&nbsp;of&nbsp;Features:&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{num_features}</span>"</span>)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Number&nbsp;of&nbsp;All&nbsp;Data:&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{num_data}</span>"</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Check&nbsp;and&nbsp;ensure&nbsp;running</span><br>print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"The&nbsp;task&nbsp;has&nbsp;been&nbsp;completed&nbsp;without&nbsp;any&nbsp;errors...."</span>,<span class="hljs-string" style="color: #98c379; line-height: 26px;">"green"</span>,&nbsp;attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/ad07ae5b-f6b5-45c6-94c8-bc51c9a6e923.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Show&nbsp;column&nbsp;dtypes&nbsp;and&nbsp;non-null&nbsp;counts;&nbsp;info()&nbsp;prints&nbsp;directly&nbsp;and&nbsp;returns&nbsp;None</span><br>data.info()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/a41cc931-3b78-4b8b-ad6c-295bdc0e265d.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">View the summary statistics of each column.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;">data.describe().T.round(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">2</span>)<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/3776bde6-43ef-4d60-9fde-4b9966cd606d.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">View how the label (the quality score) is distributed.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Create&nbsp;a&nbsp;count&nbsp;plot&nbsp;using&nbsp;seaborn</span><br>sns.catplot(data=data,&nbsp;x=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'quality'</span>,&nbsp;kind=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'count'</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Add&nbsp;labels&nbsp;and&nbsp;title&nbsp;to&nbsp;the&nbsp;plot</span><br>plt.title(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Distribution&nbsp;of&nbsp;Wine&nbsp;Quality'</span>)<br>plt.xlabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Quality'</span>)<br>plt.ylabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Count'</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Display&nbsp;the&nbsp;plot</span><br>plt.show()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/f501a692-eaf5-4a3f-99a9-93ced161dc89.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The EDA is kept brief here. For a detailed walkthrough, see the article <a href="https://blog.csdn.net/qq_38614074/article/details/144593316" style="color: rgb(30, 107, 184); font-weight: bold; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; text-decoration-line: none; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; overflow-wrap: break-word;">【数据可视化案列】白葡萄酒质量数据的EDA可视化分析</a> (an EDA and visualization analysis of the white wine quality data).</p>
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.3 Handling Abnormal Values</span><span class="suffix" style="display: none;"></span></h3>
<h4 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 18px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.3.1 Missing Value Handling</span><span class="suffix" style="display: none;"></span></h4>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Check&nbsp;for&nbsp;missing&nbsp;values</span><br>null_counts&nbsp;=&nbsp;data.isnull().sum()&nbsp;<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Display&nbsp;the&nbsp;number&nbsp;of&nbsp;null&nbsp;values</span><br>print(null_counts)<br><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"_________________________________________________________________"</span>)<br>print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Totally,&nbsp;there&nbsp;are&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{null_counts.sum()}</span>&nbsp;null&nbsp;values&nbsp;in&nbsp;the&nbsp;dataset."</span>,<span class="hljs-string" style="color: #98c379; line-height: 26px;">"green"</span>,&nbsp;attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/10eb62ec-4d61-497c-99d9-cba5e6d39f86.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The dataset turns out to contain no missing values (those wicked capitalist datasets never seem to have any nulls).</p>
<h4 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 18px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.3.2 Outlier Handling</span><span class="suffix" style="display: none;"></span></h4>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Set&nbsp;the&nbsp;figure&nbsp;size</span><br>plt.figure(figsize=(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">22</span>,&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">11</span>))<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Add&nbsp;outliers&nbsp;to&nbsp;the&nbsp;plot</span><br>sns.stripplot(data=data,&nbsp;color=<span class="hljs-string" style="color: #98c379; line-height: 26px;">"red"</span>,&nbsp;jitter=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">0.2</span>,&nbsp;size=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">5</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Set&nbsp;the&nbsp;axis&nbsp;labels&nbsp;and&nbsp;title</span><br>plt.title(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Outliers"</span>)<br>plt.xlabel(<span class="hljs-string" style="color: #98c379; line-height: 
26px;">"X-axis&nbsp;label"</span>)<br>plt.ylabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Y-axis&nbsp;label"</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Show&nbsp;the&nbsp;plot</span><br>plt.show()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/bc363e19-e395-4ee4-82b9-d5e35640124b.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Delete&nbsp;the&nbsp;outliers</span><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;The&nbsp;data&nbsp;before&nbsp;deleting&nbsp;outliers</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Before&nbsp;Removing&nbsp;the&nbsp;outliers"</span>,&nbsp;data.shape)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Deleting&nbsp;outliers&nbsp;(keep&nbsp;only&nbsp;the&nbsp;observations&nbsp;whose&nbsp;total&nbsp;sulfur&nbsp;dioxide&nbsp;is&nbsp;below&nbsp;160)</span><br>data&nbsp;=&nbsp;data[data[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'total&nbsp;sulfur&nbsp;dioxide'</span>]&nbsp;&lt;&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">160</span>]<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;The&nbsp;data&nbsp;after&nbsp;deleting&nbsp;outliers</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"After&nbsp;Removing&nbsp;the&nbsp;outliers"</span>,&nbsp;data.shape)<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/58c0c7e1-c407-4a2d-a9f2-3de54c5d6be6.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Set&nbsp;the&nbsp;figure&nbsp;size</span><br>plt.figure(figsize=(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">22</span>,&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">11</span>))<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Add&nbsp;outliers&nbsp;to&nbsp;the&nbsp;plot</span><br>sns.stripplot(data=data,&nbsp;color=<span class="hljs-string" style="color: #98c379; line-height: 26px;">"red"</span>,&nbsp;jitter=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">0.2</span>,&nbsp;size=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">5</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Set&nbsp;the&nbsp;axis&nbsp;labels&nbsp;and&nbsp;title</span><br>plt.title(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Outliers"</span>)<br>plt.xlabel(<span class="hljs-string" style="color: #98c379; line-height: 
26px;">"X-axis&nbsp;label"</span>)<br>plt.ylabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Y-axis&nbsp;label"</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Show&nbsp;the&nbsp;plot</span><br>plt.show()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/d06a4f4f-d7e8-4d63-9281-69c8dcb6bcbf.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
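<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The cutoff of 160 above is a fixed value read off the plot. As a point of comparison only, and not the method used in this article, a data-driven cutoff can be derived with Tukey's IQR rule. The sketch below uses plain Python and made-up readings; the helper name <code>iqr_fences</code> and the multiplier 1.5 are assumptions for illustration.</p>

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up readings for illustration, not taken from the wine dataset
readings = [97, 110, 118, 125, 132, 134, 140, 151, 163, 366]

lower, upper = iqr_fences(readings)

# Keep only the values that fall inside the fences
kept = [v for v in readings if lower <= v <= upper]

print(f"fences: ({lower:.1f}, {upper:.1f})")
print("kept:", kept)
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">With these sample values only the extreme reading is dropped. Whether a fixed cutoff or an IQR-based one suits the real data better is a judgment call that depends on how many rows each one removes.</p>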
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5. Model Training</span><span class="suffix" style="display: none;"></span></h2>
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.1 Binarizing the Labels (0/1)</span><span class="suffix" style="display: none;"></span></h3>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Split&nbsp;the&nbsp;data&nbsp;into&nbsp;features&nbsp;(X)&nbsp;and&nbsp;target&nbsp;variable&nbsp;(Y)</span><br>X&nbsp;=&nbsp;data.drop(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'quality'</span>,axis=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Create&nbsp;a&nbsp;new&nbsp;series&nbsp;'Y'&nbsp;by&nbsp;applying&nbsp;a&nbsp;lambda&nbsp;function&nbsp;to&nbsp;the&nbsp;'quality'&nbsp;column&nbsp;of&nbsp;the&nbsp;'data'&nbsp;DataFrame</span><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;The&nbsp;lambda&nbsp;function&nbsp;assigns&nbsp;a&nbsp;value&nbsp;of&nbsp;1&nbsp;if&nbsp;the&nbsp;'quality'&nbsp;value&nbsp;is&nbsp;greater&nbsp;than&nbsp;or&nbsp;equal&nbsp;to&nbsp;5,&nbsp;otherwise&nbsp;assigns&nbsp;0</span><br>Y&nbsp;=&nbsp;data[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'quality'</span>].apply(<span class="hljs-keyword" 
style="color: #c678dd; line-height: 26px;">lambda</span>&nbsp;y_value:&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">if</span>&nbsp;y_value&nbsp;&gt;=&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">5</span>&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">else</span>&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">0</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;the&nbsp;shapes&nbsp;of&nbsp;X&nbsp;and&nbsp;Y&nbsp;to&nbsp;verify&nbsp;the&nbsp;splitting</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape&nbsp;of&nbsp;X:"</span>,&nbsp;X.shape)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape&nbsp;of&nbsp;Y:"</span>,&nbsp;Y.shape)<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/8ad2d7e1-9ee5-4b56-96a5-23b37a5c9ec2.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
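<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The lambda above can also be written as a vectorized comparison. It is also worth checking how the two classes are distributed afterwards, because accuracy is hard to interpret when one class dominates. The following is a minimal sketch only: the quality series is a small hypothetical stand-in for the real quality column, not data from the wine dataset.</p>

```python
import pandas as pd

# Hypothetical stand-in for the 'quality' column of the wine DataFrame
quality = pd.Series([3, 4, 5, 5, 6, 6, 6, 7, 8, 9], name="quality")

# Vectorized equivalent of the lambda: 1 if quality >= 5, otherwise 0
labels = (quality >= 5).astype(int)

# Share of each class; a heavily skewed split makes plain accuracy misleading
class_share = labels.value_counts(normalize=True).sort_index()
print(class_share)
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">If the positive class turns out to cover most rows, a model that always predicts 1 would already score a high accuracy, so precision and recall per class become the more informative numbers.</p>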
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.2 Normalizing the Data</span><span class="suffix" style="display: none;"></span></h3>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Rescale&nbsp;the&nbsp;features&nbsp;(two&nbsp;alternatives;&nbsp;only&nbsp;Min-Max&nbsp;scaling&nbsp;is&nbsp;active)</span><br><span class="hljs-string" style="color: #98c379; line-height: 26px;">'''<br>#&nbsp;Standardization&nbsp;(zero&nbsp;mean,&nbsp;unit&nbsp;variance);&nbsp;disabled&nbsp;alternative<br>standard_scaler&nbsp;=&nbsp;StandardScaler()<br>X&nbsp;=&nbsp;standard_scaler.fit_transform(X)<br>'''</span><br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Min-Max&nbsp;scaling&nbsp;(rescales&nbsp;each&nbsp;feature&nbsp;to&nbsp;[0,&nbsp;1])</span><br>min_max_scaler&nbsp;=&nbsp;MinMaxScaler()<br>X&nbsp;=&nbsp;min_max_scaler.fit_transform(X)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;One&nbsp;of&nbsp;the&nbsp;two&nbsp;scalers&nbsp;will&nbsp;be&nbsp;chosen&nbsp;later,&nbsp;in&nbsp;the&nbsp;"model&nbsp;selection"&nbsp;part,&nbsp;based&nbsp;on&nbsp;the&nbsp;highest&nbsp;accuracy</span><br></code></pre>
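<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Fitting the scaler on the full X, as above, lets the minimum and maximum of rows that later end up in the test set influence the training features. A common alternative is to split first and fit the scaler on the training portion only. The sketch below illustrates this on synthetic data; X_demo, y_demo and the other variable names are illustrative assumptions, not part of the original notebook.</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 100 rows, 4 numeric features, binary labels
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=10.0, scale=3.0, size=(100, 4))
y_demo = rng.integers(0, 2, size=100)

# Split first, so the test rows stay unseen during preprocessing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=44
)

# Learn min/max from the training rows only, then reuse them for the test rows
scaler = MinMaxScaler()
X_tr_scaled = scaler.fit_transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print("Train range per feature:", X_tr_scaled.min(axis=0), X_tr_scaled.max(axis=0))
print("Test shape:", X_te_scaled.shape)
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Scaled test values may fall slightly outside [0, 1] with this approach, which is expected. Tree-based models such as random forests split on thresholds and are largely unaffected by this kind of monotonic rescaling, so the step matters mostly when other model types are compared on the same features.</p>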
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.3 Splitting the Dataset</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">We split the dataset into a training set (80%) and a test set (20%).</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;">X_train,&nbsp;X_test,&nbsp;Y_train,&nbsp;Y_test&nbsp;=&nbsp;train_test_split(X,&nbsp;Y,&nbsp;test_size=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">0.2</span>,&nbsp;random_state=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">44</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;the&nbsp;shapes&nbsp;of&nbsp;the&nbsp;training&nbsp;and&nbsp;testing&nbsp;sets&nbsp;to&nbsp;verify&nbsp;the&nbsp;splitting</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape&nbsp;of&nbsp;X_train:"</span>,&nbsp;X_train.shape)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape&nbsp;of&nbsp;X_test:"</span>,&nbsp;X_test.shape)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape&nbsp;of&nbsp;Y_train:"</span>,&nbsp;Y_train.shape)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape&nbsp;of&nbsp;Y_test:"</span>,&nbsp;Y_test.shape)<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/5e610a57-19c1-4209-8ca9-3fd226cc5b4c.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
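<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">When the two classes are unevenly sized, a purely random split can leave the test set with very few rows of the smaller class. The stratify argument of train_test_split keeps the class proportions the same in both parts. A small sketch on made-up labels (the arrays below are assumptions for illustration only):</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced data (assumption): 10 rows of class 0, 90 rows of class 1
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 10 + [1] * 90)

# stratify=y_demo preserves the 10% / 90% class ratio in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=44, stratify=y_demo
)

print("Class-0 rows in train:", int((y_tr == 0).sum()), "of", len(y_tr))
print("Class-0 rows in test: ", int((y_te == 0).sum()), "of", len(y_te))
```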
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.4 Training the Model</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">We train the model with <code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">RandomForestClassifier</code>, looping over several values of max_depth, random_state, and n_estimators and recording the training and test accuracy for each combination.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Initialize&nbsp;lists&nbsp;to&nbsp;store&nbsp;training&nbsp;and&nbsp;testing&nbsp;accuracies</span><br>scoreListRF_Train&nbsp;=&nbsp;[]<br>scoreListRF_Test&nbsp;=&nbsp;[]<br><br><span class="hljs-string" style="color: #98c379; line-height: 26px;">'''<br>max_dep&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;----------&gt;&nbsp;(1,&nbsp;5),(1,&nbsp;10)&nbsp;<br>rand_state&nbsp;&nbsp;&nbsp;----------&gt;&nbsp;(1,&nbsp;35),(1,&nbsp;50)<br>n_est&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;----------&gt;&nbsp;(1,&nbsp;30),(1,&nbsp;30)<br>'''</span><br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Iterate&nbsp;over&nbsp;different&nbsp;values&nbsp;of&nbsp;max_depth</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">for</span>&nbsp;max_dep&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">in</span>&nbsp;range(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>,&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 
26px;">5</span>):<br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Iterate&nbsp;over&nbsp;different&nbsp;values&nbsp;of&nbsp;random_state</span><br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">for</span>&nbsp;rand_state&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">in</span>&nbsp;range(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>,&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">20</span>):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Iterate&nbsp;over&nbsp;different&nbsp;values&nbsp;of&nbsp;n_estimators</span><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">for</span>&nbsp;n_est&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">in</span>&nbsp;range(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>,&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">15</span>):<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Create&nbsp;a&nbsp;Random&nbsp;Forest&nbsp;model&nbsp;with&nbsp;the&nbsp;different&nbsp;values&nbsp;of&nbsp;max_depth,&nbsp;random_state,&nbsp;and&nbsp;n_estimators</span><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Model&nbsp;=&nbsp;RandomForestClassifier(n_estimators=n_est,&nbsp;random_state=rand_state,&nbsp;max_depth=max_dep)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span 
class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Fit&nbsp;the&nbsp;model&nbsp;on&nbsp;the&nbsp;training&nbsp;data</span><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Model.fit(X_train,&nbsp;Y_train)<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Calculate&nbsp;and&nbsp;store&nbsp;the&nbsp;training&nbsp;accuracy</span><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;scoreListRF_Train.append(Model.score(X_train,&nbsp;Y_train))<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Calculate&nbsp;and&nbsp;store&nbsp;the&nbsp;testing&nbsp;accuracy</span><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;scoreListRF_Test.append(Model.score(X_test,&nbsp;Y_test))<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Find&nbsp;the&nbsp;maximum&nbsp;accuracy&nbsp;for&nbsp;both&nbsp;training&nbsp;and&nbsp;testing</span><br>RF_Accuracy_Train&nbsp;=&nbsp;max(scoreListRF_Train)&nbsp;<br>RF_Accuracy_Test&nbsp;=&nbsp;max(scoreListRF_Test)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;the&nbsp;best&nbsp;accuracies&nbsp;achieved</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Random&nbsp;Forest&nbsp;best&nbsp;accuracy&nbsp;(Training):&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{RF_Accuracy_Train*<span class="hljs-number" style="color: #d19a66; line-height: 26px;">100</span>:<span 
class="hljs-number" style="color: #d19a66; line-height: 26px;">.2</span>f}</span>%"</span>)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Random&nbsp;Forest&nbsp;best&nbsp;accuracy&nbsp;(Testing):&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{RF_Accuracy_Test*<span class="hljs-number" style="color: #d19a66; line-height: 26px;">100</span>:<span class="hljs-number" style="color: #d19a66; line-height: 26px;">.2</span>f}</span>%"</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;a&nbsp;success&nbsp;message&nbsp;indicating&nbsp;that&nbsp;the&nbsp;model&nbsp;has&nbsp;been&nbsp;trained&nbsp;successfully</span><br>print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"The&nbsp;Random&nbsp;Forest&nbsp;model&nbsp;has&nbsp;been&nbsp;trained&nbsp;successfully"</span>,<span class="hljs-string" style="color: #98c379; line-height: 26px;">"green"</span>,&nbsp;attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/04e2145b-8f2f-41a8-bf40-ab07b95a72e3.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
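<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Two details of the loop above are worth keeping in mind. After it finishes, Model refers to the last combination that was fitted, not necessarily the one with the best score, and picking the maximum test accuracy means the test set is also being used for tuning. One possible alternative is to let GridSearchCV choose the hyperparameters by cross-validation on the training data and keep the best estimator. The sketch below uses synthetic data from make_classification and a deliberately tiny grid; both are assumptions for illustration, not the author's setup.</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the wine features and binary labels (assumption)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=44, stratify=y_demo
)

# Small illustrative grid; the ranges in the article could be substituted here
param_grid = {"max_depth": [2, 4], "n_estimators": [10, 20]}

# 3-fold cross-validation on the training data selects the hyperparameters
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=3, scoring="accuracy"
)
search.fit(X_tr, y_tr)

# best_estimator_ is refit on the whole training set with the best parameters
best_model = search.best_estimator_
print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", round(search.best_score_, 3))
print("Held-out test accuracy:", round(best_model.score(X_te, y_te), 3))
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">With this arrangement the test set is touched only once, at the very end, so its accuracy is a fairer estimate of how the chosen model behaves on unseen wines.</p>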
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.5 Model Evaluation</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Next, we evaluate the model on the test set using accuracy, a classification report, and a confusion matrix.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span>&nbsp;sklearn.metrics&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;accuracy_score,&nbsp;classification_report,&nbsp;confusion_matrix<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Predict&nbsp;on&nbsp;the&nbsp;test&nbsp;set&nbsp;(Model&nbsp;is&nbsp;the&nbsp;last&nbsp;estimator&nbsp;fitted&nbsp;in&nbsp;the&nbsp;training&nbsp;loop)</span><br>y_pred&nbsp;=&nbsp;Model.predict(X_test)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Compute&nbsp;the&nbsp;accuracy</span><br>accuracy&nbsp;=&nbsp;accuracy_score(Y_test,&nbsp;y_pred)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f'Accuracy:&nbsp;<span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{accuracy:<span class="hljs-number" style="color: #d19a66; line-height: 26px;">.2</span>f}</span>'</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;the&nbsp;classification&nbsp;report</span><br>print(classification_report(Y_test,&nbsp;y_pred))<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Print&nbsp;the&nbsp;confusion&nbsp;matrix</span><br>print(confusion_matrix(Y_test,&nbsp;y_pred))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/164e822b-3a22-44e4-b859-e2c03cc3cc9b.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
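<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">To make the numbers in the report easier to read, the following plain-Python sketch shows how accuracy, precision, and recall follow from the four cells of a binary confusion matrix. The helper binary_confusion and the label lists are hypothetical, written only for this illustration; the cell order (tn, fp, fn, tp) matches what confusion_matrix(...).ravel() returns for binary labels in scikit-learn.</p>

```python
def binary_confusion(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # positive correctly predicted
        elif truth == 1 and pred == 0:
            fn += 1  # positive missed
        elif truth == 0 and pred == 1:
            fp += 1  # negative wrongly flagged as positive
        else:
            tn += 1  # negative correctly predicted
    return tn, fp, fn, tp


# Made-up labels for illustration (assumption, not model output)
y_true_demo = [1, 1, 1, 1, 0, 0, 1, 1]
y_pred_demo = [1, 1, 1, 0, 0, 1, 1, 1]

tn, fp, fn, tp = binary_confusion(y_true_demo, y_pred_demo)

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # of predicted positives, how many are right
recall = tp / (tp + fn)                     # of actual positives, how many are found

print(f"tn={tn} fp={fp} fn={fn} tp={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```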
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.6 Feature Importance</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"><code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">RandomForestClassifier</code> offers a convenient feature: it lets you inspect the importance of each input feature.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;matplotlib.pyplot&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span>&nbsp;plt<br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span>&nbsp;seaborn&nbsp;<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span>&nbsp;sns<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Get&nbsp;the&nbsp;feature&nbsp;importances</span><br>feature_importances&nbsp;=&nbsp;Model.feature_importances_<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Build&nbsp;a&nbsp;DataFrame&nbsp;of&nbsp;features&nbsp;and&nbsp;their&nbsp;importances&nbsp;(assumes&nbsp;'quality'&nbsp;is&nbsp;the&nbsp;last&nbsp;column&nbsp;of&nbsp;data)</span><br>feature_importance_df&nbsp;=&nbsp;pd.DataFrame({<br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Feature'</span>:&nbsp;data.columns.tolist()[:<span class="hljs-number" style="color: #d19a66; line-height: 26px;">-1</span>],<br>&nbsp;&nbsp;&nbsp;&nbsp;<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Importance'</span>:&nbsp;feature_importances<br>})<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Sort&nbsp;by&nbsp;importance,&nbsp;highest&nbsp;first</span><br>feature_importance_df&nbsp;=&nbsp;feature_importance_df.sort_values(by=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Importance'</span>,&nbsp;ascending=<span class="hljs-literal" style="color: #56b6c2; line-height: 26px;">False</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;">#&nbsp;Plot&nbsp;the&nbsp;feature&nbsp;importances</span><br>plt.figure(figsize=(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">10</span>,&nbsp;<span class="hljs-number" style="color: #d19a66; line-height: 26px;">8</span>))<br>sns.barplot(x=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Importance'</span>,&nbsp;y=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Feature'</span>,&nbsp;data=feature_importance_df)<br>plt.title(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Feature&nbsp;Importance'</span>)<br>plt.show()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/eccc8690-9957-4474-977a-56dff83178d4.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
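<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Because scaling turns X into a NumPy array, the column names have to be carried along separately. One way to do that, sketched here on synthetic data, is to save the names before fitting and attach them to feature_importances_ as a pandas Series index. The column names signal, noise_a, and noise_b are invented for this example; only signal determines the label, so it should come out on top.</p>

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features (assumption): the label depends on 'signal' only
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "signal": rng.normal(size=200),
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})
y_demo = (X_demo["signal"] > 0).astype(int)

# Save the names first; the model is fitted on a bare array, as after scaling
feature_names = X_demo.columns.tolist()
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_demo.to_numpy(), y_demo)

# Pair each importance with its name and sort from most to least important
importance = pd.Series(model.feature_importances_, index=feature_names)
importance = importance.sort_values(ascending=False)
print(importance)
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The importances reported by a fitted forest are normalized to sum to 1, so each value can be read as a share of the total impurity reduction rather than as an absolute effect size.</p>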
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">6. Conclusion</span><span class="suffix" style="display: none;"></span></h2>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Using <code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">RandomForestClassifier</code>, we were able to predict the quality of white wine. Along the way we carried out data preprocessing, feature selection, and model training and evaluation, and we analyzed the importance of each feature. This is only a simple example; real-world applications may call for more sophisticated data preprocessing and model tuning.</p>
</section>