[Machine Learning Case Study] Predicting White Wine Quality with Random Forest (RF)
<section id="nice" data-tool="mdnice编辑器" data-website="https://www.mdnice.com" style="margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 10px; padding-right: 10px; background-attachment: scroll; background-clip: border-box; background-color: rgba(0, 0, 0, 0); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; font-family: Optima, 'Microsoft YaHei', PingFangSC-regular, serif; font-size: 16px; color: rgb(0, 0, 0); line-height: 1.5em; word-spacing: 0em; letter-spacing: 0em; word-break: break-word; overflow-wrap: break-word; text-align: left;">
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">1. Introduction</span><span class="suffix" style="display: none;"></span></h2>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">In the wine industry, quality assessment is a complex process that depends on many chemical and sensory factors. Machine learning offers a way to predict wine quality from measurable properties. In this article, we use <code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">RandomForestClassifier</code> from Python's scikit-learn library to predict the quality of white wine. We train a model on the chemical measurements in the dataset so that it can predict each wine's quality score.</p>
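<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">To make the overall workflow concrete, here is a minimal sketch of fitting and scoring a random forest classifier. It runs on synthetic data, not the wine dataset, and the labeling rule, the 80/20 split, and the hyperparameters are all assumptions chosen only for illustration.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the wine dataset): 300 samples, 11 numeric features.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 11))

# Hypothetical labeling rule so the classes are learnable:
# three classes derived from thresholds on the first feature.
y = np.digitize(X[:, 0], bins=[-0.5, 0.5])

# Hold out 20% of the samples for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the forest on the training split only.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Score the predictions on the held-out split.
y_pred = model.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy on synthetic hold-out data: {acc:.3f}")
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The same fit, predict, and score pattern would apply to the wine data once the features and the quality column have been separated.</p>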
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">2. Dataset Overview</span><span class="suffix" style="display: none;"></span></h2>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Our dataset contains several chemical properties related to wine quality, plus the quality score itself. The columns are:</p>
<ul data-tool="mdnice编辑器" style="list-style-type: disc; margin-top: 8px; margin-bottom: 8px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 25px; padding-right: 0px; color: rgb(0, 0, 0);">
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">fixed acidity (non-volatile acids)</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">volatile acidity</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">citric acid</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">residual sugar</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">chlorides</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">free sulfur dioxide</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">total sulfur dioxide</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">density</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">pH</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">sulphates</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">alcohol</section></li>
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">quality (wine quality score, 0-10)</section></li></ul>
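<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The column list above can be written down as a small schema that separates the eleven input features from the target. This is a sketch only: the helper <code>split_features_target</code> is a hypothetical name, the column names are assumed to match the CSV header exactly, and the two demo rows are illustrative placeholder values rather than real measurements.</p>

```python
import pandas as pd

# Column names as listed above (assumed to match the CSV header exactly).
FEATURE_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide",
    "density", "pH", "sulphates", "alcohol",
]
TARGET_COLUMN = "quality"


def split_features_target(df):
    """Return (X, y): the 11 feature columns and the quality score."""
    expected = FEATURE_COLUMNS + [TARGET_COLUMN]
    missing = [c for c in expected if c not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return df[FEATURE_COLUMNS], df[TARGET_COLUMN]


# Two placeholder rows for illustration only (not real measurements).
demo = pd.DataFrame(
    [
        [7.1, 0.25, 0.30, 5.0, 0.040, 30.0, 120.0, 0.9950, 3.10, 0.50, 10.0, 6],
        [6.5, 0.31, 0.28, 2.0, 0.050, 20.0, 110.0, 0.9930, 3.25, 0.45, 11.0, 5],
    ],
    columns=FEATURE_COLUMNS + [TARGET_COLUMN],
)

X, y = split_features_target(demo)
print(X.shape)     # (2, 11)
print(y.tolist())  # [6, 5]
```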
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">3. Environment Setup</span><span class="suffix" style="display: none;"></span></h2>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Before starting, make sure the following libraries are installed in your Python environment:</p>
<ul data-tool="mdnice编辑器" style="list-style-type: disc; margin-top: 8px; margin-bottom: 8px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 25px; padding-right: 0px; color: rgb(0, 0, 0);">
<li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">pandas</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">numpy</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">scikit-learn</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">matplotlib</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">seaborn</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">plotly</section></li><li><section style="margin-top: 5px; margin-bottom: 5px; color: rgb(1, 1, 1); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; font-weight: normal;">termcolor</section></li></ul>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">You can install them with the following command:</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;">pip install pandas numpy scikit-learn matplotlib seaborn plotly termcolor<br></code></pre>
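<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">After installing, it can help to confirm which packages are actually available. The sketch below uses the standard-library <code>importlib.metadata</code> module. The helper name <code>installed_versions</code> is hypothetical, and the list also includes <code>plotly</code> and <code>termcolor</code> because the import block in section 4.1 uses them.</p>

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (scikit-learn is imported as `sklearn`).
REQUIRED = [
    "pandas", "numpy", "scikit-learn", "matplotlib",
    "seaborn", "plotly", "termcolor",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED)
for name, ver in versions.items():
    print(f"{name:<14}{ver if ver is not None else 'NOT INSTALLED'}")
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Any package reported as not installed would cause an <code>ImportError</code> in the import block that follows.</p>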
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4. Data Preprocessing</span><span class="suffix" style="display: none;"></span></h2>
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.1 Importing Libraries and Loading the Data</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">First, we import the libraries used in this analysis, which we need before loading the dataset.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> warnings <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For warning handling</span><br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Third-party imports</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> pandas <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span> pd <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For data processing, CSV file I/O</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> numpy <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span> np <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For numerical operations and mathematical functions</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> matplotlib.pyplot <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span> plt <span 
class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For data visualization</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> seaborn <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span> sns <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For statistical graphics</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> plotly.express <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span> px <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For interactive plotting</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span> sklearn.model_selection <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> train_test_split <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For data splitting for machine learning</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span> sklearn.preprocessing <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> MinMaxScaler, StandardScaler <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For feature standardization</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span> sklearn.metrics <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> accuracy_score <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For model evaluation</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span> termcolor <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> colored <span class="hljs-comment" style="color: #5c6370; font-style: 
italic; line-height: 26px;"># For colored text printing</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span> sklearn.ensemble <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> RandomForestClassifier <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For random forest classifier model</span><br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For warning handling</span><br>warnings.filterwarnings(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'ignore'</span>) <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># For ignoring warnings</span><br></code></pre>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Next, load the dataset.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># load data</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">try</span>:<br> <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Relative file path</span><br> filePath = <span class="hljs-string" style="color: #98c379; line-height: 26px;">"winequality-white.csv"</span><br> <br> <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Read the CSV file and save it in "data" variable</span><br> data= pd.read_csv(filePath,sep=<span class="hljs-string" style="color: #98c379; line-height: 26px;">';'</span>)<br> <br> <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Check loading data</span><br> print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"THE DATASET LOADED SUCCESSFULLY..."</span>, <span class="hljs-string" style="color: #98c379; line-height: 26px;">"green"</span>, attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br><br><span 
class="hljs-keyword" style="color: #c678dd; line-height: 26px;">except</span> FileNotFoundError:<br> print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"ERROR: File not found!"</span>, <span class="hljs-string" style="color: #98c379; line-height: 26px;">"red"</span>, attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">except</span> Exception <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span> e:<br> print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"ERROR: <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{e}</span>"</span>, <span class="hljs-string" style="color: #98c379; line-height: 26px;">"red"</span>, attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/e787a97f-0a72-4ee1-b392-d20ffa56eea6.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
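<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The loading code passes <code>sep=';'</code> because the file is semicolon-delimited. The following sketch shows what that argument changes, using a tiny in-memory sample instead of the real file. The sample uses only four of the columns, and its values are illustrative placeholders.</p>

```python
import io

import pandas as pd

# A tiny semicolon-delimited sample in the same layout as the wine file
# (placeholder values, not real data).
sample = io.StringIO(
    "fixed acidity;pH;alcohol;quality\n"
    "7.1;3.05;9.0;6\n"
    "6.4;3.28;10.2;5\n"
)

# With the default comma separator, each line is read as a single field.
wrong = pd.read_csv(sample)

# Rewind the buffer, then read again with the correct delimiter.
sample.seek(0)
right = pd.read_csv(sample, sep=";")

print(wrong.shape)           # (2, 1)
print(right.shape)           # (2, 4)
print(list(right.columns))   # ['fixed acidity', 'pH', 'alcohol', 'quality']
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">A single-column result after loading is therefore a quick hint that the separator is wrong.</p>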
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.2 Data Exploration</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Before doing any preprocessing, we should get a basic feel for the data. We start by looking at the first few rows.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># View the first few rows of the dataset</span><br>dataset_rows = data.head(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">7</span>) <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># head() returns 5 rows by default</span><br><br>print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'As you can see, the first 7 rows in the dataset:\n'</span>, <span class="hljs-string" style="color: #98c379; line-height: 26px;">'green'</span>, attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Iterate over each row in the dataset_rows DataFrame</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">for</span> index, row <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">in</span> dataset_rows.iterrows():<br> <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print the index label of the current 
row</span><br> print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Row <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{index + <span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>}</span>:"</span>,<span class="hljs-string" style="color: #98c379; line-height: 26px;">"white"</span>,attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br> <br> <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print the content of the current row</span><br> print(row)<br> <br> <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print a separator line</span><br> print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"--------------------------------------"</span>)<br><br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/1d396693-4f3b-4fa0-b376-464934bed379.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Next, check the basic facts about the data: its shape, the number of features, and the total number of values.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;">print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"The shape ="</span>,data.shape)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Show information about the dataset</span><br>num_rows, num_cols = data.shape<br>num_features = num_cols - <span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span><br>num_data = num_rows * num_cols<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print the information</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Number of Rows: <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{num_rows}</span>"</span>)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Number of Columns: <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{num_cols}</span>"</span>)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Number of Features: <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{num_features}</span>"</span>)<br>print(<span class="hljs-string" 
style="color: #98c379; line-height: 26px;">f"Number of All Data: <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{num_data}</span>"</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Check and ensure running</span><br>print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"The task has been completed without any errors...."</span>,<span class="hljs-string" style="color: #98c379; line-height: 26px;">"green"</span>, attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/ad07ae5b-f6b5-45c6-94c8-bc51c9a6e923.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Show a summary of the dataset (info() prints directly and returns None, so no print() is needed)</span><br>data.info()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/a41cc931-3b78-4b8b-ad6c-295bdc0e265d.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"> Look at the summary statistics of the data.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;">data.describe().T.round(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">2</span>)<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/3776bde6-43ef-4d60-9fde-4b9966cd606d.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"> Look at how the labels are distributed.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Create a count plot using seaborn</span><br>sns.catplot(data=data, x=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'quality'</span>, kind=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'count'</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Add labels and title to the plot</span><br>plt.title(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Distribution of Wine Quality'</span>)<br>plt.xlabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Quality'</span>)<br>plt.ylabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Count'</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Display the plot</span><br>plt.show()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/f501a692-eaf5-4a3f-99a9-93ced161dc89.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"> The EDA portion is not covered in detail here; for the full walkthrough, see <a href="https://blog.csdn.net/qq_38614074/article/details/144593316" style="color: rgb(30, 107, 184); font-weight: bold; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; text-decoration-line: none; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; overflow-wrap: break-word;">[Data Visualization Case Study] EDA Visualization Analysis of White Wine Quality Data</a>.</p>
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.3 Handling Anomalous Values</span><span class="suffix" style="display: none;"></span></h3>
<h4 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 18px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.3.1 Handling Missing Values</span><span class="suffix" style="display: none;"></span></h4>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Check for missing values</span><br>null_counts = data.isnull().sum() <br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Display the number of null values</span><br>print(null_counts)<br><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"_________________________________________________________________"</span>)<br>print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Totally, there are <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{null_counts.sum()}</span> null values in the dataset."</span>,<span class="hljs-string" style="color: #98c379; line-height: 26px;">"green"</span>, attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/10eb62ec-4d61-497c-99d9-cba5e6d39f86.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"> The dataset contains no missing values (datasets from the evil capitalists never seem to have any nulls).</p>
<h4 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 18px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">4.3.2 Handling Outliers</span><span class="suffix" style="display: none;"></span></h4>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Set the figure size</span><br>plt.figure(figsize=(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">22</span>, <span class="hljs-number" style="color: #d19a66; line-height: 26px;">11</span>))<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Add outliers to the plot</span><br>sns.stripplot(data=data, color=<span class="hljs-string" style="color: #98c379; line-height: 26px;">"red"</span>, jitter=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">0.2</span>, size=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">5</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Set the axis labels and title</span><br>plt.title(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Outliers"</span>)<br>plt.xlabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"X-axis label"</span>)<br>plt.ylabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Y-axis 
label"</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Show the plot</span><br>plt.show()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/bc363e19-e395-4ee4-82b9-d5e35640124b.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Delete the outliers</span><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Shape of the data before deleting outliers</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Before Removing the outliers"</span>, data.shape)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Delete outliers: keep only the observations whose total sulfur dioxide is below 160</span><br>data = data[data[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'total sulfur dioxide'</span>] &lt; <span class="hljs-number" style="color: #d19a66; line-height: 26px;">160</span>]<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Shape of the data after deleting outliers</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"After Removing the outliers"</span>, data.shape)<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/58c0c7e1-c407-4a2d-a9f2-3de54c5d6be6.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Set the figure size</span><br>plt.figure(figsize=(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">22</span>, <span class="hljs-number" style="color: #d19a66; line-height: 26px;">11</span>))<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Add outliers to the plot</span><br>sns.stripplot(data=data, color=<span class="hljs-string" style="color: #98c379; line-height: 26px;">"red"</span>, jitter=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">0.2</span>, size=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">5</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Set the axis labels and title</span><br>plt.title(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Outliers"</span>)<br>plt.xlabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"X-axis label"</span>)<br>plt.ylabel(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Y-axis 
label"</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Show the plot</span><br>plt.show()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/d06a4f4f-d7e8-4d63-9281-69c8dcb6bcbf.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
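<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"> The filter above uses a fixed cutoff of 160 read off the strip plot. As a point of comparison only, the sketch below applies the common 1.5 × IQR rule instead; this is a different technique from the one used in this article. The helper name iqr_outlier_mask, the multiplier k, and the toy values are all assumptions made for illustration and are not taken from the wine dataset.</p>

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)


# Toy stand-in for the 'total sulfur dioxide' column (made-up values)
toy = pd.DataFrame(
    {"total sulfur dioxide": [95, 110, 120, 130, 135, 140, 150, 160, 170, 440]}
)

mask = iqr_outlier_mask(toy["total sulfur dioxide"])
cleaned = toy[~mask]

print("Flagged as outliers:", toy.loc[mask, "total sulfur dioxide"].tolist())
print("Shape before:", toy.shape, "after:", cleaned.shape)
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"> A data-driven fence like this adapts to each column's spread, whereas a fixed cutoff is simpler and easier to explain. Which one suits the wine data better is a judgment call.</p>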
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5. Model Training</span><span class="suffix" style="display: none;"></span></h2>
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.1 Binarizing the Labels (0/1)</span><span class="suffix" style="display: none;"></span></h3>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Split the data into features (X) and target variable (Y)</span><br>X = data.drop(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'quality'</span>,axis=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Create a new series 'Y' by applying a lambda function to the 'quality' column of the 'data' DataFrame</span><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># The lambda function assigns a value of 1 if the 'quality' value is greater than or equal to 5, otherwise assigns 0</span><br>Y = data[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'quality'</span>].apply(<span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">lambda</span> y_value: <span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span> <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">if</span> y_value >= <span class="hljs-number" style="color: 
#d19a66; line-height: 26px;">5</span> <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">else</span> <span class="hljs-number" style="color: #d19a66; line-height: 26px;">0</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print the shapes of X and Y to verify the splitting</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape of X:"</span>, X.shape)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape of Y:"</span>, Y.shape)<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/8ad2d7e1-9ee5-4b56-96a5-23b37a5c9ec2.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
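<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"> The threshold used for binarization decides how balanced the two classes end up, so it can be worth checking the class shares right after this step. The sketch below is a minimal illustration using a vectorized comparison in place of apply with a lambda. The quality scores are made up, the helper name binarize is hypothetical, and the threshold of 7 appears only for comparison; it is not a choice made in this article.</p>

```python
import pandas as pd

# Made-up quality scores standing in for data['quality']
quality = pd.Series([3, 4, 5, 5, 6, 6, 6, 7, 8, 9], name="quality")


def binarize(scores: pd.Series, threshold: int) -> pd.Series:
    """Map scores >= threshold to 1 and everything else to 0."""
    return (scores >= threshold).astype(int)


# Compare the class shares produced by two different thresholds
for threshold in (5, 7):
    labels = binarize(quality, threshold)
    share = labels.value_counts(normalize=True).sort_index()
    print(f"threshold={threshold}:", share.to_dict())
```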
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.2 Normalizing the Features</span><span class="suffix" style="display: none;"></span></h3>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Rescale the features</span><br><span class="hljs-string" style="color: #98c379; line-height: 26px;">'''<br># Standardization (zero mean, unit variance)<br>standard_scaler = StandardScaler()<br>X = standard_scaler.fit_transform(X)<br>'''</span><br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Min-Max Scaling (rescaling each feature to [0, 1])</span><br>min_max_scaler = MinMaxScaler()<br>X = min_max_scaler.fit_transform(X)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># One of the two scalers is chosen later, in the "model selection" part, based on the highest accuracy</span><br></code></pre>
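<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"> The block above fits the scaler on the full feature matrix before the split in 5.3, which lets the test rows influence the learned minimum and maximum. A commonly used alternative, sketched below under stated assumptions, is to split first and fit the scaler on the training rows only. The arrays X_demo and y_demo are synthetic stand-ins, not the wine data, and this ordering is a suggestion rather than what this article does.</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-ins for the feature matrix and the 0/1 labels
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_demo = rng.integers(0, 2, size=200)

# Split first, so the test rows stay unseen during scaler fitting
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=44
)

scaler = MinMaxScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn min/max from the training rows only
X_te_scaled = scaler.transform(X_te)      # reuse those statistics on the test rows

print("Train range:", X_tr_scaled.min(), X_tr_scaled.max())
print("Test range: ", X_te_scaled.min(), X_te_scaled.max())
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"> With this ordering the scaled test values may fall slightly outside [0, 1], which is expected. Tree-based models such as random forests are largely insensitive to monotonic feature scaling, so the practical effect here is likely small.</p>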
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.3 Splitting the Dataset</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">We split the dataset into a training set (80%) and a test set (20%), with a fixed random seed so the split is reproducible.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;">X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">0.2</span>, random_state=<span class="hljs-number" style="color: #d19a66; line-height: 26px;">44</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print the shapes of the training and testing sets to verify the splitting</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape of X_train:"</span>, X_train.shape)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape of X_test:"</span>, X_test.shape)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape of Y_train:"</span>, Y_train.shape)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"Shape of Y_test:"</span>, Y_test.shape)<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/5e610a57-19c1-4209-8ca9-3fd226cc5b4c.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
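<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">If the quality classes are imbalanced, a plain random split can leave a rare class under-represented in the test set. The sketch below shows the stratify parameter of train_test_split, which keeps the class proportions the same in both parts. The data here is a hypothetical toy example with a 90/10 class ratio, not the wine dataset.</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 90 samples of class 0 and 10 of class 1
X_demo = np.arange(200, dtype=float).reshape(100, 2)
y_demo = np.array([0] * 90 + [1] * 10)

# stratify=y_demo preserves the 90/10 ratio in both the train and test parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=44, stratify=y_demo
)

print("class counts in train:", np.bincount(y_tr))
print("class counts in test: ", np.bincount(y_te))
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Stratification requires at least two samples in every class, so very rare quality grades may need to be merged or removed first.</p>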
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.4 Training the Model</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">We train the model with <code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">RandomForestClassifier</code>, trying a range of tree depths, random seeds, and forest sizes.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Initialize lists to store training and testing accuracies</span><br>scoreListRF_Train = []<br>scoreListRF_Test = []<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Track the best configuration so that later sections use the best model, not the last one fitted</span><br>best_test_score = -<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1.0</span><br>best_params = <span class="hljs-literal" style="color: #56b6c2; line-height: 26px;">None</span><br>best_model = <span class="hljs-literal" style="color: #56b6c2; line-height: 26px;">None</span><br><br><span class="hljs-string" style="color: #98c379; line-height: 26px;">'''<br>max_dep ----------> (1, 5),(1, 10) <br>rand_state ----------> (1, 35),(1, 50)<br>n_est ----------> (1, 30),(1, 30)<br>'''</span><br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Iterate over different values of max_depth</span><br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">for</span> max_dep <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">in</span> range(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>, <span class="hljs-number" style="color: #d19a66; line-height: 26px;">5</span>):<br>    <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Iterate over different values of random_state</span><br>    <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">for</span> rand_state <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">in</span> range(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>, <span class="hljs-number" style="color: #d19a66; line-height: 26px;">20</span>):<br>        <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Iterate over different values of n_estimators</span><br>        <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">for</span> n_est <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">in</span> range(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">1</span>, <span class="hljs-number" style="color: #d19a66; line-height: 26px;">15</span>):<br>            <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Create a Random Forest model with the current max_depth, random_state, and n_estimators</span><br>            candidate = RandomForestClassifier(n_estimators=n_est, random_state=rand_state, max_depth=max_dep)<br><br>            <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Fit the model on the training data</span><br>            candidate.fit(X_train, Y_train)<br><br>            <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Calculate and store the training accuracy</span><br>            scoreListRF_Train.append(candidate.score(X_train, Y_train))<br><br>            <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Calculate and store the testing accuracy</span><br>            test_score = candidate.score(X_test, Y_test)<br>            scoreListRF_Test.append(test_score)<br><br>            <span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Keep the configuration with the highest test accuracy seen so far</span><br>            <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">if</span> test_score &gt; best_test_score:<br>                best_test_score = test_score<br>                best_params = (max_dep, rand_state, n_est)<br>                best_model = candidate<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Keep the best model under the name used in the following sections</span><br>Model = best_model<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Find the maximum accuracy for both training and testing</span><br>RF_Accuracy_Train = max(scoreListRF_Train)<br>RF_Accuracy_Test = max(scoreListRF_Test)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print the best accuracies achieved (the test figure is optimistic because it was used to pick the model)</span><br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Random Forest best accuracy (Training): <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{RF_Accuracy_Train*<span class="hljs-number" style="color: #d19a66; line-height: 26px;">100</span>:<span class="hljs-number" style="color: #d19a66; line-height: 26px;">.2</span>f}</span>%"</span>)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f"Random Forest best accuracy (Testing): <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{RF_Accuracy_Test*<span class="hljs-number" style="color: #d19a66; line-height: 26px;">100</span>:<span class="hljs-number" style="color: #d19a66; line-height: 26px;">.2</span>f}</span>%"</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print a success message indicating that the model has been trained successfully</span><br>print(colored(<span class="hljs-string" style="color: #98c379; line-height: 26px;">"The Random Forest model has been trained successfully"</span>,<span class="hljs-string" style="color: #98c379; line-height: 26px;">"green"</span>, attrs=[<span class="hljs-string" style="color: #98c379; line-height: 26px;">'reverse'</span>]))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/04e2145b-8f2f-41a8-bf40-ab07b95a72e3.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
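<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The loop above picks the highest score measured on the test set, which makes the reported test accuracy optimistic. As an alternative, here is a minimal sketch using GridSearchCV, which selects hyperparameters by cross-validation on the training set and leaves the test set untouched until the end. The data comes from make_classification as a hypothetical stand-in for the wine features, and the grid values are illustrative assumptions rather than tuned settings.</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in for the wine features and labels
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=44, stratify=y_demo
)

# A deliberately small grid; the values are illustrative, not tuned
param_grid = {"max_depth": [2, 4], "n_estimators": [10, 30]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation on the training set only
    scoring="accuracy",
)
search.fit(X_tr, y_tr)

# best_estimator_ is refit on the whole training set by default (refit=True)
best_model = search.best_estimator_
print("best params:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
print("held-out test accuracy:", round(best_model.score(X_te, y_te), 3))
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">In this sketch random_state is fixed rather than searched: it is a seed, not a property of the model, so searching over it tends to reward lucky draws rather than better settings.</p>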
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.5 Model Evaluation</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">We evaluate the model on the test set using accuracy, a per-class classification report, and a confusion matrix.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">from</span> sklearn.metrics <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> accuracy_score, classification_report, confusion_matrix<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Predict on the test set</span><br>y_pred = Model.predict(X_test)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Compute the accuracy</span><br>accuracy = accuracy_score(Y_test, y_pred)<br>print(<span class="hljs-string" style="color: #98c379; line-height: 26px;">f'Accuracy: <span class="hljs-subst" style="color: #e06c75; line-height: 26px;">{accuracy:<span class="hljs-number" style="color: #d19a66; line-height: 26px;">.2</span>f}</span>'</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print the classification report</span><br>print(classification_report(Y_test, y_pred))<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Print the confusion matrix</span><br>print(confusion_matrix(Y_test, y_pred))<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/164e822b-3a22-44e4-b859-e2c03cc3cc9b.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
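<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Plain accuracy can look acceptable even when rare classes are never predicted correctly. The sketch below, on a small hand-made example, shows how to read the confusion matrix and how balanced accuracy and macro-averaged F1 weight every class equally. The labels y_true and y_hat are hypothetical values chosen for illustration, not results from the wine model.</p>

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    confusion_matrix,
    f1_score,
)

# Hypothetical quality labels and predictions, chosen by hand for illustration
y_true = [5, 5, 5, 5, 6, 6, 6, 7, 7, 8]
y_hat = [5, 5, 5, 6, 6, 6, 5, 7, 6, 6]
labels = [5, 6, 7, 8]

# Rows are true classes, columns are predicted classes, both in `labels` order
cm = confusion_matrix(y_true, y_hat, labels=labels)
print(cm)

acc = accuracy_score(y_true, y_hat)
# Balanced accuracy averages per-class recall, so rare classes count equally
bal_acc = balanced_accuracy_score(y_true, y_hat)
# zero_division=0 scores a class that is never predicted as 0 instead of warning
macro_f1 = f1_score(y_true, y_hat, average="macro", zero_division=0)

print("accuracy:", round(acc, 3))
print("balanced accuracy:", round(bal_acc, 3))
print("macro F1:", round(macro_f1, 3))
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">In this toy example the single sample of class 8 is misclassified, so its recall is 0 and the balanced accuracy falls below the plain accuracy.</p>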
<h3 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 20px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">5.6 Feature Importance</span><span class="suffix" style="display: none;"></span></h3>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;"><code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">RandomForestClassifier</code> conveniently exposes the importance of each feature once the model has been fitted.</p>
<pre class="custom" data-tool="mdnice编辑器" style="border-radius: 5px; box-shadow: rgba(0, 0, 0, 0.55) 0px 2px 10px; text-align: left; margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px;"><span style="display: block; background: url(https://files.mdnice.com/user/3441/876cad08-0422-409d-bb5a-08afec5da8ee.svg); height: 30px; width: 100%; background-size: 40px; background-repeat: no-repeat; background-color: #282c34; margin-bottom: -7px; border-radius: 5px; background-position: 10px 10px;"></span><code class="hljs" style="overflow-x: auto; padding: 16px; color: #abb2bf; padding-top: 15px; background: #282c34; border-radius: 5px; display: -webkit-box; font-family: Consolas, Monaco, Menlo, monospace; font-size: 12px;"><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> matplotlib.pyplot <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span> plt<br><span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">import</span> seaborn <span class="hljs-keyword" style="color: #c678dd; line-height: 26px;">as</span> sns<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Get the feature importances</span><br>feature_importances = Model.feature_importances_<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Create a DataFrame holding the features and their importances</span><br>feature_importance_df = pd.DataFrame({<br>    <span class="hljs-string" style="color: #98c379; line-height: 26px;">'Feature'</span>: data.columns.tolist()[:<span class="hljs-number" style="color: #d19a66; line-height: 26px;">-1</span>],<br>    <span class="hljs-string" style="color: #98c379; line-height: 26px;">'Importance'</span>: feature_importances<br>})<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Sort the features by importance, highest first</span><br>feature_importance_df = feature_importance_df.sort_values(by=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Importance'</span>, ascending=<span class="hljs-literal" style="color: #56b6c2; line-height: 26px;">False</span>)<br><br><span class="hljs-comment" style="color: #5c6370; font-style: italic; line-height: 26px;"># Plot the feature importances</span><br>plt.figure(figsize=(<span class="hljs-number" style="color: #d19a66; line-height: 26px;">10</span>, <span class="hljs-number" style="color: #d19a66; line-height: 26px;">8</span>))<br>sns.barplot(x=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Importance'</span>, y=<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Feature'</span>, data=feature_importance_df)<br>plt.title(<span class="hljs-string" style="color: #98c379; line-height: 26px;">'Feature Importance'</span>)<br>plt.show()<br></code></pre>
<figure data-tool="mdnice编辑器" style="margin-top: 10px; margin-bottom: 10px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: flex; flex-direction: column; justify-content: center; align-items: center;"><img src="https://files.mdnice.com/user/84866/eccc8690-9957-4474-977a-56dff83178d4.png" alt style="display: block; margin-top: 0px; margin-right: auto; margin-bottom: 0px; margin-left: auto; max-width: 100%; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgba(0, 0, 0, 0.4); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 0px; border-top-right-radius: 0px; border-bottom-right-radius: 0px; border-bottom-left-radius: 0px; object-fit: fill; box-shadow: rgba(0, 0, 0, 0) 0px 0px 0px 0px;"></figure>
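<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">The feature_importances_ attribute is impurity-based and computed from the training data, so it can overstate features the trees happened to split on often. As a cross-check, the sketch below uses permutation_importance from sklearn.inspection, which shuffles one feature at a time on held-out data and measures how much the score drops. The dataset and the feature names here are hypothetical placeholders, not the wine columns.</p>

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with placeholder feature names
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=44
)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Shuffle one column at a time on the held-out set and measure the accuracy drop
result = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)

# Print features from most to least important, with the spread over repeats
for idx in result.importances_mean.argsort()[::-1]:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"{feature_names[idx]}: {mean:.3f} +/- {std:.3f}")
```

<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">If the two rankings broadly agree, that is some evidence the impurity-based plot is not an artifact of the training data. Strongly correlated features can still share or hide importance under either method.</p>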
<h2 data-tool="mdnice编辑器" style="margin-top: 30px; margin-bottom: 15px; margin-left: 0px; margin-right: 0px; padding-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; display: block;"><span class="prefix" style="display: none;"></span><span class="content" style="font-size: 22px; color: rgb(0, 0, 0); line-height: 1.5em; letter-spacing: 0em; text-align: left; font-weight: bold; display: block;">6. Conclusion</span><span class="suffix" style="display: none;"></span></h2>
<p data-tool="mdnice编辑器" style="color: rgb(0, 0, 0); font-size: 16px; line-height: 1.8em; letter-spacing: 0em; text-align: left; text-indent: 0em; margin-top: 0px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; padding-top: 8px; padding-bottom: 8px; padding-left: 0px; padding-right: 0px;">Using <code style="color: rgb(30, 107, 184); font-size: 14px; line-height: 1.8em; letter-spacing: 0em; background-attachment: scroll; background-clip: border-box; background-color: rgba(27, 31, 35, 0.05); background-image: none; background-origin: padding-box; background-position-x: 0%; background-position-y: 0%; background-repeat: no-repeat; background-size: auto; width: auto; height: auto; margin-top: 0px; margin-bottom: 0px; margin-left: 2px; margin-right: 2px; padding-top: 2px; padding-bottom: 2px; padding-left: 4px; padding-right: 4px; border-top-style: none; border-bottom-style: none; border-left-style: none; border-right-style: none; border-top-width: 3px; border-bottom-width: 3px; border-left-width: 3px; border-right-width: 3px; border-top-color: rgb(0, 0, 0); border-bottom-color: rgba(0, 0, 0, 0.4); border-left-color: rgba(0, 0, 0, 0.4); border-right-color: rgba(0, 0, 0, 0.4); border-top-left-radius: 4px; border-top-right-radius: 4px; border-bottom-right-radius: 4px; border-bottom-left-radius: 4px; overflow-wrap: break-word; font-family: Consolas, Monaco, Menlo, monospace; word-break: break-all;">RandomForestClassifier</code>, we were able to predict the quality of white wine. Along the way we carried out data preprocessing, feature selection, model training, and evaluation, and analyzed the importance of each feature. This is only a simple example; real applications may call for more elaborate data preprocessing and model tuning.</p>
</section>