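I. Introduction
A while ago the site became so slow that it was effectively unusable. An online speed-test tool showed red results from every region of the country: the bandwidth of the Alibaba Cloud server was completely saturated, as shown in the figure below.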
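II. Solution
After configuring an IP blacklist and an anti-crawler policy in nginx, the same speed test showed green across the board and the site became fast again, as shown in the figure below.
1) In the nginx conf directory, create the anti-crawler script (/home/app/nginx/conf/agent_deny.conf)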
# Block scraping tools such as Scrapy and curl (case-insensitive match)
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
    return 403;
}
# Block the listed user agents, and requests with an empty user agent (^$)
if ($http_user_agent ~ "SemrushBot|FeedDemon|Bytespider|DuckDuckGo|DataForSeoBot|YandexBot|dotbot|mj12bot|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|YisouSpider|MJ12bot|heritrix|EasouSpider|Ezooms|^$") {
    return 403;
}
# Block every request method other than GET, HEAD and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}
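As the user-agent list grows, one alternative to chained `if` regexes is nginx's `map` directive, which evaluates the header once and stores the result in a variable. The following is only a minimal sketch of that swapped-in technique, not the configuration used on this site: the variable name `$deny_ua`, the `listen 80` line, the surrounding `events`/`http`/`server` skeleton and the shortened bot list are all assumptions made for illustration.

```nginx
# Minimal standalone sketch; merge the map block into your own http {} context.
events {}

http {
    # $deny_ua (hypothetical name) is 1 when the User-Agent should be rejected.
    map $http_user_agent $deny_ua {
        default 0;
        # Empty or missing User-Agent header.
        ""      1;
        # Case-insensitive match on a few of the tools and bots named above.
        "~*(Scrapy|curl|HttpClient|SemrushBot|AhrefsBot|MJ12bot)" 1;
    }

    server {
        listen 80;  # assumed port, not taken from the original setup

        location / {
            # "0" and the empty string count as false in an nginx if.
            if ($deny_ua) {
                return 403;
            }
            root  /home/xwood/www;
            index index.html index.htm;
        }
    }
}
```

Note that `map` is only allowed in the `http` context, so unlike agent_deny.conf it cannot be included from inside a `location` block.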
2) IP blacklist configuration (/home/app/nginx/conf/blockip.conf)
deny 101.67.49.0/24;
deny 185.191.171.0/24;
deny 39.173.107.0/24;
deny 54.163.58.57;
deny 60.188.11.0/24;
deny 66.249.65.237;
deny 66.249.65.35;
deny 66.249.65.36;
deny 66.249.65.37;
deny 66.249.79.160;
deny 85.208.96.0/24;
deny 95.217.203.110;
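If the blacklist grows to many entries, the same idea can also be expressed with nginx's `geo` directive, which maps the client address to a variable. This is a hedged sketch of that alternative rather than the setup described in this post: the variable name `$blocked_ip`, the `listen 80` line and the `events`/`http`/`server` skeleton are assumptions, and only a few of the addresses listed above are repeated.

```nginx
# Minimal standalone sketch; merge the geo block into your own http {} context.
events {}

http {
    # $blocked_ip (hypothetical name) is 1 for blacklisted client addresses.
    geo $blocked_ip {
        default          0;
        185.191.171.0/24 1;
        85.208.96.0/24   1;
        54.163.58.57/32  1;
    }

    server {
        listen 80;  # assumed port, not taken from the original setup

        location / {
            if ($blocked_ip) {
                return 403;
            }
            root  /home/xwood/www;
            index index.html index.htm;
        }
    }
}
```

Plain `deny` lines, as in blockip.conf, remain the simpler choice for a short list; `geo` mainly helps when the variable is reused elsewhere, for example in logging or rate limiting.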
3) Edit the nginx.conf configuration file (/home/app/nginx/conf/nginx.conf) and include both files in the location block
location / {
    root /home/xwood/www;
    index index.html index.htm;
    proxy_read_timeout 300;

    include agent_deny.conf;
    include blockip.conf;
}
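Directives included inside `location /` apply only to requests handled by that location. Since `if`, `return` and `deny` are all permitted at the `server` level, a site with several location blocks could include the two files once per server instead. The sketch below assumes a hypothetical server listening on port 80; the `events`/`http`/`server` skeleton is illustrative and not copied from the original nginx.conf.

```nginx
# Minimal sketch: apply both rule files to every location of one server.
events {}

http {
    server {
        listen 80;  # assumed port, not taken from the original setup

        # Relative include paths are resolved against the directory of nginx.conf.
        include agent_deny.conf;
        include blockip.conf;

        location / {
            root  /home/xwood/www;
            index index.html index.htm;
            proxy_read_timeout 300;
        }
    }
}
```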
4) Restart the nginx service. The access log should then show crawler requests being rejected with status 403 (Forbidden).
185.191.171.11 - - [04/Feb/2024:23:39:03 +0800] "GET /xwood-gw/mvn-libs/org/actframework/act-starter-beetl/1.6.0.2/act-starter-beetl-1.6.0.2-sources.jar.asc HTTP/1.1" 403 169 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
...
85.208.96.203 - - [04/Feb/2024:23:39:05 +0800] "GET /xwood-gw/mvn-libs/com/alibaba/ageiport-test-processor-core/0.1.6/ageiport-test-processor-core-0.1.6.jar.sha1 HTTP/1.1" 403 169 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"