[Koha-bugs] [Bug 4042] Public OPAC search can fall prey to web crawlers

bugzilla-daemon at bugs.koha-community.org
Thu Dec 22 18:37:57 CET 2016


https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=4042

--- Comment #7 from Fred P <fred.pierre at smfpl.org> ---
I don't believe this is a Koha issue. Any public site can be "hit" by any user.
Blocking the Chinese search giant Baidu makes a big difference: disallow its
robots and you will get far fewer hits. You can also block by IP address range
by editing your Apache .htaccess file. Keep in mind that you should back that
file up before making changes, and take precautions not to block your own
access!

In the .htaccess for the appropriate site directory, blocking the 180.76 range
shuts out Baidu's crawlers (Apache 2.2 syntax):

order allow,deny
allow from all
# block by partial IP address (Baidu's 180.76.x.x range)
deny from 180.76
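
If your server runs Apache 2.4, the Order/Allow/Deny directives above are
deprecated in favor of Require. A minimal sketch of the equivalent rule,
assuming mod_authz_host is enabled, would be:

# Apache 2.4 syntax: allow everyone except the 180.76 (Baidu) range
<RequireAll>
    Require all granted
    Require not ip 180.76
</RequireAll>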

Adding the following to your web root as a robots.txt file should warn off the
Yandex and Baidu robots; however, spiders change, and respect for robots.txt
varies:

#Baiduspider
User-agent: Baiduspider
Disallow: /

#Yandex
User-agent: Yandex
Disallow: /
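
If you would rather leave the rest of the OPAC crawlable, a sketch that blocks
all crawlers from just the search script (assuming the standard Koha path of
/cgi-bin/koha/opac-search.pl) would be:

#All crawlers: block only the search results pages
User-agent: *
Disallow: /cgi-bin/koha/opac-search.pl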

It looks like Chris' proposals were adopted. Does this bug need to remain open?

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are watching all bug changes.

