Abuse Detection in Apache

Recognizing image/MP3 trollers

Every few months, another genius comes up with the same brilliant business plan:

Automatically patrol Web sites for copyright infringement!
Copyright holders will pay us millions! We'll be rich!

Of course, it doesn't work, because:

Yield is low
Incidence of false positives is high
Copyright holders seldom realize any new revenue from the practice, so their willingness to pay is limited

Trolling can play hob with Web servers, however, because:

Image and audio files are often big
Trollers usually open many connections at once
Trollers intentionally ignore robots.txt etiquette

Solution: Detect mass access via

Access patterns (trollers usually grab HTML files first, then come back to grab images and audio en masse, faster than humanly possible)
HTTP_USER_AGENT
Domain names (e.g. "imagelock.com")
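
The three signals above can be sketched as a small log scanner. This is only an illustration, not a tested production tool: it assumes the server writes Apache's "combined" log format, that HostnameLookups is on (otherwise the host field is an IP address and the domain check never fires), and it reduces the access-pattern signal to "too many image/MP3 requests from one host in a short window". The thresholds, the agent substring, and the function name find_trollers are all hypothetical.

```python
import re
from collections import defaultdict, deque
from datetime import datetime

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?:\S+) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
MEDIA_RE = re.compile(r'\.(?:jpe?g|gif|png|mp3)(?:\?|$)', re.IGNORECASE)

# Illustrative values only; tune them against real traffic.
BAD_AGENTS = ("imagelock",)        # substrings of HTTP_USER_AGENT (assumed)
BAD_DOMAINS = ("imagelock.com",)   # domain named in the text as an example


def find_trollers(lines, max_media_hits=20, window_seconds=10):
    """Return {host: reason} for clients that look like image/MP3 trollers."""
    recent = defaultdict(deque)    # host -> timestamps of recent media requests
    flagged = {}
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue               # skip lines that are not in combined format
        host = m.group("host").lower()
        agent = m.group("agent").lower()

        # Signal: HTTP_USER_AGENT
        if any(a in agent for a in BAD_AGENTS):
            flagged.setdefault(host, "user agent")
        # Signal: domain name (only meaningful with hostname lookups enabled)
        if any(host == d or host.endswith("." + d) for d in BAD_DOMAINS):
            flagged.setdefault(host, "domain")

        # Signal: access pattern, approximated as media requests per time window
        if not MEDIA_RE.search(m.group("path")):
            continue
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits = recent[host]
        hits.append(when)
        while (when - hits[0]).total_seconds() > window_seconds:
            hits.popleft()
        if len(hits) > max_media_hits:
            flagged.setdefault(host, "access pattern")
    return flagged
```

Usage would be along the lines of find_trollers(open("access_log")), with the path depending on the installation. A fuller version might also check that the same host fetched HTML pages shortly before the burst, as the first bullet describes.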

Then blackhole the offender and complain to its ISP