-
Every few months, another genius comes up with the same
brilliant business plan:
-
Automatically patrol Web sites for copyright infringement!
-
Copyright holders will pay us millions! We'll be rich!
-
Of course, it doesn't work, because:
-
Yield is low
-
Incidence of false positives is high
-
Copyright holders seldom realize any new revenue from
the practice, so their willingness to pay is limited
-
Trolling can play hob with Web servers, however, because
-
Image and audio files are often big
-
Trollers usually open many connections at once
-
robots.txt etiquette is intentionally ignored
-
Solution: Detect mass access via
-
Access patterns (a troller will usually grab the HTML files first, then
come back to grab images and audio en masse, faster than humanly possible)
-
HTTP_USER_AGENT
-
Domain names (e.g. "imagelock.com")
-
Then blackhole the offender and complain to its ISP
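
The detection signals listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tested production filter: the record layout, the function name find_mass_access, the media extension list, the "examplebot" agent string, and the window and threshold values are all hypothetical. Only the "imagelock.com" domain comes from the list above. The sketch assumes the access log has already been parsed and that any reverse DNS lookup has been done elsewhere.

```python
from collections import defaultdict, deque

# All values below are illustrative assumptions; tune them for a real site.
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".mp3", ".wav")
SUSPECT_DOMAINS = ("imagelock.com",)   # example domain from the list above
SUSPECT_AGENTS = ("examplebot",)       # hypothetical user-agent substring
WINDOW_SECONDS = 10                    # sliding window length
MAX_MEDIA_HITS = 20                    # more media hits than this per window is not human


def find_mass_access(records):
    """Return {client_ip: reason} for clients that look like mass harvesters.

    records: iterable of (client_ip, unix_time, path, user_agent, hostname)
    tuples sorted by time. hostname is the reverse DNS name, or "" if unknown.
    """
    recent = defaultdict(deque)  # client_ip -> times of recent media requests
    flagged = {}
    for ip, when, path, agent, hostname in records:
        if ip in flagged:
            continue
        host = hostname.lower()
        if any(host == d or host.endswith("." + d) for d in SUSPECT_DOMAINS):
            flagged[ip] = "suspect domain"
            continue
        if any(s in agent.lower() for s in SUSPECT_AGENTS):
            flagged[ip] = "suspect user agent"
            continue
        if not path.lower().endswith(MEDIA_EXTENSIONS):
            continue
        hits = recent[ip]
        hits.append(when)
        # Drop media requests that have fallen out of the sliding window.
        while hits and when - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        if len(hits) > MAX_MEDIA_HITS:
            flagged[ip] = "burst of media requests"
    return flagged
```

The returned addresses are candidates for blackholing, for example by feeding them to a firewall or null route. A threshold this simple can misfire on proxies that serve many users from one address, so a flagged entry is worth a manual look before complaining to anyone's ISP.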