Darth Autocrat (Lyndon NA)

@darth_na

14 tweets, 3 reads, Nov 27, 2023
.
:: Been scraped by Thieves? ::
Sadly, not new.
1) A % of the net is scraped/spun/mashed-up content
2) Many of the LLM AI tools are based on stolen content.
3) There's always some scummy **** willing to steal from you
But what to do about it?
#SEO
2/?
:Prevention:
The basics are:
a) Make sure you block (DNS/Server/Script) the most obvious/common bots/scrapers.
b) Utilise Hosts/CDNs with protective firewalls etc.
c) Don't make it easy (stop listing your sitemap files in robots.txt - submit them via the search engines' own tools)
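
For point (a), a rough sketch of what a script-level block could look like. The pattern list and the function name `is_blocked_user_agent` are made up for illustration (not from any particular library); build your real list from your own logs, and remember a User-Agent is trivially faked, so this only stops the lazy ones.

```python
import re
from typing import Optional

# Hypothetical blocklist: substrings often seen in off-the-shelf scraping tools.
# This is an assumption for illustration; tune it against your own traffic logs.
BLOCKED_UA_PATTERNS = [r"python-requests", r"scrapy", r"curl", r"wget", r"httpclient"]
_BLOCKED_RE = re.compile("|".join(BLOCKED_UA_PATTERNS), re.IGNORECASE)


def is_blocked_user_agent(user_agent: Optional[str]) -> bool:
    """Return True if the request should be refused.

    A missing or blank User-Agent is treated as suspicious, as is any
    User-Agent containing one of the blocklisted substrings.
    """
    if not user_agent or not user_agent.strip():
        return True
    return _BLOCKED_RE.search(user_agent) is not None
```

You would call this from whatever request hook your server or framework gives you and return a 403 when it says True. Blocking at DNS/CDN level is cheaper if your provider supports it.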
>>>
3/?
:Prevention (cont.):
The not-so-basics include:
d) Vary Element labels (IDs/Classes) - make it more challenging to extract with xpath etc.
e) Vary content by UA/IP (if you serve "different" content, it's not worth the clean-up for many)
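
One possible way to do (d), sketched under assumptions: derive the class names from a salt that you rotate (per deploy, per day, whatever suits). The names `rotated_name` and `render_article` are hypothetical helpers, not part of any framework, and your CSS has to be generated from the same salt or the styling breaks.

```python
import hashlib
import html


def rotated_name(base: str, salt: str) -> str:
    """Derive an opaque class/ID name from a stable internal name and a salt."""
    digest = hashlib.sha256(f"{salt}:{base}".encode("utf-8")).hexdigest()
    # Prefix with a letter so the result is always a valid CSS identifier.
    return "c" + digest[:8]


def render_article(title: str, body: str, salt: str) -> str:
    """Render a minimal article fragment whose class names depend on the salt."""
    title_cls = rotated_name("article-title", salt)
    body_cls = rotated_name("article-body", salt)
    return (
        f'<h1 class="{title_cls}">{html.escape(title)}</h1>\n'
        f'<div class="{body_cls}">{html.escape(body)}</div>'
    )
```

Change the salt and every selector a scraper hard-coded (`.article-title`, `//div[@class="..."]`) stops matching. It won't stop someone extracting by position or by text, it just raises the maintenance cost.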
>>>
4/?
:Prevention (cont.):
f) Set up bait/traps - you can push them along a huge number of fake pages, with tiny tweaks, and waste their resources, and/or monitor who requests the "new pages", and time it etc.
g) Ensure you are monitoring/recording traffic.
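
A minimal sketch of the trap idea in (f), assuming you have a path real users never see (no visible links) and that you Disallow it in robots.txt so well-behaved crawlers stay out. `TRAP_PREFIX` and `TrapMonitor` are hypothetical names for illustration only.

```python
# Hypothetical trap path: not linked anywhere visible, and Disallowed in robots.txt,
# so anything requesting it is ignoring the rules.
TRAP_PREFIX = "/archive-x/"


class TrapMonitor:
    """Remembers which client IPs have requested a trap URL."""

    def __init__(self) -> None:
        self.trap_hits = {}  # ip -> number of trap requests seen

    def record(self, ip: str, path: str) -> bool:
        """Record one request; return True if this IP has ever hit the trap."""
        if path.startswith(TRAP_PREFIX):
            self.trap_hits[ip] = self.trap_hits.get(ip, 0) + 1
        return ip in self.trap_hits
```

What you do with a flagged IP is up to you: block it, slow it down, or keep feeding it junk pages. Keep the hit counts and timestamps in your logs too, as they are handy evidence later.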
Then it's down to reaction...
>>>
5/?
:Reaction:
a) Spend 3 minutes looking for a contact point.
If you can (easily) find one, send a Cease and Desist.
b) Back trace to their Host, then email them with the issue, some examples, and ask them to act whilst you move to legal channels.
>>>
6/?
c) DMCA - if there isn't too much difference between your original pieces, and their "copies",
you may get some success (more instances = better)
d) Consider legal channels - it's not "cheap",
but if it's clear what has been done, and it's been done at volume, do it
>>>
7/?
e) Gather the evidence - not just pairings of your/their URLs, copies of the pages, degrees of similarity, examples from the Internet Archive etc.,
but sometimes, they are stupid fuckers,
and brag about it on social/in forums etc.
Screenshot, save page etc.
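
For the "degrees of similarity" part, one simple option (an assumption on my part, there are plenty of other ways) is Python's built-in `difflib`, comparing the visible text of your page with theirs word by word. The helper names `normalise` and `similarity` are made up for this sketch.

```python
import difflib
import re


def normalise(text: str) -> list:
    """Lowercase the text and split it into words, ignoring extra whitespace."""
    return re.sub(r"\s+", " ", text).strip().lower().split(" ")


def similarity(original: str, suspect: str) -> float:
    """Return a ratio from 0.0 (nothing shared) to 1.0 (same words in same order)."""
    matcher = difflib.SequenceMatcher(
        None, normalise(original), normalise(suspect), autojunk=False
    )
    return matcher.ratio()
```

Run it over each of your/their URL pairs (after stripping the HTML down to text) and keep the scores alongside the saved copies. Long pages can be slow to compare, so do it offline rather than on a live request.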
>>>
8/?
f) Do the math - Don't just calculate the lost earnings,
but also track your time/effort researching, analysing, gathering evidence etc. Your time, your money!
(If you go the legal route, they will ask for this)
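
A tiny sketch of that sum, just to show the shape of it. `TimeEntry` and `total_cost` are hypothetical names, and any figures you feed in should be your own real numbers, not guesses.

```python
from dataclasses import dataclass


@dataclass
class TimeEntry:
    task: str     # e.g. "gathering evidence"
    hours: float  # time spent on that task


def total_cost(lost_earnings: float, entries: list, hourly_rate: float) -> float:
    """Lost earnings plus the value of the time spent dealing with the theft."""
    time_cost = sum(entry.hours for entry in entries) * hourly_rate
    return lost_earnings + time_cost
```

Log the entries as you go (task, date, hours) rather than trying to reconstruct them afterwards.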
>>>
9/?
g) Consider what you are asking for.
Rather than $, it may be worth taking the site,
(or both!).
(set up redirects - don't leave the site there, waiting for G to hammer it)
h) Make some rapid changes.
Prioritise the most impactful pages - make some tiny improvements.
>>>
10/?
:Other things to consider/try:
a) Consider Spam Reporting
Personally, I have little faith in the G Spam team,
but you might get lucky.
b) Out them, publicly
Your legal rep. might not condone this.
(If you do it: make sure you are right, have proof, and be civil!)
>>>
11/?
c) Contact their suppliers/vendors/account holders.
If they are MFA/A (made for ad-rev/affiliate-rev),
gather up some evidence, then reach out to them,
and see if they will take the right action.
>>>
12/?
:Don't:
As tempting as it may be - don't do anything "wrong".
Don't try to arrange hacks, bot-swarms etc.
Don't post threats or personal attacks/insults.
Be the "good person"!
>>>
13/?
Sadly, at the end of the day - this is largely unavoidable/unpreventable,
much like being hacked.
You cannot 100% prevent it.
But you can make it far less viable/palatable.
Talk to your host,
your CDN provider,
your web designer,
your dev.
Ask for help.
<Addendum>
Reading back through ...
... I missed out:
1) Rate limiting/throttling
(No normal person (usually) visits 20 pages per session/within 3 minutes)
2) Captcha (some are "hidden", others are interstitials etc.)
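
For the rate limiting in (1), a minimal in-memory sketch using the 20-pages-in-3-minutes figure above as the default. `SlidingWindowLimiter` is a hypothetical class, not from any library, and a real setup would more likely do this at the server/CDN level or with a shared store.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int = 20, window_seconds: float = 180.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> timestamps of allowed requests

    def allow(self, client_id: str, now: float) -> bool:
        """Return True if the request is allowed; `now` is a timestamp in seconds."""
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Pass `time.monotonic()` as `now` in real use. When `allow` returns False you can return a 429, slow the response down, or show one of the captchas from (2).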
