CYBERSPACE LABS, 27 June — No one knows less about content scraping than me, so hopefully someone out there can explain what I’m looking at. As you know, we’re all excited about Kathleen Cool joining the “salon” (author team) here at iRez. She made her first post on Monday, I left a comment on it on Tuesday, and today a curious website “DAILY SERPS: Your Daily Dose of SEO” has scraped the comment and used it as text with a cheesecakey lingerie babe photo.
I get it when Facebook “scrapes” your data to sell to marketers. I get it when “Girls Around Me” mashes up Facebook & Foursquare data to make an app. Recently someone scraped my whole Thomas Kinkade obituary, photos and all, which made some sense, ok, free article. I get Comment Spam too, where robots or humans like Mechanical Turkers leave nonsensical comments in an effort to get “free” linkbacks to whatever site they’re trying to drive up in search ratings (Miso gets “Asian Sex Slaves,” I get Louis Vuitton bags). Luckily our Akismet spam blocker seems to be 100% effective: I’ve never seen a false positive or a false negative in the 2 years we’ve had this blog. Which is a good thing since the spam comments outnumber the real ones by 10 to 1.
I assume the “SERPS” of Daily Serps means “Search Engine Results Page” and Your Daily Dose of SEO references “Search Engine Optimization,” but it’s just kinda weird / funny to see a random comment stuck with a random photo on some weird blog page that may or may not be around in a short while. There’s actually a fair bit of content on the site, seemingly random and presumably all scraped from different places. I can sort of “see” the shell game, I just don’t know what the “point” is.
Anybody know about this stuff?
23 thoughts on “We’ve Been Scraped!”
HA! This is almost as fun as the Commodity News (another SEO scraper) featuring my mocking “COMMODITY Update: Stockholder’s Report” post about my piece “Commodity” in with a bunch of scraped articles about beans, corn, ninja SEO moves & stock graphs. It was highly amusing.
BTW, just for the record, it was “chinese phone whores” (best translation I found of the kanji), not “asian sex slaves.” [that oughta boost your spider-hits now!]
haha, thanks Miso! I never followed any of the links, but I’m sure my MANY links to Louis Vuitton bags were all knock offs… so I suppose an “Asian Sex Slave” is a sort of knock off of a “real” “Chinese Phone Whore.”
TAN: I’ve never understood the fuss about Louis Vuitton bags anyway – I think they’re boring! 😛
Meanwhile — I knew what “Comment Spam” was because I’ve heard the term before (and gotten tons) but I didn’t even know what to call this, and just now realized it’s “Trackback Spam” and Googling that I find lots of discussion, including a post from Matt Mullenweg SEVEN years ago. Gawd! So much still to learn / catch up on!
The thing I still don’t get about the “DAILY SERPS” website is: what are they “selling”? If it’s trackback spam, what do the links do for them?
Or maybe it’s a sort of SaveMe Oh, hacktivist / spam, virtual performance art website? 😀
Hi Miso… I once went to a “L1nger1e P4rty” in SL and the crazies had a field day with that too. I finally got past that whole in 2010. hehe
And also BTW… the folks at SNPros value misosusanowa on Twitter at 641.30 while vaneeesa on Twitter is valued at only 266.19, so you’d better listen to me if you wanna be in good shape for your IPO. Just some friendly advice 🙂
That’s cool, Miso. here’s what mine says, “Yordie Sands’s Twitter Account (@yordiesands) is worth $552.86 USD” I’m selling!! hehehe
What those aggregators are doing is profiting off the advertising cramming their pages. The linkback scam (which Prokofy Neva & Hamlet Au are very good at) is to intercept search engine queries on names, articles & writings, replacing the direct link with this intercepted/aggregated link so that the first links on a search query are to these intermediate pages & they get AdSense or other ad revenue on hits. This works because of all the linkback links peppering the site.
You’ll notice, on some blogs (like the above mentioned) how their “articles” are sometimes just one or two sentences with a link to the article they are “talking” about? Same deal/scam/manipulation – the money is in the linkback to the article, not the “content” they are offering.
This is one reason I seldom click on any Paper.Li links; they are merely scrapers/aggregators for this kind of search engine pollution.
This is very interesting to me, Miso. I’ve wondered about all that backlink stuff. I don’t have the ambition to try it myself (knowingly anyway) but I’ve seen some people generate seemingly incomprehensible stats. Btw, i took a peek at Mr Craps’ value “lscm’s Twitter Account (@isfullofcrap) is worth $2,754.49 USD.” And that makes sense to me, he has a very large (not huge) and very loyal following.
I’ve been discovering on Empire Avenue that quality counts A LOT. I see spammers putting out almost constant streams of tweets with feeds to FB and everywhere, but many earn less than me. I do maybe a dozen or two tweets a day and maybe six or ten FB posts. What I’m learning is, it’s conversations like this on WordPress that carry huge quality rankings.
Hold it Miso, I know what you’re thinking, “is my ass worth 3x Vaneeesa’s? Or is SNPros ‘Bullshit!’?” Well, considering that this here’s the World Wide Web, the Global Internet, capable of making you a billionaire or burying you like a trolled meme, there’s just one question you’ve gotta ask yourself…
oh do checkit, Vaneesa! i checked a couple big names Mr Crap & Grace McDunnough and they are both over $2500.
hahaha @ladygaga US$32,000,000
Whoa, Talk about fine ass! I misquoted mine, it was US$55,286… that period should have been a comma. The real question in my mind is, does that mean we should get that much money for each of our tweets?
Darn, this would make a fine article. One of you should write it. I still don’t know enough to even understand the subject. I only began learning what a social network really is about 3 months ago.
I’ve wondered about some of these issues in the past, just never tried to “break the code” so to speak. Let us know when you figure out what the DAILY SERPS are selling; I’m curious now. I’ve heard about SaveMe Oh but never seen her site.
The thing for me is information – that is what I came to the net for and that is what is valuable to me. People posting links on twitter to “articles” which are merely a link to their G+ or FB pages containing ANOTHER link to, perhaps the actual article or perhaps another linkback to ANOTHER page (often another “blog” of theirs) which has another link to perhaps the actual article that caught my interest…
This is pollution, nothing less. This is monetizing gone crazy. Sure, I guess if you get enough .0002 cent hits, you might be “making money” but you are doing nothing to “inform” anyone. You are coat-tailing; you are screwing search engines; you are scamming off my interests and goodwill. When I find a racket like this, I let people know in no uncertain terms what I think about these people muddying the waters, polluting information and making the web a gross maze of advertising spam.
There are ways to do SEO that are not gaming the system. I don’t object to that. I do object to these middlemen-kidnappers-carpetbaggers; imagine if this was done in a library’s card catalog. It’s the opposite of “informing” people.
SPAM IS BULLSHIT!!!
Actually, now that I go back and look at my “bullshit” series
You could say that all “bullshit” is spam, in that, as you say, it’s noise that isn’t really content or getting you a lot closer to content.
Then again… I do have sort of bullshittish feelings toward marketing & advertising… they work hard to create a consumptive culture for products we might not need at all, or at least don’t have a need for that particular logo on our clothes or car. I know, I know, I define my individuality by the logos I choose to wear and drive.
if we all make widgets by hand and then sell or barter them… that’s great… but we have a world of peeps working all day making shoes to trade for swords…
If Mr. Ford can mass produce cars, AND advertise them to lots of eyeballs on the SuperBowl… then his volume can create lower pricing for all. And after all a Toyota Prius does cost less than a Tesla.
So Advertising / Marketing could be “bullshit,” but they could also help the efficiency of distributing goods.
As for a blog like New World Notes using its existing search rank advantage to “intercept” search queries to “real” content, I think you could look at that 2 ways.
Sometimes we arrive at content thru search, in which case the scenario you described doesn’t help at all.
Other times we are “readers” of a given publication like HuffPo or Salon.com, or in Vaneeesa’s dreams iRez… in that case the “reblog” could be helpful in that it makes the overall site a “well curated” site for information you care about, and links from there might take you anywhere on the web.
VB: I know what you mean about marketing & advertising, it’s really become an age where this is consuming us and categorizing us and packaging us.
However, i’ve seen people who want to get their message out, good people (I’m thinking of a woman I know in particular), and they work the system hard. She’s a DES daughter and she has been tireless in getting her message out. Some might call her work spam but I’ve helped her and in so doing i’ve learned, my awareness has grown. so your “efficiency” idea seems pretty valid here to me.
Any way you want to look at it, Hits/pageviews are treasured. However, I think the search engines are getting smarter and smarter in evaluating links. I’ve seen this on Empire Avenue as I mentioned in what I’ve observed with quantity vs quality.
On EA you are rated based on quantity & quality metrics. Currently i’m 3.5 of 5 for both (FB is my strongest, then WordPress, Twitter, Flickr). In my case, my quantity has caught up with quality (I believe it’s because of chats like this [lots of polysyllable words… hehe]). I don’t know how EA does it, it’s their secret but if they can quantify this, others can and do also. I’ve seen warnings in EA about being identified as a spammer. Those who are see their hits become much less valuable.
For me personally, i don’t care if my content is reblogged. However, as in cases like Kath’s post, I am very irritated when it’s altered.
And if you want to really “curate” this site, you need some powerful tools (no idea what those are tho). Can you imagine the kind of crap that Salon has to deal with! I know it’s a daily vigil though.
Miso: I’m with you on your feelings about most of what you say. Where i differ is, I see the kidnapping putting my stuff in front of new audiences, and two, developing my name brand. I just happened to create this name which i’ve discovered has some unique qualities. So, i sorta win unless my content is bastardized.
I have tons of those other types of posts in my twitter and to a lesser degree on FB. I seem to be able to whiz by most of them without wasting mental bandwidth. I have about 10 lists on Twitter that help me filter through the 800+ peeps I follow. I eliminated a lot of the constant posters (like Guy Kawasaki, who is too cool for school — love love, but he just spams as far as I’m concerned, and i doubt that he is doing any of it, prolly some assistant) and may be at the upper limit of being able to manage going through my entire list. More and more I’m using my list filters.
I’d like to do some SEO but I just haven’t had time to study it. I have a few things like header keywords and such, but really need to study this whole subject. Have you got any SEO tools or info you use?
I am an informationalist. You might say “digital librarian” or researcher. It is a concern of mine, as it is with many people, how this type of SEO madness can be taken too far, polluting an informational resource. “Curation” is one thing, but if a site is only auto-scraping by keyword/hit count, can you really call that “curation”?
I have stated before in these debates that I am not against capitalism, marketing or advertising. I remember a time when advertising was somewhat interesting, informative and geared towards people who benefited by it.
What is happening now is the use of these techniques to promote empty content. There’s no “curation” in an autobot-assembled site; the site exists merely to intercept searches by rank in order to capture “hits” – that is not advertising or marketing, that is deception, and gives me a very dim view of any “content” a site might have. It promotes nothing but the site; it offers no value.
Yes, I know a lot about SEO/gaming search engines; enough to spot it, study it and deny it access to my attention. Sorry, but I cannot help with tools or information regarding what I see (mostly now) as a black-hat attack on the information capabilities of the internet.
[one example: I cannot use my regular email address for this or any other wordpress-based blog; their entanglement with Gravatar gives me (and many others) a problem with commenting. I cannot make Gravatar remove me from their nasty little virus-worm list. This is a service? To me, it is malware/virus that I must route around/fool/overcome by using an email address that Gravatar hasn’t got its nasty hooks into yet]
BTW… once I hit on someone’s “links” three times, only to be sent on this type of round-robin goosechase, I never take a link from them again, ever. So their plan to monetize on my clicks is dead in the water; I will never pay attention to their links/tweets again.