Google Indexed You. AI Has No Idea You Exist. | Stephen Burns, Common Crawl

35% of the internet is blocking AI crawlers right now - most of it by accident, through default CDN settings that site owners never touched.

Stephen Burns is the Web Intelligence Lead at Common Crawl Foundation, the nonprofit that crawls 2.3 billion pages per month and provides the training data used by GPT, Claude, Llama, and most major LLMs. He covers harmonic centrality - the algorithm that determines which sites get crawled and end up in AI training sets - and why it operates completely differently from Google's PageRank. Sites with JavaScript-heavy builds, slow load times, or CDN defaults that block AI bots may not exist as far as LLMs are concerned.
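For listeners who want a feel for the idea before pressing play, harmonic centrality scores a page by summing the reciprocals of the shortest link distances from every other page that can reach it, so pages that sit a few clicks away from many others score high. The following is a minimal sketch on a toy link graph, not Common Crawl's actual pipeline; the function name `harmonic_centrality` and the node names in `toy_web` are made up for illustration.

```python
from collections import deque


def harmonic_centrality(links):
    """Score each page by summing 1/distance over all pages that can reach it.

    `links` maps each page to the list of pages it links to.
    This is an illustrative toy, not Common Crawl's implementation.
    """
    # Collect every node, including pages that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)

    # Reverse the graph so a BFS from v walks backwards along inbound links.
    inbound = {n: [] for n in nodes}
    for src, targets in links.items():
        for dst in targets:
            inbound[dst].append(src)

    scores = {}
    for v in nodes:
        dist = {v: 0}
        queue = deque([v])
        while queue:
            cur = queue.popleft()
            for prev in inbound[cur]:
                if prev not in dist:
                    dist[prev] = dist[cur] + 1
                    queue.append(prev)
        # Pages that cannot reach v contribute nothing to its score.
        scores[v] = sum(1.0 / d for n, d in dist.items() if n != v)
    return scores


# Hypothetical four-page web: hub -> a -> b, plus an unlinked island.
toy_web = {"hub": ["a"], "a": ["b"], "b": [], "island": []}
scores = harmonic_centrality(toy_web)
```

In this toy graph, `b` is reached from `a` in one hop and from `hub` in two, while `island` has no inbound links at all and scores zero, which is the situation the episode describes for sites that crawlers never reach.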

This episode also covers the EU AI Act's August disclosure deadline, which will require AI companies to publish the top 1,000 domains they trained on - giving SEOs a new way to verify AI visibility for the first time.

Common Crawl has been cited in over 10,000 academic research papers and its data underpins over 80% of the training tokens in GPT-3. Burns works at the intersection of web-scale data and search infrastructure - this is the part of the pipeline that most SEOs have never had access to before.
