Internet Archive will ignore robots.txt files to keep historical record accurate

The Internet Archive has announced that, going forward, it will no longer honor the directives in robots.txt files. These files are predominantly used to tell search engine crawlers which portions of a site should be crawled and indexed.

In the past, the Internet Archive has complied with instructions laid out by robots.txt files, according to a report from Boing Boing. However, the organization has decided that the way these files are written is often at odds with the service it sets out to provide.

“Over time we have observed that the robots.txt files that are geared toward search engine crawlers do not necessarily serve our archival purposes,” stated a blog post that the organization published last week. “Internet Archive’s goal is to create complete ‘snapshots’ of web pages, including the duplicate content and the large versions of files.”

Robots.txt files are increasingly being used to remove entire domains from search engines following their transition from a live, accessible site to a parked domain. If a site goes out of business, and is rendered inaccessible in this way, it also becomes unavailable for viewing via the Internet Archive’s Wayback Machine. The organization apparently receives queries about these sites on a daily basis.
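
To make the two situations concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleArchiveBot` user agent, and the `example.com` URLs are all hypothetical, chosen only for illustration; this is not the Internet Archive's crawler or any real site's file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules written with search engines in mind:
# hide printer-friendly duplicates and large original files.
SEARCH_TUNED = """\
User-agent: *
Disallow: /print/
Disallow: /images/full-size/
"""

# Hypothetical rules on a parked domain: block every crawler from everything.
PARKED = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BOT = "ExampleArchiveBot"  # made-up user agent for a compliant archival crawler

# On the search-tuned site, the article is reachable but its print duplicate is not.
story_ok = can_crawl(SEARCH_TUNED, BOT, "https://example.com/news/story.html")
print_ok = can_crawl(SEARCH_TUNED, BOT, "https://example.com/print/story.html")

# On the parked domain, nothing is reachable, including formerly live pages.
parked_ok = can_crawl(PARKED, BOT, "https://example.com/news/story.html")

print(story_ok, print_ok, parked_ok)  # True False False
```

A crawler that obeys these rules would skip the duplicate and full-size content on the first site and the whole of the second. That is the gap in the archive the organization describes.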

The Internet Archive hopes that disregarding robots.txt files will help contribute to an accurate representation of prior points in the web’s history, removing their capacity to muddy the waters with instructions intended for search engines.

The organization has already stopped consulting robots.txt files on sites and pages related to the U.S. government and the U.S. military, to account for the enormous changes that can be made to domains between one administration and the next. That decision has caused no major problems, so the organization hopes that ignoring the files more broadly will prove similarly helpful.

Brad Jones
Former Digital Trends Contributor
Brad is an English-born writer currently splitting his time between Edinburgh and Pennsylvania. You can find him on Twitter…