Is Web Scraping Legal? The Answer You Need for Ethical Web Scraping!

Justin Shin

Do you want to venture into web scraping but need to understand the legalities involved? Read on to learn when web scraping is legal and when it is not.

When web scraping is mentioned, many see it as a grey area that can quickly turn illegal. Some see it as outright hacking that could get you into legal trouble, while others will want to see it as an act of data theft. But in reality, all of these are misconceptions of what web scraping is all about and how the laws see and interpret it.

If you have been an avid reader of this blog, you will have noticed that in many of our FAQ sections we say web scraping is generally legal whenever the “is web scraping legal” question comes up.

However, we have not explained in detail the situations in which web scraping is considered legal and the situations in which it is illegal. In this article, I will give a comprehensive answer to the question: is web scraping legal? But first, let’s take a look at what is regarded as web scraping.

What is Considered Web Scraping?

Before looking at the legal angle of web scraping, it is important to define what counts as web scraping and what does not. In layman's terms, web scraping is the use of automation tools to scrape or collect data from the Internet.

That definition is too broad, because it also covers collecting data through an API. Here, I will define web scraping as the use of automation tools to scrape or collect data from web pages, not via an API. It involves loading web page content and then using a parser to collect the specific data points one is interested in.
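To make the "load the page, then parse out the data point" idea concrete, here is a minimal sketch using only Python's standard library. The sample page, the `price` class name, and the `PriceParser` and `extract_prices` names are assumptions made up for illustration; a real page would have its own markup.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a price span
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Parse an HTML string and return the price texts it contains."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical page content standing in for a downloaded web page
SAMPLE_PAGE = """
<html><body>
  <div class="item"><h2>Widget</h2><span class="price">$9.99</span></div>
  <div class="item"><h2>Gadget</h2><span class="price">$24.50</span></div>
</body></html>
"""

print(extract_prices(SAMPLE_PAGE))  # ['$9.99', '$24.50']
```

The page is hard-coded here so the sketch runs without touching any website; in practice the HTML would come from an HTTP response.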

Most of these automation tools, known as scrapers, use evasive techniques to avoid detection and blocks. This is quite different from pulling data directly from databases or hacking a website to extract its data. It also does not include using a data API that a website provides to feed you its data.


Web scraping is legal, but not in all cases. There is a general rule of thumb you can use to determine whether your web scraping solution and intended use case are legal. This framework was introduced by Amber Zamora in the paper “Making Room for Big Data: Web Scraping and an Affirmative Right to Access Publicly Available Information Online.” Let’s take a look at the key items.

Legality of Web Scraping

| Factor | Description | Example |
|---|---|---|
| Public data | Scraping public data is more permissible than scraping private data | Scraping business info on Google Maps rather than non-public user profiles |
| Terms of service | Abiding by a site's ToS is key to staying legal | Scraping Amazon product listings even though the ToS prohibits scraping is a violation |
| Volume of data | Scraping modest amounts is safer than scraping large volumes | Scraping 100 posts per day rather than the full history of millions |
| Rate of scraping | Spreading out requests is better than slamming a site | Making one request every second rather than 100 requests per second |
| Use of data | Non-commercial use has fewer restrictions | A personal research project rather than a commercial product |
| Provenance | Citing the data source helps you stay ethical | Clearly stating "Data scraped from Twitter on [date]" |
| Impact on site | Avoid overloading sites with scraping traffic | Scraping at reasonable volumes without slowing the target site's performance |
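The provenance factor is easy to build into a scraper. The following is only a sketch, assuming scraped records are plain dictionaries; the `with_provenance` name and the `provenance` field are hypothetical, not part of any library.

```python
from datetime import date


def with_provenance(records, source, scraped_on=None):
    """Return copies of the records with a provenance note attached.

    `source` is the site the data came from; `scraped_on` defaults to today.
    """
    scraped_on = scraped_on or date.today()
    note = f"Data scraped from {source} on {scraped_on.isoformat()}"
    # Build new dicts so the caller's original records are left untouched
    return [{**record, "provenance": note} for record in records]


rows = with_provenance(
    [{"post_id": 1, "text": "hello"}],
    source="example.com",
    scraped_on=date(2023, 1, 2),
)
print(rows[0]["provenance"])  # Data scraped from example.com on 2023-01-02
```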

No Harm or Considerable Harm to the Website

The first step to engaging in legal and ethical web scraping is making sure no harm is done to the website you are scraping from. If you are a small web scraper, it is highly unlikely that you will be able to cause any damage to big sites like Facebook.

However, you can cause damage to small websites. There are also web scraping projects big enough to damage the servers of big websites. When that happens, your scraping becomes illegal, and sending so many requests that you cause considerable damage can be seen as abuse or even a DDoS attack.

Only Publicly Available Data is Collected

It might interest you to know that the availability of data can determine whether scraping it is legal. In the HiQ Labs vs. LinkedIn case, the court ruled in favor of HiQ Labs, affirming that scraping data from the Internet is allowed provided the data is publicly available. To stay legal in the web scraping business, you should avoid scraping protected pages. This includes password-protected pages as well as paywall-protected pages, as scraping them is considered illegal.

No Copyrighted Content is Scraped

Copyrighted content is one of the most complicated types of content on the Internet. A good deal of online content is copyrighted without you even knowing it. If you intend to scrape copyrighted content, you need to pay close attention to what the law permits. This is because web scraping, as an act, does not seek permission from copyright owners, so your best bet here is the law.

Interestingly, this differs depending on your location. In the EU, scraping copyrighted content is allowed provided you do it for the purpose of generating information, subject to other considerations. This is set out in the DSM Directive.

Under the directive, you are allowed to scrape anything for scientific research, provided it is publicly available. However, if your scraping task is for business, you should check the robots.txt file for its directives. If the URLs you intend to scrape are among the disallowed URLs, you should avoid scraping them. Scraping disallowed URLs with copyrighted content for business purposes is considered illegal.

In the US, scraping copyrighted content is permitted under the fair use doctrine. This is quite similar to what is contained in the DSM Directive. However, unlike the DSM Directive, it makes no distinction between scientific and business use. To know what is considered fair use, I recommend you read the fair use document here.
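Checking robots.txt before scraping can be automated. The sketch below uses Python's standard `urllib.robotparser` module on a hard-coded robots.txt; the file contents, the `my-scraper` user agent, and the example.com URLs are all assumptions for illustration. In a real project you would load the target site's actual robots.txt instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would read the site's own file
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /search
"""


def build_parser(robots_txt):
    """Build a RobotFileParser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(SAMPLE_ROBOTS_TXT)
candidates = [
    "https://example.com/listings/1",
    "https://example.com/private/report",
    "https://example.com/search?q=homes",
]
print(allowed_urls(parser, "my-scraper", candidates))
# ['https://example.com/listings/1']
```

Filtering the URL list up front, rather than checking inside the fetch loop, makes it easy to log exactly which pages were skipped and why.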

The most notable case to date on the fair use doctrine is Authors Guild vs. Google, where the court ruled in favor of Google, holding that making digital copies of books was permitted under fair use. There are other considerations you should take note of, so read the fair use document for more.

Data Scraped is Transformed into Another Product

If you intend to sell the scraped data without transforming it in any way, you will be doing something illegal, as that is not permitted by the law. You are also not allowed to scrape data in order to make a considerably similar product.

For example, if you scrape real estate data for quantitative analysis, you are doing nothing illegal, provided the data is publicly available on a website. However, if you scrape it from a competitor and display it on your own website, you have crossed the legal boundary.


Scraping Personal Data: What is the Position of the Law

While you are allowed to scrape data that is available online, you need to be wary of personal data because of its complexities. In the past, people cared less about their personal data, and you could do almost anything you wanted with it. Governments, for their part, were also not very interested in protecting personal data.

This is no longer the case. Governments are waking up to protecting the personal data of their citizens, and citizens are increasingly interested in what their personal data is used for. So is scraping publicly available personal data legal? To answer this, you need to pay attention to the regulations in the jurisdictions you are interested in.

Scraping Personal Data in Different Jurisdictions

The comparison table below illustrates the legal position on scraping personal data in different regions, along with the relevant laws:

| Region/Country | Position on Scraping Personal Data | Law / Source |
|---|---|---|
| United States | Regulated state by state, for example under California's CCPA, which treats publicly available information differently | CCPA |
| European Union | Restricted under GDPR without consent; fines up to 4% of revenue | GDPR |
| United Kingdom | Typically prohibited under Data Protection Act without consent | Data Protection Act 2018 |
| Canada | Violates PIPEDA privacy laws in most cases without consent | PIPEDA |
| Australia | Breaches Privacy Act in most cases; viewed as collecting sensitive information | Privacy Act 1988 |
| India | Violates privacy rights under the Information Technology Act | Information Technology Act 2000 |
| China | Highly illegal, can lead to fines and criminal charges | China's Personal Information Protection Law |
| Brazil | Violates General Data Protection Law apart from specific exceptions | Lei Geral de Proteção de Dados Pessoais |
| Kenya | Unclear, but Constitution protects right to privacy | Constitution of Kenya |
| Japan | Violates Act on the Protection of Personal Information | Act on the Protection of Personal Information |
| Russia | Requires consent under personal data laws; illegal without it | Federal Law on Personal Data |
| South Korea | Violates Personal Information Protection Act | Personal Information Protection Act |
| Mexico | Violates Federal Law on Protection of Personal Data | Ley Federal de Protección de Datos Personales en Posesión de Particulares |
| Indonesia | Breaches data privacy provisions under Electronic Information and Transactions Law | Electronic Information and Transactions Law |
| Saudi Arabia | Violates Anti-Cyber Crime Law, risks fines and imprisonment | Anti-Cyber Crime Law |
| Israel | Violates Privacy Protection Act | Privacy Protection Act 1981 |
| Nigeria | Violates Nigeria Data Protection Regulation | NDPR |
| South Africa | Infringes Protection of Personal Information Act | POPIA |
| Turkey | Breaches Law on Protection of Personal Data | Law on Protection of Personal Data |
| Argentina | Violates Personal Data Protection Act | Ley de Protección de los Datos Personales |
| Thailand | Violates Personal Data Protection Act | Personal Data Protection Act |
| Singapore | Infringes Personal Data Protection Act | Personal Data Protection Act |
| Philippines | Breaches Data Privacy Act | Data Privacy Act of 2012 |
| Pakistan | Violates Prevention of Electronic Crimes Act | PECA 2016 |
| Malaysia | Contravenes Personal Data Protection Act | Personal Data Protection Act 2010 |
| Vietnam | Violates Law on Cybersecurity | Law on Cybersecurity 2018 |
| United Arab Emirates | Violates Data Protection Law | Data Protection Law |
| Colombia | Violates Habeas Data Act | Ley Estatutaria 1266 de 2008 |
| Chile | Violates Data Privacy Act | Ley de Protección de Datos Personales |
| Peru | Violates Personal Data Protection Law | Ley de Protección de Datos Personales |

Currently, the EU and California in the United States have the most prominent personal data protection laws: the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act (CCPA) in California. The law you should focus on depends on your location and the location of your targets.

If you are in the EU, do business there, or your targets are there, then you should focus on the GDPR. If you reside in California or do business there, focus on the CCPA. If you are not in either of these regions, you should find out what local laws say about scraping personal data.

GDPR Consideration

Under the GDPR, it does not matter where the personal data comes from: you are not allowed to scrape publicly available personal data without a lawful basis such as the person's consent, even if the user made it available online themselves. Had the HiQ Labs vs. LinkedIn case been heard in the EU under the GDPR, this could have been the reason for LinkedIn to win against HiQ Labs. In one case, an EU-based business that scraped personal data from the Polish Business Register was fined.

CCPA Consideration

Interestingly, things are a little different under the CCPA than under the GDPR. Under the CCPA, personal data that has been made publicly available is no longer protected, and that includes the individual's right to opt out of the sale of that information. This makes scraping publicly available personal data legal. You can therefore scrape such data if you are in California, do business there, or your targets are there, which the GDPR would not allow. Other US states, such as Virginia and Colorado, are following suit.


By now, you know that facts are not protected by copyright, since no one can own what are mere observations of reality. However, you still have to be careful when scraping facts because of database protection law. If someone has invested heavily in collecting facts, you cannot simply collect them with automation software and assume you are within the legal boundary.

In the EU, even facts can be protected under the Database Directive, provided their collection, verification, and presentation required a considerable investment. Where that is the case, you can scrape the content only for scientific purposes; for business purposes, you can scrape it only if the owner has not explicitly blocked the URL in the robots.txt file. Things are a little different in the US, where you are allowed to scrape facts provided they are publicly available with no password protection.


How to Scrape Data Legally

From what you have read above, you should now have an overview of what makes web scraping legal or illegal. It is also important to discuss how to scrape data from the web within your legal boundaries so you avoid getting into trouble with the law. Below are some of the considerations to take note of.

Scrape Only Publicly Available Data

One thing about web automation, and programming generally, is that they are not restrictive: you can do whatever you want with them. But to stay within your legal boundaries, you should only scrape publicly available pages. Avoid any data that requires you to log into an account or that is protected by a password. This also includes content hidden behind a paywall.

There are exceptions, though, especially if the site in question permits it. For example, if a subscription site such as OnlyFans allows users to download content from creators they subscribe to, scraping that content within those terms is not illegal.
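A scraper can be written to back off whenever it runs into a login wall or paywall instead of trying to get around it. The sketch below is a rough heuristic under stated assumptions: the status codes are the standard HTTP 401, 402, and 403, while the `LOGIN_PATH_HINTS` list and the `is_gated` name are hypothetical and would need adjusting for each site.

```python
from urllib.parse import urlparse

# Standard HTTP codes: 401 Unauthorized, 402 Payment Required, 403 Forbidden
GATED_STATUS_CODES = {401, 402, 403}

# Hypothetical path prefixes suggesting a redirect to a login or paywall page
LOGIN_PATH_HINTS = ("/login", "/signin", "/subscribe")


def is_gated(status_code, final_url):
    """Guess whether a response points to protected (non-public) content.

    `final_url` is the URL reached after any redirects were followed.
    """
    if status_code in GATED_STATUS_CODES:
        return True
    path = urlparse(final_url).path.lower()
    return any(path.startswith(hint) for hint in LOGIN_PATH_HINTS)


print(is_gated(403, "https://example.com/reports/1"))       # True
print(is_gated(200, "https://example.com/login?next=/r/1"))  # True
print(is_gated(200, "https://example.com/listings"))         # False
```

When `is_gated` returns `True`, the safe choice is to skip the page rather than supply credentials or look for another way in.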

Do Not Overwhelm a Website with Requests

No matter how legitimate your web scraping is, if it causes any damage or harm to your target website or web server, you have crossed the legal line. Whether you can cause harm depends on how powerful the target's servers are and how many requests you send.

If you know you are dealing with a small site that can’t handle a lot of requests, set delays between your requests and avoid sending more than one request at a time. It also helps to scrape your target site at night, when traffic from legitimate users is low, to avoid overwhelming it.
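Spacing out requests and sending them one at a time takes only a few lines. This is a minimal sketch, not a full rate limiter: `fetch_politely` is a hypothetical helper, `fetch` stands for whatever function you use to download a page, and `fake_fetch` is a stand-in so the example runs without contacting any site.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Fetch URLs strictly one at a time, pausing between requests.

    `fetch` is a caller-supplied function that takes a URL and returns the
    page content. `sleep` is injectable so the pacing can be tested without
    actually waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(fetch(url))
    return results


def fake_fetch(url):
    """Stand-in for a real HTTP request (assumption for this example)."""
    return f"<html>content of {url}</html>"


pages = fetch_politely(
    ["https://example.com/a", "https://example.com/b"],
    fetch=fake_fetch,
    delay_seconds=0.1,
)
print(len(pages))  # 2
```

Because the loop is sequential, there is never more than one request in flight, and raising `delay_seconds` is all it takes to go easier on a small site.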

Avoid Scraping Copyrighted and Personal Data in Certain Locations

You are allowed to scrape facts in most cases. But if you know the data you want to scrape is copyrighted, you are better off leaving it alone, especially in certain regions. As you read above, EU law allows you to scrape copyrighted data for scientific research, provided it will be transformed into a different form.

For business purposes, you are advised to respect the robots.txt file. In the US, scraping copyrighted content is allowed provided you stay within the fair use doctrine. If you are in the EU, you are also advised to avoid scraping personal data. Those in the US, however, can scrape personal data that has been made publicly available.

Website Scraping Policies and Anti-Scraping Techniques Comparison

| Website | Scraping Policy | Anti-Scraping Techniques |
|---|---|---|
| Google | Private use allowed, commercial use prohibited | IP blocking, CAPTCHAs, rate limiting |
| Facebook | Most scraping prohibited without permission | Bot detection, blocking |
| Twitter | Public data scraping allowed, limits apply | Rate limiting, firewall rules |
| Amazon | Scraping prohibited without permission | Bot detection algorithms |
| eBay | General scraping prohibited | Monitoring, blocking |
| YouTube | Scraping terms unclear, likely prohibited | Rate limiting, firewall rules |
| Instagram | Scraping terms unclear, likely prohibited | Rate limiting, bot detection |
| Reddit | Scraping discouraged, light non-disruptive scraping allowed | Rate limiting |


FAQs

Q. Must I Respect the GDPR or CCPA Regulations?

The GDPR is meant for those in the EU and those whose target data comes from there. The CCPA is for those in California or who do business there. If you are not in either of the two jurisdictions and don’t collect data from users in those locations, then you need not worry about these regulations, as they don’t apply to you. Instead, you should focus on the local law in your region to know whether what you intend to do in terms of web scraping is legal.

Q. Can I Get into Trouble Web Scraping?

Yes, you can get into trouble scraping data from websites, especially if you disregard the legal angle of web scraping. Web scraping, though legal, can get you in trouble. As with other tasks, it has rules and regulations governing it, and going outside of them is a recipe for disaster.

To avoid getting into trouble, only engage in web scraping after checking what the law permits and confirming that it supports your kind of web scraping. If you like, you can also bring ethical web scraping into the mix, so you only scrape sites that allow it.

Q. Can You Get IP Banned from Web Scraping?

Most popular websites have anti-spam systems in place, and one of their major tasks is to identify IP addresses that send too many requests, which is typical of web scrapers and other bots. When such an IP address is detected, it is blocked either temporarily or permanently, because these sites do not like web scraping. To scrape them successfully, you will need to rotate your IP address frequently. I recommend rotating residential proxies from Bright Data or Smartproxy for an effective web scraping experience without getting blocked.


Conclusion

To conclude this article, I need to tell you that the above is still not an exhaustive answer to the question of whether web scraping is legal. This is just a gentle introduction, and there are many more legal considerations. I am not a legal practitioner, so nothing you read here should be taken as legal advice.

You should seek the help of a competent legal practitioner, as I only seek to explain my own understanding of whether web scraping is legal or not based on the available information at my disposal.
