In our previous post, we showed you how to build a powerful, free web scraper directly within Google Sheets using Apps Script. It was a fantastic solution for automating data collection from simple websites. However, as many of you discovered, the modern web is complex. The moment you try to scrape a dynamic e-commerce site or a portal protected by anti-bot measures, the basic solution hits a wall.
Today, we’re breaking through that wall. We’re going to upgrade our original script into a professional-grade data extraction tool that can handle the challenges of the modern web, all while keeping the convenience of managing everything from your Google Sheet.
Table of contents
- Why Do Simple Scrapers Fail? The Challenge of the Modern Web
- The Solution: Bright Data's Web Unlocker
- Bright Data vs. Apify: A Professional Perspective
- The Upgraded Script: How It Works
- How to Set Up and Use the Upgraded Scraper
- The source code
- New Business Opportunities Unlocked
- Summary: Why This Upgrade is a Game-Changer
Why Do Simple Scrapers Fail? The Challenge of the Modern Web
Websites have evolved. They aren’t just static pages anymore. When a simple scraper like our original one fails, it’s usually for one of these reasons:
- JavaScript-Rendered Content: Many sites load a basic page first and then use JavaScript to fetch and display the actual content (like prices or product details). Google’s UrlFetchApp downloads only that initial HTML and never runs the page’s JavaScript, so it often sees an almost empty page and misses the data you need.
- Anti-Scraping Protections: To prevent abuse and protect their data, websites employ sophisticated defenses. The most common are:
- IP Blocking: If a server detects too many requests from a single IP address (such as the shared addresses of Google’s servers), it may block that address.
- CAPTCHAs: Those “I’m not a robot” tests are designed to stop automated scripts in their tracks.
- Browser Fingerprinting: Websites can check for signs that a request is coming from an automated script rather than a real user’s browser.
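Before reaching for a proxy service, it can help to confirm which of these problems you are actually hitting. The following is a minimal heuristic sketch: `looksBlocked` and `looksJsRendered` are hypothetical helpers, and the status codes and marker phrases are assumptions that will not catch every block page.

```javascript
// Heuristic sketch: guess whether a direct fetch was blocked or returned an
// empty JavaScript shell. Status codes and marker phrases are ASSUMPTIONS,
// not an exhaustive or authoritative list.
const BLOCK_STATUS_CODES = [403, 429, 503];
const BLOCK_MARKERS = ['captcha', 'access denied', 'are you a robot'];

function looksBlocked(statusCode, html) {
  if (BLOCK_STATUS_CODES.includes(statusCode)) return true;
  const body = String(html || '').toLowerCase();
  return BLOCK_MARKERS.some((marker) => body.includes(marker));
}

// Zero matches for a selector that works in your browser, on a page that
// loads scripts, suggests the data is injected later by JavaScript.
function looksJsRendered(matchCount, html) {
  return matchCount === 0 && /<script[\s>]/i.test(String(html || ''));
}

// In Apps Script, with the original direct request:
// const res = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
// looksBlocked(res.getResponseCode(), res.getContentText());
```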
Trying to fight this battle alone is a constant, frustrating game of cat and mouse. The solution is to use a specialized service that has already solved these problems.
The Solution: Bright Data’s Web Unlocker
This is where a service like Bright Data comes in. Bright Data is a leading web data platform that provides the infrastructure needed to access public web data reliably. Instead of making a direct request from Google to the target website, we send our request to Bright Data. They then use their vast network of proxies and their intelligent “Web Unlocker” technology to:
- Route your request through a real residential or mobile IP address, so it looks like it comes from a regular user.
- Automatically solve CAPTCHAs.
- Manage browser fingerprints and cookies.
- Automatically retry failed requests.
Essentially, Bright Data handles the complex blocking issues for you, so you consistently get the clean HTML you need.
Bright Data vs. Apify: A Professional Perspective
When looking for scraping solutions, you’ll often see Apify mentioned. It’s a powerful platform with a marketplace of “Actors” (pre-built scrapers), many developed by the community. This is great, but it can feel less centralized than a service like Bright Data, which I consider to be a more professional, enterprise-focused service.
Both platforms offer tailor-made tools that can scrape specific websites and return structured JSON data. However, for our universal approach, we are using Bright Data’s “Web Unlocker.” This is a general-purpose tool that reliably returns the full HTML content from any URL, giving us maximum flexibility. While Apify has powerful scrapers for specific sites, it doesn’t offer a single, universal tool quite like the Web Unlocker that is designed to simply return the raw HTML from any URL, no matter the protection.
In my opinion, while both services are great, I find Bright Data to be more robust and reliable for business-critical tasks. Their pricing model is also more straightforward. Apify uses a subscription model based on “platform credits,” which can make costs hard to predict. Bright Data’s Pay-As-You-Go plan costs around $1.50 per 1,000 successful requests. This transparency is perfect for our project, and from an integration and cost perspective, I believe Bright Data offers a better value proposition.
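To make the pricing concrete, here is a back-of-the-envelope estimate written as a small JavaScript function (the language Apps Script uses). It assumes the approximate $1.50 per 1,000 successful requests quoted above and that every request succeeds; `estimateMonthlyCost` is a hypothetical helper.

```javascript
// Back-of-the-envelope cost estimate for the Pay-As-You-Go plan.
// ASSUMPTION: roughly $1.50 per 1,000 successful requests (check current
// pricing) and every request succeeds, since only successful ones are billed.
function estimateMonthlyCost(urlsPerRun, runsPerDay, days, pricePer1000) {
  const requests = urlsPerRun * runsPerDay * days;
  return { requests: requests, cost: (requests / 1000) * pricePer1000 };
}

// Example: tracking 200 URLs once a day for 30 days.
const estimate = estimateMonthlyCost(200, 1, 30, 1.5);
console.log(estimate.requests, estimate.cost.toFixed(2)); // prints: 6000 9.00
```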
Outstanding features of Bright Data
- Geolocation Targeting: You can make your requests appear as if they are coming from a specific country, state, or even city. This is essential for scraping localized content, such as regional pricing, local search results, or store availability.
- Scrape as Markdown: The API can return the scraped content directly in a clean Markdown format. This is incredibly powerful for feeding data directly into AI models or generating documentation without needing to parse complex HTML first.
- Return a Screenshot: You can request a visual screenshot of the target page. This is invaluable for visual verification, archiving how a page looked at a specific time, or debugging issues where the layout affects the data.
- Custom Cookies and Headers: The API allows you to send your own custom headers and cookies with a request. This is an advanced feature for mimicking a logged-in user session or a specific type of browser to access data that requires authentication or particular browser settings.
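As a rough illustration of how some of these options might be requested, the sketch below builds a Web Unlocker request body. The field names (`zone`, `url`, `format`, `country`, `data_format`) reflect our understanding of Bright Data's direct API and should be checked against the official API reference. Custom headers and cookies are left out because their exact parameters are not covered here. `buildUnlockerPayload` and the zone name are hypothetical.

```javascript
// Sketch: build the JSON body for a Web Unlocker request with optional features.
// ASSUMPTION: field names follow Bright Data's direct API as we understand it;
// verify them in the official API reference before relying on them.
function buildUnlockerPayload(zone, targetUrl, options = {}) {
  const payload = { zone: zone, url: targetUrl, format: 'raw' };
  if (options.country) {
    // Geolocation targeting with a two-letter country code, e.g. 'us' or 'de'.
    payload.country = options.country.toLowerCase();
  }
  if (options.dataFormat) {
    // 'markdown' for clean text or 'screenshot' for an image of the page.
    payload.data_format = options.dataFormat;
  }
  return payload;
}

// Plain HTML, and the same page as Markdown as seen from Germany.
// 'my_unlocker_zone' is a hypothetical zone name.
const plainHtml = buildUnlockerPayload('my_unlocker_zone', 'https://example.com');
const germanMarkdown = buildUnlockerPayload('my_unlocker_zone', 'https://example.com', {
  country: 'DE',
  dataFormat: 'markdown',
});
```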
The Upgraded Script: How It Works
The evolution of our script goes far beyond just swapping out one API call. We’ve transformed it into a much more general-purpose and professional tool.
- A Complete Workflow Enhancement: While replacing UrlFetchApp with Bright Data was the core change to handle protected sites, we’ve enhanced the entire workflow.
- Scraping Initiated Directly from Sheets: The script is now a bound script, meaning it’s attached directly to your Google Sheet. We’ve added a custom menu item that lets you trigger the entire process with a single click, making the user experience much smoother.
- Extract Any Text, Not Just Prices: We’ve removed all the price-specific logic. The script is now completely generic, capable of extracting any text-based data you point it to, whether it’s a product title, a stock status, a user review, or a news headline.
- Capture Multiple Results from a Single Page: The most significant functional upgrade is the ability to extract multiple elements from a single page. If your CSS selector matches several items (like all product names on a category page), the script will now pull all of them and neatly place each one in a separate column in your sheet.
These changes elevate the script from a simple price tracker into a versatile, robust data extraction engine managed entirely within your Google Sheet.
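The multi-column layout can be sketched with standard Apps Script `Sheet` and `Range` methods. This assumes the URL sits in column A, the selector in column B, and results start in column C; `FIRST_RESULT_COLUMN` and `writeResultsToRow` are hypothetical names, not necessarily what the published script uses.

```javascript
// Sketch: place each extracted value in its own column, starting at column C.
// ASSUMPTIONS: URL in column A, selector in column B, results from column C on.
const FIRST_RESULT_COLUMN = 3;

function writeResultsToRow(sheet, row, values) {
  const lastColumn = sheet.getLastColumn();
  if (lastColumn >= FIRST_RESULT_COLUMN) {
    // Wipe old results so a shorter list doesn't leave stale cells behind.
    sheet
      .getRange(row, FIRST_RESULT_COLUMN, 1, lastColumn - FIRST_RESULT_COLUMN + 1)
      .clearContent();
  }
  if (values.length > 0) {
    // setValues takes a 2D array: one row with values.length columns.
    sheet.getRange(row, FIRST_RESULT_COLUMN, 1, values.length).setValues([values]);
  }
}
```

Clearing the old cells first is a design choice: without it, a page that returns fewer matches than the previous run would leave outdated values at the end of the row.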
From a technical point of view, the Bright Data integration works as follows:
- Takes the target URL you want to scrape.
- Packages it into a request to the Bright Data API.
- Includes your secret API key for authentication.
- Sends the request and returns the clean HTML that Bright Data retrieves.
The rest of our script remains more or less the same!
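The four steps above can be sketched as a single function. This is a minimal sketch, not the script's actual code: the endpoint `https://api.brightdata.com/request`, the Bearer-token header, and the body fields are assumptions based on Bright Data's direct Web Unlocker API and should be verified in their documentation. `fetchHtmlViaBrightData` is a hypothetical name, and the fetcher is passed in (in Apps Script you would pass `UrlFetchApp`) so the logic can be tested outside Google's runtime.

```javascript
// ASSUMPTION: endpoint and body fields mirror Bright Data's direct API.
const BRIGHT_DATA_ENDPOINT = 'https://api.brightdata.com/request';

function fetchHtmlViaBrightData(targetUrl, apiKey, zone, fetcher) {
  const options = {
    method: 'post',
    contentType: 'application/json',
    // Authenticate with your secret API key.
    headers: { Authorization: 'Bearer ' + apiKey },
    // Package the target URL into the request body.
    payload: JSON.stringify({ zone: zone, url: targetUrl, format: 'raw' }),
    // Return error responses instead of throwing, so we can report them.
    muteHttpExceptions: true,
  };
  const response = fetcher.fetch(BRIGHT_DATA_ENDPOINT, options);
  const status = response.getResponseCode();
  if (status !== 200) {
    throw new Error('Bright Data returned HTTP ' + status + ': ' + response.getContentText());
  }
  // The HTML that Bright Data retrieved from the target page.
  return response.getContentText();
}

// In Apps Script (constant names are hypothetical):
// const html = fetchHtmlViaBrightData(url, BRIGHT_DATA_API_KEY, BRIGHT_DATA_ZONE, UrlFetchApp);
```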
How to Set Up and Use the Upgraded Scraper
Getting started involves a straightforward, one-time setup to connect your Google Sheet to the necessary script and the Bright Data API. Follow these steps to get your powerful new scraper up and running.
- Step: Create the Google Sheet and Open Apps Script
First, go to your Google Drive and create a new Google Sheet. From the menu in your new sheet, navigate to Extensions > Apps Script. This will open the script editor in a new browser tab, where you will place the scraper’s code.
- Step: Install the Scraper Code and Required Library
- Paste the Apps Script Code: Scroll down to find the full script. Copy the code, paste it into the Apps Script editor you just opened, and click the ‘Save’ icon.
- Add the Cheerio Library: The script relies on a library called Cheerio to efficiently read and parse the HTML from a webpage, making it easy to extract specific data using a CSS selector.
- In the script editor’s left-hand menu, click the plus icon (+) next to ‘Libraries’.
- You will be prompted for a Script ID. To find this, open a new tab and search Google for “Cheerio Apps Script.” The first result is typically a GitHub page containing the ID.
- Copy the Script ID from the GitHub page, return to your script editor, and paste it into the Script ID field. Click the ‘Look up’ button.
- Select the latest version available from the dropdown menu and click the ‘Add’ button. Cheerio is now successfully linked to your project.
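Once the library is linked, parsing a page is typically a matter of loading the HTML and querying it with a CSS selector. The sketch below assumes the library identifier is `Cheerio` and that the bundled version supports `.toArray()` like standard Cheerio; `extractTexts` is a hypothetical helper, and the library is passed as a parameter only so the function can be tested outside Apps Script.

```javascript
// Sketch: return the trimmed text of every element matching a CSS selector.
function extractTexts(html, selector, cheerioLib) {
  const $ = cheerioLib.load(html);
  return $(selector)
    .toArray()
    .map((el) => $(el).text().trim())
    .filter((text) => text.length > 0); // drop empty matches
}

// Apps Script usage (the selector is a hypothetical example):
// const titles = extractTexts(html, 'h2.product-title', Cheerio);
```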
- Step: Configure the Bright Data API Connection
Our script sends requests to Bright Data, which then retrieves the HTML from the target website on your behalf, getting past the anti-bot systems that would block a direct request.
- Get Your Bright Data API Key:
- Go to the Bright Data website and create an account.
- Once logged in, navigate to ‘Proxies & Scraping Infrastructure’ on the left-side navigation. Click the ‘Add’ button and select ‘Web Unlocker’.
- You will need to set up a ‘zone,’ which is a configuration for your scraping tasks. The default settings are fine for most websites. However, if you plan to scrape highly protected sites (known as ‘premium domains’), you must enable the premium domains setting for your zone.
- After you click ‘Add’ to create the zone, your API key and Zone ID will be generated. Copy both of these.
- Add Credentials to the Script:
- Return to your Apps Script editor.
- Paste the API Key and the Zone ID into their respective placeholder variables at the top of the script.
- Click the ‘Save project’ icon.
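The top of the script might look something like the sketch below. The constant names are hypothetical placeholders; use whatever names the actual script declares. The `assertConfigured` helper is an optional addition that stops the run with a readable error if the placeholders were never replaced.

```javascript
// Hypothetical placeholder names; paste your real values here.
const BRIGHT_DATA_API_KEY = 'PASTE_YOUR_API_KEY_HERE';
const BRIGHT_DATA_ZONE = 'PASTE_YOUR_ZONE_ID_HERE';

// Fail fast with a readable message if the placeholders were never replaced.
function assertConfigured(apiKey, zone) {
  const missing = [];
  if (!apiKey || apiKey.startsWith('PASTE_')) missing.push('API key');
  if (!zone || zone.startsWith('PASTE_')) missing.push('Zone ID');
  if (missing.length > 0) {
    throw new Error('Bright Data ' + missing.join(' and ') + ' not set at the top of the script.');
  }
}
```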
- Step: Run the Scraper and Grant Permissions
You are now ready to begin scraping.
- Refresh your Google Sheet. After reloading, you will see a new custom menu item called ‘Scraper’.
- In your sheet, paste a URL into column A and its corresponding CSS selector into column B.
- Click the ‘Scraper’ menu and select ‘Run Scraper’.
- Authorize the Script: The first time you run it, Google will require your permission for the script to work.
- An ‘Authorization required’ window will pop up. Click ‘Review permissions’.
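- Choose your Google account. You may see a screen that says, “Google hasn’t verified this app.” This is normal for a personal script that hasn’t been through Google’s verification process. Click ‘Advanced’ and then the link to go to your project to continue. The script asks for these permissions because it connects to an external service and modifies your spreadsheet.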
- Review and grant all the necessary permissions to continue.
Now that the setup is complete, you can run the scraper anytime from the custom menu. The script will execute and write the results directly into your sheet, usually within seconds for a handful of rows; longer lists take more time because each URL is a separate request.
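For reference, the menu and the row-by-row loop can be sketched as follows. `onOpen` is Apps Script's standard simple trigger that runs when the spreadsheet opens, which is why the menu appears after a refresh. The rest is an assumption: row 1 is treated as a header, `runScraperOnSheet` is a hypothetical helper, and `scrapeRow(url, selector)` stands in for the real script's fetch-and-extract logic, returning an array of strings.

```javascript
// Simple trigger: adds the 'Scraper' menu whenever the spreadsheet is opened.
function onOpen() {
  SpreadsheetApp.getUi()
    .createMenu('Scraper')
    .addItem('Run Scraper', 'runScraper') // calls a global function by name
    .addToUi();
}

// The menu's runScraper function would then look something like:
// function runScraper() {
//   runScraperOnSheet(SpreadsheetApp.getActiveSheet(), scrapeRow);
// }

function runScraperOnSheet(sheet, scrapeRow) {
  const rows = sheet.getDataRange().getValues();
  for (let i = 1; i < rows.length; i++) { // i = 0 is the assumed header row
    const url = String(rows[i][0]).trim();      // column A
    const selector = String(rows[i][1]).trim(); // column B
    if (!url || !selector) continue;            // skip incomplete rows
    const values = scrapeRow(url, selector);
    if (values.length > 0) {
      // Sheet rows are 1-based, so array index i is sheet row i + 1.
      sheet.getRange(i + 1, 3, 1, values.length).setValues([values]);
    }
  }
}
```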
The source code
New Business Opportunities Unlocked
With this supercharged scraper, you can now build powerful business automation tools that were previously impossible:
- Reliable Price Comparison: Track prices across major e-commerce platforms like Amazon or Walmart without getting blocked.
- Real Estate Deal Alerts: Scrape multiple real estate portals for new listings that match your exact criteria and get notified instantly.
- Lead Generation: Extract business information from protected online directories to build targeted lead lists.
- News & Brand Monitoring: Track news sites, blogs, and forums for mentions of your brand, competitors, or industry keywords to stay ahead of trends and manage your reputation.
- Competitor Website Monitoring: Keep an eye on your competitors’ websites for any changes—from subtle text updates to major redesigns—and get alerted automatically.
- Market & Competitor Analysis: Monitor your competitors’ product stock levels, new product launches, or customer reviews on a daily basis.
Summary: Why This Upgrade is a Game-Changer
By switching from UrlFetchApp to Bright Data, we’ve transformed our simple tool into a robust data-gathering engine. The key advantages are:
- Reliability: Far fewer failed requests and gaps in your data.
- Power: Easily scrapes dynamic, JavaScript-heavy websites.
- Stealth: Bypasses common anti-scraping protections effortlessly.
- Simplicity: All the complexity is handled by Bright Data, while you continue to manage everything from a simple Google Sheet.
You now have a professional-grade web scraping solution at your fingertips, unlocking a new world of data-driven automation possibilities for your business.