The AI-powered future of web data collection

At the web scraping conference OxyCon 2024, its organizer Oxylabs revealed the first AI copilot for web scraping. It comes as a feature of Oxylabs’ unified Web Scraper API, which serves as an all-in-one public web data scraping platform. Named OxyCopilot, the feature tackles one of the main challenges in the web scraping pipeline — building and maintaining custom data parsers with limited infrastructure, personnel, and time resources.

Oxylabs is a leading web intelligence platform with nearly a decade of experience in the field. OxyCopilot is not the first time the company has led the way in AI implementation for data scraping. In fact, from its inception, Oxylabs put special emphasis on fostering innovation and research and development (R&D). This attitude allowed a once humble startup from Lithuania to become a leading force in web data extraction recognized by the world’s top brands. However, it also meant encountering multiple challenges on its path.

What is OxyCopilot, and who is it for?

Integrating Oxylabs’ scraper APIs into one platform — Web Scraper API — provides users a one-stop shop for all their public web data needs. OxyCopilot is an AI-powered feature of this unified platform that helps users create requests for the API and build custom parsers.

The copilot understands natural language prompts and needs only a URL, or several URLs belonging to the same domain, to create parsing instructions. In other words, users provide the URL and describe in plain language what data they want from a particular domain. The AI copilot then produces parsing instructions in minutes, and feeding these instructions to the API promptly returns the parsed data. Anyone with a basic knowledge of web scraping can quickly learn to use it.

However, OxyCopilot was not planned in detail from the beginning. It did not come as a sudden stroke of genius. Instead, there were many ideas and iterations along the way, as well as multiple challenges and eureka moments in the history of Oxylabs that eventually led the company to create the unified Web Scraper API with an AI-powered assistant.

The path to innovation

Founded in 2015, Oxylabs faced the challenge common to early-stage startups: You must be inventive and dare to do things unconventionally to make a name for yourself. What amplified the challenge was that the web scraping and proxy solutions industry at the time was young, niche, and often overlooked or misunderstood.

Thus, it fell to industry pioneers to explain that only publicly available data is collected, implement a strict know-your-customer (KYC) policy, and differentiate themselves from those who do not follow legal and ethical norms. Being original and unconventional when looking for technical solutions had to be balanced with, and limited by, following and promoting ethical conventions. Nevertheless, choosing this path proved right for Oxylabs. After a while, it allowed the company to build proprietary knowledge about clients’ needs and possible solutions that few companies possessed.

This competitive advantage cemented Oxylabs’ leading status. However, it brought new challenges, one of which is that such status can lead to deceptive comfort and inertia that stifle innovation.

Recognizing the risk, Oxylabs decided to address it with a policy that rewards thinking outside the box. It includes an inventor’s bonus, available to any employee who proposes a feasible, innovative idea; full support for inventors during the patenting procedure; and regular meetings for innovation mining. Effective implementation of these measures allowed Oxylabs to build a portfolio of over 100 patents and be named one of the best workplaces for innovators.

Implementing AI for data scraping

While Oxylabs was maturing as a company, AI capabilities were developing rapidly. These developments presented an important research direction for the company at the forefront of the industry, aiming to address the most pressing web scraping challenges. Understanding this, in 2020, Oxylabs established an AI and machine learning (ML) advisory board consisting of AI researchers and developers who worked with the world’s leading tech companies.

By 2021, Oxylabs had already experimented with AI in areas like proxy management, response recognition, and dynamic fingerprinting. These explorations led to important discoveries and implementations now unified in Web Scraper API. However, parsing, although on the surface a simpler task than web unblocking, was an area where AI’s benefits were yet to be discovered. To understand why automating parsing with AI proved challenging, one needs to look closer at the peculiarities of the parsing process.

The struggle with parsing

Web data parsing is the process of extracting unstructured data from HTML and structuring it to make it analyzable. While parsing is done by specialized tools known as data parsers, building and maintaining these tools proves challenging. Website layouts change, which causes parsers to break, delaying important business procedures. According to a recent survey, 57% of developers fix parsers several times a week for this reason, and 31% do it every day.

The survey, conducted by Oxylabs and Censuswide, covered scraping professionals in two of the biggest markets for public web data, the USA and the UK. It found that parsing takes up 10 to 40 hours a week for 75% of scraping professionals. Meanwhile, if parsing is interrupted, 95% of businesses face a negative impact within 24 hours.
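
To make the breakage concrete, here is a minimal sketch of a hand-written parser built on Python's standard `html.parser` module. The markup, the class names, and the `extract_price` helper are invented for illustration and are not taken from any Oxylabs product. The parser works on the layout it was written for and silently returns nothing once the site renames one element.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collects the text of the first <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # The parser is hard-wired to one tag name and one class name.
        if self.price is None and tag == "span" and dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.price = data.strip()
            self._in_price = False


def extract_price(html):
    """Return the price text, or None when the expected element is missing."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.price


# Hypothetical page before and after a redesign.
OLD_LAYOUT = '<div><h1>Desk lamp</h1><span class="price">19.99</span></div>'
NEW_LAYOUT = '<div><h1>Desk lamp</h1><div class="product-price">19.99</div></div>'

print(extract_price(OLD_LAYOUT))  # 19.99
print(extract_price(NEW_LAYOUT))  # None
```

The data is still on the page after the redesign, but the parser no longer finds it, which is the kind of failure that forces developers to keep fixing parsers by hand.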

The survey only confirmed what was clear to Oxylabs years ago: AI could bring great value to developers and businesses if it could automate parser building, generate instructions rapidly, and thus expand the range of people who can handle web scraping tasks. Even before the AI boom of 2022, Oxylabs released products capable of ML-driven adaptive parsing. Additionally, explorations in this direction led to Oxy Parser, an LLM-based open-source product that can automatically parse HTML into Pydantic models.

Building the first AI-powered copilot for parsing

Encountering tools like ChatGPT, capable of interacting with the user through natural language prompts, made Oxylabs’ developers eager to utilize the large language models (LLMs) underlying these tools. However, creating such an assistant for data parsing was far from straightforward, even for some of the best minds in the public web data gathering industry.

The main problem that put the development of an AI copilot for parsing on hold at the end of 2023 was generating XPath, which works as a roadmap to finding specific webpage elements in HTML. The goal of OxyCopilot was to enable the client to automatically generate parsing templates for the domains they want to scrape. With this functionality, they could have all the parsing done on Oxylabs’ side, removing the necessity of using internal server resources. However, if the tool couldn’t find the XPath and put it in the parsing instructions’ template, there was no chance of automating this process.
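
The idea of a parsing template built from XPath expressions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TEMPLATE` layout, the `apply_template` function, and the sample page are hypothetical and do not reproduce the format Oxylabs uses. The standard library's `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, so the sample page is written as clean XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product page.
PAGE = """<html><body>
<h1>Desk lamp</h1>
<span class="price">19.99</span>
<ul id="specs"><li>LED</li><li>USB-C</li></ul>
</body></html>"""

# Hypothetical template: each output field maps to one XPath expression.
TEMPLATE = {
    "title": ".//h1",
    "price": ".//span[@class='price']",
    "specs": ".//ul[@id='specs']/li",
}
LIST_FIELDS = {"specs"}  # fields that should keep every match


def apply_template(markup, template, list_fields=frozenset()):
    """Evaluate each XPath in the template and return a structured record."""
    root = ET.fromstring(markup)
    record = {}
    for field, path in template.items():
        matches = [(el.text or "").strip() for el in root.findall(path)]
        if field in list_fields:
            record[field] = matches
        else:
            record[field] = matches[0] if matches else None
    return record


print(apply_template(PAGE, TEMPLATE, LIST_FIELDS))
# {'title': 'Desk lamp', 'price': '19.99', 'specs': ['LED', 'USB-C']}
```

Applying a template like this is mechanical. The hard part the paragraph describes is producing the right XPath for each field automatically, since a missing or wrong path leaves the field empty.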

In the spring of 2024, a solution was found, followed by three months of intense work by three seasoned ML engineers. OxyCopilot uses novel logic, which Oxylabs is currently patenting. [2] The company already had proprietary technology for generating parsing templates, and it became the stepping stone for automating the parsing process without the excessive costs associated with using LLMs. Instead of calling LLMs for each request, Oxylabs’ copilot generates parsing templates based on URLs and natural language prompts.
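
The cost argument can be illustrated with a small sketch: if a template is generated once per domain and prompt and then reused, the expensive generation step does not run for every scraped page. Everything here is an assumption made for illustration, including the `TemplateCache` class, the `fake_generate` stand-in for an LLM-backed generator, and the example URLs. It does not describe the logic Oxylabs is patenting.

```python
from urllib.parse import urlparse


class TemplateCache:
    """Generates a parsing template once per (domain, prompt) and reuses it."""

    def __init__(self, generate):
        self._generate = generate  # the expensive step, e.g. an LLM-backed generator
        self._templates = {}
        self.generator_calls = 0

    def template_for(self, url, prompt):
        key = (urlparse(url).netloc, prompt)
        if key not in self._templates:
            self.generator_calls += 1
            self._templates[key] = self._generate(key[0], prompt)
        return self._templates[key]


def fake_generate(domain, prompt):
    # Stand-in for the costly generation step; returns a fixed hypothetical template.
    return {"title": ".//h1"}


cache = TemplateCache(fake_generate)
urls = [f"https://shop.example.com/item/{i}" for i in range(1000)]
for url in urls:
    cache.template_for(url, "get the product title")

print(cache.generator_calls)  # 1
```

In this sketch a thousand pages from one domain trigger a single generation call, while every later request only applies the stored template.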

To be continued

Facing a challenge leads to struggle, struggle leads to innovative solutions, and those solutions lead to new challenges. This was Oxylabs’ path from its inception in 2015 to launching Web Scraper API, the all-in-one scraping platform, and its AI-powered OxyCopilot in 2024.

This path, however, has no end in sight. Instead, further improvements on OxyCopilot and the entire scraping platform, new AI applications for web scraping, and other innovations are on the horizon. Thus, web scraping professionals around the world have plenty to be excited about when thinking about the future.

Digital Trends partners with external contributors. All contributor content is reviewed by the Digital Trends editorial staff.
Chris Gallagher
Chris Gallagher is a New York native with a business degree from Sacred Heart University, now thriving as a professional…