Extract Text From Websites

Budget: $250

I have a curated list of URLs that I need parsed for their readable text only: no images, no embedded links, no HTML clutter. Once the crawl is finished, I want the raw text returned as straightforward .txt files, one file per source page.

You are free to code in Python, Bash, Node, or any stack you prefer; common libraries such as BeautifulSoup, Scrapy, Selenium, or Playwright are fine so long as the final output meets the spec. Please respect robots.txt, set reasonable delays between requests, and keep the natural order of the text exactly as it appears on the page.

Deliverables
• An executable script (with a brief README) so I can rerun the scrape in the future.
• A zipped folder containing the plain-text results, clearly named after each URL.

Let me know your estimated turnaround time and any technical questions you have so we can get started right away.
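To make the spec concrete, here is a rough sketch of the kind of script that would satisfy it, using only the Python standard library. It is an illustration under assumptions, not a required design: the names `USER_AGENT`, `DELAY_SECONDS`, `extract_text`, `url_to_filename`, and `scrape` are hypothetical, the 2-second delay is a placeholder, and "no embedded links" is read here as "keep the anchor text, drop the URL". It only handles static HTML; pages rendered by JavaScript would need Playwright or Selenium instead.

```python
import re
import time
import urllib.request
import urllib.robotparser
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse

USER_AGENT = "text-extractor-sketch/0.1"  # hypothetical identifier
DELAY_SECONDS = 2.0                       # assumed "reasonable" delay

# Tags whose contents are never readable page text.
SKIP_TAGS = {"script", "style", "noscript", "template", "title"}
# Tags that start or end a visual line; inline tags (a, span, b) do not.
BLOCK_TAGS = {"p", "div", "br", "li", "ul", "ol", "tr", "table", "section",
              "article", "header", "footer", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects text in document order, ignoring markup and skipped tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def extract_text(html):
    """Return the page's text, one non-empty line per block, in page order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def url_to_filename(url):
    """Build a filesystem-safe .txt name from the URL's host, path and query."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    if parsed.query:
        raw += "_" + parsed.query
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return (safe or "page") + ".txt"


def allowed_by_robots(url, cache):
    """Check robots.txt once per site; skip the site if it cannot be read."""
    parsed = urlparse(url)
    root = f"{parsed.scheme}://{parsed.netloc}"
    if root not in cache:
        robots = urllib.robotparser.RobotFileParser(root + "/robots.txt")
        try:
            robots.read()
        except OSError:
            robots = None  # conservative choice: treat as disallowed
        cache[root] = robots
    robots = cache[root]
    return robots is not None and robots.can_fetch(USER_AGENT, url)


def scrape(urls, out_dir="output"):
    """Fetch each allowed URL in order and write one .txt file per page."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    robots_cache = {}
    for url in urls:
        if not allowed_by_robots(url, robots_cache):
            print(f"skipped (robots.txt): {url}")
            continue
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request, timeout=30) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            html = response.read().decode(charset, errors="replace")
        (out / url_to_filename(url)).write_text(extract_text(html), encoding="utf-8")
        time.sleep(DELAY_SECONDS)
```

Calling `scrape()` with the list of URLs would write files such as `example.com_blog_post-1.txt` into `output/`, which can then be zipped for delivery. Two different URLs could in principle map to the same file name after sanitizing, so a real script may want to append a short hash or an index.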

Python
