Large-Scale PDF Scraping & Translation: WordPress Website Development (or a Custom PHP/MySQL Solution)

MOST IMPORTANT: Before placing any bid, you must contact me privately to receive the link to the website from which the PDF documents will be scraped and downloaded. Do not place a bid before contacting me.

We're looking for an experienced developer to build a scalable system that automatically scrapes, translates, and publishes a very large volume of PDF documents as SEO-friendly web pages.

1. Data Extraction & Processing
• Automatically scrape and download all PDF files from a publicly available website.
• Extract the text content from each PDF.
  1a. Pay special attention to extracting the title of each document; it will become the article title (see section 2, Content Publishing).
  1b. Remove any personal data (especially on the first pages of the documents) so that it is neither extracted nor published.
• Translate the extracted text using an open-source translator (or another free, reliable translator).
________________________________________
2. Content Publishing
• For each translated file, create a new article/page on the website: each PDF file => AI translation => one SEO-friendly page.
• Technology: WordPress (or a custom PHP/MySQL solution).
• Text must be stored in the database (not embedded in iframes) for full SEO rendering. The complete text must be visible as standard HTML text.
________________________________________
3. SEO & Indexing
• Auto-generate a unique meta title and meta description for every page (fully crawlable and indexable).
• Use clean, descriptive URLs (e.g. /category/document-title-keywords).
• Each page should include: title, tags, meta description, and the full HTML/text content.
• Implement an XML sitemap.
________________________________________
4. Security & Reliability
• Anti-scraping and anti-DDoS protection.
• DMCA/copyright system: include a DMCA / Copyright Notice & Takedown Contact section on the website where users can submit requests to remove copyrighted material they believe has been published without authorization.
________________________________________
5. Performance Targets
• Fast page loads and a mobile-first, responsive design.
• Page load time: under 2.5 seconds (desktop and mobile).
• Google PageSpeed Insights performance score (Core Web Vitals): 90+.
• TTFB: under 500 ms.
________________________________________
6. Search & Navigation
• Search bar with filters (categories, tags, keywords).
• Fast search results with filtering options.
• Browsing by category.
• Support for multiple category levels (category, subcategory, sub-subcategory).
• All pages must be free to read and browse for all visitors.
________________________________________
7. Scalability
• Implement a scalable architecture that handles a large volume of content efficiently.
• The system/script must be able to automatically scrape/download, translate, and publish the initial 220,000+ PDF documents as indexable web pages at launch.
________________________________________
Payment Terms:
- 100% of the payment will be placed in escrow on Freelancer.com.
- Payment will be released only after the project is fully functional on the live server and all requirements are met.
- Proof required: a production-ready website, hosted and running on the client's live domain and server, with all 220,000 initial documents uploaded and accessible, achieving a Google PageSpeed Insights performance score (Core Web Vitals) of 90+ on the Document Page CPT (Custom Post Type).
________________________________________
Deliverable:
A complete, production-ready website/system meeting all of the above requirements.
The system must be:
- Fully installed, configured, and functional on the client's own domain and hosting/server;
- Delivered with all of the initial 220,000 documents uploaded, indexed, and publicly accessible;
- Optimized for performance and stability according to the agreed technical specifications;
- Structured and coded so that it can easily be customized and duplicated for future websites with similar functionality.
________________________________________
!!!!! PLEASE READ BEFORE BIDDING !!!!!
Do not bid if you do not have the skills to complete this project. YOU NEED SKILLS & EXPERIENCE with large-scale PDF scraping & translation WordPress website development.
!!!!! Do not bid if you have never done this before. This should be a simple project for someone who knows what they are doing. !!!!!

TO APPLY:
- Place your real bid amount, not a placeholder or a random sum. Bid what you actually want me to pay you: I do not want to waste time renegotiating, so do not ask for more money later. Time-wasters, please do not bid. No generic bids. I will choose based on the content of your bid.
- Please DO NOT bid if you haven't read the full job description. Start your proposal with the phrase "The sun was pink today" in the first line; otherwise, it will not be considered. This confirms that you have read the full description. My time is just as important as yours, and I don't want us to waste each other's time.
- Please DO NOT send copy-paste automated messages or automated bids.

Questions & Clarifications:
Ask any questions or request clarifications before placing your bid. Do NOT ask questions or request clarifications after bidding.

!!!!! Everything above is required for your bid to be considered !!!!!