Automated Market Report Analysis System

Client: AI | Published: 28.01.2026
Budget: $250

https://manus.im/share/file/d0d026db-0c6c-4ae5-9a45-92bda6f0c845

Project Requirements: Automated Market Report Analysis System
Date: January 27, 2026

1. Project Overview

The objective of this project is to build a fully automated system on Google Cloud Platform (GCP) to perform the following tasks:
• Connect to a designated Google Drive folder where market analysis reports are uploaded daily.
• Read and analyze the content of report files (PDF, DOCX, PPTX formats) using AI (Google Gemini API).
• Extract 16 specific information fields from each report.
• Write the extracted data to a pre-existing Google Sheet for storage and analysis.
• Run automatically at 10:00 AM every day (Asia/Ho_Chi_Minh timezone) with the ability to trigger manually for testing.

2. Detailed Requirements

Functional Requirements

1. Google Drive Integration:
• The system must connect to a Google Drive folder with ID: 1FHWHB5nOcNGQ5yGQZvbk_pDGAqUFZt81.
• Automatically scan for newly added files in this folder.
• Support reading file formats: .pdf, .docx, and .pptx.

2. File Processing and Duplicate Prevention:
• The system must have a mechanism to prevent reprocessing of files that have already been analyzed.
• Recommendation: Record the names of successfully processed files in a column of the Google Sheet and check this list before each run.

3. Data Extraction Using AI:
• Use Google Gemini API (model gemini-1.5-flash or equivalent) to read and understand text content extracted from files.
• Accurately extract the following 16 information fields. If information for a field is not found, the value must be null or left empty.

#   Field Name           Description
1   report_name          Name of the report file (e.g., Daily Report - 20260127.pdf).
2   report_date          Date of the report in YYYY-MM-DD format.
3   source_code          Source code or product code mentioned in the report.
4   commodity_code       Commodity code related to the report.
5   price_outlook_2_3m   Analysis or forecast of price outlook for the next 2-3 months.
6   supply_pressure      Analysis of supply pressure in the market.
7   demand_state         Analysis of market demand state.
8   inventory_state      Analysis of inventory state.
9   upstream_outlook     Upstream market outlook.
10  key_driver           Key factors driving the market.
11  risk_2_3m            Forecasted risks for the next 2-3 months.
12  impact_horizon       Impact horizon of the analysis (short-term, medium-term, long-term).
13  certainty_level      Certainty level of the forecasts (High, Medium, Low).
14  evidence_short       Brief summary of evidence supporting the analysis.
15  report_link          Direct link to view the file on Google Drive (webViewLink).
16  ai_recommendation    Action recommendation based on AI analysis.

4. Write Data to Google Sheet:
• Write extracted data to Google Sheet with ID: 1njNeTZtckEUCzoquyCWmwaQV7tZjVwkxFwQVzaBKePs.
• Data must be written to a sheet (tab) named AI_MARKET_ANALYSIS_SIGNAL.
• Each report corresponds to a new row in the sheet, and each information field corresponds to a column.

5. Scheduling and Triggering:
• The system must automatically run every day at 10:00 AM (Asia/Ho_Chi_Minh timezone).
• Must have the ability to manually trigger the system to run at any time for testing or immediate processing of new files.

Non-Functional Requirements

• Security: Gemini API key and other sensitive information must be securely stored using Google Secret Manager.
• Scalability: The solution must be built on GCP serverless services (Cloud Functions, Cloud Run) to easily scale as the number of reports increases.
• Cost Optimization: Optimize costs by prioritizing free-tier services and low-cost options.
• Logging: Record detailed logs for each run, including processed files, errors encountered, and final results for easy debugging.

3. Proposed Technical Implementation

This is the recommended technical architecture to meet the above requirements efficiently and cost-effectively.

Technology Stack

• Platform: Google Cloud Platform (GCP)
• Programming Language: Python 3.11+
• Compute Service: Google Cloud Functions (Generation 2) - A serverless environment that automatically scales and charges only when code runs.
• Scheduling: Google Cloud Scheduler - A fully managed cron job service to trigger the Cloud Function on schedule.
• Secret Storage: Google Secret Manager - Secure storage for the Gemini API key.
• AI & Machine Learning: Google Gemini API - Used for intelligent information extraction.
• Authentication & Authorization: Google IAM and Service Accounts - Manage Cloud Function access permissions to other services (Drive, Sheets, Secret Manager).

Workflow

1. Cloud Scheduler sends an HTTP request to the Cloud Function endpoint at 10:00 AM every day.
2. Cloud Function is activated.
3. Inside the Cloud Function, Python code will:
   a. Use the assigned Service Account for authentication and authorization.
   b. Connect to Secret Manager to retrieve the Gemini API key.
   c. Connect to Google Sheets API to read the list of already processed files.
   d. Connect to Google Drive API to list all files in the designated folder.
   e. Compare the two lists to identify new unprocessed files.
   f. For each new file:
      i. Download the file to the Cloud Function's memory.
      ii. Extract raw text content from the file (PDF, DOCX, PPTX).
      iii. Send the raw text to Gemini API along with a carefully designed prompt that requests extraction of the 16 information fields and asks for the result as JSON.
      iv. Receive the JSON result from Gemini and add it to a collection list.
   g. After processing all new files, reconnect to Google Sheets API to write all collected results to the AI_MARKET_ANALYSIS_SIGNAL sheet.
4. Cloud Function completes and returns success status (200 OK).
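
Workflow steps 3.e, 3.f.iv and 3.g can be sketched in plain Python, without any GCP calls. This is a minimal sketch under stated assumptions, not a definitive implementation. It assumes that each Drive file is a dict with name and webViewLink keys (the Drive API v3 file metadata fields of those names), that Gemini replies with a bare JSON object keyed by the 16 field names, and that report_name and report_link are filled from Drive metadata rather than by the model. The names FIELDS, find_new_files and build_row are hypothetical.

```python
import json

# Column order for the AI_MARKET_ANALYSIS_SIGNAL sheet, taken from the field table.
FIELDS = [
    "report_name", "report_date", "source_code", "commodity_code",
    "price_outlook_2_3m", "supply_pressure", "demand_state", "inventory_state",
    "upstream_outlook", "key_driver", "risk_2_3m", "impact_horizon",
    "certainty_level", "evidence_short", "report_link", "ai_recommendation",
]


def find_new_files(drive_files, processed_names):
    """Step 3.e: keep Drive files whose name is not yet recorded in the sheet."""
    seen = set(processed_names)
    return [f for f in drive_files if f["name"] not in seen]


def build_row(gemini_json, drive_file):
    """Steps 3.f.iv and 3.g: turn the model's JSON reply into one sheet row."""
    data = json.loads(gemini_json)
    # Assumption: these two fields come from Drive metadata, not from the model.
    data["report_name"] = drive_file["name"]
    data["report_link"] = drive_file.get("webViewLink")
    # Fields that are missing or null become empty cells, as the requirements ask.
    return ["" if data.get(key) is None else str(data[key]) for key in FIELDS]
```

Each call to build_row yields a list of 16 cell values in column order, so the rows for all new files could be collected and written to the sheet in one batch at the end of the run. Matching on file name follows the recommendation in the requirements. Matching on the Drive file ID instead would also cover renamed files, but that is a design choice the requirements do not state.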