Troubleshoot

While DigitalFate is designed to be as reliable as possible, issues can occasionally arise during setup or execution. This troubleshooting guide provides solutions to common problems and offers advice for resolving more complex errors.


📋 Common Issues

1. API Key Errors

Symptoms:

  • Errors like Invalid API Key or Authentication failed.

  • Inability to connect to LLM providers.

Solution:

  • Double-check that you have entered your API key correctly in the configuration.

  • Ensure that your key is active and has appropriate permissions.

  • For multiple key configurations (OpenAI, Anthropic, etc.), verify that each key corresponds to the correct provider.

    client.set_config("OPENAI_API_KEY", "YOUR_API_KEY")

  • Make sure you are using the right endpoint for your provider (for example, Azure OpenAI or AWS Bedrock).
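Many key errors come from a missing environment variable or from stray whitespace and quotes copied along with the key. The following is a minimal sketch of a pre-flight check, assuming you keep keys in environment variables; `read_api_key` is a hypothetical helper, not part of DigitalFate.

```python
import os


def read_api_key(name, env=None):
    """Return the key stored under `name`, cleaned of whitespace and quotes.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        raise KeyError(f"{name} is not set")
    # Remove surrounding whitespace first, then any quotes pasted with the key.
    key = raw.strip().strip('"').strip("'")
    if not key:
        raise ValueError(f"{name} is set but empty")
    return key
```

The cleaned value can then be passed to the configuration call shown above, for example client.set_config("OPENAI_API_KEY", read_api_key("OPENAI_API_KEY")).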


2. Failed Task Execution

Symptoms:

  • Tasks are stuck or return no result.

  • Error: Task execution failed.

Solution:

  • Check task descriptions: If your task requires too much context, consider breaking it into smaller sub-tasks or reducing the complexity.

  • Tool Compatibility: Ensure that the tools required for the task are properly loaded and compatible with the agent.

    task1 = Task(description="Research latest news in AI", tools=[Search])
    client.agent(agent, task1)

  • Check agent configuration: Verify that the agent is properly configured with the correct permissions, and that it's able to access the necessary tools.


3. Incorrect Agent Responses

Symptoms:

  • Agent returns irrelevant or incorrect answers.

  • Inconsistent responses between runs.

Solution:

  • Context issues: Ensure the context you provide to the agent is clear, concise, and relevant to the task at hand.

  • Memory & State: If the agent is stateful, ensure that its memory configuration is correct and that stale or conflicting state from earlier runs is not influencing its answers.

    product_manager_agent = AgentConfiguration(
        memory=True,  # Enable the agent's memory
    )

  • Agent configuration: Review your agent's job title, knowledge base, and task instructions to ensure they match the problem domain.


4. Tool Integration Problems

Symptoms:

  • Tools fail to execute properly.

  • Errors like Tool not found, or Tool failed to run.

Solution:

  • Install dependencies: Some tools may require external dependencies. Ensure all necessary libraries are installed.

    pip install "digitalfate[tools]"

  • Verify tool compatibility: Check that the tool you are using is compatible with the version of DigitalFate installed.

  • Custom tools: If you're using custom tools, ensure your implementation follows the required structure and methods. For example, ensure your tool class inherits from digitalfate.client.tools.Tool.
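To find out which dependency is missing before a tool fails at run time, you can ask Python whether each module is importable. This is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module names you pass depend on the tools you use.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]
```

Pass the top-level import names your tools rely on; any name returned in the list still needs to be installed.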


5. Docker Deployment Issues

Symptoms:

  • Errors related to Docker container deployment.

  • Container fails to start.

Solution:

  • Ports: Ensure that the necessary ports are correctly mapped and available. DigitalFate typically runs on port 8000.

    docker run -p 8000:8000 digitalfate/server

  • Check logs: Look at the container logs for more detailed error messages. Run:

    docker logs <container_id>

  • Memory issues: Ensure your local machine has enough resources to run the container. DigitalFate may need additional memory if you're running large models or multiple agents simultaneously.
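A container that fails to start is often caused by another process already using the host port. The sketch below checks whether anything is listening on a local port before you start the container. `port_is_free` is a hypothetical helper, and 8000 is the port mentioned above.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) != 0
```

If port_is_free(8000) returns False, stop the process using the port or map the container to a different host port, for example -p 8080:8000.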


6. API Limit Errors

Symptoms:

  • Errors like Rate limit exceeded or Quota reached.

Solution:

  • Check API limits: Verify the rate limits set by your provider. Some providers may impose limits on how many requests you can make per minute/hour/day.

  • Backoff strategy: Implement a backoff strategy to retry requests after a certain period.

Example for backoff strategy:

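    import time

    # RateLimitError is a placeholder for the rate-limit exception raised by
    # your provider's SDK; import it from there.
    try:
        client.call(task1)
    except RateLimitError:
        time.sleep(60)  # Wait 60 seconds, then retry once
        client.call(task1)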
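
A fixed 60-second wait retries only once. A more robust variant is exponential backoff with jitter, where the delay doubles after each failure up to a cap. The following is a generic sketch, not a DigitalFate API: `call_with_backoff` and its parameters are hypothetical, and you supply the exception type to retry on.

```python
import random
import time


def call_with_backoff(func, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry with a growing delay.

    The delay doubles each attempt (base_delay, 2x, 4x, ...) up to max_delay,
    plus up to 10% random jitter. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
```

With the names from the example above, a call might look like call_with_backoff(lambda: client.call(task1), retry_on=(RateLimitError,)).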

7. Cloud Integration Issues

Symptoms:

  • Deployment failures on AWS, GCP, or Azure.

  • Errors when connecting to cloud resources.

Solution:

  • Check cloud credentials: Ensure that your cloud API keys (AWS_ACCESS_KEY_ID, GCP credentials, etc.) are correctly configured.

  • Cloud-specific configurations: Each cloud provider requires specific configurations such as region settings or service endpoints.

Example for AWS setup:

    client.set_config("AWS_ACCESS_KEY_ID", "YOUR_AWS_ACCESS_KEY_ID")
    client.set_config("AWS_SECRET_ACCESS_KEY", "YOUR_AWS_SECRET_ACCESS_KEY")
    client.set_config("AWS_REGION", "YOUR_AWS_REGION")

  • Cloud logs: Review cloud logs for any deployment errors. On AWS, check CloudWatch logs. On GCP, check Cloud Logging (formerly Stackdriver).


8. Performance Issues

Symptoms:

  • Slow execution times.

  • Latency or timeout errors during task execution.

Solution:

  • Optimize task complexity: Reduce the complexity of tasks by breaking them down into smaller sub-tasks, especially if they involve large data sets or complex models.

  • Upgrade resources: If you're running DigitalFate locally, consider increasing available CPU or memory. On cloud platforms, consider upgrading to more powerful instances.

  • Task parallelization: Use task chaining or multi-agent collaboration for parallel execution to optimize performance.

    client.multi_agent([agent1, agent2], [task1, task2])
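
If your tasks are independent and spend most of their time waiting on network calls, running them concurrently from your own code is another option. This is a generic sketch with the standard library, not a DigitalFate feature: `run_in_parallel` is a hypothetical helper, and it assumes the function you pass is safe to call from several threads at once, which this guide does not confirm for the client.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(run_one, items, max_workers=4):
    """Apply run_one to each item concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, items))
```

Threads help with I/O-bound work such as waiting for an LLM provider to respond; they do not speed up CPU-bound Python code.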

🧑‍💻 Debugging Tips

1. Enable Verbose Logging

To get more detailed error messages, enable verbose logging in your configuration:

    import logging

    logging.basicConfig(level=logging.DEBUG)

This will print debug information for every task, tool, and agent interaction.
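
Setting the root logger to DEBUG also turns on debug output from every other library that uses Python's logging module, which can be noisy. The sketch below enables DEBUG for a single named logger and can optionally write to a file. `enable_debug_logging` is a hypothetical helper, and the logger name to pass (for example "digitalfate") is an assumption; check which logger names appear in your output.

```python
import logging


def enable_debug_logging(logger_name, log_file=None):
    """Set one named logger to DEBUG and attach a timestamped handler."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    # Write to a file when a path is given, otherwise to stderr.
    handler = logging.FileHandler(log_file) if log_file else logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Call the helper once at startup; calling it repeatedly for the same logger adds duplicate handlers.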

2. Check Dependencies

Ensure all dependencies are installed correctly:

    pip install -r requirements.txt

3. Update DigitalFate

Keep DigitalFate up-to-date to benefit from bug fixes and new features:

    pip install --upgrade digitalfate

🧑‍🔧 Advanced Debugging

If the above solutions do not resolve your issue, you may need to perform a deeper analysis of the problem:

  1. Trace the API calls: Log every request made to the API, including headers, payloads, and responses.

  2. Review agent reasoning flow: If an agent is failing to respond correctly, manually inspect its reasoning loop and the sequence of tasks it's executing.

  3. Check for system-level issues: If running on a server or cloud environment, ensure there are no hardware or network problems affecting performance.
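
Tracing calls, as in step 1, can be sketched with a small decorator that logs arguments, results, and timing for any function you wrap. This is a generic illustration, not a DigitalFate API: `traced`, `redact`, and the list of sensitive key names are hypothetical. Only keyword arguments are redacted, so pass secrets by keyword if you use it.

```python
import functools
import logging
import time

log = logging.getLogger("api_trace")

# Keyword names whose values should never appear in logs (assumed list).
SENSITIVE = {"authorization", "api_key", "x-api-key"}


def redact(mapping):
    """Return a copy of mapping with sensitive values replaced by '***'."""
    return {k: ("***" if k.lower() in SENSITIVE else v) for k, v in mapping.items()}


def traced(func):
    """Log each call to func with its arguments, result, and duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        log.debug("call %s args=%r kwargs=%r", func.__name__, args, redact(kwargs))
        try:
            result = func(*args, **kwargs)
        except Exception:
            log.exception("%s failed after %.3fs", func.__name__,
                          time.perf_counter() - start)
            raise
        log.debug("%s returned %r in %.3fs", func.__name__, result,
                  time.perf_counter() - start)
        return result
    return wrapper
```

Wrap the function that sends requests with @traced and set the "api_trace" logger to DEBUG to see the output.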
