Research Standard

Apify

Scrape social media platforms, business data, and e-commerce via Apify actors — Instagram profiles/posts/hashtags/comments, LinkedIn profiles/jobs/posts, TikTok profiles/hashtags/videos, YouTube channels/search/comments, Facebook posts/groups/comments, Google Maps business search with contact/review/image extraction, Amazon products/reviews/pricing, and general-purpose multi-page web crawling with custom pageFunction extraction.

1 workflow · 2 references · 15 triggers · medium effort

The Problem

Scraping social media or business data through a generic AI means drowning the model in raw API responses. A hundred unfiltered Instagram posts hit the context window and cost a fortune in tokens before any analysis starts. Worse, the model has no access to the specialized actors that actually work on each platform: it tries to fetch pages directly, gets blocked immediately, and reports failure. You end up writing your own scripts or paying for a separate tool.

How This Skill Approaches It

The skill covers eight Apify actor families (Instagram, LinkedIn, TikTok, YouTube, Facebook, Google Maps, Amazon, and a general web scraper) with TypeScript wrappers that filter and transform data in code before anything reaches the model context. That is where the 95–99% token savings come from: you ask for 100 posts, but only the top 10 by engagement land in context. Parallel multi-platform queries run via Promise.all, so a social listening dashboard can be built in a single call. The lead enrichment pipeline chains Google Maps (business search with contact extraction) into a qualification filter, then optionally into LinkedIn enrichment. The web scraper handles any site, using a custom pageFunction to extract exactly the fields you need.

  • File-based TypeScript wrappers filter and transform data in code before returning to model context, achieving 95-99% token savings over direct MCP
  • Parallel multi-platform queries via Promise.all for social listening dashboards
  • Lead enrichment pipeline: Google Maps -> qualified filter -> optional LinkedIn enrichment
Not for X/Twitter operations, 4-tier progressive scraping with proxy escalation (use BrightData), parallel headless automation with auth profiles (use Browser), or real-Chrome bot bypass and computer use (use Interceptor)

In Action

What you say to your DA, and what the Apify skill actually does.

  • You say "pull the top posts from these three competitor Instagram accounts and show me what's getting the most engagement"
    Runs three scrapeInstagramProfile calls in parallel via Promise.all, filters each result set in code to the top posts by likes count, and returns only the filtered slice — roughly 500 tokens instead of 50,000.
  • You say "find qualified restaurant leads in Austin with emails and phone numbers, rating above 4.5"
    Runs searchGoogleMaps with scrapeContactInfo enabled, filters results in code to places meeting the rating and review count thresholds that have an email or phone number, and returns a clean lead list — name, rating, contact info, address.

Inside the Skill

The thinking, frameworks, and architecture that distinguish this skill from a generic version of the same task.

What It Does

Scrapes social platforms, business data, and e-commerce through Apify actors: Instagram, LinkedIn, TikTok, YouTube, Facebook, Google Maps business search, Amazon, and general-purpose web crawling. TypeScript wrappers filter and transform the data in code before any of it reaches the model, so a 100-post scrape costs roughly what 10 posts would. Runs platforms in parallel for social-listening dashboards and chains Google Maps into LinkedIn for lead enrichment.

The Problem

Scraping through a raw MCP dumps every unfiltered result straight into model context — a single Instagram profile with 100 posts burns ~52,000 tokens, most of it noise you'll throw away. You usually want the top 10 posts, the negative reviews from the last week, the qualified leads with an email. Doing that filtering after the data hits the model is too late; the tokens are already spent. Filtering in code first cuts that 52,000 down to ~500.

How It Works

This skill is a file-based MCP — a code-first API wrapper that replaces token-heavy MCP protocol calls. You call an actor wrapper, filter and sort the result in TypeScript, and only the filtered slice reaches model context. That code-before-context step is where the 95-99% token savings come from.
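The code-before-context step can be illustrated without calling any actor. The following is a minimal sketch, not the skill's actual implementation: the Post shape, the topByLikes and estimateTokens helpers, and the "about 4 characters per token" heuristic are all assumptions made for illustration.

```typescript
// Hypothetical post shape; real actor output carries many more fields.
interface Post {
  url: string
  caption: string
  likesCount: number
}

// Keep only the n most-liked posts, and only the fields the model needs.
function topByLikes(posts: Post[], n: number): Pick<Post, 'url' | 'likesCount'>[] {
  return [...posts] // copy so the caller's array is not reordered
    .sort((a, b) => b.likesCount - a.likesCount)
    .slice(0, n)
    .map(({ url, likesCount }) => ({ url, likesCount }))
}

// Very rough size proxy: about 4 characters per token (an assumption, not a tokenizer).
function estimateTokens(value: unknown): number {
  return Math.ceil(JSON.stringify(value).length / 4)
}

// Stand-in for a wrapper result: 100 synthetic posts.
const rawPosts: Post[] = Array.from({ length: 100 }, (_, i) => ({
  url: `https://example.com/p/${i}`,
  caption: 'x'.repeat(200),
  likesCount: i * 100
}))

const slice = topByLikes(rawPosts, 10)
console.log(estimateTokens(rawPosts), '->', estimateTokens(slice))
```

Both the row count and the per-row field set shrink before anything is printed, which is why the filtered slice is so much smaller than the raw result.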

Available Actors

Social Media (5 platforms)

  • Instagram (145k users, 4.60★) - Profiles, posts, hashtags, comments
  • LinkedIn (26k users, 4.10★) - Profiles, jobs, posts
  • TikTok (90k users, 4.61★) - Profiles, videos, hashtags, comments
  • YouTube (40k users, 4.40★) - Channels, videos, comments, search
  • Facebook (35k users, 4.56★) - Posts, groups, comments

Business & Lead Generation

  • Google Maps (198k users, 4.76★) - HIGHEST VALUE!
    • Search businesses, extract contacts, reviews, images
    • Perfect for lead generation

E-commerce

  • Amazon (8k users, 4.97★) - Products, reviews, pricing

Web Scraping

  • Web Scraper (94k users, 4.39★) - General-purpose, works with ANY website

Quick Start

Basic Usage Pattern

import { scrapeInstagramProfile } from 'actors'

// 1. Call the actor wrapper
const profile = await scrapeInstagramProfile({
  username: 'target_username',
  maxPosts: 50
})

// 2. Filter in code - BEFORE data reaches model!
const viral = profile.latestPosts?.filter(p => p.likesCount > 10000)

// 3. Only filtered results reach model context
console.log(viral) // ~10 posts instead of 50

Examples by Use Case

Social Media Monitoring

Instagram - Track engagement:

import { scrapeInstagramProfile } from 'actors'

// Get profile with recent posts
const profile = await scrapeInstagramProfile({
  username: 'competitor',
  maxPosts: 100
})

// Filter in code - only high-performing posts from last 30 days
const thirtyDaysAgo = Date.now() - (30 * 24 * 60 * 60 * 1000)
const topRecent = profile.latestPosts
  ?.filter(p =>
    new Date(p.timestamp).getTime() > thirtyDaysAgo &&
    p.likesCount > 5000
  )
  .sort((a, b) => b.likesCount - a.likesCount)
  .slice(0, 10)

// At most 10 posts reach the model instead of 100

LinkedIn - Job search:

import { searchLinkedInJobs } from 'actors'

const jobs = await searchLinkedInJobs({
  keywords: 'AI engineer',
  location: 'San Francisco',
  remote: true,
  maxResults: 200
})

// Filter in code - only senior roles that already have more than 50 applicants
const topJobs = jobs.filter(j =>
  j.seniority?.includes('Senior') &&
  parseInt(j.applicants || '0', 10) > 50 // explicit radix; a missing count is treated as 0
)

TikTok - Trend analysis:

import { scrapeTikTokHashtag } from 'actors'

const videos = await scrapeTikTokHashtag({
  hashtag: 'ai',
  maxResults: 500
})

// Filter in code - only viral content
const viral = videos
  .filter(v => v.playCount > 1000000)
  .sort((a, b) => b.playCount - a.playCount)
  .slice(0, 20)

Lead Generation (Business Intelligence)

Google Maps - Local business leads:

import { searchGoogleMaps } from 'actors'

// Search with contact info extraction
const places = await searchGoogleMaps({
  query: 'restaurants in Austin',
  maxResults: 500,
  includeReviews: true,
  maxReviewsPerPlace: 20,
  scrapeContactInfo: true // Extracts emails from websites!
})

// Filter in code - only highly-rated with email/phone
const qualifiedLeads = places
  .filter(p =>
    p.rating >= 4.5 &&
    p.reviewsCount >= 100 &&
    (p.email || p.phone)
  )
  .map(p => ({
    name: p.name,
    rating: p.rating,
    reviews: p.reviewsCount,
    email: p.email,
    phone: p.phone,
    website: p.website,
    address: p.address
  }))

// Export leads - only qualified results!
console.log(`Found ${qualifiedLeads.length} qualified leads`)

Google Maps - Review sentiment analysis:

import { scrapeGoogleMapsReviews } from 'actors'

const reviews = await scrapeGoogleMapsReviews({
  placeUrl: 'https://maps.google.com/maps?cid=12345',
  maxResults: 1000
})

// Filter in code - only recent, detailed negative reviews
const thirtyDaysAgo = Date.now() - (30 * 24 * 60 * 60 * 1000) // computed once, not per review
const recentNegative = reviews.filter(r =>
  r.rating <= 2 &&
  new Date(r.publishedAtDate).getTime() > thirtyDaysAgo &&
  (r.text?.length ?? 0) > 50 // skip reviews with no text or only a few words
)

// Identify common complaints
const complaints = recentNegative.map(r => r.text)

E-commerce & Competitive Intelligence

Amazon - Price monitoring:

import { scrapeAmazonProduct } from 'actors'

const product = await scrapeAmazonProduct({
  productUrl: 'https://www.amazon.com/dp/B08L5VT894',
  includeReviews: true,
  maxReviews: 200
})

// Filter in code - only negative reviews from the last 7 days
const weekAgo = Date.now() - (7 * 24 * 60 * 60 * 1000) // computed once, not per review
const recentNegative = product.reviews?.filter(r =>
  r.rating <= 2 &&
  new Date(r.date).getTime() > weekAgo
) ?? [] // fall back to an empty list when no reviews were returned

console.log(`Price: $${product.price}`)
console.log(`Rating: ${product.rating}/5`)
console.log(`Recent issues: ${recentNegative.length} complaints`)

Custom Web Scraping

Any Website - Custom extraction:

import { scrapeWebsite } from 'actors'

const products = await scrapeWebsite({
  startUrls: ['https://example.com/products'],
  linkSelector: 'a.product-link',
  maxPagesPerCrawl: 100,
  pageFunction: `
    async function pageFunction(context) {
      const { request, $, log } = context

      return {
        url: request.url,
        title: $('h1.product-title').text(),
        price: $('span.price').text(),
        inStock: $('.in-stock').length > 0,
        description: $('.description').text()
      }
    }
  `
})

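// Filter in code - only available products under $100
const affordable = products.filter(p => {
  // Strip currency symbols, thousands separators, and whitespace ("$1,299.00" -> 1299)
  const price = parseFloat(p.price.replace(/[^0-9.]/g, ''))
  return p.inStock && price < 100 // NaN < 100 is false, so unparseable prices are dropped
})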

Advanced Patterns

Pattern 1: Multi-Platform Social Listening

import {
  scrapeInstagramHashtag,
  scrapeTikTokHashtag,
  searchYouTube
} from 'actors'

// Run all platforms in parallel
const [instagramPosts, tiktokVideos, youtubeVideos] = await Promise.all([
  scrapeInstagramHashtag({ hashtag: 'ai', maxResults: 100 }),
  scrapeTikTokHashtag({ hashtag: 'ai', maxResults: 100 }),
  searchYouTube({ query: '#ai', maxResults: 100 })
])

// Combine and filter - only viral content across all platforms
const allViral = [
  ...instagramPosts.filter(p => p.likesCount > 10000),
  ...tiktokVideos.filter(v => v.playCount > 100000),
  ...youtubeVideos.filter(v => v.viewsCount > 50000)
]

console.log(`Found ${allViral.length} viral posts across 3 platforms`)
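One caveat with Promise.all is that a single rejected actor run rejects the whole batch. A possible way to keep partial results is sketched below, assuming that losing one platform is acceptable. The settle helper and the fakeInstagram and fakeTikTok functions are hypothetical stand-ins so the sketch runs offline; they are not part of the skill. The built-in Promise.allSettled offers the same behavior on ES2020 targets.

```typescript
type Settled<T> =
  | { ok: true; value: T }
  | { ok: false; error: unknown }

// Convert a promise that may reject into one that always resolves.
function settle<T>(p: Promise<T>): Promise<Settled<T>> {
  return p.then(
    (value): Settled<T> => ({ ok: true, value }),
    (error: unknown): Settled<T> => ({ ok: false, error })
  )
}

// Hypothetical stand-ins for the actor wrappers.
const fakeInstagram = async () => [{ likesCount: 12000 }, { likesCount: 300 }]
const fakeTikTok = async (): Promise<{ playCount: number }[]> => {
  throw new Error('actor run failed')
}

async function listen() {
  // Both calls still run in parallel; neither can reject the batch.
  const [ig, tt] = await Promise.all([settle(fakeInstagram()), settle(fakeTikTok())])
  return {
    instagramViral: ig.ok ? ig.value.filter(p => p.likesCount > 10000) : [],
    tiktokViral: tt.ok ? tt.value.filter(v => v.playCount > 100000) : [],
    failed: [ig, tt].filter(r => !r.ok).length
  }
}

listen().then(r => console.log(r))
```

The failed count lets the caller report which platforms were skipped instead of silently returning fewer results.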

Pattern 2: Lead Enrichment Pipeline

import { searchGoogleMaps, scrapeLinkedInProfile } from 'actors'

// 1. Find businesses on Google Maps
const restaurants = await searchGoogleMaps({
  query: 'restaurants in SF',
  maxResults: 100,
  scrapeContactInfo: true
})

// 2. Filter for qualified leads
const qualified = restaurants.filter(r =>
  r.rating >= 4.5 &&
  r.email &&
  r.reviewsCount >= 50
)

// 3. Enrich with LinkedIn data (if available)
const enriched = await Promise.all(
  qualified.map(async (restaurant) => {
    // Try to find LinkedIn company page
    // ... additional enrichment logic
    return restaurant
  })
)

Pattern 3: Competitive Analysis Dashboard

import {
  scrapeInstagramProfile,
  scrapeYouTubeChannel,
  scrapeTikTokProfile
} from 'actors'

async function analyzeCompetitor(username: string) {
  // Gather data from all platforms
  const [instagram, youtube, tiktok] = await Promise.all([
    scrapeInstagramProfile({ username, maxPosts: 30 }),
    scrapeYouTubeChannel({ channelUrl: `https://youtube.com/@${username}`, maxVideos: 30 }),
    scrapeTikTokProfile({ username, maxVideos: 30 })
  ])

  // Calculate engagement metrics in code
  return {
    username,
    instagram: {
      followers: instagram.followersCount,
      avgLikes: average(instagram.latestPosts?.map(p => p.likesCount) || []),
      engagementRate: calculateEngagement(instagram)
    },
    youtube: {
      subscribers: youtube.subscribersCount,
      avgViews: average(youtube.videos?.map(v => v.viewsCount) || [])
    },
    tiktok: {
      followers: tiktok.followersCount,
      avgPlays: average(tiktok.videos?.map(v => v.playCount) || [])
    }
  }
}
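Pattern 3 calls average and calculateEngagement without defining them. One possible definition is sketched below. The ProfileLike shape, the commentsCount field, and the choice of "(likes + comments) per post divided by followers" as the engagement formula are assumptions for illustration, not the skill's documented behavior.

```typescript
// Mean of a list of numbers; returns 0 for an empty list instead of NaN.
function average(values: number[]): number {
  if (values.length === 0) return 0
  return values.reduce((sum, v) => sum + v, 0) / values.length
}

// Assumed minimal profile shape; commentsCount is a hypothetical field.
interface ProfileLike {
  followersCount: number
  latestPosts?: { likesCount: number; commentsCount?: number }[]
}

// Engagement rate = average (likes + comments) per post / followers.
function calculateEngagement(profile: ProfileLike): number {
  if (!profile.followersCount) return 0 // avoid dividing by zero
  const perPost = (profile.latestPosts ?? []).map(
    p => p.likesCount + (p.commentsCount ?? 0)
  )
  return average(perPost) / profile.followersCount
}

const rate = calculateEngagement({
  followersCount: 1000,
  latestPosts: [{ likesCount: 40, commentsCount: 10 }, { likesCount: 50 }]
})
console.log(rate) // 0.05
```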

Token Savings Calculator

Example: Instagram profile with 100 posts

MCP Approach:

1. search-actors → 1,000 tokens
2. call-actor → 1,000 tokens
3. get-actor-output → 50,000 tokens (100 unfiltered posts)
TOTAL: ~52,000 tokens

File-Based Approach:

const profile = await scrapeInstagramProfile({
  username: 'user',
  maxPosts: 100
})

// Filter in code - only top 10 posts
const top = profile.latestPosts
  ?.sort((a, b) => b.likesCount - a.likesCount)
  .slice(0, 10)

// TOTAL: ~500 tokens (only 10 filtered posts reach model)

Savings: 99% reduction (52,000 → 500 tokens)
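The arithmetic behind that figure can be checked with a short sketch. tokenSavings is a hypothetical helper, and the token counts are the approximate ones quoted above.

```typescript
// Percentage reduction when going from `before` tokens to `after` tokens.
function tokenSavings(before: number, after: number): number {
  return ((before - after) / before) * 100
}

// MCP approach: search-actors + call-actor + get-actor-output
const mcpTotal = 1000 + 1000 + 50000
const pct = tokenSavings(mcpTotal, 500)

console.log(`${pct.toFixed(1)}% reduction`) // prints "99.0% reduction"
```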

Actor Reference

Social Media

Instagram

  • scrapeInstagramProfile(input) - Profile + posts
  • scrapeInstagramPosts(input) - Posts from user
  • scrapeInstagramHashtag(input) - Posts by hashtag
  • scrapeInstagramComments(input) - Comments on post

LinkedIn

  • scrapeLinkedInProfile(input) - Profile + experience + email
  • searchLinkedInJobs(input) - Job listings
  • scrapeLinkedInPosts(input) - Posts from profile/company

TikTok

  • scrapeTikTokProfile(input) - Profile + videos
  • scrapeTikTokHashtag(input) - Videos by hashtag
  • scrapeTikTokComments(input) - Comments on video

YouTube

  • scrapeYouTubeChannel(input) - Channel + videos
  • searchYouTube(input) - Search videos
  • scrapeYouTubeComments(input) - Comments on video

Facebook

  • scrapeFacebookPosts(input) - Posts from pages
  • scrapeFacebookGroups(input) - Group posts
  • scrapeFacebookComments(input) - Post comments

Business & Lead Generation

Google Maps

  • searchGoogleMaps(input) - Search places (with contact extraction!)
  • scrapeGoogleMapsPlace(input) - Single place details
  • scrapeGoogleMapsReviews(input) - Place reviews

E-commerce

Amazon

  • scrapeAmazonProduct(input) - Product details + reviews
  • scrapeAmazonReviews(input) - Product reviews only

Web Scraping

General Web

  • scrapeWebsite(input) - Custom multi-page crawling
  • scrapePage(url, pageFunction) - Single page extraction

Configuration

Environment Variables:

# Required - Get from https://console.apify.com/account/integrations
APIFY_TOKEN=apify_api_xxxxx...

Actor Run Options:

{
  memory: 2048,    // MB: 128, 256, 512, 1024, 2048, 4096, 8192
  timeout: 300,    // seconds
  build: 'latest'  // or specific build number
}
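The run options can be typed and checked before a run is started. This is a sketch under the assumption that only the memory sizes listed above are accepted and that build defaults to 'latest'. ActorRunOptions and checkRunOptions are hypothetical names, not part of the skill's API.

```typescript
// Memory sizes listed in the options comment above (MB).
const MEMORY_MB = [128, 256, 512, 1024, 2048, 4096, 8192] as const
type MemoryMb = (typeof MEMORY_MB)[number]

interface ActorRunOptions {
  memory: MemoryMb
  timeout: number // seconds
  build: string   // 'latest' or a specific build number
}

// Validate loosely typed input and fill in the default build.
function checkRunOptions(opts: { memory: number; timeout: number; build?: string }): ActorRunOptions {
  if ((MEMORY_MB as readonly number[]).indexOf(opts.memory) === -1) {
    throw new Error(`memory must be one of ${MEMORY_MB.join(', ')} MB`)
  }
  if (!(opts.timeout > 0)) {
    throw new Error('timeout must be a positive number of seconds')
  }
  return {
    memory: opts.memory as MemoryMb,
    timeout: opts.timeout,
    build: opts.build ?? 'latest'
  }
}

const runOptions = checkRunOptions({ memory: 2048, timeout: 300 })
console.log(runOptions)
```

Failing fast on a bad memory value avoids paying for an actor start that would be rejected anyway.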

When to Use This vs MCP

Use File-Based (this skill):

  • ✅ Need to filter large datasets (>100 results)
  • ✅ Want to transform/aggregate data in code
  • ✅ Multiple sequential operations
  • ✅ Control flow (loops, conditionals)
  • ✅ Maximum token efficiency

Use MCP:

  • ❌ Simple single operations with small results (<10 items)
  • ❌ One-off exploratory queries
  • ❌ Don't want to write code

Gotchas

  • Actor selection matters. Each social platform has specific actors — don't use a generic scraper for Instagram when a dedicated Instagram actor exists.
  • Rate limits vary by platform and plan. Check actor documentation for limits before running large scrapes.
  • Scraped data format varies by actor. Read the actor's output schema before processing results.
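Because output shapes differ between actors, it can help to normalize records before filtering. The sketch below is illustrative only: the field names likesCount and likes, and the idea that counts may arrive as strings such as "1,234", are assumptions rather than documented actor schemas.

```typescript
interface NormalizedPost {
  url: string
  likes: number
}

// Coerce a field that may be a number, a numeric string such as "1,234", or missing.
function toCount(value: unknown): number {
  if (typeof value === 'number' && Number.isFinite(value)) return value
  if (typeof value === 'string') {
    const n = Number(value.replace(/,/g, '').trim())
    return Number.isFinite(n) ? n : 0
  }
  return 0
}

// Hypothetical: one actor reports `likesCount`, another reports `likes`.
function normalizePost(raw: Record<string, unknown>): NormalizedPost {
  const url = raw['url']
  return {
    url: typeof url === 'string' ? url : '',
    likes: toCount(raw['likesCount'] ?? raw['likes'])
  }
}

console.log(normalizePost({ url: 'https://example.com/p/1', likes: '1,234' }))
```

Filters written against NormalizedPost then keep working even when a field is missing or typed differently in one actor's output.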

Examples

Example 1: Scrape Instagram profile

User: "get the recent posts from this Instagram account"
→ Selects Instagram Profile actor
→ Runs with target profile URL
→ Returns structured post data (text, engagement, dates)

Example 2: LinkedIn company scrape

User: "scrape this company's LinkedIn page"
→ Selects LinkedIn Company actor
→ Returns company info, employee count, recent posts

Workflows · 1

  1. Update (Workflows/Update.md)

How to Invoke

Say any of these to your DA and PAI activates the Apify skill automatically:

  • "scrape Instagram"
  • "scrape LinkedIn"
  • "scrape TikTok"
  • "scrape YouTube"
  • "scrape Facebook"
  • "Google Maps leads"
  • "Amazon reviews"
  • "business intelligence"
  • "multi-platform social listening"
  • "competitive analysis"
  • "lead generation"
  • "social monitoring"
  • "Apify actors"
  • "web crawl"

Or invoke explicitly:

Skill("Apify")

References · 2

Auxiliary files the skill loads at runtime — frameworks, guides, configs.

  • INTEGRATION
  • README
