If you’ve got rich data sitting in your product or marketing database - subscriptions, payments, churn events, or internal metrics - and you’re using PostHog for analytics, you’ve probably wondered:
“How do I get this data into PostHog so I can actually use it?”
This guide walks through how to sync data from any SQL database (like Redshift, Postgres, MySQL, or BigQuery) into PostHog - step by step.
Why You’d Want to Do This
Your database is where all data is recorded. PostHog is where you can visualize that data.
When you connect the two, you bridge the gap between your backend systems and your analytics - so you can answer questions like:
- How many paying users actually used the feature they subscribed for?
- Which accounts churned after a billing error?
- Are certain payment types linked to higher retention?
Once your database rows turn into PostHog events, you can start analyzing them alongside product behavior. No more CSV exports, no manual joins, no guessing.
Step 1: Identify What to Sync
Start by looking at your database tables and asking: which rows represent meaningful actions?
Examples:
| Table | Example Row | Event to Send to PostHog |
|---|---|---|
| subscription_events | user_id=123, event='renewed', timestamp='2025-10-30' | subscription_renewed |
| payments | payment_id=abc123, amount=49.99, status='successful' | payment_successful |
| refunds | refund_id=xyz789, reason='user_cancelled' | refund_issued |
If your table describes something that happened, it’s a perfect candidate for an event in PostHog.
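One way to keep that decision explicit is a small mapping in your sync script. The sketch below is purely illustrative and assumes the three example tables above; in practice the event name may also be derived from a column on the row, as the subscription_events example shows.

```python
# Illustrative mapping of source tables to the PostHog events they produce.
# Table and event names come from the examples above - adapt them to your schema.
TABLE_TO_EVENT = {
    "subscription_events": "subscription_renewed",
    "payments": "payment_successful",
    "refunds": "refund_issued",
}
```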
Step 2: Pull Data from Your Database
Here’s an example query to fetch new events from a table:
```sql
SELECT
    user_id,
    event_type,
    event_timestamp,
    amount,
    country,
    plan_id
FROM subscription_events
WHERE event_timestamp >= NOW() - INTERVAL '1 day';
```
This pulls every event from the last 24 hours, which is perfect for a daily sync job.
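If your source is Postgres, a minimal way to run that query from Python looks like the sketch below. It assumes the psycopg2 driver and a connection string in a DATABASE_URL environment variable; swap in the client library for your own warehouse.

```python
import os

import psycopg2
import psycopg2.extras

QUERY = """
    SELECT user_id, event_type, event_timestamp, amount, country, plan_id
    FROM subscription_events
    WHERE event_timestamp >= NOW() - INTERVAL '1 day';
"""

def fetch_recent_rows():
    # DATABASE_URL is an assumed environment variable, e.g. postgres://user:pass@host:5432/db
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        # RealDictCursor returns each row as a dict, which keeps the mapping in Step 3 readable.
        with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
            cur.execute(QUERY)
            return cur.fetchall()
    finally:
        conn.close()
```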
Step 3: Map Rows to PostHog Events
Each database row becomes one PostHog event.
Here’s what that looks like in Python:
```python
event = {
    "event": row["event_type"],
    "distinct_id": row["user_id"],  # use your user_id or equivalent
    "timestamp": row["event_timestamp"].isoformat(),
    "properties": {
        "amount": row["amount"],
        "country": row["country"],
        "plan_id": row["plan_id"],
        "source": "database_sync",
    },
}
```
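The batch call below expects a list named events, so it helps to wrap that mapping in a small helper and apply it to every row from Step 2. This is a sketch assuming the rows come back as dictionaries (e.g. from the RealDictCursor above):

```python
def row_to_event(row):
    # Mirrors the mapping above; "source" tags events that came from this sync job.
    return {
        "event": row["event_type"],
        "distinct_id": row["user_id"],
        "timestamp": row["event_timestamp"].isoformat(),
        "properties": {
            "amount": float(row["amount"]) if row["amount"] is not None else None,  # Decimal -> float for JSON
            "country": row["country"],
            "plan_id": row["plan_id"],
            "source": "database_sync",
        },
    }

events = [row_to_event(row) for row in fetch_recent_rows()]
```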
You can then send these events using PostHog’s /batch API, which supports up to 500 events per request:
```python
import gzip
import json

import requests

payload = {
    "api_key": "<POSTHOG_API_KEY>",   # your PostHog project API key
    "batch": events,                  # the list of event dicts built above
}

# Gzip the JSON body to keep large batches small on the wire.
gz = gzip.compress(json.dumps(payload).encode("utf-8"))

response = requests.post(
    "https://app.posthog.com/batch/",
    headers={"Content-Type": "application/json", "Content-Encoding": "gzip"},
    data=gz,
)
response.raise_for_status()
```
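If one run can produce more than 500 events, split the list before posting. Here is a minimal sketch of the same call, chunked:

```python
import gzip
import json

import requests

POSTHOG_BATCH_URL = "https://app.posthog.com/batch/"
BATCH_SIZE = 500  # per-request limit mentioned above

def send_in_batches(events, api_key):
    """Post events to PostHog's /batch endpoint in chunks."""
    for start in range(0, len(events), BATCH_SIZE):
        chunk = events[start:start + BATCH_SIZE]
        payload = {"api_key": api_key, "batch": chunk}
        # default=str covers values like Decimal or date that json can't serialise natively.
        gz = gzip.compress(json.dumps(payload, default=str).encode("utf-8"))
        resp = requests.post(
            POSTHOG_BATCH_URL,
            headers={"Content-Type": "application/json", "Content-Encoding": "gzip"},
            data=gz,
            timeout=30,
        )
        resp.raise_for_status()  # fail loudly so a scheduler can retry the run
```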
Step 4: Prevent Duplicates with a Tracking Table
If you run this sync regularly (and you should), you’ll want to avoid sending the same data twice.
Create a tracking table in your database:
```sql
CREATE TABLE IF NOT EXISTS public.synced_events (
    row_id VARCHAR(255),
    event_type VARCHAR(255),
    synced_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (row_id, event_type)
);
```
Before sending an event, check if (row_id, event_type) already exists.
If it does, skip it.
If not, send it and insert it into this table.
This makes your pipeline safe to re-run any time.
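In Python, the check-and-record logic might look like the sketch below. It assumes psycopg2 again, row_id is whatever unique key you derive from the source row, and ON CONFLICT is Postgres syntax, so Redshift or MySQL users will need their own upsert idiom.

```python
def already_synced(conn, row_id, event_type):
    # True if this (row_id, event_type) pair was sent in a previous run.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT 1 FROM public.synced_events WHERE row_id = %s AND event_type = %s",
            (row_id, event_type),
        )
        return cur.fetchone() is not None

def mark_synced(conn, row_id, event_type):
    # Record the pair so re-runs skip it; ON CONFLICT keeps the insert idempotent.
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO public.synced_events (row_id, event_type) VALUES (%s, %s) "
            "ON CONFLICT DO NOTHING",
            (row_id, event_type),
        )
    conn.commit()
```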
Step 5: Make It Incremental
Instead of scanning the whole table every time, use timestamps or IDs:
```sql
SELECT *
FROM subscription_events
WHERE event_timestamp > (SELECT MAX(synced_at) FROM synced_events);
```
That way, you only pull fresh data.
You can even keep this value in a small control file or state table if you prefer.
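If you go the state-table route, the sketch below shows one way to read and advance the watermark. It assumes a hypothetical public.sync_state table with source_table and last_synced_at columns, plus the psycopg2 connection used earlier.

```python
def get_watermark(conn, source_table):
    # Last successfully synced timestamp for a source, or None on the first run.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT last_synced_at FROM public.sync_state WHERE source_table = %s",
            (source_table,),
        )
        row = cur.fetchone()
        return row[0] if row else None

def set_watermark(conn, source_table, last_synced_at):
    # Advance the watermark only after the batch has been sent successfully.
    with conn.cursor() as cur:
        cur.execute(
            "UPDATE public.sync_state SET last_synced_at = %s WHERE source_table = %s",
            (last_synced_at, source_table),
        )
    conn.commit()
```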
Step 6: Automate It
You can run this pipeline anywhere, but the most common options are:
- AWS Glue: fully managed ETL, runs daily, integrates with Redshift easily.
- Airflow: flexible and good for multi-source syncs.
- Cron + Python Script: simple, fast, and works fine for small volumes.
Whatever you choose, the process is the same:
- Pull new rows.
- Transform to PostHog format.
- Skip duplicates.
- Send events in batches to PostHog.
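As a concrete sketch of that loop, here is how the earlier helpers might be wired together into a single script that cron (or a Glue job, or an Airflow task) can run on a schedule. The helper names, the composite row_id, and the environment variables are assumptions carried over from the sketches above, not a prescribed layout.

```python
import os

import psycopg2

def run_sync():
    # Pull new rows, drop anything already synced, send the rest in batches, then record them.
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    try:
        pending = []
        for row in fetch_recent_rows():                                  # Step 2
            # Hypothetical composite key; prefer your table's real primary key if it has one.
            row_id = f"{row['user_id']}-{row['event_timestamp'].isoformat()}"
            if not already_synced(conn, row_id, row["event_type"]):      # Step 4
                pending.append((row_id, row))
        events = [row_to_event(row) for _, row in pending]               # Step 3
        send_in_batches(events, os.environ["POSTHOG_API_KEY"])
        for row_id, row in pending:
            mark_synced(conn, row_id, row["event_type"])
    finally:
        conn.close()

if __name__ == "__main__":
    run_sync()
```

Scheduling it is then just a matter of pointing cron, Glue, or Airflow at the script once a day.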
Step 7: Analyze Your Data in PostHog
Once the data lands in PostHog, you can:
- Build funnels that combine backend and product behavior, e.g. Subscription Started → Feature Used → Payment Successful.
- Segment users by billing plan, payment type, or lifecycle stage.
- Trigger feature flags or experiments for specific account groups.
This is where PostHog shines. You can now turn backend data into behavioral insights.
The Takeaway
Your database is your source of truth.
PostHog is your source of analytics.
By syncing them, you connect what users do with what your system knows - giving you the power to analyze, optimize, and automate with confidence.