<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Pixeljets]]></title><description><![CDATA[Building & growing SaaS products for fellow developers, in solo. Stories about modern web tech.]]></description><link>https://pixeljets.com/blog/</link><image><url>https://pixeljets.com/blog/favicon.png</url><title>Pixeljets</title><link>https://pixeljets.com/blog/</link></image><generator>Ghost 5.65</generator><lastBuildDate>Fri, 03 Apr 2026 23:36:44 GMT</lastBuildDate><atom:link href="https://pixeljets.com/blog/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Launching ModelRift: a web-based IDE for parametric 3D modeling]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>Last week I got a message from a parent: &quot;Hey, my 9 year old son uses ModelRift for creating things for his 3D printer, it&apos;s great! Product feedback: You should probably ask me to pay now, I feel like I&apos;ve used it enough.&quot;</p>
<p>That</p>]]></description><link>https://pixeljets.com/blog/modelrift-ide-for-parametric-3d-modeling/</link><guid isPermaLink="false">69ab09525e95f1d3ef1263be</guid><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Fri, 06 Mar 2026 17:13:27 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>Last week I got a message from a parent: &quot;Hey, my 9 year old son uses ModelRift for creating things for his 3D printer, it&apos;s great! Product feedback: You should probably ask me to pay now, I feel like I&apos;ve used it enough.&quot;</p>
<p>That one landed. Let me tell you how this thing came to exist.</p>
<iframe width="1543" height="868" src="https://www.youtube.com/embed/2jE_qX4u-rU" title="This AI assistant generates parametric CAD 3D Models in browser" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<hr>
<h2 id="the-rabbit-hole">The rabbit hole</h2>
<p>I bought a 3D printer. Turned out to be the best purchase I&apos;d made in years.</p>
<p>After printing a few things from Makerworld and Printables, I fell headfirst into parametric CAD design. Fusion 360, flanges, hinges, chamfers, fillets, loft, sweep, revolve. I learned words I&apos;d never needed before. At some point I realized I was more interested in designing models for other people than printing them myself.</p>
<p>Then I found <a href="https://en.wikipedia.org/wiki/OpenSCAD?ref=pixeljets.com">OpenSCAD</a>. It&apos;s essentially a DSL for 3D modeling: you describe geometry as code, and it renders the result. As a programmer, this felt like home. No mouse, no GUI, just code and math.</p>
<p>The problems started when I wanted complex geometry. Sinusoidal rib distributions, procedurally generated mazes, things that are genuinely hard to express in static <a href="https://en.wikipedia.org/wiki/Parametric_design?ref=pixeljets.com">parametric drawings</a>. I turned to ChatGPT and Gemini. They can generate basic OpenSCAD &quot;skeletons&quot; reasonably well. But the geometry is broken in roughly 90% of cases. The code is syntactically perfect. It runs without errors. It just produces... something that doesn&apos;t look anything like what you asked for.</p>
<p>So the workflow became: ask ChatGPT for code, paste into OpenSCAD desktop app, render, stare in horror, take a screenshot, draw arrows on it in some image editor, paste the screenshot back into ChatGPT, repeat. Ten iterations per model if you were lucky.</p>
<h2 id="what-i-built">What I built</h2>
<p><a href="https://modelrift.com/?ref=pixeljets.com">ModelRift</a> is a browser-based OpenSCAD editor with an embedded AI chat. The core loop: you describe what you want, the AI writes <code>.scad</code> code, OpenSCAD renders it server-side and returns a 3D preview. If the result is wrong, you click &quot;Annotate&quot;, draw directly on the rendered model - arrows, rectangles, text labels - and send the annotated screenshot back to the AI.</p>
<p>That annotation step is where most of the work went. Weeks of iteration on just the annotation mode. It uses <a href="http://fabricjs.com/?ref=pixeljets.com">Fabric.js</a> under the hood, and getting the overlay to composite correctly with the <a href="https://en.wikipedia.org/wiki/Three.js?ref=pixeljets.com">Three.js</a> viewport took longer than I want to admit. But it solves the actual problem: LLMs understand spatial feedback from visual annotations much better than text descriptions like &quot;the left side is too wide.&quot;</p>
<p>The model viewer gives you real-time orbit controls, and after each iteration you can see the updated geometry without leaving the page. The diff viewer shows exactly what changed in the <code>.scad</code> code between versions. You can revert to any previous revision.</p>
<h2 id="technical-bits-worth-mentioning">Technical bits worth mentioning</h2>
<p>The stack is React + Three.js (for .stl model preview) on the frontend, Node.js + PostgreSQL on the backend. For the job queue I used <a href="https://github.com/timgit/pg-boss?ref=pixeljets.com">pg-boss</a> instead of Redis - it runs on the same Postgres instance and eliminates one infrastructure dependency. Simple tradeoff that I&apos;m happy with.</p>
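<p>As a rough illustration of the pg-boss setup described above, here is a hedged sketch. The queue name (<code>render-scad</code>), payload shape, and retry options are my own invention, not ModelRift&apos;s actual code; the pg-boss calls follow its documented API.</p>

```javascript
// Illustrative retry policy for render jobs (values are my assumption).
const renderJobOptions = {
  retryLimit: 3,     // re-run a failed render a few times
  retryBackoff: true // exponential backoff between attempts
};

async function startRenderQueue(connectionString) {
  const PgBoss = require('pg-boss'); // lazy require: no Redis, just Postgres
  const boss = new PgBoss(connectionString);
  await boss.start();

  // producer: enqueue a render request
  await boss.send('render-scad', { scadSource: 'cube(10);' }, renderJobOptions);

  // consumer: pick up jobs and run OpenSCAD for each one
  await boss.work('render-scad', async (jobs) => {
    for (const job of jobs) {
      // ...spawn the OpenSCAD CLI with job.data.scadSource...
    }
  });
}
```

Because pg-boss stores jobs in a Postgres schema, the queue shares backups, monitoring, and transactions with the main database, which is exactly the infrastructure simplification the post describes.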
<p>The AI model is currently Gemini Flash with thinking mode enabled. I evaluated several options and Gemini Flash produced the fewest geometric errors at reasonable cost for this specific use case. The backend runs OpenSCAD as a CLI process, generates multi-view PNG renders using Sharp, and streams progress back to the frontend via Server-Sent Events.</p>
<p>One feature I&apos;m genuinely proud of is SVG import. If you want to engrave a logo or artwork onto a 3D model, you upload an SVG and ModelRift converts it to OpenSCAD geometry. I wrote the converter from scratch: a regex-based SVG path parser that handles M, L, H, V, C, S, A, Z commands, interpolates <a href="https://en.wikipedia.org/wiki/B%C3%A9zier_curve?ref=pixeljets.com">Bezier curves</a>, and outputs <code>polygon()</code> primitives with correct point indices. No external library. It runs in three modes (polygon, lines, or hybrid) and auto-detects which one fits the input. The result drops directly into the agent&apos;s context, so the AI can use your artwork as geometry in the generated model.</p>
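<p>To give a taste of the path-to-polygon idea, here is a drastically simplified sketch that handles only absolute M, L, and Z commands (the real converter also covers H, V, C, S, A and Bezier interpolation). Function names are mine, not ModelRift&apos;s.</p>

```javascript
// Parse a tiny subset of SVG path syntax (absolute M/L/Z) into [x, y] points.
function svgPathToPolygonPoints(d) {
  const points = [];
  // tokenize into commands with their trailing number lists, e.g. "M0 0", "L10 0"
  const re = /([MLZ])([^MLZ]*)/gi;
  let match;
  while ((match = re.exec(d)) !== null) {
    const cmd = match[1].toUpperCase();
    if (cmd === 'Z') continue; // close path: OpenSCAD polygon() closes implicitly
    const nums = (match[2].match(/-?\d*\.?\d+/g) || []).map(Number);
    for (let i = 0; i + 1 < nums.length; i += 2) {
      points.push([nums[i], nums[i + 1]]);
    }
  }
  return points;
}

// Emit an OpenSCAD polygon() primitive from the parsed points.
function toOpenScadPolygon(d) {
  const pts = svgPathToPolygonPoints(d)
    .map(([x, y]) => `[${x},${y}]`).join(',');
  return `polygon(points=[${pts}]);`;
}
```

For example, the triangle path <code>M0 0 L10 0 L10 10 Z</code> becomes <code>polygon(points=[[0,0],[10,0],[10,10]]);</code>, which drops straight into a <code>.scad</code> file.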
<iframe width="1543" height="868" src="https://www.youtube.com/embed/jZMcmJ803ns" title="Easiest way to convert SVG to 3D model" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<p>One more thing: the intro video for the launch (the logo spinning and title typing in) was generated frame-by-frame from a plain HTML animation using Puppeteer at 60fps, then assembled into an mp4 with ffmpeg. No After Effects, no video editor. Just a Node.js script and some CSS animations.</p>
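<p>The frame-by-frame capture trick can be sketched like this. The URL, frame count, and file layout are made up; the Puppeteer calls and the Web Animations API (<code>document.getAnimations()</code>) are standard, but this is a reconstruction, not the author&apos;s actual script.</p>

```javascript
// Capture each frame of a CSS animation by pausing it at exact timestamps.
async function captureFrames(url, frames, fps) {
  const puppeteer = require('puppeteer'); // lazy: only needed when actually run
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url);
  for (let i = 0; i < frames; i++) {
    // freeze all running CSS/Web animations at this frame's timestamp (ms)
    await page.evaluate((t) => {
      document.getAnimations().forEach((a) => { a.pause(); a.currentTime = t; });
    }, (i / fps) * 1000);
    await page.screenshot({ path: `frames/frame-${String(i).padStart(5, '0')}.png` });
  }
  await browser.close();
}

// Assemble the numbered PNGs into an mp4; returns the ffmpeg argv.
function ffmpegArgs(fps, outFile) {
  return ['-framerate', String(fps), '-i', 'frames/frame-%05d.png',
          '-c:v', 'libx264', '-pix_fmt', 'yuv420p', outFile];
}
```

Pausing animations and setting <code>currentTime</code> explicitly sidesteps real-time capture entirely, so the output is deterministic at any fps regardless of how fast the machine renders.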
<h2 id="what-shipped-since-launch">What shipped since launch</h2>
<p>The initial v0.1 was the bare minimum: browser editor, live 3D preview, <a href="https://modelrift.com/blog/3d-file-formats-explained?ref=pixeljets.com">STL export</a>. Since then:</p>
<ul>
<li>v0.2 added a side-by-side code editor with live preview and a diff viewer for tracking changes</li>
<li>v0.3 added a public model gallery and user profiles - you can browse and remix models without touching the AI assistant at all</li>
<li>v0.3.1 added revision history (revert to any previous version of your SCAD code)</li>
<li>v0.3.3 improved SVG import significantly, adding proper polygon and line conversion modes</li>
</ul>
<p>The public gallery is at <a href="https://modelrift.com/models?ref=pixeljets.com">modelrift.com/models</a>. Granted, it won&apos;t instantly let you build the kind of complex assemblies that take weeks to grasp in Fusion 360 or Onshape. But it&apos;s still worth a look even if you&apos;re skeptical about AI-generated geometry.<br>
<img src="https://pixeljets.com/blog/content/images/2026/03/soap-dish-v1.jpg" alt="soap-dish-v1" loading="lazy"></p>
<p><img src="https://pixeljets.com/blog/content/images/2026/03/soap-dish-v3.jpg" alt="soap-dish-v3" loading="lazy"></p>
<p><img src="https://pixeljets.com/blog/content/images/2026/03/engraving.jpg" alt="engraving" loading="lazy"></p>
<h2 id="first-payment">First payment</h2>
<p>I got my first payment for ModelRift three weeks after launch.<br>
I still get shivers from a first payment - even after launching a lot of SaaS products. It proves the thing is useful to someone and brings real value.</p>
<h2 id="first-model-printed-by-the-community">First model printed by the community</h2>
<p>One of the early things that told me people were actually using this: someone published a real, practical model on the gallery and printed it.</p>
<p><a href="https://modelrift.com/models/exhaust-hose-adapter-for-small-skylight-for-3d-printers-and-others?ref=pixeljets.com">Exhaust hose adapter for small skylight</a> by viewprintlab - a parametric adapter for venting a 3D printer through a small skylight. Adjustable width from 18 to 22 inches, compatible with a 4-inch flexible hose, designed to attach without drilling using command strips. Two parts that print separately and slide together.</p>
<p>It&apos;s exactly the kind of model that makes no sense to look for on Printables - too specific to one person&apos;s window size - but takes maybe 20 minutes to generate in ModelRift and then works. That&apos;s the use case I was building for.</p>
<h2 id="where-it-stands">Where it stands</h2>
<p>Every new user gets 150 free credits, enough for roughly 15-30 models (depending on complexity). After that it costs money, because the LLM costs money and I&apos;d prefer not to go bankrupt.</p>
<p>This wasn&apos;t a weekend project. It took many days and nights, and there are still plenty of things to improve. But my family uses it for our own printing needs now, which is the most honest endorsement I can give.</p>
<p>If you try it, I&apos;d genuinely appreciate feedback: <a href="https://modelrift.com/?ref=pixeljets.com">modelrift.com</a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Time to move on: n8n vs code for SaaS builders]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>n8n is a no-code (low-code) automation platform - similar to Zapier but more technical - and it can be self-hosted. I love n8n (<a href="https://pixeljets.com/blog/n8n/">read my older detailed writeup about it</a>): it makes you feel powerful on day one &#x2014; drag a few nodes, connect an API, and suddenly you&#x2019;</p>]]></description><link>https://pixeljets.com/blog/n8n-vs-code/</link><guid isPermaLink="false">6908add5d6e20d89aaa976ca</guid><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Mon, 03 Nov 2025 13:46:52 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>n8n is a no-code (low-code) automation platform - similar to Zapier but more technical - and it can be self-hosted. I love n8n (<a href="https://pixeljets.com/blog/n8n/">read my older detailed writeup about it</a>): it makes you feel powerful on day one &#x2014; drag a few nodes, connect an API, and suddenly you&#x2019;ve automated a whole process. I&apos;ve built a lot with it myself, including custom nodes, e.g. <a href="https://scrapeninja.net/docs/n8n/?ref=pixeljets.com">ScrapeNinja web scraping API integration node</a>. I also maintain the <a href="https://github.com/restyler/awesome-n8n?ref=pixeljets.com">Awesome n8n GitHub repo</a>. For many non-technical indie builders and small teams, n8n feels like magic, and I see many agencies popping up that set up self-hosted n8n instances for their customers.<br>
n8n overtook Zapier in Google Trends and in people&apos;s minds in 2024-2025. It got this momentum mostly because its self-hosted distribution model is brilliant and serves as an awesome marketing channel, and because AI influencers all over YouTube used n8n as the platform for &quot;Run your own AI chat bot in 5 min&quot; educational videos.</p>
<p><img src="https://pixeljets.com/blog/content/images/2025/11/2025-11-03-at-16.33.png" alt="n8n took over Zapier in Google Trends" loading="lazy"><br>
But eventually, some of us hit the ceiling.</p>
<p>After building a few internal tools and SaaS prototypes with n8n, and after talking with lots of smart folks doing the same, I noticed a pattern: the moment you start calling it a &#x201C;product&#x201D; or &quot;SaaS&quot; rather than an &#x201C;automation,&#x201D; you&#x2019;re probably past the point where n8n fits.<br>
It is worth noting that most of the points in this post apply to no-code and low-code platforms in general.<br>
Here&#x2019;s when that happens.</p>
<hr>
<h2 id="1-you-need-a-real-ux">1. You Need a Real UX</h2>
<p>n8n shines behind the scenes, not in front of customers. Whenever I&apos;m launching a product and ask myself &quot;would I use n8n here for the MVP?&quot; (MVP stands for Minimum Viable Product), this is the major blocker. Two more abbreviations for not-too-technical readers: UX stands for User Experience, UI for User Interface.</p>
<p>What n8n can do well in &#x201C;UX land&#x201D;:</p>
<ul>
<li>Webhooks: it&#x2019;s totally fine to keep n8n as your webhook receiver for internal flows or to glue services together. Treat it like a background worker with HTTP ingress. There is a trick <a href="https://www.reddit.com/r/n8n/comments/1hehssw/how_do_i_create_a_webpage_using_n8n/?ref=pixeljets.com">to emulate HTML page output via n8n webhook</a> but believe me, this does not end well for real products.</li>
<li>Forms: there is a built-in <a href="https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-base.form/?ref=pixeljets.com">Form node</a> for quick data capture, approvals, or small operational tools. To my taste, they are ugly, but they do the job.</li>
<li>Chat UI: n8n ships a custom chat UI that lets you embed an AI assistant into an HTML page. I don&apos;t love its UX, but it works - and it&apos;s a great starter kit for experimenting with AI agents!</li>
</ul>
<p>For custom UIs, deploy separately &#x2014; another no&#x2011;code tool or a code app.</p>
<p>Where it starts to hurt:</p>
<ul>
<li>Once you&#x2019;re building user dashboards, onboarding flows, sessions, roles/permissions, or anything that must feel polished, the visual workflow metaphor fights you. You&#x2019;ll spend more time wiring webhooks and shaping JSON than building a focused UI.</li>
</ul>
<p>At that stage, you&#x2019;re not automating anymore &#x2014; you&#x2019;re engineering. A small code app (Next.js/Express/FastAPI) with a proper auth layer will be faster to iterate on and easier to test, version, and review.</p>
<p>If you&apos;ve never shipped real code to production, some news for you: modern frontend is a sticky mess. If you decide to go this route, I recommend checking out <a href="https://ui.shadcn.com/?ref=pixeljets.com">shadcn UI</a>, which is definitely one of the best UI frameworks on the market right now. I also recommend checking out <a href="https://pixeljets.com/blog/lovable-dev-vs-bolt-new/">Lovable and Bolt</a>, which can ease the pain of getting started with the UX, and then switching over to Claude Code or Cursor for editing.</p>
<hr>
<h2 id="2-payments-are-serious">2. Payments Are Serious</h2>
<p>Subscriptions are the main thing that turns your workflows or code into a real SaaS product.</p>
<p>Most no-code workflows can handle simple Stripe calls, but as soon as you add subscription logic, idempotency keys, retries, proration, or multi-currency handling, things get fragile. The &#x201C;Stripe node + webhook&#x201D; combo works for MVPs, not for a real SaaS with thousands of transactions and audit requirements.</p>
<p>The Stripe API is not what it used to be in 2010 - it&apos;s really big now, caters to many different use cases, and takes some time to figure out.</p>
<p>In code, you can keep webhooks thin, enqueue events, process with a job queue, and attach observability. That level of control around billing, error handling, and audit trails is hard to replicate in a visual editor.</p>
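<p>The &quot;thin webhook&quot; pattern is small enough to show in a sketch. The event shape mirrors Stripe&apos;s webhook payloads (an <code>id</code> and a <code>type</code>); the queue here is an in-memory stand-in for a real job queue like pg-boss or BullMQ, and the function names are mine.</p>

```javascript
// Acknowledge the webhook fast, do the real work later from a queue.
function makeWebhookHandler(queue, seenIds = new Set()) {
  return function handle(event) {
    // idempotency: Stripe retries deliveries, so drop duplicate event ids
    if (seenIds.has(event.id)) return { status: 200, enqueued: false };
    seenIds.add(event.id);
    // enqueue and return immediately; a worker processes it with retries
    queue.push({ type: event.type, payload: event.data });
    return { status: 200, enqueued: true };
  };
}
```

In production the dedupe set would live in the database (a unique constraint on the event id), but the shape of the logic is the same: the webhook endpoint never does billing work inline.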
<hr>
<h2 id="3-n8n-community-license-does-not-fit-saas">3. n8n Community License Does Not Fit SaaS</h2>
<p>Recently, someone told me they were building a multi&#x2011;customer AI chatbot &#x201C;on top of n8n&#x201D; because &#x201C;n8n developers are cheaper than regular devs.&#x201D; That&#x2019;s an MVP&#x2011;stage shortcut, not a production strategy. Many folks don&#x2019;t realize the self&#x2011;hosted Community Edition license isn&#x2019;t designed for building and selling a SaaS on top of n8n.</p>
<p>Will n8n find out if you violate the license? Don&#x2019;t gamble. Do the right thing from the start&#x2014;either budget for a commercial license or move the product logic to code.</p>
<p>Licensing context and links:</p>
<ul>
<li>n8n is fair&#x2011;code, with the Sustainable Use License and a separate Enterprise License. Read: <a href="https://github.com/n8n-io/n8n/blob/master/LICENSE.md?ref=pixeljets.com">Sustainable Use License</a>, <a href="https://github.com/n8n-io/n8n/blob/master/LICENSE_EE.md?ref=pixeljets.com">Enterprise License</a>, and the <a href="https://docs.n8n.io/sustainable-use-license/?ref=pixeljets.com">license explainer</a>.</li>
<li>The SUL typically allows internal business use and consulting. Hosting n8n as the paid product or embedding it as a revenue&#x2011;generating feature requires a commercial agreement. n8n also offers an <a href="https://n8n.io/embed/?ref=pixeljets.com">Embed program</a> that, as of this writing, starts around $50,000/year.</li>
</ul>
<p>Direct quote from the SUL: <a href="https://docs.n8n.io/sustainable-use-license/?ref=pixeljets.com#what-is-the-sustainable-use-license">docs</a></p>
<pre><code>Our license restricts use to &quot;internal business purposes&quot;. In practice this means all use is allowed unless you are selling a product, service, or module in which the value derives entirely or substantially from n8n functionality.
</code></pre>
<p>Put simply: the moment you accept payments for a SaaS built on top of n8n, the Community Edition is not permitted.</p>
<p>Real&#x2011;world threads on multi&#x2011;tenant credentials and scale:</p>
<ul>
<li><a href="https://community.n8n.io/t/using-n8n-to-power-a-scalable-saas-for-ai-assistants-self-hosted-or-cloud/71760?ref=pixeljets.com">Using n8n to Power a Scalable SaaS for AI Assistants &#x2013; Self&#x2011;Hosted or Cloud?</a></li>
<li>Licensing clarifications from users and staff: <a href="https://www.reddit.com/r/n8n/comments/1ioogo9/?ref=pixeljets.com">Reddit</a></li>
</ul>
<p>Takeaway: once you need per&#x2011;tenant secrets, scoped permissions, and contractual clarity&#x2014;and you&#x2019;re not budgeting for Enterprise/Embed&#x2014;a small code backend with proper auth, a secrets manager, and per&#x2011;tenant queues is safer than stretching a single n8n instance.</p>
<hr>
<h2 id="4-you-need-versioning-that-actually-works">4. You Need Versioning That Actually Works</h2>
<p>Yes, workflows are JSON and you can export them. Some folks even automate export-to-Git from n8n itself or rely on built-in history. But merges and reviews remain painful compared to code.</p>
<p>Community anecdotes and constraints:</p>
<ul>
<li>Users report syncing JSON to Git or using history; full Git integration is an enterprise feature. Reddit thread: <a href="https://www.reddit.com/r/n8n/comments/1k6jhfe/?ref=pixeljets.com">Version control in n8n</a></li>
<li>Try merging two branches of a big workflow and you&#x2019;ll see why text diffs, typed models, and PRs still win.</li>
</ul>
<p>At any point, <code>git diff</code> beats &#x201C;export workflow as JSON.&#x201D; If your team depends on code review, CI, and tests, you&#x2019;ve outgrown the visual editor.<br>
Don&apos;t even get me started on modern AI coding tools which can review git history and commit code on your behalf... it&apos;s just another level and you should try it as soon as possible.</p>
<hr>
<h2 id="5-when-architecture-gets-complex">5. When Architecture Gets Complex</h2>
<p>Once you&#x2019;re orchestrating multiple APIs, queues, webhooks, and databases, n8n starts feeling like one giant script rather than a set of testable modules.</p>
<p>In code, you can separate layers, add tests, monitor performance, and scale horizontally. In n8n, complexity accumulates in a single visual flow, and operational concerns (retries, DLQs, idempotency, tracing) are harder to centralize.</p>
<p>The &#x201C;47 nodes at 2 AM&#x201D; anecdote shows up often for a reason&#x2014;visual debt is real. If you need SLAs and clean rollback paths, code gives you the controls.</p>
<ul>
<li>About the Code node: the <a href="https://docs.n8n.io/code/code-node/?ref=pixeljets.com">Code node</a> is excellent for quick JavaScript/Python snippets, but for non&#x2011;trivial logic you quickly run into dependency and runtime constraints.</li>
<li>Anecdote: I had a list of freelancers waiting for crypto payouts. Using n8n, I built a workflow that scanned Google Sheets and created payouts on the BEP&#x2011;20 (EVM&#x2011;compatible) blockchain, then wrote back transaction hashes or errors. I needed blockchain npm packages (like <code>ethers</code>), so I built a custom Docker image for n8n with those dependencies baked in. It worked&#x2014;but managing deps, image updates, and security patches felt like running a bespoke backend rather than &#x201C;no&#x2011;code&#x201D;.</li>
<li>AI constraints: self&#x2011;hosted Code node doesn&#x2019;t include AI enhancements. Writing code manually in a browser in 2025 is dated; AI assistance for the Code node is currently available to n8n Cloud users, not typical self&#x2011;hosted setups.</li>
<li>Signal to move: if you&#x2019;re repeatedly adding external SDKs or complex libraries, move that logic to a small service with its own repo, dependencies, tests, and CI. Call it from n8n via HTTP/queues&#x2014;or replace the flow with code entirely.</li>
</ul>
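<p>A reconstruction of the custom-image approach from the anecdote above might look like the Dockerfile below. The package choice (<code>ethers</code>) comes from the post; the base image tag and the <code>NODE_FUNCTION_ALLOW_EXTERNAL</code> allowlist follow a commonly used pattern for exposing external npm modules to the Code node, so treat the details as a sketch, not the author&apos;s exact setup.</p>

```dockerfile
# Custom n8n image with blockchain deps baked in (illustrative).
FROM n8nio/n8n:latest
USER root
RUN npm install -g ethers
USER node
# allow the Code node to require() the external package
ENV NODE_FUNCTION_ALLOW_EXTERNAL=ethers
```

The catch described in the post follows directly: you now own rebuilds for every dependency bump and base-image CVE, which is backend maintenance in all but name.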
<hr>
<h2 id="6-performance-concerns">6. Performance Concerns</h2>
<p>Performance in n8n is about the whole workflow lifecycle: triggers, network calls, payload sizes, execution storage, and scaling &#x2014; not just the database.</p>
<ul>
<li>
<p>Item fan&#x2011;out and node count:</p>
<ul>
<li>Every node hop serializes/deserializes data and adds latency. Long chains with large items get slow and memory&#x2011;hungry.</li>
<li>Collapse trivial transforms into a single Code node where it reduces hops, or pre&#x2011;aggregate upstream so you pass fewer, smaller items.</li>
</ul>
</li>
<li>
<p>Network chattiness and batching:</p>
<ul>
<li>Loops that call external APIs per row are the classic bottleneck. Prefer bulk endpoints, batch inserts/updates, and rate&#x2011;aware concurrency.</li>
<li>Use Split In Batches to cap concurrency and add backoff; avoid tight polls when webhooks/events are available.</li>
</ul>
</li>
<li>
<p>Database as a bottleneck: in a lot of cases, Google Sheets or Airtable is used as the database, and while these products are awesome for getting started, they are a very poor choice even at medium load. Luckily, in 2025 n8n finally launched <a href="https://www.reddit.com/r/n8n/comments/1nnqls1/data_tables_are_here/?ref=pixeljets.com">data tables</a> - a built-in DB that should perform far better than Google Sheets.</p>
</li>
<li>
<p>Scale the runner, not just the editor:</p>
<ul>
<li>For sustained throughput, run Queue Mode with Redis and multiple workers (<code>N8N_EXECUTIONS_MODE=queue</code>, set worker concurrency) to process jobs in parallel.</li>
<li>Keep heavy workflows off the main/editor instance; scale workers horizontally and monitor queue depth/latency.</li>
</ul>
</li>
<li>
<p>Triggers and external systems:</p>
<ul>
<li>Polling (e.g., Google Sheets) adds unpredictable latency and hits quotas. Prefer webhooks and event sources where possible.</li>
<li>Sheets are fine for prototypes; for hot paths use a real datastore (PostgreSQL) and let n8n orchestrate, not store, the workload.</li>
</ul>
</li>
<li>
<p>Data integrity under retries:</p>
<ul>
<li>Per&#x2011;node retries can create duplicates. Use idempotency keys, unique constraints/UPSERTs downstream, or move critical write logic behind a small API that enforces exactly&#x2011;once semantics.</li>
</ul>
</li>
</ul>
<p>Bottom line: optimize workflows for fewer hops, smaller payloads, event&#x2011;driven triggers, and queue&#x2011;backed execution. Use Postgres (and/or a thin API) for durability and idempotency, and scale n8n with workers when throughput matters.</p>
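<p>For concreteness, a minimal queue-mode configuration could look like the fragment below. The variable names follow the post and n8n&apos;s queue-mode setup; the hostnames and concurrency value are example values from a typical docker-compose deployment, not a recommendation.</p>

```shell
# Settings shared by the main (editor) instance and every worker.
export N8N_EXECUTIONS_MODE=queue
export QUEUE_BULL_REDIS_HOST=redis
export QUEUE_BULL_REDIS_PORT=6379
export DB_TYPE=postgresdb
export DB_POSTGRESDB_HOST=postgres

# Then start each worker process separately and scale horizontally:
#   n8n worker --concurrency=10
```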
<hr>
<h2 id="7-deployment">7. Deployment</h2>
<p>One of n8n&#x2019;s strongest advantages is skipping DevOps entirely: a single Docker container and you&#x2019;re productive. For many teams, that&#x2019;s perfect for internal automations and prototypes.</p>
<p>When DevOps becomes necessary:</p>
<ul>
<li>Compliance and backups: scheduled DB/file backups, retention policies, and auditable change management for credentials and workflows.</li>
<li>High availability and scale: queue mode with Redis, multiple workers, external Postgres, and rolling upgrades; pin versions and test migrations.</li>
<li>Security and access: TLS, SSO, RBAC, network rules (VPC/egress allowlists), and secret management (e.g., HashiCorp Vault or AWS Secrets Manager).</li>
<li>Custom runtimes: Code node dependencies via a custom image; private registries; base image patching for CVEs.</li>
<li>Isolation: separate instances per environment/tenant; constrain cross&#x2011;tenant access by design.</li>
</ul>
<p>If you&#x2019;re regularly touching these concerns, you&#x2019;ve crossed into &#x201C;DevOps needed&#x201D; territory&#x2014;either budget time/skills for it or move core product logic to a service you can operate with standard SRE practices.</p>
<hr>
<h2 id="8-observability">8. Observability</h2>
<p>The built&#x2011;in Execution view is a highlight of n8n: you can open a run and see exactly which node failed and why. It&#x2019;s excellent for ad&#x2011;hoc debugging &#x2014; but sometimes it just doesn&#x2019;t scale to concurrent millions of runs a day.</p>
<p>What to add for scale:</p>
<ul>
<li>Centralized logs: ship container logs to <a href="https://grafana.com/oss/loki/?ref=pixeljets.com">Grafana Loki</a> or the <a href="https://www.elastic.co/elastic-stack/?ref=pixeljets.com">ELK Stack</a>.</li>
<li>Metrics and dashboards: expose app and worker metrics to <a href="https://prometheus.io/?ref=pixeljets.com">Prometheus</a> and graph throughput, queue depth, execution latency (p95/p99), and error rate in <a href="https://grafana.com/?ref=pixeljets.com">Grafana</a>.</li>
<li>Request visibility: keep reverse&#x2011;proxy (nginx/Traefik) access logs and forward them to the same sink; correlate with <code>executionId</code> to trace inputs to outcomes.</li>
<li>Retention and pruning: tune n8n&#x2019;s execution data retention so the UI stays responsive and the DB doesn&#x2019;t balloon; rely on your log/metrics backends for long&#x2011;term history.</li>
</ul>
<p>If your ops questions sound like &#x201C;which tenant&#x2019;s runs are failing most this week?&#x201D; or &#x201C;why did queue latency spike?&#x201D;, it&#x2019;s time to treat n8n like any other production workload and invest in real logging and metrics.</p>
<hr>
<h2 id="final-thoughts">Final Thoughts</h2>
<p>n8n is perfect for what it was built for &#x2014; automating processes, validating ideas, stitching APIs.<br>
But when your project turns into a customer-facing product with real uptime, billing, and architecture needs, it&#x2019;s time to graduate to code.</p>
<p>AI coding is already here, and becoming a software engineer who ships real, sophisticated SaaS products is now possible for everyone. It will be painful, but the results are well worth it.</p>
<p>You don&#x2019;t have to throw n8n away. Keep it for what it&#x2019;s good at &#x2014; internal automations, monitoring, glue work. But for your core product? Build it right.</p>
<hr>
<h2 id="practical-migration-patterns">Practical Migration Patterns</h2>
<ul>
<li>My best advice: get on a consulting call with a competent software engineer and describe your case. It&apos;s great if you already have a working n8n workflow (a Minimum Viable Product) and some customers at this stage - it simplifies things a lot! <a href="https://pixeljets.com/n8n-to-saas/?ref=pixeljets.com">I can help you convert an n8n workflow into a product, as well.</a></li>
<li>My second-best advice: talk to ChatGPT and describe what you are doing and your major pains. Ask it how your workflow could be gradually migrated to real code.</li>
</ul>
<hr>
<h2 id="ai-tooling-makes-code-faster-than-clicks">AI Tooling Makes Code Faster Than Clicks</h2>
<p>If you&#x2019;ve avoided code because you thought it would be slower than dragging nodes, that&#x2019;s increasingly outdated.</p>
<ul>
<li>Claude Code, Cursor, and Codex are insanely powerful for day&#x2011;to&#x2011;day engineering. They scaffold endpoints, write tests, refactor safely, and keep context across large changes.</li>
<li>Chat-driven coding is often more convenient than navigating a complex web UI. You can iterate on logic, add retries, or change data models with a single prompt and get runnable diffs.</li>
<li>These tools make the &#x201C;rewrite&#x201D; path viable: migrate a workflow to a small service with proper types, tests, and observability in hours rather than days.</li>
</ul>
<hr>
<h2 id="my-recommendation">My Recommendation</h2>
<ul>
<li>Keep n8n for internal automations and glue.</li>
<li>For customer-facing product logic, billing, and multi-tenant auth, use real code: clear modules, a queue, typed models, tests, and real version control.</li>
<li>Lean on AI coding tools (Claude Code, Cursor, Codex) to make the jump quickly and safely.</li>
</ul>
<hr>
<h2 id="related-posts">Related Posts</h2>
<ul>
<li>My experience using n8n, from a developer perspective &#x2014; tradeoffs and tips: <a href="https://pixeljets.com/blog/n8n/">https://pixeljets.com/blog/n8n/</a></li>
<li>Self-hosted is awesome &#x2014; why I prefer self-hosting and how it applies to n8n deployments: <a href="https://pixeljets.com/blog/self-hosted-is-awesome/">https://pixeljets.com/blog/self-hosted-is-awesome/</a></li>
<li>Web scraping in n8n &#x2014; practical patterns, selectors, proxies, and caveats: <a href="https://pixeljets.com/blog/web-scraping-in-n8n/">https://pixeljets.com/blog/web-scraping-in-n8n/</a></li>
</ul>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[AI Sandboxes: Daytona vs microsandbox]]></title><description><![CDATA[<!--kg-card-begin: markdown--><h2 id="why-ai-products-need-sandboxing">Why AI Products Need Sandboxing</h2>
<p>Sandboxing has become a core feature of modern AI-powered development tools. As AI coding assistants and autonomous agents become more sophisticated, they generate and execute code that needs to run safely in isolated environments.</p>
<p>In my recent <a href="https://pixeljets.com/blog/lovable-dev-vs-bolt-new/">Lovable.dev and Bolt.new</a> blog post I</p>]]></description><link>https://pixeljets.com/blog/ai-sandboxes-daytona-vs-microsandbox/</link><guid isPermaLink="false">6874cf2e43ee8de05e0fca02</guid><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Mon, 14 Jul 2025 09:39:36 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><h2 id="why-ai-products-need-sandboxing">Why AI Products Need Sandboxing</h2>
<p>Sandboxing has become a core feature of modern AI-powered development tools. As AI coding assistants and autonomous agents become more sophisticated, they generate and execute code that needs to run safely in isolated environments.</p>
<p>In my recent <a href="https://pixeljets.com/blog/lovable-dev-vs-bolt-new/">Lovable.dev and Bolt.new</a> blog post I described two different approaches to this task. Lovable.dev uses Fly.io containers with Firecracker MicroVMs for stronger isolation, while Bolt.new uses WebAssembly-based WebContainers that run directly in the browser. Very different approaches with very different outcomes, and both highlight the critical need for secure code execution when dealing with AI-generated code.</p>
<p><strong>If you enjoy the sandboxing topic as much as I do, you should check out my <a href="https://github.com/restyler/awesome-sandbox?ref=pixeljets.com">Awesome Code Sandboxes</a> GitHub page.</strong></p>
<p>The rise of AI coding tools has made sandboxing essential for:</p>
<ul>
<li><strong>Safe AI Code Execution</strong>: Running untrusted code generated by AI models</li>
<li><strong>Rapid Prototyping</strong>: Creating and testing applications quickly without security risks</li>
<li><strong>Educational Platforms</strong>: Allowing users to experiment with code safely</li>
<li><strong>Development Environments</strong>: Providing isolated workspaces for teams</li>
</ul>
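<p>To see why this matters in practice, consider the naive alternative. The sketch below (plain Python, unrelated to either platform) runs a snippet in a child process with CPU and memory rlimits. It works, but the child still shares the host kernel, filesystem, and network, which is exactly the gap that container- and microVM-based sandboxes exist to close:</p>
<pre><code class="language-python">import resource
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0) -&gt; str:
    # Rudimentary limits only: the child still shares the host kernel,
    # so this is NOT real sandboxing.
    def apply_limits():
        resource.setrlimit(resource.RLIMIT_CPU, (2, 2))           # 2s of CPU time
        resource.setrlimit(resource.RLIMIT_AS, (512 &lt;&lt; 20,) * 2)  # 512 MB memory
    proc = subprocess.run(
        [sys.executable, &quot;-c&quot;, code],
        capture_output=True, text=True, timeout=timeout,
        preexec_fn=apply_limits,  # POSIX only
    )
    return proc.stdout.strip()

print(run_untrusted(&quot;print(2 + 2)&quot;))  # prints: 4
</code></pre>
<p>Resource limits stop a runaway loop, but they do nothing against a snippet that reads your SSH keys or exfiltrates data, which is why kernel-level isolation is the baseline for AI-generated code.</p>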
<p>As a big fan of <a href="https://pixeljets.com/blog/self-hosted-is-awesome/">self-hosting</a>, both for myself and for my companies, the choice of sandboxing technology becomes even more critical: I prefer products that are easy to deploy on Hetzner or Netcup cloud nodes while still providing a solid level of security and all the features we might need.</p>
<h2 id="section-1-executive-summary"><strong>Section 1: Executive Summary</strong></h2>
<h3 id="11-overview"><strong>1.1. Overview</strong></h3>
<p>This blog post compares two platforms for running code in isolated environments: Daytona and Microsandbox. Both are designed for safe code execution, especially AI-generated code, but they take very different approaches.</p>
<p><a href="https://github.com/daytonaio/daytona/?ref=pixeljets.com"><strong>Daytona</strong></a> is a full development platform backed by venture capital. It handles the complete development process - from creating workspaces to team collaboration and deployment. It targets enterprise teams and AI workflows, offering both a managed cloud service and self-hosted options. Daytona aims to standardize development across organizations.</p>
<p><a href="https://github.com/microsandbox/microsandbox/?ref=pixeljets.com"><strong>Microsandbox</strong></a> is an open-source tool focused on one thing: running untrusted code securely. It provides strong isolation using virtual machines with fast startup times. Rather than being a complete platform, it&apos;s designed as a building block for other systems.</p>
<h3 id="12-key-technical-differences"><strong>1.2. Key Technical Differences</strong></h3>
<p>The main difference is how they handle security and isolation. Daytona uses containers (Docker/OCI) by default, which are fast and convenient but share the host system&apos;s kernel. It can use virtual machines for stronger isolation, but containers are the standard approach.</p>
<p>Microsandbox uses micro-virtualization with the libkrun library. Every sandbox gets its own virtual machine with a dedicated kernel (using KVM on Linux or HVF on macOS). This provides much stronger security than containers since there&apos;s no shared kernel that could be compromised.</p>
<h3 id="13-which-should-you-choose"><strong>1.3. Which Should You Choose?</strong></h3>
<p>Choose <strong>Daytona</strong> if you want a complete, supported platform for team development environments and you&apos;re comfortable with container-based security. It&apos;s good for organizations that need lots of features and don&apos;t mind the complexity of Kubernetes deployment.</p>
<p>Choose <strong>Microsandbox</strong> if security is your top priority and you need to run truly untrusted code. It&apos;s better for building custom systems where you need maximum isolation, even if the feature set is more limited.</p>
<h3 id="14-key-differences-table"><strong>1.4. Key Differences Table</strong></h3>
<p>This table shows the main differences between the two platforms:</p>
<table>
<thead>
<tr>
<th style="text-align:left">Feature</th>
<th style="text-align:left">Daytona</th>
<th style="text-align:left">Microsandbox</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left"><strong>Primary Focus</strong></td>
<td style="text-align:left">Complete development platform for teams</td>
<td style="text-align:left">Secure execution of untrusted code</td>
</tr>
<tr>
<td style="text-align:left"><strong>Core Isolation Tech</strong></td>
<td style="text-align:left">Containers (Docker) with optional VMs</td>
<td style="text-align:left">Virtual machines (libkrun)</td>
</tr>
<tr>
<td style="text-align:left"><strong>Startup Time</strong></td>
<td style="text-align:left">90-200ms</td>
<td style="text-align:left">Under 200ms</td>
</tr>
<tr>
<td style="text-align:left"><strong>Deployment</strong></td>
<td style="text-align:left">Cloud service or self-hosted (Kubernetes)</td>
<td style="text-align:left">Self-hosted only (simple install)</td>
</tr>
<tr>
<td style="text-align:left"><strong>Data Persistence</strong></td>
<td style="text-align:left">Full state management with snapshots</td>
<td style="text-align:left">Files saved to host directory</td>
</tr>
<tr>
<td style="text-align:left"><strong>License</strong></td>
<td style="text-align:left">AGPL-3.0 (restrictive)</td>
<td style="text-align:left">Apache-2.0 (permissive)</td>
</tr>
<tr>
<td style="text-align:left"><strong>Maturity</strong></td>
<td style="text-align:left">Mature, well-funded (21k+ stars)</td>
<td style="text-align:left">Early stage, community-driven (3.3k+ stars)</td>
</tr>
</tbody>
</table>
<h2 id="section-2-how-they-work"><strong>Section 2: How They Work</strong></h2>
<h3 id="21-daytona-architecture"><strong>2.1. Daytona Architecture</strong></h3>
<p>Daytona is built as a complex, multi-component system. Most of it (80-90%) is written in TypeScript, including the web backend (Next.js) and dashboard (React). The CLI and performance-critical parts use Go (10% of the code). This setup allows fast feature development with TypeScript while using Go for system-level tasks.</p>
<p>Daytona is designed to run and test AI coding assistants in controlled environments. It integrates with enterprise tools, runs on Kubernetes (managed with Helm charts), and uses Terraform for infrastructure. This makes it powerful but complex: you need DevOps expertise to run it yourself.</p>
<p>For security, Daytona uses <a href="https://docs.docker.com/get-started/overview/?ref=pixeljets.com">Docker containers</a> by default. This provides basic isolation that works for most development scenarios. You can configure stronger isolation with tools like <a href="https://github.com/nestybox/sysbox?ref=pixeljets.com">Sysbox</a> or VMs, but containers are the standard. While Daytona claims &quot;zero risk,&quot; containers share the host kernel, so a sophisticated attack could potentially escape the container and affect the host system.</p>
<h3 id="22-microsandbox-architecture"><strong>2.2. Microsandbox Architecture</strong></h3>
<p>Microsandbox takes a different approach, focusing on security and simplicity over features. It&apos;s a lean server that creates and manages virtual machines for code execution. It&apos;s written almost entirely in Rust, which provides memory safety and performance - important for security software.</p>
<p>Microsandbox&apos;s key feature is hardware-level isolation. It uses libkrun, a library that works with hypervisors (KVM on Linux, HVF on macOS). Each sandbox gets its own virtual machine with a dedicated kernel. This means if an attacker breaks out of one sandbox, they only compromise that single VM, not the host system or other sandboxes. This is much more secure than containers.</p>
<p>The project is built as separate modules: a core library, server, CLI, and specialized libraries for filesystems and data. This modular design makes it easier to integrate into custom systems, unlike Daytona&apos;s all-in-one platform approach.</p>
<h3 id="23-how-libkrun-works"><strong>2.3. How libkrun Works</strong></h3>
<p>Microsandbox uses <a href="https://github.com/containers/libkrun?ref=pixeljets.com">libkrun</a>, a library that makes VM-based isolation easy to use. The goals are simplicity, low resource usage, and fast boot times. It includes its own Virtual Machine Monitor (VMM) so it doesn&apos;t need external tools like <a href="https://www.qemu.org/?ref=pixeljets.com">QEMU</a>. It borrows code from optimized projects like <a href="https://github.com/firecracker-microvm/firecracker?ref=pixeljets.com">Firecracker</a> and only includes essential features, keeping it lightweight and secure.</p>
<p>libkrun has two important features for Microsandbox: networking and filesystem access.</p>
<p>For networking, it uses <strong>Transparent Socket Impersonation (TSI)</strong>. Instead of creating virtual network cards, it intercepts network calls from the VM and handles them on the host. This means all traffic looks like it&apos;s coming from the Microsandbox process itself. This avoids the complexity of traditional VM networking - no need for virtual switches or NAT rules.</p>
<p>For file access, it uses <a href="https://virtio-fs.gitlab.io/?ref=pixeljets.com">virtio-fs</a> to share a host directory directly with the guest VM. This is faster than emulating a full disk and allows easy file sharing between host and guest - which is how Microsandbox saves files.</p>
<p>These different technical choices reflect different goals. Daytona uses many technologies (TypeScript, Go, React, Kubernetes) to build features quickly and integrate with enterprise systems. It&apos;s trying to be a complete platform. Microsandbox uses fewer technologies (Rust and libkrun) to focus on security and performance for one specific job.</p>
<p>Choosing between them means choosing different philosophies. Daytona gives you a complete platform with managed workflows. Microsandbox gives you a security component that you build around. They also require different skills: Daytona needs Kubernetes and DevOps knowledge, while Microsandbox needs systems programming and Rust skills. Your choice depends on your team&apos;s expertise and goals.</p>
<h2 id="section-3-performance-comparison"><strong>Section 3: Performance Comparison</strong></h2>
<h3 id="31-startup-time-claims"><strong>3.1. Startup Time Claims</strong></h3>
<p>Both platforms emphasize fast startup times, which matters for short-lived tasks and AI workflows that need quick iteration.</p>
<p><strong>Daytona</strong> claims startup times under 90ms. Detailed benchmarks show 71ms for creation, 67ms for execution, and 59ms for cleanup (197ms total). Other sources mention 200ms startup times. The variation suggests 90ms is a best-case scenario when container images are already downloaded.</p>
<p><strong>Microsandbox</strong> claims startup times under 200ms. This is impressive because it&apos;s starting a full virtual machine with its own kernel, which traditionally takes many seconds. The speed comes from the lightweight libkrun VMM and modern hardware virtualization.</p>
<h3 id="32-warm-vs-cold-starts"><strong>3.2. Warm vs Cold Starts</strong></h3>
<p>These startup times are for &quot;warm&quot; starts after the initial setup. The first time you create an environment (&quot;cold&quot; start), there will be a delay while downloading the base image. Both platforms mention this in their documentation.</p>
<ul>
<li>Daytona&apos;s &quot;creation time&quot; of 71ms most likely corresponds to the execution of a <code>docker run</code> command or an equivalent container runtime API call on an image that is already present on the host. This is a measure of the container runtime&apos;s efficiency.</li>
<li>Microsandbox&apos;s &quot;boot time&quot; of under 200ms is a measure of the entire microVM initialization process: allocating memory, loading a minimal Linux kernel and initrd, and starting the first user-space process inside the guest. Achieving this speed for a VM likely involves advanced techniques, such as loading a pre-booted kernel state directly into memory from a snapshot, a common optimization in the microVM space to bypass the time-consuming hardware initialization phase of a traditional boot sequence (similar to how <a href="https://github.com/firecracker-microvm/firecracker?ref=pixeljets.com">Firecracker achieves fast startup</a>).</li>
</ul>
<h3 id="33-performance-beyond-startup"><strong>3.3. Performance Beyond Startup</strong></h3>
<p>For long-running processes, the initial startup time becomes less critical than sustained runtime performance. Here, the architectural differences lead to different performance profiles.</p>
<p><strong>Daytona</strong>, relying on containers, offers near-native execution performance for CPU-bound tasks. Since there is no hardware virtualization layer between the application and the host CPU, instructions run directly on the processor. The primary source of overhead comes from the security layers of the container runtime. System calls (syscalls) made by the application must pass through security filters like seccomp-bpf, and I/O operations for networking and filesystems are mediated by the container daemon. This introduces a small but measurable overhead compared to running directly on the host.</p>
<p><strong>Microsandbox</strong>, using microVMs, introduces a layer of hardware virtualization. CPU-bound tasks will incur some overhead from the constant switching between &quot;guest mode&quot; (running the sandboxed code) and &quot;host mode&quot; (running the VMM). Modern CPU virtualization extensions (Intel VT-x, AMD-V) are highly optimized to minimize this overhead, but it is not zero. I/O operations are handled by virtio paravirtualized drivers. These drivers are designed for high efficiency in virtualized environments, but they are not entirely zero-cost compared to a native syscall on the host. The <a href="https://github.com/containers/libkrun?ref=pixeljets.com">libkrun documentation</a> acknowledges that its virtio-fs implementation, while flexible, does not match the performance of dedicated block-based storage devices.</p>
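<p>To put these per-operation overheads in perspective, it helps to measure the baseline cost of a syscall on the host. The rough micro-benchmark below (plain Python, host-only; it measures neither seccomp filtering nor guest/host transitions, only their order of magnitude) times a real syscall against a pure user-space call:</p>
<pre><code class="language-python">import os
import timeit

N = 200_000

def noop():
    return 0

# os.getpid() issues a real syscall on Linux (glibc no longer caches it);
# noop() never leaves user space. Both include Python call overhead.
call_ns = timeit.timeit(noop, number=N) / N * 1e9
syscall_ns = timeit.timeit(os.getpid, number=N) / N * 1e9

print(f&quot;user-space call: {call_ns:6.0f} ns/op&quot;)
print(f&quot;getpid syscall:  {syscall_ns:6.0f} ns/op&quot;)
</code></pre>
<p>Both figures land in the nanosecond range on modern hardware. A seccomp filter adds a comparable per-call cost, and a guest/host transition can be costlier still, but either way the overhead only becomes visible in syscall-heavy I/O workloads.</p>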
<p>Both projects focus heavily on fast startup times, which suggests they&apos;re designed for short-lived tasks. This works well for AI model testing where you might create and destroy thousands of sandboxes quickly, or for serverless functions.</p>
<p>However, for long-running processes (1+ hours), startup time doesn&apos;t matter much. A 110ms difference is roughly 0.003% of an hour-long task. For long-running workloads, you should focus instead on runtime stability, I/O performance, resource usage, and how well data persists over time.</p>
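<p>The arithmetic behind that claim is worth spelling out:</p>
<pre><code class="language-python"># Worst-case gap between the two platforms&apos; startup claims, as a share
# of a one-hour workload.
startup_gap_ms = 200 - 90
task_ms = 60 * 60 * 1000
share = startup_gap_ms / task_ms
print(f&quot;{share:.5%}&quot;)  # prints: 0.00306%
</code></pre>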
<h2 id="section-4-setup-and-usage"><strong>Section 4: Setup and Usage</strong></h2>
<p>This section includes practical examples showing how to create and manage sandboxes with both platforms.</p>
<h3 id="41-setting-up-daytona"><strong>4.1. Setting Up Daytona</strong></h3>
<p>Setting up Daytona involves multiple steps. You install the CLI, start the server, connect to Git providers (GitHub/GitLab), and configure providers for where workspaces will run (Docker, AWS, etc.). Finally, you set targets that determine where new workspaces get created.</p>
<p>Daytona needs substantial infrastructure: at least 4 vCPUs, 16GB RAM, 200GB disk space, and a <a href="https://kubernetes.io/?ref=pixeljets.com">Kubernetes</a> cluster. It uses <a href="https://helm.sh/?ref=pixeljets.com">Helm charts</a> for deployment. While this setup is powerful and flexible for complex environments, it requires significant DevOps expertise to manage.</p>
<p><strong>Example: Creating a Sandbox with Daytona SDK</strong></p>
<pre><code class="language-typescript">// daytona-example.ts
import { Daytona } from &quot;@daytonaio/sdk&quot;;

// Initialize the Daytona client
const daytona = new Daytona({
  apiKey: process.env.DAYTONA_API_KEY,
  serverUrl: &quot;https://api.daytona.io&quot;
});

async function createPythonSandbox() {
  try {
    // Create a new sandbox
    const sandbox = await daytona.sandbox.create({
      name: &quot;python-analysis&quot;,
      image: &quot;python:3.11&quot;,
      envVars: {
        &quot;PYTHONPATH&quot;: &quot;/workspace&quot;,
        &quot;DATA_SOURCE&quot;: &quot;https://api.example.com/data&quot;
      }
    });

    console.log(`Sandbox created: ${sandbox.id}`);

    // Upload a Python script
    await daytona.sandbox.uploadFile(sandbox.id, {
      filePath: &quot;/workspace/analyze.py&quot;,
      content: `
import pandas as pd
import requests
import os

def analyze_data():
    # Fetch data from API
    response = requests.get(os.getenv(&apos;DATA_SOURCE&apos;))
    data = response.json()
    
    # Process with pandas
    df = pd.DataFrame(data)
    result = df.describe()
    
    # Save results
    result.to_csv(&apos;/workspace/results.csv&apos;)
    print(&quot;Analysis complete!&quot;)

if __name__ == &quot;__main__&quot;:
    analyze_data()
`
    });

    // Execute the script
    const execution = await daytona.sandbox.execute(sandbox.id, {
      command: &quot;pip install pandas requests &amp;&amp; python analyze.py&quot;
    });

    console.log(&quot;Output:&quot;, execution.output);

    // Download results
    const results = await daytona.sandbox.downloadFile(
      sandbox.id, 
      &quot;/workspace/results.csv&quot;
    );
    
    console.log(&quot;Results downloaded:&quot;, results);

  } catch (error) {
    console.error(&quot;Error:&quot;, error);
  }
}

createPythonSandbox();
</code></pre>
<p><strong>Example: devcontainer.json Configuration</strong></p>
<pre><code class="language-json">{
  &quot;name&quot;: &quot;AI Analysis Environment&quot;,
  &quot;image&quot;: &quot;python:3.11-slim&quot;,
  &quot;features&quot;: {
    &quot;ghcr.io/devcontainers/features/common-utils:2&quot;: {
      &quot;installZsh&quot;: true,
      &quot;configureZshAsDefaultShell&quot;: true
    },
    &quot;ghcr.io/devcontainers/features/python:1&quot;: {
      &quot;version&quot;: &quot;3.11&quot;,
      &quot;installJupyterlab&quot;: true
    }
  },
  &quot;customizations&quot;: {
    &quot;vscode&quot;: {
      &quot;extensions&quot;: [
        &quot;ms-python.python&quot;,
        &quot;ms-toolsai.jupyter&quot;
      ]
    }
  },
  &quot;postCreateCommand&quot;: &quot;pip install pandas numpy matplotlib requests&quot;,
  &quot;remoteUser&quot;: &quot;vscode&quot;,
  &quot;workspaceFolder&quot;: &quot;/workspace&quot;,
  &quot;mounts&quot;: [
    &quot;source=${localWorkspaceFolder}/data,target=/workspace/data,type=bind&quot;
  ],
  &quot;forwardPorts&quot;: [8888, 5000],
  &quot;portsAttributes&quot;: {
    &quot;8888&quot;: {
      &quot;label&quot;: &quot;Jupyter Lab&quot;,
      &quot;onAutoForward&quot;: &quot;openBrowser&quot;
    }
  }
}
</code></pre>
<h3 id="42-setting-up-microsandbox"><strong>4.2. Setting Up Microsandbox</strong></h3>
<p>Microsandbox is much simpler to install. You run one command (<code>curl -sSL https://get.microsandbox.dev | sh</code>) to download and install it, then start the server with <code>msb server start</code>.</p>
<p>The trade-off is that Microsandbox has specific hardware requirements. It needs Linux with <a href="https://www.linux-kvm.org/?ref=pixeljets.com">KVM</a> enabled, or macOS with Apple Silicon (M1/M2/M3/M4). Windows support is planned but not available yet. Instead of managing complex infrastructure like Kubernetes, you just need to ensure your host machine has the right virtualization features.</p>
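<p>Before installing on Linux, it is worth a quick pre-flight check that the host actually exposes the required virtualization features. The Microsandbox installer performs its own checks; this sketch just inspects the standard <code>/dev/kvm</code> device node and <code>/proc/cpuinfo</code> flags:</p>
<pre><code class="language-python">import os
import platform

def kvm_available() -&gt; bool:
    # KVM exposes itself as a device node on Linux.
    return platform.system() == &quot;Linux&quot; and os.path.exists(&quot;/dev/kvm&quot;)

def hw_virt_flags() -&gt; bool:
    # vmx = Intel VT-x, svm = AMD-V
    try:
        with open(&quot;/proc/cpuinfo&quot;) as f:
            cpuinfo = f.read()
    except OSError:
        return False
    return &quot;vmx&quot; in cpuinfo or &quot;svm&quot; in cpuinfo

print(&quot;KVM device node present:&quot;, kvm_available())
print(&quot;CPU virtualization flags:&quot;, hw_virt_flags())
</code></pre>
<p>If the device node is missing on a cloud VM, check whether your provider supports nested virtualization; without KVM, Microsandbox cannot start its microVMs on Linux.</p>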
<p><strong>Example: Creating a Sandbox with Microsandbox SDK</strong></p>
<pre><code class="language-typescript">// microsandbox-example.ts
import { Microsandbox } from &quot;@microsandbox/sdk&quot;;

// Initialize the Microsandbox client
const msb = new Microsandbox({
  serverUrl: &quot;http://localhost:8080&quot; // Default local server
});

async function createPythonAnalysisSandbox() {
  try {
    // Create a new sandbox
    const sandbox = await msb.sandbox.create({
      image: &quot;python:3.11-alpine&quot;,
      command: &quot;sh&quot;,
      env: {
        &quot;PYTHONPATH&quot;: &quot;/workspace&quot;,
        &quot;DATA_SOURCE&quot;: &quot;https://api.example.com/data&quot;
      },
      workdir: &quot;/workspace&quot;
    });

    console.log(`Sandbox created: ${sandbox.id}`);

    // Write Python analysis script
    await msb.sandbox.writeFile(sandbox.id, &quot;/workspace/analyze.py&quot;, `
import json
import urllib.request
import os
import csv
from collections import Counter

def fetch_and_analyze():
    # Fetch data from API
    url = os.getenv(&apos;DATA_SOURCE&apos;, &apos;https://jsonplaceholder.typicode.com/posts&apos;)
    
    with urllib.request.urlopen(url) as response:
        data = json.loads(response.read().decode())
    
    # Simple analysis
    user_counts = Counter(post[&apos;userId&apos;] for post in data)
    
    # Save results
    with open(&apos;/workspace/analysis.csv&apos;, &apos;w&apos;, newline=&apos;&apos;) as f:
        writer = csv.writer(f)
        writer.writerow([&apos;User ID&apos;, &apos;Post Count&apos;])
        for user_id, count in user_counts.items():
            writer.writerow([user_id, count])
    
    print(f&quot;Analyzed {len(data)} posts from {len(user_counts)} users&quot;)
    return user_counts

if __name__ == &quot;__main__&quot;:
    result = fetch_and_analyze()
    print(&quot;Top 3 users:&quot;, dict(list(result.most_common(3))))
`);

    // Execute the analysis
    const result = await msb.sandbox.exec(sandbox.id, {
      command: &quot;python analyze.py&quot;,
      timeout: 30000
    });

    console.log(&quot;Execution output:&quot;, result.stdout);
    
    if (result.stderr) {
      console.error(&quot;Errors:&quot;, result.stderr);
    }

    // Read the results file
    const analysisResults = await msb.sandbox.readFile(
      sandbox.id, 
      &quot;/workspace/analysis.csv&quot;
    );
    
    console.log(&quot;Analysis results:&quot;, analysisResults);

    // Clean up
    await msb.sandbox.destroy(sandbox.id);
    console.log(&quot;Sandbox destroyed&quot;);

  } catch (error) {
    console.error(&quot;Error:&quot;, error);
  }
}

createPythonAnalysisSandbox();
</code></pre>
<p><strong>Example: Sandboxfile Configuration</strong></p>
<pre><code class="language-yaml"># Sandboxfile
name: python-analysis-env
image: python:3.11-alpine

# Environment setup
setup:
  - apk add --no-cache curl git
  - pip install --no-cache-dir requests pandas numpy

# Working directory
workdir: /workspace

# Environment variables
env:
  PYTHONPATH: /workspace
  PYTHON_ENV: sandbox
  TZ: UTC

# Resource limits
resources:
  memory: 512m
  cpu: 0.5
  disk: 1g

# Networking (optional)
network:
  internet: true
  ports:
    - 8000:8000  # Expose port for web services

# Persistence (optional)
volumes:
  - ./data:/workspace/data:ro
  - ./output:/workspace/output

# Security settings
security:
  read_only_root: false
  no_new_privileges: true
  user: 1000:1000
</code></pre>
<p><strong>Example: Quick Command-Line Usage</strong></p>
<pre><code class="language-bash"># Create ephemeral sandbox for quick tasks
msx python:3.11 &quot;pip install requests &amp;&amp; python -c &apos;import requests; print(requests.get(\&quot;https://api.github.com\&quot;).status_code)&apos;&quot;

# Create persistent project sandbox
msr python:3.11
# This creates ./menv directory for persistence

# Run commands in persistent sandbox
msr exec &quot;pip install pandas&quot;
msr exec &quot;python data_analysis.py&quot;

# View sandbox status
msr status

# Stop sandbox (keeps files in ./menv)
msr stop
</code></pre>
<h3 id="43-configuration-and-management"><strong>4.3. Configuration and Management</strong></h3>
<p>Configuration and daily management also reflect the platforms&apos; differing philosophies.</p>
<p><strong>Daytona</strong> uses <a href="https://containers.dev/?ref=pixeljets.com">devcontainer.json</a> files for project configuration. This is a widely adopted format, especially in <a href="https://code.visualstudio.com/docs/devcontainers/containers?ref=pixeljets.com">VS Code</a>, making it familiar to many developers. Management is offered through a comprehensive CLI for scripting and automation (<code>daytona create</code>, <code>daytona code</code>, etc.) and a full-featured web dashboard for graphical management. This provides a polished user experience catering to different preferences.</p>
<p><strong>Microsandbox</strong> uses a custom <code>Sandboxfile</code> for defining project-based environments. This file is similar to a Dockerfile or Vagrantfile. Management is performed exclusively through its command-line tools (<code>msb</code>, <code>msr</code>, <code>msx</code>, <code>msi</code>), which is ideal for automation and users comfortable in the terminal but lacks a graphical user interface.</p>
<p>The concept of &quot;ease of deployment&quot; is therefore subjective and highly dependent on the user&apos;s role and the scale of the deployment. Daytona&apos;s workflow is analogous to a modern PaaS platform. It abstracts the underlying infrastructure from the end-developer, providing them with a simple, standardized environment, while giving platform administrators powerful tools (Kubernetes, Helm) for management and control. For a platform engineering team tasked with providing a managed service to an entire organization, Daytona&apos;s structured, Kubernetes-native approach may be considered &quot;easier&quot; to integrate into their existing CI/CD pipelines and infrastructure-as-code practices.</p>
<p>Conversely, Microsandbox offers the experience of a classic developer command-line tool. It is direct, transparent, and highly scriptable. The user interacts directly with the server process running on their local or remote machine. For an individual developer or a small team needing to get a secure sandbox up and running immediately on a compatible machine, Microsandbox&apos;s single installation script is undeniably &quot;easier.&quot; The choice reflects the intended scale: Daytona is built for organizational scale, abstracting complexity for the many. Microsandbox is built for developer or small-team scale, offering simplicity for the few.</p>
<h2 id="section-5-long-running-workloads"><strong>Section 5: Long-Running Workloads</strong></h2>
<p>How well each platform handles long-running, I/O-heavy tasks depends on their filesystem and networking systems.</p>
<h3 id="51-filesystem-persistence"><strong>5.1. Filesystem Persistence</strong></h3>
<p><strong>Daytona</strong> provides a comprehensive API-driven approach to filesystem management. Its SDKs for Python and TypeScript offer methods for listing files, creating directories, changing permissions, and uploading/downloading files between the local machine and the remote sandbox. Daytona supports &quot;Volumes&quot; for persistent storage that&apos;s separate from individual sandboxes, similar to Kubernetes. The platform&apos;s sandboxes are &quot;stateful,&quot; preserving the filesystem across multiple interactions. This API-first approach is ideal for building automated workflows.</p>
<p><strong>Microsandbox</strong> implements persistence more directly. When using its &quot;project sandbox&quot; feature, any changes made to the filesystem inside the guest VM are automatically saved to a <code>./menv</code> directory on the host machine. This uses virtio-fs technology to map the host directory into the guest. While this makes the sandbox&apos;s state easily inspectable from the host, it&apos;s subject to the maturity of the virtio-fs implementation. Some GitHub issues report I/O errors when mounting single files and problems with Unix sockets. This suggests the filesystem layer is still maturing and may have rough edges for complex I/O patterns.</p>
<h3 id="52-networking-capabilities"><strong>5.2. Networking Capabilities</strong></h3>
<p><strong>Daytona</strong> offers flexible networking tools for developers building network services. The <code>daytona forward</code> CLI command allows port forwarding from a workspace to your local machine for testing. It can also generate shareable public URLs for ports by routing traffic through Daytona&apos;s proxy service, which is valuable for collaboration and demos. The platform also supports VPN tunnels to development environments. This feature set is well-suited for web application development.</p>
<p><strong>Microsandbox</strong>&apos;s networking is handled by libkrun&apos;s Transparent Socket Impersonation (TSI). This system provides zero-configuration networking, which is a major advantage for simple use cases. However, it may be more limited than Daytona&apos;s explicit tooling. TSI is limited to IPv4 UDP and TCP sockets, which covers most common use cases but may exclude others. Some GitHub issues report unexpected hangs when processes try to make requests to 127.0.0.1, pointing to potential networking problems. Community feedback also indicates desire for more advanced network configurations, like per-sandbox VPN routing, which isn&apos;t currently supported.</p>
<h3 id="53-managing-long-running-processes"><strong>5.3. Managing Long-Running Processes</strong></h3>
<p><strong>Daytona</strong> has a first-class API concept called &quot;Sessions&quot; for managing background and long-running processes. The SDK provides functions to create, monitor (getSession), and delete these sessions. This makes the platform well-suited for running services, daemons, or any task that needs to persist in the background, as it provides the necessary programmatic hooks for lifecycle management.</p>
<p><strong>Microsandbox</strong> supports long-running processes through its stateful persistence model. A user can execute a command to run in the background within the guest VM, and as long as the sandbox isn&apos;t stopped or destroyed, that process will continue to run. Because the filesystem state is preserved across stop/start cycles, a service can be configured to restart automatically when the sandbox is brought back up. However, Microsandbox currently lacks a high-level API like Daytona&apos;s &quot;Sessions&quot; for managing these background processes. Instead, it relies on standard Linux process management tools (e.g., <code>nohup</code>, <code>systemd</code>) within the sandbox itself.</p>
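<p>Because Microsandbox defers to standard Linux tooling here, the pattern inside the guest is the classic detach-and-pidfile approach. A minimal sketch in plain Python (<code>sleep 60</code> stands in for a real service; nothing below is Microsandbox-specific):</p>
<pre><code class="language-python">import os
import signal
import subprocess

# Start a stand-in long-running service, detached from the current session
# (the same effect nohup + &amp; gives you in a shell), and record its PID.
proc = subprocess.Popen(
    [&quot;sleep&quot;, &quot;60&quot;],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    start_new_session=True,
)
with open(&quot;service.pid&quot;, &quot;w&quot;) as f:
    f.write(str(proc.pid))

# Later, possibly from another shell in the same sandbox: check and stop it.
pid = int(open(&quot;service.pid&quot;).read())
os.kill(pid, 0)               # signal 0 = liveness check; raises if gone
print(&quot;service is running, pid&quot;, pid)
os.kill(pid, signal.SIGTERM)  # graceful shutdown
proc.wait()
os.remove(&quot;service.pid&quot;)
</code></pre>
<p>The same flow works with <code>nohup your-service &amp;</code> plus <code>kill</code> from a shell; the point is that lifecycle management lives inside the sandbox rather than behind a platform API.</p>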
<p>The comparison of I/O capabilities reveals a classic trade-off between maturity and innovation. Daytona employs well-established, battle-tested patterns for filesystem and network I/O: API-driven file management, explicit port forwarding, and concepts like Volumes borrowed from mature container orchestrators. These methods are reliable and feature-rich. Microsandbox, on the other hand, leverages innovative but less mature technologies like virtio-fs for direct host mapping and TSI for zero-configuration networking. While these approaches are elegant and powerful in theory, the evidence from the project&apos;s issue tracker suggests they still have rough edges that need to be smoothed out.</p>
<p>For a production workload that runs for over an hour and is highly sensitive to I/O stability and features&#x2014;such as a database server or a public-facing web application&#x2014;Daytona&apos;s mature and explicit I/O model currently presents a lower risk. Microsandbox&apos;s approach is highly promising and may become more robust over time, but for now, it appears better suited for long-running, compute-centric tasks where I/O requirements are simpler and the paramount concern is the security of the execution environment itself.</p>
<h2 id="section-6-project-status-and-licensing"><strong>Section 6: Project Status and Licensing</strong></h2>
<h3 id="61-community-and-support"><strong>6.1. Community and Support</strong></h3>
<p>The two projects exist at vastly different stages of maturity and are supported by different ecosystem models.</p>
<p><strong>Daytona</strong> is a well-funded commercial product, having raised over $5 million in venture capital. This financial backing is reflected in the project&apos;s polish and scale. It has a large and active community, with over 21,000 stars and 191 contributors on GitHub. Daytona offers a commercial PaaS product, Daytona Cloud, alongside its self-hostable version. The platform provides enterprise-grade capabilities like GPU support for machine learning workloads and role-based access control (RBAC). The project maintains extensive documentation and shows high activity with frequent releases and a clear public changelog.</p>
<p><strong>Microsandbox</strong> is a younger, community-driven open-source project. It has a smaller but growing community with approximately 3,300 GitHub stars. The project is highly active, but development appears to be driven by a smaller core team. It&apos;s positioned as a self-hosted, open-source alternative to commercial cloud sandboxing solutions. Its long-term viability depends on its ability to foster community engagement and achieve wider adoption. It is currently in beta state, indicating it&apos;s not yet considered production-ready by its authors.</p>
<h3 id="62-licensing-implications"><strong>6.2. Licensing Implications</strong></h3>
<p>The choice of open-source license is a critical and strategic differentiator between the two projects.</p>
<p><strong>Daytona</strong> is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). This is a &quot;strong copyleft&quot; license with a specific clause for network-accessible services. It mandates that if a modified version of the Daytona software is run on a server and made accessible to users over a network, the complete source code of that modified version must also be made publicly available. This license is a strategic choice designed to create a commercial funnel. It makes it very difficult for another company to use Daytona&apos;s open-source code to build a competing commercial PaaS offering. This effectively encourages users to either contribute their modifications back to the public project or purchase a commercial license from Daytona.</p>
<p><strong>Microsandbox</strong> is licensed under the Apache License 2.0. This is a permissive license that places very few restrictions on users. It allows developers to freely use, modify, and distribute the software, and to incorporate it into proprietary, closed-source commercial products without any obligation to release their modifications.</p>
<p>This licensing difference is not a minor detail; it is a core element of each project&apos;s business model and philosophy. Daytona&apos;s AGPL-3.0 license is designed to protect its commercial interests and drive sales. Microsandbox&apos;s Apache-2.0 license is designed to encourage the widest possible adoption and integration, even within commercial products. For an organization evaluating these tools, this presents a critical build-versus-buy decision point. Using Microsandbox is a &quot;build&quot; decision; it provides a foundational component that can be freely built upon to create a custom, proprietary platform. Using the open-source version of Daytona forces a &quot;buy-in&quot; decision&#x2014;either buying into the open-source obligations of the AGPL or buying a commercial license. This makes Microsandbox a more flexible choice for organizations that intend to create and sell a product that includes a sandboxing component.</p>
<h3 id="63-documentation-and-developer-experience"><strong>6.3. Documentation and Developer Experience</strong></h3>
<p>The developer experience and quality of documentation also reflect the different maturity levels and resource allocations of the projects.</p>
<p><strong>Daytona</strong> provides extensive, well-structured, and polished documentation. The documentation site is built with modern tools like Astro and MDX and is maintained as a dedicated repository. It includes detailed guides, a full CLI reference, and complete SDK references for both Python and TypeScript, making it easy for developers to get started. The SDKs are professionally packaged and distributed on standard repositories like npm and pip, ensuring a smooth installation process.</p>
<p><strong>Microsandbox</strong>&apos;s documentation is more basic. The primary source of information is the comprehensive README.md file in the main GitHub repository. While the project has a documentation website (docs.microsandbox.dev), the site was inaccessible at the time of writing. The SDKs are also available on npm and pip, but the overall developer experience is that of a powerful, but still-in-beta, tool focused on core functionality rather than polished presentation.</p>
<h2 id="section-7-related-technologies-alternatives"><strong>Section 7: Related Technologies &amp; Alternatives</strong></h2>
<p>Beyond Daytona and Microsandbox, the sandboxing ecosystem includes several other notable technologies and platforms worth considering:</p>
<h3 id="71-core-sandboxing-technologies"><strong>7.1. Core Sandboxing Technologies</strong></h3>
<p><strong>Firecracker</strong>: The AWS-developed VMM that powers many platforms including e2b. Creates lightweight microVMs with 125ms startup times and minimal memory overhead.</p>
<p><strong>gVisor</strong>: Google&apos;s application kernel that intercepts system calls, providing stronger isolation than containers without requiring hardware virtualization.</p>
<p><strong>libkrun</strong>: The library powering Microsandbox. Enables embedded microVM creation with hardware isolation and fast startup times.</p>
<h3 id="72-alternative-platforms"><strong>7.2. Alternative Platforms</strong></h3>
<p><strong>e2b</strong>: AI-focused sandboxing platform using Firecracker microVMs. Offers both SaaS and self-hosted options with Apache-2.0 licensing. Targets AI use cases similar to Daytona&apos;s.</p>
<p><strong>WebContainers</strong>: StackBlitz&apos;s browser-native Node.js runtime using WebAssembly. Eliminates server infrastructure by running entirely in browsers, ideal for lightweight prototyping.</p>
<p><strong>Cloudflare Workers</strong>: Edge computing platform using V8 Isolates with 0ms cold starts. Best for short-running functions distributed globally.</p>
<p><strong>Kata Containers</strong>: Hybrid container/VM runtime providing OCI compatibility with VM-level security. Integrates with Kubernetes while offering hardware isolation.</p>
<h3 id="73-development-environment-platforms"><strong>7.3. Development Environment Platforms</strong></h3>
<p><strong>Gitpod</strong>: Open-source CDE using containers with zero-trust architecture. AGPL-3.0 licensed like Daytona but focuses on Git-based ephemeral environments.</p>
<p><strong>Coder</strong>: Enterprise CDE using Terraform for flexible provisioning. Can deploy across containers, Kubernetes, or VMs depending on security needs.</p>
<p><strong>CodeSandbox</strong>: Hybrid approach using browser sandboxes for frontend and microVMs for full-stack development. Popular for prototyping and code sharing.</p>
<h3 id="74-how-they-compare"><strong>7.4. How They Compare</strong></h3>
<p>These alternatives highlight different trade-offs:</p>
<ul>
<li><strong>e2b</strong> offers similar AI focus to Daytona but with microVM security and permissive licensing</li>
<li><strong>WebContainers</strong> eliminates infrastructure costs but limits functionality to browser environments</li>
<li><strong>Cloudflare Workers</strong> provides global edge distribution but restricts to short-running functions</li>
<li><strong>Kata Containers</strong> delivers VM security with container compatibility but requires Kubernetes expertise</li>
<li><strong>Gitpod/Coder</strong> focus on development environments rather than general code execution</li>
</ul>
<h2 id="section-8-final-recommendations"><strong>Section 8: Final Recommendations</strong></h2>
<h3 id="81-summary"><strong>8.1. Summary</strong></h3>
<p>Choosing between Daytona and Microsandbox involves fundamental trade-offs based on your priorities around security, operations, and licensing:</p>
<ul>
<li><strong>Complete Platform vs. Security Tool:</strong> Daytona manages the entire development process. Microsandbox focuses on one thing: running untrusted code securely.</li>
<li><strong>Containers vs. Virtual Machines:</strong> Daytona uses containers (fast, compatible, shared kernel). Microsandbox uses VMs (more secure, isolated kernels, newer technology).</li>
<li><strong>Web Interface vs. Command Line:</strong> Daytona has a web UI and abstracts complexity. Microsandbox is CLI-focused and more direct.</li>
<li><strong>Commercial vs. Open Source:</strong> Daytona is commercially backed with restrictive licensing. Microsandbox is community-driven with permissive licensing.</li>
<li><strong>Proven vs. Innovative:</strong> Daytona uses established technologies. Microsandbox uses newer approaches that may have rough edges.</li>
</ul>
<h3 id="82-which-to-choose"><strong>8.2. Which to Choose</strong></h3>
<p>Based on this analysis:</p>
<p><strong>Choose Daytona if:</strong></p>
<ul>
<li>You want a complete development platform for your team</li>
<li>You&apos;re running mostly trusted code from internal developers</li>
<li>You need lots of features, web UI, team management, and cloud integrations</li>
<li>You&apos;re okay with AGPL licensing or plan to buy a commercial license</li>
<li>You have DevOps and Kubernetes expertise</li>
</ul>
<p><strong>Choose Microsandbox if:</strong></p>
<ul>
<li>Security is your top priority and you need to run fully untrusted code</li>
<li>You&apos;re building a custom platform and need a secure execution component</li>
<li>You want permissive licensing (Apache-2.0) for commercial use</li>
<li>You prefer simple deployment and CLI tools</li>
<li>You have systems expertise and can work with beta-stage software</li>
</ul>
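<p>The two checklists above can be condensed into a rough rule-of-thumb function. This is purely illustrative: the input flags are invented for this sketch, and a real evaluation will weigh far more factors than four booleans.</p>

```javascript
// Illustrative only: the "Which to Choose" checklists above as a tiny
// scoring function. All field names are invented for this sketch.
function recommendSandbox({ untrustedCode, needsWebUi, commercialProduct, betaTolerant }) {
  let daytona = 0;
  let microsandbox = 0;
  // Fully untrusted code favors microVM isolation; mostly trusted internal
  // code is well served by Daytona's container model.
  if (untrustedCode) microsandbox += 1; else daytona += 1;
  // Web UI, team management, and cloud integrations are Daytona strengths.
  if (needsWebUi) daytona += 1;
  // Embedding in a proprietary product favors Apache-2.0 over AGPL-3.0.
  if (commercialProduct) microsandbox += 1;
  // Microsandbox is beta-stage software; low risk tolerance favors Daytona.
  if (!betaTolerant) daytona += 1;
  return daytona >= microsandbox ? 'daytona' : 'microsandbox';
}
```

<p>For example, &quot;we run fully untrusted code and ship a commercial product&quot; scores toward Microsandbox, while &quot;internal developers, need a web UI, can&apos;t run beta software&quot; scores toward Daytona.</p>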
<h3 id="83-future-outlook"><strong>8.3. Future Outlook</strong></h3>
<p>Both projects are on promising but divergent trajectories.</p>
<p><strong>Daytona</strong> is well-funded and positioned to become a major player in AI agent infrastructure and enterprise development environment management. Its future development will likely focus on expanding enterprise features, deepening integrations with cloud providers and AI tools, and further polishing its user experience to solidify its commercial leadership.</p>
<p><strong>Microsandbox</strong> represents accessible, lightweight virtualization. Its future success will depend on its ability to stabilize and mature its I/O systems (networking and filesystem persistence) and to cultivate a larger, more active contributor community. If it can overcome these hurdles, it has the potential to become a standard open-source component for secure sandboxing, similar to the role Firecracker plays for AWS Lambda.</p>
<p>Ultimately, the choice between them today is a choice between a mature, feature-rich, and commercially-oriented product (Daytona) and a newer, security-first tool built on innovative virtualization technology (Microsandbox).</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Solo, bootstrapped, minimal]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>Okay, so if you&apos;re not into building a big startup with investors and splitting equity, you can try going solo and bootstrapped. Will it work for you? I don&apos;t know. It works for me, as far as I can tell - and I&apos;m a</p>]]></description><link>https://pixeljets.com/blog/solo-bootstrapped/</link><guid isPermaLink="false">67fcb99c5f1df50428458985</guid><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Mon, 14 Apr 2025 08:24:12 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>Okay, so if you&apos;re not into building a big startup with investors and splitting equity, you can try going solo and bootstrapped. Will it work for you? I don&apos;t know. It works for me, as far as I can tell - and I&apos;m a tech person - by no means a genius, but I still can <a href="https://qwintry.com/?ref=pixeljets.com">package</a> <a href="https://scrapeninja.net/?ref=pixeljets.com">technology</a> <a href="https://oakpdf.com/?ref=pixeljets.com">into a</a> <a href="https://apiroad.net/?ref=pixeljets.com">product</a>.<br>
This post is an attempt to organize my scattered and fragmented observations and thoughts about bootstrapping small SaaS products - into a long (and still disjointed) text.</p>
<h2 id="the-freedom-of-being-solo">The freedom of being solo</h2>
<p>Being the only founder means no messy partner agreements or confusing profit splits. Bootstrapped means we&apos;re not running around pitching to VCs - we&apos;re just focusing on solving a real user problem. If there&apos;s no buyer, there&apos;s no revenue, end of story. This simplicity brings clarity: you can make decisions fast without endless discussions.</p>
<p>Sure, places like Y Combinator prefer multiple co-founders (if one burns out, the other steps in). But that often translates into compromise after compromise, especially when you&apos;re both technical. Instead of pushing the product and landing deals, you might end up arguing about frameworks and code reviews. Sometimes it works (like early Google), but more often it stalls the project, and then you discover that no one actually wants the product.<br>
When you&apos;re a single founder, the buck really does stop with you. Mistakes? You learn immediately and fix them. That&apos;s motivating.<br>
<img src="https://pixeljets.com/blog/content/images/2025/04/solo-founders-graph.png" alt="solo-founders-graph" loading="lazy"></p>
<h2 id="personal-health-company-health">Personal health == Company health</h2>
<p>You quickly realize that your own health&#x2014;both mental and physical - is your company&#x2019;s most critical asset. It&#x2019;s not just &#x201C;personal well-being&#x201D; anymore; it&#x2019;s about keeping the product alive. Skip a run or bail on that cold plunge, and your brain feels stuck in low gear. Sure, you can still show up physically, but your focus and creativity tank - an impossible situation when you&#x2019;re trying to build long-term products and tackle tough decisions.</p>
<p>There&#x2019;s also a kind of unspoken rule: if you consistently skip your exercise or your daily mind-clear ritual, it&#x2019;s almost like not coming to work at all. You can squeak by for a few days, especially if you see others cruising on coffee and sporadic bursts of energy. But a year or two down the line, neglecting your physical and mental well-being means your product suffers. You&#x2019;re half-asleep, burning daylight in an office chair, and your spark is gone. If you want to keep innovating and thinking several steps ahead, your &#x201C;personal health engine&#x201D; has to fire on all cylinders.</p>
<p><a href="https://pixeljets.com/blog/morning-sports-is-my-happiness-magic-pill/">Mornings can be rough</a> - maybe it&#x2019;s cold, maybe you&#x2019;re sore, and the last thing you want is a run. But you know the difference it makes: once you&#x2019;ve pushed through that resistance, your mind lights up, you&#x2019;re more present, and you&#x2019;re able to consistently do the heavy lifting that a single-founder role demands. Think of it less like a luxury and more like essential maintenance on the machine that&#x2019;s powering your company.</p>
<h2 id="the-sales-reality-check">The sales reality check</h2>
<p>Sales become a huge reality check once you&#x2019;re a solo tech founder. Suddenly, &#x201C;Who are my customers, and why should they pay me?&#x201D; is an urgent daily question. It&#x2019;s a crash course in marketing, positioning, and real human conversations&#x2014;no longer an abstract problem for some &#x201C;business partner&#x201D; to handle. If you&#x2019;ve never had to sell before, it can feel both terrifying and exhilarating. You realize it&#x2019;s your job to answer the toughest question: &#x201C;Does anyone actually want this?&#x201D;</p>
<p>Yet there&#x2019;s a common piece of advice&#x2014;&#x201C;Don&#x2019;t build if you don&#x2019;t know how to sell&#x201D;&#x2014;that can be a trap. If you take it too literally, you might end up building something dull, just because you think it will be easy to sell. That kills your motivation in the long run; boredom is the enemy of good products. Sure, you need to understand how to reach your audience, but if you&#x2019;re not at least partially passionate about the solution, it&#x2019;s going to show. People pick up on that lack of excitement; it makes you a less convincing salesperson.</p>
<p>I built <a href="https://scrapeninja.net/?ref=pixeljets.com">ScrapeNinja web scraping API</a> because I needed to scrape a specific site - and found the process exciting. That passion pushed me to keep iterating, figuring out the marketing and sales along the way. Sure, it was challenging, but having a genuine spark for your own product makes selling a more natural, less forced experience. When you care about what you&#x2019;ve built, you&#x2019;ll do whatever it takes to connect with the right people. That drive&#x2014;not a formulaic sales strategy&#x2014;is what truly keeps you and your product alive.</p>
<h2 id="dealing-with-isolation">Dealing with isolation</h2>
<p>But watch out for loneliness. When you are the entire company, you can&apos;t rely on coworkers to bounce ideas off - you are all the coworkers. You have to be comfortable spending time on your own. At the same time, you need community outside your own head: talk to friends, family (if they&apos;re not toxic), and other founders. Those conversations spark fresh insights and remind you you&apos;re not the only crazy person trying to do something big.</p>
<h2 id="the-art-of-delegation">The art of delegation</h2>
<p>Delegating is non-negotiable. Doing the same repetitive tasks over and over again is a huge time drain and kills your motivation. Once you understand a process and it&apos;s working, hand it off to someone else. If you absolutely hate sales, at least do it yourself enough times to figure out the basics, then hire or outsource.</p>
<h2 id="choosing-the-right-niche">Choosing the right niche</h2>
<p>Picking a niche that&#x2019;s too ambitious or capital-intensive can sink an indie dev before they even start. If you&#x2019;re building a super high-tech AI solution requiring an army of PhDs or trying to dethrone a mega-corporation with billions in funding, you&#x2019;ll get stuck scaling those massive complexities. On the flip side, if the scope is so tiny that it&#x2019;s basically a weekend hobby project, you won&#x2019;t see real traction - or enough revenue to keep you motivated. Aim for a middle ground: a problem that&#x2019;s valid, clear, and solvable with your current skill set and resources.</p>
<p>You want something you can launch quickly (I never launch a product if I cannot implement the main, core feature of the product in 3 weeks max), pivot if needed, and see tangible results. Focus on niches you find interesting enough to stick with during the inevitable tough times. Solve a problem that a specific user base actually cares about, but don&#x2019;t go for a target market of ten people. Likewise, avoid fads that might disappear next month. Steady, proven demand plus your personal curiosity is the sweet spot that can sustain an indie dev journey.</p>
<h2 id="minimalistic-lifestyle-cost-control">Minimalistic lifestyle &amp; cost control</h2>
<p>Keeping your personal expenses low is one of the smartest moves you can make as a solo founder. The lower your costs, the longer your runway &#x2014; meaning you don&#x2019;t have to scramble to raise money or get desperate for quick revenue. Bootstrapping is a marathon. It&#x2019;s about being content with a simpler lifestyle so you can focus your resources on the business. That could mean sharing an apartment, cooking meals at home, or getting rid of monthly subscriptions you don&#x2019;t actually need. Little sacrifices add up, and the peace of mind you get from having more time is priceless. But you&#x2019;re not trying to punish yourself or prove how frugal you can be - if I really want this bike right now, I just get it.</p>
<p>Minimalism also frees up mental space. When you&#x2019;re not juggling a bunch of bills or worrying about living paycheck to paycheck, you can allow yourself to be &#x201C;bored&#x201D; and just think. That boredom, ironically, is where some of the best ideas often come from. Instead of constantly rushing between tasks, you can take leisurely walks in the park&#x2014;giving your brain the breathing room it needs to solve problems creatively.</p>
<h2 id="the-freedom-is-the-real-wealth">Freedom is the real wealth</h2>
<p>I have a friend who (from time to time) suggests, &#x201C;Why do you keep messing around alone with these indie, bootstrapped projects? Let&#x2019;s hire five developers, get some funding, and sell for tens of millions at least.&#x201D; And for a while, that proposition made me feel bad. I couldn&#x2019;t explain why, but I knew deep down that path wasn&#x2019;t for me, at least not now. But why was I feeling bad? Part of me felt like maybe I was being &#x201C;unambitious&#x201D; by not building the classic venture-backed startup. But the more I thought about it, the more I realized that a glossy, investor-friendly narrative just wasn&#x2019;t my personal compass. I found genuine fulfillment doing things exactly the way I wanted &#x2014; no board meetings, no forced decisions, no external scripts to follow.</p>
<p>&quot;Unambitious&quot;...<br>
Bootstrapping is not about rejecting ambition altogether. Ambition is crucial; without it, we&#x2019;re just couch potatoes going nowhere. The real trick is recognizing and honoring your own ambitions rather than the ones people try to pin on you. There&#x2019;s a certain kind of wealth that doesn&#x2019;t show up on your bank statements or in a piece of real estate in Dubai - it&#x2019;s the wealth of deciding what you do with your own time, every day, without being &quot;lost&quot; in life in a philosophical sense - you have a clear trajectory. Your own trajectory. When you&#x2019;re not locked into someone else&#x2019;s objectives, you can pivot, pause, or double down the moment you sense it&#x2019;s right. Some folks assume venture capital is an easy way to &#x201C;do nothing&#x201D; while your employees build your dream. In reality, it&#x2019;s a high-stakes game with a completely different set of pressures. By staying solo, you&#x2019;re never a hostage to outside demands; you work like crazy, but on your own terms, fueled by your own passion. That, in my book, is real freedom.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Building n8n web crawler for RAG]]></title><description><![CDATA[<p>This week, I&#x2019;m introducing a new project at ScrapeNinja: a <strong>recursive web crawler, packed into an n8n community node</strong>. It isn&#x2019;t just another scraper - it&#x2019;s an advanced, powerful open-source tool that executes in your local n8n instance and can be used to harvest</p>]]></description><link>https://pixeljets.com/blog/building-n8n-web-crawler-for-rag/</link><guid isPermaLink="false">67a490905f1df5042845889f</guid><category><![CDATA[webscraping]]></category><category><![CDATA[n8n]]></category><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Thu, 06 Feb 2025 10:51:12 GMT</pubDate><content:encoded><![CDATA[<p>This week, I&#x2019;m introducing a new project at ScrapeNinja: a <strong>recursive web crawler, packed into an n8n community node</strong>. It isn&#x2019;t just another scraper - it&#x2019;s an advanced, powerful open-source tool that executes in your local n8n instance and can be used to harvest huge amounts of data, for example I use it to consolidate technical documentation (many web pages) into a clean Markdown file that I can feed into a large language model (LLM) for retrieval augmented generation (RAG) and other advanced use cases.</p><p>Proceed to the code and installation instructions: <a href="https://github.com/restyler/n8n-nodes-scrapeninja?ref=pixeljets.com">https://github.com/restyler/n8n-nodes-scrapeninja</a></p><figure class="kg-card kg-embed-card kg-card-hascaption"><iframe width="200" height="150" src="https://www.youtube.com/embed/y9khNc9AjRs?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen title="Building knowledgebases for LLMs: n8n recursive web crawler"></iframe><figcaption>Video demo of ScrapeNinja 
n8n web crawler</figcaption></figure><p></p><p>Over the past few months, I&#x2019;ve tested different approaches for cleaning raw HTML and creating efficient scraping pipelines using <a href="https://n8n.io/?ref=pixeljets.com">n8n</a>. Since I have some experience building complex web scrapers using real coding, I quickly realized that the n8n ecosystem could be significantly improved with better web scraping tools. As a result, I have built the <a href="https://www.npmjs.com/package/n8n-nodes-scrapeninja?ref=pixeljets.com">ScrapeNinja n8n community node</a>. <br>Read <a href="https://pixeljets.com/blog/web-scraping-in-n8n/">my blog post about web scraping in n8n</a> - and in this post, I will talk about a new feature of the ScrapeNinja node: the web crawler.</p><h3 id="the-need-for-specialized-llm-knowledge-bases">The Need for Specialized LLM Knowledge Bases<br></h3><p>Large language models have changed how we interact with data, but they&#x2019;re often missing domain-specific insights. By structuring documentation into a single Markdown file, we give LLMs the context they need to produce accurate, detailed answers.<br></p><p>I&#x2019;ve observed that feeding a curated knowledge base into an LLM as a prompt (I am so glad we now get 128k+ context windows!) dramatically improves response quality and code generation. This realization led me to build a tool that automates the process of building knowledge bases from websites in a scalable, repeatable way.</p><h3 id="why-n8n"><br>Why n8n?</h3><p>I love n8n: it is an open-source, self-hosted workflow engine where complex automations can be built using low code. It has an amazing community. </p><h3 id="web-scraping-challenges">Web Scraping Challenges<br></h3><p>There are plenty of web scraping tutorials for n8n, but real-world web scraping is much harder once you try to do something useful.
Common issues include:</p><p>&#x2022;&#x2003;<strong>Messy HTML Output:</strong> Raw HTML often includes scripts, styles, and irrelevant tags that confuse both humans and LLMs.</p><p>&#x2022;&#x2003;<strong>Manual URL Management:</strong> Listing URLs by hand becomes tedious, especially as documentation grows.</p><p>&#x2022;&#x2003;<strong>Resource-Heavy Operations:</strong> Spinning up a new browser session for every page is slow and resource-hungry.</p><p>I have built the <a href="https://scrapeninja.net/?ref=pixeljets.com">ScrapeNinja web scraping API</a>, and I know how painful it is to maintain a web scraper.<br>As I have 10+ years of experience working as a software developer, most of my web scrapers are built using code (Node.js), but I have recently started using n8n more and more for my own tasks, since it is sometimes easier to maintain scrapers in n8n than to maintain a Node.js application. Once something breaks, checking n8n execution logs is so much better than realizing I don&apos;t even remember where a particular Node.js process is running - I literally have 10+ cloud servers. n8n also allows you to use JS code anywhere, which is a very nice feature compared to other no-code platforms that are &quot;too no-code&quot; for me. I don&apos;t like to feel restricted. n8n is also self-hosted and <a href="https://docs.n8n.io/hosting/installation/server-setups/docker-compose/?ref=pixeljets.com">runs perfectly well via a docker compose file</a>. Just try it.<br></p><h3 id="the-birth-of-the-scrapeninja-n8n-crawler">The Birth of the ScrapeNinja n8n Crawler</h3><p>The ScrapeNinja crawler was another evolutionary step for me: I had already shipped the ScrapeNinja scraping n8n node and gained some experience building complex n8n nodes.
The crawler is far more complex than my <code>Scrape</code> n8n operations, which scrape a single page (and are included within the same n8n node).</p><p><a href="https://github.com/restyler/n8n-nodes-scrapeninja?ref=pixeljets.com">GitHub repo of the ScrapeNinja n8n community node</a><br></p><p>I developed the ScrapeNinja n8n crawler to address these issues by combining workflow automation with flexible scraping capabilities. Key highlights include:</p><p>&#x2022;&#x2003;<strong>Recursive Traversal:</strong> It follows links based on URL rules until a page limit is reached or no new URLs are found.</p><p>&#x2022;&#x2003;<strong>Hybrid Extraction Techniques:</strong> It uses the <a href="https://scrapeninja.net/?ref=pixeljets.com">ScrapeNinja API</a> for /scrape or /scrape-js (fast raw requests or full browser rendering), while a local &#x201C;Extract primary content&#x201D; operation runs inside the n8n node.</p><p>&#x2022;&#x2003;<strong>Detailed Logging:</strong> All actions are logged in Postgres (via Supabase), simplifying debugging and performance checks.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2025/02/2025-02-06-at-15.38.png" class="kg-image" alt loading="lazy" width="2000" height="1583" srcset="https://pixeljets.com/blog/content/images/size/w600/2025/02/2025-02-06-at-15.38.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2025/02/2025-02-06-at-15.38.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2025/02/2025-02-06-at-15.38.png 1600w, https://pixeljets.com/blog/content/images/2025/02/2025-02-06-at-15.38.png 2314w" sizes="(min-width: 720px) 720px"><figcaption>The crawler node returns summary stats and MANY logs for debugging. 
They are also available in Postgres tables.</figcaption></figure><p>I even used it on ScrapeNinja&#x2019;s own documentation to confirm it could produce a single Markdown file from multiple pages.</p><h3 id="how-it-works-a-technical-overview">How It Works: A Technical Overview</h3><p>1.&#x2003;<strong>Initialization:</strong> The crawler starts with a seed URL.</p><p>2.&#x2003;<strong>Page Fetching with ScrapeNinja API:</strong> Depending on your configuration, it calls /scrape for fast raw requests or /scrape-js for browser-rendered pages, handling JavaScript-heavy sites.</p><p>3.&#x2003;<strong>Local Content Extraction:</strong> Once a page is fetched, the primary content is extracted locally, removing scripts, ads, and unnecessary tags to yield clear Markdown text.</p><p>4.&#x2003;<strong>Recursive Link Extraction:</strong> The crawler extracts and queues links, continuing until it hits the page limit or runs out of new links.</p><p>5.&#x2003;<strong>Data Storage and Logging:</strong> Processed pages and logs (in JSON) are stored in a Postgres database, aiding in visibility and troubleshooting.</p><h3 id="things-always-go-wrong-in-web-scraping">Things always go wrong in web scraping</h3><p>Crawling is a really complex process, and many things can go wrong. To mitigate this, the crawler node logs everything. I mean, everything. Take a look at these log entries:</p><!--kg-card-begin: markdown--><pre><code>[
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Fetching page \&quot;https://scrapeninja.net\&quot; using ScrapeNinja&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;runId&quot;: 2
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:12.011Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Selected URL \&quot;https://scrapeninja.net\&quot; for processing&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;depth&quot;: 0,
      &quot;runId&quot;: 2,
      &quot;queueId&quot;: 17
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:12.046Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Sending request to https://scrapeninja.p.rapidapi.com/scrape&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;runId&quot;: 2,
      &quot;engine&quot;: &quot;scrape&quot;,
      &quot;marketplace&quot;: &quot;rapidapi&quot;
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:12.072Z&quot;
  },
  {
    &quot;level&quot;: &quot;info&quot;,
    &quot;message&quot;: &quot;First page link analysis for \&quot;https://scrapeninja.net\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;runId&quot;: 2,
      &quot;links_ignored&quot;: 10,
      &quot;crawl_external&quot;: false,
      &quot;links_included&quot;: 14,
      &quot;exclude_patterns&quot;: [],
      &quot;include_patterns&quot;: [],
      &quot;total_links_found&quot;: 24,
      &quot;sample_ignored_links&quot;: [
        &quot;https://www.producthunt.com/posts/scrapeninja?utm_source=badge-top-post-badge&amp;utm_medium=badge&amp;utm_souce=badge-scrapeninja&quot;,
        &quot;https://rapidapi.com/restyler/api/scrapeninja&quot;,
        &quot;https://rapidapi.com/restyler/api/scrapeninja/pricing&quot;,
        &quot;https://pipedream.com/apps/pipedream/integrations/scrapeninja&quot;,
        &quot;https://apiroad.net/proxy&quot;,
        &quot;https://pixeljets.com/blog/bypass-cloudflare/&quot;,
        &quot;https://pixeljets.com/blog/browser-as-api-web-scraping/&quot;,
        &quot;https://github.com/restyler/scrapeninja-api-php-client&quot;,
        &quot;https://www.make.com/en/integrations/scrapeninja&quot;,
        &quot;https://t.me/scrapeninja&quot;
      ],
      &quot;sample_included_links&quot;: [
        &quot;https://scrapeninja.net/&quot;,
        &quot;https://scrapeninja.net/docs/n8n/&quot;,
        &quot;https://scrapeninja.net/scraper-sandbox?slug=hackernews&quot;,
        &quot;https://scrapeninja.net/docs/&quot;,
        &quot;https://scrapeninja.net/docs/proxy-setup/&quot;,
        &quot;https://scrapeninja.net/curl-to-scraper&quot;,
        &quot;https://scrapeninja.net/scraper-sandbox&quot;,
        &quot;https://scrapeninja.net/cheerio-sandbox&quot;,
        &quot;https://scrapeninja.net/docs/make.com/&quot;,
        &quot;https://scrapeninja.net/openapi.yaml&quot;
      ]
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:13.554Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;ScrapeNinja response info for \&quot;https://scrapeninja.net\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;runId&quot;: 2,
      &quot;pageTitle&quot;: &quot;ScrapeNinja Web Scraping API: Turns websites into data, on scale. &#x1F680;&quot;,
      &quot;statusCode&quot;: 200
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:13.580Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Queued 14 new URLs for crawling&quot;,
    &quot;metadata&quot;: {
      &quot;runId&quot;: 2,
      &quot;parentUrl&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;linksQueued&quot;: 14,
      &quot;currentDepth&quot;: 0
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:13.620Z&quot;
  },
  {
    &quot;level&quot;: &quot;info&quot;,
    &quot;message&quot;: &quot;Successfully processed page \&quot;https://scrapeninja.net\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;depth&quot;: 0,
      &quot;run_id&quot;: 2,
      &quot;status&quot;: &quot;completed&quot;,
      &quot;max_pages&quot;: 5,
      &quot;latency_ms&quot;: 1448,
      &quot;parent_url&quot;: null,
      &quot;links_found&quot;: 24,
      &quot;queue_stats&quot;: {
        &quot;total&quot;: 15,
        &quot;failed&quot;: 0,
        &quot;pending&quot;: 14,
        &quot;completed&quot;: 1
      },
      &quot;links_queued&quot;: 14,
      &quot;processed_pages&quot;: 1
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:13.638Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Selected URL \&quot;https://scrapeninja.net/\&quot; for processing&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/&quot;,
      &quot;depth&quot;: 1,
      &quot;runId&quot;: 2,
      &quot;queueId&quot;: 18
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:13.655Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Fetching page \&quot;https://scrapeninja.net/\&quot; using ScrapeNinja&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/&quot;,
      &quot;runId&quot;: 2
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:13.673Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Sending request to https://scrapeninja.p.rapidapi.com/scrape&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/&quot;,
      &quot;runId&quot;: 2,
      &quot;engine&quot;: &quot;scrape&quot;,
      &quot;marketplace&quot;: &quot;rapidapi&quot;
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:13.677Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;ScrapeNinja response info for \&quot;https://scrapeninja.net/\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/&quot;,
      &quot;runId&quot;: 2,
      &quot;pageTitle&quot;: &quot;ScrapeNinja Web Scraping API: Turns websites into data, on scale. &#x1F680;&quot;,
      &quot;statusCode&quot;: 200
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:14.862Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Queued 0 new URLs for crawling&quot;,
    &quot;metadata&quot;: {
      &quot;runId&quot;: 2,
      &quot;parentUrl&quot;: &quot;https://scrapeninja.net/&quot;,
      &quot;linksQueued&quot;: 0,
      &quot;currentDepth&quot;: 1
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:14.885Z&quot;
  },
  {
    &quot;level&quot;: &quot;info&quot;,
    &quot;message&quot;: &quot;Successfully processed page \&quot;https://scrapeninja.net/\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/&quot;,
      &quot;depth&quot;: 1,
      &quot;run_id&quot;: 2,
      &quot;status&quot;: &quot;completed&quot;,
      &quot;max_pages&quot;: 5,
      &quot;latency_ms&quot;: 1108,
      &quot;parent_url&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;links_found&quot;: 24,
      &quot;queue_stats&quot;: {
        &quot;total&quot;: 15,
        &quot;failed&quot;: 0,
        &quot;pending&quot;: 13,
        &quot;completed&quot;: 2
      },
      &quot;links_queued&quot;: 0,
      &quot;processed_pages&quot;: 2
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:14.901Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Selected URL \&quot;https://scrapeninja.net/docs/n8n/\&quot; for processing&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/docs/n8n/&quot;,
      &quot;depth&quot;: 1,
      &quot;runId&quot;: 2,
      &quot;queueId&quot;: 19
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:14.916Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Fetching page \&quot;https://scrapeninja.net/docs/n8n/\&quot; using ScrapeNinja&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/docs/n8n/&quot;,
      &quot;runId&quot;: 2
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:14.933Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Sending request to https://scrapeninja.p.rapidapi.com/scrape&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/docs/n8n/&quot;,
      &quot;runId&quot;: 2,
      &quot;engine&quot;: &quot;scrape&quot;,
      &quot;marketplace&quot;: &quot;rapidapi&quot;
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:14.935Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;ScrapeNinja response info for \&quot;https://scrapeninja.net/docs/n8n/\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/docs/n8n/&quot;,
      &quot;runId&quot;: 2,
      &quot;pageTitle&quot;: &quot;Using ScrapeNinja with n8n | ScrapeNinja&quot;,
      &quot;statusCode&quot;: 200
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:18.783Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Queued 9 new URLs for crawling&quot;,
    &quot;metadata&quot;: {
      &quot;runId&quot;: 2,
      &quot;parentUrl&quot;: &quot;https://scrapeninja.net/docs/n8n/&quot;,
      &quot;linksQueued&quot;: 9,
      &quot;currentDepth&quot;: 1
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:18.816Z&quot;
  },
  {
    &quot;level&quot;: &quot;info&quot;,
    &quot;message&quot;: &quot;Successfully processed page \&quot;https://scrapeninja.net/docs/n8n/\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/docs/n8n/&quot;,
      &quot;depth&quot;: 1,
      &quot;run_id&quot;: 2,
      &quot;status&quot;: &quot;completed&quot;,
      &quot;max_pages&quot;: 5,
      &quot;latency_ms&quot;: 3806,
      &quot;parent_url&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;links_found&quot;: 23,
      &quot;queue_stats&quot;: {
        &quot;total&quot;: 24,
        &quot;failed&quot;: 0,
        &quot;pending&quot;: 21,
        &quot;completed&quot;: 3
      },
      &quot;links_queued&quot;: 9,
      &quot;processed_pages&quot;: 3
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:18.833Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Selected URL \&quot;https://scrapeninja.net/cdn-cgi/l/email-protection\&quot; for processing&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/cdn-cgi/l/email-protection&quot;,
      &quot;depth&quot;: 1,
      &quot;runId&quot;: 2,
      &quot;queueId&quot;: 31
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:18.860Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Fetching page \&quot;https://scrapeninja.net/cdn-cgi/l/email-protection\&quot; using ScrapeNinja&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/cdn-cgi/l/email-protection&quot;,
      &quot;runId&quot;: 2
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:18.877Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Sending request to https://scrapeninja.p.rapidapi.com/scrape&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/cdn-cgi/l/email-protection&quot;,
      &quot;runId&quot;: 2,
      &quot;engine&quot;: &quot;scrape&quot;,
      &quot;marketplace&quot;: &quot;rapidapi&quot;
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:18.885Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;ScrapeNinja response info for \&quot;https://scrapeninja.net/cdn-cgi/l/email-protection\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/cdn-cgi/l/email-protection&quot;,
      &quot;runId&quot;: 2,
      &quot;pageTitle&quot;: &quot;Email Protection | Cloudflare&quot;,
      &quot;statusCode&quot;: 200
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:20.005Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Queued 0 new URLs for crawling&quot;,
    &quot;metadata&quot;: {
      &quot;runId&quot;: 2,
      &quot;parentUrl&quot;: &quot;https://scrapeninja.net/cdn-cgi/l/email-protection&quot;,
      &quot;linksQueued&quot;: 0,
      &quot;currentDepth&quot;: 1
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:20.030Z&quot;
  },
  {
    &quot;level&quot;: &quot;info&quot;,
    &quot;message&quot;: &quot;Successfully processed page \&quot;https://scrapeninja.net/cdn-cgi/l/email-protection\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/cdn-cgi/l/email-protection&quot;,
      &quot;depth&quot;: 1,
      &quot;run_id&quot;: 2,
      &quot;status&quot;: &quot;completed&quot;,
      &quot;max_pages&quot;: 5,
      &quot;latency_ms&quot;: 1113,
      &quot;parent_url&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;links_found&quot;: 4,
      &quot;queue_stats&quot;: {
        &quot;total&quot;: 24,
        &quot;failed&quot;: 0,
        &quot;pending&quot;: 20,
        &quot;completed&quot;: 4
      },
      &quot;links_queued&quot;: 0,
      &quot;processed_pages&quot;: 4
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:20.046Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Selected URL \&quot;https://scrapeninja.net/curl-to-php\&quot; for processing&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/curl-to-php&quot;,
      &quot;depth&quot;: 1,
      &quot;runId&quot;: 2,
      &quot;queueId&quot;: 30
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:20.062Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Fetching page \&quot;https://scrapeninja.net/curl-to-php\&quot; using ScrapeNinja&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/curl-to-php&quot;,
      &quot;runId&quot;: 2
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:20.082Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Sending request to https://scrapeninja.p.rapidapi.com/scrape&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/curl-to-php&quot;,
      &quot;runId&quot;: 2,
      &quot;engine&quot;: &quot;scrape&quot;,
      &quot;marketplace&quot;: &quot;rapidapi&quot;
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:20.090Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;ScrapeNinja response info for \&quot;https://scrapeninja.net/curl-to-php\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/curl-to-php&quot;,
      &quot;runId&quot;: 2,
      &quot;pageTitle&quot;: &quot;Convert cURL to PHP Web Scraper&quot;,
      &quot;statusCode&quot;: 200
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:21.557Z&quot;
  },
  {
    &quot;level&quot;: &quot;debug&quot;,
    &quot;message&quot;: &quot;Queued 0 new URLs for crawling&quot;,
    &quot;metadata&quot;: {
      &quot;runId&quot;: 2,
      &quot;parentUrl&quot;: &quot;https://scrapeninja.net/curl-to-php&quot;,
      &quot;linksQueued&quot;: 0,
      &quot;currentDepth&quot;: 1
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:21.583Z&quot;
  },
  {
    &quot;level&quot;: &quot;info&quot;,
    &quot;message&quot;: &quot;Successfully processed page \&quot;https://scrapeninja.net/curl-to-php\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;url&quot;: &quot;https://scrapeninja.net/curl-to-php&quot;,
      &quot;depth&quot;: 1,
      &quot;run_id&quot;: 2,
      &quot;status&quot;: &quot;completed&quot;,
      &quot;max_pages&quot;: 5,
      &quot;latency_ms&quot;: 1451,
      &quot;parent_url&quot;: &quot;https://scrapeninja.net&quot;,
      &quot;links_found&quot;: 6,
      &quot;queue_stats&quot;: {
        &quot;total&quot;: 24,
        &quot;failed&quot;: 0,
        &quot;pending&quot;: 19,
        &quot;completed&quot;: 5
      },
      &quot;links_queued&quot;: 0,
      &quot;processed_pages&quot;: 5
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:21.599Z&quot;
  },
  {
    &quot;level&quot;: &quot;info&quot;,
    &quot;message&quot;: &quot;Reached maximum pages (5), stopping crawler&quot;,
    &quot;metadata&quot;: {
      &quot;maxPages&quot;: 5,
      &quot;processedPages&quot;: 5
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:21.605Z&quot;
  },
  {
    &quot;level&quot;: &quot;info&quot;,
    &quot;message&quot;: &quot;Crawler process completed for run \&quot;2\&quot;&quot;,
    &quot;metadata&quot;: {
      &quot;maxPages&quot;: 5,
      &quot;processedPages&quot;: 5
    },
    &quot;created_at&quot;: &quot;2025-02-06T11:36:21.647Z&quot;
  }
]
</code></pre>
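<p>Under the hood, the step-by-step flow from the overview above boils down to a queue-driven loop. Here is a minimal sketch of that loop; <code>fetchPage</code> and <code>extractLinks</code> are hypothetical stand-ins for the ScrapeNinja API call and the local link extraction, not the node&apos;s actual code:</p>

```javascript
// Minimal sketch of the crawl loop: seed URL in, queue of {url, depth} items,
// stop at the page limit or when no new links remain.
// fetchPage() and extractLinks() are hypothetical stand-ins, not the node's code.
async function crawl(seedUrl, { maxPages = 5, fetchPage, extractLinks }) {
  const queue = [{ url: seedUrl, depth: 0 }];
  const seen = new Set([seedUrl]);
  const pages = [];

  while (queue.length > 0 && pages.length < maxPages) {
    const { url, depth } = queue.shift();      // select next URL from the queue
    const html = await fetchPage(url);         // remote fetch via /scrape or /scrape-js
    pages.push({ url, depth, html });          // store the processed page

    for (const link of extractLinks(html)) {   // recursive link extraction
      if (!seen.has(link)) {                   // deduplicate before queueing
        seen.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return pages;
}
```

<p>The real node additionally persists every queue transition to Postgres, which is what produces the log entries above.</p>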
<!--kg-card-end: markdown--><h3 id="api-powered-vs-local-extraction">API-Powered vs. Local Extraction</h3><p></p><p>A key design choice was splitting remote and local responsibilities:</p><p>&#x2022;&#x2003;<strong>Remote Rendering with ScrapeNinja API:</strong> For speed or when a full browser environment is required, the /scrape or /scrape-js endpoints handle the page fetch.</p><p>&#x2022;&#x2003;<strong>Local &#x201C;Extract Primary Content&#x201D; Operation:</strong> The actual text extraction runs locally inside n8n, which offers control over processing and reduces load on remote systems.</p><p>I want to note that currently <strong>you need a ScrapeNinja API key (ScrapeNinja is a cloud SaaS) to launch the n8n web crawler</strong> - but you do not need a ScrapeNinja API key for other locally-running operations. ScrapeNinja has a free plan.</p><figure class="kg-card kg-image-card"><img src="https://pixeljets.com/blog/content/images/2025/02/2025-02-06-at-15.54.png" class="kg-image" alt loading="lazy" width="796" height="928" srcset="https://pixeljets.com/blog/content/images/size/w600/2025/02/2025-02-06-at-15.54.png 600w, https://pixeljets.com/blog/content/images/2025/02/2025-02-06-at-15.54.png 796w" sizes="(min-width: 720px) 720px"></figure><p><strong>For instance, &quot;Clean up HTML content&quot; and &quot;Extract ...&quot; operations do not require a ScrapeNinja subscription</strong>, while &quot;Scrape ...&quot; and &quot;Crawl&quot; operations require a ScrapeNinja API key.<br></p><p>This setup balances efficiency and flexibility, letting you tailor workflows to your specific requirements.</p><p><br></p><h3 id="getting-started-with-n8n-web-crawler-a-step-by-step-guide">Getting Started with n8n web crawler: A Step-by-Step Guide</h3><p><br></p><p>To set up the ScrapeNinja crawler in n8n:</p><p>1.&#x2003;<strong>Open n8n Dashboard:</strong> Go to <strong>Settings &#x2192; Community Nodes</strong>.</p><p>2.&#x2003;<strong>Install the Node:</strong> Find and install
n8n-nodes-scrapeninja, then restart n8n if prompted.</p><p>3.&#x2003;<strong>Check the Version:</strong> Use <strong>0.4.0 or later</strong> for the ScrapeNinja n8n node. You&#x2019;ll also need your ScrapeNinja API key.</p><p>4.&#x2003;<strong>Configure Your Workflow:</strong> Insert the crawler node, define the seed URL, and set parameters like the maximum page count.</p><p>5.&#x2003;<strong>Run and Monitor:</strong> Execute the workflow, then review logs via JSON output or the Postgres crawler_logs table in Supabase.</p><p><br></p><p>I had it running smoothly on my self-hosted instance in a short time.</p><p><br></p><h3 id="building-powerful-knowledge-bases">Building Powerful Knowledge Bases</h3><p><br></p><p>One standout application is compiling large documentation sets into a unified Markdown file:</p><p>&#x2022;&#x2003;<strong>Your Own Docs:</strong> I tested it on ScrapeNinja documentation to confirm its capabilities.</p><p>&#x2022;&#x2003;<strong>Aggregated Content:</strong> Core text from each page is combined into a single file, ideal for LLM ingestion.</p><p>&#x2022;&#x2003;<strong>Enhanced LLM Responses:</strong> A dedicated knowledge base helps an LLM produce precise, contextual answers that generic training data cannot match.</p><p><br></p><h3 id="monitoring-debugging-and-maintaining-transparency">Monitoring, Debugging, and Maintaining Transparency</h3><p><br></p><p>Web crawling can take minutes or longer, so good monitoring is essential:</p><p>&#x2022;&#x2003;<strong>JSON Output Logs:</strong> Every step is logged in real time, making it easy to integrate with external tools or review later.</p><p>&#x2022;&#x2003;<strong>Postgres Crawler Logs:</strong> Logs are stored in a Postgres database (via Supabase) for persistent records of all operations.</p><p>&#x2022;&#x2003;<strong>Resource Monitoring:</strong> Memory usage and performance are checked continuously to handle multiple pages in parallel without excessive load.</p><p><br></p><p>This has been 
crucial for debugging and performance reviews. In one test, memory stayed under 119 MB even with ten concurrent browser sessions.</p><p><br></p><h3 id="real-world-applications-and-future-directions">Real-World Applications and Future Directions</h3><p><br></p><p>Though initially focused on LLM knowledge bases, the crawler can be adapted for:</p><p>&#x2022;&#x2003;<strong>E-commerce:</strong> Aggregate product data, reviews, and pricing from large catalogs.</p><p>&#x2022;&#x2003;<strong>Content Aggregation:</strong> Collect articles or forum posts for comprehensive data sets.</p><p>&#x2022;&#x2003;<strong>Market Analysis:</strong> Scrape competitor sites for trends and product insights.</p><p>&#x2022;&#x2003;<strong>Academic Research:</strong> Pull data from journals, conferences, or public repositories.</p><p><br></p><p>Future improvements may include better error handling, more advanced parallel processing, and closer integration with vector databases for RAG. Community feedback is welcome to shape the project&#x2019;s roadmap.</p><p><br></p><p><br><br></p><p></p><p><br></p><p>The ScrapeNinja Recursive Web Crawler for n8n is a result of practical insights from working on real-world scraping tasks. By combining the ScrapeNinja API (via /scrape or /scrape-js) with local extraction, it creates an efficient path to build domain-specific knowledge bases and improve LLM performance.</p><p><br></p><p>Try it on your self-hosted n8n setup. Install the node, configure it with your API key, and convert any web pages into structured, actionable data. Your feedback is welcome as I refine this project.</p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[Web scraping in n8n]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>I am a big fan of <a href="https://sh.pixeljets.com/n8n?ref=pixeljets.com">n8n</a> and I am using it for a lot of my projects. 
I love that it provides a self-hosted version and this self-hosted version is not paywalled, as often happens with so-called &quot;open core&quot; products which just use &quot;open source&</p>]]></description><link>https://pixeljets.com/blog/web-scraping-in-n8n/</link><guid isPermaLink="false">67950792d47a1c04104ad64a</guid><category><![CDATA[webscraping]]></category><category><![CDATA[n8n]]></category><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Sat, 25 Jan 2025 16:46:54 GMT</pubDate><content:encoded><![CDATA[<!--kg-card-begin: markdown--><p>I am a big fan of <a href="https://sh.pixeljets.com/n8n?ref=pixeljets.com">n8n</a> and I am using it for a lot of my projects. I love that it provides a self-hosted version and this self-hosted version is not paywalled, as often happens with so-called &quot;open core&quot; products which just use &quot;open source&quot; as a marketing term.</p>
<p>Web scraping in n8n can be both simple and sophisticated, depending on your approach and tools.</p>
<p>In this blog post, I will explore two ways of scraping: basic HTTP requests and advanced scraping techniques using <a href="https://scrapeninja.net/docs/n8n?ref=pixeljets.com">ScrapeNinja n8n integration</a>. Whether you&apos;re building a price monitoring system or gathering competitive intelligence, this guide will help you choose the right approach.</p>
<p>n8n is a powerful low-code automation platform that allows building complex workflows without writing code. While it&apos;s similar to Zapier and Make.com, it offers more technical flexibility and can be self-hosted, making it perfect for data-intensive operations like web scraping.<br>
Compared to custom-built web scrapers, where you can often find yourself digging through cryptic text logs, n8n provides very nice observability out of the box: the &quot;Executions&quot; tab on each scenario lets you explore how everything went and where/how errors, if any, happened. This applies to any scenario in n8n, not just web scraping ones - but if you have ever scraped a real website, you know how often web scrapers break and require maintenance, and that can be genuinely painful!</p>
<p><a href="https://scrapeninja.net/?ref=pixeljets.com">ScrapeNinja</a> is a web scraping API built to mitigate common challenges in modern web scraping. It provides high-performance scraping capabilities with features like real browser TLS fingerprint emulation, proxy rotation, and JavaScript rendering. All this complexity is packed into two simple API endpoints: <code>/scrape?url=&lt;target_website&gt;</code> and <code>/scrape-js?url=&lt;target_website&gt;</code>. There are plenty of params to control ScrapeNinja&apos;s behaviour, but the n8n node simplifies life here: most of these API params have UI controls, so it&apos;s rather easy to figure out how everything works.</p>
<p>The integration between n8n and ScrapeNinja combines the power of workflow automation with enterprise-grade scraping capabilities.</p>
<h2 id="http-node">HTTP Node</h2>
<h3 id="understanding-the-basics">Understanding the Basics</h3>
<p>The HTTP node is your gateway to web scraping (and HTTP requests in general) in n8n. It&apos;s the Swiss Army knife of HTTP requests, capable of GET, POST, PUT, and other methods. While it might seem straightforward, there&apos;s more than meets the eye when it comes to its configuration and retry capabilities.<br>
<img src="https://pixeljets.com/blog/content/images/2025/01/n8n_1.png" alt="n8n HTTP node" loading="lazy"></p>
<h3 id="request-handling">Request Handling</h3>
<p>Once you start using the HTTP node for real-world web scraping, you quickly realize it&apos;s a pretty nice tool with a lot of available settings.</p>
<p>It works fine if you&apos;re scraping your own 10-page website. But if you try scraping another website, you&apos;ll likely run into some obstacles:</p>
<ul>
<li><strong>Default settings don&#x2019;t make much sense.</strong> The n8n HTTP node&#x2019;s default user agent is <code>axios/xx</code>. I recommend setting it to something more realistic, like the latest Chrome version (copy it from Chrome Dev Tools -&gt; &quot;Copy as cURL&quot; or visit <a href="https://www.whatsmyua.info/?ref=pixeljets.com">this website</a> and copy it from there).</li>
<li><strong>HTTP node is concurrent by default.</strong> Unless you enable the &quot;Batching&quot; option, the HTTP node will send all your requests simultaneously. Not very intuitive&#x2014;and a quick way to get your IP banned!<br>
<img src="https://pixeljets.com/blog/content/images/2025/02/2025-02-14-at-23.05.png" alt="Always use the batch option for HTTP node" loading="lazy"></li>
<li><strong>No TLS fingerprinting bypass.</strong> All requests are made via the Axios npm package, meaning they share the default Node.js TLS fingerprint. If the target website uses Cloudflare bot protection, it will detect that your request isn&#x2019;t from a real browser <a href="https://pixeljets.com/blog/bypass-cloudflare/">(even with a proper user agent)</a> and return a 403 &#x2014; no matter what IP you&apos;re using!</li>
<li><strong>No proxy rotation.</strong> The HTTP node doesn&apos;t support proxy rotation. Not surprising, since it&#x2019;s a relatively advanced feature found in dedicated web scraping tools, and the HTTP node was never designed for that purpose.</li>
</ul>
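<p>To make the batching point concrete, here is what serialized batches look like in plain code - a sketch of the concept, not n8n&apos;s actual implementation:</p>

```javascript
// Sketch of what "Batching" buys you: requests go out in small serial batches
// with a pause in between, instead of all firing at once.
// fetchOne is any function that fetches a single URL.
async function fetchInBatches(urls, fetchOne, { batchSize = 1, intervalMs = 1000 } = {}) {
  const results = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    const batch = urls.slice(i, i + batchSize);
    // only `batchSize` requests are ever in flight at the same time
    results.push(...await Promise.all(batch.map(fetchOne)));
    if (i + batchSize < urls.length) {
      await new Promise((r) => setTimeout(r, intervalMs)); // spacing between batches
    }
  }
  return results;
}
```

<p>Without this, every item on the node&apos;s input triggers a simultaneous request - exactly the pattern that gets IPs banned.</p>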
<h3 id="response-handling">Response Handling</h3>
<p>The HTTP node provides several important response configuration options:</p>
<ul>
<li><strong>Response Format</strong>: Automatically detects and parses various formats (JSON, XML, etc.)</li>
<li><strong>Response Headers</strong>: Option to include response headers in the output</li>
<li><strong>Response Status</strong>: Can be configured to succeed even when status code is not 2xx</li>
<li><strong>Never Error</strong>: When enabled, the node never errors regardless of the HTTP status code<br>
<img src="https://pixeljets.com/blog/content/images/2025/01/2025-01-25-at-20.36.png" alt="Never Error setting" loading="lazy"></li>
</ul>
<h3 id="retry-mechanics">Retry Mechanics</h3>
<p>The HTTP node comes with built-in retry functionality that can be a lifesaver when dealing with unstable connections or rate-limited APIs. Like all n8n nodes, it includes a generic retry mechanism for handling failures. However, this basic retry system is often too simplistic for real-world web scraping, where you need granular control over retry conditions based on specific response content or status codes.</p>
<p>Here&apos;s what you need to know about HTTP node retries:</p>
<ul>
<li><strong>Retry Options</strong>: You can set both the number of retries and the wait time between attempts</li>
<li><strong>Generic Nature</strong>: The retry mechanism is designed for general HTTP failures, not specialized scraping scenarios</li>
</ul>
<p>However, these retries are &quot;dumb&quot; - they use the same IP address and request fingerprint, which often isn&apos;t enough for serious scraping operations.</p>
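<p>For contrast, here is what content-aware retries could look like: retry only when the response resembles a block page or a retryable status, with exponential backoff. This is an illustrative sketch (for example, inside a Code node), not an n8n API:</p>

```javascript
// "Smart" retry sketch: inspect status code AND body before deciding to retry,
// backing off exponentially. fetchOne is any function returning {status, body}.
async function fetchWithRetry(url, fetchOne, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastResponse;
  for (let attempt = 0; attempt <= retries; attempt++) {
    lastResponse = await fetchOne(url);
    const blocked =
      lastResponse.status === 403 ||
      lastResponse.status === 429 ||
      /captcha|access denied/i.test(lastResponse.body || '');
    if (!blocked) return lastResponse;
    // exponential backoff before the next attempt
    await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
  }
  throw new Error(`Still blocked after ${retries} retries: ${url}`);
}
```

<p>The built-in mechanism cannot do this, because it only reacts to the node failing outright, not to a 200 response that happens to contain a CAPTCHA page.</p>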
<h3 id="notable-feature-curl-command-import">Notable Feature: cURL Command Import</h3>
<p>One of the most useful features is the ability to import cURL commands directly. This makes it incredibly easy to replicate browser requests - just copy the cURL command from your browser&apos;s developer tools and paste it into n8n. I have encountered some failures of the cURL import feature, due to the outdated npm library n8n uses to parse cURL syntax under the hood, but they only happened on relatively complex requests copy&amp;pasted from the Chrome Dev Tools console - the chances that you will encounter these on simpler requests are pretty low.<br>
<img src="https://pixeljets.com/blog/content/images/2025/01/curl3.gif" alt="n8n cURL import" loading="lazy"></p>
<h3 id="proxy-support-challenges">Proxy Support Challenges</h3>
<p>While the HTTP node does support proxies, there are known issues. As mentioned in the <a href="https://community.n8n.io/t/how-to-make-the-http-request-work-well-via-an-http-proxy/17803?ref=pixeljets.com">n8n community forum</a> and <a href="https://github.com/n8n-io/n8n/issues/7037?ref=pixeljets.com">this GitHub issue</a>, you might run into trouble because the underlying npm library n8n uses for the HTTP node (Axios) does not properly support proxies that require an HTTPS connection via the <code>CONNECT</code> method.</p>
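<p>For context, the mechanism Axios struggles with works like this: to proxy an <code>https://</code> URL, a client must first send a plain-text <code>CONNECT</code> request to the proxy and then run TLS through the resulting tunnel. The helper below only builds that <code>CONNECT</code> preamble, purely as an illustration of the mechanism:</p>

```javascript
// Illustration of HTTPS-over-proxy tunneling: the client opens a TCP
// connection to the proxy and sends this CONNECT preamble; only after the
// proxy replies "200 Connection Established" does the TLS handshake with the
// target begin. This builds the preamble only, not a full proxy client.
function buildConnectRequest(targetUrl, proxy = {}) {
  const { hostname, port } = new URL(targetUrl);
  const targetPort = port || '443'; // https default
  const lines = [
    `CONNECT ${hostname}:${targetPort} HTTP/1.1`,
    `Host: ${hostname}:${targetPort}`,
  ];
  if (proxy.username) {
    const credentials = Buffer.from(`${proxy.username}:${proxy.password}`).toString('base64');
    lines.push(`Proxy-Authorization: Basic ${credentials}`);
  }
  return lines.join('\r\n') + '\r\n\r\n'; // blank line terminates the request
}
```

<p>A library that skips this step and simply forwards the request to the proxy will fail against HTTPS proxies - which is essentially what the linked issue describes.</p>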
<h2 id="scrapeninja-node">ScrapeNinja Node</h2>
<h3 id="how-it-works">How It Works</h3>
<p>The <strong>ScrapeNinja n8n community node</strong> is a set of tools designed for web scraping and content extraction.</p>
<p><img src="https://pixeljets.com/blog/content/images/2025/02/sj-n8n-ops.png" alt="List of ScrapeNinja n8n operations" loading="lazy"></p>
<p>Some operations retrieve content from target websites, while others simplify content extraction from website responses.<br>
<a href="https://scrapeninja.net/docs/n8n/?ref=pixeljets.com#scrapeninja-n8n-node-list-of-operations">Read more in the ScrapeNinja docs</a>.</p>
<h3 id="request-flow">Request Flow</h3>
<p>For content retrieval, the request flow works as follows:</p>
<p>[your n8n self-hosted instance] &#x2192; [HTTP Node Helper] &#x2192; [ScrapeNinja API] &#x2192; [Target Website]</p>
<p>The <strong>ScrapeNinja API</strong> includes two powerful scraping engines designed to work reliably while bypassing various anti-scraping protections. All of the requests to target websites are completed via rotating proxies.</p>
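<p>Roughly what the node does for you behind the scenes is a single API call. The sketch below mirrors the RapidAPI-style request visible in the crawler logs above; the header names and body shape are my assumption from that convention, so check the ScrapeNinja docs for the exact parameters:</p>

```javascript
// Sketch of a direct ScrapeNinja /scrape call, assuming the RapidAPI
// convention (X-RapidAPI-* headers, JSON body with the target URL).
// Verify exact parameter names against the ScrapeNinja documentation.
function buildScrapeRequest(targetUrl, apiKey) {
  return {
    endpoint: 'https://scrapeninja.p.rapidapi.com/scrape',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-RapidAPI-Key': apiKey,
        'X-RapidAPI-Host': 'scrapeninja.p.rapidapi.com',
      },
      body: JSON.stringify({ url: targetUrl }),
    },
  };
}

// Usage sketch:
// const { endpoint, options } = buildScrapeRequest('https://example.com', 'YOUR_KEY');
// const response = await fetch(endpoint, options).then((r) => r.json());
```

<p>The n8n node wraps exactly this kind of call in credentials handling and UI controls, so you never assemble the request by hand.</p>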
<h3 id="the-power-of-advanced-scraping">The Power of Advanced Scraping</h3>
<p>ScrapeNinja transforms n8n from a basic scraping tool into a serious web harvesting platform. It&apos;s not just another HTTP client - it&apos;s a specialized scraping service that handles the complex challenges of modern web scraping. As a SaaS solution, it requires an API key and offers both free and paid plans to suit different scraping needs.</p>
<h3 id="core-capabilities">Core Capabilities</h3>
<p>The official ScrapeNinja node for n8n brings several nice capabilities:</p>
<ul>
<li>Chrome-like TLS fingerprinting</li>
<li>Automatic proxy rotation with multiple countries of proxies</li>
<li>JavaScript rendering</li>
<li>Built-in HTML parsing (<a href="https://scrapeninja.net/docs/js-extractor/?ref=pixeljets.com">JS Extractors</a>)</li>
<li>Cloudflare bypass capabilities<br>
<img src="https://pixeljets.com/blog/content/images/2025/01/2025-01-25-at-20.39.png" alt="ScrapeNinja n8n web scraping node" loading="lazy"></li>
<li>Built-in, locally executed crawler that traverses website links and retrieves the content of all pages recursively</li>
</ul>
<h3 id="response-structure">Response Structure</h3>
<p>ScrapeNinja always returns a consistent JSON structure, making it easy to process responses in your workflows:</p>
<pre><code class="language-json">{
  &quot;info&quot;: {
    &quot;version&quot;: &quot;2&quot;,
    &quot;statusCode&quot;: 200,
    &quot;statusMessage&quot;: &quot;&quot;,
    &quot;headers&quot;: {
      &quot;server&quot;: &quot;nginx&quot;,
      &quot;date&quot;: &quot;Sat, 25 Jan 2025 16:20:22 GMT&quot;,
      &quot;content-type&quot;: &quot;text/html; charset=utf-8&quot;,
      // ... other headers
    },
    &quot;screenshot&quot;: &quot;https://scrapeninja.net/screenshots/abc123.png&quot; // when screenshot option is enabled
  },
  &quot;body&quot;: &quot;&lt;html&gt;... scraped content ...&lt;/html&gt;&quot;,
  &quot;extractor&quot;: {  // when JS extractor is provided
    &quot;result&quot;: {
      &quot;items&quot;: [
        [
          &quot;some title&quot;,
          &quot;https://some-url&quot;,
          &quot;pr337h4m&quot;,
          24,
          &quot;2025-01-25T14:47:33&quot;,
          // ... extracted data
        ],
        // ... more items
      ]
    }
  }
}
</code></pre>
<p>This structured response provides:</p>
<ul>
<li>Complete request metadata in the <code>info</code> object</li>
<li>Original response headers</li>
<li>HTTP status information</li>
<li>Screenshot URL (when enabled)</li>
<li>Raw response body</li>
<li>Structured data from JS extractors (when JS extractor code is provided in request)</li>
</ul>
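<p>Inside an n8n Code node, a small helper can turn this structure into workflow-friendly fields. A sketch (field names follow the JSON above; the helper name is made up):</p>

```javascript
// Sketch of consuming the ScrapeNinja response structure in an n8n Code node.
// Field names follow the documented JSON; the function name is illustrative.
function unpackScrapeNinja(response) {
  if (response.info.statusCode !== 200) {
    throw new Error(`Scrape failed: HTTP ${response.info.statusCode}`);
  }
  return {
    html: response.body,
    contentType: response.info.headers['content-type'],
    // `extractor` is only present when a JS extractor was supplied
    extracted: response.extractor ? response.extractor.result : null,
  };
}

// Example with a minimal response object:
const sample = {
  info: { statusCode: 200, headers: { 'content-type': 'text/html; charset=utf-8' } },
  body: '<html>...</html>',
  extractor: { result: { items: [['some title', 'https://some-url']] } },
};
const unpacked = unpackScrapeNinja(sample);
console.log(unpacked.extracted.items.length); // 1
```

<p>Throwing on a non-200 status lets n8n's own error handling (or an error workflow) take over instead of silently passing bad HTML downstream.</p>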
<h3 id="javascript-extractors">JavaScript Extractors</h3>
<p>One of ScrapeNinja&apos;s most powerful features is its <a href="https://scrapeninja.net/docs/js-extractor/?ref=pixeljets.com">JavaScript extractor functionality</a>. These are small JavaScript functions that run in the ScrapeNinja cloud to process and extract structured data from scraped content. Here&apos;s what makes them special:</p>
<ul>
<li><strong>Cloud Processing</strong>: Extractors run in ScrapeNinja&apos;s cloud environment, reducing load on your n8n instance</li>
<li><strong>Cheerio Integration</strong>: Built-in access to the Cheerio HTML parser for efficient DOM manipulation</li>
<li><strong>Clean JSON Output</strong>: Perfect for no-code environments where structured data is essential</li>
<li><strong>Reusable Logic</strong>: Write once, use across multiple similar pages. Both the ScrapeNinja <code>/scrape</code> and <code>/scrape-js</code> engines use the same extractors, so switching to real browser rendering later, if you decide you need it, is easy.</li>
</ul>
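<p>To illustrate the contract: an extractor is a plain function that receives the scraped HTML plus a Cheerio instance and returns a JSON-serializable object. The selector and field names below are illustrative, and the tiny "fake cheerio" stand-in exists only so the sketch runs on its own (in ScrapeNinja's cloud the real Cheerio library is injected):</p>

```javascript
// The extractor contract: (html, cheerio) => JSON-serializable object.
// Selector and field names here are illustrative, not from a real site.
function extract(input, cheerio) {
  const $ = cheerio.load(input);
  const items = [];
  $('a').each((i, el) => {
    items.push({ title: $(el).text(), url: $(el).attr('href') });
  });
  return { items };
}

// Tiny regex-based stand-in for cheerio (anchors only) so this sketch is
// self-contained. Do not use it for real scraping — use the real library.
const fakeCheerio = {
  load(html) {
    const anchors = [...html.matchAll(/<a href="([^"]+)">([^<]*)<\/a>/g)]
      .map((m) => ({ href: m[1], text: m[2] }));
    return (sel) => {
      if (sel === 'a') {
        return { each: (cb) => anchors.forEach((a, i) => cb(i, a)) };
      }
      // $(element) — wrap one matched element
      return { text: () => sel.text, attr: (name) => sel[name] };
    };
  },
};

const demo = extract('<a href="https://example.com">Example</a>', fakeCheerio);
console.log(demo); // { items: [ { title: 'Example', url: 'https://example.com' } ] }
```

<p>Because the function returns plain JSON, the output drops straight into the next n8n node with no parsing step on your side.</p>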
<h3 id="ai-powered-extractor-generation">AI-Powered Extractor Generation</h3>
<p>ScrapeNinja provides a cool <a href="https://scrapeninja.net/cheerio-sandbox-ai?ref=pixeljets.com">Cheerio Sandbox with AI capabilities</a> that helps you create extractors:</p>
<ol>
<li><strong>Automated Code Generation</strong>: Paste your HTML and describe what you want to extract</li>
<li><strong>Interactive Testing</strong>: Test your extractors in real-time against sample data</li>
<li><strong>AI-Assisted Improvements</strong>: Get suggestions for improving your extractors</li>
<li><strong>Optimization Features</strong>: The system automatically handles HTML cleanup and compression</li>
</ol>
<h3 id="feature-comparison-http-node-vs-scrapeninja-node">Feature Comparison: HTTP Node vs ScrapeNinja Node</h3>
<p>Here&apos;s a detailed comparison of features between the two nodes:</p>
<table>
<thead>
<tr>
<th>Feature</th>
<th>HTTP Node</th>
<th>ScrapeNinja Node</th>
</tr>
</thead>
<tbody>
<tr>
<td>Availability</td>
<td>Built-in n8n node</td>
<td>Requires API key (free/paid plans)</td>
</tr>
<tr>
<td>Basic HTTP Methods (GET, POST, etc.)</td>
<td>&#x2705;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Custom Headers</td>
<td>&#x2705;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Query Parameters</td>
<td>&#x2705;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Follow Redirects</td>
<td>&#x2705;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>cURL Import</td>
<td>&#x2705;</td>
<td>&#x274C;</td>
</tr>
<tr>
<td>JavaScript Rendering</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Screenshot Capture</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Built-in Proxy Support</td>
<td>Limited</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Smart Retries (by content)</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Retry on Unexpected Text</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Retry on Unexpected Status</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Automatic Proxy Rotation</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Cloudflare Bypass</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Browser Fingerprinting</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>HTML Parsing</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
<tr>
<td>Response Validation</td>
<td>Basic</td>
<td>Advanced</td>
</tr>
<tr>
<td>Geolocation Targeting</td>
<td>&#x274C;</td>
<td>&#x2705;</td>
</tr>
</tbody>
</table>
<h3 id="setting-up-scrapeninja-in-n8n">Setting Up ScrapeNinja in n8n</h3>
<p>Getting started with ScrapeNinja in n8n is straightforward:</p>
<ol>
<li>Install the community node (<code>n8n-nodes-scrapeninja</code>)</li>
<li>Configure your API credentials (supports both RapidAPI and APIRoad)</li>
<li>Start using advanced scraping features</li>
</ol>
<p><a href="https://community.n8n.io/t/new-scrapeninja-official-integration-with-n8n-web-scraping-api-with-rotating-proxies-and-real-browser/72032?ref=pixeljets.com">Read more on n8n community forum</a></p>
<h2 id="real-world-scraping-scenarios">Real-World Scraping Scenarios</h2>
<p>Let&apos;s look at some common scenarios where n8n can be used for web scraping:</p>
<h3 id="ai-agent-that-can-scrape-webpages">AI agent that can scrape webpages</h3>
<p><img src="https://pixeljets.com/blog/content/images/2025/01/2025-01-26-at-14.05.png" alt="AI agent using HTTP node" loading="lazy"><br>
This is an example of a real-world workflow where ScrapeNinja is probably a better fit than the HTTP node.</p>
<p><a href="https://n8n.io/workflows/2006-ai-agent-that-can-scrape-webpages/?ref=pixeljets.com">https://n8n.io/workflows/2006-ai-agent-that-can-scrape-webpages/</a></p>
<p>If you want to get better at n8n, it is useful to study how the workflow author uses n8n tools to clean up HTML so it can be ingested into the LLM context, and the Execute Workflow node to split the scenario into smaller isolated parts. The HTML cleanup looks rather simplistic; using an external API like the <a href="https://rapidapi.com/restyler/api/article-extractor-and-summarizer?ref=pixeljets.com">Article Extractor and Summarizer /extract endpoint</a> may be more bulletproof.</p>
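<p>For reference, the simplistic kind of HTML cleanup such workflows perform before feeding a page into an LLM can be done in an n8n Code node with a few regexes. This is fine for a demo, but a dedicated extraction API handles edge cases far better (the function name is illustrative):</p>

```javascript
// A deliberately simple cleanup step: drop script/style blocks, strip the
// remaining tags, collapse whitespace. Enough for a demo, not production.
function htmlToPlainText(html) {
  return html
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, ' ') // remove JS and CSS blocks
    .replace(/<[^>]+>/g, ' ')                              // strip remaining tags
    .replace(/&nbsp;/g, ' ')
    .replace(/\s+/g, ' ')                                  // collapse whitespace
    .trim();
}

console.log(htmlToPlainText('<p>Hello <b>world</b></p><script>track()</script>'));
// Hello world
```

<p>Shrinking the HTML like this also saves LLM tokens, since markup usually dwarfs the visible text.</p>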
<h3 id="e-commerce-data-collection">E-commerce Data Collection</h3>
<p>When scraping e-commerce sites, you often need to:</p>
<ul>
<li>Handle JavaScript-rendered content</li>
<li>Navigate through pagination</li>
<li>Extract structured data from complex layouts</li>
<li>Bypass anti-bot measures</li>
</ul>
<p>ScrapeNinja handles all these challenges while maintaining a high success rate.</p>
<h3 id="social-media-monitoring">Social Media Monitoring</h3>
<p>Social platforms are notoriously difficult to scrape due to:</p>
<ul>
<li>Sophisticated bot detection</li>
<li>Dynamic content loading</li>
<li>Rate limiting</li>
<li>Complex authentication requirements</li>
</ul>
<p>The ScrapeNinja node&apos;s advanced fingerprinting and proxy rotation make these challenges manageable.</p>
<h2 id="n8n-caveat-http-request-concurrency-control">n8n caveat: HTTP request concurrency control</h2>
<p>Let&apos;s say you are building an n8n scenario where you get website URLs from a Google Sheet, request each URL via the HTTP node or ScrapeNinja node, and put the HTML of the response back into the Google Sheet. The naive approach would be to just add a &quot;Google Sheets (get all rows)&quot; node and an HTTP node right after it. Say there are 100 URLs in your sheet. It is not obvious, <strong><a href="https://community.n8n.io/t/http-node-limiting-concurrency/2451?ref=pixeljets.com">but in this case n8n will run 100 HTTP requests at the same time</a></strong>. This can easily overload both the target website and your n8n instance. Even worse, if you want to store the HTTP results somewhere and even one of these requests fails, the results of all 100 requests will be lost and the next n8n node won&apos;t be executed. To mitigate this, <strong>always use the built-in n8n Loop node when dealing with more than 10 external API calls or HTTP requests</strong>. Do not forget to put the node that stores results inside the same loop.</p>
<p><img src="https://pixeljets.com/blog/content/images/2025/01/2025-01-26-at-14.51.png" alt="Always use Loop node when dealing with many HTTP requests" loading="lazy"></p>
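<p>A rough plain-JavaScript equivalent of what the Loop (Split in Batches) node gives you: only a handful of requests in flight at once, and a failed item recorded instead of discarding the whole run. Names here are illustrative; <code>handler</code> stands in for your actual HTTP call:</p>

```javascript
// Process items in small chunks instead of firing everything at once.
async function processInBatches(items, batchSize, handler) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // allSettled: one rejected request does not lose the others
    const settled = await Promise.allSettled(batch.map(handler));
    results.push(...settled);
  }
  return results;
}

processInBatches(['a', 'b', 'c', 'd', 'e'], 2, async (url) => `fetched:${url}`)
  .then((r) => console.log(r.length)); // 5
```

<p>In n8n itself you get this behavior from the Loop node's batch size setting rather than from hand-written code, but the failure semantics are the same: store results per batch, not after all 100 requests.</p>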
<h2 id="best-practices-for-production-scraping">Best Practices for Production Scraping</h2>
<p>When deploying scraping workflows to production, consider these tips:</p>
<ol>
<li>
<p><strong>Error Handling</strong></p>
<ul>
<li>Implement comprehensive error catching</li>
<li>Use n8n&apos;s error workflows</li>
<li>Monitor scraping success rates</li>
<li>Use the n8n &quot;Executions&quot; tab of a workflow to see what is happening</li>
</ul>
</li>
<li>
<p><strong>Rate Limiting</strong></p>
<ul>
<li>Use the n8n Loop node to limit concurrency</li>
<li>Respect website terms of service</li>
<li>Implement appropriate delays</li>
<li>Use ScrapeNinja&apos;s built-in rate limiting features</li>
</ul>
</li>
<li>
<p><strong>Data Validation</strong></p>
<ul>
<li>Verify extracted data integrity</li>
<li>Handle missing or malformed data gracefully</li>
<li>Implement data cleaning workflows</li>
</ul>
</li>
</ol>
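<p>As a concrete example of the error-handling and rate-limiting advice above, here is a hedged sketch of a retry-with-delay helper for an n8n Code node. ScrapeNinja already retries server-side by content and status; this is for retrying whole calls on the n8n side, and all names are made up:</p>

```javascript
// Retry a flaky async call a few times, waiting between attempts.
// The delay doubles as a polite rate limit toward the target API.
async function withRetries(fn, { attempts = 3, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // all attempts exhausted
}

// Example: succeeds on the third try
let calls = 0;
withRetries(async () => {
  calls += 1;
  if (calls < 3) throw new Error('flaky');
  return 'ok';
}, { attempts: 3, delayMs: 10 }).then((result) => console.log(result)); // ok
```

<p>Rethrowing after the final attempt matters: it lets the workflow's error branch (or an n8n error workflow) see the failure instead of receiving a silent empty result.</p>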
<h2 id="conclusion">Conclusion</h2>
<p>While n8n&apos;s HTTP node is perfect for basic web requests, serious scraping operations benefit significantly from ScrapeNinja integration. The combination provides a powerful, reliable, and scalable solution for modern web scraping challenges.</p>
<p>Remember: successful web scraping isn&apos;t just about getting the data - it&apos;s about getting it reliably, ethically, and efficiently. With n8n and ScrapeNinja, you have the tools to do just that.</p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[In-depth comparison of Lovable.dev and Bolt.new]]></title><description><![CDATA[<p>I extensively use AI tools for coding - primarily Claude Sonnet 3.5 in VS Code Copilot and the OpenAI ChatGPT macOS app (using the o1 and 4o models) as of December 2024. While these tools, which felt groundbreaking just months ago, have become an integral part of my daily</p>]]></description><link>https://pixeljets.com/blog/lovable-dev-vs-bolt-new/</link><guid isPermaLink="false">67651b5ad47a1c04104ad386</guid><category><![CDATA[node.js]]></category><category><![CDATA[ai]]></category><category><![CDATA[llm]]></category><category><![CDATA[react]]></category><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Fri, 20 Dec 2024 20:59:37 GMT</pubDate><content:encoded><![CDATA[<p>I extensively use AI tools for coding - primarily Claude Sonnet 3.5 in VS Code Copilot and the OpenAI ChatGPT macOS app (using the o1 and 4o models) as of December 2024. While these tools, which felt groundbreaking just months ago, have become an integral part of my daily workflow, I see significant room for improvement in AI code tools, particularly in UX and approach.</p><p>Despite progress, I still spend a lot of time copying and pasting between tools. While <a href="https://code.visualstudio.com/docs/copilot/copilot-edits?ref=pixeljets.com">VS Code Copilot&#x2019;s code edits</a> and AI autocomplete help bridge the gap between VS Code with Copilot and tools like Cursor.sh, they&#x2019;re far from seamless. 
I also continue to spend time setting up my development environment, configuring databases, managing DNS entries in the Cloudflare control panel, and setting up Nginx virtual hosts.</p><p>That&apos;s why I am so excited to see lovable.dev and bolt.new launched in 2024.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-12.09.png" class="kg-image" alt loading="lazy" width="976" height="1284" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-20-at-12.09.png 600w, https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-12.09.png 976w" sizes="(min-width: 720px) 720px"><figcaption>Both lovable.dev and bolt.new integrate well with Supabase (PostgreSQL on steroids)</figcaption></figure><p>Lovable.dev and Bolt.new represent a new wave of AI products for developers. The major difference compared to the &quot;old-school&quot; approach (can we call the ChatGPT app old-school now?) of code generation via ChatGPT or Copilot is that they:</p><ul><li>Allow a very quick start and smooth development progress <em>directly from a web browser.</em></li><li>Mitigate the pain of website deployment.</li><li>Provide a real sandbox for code execution.</li><li>Include a ready-to-use, opinionated database solution.</li></ul><p>With these features, LLM models can finally execute code in a controlled environment, interact with databases, and potentially free us from manual database design and endless copy-pasting of code.</p><h2 id="lovabledev">Lovable.dev</h2><p>Lovable.dev is a SaaS built by authors of open-source project <a href="https://github.com/gpt-engineer-org/gpt-engineer?ref=pixeljets.com">https://github.com/gpt-engineer-org/gpt-engineer</a> (50k+ of stars!)</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/lovable1.png" class="kg-image" alt loading="lazy" width="2000" height="1437" 
srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/lovable1.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/lovable1.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/12/lovable1.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/12/lovable1.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Lovable.dev frontpage</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-18.43.png" class="kg-image" alt loading="lazy" width="2000" height="1323" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-20-at-18.43.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-20-at-18.43.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/12/2024-12-20-at-18.43.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/12/2024-12-20-at-18.43.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>I am trying to build a project using Lovable</figcaption></figure><p>Here is the Lovable.dev CEO and founder:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://github.com/AntonOsika?ref=pixeljets.com"><div class="kg-bookmark-content"><div class="kg-bookmark-title">AntonOsika - Overview</div><div class="kg-bookmark-description">Founder &amp; CTO at Depict (2x top YC startup) Founder &amp; CEO at Lovable.dev, building the last piece of software. Also into Particle Physics. 
- AntonOsika</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://github.githubassets.com/assets/pinned-octocat-093da3e6fa40.svg" alt><span class="kg-bookmark-author">GitHub</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://avatars.githubusercontent.com/u/4467025?v=4?s=400" alt></a></figure><p>Looking at the <a href="https://github.com/gpt-engineer-org/gpt-engineer/commits/main/?ref=pixeljets.com">GPT Engineer repo commits</a>, it is obvious that the open-source development of the project was active during 2023 and effectively stopped around the summer of 2024.</p><p>Reading the GPT Engineer code is recommended if you want to better understand how open-source AI code generation and execution works (but if you hope to find interesting code sandboxing techniques that could later be reused under the hood of Lovable, you will be disappointed - there is no sandboxing to be seen here). <a href="https://github.com/Aider-AI/aider?ref=pixeljets.com">Aider</a> is another open-source tool which is actively maintained and lets you leverage AI for coding - I recommend exploring it as well (and reading <a href="https://aider.chat/blog/?ref=pixeljets.com">their awesome blog</a>, where they analyze and benchmark how recent LLM models write real-world code).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/aider.png" class="kg-image" alt loading="lazy" width="2000" height="1522" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/aider.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/aider.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/12/aider.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/12/aider.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Aider blog is awesome for AI codegen reads</figcaption></figure><p>Ok, let&apos;s get back to Lovable!</p><h3 
id="pricing">Pricing</h3><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/lovable-pricing.png" class="kg-image" alt loading="lazy" width="2000" height="1008" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/lovable-pricing.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/lovable-pricing.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/12/lovable-pricing.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/12/lovable-pricing.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Lovable.dev pricing</figcaption></figure><p>Paid plans start at 20 USD / mo.</p><p>Free plan is very restrictive - it only has 5 messages allowed per day. I exhausted my limit without building anything resembling a production project.</p><figure class="kg-card kg-image-card"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-18.40.png" class="kg-image" alt loading="lazy" width="708" height="718" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-20-at-18.40.png 600w, https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-18.40.png 708w"></figure><p>Starter plan for 20 USD is also pretty restrictive: 100 edits per month. But this should be enough to build something meaningful, especially if you are careful about your prompting!</p><h3 id="code-execution-flyio">Code execution: Fly.io</h3><p>Lovable needs to execute the code which was generated by the AI, to build the preview of the project. Obviously, this code is untrusted and requires some isolated environment to be run in. 
I have <a href="https://pixeljets.com/blog/executing-untrusted-javascript/">some previous posts on Javascript sandboxing</a> because I had a very similar task of untrusted code isolation while building my ScrapeNinja Web Scraping API where I allow customers to write some JS code for HTML extractors which are executed on ScrapeNinja servers.</p><p>Since GPTEngineer github repo does not provide any information regarding sandboxing of the generated code, we have to dig deeper into Chrome Dev Tools and Lovable network requests.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-21-at-23.33.png" class="kg-image" alt loading="lazy" width="2000" height="880" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-21-at-23.33.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-21-at-23.33.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/12/2024-12-21-at-23.33.png 1600w, https://pixeljets.com/blog/content/images/2024/12/2024-12-21-at-23.33.png 2382w" sizes="(min-width: 720px) 720px"><figcaption>Lovable uses Fly.io for code sandboxing</figcaption></figure><p>Okay, this was easy: the response headers of Lovable show us that they use <a href="https://fly.io/?ref=pixeljets.com">Fly.io</a> containers under the hood. And Fly.io leverages Firecracker MicroVMs technology, read more in <a href="https://fly.io/docs/reference/architecture/?ref=pixeljets.com">their awesome architecture post</a>.</p><h3 id="database">Database</h3><p>Lovable has freshly built-in Supabase connector. It provides a convenient way to tie your Lovable project to Supabase (which is a self-hostable PostgreSQL-powered Firebase alternative).<strong> There is no obvious way to attach self-hosted Supabase to Lovable, only cloud version is easy to connect. 
</strong>This means that Supabase pricing also applies to you now.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-18.26.png" class="kg-image" alt loading="lazy" width="2000" height="1398" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-20-at-18.26.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-20-at-18.26.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/12/2024-12-20-at-18.26.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/12/2024-12-20-at-18.26.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Supabase pricing</figcaption></figure><p>Supabase is seriously awesome &#x2013; but it can become expensive if you suddenly realize you need to scale it.</p><p>I enjoyed how I asked &quot;please add user auth&quot; and it was added by Lovable using the modern Supabase React SDK. I hate adding auth, and it could easily take me a couple of hours. With Lovable it takes just 1 minute!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-20.23.png" class="kg-image" alt loading="lazy" width="1034" height="1206" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-20-at-20.23.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-20-at-20.23.png 1000w, https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-20.23.png 1034w" sizes="(min-width: 720px) 720px"><figcaption>Lovable designs the database for you.</figcaption></figure><h3 id="code-editing-none">Code editing: NONE</h3><p><strong>Lovable does not have code editing built in. </strong>I mean, you can ask the AI to change your code, but there is no way to really edit the files yourself while you chat with the AI. 
The only way to edit is to publish the project to GitHub and launch the in-browser github.dev.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-18.52.png" class="kg-image" alt loading="lazy" width="2000" height="1452" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-20-at-18.52.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-20-at-18.52.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/12/2024-12-20-at-18.52.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/12/2024-12-20-at-18.52.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>I am looking at my project via github.dev. In my browser.</figcaption></figure><p>But you cannot build and run the project here.</p><h3 id="deployment-vercel-netlify">Deployment: Vercel &amp; Netlify</h3><p><br>Lovable <a href="https://docs.lovable.dev/tips-tricks/custom-domain?ref=pixeljets.com">now allows you to easily deploy to Vercel and Netlify</a>.</p><h3 id="experience">Experience</h3><p>I tried Lovable.dev and Bolt.new with an identical (poorly written) prompt:</p><blockquote>create crypto portfolio tracker. dark theme. sparkline on full width of the coin container with semi-transparent gradient. draggable coins to sort them. BTC and SOL by default, ability to add SOL, HYPE, TRX, BNB</blockquote><p>My understanding is that Lovable is a bit more opinionated in terms of tech stack: it uses React with Shadcn for the frontend. The first response provided working code with minor glitches. Then I connected to Supabase, asked it to add Supabase auth, and it managed to complete this task.</p><h2 id="boltnew">Bolt.new</h2><p>Bolt.new is a project by StackBlitz. 
</p><p>Open-source version of Bolt: <a href="https://github.com/stackblitz-labs/bolt.diy/?ref=pixeljets.com">https://github.com/stackblitz-labs/bolt.diy/</a></p><h3 id="code-editing-stackblitz">Code editing: StackBlitz</h3><p>StackBlitz is a collaborative browser-based IDE for web developers. This is clearly what gives Bolt an edge over Lovable: these guys know how to provide you with a real IDE (with a terminal) and a working <code>npm install</code> &#x2013; right in your browser.</p><h3 id="code-execution-webcontainersio">Code execution: Webcontainers.io</h3><p>Since webcontainers.io is an integral part of Bolt.new, let&apos;s take a look at how it works. Webcontainers is a project by StackBlitz; it is essentially a bunch of WASM (<a href="https://webassembly.org/?ref=pixeljets.com">Webassembly</a>) files which do real magic: they emulate a real OS with Node.js in it. <em>Right in your browser.</em> Read more here:</p><figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://blog.stackblitz.com/posts/introducing-webcontainers/?ref=pixeljets.com"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Introducing WebContainers: Run Node.js natively in your browser</div><div class="kg-bookmark-description">Today we&#x2019;re excited to announce WebContainers, a new type of WebAssembly-based operating system that boots instantly and enables Node.js environments to run natively in-browser.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://blog.stackblitz.com/favicon.svg" alt><span class="kg-bookmark-publisher">Eric Simons CEO</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://blog.stackblitz.com/posts/introducing-webcontainers/social.jpg" alt></div></a></figure><p>So... 
<strong>while Lovable.dev spends their resources to boot and run a VM on Fly.io to execute code generated by their AI engine, Bolt leverages WebAssembly tech to use your own browser and your own machine to run an operating system in the browser.</strong></p><p>I opened the Bolt.new terminal and typed <code>npm i axios</code> in it. My Chrome Dev Tools showed the axios download. Let me repeat this: the entire OS network stack is emulated in the browser. <em>npm &quot;thinks&quot; it is executed in a regular server environment. But the actual file download happens via an XHR request in my Chrome Dev Tools.</em> I still find this hard to believe.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-21-at-00.29.png" class="kg-image" alt loading="lazy" width="1950" height="762" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-21-at-00.29.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-21-at-00.29.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/12/2024-12-21-at-00.29.png 1600w, https://pixeljets.com/blog/content/images/2024/12/2024-12-21-at-00.29.png 1950w" sizes="(min-width: 720px) 720px"><figcaption>Webassembly emulates OS network stack</figcaption></figure><p>This is a <a href="https://github.com/nodejs/help/issues/3774?ref=pixeljets.com#issuecomment-2408875464">GitHub issue in the Node.js repo</a> which sheds some light on how all this was implemented by the StackBlitz team and which corners they had to cut. They had to implement certain parts of Node.js and built their own, very restrictive, wrappers for others, like networking. The sentiment of some GitHub commenters is rather negative because they think the StackBlitz team developed a proprietary technology on a foundation of open-source products. 
The issue also contains &quot;<a href="https://github.com/RealSput/Wenode/?ref=pixeljets.com">an open source implementation of Webcontainers</a>&quot; which I have not checked yet.</p><h3 id="stackblitz-code-execution-restrictions">StackBlitz code execution restrictions</h3><p>I quickly ran into StackBlitz code sandbox restrictions in my second Bolt.new project. I decided to build a simple proxy checker: a user enters a proxy URL in &quot;http://user:pw@host:port&quot; format, and the API backend uses an npm package like <a href="https://www.npmjs.com/package/http-proxy-agent?ref=pixeljets.com">https://www.npmjs.com/package/http-proxy-agent</a> to check whether the supplied proxy is functional and what the latency is. Bolt.new admitted it cannot run this - you simply cannot implement this in the Chrome network stack - so it&apos;s not possible to run this in the StackBlitz code execution environment!</p><figure class="kg-card kg-image-card"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-21-at-23.53.png" class="kg-image" alt loading="lazy" width="1106" height="1326" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-21-at-23.53.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-21-at-23.53.png 1000w, https://pixeljets.com/blog/content/images/2024/12/2024-12-21-at-23.53.png 1106w" sizes="(min-width: 720px) 720px"></figure><h3 id="database-1">Database</h3><p>I am writing this post on Dec 20, 2024. 
And on Dec 19, Bolt.new released their new connector to Supabase.</p><figure class="kg-card kg-image-card"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-23.24.png" class="kg-image" alt loading="lazy" width="1224" height="1176" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-20-at-23.24.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-20-at-23.24.png 1000w, https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-23.24.png 1224w" sizes="(min-width: 720px) 720px"></figure><p>Just a week ago, Bolt.new team recommended Firebase as a database for their AI. I think they did the right thing. I like Supabase more, compared to Firebase (because I like self-hosted and I like SQL databases compared to NoSQL) so I am excited by official Supabase integration.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-20-at-23.11.png" class="kg-image" alt loading="lazy" width="2000" height="1147" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-20-at-23.11.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-20-at-23.11.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/12/2024-12-20-at-23.11.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/12/2024-12-20-at-23.11.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Bolt.new UI: notice npm install command running in real terminal, in my browser.</figcaption></figure><h3 id="pricing-1">Pricing</h3><figure class="kg-card kg-image-card"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-21-at-00.38.png" class="kg-image" alt loading="lazy" width="2000" height="1284" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-21-at-00.38.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-21-at-00.38.png 1000w, 
https://pixeljets.com/blog/content/images/size/w1600/2024/12/2024-12-21-at-00.38.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/12/2024-12-21-at-00.38.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>I think Bolt.new&apos;s pricing is better than Lovable.dev&apos;s. They count the tokens spent on input and output, and give you 1M tokens for free; the 20 USD plan gives you 10M tokens. Thanks to the WebAssembly sandbox, Bolt.new&apos;s internal costs should also be much lower than Lovable&apos;s, and I think their customer-facing pricing model is better as well. </p><h3 id="experience-1">Experience</h3><p>I loved Bolt&apos;s output: the crypto portfolio HTML was better - no minor glitches, almost perfect. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/12/2024-12-21-at-01.01.png" class="kg-image" alt loading="lazy" width="1192" height="1184" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/12/2024-12-21-at-01.01.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/12/2024-12-21-at-01.01.png 1000w, https://pixeljets.com/blog/content/images/2024/12/2024-12-21-at-01.01.png 1192w" sizes="(min-width: 720px) 720px"><figcaption>Bolt.new works on adding Supabase auth to my project.</figcaption></figure><h2 id="conclusions">Conclusions</h2><p>We live in incredible times. By 2024, fully manual coding stopped making sense. The progress of LLM models and AI tooling makes writing code by hand feel strange and outdated. I can&#x2019;t imagine creating new products without AI anymore. Tools like Bolt.new and Lovable.dev mark another major step into the AI coding era.</p><p>If you ask me which of the two I&#x2019;d choose: I was initially hesitant - until I learned that Bolt.new had released the Supabase connector. 
Now I think I&#x2019;d prefer Bolt because:</p><ul><li>It feels more mature in terms of UX.</li><li>It provides a real editor and terminal.</li><li>It offers better pricing.</li><li>It leverages cutting-edge tech for code sandboxing (WebAssembly) - though it comes with certain restrictions, as I mentioned earlier. If you need real server-side code execution for your project, Lovable is a better choice due to the Fly.io VMs under the hood.</li></ul><p>What&apos;s your experience?</p>]]></content:encoded></item><item><title><![CDATA[My experience using n8n, from a developer perspective]]></title><description><![CDATA[<p>1.5 years ago, I wrote <a href="https://pixeljets.com/blog/zapier-make-com-pipedream-from-a-developer-perspective/">a blog post sharing my thoughts and experience on using Make.com, Zapier, and Pipedream</a> from my perspective (I recommend reading that piece before continuing here). When exploring these awesome platforms, I was mostly interested in how no-code and low-code products can enhance my</p>]]></description><link>https://pixeljets.com/blog/n8n/</link><guid isPermaLink="false">66127e504002005ce179bfb0</guid><category><![CDATA[n8n]]></category><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Sun, 19 May 2024 16:14:05 GMT</pubDate><content:encoded><![CDATA[<p>1.5 years ago, I wrote <a href="https://pixeljets.com/blog/zapier-make-com-pipedream-from-a-developer-perspective/">a blog post sharing my thoughts and experience on using Make.com, Zapier, and Pipedream</a> from my perspective (I recommend reading that piece before continuing here). When exploring these awesome platforms, I was mostly interested in how no-code and low-code products can enhance my life as a developer. My goal is to minimize coding and increase my output. 
</p><p><em>UPD 2025: more of my n8n blog posts: </em></p><ul><li><a href="https://pixeljets.com/blog/n8n-vs-code/">n8n vs real coding for SaaS products</a></li><li><a href="https://pixeljets.com/blog/web-scraping-in-n8n/">Web scraping in n8n</a></li></ul><p>My technical background heavily influences my no-code product choices: I don&apos;t like it when low-code products try to hide all the complexity of real web development, or when code execution is poorly implemented. In a perfect world, I would like to be able to start projects with no-code, but have the ability to write some fancy JS here and there when I decide to get my hands dirty. </p><p>Both Zapier and Make obviously target less technical audience segments, so I gave high praise to Pipedream for targeting developers like me and making code a first-class citizen in their low-code platform. In this blog post, I will describe my experience using the <a href="https://sh.pixeljets.com/n8n?ref=pixeljets.com">n8n self-hosted engine</a> and occasionally compare it with Make, Zapier, and Pipedream.</p><h2 id="self-hosted-services-are-great-in-2024">Self-hosted services are great in 2024</h2><p>I recently wrote <a href="https://pixeljets.com/blog/self-hosted-is-awesome/">a blog post on awesome self-hosted products</a> I use almost every day. Self-hosting is awesome if you know a bit of Docker, and n8n is another example of a self-hosted tool that provides immense value for exactly zero dollars of recurring cost. </p><h2 id="n8n-self-hosted-low-code-powerhouse">n8n: self-hosted low-code powerhouse</h2><p>I was a bit hesitant to set up <a href="https://sh.pixeljets.com/n8n?ref=pixeljets.com">n8n</a>, as I am a paying customer of Make.com, which is a great platform. 
But a lot of customers of <a href="https://scrapeninja.net/?ref=pixeljets.com">ScrapeNinja</a> (my bootstrapped SaaS API for web scraping) were recently coming from the n8n community with basic support requests, asking how my API could be used with n8n. I wanted to help them, so I had to dive into it. </p><p>[UPD Jan 2025]: I have launched a ScrapeNinja n8n integration node for web scraping, <a href="https://scrapeninja.net/docs/n8n/?ref=pixeljets.com">read more</a>.</p><h2 id="about-n8n">About n8n</h2><p><a href="https://sh.pixeljets.com/n8n?ref=pixeljets.com">n8n</a> is a workflow automation tool. Zapier and Make are famous n8n competitors you probably know. The best thing about n8n is that <em>the whole source code</em> of this massive project <a href="https://github.com/n8n-io/n8n?ref=pixeljets.com">is available on Github</a>. No wonder these guys have <strong>41k+</strong> Github stars - this is a big Typescript codebase and you can learn a lot by exploring it. For example, <a href="https://github.com/n8n-io/n8n/blob/bf2ee51e36214c44bb7bb57d27d33dd29a8ccf72/packages/nodes-base/nodes/Code/JavaScriptSandbox.ts?ref=pixeljets.com">take a look at how they sandbox external JS code</a> - they use a forked vm2, <a href="https://pixeljets.com/blog/executing-untrusted-javascript/">which I reviewed</a> a few years ago.</p><h3 id="installation-self-hosted-docker">Installation (self-hosted, Docker)</h3><p>I launched n8n with literally a single bash command on my remote Hetzner server, from my VS Code (with the VS Code Remote extension). I chose the <a href="https://docs.n8n.io/hosting/installation/docker/?ref=pixeljets.com#starting-n8n">Docker n8n setup</a> for my installation. </p><p>The whole setup took like... 30 seconds. VS Code also automatically forwarded the server&apos;s remote port to my localhost, so I opened <code>http://localhost:5678</code> in Chrome and saw the n8n signup screen, which allowed me to create an admin user and log in. 
So smooth!</p><figure class="kg-card kg-image-card"><img src="https://pixeljets.com/blog/content/images/2024/04/2024-04-07-at-14.35.png" class="kg-image" alt loading="lazy" width="2000" height="1512" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/04/2024-04-07-at-14.35.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/04/2024-04-07-at-14.35.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/04/2024-04-07-at-14.35.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/04/2024-04-07-at-14.35.png 2400w" sizes="(min-width: 720px) 720px"></figure><h3 id="final-self-hosted-setup">Final self-hosted setup</h3><p>I played with this 30-second setup of n8n for an hour and realized it could be useful for me in the longer term. Since I installed n8n on a remote Hetzner machine <a href="https://hetzner.cloud/?ref=kW64RqTXiNiN">(receive 20 EUR by registering on Hetzner, via my referral link)</a>, I created an n8n subdomain on one of my domains via Cloudflare and also set up nginx as a reverse proxy to forward all requests from https://myn8n.scrapeninja.net to the running n8n instance. This was another time investment of around 20 minutes.</p><p>Since I am using Cloudflare, it manages all HTTPS certificates for me.</p><p>Here is my nginx reverse proxy config, which I took from the n8n website:<br> </p><figure class="kg-card kg-code-card"><pre><code class="language-ini">server {
    listen 80;
    server_name n8n.scrapeninja.net;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name n8n.scrapeninja.net;
    # snakeoil = placeholder self-signed certs; in my setup Cloudflare manages the public HTTPS certificates
    ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;
    location / {
        proxy_pass http://localhost:5678;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade; # websocket support
        proxy_set_header Connection &quot;Upgrade&quot;;  # websocket support
        proxy_set_header Host $host;
        chunked_transfer_encoding off;
        proxy_buffering off;
        proxy_cache off;
    }
}
</code></pre><figcaption>nginx reverse proxy config for n8n</figcaption></figure><h3 id="code-as-a-first-class-citizen">Code as a first class citizen</h3><p>My biggest pleasant surprise so far is that n8n embraces JS everywhere: in basic inputs and in dedicated Code nodes. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/04/2024-04-08-at-10.56.png" class="kg-image" alt loading="lazy" width="2000" height="1561" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/04/2024-04-08-at-10.56.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/04/2024-04-08-at-10.56.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/04/2024-04-08-at-10.56.png 1600w, https://pixeljets.com/blog/content/images/2024/04/2024-04-08-at-10.56.png 2204w" sizes="(min-width: 720px) 720px"><figcaption>JSON.stringify can be put right into any input for n8n.</figcaption></figure><p>So, for instance, if I get a big HTTP response with JSON data, I can put just part of this JSON, serialized, back into my DB (NocoDB or Google Sheets) by marshalling it via <code>JSON.stringify</code> called within the n8n node form input. 
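</p><p>For example, an n8n input field evaluates plain JavaScript between <code>{{ }}</code> brackets. Here is a minimal sketch of what such an expression effectively does per item (the field names are hypothetical, just for illustration):</p>

```javascript
// Roughly what n8n evaluates when an input field contains:
//   {{ JSON.stringify($json.items) }}
// $json is the current item's data; "items" is a hypothetical field name.
const $json = { url: 'https://example.com', items: [{ id: 1 }, { id: 2 }] };

const cellValue = JSON.stringify($json.items);
console.log(cellValue); // '[{"id":1},{"id":2}]'
```

<p>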
This is really convenient, and I don&apos;t need to learn new syntax and functions for simple data transformations (as in Make).</p><p>And if I need to do some heavy processing, I just create a Code node:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/04/2024-04-08-at-11.15.png" class="kg-image" alt loading="lazy" width="2000" height="1482" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/04/2024-04-08-at-11.15.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/04/2024-04-08-at-11.15.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/04/2024-04-08-at-11.15.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/04/2024-04-08-at-11.15.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>n8n Code node. It allows looping through all items in JS, or executing the code block for each item sequentially.</figcaption></figure><p>Do you see the <code>console.log</code> statement? 
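</p><p>By the way, a Code node body is just plain JavaScript over the incoming items. A minimal sketch in &quot;Run Once for All Items&quot; mode (the data and field names are hypothetical; inside n8n the <code>$input</code> global is provided for you, so the stub below is only needed to run this outside n8n):</p>

```javascript
// Stub of the $input global that n8n injects into Code nodes (hypothetical data),
// so this sketch also runs outside n8n:
const $input = {
  all: () => [{ json: { url: 'https://a.com' } }, { json: { url: 'https://b.com' } }],
};

// --- Code node body ("Run Once for All Items" mode) ---
const out = [];
for (const item of $input.all()) {
  console.log('processing', item.json.url); // shows up in the browser dev tools
  out.push({ json: { url: item.json.url, processedAt: Date.now() } });
}
// In a real Code node the last line would be: return out;
console.log(out.length); // 2
```

<p>n8n expects the node to return an array of objects, each wrapping its data in a <code>json</code> key.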
It is actually printed in the Chrome DevTools console when I click the &quot;Execute workflow&quot; or &quot;Test step&quot; button.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/04/2024-04-08-at-11.25.png" class="kg-image" alt loading="lazy" width="2000" height="1615" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/04/2024-04-08-at-11.25.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/04/2024-04-08-at-11.25.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/04/2024-04-08-at-11.25.png 1600w, https://pixeljets.com/blog/content/images/2024/04/2024-04-08-at-11.25.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Properly working console.log makes debugging code blocks even more pleasant.&#xA0;</figcaption></figure><p>And here is a huge feature:<strong> external npm packages can be launched inside the n8n JS Code node </strong>(only in a self-hosted setup; you can&apos;t do this in n8n Cloud, at least not in April of 2024). You can&apos;t do this in Make or Zapier. <a href="https://pipedream.com/docs/code/nodejs?ref=pixeljets.com">Pipedream is on par with n8n (or even better) in terms of JS code execution.</a></p><p>The Code node sandbox, however, cannot read or write the file system, or access the network.</p><p>Read more about the n8n Code node <a href="https://docs.n8n.io/code/code-node/?ref=pixeljets.com#external-libraries">in the documentation</a>.</p><h3 id="not-just-javascript-but-also-python">Not just Javascript, but also Python</h3><p>n8n also allows executing Python <a href="https://community.n8n.io/t/python-node-got-created/5827/22?u=anthony&amp;ref=pixeljets.com">(since Feb 2023)</a>. 
<em>I should note that, due to the overhead of launching the Python environment, Python is available only in Code nodes for now, not in every input field - so some JS has to be sprinkled across your workflow even if you hate it. This is totally not a problem for me, as I am perfectly fine with JS!</em></p><figure class="kg-card kg-image-card"><img src="https://pixeljets.com/blog/content/images/2024/04/2024-04-08-at-11.17.png" class="kg-image" alt loading="lazy" width="1348" height="1098" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/04/2024-04-08-at-11.17.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/04/2024-04-08-at-11.17.png 1000w, https://pixeljets.com/blog/content/images/2024/04/2024-04-08-at-11.17.png 1348w" sizes="(min-width: 720px) 720px"></figure><p>The Python engine uses Pyodide under the hood: a CPython interpreter compiled to WebAssembly, which is executed via node.js (so internally it still runs on the V8 JavaScript engine).</p><h2 id="performing-http-requests-and-web-scraping-in-n8n">Performing HTTP requests and web scraping in n8n</h2><p>Obviously, n8n provides an action to perform HTTP requests. I liked how there is an &quot;import from cURL&quot; link right there, and it works fine for the <a href="https://rapidapi.com/restyler/api/scrapeninja?ref=pixeljets.com">ScrapeNinja API</a> on the RapidAPI marketplace. 
RapidAPI marketplace allows to copy&amp;paste cURL code - so I just put it into n8n: </p><figure class="kg-card kg-video-card kg-card-hascaption"><div class="kg-video-container"><video src="https://pixeljets.com/blog/content/media/2024/05/1c10c5495b404810b6c2a5ebaeba387a.mp4" poster="https://img.spacergif.org/v1/1114x720/0a/spacer.png" width="1114" height="720" playsinline preload="metadata" style="background: transparent url(&apos;https://pixeljets.com/blog/content/images/2024/05/media-thumbnail-ember317.jpg&apos;) 50% 50% / cover no-repeat;"></video><div class="kg-video-overlay"><button class="kg-video-large-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button></div><div class="kg-video-player-container"><div class="kg-video-player"><button class="kg-video-play-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M23.14 10.608 2.253.164A1.559 1.559 0 0 0 0 1.557v20.887a1.558 1.558 0 0 0 2.253 1.392L23.14 13.393a1.557 1.557 0 0 0 0-2.785Z"/></svg></button><button class="kg-video-pause-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><rect x="3" y="1" width="7" height="22" rx="1.5" ry="1.5"/><rect x="14" y="1" width="7" height="22" rx="1.5" ry="1.5"/></svg></button><span class="kg-video-current-time">0:00</span><div class="kg-video-time">/<span class="kg-video-duration"></span></div><input type="range" class="kg-video-seek-slider" max="100" value="0"><button class="kg-video-playback-rate">1&#xD7;</button><button class="kg-video-unmute-icon"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M15.189 2.021a9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h1.794a.249.249 0 0 1 .221.133 9.73 9.73 0 0 0 7.924 4.85h.06a1 1 0 0 0 1-1V3.02a1 1 0 0 0-1.06-.998Z"/></svg></button><button 
class="kg-video-mute-icon kg-video-hide"><svg xmlns="http://www.w3.org/2000/svg" viewbox="0 0 24 24"><path d="M16.177 4.3a.248.248 0 0 0 .073-.176v-1.1a1 1 0 0 0-1.061-1 9.728 9.728 0 0 0-7.924 4.85.249.249 0 0 1-.221.133H5.25a3 3 0 0 0-3 3v2a3 3 0 0 0 3 3h.114a.251.251 0 0 0 .177-.073ZM23.707 1.706A1 1 0 0 0 22.293.292l-22 22a1 1 0 0 0 0 1.414l.009.009a1 1 0 0 0 1.405-.009l6.63-6.631A.251.251 0 0 1 8.515 17a.245.245 0 0 1 .177.075 10.081 10.081 0 0 0 6.5 2.92 1 1 0 0 0 1.061-1V9.266a.247.247 0 0 1 .073-.176Z"/></svg></button><input type="range" class="kg-video-volume-slider" max="100" value="100"></div></div></div><figcaption>Import cURL of ScrapeNinja web scraping HTTP request into n8n</figcaption></figure><p>So easy! Of course if you want to make many requests to ScrapeNinja across your n8n workflows, you should create a new &quot;credential&quot; for ScrapeNinja API key instead of having it hardcoded into particular action.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/05/2024-05-20-at-21.06.png" class="kg-image" alt loading="lazy" width="2000" height="1285" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/05/2024-05-20-at-21.06.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/05/2024-05-20-at-21.06.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/05/2024-05-20-at-21.06.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/05/2024-05-20-at-21.06.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>API credentials storage</figcaption></figure><p>[UPD Jan 2025] <strong>You don&apos;t need to fight with ScrapeNinja API anymore to use it with n8n!</strong> I have built <a href="https://scrapeninja.net/docs/n8n/?ref=pixeljets.com">an official n8n ScrapeNinja integration node</a>, which is seriously awesome and includes a lot of free open source features like smart content extraction and html-to-markdown converter. 
Read more about <a href="https://pixeljets.com/blog/building-n8n-web-crawler-for-rag/">the built-in recursive web crawler</a> as well.</p><h2 id="con-of-self-hosted-google-oauth-client">Con of self-hosted: the Google OAuth client</h2><p>Self-hosted is great, but a major issue of having your &quot;glue&quot; automation platform self-hosted is that you need to deal with the OAuth process of tier-1 players like Google yourself. Let&apos;s say you want to read and write data to your Google Sheets. In Make, which is a cloud SaaS, all the dirty work of creating a Google OAuth client has been done by the Make maintainers: you just authorize with your Google account and start using Google products.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/05/image.png" class="kg-image" alt loading="lazy" width="443" height="527"><figcaption>Example of Google permissions screen.</figcaption></figure><p>For n8n, you need to spend around 10-20 minutes configuring Google Console access and setting up an OAuth client. This includes configuring the consent screen and authorizing the &quot;n8n&quot; app to access certain APIs. <a href="https://docs.n8n.io/integrations/builtin/credentials/google/oauth-generic/?ref=pixeljets.com#prerequisites">Nothing too awful; here is the n8n tutorial.</a></p><p>However, for me, the experience of creating a new OAuth client in Google Console has consistently been a pain. Even after doing it five times, I still don&apos;t enjoy the process. It&apos;s much more cumbersome than just copying and pasting an API key (e.g., from the OpenAI dashboard).</p><h2 id="n8n-pricing-self-hosted">n8n pricing: self-hosted</h2><p>n8n has two editions: Community edition and Enterprise edition. Both of them can be self-hosted. 
Let me keep it concise: the feature difference between Community and Enterprise is thin, so when you take the most basic Community edition of n8n and install it via docker compose on your 3 EUR Hetzner or Vultr server, you get (very roughly) 95% of n8n Enterprise Cloud functionality. Seriously!</p><p><a href="https://docs.n8n.io/hosting/community-edition-features/?ref=pixeljets.com">Here is the list of features that the Community edition lacks</a> - the list is rather short, and some of these missing features can be mitigated using community nodes; for example, I think that <a href="https://www.npmjs.com/package/n8n-nodes-globals?ref=pixeljets.com">https://www.npmjs.com/package/n8n-nodes-globals</a> can be used as an alternative to <a href="https://docs.n8n.io/code/variables/?ref=pixeljets.com">global custom variables</a>.</p><p>n8n is a &quot;fair use&quot; licensed product. <a href="https://docs.n8n.io/sustainable-use-license/?ref=pixeljets.com">Read their license here.</a> Essentially, if you use n8n as a self-hosted product (&quot;community edition&quot;), it is free for you (unless you are building some sort of SaaS product where n8n is exposed to your end customers, who can enter their authentication details into n8n nodes - a very rare use case). So, if you automate something for your company, or automate something for your customers, self-hosted n8n is FREE for you. You just pay for your server. I still see <a href="https://www.reddit.com/r/n8n/comments/1ily9x3/how_much_does_all_of_this_cost/?ref=pixeljets.com">a lot</a> of people on Reddit asking &quot;okay, seriously, how much does this self-hosted license cost?&quot;. I understand: it&apos;s hard to believe that such a polished automation product can be free when closed-source cloud platforms with better marketing charge their customers a fortune every month. 
Check <a href="https://www.reddit.com/r/n8n/comments/1if4k37/is_self_hosting_n8n_worth_it/?ref=pixeljets.com">this</a> thread regarding the self-hosted offering. </p><h2 id="n8n-pricing-cloud">n8n pricing: cloud</h2><p>It is possible to use n8n Cloud just like any other SaaS. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/05/2024-05-19-at-20.16.png" class="kg-image" alt loading="lazy" width="2000" height="1274" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/05/2024-05-19-at-20.16.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/05/2024-05-19-at-20.16.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/05/2024-05-19-at-20.16.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/05/2024-05-19-at-20.16.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>n8n pricing for May, 2024</figcaption></figure><p>At the time of writing, Make offered a 9 USD/mo plan, while the smallest n8n Cloud plan is 20 USD/mo - both of these products are very affordable.</p><p>The big advantage of n8n Cloud plans is that they <strong>do not charge you for the complexity of your workflows</strong>, so if your workflow run takes hundreds of actions, you will spend a lot more on Make and Zapier than on n8n.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/05/2024-05-19-at-20.19.png" class="kg-image" alt loading="lazy" width="728" height="774" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/05/2024-05-19-at-20.19.png 600w, https://pixeljets.com/blog/content/images/2024/05/2024-05-19-at-20.19.png 728w" sizes="(min-width: 720px) 720px"><figcaption>n8n workflow execution does not charge for &quot;operations&quot;.</figcaption></figure><p>This cost model is very generous and somewhat similar to what <a 
href="https://pixeljets.com/blog/zapier-make-com-pipedream-from-a-developer-perspective/">Pipedream, another great low-code platform</a> offers (though Pipedream has since changed their pricing model: they now track the &quot;compute time&quot; of a workflow run - a nice idea that allows them to protect their cloud from abusers while keeping the pricing very fair):</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/05/2024-05-19-at-20.22.png" class="kg-image" alt loading="lazy" width="640" height="894" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/05/2024-05-19-at-20.22.png 600w, https://pixeljets.com/blog/content/images/2024/05/2024-05-19-at-20.22.png 640w"><figcaption>Pipedream is a great n8n cloud competitor with a similar pricing model.</figcaption></figure><h2 id="hosting">Hosting</h2><p>I use Hetzner Cloud exclusively for hosting all my servers nowadays (at the time of writing this article) and I recommend it to everyone - from indie hackers launching their first website to medium-sized SaaS companies who come to me for consulting. Hetzner makes self-hosting n8n so affordable. Before Hetzner, I used to host at DigitalOcean, Linode and Google Cloud. 
Here is the current pricing for a 4GB RAM x86 machine across these clouds:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/07/2024-07-26-at-18.56.png" class="kg-image" alt loading="lazy" width="2000" height="1297" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/07/2024-07-26-at-18.56.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/07/2024-07-26-at-18.56.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/07/2024-07-26-at-18.56.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/07/2024-07-26-at-18.56.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Hetzner: ~5 USD for 4GB RAM, 2vcpu</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/07/2024-07-26-at-19.01.png" class="kg-image" alt loading="lazy" width="2000" height="1185" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/07/2024-07-26-at-19.01.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/07/2024-07-26-at-19.01.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/07/2024-07-26-at-19.01.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/07/2024-07-26-at-19.01.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>DigitalOcean: 24 USD for 4GB RAM, 2vcpu</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/07/2024-07-26-at-19.03.png" class="kg-image" alt loading="lazy" width="2000" height="1280" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/07/2024-07-26-at-19.03.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/07/2024-07-26-at-19.03.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/07/2024-07-26-at-19.03.png 1600w, 
https://pixeljets.com/blog/content/images/size/w2400/2024/07/2024-07-26-at-19.03.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Linode: 24 USD for 4GB RAM, 2vcpu</figcaption></figure><p>So, it&apos;s a <strong>4x</strong> difference in cost (I know - this is insane!), and the new Hetzner Cloud UI is very swift and pleasant to use - I never felt like I sacrificed anything using Hetzner as a solo founder over the past 3 years.</p><p>Here is my Hetzner referral link if you decide to try it: <a href="https://hetzner.cloud/?ref=kW64RqTXiNiN">https://hetzner.cloud/?ref=kW64RqTXiNiN</a> (you get &#x20AC;&#x2060;20 on sign up, I get &#x20AC;&#x2060;10)</p><h2 id="conclusion">Conclusion</h2><p>I have enjoyed using n8n so far: it has great UX and seems to be a very stable, mature product - a great addition to my self-hosted toolbelt.</p><p>n8n may have a steeper learning curve compared to Make and Zapier, but only because of the external OAuth situation described above. And if you are not too technical and have no experience with Docker, you can still use n8n as a cloud SaaS product.</p><p>If you liked this post, chances are you will enjoy following <a href="https://www.linkedin.com/in/anthony-sidashin/?ref=pixeljets.com">my Linkedin</a>, where I review SaaS products and bleeding-edge tech every week.</p>]]></content:encoded></item><item><title><![CDATA[Self-hosted is awesome]]></title><description><![CDATA[<p>I&apos;m a big fan of self-hosting. As an indie hacker who has launched several micro-SaaS products and as a CTO of a small company, I now prefer self-hosting all the tools I might need. 
With the rise of high-quality self-hosted offerings from talented teams using open source as</p>]]></description><link>https://pixeljets.com/blog/self-hosted-is-awesome/</link><guid isPermaLink="false">661642d04002005ce179c1bd</guid><category><![CDATA[n8n]]></category><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Thu, 11 Apr 2024 08:45:14 GMT</pubDate><content:encoded><![CDATA[<p>I&apos;m a big fan of self-hosting. As an indie hacker who has launched several micro-SaaS products and as a CTO of a small company, I now prefer self-hosting all the tools I might need. With the rise of high-quality self-hosted offerings from talented teams (using open source as their primary marketing channel while offering a cloud version of their product to generate revenue), we can save tens or even hundreds of thousands of dollars over the lifetime of our projects, while maintaining full control over our data. </p><h2 id="the-list-of-self-hosted-products-i-use">The list of self-hosted products I use</h2><p>These products are awesome and have state-of-the-art UX, not the &quot;it gets the job done, so I can cope with its weird design&quot; kind of feeling &#x2013; like it used to be 10 years ago, when self-hosted products were often just a couple of scripts cobbled together by some software engineering student. 
A few wonderful examples of the work-oriented self-hosted products I use: </p><ul><li><a href="https://www.metabase.com/?ref=pixeljets.com">Metabase</a> (online dashboard to build SQL reports; we use it at Qwintry) </li><li><a href="https://mattermost.com/?ref=pixeljets.com">Mattermost</a> (self-hosted Slack alternative) </li><li><a href="https://directus.io/?ref=pixeljets.com">Directus</a> (turns your SQL database into a CMS)</li><li><a href="https://n8n.io/?ref=pixeljets.com">n8n</a> (developer-friendly Zapier)</li><li><a href="https://librechat.ai/?ref=pixeljets.com">LibreChat</a> (ChatGPT interface for all popular LLMs)</li><li><a href="https://www.mautic.org/?ref=pixeljets.com">Mautic</a> (marketing newsletters)</li></ul><p>Oh, and there are also <em>Supabase... and the ELK stack... and Zabbix... <a href="https://pixeljets.com/blog/poor-mans-sre/">(read more about our poor-man&apos;s SRE setup)</a></em></p><p><a href="https://www.reddit.com/r/selfhosted/?ref=pixeljets.com">This is a great self-hosted subreddit I recommend</a>. <em>I should note that many of the products discussed there on Reddit are for personal use (torrents, etc.), while I am mostly looking for solutions to do my work.</em></p><p>I believe self-hosting has several advantages for indie hackers and developers like myself. When you self-host, you have complete control over your infrastructure and data. This level of control is crucial for me as a developer because I need the flexibility to customize and integrate various tools seamlessly into my workflow.</p><h3 id="the-cons">The Cons</h3><p>That being said, I think it&apos;s important to acknowledge that self-hosting comes with its own set of challenges. Maintenance, security, and scalability become my team&apos;s responsibility. Ensuring regular updates, implementing proper security measures, and managing infrastructure can be time-consuming tasks. 
However, if you have the technical skills and are willing to put in the effort, I believe self-hosting provides a level of independence and control that can be highly beneficial in the long run - it certainly is in our case. </p><p>The big question is: <em>how technical do you need to be to embrace self-hosted products? </em>There are no silver bullets here, and you need to know a lot about how computers work, but I don&apos;t think you need 5 years of web development or system administration experience to self-host effectively and reliably anymore &#x2013; life got easier! </p><h2 id="docker-skills-are-essential">Docker skills are essential </h2><p>Luckily, Docker and Docker Compose have greatly simplified the process of self-hosting. I have to admit, as a developer and CTO in 2014-2018, I was initially hesitant to use Docker for our own web projects (especially the ones in the active development phase). It significantly complicated our deployment workflow and added a lot of overhead while providing marginal value. I didn&apos;t fully buy into the concept of 100% reproducible builds, especially considering the increased DevOps complexity that came with it. </p><p>Back then, I loved the ability to quickly hack and debug PHP scripts directly on production. While it&apos;s definitely a bad practice, there were countless times when this approach proved invaluable for debugging hard-to-reproduce bugs in a live environment: crucial for a small, rapidly changing product with just 1-3 developers! </p><p>Moreover, I struggled to wrap my head around Docker&apos;s concepts, such as volumes and binds. The cryptic shorthand syntax made it even more challenging to understand how these components worked together. I found myself spending more time trying to decipher Docker&apos;s intricacies than focusing on our application&apos;s core functionality. 
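</p><p>For reference, the shorthand that confused me distinguishes bind mounts from named volumes purely by the left-hand side of the <code>-v</code> argument (the image name and paths below are made-up placeholders):</p><figure class="kg-card kg-code-card"><pre><code># bind mount: the left side is a host path (starts with / or ./)
docker run -v ./config:/app/config myimage

# named volume: the left side is a bare name, created and managed by Docker
docker run -v appdata:/var/lib/app myimage
</code></pre></figure><p>In Compose files, the long-form <code>volumes:</code> syntax spells out <code>type: bind</code> vs <code>type: volume</code> explicitly and is much easier to read.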
</p><p>At the time, our team had a well-established workflow that allowed us to develop, test, and deploy our applications efficiently (AND FAST!). Introducing Docker felt like an unnecessary layer of complexity that would disrupt our processes without providing significant benefits in return. </p><p>However, our perspective eventually shifted towards Docker everywhere (with the exception of some development machines): our projects matured and required more predictability; we strived for better CI/CD, dedicated teams of testers, better DevOps practices, and improved horizontal scalability. What&apos;s equally important is that Docker matured as well and became a great, boring technology, and the de-facto standard for hosting everything, from API services to databases. </p><p>And Docker is what makes running complex software, self-hosted, so enjoyable. These applications, developed by external teams using varying technical stacks, now always come with well-maintained Docker Compose files (official or community-supported) that make the setup process a breeze. Instead of dealing with the hassle of manual installation and configuration, I simply use Docker&apos;s containerization approach to spin up these applications quickly and easily. 
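</p><p>To illustrate, a typical Compose file for such a product is only a handful of lines. Here is a rough sketch for n8n from memory &#x2013; the official n8n docs are the authoritative source, and the exact image name and volume path may differ:</p><figure class="kg-card kg-code-card"><pre><code>services:
  n8n:
    image: docker.n8n.io/n8nio/n8n   # image name as I recall it; check the docs
    restart: always
    ports:
      - "127.0.0.1:5678:5678"        # publish on localhost only; nginx will proxy to it
    volumes:
      - n8n_data:/home/node/.n8n     # persist credentials and workflows across restarts

volumes:
  n8n_data:
</code></pre></figure><p>Binding the published port to 127.0.0.1 keeps the service off the public interface, so only the reverse proxy (or an SSH tunnel) can reach it.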
</p><p><strong>You can still be a good web developer without knowing Docker &#x2013; but it&apos;s hard to enjoy self-hosting nowadays without knowing it!</strong></p><h2 id="gpt-and-claude-to-the-rescue">GPT and Claude to the rescue</h2><p>Besides Docker, to self-host effectively, you will also need a good understanding of:</p><ul><li>how Nginx routing works (well, you can use another webserver as an ingress controller, I just prefer Nginx)</li><li>how Linux and networking work in general (firewalls, ufw)</li><li>how DNS and/or Cloudflare DNS works</li></ul><p>Great news: you don&apos;t need to spend months experimenting with these. With the rise of LLMs, there is a great shortcut nowadays &#x2013; it can be done in a matter of hours! (And they can help with Docker errors, too.)</p><figure class="kg-card kg-image-card"><img src="https://pixeljets.com/blog/content/images/2024/04/2024-04-10-at-11.15.png" class="kg-image" alt loading="lazy" width="2000" height="1662" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/04/2024-04-10-at-11.15.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/04/2024-04-10-at-11.15.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/04/2024-04-10-at-11.15.png 1600w, https://pixeljets.com/blog/content/images/2024/04/2024-04-10-at-11.15.png 2358w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/04/2024-04-10-at-11.152.png" class="kg-image" alt loading="lazy" width="2000" height="1653" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/04/2024-04-10-at-11.152.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/04/2024-04-10-at-11.152.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/04/2024-04-10-at-11.152.png 1600w, https://pixeljets.com/blog/content/images/2024/04/2024-04-10-at-11.152.png 2342w" sizes="(min-width: 720px) 720px"><figcaption>With 
GPT, I don&apos;t need to remember nginx or docker compose config syntax anymore.</figcaption></figure><h2 id="funding-self-hosted-product-development">Funding self-hosted product development </h2><p>Developing high-quality, free, and open-source products requires funding. A common approach is to offer a cloud version of the product with some enterprise features to generate revenue. The Directus project had an interesting discussion on GitHub (<a href="https://github.com/directus/directus/discussions/17977?ref=pixeljets.com">https://github.com/directus/directus/discussions/17977</a>) where the founder shared ideas on sustaining their product. They eventually adopted a strategy allowing free use of the entire platform unless the legal entity exceeds $5,000,000 USD in annual &quot;total finances&quot; &#x2013; an interesting approach! </p><h2 id="open-core-concerns">Open-core concerns </h2><p>Not all self-hosted products are created equal. Some cross the line, offering a barely usable free version as a marketing gimmick instead of a genuinely useful product. In my opinion, an acceptable practice is to put clearly enterprise features, such as SSO mechanisms (like in n8n), behind paid licenses. However, restricting the free product to the point of being nearly unusable is not acceptable.</p><p>One example of a self-hosted product that faced community backlash is Budibase. They recently restricted their open-source version to a limited number of users, which led to criticism on Reddit (<a href="https://www.reddit.com/r/selfhosted/comments/17v48t8/budibase_will_soon_limit_users_on_oss_self_hosted/?ref=pixeljets.com">https://www.reddit.com/r/selfhosted/comments/17v48t8/budibase_will_soon_limit_users_on_oss_self_hosted/</a>). 
Some users blamed this decision on the fact that Budibase is now VC-funded, suggesting that the pursuit of profitability led to the limitation of their free offering.</p><p>This incident highlights the balance that open-source projects must strike when seeking to generate revenue while maintaining the trust and support of their community. Restricting the free version too heavily can lead to a loss of goodwill and a perception that the project has abandoned its open-source roots in favor of commercial interests.</p><h2 id="my-real-life-approach-to-self-host-a-product">My real-life approach to self-hosting a product</h2><p>Let&apos;s say I want to host n8n. These are my steps:</p><h3 id="basic-steps-for-a-demo">Basic steps for a demo</h3><ol><li>Log in via SSH to my Ubuntu machine running on a remote Hetzner cloud server (I don&apos;t ever host anything in Docker on my Macbook; I do everything on Hetzner via plain SSH and/or the awesome VS Code Remote extension, in case I need heavy config edits)</li><li>Verify Docker is available and running on the server: <code>docker ps</code></li><li>Google <code>n8n docker</code> and open the docs: <a href="https://docs.n8n.io/hosting/installation/docker/?ref=pixeljets.com#prerequisites">https://docs.n8n.io/hosting/installation/docker/#prerequisites</a></li><li>Run the two commands highlighted in the manual to start the service</li><li>Verify the n8n docker containers are running. If not, explore container errors via <code>docker logs xxx</code> and check the n8n GitHub repository for similar issues: chances are someone already struggled with this! </li><li>Port forward the service to my localhost so I can open it in Chrome via <code>http://localhost:3333</code> </li></ol><p>These steps generally take <em>around 3-5 minutes max to get the service running</em>. 
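</p><p>The port forwarding in the last step is a plain SSH tunnel. Assuming n8n listens on its default port 5678 on the server (the hostname below is a placeholder), it looks like this:</p><figure class="kg-card kg-code-card"><pre><code># forward local port 3333 to port 5678 on the remote machine
ssh -L 3333:localhost:5678 root@my-hetzner-server

# then open http://localhost:3333 in the browser
</code></pre></figure><p>This lets me try the service without opening any extra ports in the server&apos;s firewall. 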
Then, I usually play with the service for an hour or two to decide whether I want to use it on a long-term basis.</p><ol><li>If I do, I go to Cloudflare and create a subdomain on one of my domains, with an <code>A record</code> pointing to my Hetzner cloud server (e.g. <code>n8n.scrapeninja.net</code>). HTTPS certificates are handled by Cloudflare, which has a good DNS UI and also shields my server IP addresses from occasional DDoS attacks (and it&apos;s free!)</li><li>I create an nginx host and point it to the docker container. Here is an example of such a config:</li></ol><figure class="kg-card kg-code-card"><pre><code>server {
    listen 80;
    server_name n8n.scrapeninja.net;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name n8n.scrapeninja.net;
    ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;
    location / {
        proxy_pass http://localhost:5678;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade; # websocket support
        proxy_set_header Connection &quot;Upgrade&quot;;  # websocket support
        proxy_set_header Host $host;
        chunked_transfer_encoding off;
        proxy_buffering off;
        proxy_cache off;
    }
}
</code></pre><figcaption>This is an n8n reverse proxy config for nginx.</figcaption></figure><h2 id="hosting">Hosting</h2><p>I use Hetzner Cloud almost exclusively for hosting all my servers nowadays (at the time of writing this article), and I wholeheartedly recommend it to everyone&#x2014;from indie hackers launching their first website to medium-sized SaaS companies seeking consulting. Hetzner makes self-hosting <em>so</em> affordable. Before Hetzner, I used to host on DigitalOcean, Linode, and Google Cloud. Here is the current pricing for a 4GB RAM x86 machine across these clouds:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/07/2024-07-26-at-18.56-1.png" class="kg-image" alt loading="lazy" width="2000" height="1297" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/07/2024-07-26-at-18.56-1.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/07/2024-07-26-at-18.56-1.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/07/2024-07-26-at-18.56-1.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/07/2024-07-26-at-18.56-1.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Hetzner: ~5 USD for 4GB RAM, 2vcpu</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/07/2024-07-26-at-19.01-1.png" class="kg-image" alt loading="lazy" width="2000" height="1185" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/07/2024-07-26-at-19.01-1.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/07/2024-07-26-at-19.01-1.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/07/2024-07-26-at-19.01-1.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/07/2024-07-26-at-19.01-1.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>DigitalOcean: 24 USD for 4GB RAM, 2vcpu</figcaption></figure><figure 
class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/07/2024-07-26-at-19.03-1.png" class="kg-image" alt loading="lazy" width="2000" height="1280" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/07/2024-07-26-at-19.03-1.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/07/2024-07-26-at-19.03-1.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/07/2024-07-26-at-19.03-1.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/07/2024-07-26-at-19.03-1.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Linode: 24 USD for 4GB RAM, 2vcpu</figcaption></figure><p>So, it&apos;s a <strong>4x</strong> difference in cost (insane, I know!), and the new Hetzner Cloud UI is very swift and pleasant to use &#x2013; I never felt like I sacrificed anything using Hetzner over the past 3 years.</p><p>Here is my Hetzner referral link if you decide to try it: <a href="https://hetzner.cloud/?ref=kW64RqTXiNiN">https://hetzner.cloud/?ref=kW64RqTXiNiN</a> (you get &#x20AC;&#x2060;20 on sign up, I get &#x20AC;&#x2060;10).</p><p></p><p>If you liked this post, chances are you will enjoy checking <a href="https://pixeljets.com/blog/">other posts</a> in this blog, and following <a href="https://www.linkedin.com/in/anthony-sidashin/?ref=pixeljets.com">my LinkedIn</a>, where I review SaaS products and bleeding-edge tech.</p>]]></content:encoded></item><item><title><![CDATA[Learning French with ChatGPT]]></title><description><![CDATA[<p>Duolingo and flashcards get boring quickly, so lately I&apos;ve been learning French with ChatGPT.</p><p>Observation #1: <strong>ChatGPT for Android got a very good speech-to-text and text-to-speech engine based on Whisper, since fall 2023</strong>. It understands what you say well and intones its phrases nicely when it speaks. 
You</p>]]></description><link>https://pixeljets.com/blog/learning-languages-chatgpt/</link><guid isPermaLink="false">6607f2b14002005ce179bf4a</guid><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Sat, 30 Mar 2024 11:24:25 GMT</pubDate><content:encoded><![CDATA[<p>Duolingo and flashcards get boring quickly, so lately I&apos;ve been learning French with ChatGPT.</p><p>Observation #1: <strong>ChatGPT for Android got a very good speech-to-text and text-to-speech engine based on Whisper, since fall 2023</strong>. It understands what you say well and intones its phrases nicely when it speaks. You can have real conversations with it on any topic. Of course, it&apos;s still far from being a human tutor in terms of analyzing your pronunciation, because what essentially reaches ChatGPT cloud is not the audio stream of your words, but just recognized text (with some glitches if you consider the sloppy pronunciation of an average beginner). </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/03/Screenshot_20240330_141455_ChatGPT.jpg" class="kg-image" alt loading="lazy" width="916" height="1984" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/Screenshot_20240330_141455_ChatGPT.jpg 600w, https://pixeljets.com/blog/content/images/2024/03/Screenshot_20240330_141455_ChatGPT.jpg 916w" sizes="(min-width: 720px) 720px"><figcaption>30% of my phone time looks like this now.</figcaption></figure><p>The conversion from speech to text happens on the local device. 
<em>(By the way, I see great potential here for neural networks that analyze the audio stream itself and compare it to reference pronunciation - check out <a href="https://getfluently.app/?ref=pixeljets.com">https://getfluently.app</a> from fellow founder Yuri Rebryk, a cool idea, but so far implemented only for the English language, as far as I know.)</em></p><p>Observation #2: <strong>ChatGPT allows you to extend your vocabulary in different industries and niches. Sure, it loses to a human tutor by not &quot;hearing&quot; your pronunciation, but it infinitely wins in the depth and breadth of its knowledge.</strong> Today it can give you 50 words from Harry Potter, and tomorrow it can act out a dialogue where you go to a restaurant and discuss the nuances of pairing white wine with oysters with the French chef. Or an angry landlord shows up because you haven&apos;t paid rent for 3 months. The possibilities are endless.</p><p>Here&apos;s an example of my basic prompt #1: </p><p><em>I study French. My level is A2. Please create a simple dialogue happening in a French restaurant between two chefs who argue on ____. Don&apos;t use person labels.</em> </p><p><em>(The last phrase is there so that each line doesn&apos;t have a person label prefix, which is distracting in the audio stream.)</em></p><p>It&apos;s worth noting that ChatGPT can switch languages decently even within one line, BUT only if that line is long enough. So it can&apos;t pronounce a list of French words with their English translations well - it constantly mixes them up, trying to pronounce the words in a French or English manner - it&apos;s fine for reading, but almost impossible to listen to. But if you ask it to repeat each dialog line in English right after the French, then it sounds decent.</p><p>Prompt #2:</p><p><em>I study French. My level is A2. Please create a simple dialogue happening in a French restaurant. Add an English translation after each phrase. 
Don&apos;t use person labels.</em></p><p>ChatGPT can re-read a message out loud, even if you&apos;re already in text mode reading. There&apos;s a &quot;Read aloud&quot; option in the context menu for that.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/03/2024-03-30-at-13.50.png" class="kg-image" alt loading="lazy" width="910" height="1956" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/2024-03-30-at-13.50.png 600w, https://pixeljets.com/blog/content/images/2024/03/2024-03-30-at-13.50.png 910w" sizes="(min-width: 720px) 720px"><figcaption>&quot;Read aloud&quot; option is essential for improving listening skills.</figcaption></figure><p>If I encounter an expression I don&apos;t understand, I copy that snippet into the Yandex Translate app, translate it there, and listen to it separately. (Yandex is better than Google Translate in that it can pronounce even long lines, has fewer glitches, and provides more usage examples).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/03/image.png" class="kg-image" alt loading="lazy" width="916" height="1984" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/image.png 600w, https://pixeljets.com/blog/content/images/2024/03/image.png 916w" sizes="(min-width: 720px) 720px"><figcaption>Usage examples from real movies is a killer feature of Yandex Translate.</figcaption></figure><p>After going through the whole dialogue, I dictate a new prompt in the same conversation with ChatGPT (it&apos;s already in context):</p><p>Prompt #3</p><p><em>Now let&apos;s replicate this dialog, I will answer your questions one by one.</em></p><p>And I try to reproduce the phrases and words I just learned. 
The dialog inevitably deviates from the original version, making it even more exciting and challenging.</p><p>The main advantage of this approach over watching YouTube is that the vocabulary is actively used by me, not just passively consumed, bringing it closer in usefulness to conversation clubs - except a conversation club meets once a week if I can make it, while ChatGPT is available at any moment, like during a walk.</p>]]></content:encoded></item><item><title><![CDATA[How to keep your product alive: poor man's SRE]]></title><description><![CDATA[<p>Let&apos;s imagine your product didn&apos;t die and managed to gain some real traction (&#x1F389; CONGRATULATIONS!). After a few years it stops being a small and nimble project and turns into something much bigger, involving dozens and hundreds of people. </p><h2 id="project-lifecycle-mvpgrowthmaintenance">Project lifecycle: MVP -&gt; growth -&</h2>]]></description><link>https://pixeljets.com/blog/poor-mans-sre/</link><guid isPermaLink="false">65fa9e524002005ce179bdcb</guid><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Wed, 20 Mar 2024 10:27:21 GMT</pubDate><content:encoded><![CDATA[<p>Let&apos;s imagine your product didn&apos;t die and managed to gain some real traction (&#x1F389; CONGRATULATIONS!). After a few years it stops being a small and nimble project and turns into something much bigger, involving dozens and hundreds of people. </p><h2 id="project-lifecycle-mvpgrowthmaintenance">Project lifecycle: MVP -&gt; growth -&gt; maintenance </h2><p>What does a full-stack engineer of a new project think about? Exciting questions: what programming language to choose, Bootstrap vs Tailwind, React vs vue.js, services or microservices, and so on. 
<em>&quot;How to build the product&quot;?</em> Those days are certainly wonderful, but there comes a time when rapidly deploying features using the latest framework is no longer as important and cool as maintaining the reliability and stability of the entire system - year after year, <strong>preferably without any major f**k-ups.</strong></p><p>Now that Qwintry can no longer call itself a startup, the focus shifts from introducing new features to SRE <em>(Site Reliability Engineering)</em> - ensuring the availability, reliability and performance of the product. Some might say it&apos;s grim and boring, but in my opinion, the ability to sit down and meticulously investigate, analyse, unravel the threads of logs and reconstruct events in production is what makes a person a valuable employee and a true engineer (and detective). And you can get a kick out of it too.</p><h2 id="poor-mans-approach-to-sre-of-a-small-business">Poor man&apos;s approach to SRE of a small business</h2><p>We don&apos;t have a dedicated SRE team - we&apos;re no longer a startup, but we&apos;re still a long way from being a rich corporation. So our approach is very different from what bigger companies do. When things go south, we plug the holes as best we can: our development people, our testing people, our devops people and myself - we all get involved in some way. In this situation, the stack of products we use to monitor, analyse and manage our services comes to the fore. Why is that? 
If you have to run to the server and grep through logs from the console every time you want to analyse a problem - no one wants to do that: it&apos;s inconvenient, and it also severely limits the number of people who can do that analysis.</p><h2 id="our-stack-of-boring-and-predictable-sre-tools">Our stack of boring and predictable SRE tools</h2><h3 id="zabbix">Zabbix</h3><p>For cloud and physical server monitoring and Telegram alerts, we&apos;ve been using <strong>Zabbix</strong> for a very long time; it allows us to track the health of servers, networks and services. It&apos;s not very ergonomic in terms of adding new hosts and creating alerts, to be honest - a lot of clicks and taps - but it gets the job done. Zabbix was launched in 2001 (sic) and is still actively developed and improved - a perfect example of a great open source product.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/03/2024-03-20-at-11.36.png" class="kg-image" alt loading="lazy" width="2000" height="1105" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/2024-03-20-at-11.36.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/03/2024-03-20-at-11.36.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/03/2024-03-20-at-11.36.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/03/2024-03-20-at-11.36.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Zabbix dashboard</figcaption></figure><p>I think <strong>Prometheus + Grafana</strong> now allow you to build a very similar dashboard &#x2013; but Zabbix just works for us and we never had a lot of reasons to switch.</p><h3 id="elk-stack-for-log-storage-and-analysis">ELK Stack for log storage and analysis</h3><p>All logs from all servers and services flow to a machine with the <strong>ELK stack</strong> (Elasticsearch, Logstash, Kibana) - it happily eats hundreds of gigabytes of RAM, but for the last couple of 
years it hasn&apos;t crashed or caused any problems. <a href="https://www.linkedin.com/posts/anthony-sidashin_hetzner-linode-cloud-activity-7158059606005469184-KoLN?utm_source=share&amp;utm_medium=member_desktop">(My recent Linkedin post regarding our primary database migration to Hetzner uses ELK visualisation) </a></p><p>Okay, I was not a huge fan of ElasticSearch 4 years ago, when running it on our machines was cumbersome and expensive. <a href="https://pixeljets.com/blog/clickhouse-vs-elasticsearch/">Clickhouse seemed to be a very promising alternative for log storage tasks (and it indeed is!).</a> But the ecosystem (Kibana!) is what makes the ELK stack so great and strong - it just works with minimal effort. Now that hardware and RAM have become much more affordable for us, the ELK stack is still rocking, and I appreciate it a lot.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/03/2024-03-20-at-11.35.png" class="kg-image" alt loading="lazy" width="2000" height="1358" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/2024-03-20-at-11.35.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/03/2024-03-20-at-11.35.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/03/2024-03-20-at-11.35.png 1600w, https://pixeljets.com/blog/content/images/2024/03/2024-03-20-at-11.35.png 2292w" sizes="(min-width: 720px) 720px"><figcaption>Kibana excels at analyzing HTTP request latency.</figcaption></figure><p>I think <strong>Grafana</strong> and, more recently, <strong>Loki</strong> for log storage are very competitive open source (open core?) players on the market, which I still hope to try on some new project.</p><h3 id="metabase-sql-dashboard">Metabase: SQL dashboard</h3><p>For SQL reports, we use <strong>Metabase</strong> - a well-designed open-source product that allows us to query databases and build reports based on them. 
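</p><p>Before pointing Metabase at a production database, I give it its own restricted account. In MySQL that is roughly the following (the user name, host mask, and schema name are placeholders):</p><figure class="kg-card kg-code-card"><pre><code>-- a read-only account, limited to the schemas Metabase actually needs
CREATE USER 'metabase_ro'@'10.0.0.%' IDENTIFIED BY 'use-a-strong-password';
GRANT SELECT ON app_db.* TO 'metabase_ro'@'10.0.0.%';
FLUSH PRIVILEGES;
</code></pre></figure><p>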
(Don&apos;t forget to create a read-only user in your database and limit the export of users&apos; personal data!)</p><figure class="kg-card kg-image-card"><img src="https://pixeljets.com/blog/content/images/2024/03/preview.png" class="kg-image" alt loading="lazy" width="2000" height="1299" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/preview.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/03/preview.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/03/preview.png 1600w, https://pixeljets.com/blog/content/images/2024/03/preview.png 2264w" sizes="(min-width: 720px) 720px"></figure><p>Metabase is so nice: not just for SRE needs, but also for the finance department and the support department - we build a lot of dashboards and reports there, and this has saved us countless hours of development work (before Metabase, all reports were implemented in our primary PHP+MySQL app backend).</p><p>Metabase&apos;s killer feature for us is that it allows building <a href="https://www.metabase.com/docs/latest/questions/native-editor/sql-parameters?ref=pixeljets.com">parametrized SQL reports</a>: you put certain [[symbols]] and these {{pieces}} become convenient UI filters which can later be used by non-technical operators:</p><pre><code>
SELECT
  count(*)
FROM
  products
WHERE
  category = {{category}}

</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/03/02-widget.png" class="kg-image" alt loading="lazy" width="2000" height="1361" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/02-widget.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/03/02-widget.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/03/02-widget.png 1600w, https://pixeljets.com/blog/content/images/2024/03/02-widget.png 2092w" sizes="(min-width: 720px) 720px"><figcaption>SQL parameters to GUI in Metabase is a killer feature for our team</figcaption></figure><p>They also have a visual GUI to replace SQL, but we don&apos;t use it often: people who cannot write SQL usually just ask the IT guys to create a new report for them.</p><p>I am pretty sure we will improve our report-building workflow in 2024: with the rapid evolution of LLM models, our finance department will soon be able to ask basic questions, and GPT or Claude (200k tokens input is finally there!) 
will be able to spit out a usable SQL report for them.</p><h3 id="sentry-application-error-monitoring">Sentry: application error monitoring</h3><p>For monitoring code errors in production, we use <strong>Sentry</strong>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/03/2024-03-20-at-13.03.png" class="kg-image" alt loading="lazy" width="2000" height="1147" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/2024-03-20-at-13.03.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/03/2024-03-20-at-13.03.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/03/2024-03-20-at-13.03.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/03/2024-03-20-at-13.03.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Sentry dashboard</figcaption></figure><p>A few lines of code - and all our PHP, React Native, and Node.js apps are now reporting error and exception stack traces into a beautiful dashboard for easy investigation. It&apos;s awesome.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/03/2024-03-20-at-13.05.png" class="kg-image" alt loading="lazy" width="2000" height="1156" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/2024-03-20-at-13.05.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/03/2024-03-20-at-13.05.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/03/2024-03-20-at-13.05.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/03/2024-03-20-at-13.05.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Sentry shows the line of code where the error happened.</figcaption></figure><p></p><h2 id="choosing-boring-self-hosted-products-for-sre">Choosing boring self-hosted products for SRE</h2><p>All of these products are self-hosted and open source. 
Except for the cloud version of Sentry, which has done an <strong>excellent</strong> job in terms of cost control (constant spikes of hundreds of thousands of errors per hour don&apos;t affect our invoice too badly) and costs very little money relative to the benefit it provides.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2024/03/2024-03-20-at-13.22.png" class="kg-image" alt loading="lazy" width="2000" height="1537" srcset="https://pixeljets.com/blog/content/images/size/w600/2024/03/2024-03-20-at-13.22.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2024/03/2024-03-20-at-13.22.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2024/03/2024-03-20-at-13.22.png 1600w, https://pixeljets.com/blog/content/images/size/w2400/2024/03/2024-03-20-at-13.22.png 2400w" sizes="(min-width: 720px) 720px"><figcaption>Sentry is a great example of an ethical and predictable approach to billing in the unpredictable space of log storage</figcaption></figure><p>All of these products are <a href="https://mcfunley.com/choose-boring-technology?ref=pixeljets.com">boring, in a good sense</a>, not very innovative, and work acceptably or well - and it&apos;s scary to think how many tens of thousands of dollars we would have shelled out if we had taken the &quot;best on the market&quot;, as wealthier companies do.</p><p>For example, <strong>Datadog</strong> and <strong>NewRelic</strong> dominate the observability and log storage market, and they have excellent tools, but they&apos;re quite expensive - thousands of dollars per month. The scariest thing is not that it&apos;s expensive, but that it can very quickly get out of control when a random log file accidentally grows not by 1GB per day, but by 20GB per day. When you have dozens of machines and hundreds of services, some shit like that ALWAYS happens. 
We know this pretty well, <a href="https://pixeljets.com/blog/aftership-charged-us-26k-and-refused-to-refund/">because we recently paid 26K USD for an API which we had even forgotten we were using</a>.</p>]]></content:encoded></item><item><title><![CDATA[How to set proxy in Python Requests]]></title><description><![CDATA[<h2 id="introduction">Introduction</h2><p>As a seasoned developer with a keen interest in web scraping and data extraction, I&apos;ve often leveraged Python for its simplicity and power. In this realm, understanding and utilizing proxies becomes a necessity, especially to navigate through the complexities of web requests, IP bans, and rate limiting.</p>]]></description><link>https://pixeljets.com/blog/python-requests-proxy/</link><guid isPermaLink="false">65ac02be14aa72faea11ab95</guid><category><![CDATA[webscraping]]></category><category><![CDATA[python]]></category><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Sat, 20 Jan 2024 18:07:17 GMT</pubDate><media:content url="https://pixeljets.com/blog/content/images/2024/01/adam-chang-IWenq-4JHqo-unsplash--1-.jpg" medium="image"/><content:encoded><![CDATA[<h2 id="introduction">Introduction</h2><img src="https://pixeljets.com/blog/content/images/2024/01/adam-chang-IWenq-4JHqo-unsplash--1-.jpg" alt="How to set proxy in Python Requests"><p>As a seasoned developer with a keen interest in web scraping and data extraction, I&apos;ve often leveraged Python for its simplicity and power. In this realm, understanding and utilizing proxies becomes a necessity, especially to navigate through the complexities of web requests, IP bans, and rate limiting. In this article, I&apos;ll share my insights and experiences on using proxies with Python&apos;s Requests library. We&apos;ll start from the basics and gradually move to more advanced techniques like retries and rotating proxies. 
My journey through these concepts has been filled with trial and error, and I aim to provide you with a clear path to mastering proxies in Python, peppered with practical, real-world code examples. If you are relatively new to web scraping in Python and proxies in particular, I recommend my blog post about <a href="https://pixeljets.com/blog/choosing-proxy-web-scraping/">choosing a proxy for web scraping</a>, and do not forget that you will probably need to check that the proxy is working properly; this is where my <a href="https://pixeljets.com/blog/simple-proxy-checker-in-curl-bash/">simple bash proxy checker</a> might save you a lot of keystrokes over time.</p><h3 id="prerequisites-installation">Prerequisites &amp; Installation</h3><p>Before we dive into the nuances of proxy usage in Python, let&apos;s set the stage with the necessary prerequisites. First and foremost, you need a basic understanding of Python (if it needs mentioning, I will be using Python version 3 in the code examples below). </p><p>If you&apos;re already comfortable with Python&apos;s syntax and basic libraries, you&apos;re good to go. Additionally, familiarity with HTTP requests and responses will be beneficial, as proxies predominantly deal with these elements. Installing the Requests library is our starting point. This library simplifies HTTP requests in Python, providing an intuitive and user-friendly way to send and receive data. You can install it using pip, Python&apos;s package manager. Just run <code>pip install requests</code>. Once you have Requests installed, the next step is to ensure you have access to a proxy or a list of proxies. </p><p>Proxies can be free or paid, and the choice depends on your specific needs and the level of reliability you require. 
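Whichever option you choose, it pays to sanity-check a proxy before pointing a big scraping job at it. Here is a minimal sketch (the placeholder proxy address and the httpbin echo URL are just illustrations, not working values):

```python
def build_proxies(proxy_url):
    # Requests expects a mapping of URL scheme to proxy URL;
    # using the same proxy for both plain and TLS traffic is typical.
    return {"http": proxy_url, "https": proxy_url}

proxies = build_proxies("http://203.0.113.10:3128")  # hypothetical address

# Uncomment for a live check -- the response should show the proxy's IP:
# import requests
# print(requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10).json())
```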
In my experience, paid proxies tend to be much more reliable for real-world scraping tasks, while free proxies usually work poorly and cost a lot of time - I don&apos;t really recommend them unless you only need to do 3 or 5 requests for basic tests.</p><h2 id="how-to-use-a-proxy-with-python-requests">How to use a proxy with Python Requests</h2><h3 id="basic-example">Basic example</h3><p>Using a proxy with Python Requests is straightforward. In its simplest form, you define your proxy and pass it as a parameter to the <code>requests.get()</code> or <code>requests.post()</code> method. Here&apos;s a basic example:</p><pre><code class="language-python">import requests

# Replace with your proxy URL
proxy = &quot;http://your_proxy_here&quot;

# Using the proxy with a GET request
response = requests.get(&quot;http://example.com&quot;, proxies={&quot;http&quot;: proxy, &quot;https&quot;: proxy})
print(response.text)
</code></pre><p>In this code, replace <code>&quot;http://your_proxy_here&quot;</code> with your actual proxy URL. This example demonstrates a GET request, but the same logic applies to other types of HTTP requests.</p><h3 id="with-authentication">With authentication</h3><p>When using authenticated proxies, you need to provide a username and password alongside the proxy URL. This can be a bit tricky, as the credentials need to be included in the proxy URL itself. Here&apos;s a basic example of how to use authenticated proxies in Python:</p><pre><code class="language-python">import requests

# Your proxy credentials
proxy_user = &quot;user&quot;
proxy_pass = &quot;password&quot;

# Your proxy URL
proxy_url = &quot;http://your_proxy_here&quot;

# Forming the authenticated proxy URL
proxies = {
    &quot;http&quot;: f&quot;http://{proxy_user}:{proxy_pass}@{proxy_url}&quot;,
    &quot;https&quot;: f&quot;http://{proxy_user}:{proxy_pass}@{proxy_url}&quot;
}

# Using the authenticated proxy with a GET request
response = requests.get(&quot;http://example.com&quot;, proxies=proxies)
print(response.text)
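One caveat with this URL-embedding approach: if the username or password contains reserved characters such as @, : or /, they must be percent-encoded first, or the URL will be parsed incorrectly. A small sketch using the standard library (the credentials are made up):

```python
from urllib.parse import quote

proxy_user = quote("user@corp", safe="")   # "@" becomes %40
proxy_pass = quote("p@ss:word", safe="")   # "@" and ":" are escaped too
proxy_url = "your_proxy_here"              # same placeholder host as above

# Safe to embed in the proxy URL now:
print(f"http://{proxy_user}:{proxy_pass}@{proxy_url}")
# -> http://user%40corp:p%40ss%3Aword@your_proxy_here
```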
</code></pre><h3 id="with-retries">With retries</h3><p>In a real-world scenario, proxies might fail, and it&apos;s crucial to handle these failures gracefully. Retries are an effective way to ensure your request eventually goes through. Here&apos;s how I implement retries in my projects:</p><pre><code class="language-python">from requests.adapters import HTTPAdapter
import requests
from urllib3.util.retry import Retry  # modern import path; requests.packages.urllib3 is deprecated

# The same placeholder proxy as in the basic example above
proxy = &quot;http://your_proxy_here&quot;

# Session with retry strategy
session = requests.Session()
retries = Retry(total=5, backoff_factor=0.1, status_forcelist=[500, 502, 503, 504])
session.mount(&apos;http://&apos;, HTTPAdapter(max_retries=retries))
session.mount(&apos;https://&apos;, HTTPAdapter(max_retries=retries))

# Making a request using the session
response = session.get(&quot;http://example.com&quot;, proxies={&quot;http&quot;: proxy, &quot;https&quot;: proxy})
print(response.text)
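For intuition about what backoff_factor does: urllib3 sleeps between failed attempts for roughly backoff_factor * (2 ** (attempt - 1)) seconds (the exact handling of the first sleep differs between urllib3 versions). A quick sketch of the schedule:

```python
def approx_backoff_schedule(backoff_factor, attempts):
    # Roughly how long urllib3's Retry sleeps before each retry;
    # the delay doubles with every consecutive failure.
    return [backoff_factor * (2 ** (n - 1)) for n in range(1, attempts + 1)]

print(approx_backoff_schedule(0.5, 5))  # -> [0.5, 1.0, 2.0, 4.0, 8.0]
```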
</code></pre><p>This approach uses a session object with a mounted HTTPAdapter. The <code>Retry</code> object defines the retry strategy, indicating the number of retries and the status codes that trigger a retry.</p><h3 id="rotating-proxies-from-the-list">Rotating proxies from the list</h3><p>When dealing with a large number of requests, using a single proxy might not be sufficient due to rate limiting or IP bans. Rotating proxies can solve this problem.</p><p><em>I should note that some proxy providers currently offer rotating IPs via just one proxy address. In this case, you don&apos;t really need to cycle through a list of proxies &#x2013; you just need to retry your request, obviously (like in the code example above). The code below handles a different case: you have multiple proxies with separate addresses and need to rotate through them.</em></p><p>Here&apos;s how I rotate proxies in Python:</p><pre><code class="language-python">import requests
from itertools import cycle

# List of proxies
proxies = [&quot;http://proxy1.example.com&quot;, &quot;http://proxy2.example.com&quot;, &quot;http://proxy3.example.com&quot;]
proxy_pool = cycle(proxies)

# Function to make a request with rotating proxies
def make_request(url):
    for _ in range(len(proxies)):
        proxy = next(proxy_pool)
        try:
            response = requests.get(url, proxies={&quot;http&quot;: proxy, &quot;https&quot;: proxy})
            return response
        except requests.RequestException:
            # Handle exception (e.g., log and continue to the next proxy)
            pass
    return None  # all proxies failed

# Usage
response = make_request(&quot;http://example.com&quot;)
print(response.text if response else &quot;Request failed&quot;)
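A possible refinement of this rotation (my sketch, not part of the original snippet): evict proxies that raise, so dead proxies stop eating attempts. The fetch callback is injectable purely to keep the logic easy to test:

```python
import requests

def fetch_with_eviction(url, proxy_list, fetch=None):
    """Try proxies in order, evicting any that raise, until one succeeds."""
    if fetch is None:
        # Default fetcher; the timeout stops a dead proxy from hanging the loop.
        def fetch(u, p):
            return requests.get(u, proxies={"http": p, "https": p}, timeout=10)
    pool = list(proxy_list)  # copy, so the caller's list is untouched
    while pool:
        try:
            return fetch(url, pool[0])
        except requests.RequestException:
            pool.pop(0)  # evict the failing proxy and try the next one
    return None  # every proxy failed
```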
</code></pre><p>In this snippet, <code>cycle</code> from <code>itertools</code> is used to rotate through the proxy list. Each request attempts to use a different proxy, providing a simple yet effective way to manage multiple proxies.</p><h2 id="conclusion">Conclusion</h2><p>In summary, understanding and efficiently utilizing proxies in Python can significantly enhance your web scraping capabilities. By integrating basic proxy usage, implementing retries, and rotating through multiple proxies, you can overcome common challenges like IP bans and rate limiting. This knowledge is not just limited to Python; it&apos;s applicable to other languages and frameworks, as I&apos;ve explored in my articles on <a href="https://pixeljets.com/blog/puppeteer-api-web-scraping/">Puppeteer</a> and <a href="https://pixeljets.com/blog/node-fetch-vs-axios-vs-got-for-web-scraping-in-node-js/">Node.js web scraping</a>. Remember, the key to successful web scraping lies in being respectful to the target websites, adhering to their terms of service, and using proxies judiciously to avoid any unethical practices.</p><p>As always, I&apos;m eager to share more insights and practical tips in future articles. If you found this article helpful, you might also enjoy my Youtube video <a href="https://youtu.be/kPe3wtA9aPM?ref=pixeljets.com">on discovering hidden website APIs via chrome dev tools</a>. Happy coding and scraping!</p>]]></content:encoded></item><item><title><![CDATA[How to set proxy in Playwright]]></title><description><![CDATA[<p>In this article I will describe how to set a proxy in Playwright (Node.js version of Playwright).</p><p>Playwright is obviously one of the best and most modern solutions to automate browsers in 2024. 
It uses the CDP protocol to send commands to browsers and supports Chromium, Chrome and Firefox</p>]]></description><link>https://pixeljets.com/blog/proxy-in-playwright/</link><guid isPermaLink="false">65a502fa14aa72faea11aa9b</guid><category><![CDATA[playwright]]></category><category><![CDATA[webscraping]]></category><category><![CDATA[node.js]]></category><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Fri, 19 Jan 2024 11:50:32 GMT</pubDate><media:content url="https://pixeljets.com/blog/content/images/2024/01/tom-coomer-de0P588zgls-unsplash.webp" medium="image"/><content:encoded><![CDATA[<img src="https://pixeljets.com/blog/content/images/2024/01/tom-coomer-de0P588zgls-unsplash.webp" alt="How to set proxy in Playwright"><p>In this article I will describe how to set a proxy in Playwright (Node.js version of Playwright).</p><p>Playwright is obviously one of the best and most modern solutions to automate browsers in 2024. It uses the CDP protocol to send commands to browsers and supports Chromium, Chrome and Firefox browsers out of the box. It is open source and very well maintained. Its main use case is UI test automation and web scraping. Setting up proxies is useful for both of these use cases - especially for web scraping, where using high quality proxies is crucial. The Playwright SDK is available for many programming languages (C#, Python, Node.js); this blog post is dedicated to the Node.js version of Playwright. By the way, if you can&apos;t decide <a href="https://pixeljets.com/blog/web-scraping-playwright-python-nodejs/">between the Node.js and Python versions of Playwright, read my article about their differences</a>.</p><p>Setting a Playwright proxy can be done at two levels:</p><ul><li>on the global browser instance, and</li><li>on a browser context</li></ul><h2 id="the-simplest-way-to-set-proxy-in-playwright">The simplest way to set proxy in Playwright</h2><p>Chances are you need just this way! 
</p><pre><code class="language-js">const browser = await chromium.launch({
  proxy: {
    server: &apos;http://myproxy.com:3128&apos;,
    username: &apos;usr&apos;,
    password: &apos;pwd&apos;
  }
});</code></pre><p>That&apos;s all you need!</p><p>In case you have your proxy credentials in the standard <code>http://username:pw@host:port</code> syntax, you need to convert them into the Playwright format. Here is how I usually do it in my Node.js projects:</p><h2 id="my-way-to-setup-proxy-in-nodejs-playwright-projects">My way to setup proxy in Node.js Playwright projects</h2><p>I don&apos;t recommend hardcoding the proxy into your project&apos;s JavaScript/TypeScript code. Even if you don&apos;t change your proxy every day, specifying it this way means committing sensitive credentials into the git repository, which is usually considered a bad practice. Instead, I recommend using environment variables via a .env file. I also recommend using the standard one-line syntax for the proxy:</p><figure class="kg-card kg-code-card"><pre><code>PROXY_URL=http://user:pw@proxy-host:port</code></pre><figcaption>put this into the .env file in the root of your project</figcaption></figure><p>Now, in the main file of your project, init dotenv (do not forget to install dotenv using the <code>npm i dotenv</code> command):</p><pre><code class="language-javascript">import dotenv from &apos;dotenv&apos;
dotenv.config()

function convertProxyToPlaywrightFormat(proxyUrl) {
    const url = new URL(proxyUrl);
    return {
        server: `${url.protocol}//${url.host}`,
        username: url.username,
        password: url.password
    };
}

const proxyOptions = convertProxyToPlaywrightFormat(process.env.PROXY_URL);</code></pre><p>This way, we avoid having three env variables just for the proxy (username, password, host) and replace them with a single one.</p><p>And here is the full code for using this proxy in Playwright:</p><figure class="kg-card kg-code-card"><pre><code class="language-javascript">import &apos;dotenv/config&apos;;
import { chromium } from &apos;playwright&apos;;

function convertProxyToPlaywrightFormat(proxyUrl) {
    const url = new URL(proxyUrl);
    return {
        server: `${url.protocol}//${url.host}`,
        username: url.username,
        password: url.password
    };
}

async function main() {
    const proxyUrl = process.env.PROXY_URL;

    if (!proxyUrl) {
        console.error(&apos;Proxy URL not found in .env file&apos;);
        process.exit(1);
    }

    const proxyOptions = convertProxyToPlaywrightFormat(proxyUrl);

    const browser = await chromium.launch({
        proxy: proxyOptions,
    });

    const page = await browser.newPage();
    await page.goto(&apos;http://example.com&apos;);

    await browser.close();
}

main();
</code></pre><figcaption>A simple way to use an environment variable to feed the proxy into Playwright</figcaption></figure><h2 id="playwright-for-web-scraping-rotating-proxies-and-retries">Playwright for web scraping: rotating proxies and retries</h2><p>If you are trying to use some kind of rotating proxies for real world scraping with Playwright, the code above will of course not work reliably: the main reason is that proxies, even good, high quality ones, are naturally another moving part which reduces the overall reliability of the connection to the target website, so it will inevitably fail a lot (this code will fail even without proxies, but with proxies it will fail much more often!). To mitigate this, having a retry strategy is crucial. Here is a very simple way to retry the Playwright request:</p><figure class="kg-card kg-code-card"><pre><code class="language-javascript">import &apos;dotenv/config&apos;;
import { chromium } from &apos;playwright&apos;;

function convertProxyToPlaywrightFormat(proxyUrl) {
    const url = new URL(proxyUrl);
    return {
        server: `${url.protocol}//${url.host}`,
        username: url.username,
        password: url.password
    };
}

async function tryNavigate(page, url, maxRetries = 3) {
    for (let attempt = 1; attempt &lt;= maxRetries; attempt++) {
        try {
            await page.goto(url);
            return; // If successful, return without throwing an error
        } catch (error) {
            console.error(`Attempt ${attempt} failed: ${error.message}`);
            if (attempt === maxRetries) {
                throw error; // Rethrow the last error if all retries fail
            }
        }
    }
}

async function main() {
    const proxyUrl = process.env.PROXY_URL;

    if (!proxyUrl) {
        console.error(&apos;Proxy URL not found in .env file&apos;);
        process.exit(1);
    }

    const proxyOptions = convertProxyToPlaywrightFormat(proxyUrl);
    const browser = await chromium.launch({
        proxy: proxyOptions,
    });

    try {
        const page = await browser.newPage();
        await tryNavigate(page, &apos;http://example.com&apos;);
    } catch (error) {
        console.error(`Failed to navigate: ${error.message}`);
    } finally {
        await browser.close();
    }
}

main();
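One optional refinement (my sketch, not part of the original example): waiting a little longer before each retry gives a flaky proxy or target site time to recover. The delay numbers here are arbitrary:

```javascript
// Simple exponential backoff helpers for the retry loop above.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const backoffMs = (attempt) => 500 * 2 ** (attempt - 1); // 500, 1000, 2000, ...

async function tryNavigateWithBackoff(page, url, maxRetries = 3) {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            await page.goto(url);
            return;
        } catch (error) {
            console.error(`Attempt ${attempt} failed: ${error.message}`);
            if (attempt === maxRetries) throw error; // out of retries
            await sleep(backoffMs(attempt));         // back off before retrying
        }
    }
}
```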
</code></pre><figcaption>The same code with a simple retry strategy</figcaption></figure><p>When loading web pages using Playwright with a proxy, it is also often a good idea to reduce the amount of loaded resources, <a href="https://pixeljets.com/blog/blocking-images-in-playwright/">for example, by blocking resource load by resource type.</a></p><h2 id="setting-different-proxies-for-one-playwright-instance">Setting different proxies for one Playwright instance</h2><p>Sometimes, you want to reduce the number of browser instances while using different proxies for different requests. This helps to reduce hardware and RAM usage (and Playwright is very resource intensive!). This is where Playwright contexts become useful: BrowserContexts provide a way to operate multiple independent browser sessions. It is important to understand that two browser contexts launched from one Playwright browser share nothing: for websites, these two contexts essentially look like two different browsers (<a href="https://playwright.dev/docs/api/class-browsercontext?ref=pixeljets.com">read more in Playwright official docs</a>). Let&apos;s say you have this kind of .env file:</p><pre><code>PROXY_URL=http://user:pw@proxy-host:port
PROXY2_URL=http://user2:pw@proxy-host2:port</code></pre><p>Here is how you can use these two different proxies in one Playwright instance:</p><figure class="kg-card kg-code-card"><pre><code>import &apos;dotenv/config&apos;;
import { chromium } from &apos;playwright&apos;;

function convertProxyToPlaywrightFormat(proxyUrl) {
    const url = new URL(proxyUrl);
    return {
        server: `${url.protocol}//${url.host}`,
        username: url.username,
        password: url.password
    };
}

async function main() {
    const proxyUrl = process.env.PROXY_URL;
    const proxy2Url = process.env.PROXY2_URL;

    if (!proxyUrl || !proxy2Url) {
        console.error(&apos;One or both proxy URLs not found in .env file&apos;);
        process.exit(1);
    }

    const proxyOptions = convertProxyToPlaywrightFormat(proxyUrl);
    const proxy2Options = convertProxyToPlaywrightFormat(proxy2Url);

    const browser = await chromium.launch();

    // Create two different contexts with different proxies
    const context1 = await browser.newContext({ proxy: proxyOptions });
    const context2 = await browser.newContext({ proxy: proxy2Options });

    const page1 = await context1.newPage();
    const page2 = await context2.newPage();

    // Do something with both pages. 
    // Cookies and sessions are not shared between page1 and page2
    await page1.goto(&apos;http://example.com&apos;);
    await page2.goto(&apos;http://example.com&apos;);

    // Close the browser contexts
    await context1.close();
    await context2.close();

    // Close the browser
    await browser.close();
}

main();
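If you have more than two proxies, hardcoding PROXY_URL/PROXY2_URL variables gets tedious. One possible convention (my assumption, not a Playwright feature) is a single comma-separated PROXY_URLS variable, parsed into one context config per proxy:

```javascript
// Same converter as in the article, repeated so this snippet is self-contained.
function convertProxyToPlaywrightFormat(proxyUrl) {
    const url = new URL(proxyUrl);
    return {
        server: `${url.protocol}//${url.host}`,
        username: url.username,
        password: url.password,
    };
}

// Split a comma-separated list of proxy URLs into Playwright proxy configs.
function parseProxyList(raw) {
    return raw
        .split(',')
        .map((s) => s.trim())
        .filter(Boolean)
        .map(convertProxyToPlaywrightFormat);
}

// Usage sketch: one browser context per proxy.
// const contexts = await Promise.all(
//     parseProxyList(process.env.PROXY_URLS).map((p) => browser.newContext({ proxy: p })));
```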
</code></pre><figcaption>Setting Playwright proxy on context level</figcaption></figure><p>Thank you for reading this writeup! If you enjoyed this article, you might also be interested in how <a href="https://pixeljets.com/blog/bypass-cloudflare/">I investigated Cloudflare anti-scraping protections and bypassed them</a> and <a href="https://pixeljets.com/blog/how-do-download-pdf-in-playwright/">how to download PDF in Playwright</a>.</p>]]></content:encoded></item><item><title><![CDATA[Modern web scraping with Playwright: choosing between Python and NodeJS]]></title><description><![CDATA[<p>When diving into the world of automated browser testing and scraping with Playwright, one of the first decisions you&apos;ll encounter is the choice of programming language. Playwright is not a one-language wonder; it caters to a polyglot audience. Let&apos;s see how Node.js and Python version</p>]]></description><link>https://pixeljets.com/blog/web-scraping-playwright-python-nodejs/</link><guid isPermaLink="false">658f329214aa72faea11a953</guid><category><![CDATA[playwright]]></category><category><![CDATA[webscraping]]></category><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Fri, 29 Dec 2023 21:31:21 GMT</pubDate><content:encoded><![CDATA[<p>When diving into the world of automated browser testing and scraping with Playwright, one of the first decisions you&apos;ll encounter is the choice of programming language. Playwright is not a one-language wonder; it caters to a polyglot audience. Let&apos;s see how the Node.js and Python versions of Playwright compare.</p><h3 id="a-bit-of-a-history">A bit of a history</h3><p>Playwright was created by one of the authors of Puppeteer.js, Andrey Lushnikov (who was part of the Chrome DevTools team back then). Playwright was built on the lessons of Puppeteer: it was cross-browser from scratch (while Puppeteer got experimental Firefox support only recently, in 2023), it had better syntax, and it had a lot of higher-level tooling, e.g. 
for test runners.</p><h2 id="nodejs-or-python">Node.js or Python?</h2><h4 id="nodejs-the-native-habitat">Node.js: the native habitat</h4><p>Playwright was born in the Node.js ecosystem, making it a natural habitat for this tool. If you&apos;re scaling up, Node.js shines particularly brightly. Why, you ask? It boils down to process management.</p><p>Node.js&apos;s version of Playwright doesn&apos;t spawn a new node process for every browser instance. This is crucial when you&apos;re managing multiple browser instances simultaneously. If you&apos;ve got a system running various scripts at unpredictable intervals, you definitely want to avoid the overhead of spinning up a new node process each time.</p><p>I stumbled upon <a href="https://github.com/microsoft/playwright-python/issues/1850?ref=pixeljets.com">a GitHub issue</a> that put this into perspective. A developer was wrestling with the Python Playwright implementation, which, under the hood, was spinning up a separate node process for each instance. The result? CPU and memory usage spikes.</p><p>Here is another GitHub issue where Playwright maintainers recommend Node.js for heavy lifting: <a href="https://github.com/microsoft/playwright-python/issues/1289?ref=pixeljets.com">[Question]: Performance benchmarking of python playwright versus node #1289</a> and <a href="https://github.com/microsoft/playwright/issues/27187?ref=pixeljets.com">another one.</a></p><h4 id="python-simplicity-meets-elegance">Python: simplicity meets elegance</h4><p>Python&apos;s implementation of Playwright, while elegant and simple for scripting, may not be the best companion for heavy lifting. Each call to <code>sync_playwright()</code> in Python fires up a new node process. 
Although Python is a delight for quick scripts and data analysis, this behavior might bog down your system when you&apos;re trying to keep it light on its feet.</p><h4 id="other-languages-a-world-of-choices">Other Languages: a world of choices</h4><p>It&apos;s worth noting that Playwright isn&apos;t just a two-player game. It extends its reach to other languages like Java and C#. If you&apos;re already working within these ecosystems, it makes sense to stick to your guns and use what you&apos;re comfortable with.</p><h4 id="my-recommendation-nodejs-for-scalability">My recommendation: Node.js for scalability</h4><p>For heavy-duty scraping and testing, especially at scale, Node.js is the recommended route. It keeps your CPU from sweating and your memory from overflowing. Moreover, Node.js being Playwright&apos;s native environment, you get first-class support and performance.</p><h2 id="code-example-blocking-image-download">Code example: blocking image download</h2><p>Let&apos;s get our hands dirty and take a look how basic code looks like in Node.js and Python versions of Playwright. Here&apos;s how you can block images when opening a page using both Node.js and Python with Playwright.</p><h3 id="playwright-nodejs-version">Playwright Node.js version &#xA0;</h3><pre><code class="language-javascript">const { chromium } = require(&apos;playwright&apos;); // or &apos;firefox&apos; or &apos;webkit&apos;

(async () =&gt; {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Block images
  await page.route(&apos;**/*.{png,jpg,jpeg}&apos;, route =&gt; route.abort());

  await page.goto(&apos;https://example.com&apos;);
  // ... you can perform actions on the page here

  await browser.close();
})();
</code></pre><p>In this Node.js example, we use Playwright&apos;s <code>route</code> method to intercept network requests and abort any that are for image resources.</p><h3 id="playwright-python-version">Playwright Python version </h3><p>For the Python version, you will need to have the Playwright Python package installed and the Playwright browser binaries downloaded. This can be done by running <code>pip install playwright</code> followed by <code>playwright install</code>.</p><pre><code class="language-python">import asyncio
from playwright.async_api import async_playwright

async def block_images_and_open_page():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        # Block images
        await page.route(&apos;**/*.{png,jpg,jpeg}&apos;, lambda route: route.abort())

        await page.goto(&apos;https://example.com&apos;)
        # ... you can perform actions on the page here

        await browser.close()

asyncio.run(block_images_and_open_page())
</code></pre><p>In this Python async example, we use the <code>async_playwright</code> context manager to handle the Playwright object lifecycle and the <code>route</code> method to intercept and abort image requests.</p><p>Both snippets demonstrate how to block image loading, which can be particularly useful when scraping websites where images are not needed, thus speeding up page load times and reducing bandwidth usage.</p><h2 id="playwright-for-web-scraping-fingerprinting-and-stealth-modes">Playwright for web scraping: fingerprinting and stealth modes</h2><p>If we talk about using Playwright for real-world web scraping, this is where things get interesting, especially in the context of comparing Node.js with Python.</p><p>Official Playwright packages simply do not aim to remove all traces of automation from the browser, as the primary goal of a programmatically controlled browser was always UI testing, not web scraping.</p><p>Most big websites implement basic or sophisticated anti-scraping measures which try to fingerprint your web browser and block it if it looks like an automated one (so stock versions of Playwright are blocked by most modern websites which have any kind of anti-scraping protection, even the most basic one). There are huge companies which are involved in detecting scrapers on multiple levels: </p><ul><li>on <a href="https://pixeljets.com/blog/bypass-cloudflare/">TLS fingerprint level</a>; </li><li>on <a href="https://lwthiker.com/networks/2022/06/17/http2-fingerprinting.html?ref=pixeljets.com">HTTP/2 level</a>; </li><li>and on <a href="https://fingerprint.com/blog/browser-fingerprinting-techniques/?ref=pixeljets.com">browser capabilities level</a>. 
</li></ul><p>For Node.js, there exists a whole ecosystem of stealth plugins, and you can use these stealth improvements to make your Playwright instance look more like a regular human browser: </p><h3 id="fingerprint-suite-from-apify">Fingerprint Suite from Apify</h3><p><a href="https://github.com/apify/fingerprint-suite?ref=pixeljets.com">https://github.com/apify/fingerprint-suite</a></p><h3 id="playwright-extra">Playwright Extra</h3><p>This is basically an interoperability layer with Puppeteer stealth packages.</p><p><a href="https://www.npmjs.com/package/playwright-extra?ref=pixeljets.com">https://www.npmjs.com/package/playwright-extra</a></p><p><strong>These two sets of tools are not a silver bullet and do not guarantee successful scraping process</strong>, for example Apify Suite and Playwright Extra perform much worse than <a href="https://scrapeninja.net/docs/?ref=pixeljets.com">ScrapeNinja /scrape-js</a> in terms of detection, but at least they are trying to help you in this regard.</p><p>Of course, Python developers want to have similar solutions for Playwright Python &#x2013; <a href="https://github.com/microsoft/playwright-python/issues/1744?ref=pixeljets.com">here is a Github issue about it, which was closed without any good resolution</a>.</p><p>If you still need a stealthy Python browser, your best bet is <a href="https://github.com/ultrafunkamsterdam/undetected-chromedriver?ref=pixeljets.com">Undetected Chromedriver</a> (it is based on heavily hacked Selenium)</p><h2 id="conclusion">Conclusion</h2><p>Choosing the right tool for the job is as important as the job itself. In the world of Playwright, Node.js offers the best scalability, stealth mode and performance, especially for complex, resource-intensive tasks like web scraping. Python may seduce with its syntax and simplicity, but when it comes to Playwright, the Node.js version holds the upper hand for scaling. 
Python and other language bindings have their place, but for heavy-duty browser automation, Node.js is the way to go. <strong>But... if you are comfortable with Python and you are new to the Node.js ecosystem, I would definitely recommend sticking to the tool you know until you hit any real performance issues.</strong></p><p>If you liked this post, you will probably enjoy reading my blog post where I used <a href="https://pixeljets.com/blog/scrape-apple-com-for-refurbished-iphones-and-get-alerts/">low code to web scrape Apple.com for refurbished iPhones and to get push alerts</a> and how I built <a href="https://pixeljets.com/blog/puppeteer-api-web-scraping/">specialized browser API for web scraping</a>. </p>]]></content:encoded></item><item><title><![CDATA[Blocking images in Playwright]]></title><description><![CDATA[Blocking unnecessary resources in Playwright is a pretty easy task, thanks to the built-in route() function.]]></description><link>https://pixeljets.com/blog/blocking-images-in-playwright/</link><guid isPermaLink="false">65873e2814aa72faea11a8d7</guid><category><![CDATA[playwright]]></category><category><![CDATA[webscraping]]></category><dc:creator><![CDATA[Anthony Sidashin]]></dc:creator><pubDate>Sat, 23 Dec 2023 20:51:57 GMT</pubDate><content:encoded><![CDATA[<p>In the summer of 2023, I had a client with a custom project where my task was to extract data from a notoriously dynamic single page application (SPA). The site was a huge bunch of JavaScript with robust anti-scraping measures. It felt like a digital game of cat and mouse, and I was the determined mouse. That&apos;s when Playwright, a framework for automated browser testing, came to the rescue. 
It wasn&apos;t just about rendering SPAs or managing dynamic content; Playwright excelled at bypassing JavaScript-based anti-scraping mechanisms.</p><p>I used the <a href="https://playwright.dev/docs/intro?ref=pixeljets.com">Node.js version of Playwright</a> (it is also available as a Python package, do not confuse these two!) and stealth npm packages to blend in seamlessly, making my scraping activities indistinguishable from regular browsing. The interesting part was that I used 4G proxies to avoid getting captchas, but they made page loads very slow &#x2013; so I did my best to increase the speed of page loads by blocking unnecessary media and images, and I am going to show you some code for how this can be done.</p><h2 id="playwright-a-quick-overview">Playwright: A Quick Overview</h2><p>Playwright, for those who might not be familiar, is a framework for automated browser testing. But its utility extends far beyond that, especially in web scraping. I&apos;ve personally found it incredibly effective in handling SPAs and navigating through complex scenarios which require button clicking and other web page interactions &#x2013; it simulates a user&apos;s interaction with a web page, making it a powerful tool for scraping dynamic content (my best one is endless scrolling!).</p><h2 id="the-role-of-route-in-playwright">The Role of route() in Playwright</h2><p>To block images on page download, you need to learn about one essential, super-useful Playwright feature: the <code>route()</code> function. This function allows us to intercept and modify network requests. In my journey with web scraping, I&apos;ve used <code>route()</code> in various scenarios &#x2013; from modifying request headers to injecting scripts. It offers an unprecedented level of control over how the browser interacts with a website, giving us the power to tailor our scraping strategy to specific needs. 
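To make that concrete, here is a tiny sketch of a route() pattern that aborts common image requests (the Playwright page/browser setup around it is omitted, and the regex is just an illustration):

```javascript
// Matches typical image URLs, with or without a query string.
const imagePattern = /\.(png|jpe?g|gif|webp|svg)(\?.*)?$/;

// Inside real scraping code it would look roughly like:
// await page.route(imagePattern, (route) => route.abort());

console.log(imagePattern.test('https://example.com/hero.jpeg')); // prints: true
```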
Compared to Puppeteer, another great Node.js package for programmatic browser control, Playwright&apos;s <code>route()</code> function is much more concise and convenient. For example, it allows you to use JS regexes out of the box! Please refer to the <a href="https://playwright.dev/docs/api/class-page?ref=pixeljets.com#page-route">route docs</a>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://pixeljets.com/blog/content/images/2023/12/2023-12-24-at-00.45.png" class="kg-image" alt loading="lazy" width="1970" height="1418" srcset="https://pixeljets.com/blog/content/images/size/w600/2023/12/2023-12-24-at-00.45.png 600w, https://pixeljets.com/blog/content/images/size/w1000/2023/12/2023-12-24-at-00.45.png 1000w, https://pixeljets.com/blog/content/images/size/w1600/2023/12/2023-12-24-at-00.45.png 1600w, https://pixeljets.com/blog/content/images/2023/12/2023-12-24-at-00.45.png 1970w" sizes="(min-width: 720px) 720px"><figcaption>Examples of how route() can be used in Playwright</figcaption></figure><h2 id="the-challenge-proxies-and-resource-management">The Challenge: Proxies and Resource Management</h2><p>As I&apos;ve said, I used 4G proxies for my project. They&apos;re essential for maintaining anonymity and bypassing IP-based restrictions. However, proxies can be slow and, if you&apos;re not careful, expensive in terms of traffic. This is where resource management becomes crucial.</p><h2 id="blocking-unnecessary-resources-in-playwright">Blocking unnecessary resources in Playwright</h2><p>In many scraping tasks, we don&apos;t need to load images, fonts, or even CSS. These resources consume bandwidth and slow down our scraping process. By blocking these resources, we can significantly speed up our scraping jobs and reduce costs.</p><h3 id="nodejs-playwright-code-to-block-images-css-and-fonts">Node.js Playwright code to block images, CSS, and fonts</h3><p>Here&apos;s a practical example of how to implement resource blocking in Playwright. 
This Node.js code uses top-level await (so please use Node.js 16+), ES6 imports (so don&apos;t forget to set <code>&quot;type&quot;: &quot;module&quot;</code> in your project&apos;s package.json), and the <code>route()</code> function to block images, fonts, and CSS.</p><pre><code class="language-javascript">import { chromium } from &apos;playwright&apos;;

const browser = await chromium.launch();
const page = await browser.newPage();

// Use route() to intercept and block certain types of resources
await page.route(&apos;**/*&apos;, (route) =&gt; {
    const resourceType = route.request().resourceType();
    if ([&apos;image&apos;, &apos;stylesheet&apos;, &apos;font&apos;].includes(resourceType)) {
        route.abort();
    } else {
        route.continue();
    }
});
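```javascript
// Note (illustrative addition, not part of the original snippet): route()
// also accepts a RegExp matched against the full request URL, so an
// extension-based pattern is an alternative to checking resourceType().
// Beware that a query string after the extension (e.g. logo.png?v=2)
// would evade the $ anchor of this particular pattern.
const imageUrlPattern = /\.(png|jpe?g|gif|webp|svg)$/;
// await page.route(imageUrlPattern, (route) => route.abort());
```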

// Navigate to the website
await page.goto(&apos;https://example.com&apos;);

// Perform your scraping tasks here

await browser.close();

</code></pre><p>In this code, we launch a Chromium browser, create a new page, and set up a route handler. The handler checks the resource type of each network request and aborts it if it&apos;s an image, stylesheet, or font. This significantly reduces the amount of data downloaded and processed, leading to faster and more cost-effective scraping.</p><h3 id="extending-blocking-get-rid-of-unneccesary-tracking-and-js">Extending blocking: get rid of unnecessary tracking and JS</h3><p>Let&apos;s take it up a notch and get rid of various tracking scripts, which usually slow down page load times significantly.</p><pre><code class="language-javascript">import { chromium } from &apos;playwright&apos;;

// Wrapped in an async IIFE (an alternative to top-level await)
(async () =&gt; {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Define a list of common ad network domains, checked as URL substrings
  // (note: passing glob-style strings like **/* to String.match() would
  // throw, because they are not valid regular expressions)
  const adNetworkPatterns = [
    &apos;doubleclick.net&apos;,
    &apos;googleadservices.com&apos;,
    &apos;googlesyndication.com&apos;,
    // Add more domains as needed
  ];

  // Use route() to intercept and block certain types of resources and ad network requests
  await page.route(&apos;**/*&apos;, (route) =&gt; {
    const requestUrl = route.request().url();
    const resourceType = route.request().resourceType();

    // Check if the request goes to a known ad network
    const isAdRequest = adNetworkPatterns.some(pattern =&gt; requestUrl.includes(pattern));

    if ([&apos;image&apos;, &apos;stylesheet&apos;, &apos;font&apos;].includes(resourceType) || isAdRequest) {
      route.abort();
    } else {
      route.continue();
    }
  });

  // Navigate to the website
  await page.goto(&apos;https://example.com&apos;);

  // Perform your scraping tasks here

  await browser.close();
})();
</code></pre><h2 id="conclusion">Conclusion</h2><p>Playwright&apos;s flexibility, particularly with functions like <code>route()</code>, makes it an incredible tool for web scraping and automation. By understanding and leveraging these capabilities, we can efficiently navigate through the complexities of modern web applications. Remember, in the world of web scraping, it&apos;s not just about getting the data; it&apos;s about getting it efficiently and responsibly. Keep exploring, and happy scraping!</p>]]></content:encoded></item></channel></rss>