<?xml version='1.0' encoding='utf-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://urara-demo.netlify.app/</id>
  <title><![CDATA[Daniel Liu]]></title>
  <icon>https://urara-demo.netlify.app/favicon.png</icon>
  <link href="https://urara-demo.netlify.app" />
  <link href="https://urara-demo.netlify.app/atom.xml" rel="self" type="application/atom+xml" />
  <updated>2026-01-30T00:12:52.474Z</updated>
  <author>
    <name><![CDATA[Daniel Liu]]></name>
  </author>
  <category term="Python" scheme="https://urara-demo.netlify.app/?tags=Python" />
  <category term="Machine Learning" scheme="https://urara-demo.netlify.app/?tags=Machine%20Learning" />
  <category term="SQL" scheme="https://urara-demo.netlify.app/?tags=SQL" />
  <category term="Work Experience" scheme="https://urara-demo.netlify.app/?tags=Work%20Experience" />
  <category term="Year Review" scheme="https://urara-demo.netlify.app/?tags=Year%20Review" />
  <category term="Monash" scheme="https://urara-demo.netlify.app/?tags=Monash" />
  <entry>
    <title type="html"><![CDATA[Building My Personal Website: The Journey]]></title>
    <link href="https://urara-demo.netlify.app/website-development" />
    <id>https://urara-demo.netlify.app/website-development</id>
    <published>2026-01-30T00:11:47.764Z</published>
    <updated>2026-01-30T00:11:47.764Z</updated>
    <content type="html">
      <![CDATA[<p data-svelte-h="svelte-1tq3su">Title: “Building My Personal Website: The Journey”. Created: 2026-01-15.</p> <h2 id="tags-web-development-portfolio" data-svelte-h="svelte-3k6rbk"><a href="#tags-web-development-portfolio">Tags: Web Development, Portfolio</a></h2> <p data-svelte-h="svelte-t950xa">Over the past few days, I’ve embarked on a comprehensive redesign and improvement of my personal website. What started as a simple portfolio site has evolved into a fully featured platform showcasing my projects, blog posts, and professional journey. This post documents the design and technical decisions, and the lessons learned along the way.</p> <hr> <h2 id="the-starting-point-inspiration-and-foundation" data-svelte-h="svelte-1wp69id"><a href="#the-starting-point-inspiration-and-foundation">The Starting Point: Inspiration and Foundation</a></h2> <h3 id="drawing-inspiration-from-sais-website" data-svelte-h="svelte-14cxwz6"><a href="#drawing-inspiration-from-sais-website">Drawing Inspiration from Sai’s Website</a></h3> <p data-svelte-h="svelte-7ig18h">I initially took inspiration from <a href="https://www.saikumarmk.com/" rel="nofollow noopener noreferrer external" target="_blank">Sai’s website</a>, which is also built using the <a href="https://github.com/importantimport/urara" rel="nofollow noopener noreferrer external" target="_blank">Urara</a> template. 
However, while Sai’s site provided a solid foundation and aesthetic direction, I made significant changes to tailor the website to my own preferences and needs, which I discuss in detail later.</p> <h3 id="why-urara-and-sveltekit" data-svelte-h="svelte-1842duc"><a href="#why-urara-and-sveltekit">Why Urara and SvelteKit?</a></h3> <p data-svelte-h="svelte-1cya75r">I chose Urara as my foundation because:</p> <ol data-svelte-h="svelte-zrzvj0"><li><strong>Static site</strong>: Suitable for a simple blog/portfolio site</li> <li><strong>Markdown support</strong>: Useful for easy and clean blog posts</li> <li><strong>Tailwind CSS + DaisyUI</strong>: Rapid UI development with beautiful components</li></ol> <hr> <h2 id="design-decisions-and-implementation" data-svelte-h="svelte-1djp3p4"><a href="#design-decisions-and-implementation">Design Decisions and Implementation</a></h2> <h3 id="before-2023--after-2026" data-svelte-h="svelte-7mcucs"><a href="#before-2023--after-2026">Before (2023) &amp; After (2026)</a></h3> <p data-svelte-h="svelte-1ck5j79">Returning to my site in 2026, I felt it looked outdated and unappealing. I wanted to fix this, but lacked the design and front-end skills needed to deviate significantly from the boilerplate template. How did I overcome this? Simple: I began taking advantage of the AI capabilities of Cursor.</p> <p data-svelte-h="svelte-ol35o8">After countless hours of tinkering with new designs and features, I’m proud to showcase the significant strides my personal website has made since its inception in 2023.</p> <p><img src="./beforeafterwebsite.jpg" alt="before_after_website" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <h3 id="homepage-layout-redesign" data-svelte-h="svelte-1uzlol3"><a href="#homepage-layout-redesign">Homepage Layout Redesign</a></h3> <p data-svelte-h="svelte-1o3dk1i">Hopefully, everyone agrees that the UI is significantly improved. 
To achieve the desired look, the homepage went through several iterations before settling on the final structure:</p> <p data-svelte-h="svelte-frayw5"><strong>Final Layout:</strong></p> <ol data-svelte-h="svelte-121vldt"><li><strong>Intro Section</strong>: Profile picture, personalized greeting (“Hi, I’m Daniel”), professional subtitle, and bio</li> <li><strong>Highlights</strong>: Quick-access cards for key professional pages (About Me, Projects, Resume)</li> <li><strong>Apps Section</strong>: Dedicated space for my created applications</li> <li><strong>Latest Posts</strong>: Grid of the four most recent blog posts</li></ol> <p data-svelte-h="svelte-1pf47d2"><strong>Key Design Choices:</strong></p> <ul data-svelte-h="svelte-efigq8"><li><strong>Spacing and Dividers</strong>: Added subtle horizontal dividers between sections to create visual separation without being too heavy</li> <li><strong>Responsive Design</strong>: Ensured all sections work seamlessly on mobile and desktop</li> <li><strong>Social Links Placement</strong>: Positioned “Say Hello” section with LinkedIn and GitHub icons</li></ul> <h3 id="about-me-page-from-blog-post-to-interactive-timeline" data-svelte-h="svelte-s8uvzq"><a href="#about-me-page-from-blog-post-to-interactive-timeline">About Me Page: From Blog Post to Interactive Timeline</a></h3> <p data-svelte-h="svelte-1kyqf0s">The About Me page transformation was one of the most significant changes:</p> <p data-svelte-h="svelte-4e4zt5"><strong>Before</strong>: A simple markdown blog post<br> <strong>After</strong>: An interactive, tabbed timeline with three sections:</p> <ul data-svelte-h="svelte-1h6agm0"><li>Experience (default)</li> <li>Education</li> <li>Extracurriculars</li></ul> <p data-svelte-h="svelte-1h6o1ur"><strong>Implementation Details:</strong></p> <ul data-svelte-h="svelte-15d62m8"><li>Each timeline entry is a card with:<ul><li>Category badge (EXPERIENCE/EDUCATION/EXTRAS) in the top-right corner</li> <li>Date range with primary color 
accent</li> <li>Bold title for hierarchy</li> <li>Description with bullets (<code>»</code>) instead of standard list items</li> <li>Adjusted text opacity (80% for descriptions) for better contrast</li></ul></li></ul> <p data-svelte-h="svelte-v9kram"><strong>Why This Design?</strong></p> <ul data-svelte-h="svelte-bhht01"><li><strong>Better UX</strong>: Users can quickly filter by category</li> <li><strong>Visual Hierarchy</strong>: Clear distinction between different types of experiences</li> <li><strong>Professional Appearance</strong>: Timeline format is familiar and easy to scan</li> <li><strong>Mobile-Friendly</strong>: Cards stack vertically on smaller screens</li></ul> <h3 id="projects-page-modern-grid-with-dual-actions" data-svelte-h="svelte-bb1mjd"><a href="#projects-page-modern-grid-with-dual-actions">Projects Page: Modern Grid with Dual Actions</a></h3> <p data-svelte-h="svelte-18v24gn">The projects page received a complete visual overhaul:</p> <p data-svelte-h="svelte-mgnddi"><strong>Before</strong>: A simple list of my projects with basic links<br> <strong>After</strong>: A grid layout with improved project cards featuring:</p> <ul data-svelte-h="svelte-1jwtrkh"><li>Responsive 3-column grid (1 column on mobile,  3 on desktop)</li> <li>Project cards with images, titles, descriptions, and technology tags</li> <li>Action buttons: GitHub and “Try it out”</li> <li>Hover effects</li></ul> <p data-svelte-h="svelte-1h6o1ur"><strong>Implementation Details:</strong></p> <ul data-svelte-h="svelte-b6qi83"><li>Each project card includes:<ul><li>New project logo in a container</li> <li>Title and description with proper typography hierarchy</li> <li>Technology tag</li> <li>Two action buttons at the bottom: GitHub and demo link</li> <li>Hover effects: image zoom and title colour change to primary</li></ul></li></ul> <p data-svelte-h="svelte-v9kram"><strong>Why This Design?</strong></p> <ul data-svelte-h="svelte-1vfwxxe"><li><strong>Dual Actions</strong>: Users can 
either view the code (GitHub) or try the live demo, providing flexibility</li> <li><strong>Visual Consistency</strong>: Cards match the overall site aesthetic with rounded corners and shadows</li> <li><strong>Clear Hierarchy</strong>: Tags and buttons are clearly separated with adequate spacing for better readability</li></ul> <hr> <h2 id="feedback-form-development" data-svelte-h="svelte-1pzlotu"><a href="#feedback-form-development">Feedback Form Development</a></h2> <h3 id="choosing-the-right-solution" data-svelte-h="svelte-5wope0"><a href="#choosing-the-right-solution">Choosing the Right Solution</a></h3> <p data-svelte-h="svelte-136vfjp">After receiving feedback from my roommate about the website’s usability, I realised there was still work to be done to make the site more intuitive. I needed a structured way to collect responses from friends and visitors to identify pain points and improvement opportunities. This led me to explore feedback form solutions that could be integrated into my website.</p> <p data-svelte-h="svelte-iv95z">I explored several feedback form services, including Survicate, Usersnap, and Zoho Forms, but all required payment after trial periods. I ultimately settled on <a href="https://tally.so" rel="nofollow noopener noreferrer external" target="_blank">Tally.so</a>. It’s free, offers easy integration with simple embed code, flexible form design, and no backend requirements as all submissions are handled by Tally.</p> <h3 id="smart-timing-and-implementation" data-svelte-h="svelte-1im99o"><a href="#smart-timing-and-implementation">Smart Timing and Implementation</a></h3> <p data-svelte-h="svelte-supda6">One of the most important decisions was <strong>when</strong> to show the feedback prompt. The goal was to capture feedback from users who had actually explored the site, formed opinions, and genuinely cared about providing feedback. The challenge was balancing timing. 
Show it too early and you annoy visitors and collect low-quality feedback; show it too late and users have already left.</p> <p data-svelte-h="svelte-qi8yne">I implemented an engagement-based system where the prompt only appears after all of these criteria are met:</p> <ol data-svelte-h="svelte-mbivgn"><li><strong>Scroll Depth</strong>: User has scrolled at least 50% down on at least one page (tracked across page navigations)</li> <li><strong>Time on Site</strong>: User has been on the current page for at least 20 seconds</li> <li><strong>Page Views</strong>: User has visited at least 2 pages (homepage + another page)</li></ol> <p data-svelte-h="svelte-cuqjsq">Engagement metrics are tracked via <code>localStorage</code> to persist across page navigations. The prompt appears as a small card in the bottom-right corner that users can dismiss with an X button (it won’t show again after dismissal). Once the criteria are met, it fades in smoothly, and clicking “Share Feedback” opens the Tally pop-up form.</p> <p><img src="./feedbackredirect.jpg" alt="feedbackredirect" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <hr> <h2 id="technical-implementation-api-integration-and-deployment" data-svelte-h="svelte-1ilfm95"><a href="#technical-implementation-api-integration-and-deployment">Technical Implementation: API Integration and Deployment</a></h2> <h3 id="the-nba-position-predictor-integration" data-svelte-h="svelte-1bvglyj"><a href="#the-nba-position-predictor-integration">The NBA Position Predictor Integration</a></h3> <p data-svelte-h="svelte-1jwhrul"><strong>Motivation:</strong> One of the most interesting technical challenges was integrating my <a href="https://www.danielliu.xyz/projects/positionn/" rel="nofollow noopener noreferrer external" target="_blank">NBA Position Predictor application</a> into the website. 
I had previously created a <a href="https://positionn.streamlit.app/" rel="nofollow noopener noreferrer external" target="_blank">Streamlit app</a>, but its cold start (launching the app after prolonged downtime) took ~30 seconds, which was unacceptable.</p> <p data-svelte-h="svelte-1g8qela">I decided to reuse my ML models by building a FastAPI backend and a new frontend that integrates into my personal site. In initial tests, the NBA Position Predictor now responds in less than 1 second.</p> <p data-svelte-h="svelte-175pbuc"><strong>Initial Approach</strong>: Embedding the Streamlit application via <code>&lt;iframe&gt;</code></p> <ul data-svelte-h="svelte-8om5m9"><li><strong>Rejected</strong>: Same loading issues as before and lack of design control</li></ul> <p data-svelte-h="svelte-cqk2km"><strong>Final Approach</strong>: Native SvelteKit UI with separate FastAPI backend</p> <ul data-svelte-h="svelte-1u7ou6k"><li><strong>Benefits</strong>:<ul><li>Full control over UI/UX</li> <li>Better performance (~30 seconds -&gt; ~0.5 seconds)</li> <li>Seamless integration with site design</li></ul></li></ul> <p data-svelte-h="svelte-6ru5g9"><strong>Implementation:</strong></p> <ul data-svelte-h="svelte-8xdo1"><li>Connected to an external FastAPI backend deployed at <code>https://positionn-api.fly.dev</code>. The code can be found in my <a href="https://github.com/danielliu2707/positionn" rel="nofollow noopener noreferrer external" target="_blank">positionn repo</a>. 
FastAPI was chosen for its straightforward deployment process across most API hosting providers.</li> <li>Created <code>/projects/positionn</code> route with a custom SvelteKit page</li> <li>Two input modes: Player Statistics and Player Dimensions</li> <li>Client-side validation with range checking (e.g., 0-80 for points, 100-250cm for height)</li> <li>Info tooltips explaining valid ranges for each field</li></ul> <h3 id="api-provider-journey-from-render-to-flyio" data-svelte-h="svelte-krg5ib"><a href="#api-provider-journey-from-render-to-flyio">API Provider Journey: From Render to Fly.io</a></h3> <p data-svelte-h="svelte-1ko82mp">The backend deployment process then involved evaluating multiple hosting providers:</p> <p data-svelte-h="svelte-edjiln"><strong>Render (Initial Choice)</strong></p> <ul data-svelte-h="svelte-1019mjs"><li><strong>Pros</strong>:<ul><li>Free</li> <li>Simple deployment: auto-deploys with GitHub commits</li></ul></li> <li><strong>Cons</strong>:<ul><li>Very slow cold starts (~50 seconds)</li> <li>Costs ~$19/month for the API to run continually (i.e. 
no cold starts)</li></ul></li></ul> <p data-svelte-h="svelte-zd59mo"><strong>Fly.io (Final Choice)</strong></p> <ul data-svelte-h="svelte-1ycg714"><li><strong>Pros</strong>:<ul><li><strong>Cost-Effective</strong>: ~$3/month for a 512MB RAM, 1 shared CPU instance</li> <li><strong>Fast Cold Starts</strong>: ~0.8 seconds</li> <li><strong>Auto-Stop Configuration</strong>: Machines suspend when idle, saving money</li> <li><strong>Scalability</strong>: Easy to adjust resources (e.g., 256MB to 512MB RAM)</li> <li><strong>Simple Deployment</strong>: Command-line based, straightforward workflow</li></ul></li> <li><strong>Cons</strong>:<ul><li><strong>No auto-deploy</strong>: No auto-deployment with GitHub commits (at least I couldn’t find it)</li></ul></li></ul> <p data-svelte-h="svelte-kmflpb"><strong>Configuration Details:</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default" toml="true"><div class="language-id">toml</div><div class='code-container'><code><div class='line'>auto_stop_machines = "suspend"  # Saves money when idle. 
Not running 24/7.</div><div class='line'>min_machines_running = 0   # Can scale to zero</div><div class='line'>cpu_kind = "shared"        # Cost-optimised</div><div class='line'>cpus = 1                   # Adequate for ML model</div><div class='line'>memory_mb = 512            # Adequate for ML model</div></code></div></pre><!-- HTML_TAG_END --> <p data-svelte-h="svelte-5b0fg2"><strong>Trade-offs:</strong></p> <ul data-svelte-h="svelte-b7zkjh"><li>✅ Low cost (~$0.50-2/month with suspension of machines)</li> <li>✅ Acceptable cold start time (~0.8 seconds)</li> <li>❌ No automatic GitHub deployment (manual <code>fly deploy</code> required)</li> <li>❌ Slightly more setup than fully managed platforms</li></ul> <p data-svelte-h="svelte-1qy6egp"><strong>Deployment Process:</strong></p> <ol data-svelte-h="svelte-4nchdb"><li>Login: <code>fly auth login</code></li> <li>Launch: <code>fly launch</code> (uses existing <code>fly.toml</code>) - only for creating a new app for the first time</li> <li>Deploy: Automatic or manual <code>fly deploy</code> - for deploying updates to an existing app</li> <li>Monitor: <code>fly logs</code> and <code>fly status</code></li></ol> <p data-svelte-h="svelte-1qd3268">The manual deployment process, while not as seamless as auto-deploying from GitHub, is straightforward enough that redeploying model improvements only takes a few terminal commands.</p> <hr> <h2 id="development-challenges-and-solutions" data-svelte-h="svelte-10iced8"><a href="#development-challenges-and-solutions">Development Challenges and Solutions</a></h2> <h3 id="theme-management-preventing-flash" data-svelte-h="svelte-1lwuh96"><a href="#theme-management-preventing-flash">Theme Management: Preventing Flash</a></h3> <p data-svelte-h="svelte-1rjjodf"><strong>Problem</strong>: Theme would flash from dark to light on page load<br> <strong>Root Cause</strong>: Theme was being set in JavaScript after the page had loaded, causing a flash</p> <p data-svelte-h="svelte-5o9stu"><strong>Solution</strong>:</p> <ol 
data-svelte-h="svelte-1pqx1kj"><li>Added a blocking script in <code>app.html</code> that reads <code>localStorage</code> and sets <code>data-theme</code> synchronously in the <code>&lt;head&gt;</code></li> <li>Initialised theme state in the header component by reading the <code>data-theme</code> attribute</li></ol> <h3 id="multiple-page-component-conflicts" data-svelte-h="svelte-1l2rill"><a href="#multiple-page-component-conflicts">Multiple Page Component Conflicts</a></h3> <p data-svelte-h="svelte-ni0dvy"><strong>Problem</strong>: <code>Multiple page component files found in src/routes/about_me : +page.svelte and +page.svelte.md</code><br> <strong>Root Cause</strong>: Urara’s <code>pnpm build</code> was copying markdown files from <code>urara/</code> to <code>src/routes/</code>, conflicting with manually created Svelte pages<br> <strong>Solution</strong>: Removed the source markdown file from <code>urara/about_me/</code> to prevent conflicts</p> <h3 id="image-path-management" data-svelte-h="svelte-hlm6p7"><a href="#image-path-management">Image Path Management</a></h3> <p data-svelte-h="svelte-11n27l0"><strong>Challenge</strong>: Inconsistent use of relative vs. absolute paths for images in markdown files<br> <strong>Solution</strong>: Used relative paths within each blog post, ensuring images are copied correctly during the build process</p> <hr> <h2 id="the-software--tools-that-made-it-all-possible" data-svelte-h="svelte-1maasd2"><a href="#the-software--tools-that-made-it-all-possible">The Software &amp; Tools That Made It All Possible</a></h2> <p data-svelte-h="svelte-1ivybn">Throughout this rebuild, I ended up touching far more tools and services than I initially expected, each playing a role somewhere between development, deployment, hosting, and user experience.</p> <p data-svelte-h="svelte-jqy9qf">Here’s a concise summary of the stack involved in bringing my website to life. 
The most important tools are in bold:</p> <h3 id="frontend-framework--theme" data-svelte-h="svelte-11p8klt"><a href="#frontend-framework--theme">Frontend Framework &amp; Theme</a></h3> <ul data-svelte-h="svelte-vmvxt5"><li><strong>SvelteKit + Urara Template</strong> — Core framework + blog/portfolio scaffolding</li> <li>Tailwind CSS + DaisyUI — Rapid UI development + pre-built components</li></ul> <h3 id="hosting-domains--deployment" data-svelte-h="svelte-1r2sc22"><a href="#hosting-domains--deployment">Hosting, Domains &amp; Deployment</a></h3> <ul data-svelte-h="svelte-1ds4ady"><li><strong>Vercel</strong> — Hosts the main static site (fast, free)</li> <li>Namecheap — Purchased the custom domain danielliu.xyz</li> <li><strong>Fly.io</strong> — Deploys the FastAPI backend powering the NBA Position Predictor</li> <li>(Considered: Render — rejected due to slow cold starts and higher sustained cost)</li></ul> <h3 id="backendapi--ml" data-svelte-h="svelte-1ic0d41"><a href="#backendapi--ml">Backend/API / ML</a></h3> <ul data-svelte-h="svelte-79dx3h"><li>Python + scikit-learn — Model training for the Position Predictor</li> <li><strong>FastAPI</strong> — Lightweight backend for ML inference</li> <li>Streamlit (original version) — Earlier iteration before moving to API architecture</li></ul> <h3 id="feedback--user-input" data-svelte-h="svelte-wpv3ek"><a href="#feedback--user-input">Feedback &amp; User Input</a></h3> <ul data-svelte-h="svelte-1t6mjrm"><li><strong>Tally.so</strong> — Collects structured website feedback through an embedded form with pop-up UI</li> <li>(Considered: Survicate, Usersnap, Zoho — all either require payment after a trial or require a business email to sign up)</li></ul> <h3 id="version-control--collaboration" data-svelte-h="svelte-1agqk36"><a href="#version-control--collaboration">Version Control &amp; Collaboration</a></h3> <ul data-svelte-h="svelte-1nr960g"><li>Git + GitHub — Source control and repo hosting</li></ul> <h3 id="design--content-creation" 
data-svelte-h="svelte-uqu8si"><a href="#design--content-creation">Design &amp; Content Creation</a></h3> <ul data-svelte-h="svelte-1pmnup2"><li>Cursor — AI-assisted development</li> <li>Canva — Created visuals for before/after comparisons and project previews</li></ul> <hr> <h2 id="key-learnings-and-best-practices" data-svelte-h="svelte-12vpv1j"><a href="#key-learnings-and-best-practices">Key Learnings and Best Practices</a></h2> <h3 id="1-start-with-a-template-customize-extensively" data-svelte-h="svelte-1kbsd21"><a href="#1-start-with-a-template-customize-extensively">1. Start with a Template, Customize Extensively</a></h3> <p data-svelte-h="svelte-16q81mj">Using Urara as a foundation saved significant time, but the magic came from personalisation. Don’t be afraid to deviate from the template to match your vision.</p> <h3 id="2-prioritise-user-experience" data-svelte-h="svelte-owdifb"><a href="#2-prioritise-user-experience">2. Prioritise User Experience</a></h3> <p data-svelte-h="svelte-12cj407">Every design decision should consider:</p> <ul data-svelte-h="svelte-1muv7s6"><li><strong>Mobile responsiveness</strong>: Test on multiple screen sizes</li> <li><strong>Loading performance</strong>: Reduce load times and weigh the cost trade-offs</li> <li><strong>Visual hierarchy</strong>: Clear distinction between content types</li></ul> <h3 id="3-api-provider-selection-criteria" data-svelte-h="svelte-1i43dit"><a href="#3-api-provider-selection-criteria">3. 
API Provider Selection Criteria</a></h3> <p data-svelte-h="svelte-pngty6">When choosing a hosting provider for APIs:</p> <ul data-svelte-h="svelte-1vom4ei"><li><strong>Cold start time</strong>: Critical for user experience</li> <li><strong>Cost</strong>: Balance between features and budget</li> <li><strong>Ease of deployment</strong>: Consider your workflow preferences</li> <li><strong>Scalability</strong>: Plan for future growth</li></ul> <h3 id="4-version-control-and-deployment" data-svelte-h="svelte-17zue3b"><a href="#4-version-control-and-deployment">4. Version Control and Deployment</a></h3> <ul data-svelte-h="svelte-1211fy2"><li><strong>Git</strong>: Use feature branches for major changes</li> <li><strong>Commit messages</strong>: Be descriptive about what changed and why. This comes in handy when developing over many years</li></ul> <hr> <h2 id="future-improvements" data-svelte-h="svelte-1idni06"><a href="#future-improvements">Future Improvements</a></h2> <p data-svelte-h="svelte-1hk8qs9">While the website is now fully functional, there are always opportunities for enhancement:</p> <ol data-svelte-h="svelte-18fmv8u"><li><p><strong>Performance Optimization</strong>:</p> <ul><li>Image optimization (compress and use responsive images for different screen sizes) and lazy loading (load images when they enter the viewport to reduce initial page load time)</li></ul></li> <li><p><strong>API Enhancements</strong>:</p> <ul><li>Add more applications</li> <li>Implement caching strategies to reduce API response time, so that repeated requests for identical inputs return cached results</li></ul></li></ol> <hr> <h2 id="conclusion" data-svelte-h="svelte-kmpttn"><a href="#conclusion">Conclusion</a></h2> <p data-svelte-h="svelte-82qi8z">Building this personal website has been an incredibly rewarding experience, from initial inspiration to final deployment. 
The journey through different API providers, design iterations, and technical challenges has provided valuable insights that I’ll carry forward into future projects.</p> <p data-svelte-h="svelte-1mxiss5">The website now serves as both a portfolio and a summary of my professional journey. It’s a project that I intend to evolve as I advance in my career.</p> <p data-svelte-h="svelte-aimo47">If you’re building your own personal website, I hope this post provides some useful insights and inspiration.</p>]]>
    </content>
  </entry>
  <entry>
    <title type="html"><![CDATA[Building Positionn: ML for NBA Position Prediction]]></title>
    <link href="https://urara-demo.netlify.app/positionn-development" />
    <id>https://urara-demo.netlify.app/positionn-development</id>
    <published>2026-01-25T00:00:00.000Z</published>
    <updated>2026-01-30T00:11:47.760Z</updated>
    <content type="html">
      <![CDATA[<p data-svelte-h="svelte-pusb9g">For as long as I can remember, I have been in love with basketball.</p> <p data-svelte-h="svelte-skomlm">While I could never become an NBA player myself, we data fanatics are fortunate that an abundance of rich basketball data exists. This opens up a world of interesting projects, including <a href="https://www.danielliu.xyz/projects/positionn/" rel="nofollow noopener noreferrer external" target="_blank">positionn</a>, an app that predicts your ideal NBA position and the players you most closely resemble.</p> <p data-svelte-h="svelte-nolld4">This article provides a comprehensive overview of the process, learnings, and improvements from months of model development using machine learning techniques.</p> <hr> <h2 id="step-1-data-retrieval" data-svelte-h="svelte-1ddjypr"><a href="#step-1-data-retrieval">Step 1: Data Retrieval</a></h2> <p data-svelte-h="svelte-fegqwa">The <a href="https://github.com/swar/nba_api" rel="nofollow noopener noreferrer external" target="_blank">NBA API</a> is a free-to-use Python package for accessing the NBA.com APIs. 
Without going into too much detail, as it isn’t the focus of this article, I wrote a <a href="https://github.com/danielliu2707/positionn/blob/main/01_scrape_data.py" rel="nofollow noopener noreferrer external" target="_blank">script</a> that fetches and preprocesses player data for the 2015-2026 NBA seasons.</p> <p data-svelte-h="svelte-l2vwio">The data includes:</p> <ul data-svelte-h="svelte-eod7h"><li><strong>Basic player statistics:</strong> points, rebounds, assists, steals, blocks, etc., to build the model on.</li> <li><strong>Player headshots:</strong> to display NBA players with similar statistics using their headshot images.</li></ul> <hr> <h2 id="step-2-data-preprocessing--feature-engineering" data-svelte-h="svelte-tp2k5t"><a href="#step-2-data-preprocessing--feature-engineering">Step 2: Data preprocessing &amp; feature engineering</a></h2> <p data-svelte-h="svelte-2tp5n5">While the NBA API provided high-quality data, there were a few notable issues which I remedied in the <a href="https://github.com/danielliu2707/positionn/blob/main/02_stats_preprocessing.ipynb" rel="nofollow noopener noreferrer external" target="_blank">preprocessing notebook</a>:</p> <ul data-svelte-h="svelte-1tb69nu"><li>My data retrieval script often crashed because of the size of the requests sent to the NBA API, forcing me to retrieve the data in batches. I subsequently merged these batches in this stage of preprocessing.</li> <li>I computed commonly used advanced metrics, such as assist-to-turnover ratio, stocks, and FIC, all of which are derived from the basic statistics I already had.</li> <li>The assist-to-turnover ratio had 20 observations with an <code>Inf</code> value. This occurs when a player recorded 0 turnovers in a season, so the ratio divides by 0 and produces <code>Inf</code>. 
These records were removed.</li></ul> <hr> <h2 id="step-3-problem-definition" data-svelte-h="svelte-67eubd"><a href="#step-3-problem-definition">Step 3: Problem Definition</a></h2> <p data-svelte-h="svelte-puky0v">The objective of Positionn is to accurately predict the position (Guard, Forward, Center) a user would play based on their basic statistics. Using the retrieved and preprocessed data, we could train a model to assist us with this <strong>multi-class classification</strong> problem.</p> <p data-svelte-h="svelte-3svoc8">Before any training, I discovered the dataset was mildly imbalanced, with 1,781 Guards, 1,476 Forwards, and 559 Centers. Since we want fair predictive performance across all classes, I decided to proceed using the following metrics for imbalanced datasets:</p> <ul data-svelte-h="svelte-v1fz9n"><li><strong>Balanced accuracy</strong>: Averages how well the model correctly identifies each class, so rare classes count just as much as common ones.</li> <li><strong>F1 macro</strong>: Computes the F1 score for each class independently (balancing precision and recall per class) and then takes the average. This treats all classes equally regardless of their frequency, making it suitable for imbalanced datasets.</li></ul> <hr> <h2 id="step-4-feature-selection" data-svelte-h="svelte-1py028k"><a href="#step-4-feature-selection">Step 4: Feature Selection</a></h2> <p data-svelte-h="svelte-kr4lk6">To assess the importance of each player statistic in my dataset, I used the <code>SelectKBest()</code>, <code>mutual_info_classif()</code>, and <code>SelectKBest(score_func=chi2)</code> methods to give a general score to each feature (see below).</p> <p><img src="./feature-importance.png" alt="feature-importance" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <ul data-svelte-h="svelte-1s8et2u"><li>From the visualisations, I decided to prune out <code>age</code> and <code>year</code>. 
These were consistently the least important attributes across all three importance measures.</li></ul> <hr> <h2 id="step-5-training--comparing-different-models" data-svelte-h="svelte-8uitks"><a href="#step-5-training--comparing-different-models">Step 5: Training &amp; Comparing different models</a></h2> <p data-svelte-h="svelte-1i6o9hp">To describe the major types of models I tried, I split the training process into two parts: the <strong>basic models</strong> and the <strong>gradient boosting models</strong>.</p> <h3 id="basic-models" data-svelte-h="svelte-121kr6m"><a href="#basic-models">Basic Models</a></h3> <p data-svelte-h="svelte-oov0a3">The initial model in production, a support vector machine with grid-search hyperparameter tuning, was the best-performing model in terms of its <strong>nested cross-validation</strong> balanced accuracy and F1 scores. It was selected from a set of model families that includes:</p> <ul data-svelte-h="svelte-199x9l2"><li>Logistic Regression</li> <li>Decision Tree</li> <li>Random Forest</li> <li>Histogram Gradient Boosting</li> <li>Support Vector Machine</li></ul> <p data-svelte-h="svelte-1xuy2xk">To ensure a fair evaluation process, all models were trained using the same general process. 
I will now describe <code>PlayerStatisticsModel</code>, a class with the methods for training &amp; evaluating these basic models.</p> <p data-svelte-h="svelte-1ynzsp7"><strong>First, I defined the initialisation of the <code>PlayerStatisticsModel</code> class.</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default"><div class='code-container'><code><div class='line'>class PlayerStatisticsModel:</div><div class='line'>    def __init__(</div><div class='line'>        self, model, preprocessor: ColumnTransformer,</div><div class='line'>        data: pd.DataFrame, target: pd.Series, cv: int</div><div class='line'>    ):</div><div class='line'>        """</div><div class='line'>         Initialisation of PlayerStatisticsModel class.</div><div class='line'>        </div><div class='line'>         Inputs:</div><div class='line'>          - model: sklearn machine learning model class</div><div class='line'>          - preprocessor (ColumnTransformer): preprocessing steps</div><div class='line'>          - data: input features</div><div class='line'>          - target: input target</div><div class='line'>          - cv: number of cv folds to estimate generalisability of model</div><div class='line'>        """</div><div class='line'>        self.metrics = ["balanced_accuracy", "f1_macro"]  # metrics to evaluate/fit models</div><div class='line'>        self.cv = cv  # number of cv folds</div><div class='line'>        self.results = pd.DataFrame()   # dataframe to store performance results of models</div><div class='line'>        self.data = data</div><div class='line'>        self.target = target</div><div class='line'>        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(data, target, test_size = 0.25, stratify=target, random_state=42)</div><div class='line'>        self.preprocessor = preprocessor</div><div class='line'>        self.model = Pipeline(</div><div class='line'>            steps=[</div><div class='line'>               
 ("preprocessor", preprocessor),</div><div class='line'>                ("model", model)</div><div class='line'>            ]</div><div class='line'>        )</div></code></div></pre><!-- HTML_TAG_END --> <ul data-svelte-h="svelte-x8wmqo"><li>Notably, every model trained using the <code>PlayerStatisticsModel</code> class will use the same train/test split because we set <code>random_state=42</code>.</li> <li>The Scikit-Learn <code>model</code> passed to the class object, such as <code>LogisticRegression()</code> or <code>DecisionTreeClassifier()</code>, is placed in the <code>Pipeline()</code> after the preprocessing step (e.g. standard scaling).</li></ul> <p data-svelte-h="svelte-amjhn2"><strong>Next, I defined the <code>.cv_score()</code> method to run basic cross-validation on <code>self.model</code>.</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default"><div class='code-container'><code><div class='line'>def cv_score(self, model_name: str) -&gt; tuple[dict[list], dict[list]]:</div><div class='line'>    """</div><div class='line'>     Runs cross-validation on the default model (i.e. 
without hyperparameter tuning), obtaining the performance of the model.</div><div class='line'></div><div class='line'>     Inputs:</div><div class='line'>      - model_name (str): name of the model that will exist as its index in the self.results df</div><div class='line'></div><div class='line'>     Returns:</div><div class='line'>      - dict_scores_agg (dict[list]): dictionary containing the mean & std results for each model metric</div><div class='line'>      - dict_scores_folds (dict[list]): dictionary containing the model metric results for each fold</div><div class='line'>    """   </div><div class='line'>    # get CV scores for default model</div><div class='line'>    dict_scores_agg = dict()</div><div class='line'>    dict_scores_folds = dict()</div><div class='line'>    for metric in self.metrics:</div><div class='line'>        # use the data stored on the instance rather than relying on global variables</div><div class='line'>        cv_results = cross_validate(self.model, self.data, self.target, cv=self.cv, scoring=metric)</div><div class='line'>        scoring = cv_results['test_score']</div><div class='line'>        dict_scores_agg[metric] = (scoring.mean(), scoring.std())</div><div class='line'>        dict_scores_folds[metric] = scoring</div><div class='line'>    self._add_model_results(dict_scores_agg, model_name)</div><div class='line'>    return dict_scores_agg, dict_scores_folds</div></code></div></pre><!-- HTML_TAG_END --> <ul data-svelte-h="svelte-1h93njq"><li>You will notice the results from the cross-validation folds are stored in the dictionaries <code>dict_scores_agg</code> and <code>dict_scores_folds</code>. 
These store the aggregate (mean and standard deviation) and fold-specific balanced accuracy and F1 scores of the model, respectively.</li> <li>The results are then added to a record of different model performances by calling the <code>self._add_model_results()</code> helper method.</li></ul> <p data-svelte-h="svelte-1v3ro9j"><strong>Since we will continue to use the <code>self._add_model_results()</code> helper method, let’s formally define it.</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default"><div class='code-container'><code><div class='line'>def _add_model_results(self, model_result: dict[list], model_name: str):</div><div class='line'>    """</div><div class='line'>     Private helper method that concatenates the mean and standard deviation cross-validation score for all specified metrics of a model onto a </div><div class='line'>     pre-existing dataframe (self.results). This enables us to quickly</div><div class='line'>     compare the cv-performance of models as we build them.</div><div class='line'></div><div class='line'>     Inputs:</div><div class='line'>      - model_result (dict[list]): dictionary containing the mean & std results for each model metric</div><div class='line'>      - model_name (str): name of the model that will exist as its index in the self.results df</div><div class='line'>    """   </div><div class='line'>    # Gets model results as df to append to self.results</div><div class='line'>    df = pd.DataFrame(model_result)</div><div class='line'></div><div class='line'>    # extract rows and add suffixes to column names, then combine</div><div class='line'>    mean_row = df.iloc[0].rename(lambda col: f"&#123;col&#125;_mean")</div><div class='line'>    std_row = df.iloc[1].rename(lambda col: f"&#123;col&#125;_std")</div><div class='line'>    flattened_series = pd.concat([mean_row, std_row])</div><div class='line'>    flattened_df = pd.DataFrame([flattened_series])</div><div class='line'></div><div class='line'>    # set the index to the model name</div><div 
class='line'>    flattened_df.index = [model_name]</div><div class='line'></div><div class='line'>    # if model already exists in results df, then replace. Otherwise, append.</div><div class='line'>    if model_name in list(self.results.index):</div><div class='line'>        self.results.loc[model_name] = flattened_df.iloc[0]</div><div class='line'>    else:</div><div class='line'>        self.results = pd.concat([self.results, flattened_df])</div></code></div></pre><!-- HTML_TAG_END --> <ul data-svelte-h="svelte-1vla84p"><li>This helper method appends the mean and standard deviation of the balanced accuracy and F1 cross-validation scores to <code>self.results</code>, a dataframe for quickly comparing the cv performance of models as we build them.</li></ul> <p data-svelte-h="svelte-qy8thq"><strong>We define another convenience method, <code>.print_cv_results()</code>, which prints the mean and standard deviation of each cross-validation score in a readable format.</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default"><div class='code-container'><code><div class='line'>def print_cv_results(self, dict_results: dict[tuple]):</div><div class='line'>    """</div><div class='line'>     Prints the mean and standard deviation cross-validation score for each specified metric in a pretty format.</div><div class='line'>    </div><div class='line'>     Inputs:</div><div class='line'>      - dict_results (dict[tuple]): the mean & std results of a model on all evaluation metrics</div><div class='line'>    """</div><div class='line'>    for metric, value in dict_results.items():</div><div class='line'>        print(f"The mean cross-validation &#123;metric&#125; score is: "</div><div class='line'>        f"&#123;value[0]:.3f&#125; ± &#123;value[1]:.3f&#125;")</div></code></div></pre><!-- HTML_TAG_END --> <p data-svelte-h="svelte-1loq5ol"><strong>Next, we define a general method <code>.grid_search_cv_score()</code> to fit and tune a model’s hyperparameters with grid search, performing 
<em>nested cross-validation</em> to estimate its balanced accuracy and f1 on unseen data.</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default"><div class='code-container'><code><div class='line'>def grid_search_cv_score(self, model_name: str, param_grid: dict):</div><div class='line'>    """</div><div class='line'>     Fits a model with grid_search to tune its hyperparameters. Then performs</div><div class='line'>     nested cross-validation to obtain an estimate of the generalisability</div><div class='line'>     of the model & variance of this estimate.</div><div class='line'></div><div class='line'>     Inputs:</div><div class='line'>      - model_name (str): name of the model that will exist as its index in the self.results df</div><div class='line'>      - param_grid (dict): grid of hyperparameters to iterate through</div><div class='line'>     </div><div class='line'>     Returns:</div><div class='line'>      - dict_scores_agg (dict[list]): dictionary containing the mean & std results for each model metric</div><div class='line'>      - dict_scores_folds (dict[list]): dictionary containing the model metric results for each fold</div><div class='line'>      - best_model: sklearn model of best performing model after gridsearch hyperparameter tuning</div><div class='line'>    """</div><div class='line'>    with self._suppress_warnings_and_output():</div><div class='line'>        model_grid_search = GridSearchCV(self.model, param_grid=param_grid, n_jobs=-1, scoring=self.metrics, refit=self.metrics[0], verbose=1)</div><div class='line'></div><div class='line'>        # fit grid search on training data to extract best model</div><div class='line'>        model_grid_search.fit(self.X_train, self.y_train)</div><div class='line'>        best_model = model_grid_search.best_estimator_</div><div class='line'></div><div class='line'>        # get nested CV scores for best model</div><div class='line'>        dict_scores_agg = dict()</div><div 
class='line'>        dict_scores_folds = dict()</div><div class='line'></div><div class='line'>        # perform nested cv for each metric, storing results in dictionaries</div><div class='line'>        for metric in self.metrics:</div><div class='line'>            # use the data stored on the instance rather than relying on global variables</div><div class='line'>            cv_results = cross_validate(model_grid_search, self.data, self.target, cv=self.cv, n_jobs=-1, scoring=metric)</div><div class='line'>            scoring = cv_results['test_score']</div><div class='line'>            dict_scores_agg[metric] = (scoring.mean(), scoring.std())</div><div class='line'>            dict_scores_folds[metric] = scoring</div><div class='line'></div><div class='line'>        # add results to dataframe for easy model comparison</div><div class='line'>        self._add_model_results(dict_scores_agg, model_name)</div><div class='line'>        return dict_scores_agg, dict_scores_folds, best_model</div></code></div></pre><!-- HTML_TAG_END --> <ul data-svelte-h="svelte-102ktah"><li>The method requires <code>param_grid</code>, a grid of hyperparameter values to try on the model.</li> <li>The method returns the nested cross-validation scores along with the best-performing pipeline found by the search.</li></ul> <p data-svelte-h="svelte-ngbmer"><strong>Similarly, we define a general method <code>.randomised_search_cv_score()</code> to fit and tune a model’s hyperparameters with randomised search, performing <em>nested cross-validation</em> to estimate its balanced accuracy and F1 on unseen data.</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default"><div class='code-container'><code><div class='line'>def randomised_search_cv_score(self, model_name: str, param_distributions: dict, n_iter: int):</div><div class='line'>    """</div><div class='line'>     Fits a model with randomised_search to tune its hyperparameters.</div><div class='line'>     It then performs nested cross-validation to obtain an estimate of the </div><div class='line'>     generalisability of the model & variance of this estimate.</div><div 
class='line'></div><div class='line'>     Inputs:</div><div class='line'>      - model_name (str): name of the model that will exist as its index in the self.results df</div><div class='line'>      - param_distributions (dict): range of possible values for each hyperparameter that can be sampled</div><div class='line'>      - n_iter (int): number of iterations to run randomised_search (i.e. models to be fitted & evaluated)</div><div class='line'>     </div><div class='line'>     Returns:</div><div class='line'>      - dict_scores_agg (dict[list]): dictionary containing the mean & std results for each model metric</div><div class='line'>      - dict_scores_folds (dict[list]): dictionary containing the model metric results for each fold</div><div class='line'>      - best_model: sklearn model of best performing model after randomised search hyperparameter tuning</div><div class='line'>    """</div><div class='line'>    with self._suppress_warnings_and_output():</div><div class='line'>        model_random_search = RandomizedSearchCV(self.model, param_distributions=param_distributions, n_iter=n_iter, n_jobs=-1, scoring=self.metrics, refit=self.metrics[0], verbose=1)</div><div class='line'>        # fit randomised search on training data to extract best model</div><div class='line'>        model_random_search.fit(self.X_train, self.y_train)</div><div class='line'>        best_model = model_random_search.best_estimator_</div><div class='line'>        # get nested CV scores for best model</div><div class='line'>        dict_scores_agg = dict()</div><div class='line'>        dict_scores_folds = dict()</div><div class='line'>        # perform nested cv for each metric, storing results in dictionaries</div><div class='line'>        for metric in self.metrics:</div><div class='line'>            # use the data stored on the instance rather than relying on global variables</div><div class='line'>            cv_results = cross_validate(model_random_search, self.data, self.target, cv=self.cv, n_jobs=-1, scoring=metric)</div><div class='line'>            scoring = cv_results['test_score']</div><div class='line'>       
     dict_scores_agg[metric] = (scoring.mean(), scoring.std())</div><div class='line'>            dict_scores_folds[metric] = scoring</div><div class='line'>        # add results to dataframe for easy model comparison    </div><div class='line'>        self._add_model_results(dict_scores_agg, model_name)</div><div class='line'>        return dict_scores_agg, dict_scores_folds, best_model</div></code></div></pre><!-- HTML_TAG_END --> <ul data-svelte-h="svelte-5z3h9g"><li>The method requires <code>param_distributions</code>, the lists or distributions of hyperparameter values to sample from.</li></ul> <p data-svelte-h="svelte-1efgdzh"><strong>What is the difference between grid search and randomised search?</strong></p> <ul data-svelte-h="svelte-1snqdt5"><li>Grid search exhaustively evaluates every combination in <code>param_grid</code>, whereas randomised search evaluates only <code>n_iter</code> combinations sampled from <code>param_distributions</code>, which keeps the cost fixed when the search space is large.</li></ul> <p data-svelte-h="svelte-thyve"><strong>Why do we need nested cross-validation for evaluating and selecting models when performing hyperparameter tuning?</strong></p> <ul data-svelte-h="svelte-1snqdt5"><li>Tuning picks the configuration with the best cross-validation score, so reporting that same score is optimistically biased. Nested cross-validation repeats the tuning inside each outer training fold and scores the tuned model on the held-out outer fold, which the search never saw.</li></ul> <p data-svelte-h="svelte-oirojr"><strong>While we returned the performance and the tuned model itself, the grid search and randomised search methods did not provide us with the tuned hyperparameters themselves. 
The <code>.get_best_tuned_params()</code> method provides this functionality.</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default"><div class='code-container'><code><div class='line'>def get_best_tuned_params(self, best_model, param_grid: dict):</div><div class='line'>    """</div><div class='line'>     Gets the best tuned parameters for a model after gridsearch</div><div class='line'>     or randomised_search hyperparameter tuning.</div><div class='line'></div><div class='line'>     Inputs:</div><div class='line'>      - best_model: sklearn model of best performing model after hyperparameter tuning</div><div class='line'>      - param_grid (dict): grid of hyperparameters to iterate through</div><div class='line'>     </div><div class='line'>     Returns:</div><div class='line'>      - dictionary of tuned parameters and their values for the best model</div><div class='line'>    """</div><div class='line'>    params = [param for param in best_model.get_params() if param.startswith('model__') and param in list(param_grid.keys())]</div><div class='line'></div><div class='line'>    params_values = [best_model.get_params()[param] for param in params]</div><div class='line'></div><div class='line'>    return dict(zip(params, params_values))</div></code></div></pre><!-- HTML_TAG_END --> <p data-svelte-h="svelte-1294iyv"><strong>In some ad-hoc instances, I wanted to confirm that one model was superior to another by comparing their performance on each individual fold, plotting their balanced accuracy and F1 scores. The <code>.plot_fold_comparison()</code> method does this.</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default"><div class='code-container'><code><div class='line'>def plot_fold_comparison(self, metric: str, fold_results: tuple[list[int]], label_names: tuple[str], colors: tuple[str]):</div><div class='line'>    """</div><div class='line'>     Plots a comparison of the model's evaluation metric (i.e. 
performance)</div><div class='line'>     on each individual cross-validation fold. If there are 4</div><div class='line'>     cross-validation folds, a scatterplot with four dots per model will be plotted.</div><div class='line'>    </div><div class='line'>     Inputs:</div><div class='line'>      - metric (str): sklearn evaluation metric of choice</div><div class='line'>      - fold_results (tuple[list[int]]): tuple containing the fold results of each model</div><div class='line'>      - label_names (tuple[str]): names to give each model in the legend of plot</div><div class='line'>      - colors (tuple[str]): colors to give each model in the plot</div><div class='line'>    """</div><div class='line'>    # handle edge case: user passes in invalid metric to plot</div><div class='line'>    if metric not in self.metrics:</div><div class='line'>        raise Exception(f"Please pass in a valid metric: &#123;self.metrics&#125;")</div><div class='line'>    else:</div><div class='line'>        fig, ax = plt.subplots(figsize=(6, 5))</div><div class='line'></div><div class='line'>        # iterate through each model's fold_result, adding scatterplots</div><div class='line'>        for i in range(len(fold_results)):</div><div class='line'>            indices = np.arange(len(fold_results[i][metric]))</div><div class='line'>            sns.scatterplot(x=indices, y=fold_results[i][metric], color=f"tab:&#123;colors[i]&#125;", label=label_names[i], ax=ax)</div><div class='line'></div><div class='line'>        # plot axis settings (label the y-axis with the metric actually plotted)</div><div class='line'>        ax.set_xlabel("Cross-validation iteration")</div><div class='line'>        ax.set_ylabel(metric)</div><div class='line'>        ax.set_title(f"&#123;self.cv&#125;-Fold &#123;metric&#125;")</div><div class='line'>        ax.set_xticks(np.arange(0, self.cv))</div><div class='line'>        ax.set_ylim(0,1)</div><div class='line'>        ax.legend(bbox_to_anchor=(1.05, 1), loc="upper 
left")</div></code></div></pre><!-- HTML_TAG_END --> <p data-svelte-h="svelte-1hbx3ki">For instance, we used this method to visualise a comparison between the cross-validation folds of the best-performing logistic regression and decision tree models.</p> <ul data-svelte-h="svelte-3xxxt8"><li>Insert image here!!! (— Comparison of folds between best Logistic Regression &amp; Decision Tree models —)</li></ul> <p data-svelte-h="svelte-17f4suy">In practice, this method was rarely useful.</p> <p data-svelte-h="svelte-1gljw4h"><strong>Finally, you may have noticed I used the <code>._suppress_warnings_and_output()</code> helper method throughout the previous methods. Its purpose is to suppress the noisy warnings and output produced when running grid search or randomised search.</strong></p> <!-- HTML_TAG_START --><pre class="shiki material-default"><div class='code-container'><code><div class='line'>@contextmanager</div><div class='line'>def _suppress_warnings_and_output(self):</div><div class='line'>    # save original warning environment variable</div><div class='line'>    original_warnings = os.environ.get("PYTHONWARNINGS", "")</div><div class='line'>    </div><div class='line'>    # suppress UserWarning and FutureWarning</div><div class='line'>    os.environ["PYTHONWARNINGS"] = "ignore::UserWarning, ignore::FutureWarning"</div><div class='line'>    warnings.filterwarnings("ignore", category=UserWarning)</div><div class='line'>    warnings.filterwarnings("ignore", category=FutureWarning)</div><div class='line'></div><div class='line'>    # capture all stdout/stderr output</div><div class='line'>    # (assumes `io` is IPython's helper module, i.e. `from IPython.utils import io`;</div><div class='line'>    #  the standard-library `io` module has no capture_output)</div><div class='line'>    with io.capture_output() as captured:</div><div class='line'>        try:</div><div class='line'>            yield captured</div><div class='line'>        finally:</div><div class='line'>            # restore warnings environment and filters</div><div class='line'>            os.environ["PYTHONWARNINGS"] = original_warnings</div><div class='line'>            
warnings.filterwarnings("default", category=UserWarning)</div><div class='line'>            warnings.filterwarnings("default", category=FutureWarning)</div></code></div></pre><!-- HTML_TAG_END --> <p data-svelte-h="svelte-1rva4ax">Continue describing how I used these methods to train a single model. Provide an example (e.g. LogisticRegression).</p> <p data-svelte-h="svelte-1hfqw6p">Show the end result after model training (i.e. the final table). One thing that catches my eye is that both Gradient Boosting methods outperformed all of the basic methods. So how did we arrive at Gradient Boosting (the final models in production after I found they outperformed SVMs; mention that they were a late addition)?</p> <h3 id="gradient-boosting-models" data-svelte-h="svelte-bgndv7"><a href="#gradient-boosting-models">Gradient Boosting Models</a></h3> <hr> <p data-svelte-h="svelte-d8s9lk"><strong>Note:</strong> Big mistake: training on <code>player_id</code>. This forced me to re-run my entire workflow and assess whether any model rankings changed. LGBM remained the best model.</p> <hr> <p data-svelte-h="svelte-19urb6j">This article provides my comprehensive overview of XGBoost: both the algorithm and the Python package.</p> <hr> <p><img src="./xgboost.jpg" alt="xgboost" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <h2 id="ensemble-learning" data-svelte-h="svelte-1fdu9t5"><a href="#ensemble-learning">Ensemble Learning</a></h2> <p data-svelte-h="svelte-1v76sou">Ensemble learning is a technique that combines multiple individual models, aggregating their predictions, to produce better predictions than a single model alone. XGBoost is a form of ensemble-based learning, so it feels natural to begin by describing this effective method of building models. 
Common ensemble-based methods include bagging and boosting.</p> <h3 id="bagging-boostrap-aggregating" data-svelte-h="svelte-gu7zh4"><a href="#bagging-boostrap-aggregating">Bagging (Bootstrap Aggregating)</a></h3> <p data-svelte-h="svelte-dlpxfc">Bagging involves training the same model on multiple bootstrap samples (i.e., samples drawn randomly with replacement from the training data). This produces several individual models, each trained on a variation of the training set. The final prediction is the aggregate of the individual model predictions: usually their average (for regression) or their majority vote (for classification).</p> <p><img src="./bagging.jpg" alt="bagging" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <h3 id="boosting" data-svelte-h="svelte-1qfiu07"><a href="#boosting">Boosting</a></h3> <p data-svelte-h="svelte-1d7cmgs">Boosting is a sequential ensemble method where ‘weak’ models are trained one after another. Each new model focuses on the errors made by the previous models, placing greater weight on examples that were previously mispredicted. This ensures the ensemble incrementally improves its performance as more models are added. 
The final predictor is the weighted sum of all ‘weak’ models, not just the last one.</p> <p><img src="./boosting.jpg" alt="boosting" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <h3 id="bagging-vs-boosting" data-svelte-h="svelte-mtf3sd"><a href="#bagging-vs-boosting">Bagging vs Boosting</a></h3> <div class="overflow-x-auto mb-4"><table class="table w-full"><thead data-svelte-h="svelte-kujtix"><tr><th>Property</th> <th>Bagging</th> <th>Boosting</th></tr></thead> <tbody data-svelte-h="svelte-p19jyf"><tr><td>Training</td> <td>Parallel (independent models)</td> <td>Sequential (each depends on previous)</td></tr> <tr><td>Focus</td> <td>Reduce variance w/ independent, diverse models</td> <td>Reduce bias by iteratively improving upon weaknesses</td></tr> <tr><td>How models differ</td> <td>Resampled data</td> <td>Error focus</td></tr> <tr><td>Final prediction</td> <td>Average / vote</td> <td>Weighted sum</td></tr> <tr><td>Typical base learner</td> <td>Strong (e.g. full trees)</td> <td>Weak (e.g. shallow trees/stumps)</td></tr> <tr><td>Algorithms</td> <td>Random Forest</td> <td>AdaBoost, Gradient Boosting, XGBoost, LightGBM, CatBoost</td></tr></tbody></table></div> <h2 id="history-of-boosting" data-svelte-h="svelte-1o1dmte"><a href="#history-of-boosting">History of Boosting</a></h2> <h3 id="adaboost" data-svelte-h="svelte-1xx7ptp"><a href="#adaboost">AdaBoost</a></h3> <p data-svelte-h="svelte-12q9q50">We begin the Boosting story with AdaBoost, where misclassified samples are assigned higher weights so that the next model pays more attention to them. This idea doesn’t generalise well to scenarios that are common nowadays, such as regression, arbitrary loss functions, or sparse data.</p> <h3 id="gradient-boosting" data-svelte-h="svelte-1534cuz"><a href="#gradient-boosting">Gradient Boosting</a></h3> <p data-svelte-h="svelte-22xxqm">Instead of simply reweighting samples to focus on mistakes, Gradient Boosting reframes boosting as an optimisation problem. 
Rather than having each iterative model predict the target directly, each new tree predicts the <strong>residuals</strong> (mistakes) of the current ensemble’s predictions.</p> <h3 id="example-house-price-prediction" data-svelte-h="svelte-l62q8m"><a href="#example-house-price-prediction">Example: House price prediction</a></h3> <p data-svelte-h="svelte-tgi2o">As a starting point, suppose the model predicts the same value for every house, say <code>$350k</code>. The true prices might be:</p> <ul data-svelte-h="svelte-zthx88"><li><code>y = [$200k, $300k, $400k, $500k]</code></li></ul> <p data-svelte-h="svelte-1q85ly">Then the initial predictions are:</p> <ul data-svelte-h="svelte-xgez96"><li><code>y_hat = [$350k, $350k, $350k, $350k]</code></li></ul> <p data-svelte-h="svelte-107zv2q">and the initial residuals are:</p> <ul data-svelte-h="svelte-1s92ie1"><li><code>r_i = y_i - y_hat_i = [-$150k, -$50k, +$50k, +$150k]</code></li></ul> <p data-svelte-h="svelte-18k2jxk">These residuals represent <strong>what the current model has failed to explain</strong>. Under squared error loss, they are also equal to the <strong>negative gradient</strong> of the loss with respect to the predictions.</p> <p data-svelte-h="svelte-tvvk4a">For squared error, the loss for a single sample is:</p> <ul data-svelte-h="svelte-10l0xep"><li><code>L(y, y_hat) = 0.5 * (y - y_hat)^2</code></li></ul> <p data-svelte-h="svelte-1oifaxs">If you take the derivative of this with respect to <code>y_hat</code>, you get:</p> <ul data-svelte-h="svelte-1d67205"><li><code>dL/dy_hat = (y_hat - y)</code></li></ul> <p data-svelte-h="svelte-1b1d9jj">The <strong>negative gradient is therefore the residual</strong>:</p> <ul data-svelte-h="svelte-m7ivhj"><li><code>-(dL/dy_hat) = y - y_hat = r_i</code></li></ul> <p data-svelte-h="svelte-y56c4d">So when Gradient Boosting trains the next weak model (usually a shallow tree), it does <strong>not</strong> train it on the original house prices <code>y</code>. 
Instead:</p> <ul data-svelte-h="svelte-1hxc1e6"><li>The <strong>features</strong> (e.g. bedrooms, size, suburb) stay the same.</li> <li>The <strong>targets</strong> become the residuals <code>r_i</code>.</li></ul> <p data-svelte-h="svelte-1ib36io">We’re no longer predicting the full price. We’re predicting <strong>corrections</strong> that tell us how far our existing model is from a perfect one.</p> <p data-svelte-h="svelte-trlrrx">After training this tree <code>h_1(x)</code>, we update the model:</p> <ul data-svelte-h="svelte-ayz7bp"><li><code>F_1(x) = F_0(x) + η * h_1(x)</code></li></ul> <p data-svelte-h="svelte-3qt7bc">where <code>η</code> (eta) is the learning rate. Because <code>h_1(x)</code> approximates the negative gradient (the direction of steepest descent), adding it reduces the loss.</p> <p data-svelte-h="svelte-1ft6hh8">Gradient Boosting repeats this process:</p> <ol data-svelte-h="svelte-7lq5vk"><li>Compute residuals <code>r_i = y_i - F_{m-1}(x_i)</code> (for squared loss; more generally, use negative gradients).</li> <li>Fit a shallow tree <code>h_m(x)</code> to <code>(x_i, r_i)</code>.</li> <li>Update the model: <code>F_m(x) = F_{m-1}(x) + η * h_m(x)</code>.</li></ol> <p data-svelte-h="svelte-m95w92">Each tree is a <strong>small correction step</strong> in the direction that most reduces the loss. 
This is mathematically the same idea as gradient descent, except that instead of updating weights directly (as in a neural network), Gradient Boosting updates the <strong>function</strong> itself by adding trees.</p> <h4 id="analogy" data-svelte-h="svelte-13vg43z"><a href="#analogy">Analogy</a></h4> <p data-svelte-h="svelte-oqa4wt">Think of writing an essay and getting feedback:</p> <ol data-svelte-h="svelte-1muvugp"><li>You write a first draft (initial model).</li> <li>The teacher marks mistakes and comments (these are like residuals).</li> <li>You fix only those mistakes (fit a model to the residuals).</li> <li>You submit again and get a new set of corrections.</li> <li>You fix those and repeat.</li></ol> <p data-svelte-h="svelte-1tjcsfb">You don’t rewrite the entire essay from scratch each time; you only correct <strong>what remains wrong</strong>. After enough rounds of focused corrections, the essay becomes strong.</p> <p data-svelte-h="svelte-yu25i4">Gradient Boosting works the same way: each tree is a small correction pass over the current model. Individually, the trees are weak, but together they form a powerful predictor.</p> <h2 id="extreme-gradient-boosting-xgboost" data-svelte-h="svelte-1ljez2q"><a href="#extreme-gradient-boosting-xgboost">eXtreme Gradient Boosting (XGBoost)</a></h2> <p data-svelte-h="svelte-z1nrf6">XGBoost is an industry favourite for its speed, scalability, and flexibility. 
It extends the classic Gradient Boosting idea of having each new tree predict the <strong>residuals</strong> (mistakes) of the current ensemble’s predictions with the following:</p> <ul data-svelte-h="svelte-1lemkig"><li>Regularisation</li> <li>Optimisation</li> <li>Flexibility</li></ul> <p data-svelte-h="svelte-bh4jns">Here is an <a href="https://medium.com/analytics-vidhya/what-makes-xgboost-so-extreme-e1544a4433bb" rel="nofollow noopener noreferrer external" target="_blank">excellent article</a> that describes how XGBoost works under the hood.</p> <p data-svelte-h="svelte-lx0e4t">Here is another <a href="https://medium.com/@heyamit10/xgboost-explained-d215f091fb85" rel="nofollow noopener noreferrer external" target="_blank">article</a> that details practical applications of XGBoost and Optuna for hyperparameter tuning.</p>]]>
    </content>
    <category term="Python" scheme="https://urara-demo.netlify.app/?tags=Python" />
    <category term="Machine Learning" scheme="https://urara-demo.netlify.app/?tags=Machine%20Learning" />
  </entry>
  <entry>
    <title type="html"><![CDATA[ThoughtSpot SQL]]></title>
    <link href="https://urara-demo.netlify.app/courses/databricks" />
    <id>https://urara-demo.netlify.app/courses/databricks</id>
    <published>2026-01-13T00:00:00.000Z</published>
    <updated>2026-01-30T00:11:47.760Z</updated>
    <content type="html">
      <![CDATA[<p data-svelte-h="svelte-1uox74i">This blog serves as my reflection and learning hub for the <a href="https://www.thoughtspot.com/sql-tutorial?utm_source=chatgpt.com" rel="nofollow noopener noreferrer external" target="_blank">ThoughtSpot SQL course</a>.</p> <hr> <h2 id="there-is-nothing-here-yet" data-svelte-h="svelte-kfbhje"><a href="#there-is-nothing-here-yet">There is nothing here yet!</a></h2>]]>
    </content>
    <category term="SQL" scheme="https://urara-demo.netlify.app/?tags=SQL" />
  </entry>
  <entry>
    <title type="html"><![CDATA[Databricks SQL]]></title>
    <link href="https://urara-demo.netlify.app/courses/thoughtspot" />
    <id>https://urara-demo.netlify.app/courses/thoughtspot</id>
    <published>2026-01-13T00:00:00.000Z</published>
    <updated>2026-01-30T00:11:47.760Z</updated>
    <content type="html">
      <![CDATA[<p data-svelte-h="svelte-ve2rc6">This blog serves as my reflection and learning hub for several <a href="https://www.databricks.com/learn/training/home?utm_source=chatgpt.com" rel="nofollow noopener noreferrer external" target="_blank">Databricks SQL courses</a>.</p> <hr> <h2 id="there-is-nothing-here-yet" data-svelte-h="svelte-kfbhje"><a href="#there-is-nothing-here-yet">There is nothing here yet!</a></h2>]]>
    </content>
    <category term="SQL" scheme="https://urara-demo.netlify.app/?tags=SQL" />
  </entry>
  <entry>
    <title type="html"><![CDATA[Canva Chronicles: My 12-Week Experience & Reflection]]></title>
    <link href="https://urara-demo.netlify.app/canva_learnings" />
    <id>https://urara-demo.netlify.app/canva_learnings</id>
    <published>2025-05-25T00:00:00.000Z</published>
    <updated>2026-01-30T00:11:47.708Z</updated>
    <content type="html">
      <![CDATA[<p data-svelte-h="svelte-1gg1p3w">This article serves as a reflection on my internship experience at Canva - my learnings, favourite moments and the wild three-month journey. So without further ado, let’s dive into it.</p> <p><img src="./IMG_0556.jpg" alt="canva_exp" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <hr> <h2 id="the-journey" data-svelte-h="svelte-ehuhjd"><a href="#the-journey">The journey</a></h2> <h3 id="how-did-i-end-up-at-canva" data-svelte-h="svelte-krltdf"><a href="#how-did-i-end-up-at-canva">How did I end up at Canva?</a></h3> <p data-svelte-h="svelte-ixpgck">A bit of background. Canva had always been one of my dream companies for a few reasons:</p> <ul data-svelte-h="svelte-1peyytc"><li>I had been using Canva for a long time. In fact, it shocked me that I had been using Canva since year 10 to create stunning infographic designs. I always knew of the product and loved using it, but it never registered that the product was named Canva, and that it was a major technology employer based in Australia, until I applied.</li> <li>A couple of Canva employees showed up at a uni club trivia night I attended during my first year. One question was: ‘where was Canva founded?’ I blindly guessed Perth, and somehow got it right. Ever since then, I had been enamoured with Canva. Perhaps it was the inspiration that came with seeing an amazing product originate from my own hometown.</li></ul> <p data-svelte-h="svelte-1khvtui">So without hesitation, I applied. It was a long shot. I knew just how competitive it would be. In fact, I later found out that just 89 interns were taken from a pool of ~7000 applicants. 
But after an online assessment and a technical plus behavioural interview, I had made it in.</p> <h3 id="week-1---onboarding" data-svelte-h="svelte-mz0mxz"><a href="#week-1---onboarding">Week 1 - Onboarding</a></h3> <p><img src="./IMG_0585.jpg" alt="onboarding" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <p data-svelte-h="svelte-1mzr9ad">Onboarding week swiftly came and went. We were taken through ample modules and set up the software necessary for our projects (e.g. Snowflake, GitHub, Accessing Data Warehouse). I also met my support team - including my host, co-host and buddy - who were all super smart &amp; lovely people, and discussed my intern project with my host!</p> <p data-svelte-h="svelte-c4soxy">All the interns spent the first week from the comfort of their home (or closest office), which meant that I explored the amazing Melbourne office. A small, cosy and humble abode in the heart of Collingwood, the Melbourne office had delicious breakfast and lunch served up by the chefs, a friendly vibe team who greeted you every morning, great background music, free merch at the swag station and sick artwork fit for a design company.</p> <h3 id="week-2---campus-week" data-svelte-h="svelte-1anxj48"><a href="#week-2---campus-week">Week 2 - Campus week</a></h3> <p><img src="./IMG_9725.jpg" alt="campus_week" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <p data-svelte-h="svelte-lmo1ay">Campus week was when the fun really began - especially since I was sick during week 1. All the interns were flown to Sydney and given accommodation for a week, which meant some of us (myself included) got to meet our team in person. There were plenty of intern activities planned, and we capped off the week with the annual Canva end-of-year party!</p> <p data-svelte-h="svelte-11rfdrb">Let me tell you, the party was amazing. 
There was plenty of music (including a dance floor) to keep the vibes high, the street was decked out for us with food trucks, we attended stand-up comedy, learned how to shoot arrows and bonded over alcohol on the rooftop. Actually, we could’ve gotten drunk anytime with beer taps and alcohol readily available on the Sydney office rooftop.</p> <p data-svelte-h="svelte-189254u">I’m going to be completely transparent. I didn’t get much work done this week. There were just too many intern events - both during and outside of work hours - that I didn’t find the time to fully concentrate on my intern project. Fortunately, my team had told me to enjoy myself this week and not worry too much about work.</p> <h3 id="weeks-3-to-8---sydney" data-svelte-h="svelte-i9qztw"><a href="#weeks-3-to-8---sydney">Weeks 3 to 8 - Sydney</a></h3> <p><img src="./IMG_0400.jpg" alt="sydney" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <p data-svelte-h="svelte-lw1lp8">I decided to stay in Sydney for a few extra weeks - even after the all-expenses-paid Campus Week had wrapped up. My reasoning? I felt there was so much more Sydney could offer professionally and socially. I was fortunate in both of these aspects.</p> <p data-svelte-h="svelte-1gkfikc">Professionally, most of the product Data Scientists were located in Sydney including my host, co-host and buddy. This meant that I could better connect with my team and learn a heck of a lot more (scroll down to find my key learnings). In fact, those two months felt jam-packed with personal milestones that I could celebrate with the team - including an engagement, baby announcement and permanent residency approval! 
</p> <p data-svelte-h="svelte-19dgjay">Even though I wish I could’ve seen my team a bit more as many came in just once a week, I totally get why they couldn’t come in more often and still had an amazing time with them. Plus, I had my buddy next to me four days a week!</p> <p data-svelte-h="svelte-sr7obp">Socially, Sydney had more people. Especially interns. While I do wish I had taken the opportunity to mingle with more interns, I truly feel blessed to have gotten close to some amazing ones - especially those that also relocated from other states &amp; stayed at Iglu accommodation. We would regularly explore the many wonders of Sydney (Bondi Beach, Harbour Bridge, Opera House, Manly etc), play games (Exploding Kittens, Codenames, Switch) and work together. I never felt there was a second to spare as some activity had always been planned for the weekend.</p> <p data-svelte-h="svelte-bnp5fk">Canva clubs were also a major highlight. You were allowed to attend up to four club-sponsored events each month, which meant the entire experience would be on the house! I was able to try out new activities like pilates (with my team) and pickleball, and indulge in treats like yo-chi, all for the price of … NOTHING! Who doesn’t love free stuff, all the while feeling part of a community :o</p> <h3 id="weeks-9-to-11---melbourne" data-svelte-h="svelte-v53cph"><a href="#weeks-9-to-11---melbourne">Weeks 9 to 11 - Melbourne</a></h3> <p><img src="./IMG_0679.JPG" alt="melbourne" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <p data-svelte-h="svelte-kkisau">The palpable sorrow felt from leaving all of my amazing friends and team in Sydney started to hit on my first day back. Although I had been to the Melbourne office once prior, I never realised just how difficult it was to commute to from my house (~1 hr 15 mins). 
In spite of this, my experience back in Melbourne was much the same.</p> <p data-svelte-h="svelte-18gm9bv">I had made significant progress on my project - receiving positive feedback from key stakeholders and collaborating with a PM on a new feature that came from my report and was set for experimentation. Safe to say, the last couple of weeks were extremely busy for everyone.</p> <p data-svelte-h="svelte-1g2n3t4">Immersing myself into the Melbourne office culture wasn’t too difficult. However, there were only around 10 interns, and with everyone working overtime, it was difficult to organise many social outings. You wouldn’t believe the crazy hours some interns pulled. Alas, I knew when I worked best and prioritised waking up feeling refreshed.</p> <p data-svelte-h="svelte-1e7mpjb">Despite fewer social events, I still had a great time. I was able to meet a few teammates who were based out of Melbourne - even finding out one of them came from Perth as well! We organised pickleball and basketball amongst the interns during these final few weeks, and enjoyed the lovely food (arguably better than Sydney’s) made in-house by the chefs. I would also describe the general vibe of the Melbourne office as more ‘cosy’ - chill music playing, a closer-knit community and physical proximity all contributed to this change of scenery.</p> <h3 id="week-12---presentations" data-svelte-h="svelte-1jn2un"><a href="#week-12---presentations">Week 12 - Presentations</a></h3> <p><img src="./IMG_0596.jpg" alt="presos" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <p data-svelte-h="svelte-qrxeco">The juxtaposition of the final week to the first week was immense. The last push to finish our work, submit our personal reflections &amp; presentations and mingle with our peers one last time meant there was little time for much else. 
The Data Science interns gave a presentation to most Data Scientists at Canva at the start of the week, and all interns gave their final intern presentation on Wednesday and Thursday. Being in product, I was given the opportunity to present 1st among all Data Scientists and 2nd among all 89 interns. It was quite the double-edged sword. On the one hand, I would be free from comparison prior to my presentation and would have the most crowd engagement (lots of interns shouting their support in the chat). On the other hand, since our presentations were broadcast to the entire company, I would have the most eyes on me (which is also kind of good).</p> <p data-svelte-h="svelte-f0mkqd">My presentations went well as I received positive feedback all around. However, comparison being the thief of joy that it is, it was hard not to compare my work to other interns (especially Data Scientists). For whatever reason, I felt that my work didn’t sound as ‘impactful’ or ‘cool’ as other interns’ - which was probably not true - and found myself always comparing. Reflecting back, it’s good to appreciate the amazing work others do, but don’t forget all the work you’ve put in to deliver your best go at a challenging problem.</p> <h3 id="week-12---final-day" data-svelte-h="svelte-cberji"><a href="#week-12---final-day">Week 12 - Final day</a></h3> <p><img src="./IMG_0680.JPG" alt="final_day" class="rounded-lg my-2" loading="lazy" decoding="async"></p> <p data-svelte-h="svelte-12rmd21">After the stress that came with presentations, it was time to wind down and relax on the final day. I said my goodbyes to everyone - especially my team - expressing my gratitude for the impact that so many Canvanauts had on my career. In particular, my coach/host. I could not have asked for a better host and felt genuinely sad that I wouldn’t be working alongside him anymore. 
He always gave me credit - giving me many kudos on Canvaworld - and made sure that everything was okay personally before getting into any work-related topics. He even organised a surprise final meeting with all the product Data Scientists where they gave me four cans of berry jam (berries are my favourite fruit), a basketball book (I love basketball) and some berry tea.</p> <p data-svelte-h="svelte-1t5zbi6">There is honestly too much that went on during this final week to describe in depth and so I may have missed some details.</p> <h3 id="final-remarks" data-svelte-h="svelte-kh8lo7"><a href="#final-remarks">Final remarks</a></h3> <p data-svelte-h="svelte-1e8v843">A few months have passed and the dust has settled on the internship. I’m back at university now - completing an industry placement - but can’t help myself from reminiscing on the wonderful 12-week experience I had over the summer. Although we all received our decision calls regarding return offers, and most interns didn’t receive one (including myself), I still feel as if I can walk away proud of everything I achieved and all the memories I created.</p> <hr> <h2 id="learnings" data-svelte-h="svelte-16w4la3"><a href="#learnings">Learnings</a></h2> <p data-svelte-h="svelte-2ze38v">Here are a few key learnings that I hope to apply at my upcoming graduate role.</p> <ol data-svelte-h="svelte-1sgzd8q"><li><p><strong>Quantifying impact is difficult as a Data Scientist</strong>, so be proactive and set up regular meetings with stakeholders to discuss the impact of your work. However, don’t make the meeting all about you and how you can quantify your impact. A better use of both parties’ time would be to use the meeting to discuss quantitative insights you’ve found and how they may be used (e.g. new feature ideas, modifications to existing features). Always follow up with estimated timelines of any new ideas proposed by the other party. 
For example, I was able to quantify the impact of my research report by hosting a meeting with a product manager, who discussed their new feature idea that my research supported.</p></li> <li><p><strong>Always ask ‘so what’?</strong> So what is the impact of the insight? So what is its importance? So what can we do about it? These questions are now what I will call the three ‘so whats’ - which, if kept in the forefront of your mind, will keep you on track. While I’m gaining business acumen, it may be difficult to see the big picture, or to tie an insight to some impactful action (what can we do about it?). To remedy this, ask around. Propose the idea to your manager, teammates, and product managers to pick their brains on the insight to see whether it’s truly relevant, and can not only be impactful, but deliver <strong>feasible business results</strong>.</p></li> <li><p><strong>Get to know EVERYONE in your team by setting up coffee catchups</strong>. This is definitely one area of improvement. At the conclusion of my internship, I felt comfortable asking for advice from the few teammates I regularly interacted with on the project, but I knew I had missed a major opportunity by not getting to know everyone in my team (at least) better from the start. <strong>First impressions are key</strong>. Being proactive in setting up coffee catchups to get to know others personally and showing curiosity in their work/roles will help me to feel more welcome (it’s really up to me to feel part of the team), and have more avenues for support. Seeing other interns (especially Viv) network has really inspired me to do the same.</p></li> <li><p><strong>Come prepared for important meetings with an agenda</strong>. This is one area I did well in. I had a continuous agenda doc that I would prepopulate with topics that I wanted to cover in all my one-on-ones with my manager. The agenda doc wasn’t complicated - simply a table with dot-points. 
I also found that creating and sharing an agenda for important meetings with stakeholders (e.g. product managers), with topics and ideas already there to discuss, really helped in facilitating the conversation. Preparing in this way for my meeting with the AI product manager, and asking my coach to review the agenda, helped me to demonstrate my professionalism and come out of the meeting with something valuable. Remember, <strong>people are busy, so make the most of the meeting by coming prepared</strong>.</p></li> <li><p><strong>Set expectations with your manager early</strong>. Inevitably, you will go through a formal performance review. To position yourself for success, ensure your first few meetings establish both parties’ expectations and interpretation of your growth profiles. Moreover, regular conversations with not just your manager, but other senior employees in the same field are so important for understanding how to perform above and beyond expectations. I would recommend setting aside at least a <strong>bi-weekly meeting</strong> to discuss your performance and career aspirations, and to establish a plan for achieving these goals. This might include “completing project X”.</p></li> <li><p><strong>Keep yourself informed about the internals of the company</strong>. Upon reflection, I regrettably feel as if I had not paid enough attention to keeping up to date with internal business developments/projects. Moving forward, I intend to give my undivided attention to town halls, read trending Confluence pages, and develop greater business awareness to ensure I can tie my work back to <a href="https://www.atlassian.com/agile/agile-at-scale/okr" rel="nofollow noopener noreferrer external" target="_blank">OKRs</a> (Objectives and Key Results) when completing my performance review.</p></li> <li><p><strong>Create a brag book</strong>. Towards the end of my internship, a teammate advised me to create a book/spreadsheet of all my accomplishments and their impact. 
While I hadn’t maintained one throughout my internship, I undoubtedly see immense value in doing so - especially when completing performance reviews. Moving forward, I intend to adopt an <a href="https://netwerkmovement.com/how-to-create-a-brag-book/" rel="nofollow noopener noreferrer external" target="_blank">online template</a> that documents all my professional achievements and tasks, no matter how little.</p></li> <li><p><strong>Develop your product sense</strong>; regularly use and actively learn about the products of your company. This is an area I excelled in, which paid dividends. Embedded in the Canva Docs team, I dedicated myself to learning about the product, leading to a strong product sense and understanding of how our users might engage with it. For instance, I discovered an AI feature was of the utmost importance for user engagement. Moving forward with this mindset of curiosity will be essential as I look to work as a product Data Scientist in the future.</p></li> <li><p><strong>Meaningful project names matter</strong>, especially as you must become your strongest proponent to have a successful career in Data Science. Pitch your work, sell its importance and increase your visibility. What’s the point in doing work if no-one uses it, sees its value, or remembers it? My coach always comes up with a catchy project name that people will remember and can associate with an idea. After tossing around some ideas with my team, I named my project ‘Sweet Suite Adoption’ - a nod to both the ‘sweet’ Canva suite and its goal: improving user adoption of those products.</p></li></ol>]]>
    </content>
    <category term="Work Experience" scheme="https://urara-demo.netlify.app/?tags=Work%20Experience" />
  </entry>
  <entry>
    <title type="html"><![CDATA[My Recap of 2023]]></title>
    <link href="https://urara-demo.netlify.app/year_review" />
    <id>https://urara-demo.netlify.app/year_review</id>
    <published>2023-11-22T00:00:00.000Z</published>
    <updated>2026-01-30T00:11:47.772Z</updated>
    <content type="html">
      <![CDATA[<p data-svelte-h="svelte-1evcjai">It’s that time of the year again where people and companies alike are given their yearly review, such as my personal favourite, Spotify Wrapped. I hope to start my own tradition too of recapping my year in a nutshell.</p> <hr> <h2 id="summer-holidays-nov-feb" data-svelte-h="svelte-qgkngq"><a href="#summer-holidays-nov-feb">Summer Holidays (Nov-Feb)</a></h2> <p data-svelte-h="svelte-rkflqx">After a gruelling first year of university and having just completed exams, I immediately flew back home to Perth in order to be with my family and close friends. Throughout this time back in Perth there were two major tasks I aimed to accomplish.</p> <h3 id="r" data-svelte-h="svelte-lsu196"><a href="#r">R</a></h3> <p data-svelte-h="svelte-1lro4cs">In line with my goal of learning a new language every holiday, these holidays saw me learn the basics of R with a focus on the all-encompassing tidyverse package. I was able to go through the <a href="https://r4ds.had.co.nz/" rel="nofollow noopener noreferrer external" target="_blank">R for Data Science</a> book and discovered a hidden gem of a YouTube channel in <a href="https://www.youtube.com/channel/UC0cF_3ZXpYErsASTc29ftPg" rel="nofollow noopener noreferrer external" target="_blank">Kelsey Gonzalez</a>, a lead data scientist at IBM who taught the tidyverse package at American University, which enabled me to quickly learn the language.</p> <h3 id="internship-hunt" data-svelte-h="svelte-1k9v0tx"><a href="#internship-hunt">Internship Hunt</a></h3> <p data-svelte-h="svelte-1nhiugu">It’s common knowledge that recruiters prefer penultimate year students because they have the ability to start work only a year following the internship. Having just finished my first year, I was aware of the struggles I would encounter whilst searching for an internship but perhaps was not prepared for the number of rejections I would receive. Interview after interview, rejection after rejection. 
It seemed like an endless cycle that was frankly causing me a lot of stress. However, after getting through an initial resume screening, phone interview, assessment centre and final video interview, I was fortunate to land an internship as a Data Analytics intern at Major Road Projects Victoria (MRPV) during these holidays. It goes without saying, I was ecstatic.</p> <hr> <h2 id="semester-1-feb-jun" data-svelte-h="svelte-10lznad"><a href="#semester-1-feb-jun">Semester 1 (Feb-Jun)</a></h2> <h3 id="major-road-projects-victoria" data-svelte-h="svelte-u3notk"><a href="#major-road-projects-victoria">Major Road Projects Victoria</a></h3> <p data-svelte-h="svelte-dvj1rj">My internship began in January and would run till July, which meant I would be working part-time (3 days a week) during semester 1 of my second year. Initially, this was quite the load to manage, but slowly I became more and more comfortable in my work and started thoroughly enjoying the routine I had established. My work at MRPV was simple: complete weekly/monthly tasks in Excel and find areas in our processes to automate using Python. Although the work was quite slow given it was a government organisation, the memories that I made and experiences I took away from my first ever job will forever stay with me.</p> <p data-svelte-h="svelte-z5s00f">Some of my personal favourite experiences included:</p> <ul data-svelte-h="svelte-3url70"><li>Going on a tour of the Essendon Fields construction site and having the ability to socialise for an extended period with a lot of the engineering interns. In fact, I still hang out with one of the interns I met during this site tour nearly 6 months after the end of my internship.</li> <li>A new advisor joining our team. 
Two months into my internship, our team added another advisor, which I found to be one of the best things that happened because he often worked the same days I was in office and we were able to talk about almost anything, from tech and career aspirations to history.</li></ul> <h3 id="cca-sponsorship-officer" data-svelte-h="svelte-12q4wmr"><a href="#cca-sponsorship-officer">CCA Sponsorship Officer</a></h3> <p data-svelte-h="svelte-viaafh">Throughout semester 1, I was one of six sponsorship officers in the Computing Commerce Association (CCA) helping to attract and maintain relationships with companies who provide services or products relating to Computing and/or Commerce. The workload in this role was more than manageable and I’d highly encourage anyone looking to dip their toes into university clubs to try out a sponsorship team in order to gauge the atmosphere of a club whilst having relatively little commitment in terms of club work throughout the semester (if that’s what you’re after).</p> <h3 id="units" data-svelte-h="svelte-htturb"><a href="#units">Units</a></h3> <p data-svelte-h="svelte-1lpiuam">I will attempt to provide a more comprehensive review of all FIT (Faculty of IT) units in a future blog post but for now I would like to leave my two cents on the units I took in semester 1 of 2023:</p> <ul data-svelte-h="svelte-nqo8hv"><li><a href="https://handbook.monash.edu/2024/units/FIT1047" rel="nofollow noopener noreferrer external" target="_blank"><strong>Intro to computer systems, networks and security (FIT1047)</strong></a>: This is a good introductory unit run by tutors that I find genuinely care about your learning. As a double degree student who took FIT1008 in the prior semester where we learnt MIPS, the content in FIT1047 was much simpler, with MARIE and Logisim used to understand the fundamentals of computer systems, and Wireshark for networking and cybersecurity. 
Overall, I found it not awfully challenging but I suspect first years taking this unit in their first semester might have had a different experience if they had no prior knowledge of computer architecture.</li> <li><a href="https://handbook.monash.edu/2024/units/ETC1010" rel="nofollow noopener noreferrer external" target="_blank"><strong>Intro to data analysis (ETC1010)</strong></a>: I found this unit enjoyable, especially given it is effectively a unit that teaches the basics of R (which I learnt over the holidays). I found some of the assignment and examination questions a bit ambiguous and littered with spelling mistakes but overall, the unit was enjoyable and quite easy.</li> <li><a href="https://handbook.monash.edu/2024/units/FIT2004" rel="nofollow noopener noreferrer external" target="_blank"><strong>Algorithms and data structures (FIT2004)</strong></a>: The infamous FIT2004… An essential unit needed by any good programmer which really stretches your thinking cap and expects you to fail A LOT. Proficiency in Python is expected and no effort will be made to teach you any Python. I was personally very challenged by this unit as it introduces a new bucket of problems to solve and definitely is a problem-solving class. In saying this, I found it really rewarding and was still able to barely manage a HD by dedicating countless hours every week to solving <strong>all</strong> the tutorial problems, asking a plethora of questions and starting the assignments as soon as they were released.</li></ul> <hr> <h2 id="winter-holidays-jun-jul" data-svelte-h="svelte-1qitlnu"><a href="#winter-holidays-jun-jul">Winter Holidays (Jun-Jul)</a></h2> <h3 id="nyk-failure-project" data-svelte-h="svelte-uec7sh"><a href="#nyk-failure-project">NYK Failure Project</a></h3> <p data-svelte-h="svelte-1c24sv3">As an avid NBA and New York Knicks fan, I’ve been pondering my life choices over the past 8 years watching the Knicks play. 
Following our relative success over the past 2 seasons, I was inspired to take a dive into what has made us such a horrible NBA team over the past decade, which can be found through my report <a href="https://github.com/danielliu2707/NYK_Failures/blob/main/Failures%20of%20the%20New%20York%20Knicks.pdf" rel="nofollow noopener noreferrer external" target="_blank">here</a>.</p> <p data-svelte-h="svelte-1ll8k4y">In order to build this report, I spent close to a month obtaining NBA data through web scraping/APIs, wrangling the data using R, uncovering patterns in the data and building meaningful visualisations using the beautiful ggplot2. I often failed and restarted, either unable to find anything worthwhile to talk about or struggling to tidy the data into a format I could use. I restarted about 3 times and had to spend countless hours scouring the internet for suitable datasets, which I eventually found through <a href="https://www.google.com/search?q=Sportrac&oq=Sportrac&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIKCAEQABixAxiABDIKCAIQABixAxiABDINCAMQABiDARixAxiABDIHCAQQABiABDINCAUQLhivARjHARiABDIHCAYQABiABDIQCAcQABiDARixAxiABBiKBTIHCAgQABiABDIHCAkQABiABNIBCDMwOTJqMGo3qAIAsAIA&sourceid=chrome&ie=UTF-8" rel="nofollow noopener noreferrer external" target="_blank">Sportrac</a> and the <a href="https://github.com/swar/nba_api" rel="nofollow noopener noreferrer external" target="_blank">NBA API</a>.</p> <p data-svelte-h="svelte-ekkr7v">In the end, I was very happy with the report that I had built as my first beginner project with hopefully bigger things ahead in the future.</p> <h3 id="cca-retreat" data-svelte-h="svelte-14izkb4"><a href="#cca-retreat">CCA Retreat</a></h3> <p data-svelte-h="svelte-88t9tm">The CCA committee took on Phillip Island as roughly 20 of us looked to enjoy a weekend of action-packed fun. We started by meeting up at Monash University before heading off to Pings Dumpling (highly recommend) for lunch. 
Then we departed for our accommodation in Phillip Island where most of us stayed up and played card games, walked along the beach where we saw actual shooting stars and talked all night. The next day was even more action-packed as some of us went bouldering, some baked cakes and some enjoyed cooking dinner which, fortunately for everyone else, I didn’t participate in. That night I got about an hour of sleep as we talked about life before I had to drive home the next morning. For many of us, this was the first time we were able to talk for an extended period with many of our fellow committee members, which really built a stronger and more tight-knit committee.</p> <hr> <h2 id="semester-2-jul-nov" data-svelte-h="svelte-1udpqv4"><a href="#semester-2-jul-nov">Semester 2 (Jul-Nov)</a></h2> <h3 id="internship-hunt-v2" data-svelte-h="svelte-13wxgsj"><a href="#internship-hunt-v2">Internship Hunt v2</a></h3> <p data-svelte-h="svelte-xve39l">As my internship at MRPV came to a close and a majority of companies began opening up applications for their summer internship programs, once again I began seeking an opportunity to taste more real-world experience. This time I was eligible for a lot more internships and applied to roughly 40 places, from which I was fortunate to land an offer from Newcrest Mining as a Data Science intern. It felt surreal as this company and position were at the top of my list of those roughly 40 companies I applied for.</p> <p data-svelte-h="svelte-1mqx9ln">Throughout this process, the many rejections taught me a great deal. Firstly, I definitely need to work on my technical interviews through practising relevant technical and non-technical interview questions with a particular focus on data science questions. Secondly, a particular weakness of mine is psychometric tests, which I will certainly need to work on through UCAT resources. 
Finally, I will need to add to my resume throughout next year, as a majority of the places rejected me during the initial resume screening.</p> <h3 id="cca-treasurer" data-svelte-h="svelte-sfpcqe"><a href="#cca-treasurer">CCA Treasurer</a></h3> <p data-svelte-h="svelte-q5e7hd">In September of this year I ran for the position of Treasurer in CCA. This was a position that I had long desired because of my interest in managing finances and because a ‘top 4’ position would afford me the opportunity to leave a significant impact on one of the largest university clubs. So far the executive team for 2023/2024 has been amazing, with everyone genuinely seeming passionate about bringing new, innovative ideas to further develop our club.</p> <h3 id="units-1" data-svelte-h="svelte-xsemkj"><a href="#units-1">Units</a></h3> <p data-svelte-h="svelte-1bubabg">In semester 2 I went back to a normal load with what I considered to be a moderately difficult semester in terms of the units I took. Let’s have a look at what I thought of each unit I took this semester.</p> <ul data-svelte-h="svelte-wefwzo"><li><a href="https://handbook.monash.edu/2024/units/FIT2094" rel="nofollow noopener noreferrer external" target="_blank"><strong>Databases (FIT2094)</strong></a>: Personally I loved this unit, especially in the second half when we were finished with database design and moved on to database queries using SQL. I found it truly enjoyable to work on the assignments, especially considering the teaching staff were all so friendly and quick to help. A major bonus of this unit is that it’s highly practical with no exam!</li> <li><a href="https://handbook.monash.edu/2024/units/FIT1043" rel="nofollow noopener noreferrer external" target="_blank"><strong>Introduction to data science (FIT1043)</strong></a>: FIT1043 is widely recognised as an ‘easy’ unit with my tutor even complaining about how easy this unit is. 
It should come as no surprise that I found this unit enjoyable, as it not only covered the basics of machine learning including regression, classification and clustering but also served as a much-needed, less time-consuming unit.</li> <li><a href="https://handbook.monash.edu/2024/units/ETC2420" rel="nofollow noopener noreferrer external" target="_blank"><strong>Statistical Thinking (ETC2420)</strong></a>: This unit is essential in that it unlocks machine learning units that have it as a prerequisite, which made me take it as soon as possible. The content itself was difficult to comprehend at times, especially Bayesian analysis, but I found it highly rewarding when I eventually grasped the concepts through lots of, well … thinking. The name of the unit really resonates when you take it, as you need to really ‘think’ about how to interpret the data given the statistical concepts taught, including permutation tests, bootstrapping, Bayesian inference, maximum likelihood estimation and more!</li> <li><a href="https://handbook.monash.edu/2024/units/ETC2410" rel="nofollow noopener noreferrer external" target="_blank"><strong>Introductory econometrics (ETC2410)</strong></a>: I found this unit challenging not because of the concepts but rather the sheer quantity of content that was covered. I enjoyed learning hypothesis testing, quadratic regression, autocorrelation, heteroskedasticity and time series analysis, but the amount of notes I took was absurd. In the end the exam was much easier than anticipated and I felt the teaching staff did a good job of preparing us for the final exam. 
As a core unit for both Econometrics and Finance majors, I can definitely see the necessity in taking this unit.</li></ul> <hr> <h2 id="concluding-remarks" data-svelte-h="svelte-syaiyn"><a href="#concluding-remarks">Concluding Remarks</a></h2> <p data-svelte-h="svelte-1ntmar9">There is a lot more to my year than what was mentioned, such as my experiences volunteering at Monash Open Day or the Melbourne United Basketball Club, but I thought that I’d bring to light some of my most memorable moments, achievements and failures (the biggest of which is not having a girlfriend yet) throughout the year.</p>]]>
    </content>
    <category term="Year Review" scheme="https://urara-demo.netlify.app/?tags=Year%20Review" />
    <category term="Monash" scheme="https://urara-demo.netlify.app/?tags=Monash" />
  </entry>
</feed>