use search_hub::tagging::{default_tags, TaggingEngine};

struct Sample {
    name: &'static str,
    text: &'static str,
}

const SAMPLES: &[Sample] = &[
    Sample {
        name: "Rust backend API",
        text: r#"
Building a Modern Web API with Rust

In this tutorial we will build a RESTful API using the Actix-web framework
in Rust. The API will expose CRUD endpoints for managing a book collection
stored in a PostgreSQL database. We will use SQLx for async database access
with connection pooling, and serde for JSON serialization and deserialization
of our request and response types.

To get started, create a new cargo project and add the required dependencies
to your Cargo.toml: actix-web, serde, serde_json, sqlx, tokio, and uuid.
Enable the runtime-tokio feature for sqlx so we can use async database
operations throughout the application.

First define our data model with a Book struct containing id, title, author,
isbn, and published_year fields. Derive Serialize and Deserialize from serde
so actix-web can automatically convert between JSON and our Rust types.
Then create database migration files that define the books table schema with
appropriate indexes on the isbn and author columns.

Next implement the HTTP handlers. The list handler queries all books and
returns them as a JSON array. The create handler validates the incoming JSON
body, inserts a new row, and returns the created book with a 201 status code.
The get, update, and delete handlers follow the same pattern using the book
id extracted from the URL path parameters.

For error handling we define an ApiError enum that maps to appropriate HTTP
status codes. Use actix-web's ResponseError trait to automatically convert
our error types into JSON error responses. This keeps the handler code clean
and focused on business logic rather than HTTP plumbing.

Add middleware for logging, CORS support, and request validation. Configure
the server to bind on 0.0.0.0:8080 with 4 worker threads. Finally write
integration tests using actix_web::test to verify each endpoint works
correctly with both valid and invalid inputs.

Deploy the application using Docker with a multi-stage build for minimal
image size. Use docker-compose to run the API server alongside a PostgreSQL
container, with environment variables for configuration. Add health check
endpoints and structured logging for production monitoring.
"#,
    },
    Sample {
        name: "Python data science",
        text: r#"
Exploratory Data Analysis with Python and Pandas

Data analysis begins with loading your dataset into a pandas DataFrame.
Use the read_csv function to import CSV files and inspect the first few
rows with the head method. Check data types with dtypes and get summary
statistics using the describe method on numerical columns.

Data cleaning is a critical step before any modeling. Handle missing values
by either dropping rows with dropna or filling them with fillna using the
mean or median of the column. Remove duplicate rows with drop_duplicates
and convert data types as needed using the astype method.

For data visualization, matplotlib and seaborn are the standard libraries
in the Python ecosystem. Create scatter plots with plt.scatter to explore
relationships between numeric variables, histograms with plt.hist to
understand distributions, and box plots with seaborn.boxplot to detect
outliers in your data. Customize your plots with titles, axis labels,
and color palettes for publication-quality figures.

Feature engineering transforms raw data into inputs suitable for machine
learning models. Create new columns from existing ones, encode categorical
variables using one-hot encoding with pandas.get_dummies, and scale numeric
features using scikit-learn's StandardScaler. Split your data into training
and test sets with train_test_split to evaluate model performance.

Build a regression model using scikit-learn's LinearRegression or a
classification model using RandomForestClassifier. Fit the model on the
training data with the fit method, make predictions with predict, and
evaluate accuracy using metrics like mean_squared_error for regression
or accuracy_score for classification tasks.

Use Jupyter notebooks for interactive development with inline plotting
and markdown annotations. Document your analysis steps clearly so others
can reproduce your results. Save your cleaned datasets with to_csv for
future use and export your models with joblib or pickle for deployment.
"#,
    },
    Sample {
        name: "Frontend web design",
        text: r#"
Responsive Web Design with Modern CSS

Building a responsive website starts with a solid CSS foundation using
Flexbox and CSS Grid for layout. Define a container with display: flex
to create horizontal or vertical layouts that adapt to screen size. Use
justify-content and align-items to position elements within the flex
container, and the flex-wrap property to allow items to flow onto
multiple lines on smaller screens.

CSS Grid provides two-dimensional layout control with grid-template-columns
and grid-template-rows. Define named grid areas with grid-template-areas
and place items using the grid-area property. This makes it easy to create
complex page layouts that reflow naturally from desktop to tablet to mobile.

Typography is the foundation of good design. Set a harmonious type scale
using clamp for fluid typography that scales between minimum and maximum
values. Use custom properties (CSS variables) to maintain consistency
across your design system. Define --color-primary, --font-heading, and
--spacing-unit variables that can be changed globally.

Accessibility is not optional. Use semantic HTML elements like header,
nav, main, section, and footer. Add aria labels to interactive elements
and ensure color contrast ratios meet WCAG AA standards. Test your site
with keyboard navigation and screen readers to verify all functionality
is accessible to users with disabilities.

Animations enhance user experience when used thoughtfully. Use CSS
transitions for hover effects on buttons and links, and keyframe
animations for loading states and page transitions. The prefers-reduced-
motion media query respects users who prefer less animation.

Mobile-first design means starting with the smallest screen and adding
complexity with min-width media queries. This approach ensures your site
works well on all devices and loads efficiently on mobile connections.
Test regularly using browser dev tools in responsive design mode.
"#,
    },
    Sample {
        name: "Linux devops",
        text: r#"
Linux Server Administration and Automation

Managing Linux servers efficiently requires mastery of the command line
and automation tools. Start with the basics of process management using
ps, top, and htop to monitor running processes. Use kill and killall
to terminate unresponsive processes and systemctl to manage systemd
services. Check resource usage with free for memory, df for disk space,
and netstat or ss for network connections.

Shell scripting is essential for automation. Write bash scripts using
variables, loops, conditionals, and functions. Use find with exec to
batch-process files, grep with regex for pattern matching in logs, and
awk or sed for text processing. Schedule recurring tasks with cron
and systemd timers for more complex scheduling needs.

Containerization with Docker simplifies application deployment. Write
Dockerfiles that specify the base image, install dependencies, copy
application code, and define the startup command. Use docker-compose
to orchestrate multi-container applications with linked services,
networks, and persistent volumes. Tag and push images to a registry
for deployment across environments.

Kubernetes orchestrates containers at scale. Define deployments with
replica counts, services for networking, and configmaps for environment
configuration. Use kubectl to inspect pods, view logs, and scale
applications horizontally. Implement health checks with liveness and
readiness probes to ensure your applications are running correctly.

Configuration management with Ansible keeps your infrastructure
consistent. Write playbooks in YAML that define the desired state of
your servers. Use roles to organize tasks, handlers, and variables
into reusable components. Run ad-hoc commands with ansible to quickly
check server status across your entire infrastructure.

Monitor your infrastructure with Prometheus for metrics collection and
Grafana for dashboards. Set up alerts for critical conditions like high
CPU usage, disk space running low, or services going offline. Centralize
logs using the ELK stack or Loki for troubleshooting and analysis.
"#,
    },
    Sample {
        name: "AI machine learning",
        text: r#"
Training Deep Learning Models with PyTorch

Deep learning has transformed how we approach complex pattern recognition
tasks. PyTorch provides a flexible framework for building and training
neural networks using tensor computations with automatic differentiation.
Define a model by subclassing nn.Module and implementing the forward
method that specifies how input data flows through the network layers.

Data preparation is crucial for model performance. Use the DataLoader
class to efficiently batch and shuffle your dataset during training.
Apply data augmentation techniques like random cropping, flipping, and
color jitter to reduce overfitting and improve generalization. Normalize
input tensors to have zero mean and unit variance for stable training.

The training loop iterates over epochs, processing batches of data
through the model, computing the loss with a criterion like cross-entropy
for classification or mean squared error for regression, and calling
backward to compute gradients. Use an optimizer like Adam or SGD with
learning rate scheduling to minimize the loss function over time.

Convolutional neural networks excel at image recognition tasks. Stack
Conv2d layers with increasing channel depth, interleaved with ReLU
activations and max-pooling layers to reduce spatial dimensions.
Add batch normalization to stabilize training and dropout layers to
prevent overfitting. End with fully connected layers for classification.

Transformer architectures dominate natural language processing. The
self-attention mechanism allows the model to weigh the importance of
different positions in the input sequence. Multi-head attention runs
multiple attention operations in parallel, capturing different types
of relationships between tokens. Positional encodings provide sequence
order information to the model.

Transfer learning leverages pretrained models for new tasks. Load a
model pretrained on ImageNet, freeze the early layers, and replace the
final classification head with new layers for your specific dataset.
Fine-tune the model with a lower learning rate to adapt the pretrained
features to your domain while preserving the general visual knowledge.
"#,
    },
    Sample {
        name: "Mobile development",
        text: r#"
Building Cross-Platform Mobile Apps with Flutter

Flutter enables building native-quality mobile applications for both
iOS and Android from a single Dart codebase. The framework uses a
widget-based architecture where everything from a simple text label
to complex layouts is a widget. Compose widgets together using
child and children properties to build your user interface hierarchy.

State management is a key concern in mobile app development. Use
setState for simple local state, or adopt Provider, Riverpod, or
Bloc for more complex application state that needs to be shared
across multiple screens. Keep your business logic separate from
your UI code by using ViewModels or Controllers that manage state
and expose it to widgets via streams or change notifiers.

Navigation and routing handle moving between screens in your app.
Use the Navigator widget with named routes for simple apps, or
implement a router with GoRouter for more complex navigation
patterns including deep linking and nested navigation. Pass data
between screens using constructor arguments or route parameters.

Platform-specific features require accessing native APIs through
platform channels. Implement features like camera access, location
services, biometric authentication, and push notifications by
writing platform-specific code in Kotlin or Swift and invoking it
from Dart through MethodChannel calls. Use community packages from
pub.dev for common native features.

Performance optimization is critical for a smooth user experience.
Profile your app using the Flutter DevTools to identify widget
rebuilds and jank. Use const constructors where possible to reduce
rebuilds, implement lazy loading for lists with ListView.builder,
and cache images using cached_network_image. Reduce app size by
removing unused resources and using code shrinking.

Testing mobile apps requires multiple approaches. Write unit tests
for your business logic and data models. Use widget tests to verify
individual widget behavior and integration tests for full user flows.
Run tests on both iOS and Android simulators to catch platform-
specific issues before releasing to app stores.
"#,
    },
    Sample {
        name: "Gaming graphics",
        text: r#"
Real-Time 3D Rendering with Vulkan and GLSL

Modern game engines leverage GPU compute capabilities to render
complex 3D scenes at interactive frame rates. The Vulkan API
provides low-level access to graphics hardware with explicit
control over memory management and command buffers. Set up a
Vulkan instance, select a physical device, create a logical
device, and configure graphics and present queues for rendering.

The rendering pipeline transforms 3D geometry into 2D images.
Vertex shaders process individual vertices, applying model-view-
projection matrix transformations to place objects in clip space.
Fragment shaders determine the color of each pixel using lighting
calculations, texture sampling, and material properties defined
in GLSL shading language source code.

A game loop runs at 60 frames per second, processing input events,
updating game state, and rendering each frame. Use delta time to
ensure consistent movement speeds regardless of frame rate. Implement
fixed time step for physics simulations to maintain stability.
Separate the update and render phases for better parallelism.

Physics simulation brings game worlds to life. Use a physics engine
like PhysX or Bullet for rigid body dynamics including collision
detection, joint constraints, and force-based movement. Implement
broad phase and narrow phase collision detection to efficiently
find colliding pairs among thousands of objects in the scene.

Spatial data structures accelerate rendering by culling objects
outside the camera view frustum. Use bounding volume hierarchies,
octrees, or binary space partitioning trees to organize scene
geometry. Implement occlusion culling to skip rendering objects
hidden behind other geometry, saving GPU processing time.

Post-processing effects enhance visual quality after the main
render pass. Apply bloom for glowing highlights, ambient occlusion
for realistic shadowing in corners, and tone mapping to convert
HDR values to displayable colors. Use compute shaders for GPU-
based particle systems and screen-space reflections.
"#,
    },
    Sample {
        name: "Audio production",
        text: r#"
Digital Audio Production and Music Streaming

Digital audio workstations have revolutionized music production by
providing powerful tools for recording, editing, and mixing audio.
Record multiple tracks simultaneously through audio interfaces with
low-latency monitoring. Edit waveform regions with cut, copy, paste,
and crossfade operations to arrange your recordings into a coherent
composition.

Audio effects processing shapes the character of your sound. Use
equalizers to boost or cut specific frequency ranges, compressors
to control dynamic range by reducing loud peaks, and reverbs to
simulate acoustic spaces from small rooms to large concert halls.
Delay and chorus effects add depth and width to your mixes.

Podcast production follows a different workflow focused on spoken
word clarity. Record with quality microphones in acoustically treated
spaces to minimize background noise and room reflections. Use noise
gates to silence pauses between speech, de-essers to reduce sibilance,
and compressors to smooth out volume variations across the episode.

Music streaming platforms deliver audio content to millions of
listeners worldwide. Encode audio files using codecs like AAC or
Opus that balance sound quality with bandwidth efficiency. Generate
album artwork, metadata tags, and playlist descriptions to help
listeners discover your content through search and recommendations.

Radio broadcasting combines live and pre-recorded content with
scheduling automation. Use playout software to manage playlists,
cues, and commercial breaks. Broadcast audio over internet radio
using Icecast or Shoutcast servers with streaming protocols like
HLS for adaptive bitrate delivery to listeners on various devices.

Live sound reinforcement requires understanding of acoustics and
signal flow. Set up a mixing console with auxiliary sends for
stage monitors and effects returns. Use graphic equalizers to tune
the room response and feedback suppressors to prevent howling.
Balance the front-of-house mix so every instrument and voice is
clear and present in the audience area.
"#,
    },
    Sample {
        name: "Social community",
        text: r#"
Building Online Communities and Social Platforms

Social media platforms connect people around shared interests and
experiences. Designing a social platform requires careful consideration
of user profiles, content feeds, and interaction mechanisms. Users
create profiles with biographical information, profile pictures, and
privacy settings that control who can see their content and activity.

Content feeds are the central feature of any social platform. Implement
algorithms that surface relevant posts based on recency, engagement
metrics, and user preferences. Support multiple content types including
text posts, image sharing, video uploads, and link previews with rich
metadata fetched from shared URLs.

Real-time messaging enables direct communication between users. Build
chat systems using WebSocket connections for instant message delivery
with typing indicators, read receipts, and push notifications. Organize
conversations into private direct messages and group chats with support
for media attachments, emoji reactions, and message threading.

Community management tools help moderators maintain healthy discussions.
Provide reporting mechanisms for inappropriate content, automated spam
detection using machine learning classifiers, and moderation queues
where flagged content is reviewed before being shown to the broader
community. Implement warning systems and temporary or permanent bans.

Social features like likes, shares, comments, and follows create
engagement loops that keep users returning to the platform. Notify
users when someone interacts with their content through in-app
notifications and email digests. Show trending topics and popular
content in discovery sections to help users find new communities.

Content moderation at scale requires both automated and human review.
Train natural language models to detect hate speech, harassment, and
misinformation. Establish clear community guidelines that define
acceptable behavior and content standards. Provide appeals processes
so users can challenge moderation decisions they disagree with.
"#,
    },
];

#[test]
fn explore_tagging_thresholds() {
    let tags = default_tags();
    let mut engine = TaggingEngine::new(&tags, 0.40).expect("failed to init tagging engine");

    let thresholds = [
        0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90,
    ];

    for sample in SAMPLES {
        println!();
        println!("=== {} ===", sample.name);
        println!("Text length: {} chars", sample.text.len());
        println!();

        for &threshold in &thresholds {
            let matched = engine
                .tags_for_with_threshold(sample.text, 5, threshold)
                .expect("tagging failed");

            if matched.is_empty() {
                println!("  {:.2}: (none)", threshold);
            } else {
                let tags_repr: Vec<String> = matched
                    .iter()
                    .map(|(tag, score)| format!("{} ({:.3})", tag, score))
                    .collect();
                println!("  {:.2}: {}", threshold, tags_repr.join(", "));
            }
        }
    }
}