A browser-based tool for fuzzy matching between datasets. This application helps you find and score potential matches between two datasets based on configurable similarity thresholds.
Visit fuzzymatcher.pages.dev to use the application immediately.
Features
- Browser-Only Processing: All data processing happens client-side - no server required
- Multiple File Formats: Import data from CSV and Excel files
- Customizable Matching:
- Configure field weights for matching priority
- Set thresholds for good, moderate, and poor matches
- Preprocess data for better matching (lowercase, remove special characters, etc.)
- Powerful Results:
- Color-coded match results based on quality
- Export results to CSV for further analysis
- Statistics and visualizations for match quality distribution
- Responsive Design: Works on desktop and mobile devices
Use Cases
- Data Cleansing: Find duplicate records across datasets
- Entity Resolution: Link records from different sources that refer to the same entity
- Customer Matching: Match customers across different systems or databases
- Address Verification: Compare addresses to standardized data
- Record Linkage: Connect datasets without common unique identifiers
How to Use
1. Prepare Your Data
For best results:
- Ensure both datasets have the same field structure
- Clean data beforehand if possible (standardize formats, fix obvious errors)
- Include unique identifiers in each dataset
2. Load Your Data
- Go to the "Items To Match" tab and upload your primary dataset
- Go to the "Source To Match Against" tab and upload your comparison dataset
3. Configure Field Weights
In the "Items To Match" tab:
- Set weights for each field (defaults to 1)
- Use higher weights (e.g., 2, 3) for more reliable fields
- Use "0" to exclude a field from match calculation
- Use "M" to designate a "must-match" field
4. Adjust Matching Parameters
In the "Control Settings" tab:
- Number of matches to find: Maximum matches to return per record
- Minimum Acceptable Match Quality: Threshold below which matches are ignored
- Good Quality Match: Threshold for a good match
- Preprocessing Options: Select data cleanup options to apply
5. Run the Matching Process
Click "Run Matching" to start the process. The progress bar will show status.
6. Review Results
- Results Tab: View all matches with color coding by quality
- Analysis Tab: See statistics and visualizations
- Click "Export to CSV" to save results for further analysis
Technical Details
The application uses the following technologies:
- Bootstrap 5: For responsive UI elements
- PapaParse: For CSV parsing
- SheetJS: For Excel file processing
- Chart.js: For data visualization
- Browser-native string similarity algorithms: For fuzzy matching
All processing happens in the browser, with no data sent to any server.
Limitations
- Performance: Large datasets (>10,000 records) may cause performance issues
- Memory Usage: Browser memory limits apply
- Algorithm: The simplified string matching algorithm is less sophisticated than specialized libraries
- Browser Support: Modern browsers required (Chrome, Firefox, Edge, Safari)
Future Improvements
- Advanced matching algorithms
- Phonetic matching for names
- Export to Excel
- Save/restore settings
- Manual match review workflow
- Batch processing for large datasets