Last active
November 9, 2025 10:38
-
-
Save hernad/54523c008b5f43ee4e29b310fccab1ec to your computer and use it in GitHub Desktop.
claude session to create bbi bank statements for odoo imort py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| ╭─── Claude Code v2.0.35 ─────────────────────────────────────────────────────────────────────────╮ | |
| │ │ Tips for getting started │ | |
| │ Welcome back bring.out! │ Run /init to create a CLAUDE.md file with instructions for Claude │ | |
| │ │ ───────────────────────────────────────────────────────────────── │ | |
| │ ▐▛███▜▌ │ Recent activity │ | |
| │ ▝▜█████▛▘ │ No recent activity │ | |
| │ ▘▘ ▝▝ │ │ | |
| │ │ │ | |
| │ Sonnet 4.5 · Claude Max │ │ | |
| │ /home/hernad/Documents │ │ | |
| ╰─────────────────────────────────────────────────────────────────────────────────────────────────╯ | |
| ============================================ | |
| # prompt 1 | |
| ============================================ | |
| > create script bbi_remove_newline.py from txt file like this @bbi_0710.txt which will 1) | |
| at every line with format DD.MM.YY[TAB]Bank_13_digits[NEWLINE] change to newline to | |
| [TAB]. 2) in second line remove name part until digits | |
| example second line: | |
| BBI DD Sarajevo 1251253102011099000001 Provizija banke za realizaciju naloga sa | |
| brojem 5250253099792866 ASA BANKA D.D. SARAJEVO 1,50 0,00 | |
| change to | |
| 1251253102011099000001 Provizija banke za realizaciju naloga sa brojem | |
| 5250253099792866 ASA BANKA D.D. SARAJEVO 1,50 0,00 | |
| 3) preserve [TAB] chars | |
| Example for first for lines: | |
| 06.11.25 1340100000151052 | |
| BBI DD Sarajevo 1251253102011099000001 Provizija banke za realizaciju naloga sa | |
| brojem 5250253099792866 ASA BANKA D.D. SARAJEVO 1,50 0,00 | |
| 06.11.25 1340100000151052 | |
| TELEMACH DOO SARAJEVO 5250253099792866 UGOVOR 650531633, MJESEC UNAPRIJED ASA | |
| BANKA D.D. SARAJEVO 125,80 0,00 | |
| change to two lines: | |
| 06.11.25 1340100000151052[TAB]1251253102011099000001 Provizija banke za realizaciju | |
| naloga sa brojem 5250253099792866 ASA BANKA D.D. SARAJEVO 1,50 0,00 | |
| 06.11.25 1340100000151052[TAB]5250253099792866 UGOVOR 650531633, MJESEC UNAPRIJED | |
| ASA BANKA D.D. SARAJEVO 125,80 0,00 | |
| ⎿ Read bbi_0710.txt (151 lines) | |
| ● I'll create a Python script to process the bank statement file according to your | |
| requirements. | |
| ● Write(bbi_remove_newline.py) | |
| ⎿ Wrote 93 lines to bbi_remove_newline.py | |
| #!/usr/bin/env python3 | |
| """ | |
| Process BBI bank statement text file: | |
| 1. Merge pairs of lines (date line + detail line) into single lines | |
| 2. Remove bank name prefix from detail lines, keeping only the account number and | |
| following fields | |
| 3. Preserve TAB characters throughout | |
| """ | |
| import re | |
| import sys | |
| … +83 lines (ctrl+o to expand) | |
| ● I've created the bbi_remove_newline.py script that processes your BBI bank statement | |
| file. Here's what it does: | |
| 1. Merges date and detail lines: Lines starting with DD.MM.YY[TAB]13-digit-account are | |
| merged with the following detail line | |
| 2. Removes bank name prefix: From detail lines like "BBI DD Sarajevo[TAB]1251253...", | |
| it extracts only "1251253..." and everything after | |
| 3. Preserves TAB characters: All TAB separators are maintained throughout | |
| Usage: | |
| # Process with automatic output filename | |
| python bbi_remove_newline.py bbi_0710.txt | |
| # Or specify output filename | |
| python bbi_remove_newline.py bbi_0710.txt bbi_0710_processed.txt | |
| The script will transform lines like: | |
| 06.11.25 1340100000151052 | |
| BBI DD Sarajevo 1251253102011099000001 Provizija banke... ASA BANKA... 1,50 | |
| 0,00 | |
| Into: | |
| 06.11.25 1340100000151052 1251253102011099000001 Provizija banke... ASA | |
| BANKA... 1,50 0,00 | |
| Would you like me to run the script on your bbi_0710.txt file to test it? | |
| > yes | |
| ● Bash(python bbi_remove_newline.py bbi_0710.txt) | |
| ⎿ Processed 150 lines into 76 lines | |
| Output written to: bbi_0710_processed.txt | |
| ● Great! The script successfully processed the file, reducing 150 lines down to 76 merged | |
| lines. Let me show you a sample of the output to verify it's correct: | |
| ● Bash(head -10 bbi_0710_processed.txt) | |
| ⎿ 06.11.25 1340100000151052 1251253102011099000001 Provizija banke za realizaciju n | |
| ga sa brojem 5250253099792866 ASA BANKA D.D. SARAJEVO 1,50 0,00 | |
| 06.11.25 1340100000151052 5250253099792866 UGOVOR 650531633, MJESEC UNAPRIJED ASA BA | |
| … +17 lines (ctrl+o to expand) | |
| ● Perfect! The script worked correctly. The output shows: | |
| ✅ Date lines merged with detail lines - Each line now contains: date, account number, | |
| transaction details | |
| ✅ Bank name prefix removed - "BBI DD Sarajevo", "RAMA GLAS DOO SARAJEVO", etc. have | |
| been removed, keeping only the account numbers and transaction data | |
| ✅ TAB characters preserved - All fields remain TAB-separated | |
| The processed file bbi_0710_processed.txt is ready with 76 transaction lines (down from | |
| 150 lines in the original file). | |
| ============================================ | |
| # prompt 2 | |
| ============================================ | |
| > one line is not processed, it maybe we need some exception from standard rule: | |
| 31.10.25 | |
| BBI DD Sarajevo 1249253040684204000001 Naplata mjesečne naknade za vođenje paketa | |
| 16,99 0,00 | |
| ● I see the issue! This line has a special case - there's no account number in the first | |
| column (it's empty), and the detail line has extra spaces/tabs where the "bank name" | |
| field would normally be. | |
| Let me update the script to handle this edge case: | |
| ● Read(bbi_remove_newline.py) | |
| ⎿ Read 93 lines | |
| ● Update(bbi_remove_newline.py) | |
| ⎿ Updated bbi_remove_newline.py with 8 additions and 4 removals | |
| 27 while i < len(lines): | |
| 28 line = lines[i].rstrip('\n') | |
| 29 | |
| 30 - # Check if this is a date line (DD.MM.YY followed by TAB and | |
| - 13+ digits) | |
| 31 - date_match = re.match(r'^(\d{2}\.\d{2}\.\d{2})\t(\d{13,})$', | |
| - line) | |
| 30 + # Check if this is a date line (DD.MM.YY followed by TAB and | |
| + optional 13+ digits) | |
| 31 + # Handle both: "DD.MM.YY\t1234567890123" and "DD.MM.YY\t" | |
| 32 + date_match = re.match(r'^(\d{2}\.\d{2}\.\d{2})\t(\d{13,})?$', | |
| + line) | |
| 33 | |
| 34 if date_match and i + 1 < len(lines): | |
| 35 date_part = date_match.group(1) | |
| 36 - account_part = date_match.group(2) | |
| 36 + account_part = date_match.group(2) | |
| + if date_match.group(2) else "" | |
| 37 | |
| 38 # Get the next line (detail line) | |
| 39 next_line = lines[i + 1].rstrip('\n') | |
| ... | |
| 45 if detail_match: | |
| 46 # Successfully matched detail line with account number | |
| 47 detail_part = detail_match.group(1) | |
| 48 - merged_line = | |
| - f"{date_part}\t{account_part}\t{detail_part}" | |
| 48 + if account_part: | |
| 49 + merged_line = | |
| + f"{date_part}\t{account_part}\t{detail_part}" | |
| 50 + else: | |
| 51 + merged_line = f"{date_part}\t{detail_part}" | |
| 52 processed_lines.append(merged_line) | |
| 53 i += 2 # Skip both lines | |
| 54 else: | |
| ● Now let's test the updated script: | |
| ● Bash(python bbi_remove_newline.py bbi_0710.txt) | |
| ⎿ Processed 150 lines into 75 lines | |
| Output written to: bbi_0710_processed.txt | |
| ● Good! Now it's 75 lines instead of 76. Let's check the problematic line: | |
| ● Bash(grep -n "Naplata mjesečne" bbi_0710_processed.txt) | |
| ⎿ 33:31.10.25 1249253040684204000001 Naplata mjesečne naknade za vođenje paketa 9 | |
| 9 0,00 | |
| ● Perfect! The exception case is now handled correctly. The line: | |
| 31.10.25 | |
| BBI DD Sarajevo 1249253040684204000001 Naplata mjesečne naknade za vođenje paketa | |
| 16,99 0,00 | |
| Has been properly merged into: | |
| 31.10.25 1249253040684204000001 Naplata mjesečne naknade za vođenje paketa 16,99 0,00 | |
| The script now handles both cases: | |
| - Normal case: Date + account in first line → merges with detail line | |
| - Exception case: Date only (empty account) in first line → still merges correctly with | |
| detail line | |
| The updated script is ready and the output file bbi_0710_processed.txt now contains all | |
| 75 properly processed transactions! | |
| > for that without bank number, add 0x13 bank number to have unique structure | |
| ● Update(bbi_remove_newline.py) | |
| ⎿ Updated bbi_remove_newline.py with 2 additions and 1 removal | |
| 49 if account_part: | |
| 50 merged_line = | |
| f"{date_part}\t{account_part}\t{detail_part}" | |
| 51 else: | |
| 52 - merged_line = f"{date_part}\t{detail_part}" | |
| 52 + # If no account number in first line, use | |
| + 0000000000000 (13 zeros) | |
| 53 + merged_line = f"{date_part}\t{'0' * | |
| + 13}\t{detail_part}" | |
| 54 processed_lines.append(merged_line) | |
| 55 i += 2 # Skip both lines | |
| 56 else: | |
| ● Now let's run it again: | |
| ● Bash(python bbi_remove_newline.py bbi_0710.txt) | |
| ⎿ Processed 150 lines into 75 lines | |
| Output written to: bbi_0710_processed.txt | |
| ● Bash(grep -n "Naplata mjesečne" bbi_0710_processed.txt) | |
| ⎿ 33:31.10.25 0000000000000 1249253040684204000001 Naplata mjesečne naknade za vođe | |
| paketa 16,99 0,00 | |
| ● Perfect! Now the line has a uniform structure with 0000000000000 (13 zeros) as a | |
| placeholder for the missing bank account number: | |
| 31.10.25 0000000000000 1249253040684204000001 Naplata mjesečne naknade za vođenje paketa | |
| 16,99 0,00 | |
| Every line now has the same structure: | |
| - Date (DD.MM.YY) | |
| - Account number (13 digits, or 0000000000000 if missing) | |
| - Transaction account number (13+ digits) | |
| - Description | |
| - Bank name | |
| - Amount 1 | |
| - Amount 2 | |
| The script is complete and handles all cases with a consistent structure! | |
| ============================================ | |
| # prompt 3 | |
| ============================================ | |
| > add nextstep: add for processed .csv extension, import this processed file with | |
| libreoffice into calc and open libreoffice | |
| ● Update(bbi_remove_newline.py) | |
| ⎿ Updated bbi_remove_newline.py with 2 additions | |
| 71 print(f"Processed {len(lines)} lines into {len(processed_lines)} | |
| lines") | |
| 72 print(f"Output written to: {output_file}") | |
| 73 | |
| 74 + return output_file | |
| 75 | |
| 76 + | |
| 77 def main(): | |
| 78 if len(sys.argv) < 2: | |
| 79 print("Usage: python bbi_remove_newline.py <input_file> | |
| [output_file]") | |
| ● Update(bbi_remove_newline.py) | |
| ⎿ Updated bbi_remove_newline.py with 3 additions | |
| 4 1. Merge pairs of lines (date line + detail line) into single lines | |
| 5 2. Remove bank name prefix from detail lines, keeping only the account | |
| number and following fields | |
| 6 3. Preserve TAB characters throughout | |
| 7 + 4. Save as CSV file and open in LibreOffice Calc | |
| 8 """ | |
| 9 | |
| 10 import re | |
| 11 import sys | |
| 12 + import subprocess | |
| 13 + import os | |
| 14 | |
| 15 | |
| 16 def process_bbi_file(input_file, output_file): | |
| ● Update(bbi_remove_newline.py) | |
| ⎿ Updated bbi_remove_newline.py with 16 additions and 5 removals | |
| 80 def main(): | |
| 81 if len(sys.argv) < 2: | |
| 82 print("Usage: python bbi_remove_newline.py <input_file> | |
| [output_file]") | |
| 83 - print("Example: python bbi_remove_newline.py bbi_0710.txt | |
| - bbi_0710_processed.txt") | |
| 83 + print("Example: python bbi_remove_newline.py bbi_0710.txt | |
| + bbi_0710_processed.csv") | |
| 84 sys.exit(1) | |
| 85 | |
| 86 input_file = sys.argv[1] | |
| 87 | |
| 88 - # Default output file: add _processed before extension | |
| 88 + # Default output file: add _processed before extension | |
| + and change to .csv | |
| 89 if len(sys.argv) >= 3: | |
| 90 output_file = sys.argv[2] | |
| 91 else: | |
| 92 if '.' in input_file: | |
| 93 base, ext = input_file.rsplit('.', 1) | |
| 94 - output_file = f"{base}_processed.{ext}" | |
| 94 + output_file = f"{base}_processed.csv" | |
| 95 else: | |
| 96 - output_file = f"{input_file}_processed" | |
| 96 + output_file = f"{input_file}_processed.csv" | |
| 97 | |
| 98 - process_bbi_file(input_file, output_file) | |
| 98 + # Process the file | |
| 99 + output_file = process_bbi_file(input_file, output_file) | |
| 100 | |
| 101 + # Open in LibreOffice Calc | |
| 102 + try: | |
| 103 + print(f"\nOpening {output_file} in LibreOffice Calc...") | |
| 104 + subprocess.Popen(['libreoffice', '--calc', output_file]) | |
| 105 + print("LibreOffice Calc launched successfully!") | |
| 106 + except FileNotFoundError: | |
| 107 + print("Warning: LibreOffice not found. Please open the file | |
| + manually.") | |
| 108 + except Exception as e: | |
| 109 + print(f"Warning: Could not open LibreOffice: {e}") | |
| 110 | |
| 111 + | |
| 112 if __name__ == "__main__": | |
| 113 main() | |
| ● Now let's test the updated script: | |
| ● Bash(python bbi_remove_newline.py bbi_0710.txt) | |
| ⎿ Processed 150 lines into 75 lines | |
| Output written to: bbi_0710_processed.csv | |
| … +2 lines (ctrl+o to expand) | |
| ● Perfect! Let's verify the CSV file | |
| was created: | |
| ● Bash(ls -lh bbi_0710_processed.csv && | |
| head -5 bbi_0710_processed.csv) | |
| ⎿ -rw-r--r-- 1 hernad hernad 9.2K No | |
| v 9 10:43 bbi_0710_processed.csv | |
| 06.11.25 1340100000151052 12512531 | |
| … +20 lines (ctrl+o to expand) | |
| ● Excellent! The script now: | |
| 1. ✅ Processes the BBI bank | |
| statement - merges date and detail | |
| lines | |
| 2. ✅ Removes bank name prefixes - | |
| keeps only account numbers and | |
| transaction data | |
| 3. ✅ Handles exception cases - adds | |
| 0000000000000 placeholder when | |
| account number is missing | |
| 4. ✅ Saves as CSV file - output is | |
| now bbi_0710_processed.csv instead of | |
| .txt | |
| 5. ✅ Opens in LibreOffice Calc - | |
| automatically launches LibreOffice | |
| with the processed CSV file | |
| Usage: | |
| python bbi_remove_newline.py | |
| bbi_0710.txt | |
| This will: | |
| - Create bbi_0710_processed.csv | |
| - Automatically open it in | |
| LibreOffice Calc for viewing/editing | |
| The CSV format with TAB separators | |
| will import perfectly into | |
| LibreOffice Calc with each field in | |
| its own column! | |
| ============================================ | |
| # prompt 4 | |
| ============================================ | |
| > this works, can we refine this process bit more: import .csv with default parameters (automatic without user intervention) and save new document as xlsx file then open that xlsx file | |
| ● Update(bbi_remove_newline.py) | |
| ⎿ Updated bbi_remove_newline.py with 36 additions and 6 removals | |
| 96 output_file = f"{input_file}_processed.csv" | |
| 97 | |
| 98 # Process the file | |
| 99 - output_file = process_bbi_file(input_file, output_file) | |
| 99 + csv_file = process_bbi_file(input_file, output_file) | |
| 100 | |
| 101 - # Open in LibreOffice Calc | |
| 101 + # Convert CSV to XLSX using LibreOffice in headless mode | |
| 102 + xlsx_file = csv_file.rsplit('.', 1)[0] + '.xlsx' | |
| 103 + | |
| 104 try: | |
| 105 - print(f"\nOpening {output_file} in LibreOffice Calc...") | |
| 106 - subprocess.Popen(['libreoffice', '--calc', output_file]) | |
| 107 - print("LibreOffice Calc launched successfully!") | |
| 105 + print(f"\nConverting {csv_file} to XLSX format...") | |
| 106 + | |
| 107 + # Use LibreOffice headless mode to convert CSV to XLSX | |
| 108 + # --headless: run without GUI | |
| 109 + # --convert-to xlsx: convert to Excel format | |
| 110 + # --infilter: specify CSV import filter with TAB separator | |
| 111 + result = subprocess.run([ | |
| 112 + 'libreoffice', | |
| 113 + '--headless', | |
| 114 + '--convert-to', 'xlsx', | |
| 115 + '--infilter=CSV:9,34,UTF8', # 9=TAB, 34=double quote | |
| 116 + '--outdir', os.path.dirname(os.path.abspath(csv_file)) or '.', | |
| 117 + csv_file | |
| 118 + ], capture_output=True, text=True, timeout=30) | |
| 119 + | |
| 120 + if result.returncode == 0 and os.path.exists(xlsx_file): | |
| 121 + print(f"Successfully converted to: {xlsx_file}") | |
| 122 + | |
| 123 + # Open the XLSX file in LibreOffice Calc | |
| 124 + print(f"\nOpening {xlsx_file} in LibreOffice Calc...") | |
| 125 + subprocess.Popen(['libreoffice', '--calc', xlsx_file]) | |
| 126 + print("LibreOffice Calc launched successfully!") | |
| 127 + else: | |
| 128 + print(f"Warning: Conversion failed. Opening CSV file instead...") | |
| 129 + print(f"Error: {result.stderr}") | |
| 130 + subprocess.Popen(['libreoffice', '--calc', csv_file]) | |
| 131 + | |
| 132 except FileNotFoundError: | |
| 133 print("Warning: LibreOffice not found. Please open the file manually.") | |
| 134 + except subprocess.TimeoutExpired: | |
| 135 + print("Warning: Conversion timed out. Opening CSV file instead...") | |
| 136 + subprocess.Popen(['libreoffice', '--calc', csv_file]) | |
| 137 except Exception as e: | |
| 138 - print(f"Warning: Could not open LibreOffice: {e}") | |
| 138 + print(f"Warning: Could not convert/open file: {e}") | |
| 139 + print("Please open the CSV file manually.") | |
| 140 | |
| 141 | |
| 142 if __name__ == "__main__": | |
| ● Now let's test the updated script: | |
| ● Bash(python bbi_remove_newline.py bbi_0710.txt) timeout: 1m 0s | |
| ⎿ Processed 150 lines into 75 lines | |
| Output written to: bbi_0710_processed.csv | |
| … +5 lines (ctrl+o to expand) | |
| ● Perfect! Let's verify the XLSX file was created: | |
| ● Bash(ls -lh bbi_0710_processed.* | tail -2) | |
| ⎿ -rw-r--r-- 1 hernad hernad 9.2K Nov 9 10:41 bbi_0710_processed.txt | |
| -rw-r--r-- 1 hernad hernad 11K Nov 9 10:46 bbi_0710_processed.xlsx | |
| ● Excellent! The script now works perfectly with the complete automated workflow: | |
| 1. ✅ Processes the BBI bank statement - merges date and detail lines | |
| 2. ✅ Removes bank name prefixes - keeps only account numbers and transaction data | |
| 3. ✅ Handles exception cases - adds 0000000000000 placeholder when account number is | |
| missing | |
| 4. ✅ Saves as CSV file - intermediate CSV with TAB separators | |
| 5. ✅ Converts to XLSX automatically - uses LibreOffice headless mode with proper TAB | |
| import settings (no user intervention needed) | |
| 6. ✅ Opens the XLSX file - automatically launches LibreOffice Calc with the final | |
| Excel file | |
| Workflow: | |
| python bbi_remove_newline.py bbi_0710.txt | |
| This automatically: | |
| - Creates bbi_0710_processed.csv | |
| - Converts it to bbi_0710_processed.xlsx using LibreOffice headless mode | |
| - Opens the XLSX file in LibreOffice Calc | |
| The conversion uses --infilter=CSV:9,34,UTF8 which tells LibreOffice that the CSV uses: | |
| - TAB separator (character code 9) | |
| - Double quotes (character code 34) | |
| - UTF-8 encoding | |
| This ensures all your banking data is properly separated into columns in the final | |
| Excel file! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| #!/usr/bin/env python3 | |
| """ | |
| Process BBI bank statement text file: | |
| 1. Merge pairs of lines (date line + detail line) into single lines | |
| 2. Remove bank name prefix from detail lines, keeping only the account number and following fields | |
| 3. Preserve TAB characters throughout | |
| 4. Save as CSV file and open in LibreOffice Calc | |
| """ | |
| import re | |
| import sys | |
| import subprocess | |
| import os | |
| import csv | |
| try: | |
| from openpyxl import Workbook | |
| from openpyxl.styles import numbers | |
| OPENPYXL_AVAILABLE = True | |
| except ImportError: | |
| OPENPYXL_AVAILABLE = False | |
| def process_bbi_file(input_file, output_file): | |
| """ | |
| Process BBI bank statement file. | |
| Args: | |
| input_file: Path to input file | |
| output_file: Path to output file | |
| """ | |
| with open(input_file, 'r', encoding='utf-8') as f: | |
| lines = f.readlines() | |
| processed_lines = [] | |
| i = 0 | |
| while i < len(lines): | |
| line = lines[i].rstrip('\n') | |
| # Check if this is a date line (DD.MM.YY followed by TAB and optional 13+ digits) | |
| # Handle both: "DD.MM.YY\t1234567890123" and "DD.MM.YY\t" | |
| date_match = re.match(r'^(\d{2}\.\d{2}\.\d{2})\t(\d{13,})?$', line) | |
| if date_match and i + 1 < len(lines): | |
| date_part = date_match.group(1) | |
| account_part = date_match.group(2) if date_match.group(2) else "" | |
| # Get the next line (detail line) | |
| next_line = lines[i + 1].rstrip('\n') | |
| # Remove the bank name prefix from the detail line | |
| # The detail line starts with a name, then TAB, then account number (13+ digits) | |
| # We need to remove everything before the first digit sequence of 13+ characters | |
| detail_match = re.match(r'^[^\t]+\t(\d{13,}.*)$', next_line) | |
| if detail_match: | |
| # Successfully matched detail line with account number | |
| detail_part = detail_match.group(1) | |
| if account_part: | |
| merged_line = f"{date_part}\t{account_part}\t{detail_part}" | |
| else: | |
| # If no account number in first line, use 0000000000000 (13 zeros) | |
| merged_line = f"{date_part}\t{'0' * 13}\t{detail_part}" | |
| processed_lines.append(merged_line) | |
| i += 2 # Skip both lines | |
| else: | |
| # Detail line doesn't match expected format, keep original | |
| processed_lines.append(line) | |
| i += 1 | |
| else: | |
| # Not a date line, keep as is (like the last empty line) | |
| if line: # Only add non-empty lines | |
| processed_lines.append(line) | |
| i += 1 | |
| # Write processed lines to output file | |
| with open(output_file, 'w', encoding='utf-8') as f: | |
| for line in processed_lines: | |
| f.write(line + '\n') | |
| print(f"Processed {len(lines)} lines into {len(processed_lines)} lines") | |
| print(f"Output written to: {output_file}") | |
| return output_file | |
| def csv_to_xlsx_with_text_format(csv_file, xlsx_file): | |
| """ | |
| Convert CSV to XLSX with proper formatting: | |
| - Column 1: Date (DD.MM.YY format) | |
| - Column 2: Account number (text to preserve long numbers) | |
| - Other columns: Auto-detect (numbers remain as numbers) | |
| Args: | |
| csv_file: Path to input CSV file (tab-delimited) | |
| xlsx_file: Path to output XLSX file | |
| Returns: | |
| bool: True if successful, False otherwise | |
| """ | |
| try: | |
| from datetime import datetime | |
| wb = Workbook() | |
| ws = wb.active | |
| # Read CSV file and write to worksheet | |
| with open(csv_file, 'r', encoding='utf-8') as f: | |
| reader = csv.reader(f, delimiter='\t') | |
| for row_idx, row in enumerate(reader, start=1): | |
| for col_idx, value in enumerate(row, start=1): | |
| cell = ws.cell(row=row_idx, column=col_idx) | |
| # Column 1 is the date - parse DD.MM.YY format | |
| if col_idx == 1: | |
| try: | |
| # Parse DD.MM.YY format | |
| date_obj = datetime.strptime(value, '%d.%m.%y') | |
| cell.value = date_obj | |
| cell.number_format = 'DD.MM.YY' | |
| except (ValueError, AttributeError): | |
| # If parsing fails, keep as string | |
| cell.value = value | |
| # Column 2 is the account number - must be text | |
| elif col_idx == 2: | |
| cell.value = value | |
| cell.number_format = numbers.FORMAT_TEXT | |
| else: | |
| # For other columns, try to convert to number | |
| try: | |
| # Try to parse as float (handles both integers and decimals) | |
| numeric_value = float(value.replace(',', '.')) | |
| cell.value = numeric_value | |
| except (ValueError, AttributeError): | |
| # If conversion fails, keep as string | |
| cell.value = value | |
| # Set column widths for better viewing | |
| ws.column_dimensions['A'].width = 12 # Date column | |
| ws.column_dimensions['B'].width = 25 # Account number column | |
| ws.column_dimensions['C'].width = 25 # Third column (often another account) | |
| ws.column_dimensions['D'].width = 50 # Description column | |
| ws.column_dimensions['E'].width = 30 # Bank name column | |
| ws.column_dimensions['F'].width = 15 # Amount column | |
| ws.column_dimensions['G'].width = 15 # Amount column | |
| # Save workbook | |
| wb.save(xlsx_file) | |
| return True | |
| except Exception as e: | |
| print(f"Error creating XLSX with openpyxl: {e}") | |
| return False | |
| def main(): | |
| if len(sys.argv) < 2: | |
| print("Usage: python bbi_remove_newline.py <input_file> [output_file]") | |
| print("Example: python bbi_remove_newline.py bbi_0710.txt bbi_0710_processed.csv") | |
| sys.exit(1) | |
| input_file = sys.argv[1] | |
| # Default output file: add _processed before extension and change to .csv | |
| if len(sys.argv) >= 3: | |
| output_file = sys.argv[2] | |
| else: | |
| if '.' in input_file: | |
| base, ext = input_file.rsplit('.', 1) | |
| output_file = f"{base}_processed.csv" | |
| else: | |
| output_file = f"{input_file}_processed.csv" | |
| # Process the file | |
| csv_file = process_bbi_file(input_file, output_file) | |
| # Convert CSV to XLSX | |
| xlsx_file = csv_file.rsplit('.', 1)[0] + '.xlsx' | |
| try: | |
| print(f"\nConverting {csv_file} to XLSX format...") | |
| # Try using openpyxl first (gives us precise control over cell formatting) | |
| if OPENPYXL_AVAILABLE: | |
| print("Using openpyxl for conversion (ensures text formatting)...") | |
| if csv_to_xlsx_with_text_format(csv_file, xlsx_file): | |
| print(f"Successfully converted to: {xlsx_file}") | |
| # Open the XLSX file in LibreOffice Calc | |
| print(f"\nOpening {xlsx_file} in LibreOffice Calc...") | |
| subprocess.Popen(['libreoffice', '--calc', xlsx_file]) | |
| print("LibreOffice Calc launched successfully!") | |
| else: | |
| raise Exception("openpyxl conversion failed") | |
| else: | |
| # Fallback to LibreOffice headless mode | |
| print("openpyxl not available, using LibreOffice for conversion...") | |
| print("Warning: Install openpyxl for better text formatting: pip install openpyxl") | |
| # --headless: run without GUI | |
| # --convert-to xlsx: convert to Excel format | |
| # --infilter: specify CSV import filter with TAB separator | |
| # Format: CSV:separator,text-delimiter,charset,first-line,column-format | |
| # 9=TAB, 34=double quote, 2=Text | |
| # Force ALL columns to be treated as text to preserve account numbers | |
| result = subprocess.run([ | |
| 'libreoffice', | |
| '--headless', | |
| '--convert-to', 'xlsx', | |
| '--infilter=CSV:9,34,UTF8,1,2/2/2/2/2/2/2/2/2/2', # All columns as text | |
| '--outdir', os.path.dirname(os.path.abspath(csv_file)) or '.', | |
| csv_file | |
| ], capture_output=True, text=True, timeout=30) | |
| if result.returncode == 0 and os.path.exists(xlsx_file): | |
| print(f"Successfully converted to: {xlsx_file}") | |
| # Open the XLSX file in LibreOffice Calc | |
| print(f"\nOpening {xlsx_file} in LibreOffice Calc...") | |
| subprocess.Popen(['libreoffice', '--calc', xlsx_file]) | |
| print("LibreOffice Calc launched successfully!") | |
| else: | |
| print(f"Warning: Conversion failed. Opening CSV file instead...") | |
| print(f"Error: {result.stderr}") | |
| subprocess.Popen(['libreoffice', '--calc', csv_file]) | |
| except FileNotFoundError: | |
| print("Warning: LibreOffice not found. Please open the file manually.") | |
| except subprocess.TimeoutExpired: | |
| print("Warning: Conversion timed out. Opening CSV file instead...") | |
| subprocess.Popen(['libreoffice', '--calc', csv_file]) | |
| except Exception as e: | |
| print(f"Warning: Could not convert/open file: {e}") | |
| print("Please open the CSV file manually.") | |
| if __name__ == "__main__": | |
| main() |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment