Xojo Documentation PDF

Underwriters_Technologies · December 4, 2023, 6:30pm

That’s like saying, “Aren’t there some people programming for Xojo?”

There’s an unbelievable number of things that can be done with LLMs, and I’ll bet every single one of you will be doing them a year from now. Remember when you first heard about access to the Internet? Don’t you wish you had jumped on the ship a little bit earlier?

AlbertoD · December 4, 2023, 7:03pm

I guess, what Emile is saying:

and

Michael_Dettmer · December 6, 2023, 7:59am

Hi all,
as stated somewhere else, I am using Dash for Mac (best tool):

And for the Windows lovers:
https://zealdocs.org/
Both work fine for me, though only an old version (2019 R2) is available. But there is a lot of other stuff available. I wonder how this doc found its way in there. I think Christian was involved somehow?

I asked Geoff in London but did not get an answer … or I did not understand ?

Emile_Schwarz · December 6, 2023, 8:06am

Yes, that was that.

Underwriters_Technologies · December 12, 2023, 6:03pm

Robert, thanks for creating a PDF from the current version of the documentation.

Even though it’s still one large file and needs breaking down into smaller segments, I believe it could be pretty useful.

Would it be possible for you to share this comprehensive PDF with us?

Robert_Livingston · December 13, 2023, 12:41am

I want to get the PDF version of the current state of the documentation (version 2023.Release4). This takes a little while. Then I will return to this.

Underwriters_Technologies · December 13, 2023, 3:18pm

That would be great. Thanks. But in the meantime, do you have the R3?

Robert_Livingston · December 13, 2023, 8:44pm

I have the PDFs for 2023 R4.
They have been divided into:

GettingStarted
Topics
API
–Deprecated
–Exceptions
Resources

Deprecated and Exceptions have been pulled out of API as separate PDFs. Of course, it is possible to recombine the PDFs into whatever collections one might want.

Now I want to get the permission from the publisher to make these available to folks on the forum:

All contents © 1998-2023 by Xojo, Inc. All rights reserved. No part of the Xojo Documentation or the related files may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording, or otherwise) without the prior written permission of the publisher.

Tom_Dixon · December 13, 2023, 8:55pm

Sending an email to hello@xojo.com with your request for permission would be the quickest way to get it in front of them for approval.

Robert_Livingston · December 13, 2023, 9:12pm

Will do.

Robert_Livingston · December 14, 2023, 12:04am

Link to PDFs

These are the 6 PDFs mentioned above.

GettingStarted
Topics
API
–Deprecated
–Exceptions
Resources

They are being provided free of charge with the permission of Xojo. I am responsible for any errors that I might have introduced in creating these PDFs.

Jean-Yves_Pochez · December 14, 2023, 7:36am

thanks @Robert_Livingston
how did you proceed to build this pdf documentation ?

Underwriters_Technologies · December 14, 2023, 5:23pm

This is wonderful. Thank you.

Ian_Kennedy · December 14, 2023, 5:31pm

I would love to find a way of turning a series of HTML files into a Word document for translators to work on. Opening every page manually and copying and pasting is pretty tedious.
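
One rough way to avoid opening each page by hand, offered as a sketch under assumptions rather than a tested workflow: merge the body of every HTML file in a folder into a single HTML document, which Word can then open and save as a .docx. The function name `combine_html_files` and the folder names are hypothetical, and the regex-based body extraction is a shortcut, not a robust HTML parser.

```python
import html
import re
from pathlib import Path

# Shortcut for pulling out the <body> content; assumes reasonably well-formed pages.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)

def combine_html_files(input_dir, output_file):
    """Merge the <body> content of every .html file under input_dir into one file.

    Returns the number of files merged. Keep output_file outside input_dir,
    otherwise a later run would pick up the combined file as well.
    """
    parts = []
    for path in sorted(Path(input_dir).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        match = BODY_RE.search(text)
        body = match.group(1) if match else text  # fall back to the whole file
        # Use the file name as a heading so translators can find their place.
        parts.append(f"<h1>{html.escape(path.name)}</h1>\n{body}")

    combined = (
        "<html><head><meta charset='utf-8'></head><body>\n"
        + "\n<hr>\n".join(parts)
        + "\n</body></html>"
    )
    Path(output_file).write_text(combined, encoding="utf-8")
    return len(parts)

if __name__ == "__main__":
    # Hypothetical paths; adjust to your own folders.
    if Path("docs_html").is_dir():
        count = combine_html_files("docs_html", "combined_for_word.html")
        print(f"Merged {count} files")
```

Images and stylesheets referenced with relative paths may not survive the merge, so the result is best treated as a text source for translation rather than a faithful copy of the layout.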

Jean-Yves_Pochez · December 14, 2023, 5:49pm

I found that the links inside the PDFs point to the HTML pages on the web, when they should point to the same place inside the PDF file… don’t know if you can fix that easily?

Robert_Livingston · December 14, 2023, 6:16pm

macOS has the capability of rendering any Web page as a PDF. I used the on-line version of the documentation, so the Web pages were available.

It would seem likely that if you were a sophisticated user of the tools that are used to create the documentation, you might be able to write a program to create the PDFs. I do not know. The off-line version of the documentation consists of files in formats that I am unfamiliar with, and I do not know how to parse them.

© Copyright 2023, Xojo, Inc.
Built with Sphinx using a theme provided by Read the Docs.

Sphinx uses the reStructuredText markup language by default, and can read MyST markdown via third-party extensions. Both of these are powerful and straightforward to use, and have functionality for complex documentation and publishing workflows. They both build upon Docutils to parse and write documents.

One problem that I encountered with my approach is that part of the document appears sort of like a picture. In the Deprecations section one frequently encounters this:

For my own purposes, I wanted to capture this text, but when you try to copy from the PDF it behaves like a picture rather than text. The rest of the document behaves like text and can be selected. I do not know what this is all about. AcrobatPro includes the ability to OCR the content of a PDF, and if you invoke this, the text becomes selectable. I presume it is grabbing the text out of the picture.

In partial answer to Ian’s question, AcrobatPro can also turn a PDF into a Word document. This works very successfully with the documentation PDFs that I made available. But curiously, even if you first OCR the PDF, this Warning is still non-selectable text in the resulting Word document. It is basically a picture. You can rotate it etc.

This was turning into such a rabbit hole that I gave up. Briefly Gabriel had available a version that was relatively “crude” but it seemed to preserve the ability to select the text in a Warning. I don’t know how this was done.

Back to Jean-Yves’s question: as I stated, you can create a PDF from any web page. So the task becomes going sequentially through the ~1700 web pages and creating a PDF from each one. AcrobatPro is happy to recombine these into a single PDF. The remaining issue is that going through 1700 web pages manually takes a long time, so I automated it with Keyboard Maestro. It is still a lengthy process that I run overnight.

Robert_Livingston · December 14, 2023, 6:26pm

I found that the links inside the PDFs point to the HTML pages on the web, when they should point to the same place inside the PDF file… don’t know if you can fix that easily?

I agree that it would be better under many circumstances, but I do not know how. For my use, I have split the Documentation into a few separate PDFs. Anyone could recombine them into one, but if you don’t do this then many links would have no hope of functioning.

The document also includes links to the web outside of the documentation itself and they could not work either.

Gabriel_L · December 14, 2023, 7:21pm

Here is a Python solution for anyone who wants to transform lots of HTML files, found in a folder (and its subfolders), into a single PDF file. It should create a PDF file with categories that is easy to read.

# first: pip install beautifulsoup4 pdfkit
# second: https://wkhtmltopdf.org/downloads.html <- install it

import os
import pdfkit
from bs4 import BeautifulSoup
from pdfkit.configuration import Configuration

def html_to_pdf(input_directory, output_directory, output_pdf_name):
    print("Starting PDF generation process...")
    
    # Set the path to the wkhtmltopdf executable
    config = Configuration(wkhtmltopdf=r'C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe')

    options = {
        'page-size': 'Letter',
        'encoding': "UTF-8",
        'zoom': '1.0',  # Adjust as needed to scale content
        'viewport-size': '1280x1024',  # Adjust viewport size if necessary
        'margin-top': '10mm',  # Adjust margins to fit tables better
        'margin-right': '10mm',
        'margin-bottom': '10mm',
        'margin-left': '10mm',
        'no-images': '',  # Disable loading of images
        'disable-external-links': '',  # Disable all external links
        'disable-internal-links': '',  # Disable all internal links
        'grayscale': ''  # Convert to grayscale
    }

    combined_html = "<html><body>"

    file_count = 0
    for root, dirs, files in os.walk(input_directory):
        for file in files:
            if file.endswith(".html") and file.lower() != "index.html":  # Skip index.html files
                file_path = os.path.join(root, file)
                print(f"Processing file: {file_path}")
                with open(file_path, 'r', encoding='utf-8') as f:
                    soup = BeautifulSoup(f, 'html.parser')
                    article_body = soup.find('div', itemprop='articleBody')
                    if article_body:
                        combined_html += str(article_body)
                        file_count += 1  # Count only files that contributed content
                        print(f"Added contents of {file}")
                    else:
                        print(f"'articleBody' div not found in {file}")

    combined_html += "</body></html>"

    if file_count == 0:
        print("No HTML files with an 'articleBody' div were found. Exiting.")
        return

    # Create the output directory if it doesn't exist
    os.makedirs(output_directory, exist_ok=True)

    output_pdf_path = os.path.join(output_directory, output_pdf_name)
    print("Generating PDF...")
    pdfkit.from_string(combined_html, output_pdf_path, options=options, configuration=config, verbose=True)
    print(f"PDF generation complete. File saved as: {output_pdf_path}")

# Using the specified directory paths
input_directory = r'C:\Users\gabri\Desktop\docs\deprecated_class_members'  # Directory containing HTML files
output_directory = r'C:\Users\gabri\Desktop\docs\pdf_output'  # Directory for the output PDF
output_pdf_name = 'deprecated_class_members.pdf'  # Name of the output PDF file
html_to_pdf(input_directory, output_directory, output_pdf_name)
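
The script above hard-codes a Windows path to wkhtmltopdf. For readers on macOS or Linux, a minimal sketch of locating the executable first might look like the following. The helper name `find_wkhtmltopdf` and the candidate paths are hypothetical; only `shutil.which` and `pathlib` from the standard library are used.

```python
import shutil
from pathlib import Path

def find_wkhtmltopdf(extra_candidates=()):
    """Return a path to the wkhtmltopdf executable, or None if it is not found.

    The PATH is checked first, then any extra candidate paths from the caller.
    """
    found = shutil.which("wkhtmltopdf")
    if found:
        return found
    for candidate in extra_candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None

# Hypothetical install locations; adjust for your own machine.
candidates = [
    r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe",
    "/usr/local/bin/wkhtmltopdf",
]
wkhtmltopdf_path = find_wkhtmltopdf(candidates)
print(wkhtmltopdf_path or "wkhtmltopdf not found; install it or pass its path explicitly")
```

If a path is found, it could be passed to `Configuration(wkhtmltopdf=wkhtmltopdf_path)` in place of the fixed string.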

Jean-Yves_Pochez · December 14, 2023, 8:37pm

nice !
when I try it on the xojo doc folder, it gives me an obscure youtube not found error ?
any way to avoid this ?

Error: Failed to load https://www.youtube.com/youtubei/v1/log_event?alt=json&key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8, with network status code 299 and http status code 413 - Error downloading https://www.youtube.com/youtubei/v1/log_event?alt=json&key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8 - server replied: Request Entity Too Large
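
One possible cause, stated as an assumption rather than a confirmed diagnosis: some documentation pages may embed YouTube videos in an `<iframe>`, and wkhtmltopdf tries to load them while rendering. A minimal sketch of dropping `<iframe>` and `<script>` elements from an HTML fragment before conversion, using only the standard library, could look like this. The names `EmbedStripper` and `strip_embeds` are hypothetical, and comments and doctype declarations are discarded as a side effect.

```python
from html.parser import HTMLParser

class EmbedStripper(HTMLParser):
    """Re-emit HTML while dropping <iframe> and <script> elements and their content."""

    SKIP = {"iframe", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied unless they are skipped tags.
        if tag not in self.SKIP and self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")

def strip_embeds(fragment):
    """Return the fragment with iframe and script elements removed."""
    parser = EmbedStripper()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)

sample = '<div><p>Intro</p><iframe src="https://www.youtube.com/embed/x"></iframe><p>After</p></div>'
print(strip_embeds(sample))
```

In the script above, this could be applied to `str(article_body)` before it is appended to `combined_html`. Whether it removes the error depends on the embed actually being the cause.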

Gilles_Plante · December 14, 2023, 9:56pm

In the xojo doc folder, is there anything related to YouTube? Could it be that a file that should not be processed was processed?

@Gabriel_L, what version of Python did you use? I am surprised that

combined_html += "</body></html>"

works; at some point in time “+=” (pre-increment) was not implemented in Python.
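
For reference, `+=` in that line is augmented assignment rather than an increment operator. It has been part of Python since version 2.0 (PEP 203) and works on strings, so any current Python 3 release accepts it. A minimal sketch of what the line does, using a hypothetical list of page fragments:

```python
# Augmented assignment on strings: `s += t` rebinds s to the concatenation s + t.
fragments = ["<div>one</div>", "<div>two</div>"]  # hypothetical page bodies

combined_html = "<html><body>"
for fragment in fragments:
    combined_html += fragment
combined_html += "</body></html>"

# For a large number of fragments, collecting the parts and joining once
# is a common alternative that gives the same result.
joined_html = "<html><body>" + "".join(fragments) + "</body></html>"

print(combined_html == joined_html)
```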