FWIW, if you are interested in such tooling, consider also soffice and pandoc, which have (as far as I can tell) similar features but have been around for years and are not related to Microsoft.
Edit: not related to Microsoft AND Google; it seems the transcription aspect (which IMHO is still weird in that context, but OK) is done via Google servers, cf. https://lemmy.ml/post/23629310/15586865
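For anyone who wants to try those two, here is a rough sketch of how they could be driven from Python. It assumes `soffice` (LibreOffice) and `pandoc` are installed and on PATH; the helper names (`soffice_cmd`, `pandoc_cmd`, `run_if_available`) are made up for illustration and are not part of either tool.

```python
import shutil
import subprocess
from pathlib import Path


def soffice_cmd(src: Path, out_dir: Path, fmt: str = "html") -> list[str]:
    """Build a headless LibreOffice conversion command (hypothetical helper)."""
    return ["soffice", "--headless", "--convert-to", fmt,
            "--outdir", str(out_dir), str(src)]


def pandoc_cmd(src: Path, dest: Path, to: str = "gfm") -> list[str]:
    """Build a pandoc command; the input format is guessed from the extension."""
    return ["pandoc", str(src), "--to", to, "--output", str(dest)]


def run_if_available(cmd: list[str]) -> bool:
    """Run the command only if its executable is found on PATH."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    # Example file names are placeholders, not taken from the thread.
    run_if_available(soffice_cmd(Path("report.odt"), Path("out")))
    run_if_available(pandoc_cmd(Path("report.docx"), Path("report.md")))
```

Keeping the command construction separate from the execution makes it easy to inspect what would be run before touching any files.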
The single exception to this (which is actually buried fairly deep in the feature list) is the audio transcription tool. I didn’t take a closer look at what is used to perform this, but at least it’s not “just” document conversion like pandoc.
Thanks for the clarification but I’m a bit confused here, like audio transcription, STT, done by e.g. Whisper? If so what’s the use case? When I think of Office documents audio transcription is not something I have in mind.
PS: related, asked on Github too https://github.com/microsoft/markitdown/issues/20#issuecomment-2544630753
You should open a fresh issue for questions like that instead of asking on an unrelated one.
I’m not completely clear either on how Microsoft have implemented this previously. As I said, I didn’t look very deep into the repository.
If these are indeed other Python projects they piled together, as others suggest, I’d be happy to hear what speech recognition library this might’ve been built on.
Huh, Beautiful Soup is still relevant. I was using it twenty years ago when it first came out.
FYI the link in your comment got cut off before the last bracket so it’s not linking to the wiki page directly.
Fixed, thanks. Though it’s 4 days later, so I’m not sure it will help anyone 🤷
This could be useful to me. A while ago I was trying to make something that takes all unread posts from my feed reader, makes an epub out of them and then puts it behind an OPDS server.
I found that first converting the HTML from RSS to markdown and then compiling that to an epub was the most reliable way to take out the unnecessary markup from the source HTML. I used pandoc for this.
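Roughly, that two-step pandoc pipeline could look like the sketch below. It assumes pandoc is on PATH and that each post has already been saved as its own HTML file; the function names, file names, and the default title are hypothetical, not what I actually had.

```python
import subprocess
from pathlib import Path


def html_to_markdown_cmd(html_file: Path, md_file: Path) -> list[str]:
    """Step 1: HTML -> markdown, which drops most of the layout markup."""
    return ["pandoc", "--from", "html", "--to", "markdown",
            str(html_file), "--output", str(md_file)]


def markdown_to_epub_cmd(md_files: list[Path], epub_file: Path, title: str) -> list[str]:
    """Step 2: all markdown files -> one epub (format inferred from .epub)."""
    return ["pandoc", "--metadata", f"title={title}",
            *[str(p) for p in md_files], "--output", str(epub_file)]


def build_epub(html_files: list[Path], work_dir: Path, epub_file: Path,
               title: str = "Unread posts") -> None:
    """Run both steps; requires pandoc to be installed."""
    work_dir.mkdir(parents=True, exist_ok=True)
    md_files = []
    for html_file in html_files:
        md_file = work_dir / (html_file.stem + ".md")
        subprocess.run(html_to_markdown_cmd(html_file, md_file), check=True)
        md_files.append(md_file)
    subprocess.run(markdown_to_epub_cmd(md_files, epub_file, title), check=True)
```

Something like `build_epub(sorted(Path("posts").glob("*.html")), Path("tmp"), Path("unread.epub"))` would then produce the file to put behind the OPDS server.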
Please come back and share whether it does better or worse than pandoc, and if so along which dimensions. Quite curious to better understand the differences.
oh yeah that’s definitely a good use case